Artificial Intelligence Review (2024) 57:88
https://doi.org/10.1007/s10462-024-10706-5
Reinforcement learning applications inenvironmental
sustainability: areview
MaddalenaZuccotto1· AlbertoCastellini1· DavideLaTorre2· LapoMola3,4·
AlessandroFarinelli1
Published online: 12 March 2024
© The Author(s) 2024
Abstract
Environmental sustainability is a worldwide key challenge attracting increasing attention due to climate change, pollution, and biodiversity decline. Reinforcement learning, initially employed in gaming contexts, has been recently applied to real-world domains, including the environmental sustainability realm, where uncertainty challenges strategy learning and adaptation. In this work, we survey the literature to identify the main applications of reinforcement learning in environmental sustainability and the predominant methods employed to address these challenges. We analyzed 181 papers and answered seven research questions, e.g., “How many academic studies have been published from 2003 to 2023 about RL for environmental sustainability?” and “What were the application domains and the methodologies used?”. Our analysis reveals an exponential growth in this field over the past two decades, with a rate of 0.42 in the number of publications (from 2 papers in 2007 to 53 in 2022), a strong interest in sustainability issues related to energy fields, and a preference for single-agent RL approaches to deal with sustainability. Finally, this work provides practitioners with a clear overview of the main challenges and open problems that should be tackled in future research.
Keywords Reinforcement learning· Environmental sustainability· Reinforcement learning
applications· Sustainable development· Artificial intelligence
* Maddalena Zuccotto
maddalena.zuccotto@univr.it
1 Department ofComputer Science, University ofVerona, Strada Le Grazie 15, 37134Verona, Italy
2 SKEMA Business School, Université Côte d’Azur, Sophia Antipolis, 60 Rue Fedor Dostoïevski,
06902Valbonne, France
3 Department ofManagement, University ofVerona, Via Cantarane 24, 37129Verona, Italy
4 SKEMA Business School, Université Côte d’Azur (GREDEG), Sophia Antipolis, 60 Rue Fedor
Dostoïevski, 06902Valbonne, France
1 Introduction
Artificial Intelligence (AI) is taking an increasingly important role in industry and society. AI techniques have recently been introduced in autonomous driving, personalized shopping, and fraud prevention, to name a few examples. A key challenge faced by today’s society, and one to which AI can bring important advances, is environmental sustainability. Climate change, pollution, biodiversity decline, poor health, and poverty have led governments and companies, in recent years, to focus ever more of their efforts and investments on solutions to environmental sustainability problems, which are usually characterized by an inefficient and increasing use of resources. Environmental sustainability can be defined as a set of constraints regarding the use of renewable and nonrenewable resources on the one hand, and pollution and waste assimilation on the other (Goodland 1995). In this regard, in 2015, the United Nations published the “2030 Agenda for Sustainable Development”, the centerpiece of which is a set of 17 Sustainable Development Goals (United Nations 2015) to be fully achieved by 2030 to attain sustainable development in the economic, social, and environmental contexts and to eliminate all forms of poverty.
AI-based algorithms can control autonomous drones used in water monitoring (Steccanella et al. 2020; Marchesini et al. 2021; Bianchi et al. 2023), extract new insights about environmental conditions from acquired data (Castellini et al. 2020; Azzalini et al. 2020), improve the healthiness of indoor environments (Capuzzo et al. 2022), or forecast demand in district heating networks (Bianchi et al. 2019; Castellini et al. 2021, 2022). Several AI techniques have been employed to address various environmental sustainability challenges. These approaches enable the efficient management of distributed resources within smart grids (Roncalli et al. 2019; Orfanoudakis and Chalkiadakis 2023), improve the power flow in DC grids (Blij et al. 2020), increase the utilization of renewable resources for electric vehicle charging (Koufakis et al. 2020), and mitigate carbon emissions in urban transportation by fostering ridesharing and reducing traffic congestion (Bistaffa et al. 2021, 2017). Furthermore, a crucial aspect of climate change prevention involves optimizing the energy consumption associated with heating and cooling residential properties. To tackle this issue, AI-based approaches have been developed to enhance the efficiency of home systems (Panagopoulos et al. 2015; Auffenberg et al. 2017) and to quantify the thermal efficiency of residences (Brown et al. 2021). Among the broad spectrum of AI techniques, in this survey we focus on Reinforcement Learning (RL) (Sutton and Barto 2018), which has recently obtained impressive success, achieving human-level performance in several tasks, such as games (Silver et al. 2016, 2017).
One of the most important and interesting challenges in today’s RL research is the application of RL algorithms to real-world domains, where uncertainty makes strategy learning and adaptation much more complex than in game environments. In particular, the application of RL to environmental sustainability has attracted, in the last decade, strong interest from both the computer science community and the communities of environmental sciences and business. Reducing carbon emissions requires increasing the use of renewable resources, such as solar and wind power. While these resources are economically efficient, their stochastic and intermittent nature poses challenges when they replace nonrenewable energy sources within energy networks. RL, through systematic trial-and-error interaction with dynamic environments, offers a promising approach for learning optimal policies that can adapt to changing system dynamics and effectively manage environmental uncertainty.
Thus, an RL agent is capable of handling variations in operating conditions, for instance,
due to a change in resource availability or weather conditions.
This work surveys the recent use of RL to improve environmental sustainability. It pro-
vides a comprehensive overview of the different application domains where RL has been
used, such as energy and water resource management, and traffic management. The goal is
to show practitioners the state-of-the-art RL methods that are currently used to solve envi-
ronmental sustainability problems in each of these domains. For each paper analyzed, we
consider
The problem tackled,
The RL approach used,
The challenges faced,
The formalization of the RL problem (i.e., type of state/action space, type of transition
model, type of RL method, performance measures used to evaluate the results).
The paper is structured as follows. Section2 presents the surveys already available on top-
ics close to RL and environmental sustainability. Section3 presents the basic concepts of
RL as well as a formalization of the main concepts. In Sect.4, we present the research
methodology used in our survey. Section5 describes the results of our research, consider-
ing different levels of detail. In particular, in Sects.5.1.1, 5.1.2 and 5.1.3, we provide a
quantitative analysis of the state-of-the-art related to the application of RL in environmen-
tal sustainability over the last two decades. Then, Sect.5.1.4 outlines domains where RL
techniques are applied and the RL-based approaches employed to address environmental
sustainability. In Sect.5.2, our focus shifts to a subset of 35 main papers, for which we
analyze the application domains of proposed RL techniques, provide technical insights into
problem formalization, discuss the performance metrics used for evaluation, and consider
the challenges addressed. Section5.3 provides an in-depth analysis of each of these main
papers. Finally, in Sect.6 we discuss our findings, and in Sect.7 we draw conclusions and
summarize future directions.
2 Related work
The literature already provides some surveys on the application of RL to problems related to environmental sustainability, but all these works either focus only on specific aspects of environmental sustainability or also consider AI methods other than RL. For instance, Ma et al. (2020) focus on Energy-Harvesting Internet of Things (IoT) devices, offering insights into recent advancements addressing challenges in commercialization, standards development, context sensing, intermittent computing, and communication strategies. Charef et al. (2023) conduct a study considering various AI techniques, including RL, to enhance energy sustainability within IoT networks. They categorize studies based on the challenges they address, establishing connections between challenges and AI-based solutions while delineating the performance metrics used for evaluation. Within the domain of Architecture, Engineering, Construction, and Operation, Rampini and Re Cecconi (2022) concentrate on the application of AI techniques, including RL, in Asset Management. Their work reviews studies related to several aspects such as energy management, condition assessment, operations, risk, and project management, identifying key points for future development in this context. Alanne and
Sierla (2022) shift their focus to smart buildings, discussing the learning capabilities of
intelligent buildings and categorizing learning application domains based on objectives.
They also survey the application of RL and Deep Reinforcement Learning (DRL) in
decision-making and energy management, encompassing aspects like control of heating
and cooling systems and lighting systems. Within the context of smart buildings and
smart grids, Mabina et al. (2021) examine the utilization of Machine Learning (ML),
including RL, for optimizing energy consumption and electric water heater schedul-
ing, emphasizing the advantages of these approaches in Demand Response (DR) due
to their interaction with the environment. Himeur etal. (2022) investigate the integra-
tion of AI-big data analytics into various tasks such as load forecasting, water man-
agement, and indoor environmental quality monitoring, focusing on the role of RL and
DRL in optimizing occupant comfort and energy consumption. Yang etal. (2020) focus
on the application of RL and DRL techniques to sustainable energy and electric sys-
tems, addressing issues such as optimization, control, energy markets, cyber security,
and electric vehicle management.
In the realm of transportation systems, Li etal. (2023) explore various topics, includ-
ing cooperative mobility-on-demand systems, driver assistance systems, autonomous
vehicles (AVs), and electric vehicles (EVs). Sabet and Farooq (2022) study the state-
of-the-art in the context of Green Vehicle Routing Problems, which involve reducing
greenhouse gas (GHG) emissions and addressing issues like charging activities, pickup
and delivery operations, and energy consumption. Moreover, the authors note that most of the works leverage metaheuristics, while the use of RL methods is uncommon. Chen et al.
(2019) tackle sustainability concerns within the Internet of Vehicles, leveraging 5th
generation mobile network (5G) technology, Mobile Edge Computing architecture, and
DRL to optimize energy consumption and resource utilization. Rangel-Martinez etal.
(2021) assess the application of ML techniques, including RL, in manufacturing, with
a focus on energy-related fields impacting environmental sustainability. Sivamayil etal.
(2023) explore a wide range of RL applications (e.g., Natural Language Processing,
health care, etc.) emphasizing Energy Management Systems with an environmental sus-
tainability perspective. Mischos etal. (2023) investigate Intelligent Energy Management
Systems across diverse building environments, considering control types and optimiza-
tion approaches, including ML, DL, and DRL. Yao etal. (2023) discuss the application
of Agent-Based Modeling and Multi-Agent System modeling in the transition to Multi-
Energy Systems, highlighting RL and suggesting future research directions in Multi-
Agent Reinforcement Learning (MARL) for energy systems.
While these works address specific aspects of environmental sustainability using RL
methods, our review takes a comprehensive approach, analyzing all contexts in which
RL techniques have recently contributed to enhancing environmental sustainability. Our
goal is to provide practitioners with insights into state-of-the-art methods for addressing
environmental sustainability challenges across various application domains, including
energy and water resource management and traffic management. In summary, the main
contribution of this survey consists of offering an overview of RL application domains
within the context of environmental sustainability.
3 Reinforcement learning: preliminaries andmain denitions
In this section, we present the basic concepts of RL as well as a formalization of the main concepts. RL, a prominent machine learning paradigm, focuses on learning a policy that maximizes cumulative rewards, i.e., on which action should be selected, given the environment configuration, to achieve the best possible outcome. The key elements of RL are listed in the following:
The agent is the entity that makes decisions and performs actions in the environment;
The environment represents the system with which the agent interacts and provides the agent with feedback on the performed action;
The policy is a function that defines the agent’s behavior considering the environment configuration (i.e., a map between what the agent observes and what the agent should do);
The reward is a numerical signal that provides feedback on the action performed by the agent;
The value function specifies state values, namely, how valuable it is to reach a state, considering also the future states reachable from it;
The model of the environment (optional) is a stochastic function providing the next-state probability given the current state and action; it allows simulating the behavior of the environment in response to the agent’s actions.
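To make these elements concrete, the following minimal Python sketch shows how they interact in a generic episode loop; the toy environment, its reward, and the epsilon-greedy policy are illustrative assumptions, not taken from any surveyed work.

```python
# Minimal sketch of the agent-environment interaction loop built from the
# elements listed above. "ChainEnv" and the epsilon-greedy policy are
# illustrative assumptions, not taken from any surveyed paper.
import random

class ChainEnv:
    """Toy environment: move left/right on states 0..4, the goal is state 4."""
    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action: 0 = left, 1 = right
        self.state = max(0, min(4, self.state + (1 if action == 1 else -1)))
        reward = 1.0 if self.state == 4 else -0.1   # reward: numerical feedback signal
        return self.state, reward, self.state == 4

def policy(q_values, state, epsilon=0.1):
    """Epsilon-greedy policy: map the observed state to an action."""
    if random.random() < epsilon:
        return random.randrange(2)
    return max((0, 1), key=lambda a: q_values.get((state, a), 0.0))

env, q_values = ChainEnv(), {}        # q_values plays the role of the value function
state = env.reset()
for t in range(100):                  # one (truncated) episode of interaction
    action = policy(q_values, state)
    state, reward, done = env.step(action)   # environment feedback on the action
    if done:
        break
```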
RL methods (Sutton and Barto 2018) can be categorized into two main groups: model-free and model-based (Moerland et al. 2020). Over the past two decades, model-free methods have demonstrated significant success. Meanwhile, model-based approaches have become a focal point in current research due to their potential to enhance sample efficiency, i.e., to reduce the number of interactions with the environment. This efficiency is achieved by explicitly representing the model of the environment and incorporating relevant prior knowledge (Castellini et al. 2019; Zuccotto et al. 2022a, b). Additionally, model-based methods offer the advantage of addressing the risks associated with taking actions in partially observable environments (Mazzi et al. 2021, 2023; Simão et al. 2023) or partially known environments (Castellini et al. 2023).
A common framework to formalize the RL problem is the Markov Decision Process (MDP) (Puterman 1994). An MDP is a tuple $(S, A, T, R, \gamma)$ where $S$ is a finite set of states, $A$ is a finite set of actions, $T: S \times A \rightarrow \Pi(S)$ is the transition model, where $\Pi(S)$ is the space of probability distributions over states, $R: S \times A \rightarrow \mathbb{R}$ is the reward function, and $\gamma \in [0, 1)$ is the discount factor. The agent’s goal is to maximize the expected discounted return $\mathbb{E}\left[\sum_{t=0}^{\infty} \gamma^{t} R(s_t, a_t)\right]$ by acting optimally, namely, choosing in each state $s_t$, at time $t$, the action $a_t$ with the highest expected reward. The solution of an MDP is an optimal policy, namely, a function that optimally maps states into actions. A policy is optimal if it maximizes the expected discounted return. The discount factor $\gamma$ reduces the weight of long-term rewards, guaranteeing convergence. In the case of partially observable environments, an extension of the MDP framework, namely the POMDP (Kaelbling et al. 1998), can be used. A POMDP is a tuple $(S, A, O, T, \Omega, R, \gamma)$ where the elements shared with the MDP are augmented by $\Omega$, a finite set of observations, and $O: S \times A \rightarrow \Pi(\Omega)$, the observation model. In contrast to MDPs, in POMDPs the agent is not able to directly observe the current state $s_t$ but it maintains a probability distribution over the states $S$, called belief, which is updated at each time step. The belief summarizes the agent’s previous experiences, i.e., the sequence of actions and observations that the agent took from an initial belief $b_0$ to the current belief $b$. The solution
of a POMDP is an optimal policy, namely, a function that optimally maps belief states into
actions. In the following, we will survey applications of RL to environmental sustainability,
hence we will investigate how the elements described in this section (e.g., MDP modeling
framework, RL algorithms, etc.) have been used so far to solve problems related to environ-
mental sustainability.
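As a concrete illustration of these definitions, the following sketch runs tabular Q-Learning, the method that turns out to be the most used in the surveyed papers (see Sect. 5.1.4), on a small randomly generated MDP $(S, A, T, R, \gamma)$; the toy transition and reward values are assumptions made for the example, not data from any surveyed application.

```python
# Hedged sketch: tabular Q-Learning on a small random MDP (S, A, T, R, gamma).
# The transition matrix T and rewards R below are made-up toy values.
import numpy as np

n_states, n_actions, gamma, alpha, epsilon = 3, 2, 0.95, 0.1, 0.1
rng = np.random.default_rng(0)

T = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # T(s' | s, a)
R = rng.normal(size=(n_states, n_actions))                        # R(s, a)
Q = np.zeros((n_states, n_actions))

s = 0
for _ in range(10_000):
    # Epsilon-greedy action selection over the current value estimates.
    a = rng.integers(n_actions) if rng.random() < epsilon else int(Q[s].argmax())
    s_next = rng.choice(n_states, p=T[s, a])                      # sample next state from T
    # Temporal-difference update: move Q(s,a) toward r + gamma * max_a' Q(s',a'),
    # the one-step estimate of the expected discounted return.
    Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    s = s_next
```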
4 Review methodology
In this section, we outline the research methodology we used for this study. It consists of 5 steps: (i) the definition of the research questions, (ii) the paper collection process, (iii) the definition of inclusion and exclusion criteria, (iv) the identification of relevant studies based on the inclusion and exclusion criteria, and (v) data extraction and analysis.
Research questions. The first step involves defining the research questions we want
to answer on the application of RL techniques for environmental sustainability. The goal
of our questions is twofold: to offer a quantitative analysis of the state of the art related to
the application of RL to environmental sustainability and to analyze the use of these tech-
niques focusing on sustainability. Specifically, we aim to answer the following questions:
RQ1: How many academic studies have been published from 2003 to 2023 about RL
for environmental sustainability?
RQ2: What were the most relevant publication channels used?
RQ3: In which countries were the most active research centers located?
RQ4: What were the application domains and the methodologies used?
RQ5: How was the RL problem formalized (i.e., type of state/action space, type of tran-
sition model, and type of dataset used)?
RQ6: Which evaluation metrics were used to assess the performance?
RQ7: What were the challenges addressed?
The databases we use to collect papers are those of the search engines Scopus and Web of Science. To limit the scope of the search to the application of RL approaches for environmental sustainability, we define the following search strings:
“reinforcement learning AND sustainable AND environment”;
“reinforcement learning AND environmental AND sustainability”;
“reinforcement learning AND environment AND sustainability”;
“reinforcement learning AND environmental AND sustainable”.
The search on the two databases led to a total of 375 papers, 236 collected from Scopus
and 139 from Web of Science.
Selection criteria for the initial set of (181) papers. To refine the results of the search,
we outline the following inclusion and exclusion criteria.
Inclusion criteria. To determine studies eligible for inclusion in this work, we consider
the following criteria:
It is written in English;
It is clearly focused on RL for environmental sustainability;
In the case of duplicate articles, the most recent version is included.
Exclusion criteria. To further refine our search, we apply the following exclusion criteria:
the study is an editorial, a conference review, or a book chapter.
Following these criteria, we found 181 papers (104 articles, 70 conference papers,
and 7 reviews). We combine the information in the index keywords of these papers
with their number of citations and the publication year. In particular, we compute
the number of occurrences of each keyword to identify the application domains and
methodologies most used in the literature. To this aim, we standardize the keywords
to avoid spelling variations. Then, we combine these values with the number of cita-
tions and the publication year to identify the most recent and relevant studies. In cases
where index keywords are missing, we use author keywords. For the only three papers that have neither author nor index keywords, we use the title as the source of keywords.
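As an illustration of this step, the sketch below shows one possible way to standardize keywords and count their occurrences; the normalization rules and the example records are assumptions made for the example, since the exact cleaning procedure is not reported here.

```python
# Illustrative sketch of keyword standardization and occurrence counting.
# The normalization rules and the example keyword lists are assumptions.
from collections import Counter

def normalize(keyword: str) -> str:
    """Collapse simple spelling variations (case, hyphens, common variants)."""
    k = keyword.lower().strip().replace("-", " ")
    return {"q learning": "q-learning", "modelling": "modeling"}.get(k, k)

papers = [
    {"index_keywords": ["Reinforcement Learning", "Smart grid"], "author_keywords": []},
    {"index_keywords": [], "author_keywords": ["reinforcement-learning", "Energy"]},
]

counts = Counter()
for p in papers:
    # Fall back to author keywords when index keywords are missing.
    keywords = p["index_keywords"] or p["author_keywords"]
    counts.update(normalize(k) for k in keywords)

print(counts.most_common())
```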
Selection criteria for the set of (35) main papers. To identify papers for the in-
depth analysis, we applied the following criteria that consider the most important
keyword occurrences (i.e., the most frequent keywords), the publication year, and the
number of citations based on publication year.
Presence of at least one keyword with no less than 10 occurrences;
Publication year from 2013 to 2023;
Number of citations:
Papers published in 2022–2021, at least 3 citations;
Papers published in 2020–2019, at least 10 citations;
Papers published in 2018–2013, at least 20 citations.
Following these criteria, we selected 35 studies that have been explored in-depth, and
answers to the research questions defined above have been reported.
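The selection rule above can be summarized by a simple filter, sketched below; the paper records and the frequent-keyword set are dummy examples, and the absence of a citation threshold for 2023 papers is an assumption, since the criteria above do not state one.

```python
# Sketch of the selection rule for the 35 main papers, translated from the
# criteria above. Only the year window and citation thresholds come from the
# text; the example records and keyword set are dummies.
FREQUENT_KEYWORDS = {"reinforcement learning", "energy", "smart grid"}  # >= 10 occurrences

def min_citations(year: int) -> int:
    if year == 2023:
        return 0        # assumption: no citation threshold is stated for 2023 papers
    if year >= 2021:
        return 3        # 2021-2022
    if year >= 2019:
        return 10       # 2019-2020
    return 20           # 2013-2018

def is_main_paper(paper: dict) -> bool:
    in_window = 2013 <= paper["year"] <= 2023
    has_keyword = bool(FREQUENT_KEYWORDS & set(paper["keywords"]))
    cited_enough = paper["citations"] >= min_citations(paper["year"])
    return in_window and has_keyword and cited_enough

example = {"year": 2020, "keywords": {"smart grid"}, "citations": 12}
print(is_main_paper(example))   # True
```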
In the following sections, we first consider the initial 181 papers found using the
search strings defined above and applying inclusion/exclusion criteria. In Sect.5.1.1
we answer question RQ1 for those papers, in Sect. 5.1.2 we answer question RQ2,
in Sect.5.1.3 we answer question RQ3 and in Sect.5.1.4 we answer question RQ4.
Namely, we first analyze the number of papers that focus on RL for sustainability pub-
lished in the last 20 years, then we identify the main international conferences, work-
shops, and journals used to disseminate research, subsequently, we find the research
centers that are particularly active in this research/application topic, and finally, we
analyze the application domains and RL methodologies used. From Sect.5.2, we start
focusing only on the main 35 papers identified using main papers selection criteria.
In particular, we answer question RQ4 in Sect.5.2.1, question RQ5 in Sect. 5.2.2,
question RQ6 in Sect.5.2.3, and question RQ7 in Sect.5.2.4. Namely, for these main
papers, we first analyze the application domains of RL techniques and the RL-based
approaches used to tackle environmental sustainability; then we analyze the way in
which the problem has been formalized; subsequently, we investigate the evaluation
measures used; finally, we identify the main challenges addressed. Notice that ques-
tions RQ1, RQ2, and RQ3 have not been answered considering only the main 35
papers because these questions aim to provide a quantitative analysis of the state of the
art as a whole, and this subset of articles is part of the 181 papers used to answer these
three questions.
5 Results ofthereview
This section reports the results of the analysis provided in this survey, first for the ini-
tial set of 181 papers, then for the subset of the main 35 papers.
5.1 Analysis oftheinitial set of181 papers
The initial set of papers, selected using the search strings of Sect.4, is analyzed by answer-
ing questions RQ1, RQ2, RQ3, and RQ4.
5.1.1 RQ1: How many academic studies have been published from2003 to2023
aboutRL forenvironmental sustainability?
This research question aims to quantify the interest of the international scientific community in applying RL methods to environmental sustainability problems over the last 20 years. As shown in Fig. 1, the number of publications (pink dots) remained relatively low until 2018, with fewer than five publications per year. Since 2019, there has been rapid growth, up to 53 papers in 2022, showing the increasing interest in this topic during the last few years. It is important to notice that the data for the year 2023 are updated to April 2023 and do not represent a decrease in the number of studies published. The application of the inclusion and exclusion criteria leaves no publications in the years 2004, 2005, 2010, and 2011. In Fig. 1, we also show that the increase in the number of publications fits an exponential pattern (green line) with a growth rate of 0.42 (from 2 papers in 2007 to 53 in 2022). We do not consider 2023 when computing the regression model since its information is partial.
Fig. 1 Academic studies published from 2003 to 2023. Pink dots represent the number of publications per
year used to compute the regression model represented by the green line
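For readers who want to reproduce this kind of trend estimate on their own data, the following sketch fits an exponential pattern by log-linear regression; the yearly counts used here are placeholders (only the 2007 and 2022 values are reported in the text), so the printed rate will not match the 0.42 obtained on the real series of Fig. 1.

```python
# Sketch of a log-linear (exponential) fit of publication counts over years:
# regressing log(count) on the year gives the growth rate r of count ~ exp(a + r * year).
# The yearly counts below are placeholders, not the real series of Fig. 1.
import numpy as np

years = np.array([2007, 2012, 2016, 2019, 2020, 2021, 2022])
counts = np.array([2, 3, 4, 10, 22, 35, 53])            # placeholder values

r, a = np.polyfit(years, np.log(counts), 1)             # slope r = exponential growth rate
print(f"estimated growth rate: {r:.2f} per year")       # the survey reports 0.42 on the real data
```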
5.1.2 RQ2: What were themost relevant publication channels used?
With this research question, we aim to show the main channels used to disseminate research on the application of RL techniques to environmental sustainability problems. In Table 1, we show the journals and conferences with at least 2 publications.
As can be seen, the topics of the journals and conferences are very varied. In particular,
some of these communication channels are specific for sustainability, e.g., “Sustainabil-
ity (Switzerland)” and “Sustainable Cities and Society”, and many are related to envi-
ronmental aspects such as “IOP Conference Series: Earth and Environmental Science”
Table 1  Journals and conferences with at least two publications. Columns: venue | publications | scope

Conference
  Lecture notes in computer science (*) | 5 | CS
  IEEE conference on intelligent transportation systems | 3 | CS + APP
  International conference on autonomous agents and multiagent systems | 2 | CS
  IEEE international conference on distributed computing systems | 2 | CS
  International conference on mobility, sensing and networking | 2 | CS + APP
  IOP conference series: earth and environmental science | 2 | APP
  Land, water and environmental management: integrated systems for sustainability | 2 | APP

Journal
  IEEE access | 7 | APP
  IEEE internet of things journal | 5 | CS + APP
  Sustainability (Switzerland) | 5 | CS + APP
  IEEE transactions on intelligent transportation systems | 4 | CS + APP
  Sustainable cities and society | 4 | CS + APP
  Energies | 3 | APP
  IEEE transactions on green communications and networking | 3 | CS + APP
  IEEE transactions on vehicular technology | 3 | CS + APP
  Journal of cleaner production | 3 | APP
  Applied energy | 2 | APP
  Applied sciences (Switzerland) | 2 | APP
  Electronics (Switzerland) | 2 | CS + APP
  Energy and buildings | 2 | APP
  IEEE sensors journal | 2 | APP
  IEEE transactions on network and service management | 2 | CS + APP
  IEEE wireless communications | 2 | APP
  Journal of hydrology | 2 | APP
  Resources, conservation and recycling | 2 | APP
  Sensors | 2 | APP
  Sustainable energy technologies and assessments | 2 | APP

In the "Scope" column, "CS" and "APP" indicate a technical/informatics or an application-oriented perspective of the conference/journal, respectively, while "CS + APP" denotes a combination of them.
(*) Including subseries lecture notes in artificial intelligence and lecture notes in bioinformatics
and “IEEE Transactions on Green Communications and Networking”. Moreover, in the
third column of Table1, we provide an overview of the scope of the publication chan-
nels. To this aim, we analyze the information presented on the website of each con-
ference and journal about its scope, indicating whether it has a technical/informatics
or application-oriented perspective (“CS” or “APP”, respectively) or a combination of
them (“CS + APP”). As can be seen, most of the publication channels are application-oriented (2 conferences + 12 journals), followed by those with a combined scope (2 conferences + 8 journals); finally, a few of them (3 conferences) have a more technical/informatics perspective.
5.1.3 RQ3: In which country were located themost active research centers?
This research question aims to identify the countries whose research centers are most concerned with the application of RL methods to environmental sustainability issues. With this in mind, we leverage the information in the Scopus and Web of Science databases
about the 181 papers that were not excluded by the application of inclusion and exclu-
sion criteria. In Fig.2, we show only the countries with at least 5 publications and, as we
can see, the highest number of papers comes from research centers located in China (33
papers), followed by the United States (29 papers), and the United Kingdom (17 papers).
It is important to note that most of these works are developed in collaboration between
research centers in multiple countries, so we count the paper for each collaborating coun-
try. To show co-author relationships, in Fig.3, we represent only countries with at least
5 occurrences among analyzed documents. Each country is depicted as a circle, a link
between 2 circles represents a co-authorship relation, and the line weight is proportional to
the number of papers in the co-authorship relationship. As we can see, the countries with
Fig. 2 Number of publications per Country on RL approaches for environmental sustainability
more links are the United States (9 links), followed by Australia (7 links), China, and India
(6 links).
5.1.4 RQ4: What were theapplication domains andthemethodologies used?
This research question aims to analyze the application domains and the RL methodologies
used for tackling issues related to environmental sustainability. To this aim, we analyze the
index keywords of the 181 papers that were not excluded by applying the inclusion and
exclusion criteria and the authors’ keywords for works with no index keywords. In Fig.4,
Fig. 3 Co-author relationships with Country as a unit of analysis. Nodes represent states, and links depict
co-authorship relationships. The thickness of the link is proportional to the number of papers in the co-
authorship relationship
Fig. 4 Overview of application domains. For each application domain (y-axis), we show the number of
occurrences of keywords belonging to its macro-area (x-axis)
we show the application domains with more than 10 index keyword occurrences. We group the keywords into macro areas: for instance, in “Energy” we include keywords like “energy”, “energy conservation”, “energy consumption”, etc., while in “Electric energy” we group keywords such as “electric energy storage”, “electric load dispatching”, “smart grid”, etc. (see Appendix for details). The figure clearly shows that there is a wide variety of application domains, but most of the applications deal with sustainability issues related to energy fields.
Regarding the proposed approaches, we follow the same procedure as previously described for application domains, grouping keywords that refer to the same method. For
example, in “Actor-Critic” we group keywords such as “actor critic”, “advantage actor-
critic (A2C)”, and “soft actor critic”. As we can see in Fig.5, the most widely used RL
method for dealing with environmental sustainability in different application domains is
a state-of-the-art model-free algorithm, namely Q-Learning (Watkins 1989). It is impor-
tant to note that, in the image, we show only RL approaches, but there are also index
keywords related to other approaches like “genetic algorithm”, “simulated annealing”,
etc.
Moreover, we perform a bibliometric analysis on the co-occurrence of index keywords by using VOSviewer (Perianes-Rodriguez et al. 2016). A co-occurrence means that 2 keywords occur in the same work. After a data cleaning process, VOSviewer detects 17 clusters by considering keywords with at least 3 occurrences. In Fig. 6, each cluster corresponds to a color, and each element of the cluster, namely a keyword, is depicted by a circle in the cluster color. For instance, the blue cluster is made of several blue nodes, each of which contains a keyword (e.g., electric vehicles, charging (batteries)) belonging to the cluster. The size of the circle and of the circle label depends on the number of occurrences of the related keyword. Lines between items depict co-occurrences of keywords in a paper. Each cluster groups keywords identifying an application domain and/or the approaches used to tackle related environmental sustainability issues. For example, cluster 1 (red, on the top-right) is somewhat related to traffic signal control for traffic management through the application of control strategies. Cluster 2 (green, on the left) is related to power management and energy harvesting in wireless sensor networks.
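The co-occurrence counting behind such a map can be sketched as follows; the keyword sets are invented examples, and VOSviewer performs this step (together with the clustering) internally.

```python
# Sketch of keyword co-occurrence counting: two keywords co-occur when they
# are attached to the same paper. The keyword sets below are invented examples.
from collections import Counter
from itertools import combinations

papers_keywords = [
    {"reinforcement learning", "smart grid", "energy"},
    {"reinforcement learning", "electric vehicles", "charging (batteries)"},
    {"smart grid", "energy", "demand response"},
]

cooccurrence = Counter()
for keywords in papers_keywords:
    for pair in combinations(sorted(keywords), 2):   # unordered keyword pairs
        cooccurrence[pair] += 1

# Keep only pairs whose keywords reach a minimum number of occurrences,
# mirroring the minimum-occurrence threshold (3) used for the clusters of Fig. 6;
# a threshold of 2 is used here only because the toy data set is tiny.
occurrences = Counter(k for kws in papers_keywords for k in kws)
frequent_pairs = {p: c for p, c in cooccurrence.items()
                  if all(occurrences[k] >= 2 for k in p)}
print(frequent_pairs)
```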
Fig. 5 Overview of RL methods used. For each RL method (y-axis), we show the number of occurrences of
corresponding keywords (x-axis)
5.2 Analysis ofthe35 main papers
In this section, we focus on the 35 papers chosen using the selection criteria for the main
papers (see Sect.4). First, we provide a high-level analysis of the application domain and
the RL approaches used to address environmental sustainability issues (research question
RQ4). Then, we give an overview of the RL problem formalization (i.e., type of state/
action space, type of transition model, type of RL method) (research question RQ5). Sub-
sequently, we analyze the performance measures used to evaluate the results (research
question RQ6). Finally, we evaluate the main challenges faced (research question RQ7).
5.2.1 RQ4: What were theapplication domains andthemethodologies used?
In Table2, we summarize the application domains and the RL approaches used in the
selected works. First, we group the 35 main works according to their main related appli-
cation domains (first column). It is important to note that application domains may
overlap consequently, we report all application domains common to all papers in the
same group. Then, we indicate for each paper (second column) the method behind the
proposed technique (third column). The selected papers tackle environmental sustain-
ability issues in the application domains shown in Fig.4. In particular, the most relevant
Fig. 6 Bibliometrics analysis on the co-occurrence of index keywords. Each color outlines a cluster, and
each circle of the cluster color represents a keyword, while edges represent co-occurrences of keywords in
the same work
Table 2  Technical information about the selected works. The "Method" column reports the RL methodology used. Columns: Paper | Method | State | Action | Tr. model | Dataset

Electric vehicles, batteries, energy
  Sultanuddin et al. (2023) | DDQN | N/A | D | St. | S
  Zhang et al. (2021a) | MA AC | N/A | D* | Det.* | R
IoT
  Ajao and Apeh (2023) | Q-Learning | D* | D* | St. | S
  Zhang et al. (2021b) | Adaptive regression | D* | C* | Det.* | S
  Han et al. (2020) | MAB | N/A | D* | St. | S
Water resources
  Emamjomehzadeh et al. (2023) | Q-Learning | N/A | N/A | N/A | S
  Skardi et al. (2020) | Q-Learning | N/A | N/A | Det., St. | R + S
Emissions/pollution
  Chen et al. (2021) | MADDPG | C | C | N/A | S
  Huo et al. (2023) | Multi-agent Q-learning | D* | D | N/A | S
Agriculture
  Elavarasan and Durairaj Vincent (2020) | DRQN | D* | N/A | N/A | R + S
Data, energy
  Shaw et al. (2022) | Q-Learning, SARSA | D* | D* | St. | S (R based)
  Venkataswamy et al. (2023) | AC | D* | D | N/A | S, R
Urban traffic and transportation
  Ounoughi et al. (2022) | DQN | N/A | D* | N/A | R
  Alizadeh Shabestray and Abdulhai (2019) | DQN | D* | D | St. | S
  Aziz et al. (2018) | RMART | D* | D* | St.* | S
  Khalid et al. (2023) | DQN | D* | D* | Det.* | S
Buildings, energy
  Kathirgamanathan et al. (2021) | SAC | C | N/A | N/A | R + S
  De Gracia et al. (2015) | SARSA(λ) | D* | D* | Det. | S
Manufacturing
  Wang and Wang (2022) | Policy network | D* | D* | Det.* | S*
  Leng et al. (2021) | Q-Learning* | N/A | D* | St.* | S (R based)
Mobile and wireless communication, energy, renewable/sustainable energies
  Liu et al. (2021) | DDPG | C | C | N/A | S
  Miozzo et al. (2015) | Distributed Q-Learning | D* | D | N/A | S
  Miozzo et al. (2017) | Distributed Q-Learning | D* | D | N/A | S
  Giri and Majumder (2022) | Deep Q-Learning | C* | D* | St.* | S
Mobile and wireless communication
  Al-Jawad et al. (2021) | Q-Learning | D* | D* | N/A | S
Energy, electric energy
  Sheikhi et al. (2016) | Q-Learning | N/A | N/A | St. | S
  Harrold et al. (2022) | Rainbow DQN | C* | D | N/A | R + S
  Jendoubi and Bouffard (2022) | MADDPG | N/A | C | St.* | S*
Energy
  Gao et al. (2023) | Q-Learning | N/A | D* | St.* | R
Wireless sensor network, energy, renewable/sustainable energies
  Hsu et al. (2014) | Q-Learning* | D* | D* | N/A | S
  Chen et al. (2016) | Q-Learning | D* | D* | St.* | R + S
  Feng et al. (2023) | DDPG | C | C | St.* | S
Autonomous vehicles
  Bouhamed et al. (2020) | DDPG / Q-Learning | C* / D* | C / D* | N/A | S
  Sacco et al. (2021) | AC | C* | D* | N/A | R + S
  Gu et al. (2023) | Policy gradient | C* | D | St.* | S

In the "State" and "Action" columns, "C" and "D" indicate continuous and discrete state/action spaces, respectively. In the "Tr. model" column, "Det." and "St." denote deterministic and stochastic transition models, respectively. In the "Dataset" column, "S" and "R" indicate synthetic and real-world datasets, respectively.
application domain corresponds to the macro area of “Energy”. Indeed, it involves more than half of the papers in the table, considering both the works in which it represents the main application domain and those in which it is related to the main application domain.
In Table 2, we also show that 16 out of the 35 selected papers use DRL approaches such as Deep Q-Network (DQN) (Mnih et al. 2015) and Double Deep Q-Network (DDQN) (van Hasselt et al. 2016), and another 2 rely on DRL techniques in multi-agent contexts, such as Multi-Agent Deep Deterministic Policy Gradient (MADDPG) (Lowe et al. 2017). RL techniques are used by 10 articles, where the most used method is Q-Learning, and 7 apply RL approaches in a multi-agent context. Finally, only 1 paper adopts a Genetic Algorithm-based RL (GARL) approach.
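For reference, the following sketch shows the Double DQN target computation (van Hasselt et al. 2016) that distinguishes DDQN from standard DQN; the network sizes and tensors are illustrative assumptions and do not reproduce any specific surveyed implementation.

```python
# Hedged sketch of the Double DQN target: the online network selects the next
# action, the target network evaluates it, reducing Q-value overestimation
# compared with standard DQN. Shapes and data below are illustrative only.
import torch
import torch.nn as nn

online_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 2))
target_net.load_state_dict(online_net.state_dict())

gamma = 0.99
state, action = torch.randn(32, 4), torch.randint(0, 2, (32, 1))
reward, next_state = torch.randn(32, 1), torch.randn(32, 4)
done = torch.zeros(32, 1)

with torch.no_grad():
    next_action = online_net(next_state).argmax(dim=1, keepdim=True)   # action selection
    next_q = target_net(next_state).gather(1, next_action)             # action evaluation
    target = reward + gamma * (1 - done) * next_q

q = online_net(state).gather(1, action)
loss = nn.functional.mse_loss(q, target)   # minimized by the usual optimizer step
```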
5.2.2 RQ5: How wastheRL problem formalized (i.e., type ofstate/action space, type
oftransition model, andtype ofdataset used)?
This research question takes a technical point of view, which we think may help practitioners get an overview of the environments considered by the authors in developing the proposed methods. In Table 2, we summarize the information related to problem formulation that we found in the selected papers. For each paper, we point out whether the state and action spaces are continuous or discrete and whether the transition model is deterministic or stochastic. Finally, we provide information on the dataset used in the experiments, outlining whether real-world or synthetic data are used. It is important to note that not all papers explicitly provide this information. Thus, we mark with “*” all information inferred from reading the article. On the other hand, “N/A” specifies that the information available was not enough to infer the required data.
In the selected papers, most of the state and action spaces are discrete. Indeed, only 9 approaches use a continuous state space (“State” column of Table 2), and 6 use a continuous action space (“Action” column of Table 2). Regarding the transition model, we can see that the model is stochastic in most cases where the information is available. In De Gracia et al. (2015), “Det.” and “St.” are both reported because the authors test the proposed methodology on both models. Finally, in the last column, we note that most of the experiments are performed on synthetic datasets. In fact, only 9 papers use real-world data, 6 of which combine them with synthetic data (“R + S” in the table), while 2 others use the real data to generate larger datasets from them (“S (R based)” in the table). Only Venkataswamy et al. (2023) test the proposed approach on both dataset types (“S, R” in the table).
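To make the notation of Table 2 concrete, the sketch below shows how a continuous state space ("C"), a discrete action space ("D"), and a stochastic transition model ("St.") are typically declared when an environment is implemented; the Gymnasium API and the toy charging dynamics are assumptions of this example, not elements of the surveyed papers.

```python
# Illustrative sketch of the discrete/continuous distinction of Table 2,
# expressed with the Gymnasium API (an assumption of this example).
import numpy as np
import gymnasium as gym
from gymnasium import spaces

class ChargingEnv(gym.Env):
    """Toy EV-charging-style environment: continuous state, discrete action."""

    def __init__(self):
        # Continuous state ("C"): e.g. battery level and electricity price in [0, 1].
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(2,), dtype=np.float32)
        # Discrete action ("D"): 0 = idle, 1 = charge, 2 = discharge.
        self.action_space = spaces.Discrete(3)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        self.state = self.observation_space.sample()
        return self.state, {}

    def step(self, action):
        # Stochastic transition ("St."): both state components evolve with noise.
        noise = self.np_random.normal(0.0, 0.05, size=2).astype(np.float32)
        self.state = np.clip(self.state + noise, 0.0, 1.0)
        reward = float(action == 1) * (1.0 - self.state[1])   # cheap charging is rewarded
        return self.state, reward, False, False, {}
```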
5.2.3 RQ6: Which evaluation metrics were used toassess theperformance?
This research question aims to provide an overview of the authors’ performance meas-
ure choices to evaluate the proposed approaches in the 35 selected papers. In the second
column of Table3, we report information about the metrics found in the articles, which
are also indicated in the in-depth analysis of each paper in Section5.3. As we can see in
Table3, the performance measures vary widely depending on the application domain and
the goal of the method proposed in each paper. For example, reward is used as a metric in
9 articles but is computed differently depending on the context. Concerning electric vehi-
cles, in Sultanuddin etal. (2023), the reward corresponds to a penalty function consider-
ing the cost of charging and a departure incentive. Instead, in wastewater treatment plants
(WWTPs), Chen etal. (2021) use a reward function that takes into account the operational
Table 3  We summarize the performance measures used by the authors to evaluate the proposed approaches in the second column and the challenges they address in the third column. Columns: Paper | Performance measures | Challenges

Sultanuddin et al. (2023) | Reward, voltage levels, load curves, charging/discharging curves | Grid overload prevention, driving pattern uncertainty, dimensionality
Zhang et al. (2021a) | MCWT, MCP, TSF, CFR | Dimensionality, coordination and cooperation among agents, charging requests competitiveness, joint optimization of multiple optimization objectives
Ajao and Apeh (2023) | Detection accuracy, recall, precision, specificity, F-measure | Security threat to sustainability functionality
Zhang et al. (2021b) | Operational logs, power wastage, power requirements, average failure ratio | Energy requirements management, smart power allocation
Han et al. (2020) | Number of ready nodes, throughput | Spatial uncertainty
Emamjomehzadeh et al. (2023) | Water table level, nitrate concentration, energy usage, GHG emissions | WEF nexus modeling and management for an urban area integrated water resource management
Skardi et al. (2020) | Water and wastewater allocation, water and groundwater level, nitrate concentration | Social attachments quantification and consideration, cooperation among agents
Chen et al. (2021) | Reward, Q-values, influents, inflow rate, DO and dosage values, energy consumption, cost, EP, and GHG emissions | WWTP impact optimization
Huo et al. (2023) | Productivity, operational mistakes, GHG emissions, queuing time | Operational randomness and uncertainties
Elavarasan and Durairaj Vincent (2020) | R2, MAE, MSE, RMSE, MedAE, MSLE, MAPE, PDF, explained variance score, accuracy | Mapping between raw data and crop yield values, effectiveness dependence on extracted features quality
Shaw et al. (2022) | Energy consumption, SLAV, number of migrations, ESV | Energy awareness, slow convergence to optimal policy
Venkataswamy et al. (2023) | Monetary job value | Intermittent power, environment nonuniformity, system design and configuration effects, learning and improving available heuristic policies
Ounoughi et al. (2022) | MSE, MAE, noise levels, CO2 emission, fuel consumption | Sustainability and proactivity integration
Alizadeh Shabestray and Abdulhai (2019) | Average of intersection travel time, queue time, and network travel time, weighted average intersection person travel time | Regular and transit vehicles consideration
Aziz et al. (2018) | Average delay, stopped delay, number of stops, and network-wide delay, GHG emissions | Traffic congestion information sharing, reward function dynamic adaptation
Khalid et al. (2023) | Execution time, reward, path planned, distance | Quality of experience assurance, optimal order user serving, distance minimization
Kathirgamanathan et al. (2021) | Energy purchased and cost, discomfort, reward, temperature, power demand | Robustness, scalability, lack of well-established environments
De Gracia et al. (2015) | Electrical energy saving | Energy saving maximization, thermal energy storage optimization
Wang and Wang (2022) | Overall Nondominated Vector Generation, C Metric, Hypervolume, D1_R | Energy awareness with simultaneous makespan and energy minimization
Leng et al. (2021) | MSE, MSLE, RMSE, R2, unit and total profit, acceptance rate | Demand uncertainty, order customization
Liu et al. (2021) | Achievable rate, transmission power | Effective communication, IRS phase optimization
Miozzo et al. (2015) | Throughput gain, traffic drop rate, energy efficiency, energy efficient improvement, traffic demand, harvested energy, battery level, policy, normalized load at the macro bs, total energy spent, average load, average cell load for the macro bs, battery outage, Jain's fairness index | Energy harvesting
Miozzo et al. (2017) | Switch-off rate, battery level, excess energy |
Giri and Majumder (2022) | Reward, capacity, network lifetime, average delay | Dimensionality, efficient use of collected energy for QoS
Al-Jawad et al. (2021) | Throughput, packet loss, rejected flows, PSNR, MOS | Sustainable QoS
Sheikhi et al. (2016) | Storage charge level, operational cost, primary energy involved | Energy system parameter variability or stochasticity, smart grid architecture issues
Harrold et al. (2022) | MAPE, energy cost savings, relative savings, episodic rewards, and value distribution | Energy arbitrage, renewables usage improvement, limited data availability
Jendoubi and Bouffard (2022) | Annual total cost, daily operation cost, PV production, aggregated demand, power to be charged/discharged, generator provided power, provider delivered electricity, PAR | Energy dispatch
Gao et al. (2023) | Working and non-working energy consumption, the ratio between working and non-working energy consumption | Energy consumption minimization
Hsu et al. (2014) | RBE, EDC, OTRT, ToD achievability | Simultaneous achievement of throughput on demand satisfaction and power consumption reduction
Chen et al. (2016) | Nodes potential energy, network lifetime, area coverage ratio, number of residual alive nodes versus the network lifetime, recharging cycle | Simultaneous area coverage and energy balancing
Feng et al. (2023) | Distribution of the ratio between the expected per-slot harvested energy and the MS-to-sink distance within the network, moving trajectories and steps, reward, actor-loss, battery level, accuracy, convergence, training time | Lack of energy-related information, energy harvesting and data transmission trade-off
Bouhamed et al. (2020) | Path followed, reward, battery level, completion time of the tour against the ground unit transmission power | Limited battery capacity, obstacle awareness
Sacco et al. (2021) | Task completion time, utility, average node-antenna distance, energy consumption, task completion time against the average computing workload, CDF and utility evolution | Task completion time reduction, energy efficiency
Gu et al. (2023) | Energy loss, collision loss, reward, lane changes | Efficient response to environmental observations

The papers are grouped according to the main related application domains, as in Table 2. Performance measures written in italics are used in both Miozzo et al. (2015) and Miozzo et al. (2017)
cost, consisting of multiple components, such as energy cost and biogas price, and several
indicators, like energy consumed by the aeration and sludge treatment processes and GHG
emissions. Another performance measure common to multiple application domains is, for
example, energy consumption. Indeed, it is used in contexts such as water resources man-
agement(Emamjomehzadeh etal. 2023), WWTPs(Chen etal. 2021), data centers(Shaw
etal. 2022), and AVs(Sacco etal. 2021). Even approaches related to the same application
domain may differ in terms of performance measures depending on their objective. Con-
sidering, for example, the water resources context, both Emamjomehzadeh et al. (2023)
and Skardi etal. (2020) evaluate their proposed approaches using resource level and nitrate
concentration. However, in (Emamjomehzadeh etal. 2023), energy consumption and GHG
emissions are also considered, while in (Skardi etal. 2020), resource allocation is used.
5.2.4 RQ7: What were thechallenges addressed?
This research question aims to offer an overview of the issues that the authors have tackled
within the 35 selected papers. In the third column of Table3, we summarize information
about the challenges addressed in the articles, which are also indicated in the in-depth anal-
ysis of each paper in Section5.3. As with the performance measures, we can see in Table3
that the challenges faced vary greatly depending on the application context and the goal
of the method proposed in each paper. As an example, considering the domain of electric
vehicles, Sultanuddin etal. (2023) address several challenges, like avoiding network energy
overload at peak times, considering the uncertainty of driving patterns, and managing
large state spaces. On the other hand, in addition to the challenge related to dimensionality,
Zhang etal. (2021a) also address issues related to coordination and collaboration among
agents, the competitiveness of charging demands, and joint optimization of multiple objec-
tive functions. However, although not explicitly stated by the authors, the challenge that
unites these papers is the development of approaches capable of adapting to changes in a
dynamic environment and managing the uncertainty associated with the environment that,
in many cases, arises from the use of renewable resource sources, which have a stochastic
and intermittent nature whose management adds further complexity to the problem.
5.3 Analysis ofsingle papers (grouped byapplication domain)
In this section, we group the 35 main papers by application domain and analyze each paper, answering research questions RQ4, RQ5, RQ6, and RQ7. This provides the reader interested in a specific application domain with detailed knowledge of the main features of
these papers. Notice that in answering RQ5, we use the information available in Table 2
and report in the text a “(*)” for all information inferred from reading the article.
5.3.1 Electric vehicles, batteries, energy
The transportation system is characterized by an increasing presence of EVs due to their eco-friendly features. Sultanuddin et al. (2023) propose a DDQN-based approach to provide a smart, scalable charging strategy for EV fleets that ensures all cars have sufficient charge for their trips without exceeding the maximum energy threshold of the power grid. The charging management system combines information on the current state of the network and vehicles with historical data, and it is able to schedule charging at least 24
hours in advance. In developing the proposed approach, an environment with discrete actions and a stochastic transition model is considered. The experimental evaluation is performed on a synthetic dataset using as metrics the reward, the voltage levels, the load curves, and the charging/discharging curves. The rapid growth in the popularity of EVs subjects the power grid infrastructure to challenges, such as preventing grid overload at peak times. Moreover, the authors address issues related to driving pattern uncertainty and the handling of large state spaces.
Zhang et al. (2021a) propose a framework for charging recommendations based on
MARL, called Multi-Agent Spatio-Temporal Reinforcement Learning (MASTER). By lev-
eraging a multi-agent actor-critic framework with Centralized Training and Decentralized
Execution (CTDE), the proposed approach increases the collaboration and cooperation
among agents, and it can make use of information about possible future charging competi-
tion through the use of a delayed access strategy. The framework is further extended with multiple critics to address multi-objective optimization. MASTER works in environ-
ments characterized by discrete actions (*) and a deterministic transition model (*), and it
has been tested on a real-world dataset. To evaluate its performance, the Mean Charging
Wait Time (MCWT), Mean Charging Price (MCP), Total Saving Fee (TSF), and Charging
Failure Rate (CFR) are used as performance measures. In the development of the proposed
charging recommendation approach, the authors face several challenges such as dealing
with large state and action space, coordination and cooperation among agents in a large-
scale system, potential competitiveness of future charging requests, and the joint optimiza-
tion of multiple optimization objectives.
5.3.2 IoT
Recent years have seen rapid advances in IoT technology enabling the development of
smart services such as smart cities, buildings, and oceans. Regarding smart cities, Ajao
and Apeh (2023) consider the Industrial Internet of Things and present a framework for
edge computing vulnerabilities. Indeed, edge computing security threatens the sustainabil-
ity functionality of urban infrastructure with various attacks, such as Man-in-the-Middle
and denial of service. In particular, to tackle authentication and privacy violation problems,
this work proposes a secure framework modeling in Petri Net, namely Secure Trust-Aware
Philosopher Privacy and Authentication (STAPPA), on which a Distributed Authorization
Algorithm is implemented. Moreover, a GARL approach is developed to optimize the net-
work during learning, detect anomalies, and optimize routing. This work regards an envi-
ronment characterized by discrete state and action spaces (*), and a stochastic transition
model. The authors test the proposed approach on a synthetic dataset and assess the perfor-
mance in anomaly detection and detection accuracy by using the popular detection accu-
racy, recall, precision, specificity, and F-measure. Ajao and Apeh (2023) deal with security
challenges, in particular authentication and privacy violation problems.
Zhang etal. (2021b) propose an IoT-based Smart Green Energy (IoT-SGE) management
system for improving the energy management of power grids allowed by DRL. The pro-
posed approach is able to balance power availability and demand by keeping grid states
steady, thus reducing power wastage. In developing IoT-SGE, the authors consider an environment with discrete states (*), continuous actions (*), and a deterministic transition model (*). The proposed approach has been evaluated on a synthetic dataset using oper-
ational logs, power wastage and requirement, and average failure ratio as metrics. The
authors address an energy sustainability issue, in particular, they aim to manage energy
requirements and allocate smart power systems.
In the context of smart ocean systems, Han etal. (2020) present an analytical model to
evaluate the performance of an Internet of Underwater Things network with energy har-
vesting capabilities. The goal of this work is the maximization of IoT nodes throughput
by optimally selecting the window size. To this aim, the authors propose an RL approach
and leverage the Branch and Bound method to solve the optimization problem by autono-
mously adapting random access parameters through interaction with the network environ-
ment. Considering a realistic scenario, a MARL approach is proposed to deal with the
lack of network information. In this case, random access parameters autonomously adapt
by using a distributed Multi-Armed Bandit (MAB)-based algorithm for each node. The
environment considered in this work is characterized by deterministic actions (*) and a
stochastic transition model. The authors test the proposed approach on a synthetic dataset,
evaluating its performance in channel access regulation in relation to the number of ready
nodes per time slot and throughput. Finally, this work addresses a fairness issue due to
spatial uncertainty in underwater acoustic communication; to address it, the authors formalize an optimization problem that maximizes the throughput of the IoT network nodes.
5.3.3 Water resources
Water resource management is a key aspect of sustainable development and usually does
not include social aspects. Emamjomehzadeh etal. (2023) propose a novel urban water
metabolism model that combines urban metabolism with the Water, Energy, and Food
(WEF)(Radini etal. 2021) nexus and thus it can consider interconnections among water,
energy, food, material, and GHG emissions. Moreover, this work proposes a physical-
behavioral model that relates the proposed approach to a MARL agent-based model nei-
ther fully cooperative nor fully competitive developed using Q-Learning. In this case, the
only technical information available concerns the use of a synthetic dataset. The proposed
approach is evaluated in terms of water table level, nitrate density, energy usage, and GHG
emissions. Considering water resource management challenges related to sustainability, the
authors aim to model and manage the WEF nexus for Integrated Water Resource Manage-
ment in an urban area, taking into account stakeholders’ characteristics.
Skardi etal. (2020) propose, instead, an approach for quantifying and including social
attachments in water and wastewater allocation tasks. This work proposes a paired physi-
cal-behavioral model, and the authors leverage Q-Learning to include social and behavioral
aspects in the decision-making process. Specifically, the authors use the approach proposed by Bazzan et al. (2011) to integrate Social Analysis in Q-Learning, choosing between individual and social behavior through the use of specific reward functions. In developing the proposed method, both a deterministic and a stochastic transition model are considered. Tests
are performed on a dataset that combines real-world and synthetic data, and the perfor-
mance evaluation is conducted considering water and treated wastewater allocation to the
agents, water and groundwater level, and the concentration of nitrates to measure ground-
water quality. Using Social Network Analysis, the authors tackle a key challenge in com-
mon resource management, i.e., the cooperation among agents. Also, they aim to quantify
and include social attachments in water resource management.
5.3.4 Emissions/pollution
The development of WWTPs has a positive impact on environmental protection by reduc-
ing pollution but, at the same time, they consume resources and produce GHG emissions
as well as residual sludge. With this in mind, Chen etal. (2021) propose an approach based
on MADDPG to control Dissolved Oxygen (DO) and chemical dosage at once and improve
sustainability accordingly. Specifically, the proposed approach uses two agents, one to con-
trol DO and one to control chemical dosage. Moreover, the two reward functions are designed based on life cycle cost and on various Life Cycle Assessment mid-point indicators, respectively.
The proposed approach is developed considering an environment with continuous state
and action spaces and tested on a synthetic dataset. To evaluate the training process, the
reward and the Q-values determined by trained critic networks are used as metrics, while
to analyze the variation of the influents and control parameters, the authors leverage the
influents (COD, TN, TP, and NH3-N), inflow rate, DO, and dosage values. Finally, energy consumption, cost, Eutrophication Potential (EP), and GHG emissions are used to assess the impact of the proposed approach. WWTPs have a positive impact on environmental pro-
tection since they reduce contaminants and environmental pollution. However, at the same
time, WWTPs consume resources and produce GHG emissions as well as residual sludge,
thus the authors seek to optimize their impact on environmental sustainability.
Intelligent fleet management is crucial in mitigating direct GHG emissions in open-pit
mining operations. In this context, Huo etal. (2023) propose a MARL-based dispatching
system for reducing GHG emissions. To this aim, this work presents an environment for
haulage simulation that integrates a component for real-time computing of GHG emis-
sions. Then, Q-Learning is leveraged to improve fleet productivity and reduce trucks’ emis-
sions by decreasing their waiting time. In the development of the proposed approach, an
environment characterized by discrete state (*) and action spaces is considered. Tests are
performed on a synthetic dataset and productivity, number of operational mistakes, GHG
emissions, and time spent in queue are used as evaluation metrics. In this work, the authors
tackle operational randomness and uncertainties in fleet management for reducing haul
trucks’ GHG emissions in open-pit mining operations.
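Several of the reviewed works, including the dispatching system above, rely on tabular Q-Learning. As a reference for readers, the following minimal sketch shows the standard update and an ε-greedy action selection; the state and action encodings, as well as the hyperparameters, are hypothetical and are not taken from the cited paper.

```python
import random
from collections import defaultdict

def q_learning_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.95):
    """One tabular Q-Learning step:
    Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])

def epsilon_greedy(Q, s, actions, epsilon=0.1):
    """Pick a random action with probability epsilon, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(s, a)])

Q = defaultdict(float)  # hypothetical table over (state, action) pairs
```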
5.3.5 Agriculture
In the context of sustainable agriculture, one of the key aspects of food security is crop
yield prediction. Elavarasan and DurairajVincent (2020) tackle this problem by using a
DRL approach, specifically a Deep Recurrent Q-Network (DRQN) (Hausknecht and Stone
2015) model. It consists of a Recurrent Neural Network (RNN) (Rumelhart et al. 1986)
on top of the DQN. The proposed approach sequentially stacks the RNN layers, feeds the
network with pre-trained parameters, and adds a linear layer to map the RNN output into
Q-values. The Q-Learning network builds a crop yield prediction environment as a ‘yield
prediction game’ that leverages both parametric feature combinations and thresholds use-
ful in agricultural production. The authors consider an environment with discrete states
(*) and test their approach on a dataset combining real-world and synthetic data, evaluat-
ing the performance by using the following metrics: Determination Coefficient (R2), Mean
Absolute Error (MAE), Mean Squared Error (MSE), Root Mean Squared Error (RMSE),
Median Absolute Error (MedAE), Mean Squared Logarithmic Error (MSLE), Mean Abso-
lute Percentage Error (MAPE), Probability Density function (PDF), Explained Variance
Score, and accuracy. Finally, Elavarasan and DurairajVincent (2020) address issues related
to the application of Deep Learning (DL) methods to crop yield prediction for increas-
ing food production. Specifically, the authors tackle the incapability of DL approaches
to directly map, linearly or non-linearly, raw data with crop yield values and the strong
dependence of their effectiveness on the quality of features extracted from data.
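To make the DRQN architecture concrete, the following minimal PyTorch sketch stacks recurrent layers and a linear head that maps the last hidden state to Q-values, as described above; the layer sizes and names are illustrative assumptions rather than the cited model.

```python
import torch
import torch.nn as nn

class DRQN(nn.Module):
    """Minimal Deep Recurrent Q-Network: recurrent layers followed by a
    linear head mapping the hidden state to one Q-value per action."""
    def __init__(self, n_features, n_actions, hidden_size=64):
        super().__init__()
        self.rnn = nn.LSTM(n_features, hidden_size, num_layers=2, batch_first=True)
        self.q_head = nn.Linear(hidden_size, n_actions)

    def forward(self, obs_seq, hidden=None):
        # obs_seq: (batch, time, n_features)
        out, hidden = self.rnn(obs_seq, hidden)
        q_values = self.q_head(out[:, -1])  # Q-values from the last time step
        return q_values, hidden
```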
5.3.6 Data, energy
Data centers are among the largest consumers of energy. Shaw et al. (2022) propose an RL-based Virtual Machine (VM) consolidation algorithm named Advanced Reinforcement Learning Consolidation Agent (ARLCA), whose aim consists of simultaneously improving energy efficiency and service delivery guarantees. In this work, a global resource manager
constantly monitors the state of the system and identifies hosts that may be overloaded due
to the resource demand change over time. The proposed approach rebalances the VM dis-
tribution and avoids the rapid overloading of hosts while ensuring efficient operation. This
work presents two implementations of ARLCA based on two RL methods, i.e., Q-Learning
and SARSA, and it tests two different approaches to balance the exploration-exploitation tradeoff, namely ε-greedy and softmax. Finally, the authors leverage the Potential Based Reward Shaping (Ng et al. 1999) technique to include domain knowledge in the reward
structure and speed up the learning process. ARLCA works in an environment with dis-
crete state and action spaces (*) and a stochastic transition model. Its performance is evalu-
ated on a synthetic dataset (real-world-based). To evaluate the proposed VM consolidation
algorithms, energy consumption, Service Level Agreement Violations (SLAV), number
of migrations, and Energy Service Level Agreement Violations (ESV) are used as perfor-
mance measures. In this work, the authors tackle a key challenge for cloud computing ser-
vices, namely energy awareness. Further, they also face the slow convergence to the opti-
mal policy of conventional RL algorithms.
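Potential-based reward shaping, used in ARLCA to inject domain knowledge, adds the term γΦ(s') − Φ(s) to the environment reward and provably preserves the optimal policy (Ng et al. 1999). A minimal sketch follows; the potential function shown is a hypothetical example and is not the one used in the cited work.

```python
def shaped_reward(reward, state, next_state, potential, gamma=0.99):
    """Potential-based reward shaping (Ng et al. 1999):
    r' = r + gamma * Phi(s') - Phi(s), which preserves the optimal policy."""
    return reward + gamma * potential(next_state) - potential(state)

def potential(state):
    """Hypothetical potential: prefer states with fewer overloaded hosts."""
    return -float(state["num_overloaded_hosts"])
```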
Renewable energy Aware Resource management (RARE), a DRL approach for job
scheduling in a green data center, is presented in Venkataswamy etal. (2023). This work
proposes a customized actor-critic method in which the authors use three Deep Neural Net-
works (DNNs): the encoder, the actor, and the critic. The encoder summarizes information
about the state of the environment into a compact representation of it, used as input for
both the actor and the critic. The actor returns the probability of choosing each schedul-
ing action, while the critic estimates, for each action, the total expected value achieved by
starting in the current state and applying a specific action. Moreover, since DRL requires
a significant amount of interactions with the environment to explore it and then to adapt
a randomly initialized DNN policy, the authors leverage an offline learning algorithm,
namely, Behavioral Cloning, to learn a policy based on existing heuristic policy data used
as prior experience. In particular, the actor network is trained to imitate the action selec-
tion process of data within the replay memory. In developing RARE, it is considered an
environment characterized by discrete states (*) and actions and tested the performance
on both synthetic and real-world datasets by using the total job economic value as metrics.
In this work, the authors tackle several challenges related to the application of RL tech-
niques to the context of green datacenters. The first issue relates to the environment. The
dynamic of green data center environments makes the scheduling process difficult as it has
to consider and manage the intermittent and variable nature of renewable energy sources.
Moreover, the lack of uniformity in the environments makes it challenging to compare dif-
ferent approaches. The second challenge is the lack of discussion regarding the effect of system design choices (e.g., the planning horizon size), which does not help to
clarify the reasons for the better performance of the RL scheduler over heuristic policies.
Furthermore, the authors discuss the tendency to employ RL schedulers as black boxes, without considering different configurations, such as the size of the neural network, which could lead to improved performance. Finally, the last challenge highlights that existing RL schedulers do
not focus on learning and improving available heuristic policies.
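The offline pre-training step described above amounts to behavioral cloning: the actor is fitted, by supervised learning, to state-action pairs generated by an existing heuristic scheduling policy. The following minimal PyTorch sketch illustrates this idea; the dataset and network objects are hypothetical and do not reproduce the cited implementation.

```python
import torch
import torch.nn as nn

def pretrain_actor(actor, heuristic_dataset, epochs=10, lr=1e-3):
    """Behavioral cloning: fit the actor to (state, action) pairs collected
    from a heuristic policy before any RL fine-tuning."""
    opt = torch.optim.Adam(actor.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        # hypothetical iterable of (state batch, integer action-index batch)
        for states, heuristic_actions in heuristic_dataset:
            logits = actor(states)                 # (batch, n_actions)
            loss = loss_fn(logits, heuristic_actions)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return actor
```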
5.3.7 Urban traffic and transportation
In recent years, the traffic congestion level has increased significantly with a conse-
quent negative impact on the environment. Ounoughi etal. (2022) present EcoLight,
an approach for controlling traffic signals based on DRL, which aims to reduce noise
pollution, CO2 emissions, and fuel consumption. The proposed method combines the
Sequence to Sequence Long Short Term Memory (SeqtoSeq-LSTM) prediction model
with the DQN algorithm. SeqtoSeq-LSTM is used to forecast the traffic noise level that
is part of the traffic information given as input to the DQN to determine the action to
perform. EcoLight works in environments with discrete actions (*) and has been tested
on a real-world dataset. The performance of EcoLight is evaluated by using the MSE,
MAE, noise levels, CO2 emission, and fuel consumption as metrics. In this work, the
authors tackle the issue of developing a control method that considers not only mobility
and current traffic conditions but also integrates sustainability and proactivity.
On the other hand, Alizadeh Shabestray and Abdulhai (2019) present Multimodal
iNtelligent Deep (MiND), a DRL-based traffic signal controller that considers both
regular vehicles and public transit and leverages sensors’ information, like occupancy,
position, and speed, to optimize the flow of people through an intersection by using
DQN. In developing MiND, the authors regard an environment characterized by discrete
states (*) and actions and a stochastic transition model, and test the proposed approach
on a synthetic dataset. To assess the performance of the proposed approach the follow-
ing measures are used: average intersection travel time, average in queue time, average
network travel time, and weighted average intersection person travel time. In this work,
the authors have to fulfill some important requirements to develop a real-time adaptive
traffic signal controller. Indeed, the controller has to consider both regular vehicles and
public transit traffic, and leverage sensors’ data on vehicle speed, position, and occu-
pancy; moreover, the decision-making process should be fast.
Aziz et al. (2018) present an RL-based approach to control traffic signals in con-
nected vehicle environments for reducing travel delays and GHG emissions. The pro-
posed method, the R-Markov Average Reward Technique (RMART), leverages conges-
tion information sharing among neighbor signal controllers and a multi-reward structure
that can dynamically adapt the reward function according to the level of congestion at
intersections. The considered environment presents discrete state (*) and action spaces
(*) and a stochastic (*) transition model. The authors test RMART on a synthetic data-
set and to evaluate its performance they use as metrics the average delay, stopped delay,
number of stops, and network-wide delay, while to assess the performance from a sus-
tainability point of view they leverage GHG emissions, i.e., CO, CO2, NOx, VOC,
PM10. Finally, this work deals with the traffic signal control problem to reduce travel
delays and GHG emissions by addressing the following issues: the sharing of conges-
tion information among neighbor signal controllers and the dynamic adaptation of the
reward function on the base of congestion level.
Reducing the number of drivers who commute in search of car parking in urban centers
has a positive impact on environmental sustainability. In this context, Khalid etal. (2023)
propose a Long-range Autonomous Valet Parking framework that optimizes the path plan-
ning of AVs to minimize distance while serving all users by picking them up and dropping
them off at their required spots. The authors propose two learning-based solutions: Double-
Layer Ant Colony Optimization (DL-ACO) and DQN-based algorithms. DL-ACO can be
applied in new or unfamiliar environments, while DQN can be used in familiar environ-
ments to make efficient and fast decisions since it is pre-trainable. The DL-ACO approach
determines the most efficient path between pairs of spots and subsequently establishes the
optimal order in which users can be served. To deal with dynamic environments, a DQN-based algorithm is proposed in which the agent learns to solve the task by interacting with the environment, using an experience replay memory and a target network. The
proposed techniques aim to improve the carpool and parking experience while reducing
the congestion rate. In this work, the environment considered is characterized by discrete
states (*) and actions (*), and a deterministic (*) transition model. The proposed approach
is tested on a synthetic dataset and execution time, reward, path planned, and distance are
used as performance measures. In this work, the authors deal with path planning problems
in dynamic environments while ensuring the quality of experience for each user, optimiz-
ing the order of user pick-up and drop-off, and finally minimizing the overall distance.
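Experience replay and a target network, mentioned above, are standard components of DQN-style training. For reference, a minimal replay buffer could look as follows; the capacity and batch size are illustrative choices, not values from the cited work.

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-size experience replay memory for off-policy training."""
    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=64):
        batch = random.sample(self.buffer, batch_size)
        states, actions, rewards, next_states, dones = zip(*batch)
        return states, actions, rewards, next_states, dones

    def __len__(self):
        return len(self.buffer)
```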
5.3.8 Buildings
Buildings are interesting from a DR and Demand Side Management point of view. In this
context, Kathirgamanathan etal. (2021) leverage a DRL algorithm, namely Soft Actor-
Critic (SAC), to automate energy management and harness energy flexibility
by controlling the cooling set point in a commercial building environment. In developing
the proposed approach, the authors regard an environment with continuous states, and they
evaluate the performance on a dataset that combines real-world and synthetic data using as
evaluation metrics the energy purchased, energy cost, discomfort, total reward, tempera-
ture evolution, and power demand. Kathirgamanathan et al. (2021) tackle the application of DRL methods to automate DR without the need for a specific building model, as well as their robustness to different operating environments and their scalability. Moreover, the authors point
out that the lack of well-established environments makes it challenging to compare RL
algorithms over different buildings.
De Gracia etal. (2015) instead consider Thermal Energy Storage, and in particular
latent heat, techniques to maximize energy savings by leveraging a Ventilated Double
Skin Facade (VDSF) with Phase Change Material (PCM) used as a cold energy storage
system. By using an RL approach, i.e., SARSA(
𝜆
), the authors control the VDSF to opti-
mally schedule the solidification of PCM through mechanical ventilation during nighttime
and the stored cold release into the indoor environment at peak demand time, considering
weather and indoor conditions. The environment considered in this work presents discrete
states (*), discrete actions (*), and a deterministic transition model. Moreover, the pro-
posed approach is evaluated on a synthetic dataset considering electrical energy savings.
This work aims to maximize energy savings by considering both the benefit of VDSF and
the energy used in the solidification process. Therefore, it is crucial to determine the best
time for the charging process to solidify the PCM and store coldness.
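For readers unfamiliar with SARSA(λ), the following minimal sketch shows one tabular update with accumulating eligibility traces, the mechanism that propagates the temporal-difference error to recently visited state-action pairs; the hyperparameters and state encoding are illustrative and are not those of the cited study.

```python
from collections import defaultdict

def sarsa_lambda_update(Q, E, s, a, r, s_next, a_next,
                        alpha=0.1, gamma=0.95, lam=0.9):
    """One SARSA(lambda) step with accumulating eligibility traces:
    all recently visited (state, action) pairs share the TD error."""
    td_error = r + gamma * Q[(s_next, a_next)] - Q[(s, a)]
    E[(s, a)] += 1.0                      # accumulate the trace for the current pair
    for key in list(E.keys()):
        Q[key] += alpha * td_error * E[key]
        E[key] *= gamma * lam             # decay all traces

Q = defaultdict(float)  # action-value table
E = defaultdict(float)  # eligibility traces
```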
5.3.9 Manufacturing
Manufacturing industries are among the largest energy consumers, so it is crucial to
develop approaches that make them more energy efficient. In this regard, Wang and Wang
(2022) tackle the Energy-Aware Distributed Hybrid Flow-shop Scheduling Problem (EAD-
HFSP). The goal of this work consists of simultaneously minimizing two conflicting
objectives: makespan and Total Energy Consumption. To this aim, the authors formulate
a mixed-integer linear programming model of the EADHFSP and combine a Coopera-
tive Memetic Algorithm with an RL-based agent to solve the problem. The authors com-
bine two heuristics to initialize the population with various solutions and finally propose
an improvement scheme in which solutions are refined by using the appropriate operator
determined by a policy agent, while the solution selection is performed through the use of
a decomposition strategy for balancing convergence and diversity. This work considers an environment characterized by discrete state (*) and action (*) spaces and a deterministic (*) transition model, and the performance of the presented approach is tested on
a synthetic dataset considering the Overall Nondominated Vector Generation, C Metric,
Hypervolume, and D1_R as evaluation metrics. This work addresses the EADHFSP with the
minimization of makespan and total energy consumption, a challenging problem due to the
simultaneous optimization of two conflicting objectives.
Leng etal. (2021) focus on Printed Circuit Board (PCB) manufacturing and propose a
Loosely-Coupled Deep Reinforcement Learning (LCDRL) model for energy-efficient order
acceptance decisions. The authors leverage DL, specifically a Convolutional Neural Net-
work(LeCun 1989), to obtain an accurate prediction of the production cost, makespan,
and carbon consumption of each order by considering historical order labeled data. Then,
the proposed approach combines the forecasted data with order features to decide whether
to accept the order and determine the optimal acceptance sequence by using a reinforce-
ment learning approach based on Q-Learning. The authors regard an environment with
discrete actions (*) and a stochastic transition model, and they test the proposed method
on a synthetic dataset (real-world-based). As performance measures, the metrics MSE,
MSLE, RMSE, and R2 are used to evaluate the prediction accuracy of LCDRL, while the
performance of the approach is assessed in terms of unit profit, total profit, and accept-
ance rate. This work tackles the problem of order acceptance in PCB manufacturing to
achieve energy efficiency, reduce carbon emissions, and improve material usage. Two criti-
cal aspects of PCB manufacturing are demand uncertainty and order customization, which
can lead to different profits, energy consumption, and carbon emissions. These two factors
have to be considered in production planning under production constraints.
5.3.10 Mobile and wireless communication
Sustainable energy infrastructures need high-quality communication systems to connect
user facilities and power plants and to support information exchange. In this context, Liu
etal. (2021) propose the use of a 6G network and Intelligent Reflective Surface (IRS) tech-
nology to create a wireless networking platform and suggest a DRL method to optimize the
phase shift of IRS and therefore improve the communication quality. Combining the 6G
Network with the IRS technology, the authors provide high-quality coverage while gain-
ing energy-saving benefits. In particular, this work proposes the application of the Deep
Deterministic Policy Gradient (DDPG) (Lillicrap etal. 2016) algorithm to configure the
IRS phase shift for enhancing system coverage. The authors consider an environment
characterized by continuous state and action spaces. The performance of the proposed
approach is assessed on a synthetic dataset using two reflection units as metrics: the achiev-
able rate to measure the service quality and the transmission power. Developing sustaina-
ble energy infrastructure is challenging from several points of view. Liu etal. (2021) tackle
the need for an effective global covering communication system using the IRS technology,
whose phase shift configuration is itself challenging.
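DDPG, used here to tune the IRS phase shifts, trains a critic on a bootstrapped target and an actor along the deterministic policy gradient. The following minimal PyTorch sketch of the two losses is purely illustrative; the network objects and batch format are assumptions, not the cited implementation.

```python
import torch

def ddpg_losses(batch, actor, critic, target_actor, target_critic, gamma=0.99):
    """DDPG (Lillicrap et al. 2016): the critic regresses a bootstrapped target,
    the actor is updated to maximize the critic's value of its own actions."""
    s, a, r, s_next, done = batch            # hypothetical tensors of shape (batch, ...)
    with torch.no_grad():
        a_next = target_actor(s_next)        # deterministic next action
        y = r + gamma * (1.0 - done) * target_critic(s_next, a_next).squeeze(-1)
    critic_loss = ((critic(s, a).squeeze(-1) - y) ** 2).mean()
    actor_loss = -critic(s, actor(s)).mean() # deterministic policy gradient
    return critic_loss, actor_loss
```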
In the context of two-tier urban Heterogeneous Networks (HetNets), Miozzo et al.
(2015) model the Small Cell (SC) network as a decentralized multi-agent system. The
authors’ goal consists of improving system performance and self-sustainability of the SCs
in terms of energy consumption. To this aim, they leverage the distributed Q-Learning
algorithm so that every agent learns an appropriate Radio Resource Management (RRM)
policy. Miozzo etal. (2015) is extended in Miozzo etal. (2017). Here, it is proposed to train
offline the algorithm to compute Q-values with which initialize the Q-tables of the SCs
that will be used in the online method. In both approaches, the environment presents dis-
crete states (*) and actions (*) and the dataset used is synthetic. In both works, the authors
evaluate the proposed approaches in terms of network performance by using the through-
put gain and traffic drop rate and their energy performance in terms of energy efficiency
and energy efficiency improvement. Moreover, Miozzo etal. (2015) analyze the behavior
of the HetNet considering traffic demand, harvested energy, battery level, policy, and nor-
malized load at the macro Base Station (BS). Also, the authors consider as performance
metrics the total amount of energy the system spent, the average load, the average cell load
for the macro BS battery outage, and Jain’s fairness index to assess the Quality of Service
(QoS) improvement. Finally, Miozzo etal. (2017) assess the computed policy by leverag-
ing the switch-off rate as a performance measure and use the battery level to analyze the
convergence of the online algorithm and evaluate the excess energy over storage capacity.
Both works address the problem of introducing energy harvesting into the computation of
sleeping strategies to achieve energy efficiency. This is challenging due to the irregular and
intermittent nature of renewable energies.
Giri and Majumder (2022), instead, leverage a Deep Q-Learning algorithm for optimiz-
ing resource allocation in energy-harvested cognitive radio networks, where primary users
networks share channel resources with secondary users and nodes can harvest energy from
the environment, such as solar or wind. The proposed approach addresses the dynamic allo-
cation of resources to achieve optimal network and throughput capacity, considering QoS,
energy constraints, and interference limitations. Moreover, the authors utilize both linear
and non-linear energy-harvested models, proposing a novel reward function that incorpo-
rates the non-linear model. The proposed approach works in environments character-
ized by continuous states (*), discrete actions (*), and a stochastic transition model (*) and
it has been tested on a synthetic dataset by using reward, capacity, network lifetime, and
average delay as performance measures. Giri and Majumder (2022) address the limitations
of Q-Learning-based allocation methods, thus allowing the approach to deal with high-dimensional problems, improve convergence performance, and efficiently harness the collected energy to meet the network’s QoS requirements.
Internet traffic has increased in recent years, and in the development of next-genera-
tion networks, it is important to address the QoS issue sustainably. In this context, Al-
Jawad etal. (2021) propose an RL-based algorithm to solve routing problems in a Software
Defined Network (SDN) environment, named Reinforcement lEarning-based Dynamic
rOuting (REDO). Indeed, the proposed approach leverages Q-Learning to handle traffic
flows by determining the most appropriate routing strategy among a set of conventional
routing algorithms with the aim of maximizing the flows meeting the Service Level Agreement
in terms of throughput, packet loss, and rejection rate. In developing REDO, the authors con-
sider an environment with discrete state (*) and action spaces (*). The performance of the
proposed approach is evaluated on a synthetic dataset in terms of throughput, packet loss,
rejected flows, PSNR, and Mean Opinion Score (MOS). In the development of next-gen-
eration networks like SDN, Al-Jawad etal. (2021) address the problem of providing QoS
sustainably through the solution of a traffic flow routing problem.
5.3.11 Electric energy
One way to increase environmental sustainability is to improve the energy efficiency of
smart hubs. To this aim, Sheikhi et al. (2016) present a new Smart Energy Hub framework that models distinct energy infrastructures in a unified way. The authors’
goal consists of optimizing the electrical and natural gas consumption of a residential
customer through the use of Q-Learning. Moreover, to improve and support information
management among users and utility service providers, the proposed framework lever-
ages Cloud Computing based systems. In this case, the only technical information available concerns the use of a stochastic transition model and a synthetic dataset. To evaluate
the performance of the proposed approach, the metrics used are the storage charge level,
the operational cost, and the primary energy involved. As regards dynamic load manage-
ment in smart hubs, the authors tackle two issues. The first one is related to energy system
parameters which are often assumed to be constant but can vary with time or be stochastic
in practice. The second one is, instead, related to the conventional smart grid architecture,
which has several reported issues, including exposure to cyber-attacks, single failure prob-
lems, limited memory and storage capacity in the energy management system, and difficul-
ties in implementing real-time early warning systems due to limited energy and bandwidth
resources.
In the context of smart energy networks, Harrold etal. (2022) consider a microgrid
environment and leverage DRL to control a battery for energy arbitrage and increased
use of renewable energies, namely solar and wind energy. Specifically, the authors apply
the Rainbow Deep Q-Network (Hessel et al. 2018) algorithm and add predicted values
for demand, Renewable Energy Source (RES), and energy price to agents’ information
by leveraging an Artificial Neural Network. In this work, the environment considered is
characterized by continuous states (*) and discrete actions. The authors test the proposed
approach on a dataset that considers both real-world and synthetic data and assess the pre-
diction accuracy using the MAPE. Also, they evaluate the performance of the proposed
approach through energy cost savings, relative savings, episodic rewards, and value distri-
bution. This work tackles the problem of controlling an Energy Storage System in a micro-
grid with its demand, RES, and dynamic energy pricing to perform energy arbitrage and
improve the use of RES leading to reduced energy cost. Finally, the authors point out the
limited availability of data that requires an efficient algorithm training procedure.
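A recurring pattern in this and similar works is augmenting the agent's observation with forecasts. The sketch below illustrates the idea of concatenating predicted demand, renewable generation, and price to the raw state; the forecaster interface is a hypothetical assumption and does not correspond to the ANN used in the cited paper.

```python
import numpy as np

def augmented_observation(raw_state, forecaster, horizon=12):
    """Concatenate the raw microgrid state with predicted demand, renewable
    generation, and price over the next `horizon` steps (hypothetical forecaster)."""
    demand_hat, res_hat, price_hat = forecaster.predict(raw_state, horizon)
    return np.concatenate([raw_state, demand_hat, res_hat, price_hat])
```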
A key aspect of sustainability and cost-effectiveness in grid operation is optimal energy
dispatch. Jendoubi and Bouffard (2022) address a multi-dimensional power dispatch prob-
lem within a power system by leveraging MARL, specifically the MADDPG algorithm.
The proposed control framework performs CTDE to improve the coordination among
dispatchable units without communication needed, and thus it mitigates data privacy and
communication issues. In developing the presented approach, the authors consider an environment with continuous actions and a stochastic transition model (*). The dataset used to
evaluate the performance is synthetic and the proposed method is evaluated in terms of the
annual total cost, variation of daily operation cost, photovoltaics (PV) production, aggre-
gated demand, amount of power to be charged/discharged, amount of power provided by
a diesel generator, amount of electricity delivered by the electricity provider, the differ-
ence in the amount of electricity delivered by the electricity provider between two consecu-
tive time steps and Peak-to-Average Ratio (PAR). The authors address the energy dispatch
aspects related to the development of distributed energy resources control strategies in grid
operation to simultaneously reduce costs and delays and allow local coordination among
energy resources.
5.3.12 Energy
In recent years, international trade and container handling at port terminals have increased
greatly. Improving sustainability in port operations closely relates to the energy consump-
tion at Automated Container Terminals, where Automatic Stacking Cranes (ASCs) are
used to load, unload, and pile containers. In this context, Gao etal. (2023) propose a digi-
tal twin-based approach for container yard management. Specifically, this work focuses on
determining the optimal allocation of container tasks and scheduling of ASCs to reduce
the energy consumption of ASCs while maintaining efficient loading and unloading opera-
tions. The proposed approach leverages a virtual container yard to simulate the operating
plan and a mixed integer programming model to optimize the scheduling problem, taking
into account the energy consumption. Finally, the authors use the Q-Learning algorithm
to determine the optimal scheduling plan and minimize energy consumption. The envi-
ronment considered in this work presents discrete actions (*) and a stochastic (*) transi-
tion model. The performance of the proposed approach is evaluated on a real-world dataset
using working and non-working energy consumption and the ratio between them as met-
rics. To improve the sustainability of port operations, in this work the problem of optimiz-
ing container yard operations to minimize energy consumption is addressed. Indeed, sev-
eral factors can introduce randomness and uncertainty into these operations, and incorrect
distribution of tasks can lead to suboptimal utilization of ASCs.
5.3.13 Wireless sensor network
Regarding embedded systems powered by a renewable energy source like an Energy Har-
vesting Wireless Sensor Node (EHWSN), Hsu etal. (2014) present a method called Rein-
forcement Learning-based throughput on-demand provisioning dynamic power management
(RLTDPM). By leveraging the Q-Learning algorithm, the proposed approach allows the
EHWSN to adapt the operational duty cycle to satisfy both the energy neutrality condition
and the throughput on-demand (ToD) requirement, ensuring perpetual operation. In devel-
oping RLTDPM, the authors regard an environment characterized by discrete state (*) and
actions (*) and evaluate the performance on a synthetic dataset by considering the residual
battery energy (RBE), exercised duty cycle (EDC), offset to the required ToD (OTRT), and
ToD achievability. In this work, the authors address the problem of simultaneously achiev-
ing two mutually conflicting goals, i.e., satisfying ToD and reducing power consumption.
Energy-Harvesting Wireless Sensor Networks (WSNs) are widely used in energy-con-
strained operation problems. In particular, Chen etal. (2016) focus on Solar-Powered Wire-
less Sensor Networks (SPWSNs) and present an RL-based Sleep Scheduling for Coverage
algorithm to improve the sustainability of SPWSN’s operations. The proposed approach
leverages a precedence operator in the group formation algorithm to prioritize sensors in
sparsely covered areas ensuring the desired coverage distribution. Then, the authors pro-
pose a multi-sensor cooperation Q-Learning group model to properly choose nodes’ working
modes by leveraging the developed learning and action selection strategies. The whole group
learns the sleep schedule by changing the role of the active node. The environment consid-
ered in this work presents discrete state (*) and action (*) spaces and a stochastic transition
model. The proposed approach is tested on a dataset that combines real-world and synthetic
data, and its performance is evaluated in terms of energy balancing between group members
(using the potential energy of nodes as a metric), network lifetime, area coverage ratio, num-
ber of residual alive nodes versus the network lifetime, and the recharging cycle. In this work,
the authors tackle a sleep scheduling problem to simultaneously achieve the desired area cov-
erage and energy balance between group nodes to extend the network lifetime.
On the other hand, Feng etal. (2023) propose an RL-based approach to maximize data
throughput in self-sustainable WSNs. The authors consider a Mobile Sensor (MS) that col-
lects and transmits data to a fixed sink while moving within the network and harvesting
energy from the environment. By leveraging DDPG, the MS can determine the optimal
trajectory to optimize the EH performance and data transmission dealing with unknown
energy supply dynamics. The environment considered in this work is characterized by
continuous states and actions, and a stochastic transition model (*). Moreover, the perfor-
mance of the proposed approach is assessed on a synthetic dataset considering as evalua-
tion metrics the distribution of the ratio between the expected per-slot harvested energy and
the MS-to-sink distance within the network, moving trajectories of the MS, reward, actor-
loss, battery level, accuracy, moving steps, convergence, and training time. The authors
tackle two main challenges concerning the MS’s trajectory optimization to maximize data
throughput. The first relates to the lack of energy-related information such as the energy
sources’ placement, future energy harvesting potential, and statistical parameters like the
average energy harvesting rate, which makes the problem challenging. The second consists
of the tradeoff between energy harvesting and data transmission. Indeed, moving closer
to energy sources allows the MS to increase the energy harvesting amount. However, this
may lead to decreasing data transmission power due to a possible increase in the distance
between the MS and the sink.
5.3.14 Autonomous vehicles
In the last decade, Unmanned Aerial Vehicles (UAVs), i.e., drones, have been used in vari-
ous scenarios such as rapid disaster response, Search-And-Rescue, environmental monitor-
ing, etc., where humans are unable to operate in a timely and efficient manner, for example,
due to the presence of physical obstacles. Bouhamed etal. (2020) consider the application
of UAVs as mobile data collection units in delay-tolerant WSNs. The authors propose a
mechanism that exploits two RL techniques, namely DDPG and Q-Learning algorithms.
The proposed approach uses DDPG to determine the best trajectory for a UAV to reach the
target destination while avoiding obstacles in the environment. Q-Learning, on the other
hand, is used to schedule the best order of visiting nodes to minimize the time needed to
collect data within a predefined time limit. In this work, the environment presents con-
tinuous state (*) and action spaces for the DDPG-based part of the approach while discrete
states (*) and actions (*) for the Q-Learning-based one. The proposed mechanism is tested
on a synthetic dataset and to evaluate its obstacle avoidance and scheduling performance,
the authors analyze the path followed by the UAV, the reward collected, the UAV’s bat-
tery level, and the completion time of the tour against the ground unit transmission power.
This work addresses issues related to the limited battery capacity of UAVs and challenges
related to navigating in obstacle-prone environments to enable communication between the
UAV and low transmission power sensors.
Sacco etal. (2021) propose a MARL approach based on the actor-critic framework to
tackle task offloading problems from UAV swarms in edge computing environments to
simultaneously reduce task completion time and improve energy efficiency. The proposed
approach determines a distributed decision strategy through the collaboration among the
system’s mobile nodes that share information about the overall system state. This informa-
tion is then used by the agents to decide whether to compute a task locally or offload it to the
edge cloud; in the latter case, the proposed technique chooses the best transmission technol-
ogy among Wi-Fi access points and mobile network. In developing the proposed approach,
the environment considered presents continuous states (*) and discrete actions (*), and the
dataset used for testing combines real-world and synthetic data. The performance of the pre-
sented techniques is assessed in terms of task completion time and utility against a varying
number of agents, and average node-antenna distance. Then, the authors evaluate the energy
consumption necessary to complete the task by varying the average node-antenna distance
and computing workload and assess the task completion time against the average comput-
ing workload. In addition, the cumulative distribution function (CDF) and utility evolution
through episodes are considered to analyze the variability of performance among nodes and
convergence performance, respectively. Finally, in this work, the authors tackle the problem
of reducing the time necessary for task completion of UAV swarms by leveraging task off-
loading to the edge cloud while improving energy efficiency.
In the context of autonomous driving, Gu et al. (2023) tackle the application of RL
methods focusing on energy-saving and environmentally friendly driving strategies within
a cooperative adaptive cruise control platoon. More precisely, the goal of this work consists
of training platoon member vehicles to react effectively when the leading vehicle faces a
severe collision. The authors leverage the Policy Gradient algorithm for training an RL
agent to minimize the energy consumption in inter-vehicle communication for decision-
making while avoiding collisions or minimizing the resulting damage. To this aim, two
different loss functions are used, i.e., collision loss and energy loss. Moreover, a specific reward function is used to both ensure the vehicle’s safety and account for the fuel consumption resulting from the action performed by the vehicle. This work considers an envi-
ronment characterized by continuous states (*), discrete actions, and a stochastic transition
model (*). The proposed approach has been tested on a synthetic dataset using energy loss,
collision loss, reward, and lane changes as metrics. A key challenge of green autonomous
driving addressed in this work is the development of effective strategies that can respond to
environmental observations by automatically generating appropriate control signals.
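The combination of a policy-gradient update with safety and energy terms can be sketched as follows: a REINFORCE-style loss over an episode and a hypothetical per-step reward trading off a collision penalty against energy use. Both the weighting and the function names are illustrative assumptions, not the formulation of the cited work.

```python
import torch

def reinforce_loss(log_probs, rewards, gamma=0.99):
    """REINFORCE objective: weight log action probabilities by normalized
    discounted returns-to-go and maximize the expected return."""
    returns, g = [], 0.0
    for r in reversed(rewards):          # discounted returns-to-go
        g = r + gamma * g
        returns.insert(0, g)
    returns = torch.tensor(returns)
    returns = (returns - returns.mean()) / (returns.std() + 1e-8)  # simple baseline
    return -(torch.stack(log_probs) * returns).sum()

def step_reward(collision_penalty, energy_used, w_safety=1.0, w_energy=0.1):
    """Hypothetical combined reward balancing safety and energy consumption."""
    return -w_safety * collision_penalty - w_energy * energy_used
```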
6 Discussion
The analysis of the literature performed in this work shows that most of the works about RL for
environmental sustainability concern the energy application domain, followed by urban traffic
and transportation. The main RL technique used in the reviewed manuscripts is Q-Learning.
Concerning the 35 selected articles, we observe that energy-related issues involve most of the
papers, and about half of them leverage DRL approaches, such as DQN and DDQN. In devel-
oping the proposed methods, the authors mainly consider domains with discrete state and action
spaces, and stochastic transition models, using synthetic datasets to evaluate the performance.
Problems related to environmental sustainability were traditionally tackled with
optimization techniques in which the concept of adaptability has to be introduced
explicitly. In contrast, one of the strengths of RL is its natural way of dealing with
adaptability to changing or different environments, a crucial feature in environmen-
tal sustainability problems since in this context the agent has to handle variations in
operating conditions due to, for example, changes in resource availability or weather
conditions. For instance, Chen et al. (2016) introduce an RL-based Sleep Scheduling
for Coverage (RLSSC) approach to ensure sustainable time-slotted operations in solar-
powered wireless sensor networks. This algorithm is compared to LEACH (Heinzel-
man et al. 2002), a high-energy-efficient hierarchical routing protocol, wherein the
node chosen to be active in the current round is ineligible for selection in the subse-
quent round, and a random algorithm that randomly determines active nodes within a
group. Among the various aspects considered, a crucial criterion for evaluating algo-
rithm effectiveness lies in maintaining equilibrium in energy levels, as significant dis-
parities in current residual energy arise when a node receives an energy supplement.
RLSSC initially exhibits fluctuations but eventually converges through iterative learn-
ing, exhibiting slight oscillations up and down in response to varying solar strength
throughout the day. Moreover, the proposed approach demonstrates real-time energy
balancing among sensor nodes. In contrast, non-RL-based methods lack the capacity
to adapt to the dynamic environment. Another aspect to consider is network lifetime,
where RLSSC excels in adapting to uncertainties associated with harvesting time and
the amount of acquired energy. This adaptability enables RLSSC to dynamically adjust
its scheme in real-time, effectively extending the overall network lifetime. This is only
one of several examples showing that RL can provide a strong advantage in solving
problems related to environmental sustainability because of its natural capability to
deal with uncertainty and adaptation in sequential decision-making.
However, we identify several open problems in the application of RL techniques to
environmental sustainability. These concern scalability, data efficiency, and the necessity
to deal with large data volumes, often posing cost challenges. In future developments, it
is crucial to improve pre-training methods that allow the generation of initial policies by
simulation and leverage knowledge acquired by solving a related task. RL methods are
also sensitive to the reward function; therefore, reward engineering is important to avoid a nega-
tive impact on performance. Moreover, in dealing with environmental sustainability prob-
lems in specific contexts like IoT, it is of particular importance to consider the presence of
computational limitations and to optimize the computational complexity of the method.
Finally, we note that most of the approaches involve single-agent systems. Extending the
proposed approaches to the multi-agent context would allow the cooperative computa-
tion of optimal policies accounting for common performance objectives to improve shared
resources management and environmental sustainability.
7 Conclusions
This review focuses on the application of RL techniques to address environmental sus-
tainability challenges, a topic of increasing interest in the international scientific commu-
nity. We have examined several contexts where RL techniques have been recently used to
enhance environmental sustainability, offering practitioners insights into state-of-the-art
methodologies across diverse application domains. RL has found practical application in
environmental sustainability because the inherent uncertainty of this domain poses chal-
lenges to strategy learning and adaptation that can be naturally tackled by RL. The review of
the literature performed in this survey has identified the most common applications of RL in
environmental sustainability and the most popular methods used to address these challenges
in the last two decades. We have first provided a quantitative analysis of the state-of-the-
art related to the application of RL in environmental sustainability and then analyzed the
use of these techniques, focusing on sustainability concerns. In particular, we have provided
an overview of the application domains of the proposed RL techniques and the approaches
used for environmental sustainability issues. Moreover, we have narrowed our attention to
35 selected papers and provided technical information on the formalization of the RL prob-
lem, the performance measures adopted for evaluation, and the challenges addressed.
Keywords mapping into macro areas
See Tables4, 5, 6, 7 , 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20.
Table 4 Keywords grouped in the “Energy” macro area
Energy
Energy Energy aware Energy conservation
Energy consumption Energy distributions Energy efficiency
Energy harvesting Energy management Energy management systems
Energy resource Energy source Energy storage
Energy sustainability Energy systems Energy utilization
Distributed energies Dynamic energy Energy allocations
Energy arbitrages Energy availability Energy consumption balances
Energy flexibility Energy infrastructures Energy management strategy
Energy market Energy neutrality Energy optimal scheduling
Energy routing Energy savings Energy storage system
Energy theft Energy transfer Energy usage
Energy use Energy-awareness Energy-constrained networks
Energy-saving strategies Harvesting energies Intelligent energy management
Non-renewable energy Total energy consumption Wireless energy transfers
Energy trading
Table 5 Keywords grouped in the “Electric energy” macro area
Electric energy
Electric energy storage Electric load dispatching
Electric power transmission networks Electric power utilization
Dynamic energy managements Dynamic loads
Dynamic power management Electric load management
Electric loads Electric power system control
Electric power transmission Electrical networks
Electricity demands Electricity loss
Micro grid Smart grid
Smart power grids Electricity grids
Grid resilience Power grids
Electricity grids Smart grid communications
Grid resilience Distributed power generation
Power management Power supply
Low power electronics Dynamic power management
Electric power system control Electric power transmission
inductive power transmission Low power
Low power and lossy network (lln) Low power networks
Low-power consumption Low-power devices
Power Power allocations
Power dispatch Power grids
Power harvesting Power limitations
Power system simulator for engineering Power system simulators
Power traces Power control
Power transmission systems Reactive power
Reactive power output Power management (telecommunication)
Table 6 Keywords grouped in the “Urban Traffic and Transportation” macro area
Urban traffic and transportation
street traffic control Sustainable mobility Traffic congestion
Traffic emission Traffic flow Traffic light control
Traffic management Traffic signal control Traffic signals
Adaptive traffic signal control Intelligent traffic controls Optimal traffic control
Traffic Traffic conditions Traffic environment
Traffic light Traffic management strategies Traffic scheduling
Urban traffic Highway administration Motor transportation
Transportation Transportation system Urban transportation
Bus bunching Bus transportation Sustainable transportation
Intelligent transportation Transport systems Transportation network
Transportation planning
Table 7 Keywords grouped in the “Water resources” macro area
Water resources
Aquifer Ground water Wastewater
Wastewater treatment Water management Water quality
Water resources Water resources management Water supply
Water treatment Surface water Sustainable wastewater treatments
Waste water management Waste water recycling Wastewater treatment plant
Water metabolism Water purification Water resources systems
Water treatment plants Groundwater resources Water level
Water quantity Watersheds
Table 8 Keywords grouped in the “Renewable/sustainable energies” macro area
Renewable/sustainable energies
Renewable energies Renewable energy resources Renewable energy source
Renewable resource Renewables Alternative energy
Solar energy Green energy Smart renewable energy
Use of renewable energies Sustainable energy Solar power generation
Renewable power generation Tidal power Wind power
Hydroelectric power plants Hydropower
Table 9 Keywords grouped in the “Emissions/Pollution” macro area
Emissions/Pollution
Carbon emission Carbon footprint Emission control
Gas emissions Greenhouse gas Greenhouse gas emissions
Acoustic noise Air pollution monitoring Atmospheric pollution
Carbon abatement strategy Carbon sequestration Co2 emissions
Greenhouse emissions Greenhouse gas emission reduction Groundwater pollution
Noise pollution Pollution control Vehicular emission
Water pollution Air quality
Table 10 Keywords grouped in the “Mobile and Wireless communication” macro area
Mobile and wireless communication
5G mobile communication systems Mobile telecommunication systems
6g mobile communication Mobile communications
Base stations Small cells
Wireless communication links Wireless communications
Wireless telecommunication systems Sustainable wireless communication network
Wireless communications networks Communication network
Heterogeneous networks Wireless powered communication network
Table 11 Keywords grouped in the “Data” macro area
Data
Data acquisition Data handling Data transfer
Digital storage Big data Data aggregation
Data aggregation and fusion Data analytics Data distribution
Data logger Data mining Data sensing
Data-driven approach Data-driven design Database technology
Datacenter Distributed database Next generation data centers
Data transfer Data-communication Real-time data
Table 12 Keywords grouped in the “Wireless sensor network” macro area
Wireless sensor network
Wireless sensor network Wireless sensor node
Heterogeneous wireless sensor networks Solar-powered wireless sensor networks
Rechargeable sensor networks Wireless smart sensors
Sensor nodes Integrating sensors
Sensor networks Wireless smart sensors
Sensor networks Smart sensors
Adaptive sensor selections Mobile sensors
Sensor Sensor payloads
Sleep scheduling
Table 13 Keywords grouped in the “Autonomous vehicles” macro area
Autonomous vehicles
Autonomous driving Autonomous navigation
Autonomous vehicles Unmanned aerial vehicles (uav)
Autonomous unmanned aerial vehicles Uav networks
Unmanned aerial vehicle Uav positioning
Autonomous vehicle control Autonomous ship
Auto-navigation Automated vehicles
Table 14 Keywords grouped in the “Batteries” macro area
Batteries
Battery energy storage systems Battery management systems Battery storage
Charging (batteries) Secondary batteries Battery operation
Electric batteries Battery capacity Lead acid batteries
Lithium batteries Residual battery
Table 15 Keywords grouped in the “Manufacturing” macro area
Manufacturing
Manufacture Manufacturing
Production control Supply chains
Sustainable manufacturing Distributed manufacturing systems
Manufacturing environments Large scale manufacturing systems
Manufacturing industries Manufacturing sector
Printed circuit board manufac-
turing
Process manufacturing
Sustainable manufacturing engineering and resource-efficient
production
Database search results
In this section, we report the results of the search performed on the two databases used, namely Scopus and Web of Science. In the following table, for each paper, we indicate the authors, title, source title, and publication year, together with the database in which a corresponding record is present. More specifically, in the “Database” column we use the letters “S” and “W” to indicate that the paper is indexed in Scopus and in Web of Science, respectively, while “S, W” indicates that the paper is present in both databases.
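To make the “Database” labels concrete, the sketch below shows one possible way to merge the Scopus and Web of Science exports and tag each record with “S”, “W”, or “S, W”. It is a minimal Python illustration, not the pipeline used for this survey: the record fields and the normalized-title matching key are assumptions made only for the example.

def normalize_title(title: str) -> str:
    # Lowercase and collapse whitespace so near-identical titles match.
    return " ".join(title.lower().split())

def tag_databases(scopus_records, wos_records):
    # Return (record, tag) pairs with tag in {"S", "W", "S, W"}.
    merged = {}
    for rec in scopus_records:
        merged[normalize_title(rec["title"])] = (rec, "S")
    for rec in wos_records:
        key = normalize_title(rec["title"])
        if key in merged:
            merged[key] = (merged[key][0], "S, W")  # record found in both databases
        else:
            merged[key] = (rec, "W")
    return list(merged.values())

# Toy usage with illustrative records (not taken from the table below):
scopus = [{"title": "Example Paper A", "year": 2023}]
wos = [{"title": "example paper a", "year": 2023},
       {"title": "Example Paper B", "year": 2022}]
for rec, tag in tag_databases(scopus, wos):
    print(rec["title"], rec["year"], tag)

In practice, matching on titles alone can miss records whose titles differ slightly between the two databases, so a DOI-based key would be a more robust choice when available.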
Table 16 Keywords grouped in the “Electric vehicles” macro area
Charging station, Electric vehicle charging, Electric vehicle charging station, Electric vehicles, Electric vehicle charging station recommendation, E mobilities, Fuel cell hybrid electric vehicles, Vehicle-to-grid
Table 17 Keywords grouped in the “IoT” macro area
Internet of things, Internet of underwater things, Green internet of thing, Industrial internet of thing
Table 18 Keywords grouped in the “Agriculture” macro area
Agricultural robots, Agriculture, Crops, Agricultural productions, Agricultural products, Crop productivity, Smart agricultures, Sustainable agricultural, Sustainable agricultural system, Sustainable agriculture
Table 19 Keywords grouped in the “Vehicles” macro area
Intelligent vehicle highway systems, Vehicles, Transit vehicles, Vehicle dispatch, Vehicle fleets
Table 20 Keywords grouped in the “Buildings” macro area
Building, Building coverage ratios, Building energy, Building energy flexibility, Building energy management systems, Building energy managements, Building stocks, College buildings, Intelligent buildings, Office buildings
Authors Title Source title Year Database
Sultanuddin S.J., Vibin R.,
Rajesh Kumar A., Behera
N.R., Pasha M.J., Baseer
K.K.
Development of improved
reinforcement learning
smart charging strategy
for electric vehicle fleet
Journal of Energy Storage 2023 S, W
Ajao L.A., Apeh S.T. Secure edge computing vul-
nerabilities in smart cities
sustainability using petri
net and genetic algorithm-
based reinforcement
learning
Intelligent Systems with
Applications
2023 S
Wang J., Sun L. Robust dynamic bus
control: a distributional
multi-agent reinforcement
learning approach
IEEE Transactions on
Intelligent Transportation
Systems
2023 S, W
Szoke L., Aradi S., Bécsi T. Traffic signal control with
successor feature-based
deep reinforcement learn-
ing agent
Electronics (Switzerland) 2023 S, W
Ali M.Y., Alsaeedi A., Shah
S.A.A., Yafooz W.M.S.,
Malik A.W.
Energy efficient data dis-
semination for large-scale
smart farming using
reinforcement learning
Electronics (Switzerland) 2023 S, W
Yao R., Hu Y., Varga L. Applications of agent-based
methods in multi-energy
systems-a systematic
literature review
Energies 2023 S, W
Kazemeini A., Swei O. Identifying environmentally
sustainable pavement
management strategies
via deep reinforcement
learning
Journal of Cleaner Produc-
tion
2023 S, W
Emamjomehzadeh O., Kera-
chian R., Emami-Skardi
M.J., Momeni M.
Combining urban metabo-
lism and reinforcement
learning concepts for sus-
tainable water resources
management: A nexus
approach
Journal of Environmental
Management
2023 S
Charef N., Ben Mnaouer A.,
Aloqaily M., Bouachir O.,
Guizani M.
Artificial intelligence
implication on energy
sustainability in Internet
of Things: A survey
Information Processing and
Management
2023 S, W
Savazzi S., Rampa V.,
Kianoush S., Bennis M.
An energy and carbon
footprint analysis of
distributed and federated
learning
IEEE Transactions on
Green Communications
and Networking
2023 S, W
Naseer F., Khan M.N.,
Altalbe A.
Telepresence robot with
DRL assisted delay com-
pensation in iot-enabled
sustainable healthcare
environment
Sustainability (Switzerland) 2023 S
Kolat M., Kővári B., Bécsi
T., Aradi S.
Multi-Agent reinforcement
learning for traffic signal
control: a cooperative
approach
Sustainability (Switzerland) 2023 S, W
Sivamayil K., Rajasekar E.,
Aljafari B., Nikolovski
S., Vairavasundaram S.,
Vairavasundaram I.
A systematic study on rein-
forcement learning based
applications
Energies 2023 S, W
Khalid M., Wang L., Wang
K., Aslam N., Pan C.,
Cao Y.
Deep reinforcement
learning-based long-range
autonomous valet parking
for smart cities
Sustainable Cities and
Society
2023 S
Gao Y., Chang D., Chen
C.-H.
A digital twin-based
approach for optimiz-
ing operation energy
consumption at automated
container terminals
Journal of Cleaner Produc-
tion
2023 S
Badakhshan S., Jacob R.A.,
Li B., Zhang J.
Reinforcement learning for
intentional islanding in
resilient power transmis-
sion systems
2023 IEEE Texas Power
and Energy Conference,
TPEC 2023
2023 S
Zhang W., Valencia A.,
Chang N.
Fingerprint networked
reinforcement learning
via multiagent modeling
for improving decision
making in an urban food-
energy-water nexus
IEEE Transactions on
Systems, Man, and Cyber-
netics: Systems
2023 S, W
Li C., Bai L., Yao L., Waller
S.T., Liu W.
A bibliometric analysis and
review on reinforcement
learning for transportation
applications
Transportmetrica B 2023 S, W
Venkataswamy V., Grigsby
J., Grimshaw A., Qi Y.
RARE: renewable energy
aware resource manage-
ment in datacenters
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2023 S, W
No author name available
[Conference Review]
9th International Confer-
ence on Sustainable
Design and Manufactur-
ing, SDM 2022
Smart Innovation, Systems
and Technologies
2023 S
Koch L., Picerno M.,
Badalian K., Lee S.-Y.,
Andert J.
Automated function
development for emission
control with deep rein-
forcement learning
Engineering Applications of
Artificial Intelligence
2023 S, W
Huo D., Sari Y.A., Kealey
R., Zhang Q.
Reinforcement learning-
based fleet dispatching for
greenhouse gas emission
reduction in open-pit min-
ing operations
Resources, Conservation
and Recycling
2023 S
Feng Y., Zhang X., Jia R.,
Lin F., Lu J., Zheng Z.,
Li M.
Intelligent trajectory design
for mobile energy harvest-
ing and data transmission
IEEE Internet of Things
Journal
2023 S, W
Chen M., Li Y., Zhang X.,
Liao R., Wang C., Bi X.
Optimization of river envi-
ronmental management
based on reinforcement
learning algorithm: a case
study of the Yellow River
in China
Environmental Science and
Pollution Research
2023 S, W
Gu Z., Liu Z., Wang Q.,
Mao Q., Shuai Z., Ma Z.
Reinforcement learning-
based approach for
minimizing energy loss of
driving platoon decisions
Sensors 2023 W
Baba-Nalikant M., Syed-
Mohamad S. M., Husin
M. H., Abdullah N. A.,
Saleh M. S. M., Rahim
A. A.
A zero-waste campus
framework: perceptions
and practices of university
campus community in
Malaysia
Recycling 2023 W
Daradkeh M. Lurkers versus contributors:
an empirical investigation
of knowledge contribution
behavior in open innova-
tion communities
Journal of Open Innovation:
Technology, Market, and
Complexity
2022 S
Jendoubi I., Bouffard F. Data-driven sustainable dis-
tributed energy resources’
control based on multi-
agent deep reinforcement
learning
Sustainable Energy, Grids
and Networks
2022 S
Tomin N., Shakirov V.,
Kurbatsky V., Muzychuk
R., Popova E., Sidorov D.,
Kozlov A., Yang D.
A multi-criteria approach to
designing and managing
a renewable energy com-
munity
Renewable Energy 2022 S, W
Adetunji K.E., Hofsajer
I.W., Abu-Mahfouz A.M.,
Cheng L.
A novel dynamic planning
mechanism for allocating
electric vehicle charg-
ing stations considering
distributed generation and
electronic units
Energy Reports 2022 S
Li R., Zhang X., Jiang L.,
Yang Z., Guo W.
An adaptive heuristic algo-
rithm based on reinforce-
ment learning for ship
scheduling optimization
problem
Ocean and Coastal Manage-
ment
2022 S, W
Zhang W., Xie M., Scott C.,
Pan C.
Sparsity-aware intelligent
spatiotemporal data sens-
ing for energy harvesting
IoT system
IEEE Transactions on
Computer-Aided Design
of Integrated Circuits and
Systems
2022 S, W
Yao L., Leng Z., Jiang J.,
Ni F.
Large-scale maintenance
and rehabilitation optimi-
zation for multi-lane high-
way asphalt pavement: a
reinforcement learning
approach
IEEE Transactions on
Intelligent Transportation
Systems
2022 S, W
Adetunji K.E., Hofsajer
I.W., Abu-Mahfouz A.M.,
Cheng L.
An optimization planning
framework for allocat-
ing multiple distributed
energy resources and
electric vehicle charging
stations in distribution
networks
Applied Energy 2022 S, W
Mahmud S., Abbasi A.,
Chakrabortty R.K., Ryan
M.J.
A self-adaptive hyper-
heuristic based multi-
objective optimisation
approach for integrated
supply chain scheduling
problems
Knowledge-Based Systems 2022 S, W
Musaddiq A., Ali R., Kim
S.W., Kim D.-S.
Learning-based resource
management for low-
power and lossy IoT
networks
IEEE Internet of Things
Journal
2022 S
Giri M.K., Majumder S. Deep Q-learning based
optimal resource alloca-
tion method for energy
harvested cognitive radio
networks
Physical Communication 2022 S
Selukar M., Jain P., Kumar
T.
Inventory control of
multiple perishable goods
using deep reinforcement
learning for sustainable
environment
Sustainable Energy Tech-
nologies and Assessments
2022 S, W
No author name available
[Conference Review]
IFAC Workshop on Control
for Smart Cities, CSC
2022 - Proceedings
IFAC-PapersOnLine 2022 S
Alibabaei K., Gaspar P.D.,
Assunção E., Alire-
zazadeh S., Lima T.M.,
Soares V.N.G.J., Caldeira
J.M.L.P.
Comparison of on-policy
deep reinforcement learning
A2C with off-policy
DQN in irrigation optimi-
zation: a case study at a
site in Portugal
Computers 2022 S, W
Raza A., Shah M.A., Khat-
tak H.A., Maple C., Al-
Turjman F., Rauf H.T.
Collaborative multi-agents
in dynamic industrial
internet of things using
deep reinforcement
learning
Environment, Development
and Sustainability
2022 S, W
Shaw R., Howley E., Bar-
rett E.
Applying reinforcement
learning towards automat-
ing energy efficient virtual
machine consolidation in
cloud data centers
Information Systems 2022 S, W
Jurj S.L., Werner T., Grundt
D., Hagemann W., Möhl-
mann E.
Towards safe and sustain-
able autonomous vehicles
using environmentally-
friendly criticality metrics
Sustainability (Switzerland) 2022 S
Oubbati O.S., Atiquzzaman
M., Lim H., Rachedi A.,
Lakas A.
Synchronizing UAV teams
for timely data collec-
tion and energy transfer
by deep reinforcement
learning
IEEE Transactions on
Vehicular Technology
2022 S, W
Xu G., Guo F. Sustainability-oriented
maintenance manage-
ment of highway bridge
networks based on
Q-learning
Sustainable Cities and
Society
2022 S
Wang J.-J., Wang L. A cooperative memetic
algorithm with learning-
based agent for energy-
aware distributed hybrid
flow-shop scheduling
IEEE Transactions on Evo-
lutionary Computation
2022 S, W
Zhang T., Gou Y., Liu J.,
Yang T., Cui J.-H.
UDARMF: An underwater
distributed and adaptive
resource management
framework
IEEE Internet of Things
Journal
2022 S, W
Zhang M., Lu Y., Hu Y.,
Amaitik N., Xu Y.
Dynamic scheduling
method for job-shop
manufacturing systems by
deep reinforcement learn-
ing with proximal policy
optimization
Sustainability (Switzerland) 2022 S, W
Ma Y., Kassler A., Ahmed
B.S., Krakhmalev P.,
Thore A., Toyser A., Lind-
bäck H.
Using deep reinforcement
learning for zero defect
smart forging
Advances in Transdiscipli-
nary Engineering
2022 S
Jang J., Yang H.J. Deep learning-aided user
association and power
control with renewable
energy sources
IEEE Transactions on Com-
munications
2022 S, W
Danassis P., Erden Z.D.,
Faltings B.
Exploiting environmental
signals to enable policy
correlation in large-scale
decentralized systems
Autonomous Agents and
Multi-Agent Systems
2022 S
Manchella K., Haliem M.,
Aggarwal V., Bhargava B.
PassGoodPool: joint
passengers and goods
fleet management with
reinforcement learning
aided pricing, matching,
and route planning
IEEE Transactions on
Intelligent Transportation
Systems
2022 S, W
Yk S., Wu J., Song S. Research on autonomous
driving decision based on
improved deep determin-
istic policy algorithm
SAE Technical Papers 2022 S
Neumann M., Palkovits D.S. Reinforcement learning
approaches for the optimi-
zation of the partial oxida-
tion reaction of methane
Industrial and Engineering
Chemistry Research
2022 S, W
Luo M., Du B., Klemmer
K., Zhu H., Wen H.
Deployment optimization
for shared e-mobility
systems with multi-agent
deep neural search
IEEE Transactions on
Intelligent Transportation
Systems
2022 S, W
Dey S., Saha S., Singh A.K.,
McDonald-Maier K.
SmartNoshWaste: using
blockchain, machine
learning, cloud computing
and QR code to reduce
food waste in decentral-
ized web 3.0 enabled
smart cities
Smart Cities 2022 S
Kim J., Park J., Cho K. Continuous autonomous
ship learning framework
for human policies on
simulation
Applied Sciences (Swit-
zerland)
2022 S, W
Tuli S., Gill S.S., Xu M.,
Garraghan P., Bahsoon R.,
Dustdar S., Sakellariou R.,
Rana O., Buyya R., Casale
G., Jennings N.R.
HUNTER: AI based holistic
resource management
for sustainable cloud
computing
Journal of Systems and
Software
2022 S, W
Wang T., Liu L., Ding T. Optimized sustainable strat-
egy in aerial terrestrial
IoT network
Proceedings - 2022 18th
International Conference
on Mobility, Sensing and
Networking, MSN 2022
2022 S, W
Hoover W., Guerra-Zubiaga
D.A., Banta J., Wandene
K., Key K., Gonzalez-
Badillo G.
Industry 4.0 trends in intel-
ligent manufacturing auto-
mation exploring machine
learning
ASME International
Mechanical Engineering
Congress and Exposition,
Proceedings (IMECE)
2022 S
No author name available
[Conference Review]
Proceedings - 22nd IEEE
International Conference
on Data Mining Work-
shops, ICDMW 2022
IEEE International Confer-
ence on Data Mining
Workshops, ICDMW
2022 S
Heo S., Mayer P., Magno M. Predictive energy-aware
adaptive sampling with
deep reinforcement
learning
ICECS 2022 - 29th IEEE
International Conference
on Electronics, Circuits
and Systems, Proceedings
2022 S, W
No author name available
[Conference Review]
20th international confer-
ence on service-oriented
computing, ICSOC 2022
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2022 S
Rampini L., Re Cecconi F. Artificial intelligence
in construction asset
management: a review of
present status, challenges
and future opportunities
Journal of Information
Technology in Construc-
tion
2022 S, W
Wang K., Yang R., Liu C.,
Samarasinghalage T.,
Zang Y.
Extracting electricity
patterns from high-
dimensional data: a com-
parison of K-Means and
DBSCAN algorithms
IOP Conference Series:
Earth and Environmental
Science
2022 S
No author name available
[Conference Review]
Proceedings - 2022 IEEE
international conference
on autonomic comput-
ing and self-organizing
systems companion,
ACSOS-C 2022
Proceedings - 2022 IEEE
International Conference
on Autonomic Comput-
ing and Self-Organizing
Systems Companion,
ACSOS-C 2022
2022 S
Dusparic I. Reinforcement learning for
sustainability: adapting in
large-scale heterogeneous
dynamic environments
Proceedings - 2022 IEEE
International Conference
on Autonomic Comput-
ing and Self-Organizing
Systems Companion,
ACSOS-C 2022
2022 S, W
Amadi K.W., Iyalla I.,
Radhakrishna P., Al Saba
M.T., Waly M.M.
Continuous dynamic drill-
off test whilst drilling
using reinforcement learn-
ing in autonomous rotary
drilling system
Society of Petroleum Engi-
neers - ADIPEC 2022
2022 S
No author name available
[Conference Review]
Proceedings - 2022 IEEE
5th international confer-
ence on artificial intel-
ligence and knowledge
engineering, AIKE 2022
Proceedings - 2022 IEEE
5th International Confer-
ence on Artificial Intel-
ligence and Knowledge
Engineering, AIKE 2022
2022 S
Wu L., Guo S., Liu Y.,
Hong Z., Zhan Y., Xu W.
Sustainable federated learn-
ing with long-term online
VCG Auction Mechanism
Proceedings - International
Conference on Distributed
Computing Systems
2022 S, W
Baumgart U., Burger M. Optimal control of traffic
flow based on reinforce-
ment learning
Communications in Com-
puter and Information
Science
2022 S
Sabet S., Farooq B. Green vehicle routing prob-
lem: state of the art and
future directions
IEEE Access 2022 S, W
No author name available
[Conference Review]
Proceedings of International
Conference on Comput-
ing, Communication,
Security and Intelligent
Systems, IC3SIS 2022
Proceedings of International
Conference on Comput-
ing, Communication,
Security and Intelligent
Systems, IC3SIS 2022
2022 S
Andersen P.-A., Goodwin
M., Granmo O.-C.
CaiRL: a high-performance
reinforcement learning
environment toolkit
IEEE Conference on Com-
putatonal Intelligence and
Games, CIG
2022 S
Korecki M., Helbing D. Analytically guided rein-
forcement learning for
green it and fluent traffic
IEEE Access 2022 S
No author name available
[Conference Review]
3rd international confer-
ence on resources and
environmental research,
ICRER 2021
IOP Conference Series:
Earth and Environmental
Science
2022 S
Ounoughi C., Touibi G.,
Yahia S.B.
EcoLight: eco-friendly traf-
fic signal control driven
by urban noise prediction
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2022 S, W
Tang Y., Deng X., Yi L.,
Xia Y., Yang L.T., Tang
A.X.
Collaborative intelligent
confident information cov-
erage node sleep schedul-
ing for 6G-empowered
green IoT
IEEE Transactions on
Green Communications
and Networking
2022 S, W
Paul S., Chowdhury S. A graph-based reinforce-
ment learning framework
for urban air mobility fleet
scheduling
AIAA Aviation 2022 Forum 2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
No author name available
[Conference Review]
2nd conference of innova-
tive product design and
intelligent manufacturing
system, IPDIMS 2020
Lecture Notes in Mechani-
cal Engineering
2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
Isufaj R., Sebastia D.A.,
Piera M.A.
Toward conflict resolution
with deep multi-agent
reinforcement learning
Journal of Air Transporta-
tion
2022 S
Yuan M., Pun M., Wang D. Rényi state entropy maxi-
mization for exploration
acceleration in reinforce-
ment learning
IEEE Transactions on Arti-
ficial Intelligence
2022 S
Eriksson K., Ramasamy
S., Zhang X., Wang Z.,
Danielsson F.
Conceptual framework of
scheduling applying dis-
crete event simulation as
an environment for deep
reinforcement learning
Procedia CIRP 2022 S
Zhang W., Liu H., Xiong H.,
Xu T., Wang F., Xin H.,
Wu H.
RLCharge: imitative multi-
agent spatiotemporal
reinforcement learning for
electric vehicle charging
station recommendation
IEEE Transactions on
Knowledge and Data
Engineering
2022 S, W
Zhang W., Zhang J., Xie M.,
Liu T., Wang W., Pan C.
M2M-routing: environmen-
tal adaptive multi-agent
reinforcement learning
based multi-hop routing
policy for self-powered
IoT systems
Proceedings of the 2022
Design, Automation and
Test in Europe Confer-
ence and Exhibition,
DATE 2022
2022 S, W
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
Mao Z., Fang Z., Li M.,
Fan Y.
EvadeRL: evading PDF
malware classifiers with
deep reinforcement
learning
Security and Communica-
tion Networks
2022 S
Lin B., Duan J., Han M.,
Cai L.X.
Decentralized reinforcement
learning-based access
control for energy sustain-
able underwater acoustic
sub-network of MWCN
Wireless Networks (United
Kingdom)
2022 S
No author name available
[Conference Review]
7th international scientific-
technical conference,
MANUFACTURING
2022
Lecture Notes in Mechani-
cal Engineering
2022 S
Maree C., Omlin C.W. Balancing profit, risk, and
sustainability for portfolio
management
2022 IEEE Symposium on
Computational Intelli-
gence for Financial Engi-
neering and Economics,
CIFEr 2022 - Proceedings
2022 S, W
Lee J., Sun Y.G., Sim I.,
Kim S.H., Kim D.I., Kim
J.Y.
Non-technical loss detection
using deep reinforcement
learning for feature cost
efficiency and imbalanced
dataset
IEEE Access 2022 S, W
Liu Y., Yang M., Guo Z. Reinforcement learning
based optimal decision
making towards product
lifecycle sustainability
International Journal of
Computer Integrated
Manufacturing
2022 S, W
Alanne K., Sierla S. An overview of machine
learning applications for
smart buildings
Sustainable Cities and
Society
2022 S
Samuel O., Javaid N.,
Alghamdi T.A., Kumar N.
Towards sustainable smart
cities: a secure and scal-
able trading system for
residential homes using
blockchain and artificial
intelligence
Sustainable Cities and
Society
2022 S, W
Harrold D.J.B., Cao J.,
Fan Z.
Data-driven battery opera-
tion for energy arbitrage
using rainbow deep
reinforcement learning
Energy 2022 S, W
No author name available
[Conference Review]
Sustainable smart cities and
territories international
conference, SSCT 2021
Lecture Notes in Networks
and Systems
2022 S
Dhiman S., Lallotra B., Replacing of steel with
bamboo as reinforcement
with addition of sisal fiber
2022 W
Bekdas G., Yucel M.,
Nigdeli S. M.
Generation of eco-friendly
design for post-tensioned
axially symmetric rein-
forced concrete cylindri-
cal walls by minimizing
of CO2 emission
Structural Design of Tall
and Special Buildings
2022 W
Vandaele M., Stalhammar S. Hope dies, action begins?
The role of hope for
proactive sustainability
engagement among uni-
versity students
International Journal of
Sustainability in Higher
Education
2022 W
Sagar K. V., Jerald J. Real-time automated guided
vehicles scheduling with
markov decision process
and double Q-Learning
algorithm
Materials Today-Proceed-
ings
2022 W
Kalinin M., Ovasapyan T.,
Poltavtseva M.
Application of the learning
automaton model for
ensuring cyber resiliency
Symmetry-Basel 2022 W
Liu Q., Sun S., Rong B.,
Kadoch M.
Intelligent reflective surface
based 6G communications
for sustainable energy
infrastructure
IEEE Wireless Communica-
tions
2021 S, W
Muhammad G., Hossain
M.S.
Deep-reinforcement-
learning-based sustainable
energy distribution for
wireless communication
IEEE Wireless Communica-
tions
2021 S
Grzelczak M., Duch P. Deep reinforcement learn-
ing algorithms for path
planning domain in grid-
like environment
Applied Sciences (Swit-
zerland)
2021 S
Gao A., Wang Q., Liang W.,
Ding Z.
Game combined multi-
agent reinforcement learn-
ing approach for UAV
assisted offloading
IEEE Transactions on
Vehicular Technology
2021 S, W
Atli İ., Ozturk M., Valastro
G.C., Asghar M.Z.
Multi-objective uav
positioning mechanism
for sustainable wireless
connectivity in environ-
ments with forbidden
flying zones
Algorithms 2021 S
Guo L., Li Z., Outbib R. Reinforcement learning
based energy manage-
ment for fuel cell hybrid
electric vehicles
IECON Proceedings
(Industrial Electronics
Conference)
2021 S, W
Kővári B., Szőke L., Bécsi
T., Aradi S., Gáspár P.
Traffic signal control via
reinforcement learning for
reducing global vehicle
emission
Sustainability (Switzerland) 2021 S, W
Rangel-Martinez D., Nigam
K.D.P., Ricardez-Sandoval
L.A.
Machine learning on sus-
tainable energy: A review
and outlook on renewable
energy systems, catalysis,
smart grid and energy
storage
Chemical Engineering
Research and Design
2021 S, W
Choi J.-H., Yang B., Yu
C.W.
Artificial intelligence as
an agent to transform
research paradigms in
building science and
technology
Indoor and Built Environ-
ment
2021 S
Shani P., Chau S., Swei O. All roads lead to sustain-
ability: opportunities
to reduce the life-cycle
cost and global warming
impact of U.S. roadways
Resources, Conservation
and Recycling
2021 S
Jia R., Zhang X., Feng Y.,
Wang T., Lu J., Zheng Z.,
Li M.
Long-term energy collec-
tion in self-sustainable
sensor networks: a deep
Q-learning approach
IEEE Internet of Things
Journal
2021 S, W
Zhao J., Rodriguez M.A.,
Buyya R.
A deep reinforcement
learning approach to
resource management in
hybrid clouds harnessing
renewable energy and task
scheduling
IEEE International Confer-
ence on Cloud Comput-
ing, CLOUD
2021 S, W
Kathirgamanathan A., Man-
gina E., Finn D.P.
Development of a soft actor
critic deep reinforcement
learning approach for har-
nessing energy flexibility
in a large office building
Energy and AI 2021 S, W
Mabina P., Mukoma P.,
Booysen M.J.
Sustainability matchmak-
ing: linking renewable
sources to electric water
heating through machine
learning
Energy and Buildings 2021 S
Chen K., Wang H.,
Valverde-Pérez B., Zhai
S., Vezzaro L., Wang A.
Optimal control towards
sustainable wastewater
treatment plants based on
multi-agent reinforcement
learning
Chemosphere 2021 S, W
Sacco A., Flocco M.,
Esposito F., Marchetto G.
Supporting sustainable
virtual network mutations
with mystique
IEEE Transactions on
Network and Service
Management
2021 S
Munir M.S., Tran N.H.,
Saad W., Hong C.S.
Multi-agent meta-reinforce-
ment learning for self-
powered and sustainable
edge computing systems
IEEE Transactions on
Network and Service
Management
2021 S
Pérez-Pons M.E., Alonso
R.S., García O., Marreiros
G., Corchado J.M.
Deep q-learning and prefer-
ence based multi-agent
system for sustainable
agricultural market
Sensors 2021 S
Zhang X., Manogaran G.,
Muthu B.
IoT enabled integrated
system for green energy
into smart cities
Sustainable Energy Tech-
nologies and Assessments
2021 S, W
Emami-Skardi M.J.,
Momenzadeh N., Kera-
chian R.
Social learning diffusion
and influential stake-
holders identification
in socio-hydrological
environments
Journal of Hydrology 2021 S, W
Almalki A.J., Alsofyani M.,
Alghuried A., Wocjan P.,
Wang L.
Model-based variational
autoencoders with autore-
gressive flows
Proceedings of the 2021
5th World Conference on
Smart Trends in Systems
Security and Sustainabil-
ity, WorldS4 2021
2021 S
Liu B., Han W., Wang E.,
Ma X., Xiong S., Qiao C.,
Wang J.
An efficient message dis-
semination scheme for
cooperative drivings via
multi-agent hierarchical
attention reinforcement
learning
Proceedings - International
Conference on Distributed
Computing Systems
2021 S, W
Ghosh S., De S., Chatterjee
S., Portmann M.
Learning-based adaptive
sensor selection frame-
work for multi-sensing
WSN
IEEE Sensors Journal 2021 S
Li L., Luo Y., Pu L. Q-learning enabled intel-
ligent energy attack in
sustainable wireless com-
munication networks
IEEE International Confer-
ence on Communications
2021 S, W
Sacco A., Esposito F., Mar-
chetto G., Montuschi P.
Sustainable task offload-
ing in UAV networks via
multi-agent reinforcement
learning
IEEE Transactions on
Vehicular Technology
2021 S
Razack A.J., Ajith V., Gupta
R.
A deep reinforcement learn-
ing approach to traffic
signal control
2021 IEEE Conference on
Technologies for Sustain-
ability, SusTech 2021
2021 S
Zhang W., Liu H., Wang F.,
Xu T., Xin H., Dou D.,
Xiong H.
Intelligent electric vehicle
charging recommenda-
tion based on multi-agent
reinforcement learning
The Web Conference 2021 -
Proceedings of the World
Wide Web Conference,
WWW 2021
2021 S, W
Park J., Lee J., Kim T., Ahn
I., Park J.
Co-evolution of predator-
prey ecosystems by
reinforcement learning
agents
Entropy 2021 S, W
Eyni A., Skardi M.J.E.,
Kerachian R.
A regret-based behavioral
model for shared water
resources management:
Application of the corre-
lated equilibrium concept
Science of the Total Envi-
ronment
2021 S, W
Raeisi M., Mahboob A.S. Intelligent control of urban
intersection traffic light
based on reinforcement
learning algorithm
26th International Computer
Conference, Computer
Society of Iran, CSICC
2021
2021 S, W
Piovesan N., Lopez-Perez
D., Miozzo M., Dini P.
Joint load control and
energy sharing for renew-
able powered small base
stations: a machine learn-
ing approach
IEEE Transactions on
Green Communications
and Networking
2021 S
Yin H., Wei J., Zhao H.,
Xiong J., Mei K., Zhang
L., Ren B., Ma D.
An intelligent adaptative
architecture for wireless
communication in com-
plex scenarios
Scientia Sinica Informa-
tionis
2021 S
Leng J., Ruan G., Song Y.,
Liu Q., Fu Y., Ding K.,
Chen X.
A loosely-coupled deep
reinforcement learning
approach for order accept-
ance decision of mass-
individualized printed
circuit board manufactur-
ing in industry 4.0
Journal of Cleaner Produc-
tion
2021 S, W
Baumgart U., Burger M. A reinforcement learn-
ing approach for traffic
control
International Conference
on Vehicle Technology
and Intelligent Transport
Systems, VEHITS - Pro-
ceedings
2021 S, W
Dinh T.H.L., Kaneko M.,
Wakao K., Kawamura K.,
Moriyama T., Takatori Y.
Towards an energy-efficient
DQN-based user asso-
ciation in Sub6GHz/mm
wave integrated networks
Proceedings - 2021 17th
International Conference
on Mobility, Sensing and
Networking, MSN 2021
2021 S
Barth A., Zhang L., Ma O. Cooperation of a team of
heterogeneous swarm
robots for space explora-
tion
Proceedings of the Inter-
national Astronautical
Congress, IAC
2021 S
Al-Jawad A., Comsa I.-S.,
Shah P., Gemikonakli O.,
Trestian R.
REDO: a reinforcement
learning-based dynamic
routing algorithm selec-
tion method for SDN
2021 IEEE Conference on
Network Function Vir-
tualization and Software
Defined Networks, NFV-
SDN 2021 - Proceedings
2021 S, W
Cloud J.M., Nieves R.J.,
Duke A.K., Muller T.J.,
Janmohamed N.A., Buck-
les B.C., Dupuis M.A.
Towards autonomous lunar
resource excavation
via deep reinforcement
learning
Accelerating Space Com-
merce, Exploration, and
New Discovery confer-
ence, ASCEND 2021
2021 S
Isufaj R., Sebastia D.A.,
Piera M.A.
Towards conflict resolution
with deep multi-agent
reinforcement learning
14th USA/Europe Air Traf-
fic Management Research
and Development Semi-
nar, ATM 2021
2021 S
No author name available
[Conference Review]
7th International Confer-
ence on Life System
Modeling and Simulation,
LSMS 2021, and the 7th
International Conference
on Intelligent Computing
for Sustainable Energy
and Environment, ICSEE
2021
Communications in Com-
puter and Information
Science
2021 S
No author name available
[Conference Review]
18th International Confer-
ence on Mobile Systems
and Pervasive Computing,
MobiSPC 2021, the 16th
International Confer-
ence on Future Networks
and Communications,
FNC 2021 and the 11th
International Conference
on Sustainable Energy
Information Technology,
SEIT 2021
Procedia Computer Science 2021 S
Gambin A.F., Angelats E.,
Gonzalez J.S., Miozzo M.,
DIni P.
Sustainable marine ecosys-
tems: deep learning for
water quality assessment
and forecasting
IEEE Access 2021 S, W
No author name available
[Conference Review]
AHFE conferences on
human factors in software
and systems engineering,
artificial intelligence and
social computing, and
energy, 2021
Lecture Notes in Networks
and Systems
2021 S
Serrano J.C., Mula J., Poler
R.
Digital twin for supply
chain master planning in
zero-defect manufacturing
IFIP Advances in Informa-
tion and Communication
Technology
2021 S
Chaudhuri R., Mukherjee
K., Narayanam R., Vallam
R.D.
Collaborative reinforce-
ment learning framework
to model evolution of
cooperation in sequential
social dilemmas
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2021 S, W
Danassis P., Erden Z.D.,
Faltings B.
Improved cooperation by
exploiting a common
signal
Proceedings of the Interna-
tional Joint Conference
on Autonomous Agents
and Multiagent Systems,
AAMAS
2021 S
Daneshvar M., Asadi S.,
Mohammadi-Ivatloo B.
Energy trading possibilities
in the modern multi-car-
rier energy networks
Power Systems 2021 S
Zheng Z., Yan P., Chen Y.,
Cai J., Zhu F.
Increasing crop yield using
agriculture sensing data in
smart plant factory
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2021 S
Musaddiq A., Ali R., Choi
J.-G., Kim B.-S., Kim
S.-W.
Collision observation-based
optimization of low-power
and lossy IoT network
using reinforcement
learning
Computers, Materials and
Continua
2021 S, W
Ballis H., Dimitriou L. Evaluating the performance
of reinforcement learning
signalling strategies for
sustainable urban road
networks
Advances in Intelligent
Systems and Computing
2021 S
Surovik D., Wang K.,
Vespignani M., Bruce J.,
Bekris K.E.
Adaptive tensegrity
locomotion: Controlling
a compliant icosahedron
with symmetry-reduced
reinforcement learning
International Journal of
Robotics Research
2021 S, W
Mandhare P., Yadav J.,
Kharat V., Patil C. Y.
Control and coordination of
self-adaptive traffic signal
using deep reinforcement
learning
International Journal of
Next-Generation Comput-
ing
2021 W
Hamutoglu N. B., Unveren-
Bilgic E. N., Salar H. C.,
Sahin Y. L.
The effect of E-learning
experience on readiness,
attitude, and self-control/
self-management
Journal of Information
Technology Education-
Innovations in Practice
2021 W
de Oliveira A. M. L.,
Marques C. V., Field’s
K. A. P.
Application of mathemati-
cal modeling in the con-
struction of the ecological
house and its perspectives
in teaching
Cadernos Educacao Tecno-
logia e Sociedade
2021 W
Tiwari T., Shastry N., Nandi
A.
Deep learning based lateral
control system
Proceedings - 2020 IEEE
International Symposium
on Sustainable Energy,
Signal Processing and
Cyber Security, iSSSC
2020
2020 S
No author name available
[Conference Review]
Proceedings - 2020 Inter-
national Conference on
Pervasive Artificial Intel-
ligence, ICPAI 2020
Proceedings - 2020 Inter-
national Conference on
Pervasive Artificial Intel-
ligence, ICPAI 2020
2020 S
Pang G., Zhu X., Lu K.,
Peng Z., Deng W.
A simulator for reinforce-
ment learning training in
the recommendation field
Proceedings - 2020 IEEE
International Symposium
on Parallel and Distrib-
uted Processing with
Applications, 2020 IEEE
International Conference
on Big Data and Cloud
Computing, 2020 IEEE
International Symposium
on Social Computing
and Networking and
2020 IEEE International
Conference on Sustain-
able Computing and
Communications, ISPA-
BDCloud-SocialCom-
SustainCom 2020
2020 S
Chemingui Y., Gastli A.,
Ellabban O.
Reinforcement learning-
based school energy
management system
Energies 2020 S, W
Krejci S.E., Ramroop-Butts
S., Torres H.N., Isokpehi
R.D.
Visual literacy intervention
for improving under-
graduate student critical
thinking of global sustain-
ability issues
Sustainability (Switzerland) 2020 S, W
Ballis H., Dimitriou L. Evaluation of reinforcement
learning traffic signalling
strategies for alternative
objectives: implementa-
tion in the network of
nicosia, cyprus
Transport and Telecommu-
nication
2020 S, W
Guliyev H.B., Tomin N.V.,
Ibrahimov F.S.
Methods of intelligent pro-
tection from asymmetrical
conditions in electric
networks
E3S Web of Conferences 2020 S
Wölfle D., Vishwanath A.,
Schmeck H.
A guide for the design of
benchmark environments
for building energy opti-
mization
BuildSys 2020 - Proceed-
ings of the 7th ACM
International Conference
on Systems for Energy-
Efficient Buildings, Cities,
and Transportation
2020 S
Liu H., Zhang C., Guo Q. Data-driven robust voltage/
var control using PV
inverters in active distri-
bution networks
Proceedings - 2020 Inter-
national Conference on
Smart Grids and Energy
Systems, SGES 2020
2020 S, W
Nakamoto Y., Kumalija E.,
Zhang M.
Toward autonomous adap-
tive embedded systems for
sustainable services using
reinforcement learning
(WiP report)
Proceedings - 2020 8th
International Sympo-
sium on Computing and
Networking Workshops,
CANDARW 2020
2020 S
No author name available
[Conference Review]
Proceedings - 2020 8th
International Sympo-
sium on Computing and
Networking Workshops,
CANDARW 2020
Proceedings - 2020 8th
International Sympo-
sium on Computing and
Networking Workshops,
CANDARW 2020
2020 S
Han M., Del Castillo L.A.,
Khairy S., Chen X., Cai
L.X., Lin B., Hou F.
Multi-agent reinforcement
learning for green energy
powered IoT networks
with random access
IEEE Vehicular Technology
Conference
2020 S, W
Henderson P., Hu J., Romoff
J., Brunskill E., Jurafsky
D., Pineau J.
Towards the systematic
reporting of the energy
and carbon footprints of
machine learning
Journal of Machine Learn-
ing Research
2020 S
Tan Z., Karakose M. Comparative study for deep
reinforcement learning
with CNN, RNN, and
LSTM in autonomous
navigation
2020 International Confer-
ence on Data Analytics
for Business and Industry:
Way Towards a Sustain-
able Economy, ICDABI
2020
2020 S
Lee S., Cho Y., Lee Y.H. Injection mold production
sustainable scheduling
using deep reinforcement
learning
Sustainability (Switzerland) 2020 S, W
Han M., Duan J., Khairy S.,
Cai L.X.
Enabling sustainable
underwater IoT networks
with energy harvesting: a
decentralized reinforce-
ment learning approach
IEEE Internet of Things
Journal
2020 S, W
Opalic S.M., Goodwin M.,
Jiao L., Nielsen H.K., Lal
Kolhe M.
A deep reinforcement learn-
ing scheme for battery
energy management
2020 5th International
Conference on Smart and
Sustainable Technologies,
SpliTech 2020
2020 S
Dawn S., Saraogi U.,
Thakur U.S.
Agent-based learning for
auto-navigation within the
virtual city
2020 International Confer-
ence on Computational
Performance Evaluation,
ComPE 2020
2020 S, W
Xi F., Ruan X. Influence of intelligent
environmental art based
on reinforcement learn-
ing on the regionality of
architectural design
Proceedings of the Inter-
national Conference on
Electronics and Sustain-
able Communication
Systems, ICESC 2020
2020 S
Skardi M.J.E., Kerachian R.,
Abdolhay A.
Water and treated waste-
water allocation in urban
areas considering social
attachments
Journal of Hydrology 2020 S, W
Banerjee P.S., Mandal S.N.,
De D., Maiti B.
RL-sleep: temperature
adaptive sleep schedul-
ing using reinforcement
learning for sustainable
connectivity in wireless
sensor networks
Sustainable Computing:
Informatics and Systems
2020 S, W
Piovesan N., Miozzo M.,
Dini P.
Modeling the environment
in deep reinforcement
learning: The case of
energy harvesting base
stations
ICASSP, IEEE International
Conference on Acoustics,
Speech and Signal Pro-
cessing - Proceedings
2020 S, W
Radenkovic M., Ha Huynh
V.S.
Energy-aware opportunistic
charging and energy dis-
tribution for sustainable
vehicular edge and fog
networks
2020 5th International
Conference on Fog and
Mobile Edge Computing,
FMEC 2020
2020 S, W
Ma D., Lan G., Hassan M.,
Hu W., Das S.K.
Sensing, computing, and
communications for
energy harvesting IoTs: a
survey
IEEE Communications
Surveys and Tutorials
2020 S, W
Miozzo M., Piovesan N.,
Dini P.
Coordinated load control of
renewable powered small
base stations through
layered learning
IEEE Transactions on
Green Communications
and Networking
2020 S
No author name available
[Conference Review]
Proceedings of the 19th
international conference
on autonomous agents
and multiagent systems,
AAMAS 2020
Proceedings of the Interna-
tional Joint Conference
on Autonomous Agents
and Multiagent Systems,
AAMAS
2020 S
Yu K.-H., Jaimes E., Wang
C.-C.
Ai based energy optimiza-
tion in association with
class environment
ASME 2020 14th Inter-
national Conference on
Energy Sustainability, ES
2020
2020 S
Kazmi H., Driesen J. Automated demand side
management in buildings
Artificial Intelligence
Techniques for a Scal-
able Energy Transition:
Advanced Methods,
Digital Technologies,
Decision Support Tools,
and Applications
2020 S
No author name available
[Conference Review]
18th international con-
ference on practical
applications of agents
and multi-agent systems,
PAAMS 2020
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2020 S
Bouhamed O., Ghazzai H.,
Besbes H., Massoud Y.
A UAV-assisted data col-
lection for wireless sensor
networks: autonomous
navigation and scheduling
IEEE Access 2020 S, W
Elavarasan D., Durairaj
Vincent P.M.
Crop yield prediction using
deep reinforcement learn-
ing model for sustainable
agrarian applications
IEEE Access 2020 S
Yang T., Zhao L., Li W.,
Zomaya A.Y.
Reinforcement learning in
sustainable energy and
electric systems: a survey
Annual Reviews in Control 2020 S, W
Perera-Villalba J. J.,
Martinez-Borreguero G.,
Naranjo-Correa F. L.,
Mateos-Nunez M.
Validation of a didactic
intervention based on
video games for the
teaching of sustainability
contents in secondary
education
14th International Tech-
nology, Education and
Development Conference
(INTED2020)
2020 W
Worum H., Lillekroken D.,
Ahlsen B., Roaldsen K. S.,
Bergland A.
Otago exercise programme-
from evidence to practice:
a qualitative study of
physiotherapists’ percep-
tions of the importance
of organisational factors
of leadership, context and
culture for knowledge
translation in Norway
BMC Health Services
Research
2020 W
Temesgene D.A., Miozzo
M., Dini P.
Dynamic control of func-
tional splits for energy
harvesting virtual small
cells: A distributed
reinforcement learning
approach
Computer Communications 2019 S, W
Lin K., Lin B., Chen X., Lu
Y., Huang Z., Mo Y.
A time-driven workflow
scheduling strategy for
reasoning tasks of autono-
mous driving in edge
environment
Proceedings - 2019 IEEE
Intl Conf on Parallel and
Distributed Processing
with Applications, Big
Data and Cloud Comput-
ing, Sustainable Comput-
ing and Communications,
Social Computing and
Networking, ISPA/
BDCloud/SustainCom/
SocialCom 2019
2019 S
Bhargavi K., Sathish Babu
B.
Load balancing scheme for
the public cloud using
reinforcement learning
with raven roosting opti-
mization policy (RROP)
CSITSS 2019 - 2019 4th
International Confer-
ence on Computational
Systems and Information
Technology for Sustaina-
ble Solution, Proceedings
2019 S
Strnad F.M., Barfuss W.,
Donges J.F., Heitzig J.
Deep reinforcement
learning in World-Earth
system models to discover
sustainable management
strategies
Chaos 2019 S, W
Firdausiyah N., Taniguchi
E., Qureshi A.G.
Impacts of urban con-
solidation centres for
sustainable city logistics
using adaptive dynamic
programming based
multi-agent simulation
IOP Conference Series:
Earth and Environmental
Science
2019 S
Xu T., Wang N., Lin H.,
Sun Z.
UAV autonomous recon-
naissance route planning
based on deep reinforce-
ment learning
Proceedings of the 2019
IEEE International
Conference on Unmanned
Systems, ICUS 2019
2019 S
Alizadeh Shabestray S.M.,
Abdulhai B.
Multimodal iNtelligent
Deep (MiND) traffic
signal controller
2019 IEEE Intelligent
Transportation Systems
Conference, ITSC 2019
2019 S, W
Ebell N., Gütlein M., Pruck-
ner M.
Sharing of energy among
cooperative households
using distributed multi-
agent reinforcement
learning
Proceedings of 2019 IEEE
PES Innovative Smart
Grid Technologies
Europe, ISGT-Europe
2019
2019 S, W
Vo N.N.Y., He X., Liu S.,
Xu G.
Deep learning for decision
making and the optimiza-
tion of socially respon-
sible investments and
portfolio
Decision Support Systems 2019 S, W
Chang S., Saha N., Castro-
Lacouture D., Yang P.P.-J.
Multivariate relationships
between campus design
parameters and energy
performance using rein-
forcement learning and
parametric modeling
Applied Energy 2019 S, W
Qin F.-B., Xu D. Review of robot manipula-
tion skill models
Zidonghua Xuebao/Acta
Automatica Sinica
2019 S
Shabana Anjum S., Md
Noor R., Ahmedy I., Anisi
M.H.
Energy optimization of sus-
tainable Internet of Things
(IoT) systems using an
energy harvesting medium
access protocol
IOP Conference Series:
Earth and Environmental
Science
2019 S
Liu Q., Liu Z., Xu W., Tang
Q., Zhou Z., Pham D.T.
Human-robot collaboration
in disassembly for sustain-
able manufacturing
International Journal of
Production Research
2019 S, W
Shi B., Yuan H., Shi R. Pricing cloud resource
based on multi-agent rein-
forcement learning in the
competing environment
Proceedings - 16th IEEE
International Symposium
on Parallel and Distrib-
uted Processing with
Applications, 17th IEEE
International Conference
on Ubiquitous Comput-
ing and Communications,
8th IEEE International
Conference on Big Data
and Cloud Computing,
11th IEEE Interna-
tional Conference on
Social Computing and
Networking and 8th IEEE
International Conference
on Sustainable Comput-
ing and Communications,
ISPA/IUCC/BDCloud/
SocialCom/SustainCom
2018
2019 S
Do Q.V., Koo I. Dynamic bandwidth alloca-
tion scheme for wireless
networks with energy
harvesting using actor-
critic deep reinforcement
learning
1st International Conference
on Artificial Intelligence
in Information and Com-
munication, ICAIIC 2019
2019 S, W
Blad C., Koch S., Gane-
swarathas S., Kallesøe
C.S., Bøgh S.
Control of HVAC-systems
with slow thermodynamic
using reinforcement
learning
Procedia Manufacturing 2019 S
Mikhail M., Yacout S.,
Ouali M.-S.
Optimal preventive main-
tenance strategy using
reinforcement learning
Proceedings of the Inter-
national Conference on
Industrial Engineering
and Operations Manage-
ment
2019 S
Chen H., Zhao T., Li C.,
Guo Y.
Green internet of vehicles:
architecture, enabling
technologies, and applica-
tions
IEEE Access 2019 S, W
Chaudhuri R., Vallam R.D.,
Garg S., Mukherjee K.,
Kumar A., Singh S.,
Narayanam R., Mathur A.,
Parija G.
Collaborative reinforce-
ment learning model for
sustainability of coopera-
tion in sequential social
dilemmas
Proceedings of the Interna-
tional Joint Conference
on Autonomous Agents
and Multiagent Systems,
AAMAS
2019 S, W
Park J.Y., Nagy Z. The influence of building
design, sensor placement,
and occupant preferences
on occupant centered
lighting control
Computing in Civil
Engineering 2019: Smart
Cities, Sustainability,
and Resilience - Selected
Papers from the ASCE
International Conference
on Computing in Civil
Engineering 2019
2019 S
No author name available
[Conference Review]
2nd International Confer-
ence on Intelligent Human
Systems Integration, IHSI
2019
Advances in Intelligent
Systems and Computing
2019 S
McLauchlan A., Joao E. Recognising learning’ as an
uncertain source of SEA
effectiveness
Impact Assessment and
Project Appraisal
2019 W
Halim D. A., Karyanto P.,
Sarwono
Education for sustainable
development: student’s
biophilia and the emome
model as an alternative
efforts of enhancement
in the perspectives of
education
2nd International Confer-
ence on Science, Math-
ematics, Environment,
and Education, 2019
2019 W
Saifuddin M. R. B., Logen-
thiran T., Naayagi R. T.,
Woo W. L.
A nano-biased energy man-
agement using reinforced
learning multi-agent on
layered coalition model:
consumer sovereignty
IEEE Access 2019 W
Temesgene D.A., Miozzo
M., Dini P.
Dynamic functional split
selection in energy har-
vesting virtual small cells
using temporal difference
learning
IEEE International Sympo-
sium on Personal, Indoor
and Mobile Radio Com-
munications, PIMRC
2018 S, W
No author name available
[Conference Review]
6th IEEE international
conference on advanced
logistics and transport,
ICALT 2017 - proceed-
ings
6th IEEE International
Conference on Advanced
Logistics and Transport,
ICALT 2017 - Proceed-
ings
2018 S
Prauzek M., Mourcet
N.R.A., Hlavica J.,
Musilek P.
Q-learning algorithm for
energy management in
solar powered embedded
monitoring systems
2018 IEEE Congress on
Evolutionary Computa-
tion, CEC 2018 - Proceed-
ings
2018 S, W
Ghanshala K.K., Sharma
S., Mohan S., Nautiyal L.,
Mishra P., Joshi R.C.
Self-organizing sustainable
spectrum management
methodology in cognitive
radio vehicular Adhoc
network (CRAVENET)
environment: a reinforce-
ment learning approach
ICSCCC 2018 - 1st Inter-
national Conference on
Secure Cyber Computing
and Communications
2018 S, W
Aziz H.M.A., Zhu F., Ukku-
suri S.V.
Learning-based traffic
signal control algorithms
with neighborhood
information sharing: an
application for sustainable
mobility
Journal of Intelligent
Transportation Systems:
Technology, Planning,
and Operations
2018 S
Laubis K., Knöll F., Zeidler
V., Simko V.
Crowdsensing-based road
condition monitoring ser-
vice: an assessment of its
managerial implications
to road authorities
Lecture Notes in Business
Information Processing
2018 S
Hwangbo S., Yoo C. A methodology of a hybrid
hydrogen supply network
(HHSN) under alternative
energy resources (AERs)
of hydrogen footprint
constraint for sustainable
energy production (SEP)
Computer Aided Chemical
Engineering
2018 S
Ganapathi Subramanian S.,
Crowley M.
Combining MCTS and A3C
for prediction of spatially
spreading processes in
forest wildfire settings
Lecture Notes in Com-
puter Science (including
subseries Lecture Notes
in Artificial Intelligence
and Lecture Notes in
Bioinformatics)
2018 S
Serrat R., Alcala M.,
Delgado-Aguilar M.,
Tarres J., Oliver-Ortega
H., Mutje P.
Case study: development
of biodegradable hybrid
materials as a substitute
for glass fiber reinforced
composites
EDULEARN18: 10th
International Conference
on Education and New
Learning Technologies
2018 W
Huang J., Gao Y., Lu S.,
Zhao X. B., Deng Y. D.,
Gu M.
Energy-efficient automatic
train driving by learning
driving patterns
2018 W
Miozzo M., Giupponi L.,
Rossi M., Dini P.
Switch-On/off policies for
energy harvesting small
cells through distributed
Q-learning
2017 IEEE Wireless Com-
munications and Network-
ing Conference Work-
shops, WCNCW 2017
2017 S, W
Lindkvist E., Ekeberg Ö.,
Norberg J.
Strategies for sustainable
management of renewable
resources during environ-
mental change
Proceedings of the Royal
Society B: Biological
Sciences
2017 S
Perolat J., Leibo J.Z., Zam-
baldi V., Beattie C., Tuyls
K., Graepel T.
A multi-agent reinforce-
ment learning model of
common-pool resource
appropriation
Advances in Neural
Information Processing
Systems
2017 S, W
Dos Santos Mignon A., De
Azevedo Da Rocha R.L.
An adaptive implementation of
ε-greedy in reinforcement learning
Procedia Computer Science 2017 S
Ciric D., Todorovic V.,
Lalic B.
Boosting student entrepre-
neurship through idealab
concept in Western
Balkan countries
9th International Confer-
ence on Education and
New Learning Technolo-
gies (EDULEARN17)
2017 W
Sheikhi A., Rayati M.,
Ranjbar A.M.
Dynamic load management
for a residential customer;
reinforcement learning
approach
Sustainable Cities and
Society
2016 S, W
Chen H., Li X., Zhao F. A reinforcement learning-
based sleep scheduling
algorithm for desired area
coverage in solar-powered
wireless sensor networks
IEEE Sensors Journal 2016 S, W
Mathlouthi S., Trabelsi
F.B.F., Zribi C.B.O.
A novel approach based on
reinforcement learning
for anaphora resolution in
Arabic texts
Proceedings of the 28th
International Business
Information Management
Association Conference -
Vision 2020: Innovation
Management, Develop-
ment Sustainability, and
Competitive Economic
Growth
2016 S
Kaiser, A., Kragulj, F.,
Grisold, T.
Identifying human needs in
organizations to develop
sustainable intellectual
capital - reflections on
best practices
Proceedings of the 8th
European Conference on
Intellectual Capital (ECIC
2016)
2016 W
Atesok K., Satava R. M.,
Van Heest A., Hogan M.
V., Pedowitz R. A., Fu F.
H., Sitnikov I., Marsh J.
L., Hurwitz S. R.
Retention of skills after
simulation-based training
in orthopaedic surgery
Journal of the American
Academy of Orthopaedic
Surgeons
2016 W
De Gracia A., Fernández
C., Castell A., Mateu C.,
Cabeza L.F.
Control of a PCM ventilated
facade using reinforce-
ment learning techniques
Energy and Buildings 2015 S
Soares I.B., De Hauwere
Y.-M., Januarius K., Brys
T., Salvant T., Nowe A.
Departure MANagement
with a reinforcement
learning approach:
respecting CFMU slots
IEEE Conference on Intel-
ligent Transportation
Systems, Proceedings,
ITSC
2015 S
Jin J., Ma X. | Adaptive group-based signal control using reinforcement learning with eligibility traces | IEEE Conference on Intelligent Transportation Systems, Proceedings, ITSC | 2015 | S
Hsu R.C., Lin T.-H., Chen S.-M., Liu C.-T. | Dynamic energy management of energy harvesting wireless sensor nodes using fuzzy inference system with reinforcement learning | Proceeding - 2015 IEEE International Conference on Industrial Informatics, INDIN 2015 | 2015 | S, W
Miozzo M., Giupponi L., Rossi M., Dini P. | Distributed Q-learning for energy harvesting heterogeneous networks | 2015 IEEE International Conference on Communication Workshop, ICCW 2015 | 2015 | S
Vankov P., Vankova D. | Sustainable Educational & Emotional Model - An Experience from Bulgaria | EDULEARN15: 7th International Conference on Education and New Learning Technologies | 2015 | W
Morales R.C., Sotomayor J.J., Hochstetter J., Figueroa D. | REFCOMTIC Interactive Manual for The Strengthen of Transversal Teaching Competences | INTED2015: 9th International Technology, Education and Development Conference | 2015 | W
Geller E.S. | Seven Life Lessons From Humanistic Behaviorism: How to Bring the Best Out of Yourself and Others | Journal of Organizational Behavior Management | 2015 | W
Comşa I.S., Aydin M., Zhang S., Kuonen P., Wagen J.-F., Lu Y. | Scheduling policies based on dynamic throughput and fairness tradeoff control in LTE-A networks | Proceedings - Conference on Local Computer Networks, LCN | 2014 | S, W
Hsu R.C., Liu C.-T., Wang H.-L. | A reinforcement learning-based ToD provisioning dynamic power management for sustainable operation of energy harvesting wireless sensor node | IEEE Transactions on Emerging Topics in Computing | 2014 | S
Urieli D., Stone P. | TacTex’13: A champion adaptive power trading agent | AAMAS 2014 Workshop on Adaptive and Learning Agents, ALA 2014 | 2014 | S, W
Urieli D., Stone P. | TacTex’13: A champion adaptive power trading agent | 13th International Conference on Autonomous Agents and Multiagent Systems, AAMAS 2014 | 2014 | S
Urieli D., Stone P. | TacTex’13: A champion adaptive power trading agent | Proceedings of the National Conference on Artificial Intelligence | 2014 | S, W
Lindkvist E., Norberg J. | Modeling experiential learning: The challenges posed by threshold dynamics for sustainable renewable resource management | Ecological Economics | 2014 | S
Crowley M. | Using equilibrium policy gradients for spatiotemporal planning in forest ecosystem management | IEEE Transactions on Computers | 2014 | S
Bielskis A.A., Guseinoviene E., Zutautas L., Drungilas D., Dzemydiene D., Gricius G. | Modeling of Ambient Comfort Affect Reward based on multi-agents in cloud interconnection environment for developing the sustainable home controller | 2013 8th International Conference and Exhibition on Ecological Vehicles and Renewable Energies, EVER 2013 | 2013 | S
Bielskis A.A., Guseinoviene E., Dzemydiene D., Drungilas D., Gricius G. | Ambient lighting controller based on reinforcement learning components of multi-agents | Elektronika ir Elektrotechnika | 2012 | S, W
No author name available [Conference Review] | APBITM 2011 - Proceedings 2011 IEEE International Summer Conference of Asia Pacific Business Innovation and Technology Management | APBITM 2011 - Proceedings 2011 IEEE International Summer Conference of Asia Pacific Business Innovation and Technology Management | 2011 | S
No author name available [Conference Review] | IEEE 2011 EnergyTech, ENERGYTECH 2011 | IEEE 2011 EnergyTech, ENERGYTECH 2011 | 2011 | S
Anyanwu L.O., Keengwe J., Arome G.A. | Scalable intrusion detection with recurrent neural networks | ITNG2010 - 7th International Conference on Information Technology: New Generations | 2010 | S
Sabbadin R., Spring D., Bergonnier E. | A reinforcement-learning application to biodiversity conservation in Costa-Rican forest | MODSIM07 - Land, Water and Environmental Management: Integrated Systems for Sustainability, Proceedings | 2007 | S
Chadès I., Martin T.G., Curtis J.M.R., Barreto C. | Managing interacting species: A reinforcement learning decision theoretic approach | MODSIM07 - Land, Water and Environmental Management: Integrated Systems for Sustainability, Proceedings | 2007 | S
Bielskis A.A., Denisovas V., Ramasauskas O. | Ambient Intelligence of e-possibilities perception for sustainable development | 4th International Conference Citizens and Governance for Sustainable Development | 2006 | W
Salden A.H., Kempen Ma. | Sustainable cybernetics systems: Backbones of ambient intelligent environments | Ambient Intelligence: A Novel Paradigm | 2005 | S
Chen L., Evans T., Anand S., Boufford J., Brown H., Chowdhury M., Cueto M., Dare L., Dussault G., Elzinga G., Fee E., Habte D., Hanvoravongchai P., Jacobs M., Kurowski C., Michael S., Pablos-Mendez A., Sewankambo N., Solimano G., Stilwell B., de Waal A., Wibulpolprasert S. | Human resources for health: overcoming the crisis | LANCET | 2004 | W
Salden A., de Heer J. | Natural anticipation and selection of attention within sustainable intelligent multimodal systems by collective intelligent agents | 8th World Multi-Conference on Systemics, Cybernetics and Informatics, Vol IX, Proceedings: Computer Science and Engineering: I | 2004 | W
Author contributions MZ: Conceptualization, Investigation, Methodology, Writing—Original Draft. AC,
DLT and AF: Supervision, Conceptualization, Investigation, Validation, Writing—Review and Editing. LM:
Conceptualization, Validation, Writing—Review and Editing.
Funding Open access funding provided by Università degli Studi di Verona within the CRUI-CARE
Agreement.
Declarations
Competing interests The authors declare no competing interests.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
References
Ajao L, Apeh S (2023) Secure edge computing vulnerabilities in smart cities sustainability using Petri net and genetic algorithm-based reinforcement learning. Intell Syst Appl. https://doi.org/10.1016/j.iswa.2023.200216
Al-Jawad A, Comşa I, Shah P, et al (2021) REDO: a reinforcement learning-based dynamic routing algorithm selection method for SDN. In: IEEE conference on network function virtualization and software defined networks (NFV-SDN), pp 54–59. https://doi.org/10.1109/NFV-SDN53031.2021.9665140
Alanne K, Sierla S (2022) An overview of machine learning applications for smart buildings. Sustain Cities Soc. https://doi.org/10.1016/j.scs.2021.103445
AlizadehShabestray SM, Abdulhai B (2019) Multimodal iNtelligent Deep (MiND) traffic signal controller.
In: IEEE intelligent transportation systems conference (ITSC), pp 4532–4539, https:// doi. org/ 10. 1109/
ITSC. 2019. 89174 93
Auffenberg F, Snow S, Stein S etal (2017) A comfort-based approach to smart heating and air conditioning.
ACM Trans Intell Syst Technol. https:// doi. org/ 10. 1145/ 30577 30
Aziz H, Zhu F, Ukkusuri S (2018) Learning-based traffic signal control algorithms with neighborhood
information sharing: an application for sustainable mobility. J Intell Trans Syst Technol Plan Operat.
https:// doi. org/ 10. 1080/ 15472 450. 2017. 13875 46
Azzalini D, Castellini A, Luperto M, et al (2020) HMMs for anomaly detection in autonomous robots.
In: Proceedings of the 2020 international conference on autonomous agents and multiagent systems,
AAMAS, p 105–113, https:// doi. org/ 10. 5555/ 33987 61. 33987 79
Bazzan ALC, Peleteiro-Ramallo A, Burguillo-Rial JC (2011) Learning to cooperate in the iterated pris-
oner’s dilemma by means of social attachments. J Braz Comput Soc 17(3):163–174. https:// doi. org/ 10.
1007/ s13173- 011- 0038-2
Bianchi F, Castellini A, Tarocco P, etal (2019) Load forecasting in district heating networks: Model com-
parison on a real-world case study. In: Machine learning, optimization, and data science: 5th interna-
tional conference, LOD 2019, proceedings. Springer-Verlag, p 553–565, https:// doi. org/ 10. 1007/ 978-3-
030- 37599-7_ 46
Bianchi F, Corsi D, Marzari L, etal (2023) Safe and efficient reinforcement learning for environmental
monitoring. In: Proceedings of Ital-IA 2023: 3rd National Conference on Artificial Intelligence, CEUR
Workshop Proceedings, vol 3486. CEUR-WS.org, pp 2610–615
Bistaffa F, Farinelli A, Chalkiadakis G etal (2017) A cooperative game-theoretic approach to the social
ridesharing problem. Artif Intell 246:86–117. https:// doi. org/ 10. 1016/j. artint. 2017. 02. 004
Bistaffa F, Blum C, Cerquides J etal (2021) A computational approach to quantify the benefits of rideshar-
ing for policy makers and travellers. IEEE Trans Intell Transport Syst 22(1):119–130. https:// doi. org/
10. 1109/ TITS. 2019. 29549 82
Blij NHVD, Chaifouroosh D, Cañizares CA, etal (2020) Improved power flow methods for DC grids. In:
29th IEEE international symposium on industrial electronics, ISIE. IEEE, pp 1135–1140, https:// doi.
org/ 10. 1109/ ISIE4 5063. 2020. 91525 70
Bouhamed O, Ghazzai H, Besbes H etal (2020) A UAV-assisted data collection for wireless sensor net-
works: Autonomous navigation and scheduling. IEEE Access. https:// doi. org/ 10. 1109/ ACCESS. 2020.
30025 38
Brown J, Abate A, Rogers A (2021) QUILT: quantify, infer and label the thermal efficiency of heating and
cooling residential homes. In: BuildSys ’21: The 8th ACM international conference on systems for
energy-efficient buildings, cities, and transportation. ACM, pp 51–60, https:// doi. org/ 10. 1145/ 34866
11. 34866 53
Capuzzo M, Zanella A, Zuccotto M, etal (2022) IoT systems for healthy and safe life environments. In:
IEEE forum on research and technologies for society and industry innovation (RTSI), pp 31–37,
https:// doi. org/ 10. 1109/ RTSI5 5261. 2022. 99051 93
Castellini A, Chalkiadakis G, Farinelli A (2019) Influence of state-variable constraints on partially observ-
able monte carlo planning. In: Proceedings of the twenty-eighth international joint conference on arti-
ficial intelligence, IJCAI 2019. International Joint Conferences on Artificial Intelligence Organization,
pp 5540–5546, https:// doi. org/ 10. 24963/ ijcai. 2019/ 769
Castellini A, Bicego M, Masillo F et al (2020) Time series segmentation for state-model generation of
autonomous aquatic drones: a systematic framework. Eng Appl Artif Intell. https:// doi. org/ 10. 1016/j.
engap pai. 2020. 103499
Castellini A, Bianchi F, Farinelli A (2021) Predictive model generation for load forecasting in district heat-
ing networks. IEEE Intell Syst 36(4):86–95. https:// doi. org/ 10. 1109/ MIS. 2020. 30059 03
Castellini A, Bianchi F, Farinelli A (2022) Generation and interpretation of parsimonious predictive models
for load forecasting in smart heating networks. Appl Intell 52(9):9621–9637. https:// doi. org/ 10. 1007/
s10489- 021- 02949-4
Castellini A, Bianchi F, Zorzi E, etal (2023) Scalable safe policy improvement via Monte Carlo tree search.
In: Proceedings of the 40th international conference on machine learning, proceedings of machine
learning research, vol 202. PMLR, pp 3732–3756
Charef N, Ben Mnaouer A, Aloqaily M etal (2023) Artificial intelligence implication on energy sustain-
ability in internet of things: a survey. Info Process Manag. https:// doi. org/ 10. 1016/j. ipm. 2022. 103212
Chen H, Li X, Zhao F (2016) A reinforcement learning-based sleep scheduling algorithm for desired area
coverage in solar-powered wireless sensor networks. IEEE Sensors Journal. https:// doi. org/ 10. 1109/
JSEN. 2016. 25170 84
Chen H, Zhao T, Li C et al (2019) Green internet of vehicles: Architecture, enabling technologies, and applications. IEEE Access. https://doi.org/10.1109/ACCESS.2019.2958175
Chen K, Wang H, Valverde-Pérez B et al (2021) Optimal control towards sustainable wastewater treatment plants based on multi-agent reinforcement learning. Chemosphere. https://doi.org/10.1016/j.chemosphere.2021.130498
De Gracia A, Fernández C, Castell A et al (2015) Control of a PCM ventilated facade using reinforcement learning techniques. Energy Build. https://doi.org/10.1016/j.enbuild.2015.06.045
Elavarasan D, Durairaj Vincent P (2020) Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access. https://doi.org/10.1109/ACCESS.2020.2992480
Emamjomehzadeh O, Kerachian R, Emami-Skardi M et al (2023) Combining urban metabolism and reinforcement learning concepts for sustainable water resources management: a nexus approach. J Environ Manag. https://doi.org/10.1016/j.jenvman.2022.117046
Feng Y, Zhang X, Jia R et al (2023) Intelligent trajectory design for mobile energy harvesting and data transmission. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2022.3202252
Gao Y, Chang D, Chen CH (2023) A digital twin-based approach for optimizing operation energy consumption at automated container terminals. J Clean Prod. https://doi.org/10.1016/j.jclepro.2022.135782
Giri MK, Majumder S (2022) Deep Q-learning based optimal resource allocation method for energy harvested cognitive radio networks. Phys Commun. https://doi.org/10.1016/j.phycom.2022.101766
Goodland R (1995) The concept of environmental sustainability. Ann Rev Ecol Syst 26(1):1–24. https://doi.org/10.1146/annurev.es.26.110195.000245
Gu Z, Liu Z, Wang Q et al (2023) Reinforcement learning-based approach for minimizing energy loss of driving platoon decisions. Sensors. https://doi.org/10.3390/s23084176
Han M, Duan J, Khairy S et al (2020) Enabling sustainable underwater IoT networks with energy harvesting: a decentralized reinforcement learning approach. IEEE Internet Things J. https://doi.org/10.1109/JIOT.2020.2990733
Harrold D, Cao J, Fan Z (2022) Data-driven battery operation for energy arbitrage using rainbow deep reinforcement learning. Energy. https://doi.org/10.1016/j.energy.2021.121958
Hausknecht M, Stone P (2015) Deep recurrent Q-learning for partially observable MDPs. Preprint at https://arxiv.org/abs/1507.06527
Heinzelman W, Chandrakasan A, Balakrishnan H (2002) An application-specific protocol architecture for wireless microsensor networks. IEEE Trans Wireless Commun 1(4):660–670. https://doi.org/10.1109/TWC.2002.804190
Hessel M, Modayil J, van Hasselt H, et al (2018) Rainbow: combining improvements in deep reinforcement learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 3215–3222
Himeur Y, Elnour M, Fadli F et al (2022) AI-big data analytics for building automation and management systems: a survey, actual challenges and future perspectives. Artif Intell Rev 56(6):4929–5021. https://doi.org/10.1007/s10462-022-10286-2
Hsu R, Liu CT, Wang HL (2014) A reinforcement learning-based ToD provisioning dynamic power management for sustainable operation of energy harvesting wireless sensor node. IEEE Trans Emerg Topics Comput. https://doi.org/10.1109/TETC.2014.2316518
Huo D, Sari Y, Kealey R et al (2023) Reinforcement learning-based fleet dispatching for greenhouse gas emission reduction in open-pit mining operations. Resour Conserv Recycl. https://doi.org/10.1016/j.resconrec.2022.106664
Jendoubi I, Bouffard F (2022) Data-driven sustainable distributed energy resources’ control based on multi-agent deep reinforcement learning. Sustain Energy Grids Netw. https://doi.org/10.1016/j.segan.2022.100919
Kaelbling LP, Littman ML, Cassandra AR (1998) Planning and acting in partially observable stochastic domains. Artif Intell 101(1–2):99–134. https://doi.org/10.1016/S0004-3702(98)00023-X
Kathirgamanathan A, Mangina E, Finn D (2021) Development of a soft actor critic deep reinforcement learning approach for harnessing energy flexibility in a large office building. Energy AI. https://doi.org/10.1016/j.egyai.2021.100101
Khalid M, Wang L, Wang K et al (2023) Deep reinforcement learning-based long-range autonomous valet parking for smart cities. Sustain Cities Soc. https://doi.org/10.1016/j.scs.2022.104311
Koufakis AM, Rigas ES, Bassiliades N et al (2020) Offline and online electric vehicle charging scheduling with V2V energy transfer. IEEE Trans Intell Transport Syst 21(5):2128–2138. https://doi.org/10.1109/TITS.2019.2914087
LeCun Y (1989) Generalization and network design strategies. Connect Perspect 19(143–155):18
Leng J, Ruan G, Song Y et al (2021) A loosely-coupled deep reinforcement learning approach for order acceptance decision of mass-individualized printed circuit board manufacturing in industry 4.0. J Clean Prod. https://doi.org/10.1016/j.jclepro.2020.124405
Li C, Bai L, Yao L et al (2023) A bibliometric analysis and review on reinforcement learning for transportation applications. Transportmetrica B. https://doi.org/10.1080/21680566.2023.2179461
Lillicrap TP, Hunt JJ, Pritzel A, et al (2016) Continuous control with deep reinforcement learning. In: International conference on learning representations, ICLR
Liu Q, Sun S, Rong B et al (2021) Intelligent reflective surface based 6G communications for sustainable energy infrastructure. IEEE Wireless Commun. https://doi.org/10.1109/MWC.016.2100179
Lowe R, Wu Y, Tamar A, et al (2017) Multi-agent actor-critic for mixed cooperative-competitive environments. In: Proceedings of the international conference on neural information processing systems, NIPS, p 6382–6393
Ma D, Lan G, Hassan M et al (2020) Sensing, computing, and communications for energy harvesting IoTs: a survey. IEEE Commun Surv Tutor 22(2):1222–1250. https://doi.org/10.1109/COMST.2019.2962526
Mabina P, Mukoma P, Booysen M (2021) Sustainability matchmaking: linking renewable sources to electric water heating through machine learning. Energy Build. https://doi.org/10.1016/j.enbuild.2021.111085
Marchesini E, Corsi D, Farinelli A (2021) Benchmarking safe deep reinforcement learning in aquatic navigation. In: IEEE/RSJ international conference on intelligent robots and systems, IROS. IEEE, pp 5590–5595. https://doi.org/10.1109/IROS51168.2021.9635925
Mazzi G, Castellini A, Farinelli A (2021) Rule-based shielding for partially observable monte-carlo planning. In: Proceedings of the international conference on automated planning and scheduling, pp 243–251. https://doi.org/10.1609/icaps.v31i1.15968
Mazzi G, Castellini A, Farinelli A (2023) Risk-aware shielding of partially observable monte carlo planning policies. Artif Intell 324:103987
Miozzo M, Giupponi L, Rossi M, et al (2015) Distributed Q-learning for energy harvesting heterogeneous networks. In: IEEE international conference on communication workshop (ICCW), pp 2006–2011. https://doi.org/10.1109/ICCW.2015.7247475
Miozzo M, Giupponi L, Rossi M, et al (2017) Switch-on/off policies for energy harvesting small cells through distributed Q-learning. In: IEEE wireless communications and networking conference workshops (WCNCW), pp 1–6. https://doi.org/10.1109/WCNCW.2017.7919075
Mischos S, Dalagdi E, Vrakas D (2023) Intelligent energy management systems: a review. Artif Intell Rev. https://doi.org/10.1007/s10462-023-10441-3
Mnih V, Kavukcuoglu K, Silver D et al (2015) Human-level control through deep reinforcement learning. Nature 518(7540):529–533. https://doi.org/10.1038/nature14236
Moerland TM, Broekens J, Plaat A, et al (2020) Model-based reinforcement learning: a survey. arXiv abs/2206.09328. https://doi.org/10.48550/ARXIV.2206.09328
Ng A, Harada D, Russell SJ (1999) Policy invariance under reward transformations: theory and application to reward shaping. In: Proceedings of the international conference on machine learning, ICML, p 278–287
Orfanoudakis S, Chalkiadakis G (2023) A novel aggregation framework for the efficient integration of distributed energy resources in the smart grid. In: Proceedings of the 2023 international conference on autonomous agents and multiagent systems. AAMAS. ACM, pp 2514–2516. https://doi.org/10.5555/3545946.3598986
Ounoughi C, Touibi G, Yahia S (2022) EcoLight: eco-friendly traffic signal control driven by urban noise prediction. Lecture Notes Comput Sci. https://doi.org/10.1007/978-3-031-12423-5_16
Panagopoulos AA, Alam M, Rogers A, et al (2015) AdaHeat: a general adaptive intelligent agent for domestic heating control. In: Proceedings of the 2015 international conference on autonomous agents and multiagent systems, AAMAS. ACM, pp 1295–1303
Perianes-Rodriguez A, Waltman L, van Eck NJ (2016) Constructing bibliometric networks: a comparison between full and fractional counting. J Info 10(4):1178–1195. https://doi.org/10.1016/j.joi.2016.10.006
Puterman ML (1994) Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, Hoboken
Radini S, Marinelli E, Akyol Ç et al (2021) Urban water-energy-food-climate nexus in integrated wastewater and reuse systems: cyber-physical framework and innovations. Appl Energy 298:117268
Rampini L, Re Cecconi F (2022) Artificial intelligence in construction asset management: A review of present status, challenges and future opportunities. J Info Technol Construct. https://doi.org/10.36680/j.itcon.2022.043
Rangel-Martinez D, Nigam K, Ricardez-Sandoval L (2021) Machine learning on sustainable energy: a review and outlook on renewable energy systems, catalysis, smart grid and energy storage. Chem Eng Res Design. https://doi.org/10.1016/j.cherd.2021.08.013
Roncalli M, Bistaffa F, Farinelli A (2019) Decentralized power distribution in the smart grid with ancillary lines. Mobile Netw Appl 24(5):1654–1662. https://doi.org/10.1007/s11036-017-0893-y
Rumelhart DE, Hinton GE, Williams RJ (1986) Learning representations by back-propagating errors. Nature 323(6088):533–536. https://doi.org/10.1038/323533a0
Sabet S, Farooq B (2022) Green vehicle routing problem: State of the art and future directions. IEEE Access 10:101622–101642. https://doi.org/10.1109/ACCESS.2022.3208899
Sacco A, Esposito F, Marchetto G et al (2021) Sustainable task offloading in UAV networks via multi-agent reinforcement learning. IEEE Trans Vehicul Technol. https://doi.org/10.1109/TVT.2021.3074304
Shaw R, Howley E, Barrett E (2022) Applying reinforcement learning towards automating energy efficient virtual machine consolidation in cloud data centers. Info Syst. https://doi.org/10.1016/j.is.2021.101722
Sheikhi A, Rayati M, Ranjbar A (2016) Dynamic load management for a residential customer; reinforcement learning approach. Sustain Cities Soc. https://doi.org/10.1016/j.scs.2016.04.001
Silver D, Huang A, Maddison CJ et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature. https://doi.org/10.1038/nature16961
Silver D, Schrittwieser J, Simonyan K et al (2017) Mastering the game of Go without human knowledge. Nature. https://doi.org/10.1038/nature24270
Simão TD, Suilen M, Jansen N (2023) Safe policy improvement for POMDPs via finite-state controllers. Proc AAAI Conf Artif Intell 37(12):15109–15117. https://doi.org/10.1609/aaai.v37i12.26763
Sivamayil K, Rajasekar E, Aljafari B et al (2023) A systematic study on reinforcement learning based applications. Energies. https://doi.org/10.3390/en16031512
Skardi M, Kerachian R, Abdolhay A (2020) Water and treated wastewater allocation in urban areas considering social attachments. J Hydrol. https://doi.org/10.1016/j.jhydrol.2020.124757
Steccanella L, Bloisi D, Castellini A et al (2020) Waterline and obstacle detection in images from low-cost autonomous boats for environmental monitoring. Robot Auton Syst 124:103346
Sultanuddin S, Vibin R, Rajesh Kumar A et al (2023) Development of improved reinforcement learning smart charging strategy for electric vehicle fleet. J Energy Storage. https://doi.org/10.1016/j.est.2023.106987
Sutton RS, Barto AG (2018) Reinforcement learning: an introduction. A Bradford Book/MIT Press, Cambridge, MA
United Nations (2015) Transforming our world: the 2030 agenda for sustainable development
van Hasselt H, Guez A, Silver D (2016) Deep reinforcement learning with double Q-learning. In: Proceedings of the AAAI conference on artificial intelligence, pp 2094–2100. https://doi.org/10.1609/aaai.v30i1.10295
Venkataswamy V, Grigsby J, Grimshaw A et al (2023) RARE: renewable energy aware resource management in datacenters. Lecture Notes Comput Sci. https://doi.org/10.1007/978-3-031-22698-4_6
Wang JJ, Wang L (2022) A cooperative memetic algorithm with learning-based agent for energy-aware distributed hybrid flow-shop scheduling. IEEE Trans Evol Comput. https://doi.org/10.1109/TEVC.2021.3106168
Watkins CJCH (1989) Learning from delayed rewards. King’s College, Cambridge
Yang T, Zhao L, Li W et al (2020) Reinforcement learning in sustainable energy and electric systems: a survey. Ann Rev Control 49:145–163. https://doi.org/10.1016/j.arcontrol.2020.03.001
Yao R, Hu Y, Varga L (2023) Applications of agent-based methods in multi-energy systems—a systematic literature review. Energies. https://doi.org/10.3390/en16052456
Zhang W, Liu H, Wang F, et al (2021a) Intelligent electric vehicle charging recommendation based on multi-agent reinforcement learning. In: Proceedings of the web conference, WWW, p 1856–1867. https://doi.org/10.1145/3442381.3449934
Zhang X, Manogaran G, Muthu B (2021b) IoT enabled integrated system for green energy into smart cities. Sustain Energy Technol Assess. https://doi.org/10.1016/j.seta.2021.101208
Zuccotto M, Castellini A, Farinelli A (2022a) Learning state-variable relationships for improving POMCP performance. In: Proceedings of the 37th ACM/SIGAPP symposium on applied computing. Association for Computing Machinery, SAC, p 739–747
Zuccotto M, Piccinelli M, Castellini A et al (2022b) Learning state-variable relationships in POMCP: a framework for mobile robots. Front Robotics AI 2022:183
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.