RESEARCH ARTICLE
Learning to shift load under uncertain production
in the smart grid
Mohsen Ghaffari¹ | Mohsen Afsharchi¹,²

¹Department of Computer Science and Information Technology, Institute for Advanced Studies in Basic Sciences Zanjan, Zanjan, Iran
²Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran

Correspondence
Mohsen Afsharchi, Department of Electrical and Computer Engineering, University of Zanjan, Zanjan, Iran.
Summary
Demand-side management (DSM) enables customers to decide consciously how to seek and obtain power from the grid. The prevailing method in DSM is load shifting: the grid is assisted by reducing load demands during peak hours and moving those demands to off-peak hours, so that they can be met by the available production sources and many online load alterations can be precluded. The present article develops three approaches based on centralized multiagent reinforcement learning (CMARL), wherein the grid is modeled as a cooperative game. In the proposed methods, learning takes place in a center agent, so the agents need not communicate with each other throughout the learning process. The implementation results indicate that the proposed procedures optimize grid performance by diminishing customer costs and enhancing security through minimizing inter-customer interactions.
KEYWORDS
centralized multiagent reinforcement learning, demand-side management, game theory, load
shifting, smart grid
1 | INTRODUCTION
A smart grid, consisting of renewable energy sources, storage units, and power equipment, transfers energy from pro-
ducers to consumers via two-way digital technology so as to decrease the production cost and increase reliability
through controlling energy consumption. Recent studies have mainly focused on infrastructure, management, and smart grid security. In these grids, control methodologies and communication technologies are applied smartly at both the distribution and transmission levels so that electrical energy can be supplied.
In recent years, an objective of smart grid constructions has been to provide the grid with a demand-side management
(DSM) system while considering the optimized performance of the existing production sources and better load demand
management under diverse conditions. In this approach, consumption load is altered in a way that the grid sustainability
can be increased while air pollution can be decreased. In principle, using DSM in smart grids results in increased general
efficiency, security, and sustainability according to the maximum capacity of the current infrastructure.
List of Symbols and Abbreviations: mp, marginal production; 𝒩, set of customers; d_i^h, demand of the ith customer in hour h; d^h, total energy demand of all customers; g^h, suitable amount of load for the grid; PH, set of peak hours; PH′, set of off-peak hours; β, load reduction factor; ED, load demands from the utility company's point of view; r_i^h, amount of load that each customer should shift; λ, customer's satisfaction factor for participating in load shifting; c_i^h, price of each unit of energy in hour h for the ith customer; B, locational marginal pricing factor; S, set of states; A, set of actions; R, reward function; T, transition function; Δ, probability distribution over the set of states; α, learning factor; γ, discount factor; CMARL, centralized multiagent reinforcement learning; DSM, demand-side management; MARL, multiagent reinforcement learning; MDP, Markov decision process; SVR, support vector regression.
Received: 28 March 2020 Revised: 28 July 2020 Accepted: 27 November 2020
DOI: 10.1002/2050-7038.12748
Int Trans Electr Energ Syst. 2021;31:e12748. wileyonlinelibrary.com/journal/etep © 2020 John Wiley & Sons Ltd 1of18
Smart pricing is a unique feature of smart grids. It can persuade customers to control energy consumption by increasing prices during peak hours and by offering incentive schemes across the grid. Furthermore, DSM with smart pricing allows customers to make their choices considering their own economic interests as well as the grid's interests, or, in load-volume-management terms, to plan how their electrical demands are scheduled during all hours of the day. In other words, by being apprised of the hourly cost-change trend and the total demand of the grid, customers can choose the optimum time for their desired load demand. As a result, not only will they reduce their costs, but they will also contribute to proper load shifting in the grid; that is, the grid can flawlessly generate and distribute the demanded load.
Owing to the nonlinear and discrete nature of current smart grid models, a major part of the related investigations has dealt with developing optimization algorithms to obtain the best power distribution mode. Furthermore, other important studies on smart grids have addressed the components and strategies of DSM, which are discussed next.
In their study, Mohsenian-Rad et al¹ developed a method based on game theory. They defined the grid customers as
players and the daily schedules of their household appliances and loads as a strategy. They assumed that the utility
company could adopt adequate pricing tariffs that differentiate energy usage in terms of time and level. They claimed
that the costs of the grid would be minimized if a Nash equilibrium was achieved. The main problem of this work is to
find a Nash equilibrium in the real world. The number of customers of an actual grid can be so high that finding the
Nash equilibrium may appear inefficient.
Atzeni et al² attempted to introduce a different approach to energy management to improve the grid model by taking energy storage into account. Using the day-ahead technique and a general energy pricing model in the DSM mechanism, they tackled grid optimization from two different perspectives: a user-oriented optimization and a holistic-based design. In the day-ahead technique, customers should indicate their schedules the day before the test day so that
it would be possible to manage the consumption load of the grid for the next day. It should be noted that this gives rise
to a number of problems in terms of indicating the amount of customer demand.
The overall value of implementing DSM and demand-response management (DRM) schemes was studied in Refer-
ence 3 via a Stackelberg game formulation. The work in Reference 4 investigated how energy consumption could be
optimized through a two-step centralized model, in which a power supplier provides consumers with an energy price
parameter and consumption summary vector.
Chen et al⁵ convinced consumers to do load shifting by adopting instantaneous load billing. They used an aggregative game in their DSM scenario to model selfish customer behavior. Furthermore, they proposed an algorithm for
the conditions where there was no central control.
Unlike the studies conducted in this area, Chai et al⁶ studied DRM with multiple utility companies. In their paper, they
modeled the interaction between utility companies and residential users as a two-level game. That is, the competition among
the utility companies was formulated as a noncooperative game, while the interaction among the residential users was for-
mulated as an evolutionary game. Finding the Nash equilibrium is a problem that all non-cooperative games struggle with.
All the mentioned works have discussed DSM in terms of economic and optimization aspects. Although these studies made positive contributions, most of them assumed the behavior of agents to be perfectly rational and that customers would behave as the grid prefers, which, of course, is not the case in the real world. Thus, a major proportion of the studies have not considered the irrational behavior of customers. Moreover, as mentioned before, it is often really challenging to attain a Nash equilibrium in large real-world communities.
Wang et al⁷ studied a load shifting method using a non-cooperative game. The main characteristic of the study was prospect theory, which simulated the agents' behavior. The evident disadvantage of this work was the imposition of limitations on customers' load-shifting participation; that is, whenever customers started participating in load shifting, they were obliged to continue their participation until the last hour of the day (h = 24). In other words, participating in load shifting is almost like a one-way door. Thus, the customers cannot offer their load to be bought at certain hours of a day and then demand that load during later hours of the same day; they can only state the hours they prefer. Rather, it is the grid that determines how much of the customer's demand can be met in the given hours. Moreover, once the customers participate in load shifting, they are not allowed to leave the system until the end of the day.
In Reference 8, Wijaya et al suggested a method to solve the load shifting problem based on a multi-unit auction in
which all the customers send information about their demands to a center and then the center regulates their load
shifting arbitrarily and, in the end, sells the electric energy using an auction. Thus, the grid manages the demand, and
the load also is supplied with the best price. Moreover, customers can request their demands according to their pre-
ferred time and price. The probable challenges the method seems to suffer from are related to both grid and economy.
First, there are many interactions in the grid that create severe security problems. Second, while auction has many
advantages, some disadvantages accompany it when used directly.
He et al⁹ proposed a model-predictive-control-framework-based distributed demand-side energy management method for users and utilities in a smart grid. Users are equipped with renewable energy resources, energy storage systems, and different types of smart loads. With the proposed method, each user individually finds an optimal operation routine in response to the varying electricity prices according to his/her preference: for example, the power reduction of flexible loads, the start time of shiftable loads, the operation power of schedulable loads, and the charge/discharge routine of the energy storage systems.
In Reference 10, model predictive control methodologies were developed to address the distribution line congestion
and balancing problem in electric power distribution systems. The authors formulated model demand response strate-
gies as mixed-integer quadratic program optimization problems involving both continuous and binary variables.
Jamil and Mittal¹¹ proposed two optimization algorithms for solving the load shifting problem in a smart grid: the first a particle swarm optimization algorithm and the second a grasshopper optimization algorithm, both applied in three load areas of the smart grid, that is, residential, commercial, and industrial.
Ali et al¹² simulated load shifting in residential buildings, with air-conditioners as the shiftable load. The significant problem of these works was that the method might create new peaks. To overcome this problem, Khalid et al¹³ presented a novel approach to optimizing load demand and storage management in response to dynamic pricing using machine learning and optimization algorithms. Although the authors show effectiveness in terms of minimizing the electricity bill as well as preventing the creation of minimal-price peaks, their use of linear programming optimization renders their work inefficient for real-world grids.
Afzaal et al¹⁴ proposed an auction mechanism to optimize the energy traded between consumers and multiple suppliers within a smart grid. They suggested an agent-based forecasting method capable of predicting the energy consumption of each consumer with a lead time of 1 h. This forecast is exploited to estimate the cost of purchasing the required energy from multiple suppliers. As mentioned before, auctions breed problems of fairness and security. Also, this work ignores grid customer satisfaction.
High penetration of renewable energy sources and electrical energy storage systems in electrical distribution grids has changed distribution system operators' energy balance. For this purpose, Chamandoust et al¹⁵ modeled the energy scheduling problem of a residential smart electrical distribution grid with renewable energy sources and DSM as a tri-objective model. The proposed model was solved using the epsilon-constraint method, which is inefficient in real-world grids. Since the proposed approach has three objective functions, different Pareto solutions are obtained and the best solution is determined by a decision-making method. They also modeled the uncertain behavior of renewable energy sources using a stochastic optimization approach.
Using centralized reinforcement learning, the present study develops a method to solve the problem of load shifting in a multiagent environment. In the proposed method, customers have the right to participate in load shifting, and not necessarily in sequential intervals. Furthermore, since the problem is modeled as a cooperative game, the difficulty of finding a Nash equilibrium is removed. The most important aspect of the proposed method, compared with the others, is its independence from the number of customers in the grid, which eliminates worries about grid extension. Moreover, the necessary number of communications between the customers is reduced significantly in comparison with other works. Thus, we claim that our suggested approach, in addition to cost reduction, solves the balancing problem of the grid without causing customer dissatisfaction, reduces the possibility of security threats by decreasing the number of communications, and also improves the time complexity (running time) of finding optimum load shifting in the grid.
Research on smart grids has shown that, owing to the use of renewable energy, grids sometimes commit errors in predicting the amount of energy produced. The reason lies in various factors, such as unexpected changes in weather conditions. In such cases, the level of produced energy is lower than expected. To deal with this production-level irregularity, we specify it using a uniform distribution, indicated in advance by the utility company, called mp in this paper. Thus, during the learning phase, the level of load produced in an hour may meet the demand in one episode, while in another episode over the same hour only the lowest level of the predicted load is produced, which cannot meet the demand of the grid. In such cases, the customers should be encouraged to shift loads by increasing the energy price during the given hours and reducing it during other hours.
This paper is composed of five sections. Following this introduction, basic definitions in the field of smart grids and load distribution are presented in Section 2. Game theory and reinforcement learning are then reviewed in Section 3. Section 4 is devoted to the proposed method, and Section 5 presents the results of applying the proposed models and an analysis of the research findings.
2 | DEFINITIONS
Considering the cost effect on customer behavior, DSM tries to develop working methods in which load demands in the
grid can be managed. Thus, in this section we will discuss how load can be managed and define some terms needed to
proceed further.
2.1 | Load management
Smart grids consist of buyers and sellers of energy, all of whom are generally referred to as the customers of the grid. In
this article, customers are referred to as buyers. Let 𝒩 be the set of customers, where each customer consumes a certain amount of energy per hour, indicated by d_i^h for the ith customer in the hth hour. Thus, the total energy demand for all customers in hour h can be assessed using the following equation (Wang et al⁷):
$$d^h = \sum_{i} d_i^h \qquad (1)$$
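As a concrete illustration, Equation (1) is just a per-hour sum over customers. A minimal Python sketch (the function and variable names are ours, not the paper's):

```python
# Sketch of Equation (1): the grid's total demand d^h is the sum of the
# per-customer demands d^h_i for each hour h of the day.
def total_demand(customer_demands):
    """customer_demands: list of 24-element hourly demand vectors (kW)."""
    return [sum(d[h] for d in customer_demands) for h in range(24)]
```

For two customers with flat 10 kW and 5 kW profiles, every entry of the result is 15 kW.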
DSM has different load management strategies, including valley filling, peak clipping, load growth, strategic conser-
vation, and load shifting. In this study, we use load shifting to manage the load efficiently. It is expected that the distri-
bution of customers' load demands after load shifting results in companies facing fewer problems regarding energy
production and distribution.
In load shifting, customers of the grid change the time of a part of their demand from peak hours to off-peak hours.
Therefore, both the to-be-reduced load amount during peak hours and the new time of that demand play key roles in
load shifting. In other words, not only is the amount of load demand that the customers reduce at peak hours important, but the time at which they can request the same demand again from the grid is also of great importance, as it may cause new peak hours. Suppose a case in which several customers simultaneously shift their load demand to hour h. The result may be a sudden increase in load demand at h; the grid will then be unable to meet this demand rate, and a peak-hour problem will occur at that hour.
Utility companies assess the rate of the required load for hour h based on the data gathered from previous years. Moreover, they predict the rate of load produced by the grid by considering factors such as weather conditions. Finally, based on the results, they can indicate the suitable amount of load for the grid using the following equation:
$$g^h = \begin{cases} \beta \times d^h + mp, & h \in PH \\ d^h + \dfrac{ED}{|PH|} + mp, & h \text{ is a minimum-load hour} \\ d^h + mp, & \text{otherwise} \end{cases} \qquad (2)$$
where PH is the set of peak hours and mp refers to the marginal production resulting from the uncertain nature of energy production (i.e., unexpected changes in weather conditions or unpredicted failures of power generating equipment). mp is assumed uncertain and is drawn randomly from a uniform distribution. β is the load reduction factor that should be applied to the total demand of the grid at each time; it changes the load level of the grid to one at which energy production and distribution incur the minimum cost. As mentioned before, the utility company can predict the total demand of the grid using the data gathered from previous years. Next, utilizing the predicted total amount of energy that the grid can generate and distribute at minimum cost, the utility company indicates the value of β. In this article, we set β = 0.9 and draw the value of mp from a uniform distribution between 0 and 50 kW. In addition, ED denotes the load demands from the utility company's point of view and can be calculated by the following equation⁷:
$$ED = \sum_{h \in PH} (1 - \beta) \times d^h \qquad (3)$$
Note that ED will be shifted to another time based on the utility companies' desired schedule. However, this rate can be shifted either to the hours adjacent to peak hours or to any hours with which customers are more satisfied; the grid is also able to define it according to its predictions. An important point when deciding on the time of shifting the load is to consider the lowest price for each load unit at a given hour, because the main factor that encourages customers to participate in load shifting is low prices.
Based on Equation (2), utility companies can compute a per-hour load that the grid can bear. Next, based on the targeted load, they can specify by what amount each customer should shift their load demand from a peak hour h to the off-peak hours:
$$r_i^h = \max\!\left(d^h - g^h - mp,\; 0\right) \times \frac{d_i^h}{d^h} \times \lambda_i \qquad (4)$$
where 0 < λ_i ≤ 1 is the ith customer's satisfaction factor for participating in load shifting. This factor allows our models to learn the customer's decision concerning the load before proceeding to load shifting. In this way, we can claim that customer satisfaction with load shifting is taken into account. It also helps us learn how to deal with all types of customers.
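The quantities above can be sketched in a few lines of Python. This is our own illustrative reading of Equations (2) to (4), not the authors' code; only the peak-hour branch of Equation (2) is shown, and mp is passed in explicitly rather than sampled from its uniform distribution.

```python
def g_peak(d_h, mp, beta=0.9):
    # Peak-hour branch of Equation (2): g^h = beta * d^h + mp.
    return beta * d_h + mp

def excess_demand(d, peak_hours, beta=0.9):
    # Equation (3): ED = sum over peak hours of (1 - beta) * d^h.
    return sum((1 - beta) * d[h] for h in peak_hours)

def shift_amount(d_h, g_h, mp, d_hi, lam):
    # Equation (4): r^h_i = max(d^h - g^h - mp, 0) * (d^h_i / d^h) * lambda_i.
    return max(d_h - g_h - mp, 0.0) * (d_hi / d_h) * lam
```

For example, with d^h = 1000 kW, g^h = 850 kW, and mp = 50 kW, a customer demanding 200 kW with λ_i = 0.5 is asked to shift max(100, 0) × 0.2 × 0.5 = 10 kW.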
2.2 | Demand cost
Owing to the increasing need for electricity, it would be impossible to reduce demand significantly. However, controlling and managing hourly consumption can reduce the costs of production and distribution, which makes it possible to reduce the price of each unit of energy per hour. Here, we assume that the price of each unit of energy in hour h for the ith customer depends on the ratio of their demand to the total demand from the grid in the given hour⁷:
$$c_i^h = B \times \frac{d_i^h}{d^h} \qquad (5)$$
where B can be calculated based on locational marginal pricing,¹⁶ because the price function is not necessarily time-dependent. Equation (5) makes the price of each unit of energy dependent on the total demand for energy; that is, the lower a customer's demand compared with the total demand, the lower the price they pay for each unit of energy.
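As a quick sketch of Equation (5) (our own illustration; B is taken here as a plain number rather than derived from locational marginal pricing):

```python
def unit_price(B, d_hi, d_h):
    # Equation (5): c^h_i = B * d^h_i / d^h. A customer holding 10% of
    # the hour's total demand pays B / 10 per unit of energy.
    return B * d_hi / d_h
```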
3 | CENTRALIZED LEARNING FORMULATION FOR DEMAND-SIDE MANAGEMENT
The main idea behind reinforcement learning (RL) is that the rewarded behavior is likely to be repeated, whereas
behavior that is punished is less likely to recur. Thus, an agent learns from the received environmental feedback by two
different signals: state signal indicates the agent state in the environment, and the reward signal shows feedback of the
environment to determine the desirability of the agent state. The agent tries to maximize its long-term utility.
If the environment includes a set of agents whose behaviors affect one another, the learning is called multiagent reinforcement learning (MARL). Learning in multiagent environments takes place in two ways: centralized and decentralized. In decentralized learning, the agents must interact with each other at all times, which, in turn, may decrease efficiency; thus, communication between the agents should be restricted. There are several methods to control the communication, but most of the time the results are not optimal. The centralized method, however, reduces communication in the grid by using a central learner. Moreover, the learning process in this approach is easier than in decentralized methods, because all the needed information is available to the center. Centralized learning is structured so that each agent can be informed about the actions of the others through the learning process itself. In centralized learning, the center first simulates the behavior of the agents for itself and learns how to act in different states; then, when the agents act in the environment, the center can, based on what it has learned, suggest to the agents what action is suitable in the current state (Figure 1).
The other important factor in a multiagent environment is the way agents may interact with each other. Interaction is divided into two categories: cooperative and noncooperative. The agents in this work are cooperative; thus, they not only try to optimize their own profit but also try to optimize the costs of the whole grid. Note that, by the factors we defined here, improving the costs of the whole grid results in improving the customers' costs. Thus, the resulting conditions not only satisfy the customers but also improve the load level in the grid.
To formulate the model in RL, we use the tuple {S, A, R, T}, where S is the set of states; A the set of actions; R: S × A → ℝ, the reward function for each action taken by the agent in the current state; and T: S × A → Δ, the transition function, where Δ is a probability distribution over the set of states S.
Note that RL is applied to problems in which the agent moves through a sequence of states in order to reach an intended state. Thus, we should find a model that maps our problem to RL. In this work, the customers are the agents of the model. We use MARL because we do not deal with a single customer. The learning method in this article is centralized: the center receives information from all customers (agents) in the learning phase and learns how each agent acts in different states using the Q-learning algorithm. Next, in the test phase, all customers send their daily demand vectors to the center agent. Based on what it has learned, the center agent determines the best action for the customers of the grid and sends this information back to the customers (Figure 1). Thus, the number of messages sent is only twice the number of customers in the grid, which is clearly not high enough to disrupt the grid.
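The two-messages-per-customer exchange described above can be sketched as follows (all names are illustrative; the policy argument is a stand-in for the learned Q-table):

```python
# Sketch of the centralized test-phase exchange: each of the n customers
# sends one daily demand vector to the center and receives one action
# back, so the message count is 2n, that is, O(n).
def centralized_round(demand_vectors, choose_actions):
    """demand_vectors: {customer_id: 24-hour demand list}.
    choose_actions: the center's learned policy; maps all demand
    vectors to one recommended action per customer."""
    messages = len(demand_vectors)            # n uplink messages
    actions = choose_actions(demand_vectors)  # center decides centrally
    messages += len(actions)                  # n downlink messages
    return actions, messages
```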
4 | PROPOSED MODELS
Three models of the load shifting problem are presented in this section based on MARL. RL-oriented algorithms include two parts: learning and testing. The learning process is carried out on different data gathered from the previous behaviors of the customers so that the Q-table can be learned and deployed. During the process of learning, first, the data related to the customers' one-day schedules are sent to the center agent. The next step of learning is to move through the state space, receiving feedback from the environment while switching from one state to another. Thus, in this step, the center agent selects an action randomly for each agent and
then, according to the chosen sequence of actions, computes the new state of the grid. Moreover, it updates the environmental reward for the state change. Note that, since we use a cooperative game, the center agent only needs to calculate the total reward. The center agent continues this process until it reaches one of the goal states (where there is no peak hour) or trap states (existing only in the decreased binary model, to be explained shortly). Subsequently, the whole process is repeated for the other available sequences of actions. We will explain the stopping condition of each model as we introduce it in the following sections. When learning is completed, the center agent is able to determine a sequence of actions that leads the agents to the intended state and to send it back to the customer agents. Obviously, if the number of customers is n, then the total number of communications between the customer agents and the center agent is in the order of O(n).
4.1 | Demand-based model
In this model, each state includes a list of the total energy demand of all customers during 24 h:
FIGURE 1 Centralized multiagent reinforcement learning for load shifting for customers of the smart grid
$$S = \left\{ \left[ d^1, d^2, \ldots, d^{24} \right] \right\} \qquad (6)$$
Defining the state in this form makes the state space continuous, so it is not possible to create the whole state space. To solve this problem, we use a rounding method: first we choose some values as base values, for example, 0, 100, 200, and round each demand to the nearest base value. Using this method, we change the state space from continuous to discrete. Note that the method introduces some inaccuracy, but compared to the total demand it can be ignored.
For example, suppose that we have two customers with the following demand vectors:
D1 = [10, 5, 5, 5, 5, 5, 150, 200, 0, 0, 0, 0, 200, 170, 60, 30, 20, 30, 40, 200, 150, 170, 200, 20]

and

D2 = [0, 0, 0, 0, 0, 0, 100, 150, 30, 10, 10, 10, 100, 150, 60, 50, 50, 40, 20, 200, 200, 150, 150, 0]

The total load demand in the grid at each hour is then:

G = [10, 5, 5, 5, 5, 5, 250, 350, 30, 10, 10, 10, 350, 320, 120, 80, 70, 70, 60, 400, 350, 320, 350, 20]

As can be seen, the vector contains arbitrary real numbers, which results in a continuous state space. Thus, given the base values 0, 50, 100, …, the starting state is

s = [0, 0, 0, 0, 0, 0, 250, 350, 50, 0, 0, 0, 350, 300, 100, 100, 50, 50, 50, 400, 350, 300, 350, 0]
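The rounding step in this example can be sketched as follows (our own illustration, with base values spaced 50 apart):

```python
def discretize(demand, base=50):
    # Round each hourly total to the nearest base value (0, 50, 100, ...),
    # turning the continuous state space into a discrete one.
    return [round(x / base) * base for x in demand]
```

Note that Python's round() resolves exact ties (such as 25/50) to the even neighbor, which is harmless for this purpose.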
Having access to the total demand of the grid per hour as well as the peak hours, we can compute the amount of load that should be shifted from these hours to off-peak hours. Thus, we define the set of actions available to the agents as

$$A = \left\{ a_{h \to h'} \;\middle|\; h \in PH \;\&\; h' \in PH' \right\} \qquad (7)$$

where PH′ is the set of off-peak hours. Equation (7) shows that load shifts from hour h to hour h′, where h = h′ means no shift. Otherwise, the amount of demand that is shifted is given by Equation (4).
Once the center agent assigns an action a to each agent in the current state s, the new state s′ in which the agents are located is obtained by the transition

$$T\!\left(s, a_{h \to h'}\right): \quad \text{for each agent } i:\; s'_{h'} = s_{h'} + r_i^h, \quad s'_h = s_h - r_i^h \qquad (8)$$
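A minimal sketch of this transition (our own reading of Equation (8); the state is the 24-entry total-load vector):

```python
def apply_shift(state, h, h_prime, r):
    # Equation (8): move r units of one agent's demand from peak hour h
    # to off-peak hour h'. Returns a new state so the old one survives
    # for the Q-update; h == h' encodes "no shift".
    s = list(state)
    if h != h_prime:
        s[h] -= r
        s[h_prime] += r
    return s
```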
In the next step, the center agent computes the reward of this shift so that it can update the Q-table according to the reward resulting from the action in the given state. Table 1 is used to compute the reward of action a in state s. To show how the reward function works, consider an example. If a shift takes place from a peak hour to an off-peak hour, the agents receive the reward of the load shifting in which they participated. However, this is not the case when they turn off-peak hours into peak hours (since a new peak hour is created): the customers then have to suffer extra costs and, as a result, they learn that they have not chosen the right action.
As mentioned before, with the suggested definition of the states, we face the curse of dimensionality. In other words, although it is theoretically possible to create the whole state space, in practice it is impossible to create it efficiently. Since creating the whole state space is not possible in the present model, we cannot claim that the center agent has learned the desired behavior. Therefore, we use support vector regression (SVR). Suppose that the state space has one dimension and that the Q-value assigned to each state is a real number. The role of SVR is to fit a function over the states and their Q-values in a two-dimensional space (as shown in Figure 2). The Q-value of a state on the test day is then assessed using the value obtained from the SVR. Therefore, by using SVR instead of searching a very big state space, we make use of the most similar state in a small space. Although the demand-based model, using rounding and SVR, is generally capable of dealing with a huge state space, accuracy decreases as well. Therefore, we do not claim that this model is optimal or has the best time complexity.
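As a sketch of this generalization step (assuming scikit-learn's SVR; the one-dimensional state space and sin-shaped Q-values mirror Figure 2 and are purely illustrative):

```python
import numpy as np
from sklearn.svm import SVR

# Fit a regressor over (state, Q-value) pairs so that an unseen state on
# the test day gets an approximate Q-value instead of an exact lookup.
X = np.linspace(0, 6, 60).reshape(-1, 1)  # visited (discretized) states
q = np.sin(X).ravel()                     # Q-values learned for them
model = SVR(kernel="rbf", C=10.0).fit(X, q)
q_estimate = model.predict([[1.5]])[0]    # approximate Q for a new state
```

With an RBF kernel the estimate stays close to the underlying curve, here sin(1.5) ≈ 0.997.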
Algorithm 1 presents the algorithm corresponding to the demand-based model.
Algorithm 1. Centralized MARL load shifting: demand-based model.
Phase 1: Learning phase
1: While not converged do
2:   Choose randomly data belonging to previous years
3:   s′ ← Calculate the state from the d^h vector using Equation (6)
4:   While a peak hour exists do
5:     s ← s′
6:     a* ← Set an action for each agent using Equation (7)
7:     r_t ← Calculate reward(s, a*) using Table 1
TABLE 1  R function

S: d_h ≤ g_h
   No shift:  d_i^h × c_i^h
   Shift, d_{h'} + r_i^h ≤ g_{h'}:  r_i^h × (c'_i^h − c'_i^{h'})
   Shift, d_{h'} + r_i^h > g_{h'}:  −r_i^h × (c'_i^h − c'_i^{h'})

S: d_h > g_h and d_h − r_i^h ≤ g_h and ∃ i' ≠ i: d_h − r_{i'}^h ≤ g_h
   No shift:  d_i^h × c_i^h
   Shift, d_{h'} + r_i^h ≤ g_{h'}:  (d_i^h − r_i^h) × c'_i^h − r_i^h × c'_i^{h'}
   Shift, d_{h'} + r_i^h > g_{h'}:  (d_i^h − r_i^h) × c'_i^h + r_i^h × c'_i^{h'}

S: d_h > g_h and d_h − r_i^h ≤ g_h and ∀ i' ≠ i: d_h − r_{i'}^h > g_h
   No shift:  d_i^h × c_i^h
   Shift, d_{h'} + r_i^h ≤ g_{h'}:  (d_i^h − r_i^h) × c'_i^h + r_i^h × c'_i^{h'}
   Shift, d_{h'} + r_i^h > g_{h'}:  (d_i^h − r_i^h) × c'_i^h − r_i^h × c'_i^{h'}

S: d_h − r_i^h > g_h
   No shift:  r_i^h × (c'_i^{h'} − c'_i^h)
   Shift, d_{h'} + r_i^h ≤ g_{h'}:  r_i^h × (c'_i^{h'} − c'_i^h)
   Shift, d_{h'} + r_i^h > g_{h'}:  −r_i^h × (c'_i^{h'} − c'_i^h)
FIGURE 2  Linear and polynomial SVR techniques applied to sample data generated from the function sin(x)
8:     s' ← calculate the new state using Equation (8)
9:     Q_{t+1}(s, a*) ← (1 − α)·Q_t(s, a*) + α·(r_t + γ·max Q_t(s', a*))
10:  EndWhile
11: EndWhile

Phase 2: Testing phase
1: Each agent determines an hourly energy demand scheduling vector and sends it to the center.
2: The center performs the following steps:
   a: Calculates the initial state
   b: Runs SVR(Q) to fit the initial state to one of the learned states
   c: Simulates the behavior of the agents and finds the sequence of actions that takes them to the goal state
   d: Sends back the sequence of actions to the agents
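The Q-update in step 9 of the learning phase is the standard Q-learning rule. A minimal tabular sketch, with hypothetical state and action labels, might look like this:

```python
# Tabular sketch of the Q-update used in step 9 of the learning phase:
# Q(s,a) <- (1-alpha)*Q(s,a) + alpha*(r + gamma * max_a' Q(s',a')).
# States and actions are hypothetical hashable placeholders.

from collections import defaultdict

ALPHA, GAMMA = 0.5, 0.9

Q = defaultdict(float)            # Q[(state, action)] defaults to 0.0
ACTIONS = ["shift_7_to_3", "shift_20_to_2", "no_shift"]

def q_update(s, a, r, s_next):
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS)
    Q[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * (r + GAMMA * best_next)

# one update from a hypothetical "peak" state with reward 10
q_update("peak", "shift_7_to_3", 10.0, "off_peak")
```

Since all Q-values start at zero, the first update stores α·r = 5.0 for the taken pair; repeated sweeps over past data propagate value backward through the shift sequences.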
4.2 |Binary model
The demand-based model's problems arise from its continuous state space and the infinite variation it allows. The first problem was the continuity of the values, which we removed by discretizing the state space. The second problem was the curse of dimensionality when working with SVR, which prevented us from ensuring that the results are optimal: SVR is not only inaccurate but also expensive to run owing to the high dimensionality of the state space. All the mentioned reasons lower the efficiency of the suggested method. Thus, we tried to eliminate the direct dependence of the state space on the load demand.
The idea we use to solve these problems is to consider the situation of the grid at hour h instead of the total demand. In other words, in the binary model, we compare the total load of the grid during hour h with the value that the grid can provide and store the result as a binary value. By doing so, we reduce the state space from real numbers to 24 binary numbers. As a result, the state in this model is a list of 24 numbers indicating whether each hour is peak or off-peak:
S = {[x_1, x_2, …, x_24]}    (9)

where

x_h = 1, if h ∈ PH;  0, if h ∈ PH'    (10)
Therefore, the entries of a state in this model are either 0 or 1, where 1 refers to a peak hour and 0 to an off-peak hour. The goal state is a vector of 24 zeros. For example, assume that there are two customers with demand vectors as follows:
D1 = [10, 5, 5, 5, 5, 5, 150, 200, 0, 0, 0, 0, 200, 170, 60, 30, 20, 30, 40, 200, 150, 170, 200, 20]

and

D2 = [0, 0, 0, 0, 0, 0, 100, 150, 30, 10, 10, 10, 100, 150, 60, 50, 50, 40, 20, 200, 200, 150, 150, 0]

The generation vector of the grid is

G = [10, 5, 5, 5, 5, 105, 150, 150, 130, 110, 10, 160, 150, 170, 170, 180, 170, 220, 210, 200, 250, 220, 200, 170]
The start state is

s = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0]

and the goal state is

s = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
Note that taking actions in the learning phase can increase the number of 1s in the state vector, owing to wrong load shifting.
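The start state above can be reproduced directly from Equation (9) and the example vectors; a short sketch (hour indices treated as 0-based list positions):

```python
# Compute the binary state of Equation (9): x_h = 1 iff total demand
# at hour h exceeds what the grid can provide. The vectors are taken
# from the two-customer example in the text.

D1 = [10, 5, 5, 5, 5, 5, 150, 200, 0, 0, 0, 0, 200, 170, 60, 30, 20, 30,
      40, 200, 150, 170, 200, 20]
D2 = [0, 0, 0, 0, 0, 0, 100, 150, 30, 10, 10, 10, 100, 150, 60, 50, 50,
      40, 20, 200, 200, 150, 150, 0]
G = [10, 5, 5, 5, 5, 105, 150, 150, 130, 110, 10, 160, 150, 170, 170,
     180, 170, 220, 210, 200, 250, 220, 200, 170]

def binary_state(demands, generation):
    totals = [sum(hour) for hour in zip(*demands)]       # total load per hour
    return [1 if total > g else 0 for total, g in zip(totals, generation)]

state = binary_state([D1, D2], G)
# → [0,0,0,0,0,0,1,1,0,0,0,0,1,1,0,0,0,0,0,1,1,1,1,0], the start state s
```

Note that an hour where demand exactly equals generation (e.g., hour 0, where 10 = 10) is off-peak under the strict inequality.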
According to the definition of state in this model, neither rounding nor SVR is required: the total state space contains 2^24 states, so it is possible to create all the states and learn them using the conventional Q-learning method.
Note that the problem of the demand-based model arose from its state structure. In the binary model, we change only the state structure, which requires a new transition function; the structures of the action and the reward remain the same as in the demand-based model. The transition function changes as follows:
T(s, a_{h→h'}):    (11)

for each agent do
  d_{h'} ← d_{h'} + r_i^h
  d_h ← d_h − r_i^h

s'_h = 0 and s'_{h'} = 0,  if d_h < g_h and d_{h'} < g_{h'}
s'_h = 0 and s'_{h'} = 1,  if d_h < g_h and d_{h'} > g_{h'}
s'_h = 1 and s'_{h'} = 0,  if d_h > g_h and d_{h'} < g_{h'}
s'_h = 1 and s'_{h'} = 1,  if d_h > g_h and d_{h'} > g_{h'}
It is apparent from the definition that the new transition function acts in two steps: first, it shifts the load according to the agents' actions; then, it updates the peak conditions to generate the new state.
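A minimal sketch of these two steps follows; the three-hour grid, demands, and shift amounts are hypothetical, chosen only to exercise both steps:

```python
# Sketch of the binary-model transition T(s, a_{h->h'}): step 1 moves
# each shifting agent's load r from hour h to hour h', step 2 recomputes
# the peak bits. Hours are 0-based indices; all quantities are invented.

def transition(total_demand, generation, shifts):
    """shifts: list of (h, h_prime, r) tuples, one per shifting agent."""
    d = list(total_demand)
    for h, h_prime, r in shifts:                 # step 1: shift the load
        d[h] -= r
        d[h_prime] += r
    state = [1 if d[h] > generation[h] else 0    # step 2: new peak bits
             for h in range(len(d))]
    return d, state

demand = [120, 60, 80]                           # hypothetical 3-hour grid
generation = [100, 100, 100]
new_demand, new_state = transition(demand, generation, [(0, 1, 30)])
# hour 0 drops to 90 (off-peak), hour 1 rises to 90 (still off-peak)
```

Shifting 60 units instead of 30 in the same example would push hour 1 to 120 and set its bit to 1, which is exactly the wrong-shift case that creates a new peak during learning.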
In the binary model, it is possible to create the whole state space, but great care should be taken in selecting the learning data: the model can provide an optimal answer only if proper data are selected. Algorithm 2 presents the algorithm corresponding to the binary model.
Algorithm 2. Centralized MARL load shifting, binary model.

Phase 1: Learning phase
1: While not converged do
2:   Randomly choose data belonging to previous years
3:   s' ← calculate the state from the d_h vector using Equation (9)
4:   While a peak hour exists do
5:     s ← s'
6:     a* ← assign an action to each agent using Equation (7)
7:     r_t ← calculate reward(s, a*) using Table 1
8:     s' ← calculate the new state using Equation (11)
9:     Q_{t+1}(s, a*) ← (1 − α)·Q_t(s, a*) + α·(r_t + γ·max Q_t(s', a*))
10:  EndWhile
11: EndWhile

Phase 2: Testing phase
1: Each agent determines an hourly energy demand scheduling vector and sends it to the center.
2: The center performs the following steps:
   a: Calculates the initial state
   b: Simulates the behavior of the agents and finds the sequence of actions that takes them to the goal state
   c: Sends back the sequence of actions to the agents
4.3 |Decreased binary model
The binary model solves the problems of the demand-based model. However, its time complexity can still be reduced (learning 2^24 states takes a long time). Therefore, we propose the decreased binary model to reduce the time complexity. To this end, we employ the batch technique, which is based on the principle that peak hours over long periods (i.e., seasonal periods) are always the same.
In this proposed model, we associate the state space with the number of peak hours. To this end, off-peak hours are deleted from the list and the creation of new peak hours is managed by adding a flag. Note that, in the decreased binary model, the state consists of a vector of length |PH| + 1, which shrinks the whole state space from 2^24 to 2^(|PH|+1).
The definition of state in the decreased binary model is as follows:

S = {[x_1, x_2, …, x_{|PH|}, flag]}    (12)

where

x_h = 1, if h ∈ PH;  0, if h ∈ PH'    (13)

and

flag = 0, if no new peak hour is created;  1, otherwise    (14)
According to the definition of S, the start state of the model is always a vector containing a 1 for each member of PH and a 0 for the flag, and the goal state is the zero vector of length |PH| + 1. For example, for

PH = {12, 13, 14, 19, 20, 21}

the start state is

s = [1, 1, 1, 1, 1, 1, 0]

and the goal state is

s = [0, 0, 0, 0, 0, 0, 0]
Note that the value of the flag changes to 1 when a new peak hour is created; we refer to such a state as a trap state. The algorithm resets to the start state when it faces a trap state, and the agents receive a punishment that discourages them from going to this state.
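The decreased state and its trap flag can be sketched as follows. The 24-hour demand and generation vectors are hypothetical, while PH is the peak-hour list from the example above (hour numbers treated as 0-based list positions):

```python
# Sketch of the decreased binary state (Equation (12)): one bit per
# original peak hour plus a trap flag that turns 1 when a shift creates
# a peak at an hour outside the original PH list. Values are invented.

PH = [12, 13, 14, 19, 20, 21]                  # original peak hours

def decreased_state(total_demand, generation, peak_hours):
    bits = [1 if total_demand[h] > generation[h] else 0 for h in peak_hours]
    flag = int(any(total_demand[h] > generation[h]
                   for h in range(len(total_demand)) if h not in peak_hours))
    return bits + [flag]                       # vector of length |PH| + 1

# hypothetical 24-hour vectors: every original peak hour overloaded
demand = [50] * 24
generation = [100] * 24
for h in PH:
    demand[h] = 150

start = decreased_state(demand, generation, PH)   # [1, 1, 1, 1, 1, 1, 0]

# a bad shift overloads hour 3, which is not in PH: a trap state
demand[3] = 180
trap = decreased_state(demand, generation, PH)    # flag becomes 1
```

The second call illustrates why the trap flag is needed: the six PH bits alone cannot express a peak appearing at a deleted (off-peak) hour.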
Considering the changes made to the state of the model relative to the binary model, the transition function must be redefined, while the definitions of the action and the reward are left unchanged. The important point in defining the transition function is checking the flag value:
T(s, a_{h→h'}):    (15)

for each agent do
  d_{h'} ← d_{h'} + r_i^h
  d_h ← d_h − r_i^h

s'_h = 0 and s'_flag = 0,  if d_h < g_h and no new peak hour exists
s'_h = 0 and s'_flag = 1,  if d_h < g_h and a new peak hour exists
s'_h = 1 and s'_flag = 0,  if d_h > g_h and no new peak hour exists
s'_h = 1 and s'_flag = 1,  if d_h > g_h and a new peak hour exists
According to Equation (15), at the hour from which load is shifted, the new total demand is computed and compared with the desirable generation of the grid, and the flag value is updated if the grid fails to supply the new demand. Algorithm 3 presents the algorithm corresponding to the decreased binary model.
In the present section, we have introduced three models based on CMARL to solve the peak problems in smart grids. The first was the demand-based model, which suffers from the curse of dimensionality. Next, we proposed the binary model to solve that problem. Finally, to improve the running time of the learning phase, we suggested the decreased binary model. The proposed models' simulation results are discussed below.
Algorithm 3. Centralized MARL load shifting, decreased binary model.

Phase 1: Learning phase
1: While not converged do
2:   Randomly choose data belonging to previous years
3:   s' ← calculate the state from the d_h vector using Equation (12)
4:   While a peak hour exists do
5:     s ← s'
6:     a* ← assign an action to each agent using Equation (7)
7:     r_t ← calculate reward(s, a*) using Table 1
8:     s' ← calculate the new state using Equation (15)
9:     Q_{t+1}(s, a*) ← (1 − α)·Q_t(s, a*) + α·(r_t + γ·max Q_t(s', a*))
10:    If a trap state occurs or the goal state is achieved then
11:      go to step 2
12:    EndIf
13:  EndWhile
14: EndWhile

Phase 2: Test phase
1: Each agent determines an hourly energy demand scheduling vector and sends it to the center.
2: The center performs the following steps:
   a: Calculates the initial state
   b: Simulates the behavior of the agents and finds the sequence of actions that takes them to the goal state
   c: Sends back the sequence of actions to the agents
5|SIMULATION RESULTS AND ANALYSIS
In this section, we present the results of implementing the proposed models. Real-world data (Reference 17) have been used for the simulations, including the initial demand of the customers. For example, we used data gathered from a full-service restaurant and a hospital located in Anchorage, Alaska, in 2004-2005.
Although the binary and decreased binary models are simulated over the 00:00-24:00 interval, the demand-based model is simulated over a 5-h interval between 16:00 and 20:00. The reason is that applying this model to conventional systems is not viable, owing to the rapid growth of the state space and the high cost of applying SVR. Thus, by reducing the state-space size, we analyze some of our simulations in order to show how well our model works. Furthermore, we use λ = 0.6 for all the customers.
First, the models are compared between the case where the test data exist exactly in the learning data and the case where they do not; in other words, this section discusses the scalability of the models when facing new states.
Figure 3 shows the difference between the case where the test-day data existed in the learning data and the case where they did not. The results show that the demand-based model cannot guarantee an optimal response unless it learns the test-day data in the learning phase, although it certainly improves the cost. The reason lies in the exact matching of states in the binary and decreased binary models, which disregard the value of the customers' demand; in the demand-based model, by contrast, an insignificant change in an agent's demand can create a new state that has not been learned before. A close examination of the figure shows a trivial inexactness in the results of the demand-based model, which results from rounding and SVR (since matching takes place with the nearest state). Thus, the demand-based model, in spite of its simplicity, is not efficient in terms of computational accuracy in a real-world situation.
Uncertainty in real-world problems can always lead to inaccurate outputs. Considering the stochastic nature of energy production, in this section we study how the proposed models act under the best and the worst production rates. Given the high sensitivity of the demand-based model to the test-day data, we consider both conditions: test data present in, and missing from, the learning phase.
First, we consider the ideal state in which the rate of energy production has the best condition. This causes the grid
to provide the demanded load of the customers during the hours with the highest customer satisfaction. Table 2, created
according to the best production rate of the grid, illustrates that all models always provide an acceptable answer to the
given state if the data of the learning phase include the data of the test day. However, even if the data of the test day are
not learned exactly, the binary and decreased binary models mostly give an acceptable answer to the problem. The
results of this section reveal the principle that if the grid has its best production rate, the learning in the binary and
decreased binary models will occur in a way that the agents give an optimal answer most often.
FIGURE 3  The proposed models' behavior with test-day data: (A) demand-based model; (B) binary model; (C) decreased binary model
Real-world scenarios show that smart grids sometimes cannot produce the predicted level of energy because they rely on renewable resources, which are affected by factors such as changing weather conditions. Table 3 shows a situation in which the level of energy production is not as predicted. According to the results, the binary and decreased binary models act independently of the production rate and, in this case, often provide an optimized answer. Comparing the results of the demand-based model in the two settings shows that the problem of the model is not the way it learns; rather, it is the lack of exact state matching that makes it weaker than the other two models. In other words, if the stochastic nature of production is considered, all three models learn correctly; yet it is the structure of the states that prevents the demand-based model from guaranteeing an optimized answer upon observing a new state.
Another factor in need of investigation is λ. Since we use cooperative games, we predict that the customers tend to shift a considerable part of their load demand if necessary. In other words, we expect that the agents, released from their tendency to run some of their jobs at certain hours, participate more effectively in load shifting and create the conditions the grid requires. Our investigations show that the rate of customer participation in load shifting is inversely related to λ (see Figure 4). In this respect, according to our prediction, the agents should prefer the profit of the grid to their own satisfaction when necessary. Of course, in the real world some customers prefer their satisfaction over the overall profit of the grid. Since λ varies from person to person, such behavior is captured during the learning phase, and there is no reason to worry about trouble in the grid caused by this kind of customer.
TABLE 2  Responsiveness average

Method       Model             Test data in learning data (%)   Test data not in learning data (%)
Centralized  Demand-based      100                              46
             Binary            100                              100
             Decreased binary  100                              100

TABLE 3  Responsiveness average in the worst case

Method       Model             Test data in learning data (%)   Test data not in learning data (%)
Centralized  Demand-based      95                               38
             Binary            99                               94
             Decreased binary  99                               98
FIGURE 4  Percentage of agents' participation in load shifting as λ changes
In this article, we attempt to increase the customers' participation in load shifting during peak hours. Thus, we have presented results in Figure 5 showing that the agents are much more inclined to shift load at peak hours than at other hours. This can be explained by the way the reward function is defined. Sometimes load distribution takes place at off-peak hours because of the creation of trap states in the learning phase of the demand-based and binary models, which have no direct control over the creation of these states.
The customers of the grid are to be encouraged to change the time of their load demand, and reducing the cost of energy can be an enticement for them to participate in load shifting. In our next experiment, we study the effects of our models on the cost of energy and show why the customers will be satisfied with participating in load shifting. According to Figure 6, after load shifting, the average cost of energy is significantly reduced and the high cost imposed on the grid at peak hours is removed.
Figure 7A demonstrates the learning phase's running time, which is not critical since this phase runs only once (or once per season). Note that the demand-based model is inefficient here because its states are constructed from real numbers and it uses the SVR method.
In most studies, a rise in the number of customers degrades the running time. We claimed that our models remain efficient even as the number of customers increases; Figure 7B confirms this claim.
FIGURE 5  Agents' participation in load shifting for one day (λ = 0.6)
FIGURE 6 Effect of load shifting on cost change
FIGURE 7  The effect of the number of customers on the running time of the learning and test phases for the three proposed models: (A) learning-phase running time; (B) test-phase running time

FIGURE 8  Total demand of 10 customers in a year

FIGURE 9  Total demand of 10 customers for a week and a day
Figure 8 illustrates the total demand of the grid customers for each season. According to this figure, the customers' demand is routine and regular within a season. Therefore, peak hours are fixed for a season and customer demand is predictable, as we have assumed. Figure 9 displays the total customer demand for a week and a day in detail.
In line with the hypothesis, load shifting reduces the costs of the grid for both utility companies and customers,
resolves balancing problems, and enhances the grid security via reducing the number of interactions among customers.
The results of our simulations indicate that learning helps increase load shifting at peak hours without creating new peaks. It also paves the way to overcoming most of the problems of previous works, such as load spikes, selfish customers, customer dissatisfaction, and lack of scalability. Since the different situations of the grid are learned, an optimal sequence of actions leading to the goal states can be found. Thus, different types of customers will not change the results, as their behavior has already been learned.
Further research is required for the cases in which customers have power generation and storage devices. Also, since the proposed learning is centralized, extending it toward decentralized learning for load-shifting problems is suggested.
6|CONCLUSION
In this article, three models for managing energy in smart grids through load shifting were introduced, all of which operate based on centralized RL. The binary model was developed to eliminate the disadvantages of the demand-based model, and the decreased binary model was introduced to improve the learning rate of the binary model; it is the most complete of the three. According to the simulation results, the suggested methods reduce costs while improving the smart grid's performance in terms of energy production and distribution. Furthermore, the structures used improve the rate of decision making in the grid. The complications of finding a Nash equilibrium (which is required in most works in the field) are absent here, and the interactions between the agents are kept at a minimum level.
PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1002/2050-7038.12748.
DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.
ORCID
Mohsen Ghaffari https://orcid.org/0000-0002-1939-9053
Mohsen Afsharchi https://orcid.org/0000-0001-8329-9463
REFERENCES
1. Mohsenian-Rad A, Wong VW, Jatskevich J, Schober R, Leon-Garcia A. Autonomous demand-side management based on game-theoretic
energy consumption scheduling for the future smart grid. IEEE Trans Smart Grid. 2010;1:320-331.
2. Atzeni I, Ordóñez LG, Scutari G, Palomar DP, Fonollosa JR. Noncooperative and cooperative optimization of distributed energy generation and storage in the demand-side of the smart grid. IEEE Trans Signal Process. 2013;61:2454-2472.
3. Manshaei MH, Zhu Q, Alpcan T, Başar T, Hubaux J-P. Game theory meets network security and privacy. ACM Comput Surv. 2013;45:1-39.
4. Fadlullah ZM, Quan DM, Kato N, Stojmenovic I. GTES: an optimized game-theoretic demand-side management scheme for smart grid.
IEEE Syst J. 2014;8:588-597.
5. Chen H, Li Y, Louie RH, Vucetic B. Autonomous demand side management based on energy consumption scheduling and instanta-
neous load billing: an aggregative game approach. IEEE Trans Smart Grid. 2014;5:1744-1754.
6. Chai B, Chen J, Yang Z, Zhang Y. Demand response management with multiple utility companies: a two-level game approach. IEEE
Trans Smart Grid. 2014;5:722-731.
7. Wang Y, Saad W, Mandayam NB, Vincent Poor H. Load shifting in the smart grid: to participate or not? IEEE Trans Smart Grid. 2015;7
(6):2604-2614.
8. Wijaya TK, Larson K, Aberer K. Matching demand with supply in the smart grid using agent-based multiunit auction. In Fifth Interna-
tional Conference on Communication Systems and Networks (COMSNETS); 2013.
GHAFFARI AND AFSHARCHI 17 of 18
9. He M, Zhang F, Huang Y, Chen J, Wang J, Wang R. A distributed demand side energy management algorithm for smart grid. Energies.
2019;12(3):426.
10. Kalogeropoulos I, Sarimveis H. Predictive control algorithms for congestion management in electric power distribution grids. Appl Math
Model. 2020;77:635-651.
11. Jamil M, Mittal S. Hourly load shifting approach for demand side management in smart grid using grasshopper optimisation algorithm.
IET Gen Trans Dis. 2020;14(5):808-815.
12. Ali SNH, Lenzen M, Huang J. Shifting air-conditioner load in residential buildings: benefits for low-carbon integrated power grids. IET
Renew Power Gen. 2018;12(11):1314-1323.
13. Khalid Z, Abbas G, Awais M, Alquthami T, Rasheed MB. A novel load scheduling mechanism using artificial neural network-based customer profiles in smart grid. Energies. 2020;13(5):1-23.
14. Afzaal A, Kanwal F, Ali AH, Bashir K, Anjum F. Agent-based energy consumption scheduling for smart grids: an auction-theoretic
approach. IEEE Access. 2020;8:73780-73790.
15. Chamandoust H, Derakhshan G, Hakimi SM, Bahramara S. Tri-objective scheduling of residential smart electrical distribution grids
with optimal joint of responsive loads with renewable energy sources. J Energy Storage. 2020;27:101112.
16. Shahidehpour M, Yamin H, Li Z. Market Operations in Electric Power Systems: Forecasting, Scheduling, and Risk Management. Wiley-IEEE Press; 2002.
17. OpenEI. Available from http://en.openei.org/datasets/files/961/pub/.
18. Melo FS. Convergence of Q-learning: a simple proof; 2001.
How to cite this article: Ghaffari M, Afsharchi M. Learning to shift load under uncertain production in the
smart grid. Int Trans Electr Energ Syst. 2021;31:e12748. https://doi.org/10.1002/2050-7038.12748
APPENDIX A
Convergence of algorithm
In this section, the convergence of the proposed models is discussed. Since the method used to prove convergence is similar for all the models, we present the proof only once.
Theorem 1. For a finite Markov decision process (MDP), Q-learning always converges to the optimal value under standard conditions.
Since Theorem 1 was proved in Reference 18, to prove the convergence of the suggested models it suffices to show that the MDP of each model is finite.
Observation 1. Production and distribution costs of the grid are always of finite value.
Theorem 2. The suggested models converge to an optimal value.
Proof
According to Observation 1, the costs of the grid are never infinite. Thus, the costs of the grid for individual customers are finite and, consequently, the reward function, which is defined based on the customers' costs, cannot take an infinite value. Since the upper bounds of the costs of the grid and of the customers are finite, and the lower bound of the costs is also finite, cost reduction ranges over finite values. On the other hand, all defined actions are distinct and always occur at peak hours, and the number of customers is finite. Therefore, the number of load-shifting states cannot be infinite. It can thus be claimed that the suggested models always have a finite MDP and, according to Theorem 1, converge to an optimal value.
... Machine Learning (ML) approaches have been significantly used to design detection and prediction systems (Ghaffari & Afsharchi, 2020). Detection systems extract features of a phenomenon, like a disease, and detect the possibility of occurring by having some observations and witnesses. ...
Article
Full-text available
To manage the propagation of infectious diseases, particularly fast-spreading pandemics, it is necessary to provide information about possible infected places and individuals, however, it needs diagnostic tests and is time-consuming and expensive. To smooth these issues, and motivated by the current Coronavirus disease (COVID-19) pandemic, in this paper, we propose a learning-based system and a hidden Markov model (i) to assess hazardous places of a contagious disease, and (ii) to predict the probability of individuals’ infection. To this end, we track the trajectories of individuals in an environment. For evaluating the models and the approaches, we use the Covid-19 outbreak in an urban environment as a case study. Individuals in a closed population are explicitly represented by their movement trajectories over a period of time. The simulation results demonstrate that by adjusting the communicable disease parameters, the detector system and the predictor system are able to correctly assess the hazardous places and determine the infection possibility of individuals and cluster them accurately with high probability, i.e., on average more than 96%. In general, the proposed approaches to assessing hazardous places and predicting the infection possibility of individuals can be applied to contagious diseases by tailoring them to the influential features of the disease.
... This can yield the result of increasing customer satisfaction (Li et al., 2019). Otherwise, these projects cannot be preferable, the sustainability of these projects would be in jeopardy (Ghaffari and Afsharchi, 2021). Caputo et al. (2018) aimed to identify the critical factors of the performance improvement regarding smart grid projects. ...
Article
Full-text available
Smart grid systems help increase RWJ projects (RWJ) so that environmentally friendly energy production can be generated. However, efficient technologies should be implemented to ensure the sustainability of smart grid systems. This study aims to evaluate renewable-friendly smart grid technologies regarding distributed energy investment projects by using a hybrid picture fuzzy rough decision-making approach. Firstly, selected criteria are weighted using the multi stepwise weight assessment ratio analysis (M-SWARA) method based on picture fuzzy rough sets (PFRSs). Subsequently, different renewable-friendly smart grid technologies are ranked with the complex proportional assessment (COPRAS) technique by using PFRSs. It is determined that research and development play the most critical role with respect to the renewable-friendly smart grid technologies for distributed energy investment projects. On the other side, cost is another essential factor for this issue. It is also identified that direct current links are the most important renewable-friendly smart grid technology alternative. Priorities should be given to the development of research and development studies on renewable energies to increase the efficiency of smart grid systems. In this context, private sector companies have a very important role. Similarly, incentives provided by governments to RWJ research and development studies should be increased. Within the scope of these studies, new technologies for RWJ types should be emphasized. In this context, new technologies for all RWJ alternatives should be followed comprehensively. Increasing research and development for such investments will also make smart grid systems more successful.
Article
Full-text available
The future smart grid would help to benefit both the users and the electricity providing companies from smart pricing techniques. In addition, smart pricing can be used to achieve social objectives and would in turn fluctuate wholesale market into demand side. Collecting abundant information regarding the users electricity consumption pattern is a challenging task for utility providing companies. That is, users may not be willing to expose their indigenous information without any incentive. In this paper an Optimal Energy Consumption Scheduling (OECS) mechanism is proposed to tackle this problem. An agent-based forecasting method is designed, which is capable of predicting energy consumption of each consumer with a lead-time of one hour. This forecasting is exploited to estimate the cost of buying required amount of energy from multiple suppliers. Consequently, based on the estimated required energy and cost, an auction mechanism is proposed to optimize the energy traded between consumers and multiple suppliers within a smart grid. The objectives include increased efficiency and cost reduction of electricity usage by the end users. The results and properties of the proposed OECS mechanism are studied, and it is shown that the auction technique is budget balanced for distribution of electrical energy among consumers from diverse renewable generation resources. Extensive numerical simulations are also conducted to show and prove the beneficial properties of OECS mechanism.
Article
Full-text available
In most demand response (DR) based residential load management systems, shifting a considerable amount of load in low price intervals reduces end user cost, however, it may create rebound peaks and user dissatisfaction. To overcome these problems, this work presents a novel approach to optimizing load demand and storage management in response to dynamic pricing using machine learning and optimization algorithms. Unlike traditional load scheduling mechanisms, the proposed algorithm is based on finding suggested low tariff area using artificial neural network (ANN). Where the historical load demand individualized power consumption profiles of all users and real time pricing (RTP) signal are used as input parameters for a forecasting module for training and validating the network. In a response, the ANN module provides a suggested low tariff area to all users such that the electricity tariff below the low tariff area is market based. While the users are charged high prices on the basis of a proposed load based pricing policy (LBPP) if they violate low tariff area, which is based on RTP and inclining block rate (IBR). However, we first developed the mathematical models of load, pricing and energy storage systems (ESS), which are an integral part of the optimization problem. Then, based on suggested low tariff area, the problem is formulated as a linear programming (LP) optimization problem and is solved by using both deterministic and heuristic algorithms. The proposed mechanism is validated via extensive simulations and results show the effectiveness in terms of minimizing the electricity bill as well as intercepting the creation of minimal-price peaks. Therefore, the proposed energy management scheme is beneficial to both end user and utility company.
Article
Full-text available
In this new era of communication, the advent of the smart grid has revolutionised the power system network. The goal of smart grids is to provide a more reliable, environment‐friendly and economically efficient power system. Demand side management or demand side response is one of the key components of the smart grid which accomplishes the smart grid that would provide intelligence to the traditional grid. Here, a new approach has been proposed for the demand side management, which is based on shifting a load from peak to off‐peak time. The main objective of the work is to reduce the peak hour demand and the utility bill of the consumers. To achieve these objectives, the proposed strategy is modelled as a minimised optimisation problem and it tries to find out the optimal solution. For that, two optimisation algorithms, the first one is particle swarm optimisation algorithm and the second one is grasshopper optimisation algorithm, are proposed and applied in three area loads of the smart grid, i.e. residential, commercial and industrial. The obtained simulation results show a significant reduction in peak hour demand and utility bills.
Article
This paper proposes a model predictive control (MPC) framework-based distributed demand side energy management method (denoted as DMPC) for users and utilities in a smart grid. The users are equipped with renewable energy resources (RESs), energy storage systems (ESSs) and different types of smart loads. With the proposed method, each user individually finds an optimal operation routine in response to the varying electricity prices according to his/her own preference, for example, the power reduction of flexible loads, the start times of shiftable loads, the operating power of schedulable loads, and the charge/discharge routine of the ESSs. Moreover, the method uses a penalty term to avoid large fluctuations of a user's operation routine between two consecutive iteration steps. In addition, unlike traditional energy management methods, which neglect forecast errors, the proposed DMPC method can adapt the operation routine to newly updated data. The DMPC is compared with a frequently used day-ahead programming-based method (denoted as DDA). Simulation results demonstrate the efficiency and flexibility of the DMPC over the DDA method.
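The receding-horizon idea behind MPC — re-plan over the forecast window at every step, but commit only the first decision — can be sketched with a toy battery-charging example (the function and the greedy inner planner are hypothetical simplifications of the paper's optimisation):

```python
def mpc_battery(price_forecasts, horizon, cap, rate):
    """Receding-horizon sketch: at each step, re-plan charging over the
    current (possibly revised) price forecast, then commit only the first
    decision, so the schedule adapts to newly updated data."""
    soc, committed = 0.0, []
    for forecast in price_forecasts:           # one updated forecast per step
        window = forecast[:horizon]
        need = cap - soc
        charge = [0.0] * len(window)
        # inner plan: charge in the cheapest hours of the window until full
        for h in sorted(range(len(window)), key=lambda i: window[i]):
            take = min(rate, need)
            charge[h] = take
            need -= take
            if need <= 0:
                break
        soc += charge[0]                       # commit only the first step
        committed.append(charge[0])
    return soc, committed

forecasts = [
    [0.30, 0.10, 0.20],   # window seen at step 0: wait, cheaper hours ahead
    [0.10, 0.25, 0.40],   # revised forecast at step 1: charge now
    [0.12, 0.50, 0.60],   # revised forecast at step 2: charge now
]
soc, log = mpc_battery(forecasts, horizon=3, cap=2.0, rate=1.0)
print(soc, log)
```

A day-ahead (DDA-style) scheme would instead solve once against the step-0 forecast and never react to the later revisions, which is exactly the difference the abstract highlights.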
Article
Demand-side management (DSM) has emerged as an important smart grid feature that allows utility companies to maintain desirable grid loads. However, the success of DSM is contingent on active customer participation. Indeed, most existing DSM studies are based on game-theoretic models that assume customers will act rationally and will voluntarily participate in DSM. In contrast, in this paper, the impact of customers' subjective behavior on each other's DSM decisions is explicitly accounted for. In particular, a noncooperative game is formulated between grid customers in which each customer can decide on whether or not to participate in DSM. In this game, customers seek to minimize a cost function that reflects their total payment for electricity. Unlike classical game-theoretic DSM studies, which assume that customers are rational in their decision-making, a novel approach is proposed, based on the framework of prospect theory (PT), to explicitly incorporate the impact of customer behavior on DSM decisions. To solve the proposed game under both conventional game theory and PT, a new algorithm based on fictitious play is proposed, under which the game reaches an epsilon-mixed Nash equilibrium. Simulation results assess the impact of customer behavior on demand-side management. In particular, the overall participation level and grid load can depend significantly on the rationality level of the players and their risk aversion tendency.
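As a concrete illustration of fictitious play in a participation game, the toy model below uses made-up payoffs (participating always nets 1; free-riding pays 3 only if the other customer participates), so the mixed equilibrium has each customer participating with probability 1/3 — the empirical play frequencies converge there:

```python
def fictitious_play(rounds=20000):
    """Fictitious play for a 2-player DSM participation game (toy payoffs).
    Each round, every player best-responds to the opponent's empirical
    action frequencies; ties are broken in favour of participating."""
    def counts_to_freq(c):
        tot = c[0] + c[1]
        return c[1] / tot if tot else 0.5       # empirical P(participate)

    counts = [[0, 0], [0, 0]]                   # counts[i][a]: times i played a
    for _ in range(rounds):
        acts = []
        for i in (0, 1):
            p = counts_to_freq(counts[1 - i])   # opponent's participation rate
            u_participate = 1.0                 # benefit 3 minus effort 2
            u_free_ride = 3.0 * p               # benefit only if other acts
            acts.append(1 if u_participate >= u_free_ride else 0)
        for i in (0, 1):
            counts[i][acts[i]] += 1
    return [c[1] / rounds for c in counts]      # empirical participation rates

print(fictitious_play())   # both rates approach 1/3
```

A PT variant would simply reweight `u_participate` and `u_free_ride` through a probability-weighting and value function before the comparison, shifting where the frequencies settle.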
Article
In this paper, we investigate a practical demand side management scenario where selfish consumers compete to minimize their individual energy cost through scheduling their future energy consumption profiles. We adopt an instantaneous load billing scheme to effectively convince the consumers to shift their peak-time consumption and to charge the consumers fairly for their energy consumption. For the considered DSM scenario, an aggregative game is first formulated to model the strategic behaviors of the selfish consumers. By resorting to variational inequality theory, we analyze the conditions for the existence and uniqueness of the Nash equilibrium (NE) of the formulated game. Subsequently, for the scenario where a central unit calculates and sends the real-time aggregated load to all consumers, we develop a one-timescale distributed iterative proximal-point algorithm with provable convergence to the NE of the formulated game. Finally, considering the alternative situation where the central unit does not exist, but the consumers are connected and willing to share their estimated information with others, we present a distributed synchronous agreement-based algorithm and a distributed asynchronous gossip-based algorithm, by which the consumers can achieve the NE of the formulated game through exchanging information with their immediate neighbors.
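The central-unit variant can be illustrated with a damped best-response iteration on a toy aggregative game — a simplified stand-in for the proximal-point scheme, with an assumed cost c_i = α·X·x_i + β·(x_i − d_i)², where X is the aggregate load the central unit broadcasts and d_i is consumer i's nominal demand:

```python
def aggregative_game_ne(demands, alpha=0.1, beta=1.0, tau=0.5, iters=200):
    """Damped best-response iteration for a toy aggregative DSM game.
    Each consumer minimises alpha*X*x_i + beta*(x_i - d_i)**2, where the
    aggregate X = sum(x) is broadcast by a central unit each round; the
    damping factor tau plays the stabilising role of the proximal term."""
    x = list(demands)                     # start from the nominal demands
    for _ in range(iters):
        X = sum(x)
        new = []
        for xi, di in zip(x, demands):
            # best response given the others' load X - xi (from d/dx_i = 0)
            br = (2 * beta * di - alpha * (X - xi)) / (2 * alpha + 2 * beta)
            new.append((1 - tau) * xi + tau * br)
        x = new
    return x

demands = [5.0, 3.0, 4.0]                 # hypothetical nominal loads, kWh
print([round(v, 3) for v in aggregative_game_ne(demands)])
```

At the fixed point every consumer's schedule equals its best response, i.e., a Nash equilibrium of the toy game; each consumer ends up slightly below its nominal demand because the aggregate-dependent price penalises total load.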
Article
In this paper, model predictive control methodologies are developed to address two main issues which arise in electric power distribution systems, namely the congestion of the distribution lines and the balancing problem. Consumer energy demand is divided into an uncontrollable part and a controllable part that can either be stored in energy storage devices to be consumed at later times or shifted in time, in the form of hourly consumption or consumption that maintains a pattern. Demand-response strategies involve consumers actively in the balancing effort and are part of the MPC methodologies, which are formulated as Mixed Integer Quadratic Programming optimization problems involving both continuous and binary variables. Finally, these new developments are tested on the IEEE European Low Voltage Test Feeder, which highlights the performance of the proposed control schemes.
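The binary side of such a mixed-integer formulation — choosing when a fixed consumption pattern starts — can be mimicked by brute force over the allowed start intervals against a quadratic congestion cost (the data and cost function below are hypothetical, not the paper's model):

```python
def best_start(base, pattern, allowed_starts):
    """Pick the start interval of a fixed consumption pattern that minimises
    a quadratic congestion cost sum(load_t**2), a brute-force stand-in for
    the binary start-time variables of an MIQP."""
    def cost(start):
        prof = base[:]
        for k, p in enumerate(pattern):
            prof[start + k] += p          # overlay the pattern at `start`
        return sum(v * v for v in prof)   # quadratic cost penalises peaks
    return min(allowed_starts, key=cost)

base = [5.0, 2.0, 1.0, 1.5, 4.0]          # uncontrollable load per interval
pattern = [2.0, 2.0]                      # two consecutive intervals of 2 kW
start = best_start(base, pattern, allowed_starts=[0, 1, 2, 3])
print(start)                              # pattern lands in the valley
```

A real MIQP solver handles this jointly with the continuous storage and curtailment variables; the quadratic objective is what steers the pattern into the low-load valley here.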
Article
This study presents a simulation of low-carbon electricity supply for Australia, contributing new knowledge by demonstrating the benefits of load shifting in residential buildings for downsizing renewable electricity grids comprising wind, hydro, biomass, and solar resources. The load-shifting potential for the whole of Australia is estimated, based on air-conditioner load data and an insulation model for residential buildings. Load shifting is applied to enable transferring residential air-conditioner load from peak to off-peak periods, assuming that air-conditioners can be turned on a few hours ahead of need, during periods where demand is low and renewable resource availability is high, and turned off during periods of high demand and low resource availability. Thus, load shifting can effectively reduce installed capacity requirements in renewable electricity grids. With 1 h of load shifting of residential air-conditioners, Australian electricity demand can be met at the current reliability standards by 130 GW of installed capacity, at a cost around 12.5 ¢/kWh and a capacity factor of 32%. The installed capacity can be further reduced by increasing the number of hours that loads can be shifted. The findings suggest that the application of load shifting in residential buildings can play a significant role for power networks with high renewable energy penetration.
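As a sanity check on the quoted figures, the capacity factor ties installed capacity to delivered energy (average output divided by installed capacity): 130 GW at a 32% capacity factor corresponds to roughly 364 TWh per year:

```python
def annual_energy_twh(installed_gw, capacity_factor, hours_per_year=8760):
    """Energy delivered per year from installed capacity and capacity factor
    (capacity factor = average output / installed capacity)."""
    return installed_gw * capacity_factor * hours_per_year / 1000.0  # GWh -> TWh

# Figures quoted in the abstract: 130 GW installed at a 32% capacity factor.
print(round(annual_energy_twh(130, 0.32), 1))   # -> 364.4 TWh/year
```

This is the gross generation implied by the simulation's sizing; reducing the required installed capacity via longer load-shifting windows raises the effective capacity factor for the same served demand.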
Article
This survey provides a structured and comprehensive overview of research on security and privacy in computer and communication networks that use game-theoretic approaches. We present a selected set of works to highlight the application of game theory in addressing different forms of security and privacy problems in computer networks and mobile applications. We organize the presented works in six main categories: security of the physical and MAC layers, security of self-organizing networks, intrusion detection systems, anonymity and privacy, economics of network security, and cryptography. In each category, we identify security problems, players, and game models. We summarize the main results of selected works, such as equilibrium analysis and security mechanism designs. In addition, we provide a discussion on the advantages, drawbacks, and future direction of using game theory in this field. In this survey, our goal is to instill in the reader an enhanced understanding of different research approaches in applying game-theoretic methods to network security. This survey can also help researchers from various fields develop game-theoretic solutions to current and emerging security problems in computer networking.