Received: 17 January 2023 Revised: 23 February 2023 Accepted: 15 March 2023 IET Generation, Transmission & Distribution
DOI: 10.1049/gtd2.12828
ORIGINAL RESEARCH
Machine learning based load prediction in smart-grid under
different contract scenario
Piyush Kumar Yadav1, Rajnish Bhasker1, Albert Alexander Stonier2, Geno Peter3, Arun Vijayakumar4, Vivekananda Ganji5
1Department of Electrical Engineering, Veer
Bahadur Singh Purvanchal University, Jaunpur,
India
2School of Electrical Engineering, Vellore Institute
of Technology, Vellore, India
3CRISD, University of Technology Sarawak, Sibu,
Malaysia
4Department of Electrical and Electronics
Engineering, Sree Vidyanikethan Engineering
College, Tirupati, India
5Department of Electrical and Computer
Engineering, Debre Tabor University, Amhara,
Ethiopia
Correspondence
Vivekananda Ganji, Department of Electrical and
Computer Engineering, Debre Tabor University,
Amhara, Ethiopia.
Email: vivekganji@dtu.edu.et
Funding information
University Research Grant, Grant/Award Number:
UTS/3/2022/06; Centre for Research of Innovation
& Sustainable Development (CRISD) of University
of Technology Sarawak, Malaysia
Abstract
Many advanced data-analytic techniques, especially machine learning (ML) and deep learning methods, have been proposed and have found wide application in society. This work develops data-driven solutions for energy management system problems by using recent deep learning and ML technology, including ensemble learning, meta-learning and transfer learning. Real-world datasets are tested on the proposed models and compared with state-of-the-art schemes, which demonstrates the superior performance of the proposed model. In this work, the architecture of a Smart Grid testbed is also designed and developed using ML algorithms and real-world wireless communication systems, such that the real-time design requirements of the Smart Grid testbed are met by a reconfigurable system with a full protocol stack in the medium access control (MAC) and physical (PHY) layers. The proposed architecture is reconfigurable owing to its deployment of wireless communication and advanced information and communication technologies (ICT), which incorporate artificial intelligence (AI) algorithms. The main design goals of the Smart Grid testbed are to make it easy to build, reconfigure and scale, in order to address both the system-level requirements and the real-time requirements.
1 INTRODUCTION
Rapid urbanization brings enormous changes to individual lifestyles. With this trend, many challenging issues arise, such as environmental pollution, traffic congestion and high energy consumption. To address these issues, the concept of urban computing has been introduced, which involves collecting, integrating and analysing the data generated by devices in an urban area to improve quality of life [1]. With the rapid development of artificial intelligence, machine learning techniques, in particular deep learning, show high potential for addressing many urban computing problems. This is mainly due to the breakthroughs
in computing and the rapid advances in sensing, data acquisition, transmission and storage [2]. Researchers can now handle massive amounts of data and use them more effectively. High energy efficiency, demand-side management, renewable energy sources and a two-way flow of information and power, as enabled by the integration of communications, control and signal processing, characterize the current sustainable urban power system, that is, the smart grid [3].
The advancement of generation, transmission, operation and utilization significantly influences smart grid development, affecting its planning and operation, and bringing new perspectives to energy and demand response in the smart grid [4].
FIGURE 1 Traditional electric power grid system.
New and clean devices, as well as modern technologies in data analytics, communication systems, control and information theory, enable an advanced power system with higher energy efficiency and power-delivery resilience [5]. The earlier conventional electric power systems shown in Figure 1, which are electromechanically controlled, have lately been evolving towards electronically controlled networks. The various electric processes occurring within smart grid systems are coordinated by the smart grid's management of information, control methods, smart sensing and smart devices. Recently, smart grid systems, as shown in Figure 2, have been increasingly used to transform the planning and operation of conventional grid systems in three principal categories, owing to their ability to increase efficiency, decrease costs and improve reliability. These include: (1) real-time monitoring, which enables data to be shared between operation centres and automatically adjusts the process to improve it, (2) sharing smart device and system information and (3) processing and analysing data in an intelligent manner.
The smart grid is an emerging technology that comes with its own set of challenges. Perhaps the main issue is that a smart grid is composed of a complex network of devices. Four ongoing issues are associated with smart grid systems; one significant part of smart grids involves load forecasting, reliability assessment, fault detection and security in smart networks. Because of the extremely large amount of multi-type, multi-dimensional data, this information must be collected from smart grids through smart devices in order to resolve these issues. This can be accomplished by using Artificial Intelligence (AI) technologies. In conventional power network systems, by contrast, there are many limitations in modelling, optimizing and controlling such very large data.
A deep neural network may not always be the best choice for diverse learning challenges; it is vital to select the optimal neural network structure correctly. In this paper, a deep (LSTM) structure and a shallow (FCC) structure are used for the two distinct levels of learning. It is commonly understood that the deeper the neural network, the more likely it is to overfit. As a result, a learner that can provide significant learning capacity while employing as few layers as feasible is very desirable. The first-level learner in the proposed paradigm captures the majority of the non-linear relationship between input and output data, whereas the second-level learner discerns the linear connection between them.
FIGURE 2 Smart grid architecture.
The principal benefit of AI employing ML algorithms is that machines learn from data automatically, without the need for human intervention. ML algorithms help to build a model automatically based on the patterns and interrelations in the data. This is also called supervised learning, and it enables prediction of future values from past values [18]. ML algorithms can be grouped into two types: supervised and unsupervised learning.
Consequently, very large data can be handled by AI methods using intelligent machines. One approach to implementing AI strategies is to use ML algorithms. AI techniques also use Artificial Neural Networks (ANN), Robotics, Expert Systems (ES) and Fuzzy Logic (FL) for efficient data processing with speed and precision. The use of AI techniques in the smart grid has enormous benefits. AI techniques can be integrated into an intelligent machine or computer which acts as a smart system, and all of the functions can be controlled by this machine. Any problems related to power systems can also be easily addressed, making them self-healing, as AI methods provide more accurate and reliable solutions. In some cases, AI techniques may not be able to replace grid operators; consequently, many challenges are associated with using AI methods in grids. In smart grid applications, AI techniques can be applied in two ways: using physical AI methods or using virtual AI methods. Further, Artificial Narrow Intelligence (ANI) and Artificial General Intelligence (AGI) systems can be distinguished. In ANI, the AI systems are designed for specific tasks based on the requirements and constraints, for example, load forecasting using various datasets. Artificial General Intelligence (AGI) describes AI systems that are designed and developed with the goal of learning autonomously in a manner similar to humans. Thus much effort is being made in developing AGI systems for smart grid applications.
The overall objective of the paper is a comparative performance analysis of different ML algorithms, based on accuracy and precision along with misclassification, for the load prediction problem.
The remaining sections of the paper are structured as follows: The load forecasting methods are described in Section 2. The problem statement and the proposed ensemble learning framework are given in Sections 3 and 4, respectively. The first-level learner and the training and optimization are explained in Sections 5 and 6, respectively. Algorithms for gradient descent and the modified Levenberg-Marquardt (LM) algorithm are covered in Sections 7 and 8. The datasets and results are discussed in Sections 9 and 10, respectively. Conclusions are discussed in the final section.
2 LOAD FORECASTING
Various approaches for load forecasting have been proposed. The two major categories that are broadly applied are computational intelligence approaches and quantitative approaches. For instance, one work presents an ensemble technique based on
a common learning machine for fault diagnosis. A Radial Basis Function (RBF) neural network trained with a second-order algorithm has also been used for fault diagnosis [6]. The neural network design in these two schemes is only slightly improved. Owing to its demonstrated success in computer vision and natural language processing (NLP), deep learning has become a popular approach. Recurrent neural network architectures, such as Long Short-Term Memory (LSTM), are among the widely used deep learning models. In [7], the LSTM was introduced for managing residential data. An LSTM-based Sequence-to-Sequence (S2S) mapping can handle both one-second and one-hour resolution data for one residential customer, as shown in [8]. In [9], the authors focus on short-term forecasting, applying LSTM to an individual consumer. Using a Deep Residual Network (ResNet), the usefulness of accurate short-term load forecasting is also demonstrated. In addition, quantile regression is an effective method for load estimation: some authors use the quantile regression paradigm to improve forecasting performance, while others work on combining conventional quantile regression concepts and demonstrate their usefulness in probabilistic load forecasting.
To deal with the load forecasting problem, an ensemble learning approach is proposed in this work. Two levels of learners are included in the proposed framework. An LSTM variant is used by the first-level learner to obtain the first-stage predictions, while a fully connected cascade (FCC) neural network forms the second-level learner for model fusion. Three main contributions are made by the suggested design. First, load forecasting is a regression problem to which ensemble learning cannot be applied directly. The suggested structure combines ensemble learning with a designed data-partitioning methodology for practical load forecasting, which is a new way of thinking that departs from previous load forecasting models. In particular, clustering algorithms are used in the proposed design to divide records into separate groups based on their similarity. Each data group is then used to train an LSTM base model for the first-level prediction. The first-level prediction outcomes are then consolidated by a second-stage FCC neural network, trained in a supervised manner, to improve the accuracy of load forecasting. Secondly, for different learning problems, a large neural network may not always be the best; it is critical to choose the proper neural network structure adequately [10]. A deep (LSTM) structure and a shallow (FCC) structure are chosen for the two distinct levels of learning in this work. It is clear that the deeper the neural network, the higher the likelihood of overfitting. As a result, a learner that can provide strong learning capacity while using as few layers as possible is very appealing. In the proposed structure, the first-level learner captures most of the non-linear relationship between input and output data, while the second-level learner captures the linear relationship between them [11].
FIGURE 3 Window filter method as well as the generation of input and
output data.
3 PROBLEM STATEMENT
This section formulates the load forecasting problem. Consider a time series of length T, Y_T = {f_1, f_2, \ldots, f_m; l}. Y_T is made up of two parts: the feature part and the load part. In the feature part, f_i = {f_{i1}, f_{i2}, \ldots, f_{iT}} is the historical record of the i-th factor that influences load. Temperature, for example, is one of the typically included factors that influence the power load. If features are not specified in the dataset, this part is set to null, and the forecasting relies only on the previously recorded load records, that is, the historical load data carried by the load part [12].

The goal is to estimate the load at T+\tau in a rolling-forecast manner, where \tau is the forecast horizon beyond the current time T. That is, we assume that the records are available at and before T, that is, Y_t for t \le T, while predicting l_{T+\tau}. For example, Y_T is available and used to estimate the load value at time T+1 (i.e. one step ahead). To make training easier and cut down on training time, a window filter W is applied to Y_T, which keeps only the data for w time steps, from time T back to time T-w+1. S_T = W(Y_T), which is an m \times w matrix, is the resulting compact representation of the input data.

A fitting function g(\cdot) yields the forecast value \hat{l}_{T+\tau} as follows:

\hat{l}_{T+\tau} = g(S_T)    (1)

The objective of the proposed AI-based predictive strategy is to learn the fitting function g(\cdot) from the available dataset Y_T, as shown in Figure 3.
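To make the window filter W and the rolling setup concrete, the following NumPy sketch builds (S_T, l_{T+\tau}) training pairs. It is a minimal illustration under assumed shapes, not the authors' code; the helper names and the toy data are placeholders.

```python
import numpy as np

def window_filter(Y, T, w):
    """Return S_T = W(Y_T): the m x w slice covering times T-w+1 .. T."""
    return Y[:, T - w + 1 : T + 1]

def make_training_pairs(Y, load, w, tau):
    """Build (S_T, l_{T+tau}) pairs for a rolling forecast."""
    m, N = Y.shape
    inputs, targets = [], []
    for T in range(w - 1, N - tau):
        inputs.append(window_filter(Y, T, w))
        targets.append(load[T + tau])
    return np.stack(inputs), np.array(targets)

# Example: hourly load only (m = 1), one-week window, day-ahead target.
load = np.random.rand(1000)
Y = load[None, :]                    # 1 x N matrix; no extra feature rows
X, y = make_training_pairs(Y, load, w=168, tau=24)
print(X.shape, y.shape)              # (809, 1, 168) (809,)
```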
4 PROPOSED ENSEMBLE LEARNING FRAMEWORK
The principle of stacking is incorporated in our system [13] to achieve high precision in load prediction. Stacking is a technique for training individual ML models first and then combining them [14].
FIGURE 4 The proposed load prediction framework with two tiers of learners.
In the suggested system, there are two levels of learners: the first-level learner comprises several individual learning models, and the second-level learner is used to combine the findings from the first-level learners into a blended outcome. To meet the requirements of stacking and testing, the data must be separated into three parts. The first part of the data (denoted D1) is used by the first-level learners. After the first-level learning models are built and trained, this level of learner produces fresh data that are blended with the second and third parts of the data (denoted D2 and D3, respectively). The blended two parts of records are used to train and test the second-level learner. We suggest the use of the LSTM, a recurrent neural network variant, for first-level learning and the FCC neural network for second-level learning in this part. The proposed system's design is depicted in Figure 4. Following preprocessing, the dataset is split into three parts for training and testing: D1, D2 and D3. A clustering algorithm together with a set of LSTM models forms the first-level learner, and an FCC model forms the second-level learner in the proposed design. The design of these parts is elucidated in detail in the remainder of this section.
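The stacking flow just described can be sketched compactly: cluster D1, fit one base model per cluster, train a combiner on the base models' predictions over D2, then predict on D3. The base_factory argument and the Ridge combiner below are stand-ins for the paper's LSTM and FCC networks, used only to show the data flow.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge   # stand-in for the FCC combiner

def fit_stacked(X1, y1, X2, y2, base_factory, k=3):
    """Train k base models on clusters of D1, then a level-2 combiner on D2."""
    labels = KMeans(n_clusters=k, random_state=0).fit_predict(X1)
    bases = []
    for c in range(k):
        model = base_factory()
        model.fit(X1[labels == c], y1[labels == c])
        bases.append(model)
    # Level-2 features: every base model predicts every D2 sample.
    Z2 = np.column_stack([m.predict(X2) for m in bases])
    combiner = Ridge().fit(Z2, y2)
    return bases, combiner

def predict_stacked(bases, combiner, X3):
    """Blend the base forecasts into a single prediction for test data D3."""
    Z3 = np.column_stack([m.predict(X3) for m in bases])
    return combiner.predict(Z3)
```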
5 FIRST-LEVEL LEARNER
The first-level learner consists of several LSTM prediction models as well as a clustering algorithm, as detailed in Algorithm 1. The clustering algorithm separates D1 into D11, D12, ..., and D1k, each of which is used to build its own LSTM model.
5.1 Clustering
Before the LSTM models can use the data, we apply a clustering algorithm to partition the dataset based on the similarity of the input data samples. Clustering is an unsupervised ML strategy that refers to the most widely used way of grouping unlabelled data into clusters with similar components [15]. This is in contrast to classification, which relies on labelled data. It is commonly recognized that power consumption is driven by a variety of observable factors such as temperature and calendar dates (e.g., workday, event, month, season etc.), as well as uncertainties or latent factors. We propose using unsupervised learning in our design for the following reasons. First, grouping the input load data into appropriate clusters and using a different learning model for each cluster can help to exploit the correlation in the dataset more effectively. Second, we believe that historical data from before the forecast period will influence short-term load distributions; clustering can thus partition the unlabelled historical electric load data fairly. Finally, partitioning the training dataset first and then combining the model results later resembles a form of resampling. This is similar to the cross-validation method, which can help reduce overfitting.
5.2 Clustering algorithms
For the most part, the aggregated power load time series data is affected little by noise, shift or distortion. Even so, to handle such data it is critical to select a suitable clustering strategy from the variety of existing options. In this section, four representative algorithms from three types of clustering methods are considered: (i) partitioning approaches, (ii) hierarchical procedures and (iii) density-based methods. K-means++, BIRCH, DBSCAN and HDBSCAN are the algorithms that have also been used in the literature. A few data samples are treated as outliers by DBSCAN and HDBSCAN; in the proposed structure, such outlier data is handled as a separate cluster.
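All four algorithms are available in common Python packages: K-means++, BIRCH and DBSCAN in scikit-learn, and HDBSCAN in the separate hdbscan package. Below is a hedged sketch of partitioning flattened D1 windows, with the DBSCAN/HDBSCAN noise label -1 folded into its own cluster as described above; all parameter values are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans, Birch, DBSCAN
import hdbscan   # separate package; not part of scikit-learn 0.20

X = np.random.rand(500, 168)          # flattened load windows from D1

kmeans_labels = KMeans(n_clusters=13, init="k-means++").fit_predict(X)
birch_labels = Birch(n_clusters=13).fit_predict(X)
db_labels = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)
hdb_labels = hdbscan.HDBSCAN(min_cluster_size=10).fit_predict(X)

# DBSCAN and HDBSCAN mark outliers with -1; treat them as one extra cluster.
db_labels[db_labels == -1] = db_labels.max() + 1
hdb_labels[hdb_labels == -1] = hdb_labels.max() + 1
```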
5.3 Long Short-Term Memory (LSTM)
Thanks to the clever notion of employing three types of gates to control the data flow and remember information for an arbitrary time span, the LSTM overcomes the limited long-term memory capability of recurrent neural networks. An unfolded layout of the LSTM neural network is illustrated in Figure 5. The input gate i_t, forget gate f_t, output gate o_t and state unit c_t are the four critical pieces of each LSTM cell (for time t). The state of the LSTM cell at time t is determined as,
i_t = \sigma(W_i h_{t-1} + U_i x_t + b_i)    (2)

f_t = \sigma(W_f h_{t-1} + U_f x_t + b_f)    (3)

c_t = f_t \cdot c_{t-1} + i_t \cdot \sigma(W_c h_{t-1} + U_c x_t + b_c)    (4)

o_t = \sigma(W_o h_{t-1} + U_o x_t + b_o)    (5)

h_t = \tanh(c_t) \cdot o_t    (6)

FIGURE 5 The LSTM neural cell structure in its entirety.

FIGURE 6 In the second-level learner, an example of an FCC ensemble neural network.
In the training stage, each LSTM model LSTM_i is trained with its data partition D1_i, i = 1, 2, ..., k, as shown in Figure 6.
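A minimal Keras sketch of one first-level base model trained on a cluster D1_i is given below, consistent with the toolchain listed in Section 8.1. The layer width, epoch count and the Adam optimizer (cf. Section 7) are illustrative assumptions, not the paper's exact settings.

```python
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

def build_base_lstm(w=168, m=1, units=64):
    """One LSTM predictor for a single data cluster D1_i."""
    model = Sequential([
        LSTM(units, input_shape=(w, m)),  # window of w steps, m features each
        Dense(1),                         # scalar load forecast for T + tau
    ])
    model.compile(optimizer="adam", loss="mse")  # level-one training uses Adam
    return model

# X_i: windows of cluster i, shaped (samples, w, m); y_i: loads at T + tau.
X_i = np.random.rand(256, 168, 1).astype("float32")
y_i = np.random.rand(256).astype("float32")
model = build_base_lstm()
model.fit(X_i, y_i, epochs=2, batch_size=32, verbose=0)
```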
5.4 Testing process in the first-level learner
New data samples beyond D1 (i.e. in D2 and D3, respectively) are fed into the first-level learner during the training stage of the second-level learner and during the testing stage. How to route them through the data partitions must be decided with care. One method is to pick the closest cluster and use the correspondingly trained LSTM model. Instead, we recommend using ensemble learning at this point, motivated by the configuration in which all of the homogeneous first-level models contribute to the power load forecast. As a result, each first-level LSTM model is fed with the new data, and an FCC neural network is used at the second level to merge the results from the LSTM models into a single prediction.

The second-level learner is built using dataset D2. The data samples in D2 are fed into each pre-trained LSTM predictor at the first level, and each LSTM predictor thus provides a corresponding forecast. These outputs are used to train the second-level learner.
At level two, the FCC neural network is used for ensemble learning. Figure 6 depicts an example of the FCC network's connections. In this example, k base models are available and are merged through five neurons. The tanh(.) activation is applied in the first four neurons, and the final neuron is a simple summation. The FCC neural network design is superior to conventional neural network structures with the same number of neurons at level two, since it provides more connections (and weights) than the standard design, making it more expressive. In some ways, the FCC neural network resembles deep residual networks, which feed each input and latent variable directly to every neuron.
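To make the cascade topology concrete, the following NumPy forward pass mimics the FCC combiner described above: each neuron sees the k base forecasts plus the outputs of all earlier neurons, the first neurons use tanh(.), and the last neuron is a linear summation. The weights are random placeholders; training them is the subject of Sections 6-8.

```python
import numpy as np

def fcc_forward(x, weights):
    """Fully connected cascade: neuron j sees the inputs plus all
    previous neuron outputs; the last neuron is a linear summation."""
    outputs = []
    for j, w in enumerate(weights):
        z = np.concatenate([x, outputs])   # cascade connections
        a = w @ np.append(z, 1.0)          # trailing 1.0 is the bias input
        outputs.append(np.tanh(a) if j < len(weights) - 1 else a)
    return outputs[-1]

k = 5                                      # number of base LSTM forecasts
rng = np.random.default_rng(0)
# Five neurons: neuron j has k + j inputs plus one bias weight.
weights = [rng.normal(size=k + j + 1) for j in range(5)]
base_preds = rng.random(k)                 # outputs of the k base models
print(fcc_forward(base_preds, weights))
```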
6 TRAINING AND OPTIMIZATION
6.1 Formulation of the problem
For the two levels of learners, the default loss function is the total squared error. At level one, the objective function used to train LSTM model i is described as

L^{(L1;i)} = \min_{\omega^i_{lstm}} \sum_{T \in D_{1i}} \left( \hat{l}^{L1;i}_{T+\tau} - l_{T+\tau} \right)^2 + \alpha \left\| \omega^i_{lstm} \right\|_1    (7)

where \hat{l}^{L1;i}_{T+\tau} is the load value predicted for time T+\tau by LSTM model i, l_{T+\tau} is the ground truth (i.e. the label), and \omega^i_{lstm} are the weights of first-level LSTM model i.
If the first-level learner has k trained LSTM models, the load predicted by the level-two learner at time T+\tau is given by

\hat{l}^{L2}_{T+\tau} = f\left( \hat{l}^{L1;1}_{T+\tau}, \hat{l}^{L1;2}_{T+\tau}, \ldots, \hat{l}^{L1;k}_{T+\tau}; \omega_{fcc} \right)    (8)

where f(\cdot) is the output of the ensemble FCC neural network, \hat{l}^{L1;i}_{T+\tau} is the load estimate predicted by LSTM model i, and \omega_{fcc} are the ensemble FCC network's weights. At level two, the corresponding optimization target over the validation and ensemble dataset D2 is given by

L^{(L2)} = \min_{\omega_{fcc}} \sum_{T \in D_2} \left( \hat{l}^{L2}_{T+\tau} - l_{T+\tau} \right)^2 + \beta \left\| \omega_{fcc} \right\|_1    (9)
The L1 regularization term is used to prevent over-fitting in the neural network training process, in both the first-level and second-level optimization targets.
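Objectives (7) and (9) are squared-error losses with an L1 weight penalty. In Keras this can be expressed with an l1 kernel regularizer, as in this sketch; the penalty strength alpha is a placeholder, and penalizing only the kernel weights is an assumption.

```python
from tensorflow.keras import Sequential, layers, regularizers

alpha = 1e-4   # placeholder for the L1 penalty strength in (7)/(9)
model = Sequential([
    layers.LSTM(64, input_shape=(168, 1),
                kernel_regularizer=regularizers.l1(alpha)),
    layers.Dense(1, kernel_regularizer=regularizers.l1(alpha)),
])
# "mse" supplies the squared-error data term; the regularizer adds alpha*||w||_1.
model.compile(optimizer="adam", loss="mse")
```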
7 ALGORITHMS FOR GRADIENT DESCENT
First-order gradient descent algorithms, such as standard backpropagation, Stochastic Gradient Descent (SGD) and Adam, are very effective in training large neural networks. However, slow convergence and local minima are common difficulties for these algorithms. A second-order gradient descent algorithm has been shown to be a feasible solution for optimization problems whose objective function exhibits strong curvature. However, second-order gradient descent also has its limitations. One challenge is that, for very deep neural networks, the second-order algorithm computes the Hessian matrix of the network, which requires a long computation and training time. As the number of layers increases, the large activation values driven by the weights may fall into the saturation region, whose derivative is nearly zero, causing a vanishing-gradient situation (known as the flat-spot problem).

In light of the respective advantages and disadvantages of second-order gradient descent algorithms, we choose the Adam algorithm, a first-order gradient-based method, to handle the regression task at level one. We employ the FCC at level two; since the FCC is a shallow neural network, a second-order optimization algorithm, the modified Levenberg-Marquardt (LM) algorithm, is used there. The shallow design at this level is intended to provide an adequate fit to the training samples with the minimum number of neurons, in order to overcome the overfitting problem. Refs. [16, 17] illustrate the utilization of improved reinforcement learning and deep reinforcement learning algorithms for energy management.
8 LEVENBERG–MARQUARDT (LM) ALGORITHM WITH MODIFICATIONS
In this section, the use of the modified LM algorithm to train the ensemble neural network at the second level is described. At iteration e, the Jacobian matrix J(\omega^e_{fcc}) is determined from the derivative of the loss, given by

J(\omega^e_{fcc}) = \left[ \frac{\partial L^{(L2)}}{\partial \omega^e_1}, \frac{\partial L^{(L2)}}{\partial \omega^e_2}, \ldots, \frac{\partial L^{(L2)}}{\partial \omega^e_Z} \right]    (10)

where \omega^e_{fcc} represents the weights of the FCC neural network at iteration e, with Z weight values, and J can be used to approximate the Hessian matrix. A damping factor \mu^e is refreshed iteratively,

\mu^e = \alpha^e L(L2, \omega^e_{fcc})    (11)
The weights of the FCC neural network are modified at each iteration,

\omega^{e+1}_{fcc} = \omega^e_{fcc} + s^e    (12)

or

\omega^{e+1}_{fcc} = \omega^e_{fcc} + \alpha^e \Delta\omega^e_{fcc}    (13)
where s^e is the standard LM step and \Delta\omega^e_{fcc} denotes a line-search approximation of the standard LM step, defined as

\Delta\omega^e_{fcc} = -\left[ J(\omega^e_{fcc})^T J(\omega^e_{fcc}) + \mu^e I \right]^{-1} J(\omega^e_{fcc})^T L(L2, \omega^e_{fcc})    (14)
In Algorithm 2, \mu^e is a parameter that is iteratively refreshed. \hat{J} is an estimate of J, and \mu^e is approximated to reduce the computational overhead, which gives the adjusted update

\Delta\omega^e_{fcc} = -\left[ \hat{J}(\omega^e_{fcc})^T \hat{J}(\omega^e_{fcc}) + \mu^e I \right]^{-1} \hat{J}(\omega^e_{fcc})^T L(L2, \omega^e_{fcc})    (15)
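A hedged NumPy rendering of the damped update (14)/(15) for a generic residual function follows; residual_fn and jacobian_fn stand in for the FCC network's error vector and Jacobian, and the toy usage solves a small linear least-squares problem.

```python
import numpy as np

def lm_step(w, residual_fn, jacobian_fn, mu):
    """One Levenberg-Marquardt update: solve (J^T J + mu I) dw = -J^T r."""
    r = residual_fn(w)                  # residual vector
    J = jacobian_fn(w)                  # Jacobian of r w.r.t. the Z weights
    A = J.T @ J + mu * np.eye(w.size)   # damped Gauss-Newton approximation
    dw = np.linalg.solve(A, -J.T @ r)
    return w + dw

# Toy usage: fit w to minimize ||X @ w - y||^2.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(20, 3)), rng.normal(size=20)
w = np.zeros(3)
for _ in range(5):
    w = lm_step(w, lambda v: X @ v - y, lambda v: X, mu=1e-2)
```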
To judge whether s^e is a good step, the trust-region method is used. The actual reduction R^e_a and the predicted reduction R^e_p at the e-th iteration are defined in (16) and (17), respectively:

R^e_a = \left\| L(L2, \omega^e_{fcc}) \right\|^2 - \left\| L(L2, \omega^e_{fcc} + s^e) \right\|^2    (16)

R^e_p = \left\| L(L2, \omega^e_{fcc}) \right\|^2 - \left\| L(L2, \omega^e_{fcc}) + \alpha^e J(\omega^e_{fcc}) \Delta\omega^e_{fcc} \right\|^2    (17)

The ratio r^e = R^e_a / R^e_p then weighs their values, and the parameters are updated according to the value of r^e, as in Algorithm 2.
FIGURE 7 The ISO-New England dataset’s overall system load and temperature for 2018.
8.1 Real-world datasets for evaluation
To assess the proposed ensemble learning structure, comprehensive load forecasting evaluations are conducted on two datasets, at the system level and the residential level, respectively. In the Python 3.7 environment, the suggested system is implemented using Keras 2.2.4, TensorFlow 2.0 beta and Sklearn 0.20.0. The neural network for model fusion at level two is implemented with ADNBN coded in MATLAB R2018a.
9 DATASETS
9.1 Dataset description
For performance evaluation, two publicly available benchmark datasets are used. The ISO-NE dataset: this is a composite of hourly temperature and load data collected in the New England region from 1 January 2007 to 31 December 2018, including data for the eight zones (i.e. Connecticut-CT, Maine-ME, New Hampshire-NH, Rhode Island-RI, Vermont-VT, Massachusetts of NEM-NEMASS, Massachusetts of SEM-SEMASS and Massachusetts of WC-WCMASS) and for the entire ISO-NE transmission system. The overall system-level load and temperature of the ISO-New England dataset in 2018 are shown in Figure 7, and the load of each of the eight zones in 2018 is depicted in Figure 8. The Residential Electricity Consumption dataset: this is a compilation of 370 clients' power usage data collected over a rather long period of time, from 2011 to 2014; both residential and industrial Portuguese customers are included. Note that we use the records of only 320 clients, as the records of the remaining 50 clients were acquired after 2011 (i.e. are incomplete).
9.2 Preprocessing
On a given time-series dataset, a sliding window of period P is applied during preprocessing. The period P is divided into three parts with proportions 2:1:1. If the hourly day-ahead load for 2017 is projected, for example, the period P is set to four years: dataset D1 contains data from 2014 to 2015, while datasets D2 and D3 contain data from 2016 and 2017, respectively. P is slid from 2015 to 2018 in order to forecast the load for the following year. Normalization is utilized in the preprocessing stage; it can speed up the training process while also revealing the true similarity between time series. Only datasets D1 and D2 are normalized, to prevent information leakage in time series prediction, that is, the use of future data to improve forecast performance. New data produced by the first-level learners is transformed back from normalized form to the original scale in the testing set D3. The term "standardization" has several definitions; here, the min-max normalization

S^{norm}_{T;i} = \frac{S_{T;i} - \min(S_{T;i})}{\max(S_{T;i}) - \min(S_{T;i})}    (18)

is applied.
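The 2:1:1 split and the min-max normalization of (18) can be sketched as follows. Fitting the scaler on D1 and D2 only, and inverting first-level outputs back to the original scale for D3, reflects the leakage rule above; the exact implementation details are assumptions.

```python
import numpy as np

def split_2_1_1(series):
    """Split a series into D1:D2:D3 with proportions 2:1:1."""
    n = len(series)
    return series[: n // 2], series[n // 2 : 3 * n // 4], series[3 * n // 4 :]

def minmax_apply(x, lo, hi):
    return (x - lo) / (hi - lo)          # equation (18)

def minmax_invert(x, lo, hi):
    return x * (hi - lo) + lo            # back to the original scale for D3

load = np.random.rand(4 * 8760)          # four years of hourly data (period P)
d1, d2, d3 = split_2_1_1(load)
fit = np.concatenate([d1, d2])           # statistics from D1 and D2 only
lo, hi = fit.min(), fit.max()
d1n, d2n = minmax_apply(d1, lo, hi), minmax_apply(d2, lo, hi)
```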
10 RESULTS OF EXPERIMENTS
10.1 Prediction performance at the system level
At the overall system level, the ISO-NE dataset is used to forecast short- and mid-term load. The first case is short-term forecasting, which predicts the load for the following 24 h.
FIGURE 8 For the year 2018, the individual load for each of the eight zones in the ISO-New England dataset.
FIGURE 9 Results of the ISO-NE dataset’s system load projection for the last two weeks of 2011.
TABLE 1 Attributes for calculating the performance metrics of the different models
Sl. No. Model TP FP FN TN
1 SVM 8407 1633 1218 3742
2 Logistic Regression 8408 1611 1217 3764
3 KNN 9094 1102 531 4273
4 NB 8855 1743 770 3632
5 DT 8840 818 785 4557
6 RF 9312 508 313 4867
7 SGD 8209 1555 1416 3820
8 XG Boost 9471 225 154 5150
9 Gradient Boosting 9290 742 335 4633
The ISO-NE system load is forecast individually for the years 2010 and 2011, each using the three preceding years' data for training and learning, in order to compare our strategy's performance with the existing state of the art. The identical inputs that were utilized there are used here.
TABLE 2 Performance comparison based on accuracy and precision
Sl. No. Model % Accuracy % Misclassification (100 − accuracy) % Precision % Misclassification (100 − precision)
1 SVM 80.90% 19.10% 83.70% 16.30%
2 Logistic Regression 81.10% 18.90% 83.90% 16.10%
3 KNN 89.10% 10.90% 89.20% 10.80%
4 NB 83.20% 16.80% 83.60% 16.40%
5 DT 89.30% 10.70% 91.50% 8.50%
6 RF 94.50% 5.50% 94.80% 5.20%
7 SGD 80.20% 19.80% 84.10% 15.90%
8 XG Boost 97.50% 2.50% 97.60% 2.40%
9 Gradient Boosting 92.80% 7.20% 92.60% 7.40%
FIGURE 10 HDBSCAN model sample distribution for system level load prediction in 2011.
The genuine value of the next day's temperature is used in each feature calculation; because this information is readily available and weather forecasting is now quite reliable, it is included in the inputs.

The proposed system uses three cutting-edge models, as well as the classic LSTM recurrent neural network model, as benchmarks. The resulting mean absolute percentage error (MAPE) of each model is shown in Figure 8.
The per-year results of the learners are also presented. The table indicates that each of our suggested structure's four versions outperforms the four benchmark schemes; over the four benchmarks, an average drop in MAPE of 10.17% in 2010 and 11.67% in 2011 has been achieved.
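MAPE, the headline metric in these comparisons, is simply the mean absolute percentage error; for reference:

```python
import numpy as np

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs((y_true - y_pred) / y_true))

print(mape([100, 200, 400], [110, 190, 380]))  # approximately 6.67
```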
We additionally observe that the HDBSCAN-based approach beats the other variations of our system. To illustrate the effectiveness of ensemble learning, we also provide the performance of the
first-level and second-level learners. The table shows that there are 15 and 13 base LSTM models in 2010 and 2011, respectively; the dataset D1 is segmented into 15 and 13 clusters for each year in order to create the first-level LSTM models. The MAPE is further lowered by the FCC neural network's second-level learning, as seen in the table. For the years 2010 and 2011, the FCC achieves an average improvement in MAPE of 21.59% and 25.60%, respectively, compared with the MAPEs of the first-level learners. The HDBSCAN-based LSTM model's forecast results for the last two weeks of 2011 are plotted alongside the ground truth to show the forecasting performance; it can be observed that the forecast curve follows the ground truth closely with the HDBSCAN-LSTM model.

Model 12's performance is marked in the 2011 forecast results by a symbol "y", which shows the model's MAPE score, the worst among the 13 LSTM models in total. We take a close look at this instance and plot the forecast's clustering outcome in Figure 10. Each of the first 12 clusters appears to have an acceptable number of samples; however, the twelfth cluster contains only 69 cases. A small dataset is given to this level-one learner (LSTM model 12); as a result, it has very limited generalization capability. Because the features are divided by this approach, it achieves the poorest performance and is not sufficiently generic; the learnt features fit only its own sample dataset.
We also look into how the forecast is affected by the number of hidden neurons in the second level of learning. The HDBSCAN-based LSTM model is trained with varied numbers of hidden neurons, and the average training and testing errors (i.e. Normalized Root Mean Square Error, NRMSE) are recorded. A network with the same number of hidden neurons is trained numerous times in each trial, and the average training and testing errors are tabulated. As the results show, increasing the number of hidden neurons does not guarantee a reduction in training and testing errors. For ISO-NE (SYS) 2010, the minimum training and testing errors are obtained with 8 hidden neurons, and for ISO-NE (SYS) 2011 with 11 hidden neurons. The training process thus requires locating a suitable hyperparameter (i.e. the number of hidden neurons); as a result, in our suggested system, a grid search approach is used. The calculated results are depicted in Tables 1 and 2.
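The sweep described here, training several networks at each hidden-neuron count and keeping the count with the lowest average error, amounts to a small grid search. A generic sketch with placeholder build and evaluation functions:

```python
import numpy as np

def grid_search_hidden(build_fn, evaluate_fn, candidates=range(2, 16), repeats=5):
    """Train `repeats` networks per hidden-neuron count; keep the best average."""
    avg_error = {}
    for n_hidden in candidates:
        errors = [evaluate_fn(build_fn(n_hidden)) for _ in range(repeats)]
        avg_error[n_hidden] = np.mean(errors)   # average testing NRMSE
    best = min(avg_error, key=avg_error.get)
    return best, avg_error
```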
Using year-2010 data, a further trial is completed with the HDBSCAN-LSTM model, which features three hidden neurons. The training and testing errors, as well as the training and testing times, for different activation functions are shown in Figure 11. It shows that the activation functions tanh(.) and sigmoid(.) are more consistent than ReLU(.). Although the sigmoid(.) function trains somewhat faster, the tanh(.) function does a much better job of reducing the training and testing error.
The second scenario we examine is predicting the week-ahead power load on weekends (i.e. Saturday and Sunday) for the year 2018, at both the zone and system levels. The models are constructed using data from 2015 to 2017.
FIGURE 11 Ensemble neural network (FCC) learning curves with various activation functions. (a) tanh activation function: average training error (NRMSE) is 0.0210 ± 0.0011, average testing error (NRMSE) is 0.0219 ± 0.0011, and average training time is 42.0660 s. (b) Sigmoid activation function: average training error (NRMSE) is 0.0214 ± 0.0010, average testing error (NRMSE) is 0.0222 ± 0.0009, and average training time is 38.1347 s. (c) ReLU activation function: average training error (NRMSE).
The output is the hourly load readings at the weekend. We employ only historical temperature data as a feature in this case; the temperature at t+T is not used in this forecast, so the setting is not exactly identical to the preceding one.
FIGURE 12 The DBSCAN clustering algorithm was used to distribute categorized inhabitants. (a) Load variations w.r.t. time for Cluster 1. (b) Load variations
w.r.t. time for Cluster 2. (c) Load variations w.r.t. time for Cluster 3. (d) Load variations w.r.t. time for Cluster 4. (e) Load variations w.r.t. time for Cluster 5.
(f) Samples variations w.r.t. distribution of each cluster.
10.2 Performance prediction at the
individual level
On the Residential Electricity Consumption dataset, we focus on the load forecasting problem for individual clients using the proposed ensemble learning approach. The power load data is aggregated hourly, as before. As discussed, the aggregated dataset is divided into three portions. Then, using the HDBSCAN clustering algorithm and the preceding three months of data in 2012, the 320 clients are grouped into a few clusters. The result of the clustering is shown in Figure 12: the clients are divided into five clusters. Each curve in Figure 12a-e represents a client's normalized load, while Figure 12f depicts the number of clients in each cluster. We can observe that each cluster is clearly distinct. The load curves in Cluster 4, for example, are mostly fairly flat and close to 0.5; the load curves in Clusters 2 and 3 are serrated and range from 0.2 to 1; and the load curves in Cluster 5 show a clear daily pattern (indicating that these are residential clients).

We randomly choose one client from each cluster and forecast its load. The training set for each client is a collection of load time series taken just before time step t, with the window size W set to 168 (the number of hours in a week). Because we use only the recorded load data (i.e. m = 1), the dimension of the input m × W at time t is 1 × 168. The output is the projected load for a future time t + T. As a benchmark scheme, we employ the standard LSTM model and report the evaluation results for the five chosen clients (one from each cluster, as illustrated). The horizon T is set to 1, 12 and 24, which means we forecast load values for the five clients an hour ahead, half a day ahead and a day ahead. The proposed ensemble learning models achieve much higher accuracy than LSTM in this study.
For example, for T = 1, T = 12 and T = 24, the BIRCH MAPEs are 23.35%, 26.43% and 33.64% of the corresponding LSTM MAPEs, respectively. The best result differs across clients and horizons. Regardless, all of the proposed ensemble learning models produce the best results, with the BIRCH- and HDBSCAN-based models making the fewest errors compared with the others. Because HDBSCAN is an improved version of DBSCAN, the density-based algorithm HDBSCAN and the hierarchical algorithm BIRCH are better than the partitioning algorithm K-means++, which is sensitive to outliers and noise.
Client 1's MAPE is fairly high for all models, although the MAPEs of all the other clients are under 21%. As shown in Figure 12, there is no clear pattern in the cluster-one data, which may explain why this group of data is difficult to forecast. Because of the diversity of lifestyles and activities, it is very difficult to forecast each individual client's load exactly. As demonstrated by this experiment, classifying customers and predicting load by group is quite practical.
As shown in Table 1, the XGBoost classifier has detected 9471 data points as true positives and 5150 as true negatives. It has misclassified only 225 + 154 = 379 data points, corresponding to FP and FN, respectively:

Accuracy = (9471 + 5150) / (9471 + 225 + 154 + 5150) = 0.975

Misclassification = (225 + 154) / (9471 + 225 + 154 + 5150) = 0.025

Hence, it is inferred that XGBoost has exhibited an accuracy of 97.50% and a misclassification of 2.50%; its performance has been better than that of the other classifiers. The comparative performance analysis of the different ML algorithms based on accuracy and precision, along with misclassification, is tabulated in Table 2.
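These accuracy and misclassification figures follow directly from the confusion-matrix counts in Table 1; a small helper reproduces the XGBoost numbers.

```python
def confusion_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    misclassification = (fp + fn) / total
    precision = tp / (tp + fp)
    return accuracy, misclassification, precision

# XGBoost row of Table 1: TP=9471, FP=225, FN=154, TN=5150.
acc, mis, prec = confusion_metrics(9471, 225, 154, 5150)
print(f"{acc:.4f} {mis:.4f} {prec:.4f}")  # 0.9747 0.0253 0.9768
```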
11 CONCLUSION
In this work, we discuss two deep learning applications for smart meter data. The first application is an LSTM-based ensemble learning approach for short-term load forecasting using the stacking framework: the first-level learner comprises LSTM models, each trained with a data cluster, and the second-level learner comprises an FCC ensemble neural network for model fusion. An improved second-order optimization algorithm was proposed to train the ensemble network. The second application focuses on NILM (non-intrusive load monitoring) challenges. In order to apply ensemble learning and meta-learning to NILM problems, we first suggest two innovative pre-training techniques; these pre-trained models are better suited to NILM problems than transfer learning models. We further investigate the Transformer model's ability to deal with NILM problems and present the Middle Window Transformer, a competent and practical Transformer-based method for NILM tasks, to address the issue of Transformer models being expensive to train.
AUTHOR CONTRIBUTIONS
Albert Alexander Stonier: Data curation; Writing—original
draft. Geno Peter: Data curation; Writing—original draft.
Vivekananda Ganji: Investigation; Visualization.
CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available
from the corresponding author upon reasonable request.
REFERENCES
1. D’Incecco, M., Squartini, S., Zhong, M.: Transfer learning for non-
intrusive load monitoring. IEEE Trans. Smart Grid 11(2), 1419–1429,
(2019)
2. Krystalakos, O., Nalmpantis, C., Vrakas, D.: Sliding window approach for
online energy disaggregation using artificial neural networks. In: Proceed-
ings of the 10th Hellenic Conference on Artificial Intelligence. Patras,
Greece, pp. 1–6 (2018)
3. Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N.,
Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural
Information Processing Systems. IEEE, Long Beach, CA, pp. 5998–6008
(2017)
4. Murray, D., Stankovic, L., Stankovic, V.: An electrical load measurements
dataset of United Kingdom households from a two-year longitudinal study.
Sci. Data 4(1), 1–12 (2017)
5. Kelly, J., Knottenbelt, W.: The UK-DALE dataset, domestic appliance-
level electricity demand and whole-house demand from five UK homes.
Sci. Data 2(1), 1–14 (2015)
6. Wang, L., Mao, S., Wilamowski, B.: Short-term load forecasting with LSTM
based ensemble learning. In: Proceedings of IEEE Green Com 2019.
Atlanta, GA, pp. 793–800 (2019)
7. Zheng, Y., Capra, L., Wolfson, O., Yang, H.: Urban computing: Concepts,
methodologies, and applications. ACM Trans. Intell. Syst. Technol. 5(3),
38:1–38:55 (2014)
8. Chen, M., Mao, S., Liu, Y.: Big data: A survey. Springer Mobile Networks
Appl. J. 19(2), 171–209 (2014)
9. Asia Pacific Urban Energy Association: Introduction to Sustainable Urban Energy Systems (2019). https://www.apuea.org/index.php/urban-energy-systems/introduction-to-urban-energy-systems2. Accessed 2 Jan 2023
10. Wang, Y., Mao, S., Nelms, R.M.: Online algorithm for optimal real-time
energy distribution in smart grid. IEEE Trans. Emerging Top. Comput.
1(1), 10–21 (2013)
11. Huang, Y., Mao, S., Nelms, R.: Smooth scheduling for electricity
distribution in the smart grid. IEEE Syst. J. 9(3), 966–977 (2015)
12. Huang, Y., Mao, S., Nelms, R.M.: Adaptive electricity scheduling in
microgrids. IEEE Trans. Smart Grid 5(1), 270–281 (2014)
13. Huang, Y.: Adaptive electricity scheduling in microgrids. In: Proceedings
of IEEE INFOCOM 2013. Turin, Italy, pp. 1142–1150 (2013)
14. Zou, H.Y., Wang, S.M., Zhang, F., Chen, X.: Distributed online energy
management in inter-connected microgrids. IEEE Internet Things J. 7(4),
2738–2750 (2020)
15. Zou, H., Mao, S., Wang, Y., Zhang, F., Chen, X., Cheng, L.: A survey of
energy management in interconnected multi-microgrids. IEEE Access J.
7(1), 72158–72169 (2019)
16. Xi, L., Li, H., Zhu, J., Li, Y., Wang, S.: A novel automatic generation control
method based on the large-scale electric vehicles and wind power integra-
tion into the grid. IEEE Trans. Neural Networks Learn. Syst. 1–11, early
access (2022)
17. Xie, L., Wu, J., Li, Y., Sun, Q., Xi, L.: Automatic generation control strat-
egy for integrated energy system based on ubiquitous power Internet of
Things. IEEE Internet Things J. 1, early access (2022)
18. Peter, G., Stonier, A.A., Gupta, P., Gavilanes, D., Vergara, M.M., Lungsin,
J.: Smart fault monitoring and normalizing of a power distribution system
using IoT. Energies 15(21), 8206 (2022)
How to cite this article: Yadav, P.K., Bhasker, R.,
Stonier, A.A., Peter, G., Vijayakumar, A., Ganji, V.:
Machine learning based load prediction in smart-grid
under different contract scenario. IET Gener. Transm.
Distrib. 17, 1918–1931 (2023).
https://doi.org/10.1049/gtd2.12828
... Finally, an intelligent operation and maintenance platform is built for power transformers, enabling knowledge retrieval and decision-making support. INTRODUCTION China has proposed an action plan of "achieving carbon peak by 2030 and carbon neutrality by 2060" and vigorously promoted the digital transformation and upgrading of power grid [1], which puts forward higher requirements for the operation and maintenance of traditional power equipment [2]. As a core equipment in the power system, different degrees of faults in power transformers will challenge the safe and stable operation of the entire power grid. ...
... The selfattention mechanism is the core unit of the Transformer encoder, and its operation process is shown in equation (1). (1) where Q, K, and V are input word vector matrices, and dk is the input vector dimension. The self-attention mechanism models the relationship between each input word and assigns weights to the current word. ...
Preprint
The current power grid is undergoing digital transformation and upgrading, and the intelligent health management technology of power transformers is rapidly advancing. However, there are issues of weak information correlation and low decision-making efficiency in the operation and maintenance process and there are few papers on knowledge graph construction specifically related to power transformer maintenance. Additionally, there is limited public data available specifically for power transformer operation and maintenance, making it difficult to effectively construct maintenance knowledge. This paper proposes a method for constructing a knowledge graph for power transformer operation and maintenance based on Roberta-GPliner. Firstly, public literature in the field of power transformers is obtained to enhance the training dataset of power transformer operation and maintenance. Then, we use Roberta as the embedding layer and employ the GPliner joint extraction model to extract knowledge triplets on power transformer operation and maintenance. Roberta-GPliner is compared with other pre-training models, validating that the joint knowledge extraction algorithm based on Roberta-GPliner performs better. Finally, an intelligent operation and maintenance platform is built for power transformers, enabling knowledge retrieval and decision-making support.
... Additionally, the integration of intermittent renewable energy sources, such as solar and wind, adds further variability to the load profile, requiring innovative solutions to ensure reliable and efficient grid operation. In this context, representative learning techniques offer a promising approach to capturing intricate patterns and dependencies present in electric load data [10][11][12]. By leveraging advanced machine learning algorithms, such as deep neural networks and ensemble methods, representative learning models can effectively analyze historical load data along with relevant contextual information to make accurate load forecasts. ...
Conference Paper
Full-text available
In the last decade, the water and electricity industry has experienced significant investments in smart grid technologies. Within a smart grid framework, information and energy engage in bidirectional transmission, opening up diverse applications for artificial intelligence, including artificial neural networks, machine learning, and deep learning. This comprehensive review investigates the dynamic landscape of deep learning methodologies applied to load forecasting within smart grids, spanning short-term (STLF), medium-term (MILF), and long-term (LTLF) Forecasting horizons. We scrutinize a range of techniques, encompassing Auto-Encoder Method, Recurrent Neural Network (RNN), Long Short-Term Memory (LSTM), Convolutional Neural Network (CNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Deep Boltzmann Machine (DBM), Graph Neural Networks (GNNs), Attention Mechanisms, and Hybrid Models. This article introduces and reviews common deep-learning algorithms used in load forecasting for smart grids and power systems. It also offers a comparative assessment based on the reduction percentage in four indicators: accuracy, speed, mean absolute error (MAPE), and root mean square error (RMSE). The research aims to provide valuable insights into the strengths and weaknesses of each deep learning method, guiding researchers and practitioners in making informed decisions when selecting the most suitable approach for diverse load forecasting scenarios in smart grid environments.
... However, there are also some disadvantages to ANNs, such as hardware dependency, which requires processors with parallel processing power. Additionally, the lack of interpretability of the network and the unpredictability of the duration of the network are also significant drawbacks [40]. ...
Article
Full-text available
To maintain grid stability, the energy levels produced by sources within the network must be equal to the energy consumed by customers. In current times, achieving energy balance mainly involves regulating the electrical energy sources, as consumption is typically beyond the control of grid operators. For improving the stability of the grid, accurate forecasting of photovoltaic power output from largely integrated solar photovoltaic plant connected to grid is required. In the present study, to improve the forecasting accuracy of the forecasting models, onsite measurements of the weather parameters and the photovoltaic power output from the 20 kW on‐grid were collected for a typical year which covers all four seasons and evaluated the random forest techniques and other techniques like deep neural networks, artificial neural networks and support vector regression (reference in this study). The simulation results show that the proposed random forest technique for the forecasting horizon of 15 and 30 min is performing well with 49% and 50% improvements in the accuracy respectively over reference model for the study location 22.78°N, 73.65°E, College of Agricultural Engineering and Technology, Anand Agricultural University, Godhra, India.
Article
Full-text available
The improved transient response of PID controller for improved Luo converter supplied Brushless DC (BLDC) motor using solar energy source is extensively investigated. The BLDC drive uses a hybrid PV system and battery-based power generation as a power source to assure continuous power supply to pump water regardless of solar insolation. The PV system serves as the main energy source, and the battery providing as a backup source that discharged its power only during inclement weather or at night if the photovoltaic panel is insufficient to operate the centrifugal pump. The solar panel feeds power into the battery, whenever water to the field is not required, thus there is no need for an extra power source to charge the batteries. The high output voltage gain DC-DC converter does not make any discomfit to the system performance in regards to irradiance variation, switching loss, or power loss by converter or motor side respectively. Moreover, current sensor, voltage sensor, and control circuits are completed excluded which enhance performance and the cost of the system get reduced effectively. A bidirectional charging control allows a bi-directional converter to alter the battery operation mode automatically. The overall system performance is validated using simulation and experimental setup.
Article
Full-text available
Conventional outage management practices in distribution systems are tedious and complex due to the long time taken to locate the fault. Emerging smart technologies and various cloud services offered could be utilized and integrated into the power industry to enhance the overall process, especially in the fault monitoring and normalizing fields in distribution systems. This paper introduces smart fault monitoring and normalizing technologies in distribution systems by using one of the most popular cloud service platforms, the Microsoft Azure Internet of Things (IoT) Hub, together with some of the related services. A hardware prototype was constructed based on part of a real underground distribution system network, and the fault monitoring and normalizing techniques were integrated to form a system. Such a system with IoT integration effectively reduces the power outage experienced by customers in the healthy section of the faulted feeder from approximately 1 h to less than 5 min and is able to improve the System Average Interruption Duration Index (SAIDI) and System Average Interruption Frequency Index (SAIFI) in electric utility companies significantly.
Article
Full-text available
Non-intrusive load monitoring (NILM) is a technique for recovering source appliances from only the recorded mains signal of a household. NILM is unidentifiable, and thus a challenging problem, because the power value of an appliance inferred from the mains alone need not be unique. To mitigate this, various methods incorporating domain knowledge into NILM have been proposed and shown to be effective experimentally. Among these methods, deep neural networks have recently performed best. Arguably, the recently proposed sequence-to-point (seq2point) learning is promising for NILM. However, its results were only obtained within the same data domain; it is unclear whether the method generalises or transfers to different domains, for example when the test data are drawn from a different country than the training data. We address this issue in the paper and propose two transfer learning schemes: appliance transfer learning (ATL) and cross-domain transfer learning (CTL). For ATL, our results show that the latent features learnt by a 'complex' appliance, e.g. a washing machine, can be transferred to a 'simple' appliance, e.g. a kettle. For CTL, our conclusion is that seq2point learning is transferable. Precisely, when the training and test data are in a similar domain, seq2point learning can be applied directly to the test data without fine-tuning; when they are in different domains, seq2point learning needs fine-tuning before being applied to the test data. Interestingly, we show that only the fully connected layers need fine-tuning for transfer learning. Source code can be found at https://github.com/MingjunZhong/transferNILM.
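For orientation, a minimal PyTorch rendering of a seq2point-style network is sketched below: a convolutional stack maps a window of mains readings to the target appliance's power at the window midpoint. The layer widths follow the published seq2point architecture, but treat the details as an assumption rather than the exact model in the repository above; padding='same' requires a recent PyTorch. The last lines show the kind of partial fine-tuning the abstract reports (only the fully connected layers).

```python
import torch.nn as nn

class Seq2Point(nn.Module):
    """Window of aggregate power in, single midpoint appliance reading out."""

    def __init__(self, window=599):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 30, 10, padding='same'), nn.ReLU(),
            nn.Conv1d(30, 30, 8, padding='same'), nn.ReLU(),
            nn.Conv1d(30, 40, 6, padding='same'), nn.ReLU(),
            nn.Conv1d(40, 50, 5, padding='same'), nn.ReLU(),
            nn.Conv1d(50, 50, 5, padding='same'), nn.ReLU(),
        )
        self.head = nn.Sequential(
            nn.Flatten(), nn.Linear(50 * window, 1024), nn.ReLU(), nn.Linear(1024, 1)
        )

    def forward(self, x):            # x: (batch, 1, window)
        return self.head(self.conv(x))

model = Seq2Point()
# CTL-style fine-tuning: freeze the convolutional feature extractor and
# retrain only the fully connected head on the new domain.
for p in model.conv.parameters():
    p.requires_grad = False
```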
Article
Full-text available
An Interconnected Multi-Microgrid (IMMG) system takes advantage of various complementary power sources and effectively coordinates energy sharing/trading among the microgrids (MGs) and the main grid to improve the stability, reliability, and energy efficiency of the system. The core of this structure is achieving the optimal distribution of shared energy through proper strategies. However, the volatility and intermittency of renewable resources, time-varying loads in the MGs, correlated power generation, and the energy coupling among MGs during trading all pose new challenges to stable operation and optimal scheduling in the power system. Many solutions have been proposed to solve these problems. In this paper, we provide an overview of current energy management systems (EMS) in IMMGs, focusing on the IMMG structure, EMS objectives, timescales, and scheduling optimization structure. We then review distributed optimization algorithms in IMMGs and conclude the survey with a discussion of future directions.
Conference Paper
Full-text available
Energy disaggregation is the process of extracting the power consumption of multiple appliances from the total consumption signal of a building. Artificial Neural Networks (ANNs) have been very popular for this task over the last decade. In this paper we propose two recurrent network architectures that use a sliding window for real-time energy disaggregation. We compare this approach to existing techniques using six metrics and find that it scores better for multi-state devices. Finally, we compare ANNs that use Gated Recurrent Unit neurons against those using Long Short-Term Memory neurons and find that they perform equally well.
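A minimal sketch (my own PyTorch rendering, not the paper's code) of a sliding-window recurrent disaggregator in which the GRU and LSTM cells can be swapped for exactly this kind of comparison:

```python
import torch.nn as nn

class RNNDisaggregator(nn.Module):
    """Sliding-window recurrent disaggregator; pass cell='GRU' or cell='LSTM'."""

    def __init__(self, cell='GRU', hidden=64):
        super().__init__()
        rnn = nn.GRU if cell == 'GRU' else nn.LSTM
        self.rnn = rnn(input_size=1, hidden_size=hidden, num_layers=2, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, x):            # x: (batch, window, 1) of aggregate power
        h, _ = self.rnn(x)
        return self.out(h[:, -1])    # appliance power at the end of the window

gru_model = RNNDisaggregator(cell='GRU')
lstm_model = RNNDisaggregator(cell='LSTM')
```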
Article
Full-text available
Smart meter roll-outs provide easy access to granular meter measurements, enabling advanced energy services ranging from demand response measures to tailored energy feedback and smart home/building automation. To design such services and to train and validate models, access to data that resembles what is expected of smart meters, collected in a real-world setting, is necessary. The REFIT electrical load measurements dataset described in this paper includes whole-house aggregate loads and nine individual appliance measurements at 8-second intervals per house, collected continuously over a period of two years from 20 houses whose occupants were conducting their usual routines. At the time of publishing, the dataset covered the largest number of houses monitored in the United Kingdom at sub-minute intervals over a period greater than one year. It comprises 1,194,958,790 readings, representing over 250,000 monitored appliance uses. The data are provided in an easy-to-use comma-separated format, time-stamped, and cleaned to remove invalid measurements, correctly label appliance data, and fill small gaps of missing data.
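A short pandas sketch of loading and downsampling one REFIT house is shown below; the file name and the 'Time'/'Aggregate'/'ApplianceN' column names follow the cleaned REFIT release, but treat them as assumptions to be checked against the actual download.

```python
import pandas as pd

# Assumed file and column layout of the cleaned REFIT CSVs (one file per house).
df = pd.read_csv('CLEAN_House1.csv', parse_dates=['Time'], index_col='Time')
house = df[['Aggregate'] + [f'Appliance{i}' for i in range(1, 10)]]

# Downsample the 8-second readings to 1-minute means for modelling.
minute = house.resample('1min').mean()
print(minute.describe())
```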
Article
The integrated energy system based on the Ubiquitous Power Internet of Things is characterised by ubiquitous connection of everything, complex energy conversion modes, and an unbalanced supply-demand relationship. It introduces strong random disturbances into the power grid, which degrade the overall control performance of automatic generation control. Therefore, a novel deep reinforcement learning algorithm, a collaborative learning actor-critic strategy, is proposed. It is oriented to different exploration horizons, benefits from an experience-sharing mechanism, and can continuously coordinate the key behavioural strategies. Simulation tests are performed on two-area and four-area integrated energy systems based on the ubiquitous power Internet of Things. Comparative analyses show that the proposed algorithm can efficiently handle strong random disturbances and has better convergence and generalization performance. Besides, it can efficiently realize the optimal cooperative control of multi-area integrated energy systems.
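The collaborative learning actor-critic itself is not reproduced here, so the sketch below shows only the generic one-step actor-critic update that such a strategy builds on (PyTorch; network sizes and hyperparameters are illustrative assumptions).

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.shared = nn.Sequential(nn.Linear(obs_dim, hidden), nn.Tanh())
        self.pi = nn.Linear(hidden, n_actions)   # actor head: action logits
        self.v = nn.Linear(hidden, 1)            # critic head: state value

    def forward(self, obs):
        h = self.shared(obs)
        return self.pi(h), self.v(h)

def update(model, opt, obs, action, reward, next_obs, done, gamma=0.99):
    logits, value = model(obs)
    value = value.squeeze(-1)
    with torch.no_grad():
        _, next_value = model(next_obs)
        target = reward + gamma * (1.0 - done) * next_value.squeeze(-1)
    advantage = target - value                    # TD error as the advantage estimate
    log_prob = torch.distributions.Categorical(logits=logits).log_prob(action)
    # Policy gradient on the actor plus squared TD error on the critic.
    loss = -(log_prob * advantage.detach()).mean() + advantage.pow(2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
```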
Article
In order to solve the frequency instability of power systems caused by the strong random disturbances of large-scale electric vehicle and wind power grid integration, an improved reinforcement learning algorithm, optimistic initialized double Q, is proposed in this article from the perspective of automatic generation control. The proposed algorithm uses the optimistic initialization principle to expand the agent's action exploration space, preventing Q-learning from falling into a local optimum through its greedy strategy; meanwhile, it integrates double Q-learning to address the overestimation of action values in traditional Q-learning-based reinforcement learning. In the algorithm, the hyperparameter ατ is introduced to improve learning efficiency, and the exploration-count-based reward bτ is introduced to increase the Q-value estimate and drive exploration, so as to obtain the optimal solution. Simulations of a two-area load frequency control model integrated with large-scale electric vehicles and a four-area interconnected grid model integrated with large-scale wind power generation verify that the proposed algorithm obtains the global optimal solution, effectively solving the frequency instability caused by strong random disturbances under grid-connected large-scale wind power, and that it achieves better control performance than many other reinforcement learning algorithms.
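A tabular sketch of the two ingredients the abstract names, optimistic initialization and the double-Q update, is given below (NumPy; illustrative only, and omitting the paper's ατ and bτ terms).

```python
import numpy as np

n_states, n_actions, q_opt = 16, 4, 10.0
# Optimistic initialization: inflated initial Q values drive early exploration.
QA = np.full((n_states, n_actions), q_opt)
QB = np.full((n_states, n_actions), q_opt)
rng = np.random.default_rng(0)

def double_q_update(QA, QB, s, a, r, s_next, alpha=0.1, gamma=0.99):
    # Randomly pick which table to update; the greedy action is selected with
    # one table and evaluated with the other, countering Q-learning's
    # overestimation of action values.
    if rng.random() < 0.5:
        a_star = np.argmax(QA[s_next])
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        b_star = np.argmax(QB[s_next])
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])
```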
Article
In this paper, a hierarchical online distributed algorithm (HODA) is developed to achieve optimal energy management in interconnected microgrids (IMGs). The energy management objectives include maximizing users' utility, optimizing the output power of controllable generators, and keeping the system operating economically. We formulate the problem as an online least absolute shrinkage and selection operator (LASSO) problem, considering both reactive power and system operating characteristics. We then employ Averaging Fixed Horizon Control (AFHC) to solve the formulated problem under mild assumptions on the uncertainties in renewable power generation and load demand. The alternating direction method of multipliers (ADMM) is adopted to decouple the coupled constraints. The proposed online algorithm is asymptotically optimal, since its solution converges to the offline optimal solution. The performance of the proposed algorithm is validated using data traces obtained from a real-world IMG system.
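Since the abstract combines a LASSO formulation with ADMM, a minimal batch ADMM-for-LASSO sketch may help fix ideas; it is illustrative only and omits HODA's hierarchy, the AFHC horizon, and the reactive-power constraints.

```python
import numpy as np

def soft_threshold(x, kappa):
    # Proximal operator of the L1 norm (elementwise shrinkage).
    return np.sign(x) * np.maximum(np.abs(x) - kappa, 0.0)

def admm_lasso(A, b, lam, rho=1.0, n_iter=200):
    """Solve min_x 0.5*||Ax - b||^2 + lam*||x||_1 via ADMM with splitting x = z."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)
    AtA = A.T @ A + rho * np.eye(n)
    Atb = A.T @ b
    for _ in range(n_iter):
        x = np.linalg.solve(AtA, Atb + rho * (z - u))   # x-update (ridge step)
        z = soft_threshold(x + u, lam / rho)            # z-update (shrinkage)
        u = u + x - z                                   # dual variable update
    return z
```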
Article
The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
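The core operation the abstract refers to is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V; a plain NumPy sketch of that single building block (without the multi-head projections) follows.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V, the Transformer's core op.
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)
    # Numerically stable softmax over the key dimension.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Example: 5 queries attending over 7 key/value pairs of width 16.
rng = np.random.default_rng(0)
out = scaled_dot_product_attention(rng.normal(size=(5, 16)),
                                   rng.normal(size=(7, 16)),
                                   rng.normal(size=(7, 16)))
```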