Conference Paper

Proactive Failure Management in Smart Grids for Improved Resilience: A Methodology for Failure Prediction and Mitigation

Igor Kaitovic, Slobodan Lukovic, Miroslaw Malek
Advanced Learning and Research Institute (ALaRI), Faculty of Informatics
Università della Svizzera italiana (University of Lugano)
Lugano, Switzerland
{igor.kaitovic, slobodan.lukovic, miroslaw.malek}@usi.ch
Abstract—A gradual move in the electric power industry
towards Smart Grids brings several challenges to the system
operation such as preserving its resilience and ensuring security.
As the system complexity grows and the number of failures
increases, the need for a paradigm shift in grid management from
reactive to proactive is apparent and can be realized by
employing advanced monitoring instruments, data analytics and
prediction methods. In order to improve resilience of the Smart
Grid and to contribute to efficient system operation, we present a
blueprint of a comprehensive methodology for proactive failure
management that may also be applied to manage other types of
disturbances and undesirable changes. The methodology is
composed of three main steps: continuous monitoring of the most
indicative features; prediction of failures; their mitigation. The
approach is complementary to the existing ones that are mainly
based on fast detection and localization of grid disturbances, and
reactive corrective actions.
Keywords—Smart Grid, Proactive Management, Failure
Prediction, Resilience, Security, Synchrophasor
I. INTRODUCTION
Current approaches to electric power grid operation are
essentially reactive and are based on state estimation, using
run-time measurements that are typically obtained from
SCADA and, in the recent period more frequently, from
synchrophasors (i.e. Phasor Measurement Units - PMUs)
[1][2]. If a disturbing event is detected (e.g. a voltage sag or a
frequency deviation), corrective actions are taken (e.g. load
shedding or altering a transmission line load) to bring the
system back to a stable state and to prevent disturbance
propagation. The reactive approach becomes insufficient with
the increasing complexity of the grid, the introduction of a vast
number of distributed (renewable) energy resources with
an intermittent nature, new types of loads like electric
vehicles, and demand response programs [3]. Moreover,
considering Smart Grid as a cyber-physical system, new types
of disturbances may occur due to interdependencies between
cyber and electric infrastructures and cyber attacks [4][5][6].
Thus, innovative approaches are needed to address these
challenges and make Smart Grids more resilient. In the scope
of this paper, we think of resilience as defined in [7], namely
the ability of a system to provide and maintain an acceptable
level of service in the face of various faults and cyber attacks.
In this sense, resilience may be seen as a system property that
unifies dependability, security and performability.
Failures impact both dependability and performability. A
failure in power systems occurs when the quality of the electric
power delivered to the customer deviates from the one defined
by the service agreement [6]. The minimum quality level also
depends on the type of the customer (e.g. a household or an
industrial consumer). The most severe failure, a power outage,
is a complete interruption of power delivery. A blackout
represents a total interruption of the electric power delivery
service that affects all users in the area [8]. Numerous
blackouts in recent history (e.g. the ones reported in [9] and
[10]) that affected hundreds of millions of customers are a
strong motivation to further improve the grid's resilience.
Considering the importance of the Smart Grid as a critical
infrastructure and the impact that failures have on its overall
resilience, we propose a proactive approach to management of
failures which relies on improved availability of grid status
information (e.g. obtained from PMUs) and advanced data
analytics. Our methodology is based on: (i) continuous
monitoring of the most indicative features and time series, (ii)
prediction algorithms trained on features’ logs to predict
failures and (iii) mitigation techniques to prevent impending
failures or minimize their effects. The main difference between our
approach and the state of the art is that our focus is not on a
specific type or aspect of failures (e.g. cascading failures) but
on defining a comprehensive approach to address different
types of failures. The approach may also be applied to address
other types of disturbances or undesirable changes (e.g. cyber
attacks).
The rest of the paper is organized as follows. A change of
paradigm from reactive to proactive Smart Grid management is
outlined in Section II. State of the art is briefly presented in
Section III and covers both industrial and academic trends. A
blueprint of a methodology for proactive management of
failures is described in detail in Section IV and the problem of
obtaining the data for training and evaluating the failure
predictor is addressed in Section V. The anticipated effect on
resilience is discussed in Section VI. Section VII concludes the
paper.
This work has been supported in part by grants from the Hasler
Foundation and Swiss Commission for Technology and Innovation (CTI)
II. FROM REACTIVE TO PROACTIVE SMART GRID
MANAGEMENT
With the ability to collect and analyze an increasing amount of
data in real time, we observe a rapid paradigm shift (see
Fig. 1, from Gartner1) from analyzing the past and monitoring
the present to predicting and prescribing the future. For
systems such as Smart Grids, we used to ask
questions like what happened and why it happened. Now,
with the ability to collect more data and process it rapidly,
we are or will be able to ask questions about the future: what
will happen and what can we do about it? The level
of difficulty increases, as questions about the future are usually
harder than those about the past; on the other
hand, only by careful analysis of the past and the present are we
able to reasonably speculate about the future. At the same
time, an increase in resilience and a decrease in maintenance cost
are expected. This is the essence of our contribution in this
paper: we advocate the development of failure prediction and
mitigation methods to significantly improve Smart Grid
resilience.
III. STATE OF THE ART
Failure prediction for proactive fault management in
computer systems has been extensively researched [11] with
some methods finding their way into industrial practice [12][13].
Efficient grid management of emerging Smart Grids attracts
considerable attention in both industrial and academic
communities (see, for example, [1] and [14]-[18]), especially
since the massive insertion of renewables followed by improved
interaction between utilities and end consumers brought
additional uncertainties in power distribution systems. On the
other hand, modern technologies such as synchrophasors and
advanced Energy Management System (EMS) applications
greatly facilitated grid observability and management.
Therefore, utilities and grid operators have recently started
looking into novel concepts for proactive grid management as
presented in [1]. In this sense “proactive operation” means
taking preventive actions in order to avoid anticipated
problems in the grid. This concept considers two main aspects:
failure prevention and asset management (i.e. predictive
maintenance). In this regard the following areas of proactive
grid management solutions are presented in [1]: 1) decision
support systems (DSSs); 2) synchrophasor solutions; 3)
symbiotic integration of synchrophasors with fast-acting
controls. The proposed solutions rely on the analysis of a
combination of grid-related information collected and
processed by PMUs and SCADA/EMS. The usage of historical (i.e.
pre-event) data is also foreseen.
1 Gartner IT Glossary, Predictive Analytics: http://www.gartner.com/it-glossary/predictive-analytics
Considerable research efforts have been invested in
improvement of grid reliability from the perspective of
efficient maintenance coupled with maximization of assets
utilization. In particular this concerns a shift from scheduled to
predictive grid maintenance [14]. The predictive maintenance
instruments include a group of programs named Reliability-
Centered Maintenance (RCM). In an RCM approach, various
alternative maintenance policies can be compared to select the
most cost-effective one for sustaining equipment reliability. A
relevant work on the case study of the city of New York has
investigated knowledge discovery methods and statistical data
analysis for preventive maintenance [15]. It introduced a
general process for transforming historical electrical grid data
into models that aim to predict the risk of failures for
components and systems. These models can be used directly by
power companies to assist with prioritization of maintenance
and repair work. The study showed that prediction results
are accurate enough to support decision making.
Major industrial players such as Alstom and PG&E (Pacific
Gas & Electric) announced plans to advance Synchrophasor Grid
Monitoring into Proactive Grid Stability Management. In 2013,
Alstom Grid collaborated with the PG&E project team to deliver
enhanced e-terra integrated real-time synchrophasor and
Energy Management Systems (EMS) applications as the first
stage of the Production Grade Synchrophasor Project. This
enables PG&E to monitor power system behavior from a new
class of GPS time-synchronized, high resolution PMUs. These
devices take grid measurements at a rate of up to 120
times per second versus the traditional four to six second data
rate generated by unsynchronized SCADA sensors. The
increased observability allows PG&E to identify and analyze
system vulnerabilities in real-time, assess available transfer
margins across transmission corridors and provide corrective
actions to prevent potential blackouts. In the future, Alstom’s
Grid Stability Package will help to integrate existing
measurement-based PMU analytics, model-based EMS and
dynamic stability analytics to enable proactive management of
grid stability [16]. Tollgrade in partnership with DTE Energy
implemented a pilot to prove the concept of “predictive grid”
across DTE Energy’s service territory by installing advanced
monitoring equipment and predictive grid analytics software to
find tell-tale signs of faults and asset health symptoms before
outages occur [17]. The results have shown that almost 86% of
all outages have been preceded by line disturbances which
leaves significant space for further work in grid failure
prediction.
In [18] an approach to predict power grid weak points, and
specifically to efficiently identify the most probable failure
modes in static load distribution for a given power network has
been developed. The approach is applied to two examples. The
algorithm represents a power network adaption of the heuristic
originally developed to study low probability events in physics.
One finding is that, if the normal operational mode of the grid
is sufficiently healthy, failures are relatively sparse, i.e., the
failures are caused by load fluctuations at only a few buses.
[Figure: analytics stages plotted against Difficulty and Value: Descriptive Analytics (What happened?), Diagnostic Analytics (Why did it happen?), Predictive Analytics (What will happen?), Prescriptive Analytics (How can we make it happen?)]
Fig. 1. Proactive management roadmap
IV. SMART GRID PROACTIVE FAILURE MANAGEMENT
In computer systems, a failure is an event that occurs when
the delivered service deviates from the correct one [19]. In
power systems, the term failure is traditionally used to describe a
complete outage, when no power is delivered to the user. We
address Smart Grid faults, errors and failures in [6] and define
power grid failures as events that occur not only when the
power delivery is interrupted but also when the quality of the
electric power delivered to the customer deviates from the one
defined by the service agreement. The minimum level of
quality depends on the type of the customer. This margin is
used to differentiate between errors and failures.
Power grid failures may affect a large number of
consumers causing customers’ dissatisfaction. Moreover, they
have a highly negative impact on economy, especially in
industrial environments. As the power grid is a safety-critical
infrastructure, failures may also compromise safety and pose
life risks. This calls for a comprehensive approach to managing
Smart Grid failures including a need for failure prediction
methods. Failure prediction triggers preventive actions to
mitigate anticipated failures by reducing their effects and,
whenever possible, fully preventing impending failures. When
prevention is not possible and a failure is unavoidable, failure
prediction may activate preparation for repair actions.
An overview of the proposed methodology for proactive
management of Smart Grid failures is depicted in Fig. 2. The
main stages (monitoring, failure prediction and failure
mitigation) are discussed in detail in the following subsections.
The methodology may be implemented as an additional
application of EMS. It may also be adapted for handling other
disturbances and undesirable changes, as well as for early
warning on cyber attacks that are also identified as a threat to
Smart Grid’s resilience [5].
A. Grid Monitoring
Monitoring infrastructure provides data to a failure
prediction algorithm. The set of data, its quality and monitoring
frequency have a high impact on the quality of prediction.
Until recently, the main set of measurements was collected
through SCADA, typically every 2 to 4 seconds [1]. These
data are usually scarce, noisy, inaccurate and not synchronized.
As such they cannot be used to derive an accurate grid model
that is a basis for the prediction of future states and anticipation
of failures.
The introduction of a large number of PMUs, smart meters
and advanced metering devices, changes the picture
significantly. These new sensors and measuring devices
generate a vast amount of data that is more accurate and, more
importantly, synchronized. With time tags, measurements from
different locations may be related and the current state of the
system may be estimated with sufficient accuracy. PMUs
provide information on voltages, currents and phasors on all
three phases separately at a rate of up to 60 [1] or even 120
data points per second in some industrial solutions [16]. Today,
this data is mainly used for the state estimation, detection of
disturbances and their localization; it may also be a valuable
input for failure prediction. Besides real-time measurements,
monitoring infrastructure provides static network data that
corresponds to the parameters and substation configurations
[2]. Additional devices that provide ambient measurements
(e.g. air and wire temperature, wind speed and direction) may
also be a valuable input for failure prediction. Finally, weather
prediction and load forecasts are an important input as extreme
weather conditions and high demands are frequently identified
as root causes of grid failures [6][9].
Due to the large amount of data and still limited computing
resources, not all the data can be processed at runtime. Thus,
monitoring must be adaptive in the sense that the monitoring
frequency as well as the number of features sent to the
predictor may be adjusted based on the current state of the grid
and the estimated probability of near-future failures. For example,
if the probability of failure is estimated as high, it may be
necessary to acquire data at a higher rate and from additional
sources (e.g. from a PMU in a different part of the network) in
order to obtain a more accurate prediction.
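The adaptive monitoring idea can be illustrated with a minimal sketch. The feature names, the base rate of 10 samples per second, the escalation threshold and the linear scaling policy are all assumptions made for illustration; only the upper rate of 60 data points per second is taken from the text above.

```python
# Hypothetical feature groups; the names are illustrative only.
BASE_FEATURES = ["voltage", "frequency"]
EXTRA_FEATURES = ["current", "phase_angle", "line_temperature"]


def monitoring_plan(failure_probability, base_rate_hz=10.0, max_rate_hz=60.0,
                    escalation_threshold=0.5):
    """Return (sampling rate, feature list) for the current risk estimate."""
    if not 0.0 <= failure_probability <= 1.0:
        raise ValueError("failure_probability must lie in [0, 1]")
    # Assumed policy: scale the rate linearly between the base and maximum rate.
    rate_hz = base_rate_hz + failure_probability * (max_rate_hz - base_rate_hz)
    features = list(BASE_FEATURES)
    # Assumed policy: request additional features once risk passes the threshold.
    if failure_probability >= escalation_threshold:
        features.extend(EXTRA_FEATURES)
    return rate_hz, features
```

A real deployment would likely also bound the rate by the available communication and computing budget, which this sketch does not model.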
B. Failure Prediction
The goal of online failure prediction is to identify, at
runtime, whether a failure will occur in the near future based
on an assessment of the monitored current system state and the
analysis of past events. The output of a failure predictor is the
probability that a failure will occur in the near future. A failure
predictor should predict as many failures as possible while
minimizing the number of false alarms. Numerous prediction
methods are already successfully used for the enterprise
computer systems for online, short-term prediction of failures
[11]. The prediction quality is identified as a critical part of the
entire predict-mitigate approach [20].
A design of a failure predictor should be conducted in three
phases depicted in Fig. 3. In the first phase a model of the
system should be conceived from the topology of the grid (or
its part). The model should clearly relate system parts to PMU
measurements, identify parts of the system where, based on
historical records, disturbances are the most frequent, and
establish a relation between system parts in terms of
disturbance propagation. The main usage of the model is in
preliminary selection of the most indicative features. As
different types of failures may occur in the grid (e.g. line trips,
generator failures, failures of the cyber infrastructure,
interdependent failures), they must be identified, classified and
properly described with related ranges of features’ values.
Classification of Smart Grid failures and fault taxonomy are
presented in [4] and [6]. Historical data, which are necessary to
train and evaluate a predictor, may be obtained from
measurement logs. Alternatively, simulating the Smart Grid and
grid failures may be more beneficial for data generation,
especially in the first phase. The problem of collecting the data
for training and evaluating the predictor is addressed in
Section V.
[Figure: blocks labeled Smart Grid, Monitoring, Failure Prediction (Ensemble of failure predictors), and Failure Mitigation (Diagnosis, Decision on Countermeasures, Implementation of Countermeasures)]
Fig. 2. Methodology overview
In the second phase, the obtained dataset should be
analyzed. Data conditioning includes extraction of the features
(also called events, variables or parameters by different
research communities) and structuring the data in a form that
may be used as input for the prediction algorithm. In particular,
each data set in the stream, that describes one system state,
should be associated with a failure type or marked as failure-
free. A preliminary feature selection should be conducted while
taking into account system model. Feature selection is the
process of selecting the most relevant features (and examples
for algorithm training) and combining them in order to
maximize predictors’ performance; discard redundant and
noisy data; obtain faster and more cost-effective algorithm
training and online prediction; and better interpret the data
relations (data simplification for better human understanding).
Feature selection methods may be classified as filters, wrappers
and embedded methods. A widely used filter method is
Principal Component Analysis (PCA). PCA converts a set of
correlated features into a set of linearly uncorrelated features
(principal components) using orthogonal transformation. The
procedure is independent of the type of the
prediction algorithm that will be used and thus very appropriate
for preliminary selection of features. Numerous packages are
available for feature selection, including those that are a part of
popular tools for statistical analysis (e.g. Matlab/Octave,
Python and R). A good overview of feature selection methods
is given in [21].
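To make the PCA step concrete, the following is a minimal sketch that computes only the first principal component of a small feature matrix. It uses power iteration on the sample covariance matrix instead of a full orthogonal decomposition, which is a simplification chosen to keep the example dependency-free; the function names and the toy data are assumptions, and in practice one of the statistical packages mentioned above would be used.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length feature vectors."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def first_principal_component(rows, iterations=200):
    """Return (unit direction, explained variance ratio) of the first component.

    Power iteration converges to the dominant eigenvector of the covariance
    matrix, provided the start vector is not orthogonal to it.
    """
    cov = covariance_matrix(rows)
    d = len(cov)
    total_variance = sum(cov[i][i] for i in range(d))
    if total_variance == 0.0:
        raise ValueError("all features are constant")
    v = [1.0 / math.sqrt(d)] * d  # deterministic start vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    # Rayleigh quotient v^T C v gives the variance along the direction v.
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue / total_variance
```

For two perfectly correlated features the explained variance ratio approaches 1, which signals that one of them is redundant for the predictor.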
In the final stage, the predictor is adapted and evaluated. In
fact, an ensemble of predictors may be used to improve quality
of prediction. Having in mind a large number of existing
failure prediction algorithms, the most viable solution is to
select and to adopt one of them. A comprehensive survey of
failure prediction algorithms is given in [11]. Three main
approaches used for prediction are: failure tracking, symptom
monitoring and detected error reporting. Failure tracking draws
conclusions about upcoming failures from the occurrence of
the previous ones. These methods either aim at predicting the
time of the next occurrence of a failure or at estimating the
probability of failure co-occurrence. Symptoms are defined as
side effects of looming faults that do not necessarily manifest
themselves as errors. Symptom-monitoring based predictions
analyze the system features in order to identify those that
indicate an upcoming failure. Several methodologies for the
estimation were proposed in the past, including function
approximation, machine-learning techniques, system models,
graph models, and time series analysis. Finally, the methods
based on detected error reporting, such as the rule-based, the
co-occurrence-based and the pattern recognition methods,
analyze the error reports to predict if a new failure is about to
happen. Current trends in predicting cascading failures in
power grids are mainly based on the application of machine-
learning approaches, such as neural networks, support vector
machines and anomaly detection (see, for example, [15]).
In order to speed up the prediction, the set of selected
features should be refined by extracting the most indicative
ones. To facilitate the process, heuristics (such as the ones
presented in [22]) may be employed. After each iteration, the
quality of prediction has to be evaluated. The process
terminates when a sufficient quality of prediction is reached so
that resilience is improved. A prediction quality metric and the
expected effect of our approach on resilience are addressed in
Section VI.
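One possible reading of this iterative refinement is a greedy backward elimination, sketched below. This is not the heuristic of [22]; it is a generic illustration in which `evaluate` is a hypothetical callback that trains and scores a predictor on a given feature subset, and `min_quality` is an assumed acceptance threshold.

```python
def refine_features(features, evaluate, min_quality):
    """Greedy backward elimination of features.

    In each iteration, drop the feature whose removal leaves the highest
    prediction quality, and stop as soon as any further removal would push
    the quality below min_quality.
    """
    selected = list(features)
    while len(selected) > 1:
        # Evaluate every candidate subset obtained by removing one feature.
        candidates = [
            (evaluate([f for f in selected if f != drop]), drop)
            for drop in selected
        ]
        best_quality, drop = max(candidates)
        if best_quality < min_quality:
            break
        selected.remove(drop)
    return selected


# Toy usage: quality is the sum of hypothetical per-feature scores (percent).
scores = {"voltage": 50, "frequency": 40, "humidity": 2, "wind_speed": 3}
kept = refine_features(list(scores), lambda fs: sum(scores[f] for f in fs), 85)
```

Each iteration costs one evaluation per remaining feature, so for large feature sets a cheaper ranking heuristic would be preferable.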
C. Failure Mitigation
Once the prediction mechanisms anticipate a failure,
corrective actions that will mitigate it should be scheduled and
activated. The mitigation is composed of three phases as
depicted in Fig. 2. In the diagnosis phase, the output of the
prediction is analyzed. Additional algorithms may be employed
to better identify the location of the anticipated failure. In the
second phase of mitigation, a decision on a countermeasure is
taken. This decision should take into account the probability of
a failure (provided by the predictor), the cost of the measure
(e.g. maintenance cost or the cost in terms of the number of
customers affected), the probability of a successful mitigation
and the overall effect on resilience (for example the effect on
steady-state availability). Finally, the implementation of the
countermeasure has to be performed.
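The decision phase can be sketched as an expected-cost comparison over the factors listed above. The decision rule itself is an assumption, since the methodology names the inputs but does not prescribe a formula; the class, the function names and the cost figures are hypothetical, with cost expressed in an arbitrary common unit such as the number of customers affected.

```python
from dataclasses import dataclass


@dataclass
class Countermeasure:
    name: str
    cost: float                  # cost of applying the measure
    success_probability: float   # probability that it prevents the failure


def expected_cost(failure_probability, failure_cost, measure=None):
    """Expected cost of doing nothing (measure=None) or of applying a measure."""
    if measure is None:
        return failure_probability * failure_cost
    # The measure is always paid for; the failure cost is incurred only if
    # the failure was really coming and the mitigation did not succeed.
    residual = failure_probability * (1.0 - measure.success_probability)
    return measure.cost + residual * failure_cost


def choose_countermeasure(failure_probability, failure_cost, measures):
    """Return (best measure or None, its expected cost)."""
    best, best_cost = None, expected_cost(failure_probability, failure_cost)
    for measure in measures:
        cost = expected_cost(failure_probability, failure_cost, measure)
        if cost < best_cost:
            best, best_cost = measure, cost
    return best, best_cost
```

Under this rule a measure is worthwhile only when the product of failure probability, success probability and failure cost exceeds the cost of the measure, so a low predicted probability leads to no action.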
The ultimate goal is to fully prevent the failure (e.g. by grid
reconfiguration or by decreasing the load of a specific line). If
that is not possible, then the effect of a failure should be
minimized or confined (e.g. by preventive load shedding) or a
preparation of repair actions may be triggered to minimize the
repair time (e.g. by directing a maintenance team to the
location where a failure is expected). Some of these techniques
can lead to a system performance degradation. For example, if
a failure predictor was wrong, unnecessary preventive load
shedding may be conducted affecting a subset of customers. The
entire process may be implemented as fully automated or it
may require the involvement of an operator for decision-
making. This may depend on the type of mitigation and its
cost. When more than one failure is anticipated, a coordinated
management of mitigation is required.
A great opportunity for mitigation of failures in Smart Grid
lies in the employment of Flexible Alternating Current
Transmission Systems (FACTS) that provide a sub-second
response [1]. FACTS incorporate electronic-based and other
static controllers that provide control of one or more AC
transmission system parameters to enhance controllability and
increase power transfer capability. Standard definitions of
FACTS-related terminology are given in [23]. As demonstrated
in [24], these devices can alter power flow in transmission lines
in a fashion that can prevent failures from occurring in the
system. An overview of available FACTS and their capability
for handling specific problems is given in [1]. More details on
mitigation techniques in power systems may be found in [25].
Finally, considering Smart Grid as a cyber-physical system, and
having in mind that failures may propagate from the computing
infrastructure [4], techniques for computer systems recovery
should also be included.
[Figure: Phase 1 (Network Topology, Historical Data, System Model, Failure Classification); Phase 2, Data Analysis (Data Conditioning, Preliminary Feature Selection); Phase 3, Design and Evaluation (Algorithm Selection, Evaluation, Feature Selection Refinement)]
Fig. 3. Failure predictor design stages
V. DATA PROVISION FOR FAILURE PREDICTION
For an efficient training and evaluation of a predictor, a
vast amount of relevant data is needed. This data must contain
a sufficient number of training examples during and before grid
failures. The problem of obtaining such data is that advanced
measuring devices like PMUs are still not widely deployed and
that electric power distribution companies mostly do not make
collected data publicly available. Several academic projects
exist where the data from PMUs and FDRs (Frequency
Disturbance Recorders) are collected for the purpose of state
estimation, visualization, disturbances detection and prediction.
These projects include FNET/GridEye2, EPFL Smart Grid3 and
openPDC4. Even though they are valuable sources, the problem of
an insufficient number of training examples that contain data
collected during a failure still remains. This is mainly due to
the fact that power grid failures are relatively rare events and
that, in some of these projects, relatively small university
campus grids are monitored with a limited number of PMUs.
An alternative way to train and evaluate prediction
algorithms is to use fault injection methods in power systems
simulators. A number of commercial and open-source grid
simulators are available, including Power System Toolbox
(PST), Power Analysis Toolbox (PAT), Power System
Analysis Toolbox (PSAT), and MatDyn [26]. Still, one should
keep in mind that these and other simulators do not necessarily
take into account all the aspects of the future grid, like a strong
impact of cyber (computing) infrastructure on grid operation
[4]. This is particularly important, having in mind relative
frequency of software failures and their potential propagation
from cyber to electric infrastructure [6].
Fault injection, in a simulation environment, may be
employed to emulate failures and obtain a sufficient number of
failure-related training examples. In computer systems, fault
injection is mainly used for testing fault tolerance, identifying
root causes and evaluating dependability [27] but it may be
also used for improvement and validation of failure prediction
by accelerating the process of data collection through data
generation [28]. Many power system simulation tools also
implement fault injection mechanisms to evaluate reliability of
the grid. For example, MatDyn [26] allows time-domain
simulation of power systems with injection of faults.
2 FNET/GridEye: http://fnetpublic.utk.edu/
3 EPFL Smart Grid: http://smartgrid.epfl.ch/
4 openPDC: http://openpdc.codeplex.com/
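How fault injection can yield labeled training examples may be illustrated with a toy sketch. It does not use MatDyn or any other simulator named above: the frequency trace is a crude stand-in for a simulation, and the nominal 50 Hz value, the noise level, the drift rate, the 49.5 Hz failure threshold and the lead time are all assumptions made for illustration.

```python
import random


def simulate_with_fault(steps, fault_step=None, seed=0, nominal_hz=50.0):
    """Toy frequency trace; from fault_step on, the frequency drifts downward."""
    rng = random.Random(seed)
    trace = []
    for t in range(steps):
        freq = nominal_hz + rng.gauss(0.0, 0.01)  # small measurement noise
        if fault_step is not None and t >= fault_step:
            freq -= 0.05 * (t - fault_step + 1)   # injected disturbance
        trace.append(freq)
    return trace


def label_trace(trace, failure_threshold_hz=49.5, lead_time=5):
    """Label each sample as 'normal', 'pre-failure' or 'failure'.

    The failure starts at the first sample below the threshold; the
    lead_time samples just before it are the examples a predictor should
    learn to recognize.
    """
    first_violation = next(
        (i for i, f in enumerate(trace) if f < failure_threshold_hz), None)
    labels = []
    for i in range(len(trace)):
        if first_violation is None or i < first_violation - lead_time:
            labels.append("normal")
        elif i < first_violation:
            labels.append("pre-failure")
        else:
            labels.append("failure")
    return labels
```

Repeating the simulation with different seeds and fault times would give as many pre-failure examples as needed, subject to the caveat above that simulators may not capture all aspects of the future grid.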
VI. EXPECTED EFFECT ON RESILIENCE
Proactive management of failures relies on failure
prediction and mitigation techniques to fully prevent failures or
to minimize their effects. With sufficiently good prediction and
proper corrective actions a highly positive effect on resilience
is expected. For example, in the case of computer systems,
preliminary studies show that steady-state availability of a
system may be improved by an order of magnitude if the
quality of failure prediction is high and if failures are
anticipated sufficiently in advance [20][29].
Precision and recall are widely used as metrics for the
failure prediction quality. Precision is the ratio of the number
of correctly predicted failures to the total number of predictions
(issued alarms). Recall is the ratio of the total number of
correctly predicted failures to the total number of failures. High
precision means a high percentage of correct predictions and
high recall means that a high percentage of failures are
predicted. Low precision indicates that too many false alarms
are issued. Low recall indicates that a large number of failures
are not predicted. A good quality predictor has high precision
and high recall. Still, it is frequently necessary to compromise
on precision and recall by increasing one at the expense of
the other [11].
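The two metrics, and the trade-off between them, can be shown with a short sketch. It assumes that the predictor outputs a failure probability per observation window and that an alarm is issued when the probability reaches a threshold; the function name and the sample values are hypothetical.

```python
def precision_recall(probabilities, failed, threshold):
    """Compute (precision, recall) for alarms issued at a given threshold.

    probabilities: predicted failure probability per observation window
    failed:        True where a failure actually followed
    """
    true_pos = false_pos = false_neg = 0
    for probability, failure in zip(probabilities, failed):
        alarm = probability >= threshold
        if alarm and failure:
            true_pos += 1
        elif alarm:
            false_pos += 1      # false alarm
        elif failure:
            false_neg += 1      # missed failure
    alarms = true_pos + false_pos
    failures = true_pos + false_neg
    precision = true_pos / alarms if alarms else 0.0
    recall = true_pos / failures if failures else 0.0
    return precision, recall


# Hypothetical predictor outputs and ground truth for six windows.
probs = [0.9, 0.8, 0.6, 0.4, 0.3, 0.1]
truth = [True, True, False, True, False, False]
strict = precision_recall(probs, truth, 0.7)    # few alarms
lenient = precision_recall(probs, truth, 0.35)  # many alarms
```

With these sample values, the stricter threshold issues no false alarms but misses one failure, while the more lenient one catches every failure at the price of a false alarm.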
The correctness of prediction has a decisive impact on
Smart Grid’s resilience and maintenance. If a prediction is
correct, a failure may be avoided. This will improve resilience
of a system and may also decrease maintenance cost. For
example, if a transmission line failure is correctly predicted,
preventive load shedding could be triggered. This will affect a
subset of customers but will prevent cascading
failures that would ultimately affect a much larger set of
customers. The overall effect on resilience will be positive. On
the other hand, if the prediction is not correct (false alarm),
unnecessary corrective actions may be triggered, resulting in
additional costs and compromising resilience. For example, a
failure of a transmission line may be predicted although
there is no real danger of a line failure. As a consequence,
preventive load shedding may be triggered as a corrective
action. This will cut off a subset of customers and create a
partial outage even though there was no danger of a
failure.
In conclusion, the overall effect that proactive management
may have on Smart Grid’s resilience strongly depends on the
quality of prediction and the cost of related corrective actions
(in terms of both financial cost and time). While dependability
and performability may be improved with good prediction and
effective mitigation, the system's dependability may be
compromised by bad prediction, which will also affect the
efficiency of the overall operation of the grid. If the
methodology is adopted for managing cyber attacks through
their early detection, a similar effect on security may be
expected.
VII. CONCLUSIONS
Growing complexity of the grid, increasing number of
intermittent renewable energy resources, changes in interaction
with end consumers, rising needs for electric power and a
demand for more flexible and efficient grid management pose a
challenge to maintaining high resilience standards. On the
other hand, modern Energy Management Systems
complemented by advanced monitoring instruments enable
improved grid observability, state estimation and grid/outage
management which can be effectively used, as advocated in
this paper, in proactive failure management.
We propose a comprehensive methodology for proactive
management of failures in Smart Grid based on data analytics
and inspired by well-established solutions employed in
computer engineering. The concept relies on statistical analysis
of historic pre-event data coupled with online monitoring of the
most indicative features for prediction of near-future failures
and their mitigation. The proposed strategies aim at efficient
prevention of failures, which at the same time improves
resilience and decreases maintenance cost. Since our approach
is holistic, it may also be applied to address other types of
disturbances or undesirable changes (e.g. cyber attacks) that
pose a challenge to assuring a high level of resilience. Each
step of the methodology is described including a discussion on
challenges, implementation strategies and opportunities for
improving Smart Grid resilience. Particular care is given to
a description of a failure predictor design whose quality has a
decisive impact on resilience, cost and customer satisfaction.
REFERENCES
[1] J. Giri, “Proactive management of the future grid”, IEEE Journal on
Power and Energy Technology Systems, vol.2, no.2, pp.43-52, June
2015
[2] A. Monticelli, “Electric power system state estimation”, Proceedings of
the IEEE vol.88, no.2, pp. 262-282, August 2002
[3] S. Lukovic, I. Kaitovic, M. Mura, and U. Bondi, “Virtual power plant as
a bridge between distributed energy resources and Smart Grid”, 43rd
Hawaii International Conference on System Sciences (HICSS), Hawaii,
USA, January 2010
[4] J.-C. Laprie, K. Kanoun, M. Kaâniche, “Modelling interdependencies
between the electricity and information infrastructures”, 26th
International Conference on Computer Safety, Reliability & Security
(SAFECOMP), Nuremberg, Germany, September, 2007
[5] O. Kosut, L. Jia, R.J. Thomas, L. Tong, “Malicious data attacks on the
Smart Grid”, IEEE Transactions on Smart Grid, vol.2, no.4, pp. 645-
658, December 2011
[6] I. Kaitovic, S. Lukovic, and M. Malek, “Unifying dependability of
critical infrastructures: Electric power system and ICT (concepts, figures
of merit and taxonomy)” 21st IEEE Pacific Rim International
Symposium on Dependable Computing (PRDC), Zhangjiajie, China,
November, 2015
[7] J.P.G. Sterbenz, D. Hutchison, E.K. Cetinkaya, A. Jabbar, J.P. Rohrer,
M. Scholler, and P. Smith, “Resilience and survivability in
communication networks: Strategies, principles, and survey of
disciplines”, ACM Computer Networks Journal, vol.54, no.8, pp.1245-
1265, June 2010
[8] M. Zima, "Special protection schemes in electric power systems",
Literature survey, ETHZ, 2002
[9] A. Atputharajah, and T.K. Saha, “Power system blackouts - literature
review”, International Conference on Industrial and Information
Systems (ICIIS), Sri Lanka, December 2009
[10] L.L. Loi, T.Z. Hao, S. Mishra, D. Ramasubramanian, S.L. Chun, Y.X.
Fang, “Lessons learned from July 2012 Indian blackout”, 9th IET
International Conference on Advances in Power System Control,
Operation and Management (APSCOM), Hong Kong, China, November
2012
[11] F. Salfner, M. Lenk, and M. Malek, “A survey of online failure
prediction methods”, ACM Computing Surveys (CSUR), vol.42, no.3,
art. 10, March 2010
[12] G. Hoffmann, and M. Malek, “Call availability prediction in a
telecommunication system: A data driven empirical approach”, 25th
IEEE Symposium on Reliable Distributed Systems (SRDS), Leeds, UK,
2006
[13] Y. Watanabe, H. Otsuka, Y. Matsumoto, “Failure Prediction for Cloud
Datacenter by Hybrid Message Pattern Learning”, 11th IEEE
International Conference on Autonomic and Trusted Computing (ATC),
Bali, Indonesia, December 2014
[14] J. Endrenyi, et al., A. Schneider, and Ch. Singh, “The present status of
maintenance strategies and the impact of maintenance on reliability”,
IEEE Transactions on Power Systems, vol.16, no.4, pp.638-646,
November 2001
[15] C. Rudin, et al., “Machine learning for the New York city power grid”,
IEEE Transactions on Pattern Analysis and Machine Intelligence,
vol.34, no.2, pp.328-345, February 2012
[16] Alstom Press Centre Home, “Alstom and PG&E to advance
synchrophasor grid monitoring into proactive grid stability
managemen”, September 2014, [Online], Accessed on July 2015,
Available at: http://www.alstom.com/press-centre/2014/8/alstom-and-
pge-to-advance-synchrophasor-grid-monitoring-into-proactive-grid-
stability-management/
[17] Tollgrade Communications, Inc, “Building a predictive grid for the
Motor City”, Predictive Grid Quarterly Report, vol.1, February 2015,
[Online], Accessed on July 2015, Available at:
http://www.metering.com/wp-content/uploads/2015/02/Predictive-Grid-
Quarterly-Report_Vol1_Tollgrade-Communications_Feb15.pdf
[18] M. Chertkov, F. Pan, and M.G. Stepanov, "Predicting failures in power
grids: The case of static overloads", IEEE Transactions on Smart Grid,
vol.2, no.1, pp.162-172, March 2011
[19] A. Avizienis, J.-C. Laprie, B. Randell, and C. Landwehr, “Basic
concepts and taxonomy of dependable and secure computing”, IEEE
Transactions on Dependable and Secure Computing, vol.1, no.1, pp.11-
33, January 2004
[20] F. Salfner, and M. Malek, “Proactive fault handling for system
availability enhancement”, 19th IEEE International Parallel and
Distributed Symposium (IPDPS), Orlando, FL, USA, 2005
[21] I. Guyon, and A. Elisseeff,An introduction to variable and feature
selection”, ACM Journal of Machine Learning Research, vol.3, pp.1157-
1182, March 2003
[22] H. Liu, and L. Yu, “Toward integrating feature selection algorithms”,
IEEE Transactions on Knowledge and Data Engineering, vol.17, no.4,
pp.491-502, April 2005
[23] A.A. Edris et al, “Proposed terms and definitions for flexible AC
transmission system (FACTS)”, IEEE Transactions on Power Delivery,
vol.12, no.4, pp.1848-1853, October 1997
[24] A. Z. Faza, S. Sedigh, and B.M. McMillin, “Reliability analysis for the
advanced electric power grid: From cyber control and communication to
physical manifestations of failure”, 28th International Conference on
Computer Safety, Reliability, and Security (SAFECOMP), p.p. 257-269,
Hamburg, Germany, September 2009
[25] M.H.J. Bollen, “Understanding power quality problems: Voltage sags
and interruptions”, Chapter 7. IEEE Press, September 1999, ISBN: 978-
0-7803-4713-7
[26] S. Cole, and R. Belmans,MatDyn, a new Matlab based toolbox for
power system dynamic simulation”, IEEE Transactions on Power
Systems (accepted for future publication), July 2011
[27] Mei-Chen Hsueh, T.K. Tsai, R.K. Iyer, “Fault injection techniques and
tools”, Computer , vol.30, no.4, pp.75-82, April 1997
[28] M. Vieira, H. Madeira, I. Irrera, and M. Malek, “Fault injection for
failure prediction methods validation”, 5th Workshop on Hot Topics in
System Dependability (HotDep), Estoril, Portugal, 2009
[29] F. Salfner, M. Schieschke, and M. Malek, “Predicting failures of
computer systems: A case study for a telecommunication system”, 20th
International Parallel and Distributed Processing Symposium (IPDPS),
April 2006