Anomaly Detection in Electrical Substation Circuits via
Unsupervised Machine Learning
Alfonso Valdes
University of Illinois at Urbana-
Champaign
1308 W Main St
Urbana, IL 61801 USA
(217)244-5147
avaldes@illinois.edu
Richard Macwan
University of Illinois at Urbana-
Champaign
1308 W Main St
Urbana, IL 61801 USA
rmacwan@illinois.edu
Matt Backes
University of Illinois at Urbana-
Champaign
1308 W Main St
Urbana, IL 61801 USA
mbackes2@illinois.edu
ABSTRACT
Cyber-physical systems (CPS), such as smart grids, include
distributed cyber assets for monitoring, control, and
communication in order to maintain safe and efficient operation of
the physical system in question. Security in CPS may be able to
leverage physical laws that govern the CPS, providing a defense
strategy complementing conventional cybersecurity measures. CPS
intrusion detection systems (CPS IDS) should seek not just to detect
attacks in the host audit logs and network traffic (cyber plane), but
should consider how attacks are reflected in measurements from
diverse devices at multiple locations (physical plane). In electric
grids, voltage and current laws induce physical constraints that can
be leveraged in distributed agreement algorithms to detect
anomalous conditions where the physical and cyber states are
inconsistent. This can be done by explicitly coding the physical
constraints into a hybrid CPS IDS, but the detector is then specific
to a particular CPS. We propose an alternative approach using
machine learning to characterize normal, fault, and attack states in
a smart distribution substation CPS, using this as a component of a
CPS IDS. Our innovative approach does not require that attack
states be rare, nor does it require clean training data. Initial results
indicate that attack states are either learned as unique classes if they
are present in the training phase, or are easily detected as
anomalous by the trained system, and that normal and non-
malicious fault states are learned as well.
CCS Concepts
• Security and privacy → Intrusion/anomaly detection and malware mitigation • Hardware → Power and energy → Energy distribution → Smart grid • Computing methodologies → Machine learning → Unsupervised learning → Anomaly detection • Computing methodologies → Machine learning → Machine learning approaches → Neural networks.
Keywords
Anomaly Detection; Distributed Agreement; Machine Learning;
Smart Grid; Cyber/Physical System Security; Neural Networks;
Self-Organizing Maps; Adaptive Resonance Theory; Competitive
Learning; IEC 61850
1. INTRODUCTION
Modern infrastructure systems, such as those in energy delivery,
are rapidly evolving into cyber-physical systems (CPS) in which
distributed cyber assets for monitoring, communication, and
control interface with a physical process for safe and efficient
operation. Cyber assets include human-machine interfaces (HMI)
in control rooms, as well as embedded systems in substations or in
the field outside of any physical security perimeter. Increasingly,
these systems use commodity operating systems and networking
protocols and technology, with the hardware possibly hardened for
a harsh physical environment, but in many cases logically not
unlike those found in enterprise systems. Typically, legacy control
protocols such as MODBUS are adapted to modern CPS via
encapsulation (MODBUS over TCP [1]), while newer protocols
such as IEC 61850 are designed to be layered upon modern
networking protocols such as TCP and Ethernet.
There is concern that incorporation of extensive cyber assets in CPS
makes them vulnerable to cyber attack, which potentially leads to
failure of the physical process (for example, power outage in an
electrical system), threat to physical safety, damage to expensive
equipment, and environmental consequences. Embedded systems
in the field are frequently constrained with respect to
communication and computational capability. This limitation, as
well as strict real time requirements often found in CPS, renders
adoption of conventional security technology problematic. CPS
may be attacked via vectors similar to those used to attack
enterprise systems, such as malicious protocol commands or device
compromise. Another important class of attacks in CPS is to inject
incorrect measurement data, causing the CPS to undertake incorrect
and potentially destabilizing control action.
The special challenges to security in CPS are mitigated when the
laws governing the physical system induce constraints in what
should be observed in the cyber system. Understanding the physics
may afford an opportunity to enforce consistency across multiple
devices at different locations using diverse approaches for
measurement and control.
We consider the attacker who has some knowledge of the
underlying protocol and can inject a limited number of false
measurements that are correct as far as the protocol syntax. We
refer to this as a data injection attack. Our defenses make such an
attack significantly more difficult by leveraging the underlying
physical constraints, thereby requiring the simultaneous
compromise of a diversity of devices at multiple points in the
system. Merely injecting false measurements into the network
traffic, even if these are syntactically correct with respect to the
control protocol and pass rudimentary range checks, will not work
if the defender is able to quickly assess these for consistency with
the global system state, from a physical standpoint.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are
not made or distributed for profit or commercial advantage and that copies
bear this notice and the full citation on the first page. To copy otherwise,
or republish, to post on servers or to redistribute to lists, requires prior
specific permission and/or a fee.
ACM e-Energy Conference, June 21-24, Waterloo, Ontario, Canada
In order to implement security measures based on this approach,
we may encode the constraints imposed by the underlying physics.
This was the approach we took in prior work [6][8]. While these
works demonstrated the efficacy and feasibility of the approach, it
is the case that the implementation requires extensive configuration
for a specific CPS. In the present work, we demonstrate that a
machine learning/anomaly detection approach can learn system
states induced by the underlying physical laws, without explicitly
encoding these laws into our detector.
We present initial results of an unsupervised machine learning
approach to anomaly detection, specifically detection of
measurement patterns corresponding to normal operating
conditions, true non-malicious fault currents, and false data
injection in electrical distribution circuits. The learning approach
we use has aspects of Self-Organizing Maps (SOM) [4] and
Adaptive Resonance Theory (ART) [2]. Patterns of voltage and/or
current measurements at various points in the circuit are presented
to the classifier. The classifier ideally learns a number of pattern
exemplars that represent the observed measurement patterns, and
in particular learns patterns for normal and true fault conditions, as
well as for injected measurements falsely purporting to report a
fault condition.
In conventional cybersecurity, “anomaly detection” refers to
detection of extremely unusual events, where “unusual” may be
determined by observing the system to characterize normal
behavior. Anomaly detection in IDS is differentiated from
signature-based detection in which the IDS searches for patterns of
misuse, such as known malware byte sequences [5]. Although
anomaly detection potentially protects systems against unknown
attacks, such as zero-day attacks, in practice the false alarm and
missed detection rates frequently fall short of expectations. A
hypothesis underlying our work is that the constraints imposed by
the physical laws enable our system to overcome these
shortcomings.
Anomaly detection approaches typically try to construct a decision
surface between observations labeled “normal” and “anomalous”.
In practice, there is often the further (questionable) assumption that
malicious events are rare and thus anomalous. Moreover, many
learning-based approaches require clean training data with no
exemplars from the malicious class(es) [5]. The approach we
present has advantages over anomaly detection schemes that tend
to lump all “normal” cases into one class, in that our approach
explicitly allows for multiple patterns of normal conditions (in
electrical systems, these might correspond to different points in the
load curve). Another advantage is that the “anomalous” cases may
fall into several classes, and are potentially learned as different
pattern classes if present in the training phase.
While the approach is applicable to domains other than electrical
circuits, we believe that it is particularly effective in this domain
because the CPS is governed by well-understood physical laws,
namely, Kirchhoff current and voltage laws (KCL/KVL). We do
not claim that our classifier has learned KCL/KVL, but that these
laws induce measurement patterns for physically feasible system
states that are learned as pattern classes in our system. We
conjecture that the approach is applicable to other CPS where
physical laws constrain what is consistent among measurements,
even if those laws are not explicitly encoded into the classifier.
Our contribution is to demonstrate the feasibility of the approach in
the context of electrical distribution substation circuits, and in
particular smart grid systems. In these systems, modern protection
schemes (the rapid detection and isolation of a fault with minimum
outage while maintaining safe operation) depend on cyber assets
for ubiquitous measurement, communication, and control. A
classifier of measurement patterns for steady-state and fault
conditions is a form of distributed agreement among different
devices in the smart substation to detect and mitigate the effect of
an attacker who can falsify a limited number of measurements
available to the protective relays.
Moreover, as discussed in more detail in the results summary
below, our algorithm can be coupled with modern, simulation-
based design and analysis of electrical systems to identify points
where redundant measurements and/or enhanced cyber defenses
are most beneficial.
Throughout this paper we use the term “anomaly” to refer to a
cyber-physical anomaly, specifically, an anomalous pattern of
measurement values potentially indicative of a data injection attack
into the measurements in the substation protection environment. If
undetected, such an attack may trigger a system response that is
unwarranted, suboptimal, and potentially dangerous or
destabilizing.
2. SYSTEM UNDER EVALUATION
We model a distribution substation circuit that connects to the
upstream grid at 69 kV, includes a 7.5 MVA, 69 kV to 13.09 kV
transformer, then a circuit bus with two three-phase distribution
feeders that serve balanced, three-phase customer loads of 1 MW
and 1 MVAR peak. This is based on an actual distribution
substation on the US Eastern Interconnection. The substation
circuit topology is given in Figure 1. We have set up a hardware-
in-the-loop testbed in our laboratory, with the circuit defined in a
Real Time Digital Simulator (RTDS [7]) system connected to
physical ABB protective relays (ABB REF 615 series) at locations
90, 91, 92, and 93. The simulated substation operates under the IEC
61850 protocol [3]. For this work, we utilized only the substation
simulation in RTDS rather than a full hardware-in-the-loop setup
to generate simulated measurements for different contingencies and
injection attacks.
Figure 1. Distribution circuit under evaluation.
The KCL/KVL conditions will hold both in steady-state
system operation and in the case of true, physical short-circuit fault
currents (non-malicious faults). In the latter case, the relays should
undertake collective action to open breaker(s) so as to isolate the
fault and minimize the extent of the outage. The measurements are
simulated under varying load conditions and with measurement
noise that models the noise observed on the actual system in the field.
In addition to the simulated faults at various points in the circuit,
we also introduce some measurement samples corresponding to
injection attacks.
In our earlier work, we described distributed agreement algorithms
for detecting data injection attacks in this circuit [6], as well as in a
ring-bus topology [8] which is more typical of transmission
substations. In this earlier work, we established the efficacy of
detecting these attacks via applying KCL/KVL constraints,
explicitly coding these constraints in the detection algorithms.
A false injected measurement that evades detection may trigger an
unnecessary protective action, leading to economic consequences
and denial of service in a real system. A detection system based on
distributed agreement using KCL/KVL conditions raises adversary
work factor significantly, requiring the adversary to possess
detailed knowledge of the system and the ability to inject precise,
time-aligned false data at multiple points in the system. We now
demonstrate that our system, based on machine learning and
anomaly detection rather than explicitly encoding KCL/KVL
conditions, similarly raises adversary work factor. In either case,
the attack is not impossible, but it is significantly more difficult
than getting access to and compromising a measurement device.
Also, as shown in [6], fast remedial actions can be triggered by the
distributed agreement to mitigate the impact of the attack.
3. SIMULATION AND DATA GENERATION
We use the RTDS [7] to simulate the electrical distribution
substation circuit and collect measurements for the machine
learning algorithm at the locations where the relays are located.
RTDS and similar systems are widely used in the utility sector to
define numerous topologies, signal actual and virtual substation
components at high sample rates, and conduct high-fidelity
simulation analyses to enable effective system design.
The feature vector for the results discussed here consists of RTDS-
generated, time-aligned voltage and current magnitudes for all
three phases at the four locations, for a total of 24 features if we
simultaneously consider voltage and current, or 12 features if we
consider either separately.
We simulate the circuit for a total of 120 seconds, representing
circuit conditions and events in a compressed time frame. The
circuit has a time-varying load that is typical of a daily residential
customer load profile. The 24-hour load profile is compressed into
24 seconds, allowing the learning algorithm to process the entire
load profile and create corresponding patterns. The load profile is
continuously played in a 24 second loop for the duration of the
simulation. Load consumption levels are based on the actual
distribution substation feeder loads. Figure 2 shows the compressed
load curve.
Figure 2. Load curve compressed to 24 s.
In order to make the simulation realistic and to assess the ability of
the machine learning algorithm to identify patterns in presence of
noisy measurement, we introduce noise levels into the relay
measurements in the RTDS simulation. The noise is assumed to be
Gaussian with mean 0 and standard deviation corresponding to a
1% signal-to-noise ratio (SNR), typical of a distribution circuit.
The standard deviation for the Gaussian noise is given by
σ = (SNR / 100) × Signal_rms
where the Signal_rms is the RMS value of either the voltage, or
current sine wave. As can be seen from the equation, the standard
deviation changes with the signal level. We base the standard
deviation using the full-load current and nominal voltage values,
which we consider to be most challenging to the algorithm. In this
scenario, the noise level for the signals at a lower load level will be
higher than what it normally would be, but the assumption is that if
the machine learning algorithm is able to identify different system
states in this scenario, then it will be able to identify system states
for a noise level lower than this.
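As a minimal sketch of this noise model (the function name and the use of NumPy are our illustration, not the paper's implementation): the standard deviation is fixed at a percentage of the full-load/nominal RMS value, so the same sigma applies regardless of the instantaneous load level.

```python
import numpy as np

def add_measurement_noise(samples, signal_rms, snr_pct=1.0, rng=None):
    """Add zero-mean Gaussian noise whose standard deviation is a fixed
    percentage (snr_pct) of the full-load / nominal signal RMS value."""
    rng = rng or np.random.default_rng(0)
    sigma = signal_rms * snr_pct / 100.0  # sigma scales with the signal level
    return samples + rng.normal(0.0, sigma, size=np.shape(samples))

# 1% of a 13.09 kV nominal RMS voltage gives sigma of about 130.9 V
noisy = add_measurement_noise(np.zeros(8), signal_rms=13090.0)
```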
The sampling rate for the simulated measurements is approximately
260 Hz. Typically, phasor measurement units (PMU) sample at a
30 Hz rate, but we chose a higher sampling rate in order to emulate
the sampled value measurement scheme available under the IEC
61850 standard [3], which is the substation environment assumed
here.
We use the RTDS to simulate balanced, three-phase-to-ground
short-circuit faults at three different locations of the distribution
circuit. Two of the faults are located at the end of the distribution
feeders 1 and 2, and the third fault is located on the circuit bus 2.
We consider faults of this type for the sake of simplicity and in
order to test the efficacy of the machine learning algorithm at this
stage in our analysis, and also due to the severity of damage that
can occur with this type of fault. It is of more interest to ensure
proper protection relay operation for this type of fault as opposed
to, say, a single line-to-ground fault. In addition, three phase-to-
ground faults allow us to obtain much higher fault currents, which
will be used to test the machine learning algorithm’s ability to
correctly match patterns from faults at the same location but of
different magnitudes.
Since both the time of occurrence of the fault on the load curve
cycle and the fault magnitude are random and unpredictable, it
is necessary to account for both these variables while generating
data for machine learning. As stated before, the aim of the machine
learning algorithm is to identify the different states of the system
with minimum false positives for each state. Keeping this in mind,
the different faults were simulated for both the training phase and
the validation phase, varying magnitude and point on the load
curve.
The fault impedance takes on two values. The first is 70 Ohms. This
fault impedance was chosen to give a fault current near two times
the full-load current of the circuit at each fault location. The second
fault impedance is 20 Ohms. This impedance gives fault currents
of five times (relay 92), seven times (relay 91), and ten times (relay
93) the full load current, depending upon which location the fault
occurs. The duration of all faults is 0.05 seconds, or three full cycles
(at 60 Hz). Due to transition from fault states to normal, the trace
for an event in our simulation is non-deterministic, but typically
ranges from 14 to 20 samples. The injection attacks typically last
for 30 samples. We will use the term “trace” to refer to the sequence
of consecutive samples corresponding to a fault or attack event.
4. OVERVIEW OF LEARNING
APPROACH
In studies exploring machine learning, it is typical to assign samples
at random to training and test (or validation) data sets. The system
learns from samples in the training set, and claims of generalization
are based on results of the trained system as applied to the
validation set, without further learning. There are more complicated
assignments to validation and training, such as n-fold cross-
validation.
We will denote by “sample” a vector of time-aligned measurement
values at the various points of the circuit, and use “trace” to denote
the sequence of contiguous samples corresponding to a fault or data
injection event. For scoring detections and false alarms, we shall
use the nominal count of 15 samples per fault and 30 per injection
attack, although these are non-deterministic and event traces
exhibit transients at the start and/or end which can be difficult to
assign to the event or to normal. We believe that these values are
high, so that, for example, claiming 90% detection when we detect
27 samples of an attack may in fact be understating the actual
performance. Therefore, our claimed detection results are
conservative.
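The conservative trace scoring described above can be sketched as follows (the function name and list-of-flags representation are ours):

```python
def trace_detection(flags, nominal_len):
    """Score one event trace: the per-sample detection rate is computed
    against the nominal trace length (15 samples per fault, 30 per
    injection attack), and a trace counts as detected if any of its
    samples was flagged anomalous."""
    sample_rate = sum(flags) / nominal_len
    return sample_rate, any(flags)

# e.g., 27 of a nominal 30 attack samples flagged: 90% sample-level
# detection, and the trace itself counts as detected
rate, detected = trace_detection([True] * 27 + [False] * 3, 30)
```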
In this study, we generate a total of 31,250 time aligned samples
and assign the first 5000-6000 samples to training and the rest to
validation. Events of interest, like short circuit faults and data
injection attacks, are simulated at various points in the sample
stream. As will be further discussed in Section 5, the different
scenarios of machine learning are studied considering the inclusion
or exclusion of certain events in the training data. We claim that
this assignment is without significant loss of generality because the
introduced noise is stationary and we do not consider
autocorrelation between samples. Indeed, this lets us make
assertions about the ability of the system to generalize when events
of a particular type occur at previously unseen points on the load
curve.
In the training set, samples 1-5000 contain faults at relays 91 and
92, samples 5001-5500 contain a fault at relay 93, and samples
5501-6000 contain an injection attack on relay 90. We observe that these are
in the early part of the load curve of Figure 2, dipping to the first
local minimum, so that, if successful, our approach gives
confidence of generalizability beyond specific load characteristics.
We now have the option to choose our training set to be the first
5000, 5500, or 6000 traces/samples to give us a way to observe how
well the machine learning algorithm can identify different event
classes, depending on whether they are included in the training set
or not. In particular, we make the claim that, unlike many other
anomaly detection systems in cyber security, we do not require
clean (that is, attack-free) data in training. This claim can be
evaluated by training on samples 1-6000, which contain an attack.
The remainder of the data generated is the validation set, which
begins with steady-state circuit operation so we can verify that the
varying load curve does not cause anomaly detection. Following
this, the six true faults and eight data injection attacks occur
randomly in the remainder of the samples. Each event happens once
within the validation set.
We would like to observe how well the machine learning algorithm
can generalize to detect true faults and data injection attacks
regardless of where they happen on the load curve. For example,
we wish to confirm that a fault (at the same location) that occurs at
the peak of the load profile matches the pattern for the fault that
occurs at the trough of the load profile.
Next, we would like to observe how well the machine learning
algorithm can generalize to detect faults at the same location, but
with the fault current being a different magnitude from that seen in
training. In this case, we compare how well faults in the training
set, with fault current being twice the level of full-load current,
match with faults in the validation set having higher fault current,
e.g. five times the full-load current.
The algorithm has configurable parameters for learning rate,
goodness of pattern match, criteria for generating new learned
classes, and for blending similar learned pattern classes. The
algorithm includes outer and inner learning loops. On each pass
through the inner loop, a sample pattern (a measurement trace, after
feature normalization) is presented to the classifier. The outer loop
adjusts learning rates and match criteria, and prunes the learned
SOM by merging similar classes.
The various features are in different units (volts and amperes) or
are of varying magnitude (for example, voltages at Bus1 and Bus2
differ by more than a factor of 5 due to the transformer). For this
reason, we normalize each feature (column in the matrix) by
subtracting the mean and dividing by the standard deviation of the
feature. We then subtract the row mean from each row (time sample
in the matrix), which centers the sample about zero, removing some
of the effect of the location of the sample on the load curve. Finally,
we process the individual values through a squashing function,
given by
Squash(x) = 1 / (1 + e^(−ax))
where a is a scaling factor, set to 1.0 for our experiments.
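The full normalization pipeline can be sketched as follows; we assume the squashing function is the logistic form 1/(1 + e^(−ax)), and the NumPy formulation and function name are ours:

```python
import numpy as np

def normalize_features(M, a=1.0):
    """Normalize a (samples x features) measurement matrix:
    1) z-score each feature column (volts and amperes differ in scale),
    2) subtract each row's mean to reduce load-curve position effects,
    3) squash each value with a logistic function of scale a."""
    Z = (M - M.mean(axis=0)) / M.std(axis=0)
    Z = Z - Z.mean(axis=1, keepdims=True)
    return 1.0 / (1.0 + np.exp(-a * Z))
```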
Using terminology from Kohonen, the set of currently learned
patterns is referred to as the SOM (Self-organizing map), although
our technique for learning is different from that source. Kohonen
begins with multiple random patterns, whereas we begin with one.
New patterns are learned if a sample does not match a pattern in the
SOM. In this respect, our approach is more like that in ART. The
learned SOM may be considered the knowledge base of our system.
4.1 INNER LOOP
The inner loop operates by presentation of training patterns to the
classifier. Depending on how well sample patterns match learned
patterns, the learned patterns are reinforced or new learned pattern
classes are defined. In the inner loop, samples are presented at
random, with the total number of presentations equal to some
multiple of the number of training patterns. In this way, the classifier “sees”
each training pattern multiple times. This is a batch learning
operation, but with minor modifications could be adapted to
continuous or streaming-mode learning. Such a system would
likely run the inner loop for every sample in real time, and
periodically invoke instances of the outer loop in batch mode (the
outer loop is described below). In this case, it may be possible to
implement a system based on these concepts as a real-time CPS
security module.
In the following, X is a pattern currently learned in the SOM, and
Y is the training sample pattern presented to the classifier. The
pattern match we use is based on the normalized dot product or
vector cosine, given by
Match(X, Y) = XᵀY / (‖X‖ ‖Y‖)
This will be unity if the arguments are collinear. The winning
pattern is the currently learned pattern with the highest match score,
provided that score exceeds the goodness of pattern match
threshold. The winning pattern is adjusted slightly in the direction
of the presented pattern, according to the learning rate. If no
currently learned pattern matches the presented pattern to the
required goodness of match, the presented pattern becomes a new
exemplar for a learned pattern class. This is conceptually similar to
ART, where the match is the degree to which the new pattern
“resonates” with an already learned exemplar class, and a new
exemplar class is defined if no existing pattern class resonates
adequately. The learning operation modifies a learned pattern as
follows:
X ← (1 − W)X + WY
where W is the learning rate, initially set to 0.2 and decreased according to a
schedule that reduces it by half after each completion of all
iterations of the inner loop. As this parameter decreases, the
influence of new patterns on the learned pattern classes is lessened,
to reflect the expectation that further into the learning, the learned
classes should converge to the actual modes induced by the
physical laws.
Figure 3 shows pseudocode for the inner loop.
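One inner-loop presentation can be sketched as follows, using the cosine match and the update rule above (our illustrative code; the list-based SOM storage and function names are assumptions, not the authors' implementation):

```python
import numpy as np

def cosine_match(x, y):
    """Normalized dot product; equals 1.0 when the patterns are collinear."""
    return float(np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y)))

def present(som, counts, y, w=0.2, t_win=0.85):
    """Present one sample pattern y: reinforce the best-matching learned
    pattern if its score exceeds the goodness-of-match threshold t_win,
    otherwise start a new learned pattern class from the sample."""
    scores = [cosine_match(x, y) for x in som]
    j = int(np.argmax(scores))
    if scores[j] > t_win:
        counts[j] += 1
        som[j] = (1.0 - w) * som[j] + w * y  # X <- (1 - W)X + W*Y
    else:
        som.append(y.copy())
        counts.append(1)

som, counts = [np.array([1.0, 0.0])], [1]
present(som, counts, np.array([0.0, 1.0]))    # no match: new class learned
present(som, counts, np.array([0.99, 0.02]))  # close match: class reinforced
```

The outer loop then halves w after each full pass and raises t_win toward its cap, so late presentations perturb the learned classes less.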
4.2 OUTER LOOP
An outer loop iteration consists of all required iterations of the inner
loop and then adjustments to various parameters.
We track the number of patterns that each learned pattern in the
SOM “wins”. If at the end of inner loop execution a pattern class in
the SOM wins too few data patterns, as determined by a pruning
threshold, it is removed (pruned) from the SOM.
At the end of each outer loop operation, we check the SOM for
similar patterns (effectively, we run the SOM through the SOM).
Patterns that match other patterns according to the goodness of
match criterion are optionally blended according to a weighted
average based on the number of patterns each has won. Figure 4
provides pseudocode for the pattern blending logic. Pattern
blending has the effect of further pruning the SOM. The results in
this paper reflect pruning and pattern blending for all but the last
iteration of the outer loop.
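The blending step, applied across the whole SOM, can be sketched as follows (our code; the pairwise sweep order is an assumption):

```python
import numpy as np

def blend_som(som, counts, t_blend=0.85):
    """Merge pairs of learned patterns that match each other to within
    t_blend, replacing them with a weighted average based on the number
    of patterns each has won, and pruning the absorbed pattern."""
    i = 0
    while i < len(som):
        j = i + 1
        while j < len(som):
            m = np.dot(som[i], som[j]) / (np.linalg.norm(som[i]) * np.linalg.norm(som[j]))
            if m > t_blend:
                som[i] = (counts[i] * som[i] + counts[j] * som[j]) / (counts[i] + counts[j])
                counts[i] += counts[j]
                del som[j], counts[j]  # PRUNE(Xj)
            else:
                j += 1
        i += 1
    return som, counts
```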
At the end of each outer loop iteration and after optional pattern
blending, the goodness of match criterion, initially 0.85, is adjusted
so that it is halfway between its present value and a configurable
capped value. For the results presented here, we used a cap value
of 0.95. We have experimented with values as high as unity, which
is a more stringent match requirement. This has the effect of
requiring a higher quality match of the SOM to the set of training
patterns as training proceeds. Using a more stringent match
criterion results in more normal patterns in the learned SOM, but
the results with respect to anomalous patterns do not change.
The code has an option to perform a simulated annealing operation
at outer loop iterations, wherein the learned SOM is randomized
slightly according to an annealing schedule wherein the
randomization is reduced later in the schedule. This made no
difference in the detection performance of anomalous cases, and
has been disabled for the results presented here. Pseudocode for the
outer loop is provided in Figure 5.
5. RESULTS
Our study used 31,250 time-aligned measurement patterns, with the
first n patterns assigned to the training set, and the remaining to the
test/validation set. As described above, the set includes
observations corresponding to true (non-malicious) ground faults at
the locations 91, 92, and 93 in Figure 1, and observations
corresponding to injection attacks at all four relay locations. By
changing the number of training samples n, we can generate
Let N_SOM = number of learned patterns in the SOM
Let N_j = number of patterns won by SOM pattern class j
For each input pattern Y
    j = ArgMax_i(Match(Y, X_i))
    For some threshold T_win
        If Match(Y, X_j) > T_win
            N_j = N_j + 1
            X_j = (1 − W)X_j + W·Y
        else
            N_SOM = N_SOM + 1
            X_(N_SOM) = Y
            N_(N_SOM) = 1
        endif
Figure 3. Pseudocode for inner loop.
X_i, X_j: two learned patterns in the SOM
N_i, N_j: number of patterns each of these has won
For some threshold T_blend
    If Match(X_i, X_j) > T_blend
        X_i = (N_i·X_i + N_j·X_j) / (N_i + N_j)
        PRUNE(X_j)
    endif
Figure 4. Pseudocode for pattern blending.
Execute inner loop
Adjust learning weight W
Adjust thresholds Twin and Tblend
Prune SOM
Blend Patterns
Figure 5. Pseudocode for outer loop.
training sets that include only two faults, all three faults, and zero
or one injection attacks.
The method is considered successful if the following conditions are
met (n is the number of samples in the training set for each
condition described):
For n=5000, the fault at position 93 is not in the training
set. The other faults are learned as patterns, and faults at
the same location in the validation set should not
generate an anomaly, even though they are of different
magnitudes and at different points on the load curve. All
instances of the fault at position 93 will appear
anomalous, as will all the injection attacks.
For n=5500, all faults are in the training set. These
should be learned as distinct patterns, and should not
trigger anomalies when encountered in the validation
set. All injection attacks should appear anomalous. This
and the n=5000 are considered “clean” in that they are
attack-free.
For n=6000, the injection attack at relay 90 is included
in the training set. This should be learned as a pattern
class containing only the injection attack. In the
validation phase, future instances of this attack should
match the learned attack pattern and not trigger an
anomaly, but the other injection attacks should trigger
anomalies. This demonstrates that the system can learn
an attack pattern without diluting its ability to classify
normal patterns or detect future attack patterns.
The event timeline is schematically presented in Figure 6. In the
detailed description of each experiment that follows, we establish
that these success criteria are met. The 120 seconds represent five
compressed days (cycles of the load curve), as discussed
previously.
The anomaly score for a sample is the match score for the class that
the sample most closely fits. For these runs, we use an anomaly
threshold of 0.90 (anything below is considered anomalous).
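In code, this scoring rule can be sketched as follows (illustrative; the function names are ours):

```python
import numpy as np

def anomaly_score(sample, som):
    """Anomaly score = the best match score over all learned classes."""
    return max(
        float(np.dot(x, sample) / (np.linalg.norm(x) * np.linalg.norm(sample)))
        for x in som
    )

def is_anomalous(sample, som, threshold=0.90):
    """A sample is anomalous when no learned class matches it well enough."""
    return anomaly_score(sample, som) < threshold
```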
Figure 6. Simulation Timeline with injected events
For all runs, we define a false alarm as a sample that is incorrectly
declared anomalous, and a missed detection (or false negative) as a
sample that should have been declared anomalous but was not. All
samples corresponding to steady-state operation should appear
normal, and they do for all runs. Depending on the training set
selection, the fault at relay 93 and the attack at relay 90 may or may
not be anomalous, as explained below.
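The bookkeeping behind these definitions can be sketched as follows; the data layout (boolean flags per sample, grouped by trace) is a hypothetical representation for illustration, not the paper's implementation.

```python
# Hedged sketch of the per-sample vs. per-trace metrics defined above.

def sample_detection_rate(attack_flags):
    """attack_flags: one boolean per attack sample (True = declared anomalous)."""
    return sum(attack_flags) / len(attack_flags)

def trace_detection_rate(traces):
    """traces: list of per-trace flag lists; a trace counts as detected
    if at least one of its samples was declared anomalous."""
    detected = sum(1 for flags in traces if any(flags))
    return detected / len(traces)

def false_alarm_rate(benign_flags):
    """benign_flags: flags for samples that should appear normal;
    any True here is a false alarm."""
    return sum(benign_flags) / len(benign_flags)

# Toy example: two attack traces, the second fully missed.
traces = [[True, True, False], [False, False, False]]
all_samples = [f for t in traces for f in t]
print(sample_detection_rate(all_samples))  # 2 of 6 attack samples flagged
print(trace_detection_rate(traces))        # 1 of 2 traces detected
```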
5.1 First Training Set: n=5000
The classifier learns three pattern classes: one for steady-state operation (4971 matches) and one for each of the non-malicious fault states at relays 91 and 92 (15 and 14 matches, respectively). It classifies all patterns from these classes correctly in both the training and validation sets. All patterns corresponding to non-malicious faults fall into the pattern class for their fault location, and no other samples are classified as belonging to either of these fault classes.
As the fault at relay 93 event is not in the training set, we expect it
to be detected as an anomaly in the instance starting near sample
5400 as well as all subsequent faults at this relay location in the
validation set, although they vary in magnitude and location on the
load curve. This is in fact the result obtained.
The attack at relay 90 is not in the training set, and the
corresponding event traces at samples 5920, 21807, and 28838 are
flagged as anomalies. All other attack traces are considered
anomalous, as expected.
Two samples for the F92 event at 17135 are declared anomalous.
As the training set included the fault at this location, we consider
these false alarms. Even for these cases, the false alarm samples
comprise a small part of the trace. Typically, the anomalous
samples in a non-malicious fault trace occur toward the end of the
trace, in the unusual patterns observed as the system transitions to
normal operation. This represents a false alarm rate of less than
0.01%.
The results for the validation set on this run are 92.06% detection
on attack samples, but 100% attack detection when considering
traces. None of the false alarms occurred in samples corresponding
to normal system operation, even as the load curve varies.
5.2 Second Training Set: n=5500
The classifier learns five patterns. All normal samples are learned as a single pattern class, as are faults F91 and F92 (one class each). Fault F93 is learned as two patterns, with the bulk of the trace (14 samples) as one pattern, and the samples at the start and end as another pattern with a count of two.
Two samples of the F92 fault near sample 15829 are considered
anomalous and counted as false alarms. All other fault and normal
samples were correctly classified, for a false alarm rate under
0.01%.
It appears that the pattern learned for the F93 fault in the training set degrades the detection performance of attack A93 (false data injection at position 93) at samples 24411 and 26233.
pattern for F93 is similar to at least some samples in these attack
traces. In the first of the A93 attacks in the validation set, which is
at 5x nominal magnitude, 21 samples of the approximately 30-
sample event trace were considered anomalous. For the second A93
event (near sample 26233, 3x magnitude), the detector misses the
entire trace. We note that even though the anomaly scores for the
first event were below threshold, they were on the high side,
matching the F93 pattern at scores of 0.85 or higher.
All other attack traces were detected, with results very similar to
those for the n=5000 experiment.
The results for this experiment were a false alarm rate under 0.01%
and a detection rate of 78.15% of samples, or 88.89% of traces.
5.3 Third Training Set: n=6000
The training set includes all fault exemplars above, as well as an
attack at relay 90 (event A90 at sample 5920). The expectation is
that all the fault exemplars would be learned, as would the A90
event. Subsequent instances of faults and also of the A90 event
should not appear anomalous.
The classifier learns seven patterns. As with the earlier results, the
normal samples are learned as a class, and the samples for faults
F91, and F92 are learned as distinct pattern classes. As before, F93
is learned as two pattern classes, with 14 samples in the bulk of the
trace as one pattern, and samples at the start and end of the trace
learned as a pattern with only these two samples. The attack trace
A90 is learned as two pattern classes, the larger with 26 samples,
and the smaller with 3 samples, once again at the start and end of
the trace.
We observe a single false alarm sample for the F92 event near
sample 17135.
As in the n=5500 experiment, samples for various instances of
attack A93 match the learned pattern for fault F93, degrading the
attack detection performance. As in the prior experiment, 21
samples of the A93 event near sample 24411 are considered
anomalous, but with relatively high scores, and the A93 trace near
sample 26233 is missed entirely.
For this run, the false alarm rate is under 0.01% and the detection
rate is 71.11% of samples or 83.33% of traces, with all missed
detections occurring at position 93.
5.4 Results Summary
We conjecture that, due to some quirk in the topology, learning the
pattern for the non-malicious fault at relay 93 inhibits the ability to
detect an injection attack at the same location. We are investigating
why this is the case. This finding is nonetheless a useful result, as a
power system designer can use our approach to identify by
simulation the points in the system where an injection attack is
more difficult to detect, and either modify topology slightly or
deploy extra defense or measurement redundancy at these points.
These results indicate that inclusion of an attack trace in the training
phase does not contaminate the results or impact detection
performance. In an actual implementation, one would label patterns
corresponding to observed or modeled attacks. This can be
achieved either by expert analysis of the algorithm result for an
event of interest that one wishes to incorporate into the system’s
knowledge base as a labeled exemplar, or by injecting such a trace
into the training set.
The following table summarizes these results. Entries marked with an asterisk (*) indicate either false alarms or failure to detect an entire attack trace (a missed detection, or false negative).

Table 1. Summary of Results: number of anomalous samples in each event trace

Event        | Starting Sample (Approx) | Training 5000 | Training 5500 | Training 6000
F92 (2x)     |  2006  |   0     |   0     |   0
F91 (2x)     |  3833  |   0     |   0     |   0
F93 (2x)     |  5400  |  15     |   0     |   0
A90 (5x)     |  5920  |  28     |  28     |   0
F91 (2x)     | 13126  |   0     |   0     |   0
F93 (2x)     | 14523  |  14     |   0     |   0
F92 (2x)     | 15829  |   0     |   2*    |   0
F92 (5x)     | 17135  |   2*    |   0     |   1*
F91 (7x)     | 18442  |   0     |   0     |   0
F93 (10x)    | 19748  |  15     |   0     |   0
A92 (5x)     | 20504  |  24     |  24     |  24
A90 (5x)     | 21807  |  25     |  25     |   0
A91 (5x)     | 23106  |  27     |  27     |  27
A93 (5x)     | 24411  |  29     |  21     |  21
A91 (10x)    | 24930  |  28     |  28     |  28
A93 (3x)     | 26233  |  27     |   0*    |   0*
A92 (7x)     | 27535  |  28     |  28     |  28
A90 (10x)    | 28838  |  30     |  30     |   0
FA rate      |        |  0.01%  |  0.01%  |  0.00%
Detection (samples) |  | 92.06%  | 78.15%  | 71.11%
Detection (traces)  |  | 100.00% | 88.89%  | 83.33%
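The trace-level detection rates in Table 1 can be reproduced from the per-trace anomalous-sample counts. This sketch assumes, per the text, that a trace counts as detected if any of its samples is flagged, and that the learned A90 traces are excluded from the n=6000 denominator because they are expected to match the learned attack class.

```python
# Hedged sketch reproducing the trace-level detection rates in Table 1 from
# the per-trace anomalous-sample counts (attack traces only).

# (event, count for n=5000, n=5500, n=6000); None marks the A90 traces that
# are matched by the class learned from the n=6000 training set and hence
# are not counted as attacks to detect in that run.
attacks = [
    ("A90@5920",  28, 28, None),
    ("A92@20504", 24, 24, 24),
    ("A90@21807", 25, 25, None),
    ("A91@23106", 27, 27, 27),
    ("A93@24411", 29, 21, 21),
    ("A91@24930", 28, 28, 28),
    ("A93@26233", 27, 0, 0),
    ("A92@27535", 28, 28, 28),
    ("A90@28838", 30, 30, None),
]

def trace_rate(column):
    """A trace is detected if at least one of its samples was flagged."""
    counts = [row[column] for row in attacks if row[column] is not None]
    return sum(1 for c in counts if c > 0) / len(counts)

for col, n in ((1, 5000), (2, 5500), (3, 6000)):
    print(f"n={n}: {100 * trace_rate(col):.2f}% of attack traces detected")
```

Running this yields 100.00%, 88.89%, and 83.33%, matching the Detection (traces) row.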
6. RELATED WORK
The work presented here proposes anomaly detection based on
machine learning to detect data injection attacks in electrical
systems, and distinguish these from normal and non-malicious fault
states in such systems. Data injection attacks into electrical systems
have been described in [6, 8, 9, 10, 11]. The consequences of such
attacks can range from incorrect system response, which leads to
suboptimal operation, all the way to potentially destabilizing
control actions.
Liu, Ning, and Reiter [9] derived algebraic conditions for injection attacks that are able to evade detection but result in incorrect system state estimation. State estimation uses an iterative approach to estimate state from observable measurements. Measurements are related to state via a Jacobian matrix. The rank of the Jacobian in typical transmission systems is such that an injected error vector lying in the column space of the matrix will lead to an incorrect state estimate, yet will not be flagged by the commonly used bad data detection algorithms.
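A minimal numerical sketch of this stealth condition under the standard DC state estimation model z = Hx + e with least-squares estimation (the matrix and values here are illustrative, not from the paper): an injected error a = Hc shifts the estimate by c while leaving the bad-data residual unchanged.

```python
# Hedged sketch: an injected error in the column space of the measurement
# Jacobian H shifts the least-squares state estimate but leaves the
# residual, which bad-data detection inspects, exactly unchanged.
# Matrix sizes and values are illustrative only.

H = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # 3 measurements, 2 states
x_true = [1.0, 2.0]
noise = [0.05, -0.03, 0.1]                 # measurement noise, not in col(H)

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def estimate(z):
    # Least squares via normal equations for this fixed H:
    # H^T H = [[2, 1], [1, 2]], inverse = (1/3) * [[2, -1], [-1, 2]].
    Htz = [z[0] + z[2], z[1] + z[2]]
    return [(2 * Htz[0] - Htz[1]) / 3.0, (-Htz[0] + 2 * Htz[1]) / 3.0]

def residual(z):
    return [zi - hxi for zi, hxi in zip(z, matvec(H, estimate(z)))]

z = [hx + e for hx, e in zip(matvec(H, x_true), noise)]
c = [0.5, -0.2]                            # attacker's desired state shift
z_attacked = [zi + ai for zi, ai in zip(z, matvec(H, c))]  # a = Hc

# The residual is identical under attack...
print(max(abs(r1 - r2) for r1, r2 in zip(residual(z), residual(z_attacked))))
# ...while the state estimate has been silently shifted by c.
print([xa - xb for xa, xb in zip(estimate(z_attacked), estimate(z))])
```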
Bobba et al. [10] extended this result by considering detection and
countermeasures consisting of optimally placing a limited number
of costly but higher-fidelity, harder-to-compromise measurement
units (modern Phasor Measurement Units, or PMUs) so as to
achieve a degree of redundancy that greatly increases the attacker’s
burden. Teixeira and his collaborators [11] considered a radial distribution system in which Conservation Voltage Reduction (CVR) is applied. CVR is an energy conservation measure whereby voltage is reduced slightly at the head of the distribution feeder and controlled by distribution transformers along the feeder to maintain end-of-line voltage above the lower nominal limit. In this study, the authors demonstrated that bad data injection would lead to suboptimal CVR, but would not be likely to destabilize the system.
CPS offer the opportunity to leverage physical constraints to
advance cyber-physical security. The consistency of measurements
constrained by underlying physical laws can be leveraged to
complement conventional cyber defenses and implement a system
that requires an adversary to compromise network traffic as well as
measurements at multiple points, significantly raising adversary
work factor. This has been demonstrated by the use of KCL/KVL
constraints to distinguish fault and data injection conditions in
distribution and transmission substations [6, 8]. The present work
differs from these in that the KCL/KVL conditions are not
explicitly encoded in our algorithm, but we hypothesize that our
system learns the possible system states constrained by these
conditions.
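A toy sketch of such a physical-consistency check (the function names and tolerance are illustrative; the actual schemes in [6, 8] use distributed agreement over many measurement points):

```python
# Hedged sketch of a KCL consistency check: the signed currents entering a
# bus must sum to approximately zero, so falsifying a single measurement
# produces a physically inconsistent residual. Tolerance is illustrative.

def kcl_residual(branch_currents):
    """Absolute sum of signed currents into one bus; ~0 by Kirchhoff's
    current law when all measurements are consistent."""
    return abs(sum(branch_currents))

def consistent(branch_currents, tol=1e-3):
    return kcl_residual(branch_currents) <= tol

measured = [10.0, -6.0, -4.0]    # consistent set: currents sum to zero
falsified = [10.0, -6.0, -1.5]   # one injected false value
print(consistent(measured))      # physically plausible
print(consistent(falsified))     # flagged: violates KCL
```

To evade such a check, an adversary would have to falsify several measurements jointly so that the injected values still satisfy the constraint, which is the increased work factor argued for in the text.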
Anomaly detection in enterprise systems has arguably fallen short
of hopes that it would provide a “magic bullet” against such threats
as zero-day attacks. This is because enterprise systems are highly
variable with respect to many features of interest. Generally,
anomaly detection in enterprise systems has not achieved adequate
detection sensitivity at an acceptable false positive rate. CPS, by
contrast, are characterized by regularities in communications and
measurements. In [12], for example, it was demonstrated that
communication patterns in a SCADA system are sufficiently
regular that they can be learned, and a detector based on anomaly
detection would be feasible. The authors of [13] developed an
anomaly detection system that, like ours, looks for anomalies in the
measurements. Their technique is to fit a quadratic regression to a
window of measurement points, and then perform a principal
component analysis (PCA) on the regression coefficient. The
anomaly detection is based on clustering of the first few principal
components. Our approach considers measurement samples
directly.
7. SUMMARY
Success of our approach with respect to normal and anomalous patterns requires satisfying the following criteria:
- The classifier should be able to identify distinct states corresponding to normal operation, non-malicious fault (consistent with KCL/KVL), and false measurement injection.
- A pattern corresponding to an injection in the training phase should cause the system to learn a class that does not match any normal or non-malicious fault pattern (in other words, the class consists of the anomalous pattern only). Note that, in particular, we do not require a "clean" training set.
- A pattern in the validation phase corresponding to an injection should not match any normal or non-malicious fault pattern class well. In particular, a threshold based on the match score defined above should easily identify this pattern as an anomaly, even on the first presentation of the anomalous pattern to the classifier.
The results obtained indicate that the approach described above
largely meets these criteria, subject to the preceding discussion on
false alarms and missed detections. This gives us confidence that,
at least in this domain, the constraints induced by underlying
physics of a CPS lead to states amenable to anomaly detection
based on machine learning. In order for an attacker to inject false
measurements and evade detection, he or she would need to
manipulate not merely messages in the cyber plane, but
measurements at multiple points in the physical plane of the CPS.
Requiring the adversary to simultaneously falsify a number of
measurements at multiple points in the CPS in a manner consistent
with the underlying physics represents a significant increase in
adversary capability to perpetrate a successful, undetected injection
attack.
Our system may be implemented and deployed in a variety of ways.
The first deployment concept is off-line and simulation based, as
we have done here. For CPS in which there is a high-fidelity
simulation framework, our system may be trained entirely off-line.
This is the case in the electric power sector, where technologies
such as RTDS are widely used for power system design. In this
case, part of the design process can include our algorithm to support
design for security, training a system that can detect false data from
measurements consistent with the physical laws. As the results presented here illustrate, a security sensitivity analysis may identify measurement points where, due to topology or insufficient redundancy, false data may be difficult to distinguish from actual measurements. In this situation, analysis based on our system identifies points for cost-effective deployment of redundant measurements or enhanced defenses. Off-line, simulation-based
training also enables the system designer to label patterns
corresponding to event classes of interest, so that these can be
classified in operational use as something more specific than
“anomaly”. This allows an implementation in which the learning is
semi-supervised: ground truth for some of the event classes is
known, and the corresponding traces can be labeled before training.
We intend to explore this variant in future work.
In on-line deployment, our algorithms self-organize and learn
patterns of events as these arise in operation. In these cases, expert
operators would need to look at the traces contributing to various
learned patterns, and possibly label these as normal, non-malicious,
or attack.
Finally, it is possible to train a system off-line based on a high-
fidelity simulation and some event labels, then transition this
system to online operation, with parameters permitting new
patterns to be learned and labeled by experts.
8. ACKNOWLEDGMENTS
The work is sponsored by the Department of Energy under grant
DE-OE0000674, under subcontract to ABB Research. The views
expressed are solely those of the authors.
The authors would like to acknowledge the support of the ABB US
Corporate Research Center in Raleigh, NC, the ECE Department
and the Information Trust Institute of the University of Illinois at
Urbana-Champaign, Ameren Illinois, and the US DOE CEDS
program.
9. REFERENCES
[1] Acromag, Inc. (2005). Introduction to MODBUS TCP/IP.
https://www.acromag.com/sites/default/files/Acromag_Intro_
ModbusTCP_765A.pdf
[2] Grossberg, S. (ed.). 1988. Neural Networks and Natural
Intelligence, MIT Press.
[3] IEC 61850 Communication networks and systems in
substations, all parts, Reference number IEC 61850-SER.
http://www.iec.ch/smartgrid/standards/
[4] Kohonen, T. 2001. Self-Organizing Maps, 3rd edition,
Springer.
[5] Stolfo, S., Hershkop, S., Bui, L., Ferster, R., and Wang, K. 2005. "Anomaly Detection for Computer Security and an Application to File System Accesses." In M.-S. Hacid et al. (eds.), ISMIS 2005.
[6] Macwan, R., Drew, C., Panumpabi, P., Valdes, A., Vaidya,
N., Sauer, P., and Ischenko, D. "Collaborative Defense Against Data Injection Attacks in IEC 61850 Based Smart Substation." To appear in IEEE Power and Energy Society
(IEEE-PES), July 17-21, 2016.
[7] RTDS Technologies. 2015. https://www.rtds.com/
[8] Valdes, A., Cui Hang, Panumpabi, P., Vaidya, N., Drew, C.,
Ischenko, D. 2015. Design and simulation of fast substation
protection in IEC 61850 environments. In Modeling and
Simulation of Cyber-Physical Energy Systems (MSCPES),
Proceedings of the 2015 Workshop on Cyber-Physical
Systems (April 13, 2015), 1-6.
[9] Y. Liu, P. Ning, and M. Reiter, “False data injection attacks
against state estimation in electric power grids,” Proc. 16th
ACM Conf. on Computer and Communications Security
(CCS '09), Chicago, IL, 2009, pp. 21-32.
[10] R. Bobba, K. Rogers, Q. Wang, H. Khurana, K. Nahrstedt,
and T. Overbye, “Detecting false data injection attacks on
DC state estimation,” Proc. 1st Workshop on Secure Control
Systems (SCS), Stockholm, Sweden, 2010. [Online].
Available:
https://www.truststc.org/conferences/10/CPSWeek/program.
htm
[11] A. Teixeira, G. Dán, H. Sandberg, R. Berthier, R. B. Bobba,
and A. Valdes, “Security of smart distribution grids: Data
integrity attacks on integrated volt/VAR control and
countermeasures,” Proc. American Control Conference
(ACC), Portland, OR, 2014, pp. 4372-4378.
[12] Cheung, S. and Valdes, A. “Communication Pattern
Anomaly Detection in Process Control System Security”,
IEEE International Conference on Technologies for
Homeland Security, Waltham, MA , May 11-12, 2009
[13] Amidan, B.G., Follum, J.D., Freeman, K.A., and Dagle, J.E. 2015. "Baselining PMU Data to Find Patterns and Anomalies." In CIGRE US National Committee: 2015 Grid of the Future Symposium, Chicago, IL.
[14] Communication networks and systems for power utility automation - Part 9-2: Specific communication service mapping (SCSM) - Sampled values over ISO/IEC 8802-3, IEC International Standard 61850-9-2, Ed. 2.0, Nov. 2011.
... In the context of distribution systems, SVM models can be used to categorize data based on whether it reflects a normal or abnormal operating condition, enabling operators to engage in actions that will mitigate interruptions [21]. Finally, clustering and anomaly detection techniques within unsupervised learning have been utilized to enhance the resilience of distribution systems [22]. These algorithms can unearth patterns and aberrations in data that conventional methods might overlook, empowering operators to take preemptive steps in mitigating or eliminating any disruptions. ...
Chapter
Lately, distribution systems have grown increasingly intricate and vulnerable to a wide range of disruptions, such as natural disasters, cyberattacks, and equipment failures. As a result, there is a growing need for methods that can improve the resilience of these systems and minimize their downtime. In the realm of cutting-edge technology, novel methods of artificial intelligence are coming to the forefront, with machine learning (ML) leading the pack and becoming increasingly applied in many sectors including the energy sector. These automated techniques can help enhance the resilience of distribution systems by providing real-time data analysis, predictive modeling, and automated decision-making capabilities. Accordingly, this chapter delves into the role played by different ML techniques in Revitalizing the tenacity of distribution networks. Specifically, it provides a comprehensive review of the existing studies on the application of ML in distribution systems’ resilience and provides several case studies to illustrate the practical applications of these robust methods aimed at minimizing the frequency of disruptions from both natural and man-made disasters. Additionally, this chapter details the challenges of deploying ML techniques for distribution systems resilience along with highlighting the future directions of research in this area that will address the challenges to fully leverage the potential of AI-powered approaches for improving distribution system resilience. This chapter will act as an insightful resource for different key stakeholders, researchers, and students with a vested interest in this area.
... The algorithm combines fuzzy math with an SVM to separate noise and outliers from valid samples. In practical applications, researchers have made some [30,31], but many problems remain. For example, if there is a considerable amount of abnormal data or abnormal data with a certain distribution, FSVM loses information when separating the abnormal data. ...
... In this article, we present an alternative paradigm, namely, anomaly detection [19][20][21][22], which is particularly suitable for detecting such special configurations. The main advantage of anomaly detection with respect to conventional classification schemes is that here one does not need the a priori knowledge * jb.ghosh@outlook.com ...
Article
We present the application of classical and quantum-classical hybrid anomaly detection schemes to explore exotic configurations with anomalous features. We consider the Anderson model as a prototype, where we define two types of anomalies—a high conductance in the presence of strong impurity and a low conductance in the presence of weak impurity—as a function of random impurity distribution. Such anomalous outcome constitutes an imperceptible fraction of the data set and is not a part of the training process. These exotic configurations, which can be a source of rich new physics, usually remain elusive to conventional classification or regression methods and can be tracked only with a suitable anomaly detection scheme. We also present a systematic study of the performance of the classical and the quantum-classical hybrid anomaly detection method and show that the inclusion of a quantum circuit significantly enhances the performance of anomaly detection, which we quantify with suitable performance metrics. Our approach is quite generic in nature and can be used for any system that relies on a large number of parameters to find their new configurations, which can hold exotic new features.
... Self-Organizing Maps (SOM) is an unsupervised machine learning technique used to produce a low-dimensional representation of a higher dimensional data set while preserving the topological structure of the data. SOMs are used in [200] and [197] for detecting faults and FDI attacks using consumption data. ...
Article
Full-text available
The power grid is a constant target for attacks as they have the potential to affect a large geographical location, thus affecting hundreds of thousands of customers. With the advent of wireless sensor networks in the smart grids, the distributed network has more vulnerabilities than before, giving numerous entry points for an attacker. The power grid operation is usually not hindered by small-scale attacks; it is popularly known to be self-healing and recovers from an attack as the neighboring areas can mitigate the loss and prevent cascading failures. However, the attackers could target users, admins and other control personnel, disabling access to their systems and causing a delay in the required action to be taken. Termed as the biggest machine in the world, the US power grid has only been having an increased risk of outages due to cyber attacks. This work focuses on structuring the attack detection literature in power grids and provides a systematic review and insights into the work done in the past decade in the area of anomaly or attack detection in the domain.
... In this paper, we present a new paradigm, namely anomaly detection [13][14][15] which is particularly suitable for detecting such special configurations. The main advantage of anomaly detection with respect to the conventional classification scheme is that here one doesn't need the a priory knowledge of the data points that are uncharacteristic for a specific data set or the anomaly. ...
Preprint
Full-text available
In this paper we present the application of classical and quantum-classical hybrid anomaly detection schemes to explore exotic configuration with anomalous features. We consider the Anderson model as a prototype where we define two types of anomalies - a high conductance in presence of strong impurity and low conductance in presence of weak impurity - as a function of random impurity distribution. Such anomalous outcome constitutes less than 10% of a data set and is not a part of the training process. The anomaly detection is therefore more suitable to detect unknown features which is not possible with conventional classification or regression methods. We also present a systematic study of the performance of the classical and the hybrid method and show that the inclusion of a quantum circuit significantly enhances the performance of anomaly detection which we quantify with suitable performance metrics. Our approach is quite generic in nature and can be used for any system that relies on a large number of parameters to find their new configurations which can hold exotic new features.
Article
Full-text available
Energy systems require radical changes due to the conflicting needs of combating climate change and meeting rising energy demands. These revolutionary decentralization, decarbonization, and digitalization techniques have ushered in a new global energy paradigm. Waves of disruption have been felt across the electricity industry as the digitalization journey in this sector has converged with advances in artificial intelligence (AI). However, there are risks involved. As AI becomes more established, new security threats have emerged. Among the most important is the cyber-physical protection of critical infrastructure, such as the power grid. This article focuses on dueling AI algorithms designed to investigate the trustworthiness of power systems’ cyber-physical security under various scenarios using the phasor measurement units (PMU) use case. Particularly in PMU operations, the focus is on areas that manage sensitive data vital to power system operators’ activities. The initial stage deals with anomaly detection applied to energy systems and PMUs, while the subsequent stage examines adversarial attacks targeting AI models. At this stage, evaluations of the Madry attack, basic iterative method (BIM), momentum iterative method (MIM), and projected gradient descend (PGD) are carried out, which are all powerful adversarial techniques that may compromise anomaly detection methods. The final stage addresses mitigation methods for AI-based cyberattacks. All these three stages represent various uses of AI and constitute the dueling AI algorithm convention that is conceptualised and demonstrated in this work. According to the findings of this study, it is essential to investigate the trade-off between the accuracy of AI-based anomaly detection models and their digital immutability against potential cyberphysical attacks in terms of trustworthiness for the critical infrastructure under consideration.
Chapter
In Smart grid (SG), cyber-physical attacks (CPA) are the most critical hurdles to the use and development. False data injection attack (FDIA) is a main group among these threats, with a broad range of methods and consequences that have been widely documented in recent years. To overcome this challenge, several recognition processes have been developed in current years. These algorithms are mainly classified into model-based algorithms or data-driven algorithms. By categorizing these algorithms and discussing the advantages and disadvantages of each group, this analysis provides an intensive overview of them. The Chapter begins by introducing different types of CPA as well as the major stated incidents history. In addition, the chapter describes the use of Machine Learning (ML) techniques to distinguish false injection attacks in Smart Grids. A few remarks are made in the conclusion as to what should be considered when developing forthcoming recognition algorithms for fake data injection attacks.
Chapter
Anomaly detection is an observation of irregular, uncommon events that leads to a deviation from the expected behaviour of a larger dataset. When data is multiplied exponentially, it becomes sparse, making it difficult to spot anomalies. The fundamental aim of anomaly detection is to determine odd cases as the data may be properly evaluated and understood to make the best decision possible. A promising area of research is detecting anomalies using modern ML algorithms. Many machines learning models that are used to learn and detect anomalies in their respective applications across various domains are examined in this systematic review study.KeywordsAnomaliesAnomaly detectionMachine learning techniquesApplications
Chapter
Technological progression in communication and computing domains has led to the advent of cyber-physical systems (CPS). As an emerging technological advancement, CPS security is considered one of the prominent research directions these days. CPS is featured by its potential to integrate the cyber and physical data of the real world. CPS deployment in major infrastructure has shown the ability to reshape the world. Although, harnessing this ability is confined by their decisive nature and deep-seated consequences of cyber-attacks on surroundings or environment, infrastructure, and humans. In CPS, the substantial cyber concerns surge from the procedure of information transmission from multiple sensors to diverse actuators via the wireless medium, thus augmenting the attack region. Conventionally, CPS safety has been inspected from the standpoint of impending intruders from acquiring access to crucial systems using crypto-graphic or access control schemes. Thus, most research studies have emphasized attack detection in CPS. Although, in a sphere of growing adversaries, safe-guarding CPS from diverse adversarial attacks is becoming extremely sophisticated. Therefore, the need emerges for constructing resilient CPS which can con-front disruptions and stay functional despite adversarial attacks. Among the predominant methods investigated for constructing robust CPS, machine learning (ML) techniques have displayed greater suitability. However, from the latest studies regarding adversarial ML, it is advisable that for protecting CPS, ML techniques should themselves be robust. Therefore, this paper is intended at surveying the ML techniques employed for securing CPS and for detecting several attacks on CPS. It discusses the various design challenges, security objectives, security measures, security and reliability requirements of CPS, attack detection frameworks, and performance measures employed in prior works. 
Furthermore, it concludes with several research gaps and future directions for improving ML techniques and developing secure CPS.KeywordsCPSSecurity threatsMachine learning
Conference Paper
Full-text available
We examine the feasibility of an attack on the measurements that will be used by integrated volt-var control (VVC) in future smart power distribution systems. The analysis is performed under a variety of assumptions of adversary capability regarding knowledge of details of the VVC algorithm used, system topology, access to actual measurements, and ability to corrupt measurements. The adversary also faces an optimization problem, which is to maximize adverse impact while remaining stealthy. This is achieved by first identifying sets of measurements that can be jointly but stealthily corrupted. Then, the maximal impact of such data corruption is computed for the case where the operator is unaware of the attack and directly applies the configuration from the integrated VVC. Furthermore, since the attacker is constrained to remaining stealthy, we consider a game-theoretic framework where the operator chooses settings to maximize observability and constrain the adversary action space.
Article
Full-text available
Aging power industries together with increase in the demand from industrial and residential customers are the main incentive for policy makers to define a road map to the next generation power system called smart grid. In smart grid, the overall monitoring costs will be decreased but at the same time, the risk of cyber attacks might be increased. Recently a new type of attacks (called the stealth attack) has been introduced, which cannot be detected by the traditional bad data detection using state estimation. In this paper, we show how normal operations of power networks can be statistically distinguished from the case under stealthy attacks. We propose two machine learning based techniques for stealthy attack detection. The first method utilizes the supervised learning over labeled data and trains a distributed support vector machine. The design of the distributed SVM is based on the Alternating Direction Method of Multipliers, which offers provable optimality and convergence rate. The second method requires no training data and detects deviation in measurements. In both methods, principle component analysis is used to reduce the dimensionality of the data to be processed, which leads to lower computation complexities. The results of the proposed detection methods on the IEEE standard test systems demonstrate the effectiveness of both schemes.
Article
State estimation is an important power system application that is used to estimate the state of the power transmission network using (usually) a redundant set of sensor measurements and network topology information. Many power system applications, such as contingency analysis, rely on the output of the state estimator. Until recently it was assumed that the techniques used to detect and identify bad sensor measurements in state estimation could also thwart malicious sensor measurement modification. However, recent work by Liu et al. [1] demonstrated that an adversary, armed with knowledge of the network configuration, can inject false data into state estimation that uses DC power flow models without being detected. In this work, we explore the detection of the false data injection attacks of [1] by protecting a strategically selected set of sensor measurements and by having a way to independently verify or measure the values of a strategically selected set of state variables. Specifically, we show that it is necessary and sufficient to protect a set of basic measurements to detect such attacks.
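The "basic measurements" result above has a short linear-algebra core: a stealthy injection must take the form a = Hc, so if a protected (tamper-proof) set of measurement rows has full column rank, stealth forces c = 0 and no undetectable attack remains. The matrix and the choice of protected indices below are hypothetical, chosen only to illustrate the rank argument.

```python
import numpy as np

rng = np.random.default_rng(2)
H = rng.normal(size=(6, 3))   # 6 measurements, 3 state variables

# Hypothetical protected sensor indices forming a "basic" set: the
# corresponding rows of H have full column rank.
protected = [0, 2, 5]
H_p = H[protected]
assert np.linalg.matrix_rank(H_p) == H.shape[1]

# Any stealthy attack a = H c with nonzero c must then perturb at least
# one protected measurement, exposing the attack.
c = np.array([0.3, -0.1, 0.4])
a = H @ c
print(np.abs(a[protected]).max())  # nonzero -> detected
```

Intuitively, full column rank of the protected rows means H_p c = 0 only for c = 0, so the stealthy subspace available to the attacker collapses to the zero vector.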
Conference Paper
Digital control systems are increasingly being deployed in critical infrastructure such as electric power generation and distribution. To protect these process control systems, we present a learning-based approach for detecting anomalous network traffic patterns. These anomalous patterns may correspond to attack activities such as malware propagation or denial of service. Misuse detection, the mainstream intrusion detection approach used today, typically uses attack signatures to detect known, specific attacks, but may not be effective against new or variations of known attacks. Our approach, which does not rely on attack-specific knowledge, may provide a complementary detection capability for protecting digital control systems.
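A minimal sketch of the signature-free, learning-based idea above: because control-system traffic is highly regular, one can learn the frequency of flow patterns from clean traffic and flag rare or unseen patterns. The flow tuples, names (`hmi`, `plc1`), and threshold below are assumed for illustration, not the cited paper's actual model or feature set.

```python
from collections import Counter

def fit(flows):
    """Learn relative frequencies of (src, dst, port) flow patterns."""
    counts = Counter(flows)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}

def is_anomalous(model, flow, threshold=0.01):
    """Flag flows rarer than the threshold (unseen flows score 0)."""
    return model.get(flow, 0.0) < threshold

# Clean training traffic from a hypothetical control network.
training = [("hmi", "plc1", 502)] * 95 + [("hmi", "plc2", 502)] * 5
model = fit(training)

print(is_anomalous(model, ("hmi", "plc1", 502)))       # frequent -> False
print(is_anomalous(model, ("attacker", "plc1", 502)))  # unseen  -> True
```

An unseen source talking to a PLC (as in malware propagation or scanning) is flagged without any attack-specific signature, which is the complementary capability the abstract describes.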
Article
The IEC 61850 protocol suite provides significant benefits in electrical substation design and enables formal validation of complex device configurations to ensure that design objectives are met. One important benefit is the potential for protective relays to react in a collaborative fashion to an observed fault current. Modern relays are networked cyberphysical devices with embedded systems, capable of sophisticated protection schemes that are not possible on legacy overcurrent relays. However, they may be subject to error or cyber attack. Herein, we introduce the CODEF (Collaborative Defense) project examining distributed substation protection. Under CODEF, we derive algorithms for distributed protection schemes based on distributed agreement. By leveraging Kirchhoff's laws, we establish that certain fast agreement protocols have important equivalences to linear coding and error correction theory. In parallel, we describe a cyber-physical simulation environment in which these algorithms are being validated with respect to the strict time constraints of substation protection.
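The Kirchhoff's-law agreement idea above can be reduced to a one-line consistency check: the signed sum of the currents that networked relays report into a shared bus should be near zero, and a large residual flags a faulty or spoofed measurement set. This is a deliberately simplified sketch, not the CODEF agreement protocol itself.

```python
def kcl_residual(currents_in, currents_out):
    """Kirchhoff current law residual at a bus: |sum(in) - sum(out)|.

    Each value is one relay's reported current (amperes); a consistent
    set of honest measurements yields a residual near zero.
    """
    return abs(sum(currents_in) - sum(currents_out))

ok = kcl_residual([10.0, 5.0], [15.0])   # consistent measurements
bad = kcl_residual([10.0, 5.0], [12.0])  # 3 A unaccounted for
print(ok, bad)
```

Because the check is a linear constraint over all relays' reports, a single corrupted report shifts the residual by exactly its error, which is what gives the agreement protocol its error-correction flavor.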
Book
The Self-Organising Map (SOM) algorithm was introduced by the author in 1981. Its theory and many applications form one of the major approaches to the contemporary artificial neural networks field, and new technologies have already been based on it. The most important practical applications are in exploratory data analysis, pattern recognition, speech analysis, robotics, industrial and medical diagnostics, instrumentation and control, and literally hundreds of other tasks. In this monograph the mathematical preliminaries, background, basic ideas, and implications are expounded in a manner which is accessible without prior expert knowledge.
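The SOM described above has a compact core: each input pulls its best-matching unit, and that unit's neighbours on the map lattice, toward itself, so the map self-organizes to cover the input distribution. The following is a minimal 1-D sketch with a simple linear learning-rate decay, not Kohonen's full training schedule.

```python
import numpy as np

rng = np.random.default_rng(3)

def train_som(data, n_units=10, epochs=50, lr=0.5, sigma=2.0):
    """Minimal 1-D SOM: a chain of units whose weight vectors
    self-organize to cover the input distribution."""
    W = rng.normal(size=(n_units, data.shape[1]))
    idx = np.arange(n_units)
    for t in range(epochs):
        for x in data:
            # Best-matching unit: closest weight vector to the input.
            bmu = np.argmin(np.linalg.norm(W - x, axis=1))
            # Gaussian neighborhood on the 1-D map lattice.
            h = np.exp(-((idx - bmu) ** 2) / (2 * sigma**2))
            # Pull the BMU and its neighbors toward x; lr decays linearly.
            W += lr * (1 - t / epochs) * h[:, None] * (x - W)
    return W

data = rng.uniform(-1, 1, size=(100, 2))
W = train_som(data)
# After training, the unit weights settle inside the data's range.
print(W.min(), W.max())
```

Anomaly detection with a trained SOM then amounts to thresholding the distance from a new sample to its best-matching unit, which is how SOMs are typically used in the exploratory-analysis applications the abstract lists.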
Article
Packed with real-time computer simulations and rigorous demonstrations of these phenomena, this book includes results on vision, speech, cognitive information processing, adaptive pattern recognition, adaptive robotics, conditioning and attention, cognitive-emotional interactions, and decision making under risk. "Neural Networks and Natural Intelligence" first discusses neural network architecture for preattentive 3-D vision and then shows how this architecture provides a unified explanation, through systematic computer simulations, of many classical and recent phenomena from psychophysics, visual perception, and cortical neurophysiology. It illustrates, within the domain of preattentive boundary segmentation and featural filling-in, how computer experiments help to develop and refine computational vision models.