A Stacked Machine and Deep Learning Model for
Electricity Theft Detection to Secure Smart Grid
By
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
MS thesis
In
Electrical Computer Engineering
COMSATS University Islamabad
Fall 2022
MS Thesis Electricity Theft Detection
COMSATS University Islamabad
A Stacked Machine and Deep Learning Model for
Electricity Theft Detection to Secure Smart Grid
A Thesis Presented to
COMSATS University Islamabad
In partial fulfillment
of the requirement for the degree
Of
MS Electrical & Computer Engineering
By
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
Fall 2022
i By: Arshid Ali
A Stacked Machine and Deep Learning Model for
Electricity Theft Detection to Secure Smart Grid
A Post Graduate Thesis submitted to the Department of Electrical Computer
Engineering as partial fulfillment for the award of the degree of MS Electrical
Computer Engineering.
Name Registration Number
Arshid Ali CIIT/Registration No: SP21-REE-004/ISB
Supervisor: Dr. Laiq Khan
Professor, Department of Electrical Computer Engineering
COMSATS University Islamabad

Co-Supervisor: Dr. Nadeem Javaid
Professor, Department of Computer Science
COMSATS University Islamabad
Signature:
Arshid Ali
(CIIT/Registration No: SP21-REE-004/ISB)
Final Approval
This thesis titled
A Stacked Machine and Deep Learning Model for Electricity Theft
Detection to Secure Smart Grid
By
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
has been approved for
COMSATS University Islamabad
External Examiner
Examiner Name
Department of Computer Science
ABCD University, Islamabad
Supervisor: Dr. Laiq Khan
Professor, Department of Electrical Computer Engineering
COMSATS University Islamabad

Co-Supervisor: Dr. Nadeem Javaid
Professor, Department of Computer Science
COMSATS University Islamabad
Head of Department:
HoD Name
HoD Title,
Department of ECE
COMSATS University Islamabad
Declaration
I, Arshid Ali, registration number CIIT/Registration No: SP21-REE-004/ISB, hereby
declare that I have produced the work presented in this thesis during the scheduled
period of study. I also declare that I have not taken any material from any source
except where due reference is made, and that the amount of plagiarism is within the
acceptable range. If a violation of HEC rules on research has occurred in this thesis,
I shall be liable to punishable action under the plagiarism rules of the HEC.
Date: December 19, 2022
Arshid Ali
CIIT/Registration No:
SP21-REE-004/ISB
Certificate
It is certified that Arshid Ali (CIIT/Registration No: SP21-REE-004/ISB) has
carried out all the work related to this thesis under my supervision at the Depart-
ment of Electrical Computer Engineering, COMSATS University Islamabad, and
the work fulfills the requirement for award of MS degree.
Date: December 19, 2022
Head of Department:
HoD Name
HoD Title
Department of Electrical Computer Engineering
COMSATS University Islamabad
Supervisor:
Dr. Laiq Khan
Professor
COMSATS University Islamabad
Co-Supervisor:
Dr. Nadeem Javaid
Professor
COMSATS University Islamabad
Dedication
Dedicated to my Family, Teachers and Friends.
Acknowledgements
This thesis would not have been possible without the support of many people.
Prof. Dr. Laiq Khan and Prof. Dr. Nadeem Javaid have been ideal teachers,
mentors, and thesis supervisors, offering advice and encouragement with a perfect
blend of insight and humor. I'm proud of, and grateful for, my time working with
them. Thanks to my advisers, Prof. Dr. Laiq Khan and Prof. Dr. Nadeem
Javaid, who read my numerous revisions and helped clarify the confusion. Also
thanks to Dr. Junaid Ikram, Dr. Guftar Ahmad, and Dr. Fasih Uddin, who
offered guidance and support.
Thanks to the COMSATS University Islamabad for awarding me a Dissertation
Completion Fellowship, and providing me with the financial means to complete
this project. And finally, thanks to my colleagues, parents, and numerous friends
who endured this long process with me, always offering support and love.
Thank You.
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
Abstract
Abstract-1
Energy management and efficient asset utilization play an important role in the
economic development of a country. The electricity produced at the power station
faces two types of losses from the generation point to the end user. These losses are
technical losses (TL) and non-technical losses (NTL). The technical losses are due
to the use of inefficient equipment. Non-technical losses mainly occur due to the
illegal use of electricity by customers in the form of theft at the consumption
level. These losses in the smart grid (SG) are the main issue in maintaining
grid stability and cause revenue loss to the utility. The automatic metering infras-
tructure (AMI) system has reduced grid instability but it has opened up new ways
for NTLs in the form of different cyber-physical theft attacks (CPTA). Machine
learning (ML) techniques can be used to detect and minimize CPTA. However,
they have certain limitations and cannot capture the energy consumption pat-
tern (ECP) of all the users, which decreases the performance of ML techniques
in detecting malicious users. In this paper, we propose a novel ML-based stacked
generalization method for the cyber-physical theft issue in the smart grid. The
original data obtained from the grid is pre-processed to improve model training
and processing. This includes NaN-imputation, normalization, outliers’ capping,
SVM-SMOTE balancing, and PCA-based data reduction techniques. The pre-
processed dataset is provided to the ML models LGB, ET, XGBoost, and RF, to
accurately capture all consumers’ overall ECP. The predictions from these base
models are fed to a meta-classifier multi-layer perceptron (MLP). The MLP com-
bines the learning capability of all the base models and gives an improved final
prediction. The proposed structure is implemented and verified on the publicly
available large real-world dataset of the State Grid Corporation of China (SGCC).
The proposed model outperformed the individual base classifiers and the existing
research in terms of CPTA detection with FPR, FNR, F1-Score, and accuracy
values of 0.72%, 2.05%, 97.6%, and 97.69%, respectively.
Abstract-2
Electricity plays an important role in our daily life and the demand is increas-
ing day by day. Therefore, while meeting the required energy demand, efficient
energy resource utilization should also be considered due to the limited resources
available. The electricity generated at the power station incurs significant losses
in reaching the end consumers. These losses fall into technical and non-technical
(NT) categories, of which NT losses amount to billions of dollars. Many techniques
have been introduced by utility companies to address this issue. Recently, many
machine learning-based classifiers have been utilized to deal with NTL. However,
little study has been conducted on the evaluation criteria used in NTL identification
to assess how successful or inaccurate an algorithm is at properly forecasting
non-technical loss. Similarly, the presence of unbalanced classes in this sort of
data opens a gap for research on unbalanced data management solutions, which
are mostly unexplored in the literature. In order to choose which
classifier and balancing method produce the best classification results for the theft
detection problem, the authors in this paper carried out a comparative analysis
of various machine-learning algorithms on several data balancing techniques. The
given research applied 15 simple machine-learning techniques: LR, BNB, GNB,
KNN, Perceptron, PAC, QDA, SGDC, RC, LDA, DT, NCC, MNB, CNB, and a
dummy classifier, while SMOTE, ADASYN, SMOBD, NRAS, and CCR are
considered for data balancing. The area under the ROC curve (AUC), F1-score,
and six other performance measures, which are better suited for this type of situation,
were used as comparative measures. The results indicate that some classifiers
show better performance than others when compared to different class balancing
methods.
Contents
Thesis Title ............................... i
Cover Page ............................... i
Supervisor Approval .......................... i
Final Approval ............................. iii
Declaration ............................... iv
Certificate ................................ v
Dedication ................................ vi
Acknowledgements ........................... vii
Abstract .................................viii
Contents x
List of Figures xvi
List of Tables xviii
List of Abbreviations xix
1 Preliminary 1
1.1 Introduction ............................... 2
1.2 Electricity Grid ............................. 3
1.2.1 Traditional Grid ........................ 4
1.2.2 Smart Grid ........................... 5
1.2.2.1 Characteristics of smart grid ............. 5
1.3 Data Science ............................... 6
1.3.1 Data Science Applications ................... 7
1.4 Structures of Machine Learning Algorithms .............. 8
1.4.1 Individual Structure ...................... 8
1.4.2 Ensemble Structure ....................... 8
1.4.2.1 Bagging Model .................... 9
1.4.2.2 Boosting Model .................... 10
1.4.2.3 Stacking Model .................... 12
1.5 Data Science and Smart Grid ..................... 13
1.6 Summary ................................ 14
2 Introduction 15
2.1 Introduction ............................... 16
2.1.1 Background and Motivation .................. 16
2.1.2 Non-Technical Losses Issues in Smart Grid .......... 20
2.1.3 Contributions .......................... 22
2.2 Layout of Thesis ............................ 25
2.2.1 Summary ............................ 25
3 Literature Review 26
3.1 NTL detection schemes categories ................... 27
3.1.1 Hardware Based ......................... 27
3.1.2 Game theory Based ....................... 29
3.1.3 Artificial Intelligence Based .................. 29
3.2 Problem Analysis ............................ 37
3.3 Summary ................................ 40
4 Proposed Model-1 and Simulation Results 42
4.1 Introduction ............................... 43
4.2 Dataset Information .......................... 45
4.3 Pre-processing .............................. 45
4.3.1 Missing Data Imputation .................... 46
4.3.2 Handling Outliers ........................ 47
4.3.3 Unit based Normalization ................... 49
4.3.4 Data Balancing ......................... 50
4.3.5 Feature Engineering ...................... 52
4.4 Model Selection ............................. 54
4.4.1 Base Learner-1 ......................... 54
4.4.2 Base Learner-2 ......................... 54
4.4.3 Base Learner-3 ......................... 56
4.4.4 Base Learner-4 ......................... 56
4.4.5 Stacking Model ......................... 57
4.5 MLP Mathematical Modeling ..................... 58
4.6 Performance Metrics .......................... 59
4.7 Simulation Setup ............................ 62
4.8 Results Discussion and Evaluation ................... 62
4.9 Summary ................................ 66
5 Proposed Model-2 and Simulation Results 68
5.1 Classification Algorithms ........................ 69
5.1.1 Decision Tree (DT) ....................... 69
5.1.2 Logistic Regression ....................... 70
5.1.3 K Nearest Neighbors Classifier ................. 72
5.1.4 Bernoulli Naive Bayes Classifier ................ 73
5.1.5 Perceptron ............................ 74
5.1.6 Linear Discriminant Analysis ................. 76
5.1.7 Passive Aggressive Classifier .................. 78
5.1.8 Stochastic Gradient Descent .................. 80
5.1.9 Gaussian Naive Bayes ..................... 81
5.1.10 Multinomial Naive Bayes Algorithm .............. 82
5.1.11 Ridge Classifier ......................... 84
5.1.12 Nearest Centroid Classifier ................... 85
5.1.13 Quadratic Discriminant Analysis ............... 86
5.1.14 Complement Naive Bayes ................... 88
5.1.15 Dummy/Blind Classifier .................... 89
5.2 Data Balancing Techniques ....................... 90
5.2.1 Synthetic Minority Over-Sampling Technique ........ 90
5.2.2 Adaptive Synthetic sampling approach ............ 91
5.2.3 Combined Cleaning and Re-sampling Technique ....... 91
5.2.4 Noise Reduction A Priori Synthetic Over-Sampling (NRAS) 92
5.2.5 SMOBD (Synthetic Minority Over-sampling Based on Samples Density) . . 92
5.3 Research Methodology ......................... 93
5.4 Evaluation Parameters Used ...................... 95
5.4.1 Accuracy ............................. 95
5.4.2 Recall .............................. 95
5.4.3 Precision ............................. 96
5.4.4 F1-Score ............................. 96
5.4.5 Area Under the Curve ..................... 96
5.4.6 False Positive Rate ....................... 96
5.4.7 False Negative Rate ....................... 96
5.4.8 Matthews Correlation Coefficient ............... 97
5.4.9 Receiver Operator Characteristic ............... 97
5.5 Dataset and Simulation setup ..................... 97
5.6 Simulation Results and Analysis .................... 98
5.6.1 Output Performance using SMOTE-based Data Balancing . 100
5.6.2 Output Performance using ADASYN-based Data Balancing 101
5.6.3 Output Performance using SMOBD-based Data Balancing . 103
5.6.4 Output Performance using NRAS-based Data Balancing . . 104
5.6.5 Output Performance using CCR-based Data Balancing . . . 106
6 Conclusion & Future Work 112
6.1 Conclusion ................................113
6.2 Conclusion ................................114
6.3 Future Work ...............................115
Bibliography 116
List of Figures
1.1 Individual Machine Learning Model. ................. 9
1.2 Bagging Type Machine Learning Model. ............... 10
1.3 Boosting Machine Learning Model. .................. 11
1.4 Stacking Machine Learning Model. .................. 12
2.1 Energy Generation by Sources. .................... 17
2.2 Economic Losses in Different Countries in Billion of USD. . . . . . . 19
2.3 Structure of AMI Network. ....................... 20
2.4 ML Techniques in Various Fields. ................... 22
2.5 Proposed-Model Flow chart. ...................... 24
4.1 Proposed ETD Stacked Generalization Model. ............ 44
4.2 Electricity Consumption Pattern of Two Random Consumers from
SGCC Dataset. ............................. 46
4.3 Total contribution of outliers. ..................... 48
4.4 Variations in ECP of Electric Theft and Honest Consumer. . . . . . 50
4.5 SVM-SMOTE base Balanced Dataset(SGCC). ............ 51
4.6 SVM-SMOTE System Diagram. .................... 52
4.7 Proposed-Model Performance on SGCC Dataset. .......... 62
4.8 Proposed-Model Confusion Matrix on SGCC Dataset. ........ 63
4.9 Precision-Recall Curve of the Base Models and Proposed-Model. . . 64
4.10 Accuracy Comparison of Level-0 and Proposed-Model. ....... 65
4.11 Comparison of Base Models’ ROC with Proposed-Model ROC. . . . 65
4.12 Comparison of AUC, F1-SCORE and Accuracy of Base Models and
Proposed Model. ............................ 66
5.1 Proposed Electricity Theft Detection Model. ............. 94
List of Tables
1 List of abbreviations and acronyms. .................xix
2.1 U.S. Utility-Scale Electricity Generation by Source. ......... 18
3.1 ...................................... 37
4.1 SGCC Original Dataset Information. ................. 45
4.2 Output Performance of Different Models on pre-processed data. . . 63
4.3 Models Performance on Different Data Splitting. ........... 64
4.4 F1-Score, AUC, and Accuracy of the Base Models and Proposed Model . . 66
5.1 Information of Real World SGCC Dataset. .............. 93
5.2 15 Models Output Performance ....................109
5.3 15 Models Output Performance ....................110
Abbreviations Full Form
AMI Automatic metering infrastructure
CPTA Cyber-physical theft attack
PCA Principal component analysis
SVM Support vector machine
SMOTE Synthetic minority oversampling technique
LGB Light gradient boosting
ET Extremely randomized tree
XGBoost Extreme gradient boosting
MLP Multi-layer perceptron
RF Random forest
TNR True negative rate
FNR False negative rate
EPL Electric power loss
NTL Non-technical loss
TL Technical loss
SG Smart grid
ECP Energy consumption pattern
TPR True positive rate
FPR False positive rate
AUC Area under the curve
ROC Receiver operating characteristics curve
NaN Not a number
Symbols Description
σ Standard deviation
Z Z-score
µ Mean value
W Weight matrix
x_m,n Daily energy consumption
N Number of features
λ Eigenvalues
c Number of classes
b Bias
x_i Input data point
p_i Probability of outcome of a class
Table 1: List of abbreviations and acronyms.
Chapter 1
Preliminary
1.1 Introduction
The electric power grid is the most complex man-made network to date, designed
to work reliably in extreme environmental conditions and across different geographical
locations. The integration of large renewable resources into the system also affects
its operation. This requires the electricity network to be modernized to feed
electricity to the community. Upgrading the traditional electricity network with new
technological and innovative infrastructure has made the grid smarter. A smart grid
is defined as an intelligent electricity network that allows a two-way flow of
electricity and information. This information flow in the smart grid is made possible
by an automatic metering infrastructure (AMI) system, which interconnects the
consumer and the utility. Thus, big data is obtained at the utility end from a
community of energy consumers (ECs). Manually analyzing this big data for a
particular power flow between consumers and utilities is a challenging and
time-consuming task.
Present data-driven approaches make it possible to analyze the energy flow to a
given consumer in the community using the acquired data. Data science, the science
of data, makes it easy to gain information from data available in raw form. Due to
this informative nature, data science has made its way into almost every field of
life. In smart grids, data science is used in load forecasting, wind and solar energy
forecasting, electric vehicle (EV) battery life forecasting, grid intelligence,
operation and planning, fault detection, and electricity theft detection, which is
the main research topic of this thesis.
Machine learning techniques can be implemented in different ways in the
above-mentioned fields. For example, an algorithm can be used individually, or
several can be combined to obtain a different model such as bagging, boosting, or
stacking. These combinations of algorithms make it easier to use the available data
to predict a future event. Following these approaches, the electricity theft
detection issue can be addressed.
1.2 Electricity Grid
An electrical grid is an interconnection for energy dispatch from the generation
point to the end consumer. Electric grids come in different sizes and may span a
country or an entire continent. A grid comprises a complete energy system
consisting of generation units, transmission lines, distribution points, transmission
and distribution transformers, and the load [1].
Generally, the generation units are situated at locations far from the end consumer
while the electrical grid connects both the production and consumption ends. The
whole energy system is divided into the following three major parts:
1. GENERATION
The electrical generation is of two types:
(a) A centralized system consists of a few large, traditional generating points
that are far away from the end users. This type of generation includes hydro,
nuclear, coal, natural gas, and large solar and wind farms. Here, the grid acts
as an interconnecting point between generation and consumers.
(b) Distributed generation, in contrast, exists close to the consumption points,
for example, a diesel generator or a rooftop solar plant.
2. TRANSMISSION and DISTRIBUTION
The transmission system consists of transformers, substations, and power lines
that carry electricity from the point of generation to the point of consumption.
For long distances, electricity is delivered at high voltage because this minimizes
resistive losses in the transmission lines. To transmit electricity, substations
contain transformers that step up the voltage at the point of generation.
Transmission occurs via power lines, either overhead or underground. At the
end-use points, another substation steps the voltage down for consumption. Energy
is then delivered to end users through distribution lines and distribution
transformers.
3. CONSUMPTION
Commercial, industrial, and residential consumers form the three types of
consumers. Each of these consumers has different needs, but electricity generally
provides light and power for electrical devices.
Prior to the introduction of demand-side management, the transmission and
distribution (T&D) system was designed and built to handle peak loads. It was a
passive delivery network for delivering energy to consumers: the T&D system
provided electricity over the whole network, from which the end consumers used
only the energy they needed, and the rest was discarded [2].
1.2.1 Traditional Grid
The traditional grid is the electricity network that allows the flow of
electricity in one direction, with large central power generation units. The
present electricity network has been upgraded from this traditional system. The
traditional power grid is more like a physical system with manual operation.
However, due to the large number of challenges facing this traditional system, a
highly reliable and sustainable grid is required to ensure a continuous supply of
electricity to modern societies [3].
In the future, the electricity grid is expected to be equipped with modern
technological and innovative components, as today's power grid is facing
challenges of the following kinds:
1. The addition of a large number of intermittent energy resources.
2. The old electricity network, which is facing severe threats due to societal
and population growth, needs to be renewed by adding modern devices and energy
management techniques.
3. The existing power grid is affected by natural disasters (such as earthquakes,
floods, and hurricanes), which badly affect the grid network, so a flexible
energy system is required.
1.2.2 Smart Grid
The electricity grids that exist today were mostly built in the past, when the
cost of electricity generation was very low. Almost the same energy flow method,
from the central generation plant to the consumer, is still in use that was
adopted almost 100 years ago, in which consumers' energy needs were met through
distribution units together with a surplus energy saving mechanism. The smart
grid helps to revolutionize this old system using modern communication and
information technologies. However, a huge capital investment is needed for even a
single significant change in this large and complex system. The adoption of
demand-side management helps the smart grid adapt to climate change and extreme
conditions by integrating renewable energy sources. This energy management in the
smart grid eases grid planning for the utility and helps reduce greenhouse gas
emissions. It also helps to control and monitor the operation of the power
system. In short, the smart grid is an electrical system that can intelligently
integrate transmission, distribution, and prosumers, and that ensures the two-way
flow of information and energy to secure the electricity supply [4].
Thus, the working definition becomes: the smart grid is an advanced digital power
system with two-way power flow that is self-healing, adaptive, resilient, and
sustainable, with future prediction under different uncertainties. It is equipped
for interoperability with present and future standards of components, devices,
and systems, and is cyber-secured against malicious attacks.
1.2.2.1 Characteristics of smart grid
The smart grid uses modern technology, intelligent control, communication,
monitoring, and a flexible system to regulate the normal operation of the power
system. Some of the main attributes of the smart grid are mentioned below.
1. The smart grid gives energy price information to consumers using real-time
communication and demand-side management for continuous power flow.
2. It also helps to accommodate distributed generators (DGs), battery systems,
and other micro-level energy and storage systems, thereby improving the
flexibility of network operation.
3. It also optimizes the operation and management of energy resources by
considering what electricity is needed and when it is needed.
4. Smart grid operates durably during extreme weather conditions and cyber-
physical attacks, thus increasing energy security.
5. It decreases the concerns over environmental damage from fossil-fired power
stations.
6. It benefits prosumers by using a real-time communication system to inform
them about on- and off-peak hours.
7. It revolutionizes the modern transportation system by using electric vehicles
as load and energy-storing devices.
8. It reduces energy losses and wastage in the power system.
9. Smart grid reduces pollution and greenhouse gases by ensuring the use of
renewable energy sources.
1.3 Data Science
Manufacturing has seen a massive digital shift in the previous ten years. A
next-generation industrial infrastructure has been made possible by wireless
networking, reduced sensor and data storage costs, and other factors. It should
come as no surprise that modern manufacturing businesses have access to a huge
variety of data sources that offer enormous quantities of production and
performance monitoring data. In 2010, the manufacturing industry produced more
than two exabytes of data. Data, however, can be a very valuable resource that is
increasingly important to global corporate operations, provided it is properly
handled. Data has become the new oil in future IT-enhanced systems.
As a result, businesses find it challenging to develop cutting-edge analytics tools
to utilize their data for business benefits. It is possible to promote data-driven
decision-making and improve the efficiency of current business processes by utiliz-
ing this data with modern analytics tools. Going back to the new oil comparison,
an analytics solution is comparable to an oil refinery in that it turns raw materials
into valuable products.
The "Big Data Revolution" has attracted a lot of IT consultants, but businesses
are frequently let down by the results and confused by the volume and diversity
of data. Instead of widespread applications, the concept of industrial analytics
mostly consists of predictions, ideas, and projects. The recent explosion of
machine learning research has produced many useful algorithms and tools, but it
hasn't given operators the tools they need. As a result, industrial
decision-makers explore a world of newly needed outputs [5].
1.3.1 Data Science Applications
Data science has many fantastic applications. It plays a significant role not
just in business, but also in sectors like healthcare, robotics, and medicine [6].
Some of the applications of data science are mentioned here:
1. Education
2. Airline Route Planning
3. Healthcare Industry
4. Banking and Finance
5. Filtered Internet Search
6. Product Recommendation Systems
7. Digital Advertising
8. Image Processing
9. Disease Prediction
10. Anomaly Detection
11. Smart Grid
1.4 Structures of Machine Learning Algorithms
A machine learning algorithm is a process used by AI systems to carry out their
tasks, which often include estimating output values from input data. Classification
and regression are the two basic techniques used by machine learning systems.
The best machine learning algorithm to use depends on a number of variables,
including the quantity, quality, and variety of the data, as well as the
conclusions that organizations want to draw from it. Accuracy, training duration,
parameters, and many other factors are also important. As a result, selecting the
best algorithm requires consideration of a variety of factors, including business
goals, specifications, testing, and available time. Even the most expert data
scientists are unable to predict which algorithm will perform best without first
testing the alternatives [7].
Machine learning algorithms can mainly be used in two ways, i.e., as an
individual model or as an ensemble model. These structures are explained below:
1.4.1 Individual Structure
Typically, a single machine learning process begins with training data being fed
into the desired algorithm. The training data is used to train the given model,
which then makes new predictions. The output predictions are then compared with
the original labels for performance analysis purposes, as seen in Fig. 1.1.
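As a concrete sketch of this workflow (not code from the thesis), the snippet below trains a single scikit-learn classifier on synthetic data standing in for labeled consumption records, then compares its predictions with the original labels:

```python
# Minimal individual-model workflow from Fig. 1.1:
# training data -> algorithm -> trained model -> predictions -> comparison.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for labeled honest/theft consumption records.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0)   # the "ML Algorithm" block
model.fit(X_train, y_train)                      # training data -> trained model
y_pred = model.predict(X_test)                   # input data -> output prediction

# Performance analysis: compare predictions with the original labels.
print(f"accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

The same fit/predict/score pattern applies to every individual scikit-learn classifier used later in this thesis.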
1.4.2 Ensemble Structure
Machine Learning Ensemble Methods help to create multiple models and then
combine them to produce improved results, some ensemble methods are catego-
rized into the following groups [8].
Figure 1.1: Individual Machine Learning Model.
1.4.2.1 Bagging Model
Bagging is a homogeneous ensemble technique in which several algorithms of the
same type are combined in parallel. These algorithms are then fed different
subsets of the original training set for model-learning purposes. Each subset is
generated randomly, with replacement, from the original training set. This
process is called bootstrap aggregation. After the models are trained on all the
available subsets, the outputs are obtained from all the base algorithms. To
obtain the final output of the whole system, a majority-voting mechanism is
adopted in the classification case, where the most frequent prediction among all
is chosen, whereas in the regression case an average (mean) of all the
predictions is taken.
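The bootstrap-and-vote procedure just described can be sketched from scratch (synthetic data and scikit-learn trees as the base algorithm; illustrative only, not the thesis's implementation):

```python
# From-scratch bootstrap aggregation: each base learner is trained on a
# random subset drawn WITH replacement, and the final class is decided
# by majority vote over all learners.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = []
for _ in range(25):                                 # 25 parallel base learners
    idx = rng.integers(0, len(X), size=len(X))      # bootstrap: sample with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([m.predict(X) for m in models])    # shape (25, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)   # most frequent class wins
print("ensemble training accuracy:", (majority == y).mean())
```

Using an odd number of learners (25 here) avoids ties in the binary majority vote.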
A visual representation of bagging is shown in Figure 1.2.
Advantages of a Bagging Model:
1. Bagging greatly decreases the error by reducing the variance of the model's
predictions.
2. Bagging methods give good performance on the available training set by using
bootstrap aggregation.
Figure 1.2: Bagging Type Machine Learning Model.
3. If the training set is very large, bagging can save computational time by training each model on a relatively small subset in parallel, which can also increase the accuracy of the model.
4. Works well with small datasets as well.
Examples of bagging-type methods are the Extra Trees and Random Forest algorithms.
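The bagging process described above can be sketched with scikit-learn; the synthetic dataset is an illustrative assumption, and the random forest is included only because the text names it as a bagging-type method.

```python
# Illustrative bagging sketch: several base learners trained in parallel on
# bootstrap samples (drawn with replacement), with majority voting at the end.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# bootstrap=True draws each training subset randomly with replacement;
# the base learner defaults to a decision tree.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=1)
bag.fit(X_tr, y_tr)

# Random forest: a bagging-type method mentioned in the text.
rf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_tr, y_tr)

print(f"Bagging accuracy:       {bag.score(X_te, y_te):.3f}")
print(f"Random forest accuracy: {rf.score(X_te, y_te):.3f}")
```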
1.4.2.2 Boosting Model
Boosting is another type of homogeneous ensemble model that uses several weak learners to build a strong learner. Boosting combines the base algorithms sequentially to reduce the error in the output prediction. In the process, the output from the previous model is given to the next algorithm, while the result from each model is saved iteratively. The whole training set is given to the first algorithm, and training then proceeds sequentially. Finally, when all the models are trained and their outputs are obtained, the predictions from all models are majority-voted in the classification case and averaged in the regression case.
Also, in boosting, misclassified data points in the training set are given more weight so that subsequent learners focus on correcting their errors, while in bagging the training samples are taken randomly from the whole population, as shown in Fig. 1.3.
Figure 1.3: Boosting Machine Learning Model.
In contrast to bagging, which trains weak learners simultaneously using bootstrap aggregation, boosting trains base learners sequentially, with each learner's purpose being to minimize the errors of the previous one.
Boosting, like bagging, can be used for regression as well as for classification problems.
Boosting is mainly focused on reducing bias errors.
Advantages of a Boosting Model:
A single decision tree suffers from:
1. Inability to extract a linear combination of features.
2. High variance, leading to unstable predictions.
That is where boosting comes into the picture: it minimizes the variance by taking into consideration the results from various trees.
AdaBoost (Adaptive Boosting), Gradient Boosting, and XGBoost (Extreme Gradient Boosting) are a few common examples of boosting techniques.
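The sequential boosting idea above can be sketched with scikit-learn using two of the named techniques, AdaBoost and gradient boosting; the synthetic dataset is an illustrative assumption.

```python
# Illustrative boosting sketch: weak learners are fitted sequentially, each one
# focusing on the errors of its predecessors.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# AdaBoost reweights misclassified samples between rounds; gradient boosting
# fits each new learner to the residual errors of the current ensemble.
ada = AdaBoostClassifier(n_estimators=100, random_state=2).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(n_estimators=100, random_state=2).fit(X_tr, y_tr)

print(f"AdaBoost accuracy:          {ada.score(X_te, y_te):.3f}")
print(f"Gradient boosting accuracy: {gb.score(X_te, y_te):.3f}")
```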
Figure 1.4: Stacking Machine Learning Model.
1.4.2.3 Stacking Model
Stacking is also an ensemble model, but it differs from bagging and boosting techniques. Stacking uses heterogeneous base learners for performance improvement, and a slightly different mechanism, out-of-fold cross-validation, is used in this technique. Here the algorithms are combined in multiple layers: the base learners in layer 0 are trained on the training set, and the predictions of these algorithms are given to a layer-1 algorithm. The algorithm at layer 1 learns from the previous models' predictions and gives a final output. This output is discrete in the classification case and continuous in the regression case. Stacking is different from a voting algorithm, because voting just uses the majority vote or the mean of all predictions, while stacking uses an algorithm for this purpose. In a nutshell, stacking is an ensemble learning technique that uses meta-learning to combine several machine learning algorithms. Base-level algorithms are trained on the complete training data set, and the meta-model is trained using the predictions of all base-level models as features, as seen in Fig. 1.4. Stacking increases model prediction accuracy.
Advantages of a Stacked Generalization Model:
1. The advantage of stacking is that it may use a variety of effective models
to accomplish classification or regression tasks and provide predictions that
perform better than any one model in the ensemble.
2. Stacking improves the model prediction accuracy.
Disadvantages of a Stacked Generalization Model:
1. Since the whole dataset is used to train every individual classifier, in the case of huge datasets the computational time increases, as each classifier works independently on the full dataset.
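The stacking structure described above can be sketched with scikit-learn's `StackingClassifier`; the particular level-0 learners, meta-model, and synthetic dataset are illustrative assumptions, not the thesis's final configuration.

```python
# Illustrative stacking sketch: heterogeneous level-0 learners whose out-of-fold
# predictions train a level-1 (meta) model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Heterogeneous base (level-0) learners.
level0 = [("rf", RandomForestClassifier(n_estimators=50, random_state=3)),
          ("svm", SVC(probability=True, random_state=3)),
          ("knn", KNeighborsClassifier())]

# cv=5 produces the out-of-fold predictions used to train the meta-model.
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_tr, y_tr)
print(f"Stacking accuracy: {stack.score(X_te, y_te):.3f}")
```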
1.5 Data Science and Smart Grid
In a smart grid, intelligent devices are able to communicate with one another.
The precise data needed for correct information and energy flow in the network is
provided by these devices. All of this information must be managed in real-time
and preserved so that decisions may be made based on past data and specific
situations. Data gathered from intelligent devices in substations, feeders, and
numerous databases have been used in a number of research projects. Price data,
electricity data, power system data, geographic data, weather data, etc. are all examples of information sources. This requires a forecast model that is accurate and effective enough to match the supply and demand of electricity.
For instance, energy consumption data (kWh) from 100,000 or more customer smart meters at sampling intervals of 15 minutes demonstrate that assuring the quality of the acquired data is a particular problem for the evaluation of prediction models for the SG. Numerous variables must be forecasted, including renewable energy production, energy purchases from energy markets, 24-hour load distribution planning, etc. The large volume of SG data increases the complexity of data processing, and this enormous quantity of data must be processed for optimal power flow, real-time monitoring, and planning.
Big data-based power generation, optimization, and forecasting research has been expanded to include renewable energy systems such as wind energy systems, solar forecasting, load forecasting, and others. Also, the data obtained by the utility contain private information that must be handled without disturbing consumers' privacy. Additionally, this data comprises sensitive and confidential information from a country's or an organization's central grid. In order to ensure the efficient operation of the smart grid, sufficient protective mechanisms are necessary for data storage and against cyberattacks directed at the power system. Machine learning makes processing large amounts of data and putting adequate security measures in place more tractable [9].
1.6 Summary
This chapter gave an introduction to the electrical grid. The difference between the old traditional grid and the future smart grid was explained. The field of data science, its current trends, and its wide applications were discussed. Finally, different types of machine learning approaches to prediction were explained. The role of data science was defined as dealing with the big data obtained from smart grids, in order to ensure security and enable predictions such as electricity theft detection.
In short, in the smart grid, machine learning and artificial intelligence can be used for fault prediction and maintenance, efficient decision-making and forecasting, energy trading, data protection and security, power consumption, data transparency, theft detection, etc.
Chapter 2
Introduction
2.1 Introduction
Efficient energy generation and utilization play an important role in the economy of a country. The electricity generated at the generation station suffers from several severe types of losses. These losses occur due to technical and non-technical issues. The technical losses relate to devices' efficiency and can only be minimized by modernizing the electrical components or the whole system; this requires a large investment, is time-consuming, and 100% efficient devices do not exist yet.
The non-technical losses also cause a large capital loss to the utility, in billions of USD. These losses, which stem from unfair use of electricity by users, can be reduced by proper management and systematic observation to prevent such cases. Utilities have adopted many techniques, from hardware to software to data-driven methods, but inefficiency still exists in terms of preventing illegal electricity usage.
The developments and innovations in machine learning techniques make data-driven approaches desirable, due to the increasingly easy and accurate use of machine learning techniques for prediction purposes. Machine learning algorithms can be applied to smart grid data for ETD purposes. However, these techniques also face issues regarding prediction accuracy: the implementation of machine learning algorithms requires data pre-processing and algorithm selection steps for accurate prediction. In this thesis, an ensemble machine and deep learning structure is adopted to cope with the ETD issues.
2.1.1 Background and Motivation
The successful integration of renewable energy into the electricity network transformed the power grid from a centralized and dull energy system to a decentralized
and intelligent system. This distributed power system makes the grid more efficient
due to efficient infrastructure utilization. The recent technological development
and new strategies followed by the utility make the grid more flexible for energy
resource accumulation. Therefore, more intermittent energy resources can be used
for electricity generation. Electricity generation from different resources is shown
in Fig. 2.1.
Figure 2.1: Energy Generation by Sources.
This energy can be added to the power system without disturbing grid stability. According to the U.S. Energy Information Administration (EIA), the electricity generation figures given in Table 2.1 show that the share of renewable sources has risen above 20% [10].
In addition to increasing the amount of electricity generation by adding more resources to the electric grid, power management and efficient energy resource utilization also play a useful role in the socioeconomic development of a country, because of the high cost of electricity production and the limited available energy resources. Power management and cost reduction have two possible ways:
1. Generate and transmit electricity from those resources that have the minimum expense per unit.
2. Rated revenue pay-back of consumed electricity to the utility in the form of the electricity billing system.
The reduction in cost per unit of electricity can be addressed by moving towards low-cost, low-emission renewable sources with more energy-efficient devices, while
#    Energy Source                        Billion kWh   Share of Total
1.   Total All Sources                    4,116
2.   Fuels (Total)                        2,504         60.8%
3.   Natural Gas                          1,575         38.3%
4.   Coal                                 899           21.8%
5.   Petroleum (Total)                    19            0.5%
6.   Petroleum Liquids                    11            0.3%
7.   Petroleum Coke                       7             0.2%
8.   Other Gases                          11            0.3%
9.   Nuclear                              778           18.9%
10.  Renewables (Total)                   826           20.1%
11.  Wind                                 380           9.2%
12.  Hydropower                           260           6.3%
13.  Solar (Total)                        115           2.8%
14.  Photovoltaic                         112           2.8%
15.  Solar Thermal                        3             0.1%
16.  Biomass (Total)                      55            1.3%
17.  Wood                                 37            0.9%
18.  Landfill Gas                         10            0.2%
19.  Municipal Solid Waste (Biogenic)     6             0.2%
20.  Other Biomass Waste                  2             0.1%
21.  Geothermal                           16            0.4%
22.  Pumped Storage Hydropower            -5            -0.1%
23.  Other Sources                        12            0.3%
Table 2.1: U.S. Utility-Scale Electricity Generation by Source.
the revenue pay-back system of the utility faces issues due to electric power loss (EPL). The difference between the energy generated at the generation end and the energy delivered to the consumers is known as electric power loss. Electricity losses are classified into two categories [11]. These are:
1. Technical losses or system losses (TLs)
2. Non-technical losses (NTLs)
TLs are the total EPL in the power system, from the network injection point to the consumer. They occur due to the energy dissipated in transmission lines, distribution lines, and transformer cores. This problem can be overcome by using good-quality and highly efficient equipment instead of old electrical infrastructure, which requires a large cost and time.
NTLs may be due to some kind of abnormality or changes induced by electricity consumers (EC) in the electricity network, such as installation errors, billing errors, faulty meters, or meter by-passing. This creates system disturbance and poor power load management for utility companies. In addition, NTLs, or electricity theft (ET), not only cause significant economic loss but also affect the normal operation of the power system by creating power fluctuations and disturbing grid stability [12].
According to the Northeast Group, the NTL-based worldwide revenue losses were about $96 billion in 2017 [13]. In 2014, these losses were about $58.7 billion worldwide, with India facing 16.2 billion USD, Brazil 10.5 billion USD, Pakistan 0.89 billion USD, and Russia 5.1 billion USD [14] [15], which shows a high increase in losses during the last few years, as shown in Fig. 2.2.
Figure 2.2: Economic Losses in Different Countries in Billions of USD.
To reduce non-technical losses, utility companies must take the necessary steps to identify theft and abnormal energy-usage behavior. However, the conventional methods require a large number of technicians to make on-the-spot checkups of users' energy meters, and only an insignificant amount of energy theft is detected, which results in less revenue pay-back.
The recent technological developments, the AMI system, and especially the smart grid make electricity management, monitoring, and NTL reduction possible. The smart grid (SG) is an intelligent electricity system that permits a two-way flow of electricity and information by using an intelligent monitoring system. It integrates the AMI system to control and monitor the energy usage of consumers and the utility in the electricity network [16]. This system works in real time by first collecting the user's electricity consumption (EC) information and then transferring it to the utility over communication channels for billing, grid security, loss reduction, and other purposes. The AMI structure is shown in Fig. 2.3.
Figure 2.3: Structure of AMI Network.
The collection of EC data in real time makes the SG capable of detecting losses in electricity networks. The two main types of information required about energy loss are given below.
1. How to locate the theft source?
2. How much electricity is stolen?
2.1.2 Non-Technical Losses Issues in Smart Grid
In addition, the advancements in power systems make the grid more exposed to
cyber-attacks, fraud, and system failures due to the increase in the number of
nodes in the energy network.
The NTL-based losses mainly arise from illegal electricity consumption by users, which also disturbs system operation, incurs additional losses, damages system components, and affects grid security and stability. Many countries have also characterized electricity theft as a special kind of crime [17].
To reduce NTLs, utility companies must take the necessary steps to identify theft and abnormal energy-usage behavior. However, the conventional methods require a large number of technicians to perform on-the-spot checkups of the consumption meters. Manual energy consumption reading also lacks organized time and labor schedules. Due to this, only an insignificant amount of energy theft is detected, which results in less revenue pay-back [18]. The recent rapid improvements in ML methods have increased interest in models that analyze load information and detect meter tampering as early as possible. ML theft detection techniques work by detecting deviations of energy statistical patterns from normal behavior. In modern research, the use of ML techniques provides a new solution for utility companies for detecting anomalous EC. These modern techniques make it possible to automate and improve detection accuracy by accurately identifying malicious patterns. Thus, an ML classifier with high accuracy is needed to help the existing techniques deal with large detection tasks. To overcome the electricity theft issue, many data-driven methods have been used in recent years. These methods are divided into three categories, namely state-based, game theory-based, and artificial intelligence-based methods [19].
1) State-based methods use specific kinds of devices or designs for metering and
theft detection purposes. For example, a special ammeter checks the electricity
difference between the local and remote ends for fraud detection purposes. State-
based estimation works only at the substation level and not at the end-user level.
This type of installation for electricity theft detection requires extra monitoring
devices which are difficult to install in the existing distribution systems.
2) Game-theory-based methods use the interfering behavior of pricing competition
and product releases like games between anomalous users and electric companies.
The main goal of this method is to find an equilibrium for the game. This type
of model is easy to install but it is hard to find specific mathematical modeling,
which relates the actual behavior between the end user with the utility company.
3) Artificial intelligence (AI) is adopted in almost all worldwide fields including
Figure 2.4: ML Techniques in Various Fields.
business, security, sales, banking, and many more. The expansion and advancements of the SG generate big data, which requires scalable techniques for efficient utilization. The recent advancements of ML and DL in anomaly detection pave the way for energy security in the SG [20]. These ML-based models can be used to address the NTL issue in the SG. In the present AMI system, these AI techniques can be used to draw and compare the load profiles and energy consumption patterns of end-users to classify legal and illegal electricity users. Different applications of AI are shown in Fig. 2.4.
This research aims to present an accurate classification model for theft and normal users using the State Grid Corporation of China (SGCC) dataset. In this work, we use pre-processing steps such as data cleaning, missing data imputation, data normalization, and data balancing, as shown in Fig. 2.5.
2.1.3 Contributions
It has been observed from the literature that most of the present research works
use different intelligent ML methods to detect the NTLs’ behavior in the time-
series data of smart grids. However, the current research still has less accuracy
and a research gap in NTLs’ behavior detection. The present theft detection issues
are tackled in the form of the following contributions:
1. The data obtained from smart meters contain normal and theft users, where the number of abnormal users is far smaller than the number of normal electricity users. Many research works apply classification models to the data obtained from smart meters without considering the issue of class imbalance. Class imbalance biases the ML model towards the majority class, so the model classifies theft users as normal users. This class-imbalanced data needs a proper balancing technique to overcome the bias issue.
2. The second problem addressed in this study is high dimensionality in the time-series dataset. High dimensionality causes time-complexity issues and reduces classification performance. This issue is mitigated through a proper feature-reduction technique.
3. Third, many researchers emphasize comparing output results with the original labels of the testing set and do not focus on the detection level of abnormal electricity users. A results comparison in the form of accuracy alone is not a proper metric: it may leave a set of theft users inspected as normal users, which should be reduced. In the confusion matrix, abnormal consumers predicted as normal consumers are counted as false negatives. This false negative issue is addressed in this research to reduce revenue loss.
4. In machine learning techniques, some normal users are predicted as malicious, which increases the on-the-spot inspection cost. The fourth contribution of this work addresses this issue in the form of a maximum reduction in the false positive rate.
5. Much state-of-the-art research on electricity theft detection uses a single machine learning algorithm, for which it is difficult to learn all the energy patterns of users from a large dataset. A single model normally under-fits or over-fits in the case of large and imbalanced datasets. In this research, an ensemble stacking model is proposed for the best classification and generalization.
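The class-balancing and error-rate ideas in these contributions can be sketched as follows with scikit-learn; the random oversampling, synthetic data, and random forest classifier are illustrative assumptions, not the exact pipeline proposed in this thesis.

```python
# Sketch: balance an imbalanced theft/normal dataset by oversampling the
# minority class, then measure FNR (missed theft) and FPR (false alarms).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Imbalanced data: roughly 90% normal users (0), 10% theft users (1).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

# Simple random oversampling of the minority (theft) class, with replacement.
X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
X_up, y_up = resample(X_min, y_min, n_samples=int((y_tr == 0).sum()),
                      random_state=4)
X_bal = np.vstack([X_tr[y_tr == 0], X_up])
y_bal = np.concatenate([y_tr[y_tr == 0], y_up])

clf = RandomForestClassifier(random_state=4).fit(X_bal, y_bal)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
fnr = fn / (fn + tp)  # theft users misclassified as normal
fpr = fp / (fp + tn)  # normal users flagged for on-the-spot inspection
print(f"FNR: {fnr:.3f}, FPR: {fpr:.3f}")
```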
Figure 2.5: Proposed Model Flow Chart.
A feature extraction step also aims to reduce the time complexity and improve classification performance using state-of-the-art machine learning techniques. Finally, a stacked machine and deep learning generalization technique, which takes the outputs of the ML models and uses a meta-model for the final prediction, is used for improved classification accuracy.
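The stacked-generalization step just described, where base-model outputs become the meta-model's input features, can be sketched as follows; the base models, meta-model, and synthetic data are illustrative stand-ins, not the thesis's final configuration.

```python
# Sketch of stacked generalization: out-of-fold predictions of the base models
# form the feature matrix on which the meta-model is trained.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=800, n_features=15, random_state=5)

base_models = [RandomForestClassifier(n_estimators=50, random_state=5),
               GradientBoostingClassifier(random_state=5)]

# Column i holds base model i's out-of-fold probability for the positive class,
# so the meta-model never sees predictions made on a model's own training folds.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models])

meta_model = LogisticRegression().fit(meta_features, y)
print(f"Meta-model training accuracy: {meta_model.score(meta_features, y):.3f}")
```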
2.2 Layout of Thesis
This thesis is divided into six chapters, with preliminaries in chapter zero and an introduction in chapter one. The literature is summarized in the second chapter. Chapters three and four explain the work done in system models one and two, respectively, while the conclusion of this thesis is given in chapter five.
2.2.1 Summary
Electricity plays an important role in the modern world. The demand for energy is increasing day by day; therefore, lossless electricity consumption is required. Different types of electricity losses, technical and non-technical, occur from the generation point to the end consumers. These losses cause system instability, volatile demand and supply management, and huge revenue losses to the utility. Among these, the non-technical losses, which are due to illegal electricity consumers, are the more severe type and can be reduced more easily. Utilities have applied different methods, using hardware, sensors, software, and data-driven techniques, to overcome these losses. Among these, data-driven approaches are more efficient, simpler, and have low capital investment.
The recent development of machine learning techniques has made it possible to use smart grid data to predict energy loss. However, the machine learning techniques applied in present research remain inefficient in predicting theft consumers. This research focuses on pre-processing techniques, data balancing, feature engineering, and algorithm combination to reduce the FPR and FNR and to increase classification accuracy.
Chapter 3
Literature Review
3.1 NTL Detection Scheme Categories
This section discusses the existing work on reducing NTLs in the SG, considering four main categories of NTL detection methods: hardware-based, state-based, game theory-based, and AI-based techniques [21]. The methodological structure begins with an overview of the classical approaches and then moves towards recent artificial intelligence (AI)-based techniques. The AI-based electricity theft detection (ETD) steps in recent research are then studied, focusing on pre-processing, balancing, feature engineering, and algorithm modeling, and on how these techniques help the ETD process. NTL has been a major issue, hindering grid stability and causing revenue losses to utilities, for more than two decades. Researchers have addressed the issue of NTL reduction in power systems using the state-of-the-art techniques available at the time.
3.1.1 Hardware Based
For example, Pasdar et al. [22] proposed a smart metering system with a high-speed signal to detect malicious activity in the network. The system uses power line communication (PLC) to connect the customer's energy meter with the utility. In this method, a lossless high-frequency signal, with known line impedances, is transceived through the PLC. Software at the utility compares the signals of end users and detects the theft location. A similar case, with small modifications for electricity consumption observability, is proposed in [23]. The proposed method works using smart energy meters with a specialized display system for both the utility and the end user. Using the display system, end users can analyze their own consumption, while at the same time the utility monitors and checks the consumption behavior. In this way, power quality and grid stability
are maintained. A smart meter with a single-chip-based checkup system is implemented in [24] for ETD purposes. The chip uses a standard measurement as the baseline and then predicts malicious behavior by comparing it with real-time consumption. Similar hardware-based detection methods are proposed in [25-28] with wireless, especially GSM-based, monitoring systems. A sensor network with a cloud-based module monitor, a circuit breaker, and a real-time electricity pulse observer is used to compare and monitor the electricity entering and leaving the energy meter. The network then uses some form of switch or buzzer to power off the line and inform the utility about the illegal activity. A similar hardware model is presented in [29] based on state-based estimation. The model uses PLC and supervisory control and data acquisition (SCADA) to check the state of connected devices. The PLC is used for communication purposes, while SCADA uses internet protocol (IP) services and distributed network protocol 3 (DNP3) for exact device identification and system interoperability, respectively. The data acquired through the state controller module (SCM) is then compared with the standard data produced by the load system (LS) to identify malicious attacks between two connected grids/substations.
The above-mentioned techniques use specialized sensors, hardware, and online monitoring units. Besides, these techniques may suffer hardware failures and can only measure and detect physical theft attacks, with no capability for cyber-attack detection. Therefore, the authors in [30] proposed a measurement-based approach for NTL reduction. The paper uses an energy monitoring unit on the secondary side of the distribution transformer. The unit takes the total electricity consumption measurement and sends the information to the utility for the particular group. The measurements are compared by applying a statistical approach to identify theft among the given group of consumers. Based on the findings of the above hardware-based approaches, these techniques help in NTL reduction, but the system may suffer from a high cost of hardware investment and reliability problems. Moving towards its advanced version, AMI, the authors in [31] suggested a multi-source information fusion (MSIF) technique using AMI data for more accurate detection. The data collected from electricity consumers cannot easily be classified as malignant or benign based on a single alert; therefore, a combination of alerts from several malicious users is presented for accurate results. The authors also show the pros and cons of using supervised
and unsupervised techniques for output performance. The basic function of the system is that the data collected from the AMI system contain information about the appliance consumption of a particular customer as well as the meter reading. So if a particular device is observed to be ON while the meter shows zero consumption, then that user is committing theft. However, the complete information about customer consumption also causes privacy issues.
3.1.2 Game Theory Based
As further research and advancement over the hardware-based NTL detection techniques in the form of AMI, data-driven techniques, including game theory and heuristic-algorithm-based techniques, have been applied, such as in [32], [33], to increase the detection accuracy. The main purpose of these techniques is to model a game between the electric utility and the electricity user. The system is designed to find an equilibrium between the two, and a threshold value is assigned. A probabilistic technique is then applied to classify the obtained data as honest or ET users. Game theory and heuristic algorithms take considerable time to deal with big data due to their stochastic nature; these techniques are inaccurate, biased, and cannot reach an optimal value on a large dataset. A load flow method based on the AMI dataset is implemented in [34]. The authors addressed the ETD issue in the SG using the real-time electricity consumption patterns (ECP) of consumers. The main problem with power flow analysis techniques, namely the Gauss-Seidel, Newton-Raphson, and Fast Decoupled methods, is that they have low convergence rates, large memory requirements, and time-complexity issues in reaching an optimal point. The proposed system addressed this issue by using modified linear regression to capture the electricity theft and normal patterns. This increased the speed of the power flow model simulation and enabled its adoption in large power systems.
3.1.3 Artificial Intelligence Based
New technological developments in smart grids, especially AMI systems with
real-time monitoring and large-scale ECP collection in the form of big data,
have enabled the newly emerging field of data science to almost replace the
traditional NTL detection techniques because of its low cost, easy
implementation, and high ETD rate. The big data obtained from the
utility can be given to data science (DS) techniques for easy and efficient analysis.
The DS-based machine learning (ML) and deep learning (DL) algorithms have the
capability of NTLs and revenue loss reduction. For example, the author in [35]
proposed a hybrid machine learning model. A decision tree (DT) and a support
vector machine (SVM) are combined such that extra features are extracted from
the original dataset and then fed to the SVM along with the original features.
The SVM is used for the final prediction. In addition, data pre-processing is
limited to missing data imputation and normalization steps. The experiment is
done on a dataset collected from various homes in the USA. A non-linear radial
basis function (RBF) kernel is selected for SVM to improve the output results.
The final accuracy and false positive rate (FPR) obtained were 92.5% and 5.12%
respectively. Zhongzong and He in [36] implemented an extreme gradient boosting
(XGB) classifier for ETD purposes. The method considers the Irish (Ireland)
dataset without applying proper data pre-processing or data balancing techniques.
Data reduction is performed by generating six artificial theft attacks from the
original dataset, and model training/testing is then carried out on that data.
The final output obtained is compared with SVM-based classification.
The results show that XGB outperforms SVM in terms of precision, recall, and
FPR.
A novel gradient boosting classifier (GBC) based theft detection method is
proposed in [37]. The research mainly focuses on feature engineering and hyper-
parameter tuning steps for improvement in detection rate and FPR and reduction
in processing time. The feature extraction is done using a combination of syn-
thetic feature generation and weighted feature importance (WFI) techniques. The
final results showed that GBC outperforms categorical boosting, the light
gradient boosting machine, and XGB in terms of FPR and execution time. The
author in [17] proposes a supervised machine learning technique for all kinds of anomaly
detection in smart grids. For this purpose, the Endesa (Spain) dataset is
considered, which is collected from almost 57,000 field inspections of different consumers. The
feature extraction is done using energy consumption (EC), quality byte, distance,
density, and electrical magnitude-based measurements. Besides EC, the Endesa
dataset also contains important geographical, seasonal, and smart meter
properties. The final extreme gradient boosting (XGB) model shows better
results with an AUC of 91%. The same data-driven technique is applied in [38]
using machine learning, deep learning, and parallel computing techniques to detect
malicious electricity users. A Turkish smart grid dataset is chosen for the model
implementation and detection of false data injection. The feature learning process
is done by combining highly comparative time series analysis and neighborhood
component analysis feature selection algorithms. After data transformation, the
classification algorithm is implemented. Improved results are obtained for
XGBoost, with an FPR value of 0.005.
Prem et al. [39] worked on cyber-physical attack detection using an isolation forest
classifier (IFC). The isolation forest is used to detect the change in the pattern
of the consumers. The main purpose of theft is to decrease the meter reading
from actual values, which changes the energy consumption pattern (ECP) of that
particular user. Data reduction is done using PCA. The IFC is trained at varying
load and voltage generation levels in order to capture all possible ECPs of
consumers. Hyper-parameter tuning is performed, and the model is
tested for different grid/bus systems. The results obtained show 98.7% recall in
terms of anomaly detection with the IEEE 3-bus system. Leloko et al. [14] tried
to differentiate theft consumers from honest consumers using the SGCC dataset.
The overall method used data pre-processing, data balancing, and feature reduc-
tion. Hyperparameter tuning is done using a Bayesian optimizer. The model was
individually implemented on both time domain features and frequency domain
features for accurate training. Feature selection from both the time and frequency
domains proves useful. The final deep neural network showed outstanding
performance, with an area under the curve (AUC) of 97% and an accuracy of
91.8%. The author in [40] structured a CNN-RNN-BiLSTM
model to detect electricity theft in Raipur (an Indian city). The dataset
consists of the meters of 41 three-phase supply users. A combination of three
different deep learning models is used to learn the data patterns of normal and
abnormal users. The proposed model achieved an improved accuracy of 97.1% and
outperformed the existing SVM and multi-class SVM when compared. In [41], Paria et al. presented
a solution for ETD purposes while focusing on the ECP of consumers. The ECP
of theft and honest users are not the same; in fact, the theft pattern has more
fluctuations. Therefore, the areas with a high probability of malicious
activities, in terms of electricity consumption, are equipped with distribution
transformer meters (DTMs). Using these transformer meters, both types of
consumers are identified. The data of 5,000 real consumers is analyzed in this
work. Data preprocessing, balancing, and feature reduction are all handled by
generating six synthetic attacks. An SVM algorithm is then trained on the
different types of ECP obtained using the DTMs. The final experimental result
showed a 93% detection rate and 11% FPR.
Similar to the above ECP-based NTL detection, an optimized convolutional neural
network and gated recurrent unit (CNN-GRU) method is studied in [42]. Real-
time data of 10,000 consumers is analyzed for ETD purposes. The data preprocess-
ing is done to impute missing values. Synthetic minority over-sampling (SMOTE)
is used for class balancing. A manta ray foraging optimization (MRFO) is com-
bined with CNN-GRU for result improvement. The final model showed a 91.1%
accuracy, greater than that of SVM, logistic regression, and the plain
CNN-GRU. The same data-driven approach is applied in [43] for NTL reduction in
the SG. The authors worked on real-time data of 2,271 consumers collected from
the Honduras distribution system. The smoothing spline function (SSF) is used
for outlier handling. For feature reduction purposes, a new discrete wavelet packet
transform is implemented. The class imbalance issue is addressed using the ran-
dom under-sampling (RUS) technique. In the last step, an ML-based RUS with
Adaboost technique is applied for classification purposes. Adaboost performed
better with an accuracy of 94.35% when compared with Linear-SVM, Non-Linear
SVM, and artificial neural network (ANN). Using newly emerging ML and
preprocessing techniques makes the ETD process simple and efficient. Pamir et al.
in [44] followed the same direction and proposed a hybrid ensemble model for
electricity theft detection. The researchers worked on data pre-processing using
KNNOR for data balancing. The feature reduction is done using the recursive fea-
ture elimination technique. For classification purposes, a bi-directional long short
term memory (Bi-LSTM) classifier with three layers is used as the base model
followed by a LogitBoost classifier. This proposed stacking approach results in
improved detection performance when verified on a real-world ’SGCC’ dataset.
The output values obtained for precision, F1-Score, and accuracy are 96.32%,
94.33%, and 89.45%, respectively.
A high dimensional dataset increases the model time complexity and degrades
the classification results. Motivated by feature reduction techniques in
electricity theft detection, a natural gradient boosting (NGBoost) based theft
detection method is proposed in [45]. The three-step system comprises the
extraction of important features, data pre-processing, and classification techniques.
The missing values are imputed with the miss forest technique, and the data im-
balance problem is addressed with the majority-weighted minority oversampling
technique (MWMOTE). A time series feature library combined with a whale opti-
mization algorithm is used for feature extraction. Finally, the NGBoost classifier
is implemented on the SGCC dataset. The proposed structure achieved 93%
accuracy, 91% recall, and 95% precision. A similar feature-engineered approach
is adopted in [46]. The authors used the SGCC dataset in their research and
addressed the missing values, imbalanced class ratio, and high dimensionality
issues. These problems are overcome, respectively, with KNN imputation,
SMOTETomek, and ts-FRESH algorithms. Min-max normalization is also applied for
data scaling. CatBoost, XGBoost, and LightGBM classifiers are used for the
final normal/theft classification. The CatBoost algorithm performed best,
achieving 95% accuracy. In a similar fashion,
the researchers addressed the electricity theft issue using the Colombian electricity
supplier dataset. The data is pre-processed by addressing the missing values and
normalization. The missing values are removed while the scaling is done using
Min-Max-Scaler. Finally, a BiGRU-CNN is implemented for classification
purposes. The proposed model achieved an accuracy of 92.9%, an F1-Score of
0.841, and an AUROC of 0.966 [47].
In electricity theft detection (ETD), the class imbalance problem makes the model
biased towards the majority class and reduces the classification performance. To
overcome this issue, the researchers in [48] emphasize class balancing using load
data of 50 urban electricity users for three months in Hebei province, China. A
K-SMOTE technique is used to balance the honest and theft classes. Several
machine learning techniques are applied for theft prediction, among which the
Random Forest (RF) model shows superior performance over the other compared ML
classifiers, with 94.53% accuracy and an area under the curve of 0.9513.
Sravan and Dipu in [49] also followed the class
balancing approach. The authors implemented ensemble techniques to detect elec-
tricity theft using a dataset of 5000 customers. The dataset was obtained from
the commission for energy regulation. Data pre-processing is done by near-miss
imputation and SMOTE oversampling. The author revealed that the bagging
ensemble outperforms other ensemble-boosting methods in terms of theft identifi-
cation. The Random Forest and Extra Tree classifiers achieved an AUC value of
0.90, which was higher than that of the comparative ML models. Owing to the
fast and large-scale data interpretation capability of deep learning models,
deep-learning-based electricity theft detection on an imbalanced dataset is
addressed by Rui et al. in [50]. To address the model bias towards the majority class, a focal loss
function is used to reduce the sample weight of normal users. SENet is combined
with wide and deep convolution neural networks to learn the global features and
detect the electricity theft consumers from the data. The final model is tested
using real-time data from the State Grid Corporation of China (SGCC). The model
outperforms the compared state-of-the-art techniques by obtaining an area under the
curve score of 0.83. Lei et al. in [51] proposed a new theft attack model for theft
identification. The researchers extract the important patterns of the particular
users along with the neighborhood energy consumption patterns using the SGCC
dataset. Normal users seemed to have regular consumption patterns, while theft
patterns show large spikes and variations. The Pearson correlation coefficient is
used to capture the similarity between theft and honest users' patterns. Finally,
a convolutional neural network is applied for classification purposes. The proposed
method obtained 88% accuracy and a 95% area under the curve, outperforming the
comparative theft detection techniques.
Heuristic algorithms play a great role in achieving the model’s optimal point of
performance. A dual deep learning technique is combined with heuristic techniques
by Abdulwahab and Nasir in [52]. The researchers used the State Grid dataset
obtained from the Chinese government. The dataset contains the energy consumption of 9655
users for one year. For theft classification purposes, the data is pre-processed by
addressing missing values, outlier handling, and data normalization. The interpo-
lation, 3-sigma rule, and min-max scaling are respectively used for this purpose.
The class balancing is done using SMOTE and SMOTEBoost techniques. After
data extraction with ZFNET, the final CNN-LSTM algorithm is used for theft
identification purposes. For faster and more efficient operation, blue monkey
and black widow optimizers are applied. The simulation results show improved
accuracies of 91% and 93% with blue monkey and black widow based tuning,
respectively, which stand out as state-of-the-art results in electricity
theft prediction.
In a data-driven based electricity theft detection scenario, a low FPR is required
to reduce the onsite inspection cost. Dexi et al. proposed a low-FPR deep neural
network to address this issue. The real-time Irish smart grid dataset is used in
this research. The author extracts the important features using the deep model
and focal loss is used to reduce the class imbalance problem. Finally, a two-stage
training model is implemented. In the first stage of training, a one-dimensional
convolution and residual network are used in combination with convolution gra-
dient descent to update the model weights using the grid search tuning method.
In the second stage, FPR is taken as an objective function and particle swarm
optimization is performed. The proposed model achieves outstanding performance
on the Irish dataset, with an FPR of 0.29 and an AUC of 99.42 [53]. Hasan et
al. in [54] proposed a CNN-LSTM-based electricity theft detection (ETD) model
using the historical power consumption data of 10,000 users. The authors
address the missing values in the dataset, and the imbalanced class issue is
handled using SMOTE-based balancing. Overall, an improved accuracy of 89% is
obtained for theft and normal user classification.
A simple data-driven approach is adopted by Roubin et al. in [55]. The authors
first applied important pre-processing steps for NaN and outlier handling. The
NaNs are imputed with mean imputation, and outliers are replaced with values
obtained from the 3-sigma rule. The authors also considered the difference in
user consumption on weekdays and weekends and addressed it using the ratio
profile (RP). The researchers then used unsupervised fuzzy c-clustering for theft
and normal user classification. The discrete wavelet transform is used for
extracting the important features. Finally, six theft attack samples are created
for the Irish dataset case. The results showed superior performance of the
proposed model in
terms of AUC when applied to real datasets. In [56], the researchers proposed
an ensemble machine learning model with a stacking structure to identify the
electricity theft users in the SGCC dataset. The pre-processing of the dataset is
done with the 3-sigma rule, mean imputation, and min-max standardization. The
high dimensionality issue is addressed with principal component analysis. In the
proposed model, light gradient boosting, KNN, and LSTM are chosen as base
classifiers, while an SVM tuned with PSO is used as the final estimator. The
final results show an AUC value of 0.986, which is greater than that of the
other compared models. A similarly
stacked autoencoder along with LSTM sequence-to-sequence (S2S) structure is
proposed in [57]. Autoencoders are used to capture the data pattern and the
LSTM-S2S model is used for the final classification. The proposed model is verified
on the realistic ISET and SGCC datasets. The model achieved 96% accuracy and
0.93 AUC on the SGCC dataset, and 94.5% accuracy and 0.90 AUC on the ISET
dataset. The author in [58] proposed a ConvLSTM model for ETD purposes.
The pre-processing steps include data cleaning, KNN imputation, and the IQR
method for handling outliers. Borderline-SMOTE is used for data balancing. Finally, the
CNN-LSTM model is implemented on the SGCC dataset. The proposed model
outperforms other methods by obtaining 0.977 for ROC-AUC and 96.6% accuracy.
Inspired by the importance of feature extraction techniques, the authors
proposed a two-stage theft detection method in [59]. In the first stage,
auto-regressive integrated moving average, Holt-Winters, and seasonality-trend
analyses are applied to capture important features from the dataset. In the
second stage, a distributed random forest classifier is trained and tuned for
electricity theft detection purposes. The final model is verified on the SGCC
dataset and shows superior performance in comparison with state-of-the-art
techniques, gaining 98% accuracy and F1-score. A similar feature
extraction-based theft detection is proposed by Yifan
and Qifeng in [60]. A stacked sparse autoencoder is used for electricity theft
identification. The autoencoder has a good feature extraction capability and is
used for electricity data reconstruction. A reconstruction error function is used to
compare the normal and theft users’ consumption. Three autoencoder layers are
combined with sparsity to make the model robust. The final classifier is optimized
with the PSO algorithm and verified on a real Chinese dataset. The proposed model
obtained a 90% detection rate and an FPR of less than 10%.
In [61], the focus is on the effect of missing values in the dataset on data
classification. The authors relate quick changes in the electricity consumption
pattern to the type of consumer. The data obtained from smart meters has missing
values, and that information needs to be accounted for in the ETD case. The
proposed technique relates missing values and neural networks through a neural
architecture search (NAS) technique. The proposed architecture shows 5% improved
results by addressing NaNs, with an AUC value of 0.926.
Jeanne and Filipe in [62] estimated the importance of data balancing in ETD
using a convolution neural network (CNN). The NaNs in the SGCC dataset are
imputed with the linear interpolation method. Then, six different class
balancing techniques are used, including Random Oversampling, Random
Under-sampling, K-medoids-based Under-sampling, SMOTE, and Cluster-Based
Oversampling (CBOS). Finally, a CNN is used as a classifier to separate theft
users from normal consumers. The results showed superior performance for
Random Oversampling and CBOS-based CNN classification, with AUC values of 0.67
and 0.68, respectively.
3.2 Problem Analysis
The existing literature study showed that electricity theft detection still has a gap
and further research is needed to adequately solve this issue. More specifically,
it has been found that pre-processing steps greatly affect the classifier prediction
capability. Therefore, this research proposes implementing and analyzing
fifteen individual classifiers with different class balancing techniques. We
show that certain combinations of ML classifiers and class balancing techniques
improve the theft identification results.
Table 3.1: Summary of Related Work

No. | Ref. | Dataset | Data Pre-Processing | Model | Performance
1 | [63] | China/SGCC | Feature Reduction | CNN | Accuracy = 92%
2 | [64] | Spain/Endesa | Data Scaling | XGBoost | Accuracy = 91.1%
3 | [65] | Ireland/Irish | Feature Reduction | XGBoost | Accuracy = 95%
4 | [66] | China/SGCC | Dimensionality Reduction and Balancing | UaRe-Random Forest | Accuracy = 93.6%
5 | [67] | China/SGCC | Six Synthetic Attacks | Gradient Boost | Accuracy = 97%
6 | [68] | China/SGCC | Synthetic Theft Attacks | CNN | Accuracy = 92%
7 | [40] | India/Raipur | Synthetic Theft Attacks | CNN-RNN-BiLSTM | Accuracy = 97.1%
8 | [69] | Ireland (SEAI) | Clustering | SVM | FPR = 11%
9 | [70] | Honduras | Data Under-Sampling | AdaBoost | Accuracy = 94%
10 | [65] | 370 homes, 3 years of EC data | Data Extraction | RF | Accuracy = 91%
11 | [71] | USA home EC daily data | Feature Scaling | SVM | Accuracy = 92.5%
12 | [72] | Ireland/Irish | Feature Reduction | RNN | Accuracy = 93%
13 | [73] | 118/300 bus system simulation data | False Attacks and Detection | SVE | Accuracy = 93%
14 | [74] | Korea/KEPCO | Feature Selection | CNN | Accuracy = 85%
15 | [21] | Pakistan/Precon | Feature Reduction | CatBoost | Accuracy = 98%
16 | [75] | LESCO individual homes | N/A | SVM | Accuracy = 75%
17 | [11] | SGCC | Data Balancing, Reduction, HP-Tuning | AdaBoost | Accuracy = 88%
18 | [76] | SGCC | Data Balancing, Reduction, HP-Tuning | ABC-model | Accuracy = 91%
19 | [42] | 10,000 consumers | Data Balancing | CNN-GRU | Accuracy = 91.1%
20 | [44] | SGCC | Data Balancing, Reduction | BiLSTM-LogitBoost | Accuracy = 89.45%
21 | [45] | SGCC | Data Balancing, Reduction | NGBoost | Accuracy = 93%
22 | [46] | SGCC | Data Balancing | CatBoost | Accuracy = 95%
23 | [47] | Colombian | Data Normalization | BiGRU-CNN | Accuracy = 92.9%
24 | [48] | China, 50 users, 3 months | Data Balancing | RF | Accuracy = 94.53%
25 | [49] | 5,000 users | Data Balancing | RF | AUC = 0.91
26 | [50] | SGCC | Data Reduction | WDCNN | AUC = 0.83
27 | [51] | SGCC | Data Reduction | CNN | Accuracy = 88%
28 | [52] | Chinese State Grid, 9,655 users | Data Extraction, Balancing, HP-Tuning | CNN-LSTM | Accuracy = 93%
29 | [54] | 10,000 users | Data Balancing | CNN-LSTM | Accuracy = 89%
30 | [60] | 5,000 users | Feature Extraction | RF | Detection Rate = 90%
31 | [62] | SGCC | Data Balancing | CNN | AUC = 0.68
32 | [41] | 5,000 users | Six Synthetic Attacks | SVM | Detection Rate = 93%
33 | [35] | USA home data | NaN Imputation and Normalization | SVM | Accuracy = 92.5%
34 | [53] | Ireland/Irish | Data Balancing and Extraction | Neural Network + Optimization | FPR = 0.29
3.3 Summary
A comprehensive overview of NTL detection was presented in this chapter. The
differences among the studied techniques and their output performance were
examined. It has been observed that theft detection efficiency is improved by
ML-based techniques. The reviewed literature is also summarized in table-3.1.
However, the results show a research gap in terms of pre-processing, feature
engineering, data balancing, and algorithm selection. It has also been observed
that the present work lacks important performance parameters that best explain
the classification index, which will be addressed in this research work.
Chapter 4
Proposed Model-1 and Simulation Results
4.1 Introduction
The recent developments in data-driven techniques have made the prediction
process simple and accurate. These data-based techniques, especially machine
learning techniques, are widely used for anomaly detection purposes. In the
smart grid, the anomaly is actually the malicious/theft/dishonest users, who
disturb the electricity system and cause large losses to the utility. So,
machine learning algorithms can be used to detect anomalies in the smart grid.
Although ML-based methods have made a lot of progress in theft detection in
smart grids, there still exist issues that need to be addressed. The real-time
original data obtained needs some preparation steps, which are missing in the
present work. Single machine learning algorithms suffer from under-fitting and
over-fitting issues. The data obtained in real time is normally highly
dimensional, which is time-consuming to process and may cause inefficiency in
prediction. There is also a need for performance parameters that best explain
the classification performance of a given model. Regarding these issues, an
ensemble stacking model is proposed that reduces these issues. To evaluate the
classification performance, important performance parameters are selected to
address the current issues in ETD.
Proposed Methodology
In our proposed model, an ensemble AI technique is implemented for ETD in
SG. The data is obtained from a utility company. The original data is prepared
for ML model training using preprocessing and feature engineering steps. The
entire dataset is split into training and testing sets. The training set is fed to
base ML classifiers for training and prediction purposes. In the final step, the
prediction from ML classifiers is used as features of a deep learning model for better
classification results. The complete system, as shown in figure-4.1, is divided into
the following four steps.
1. In step 1, the data collected from the SG has some missing values and out-
liers, and has a large variance. This may be due to hardware issues, noise in
the communication medium, and users' different electricity consumption be-
havior. The missing values in the dataset decrease the model performance.
Figure 4.1: Proposed ETD Stacked Generalization Model.
Therefore, they are replaced with mean values. To address the issue of outlier
handling, a simple interpolation technique is used. The large variation in the
dataset reduces the model training capability. Therefore, the data is
normalized using Min-Max scaling.
2. Step 2 addresses the high dimensionality of the SGCC dataset. The original
dataset is reduced using principal component analysis (PCA) in order to
increase storage efficiency and performance, and to reduce storage cost and
time complexity.
3. In step 3, the training data is fed to the four base ML models, which
predict the output individually. These level-0 ML models include the LGB,
XGB, LR, and ET classifiers.
4. In step 4, a multilayer perceptron (MLP) is used as the level-1 classifier,
which takes the outputs of the level-0 models and predicts the final output in
the form of a theft or normal electricity consumer.
4.2 Dataset Information
The dataset used in this study is obtained from the real-time electricity
consumption of consumers in Fujian, China, connected to the SGCC. This SGCC
dataset, available as an MS Excel file, covers a total of 42,372 consumers.
There are mainly two types
of consumers in this dataset, which are labeled as 0 and 1. Label-0 indicates a
normal user while label-1 indicates theft consumers. The consumers and their
corresponding daily consumption are arranged as rows and columns in a table,
which shows the records and features of the dataset, respectively. Details of the
dataset are organized in table-4.1.
Original Dataset Information

1. Source of Data          | Utility (SGCC)
2. Consumption Duration    | 01/01/2014 to 31/10/2016
3. Consumers Category      | Residential
4. Type of Data            | Daily Consumption
5. Total Consumers/Samples | 42,372
6. Normal Consumers        | 38,757
7. Theft Consumers         | 3,615
8. Features                | 1,034

Table 4.1: SGCC Original Dataset Information.
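A quick sanity check on the class counts in table-4.1 shows how strongly imbalanced the dataset is; the snippet below only reproduces the table's own numbers.

```python
# Class-imbalance check using the counts from Table 4.1.
normal, theft = 38_757, 3_615
total = normal + theft
print(total)                           # total consumers in the dataset
print(round(100 * theft / total, 2))   # theft share of the dataset, in percent
# → 42372
# → 8.53
```

Roughly one consumer in twelve is a theft user, which motivates the class balancing applied later in the pipeline.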
4.3 Pre-processing
In electricity theft detection, the model learns the users' ECP, which is then
used for future CPTA predictions. Therefore, a proper, or near to exact,
pattern is needed for accurate detection. However, the electricity consumption
data obtained from the utility is un-scaled and imbalanced, and has missing
values and outliers. Hence, the information in the original dataset, summarized
in table-4.1, cannot be used directly for accurate model training. The data
must first be prepared using some ML techniques. After the data is
preprocessed, the consumption patterns of the theft and normal users can be
efficiently drawn.
Figure 4.2: Electricity Consumption Pattern of Two Random Con-
sumers from SGCC Dataset.
As shown in figure-4.2, for a sample of the SGCC dataset, a normal user has a
much smoother electricity usage pattern than a theft consumer, whose usage
pattern shows large variations. So the final pre-processed data can be used for
model training and user behavior prediction. The pre-processing steps used in the
proposed model are discussed below.
4.3.1 Missing Data Imputation
The dataset obtained from the utility has a large number of missing values,
denoted as not a number (NaN). These NaN values may be due to systematic,
environmental, or random errors. The missing values cannot be neglected during
preprocessing, as they decrease the model performance; also, replacing them
with zero would result in a loss of information. Many data science techniques
are available for missing value imputation, such as replacing NaN with the
mean, median, or mode. The median and mode cause a repetition of values in the
ECP, which again leads to negative performance. The linear interpolation
method given in [68] is used, where NaN is replaced with a mean, as given in
equation-4.1. It has mainly three imputation conditions.
f(x_{m,n}) =
\begin{cases}
  \dfrac{x_{m,n-1} + x_{m,n+1}}{2}, & \text{if } x_{m,n} = \text{NaN} \text{ and } x_{m,n\pm 1} \neq \text{NaN} \\
  0, & \text{if } x_{m,n\pm 1} = \text{NaN} \\
  x_{m,n}, & \text{otherwise}
\end{cases}
\qquad (4.1)

In equation-4.1,
x_{m,n} = the daily electricity consumption,
x_{m,n-1} = the value preceding the NaN,
x_{m,n+1} = the value following the NaN.
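The three-condition imputation rule above can be sketched as follows; the function name and the list-based row representation are assumptions for illustration.

```python
# Minimal sketch of the three-condition NaN imputation rule.
import math

def impute_row(row):
    """Replace each NaN by the mean of its two neighbours; if a neighbour is
    also NaN (or missing at the row boundary), fall back to 0."""
    out = list(row)
    for n, x in enumerate(row):
        if not math.isnan(x):
            continue                      # x_{m,n} present: keep it
        prev = row[n - 1] if n > 0 else float("nan")
        nxt = row[n + 1] if n < len(row) - 1 else float("nan")
        if not math.isnan(prev) and not math.isnan(nxt):
            out[n] = (prev + nxt) / 2     # mean of the two neighbours
        else:
            out[n] = 0.0                  # a neighbour is also NaN: impute 0
    return out

nan = float("nan")
print(impute_row([1.0, nan, 3.0, nan, nan, 2.0]))
# → [1.0, 2.0, 3.0, 0.0, 0.0, 2.0]
```

Note that isolated NaNs are interpolated from their neighbours, while consecutive NaNs fall back to zero, exactly as the case distinction in equation-4.1 prescribes.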
4.3.2 Handling Outliers
In the ECP, we found some values that are too large or too small compared to
the normal values. These unexpected values (outliers) deceive the model and
incur a large execution time. Generally, the values below the 10th percentile
and above the 90th percentile are treated as outliers.
Figure 4.3: Total contribution of outliers.
As shown in figure-4.3, the outliers in the dataset are less in number and show
very little contribution towards model training. A novel z-score capping-based
outliers handling method [77], shown in algorithm-1, is applied to make the data
more useful. The z-score outliers capping (ZSOC) technique works by first finding
the z-score using equation-4.2.
$$Z\text{-}Score = Z = \frac{x_i - \mu}{\sigma} \tag{4.2}$$

where

$$\sigma = \sqrt{\frac{\sum_{n=1}^{N}(x_n - \mu)^2}{n-1}}$$

Z = the standard score, $x_i$ = a random value (possible outlier), $\mu$ = the mean value, and $\sigma$ = the standard deviation of row i.
After calculating the z-score, lower and upper limits are assigned to each individual feature. A data point less than the lower limit or greater than the upper limit is replaced with the corresponding limit. The main advantage of using the capping technique is that it places the outlier at its respective extreme value instead of completely removing the entire row. This helps to retain the
useful information, in contrast to the present research [38, 78], where the outliers are entirely removed. Algorithm-1 presents the complete process.
Algorithm 1: ZSOC working in cyber-physical electricity theft detection.
Data: X(i,j); Result: Z(m,n)
1  Start
2  for i = 2, 3, 4, ..., N do
3      select the original dataset X
4      find the z-score: Z = (x_i − μ)/σ
5      for t = 1 to T do
6          lower_limit = μ − 3σ
7          upper_limit = μ + 3σ
8          if Z > upper_limit then replace the value with upper_limit
9          else if Z < lower_limit then replace the value with lower_limit
10         else keep x_i unchanged
11     end
12 end
13 return Z(m,n)
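The capping step can be sketched in NumPy (a minimal sketch; the function name is illustrative, and the default k = 3 follows the μ ± 3σ limits of algorithm-1):

```python
import numpy as np

def zsoc_cap(x: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Cap outliers at mu +/- k*sigma instead of dropping rows (ZSOC)."""
    mu = x.mean()
    sigma = x.std(ddof=1)                     # sample std, as in eq. 4.2
    lower, upper = mu - k * sigma, mu + k * sigma
    # np.clip replaces any value beyond a limit with that limit
    return np.clip(x, lower, upper)
```

Capping keeps the row (and thus the consumer's record) in the dataset, only pulling the extreme reading back to the limit.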
4.3.3 Unit based Normalization
The Z-method was used to handle the outliers, however, the dataset still has large
variations in the ECP of users, shown in figure-4.4, for a sample taken from the
SGCC dataset. These large variations degenerate the output performance, as the
ML and DL models are sensitive to the variation and quality of the dataset.
(Two randomly selected features; outliers marked.)
Figure 4.4: Variations in ECP of Electric Theft and Honest Consumer.
Min-max normalization from [71] is applied to scale the data to the range [0, 1]. Min-max normalization has the mathematical form shown in equation 4.3:

$$f(x_{i,j}) = \frac{x_{i,j} - \min(X)}{\max(X) - \min(X)} \tag{4.3}$$

Here min(X) and max(X) denote the minimum and maximum electricity consumption (EC) of feature j in the data.
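Equation 4.3 can be sketched in NumPy (a minimal sketch; the guard for constant features is an added assumption, since eq. 4.3 would otherwise divide by zero when max = min):

```python
import numpy as np

def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Scale each feature (column) of X to [0, 1] per eq. 4.3."""
    mn = X.min(axis=0)
    mx = X.max(axis=0)
    # guard against constant features to avoid division by zero
    rng = np.where(mx > mn, mx - mn, 1.0)
    return (X - mn) / rng
```

scikit-learn's MinMaxScaler implements the same transform with fit/transform semantics for reuse on test data.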
4.3.4 Data Balancing
After missing value imputation, outlier removal, normalization, and feature engi-
neering, the next step is to check for the class imbalance. In the SGCC dataset,
the number of normal and abnormal users is not equally proportional. In the
SGCC dataset, out of the total number of 42,372 users, the number of honest and
illegitimate users are 38,757 and 3,615 respectively, as given in table-2. The ma-
jority class (Normal=0) has more consumers than the minority class (Theft=1),
as shown in Fig. 4.5:
(Imbalanced: 38,757 normal vs. 3,615 theft users; balanced: 38,757 users in each class.)
Figure 4.5: SVM-SMOTE based Balanced Dataset (SGCC).
Due to this skewed behavior of the dataset, the machine learning model also shows
a biased behavior towards the majority class and classifies the theft user (TU) as
a normal consumer (NC).
To overcome the class imbalance issue, SVM-SMOTE is applied, which results in improved performance. The imbalanced and balanced data are shown in Fig. 4.5.
Different techniques are used in the present research, such as random under-sampling (RUS), random over-sampling (ROS), and SMOTE. RUS reduces the records in the majority class to balance the dataset. However, due to the reduction in records, some important information is also lost, which decreases the model's performance.
(SVM-SMOTE working: select the original dataset, draw a hyperplane with SVM, choose the minority class, use KNN to relate the data points, and generate synthetic samples until Class-1 = Class-2, yielding the balanced dataset.)
Figure 4.6: SVM-SMOTE System Diagram.
In contrast to RUS, ROS repeats random samples of the minority class to make it equal to the majority class. Due to this repetition in the dataset, an over-fitting issue arises. To overcome the issues of RUS and ROS, SMOTE balances the dataset by generating synthetic data for the minority class. However, SMOTE leads to a very large dataset and causes time complexity issues. For proper data balancing and to overcome the issues in the above techniques, we propose the support vector machine minority oversampling technique (SVM-SMOTE) for better classification performance. SVM-SMOTE is a modified form of SMOTE used for minority class oversampling. In this method, a hyperplane is drawn between the minority and majority classes, and synthetic data is generated on the minority side to obtain a balanced dataset. A clear boundary is thus obtained between the ECP of normal and malicious users, making model learning and future prediction easier. The SVM-SMOTE technique, shown in figure-4.6, is used for balancing the SGCC dataset.
4.3.5 Feature Engineering
The data pre-processed in previous steps are fully prepared to learn ML and DL
models. But the ML and DL have time complexity issues on a big and high-
dimensional dataset. The feature engineering step is performed to reduce the size
of the original dataset and also retain useful information. In the proposed system,
principal component analysis (PCA) is used for feature reduction purpose. The
PCA is used for higher dimensional data reduction into lower dimensions. This is
obtained by forming linear relations among the features using mean and variance.
The reduced features obtained are called components that are independent of
each other. This is due to the fact that PCA finds variance among the features
and forms new components from the correlated features. The features which are
more correlated are stored as individual components. Similarly, the feature with
the highest variance has more information and is stored as the first component.
The second highest variance as the second component, and so on. The overall
PCA-based dimensionality reduction process- [79] is given in Algorithm-2.
Algorithm 2: Dimensionality Reduction Steps in Principal Component Analysis
Input: data Y; Output: reduced data Z
1  Start
2  while the original dataset Y is selected do
3      choose the point of interest
4      find the mean: μ = (Σ_{n=1}^{N} x_n)/n, where x_n are the values from the dataset
5      find the variance: σ² = (Σ_{n=1}^{N} (x_n − μ)²)/n, where n is the number of values and μ the mean of all the values
6      find the eigenvalues λ from Det|A − λI| = 0, where A is the data matrix and I the identity matrix
7      compute the eigenvectors from the eigenvalues: AX = λX, where A is the N-dimensional data and X the N variables in the dataset
8      sort the eigenvectors in descending order of λ: λ_n, λ_{n−1}, λ_{n−2}, ..., λ_2, λ_1
9      form the new matrix W′ from the eigenvectors
10     project: Z = W′Y
11 end
12 End
As the repeated and more related features are summed up as an individual compo-
nent, it also reduces the over-fitting issue in the model. An additional arithmetic
leveraging technique is also applied which results in improved performance in the
proposed model. A set of 300 important features were extracted from the overall
1,034 features in the SGCC dataset. This helped in reducing the execution time.
The main disadvantage of using PCA is that it cannot capture the minimum covariance of the two classes and interprets the output features in such a uniform linear shape that it again leads to a small increase in simulation time. This issue is tackled using an arithmetic leveraging technique, which also enhances the ECP separation.
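The PCA reduction to 300 components can be sketched with scikit-learn (a minimal sketch; the random matrix stands in for the 1,034-day SGCC consumption records):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1034))   # stand-in: 500 consumers x 1,034 days

# keep the 300 directions of highest variance, as in the proposed model
pca = PCA(n_components=300)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
```

The components come out ordered by explained variance, so the first component carries the most information, as described above.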
4.4 Model Selection
In ML and DL, time series and high-dimensional datasets have enormous ECPs, and a single algorithm cannot learn and predict the behavior accurately. Four different ML models are considered in this work as weak learners to capture the ECP of all the customers for better generalization. These learners are LGB, RF, XGB, and ET. The structure and process of all level-0 learners are as follows:
4.4.1 Base Learner-1
Light gradient boosting (LGB) was released by Microsoft in 2017 [80]. LGB is a modified form of the gradient boosting tree algorithm with leaf-wise splitting for higher accuracy. Due to its leaf-wise splitting structure, LGB is useful for complex modeling tasks like time series classification, regression, and ranking.
4.4.2 Base Learner-2
Random forest (RF) is an ensemble ML algorithm used for classification and regression. The algorithm is simple in structure, with many DTs. RF is a strong tool for multi-variable datasets. It is among the most widely used algorithms and can produce good results without hyper-parameter optimization. The basic mechanism of this model is that it uses the bootstrapping phenomenon, where the original dataset is randomly divided into subsets with replacement [81]. These bootstraps are then used for DTs, and each tree makes a prediction. A voting mechanism is performed over these predictions, which gives rise to the final prediction of the RF model. RF can handle large datasets, reduce variance and over-fitting, and show higher accuracy compared to the DT classifier.
The Gini index is a statistical term used to predict the outcome probability of a random forest. Mathematically, the Gini index can be found using equation 4.4 [82]:

$$Gini\_index = 1 - \sum_{i=1}^{c} (p_i)^2 \tag{4.4}$$

Here, c = the number of classes and $p_i$ = the relative frequency of the given class outcome.
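Equation 4.4 can be computed directly from a list of class labels (a minimal sketch; the function name is illustrative):

```python
from collections import Counter

def gini_index(labels) -> float:
    """Gini impurity: 1 minus the sum of squared class frequencies (eq. 4.4)."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

A 50/50 split of two classes gives the maximum binary impurity of 0.5, and a pure node gives 0.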
The pseudo-code of the random forest classifier [83] shows the complete classification process and is given in algorithm-3:
Algorithm 3: Random Forest Classifier
1  Start
2  Original dataset D; output: majority-voted classifier (MVC)
3  Training set: x_j, j = 1, 2, 3, ..., m
4  Testing set: x_k, k = 1, 2, 3, ..., n
5  d = (x_i, y_i), i = 1, 2, 3, 4, ..., N
6  if d = x_j then
7      draw a bootstrap sample from d;
8      make un-pruned trees;
9      select the best features based on the Gini index;
10     split until each tree grows to its maximum;
11 end
12 if trees are formed then
13     test the trees on the testing set x_k;
14     collect the predictions from the trees (P_d);
15     MVC = majority vote of P_d on x_k, k = 1, 2, 3, ..., n;
16 end
17 return MVC.
4.4.3 Base Learner-3
XGB, or regularized gradient boosting, is a sequential tree-based algorithm that focuses on computation speed and model performance. The algorithm uses a Taylor series expansion of the loss function [84]. The model combines weak learners sequentially to improve their learning. A regularization term is included to prune extra leaves and avoid overfitting. The algorithm can be used for both regression and classification tasks and has been designed to work with large and complicated datasets. The pseudo-code [85] is given in algorithm-4.
Algorithm 4: Extreme Gradient Boosting Classifier
1  D is the labeled training data
2  initialize the model with a constant value minimizing Σ L(y_i, γ)
3  for m = 1 to M do
4      compute the pseudo-residuals
5      fit a base learner to the pseudo-residuals:
6          T_m = new DecisionTree()
7          T_m.train(D_m, features)
8      compute the multiplier γ_m
9      update the model
10 end
11 output F_M(x)
4.4.4 Base Learner-4
The extra tree classifier (ETC), also named the extremely randomized tree, is a DT-based bagging technique. It uses the training data to create a large number of random un-pruned trees. In the final step, ETC reduces the model training by selecting a random DT for the best split [86]. Due to the random pruning phenomenon and the absence of an optimum splitting step, ETC has a very short execution time and is applied in this model [87]. The algorithm of the extra tree classifier is given in algorithm-5.
Algorithm 5: Extra Tree Classifier
Input: D; Output: Y
Build_random_tree(LS):
1  if LS contains only samples belonging to the same class then
2      return a leaf labeled with that class
3  else
4      split ← Choose_test_random(LS)
5      divide LS into LS_left and LS_right using the split
6      build Build_random_tree(LS_left) and Build_random_tree(LS_right) from these subsets
7      create a node with the test, attach left and right as successors of this node, and return the resulting tree
Choose_test_random(LS):
8  randomly select a position (attribute)
9  randomly select a threshold
10 find the mean and standard deviation values for the subsets in LS
11 if the score of this test is above a given threshold, return the test
12 otherwise, return to step 8 and select another position
13 if all positions have already been considered, return the best test so far
4.4.5 Stacking Model
The main purpose of building a stacking ML model is to obtain better classi-
fication results, specifically theft detection in a SG. The model produces more
accurate results than the individual classifier. The stacked generalization com-
bines the learning ability of multiple algorithms for optimum accuracy in terms of
classification [88]. The proposed system combines the strengths of all four level-0 classifiers to reduce variance, bias, overfitting, and execution time. The model deals with big data and makes accurate predictions. In the stacking model, the training dataset is fed to the base learners with k-fold cross-validation. The level-0 learners make predictions on the out-of-fold data. In the next step, the predictions from all the base learners are used as features for the level-1 classifier, or meta-classifier. The meta-classifier learns from the predictions of the level-0 learners and predicts the output class. The complete stacking process is shown in Algorithm-6.
Algorithm 6: Proposed Stacking Generalization Technique for Theft Detection
1  Start
2  Input = X; Output = final prediction P_f
3  Original data: X = {(x_r, y_r)}, r = i, j, k, l, m, n, o = 1, ..., N
4  while the original dataset is selected do
5      split the data:
6          training set = {(x_r, y_r)}, r ∈ {i, j, k, l, m}
7          testing set Y = {(x_r, y_r)}, r ∈ {n, o}
8      Level-0 classifier C1:
9          learn C1 on X1 = {(x_r, y_r)}, r ∈ {i, j, k, l}
10         predict C1 on the validation fold V1 = {(x_m, y_m)}
11         output prediction P1
12     Level-0 classifier C2 (once C1 is calculated):
13         learn C2 on X2 = {(x_r, y_r)}, r ∈ {i, j, k, m}
14         predict C2 on the validation fold V2 = {(x_l, y_l)}
15         output prediction P2
16     Level-0 classifier C3 (once C2 is determined):
17         learn C3 on X3 = {(x_r, y_r)}, r ∈ {i, j, l, m}
18         predict C3 on the validation fold V3 = {(x_k, y_k)}
19         output prediction P3
20     Level-0 classifier C4 (once C1, C2, C3 are determined):
21         similarly, learn C4 and make predictions P4
22     Meta classifier M1 (after all level-0 predictions):
23         learn M1 on X = [P1, P2, P3, P4]
24     predict M1 on the testing set Y
25 end
4.5 MLP Mathematical Modeling
An MLP is a useful tool for non-linear data classification. It has three main layers: an input layer, hidden layers, and an output layer. The number of input neurons depends on the input data, the hidden layers are used for weight updating, and the number of output neurons equals the number of classes in the given dataset. The input layer provides a scaled signal to the hidden layers. The weights are real numbers multiplied by the input signals. The hidden layers give the weighted sum of the
given information [89]:

$$y_o = \sum_{i=1}^{n} w_i x_i + b \tag{4.5}$$
The information obtained above is still in linear form. The activation function given below is used to model non-linear data:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4.6}$$
Then the information obtained from the hidden layers can be found using the equation below:

$$y_o = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \tag{4.7}$$

where $y_o$ is the output, $w_i$ is the weight value, $x_i$ is the input data, $b$ is the bias factor, and $f(x)$ is the activation function.
The number of neurons determines the capacity of the hidden layers in the network. If the number of neurons is kept very small, it leads to model under-fitting, while a large number of neurons leads to an over-fitting issue in the model prediction. A default sigmoid activation function is used in the network for non-linear data modeling. The sigmoid activation is bounded between 0 and 1, approaching 0 for negative values and 1 for positive values. The overall equation used for the MLP is given below:

$$y_o = f\left[WO_{mn}\left(\sum_{i=1}^{n} WI_{ij}\,x_i + b_1\right) + b_2\right] \tag{4.8}$$
where $WI_{ij}$ is the weight of the input layer, $WO_{mn}$ is the weight of the output layer, $b_1$ is the input bias factor, and $b_2$ is the bias in the output layer.
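Equations 4.6 and 4.8 can be traced in a small NumPy forward pass (a minimal sketch; the layer sizes and random weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation of eq. 4.6."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, WI, b1, WO, b2):
    """One forward pass of eq. 4.8: y_o = f[WO (sum WI x + b1) + b2]."""
    hidden = WI @ x + b1              # weighted sum from the input layer
    return sigmoid(WO @ hidden + b2)  # squashed output in (0, 1)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                  # 4 input features
WI, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # 8 hidden neurons
WO, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)    # single output node
y_o = mlp_forward(x, WI, b1, WO, b2)
print(y_o)   # a value strictly between 0 and 1
```

Because the sigmoid output lies in (0, 1), it can be read as the probability of the theft class in a binary setting.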
4.6 Performance Metrics
For classification problems, various performance parameters are used to evaluate
the final output and performance of the model like confusion matrix, F1-Score,
Area Under the Curve, Precision, Recall, Receiver Operating Curve, and Accuracy.
These parameters are helpful in checking the overall performance of a model. The
parameters are explained with respective mathematical forms in the following
paragraphs [36].
1. Confusion Matrix
In ML, a confusion matrix is used to measure classification performance. It is an N × N matrix, where N is the number of classes in a given dataset. The matrix has two dimensions: actual class and predicted class. In our SGCC dataset, we have a binary (2-class) classification: normal (0) and malicious (1). So the confusion matrix is a 2 × 2 matrix [90] and has the following four types of outputs:
(a) True Positive (TP): an actual positive class (1) value that is predicted as positive (1) by the classifier.
(b) True Negative (TN): a negative class (0) data point that is predicted as negative (0) by the model.
(c) False Positive (FP): a negative class (0) value that the model classifies as positive (1).
(d) False Negative (FN): a positive class (1) value that is predicted as negative class (0) by the ML model.
2. Accuracy
In ML, accuracy is used to measure the overall performance of a model on a given dataset. It measures how much of the data is classified correctly. Considering the confusion matrix, it is the number of correctly classified data points divided by all the data points predicted by a given ML model. Mathematically, accuracy is calculated using equation 4.9 [90]:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{4.9}$$
3. Precision
Precision is the portion of predicted-positive data points that are correctly classified, i.e., of all the values the model predicted as positive, how many actually belong to the positive class. Precision is also referred to as the positive predictive value. The mathematical formula given in equation 4.10 is used to find the precision [90]:

$$Precision(P) = \frac{TP}{TP + FP} \tag{4.10}$$
4. Recall
The recall represents the number of positively classified data points out of all actual positives. It is the portion of actual positive values that the model correctly classifies as positive [90]. The recall is also called sensitivity and has the following formula, equation 4.11:

$$Recall(R) = \frac{TP}{TP + FN} \tag{4.11}$$
5. F1-Score
In the classification cases, the main aim is to obtain the best value for preci-
sion and recall. F1-Score is the measure used to find the best classification
values in terms of precision and recall [90]. Mathematically, it is the harmonic mean of precision and recall, as given in equation 4.12:

$$F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{4.12}$$
6. Area Under the Curve
AUC is the total area covered by the ROC curve, i.e., the total region lying under the ROC curve. A higher AUC indicates a better-performing model across all thresholds [90].
7. Receiver Operating Characteristics
The ROC curve shows the TP predicted values at different thresholds with respect to the FP points. The ROC is important when dealing with imbalanced datasets. More specifically, it is the graph of the true positive rate (TPR) against the false positive rate (FPR) [90]. Equation 4.13 is used to find the TPR:

$$TPR = \frac{TP}{TP + FN} \tag{4.13}$$
While to find the FPR, equation 4.14 is used:

$$FPR = \frac{FP}{FP + TN} \tag{4.14}$$
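Equations 4.9 through 4.14 can be computed directly from the four confusion-matrix counts (a minimal sketch; the function name is illustrative, and scikit-learn's metrics module offers the same quantities):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and FPR from confusion-matrix
    counts (equations 4.9-4.12 and 4.14)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also the TPR of eq. 4.13
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)           # eq. 4.14
    return accuracy, precision, recall, f1, fpr
```

For example, counts of TP = 50, TN = 40, FP = 5, FN = 5 give an accuracy of 0.9.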
Results and Simulation
4.7 Simulation Setup
In this section, we split the prepared and reduced dataset into training and testing subsets. The processed data obtained from the above steps are split 80:20 for model training and testing. The final results are shown in the next section.
4.8 Results Discussion and Evaluation
The results obtained are evaluated in the form of important performance metrics
required for classification purposes. The experimental results obtained after the
model simulations are discussed as follows. Accuracy is a general classification
term that may not be a good metric for classification. Therefore, a combination
of different performance metrics is used for good classification. Figure-4.7 shows
the training and testing accuracy, F1-score, and AUC of the proposed model.
(Training accuracy = 0.9979, testing accuracy = 0.9777, AUC = 0.9777, F1-score = 0.9775.)
Figure 4.7: Proposed-Model Performance on SGCC Dataset.
Model + Applied Technique (s) Accuracy in % FPR in % FNR in % Time (s)
PCA + SMOTE 95.95 1.31 2.73 1320
PCA + SVM-SMOTE 96.33 1.09 2.57 1540
Z-Score-Capping + PCA + SVM-SMOTE 96.27 1.12 2.61 1220
Z-Score-Capping + PCA(Features=200) + Arithmetic-Leveraging +SVM-SMOTE 97.3 0.60 2.04 1850
Z-Score-Capping + PCA(Features=400) + Arithmetic-Leveraging +SVM-SMOTE 97.29 0.62 2.09 2520
Z-Score-Capping + PCA(Features=300) + Arithmetic-Leveraging + SVM-SMOTE 97.69 0.60 1.82 2070
Table 4.2: Output Performance of Different Models on pre-processed data.
The confusion matrix obtained in figure-4.8 shows the final prediction of the proposed model in terms of TP, TN, FP, and FN. As seen from the figure, the proposed model has a high detection rate for both normal and theft class prediction. The values of FP and FN represent wrongly classified users. A reduction in these parameters is obtained, with the misclassification rates of FP and FN at 0.54% and 1.69%, respectively. This achieves our proposed objective of very low FPR and FNR.
Figure 4.8: Proposed-Model Confusion Matrix on SGCC Dataset (predicted vs. actual values).
The precision and recall values are usually presented in a single relationship. This combined precision-recall curve (PRC) is obtained for different threshold values. High precision indicates a low FP value, and high recall represents a lower value of FN. The PRC, shown in figure-4.9, obtained using the proposed model shows that both precision and recall have a high value of 97.1%.
Data Splitting Train/Test Size = 80:20 Train/Test Size = 75:25 Train/Test Size = 70: 30
Model Tr. Acc Tes. Acc AUC F1-Score Tr. Acc Tes. Acc AUC F1-Score Tr. Acc Tes. Acc AUC F1-Score
LGBM 0.9478 0.9329 0.9330 0.9315 0.9487 0.9293 0.9292 0.9275 0.9480 0.9309 0.9308 0.9291
RF 0.9940 0.9493 0.9494 0.9488 0.9943 0.9487 0.9487 0.9483 0.9937 0.9444 0.9443 0.9436
XGBoost 0.8329 0.8239 0.8240 0.8168 0.8353 0.8240 0.8239 0.8197 0.8334 0.8304 0.8306 0.8242
ET 0.9992 0.8252 0.8252 0.8319 0.9984 0.8044 0.8046 0.8131 0.9993 0.8069 0.8070 0.8151
Proposed-Model 0.9978 0.9769 0.9769 0.9766 0.9974 0.9754 0.9753 0.9749 0.9982 0.9743 0.9743 0.9739
Table 4.3: Models Performance on Different Data Splitting.
(PRC: ET = 0.7508, XGB = 0.7873, LGBM = 0.9157, RF = 0.9302, Proposed-Model = 0.9725.)
Figure 4.9: Precision-Recall Curve of the Base Models and Proposed-Model.
In ML, accuracy shows how many of the data points are correctly classified out of the total predicted points. This clarifies how many of the users are predicted as malicious and how many as normal by the ML model.
Figure-4.10 shows a bar chart with the accuracy values of all base models and the proposed model. The proposed model achieved a high accuracy of 97.7% compared to the level-0 models. The values are given in table-4.4.
(Accuracy: ET = 0.8163, XGB = 0.8346, LGB = 0.9354, RF = 0.9492, Proposed Model = 0.9777.)
Figure 4.10: Accuracy Comparison of Level-0 and Proposed-Model.
The ROC plots the TPR against the FPR at different thresholds. A high ROC value shows good positive class prediction ability. Figure-4.11 shows the ROC-AUC value of the proposed model to be 97.77%.
(ROC-AUC: ET = 0.8163, XGB = 0.8346, LGB = 0.9354, RF = 0.9492, Proposed-Model = 0.9777.)
Figure 4.11: Comparison of Base Models' ROC with Proposed-Model ROC.
Model F1-Score AUC Accuracy
XGBoost 0.8168 0.8240 0.8239
ET 0.8319 0.8252 0.8252
LGBM 0.9315 0.9330 0.9329
RF 0.9488 0.9494 0.9493
Proposed-Model 0.9766 0.9769 0.9769
Table 4.4: F1-Score, AUC, and Accuracy of the Base Models and Proposed Model.
In ML, the ROC is a 2-dimensional curve with the TPR on the y-axis and the FPR on the x-axis; the AUC aggregates the TPR and FPR values over all given thresholds. A high value of AUC suggests better prediction of the positive class, which is electricity theft in our case.
Figure-4.12 presents the AUC, F1-score, and accuracy values of all the level-0 models and the proposed model. The results given in table-4.4 also show the higher performance of the proposed model compared to the base models.
Figure 4.12: Comparison of AUC, F1-Score and Accuracy of Base Models and Proposed Model.
4.9 Summary
Due to issues in single algorithm implementation, an ensemble stack generalization
approach is proposed in this work. The data obtained need some pre-processing
for better classification. The pre-processing steps adopted in this work are missing
data imputation, data normalization, outliers removal, and class balancing. Fea-
ture reduction is done by principal component analysis. The stacking model consists of the level-0 classifiers RF, ET, LGB, and XGB, with an MLP as the level-1 classifier. The classification metrics are explained with actual definitions and mathematical forms
to address the ETD issues to an optimum level. Further to this, a simulation
setup is fully defined including training, testing data, and machine specifications.
The results obtained are visualized with proper plots, and each plot is described with proper reasoning. After the final evaluation, it is found that the problems focused on in this work are fully addressed and that the proposed model outperforms the existing techniques.
Chapter 5
Proposed Model-2 and Simulation Results
5.1 Classification Algorithms
We discussed and implemented 15 types of classification algorithms. The detail of
each classifier is given below:
5.1.1 Decision Tree (DT)
Decision trees (DT) are supervised ML techniques that use a tree structure resem-
bling a flowchart to represent events, results, and predictions. The decision tree’s
root node is the first segment, which includes the complete dataset. Decision trees
work on any numerical dataset and do not need to operate with continuous vari-
ables, in contrast to neural networks and regression [91]. Decision trees are data
structures in which each leaf node denotes an outcome and each branch represents
a decision rule or feature that points to a certain class label. Decision trees are
commonly used to find the solution to regression and classification issues. Tree
models are used in classification problems to label or categorize an entity using
target variables with discrete values. Decision trees using a predictive modeling
approach are extensively employed in ML and data mining [92].
Good decision trees address vital variables, such as deciding upon the features to
split, the values of feature split, and the point at which you should stop splitting.
Gini index: This metric measures the classification error in a likelihood manner. The Gini index is used as an objective function in classification and is given by the formula:

$$Gini\ Index = GI = \sum_{i=1}^{C} p_i(1 - p_i) \tag{5.1}$$

where $p_i$ = the probability of an object being classified into class i.
The information gain metric shows correct classification and is inversely related to the GI and entropy. The splitting process in a DT uses entropy or the Gini index as the splitting criterion, which reduces the error in feature splitting. Information gain is given by the formula:

$$Info\_Gain = IG = E_{parent} - E_{children} \tag{5.2}$$

where $E_{parent}$ = the entropy before splitting and $E_{children}$ = the entropy after splitting.
Pruning practices reduce the overfitting factor by eliminating tree sections with
low predictive power. This simplifies the decision tree by eliminating the weak or
not-so-relevant rules. This can be achieved in two ways:
1. Reduce the decision tree’s maximum depth,
2. And set a minimum sample size, required, for each decision space.
The complete process of the decision tree is given in algorithm-7 below [93].
Algorithm 7: Decision Tree Classifier
Input:
    A. Labeled training dataset
    B. List of attributes
    C. Splitting method based on the given attributes
Output: A decision tree
Method:
1  Select a node P
2  if all the data belong to the same class C then
3      output P as a leaf node labeled with class C
4  if the list of attributes is empty then
5      output P as a terminal node labeled with the majority class D
6  Find the best splitting point based on D
7  Use the splitting criterion to label node P
8  Find the entropy at each node
9  Find the information gain at each node
10 Label node P with the class of maximum information gain
11 Output the tree structure
12 return P
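The two pruning rules above map directly onto hyper-parameters of a library decision tree; a minimal sketch with scikit-learn (the dataset and the parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# criterion="gini" uses the impurity of eq. 5.1; max_depth and
# min_samples_leaf implement the two pruning rules described above
tree = DecisionTreeClassifier(criterion="gini",
                              max_depth=4,          # 1. cap the tree depth
                              min_samples_leaf=10,  # 2. minimum samples per leaf
                              random_state=0)
tree.fit(X, y)
print("depth:", tree.get_depth(), "accuracy:", tree.score(X, y))
```

Tightening either parameter simplifies the tree, trading a little training accuracy for less overfitting.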
5.1.2 Logistic Regression
Logistic regression is a classification method that shows a relationship between
continuous input variables and a categorical output. As opposed to standard
linear regression, logistic regression modeling is unique. The response variable Y
is discrete in logistic regression as opposed to continuous. Logistic regression uses
the logit function, given in Eq. 5.3, which makes it different from linear regression.
$$Logit\ Function = \frac{1}{1 + e^{-value}} \tag{5.3}$$

where e is the base of the natural logarithm and 'value' is the actual numerical value to be transformed.
Logistic regression is a non-linear model used for binary classification, in contrast to the polytomous regression model, which makes multi-class predictions [94].
The theft detection process in a power system is a binary classification problem
[95]. Let y be the observation of a sample, y=1 and y=0, representing energy
theft and non-theft, respectively. Let x be input features from users’ data, theft
probability can be expressed using the logit function, given in Eq. 5.4:
h_θ(x) = P(y = 1|x) = 1 / (1 + e^(−g(θ;x))) (5.4)
where θ denotes the model parameters to be calculated through training and g(θ;x) is the classification boundary. Thus, the likelihood of no theft occurring
can be expressed as follows:
P(y = 0|x) = 1 − h_θ(x) = 1 / (1 + e^(g(θ;x))) (5.5)
Suppose there are N samples with observations y_1, y_2, ..., y_i, ..., y_N and corresponding feature vectors x_1, x_2, ..., x_i, ..., x_N. From the given equations, the likelihood of observation y_i can be expressed as:
P(y_i|x_i) = h_θ(x_i)^(y_i) [1 − h_θ(x_i)]^(1−y_i) (5.6)
Assuming independent instances in the given dataset, we can adjust model pa-
rameters θbased on the maximum likelihood estimation, which is described as
follows.
L(θ) = Π_(i=1)^N P(y_i|x_i) = Π_(i=1)^N h_θ(x_i)^(y_i) [1 − h_θ(x_i)]^(1−y_i) (5.7)
Its logarithmic form is:
ln L(θ) = Σ_(i=1)^N [y_i ln h_θ(x_i) + (1 − y_i) ln(1 − h_θ(x_i))] (5.8)
Thus, the parameter θ in the equation above can be obtained by an optimization method [96]. The logistic regression algorithm from [97] is shown in Algorithm 8:
Algorithm 8: Logistic Regression
Input: Data D
Output: Class C
1. For m = 1 to S (training set S):
2.   For every training instance d_n:
3.     Compute the regression value
         z_n = (y_n − P(1|d_n)) / (P(1|d_n)(1 − P(1|d_n)))
4.   Set the weight of each d_n to P(1|d_n)(1 − P(1|d_n))
5. Output: label the class C1 if P(1|d_n) > 0.5,
6. otherwise the class C2.
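A minimal NumPy sketch of Eqs. 5.4-5.8 is given below: the log-likelihood of Eq. 5.8 is maximized by gradient ascent with a linear boundary g(θ; x) = θ·x. The two-cluster data is synthetic and purely illustrative, not the thesis dataset:

```python
import numpy as np

def sigmoid(z):
    """Logit/sigmoid function of Eq. 5.3."""
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic consumption features: honest users (y=0) vs. theft (y=1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(3.0, 1.0, (50, 2))])
y = np.hstack([np.zeros(50), np.ones(50)])
Xb = np.hstack([np.ones((100, 1)), X])        # bias column

theta = np.zeros(3)
eta = 0.1
for _ in range(500):                           # gradient ascent on ln L(theta)
    grad = Xb.T @ (y - sigmoid(Xb @ theta))    # gradient of Eq. 5.8
    theta += eta * grad / len(y)

accuracy = ((sigmoid(Xb @ theta) > 0.5).astype(float) == y).mean()
```

On this well-separated toy data the fitted boundary recovers the two groups almost perfectly.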
5.1.3 K Nearest Neighbors Classifier
The K-nearest neighbor (KNN) classifier is an important algorithm used for classi-
fication, pattern recognition, and other ML tasks. KNN has been regarded among
the top mining tools because of its simplicity, efficacy, and ease of application in
KNN-based categorization. As a result, numerous practical classification tasks
benefit from the application of KNN-based classification algorithms. The major-
ity of KNN variants effectively decide the target group for the new samples by
utilizing a majority voting mechanism around k’s nearest neighbors. However,
such a classification easily changes with k. It can even make the sensitivity of k
in KNN-based classification worse, particularly in scenarios with small numbers
of minority samples and outliers [98].
The KNN stores all available class samples and then uses a distance function for
prediction. KNN is a lazy learner and incurs much less computational time than
SVM and logistic regression, as it does not require training time for prediction
purposes. The commonly used distance function is the Euclidean distance, given in Eq. 5.9.
Euclidean Distance = √(Σ_(i=1)^k (a_i − b_i)²) (5.9)
The above formula is used for continuous data; when the input is categorical, the Hamming distance is used instead, given in Eq. 5.10.
D_H = Σ_(i=1)^k |a_i − b_i| (5.10)
The given distance functions are used in the classification model with k number
of nearest values. KNN has been used as a non-parametric approach since the
1970s [99]. Algorithm 9 shows the complete procedure of KNN [100].
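The distance-plus-majority-vote procedure of Eq. 5.9 can be sketched as follows (NumPy assumed; the six training points are hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest neighbours (Euclidean distance, Eq. 5.9)."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
label = knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3)
```

Note that no training happens before the query; all work is deferred to prediction time, which is what makes KNN a lazy learner.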
5.1.4 Bernoulli Naive Bayes Classifier
The Bernoulli Naive Bayes classifier (NBC) works on a probability rule called Bayes' theorem. NBC operates on the assumption that each of the identified features is independent of the others [101]. The mathematical form of the Bayes rule (Eq. 5.11) is stated as:
P(A|B) = P(A) P(B|A) / P(B) (5.11)
Algorithm 9: KNN Algorithm
1. Input:
   a. The input data D,
   b. Prediction set x,
   c. Class label set C
2. Output: The class c_x of prediction set x, with c_x from class set C
3. Start:
   3.1 For each y from data D do
   3.2   Find the distance D(y, x) from y to x
       end for
4. Choose a subset N of the set D,
5. where N holds the k nearest neighbors of the test point x
6. Determine the class of x:
7. c_x = argmax_c Σ_(y∈N) I(c = class(y))
End
where features A and B are considered independent of each other. P(A|B) is the probability of A given that event B has occurred, P(A) is the probability of feature A, and P(B) is the probability of event B.
Each attribute feature has the same impact on the classification outcome, accord-
ing to the independence feature; however, the information value demonstrates how
each feature affects the outcome. Weight coefficients are determined for each fea-
ture, and a weighted Naive Bayes model is created in order to adhere as closely
as possible to the assumptions of Naive Bayes. The model’s projected class is
chosen based on the category with the highest posterior probability [102]. Algorithm 10, from [103], shows the complete procedure of the NBC.
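As a tiny numeric illustration of the Bayes rule in Eq. 5.11, consider the probability that a metered user is a thief given an abnormal-reading flag; all the numbers below are invented for illustration only:

```python
# Bayes rule of Eq. 5.11: P(theft | flag) = P(theft) * P(flag | theft) / P(flag)
p_theft = 0.05            # assumed prior probability of theft
p_flag_theft = 0.90       # assumed P(abnormal-reading flag | theft)
p_flag_honest = 0.10      # assumed false-alarm rate for honest users

# total probability of observing a flag
p_flag = p_flag_theft * p_theft + p_flag_honest * (1 - p_theft)
posterior = p_theft * p_flag_theft / p_flag   # about 0.32
```

Even with a reliable flag, a low prior keeps the posterior well below certainty, which is why posterior probabilities rather than raw likelihoods drive the classification.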
5.1.5 Perceptron
Frank Rosenblatt introduced "The Perceptron: A Perceiving and Recognizing Automaton" in 1957 as a class of artificial neural networks that embodied features of the brain. The classification of linearly separable patterns can be addressed using the perceptron [104].
For supervised classification, the perceptron algorithm works well. A set of input
vectors X, with each vector having a predetermined categorization, is accepted as
Algorithm 10: Naive Bayes Algorithm
Input:
  1. Training set S
  2. P = (p_1, p_2, p_3, ..., p_n), the predictions on the testing set
  3. Testing data
Steps:
  1. From the training set S,
  2. Find the mean and standard deviation for each data point;
  3. Repeat:
       determine the probability for each p_i using the Gaussian density formula for the class,
     until the probability of all predictions (p_1, p_2, p_3, ..., p_n) has been found.
  4. Find the likelihood of each class;
  5. Choose the highest likelihood.
End
training data by the learning algorithm. The specified classification is regarded as the desired output. The procedure uses this training data, together with the attached classification information, to identify users. Following convergence, the algorithm produces a separating hyperplane. To achieve this, the bias and weight vectors are adjusted until each training element is accurately classified by the algorithm as either 1 or 0. The algorithm also uses η as a learning rate for weight updates [105].
y = w_0 + Σ_(i=1)^n W_i X_i (5.12)
where W_i are the weights and X_i is the input data. The perceptron shows good results for binary classification using the step function given in Eq. 5.13.
f(x_i) = 1, if w_0 + Σ_(i=1)^n W_i X_i > 0
        −1, otherwise (5.13)
The perceptron neural network structure is straightforward, as is its basic working
Algorithm 11: Single Layer Perceptron
1. Input: Training dataset T, training set X,
   learning rate η, chosen max_Epoch
2. Output: Weight vector W and bias
3. Start the algorithm with arbitrary W:
   set W[1] = 1.0 and W[2] = 1.0, bias = 0, epoch = 0, accuracy = 0
4. While (epoch < max_Epoch and accuracy < 1) do
5.   For all vectors v_x in set X do
6.     y ← sgn((W[1] · v_x.X) + (W[2] · v_x.Y) + bias)
7.     If (y ≠ v_x.Class) then
8.       Update for the next step:
9.       bias ← bias + (η · v_x.Class)
10.      W[1] ← W[1] + (η · v_x.Class · v_x.X)
11.      W[2] ← W[2] + (η · v_x.Class · v_x.Y)
       end if
12.  Determine the accuracy and increase the epoch:
13.  accuracy ← FindAccuracy(X, W, bias); epoch++
Stop.
concept. In order to track the output behavior (using an activation function) and
ultimately make a comparison with the desired output, each input is individually
weighted by a certain value before being added up to create the final output.
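The weighted-sum and mistake-driven update described above can be sketched as a short training loop (NumPy assumed; the toy data below is linearly separable by construction and purely illustrative):

```python
import numpy as np

# Mistake-driven perceptron updates: w <- w + eta*y*x, bias <- bias + eta*y
X = np.array([[2.0, 1.0], [1.5, 2.0], [2.5, 1.5],
              [-1.0, -2.0], [-2.0, -1.0], [-1.5, -1.5]])
y = np.array([1, 1, 1, -1, -1, -1])       # linearly separable toy labels

w = np.zeros(2)
bias = 0.0
eta = 0.1
for _ in range(20):                        # epochs
    for xi, yi in zip(X, y):
        if yi * (w @ xi + bias) <= 0:      # misclassified (or on the boundary)
            w += eta * yi * xi
            bias += eta * yi

pred = np.sign(X @ w + bias)
```

On separable data this loop provably stops updating once every training element sits on the correct side of the hyperplane.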
5.1.6 Linear Discriminant Analysis
First proposed by R. A. Fisher, linear discriminant analysis (LDA) is one of the
most straightforward methods used for classification jobs. Using a multivariate
classifier called linear discriminant analysis (LDA), it is possible to assign samples
to one of N classes by discovering some statistical aspects of the data: the data
covariance matrix, the weight of the class within the training samples, and the
mean of each class and how closely it resembles the sample [106], using Eq. 5.14.
S_b = Σ_(i=1)^g N_i (x̄_i − µ)(x̄_i − µ)^T (5.14)
where µ is the overall mean, N_i is the sample size of class i, and x̄_i is the mean of class i.
In practice, however, the population covariance matrix and the class means are not fully known. It is standard procedure to replace them in the LDA discriminant score with sample estimates calculated from the training data. If the number of training samples is sufficient in comparison to the number of features, this should not have a significant impact on performance. In a high-dimensional dataset, however, estimating the covariance matrix is highly inaccurate [107].
The sample covariance matrix cannot be used as a plug-in estimator in some
extreme cases where the sample size is less than the number of features since
doing so would require computing the inverse of the sample covariance matrix,
which is necessary to calculate the discriminant score of the LDA. The complete
LDA process is shown in Algorithm 12.
Algorithm 12: Linear Discriminant Analysis
1. Compute the class mean matrix M (k × p) and the within-class covariance matrix W (p × p), where
   W = Σ_(k=1)^K Σ_(i∈k) (x_i − µ_k)(x_i − µ_k)^T
2. Eigen-decompose W.
3. Sphere the means: M* = M W^(−1/2).
4. Compute B* = Σ_(k=1)^K (µ*_k − µ*)(µ*_k − µ*)^T.
5. PCA: obtain the L eigenvectors V*_l of B* = V* D_B V*^T corresponding to the L largest eigenvalues.
6. These define the coordinates of the optimal subspace.
7. Obtain L new (discriminant) variables
   Z_l = (W^(−1/2) V*_l)^T X, for l = 1, 2, ..., L.
8. Output the new classified components.
Using this approach, we reduce the dataset by mapping the data X to Z, moving from p features to L features. The preceding LDA steps are then repeated for classification purposes [108].
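A two-class sketch of the LDA idea, pooling the within-class scatter W and projecting onto its solution direction, might look as follows (NumPy assumed; the two Gaussian clusters are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, (100, 2))        # class 0 cluster
X1 = rng.normal(4.0, 1.0, (100, 2))        # class 1 cluster
X = np.vstack([X0, X1])
y = np.hstack([np.zeros(100), np.ones(100)])

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
# pooled within-class scatter (the W of Algorithm 12, up to scaling)
Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
w = np.linalg.solve(Sw, m1 - m0)           # discriminant direction
threshold = w @ (m0 + m1) / 2              # midpoint between projected means
pred = (X @ w > threshold).astype(float)
accuracy = (pred == y).mean()
```

Because both classes share the pooled scatter estimate, the resulting decision boundary is linear, which is exactly what distinguishes LDA from QDA later in this chapter.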
5.1.7 Passive Aggressive Classifier
The Passive Aggressive (PA) Algorithms are a group of online learning algorithms
that Crammer et al. suggested (for both classification and regression). Online
learning frequently employs the PA algorithm, which is a margin-based learning
strategy. On the one hand, the online PA algorithm updates the pre-existing
classifier by updating the weight vector to correctly categorize the current case.
On the other hand, the new classifier must adhere to the old classifier as closely
as possible.
The passive aggressive algorithm can be used as an alternative to the perceptron to
overcome its issues. The PA algorithm has been proven superior to many other
alternative methods like Online Perceptron. The high performance is due to the
penalty rule used in PA. During prediction, PA penalizes the event by 1 if it is
false and no penalization for correct detection. The cost function used in the PA
algorithm is referred to as 0-1 loss [109].
The PA algorithm also uses stochastic gradient descent for optimization. This optimization addresses the hinge loss function (Eq. 5.15) in the PA algorithm, given below, where θ is the classifier, x the feature vector, and y the target variable.
Loss_hinge(θ, x, y) = max(1 − y(θ·x), 0) (5.15)
The hinge loss is continuous and differentiable everywhere except at the single point y(θ·x) = 1, which can be assessed by taking a subgradient. When y(θ·x) ≠ 1, the gradient can be found as:
∂_θ Loss_hinge(θ, x, y) = −yx, if y(θ·x) < 1
                          0, otherwise (5.16)
Hinge-loss and margin (distance between separator and data points) are inversely
related. The margin used below is also called a discriminative function that gives
an output score. Whenever the margin exceeds 1, the loss becomes zero; otherwise it is the difference between one and the margin. Consequently, the passive aggressive algorithm's objective is to identify the next θ, θ^(k+1), that minimizes:
(λ/2) ‖θ^(k+1) − θ^k‖² + Loss_hinge(θ, x, y) (5.17)
Now we will see where the term passive-aggressive got its name. It is due to the
fact that the PA algorithm is passive for correct classification and aggressive for
false prediction. Also from the equation given below, the update step is:
θ^(k+1) = θ^k − η ∇Loss_hinge(θ, x, y) = θ^k + η y x (5.18)
In the final step, the output is obtained using the equation:
y_t = sign(θ_updated · x) (5.19)
The complete procedure of passive aggressive classifier from [110], is shown in
algorithm-13
Algorithm 13: Passive Aggressive Classifier
1. Input: Aggressiveness factor C > 0
2. Start: Initialize the weights w_1 = (0, ..., 0)
3. For all m = 1, 2, ...
4.   Receive the instance x_m ∈ R^n
5.   Estimate: ŷ_m = sign(w_m · x_m)
6.   Receive the true label y_m ∈ {−1, 1}
7.   Suffer loss: l_m = max(0, 1 − y_m (w_m · x_m))
8.   Update: set
     τ_m = l_m / ‖x_m‖²  (PA)
     τ_m = min{C, l_m / ‖x_m‖²}  (PA-I)
     τ_m = l_m / (‖x_m‖² + 1/(2C))  (PA-II)
9.   Update: w_(m+1) = w_m + τ_m y_m x_m
10. Output the class of the new data points
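A single PA-I update step (τ_m = min{C, l_m/‖x_m‖²}) can be sketched as below; the four-instance stream is hypothetical and chosen only to show one aggressive and several passive rounds:

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """One PA-I step: tau = min(C, loss/||x||^2), then w <- w + tau*y*x."""
    loss = max(0.0, 1.0 - y * (w @ x))     # hinge loss of Eq. 5.15
    tau = min(C, loss / (x @ x))
    return w + tau * y * x

w = np.zeros(2)
stream = [(np.array([1.0, 0.5]), 1), (np.array([-1.0, -1.0]), -1),
          (np.array([0.8, 0.3]), 1), (np.array([-0.5, -0.9]), -1)]
for x, y in stream:                         # one online pass over the stream
    w = pa1_update(w, x, y)

pred = [int(np.sign(w @ x)) for x, _ in stream]
```

When an instance already has margin at least 1, the loss and therefore τ are zero and the weights are left untouched; that is the "passive" half of the name.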
5.1.8 Stochastic Gradient Descent
Modern machine learning relies heavily on stochastic gradient descent (SGD).
SGD refines a function by taking smaller steps along noisy gradients. Robbins and
Monro’s (1951) classic finding is that this process provably achieves the function’s
optimum (or local optimum, when it is nonconvex). Recent research examines the
benefits of constant step sizes, gradient or iterative averaging, and adaptive step
sizes. Stochastic gradient descent belongs to the family of gradient descent techniques; however, the cost function can be updated after each cycle using just a few random data points. To update the solution, SGD merely requires a few random samples of the data. Each training sample x^(i) and label y^(i) is used in the SGD update [111].
θ = θ − η · ∇_θ J(θ; x^(i); y^(i)) (5.20)
To update the solution, this is the same as drawing sub-matrices of the data matrix. Because each
SGD iteration is small and simple to compute, this method can run faster than
traditional descent methods, even if the update direction is not always optimal
[112].
SGD addresses the problem of high computational cost by offering substantially faster convergence per iteration. The only thing that makes SGD different from other methods is the quantity of data required to calculate the objective function's gradient. The accuracy of weight updates and the update time are traded off based on the amount of data used [113].
Algorithm 14: Stochastic Gradient Descent
Start:
1. Initialize η and W_0
2. For s = 1, 2, 3, ..., S
3.   For i ∈ (1, 2, ..., S)
4.     Select one sample randomly at a time
5.     W_s = W_(s−1) − η ∇E(W_(s−1), x_i, y_i)
6.   Track the changes:
7.   E_s = 0
8.   For all j = 1, 2, 3, ..., S
9.     E_s = E_s + E(W_s, x_j, y_j)
10.  E_s = E_s / S
11.  s = s + 1
12.  Check the stopping criterion
End
The pseudo-code of SGD (Algorithm 14) has only two main steps: individual gradient computation and weight update. Instead of calculating a genuine gradient, it calculates the gradient of a sample randomly chosen during each iteration. The method terminates when the required conditions are met, which is again the maximum number of iterations or before the system starts to overfit.
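The two main steps, per-sample gradient and weight update (Eq. 5.20), can be sketched for a squared-error objective (NumPy assumed; the noiseless linear data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, (200, 1))
y = 3.0 * X[:, 0] + 0.5                    # noiseless line: slope 3, intercept 0.5

theta = np.zeros(2)                        # [intercept, slope]
eta = 0.1
for _ in range(30):                        # epochs
    for i in rng.permutation(len(y)):      # one random sample per step (Eq. 5.20)
        xi = np.array([1.0, X[i, 0]])
        err = theta @ xi - y[i]            # gradient of the squared loss is err*xi
        theta -= eta * err * xi
```

Each step touches a single randomly chosen sample, which is exactly what keeps the per-iteration cost small compared with full-batch gradient descent.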
5.1.9 Gaussian Naive Bayes
The Gaussian Naive Bayes (GNB) probabilistic classifier relies on the Bayes theo-
rem that makes the strong (naive) assumption that each feature is independent of
all others. Assuming a Gaussian distribution for attribute values given the class
label, GNB classification is an instance of the naive Bayes approach [114]. For example, given the class label c, the ith attribute of the dataset has its mean and variance represented by µ_(c,i) and σ²_(c,i), respectively.
p(x_i|c) = (1 / √(2π σ²_(c,i))) exp(−(x_i − µ_(c,i))² / (2σ²_(c,i))) (5.21)
Eq. 5.21 above gives the likelihood of the value x_i given class c, which is also called a normal distribution.
The average µ (Eq. 5.22) and the standard deviation δ (Eq. 5.23) are given by:
µ = (Σ_(i=1)^n x_i) / n (5.22)
δ² = (Σ_(i=1)^n (x_i − µ)²) / (n − 1) (5.23)
The GNB algorithm is relatively straightforward, easy to use, and doesn’t need a
lot of training data. It can deal with missing data very effectively, is not sensitive
to irrelevant features, and scales linearly with the number of features and data
points. The GNB algorithm’s reliance on predictor independence is a significant
flaw [115].
The pseudo-code from [116], given in Algorithm 15, shows the complete process of GNB-based classification.
Algorithm 15: Gaussian Naive Bayes Algorithm
Start:
1. Read the training data
2. Divide the data into classes C
3. For each class C do
4.   Find the attributes
5.   Find the mean and standard deviation
6.   Use the Gaussian function to calculate probabilities
7.   Choose the class with the highest probability value
8.   Verify how well the predicted class matches the actual class
9. Return the accuracy value
End
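The per-feature Gaussian likelihood of Eq. 5.21, combined under the independence assumption, can be sketched as follows (equal class priors are assumed for simplicity; the two synthetic clusters are illustrative only):

```python
import numpy as np

def gaussian_logpdf(x, mu, var):
    """Log of the per-feature likelihood in Eq. 5.21."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

rng = np.random.default_rng(3)
X0 = rng.normal(0.0, 1.0, (80, 2))          # class 0 samples
X1 = rng.normal(3.0, 1.0, (80, 2))          # class 1 samples
stats = {0: (X0.mean(axis=0), X0.var(axis=0, ddof=1)),
         1: (X1.mean(axis=0), X1.var(axis=0, ddof=1))}

def predict(x):
    # independence: sum the per-feature log-likelihoods (equal priors assumed)
    scores = {c: gaussian_logpdf(x, mu, var).sum()
              for c, (mu, var) in stats.items()}
    return max(scores, key=scores.get)

label = predict(np.array([2.8, 3.2]))
```

Working in log space turns the product of per-feature densities into a sum, which avoids numerical underflow for many features.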
5.1.10 Multinomial Naive Bayes Algorithm
The Multinomial Naive Bayes (MNB) algorithm is simple, easy to use, and doesn’t
require a lot of training data. It scales linearly with the number of features and
data points and excels at handling missing data. An important problem in the
MNB algorithm is its reliance on predictor independence [117]. The distribution vectors (Eq. 5.24) are:
θ_y = (θ_y1, ..., θ_yn) (5.24)
where y is the class label, n is the number of features, and θ_yi is the probability P(x_i|y) of feature i for a sample of class y.
Let C be the set of classes. MNB selects the class with the highest probability P(C|t_i), using the Bayes rule (Eq. 5.25):
P(C|t_i) = P(C) P(t_i|C) / P(t_i) (5.25)
The prior probability can be found by taking the ratio of tokens of a class to the
total number of tokens. P(t_i|C) is the likelihood of obtaining a token t_i in class C. The scaling term P(t_i) (Eq. 5.26) can be calculated as:
P(t_i) = Σ_(k=1)^(|C|) P(k) P(t_i|k) (5.26)
Algorithm 16 [118] gives the pseudocode of MNB.
Algorithm 16: Multinomial Naive Bayes
Start:
1. d = n = (n_1, ..., n_v)
2. C = argmax_c P(d|C) P(C)
3.   = argmax_c P(C) Π_(i=1)^v P(w_i|C)^(n_i)
4.   = argmax_c [log P(C) + Σ_(i=1)^v n_i log P(w_i|C)]
5. Output class C.
End
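The log-space decision rule of Algorithm 16 can be sketched on a tiny hypothetical token-count matrix (NumPy assumed; Laplace smoothing with α = 1 is an illustrative choice):

```python
import numpy as np

# Token-count matrix: rows = samples, columns = "word" features (hypothetical)
X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])
y = np.array([0, 0, 1, 1])
alpha = 1.0                                   # Laplace smoothing

log_prior, log_theta = {}, {}
for c in (0, 1):
    counts = X[y == c].sum(axis=0)
    log_theta[c] = np.log((counts + alpha) / (counts.sum() + alpha * X.shape[1]))
    log_prior[c] = np.log((y == c).mean())

def predict(x):
    # argmax_c [ log P(c) + sum_i n_i log P(w_i|c) ], as in Algorithm 16
    return max((log_prior[c] + x @ log_theta[c], c) for c in (0, 1))[1]

label = predict(np.array([4, 0, 1]))
```

The smoothing term α prevents a zero count for any single feature from driving the whole log-probability to minus infinity.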
5.1.11 Ridge Classifier
Ridge classification is a technique that is used to analyze linear discriminant mod-
els. It is a form of regularization that penalizes model coefficients to prevent
over-fitting. Over-fitting occurs when a model is too complex and captures noise
in the data instead of the underlying signal. Ridge classification addresses this
problem by adding a penalty term to the cost function that discourages complexity.
Typically, the penalty term is the sum of the squared coefficients of the model’s
features. This inhibits over-fitting by requiring the coefficients to stay small. By altering the penalty, one can adjust the amount of regularization: a larger penalty results in more regularization and smaller coefficient values. Under-fitting, however, may occur if the penalty term is too large. In contrast to logistic regression, the ridge classifier's loss function is not a cross-entropy loss. Instead, a mean square loss with an L2 penalty is used as the loss function.
Algorithm 17: Ridge Classifier
Start:
1. Convert the target variable to the values +1 and −1
2. Create a ridge model with mean square loss as the loss function
3. Use L2 regularization (ridge) as the penalty term
4. If the predicted value is less than 0,
   then predict the target class as −1
5. Otherwise, the estimated target class is +1
6. One-versus-all training is used to train the ridge classifier
7. A label binarizer is used
8. The objective is one binary classifier per class
End
In ridge regression, the classification error is associated with a regression problem
and can be found using the cost-sensitive formula:
min_β ‖c ∘ (y − Xβ)‖²_2 + λ‖β‖²_2 (5.27)
where c ∈ R^n is a vector of error weights, one for each instance, and ∘ denotes element-wise multiplication.
84 By: Arshid Ali
MS Thesis Electricity Theft Detection
The vector c can be divided into two parts:
c = c^(p) + c^(n) (5.28)
where c^(p) denotes the misclassification error associated with the positive (true) instances and c^(n) the error associated with the negative instances [119]. The pseudo-code of the ridge classifier from [120] is given in Algorithm 17.
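A minimal sketch of the ridge classifier follows: ±1 targets, the closed-form solution of the penalized least-squares problem (Eq. 5.27 with unit weights c), and a sign rule for prediction (NumPy assumed; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2.0, 1.0, (40, 2)),
               rng.normal(2.0, 1.0, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])  # targets encoded as -1 / +1

lam = 1.0
Xb = np.hstack([np.ones((80, 1)), X])       # bias column
# closed-form ridge solution: beta = (X^T X + lam*I)^(-1) X^T y
beta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(3), Xb.T @ y)
pred = np.sign(Xb @ beta)                   # threshold the regression output at 0
accuracy = (pred == y).mean()
```

The λI term both shrinks the coefficients and guarantees the linear system is invertible even with correlated features.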
5.1.12 Nearest Centroid Classifier
Nearest Centroid (NC) classification and the neighborhood relation derived from Relative Neighbourhood graphs have both been employed successfully in finite settings. The resulting classification methods aim to find prototypes that are distant enough from each other, but also uniformly or symmetrically shaped. The NC technique is a solid baseline classifier that yields understandable results, but its performance suffers when the data points of different classes are distant yet have comparable variances. Each labeled data point in the algorithm's input is a member of a distinct class, and the algorithm is given a number of such data points.
Algorithm 18: Nearest Centroid Classifier
Start:
1. Calculate the class means from the training data:
2. m_+ = (1/n_+) Σ_(i: y_i = +1) x_i
3. m_− = (1/n_−) Σ_(i: y_i = −1) x_i
4. Calculate the distance between a new test point x and the mean of each class:
5. d_+ = ‖x − m_+‖_2 (here ‖·‖_2 denotes the Euclidean distance)
6. d_− = ‖x − m_−‖_2
7. Classify x to the class corresponding to the smaller of d_+ and d_−.
8. Computing the class means m_+, m_− corresponds to training the classifier.
9. w = m_+ − m_−
10. The intercept is given by b = (1/2)(‖m_−‖²_2 − ‖m_+‖²_2)
11. Compute the discriminant f = w^T x + b.
12. Compute the sign of the discriminant y' = sign(f).
13. Classify x to the positive class if w^T x + b > 0 and to the negative class if w^T x + b < 0.
The algorithm’s relatively straightforward centroids computation step is used in
the model fitting step. A new data point is categorized by locating the centroid
that is closest to it in Euclidean distance and applying the matching label after
the centroids of each class have been located [121]. An NC classifier, also known
as the nearest prototype classifier, outputs target training samples that are near
the centroid [122]. Usually, the Euclidean distance formula is used to find the
difference, as shown in Eq. 5.29:
d(p, q) = √((p_1 − q_1)² + (p_2 − q_2)² + ... + (p_n − q_n)²) (5.29)
Here p is the actual class data and q the class centroid; p_1, ..., p_n are the n features of the observed data and q_1, ..., q_n are the n attributes of the class centroid.
The distances between the actual samples and each of the class centroids are measured and ranked, and the closest centroid is chosen. Class membership is then determined from the observed data. The pseudo-code of the classifier is given in Algorithm 18 [123].
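The centroid computation and nearest-centroid rule of Eq. 5.29 can be sketched as follows on hypothetical two-class data (NumPy assumed):

```python
import numpy as np

X_train = np.array([[0.0, 0.2], [0.2, 0.0], [0.1, 0.1],
                    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# "training" is just computing one mean vector per class
centroids = {c: X_train[y_train == c].mean(axis=0) for c in (0, 1)}

def predict(x):
    # assign x to the class whose centroid is nearest (Eq. 5.29)
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

label = predict(np.array([2.8, 3.3]))
```

Training reduces to a single mean per class, which is why NC is often used as a fast, interpretable baseline.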
5.1.13 Quadratic Discriminant Analysis
A quadratic discriminant analysis (QDA) is a multivariate classifier. The QDA
generalizes the linear discriminant function analysis, which fits multivariate nor-
mal distributions with estimates of each group’s covariance. Using statistical clas-
sification, the QDA Classifier divides measurements of multiple instances by a
quadratic surface. It finds the correlations in the data set for each class based
on its relation with the centroid. The final results give the likelihood of class membership [124].
The only significant difference between QDA and LDA is the assumption that
the co-variance matrix may differ for each class, leading us to determine the co-
variance independently for each class k, where k =1, 2, ..., K.
A quadratic function is used for QDA classification and is given in Eq. 5.30.
δ_k(x) = −(1/2) log|Σ_k| − (1/2)(x − µ_k)^T Σ_k^(−1) (x − µ_k) + log π_k (5.30)
This quadratic discriminant function behaves differently from the linear discriminant function because the covariance term depends on k, so it contains second-order terms in x. The classification rule (Eq. 5.31) is:
G(x) = argmax_k δ_k(x) (5.31)
The classification method simply predicts the class k that maximizes the quadratic
discriminant function. Quadratic equations in x represent the choice boundaries.
QDA typically fits the data more accurately than LDA, despite having more parameters to estimate, because QDA gives the covariance matrix more freedom. QDA has many more parameters because every class has its own covariance matrix [125].
Algorithm 19 shows the complete procedure for QDA [126].
Algorithm 19: Quadratic Discriminant Analysis
Start:
1. Collect the training data
2. Set the prior probabilities using p_i = n_i / N
3. Perform Bartlett's test to check whether the data has homogeneous or heterogeneous variance-covariance matrices
4. If Σ_i ≠ Σ_j for some i ≠ j, then
   the data has heterogeneous variance-covariance matrices and QDA can be applied
5. Identify and estimate the parameters of the conditional probability density functions f(X|π_i)
6. Compute the discriminant functions
7. Use cross-validation to estimate misclassification probabilities
8. Classify observations with unknown group memberships
End
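The quadratic discriminant δ_k(x) of Eq. 5.30 can be sketched directly, with a separate covariance per class (NumPy assumed; the synthetic data uses equal priors, and the unequal class spreads are deliberate so that QDA's per-class covariances matter):

```python
import numpy as np

rng = np.random.default_rng(5)
X0 = rng.normal(0.0, 0.5, (100, 2))         # tight class
X1 = rng.normal(2.5, 1.5, (100, 2))         # spread-out class
params = {c: (Xc.mean(axis=0), np.cov(Xc.T), 0.5)   # mean, covariance, prior
          for c, Xc in ((0, X0), (1, X1))}

def delta(x, mu, cov, prior):
    """Quadratic discriminant delta_k(x) of Eq. 5.30."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(cov))
            - 0.5 * diff @ np.linalg.solve(cov, diff)
            + np.log(prior))

def predict(x):
    return max(params, key=lambda c: delta(x, *params[c]))

label = predict(np.array([2.4, 2.6]))
```

Because each class keeps its own covariance, the resulting decision boundary is a quadratic curve rather than the straight line LDA would produce.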
5.1.14 Complement Naive Bayes
In Multinomial Naive Bayes, only one class c is employed to estimate weights.
In contrast, the Complement Naive Bayes algorithm uses all training data from
all classes except the c class. The weights will be lower for the class with less
training data if the training data is skewed. Therefore, classification will unfairly
favor one class over another. A novel "complement class" variation of Naive Bayes
is introduced to deal with skewed training data, and it is known as Complement
Naive Bayes (CNB) [127].
In contrast, CNB uses data from all classes aside from class c to estimate the parameters. Because each estimate employs a more even distribution of training data across classes, CNB's estimates are more accurate and reduce the bias in weight assignment. Because more reliable weight estimates are found, prediction performance can be increased. These benefits result from more data per estimate; overall, CNB is less vulnerable to skewed data bias when utilizing the same amount of data.
Algorithm 20 shows the complete procedure of the CNB algorithm. Eq. 5.32 gives the parameter estimate of the Complement Naive Bayes rule:
θ_ci = (Ñ_ci + α_i) / (Ñ_c + α) (5.32)
where Ñ_ci is the number of occurrences of feature i in classes other than c, Ñ_c is the total number of feature occurrences in classes other than c, and α_i and α are smoothing parameters, as before. The weight estimate is w_ci = log θ_ci and the classification rule is
l_CNB(d) = argmax_c [log p(θ_c) − Σ_i f_i log((Ñ_ci + α_i) / (Ñ_c + α))] (5.33)
The parameters are estimated using data from all classes except class c in CNB. The estimates made by CNB are more precise and have less bias in the weight estimates, since each estimate uses a fairer distribution of training data across classes. The classification accuracy has improved as more trustworthy weight
Algorithm 20: Complement Naive Bayes
1. Let (d_1, ..., d_n) be a set of documents, with d_ij the count of feature i in document j.
2. Let y = (y_1, ..., y_n) be the labels.
3. CNB(d, y):
4. d_ij = log(d_ij + 1)
5. d_ij = d_ij · log(n / Σ_k δ_ik)
6. d_ij = d_ij / √(Σ_k (d_kj)²)
7. θ_ci = (Σ_(j: y_j ≠ c) d_ij + α_i) / (Σ_(j: y_j ≠ c) Σ_k d_kj + α)
8. w_ci = log θ_ci
9. w_ci = w_ci / Σ_i |w_ci| (weight normalization)
10. Let t = (t_1, t_2, ..., t_n) be a test point;
    assign the class according to
    l(t) = argmin_c Σ_i t_i w_ci
estimates have been discovered. Although overall, CNB is less susceptible to
skewed data bias when using the same amount of data, these advantages derive
from more data being used per estimate. The combined classification rule is
l_OVA(d) = argmax_c [log p(θ_c) + Σ_i f_i log((N_ci + α_i) / (N_c + α)) − Σ_i f_i log((Ñ_ci + α_i) / (Ñ_c + α))] (5.34)
This is a combination of the regular and complement classification rules [128], where N_ci counts occurrences of feature i within class c and Ñ_ci counts occurrences in all classes other than c.
Algorithm 20, from [118], shows the complete CNB procedure.
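The complement-count estimation of Eq. 5.32 and the argmin decision of Algorithm 20 can be sketched as below; the TF-IDF and length-normalization steps are omitted for brevity, and the count matrix is hypothetical (NumPy assumed):

```python
import numpy as np

X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])  # token counts
y = np.array([0, 0, 1, 1])
alpha = 1.0                                  # smoothing parameter

w = {}
for c in (0, 1):
    comp = X[y != c].sum(axis=0)             # counts from every class EXCEPT c
    theta = (comp + alpha) / (comp.sum() + alpha * X.shape[1])
    w[c] = np.log(theta)                     # w_ci = log theta_ci

def predict(x):
    # smaller complement score => poorer fit to the OTHER classes => pick c
    return min((x @ w[c], c) for c in (0, 1))[1]

label = predict(np.array([4, 0, 1]))
```

Because the weights of each class are estimated from all the other classes' data, no class's estimate is starved by a small training share, which is the point of the complement formulation.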
5.1.15 Dummy/Blind Classifier
A blind classifier (BC) is an algorithm that only takes class statistics into account and generates output labels at random. Suppose there are p target classes in the dataset. If the p classes are distributed equally over the dataset, the likelihood that the BC assigns a certain target l_i to a specific instance is P(l_i) = 1/p for i = 1, 2, ..., p.
In the event of label imbalance, the former probability must be weighted by the proportion of patterns in class l_i relative to all other patterns. Since the dummy classifier does not take the information contained in the training set into account when assigning the output labels, it is used as a comparative baseline to measure
the classifier performance [129]. The studies employ a "dummy" classifier that
uses a straightforward stratified technique to create predictions at random while
adhering to the distribution of classes in the training set [130].
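The stratified dummy strategy can be sketched in a few lines; the 10% theft rate below is an invented example of class imbalance, not a statistic from the thesis dataset:

```python
import random

random.seed(0)
y_train = [0] * 90 + [1] * 10          # imbalanced training labels: 10% theft

# stratified dummy: draw labels at random following the training distribution
p_theft = sum(y_train) / len(y_train)
preds = [1 if random.random() < p_theft else 0 for _ in range(1000)]
rate = sum(preds) / len(preds)         # close to 0.10 on average
```

Any real classifier should beat this baseline; on imbalanced data it also exposes why plain accuracy is a misleading metric, since always predicting 0 here already scores 90%.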
5.2 Data Balancing Techniques
We evaluate our proposed structure on imbalanced data and also use 5 different
balancing techniques. The details are given below:
5.2.1 Synthetic Minority Over-Sampling Technique
The Synthetic Minority Over-Sampling (SMOTE) technique is used as a data
oversampling approach in dealing with the original training set. The main idea
behind SMOTE is to create artificial instances rather than simply duplicating
the instances of minority classes. Several minority class instances that are located
inside a specific neighborhood are interpolated to create this new data. Due to this,
the process is said to be ’feature space’ focused rather than ’data space’ focused,
i.e., the algorithm is based on the values of the features and their relationships
rather than taking into account the dataset as a whole. This also suggested more
research into the theoretical relationship between actual and artificial instances,
including a thorough analysis of data dimensionality.
Minority class instances x_i are chosen as the foundation for new artificial sampling points. Several nearest neighbors of the same class are chosen
from the training set based on Euclidean distance. To obtain fresh instances,
a randomized interpolation is then completed. The basic procedure operates as
follows: first, consider the total amount of oversampling N (an integer), which can be configured in either of two ways to obtain an almost balanced class distribution of nearly 1:1.
The method is then carried out iteratively, in a series of steps. First, a train-
ing set instance representing a minority class is randomly chosen. Next, its K
closest neighbors are determined, which are set to 5. In order to compute the
new instances through interpolation, N of these K instances are finally selected at
random. To do this, the difference between each chosen neighbor and the
feature vector (sample) is calculated. This difference is added to the prior feature
vector after being multiplied by a random value between 0 and 1. As a result, a
random point is chosen along the ’line segment’ connecting the features [131].
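The interpolation step described above can be sketched as follows (NumPy assumed; k, n_new, and the random seeds are arbitrary illustrative choices, not the thesis settings):

```python
import numpy as np

def smote_sample(minority, k=5, n_new=10, seed=6):
    """Synthesize points on line segments between minority samples and neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the point itself
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()                        # random value in [0, 1)
        out.append(x + gap * (nb - x))            # interpolated synthetic point
    return np.array(out)

minority = np.random.default_rng(7).normal(0.0, 1.0, (20, 2))
synthetic = smote_sample(minority, k=5, n_new=10)
```

Each synthetic point is a convex combination of two real minority points, so the new samples stay inside the minority class's feature-space region rather than being verbatim duplicates.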
5.2.2 Adaptive Synthetic Sampling Approach
In the same way that SMOTE provides synthetic data for minority classes, He et al. proposed ADASYN. However, ADASYN is predicated on producing more synthetic data for observations that are harder for a particular model to learn than for those that are simpler to learn. Similar to SMOTE, ADASYN produces
synthetic observations along a straight line between an observation belonging to a
minority class and its k-nearest minority class neighbors. Similar to SMOTE, the
K-nearest neighbor number is set at 5. But ADASYN produces more synthetic
observations for minority class observations when there are more positive class
data in the region of the k-nearest neighbors. In contrast, if there are no majority
data within the k-nearest neighbors’ range, no artificial data will be produced for a
minority. The justification behind this is that, these data make it more challenging
to infer minority observations that are very dissimilar to the majority views [132].
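The density-driven allocation can be illustrated with a small NumPy sketch (a simplified version of ADASYN's allocation rule only, not the full algorithm; the function name and toy data are invented for the example):

```python
import numpy as np

def adasyn_allocation(X, y, G, k=5):
    """ADASYN-style allocation: minority sample x_i receives a share of the
    G synthetic samples proportional to r_i, the fraction of majority points
    among its k nearest neighbours (harder, borderline samples get more)."""
    X_min = X[y == 1]
    r = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X - x, axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbours, any class
        r[i] = np.mean(y[nn] == 0)         # share of majority neighbours
    if r.sum() == 0:                       # no borderline minority samples
        return np.zeros(len(X_min), dtype=int)
    return np.rint(G * r / r.sum()).astype(int)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],        # majority (0)
              [0.1, 0.1],                                    # borderline (1)
              [5., 5.], [5.1, 5.], [5., 5.1], [5.1, 5.1]])   # safe cluster (1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
print(adasyn_allocation(X, y, G=10, k=3))  # all 10 go to the borderline point
```

The minority point near the majority cluster has only majority neighbors (r = 1) and receives the whole synthetic budget, while the safe minority cluster (r = 0) receives nothing, which is exactly the behavior described above.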
5.2.3 Combined Cleaning and Re-sampling Technique
The CCR algorithm was initially introduced by Koziarski and Woźniak in the
context of binary classification problems [133]. CCR constructs a sphere around
each minority observation; the spheres expand using the available energy, with
the cost increasing for every majority observation encountered during the
expansion. Majority observations inside the spheres are translated to the sphere
boundaries instead of being completely removed, so the information associated
with their original positions is preserved to a large extent.
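The cleaning (translation) step can be sketched as follows. This is a simplified version with a fixed sphere radius instead of the energy-based expansion, and the function name and toy coordinates are invented for the example:

```python
import numpy as np

def ccr_clean(x_min, X_maj, radius):
    """Simplified CCR cleaning step: majority points falling inside the
    sphere around a minority point are translated outward to the sphere
    boundary, so their positional information is preserved rather than
    the points being deleted."""
    X_out = X_maj.copy()
    d = np.linalg.norm(X_maj - x_min, axis=1)
    inside = (d < radius) & (d > 0)
    # push each intruding majority point outward along its own direction
    X_out[inside] = x_min + (X_maj[inside] - x_min) * (radius / d[inside])[:, None]
    return X_out

x_min = np.array([0.0, 0.0])
X_maj = np.array([[0.2, 0.0], [0.0, 3.0]])
moved = ccr_clean(x_min, X_maj, radius=1.0)
print(moved)   # first point pushed out to distance 1.0, second unchanged
```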
5.2.4 Noise Reduction A Priori Synthetic Over-Sampling (NRAS)
The Noise Reduction A Priori Synthetic Over-Sampling approach is based on
augmenting the data with the conditional probability of minority-class membership
for samples that do not appear to be noise. Using Bayes' theorem, this membership
probability is estimated for every sample; if the data distribution is known, the
marginal p(x) can be used directly [134]. The NRAS method thus adds a new
feature, the probability of belonging to the minority class, and excludes
minority-class samples that appear to be noise.
A noisy sample is judged by the rule below:

NOISE_i = 1, if CD_i > (1/N) Σ_{t=1}^{N} CD_t and RD_i > (1/N) Σ_{t=1}^{N} RD_t; 0, otherwise   (5.35)
NOISE_i is 1 if the sample x_i is a noisy sample and 0 otherwise. The arrays CD[]
and RD[] hold the core distances and reachability distances of all the samples;
together, the core distance and reachability distance fully reflect the density
information of the dataset [135].
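The noise rule of Eq. 5.35 reduces to a simple vectorized comparison against the dataset means (the function name and the illustrative distance values are assumptions for the example):

```python
import numpy as np

def noise_mask(CD, RD):
    """Eq. 5.35: a sample is flagged as noise when both its core distance
    and its reachability distance exceed the respective dataset means."""
    CD, RD = np.asarray(CD, float), np.asarray(RD, float)
    return ((CD > CD.mean()) & (RD > RD.mean())).astype(int)

CD = [0.1, 0.1, 0.2, 2.0]    # last sample sits in a sparse region
RD = [0.2, 0.1, 0.3, 3.0]
print(noise_mask(CD, RD))    # [0 0 0 1]
```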
5.2.5 Synthetic Minority Over-sampling Based on Sample Density (SMOBD)
The SMOBD technique oversamples the dataset to produce samples that follow the
real data distribution more closely than earlier approaches such as SMOTE and
SMOTE-ENN. Noise in the data is removed, and no additional samples are
synthesized to cover it up. A straightforward approach for determining sample
density based on the reachability and core distances is used. The density of a
single instance can be calculated using Eqn. 5.36.
DF_i = η_1 ε_i + η_2 N_i   (5.36)

DF_i stands for the density of sample x_i, and η_1 and η_2 are weighting
coefficients whose sum is 1. The instance density depends on two variables: the
number of instances N_i inside the radius and the distance ε_i of the k nearest
neighbors from the instance.
The formula below calculates the number of synthetic instances generated around
every minority sample:

N_i = (DF_i / Σ_{j=1}^{n} DF_j) × N   (5.37)

N_i denotes the number of new samples synthesized around sample x_i, and N is the
total number of new synthesized samples.
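Eq. 5.37 amounts to a proportional allocation, which can be checked numerically (the density factors below are illustrative values, not values from the thesis):

```python
import numpy as np

# Eq. 5.37 as a proportional allocation: each minority sample x_i gets
# N_i = DF_i / sum_j(DF_j) * N of the N total synthetic samples.
DF = np.array([0.5, 0.3, 0.2])   # illustrative density factors
N = 100                          # total synthetic samples to generate
N_i = np.rint(DF / DF.sum() * N).astype(int)
print(N_i)                       # [50 30 20]
```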
5.3 Research Methodology
The research data was obtained from the State Grid Corporation of China (SGCC)
for Fujian, China, and is available online on the SGCC website. This dataset was
chosen because of its easy availability and its relevance to the research gap; it
has also been de-identified for privacy purposes, so confidentiality is ensured.
The number of attributes is 1034, with the theft class having 3615 instances and
the honest class having 38,757 instances. Table 5.1 provides complete information
about the dataset.
Attribute                        Value
Time Frame of Data Collection    01-01-2014 to 31-10-2016
Total Consumers                  42,372
Number of Theft Users            3,615
Number of Honest Users           38,757
Table 5.1: Information of the Real-World SGCC Dataset.
The comparative analysis among various supervised machine learning algorithms
was carried out using scikit-learn in the Google Colab environment. The dataset is labeled
to reflect the consumer class: 1 for the theft class and 0 for the normal users'
class. A detected theft is considered positive (1) and a predicted honest user is
considered negative (0). Fifteen classification algorithms were used in this
research, namely: Decision Tree, Naïve Bayes, Perceptron, Gaussian Naive Bayes,
K Nearest Neighbors, Complement Naive Bayes, Linear Discriminant Analysis,
Quadratic Discriminant Analysis, Multinomial Naive Bayes, Logistic Regression,
Passive Aggressive Classifier, Stochastic Gradient Descent, Ridge Classifier,
Nearest Centroid Classifier and, for comparison, a Dummy/Blind Classifier. The
following performance attributes were considered for the comparative analysis:
Accuracy, F1-Score, MCC, Precision, Recall, FPR, FNR, and AUC. The overall
structure of the proposed model is shown in figure-5.1.
Figure 5.1: Proposed Electricity Theft Detection Model.
In order to predict the actual class, eight different performance metrics are
chosen. To ensure the best classification for different machine learning
algorithms, this research work uses 15 ML classifiers with different data
balancing techniques, since the SGCC dataset is class-imbalanced. The first category
comprised all instances and features of the dataset without any class balancing.
The second category used all instances and attributes with SMOTE for class
balancing, which helps avoid model bias toward the majority class and increases
the prediction of the actual class. In the following categories of experiment,
the ADASYN, SMOBD, NRAS, and CCR techniques are used to better address the class
imbalance issue. All 15 classifiers are analyzed on the eight performance metrics
to determine their feasibility for theft and honest consumer classification in a
smart grid environment.
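The evaluation loop can be sketched with scikit-learn. This is a minimal illustration with only three of the fifteen classifiers, and a synthetic imbalanced dataset stands in for the SGCC data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Synthetic imbalanced data standing in for the SGCC set (~9% positives).
X, y = make_classification(n_samples=2000, weights=[0.91, 0.09],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

classifiers = {"DT": DecisionTreeClassifier(random_state=0),
               "KNN": KNeighborsClassifier(n_neighbors=5),
               "Dummy": DummyClassifier(strategy="most_frequent")}
results = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    results[name] = (accuracy_score(y_te, y_pred),
                     f1_score(y_te, y_pred, zero_division=0),
                     matthews_corrcoef(y_te, y_pred))
    print(name, [round(v, 3) for v in results[name]])
```

Note how the dummy baseline reaches high accuracy simply by predicting the majority class, while its F1-score and MCC collapse to zero; this is exactly why the additional metrics are needed.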
5.4 Evaluation Parameters Used
For the comparative study, important performance indicators such as accuracy,
AUC, precision, recall, FNR, FPR, MCC, and F1-score are used. Every parameter
has a specific function and meaning in malicious behavior identification. These
parameters are discussed as follows.
5.4.1 Accuracy
Accuracy shows the number of correctly identified users divided by the total num-
ber of classified consumers and is calculated using Eqn. 5.38.
Accuracy = (TP + TN) / (TP + FP + FN + TN)   (5.38)
5.4.2 Recall
Recall is the ratio of the set of theft values that the classifier predicted as theft
against the total number of theft samples, as given in Eqn. 5.39.
Recall = TP / (TP + FN)   (5.39)
5.4.3 Precision
Precision is an important indicator of a machine learning model’s performance in
classification. It shows the quality of theft prediction of the ML model, which is
given using Eqn. 5.40.
Precision = TP / (TP + FP)   (5.40)
5.4.4 F1-Score
The F1-score combines the precision and the recall and is given by Eqn. 5.41:

F1-score = (2 × Precision × Recall) / (Precision + Recall)   (5.41)
5.4.5 Area Under the Curve
The area under the curve (AUC) measures the total area under the ROC curve.
An AUC of 1 indicates perfect separation of the two classes, while an AUC of
0.5 corresponds to a random classifier.
5.4.6 False Positive Rate
The ratio of the number of false positives predicted by the model to the sum of
FP and TN is known as the false positive rate (FPR). It gauges how often honest
users are incorrectly flagged as theft, and is given by Eqn. 5.42:

FPR = FP / (TN + FP)   (5.42)
5.4.7 False Negative Rate
False negative rate (FNR) is the ratio of FN to the sum of TP and FN. It
evaluates the chance that a theft case will be missed, and is given by Eqn. 5.43:

FNR = FN / (TP + FN)   (5.43)
5.4.8 Matthews Correlation Coefficient
The Matthews correlation coefficient is a metric used to assess the performance
of a binary (two-class) classifier. The coefficient returns +1 for a perfect
prediction, 0 for a random prediction, and -1 for total disagreement between
prediction and observation. MCC is a more reliable metric than accuracy or the
F1 score, because the latter two can be deceptive, as they do not consider all
four values of the confusion matrix. It is computed as shown in Eqn. 5.44.
MCC = (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))   (5.44)
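The scalar metrics of Eqns. 5.38-5.44 follow directly from the four confusion-matrix counts; the counts below are illustrative values, not results from the thesis:

```python
import math

# The scalar metrics (eqs. 5.38-5.44) computed from the four
# confusion-matrix counts, with illustrative values.
TP, FP, FN, TN = 80, 10, 20, 890

accuracy  = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)                      # the TPR; FNR = 1 - recall
f1        = 2 * precision * recall / (precision + recall)
fpr       = FP / (TN + FP)
fnr       = FN / (TP + FN)
mcc       = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print(round(accuracy, 3), round(f1, 3), round(mcc, 3))   # 0.97 0.842 0.827
```

With these counts the accuracy looks excellent (0.97) while the MCC (0.827) is noticeably lower, illustrating why MCC is the stricter of the two on imbalanced data.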
5.4.9 Receiver Operator Characteristic
The Receiver Operator Characteristic (ROC) indicator shows the classification
performance across different values of TPR and FPR. In the ROC curve, the FPR is
given on the X-axis and the TPR on the Y-axis.
5.5 Dataset and Simulation setup
This section presents the energy-consumption data used to develop and validate
the proposed framework. We test our model on a dataset that contains both benign
and malicious samples. The data is obtained from the State Grid Corporation of
China (SGCC) and consists of daily power-usage records of 42,372 consumers
collected over almost three years (1st January 2014 to 31st October 2016),
including 3615 electricity thieves (class 1) and 38,757 honest consumers
(class 0). For model training and prediction, the dataset is split 70:30.
A system with a Core i7 processor and 8 GB of RAM is used for the simulations,
which are run in Google Colaboratory (Colab). The obtained results are saved as
PDF for evaluation purposes and are discussed in the next section.
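The 70:30 split can be reproduced with scikit-learn; stratification keeps the 3615/38757 class ratio intact in both partitions (a placeholder feature matrix stands in for the SGCC features here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(42372).reshape(-1, 1)        # placeholder features
y = np.array([1] * 3615 + [0] * 38757)     # theft (1) vs honest (0) labels
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(len(X_tr), len(X_te))                # 70% / 30% of 42,372 consumers
print(round(y_tr.mean(), 4), round(y_te.mean(), 4))   # class ratio preserved
```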
5.6 Simulation Results and Analysis
For the objective of detecting theft, we examined fifteen ML classification
algorithms, all implemented on the SGCC dataset: DT, NB, Perceptron, MNB, SGD,
PAC, LR, KNN, QDA, CNB, LDA, GNB, NCC, RC, and the Dummy classifier. First, the
original dataset is used for theft and normal-user classification with light
pre-processing, such as handling NaNs and data normalization. We then examined
all of the classifiers' results in order to complete a thorough comparative
analysis of five different data balancing techniques.
Figure 5.2: (a) 15 Classifiers' Output Accuracy without Balancing; (b) 15 Classifiers' Output Performance without Balancing.
In the initial round, all classifiers are trained on the whole dataset without
any additional changes, and the learned classifiers are then assessed on the
testing split. The original imbalanced dataset therefore yielded biased classifier
outputs in the form of accuracy. The testing accuracy of all fifteen models is
given in figure-5.2a.
In the first set of experiments, we used the original dataset for the mentioned
classifiers and obtained fairly random classification predictions. However, DT
and CNB show higher AUC values of 0.60 and 0.59, respectively, compared to the
rest of the classifiers, while SGD produces an almost completely random
classification, nearly equal to the dummy classifier, with an AUC value of 0.5.
Looking at the MCC score, which needs to approach 1 for the best classification,
it comes out to 0.19 and 0.18 for CNB and DT, respectively.
Figure-5.3a presents the classifiers' results in terms of precision and recall
for separating theft from honest consumers. The seemingly good overall results
are driven by the majority-class samples in the dataset, while the detection of
theft users is very low. SGD shows the worst performance on theft users, with a
PRC value of 0.090, almost the same as the dummy classifier with a PRC of 0.087.
Figure 5.3: (a) 15 Classifiers' Precision-Recall Curve without Balancing; (b) 15 Classifiers' AUROC without Balancing.
Accuracy is the most commonly reported indicator of an ML algorithm's
effectiveness. On an unbalanced dataset, however, a classifier's accuracy cannot
accurately determine how well it predicts; the area under the curve (AUC) score
is a more effective performance indicator for unbalanced datasets.
5.6.1 Output Performance using SMOTE-based Data Balancing
In the second set of experiments, we used the original dataset with SMOTE for
class balancing. This time we obtained an improvement over the near-random
predictions of the previous subsection.
Figure 5.4: (a) 15 Classifiers' Output Accuracy using SMOTE; (b) 15 Classifiers' Output Performance on SMOTE.
The AUC characterizes how quickly the true positive rate increases as the false
positive rate climbs. Varying the decision threshold traces the trade-off between
the true positive rate and the false positive rate on the receiver operating
characteristic (ROC) curve, as shown in figure-5.3b. AUC refers to the area under
the ROC curve. If the AUC is 100%, the positive and negative outcomes are easy to
distinguish; as the AUC falls toward 0%, it becomes impossible to distinguish
between the negative and positive classes.
To compare the performance of the classifiers, as stated in the earlier section,
the models are fit on the training samples while predictions are made on the
testing set. To confirm the systems' performance, evaluation metrics including
recall, precision, and F1-score are computed. The accuracy of all the classifiers
is plotted in figure-5.4a.
As seen, the SMOTE-balanced models outperform the models trained on unbalanced
data. For the theft class ('1'), SMOTE+DT performs best, with AUC and MCC values
of 0.84 and 0.69, respectively, while NCC+SMOTE shows poor performance, with AUC
and MCC values of 0.56 and 0.16. The rest of the performance metrics are shown
in figure-5.4b.
Figure 5.5: (a) 15 Classifiers' Precision-Recall Curve on SMOTE; (b) 15 Classifiers' Output AUROC on SMOTE.
Based on the testing set, the decision tree produced the best results, with an
overall PRC of 0.77, while GNB and NCC show PRC values of 0.52 and 0.53, close
to the random prediction of the dummy classifier. A careful examination of these
results reveals that the PRC of the DT is significantly higher than that of the
remaining classifiers, as shown in figure-5.5a.
In figure-5.5b, the AUC metric value is used to evaluate the ML algorithms. The
DT classifier has the highest value, with an AUC of 0.83, among all the models.
5.6.2 Output Performance using ADASYN-based Data Balancing
This experiment used ADASYN-based data balancing on the original dataset, and
again we obtained an overall improvement over the random predictions obtained
without balancing.
Figure 5.6: (a) 15 Classifiers' Output Accuracy using ADASYN; (b) 15 Classifiers' Output Performance on ADASYN.
A total of six performance metrics are plotted for all 15 implemented
classifiers: MCC, FPR, FNR, AUC, F1-Score, and accuracy. Considering all these
parameters, the KNN and DT classifiers perform best overall, while GNB,
Perceptron, and SGDC show poor classification results, as shown in figure-5.6b.
The precision-recall curve of all implemented classifiers for ADASYN-based data
balancing is shown in figure-5.7a. The PRC also shows the highest value for KNN
and DT classifiers.
Figure 5.7: (a) 15 Classifiers' Precision-Recall Curve on ADASYN; (b) 15 Classifiers' Output AUROC on ADASYN.
It is appropriate to compare these classifiers using the AUC, since it is
scale-invariant and threshold-invariant. A perfect model has an area of 1, and a
model with a larger area is considered superior. The whole two-dimensional area
under the ROC curve, called the AUROC, is displayed in figure-5.7b. Among the
fifteen implemented classifiers, KNN and DT perform best in terms of theft and
normal-user classification.
5.6.3 Output Performance using SMOBD-based Data Balancing
In the following experiment, a SMOBD data balancing technique is used on the
SGCC dataset. The performance of 15 implemented classifiers is evaluated and
discussed.
The accuracy metric is used for the classification evaluation of all the
implemented classifiers. KNN and DT showed improved accuracy scores of 0.86 and
0.79, respectively, as shown in figure-5.8a, while the remaining models give
near-random accuracy values below 0.70.
Accuracy, AUC, MCC, F1-Score, precision, recall, and FNR are used for the
classification comparison. Figure-5.8b shows that DT and KNN outperform the
other classifiers in terms of theft and honest-user classification.
Figure 5.8: (a) 15 Classifiers' Output Accuracy using SMOBD; (b) 15 Classifiers' Output Performance on SMOBD.
Figure-5.9a presents the assessment of the ML algorithms in relation to the PRC
metric. Comparatively, the KNN and DT classifiers have the highest PRC values of
0.78 and 0.73 among all the models, while the GNB classifier shows a poor PRC
value of 0.53, close to that of the blind/dummy classifier.
Figure 5.9: (a) 15 Classifiers' Precision-Recall Curve on SMOBD; (b) 15 Classifiers' Output AUROC on SMOBD.
The AUC value is a good performance metric for imbalanced-dataset classification.
This indicator measures how quickly the TPR changes as the FPR rises, and the
ROC curve illustrates the trade-off between the TPR and the FPR as the decision
threshold varies. The ROC curves of all the classifiers, with the corresponding
areas (denoted as AUC), are shown in figure-5.9b. As seen, KNN and DT outperform
the other classifiers on SMOBD-based data balancing, with AUC values of 0.86 and
0.79, respectively. This shows that their positive and negative results can be
separated more easily than those of the other classifiers.
5.6.4 Output Performance using NRAS-based Data Balancing
This experiment shows the classification performance of 15 classifiers using the
NRAS data balancing technique on the SGCC dataset.
Figure 5.10: (a) 15 Classifiers' Output Accuracy using NRAS; (b) 15 Classifiers' Output Performance on NRAS.
Various performance metrics are considered and plotted for comparison purposes.
The general classification metric, accuracy, shows the highest values of 0.92
and 0.91 for KNN and DT, respectively. The lowest accuracies, 0.59 and 0.60, are
obtained for NCC and GNB when compared with the dummy classifier.
Figure 5.11: (a) 15 Classifiers' Precision-Recall Curve on NRAS; (b) 15 Classifiers' Output AUROC on NRAS.
FNR is a classification metric that provides a detailed understanding of the
number of thieves who are mistakenly classified as honest users. In order to
detect NTLs, a low FNR is preferred. With the highest observed FNRs, the two
classifiers NCC and GNB performed poorly. Figure-5.10b demonstrates that the FNR
is very low for three classifiers: among the 15 models, FNR is minimum for the
KNN, BNB, and DT classifiers. With the lowest FNR, KNN+NRAS thus proves to be
the best option for NTL detection on our dataset under NRAS-based balancing.
The precision-recall curves of the implemented classifiers for NRAS-based data
balancing are shown in figure-5.11a; the PRC is again highest for the KNN and DT
classifiers.
FPR counts the honest users who are predicted to be thieves. A high FPR
increases manual work by inflating the on-site theft-verification process, since
the classifier frequently flags users as thieves who actually belong to the
honest class in the dataset. The correctly predicted theft users constitute the
TPR, and for a good classifier both the TPR and the TNR should approach 100%.
Figure-5.11b shows the ROC curve with the corresponding AUC values for the
implemented classifiers. The KNN and DT outperform, with AUC values of 0.92 and
0.91, respectively.
5.6.5 Output Performance using CCR-based Data Balancing
In the last experiment, a CCR-balanced dataset is used for classification. Using
the accuracy metric for evaluation, GNB, DT, and QDA achieved accuracy scores of
0.92, 0.96, and 0.98, respectively, as shown in figure-5.12a.
Figure 5.12: (a) 15 Classifiers' Output Accuracy using CCR; (b) 15 Classifiers' Output Performance on CCR.
The classifiers are also verified for accuracy, AUC, Precision, Recall, MCC,
F1-Score, and FNR. The best performance is obtained with the QDA and DT
classifiers, as shown in figure-5.12b. However, the accuracy metric alone is
often not enough for classification problems, and other metrics are needed to
make sure that a model is reliable.
Figure 5.13: (a) 15 Classifiers' Precision-Recall Curve on CCR; (b) 15 Classifiers' Output AUROC on CCR.
Similarly, the PRC of the QDA classifier increased to 0.99 in the case of CCR,
followed by GNB and DT. The evaluation of the ML algorithms with respect
to the PRC metric value is shown in figure-5.13a. Comparatively, the Perceptron
classifier had the lowest PRC of 0.77 among the implemented models, even though
the data is the same for all of them. Moreover, the precision of all the models
is measured and plotted.
The AUC score is an effective performance indicator for unbalanced datasets; it
establishes how quickly the TPR increases as the FPR rises. On the ROC curve,
changing the decision threshold demonstrates the trade-off between the TPR and
FPR, as shown in figure-5.13b. As seen, the AUC value for QDA is 0.9789,
outperforming the comparative algorithms; this shows that the positive and
negative classes can be easily separated. Along with accuracy, the generic
classification term, and MCC, AUC, FPR, and FNR, each experiment also includes
the computation of ROC and PR curves. For different thresholds, the trade-off
between precision and recall is given by the PR curve.
The results of the fifteen ML techniques are evaluated using the different
balancing approaches, as shown in table-5.3. From these results, we can analyze
each classifier's behavior based on its output performance. It can be inferred
that almost all algorithms show improved results on balanced data compared with
the imbalanced data. The QDA model showed the most improved performance among
the classifiers, in terms of the AUC, MCC, and F1-Score metrics, using the
CCR-balanced dataset.
Metric     | Without Balancing | SMOTE | ADASYN | NRAS | SMOBD | CCR

1. Logistic Regression
Accuracy   | 0.9134 | 0.6317 | 0.6148 | 0.7622 | 0.6380 | 0.7995
AUC        | 0.51 | 0.63 | 0.61 | 0.76 | 0.64 | 0.50
Precision  | 0.6591 | 0.7005 | 0.6743 | 0.8766 | 0.7107 | 0.7998
Recall     | 0.026 | 0.4559 | 0.4406 | 0.6084 | 0.4616 | 0.9996
F1-Score   | 0.0501 | 0.5524 | 0.533 | 0.7183 | 0.5596 | 0.8886
MCC        | 0.1191 | 0.2802 | 0.2442 | 0.5502 | 0.2939 | -0.0088
FPR        | 0.0012 | 0.1936 | 0.2117 | 0.0850 | 0.1865 | 1.0
FNR        | 0.9739 | 0.54405 | 0.5593 | 0.3916 | 0.5384 | 0.00038
Time       | 18.7 s | 1 min 34 s | 55.6 s | 1 min 38 s | 1 min 10 s | 1 min 27 s

2. Bernoulli Naïve Bayes
Accuracy   | 0.3038 | 0.6037 | 0.5980 | 0.6219 | 0.6043 | 0.6775
AUC        | 0.52 | 0.60 | 0.60 | 0.62 | 0.61 | 0.72
Precision  | 0.0926 | 0.5625 | 0.5593 | 0.573 | 0.5628 | 0.9276
Recall     | 0.789 | 0.9222 | 0.9149 | 0.9472 | 0.9234 | 0.6474
F1-Score   | 0.1657 | 0.6988 | 0.6943 | 0.714 | 0.6993 | 0.7626
MCC        | 0.0301 | 0.2713 | 0.2551 | 0.323 | 0.2731 | 0.3588
FPR        | 0.7427 | 0.7124 | 0.7171 | 0.7009 | 0.7123 | 0.20204
FNR        | 0.2109 | 0.0777 | 0.0850 | 0.0528 | 0.07663 | 0.35256
Time       | 13.8 s | 19.6 s | 13.3 s | 7.79 s | 8.28 s | 37.8 s

3. Gaussian Naïve Bayes
Accuracy   | 0.9049 | 0.5363 | 0.5339 | 0.6009 | 0.5408 | 0.9216
AUC        | 0.54 | 0.53 | 0.53 | 0.60 | 0.54 | 0.95
Precision  | 0.3433 | 0.8551 | 0.8155 | 0.9462 | 0.8668 | 0.9964
Recall     | 0.0925 | 0.0835 | 0.0729 | 0.2112 | 0.0927 | 0.9053
F1-Score   | 0.1457 | 0.1522 | 0.1338 | 0.3453 | 0.1675 | 0.9487
MCC        | 0.1406 | 0.1615 | 0.1368 | 0.3169 | 0.1749 | 0.801
FPR        | 0.0169 | 0.0140 | 0.0164 | 0.0119 | 0.0141 | 0.0131
FNR        | 0.9075 | 0.9164 | 0.9271 | 0.7888 | 0.9073 | 0.0946
Time       | 5.17 s | 13.8 s | 12.2 s | 11.1 s | 10 s | 41.7 s

4. K Nearest Neighbors
Accuracy   | 0.9096 | 0.8067 | 0.7974 | 0.9267 | 0.8611 | 0.6284
AUC        | 0.54 | 0.81 | 0.80 | 0.93 | 0.86 | 0.76
Precision  | 0.4222 | 0.7225 | 0.7129 | 0.9129 | 0.7847 | 0.9953
Recall     | 0.0853 | 0.9941 | 0.9941 | 0.943 | 0.9941 | 0.5381
F1-Score   | 0.1419 | 0.8368 | 0.8304 | 0.9277 | 0.8771 | 0.6985
MCC        | 0.1588 | 0.6622 | 0.6473 | 0.8539 | 0.7496 | 0.4263
FPR        | 0.0112 | 0.3792 | 0.3982 | 0.0893 | 0.2708 | 0.0101
FNR        | 0.9147 | 0.0058 | 0.0058 | 0.0570 | 0.0058 | 0.4619
Time       | 4 min 58 s | 17 min 15 s | 17 min 3 s | 18 min 26 s | 17 min 47 s | 90 min 27 s

5. Perceptron
Accuracy   | 0.9124 | 0.5714 | 0.5391 | 0.7512 | 0.613373 | 0.6881
AUC        | 0.54 | 0.57 | 0.54 | 0.75 | 0.61 | 0.43
Precision  | 0.5025 | 0.8975 | 0.8748 | 0.9275 | 0.8572 | 0.7748
Recall     | 0.0916 | 0.158 | 0.0887 | 0.5432 | 0.2688 | 0.8602
F1-Score   | 0.1549 | 0.2687 | 0.1611 | 0.6851 | 0.4093 | 0.8153
MCC        | 0.1869 | 0.2476 | 0.1736 | 0.5511 | 0.3089 | -0.1768
FPR        | 0.0087 | 0.0179 | 0.0126 | 0.0421 | 0.0444 | 0.9994
FNR        | 0.9084 | 0.8419 | 0.9113 | 0.45680 | 0.7311 | 0.1398
Time       | 11.5 s | 29.4 s | 25.8 s | 12 s | 19.2 s | 26.2 s

6. Passive Aggressive Classifier (PAC)
Accuracy   | 0.9137 | 0.6419 | 0.5684 | 0.7972 | 0.5991 | 0.7253
AUC        | 0.52 | 0.64 | 0.57 | 0.78 | 0.51 | 0.46
Precision  | 0.625 | 0.5937 | 0.8664 | 0.9283 | 0.5031 | 0.7856
Recall     | 0.0359 | 0.8552 | 0.1635 | 0.6066 | 0.9973 | 0.9145
F1-Score   | 0.068 | 0.7089 | 0.2751 | 0.7338 | 0.6688 | 0.8452
MCC        | 0.1366 | 0.3066 | 0.237 | 0.5976 | 0.0872 | -0.1317
FPR        | 0.00198 | 0.5836 | 0.0250 | 0.0465 | 0.9780 | 0.9978
FNR        | 0.9640 | 0.1418 | 0.8365 | 0.3933 | 0.0026 | 0.0854
Time       | 15.8 s | 1 min 54 s | 2 min 11 s | 2 min 27 s | 1 min 54 s | 41.7 s

7. Quadratic Discriminant Analysis
Accuracy   | 0.9124 | 0.6215 | 0.6628 | 0.8540 | 0.5874 | 0.9790
AUC        | 0.52 | 0.62 | 0.66 | 0.85 | 0.59 | 0.98
Precision  | 0.5065 | 0.9954 | 0.8456 | 0.8526 | 0.796 | 0.9946
Recall     | 0.035 | 0.2416 | 0.3964 | 0.8549 | 0.2313 | 0.9791
F1-Score   | 0.0655 | 0.3888 | 0.5398 | 0.8537 | 0.3584 | 0.9868
MCC        | 0.1156 | 0.3687 | 0.3832 | 0.7081 | 0.245 | 0.9366
FPR        | 0.0032 | 0.0011 | 0.0720 | 0.1468 | 0.0588 | 0.0211
FNR        | 0.9649 | 0.7584 | 0.6035 | 0.1450 | 0.7687 | 0.0209
Time       | 2 min 15 s | 3 min 53 s | 4 min 2 s | 4 min 17 s | 4 min 9 s | 10 min 53 s

Table 5.2: 15 Models' Output Performance (classifiers 1-7).
Metric     | Without Balancing | SMOTE | ADASYN | NRAS | SMOBD | CCR

8. Stochastic Gradient Descent
Accuracy   | 0.9126 | 0.5742 | 0.5668 | 0.6770 | 0.5924 | 0.7999
AUC        | 0.50 | 0.60 | 0.56 | 0.68 | 0.59 | 0.50
Precision  | 1.0 | 0.7116 | 0.7759 | 0.8993 | 0.8044 | 0.7999
Recall     | 0.0027 | 0.3373 | 0.1767 | 0.406 | 0.2268 | 1.0
F1-Score   | 0.0054 | 0.4576 | 0.2878 | 0.5594 | 0.3538 | 0.8888
MCC        | 0.0496 | 0.2372 | 0.1984 | 0.4321 | 0.2475 | 0.0
FPR        | 0.0 | 0.1357 | 0.0507 | 0.0451 | 0.0547 | 1.0
FNR        | 0.9973 | 0.6627 | 0.8233 | 0.5940 | 0.7731 | 0.0
Time       | 9.06 s | 2.2 s | 41.5 s | 22.6 s | 31.5 s | 1 min 24 s

9. Ridge Classifier
Accuracy   | 0.9129 | 0.6322 | 0.62039 | 0.6986 | 0.6365 | 0.7998
AUC        | 0.51 | 0.63 | 0.62 | 0.70 | 0.64 | 0.50
Precision  | 0.625 | 0.7306 | 0.7062 | 0.9164 | 0.7438 | 0.7998
Recall     | 0.018 | 0.4148 | 0.4089 | 0.4348 | 0.4125 | 0.9999
F1-Score   | 0.0349 | 0.5291 | 0.518 | 0.5898 | 0.5307 | 0.8888
MCC        | 0.0955 | 0.2919 | 0.2645 | 0.4654 | 0.3035 | -0.0046
FPR        | 0.0010 | 0.1518 | 0.1692 | 0.0393 | 0.1410 | 1.0
FNR        | 0.9820 | 0.5852 | 0.5910 | 0.5652 | 0.5874 | 0.00010
Time       | 16.5 s | 33.3 s | 32.6 s | 29.5 s | 28.9 s | 1 min 16 s

10. Linear Discriminant Analysis (LDA)
Accuracy   | 0.9085 | 0.6847 | 0.67164 | 0.8187 | 0.6515 | 0.7997
AUC        | 0.56 | 0.68 | 0.67 | 0.82 | 0.65 | 0.50
Precision  | 0.4273 | 0.7221 | 0.7045 | 0.8589 | 0.7138 | 0.7999
Recall     | 0.1293 | 0.597 | 0.5885 | 0.7613 | 0.5018 | 0.9998
F1-Score   | 0.1985 | 0.6536 | 0.6413 | 0.8072 | 0.5893 | 0.8887
MCC        | 0.1982 | 0.3747 | 0.3478 | 0.6415 | 0.3165 | -0.0062
FPR        | 0.0166 | 0.2282 | 0.2455 | 0.1241 | 0.1997 | 1.0
FNR        | 0.8707 | 0.4029 | 0.4115 | 0.2387 | 0.4982 | 0.00019
Time       | 2 min 19 s | 4 min 21 s | 4 min 17 s | 3 min 54 s | 4 min 3 s | 27 min 15 s

11. Decision Tree (DT)
Accuracy   | 0.8617 | 0.8439 | 0.83 | 0.9122 | 0.7984 | 0.9677
AUC        | 0.60 | 0.84 | 0.84 | 0.91 | 0.80 | 0.95
Precision  | 0.2457 | 0.8225 | 0.8187 | 0.8951 | 0.7883 | 0.9781
Recall     | 0.2828 | 0.8765 | 0.8692 | 0.9309 | 0.8051 | 0.9814
F1-Score   | 0.2629 | 0.8486 | 0.8432 | 0.9126 | 0.7966 | 0.9797
MCC        | 0.1872 | 0.69 | 0.6789 | 0.8231 | 0.5905 | 0.898
FPR        | 0.0833 | 0.1878 | 0.1915 | 0.10833 | 0.2146 | 0.0879
FNR        | 0.7172 | 0.1235 | 0.1307 | 0.0691 | 0.1948 | 0.0186
Time       | 17 min | 18 min 28 s | 14 min 33 s | 26 min 54 s | 17 min 31 s | 36 min 8 s

12. Nearest Centroid Classifier
Accuracy   | 0.8478 | 0.5604 | 0.5545 | 0.5976 | 0.5686 | 0.3936
AUC        | 0.56 | 0.56 | 0.55 | 0.60 | 0.57 | 0.55
Precision  | 0.1836 | 0.682 | 0.6369 | 0.9262 | 0.7002 | 0.9776
Recall     | 0.2128 | 0.2204 | 0.2483 | 0.209 | 0.2346 | 0.9814
F1-Score   | 0.1929 | 0.3332 | 0.3573 | 0.3411 | 0.3514 | 0.9795
MCC        | 0.1144 | 0.161 | 0.1357 | 0.3046 | 0.1808 | 0.0867
FPR        | 0.0912 | 0.1020 | 0.1408 | 0.0165 | 0.0997 | 0.1942
FNR        | 0.7863 | 0.7795 | 0.7517 | 0.7909 | 0.7654 | 0.7094
Time       | 4.42 s | 7.23 s | 9.48 s | 3.33 s | 3.44 s | 11.4 s

13. Multinomial Naïve Bayes
Accuracy   | 0.9130 | 0.6167 | 0.6028 | 0.6537 | 0.6190 | Nil
AUC        | 0.51 | 0.62 | 0.60 | 0.65 | 0.62 | Nil
Precision  | 0.6452 | 0.7878 | 0.7356 | 0.9329 | 0.7926 | Nil
Recall     | 0.018 | 0.3159 | 0.3179 | 0.3288 | 0.319 | Nil
F1-Score   | 0.0349 | 0.4509 | 0.4439 | 0.4862 | 0.4549 | Nil
MCC        | 0.0975 | 0.2893 | 0.2483 | 0.4012 | 0.2948 | Nil
FPR        | 0.0009 | 0.0845 | 0.1136 | 0.0234 | 0.0828 | Nil
FNR        | 0.9820 | 0.6841 | 0.6820 | 0.6711 | 0.6810 | Nil
Time       | 1.85 s | 8.48 s | 9.08 s | 3.19 s | 3.59 s | Nil

14. Complement Naïve Bayes
Accuracy   | 0.8769 | 0.6140 | 0.6041 | 0.6529 | 0.6170 | Nil
AUC        | 0.59 | 0.61 | 0.60 | 0.65 | 0.62 | Nil
Precision  | 0.2768 | 0.7904 | 0.7317 | 0.9332 | 0.7953 | Nil
Recall     | 0.2504 | 0.3068 | 0.3259 | 0.3269 | 0.3116 | Nil
F1-Score   | 0.263 | 0.442 | 0.451 | 0.4842 | 0.4477 | Nil
MCC        | 0.1963 | 0.2861 | 0.249 | 0.4 | 0.2926 | Nil
FPR        | 0.0628 | 0.0808 | 0.1189 | 0.0232 | 0.0796 | Nil
FNR        | 0.7495 | 0.6931 | 0.6740 | 0.6730 | 0.6884 | Nil
Time       | 1.84 s | 6.66 s | 6.69 s | 3.07 s | 3.57 s | Nil

15. Dummy Classifier
Accuracy   | 0.9123 | 0.4982 | 0.5012 | 0.4982 | 0.4982 | 0.7999
AUC        | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50
Precision  | nan | 0.4983 | nan | 0.4983 | 0.4983 | 0.7999
Recall     | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0
F1-Score   | nan | 0.6651 | nan | 0.6651 | 0.6651 | 0.8888
MCC        | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
FPR        | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0
FNR        | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0
Time       | 85.8 ms | 251 ms | 252 ms | 135 ms | 141 ms | 430 ms

Table 5.3: 15 Models' Output Performance (classifiers 8-15).
We investigated which classifiers are best at predicting electricity theft. For
this purpose, we reported eight performance metrics, including the ROC curves of
the 15 classifiers, and determined the related AUC ratings for easier comparison.
We found that the AUCs of a few algorithms were consistently higher than those of
the others. With regard to AUC, the classifiers SGDC, Perceptron, and BNB perform
worst among the fifteen analyzed, whereas the models QDA, DT, KNN, and LR are
comparable with one another.
Using the SGCC dataset, the 15 ML classifiers’ are evaluated to detect electricity
theft. Fifteen ML algorithms are chosen for this purpose. And each of the 15
classifiers is individually simulated 10 times and their average accuracy is noted
to overcome the variations in output results due to random data splitting. To ob-
serve the model classification performance, 8 classification parameters are chosen.
The results are noted for each class balancing technique and without any data bal-
ancing. For comparison purposes, the results of each of the 15 ML algorithms are
evaluated on the basis of imbalance dataset results and dummy classifier results.
The complete simulation results for the 15 classifiers are given in table-5.3.
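The evaluation protocol above (repeated random 80:20 splits with metrics averaged per classifier) can be sketched as follows; the synthetic dataset and the two classifiers are illustrative stand-ins, not the SGCC data or the full 15-model set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score, accuracy_score

# Synthetic stand-in for the imbalanced SGCC consumption data.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}

# Average each classifier's accuracy and AUC over 10 random 80:20
# splits to smooth out variation from the split itself.
results = {}
for name, clf in classifiers.items():
    accs, aucs = [], []
    for seed in range(10):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.2, random_state=seed, stratify=y)
        clf.fit(Xtr, ytr)
        accs.append(accuracy_score(yte, clf.predict(Xte)))
        aucs.append(roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
    results[name] = (np.mean(accs), np.mean(aucs))
```

Stratified splits are used here so that every test fold contains both classes, which keeps the AUC defined on each repetition.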
Chapter 6
Conclusion & Future Work
6.1 Conclusion
In this work, an ML-based stacked generalization technique is proposed to overcome the CPTA issue in the SG. The overall system is divided into four modules, each with a specific function.
The data obtained from the utility needs some pre-processing before it can be used for model training. The first module addresses these issues with novel techniques so that the data is processed without losing important information. NaN values are imputed with the mean-imputation method to obtain complete ECPs. The data is then normalized with the min-max scaling technique to bring it into a proper range, and a z-score capping technique is applied for efficient handling of outliers in the dataset.
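A minimal sketch of these three pre-processing steps on a toy matrix (the ordering follows the text; the values are illustrative, not SGCC readings):

```python
import numpy as np

# Toy consumption matrix: one missing reading (NaN) and one large outlier.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 100.0]])

# 1) Mean imputation: replace each NaN with its column mean.
col_mean = np.nanmean(X, axis=0)
X_imp = np.where(np.isnan(X), col_mean, X)

# 2) Min-max scaling into [0, 1].
lo, hi = X_imp.min(axis=0), X_imp.max(axis=0)
X_scaled = (X_imp - lo) / (hi - lo)

# 3) Z-score capping: clip values beyond +/- 3 standard deviations
#    of each column, so extreme outliers are bounded, not dropped.
mu, sigma = X_scaled.mean(axis=0), X_scaled.std(axis=0)
X_final = np.clip(X_scaled, mu - 3 * sigma, mu + 3 * sigma)
```

Capping (rather than deleting) outlier readings keeps every customer's consumption profile complete, which matters when each row is one user's ECP.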
In the second module, a PCA-based technique is applied for important feature extraction and data reduction. We then implement the SVM-SMOTE technique for optimal balancing of the theft-class and normal-class data obtained from the PCA step.
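A sketch of this module under stated assumptions: synthetic data stands in for the SGCC features, scikit-learn's PCA keeps the components explaining 95% of the variance, and the SVM-SMOTE balancing step (imbalanced-learn's SVMSMOTE) is only noted in a comment to keep the example self-contained:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))  # stand-in for the 1,034 daily features

# Keep enough principal components to explain 95% of the variance,
# reducing dimensionality while preserving the dominant patterns.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

# In the thesis pipeline, X_reduced would then be balanced with
# SVM-SMOTE (imbalanced-learn's SVMSMOTE) before model training.
```

Passing a float in (0, 1) as `n_components` lets scikit-learn choose the smallest number of components whose cumulative explained variance reaches that fraction.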
The benchmark classifiers are implemented in module 3 of the proposed model. After balancing, the dataset is split into an 80:20 training-to-testing ratio and fed to the four base classifiers. These base classifiers are trained on the 80% training set, and predictions are obtained from each classifier.
The final classification is performed in module 4, which combines the input of the four ML models with a meta-level DL model. The predictions of the level-0 classifiers are fed to the level-1 model so that it captures the ECP information from all the base classifiers. The final prediction obtained from the level-1 model shows enhanced classification performance. The results show that the proposed model outperforms the other benchmark ML models, achieving a high accuracy of 97.6% together with very low FPR and FNR values of 0.7% and 2.02%, respectively, which had not been achieved before by state-of-the-art techniques on the original SGCC dataset.
These results make the proposed model suitable for industrial applications for theft detection and NTL reduction.
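The stacked generalization of modules 3 and 4 can be sketched with scikit-learn's StackingClassifier. The base models and the logistic-regression meta-learner here are illustrative stand-ins (the thesis uses a deep-learning level-1 model), and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)

# Level-0 (base) classifiers: stand-ins for the four base models.
base = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Level-1 meta-learner trained on the base predictions; a logistic
# regression stands in for the thesis's DL meta-model to keep the
# sketch self-contained.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(Xtr, ytr)
score = stack.score(Xte, yte)
```

The `cv=5` argument makes the level-1 model train on out-of-fold base predictions, which avoids leaking the base models' training-set fit into the meta-learner.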
6.2 Conclusion
This study has been verified on a real-world theft-detection dataset from the State Grid Corporation of China (Fujian, China). The dataset includes the daily consumption records of 42,373 users with 1,034 features. We implemented 15 individual ML classifiers, and a comparative analysis is carried out using several data-balancing techniques. The classifiers are verified with 8 types of performance metrics for the potential detection of NTL. The aim of this research is to train different classifiers on the labeled dataset and then identify theft users in unseen data using the trained models. For this purpose, different balancing techniques are compared, including SMOTE, AdaSyn, NRAS, SMOBD, and CCR. The ML methods implemented include LR, BNB, GNB, KNN, Perceptron, PAC, QDA, SGDC, RC, LDA, DT, NCC, MNB, CNB, and a dummy classifier.
For comparison, the balancing techniques are evaluated against the imbalanced dataset, while the classifier results are evaluated against a dummy-classifier baseline.
In our findings, the classifiers performed differently under each class-balancing technique, but overall all balancing techniques showed good classification results in contrast to the imbalanced dataset. Comparing the classifiers with respect to the AUC measure, QDA showed the highest classification performance on CCR, with accuracy, AUC, precision, recall, F1-Score, MCC, FPR, and FNR values of 0.979, 0.98, 0.994, 0.979, 0.986, 0.936, 0.021, and 0.020, respectively. Other classifiers, such as DT and GNB, also show good classification results on CCR. Considering the AUC values of the 15 classifiers across the 5 balancing techniques, BNB, QDA, DT, and GNB on CCR balancing, and KNN, SGDC, RC, LDA, NCC, MNB, CNB, Perceptron, PAC, and LR on NRAS balancing, give comparatively outstanding results, with AUC values of 0.72, 0.98, 0.95, 0.95, 0.93, 0.68, 0.70, 0.82, 0.60, 0.65, 0.65, 0.75, 0.78, and 0.76, respectively. Finally, the best classifier identified in this study for electricity theft detection is QDA with the CCR technique.
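The reported metrics follow directly from the confusion matrix; a small sketch with hypothetical labels shows how FPR, FNR, and MCC are obtained:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical true labels and predictions for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

# For binary labels, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)  # false positive rate: here 1/6
fnr = fn / (fn + tp)  # false negative rate: here 1/4
mcc = matthews_corrcoef(y_true, y_pred)
```

MCC is particularly informative on imbalanced data such as theft detection, because it only rewards a classifier that does well on both the theft and normal classes.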
6.3 Future Work
The proposed system model works in offline mode. In the future, the aim will be to transform the model into an online mode with hyper-parameter tuning in order to decrease the system execution time.
We would also like to investigate different classifiers with feature-extraction steps using recently developed Generative Adversarial Networks (GANs), together with ensemble machine learning methods and deep learning algorithms. We plan to evaluate these techniques on AUC and MCC scores in order to improve the classification of theft and normal users.
Bibliography
[1] “Electrical grid distributions student energy. [Online]. Available:
https://studentenergy.org/distribution/electrical-grid/
[2] J. R. Aguero, E. Takayesu, D. Novosel, and R. Masiello, “Modernizing the
grid: Challenges and opportunities for a sustainable future,” IEEE Power
and Energy Magazine, vol. 15, no. 3, pp. 74–83, 2017.
[3] B. Wormuth, S. Wang, P. Dehghanian, M. Barati, A. Estebsari, T. P. Filom-
ena, M. H. Kapourchali, and M. A. Lejeune, “Electric power grids under
high-absenteeism pandemics: History, context, response, and opportuni-
ties,” IEEE Access, vol. 8, pp. 215 727–215 747, 2020.
[4] G. Dileep, “A survey on smart grid technologies and applications,”
Renewable Energy, vol. 146, pp. 2589–2625, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0960148119312790
[5] C. M. Flath and N. Stein, “Towards a data science toolbox for industrial
analytics applications,” Computers in Industry, vol. 94, pp. 16–25, 2018.
[6] “Applications of data science | real-world appli-
cations. [Online]. Available: https://intellipaat.com/blog/
applications-of-data-science-real-world-applications/
[7] “What is machine learning and types of machine learning [up-
dated]. [Online]. Available: https://www.simplilearn.com/tutorials/
machine-learning-tutorial/what-is-machine-learning
[8] “Ensemble methods bagging, boosting, and stack-
ing | by ankit chauhan | analytics vidhya |
medium. [Online]. Available: https://medium.com/analytics-vidhya/
ensemble-methods-bagging-boosting-and-stacking-28d006708731
116 By: Arshid Ali
MS Thesis Electricity Theft Detection
[9] E. Hossain, I. Khan, F. Un-Noor, S. S. Sikander, and M. S. H. Sunny,
“Application of big data and machine learning in smart grid, and associated
security concerns: A review,” Ieee Access, vol. 7, pp. 13 960–13 988, 2019.
[10] “U.s. energy information administration (eia). [Online]. Available:
https://www.eia.gov/tools/faqs/faq.php?id=427&t=3
[11] A. Ullah, N. Javaid, M. Asif, M. U. Javed, and A. S. Yahaya, “Alexnet,
adaboost and artificial bee colony based hybrid model for electricity theft
detection in smart grids,” IEEE Access, vol. 10, pp. 18 681–18 694, 2022.
[12] P. Massaferro, J. M. D. Martino, and A. Fernández, “Fraud detection
on power grids while transitioning to smart meters by leveraging multi-
resolution consumption data,” IEEE Transactions on Smart Grid, vol. 13,
no. 3, pp. 2381–2389, 2022.
[13] A. L. Shah, W. Mesbah, and A. T. Al-Awami, “An algorithm for accurate
detection and correction of technical and nontechnical losses using smart me-
tering,” IEEE Transactions on Instrumentation and Measurement, vol. 69,
no. 11, pp. 8809–8820, 2020.
[14] L. J. Lepolesa, S. Achari, and L. Cheng, “Electricity theft detection in smart
grids based on deep neural network,” IEEE Access, vol. 10, pp. 39 638–
39 655, 2022.
[15] N. Javaid, “A plstm, alexnet and esnn based ensemble learning model for
detecting electricity theft in smart grids,” IEEE Access, vol. 9, pp. 162 935–
162 950, 2021.
[16] M. U. Saleem, M. R. Usman, M. A. Usman, and C. Politis, “Design, deploy-
ment and performance evaluation of an iot based smart energy management
system for demand side management in smart grid,” IEEE Access, vol. 10,
pp. 15 261–15 278, 2022.
[17] M. M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and A. Gómez-
Expósito, “Detection of non-technical losses using smart meter data and
supervised learning,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp.
2661–2670, 2019.
117 By: Arshid Ali
MS Thesis Electricity Theft Detection
[18] S. Mujeeb, N. Javaid, A. Ahmed, S. M. Gulfam, U. Qasim, M. Shafiq, and J.-
G. Choi, “Electricity theft detection with automatic labeling and enhanced
rusboost classification using differential evolution and jaya algorithm,” IEEE
Access, vol. 9, pp. 128 521–128 539, 2021.
[19] Z. Yan and H. Wen, “Electricity theft detection base on extreme gradient
boosting in ami,” in 2020 IEEE International Instrumentation and Mea-
surement Technology Conference (I2MTC), 2020, pp. 1–6.
[20] A. Arif, T. A. Alghamdi, Z. A. Khan, and N. Javaid, “Towards efficient
energy utilization using big data analytics in smart cities for electricity theft
detection,” Big Data Research, vol. 27, p. 100285, 2022. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S2214579621001027
[21] Z. Yan and H. Wen, “Performance analysis of electricity theft detection for
the smart grid: An overview,” IEEE Transactions on Instrumentation and
Measurement, vol. 71, pp. 1–28, 2022.
[22] A. Pasdar and S. Mirzakuchaki, “A solution to remote detecting of illegal
electricity usage based on smart metering,” in 2007 2nd International Work-
shop on Soft Computing Applications, 2007, pp. 163–167.
[23] S. S. Ali, M. Maroof, and S. Hanif, “Smart energy meters for energy conser-
vation minimizing errors,” in 2010 Joint International Conference on Power
Electronics, Drives and Energy Systems 2010 Power India, 2010, pp. 1–7.
[24] D. Zheng and S. Wang, “Research on measuring equipment of single-phase
electricity-stealing with long-distance monitoring function,” in 2009 Asia-
Pacific Power and Energy Engineering Conference, 2009, pp. 1–4.
[25] J. Astronomo, M. D. Dayrit, C. Edjic, and E. R. T. Regidor, “Develop-
ment of electricity theft detector with gsm module and alarm system,” in
2020 IEEE 12th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and
Management (HNICEM), 2020, pp. 1–5.
[26] A. Coa, “Smart prepaid energy metering system to detect energy theft with
facility for real time monitoring,” International Journal of Electrical and
Computer Engineering (IJECE), vol. 9, pp. 4184–4191, 2019.
118 By: Arshid Ali
MS Thesis Electricity Theft Detection
[27] T. Shankar, S. I. G, S. M. S, and S. R. Gondkar, “Wireless power theft
monitoring and controlling unit for substation,” Article in IOSR Journal
of Electronics and Communication Engineering, vol. 9, pp. 10–14, 2014.
[Online]. Available: www.iosrjournals.org
[28] S. H. Mir, S. Ashruf, Y. Bhat, N. Beigh et al., “Review on smart electric
metering system based on gsm/iot,” Asian Journal of Electrical Sciences,
vol. 8, no. 1, pp. 1–6, 2019.
[29] I. N. Fovino, A. Carcano, T. De Lacheze Murel, A. Trombetta, and
M. Masera, “Modbus/dnp3 state-based intrusion detection system,” in 2010
24th IEEE International Conference on Advanced Information Networking
and Applications, 2010, pp. 729–736.
[30] C. Bandim, J. Alves, A. Pinto, F. Souza, M. Loureiro, C. Magalhaes, and
F. Galvez-Durand, “Identification of energy theft and tampered meters us-
ing a central observer meter: a mathematical approach,” in 2003 IEEE
PES Transmission and Distribution Conference and Exposition (IEEE Cat.
No.03CH37495), vol. 1, 2003, pp. 163–168 Vol.1.
[31] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, “A multi-
sensor energy theft detection framework for advanced metering infrastruc-
tures,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 7,
pp. 1319–1330, 2013.
[32] X. Xia, Y. Xiao, W. Liang, and M. Zheng, “Gthi: A heuristic algorithm
to detect malicious users in smart grids,” IEEE Transactions on Network
Science and Engineering, vol. 7, no. 2, pp. 805–816, 2020.
[33] A. A. Cárdenas, S. Amin, G. Schwartz, R. Dong, and S. Sastry, “A game
theory model for electricity theft detection and privacy-aware control in
ami systems,” in 2012 50th Annual Allerton Conference on Communication,
Control, and Computing (Allerton), 2012, pp. 1830–1837.
[34] Y. Gao, B. Foggo, and N. Yu, “A physically inspired data-driven model for
electricity theft detection with smart meter data,” IEEE Transactions on
Industrial Informatics, vol. 15, no. 9, pp. 5076–5088, 2019.
119 By: Arshid Ali
MS Thesis Electricity Theft Detection
[35] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra, “Decision
tree and svm-based data analytics for theft detection in smart grid,” IEEE
Transactions on Industrial Informatics, vol. 12, no. 3, pp. 1005–1016, 2016.
[36] Z. Yan and H. Wen, “Electricity theft detection base on extreme gradient
boosting in ami,” IEEE Transactions on Instrumentation and Measurement,
vol. 70, pp. 1–9, 2021.
[37] R. Punmiya and S. Choe, “Energy theft detection using gradient boosting
theft detector with feature engineering-based preprocessing,” IEEE Trans-
actions on Smart Grid, vol. 10, no. 2, pp. 2326–2329, 2019.
[38] F. Unal, A. Almalaq, S. Ekici, and P. Glauner, “Big data-driven detection
of false data injection attacks in smart meters,” IEEE Access, vol. 9, pp.
144 313–144 326, 10 2021.
[39] M. Panthi, “Anomaly detection in smart grids using machine learning tech-
niques,” in 2020 First International Conference on Power, Control and Com-
puting Technologies (ICPC2T), 2020, pp. 220–222.
[40] P. Chandel and T. Thakur, “Smart Meter Data Analysis for Electricity
Theft Detection using Neural Networks,” Advances in Science, Technology
and Engineering Systems Journal, vol. 4, no. 4, pp. 161–168, 2019.
[41] P. Jokar, N. Arianpoo, and V. C. M. Leung, “Electricity theft detection in
ami using customers consumption patterns,” IEEE Transactions on Smart
Grid, vol. 7, no. 1, pp. 216–226, 2016.
[42] N. Ayub, K. Aurangzeb, M. Awais, and U. Ali, “Electricity theft detec-
tion using cnn-gru and manta ray foraging optimization algorithm,” in 2020
IEEE 23rd International Multitopic Conference (INMIC), 2020, pp. 1–6.
[43] K. M. Ghori, R. A. Abbasi, M. Awais, M. Imran, A. Ullah, and L. Szathmary,
“Performance analysis of different types of machine learning classifiers for
non-technical loss detection,” IEEE Access, vol. 8, pp. 16 033–16 048, 2020.
[44] Pamir, N. Javaid, A. Almogren, M. Adil, M. U. Javed, and M. Zuair, “Rfe
based feature selection and knnor based data balancing for electricity theft
detection using bilstm-logitboost stacking ensemble model,” IEEE Access,
vol. 10, pp. 112 948–112 963, 2022.
120 By: Arshid Ali
MS Thesis Electricity Theft Detection
[45] S. Hussain, M. W. Mustafa, K. H. A. Al-Shqeerat, F. Saeed, and B. A. S. Al-
rimy, “A novel feature-engineered-ngboost machine-learning framework for
fraud detection in electric power consumption data,” Sensors, vol. 21, no. 24,
2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/24/8423
[46] S. Hussain, M. W. Mustafa, T. A. Jumani, S. K. Baloch, H. Alotaibi,
I. Khan, and A. Khan, “A novel feature engineered-catboost-based super-
vised machine learning framework for electricity theft detection,” Energy
Reports, vol. 7, pp. 4425–4436, 2021.
[47] L. Duarte Soares, A. de Souza Queiroz, G. P. López, E. M. Carreño-Franco,
J. M. López-Lezama, and N. Muñoz-Galeano, “Bigru-cnn neural network
applied to electric energy theft detection,” Electronics, vol. 11, no. 5, p. 693,
2022.
[48] Z. Qu, H. Li, Y. Wang, J. Zhang, A. Abu-Siada, and Y. Yao, “Detection of
electricity theft behavior based on improved synthetic minority oversampling
technique and random forest classifier,” Energies, vol. 13, no. 8, 2020.
[Online]. Available: https://www.mdpi.com/1996-1073/13/8/2039
[49] S. K. Gunturi and D. Sarkar, “Ensemble machine learning models for the
detection of energy theft,” Electric Power Systems Research, vol. 192, p.
106904, 2021.
[50] R. Xia, Y. Gao, Y. Zhu, D. Gu, and J. Wang, “An attention-based wide and
deep cnn with dilated convolutions for detecting electricity theft considering
imbalanced data,” Electric Power Systems Research, vol. 214, p. 108886,
2023.
[51] L. Cui, L. Guo, L. Gao, B. Cai, Y. Qu, Y. Zhou, and S. Yu, “A covert
electricity-theft cyberattack against machine learning-based detection mod-
els,” IEEE Transactions on Industrial Informatics, vol. 18, no. 11, pp. 7824–
7833, 2022.
[52] A. A. Almazroi and N. Ayub, “A novel method cnn-lstm ensembler based
on black widow and blue monkey optimizer for electricity theft detection,”
IEEE Access, vol. 9, pp. 141 154–141 166, 2021.
121 By: Arshid Ali
MS Thesis Electricity Theft Detection
[53] D. Gu, Y. Gao, K. Chen, J. Shi, Y. Li, and Y. Cao, “Electricity theft
detection in ami with low false positive rate based on deep learning and evo-
lutionary algorithm,” IEEE Transactions on Power Systems, vol. 37, no. 6,
pp. 4568–4578, 2022.
[54] M. N. Hasan, R. N. Toma, A.-A. Nahid, M. M. M. Islam, and J.-M.
Kim, “Electricity theft detection in smart grid systems: A cnn-lstm
based approach,” Energies, vol. 12, no. 17, 2019. [Online]. Available:
https://www.mdpi.com/1996-1073/12/17/3310
[55] R. Qi, J. Zheng, Z. Luo, and Q. Li, “A novel unsupervised data-driven
method for electricity theft detection in ami using observer meters,” IEEE
Transactions on Instrumentation and Measurement, vol. 71, pp. 1–10, 2022.
[56] R. Xia, Y. Gao, Y. Zhu, D. Gu, and J. Wang, “An efficient method
combined data-driven for detecting electricity theft with stacking structure
based on grey relation analysis,” Energies, vol. 15, no. 19, 2022. [Online].
Available: https://www.mdpi.com/1996-1073/15/19/7423
[57] A. Takiddin, M. Ismail, U. Zafar, and E. Serpedin, “Deep autoencoder-
based anomaly detection of electricity theft cyberattacks in smart grids,”
IEEE Systems Journal, 2022.
[58] H.-X. Gao, S. Kuenzel, and X.-Y. Zhang, “A hybrid convlstm-based anomaly
detection approach for combating energy theft,” IEEE Transactions on In-
strumentation and Measurement, vol. 71, pp. 1–10, 2022.
[59] S. A. Badawi, D. Guessoum, I. Elbadawi, and A. Albadawi, “A novel
time-series transformation and machine-learning-based method for ntl fraud
detection in utility companies,” Mathematics, vol. 10, no. 11, 2022. [Online].
Available: https://www.mdpi.com/2227-7390/10/11/1878
[60] Y. Huang and Q. Xu, “Electricity theft detection based on stacked sparse
denoising autoencoder,” International Journal of Electrical Power & Energy
Systems, vol. 125, p. 106448, 2021.
[61] K. Fei, Q. Li, and C. Zhu, “Non-technical losses detection using missing val-
ues pattern and neural architecture search,” International Journal of Elec-
trical Power & Energy Systems, vol. 134, p. 107410, 2022.
122 By: Arshid Ali
MS Thesis Electricity Theft Detection
[62] J. Pereira and F. Saraiva, “Convolutional neural network applied
to detect electricity theft: A comparative study on unbalanced
data handling techniques,” International Journal of Electrical Power
Energy Systems, vol. 131, p. 107085, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0142061521003240
[63] N. Ibrahim, S. Al-Janabi, and B. Al-Khateeb, “Electricity-theft detection in
smart grid based on deep learning,” Bulletin of Electrical Engineering and
Informatics, vol. 10, no. 4, pp. 2285–2292, 2021.
[64] M. M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and A. Gómez-
Expósito, “Detection of non-technical losses using smart meter data and
supervised learning,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp.
2661–2670, 2018.
[65] Z. Yan and H. Wen, “Electricity theft detection base on extreme gradient
boosting in ami,” IEEE Transactions on Instrumentation and Measurement,
vol. 70, pp. 1–9, 2021.
[66] G. Lin, X. Feng, W. Guo, X. Cui, S. Liu, W. Jin, Z. Lin, and Y. Ding, “Elec-
tricity theft detection based on stacked autoencoder and the undersampling
and resampling based random forest algorithm,” IEEE Access, vol. 9, pp.
124 044–124 058, 2021.
[67] R. Punmiya and S. Choe, “Energy theft detection using gradient boosting
theft detector with feature engineering-based preprocessing,” IEEE Trans-
actions on Smart Grid, vol. 10, no. 2, pp. 2326–2329, 2019.
[68] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, “Wide and deep convo-
lutional neural networks for electricity-theft detection to secure smart grids,”
IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615,
2017.
[69] P. Jokar, N. Arianpoo, and V. C. Leung, “Electricity theft detection in ami
using customers consumption patterns,” IEEE Transactions on Smart Grid,
vol. 7, no. 1, pp. 216–226, 2015.
[70] N. F. Avila, G. Figueroa, and C.-C. Chu, “Ntl detection in electric distribu-
tion systems using the maximal overlap discrete wavelet-packet transform
123 By: Arshid Ali
MS Thesis Electricity Theft Detection
and random undersampling boosting,” IEEE Transactions on Power Sys-
tems, vol. 33, no. 6, pp. 7171–7180, 2018.
[71] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra, “Decision
tree and svm-based data analytics for theft detection in smart grid,” IEEE
Transactions on Industrial Informatics, vol. 12, no. 3, pp. 1005–1016, 2016.
[72] M. Nabil, M. Ismail, M. Mahmoud, M. Shahin, K. Qaraqe, and E. Serpedin,
“Deep recurrent electricity theft detection in ami networks with random tun-
ing of hyper-parameters,” in 2018 24th International Conference on Pattern
Recognition (ICPR), 2018, pp. 740–745.
[73] Y. He, G. J. Mendis, and J. Wei, “Real-time detection of false data injection
attacks in smart grid: A deep learning-based intelligent mechanism,” IEEE
Transactions on Smart Grid, vol. 8, no. 5, pp. 2505–2516, 2017.
[74] J. Lee, Y. G. Sun, I. Sim, S. H. Kim, D. I. Kim, and J. Y. Kim, “Non-
technical loss detection using deep reinforcement learning for feature cost
efficiency and imbalanced dataset,” IEEE Access, vol. 10, pp. 27 084–27 095,
2022.
[75] A. Y. Kharal, H. A. Khalid, A. Gastli, and J. M. Guerrero, “A novel
features-based multivariate gaussian distribution method for the fraudulent
consumers detection in the power utilities of developing countries,” IEEE
Access, vol. 9, pp. 81 057–81 067, 2021.
[76] F. Shehzad, N. Javaid, S. Aslam, and M. U. Javaid, “Electricity theft detec-
tion using big data and genetic algorithm in electric power systems,” Electric
Power Systems Research, vol. 209, p. 107975, 2022.
[77] “Dealing with outliers using the z-score method - analytics vid-
hya. [Online]. Available: https://www.analyticsvidhya.com/blog/2022/08/
dealing-with-outliers-using-the-z-score-method/
[78] F. Shehzad, N. Javaid, A. Almogren, A. Ahmed, S. M. Gulfam, and A. Rad-
wan, “A robust hybrid deep learning model for detection of non-technical
losses to secure smart grids,” IEEE Access, vol. 9, pp. 128 663–128 678, 2021.
124 By: Arshid Ali
MS Thesis Electricity Theft Detection
[79] P. R. Kanna, K. Sindhanaiselvan, and M. Vijaymeena, “A defensive mecha-
nism based on pca to defend denial of-service attack,” International Journal
of Security and Its Applications, vol. 11, no. 1, pp. 71–82, 2017.
[80] T. Peng, H. Shen, Y. Zhang, P. Ren, J. Zhao, and Y. Jia, “Status forecast
and fault classification of smart meters using lightgbm algorithm improved
by random forest,” Wireless Communications & Mobile Computing (Online),
vol. 2002, 2022.
[81] G. Lin, X. Feng, W. Guo, X. Cui, S. Liu, W. Jin, Z. Lin, and Y. Ding, “Elec-
tricity theft detection based on stacked autoencoder and the undersampling
and resampling based random forest algorithm,” IEEE Access, vol. 9, pp.
124 044–124 058, 2021.
[82] T. Daniya, M. Geetha, and K. S. Kumar, “Classification and regression trees
with gini index,” Advances in Mathematics Scientific Journal, vol. 9, no. 10,
pp. 1857–8438, 2020.
[83] S. Li, Y. Han, X. Yao, S. Yingchen, J. Wang, and Q. Zhao, “Electricity theft
detection in power grids with deep learning and random forests,” Journal of
Electrical and Computer Engineering, vol. 2019, 2019.
[84] B. Patnaik, M. Mishra, R. C. Bansal, and R. K. Jena, “Modwt-xgboost
based smart energy solution for fault detection and classification in a smart
microgrid,” Applied Energy, vol. 285, p. 116457, 2021.
[85] S. Dey, Y. Kumar, S. Saha, and S. Basak, “Forecasting to classification: Pre-
dicting the direction of stock market price using xtreme gradient boosting,”
PESIT South Campus, 2016.
[86] M. R. C. Acosta, S. Ahmed, C. E. Garcia, and I. Koo, “Extremely random-
ized trees-based scheme for stealthy cyber-attack detection in smart grid
networks,” IEEE access, vol. 8, pp. 19 921–19 933, 2020.
[87] B. Sumalatha, M. Seetha, and G. NARAYANAMMA, “An efficient ap-
proach for robust image classification based on extremely randomized de-
cision trees,” International Journal of Computer Science and Information
Technologies, vol. 2, no. 2, pp. 677–685, 2011.
125 By: Arshid Ali
MS Thesis Electricity Theft Detection
[88] Z. Ouyang, X. Sun, J. Chen, D. Yue, and T. Zhang, “Multi-view stack-
ing ensemble for power consumption anomaly detection in the context of
industrial internet of things,” IEEE Access, vol. 6, pp. 9623–9631, 2018.
[89] M. R. Mosavi, M. Khishe, M. J. Naseri, G. R. Parvizi, and M. Ayat, “Multi-
layer perceptron neural network utilizing adaptive best-mass gravitational
search algorithm to classify sonar dataset,” Archives of Acoustics, vol. 44,
2019.
[90] Pamir, N. Javaid, U. Qasim, A. S. Yahaya, E. H. Alkhammash, and M. Had-
jouni, “Non-technical losses detection using autoencoder and bidirectional
gated recurrent unit to secure smart grids,” IEEE Access, vol. 10, pp. 56863–
56 875, 2022.
[91] S. Nallathambi and K. Ramasamy, “Prediction of electricity consumption
based on dt and rf: An application on usa country power consumption,”
in 2017 IEEE International Conference on Electrical, Instrumentation and
Communication Engineering (ICEICE), 2017, pp. 1–7.
[92] R. Yao, N. Wang, W. Ke, P. Chen, and X. Sheng, “Electricity theft detection
in unbalanced sample distribution: a novel approach including a mechanism
of sample augmentation,” Applied Intelligence, pp. 1–20, 9 2022. [Online].
Available: https://link.springer.com/article/10.1007/s10489-022-04069-z
[93] G. P. Siknun and I. S. Sitanggang, “Web-based classification application for
forest fire data using the shiny framework and the c5.0 algorithm,” Procedia
Environmental Sciences, vol. 33, pp. 332–339, 2016.
[94] J. Brzezinski and G. Knafl, “Logistic regression modeling for context-based
classification,” in Proceedings. Tenth International Workshop on Database
and Expert Systems Applications. DEXA 99, 1999, pp. 755–759.
[95] S. S. Noureen, S. B. Bayne, E. Shaffer, D. Porschet, and M. Berman,
“Anomaly detection in cyber-physical system using logistic regression anal-
ysis,” in 2019 IEEE Texas Power and Energy Conference (TPEC), 2019,
pp. 1–6.
126 By: Arshid Ali
MS Thesis Electricity Theft Detection
[96] X. Shan, Y. Ren, J. Lin, M. Zhai, J. Li, and B. Wang, “Power system fault
diagnosis based on logistic regression deep neural network,” in 2021 IEEE
4th International Electrical and Energy Conference (CIEEC), 2021, pp. 1–6.
[97] V. Kumar, “Evaluation of computationally intelligent techniques for breast
cancer diagnosis,” Neural Computing and Applications, vol. 33, pp. 3195–
3208, 4 2021.
[98] J. Gou, H. Ma, W. Ou, S. Zeng, Y. Rao, and H. Yang, “A
generalized mean distance-based k-nearest neighbor classifier,” Expert
Systems with Applications, vol. 115, pp. 356–372, 2019. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0957417418305293
[99] S. Aziz, S. Z. Hassan Naqvi, M. U. Khan, and T. Aslam, “Electricity theft
detection using empirical mode decomposition and k-nearest neighbors,” in
2020 International Conference on Emerging Trends in Smart Technologies
(ICETST), 2020, pp. 1–5.
[100] H. L. Qimin Cao, Lei La and S. Han, “Mixed weighted knn for imbalanced
datasets,” International Journal of Performability Engineering, vol. 14,
no. 7, p. 1391, 2018. [Online]. Available: http://www.ijpe-online.com/EN/
abstract/article_3613.shtml
[101] M. Singh, M. Wasim Bhatt, H. S. Bedi, and U. Mishra, “Performance of
bernoullis naive bayes classifier in the detection of fake news,” Materials
Today: Proceedings, 2020. [Online]. Available: https://www.sciencedirect.
com/science/article/pii/S2214785320385333
[102] Y. Guo and L. Lu, “Research on recognition and classification of user stealing
detection based on weighted naive bayes,” in 2021 International Conference
on Control Science and Electric Power Systems (CSEPS), 2021, pp. 75–78.
[103] M. F. A. Saputra, T. Widiyaningtyas, and A. P. Wibawa, “Illiteracy clas-
sification using k means-naïve bayes algorithm,” International Journal on
Informatics Visualization, vol. 2, pp. 153–158, 2018.
[104] J. Singh and R. Banerjee, “A study on single and multi-layer perceptron neu-
ral network,” in 2019 3rd International Conference on Computing Method-
ologies and Communication (ICCMC), 2019, pp. 35–40.
127 By: Arshid Ali
MS Thesis Electricity Theft Detection
[105] C. A. Mello, R. Lewis, A. Brooks-Kayal, J. Carlsen, H. Graben-
statter, and A. M. White, “(14) (pdf) supervised learning for the
neurosurgery intensive care unit using single-layer perceptron classifiers.
[Online]. Available: https://www.researchgate.net/publication/281828950_
Supervised_Learning_for_the_Neurosurgery_Intensive_Care_Unit_
Using_Single-Layer_Perceptron_Classifiers
[106] M. R. Wasef and N. Rafla, “Hls implementation of linear discriminant anal-
ysis classifier,” in 2020 IEEE International Symposium on Circuits and Sys-
tems (ISCAS), 2020, pp. 1–4.
[107] H. Sifaou, A. Kammoun, and M.-S. Alouini, “High-dimensional linear
discriminant analysis classifier for spiked covariance model,” Journal of
Machine Learning Research, vol. 21, pp. 1–24, 2020. [Online]. Available:
http://jmlr.org/papers/v21/19-428.html
[108] “Linear discriminant analysis, explained | by Yang Xiaozhou,” Towards
Data Science. [Online]. Available: https://towardsdatascience.com/
linear-discriminant-analysis-explained-f88be6c1e00b
[109] C.-C. Chang, Y.-J. Lee, and H.-K. Pao, “A passive-aggressive algorithm for
semi-supervised learning,” in 2010 International Conference on Technologies
and Applications of Artificial Intelligence, 2010, pp. 335–341.
[110] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer,
“Online passive-aggressive algorithms,” Journal of Machine Learning Research,
vol. 7, pp. 551–585, 2006.
[111] S. Mandt, M. D. Hoffman, and D. M. Blei, “Stochastic gradient descent
as approximate bayesian inference,” Journal of Machine Learning Research,
vol. 18, 2017. [Online]. Available: https://arxiv.org/abs/1704.04289v2
[112] L. Guo, M. Li, S. Xu, and F. Yang, “Application of stochastic gradient
descent technique for method of moments,” in 2020 IEEE International
Conference on Computational Electromagnetics (ICCEM), 2020, pp. 97–98.
[113] A. Sharma, “Guided stochastic gradient descent algorithm for inconsistent
datasets,” Applied Soft Computing, vol. 73, pp. 1068–1080, 2018.
[Online]. Available: https://www.sciencedirect.com/science/article/pii/
S156849461830557X
[114] A. H. Jahromi and M. Taheri, “A non-parametric mixture of gaussian naive
bayes classifiers based on local independent features,” in 2017 Artificial In-
telligence and Signal Processing Conference (AISP), 2017, pp. 209–212.
[115] E. K. Ampomah, G. Nyame, Z. Qin, P. C. Addo, E. O. Gyamfi, and M. Gyan,
“Stock market prediction with gaussian naïve bayes machine learning algo-
rithm,” Informatica (Slovenia), vol. 45, pp. 243–256, 6 2021.
[116] D. T. Barus, R. Elfarizy, F. Masri, and P. H. Gunawan, “Parallel pro-
gramming of churn prediction using gaussian naïve bayes,” in 2020 8th
International Conference on Information and Communication Technology
(ICoICT), 2020, pp. 1–4.
[117] V. K. V and P. Samuel, “A multinomial naïve bayes classifier for identifying
actors and use cases from software requirement specification documents,”
in 2022 2nd International Conference on Intelligent Technologies (CONIT),
2022, pp. 1–5.
[118] M. K. Saad, “The impact of text preprocessing and term weighting on arabic
text classification,” 2010.
[119] D. Arpit, S. Wu, P. Natarajan, R. Prasad, and P. Natarajan, “Ridge re-
gression based classifiers for large scale class imbalanced datasets,” in 2013
IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp.
267–274.
[120] “Ridge classification concepts python examples,” Data
Analytics. [Online]. Available: https://vitalflux.com/
ridge-classification-concepts-python-examples/
[121] S. Johri, S. Debnath, A. Mocherla, A. Singh, A. Prakash, J. Kim, and
I. Kerenidis, “Nearest centroid classification on a trapped ion quantum
computer,” npj Quantum Information, vol. 7, pp. 1–11, 2021.
[Online]. Available: https://www.nature.com/articles/s41534-021-00456-5
[122] E. N. Tamatjita and A. W. Mahastama, “Comparison of music genre clas-
sification using nearest centroid classifier and k-nearest neighbours,” in
2016 International Conference on Information Management and Technol-
ogy (ICIMTech), 2016, pp. 118–123.
[123] “Classification.” [Online]. Available: https://idc9.github.io/stor390/notes/
classification/classification.html
[124] D. Menaka, L. P. Suresh, and S. S. P. Kumar, “Land cover classification of
multispectral satellite images using qda classifier,” in 2014 International
Conference on Control, Instrumentation, Communication and Computa-
tional Technologies (ICCICCT), 2014, pp. 1383–1386.
[125] “9.2.8 - quadratic discriminant analysis (qda),” STAT 508. [Online]. Available:
https://online.stat.psu.edu/stat508/lesson/9/9.2/9.2.8
[126] “10.2 - discriminant analysis procedure,” STAT 505. [Online]. Available:
https://online.stat.psu.edu/stat505/lesson/10/10.2
[127] B. Seref and E. Bostanci, “Performance of naïve and complement naïve bayes
algorithms based on accuracy, precision and recall performance evaluation
criterions,” Int. J. Comput, vol. 8, pp. 75–92, 2019.
[128] J. D. Rennie, L. Shih, J. Teevan, and D. R. Karger, “Tackling the poor
assumptions of naive bayes text classifiers,” in Proceedings of the 20th inter-
national conference on machine learning (ICML-03), 2003, pp. 616–623.
[129] A. Martino, A. Rizzi, and F. M. F. Mascioli, “Supervised approaches for pro-
tein function prediction by topological data analysis,” in 2018 International
Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8.
[130] G. Figueroa, Y.-S. Chen, N. Avila, and C.-C. Chu, “Improved practices in
machine learning algorithms for ntl detection with imbalanced data,” in 2017
IEEE Power Energy Society General Meeting, 2017, pp. 1–5.
[131] A. Fernández, S. Garcia, F. Herrera, and N. V. Chawla, “Smote for learning
from imbalanced data: progress and challenges, marking the 15-year
anniversary,” Journal of Artificial Intelligence Research, vol. 61, pp. 863–905,
2018.
[132] J. Brandt and E. Lanzén, “A comparative review of smote and adasyn in
imbalanced data classification,” 2021.
[133] M. Koziarski, M. Woźniak, and B. Krawczyk, “Combined cleaning and
resampling algorithm for multi-class imbalanced data with label noise,”
Knowledge-Based Systems, vol. 204, p. 106223, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0950705120304330
[134] W. A. Rivera, “Noise reduction a priori synthetic over-sampling for class
imbalanced data sets,” Information Sciences, vol. 408, pp. 146–161, 2017.
[135] Q. Cao and S. Wang, “Applying over-sampling technique based on data den-
sity and cost-sensitive svm to imbalanced learning,” in 2011 International
Conference on Information Management, Innovation Management and In-
dustrial Engineering, vol. 2, 2011, pp. 543–548.