A Stacked Machine and Deep Learning Model for
Electricity Theft Detection to Secure Smart Grid
By
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
MS thesis
In
Electrical Computer Engineering
COMSATS University Islamabad
Fall 2022
MS Thesis Electricity Theft Detection
COMSATS University Islamabad
A Stacked Machine and Deep Learning Model for
Electricity Theft Detection to Secure Smart Grid
A Thesis Presented to
COMSATS University Islamabad
In partial fulfillment
of the requirement for the degree
Of
MS Electrical & Computer Engineering
By
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
Fall 2022
i By: Arshid Ali
A Stacked Machine and Deep Learning Model for
Electricity Theft Detection to Secure Smart Grid
A Post Graduate Thesis submitted to the Department of Electrical Computer
Engineering as partial fulfillment for the award of the degree of MS Electrical
Computer Engineering.
Name Registration Number
Arshid Ali CIIT/Registration No: SP21-REE-004/ISB
Supervisor: Dr. Laiq Khan
Professor, Department of Electrical Computer Engineering
COMSATS University Islamabad

Co-Supervisor: Dr. Nadeem Javaid
Professor, Department of Computer Science
COMSATS University Islamabad
Signature:
Arshid Ali
(CIIT/Registration No: SP21-REE-004/ISB)
Final Approval
This thesis titled
A Stacked Machine and Deep Learning Model for Electricity Theft
Detection to Secure Smart Grid
By
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
has been approved for
COMSATS University Islamabad
External Examiner
Examiner Name
Department of Computer Science
ABCD University, Islamabad
Supervisor: Dr. Laiq Khan
Professor, Department of Electrical Computer Engineering
COMSATS University Islamabad

Co-Supervisor: Dr. Nadeem Javaid
Professor, Department of Computer Science
COMSATS University Islamabad
Head of Department:
HoD Name
HoD Title,
Department of ECE
COMSATS University Islamabad
Declaration
I, Arshid Ali, registration number CIIT/Registration No: SP21-REE-004/ISB, hereby
declare that I have produced the work presented in this thesis during the scheduled
period of study. I also declare that I have not taken any material from any source
except where due reference is made, and that the amount of plagiarism is within the
acceptable range. If a violation of HEC rules on research has occurred in this thesis,
I shall be liable to punishable action under the plagiarism rules of the HEC.
Date: December 19, 2022
Arshid Ali
CIIT/Registration No:
SP21-REE-004/ISB
Certificate
It is certified that Arshid Ali (CIIT/Registration No: SP21-REE-004/ISB) has
carried out all the work related to this thesis under my supervision at the Depart-
ment of Electrical Computer Engineering, COMSATS University Islamabad, and
the work fulfills the requirement for award of MS degree.
Date: December 19, 2022
Head of Department:
HoD Name
HoD Title
Department of Electrical Computer Engineering
COMSATS University Islamabad
Supervisor:
Dr. Laiq Khan
Professor
COMSATS University Islamabad
Co-Supervisor:
Dr. Nadeem Javaid
Professor
COMSATS University Islamabad
Dedication
Dedicated to my Family, Teachers and Friends.
Acknowledgements
This thesis would not have been possible without the support of many people.
Prof. Dr. Laiq Khan and Prof. Dr. Nadeem Javaid have been ideal teachers,
mentors, and thesis supervisors, offering advice and encouragement with a perfect
blend of insight and humor. I'm proud of, and grateful for, my time working with
them. Thanks to my advisers, Prof. Dr. Laiq Khan and Prof. Dr. Nadeem
Javaid, who read my numerous revisions and helped clarify the confusion. Also
thanks to Dr. Junaid Ikram, Dr. Guftar Ahmad, and Dr. Fasih Uddin, who
offered guidance and support.
Thanks to the COMSATS University Islamabad for awarding me a Dissertation
Completion Fellowship, and providing me with the financial means to complete
this project. And finally, thanks to my colleagues, parents, and numerous friends
who endured this long process with me, always offering support and love.
Thank You.
Arshid Ali
CIIT/Registration No: SP21-REE-004/ISB
Abstract
Abstract-1
Energy management and efficient asset utilization play an important role in the
economic development of a country. The electricity produced at the power station
faces two types of losses from the generation point to the end user. These losses are
technical losses (TL) and non-technical losses (NTL). The technical losses are due
to the use of inefficient equipment. Non-technical losses mainly occur due to the
illegal use of electricity by customers in the form of theft at the consumption
level. These losses in the smart grid (SG) are the main issue in maintaining
grid stability and cause revenue loss to the utility. The automatic metering infras-
tructure (AMI) system has reduced grid instability but it has opened up new ways
for NTLs in the form of different cyber-physical theft attacks (CPTA). Machine
learning (ML) techniques can be used to detect and minimize CPTA. However,
they have certain limitations and cannot capture the energy consumption pat-
tern (ECP) of all the users, which decreases the performance of ML techniques
in detecting malicious users. In this paper, we propose a novel ML-based stacked
generalization method for the cyber-physical theft issue in the smart grid. The
original data obtained from the grid is pre-processed to improve model training
and processing. This includes NaN-imputation, normalization, outliers’ capping,
SVM-SMOTE balancing, and PCA-based data reduction techniques. The pre-
processed dataset is provided to the ML models LGB, ET, XGBoost, and RF, to
accurately capture all consumers’ overall ECP. The predictions from these base
models are fed to a meta-classifier multi-layer perceptron (MLP). The MLP com-
bines the learning capability of all the base models and gives an improved final
prediction. The proposed structure is implemented and verified on the publicly
available large real-world dataset of the State Grid Corporation of China (SGCC).
The proposed model outperformed the individual base classifiers and the existing
research in terms of CPTA detection with FPR, FNR, F1-Score, and accuracy
values of 0.72%, 2.05%, 97.6%, and 97.69%, respectively.
Abstract-2
Electricity plays an important role in our daily life and the demand is increas-
ing day by day. Therefore, while meeting the required energy demand, efficient
energy resource utilization should also be considered due to the limited resources
available. The electricity generated at the power station incurs significant losses
in reaching the end consumers. These losses fall into technical and non-technical
(NT) categories, of which NT losses amount to billions of dollars. Many techniques
have been introduced by utility companies to address this issue. Recently, many
machine learning-based classifiers have been utilized to deal with NTL. However,
little study has been conducted on the evaluation criteria used in NTL identification
to assess how successful or inaccurate an algorithm is at properly forecasting
non-technical loss. Similarly, the presence of unbalanced classes in this sort of
data opens a gap for research on unbalanced data management solutions, which
are mostly unexplored in the literature. In order to choose which
classifier and balancing method produce the best classification results for the theft
detection problem, the authors in this paper carried out a comparative analysis
of various machine-learning algorithms on several data balancing techniques. The
given research applied 15 simple machine-learning techniques: LR, BNB, GNB,
KNN, Perceptron, PAC, QDA, SGDC, RC, LDA, DT, NCC, MNB, CNB, and a
dummy classifier, while SMOTE, ADASYN, SMOBD, NRAS, and CCR are
considered for data balancing. The area under the ROC curve (AUC), F1-score,
and six other performance measures, which are better suited for this type of situation,
were used as comparative measures. The results indicate that some classifiers
show better performance than others when compared to different class balancing
methods.
Contents
Thesis Title ............................... i
Cover Page ............................... i
Supervisor Approval .......................... i
Final Approval ............................. iii
Declaration ............................... iv
Certificate ................................ v
Dedication ................................ vi
Acknowledgements ........................... vii
Abstract .................................viii
Contents x
List of Figures xvi
List of Tables xviii
List of Abbreviations xix
1 Preliminary 1
1.1 Introduction ............................... 2
1.2 Electricity Grid ............................. 3
1.2.1 Traditional Grid ........................ 4
1.2.2 Smart Grid ........................... 5
1.2.2.1 Characteristics of smart grid ............. 5
1.3 Data Science ............................... 6
1.3.1 Data Science Applications ................... 7
1.4 Structures of Machine Learning Algorithms .............. 8
1.4.1 Individual Structure ...................... 8
1.4.2 Ensemble Structure ....................... 8
1.4.2.1 Bagging Model .................... 9
1.4.2.2 Boosting Model .................... 10
1.4.2.3 Stacking Model .................... 12
1.5 Data Science and Smart Grid ..................... 13
1.6 Summary ................................ 14
2 Introduction 15
2.1 Introduction ............................... 16
2.1.1 Background and Motivation .................. 16
2.1.2 Non-Technical Losses Issues in Smart Grid .......... 20
2.1.3 Contributions .......................... 22
2.2 Layout of Thesis ............................ 25
2.2.1 Summary ............................ 25
3 Literature Review 26
3.1 NTL detection schemes categories ................... 27
3.1.1 Hardware Based ......................... 27
3.1.2 Game theory Based ....................... 29
3.1.3 Artificial Intelligence Based .................. 29
3.2 Problem Analysis ............................ 37
3.3 Summary ................................ 40
4 Proposed Model-1 and Simulation Results 42
4.1 Introduction ............................... 43
4.2 Dataset Information .......................... 45
4.3 Pre-processing .............................. 45
4.3.1 Missing Data Imputation .................... 46
4.3.2 Handling Outliers ........................ 47
4.3.3 Unit based Normalization ................... 49
4.3.4 Data Balancing ......................... 50
4.3.5 Feature Engineering ...................... 52
4.4 Model Selection ............................. 54
4.4.1 Base Learner-1 ......................... 54
4.4.2 Base Learner-2 ......................... 54
4.4.3 Base Learner-3 ......................... 56
4.4.4 Base Learner-4 ......................... 56
4.4.5 Stacking Model ......................... 57
4.5 MLP Mathematical Modeling ..................... 58
4.6 Performance Metrics .......................... 59
4.7 Simulation Setup ............................ 62
4.8 Results Discussion and Evaluation ................... 62
4.9 Summary ................................ 66
5 Proposed Model-2 and Simulation Results 68
5.1 Classification Algorithms ........................ 69
5.1.1 Decision Tree (DT) ....................... 69
5.1.2 Logistic Regression ....................... 70
5.1.3 K Nearest Neighbors Classifier ................. 72
5.1.4 Bernoulli Naive Bayes Classifier ................ 73
5.1.5 Perceptron ............................ 74
5.1.6 Linear Discriminant Analysis ................. 76
5.1.7 Passive Aggressive Classifier .................. 78
5.1.8 Stochastic Gradient Descent .................. 80
5.1.9 Gaussian Naive Bayes ..................... 81
5.1.10 Multinomial Naive Bayes Algorithm .............. 82
5.1.11 Ridge Classifier ......................... 84
5.1.12 Nearest Centroid Classifier ................... 85
5.1.13 Quadratic Discriminant Analysis ............... 86
5.1.14 Complement Naive Bayes ................... 88
5.1.15 Dummy/Blind Classifier .................... 89
5.2 Data Balancing Techniques ....................... 90
5.2.1 Synthetic Minority Over-Sampling Technique ........ 90
5.2.2 Adaptive Synthetic sampling approach ............ 91
5.2.3 Combined Cleaning and Re-sampling Technique ....... 91
5.2.4 Noise Reduction A Priori Synthetic Over-Sampling (NRAS) 92
5.2.5 SMOBD (Synthetic Minority Over-sampling Based on Samples Density) . . 92
5.3 Research Methodology ......................... 93
5.4 Evaluation Parameters Used ...................... 95
5.4.1 Accuracy ............................. 95
5.4.2 Recall .............................. 95
5.4.3 Precision ............................. 96
5.4.4 F1-Score ............................. 96
5.4.5 Area Under the Curve ..................... 96
5.4.6 False Positive Rate ....................... 96
5.4.7 False Negative Rate ....................... 96
5.4.8 Matthews Correlation Coefficient ............... 97
5.4.9 Receiver Operator Characteristic ............... 97
5.5 Dataset and Simulation setup ..................... 97
5.6 Simulation Results and Analysis .................... 98
5.6.1 Output Performance using SMOTE-based Data Balancing . 100
5.6.2 Output Performance using ADASYN-based Data Balancing 101
5.6.3 Output Performance using SMOBD-based Data Balancing . 103
5.6.4 Output Performance using NRAS-based Data Balancing . . 104
5.6.5 Output Performance using CCR-based Data Balancing . . . 106
6 Conclusion & Future Work 112
6.1 Conclusion ................................113
6.2 Conclusion ................................114
6.3 Future Work ...............................115
Bibliography 116
List of Figures
1.1 Individual Machine Learning Model. ................. 9
1.2 Bagging Type Machine Learning Model. ............... 10
1.3 Boosting Machine Learning Model. .................. 11
1.4 Stacking Machine Learning Model. .................. 12
2.1 Energy Generation by Sources. .................... 17
2.2 Economic Losses in Different Countries in Billion of USD. . . . . . . 19
2.3 Structure of AMI Network. ....................... 20
2.4 ML Techniques in Various Fields. ................... 22
2.5 Proposed-Model Flow chart. ...................... 24
4.1 Proposed ETD Stacked Generalization Model. ............ 44
4.2 Electricity Consumption Pattern of Two Random Consumers from
SGCC Dataset. ............................. 46
4.3 Total contribution of outliers. ..................... 48
4.4 Variations in ECP of Electric Theft and Honest Consumer. . . . . . 50
4.5 SVM-SMOTE base Balanced Dataset(SGCC). ............ 51
4.6 SVM-SMOTE System Diagram. .................... 52
4.7 Proposed-Model Performance on SGCC Dataset. .......... 62
4.8 Proposed-Model Confusion Matrix on SGCC Dataset. ........ 63
4.9 Precision-Recall Curve of the Base Models and Proposed-Model. . . 64
4.10 Accuracy Comparison of Level-0 and Proposed-Model. ....... 65
4.11 Comparison of Base Models’ ROC with Proposed-Model ROC. . . . 65
4.12 Comparison of AUC, F1-SCORE and Accuracy of Base Models and
Proposed Model. ............................ 66
5.1 Proposed Electricity Theft Detection Model. ............. 94
List of Tables
1 List of abbreviations and acronyms. .................xix
2.1 U.S. Utility-Scale Electricity Generation by Source. ......... 18
3.1 ...................................... 37
4.1 SGCC Original Dataset Information. ................. 45
4.2 Output Performance of Different Models on pre-processed data. . . 63
4.3 Models Performance on Different Data Splitting. ........... 64
4.4 F1-Score, AUC, and Accuracy of the Base Models and Proposed Model . . 66
5.1 Information of Real World SGCC Dataset. .............. 93
5.2 15 Models Output Performance ....................109
5.3 15 Models Output Performance ....................110
Abbreviations Full Form
AMI Automatic metering infrastructure
CPTA Cyber-physical theft attack
PCA Principal component analysis
SVM Support vector machine
SMOTE Synthetic minority oversampling technique
LGB Light gradient boosting
ET Extremely randomized tree
XGBoost Extreme gradient boosting
MLP Multi-layer perceptron
RF Random forest
TNR True negative rate
FNR False negative rate
EPL Electric power loss
NTL Non-technical loss
TL Technical loss
SG Smart grid
ECP Energy consumption pattern
TPR True positive rate
FPR False positive rate
AUC Area under the curve
ROC Receiver operating characteristics curve
NaN Not a number
Symbols Description
σ Standard deviation
Z Z-score
µ Mean value
W Weight matrix
x_m,n Daily energy consumption
N Number of features
λ Eigenvalues
c Number of classes
b Bias
x_i Input data point
p_i Probability of outcome of a class
Table 1: List of abbreviations and acronyms.
Chapter 1
Preliminary
1.1 Introduction
The electric power grid is the most complex man-made network to date, designed
to work reliably in extreme environmental conditions and across different geographical
locations. The integration of large renewable resources into the system also affects
its operation. This requires the electricity network to be modernized to feed
electricity to the community. Upgrading the traditional electricity network with new
technological and innovative infrastructure has made the grid smarter. A smart grid
is defined as an intelligent electricity network that allows a two-way flow of
electricity and information. This information flow in the smart grid is made possible
by an automatic metering infrastructure (AMI) system, which interconnects the
consumer and the utility. Thus, big data is obtained at the utility end from a
community of energy consumers (ECs). Manually analyzing this big data for a
particular power flow between consumers and utilities is a challenging and
time-consuming task.
Present data-driven approaches make it possible to analyze the energy flow to a
given consumer in the community using the acquired data. Data science, the science
of data, makes it easy to gain information from data available in raw form. Due to
this informative nature, data science has made its way into almost every field of
life. In smart grids, data science is used in load forecasting, wind and solar energy
forecasting, electric vehicle (EV) battery life forecasting, grid intelligence,
operation and planning, fault detection, and electricity theft detection, which is
the main research topic of this thesis.
Machine learning techniques can be implemented in different ways in the
above-mentioned fields. For example, an algorithm can be used individually, or
several can be combined to obtain a different model such as bagging, boosting, or
stacking. These combinations of algorithms make it easier to use the available data
to predict a future event. Following these approaches, the electricity theft
detection issue can be addressed.
1.2 Electricity Grid
An electrical grid is an interconnection for energy dispatch from the generation
point to the end consumer. Electric grids come in different sizes and may span a
country or an entire continent. A grid comprises a complete energy system
consisting of generation units, transmission lines, distribution points, transmission
and distribution transformers, and the load [1].
Generally, the generation units are situated at locations far from the end consumer
while the electrical grid connects both the production and consumption ends. The
whole energy system is divided into the following three major parts:
1. GENERATION
The electrical generation is of two types:
(a) A centralized system consists of a few large, traditional generating points
that are far away from the end users. This type of generation includes hydro,
nuclear, coal, natural gas, and large solar and wind farms. Here, the grid acts
as an interconnecting point between generation and consumers.
(b) Distributed generation, in contrast, exists close to the consumption points,
for example, a diesel generator or a rooftop solar plant.
2. TRANSMISSION and DISTRIBUTION
The transmission system consists of transformers, substations, and power lines
that carry electricity from the point of generation to the point of consumption.
For long distances, electricity is delivered at high voltage because this minimizes
resistive losses in the transmission lines. To transmit electricity, substations
contain transformers that step up the voltage at the point of generation.
Transmission occurs via power lines, either overhead or underground. At the
end-use points, another substation steps the voltage down for consumption. Energy
is then delivered to end users through distribution lines and distribution
transformers.
3. CONSUMPTION
Commercial, industrial, and residential consumers form the three types of
consumers. Each of these consumers has different needs, but electricity generally
provides light and power for electrical devices.
Prior to the introduction of demand-side management, the transmission and
distribution (T&D) system was designed and built to handle peak loads. It was a
passive delivery network for delivering energy to consumers: the T&D system
provided electricity over the whole network, from which the end consumers used
only the energy they needed, and the rest was discarded [2].
1.2.1 Traditional Grid
The traditional grid is the electricity network that allows the flow of
electricity in one direction, with large central power generation units. The
present electricity network has been upgraded from this traditional system. The
traditional power grid is more like a physical system with manual operation.
However, due to the large number of challenges facing this traditional system, a
highly reliable and sustainable grid is required to ensure a continuous supply of
electricity to modern societies [3].
In the future, the electricity grid is expected to be equipped with modern
technological and innovative components, as today's power grid is facing
challenges of the following kinds:
1. The addition of a large number of intermittent energy resources.
2. The old electricity network, which is facing severe threats due to societal
and population growth, needs to be renewed by adding modern devices and energy
management techniques.
3. The existing power grid is affected by natural disasters (such as earthquakes,
floods, and hurricanes), which badly affect the grid network, so a flexible
energy system is required.
1.2.2 Smart Grid
The electricity grids that exist today were mostly built in the past, when the
cost of electricity generation was very low. Almost the same energy flow method,
from the central generation plant to the consumer, is still in use that was
adopted almost 100 years ago, in which consumers' energy needs were met through
distribution units together with a surplus energy saving mechanism. The smart
grid helps to revolutionize this old system using modern communication and
information technologies. However, a huge capital investment is needed for even a
single significant change in this large and complex system. The adoption of
demand-side management helps the smart grid adapt to climate change and extreme
conditions by integrating renewable energy sources. This energy management in the
smart grid eases grid planning for the utility and helps reduce greenhouse gas
emissions. It also helps to control and monitor the operation of the power
system. In short, the smart grid is an electrical system that can intelligently
integrate transmission, distribution, and prosumers, and that ensures the two-way
flow of information and energy to secure the electricity supply [4].
Thus, the working definition becomes: the smart grid is an advanced digital power
system with two-way power flow that is self-healing, adaptive, resilient, and
sustainable, with future prediction under different uncertainties. It is equipped
for interoperability with present and future standards of components, devices,
and systems, and is cyber-secured against malicious attacks.
1.2.2.1 Characteristics of smart grid
The smart grid uses modern technology, intelligent control, communication,
monitoring, and a flexible system to regulate the normal operation of the power
system. Some of the main attributes of the smart grid are mentioned below.
1. The smart grid gives energy price information to consumers using real-time
communication and demand-side management for continuous power flow.
2. It also helps to accommodate distributed generators (DGs), battery systems,
and other micro-level energy and storage systems, thereby improving the
flexibility of network operation.
3. It also optimizes the operation and management of energy resources by
considering what electricity is needed and when it is needed.
4. Smart grid operates durably during extreme weather conditions and cyber-
physical attacks, thus increasing energy security.
5. It decreases the concerns over environmental damage from fossil-fired power
stations.
6. It benefits prosumers by using a real-time communication system to inform
them about on- and off-peak hours.
7. It revolutionizes the modern transportation system by using electric vehicles
as load and energy-storing devices.
8. It reduces energy losses and wastage in the power system.
9. Smart grid reduces pollution and greenhouse gases by ensuring the use of
renewable energy sources.
1.3 Data Science
Manufacturing has seen a massive digital shift in the previous ten years. A
next-generation industrial infrastructure has been made possible by wireless
networking, reduced sensor and data storage costs, and other factors. It should
come as no surprise that modern manufacturing businesses have access to a huge
variety of data sources that offer enormous quantities of production and
performance monitoring data. In 2010, the manufacturing industry produced more
than two exabytes of data. Data, however, can be a very valuable resource that is
increasingly important to global corporate operations, provided it is properly
handled. Data has become the new oil in future IT-enhanced systems.
As a result, businesses find it challenging to develop cutting-edge analytics tools
to utilize their data for business benefits. It is possible to promote data-driven
decision-making and improve the efficiency of current business processes by utiliz-
ing this data with modern analytics tools. Going back to the new oil comparison,
an analytics solution is comparable to an oil refinery in that it turns raw materials
into valuable products.
The "Big Data Revolution" has attracted a lot of IT consultants, but businesses
are frequently let down by the results and confused by the volume and diversity
of data. Instead of widespread applications, the concept of industrial analytics
mostly consists of predictions, ideas, and projects. The recent explosion of
machine learning research has produced many useful algorithms and tools, but it
hasn't given operators the tools they need. As a result, industrial
decision-makers explore a world of newly needed outputs [5].
1.3.1 Data Science Applications
Data science has many fantastic applications. It plays a significant role not
just in business, but also in sectors like healthcare, robotics, and medicine [6].
Some of the applications of data science are mentioned here:
1. Education
2. Airline Route Planning
3. Healthcare Industry
4. Banking and Finance
5. Filtered Internet Search
6. Product Recommendation Systems
7. Digital Advertising
8. Image Processing
9. Disease Prediction
10. Anomaly Detection
11. Smart Grid
1.4 Structures of Machine Learning Algorithms
A machine learning algorithm is a process used by AI systems to carry out their
tasks, which often include estimating output values from input data. Classification
and regression are the two basic techniques used by machine learning systems.
The best machine learning algorithm to use depends on a number of variables,
including the quantity, quality, and variety of the data, as well as the
conclusions that organizations want to draw from it. Accuracy, training duration,
parameters, and many other factors are also important. As a result, selecting the
best algorithm requires consideration of a variety of factors, including business
goals, specifications, testing, and available time. Even the most expert data
scientists are unable to predict which algorithm will perform best without first
testing the alternatives [7].
Machine learning algorithms can mainly be used in two ways, i.e., as an
individual model or as an ensemble model. These structures are explained below:
1.4.1 Individual Structure
Typically, a single machine learning process begins with training data being fed
into the desired algorithm. The training data is used to train the given model,
which then makes new predictions. The output predictions are then compared with
the original labels for performance analysis purposes, as seen in Fig. 1.1.
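As a concrete sketch of this workflow (not code from the thesis), the snippet below trains a single scikit-learn classifier on synthetic data standing in for labeled consumption records, then compares its predictions with the original labels:

```python
# Minimal individual-model workflow from Fig. 1.1:
# training data -> algorithm -> trained model -> predictions -> comparison.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Synthetic stand-in for labeled honest/theft consumption records.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = DecisionTreeClassifier(random_state=0)   # the "ML Algorithm" block
model.fit(X_train, y_train)                      # training data -> trained model
y_pred = model.predict(X_test)                   # input data -> output prediction

# Performance analysis: compare predictions with the original labels.
print(f"accuracy: {accuracy_score(y_test, y_pred):.3f}")
```

The same fit/predict/score pattern applies to every individual scikit-learn classifier used later in this thesis.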
1.4.2 Ensemble Structure
Machine Learning Ensemble Methods help to create multiple models and then
combine them to produce improved results, some ensemble methods are catego-
rized into the following groups [8].
Figure 1.1: Individual Machine Learning Model.
1.4.2.1 Bagging Model
Bagging is a homogeneous ensemble technique in which several algorithms of the
same type are combined in parallel. These algorithms are then fed different
subsets of the original training set for model-learning purposes. Each subset is
generated randomly, with replacement, from the original training set. This
process is called bootstrap aggregation. After the models are trained on all the
available subsets, the outputs are obtained from all the base algorithms. To
obtain the final output of the whole system, a majority-voting mechanism is
adopted in the classification case, where the most frequent prediction among all
is chosen, whereas in the regression case an average (mean) of all the
predictions is taken.
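The bootstrap-and-vote procedure just described can be sketched from scratch (synthetic data and scikit-learn trees as the base algorithm; illustrative only, not the thesis's implementation):

```python
# From-scratch bootstrap aggregation: each base learner is trained on a
# random subset drawn WITH replacement, and the final class is decided
# by majority vote over all learners.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

models = []
for _ in range(25):                                 # 25 parallel base learners
    idx = rng.integers(0, len(X), size=len(X))      # bootstrap: sample with replacement
    models.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

votes = np.stack([m.predict(X) for m in models])    # shape (25, n_samples)
majority = (votes.mean(axis=0) > 0.5).astype(int)   # most frequent class wins
print("ensemble training accuracy:", (majority == y).mean())
```

Using an odd number of learners (25 here) avoids ties in the binary majority vote.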
A visual representation of bagging is shown in Figure 1.2.
Advantages of a Bagging Model:
1. Bagging greatly decreases the error by reducing the variance of the model's
predictions.
2. Bagging methods give good performance on the available training set by using
bootstrap aggregation.
Figure 1.2: Bagging Type Machine Learning Model.
3. If the training set is very large, bagging can save computational time by training each model on a relatively small subset in parallel, which can also increase the accuracy of the model.
4. Works well with small datasets as well.
Examples of bagging-type methods are the Extra Trees and Random Forest algorithms.
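The bagging process described above can be sketched with scikit-learn; the synthetic dataset is an illustrative assumption, and the random forest is included only because the text names it as a bagging-type method.

```python
# Illustrative bagging sketch: several base learners trained in parallel on
# bootstrap samples (drawn with replacement), with majority voting at the end.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

# bootstrap=True draws each training subset randomly with replacement;
# the base learner defaults to a decision tree.
bag = BaggingClassifier(n_estimators=50, bootstrap=True, random_state=1)
bag.fit(X_tr, y_tr)

# Random forest: a bagging-type method mentioned in the text.
rf = RandomForestClassifier(n_estimators=50, random_state=1).fit(X_tr, y_tr)

print(f"Bagging accuracy:       {bag.score(X_te, y_te):.3f}")
print(f"Random forest accuracy: {rf.score(X_te, y_te):.3f}")
```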
1.4.2.2 Boosting Model
Boosting is another type of homogeneous ensemble model that uses several weak learners to build a strong learner. Boosting combines the base algorithms sequentially to reduce the error in the output prediction. In the process, the output from the previous model is given to the next algorithm, while the result from each model is saved iteratively. The whole training set is given to the first algorithm, and training then proceeds sequentially. Finally, when all the models are trained and their outputs are obtained, the predictions from all models are majority-voted in the classification case and averaged in the regression case.
Also, in boosting, misclassified data points in the training set are given more weight so that subsequent learners focus on correcting their errors, while in bagging the training samples are taken randomly from the whole population, as shown in Fig. 1.3.
Figure 1.3: Boosting Machine Learning Model.
In contrast to bagging, which trains weak learners simultaneously using bootstrap aggregation, boosting trains base learners sequentially, with each learner's purpose being to minimize the errors of the previous one.
Boosting, like bagging, can be used for regression as well as for classification problems.
Boosting is mainly focused on reducing bias errors.
Advantages of a Boosting Model:
A single decision tree suffers from:
1. Inability to extract a linear combination of features.
2. High variance, leading to unstable predictions.
That is where boosting comes into the picture: it minimizes the variance by taking into consideration the results from various trees.
AdaBoost (Adaptive Boosting), Gradient Boosting, and XGBoost (Extreme Gradient Boosting) are a few common examples of boosting techniques.
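The sequential boosting idea above can be sketched with scikit-learn using two of the named techniques, AdaBoost and gradient boosting; the synthetic dataset is an illustrative assumption.

```python
# Illustrative boosting sketch: weak learners are fitted sequentially, each one
# focusing on the errors of its predecessors.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=600, n_features=12, random_state=2)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)

# AdaBoost reweights misclassified samples between rounds; gradient boosting
# fits each new learner to the residual errors of the current ensemble.
ada = AdaBoostClassifier(n_estimators=100, random_state=2).fit(X_tr, y_tr)
gb = GradientBoostingClassifier(n_estimators=100, random_state=2).fit(X_tr, y_tr)

print(f"AdaBoost accuracy:          {ada.score(X_te, y_te):.3f}")
print(f"Gradient boosting accuracy: {gb.score(X_te, y_te):.3f}")
```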
Figure 1.4: Stacking Machine Learning Model.
1.4.2.3 Stacking Model
Stacking is also an ensemble model, but it differs from bagging and boosting techniques. Stacking uses heterogeneous base learners for performance improvement, and a slightly different mechanism, out-of-fold cross-validation, is used in this technique. Here the algorithms are combined in multiple layers: the base learners in layer 0 are trained on the training set, and the predictions of these algorithms are given to a layer-1 algorithm. The algorithm at layer 1 learns from the previous models' predictions and gives a final output. This output is discrete in the classification case and continuous in the regression case. Stacking is different from a voting algorithm, because voting just uses the majority vote or the mean of all predictions, while stacking uses an algorithm for this purpose. In a nutshell, stacking is an ensemble learning technique that uses meta-learning to combine several machine learning algorithms. Base-level algorithms are trained on the complete training data set, and the meta-model is trained using the predictions of all base-level models as features, as seen in Fig. 1.4. Stacking increases model prediction accuracy.
Advantages of a Stacked Generalization Model:
1. The advantage of stacking is that it may use a variety of effective models
to accomplish classification or regression tasks and provide predictions that
perform better than any one model in the ensemble.
2. Stacking improves the model prediction accuracy.
Disadvantages of a Stacked Generalization Model:
1. Since the whole dataset is used to train every individual classifier, in the case of huge datasets the computational time increases, as each classifier works independently on the full dataset.
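The stacking structure described above can be sketched with scikit-learn's `StackingClassifier`; the particular level-0 learners, meta-model, and synthetic dataset are illustrative assumptions, not the thesis's final configuration.

```python
# Illustrative stacking sketch: heterogeneous level-0 learners whose out-of-fold
# predictions train a level-1 (meta) model.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=12, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

# Heterogeneous base (level-0) learners.
level0 = [("rf", RandomForestClassifier(n_estimators=50, random_state=3)),
          ("svm", SVC(probability=True, random_state=3)),
          ("knn", KNeighborsClassifier())]

# cv=5 produces the out-of-fold predictions used to train the meta-model.
stack = StackingClassifier(estimators=level0,
                           final_estimator=LogisticRegression(), cv=5)
stack.fit(X_tr, y_tr)
print(f"Stacking accuracy: {stack.score(X_te, y_te):.3f}")
```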
1.5 Data Science and Smart Grid
In a smart grid, intelligent devices are able to communicate with one another.
The precise data needed for correct information and energy flow in the network is
provided by these devices. All of this information must be managed in real-time
and preserved so that decisions may be made based on past data and specific
situations. Data gathered from intelligent devices in substations, feeders, and
numerous databases have been used in a number of research projects. Price data,
electricity data, power system data, geographic data, weather data, etc. are all examples of information sources. This requires a forecast model that is accurate and effective enough to match the supply and demand of electricity.
For instance, energy consumption data (kWh) from 100,000 or more customer smart meters at sampling intervals of 15 minutes demonstrate that assuring the quality of the acquired data is a particular problem for the evaluation of prediction models for the SG. Numerous variables must be forecasted, including renewable energy production, energy purchases from energy markets, 24-hour load distribution planning, etc. The large volume of SG data increases the complexity of data processing, and this enormous quantity of data must be processed for optimal power flow, real-time monitoring, and planning.
Big data-based power generation, optimization, and forecasting research has been expanded to include renewable energy systems such as wind energy systems, solar forecasting, load forecasting, and others. Also, the data obtained by the utility contain private information that must be handled without disturbing consumers' privacy. Additionally, this data comprises sensitive and confidential information from a country's or an organization's central grid. In order to ensure the efficient operation of the smart grid, sufficient protective mechanisms are necessary for data storage and against cyberattacks directed at the power system. Machine learning makes processing large amounts of data and putting adequate security measures in place more tractable [9].
1.6 Summary
This chapter gave an introduction to the electrical grid. The difference between the old traditional grid and the future smart grid was explained. The field of data science, its current trends, and its wide applications were discussed. Finally, different types of machine learning approaches to prediction were explained. The role of data science was defined as dealing with the big data obtained from smart grids, in order to ensure security and enable predictions such as electricity theft detection.
In short, in the smart grid, machine learning and artificial intelligence can be used for fault prediction and maintenance, efficient decision-making and forecasting, energy trading, data protection and security, power consumption, data transparency, theft detection, etc.
Chapter 2
Introduction
2.1 Introduction
Efficient energy generation and utilization play an important role in the economy of a country. The electricity generated at the generation station suffers from several severe types of losses. These losses occur due to technical and non-technical issues. The technical losses relate to devices' efficiency and can only be minimized by modernizing the electrical components or the whole system; this requires a large investment, is time-consuming, and 100% efficient devices do not exist yet.
The non-technical losses also cause a large capital loss to the utility, in billions of USD. These losses, which stem from unfair use of electricity by users, can be reduced by proper management and systematic observation to prevent such cases. Utilities have adopted many techniques, from hardware to software to data-driven methods, but inefficiency still exists in terms of preventing illegal electricity usage.
The developments and innovations in machine learning techniques make data-driven approaches desirable, due to the increasingly easy and accurate use of machine learning techniques for prediction purposes. Machine learning algorithms can be applied to smart grid data for ETD purposes. However, these techniques also face issues regarding prediction accuracy: the implementation of machine learning algorithms requires data pre-processing and algorithm selection steps for accurate prediction. In this thesis, an ensemble machine and deep learning structure is adopted to cope with the ETD issues.
2.1.1 Background and Motivation
The successful integration of renewable energy into the electricity network transformed the power grid from a centralized and dull energy system to a decentralized
and intelligent system. This distributed power system makes the grid more efficient
due to efficient infrastructure utilization. The recent technological development
and new strategies followed by the utility make the grid more flexible for energy
resource accumulation. Therefore, more intermittent energy resources can be used
for electricity generation. Electricity generation from different resources is shown
in Fig. 2.1.
Figure 2.1: Energy Generation by Sources.
This energy can be added to the power system without disturbing grid stability. According to the U.S. Energy Information Administration (EIA), the electricity generation figures given in Table 2.1 show that the share of renewable sources has risen above 20% [10].
In addition to increasing the amount of electricity generation by adding more resources to the electric grid, power management and efficient energy resource utilization also play a useful role in the socioeconomic development of a country, because of the high cost of electricity production and the limited available energy resources. Power management and cost reduction have two possible ways:
1. Generate and transmit electricity from those resources that have the minimum expense per unit.
2. Rated revenue pay-back of consumed electricity to the utility in the form of the electricity billing system.
The reduction in cost per unit of electricity can be addressed by moving towards low-cost, low-emission renewable sources with more energy-efficient devices, while
#    Energy Source                        Billion kWh   Share of Total
1.   Total All Sources                    4,116
2.   Fuels (Total)                        2,504         60.8%
3.   Natural Gas                          1,575         38.3%
4.   Coal                                 899           21.8%
5.   Petroleum (Total)                    19            0.5%
6.   Petroleum Liquids                    11            0.3%
7.   Petroleum Coke                       7             0.2%
8.   Other Gases                          11            0.3%
9.   Nuclear                              778           18.9%
10.  Renewables (Total)                   826           20.1%
11.  Wind                                 380           9.2%
12.  Hydropower                           260           6.3%
13.  Solar (Total)                        115           2.8%
14.  Photovoltaic                         112           2.8%
15.  Solar Thermal                        3             0.1%
16.  Biomass (Total)                      55            1.3%
17.  Wood                                 37            0.9%
18.  Landfill Gas                         10            0.2%
19.  Municipal Solid Waste (Biogenic)     6             0.2%
20.  Other Biomass Waste                  2             0.1%
21.  Geothermal                           16            0.4%
22.  Pumped Storage Hydropower            -5            -0.1%
23.  Other Sources                        12            0.3%
Table 2.1: U.S. Utility-Scale Electricity Generation by Source.
the revenue pay-back system of the utility faces issues due to electric power loss (EPL). The difference between the energy generated at the generation end and the energy delivered to the consumers is known as electric power loss. Electricity losses are classified into two categories [11]. These are:
1. Technical losses or system losses (TLs)
2. Non-technical losses (NTLs)
TLs are the total EPL in the power system, from the network injection point to the consumer. They occur due to the energy dissipated in transmission lines, distribution lines, and transformer cores. This problem can be overcome by using good-quality and highly efficient equipment instead of old electrical infrastructure, which requires a large cost and time.
NTLs may be due to some kind of abnormality or changes induced by electricity consumers (EC) in the electricity network, such as installation errors, billing errors, faulty meters, or meter by-passing. This creates system disturbance and poor power load management for utility companies. In addition, NTLs, or electricity theft (ET), not only cause significant economic loss but also affect the normal operation of the power system by creating power fluctuations and disturbing grid stability [12].
According to the Northeast Group, the NTL-based worldwide revenue losses were about $96 billion in 2017 [13]. In 2014, these losses were about $58.7 billion worldwide, with India facing 16.2 billion USD, Brazil 10.5 billion USD, Pakistan 0.89 billion USD, and Russia 5.1 billion USD [14] [15], which shows a high increase in losses during the last few years, as shown in Fig. 2.2.
Figure 2.2: Economic Losses in Different Countries in Billions of USD.
To reduce non-technical losses, utility companies must take the necessary steps to identify theft and abnormal energy-usage behavior. However, the conventional methods require a large number of technicians to make on-the-spot checkups of users' energy meters, and only an insignificant amount of energy theft is detected, which results in less revenue pay-back.
The recent technological developments, the AMI system, and especially the smart grid make electricity management, monitoring, and NTL reduction possible. The smart grid (SG) is an intelligent electricity system that permits a two-way flow of electricity and information by using an intelligent monitoring system. It integrates the AMI system to control and monitor the energy usage of consumers and the utility in the electricity network [16]. This system works in real time by first collecting the user's electricity consumption (EC) information and then transferring it to the utility over communication channels for billing, grid security, loss reduction, and other purposes. The AMI structure is shown in Fig. 2.3.
Figure 2.3: Structure of AMI Network.
The collection of EC data in real time makes the SG capable of detecting losses in electricity networks. The two main types of information required about energy loss are given below.
1. How to locate the theft source?
2. How much electricity is stolen?
2.1.2 Non-Technical Losses Issues in Smart Grid
In addition, the advancements in power systems make the grid more exposed to
cyber-attacks, fraud, and system failures due to the increase in the number of
nodes in the energy network.
The NTL-based losses mainly arise from illegal electricity consumption by users, which also disturbs system operation, incurs additional losses, damages system components, and affects grid security and stability. Many countries have also characterized electricity theft as a special kind of crime [17].
To reduce NTLs, utility companies must take the necessary steps to identify theft and abnormal energy-usage behavior. However, the conventional methods require a large number of technicians to perform on-the-spot checkups of the consumption meters. Manual energy consumption reading also lacks organized time and labor schedules. Due to this, only an insignificant amount of energy theft is detected, which results in less revenue pay-back [18]. The recent rapid improvements in ML methods have increased interest in models that analyze load information and detect meter tampering as early as possible. ML theft detection techniques work by detecting deviations of energy statistical patterns from normal behavior. In modern research, the use of ML techniques provides a new solution for utility companies for detecting anomalous EC. These modern techniques make it possible to automate and improve detection accuracy by accurately identifying malicious patterns. Thus, an ML classifier with high accuracy is needed to help the existing techniques deal with large detection tasks. To overcome the electricity theft issue, many data-driven methods have been used in recent years. These methods are divided into three categories, namely state-based, game theory-based, and artificial intelligence-based methods [19].
1) State-based methods use specific kinds of devices or designs for metering and
theft detection purposes. For example, a special ammeter checks the electricity
difference between the local and remote ends for fraud detection purposes. State-
based estimation works only at the substation level and not at the end-user level.
This type of installation for electricity theft detection requires extra monitoring
devices which are difficult to install in the existing distribution systems.
2) Game-theory-based methods use the interfering behavior of pricing competition
and product releases like games between anomalous users and electric companies.
The main goal of this method is to find an equilibrium for the game. This type
of model is easy to install but it is hard to find specific mathematical modeling,
which relates the actual behavior between the end user with the utility company.
3) Artificial intelligence (AI) is adopted in almost all worldwide fields including
Figure 2.4: ML Techniques in Various Fields.
business, security, sales, banking, and many more. The expansion and advancements of the SG generate big data, which requires scalable techniques for efficient utilization. The recent advancements of ML and DL in anomaly detection pave the way for energy security in the SG [20]. These ML-based models can be used to address the NTL issue in the SG. In the present AMI system, these AI techniques can be used to draw and compare the load profiles and energy consumption patterns of end-users to classify legal and illegal electricity users. Different applications of AI are shown in Fig. 2.4.
This research aims to present an accurate classification model for theft and normal users using the State Grid Corporation of China (SGCC) dataset. In this work, we use pre-processing steps such as data cleaning, missing data imputation, data normalization, and data balancing, as shown in Fig. 2.5.
2.1.3 Contributions
It has been observed from the literature that most of the present research works
use different intelligent ML methods to detect the NTLs’ behavior in the time-
series data of smart grids. However, the current research still has less accuracy
and a research gap in NTLs’ behavior detection. The present theft detection issues
are tackled in the form of the following contributions:
1. The data obtained from smart meters contain normal and theft users, where the number of abnormal users is far smaller than the number of normal electricity users. Many research works apply classification models to the data obtained from smart meters without considering the issue of class imbalance. Class imbalance biases the ML model towards the majority class, so the model classifies theft users as normal users. This class-imbalanced data needs a proper balancing technique to overcome the bias issue.
2. The second problem addressed in this study is high dimensionality in the time-series dataset. High dimensionality causes time-complexity issues and reduces classification performance. This issue is mitigated through a proper feature-reduction technique.
3. Third, many researchers emphasize comparing output results with the original labels of the testing set and do not focus on the detection level of abnormal electricity users. A results comparison in the form of accuracy alone is not a proper metric: it may leave a set of theft users inspected as normal users, which should be reduced. In the confusion matrix, abnormal consumers predicted as normal consumers are counted as false negatives. This false negative issue is addressed in this research to reduce revenue loss.
4. In machine learning techniques, some normal users are predicted as malicious, which increases the on-the-spot inspection cost. The fourth contribution of this work addresses this issue in the form of a maximum reduction in the false positive rate.
5. Much state-of-the-art research on electricity theft detection uses a single machine learning algorithm, for which it is difficult to learn all the energy patterns of users from a large dataset. A single model normally under-fits or over-fits in the case of large and imbalanced datasets. In this research, an ensemble stacking model is proposed for the best classification and generalization.
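The class-balancing and error-rate ideas in these contributions can be sketched as follows with scikit-learn; the random oversampling, synthetic data, and random forest classifier are illustrative assumptions, not the exact pipeline proposed in this thesis.

```python
# Sketch: balance an imbalanced theft/normal dataset by oversampling the
# minority class, then measure FNR (missed theft) and FPR (false alarms).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.utils import resample

# Imbalanced data: roughly 90% normal users (0), 10% theft users (1).
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=4)

# Simple random oversampling of the minority (theft) class, with replacement.
X_min, y_min = X_tr[y_tr == 1], y_tr[y_tr == 1]
X_up, y_up = resample(X_min, y_min, n_samples=int((y_tr == 0).sum()),
                      random_state=4)
X_bal = np.vstack([X_tr[y_tr == 0], X_up])
y_bal = np.concatenate([y_tr[y_tr == 0], y_up])

clf = RandomForestClassifier(random_state=4).fit(X_bal, y_bal)
tn, fp, fn, tp = confusion_matrix(y_te, clf.predict(X_te)).ravel()
fnr = fn / (fn + tp)  # theft users misclassified as normal
fpr = fp / (fp + tn)  # normal users flagged for on-the-spot inspection
print(f"FNR: {fnr:.3f}, FPR: {fpr:.3f}")
```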
Figure 2.5: Proposed Model Flow Chart.
A feature extraction step also aims to reduce the time complexity and improve classification performance using state-of-the-art machine learning techniques. Finally, a stacked machine and deep learning generalization technique, which takes the outputs of the ML models and uses a meta-model for the final prediction, is used for improved classification accuracy.
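The stacked-generalization step just described, where base-model outputs become the meta-model's input features, can be sketched as follows; the base models, meta-model, and synthetic data are illustrative stand-ins, not the thesis's final configuration.

```python
# Sketch of stacked generalization: out-of-fold predictions of the base models
# form the feature matrix on which the meta-model is trained.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

X, y = make_classification(n_samples=800, n_features=15, random_state=5)

base_models = [RandomForestClassifier(n_estimators=50, random_state=5),
               GradientBoostingClassifier(random_state=5)]

# Column i holds base model i's out-of-fold probability for the positive class,
# so the meta-model never sees predictions made on a model's own training folds.
meta_features = np.column_stack([
    cross_val_predict(m, X, y, cv=5, method="predict_proba")[:, 1]
    for m in base_models])

meta_model = LogisticRegression().fit(meta_features, y)
print(f"Meta-model training accuracy: {meta_model.score(meta_features, y):.3f}")
```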
2.2 Layout of Thesis
This thesis is divided into six chapters, with preliminaries in chapter zero and an introduction in chapter one. The literature is summarized in the second chapter. Chapters three and four explain the work done in system models one and two, respectively, while the conclusion of this thesis is given in chapter five.
2.2.1 Summary
Electricity plays an important role in the modern world. The demand for energy is increasing day by day; therefore, lossless electricity consumption is required. Different types of electricity losses, technical and non-technical, occur from the generation point to the end consumers. These losses cause system instability, volatile demand and supply management, and huge revenue losses to the utility. Among these, the non-technical losses, which are due to illegal electricity consumers, are the more severe type and can be reduced more easily. Utilities have applied different methods, using hardware, sensors, software, and data-driven techniques, to overcome these losses. Among these, data-driven approaches are more efficient, simpler, and have low capital investment.
The recent development of machine learning techniques has made it possible to use smart grid data to predict energy loss. However, the machine learning techniques applied in present research remain inefficient in predicting theft consumers. This research focuses on pre-processing techniques, data balancing, feature engineering, and algorithm combination to reduce the FPR and FNR and to increase classification accuracy.
Chapter 3
Literature Review
3.1 NTL Detection Scheme Categories
This section discusses the existing work on reducing NTLs in the SG, considering four main categories of NTL detection methods: hardware-based, state-based, game theory-based, and AI-based techniques [21]. The methodological structure begins with an overview of the classical approaches and then moves towards recent artificial intelligence (AI)-based techniques. The AI-based electricity theft detection (ETD) steps in recent research are then studied, focusing on pre-processing, balancing, feature engineering, and algorithm modeling, and on how these techniques help the ETD process. NTL has been a major issue, hindering grid stability and causing revenue losses to utilities, for more than two decades. Researchers have addressed the issue of NTL reduction in power systems using the state-of-the-art techniques available at the time.
3.1.1 Hardware Based
For example, Pasdar et al. [22] proposed a smart metering system with a high-speed signal to detect malicious activity in the network. The system uses power line communication (PLC) to connect the customer's energy meter with the utility. In this method, a lossless high-frequency signal, with known line impedances, is transceived through the PLC. Software at the utility compares the signals of end users and detects the theft location. A similar case, with small modifications for electricity consumption observability, is proposed in [23]. The proposed method works using smart energy meters with a specialized display system for both the utility and the end user. Using the display system, end users can analyze their own consumption, while at the same time the utility monitors and checks the consumption behavior. In this way, power quality and grid stability
are maintained. A smart meter with a single-chip-based checkup system is implemented in [24] for ETD purposes. The chip uses a standard measurement as the baseline and then predicts malicious behavior by comparing it with real-time consumption. Similar hardware-based detection methods are proposed in [25-28] with wireless, especially GSM-based, monitoring systems. A sensor network with a cloud-based module monitor, a circuit breaker, and a real-time electricity pulse observer is used to compare and monitor the electricity entering and leaving the energy meter. The network then uses some form of switch or buzzer to power off the line and inform the utility about the illegal activity. A similar hardware model is presented in [29] based on state-based estimation. The model uses PLC and supervisory control and data acquisition (SCADA) to check the state of connected devices. The PLC is used for communication purposes, while SCADA uses internet protocol (IP) services and distributed network protocol 3 (DNP3) for exact device identification and system interoperability, respectively. The data acquired through the state controller module (SCM) is then compared with the standard data produced by the load system (LS) to identify malicious attacks between two connected grids/substations.
The above-mentioned techniques use specialized sensors, hardware, and online monitoring units. Besides, these techniques may suffer hardware failures and can only measure and detect physical theft attacks, with no capability for cyber-attack detection. Therefore, the authors in [30] proposed a measurement-based approach for NTL reduction. The paper uses an energy monitoring unit on the secondary side of the distribution transformer. The unit takes the total electricity consumption measurement and sends the information to the utility for the particular group. The measurements are compared by applying a statistical approach to identify theft among the given group of consumers. Based on the findings of the above hardware-based approaches, these techniques help in NTL reduction, but the system may suffer from a high cost of hardware investment and reliability problems. Moving towards its advanced version, AMI, the authors in [31] suggested a multi-source information fusion (MSIF) technique using AMI data for more accurate detection. The data collected from electricity consumers cannot easily be classified as malignant or benign based on a single alert; therefore, a combination of alerts from several malicious users is presented for accurate results. The authors also show the pros and cons of using supervised
and unsupervised techniques for output performance. The basic function of the system is that the data collected from the AMI system contain information about the appliance consumption of a particular customer as well as the meter reading. So if a particular device is observed to be ON while the meter shows zero consumption, then that user is committing theft. However, the complete information about customer consumption also causes privacy issues.
3.1.2 Game Theory Based
As further research and advancement over the hardware-based NTL detection techniques in the form of AMI, data-driven techniques, including game theory and heuristic-algorithm-based techniques, have been applied, such as in [32], [33], to increase the detection accuracy. The main purpose of these techniques is to model a game between the electric utility and the electricity user. The system is designed to find an equilibrium between the two, and a threshold value is assigned. A probabilistic technique is then applied to classify the obtained data as honest or ET users. Game theory and heuristic algorithms take considerable time to deal with big data due to their stochastic nature; these techniques are inaccurate, biased, and cannot reach an optimal value on a large dataset. A load flow method based on the AMI dataset is implemented in [34]. The authors addressed the ETD issue in the SG using the real-time electricity consumption patterns (ECP) of consumers. The main problem with power flow analysis techniques, namely the Gauss-Seidel, Newton-Raphson, and Fast Decoupled methods, is that they have low convergence rates, large memory requirements, and time-complexity issues in reaching an optimal point. The proposed system addressed this issue by using modified linear regression to capture the electricity theft and normal patterns. This increased the speed of the power flow model simulation and enabled its adoption in large power systems.
3.1.3 Artificial Intelligence Based
New technological developments in smart grids, especially AMI systems with
real-time monitoring and large-scale ECP collection in the form of big data,
have enabled the newly emerging field of data science to almost replace the
traditional NTL detection techniques because of its low cost, easy
implementation, and high ETD rate. The big data obtained from the
utility can be given to data science (DS) techniques for easy and efficient analysis.
The DS-based machine learning (ML) and deep learning (DL) algorithms have the
capability of NTLs and revenue loss reduction. For example, the author in [35]
proposed a hybrid machine learning model. A decision tree (DT) and a support
vector machine (SVM) are combined such that extra features are extracted from
the original dataset and then fed to the SVM along with the original features.
The SVM is used for the final prediction. In addition, data pre-processing is
limited to missing data imputation and normalization steps. The experiment is
done on a dataset collected from various homes in the USA. A non-linear radial
basis function (RBF) kernel is selected for SVM to improve the output results.
The final accuracy and false positive rate (FPR) obtained were 92.5% and 5.12%
respectively. Zhongzong and He in [36] implemented an extreme gradient boosting
(XGB) classifier for ETD purposes. The method considers the Irish (Ireland)
dataset without applying proper data pre-processing or data balancing techniques.
Data reduction is performed by generating six artificial theft attacks from the
original dataset, and model training/testing is then carried out on that data.
The final output obtained is compared with SVM-based classification.
The results show that XGB outperforms SVM in terms of precision, recall, and
FPR.
A novel gradient boosting classifier (GBC) based theft detection method is
proposed in [37]. The research mainly focuses on feature engineering and hyper-
parameter tuning steps for improvement in detection rate and FPR and reduction
in processing time. The feature extraction is done using a combination of syn-
thetic feature generation and weighted feature importance (WFI) techniques. The
final results showed that GBC outperforms categorical boosting, the light
gradient boosting machine, and XGB in terms of FPR and execution time. The
author in [17] proposes a supervised machine learning technique for all kinds of anomaly
detection in smart grids. For this purpose, the Endesa (Spain) dataset is
considered, which is collected from almost 57,000 field inspections of different consumers. The
feature extraction is done using energy consumption (EC), quality byte, distance,
density, and electrical magnitude-based measurements. Besides EC, the Endesa
dataset also contains important geographical, seasonal, and smart meter
properties. The final extreme gradient boosting (XGB) model shows better
results with an AUC of 91%. The same data-driven technique is applied in [38]
using machine learning, deep learning, and parallel computing techniques to detect
malicious electricity users. A Turkish smart grid dataset is chosen for the model
implementation and detection of false data injection. The feature learning process
is done by combining highly comparative time series analysis and neighborhood
component analysis feature selection algorithms. After data transformation, the
classification algorithm is implemented. Improved results are obtained for
XGBoost, with an FPR value of 0.005.
Prem et al. [39] worked on cyber-physical attack detection using an isolation forest
classifier (IFC). The isolation forest is used to detect the change in the pattern
of the consumers. The main purpose of theft is to decrease the meter reading
from actual values, which changes the energy consumption pattern (ECP) of that
particular user. Data reduction is done using PCA. The IFC is trained at varying
load and voltage generation levels in order to capture all possible ECPs of
consumers. Hyper-parameter tuning is performed, and the model is
tested for different grid/bus systems. The results obtained show 98.7% recall in
terms of anomaly detection with the IEEE 3-bus system. Leloko et al. [14] tried
to differentiate theft consumers from honest consumers using the SGCC dataset.
The overall method used data pre-processing, data balancing, and feature reduc-
tion. Hyperparameter tuning is done using a Bayesian optimizer. The model was
individually implemented on both time domain features and frequency domain
features for accurate training. Feature selection from both the time and frequency
domains proves useful. The final deep neural network showed outstanding
performance, with an area under the curve (AUC) of 97% and an accuracy of
91.8%. The author in [40] structured a CNN-RNN-BiLSTM
model to detect electricity theft in Raipur (an Indian city). The dataset
consists of the meters of 41 three-phase supply users. A combination of three
different deep learning models is used to learn the data patterns of normal and
abnormal users. The proposed model achieved an improved accuracy of 97.1% and
outperformed the existing SVM and multi-class SVM when compared. In [41], Paria et al. presented
a solution for ETD purposes while focusing on the ECP of consumers. The ECP
of theft and honest users are not the same; in fact, the theft pattern has more
fluctuations. Therefore, the areas with a high probability of malicious
activities, in terms of electricity consumption, are equipped with distribution
transformer meters (DTMs). Using these transformer meters, both types of
consumers are identified. The data of 5,000 real consumers is analyzed in this
work. Data preprocessing, balancing, and feature reduction are all handled by
generating six synthetic attacks. An SVM algorithm is then trained on the
different types of ECP obtained using the DTMs. The final experimental result
showed a 93% detection rate and 11% FPR.
Similar to the above ECP-based NTL detection, an optimized convolutional neural
network and gated recurrent unit (CNN-GRU) method is studied in [42]. Real-
time data of 10,000 consumers is analyzed for ETD purposes. The data preprocess-
ing is done to impute missing values. Synthetic minority over-sampling (SMOTE)
is used for class balancing. A manta ray foraging optimization (MRFO) is com-
bined with CNN-GRU for result improvement. The final model showed a 91.1%
accuracy, greater than that of SVM, logistic regression, and the plain
CNN-GRU. The same data-driven approach is applied in [43] for NTL reduction in
the SG. The authors worked on real-time data of 2,271 consumers collected from
the Honduras distribution system. The smoothing spline function (SSF) is used
for outlier handling. For feature reduction purposes, a new discrete wavelet packet
transform is implemented. The class imbalance issue is addressed using the ran-
dom under-sampling (RUS) technique. In the last step, an ML-based RUS with
Adaboost technique is applied for classification purposes. Adaboost performed
better with an accuracy of 94.35% when compared with Linear-SVM, Non-Linear
SVM, and artificial neural network (ANN). Using newly emerging ML and
preprocessing techniques makes the ETD process simple and efficient. Pamir et al.
in [44] followed the same direction and proposed a hybrid ensemble model for
electricity theft detection. The researchers worked on data pre-processing using
KNNOR for data balancing. The feature reduction is done using the recursive fea-
ture elimination technique. For classification purposes, a bi-directional long short
term memory (Bi-LSTM) classifier with three layers is used as the base model
followed by a LogitBoost classifier. This proposed stacking approach results in
improved detection performance when verified on a real-world ’SGCC’ dataset.
The output values obtained for precision, F1-Score, and accuracy are 96.32%,
94.33%, and 89.45%, respectively.
A high dimensional dataset increases the model time complexity and degrades
the classification results. Motivated by feature reduction techniques in
electricity theft detection, a natural gradient boosting (NGBoost) based theft
detection method is proposed in [45]. The three-step system comprises the
extraction of important features, data pre-processing, and classification techniques.
The missing values are imputed with the miss forest technique, and the data im-
balance problem is addressed with the majority-weighted minority oversampling
technique (MWMOTE). A time series feature library combined with a whale opti-
mization algorithm is used for feature extraction. Finally, the NGBoost classifier
is implemented on the SGCC dataset. The proposed structure achieved 93%
accuracy, 91% recall, and 95% precision. A similar feature-engineered approach
is adopted in [46]. The authors used the SGCC dataset in their research and
addressed the missing values, imbalanced class ratio, and high dimensionality
issues. These problems are overcome, respectively, with KNN imputation,
SMOTETomek, and ts-FRESH algorithms. Min-max normalization is also applied for
data scaling. CatBoost, XGBoost, and LightGBM classifiers are used for the
final normal/theft classification. The CatBoost algorithm performed best,
achieving 95% accuracy. In a similar fashion,
the researchers addressed the electricity theft issue using the Colombian electricity
supplier dataset. The data is pre-processed by addressing the missing values and
normalization. The missing values are removed while the scaling is done using
Min-Max-Scaler. Finally, a BiGRU-CNN is implemented for classification
purposes. The proposed model achieved an accuracy of 92.9%, an F1-Score of
0.841, and an AUROC of 0.966 [47].
In electricity theft detection (ETD), the class imbalance problem makes the model
biased towards the majority class and reduces the classification performance. To
overcome this issue, the researchers in [48] emphasize class balancing using load
data of 50 urban electricity users for three months in Hebei province, China. A
K-SMOTE technique is used to balance the honest and theft classes. Several
machine learning techniques are applied for theft prediction, among which the
Random Forest (RF) model shows superior performance over the other compared ML
classifiers, with 94.53% accuracy and an area under the curve of 0.9513.
Sravan and Dipu in [49] also followed the class
balancing approach. The authors implemented ensemble techniques to detect elec-
tricity theft using a dataset of 5000 customers. The dataset was obtained from
the commission for energy regulation. Data pre-processing is done by near-miss
imputation and SMOTE oversampling. The author revealed that the bagging
ensemble outperforms other ensemble-boosting methods in terms of theft identifi-
cation. The Random Forest and Extra Tree classifiers achieved an AUC value of
0.90, which was higher than that of the comparative ML models. Owing to the
fast and large-scale data interpretation capability of deep learning models,
deep-learning-based electricity theft detection on an imbalanced dataset is
addressed by Rui et al. in [50]. To address the model bias towards the majority class, a focal loss
function is used to reduce the sample weight of normal users. SENet is combined
with wide and deep convolution neural networks to learn the global features and
detect the electricity theft consumers from the data. The final model is tested
using real-time data from the State Grid Corporation of China (SGCC). The model
outperforms the compared state-of-the-art techniques by obtaining an area under the
curve score of 0.83. Lei et al. in [51] proposed a new theft attack model for theft
identification. The researchers extract the important patterns of the particular
users along with the neighborhood energy consumption patterns using the SGCC
dataset. Normal users seemed to have regular consumption patterns, while theft
patterns show large spikes and variations. The Pearson correlation coefficient is
used to capture the similarity between theft and honest users' patterns. Finally,
a convolutional neural network is applied for classification purposes. The proposed
method obtained 88% accuracy and a 95% area under the curve, outperforming the
comparative theft detection techniques.
Heuristic algorithms play a great role in achieving the model’s optimal point of
performance. A dual deep learning technique is combined with heuristic techniques
by Abdulwahab and Nasir in [52]. The researchers used the State Grid dataset
obtained from the Chinese government. The dataset contains the energy consumption of 9655
users for one year. For theft classification purposes, the data is pre-processed by
addressing missing values, outlier handling, and data normalization. The interpo-
lation, 3-sigma rule, and min-max scaling are respectively used for this purpose.
The class balancing is done using SMOTE and SMOTEBoost techniques. After
data extraction with ZFNET, the final CNN-LSTM algorithm is used for theft
identification purposes. For faster and more efficient operation, blue monkey
and black widow optimizers are applied. The simulation results show improved
accuracies of 91% and 93% with blue monkey and black widow based tuning,
respectively, which stand out as state-of-the-art results in electricity
theft prediction.
In a data-driven based electricity theft detection scenario, a low FPR is required
to reduce the onsite inspection cost. Dexi et al. proposed a low-FPR deep neural
network to address this issue. The real-time Irish smart grid dataset is used in
this research. The author extracts the important features using the deep model
and focal loss is used to reduce the class imbalance problem. Finally, a two-stage
training model is implemented. In the first stage of training, a one-dimensional
convolution and residual network are used in combination with convolution gra-
dient descent to update the model weights using the grid search tuning method.
In the second stage, FPR is taken as an objective function and particle swarm
optimization is performed. The proposed model achieves outstanding performance
on the Irish dataset, with an FPR of 0.29 and an AUC of 99.42 [53]. Hasan et
al. in [54] proposed a CNN-LSTM-based electricity theft detection (ETD) model
using the historical power consumption data of 10,000 users. The authors
address the missing values in the dataset, and the imbalanced class issue is
handled using SMOTE-based balancing. Overall, an improved accuracy of 89% is
obtained for theft and normal user classification.
A simple data-driven approach is adopted by Roubin et al. in [55]. The authors
first applied important pre-processing steps for NaN and outlier handling. The
NaNs are imputed with mean imputation, and outliers are replaced with values
obtained from the 3-sigma rule. The authors also considered the difference in
user consumption on weekdays and weekends and addressed it using the ratio
profile (RP). The researchers then used unsupervised fuzzy c-clustering for theft
and normal user classification. The discrete wavelet transform is used for
extracting the important features. Finally, six theft attack samples are created
for the Irish dataset case. The results showed superior performance of the
proposed model in
terms of AUC when applied to real datasets. In [56], the researchers proposed
an ensemble machine learning model with a stacking structure to identify the
electricity theft users in the SGCC dataset. The pre-processing of the dataset is
done with the 3-sigma rule, mean imputation, and min-max standardization. The
high dimensionality issue is addressed with principal component analysis. In the
proposed model, light gradient boosting, KNN, and LSTM are chosen as base
classifiers, while an SVM tuned with PSO is used as the final estimator. The
final results show an AUC value of 0.986, which is greater than that of the
other compared models. A similarly
stacked autoencoder along with LSTM sequence-to-sequence (S2S) structure is
proposed in [57]. Autoencoders are used to capture the data pattern and the
LSTM-S2S model is used for the final classification. The proposed model is verified
on the realistic ISET and SGCC datasets. The model achieved 96% accuracy and
0.93 AUC on the SGCC dataset, and 94.5% accuracy and 0.90 AUC on the ISET
dataset. The author in [58] proposed a ConvLSTM model for ETD purposes.
The pre-processing steps include data cleaning, KNN imputation, and the IQR
method for handling outliers. Borderline-SMOTE is used for data balancing. Finally, the
CNN-LSTM model is implemented on the SGCC dataset. The proposed model
outperforms other methods by obtaining 0.977 for ROC-AUC and 96.6% accuracy.
Inspired by the importance of feature extraction techniques, the authors
proposed a two-stage theft detection method in [59]. In the first stage,
auto-regressive integrated moving average, Holt-Winters, and seasonality-trend
analyses are applied to capture important features from the dataset. In the
second stage, a distributed random forest classifier is trained and tuned for
electricity theft detection purposes. The final model is verified on the SGCC
dataset and shows superior performance in comparison with state-of-the-art
techniques, gaining 98% accuracy and F1-score. A similar feature
extraction-based theft detection is proposed by Yifan
and Qifeng in [60]. A stacked sparse autoencoder is used for electricity theft
identification. The autoencoder has a good feature extraction capability and is
used for electricity data reconstruction. A reconstruction error function is used to
compare the normal and theft users’ consumption. Three autoencoder layers are
combined with sparsity to make the model robust. The final classifier is optimized
with the PSO algorithm and verified on a real Chinese dataset. The proposed model
obtained a 90% detection rate and an FPR of less than 10%.
In [61], the focus is on the effect of missing values in the dataset on data
classification. The authors relate quick changes in the electricity consumption
pattern to the type of consumer. The data obtained from smart meters has missing
values, and that information needs to be accounted for in the ETD case. The
proposed technique relates missing values and neural networks through a neural
architecture search (NAS) technique. The proposed architecture shows 5% improved
results by addressing NaNs, with an AUC value of 0.926.
Jeanne and Filipe in [62] estimated the importance of data balancing in ETD
using a convolution neural network (CNN). The NaNs in the SGCC dataset are
imputed with the linear interpolation method. Then, six different class
balancing techniques are used, including Random Oversampling, Random
Under-sampling, K-medoids-based Under-sampling, SMOTE, and Cluster-Based
Oversampling (CBOS). Finally, a CNN is used as a classifier to separate theft
users from normal consumers. The results showed superior performance for
Random Oversampling and CBOS-based CNN classification, with AUC values of 0.67
and 0.68, respectively.
3.2 Problem Analysis
The existing literature study showed that electricity theft detection still has a gap
and further research is needed to adequately solve this issue. More specifically,
it has been found that pre-processing steps greatly affect the classifier prediction
capability. Therefore, this research proposes implementing and analyzing
fifteen individual classifiers with different class balancing techniques. We
show that certain combinations of ML classifiers and class balancing techniques
improve the theft identification results.
Table 3.1: Summary of Related Work

No. | Ref. | Dataset | Data Pre-Processing | Model | Performance
1 | [63] | China/SGCC | Feature Reduction | CNN | Accuracy = 92%
2 | [64] | Spain/Endesa | Data Scaling | XGBoost | Accuracy = 91.1%
3 | [65] | Ireland/Irish | Feature Reduction | XGBoost | Accuracy = 95%
4 | [66] | China/SGCC | Dimensionality Reduction and Balancing | UaRe-Random Forest | Accuracy = 93.6%
5 | [67] | China/SGCC | Six Synthetic Attacks | Gradient Boost | Accuracy = 97%
6 | [68] | China/SGCC | Synthetic Theft Attacks | CNN | Accuracy = 92%
7 | [40] | India/Raipur | Synthetic Theft Attacks | CNN-RNN-BiLSTM | Accuracy = 97.1%
8 | [69] | Ireland (SEAI) | Clustering | SVM | FPR = 11%
9 | [70] | Honduras | Data Under-Sampling | AdaBoost | Accuracy = 94%
10 | [65] | 370 homes, 3 years of EC data | Data Extraction | RF | Accuracy = 91%
11 | [71] | USA home EC daily data | Feature Scaling | SVM | Accuracy = 92.5%
12 | [72] | Ireland/Irish | Feature Reduction | RNN | Accuracy = 93%
13 | [73] | 118/300 bus system simulation data | False Attacks and Detection | SVE | Accuracy = 93%
14 | [74] | Korea/KEPCO | Feature Selection | CNN | Accuracy = 85%
15 | [21] | Pakistan/Precon | Feature Reduction | CatBoost | Accuracy = 98%
16 | [75] | LESCO individual homes | N/A | SVM | Accuracy = 75%
17 | [11] | SGCC | Data Balancing, Reduction, HP-Tuning | AdaBoost | Accuracy = 88%
18 | [76] | SGCC | Data Balancing, Reduction, HP-Tuning | ABC-model | Accuracy = 91%
19 | [42] | 10,000 consumers | Data Balancing | CNN-GRU | Accuracy = 91.1%
20 | [44] | SGCC | Data Balancing, Reduction | BiLSTM-LogitBoost | Accuracy = 89.45%
21 | [45] | SGCC | Data Balancing, Reduction | NGBoost | Accuracy = 93%
22 | [46] | SGCC | Data Balancing | CatBoost | Accuracy = 95%
23 | [47] | Colombian | Data Normalization | BiGRU-CNN | Accuracy = 92.9%
24 | [48] | China, 50 users, 3 months | Data Balancing | RF | Accuracy = 94.53%
25 | [49] | 5,000 users | Data Balancing | RF | AUC = 0.91
26 | [50] | SGCC | Data Reduction | WDCNN | AUC = 0.83
27 | [51] | SGCC | Data Reduction | CNN | Accuracy = 88%
28 | [52] | Chinese State Grid, 9,655 users | Data Extraction, Balancing, HP-Tuning | CNN-LSTM | Accuracy = 93%
29 | [54] | 10,000 users | Data Balancing | CNN-LSTM | Accuracy = 89%
30 | [60] | 5,000 users | Feature Extraction | RF | Detection Rate = 90%
31 | [62] | SGCC | Data Balancing | CNN | AUC = 0.68
32 | [41] | 5,000 users | Six Synthetic Attacks | SVM | Detection Rate = 93%
33 | [35] | USA home data | NaN Imputation and Normalization | SVM | Accuracy = 92.5%
34 | [53] | Ireland/Irish | Data Balancing and Extraction | Neural Network + Optimization | FPR = 0.29
3.3 Summary
A comprehensive overview of NTL detection was presented in this chapter. The
differences among the studied techniques and their output performance were
examined. It has been observed that theft detection efficiency is improved by
ML-based techniques. The reviewed literature is also summarized in table-3.1.
However, the results show a research gap in terms of pre-processing, feature
engineering, data balancing, and algorithm selection. It has also been observed
that the present work lacks important performance parameters that best explain
the classification index, which will be addressed in this research work.
Chapter 4
Proposed Model-1 and Simulation Results
4.1 Introduction
The recent developments in data-driven techniques have made the prediction
process simple and accurate. These data-based techniques, especially machine
learning techniques, are widely used for anomaly detection purposes. In the
smart grid, the anomaly is actually the malicious/theft/dishonest users, who
disturb the electricity system and cause large losses to the utility. So,
machine learning algorithms can be used to detect anomalies in the smart grid.
Although ML-based methods have made a lot of progress in theft detection in
smart grids, there still exist issues that need to be addressed. The real-time
original data obtained needs some preparation steps, which are missing in the
present work. Single machine learning algorithms suffer from under-fitting and
over-fitting issues. The data obtained in real time is normally highly
dimensional, which is time-consuming to process and may cause inefficiency in
prediction. There is also a need for performance parameters that best explain
the classification performance of a given model. Regarding these issues, an
ensemble stacking model is proposed that reduces these issues. To evaluate the
classification performance, important performance parameters are selected to
address the current issues in ETD.
Proposed Methodology
In our proposed model, an ensemble AI technique is implemented for ETD in
SG. The data is obtained from a utility company. The original data is prepared
for ML model training using preprocessing and feature engineering steps. The
entire dataset is split into training and testing sets. The training set is fed to
base ML classifiers for training and prediction purposes. In the final step, the
prediction from ML classifiers is used as features of a deep learning model for better
classification results. The complete system, as shown in figure-4.1, is divided into
the following four steps.
1. In step 1, the data collected from the SG has some missing values and out-
liers, and has a large variance. This may be due to hardware issues, noise in
the communication medium, and users' different electricity consumption be-
havior. The missing values in the dataset decrease the model performance.
Figure 4.1: Proposed ETD Stacked Generalization Model.
Therefore, they are replaced with mean values. To address the issue of outlier
handling, a simple interpolation technique is used. The large variation in the
dataset reduces the model training capability. Therefore, the data is
normalized using Min-Max scaling.
2. Step 2 addresses the high dimensionality of the SGCC dataset. The original
dataset is reduced using principal component analysis (PCA) in order to
increase storage efficiency and performance, and to reduce storage cost and
time complexity.
3. In step 3, the training data is fed to the four base ML models, which
predict the output individually. These level-0 ML models include the LGB,
XGB, LR, and ET classifiers.
4. In step 4, a multilayer perceptron (MLP) is used as the level-1 classifier,
which takes the outputs of the level-0 models and predicts the final output in
the form of a theft or normal electricity consumer.
4.2 Dataset Information
The dataset used in this study is obtained from the real-time electricity
consumption of consumers in Fujian, China, connected to the SGCC. This SGCC
dataset, available as an MS Excel file, covers a total of 42,372 consumers.
There are mainly two types
of consumers in this dataset, which are labeled as 0 and 1. Label-0 indicates a
normal user while label-1 indicates theft consumers. The consumers and their
corresponding daily consumption are arranged as rows and columns in a table,
which shows the records and features of the dataset, respectively. Details of the
dataset are organized in table-4.1.
Original Dataset Information

1. Source of Data          | Utility (SGCC)
2. Consumption Duration    | 01/01/2014 to 31/10/2016
3. Consumers Category      | Residential
4. Type of Data            | Daily Consumption
5. Total Consumers/Samples | 42,372
6. Normal Consumers        | 38,757
7. Theft Consumers         | 3,615
8. Features                | 1,034

Table 4.1: SGCC Original Dataset Information.
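A quick sanity check on the class counts in table-4.1 shows how strongly imbalanced the dataset is; the snippet below only reproduces the table's own numbers.

```python
# Class-imbalance check using the counts from Table 4.1.
normal, theft = 38_757, 3_615
total = normal + theft
print(total)                           # total consumers in the dataset
print(round(100 * theft / total, 2))   # theft share of the dataset, in percent
# → 42372
# → 8.53
```

Roughly one consumer in twelve is a theft user, which motivates the class balancing applied later in the pipeline.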
4.3 Pre-processing
In electricity theft detection, the model learns the users' ECP, which is then
used for future CPTA predictions. Therefore, a proper, or near to exact,
pattern is needed for accurate detection. However, the electricity consumption
data obtained from the utility is un-scaled and imbalanced, and has missing
values and outliers. Hence, the information in the original dataset, summarized
in table-4.1, cannot be used directly for accurate model training. The data
must first be prepared using some ML techniques. After the data is
preprocessed, the consumption patterns of the theft and normal users can be
efficiently drawn.
Figure 4.2: Electricity Consumption Pattern of Two Random Con-
sumers from SGCC Dataset.
As shown in figure-4.2, for a sample of the SGCC dataset, a normal user has a
much smoother electricity usage pattern than a theft consumer, whose usage
pattern shows large variations. So the final pre-processed data can be used for
model training and user behavior prediction. The pre-processing steps used in the
proposed model are discussed below.
4.3.1 Missing Data Imputation
The dataset obtained from the utility has a large number of missing values,
denoted as not a number (NaN). These NaN values may be due to systematic,
environmental, or random errors. The missing values cannot be neglected during
preprocessing, as they decrease the model performance; also, replacing them
with zero would result in a loss of information. Many data science techniques
are available for missing value imputation, such as replacing NaN with the
mean, median, or mode. The median and mode cause a repetition of values in the
ECP, which again leads to negative performance. The linear interpolation
method given in [68] is used, where NaN is replaced with a mean, as given in
equation-4.1. It has mainly three imputation conditions.
f(x_{m,n}) =
\begin{cases}
  \dfrac{x_{m,n-1} + x_{m,n+1}}{2}, & \text{if } x_{m,n} = \text{NaN} \text{ and } x_{m,n\pm 1} \neq \text{NaN} \\
  0, & \text{if } x_{m,n\pm 1} = \text{NaN} \\
  x_{m,n}, & \text{otherwise}
\end{cases}
\qquad (4.1)

In equation-4.1,
x_{m,n} = the daily electricity consumption,
x_{m,n-1} = the value preceding the NaN,
x_{m,n+1} = the value following the NaN.
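The three-condition imputation rule above can be sketched as follows; the function name and the list-based row representation are assumptions for illustration.

```python
# Minimal sketch of the three-condition NaN imputation rule.
import math

def impute_row(row):
    """Replace each NaN by the mean of its two neighbours; if a neighbour is
    also NaN (or missing at the row boundary), fall back to 0."""
    out = list(row)
    for n, x in enumerate(row):
        if not math.isnan(x):
            continue                      # x_{m,n} present: keep it
        prev = row[n - 1] if n > 0 else float("nan")
        nxt = row[n + 1] if n < len(row) - 1 else float("nan")
        if not math.isnan(prev) and not math.isnan(nxt):
            out[n] = (prev + nxt) / 2     # mean of the two neighbours
        else:
            out[n] = 0.0                  # a neighbour is also NaN: impute 0
    return out

nan = float("nan")
print(impute_row([1.0, nan, 3.0, nan, nan, 2.0]))
# → [1.0, 2.0, 3.0, 0.0, 0.0, 2.0]
```

Note that isolated NaNs are interpolated from their neighbours, while consecutive NaNs fall back to zero, exactly as the case distinction in equation-4.1 prescribes.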
4.3.2 Handling Outliers
In the ECP, we found some values that are too large or too small compared to
the normal values. These unexpected values (outliers) deceive the model and
incur a large execution time. Generally, the values below the 10th percentile
and above the 90th percentile are treated as outliers.
Figure 4.3: Total contribution of outliers.
As shown in figure-4.3, the outliers in the dataset are less in number and show
very little contribution towards model training. A novel z-score capping-based
outliers handling method [77], shown in algorithm-1, is applied to make the data
more useful. The z-score outliers capping (ZSOC) technique works by first finding
the z-score using equation-4.2.
$$Z\text{-}Score = Z = \frac{x_i - \mu}{\sigma} \tag{4.2}$$

where

$$\sigma = \sqrt{\frac{\sum_{n=1}^{N}(x_n - \mu)^2}{n-1}}$$

Z = the standard score, $x_i$ = a random value (possible outlier), $\mu$ = the mean value, and $\sigma$ = the standard deviation of row i.
After calculating the z-score, lower and upper limits are assigned to each individual feature. A data point less than the lower limit or greater than the upper limit is replaced with the corresponding limit. The main advantage of using the capping technique is that it places the outlier at its respective extreme value instead of completely removing the entire row. This helps to retain the
useful information, in contrast to the present research [38, 78], where the outliers are entirely removed. Algorithm-1 presents the complete process.
Algorithm 1: ZSOC working in cyber-physical electricity theft detection.
Data: X(i,j); Result: Z(m,n)
1  Start
2  for i = 2, 3, 4, ..., N do
3      select the original dataset X
4      find the z-score: Z = (x_i − μ)/σ
5      for t = 1 to T do
6          lower_limit = μ − 3σ
7          upper_limit = μ + 3σ
8          if Z > upper_limit then replace the value with upper_limit
9          else if Z < lower_limit then replace the value with lower_limit
10         else keep x_i unchanged
11     end
12 end
13 return Z(m,n)
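The capping step can be sketched in NumPy (a minimal sketch; the function name is illustrative, and the default k = 3 follows the μ ± 3σ limits of algorithm-1):

```python
import numpy as np

def zsoc_cap(x: np.ndarray, k: float = 3.0) -> np.ndarray:
    """Cap outliers at mu +/- k*sigma instead of dropping rows (ZSOC)."""
    mu = x.mean()
    sigma = x.std(ddof=1)                     # sample std, as in eq. 4.2
    lower, upper = mu - k * sigma, mu + k * sigma
    # np.clip replaces any value beyond a limit with that limit
    return np.clip(x, lower, upper)
```

Capping keeps the row (and thus the consumer's record) in the dataset, only pulling the extreme reading back to the limit.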
4.3.3 Unit based Normalization
The Z-method was used to handle the outliers, however, the dataset still has large
variations in the ECP of users, shown in figure-4.4, for a sample taken from the
SGCC dataset. These large variations degenerate the output performance, as the
ML and DL models are sensitive to the variation and quality of the dataset.
(Two randomly selected features; outliers marked.)
Figure 4.4: Variations in ECP of Electric Theft and Honest Consumer.
Min-max normalization from [71] is applied to scale the data to the range [0, 1]. Min-max normalization has the mathematical form shown in equation 4.3:

$$f(x_{i,j}) = \frac{x_{i,j} - \min(X)}{\max(X) - \min(X)} \tag{4.3}$$

Here min(X) and max(X) denote the minimum and maximum electricity consumption (EC) of feature j in the data.
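Equation 4.3 can be sketched in NumPy (a minimal sketch; the guard for constant features is an added assumption, since eq. 4.3 would otherwise divide by zero when max = min):

```python
import numpy as np

def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Scale each feature (column) of X to [0, 1] per eq. 4.3."""
    mn = X.min(axis=0)
    mx = X.max(axis=0)
    # guard against constant features to avoid division by zero
    rng = np.where(mx > mn, mx - mn, 1.0)
    return (X - mn) / rng
```

scikit-learn's MinMaxScaler implements the same transform with fit/transform semantics for reuse on test data.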
4.3.4 Data Balancing
After missing value imputation, outlier removal, normalization, and feature engi-
neering, the next step is to check for the class imbalance. In the SGCC dataset,
the number of normal and abnormal users is not equally proportional. In the
SGCC dataset, out of the total number of 42,372 users, the number of honest and
illegitimate users are 38,757 and 3,615 respectively, as given in table-2. The ma-
jority class (Normal=0) has more consumers than the minority class (Theft=1),
as shown in Fig. 4.5:
(Imbalanced: 38,757 normal vs. 3,615 theft users; balanced: 38,757 users in each class.)
Figure 4.5: SVM-SMOTE based Balanced Dataset (SGCC).
Due to this skewed behavior of the dataset, the machine learning model also shows
a biased behavior towards the majority class and classifies the theft user (TU) as
a normal consumer (NC).
To overcome the class imbalance issue, SVM-SMOTE is applied, which results in improved performance. The imbalanced and balanced data are shown in Fig. 4.5.
Different techniques are used in the present research, such as random under-sampling (RUS), random over-sampling (ROS), and SMOTE. RUS reduces the records in the majority class to balance the dataset. However, due to the reduction in records, some important information is also lost, which decreases the model's performance.
(SVM-SMOTE working: select the original dataset, draw a hyperplane with SVM, choose the minority class, use KNN to relate the data points, and generate synthetic samples until Class-1 = Class-2, yielding the balanced dataset.)
Figure 4.6: SVM-SMOTE System Diagram.
In contrast to RUS, ROS repeats random samples of the minority class to make it equal to the majority class. Due to this repetition in the dataset, an over-fitting issue arises. To overcome the issues of RUS and ROS, SMOTE balances the dataset by generating synthetic data for the minority class. However, SMOTE leads to a very large dataset and causes time complexity issues. For proper data balancing and to overcome the issues in the above techniques, we propose the support vector machine minority oversampling technique (SVM-SMOTE) for better classification performance. SVM-SMOTE is a modified form of SMOTE used for minority class oversampling. In this method, a hyperplane is drawn between the minority and majority classes, and synthetic data is generated on the minority side to obtain a balanced dataset. A clear boundary is thus obtained between the ECP of normal and malicious users, making model learning and future prediction easier. The SVM-SMOTE technique, shown in figure-4.6, is used for balancing the SGCC dataset.
4.3.5 Feature Engineering
The data pre-processed in previous steps are fully prepared to learn ML and DL
models. But the ML and DL have time complexity issues on a big and high-
dimensional dataset. The feature engineering step is performed to reduce the size
of the original dataset and also retain useful information. In the proposed system,
principal component analysis (PCA) is used for feature reduction purpose. The
PCA is used for higher dimensional data reduction into lower dimensions. This is
obtained by forming linear relations among the features using mean and variance.
The reduced features obtained are called components that are independent of
each other. This is due to the fact that PCA finds variance among the features
and forms new components from the correlated features. The features which are
more correlated are stored as individual components. Similarly, the feature with
the highest variance has more information and is stored as the first component.
The second highest variance as the second component, and so on. The overall
PCA-based dimensionality reduction process- [79] is given in Algorithm-2.
Algorithm 2: Dimensionality Reduction Steps in Principal Component Analysis
Input: data Y; Output: reduced data Z
1  Start
2  while the original dataset Y is selected do
3      choose the point of interest
4      find the mean: μ = (Σ_{n=1}^{N} x_n)/n, where x_n are the values from the dataset
5      find the variance: σ² = (Σ_{n=1}^{N} (x_n − μ)²)/n, where n is the number of values and μ the mean of all the values
6      find the eigenvalues λ from Det|A − λI| = 0, where A is the data matrix and I the identity matrix
7      compute the eigenvectors from the eigenvalues: AX = λX, where A is the N-dimensional data and X the N variables in the dataset
8      sort the eigenvectors in descending order of λ: λ_n, λ_{n−1}, λ_{n−2}, ..., λ_2, λ_1
9      form the new matrix W′ from the eigenvectors
10     project: Z = W′Y
11 end
12 End
As the repeated and more related features are summed up as an individual compo-
nent, it also reduces the over-fitting issue in the model. An additional arithmetic
leveraging technique is also applied which results in improved performance in the
proposed model. A set of 300 important features were extracted from the overall
1,034 features in the SGCC dataset. This helped in reducing the execution time.
The main disadvantage of using PCA is that it cannot capture the minimum covariance of the two classes and interprets the output features in such a uniform linear shape that it again leads to a small increase in simulation time. This issue is tackled using an arithmetic leveraging technique, which also enhances the ECP separation.
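The PCA reduction to 300 components can be sketched with scikit-learn (a minimal sketch; the random matrix stands in for the 1,034-day SGCC consumption records):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 1034))   # stand-in: 500 consumers x 1,034 days

# keep the 300 directions of highest variance, as in the proposed model
pca = PCA(n_components=300)
X_reduced = pca.fit_transform(X)

print(X_reduced.shape)
```

The components come out ordered by explained variance, so the first component carries the most information, as described above.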
4.4 Model Selection
In ML and DL, time series and high-dimensional datasets have enormous ECPs, and a single algorithm cannot learn and predict the behavior accurately. Four different ML models are considered in this work as weak learners to capture the ECP of all the customers for better generalization. These learners are LGB, RF, XGB, and ET. The structure and process of all level-0 learners are as follows:
4.4.1 Base Learner-1
Light gradient boosting (LGB) was released by Microsoft in 2017 [80]. LGB is a modified form of the gradient boosting tree algorithm with leaf-wise splitting for higher accuracy. Due to its leaf-wise splitting structure, LGB is useful for complex modeling tasks like time series classification, regression, and ranking.
4.4.2 Base Learner-2
Random forest (RF) is an ensemble ML algorithm used for classification and regression. The algorithm is simple in structure, with many DTs. RF is a strong tool for multi-variable datasets. It is among the most widely used algorithms and can produce good results without hyper-parameter optimization. The basic mechanism of this model is that it uses the bootstrapping phenomenon, where the original dataset is randomly divided into subsets with replacement [81]. These bootstraps are then used for DTs, and each tree makes a prediction. A voting mechanism is performed over these predictions, which gives rise to the final prediction of the RF model. RF can handle large datasets, reduce variance and over-fitting, and show higher accuracy compared to the DT classifier.
The Gini index is a statistical term used to predict the outcome probability of a random forest. Mathematically, the Gini index can be found using equation 4.4 [82]:

$$Gini\_index = 1 - \sum_{i=1}^{c} (p_i)^2 \tag{4.4}$$

Here, c = the number of classes and $p_i$ = the relative frequency of the given class outcome.
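Equation 4.4 can be computed directly from a list of class labels (a minimal sketch; the function name is illustrative):

```python
from collections import Counter

def gini_index(labels) -> float:
    """Gini impurity: 1 minus the sum of squared class frequencies (eq. 4.4)."""
    counts = Counter(labels)
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in counts.values())
```

A 50/50 split of two classes gives the maximum binary impurity of 0.5, and a pure node gives 0.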
The pseudo-code of the random forest classifier [83] shows the complete classification process and is given in algorithm-3:
Algorithm 3: Random Forest Classifier
1  Start
2  Original dataset D; output: majority-voted classifier (MVC)
3  Training set: x_j, j = 1, 2, 3, ..., m
4  Testing set: x_k, k = 1, 2, 3, ..., n
5  d = (x_i, y_i), i = 1, 2, 3, 4, ..., N
6  if d = x_j then
7      draw a bootstrap sample from d;
8      make un-pruned trees;
9      select the best features based on the Gini index;
10     split until each tree grows to its maximum;
11 end
12 if trees are formed then
13     test the trees on the testing set x_k;
14     collect the predictions from the trees (P_d);
15     MVC = majority vote of P_d on x_k, k = 1, 2, 3, ..., n;
16 end
17 return MVC.
4.4.3 Base Learner-3
XGB, or regularized gradient boosting, is a sequential tree-based algorithm that focuses on computation speed and model performance. The algorithm uses a Taylor series expansion of the loss function [84]. The model combines weak learners sequentially to improve their learning. A regularization term is included to prune extra leaves and avoid overfitting. The algorithm can be used for both regression and classification tasks and has been designed to work with large and complicated datasets. The pseudo-code [85] is given in algorithm-4.
Algorithm 4: Extreme Gradient Boosting Classifier
1  D is the labeled training data
2  initialize the model with a constant value minimizing Σ L(y_i, γ)
3  for m = 1 to M do
4      compute the pseudo-residuals
5      fit a base learner to the pseudo-residuals:
6          T_m = new DecisionTree()
7          T_m.train(D_m, features)
8      compute the multiplier γ_m
9      update the model
10 end
11 output F_M(x)
4.4.4 Base Learner-4
The extra tree classifier (ETC), also named the extremely randomized tree, is a DT-based bagging technique. It uses the training data to create a large number of random un-pruned trees. In the final step, ETC reduces the model training by selecting a random DT for the best split [86]. Due to the random pruning phenomenon and the absence of an optimum splitting step, ETC has a very short execution time and is applied in this model [87]. The algorithm of the extra tree classifier is given in algorithm-5.
Algorithm 5: Extra Tree Classifier
Input: D; Output: Y
Build_random_tree(LS):
1  if LS contains only samples belonging to the same class then
2      return a leaf labeled with that class
3  else
4      split ← Choose_test_random(LS)
5      divide LS into LS_left and LS_right using the split
6      build Build_random_tree(LS_left) and Build_random_tree(LS_right) from these subsets
7      create a node with the test, attach left and right as successors of this node, and return the resulting tree
Choose_test_random(LS):
8  randomly select a position (attribute)
9  randomly select a threshold
10 find the mean and standard deviation values for the subsets in LS
11 if the score of this test is above a given threshold, return the test
12 otherwise, return to step 8 and select another position
13 if all positions have already been considered, return the best test so far
4.4.5 Stacking Model
The main purpose of building a stacking ML model is to obtain better classi-
fication results, specifically theft detection in a SG. The model produces more
accurate results than the individual classifier. The stacked generalization com-
bines the learning ability of multiple algorithms for optimum accuracy in terms of
classification [88]. The proposed system combines the strengths of all four level-0 classifiers to reduce variance, bias, overfitting, and execution time. The model deals with big data and makes accurate predictions. In the stacking model, the training dataset is fed to the base learners with k-fold cross-validation. The level-0 learners make predictions on the out-of-fold data. In the next step, the predictions from all the base learners are used as features for the level-1 classifier, or meta-classifier. The meta-classifier learns from the predictions of the level-0 learners and predicts the output class. The complete stacking process is shown in Algorithm-6.
Algorithm 6: Proposed Stacking Generalization Technique for Theft Detection
1  Start
2  Input = X; Output = final prediction P_f
3  Original data: X = {(x_r, y_r)}, r = i, j, k, l, m, n, o = 1, ..., N
4  while the original dataset is selected do
5      split the data:
6          training set = {(x_r, y_r)}, r ∈ {i, j, k, l, m}
7          testing set Y = {(x_r, y_r)}, r ∈ {n, o}
8      Level-0 classifier C1:
9          learn C1 on X1 = {(x_r, y_r)}, r ∈ {i, j, k, l}
10         predict C1 on the validation fold V1 = {(x_m, y_m)}
11         output prediction P1
12     Level-0 classifier C2 (once C1 is calculated):
13         learn C2 on X2 = {(x_r, y_r)}, r ∈ {i, j, k, m}
14         predict C2 on the validation fold V2 = {(x_l, y_l)}
15         output prediction P2
16     Level-0 classifier C3 (once C2 is determined):
17         learn C3 on X3 = {(x_r, y_r)}, r ∈ {i, j, l, m}
18         predict C3 on the validation fold V3 = {(x_k, y_k)}
19         output prediction P3
20     Level-0 classifier C4 (once C1, C2, C3 are determined):
21         similarly, learn C4 and make predictions P4
22     Meta classifier M1 (after all level-0 predictions):
23         learn M1 on X = [P1, P2, P3, P4]
24     predict M1 on the testing set Y
25 end
4.5 MLP Mathematical Modeling
An MLP is a useful tool for non-linear data classification. It has three main layers: an input layer, hidden layers, and an output layer. The number of input neurons depends on the input data, the hidden layers are used for weight updating, and the number of output neurons equals the number of classes in the given dataset. The input layer provides a scaled signal to the hidden layers. The weights are real numbers multiplied by the input signals. The hidden layers give the weighted sum of the
given information [89]:

$$y_o = \sum_{i=1}^{n} w_i x_i + b \tag{4.5}$$
The information obtained above is still in linear form. The activation function given below is used to model non-linear data:

$$f(x) = \frac{1}{1 + e^{-x}} \tag{4.6}$$
Then the information obtained from the hidden layers can be found using the equation below:

$$y_o = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \tag{4.7}$$

where $y_o$ is the output, $w_i$ is the weight value, $x_i$ is the input data, $b$ is the bias factor, and $f(x)$ is the activation function.
The number of neurons determines the capacity of the hidden layers in the network. If the number of neurons is kept very small, it leads to model under-fitting, while a large number of neurons leads to an over-fitting issue in the model prediction. A default sigmoid activation function is used in the network for non-linear data modeling. The sigmoid activation is bounded between 0 and 1, approaching 0 for negative values and 1 for positive values. The overall equation used for the MLP is given below:

$$y_o = f\left[WO_{mn}\left(\sum_{i=1}^{n} WI_{ij}\,x_i + b_1\right) + b_2\right] \tag{4.8}$$
where $WI_{ij}$ is the weight of the input layer, $WO_{mn}$ is the weight of the output layer, $b_1$ is the input bias factor, and $b_2$ is the bias in the output layer.
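Equations 4.6 and 4.8 can be traced in a small NumPy forward pass (a minimal sketch; the layer sizes and random weights are illustrative):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation of eq. 4.6."""
    return 1.0 / (1.0 + np.exp(-z))

def mlp_forward(x, WI, b1, WO, b2):
    """One forward pass of eq. 4.8: y_o = f[WO (sum WI x + b1) + b2]."""
    hidden = WI @ x + b1              # weighted sum from the input layer
    return sigmoid(WO @ hidden + b2)  # squashed output in (0, 1)

rng = np.random.default_rng(0)
x = rng.normal(size=4)                                  # 4 input features
WI, b1 = rng.normal(size=(8, 4)), rng.normal(size=8)    # 8 hidden neurons
WO, b2 = rng.normal(size=(1, 8)), rng.normal(size=1)    # single output node
y_o = mlp_forward(x, WI, b1, WO, b2)
print(y_o)   # a value strictly between 0 and 1
```

Because the sigmoid output lies in (0, 1), it can be read as the probability of the theft class in a binary setting.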
4.6 Performance Metrics
For classification problems, various performance parameters are used to evaluate
the final output and performance of the model like confusion matrix, F1-Score,
Area Under the Curve, Precision, Recall, Receiver Operating Curve, and Accuracy.
These parameters are helpful in checking the overall performance of a model. The
parameters are explained with respective mathematical forms in the following
paragraphs [36].
1. Confusion Matrix
In ML, a confusion matrix is used to measure classification performance. It is an N × N matrix, where N is the number of classes in a given dataset. The matrix has two dimensions: actual class and predicted class. In our SGCC dataset, we have a binary (2-class) classification: normal (0) and malicious (1). So the confusion matrix is a 2 × 2 matrix [90] and has the following four types of outputs:
(a) True Positive (TP): an actual positive class (1) value that is predicted as positive (1) by the classifier.
(b) True Negative (TN): a negative class (0) data point that is predicted as negative (0) by the model.
(c) False Positive (FP): a negative class (0) value that the model classifies as positive (1).
(d) False Negative (FN): a positive class (1) value that is predicted as negative class (0) by the ML model.
2. Accuracy
In ML, accuracy is used to measure the overall performance of a model on a given dataset. It measures how much of the data is classified correctly. Considering the confusion matrix, it is the number of correctly classified data points divided by all the data points predicted by a given ML model. Mathematically, accuracy is calculated using equation 4.9 [90]:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{4.9}$$
3. Precision
Precision is the portion of predicted-positive data points that are correctly classified, i.e., of all the values the model predicted as positive, how many actually belong to the positive class. Precision is also referred to as the positive predictive value. The mathematical formula given in equation 4.10 is used to find the precision [90]:

$$Precision(P) = \frac{TP}{TP + FP} \tag{4.10}$$
4. Recall
The recall represents the number of positively classified data points out of all actual positives. It is the portion of actual positive values that the model correctly classifies as positive [90]. The recall is also called sensitivity and has the following formula, equation 4.11:

$$Recall(R) = \frac{TP}{TP + FN} \tag{4.11}$$
5. F1-Score
In the classification cases, the main aim is to obtain the best value for preci-
sion and recall. F1-Score is the measure used to find the best classification
values in terms of precision and recall [90]. Mathematically, it is the harmonic mean of precision and recall, as given in equation 4.12:

$$F1\text{-}Score = \frac{2 \times Precision \times Recall}{Precision + Recall} \tag{4.12}$$
6. Area Under the Curve
AUC is the total area covered by the ROC curve, i.e., the total region lying under the ROC curve. A higher AUC indicates a better-performing model across all thresholds [90].
7. Receiver Operating Characteristics
The ROC curve shows the TP predicted values at different thresholds with respect to the FP points. The ROC is important when dealing with imbalanced datasets. More specifically, it is the graph of the true positive rate (TPR) against the false positive rate (FPR) [90]. Equation 4.13 is used to find the TPR:

$$TPR = \frac{TP}{TP + FN} \tag{4.13}$$
While to find the FPR, equation 4.14 is used:

$$FPR = \frac{FP}{FP + TN} \tag{4.14}$$
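Equations 4.9 through 4.14 can be computed directly from the four confusion-matrix counts (a minimal sketch; the function name is illustrative, and scikit-learn's metrics module offers the same quantities):

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 and FPR from confusion-matrix
    counts (equations 4.9-4.12 and 4.14)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)        # also the TPR of eq. 4.13
    f1 = 2 * precision * recall / (precision + recall)
    fpr = fp / (fp + tn)           # eq. 4.14
    return accuracy, precision, recall, f1, fpr
```

For example, counts of TP = 50, TN = 40, FP = 5, FN = 5 give an accuracy of 0.9.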
Results and Simulation
4.7 Simulation Setup
In this section, we split the prepared and reduced dataset into training and testing subsets. The processed data obtained from the above steps are split 80:20 for model training and testing. The final results are shown in the next section.
4.8 Results Discussion and Evaluation
The results obtained are evaluated in the form of important performance metrics
required for classification purposes. The experimental results obtained after the
model simulations are discussed as follows. Accuracy is a general classification
term that may not be a good metric for classification. Therefore, a combination
of different performance metrics is used for good classification. Figure-4.7 shows
the training and testing accuracy, F1-score, and AUC of the proposed model.
(Training accuracy = 0.9979, testing accuracy = 0.9777, AUC = 0.9777, F1-score = 0.9775.)
Figure 4.7: Proposed-Model Performance on SGCC Dataset.
Model + Applied Technique (s) Accuracy in % FPR in % FNR in % Time (s)
PCA + SMOTE 95.95 1.31 2.73 1320
PCA + SVM-SMOTE 96.33 1.09 2.57 1540
Z-Score-Capping + PCA + SVM-SMOTE 96.27 1.12 2.61 1220
Z-Score-Capping + PCA(Features=200) + Arithmetic-Leveraging +SVM-SMOTE 97.3 0.60 2.04 1850
Z-Score-Capping + PCA(Features=400) + Arithmetic-Leveraging +SVM-SMOTE 97.29 0.62 2.09 2520
Z-Score-Capping + PCA(Features=300) + Arithmetic-Leveraging + SVM-SMOTE 97.69 0.60 1.82 2070
Table 4.2: Output Performance of Different Models on pre-processed data.
The confusion matrix obtained in figure-4.8 shows the final prediction of the proposed model in terms of TP, TN, FP, and FN. As seen from the figure, the proposed model has a high detection rate for both normal and theft class prediction. The values of FP and FN represent wrongly classified users. A reduction in these parameters is obtained, with the misclassification rates of FP and FN at 0.54% and 1.69%, respectively. This achieves our proposed objective of very low FPR and FNR.
Figure 4.8: Proposed-Model Confusion Matrix on SGCC Dataset (predicted vs. actual values).
The precision and recall values are usually presented in a single relationship. This combined precision-recall curve (PRC) is obtained for different threshold values. High precision indicates a low FP value, and high recall represents a lower value of FN. The PRC, shown in figure-4.9, obtained using the proposed model shows that both precision and recall have a high value of 97.1%.
Data Splitting Train/Test Size = 80:20 Train/Test Size = 75:25 Train/Test Size = 70: 30
Model Tr. Acc Tes. Acc AUC F1-Score Tr. Acc Tes. Acc AUC F1-Score Tr. Acc Tes. Acc AUC F1-Score
LGBM 0.9478 0.9329 0.9330 0.9315 0.9487 0.9293 0.9292 0.9275 0.9480 0.9309 0.9308 0.9291
RF 0.9940 0.9493 0.9494 0.9488 0.9943 0.9487 0.9487 0.9483 0.9937 0.9444 0.9443 0.9436
XGBoost 0.8329 0.8239 0.8240 0.8168 0.8353 0.8240 0.8239 0.8197 0.8334 0.8304 0.8306 0.8242
ET 0.9992 0.8252 0.8252 0.8319 0.9984 0.8044 0.8046 0.8131 0.9993 0.8069 0.8070 0.8151
Proposed-Model 0.9978 0.9769 0.9769 0.9766 0.9974 0.9754 0.9753 0.9749 0.9982 0.9743 0.9743 0.9739
Table 4.3: Models Performance on Different Data Splitting.
(PRC: ET = 0.7508, XGB = 0.7873, LGBM = 0.9157, RF = 0.9302, Proposed-Model = 0.9725.)
Figure 4.9: Precision-Recall Curve of the Base Models and Proposed-Model.
In ML, accuracy shows how many of the data points are correctly classified out of the total predicted points. This clarifies how many of the users are predicted as malicious and how many as normal by the ML model.
Figure-4.10 shows a bar chart with the accuracy values of all base models and the proposed model. The proposed model achieved a high accuracy of 97.7% compared to the level-0 models. The values are given in table-4.4.
(Accuracy: ET = 0.8163, XGB = 0.8346, LGB = 0.9354, RF = 0.9492, Proposed Model = 0.9777.)
Figure 4.10: Accuracy Comparison of Level-0 and Proposed-Model.
The ROC plots the TPR against the FPR at different thresholds. A high ROC value shows good positive class prediction ability. Figure-4.11 shows the ROC-AUC value of the proposed model to be 97.77%.
(ROC-AUC: ET = 0.8163, XGB = 0.8346, LGB = 0.9354, RF = 0.9492, Proposed-Model = 0.9777.)
Figure 4.11: Comparison of Base Models' ROC with Proposed-Model ROC.
Model F1-Score AUC Accuracy
XGBoost 0.8168 0.8240 0.8239
ET 0.8319 0.8252 0.8252
LGBM 0.9315 0.9330 0.9329
RF 0.9488 0.9494 0.9493
Proposed-Model 0.9766 0.9769 0.9769
Table 4.4: F1-Score, AUC, and Accuracy of the Base Models and Proposed Model.
In ML, the ROC is a 2-dimensional curve with the TPR on the y-axis and the FPR on the x-axis; the AUC aggregates the TPR and FPR values over all given thresholds. A high value of AUC suggests better prediction of the positive class, which is electricity theft in our case.
Figure-4.12 presents the AUC, F1-score, and accuracy values of all the level-0 models and the proposed model. The results given in table-4.4 also show the higher performance of the proposed model compared to the base models.
Figure 4.12: Comparison of AUC, F1-Score and Accuracy of Base Models and Proposed Model.
4.9 Summary
Due to issues in single algorithm implementation, an ensemble stack generalization
approach is proposed in this work. The data obtained need some pre-processing
for better classification. The pre-processing steps adopted in this work are missing
data imputation, data normalization, outliers removal, and class balancing. Fea-
ture reduction is done by principal component analysis. The stacking model consists of the level-0 classifiers RF, ET, LGB, and XGB, with an MLP as the level-1 classifier. The classification metrics are explained with actual definitions and mathematical forms
to address the ETD issues to an optimum level. Further to this, a simulation
setup is fully defined including training, testing data, and machine specifications.
The results obtained are visualized with proper plots, and each plot is described with proper reasoning. After the final evaluation, it is found that the problems focused on in this work are fully addressed and that the proposed model outperforms the existing techniques.
Chapter 5
Proposed Model-2 and Simulation Results
5.1 Classification Algorithms
We discussed and implemented 15 types of classification algorithms. The detail of
each classifier is given below:
5.1.1 Decision Tree (DT)
Decision trees (DT) are supervised ML techniques that use a tree structure resem-
bling a flowchart to represent events, results, and predictions. The decision tree’s
root node is the first segment, which includes the complete dataset. Decision trees
work on any numerical dataset and do not need to operate with continuous vari-
ables, in contrast to neural networks and regression [91]. Decision trees are data
structures in which each leaf node denotes an outcome and each branch represents
a decision rule or feature that points to a certain class label. Decision trees are
commonly used to find the solution to regression and classification issues. Tree
models are used in classification problems to label or categorize an entity using
target variables with discrete values. Decision trees using a predictive modeling
approach are extensively employed in ML and data mining [92].
Good decision trees address vital variables, such as deciding upon the features to
split, the values of feature split, and the point at which you should stop splitting.
Gini index: This metric measures the classification error in a likelihood manner. The Gini index is used as an objective function in classification and is given by the formula:

$$Gini\ Index = GI = \sum_{i=1}^{C} p_i(1 - p_i) \tag{5.1}$$

where $p_i$ = the probability of an object being classified into class i.
The information gain metric shows correct classification and is inversely related to the GI and entropy. The splitting process in a DT uses entropy or the Gini index as the splitting criterion, which reduces the error in feature splitting. Information gain is given by the formula:

$$Info\_Gain = IG = E_{parent} - E_{children} \tag{5.2}$$

where $E_{parent}$ = the entropy before splitting and $E_{children}$ = the entropy after splitting.
Pruning practices reduce the overfitting factor by eliminating tree sections with
low predictive power. This simplifies the decision tree by eliminating the weak or
not-so-relevant rules. This can be achieved in two ways:
1. Reduce the decision tree’s maximum depth,
2. And set a minimum sample size, required, for each decision space.
The complete process of the decision tree is given in algorithm-7 below [93].
Algorithm 7: Decision Tree Classifier
Input:
    A. Labeled training dataset
    B. List of attributes
    C. Splitting method based on the given attributes
Output: A decision tree
Method:
1  Select a node P
2  if all the data belong to the same class C then
3      output P as a leaf node labeled with class C
4  if the list of attributes is empty then
5      output P as a terminal node labeled with the majority class D
6  Find the best splitting point based on D
7  Use the splitting criterion to label node P
8  Find the entropy at each node
9  Find the information gain at each node
10 Label node P with the class of maximum information gain
11 Output the tree structure
12 return P
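The two pruning rules above map directly onto hyper-parameters of a library decision tree; a minimal sketch with scikit-learn (the dataset and the parameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# criterion="gini" uses the impurity of eq. 5.1; max_depth and
# min_samples_leaf implement the two pruning rules described above
tree = DecisionTreeClassifier(criterion="gini",
                              max_depth=4,          # 1. cap the tree depth
                              min_samples_leaf=10,  # 2. minimum samples per leaf
                              random_state=0)
tree.fit(X, y)
print("depth:", tree.get_depth(), "accuracy:", tree.score(X, y))
```

Tightening either parameter simplifies the tree, trading a little training accuracy for less overfitting.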
5.1.2 Logistic Regression
Logistic regression is a classification method that shows a relationship between
continuous input variables and a categorical output. As opposed to standard
linear regression, logistic regression modeling is unique. The response variable Y
is discrete in logistic regression as opposed to continuous. Logistic regression uses
the logit function, given in Eq. 5.3, which makes it different from linear regression.
$$Logit\ Function = \frac{1}{1 + e^{-value}} \tag{5.3}$$

where e is the base of the natural logarithm and 'value' is the actual numerical value to be transformed.
Logistic regression is a non-linear model used for binary classification, in contrast to the polytomous regression model, which makes multi-class predictions [94].
The theft detection process in a power system is a binary classification problem
[95]. Let y be the observation of a sample, y=1 and y=0, representing energy
theft and non-theft, respectively. Let x be input features from users’ data, theft
probability can be expressed using the logit function, given in Eq. 5.4:
h_θ(x) = P(y = 1|x) = 1 / (1 + e^(−g(θ;x))) (5.4)
where θ denotes the model parameters to be calculated through training and g(θ;x) is the classification boundary. Thus, the likelihood of no theft occurring
can be expressed as follows:
P(y = 0|x) = 1 − h_θ(x) = 1 / (1 + e^(g(θ;x))) (5.5)
Suppose there are N samples with observations y_1, y_2, ..., y_i, ..., y_N and corresponding feature vectors x_1, x_2, ..., x_i, ..., x_N. From the given equations, the likelihood of observation y_i can be expressed as:
P(y_i|x_i) = h_θ(x_i)^(y_i) [1 − h_θ(x_i)]^(1−y_i) (5.6)
Assuming independent instances in the given dataset, we can adjust model pa-
rameters θbased on the maximum likelihood estimation, which is described as
follows.
L(θ) = Π_(i=1)^N P(y_i|x_i) = Π_(i=1)^N h_θ(x_i)^(y_i) [1 − h_θ(x_i)]^(1−y_i) (5.7)
Its logarithmic form is:
ln L(θ) = Σ_(i=1)^N [y_i ln h_θ(x_i) + (1 − y_i) ln(1 − h_θ(x_i))] (5.8)
Thus, the parameter θ in the equation above can be obtained by an optimization method [96]. The logistic regression algorithm from [97] is shown in Algorithm 8:
Algorithm 8: Logistic Regression
Input: Data D
Output: Class C
1. For m = 1 to S (training set S):
2.   For every training instance d_n:
3.     Compute the regression value
         z_n = (y_n − P(1|d_n)) / (P(1|d_n)(1 − P(1|d_n)))
4.   Set the weight of each d_n to P(1|d_n)(1 − P(1|d_n))
5. Output: label the class C1 if P(1|d_n) > 0.5,
6. otherwise the class C2.
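A minimal NumPy sketch of Eqs. 5.4-5.8 is given below: the log-likelihood of Eq. 5.8 is maximized by gradient ascent with a linear boundary g(θ; x) = θ·x. The two-cluster data is synthetic and purely illustrative, not the thesis dataset:

```python
import numpy as np

def sigmoid(z):
    """Logit/sigmoid function of Eq. 5.3."""
    return 1.0 / (1.0 + np.exp(-z))

# Synthetic consumption features: honest users (y=0) vs. theft (y=1)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)),
               rng.normal(3.0, 1.0, (50, 2))])
y = np.hstack([np.zeros(50), np.ones(50)])
Xb = np.hstack([np.ones((100, 1)), X])        # bias column

theta = np.zeros(3)
eta = 0.1
for _ in range(500):                           # gradient ascent on ln L(theta)
    grad = Xb.T @ (y - sigmoid(Xb @ theta))    # gradient of Eq. 5.8
    theta += eta * grad / len(y)

accuracy = ((sigmoid(Xb @ theta) > 0.5).astype(float) == y).mean()
```

On this well-separated toy data the fitted boundary recovers the two groups almost perfectly.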
5.1.3 K Nearest Neighbors Classifier
The K-nearest neighbor (KNN) classifier is an important algorithm used for classi-
fication, pattern recognition, and other ML tasks. KNN has been regarded among
the top mining tools because of its simplicity, efficacy, and ease of application in
KNN-based categorization. As a result, numerous practical classification tasks
benefit from the application of KNN-based classification algorithms. The major-
ity of KNN variants effectively decide the target group for the new samples by
utilizing a majority voting mechanism around k’s nearest neighbors. However,
such a classification easily changes with k. It can even make the sensitivity of k
in KNN-based classification worse, particularly in scenarios with small numbers
of minority samples and outliers [98].
The KNN stores all available class samples and then uses a distance function for
prediction. KNN is a lazy learner and incurs much less computational time than
SVM and logistic regression, as it does not require training time for prediction
purposes. The commonly used distance function is the Euclidean distance, given in Eq. 5.9.
Euclidean Distance = √(Σ_(i=1)^k (a_i − b_i)²) (5.9)
The above formula is used for continuous data; when the input is categorical, the Hamming distance is used instead, given in Eq. 5.10.
D_H = Σ_(i=1)^k |a_i − b_i| (5.10)
The given distance functions are used in the classification model with k number
of nearest values. KNN has been used as a non-parametric approach since the
1970s [99]. Algorithm 9 shows the complete procedure of KNN [100].
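The distance-plus-majority-vote procedure of Eq. 5.9 can be sketched as follows (NumPy assumed; the six training points are hypothetical):

```python
import numpy as np
from collections import Counter

def knn_predict(X_train, y_train, x, k=5):
    """Majority vote among the k nearest neighbours (Euclidean distance, Eq. 5.9)."""
    dists = np.sqrt(((X_train - x) ** 2).sum(axis=1))
    nearest = np.argsort(dists)[:k]
    return Counter(y_train[nearest]).most_common(1)[0][0]

X_train = np.array([[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
                    [5.0, 5.0], [5.1, 4.9], [4.9, 5.2]])
y_train = np.array([0, 0, 0, 1, 1, 1])
label = knn_predict(X_train, y_train, np.array([4.8, 5.1]), k=3)
```

Note that no training happens before the query; all work is deferred to prediction time, which is what makes KNN a lazy learner.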
5.1.4 Bernoulli Naive Bayes Classifier
The Bernoulli Naive Bayes classifier (NBC) works on a probability rule called Bayes' theorem. NBC operates on the assumption that each of the identified features is independent of the others [101]. The mathematical form of the Bayes rule (Eq. 5.11) is stated as:
P(A|B) = P(A) P(B|A) / P(B) (5.11)
Algorithm 9: KNN Algorithm
1. Input:
   a. The input data D,
   b. Prediction set x,
   c. Class label set C
2. Output: The class c_x of prediction set x, with c_x from class set C
3. Start:
   3.1 For each y from data D do
   3.2   Find the distance D(y, x) from y to x
       end for
4. Choose a subset N of the set D,
5. where N holds the k nearest neighbors of the test point x
6. Determine the class of x:
7. c_x = argmax_c Σ_(y∈N) I(c = class(y))
End
where features A and B are considered independent of each other. P(A|B) is the probability of A given that event B has occurred, P(A) is the probability of feature A, and P(B) is the probability of event B.
Each attribute feature has the same impact on the classification outcome, accord-
ing to the independence feature; however, the information value demonstrates how
each feature affects the outcome. Weight coefficients are determined for each fea-
ture, and a weighted Naive Bayes model is created in order to adhere as closely
as possible to the assumptions of Naive Bayes. The model’s projected class is
chosen based on the category with the highest posterior probability [102]. Algorithm 10, from [103], shows the complete procedure of the NBC.
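As a tiny numeric illustration of the Bayes rule in Eq. 5.11, consider the probability that a metered user is a thief given an abnormal-reading flag; all the numbers below are invented for illustration only:

```python
# Bayes rule of Eq. 5.11: P(theft | flag) = P(theft) * P(flag | theft) / P(flag)
p_theft = 0.05            # assumed prior probability of theft
p_flag_theft = 0.90       # assumed P(abnormal-reading flag | theft)
p_flag_honest = 0.10      # assumed false-alarm rate for honest users

# total probability of observing a flag
p_flag = p_flag_theft * p_theft + p_flag_honest * (1 - p_theft)
posterior = p_theft * p_flag_theft / p_flag   # about 0.32
```

Even with a reliable flag, a low prior keeps the posterior well below certainty, which is why posterior probabilities rather than raw likelihoods drive the classification.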
5.1.5 Perceptron
Frank Rosenblatt introduced "The Perceptron: A Perceiving and Recognizing Automaton" in 1957 as a class of artificial neural networks that embodied features of the brain. The classification of linearly separable patterns can be addressed using the perceptron [104].
For supervised classification, the perceptron algorithm works well. A set of input
vectors X, with each vector having a predetermined categorization, is accepted as
Algorithm 10: Naive Bayes Algorithm
Input:
  1. Training set S
  2. P = (p_1, p_2, p_3, ..., p_n), the predictions on the testing set
  3. Testing data
Steps:
  1. From the training set S,
  2. Find the mean and standard deviation for each data point;
  3. Repeat:
       determine the probability for each p_i using the Gaussian density formula for the class,
     until the probability of all predictions (p_1, p_2, p_3, ..., p_n) has been found.
  4. Find the likelihood of each class;
  5. Choose the highest likelihood.
End
training data by the learning algorithm. The specified classification is regarded as the desired output. The procedure uses this training data, together with the attached classification information, to identify users. Following convergence, the algorithm produces a separating hyperplane. To achieve this, the bias and weight vectors are adjusted until each training element is accurately classified by the algorithm as either 1 or 0. The algorithm also uses η as a learning rate for weight updates [105].
y = w_0 + Σ_(i=1)^n W_i X_i (5.12)
where W_i are the weights and X_i is the input data. The perceptron shows good results for binary classification using the step function given in Eq. 5.13.
f(x_i) = 1, if w_0 + Σ_(i=1)^n W_i X_i > 0
        −1, otherwise (5.13)
The perceptron neural network structure is straightforward, as is its basic working
Algorithm 11: Single Layer Perceptron
1. Input: Training dataset T, training set X,
   learning rate η, chosen max_Epoch
2. Output: Weight vector W and bias
3. Start the algorithm with arbitrary W:
   set W[1] = 1.0 and W[2] = 1.0, bias = 0, epoch = 0, accuracy = 0
4. While (epoch < max_Epoch and accuracy < 1) do
5.   For all vectors v_x in set X do
6.     y ← sgn((W[1] · v_x.X) + (W[2] · v_x.Y) + bias)
7.     If (y ≠ v_x.Class) then
8.       Update for the next step:
9.       bias ← bias + (η · v_x.Class)
10.      W[1] ← W[1] + (η · v_x.Class · v_x.X)
11.      W[2] ← W[2] + (η · v_x.Class · v_x.Y)
       end if
12.  Determine the accuracy and increase the epoch:
13.  accuracy ← FindAccuracy(X, W, bias); epoch++
Stop.
concept. In order to track the output behavior (using an activation function) and
ultimately make a comparison with the desired output, each input is individually
weighted by a certain value before being added up to create the final output.
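The weighted-sum and mistake-driven update described above can be sketched as a short training loop (NumPy assumed; the toy data below is linearly separable by construction and purely illustrative):

```python
import numpy as np

# Mistake-driven perceptron updates: w <- w + eta*y*x, bias <- bias + eta*y
X = np.array([[2.0, 1.0], [1.5, 2.0], [2.5, 1.5],
              [-1.0, -2.0], [-2.0, -1.0], [-1.5, -1.5]])
y = np.array([1, 1, 1, -1, -1, -1])       # linearly separable toy labels

w = np.zeros(2)
bias = 0.0
eta = 0.1
for _ in range(20):                        # epochs
    for xi, yi in zip(X, y):
        if yi * (w @ xi + bias) <= 0:      # misclassified (or on the boundary)
            w += eta * yi * xi
            bias += eta * yi

pred = np.sign(X @ w + bias)
```

On separable data this loop provably stops updating once every training element sits on the correct side of the hyperplane.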
5.1.6 Linear Discriminant Analysis
First proposed by R. A. Fisher, linear discriminant analysis (LDA) is one of the
most straightforward methods used for classification jobs. Using a multivariate
classifier called linear discriminant analysis (LDA), it is possible to assign samples
to one of N classes by discovering some statistical aspects of the data: the data
covariance matrix, the weight of the class within the training samples, and the
mean of each class and how closely it resembles the sample [106], using Eq. 5.14.
S_b = Σ_(i=1)^g N_i (x̄_i − µ)(x̄_i − µ)^T (5.14)
where µ is the overall mean, N_i is the sample size of class i, and x̄_i is the mean of class i.
In practice, however, the population covariance matrix and the class means are not fully known. It is standard procedure to replace them in the LDA discriminant score with sample estimates calculated from the training data. If the number of training samples is sufficient in comparison to the number of features, this should not have a significant impact on performance. In a high-dimensional dataset, however, estimating the covariance matrix is highly inaccurate [107].
The sample covariance matrix cannot be used as a plug-in estimator in some
extreme cases where the sample size is less than the number of features since
doing so would require computing the inverse of the sample covariance matrix,
which is necessary to calculate the discriminant score of the LDA. The complete
LDA process is shown in Algorithm 12.
Algorithm 12: Linear Discriminant Analysis
1. Compute the class mean matrix M (k × p) and the within-class covariance matrix W (p × p), where
   W = Σ_(k=1)^K Σ_(i∈k) (x_i − µ_k)(x_i − µ_k)^T
2. Eigen-decompose W.
3. Sphere the means: M* = M W^(−1/2).
4. Compute B* = Σ_(k=1)^K (µ*_k − µ*)(µ*_k − µ*)^T.
5. PCA: obtain the L eigenvectors V*_l of B* = V* D_B V*^T corresponding to the L largest eigenvalues.
6. These define the coordinates of the optimal subspace.
7. Obtain L new (discriminant) variables
   Z_l = (W^(−1/2) V*_l)^T X, for l = 1, 2, ..., L.
8. Output the new classified components.
Using this approach, we reduce the dataset by mapping the data X to Z, moving from p features to L features. The preceding LDA steps are then repeated for classification purposes [108].
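A two-class sketch of the LDA idea, pooling the within-class scatter W and projecting onto its solution direction, might look as follows (NumPy assumed; the two Gaussian clusters are synthetic):

```python
import numpy as np

rng = np.random.default_rng(1)
X0 = rng.normal(0.0, 1.0, (100, 2))        # class 0 cluster
X1 = rng.normal(4.0, 1.0, (100, 2))        # class 1 cluster
X = np.vstack([X0, X1])
y = np.hstack([np.zeros(100), np.ones(100)])

m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
# pooled within-class scatter (the W of Algorithm 12, up to scaling)
Sw = np.cov(X0.T) * (len(X0) - 1) + np.cov(X1.T) * (len(X1) - 1)
w = np.linalg.solve(Sw, m1 - m0)           # discriminant direction
threshold = w @ (m0 + m1) / 2              # midpoint between projected means
pred = (X @ w > threshold).astype(float)
accuracy = (pred == y).mean()
```

Because both classes share the pooled scatter estimate, the resulting decision boundary is linear, which is exactly what distinguishes LDA from QDA later in this chapter.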
5.1.7 Passive Aggressive Classifier
The Passive Aggressive (PA) Algorithms are a group of online learning algorithms
that Crammer et al. suggested (for both classification and regression). Online
learning frequently employs the PA algorithm, which is a margin-based learning
strategy. On the one hand, the online PA algorithm updates the pre-existing
classifier by updating the weight vector to correctly categorize the current case.
On the other hand, the new classifier must adhere to the old classifier as closely
as possible.
The passive aggressive algorithm can be used as an alternative to the perceptron to
overcome its issues. The PA algorithm has been proven superior to many other
alternative methods like Online Perceptron. The high performance is due to the
penalty rule used in PA. During prediction, PA penalizes the event by 1 if it is
false and no penalization for correct detection. The cost function used in the PA
algorithm is referred to as 0-1 loss [109].
The PA algorithm also uses stochastic gradient descent for optimization. This optimization addresses the hinge loss function (Eq. 5.15) in the PA algorithm, given below, where θ is the classifier, x the feature vector, and y the target variable.
Loss_hinge(θ, x, y) = max(1 − y(θ·x), 0) (5.15)
The hinge loss is continuous and differentiable everywhere except at the single point y(θ·x) = 1, which can be assessed by taking a subgradient. When y(θ·x) ≠ 1, the gradient can be found as:
∂_θ Loss_hinge(θ, x, y) = −yx, if y(θ·x) < 1
                          0, otherwise (5.16)
Hinge-loss and margin (distance between separator and data points) are inversely
related. The margin used below is also called a discriminative function that gives
an output score. Whenever the margin exceeds 1, the loss becomes zero; otherwise it is the difference between one and the margin. Consequently, the passive aggressive algorithm's objective is to identify the next θ, θ^(k+1), that minimizes:
(λ/2) ‖θ^(k+1) − θ^k‖² + Loss_hinge(θ, x, y) (5.17)
Now we will see where the term passive-aggressive got its name. It is due to the
fact that the PA algorithm is passive for correct classification and aggressive for
false prediction. Also from the equation given below, the update step is:
θ^(k+1) = θ^k − η ∇Loss_hinge(θ, x, y) = θ^k + η y x (5.18)
In the final step, the output is obtained using the equation:
y_t = sign(θ_updated · x) (5.19)
The complete procedure of passive aggressive classifier from [110], is shown in
algorithm-13
Algorithm 13: Passive Aggressive Classifier
1. Input: Aggressiveness factor C > 0
2. Start: Initialize the weights w_1 = (0, ..., 0)
3. For all m = 1, 2, ...
4.   Receive the instance x_m ∈ R^n
5.   Estimate: ŷ_m = sign(w_m · x_m)
6.   Receive the true label y_m ∈ {−1, 1}
7.   Suffer loss: l_m = max(0, 1 − y_m (w_m · x_m))
8.   Update: set
     τ_m = l_m / ‖x_m‖²  (PA)
     τ_m = min{C, l_m / ‖x_m‖²}  (PA-I)
     τ_m = l_m / (‖x_m‖² + 1/(2C))  (PA-II)
9.   Update: w_(m+1) = w_m + τ_m y_m x_m
10. Output the class of the new data points
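A single PA-I update step (τ_m = min{C, l_m/‖x_m‖²}) can be sketched as below; the four-instance stream is hypothetical and chosen only to show one aggressive and several passive rounds:

```python
import numpy as np

def pa1_update(w, x, y, C=1.0):
    """One PA-I step: tau = min(C, loss/||x||^2), then w <- w + tau*y*x."""
    loss = max(0.0, 1.0 - y * (w @ x))     # hinge loss of Eq. 5.15
    tau = min(C, loss / (x @ x))
    return w + tau * y * x

w = np.zeros(2)
stream = [(np.array([1.0, 0.5]), 1), (np.array([-1.0, -1.0]), -1),
          (np.array([0.8, 0.3]), 1), (np.array([-0.5, -0.9]), -1)]
for x, y in stream:                         # one online pass over the stream
    w = pa1_update(w, x, y)

pred = [int(np.sign(w @ x)) for x, _ in stream]
```

When an instance already has margin at least 1, the loss and therefore τ are zero and the weights are left untouched; that is the "passive" half of the name.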
5.1.8 Stochastic Gradient Descent
Modern machine learning relies heavily on stochastic gradient descent (SGD).
SGD refines a function by taking smaller steps along noisy gradients. Robbins and
Monro’s (1951) classic finding is that this process provably achieves the function’s
optimum (or local optimum, when it is nonconvex). Recent research examines the
benefits of constant step sizes, gradient or iterative averaging, and adaptive step
sizes. Stochastic gradient descent belongs to the family of gradient descent techniques; however, the cost function can be updated after each cycle using just a few random data points. To update the solution, SGD merely requires a few random samples of the data. Each training sample x^(i) and label y^(i) is used in the SGD update [111].
θ = θ − η · ∇_θ J(θ; x^(i); y^(i)) (5.20)
To update the solution, this is the same as drawing sub-matrices of the data matrix. Because each
SGD iteration is small and simple to compute, this method can run faster than
traditional descent methods, even if the update direction is not always optimal
[112].
SGD addresses the problem of high computational cost by offering substantially faster convergence per iteration. The only thing that makes SGD different from other methods is the quantity of data required to calculate the objective function's gradient. The accuracy of weight updates and the update time are traded off based on the amount of data used [113].
Algorithm 14: Stochastic Gradient Descent
Start:
1. Initialize η and W_0
2. For s = 1, 2, 3, ..., S
3.   For i ∈ (1, 2, ..., S)
4.     Select one sample randomly at a time
5.     W_s = W_(s−1) − η ∇E(W_(s−1), x_i, y_i)
6.   Track the changes:
7.   E_s = 0
8.   For all j = 1, 2, 3, ..., S
9.     E_s = E_s + E(W_s, x_j, y_j)
10.  E_s = E_s / S
11.  s = s + 1
12.  Check the stopping criterion
End
The pseudo-code of SGD (Algorithm 14) has only two main steps: individual gradient computation and weight update. Instead of calculating a genuine gradient, it calculates the gradient of a sample randomly chosen during each iteration. The method terminates when the required conditions are met, which is again the maximum number of iterations or before the system starts to overfit.
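The two main steps, per-sample gradient and weight update (Eq. 5.20), can be sketched for a squared-error objective (NumPy assumed; the noiseless linear data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(-1.0, 1.0, (200, 1))
y = 3.0 * X[:, 0] + 0.5                    # noiseless line: slope 3, intercept 0.5

theta = np.zeros(2)                        # [intercept, slope]
eta = 0.1
for _ in range(30):                        # epochs
    for i in rng.permutation(len(y)):      # one random sample per step (Eq. 5.20)
        xi = np.array([1.0, X[i, 0]])
        err = theta @ xi - y[i]            # gradient of the squared loss is err*xi
        theta -= eta * err * xi
```

Each step touches a single randomly chosen sample, which is exactly what keeps the per-iteration cost small compared with full-batch gradient descent.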
5.1.9 Gaussian Naive Bayes
The Gaussian Naive Bayes (GNB) probabilistic classifier relies on the Bayes theo-
rem that makes the strong (naive) assumption that each feature is independent of
all others. Assuming a Gaussian distribution for attribute values given the class
label, GNB classification is an instance of the naive Bayes approach [114]. For example, given the class label c, the ith attribute of the dataset has its mean and variance represented by µ_(c,i) and σ²_(c,i), respectively.
p(x_i|c) = (1 / √(2π σ²_(c,i))) exp(−(x_i − µ_(c,i))² / (2σ²_(c,i))) (5.21)
Eq. 5.21 above gives the likelihood of the value x_i given class c, which is also called a normal distribution.
The average µ (Eq. 5.22) and the standard deviation δ (Eq. 5.23) are given by:
µ = (Σ_(i=1)^n x_i) / n (5.22)
δ² = (Σ_(i=1)^n (x_i − µ)²) / (n − 1) (5.23)
The GNB algorithm is relatively straightforward, easy to use, and doesn’t need a
lot of training data. It can deal with missing data very effectively, is not sensitive
to irrelevant features, and scales linearly with the number of features and data
points. The GNB algorithm’s reliance on predictor independence is a significant
flaw [115].
The pseudo-code from [116], given in Algorithm 15, shows the complete process of GNB-based classification.
Algorithm 15: Gaussian Naive Bayes Algorithm
Start:
1. Read the training data
2. Divide the data into classes C
3. For each class C do
4.   Find the attributes
5.   Find the mean and standard deviation
6.   Use the Gaussian function to calculate probabilities
7.   Choose the class with the highest probability value
8.   Verify how well the predicted class matches the actual class
9. Return the accuracy value
End
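The per-feature Gaussian likelihood of Eq. 5.21, combined under the independence assumption, can be sketched as follows (equal class priors are assumed for simplicity; the two synthetic clusters are illustrative only):

```python
import numpy as np

def gaussian_logpdf(x, mu, var):
    """Log of the per-feature likelihood in Eq. 5.21."""
    return -0.5 * np.log(2 * np.pi * var) - (x - mu) ** 2 / (2 * var)

rng = np.random.default_rng(3)
X0 = rng.normal(0.0, 1.0, (80, 2))          # class 0 samples
X1 = rng.normal(3.0, 1.0, (80, 2))          # class 1 samples
stats = {0: (X0.mean(axis=0), X0.var(axis=0, ddof=1)),
         1: (X1.mean(axis=0), X1.var(axis=0, ddof=1))}

def predict(x):
    # independence: sum the per-feature log-likelihoods (equal priors assumed)
    scores = {c: gaussian_logpdf(x, mu, var).sum()
              for c, (mu, var) in stats.items()}
    return max(scores, key=scores.get)

label = predict(np.array([2.8, 3.2]))
```

Working in log space turns the product of per-feature densities into a sum, which avoids numerical underflow for many features.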
5.1.10 Multinomial Naive Bayes Algorithm
The Multinomial Naive Bayes (MNB) algorithm is simple, easy to use, and doesn’t
require a lot of training data. It scales linearly with the number of features and
data points and excels at handling missing data. An important problem in the
MNB algorithm is its reliance on predictor independence [117]. The distribution vectors (Eq. 5.24) are:
θ_y = (θ_y1, ..., θ_yn) (5.24)
where y is the class label, n is the number of features, and θ_yi is the probability P(x_i|y) of feature i for a sample of class y.
Let C be the set of classes. MNB selects the class with the highest probability P(C|t_i), using the Bayes rule (Eq. 5.25):
P(C|t_i) = P(C) P(t_i|C) / P(t_i) (5.25)
The prior probability can be found by taking the ratio of tokens of a class to the
total number of tokens. P(t_i|C) is the likelihood of obtaining a token t_i in class C. The scaling term P(t_i) (Eq. 5.26) can be calculated as:
P(t_i) = Σ_(k=1)^(|C|) P(k) P(t_i|k) (5.26)
Algorithm 16 [118] gives the pseudocode of MNB.
Algorithm 16: Multinomial Naive Bayes
Start:
1. d = n = (n_1, ..., n_v)
2. C = argmax_c P(d|C) P(C)
3.   = argmax_c P(C) Π_(i=1)^v P(w_i|C)^(n_i)
4.   = argmax_c [log P(C) + Σ_(i=1)^v n_i log P(w_i|C)]
5. Output class C.
End
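The log-space decision rule of Algorithm 16 can be sketched on a tiny hypothetical token-count matrix (NumPy assumed; Laplace smoothing with α = 1 is an illustrative choice):

```python
import numpy as np

# Token-count matrix: rows = samples, columns = "word" features (hypothetical)
X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])
y = np.array([0, 0, 1, 1])
alpha = 1.0                                   # Laplace smoothing

log_prior, log_theta = {}, {}
for c in (0, 1):
    counts = X[y == c].sum(axis=0)
    log_theta[c] = np.log((counts + alpha) / (counts.sum() + alpha * X.shape[1]))
    log_prior[c] = np.log((y == c).mean())

def predict(x):
    # argmax_c [ log P(c) + sum_i n_i log P(w_i|c) ], as in Algorithm 16
    return max((log_prior[c] + x @ log_theta[c], c) for c in (0, 1))[1]

label = predict(np.array([4, 0, 1]))
```

The smoothing term α prevents a zero count for any single feature from driving the whole log-probability to minus infinity.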
5.1.11 Ridge Classifier
Ridge classification is a technique that is used to analyze linear discriminant mod-
els. It is a form of regularization that penalizes model coefficients to prevent
over-fitting. Over-fitting occurs when a model is too complex and captures noise
in the data instead of the underlying signal. Ridge classification addresses this
problem by adding a penalty term to the cost function that discourages complexity.
Typically, the penalty term is the sum of the squared coefficients of the model’s
features. This inhibits over-fitting by requiring the coefficients to stay small. By altering the penalty, one can adjust the amount of regularization: a larger penalty results in more regularization and smaller coefficient values. Under-fitting, however, may occur if the penalty term is too large. In contrast to logistic regression, the ridge classifier's loss function is not a cross-entropy loss. Instead, a mean square loss with an L2 penalty is used as the loss function.
Algorithm 17: Ridge Classifier
Start:
1. Convert the target variable to the values +1 and −1
2. Create a ridge model with mean square loss as the loss function
3. Use L2 regularization (ridge) as the penalty term
4. If the predicted value is less than 0,
   then predict the target class as −1
5. Otherwise, the estimated target class is +1
6. One-versus-all training is used to train the ridge classifier
7. A label binarizer is used
8. The objective is one binary classifier per class
End
In ridge regression, the classification error is associated with a regression problem
and can be found using the cost-sensitive formula:
min_β ‖c ∘ (y − Xβ)‖²_2 + λ‖β‖²_2 (5.27)
where c ∈ R^n is a vector of error weights, one for each instance, and ∘ denotes element-wise multiplication.
84 By: Arshid Ali
MS Thesis Electricity Theft Detection
The vector c can be divided into two parts:
c = c^(p) + c^(n) (5.28)
where c^(p) denotes the misclassification error associated with the positive (true) instances and c^(n) the error associated with the negative instances [119]. The pseudo-code of the ridge classifier from [120] is given in Algorithm 17.
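A minimal sketch of the ridge classifier follows: ±1 targets, the closed-form solution of the penalized least-squares problem (Eq. 5.27 with unit weights c), and a sign rule for prediction (NumPy assumed; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(-2.0, 1.0, (40, 2)),
               rng.normal(2.0, 1.0, (40, 2))])
y = np.hstack([-np.ones(40), np.ones(40)])  # targets encoded as -1 / +1

lam = 1.0
Xb = np.hstack([np.ones((80, 1)), X])       # bias column
# closed-form ridge solution: beta = (X^T X + lam*I)^(-1) X^T y
beta = np.linalg.solve(Xb.T @ Xb + lam * np.eye(3), Xb.T @ y)
pred = np.sign(Xb @ beta)                   # threshold the regression output at 0
accuracy = (pred == y).mean()
```

The λI term both shrinks the coefficients and guarantees the linear system is invertible even with correlated features.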
5.1.12 Nearest Centroid Classifier
Nearest Centroid (NC) classification and the neighborhood relation derived from Relative Neighbourhood graphs have both been employed successfully in finite settings. The resulting classification methods aim to find prototypes that are distant enough from each other, but also uniformly or symmetrically shaped. The NC technique is a solid baseline classifier that yields understandable results, but its performance suffers when the data points of different classes are distant yet have comparable variances. Each labeled data point in the algorithm's input is a member of a distinct class, and the algorithm is given a number of such data points.
Algorithm 18: Nearest Centroid Classifier
Start:
1. Calculate the class means from the training data:
2. m_+ = (1/n_+) Σ_(i: y_i = +1) x_i
3. m_− = (1/n_−) Σ_(i: y_i = −1) x_i
4. Calculate the distance between a new test point x and the mean of each class:
5. d_+ = ‖x − m_+‖_2 (here ‖·‖_2 denotes the Euclidean distance)
6. d_− = ‖x − m_−‖_2
7. Classify x to the class corresponding to the smaller of d_+ and d_−.
8. Computing the class means m_+, m_− corresponds to training the classifier.
9. w = m_+ − m_−
10. The intercept is given by b = (1/2)(‖m_−‖²_2 − ‖m_+‖²_2)
11. Compute the discriminant f = w^T x + b.
12. Compute the sign of the discriminant y' = sign(f).
13. Classify x to the positive class if w^T x + b > 0 and to the negative class if w^T x + b < 0.
The algorithm’s relatively straightforward centroids computation step is used in
the model fitting step. A new data point is categorized by locating the centroid
that is closest to it in Euclidean distance and applying the matching label after
the centroids of each class have been located [121]. An NC classifier, also known
as the nearest prototype classifier, outputs target training samples that are near
the centroid [122]. Usually, the Euclidean distance formula is used to find the
difference, as shown in Eq. 5.29:
d(p, q) = √((p_1 − q_1)² + (p_2 − q_2)² + ... + (p_n − q_n)²) (5.29)
Here p is the actual class data and q the class centroid; p_1, ..., p_n are the n features of the observed data and q_1, ..., q_n are the n attributes of the class centroid.
The distances between the actual samples and each of the class centroids are measured and ranked, and the closest centroid is chosen. Class membership is then determined from the observed data. The pseudo-code of the classifier is given in Algorithm 18 [123].
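The centroid computation and nearest-centroid rule of Eq. 5.29 can be sketched as follows on hypothetical two-class data (NumPy assumed):

```python
import numpy as np

X_train = np.array([[0.0, 0.2], [0.2, 0.0], [0.1, 0.1],
                    [3.0, 3.1], [3.2, 2.9], [2.9, 3.0]])
y_train = np.array([0, 0, 0, 1, 1, 1])

# "training" is just computing one mean vector per class
centroids = {c: X_train[y_train == c].mean(axis=0) for c in (0, 1)}

def predict(x):
    # assign x to the class whose centroid is nearest (Eq. 5.29)
    return min(centroids, key=lambda c: np.linalg.norm(x - centroids[c]))

label = predict(np.array([2.8, 3.3]))
```

Training reduces to a single mean per class, which is why NC is often used as a fast, interpretable baseline.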
5.1.13 Quadratic Discriminant Analysis
A quadratic discriminant analysis (QDA) is a multivariate classifier. The QDA
generalizes the linear discriminant function analysis, which fits multivariate nor-
mal distributions with estimates of each group’s covariance. Using statistical clas-
sification, the QDA Classifier divides measurements of multiple instances by a
quadratic surface. It finds the correlations in the data set for each class based
on its relation with the centroid. The final results give the likelihood of class membership [124].
The only significant difference between QDA and LDA is the assumption that
the co-variance matrix may differ for each class, leading us to determine the co-
variance independently for each class k, where k =1, 2, ..., K.
A quadratic function is used for QDA classification and is given in Eq. 5.30.
δ_k(x) = −(1/2) log|Σ_k| − (1/2)(x − µ_k)^T Σ_k^(−1) (x − µ_k) + log π_k (5.30)
This quadratic discriminant function behaves differently from the linear discriminant function because the covariance term depends on k, so it contains second-order terms in x. The classification rule (Eq. 5.31) is:
G(x) = argmax_k δ_k(x) (5.31)
The classification method simply predicts the class k that maximizes the quadratic
discriminant function. Quadratic equations in x represent the choice boundaries.
QDA typically fits the data more accurately than LDA, despite having more parameters to estimate, because QDA gives the covariance matrix more freedom. QDA has many more parameters because every class has its own covariance matrix [125].
Algorithm 19 shows the complete procedure for QDA [126].
Algorithm 19: Quadratic Discriminant Analysis
Start:
1. Collect the training data
2. Set the prior probabilities using p_i = n_i / N
3. Perform Bartlett's test to check whether the data has homogeneous or heterogeneous variance-covariance matrices
4. If Σ_i ≠ Σ_j for some i ≠ j, then
   the data has heterogeneous variance-covariance matrices and QDA can be applied
5. Identify and estimate the parameters of the conditional probability density functions f(X|π_i)
6. Compute the discriminant functions
7. Use cross-validation to estimate misclassification probabilities
8. Classify observations with unknown group memberships
End
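The quadratic discriminant δ_k(x) of Eq. 5.30 can be sketched directly, with a separate covariance per class (NumPy assumed; the synthetic data uses equal priors, and the unequal class spreads are deliberate so that QDA's per-class covariances matter):

```python
import numpy as np

rng = np.random.default_rng(5)
X0 = rng.normal(0.0, 0.5, (100, 2))         # tight class
X1 = rng.normal(2.5, 1.5, (100, 2))         # spread-out class
params = {c: (Xc.mean(axis=0), np.cov(Xc.T), 0.5)   # mean, covariance, prior
          for c, Xc in ((0, X0), (1, X1))}

def delta(x, mu, cov, prior):
    """Quadratic discriminant delta_k(x) of Eq. 5.30."""
    diff = x - mu
    return (-0.5 * np.log(np.linalg.det(cov))
            - 0.5 * diff @ np.linalg.solve(cov, diff)
            + np.log(prior))

def predict(x):
    return max(params, key=lambda c: delta(x, *params[c]))

label = predict(np.array([2.4, 2.6]))
```

Because each class keeps its own covariance, the resulting decision boundary is a quadratic curve rather than the straight line LDA would produce.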
5.1.14 Complement Naive Bayes
In Multinomial Naive Bayes, only one class c is employed to estimate weights.
In contrast, the Complement Naive Bayes algorithm uses all training data from
all classes except the c class. The weights will be lower for the class with less
training data if the training data is skewed. Therefore, classification will unfairly
favor one class over another. A novel "complement class" variation of Naive Bayes
is introduced to deal with skewed training data, and it is known as Complement
Naive Bayes (CNB) [127].
In contrast, CNB uses data from all classes aside from class c to estimate the parameters. Because each estimate employs a more even distribution of training data across classes, CNB's estimates are more accurate and reduce the bias in weight assignment. Because more reliable weight estimates are found, prediction performance can be increased. These benefits result from more data per estimate; overall, CNB is less vulnerable to skewed data bias when utilizing the same amount of data.
Algorithm 20 shows the complete procedure of the CNB algorithm. Eq. 5.32 gives the parameter estimate of the Complement Naive Bayes rule:
θ_ci = (Ñ_ci + α_i) / (Ñ_c + α) (5.32)
where Ñ_ci is the number of occurrences of feature i in classes other than c, Ñ_c is the total number of feature occurrences in classes other than c, and α_i and α are smoothing parameters, as before. The weight estimate is w_ci = log θ_ci and the classification rule is
l_CNB(d) = argmax_c [log p(θ_c) − Σ_i f_i log((Ñ_ci + α_i) / (Ñ_c + α))] (5.33)
The parameters are estimated using data from all classes except class c in CNB. The estimates made by CNB are more precise and have less bias in the weight estimates, since each estimate uses a fairer distribution of training data across classes. The classification accuracy has improved as more trustworthy weight
Algorithm 20: Complement Naive Bayes
1. Let (d_1, ..., d_n) be a set of documents, with d_ij the count of feature i in document j.
2. Let y = (y_1, ..., y_n) be the labels.
3. CNB(d, y):
4. d_ij = log(d_ij + 1)
5. d_ij = d_ij · log(n / Σ_k δ_ik)
6. d_ij = d_ij / √(Σ_k (d_kj)²)
7. θ_ci = (Σ_(j: y_j ≠ c) d_ij + α_i) / (Σ_(j: y_j ≠ c) Σ_k d_kj + α)
8. w_ci = log θ_ci
9. w_ci = w_ci / Σ_i |w_ci| (weight normalization)
10. Let t = (t_1, t_2, ..., t_n) be a test point;
    assign the class according to
    l(t) = argmin_c Σ_i t_i w_ci
estimates have been discovered. Although overall, CNB is less susceptible to
skewed data bias when using the same amount of data, these advantages derive
from more data being used per estimate. The combined classification rule is
l_OVA(d) = argmax_c [log p(θ_c) + Σ_i f_i log((N_ci + α_i) / (N_c + α)) − Σ_i f_i log((Ñ_ci + α_i) / (Ñ_c + α))] (5.34)
This is a combination of the regular and complement classification rules [128], where N_ci counts occurrences of feature i within class c and Ñ_ci counts occurrences in all classes other than c.
Algorithm 20, from [118], shows the complete CNB procedure.
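The complement-count estimation of Eq. 5.32 and the argmin decision of Algorithm 20 can be sketched as below; the TF-IDF and length-normalization steps are omitted for brevity, and the count matrix is hypothetical (NumPy assumed):

```python
import numpy as np

X = np.array([[3, 0, 1], [2, 1, 0], [0, 4, 2], [1, 3, 3]])  # token counts
y = np.array([0, 0, 1, 1])
alpha = 1.0                                  # smoothing parameter

w = {}
for c in (0, 1):
    comp = X[y != c].sum(axis=0)             # counts from every class EXCEPT c
    theta = (comp + alpha) / (comp.sum() + alpha * X.shape[1])
    w[c] = np.log(theta)                     # w_ci = log theta_ci

def predict(x):
    # smaller complement score => poorer fit to the OTHER classes => pick c
    return min((x @ w[c], c) for c in (0, 1))[1]

label = predict(np.array([4, 0, 1]))
```

Because the weights of each class are estimated from all the other classes' data, no class's estimate is starved by a small training share, which is the point of the complement formulation.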
5.1.15 Dummy/Blind Classifier
A blind classifier (BC) is an algorithm that only takes class statistics into account and generates output labels at random. Suppose there are p target classes in the dataset. If the p classes are distributed equally over the dataset, the likelihood that the BC assigns a certain target l_i to a specific instance is P(l_i) = 1/p for i = 1, 2, ..., p.
In the event of label imbalance, the former probability must be weighted by the proportion of patterns in class l_i relative to all other patterns. Since the dummy classifier does not take the information contained in the training set into account when assigning the output labels, it is used as a comparative baseline to measure
the classifier performance [129]. The studies employ a "dummy" classifier that
uses a straightforward stratified technique to create predictions at random while
adhering to the distribution of classes in the training set [130].
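The stratified dummy strategy can be sketched in a few lines; the 10% theft rate below is an invented example of class imbalance, not a statistic from the thesis dataset:

```python
import random

random.seed(0)
y_train = [0] * 90 + [1] * 10          # imbalanced training labels: 10% theft

# stratified dummy: draw labels at random following the training distribution
p_theft = sum(y_train) / len(y_train)
preds = [1 if random.random() < p_theft else 0 for _ in range(1000)]
rate = sum(preds) / len(preds)         # close to 0.10 on average
```

Any real classifier should beat this baseline; on imbalanced data it also exposes why plain accuracy is a misleading metric, since always predicting 0 here already scores 90%.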
5.2 Data Balancing Techniques
We evaluate our proposed structure on imbalanced data and also use 5 different
balancing techniques. The details are given below:
5.2.1 Synthetic Minority Over-Sampling Technique
The Synthetic Minority Over-Sampling (SMOTE) technique is used as a data
oversampling approach in dealing with the original training set. The main idea
behind SMOTE is to create artificial instances rather than simply duplicating
the instances of minority classes. Several minority class instances that are located
inside a specific neighborhood are interpolated to create this new data. Due to this,
the process is said to be ’feature space’ focused rather than ’data space’ focused,
i.e., the algorithm is based on the values of the features and their relationships
rather than taking into account the dataset as a whole. This also suggested more
research into the theoretical relationship between actual and artificial instances,
including a thorough analysis of data dimensionality.
Minority class instances x_i are chosen as the foundation for new artificial sampling points. Several nearest neighbors of the same class are chosen
from the training set based on Euclidean distance. To obtain fresh instances,
a randomized interpolation is then completed. The basic procedure operates as
follows: first, consider the total amount of oversampling N (an integer), which can be configured in either of two ways to obtain an almost balanced class distribution of nearly 1:1.
The method is then carried out iteratively, in a series of steps. First, a train-
ing set instance representing a minority class is randomly chosen. Next, its K
closest neighbors are determined, which are set to 5. In order to compute the
new instances through interpolation, N of these K instances are finally selected at
random. To do this, the difference between each chosen neighbor and the
feature vector (sample) is calculated. This difference is added to the prior feature
vector after being multiplied by a random value between 0 and 1. As a result, a
random point is chosen along the ’line segment’ connecting the features [131].
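The interpolation step described above can be sketched as follows (NumPy assumed; k, n_new, and the random seeds are arbitrary illustrative choices, not the thesis settings):

```python
import numpy as np

def smote_sample(minority, k=5, n_new=10, seed=6):
    """Synthesize points on line segments between minority samples and neighbours."""
    rng = np.random.default_rng(seed)
    out = []
    for _ in range(n_new):
        i = rng.integers(len(minority))
        x = minority[i]
        d = np.linalg.norm(minority - x, axis=1)
        neighbours = np.argsort(d)[1:k + 1]      # skip the point itself
        nb = minority[rng.choice(neighbours)]
        gap = rng.random()                        # random value in [0, 1)
        out.append(x + gap * (nb - x))            # interpolated synthetic point
    return np.array(out)

minority = np.random.default_rng(7).normal(0.0, 1.0, (20, 2))
synthetic = smote_sample(minority, k=5, n_new=10)
```

Each synthetic point is a convex combination of two real minority points, so the new samples stay inside the minority class's feature-space region rather than being verbatim duplicates.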
5.2.2 Adaptive Synthetic Sampling Approach
In the same way that SMOTE provides synthetic data for minority classes, He et al. proposed ADASYN. However, ADASYN is predicated on producing more synthetic data for observations that are harder for a particular model to learn than for those that are simpler to learn. Similar to SMOTE, ADASYN produces
synthetic observations along a straight line between an observation belonging to a
minority class and its k-nearest minority class neighbors. Similar to SMOTE, the
K-nearest neighbor number is set at 5. But ADASYN produces more synthetic
observations for minority class observations when there are more positive class
data in the region of the k-nearest neighbors. In contrast, if there are no majority
data within the k-nearest neighbors’ range, no artificial data will be produced for a
minority. The justification behind this is that, these data make it more challenging
to infer minority observations that are very dissimilar to the majority views [132].
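The density-driven allocation can be illustrated with a small NumPy sketch (a simplified version of ADASYN's allocation rule only, not the full algorithm; the function name and toy data are invented for the example):

```python
import numpy as np

def adasyn_allocation(X, y, G, k=5):
    """ADASYN-style allocation: minority sample x_i receives a share of the
    G synthetic samples proportional to r_i, the fraction of majority points
    among its k nearest neighbours (harder, borderline samples get more)."""
    X_min = X[y == 1]
    r = np.empty(len(X_min))
    for i, x in enumerate(X_min):
        d = np.linalg.norm(X - x, axis=1)
        nn = np.argsort(d)[1:k + 1]        # k nearest neighbours, any class
        r[i] = np.mean(y[nn] == 0)         # share of majority neighbours
    if r.sum() == 0:                       # no borderline minority samples
        return np.zeros(len(X_min), dtype=int)
    return np.rint(G * r / r.sum()).astype(int)

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.],        # majority (0)
              [0.1, 0.1],                                    # borderline (1)
              [5., 5.], [5.1, 5.], [5., 5.1], [5.1, 5.1]])   # safe cluster (1)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1])
print(adasyn_allocation(X, y, G=10, k=3))  # all 10 go to the borderline point
```

The minority point near the majority cluster has only majority neighbors (r = 1) and receives the whole synthetic budget, while the safe minority cluster (r = 0) receives nothing, which is exactly the behavior described above.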
5.2.3 Combined Cleaning and Re-sampling Technique
The CCR algorithm was initially introduced by Koziarski and Woźniak in the
context of binary classification problems [133]. CCR constructs a sphere around
each minority observation; the spheres expand using the available energy, with
the cost increasing for every majority observation encountered during the
expansion. Majority observations inside the spheres are translated to the sphere
boundaries instead of being completely removed, so the information associated
with their original positions is preserved to a large extent.
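The cleaning (translation) step can be sketched as follows. This is a simplified version with a fixed sphere radius instead of the energy-based expansion, and the function name and toy coordinates are invented for the example:

```python
import numpy as np

def ccr_clean(x_min, X_maj, radius):
    """Simplified CCR cleaning step: majority points falling inside the
    sphere around a minority point are translated outward to the sphere
    boundary, so their positional information is preserved rather than
    the points being deleted."""
    X_out = X_maj.copy()
    d = np.linalg.norm(X_maj - x_min, axis=1)
    inside = (d < radius) & (d > 0)
    # push each intruding majority point outward along its own direction
    X_out[inside] = x_min + (X_maj[inside] - x_min) * (radius / d[inside])[:, None]
    return X_out

x_min = np.array([0.0, 0.0])
X_maj = np.array([[0.2, 0.0], [0.0, 3.0]])
moved = ccr_clean(x_min, X_maj, radius=1.0)
print(moved)   # first point pushed out to distance 1.0, second unchanged
```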
5.2.4 Noise Reduction A Priori Synthetic Over-Sampling (NRAS)
The Noise Reduction A Priori Synthetic Over-Sampling approach is based on
augmenting the data with the conditional probability of minority-class membership
for samples that do not appear to be noise. Using Bayes' theorem, this membership
probability is estimated for every sample; if the data distribution is known, the
marginal p(x) can be used directly [134]. The NRAS method thus adds a new
feature, the probability of belonging to the minority class, and excludes
minority-class samples that appear to be noise.
A noisy sample is judged by the rule below:

NOISE_i = 1, if CD_i > (1/N) Σ_{t=1}^{N} CD_t and RD_i > (1/N) Σ_{t=1}^{N} RD_t; 0, otherwise   (5.35)
NOISE_i is 1 if the sample x_i is a noisy sample and 0 otherwise. The arrays CD[]
and RD[] hold the core distances and reachability distances of all the samples;
together, the core distance and reachability distance fully reflect the density
information of the dataset [135].
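The noise rule of Eq. 5.35 reduces to a simple vectorized comparison against the dataset means (the function name and the illustrative distance values are assumptions for the example):

```python
import numpy as np

def noise_mask(CD, RD):
    """Eq. 5.35: a sample is flagged as noise when both its core distance
    and its reachability distance exceed the respective dataset means."""
    CD, RD = np.asarray(CD, float), np.asarray(RD, float)
    return ((CD > CD.mean()) & (RD > RD.mean())).astype(int)

CD = [0.1, 0.1, 0.2, 2.0]    # last sample sits in a sparse region
RD = [0.2, 0.1, 0.3, 3.0]
print(noise_mask(CD, RD))    # [0 0 0 1]
```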
5.2.5 Synthetic Minority Over-sampling Based on Sample Density (SMOBD)
The SMOBD technique oversamples the dataset to produce samples that follow the
real data distribution more closely than earlier approaches such as SMOTE and
SMOTE-ENN. Noise in the data is removed, and no additional samples are
synthesized to cover it up. A straightforward approach for determining sample
density based on the reachability and core distances is used. The density of a
single instance can be calculated using Eqn. 5.36.
DF_i = η_1 ε_i + η_2 N_i   (5.36)

DF_i stands for the density of sample x_i, and η_1 and η_2 are weighting
coefficients whose sum is 1. The instance density depends on two variables: the
number of instances N_i inside the radius and the distance ε_i of the k nearest
neighbors from the instance.
The formula below calculates the number of synthetic instances generated around
every minority sample:

N_i = (DF_i / Σ_{j=1}^{n} DF_j) × N   (5.37)

N_i denotes the number of new samples synthesized around sample x_i, and N is the
total number of new synthesized samples.
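Eq. 5.37 amounts to a proportional allocation, which can be checked numerically (the density factors below are illustrative values, not values from the thesis):

```python
import numpy as np

# Eq. 5.37 as a proportional allocation: each minority sample x_i gets
# N_i = DF_i / sum_j(DF_j) * N of the N total synthetic samples.
DF = np.array([0.5, 0.3, 0.2])   # illustrative density factors
N = 100                          # total synthetic samples to generate
N_i = np.rint(DF / DF.sum() * N).astype(int)
print(N_i)                       # [50 30 20]
```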
5.3 Research Methodology
The research data was obtained from the State Grid Corporation of China (SGCC)
for Fujian, China, and is available online on the SGCC website. This dataset was
chosen because of its easy availability and its relevance to the research gap; it
has also been de-identified for privacy purposes, so confidentiality is ensured.
The number of attributes is 1034, with the theft class having 3615 instances and
the honest class having 38,757 instances. Table 5.1 provides complete information
about the dataset.
Attribute                        Value
Time Frame of Data Collection    01-01-2014 to 31-10-2016
Total Consumers                  42,372
Number of Theft Users            3,615
Number of Honest Users           38,757
Table 5.1: Information of the Real-World SGCC Dataset.
The comparative analysis among various supervised machine learning algorithms
was carried out using scikit-learn in the Google Colab environment. The dataset is labeled
to reflect the consumer class: 1 for the theft class and 0 for the normal users'
class. A detected theft is considered positive (1) and a predicted honest user is
considered negative (0). Fifteen classification algorithms were used in this
research, namely: Decision Tree, Naïve Bayes, Perceptron, Gaussian Naive Bayes,
K Nearest Neighbors, Complement Naive Bayes, Linear Discriminant Analysis,
Quadratic Discriminant Analysis, Multinomial Naive Bayes, Logistic Regression,
Passive Aggressive Classifier, Stochastic Gradient Descent, Ridge Classifier,
Nearest Centroid Classifier and, for comparison, a Dummy/Blind Classifier. The
following performance attributes were considered for the comparative analysis:
Accuracy, F1-Score, MCC, Precision, Recall, FPR, FNR, and AUC. The overall
structure of the proposed model is shown in figure-5.1.
Figure 5.1: Proposed Electricity Theft Detection Model.
In order to predict the actual class, eight different performance metrics are
chosen. To ensure the best classification for different machine learning
algorithms, this research work uses 15 ML classifiers with different data
balancing techniques, since the SGCC dataset is class-imbalanced. The first category
comprised all instances and features of the dataset without any class balancing.
The second category used all instances and attributes with SMOTE for class
balancing, which helps avoid model bias toward the majority class and increases
the prediction of the actual class. In the following categories of experiment,
the ADASYN, SMOBD, NRAS, and CCR techniques are used to better address the class
imbalance issue. All 15 classifiers are analyzed on the eight performance metrics
to determine their feasibility for theft and honest consumer classification in a
smart grid environment.
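The evaluation loop can be sketched with scikit-learn. This is a minimal illustration with only three of the fifteen classifiers, and a synthetic imbalanced dataset stands in for the SGCC data:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.dummy import DummyClassifier
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# Synthetic imbalanced data standing in for the SGCC set (~9% positives).
X, y = make_classification(n_samples=2000, weights=[0.91, 0.09],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3,
                                          stratify=y, random_state=0)

classifiers = {"DT": DecisionTreeClassifier(random_state=0),
               "KNN": KNeighborsClassifier(n_neighbors=5),
               "Dummy": DummyClassifier(strategy="most_frequent")}
results = {}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    results[name] = (accuracy_score(y_te, y_pred),
                     f1_score(y_te, y_pred, zero_division=0),
                     matthews_corrcoef(y_te, y_pred))
    print(name, [round(v, 3) for v in results[name]])
```

Note how the dummy baseline reaches high accuracy simply by predicting the majority class, while its F1-score and MCC collapse to zero; this is exactly why the additional metrics are needed.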
5.4 Evaluation Parameters Used
For the comparative study, important performance indicators such as accuracy,
AUC, precision, recall, FNR, FPR, MCC, and F1-score are used. Every parameter
has a specific function and meaning in malicious behavior identification. These
parameters are discussed as follows.
5.4.1 Accuracy
Accuracy shows the number of correctly identified users divided by the total num-
ber of classified consumers and is calculated using Eqn. 5.38.
Accuracy = (TP + TN) / (TP + FP + FN + TN)   (5.38)
5.4.2 Recall
Recall is the ratio of the set of theft values that the classifier predicted as theft
against the total number of theft samples, as given in Eqn. 5.39.
Recall = TP / (TP + FN)   (5.39)
5.4.3 Precision
Precision is an important indicator of a machine learning model’s performance in
classification. It shows the quality of theft prediction of the ML model, which is
given using Eqn. 5.40.
Precision = TP / (TP + FP)   (5.40)
5.4.4 F1-Score
The F1-score combines the precision and the recall and is given by Eqn. 5.41:

F1-score = (2 × Precision × Recall) / (Precision + Recall)   (5.41)
5.4.5 Area Under the Curve
The area under the curve (AUC) measures the total area under the ROC curve.
An AUC of 1 indicates perfect separation of the two classes, while an AUC of
0.5 corresponds to a random classifier.
5.4.6 False Positive Rate
The ratio of the number of false positives predicted by the model to the sum of
FP and TN is known as the false positive rate (FPR). It gauges how often honest
users are incorrectly flagged as theft, and is given by Eqn. 5.42:

FPR = FP / (TN + FP)   (5.42)
5.4.7 False Negative Rate
False negative rate (FNR) is the ratio of FN to the sum of TP and FN. It
evaluates the chance that a theft case will be missed, and is given by Eqn. 5.43:

FNR = FN / (TP + FN)   (5.43)
5.4.8 Matthews Correlation Coefficient
The Matthews correlation coefficient is a metric used to assess the performance
of a binary (two-class) classifier. The coefficient returns +1 for a perfect
prediction, 0 for a random prediction, and -1 for total disagreement between
prediction and observation. MCC is a more reliable metric than accuracy or the
F1 score, because the latter two can be deceptive, as they do not consider all
four values of the confusion matrix. It is computed as shown in Eqn. 5.44.
MCC = (TP × TN - FP × FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN))   (5.44)
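The scalar metrics of Eqns. 5.38-5.44 follow directly from the four confusion-matrix counts; the counts below are illustrative values, not results from the thesis:

```python
import math

# The scalar metrics (eqs. 5.38-5.44) computed from the four
# confusion-matrix counts, with illustrative values.
TP, FP, FN, TN = 80, 10, 20, 890

accuracy  = (TP + TN) / (TP + FP + FN + TN)
precision = TP / (TP + FP)
recall    = TP / (TP + FN)                      # the TPR; FNR = 1 - recall
f1        = 2 * precision * recall / (precision + recall)
fpr       = FP / (TN + FP)
fnr       = FN / (TP + FN)
mcc       = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

print(round(accuracy, 3), round(f1, 3), round(mcc, 3))   # 0.97 0.842 0.827
```

With these counts the accuracy looks excellent (0.97) while the MCC (0.827) is noticeably lower, illustrating why MCC is the stricter of the two on imbalanced data.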
5.4.9 Receiver Operator Characteristic
The Receiver Operator Characteristic (ROC) indicator shows the classification
performance across different values of TPR and FPR. In the ROC curve, the FPR is
given on the X-axis and the TPR on the Y-axis.
5.5 Dataset and Simulation setup
This section presents the energy-consumption data used to develop and validate
the proposed framework. We test our model on a dataset that contains both benign
and malicious samples. The data is obtained from the State Grid Corporation of
China (SGCC) and consists of daily power-usage records of 42,372 consumers
collected over almost three years (1st January 2014 to 31st October 2016),
including 3615 electricity thieves (class 1) and 38,757 honest consumers
(class 0). For model training and prediction, the dataset is split 70:30.
A system with a Core i7 processor and 8 GB of RAM is used for the simulations,
which are run in Google Colaboratory (Colab). The obtained results are saved as
PDF for evaluation purposes and are discussed in the next section.
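The 70:30 split can be reproduced with scikit-learn; stratification keeps the 3615/38757 class ratio intact in both partitions (a placeholder feature matrix stands in for the SGCC features here):

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(42372).reshape(-1, 1)        # placeholder features
y = np.array([1] * 3615 + [0] * 38757)     # theft (1) vs honest (0) labels
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)
print(len(X_tr), len(X_te))                # 70% / 30% of 42,372 consumers
print(round(y_tr.mean(), 4), round(y_te.mean(), 4))   # class ratio preserved
```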
5.6 Simulation Results and Analysis
For the objective of detecting theft, we examined fifteen ML classification
algorithms, all implemented on the SGCC dataset: DT, NB, Perceptron, MNB, SGD,
PAC, LR, KNN, QDA, CNB, LDA, GNB, NCC, RC, and the Dummy classifier. First, the
original dataset is used for theft and normal-user classification with light
pre-processing, such as handling NaNs and data normalization. We then examined
all of the classifiers' results in order to complete a thorough comparative
analysis of five different data balancing techniques.
Figure 5.2: (a) 15 Classifiers' Output Accuracy without Balancing; (b) 15 Classifiers' Output Performance without Balancing.
In the initial round, all classifiers are trained on the whole dataset without
any additional changes, and the learned classifiers are then assessed on the
testing split. The original imbalanced dataset therefore yielded biased classifier
outputs in the form of accuracy. The testing accuracy of all fifteen models is
given in figure-5.2a.
In the first set of experiments, we used the original dataset for the mentioned
classifiers and obtained fairly random classification predictions. However, DT
and CNB show higher AUC values of 0.60 and 0.59, respectively, compared to the
rest of the classifiers, while SGD produces an almost completely random
classification, nearly equal to the dummy classifier, with an AUC value of 0.5.
Looking at the MCC score, which needs to approach 1 for the best classification,
it comes out to 0.19 and 0.18 for CNB and DT, respectively.
Figure-5.3a presents the classifiers' results in terms of precision and recall
for separating theft from honest consumers. The seemingly good overall results
are driven by the majority-class samples in the dataset, while the detection of
theft users is very low. SGD shows the worst performance on theft users, with a
PRC value of 0.090, almost the same as the dummy classifier with a PRC of 0.087.
Figure 5.3: (a) 15 Classifiers' Precision-Recall Curve without Balancing; (b) 15 Classifiers' AUROC without Balancing.
Accuracy is the most commonly reported indicator of an ML algorithm's
effectiveness. On an unbalanced dataset, however, a classifier's accuracy cannot
accurately determine how well it predicts; the area under the curve (AUC) score
is a more effective performance indicator for unbalanced datasets.
5.6.1 Output Performance using SMOTE-based Data Balancing
In the second set of experiments, we used the original dataset with SMOTE for
class balancing. This time we obtained an improvement over the near-random
predictions of the previous subsection.
Figure 5.4: (a) 15 Classifiers' Output Accuracy using SMOTE; (b) 15 Classifiers' Output Performance on SMOTE.
The AUC characterizes how quickly the true positive rate increases as the false
positive rate climbs. Varying the decision threshold traces the trade-off between
the true positive rate and the false positive rate on the receiver operating
characteristic (ROC) curve, as shown in figure-5.3b. AUC refers to the area under
the ROC curve. If the AUC is 100%, the positive and negative outcomes are easy to
distinguish; as the AUC falls toward 0%, it becomes impossible to distinguish
between the negative and positive classes.
To compare the performance of the classifiers, as stated in the earlier section,
the models are fit on the training samples while predictions are made on the
testing set. To confirm the systems' performance, evaluation metrics including
recall, precision, and F1-score are computed. The accuracy of all the classifiers
is plotted in figure-5.4a.
As seen, the SMOTE-balanced models outperform the models trained on unbalanced
data. For the theft class ('1'), SMOTE+DT performs best, with AUC and MCC values
of 0.84 and 0.69, respectively, while NCC+SMOTE shows poor performance, with AUC
and MCC values of 0.56 and 0.16. The rest of the performance metrics are shown
in figure-5.4b.
Figure 5.5: (a) 15 Classifiers' Precision-Recall Curve on SMOTE; (b) 15 Classifiers' Output AUROC on SMOTE.
Based on the testing set, the decision tree produced the best results, with an
overall PRC of 0.77, while GNB and NCC show PRC values of 0.52 and 0.53, close
to the random prediction of the dummy classifier. A careful examination of these
results reveals that the PRC of the DT is significantly higher than that of the
remaining classifiers, as shown in figure-5.5a.
In figure-5.5b, the AUC metric value is used to evaluate the ML algorithms. The
DT classifier has the highest value, with an AUC of 0.83, among all the models.
5.6.2 Output Performance using ADASYN-based Data Balancing
This experiment used ADASYN-based data balancing on the original dataset, and
again we obtained an overall improvement over the random predictions obtained
without balancing.
Figure 5.6: (a) 15 Classifiers' Output Accuracy using ADASYN; (b) 15 Classifiers' Output Performance on ADASYN.
A total of six performance metrics are plotted for all 15 implemented
classifiers: MCC, FPR, FNR, AUC, F1-Score, and accuracy. Considering all these
parameters, the KNN and DT classifiers perform best overall, while GNB,
Perceptron, and SGDC show poor classification results, as shown in figure-5.6b.
The precision-recall curve of all implemented classifiers for ADASYN-based data
balancing is shown in figure-5.7a. The PRC also shows the highest value for KNN
and DT classifiers.
Figure 5.7: (a) 15 Classifiers' Precision-Recall Curve on ADASYN; (b) 15 Classifiers' Output AUROC on ADASYN.
It is appropriate to compare these classifiers using the AUC, since it is
scale-invariant and threshold-invariant. A perfect model has an area of 1, and a
model with a larger area is considered superior. The whole two-dimensional area
under the ROC curve, called the AUROC, is displayed in figure-5.7b. Among the
fifteen implemented classifiers, KNN and DT perform best in terms of theft and
normal-user classification.
5.6.3 Output Performance using SMOBD-based Data Balancing
In the following experiment, a SMOBD data balancing technique is used on the
SGCC dataset. The performance of 15 implemented classifiers is evaluated and
discussed.
The accuracy metric is used for the classification evaluation of all the
implemented classifiers. KNN and DT showed improved accuracy scores of 0.86 and
0.79, respectively, as shown in figure-5.8a, while the remaining models give
near-random accuracy values below 0.70.
Accuracy, AUC, MCC, F1-Score, precision, recall, and FNR are used for the
classification comparison. Figure-5.8b shows that DT and KNN outperform the
other classifiers in terms of theft and honest-user classification.
Figure 5.8: (a) 15 Classifiers' Output Accuracy using SMOBD; (b) 15 Classifiers' Output Performance on SMOBD.
Figure-5.9a presents the assessment of the ML algorithms in relation to the PRC
metric. Comparatively, the KNN and DT classifiers have the highest PRC values of
0.78 and 0.73 among all the models, while the GNB classifier shows a poor PRC
value of 0.53, close to that of the blind/dummy classifier.
Figure 5.9: (a) 15 Classifiers' Precision-Recall Curve on SMOBD; (b) 15 Classifiers' Output AUROC on SMOBD.
The AUC value is a good performance metric for imbalanced-dataset classification.
This indicator measures how quickly the TPR changes as the FPR rises, and the
ROC curve illustrates the trade-off between the TPR and the FPR as the decision
threshold varies. The ROC curves of all the classifiers, with the corresponding
areas (denoted as AUC), are shown in figure-5.9b. As seen, KNN and DT outperform
the other classifiers on SMOBD-based data balancing, with AUC values of 0.86 and
0.79, respectively. This shows that their positive and negative results can be
separated more easily than those of the other classifiers.
5.6.4 Output Performance using NRAS-based Data Balancing
This experiment shows the classification performance of 15 classifiers using the
NRAS data balancing technique on the SGCC dataset.
Figure 5.10: (a) 15 Classifiers' Output Accuracy using NRAS; (b) 15 Classifiers' Output Performance on NRAS.
Various performance metrics are considered and plotted for comparison purposes.
The general classification metric, accuracy, shows the highest values of 0.92
and 0.91 for KNN and DT, respectively. The lowest accuracies, 0.59 and 0.60, are
obtained for NCC and GNB when compared with the dummy classifier.
Figure 5.11: (a) 15 Classifiers' Precision-Recall Curve on NRAS; (b) 15 Classifiers' Output AUROC on NRAS.
FNR is a classification metric that provides a detailed understanding of the
number of thieves who are mistakenly classified as honest users. In order to
detect NTLs, a low FNR is preferred. With the highest observed FNRs, the two
classifiers NCC and GNB performed poorly. Figure-5.10b demonstrates that the FNR
is very low for three classifiers: among the 15 models, FNR is minimum for the
KNN, BNB, and DT classifiers. With the lowest FNR, KNN+NRAS thus proves to be
the best option for NTL detection on our dataset under NRAS-based balancing.
The precision-recall curves of the implemented classifiers for NRAS-based data
balancing are shown in figure-5.11a; the PRC is again highest for the KNN and DT
classifiers.
FPR counts the honest users who are predicted to be thieves. A high FPR
increases manual work by inflating the on-site theft-verification process, since
the classifier frequently flags users as thieves who actually belong to the
honest class in the dataset. The correctly predicted theft users constitute the
TPR, and for a good classifier both the TPR and the TNR should approach 100%.
Figure-5.11b shows the ROC curve with the corresponding AUC values for the
implemented classifiers. The KNN and DT outperform, with AUC values of 0.92 and
0.91, respectively.
5.6.5 Output Performance using CCR-based Data Balancing
In the last experiment, a CCR-balanced dataset is used for classification. Using
the accuracy metric for evaluation, GNB, DT, and QDA achieved accuracy scores of
0.92, 0.96, and 0.98, respectively, as shown in figure-5.12a.
Figure 5.12: (a) 15 Classifiers' Output Accuracy using CCR; (b) 15 Classifiers' Output Performance on CCR.
The classifiers are also verified for accuracy, AUC, Precision, Recall, MCC,
F1-Score, and FNR. The best performance is obtained with the QDA and DT
classifiers, as shown in figure-5.12b. However, the accuracy metric alone is
often not enough for classification problems, and other metrics are needed to
make sure that a model is reliable.
Figure 5.13: (a) 15 Classifiers' Precision-Recall Curve on CCR; (b) 15 Classifiers' Output AUROC on CCR.
Similarly, the PRC of the QDA classifier increased to 0.99 in the case of CCR,
followed by GNB and DT. The evaluation of the ML algorithms with respect
to the PRC metric value is shown in figure-5.13a. Comparatively, the Perceptron
classifier had the lowest PRC of 0.77 among the implemented models, even though
the data is the same for all of them. Moreover, the precision of all the models
is measured and plotted.
The AUC score is an effective performance indicator for unbalanced datasets; it
establishes how quickly the TPR increases as the FPR rises. On the ROC curve,
changing the decision threshold demonstrates the trade-off between the TPR and
FPR, as shown in figure-5.13b. As seen, the AUC value for QDA is 0.9789,
outperforming the comparative algorithms; this shows that the positive and
negative classes can be easily separated. Along with accuracy, the generic
classification term, and MCC, AUC, FPR, and FNR, each experiment also includes
the computation of ROC and PR curves. For different thresholds, the trade-off
between precision and recall is given by the PR curve.
The results of the fifteen ML techniques are evaluated using the different
balancing approaches, as shown in table-5.3. From these results, we can analyze
each classifier's behavior based on its output performance. It can be inferred
that almost all algorithms show improved results on balanced data compared with
the imbalanced data. The QDA model showed the most improved performance among
the classifiers, in terms of the AUC, MCC, and F1-Score metrics, using the
CCR-balanced dataset.
Metric     | Without Balancing | SMOTE | ADASYN | NRAS | SMOBD | CCR

1. Logistic Regression
Accuracy   | 0.9134 | 0.6317 | 0.6148 | 0.7622 | 0.6380 | 0.7995
AUC        | 0.51 | 0.63 | 0.61 | 0.76 | 0.64 | 0.50
Precision  | 0.6591 | 0.7005 | 0.6743 | 0.8766 | 0.7107 | 0.7998
Recall     | 0.026 | 0.4559 | 0.4406 | 0.6084 | 0.4616 | 0.9996
F1-Score   | 0.0501 | 0.5524 | 0.533 | 0.7183 | 0.5596 | 0.8886
MCC        | 0.1191 | 0.2802 | 0.2442 | 0.5502 | 0.2939 | -0.0088
FPR        | 0.0012 | 0.1936 | 0.2117 | 0.0850 | 0.1865 | 1.0
FNR        | 0.9739 | 0.54405 | 0.5593 | 0.3916 | 0.5384 | 0.00038
Time       | 18.7 s | 1 min 34 s | 55.6 s | 1 min 38 s | 1 min 10 s | 1 min 27 s

2. Bernoulli Naïve Bayes
Accuracy   | 0.3038 | 0.6037 | 0.5980 | 0.6219 | 0.6043 | 0.6775
AUC        | 0.52 | 0.60 | 0.60 | 0.62 | 0.61 | 0.72
Precision  | 0.0926 | 0.5625 | 0.5593 | 0.573 | 0.5628 | 0.9276
Recall     | 0.789 | 0.9222 | 0.9149 | 0.9472 | 0.9234 | 0.6474
F1-Score   | 0.1657 | 0.6988 | 0.6943 | 0.714 | 0.6993 | 0.7626
MCC        | 0.0301 | 0.2713 | 0.2551 | 0.323 | 0.2731 | 0.3588
FPR        | 0.7427 | 0.7124 | 0.7171 | 0.7009 | 0.7123 | 0.20204
FNR        | 0.2109 | 0.0777 | 0.0850 | 0.0528 | 0.07663 | 0.35256
Time       | 13.8 s | 19.6 s | 13.3 s | 7.79 s | 8.28 s | 37.8 s

3. Gaussian Naïve Bayes
Accuracy   | 0.9049 | 0.5363 | 0.5339 | 0.6009 | 0.5408 | 0.9216
AUC        | 0.54 | 0.53 | 0.53 | 0.60 | 0.54 | 0.95
Precision  | 0.3433 | 0.8551 | 0.8155 | 0.9462 | 0.8668 | 0.9964
Recall     | 0.0925 | 0.0835 | 0.0729 | 0.2112 | 0.0927 | 0.9053
F1-Score   | 0.1457 | 0.1522 | 0.1338 | 0.3453 | 0.1675 | 0.9487
MCC        | 0.1406 | 0.1615 | 0.1368 | 0.3169 | 0.1749 | 0.801
FPR        | 0.0169 | 0.0140 | 0.0164 | 0.0119 | 0.0141 | 0.0131
FNR        | 0.9075 | 0.9164 | 0.9271 | 0.7888 | 0.9073 | 0.0946
Time       | 5.17 s | 13.8 s | 12.2 s | 11.1 s | 10 s | 41.7 s

4. K Nearest Neighbors
Accuracy   | 0.9096 | 0.8067 | 0.7974 | 0.9267 | 0.8611 | 0.6284
AUC        | 0.54 | 0.81 | 0.80 | 0.93 | 0.86 | 0.76
Precision  | 0.4222 | 0.7225 | 0.7129 | 0.9129 | 0.7847 | 0.9953
Recall     | 0.0853 | 0.9941 | 0.9941 | 0.943 | 0.9941 | 0.5381
F1-Score   | 0.1419 | 0.8368 | 0.8304 | 0.9277 | 0.8771 | 0.6985
MCC        | 0.1588 | 0.6622 | 0.6473 | 0.8539 | 0.7496 | 0.4263
FPR        | 0.0112 | 0.3792 | 0.3982 | 0.0893 | 0.2708 | 0.0101
FNR        | 0.9147 | 0.0058 | 0.0058 | 0.0570 | 0.0058 | 0.4619
Time       | 4 min 58 s | 17 min 15 s | 17 min 3 s | 18 min 26 s | 17 min 47 s | 90 min 27 s

5. Perceptron
Accuracy   | 0.9124 | 0.5714 | 0.5391 | 0.7512 | 0.613373 | 0.6881
AUC        | 0.54 | 0.57 | 0.54 | 0.75 | 0.61 | 0.43
Precision  | 0.5025 | 0.8975 | 0.8748 | 0.9275 | 0.8572 | 0.7748
Recall     | 0.0916 | 0.158 | 0.0887 | 0.5432 | 0.2688 | 0.8602
F1-Score   | 0.1549 | 0.2687 | 0.1611 | 0.6851 | 0.4093 | 0.8153
MCC        | 0.1869 | 0.2476 | 0.1736 | 0.5511 | 0.3089 | -0.1768
FPR        | 0.0087 | 0.0179 | 0.0126 | 0.0421 | 0.0444 | 0.9994
FNR        | 0.9084 | 0.8419 | 0.9113 | 0.45680 | 0.7311 | 0.1398
Time       | 11.5 s | 29.4 s | 25.8 s | 12 s | 19.2 s | 26.2 s

6. Passive Aggressive Classifier (PAC)
Accuracy   | 0.9137 | 0.6419 | 0.5684 | 0.7972 | 0.5991 | 0.7253
AUC        | 0.52 | 0.64 | 0.57 | 0.78 | 0.51 | 0.46
Precision  | 0.625 | 0.5937 | 0.8664 | 0.9283 | 0.5031 | 0.7856
Recall     | 0.0359 | 0.8552 | 0.1635 | 0.6066 | 0.9973 | 0.9145
F1-Score   | 0.068 | 0.7089 | 0.2751 | 0.7338 | 0.6688 | 0.8452
MCC        | 0.1366 | 0.3066 | 0.237 | 0.5976 | 0.0872 | -0.1317
FPR        | 0.00198 | 0.5836 | 0.0250 | 0.0465 | 0.9780 | 0.9978
FNR        | 0.9640 | 0.1418 | 0.8365 | 0.3933 | 0.0026 | 0.0854
Time       | 15.8 s | 1 min 54 s | 2 min 11 s | 2 min 27 s | 1 min 54 s | 41.7 s

7. Quadratic Discriminant Analysis
Accuracy   | 0.9124 | 0.6215 | 0.6628 | 0.8540 | 0.5874 | 0.9790
AUC        | 0.52 | 0.62 | 0.66 | 0.85 | 0.59 | 0.98
Precision  | 0.5065 | 0.9954 | 0.8456 | 0.8526 | 0.796 | 0.9946
Recall     | 0.035 | 0.2416 | 0.3964 | 0.8549 | 0.2313 | 0.9791
F1-Score   | 0.0655 | 0.3888 | 0.5398 | 0.8537 | 0.3584 | 0.9868
MCC        | 0.1156 | 0.3687 | 0.3832 | 0.7081 | 0.245 | 0.9366
FPR        | 0.0032 | 0.0011 | 0.0720 | 0.1468 | 0.0588 | 0.0211
FNR        | 0.9649 | 0.7584 | 0.6035 | 0.1450 | 0.7687 | 0.0209
Time       | 2 min 15 s | 3 min 53 s | 4 min 2 s | 4 min 17 s | 4 min 9 s | 10 min 53 s

Table 5.2: 15 Models' Output Performance (classifiers 1-7).
Metric     | Without Balancing | SMOTE | ADASYN | NRAS | SMOBD | CCR

8. Stochastic Gradient Descent
Accuracy   | 0.9126 | 0.5742 | 0.5668 | 0.6770 | 0.5924 | 0.7999
AUC        | 0.50 | 0.60 | 0.56 | 0.68 | 0.59 | 0.50
Precision  | 1.0 | 0.7116 | 0.7759 | 0.8993 | 0.8044 | 0.7999
Recall     | 0.0027 | 0.3373 | 0.1767 | 0.406 | 0.2268 | 1.0
F1-Score   | 0.0054 | 0.4576 | 0.2878 | 0.5594 | 0.3538 | 0.8888
MCC        | 0.0496 | 0.2372 | 0.1984 | 0.4321 | 0.2475 | 0.0
FPR        | 0.0 | 0.1357 | 0.0507 | 0.0451 | 0.0547 | 1.0
FNR        | 0.9973 | 0.6627 | 0.8233 | 0.5940 | 0.7731 | 0.0
Time       | 9.06 s | 2.2 s | 41.5 s | 22.6 s | 31.5 s | 1 min 24 s

9. Ridge Classifier
Accuracy   | 0.9129 | 0.6322 | 0.62039 | 0.6986 | 0.6365 | 0.7998
AUC        | 0.51 | 0.63 | 0.62 | 0.70 | 0.64 | 0.50
Precision  | 0.625 | 0.7306 | 0.7062 | 0.9164 | 0.7438 | 0.7998
Recall     | 0.018 | 0.4148 | 0.4089 | 0.4348 | 0.4125 | 0.9999
F1-Score   | 0.0349 | 0.5291 | 0.518 | 0.5898 | 0.5307 | 0.8888
MCC        | 0.0955 | 0.2919 | 0.2645 | 0.4654 | 0.3035 | -0.0046
FPR        | 0.0010 | 0.1518 | 0.1692 | 0.0393 | 0.1410 | 1.0
FNR        | 0.9820 | 0.5852 | 0.5910 | 0.5652 | 0.5874 | 0.00010
Time       | 16.5 s | 33.3 s | 32.6 s | 29.5 s | 28.9 s | 1 min 16 s

10. Linear Discriminant Analysis (LDA)
Accuracy   | 0.9085 | 0.6847 | 0.67164 | 0.8187 | 0.6515 | 0.7997
AUC        | 0.56 | 0.68 | 0.67 | 0.82 | 0.65 | 0.50
Precision  | 0.4273 | 0.7221 | 0.7045 | 0.8589 | 0.7138 | 0.7999
Recall     | 0.1293 | 0.597 | 0.5885 | 0.7613 | 0.5018 | 0.9998
F1-Score   | 0.1985 | 0.6536 | 0.6413 | 0.8072 | 0.5893 | 0.8887
MCC        | 0.1982 | 0.3747 | 0.3478 | 0.6415 | 0.3165 | -0.0062
FPR        | 0.0166 | 0.2282 | 0.2455 | 0.1241 | 0.1997 | 1.0
FNR        | 0.8707 | 0.4029 | 0.4115 | 0.2387 | 0.4982 | 0.00019
Time       | 2 min 19 s | 4 min 21 s | 4 min 17 s | 3 min 54 s | 4 min 3 s | 27 min 15 s

11. Decision Tree (DT)
Accuracy   | 0.8617 | 0.8439 | 0.83 | 0.9122 | 0.7984 | 0.9677
AUC        | 0.60 | 0.84 | 0.84 | 0.91 | 0.80 | 0.95
Precision  | 0.2457 | 0.8225 | 0.8187 | 0.8951 | 0.7883 | 0.9781
Recall     | 0.2828 | 0.8765 | 0.8692 | 0.9309 | 0.8051 | 0.9814
F1-Score   | 0.2629 | 0.8486 | 0.8432 | 0.9126 | 0.7966 | 0.9797
MCC        | 0.1872 | 0.69 | 0.6789 | 0.8231 | 0.5905 | 0.898
FPR        | 0.0833 | 0.1878 | 0.1915 | 0.10833 | 0.2146 | 0.0879
FNR        | 0.7172 | 0.1235 | 0.1307 | 0.0691 | 0.1948 | 0.0186
Time       | 17 min | 18 min 28 s | 14 min 33 s | 26 min 54 s | 17 min 31 s | 36 min 8 s

12. Nearest Centroid Classifier
Accuracy   | 0.8478 | 0.5604 | 0.5545 | 0.5976 | 0.5686 | 0.3936
AUC        | 0.56 | 0.56 | 0.55 | 0.60 | 0.57 | 0.55
Precision  | 0.1836 | 0.682 | 0.6369 | 0.9262 | 0.7002 | 0.9776
Recall     | 0.2128 | 0.2204 | 0.2483 | 0.209 | 0.2346 | 0.9814
F1-Score   | 0.1929 | 0.3332 | 0.3573 | 0.3411 | 0.3514 | 0.9795
MCC        | 0.1144 | 0.161 | 0.1357 | 0.3046 | 0.1808 | 0.0867
FPR        | 0.0912 | 0.1020 | 0.1408 | 0.0165 | 0.0997 | 0.1942
FNR        | 0.7863 | 0.7795 | 0.7517 | 0.7909 | 0.7654 | 0.7094
Time       | 4.42 s | 7.23 s | 9.48 s | 3.33 s | 3.44 s | 11.4 s

13. Multinomial Naïve Bayes
Accuracy   | 0.9130 | 0.6167 | 0.6028 | 0.6537 | 0.6190 | Nil
AUC        | 0.51 | 0.62 | 0.60 | 0.65 | 0.62 | Nil
Precision  | 0.6452 | 0.7878 | 0.7356 | 0.9329 | 0.7926 | Nil
Recall     | 0.018 | 0.3159 | 0.3179 | 0.3288 | 0.319 | Nil
F1-Score   | 0.0349 | 0.4509 | 0.4439 | 0.4862 | 0.4549 | Nil
MCC        | 0.0975 | 0.2893 | 0.2483 | 0.4012 | 0.2948 | Nil
FPR        | 0.0009 | 0.0845 | 0.1136 | 0.0234 | 0.0828 | Nil
FNR        | 0.9820 | 0.6841 | 0.6820 | 0.6711 | 0.6810 | Nil
Time       | 1.85 s | 8.48 s | 9.08 s | 3.19 s | 3.59 s | Nil

14. Complement Naïve Bayes
Accuracy   | 0.8769 | 0.6140 | 0.6041 | 0.6529 | 0.6170 | Nil
AUC        | 0.59 | 0.61 | 0.60 | 0.65 | 0.62 | Nil
Precision  | 0.2768 | 0.7904 | 0.7317 | 0.9332 | 0.7953 | Nil
Recall     | 0.2504 | 0.3068 | 0.3259 | 0.3269 | 0.3116 | Nil
F1-Score   | 0.263 | 0.442 | 0.451 | 0.4842 | 0.4477 | Nil
MCC        | 0.1963 | 0.2861 | 0.249 | 0.4 | 0.2926 | Nil
FPR        | 0.0628 | 0.0808 | 0.1189 | 0.0232 | 0.0796 | Nil
FNR        | 0.7495 | 0.6931 | 0.6740 | 0.6730 | 0.6884 | Nil
Time       | 1.84 s | 6.66 s | 6.69 s | 3.07 s | 3.57 s | Nil

15. Dummy Classifier
Accuracy   | 0.9123 | 0.4982 | 0.5012 | 0.4982 | 0.4982 | 0.7999
AUC        | 0.50 | 0.50 | 0.50 | 0.50 | 0.50 | 0.50
Precision  | nan | 0.4983 | nan | 0.4983 | 0.4983 | 0.7999
Recall     | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0
F1-Score   | nan | 0.6651 | nan | 0.6651 | 0.6651 | 0.8888
MCC        | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0
FPR        | 0.0 | 1.0 | 0.0 | 1.0 | 1.0 | 1.0
FNR        | 1.0 | 0.0 | 1.0 | 0.0 | 0.0 | 0.0
Time       | 85.8 ms | 251 ms | 252 ms | 135 ms | 141 ms | 430 ms

Table 5.3: 15 Models' Output Performance (classifiers 8-15).
We investigated which classifiers are best at predicting electricity theft. For
this purpose, we reported eight performance metrics, including the ROC curves of
the 15 classifiers, and determined the related AUC ratings for easier comparison.
We found that the AUCs of a few algorithms were consistently higher than those of
the others. With regard to AUC, the classifiers SGDC, Perceptron, and BNB perform
worst among the fifteen analyzed, whereas the models QDA, DT, KNN, and LR are
comparable with one another.
Using the SGCC dataset, the 15 ML classifiers’ are evaluated to detect electricity
theft. Fifteen ML algorithms are chosen for this purpose. And each of the 15
classifiers is individually simulated 10 times and their average accuracy is noted
to overcome the variations in output results due to random data splitting. To ob-
serve the model classification performance, 8 classification parameters are chosen.
The results are noted for each class balancing technique and without any data bal-
ancing. For comparison purposes, the results of each of the 15 ML algorithms are
evaluated on the basis of imbalance dataset results and dummy classifier results.
The complete simulation results for the 15 classifiers are given in table-5.3.
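The evaluation protocol above (repeated random 80:20 splits with metrics averaged per classifier) can be sketched as follows; the synthetic dataset and the two classifiers are illustrative stand-ins, not the SGCC data or the full 15-model set:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import roc_auc_score, accuracy_score

# Synthetic stand-in for the imbalanced SGCC consumption data.
X, y = make_classification(n_samples=600, n_features=20,
                           weights=[0.9, 0.1], random_state=42)

classifiers = {
    "LR": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
}

# Average each classifier's accuracy and AUC over 10 random 80:20
# splits to smooth out variation from the split itself.
results = {}
for name, clf in classifiers.items():
    accs, aucs = [], []
    for seed in range(10):
        Xtr, Xte, ytr, yte = train_test_split(
            X, y, test_size=0.2, random_state=seed, stratify=y)
        clf.fit(Xtr, ytr)
        accs.append(accuracy_score(yte, clf.predict(Xte)))
        aucs.append(roc_auc_score(yte, clf.predict_proba(Xte)[:, 1]))
    results[name] = (np.mean(accs), np.mean(aucs))
```

Stratified splits are used here so that every test fold contains both classes, which keeps the AUC defined on each repetition.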
Chapter 6
Conclusion & Future Work
6.1 Conclusion
In this work, an ML-based stacked generalization technique is proposed to overcome the CPTA issue in the SG. The overall system is divided into four modules, each with a specific function.
The data obtained from the utility needs some pre-processing before it can be used for model training. The first module addresses these issues with novel techniques so that the data is processed without losing important information. NaN values are imputed with the mean-imputation method to obtain complete ECPs. The data is then normalized with the min-max scaling technique to bring it into a proper range, and a z-score capping technique is applied for efficient handling of outliers in the dataset.
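A minimal sketch of these three pre-processing steps on a toy matrix (the ordering follows the text; the values are illustrative, not SGCC readings):

```python
import numpy as np

# Toy consumption matrix: one missing reading (NaN) and one large outlier.
X = np.array([[1.0, 2.0],
              [np.nan, 4.0],
              [3.0, 100.0]])

# 1) Mean imputation: replace each NaN with its column mean.
col_mean = np.nanmean(X, axis=0)
X_imp = np.where(np.isnan(X), col_mean, X)

# 2) Min-max scaling into [0, 1].
lo, hi = X_imp.min(axis=0), X_imp.max(axis=0)
X_scaled = (X_imp - lo) / (hi - lo)

# 3) Z-score capping: clip values beyond +/- 3 standard deviations
#    of each column, so extreme outliers are bounded, not dropped.
mu, sigma = X_scaled.mean(axis=0), X_scaled.std(axis=0)
X_final = np.clip(X_scaled, mu - 3 * sigma, mu + 3 * sigma)
```

Capping (rather than deleting) outlier readings keeps every customer's consumption profile complete, which matters when each row is one user's ECP.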
In the second module, a PCA-based technique is applied for important feature extraction and data reduction. We then implement the SVM-SMOTE technique for optimal balancing of the theft-class and normal-class data obtained from the PCA step.
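A sketch of this module under stated assumptions: synthetic data stands in for the SGCC features, scikit-learn's PCA keeps the components explaining 95% of the variance, and the SVM-SMOTE balancing step (imbalanced-learn's SVMSMOTE) is only noted in a comment to keep the example self-contained:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 50))  # stand-in for the 1,034 daily features

# Keep enough principal components to explain 95% of the variance,
# reducing dimensionality while preserving the dominant patterns.
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X)
explained = pca.explained_variance_ratio_.sum()

# In the thesis pipeline, X_reduced would then be balanced with
# SVM-SMOTE (imbalanced-learn's SVMSMOTE) before model training.
```

Passing a float in (0, 1) as `n_components` lets scikit-learn choose the smallest number of components whose cumulative explained variance reaches that fraction.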
The benchmark classifiers are implemented in module 3 of the proposed model. After balancing, the dataset is split into an 80:20 training-to-testing ratio and fed to the four base classifiers. These base classifiers are trained on the 80% training set, and predictions are obtained from each classifier.
The final classification is performed in module 4, which combines the input of the four ML models with a meta-level DL model. The predictions of the level-0 classifiers are fed to the level-1 model so that it captures the ECP information from all the base classifiers. The final prediction obtained from the level-1 model shows enhanced classification performance. The results show that the proposed model outperforms the other benchmark ML models, achieving a high accuracy of 97.6% together with very low FPR and FNR values of 0.7% and 2.02%, respectively, which had not been achieved before by state-of-the-art techniques on the original SGCC dataset.
These results make the proposed model suitable for industrial applications for theft detection and NTL reduction.
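The stacked generalization of modules 3 and 4 can be sketched with scikit-learn's StackingClassifier. The base models and the logistic-regression meta-learner here are illustrative stand-ins (the thesis uses a deep-learning level-1 model), and the data is synthetic:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import StackingClassifier, RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, test_size=0.2, random_state=0)

# Level-0 (base) classifiers: stand-ins for the four base models.
base = [
    ("dt", DecisionTreeClassifier(random_state=0)),
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("knn", KNeighborsClassifier()),
    ("lr", LogisticRegression(max_iter=1000)),
]

# Level-1 meta-learner trained on the base predictions; a logistic
# regression stands in for the thesis's DL meta-model to keep the
# sketch self-contained.
stack = StackingClassifier(estimators=base,
                           final_estimator=LogisticRegression(max_iter=1000),
                           cv=5)
stack.fit(Xtr, ytr)
score = stack.score(Xte, yte)
```

The `cv=5` argument makes the level-1 model train on out-of-fold base predictions, which avoids leaking the base models' training-set fit into the meta-learner.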
6.2 Conclusion
This study has been verified on a real-world theft-detection dataset from the State Grid Corporation of China (Fujian, China). The dataset includes the daily consumption records of 42,373 users with 1,034 features. We implemented 15 individual ML classifiers, and a comparative analysis is carried out using several data-balancing techniques. The classifiers are verified with 8 types of performance metrics for the potential detection of NTL. The aim of this research is to train different classifiers on the labeled dataset and then identify theft users in unseen data using the trained models. For this purpose, different balancing techniques are compared, including SMOTE, AdaSyn, NRAS, SMOBD, and CCR. The ML methods implemented include LR, BNB, GNB, KNN, Perceptron, PAC, QDA, SGDC, RC, LDA, DT, NCC, MNB, CNB, and a dummy classifier.
For comparison, the balancing techniques are evaluated against the imbalanced dataset, while the classifier results are evaluated against a dummy-classifier baseline.
In our findings, the classifiers performed differently under each class-balancing technique, but overall all balancing techniques showed good classification results in contrast to the imbalanced dataset. Comparing the classifiers with respect to the AUC measure, QDA showed the highest classification performance on CCR, with accuracy, AUC, precision, recall, F1-Score, MCC, FPR, and FNR values of 0.979, 0.98, 0.994, 0.979, 0.986, 0.936, 0.021, and 0.020, respectively. Other classifiers, such as DT and GNB, also show good classification results on CCR. Considering the AUC values of the 15 classifiers across the 5 balancing techniques, BNB, QDA, DT, and GNB on CCR balancing, and KNN, SGDC, RC, LDA, NCC, MNB, CNB, Perceptron, PAC, and LR on NRAS balancing, give comparatively outstanding results, with AUC values of 0.72, 0.98, 0.95, 0.95, 0.93, 0.68, 0.70, 0.82, 0.60, 0.65, 0.65, 0.75, 0.78, and 0.76, respectively. Finally, the best classifier identified in this study for electricity theft detection is QDA with the CCR technique.
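The reported metrics follow directly from the confusion matrix; a small sketch with hypothetical labels shows how FPR, FNR, and MCC are obtained:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, matthews_corrcoef

# Hypothetical true labels and predictions for illustration only.
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 0, 1, 0])
y_pred = np.array([0, 0, 1, 0, 1, 1, 0, 0, 1, 0])

# For binary labels, ravel() yields (tn, fp, fn, tp).
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
fpr = fp / (fp + tn)  # false positive rate: here 1/6
fnr = fn / (fn + tp)  # false negative rate: here 1/4
mcc = matthews_corrcoef(y_true, y_pred)
```

MCC is particularly informative on imbalanced data such as theft detection, because it only rewards a classifier that does well on both the theft and normal classes.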
6.3 Future Work
The proposed system model works in offline mode. In the future, the aim will be to transform the model into an online mode with hyper-parameter tuning in order to decrease the system execution time.
We would also like to investigate different classifiers with feature-extraction steps using recently developed Generative Adversarial Networks (GANs), together with ensemble machine learning methods and deep learning algorithms. We plan to evaluate these techniques on AUC and MCC scores in order to improve the classification of theft and normal users.
Bibliography
[1] “Electrical grid distributions student energy. [Online]. Available:
https://studentenergy.org/distribution/electrical-grid/
[2] J. R. Aguero, E. Takayesu, D. Novosel, and R. Masiello, “Modernizing the
grid: Challenges and opportunities for a sustainable future,” IEEE Power
and Energy Magazine, vol. 15, no. 3, pp. 74–83, 2017.
[3] B. Wormuth, S. Wang, P. Dehghanian, M. Barati, A. Estebsari, T. P. Filom-
ena, M. H. Kapourchali, and M. A. Lejeune, “Electric power grids under
high-absenteeism pandemics: History, context, response, and opportuni-
ties,” IEEE Access, vol. 8, pp. 215 727–215 747, 2020.
[4] G. Dileep, “A survey on smart grid technologies and applications,”
Renewable Energy, vol. 146, pp. 2589–2625, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0960148119312790
[5] C. M. Flath and N. Stein, “Towards a data science toolbox for industrial
analytics applications,” Computers in Industry, vol. 94, pp. 16–25, 2018.
[6] “Applications of data science | real-world appli-
cations. [Online]. Available: https://intellipaat.com/blog/
applications-of-data-science-real-world-applications/
[7] “What is machine learning and types of machine learning [up-
dated]. [Online]. Available: https://www.simplilearn.com/tutorials/
machine-learning-tutorial/what-is-machine-learning
[8] “Ensemble methods bagging, boosting, and stack-
ing | by ankit chauhan | analytics vidhya |
medium. [Online]. Available: https://medium.com/analytics-vidhya/
ensemble-methods-bagging-boosting-and-stacking-28d006708731
116 By: Arshid Ali
MS Thesis Electricity Theft Detection
[9] E. Hossain, I. Khan, F. Un-Noor, S. S. Sikander, and M. S. H. Sunny,
“Application of big data and machine learning in smart grid, and associated
security concerns: A review,” Ieee Access, vol. 7, pp. 13 960–13 988, 2019.
[10] “U.s. energy information administration (eia). [Online]. Available:
https://www.eia.gov/tools/faqs/faq.php?id=427&t=3
[11] A. Ullah, N. Javaid, M. Asif, M. U. Javed, and A. S. Yahaya, “Alexnet,
adaboost and artificial bee colony based hybrid model for electricity theft
detection in smart grids,” IEEE Access, vol. 10, pp. 18 681–18 694, 2022.
[12] P. Massaferro, J. M. D. Martino, and A. Fernández, “Fraud detection
on power grids while transitioning to smart meters by leveraging multi-
resolution consumption data,” IEEE Transactions on Smart Grid, vol. 13,
no. 3, pp. 2381–2389, 2022.
[13] A. L. Shah, W. Mesbah, and A. T. Al-Awami, “An algorithm for accurate
detection and correction of technical and nontechnical losses using smart me-
tering,” IEEE Transactions on Instrumentation and Measurement, vol. 69,
no. 11, pp. 8809–8820, 2020.
[14] L. J. Lepolesa, S. Achari, and L. Cheng, “Electricity theft detection in smart
grids based on deep neural network,” IEEE Access, vol. 10, pp. 39 638–
39 655, 2022.
[15] N. Javaid, “A plstm, alexnet and esnn based ensemble learning model for
detecting electricity theft in smart grids,” IEEE Access, vol. 9, pp. 162 935–
162 950, 2021.
[16] M. U. Saleem, M. R. Usman, M. A. Usman, and C. Politis, “Design, deploy-
ment and performance evaluation of an iot based smart energy management
system for demand side management in smart grid,” IEEE Access, vol. 10,
pp. 15 261–15 278, 2022.
[17] M. M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and A. Gómez-
Expósito, “Detection of non-technical losses using smart meter data and
supervised learning,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp.
2661–2670, 2019.
117 By: Arshid Ali
MS Thesis Electricity Theft Detection
[18] S. Mujeeb, N. Javaid, A. Ahmed, S. M. Gulfam, U. Qasim, M. Shafiq, and J.-
G. Choi, “Electricity theft detection with automatic labeling and enhanced
rusboost classification using differential evolution and jaya algorithm,” IEEE
Access, vol. 9, pp. 128 521–128 539, 2021.
[19] Z. Yan and H. Wen, “Electricity theft detection base on extreme gradient
boosting in ami,” in 2020 IEEE International Instrumentation and Mea-
surement Technology Conference (I2MTC), 2020, pp. 1–6.
[20] A. Arif, T. A. Alghamdi, Z. A. Khan, and N. Javaid, “Towards efficient
energy utilization using big data analytics in smart cities for electricity theft
detection,” Big Data Research, vol. 27, p. 100285, 2022. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S2214579621001027
[21] Z. Yan and H. Wen, “Performance analysis of electricity theft detection for
the smart grid: An overview,” IEEE Transactions on Instrumentation and
Measurement, vol. 71, pp. 1–28, 2022.
[22] A. Pasdar and S. Mirzakuchaki, “A solution to remote detecting of illegal
electricity usage based on smart metering,” in 2007 2nd International Work-
shop on Soft Computing Applications, 2007, pp. 163–167.
[23] S. S. Ali, M. Maroof, and S. Hanif, “Smart energy meters for energy conser-
vation minimizing errors,” in 2010 Joint International Conference on Power
Electronics, Drives and Energy Systems 2010 Power India, 2010, pp. 1–7.
[24] D. Zheng and S. Wang, “Research on measuring equipment of single-phase
electricity-stealing with long-distance monitoring function,” in 2009 Asia-
Pacific Power and Energy Engineering Conference, 2009, pp. 1–4.
[25] J. Astronomo, M. D. Dayrit, C. Edjic, and E. R. T. Regidor, “Develop-
ment of electricity theft detector with gsm module and alarm system,” in
2020 IEEE 12th International Conference on Humanoid, Nanotechnology,
Information Technology, Communication and Control, Environment, and
Management (HNICEM), 2020, pp. 1–5.
[26] A. Coa, “Smart prepaid energy metering system to detect energy theft with
facility for real time monitoring,” International Journal of Electrical and
Computer Engineering (IJECE), vol. 9, pp. 4184–4191, 2019.
118 By: Arshid Ali
MS Thesis Electricity Theft Detection
[27] T. Shankar, S. I. G, S. M. S, and S. R. Gondkar, “Wireless power theft
monitoring and controlling unit for substation,” Article in IOSR Journal
of Electronics and Communication Engineering, vol. 9, pp. 10–14, 2014.
[Online]. Available: www.iosrjournals.org
[28] S. H. Mir, S. Ashruf, Y. Bhat, N. Beigh et al., “Review on smart electric
metering system based on gsm/iot,” Asian Journal of Electrical Sciences,
vol. 8, no. 1, pp. 1–6, 2019.
[29] I. N. Fovino, A. Carcano, T. De Lacheze Murel, A. Trombetta, and
M. Masera, “Modbus/dnp3 state-based intrusion detection system,” in 2010
24th IEEE International Conference on Advanced Information Networking
and Applications, 2010, pp. 729–736.
[30] C. Bandim, J. Alves, A. Pinto, F. Souza, M. Loureiro, C. Magalhaes, and
F. Galvez-Durand, “Identification of energy theft and tampered meters us-
ing a central observer meter: a mathematical approach,” in 2003 IEEE
PES Transmission and Distribution Conference and Exposition (IEEE Cat.
No.03CH37495), vol. 1, 2003, pp. 163–168 Vol.1.
[31] S. McLaughlin, B. Holbert, A. Fawaz, R. Berthier, and S. Zonouz, “A multi-
sensor energy theft detection framework for advanced metering infrastruc-
tures,” IEEE Journal on Selected Areas in Communications, vol. 31, no. 7,
pp. 1319–1330, 2013.
[32] X. Xia, Y. Xiao, W. Liang, and M. Zheng, “Gthi: A heuristic algorithm
to detect malicious users in smart grids,” IEEE Transactions on Network
Science and Engineering, vol. 7, no. 2, pp. 805–816, 2020.
[33] A. A. Cárdenas, S. Amin, G. Schwartz, R. Dong, and S. Sastry, “A game
theory model for electricity theft detection and privacy-aware control in
ami systems,” in 2012 50th Annual Allerton Conference on Communication,
Control, and Computing (Allerton), 2012, pp. 1830–1837.
[34] Y. Gao, B. Foggo, and N. Yu, “A physically inspired data-driven model for
electricity theft detection with smart meter data,” IEEE Transactions on
Industrial Informatics, vol. 15, no. 9, pp. 5076–5088, 2019.
119 By: Arshid Ali
MS Thesis Electricity Theft Detection
[35] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra, “Decision
tree and svm-based data analytics for theft detection in smart grid,” IEEE
Transactions on Industrial Informatics, vol. 12, no. 3, pp. 1005–1016, 2016.
[36] Z. Yan and H. Wen, “Electricity theft detection base on extreme gradient
boosting in ami,” IEEE Transactions on Instrumentation and Measurement,
vol. 70, pp. 1–9, 2021.
[37] R. Punmiya and S. Choe, “Energy theft detection using gradient boosting
theft detector with feature engineering-based preprocessing,” IEEE Trans-
actions on Smart Grid, vol. 10, no. 2, pp. 2326–2329, 2019.
[38] F. Unal, A. Almalaq, S. Ekici, and P. Glauner, “Big data-driven detection
of false data injection attacks in smart meters,” IEEE Access, vol. 9, pp.
144 313–144 326, 10 2021.
[39] M. Panthi, “Anomaly detection in smart grids using machine learning tech-
niques,” in 2020 First International Conference on Power, Control and Com-
puting Technologies (ICPC2T), 2020, pp. 220–222.
[40] P. Chandel and T. Thakur, “Smart Meter Data Analysis for Electricity
Theft Detection using Neural Networks,” Advances in Science, Technology
and Engineering Systems Journal, vol. 4, no. 4, pp. 161–168, 2019.
[41] P. Jokar, N. Arianpoo, and V. C. M. Leung, “Electricity theft detection in
ami using customers consumption patterns,” IEEE Transactions on Smart
Grid, vol. 7, no. 1, pp. 216–226, 2016.
[42] N. Ayub, K. Aurangzeb, M. Awais, and U. Ali, “Electricity theft detec-
tion using cnn-gru and manta ray foraging optimization algorithm,” in 2020
IEEE 23rd International Multitopic Conference (INMIC), 2020, pp. 1–6.
[43] K. M. Ghori, R. A. Abbasi, M. Awais, M. Imran, A. Ullah, and L. Szathmary,
“Performance analysis of different types of machine learning classifiers for
non-technical loss detection,” IEEE Access, vol. 8, pp. 16 033–16 048, 2020.
[44] Pamir, N. Javaid, A. Almogren, M. Adil, M. U. Javed, and M. Zuair, “Rfe
based feature selection and knnor based data balancing for electricity theft
detection using bilstm-logitboost stacking ensemble model,” IEEE Access,
vol. 10, pp. 112 948–112 963, 2022.
120 By: Arshid Ali
MS Thesis Electricity Theft Detection
[45] S. Hussain, M. W. Mustafa, K. H. A. Al-Shqeerat, F. Saeed, and B. A. S. Al-
rimy, “A novel feature-engineered-ngboost machine-learning framework for
fraud detection in electric power consumption data,” Sensors, vol. 21, no. 24,
2021. [Online]. Available: https://www.mdpi.com/1424-8220/21/24/8423
[46] S. Hussain, M. W. Mustafa, T. A. Jumani, S. K. Baloch, H. Alotaibi,
I. Khan, and A. Khan, “A novel feature engineered-catboost-based super-
vised machine learning framework for electricity theft detection,” Energy
Reports, vol. 7, pp. 4425–4436, 2021.
[47] L. Duarte Soares, A. de Souza Queiroz, G. P. López, E. M. Carreño-Franco,
J. M. López-Lezama, and N. Muñoz-Galeano, “Bigru-cnn neural network
applied to electric energy theft detection,” Electronics, vol. 11, no. 5, p. 693,
2022.
[48] Z. Qu, H. Li, Y. Wang, J. Zhang, A. Abu-Siada, and Y. Yao, “Detection of
electricity theft behavior based on improved synthetic minority oversampling
technique and random forest classifier,” Energies, vol. 13, no. 8, 2020.
[Online]. Available: https://www.mdpi.com/1996-1073/13/8/2039
[49] S. K. Gunturi and D. Sarkar, “Ensemble machine learning models for the
detection of energy theft,” Electric Power Systems Research, vol. 192, p.
106904, 2021.
[50] R. Xia, Y. Gao, Y. Zhu, D. Gu, and J. Wang, “An attention-based wide and
deep cnn with dilated convolutions for detecting electricity theft considering
imbalanced data,” Electric Power Systems Research, vol. 214, p. 108886,
2023.
[51] L. Cui, L. Guo, L. Gao, B. Cai, Y. Qu, Y. Zhou, and S. Yu, “A covert
electricity-theft cyberattack against machine learning-based detection mod-
els,” IEEE Transactions on Industrial Informatics, vol. 18, no. 11, pp. 7824–
7833, 2022.
[52] A. A. Almazroi and N. Ayub, “A novel method cnn-lstm ensembler based
on black widow and blue monkey optimizer for electricity theft detection,”
IEEE Access, vol. 9, pp. 141 154–141 166, 2021.
121 By: Arshid Ali
MS Thesis Electricity Theft Detection
[53] D. Gu, Y. Gao, K. Chen, J. Shi, Y. Li, and Y. Cao, “Electricity theft
detection in ami with low false positive rate based on deep learning and evo-
lutionary algorithm,” IEEE Transactions on Power Systems, vol. 37, no. 6,
pp. 4568–4578, 2022.
[54] M. N. Hasan, R. N. Toma, A.-A. Nahid, M. M. M. Islam, and J.-M.
Kim, “Electricity theft detection in smart grid systems: A cnn-lstm
based approach,” Energies, vol. 12, no. 17, 2019. [Online]. Available:
https://www.mdpi.com/1996-1073/12/17/3310
[55] R. Qi, J. Zheng, Z. Luo, and Q. Li, “A novel unsupervised data-driven
method for electricity theft detection in ami using observer meters,” IEEE
Transactions on Instrumentation and Measurement, vol. 71, pp. 1–10, 2022.
[56] R. Xia, Y. Gao, Y. Zhu, D. Gu, and J. Wang, “An efficient method
combined data-driven for detecting electricity theft with stacking structure
based on grey relation analysis,” Energies, vol. 15, no. 19, 2022. [Online].
Available: https://www.mdpi.com/1996-1073/15/19/7423
[57] A. Takiddin, M. Ismail, U. Zafar, and E. Serpedin, “Deep autoencoder-
based anomaly detection of electricity theft cyberattacks in smart grids,”
IEEE Systems Journal, 2022.
[58] H.-X. Gao, S. Kuenzel, and X.-Y. Zhang, “A hybrid convlstm-based anomaly
detection approach for combating energy theft,” IEEE Transactions on In-
strumentation and Measurement, vol. 71, pp. 1–10, 2022.
[59] S. A. Badawi, D. Guessoum, I. Elbadawi, and A. Albadawi, “A novel
time-series transformation and machine-learning-based method for ntl fraud
detection in utility companies,” Mathematics, vol. 10, no. 11, 2022. [Online].
Available: https://www.mdpi.com/2227-7390/10/11/1878
[60] Y. Huang and Q. Xu, “Electricity theft detection based on stacked sparse
denoising autoencoder,” International Journal of Electrical Power & Energy
Systems, vol. 125, p. 106448, 2021.
[61] K. Fei, Q. Li, and C. Zhu, “Non-technical losses detection using missing val-
ues pattern and neural architecture search,” International Journal of Elec-
trical Power & Energy Systems, vol. 134, p. 107410, 2022.
122 By: Arshid Ali
MS Thesis Electricity Theft Detection
[62] J. Pereira and F. Saraiva, “Convolutional neural network applied
to detect electricity theft: A comparative study on unbalanced
data handling techniques,” International Journal of Electrical Power
Energy Systems, vol. 131, p. 107085, 2021. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0142061521003240
[63] N. Ibrahim, S. Al-Janabi, and B. Al-Khateeb, “Electricity-theft detection in
smart grid based on deep learning,” Bulletin of Electrical Engineering and
Informatics, vol. 10, no. 4, pp. 2285–2292, 2021.
[64] M. M. Buzau, J. Tejedor-Aguilera, P. Cruz-Romero, and A. Gómez-
Expósito, “Detection of non-technical losses using smart meter data and
supervised learning,” IEEE Transactions on Smart Grid, vol. 10, no. 3, pp.
2661–2670, 2018.
[65] Z. Yan and H. Wen, “Electricity theft detection base on extreme gradient
boosting in ami,” IEEE Transactions on Instrumentation and Measurement,
vol. 70, pp. 1–9, 2021.
[66] G. Lin, X. Feng, W. Guo, X. Cui, S. Liu, W. Jin, Z. Lin, and Y. Ding, “Elec-
tricity theft detection based on stacked autoencoder and the undersampling
and resampling based random forest algorithm,” IEEE Access, vol. 9, pp.
124 044–124 058, 2021.
[67] R. Punmiya and S. Choe, “Energy theft detection using gradient boosting
theft detector with feature engineering-based preprocessing,” IEEE Trans-
actions on Smart Grid, vol. 10, no. 2, pp. 2326–2329, 2019.
[68] Z. Zheng, Y. Yang, X. Niu, H.-N. Dai, and Y. Zhou, “Wide and deep convo-
lutional neural networks for electricity-theft detection to secure smart grids,”
IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1606–1615,
2017.
[69] P. Jokar, N. Arianpoo, and V. C. Leung, “Electricity theft detection in ami
using customers consumption patterns,” IEEE Transactions on Smart Grid,
vol. 7, no. 1, pp. 216–226, 2015.
[70] N. F. Avila, G. Figueroa, and C.-C. Chu, “Ntl detection in electric distribu-
tion systems using the maximal overlap discrete wavelet-packet transform
123 By: Arshid Ali
MS Thesis Electricity Theft Detection
and random undersampling boosting,” IEEE Transactions on Power Sys-
tems, vol. 33, no. 6, pp. 7171–7180, 2018.
[71] A. Jindal, A. Dua, K. Kaur, M. Singh, N. Kumar, and S. Mishra, “Decision
tree and svm-based data analytics for theft detection in smart grid,” IEEE
Transactions on Industrial Informatics, vol. 12, no. 3, pp. 1005–1016, 2016.
[72] M. Nabil, M. Ismail, M. Mahmoud, M. Shahin, K. Qaraqe, and E. Serpedin,
“Deep recurrent electricity theft detection in ami networks with random tun-
ing of hyper-parameters,” in 2018 24th International Conference on Pattern
Recognition (ICPR), 2018, pp. 740–745.
[73] Y. He, G. J. Mendis, and J. Wei, “Real-time detection of false data injection
attacks in smart grid: A deep learning-based intelligent mechanism,” IEEE
Transactions on Smart Grid, vol. 8, no. 5, pp. 2505–2516, 2017.
[74] J. Lee, Y. G. Sun, I. Sim, S. H. Kim, D. I. Kim, and J. Y. Kim, “Non-
technical loss detection using deep reinforcement learning for feature cost
efficiency and imbalanced dataset,” IEEE Access, vol. 10, pp. 27 084–27 095,
2022.
[75] A. Y. Kharal, H. A. Khalid, A. Gastli, and J. M. Guerrero, “A novel
features-based multivariate gaussian distribution method for the fraudulent
consumers detection in the power utilities of developing countries,” IEEE
Access, vol. 9, pp. 81 057–81 067, 2021.
[76] F. Shehzad, N. Javaid, S. Aslam, and M. U. Javaid, “Electricity theft detec-
tion using big data and genetic algorithm in electric power systems,” Electric
Power Systems Research, vol. 209, p. 107975, 2022.
[77] “Dealing with outliers using the z-score method - analytics vid-
hya. [Online]. Available: https://www.analyticsvidhya.com/blog/2022/08/
dealing-with-outliers-using-the-z-score-method/
[78] F. Shehzad, N. Javaid, A. Almogren, A. Ahmed, S. M. Gulfam, and A. Rad-
wan, “A robust hybrid deep learning model for detection of non-technical
losses to secure smart grids,” IEEE Access, vol. 9, pp. 128 663–128 678, 2021.
124 By: Arshid Ali
MS Thesis Electricity Theft Detection
[79] P. R. Kanna, K. Sindhanaiselvan, and M. Vijaymeena, “A defensive mecha-
nism based on pca to defend denial of-service attack,” International Journal
of Security and Its Applications, vol. 11, no. 1, pp. 71–82, 2017.
[80] T. Peng, H. Shen, Y. Zhang, P. Ren, J. Zhao, and Y. Jia, “Status forecast
and fault classification of smart meters using lightgbm algorithm improved
by random forest,” Wireless Communications & Mobile Computing (Online),
vol. 2002, 2022.
[81] G. Lin, X. Feng, W. Guo, X. Cui, S. Liu, W. Jin, Z. Lin, and Y. Ding, “Elec-
tricity theft detection based on stacked autoencoder and the undersampling
and resampling based random forest algorithm,” IEEE Access, vol. 9, pp.
124 044–124 058, 2021.
[82] T. Daniya, M. Geetha, and K. S. Kumar, “Classification and regression trees
with gini index,” Advances in Mathematics Scientific Journal, vol. 9, no. 10,
pp. 1857–8438, 2020.
[83] S. Li, Y. Han, X. Yao, S. Yingchen, J. Wang, and Q. Zhao, “Electricity theft
detection in power grids with deep learning and random forests,” Journal of
Electrical and Computer Engineering, vol. 2019, 2019.
[84] B. Patnaik, M. Mishra, R. C. Bansal, and R. K. Jena, “Modwt-xgboost
based smart energy solution for fault detection and classification in a smart
microgrid,” Applied Energy, vol. 285, p. 116457, 2021.
[85] S. Dey, Y. Kumar, S. Saha, and S. Basak, “Forecasting to classification: Pre-
dicting the direction of stock market price using xtreme gradient boosting,”
PESIT South Campus, 2016.
[86] M. R. C. Acosta, S. Ahmed, C. E. Garcia, and I. Koo, “Extremely random-
ized trees-based scheme for stealthy cyber-attack detection in smart grid
networks,” IEEE access, vol. 8, pp. 19 921–19 933, 2020.
[87] B. Sumalatha, M. Seetha, and G. NARAYANAMMA, “An efficient ap-
proach for robust image classification based on extremely randomized de-
cision trees,” International Journal of Computer Science and Information
Technologies, vol. 2, no. 2, pp. 677–685, 2011.
125 By: Arshid Ali
MS Thesis Electricity Theft Detection
[88] Z. Ouyang, X. Sun, J. Chen, D. Yue, and T. Zhang, “Multi-view stack-
ing ensemble for power consumption anomaly detection in the context of
industrial internet of things,” IEEE Access, vol. 6, pp. 9623–9631, 2018.
[89] M. R. Mosavi, M. Khishe, M. J. Naseri, G. R. Parvizi, and M. Ayat, “Multi-
layer perceptron neural network utilizing adaptive best-mass gravitational
search algorithm to classify sonar dataset,” Archives of Acoustics, vol. 44,
2019.
[90] Pamir, N. Javaid, U. Qasim, A. S. Yahaya, E. H. Alkhammash, and M. Had-
jouni, “Non-technical losses detection using autoencoder and bidirectional
gated recurrent unit to secure smart grids,” IEEE Access, vol. 10, pp. 56863–
56 875, 2022.
[91] S. Nallathambi and K. Ramasamy, “Prediction of electricity consumption
based on dt and rf: An application on usa country power consumption,”
in 2017 IEEE International Conference on Electrical, Instrumentation and
Communication Engineering (ICEICE), 2017, pp. 1–7.
[92] R. Yao, N. Wang, W. Ke, P. Chen, and X. Sheng, “Electricity theft detection
in unbalanced sample distribution: a novel approach including a mechanism
of sample augmentation,” Applied Intelligence, pp. 1–20, 9 2022. [Online].
Available: https://link.springer.com/article/10.1007/s10489-022-04069-z
[93] G. P. Siknun and I. S. Sitanggang, “Web-based classification application for
forest fire data using the shiny framework and the c5.0 algorithm,” Procedia
Environmental Sciences, vol. 33, pp. 332–339, 2016.
[94] J. Brzezinski and G. Knafl, “Logistic regression modeling for context-based
classification,” in Proceedings. Tenth International Workshop on Database
and Expert Systems Applications. DEXA 99, 1999, pp. 755–759.
[95] S. S. Noureen, S. B. Bayne, E. Shaffer, D. Porschet, and M. Berman,
“Anomaly detection in cyber-physical system using logistic regression anal-
ysis,” in 2019 IEEE Texas Power and Energy Conference (TPEC), 2019,
pp. 1–6.
126 By: Arshid Ali
MS Thesis Electricity Theft Detection
[96] X. Shan, Y. Ren, J. Lin, M. Zhai, J. Li, and B. Wang, “Power system fault
diagnosis based on logistic regression deep neural network,” in 2021 IEEE
4th International Electrical and Energy Conference (CIEEC), 2021, pp. 1–6.
[97] V. Kumar, “Evaluation of computationally intelligent techniques for breast
cancer diagnosis,” Neural Computing and Applications, vol. 33, pp. 3195–
3208, 4 2021.
[98] J. Gou, H. Ma, W. Ou, S. Zeng, Y. Rao, and H. Yang, “A
generalized mean distance-based k-nearest neighbor classifier,” Expert
Systems with Applications, vol. 115, pp. 356–372, 2019. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0957417418305293
[99] S. Aziz, S. Z. Hassan Naqvi, M. U. Khan, and T. Aslam, “Electricity theft
detection using empirical mode decomposition and k-nearest neighbors,” in
2020 International Conference on Emerging Trends in Smart Technologies
(ICETST), 2020, pp. 1–5.
[100] H. L. Qimin Cao, Lei La and S. Han, “Mixed weighted knn for imbalanced
datasets,” International Journal of Performability Engineering, vol. 14,
no. 7, p. 1391, 2018. [Online]. Available: http://www.ijpe-online.com/EN/
abstract/article_3613.shtml
[101] M. Singh, M. Wasim Bhatt, H. S. Bedi, and U. Mishra, “Performance of
bernoullis naive bayes classifier in the detection of fake news,” Materials
Today: Proceedings, 2020. [Online]. Available: https://www.sciencedirect.
com/science/article/pii/S2214785320385333
[102] Y. Guo and L. Lu, “Research on recognition and classification of user stealing
detection based on weighted naive bayes,” in 2021 International Conference
on Control Science and Electric Power Systems (CSEPS), 2021, pp. 75–78.
[103] M. F. A. Saputra, T. Widiyaningtyas, and A. P. Wibawa, “Illiteracy clas-
sification using k means-naïve bayes algorithm,” International Journal on
Informatics Visualization, vol. 2, pp. 153–158, 2018.
[104] J. Singh and R. Banerjee, “A study on single and multi-layer perceptron neu-
ral network,” in 2019 3rd International Conference on Computing Method-
ologies and Communication (ICCMC), 2019, pp. 35–40.
127 By: Arshid Ali
MS Thesis Electricity Theft Detection
[105] C. A. Mello, R. Lewis, A. Brooks-Kayal, J. Carlsen, H. Graben-
statter, and A. M. White, “(14) (pdf) supervised learning for the
neurosurgery intensive care unit using single-layer perceptron classifiers.
[Online]. Available: https://www.researchgate.net/publication/281828950_
Supervised_Learning_for_the_Neurosurgery_Intensive_Care_Unit_
Using_Single-Layer_Perceptron_Classifiers
[106] M. R. Wasef and N. Rafla, “Hls implementation of linear discriminant anal-
ysis classifier,” in 2020 IEEE International Symposium on Circuits and Sys-
tems (ISCAS), 2020, pp. 1–4.
[107] H. Sifaou, A. Kammoun, and M.-S. Alouini, “High-dimensional linear
discriminant analysis classifier for spiked covariance model,” Journal of
Machine Learning Research, vol. 21, pp. 1–24, 2020. [Online]. Available:
http://jmlr.org/papers/v21/19-428.html
[108] “Linear discriminant analysis, explained | by Yang Xiaozhou,” Towards
Data Science. [Online]. Available: https://towardsdatascience.com/
linear-discriminant-analysis-explained-f88be6c1e00b
[109] C.-C. Chang, Y.-J. Lee, and H.-K. Pao, “A passive-aggressive algorithm for
semi-supervised learning,” in 2010 International Conference on Technologies
and Applications of Artificial Intelligence, 2010, pp. 335–341.
[110] K. Crammer, O. Dekel, J. Keshet, S. Shalev-Shwartz, and Y. Singer,
“Online passive-aggressive algorithms,” Journal of Machine Learning Research,
vol. 7, pp. 551–585, 2006.
[111] S. Mandt, M. D. Hoffman, and D. M. Blei, “Stochastic gradient descent
as approximate bayesian inference,” Journal of Machine Learning Research,
vol. 18, 2017. [Online]. Available: https://arxiv.org/abs/1704.04289v2
[112] L. Guo, M. Li, S. Xu, and F. Yang, “Application of stochastic gradient
descent technique for method of moments,” in 2020 IEEE International
Conference on Computational Electromagnetics (ICCEM), 2020, pp. 97–98.
[113] A. Sharma, “Guided stochastic gradient descent algorithm for inconsistent
datasets,” Applied Soft Computing, vol. 73, pp. 1068–1080, 2018.
[Online]. Available: https://www.sciencedirect.com/science/article/pii/
S156849461830557X
[114] A. H. Jahromi and M. Taheri, “A non-parametric mixture of gaussian naive
bayes classifiers based on local independent features,” in 2017 Artificial In-
telligence and Signal Processing Conference (AISP), 2017, pp. 209–212.
[115] E. K. Ampomah, G. Nyame, Z. Qin, P. C. Addo, E. O. Gyamfi, and M. Gyan,
“Stock market prediction with gaussian naïve bayes machine learning algo-
rithm,” Informatica (Slovenia), vol. 45, pp. 243–256, 6 2021.
[116] D. T. Barus, R. Elfarizy, F. Masri, and P. H. Gunawan, “Parallel pro-
gramming of churn prediction using gaussian naïve bayes,” in 2020 8th
International Conference on Information and Communication Technology
(ICoICT), 2020, pp. 1–4.
[117] V. K. V and P. Samuel, “A multinomial naïve bayes classifier for identifying
actors and use cases from software requirement specification documents,”
in 2022 2nd International Conference on Intelligent Technologies (CONIT),
2022, pp. 1–5.
[118] M. K. Saad, “The impact of text preprocessing and term weighting on arabic
text classification,” 2010.
[119] D. Arpit, S. Wu, P. Natarajan, R. Prasad, and P. Natarajan, “Ridge re-
gression based classifiers for large scale class imbalanced datasets,” in 2013
IEEE Workshop on Applications of Computer Vision (WACV), 2013, pp.
267–274.
[120] “Ridge classification concepts python examples,” Data
Analytics. [Online]. Available: https://vitalflux.com/
ridge-classification-concepts-python-examples/
[121] S. Johri, S. Debnath, A. Mocherla, A. Singh, A. Prakash, J. Kim, and
I. Kerenidis, “Nearest centroid classification on a trapped ion quantum
computer,” npj Quantum Information, vol. 7, pp. 1–11, 2021.
[Online]. Available: https://www.nature.com/articles/s41534-021-00456-5
[122] E. N. Tamatjita and A. W. Mahastama, “Comparison of music genre clas-
sification using nearest centroid classifier and k-nearest neighbours,” in
2016 International Conference on Information Management and Technol-
ogy (ICIMTech), 2016, pp. 118–123.
[123] “Classification.” [Online]. Available: https://idc9.github.io/stor390/notes/
classification/classification.html
[124] D. Menaka, L. P. Suresh, and S. S. P. Kumar, “Land cover classification of
multispectral satellite images using qda classifier,” in 2014 International
Conference on Control, Instrumentation, Communication and Computa-
tional Technologies (ICCICCT), 2014, pp. 1383–1386.
[125] “9.2.8 - quadratic discriminant analysis (qda),” STAT 508. [Online]. Available:
https://online.stat.psu.edu/stat508/lesson/9/9.2/9.2.8
[126] “10.2 - discriminant analysis procedure,” STAT 505. [Online]. Available:
https://online.stat.psu.edu/stat505/lesson/10/10.2
[127] B. Seref and E. Bostanci, “Performance of naïve and complement naïve bayes
algorithms based on accuracy, precision and recall performance evaluation
criterions,” Int. J. Comput, vol. 8, pp. 75–92, 2019.
[128] J. D. Rennie, L. Shih, J. Teevan, and D. R. Karger, “Tackling the poor
assumptions of naive bayes text classifiers,” in Proceedings of the 20th inter-
national conference on machine learning (ICML-03), 2003, pp. 616–623.
[129] A. Martino, A. Rizzi, and F. M. F. Mascioli, “Supervised approaches for pro-
tein function prediction by topological data analysis,” in 2018 International
Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8.
[130] G. Figueroa, Y.-S. Chen, N. Avila, and C.-C. Chu, “Improved practices in
machine learning algorithms for ntl detection with imbalanced data,” in 2017
IEEE Power Energy Society General Meeting, 2017, pp. 1–5.
[131] A. Fernández, S. Garcia, F. Herrera, and N. V. Chawla, “Smote for learning
from imbalanced data: progress and challenges, marking the 15-year
anniversary,” Journal of Artificial Intelligence Research, vol. 61, pp. 863–905,
2018.
[132] J. Brandt and E. Lanzén, “A comparative review of smote and adasyn in
imbalanced data classification,” 2021.
[133] M. Koziarski, M. Woźniak, and B. Krawczyk, “Combined cleaning and
resampling algorithm for multi-class imbalanced data with label noise,”
Knowledge-Based Systems, vol. 204, p. 106223, 2020. [Online]. Available:
https://www.sciencedirect.com/science/article/pii/S0950705120304330
[134] W. A. Rivera, “Noise reduction a priori synthetic over-sampling for class
imbalanced data sets,” Information Sciences, vol. 408, pp. 146–161, 2017.
[135] Q. Cao and S. Wang, “Applying over-sampling technique based on data den-
sity and cost-sensitive svm to imbalanced learning,” in 2011 International
Conference on Information Management, Innovation Management and In-
dustrial Engineering, vol. 2, 2011, pp. 543–548.