Improving ML/DL Solutions for Anomaly
Detection in IoT Environments
Nouredine Tamani, Saad El-Jaouhari, Abdul-Qadir Khan, and Bastien Pauchet
Institut Supérieur d’Electronique de Paris (Isep), Issy-les-Moulineaux, France,
{saad.el-jaouhari, nouredine.tamani}@isep.fr,
abdul-qadir.khan@ext.isep.fr, bastien.pauchet@orange.fr
Abstract. As part of the evolution toward an era of Web 3.0, the Internet
of Things (IoT) bridges physical smart devices to the digital world to
enhance services for consumer convenience. However, the rapid increase in
the number of IoT devices has also led to the inheritance of security,
privacy, and trust problems already well known in traditional networks,
making IoT devices even more vulnerable. To detect anomalies and protect
such IoT devices from cyberattacks, different techniques have been proposed
in the literature, ranging from logic-based approaches (knowledge bases and
ontologies) to statistical ones (Machine Learning, ML, and Deep Learning,
DL). In this paper, we focus on the latter approaches (ML/DL) to identify,
reproduce, evaluate, and compare different state-of-the-art machine learning
algorithms for anomaly detection in IoT environments, along with the main
datasets used in such research works. Once suitable ML models and datasets
are identified, we investigate the potential for enhancing them by
incorporating a feature selection algorithm. The aim is to reduce the
dataset’s dimensionality while concurrently improving performance metrics
such as accuracy, precision, recall, and F1-score.
Keywords: Anomaly detection · IoT networks · Security and Privacy ·
Machine/Deep Learning · Model Reproducibility.
1 Introduction
The Internet of Things (IoT) was first mentioned in 1999 by Kevin Ashton,
when he proposed using radio-frequency identification (RFID) chips to track
products in the supply chain [20]. IoT networks consist of many devices
capable of data collection, storage, processing, and communication [11].
IoT has evolved significantly since then, thanks to the proliferation of
connected objects and to the harvesting and processing of large volumes of
data with Big Data and AI techniques. Nevertheless, the infrastructures,
applications, and services associated with IoT devices have introduced
several threats and vulnerabilities, as emerging protocols and workflows
have exponentially increased attack surfaces [20]. Securing IoT devices is
challenging for both academia and industry because of the heterogeneity of
IoT environments. Besides, conventional
security controls are not suitable for all the IoT devices, in particular for the most
constrained ones. Since the distributed IoT networks are outside of the cover-
age of security perimeters, the existing solutions relying on the cloud also suffer
from centralization and high delay. Furthermore, IoT device vendors commonly
overlook security requirements in their design processes due to their rush-to-
market proclivity. Moreover, the lack of security standards has contributed to
the complexity of securing IoT devices. These challenges and the nature of IoT
applications call for a monitoring system such as anomaly detection at device,
edge and network levels beyond the organizational boundary [20].
The research community has already developed solid statistical and machine
learning methods to detect anomalies inside IoT data, with real-time analysis,
and prediction of unusual behaviors in IoT environments [13]. In this context,
an anomaly in IoT data can be defined as a data point or a subset of data
points that deviates from the normal patterns. Anomalies can be classified
into 3 types, namely (i) Point anomaly, when a single point/instant deviates
from the rest of the data; (ii) Contextual anomaly, when a point is an
outlier only with respect to a given context and time window; and (iii)
Collective anomaly, when some points are individually normal, but as a
group, they are suspicious.
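The three anomaly types can be illustrated on a toy temperature series. The following is only a minimal sketch: the detector functions, their thresholds, and the synthetic daily cycle are illustrative assumptions and are not taken from the surveyed works.

```python
import numpy as np

def point_anomalies(x, z_threshold=3.0):
    """Indices whose global z-score exceeds the threshold (point anomaly)."""
    z = np.abs(x - x.mean()) / x.std()
    return np.flatnonzero(z > z_threshold)

def contextual_anomalies(x, half_window=2, tolerance=5.0):
    """Indices that deviate from the median of their neighbours
    (contextual anomaly: unusual only with respect to its time window)."""
    flagged = []
    for i in range(len(x)):
        lo, hi = max(0, i - half_window), min(len(x), i + half_window + 1)
        if abs(x[i] - np.median(x[lo:hi])) > tolerance:
            flagged.append(i)
    return np.array(flagged, dtype=int)

def collective_anomalies(x, window=6, min_std=1e-6):
    """Start indices of windows with (almost) no variation, e.g. a stuck
    sensor: each value is plausible, the group is not (collective anomaly)."""
    starts = [i for i in range(len(x) - window + 1)
              if x[i:i + window].std() < min_std]
    return np.array(starts, dtype=int)

# Hypothetical hourly temperature over two days (values between 15 and 25).
t = np.arange(48)
baseline = 20.0 + 5.0 * np.sin(2 * np.pi * t / 24)

spike = baseline.copy()
spike[10] = 60.0          # far outside the overall range

warm_night = baseline.copy()
warm_night[18] = 25.0     # a normal daytime value, but abnormal at night

stuck = baseline.copy()
stuck[30:38] = 20.0       # plausible values, but frozen for 8 readings

print(point_anomalies(spike))
print(contextual_anomalies(warm_night))
print(collective_anomalies(stuck))
```

Note that the global z-score test does not flag the warm night reading, since 25 is a perfectly ordinary value for the series as a whole; only the window-based check sees it.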
This paper places particular emphasis on ML/DL approaches for anomaly
detection in IoT data. First, we conducted a comprehensive review of recent
works addressing anomaly detection in IoT environments. This analysis aimed
to identify the main models along with their corresponding training
datasets. We noticed that most of the research efforts in this area have
been dedicated to point anomaly detection. Second, from the identified
approaches and datasets, we selected 5 publicly available datasets and
reproduced the identified models to assess their performance. Then, we
selected the model demonstrating the highest performance and further
enhanced it with a feature selection approach. The objective is to reduce
the dimensionality of the dataset and the training time while increasing
performance. We showed that a Random Forest-based model trained on the
TON-IoT dataset with only the 9 best features has better Accuracy,
Precision, Recall, and F1-score than the model trained on the original
dataset (with 45 features).
The rest of the paper is organized as follows. Section 2 analyzes the ML
approaches developed in the literature for anomaly detection in IoT
environments. Section 3 describes the existing datasets and trained ML/DL
models for anomaly detection in IoT data. Section 4 details our experiments
with the selected datasets to reproduce and improve the ML/DL models for
anomaly detection. Section 5 concludes the paper with some comments and
future work.
2 Approaches for Anomaly Detection in IoT Data
In this section, we focus on recent papers published in the literature
between 2019 and 2023. We split them into 2 distinct types, namely: papers
where a survey was conducted (summarized in Subsection 2.1) and papers where
the authors compared ML/DL models (summarized in Subsection 2.2).
2.1 Surveys in Anomaly Detection in IoT
The authors in [10] focused on DL approaches such as Multilayer Perceptron
(MLP) and Graph Neural Network (GNN) coupled with encoders or Recurrent
Neural Networks (RNN). They also listed the recent available datasets that
are used to train the models to detect attacks on IoT environments. In [3],
the authors surveyed the existing literature on ML/DL methods used to detect
cybersecurity attacks in IoT environments. The survey was conducted using
the PRISMA method, wherein eighty studies from 2016 to 2021 were carefully
selected and evaluated, along with the datasets available for IoT systems.
In [11,27], the authors provided an in-depth overview of the existing works
on anomaly detection solutions using ML/DL techniques for protecting IoT
systems. They focused on the algorithms and datasets available in the
field, but no model comparison was performed. Similarly, in [5], the authors
carried out a broad overview of the ML and DL techniques developed up to
2021, covering the nature of data in IoT systems, the types of anomalies,
the datasets, and the evaluation metrics used to measure the performance of
the trained models. In [13], the authors studied anomaly detection in IoT
environments, where huge amounts of data can be collected, processed, and
analysed to reduce risks, detect and prevent malicious activities, and avoid
involuntary downtime. The study covered the period between 2000 and 2018
and the fields of smart environments, transport, health, smart objects, and
industrial systems. The surveys reviewed above are summarized in Table 1.
Table 1: Summary of surveys of ML/DL methods for anomaly detection in IoT.
Paper | Models | Dataset(s)
[11] | RF (Random Forest), DL, RL, LSTM, CNN (Convolutional Neural Network), GNN, Multiple, AE-ANN, AE-SNN, Ensemble, AE, Subspace, Self-Learning, TCN, AE-LSTM, DBN, DNN | N-BaIoT, CICIDS 2017, AWID, UNSW-NB15, NSL-KDD, Kyoto, KDD CUP 1999
[5] | C_LOF, AutoCloud, TEDA Clustering, BDLM & RBPF, HTM, ANN (Artificial Neural Networks), MDADM, Multi-kernel, xStream, Regression Model, SVM, CEDAS, MuDi-Stream, Extreme Learning Machines, AMAD, LSTM, Auto-encoder, DNN, Evolving spiking NN, ISTL, e-SREBOM | Space imager data stream, KDD29, Cover type, Spam-SMS, Spam-URL, KDDCup99, NAB, UCI, D1, DS2, DS3, Yahoo Webscope, HTTP, SMTP, SMTP+HTTP, COVERTYPE, SHUTTLE, Weather, Web traffic, Avocado, Temperature, and UCSD Pedestrian datasets
[10] | GDN, Gumbel softmax sampling strategy, OmniAnomaly, MLP, LR (Linear Regression), SVM, DT, RF, SS-TCVN, auto-encoder, LSTM, Gelenbe Network, Gaussian, HOT-SAX, GTA, SS-VTCN, CNN | Swat, WaDI, MSL, SMAP, DS2OS, BaIoT
[13] | SVM, PCA, Kernel nonLinear Regression, CNN, RNN, DT, Auto-encoders | No dataset mentioned
[27] | CNN, RED, DNN, Hybrid anomaly detection, clusters, IRESE unsupervised | No dataset mentioned
[3] | Naïve Bayes, BayesNet, DT, RF, SVM, SVR, KNN, FPT, Fuzzy C-Means Algorithm, XGBoost, LR, K-Means clustering, CDL, RNN, CNN, Deep Auto-encoders, DNN, DBN, MLPNN | BoT-IoT, AWID, MQTT regular traffic packets, KDD99, Vx-Heaven, Kaggle and Ransomware, NSL-KDD, ICS cyberattack dataset, IoT Traffic, UNSW-NB15, CICIDS2017, ISCX, UGR16
2.2 Comparing ML/DL models for Anomaly Detection in IoT
In [17], the authors discussed how IoT devices collect data about their
surrounding environments and performed a comparative study of various ML/DL
approaches for attack and anomaly detection. They concluded that Random
Forest gives the best performance in terms of accuracy and precision. In
[9], the authors trained and compared 10 models on the TON-IoT dataset:
Bidirectional Gated Recurrent Unit Recurrent Neural Network (B-GRU-RNN),
Bidirectional Long Short-Term Memory Recurrent Neural Network (B-LSTM-RNN),
Random Forest (RF), Gradient Boosted Trees (GBT), K Nearest Neighbours
(KNN), Deep Neural Network (DNN), eXtreme Gradient Boosting (XGB), MLP,
Support Vector Machine (SVM), and Naive Bayes (NB). They compared the
efficiency of each model using 4 metrics (Accuracy, Precision, Recall,
F1-score), and they identified B-GRU-RNN as the best model. In [24], the
authors trained a Two-tier Classification model combined with a Two-layer
Dimension Reduction (TDTC), and compared it with Two-tier classification
[23], NB, RF, SVM, and Decision Tree (DT) on the NSL-KDD dataset [31]. They
compared the efficiency of each model using the detection rate and showed
that TDTC remains the best model. The model is capable not only of
detecting attacks but also of distinguishing the type of attack. In [22],
the authors introduced an ML-based approach for modeling IoT service
behaviors by observing their communication patterns. The training process
was performed on distributed nodes within multiple IoT sites, and the
resulting models were combined to produce a global model across the
different IoT sites. The authors showed that the combined model has a
better anomaly detection rate than the local models.
In [29], the authors introduced an outlier detection procedure using the
K-means algorithm coupled with Big Data techniques to make the process
scalable. The model was trained on Guildford’s facility dataset, proposed
within the framework of the European Smart Santander Project. In [18], the
authors considered 5 ML algorithms (Logistic Regression (LR), SVM, DT,
Random Forest, and ANN) to train models on the DS2OS dataset (footnote 1)
and evaluated their performance using 5 metrics (accuracy, precision,
recall, F1-score, and area under the Receiver Operating Characteristic
curve). The authors showed that the Random Forest-based ML model outperforms
the other models on this dataset, but they pointed out the need for a new
robust algorithm for anomaly detection. In [26], the authors trained several
ML/DL models for anomaly detection in IoT environments on the NB-IoT EDGE
DEVICE dataset. The models were compared on 3 metrics: Precision, Recall,
and F1-score. Auto-encoders were found to be a better choice than ML for
anomaly detection when the detection is performed on the edge. In [33], the
authors proposed a Convolutional Neural Network (CNN) to detect and classify
anomalies in IoT networks, in 1D, 2D, and 3D variants. They used transfer
learning for binary and multi-class classification, and they trained their
models on a combined dataset made of BoT-IoT, IoT Network Intrusion,
MQTT-IoT-IDS2020 [19], and IoT-23 [15]. They compared the models using 4
metrics (Accuracy, Precision, Recall, F1-score), and they concluded that
CNN1D performs better than CNN2D and CNN3D. In [12], the authors showed that
DL models are better at handling small variants thanks to their high-level
feature extraction capabilities. The authors trained a DL model and a
shallow model on the NSL-KDD dataset (footnote 2). They compared both models
based on 6 metrics: Detection Rate (DR), False Alarm Rate (FAR), Accuracy,
Precision, Recall, and F1-score. The results showed that the DL model
outperforms the shallow one for detecting distributed attacks.
In [1], the authors used a simulated IoT network to show that feature
selection can help increase the accuracy of DDoS attack detection in IoT
network traffic. The authors considered a variety of ML algorithms (KNN,
LSVM, DT, RF, and NN) and trained them on simulated data. The comparison
among the models was based on the usual metrics of Accuracy, Precision,
Recall, and F1-score. They concluded that the RF-based model outperforms the
other models in classifying both legitimate activities and DDoS attacks. In
[32], the authors developed an intelligent intrusion-detection system
tailored to IoT environments, using a DL algorithm to detect malicious
traffic inside such environments. They evaluated the models using both real
network traffic traces and simulated data. They designed a DL model
(DL-Sim) and compared it with existing IDS (Intrusion Detection System)
solutions (IWC) using 3 metrics (Precision, Recall, F1-score). The DL-Sim
model outperforms the existing solutions. In [34], the authors designed a
noise-resistant DL model to detect anomalies in multivariate time series
data. The model, MSCRED (Multi-Scale Convolutional Recurrent
Encoder-Decoder), correlates the inter-sensor data and uses an
attention-based Convolutional Long Short-Term Memory (ConvLSTM) network to
detect patterns. The performance of the trained model is good, but the
training data used are synthetic. In [6], the authors proposed a clustering
method to detect anomalies in Big Data. It is an improved optimization
approach where a weight is assigned to each data point. The approach was
compared with the K-means algorithm on the Australian credit approval
dataset [25], the Heart dataset [30], and NSL-KDD [4]. The comparison used 6
metrics: Purity, Mirkin, F-measure, Variation of Information (VI), Partition
Coefficient (PC), and V-measure. They showed that the clustering method
detects anomalous values more accurately than K-means. The works reviewed
above are summarized in Table 2.

1 https://www.kaggle.com/datasets/francoisxa/ds2ostraffictraces
2 https://www.kaggle.com/datasets/hassan06/nslkdd?select=kddtest
Table 2: State of the art summary of ML methods for anomaly detection in IoT.
Paper | Models | Best model | Dataset(s)
[9] | B-GRU-RNN, B-LSTM-RNN, RF, GBT, KNN, DNN, XGB, MLP, SVM, NB | B-GRU-RNN (Acc: 98.62%, P: 99.68%, R: 98.20%, F1: 98.93%) | TON-IoT
[24] | TDTC, Two-tier classification, Naïve Bayes, RF, SVM, DT | TDTC (Acc: 84.86%) | NSL-KDD
[22] | Modeling IoT communicative behavior by observing traffic | Proposed model | 2 datasets: Australian credit approval and Heart datasets
[29] | Outlier detection algorithm with Big Data processing | Outlier detection algorithm (AUC: 0.8967) | Guildford’s facility (European Smart Santander Project)
[17] | LR, SVM, RF, Naïve Bayes, DT, CNN, MLP, GNB, RNN, GRU, LSTM, AdaB, KNN, DNN, XGBoost, ID3, QDA | RF | UCI ML, IoT-23, BoT-IoT, NSL-KDD, DS2OS, CICIDS-2017, UNSW-NB15, ICS Cyberattack, IoT Network Intrusion dataset, KDDCUP99
[18] | LR, SVM, DT, RF, ANN | RF (Acc: 99.4%, F1: 99%) | DS2OS traffic traces
[26] | ADM-EDGE, ADM-FOG, SVM, ABOD, KNN, PCA, HBOD | ADM-EDGE (P: 70.5%, R: 69%, F1: 60.7%) | NB-IoT EDGE DEVICE
[33] | CNN, CNN1D, CNN2D, CNN3D, C-LSTM-AE, C-CMU, FFN, SNN | CNN1D (Acc: 99.97%, P: 99.95%, R: 99.95%, F1: 99.95%) | BoT-IoT, IoT Network Intrusion, MQTT-IoT-IDS2020, IoT-23, IoT-DS-1/-2
[12] | Deep model, Shallow model | Deep model | NSL-KDD
[2] | GAAOD to approximate KNN | - | TAO, Stock, HPC
[34] | MSCRED | MSCRED | Synthetic Data, Power Plant Data
[32] | DNN, DL-Sim, IWC, DL-Testbed | DL-Sim (P: 96.88%, R: 98.02%, F1: 97.46%) | Synthetic data about a smart house (around 60,000 data points)
[1] | KNN, LSVM, DT, RF, NN | RF (Acc: 99.9%, P-normal: 99.9%, P-attack: 99.9%, R-normal: 99.8%, R-attack: 99.9%, F1-normal: 99.8%, F1-attack: 99.9%) | Synthetic data about IoT devices with normal activities and DoS attacks
3 Datasets and ML Approaches for Anomaly Detection
From the state of the art, we have identified the following ML/DL algorithms
used for anomaly detection in IoT environments: Support Vector Machine
(SVM), Random Forest (RF), K Nearest Neighbors (KNN), Bidirectional Gated
Recurrent Unit Recurrent Neural Network (B-GRU-RNN), Logistic Regression
(LR), and one-dimensional Convolutional Neural Network (CNN1D). The
performance of the corresponding ML/DL models, trained on diverse datasets,
is listed in Table 3 in terms of Accuracy, Precision, Recall, and F1-score.
These results have been extracted from the state of the art studied in the
previous section.
We have also identified the publicly accessible datasets for training ML
models. Their characteristics, in terms of dimensions, distribution, and
description, are detailed in Table 4.
4 Model Reproduction and Improvement
In this section, we detail the process of reproducing the results of the
selected ML algorithms. We carried out our training/testing processes on a
computer with 16 GB of memory, an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz
(2.71 GHz) processor, and the Windows 10 Enterprise 22H2 operating system.
For the software, we used Python 3.11.3 in Visual Studio Code, with the
Pandas, Keras, TensorFlow, Scikit-learn, NumPy, time, datetime, ipaddress,
ipynb, and os libraries.
3 https://research.unsw.edu.au/projects/toniot-datasets
4 https://research.unsw.edu.au/projects/unsw-nb15-dataset
5 https://www.stratosphereips.org/datasets-iot23
Table 3. ML/DL model performances from the state of the art.
Paper | Model | Dataset | Accuracy | Precision | Recall | F1-score
[18] | SVM | DS2OS | 98.2% | 98% | 98% | 98%
[9] | SVM | TON-IoT | 72.34% | 82.91% | 72.40% | 77.30%
[14] | SVM | UR Fall Detection | 98.39% | - | - | 98.8%
[18] | RF | DS2OS | 99.4% | 99% | 99% | 99%
[9] | RF | TON-IoT | 96.30% | 96.36% | 98.01% | 97.18%
[28] | RF | UNSW-NB15 | 98.2% | 98% | 98% | 98%
[9] | KNN | TON-IoT | 95.79% | 96.19% | 97.38% | 96.78%
[1] | KNN | Simulated DoS attacks | 99.9% | 99.8% | 99.3% | 99.5%
[14] | KNN | UR Fall Detection | 98.79% | - | - | 99.1%
[28] | KNN | UNSW-NB15 | 96% | 96% | 96% | 96%
[9] | B-GRU-RNN | TON-IoT | 98.62% | 99.68% | 98.20% | 98.93%
[18] | LR | DS2OS | 98.3% | 98% | 98% | 98%
[33] | CNN1D | BoT-IoT (old TON-IoT) | 99.97% | 99.95% | 99.95% | 99.95%
Table 4. Dataset information summary.
Dataset | Dimensions | Data distribution | Description
NSL-KDD [11,24,6,17,3,12] | 25192 x 43 | 13449 normal / 11743 anomalies | Records of internet traffic seen by simple intrusion detection systems (IDS)
TON-IoT (footnote 3) [9,17,33,3] | 461043 x 45 | 300000 normal / 161043 anomalies | Heterogeneous data sources: IoT and IIoT (Industrial IoT) sensors, operating system logs (Windows 7 and 10, Ubuntu 14 and 18 LTS), and network traffic
UNSW-NB15 (footnote 4) [11,17,3] | 82332 x 45 | 37000 normal / 45332 anomalies | Hybrid of normal activities and synthetic attacks
DS2OS [10,17,18] | 357952 x 13 | 347935 normal / 10017 anomalies | Traces captured in the IoT environment DS2OS
IoT-23 (footnote 5) [17,33] | 8186879 x 23 | 497177 normal / 7689702 anomalies | IoT network traffic in Stratosphere Laboratory, AIC group, FEL, CTU (Czech Technical University)
4.1 Data Preprocessing
Because of the dataset formats, some features required a conversion step:
string values were converted into integer values using ASCII conversion, and
True and False values were replaced by 1 and 0, respectively. The Timestamp
feature, which indicates the time when the data was collected, was broken
down into 6 features: Year, Month, Day, Hour, Minute, and Second. Some other
features, such as data identifiers, were discarded from the considered
datasets since they are not relevant for anomaly detection. The size of the
datasets and the modifications performed on them are listed in Table 5.
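The preprocessing steps described above could be sketched with Pandas as follows. This is an illustration under assumptions, not the code used in the experiments: the paper does not specify the exact ASCII conversion, so the sketch sums the character codes of each string, and the column names (record_id, ts, proto, dns_rejected, bytes) are hypothetical.

```python
import pandas as pd
from pandas.api.types import is_bool_dtype, is_string_dtype

def ascii_to_int(value):
    """Assumed scheme: map a string to the sum of its character codes."""
    return sum(ord(ch) for ch in str(value))

def preprocess(df, timestamp_col="ts", drop_cols=("record_id",)):
    """Drop identifiers, split the timestamp, and make all features numeric."""
    # Discard features that are irrelevant for detection (e.g. identifiers).
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])

    # Break the timestamp (assumed to be epoch seconds) into 6 features.
    if timestamp_col in out.columns:
        ts = pd.to_datetime(out[timestamp_col], unit="s")
        for part in ("year", "month", "day", "hour", "minute", "second"):
            out[part.capitalize()] = getattr(ts.dt, part)
        out = out.drop(columns=[timestamp_col])

    for col in out.columns:
        if is_bool_dtype(out[col]):
            out[col] = out[col].astype(int)          # True/False -> 1/0
        elif is_string_dtype(out[col]):
            out[col] = out[col].map(ascii_to_int)    # string -> integer
    return out

# Hypothetical two-row sample standing in for a real dataset.
raw = pd.DataFrame({
    "record_id": [1, 2],
    "ts": [90061, 90062],            # 1970-01-02 01:01:01 and one second later
    "proto": ["tcp", "udp"],
    "dns_rejected": [True, False],
    "bytes": [120, 64],
})
clean = preprocess(raw)
print(clean)
```

Summing character codes is not injective (two different strings can map to the same integer), so a lossless encoding may be preferable in practice; the sketch only shows the shape of the pipeline.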
Table 5. Dataset information summary before and after modifications.
Dataset and original dimensions | New dimensions | Modifications | Distribution
NSL-KDD, 25192 x 43 | 25192 x 43 | 3 features converted | 13449 normal / 11743 anomalies
TON-IoT, 461043 x 45 | 461043 x 43 | 20 features converted; Timestamp transformed; 8 features dropped | 300000 normal / 161043 anomalies
UNSW-NB15, 82332 x 45 | 82332 x 43 | 3 features converted; 2 features dropped | 37000 normal / 45332 anomalies
DS2OS, 357952 x 13 | 357952 x 18 | 10 features converted; Timestamp transformed; 1 feature dropped | 347935 normal / 10017 anomalies
IoT-23, 8186879 x 23 | 8186879 x 27 | 11 features converted; Timestamp transformed; 2 features dropped | 497177 normal / 7689702 anomalies
4.2 Model Reproduction and Comparative Analysis
The comparison is performed using 4 metrics: Accuracy, Precision, Recall,
and F1-score. The performance of each reproduced model is listed in Table 6
for both DL and ML approaches, where results in bold font represent the best
trained models with regard to the datasets used to train them. The results
of the model comparison are relatively close to the results found in the
respective papers. When comparing the models, we can see that the Random
Forest-based model outperforms the other models on the TON-IoT dataset.
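The four metrics can be computed with Scikit-learn as in the following minimal sketch. The synthetic data from make_classification, the 70/30 split, the hyperparameters, and the convention that label 1 denotes an anomaly are all assumptions made for illustration; they do not reproduce the experimental setup of this paper.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score)
from sklearn.model_selection import train_test_split

def evaluate(model, X_test, y_test):
    """Return the 4 comparison metrics for a fitted binary classifier."""
    y_pred = model.predict(X_test)
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
    }

# Synthetic stand-in for a preprocessed dataset (label 1 = anomaly, assumed).
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.7, 0.3], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

metrics = evaluate(model, X_test, y_test)
for name, value in metrics.items():
    print(f"{name}: {value:.4%}")
```

Reporting all four metrics matters on imbalanced datasets: a model can reach a high accuracy or a 100% recall while its precision stays close to zero, as some rows of Table 6 show.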
4.3 Random Forest-based Model Improvement
Upon comparing our reproduced Random Forest-based model with the existing
Random Forest-based models trained on the TON-IoT dataset (as summarized in
Table 7), it turns out that there is still room for improvement. One
feasible way to achieve it is to reduce the dataset’s dimensionality with a
feature selection approach.
Table 6. Reproduced experimental results for ML/DL approaches.
Model | Dataset | Accuracy | Precision | Recall | F1-score
SVM | NSL-KDD | 92.70% | 91.99% | 95.14% | 93.54%
SVM | TON-IoT | 65.3060% | 100% | 65.3049% | 79.0114%
SVM | UNSW-NB15 | 55.8329% | 2.3910% | 66.6667% | 4.6164%
SVM | DS2OS | 97.1700% | 100% | 97.1700% | 98.5647%
SVM | IoT-23 | 93.9236% | 0.0010% | 100% | 0.0020%
Random Forest | NSL-KDD | 98.3774% | 99.7415% | 99.3745% | 99.5577%
Random Forest | TON-IoT | 99.9972% | 100% | 99.9983% | 99.9992%
Random Forest | UNSW-NB15 | 92.7240% | 97.8987% | 97.0749% | 97.4851%
Random Forest | DS2OS | 99.9720% | 99.9986% | 100% | 99.9993%
Random Forest | IoT-23 | 99.9967% | 99.9990% | 99.9960% | 99.9975%
KNN | NSL-KDD | 98.7299% | 98.4525% | 99.1834% | 98.8166%
KNN | TON-IoT | 99.8449% | 99.8583% | 99.9033% | 99.8808%
KNN | UNSW-NB15 | 82.9720% | 84.1616% | 79.3322% | 81.6756%
KNN | DS2OS | 99.2541% | 99.6132% | 99.6189% | 99.6161%
KNN | IoT-23 | 93.8829% | 41.5585% | 49.0994% | 45.0154%
Linear Regression | NSL-KDD | 97.0034% | 96.8000% | 97.4310% | 97.1145%
Linear Regression | TON-IoT | 67.0292% | 98.9666% | 66.5971% | 79.6176%
Linear Regression | UNSW-NB15 | 55.6628% | 2.2349% | 66.5323% | 4.3245%
Linear Regression | DS2OS | 97.5178% | 99.8231% | 97.6727% | 98.7362%
Linear Regression | IoT-23 | 94.0720% | 2.6400% | 100% | 5.1442%
B-GRU-RNN | NSL-KDD | 99.4840% | 99.5228% | 99.5228% | 99.5228%
B-GRU-RNN | TON-IoT | 99.8666% | 99.9315% | 99.8630% | 99.8972%
B-GRU-RNN | UNSW-NB15 | 95.7916% | 95.0571% | 95.7057% | 95.3803%
B-GRU-RNN | DS2OS | 99.9972% | 99.9971% | 100% | 99.9986%
B-GRU-RNN | IoT-23 | 99.9155% | 98.6767% | 99.9350% | 99.3019%
CNN1D | NSL-KDD | 98.0353% | 97.4141% | 98.9122% | 98.1575%
CNN1D | TON-IoT | 65.0132% | 100% | 65.0090% | 78.7945%
CNN1D | UNSW-NB15 | 44.6955% | 100% | 44.6955% | 61.7787%
CNN1D | DS2OS | 98.4300% | 100% | 98.4108% | 99.1990%
CNN1D | IoT-23 | 6.0389% | 100% | 6.0389% | 11.3900%
Table 7. Efficiency of multiple RF models using TON-IoT.
Source | Accuracy | Precision | Recall | F1-score
Our results | 99.9972% | 100% | 99.9983% | 99.9992%
IoT and IIoT Networks [9] | 96.30% | 96.36% | 98.01% | 97.18%
Network TON-IoT datasets [21] | 99.9998% | n/a | n/a | n/a
TON-IoT Telemetry Dataset [7] | 85% | 87% | 85% | 85%
ToN-IoT: Intrusion Data Sets [8] | 98.075% | n/a | n/a | 97.264%
Feature Selection: It is possible to improve the overall efficiency of the
model by training it on only the best features of the dataset. To do so, we
used SelectKBest from the Scikit-learn library. Once the features are ranked
from the most important to the least important, we progressively train the
model in successive cycles, adding one more feature in each cycle, in the
ranked order. In each cycle, we compute the accuracy of the model. If the
current accuracy is less than or equal to that of the previous cycle, we
discard the current model and return the previous one. Otherwise, we add the
next best feature (if any remains) in the ranked order and proceed with a
new training cycle. This technique is based on the algorithm used in [16].
Obtained Results: Table 8 summarizes the results obtained when we removed
31, 33, and 34 features, respectively. With the Random Forest-based model
trained on the TON-IoT dataset, the accuracy increases at every step, and
with the 9 best features the recall and F1-score also reach 100%, while the
precision stays at 100%. It seems that many features from the TON-IoT
dataset are not relevant for anomaly detection.
Table 8. Results of feature selection applied to the RF model with the TON-IoT dataset.
Modifications | Accuracy | Precision | Recall | F1-score
No modification | 99.9972% | 100% | 99.9983% | 99.9992%
Removing 31 features (12 left) | 99.9976% | 100% | 99.9983% | 99.9992%
Removing 33 features (10 left) | 99.9978% | 100% | 99.9983% | 99.9992%
SelectKBest (9 features left) | 99.9985% | 100% | 100% | 100%
Furthermore, we tried to reproduce the results for Random Forest with other
datasets, and to train other ML algorithms on the TON-IoT dataset.
Table 9 summarizes the results of the following experiments:
Random Forest (RF) with 2 datasets, IoT-23 and DS2OS: in both cases, feature
selection is not beneficial, since the quality of the obtained models
decreased when we applied SelectKBest.
KNN with TON-IoT: the accuracy, precision, recall, and F1-score increase
when we remove the irrelevant features from the TON-IoT dataset.
DL approach with the DS2OS dataset: we also applied the same process to the
B-GRU-RNN algorithm with the DS2OS dataset and obtained less conclusive
results.
Table 9. Results of the feature selection experiment on other models and datasets.
Model and dataset | Modification | Accuracy | Precision | Recall | F1-score
Random Forest with IoT-23 | No modification | 99.9967% | 99.9990% | 99.9960% | 99.9975%
Random Forest with IoT-23 | SelectKBest: 7 features left | 98.2123% | 98.6970% | 99.8292% | 99.2599%
Random Forest with DS2OS | No modification | 99.9720% | 99.9986% | 100% | 99.9993%
Random Forest with DS2OS | SelectKBest: 6 features left | 93.3768% | 99.9756% | 99.8408% | 99.9082%
KNN with TON-IoT | No modification | 99.8449% | 99.8583% | 99.9033% | 99.8808%
KNN with TON-IoT | SelectKBest: 6 features left | 99.9902% | 99.9500% | 99.9084% | 99.9292%
B-GRU-RNN with DS2OS | No modification | 99.4371% | 99.9770% | 99.4472% | 99.7114%
B-GRU-RNN with DS2OS | SelectKBest: 6 features left | 97.2664% | 100% | 97.2664% | 98.6143%
5 Conclusion and Future Work
We studied in this paper the contribution of ML/DL approaches to anomaly
detection within IoT data. We tested different algorithms on diverse IoT
datasets to identify the best model for anomaly detection in IoT, and we
were able to improve the efficiency of the best model by using a feature
selection approach. However, we noticed that the performance of ML/DL models
often depends on the use case at hand and on the quality of the dataset
used, in terms of size and diversity of the anomalies represented. Besides,
data preprocessing is also crucial for the training process. We need to go
beyond the empirical approach followed in this paper to formally prepare the
data to train an ML/DL model in a generic way. Furthermore, most of the
research work has been dedicated to point anomaly detection, which leaves
ample room for studying group and context-based anomaly detection with ML/DL
approaches.
References
1. Machine learning DDoS detection for consumer Internet of Things devices. pp. 29–35.
Institute of Electrical and Electronics Engineers Inc. (2018).
https://doi.org/10.1109/SPW.2018.00013
2. KNN-based approximate outlier detection algorithm over IoT streaming data. IEEE
Access 8, 42749–42759 (2020). https://doi.org/10.1109/ACCESS.2020.2977114
3. Abdullahi, M., Baashar, Y., Alhussian, H., Alwadain, A., Aziz, N., Capretz, L.F.,
Abdulkadir, S.J.: Detecting cybersecurity attacks in Internet of Things using
artificial intelligence methods: A systematic literature review. Electronics 11(2)
(2022). https://doi.org/10.3390/electronics11020198
4. Aggarwal, P., Sharma, S.K.: Analysis of KDD dataset attributes - class wise for
intrusion detection. pp. 842–851. Elsevier (2015).
https://doi.org/10.1016/j.procs.2015.07.490
5. Al-amri, R., Murugesan, R.K., Man, M., Abdulateef, A.F., Al-Sharafi, M.A.,
Alkahtani, A.A.: A review of machine learning and deep learning techniques
for anomaly detection in IoT data. Applied Sciences 11(12) (2021).
https://doi.org/10.3390/app11125320
6. Alguliyev, R.M., Aliguliyev, R.M., Imamverdiyev, Y.N., Sukhostat, L.V.: An
anomaly detection based on optimization. International Journal of Intelligent Sys-
tems and Applications 9, 87–96 (2017). https://doi.org/10.5815/ijisa.2017.
12.08
7. Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A., Anwar, A.: Ton_iot telemetry
dataset: A new generation dataset of iot and iiot for data-driven intrusion detection
systems. IEEE Access 8, 165,130–165,150 (2020). https://doi.org/10.1109/
ACCESS.2020.3022862
8. Booij, T.M., Chiscop, I., Meeuwissen, E., Moustafa, N., Hartog, F.T.H.d.: Ton_iot:
The role of heterogeneity and the need for standardization of features and attack
types in iot network intrusion data sets. IEEE Internet of Things Journal 9(1),
485–496 (2022). https://doi.org/10.1109/JIOT.2021.3085194
9. Da Silva Oliveira, G.A., Lima, P.S.S., Kon, F., Terada, R., Batista, D.M., Hirata,
R., Hamdan, M.: A stacked ensemble classifier for an intrusion detection system
in the edge of iot and iiot networks. In: 2022 IEEE Latin-American Conference
on Communications (LATINCOM), pp. 1–6 (2022). https://doi.org/10.1109/
LATINCOM56090.2022.10000559
10. DeMedeiros, K., Hendawi, A., Alvarez, M.: A survey of AI-based anomaly detection
in IoT and sensor networks. Sensors 23(3) (2023).
https://doi.org/10.3390/s23031352
11. Diro, A., Chilamkurti, N., Nguyen, V.D., Heyne, W.: A comprehensive study of
anomaly detection schemes in IoT networks using machine learning algorithms.
Sensors 21(24) (2021). https://doi.org/10.3390/s21248320
12. Diro, A.A., Chilamkurti, N.: Distributed attack detection scheme using deep
learning approach for Internet of Things. Future Generation Computer Systems 82,
761–768 (2018). https://doi.org/10.1016/j.future.2017.08.043
13. Fahim, M., Sillitti, A.: Anomaly detection, analysis and prediction techniques in
IoT environment: A systematic literature review. IEEE Access 7, 81664–81681
(2019). https://doi.org/10.1109/ACCESS.2019.2921912
14. Galvao, Y.M., Albuquerque, V.A., Fernandes, B.J.T., Valenca, M.J.S.: Anomaly
detection in smart houses: Monitoring elderly daily behavior for fall detecting. pp.
1–6. IEEE (2017). https://doi.org/10.1109/LA-CCI.2017.8285701
15. Garcia, S., Parmisano, A., Erquiaga, M.J.:
https://www.stratosphereips.org/datasets-iot23 (2020)
16. Haidar, N., Tamani, N., Nienaber, F., Wesseling, M.T., Bouju, A.,
Ghamri-Doudane, Y.: Data collection period and sensor selection method for smart
building occupancy prediction. In: 2019 IEEE 89th Vehicular Technology Conference
(VTC2019-Spring), pp. 1–6 (2019).
https://doi.org/10.1109/VTCSpring.2019.8746447
17. Haji, S.H., Ameen, S.Y.: Attack and anomaly detection in IoT networks using
machine learning techniques: A review. Asian Journal of Research in Computer
Science (2021)
18. Hasan, M., Islam, M.M., Zarif, M.I.I., Hashem, M.: Attack and anomaly detection
in IoT sensors in IoT sites using machine learning approaches (2019).
https://doi.org/10.1016/j.iot.2019.10
19. Hindy, H., Tachtatzis, C., Atkinson, R., Bayne, E., Bellekens, X.:
https://ieee-dataport.org/open-access/mqtt-iot-ids2020-mqtt-internet-things-intrusion-detection-dataset
(2020)
20. Merchant, N.: IoT technologies explained: History, examples, risks & future. URL
https://www.visionofhumanity.org/what-is-the-internet-of-things/
21. Moustafa, N.: A new distributed architecture for evaluating AI-based security
systems at the edge: Network ToN_IoT datasets. Sustainable Cities and Society
72, 102994 (2021). https://doi.org/10.1016/j.scs.2021.102994
22. Pahl, M.O., Aubet, F.X.: All eyes on you: Distributed multi-dimensional IoT
microservice anomaly detection. In: 2018 14th International Conference on Network
and Service Management (CNSM), pp. 72–80 (2018)
23. Pajouh, H.H., Dastghaibyfard, G., Hashemi, S.: Two-tier network anomaly
detection model: a machine learning approach. Journal of Intelligent Information
Systems 48(1), 61–74 (2017). https://doi.org/10.1007/s10844-015-0388-x
24. Pajouh, H.H., Javidan, R., Khayami, R., Dehghantanha, A., Choo, K.K.R.: A
two-layer dimension reduction and two-tier classification model for anomaly-based
intrusion detection in IoT backbone networks. IEEE Transactions on Emerging
Topics in Computing 7(2), 314–323 (2019).
https://doi.org/10.1109/TETC.2016.2633228
25. Quinlan, R.: Statlog (Australian credit approval).
https://doi.org/10.24432/c59012
26. Savic, M., Lukic, M., Danilovic, D., Bodroski, Z., Bajovic, D., Mezei, I.,
Vukobratovic, D., Skrbic, S., Jakovetic, D.: Deep learning anomaly detection for
cellular IoT with applications in smart logistics. IEEE Access 9, 59406–59419 (2021).
https://doi.org/10.1109/ACCESS.2021.3072916
27. Sharma, B., Sharma, L., Lal, C.: Anomaly detection techniques using deep learning
in IoT: A survey. In: 2019 International Conference on Computational Intelligence
and Knowledge Economy (ICCIKE), pp. 146–149 (2019).
https://doi.org/10.1109/ICCIKE47802.2019.9004362
28. Shaver, A., Liu, Z., Thapa, N., Roy, K., Gokaraju, B., Yuan, X.: Anomaly based
intrusion detection for IoT with machine learning. Institute of Electrical and
Electronics Engineers Inc. (2020). https://doi.org/10.1109/AIPR50011.2020.9425199
29. Souza, A.M., Amazonas, J.R.: An outlier detect algorithm using big data processing
and Internet of Things architecture. pp. 1010–1015. Elsevier B.V. (2015).
https://doi.org/10.1016/j.procs.2015.05.095
30. Steinbrunn, A., Pfisterer, W., Detrano, M., Janosi, R.: Heart disease.
https://doi.org/10.24432/c52p4x (1988)
31. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD
Cup 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for
Security and Defense Applications, pp. 1–6 (2009).
https://doi.org/10.1109/CISDA.2009.5356528
32. Thamilarasu, G., Chawla, S.: Towards deep-learning-driven intrusion detection for
the Internet of Things. Sensors (Switzerland) 19 (2019).
https://doi.org/10.3390/s19091977
33. Ullah, I., Mahmoud, Q.H.: Design and development of a deep learning-based model
for anomaly detection in IoT networks. IEEE Access 9, 103906–103926 (2021).
https://doi.org/10.1109/ACCESS.2021.3094024
34. Zhang, C., Song, D., Chen, Y., Feng, X., Lumezanu, C., Cheng, W., Ni, J., Zong,
B., Chen, H., Chawla, N.V.: A deep neural network for unsupervised anomaly
detection and diagnosis in multivariate time series data (2018)