
Improving ML/DL Solutions for Anomaly Detection in IoT Environments

January 2024


DOI:10.1007/978-3-031-57942-4_20

Conference: International Conference on Advanced Information Networking and Applications

Authors:

Nouredine Tamani

Institut Supérieur d’Electronique de Paris

Saad El Jaouhari

Institut Supérieur d’Electronique de Paris

Abdul Qadir Khan

Sorbonne Université



Nouredine Tamani, Saad El-Jaouhari, Abdul-Qadir Khan, and Bastien Pauchet

Institut Supérieur d’Electronique de Paris (Isep), Issy-les-Moulineaux, France,

{saad.el-jaouhari, nouredine.tamani}@isep.fr,

abdul-qadir.khan@ext.isep.fr, bastien.pauchet@orange.fr

Abstract. As part of the evolution toward an era of Web 3.0, the Internet of Things (IoT) bridges physical smart devices to the digital world to enhance services for consumer convenience. However, the rapid growth in the number of IoT devices has also led them to inherit the security, privacy, and trust problems already well known in traditional networks, making IoT devices even more vulnerable. To detect anomalies and protect such IoT devices from cyberattacks, different techniques have been proposed in the literature, ranging from logic-based approaches (knowledge bases and ontologies) to statistical ones (Machine Learning (ML) and Deep Learning (DL)). In this paper, we focus on the latter (ML/DL) to identify, reproduce, evaluate, and compare different state-of-the-art machine learning algorithms for anomaly detection in IoT environments, along with the main datasets used in such research works. Once suitable ML models and datasets are identified, we investigate the potential for enhancing them by incorporating a feature selection algorithm. This aims to reduce the dataset’s dimensionality while concurrently improving performance metrics such as accuracy, precision, recall, and F1-score.

Keywords: Anomaly detection · IoT networks · Security and Privacy · Machine/Deep Learning · Model Reproducibility.

1 Introduction

The term Internet of Things (IoT) was first used in 1999 by Kevin Ashton, who proposed using radio-frequency identification (RFID) chips to track products in the supply chain [20]. IoT networks consist of many devices capable of data collection, storage, processing, and communication [11]. IoT has evolved significantly since then, thanks to the proliferation of connected objects and to the harvesting and processing of large volumes of data with Big Data and AI techniques. Nevertheless, the infrastructures, applications, and services associated with IoT devices have introduced several threats and vulnerabilities, as emerging protocols and workflows have exponentially increased attack surfaces [20]. Securing IoT devices is challenging for both academia and industry because of the heterogeneity of IoT environments. Moreover, conventional security controls are not suitable for all IoT devices, in particular the most resource-constrained ones. Since distributed IoT networks lie outside the coverage of security perimeters, existing cloud-based solutions also suffer from centralization and high latency. Furthermore, IoT device vendors commonly overlook security requirements in their design processes in their rush to market. The lack of security standards has also added to the complexity of securing IoT devices. These challenges and the nature of IoT applications call for monitoring systems, such as anomaly detection, at the device, edge, and network levels, beyond organizational boundaries [20].

The research community has already developed solid statistical and machine learning methods to detect anomalies in IoT data, enabling real-time analysis and prediction of unusual behaviors in IoT environments [13]. In this context, an anomaly in IoT data can be defined as a data point or a subset of data points that deviates from normal patterns. These anomalies fall into three types: (i) point anomalies, where a single data point or instant deviates; (ii) contextual anomalies, where a point is an outlier only with respect to a context, such as a time window; and (iii) collective anomalies, where points are individually normal but suspicious as a group.
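The three anomaly types can be illustrated on a univariate sensor series. The sketch below is ours and is not a method from the reviewed works: it assumes a z-score threshold for point anomalies (against the whole series) and for contextual anomalies (against the preceding window only), and uses a run of identical readings, as from a frozen sensor, as one possible example of a collective anomaly. All function names, thresholds, and series are hypothetical.

```python
from statistics import mean, pstdev


def point_anomalies(xs, z=3.0):
    """Indices whose value lies more than z standard deviations from the global mean."""
    mu, sd = mean(xs), pstdev(xs)
    if sd == 0:
        return []
    return [i for i, x in enumerate(xs) if abs(x - mu) / sd > z]


def contextual_anomalies(xs, window=5, z=3.0):
    """Indices whose value deviates from the `window` preceding values (its context)."""
    flagged = []
    for i in range(window, len(xs)):
        context = xs[i - window:i]
        mu, sd = mean(context), pstdev(context)
        # With a constant context, any different value counts as a deviation.
        deviates = xs[i] != mu if sd == 0 else abs(xs[i] - mu) / sd > z
        if deviates:
            flagged.append(i)
    return flagged


def collective_anomalies(xs, min_run=4):
    """(start, end) pairs of runs of at least min_run identical consecutive values.

    Each value in such a run may be normal on its own; only the group
    (e.g. a sensor stuck on one reading) is suspicious.
    """
    runs, start = [], 0
    for i in range(1, len(xs) + 1):
        if i == len(xs) or xs[i] != xs[start]:
            if i - start >= min_run:
                runs.append((start, i - 1))
            start = i
    return runs


spike = [20, 21, 20, 21, 20, 21, 20, 21, 20, 21, 100, 20, 21, 20, 21]
shift = [10, 10, 11, 10, 11, 10, 30, 31, 30, 31, 30, 10]
stuck = [20, 21, 20, 22, 21, 21, 21, 21, 21, 20, 22]
print(point_anomalies(spike))                               # [10]
print(point_anomalies(shift), contextual_anomalies(shift))  # [] [6, 11]
print(collective_anomalies(stuck))                          # [(4, 8)]
```

In the `shift` series, the values 30 and 10 are unremarkable over the whole series, so no point anomaly is reported, yet each is abnormal relative to the five readings that precede it.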

This paper places particular emphasis on ML/DL approaches for anomaly detection in IoT data. First, we conducted a comprehensive review of recent works addressing anomaly detection in IoT environments, in order to identify the main models along with their training datasets. We noticed that most research efforts in this area have been dedicated to point anomaly detection. Second, from the identified approaches and datasets, we selected 5 publicly available datasets and reproduced the identified models to assess their performance. Then, we selected the best-performing model and further enhanced it with a feature selection approach. The objective is to reduce the dimensionality of the dataset and the training time while improving performance. We show that a Random Forest-based model trained on the TON-IoT dataset with only the 9 best features achieves higher Accuracy, Recall, and F1-score, with the same Precision, than the model trained on the original dataset (with 45 features).

The rest of the paper is organized as follows. Section 2 analyzes the ML approaches developed in the literature for anomaly detection in IoT environments. Section 3 describes the existing datasets and trained ML/DL models for anomaly detection in IoT data. Section 4 details our experiments with the selected datasets to reproduce and improve the ML/DL models for anomaly detection. Section 5 concludes the paper with some comments and future work.

2 Approaches for Anomaly Detection in IoT Data

This section focuses on recent papers published in the literature, mainly between 2019 and 2023. We split them into two types: surveys (summarized in Subsection 2.1) and papers in which the authors compared ML/DL models (summarized in Subsection 2.2).


2.1 Surveys in Anomaly Detection in IoT

In [10], the authors focused on DL approaches such as Multilayer Perceptrons (MLP) and Graph Neural Networks (GNN) coupled with encoders or Recurrent Neural Networks (RNN). They also listed recently available datasets used to train models that detect attacks on IoT environments. In [3], the authors surveyed the existing literature on ML/DL methods used to detect cybersecurity attacks in IoT environments. The survey followed the PRISMA method: eighty studies from 2016 to 2021 were carefully selected and evaluated, along with the datasets available for IoT systems.

In [11,27], the authors provided an in-depth overview of existing work on anomaly detection solutions that use ML/DL techniques to protect IoT systems. They focused on the algorithms and datasets available in the field, but did not compare models. Similarly, in [5], the authors gave a broad overview of the ML and DL techniques developed up to 2021, the nature of IoT data, the types of anomalies identified, the datasets, and the evaluation metrics used to measure the performance of trained models. In [13], the authors studied anomaly detection in IoT environments, where huge amounts of data can be collected, processed, and analysed to reduce risks, detect and prevent malicious activities, and avoid involuntary downtime. The study covered the period between 2000 and 2018 and the fields of smart environments, transport, health, smart objects, and industrial systems. The surveys above are summarized in Table 1.

Table 1. Summary of surveys of ML/DL methods for anomaly detection in IoT.

| Paper | Models | Dataset(s) |
|---|---|---|
| [11] | RF (Random Forest), DL, RL, LSTM, CNN (Convolutional Neural Network), GNN, Multiple, AE-ANN, AE-SNN, Ensemble, AE, Subspace, Self-Learning, TCN, AE-LSTM, DBN, DNN | N-BaIoT, CICIDS 2017, AWID, UNSW-NB15, NSL-KDD, Kyoto, KDD CUP 1999 |
| [5] | C_LOF, AutoCloud, TEDA Clustering, BDLM & RBPF, HTM, ANN (Artificial Neural Networks), MDADM, Multi-kernel, xStream, Regression Model, SVM, CEDAS, MuDi-Stream, Extreme Learning Machines, AMAD, LSTM, Auto-encoder, DNN, Evolving spiking NN, ISTL, e-SREBOM | Space imager data stream, KDD29, Cover type, Spam-SMS, Spam-URL, KDDCup99, NAB, UCI, D1, DS2, DS3, Yahoo Webscope, HTTP, SMTP, SMTP+HTTP, COVERTYPE, SHUTTLE, Weather, Web traffic, Avocado, Temperature, and UCSD Pedestrian datasets |
| [10] | GDN, Gumbel softmax sampling strategy, OmniAnomaly, MLP, LR (Linear Regression), SVM, DT, RF, SS-TCVN, auto-encoder, LSTM, Gelenbe Network, Gaussian, HOT-SAX, GTA, SS-VTCN, CNN | SWaT, WaDI, MSL, SMAP, DS2OS, BaIoT |
| [13] | SVM, PCA, Kernel nonlinear Regression, CNN, RNN, DT, Auto-encoders | No dataset mentioned |
| [27] | CNN, RED, DNN, Hybrid anomaly detection, clusters, IRESE unsupervised | No dataset mentioned |
| [3] | Naïve Bayes, BayesNet, DT, RF, SVM, SVR, KNN, FPT, Fuzzy C-Means Algorithm, XGBoost, LR, K-Means clustering, CDL, RNN, CNN, Deep Auto-encoders, DNN, DBN, MLPNN | BoT-IoT, AWID, MQTT regular traffic packets, KDD99, Vx-Heaven, Kaggle and Ransomware, NSL-KDD, ICS cyberattack dataset, IoT Traffic, UNSW-NB15, CICIDS2017, ISCX, UGR16 |

2.2 Comparing ML/DL models for Anomaly Detection in IoT

In [17], the authors discussed how IoT devices collect data about their surrounding environments, performed a comparative study of various ML/DL approaches for attack and anomaly detection, and concluded that Random Forest gives the best performance in terms of accuracy and precision. In [9], the authors trained and compared 10 models on the TON-IoT dataset: Bidirectional Gated Recurrent Unit Recurrent Neural Network (B-GRU-RNN), Bidirectional Long Short-Term Memory Recurrent Neural Network (B-LSTM-RNN), Random Forest (RF), Gradient Boosted Trees (GBT), K Nearest Neighbours (KNN), Deep Neural Network (DNN), eXtreme Gradient Boosting (XGB), MLP, Support Vector Machine (SVM), and Naive Bayes (NB). They compared the models using 4 metrics (Accuracy, Precision, Recall, F1-score) and identified B-GRU-RNN as the best model. In [24], the authors trained a model combining Two-layer Dimension reduction and Two-tier Classification (TDTC), and compared it with Two-tier classification [23], NB, RF, SVM, and Decision Tree (DT) on the NSL-KDD dataset [31]. Comparing the models by detection rate, they showed that TDTC remains the best model. The model can not only detect attacks but also distinguish their type. In [22], the authors introduced an ML-based approach for modeling IoT service behaviors by observing their communication patterns. Training was performed on distributed nodes within multiple IoT sites, and the resulting models were combined to produce a global model across the IoT sites. The authors showed that the combined model has a better anomaly detection rate than the local models.


In [29], the authors introduced an outlier detection procedure using the K-means algorithm coupled with Big Data techniques to make the process scalable. The model was trained on Guildford’s facility dataset, proposed within the framework of the European Smart Santander Project. In [18], the authors considered 5 ML algorithms, Logistic Regression (LR), SVM, DT, Random Forest, and ANN, to train models on the DS2OS dataset¹, and evaluated their performance using 5 metrics (accuracy, precision, recall, F1-score, and area under the Receiver Operating Characteristic curve). The authors showed that the Random Forest-based model outperforms the other models on this dataset, but they pointed out the need for a new robust algorithm for anomaly detection. In [26], the authors trained several ML/DL models for anomaly detection in IoT environments on the NB-IoT EDGE DEVICE dataset. The models were compared using 3 metrics: Precision, Recall, and F1-score. Auto-encoders were found to be a better choice than ML for anomaly detection when detection is performed at the edge. In [33], the authors proposed Convolutional Neural Networks (CNN) with 1D, 2D, and 3D inputs to detect and classify anomalies in IoT networks. They used transfer learning for binary and multi-class classification and trained their models on a combined dataset made of BoT-IoT, IoT Network Intrusion, MQTT-IoT-IDS2020 [19], and IoT-23 [15]. They compared the models with 4 metrics (Accuracy, Precision, Recall, F1-score) and concluded that CNN1D performs better than CNN2D and CNN3D. In [12], the authors showed that DL models are better at handling small variants thanks to their high-level feature extraction capabilities. The authors trained a DL model and a shallow model on the NSL-KDD dataset² and compared them using 6 metrics: Detection Rate (DR), False Alarm Rate (FAR), Accuracy, Precision, Recall, and F1-score. The results showed that the DL model outperforms the shallow one in detecting distributed attacks.

In [1], the authors used a simulated IoT network to show that feature selection can help increase the accuracy of DDoS attack detection in IoT network traffic. The authors considered a variety of ML algorithms (KNN, LSVM, DT, RF, NN) and trained them on simulated data. The models were compared on the usual metrics of Accuracy, Precision, Recall, and F1-score. They concluded that the RF-based model outperforms the other models in classifying both legitimate activities and DDoS attacks. In [32], the authors developed an intelligent intrusion detection system tailored to IoT environments, using a DL algorithm to detect malicious traffic in such environments. They evaluated the models using both real network traffic traces and simulated data. They designed a DL model (DL-Sim) and compared it with existing IDS (Intrusion Detection System) solutions (IWC) using 3 metrics (Precision, Recall, F1-score); DL-Sim outperforms the existing solutions. In [34], the authors designed a noise-resistant DL model to detect anomalies in multivariate time series data. The model, MSCRED (Multi-Scale Convolutional Recurrent Encoder-Decoder), captures correlations between sensors and uses an attention-based Convolutional Long Short-Term Memory (ConvLSTM) network to detect patterns. The trained model performs well, but the training data used are synthetic. In [6], the authors proposed a clustering method to detect anomalies in Big Data: an improved optimization approach where a weight is assigned to each data point. The approach was compared with the K-means algorithm on the Australian credit approval dataset [25], the Heart dataset [30], and NSL-KDD [4], using 6 metrics: Purity, Mirkin, F-measure, Variation of Information (VI), Partition Coefficient (PC), and V-measure. They showed that the clustering method detects anomalous values more accurately than K-means. The works above are summarized in Table 2.

¹ https://www.kaggle.com/datasets/francoisxa/ds2ostraffictraces

² https://www.kaggle.com/datasets/hassan06/nslkdd?select=kddtest

Table 2. State-of-the-art summary of ML methods for anomaly detection in IoT.

| Paper | Models | Best model | Dataset(s) |
|---|---|---|---|
| [9] | B-GRU-RNN, B-LSTM-RNN, RF, GBT, KNN, DNN, XGB, MLP, SVM, NB | B-GRU-RNN (Acc: 98.62%, P: 99.68%, R: 98.20%, F1: 98.93%) | TON-IoT |
| [24] | TDTC, Two-tier classification, Naïve Bayes, RF, SVM, DT | TDTC (Acc: 84.86%) | NSL-KDD |
| [22] | Modeling IoT communicative behavior by observing traffic | Proposed model | 2 datasets: Australian credit approval and Heart datasets |
| [29] | Outlier detection algorithm with Big Data processing | Outlier detection algorithm (AUC: 0.8967) | Guildford’s facility (European Smart Santander Project) |
| [17] | LR, SVM, RF, Naïve Bayes, DT, CNN, MLP, GNB, RNN, GRU, LSTM, AdaB, KNN, DNN, XGBoost, ID3, QDA | RF | UCI ML, IoT-23, BoT-IoT, NSL-KDD, DS2OS, CICIDS-2017, UNSW-NB15, ICS Cyberattack, IoT Network Intrusion dataset, KDDCUP99 |
| [18] | LR, SVM, DT, RF, ANN | RF (Acc: 99.4%, F1: 99%) | DS2OS traffic traces |
| [26] | ADM-EDGE, ADM-FOG, SVM, ABOD, KNN, PCA, HBOD | ADM-EDGE (P: 70.5%, R: 69%, F1: 60.7%) | NB-IoT EDGE DEVICE |
| [33] | CNN, CNN1D, CNN2D, CNN3D, C-LSTM-AE, C-CMU, FFN, SNN | CNN1D (Acc: 99.97%, P: 99.95%, R: 99.95%, F1: 99.95%) | BoT-IoT, IoT Network Intrusion, MQTT-IoT-IDS2020, IoT-23, IoT-DS-1/-2 |
| [12] | Deep model, Shallow model | Deep model | NSL-KDD |
| [2] | GAAOD to approximate KNN | - | TAO, Stock, HPC |
| [34] | MSCRED | MSCRED | Synthetic Data, Power Plant Data |
| [32] | DNN, DL-Sim, IWC, DL-Testbed | DL-Sim (P: 96.88%, R: 98.02%, F1: 97.46%) | Synthetic data about a smart house (around 60,000 records) |
| [1] | KNN, LSVM, DT, RF, NN | RF (Acc: 99.9%, P-normal: 99.9%, P-attack: 99.9%, R-normal: 99.8%, R-attack: 99.9%, F1-normal: 99.8%, F1-attack: 99.9%) | Synthetic data about IoT devices with normal activities and DoS attacks |

3 Datasets and ML Approaches for Anomaly Detection

From the state of the art, we identified the following ML/DL algorithms used for anomaly detection in IoT environments: Support Vector Machine (SVM), Random Forest (RF), K Nearest Neighbors (KNN), Bidirectional Gated Recurrent Unit Recurrent Neural Network (B-GRU-RNN), Logistic Regression (LR), and one-dimensional Convolutional Neural Network (CNN1D). The performance of the corresponding ML/DL models, trained on diverse datasets, in terms of Accuracy, Precision, Recall, and F1-score, is listed in Table 3. These results are taken from the works studied in the previous section.

We also identified the publicly accessible datasets used to train ML models. Their characteristics, in terms of dimensions, class distribution, and description, are detailed in Table 4.
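The class distributions in Table 4 are far from balanced, so a useful reference point when reading the accuracy figures in this paper is the accuracy of a trivial classifier that always predicts the majority class. The short sketch below is ours, not part of the reviewed works; it only uses the class counts reported in Table 4, and the names `DISTRIBUTIONS` and `majority_baseline` are hypothetical.

```python
# Class counts (normal, anomalous) as reported in Table 4.
DISTRIBUTIONS = {
    "NSL-KDD": (13449, 11743),
    "TON-IoT": (300000, 161043),
    "UNSW-NB15": (37000, 45332),
    "DS2OS": (347935, 10017),
    "IoT-23": (497177, 7689702),
}


def majority_baseline(normal: int, anomalous: int) -> float:
    """Accuracy of a classifier that always predicts the majority class."""
    return max(normal, anomalous) / (normal + anomalous)


for name, (normal, anomalous) in DISTRIBUTIONS.items():
    print(f"{name:10s} {100 * majority_baseline(normal, anomalous):6.2f}%")
```

For IoT-23, for instance, labelling every record as anomalous already yields about 93.9% accuracy, which is one reason to report precision, recall, and F1-score alongside accuracy.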

4 Model Reproduction and Improvement

In this section, we detail the process of reproducing the results of the selected ML algorithms. We carried out training and testing on a computer with 16 GB of memory, an Intel(R) Core(TM) i5-7200U CPU @ 2.50 GHz (2.71 GHz), and the Windows 10 Enterprise 22H2 operating system. For the software, we used Python 3.11.3 in Visual Studio Code, with the Pandas, Keras, TensorFlow, Scikit-learn, NumPy, time, datetime, ipaddress, ipynb, and os libraries.

³ https://research.unsw.edu.au/projects/toniot-datasets

⁴ https://research.unsw.edu.au/projects/unsw-nb15-dataset

⁵ https://www.stratosphereips.org/datasets-iot23


Table 3. ML/DL model performance reported in the state of the art.

| Paper | Model | Dataset | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|---|
| [18] | SVM | DS2OS | 98.2% | 98% | 98% | 98% |
| [9] | SVM | TON-IoT | 72.34% | 82.91% | 72.40% | 77.30% |
| [14] | SVM | UR Fall Detection | 98.39% | - | - | 98.8% |
| [18] | RF | DS2OS | 99.4% | 99% | 99% | 99% |
| [9] | RF | TON-IoT | 96.30% | 96.36% | 98.01% | 97.18% |
| [28] | RF | UNSW-NB15 | 98.2% | 98% | 98% | 98% |
| [9] | KNN | TON-IoT | 95.79% | 96.19% | 97.38% | 96.78% |
| [1] | KNN | Simulated DoS attacks | 99.9% | 99.8% | 99.3% | 99.5% |
| [14] | KNN | UR Fall Detection | 98.79% | - | - | 99.1% |
| [28] | KNN | UNSW-NB15 | 96% | 96% | 96% | 96% |
| [9] | B-GRU-RNN | TON-IoT | 98.62% | 99.68% | 98.20% | 98.93% |
| [18] | LR | DS2OS | 98.3% | 98% | 98% | 98% |
| [33] | CNN1D | BoT-IoT (old TON-IoT) | 99.97% | 99.95% | 99.95% | 99.95% |

Table 4. Dataset information summary.

| Dataset | Dimensions | Data distribution | Description |
|---|---|---|---|
| NSL-KDD [11,24,6,17,3,12] | 25192 x 43 | 13449 normal / 11743 anomalies | Records of internet traffic seen by simple intrusion detection systems (IDS) |
| TON-IoT³ [9,17,33,3] | 461043 x 45 | 300000 normal / 161043 anomalies | Heterogeneous data sources: IoT and IIoT (Industrial IoT) sensors, operating system logs (Windows 7 and 10, Ubuntu 14 and 18 LTS), and network traffic |
| UNSW-NB15⁴ [11,17,3] | 82332 x 45 | 37000 normal / 45332 anomalies | Hybrid of normal activities and synthetic attacks |
| DS2OS [10,17,18] | 357952 x 13 | 347935 normal / 10017 anomalies | Traces captured in the DS2OS IoT environment |
| IoT-23⁵ [17,33] | 8186879 x 23 | 497177 normal / 7689702 anomalies | IoT network traffic from the Stratosphere Laboratory, AIC group, FEL, CTU (Czech Technical University) |


4.1 Data Preprocessing

Because of the dataset formats, some features required a conversion step: string values were converted into integer values using ASCII conversion, and True/False values were replaced by 1 and 0, respectively. The timestamp feature, which indicates when the data was collected, was broken down into 6 features: Year, Month, Day, Hour, Minute, and Second. Other features, such as data identifiers, were discarded from the datasets since they are not relevant for anomaly detection. The dataset sizes and the modifications performed on them are listed in Table 5.
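The conversions above can be sketched on a single record represented as a Python dictionary. This is a hedged illustration, not the authors' code: the paper does not specify the exact "ASCII conversion", so the hypothetical helper `ascii_to_int` simply sums the character codes of a string; the timestamp is assumed to be a Unix epoch in seconds, interpreted as UTC; and the column names (`id`, `ts`, `proto`, and so on) are invented for the example.

```python
from datetime import datetime, timezone


def ascii_to_int(s: str) -> int:
    """Hypothetical string encoding: the sum of the ASCII codes of the characters."""
    return sum(ord(c) for c in s)


def preprocess(record: dict, timestamp_key: str, drop: set) -> dict:
    """Convert one raw record into numeric features."""
    out = {}
    for key, value in record.items():
        if key in drop:
            continue  # identifiers and other irrelevant columns are discarded
        if key == timestamp_key:
            # Assumption: Unix epoch in seconds, interpreted as UTC.
            t = datetime.fromtimestamp(value, tz=timezone.utc)
            out.update(Year=t.year, Month=t.month, Day=t.day,
                       Hour=t.hour, Minute=t.minute, Second=t.second)
        elif isinstance(value, bool):
            out[key] = int(value)  # True -> 1, False -> 0
        elif isinstance(value, str) and value in ("True", "False"):
            out[key] = int(value == "True")  # booleans stored as text
        elif isinstance(value, str):
            out[key] = ascii_to_int(value)  # any other string
        else:
            out[key] = value  # numeric values are kept unchanged
    return out


raw = {"id": 7, "ts": 0, "proto": "tcp", "flag": True, "ok": "False", "bytes": 10}
print(preprocess(raw, timestamp_key="ts", drop={"id"}))
```

Summing character codes maps distinct strings such as "ab" and "ba" to the same integer; if such collisions matter, an explicit lookup table from each distinct string to an integer code avoids them.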

Table 5. Dataset information summary before and after modifications.

| Dataset | Original dimensions | New dimensions | Modifications | Distribution |
|---|---|---|---|---|
| NSL-KDD | 25192 x 43 | 25192 x 43 | 3 features converted | 13449 normal / 11743 anomalies |
| TON-IoT | 461043 x 45 | 461043 x 43 | 20 features converted; timestamp transformed; 8 features dropped | 300000 normal / 161043 anomalies |
| UNSW-NB15 | 82332 x 45 | 82332 x 43 | 3 features converted; 2 features dropped | 37000 normal / 45332 anomalies |
| DS2OS | 357952 x 13 | 357952 x 18 | 10 features converted; timestamp transformed; 1 feature dropped | 347935 normal / 10017 anomalies |
| IoT-23 | 8186879 x 23 | 8186879 x 27 | 11 features converted; timestamp transformed; 2 features dropped | 497177 normal / 7689702 anomalies |

4.2 Model Reproduction and Comparative Analysis

The comparison uses 4 metrics: Accuracy, Precision, Recall, and F1-score. The performance of each reproduced model is listed in Table 6 for both DL and ML approaches, where results in bold font mark the best model for each dataset. The results of the model comparison are broadly close to those reported in the respective papers. Comparing the models, the Random Forest-based model outperforms the others on the TON-IoT dataset.
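For reference, the four metrics can be computed from the counts of true and false positives and negatives. The sketch below assumes anomalies are the positive class (the paper does not state which class is treated as positive) and, as our own convention, returns 0.0 whenever a denominator is zero; the function name and the counts in the example are hypothetical.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Accuracy, precision, recall and F1-score, with anomalies as the positive class."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0  # share of alarms that are real anomalies
    recall = tp / (tp + fn) if tp + fn else 0.0     # share of anomalies that are detected
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)           # harmonic mean of precision and recall
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical confusion counts for a test set of 100 samples.
print(classification_metrics(tp=8, fp=2, tn=88, fn=2))
```

Because accuracy counts true negatives while precision and recall do not, a model can combine high accuracy with very low precision on an imbalanced dataset, as in some rows of Table 6.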

4.3 Random Forest-based Model Improvement

Comparing our reproduced Random Forest-based model with existing Random Forest models trained on the TON-IoT dataset (summarized in Table 7) shows that there is still room for improvement. One feasible way to achieve it is to reduce the dataset’s dimensionality by working on its features with a feature selection approach.


Table 6. Reproduced experimental results for ML/DL approaches.

| Model | Dataset | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| SVM | NSL-KDD | 92.70% | 91.99% | 95.14% | 93.54% |
| SVM | TON-IoT | 65.3060% | 100% | 65.3049% | 79.0114% |
| SVM | UNSW-NB15 | 55.8329% | 2.3910% | 66.6667% | 4.6164% |
| SVM | DS2OS | 97.1700% | 100% | 97.1700% | 98.5647% |
| SVM | IoT-23 | 93.9236% | 0.0010% | 100% | 0.0020% |
| Random Forest | NSL-KDD | 98.3774% | 99.7415% | 99.3745% | 99.5577% |
| Random Forest | TON-IoT | 99.9972% | 100% | 99.9983% | 99.9992% |
| Random Forest | UNSW-NB15 | 92.7240% | 97.8987% | 97.0749% | 97.4851% |
| Random Forest | DS2OS | 99.9720% | 99.9986% | 100% | 99.9993% |
| Random Forest | IoT-23 | 99.9967% | 99.9990% | 99.9960% | 99.9975% |
| KNN | NSL-KDD | 98.7299% | 98.4525% | 99.1834% | 98.8166% |
| KNN | TON-IoT | 99.8449% | 99.8583% | 99.9033% | 99.8808% |
| KNN | UNSW-NB15 | 82.9720% | 84.1616% | 79.3322% | 81.6756% |
| KNN | DS2OS | 99.2541% | 99.6132% | 99.6189% | 99.6161% |
| KNN | IoT-23 | 93.8829% | 41.5585% | 49.0994% | 45.0154% |
| Linear Regression | NSL-KDD | 97.0034% | 96.8000% | 97.4310% | 97.1145% |
| Linear Regression | TON-IoT | 67.0292% | 98.9666% | 66.5971% | 79.6176% |
| Linear Regression | UNSW-NB15 | 55.6628% | 2.2349% | 66.5323% | 4.3245% |
| Linear Regression | DS2OS | 97.5178% | 99.8231% | 97.6727% | 98.7362% |
| Linear Regression | IoT-23 | 94.0720% | 2.6400% | 100% | 5.1442% |
| B-GRU-RNN | NSL-KDD | 99.4840% | 99.5228% | 99.5228% | 99.5228% |
| B-GRU-RNN | TON-IoT | 99.8666% | 99.9315% | 99.8630% | 99.8972% |
| B-GRU-RNN | UNSW-NB15 | 95.7916% | 95.0571% | 95.7057% | 95.3803% |
| B-GRU-RNN | DS2OS | 99.9972% | 99.9971% | 100% | 99.9986% |
| B-GRU-RNN | IoT-23 | 99.9155% | 98.6767% | 99.9350% | 99.3019% |
| CNN1D | NSL-KDD | 98.0353% | 97.4141% | 98.9122% | 98.1575% |
| CNN1D | TON-IoT | 65.0132% | 100% | 65.0090% | 78.7945% |
| CNN1D | UNSW-NB15 | 44.6955% | 100% | 44.6955% | 61.7787% |
| CNN1D | DS2OS | 98.4300% | 100% | 98.4108% | 99.1990% |
| CNN1D | IoT-23 | 6.0389% | 100% | 6.0389% | 11.3900% |

Table 7. Efficiency of multiple RF models on TON-IoT.

| Source | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| Our results | 99.9972% | 100% | 99.9983% | 99.9992% |
| IoT and IIoT Networks [9] | 96.30% | 96.36% | 98.01% | 97.18% |
| Network TON-IoT datasets [21] | 99.9998% | n/a | n/a | n/a |
| TON-IoT Telemetry Dataset [7] | 85% | 87% | 85% | 85% |
| ToN-IoT: Intrusion Data Sets [8] | 98.075% | n/a | n/a | 97.264% |


Feature Selection: The overall efficiency of the model can be improved by training it only on the best features of the dataset. To do so, we used SelectKBest from the scikit-learn library. Once the features are ranked from most to least important, we train the model progressively over several cycles, adding one more feature in ranked order in each cycle and computing the model's accuracy. If the current accuracy is less than or equal to that of the previous cycle, we discard the current model and return the previous one. Otherwise, we add the next best feature (if any remains) and proceed with a new training cycle. This technique is based on the algorithm used in [16].
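The selection loop described above can be sketched independently of any particular model. In this hedged sketch, `forward_select` and the `score` callback are hypothetical names: `ranked` is the feature list ordered by importance (in scikit-learn this ranking could come from the `scores_` attribute of a fitted `SelectKBest`; the paper does not say which scoring function was used, the default being `f_classif`), and `score` is assumed to train the model on the given features and return its test accuracy.

```python
from typing import Callable, Sequence


def forward_select(ranked: Sequence[str],
                   score: Callable[[list], float]) -> tuple:
    """Greedy forward selection over a pre-ranked feature list.

    Features are added one at a time in rank order; as soon as the score
    does not strictly improve, the last candidate is discarded and the
    previous subset is returned together with its score.
    """
    if not ranked:
        return [], float("-inf")
    best = [ranked[0]]
    best_score = score(best)
    for feature in ranked[1:]:
        candidate = best + [feature]
        candidate_score = score(candidate)
        if candidate_score <= best_score:
            break  # no improvement: keep the previous model
        best, best_score = candidate, candidate_score
    return best, best_score


# Hypothetical accuracies indexed by the number of features used.
fake_accuracy = {1: 0.90, 2: 0.95, 3: 0.97, 4: 0.97, 5: 0.99}
features, acc = forward_select(["f1", "f2", "f3", "f4", "f5"],
                               lambda fs: fake_accuracy[len(fs)])
print(features, acc)
```

With these made-up scores the loop stops at the first plateau (from three to four features) and returns the first three features, even though five features would score higher. This matches the stopping rule described in the text, which trades exhaustiveness for far fewer training runs.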

Obtained Results: Table 8 summarizes the results obtained when removing 31, 33, and 34 features, respectively. With the Random Forest-based model on the TON-IoT dataset, accuracy increases as features are removed, and with the 9 features retained by SelectKBest, recall and F1-score also reach 100% while precision remains at 100%. This suggests that many features of the TON-IoT dataset are not relevant for anomaly detection.

Table 8. Results of feature selection applied to the RF model with the TON-IoT dataset.

| Modification | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|
| No modification | 99.9972% | 100% | 99.9983% | 99.9992% |
| Removing 31 features (12 left) | 99.9976% | 100% | 99.9983% | 99.9992% |
| Removing 33 features (10 left) | 99.9978% | 100% | 99.9983% | 99.9992% |
| SelectKBest (9 features left) | 99.9985% | 100% | 100% | 100% |

Furthermore, we tried to reproduce this improvement with Random Forest on other datasets, and with other algorithms.

Table 9 summarizes the following experimental results:

– Random Forest (RF) with 2 datasets, IoT-23 and DS2OS: in both cases, feature selection did not help, since the quality of the obtained models decreased when we applied SelectKBest.

– KNN with TON-IoT: accuracy, precision, recall, and F1-score increase when the irrelevant features are removed from the TON-IoT dataset.

– DL approach with the DS2OS dataset: applying the same process to the B-GRU-RNN algorithm on DS2OS also degraded the model, as accuracy, recall, and F1-score decreased, although precision rose to 100%.

Table 9. Results of the feature selection experiment on other models and datasets.

| Model and dataset | Modification | Accuracy | Precision | Recall | F1-score |
|---|---|---|---|---|---|
| Random Forest, IoT-23 | No modification | 99.9967% | 99.9990% | 99.9960% | 99.9975% |
| Random Forest, IoT-23 | SelectKBest: 7 features left | 98.2123% | 98.6970% | 99.8292% | 99.2599% |
| Random Forest, DS2OS | No modification | 99.9720% | 99.9986% | 100% | 99.9993% |
| Random Forest, DS2OS | SelectKBest: 6 features left | 93.3768% | 99.9756% | 99.8408% | 99.9082% |
| KNN, TON-IoT | No modification | 99.8449% | 99.8583% | 99.9033% | 99.8808% |
| KNN, TON-IoT | SelectKBest: 6 features left | 99.9902% | 99.9500% | 99.9084% | 99.9292% |
| B-GRU-RNN, DS2OS | No modification | 99.4371% | 99.9770% | 99.4472% | 99.7114% |
| B-GRU-RNN, DS2OS | SelectKBest: 6 features left | 97.2664% | 100% | 97.2664% | 98.6143% |

5 Conclusion and Future Work

In this paper, we studied ML/DL approaches for anomaly detection in IoT data. We tested different algorithms on diverse IoT datasets to identify the best model for anomaly detection in IoT, and we were able to improve the efficiency of the best model with a feature selection approach. However, we noticed that the performance of ML/DL models often depends on the use case at hand and on the quality of the dataset used, in terms of size and diversity of the anomalies represented. Data preprocessing is also crucial for the training process. We need to go beyond the empirical approach followed in this paper and formally prepare data for training ML/DL models in a generic way. Furthermore, most research work has been dedicated to point anomaly detection, which leaves open a large perspective for studying collective and contextual anomaly detection with ML/DL approaches.

References

1. Machine learning ddos detection for consumer internet of things devices. pp. 29–35.

Institute of Electrical and Electronics Engineers Inc. (2018). https://doi.org/

10.1109/SPW.2018.00013

2. Knn-based approximate outlier detection algorithm over iot streaming data. IEEE

Access 8, 42,749–42,759 (2020). https://doi.org/10.1109/ACCESS.2020.2977114

3. Abdullahi, M., Baashar, Y., Alhussian, H., Alwadain, A., Aziz, N., Capretz, L.F.,

Abdulkadir, S.J.: Detecting cybersecurity attacks in internet of things using artiﬁ-

cial intelligence methods: A systematic literature review. Electronics 11(2) (2022).

https://doi.org/10.3390/electronics11020198

4. Aggarwal, P., Sharma, S.K.: Analysis of kdd dataset attributes - class wise for

intrusion detection. pp. 842–851. Elsevier (2015). https://doi.org/10.1016/j.

procs.2015.07.490

5. Al-amri, R., Murugesan, R.K., Man, M., Abdulateef, A.F., Al-Sharaﬁ, M.A.,

Alkahtani, A.A.: A review of machine learning and deep learning techniques

for anomaly detection in iot data. Applied Sciences 11(12) (2021). https:

//doi.org/10.3390/app11125320

6. Alguliyev, R.M., Aliguliyev, R.M., Imamverdiyev, Y.N., Sukhostat, L.V.: An anomaly detection based on optimization. International Journal of Intelligent Systems and Applications 9, 87–96 (2017). https://doi.org/10.5815/ijisa.2017.12.08

7. Alsaedi, A., Moustafa, N., Tari, Z., Mahmood, A., Anwar, A.: TON_IoT telemetry dataset: A new generation dataset of IoT and IIoT for data-driven intrusion detection systems. IEEE Access 8, 165,130–165,150 (2020). https://doi.org/10.1109/ACCESS.2020.3022862

8. Booij, T.M., Chiscop, I., Meeuwissen, E., Moustafa, N., Hartog, F.T.H.d.: ToN_IoT: The role of heterogeneity and the need for standardization of features and attack types in IoT network intrusion data sets. IEEE Internet of Things Journal 9(1), 485–496 (2022). https://doi.org/10.1109/JIOT.2021.3085194

9. Da Silva Oliveira, G.A., Lima, P.S.S., Kon, F., Terada, R., Batista, D.M., Hirata, R., Hamdan, M.: A stacked ensemble classifier for an intrusion detection system in the edge of IoT and IIoT networks. In: 2022 IEEE Latin-American Conference on Communications (LATINCOM), pp. 1–6 (2022). https://doi.org/10.1109/LATINCOM56090.2022.10000559

10. DeMedeiros, K., Hendawi, A., Alvarez, M.: A survey of AI-based anomaly detection in IoT and sensor networks. Sensors 23(3) (2023). https://doi.org/10.3390/s23031352

11. Diro, A., Chilamkurti, N., Nguyen, V.D., Heyne, W.: A comprehensive study of anomaly detection schemes in IoT networks using machine learning algorithms. Sensors 21(24) (2021). https://doi.org/10.3390/s21248320

12. Diro, A.A., Chilamkurti, N.: Distributed attack detection scheme using deep learning approach for Internet of Things. Future Generation Computer Systems 82, 761–768 (2018). https://doi.org/10.1016/j.future.2017.08.043

13. Fahim, M., Sillitti, A.: Anomaly detection, analysis and prediction techniques in IoT environment: A systematic literature review. IEEE Access 7, 81,664–81,681 (2019). https://doi.org/10.1109/ACCESS.2019.2921912

14. Galvao, Y.M., Albuquerque, V.A., Fernandes, B.J.T., Valenca, M.J.S.: Anomaly detection in smart houses: Monitoring elderly daily behavior for fall detecting. pp. 1–6. IEEE (2017). https://doi.org/10.1109/LA-CCI.2017.8285701

15. Garcia, S., Parmisano, A., Erquiaga, M.J. (2020). https://www.stratosphereips.org/datasets-iot23

16. Haidar, N., Tamani, N., Nienaber, F., Wesseling, M.T., Bouju, A., Ghamri-Doudane, Y.: Data collection period and sensor selection method for smart building occupancy prediction. In: 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), pp. 1–6 (2019). https://doi.org/10.1109/VTCSpring.2019.8746447

17. Haji, S.H., Ameen, S.Y.: Attack and anomaly detection in IoT networks using machine learning techniques: A review. Asian Journal of Research in Computer Science (2021)

18. Hasan, M., Islam, M.M., Zarif, M.I.I., Hashem, M.: Attack and anomaly detection in IoT sensors in IoT sites using machine learning approaches (2019). https://doi.org/10.1016/j.iot.2019.10

19. Hindy, H., Tachtatzis, C., Atkinson, R., Bayne, E., Bellekens, X. (2020). https://ieee-dataport.org/open-access/mqtt-iot-ids2020-mqtt-internet-things-intrusion-detection-dataset

20. Merchant, N.: IoT technologies explained: History, examples, risks & future. https://www.visionofhumanity.org/what-is-the-internet-of-things/

21. Moustafa, N.: A new distributed architecture for evaluating AI-based security systems at the edge: Network TON_IoT datasets. Sustainable Cities and Society 72, 102,994 (2021). https://doi.org/10.1016/j.scs.2021.102994

22. Pahl, M.O., Aubet, F.X.: All eyes on you: Distributed multi-dimensional IoT microservice anomaly detection. In: 2018 14th International Conference on Network and Service Management (CNSM), pp. 72–80 (2018)

23. Pajouh, H.H., Dastghaibyfard, G., Hashemi, S.: Two-tier network anomaly detection model: a machine learning approach. Journal of Intelligent Information Systems 48(1), 61–74 (2017). https://doi.org/10.1007/s10844-015-0388-x

24. Pajouh, H.H., Javidan, R., Khayami, R., Dehghantanha, A., Choo, K.K.R.: A two-layer dimension reduction and two-tier classification model for anomaly-based intrusion detection in IoT backbone networks. IEEE Transactions on Emerging Topics in Computing 7(2), 314–323 (2019). https://doi.org/10.1109/TETC.2016.2633228

25. Quinlan, R.: Statlog (Australian credit approval). https://doi.org/10.24432/c59012

26. Savic, M., Lukic, M., Danilovic, D., Bodroski, Z., Bajovic, D., Mezei, I., Vukobratovic, D., Skrbic, S., Jakovetic, D.: Deep learning anomaly detection for cellular IoT with applications in smart logistics. IEEE Access 9, 59,406–59,419 (2021). https://doi.org/10.1109/ACCESS.2021.3072916

27. Sharma, B., Sharma, L., Lal, C.: Anomaly detection techniques using deep learning in IoT: A survey. In: 2019 International Conference on Computational Intelligence and Knowledge Economy (ICCIKE), pp. 146–149 (2019). https://doi.org/10.1109/ICCIKE47802.2019.9004362

28. Shaver, A., Liu, Z., Thapa, N., Roy, K., Gokaraju, B., Yuan, X.: Anomaly based intrusion detection for IoT with machine learning. Institute of Electrical and Electronics Engineers Inc. (2020). https://doi.org/10.1109/AIPR50011.2020.9425199

29. Souza, A.M., Amazonas, J.R.: An outlier detect algorithm using big data processing and Internet of Things architecture. pp. 1010–1015. Elsevier B.V. (2015). https://doi.org/10.1016/j.procs.2015.05.095

30. Steinbrunn, A., Pfisterer, W., Detrano, M., Janosi, R.: Heart disease (1988). https://doi.org/10.24432/c52p4x

31. Tavallaee, M., Bagheri, E., Lu, W., Ghorbani, A.A.: A detailed analysis of the KDD CUP 99 data set. In: 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, pp. 1–6 (2009). https://doi.org/10.1109/CISDA.2009.5356528

32. Thamilarasu, G., Chawla, S.: Towards deep-learning-driven intrusion detection for the Internet of Things. Sensors (Switzerland) 19 (2019). https://doi.org/10.3390/s19091977

33. Ullah, I., Mahmoud, Q.H.: Design and development of a deep learning-based model for anomaly detection in IoT networks. IEEE Access 9, 103,906–103,926 (2021). https://doi.org/10.1109/ACCESS.2021.3094024

34. Zhang, C., Song, D., Chen, Y., Feng, X., Lumezanu, C., Cheng, W., Ni, J., Zong, B., Chen, H., Chawla, N.V.: A deep neural network for unsupervised anomaly detection and diagnosis in multivariate time series data (2018)
