Conference PaperPDF Available

Machine learning and datamining methods for hybrid IoT intrusion detection

November 2020

November 2020

DOI:10.1109/CloudTech49835.2020.9365895

Conference: 2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech)

Authors:

Abdellatif el Ghazi

Université Internationale de Rabat

Rachid Ait Moulay

Université Mohamed 5

: Framework Architecture This will allow MapReduce to use each core to calculate summations separately and combine the results to calculate the final covariance matrix  [40].

…

: Architecture of Neural Network

…

Figures - uploaded by Rachid Ait Moulay

Content may be subject to copyright.

Content uploaded by Rachid Ait Moulay

Content may be subject to copyright.

Machine learning and datamining methods for

hybrid IoT intrusion detection

Abdellatif El Ghazi

TIC Lab, Information and

Communication Technology

Laboratory

International University in Rabat

Rabat, Morocco

abdellatif.elghazi@uir.ac.ma

Ait Moulay Rachid*

Algebra and Functional Analysis Group

Mohamed V University in Rabat*

Rabat, Morocco

rachid_aitmoualy@um5.ac.ma*

Abstract—By 2025 Internet of things will reach over 75 billion

devices which would exceed number of humans about 8.1

billion. These devices need to be secured from many threats by

implementing secure and interoperable solutions in order to

guarantee a proper functioning of the infrastructures and

systems using the IoT. This is why we proposed a hybrid

intrusion detection system installed on the cloud powering

another online and real time intrusion detection system on the

fog to monitor the communication and detect attacks before it

spreads over the network as in the case of Mirai botnet. We

will provide details of the different algorithms used to

implement this distributed system so as to detect attacks

against IoT devices.

Keywords—IDS, Cloud, Fog Computing, Machine learning,

Datamining, Honeypot.

I. INTRODUCTION

The use of IoT in different sectors such as (health,

Transport, supply management and logistics, smart buildings

and homes), also in personal utilities and wearables, they

became omnipresent and widespread in many infrastructures

and organizations thanks to smart watch, smart TVs, sensors,

actuators ...

The internet of things is a new paradigm that connects the

physical world (houses, buildings, factories), this new

technology can be connected to the internet using sensors to

obtain measurements of (temperature, pressure, pollution rate

in the area, light, vibration), it may also determine the road

condition and help indentifying people in a house or a

building using RFID, and actuators that can control these

devices using the data collected from sensors.[1,2,3]

To benefit of the full functionalities offered by connected

devices that exchange and exploit a vast amount of data

with proprietary platforms, which requires the

implementation of solutions, capable of securing a

heterogeneous network where each sensor or actuator can

implement its own protocol especially in recent years, where

Internet of things have become the Achilles heels of

companies and organizations targeted by malware and

viruses that spread using IoT devices because of the lack of

standard and insecure protocols. [1,4,31]

We adopted the integration of IoT devices with big data

solutions and machine learning algorithms, to analyze and

process the data collected by sensors in different fields

(smart homes, smart streets…) also in different types of

networks (Zigbee, Bluetooth, WIFI) to improve the security

and detect malicious devices in an environment that grows

and increases in number every day. [5]

By 2025 the number of sensors and actuators can reach 100

billion of devices with a revenue of 3 trillion dollars [6,7,46],

which sometimes use web-based protocols called web of

things giving the possibility to send data to the cloud servers

or even communicate with social networks [8,3].

This rich and uncontrollable environment gives hackers a

way to disrupt organizations by using these IoTs in their

future attacks, such as DDOS or to make money with them

by what we call crypto-mining.

To create an IDS we distinguish 3 types of analytics and

training applied:

Anomaly intrusion detector [9,11,14,15]: this system is

based on a model that is trained with normal network traffic

in order to be able to detect abnormal behaviour and evaluate

traffic that is not included in this category (port scan,

increase bandwidth usage). It can also disseminate zero day

attacks, i.e. never encountered and never known in the

training phase, which makes it difficult for attackers to

access a network without being detected since this type of

system is configured and trained for a network and

environment at its normal state. Furthermore, it can be used

to generate signatures for misuse intrusion detection systems.

Misuse intrusion detector [9,10,11,12,13] : is a system

based only on attack signatures, therefore, it cannot detect

zero day attacks and it requires recurrent updates of its

knowledge databases since we must first know how the

attacks are carried out against a network of IoT devices in

order to protect it. One of its abilities is that it is not

generating a high rate of false positives and negatives like

anomaly based detectors.

Hybrid systems [9]: is a system that combines both an

anomaly detector and an abuse detector, in order to create a

system that takes advantage of the signatures generated by an

anomaly IDS to perform the basic knowledge database

update of a misuse detector system to strengthen network

security.

However, there are two major categories of intrusion

detection systems based either on host or network monitoring

• Network intrusion detection system (NIDS) : it is

installed on the network extremum to detect outside attackers

by monitoring and analyzing intercepted packets to spot

attack signatures and abnormal behaviour (misuse detection,

behaviour analysis) like we stated before. [9,11,32]

• Host Intrusion detection system (HIDS): it is installed

on the machine or the host we intend to monitor, its principal

purpose is to identify insider threats and detect abnormal

actions like suspicious files or modification of logs, unknown

system calls, adding or deleting a user… [9,11,32]

To protect businesses and organizations using IoT, one of

the best solutions is integrating an hybrid IDS based on

behaviour and misuse detection system against different

types of attacks like Dos, MITM, unauthorized access and

control from the Internet, private data access, privilege

escalation, malware infection ... [4,10,25,26] using a

mechanism based on artificial intelligence, statistics, data

mining [23] and data collection to minimize false positives

and false negatives for unknown or new attacks (zero day)

and maximize the detection rate of real intrusions.

The generation of massive data by IoT devices will

require using and applying one or more artificial intelligence

algorithms to analyze data by coupling it with big data tools

such as MapReduce and Hadoop for parallel processing and

distributed storage. This article will consist of 4 sections:

 Related works.

 Theoretical Hybrid IDS Framework.

 Metrics and evaluation.

 Conclusion and implementation challenges.

II. RELATED WORKS

To secure IoT devices a framework named EXPOSURE has

been proposed to classify domain names into benign or

malicious (Botnet, virus link, spam or phishing link), it uses

a J48 decision tree which is an implementation of the C4.5

decision tree. The framework is composed of 5 modules the

first one is a data collector that looks for malicious and

benign domain names using different sources from the

internet , a second module that works as a local data collector

for monitoring the network where EXPOSURE is installed, a

feature analyser that uses both the precedent modules, so that

to label domain names accordingly into malicious or benign,

then the result of this feature analyser is used by a machine

learner module and a classifier module. This framework

which was updated every day to keep up with threats that can

damage a network it can detect new malicious domain names

that weren‟t in the training set with an accuracy of 98.5%

and 0.9% of false positive rate. [20,21,22].

A misuse intrusion detection system was proposed using ID3

algorithm coupled with unsupervised clustering algorithm

that process the rules used by snort IDS [19] to feed them to

ID3 decision tree in order to facilitate and optimize the

classification. This combination of techniques between

supervised and unsupervised algorithms have given better

results than the naïve processing used by snort which

compares an input with the installed signatures. The old

technique might become slower if the signatures in the

database are very large.

The framework achieved a maximum speed increase of

105% and an average speed of 40.3% and a minimum of 5%

[18].

Another IDS implemented a naïve Bayes algorithm which is

known to be fast and intuitive [17]. It was tested on three

types of attacks like Dos, Scan, Unauthorized Access with an

accuracy rate respectively of 99% 96% 90%. This IDS

showed better results in term of speed and accuracy but with

too many false positives compared to a Neural Network IDS

[16].

A lightweight IDS for edge devices using SVM was

proposed to detect only DOS attacks, they used the

transmission rate of the packet field to train their model

because they remarked this attribute was increasing or

decreasing depending of the attack's type, or the execution

stage (exfiltration of data, malware update). ,

From this attribute they derived three features composed of

mean, maximum, median to avoid under-fitting.

They also created multiple features sets to test the

performance of the lightweight SVM with three different

kernels: Linear kernel, Polynomials, radial basis function.

They conducted multiple experiments to choose better

parameters for the SVM based IDS, and they have found out

that the linear lightweight SVM is much better than other

two kernels. Then they compared the performance of this

IDS using accuracy and CPU time with other lightweight

algorithms like a Genetical based SVM [33] and A-IDS [34]

and wfs-IDS [35]. their IDS outperformed other three

algorithms in accuracy and CPU time which confirmed its

lightweight property because the CPU time is less than the

mentioned three algorithms thus it will not consume more

energy and resources for an accuracy of 98.3%. The problem

with their IDS is that it has not been used on other attacks

like remote to local or unauthorized access on the edge node

which limited their IDS to detect only DOS, so it can‟t be

generalized on other attack types. [36]

To protect fog nodes and enhance IoT security an adaptive

IDS using Artificial Neural Network was created, capable of

measuring threats and self-protecting against attacks by

closing connection or asking for authentication... depending

on the threat level. They used a risk management unit at the

end of the output of their model to evaluate the risk of the

abnormal behaviour into different levels between 0 (no

attack) or secure state to level 3 which is equivalent to a fog

node being at destabilized state that can cripple the fog node

functionalities. This risk management unit utilizes the output

calculated by the Artificial Neural Network model to

evaluate the threat activity by verifying the interval of output

τ. This unit has the ability to monitor logs to check the

authenticity and the periodicity of the actions that has been

raised to the risk management unit to measure its threat

levels. They trained three models depending on the resources

they are trying to monitor, these resources include Memory

availability, buffer consumption and CPU usage.

These models showed an error near to zero when they were

compared to their real value.

The architecture of ANN is composed of 10 neurons in

hidden layer, an activation function using sigmoid symmetric

function, and 2 delays unit and one linear output function.

For the algorithm optimization they used levenberg

marquardt backpropagation algorithm that showed its

capability to efficiently distinguish between normal and

abnormal activity. The Framework is able to protect against

DOS, flooding with accuracy that can reach 97% and

precision of 98.4% and recall of 98.9%, with little overhead

to the fog node which can be categorized as lightweight

because it did not stress the fog node resources. [37]

To maximize detection rate the authors [47] used DNN-

KNN algorithm operating on the fog, which implemented a

binary classification model, in the first step they used DNN

to classify event into malignant or non malignant, it is

composed of one input Layer and two hidden Layers, each

of them has the same number of neurons, in the hidden

Layer they used hyperbolic tangent as activation function,

for the output Layer they used two neurons, one neuron for

the malignant activity and the other one for the normal

behaviour, they used softmax activation function in the

output Layer. if one of the neurons does not achieve a

defined limit to conclude the classification of the activity as

normal or malicious, the suspicious activity will be sent to a

feature reduction module implementing Information Gain

algorithm which selects the best features for classification,

these features are redirected to the KNN algorithm to be

finally classified, the result from the k nearest neighbor is

considered as final. The DNN-KNN algorithm showed

an accuracy rate of 99.77% and recall rate of 99.76% for the

NSL-KDD dataset. For the CICDS2017 it also showed a

higher accuracy and recall rate attaining 99.85% and

99.87% compared to other implementation using the same

datasets.

Another technique to secure edge devices has been proposed,

it is composed of 3 modules, the first module is a snort IDS

that has the ability to identify and catch malicious devices,

this IDS informs a secure load balancer module about the

category and the identification of the edge device. This

secure load balancer uses a Markov model to confirm the

device category (compromised, normal) and calculate its

shifting probability, and then it uses another hidden Markov

model to decide if the edge device traffic should be diverted

to a third module which is a cloud honeypot that aims to

monitor and log all the traffic made by the suspected device.

The honeypot uses a two stage Markov model to flag an edge

device as secure if it has been classified incorrectly as

compromised by the precedent module (secure load

balancer). This framework can achieve an accuracy of 90%,

it has the ability to minimize the false positives by an online

honeypot monitoring diverted traffic of suspected edge

device to confirm if it has been misclassified, and also It

improves the IDS response by updating a database with

attacks detected within the network.[38]

III. THEORETICAL HYBRID IDS FRAMEWORK:

A.MOTIVATION

The goal of our hybrid cloud-based and distributed IDS

is to minimize false positives and maximize the detection of

zero-day attacks for the signature intrusion detection system

based on online Incremental SVM [29,39] installed on a fog

architecture [28], since it is not able to detect new attacks

(zero-day). This signature based IDS will collect signatures

from a honeypot [27] installed on the internet, and an

anomaly detector system based on Artificial Neural Network

[37] that will help enhance detection rate of the online SVM

IDS.

B.SYSTEM ARCHITECTURE

The system will be composed of 4 modules, a data

collector based on an intelligent honeypot installed on the

cloud to detect all known and also zero-day type attacks

against IoT devices like sensors, and cameras.

A second component for feature selection and reduction

tool that allows to reduce dimensionality and to minimize

inputs as well as features in order to make processing and

model creation faster and consume less memory, CPU by

using PCA coupled with MapReduce to maximize feature

reduction speed. A third module based on Artificial Neural

Network to detect anomalous behaviour and update a

database of the attacks used by the fourth module, which is

an online signature based IDS that uses a database of attacks

updated by both the online honeypot and the third module

(Artificial Neural Network) to help detect new zero day type

attacks as depicted in Fig. 1.

1) Data Collector Using Smart Honeypot

An intelligent honeypot [24,27] will be used to collect

and store attacks in the server. Then process them in real

time in the cloud to automatically update the misuse detector

in the network where edge and IoT devices are installed in

order to predict new zero day attacks that are not able to

disseminate. This system which is inspired from a practical

implementation [24] will be composed of a hybrid honeypot

capable of extending the time of attackers connections and

sessions, so that it goes to the next step to recover the bits of

code used for the persistence of attackers and the exploitation

of IoT devices to be used for future attacks.

In order to implement this system, we will use machine

learning algorithms like the hidden Markov as well as deep

learning and reinforcement learning algorithms like Q-

learning.

This honeypot will be initially naive but gradually will

make updates to its internal knowledge database by

searching on the internet for answers wanted and desired by

attackers using platforms like shodan, censys.io, zoomeye

and also masscan to extend their sessions. These responses

collected from the Internet will then be stored in the database

to be selected afterwards by learning algorithms, to increase

further the session time and retrieve the exploitation code

(payload).

Attacks and payloads sent by attackers will be stored for

future processing in order to send them to signatures

database, so that the malicious detector updates his model for

detecting new zero day attacks (unknown).

However, in order to make this process fast for the

incremental learning algorithm we will use big data tools like

MapReduce and also apply algorithms for feature reduction

like PCA to update their knowledge base of their signature.

The tools used to collect the data will be tcpdump and

Wireshark. [30]

2) Feature Selection and processing Tool

To make the online SVM IDS work faster we need to use

principal Component Analysis (PCA) [42] for feature

reduction, it‟s principally used for data optimization and

compression. We need also to use MapReduce [41] to

process the constant flow and large quantity of data captured

by the honeypot and the Artificial Neural Network IDS. We

think this combination of PCA and MapReduce will allow

the online IDS to be more adaptive and respond swiftly to

new attacks without too much delay.

However, to achieve the precedent goal we will use a

multicore machine that will calculate every summation

expressions within eigenvectors of the covariance matrix 

used by the PCA :

 = 1

m ( xixiT

i=1 ) – ((1

m∗ xi

i=1 )*(1

m xiT

i=1 ))

(1)

Figure 1 : Framework Architecture

This will allow MapReduce to use each core to calculate

summations separately and combine the results to calculate

the final covariance matrix  [40].

3) Artificial Neural Network

To train the Artificial Neural Network (ANN), to help us

discover anomalous behaviour we will use a testbed

composed of :

 Several smart bulbs.

 Smart TV.

 Temperature sensor.

We chose Multilayer Perceptron (MLP) to train our

model from normal and attack data captured by Wireshark.

The MLP is composed of one input Layer, one Hidden

Layer, and one output Layer as shown in Fig. 2. For the

hidden layer we will have 7 neurons, and the function

applied for activation is sigmoid function. In the input and

hidden layer we will have one bias unit noted as “b”. The

MLP will use feed-forward algorithm to calculate signal

value “a” of each neuron connection from the input layer to

hidden layer using weight 𝑤𝑖of an input 𝑥𝑖with “n” as the

total number of inputs, and then calculate the error in the

output layer by comparing the final signal with the expected

result. To calculate this output signal of a neuron we apply

this equation :

a = f( wixi

i=0 + b).

After calculation of the error, MLP uses backpropagation

algorithm to forward back the error to each layer to

recalculate and adjust the weights and bias to minimize the

error produced in the feed-forward step.

However, after we train our model offline we will proceed on

the online phase where we will install the MLP based IDS on

a cloud server to validate traffic classified by the online

SVM IDS as normal to make sure no new attacks have

passed undetected. Otherwise, IDS behaviour will update a

signature database after being processed by PCA. This

Figure 2 : Architecture of Neural Network

This technique will help strengthen the misuse IDS and

increase its zero-day attack detection rate.[43,44]

4) Online Incremental SVM :

To protect a network of IoT devices we will rely on an

online Incremental SVM on a fog architecture installed on

Raspberry pi. This misuse IDS will use a database that

contains attack signatures coming from an intelligent

honeypot and behavioural IDS processed by PCA to reduce

data complexity for the SVM to perform better [40]. The

SVM is a statistical supervised learning algorithm capable of

doing binary classification, it uses a decision hyperplane line

to maximize distance separation between two classes. For

higher dimensional input data x1,x2,x3,x4……..xn a kernel

function is applied K and bias b with coefficient 𝛼0,𝑖as

follows :

f(x) = sign( α0,iK(x, xi)

i=1 + b)

multiple kernel functions can be used like linear, RBF

and polynomial kernel, but for this IDS we chose linear

kernel function [45] :

K(x,𝑥𝑖) = 𝑥𝑇∗ 𝑥𝑖

To secure IoT internal network we will first use data

𝑃𝑅0captured by the honeypot, then we will apply PCA to

reduce data dimensionality in order to create our initial

model. This misuse IDS will permanently update its attack

signatures by using an online database. To implement the

online training we will use the initial vectors 𝑆𝑉

calculated in first SVM model and add it to 𝑃𝑅1 to get

another support vectors 𝑆𝑉

1,we will follow this procedure

recursively for every sample of data untill 𝑃𝑅𝑛 in signature

database using 𝑆𝑉

𝑛−1vectors [45].The algorithm used in the

online training is as follows :

𝑆𝑉

𝑖 = 𝑃𝑅𝑖+𝑆𝑉

𝑖−1

IV. METRICS AND EVALUATION :

To establish a classification method we need an approach

to measure the performance and relevance of a classification

model.

To verify this system we need the following information:

(5)

(4)

(3)

(2)

• True positive: a positive represents a sample which

was malicious and it has been well classified as malicious by

the machine learning algorithm (30: see “Table I”).

• True negative: represents a sample which was

correctly classified as Begnin (820: see “Table I”,).

• False positive: are the data that was incorrectly

classified as malicious even if they are Begnin (20: see

“Table I”).

• False negative: represents data that was incorrectly

classified as Benign while they are malicious. (30:see “Table

I”).

• Accuracy : (TP + TN) / (TP + TN + FP + FN) =

94,44%.

In this example we found that the rate is high (94,44%),

but does not reflect the quality of the model especially that

among 60 malicious data 30 were well classified which

represent 50%. The accuracy rate is a somewhat naive

measure, it only gives a global vision of the model, but it is

not particularly relevant for unbalanced data. This is why we

will use the following metrics:

• Precision: TP / TP + FP = 30/50 (60%) it means when

our model classifies a data as an attack and predicts it

correctly with a rate of 60%.

• Recall or TPR Sensitivity: TP / TP + FN = 30/60

(50%) from all the data that was classified as malicious it

represents the rate of what was definitely malicious.

• False positive rate (FPR): FP / FP + TN = 20/840

(2,38%) is the rate of elements that have been misclassified

among the normal data (True negative).

V. CONCLUSION AND IMPLEMENTATION

CHALLENGES

IoT devices are known for their limited resources such as

memory or processing time, which has forced us to use a fog

computing architecture to protect the network from attacks,

combined with cloud computing to leverage storage and

processing power to perform complex tasks such as reducing

large data features. The framework described in this article is

adaptive to new attacks especially because it gets updated

using live data captured from the internet, which gives it an

edge advantage over other types of IDS that are trained either

on data not intended for IoT devices or do not use data that

gives better results for new attacks. We tried to combine high

true positive of misuse IDS and increase detection rate by

getting better data quality captured directly from the internet

using an intelligent honeypot. We have to implement this

framework and compare its performance with other type of

IDS to make sure this misuse IDS can perform better in

detecting zero day attacks in challenging environment.

ACKNOWLEDGMENT

The authors would like to thank our colleagues in

mathematics and statistics laboratory at the university

Mohamed V who contributed with their insights and mastery

of the subject to assist this research

We would also like to show our gratitude to Shodan

scanning website for giving us access to their API that

helped in creating this paper.

REFERENCES

[1] M. Noura, M Atiquzaman, M Gaedke Interoperability in internet

of things: Taxonomies and open challenges.Mobile Networks and

Applications, 2019.

[2] E Al Nuaimi, H Al Neyadi, N Mohamed Applications of big data

to smart cities. Journal of Internet,2015.

[3] J Gubbi, R Buyya, S Marusic, M Palaniswami Internet of Things

(IoT): A vision, architectural elements, and future directions.

Future Generation Computer Systems, 2013.

[4] Zhongjin Liu, Le Zhang, Qiuying Ni, Juntai Chen, Ru Wang, Ye

Li, Yueying He An Integrated Architecture for IoT Malware

Analysis and Detection. International Conference on Internet of

Things as a Service, 2018.

[5] Mohsen Marjani, Fariza Nasaruddin, Abdullah Gani, Ahmad

Karim,Ibrahum Abaker,Targio Hashem,Aisha Siddiqa,Ibrar

Yaqoob Big IoT Data Analytics : Architecture, Opportunities,

and Open Research Challenges, 2017.

[6] Jayavardhana Gubbi, Rajkumar Buyya, Slaven

Marusic,Maritmuthu Palansiwami. Internet of things (IOT) : A

Vision, Architectural Elements, and Future Directions. Future

Generation Computer Systems, 2013.

[7] Summia Taj, Uniza Asad, Moeen Azhar,Sumaira Kausar.

Interoperability in IOT based smart home : A review. Review of

Computer Engineering StudiesVol.5, No.3, September, 2018, pp.

50-55.

[8] Luigi Atzori, Antonio Iera,Giacomo Morabito. Internet of things

a survey. Computer Networks Volume 54, Issue 15, 28 October

2010, Pages 2787-2805.

[9] Anna L. Buczak,Erhan guven A survey of Data Mining and

Machine Learning Methods for Cyber Security Intrusion

Detection. IEEE Communications surveys & tutorials, 2015.

[10] MY Su Real-time anomaly detection systems for Denial-of-

Service attacks by weighted k-nearest-neighbor classifiers Expert

Systems with Applications, 2011. 38(4) : p. 3492-3498

[11] https://resources.infosecinstitute.com/network-design-firewall-

idsips/

[12] C Livadas, R Walsh, DE Lapsley, WT Strayer Using Machine

Learning Techniques to Identify Botnet Traffic.Proceedings.

2006 31st IEEE Conference on Local Computer Networks.

[13] F Jemili, M Zaghdoud, MB Ahmed A Framework for an

Adaptive Intrusion Detection System using Bayesian

Network.Intelligence and Security Informatics, ISI, IEEE

International Conference, 2007.

[14] C Kruegel, D Mutz, W Robertson Bayesian event classification

for intrusion detection. 19th Annual Computer Security

Applications Conference, 2003.

[15] S Benferhat, T Kenaza A naive bayes approach for detecting

coordinated attacks. 2008 32nd Annual IEEE International

Computer Software and Applications Conference.

[16] M Panda, MR Patra Network intrusion detection using naive

bayes. IJCSNS International Journal of Computer Science and

Network Security, VOL.7 No.12, December 2007.

[17] NB Amor, S Benferhat, Z Elouedi Naive bayes vs decision trees

in intrusion detection systems, 2004.

[18] C Kruegel, T Toth Using decision trees to improve signature-

based intrusion detection. International Workshop on Recent

Advances in Intrusion Detection, 2003.

Classification(Malicious)

Classification(Benign)

Real(

Malicious

)

30 (True positive)

30 (False negative)

Real

(Benign)

20 (False positive)

820 (True negative)

Table I. Confusion Matrix

[19] http://manual-snort-org.s3-website-us-east-1.amazonaws.com/

[20] http://cedric.cnam.fr/vertigo/Cours/ml2/coursArbresDecision.htm

[21] L Bilge, E Kirda, C Kruegel, M Balduzzi, EXPOSURE : Finding

Malicious Domains Using Passive DNS Analysis, 2011.

[22] L Bilge, S Sen, D Balzarotti, E Kirda, Exposure: A passive dns

analysis service to detect and report malicious domains. ACM

Transactions on Information and System Security April 2014.

[23] GR Hendry, SJ Yang, Intrusion signature creation via clustering

anomalies. Proceedings Volume 6973, Data Mining, Intrusion

Detection, Information Assurance, and Data Networks Security

2008.

[24] Tongbo Luo, Zhaoyan Xu, Towards an Intelligent-Interaction

Honeypot for IoT Devices : IoTCandyJar. Black Hat, 2017.

[25] https://blog.avast.com/mqtt-vulnerabilities-hacking-smart-homes

[26] https://blog.shodan.io/security-researchers-find-vulnerable-IoT-

devices-and-mongodb-databases-exposing-corporate-data/

[27] ML Bringer, CA Chelmecki, H Fujinoki A survey: Recent

advances and future trends in honeypot research. I.J.Computer

Network and Information Security, 2012,10, 63-75.

[28] Hazzaa Alshareef ; Marwah Almasri ; Abdulaziz Albesher ; Dan

Grigoras Towards an Effective Management of IoT by

Integrating Cloud and Fog Computing. IEEE International

Conference on Smart Internet of Things (SmartIoT), 2019.

[29] D Nallaperuma, R Nawaratne, Online incremental machine

learning platform for big data-driven smart traffic management.

IEEE Transactions on Intelligent Transportation Systems (

Volume: 20 , Issue: 12 , Dec. 2019 ).

[30] H.H Pajouh, R, Javidan, R, Khayami, D. Ali, and K. K. R. Choo,

« A Two-layer Dimension Reduction and Two-tier Classification

Model for Anomaly-Based Intrusion Detection in IoT backbone

networks » IEEE Transactions on emerging Topics in

Computing, vol. PP, no 99,pp. 1-1, Nov. 2016.

[31] D. Kushner, „„The real story of stuxnet,‟‟ IEEE Spectr., vol. 50,

no. 3, pp. 48–53, Mar. 2013

[32] Elike Hodo, Xavier Bellekens, Andrew Hamilton, Pierre-Louis

Dubouilh. Threat analysis of IoT networks using artificial neural

network intrusion detection system. International Symposium on

Networks, Computers and Communications (ISNCC), 2016

[33] Peiying Tao,Zhe Sun, and Zhixin Sun. An improved Intrusion

Detection Algorithm based on ga and svm. IEEE Access,

6:13624-13631,2018.

[34] Shadi Aljawarneh,Monther Aldwairi, and Muneer Bani

Yassein.Anomaly-based intrusion detection system through

feature selection analysis and building hybrid efficient model.

Journal of Computational Science, 25:152-160,2018.

[35] Yang Li,Jun Li Wang,Zhi-Hong Tian,Tian-Bo Lu, and chen

Young.Building lightweight intrusion detection system using

wrapper-based feature selection mechanisms Computers &

Security, 28(6): 466-475, 2009.

[36] SANA ULLAH JAN, SAEED AHMED, VLADIMIR

SHAKHOV,INSOO KOO. Towards a Lightweight Intrusion

Detection System for the Internet of Things.IEEE

Access,7:42450 - 42471,2019.

[37] JESUS PACHECO, VICTOR H . BENITEZ , LUIS C . FLIX-

HERRN, AND PRATIK SATAMArtificial Neural Networks-

Based Intrusion Detection System for Internet of Things Fog

Nodes. IEEE Access,8:73907 - 73918,2020.

[38] A. S. Sohal, R. Sandhu, S. K. Sood, and V. Chang. A

cybersecurity framework to identify malicious edge device in

fogcomputing and cloud- of-things environments.Computers &

Security. 2019

[39] http://www.jmlr.org/papers/v7/laskov06a.html

[40] Cheng-Tao Chu,Sang Kyun Kim,Yi-An Lin,YuanYuan Yu,Gary

Bradski,Andrew Y NG,Kunle Olukotun. Map-Reducefor

Machine Learning on Multicore.Advances in Neural Information

Processing Systems 19 (NIPS 2006)

[41] https://www.guru99.com/introduction-to-mapreduce.html

[42] Wei Wang, Roberto Battiti. Identifying Intrusions in Computer

Networkswith Principal Component Analysis First International

Conference on Availability, Reliability and Security (ARES'06),

2006.

[43] Elike Hodo, Xavier Bellekens, Andrew Hamilton, Pierre-Louis

Dubouilh, Ephraim Iorkyase, Christos Tachtatzis,Robert

Atkinson. Threat analysis of IoT networks Using Artificial

Neural Network Intrusion Detection System.International

Symposium on Networks, Computers and Communications

(ISNCC),2016

[44] Sanmeet Kaur, Maninder Singh.Hybrid intrusion detection and

signature generation using Deep Recurrent Neural Networks.

Springer Neural Computing and Applications, 2019.

[45] Nadeem Ahmed Sayed,Huan Liu,Kah Kay sung.Incremental

Learning with Support vector Machines, 1999.

[46] Xu, Q., Aung, K. M. M., Zhu, Y., & Yong, K. L.

A Blockchain-Based Storage System for Data Analytics in the

Internet of Things. Studies in Computational Intelligence,

(Springer) 119–138, 2017.

[47] Cristiano Antonio de Souzaa,Carlos Becker Westphall,Renato

Bobsin Machado,João Bosco Mangueira Sobral,Gustavo dos

Santos Vieira. Hybrid approach to intrusion detection in fog-

based IoT environments, Computer Networks,Elservier,2020.

ABDELLATIF EL GHAZI

He is currently a professor since 2012 at school of energy of UIR,

and member of TICLab. he coordinates two projects,

ERASMUS+ e-VAL and MarMooc.

His research interests include :

digital analysis and optimization, cloud computing, IT security,

IoT and artificial intelligence.

RACHID AIT MOULAY

Received his Master degree in software engineering from the University

Mohamed V of Rabat in 2010.

He served about 10 years as a software Engineer engineer in many companies.

He is currently pursuing Ph.D from the University Mohamed V of Rabat.

His research areas include :

Cyber Security,IoT,Machine Learning,Smart Homes.

A Systematic Review on Hybrid Intrusion Detection System

Article

Full-text available

May 2022

As computer networks keep growing at a high rate, achieving confidentiality, integrity, and availability of the information system is essential. Intrusion detection systems (IDSs) have been widely used to monitor and secure networks. The two major limitations facing existing intrusion detection systems are high rates of false-positive alerts and low detection rates on zero-day attacks. To overcome these problems, we need intrusion detection techniques that can learn and effectively detect intrusions. Hybrid methods based on machine learning techniques have been proposed by different researchers. These methods take advantage of the single detection methods and leverage their weakness. Therefore, this paper reviews 111 related studies in the period between 2012 and 2022 focusing on hybrid detection systems. The review points out the existing gaps in the development of hybrid intrusion detection systems and the need for further research in this area.

Intelligent IoT Security Monitoring based on Fuzzy Optimum-Path Forest Classifier

Preprint

Full-text available

Dec 2021

The detection of intrusions in IoT networks is essen- tial to maintain the availability and integrity of data transmitted and generated by devices connected to these networks. This is primarily when the data originates from critical activities, such as activities in the military, financial, industrial, and health sectors. Machine learning techniques have been adopted to create ways to detect or improve the accuracy of existing models for automatic intrusion detection. However, it is difficult to find in the literature an accurate intrusion detection technique in an IoT environment, as there are different types of attacks that can happen in different ways. Therefore, to solve this problem, this work proposes applying Fuzzy OPF (Optimum-Path Forest) as a new detection algorithm for any threat that escapes the regular traffic of an IoT network. We evaluate our proposed approach by using five different ML algorithms: Linear Discriminant Analysis, Support Vector Machine, Bayes, K-Nearest Neighbors, and Optimum-Path Forest. Experimental results analysis showed that our proposed model outperforms well-known algorithms in the literature regarding the Accuracy, Recall, and F1 metrics.

Empowering IoT Security by Classifying Threat Detection Through Machine Learning

Conference Paper

Apr 2024

Intelligent IoT security monitoring based on fuzzy optimum-path forest classifier

Article

Full-text available

Jul 2022
SOFT COMPUT

Detection of intrusions in Internet of Things networks is essential to maintain the availability and integrity of the data generated and transmitted by connected devices. Such a procedure is paramount when the data originate from critical activities, such as military, financial, industrial, and health sectors. In the last decades, machine learning (ML)-based approaches have become one of the most suitable and adopted procedures for the task, providing automatic, fast, and accurate results. Despite such success, the literature still presents a gap regarding valid applications of intrusion detection in the IoT environments, which usually stands for a challenging task composed of different types of attacks. In this context, this work applies a recent technique based on graphs and logic fuzzy, namely Fuzzy Optimum-Path Forest (Fuzzy OPF), to detect threats that escape an IoT network's regular traffic. We evaluate our model against five well-known ML algorithms, i.e., Linear Discriminant Analysis, Support Vector Machine, Naive Bayes, K-Nearest Neighbors, and the standard Optimum-Path Forest. Experimental results show that Fuzzy OPF outperforms the baselines considering accuracy, recall, and F1 metrics. As a result, the Fuzzy OPF proposal for intrusion detection had a hit rate of 98 and 99%.

Intrusion detection and prevention in fog based IoT environments: A systematic literature review

Article

Full-text available

Jul 2022
COMPUT NETW

Currently, the Internet of Things is spreading in all areas that apply computing resources. An important ally of the IoT is fog computing. It extends cloud computing and services to the edge of the network. Smart environments are becoming real and possible through IoT and fog computing. However, they are not free from security threats and vulnerabilities. This makes special security techniques indispensable. Security is one of the biggest challenges to ensuring an optimal IoT and Fog environment. Combined with the significant damage generated by application attacks, this fact creates the need to focus efforts in this area. This need can be proven through existing reviews of the state-of-the-art that pointed out several open aspects that need greater research effort. In this way, this article presents a Systematic Literature Review (SLR) considering the context of intrusion detection and prevention in environments based on fog computing and IoT. This review addresses more than 100 studies that were included after going through an extensive inclusion/exclusion process, with well-defined criteria. From these studies, information was extracted to build a view of the current state-of-the-art and answer the research questions of this study. In this way, we identify the state-of-the-art, open questions and possibilities for future research.

Hybrid approach to intrusion detection in fog-based IoT environments

Article

Full-text available

Oct 2020
COMPUT NETW

In the Internet of Things (IoT) systems, information of various kinds is continuously captured, processed, and transmitted by systems generally interconnected by the Internet and distributed solutions. Attacks to capture information and overload services are common. This fact makes security techniques indispensable in IoT environments. Intrusion detection is one of the vital security points, aimed at identifying attempted attacks. The characteristics of IoT devices make it impossible to apply these solutions in this environment. Also, the existing anomaly-based methods for multiclass detection do not present acceptable accuracy. We present an intrusion detection architecture that operates in the fog computing layer. It has two steps and aims to classify events into specific types of attacks or non-attacks, for the execution of countermeasures. Our work presents a relevant contribution to the state of the art in this aspect. We propose a hybrid binary classification method called DNN-kNN. It has high accuracy and recall rates and is ideal for composing the first level of the two-stage detection method of the presented architecture. The approach is based on Deep Neural Networks (DNN) and the k-Nearest Neighbor (kNN) algorithm. It was evaluated with the public databases NSL-KDD and CICIDS2017. We used the method of selecting attributes based on the rate of information gain. The approach proposed in this work obtained 99.77% accuracy for the NSL-KDD dataset and 99.85% accuracy for the CICIDS2017 dataset. The experimental results showed that the proposed hybrid approach was able to achieve greater precision about classic machine learning approaches and the recent advances in intrusion detection for IoT systems. In addition, the approach works with low overhead in terms of memory and processing costs.

Artificial Neural Networks Based Intrusion Detection System for Internet of Things Fog Nodes

Article

Full-text available

Apr 2020

The Internet of Things (IoT) represents a mean to share resources (memory, storage computational power, data, etc.) between computers and mobile devices, as well as buildings, wearable devices, electrical grids, and automobiles, just to name few. The IoT is leading to the development of advanced information services that will require large storage and computational power, as well as real-time processing capabilities. The integration of IoT with emerging technologies such as Fog Computing can complement these requirements with pervasive and cost-effective services capable of processing large-scale geo-distributed information. In any IoT application, communication availability is essential to deliver accurate and useful information, for instance, to take actions during dangerous situations, or to manage critical infrastructures. IoT components like gateways, also called Fog Nodes, face outstanding security challenges as the attack surface grows with the number of connected devices requesting communication services. These Fog nodes can be targeted by an attacker, preventing the nodes from delivering important information to the final users or to perform accurate automated actions. This paper introduces an Anomaly Behavior Analysis Methodology based on Artificial Neural Networks, to implement an adaptive Intrusion Detection System (IDS) capable of detecting when a Fog node has been compromised, and then take the required actions to ensure communication availability. The experimental results reveal that the proposed approach has the capability for characterizing the normal behavior of Fog Nodes despite its complexity due to the adaptive scheme, and also has the capability of detecting anomalies due to any kind of sources such as misuses, cyberattacks or system glitches, with high detection rate and low false alarms.

Online Incremental Machine Learning Platform for Big Data-Driven Smart Traffic Management

Article

Full-text available

Jul 2019

The technological landscape of intelligent transport systems (ITS) has been radically transformed by the emergence of the big data streams generated by the Internet of Things (IoT), smart sensors, surveillance feeds, social media, as well as growing infrastructure needs. It is timely and pertinent that ITS harness the potential of an artificial intelligence (AI) to develop the big data-driven smart traffic management solutions for effective decision-making. The existing AI techniques that function in isolation exhibit clear limitations in developing a comprehensive platform due to the dynamicity of big data streams, high-frequency unlabeled data generation from the heterogeneous data sources, and volatility of traffic conditions. In this paper, we propose an expansive smart traffic management platform (STMP) based on the unsupervised online incremental machine learning, deep learning, and deep reinforcement learning to address these limitations. The STMP integrates the heterogeneous big data streams, such as the IoT, smart sensors, and social media, to detect concept drifts, distinguish between the recurrent and non-recurrent traffic events, and impact propagation, traffic flow forecasting, commuter sentiment analysis, and optimized traffic control decisions. The platform is successfully demonstrated on 190 million records of smart sensor network traffic data generated by 545,851 commuters and corresponding social media data on the arterial road network of Victoria, Australia.

Hybrid intrusion detection and signature generation using Deep Recurrent Neural Networks

Article

Full-text available

Jun 2020
NEURAL COMPUT APPL

Automated signature generation for Intrusion Detection Systems (IDSs) for proactive security of networks is a promising area of research. An IDS monitors a system or activities of a network for detecting any policy violations or malicious actions and produces reports to the management system. Numerous solutions have been proposed by various researchers so far for intrusion detection in networks. However, the need to efficiently identifying any intrusion in the network is on the rise as the network attacks are increasing exponentially. This research work proposes a deep learning-based system for hybrid intrusion detection and signature generation of unknown web attacks referred as D-Sign. D-Sign is capable of successfully detecting and generating attack signatures with high accuracy, sensitivity and specificity. It has been for attack detection and signature generation of web-based attacks. D-Sign has reported significantly low False Positives and False Negatives. The experimental results demonstrated that the proposed system identifies the attacks proactively than other state-of-the-art approaches and generates signatures effectively thereby causing minimum damage due to network attacks.

Toward a Lightweight Intrusion Detection System for the Internet of Things

Article

Full-text available

Mar 2019

Integration of the internet into the entities of the different domains of human society (like smart homes, health care, smart grids, manufacturing processes, product supply chains, and environmental monitoring) is emerging as a new paradigm called the Internet of Things (IoT). However, the ubiquitous and wide-range IoT networks make them prone to cyber attacks. One of the main types of attack is denial of service (DoS), where the attacker floods the network with a large volume of data to prevent nodes from using the services. An intrusion detection mechanism is considered a chief source of protection for information and communications technology. However, conventional intrusion detection methods need to be modified and improved for application to the Internet of Things owing to certain limitations, like resource-constrained devices, the limited memory and battery capacity of nodes, and specific protocol stacks. In this paper, we develop a lightweight attack detection strategy utilizing a supervised machine learning–based support vector machine (SVM) to detect an adversary attempting to inject unnecessary data into the IoT network. Simulation results show that the proposed SVM-based classifier, aided by a combination of two or three incomplex features, can perform satisfactorily in terms of classification accuracy and detection time.

Interoperability in IOT based smart home: A review

Article

Full-text available

Sep 2018

The IOT Internet of things is rapidly moving towards reality. It is broadly applied to plenty of areas which includes healthcare, smart homes, smart cities, energy management, logistics and to forth. Smart homes are distinct class of IOT environments with a variety of connected heterogeneous devices. Daily life entities connected to the internet are enriched with the capabilities of identifying, sensing and processing. The heterogeneous entities require to execute mutual performance of activities effectively & rapidly. However, with the huge growth of connected devices, the ability of interoperation among devices is going to be a challenge. In this paper a state of the art on interoperability in smart homes is presented. The detailed discussion is given on numerous feature of connectivity protocols used for interoperation, limitation of existing solutions encounters in research.

Interoperability in Internet of Things: Taxonomies and Open Challenges

Article

Full-text available

Jul 2018

In the last few years, many smart objects found in the physical world are interconnected and communicate through the existing internet infrastructure which creates a global network infrastructure called the Internet of Things (IoT). Research has shown a substantial development of solutions for a wide range of devices and IoT platforms over the past 6-7 years. However, each solution provides its own IoT infrastructure, devices, APIs, and data formats leading to interoperability issues. Such interoperability issues are the consequence of many critical issues such as vendor lock-in, impossibility to develop IoT application exposing cross-platform, and/or cross-domain, difficulty in plugging non-interoperable IoT devices into different IoT platforms, and ultimately prevents the emergence of IoT technology at a large-scale. To enable seamless resource sharing between different IoT vendors, efforts by several academia, industry, and standardization bodies have emerged to help IoT interoperability, i.e., the ability for multiple IoT platforms from different vendors to work together. This paper performs a comprehensive survey on the state-of-the-art solutions for facilitating interoperability between different IoT platforms. Also, the key challenges in this topic is presented.

An Improved Intrusion Detection Algorithm Based on GA and SVM

Article

Full-text available

Mar 2018

In the era of big data, with the increasing number of audit data features, human-centered smart intrusion detection system (IDS) performance is decreasing in training time and classification accuracy, and many SVM-based intrusion detection algorithms have been widely used to identify an intrusion quickly and accurately. This paper proposes the FWP-SVM-GA (feature selection, weight, and parameter optimization of support vector machine based on the genetic algorithm) algorithm based on the characteristics of the genetic algorithm (GA) and the support vector machine (SVM) algorithm. The algorithm first optimizes the crossover probability and mutation probability of GA according to the population evolution algebra and fitness value; then, it subsequently uses a feature selection method based on the genetic algorithm with an innovation in the fitness function that decreases the SVM error rate and increases the true positive rate. Finally, according to the optimal feature subset, the feature weights and parameters of SVM are simultaneously optimized. The simulation results show that the algorithm accelerates the algorithm convergence, increases the true positive rate, decreases the error rate, and shortens the classification time. Compared with other SVM-based intrusion detection algorithms, the detection rate is higher and the false positive and false negative rates are lower.

Towards an Effective Management of IoT by Integrating Cloud and Fog Computing

Conference Paper

Aug 2019

An Integrated Architecture for IoT Malware Analysis and Detection

Chapter

Jan 2019

Along with the rapid development of the IoT, the security issue of the IoT devices has also been greatly challenged. The variants of the IoT malware are constantly emerging. However, there is lacking of an IoT malware analysis architecture to extract and detect the malware behaviors. This paper addresses the problem and propose an IoT behavior analysis and detection architecture. We integrate the static and dynamic behavior analysis and network traffic analysis to understand and evaluate the IoT malware’s behaviors and spread range. The experiment on Mirai malware and several variants shows that the architecture is comprehensive and effective for the IoT malware behavior analysis as well as spread range monitoring.

Machine learning and datamining methods for hybrid IoT intrusion detection

Figures

Recommended publications

A hybrid machine learning method in detecting anomalies in IoT at the fog layer

R THAMARAI SELVI et al.: AN ENSEMBLE NEURAL NETWORK TECHNIQUE FOR IMPROVING SECURITY AMONG VARIOUS D...

A hybrid approach for malware detection in SDN-enabled IoT scenarios

The Performance of IoT Malware Detection Technique Using Feature Selection and Feature Reduction in...