
FedGroup: A Federated Learning Approach for Anomaly Detection in IoT Environments

June 2023


DOI:10.1007/978-3-031-34776-4_7

In book: Mobile and Ubiquitous Systems: Computing, Networking and Services (pp.121-132)

Authors:

Basem Suleiman

The University of Sydney

Muhammad Johan Alibasa

Telkom University



FedGroup: A Federated Learning Approach for Anomaly Detection in IoT Environments

Yixuan Zhang¹, Basem Suleiman¹ [0000-0003-2674-0253], and Muhammad Johan Alibasa² [0000-0002-2335-0404]

¹ School of Computer Science, University of Sydney, Australia

{nikki.zhang,basem.suleiman}@sydney.edu.au

² School of Computing, Telkom University, Indonesia

alibasa@telkomuniversity.ac.id

Abstract. The increasing adoption and use of IoT devices in smart home environments have raised concerns about the data security and privacy of smart home users. Several studies have employed traditional machine learning to address a key security challenge, namely anomaly detection in IoT devices. Such models, however, require transmitting sensitive IoT data to a central model for training and validation, which introduces security and performance concerns. In this paper, we propose a federated learning approach for detecting anomalies in IoT devices. We present our FedGroup model and algorithms, which train and validate local models based on data from a group of IoT devices. FedGroup also updates the learning of the central model based on the learning changes that result from each group of IoT devices, rather than computing the average learning of each device. Our empirical evaluation on a real IoT dataset demonstrates that FedGroup achieves anomaly detection accuracy that is the same as or better than that of federated and non-federated learning models. FedGroup is also more secure and performs well, given that all the IoT data are used to train and update the models locally.

Keywords: Internet of Things (IoT) · Anomaly Detection · Federated Learning · Machine Learning · Privacy · Smart Home.

1 Introduction

With the advancement of Internet technologies, it was predicted that the number of Internet of Things (IoT) devices in the smart home environment would reach 7 devices per person in 2020 [1]. In smart home environments, devices such as sensors, smartphones, and smart TVs are connected to the Internet so that they can be accessed and monitored remotely. To achieve this, data are continuously sensed so that these IoT devices can perform certain functions, for example turning on the air conditioner when the temperature reaches a certain level or switching off the TV when no one is detected watching it [2]. The significant growth of IoT devices in smart homes has also prompted research interest in the very large amount of data that are collected and used to support different types of intelligent services for smart homes [3]. The collected data can be used to develop intelligent data-driven models for enhancing the user's experience of smart homes.

The connectivity of smart home devices to the Internet and continuous data sensing have brought a number of key challenges to smart home users, including data privacy and malicious access to and control of sensitive IoT devices. The settings of IoT devices do not treat users' privacy and security as a priority. Such IoT devices are often vulnerable to network attacks because they are connected to the Internet, and these attacks can be pervasive [4]. Such attacks may include accessing and transferring the data being sensed by these devices, remotely switching off security and monitoring cameras, and remotely opening doors for unauthorised persons. Recent research reported that around 59% of users are concerned about smart devices listening to them without permission and gathering data without their knowledge [5]. Therefore, it is crucial to maintain the highest levels of privacy and security while these IoT devices are used in smart homes. To address this challenge, a large body of research employs AI-based approaches to detect anomalous behaviour in such IoT devices [6][7][8][9]. These approaches focus heavily on traditional machine learning models, which bring new challenges of their own. Such machine learning approaches require transmitting all data sensed from all IoT devices to an external server to train and validate central anomaly detection models. This can be very expensive and could exhaust the bandwidth of the network. It can also expose the sensitive data collected from the IoT devices over the network, which makes the data vulnerable to cyber-attacks. Encrypting and decrypting the data would not necessarily improve the situation, as it could add performance overhead.

In this study, we address the anomaly detection problem on IoT devices by employing a federated learning approach. Federated learning allows the training of local models based on data sensed from a group of IoT devices within a smart home. The local models do not need to transmit the raw IoT data, only the model updates that result from local training. The model (parameter) updates are then shared with a central model, which aggregates the values of the learning parameters and sends them back to all local models so that they can update their learning. Although this federated learning approach addresses the data security and performance overhead concerns posed by traditional machine learning models, it has its own unique challenges. In this paper, we address the research question: How can local and central models be designed to work in federated learning settings, given the variety of IoT devices in smart homes?

Existing federated learning approaches suggest using the overall average to update the learning parameters shared by all local models. This might not be practical, as it ignores the bias of the local models (each might be trained on different data, with or without anomalies). We propose a new federated learning model called FedGroup for anomaly detection in smart home environments. We present algorithms that detail the training process on data collected from a group of IoT devices and the process of updating learning parameters in the central model based on the learning results from each group of IoT devices. FedGroup addresses the bias that results from averaging all updates from each device.

The main contributions of this paper are:

1. A new federated learning model and algorithms, called FedGroup, for anomaly detection on IoT devices. The FedGroup model computes the learning updates based on parameters from a group of IoT devices.

2. An empirical evaluation of FedGroup on real data collected from various IoT devices [6][7][8][9]. The evaluation also includes a performance analysis of FedGroup against federated learning and non-federated learning models.

The rest of this paper is organised as follows. Section 2 describes related work in the field of anomaly detection for IoT. Section 3 presents the dataset used in this study and the FedGroup model and algorithms proposed in this paper. Experiments and results are then presented in Section 4. Section 5 presents key conclusions and future work.

2 Literature Review

Various traditional machine learning and deep learning approaches have been utilised to identify attacks on IoT devices. A study by Stojkoska and Trivodaliev (2017) [10] suggests that a holistic, cloud-centric IoT-based framework for smart home environments requires substantial data storage and processing infrastructure, and that the current state is far from efficient. They highlighted that new approaches should address the issue of massive data management on the cloud. Furthermore, future studies also have to investigate different methods to ensure security, since cloud-based techniques pose an enormous risk of revealing personal information and data, which is considered an urgent issue.

Past research tended to focus on centralised anomaly detection, in which the cloud collects data from various sources. This raises several issues, including high communication load and data privacy. Federated Learning (FL) was proposed as an approach that communicates only lightweight updates, and it was shown to successfully predict text input on mobile devices [11]. FL merges the updates from all the distributed devices, so the computation on the cloud is significantly reduced, resulting in improved scalability and lightweight communication [12]. Another study [13] compared IoT intrusion detection using different approaches, including centralised, on-device, and FL. FL reached an accuracy similar to that of the centralised approach. In addition, the study suggests that FL outperformed the on-device approach, as it could take advantage of knowledge from other devices. FL thus addresses significant drawbacks of centralised ML models, which are expensive, computationally demanding, and offer poor scalability.

Mohri et al. (2019) [14] indicated that FL might weight different clients differently, resulting in unfairness. Fairness in this context refers to both the training data and the training procedures. The study indicates that the uniform distribution is not the common distribution in many cases. Therefore, minimising the expected loss with respect to one specific distribution is harmful and might lead to a mismatch with the target. Consequently, the study presented an agnostic FL framework in which the centralised model is optimised for any target distribution produced by a mixture of client distributions, using data-dependent Rademacher complexity. However, optimising for the single worst device is of limited use in a smart environment with numerous IoT devices. In separate research, Li et al. (2020) [15] concur that an unfairly distributed model could yield disproportionate performance, since overall accuracy may be high while the accuracy on individual devices is uncertain. The generated model may be biased towards devices with massive amounts of data. Their study developed an enhanced model configured at a more granular scale to ensure a more equitable distribution across devices while maintaining the same overall accuracy.

One study [16] provides an analysis that is important for understanding ensemble learning (EL) for network security and anomaly detection, where EL can perform well. EL combines multiple learning models to achieve better prediction results. Furthermore, an ensemble of models is more resilient in the face of training data uncertainty. The EL concept is similar to how FL aggregates training results, which opens up an opportunity to incorporate ensemble learning into FL.

As shown above, many studies reported that FL performed better than traditional ML and confirmed the high privacy level of FL. However, research on identifying attacks using FL is still scarce and has not been explored in depth. There are also open issues; for instance, the biases of the distributed models are averaged to produce the final global model, which causes unfairness. Past studies neglected the fact that different local models have distinct functionality and structure. The research gap in smart home environments is that past research failed to exploit the similarity of network traffic flow data patterns among device models in the same category. IoT devices of the same type have similar vulnerability structures under similar attacks. Therefore, IoT devices within the same group should use similar parameters for anomaly detection. While aggregating all updates together seems like a straightforward method, bias arises in the training phase because the participants' parameter updates differ from one another and the average is selected regardless.

3 Methodology

Our study aims to build an anomaly detection model that detects whether there are any attack attempts (Attack Detection). Section 3.1 describes the original input data, namely the network traffic flow data and the attack data. Section 3.2 then details the design of the models, and Section 3.3 describes the preparation and evaluation process.


3.1 IoT Datasets

Our dataset was obtained from the UNSW IoT analytics team and consists of real-world attacks designed to assess the privacy and security risks of IoT devices [6][7][8][9]. The dataset was collected from 28 unique IoT devices in various categories and multiple non-IoT devices in the smart environment. There are 30 PCAP files consisting of both attack and benign data, collected in two separate stages: the first stage runs from 28/05/2018 to 17/06/2018, and the second stage from 24/09/2018 to 26/10/2018. IoT devices are defined as devices linked to the Internet that have application logic and establish TCP/IP connections. Ten IoT devices in this dataset contain both benign and attack traffic, whereas the others contain only benign data. Therefore, this research focuses on these ten IoT devices, which have a wireless connection to the Internet and fall into the four categories listed in Table 1. The datasets used in our study can be found in the Flow and Annotation data³; the implementation of our algorithms and models, together with supplementary results and materials, can be accessed from the project repository⁴.

IoT Device No.   MAC Address         IoT Device              Category
IoT Device 0     00:16:6c:ab:6b:88   Samsung Smart Cam       Camera
IoT Device 1     00:17:88:2b:9a:25   Philips Hue Lightbulb   Energy management
IoT Device 2     44:65:0d:56:cc:d3   Amazon Echo             Controllers/Hubs
IoT Device 3     50:c7:bf:00:56:39   TP-Link Plug            Energy management
IoT Device 4     70:ee:50:18:34:43   Netatmo Camera          Camera
IoT Device 5     74:c6:3b:29:d7:1d   iHome PowerPlug         Energy management
IoT Device 6     d0:73:d5:01:83:08   LiFX Bulb               Energy management
IoT Device 7     ec:1a:59:79:f4:89   Belkin Switch           Energy management
IoT Device 8     ec:1a:59:83:28:11   Belkin Motion Sensor    Energy management
IoT Device 9     F4:F5:D8:8F:0A:3C   Chromecast Ultra        Appliances

Table 1. IoT devices included in the dataset

The network traffic flow data of the ten IoT devices are collected every minute, marked with activity, and recorded in ten separate Excel network traffic flow data files. The files contain "Timestamp", "NoOfFlows", and a significant number of pattern attributes. In the attribute names, "InternetTcp", "InternetUdp", "LocalTcp", and "LocalUdp" follow "From" or "To", and the value after "Port" is the port number (e.g., "From###Port###Packet"). Since packet and byte counts are not closely correlated and the sizes of the packets in this dataset vary, we decided to include both when predicting attacks. From the network traffic flow data alone, it is unknown which network flow is going to or coming from which IoT device, because different IoT devices may use the same port number and a device may use several port numbers simultaneously.

³ https://iotanalytics.unsw.edu.au/attack-data

⁴ https://github.com/BasemSuleiman/IoT Anomaly Detection Smart Homes

The UNSW IoT analytics team designed a set of attacks that are comparable to real-world attacks and specific to several real-world consumer IoT devices. The tools were written in Python to find susceptible and vulnerable devices on the local network by running different tests against them. The program then performs targeted attacks on the IoT devices found to be susceptible. The attack annotations include the start and end time of each attack, the impact of the attack, and the attack type. Whether a flow is labelled as normal behaviour or as under attack is determined by the rule "if flowtime >= startTime × 1000 and endTime × 1000 >= flowtime, then attack = true". The start and end times are multiplied by 1000 because the times are recorded in different units: the flow time is in milliseconds, while the start and end times are not.
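The labelling rule above can be sketched in Python as follows. This is a minimal illustration rather than the authors' code: the function names, argument names, and sample timestamps are assumptions, and the attack start and end times are assumed to be in seconds, which the factor of 1000 suggests.

```python
def is_attack(flow_time_ms, start_time_s, end_time_s):
    """Return True if a flow timestamp (milliseconds) lies inside an attack window.

    start_time_s and end_time_s are assumed to be in seconds, so both are
    multiplied by 1000 before the comparison, as in the quoted rule.
    """
    return start_time_s * 1000 <= flow_time_ms <= end_time_s * 1000


def label_flows(flow_times_ms, attack_windows):
    """Label each flow timestamp as attack (True) or benign (False).

    attack_windows is a list of (start_time_s, end_time_s) pairs.
    """
    return [
        any(is_attack(t, start, end) for start, end in attack_windows)
        for t in flow_times_ms
    ]


# Hypothetical example: a single attack window lasting 60 seconds.
windows = [(1527500000, 1527500060)]
flows = [1527499999000, 1527500000000, 1527500030000, 1527500061000]
labels = label_flows(flows, windows)
```

Only the two flows whose timestamps fall inside the window (boundaries included) are labelled as attacks.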

3.2 Proposed Approach: FedGroup

The FedAvg model accepts the initial model from the central server, trains models on decentralised local devices, and reports the best-performing parameters to the central model [11]. In contrast to traditional machine learning (ML), FedAvg sends code to the data rather than sending data to the code. For traditional ML, there is only one step, which is a client-to-server upload step. For FL-based learning, there are four steps in one iteration:

1. A server-to-client broadcast step

2. A local client update step

3. A client-to-server upload step

4. A server update step
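The four steps above can be sketched as one FedAvg round. This is a toy illustration under stated assumptions: the "model" is a plain weight vector, the local update is a single gradient step on a hypothetical squared-distance loss towards the mean of the client's data, and the server weights each client by its number of samples (with equal sample counts this reduces to the plain mean). All function names are illustrative.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature rows."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def local_update(weights, local_rows, learning_rate=0.5):
    """Step 2: one gradient step on 0.5 * ||w - mean(local data)||^2."""
    target = mean_vector(local_rows)
    return [w - learning_rate * (w - t) for w, t in zip(weights, target)]


def server_update(client_weights, client_sizes):
    """Step 4: average the uploaded weights, weighted by client sample counts."""
    total = sum(client_sizes)
    dimension = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dimension)
    ]


def fedavg_round(global_weights, clients):
    """Run one round; raw client rows never leave local_update."""
    uploads = []
    for rows in clients:
        received = list(global_weights)          # step 1: server-to-client broadcast
        updated = local_update(received, rows)   # step 2: local client update
        uploads.append((updated, len(rows)))     # step 3: client-to-server upload
    weights, sizes = zip(*uploads)
    return server_update(weights, sizes)         # step 4: server update


# Hypothetical data: two clients holding two rows and one row respectively.
clients = [[[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0]]]
new_global = fedavg_round([0.0, 0.0], clients)
```

Only the updated weight vectors and sample counts are uploaded; the rows themselves stay on the clients.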

While it is simple for FedAvg to collect all the parameters from the local devices and select the mean as the parameter for the following round, its main weakness is the failure to exploit the similarity of network traffic flow data patterns among device models in the same category. Furthermore, the devices in smart homes are not assigned to different groups based on their similarity, even though devices in the same group should have similar functionalities and vulnerability risks. The bias in the training procedure comes from device parameter updates that differ from each other and are simply averaged. Imagine that the IoT devices in a smart home are mostly energy management devices such as plugs or bulbs: the parameters of the cloud server will be biased towards the energy management devices because they outnumber the devices in other groups.
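A small numeric sketch may make this bias concrete. The device categories and counts below follow Table 1, but the parameter values are made up purely for illustration, and a single scalar stands in for a model's parameters.

```python
from collections import Counter

# Categories and counts follow Table 1; the parameter values are hypothetical.
devices = [
    ("Camera", 10.0), ("Camera", 12.0),
    ("Energy management", 2.0), ("Energy management", 2.0),
    ("Energy management", 3.0), ("Energy management", 3.0),
    ("Energy management", 2.0), ("Energy management", 3.0),
    ("Controllers/Hubs", 8.0),
    ("Appliances", 6.0),
]


def global_mean(devices):
    """Plain mean over every device, ignoring categories."""
    return sum(value for _, value in devices) / len(devices)


def influence_share(devices):
    """Weight each category carries in the plain mean (its share of devices)."""
    counts = Counter(category for category, _ in devices)
    return {category: n / len(devices) for category, n in counts.items()}


def category_mean(devices, category):
    """Mean parameter value over the devices of one category."""
    values = [value for c, value in devices if c == category]
    return sum(values) / len(values)


overall = global_mean(devices)        # 5.1
shares = influence_share(devices)     # Energy management carries 0.6 of the weight
camera_gap = category_mean(devices, "Camera") - overall  # about 5.9
```

With these made-up values, the six energy management devices carry 60% of the weight, so the plain mean sits far from what the two cameras reported.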

Therefore, we present FedGroup, a variant of FedAvg with group masters, in which local models send parameters to a group master rather than directly to the central server. The group masters combine devices within the same group and aggregate parameters by group. The central model contains multiple group masters, each of which sends the aggregated results back to the corresponding participating local models. This is more efficient, as each local model can focus on the information within its own group.

Devices can be grouped in many ways. A group can be defined by the category of the IoT devices, such as Camera or Energy management, or by other characteristics. For instance, if groups are based on features, a smart door product offers several ways to open the door, such as app control, fingerprint recognition, entering a password, scanning a smart card, or simply using a key. If the smart door is under attack, the central model can then indicate which specific feature is under attack.

The steps of FedGroup are as follows (Fig. 1):

1. Every local model is trained on its network traffic flow data over all candidate parameters and sends the best-performing parameters to the central model;

2. The group master in the central model aggregates the parameters by group;

3. The group master sends the aggregated results back to the devices in the corresponding group;

4. The local models update themselves with the new parameters.

Fig. 1. The architecture of our proposed approach FedGroup

Definition: Let $X_n$ denote the network traffic flow data of the IoT devices and $y_n$ the prediction target. In the network $N_{D_n G_i}$, $N$ denotes the network, $D_n$ denotes device $n$, and $G_i$ denotes group $i$. $X_n$ and $M_n$ are included in $N_{D_n G_i}$. During training, we track the best score $S$, the best parameters $B$, the average score of the entire model $C$, and the average parameters of the entire model $A$. For each model $M$, $P = \{a, b, \ldots\}$ denotes its parameters, such as weights, n_estimator, and so on, and $p = \{a_0, a_1, \ldots\}, \{b_0, \ldots\}, \ldots$ denotes the grid of all possible values of each parameter (for example, n_estimator can take the values 1, 2, and so on). $E$ denotes the parameter grids selected in the local models after the update to the central model.

Algorithm 1 FedGroup: Client Side Learning Algorithm

INPUT: P, E
REQUIRE: X_n, y_n, M
OUTPUT: B and S to Central Server Side Learning: FedAvg with the related Group Master
SET: Local Model M
/* Fit possible parameter grids and return the best parameters and the best score */
for e ∈ E do
    Fit P in M with grid e to train on X_n and y_n
    Test M to get the accuracy
    CALCULATE B and S
end for
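Algorithm 1 amounts to a grid search on each device. The sketch below shows that loop in plain Python under loud assumptions: a hypothetical one-feature threshold rule stands in for the local model M (the paper uses Decision Tree, Logistic Regression, and Ensemble Learning), all function names are illustrative, and each candidate is scored on the same data it was given, whereas a held-out split would be used in practice.

```python
from itertools import product


def expand_grid(param_grid):
    """Turn {"name": [values, ...], ...} into a list of parameter dicts."""
    names = sorted(param_grid)
    return [
        dict(zip(names, values))
        for values in product(*(param_grid[name] for name in names))
    ]


def predict(params, x):
    """Hypothetical local model: flag an attack (1) when x reaches the threshold."""
    return 1 if x >= params["threshold"] else 0


def accuracy(params, X, y):
    """Fraction of samples on which the stand-in model matches the label."""
    correct = sum(predict(params, x) == label for x, label in zip(X, y))
    return correct / len(y)


def client_side_learning(X_n, y_n, param_grid):
    """Try every grid point and return the best parameters B and best score S."""
    best_params, best_score = None, -1.0
    for params in expand_grid(param_grid):
        score = accuracy(params, X_n, y_n)
        if score > best_score:  # keep the first grid point that reaches the top score
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical device data: one feature value per flow, label 1 means attack.
X_n = [1.0, 2.0, 3.0, 8.0, 9.0, 10.0]
y_n = [0, 0, 0, 1, 1, 1]
B, S = client_side_learning(X_n, y_n, {"threshold": [2.0, 5.0, 9.0]})
```

Only `B` and `S` would be reported to the group master; `X_n` and `y_n` stay on the device.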

Algorithm 2 FedGroup: Group Master Algorithm

INPUT: B and S of each d
DISPLAY: scores of each group
OUTPUT: A and C to Central Server Side Learning: FedGroup
CALCULATE A and C based on B and S
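One way to read Algorithm 2 is as a per-group arithmetic mean of the reported parameters and scores. The sketch below assumes exactly that, and also assumes every parameter is numeric; integer-valued hyperparameters such as n_estimator would need rounding after averaging. The report format and names are illustrative, not taken from the paper.

```python
from collections import defaultdict


def group_master(reports):
    """Aggregate device reports per group.

    reports: list of (group, best_params, best_score) tuples, one per device.
    Returns {group: (A, C)}, where A holds the averaged parameters and
    C is the average score of the devices in that group.
    """
    by_group = defaultdict(list)
    for group, params, score in reports:
        by_group[group].append((params, score))

    aggregated = {}
    for group, items in by_group.items():
        count = len(items)
        names = items[0][0].keys()
        # Average each parameter and the score over the devices of this group only.
        A = {name: sum(params[name] for params, _ in items) / count for name in names}
        C = sum(score for _, score in items) / count
        aggregated[group] = (A, C)
    return aggregated


# Hypothetical reports from three devices in two groups.
reports = [
    ("Camera", {"threshold": 4.0}, 0.9),
    ("Camera", {"threshold": 6.0}, 1.0),
    ("Energy management", {"threshold": 2.0}, 0.8),
]
result = group_master(reports)
```

Each group receives only its own averaged parameters, so a large group cannot pull the parameters of a small one.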


Algorithm 3 FedGroup: Central Server Side Learning Algorithm

INPUT: M, P, p
OUTPUT: A, C
/* 1st round: receive the best parameters and best devices from every model, and calculate the average parameters of each group */
for g ∈ G do
    for n ∈ N do
        Initial: M
        Client Side Learning: FedGroup (P, p)
        Return B and S of each N
    end for
    Group Master: FedGroup (B and S of each N)
    Return A and C
end for
/* 2nd round: send the mean parameters to the Client Server and return the mean score and average parameters of the model */
for g ∈ G do
    for n ∈ N do
        Client Side Learning: FedGroup (P, A)
        Return B and S of each N
    end for
    Group Master: FedGroup (B and S of each N)
    Return A and C
end for
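The two-round control flow of Algorithm 3 can be sketched compactly. Everything below is a hypothetical stand-in: a one-feature threshold rule plays the role of the local model, `client_side` and `group_master` are reduced to a few lines each, and the data are invented. The point is only the ordering: round one searches the full grid on every device, and round two re-evaluates each device with its group's averaged parameter.

```python
def score(threshold, X, y):
    """Accuracy of a hypothetical rule: predict attack (1) when x >= threshold."""
    return sum((x >= threshold) == bool(label) for x, label in zip(X, y)) / len(y)


def client_side(X, y, candidates):
    """Algorithm 1 stand-in: return the best candidate B and its score S."""
    return max(((t, score(t, X, y)) for t in candidates), key=lambda pair: pair[1])


def group_master(results):
    """Algorithm 2 stand-in: average B and S over the devices of one group."""
    count = len(results)
    return sum(b for b, _ in results) / count, sum(s for _, s in results) / count


def central_server(groups, grid):
    """Run both rounds and return {group: (A, C)} after the second round."""
    averaged = {}
    for name, devices in groups.items():      # 1st round: full grid on every device
        results = [client_side(X, y, grid) for X, y in devices]
        averaged[name] = group_master(results)

    final = {}
    for name, devices in groups.items():      # 2nd round: only the group average A
        A, _ = averaged[name]
        results = [client_side(X, y, [A]) for X, y in devices]
        final[name] = group_master(results)
    return final


# Hypothetical groups: two cameras and one energy management device.
groups = {
    "Camera": [
        ([1.0, 2.0, 8.0, 9.0], [0, 0, 1, 1]),
        ([2.0, 5.0, 7.0, 9.0], [0, 0, 1, 1]),
    ],
    "Energy management": [([0.5, 1.0, 3.0, 4.0], [0, 0, 1, 1])],
}
final = central_server(groups, grid=[2.0, 4.0, 6.0])
```

In this toy run the two cameras prefer different thresholds in round one, so the averaged threshold costs the second camera some accuracy in round two, while the energy management group is unaffected by the cameras.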

3.3 Experiment and analysis

For the network traffic flow data of the IoT devices, we remove "NoOfFlows" because it counts the number of flows, which is highly correlated with all the other attributes. Because different devices use the same port number and the same device uses different port numbers, there are 253 attributes in total describing the bytes and packets per port number. When the network behaviour of one device is captured at a millisecond-level timestamp, many port numbers are not in use. In other words, a NaN value means that there is no network behaviour for the corresponding port number, which is why we have missing data. To fill in the missing data, we use a global constant of 0, which is also the most likely value. It signifies no network behaviour, with zero packet-level and zero byte-level network traffic flow data at that time point.
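The two preprocessing steps described above (dropping the flow count and filling missing values with 0) can be sketched in plain Python. The rows are hypothetical, and the port attribute names are invented to follow the "From###Port###Packet" pattern mentioned earlier; they are not actual column names from the dataset.

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def preprocess(rows, drop=("NoOfFlows",), fill_value=0):
    """Drop the listed attributes and replace missing values with fill_value."""
    cleaned = []
    for row in rows:
        cleaned.append({
            name: (fill_value if is_missing(value) else value)
            for name, value in row.items()
            if name not in drop
        })
    return cleaned


# Hypothetical rows; attribute names are illustrative only.
rows = [
    {"Timestamp": 1527500000000, "NoOfFlows": 3,
     "FromInternetTcpPort443Packet": 12.0,
     "FromInternetTcpPort443Byte": float("nan")},
    {"Timestamp": 1527500060000, "NoOfFlows": 0,
     "FromInternetTcpPort443Packet": None,
     "FromInternetTcpPort443Byte": None},
]
cleaned = preprocess(rows)
```

After this step every remaining attribute holds a number, with 0 standing for "no traffic on this port at this time point".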

Predicting whether there is an attack is a problem with two possible outcomes: attack or non-attack. We implement Decision Tree, Logistic Regression, and Ensemble Learning as local models, with traditional ML, FedAvg, and FedGroup as central models, for attack detection. To avoid overfitting, StratifiedShuffleSplit splits the dataset into 80% training and 20% testing data. On the training data, Stratified 5-Fold Cross-Validation randomly divides the data into five folds, fits the model on four folds, and validates the model using the remaining fold. We then evaluate on the 20% testing data to compute accuracy, using an F1 score with a weighted average. To evaluate the model further, the False Positive Rate (FPR), which is the probability of falsely rejecting the null hypothesis, is used to measure the accuracy of the test.
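The two metrics named above can be written out directly from confusion-matrix counts. This sketch assumes that attack is the positive class (label 1) and uses invented labels; the weighted F1 follows the usual definition, the mean of per-class F1 scores weighted by each class's share of the true labels.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn), treating `positive` as the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def false_positive_rate(y_true, y_pred, positive=1):
    """FPR = FP / (FP + TN): share of true negatives flagged as positive."""
    _, fp, tn, _ = confusion_counts(y_true, y_pred, positive)
    return fp / (fp + tn) if (fp + tn) else 0.0


def f1_for_class(y_true, y_pred, cls):
    """F1 = 2TP / (2TP + FP + FN) with `cls` treated as the positive class."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive=cls)
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with weights equal to each class's support."""
    total = len(y_true)
    return sum(
        f1_for_class(y_true, y_pred, cls) * y_true.count(cls) / total
        for cls in set(y_true)
    )


# Hypothetical test labels: 0 = benign, 1 = attack.
y_true = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
fpr = false_positive_rate(y_true, y_pred)
f1 = weighted_f1(y_true, y_pred)
```

Here one of the four benign samples is flagged as an attack, so the FPR is 0.25, while the weighted F1 is 0.8.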

4 Results

This study developed an anomaly detection system using our proposed model, FedGroup, as described in the previous section. Table 2 summarises the results for attack detection from various models, with Decision Tree, Logistic Regression, and Ensemble Learning as the local model under traditional ML, FedAvg, and FedGroup. EL can merge several models even if the individual models are weak, and we use it as an initial local training model. Using a single ML model as the initial model for every IoT device does not always perform as expected, because each model is designed to solve a specific problem or type of problem. For example, logistic regression effectively classifies data into discrete classes by investigating the relationships within a collection of labelled data. However, if the number of features is greater than the number of observations, Logistic Regression should not be used. Because machine learning models sometimes perform well and sometimes do not when solving a given problem, ensemble learning combines various contributing models to seek better predictions.

Local Model           Central Model    Accuracy   Running Time (seconds)   FPR
Decision Tree         Traditional ML   99.84%     8524                     10.04%
Decision Tree         FedAvg           99.85%     154                      9.57%
Decision Tree         FedGroup         99.87%     154                      7.70%
Logistic regression   Traditional ML   99.76%     21376                    24.48%
Logistic regression   FedAvg           99.77%     2912                     20.28%
Logistic regression   FedGroup         99.77%     2999                     20.18%
Ensemble learning     Traditional ML   99.85%     33940                    9.60%
Ensemble learning     FedAvg           99.91%     2390                     9.03%
Ensemble learning     FedGroup         99.91%     2143                     9.43%

Table 2. The accuracy of FedGroup, FedAvg and traditional ML using different models (attack detection)

Firstly, anomaly detection can determine whether there is any attack attempt. The highest attack detection accuracy, 99.91%, was reached with Ensemble Learning as the local model, under both FedGroup and FedAvg. Secondly, FL-based learning models performed similarly to or better than the traditional ML models. Because anomaly detection here is a binary classification problem, and although StratifiedShuffleSplit is used to mitigate overfitting, the accuracy of all models is above 99%. Therefore, FPR is a more reliable evaluation metric, since a higher FPR indicates that a higher ratio of negative events is incorrectly categorised as positive. As shown in Table 2, the FPRs of the FL-based models are lower than those of the traditional ML models, indicating better performance with less overfitting.

The running time of the FL-based models is lower than that of the traditional ML model, where the client-side model takes O(n) and the central server takes O(n²). For example, using Ensemble Learning as the local model and FedGroup as the central model takes 2143 seconds, which is around 1/16 of the time spent with traditional ML (33940 seconds) and around 0.9 of the time spent with FedAvg (2390 seconds). As a result of lightweight communication, no central authority, and a decentralised learning model, FL exploits local training on the data to reduce the running time. In addition, data safety is maintained because the data are not sent to, communicated to, or shared with other IoT devices or the Internet.
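The ratios quoted above follow directly from the Ensemble Learning rows of Table 2. The short check below only reproduces that arithmetic; the variable names are illustrative.

```python
# Running times in seconds for Ensemble Learning as the local model (Table 2).
running_time_s = {"Traditional ML": 33940, "FedAvg": 2390, "FedGroup": 2143}

# How many times longer traditional ML runs than FedGroup (the "1/16" claim).
speedup_vs_traditional = running_time_s["Traditional ML"] / running_time_s["FedGroup"]

# FedGroup running time as a fraction of FedAvg running time (the "0.9" claim).
fraction_of_fedavg = running_time_s["FedGroup"] / running_time_s["FedAvg"]
```

Traditional ML takes roughly 15.8 times as long as FedGroup, and FedGroup takes roughly 0.9 of the FedAvg time.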

Each smart home has a large number of IoT devices that make our lives easier and more efficient. If we focus on the FPR differences that are larger than 1%, the FPRs of FedGroup are better than those of FedAvg. Different IoT devices have different vulnerable functions and may be under different attacks. Meanwhile, similar attacks may have similar functionality or patterns. When the central model learns attack types from IoT devices in the same category, FedGroup is useful for providing parameters to the IoT devices within the same group. Beyond the smart home, when smart environments are combined to build smart cities or industries, FedGroup can learn attack detection and attack type detection based on group categorisation, such as a traffic light group, a subway group, and others.

5 Conclusion

In this paper, we introduce a new model called FedGroup, together with its algorithms, which addresses the issue of IoT anomaly detection in smart home environments. FedGroup allows training and detecting anomalies based on data collected from a group of devices, and thus reduces the exposure of IoT data that would otherwise be transmitted to and shared on a central server. We evaluate our FedGroup approach on a real dataset collected from various IoT devices in smart-home settings to detect anomalous behaviour. Based on our experimental results, it can be concluded that FedGroup matched or improved on the anomaly detection accuracy of the traditional FedAvg. Furthermore, FedGroup can address the issue of fairness in the training procedure and can maintain data privacy, as only the values of learning parameters need to be shared with the central model. Our results also demonstrated that Ensemble Learning as the local model in FedGroup achieved the best accuracy, 99.91%.

While our findings provide comparison results for different models, more empirical studies on continuous real-time learning and on alternative ways to ensure the fairness of federated learning need to be conducted to further test and refine our findings. In addition, expanding the model to other tasks beyond anomaly detection, quantifying the system cost, and studying how the link instability of wireless networks affects model updating are several opportunities for future work.


References

1. Evans, D. (2011). How the Next Evolution of the Internet Is Changing Everything.

11.

2. Robles, R. J., & Kim, T. (2010). Applications, Systems and Methods in Smart Home

Technology: A Review. International Journal of Advanced Science and Technology,

15, 13.

3. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things

(IoT): A vision, architectural elements, and future directions. Future Generation

Computer Systems, 29(7), 1645–1660. https://doi.org/10.1016/j.future.2013.01.010

4. Abomhara, M., & Køien, G. M. (n.d.). Security and privacy in the Internet of

Things: Current status and open issues. 8.

5. 59 per cent of smart speaker users have privacy concerns – report — Mo-

bile Marketing Magazine. Mobilemarketingmagazine.com. (2021). Retrieved 23

October 2021, from https://mobilemarketingmagazine.com/59-per-cent-of-smart-

speaker-users-have-privacy-concerns-report-.

6. Habibi Gharakheili, H., Sivanathan, A., Hamza, A., & Sivaraman, V. (2019).

Network-Level Security for the Internet of Things: Opportunities and Challenges.

Computer, 52(8), 58–62.

https://doi.org/10.1109/MC.2019.2917972

7. Hamza, A., Gharakheili, H. H., Benson, T. A., & Sivaraman, V. (2019). De-

tecting Volumetric Attacks on loT Devices via SDN-Based Monitoring of MUD

Activity. Proceedings of the 2019 ACM Symposium on SDN Research, 36–48.

https://doi.org/10.1145/3314148.3314352

8. Sivanathan, A., Gharakheili, H. H., Loi, F., Radford, A., Wijenayake, C., Vish-

wanath, A., & Sivaraman, V. (2019). Classifying IoT Devices in Smart Environ-

ments Using Network Traﬃc Characteristics. IEEE Transactions on Mobile Com-

puting, 18(8), 1745–1759. https://doi.org/10.1109/TMC.2018.2866249

9. Sivaraman, V., Gharakheili, H. H., Fernandes, C., Clark, N., & Karliychuk, T. (2018). Smart IoT Devices in the Home: Security and Privacy Implications. IEEE Technology and Society Magazine, 37(2), 71–79. https://doi.org/10.1109/MTS.2018.2826079

10. Stojkoska, B. L. R., & Trivodaliev, K. V. (2017). A review of Internet of Things for smart home: Challenges and solutions. Journal of Cleaner Production, 140, 1454–1464.

11. Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated Machine Learning: Concept and Applications. ArXiv:1902.04885 [Cs]. http://arxiv.org/abs/1902.04885

12. McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. Y. (2017). Communication-efficient learning of deep networks from decentralized data. Artificial Intelligence and Statistics, 1273–1282. http://proceedings.mlr.press/v54/mcmahan17a.html

13. Rahman, S. A., Tout, H., Talhi, C., & Mourad, A. (2020). Internet of Things Intrusion Detection: Centralized, On-Device, or Federated Learning? IEEE Network, 34(6), 310–317. https://doi.org/10.1109/MNET.011.2000286

14. Mohri, M., Sivek, G., & Suresh, A. T. (2019). Agnostic Federated Learning. 11.

15. Li, T., Sanjabi, M., Beirami, A., & Smith, V. (2020). Fair Resource Allocation in Federated Learning. ArXiv:1905.10497 [Cs, Stat]. http://arxiv.org/abs/1905.10497

16. Vanerio, J., & Casas, P. (2017). Ensemble-learning Approaches for Network Security and Anomaly Detection. Proceedings of the Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, 1–6. https://doi.org/10.1145/3098593.3098594
