FedGroup: A Federated Learning Approach for Anomaly Detection in IoT Environments

Yixuan Zhang1, Basem Suleiman1 [0000-0003-2674-0253], and Muhammad Johan Alibasa2 [0000-0002-2335-0404]
1School of Computer Science, University of Sydney, Australia
{nikki.zhang,basem.suleiman}@sydney.edu.au
2School of Computing, Telkom University, Indonesia
alibasa@telkomuniversity.ac.id
Abstract. The increasing adoption of IoT devices in smart home environments has raised concerns about the data security and privacy of smart home users. Several studies have employed traditional machine learning to address a key security challenge, namely anomaly detection in IoT devices. Such models, however, require transmitting sensitive IoT data to a central model for training and validation, which introduces security and performance concerns. In this paper, we propose a federated learning approach for detecting anomalies in IoT devices. We present our FedGroup model and algorithms, which train and validate local models based on data from a group of IoT devices. FedGroup also updates the learning of the central model based on the learning changes that result from each group of IoT devices, rather than computing the average learning of each device. Our empirical evaluation on a real IoT dataset demonstrates that FedGroup achieves anomaly detection accuracy that is the same as or better than that of federated and non-federated learning models. FedGroup is also more secure and performs well, given that all the IoT data are used to train and update the models locally.
Keywords: Internet of Things (IoT) · Anomaly Detection · Federated Learning · Machine Learning · Privacy · Smart Home.
1 Introduction
With the advancement of Internet technologies, it was predicted that the number of Internet of Things (IoT) devices in the smart home environment in 2020 would be 7 devices per person [1]. In smart home environments, devices such as sensors, smartphones, and smart TVs are connected to the Internet so that they can be accessed and monitored remotely. To achieve this, data are continuously sensed so that these IoT devices can perform certain functions, for example, turning on the air conditioner when the temperature reaches a certain degree or switching off the TV when no one is detected watching it [2]. The significant growth of IoT devices in smart homes has also attracted research interest in the very large amount of data that are collected and used to support different types of intelligent services for smart homes [3]. The collected data can be used to develop intelligent data-driven models for enhancing the user's experience of smart homes.
The connectivity of smart home devices to the Internet and continuous data sensing have brought a number of key challenges to smart home users, including data privacy and malicious access to and control of sensitive IoT devices. The settings of IoT devices do not take users' privacy and security as a priority. Such IoT devices are often vulnerable to network attacks given that they are connected to the Internet, and these attacks can be pervasive [4]. Such attacks may include accessing and transferring the data sensed by these devices, remotely switching off security and monitoring cameras, and remotely opening doors for unauthorised persons. Recent research reported that around 59% of users are concerned about smart devices listening to them without permission and gathering data without their knowledge [5]. Therefore, it is crucial to maintain the highest levels of privacy and security while these IoT devices are used in smart homes. To address this challenge, a large body of research employs AI-based approaches to detect anomalous behaviour in such IoT devices [6][7][8][9]. These approaches focus heavily on traditional machine learning models, which bring new challenges of their own. Such machine learning approaches require transmitting all data sensed from all IoT devices to an external server to train and validate central anomaly detection models. This can be very expensive and could exhaust the bandwidth of the network. It can also expose the sensitive data collected from the IoT devices over the network, which makes the data vulnerable to cyber-attacks. Encrypting and decrypting the data would not resolve the situation, as it could add performance overhead.
In this study, we address the anomaly detection problem on IoT devices by employing a federated learning approach. Federated learning allows the training of local models based on data sensed from a group of IoT devices within a smart home. The local models do not need to transmit the raw IoT data, but only the model updates that result from local training. The model (parameter) updates are then shared with a central model, which aggregates the values of the learning parameters and sends them back to all local models so that they can update their learning. Although this federated learning approach addresses the data security and performance overhead concerns posed by traditional machine learning models, it has its own unique challenges. In this paper, we address the research question: how can local and central models be designed to work in federated learning settings, given the various IoT devices in smart homes?
Existing federated learning approaches suggest using the overall average to update the learning parameters shared by all local models. This might not be practical, as it ignores the bias of the local models (each might have different data, with or without anomalies). We propose a new federated learning model called FedGroup for anomaly detection in smart home environments. We present algorithms that detail the training process on data collected from a group of IoT devices and the process of updating learning parameters in the central model based on the learning results from each group of IoT devices. FedGroup addresses the bias that results from averaging all updates from every device.
The main contributions of this paper are:
1. A new federated learning model and algorithms, called FedGroup, for anomaly detection on IoT devices. The FedGroup model computes the learning updates based on parameters from a group of IoT devices.
2. An empirical evaluation of FedGroup on real data collected from various IoT devices [6][7][8][9]. The evaluation also includes a performance analysis of FedGroup against federated learning and non-federated learning models.
The rest of this paper is organised as follows. Section 2 describes related work in the field of anomaly detection for IoT devices. Section 3 presents the dataset used in this study and the FedGroup model and algorithms proposed in this paper. Experiments and results are then presented in Section 4. Section 5 presents key conclusions and future work.
2 Literature Review
Various traditional machine learning and deep learning approaches have been utilised to identify attacks on IoT devices. Stojkoska et al. (2017) [10] suggest that cloud-centric, holistic IoT-based frameworks for smart home environments require substantial data storage and processing infrastructure, and that the current state is far from efficient. They highlighted that new approaches should address the issue of massive data management on the cloud. Furthermore, future studies have to investigate different methods of ensuring security, since cloud-based techniques pose an enormous risk of revealing personal information and data, which is considered an urgent issue.
Past research tended to focus on centralised anomaly detection, in which the cloud collects data from various sources. This raises several issues, including high communication load and data privacy. Federated Learning (FL) was proposed to communicate only lightweight updates, and it was shown to successfully predict text input on mobile devices [11]. FL merges the updates from all the distributed devices, so the computation on the cloud is significantly reduced, resulting in improved scalability and lightweight communication [12]. Another study [13] compared IoT intrusion detection using different approaches, including centralised, on-device and FL. FL reached an accuracy similar to that of the centralised approach. In addition, the study suggests that FL outperformed the on-device approach, as it could take advantage of the knowledge from other devices. FL thus answers significant drawbacks of centralised ML models, which are expensive, computationally demanding, and poorly scalable.
Mohri et al. (2019) [14] indicated that different clients might be weighted differently by FL, resulting in unfairness. Fairness in this context refers to both the training data and the training procedures. The study indicates that the uniform distribution is not the common distribution in many cases. Therefore, minimising the expected loss with respect to one specific distribution is harmful and might lead to a mismatch with the target. Consequently, the study presented an agnostic FL framework in which the centralised model is optimised for any target distribution produced by a mixture of client distributions, utilising data-dependent Rademacher complexity. However, optimising for the single worst device is limiting for a smart environment with numerous IoT devices. In separate research, Li et al. (2020) [15] concur that an unfair distribution of the model could bring disproportionate performance, since overall accuracy is high but individual accuracy is uncertain. The generated model may be biased towards devices with massive data. Their study developed an enhanced model configured at a more granular scale to ensure equitable device distribution and maintain the same overall accuracy.
One study [16] analysed ensemble learning (EL) for network security and anomaly detection and found that it can perform well. EL combines multiple learning models to achieve better prediction results. Furthermore, an ensemble of models is more resilient in the face of training data uncertainty. The EL concept is similar to how FL aggregates the training results, which opens up an opportunity to incorporate ensemble learning into FL.
As shown above, many studies found that FL performs better than traditional ML and confirmed the high privacy level of FL. However, research on identifying attacks using FL is still scarce and has not been explored in depth. There are also open issues; for instance, the biases of the distributed models are averaged to produce the final global model, which causes unfairness. Past studies neglected the reality that various local models have distinct functionality and structure. In particular, the research gap in smart home environments is that past research failed to address the similarity of network traffic flow data patterns among device models in the same category. IoT devices of the same type have similar vulnerability structures under similar attacks. Therefore, IoT devices within the same group should use similar parameters for anomaly detection. While aggregating all updates together seems a straightforward method, bias in the training phase arises because participants' parameter updates differ from one another and only their average is selected.
3 Methodology
Our study aims to build an anomaly detection model that detects whether there are any attack attempts (Attack Detection). Section 3.1 describes the research data: the network traffic flow data and the attack data that form the original input. Section 3.2 then details the design of the models, and Section 3.3 describes the preparation and evaluation process.
3.1 IoT Datasets
Our dataset was obtained from the UNSW IoT analytics team and consists of real-world attacks designed to assess the privacy and security dangers of IoT devices [6][7][8][9]. The dataset was collected from 28 unique IoT devices in various categories and multiple non-IoT devices in the smart environment. There are 30 PCAP files consisting of both attack and benign data, collected in two separate stages: the first stage is between 28/05/2018 and 17/06/2018, and the second stage is from 24/09/2018 to 26/10/2018. IoT devices are defined as devices linked to the Internet with application logic and executing TCP/IP connections. Ten IoT devices in this dataset contain both benign and attack traffic, whereas the others contain only benign data. Therefore, this research focuses on these ten IoT devices, which have a wireless connection to the Internet and fall into the four categories listed in Table 1. The datasets used in our study can be found in the Flow and Annotation data³; the implementation of our algorithms and models, together with supplementary results and materials, can be accessed from the project repository⁴.
| IoT Device No. | MAC Address       | IoT Device           | Category          |
|----------------|-------------------|----------------------|-------------------|
| IoT Device 0   | 00:16:6c:ab:6b:88 | Samsung Smart Cam    | Camera            |
| IoT Device 1   | 00:17:88:2b:9a:25 | Philips Hue Lightbulb | Energy management |
| IoT Device 2   | 44:65:0d:56:cc:d3 | Amazon Echo          | Controllers/Hubs  |
| IoT Device 3   | 50:c7:bf:00:56:39 | TP-Link Plug         | Energy management |
| IoT Device 4   | 70:ee:50:18:34:43 | Netatmo Camera       | Camera            |
| IoT Device 5   | 74:c6:3b:29:d7:1d | iHome PowerPlug      | Energy management |
| IoT Device 6   | d0:73:d5:01:83:08 | LiFX Bulb            | Energy management |
| IoT Device 7   | ec:1a:59:79:f4:89 | Belkin Switch        | Energy management |
| IoT Device 8   | ec:1a:59:83:28:11 | Belkin Motion Sensor | Energy management |
| IoT Device 9   | F4:F5:D8:8F:0A:3C | Chromecast Ultra     | Appliances        |
Table 1. IoT devices included in the dataset
The network traffic flow data of the ten IoT devices are collected every minute, marked with activity, and recorded in ten separate Excel files. The files contain "Timestamp", "NoOfFlows", and a large number of pattern attributes. Attribute names follow the form "From###Port###Packet": values such as "InternetTcp", "InternetUdp", "LocalTcp", and "LocalUdp" follow "From" and "To", and the value after "Port" is a port number. Since packet counts and byte counts are not closely connected and the sizes of the packets in this dataset vary, we include both when predicting attacks. From the network traffic flow data alone, it is unknown which network flow is going to or coming from which IoT device, because different IoT devices can use the same port number and one device can use different port numbers simultaneously.
³ https://iotanalytics.unsw.edu.au/attack-data
⁴ https://github.com/BasemSuleiman/IoT Anomaly Detection Smart Homes
The UNSW IoT analytics team designed a set of attacks that are comparable to real-world attacks and specific to several real-world consumer IoT devices. The tools were written in Python to find susceptible and vulnerable devices on the local network by running different tests against them. The program then performs targeted attacks on the IoT devices that are susceptible. The attack annotations include the start and end time of each attack, the impact of the attack, and the attack type. A flow is labelled as normal behaviour or as under attack according to the rule "if flowtime >= startTime × 1000 and endTime × 1000 >= flowtime, then attack = true". The start and end times are multiplied by 1000 because the times are recorded in different units: flow time is in milliseconds, while start time and end time are not.
3.2 Proposed Approach: FedGroup
In FedAvg, local devices accept the initial model from the central server, train models on decentralised local device servers, and report the best-performing parameters to the central model [11]. In traditional ML, there is only one step, which is a client-to-server upload step. In contrast with traditional ML, FedAvg sends code to the data rather than sending data to the code. For FL-based learning, there are four steps in one iteration:
1. A server-to-client broadcast step
2. A local client update step
3. A client-to-server upload step
4. A server update step
While it is simple for FedAvg to collect all the parameters from the local servers and select their mean as the parameter for the following round, its main weakness is the failure to address the similarity of network traffic flow data patterns among device models in the same category. Furthermore, the devices in smart homes are not assigned to different groups based on their similarity, even though devices in the same group should have similar functionalities and vulnerability risks. The bias in the training procedure comes from device parameter updates that differ from each other being reduced to a simple average. Imagine that the IoT devices in a smart home are mostly energy management appliances such as plugs or bulbs: the parameters of the cloud server will be biased towards the energy management devices because they outnumber the devices in other groups.
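A small numeric illustration makes this bias concrete. The values below are invented for the example (they are not taken from the paper's experiments); the group sizes mirror Table 1, with two cameras and six energy management devices, and the single number per device stands in for one reported hyperparameter.

```python
from statistics import mean

# Hypothetical best value of one hyperparameter reported by each device.
reported = {
    "camera": [12.0, 14.0],
    "energy_management": [4.0, 5.0, 4.0, 6.0, 5.0, 6.0],
}

# FedAvg-style aggregation: one mean over every device, regardless of category.
all_values = [v for values in reported.values() for v in values]
global_mean = mean(all_values)

# Per-category means, shown only to expose how far the global mean drifts.
category_means = {name: mean(values) for name, values in reported.items()}

print(global_mean)     # 7.0
print(category_means)  # {'camera': 13.0, 'energy_management': 5.0}
```

In this toy setting the global mean sits much closer to the energy management devices than to the cameras, simply because there are more of them.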
Therefore, we present FedGroup, an extension of FedAvg with group masters, in which devices send parameters to a group master rather than directly to the central server. The group masters combine devices within the same group and aggregate parameters per group. The central model contains multiple group masters, each of which sends the aggregated results back to the corresponding participating local models. This is more efficient because each local model can focus on the information from within its own group.
Products can be grouped in many ways. A group can be defined by the category of the IoT devices, such as Camera or Energy management. Alternatively, a group can be defined by other characteristics, such as features. For instance, a smart door product offers many ways to open the door, such as app control, fingerprint recognition, entering a password, scanning a smart card, or simply unlocking with a key. If groups are defined by feature and the smart door is under attack, the central model can tell which specific part is under attack.
The following are the steps of FedGroup (Fig. 1):
1. Every local model trains on its network traffic flow data with all candidate parameters and sends the best-performing parameters to the central model;
2. The group master in the central model aggregates the parameters by group;
3. The group master sends the aggregated results back to the devices in the corresponding group;
4. The local models update themselves with the new parameters.
Fig. 1. The architecture of our proposed approach FedGroup
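The group-master steps (aggregate by group, then send each device the result for its own group) can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's code: the function names `aggregate_by_group` and `broadcast`, the parameter names, and the scores are all hypothetical, and aggregation is assumed to be a plain per-group mean of numeric parameters.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_group(updates):
    """Average parameters and scores within each device group.

    `updates` holds one (device, group, best_params, best_score) tuple per device.
    Returns {group: (average_params, average_score)}.
    """
    by_group = defaultdict(list)
    for _device, group, params, score in updates:
        by_group[group].append((params, score))

    results = {}
    for group, members in by_group.items():
        names = members[0][0].keys()
        avg_params = {name: mean(params[name] for params, _ in members) for name in names}
        avg_score = mean(score for _, score in members)
        results[group] = (avg_params, avg_score)
    return results


def broadcast(group_results, device_groups):
    """Map every device to the aggregated parameters of its own group."""
    return {device: group_results[group][0] for device, group in device_groups.items()}


# Hypothetical updates: (device, group, best parameters, best score).
updates = [
    ("device_0", "Camera", {"max_depth": 12, "n_estimators": 100}, 0.99),
    ("device_4", "Camera", {"max_depth": 14, "n_estimators": 200}, 0.97),
    ("device_3", "Energy management", {"max_depth": 4, "n_estimators": 50}, 0.98),
]
group_results = aggregate_by_group(updates)
new_params = broadcast(group_results, {d: g for d, g, _, _ in updates})
print(new_params["device_0"])  # {'max_depth': 13, 'n_estimators': 150}
```

Averaging can produce non-integer values for integer hyperparameters, so a real system would need a rounding or selection rule; the sketch leaves that choice open.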
Definition: Let $X_n$ represent the network traffic flow data of IoT device $n$ and $y_n$ the prediction target. In the network $N_{D_n G_i}$, $N$ denotes the network, $D_n$ denotes device $n$, and $G_i$ denotes group $i$; $X_n$ and $M_n$ are included in $N_{D_n G_i}$. During training, we track the best score $S$, the best parameters $B$, the average score of the entire model $C$, and the average parameters of the entire model $A$. For each model $M$, $P = \{a, b, \ldots\}$ denotes its parameters, such as weights or n_estimator, and $p = \{a_0, a_1, \ldots\}, \{b_0, \ldots\}, \ldots$ denotes the grid of all possible values of each parameter; for example, n_estimator can take the values 1, 2, and so on. $E$ represents the parameter grids selected in the local models after the update to the central model.
Algorithm 1 FedGroup: Client Side Learning Algorithm
INPUT: P, E
REQUIRE: X_n, y_n, M
OUTPUT: B and S to Central Server Side Learning: FedAvg with the related Group Master
SET: Local Model M
/* Fit possible parameter grids and return the best parameters and the best score */
for e ∈ E do
    Fit P in M with different grid e to train X_n and y_n
    Test M to get the accuracy
    CALCULATE B and S
end for
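The client-side loop of Algorithm 1 amounts to a grid search that keeps the best parameters B and the best score S. The sketch below is one possible reading of it and is not the authors' implementation: `client_side_learning` and `threshold_accuracy` are hypothetical names, and a one-feature threshold rule on made-up data stands in for fitting and testing a real local model M.

```python
from itertools import product


def client_side_learning(fit_and_score, grid):
    """Try every combination in `grid`; return (best_params B, best_score S)."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        score = fit_and_score(params)  # "fit M with grid e, then test M"
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Toy stand-in for local data X_n, y_n (1 = attack, 0 = benign).
X = [10, 20, 30, 400, 500, 25]
y = [0, 0, 0, 1, 1, 0]


def threshold_accuracy(params):
    """Accuracy of the rule 'attack if x * scale > threshold' on the toy data."""
    predictions = [int(x * params["scale"] > params["threshold"]) for x in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


B, S = client_side_learning(threshold_accuracy, {"threshold": [5, 50, 450], "scale": [1]})
print(B, S)  # {'threshold': 50, 'scale': 1} 1.0
```

Only B and S leave the device in this sketch; the raw data X and y stay local, which is the property the federated setting relies on.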
Algorithm 2 FedGroup: Group Master Algorithm
INPUT: B and S of each d
DISPLAY: scores of each group
OUTPUT: A and C to Central Server Side Learning: FedGroup
CALCULATE A and C based on B and S
Algorithm 3 FedGroup: Central Server Side Learning Algorithm
INPUT: M, P, p
OUTPUT: A, C
/* 1st round: receive the best parameters and best devices from every model, and calculate the average parameters of each group */
for g ∈ G do
    for n ∈ N do
        Initial: M
        Client Side Learning: FedGroup(P, p)
        Return B and S of each N
    end for
    Group Master: FedGroup(B and S of each N)
    Return A and C
end for
/* 2nd round: send mean parameter to Client Server and return the mean score and average parameter of model */
for g ∈ G do
    for n ∈ N do
        Client Side Learning: FedGroup(P, A)
        Return B and S of each N
    end for
    Group Master: FedGroup(B and S of each N)
    Return A and C
end for
3.3 Experiment and analysis
For the network traffic flow data of the IoT devices, we remove "NoOfFlows" because it counts the number of flows, which is highly correlated with all the other attributes. Because different devices use the same port number and the same device uses different port numbers, there are 253 attributes in total describing the bytes and packets per port number. When the network behaviour of one device is captured by millisecond, many port numbers are not in use. In other words, NaN means there is no network behaviour for the corresponding port number, which is why the data contain missing values. To fill in the missing data, we use a global constant of 0, which is also the most likely value. It signifies no network behaviour, with zero packet-level and zero byte-level network traffic flow data at that time point.
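The two preprocessing steps described above (dropping the flow counter and filling missing values with 0) can be sketched with the standard library alone. The function name `clean_flow_rows` and the attribute names in the example row are hypothetical; they only imitate the "From###Port###Packet" naming pattern and are not taken from the dataset files.

```python
import math


def clean_flow_rows(rows, drop=("NoOfFlows",)):
    """Drop the listed columns and replace None/NaN entries with 0."""
    cleaned = []
    for row in rows:
        out = {}
        for name, value in row.items():
            if name in drop:
                continue
            missing = value is None or (isinstance(value, float) and math.isnan(value))
            out[name] = 0 if missing else value
        cleaned.append(out)
    return cleaned


# Made-up row: one port saw no traffic in this interval, so its packet count is NaN.
rows = [
    {
        "Timestamp": 1,
        "NoOfFlows": 3,
        "FromInternetTcpPort443Packet": float("nan"),
        "FromInternetTcpPort443Byte": 120,
    }
]
cleaned = clean_flow_rows(rows)
print(cleaned[0])
# {'Timestamp': 1, 'FromInternetTcpPort443Packet': 0, 'FromInternetTcpPort443Byte': 120}
```

Filling with 0 is a reasonable constant here only because a missing value means "no traffic on that port", as the paragraph explains; it would not be a safe default for data where NaN means "unknown".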
Predicting whether a flow is an attack is a choice between two options: attack or non-attack. We implement Decision Tree, Logistic Regression and Ensemble Learning as local models, with traditional ML, FedAvg, and FedGroup as central models, for attack detection. To reduce overfitting, StratifiedShuffleSplit splits the dataset into 80% training and 20% testing data. On the training data, Stratified 5-Fold Cross-Validation divides the data into five folds, fits the model on four folds, and validates it on the remaining fold. We then evaluate on the 20% testing data to compute accuracy, using an F1 score with a weighted average.
To evaluate the model further, we use the False Positive Rate (FPR), the probability of falsely rejecting the null hypothesis, as a measure of the accuracy of the test.
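For reference, both metrics can be computed directly from their definitions: FPR = FP / (FP + TN), and the weighted F1 score is the per-class F1 averaged with class supports as weights. The sketch below uses made-up labels and assumes that the positive class is "attack" (label 1); the paper does not state which class it treats as positive, so that choice is an assumption.

```python
from collections import Counter


def false_positive_rate(y_true, y_pred, positive=1):
    """FP / (FP + TN): share of true negatives that were flagged as positive."""
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    tn = sum(t != positive and p != positive for t, p in zip(y_true, y_pred))
    return fp / (fp + tn) if fp + tn else 0.0


def weighted_f1(y_true, y_pred):
    """Per-class F1 averaged with each class's support as its weight."""
    total = 0.0
    for label, support in Counter(y_true).items():
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denominator = 2 * tp + fp + fn
        f1 = 2 * tp / denominator if denominator else 0.0
        total += f1 * support
    return total / len(y_true)


# Made-up labels: 8 benign flows (0) and 2 attack flows (1).
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 0, 0, 0, 0, 1, 1, 0]
print(false_positive_rate(y_true, y_pred))  # 0.125
print(weighted_f1(y_true, y_pred))          # 0.8
```

Here one of the eight benign flows is flagged as an attack, giving an FPR of 1/8, while the weighted F1 combines a benign-class F1 of 0.875 and an attack-class F1 of 0.5.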
4 Results
This study developed an anomaly detection system using our proposed model, FedGroup, as described in the previous section. Table 2 summarises the attack detection results for Decision Tree, Logistic Regression and Ensemble Learning as local models under traditional ML, FedAvg and FedGroup. EL can merge several models even if the individual models are weak, and we use it as an initial local training model. A single ML model used as the initial model for every IoT device does not always perform as expected, because each model is designed to solve a specific question or type of question. For example, logistic regression effectively classifies data into discrete classes by investigating the relationships within a collection of labelled data. However, if the number of features is greater than the number of observations, Logistic Regression should not be used. Since individual machine learning models sometimes perform well and sometimes do not on a given problem, ensemble learning joins various contributing models to seek better predictions.
| Local Model         | Central Model  | Attack Detection Accuracy | Running Time (seconds) | FPR    |
|---------------------|----------------|---------------------------|------------------------|--------|
| Decision Tree       | Traditional ML | 99.84%                    | 8524                   | 10.04% |
| Decision Tree       | FedAvg         | 99.85%                    | 154                    | 9.57%  |
| Decision Tree       | FedGroup       | 99.87%                    | 154                    | 7.70%  |
| Logistic regression | Traditional ML | 99.76%                    | 21376                  | 24.48% |
| Logistic regression | FedAvg         | 99.77%                    | 2912                   | 20.28% |
| Logistic regression | FedGroup       | 99.77%                    | 2999                   | 20.18% |
| Ensemble learning   | Traditional ML | 99.85%                    | 33940                  | 9.60%  |
| Ensemble learning   | FedAvg         | 99.91%                    | 2390                   | 9.03%  |
| Ensemble learning   | FedGroup       | 99.91%                    | 2143                   | 9.43%  |
Table 2. The accuracy of FedGroup, FedAvg and traditional ML using different models
Firstly, anomaly detection can determine whether there is any attack attempt. The highest attack detection accuracy, 99.91%, was reached by the FedGroup model using Ensemble learning as the local model (FedAvg with the same local model reached the same accuracy). Secondly, FL-based learning models performed similarly to or better than the traditional ML models. Because attack detection is a binary classification problem and the accuracy of all models is above 99%, even though StratifiedShuffleSplit is used to mitigate overfitting, accuracy alone does not distinguish the models well. Therefore, FPR is a more reliable evaluation metric, since a higher FPR indicates that a higher proportion of negative events is incorrectly categorised as positive. As shown in Table 2, the FPRs of the FL-based models are lower than those of the traditional ML model, indicating better performance with less overfitting.
The running time of the FL-based models is less than that of the traditional ML model; the client-side model takes O(n) and the central server takes O(n^2). For example, using Ensemble learning as the local model and FedGroup as the central model takes 2143 seconds, which is around 1/16 of the time spent on traditional ML (33940 seconds) and 0.9 of the time spent on FedAvg (2390 seconds). As a result of lightweight communication, no central authority, and a decentralised learning model, FL uses the advantages of training on data locally to reduce the running time. In addition, data safety is guaranteed because the data are not sent to, communicated to, or shared with other IoT devices or the Internet.
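The two ratios quoted above follow directly from the Ensemble learning rows of Table 2. Writing $T$ for running time in seconds (a symbol introduced here only for this check):

```latex
\[
\frac{T_{\text{FedGroup}}}{T_{\text{Traditional ML}}} = \frac{2143}{33940} \approx 0.063 \approx \frac{1}{16},
\qquad
\frac{T_{\text{FedGroup}}}{T_{\text{FedAvg}}} = \frac{2143}{2390} \approx 0.897 \approx 0.9 .
\]
```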
Each smart home has a large number of IoT devices that make our lives easier and more efficient. If we focus on FPR differences larger than 1%, the FPRs of FedGroup are better than those of FedAvg (for Decision Tree, 7.70% versus 9.57%). Different IoT devices have different vulnerable functions and may be under different attacks. Meanwhile, similar attacks may have similar functionality or patterns. When the central model learns attack types from the same category of IoT devices, FedGroup is useful for providing parameters to IoT devices within the same group. Beyond smart homes, when smart environments are combined to build smart cities or industries, FedGroup can learn attack detection and attack type detection based on group categorisation, such as a traffic light group, a subway group, and others.
5 Conclusion
In this paper, we introduce a new model and algorithms, called FedGroup, which address the issue of IoT anomaly detection in smart home environments. FedGroup allows training and detecting anomalies based on data collected from a group of devices, and thus reduces the vulnerability of IoT data that would otherwise be transmitted to and shared on a central server. We evaluate our FedGroup approach on a real dataset collected from various IoT devices in smart-home settings to detect anomalous behaviour. Based on our experimental results, it can be concluded that FedGroup matched or improved on the anomaly detection accuracy of the traditional FedAvg. Furthermore, FedGroup can address the issue of fairness of the training procedure and can maintain data privacy, as only the values of learning parameters need to be shared with the central model. Our results also demonstrated that Ensemble Learning as the local model in FedGroup achieved the best accuracy, 99.91%.
While our findings provide comparison results for different models, more empirical studies on continuous real-time learning and on alternative ways to ensure the fairness of federated learning need to be conducted to further test and refine them. In addition, expanding the model to other frameworks not limited to anomaly detection, determining the system cost, and studying how the link instability of wireless networks affects model updating are several opportunities for future work.
References
1. Evans, D. (2011). How the Next Evolution of the Internet Is Changing Everything.
11.
2. Robles, R. J., & Kim, T. (2010). Applications, Systems and Methods in Smart Home
Technology: A Review. International Journal of Advanced Science and Technology,
15, 13.
3. Gubbi, J., Buyya, R., Marusic, S., & Palaniswami, M. (2013). Internet of Things
(IoT): A vision, architectural elements, and future directions. Future Generation
Computer Systems, 29(7), 1645–1660. https://doi.org/10.1016/j.future.2013.01.010
4. Abomhara, M., & Køien, G. M. (n.d.). Security and privacy in the Internet of
Things: Current status and open issues. 8.
5. 59 per cent of smart speaker users have privacy concerns report Mo-
bile Marketing Magazine. Mobilemarketingmagazine.com. (2021). Retrieved 23
October 2021, from https://mobilemarketingmagazine.com/59-per-cent-of-smart-
speaker-users-have-privacy-concerns-report-.
6. Habibi Gharakheili, H., Sivanathan, A., Hamza, A., & Sivaraman, V. (2019).
Network-Level Security for the Internet of Things: Opportunities and Challenges.
Computer, 52(8), 58–62.
https://doi.org/10.1109/MC.2019.2917972
7. Hamza, A., Gharakheili, H. H., Benson, T. A., & Sivaraman, V. (2019). De-
tecting Volumetric Attacks on loT Devices via SDN-Based Monitoring of MUD
Activity. Proceedings of the 2019 ACM Symposium on SDN Research, 36–48.
https://doi.org/10.1145/3314148.3314352
8. Sivanathan, A., Gharakheili, H. H., Loi, F., Radford, A., Wijenayake, C., Vish-
wanath, A., & Sivaraman, V. (2019). Classifying IoT Devices in Smart Environ-
ments Using Network Traffic Characteristics. IEEE Transactions on Mobile Com-
puting, 18(8), 1745–1759. https://doi.org/10.1109/TMC.2018.2866249
9. Sivaraman, V., Gharakheili, H. H., Fernandes, C., Clark, N., & Kar-
liychuk, T. (2018). Smart IoT Devices in the Home: Security and Pri-
vacy Implications. IEEE Technology and Society Magazine, 37(2), 71–79.
https://doi.org/10.1109/MTS.2018.2826079
10. Stojkoska, B. L. R., & Trivodaliev, K. V. (2017). A review of Internet of Things for
smart home: Challenges and solutions. Journal of Cleaner Production, 140, 1454-
1464.
11. Yang, Q., Liu, Y., Chen, T., & Tong, Y. (2019). Federated Machine Learning:
Concept and Applications. ArXiv:1902.04885 [Cs]. http://arxiv.org/abs/1902.04885
12. McMahan, B., Moore, E., Ramage, D., Hampson, S., & Arcas, B. A. Y. (2017).
Communication-efficient learning of deep networks from decentralized data.
Artificial Intelligence and Statistics, 1273–1282.
http://proceedings.mlr.press/v54/mcmahan17a.html
13. Rahman, S. A., Tout, H., Talhi, C., & Mourad, A. (2020). Internet of Things
Intrusion Detection: Centralized, On-Device, or Federated Learning? IEEE Network,
34(6), 310–317. https://doi.org/10.1109/MNET.011.2000286
14. Mohri, M., Sivek, G., & Suresh, A. T. (2019). Agnostic Federated Learning. 11.
15. Li, T., Sanjabi, M., Beirami, A., & Smith, V. (2020). Fair Resource Allocation in
Federated Learning. ArXiv:1905.10497 [Cs, Stat]. http://arxiv.org/abs/1905.10497
16. Vanerio, J., & Casas, P. (2017). Ensemble-learning Approaches for Network
Security and Anomaly Detection. Proceedings of the Workshop on Big Data
Analytics and Machine Learning for Data Communication Networks, 1–6.
https://doi.org/10.1145/3098593.3098594