RR-LADP: A Privacy-Enhanced Federated
Learning Scheme for Internet of Everything
Zerui Li, Harbin Institute of Technology, Shenzhen
Qing Liao, Harbin Institute of Technology, Shenzhen
Mohsen Guizani, Qatar University, Doha
Yuchen Tian, Harbin Institute of Technology, Shenzhen
Yang Liu, Harbin Institute of Technology, Shenzhen; Peng Cheng Laboratory
Weizhe Zhang, Harbin Institute of Technology, Shenzhen; Peng Cheng Laboratory
Xiaojiang Du, Temple University, Philadelphia
Abstract—While the widespread use of ubiquitously connected devices in IoE offers enormous benefits, it also raises serious privacy concerns. Federated learning, one of the promising solutions to alleviate such problems, is considered capable of training models without exposing the raw data kept by multiple devices. However, both malicious attackers and untrusted servers can deduce users' private information from the local updates of each device. Previous studies mainly focus on privacy-preserving approaches inside the servers, which requires the framework to be built on trusted servers. In this paper, we propose a privacy-enhanced federated learning scheme for IoE. Two mechanisms are adopted in our approach, namely the randomized response (RR) mechanism and the local adaptive differential privacy (LADP) mechanism. RR is adopted to prevent the server from knowing whose updates are collected in each round. LADP enables devices to add noise adaptively to their local updates before submitting them to the server. Experiments demonstrate the feasibility and effectiveness of our approach.
I. INTRODUCTION
The Internet of Everything (IoE) redefines the connections between people, things, and data and changes the way people interact with devices. In this era, any object can be transformed into network data through corresponding sensors. At the same time, advances in communication technology and the enhancement of edge computing capabilities facilitate the application of machine learning in the Internet of Everything, which means that network data can be effectively mined to support intelligent services. For example, by collecting static and dynamic information about users, smart furniture and intelligent software can improve the efficiency of work and life. In general, more data means better service. In this environment, most IoT devices continuously collect and upload private data during operation, which can potentially compromise users' privacy [?].

Recently, federated learning, which localizes the training process, has been seen as a promising machine learning mechanism for working with private data. It distributes the task of model training to multiple participants and aggregates local updates to iteratively generate a global model. The advantage is that the information delivered to the server is the model weight difference or the training gradient [?], instead of the raw data. This distributed learning setup decouples the training tasks from the need for the server to centralize data. It can use the private data of different users to learn a high-quality shared model, while leaving the raw data on the local IoT devices. Therefore, the risk of privacy leakage caused by data transmission, cloud storage, and centralized training can be effectively decreased. At the same time, the availability of the training data is guaranteed, as there is no need to encrypt the data before using it.
In fact, similar to most privacy solutions, federated learning is based on an important assumption: the process is scheduled by a trusted server. Untrusted servers and malicious attackers may perform model inversion attacks by obtaining the communication parameters of federated learning. In IoE solutions, data collected from sensors are transmitted through multiple layers to generate services. The multi-layer structure composed of hardware and software makes federated learning more vulnerable to potential attacks.

In this paper, we propose a privacy-enhanced federated learning scheme for IoE. Two mechanisms are adopted in our approach, namely the randomized response (RR) mechanism and the local adaptive differential privacy (LADP) mechanism. The main contributions are as follows.

• We propose the RR mechanism, which is executed by each device to enhance the privacy of device selection. In each training round of RR federated learning, the server cannot determine whether a particular device participated in the training. The mechanism, therefore, can prevent untrusted servers and malicious attackers from knowing which devices' updates are included in the communication content.

• We adopt the LADP mechanism in the local training stage. Gaussian noise is added adaptively to the local updates of each device before the updates are uploaded to the server. Hence, the mechanism can prevent even untrusted servers and malicious attackers from deducing information about the training data from the local updates.
II. BACKGROUND AND MOTIVATION

A. Federated learning

The general flow of federated learning using the FederatedAveraging (FedAvg) algorithm to aggregate updates is as follows [?].

Suppose there are a total of $K$ clients and each client has a private dataset. In each training round $t$, the server randomly selects $K_0$ ($K_0 \le K$) clients and sends them the global model with weights $\omega_{t-1}$. Each selected client $k$ trains the model on its private data and uploads the weight difference $\Delta\omega_t^k$. Finally, the server averages these local updates and generates a new global model, and the process repeats as:

$\omega_t = \omega_{t-1} + \frac{1}{K_0}\sum_{k}\Delta\omega_t^k$  (1)
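
As a point of reference, the following minimal sketch (in Python with NumPy; the function and variable names are ours, not from the paper) shows the server-side aggregation step of Equation (1): the weight differences uploaded by the $K_0$ selected clients are averaged and applied to the previous global model.

    import numpy as np

    def fedavg_aggregate(w_prev, client_deltas):
        # Equation (1): w_t = w_{t-1} + (1/K0) * sum_k dw_t^k
        #   w_prev        : flattened global weights w_{t-1}
        #   client_deltas : list of weight differences dw_t^k from the K0 selected clients
        k0 = len(client_deltas)
        return w_prev + sum(client_deltas) / k0

    # Toy usage: three clients, a five-parameter "model".
    w_prev = np.zeros(5)
    deltas = [0.01 * np.random.default_rng(i).standard_normal(5) for i in range(3)]
    w_next = fedavg_aggregate(w_prev, deltas)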
B. Differential privacy

Differential privacy provides a strong privacy guarantee for aggregate data [?]. Its definition is as follows.

Define two datasets to be adjacent if they differ in only a single record. A given mechanism $\mathcal{M}: \mathcal{D} \to \mathcal{R}$ has domain $\mathcal{D}$ and range $\mathcal{R}$. We say the mechanism $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-differential privacy if, for any two adjacent inputs $d, d' \in \mathcal{D}$ and for any subset of outputs $S \subseteq \mathcal{R}$, the following inequality holds:

$\Pr[\mathcal{M}(d) \in S] \le e^{\varepsilon}\Pr[\mathcal{M}(d') \in S] + \delta$  (2)

The privacy budget $\varepsilon$ bounds the privacy loss, and the slack variable $\delta$ allows the definition to be broken with a given probability.

The general way to realize this mechanism is to add Gaussian noise to approximate a real-valued function $f: \mathcal{D} \to \mathbb{R}$ with differential privacy. The noise is calibrated to the sensitivity $S_f$, which is the maximum absolute distance $|f(d) - f(d')|$, where $f(d)$ and $f(d')$ are the function values corresponding to the adjacent inputs $d$ and $d'$. We define the Gaussian noise addition mechanism as $\mathcal{M}(d) = f(d) + \mathcal{N}(0, S_f^2\sigma^2)$, where $\mathcal{N}(0, S_f^2\sigma^2)$ is Gaussian noise with mean $0$ and standard deviation $S_f\sigma$.
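
As a concrete illustration (a minimal sketch of ours, not code from the paper), the Gaussian mechanism above amounts to adding zero-mean noise with standard deviation $S_f\sigma$ to the output of $f$:

    import numpy as np

    def gaussian_mechanism(value, sensitivity, sigma, rng=None):
        # Return f(d) + N(0, (S_f * sigma)^2), applied element-wise.
        #   value       : f(d), a scalar or NumPy array
        #   sensitivity : S_f, the maximum |f(d) - f(d')| over adjacent datasets
        #   sigma       : noise multiplier; a larger sigma gives a stronger guarantee
        rng = rng or np.random.default_rng()
        return value + rng.normal(loc=0.0, scale=sensitivity * sigma, size=np.shape(value))

    # Example: privatize a mean query whose sensitivity is 1/n for a dataset of size n,
    # assuming each record lies in [0, 1].
    data = np.array([0.2, 0.9, 0.4, 0.7])
    noisy_mean = gaussian_mechanism(data.mean(), sensitivity=1.0 / len(data), sigma=1.0)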
C. Motivation

Privacy-preserving federated learning can be realized by leveraging differential privacy [?], which effectively reduces the possibility of inferring extra information from the data transferred in each training round. Ideally, this requires a trusted server to complete the noise addition operation. For the real-world training process, we consider the following situations:

• The server is curious: the server can normally complete the privacy-processing steps such as noise addition after FedAvg. At the same time, it wants to infer the private data of the clients.
• The server is incompetent: the server may fail to add noise to the averaged updates for some reason before releasing the global model. This puts all participating clients at great risk of privacy leakage.

Due to the deficiency of the centralized privacy-preserving approach, we adjust the client selection mechanism and shift the noise addition to the client side, in order to reduce the dependence on the server. In the context of FedAvg, we are more interested in the weight difference contributed by each client. In this way, the noise contained in the aggregation is the sum of the noise added by each client, and the aggregate satisfies differential privacy if the process on each client does [?].
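
The additivity argument can be checked directly. The short sketch below (our own illustration with invented variable names) verifies that when each client perturbs its update with independent Gaussian noise, the aggregate carries the sum of those noises, whose variance is the sum of the per-client variances.

    import numpy as np

    rng = np.random.default_rng(0)
    k0, dim, sigma_local = 100, 10_000, 0.5

    clean_updates = rng.standard_normal((k0, dim))               # clean local updates
    local_noise = rng.normal(scale=sigma_local, size=(k0, dim))  # independent per-client noise
    noisy_sum = (clean_updates + local_noise).sum(axis=0)        # what the aggregator sees

    # The aggregate equals the clean sum plus Gaussian noise whose standard deviation
    # is sqrt(k0) * sigma_local, i.e. the per-client noises simply add up.
    residual = noisy_sum - clean_updates.sum(axis=0)
    print(residual.std(), np.sqrt(k0) * sigma_local)             # empirically close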
III. STATE-OF-THE-ART
Instead of collecting data from clients and training the model on the server in a centralized way, federated learning allows multiple clients to learn a model collaboratively while keeping their data local. It provides a new solution for preserving privacy in machine learning. Google first proposed federated learning [?], a privacy-preserving collaborative modeling mechanism, and applied it to input prediction and query suggestions in Gboard [?], [?]. Konecny et al. [?] used structured updates and model compression to reduce uplink and downlink communication costs. Bonawitz et al. [?] proposed a protocol to improve the robustness of federated learning. However, these works did not consider the privacy risks of the federated learning mechanism itself.
The research of Fredrikson et al. [?] shows that, after training, sample data involved in model training can be reconstructed from the model parameters, even if the data remain local. To minimize such disclosure, Geyer et al. [?] incorporated differential privacy into the aggregation update on the server side. Differential privacy can indeed reduce the correlation between the final model and the aggregated updates. However, it is a post-processing method with some limitations. Dealing with the results of FedAvg directly while ignoring the training process may reduce the usability of the model, and the added noise makes it harder to observe the true expression of the raw data. Moreover, such an approach ignores the protection of the updates transmitted during communication, leaving the model vulnerable to inversion attacks. Agarwal et al. [?] proposed to add noise in a distributed manner to approximate global privacy, and Wei et al. [?] applied this method to federated learning. The global model satisfies differential privacy when each part does. However, the noise addition is performed by the server, so such privacy preservation is invalid against untrusted servers. Some researchers [?], [?] incorporated homomorphic encryption into federated learning. In such encryption-based methods, local updates are transmitted and computed in the form of ciphertext, so privacy can be well preserved without losing model accuracy. However, the types of calculations supported by homomorphic encryption are limited. Encryption, decryption, and computation over ciphertext incur large computational overhead [?], and transmitting ciphertext is also time consuming. These limitations make the approach impractical for IoE.
IV. PROPOSED METHOD
A. The system design
Considering the diversity of devices, we uniformly term the actual devices in IoE as clients for the convenience of analysis. Fig. 1 illustrates the main components and the process of RR-LADP. For a collaborative learning task, the server distributes the initial model and samples clients. Then each client randomly responds to the training request. After that, the actual participants train the model locally with their private data and add an appropriate amount of noise to the weight differences for privacy preservation. Finally, an edge computing node (i.e., an edge server running a secure multi-party summation protocol) aggregates all local updates and returns the result to the server to update the global model. Through several rounds, the global model, which incorporates contributions from multiple clients, can perform well.

Fig. 1: The RR-LADP Framework.
B. RR federated learning
Due to the huge number of devices in the IoE environment, it is impracticable to involve every device that meets the training requirements in each training round. Training tasks involving a large number of devices are more likely to be interrupted, which poses a great challenge to the distributed decision-making capabilities of the server [?], [?]. In fact, only a fraction of the devices suffices to generate a desirable model, and in this way the communication pressure can be effectively reduced. However, in the traditional federated learning flow, the server controls the whole process of client selection. We propose a disturbance mechanism, termed RR, which introduces a deviation between the actual participants and the server's sampling results. Therefore, it is hard for a curious server or an attacker to link the final model to a particular client.
Before each round of communication, the server checks and establishes communication with the IoT devices that meet the training requirements. Suppose a total of $K$ eligible clients participate in the global model construction. The server randomly selects $K_0$ ($K_0 \le K$) clients for training. We define a state parameter $\lambda_t^k$, with value $0$ or $1$, representing whether client $k$ participates in round $t$, and a response probability $p$. Based on the server's sampling results, the clients initialize their state parameters. All clients keep their state parameters unchanged with probability $p$ and flip them with probability $(1 - p)$. Then, the participation probability of each client $k$ is:

$\Pr[\lambda_t^k = 1] = \frac{K_0}{K}p + \left(1 - \frac{K_0}{K}\right)(1 - p)$  (3)

According to Equation (3), the server can estimate the number of participants as $\hat{K}_t = \Pr[\lambda_t^k = 1] \cdot K$. By constructing a maximum likelihood function, we can verify that $\hat{K}_t$ is an unbiased estimate of $K_t$, the number of actual participants in round $t$. In this way, the server obtains a value close to $K_t$ for FedAvg [?] in each round.
Additionally, we set $p = \frac{e^{\varepsilon}}{e^{\varepsilon} + 1}$ to satisfy $\varepsilon$-differential privacy, so that RR federated learning meets rigorous rather than merely intuitive privacy guarantees [?]. The response probability $p$ and the privacy budget $\varepsilon$ are positively correlated. The higher the privacy budget, the more likely the selected clients are to respond, which means the actual participation is close to the server's sampling results and may cause a risk of privacy leakage. Conversely, a lower privacy budget leads to a higher flip probability and a lower risk.
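
In the sketch below (our own Python illustration; none of the names come from the paper), each client keeps the server's sampling decision with probability $p = e^{\varepsilon}/(e^{\varepsilon}+1)$ and flips it otherwise, and the server estimates the number of actual participants from Equation (3).

    import numpy as np

    def rr_participation(sampled_states, epsilon, rng):
        # Randomized response on each client's sampling state (1 = selected, 0 = not selected).
        p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)                # keep probability
        keep = rng.random(len(sampled_states)) < p
        return np.where(keep, sampled_states, 1 - sampled_states)   # flip with probability 1 - p

    rng = np.random.default_rng(0)
    K, K0, epsilon = 100, 30, 1.0
    sampled = np.zeros(K, dtype=int)
    sampled[rng.choice(K, size=K0, replace=False)] = 1               # the server's sampling result

    lam = rr_participation(sampled, epsilon, rng)                    # actual participation states
    p = np.exp(epsilon) / (np.exp(epsilon) + 1.0)
    K_hat = ((K0 / K) * p + (1 - K0 / K) * (1 - p)) * K              # estimate from Equation (3)
    print(lam.sum(), K_hat)                                          # actual vs. estimated participants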
C. Local adaptive differential privacy
A centralized privacy-enhancing process carries certain risks, because the local updates of each participant can be obtained before aggregation. According to the composition theorems in [?], the global process satisfies $(\varepsilon, \delta)$-differential privacy if each local process satisfies $(\varepsilon_k, \delta)$-differential privacy and $\sum_{k=1}^{K}\varepsilon_k \le \varepsilon$. Therefore, we incorporate differential privacy into the client. The following details our solution based on FedAvg.
Algorithm 1 gives the basic process of LADP. For each responding client, we train the weight matrix to obtain the difference $\Delta\omega_t^k = \omega_t^k - \omega_{t-1}$ from the global model generated in the last round. In each epoch, we optimize the loss function by batch gradient descent (BGD) and record the 2-norm of the difference matrix $\|\omega - \omega_{t-1}\|_2$. When the number of epochs reaches the upper bound, we stop training and clip the difference matrix with $C$, the mean of the recorded norms: if $\|\omega - \omega_{t-1}\|_2 < C$, the elements of the difference matrix are kept unchanged; otherwise, they are scaled down by $\frac{C}{\|\Delta\omega\|_2}$. This effectively reduces the expression of private data and improves the generalization ability of the global model. Before sending the update to the aggregator, we add noise to it to enhance privacy.

Algorithm 1 Local update of LADP

update(k, $\omega_{t-1}$):
  initialize $\lambda_t^k$
  if $\lambda_t^k = 1$ then
    $\omega \leftarrow \omega_{t-1}$
    $\mathcal{B} \leftarrow$ split $Set_k$ into batches
    for each local epoch $i = 1, 2, 3, \ldots$ do
      for batch $b \in \mathcal{B}$ do
        $\omega \leftarrow \omega - \eta\nabla L(\omega, b)$
      $C_i \leftarrow \|\omega - \omega_{t-1}\|_2$
    $C \leftarrow$ mean of the $C_i$
    $\Delta\omega \leftarrow (\omega - \omega_{t-1}) + \frac{1}{|\mathcal{B}|}\mathcal{N}(0, \sigma^2 S^2)$
    $\Delta\omega \leftarrow \Delta\omega \cdot \min\left(1, \frac{C}{\|\Delta\omega\|_2}\right)$
  else
    $\Delta\omega \leftarrow 0$
  return $\Delta\omega$
We adopt the Gaussian mechanism to distort the local updates of each client. The noise variance $\sigma^2 S^2$ determines how much of each client's contribution is retained. Excessive noise means the updates are highly distorted, while too little noise cannot provide sufficient privacy preservation. In each training round, $\sigma$ is fixed and the value of $S$ is adjusted adaptively. On the one hand, we set $S = C$ to adjust the noise addition according to the update itself: if a single update is outstanding, the noise increases accordingly. On the other hand, we expect clients with different amounts of data to contribute similarly to the global model, so we scale down the noise by $\frac{1}{|\mathcal{B}|}$.
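
The following NumPy sketch (our own illustration; the model is abstracted as a flat weight vector and grad_fn is a caller-supplied gradient function, both assumptions of ours) follows the order of operations in Algorithm 1: local BGD, the adaptive clip value C taken as the mean of the per-epoch norms, noise scaled by 1/|B| with S = C, and the final clipping.

    import numpy as np

    def ladp_local_update(w_global, batches, grad_fn, lr=0.1, epochs=4, sigma=1.0, rng=None):
        # One responding client's LADP update (Algorithm 1), with the model as a flat vector.
        #   w_global : global weights w_{t-1}
        #   batches  : list of mini-batches of the client's private data
        #   grad_fn  : grad_fn(w, batch) -> gradient of the loss (supplied by the caller)
        rng = rng or np.random.default_rng()
        w = w_global.copy()
        norms = []
        for _ in range(epochs):
            for batch in batches:                          # batch gradient descent
                w = w - lr * grad_fn(w, batch)
            norms.append(np.linalg.norm(w - w_global))     # 2-norm recorded per epoch
        C = float(np.mean(norms))                          # adaptive clip value (mean of norms)
        noise = rng.normal(scale=sigma * C, size=w.shape)  # Gaussian noise with S = C
        delta_w = (w - w_global) + noise / len(batches)    # noise scaled down by 1/|B|
        delta_w *= min(1.0, C / (np.linalg.norm(delta_w) + 1e-12))  # clip with C
        return delta_w

    # Toy usage: least-squares on random data, two batches of ten samples each.
    rng = np.random.default_rng(0)
    batches = [(rng.standard_normal((10, 5)), rng.standard_normal(10)) for _ in range(2)]
    grad = lambda w, b: b[0].T @ (b[0] @ w - b[1]) / len(b[1])
    delta = ladp_local_update(np.zeros(5), batches, grad)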
D. Track privacy loss globally
Privacy loss reflects the risk of data privacy leakage. We adopt the moments accountant [?] to track and limit the privacy loss. In model training with multiple rounds, we take knowledge inheritance into account. Suppose $\xi$ is an observed outcome of the mechanism $\mathcal{M}$ on adjacent datasets $d$ and $d'$; we define the privacy loss as

$L^{(\xi)}_{\mathcal{M}(pre,d)\,\|\,\mathcal{M}(pre,d')} = \ln\frac{\Pr[\mathcal{M}(pre,d) = \xi]}{\Pr[\mathcal{M}(pre,d') = \xi]}$

where $pre$ includes all previous outputs. The privacy loss increases when the probability that the observation comes from the original dataset is higher. We define $\alpha^{(\tau)}_{\mathcal{M}(pre,d)\,\|\,\mathcal{M}(pre,d')}$ as the cumulant generating function of $L^{(\xi)}_{\mathcal{M}(pre,d)\,\|\,\mathcal{M}(pre,d')}$ at value $\tau$. Considering all adjacent datasets and all possible previous outputs, we track the privacy loss of client $k$ as follows:

$\alpha^{(\tau)}_{\mathcal{M}_k} \triangleq \max_{pre,d,d'} \alpha^{(\tau)}_{\mathcal{M}_k(pre,d)\,\|\,\mathcal{M}_k(pre,d')}$  (4)

Then we can track the global privacy loss with $\alpha^{(\tau)}_{\mathcal{M}} = \sum_k \alpha^{(\tau)}_{\mathcal{M}_k}$. For any fixed privacy budget $\varepsilon$, we can calculate the current value of the slack variable as $\delta = \min_{\tau} e^{\alpha^{(\tau)}_{\mathcal{M}} - \tau\varepsilon}$. When $\delta$ reaches the bound, the accumulated privacy loss after the current round is out of tolerance, so we stop training and return the result. The setting of the bound usually depends on the sample space. Considering the disturbance from both RR and LADP, we set the bound to $\frac{1}{|KB|}$.
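
The conversion from accumulated log moments to $\delta$ can be sketched as follows (our own Python illustration over a grid of moment orders; the per-round values below are synthetic placeholders, not the moments actually computed in the paper).

    import numpy as np

    def delta_from_moments(total_alpha, taus, epsilon):
        # delta = min over tau of exp(alpha_M(tau) - tau * epsilon)
        return float(np.min(np.exp(total_alpha - taus * epsilon)))

    taus = np.arange(1, 33, dtype=float)          # grid of moment orders tau
    per_round_alpha = 0.01 * taus * (taus + 1)    # stand-in for sum_k alpha_{M_k}(tau) in one round
    epsilon, bound = 8.0, 1e-5                    # fixed privacy budget and the bound 1/|KB|

    total_alpha = np.zeros_like(taus)
    for t in range(1000):
        total_alpha += per_round_alpha            # accumulate the global log moments
        if delta_from_moments(total_alpha, taus, epsilon) > bound:
            print("privacy budget exhausted after round", t + 1)
            break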
V. EXPERIMENT AND ANALYSIS

We simulate the training process of federated learning and apply our proposed mechanism to it. The experimental results verify the feasibility and effectiveness of the proposed mechanism. Each client trains a fully connected neural network with the same structure, containing two hidden layers with 600 and 400 neurons. The simple network structure allows us to better evaluate the impact of the mechanism. Cross entropy is chosen as the loss function, the learning rate is 0.1, and the optimization method used within clients is BGD. To simulate non-IID data, we divide MNIST into different subsets, each of which contains samples of only two or three digits (as sketched after this paragraph); a model trained on the dataset of a single client therefore cannot accurately recognize all digits. Before training, each client divides its dataset into multiple batches $B = \{b_1, b_2, b_3, \ldots\}$. For comparison, the value of $K_0$ follows the CDP setting [?], that is, $K_0 = 30, 100, 300$ when $K = 100, 1000, 10000$. Similarly, the batch size is set to 10 and the number of epochs trained by each client is 4.
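
The shard-style split below is a sketch of ours, under the assumption that a standard sort-and-shard simulation is acceptable (the paper does not specify its exact partitioning code); it produces disjoint client subsets, each covering roughly two or three digit classes.

    import numpy as np

    def non_iid_partition(labels, num_clients, shards_per_client=2, seed=0):
        # Sort sample indices by digit, cut them into shards, and hand each client a few
        # shards, so every client sees only about two or three digit classes.
        rng = np.random.default_rng(seed)
        order = np.argsort(labels, kind="stable")
        shards = np.array_split(order, num_clients * shards_per_client)
        shard_ids = rng.permutation(len(shards))
        return [np.concatenate([shards[s] for s in
                                shard_ids[c * shards_per_client:(c + 1) * shards_per_client]])
                for c in range(num_clients)]

    # Toy usage with synthetic labels standing in for the MNIST label vector.
    fake_labels = np.random.default_rng(1).integers(0, 10, size=60000)
    client_indices = non_iid_partition(fake_labels, num_clients=100)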
Separate performance: In the early stages of the experiment, we evaluate the effects of RR and LADP separately. To verify the feasibility of RR, we conducted experiments on RR federated learning under different levels of disturbance, which depend on $\varepsilon$, with $K = 100$ and $K_0 = 30$. Although the actual number of participants fluctuates around the estimated result, this has no consequence for the normal convergence and accuracy of the global model (see Fig. 2). At the same time, we designed a comparative experiment between LADP and CDP to observe the actual performance of LADP. Fig. 3 illustrates that the accuracy of LADP rises smoothly and performs well under different privacy budgets, especially when the participants are few and the privacy budget is low. LADP directly adds noise to the difference matrices $\Delta\omega$. Suppose the noise is $\zeta$; then $\Delta\omega + \zeta = \eta(\nabla L + \frac{\zeta}{\eta})$, which is equivalent to disturbing the gradient, so the influence on the final model is traceable and controllable. However, CDP deals with the average and treats the training process of each client as a black box, so its performance usually fluctuates.

Fig. 2: Training process of RR federated learning. (We do not apply the privacy bound here, but set the maximum number of rounds to 100. The legend in plot 3 also applies to the other two plots.)

Fig. 3: Results on the accuracy of LADP and CDP. ($\sigma = 1$)
Comparison with CDP: To compare RR-LADP and CDP, we track the privacy budget in each mechanism by calculating $\delta$. In fact, the accumulated knowledge inheritance $pre$ makes the privacy loss of each round increase rapidly, and saving a privacy budget of the same order of magnitude cannot support more training rounds. Therefore, careful allocation means efficient use of the data. Fig. 4 tracks the loss and $\delta$ as the communication rounds increase under the two mechanisms. It shows that RR-LADP allocates the privacy budget more carefully, while the loss of the global model drops similarly to CDP.

Fig. 4: Training process of RR-LADP and CDP when $K = 1000$ and $K_0 = 100$. The solid lines represent the loss and the dotted lines represent $\delta$. ($\varepsilon = 8$, $\sigma = 1$)
TABLE I: Accuracy and time spent per round for RR-LADP and CDP. ($\varepsilon = 8$, $\sigma = 1$)

Clients   Rounds   CDP              RR-LADP
100       100      0.73 | 25.9 s    0.80 | 26.5 s
1000      200      0.91 | 85.7 s    0.92 | 89.2 s
10000     400      0.96 | 315.9 s   0.97 | 336.1 s
Table I shows the average accuracy over multiple training runs and the training time per round. RR-LADP achieves higher accuracy than CDP at the cost of increased time. The accuracy of RR-LADP depends on the effects of RR and LADP, which have been described in the separate experiments. For CDP, the noise addition operation appears only once per training round, on the server side. For RR-LADP, however, every client in a training round performs the noise addition operation, which brings additional time delay.
Factors affecting RR-LADP: The accuracy of the final model is affected by multiple factors, such as the batch size, the learning rate, and other common machine learning parameters. We have not attempted to find the best combination of these parameters; instead, we discuss several significant factors in RR-LADP, including the number of clients, the privacy budget, and the noise.

Fig. 5: Results on the accuracy and $\delta$ for different privacy budgets. The solid lines represent accuracy and the dotted lines represent $\delta$. ($\sigma = 1$, $K = 1000$, $K_0 = 100$)
The number of clients determines the amount of training data and, by affecting the sampling probability, the possibility of privacy leakage. A lower privacy loss in each round allows more training rounds and higher accuracy. At the same time, the privacy bound, which is set to $\frac{1}{|KB|}$, decreases significantly as the number of clients increases.
The privacy budget $\varepsilon$ plays a pivotal role in RR-LADP (see Fig. 5). It controls the training process in two ways. First, it determines the response probability $p$, which affects the disturbance in the RR mechanism. In fact, $\frac{K_0}{K}p$ in Equation (3) represents the portion selected by the server that actually participates in training; the server can easily infer the local updates of a particular client if this overlap ratio is high. Second, after updating the global model in each round, we track $\delta$ under the fixed $\varepsilon$. The lower $\varepsilon$ is, the higher $\delta$ becomes, which means the bound is reached sooner and fewer rounds are allowed.
Our mechanism allows model performance to be controlled by choosing the value of $\sigma$. The level of noise added to the local updates directly affects the accuracy of the model. As shown in Fig. 6, the independent variable is the noise parameter $\sigma$. When less noise is added, the privacy loss in each round increases; after only a few rounds of training, $\delta$ reaches its bound while the accuracy of the model is still low. By adding more noise, the model gains more training rounds under a fixed privacy budget. However, too much noise reduces data availability and limits the accuracy.

Fig. 6: Results on the accuracy for different noise parameters. ($\varepsilon = 8$, $K = 1000$, $K_0 = 100$)
VI. CONCLUSIONS

This paper proposes RR-LADP, a federated learning mechanism for IoE based on randomized-response client selection and differentially private client model training. Different from existing approaches that require a trusted server to take charge of the privacy-enhancing process, the core strategy of our approach is to enhance clients' privacy locally, adapting the approach to an environment without trusted servers. It does so by preventing the server from knowing which clients' updates are collected in each round, and by adding noise adaptively to clients' local updates before they are submitted to the server. We show the reliable performance of RR-LADP through experiments with different parameter settings. While providing a higher level of privacy-preserving capability, our approach achieves 0.97 accuracy in the experiments on MNIST. Additionally, as a modified version of the traditional federated learning framework, our approach has the potential to be used to train various machine learning models rather than a single model structure. In the current form of RR-LADP, a global privacy budget controls the whole training process, with RR focusing on preserving the privacy of the client set and LADP focusing on preserving the private data of each client. Therefore, although the objects to be protected are different, the two mechanisms share a single privacy budget. Potential improvements can be achieved by revising this structure. Hence, a future study will explore a more fine-grained allocation of privacy budgets by tracking the privacy losses of the two mechanisms separately. In addition, we will improve RR-LADP by applying the mechanism to real-world IoE environments, such as intelligent wearable devices and the Internet of Vehicles.
ACKNOWLEDGMENT

This work was supported by the National Key Research and Development Program of China (2017YFB0802204), the Key-Area Research and Development Program for Guangdong Province, China (2019B010136001), the Basic Research Project of Shenzhen, China (JCYJ20190806143418198), and the Basic Research Project of Shenzhen, China (JCYJ20190806142601687). Corresponding authors: Weizhe Zhang and Yang Liu.
Zerui Li is currently an MSc student with the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. His research interests include information security and privacy. Contact him at 18S151552@stu.hit.edu.cn.

Yuchen Tian is currently an MSc student with the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. His research interests include information security and privacy. Contact him at 19S051060@stu.hit.edu.cn.

Weizhe Zhang is currently a professor in the School of Computer Science and Technology at Harbin Institute of Technology, China. He has published more than 100 academic papers in journals, books, and conference proceedings. He is a senior member of the IEEE. He is a corresponding author of this article. Contact him at wzzhang@hit.edu.cn.

Qing Liao is currently an associate professor with the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. She received her Ph.D. degree from the Hong Kong University of Science and Technology. Contact her at liaoqing@hit.edu.cn.

Yang Liu is currently an assistant professor with the School of Computer Science and Technology, Harbin Institute of Technology (Shenzhen), China. He received his D.Phil. (Ph.D.) degree in computer science from the University of Oxford. He is a corresponding author of this article. Contact him at liu.yang@hit.edu.cn.

Xiaojiang Du is a tenured Full Professor and the Director of the Security And Networking (SAN) Lab in the Department of Computer and Information Sciences at Temple University, Philadelphia, USA. He has authored over 400 journal and conference papers in these areas, as well as a book published by Springer. He is an IEEE Fellow and a Life Member of ACM. Contact him at dxj@ieee.org.
Mohsen Guizani is currently a Professor with the CSE Department, Qatar University, Qatar. He is currently the Editor-in-Chief of IEEE Network Magazine, serves on the editorial boards of several international technical journals, and is the Founder and Editor-in-Chief of Wireless Communications and Mobile Computing (Wiley). Contact him at mguizani@ieee.org.
The problem of privacy-preserving data analysis has a long history spanning multiple disciplines. As electronic data about individuals becomes increasingly detailed, and as technology enables ever more powerful collection and curation of these data, the need increases for a robust, meaningful, and mathematically rigorous definition of privacy, together with a computationally rich class of algorithms that satisfy this definition. Differential Privacy is such a definition. After motivating and discussing the meaning of differential privacy, the preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example. A key point is that, by rethinking the computational goal, one can often obtain far better results than would be achieved by methodically replacing each step of a non-private computation with a differentially private implementation. Despite some astonishingly powerful computational results, there are still fundamental limitations – not just on what can be achieved with differential privacy but on what can be achieved with any method that protects against a complete breakdown in privacy. Virtually all the algorithms discussed herein maintain differential privacy against adversaries of arbitrary computational power. Certain algorithms are computationally intensive, others are efficient. Computational complexity for the adversary and the algorithm are both discussed. We then turn from fundamentals to applications other than query-release, discussing differentially private methods for mechanism design and machine learning. The vast majority of the literature on differentially private algorithms considers a single, static, database that is subject to many analyses. Differential privacy in other models, including distributed databases and computations on data streams is discussed. Finally, we note that this work is meant as a thorough introduction to the problems and techniques of differential privacy, but is not intended to be an exhaustive survey – there is by now a vast amount of work in differential privacy, and we can cover only a small portion of it.