818 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 7, NO. 3, JUNE 2020
P3: Privacy-Preserving Scheme Against Poisoning
Attacks in Mobile-Edge Computing
Ping Zhao, Member, IEEE, Haojun Huang, Member, IEEE, Xiaohui Zhao, and Daiyu Huang
Abstract— Mobile-edge computing (MEC) has emerged to enable users to offload their location data to the MEC server, which executes location-aware data processing to compute statistical results about the collected locations. However, malicious users may deliberately generate poisoning locations and send them to the MEC server, aiming to poison the statistical results learned by the MEC server and even compromise other users' location privacy. Existing work concerning privacy preservation in MEC has not studied such poisoning attacks. Another line of somewhat related work focused on poisoning attacks in a different scenario, adversarial machine learning. However, MEC exhibits different features from machine learning settings, and thus, privacy preservation against poisoning attacks in MEC faces significantly new challenges. To address this problem, we propose a privacy-preserving scheme against poisoning attacks (P3) that utilizes a feature learning model to infer the social relationships among users from their location data and then constructs the inferred social graph. Thereafter, it searches for the optimal map between the inferred social graph and the social graph from social networks to identify the poisoning locations. Experiments on two real-world data sets, against two baseline schemes, and under two kinds of poisoning attacks demonstrate the privacy preservation that P3 provides against poisoning attacks in MEC.
Index Terms—Feature learning, location privacy, mobile-edge computing (MEC), poisoning attacks, social relationship.
I. INTRODUCTION
WITH the development of the mobile Internet, mobile-edge computing (MEC) has been widely applied.
In MEC systems, as shown in Fig. 1, users offload their
location data into the MEC server, and the MEC server
computes the statistical results about these locations to support
Manuscript received July 1, 2019; revised October 7, 2019; accepted
November 18, 2019. Date of publication February 20, 2020; date of current
version June 10, 2020. This work was supported in part by the National
Natural Science Foundation of China under Grant 61902060, Grant 61801106,
Grant 61671216, Grant 61977064, and Grant 61871436, in part by the Shanghai Sailing Program under Grant 19YF1402100, in part by the Chenguang
Program supported by the Shanghai Education Development Foundation and
the Shanghai Municipal Education Commission, in part by the Fundamental
Research Funds for the Central Universities under Grant 2232019D3-51,
in part by Initial Research Funds for Young Teachers of Donghua University,
and in part by the Shanghai Rising-Star Program under Grant 19QA1400300.
(Corresponding author: Haojun Huang.)
Ping Zhao, Xiaohui Zhao, and Daiyu Huang are with the College of
Information Science and Technology, Donghua University, Shanghai 201620,
China (e-mail: pingzhao2018ph@dhu.edu.cn; 160910903@mail.dhu.edu.cn;
160910309@mail.dhu.edu.cn).
Haojun Huang is with the School of Electronic Information and
Communications, Huazhong University of Science and Technology, Wuhan
430074, China (e-mail: hjhuang@hust.edu.cn).
Digital Object Identifier 10.1109/TCSS.2019.2960824
new edge intelligence applications. However, malicious users
(hereafter, adversaries) may deliberately generate and send
poisoning locations to the MEC server to poison the statistical
results about users' locations, causing the MEC server and even other users to suffer from the poisoning attacks. Such poisoning attacks have become a bottleneck for the wide deployment and application of MEC systems and, thus, have attracted widespread concern [1]–[3].
Existing work concerning privacy preservation in MEC [4]–[11] mainly used anonymous mechanisms and traffic detection techniques combined with machine learning to protect users' location privacy, or proposed relevant algorithms to minimize the loss of privacy in MEC systems, covering task offloading, mobile support system (MSS), blockchain, and so on. However, these studies studied neither defense mechanisms against poisoning attacks nor poisoning attack schemes. Another line of somewhat related work [12]–[19] focused on poisoning attacks in an adversarial machine learning scenario. Nevertheless, compared with machine learning settings, MEC is a different scenario and exhibits different features, e.g., task offloading and time-delay sensitivity. Other works [20]–[27] focused on poisoning attack schemes in the Internet of Things (IoT). Unfortunately, these works did not study defense mechanisms against poisoning attacks. In addition, several works [28]–[32] focused on defenses against poisoning attacks in other scenarios, such as machine learning, named data networks, and IoT. However, these works considered scenarios other than MEC, and more importantly, MEC exhibits different features that bring significantly new challenges to defense mechanisms against poisoning attacks. In summary, it is necessary to study defense mechanisms against poisoning attacks in MEC.
To address the above problem, in this article, we propose P3, i.e., the Privacy-Preserving scheme against Poisoning attacks in MEC. Specifically, it first utilizes a feature learning model to infer the social relationships among users from their location data and then constructs the inferred social graph with the help of the inferred relationships. On this basis, it searches for the optimal map between the inferred social graph and the social graph from social networks to identify the poisoning locations. Finally, we validate the performance using real-world data sets. In summary, we make the following contributions.
1) We propose to utilize the feature learning model to
construct the inferred social graph without any domain
experts’ knowledge. Most existing works relied on
2329-924X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Illustration of poisoning attacks in MEC.
the simple heuristic algorithm that involves too much domain experts' knowledge and, thus, incurs too many errors. On the contrary, with the knowledge of the locations collected by the MEC server, we propose to first characterize the social behavior of users' mobility using feature learning and then, on this basis, infer the social relationships and construct the inferred social graph.
2) We propose to mirror the inferred relationships to their social relationships in social networks to identify the poisoning locations. To be concrete, the inferred social graph is structurally correlated with the social graph in social networks, and thus, the inferred relationships can be mapped to the relationships in the social graph. As a result, the poisoning locations can be identified by searching for the optimal map between the inferred social graph and the social graph from social networks.
3) We use two real-world data sets to validate the effectiveness of the proposed scheme P3, and simulation results validate the location privacy preservation against the poisoning attacks. Specifically, we use the loc-Gwalla and loc-Brightkite data sets, compare P3 with two baseline works, and investigate the performance under two kinds of poisoning attacks. Moreover, simulation results show that P3 outperforms the two baselines on both data sets and under both kinds of poisoning attacks.
The remainder of this article is organized as follows.
Section II introduces the adversary model. Then, Section III
proceeds to describe the design of the privacy-preserving
scheme P3 in detail, followed by the evaluation in Section IV.
Section V reviews the related work. Finally, Section VI
concludes this article.
II. ADVERSARY MODEL
As shown in Fig. 1, users offload their location data to the MEC server, and the MEC server computes the corresponding results and returns them to the users. At the same time,
the MEC server collects these users’ locations and executes
a certain kind of statistical function to calculate the statistical
results about these collected locations. These statistical results
are expected to support new edge intelligence applications.
However, malicious users (i.e., adversaries) generate poisoning
locations and send these poisoning locations to the MEC
server, aiming to poison the statistical results learned by the
MEC server [33], [34]. What’s more, these poisoned statistical
results will further lead to serious errors in applications,
e.g., map inference and smart transportation. For example,
in the hearing "The Dawn of AI," the vulnerabilities of systems to poisoning attacks attracted widespread concern among experts from academia and industry.
Adversaries have background knowledge of the locations of a small percentage of users. They conduct poisoning attacks by generating poisoning locations and sending them to the MEC server on the basis of these known locations. Adversaries' capability is limited by the number of these poisoning points. In this article, adversaries are assumed to be able to control a small percentage of users, and the rate of poisoning locations is set to less than 20%. In MEC, adversaries can indeed inject a small percentage of poisoning locations since a large number of users participate in MEC applications and send locations to the MEC server. Both the MEC server and the cloud server are assumed to be trusted and to honestly perform the proposed scheme P3. The goal of P3 is to identify these poisoning locations.
III. DESIGN OF P3
The main idea behind the proposed scheme P3, as shown in Fig. 2, is that it first characterizes the social behavior of users' mobility using feature learning, since users' mobility behavior is also shaped by their social relationships (i.e., why they move) [1]–[3]. Then, on this basis, it infers the social relationships among users and constructs the inferred social graph. Thereafter, it searches for the optimal map between the inferred social graph and the social graph from social networks, utilizing the structural correlations between the two graphs, to identify the poisoning locations. In summary, it mainly includes two steps: constructing the inferred social graph and mapping the inferred social graph to the social graph from social networks.
A. Construction of Inferred Social Graph
We first search the mobility neighborhoods using the
random walk, which is shown in Fig. 3. Specifically, we
Fig. 2. Work flow of P3.
Fig. 3. Illustration of construction of inferred social graph. (a) Users and locations. (b) Weighted bipartite graph Go. (c) Random walk traces.
construct the weighted bipartite graph G_o = {U, L, E} [see Fig. 3(a) and (b)], where U is the set of users, L is the set of locations of users, and E is the set of edges connecting users and locations. The weight w_{u,l} of a specific edge (u, l) ∈ E (u ∈ U, l ∈ L) is the number of times the user u checks in at the location l. We define the graph neighborhood of a specific node x ∈ U ∪ L by gn(x), which is the set of nodes connected with x. Then, we generate π random walk traces [see Fig. 3(c)] for each user, and the walk length of each trace is wl. For a specific user u, we denote the current node in the random walk trace by now_n and the next node by next_n. Thereafter, we define the next node in the random walk trace next_n in (1), shown below. For example, the next node in the random walk trace next_n is sampled with the probability P(next_n = x | now_n) [see (1)]. Then, the mobility neighborhood gn(u) of the user u consists of the nodes before and after the node u (i.e., user u) in all the π random walk traces.
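As a rough sketch of this step, the biased random walk defined by (1) can be implemented as follows. This is an illustration only, not the authors' code; the dictionary-based graph encoding, the function names, and the toy check-in weights are our assumptions.

```python
import random

def next_node(graph, weights, now):
    # Sample the successor per (1): a neighbor x of `now` is chosen with
    # probability proportional to the check-in weight of the edge (now, x).
    nbrs = graph[now]                      # gn(now): nodes connected with `now`
    total = sum(weights[(now, x)] for x in nbrs)
    r = random.uniform(0, total)
    acc = 0.0
    for x in nbrs:
        acc += weights[(now, x)]
        if r <= acc:
            return x
    return nbrs[-1]                        # guard against floating-point drift

def walk_traces(graph, weights, user, pi, wl):
    # Generate pi random-walk traces of length wl starting from `user`.
    traces = []
    for _ in range(pi):
        trace = [user]
        for _ in range(wl - 1):
            trace.append(next_node(graph, weights, trace[-1]))
        traces.append(trace)
    return traces
```

The mobility neighborhood gn(u) is then read off as the nodes adjacent to u within these traces, alternating between users and locations because the walk runs on the bipartite graph G_o.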
Then, we use the machine learning model skip-gram to map the random walk traces to vectors and infer the relationships among users from these vectors. To be concrete, we assume that the random walk traces of each user (e.g., u) are mapped to a vector α(u), and the vectors of all nodes are denoted by α. We define the objective function as

$$\arg\max_{\alpha \in \mathbb{R}^{|U\cup L|\times d}} \sum_{n\in U\cup L} \sum_{i\in m(n)} \frac{\exp(\alpha(i)\cdot\alpha(n))}{\sum_{j\in U\cup L}\exp(\alpha(j)\cdot\alpha(n))}. \tag{2}$$

Then, we use the negative sampling approach to reduce the computation cost and redefine the objective function as

$$\arg\max_{\alpha \in \mathbb{R}^{|U\cup L|\times d}} \sum_{n\in U\cup L} \sum_{i\in m(n)} \log\frac{1}{1+\exp(-\alpha(i)\cdot\alpha(n))} + \sum_{n\in U\cup L} \sum_{i\in m(n)} \log\frac{1}{1+\exp(\alpha(i)\cdot\alpha(n))}. \tag{3}$$

Thereafter, in the learning process, we use stochastic gradient descent. Finally, we compute the cosine similarity θ(u, v) of any two users u and v. When the cosine similarity meets θ(u, v) = (α(u)·α(v)) / (‖α(u)‖₂ × ‖α(v)‖₂) ≥ θ_o (θ_o is the threshold), the users u and v are regarded as friends. As a result, an inferred social graph G [see Fig. 4(a)] is constructed, where nodes represent users and edges represent friendships.
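Once the embeddings α are learned, the thresholding step that turns pairwise cosine similarities into edges of the inferred graph can be sketched as follows. The function names, the toy vectors, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import math
from itertools import combinations

def cosine(a, b):
    # theta(u, v) = (alpha(u) . alpha(v)) / (||alpha(u)||_2 * ||alpha(v)||_2)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def inferred_social_graph(vectors, theta_o):
    # Users u, v become friends in the inferred graph when theta(u, v) >= theta_o.
    return {(u, v) for u, v in combinations(sorted(vectors), 2)
            if cosine(vectors[u], vectors[v]) >= theta_o}
```

With the default θ_o = 0.8 from Section IV, two users whose embedding vectors point in nearly the same direction are linked, while near-orthogonal vectors are not.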
B. Optimal Map Between Inferred Social Graph and Social Graph
We first select k landmarks. Specifically, in both the inferred social graph and the social graph from social networks, the nodes with high betweenness are selected as the landmarks. The betweenness of a specific node quantifies the number of shortest paths that pass through the node and is measured as in an opportunistic network [35]. For example, in Fig. 4, the red nodes in the inferred social graph G and the social graph G′ are picked out and regarded as the landmarks.
$$P(\mathrm{next}_n = x \mid \mathrm{now}_n) =
\begin{cases}
\dfrac{w_{\mathrm{now}_n,\,x}}{\sum_{i=1}^{|gn(\mathrm{now}_n)|} w_{\mathrm{now}_n,\,i}} & \text{if } \mathrm{now}_n \in U \wedge x \in gn(\mathrm{now}_n) \wedge (\mathrm{now}_n, x)\in E\\[2ex]
\dfrac{w_{x,\,\mathrm{now}_n}}{\sum_{i=1}^{|gn(\mathrm{now}_n)|} w_{i,\,\mathrm{now}_n}} & \text{if } \mathrm{now}_n \in L \wedge x \in gn(\mathrm{now}_n) \wedge (\mathrm{now}_n, x)\in E\\[1ex]
0 & \text{else}
\end{cases} \tag{1}$$
Fig. 4. Illustration of the map between the inferred social graph and the social graph from social networks in the loc-Gwalla data set [37]. (a) Nodes represent users, and edges indicate the inferred social relationships. (b) Nodes represent users, and edges indicate the social relationships from social networks. Red nodes in (a) and (b) are selected as the landmarks, and k = 14.
Then, based on the k landmarks, we search for the optimal map between the inferred social graph G and the social graph G′. Specifically, for each node c in the inferred social graph G, the distances between the node c and the k landmarks in G are {d_{c1}, d_{c2}, ..., d_{ck}}. Likewise, in the social graph G′, the distances between a specific node s and the k landmarks in G′ are {d_{s1}, d_{s2}, ..., d_{sk}}. Then, we define the map score between node c in G and node s in G′ as $-\bigl(\sum_{i=1}^{k}(d_{ci}-d_{si})^2\bigr)^{1/2}$. Given the k landmarks in G and the k landmarks in G′, there are k! possible maps between the landmarks. For each map between the landmarks in G and G′, we use the Hungarian algorithm [36] to search for maps for the remaining nodes in G and G′. For the k! possible maps, it repeats this operation k! times. Denote the outputs of the k! operations by O_1, O_2, ..., O_i, ..., O_{k!}, which record the maps among nodes in G and G′. Denote O_i = {(c_1, s_1), ..., (c_j, s_j), ...}, where c_j and s_j are nodes in the inferred social graph G and the social graph G′, respectively, and node c_j in G is mapped to node s_j. Finally, we select the output (e.g., O_i) with the largest match score $-\sum_{j}\bigl(\sum_{i=1}^{k}(d_{c_j i}-d_{s_j i})^2\bigr)^{1/2}$ as the optimal map between the inferred social graph G and the social graph G′ from social networks. The locations of users in G that remain unmapped to users in G′ are identified as the poisoning locations.
Note that existing work [38] has validated that the betweenness of nodes in the social graph follows a heavy-tailed distribution and that there exist only a small number of nodes with high betweenness. Thus, k ≪ |G|, where |G| is the number of nodes in G. Therefore, although there are k! maps between the k landmarks in G and the k landmarks in G′, brute-force enumeration is feasible.
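A toy version of this matching step might look as follows. Since only a few nodes are involved in the example, the assignment is found by brute-force search over permutations (standing in for the Hungarian algorithm [36] the paper applies to the non-landmark nodes); the node names and distance signatures are made-up inputs.

```python
import math
from itertools import permutations

def match_by_landmarks(sig_c, sig_s):
    # sig_c / sig_s: node -> k-vector of shortest-path distances to the k
    # landmarks in the inferred graph G and the social graph, respectively.
    # Returns the assignment maximizing the total map score
    # -(sum_i (d_ci - d_si)^2)^(1/2) summed over matched pairs.
    c_nodes = sorted(sig_c)
    s_nodes = sorted(sig_s)

    def score(c, s):
        return -math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_c[c], sig_s[s])))

    best_map, best_total = None, -math.inf
    for perm in permutations(s_nodes, len(c_nodes)):
        total = sum(score(c, s) for c, s in zip(c_nodes, perm))
        if total > best_total:
            best_map, best_total = dict(zip(c_nodes, perm)), total
    return best_map
```

Nodes of the inferred graph left unmatched, or matched only with a very poor score, would then flag their owners' locations as poisoning candidates.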
IV. PERFORMANCE EVALUATION
A. Setup
1) Data Set: We use two real-world data sets, i.e., loc-
Gwalla and loc-Brightkite [37], to validate the performance
of P3. The loc-Gwalla data set consists of 196 591 users
(i.e., nodes), 950 327 edges (i.e., friendships), and 6.4 million
locations from February 2009 to October 2010. Another data
set, i.e., loc-Brightkite, records 4.5 million locations of 58 228
users from April 2008 to October 2010 and 214 078 edges
among users.
2) Baseline Work for Comparison: We compare P3 with two baseline works, dubbed Baseline1 and Baseline2, since there is no existing work concerning location privacy protection against poisoning attacks in MEC. Specifically, in Baseline1, the algorithm knows the number of poisoning locations n in the location data set and randomly picks out n locations from the N locations in the data set as the poisoning locations. On the contrary, in Baseline2, the algorithm randomly picks out a certain number of locations from the N locations without knowing the number of poisoning locations and regards these selected locations as poisoning locations. N is the number of locations in the location data set. Note that both baseline works quantify the performance of random guessing, but the first baseline guesses with knowledge of the number of poisoning locations in the data set.
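The two random-guessing baselines can be written down directly; this is a minimal sketch, and the function names are ours:

```python
import random

def baseline1(locations, n):
    # Baseline1 knows the number n of poisoning locations and
    # guesses n of the N locations uniformly at random.
    return set(random.sample(locations, n))

def baseline2(locations, num_guesses):
    # Baseline2 does not know n; it guesses a fixed number of
    # locations uniformly at random.
    return set(random.sample(locations, num_guesses))
```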
3) Poisoning Attacks: Furthermore, to quantify the privacy preservation that P3 provides, we consider two kinds of poisoning attacks. To be concrete, in the first kind of poisoning attack, adversaries use the locations synthesized in the existing work [39] as the poisoning locations. That work considered the geographic and semantic features of locations when synthesizing them. In the second kind of poisoning attack, adversaries randomly generate locations in the region bounded by users' locations as the poisoning locations. Therefore, the poisoning locations in the first kind of attack are more plausible imitations of genuine users' locations. Note that the two kinds of poisoning attacks represent sophisticated and straightforward poisoning strategies, respectively. Hereafter, the two kinds of poisoning attacks are dubbed Poison1 and Poison2 for ease of presentation.
4) Metrics: What's more, we use the metric identification rate, i.e., the failure rate of poisoning attacks. The definition of
Fig. 5. Identification rate varying with the number of locations N in both loc-Gwalla and loc-Brightkite with the rate of poisoning locations n/N = 0.2 and the length of each user's trace le = 200. (a) Identification rate in the loc-Gwalla data set. (b) Identification rate in the loc-Brightkite data set.
the identification rate is as follows. In addition, we investigate the impact on the identification rate of the number of landmarks k, the number of locations N, the rate of poisoning locations n/N (with n the number of poisoning locations), the walk length wl, the length of each user's trace le, and the threshold θ_o.
Definition 1: Assume that n′ of the n poisoning locations are identified. Then, the identification rate, i.e., the failure rate of poisoning attacks, is n′/n.
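Definition 1 translates directly into code (our sketch; the argument names are illustrative):

```python
def identification_rate(flagged, poisoning):
    # n' = number of true poisoning locations among those flagged by the
    # defense; the identification rate of Definition 1 is n'/n.
    n_prime = len(set(flagged) & set(poisoning))
    return n_prime / len(poisoning)
```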
5) Parameter Settings: Other default parameters are set as follows: the number of landmarks k ∈ (3, 7); the number of locations N ∈ (300, 640) million in the loc-Gwalla data set and N ∈ (200, 450) million in the loc-Brightkite data set; the rate of poisoning locations n/N ∈ (1%, 20%); the length of a specific user's trace is within (200, 500); the walk length of the random walk trace wl ∈ (10, 100); the threshold θ_o = 0.8; and the dimension of the learned vectors d = 128. Simulations are implemented in C++ and conducted on a desktop PC with an Intel Core i7 3.41-GHz processor and 8-GB RAM.
B. Impact of Number of Locations N on Identification Rate

Fig. 5 shows the impact of the number of locations N on the identification rate. It can be observed that the identification rates of the three algorithms, Baseline1, Baseline2, and P3, decrease with the increasing number of locations N. The reason is that, intuitively, the poisoning locations are more likely to be cloaked among the locations of users when the number of locations N is enlarged, thereby decreasing the identification rate. Moreover, in the first step, constructing the inferred social graph, more users result in incorrect social relationships inferred by the algorithm P3 and further lead to failures in identifying poisoning locations. In Baseline1 and Baseline2, it is more difficult for the algorithms to pick out the poisoning locations via random guessing when the number of locations N increases. In addition, the identification rate of P3 is less affected by the number of locations N than those of the baselines, as P3 executes sophisticated algorithms to analyze the relationships among users and the map between the inferred social graph and the social graph from social networks. In addition, the identification rate of the Baseline1 algorithm is much larger than that of Baseline2 since the algorithm in Baseline1 obtains more side information about the poisoning locations, i.e., the number of poisoning locations n.
In addition, when adversaries launch the first kind of poisoning attack, the identification rate of P3 varies within (0.6029, 0.796), and the identification rates of the baselines decrease to 0.174 and 0.0198 from 0.2 and 0.06, respectively. On the contrary, when adversaries launch the second kind of poisoning attack, the identification rates of P3, Baseline1, and Baseline2 vary within (0.7485, 0.7975), (0.1744, 0.2), and (0.0198, 0.066), respectively. It is obvious that the first kind of poisoning attack, i.e., Poison1, is more affected by the increment of the number of locations N. What's more, the Poison1 attacks are more effective than the second kind, Poison2, as the identification rates of the three algorithms under Poison1 are less than those under Poison2. This is attributed to the fact that the poisoning locations generated in Poison1 exhibit very similar mobility models and are plausible imitations of users' locations.
Furthermore, it is interesting to observe that the identification rates in the loc-Gwalla data set are larger than those in the loc-Brightkite data set. To this end, we analyze the two data sets and find that the average degree and graph density are 9.7 and 4.92E−5 in loc-Gwalla, and 7.5 and 1.32E−5 in loc-Brightkite, respectively. This means that the loc-Gwalla data set includes many more relationships among users than the loc-Brightkite data set, and thus, it is difficult for the three algorithms to identify the poisoning locations in the loc-Brightkite data set with fewer relationships.
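The graph statistics cited above follow from the node and edge counts given in Section IV-A: for an undirected graph, the average degree is 2E/N and the density is 2E/(N(N−1)). A quick check for loc-Gwalla (196 591 nodes, 950 327 edges):

```python
def degree_and_density(num_nodes, num_edges):
    # Undirected graph: average degree = 2E/N, density = 2E / (N * (N - 1)).
    avg_degree = 2 * num_edges / num_nodes
    density = 2 * num_edges / (num_nodes * (num_nodes - 1))
    return avg_degree, density

avg, dens = degree_and_density(196591, 950327)  # loc-Gwalla (Sec. IV-A)
```

This yields roughly 9.67 and 4.92E−5, consistent with the loc-Gwalla figures quoted above.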
C. Impact of Rate of Poisoning Locations n/N on Identification Rate
Fig. 6 shows the impact of the rate of poisoning locations n/N on the identification rate. We can observe that the identification rate of the algorithm P3 increases with the rate of poisoning locations n/N when n/N increases from 1% to 15% in the loc-Gwalla data set and from 1% to 13% in the loc-Brightkite data set. Then, the identification rate of P3 changes slowly when the rate of poisoning locations n/N continually increases in both the loc-Gwalla and loc-Brightkite data sets. The reason is that, in algorithm P3, when a small number of poisoning locations exist in the data sets, poisoning locations are more likely to be cloaked among the users' locations in both of
Fig. 6. Impact of the rate of poisoning locations n/N on the identification rate in both (a) loc-Gwalla and (b) loc-Brightkite with the number of locations N = 300 billion in loc-Gwalla, N = 200 billion in loc-Brightkite, and the rate of poisoning locations n/N = 0.2.
Fig. 7. Impact of the length of the user's trace le on the identification rate in both (a) loc-Gwalla and (b) loc-Brightkite with the number of locations N = 300 billion in loc-Gwalla, N = 200 billion in loc-Brightkite, and the length of each user's trace le = 200.
the two steps: constructing the inferred social graph and mapping the inferred social graph to the social graph from the social networks. As a result, P3 can identify more poisoning locations when the rate of poisoning locations n/N varies within (1%, 15%). Likewise, it is easier for the Baseline1 algorithm to pick out poisoning locations when n/N is increased. When the rate of poisoning locations n/N increases to a certain value, the identification rates of the algorithms P3 and Baseline1 no longer increase, as a larger n/N means more locations and, thus, lower probabilities for the algorithms to find the poisoning locations. Moreover, the Baseline1 algorithm is less affected by n/N, with its identification rate increasing from 0.005 to 0.2, as the sophisticated algorithm P3 is more sensitive to n/N than Baseline1. Finally, it is interesting to observe that the identification rate of Baseline2 decreases with n/N, as more locations decrease the probability of a correct random guess.
In addition, the identification rate under the poisoning attack Poison1 is less than that under Poison2. The reasons are the same as analyzed earlier, i.e., the poisoning locations in Poison1 are more plausible imitations of users' locations than those in Poison2. As a result, it is more difficult to identify such poisoning locations. Furthermore, the identification rate under Poison1 is much more affected by the varying rate of poisoning locations n/N than that under Poison2. To be concrete, in P3, the identification rate under Poison1 increases from 0.566 to 0.8, while the identification rate under Poison2 varies within (0.73, 0.8). The reasons are the same as analyzed earlier, i.e., Poison2 is a straightforward poisoning strategy, and Poison1 is a sophisticated one. As such, Poison1 is more likely to be affected by the rate of poisoning locations n/N.
In addition, in P3, the identification rate in the loc-Gwalla data set is larger than that in the loc-Brightkite data set, with the identification rates varying within (0.56, 0.8) and (0.608, 0.8), respectively. The reasons are the same as analyzed earlier, i.e., loc-Gwalla contains many more relationships among users than the loc-Brightkite data set, with an average degree of 9.7 and a graph density of 4.92E−5. Thus, it is difficult for the three algorithms to identify the poisoning locations in the loc-Brightkite data set with fewer relationships.
D. Impact of Length of User's Trace le on Identification Rate

Fig. 7 shows the impact of the length of the user's trace le on the identification rate. First, we can see that the identification rates of the algorithms P3, Baseline1, and Baseline2 decrease with the length of the user's trace le. Specifically, in the loc-Gwalla data set, the identification rate of P3 decreases from 0.809 to 0.6568, and in the loc-Brightkite data set, it decreases by 0.162. In Baseline1, the identification rate decreases to 0.01 in the loc-Gwalla data set and 0.0067 in loc-Brightkite. Similarly, in Baseline2, the identification rate varies within (0.094, 0.064) in loc-Gwalla and (0.096, 0.065) in loc-Brightkite, respectively. The reason is that poisoning locations generated in the poisoning attack Poison1 are more plausible imitations of users' locations when the length of the user's trace
Fig. 8. Impact of (a) the number of landmarks k, (b) the walk length of the random trace wl, and (c) the threshold θ_o on the identification rate in both loc-Gwalla and loc-Brightkite.
le is enlarged. As such, it is difficult for P3 to identify these plausible locations (i.e., poisoning locations). Regarding the identification rates of Baseline1 and Baseline2, the decrease is attributed to the increasing number of locations resulting from the longer user traces le.

Furthermore, in P3, the identification rate under the poisoning attack Poison1 is more affected by the length of the user's trace le than that under Poison2. Specifically, in P3, the identification rate under Poison1 varies within (0.6389, 0.8057), while under Poison2, it decreases from 0.809 to 0.7308. As analyzed earlier, the adversaries launching Poison1 can more credibly imitate the locations of users when the length of the user's trace le is enlarged, making the poisoning locations more plausible imitations of users' locations. Finally, the identification rate in the loc-Brightkite data set is larger than that in the loc-Gwalla data set.
E. Impact of Number of Landmarks k on Identification Rate

In the following, as shown in Fig. 8, we investigate the impact of the number of landmarks k, the walk length of the random walk trace wl, and the threshold θ_o on the identification rate of the algorithm P3, since Baseline1 and Baseline2 are not affected by these parameters.
The impact of the number of landmarks k on the identification rate is shown in Fig. 8(a). We observe that, initially, as k increases, the identification rate rapidly increases, since the increasing number of landmarks improves the precision of the map between the inferred social graph and the social graph from social networks. Specifically, in loc-Gwalla, the identification rate under Poison1 increases to 0.79 and, under Poison2, increases to 0.8. In loc-Brightkite, the identification rates under Poison1 and Poison2 vary within (0.74, 0.8) and (0.78, 0.83), respectively. However, when the number of landmarks continually increases, the identification rate decreases, as more errors are injected during the process of selecting landmarks, and thereby, the sets of landmarks in the inferred graph and the social graph from social networks are less identical. We can see that the identification rates in the two kinds of poisoning attacks and two data sets decrease to 0.66, 0.62, 0.74, and 0.79, respectively.
F. Impact of Walk Length wl on Identification Rate

Fig. 8(b) shows the impact of the walk length of the random walk trace wl on the identification rate. The identification rates in loc-Gwalla and loc-Brightkite increase sharply when wl increases from 10 to 50 and from 10 to 70, respectively. Thereafter, the identification rates in both data sets saturate. Moreover, the identification rate in loc-Gwalla is larger than that in loc-Brightkite. This is attributed to the fact that a larger walk length wl improves the precision of the inferred social relationships and, thereby, the identification rate. Moreover, the larger average degree and graph density of loc-Gwalla contribute to the better privacy preservation the algorithms provide on the loc-Gwalla data set.
G. Impact of Threshold θoon Identification Rate
Fig. 8(c) shows the identification rates of P3 in the two data sets under the two poisoning attacks as the threshold θo varies from 0.4 to 0.8. The identification rates increase with the threshold θo, since a larger θo improves the precision of the inferred social relationships and thereby enlarges the identification rates. Moreover, the identification rate under Poison1 is smaller than that under Poison2, as the poisoning locations in Poison1 exhibit features more similar to users' genuine locations.
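The thresholding step can be sketched as follows. The similarity scores are assumed to come from the feature learning model, and the helper names are illustrative rather than the paper's API: an edge is kept only when its score reaches θo, so raising θo trades recall for precision in the inferred social graph.

```python
def infer_edges(similarity, theta_o):
    """Keep a candidate social tie (u, v) only if its similarity score
    reaches the threshold theta_o."""
    return {pair for pair, s in similarity.items() if s >= theta_o}

def precision(inferred, true_edges):
    """Fraction of inferred ties that are genuine social ties."""
    return len(inferred & true_edges) / len(inferred) if inferred else 1.0
```

For example, with hypothetical scores {("a","b"): 0.9, ("a","c"): 0.5, ("b","c"): 0.3} and only ("a","b") being a genuine tie, raising θo from 0.4 to 0.8 drops the spurious edge ("a","c") and lifts precision from 0.5 to 1.0, matching the trend reported for Fig. 8(c).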
V. RELATED WORK
A. Privacy Preservation in MEC
One kind of work focused on launching attacks to disclose location privacy in MEC scenarios. Specifically, the work [7] used chaff services to protect the location privacy of mobile users in MEC. Another work [8] argued that a third party of unreliable identity accessing the MEC platform poses a potential threat. Similarly, Vratonjic et al. [40] showed that the use of shared public IP addresses poses a threat to location privacy. Li et al. [9] studied the problem of online security-aware MEC under jamming attacks and proposed a secure edge computing method, based on the multiarmed bandit (MAB) framework with sleeping arms, that adaptively selects a trusted MEC server to protect the user's location privacy.
Authorized licensed use limited to: Donghua University. Downloaded on September 11,2020 at 01:59:52 UTC from IEEE Xplore. Restrictions apply.
ZHAO et al.: P3: PRIVACY-PRESERVING SCHEME AGAINST POISONING ATTACKS IN MEC 825
Another kind of related work studied privacy preservation in MEC. Concretely, Zhang et al. [41] proposed to address data security and privacy in MEC using cryptography-based technology. Moreover, Du et al. [8] analyzed the privacy issues in MEC from the aspects of data aggregation and data mining and employed anonymization mechanisms and traffic-detection techniques combined with machine learning. Furthermore, Li et al. [42] proposed a data aggregation scheme to protect the privacy of terminal devices in the MEC-assisted IoT scenario.
In addition, other studies focused on privacy protection in MEC application scenarios such as task offloading, MSS, blockchain, and big data support. For example, the work [10] established a secure MEC framework for pilgrimage and used it to switch between the fog computing terminal (FCT) and the cloud. Studies such as [11] addressed the privacy problems caused by the wireless task-offloading characteristics of MEC and proposed a privacy-aware task-offloading algorithm based on a constrained Markov decision process (CMDP) to minimize delay and energy consumption. The work [5] proposed an MSS with MEC as the core to protect the network privacy of mobile users. The follow-up study [6] identified anonymity, security, and privacy issues in blockchain-based in-home therapy management with MEC and proposed a secure treatment framework. Moreover, the work [4] proposed an online learning algorithm and defined a strict attack model to minimize the privacy loss over the long term.
However, the above-mentioned literature either focused on privacy protection or launched attacks to disclose data privacy in MEC, and studied neither the strategy of poisoning attacks nor the defense mechanisms against poisoning attacks in MEC. In contrast, this article is dedicated to designing a privacy-preserving algorithm against poisoning attacks in MEC.
B. Studies About Poisoning Attacks
Another line of loosely related works [12]–[19] focused on poisoning attacks in the adversarial machine learning scenario. Specifically, the work [12] reviewed security threats to machine learning, including poisoning attacks. The work [13] explored poisoning attacks on neural networks. Moreover, the study [14] conducted a systematic study of data poisoning attacks on online learning. Another work [15] proposed a new method for automatically generating imperceptible attacks using the backpropagation characteristics of the trained deep neural network (DNN). The work [16] developed three new attacks that can bypass extensive data sanitization defenses. In addition, the study in [17] discussed the vulnerability of the importance-weighting method of domain adaptation to poisoning attacks in an adversarial machine learning environment. The work [18] focused on the optimal poisoning attack under the multitask learning (MTL) model. Furthermore, the work [19] presented a new backdoor attack that corrupts the training set without label poisoning, proving that such attacks are possible.
However, the above-mentioned work mainly focused on designing poisoning attack algorithms in the machine learning scenario without considering the corresponding defense mechanisms. In contrast, this article is dedicated to a privacy-preserving scheme against poisoning attacks in MEC. Furthermore, MEC exhibits different characteristics from machine learning settings, e.g., task offloading and time-delay sensitivity, and thus, defending against poisoning attacks in MEC faces significant new challenges.
Another kind of work [20]–[23] focused on poisoning attacks in the IoT. Concretely, the work in [20] studied how to effectively carry out two types of data poisoning attacks, namely, the exploitable attack and the target attack, and proposed an optimal attack framework. Another work [21] proposed a method for identifying harmful data using contextual information about the origin and transformation of data points in the training set. Moreover, the work [22] formulated the injection of fake user ratings into graph-based recommender systems as an optimization problem. The latest work [23] designed an intelligent attack mechanism that achieves the maximum attack effectiveness while disguising the attack behavior.
Unfortunately, these works only launched poisoning attacks in the IoT without investigating the defense mechanisms against them. In this article, we concentrate on a privacy-preserving algorithm against poisoning attacks in MEC rather than the IoT.
The third kind of related work focused on poisoning attacks on threat intelligence systems [24], malware detection systems [25], unsupervised node embedding methods [26], and naive Bayes spam filters [27], respectively. Nevertheless, these works considered scenarios different from the one considered in this article. More importantly, they did not study defense mechanisms, whereas this article focuses on a privacy-preserving algorithm against poisoning attacks in MEC.
In addition, several works [28]–[32] focused on defenses against poisoning attacks. Specifically, the studies [28] and [29] mitigated label-flipping poisoning attacks through label sanitization and through anomaly-detection-based identification of adversarial training examples, respectively. Against poisoning attacks, the work [30] studied functions that transform the data set from the source domain to the target domain with cluster separability under adversarial settings. Moreover, the studies [31] and [32] considered feedback-based content-poisoning mitigation in named data networking and the prevention of ARP poisoning in the IoT, respectively.
However, these works studied defenses in the scenarios of machine learning, named data networking, and the IoT, which differ from the scenario considered in this article. Furthermore, MEC exhibits different characteristics, e.g., task offloading and time-delay sensitivity, and thus, the above-mentioned work is not applicable to MEC.
VI. CONCLUSION
In this article, we propose P3, the first attempt toward a privacy-preserving scheme against poisoning attacks in MEC. The main idea is
to construct the inferred social graph utilizing feature learning
and search the optimal map between the inferred social graph
and the social graph from social networks to identify the
poisoning locations. Extensive experiments on two real-world
data sets, two baseline works, and two kinds of poisoning
attacks have demonstrated the effectiveness of P3.
REFERENCES
[1] F.-Y. Wang, Y. Tang, X. Liu, and Y. Yuan, “Social education: Opportuni-
ties and challenges in cyber-physical-social space,” IEEE Trans. Comput.
Social Syst., vol. 6, no. 2, pp. 191–196, Apr. 2019.
[2] R. Basak, S. Sural, N. Ganguly, and S. K. Ghosh, “Online
public shaming on Twitter: Detection, analysis, and mitigation,”
IEEE Trans. Comput. Soc. Syst., vol. 6, no. 2, pp. 208–220,
Apr. 2019.
[3] S. H. Sajadi, M. Fazli, and J. Habibi, "The affective evolution of social norms in social networks," IEEE Trans. Comput. Social Syst., vol. 5, no. 3, pp. 727–735, Sep. 2018.
[4] P. Zhou, K. Wang, J. Xu, and D. Wu, “Differentially-private and
trustworthy online social multimedia big data retrieval in edge com-
puting,” IEEE Trans. Multimedia, vol. 21, no. 3, pp. 539–554,
Mar. 2019.
[5] P. Zhang, M. Durresi, and A. Durresi, “Mobile privacy protection
enhanced with multi-access edge computing,” in Proc. IEEE 32nd Int.
Conf. Adv. Inf. Netw. Appl. (AINA), May 2018.
[6] M. A. Rahman et al., "Blockchain-based mobile edge computing framework for secure therapy applications," IEEE Access, vol. 6, pp. 72469–72478, 2018.
[7] T. He, E. N. Ciftcioglu, S. Wang, and K. S. Chan, “Location privacy
in mobile edge clouds,” in Proc. IEEE 37th Int. Conf. Distrib. Comput.
Syst. (ICDCS), Jun. 2017.
[8] M. Du, K. Wang, Y. Chen, X. Wang, and Y. Sun, “Big data privacy
preserving in multi-access edge computing for heterogeneous Inter-
net of Things,” IEEE Commun. Mag., vol. 56, no. 8, pp. 62–67,
Aug. 2018.
[9] B. Li, T. Chen, X. Wang, and G. B. Giannakis, “Secure edge computing
in IoT via online learning,” in Proc. 52nd Asilomar Conf. Signals, Syst.,
Comput., Oct. 2018.
[10] A. Rahman, E. Hassanain, and M. S. Hossain, "Towards a secure mobile edge computing framework for Hajj," IEEE Access, vol. 5, pp. 11768–11781, 2017.
[11] X. He, J. Liu, R. Jin, and H. Dai, “Privacy-aware offloading in mobile-
edge computing,” in Proc. GLOBECOM IEEE Global Commun. Conf.,
Dec. 2017.
[12] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung, “A survey
on security threats and defensive techniques of machine learning: A data
driven view,” IEEE Access, vol. 6, pp. 12103–12117, 2018.
[13] A. Shafahi et al., “Poison frogs! Targeted clean-label poisoning
attacks on neural networks,” in Proc. Adv. Neural Inf. Process. Syst.,
2018.
[14] Y. Wang and K. Chaudhuri, “Data poisoning attacks against
online learning,” Aug. 2018, arXiv:1808.08994. [Online]. Available:
https://arxiv.org/abs/1808.08994
[15] F. Khalid, M. A. Hanif, S. Rehman, and M. Shafique, “TrISec:
Training data-unaware imperceptible security attacks on deep
neural networks,” Nov. 2018, arXiv:1811.01031. [Online]. Available:
https://arxiv.org/abs/1811.01031
[16] P. W. Koh, J. Steinhardt, and P. Liang, “Stronger data poisoning
attacks break data sanitization defenses,” Nov. 2018, arXiv:1811.00741.
[Online]. Available: https://arxiv.org/abs/1811.00741
[17] M. Umer, C. Frederickson, and R. Polikar, “Adversarial poisoning of
importance weighting in domain adaptation,” in Proc. IEEE Symp. Ser.
Comput. Intell. (SSCI), Nov. 2018.
[18] M. Zhao, B. An, Y. Yu, S. Liu, and S. J. Pan, “Data poisoning attacks on
multi-task relationship learning,” in Proc. 32nd AAAI Conf. Artif. Intell.,
2018, pp. 2628–2635.
[19] M. Barni, K. Kallas, and B. Tondi, “A new backdoor attack in
CNNs by training set corruption without label poisoning,” Feb. 2019,
arXiv:1902.11237. [Online]. Available: https://arxiv.org/abs/1902.11237
[20] C. Miao, Q. Li, H. Xiao, W. Jiang, M. Huai, and L. Su, “Towards data
poisoning attacks in crowd sensing systems,” in Proc. 18th ACM Int.
Symp. Mobile Ad Hoc Netw. Comput. (Mobihoc), 2018.
[21] N. Baracaldo, B. Chen, H. Ludwig, A. Safavi, and R. Zhang, “Detecting
poisoning attacks on machine learning in IoT environments,” in Proc.
IEEE Int. Congr. Internet Things (ICIOT), Jul. 2018.
[22] M. Fang, G. Yang, N. Z. Gong, and J. Liu, “Poisoning attacks to graph-
based recommender systems,” in Proc. 34th Annu. Comput. Secur. Appl.
Conf. (ACSAC), 2018, pp. 1–12.
[23] C. Miao, Q. Li, L. Su, M. Huai, W. Jiang, and J. Gao, “Attack under
disguise: An intelligent data poisoning attack mechanism in crowdsourc-
ing,” in Proc. World Wide Web Conf. World Wide Web (WWW), 2018,
pp. 13–22.
[24] N. Khurana, S. Mittal, and A. Joshi, “Preventing poisoning attacks
on AI based threat intelligence systems,” Jul. 2018, arXiv:1807.07418.
[Online]. Available: https://arxiv.org/abs/1807.07418
[25] S. Chen et al., “Automated poisoning attacks and defenses in malware
detection systems: An adversarial machine learning approach,” Comput.
Secur., vol. 73, pp. 326–344, Mar. 2018.
[26] M. Sun et al., “Data poisoning attack against unsupervised node
embedding methods,” Oct. 2018, arXiv:1810.12881. [Online]. Available:
https://arxiv.org/abs/1810.12881
[27] D. J. Miller, X. Hu, Z. Xiang, and G. Kesidis, “A mixture
model based defense for data poisoning attacks against naive Bayes
spam filters,” Oct. 2018, arXiv:1811.00121. [Online]. Available:
https://arxiv.org/abs/1811.00121
[28] A. Paudice, L. Muñoz-González, and E. C. Lupu, “Label sanitization
against label flipping poisoning attacks,” in Proc. Joint Eur. Conf. Mach.
Learn. Knowl. Discovery Databases, 2018, pp. 5–15.
[29] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, “Detec-
tion of adversarial training examples in poisoning attacks through
anomaly detection,” Feb. 2018, arXiv:1802.03041. [Online]. Available:
https://arxiv.org/abs/1802.03041
[30] C. V. S. Praven and C. S. Kumar, “Domain adversarial representation
learning for data independent defenses against poisoning attacks,” in
Proc. ICLR, 2018, pp. 1–3.
[31] W. Cui, Y. Li, Y. Xin, and C. Liu, “Feedback-based content poisoning
mitigation in named data networking,” in Proc. IEEE Symp. Comput.
Commun. (ISCC), Jun. 2018, pp. 759–765.
[32] W. Gao et al., “ARP poisoning prevention in Internet of Things,” in Proc.
9th Int. Conf. Inf. Technol. Med. Edu. (ITME), Oct. 2018, pp. 733–736.
[33] P. Zhao, J. Li, F. Zeng, F. Xiao, C. Wang, and H. Jiang, “ILLIA: Enabling
k-anonymity-based privacy preserving against location injection attacks
in continuous LBS queries,” IEEE Internet Things J., vol. 5, no. 2,
pp. 1033–1042, Apr. 2018.
[34] P. Zhao et al., “P3-LOC: A privacy-preserving paradigm-driven frame-
work for indoor localization,” IEEE/ACM Trans. Netw., vol. 26, no. 6,
pp. 2856–2869, Dec. 2018.
[35] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted
networks: Generalizing degree and shortest paths,” Social Netw., vol. 32,
no. 3, pp. 245–251, Jul. 2010.
[36] R. E. Bellman, “Book review: Combinatorial optimization: Networks
and matroids,” Bull. Amer. Math. Soc., vol. 84, no. 3, pp. 461–464,
May 1978.
[37] Gwalla and Brightkite Data. Accessed: Dec. 21, 2019. [Online]. Available: http://snap.stanford.edu/data/index.html
[38] W. Gao, G. Cao, A. Iyengar, and M. Srivatsa, “Supporting cooperative
caching in disruption tolerant networks,” in Proc. 31st Int. Conf. Distrib.
Comput. Syst., Jun. 2011, pp. 1–12.
[39] V. Bindschaedler and R. Shokri, “Synthesizing plausible privacy-
preserving location traces,” in Proc. IEEE Symp. Secur. Privacy (SP),
May 2016, pp. 1–18.
[40] N. Vratonjic, K. Huguenin, V. Bindschaedler, and J.-P. Hubaux,
“A location-privacy threat stemming from the use of shared pub-
lic IP addresses,” IEEE Trans. Mobile Comput., vol. 13, no. 11,
pp. 2445–2457, Nov. 2014.
[41] J. Zhang, B. Chen, Y. Zhao, X. Cheng, and F. Hu, “Data security
and privacy-preserving in edge computing paradigm: Survey and open
issues,” IEEE Access, vol. 6, pp. 18209–18237, 2018.
[42] X. Li, S. Liu, F. Wu, S. Kumari, and J. J. P. C. Rodrigues, “Privacy
preserving data aggregation scheme for mobile edge computing assisted
IoT applications,” IEEE Internet Things J., vol. 6, no. 3, pp. 4755–4763,
Jun. 2019, doi: 10.1109/jiot.2018.2874473.