818 IEEE TRANSACTIONS ON COMPUTATIONAL SOCIAL SYSTEMS, VOL. 7, NO. 3, JUNE 2020
P3: Privacy-Preserving Scheme Against Poisoning
Attacks in Mobile-Edge Computing
Ping Zhao, Member, IEEE, Haojun Huang, Member, IEEE, Xiaohui Zhao, and Daiyu Huang
Abstract— Mobile-edge computing (MEC) has emerged to enable users to offload their location data to the MEC server, which executes location-aware data processing to compute statistical results about the collected locations. However, malicious users may deliberately generate poisoning locations and send them to the MEC server, aiming to poison the statistical results learned by the MEC server and even compromise other users' location privacy. Existing work concerning privacy preservation in MEC has not studied such poisoning attacks. Another line of somewhat related work focused on poisoning attacks in a different scenario, adversarial machine learning. However, MEC exhibits different features from machine learning settings, and thus, privacy preservation against poisoning attacks in MEC faces significantly new challenges. To address this problem, we propose a privacy-preserving scheme against poisoning attacks (P3) that utilizes a feature learning model to infer the social relationships among users from their location data and then constructs the inferred social graph. Thereafter, it searches for the optimal map between the inferred social graph and the social graph from social networks to identify the poisoning locations. Experiments on two real-world data sets, against two baseline schemes, and under two kinds of poisoning attacks demonstrate the privacy preservation that P3 provides against poisoning attacks in MEC.
Index Terms—Feature learning, location privacy, mobile-edge computing (MEC), poisoning attacks, social relationship.
I. INTRODUCTION
WITH the development of the mobile Internet, mobile-edge computing (MEC) has been widely applied.
In MEC systems, as shown in Fig. 1, users offload their
location data into the MEC server, and the MEC server
computes the statistical results about these locations to support
Manuscript received July 1, 2019; revised October 7, 2019; accepted
November 18, 2019. Date of publication February 20, 2020; date of current
version June 10, 2020. This work was supported in part by the National
Natural Science Foundation of China under Grant 61902060, Grant 61801106,
Grant 61671216, Grant 61977064, and Grant 61871436, in part by the Shanghai Sailing Program under Grant 19YF1402100, in part by the Chenguang
Program supported by the Shanghai Education Development Foundation and
the Shanghai Municipal Education Commission, in part by the Fundamental
Research Funds for the Central Universities under Grant 2232019D3-51,
in part by Initial Research Funds for Young Teachers of Donghua University,
and in part by the Shanghai Rising-Star Program under Grant 19QA1400300.
(Corresponding author: Haojun Huang.)
Ping Zhao, Xiaohui Zhao, and Daiyu Huang are with the College of
Information Science and Technology, Donghua University, Shanghai 201620,
China (e-mail: pingzhao2018ph@dhu.edu.cn; 160910903@mail.dhu.edu.cn;
160910309@mail.dhu.edu.cn).
Haojun Huang is with the School of Electronic Information and
Communications, Huazhong University of Science and Technology, Wuhan
430074, China (e-mail: hjhuang@hust.edu.cn).
Digital Object Identifier 10.1109/TCSS.2019.2960824
new edge intelligence applications. However, malicious users
(hereafter, adversaries) may deliberately generate and send
poisoning locations to the MEC server to poison the statistical
results about users' locations, causing the MEC server and even other users to suffer from the poisoning attacks. Such poisoning attacks have become a bottleneck for the wide deployment and application of MEC systems and, thus, have attracted widespread concern [1]–[3].
Existing work concerning privacy preservation in MEC [4]–[11] mainly used anonymous mechanisms and traffic detection techniques combined with machine learning to protect users' location privacy, or proposed relevant algorithms to minimize the loss of privacy in MEC systems, covering task offloading, mobile support system (MSS), blockchain, and so on. However, these studies studied neither defense mechanisms against poisoning attacks nor poisoning attack schemes. Another line of somewhat related work [12]–[19] focused on poisoning attacks in an adversarial machine learning scenario. Nevertheless, compared with machine learning settings, MEC is a different scenario and exhibits different features, e.g., task offloading and time-delay sensitivity. Other works [20]–[27] focused on poisoning attack schemes in the Internet of Things (IoT). Unfortunately, these works did not study defense mechanisms against poisoning attacks. In addition, several works [28]–[32] focused on defenses against poisoning attacks in other scenarios, such as machine learning, named data networks, and IoT. However, these works considered scenarios other than MEC, and more importantly, MEC exhibits different features that bring significantly new challenges to defense mechanisms against poisoning attacks. In summary, it is necessary to study defense mechanisms against poisoning attacks in MEC.
To address the above problem, in this article, we propose P3, i.e., the Privacy-Preserving scheme against Poisoning attacks in MEC. Specifically, it first utilizes a feature learning model to infer the social relationships among users from their location data and then constructs the inferred social graph with the help of the inferred relationships. On this basis, it searches for the optimal map between the inferred social graph and the social graph from social networks to identify the poisoning locations. Finally, we validate the performance using real-world data sets. In summary, we make the following contributions.
1) We propose to utilize the feature learning model to
construct the inferred social graph without any domain
experts’ knowledge. Most existing works relied on
2329-924X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See https://www.ieee.org/publications/rights/index.html for more information.
Fig. 1. Illustration of poisoning attacks in MEC.
the simple heuristic algorithm that involves too much domain experts' knowledge and, thus, incurs too many errors. On the contrary, with the knowledge of the locations collected by the MEC server, we propose to first characterize the social behavior of users' mobility using feature learning and then, on this basis, infer the social relationships and construct the inferred social graph.
2) We propose to mirror the inferred relationships to their social relationships in social networks to identify the poisoning locations. To be concrete, the inferred social graph is structurally correlated with the social graph in social networks, and thus, the inferred relationships can be mapped to the relationships in the social graph. As a result, the poisoning locations can be identified by searching for the optimal map between the inferred social graph and the social graph from social networks.
3) We use two real-world data sets to validate the effectiveness of the proposed scheme P3, and simulation results validate the location privacy preservation against the poisoning attacks. Specifically, we use the loc-Gwalla and loc-Brightkite data sets, compare P3 with two baseline works, and investigate the performance under two kinds of poisoning attacks. Moreover, simulation results show that P3 outperforms the two baselines on both data sets and under both kinds of poisoning attacks.
The remainder of this article is organized as follows.
Section II introduces the adversary model. Then, Section III
proceeds to describe the design of the privacy-preserving
scheme P3 in detail, followed by the evaluation in Section IV.
Section V reviews the related work. Finally, Section VI
concludes this article.
II. ADVERSARY MODEL
As shown in Fig. 1, users offload their location data to the MEC server, and the MEC server computes the corresponding results and returns them to the users. At the same time,
the MEC server collects these users’ locations and executes
a certain kind of statistical function to calculate the statistical
results about these collected locations. These statistical results
are expected to support new edge intelligence applications.
However, malicious users (i.e., adversaries) generate poisoning
locations and send these poisoning locations to the MEC
server, aiming to poison the statistical results learned by the
MEC server [33], [34]. What’s more, these poisoned statistical
results will further lead to serious errors in applications,
e.g., map inference and smart transportation. For example,
in the hearing "The Dawn of AI," the vulnerabilities of systems to poisoning attacks attracted widespread concern among experts from academia and industry.
Adversaries have background knowledge of the locations of a small percentage of users. They conduct poisoning attacks by generating poisoning locations and sending them to the MEC server on the basis of these known locations. Adversaries' capability is limited by the number of these poisoning points. In this article, adversaries are assumed to be able to control a small percentage of users, and the rate of poisoning locations is set to less than 20%. In MEC, adversaries can indeed inject a small percentage of poisoning locations since a large number of users participate in MEC applications and send locations to the MEC server. Both the MEC server and the cloud server are assumed to be trusted and to honestly perform the proposed scheme P3. The goal of P3 is to identify these poisoning locations.
III. DESIGN OF P3
The main idea behind the proposed scheme P3, as shown in Fig. 2, is that it first characterizes the social behavior of users' mobility using feature learning, since users' mobility behavior is also shaped by their social relationships (i.e., why they move) [1]–[3]. Then, on this basis, it infers the social relationships among users and constructs the inferred social graph. Thereafter, it searches for the optimal map between the inferred social graph and the social graph from social networks, utilizing the structural correlations between the two graphs, to identify the poisoning locations. In summary, it mainly includes two steps: constructing the inferred social graph and mapping the inferred social graph to the social graph from social networks.
A. Construction of Inferred Social Graph
We first search the mobility neighborhoods using the
random walk, which is shown in Fig. 3. Specifically, we
Fig. 2. Work flow of P3.
Fig. 3. Illustration of construction of inferred social graph. (a) Users and locations. (b) Weighted bipartite graph Go. (c) Random walk traces.
construct the weighted bipartite graph G_o = {U, L, E} [see Fig. 3(a) and (b)], where U is the set of users, L is the set of locations of users, and E is the set of edges connecting users and locations. The weight w_{u,l} of a specific edge (u, l) ∈ E (u ∈ U, l ∈ L) is the number of times the user u checks in at the location l. We define the graph neighborhood of a specific node x ∈ U ∪ L by gn(x), which is the set of nodes connected with x. Then, we generate π random walk traces [see Fig. 3(c)] for each user, and the walk length of each trace is wl. For a specific user u, we denote the current node in the random walk trace by now_n and the next node by next_n. Thereafter, we define the next node in the random walk trace next_n in (1), shown below. For example, the next node in the random walk trace next_n is sampled with the probability P(next_n = x | now_n) [see (1)]. Then, the mobility neighborhood gn(u) of the user u consists of the nodes before and after the node u (i.e., user u) in all the π random walk traces.
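As a rough sketch of this step, the biased random walk defined by (1) can be implemented as follows. This is an illustration only, not the authors' code; the dictionary-based graph encoding, the function names, and the toy check-in weights are our assumptions.

```python
import random

def next_node(graph, weights, now):
    # Sample the successor per (1): a neighbor x of `now` is chosen with
    # probability proportional to the check-in weight of the edge (now, x).
    nbrs = graph[now]                      # gn(now): nodes connected with `now`
    total = sum(weights[(now, x)] for x in nbrs)
    r = random.uniform(0, total)
    acc = 0.0
    for x in nbrs:
        acc += weights[(now, x)]
        if r <= acc:
            return x
    return nbrs[-1]                        # guard against floating-point drift

def walk_traces(graph, weights, user, pi, wl):
    # Generate pi random-walk traces of length wl starting from `user`.
    traces = []
    for _ in range(pi):
        trace = [user]
        for _ in range(wl - 1):
            trace.append(next_node(graph, weights, trace[-1]))
        traces.append(trace)
    return traces
```

The mobility neighborhood gn(u) is then read off as the nodes adjacent to u within these traces, alternating between users and locations because the walk runs on the bipartite graph G_o.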
Then, we use the machine learning model skip-gram to map the random walk traces to vectors and infer the relationships among users from these vectors. To be concrete, we assume that the random walk traces of each user (e.g., u) are mapped to a vector α(u), and the vectors of all nodes are denoted by α. We define the objective function as

$$\arg\max_{\alpha \in \mathbb{R}^{|U\cup L|\times d}} \sum_{n\in U\cup L} \sum_{i\in m(n)} \frac{\exp(\alpha(i)\cdot\alpha(n))}{\sum_{j\in U\cup L}\exp(\alpha(j)\cdot\alpha(n))}. \tag{2}$$

Then, we use the negative sampling approach to reduce the computation cost and redefine the objective function as

$$\arg\max_{\alpha \in \mathbb{R}^{|U\cup L|\times d}} \sum_{n\in U\cup L} \sum_{i\in m(n)} \log\frac{1}{1+\exp(-\alpha(i)\cdot\alpha(n))} + \sum_{n\in U\cup L} \sum_{i\in m(n)} \log\frac{1}{1+\exp(\alpha(i)\cdot\alpha(n))}. \tag{3}$$

Thereafter, in the learning process, we use stochastic gradient descent. Finally, we compute the cosine similarity θ(u, v) of any two users u and v. When the cosine similarity meets θ(u, v) = (α(u)·α(v)) / (‖α(u)‖₂ × ‖α(v)‖₂) ≥ θ_o (θ_o is the threshold), the users u and v are regarded as friends. As a result, an inferred social graph G [see Fig. 4(a)] is constructed, where nodes represent users and edges represent friendships.
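Once the embeddings α are learned, the thresholding step that turns pairwise cosine similarities into edges of the inferred graph can be sketched as follows. The function names, the toy vectors, and the threshold value are illustrative assumptions, not the paper's implementation.

```python
import math
from itertools import combinations

def cosine(a, b):
    # theta(u, v) = (alpha(u) . alpha(v)) / (||alpha(u)||_2 * ||alpha(v)||_2)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def inferred_social_graph(vectors, theta_o):
    # Users u, v become friends in the inferred graph when theta(u, v) >= theta_o.
    return {(u, v) for u, v in combinations(sorted(vectors), 2)
            if cosine(vectors[u], vectors[v]) >= theta_o}
```

With the default θ_o = 0.8 from Section IV, two users whose embedding vectors point in nearly the same direction are linked, while near-orthogonal vectors are not.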
B. Optimal Map Between Inferred Social Graph and Social Graph
We first select k landmarks. Specifically, in both the inferred social graph and the social graph from social networks, the nodes with high betweenness are selected as the landmarks. The betweenness of a specific node quantifies the number of shortest paths that pass through the node and is measured as in an opportunistic network [35]. For example, in Fig. 4, the red nodes in the inferred social graph G and the social graph G′ are picked out and regarded as the landmarks.
$$P(\mathrm{next}_n = x \mid \mathrm{now}_n) =
\begin{cases}
\dfrac{w_{\mathrm{now}_n,\,x}}{\sum_{i=1}^{|gn(\mathrm{now}_n)|} w_{\mathrm{now}_n,\,i}} & \text{if } \mathrm{now}_n \in U \wedge x \in gn(\mathrm{now}_n) \wedge (\mathrm{now}_n, x)\in E\\[2ex]
\dfrac{w_{x,\,\mathrm{now}_n}}{\sum_{i=1}^{|gn(\mathrm{now}_n)|} w_{i,\,\mathrm{now}_n}} & \text{if } \mathrm{now}_n \in L \wedge x \in gn(\mathrm{now}_n) \wedge (\mathrm{now}_n, x)\in E\\[1ex]
0 & \text{else}
\end{cases} \tag{1}$$
Fig. 4. Illustration of the map between the inferred social graph and the social graph from social networks in the loc-Gwalla data set [37]. (a) Nodes represent users, and edges indicate the inferred social relationships. (b) Nodes represent users, and edges indicate the social relationships from social networks. Red nodes in (a) and (b) are selected as the landmarks, and k = 14.
Then, based on the k landmarks, we search for the optimal map between the inferred social graph G and the social graph G′. Specifically, for each node c in the inferred social graph G, the distances between the node c and the k landmarks in G are {d_{c1}, d_{c2}, ..., d_{ck}}. Likewise, in the social graph G′, the distances between a specific node s and the k landmarks in G′ are {d_{s1}, d_{s2}, ..., d_{sk}}. Then, we define the map score between node c in G and node s in G′ as $-\bigl(\sum_{i=1}^{k}(d_{ci}-d_{si})^2\bigr)^{1/2}$. Given the k landmarks in G and the k landmarks in G′, there are k! possible maps between the landmarks. For each map between the landmarks in G and G′, we use the Hungarian algorithm [36] to search for maps for the remaining nodes in G and G′. For the k! possible maps, it repeats this operation k! times. Denote the outputs of the k! operations by O_1, O_2, ..., O_i, ..., O_{k!}, which record the maps among nodes in G and G′. Denote O_i = {(c_1, s_1), ..., (c_j, s_j), ...}, where c_j and s_j are nodes in the inferred social graph G and the social graph G′, respectively, and node c_j in G is mapped to node s_j. Finally, we select the output (e.g., O_i) with the largest match score $-\sum_{j}\bigl(\sum_{i=1}^{k}(d_{c_j i}-d_{s_j i})^2\bigr)^{1/2}$ as the optimal map between the inferred social graph G and the social graph G′ from social networks. The locations of users in G that remain unmapped to users in G′ are identified as the poisoning locations.
Note that existing work [38] has validated that the betweenness of nodes in the social graph follows a heavy-tailed distribution and that there exist only a small number of nodes with high betweenness. Thus, k ≪ |G|, where |G| is the number of nodes in G. Therefore, although there are k! maps between the k landmarks in G and the k landmarks in G′, brute-force enumeration is feasible.
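A toy version of this matching step might look as follows. Since only a few nodes are involved in the example, the assignment is found by brute-force search over permutations (standing in for the Hungarian algorithm [36] the paper applies to the non-landmark nodes); the node names and distance signatures are made-up inputs.

```python
import math
from itertools import permutations

def match_by_landmarks(sig_c, sig_s):
    # sig_c / sig_s: node -> k-vector of shortest-path distances to the k
    # landmarks in the inferred graph G and the social graph, respectively.
    # Returns the assignment maximizing the total map score
    # -(sum_i (d_ci - d_si)^2)^(1/2) summed over matched pairs.
    c_nodes = sorted(sig_c)
    s_nodes = sorted(sig_s)

    def score(c, s):
        return -math.sqrt(sum((a - b) ** 2 for a, b in zip(sig_c[c], sig_s[s])))

    best_map, best_total = None, -math.inf
    for perm in permutations(s_nodes, len(c_nodes)):
        total = sum(score(c, s) for c, s in zip(c_nodes, perm))
        if total > best_total:
            best_map, best_total = dict(zip(c_nodes, perm)), total
    return best_map
```

Nodes of the inferred graph left unmatched, or matched only with a very poor score, would then flag their owners' locations as poisoning candidates.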
IV. PERFORMANCE EVALUATION
A. Setup
1) Data Set: We use two real-world data sets, i.e., loc-
Gwalla and loc-Brightkite [37], to validate the performance
of P3. The loc-Gwalla data set consists of 196 591 users
(i.e., nodes), 950 327 edges (i.e., friendships), and 6.4 million
locations from February 2009 to October 2010. Another data
set, i.e., loc-Brightkite, records 4.5 million locations of 58 228
users from April 2008 to October 2010 and 214 078 edges
among users.
2) Baseline Work for Comparison: We compare P3 with two baseline works, dubbed Baseline1 and Baseline2, since there is no existing work concerning location privacy protection against poisoning attacks in MEC. Specifically, in Baseline1, the algorithm knows the number of poisoning locations n in the location data set and randomly picks out n locations from the N locations in the data set as the poisoning locations. On the contrary, in Baseline2, the algorithm randomly picks out a certain number of locations from the N locations without knowing the number of poisoning locations and regards these selected locations as poisoning locations. N is the number of locations in the location data set. Note that both baseline works quantify the performance of random guessing, but the first baseline guesses with knowledge of the number of poisoning locations in the data set.
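The two random-guessing baselines can be written down directly; this is a minimal sketch, and the function names are ours:

```python
import random

def baseline1(locations, n):
    # Baseline1 knows the number n of poisoning locations and
    # guesses n of the N locations uniformly at random.
    return set(random.sample(locations, n))

def baseline2(locations, num_guesses):
    # Baseline2 does not know n; it guesses a fixed number of
    # locations uniformly at random.
    return set(random.sample(locations, num_guesses))
```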
3) Poisoning Attacks: Furthermore, to quantify the privacy preservation that P3 provides, we consider two kinds of poisoning attacks. To be concrete, in the first kind of poisoning attack, adversaries use the locations synthesized in the existing work [39] as the poisoning locations. That work considered the geographic and semantic features of locations when synthesizing them. In the second kind of poisoning attack, adversaries randomly generate locations in the region bounded by users' locations as the poisoning locations. Therefore, the poisoning locations in the first kind of attack are more plausible imitations of genuine users' locations. Note that the two kinds of poisoning attacks represent sophisticated and straightforward poisoning strategies, respectively. Hereafter, the two kinds of poisoning attacks are dubbed Poison1 and Poison2 for ease of presentation.
4) Metrics: What's more, we use the metric identification rate, i.e., the failure rate of poisoning attacks. The definition of
Fig. 5. Identification rate varying with the number of locations N in both loc-Gwalla and loc-Brightkite with the rate of poisoning locations n/N = 0.2 and the length of each user's trace le = 200. (a) Identification rate in the loc-Gwalla data set. (b) Identification rate in the loc-Brightkite data set.
the identification rate is as follows. In addition, we investigate the impact on the identification rate of the number of landmarks k, the number of locations N, the rate of poisoning locations n/N (with n the number of poisoning locations), the walk length wl, the length of each user's trace le, and the threshold θ_o.
Definition 1: Assume that n′ of the n poisoning locations are identified. Then, the identification rate, i.e., the failure rate of poisoning attacks, is n′/n.
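Definition 1 translates directly into code (our sketch; the argument names are illustrative):

```python
def identification_rate(flagged, poisoning):
    # n' = number of true poisoning locations among those flagged by the
    # defense; the identification rate of Definition 1 is n'/n.
    n_prime = len(set(flagged) & set(poisoning))
    return n_prime / len(poisoning)
```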
5) Parameter Settings: Other default parameters are set as follows: the number of landmarks k ∈ (3, 7); the number of locations N ∈ (300, 640) million in the loc-Gwalla data set and N ∈ (200, 450) million in the loc-Brightkite data set; the rate of poisoning locations n/N ∈ (1%, 20%); the length of a specific user's trace is within (200, 500); the walk length of the random walk trace wl ∈ (10, 100); the threshold θ_o = 0.8; and the dimension of the learned vectors d = 128. Simulations are implemented in C++ and conducted on a desktop PC with an Intel Core i7 3.41-GHz processor and 8-GB RAM.
B. Impact of Number of Locations N on Identification Rate

Fig. 5 shows the impact of the number of locations N on the identification rate. It can be observed that the identification rates of the three algorithms, Baseline1, Baseline2, and P3, decrease with the increasing number of locations N. The reason is that, intuitively, the poisoning locations are more likely to be cloaked among the locations of users when the number of locations N is enlarged, thereby decreasing the identification rate. Moreover, in the first step, constructing the inferred social graph, more users result in incorrect social relationships inferred by the algorithm P3 and further lead to failures in identifying poisoning locations. In Baseline1 and Baseline2, it is more difficult for the algorithms to pick out the poisoning locations via random guessing when the number of locations N increases. In addition, the identification rate of P3 is less affected by the number of locations N than those of the baselines, as P3 executes sophisticated algorithms to analyze the relationships among users and the map between the inferred social graph and the social graph from social networks. In addition, the identification rate of the Baseline1 algorithm is much larger than that of Baseline2 since the algorithm in Baseline1 obtains more side information about the poisoning locations, i.e., the number of poisoning locations n.
In addition, when adversaries launch the first kind of poisoning attack, the identification rate of P3 varies within (0.6029, 0.796), and the identification rates of the baselines decrease to 0.174 and 0.0198 from 0.2 and 0.06, respectively. On the contrary, when adversaries launch the second kind of poisoning attack, the identification rates of P3, Baseline1, and Baseline2 vary within (0.7485, 0.7975), (0.1744, 0.2), and (0.0198, 0.066), respectively. It is obvious that the first kind of poisoning attack, i.e., Poison1, is more affected by the increment of the number of locations N. What's more, the Poison1 attacks are more effective than the second kind, Poison2, as the identification rates of the three algorithms under Poison1 are less than those under Poison2. This is attributed to the fact that the poisoning locations generated in Poison1 exhibit very similar mobility models and are plausible imitations of users' locations.
Furthermore, it is interesting to observe that the identification rates in the loc-Gwalla data set are larger than those in the loc-Brightkite data set. To this end, we analyze the two data sets and find that the average degree and graph density are 9.7 and 4.92E−5 in loc-Gwalla, and 7.5 and 1.32E−5 in loc-Brightkite, respectively. This means that the loc-Gwalla data set includes many more relationships among users than the loc-Brightkite data set, and thus, it is difficult for the three algorithms to identify the poisoning locations in the loc-Brightkite data set with fewer relationships.
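The graph statistics cited above follow from the node and edge counts given in Section IV-A: for an undirected graph, the average degree is 2E/N and the density is 2E/(N(N−1)). A quick check for loc-Gwalla (196 591 nodes, 950 327 edges):

```python
def degree_and_density(num_nodes, num_edges):
    # Undirected graph: average degree = 2E/N, density = 2E / (N * (N - 1)).
    avg_degree = 2 * num_edges / num_nodes
    density = 2 * num_edges / (num_nodes * (num_nodes - 1))
    return avg_degree, density

avg, dens = degree_and_density(196591, 950327)  # loc-Gwalla (Sec. IV-A)
```

This yields roughly 9.67 and 4.92E−5, consistent with the loc-Gwalla figures quoted above.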
C. Impact of Rate of Poisoning Locations n/N on Identification Rate
Fig. 6 shows the impact of the rate of poisoning locations n/N on the identification rate. We can observe that the identification rate of the algorithm P3 increases with the rate of poisoning locations n/N when n/N increases from 1% to 15% in the loc-Gwalla data set and from 1% to 13% in the loc-Brightkite data set. Then, the identification rate of P3 changes slowly when the rate of poisoning locations n/N continually increases in both the loc-Gwalla and loc-Brightkite data sets. The reason is that, in algorithm P3, when a small number of poisoning locations exist in the data sets, poisoning locations are more likely to be cloaked among the users' locations in both of
Fig. 6. Impact of the rate of poisoning locations n/N on the identification rate in both (a) loc-Gwalla and (b) loc-Brightkite with the number of locations N = 300 billion in loc-Gwalla, N = 200 billion in loc-Brightkite, and the rate of poisoning locations n/N = 0.2.
Fig. 7. Impact of the length of the user's trace le on the identification rate in both (a) loc-Gwalla and (b) loc-Brightkite with the number of locations N = 300 billion in loc-Gwalla, N = 200 billion in loc-Brightkite, and the length of each user's trace le = 200.
the two steps: constructing the inferred social graph and mapping the inferred social graph to the social graph from the social networks. As a result, P3 can identify more poisoning locations when the rate of poisoning locations n/N varies within (1%, 15%). Likewise, it is easier for the Baseline1 algorithm to pick out poisoning locations when n/N is increased. When the rate of poisoning locations n/N increases to a certain value, the identification rates of the algorithms P3 and Baseline1 no longer increase, as a larger n/N means more locations and, thus, lower probabilities for the algorithms to find the poisoning locations. Moreover, the Baseline1 algorithm is less affected by n/N, with its identification rate increasing from 0.005 to 0.2, as the sophisticated algorithm P3 is more sensitive to n/N than Baseline1. Finally, it is interesting to observe that the identification rate of Baseline2 decreases with n/N, as more locations decrease the probability of a correct random guess.
In addition, the identification rate under the poisoning attack Poison1 is less than that under Poison2. The reasons are the same as analyzed earlier, i.e., the poisoning locations in Poison1 are more plausible imitations of users' locations than those in Poison2. As a result, it is more difficult to identify such poisoning locations. Furthermore, the identification rate under Poison1 is much more affected by the varying rate of poisoning locations n/N than that under Poison2. To be concrete, in P3, the identification rate under Poison1 increases from 0.566 to 0.8, while the identification rate under Poison2 varies within (0.73, 0.8). The reasons are the same as analyzed earlier, i.e., Poison2 is a straightforward poisoning strategy, and Poison1 is a sophisticated one. As such, Poison1 is more likely to be affected by the rate of poisoning locations n/N.
In addition, in P3, the identification rate in the loc-Gwalla data set is larger than that in the loc-Brightkite data set, with the identification rates varying within (0.56, 0.8) and (0.608, 0.8), respectively. The reasons are the same as analyzed earlier, i.e., loc-Gwalla contains many more relationships among users than the loc-Brightkite data set, with an average degree of 9.7 and a graph density of 4.92E−5. Thus, it is difficult for the three algorithms to identify the poisoning locations in the loc-Brightkite data set with fewer relationships.
D. Impact of Length of User's Trace le on Identification Rate

Fig. 7 shows the impact of the length of the user's trace le on the identification rate. First, we can see that the identification rates of the algorithms P3, Baseline1, and Baseline2 decrease with the length of the user's trace le. Specifically, in the loc-Gwalla data set, the identification rate of P3 decreases from 0.809 to 0.6568, and in the loc-Brightkite data set, it decreases by 0.162. In Baseline1, the identification rate decreases to 0.01 in the loc-Gwalla data set and 0.0067 in loc-Brightkite. Similarly, in Baseline2, the identification rate varies within (0.094, 0.064) in loc-Gwalla and (0.096, 0.065) in loc-Brightkite, respectively. The reason is that poisoning locations generated in the poisoning attack Poison1 are more plausible imitations of users' locations when the length of the user's trace
Fig. 8. Impact of (a) the number of landmarks k, (b) the walk length of the random trace wl, and (c) the threshold θ_o on the identification rate in both loc-Gwalla and loc-Brightkite.
le is enlarged. As such, it is difficult for P3 to identify these plausible locations (i.e., poisoning locations). Regarding the identification rates of Baseline1 and Baseline2, the decrease is attributed to the increasing number of locations resulting from the longer user traces le.

Furthermore, in P3, the identification rate under the poisoning attack Poison1 is more affected by the length of the user's trace le than that under Poison2. Specifically, in P3, the identification rate under Poison1 varies within (0.6389, 0.8057), while under Poison2, it decreases from 0.809 to 0.7308. As analyzed earlier, the adversaries launching Poison1 can more credibly imitate the locations of users when the length of the user's trace le is enlarged, making the poisoning locations more plausible imitations of users' locations. Finally, the identification rate in the loc-Brightkite data set is larger than that in the loc-Gwalla data set.
E. Impact of Number of Landmarks k on Identification Rate

In the following, as shown in Fig. 8, we investigate the impact of the number of landmarks k, the walk length of the random walk trace wl, and the threshold θ_o on the identification rate of the algorithm P3, since Baseline1 and Baseline2 are not affected by these parameters.
The impact of the number of landmarks k on the identification rate is shown in Fig. 8(a). We observe that, initially, as k increases, the identification rate rapidly increases, since the increasing number of landmarks improves the precision of the map between the inferred social graph and the social graph from social networks. Specifically, in loc-Gwalla, the identification rate under Poison1 increases to 0.79 and, under Poison2, increases to 0.8. In loc-Brightkite, the identification rates under Poison1 and Poison2 vary within (0.74, 0.8) and (0.78, 0.83), respectively. However, when the number of landmarks continually increases, the identification rate decreases, as more errors are injected during the process of selecting landmarks, and thereby, the sets of landmarks in the inferred graph and the social graph from social networks are less identical. We can see that the identification rates in the two kinds of poisoning attacks and two data sets decrease to 0.66, 0.62, 0.74, and 0.79, respectively.
F. Impact of Walk Length wl on Identification Rate

Fig. 8(b) shows the impact of the walk length of the random walk trace wl on the identification rate. The identification rates in loc-Gwalla and loc-Brightkite increase sharply when wl increases from 10 to 50 and from 10 to 70, respectively. Thereafter, the identification rates in both data sets saturate. Moreover, the identification rate in loc-Gwalla is larger than that in loc-Brightkite. This is attributed to the fact that a larger walk length wl improves the precision of the inferred social relationships and, thereby, the identification rate. Moreover, the larger average degree and graph density of loc-Gwalla contribute to the better privacy preservation the algorithms provide on the loc-Gwalla data set.
G. Impact of Threshold θoon Identification Rate
Fig. 8(c) shows the identification rates of P3 in the two data sets under the two poisoning attacks as the threshold θo varies from 0.4 to 0.8. The identification rates increase with the threshold θo, since a larger θo improves the precision of the inferred social relationships and thereby enlarges the identification rates. Moreover, the identification rate under Poison1 is smaller than that under Poison2, as the poisoning locations in Poison1 exhibit features more similar to users' genuine locations.
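The thresholding step can be sketched as follows. The similarity scores are assumed to come from the feature learning model, and the helper names are illustrative rather than the paper's API: an edge is kept only when its score reaches θo, so raising θo trades recall for precision in the inferred social graph.

```python
def infer_edges(similarity, theta_o):
    """Keep a candidate social tie (u, v) only if its similarity score
    reaches the threshold theta_o."""
    return {pair for pair, s in similarity.items() if s >= theta_o}

def precision(inferred, true_edges):
    """Fraction of inferred ties that are genuine social ties."""
    return len(inferred & true_edges) / len(inferred) if inferred else 1.0
```

For example, with hypothetical scores {("a","b"): 0.9, ("a","c"): 0.5, ("b","c"): 0.3} and only ("a","b") being a genuine tie, raising θo from 0.4 to 0.8 drops the spurious edge ("a","c") and lifts precision from 0.5 to 1.0, matching the trend reported for Fig. 8(c).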
V. RELATED WORK
A. Privacy Preservation in MEC
One kind of work focused on launching attacks to disclose location privacy in MEC scenarios. Specifically, the work [7] used chaff services to protect the location privacy of mobile users in MEC. Another work [8] argued that a third party of unreliable identity accessing the MEC platform poses a potential threat. Similarly, Vratonjic et al. [40] showed that the use of shared public IP addresses poses a threat to location privacy. Li et al. [9] studied the problem of online security-aware MEC under jamming attacks and proposed a secure edge computing method, based on the multiarmed bandit (MAB) framework with sleeping arms, that adaptively selects a trusted MEC server to protect the user's location privacy.
Authorized licensed use limited to: Donghua University. Downloaded on September 11,2020 at 01:59:52 UTC from IEEE Xplore. Restrictions apply.
ZHAO et al.: P3: PRIVACY-PRESERVING SCHEME AGAINST POISONING ATTACKS IN MEC 825
Another kind of related work studied privacy preservation in MEC. Concretely, Zhang et al. [41] proposed to address data security and privacy in MEC using cryptography-based technology. Moreover, Du et al. [8] analyzed the privacy issues in MEC from the aspects of data aggregation and data mining and employed anonymization mechanisms and traffic-detection techniques combined with machine learning. Furthermore, Li et al. [42] proposed a data aggregation scheme to protect the privacy of terminal devices in the MEC-assisted IoT scenario.
In addition, other studies focused on privacy protection in MEC application scenarios such as task offloading, MSS, blockchain, and big data support. For example, the work [10] established a secure MEC framework for pilgrimage and used it to switch between the fog computing terminal (FCT) and the cloud. Studies such as [11] addressed the privacy problems caused by the wireless task-offloading characteristics of MEC and proposed a privacy-aware task-offloading algorithm based on a constrained Markov decision process (CMDP) to minimize delay and energy consumption. The work [5] proposed an MSS with MEC as the core to protect the network privacy of mobile users. The follow-up study [6] identified anonymity, security, and privacy issues in blockchain-based in-home therapy management with MEC and proposed a secure treatment framework. Moreover, the work [4] proposed an online learning algorithm and defined a strict attack model to minimize the privacy loss over the long term.
However, the above-mentioned literature either focused on privacy protection or launched attacks to disclose data privacy in MEC, and studied neither the strategy of poisoning attacks nor the defense mechanisms against poisoning attacks in MEC. In contrast, this article is dedicated to designing a privacy-preserving algorithm against poisoning attacks in MEC.
B. Studies About Poisoning Attacks
Another line of loosely related works [12]–[19] focused on poisoning attacks in the adversarial machine learning scenario. Specifically, the work [12] reviewed security threats to machine learning, including poisoning attacks. The work [13] explored poisoning attacks on neural networks. Moreover, the study [14] conducted a systematic study of data poisoning attacks on online learning. Another work [15] proposed a new method for automatically generating imperceptible attacks using the backpropagation characteristics of the trained deep neural network (DNN). The work [16] developed three new attacks that can bypass extensive data sanitization defenses. In addition, the study in [17] discussed the vulnerability of the importance-weighting method of domain adaptation to poisoning attacks in an adversarial machine learning environment. The work [18] focused on the optimal poisoning attack under the multitask learning (MTL) model. Furthermore, the work [19] presented a new backdoor attack that corrupts the training set without label poisoning, proving that such attacks are possible.
However, the above-mentioned work mainly focused on designing poisoning attack algorithms in the machine learning scenario without considering the corresponding defense mechanisms. In contrast, this article is dedicated to a privacy-preserving scheme against poisoning attacks in MEC. Furthermore, MEC exhibits different characteristics from machine learning settings, e.g., task offloading and time-delay sensitivity, and thus, defending against poisoning attacks in MEC faces significant new challenges.
Another kind of work [20]–[23] focused on poisoning attacks in the IoT. Concretely, the work in [20] studied how to effectively carry out two types of data poisoning attacks, namely, the exploitable attack and the target attack, and proposed an optimal attack framework. Another work [21] proposed a method for identifying harmful data using contextual information about the origin and transformation of data points in the training set. Moreover, the work [22] formulated the injection of fake user ratings into graph-based recommender systems as an optimization problem. The latest work [23] designed an intelligent attack mechanism that achieves the maximum attack effectiveness while disguising the attack behavior.
Unfortunately, these works only launched poisoning attacks in the IoT without investigating the defense mechanisms against them. In this article, we concentrate on a privacy-preserving algorithm against poisoning attacks in MEC rather than the IoT.
The third kind of related work focused on poisoning attacks on threat intelligence systems [24], malware detection systems [25], unsupervised node embedding methods [26], and naive Bayes spam filters [27], respectively. Nevertheless, these works considered scenarios different from the one considered in this article. More importantly, they did not study defense mechanisms, whereas this article focuses on a privacy-preserving algorithm against poisoning attacks in MEC.
In addition, several works [28]–[32] focused on defenses against poisoning attacks. Specifically, the studies [28] and [29] mitigated label-flipping poisoning attacks through label sanitization and through anomaly-detection-based identification of adversarial training examples, respectively. Against poisoning attacks, the work [30] studied functions that transform the data set from the source domain to the target domain with cluster separability under adversarial settings. Moreover, the studies [31] and [32] considered feedback-based content-poisoning mitigation in named data networking and the prevention of ARP poisoning in the IoT, respectively.
However, these works studied defenses in the scenarios of machine learning, named data networking, and the IoT, which differ from the scenario considered in this article. Furthermore, MEC exhibits different characteristics, e.g., task offloading and time-delay sensitivity, and thus, the above-mentioned work is not applicable to MEC.
VI. CONCLUSION
In this article, we propose P3, the first attempt toward a privacy-preserving scheme against poisoning attacks in MEC. The main idea is
to construct the inferred social graph utilizing feature learning
and search the optimal map between the inferred social graph
and the social graph from social networks to identify the
poisoning locations. Extensive experiments on two real-world
data sets, two baseline works, and two kinds of poisoning
attacks have demonstrated the effectiveness of P3.
REFERENCES
[1] F.-Y. Wang, Y. Tang, X. Liu, and Y. Yuan, “Social education: Opportuni-
ties and challenges in cyber-physical-social space,” IEEE Trans. Comput.
Social Syst., vol. 6, no. 2, pp. 191–196, Apr. 2019.
[2] R. Basak, S. Sural, N. Ganguly, and S. K. Ghosh, “Online
public shaming on Twitter: Detection, analysis, and mitigation,”
IEEE Trans. Comput. Soc. Syst., vol. 6, no. 2, pp. 208–220,
Apr. 2019.
[3] S. H. Sajadi, M. Fazli, and J. Habibi, "The affective evolution of social norms in social networks," IEEE Trans. Comput. Social Syst., vol. 5, no. 3, pp. 727–735, Sep. 2018.
[4] P. Zhou, K. Wang, J. Xu, and D. Wu, “Differentially-private and
trustworthy online social multimedia big data retrieval in edge com-
puting,” IEEE Trans. Multimedia, vol. 21, no. 3, pp. 539–554,
Mar. 2019.
[5] P. Zhang, M. Durresi, and A. Durresi, “Mobile privacy protection
enhanced with multi-access edge computing,” in Proc. IEEE 32nd Int.
Conf. Adv. Inf. Netw. Appl. (AINA), May 2018.
[6] M. A. Rahman et al., "Blockchain-based mobile edge computing framework for secure therapy applications," IEEE Access, vol. 6, pp. 72469–72478, 2018.
[7] T. He, E. N. Ciftcioglu, S. Wang, and K. S. Chan, “Location privacy
in mobile edge clouds,” in Proc. IEEE 37th Int. Conf. Distrib. Comput.
Syst. (ICDCS), Jun. 2017.
[8] M. Du, K. Wang, Y. Chen, X. Wang, and Y. Sun, “Big data privacy
preserving in multi-access edge computing for heterogeneous Inter-
net of Things,” IEEE Commun. Mag., vol. 56, no. 8, pp. 62–67,
Aug. 2018.
[9] B. Li, T. Chen, X. Wang, and G. B. Giannakis, “Secure edge computing
in IoT via online learning,” in Proc. 52nd Asilomar Conf. Signals, Syst.,
Comput., Oct. 2018.
[10] A. Rahman, E. Hassanain, and M. S. Hossain, "Towards a secure mobile edge computing framework for Hajj," IEEE Access, vol. 5, pp. 11768–11781, 2017.
[11] X. He, J. Liu, R. Jin, and H. Dai, “Privacy-aware offloading in mobile-
edge computing,” in Proc. GLOBECOM IEEE Global Commun. Conf.,
Dec. 2017.
[12] Q. Liu, P. Li, W. Zhao, W. Cai, S. Yu, and V. C. M. Leung, “A survey
on security threats and defensive techniques of machine learning: A data
driven view,” IEEE Access, vol. 6, pp. 12103–12117, 2018.
[13] A. Shafahi et al., “Poison frogs! Targeted clean-label poisoning
attacks on neural networks,” in Proc. Adv. Neural Inf. Process. Syst.,
2018.
[14] Y. Wang and K. Chaudhuri, “Data poisoning attacks against
online learning,” Aug. 2018, arXiv:1808.08994. [Online]. Available:
https://arxiv.org/abs/1808.08994
[15] F. Khalid, M. A. Hanif, S. Rehman, and M. Shafique, “TrISec:
Training data-unaware imperceptible security attacks on deep
neural networks,” Nov. 2018, arXiv:1811.01031. [Online]. Available:
https://arxiv.org/abs/1811.01031
[16] P. W. Koh, J. Steinhardt, and P. Liang, “Stronger data poisoning
attacks break data sanitization defenses,” Nov. 2018, arXiv:1811.00741.
[Online]. Available: https://arxiv.org/abs/1811.00741
[17] M. Umer, C. Frederickson, and R. Polikar, “Adversarial poisoning of
importance weighting in domain adaptation,” in Proc. IEEE Symp. Ser.
Comput. Intell. (SSCI), Nov. 2018.
[18] M. Zhao, B. An, Y. Yu, S. Liu, and S. J. Pan, “Data poisoning attacks on
multi-task relationship learning,” in Proc. 32nd AAAI Conf. Artif. Intell.,
2018, pp. 2628–2635.
[19] M. Barni, K. Kallas, and B. Tondi, “A new backdoor attack in
CNNs by training set corruption without label poisoning,” Feb. 2019,
arXiv:1902.11237. [Online]. Available: https://arxiv.org/abs/1902.11237
[20] C. Miao, Q. Li, H. Xiao, W. Jiang, M. Huai, and L. Su, “Towards data
poisoning attacks in crowd sensing systems,” in Proc. 18th ACM Int.
Symp. Mobile Ad Hoc Netw. Comput. (Mobihoc), 2018.
[21] N. Baracaldo, B. Chen, H. Ludwig, A. Safavi, and R. Zhang, “Detecting
poisoning attacks on machine learning in IoT environments,” in Proc.
IEEE Int. Congr. Internet Things (ICIOT), Jul. 2018.
[22] M. Fang, G. Yang, N. Z. Gong, and J. Liu, “Poisoning attacks to graph-
based recommender systems,” in Proc. 34th Annu. Comput. Secur. Appl.
Conf. (ACSAC), 2018, pp. 1–12.
[23] C. Miao, Q. Li, L. Su, M. Huai, W. Jiang, and J. Gao, “Attack under
disguise: An intelligent data poisoning attack mechanism in crowdsourc-
ing,” in Proc. World Wide Web Conf. World Wide Web (WWW), 2018,
pp. 13–22.
[24] N. Khurana, S. Mittal, and A. Joshi, “Preventing poisoning attacks
on AI based threat intelligence systems,” Jul. 2018, arXiv:1807.07418.
[Online]. Available: https://arxiv.org/abs/1807.07418
[25] S. Chen et al., “Automated poisoning attacks and defenses in malware
detection systems: An adversarial machine learning approach,” Comput.
Secur., vol. 73, pp. 326–344, Mar. 2018.
[26] M. Sun et al., “Data poisoning attack against unsupervised node
embedding methods,” Oct. 2018, arXiv:1810.12881. [Online]. Available:
https://arxiv.org/abs/1810.12881
[27] D. J. Miller, X. Hu, Z. Xiang, and G. Kesidis, “A mixture
model based defense for data poisoning attacks against naive Bayes
spam filters,” Oct. 2018, arXiv:1811.00121. [Online]. Available:
https://arxiv.org/abs/1811.00121
[28] A. Paudice, L. Muñoz-González, and E. C. Lupu, “Label sanitization
against label flipping poisoning attacks,” in Proc. Joint Eur. Conf. Mach.
Learn. Knowl. Discovery Databases, 2018, pp. 5–15.
[29] A. Paudice, L. Muñoz-González, A. Gyorgy, and E. C. Lupu, “Detec-
tion of adversarial training examples in poisoning attacks through
anomaly detection,” Feb. 2018, arXiv:1802.03041. [Online]. Available:
https://arxiv.org/abs/1802.03041
[30] C. V. S. Praven and C. S. Kumar, “Domain adversarial representation
learning for data independent defenses against poisoning attacks,” in
Proc. ICLR, 2018, pp. 1–3.
[31] W. Cui, Y. Li, Y. Xin, and C. Liu, “Feedback-based content poisoning
mitigation in named data networking,” in Proc. IEEE Symp. Comput.
Commun. (ISCC), Jun. 2018, pp. 759–765.
[32] W. Gao et al., “ARP poisoning prevention in Internet of Things,” in Proc.
9th Int. Conf. Inf. Technol. Med. Edu. (ITME), Oct. 2018, pp. 733–736.
[33] P. Zhao, J. Li, F. Zeng, F. Xiao, C. Wang, and H. Jiang, “ILLIA: Enabling
k-anonymity-based privacy preserving against location injection attacks
in continuous LBS queries,” IEEE Internet Things J., vol. 5, no. 2,
pp. 1033–1042, Apr. 2018.
[34] P. Zhao et al., “P3-LOC: A privacy-preserving paradigm-driven frame-
work for indoor localization,” IEEE/ACM Trans. Netw., vol. 26, no. 6,
pp. 2856–2869, Dec. 2018.
[35] T. Opsahl, F. Agneessens, and J. Skvoretz, “Node centrality in weighted
networks: Generalizing degree and shortest paths,” Social Netw., vol. 32,
no. 3, pp. 245–251, Jul. 2010.
[36] R. E. Bellman, “Book review: Combinatorial optimization: Networks
and matroids,” Bull. Amer. Math. Soc., vol. 84, no. 3, pp. 461–464,
May 1978.
[37] Gwalla and Brightkite Data. Accessed: Dec. 21, 2019. [Online]. Available: http://snap.stanford.edu/data/index.html
[38] W. Gao, G. Cao, A. Iyengar, and M. Srivatsa, “Supporting cooperative
caching in disruption tolerant networks,” in Proc. 31st Int. Conf. Distrib.
Comput. Syst., Jun. 2011, pp. 1–12.
[39] V. Bindschaedler and R. Shokri, “Synthesizing plausible privacy-
preserving location traces,” in Proc. IEEE Symp. Secur. Privacy (SP),
May 2016, pp. 1–18.
[40] N. Vratonjic, K. Huguenin, V. Bindschaedler, and J.-P. Hubaux,
“A location-privacy threat stemming from the use of shared pub-
lic IP addresses,” IEEE Trans. Mobile Comput., vol. 13, no. 11,
pp. 2445–2457, Nov. 2014.
[41] J. Zhang, B. Chen, Y. Zhao, X. Cheng, and F. Hu, “Data security
and privacy-preserving in edge computing paradigm: Survey and open
issues,” IEEE Access, vol. 6, pp. 18209–18237, 2018.
[42] X. Li, S. Liu, F. Wu, S. Kumari, and J. J. P. C. Rodrigues, “Privacy
preserving data aggregation scheme for mobile edge computing assisted
IoT applications,” IEEE Internet Things J., vol. 6, no. 3, pp. 4755–4763,
Jun. 2019, doi: 10.1109/jiot.2018.2874473.