We Know Where You Are: Home Location Identification in Location-Based Social Networks

August 2016

DOI:10.1109/ICCCN.2016.7568598

Conference: 2016 25th International Conference on Computer Communication and Networks (ICCCN)

Authors:

Yulong Gu

Tsinghua University

Yuan Yao

Tsinghua University

Weidong Liu

Tsinghua University

We Know Where You Are: Home Location

Identiﬁcation in Location-Based Social Networks

Yulong Gu, Yuan Yao, Weidong Liu, Jiaxing Song

Department of Computer Science and Technology

Tsinghua University

Beijing, 100084, China

guyulongcs@gmail.com, yaoyuan13@mails.tsinghua.edu.cn, {liuwd, jxsong}@tsinghua.edu.cn

Abstract—The rapid spread of smartphones has led to the increasing popularity of Location-Based Social Networks (LBSNs) such as Foursquare, Gowalla and Facebook Places, where users can publish information about their current location. In LBSNs, identifying the home locations of users is very important for applications such as effective location-based advertisement and recommendation. However, this problem is challenging because the location information in LBSNs is sparse and noisy: only a small percentage of users share their home location due to privacy concerns; users may check in at diverse places far from their home and make friends far away; and many users do not have any check-in information at all. In this paper, we propose a trust-based influence model, named TSU, to solve the problem. Specifically, TSU is a Trust-based unified probabilistic model that models edges in LBSNs based on signals from Social relationship data (social friendship, social trust) and User-centric data (check-in data). We propose a Home Location Identification method based on the TSU model and evaluate it on a large real-world LBSN dataset. Extensive experiments demonstrate that our method significantly outperforms state-of-the-art methods.

Keywords—Home Location Identification, Location-Based Social Networks, Trust, Influence Model, Social Networks

I. INTRODUCTION

In recent years, we have seen a rapid proliferation of social networks such as Facebook¹, Twitter² and Google+³. As the largest online social network in the world, Facebook had over 1.18 billion monthly active users as of August 2015⁴. The rapid growth of the mobile internet and of location-acquisition technologies has led to the increasing popularity of Location-Based Social Networks (LBSNs) such as Foursquare⁵, Gowalla⁶ and Brightkite⁷, which embed location into social networks. Users in LBSNs can conveniently log their activity histories with spatio-temporal data by checking in at various venues (e.g., scenic spots, restaurants, airports) at any time using their smartphones. The inherent nature of LBSNs encourages users to publish their current location (i.e., check-ins). The same is true for the most popular social network websites, such as Facebook, Twitter and Google+.

¹ https://www.facebook.com

² https://twitter.com

³ https://plus.google.com/

⁴ https://en.wikipedia.org/wiki/Facebook

⁵ https://foursquare.com

⁶ https://en.wikipedia.org/wiki/Gowalla

⁷ https://en.wikipedia.org/wiki/Brightkite

User profiling, which aims to infer a user's attributes such as age, gender, interests, home location and education, has been a hot topic in academia. Much research has been done on user profiling [26, 39] to serve personalized search [8, 32], targeted advertisement [5, 31], news recommendation [4, 22] and so on. With the rapid growth of LBSNs, Home Location Identification has become one of the most important user profiling problems, because the home locations of users are extremely important for applications that provide effective location-based services. For example, profiling users' home locations enables search engines to provide personalized search results in the users' mother tongue, news sites to recommend localized news, and advertisers to recommend local ads. The home location of a user is defined as the relatively “permanent” place where the user spends most of his time [20]. It captures the major and static geographic scope of the user and therefore provides valuable information for personalized services.

The home location problem is quite challenging because the signals that may help to identify users' home locations are sparse and noisy. Firstly, only a small percentage of users provide their home locations in social networks, due to privacy concerns. On Twitter, only a few people (16%) register city-level locations in their profiles, and most users leave general, nonsensical or even blank information [20]. Secondly, users may check in at various places far from their home and make friends far away. Thirdly, many users do not have any check-in data. As of September 2013, only 30% of users provide their location information to at least one social media account, and 12% of adult smartphone owners have used geo-social services to check in at some location [2]. This problem has recently attracted great interest from researchers [9, 10, 19–21, 30]. Existing approaches can mainly be divided into two categories: the Content Based Approach [9, 10, 19] and the Check-in Based Approach [20, 21, 30]. The content based approach infers the home locations of users with models built on location information extracted from texts, such as tweets, in social networks. The check-in based approach infers the home locations of users from their check-in data.

In this paper, we propose a trust-based influence model, named TSU, to model edges in LBSNs. We represent an LBSN as a directed heterogeneous graph in which the nodes can be users or venues. Edges in the graph can be friendships between users or check-in edges from users to venues. TSU is a Trust-based unified probabilistic model that models edges in the graph based on signals from Social relationship data (social friendship, social trust) and User-centric data (check-in data) in LBSNs. In TSU, we model each node with a location and an influence scope. We assume that each edge $t \to h$ from a tail node $t$ to a head node $h$ is generated according to the nodes' locations, the influence scope of the head node $h$, and the social trust value of the head node $h$ for the tail node $t$. TSU is motivated by three observations: people tend to make friends with people who live nearby and to check in at venues near them; people tend to follow celebrities and visit popular places; and people tend to make friends with those who share more common friends with them. In this paper, we propose the idea of using “social trust”, which measures closeness in social structure, to model edges in LBSNs. Specifically, we measure social trust between users by calculating the Jaccard Similarity [18] of the users' friend sets. The social trust value is higher if two users have more common friends. To the best of our knowledge, we are the first to propose the idea of using “social trust” for Home Location Identification.

For the Home Location Identification problem, we propose a two-stage method based on the TSU model. In the first stage, for users who have check-in data, we develop a single-pass clustering algorithm to cluster their check-ins and select the center of the largest cluster as their home location. In the second stage, we use a global iteration method to estimate the home locations of users so that the joint conditional probability of generating all the edges is maximized.

We conduct extensive experiments to evaluate our Home Location Identification method and compare it with state-of-the-art methods [12, 20, 21, 30, 34] on a large-scale Foursquare dataset containing about 836K users and 649K venues. The experimental results show that our method predicts the home locations of users who have check-in data with an accuracy of 92.1%, even though the average number of check-ins per user is only about 2.7. Our method predicts the home locations of all users whose home location is unknown with an accuracy of 63.1%, outperforming state-of-the-art methods by about 6.9%, when only 16.7% of users have check-in data and 20% of users do not have a home location. In short, our method significantly outperforms state-of-the-art methods and achieves the best performance.

Our main contributions are:

• We are the first to propose a trust-based unified probabilistic model, called TSU, for Home Location Identification.

• We are the first to propose the idea of using social trust to measure closeness in social structure for the Home Location Identification problem.

• We propose a two-stage Home Location Identification method based on the TSU model, and extensive experiments demonstrate that our method outperforms state-of-the-art methods by about 6.9%.

The rest of the paper is structured as follows. Section II introduces related work. Section III describes the dataset. Section IV formulates the Home Location Identification problem. Section V presents the trust-based influence model TSU. Section VI introduces our method for the Home Location Identification problem. Section VII presents the experimental results. Section VIII concludes the paper and discusses future work.

II. RELATED WORK

In this section, we divide related work into three parts: User

Proﬁling, Human Mobility and Home Location Identiﬁcation.

A. User Proﬁling

User profiling aims to infer a user's attributes, such as age, gender, interests, home location and education. Mislove et al. [26] propose a method for inferring users' attributes, such as the colleges, matriculation years and majors of students, by detecting communities in social networks, based on the phenomenon that users with common attributes are more likely to be friends and often form dense communities. Zhong et al. [39] extract rich semantics from users' check-ins, employ tensor factorization to draw out low-dimensional representations of users' intrinsic check-in preferences, and use the extracted features in a classifier to infer various demographic attributes. With the increasing popularity of location-based services (LBSs), there have been growing concerns about location privacy, and much research has recently been done to protect the privacy of users [15, 16, 27–29, 36, 38].

User profiling is used for various applications such as personalized search, targeted advertisement and news recommendation. The works in [8, 32] focus on profiling users' interests to serve personalized search. Qiu and Cho [32] show that users' preferences can be learned accurately even from little click-history data, and that they can significantly improve the performance of personalized search.

B. Human Mobility

Much research has been done on the social and temporal characteristics of how people use location sharing services. The study of human mobility patterns is significant for social science, the design of future location-based services, traffic forecasting, urban planning and so on. Cheng et al. [11] investigate 22 million check-ins and find that human mobility follows certain spatial and periodic patterns. Cho et al. [12] find that humans experience a combination of periodic movement that is geographically limited and seemingly random jumps influenced by the social network structure. Allamanis et al. [6] demonstrate that geographic distance plays an important role in the creation of new social connections, and that users form new ties with friends of existing friends because connections arise among users visiting the same place. Wei et al. [35] propose a trace-driven model for generating synthetic LBSN datasets that capture the properties of the original datasets. Foroozani et al. [13] propose a model that captures human mobility properties by introducing hotspot zones, using a graph of hotspot zones as the input area map, dividing the day into periods, and modeling various speeds at different times and places.

C. Home Location Identiﬁcation

Home Location Identification focuses on identifying the home locations of users in social networks. There are two types of approaches to this problem: the Content Based Approach and the Check-in Based Approach.

1) Content Based Approach: The content based approach infers the home locations of users with models built on location information extracted from texts, such as tweets, in social networks. Cheng et al. [10] propose a probabilistic framework for estimating a Twitter user's city-level location based purely on the content of the user's tweets. They use a classification component for automatically identifying words in tweets with a strong local geo-scope and a lattice-based neighborhood smoothing model for refining a user's location estimate. Chandra et al. [9] employ a probabilistic framework to estimate the city-level location of a Twitter user based on the content of the tweets in their dialogues. The works in [23, 24] use an ensemble of statistical and heuristic classifiers to predict the home locations of Twitter users based on the content of tweets and the tweeting behavior of users. Li et al. [19] propose a global location identification method that combines multiple microblogs of a user and utilizes them to identify the user's location. The method organizes points of interest into a tree structure, extracts candidate locations from each microblog of a user, and then aggregates these candidate locations to identify the top-k locations of the user.

2) Check-in Based Approach: The check-in based approach infers the home locations of users from their check-in data. Cho et al. [12] infer the home location by discretizing the world into 25 km by 25 km cells and defining the home location as the average position of the check-ins in the cell with the most check-ins. Li et al. [20] propose a unified and discriminative influence model that models the influence scope of users and venues. Based on this model, they develop a location prediction method that identifies the home locations of Twitter users using signals observed from friends and from venues identified in tweets. Pontes et al. [30] use a majority voting scheme that takes the most popular location of a user as her home location. Liu et al. [21] estimate home locations by using a hierarchical clustering method to cluster check-ins made at night.

III. DATASET DESCRIPTION

In this section, we briefly introduce the main characteristics of Foursquare as well as the crawled dataset used in our experiments.

A. Foursquare: Background

Foursquare is currently one of the largest and most popular LBSNs. As a local search and recommendation service, Foursquare provides search results or recommended places to go based on targeted locations. The service was created in late 2008 and launched in 2009. Users of Foursquare can share their locations with friends and followers through check-ins. Check-ins are performed via GPS-enabled mobile devices when a user is close to specific locations known as venues, which represent real locations of a great variety of categories, such as airports or restaurants. As of December 2013, Foursquare had 45 million registered users [1]. Foursquare gives incentives to users who visit (check in at) specific places (venues), using rewards such as mayorships for frequent visitors. Users can post tips at specific venues, commenting on their experiences when visiting the corresponding physical places. In addition, Foursquare enables users to rate venues by answering questions that help Foursquare understand how people feel about a place, including whether or not a user likes it. As of February 2016, more than 50 million people use Foursquare and Swarm (a companion app to Foursquare) each month across desktop, mobile web and mobile apps, and people have checked in more than 8 billion times worldwide⁸.

B. Foursquare Dataset

In this paper, we use a widely used and publicly available Foursquare dataset extracted from the Foursquare application through its public API [17, 33]. This dataset contains 2,153,471 users, 1,143,092 venues, 1,021,970 check-ins and 27,098,490 social connections. Each user has a unique id and a geospatial location (latitude and longitude) that represents the user's hometown location. Each venue has a unique id and a geospatial location (latitude and longitude). The social graph data contains the edges (connections) that exist between users. Each social connection consists of two users (friends) represented by two unique ids (first user id and second user id).

We focus our study of Home Location Identification on Foursquare users within the continental United States. For this purpose, we keep only valid users who appear in the social graph and are located in the continental United States. The statistics of the dataset after applying this filter are shown in Table I.

TABLE I
SUMMARY STATISTICS OF FOURSQUARE DATASET

Type                  Number
Users                 835,896
Venues                648,825
Check-ins             370,477
Social Graph Edges    12,924,609

C. Mapping Location to City

We need a method to map a location to its corresponding city, so that we know which city a user lives in given the user's home location. In this paper, we map a location to a city as follows. The candidate cities are the 297 cities in the United States with a population of at least 100,000 on July 1, 2014, as estimated by the United States Census Bureau [3]. We define a location's mapped city as the candidate city nearest to the location.
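The nearest-city rule can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the paper does not state which distance measure is used, so the great-circle (haversine) distance, the helper names `haversine_km` and `map_to_city`, and the three example cities with approximate coordinates are all assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed constant for this sketch)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def map_to_city(lat, lon, cities):
    """Return the name of the candidate city nearest to (lat, lon).

    `cities` maps a city name to its (lat, lon); the caller supplies the list.
    """
    return min(cities, key=lambda name: haversine_km(lat, lon, *cities[name]))


# Illustrative, approximate coordinates; NOT the paper's 297-city list.
CITIES = {
    "New York": (40.71, -74.01),
    "Chicago": (41.88, -87.63),
    "Los Angeles": (34.05, -118.24),
}
print(map_to_city(40.0, -75.0, CITIES))
```

With the full candidate list, a spatial index would avoid scanning every city per query, but a linear scan over 297 cities is already cheap.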

⁸ https://foursquare.com/about

Fig. 1. An example of LBSN

IV. HOME LOCATION IDENTIFICATION PROBLEM FORMULATION

In this section, we first represent a Location-based Social Network as a directed heterogeneous graph and then formalize the Home Location Identification problem.

A. Location-based Social Networks Formulation

We represent a Location-based Social Network as a directed heterogeneous graph $G = (N, E)$. An example of an LBSN is shown in Figure 1. In the graph, nodes represent users or venues. There are two types of edges in the graph: (1) following relationship edges from users to other users; (2) check-in edges from users to venues. LBSNs in which friendships are undirected can also be represented by the directed heterogeneous graph, by creating two following relationship edges for each undirected edge. We denote concepts in LBSNs as follows:

• $N = \{n_i\}, i = 1 \ldots N$: the set of $N$ nodes in $G$

• $E = \{e\langle n_i, n_j\rangle\}$: the set of $E$ edges in $G$; $n_i$ is the tail node and $n_j$ is the head node of the edge

• $U = \{u_i\}, i = 1 \ldots U$: the set of $U$ users in $N$

• $V = \{v_i\}, i = 1 \ldots V$: the set of $V$ venues in $N$

• $F = \{f\langle u_i, u_j\rangle\}$: following relationship edges from user node $u_i$ to user node $u_j$

• $C = \{C_{ij}\} = \{c\langle u_i, v_j\rangle\}$: check-in edges from user node $u_i$ to venue node $v_j$

• $U_H$: the set of users whose home locations are known

• $U_{-H}$: the set of users whose home locations are not known

• $U_C$: the set of users who have check-in data

• $U_{-C}$: the set of users who do not have check-in data

• $L$: a geographical location denoted by $(Lat, Lon)$, where $Lat$ is the latitude and $Lon$ is the longitude

• $L_{u_i}$: home location of user $u_i$

• $L_{v_j}$: location of venue $v_j$

We have that $N = U \cup V$, $E = F \cup C \cup R$, $U = U_H \cup U_{-H}$ and $U = U_C \cup U_{-C}$.

Further, we denote the neighbors of a node along each edge type as follows:

• $I_e(n)$: incoming nodes of node $n$ for edge type $e$

• $O_e(n)$: outgoing nodes of node $n$ for edge type $e$

• $I_f(u_i)$: users who follow user $u_i$

• $O_f(u_i)$: users who are followed by user $u_i$

• $O_c(u_i)$: venues checked in at by user $u_i$

• $I_c(v_j)$: users who check in at venue $v_j$
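One possible in-memory representation of this graph is a set of adjacency maps, one per neighbor set. The sketch below is illustrative only: the class name `LBSNGraph` and its method names are hypothetical, and the venue set of a user is stored as the outgoing check-in neighbors $O_c(u_i)$, the name used in Equations 6 and 7.

```python
from collections import defaultdict


class LBSNGraph:
    """Directed heterogeneous graph with follow edges and check-in edges."""

    def __init__(self):
        self.in_follow = defaultdict(set)    # I_f(u): users who follow u
        self.out_follow = defaultdict(set)   # O_f(u): users whom u follows
        self.out_checkin = defaultdict(set)  # O_c(u): venues u checked in at
        self.in_checkin = defaultdict(set)   # I_c(v): users who checked in at v

    def add_follow(self, tail, head):
        """Add the edge f<tail, head>: `tail` follows `head`."""
        self.out_follow[tail].add(head)
        self.in_follow[head].add(tail)

    def add_friendship(self, a, b):
        """Store an undirected friendship as two follow edges."""
        self.add_follow(a, b)
        self.add_follow(b, a)

    def add_checkin(self, user, venue):
        """Add the edge c<user, venue>."""
        self.out_checkin[user].add(venue)
        self.in_checkin[venue].add(user)


g = LBSNGraph()
g.add_friendship("u1", "u2")  # undirected friendship
g.add_follow("u3", "u1")      # directed follow
g.add_checkin("u1", "v1")
print(sorted(g.in_follow["u1"]))
```

Keeping both directions of every edge costs memory but makes each neighbor lookup a single dictionary access.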

B. Home Location Identiﬁcation Problem Formulation

Home Location Identification Problem: For a Location-based Social Network $G = (N, E)$, for each user $u_i \in U_{-H}$, estimate a home location $\tilde{L}_{u_i}$ so as to make $\tilde{L}_{u_i}$ close to $u_i$'s true home location $L_{u_i}$.

V. TSU: TRUST-BASED INFLUENCE MODEL

In this section, we introduce a trust-based influence model, named TSU, to model edges in Location-based Social Networks.

A. Motivation of TSU model

Existing research has exploited social friendship and check-in data for Home Location Identification [12, 20, 21, 30, 34]. Our model TSU exploits a new signal, “social trust”. Specifically, TSU is a Trust-based influence model built on Social relationship data (social friendship, social trust) and User-centric data (check-in data).

1) Social friendship: It has been observed in social networks such as Facebook and Twitter that the probability of friendship decreases as the distance between nodes increases [7, 20]. Li et al. [20] find that different nodes have different influence in social networks, which means that different head nodes have different probabilities of attracting tail nodes at the same distance. For example, a star is more likely than a regular user to attract users who live far away.

2) Social trust: Existing methods [20] treat the friend relation as a binary relationship. However, closer friends in social networks should have more influence on each other's home location estimates. We propose the concept of “social trust” to measure closeness in social structure and are the first to apply it to the Home Location Identification problem. If two users have more common friends, the social trust value between them is higher and they tend to live nearer to each other.

3) Check-in data: We can predict a user's home location from the user's check-in data because users tend to visit nearby venues [12, 20].

B. Social Trust

In this paper, we propose “social trust” to measure closeness in social structure and apply it in the TSU model. We denote the social trust value of node $n_i$ for node $n_j$ as $T_{ji}$ and measure social trust between nodes using the Jaccard Similarity [18] of the users' friend sets. The Jaccard Similarity is a statistic for comparing the similarity and diversity of finite sample sets, and is defined as the size of the intersection divided by the size of the union of the sets. Specifically, for user nodes $u_i$ and $u_j$, their common friends are denoted as $CF(u_i, u_j)$, with $CF(u_i, u_j) = F(u_i) \cap F(u_j)$, where $F(u_i)$ is the friend set of user $u_i$. We define $T_{ji}$ as in Equation 1:

$$T_{ji} = Jaccard(u_i, u_j) = \frac{|F(u_i) \cap F(u_j)|}{|F(u_i) \cup F(u_j)|} \tag{1}$$

It follows that $T$ is symmetric, i.e., $T_{ji} = T_{ij}$.
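Equation 1 translates directly into set operations. The snippet below is a small sketch on made-up friend sets; the function name `social_trust` and the convention of returning 0 when both friend sets are empty are assumptions, since the paper does not discuss that corner case.

```python
def social_trust(friends_i, friends_j):
    """Jaccard similarity of two friend sets, as in Equation 1."""
    union = friends_i | friends_j
    if not union:
        return 0.0  # assumption: users without any friends get zero trust
    return len(friends_i & friends_j) / len(union)


# Hypothetical friend sets F(u) for two users.
F = {
    "alice": {"bob", "carol", "dave"},
    "bob": {"alice", "carol", "erin"},
}
t = social_trust(F["alice"], F["bob"])  # one common friend out of five distinct friends
print(t)
```

Because intersection and union are both symmetric, the function returns the same value for either argument order, matching $T_{ji} = T_{ij}$.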

C. Formulation of TSU Model

We use a trust-based influence model called TSU to model edges in LBSNs. In this model, we denote the influence of a node $n_i$ as $I_{n_i}$, which is a probability distribution over the geographic plane. For a node $n_i$, we define $n_i$'s influence on another node $n_j$ at a location $L$ as the probability that $n_j$ builds an edge $e\langle n_j, n_i\rangle$ to it. An influential node has a broader influence scope, and more influence at the same distance, than an ordinary node.

1) Influence Model of Nodes on the Geographic Plane: Following previous research [20], we choose a Gaussian distribution to capture a node's influence because of its expressiveness and simplicity. Specifically, we model a node $n_i$'s influence $I_{n_i}$ as a bivariate Gaussian distribution $\mathcal{N}(L_{n_i}, \Sigma_{n_i})$, centered at $n_i$'s location $L_{n_i} = (Lat_{n_i}, Lon_{n_i})$ and with the covariance matrix $\Sigma_{n_i}$ as its influence scope. We assume that the influence scope of a node is the same in the latitude and longitude dimensions, so $\Sigma_{n_i} = \begin{pmatrix} \sigma_{n_i}^2 & 0 \\ 0 & \sigma_{n_i}^2 \end{pmatrix}$. The influence probability of node $n_i$ at a location $L$ is given by Equation 2:

$$P(L \mid I_{n_i}) = \frac{1}{2\pi\sigma_{n_i}^2} \exp\!\left(\frac{(Lat_{n_i} - Lat_L)^2 + (Lon_{n_i} - Lon_L)^2}{-2\sigma_{n_i}^2}\right) \tag{2}$$

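2) Social Trust-based User Influence Model: The probability that a user $u_i$ influences a user $u_j$ to build a following edge to $u_i$ is given by Equation 3:

$$P(f\langle u_j, u_i\rangle \mid I_{u_i}, L_{u_j}) = \frac{T_{ji}}{2\pi\sigma_{u_i}^2} \exp\!\left(T_{ji}\,\frac{(Lat_{u_i} - Lat_{u_j})^2 + (Lon_{u_i} - Lon_{u_j})^2}{-2\sigma_{u_i}^2}\right) \tag{3}$$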

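3) Venue Influence Model: The probability that a user $u_i$ checks in at venue $v_j$ is given by Equation 4:

$$P(c\langle u_i, v_j\rangle \mid I_{v_j}, L_{u_i}) = \frac{1}{2\pi\sigma_{v_j}^2} \exp\!\left(\frac{(Lat_{v_j} - Lat_{u_i})^2 + (Lon_{v_j} - Lon_{u_i})^2}{-2\sigma_{v_j}^2}\right) \tag{4}$$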
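The three edge probabilities in Equations 2 to 4 can be evaluated with a few lines of code. This is a hedged sketch: it reads Equation 3 as the trust value multiplying both the density and the exponent, and the function names `node_influence`, `follow_prob` and `checkin_prob` are hypothetical. Distances are squared differences of raw latitude and longitude, as in the equations.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def node_influence(loc_node, sigma2, loc):
    """Equation 2: isotropic bivariate Gaussian density of a node at `loc`."""
    return math.exp(-sq_dist(loc_node, loc) / (2 * sigma2)) / (2 * math.pi * sigma2)


def follow_prob(loc_head, sigma2_head, loc_tail, trust):
    """Equation 3: trust-scaled probability that the tail user follows the head user."""
    exponent = -trust * sq_dist(loc_head, loc_tail) / (2 * sigma2_head)
    return trust * math.exp(exponent) / (2 * math.pi * sigma2_head)


def checkin_prob(loc_venue, sigma2_venue, loc_user):
    """Equation 4: probability that a user checks in at a venue."""
    return node_influence(loc_venue, sigma2_venue, loc_user)


# A follower one degree away, under two different trust values.
print(follow_prob((0.0, 0.0), 1.0, (1.0, 0.0), trust=1.0))
print(follow_prob((0.0, 0.0), 1.0, (1.0, 0.0), trust=0.2))
```

With `trust = 1.0`, `follow_prob` reduces to the plain Gaussian of Equation 2, which is a quick sanity check on the reading above.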

4) TSU Model on LBSNs: We make a conditional independence assumption: the edges from tail nodes to head nodes are conditionally independent of one another given the head and tail nodes. This kind of assumption is widely applied in machine learning models such as Naive Bayes [25]. The TSU model is shown in Equation 5, which measures the joint probability of generating the friendship and check-in edges in an LBSN. We can estimate the unknown home locations of users using the Maximum Likelihood Estimation (MLE) principle under the TSU model.

Algorithm 1 HLIA: Home Location Identification Algorithm

Input: G, F, C, R, L_{u_i} (∀ u_i ∈ U_H)
Output: L_{u_i} (∀ u_i ∈ U_{-H})

 1: function HLIA(G, F, C, R, L)
 2:     // Init home location of users in U_{-H}
 3:     for each u_i ∈ U_{-H} do                 // users: no home location
 4:         if u_i ∈ U_C then                    // user: have check-in
 5:             L_{u_i} = SPClustering(C_{u_i}, c_τ)
 6:         else                                 // user: no check-in
 7:             L_{u_i} = Random
 8:         end if
 9:     end for
10:     // Update home locations of users in U_{-H} iteratively
11:     while true do                            // Outer Loop
12:         for each u_i ∈ U do
13:             Update σ²_{u_i} based on Equation 8
14:         end for
15:         for each v_j ∈ V do
16:             Update σ²_{v_j} based on Equation 9
17:         end for
18:         while true do                        // Inner Loop
19:             for each u_i ∈ (U_{-H} ∩ U_{-C}) do
20:                 Calculate Lat^{new}_{u_i} and Lon^{new}_{u_i} based on Equations 6 and 7
21:             end for
22:             If Inner Loop converges, then break
23:         end while
24:         for each u_i ∈ (U_{-H} ∩ U_{-C}) do
25:             Lat_{u_i} = Lat^{new}_{u_i}, Lon_{u_i} = Lon^{new}_{u_i}
26:         end for
27:         If Outer Loop converges, then break
28:     end while
29: end function
30:

Input: L, c_τ                                    // L: the location list, c_τ: cluster threshold
Output: l_c

31: function SPClustering(L, c_τ)
32:     C: clusters
33:     for each i ∈ [1, Length(L)] do
34:         Get the cluster C_min that has the minimum distance d_min with L_i
35:         if d_min < c_τ then
36:             C_min ← L_i
37:         else
38:             Create a new cluster C_new
39:             C_new ← L_i
40:         end if
41:     end for
42:     return l_c, which is the center of the largest cluster
43: end function

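$$\begin{aligned}
P(E \mid I_u, I_v) &= P(F \mid I_u, I_v) \times P(C \mid I_u, I_v) \\
&= \prod_{f\langle u_j, u_i\rangle \in F} \frac{T_{ji}}{2\pi\sigma_{u_i}^2} \exp\!\left(T_{ji}\,\frac{(Lat_{u_i} - Lat_{u_j})^2 + (Lon_{u_i} - Lon_{u_j})^2}{-2\sigma_{u_i}^2}\right) \\
&\quad \times \prod_{c\langle u_i, v_j\rangle \in C} \frac{1}{2\pi\sigma_{v_j}^2} \exp\!\left(\frac{(Lat_{v_j} - Lat_{u_i})^2 + (Lon_{v_j} - Lon_{u_i})^2}{-2\sigma_{v_j}^2}\right)
\end{aligned} \tag{5}$$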
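In practice a product over millions of edges underflows, so one would normally evaluate the logarithm of Equation 5 instead. The function below is a sketch under stated assumptions: the name `log_likelihood`, the dictionary-based inputs, and the requirement that every follow edge has strictly positive trust (otherwise its log term is undefined) are not taken from the paper.

```python
import math


def log_likelihood(follow_edges, checkin_edges, loc, sigma2, trust):
    """Logarithm of Equation 5.

    follow_edges:  iterable of (tail_user, head_user) pairs
    checkin_edges: iterable of (user, venue) pairs
    loc:           node -> (lat, lon)
    sigma2:        node -> variance of the node's influence scope
    trust:         (tail_user, head_user) -> social trust, assumed > 0
    """
    def d2(a, b):
        return (loc[a][0] - loc[b][0]) ** 2 + (loc[a][1] - loc[b][1]) ** 2

    total = 0.0
    for tail, head in follow_edges:
        t, s2 = trust[(tail, head)], sigma2[head]
        total += math.log(t) - math.log(2 * math.pi * s2) - t * d2(tail, head) / (2 * s2)
    for user, venue in checkin_edges:
        s2 = sigma2[venue]
        total += -math.log(2 * math.pi * s2) - d2(user, venue) / (2 * s2)
    return total


# Toy example: user "a" follows "b" and checks in at venue "v".
loc = {"a": (0.0, 0.0), "b": (1.0, 0.0), "v": (0.0, 0.0)}
sigma2 = {"a": 1.0, "b": 1.0, "v": 1.0}
trust = {("a", "b"): 0.5}
ll = log_likelihood([("a", "b")], [("a", "v")], loc, sigma2, trust)
print(ll)
```

A quantity like this is also a natural convergence monitor for an iterative optimizer, since each maximizing update should not decrease it.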

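$$Lat_{u_i} = \frac{\displaystyle\sum_{u_j \in I_f(u_i)} \frac{T_{ji}\, Lat_{u_j}}{\sigma_{u_i}^2} + \sum_{u_j \in O_f(u_i)} \frac{T_{ij}\, Lat_{u_j}}{\sigma_{u_j}^2} + \sum_{v_j \in O_c(u_i)} \frac{Lat_{v_j}}{\sigma_{v_j}^2}}{\displaystyle\sum_{u_j \in I_f(u_i)} \frac{T_{ji}}{\sigma_{u_i}^2} + \sum_{u_j \in O_f(u_i)} \frac{T_{ij}}{\sigma_{u_j}^2} + \sum_{v_j \in O_c(u_i)} \frac{1}{\sigma_{v_j}^2}} \tag{6}$$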

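$$Lon_{u_i} = \frac{\displaystyle\sum_{u_j \in I_f(u_i)} \frac{T_{ji}\, Lon_{u_j}}{\sigma_{u_i}^2} + \sum_{u_j \in O_f(u_i)} \frac{T_{ij}\, Lon_{u_j}}{\sigma_{u_j}^2} + \sum_{v_j \in O_c(u_i)} \frac{Lon_{v_j}}{\sigma_{v_j}^2}}{\displaystyle\sum_{u_j \in I_f(u_i)} \frac{T_{ji}}{\sigma_{u_i}^2} + \sum_{u_j \in O_f(u_i)} \frac{T_{ij}}{\sigma_{u_j}^2} + \sum_{v_j \in O_c(u_i)} \frac{1}{\sigma_{v_j}^2}} \tag{7}$$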
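Equations 6 and 7 amount to a weighted mean of neighbor locations. The sketch below assumes that each term is weighted by trust divided by the variance of the edge's head node (and by the inverse venue variance for check-ins), which is what Equation 5 implies; the function name `update_location`, the dictionary inputs, and the fallback of keeping the current location for isolated users are assumptions.

```python
def update_location(u, loc, sigma2, trust, in_follow, out_follow, out_checkin):
    """One application of Equations 6 and 7 for user `u`.

    trust is keyed by (tail_user, head_user). Returns the new (lat, lon).
    """
    weighted = []
    # Followers of u: edge f<u_j, u>, scaled by the scope of the head node u.
    weighted += [(trust[(uj, u)] / sigma2[u], uj) for uj in in_follow.get(u, ())]
    # Users followed by u: edge f<u, u_j>, scaled by the scope of the head node u_j.
    weighted += [(trust[(u, uj)] / sigma2[uj], uj) for uj in out_follow.get(u, ())]
    # Venues visited by u: edge c<u, v_j>, scaled by the venue's scope.
    weighted += [(1.0 / sigma2[v], v) for v in out_checkin.get(u, ())]

    den = sum(w for w, _ in weighted)
    if den == 0.0:
        return loc[u]  # assumption: an isolated user keeps the current estimate
    lat = sum(w * loc[n][0] for w, n in weighted) / den
    lon = sum(w * loc[n][1] for w, n in weighted) / den
    return lat, lon


# Toy example: one follower at (0, 0) and one visited venue at (2, 2).
loc = {"u": (9.0, 9.0), "a": (0.0, 0.0), "v": (2.0, 2.0)}
sigma2 = {"u": 1.0, "a": 1.0, "v": 1.0}
trust = {("a", "u"): 1.0}
new_loc = update_location("u", loc, sigma2, trust, {"u": {"a"}}, {}, {"u": {"v"}})
print(new_loc)
```

Neighbors with a narrow influence scope (small variance) or high trust pull the estimate more strongly, which is the intended effect of the weighting.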

$$\sigma_{u_i}^2 = \frac{\displaystyle\sum_{u_j \in I_f(u_i)} T_{ji} \left[ (Lat_{u_j} - Lat_{u_i})^2 + (Lon_{u_j} - Lon_{u_i})^2 \right]}{2\,|I_f(u_i)|} \tag{8}$$

$$\sigma_{v_j}^2 = \frac{\displaystyle\sum_{u_i \in I_c(v_j)} \left[ (Lat_{u_i} - Lat_{v_j})^2 + (Lon_{u_i} - Lon_{v_j})^2 \right]}{2\,|I_c(v_j)|} \tag{9}$$
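The two variance updates in Equations 8 and 9 can be written as short helpers. This sketch is illustrative: the function names are hypothetical, and returning `None` for a node without incoming edges (so the caller can keep the previous variance) is an assumption, because the equations are undefined in that case.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def update_user_sigma2(u, loc, trust, in_follow):
    """Equation 8: trust-weighted spread of the followers of user `u`."""
    followers = in_follow.get(u, ())
    if not followers:
        return None  # assumption: caller keeps the previous variance
    total = sum(trust[(uj, u)] * sq_dist(loc[uj], loc[u]) for uj in followers)
    return total / (2 * len(followers))


def update_venue_sigma2(v, loc, in_checkin):
    """Equation 9: spread of the users who checked in at venue `v`."""
    visitors = in_checkin.get(v, ())
    if not visitors:
        return None  # assumption: caller keeps the previous variance
    total = sum(sq_dist(loc[ui], loc[v]) for ui in visitors)
    return total / (2 * len(visitors))


loc = {"u": (0.0, 0.0), "a": (2.0, 0.0), "b": (1.0, 0.0), "c": (0.0, 1.0), "v": (0.0, 0.0)}
s_user = update_user_sigma2("u", loc, {("a", "u"): 0.5}, {"u": {"a"}})
s_venue = update_venue_sigma2("v", loc, {"v": {"b", "c"}})
print(s_user, s_venue)
```

A variance of exactly zero (all neighbors at the node's own location) would break the divisions in Equations 6 and 7, so a real implementation would likely clamp these values to a small positive floor.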

VI. HOME LOCATION IDENTIFICATION METHOD

In this section, we develop our Home Location Identification method based on the TSU model. Specifically, we estimate the home locations that maximize the likelihood, i.e., the joint probability of generating the edges (friendships and check-ins).

In the TSU model shown in Equation 5, for a user $u_i \in U_{-H}$, both $L_{u_i}$ and $\sigma_{u_i}$ are unknown; for a user $u_i \in U_H$ and a venue $v_j \in V$, $\sigma_{u_i}$ and $\sigma_{v_j}$ are unknown. We differentiate Equation 5 with respect to each unknown variable and obtain the results shown in Equations 6, 7, 8 and 9. In these equations, the unknown variables depend on each other. We therefore use a two-stage algorithm called HLIA, given in Algorithm 1, to solve the problem. In Stage 1, HLIA initializes the home locations of users who have check-in data by clustering their check-ins with a single-pass clustering algorithm. In Stage 2, HLIA updates the home locations of users iteratively so that the likelihood is maximized. We prove that HLIA converges in Theorem 6.1.
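As a sketch of the differentiation step for the latitude update, assume all variances are held fixed and work with the logarithm of Equation 5, keeping only the terms that involve $Lat_{u_i}$. The grouping of terms below is our reading of Equation 5, not a derivation quoted from the paper.

```latex
% Terms of \log P(E \mid I_u, I_v) that depend on Lat_{u_i}
\ell(Lat_{u_i}) =
  - \sum_{u_j \in I_f(u_i)} \frac{T_{ji}\,(Lat_{u_i} - Lat_{u_j})^2}{2\sigma_{u_i}^2}
  - \sum_{u_j \in O_f(u_i)} \frac{T_{ij}\,(Lat_{u_j} - Lat_{u_i})^2}{2\sigma_{u_j}^2}
  - \sum_{v_j \in O_c(u_i)} \frac{(Lat_{v_j} - Lat_{u_i})^2}{2\sigma_{v_j}^2}
  + \text{const}

% Setting the derivative with respect to Lat_{u_i} to zero
\sum_{u_j \in I_f(u_i)} \frac{T_{ji}}{\sigma_{u_i}^2}\,(Lat_{u_j} - Lat_{u_i})
+ \sum_{u_j \in O_f(u_i)} \frac{T_{ij}}{\sigma_{u_j}^2}\,(Lat_{u_j} - Lat_{u_i})
+ \sum_{v_j \in O_c(u_i)} \frac{1}{\sigma_{v_j}^2}\,(Lat_{v_j} - Lat_{u_i}) = 0
```

Solving this linear equation for $Lat_{u_i}$ gives a weighted mean of the neighbors' latitudes, with weights $T/\sigma^2$ for follow edges and $1/\sigma^2$ for check-in edges; the longitude case is identical.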

A. Stage 1: Initialization

HLIA initializes the home locations of users whose home location is unknown in Steps 3 to 9. For a user who has check-in data, HLIA initializes the home location by clustering the user's check-ins with a single-pass clustering algorithm on locations, called SPClustering, which is based on the Single-pass Clustering Algorithm [14]. For a user who has no check-in data, HLIA initializes the home location to a random value.

The SPClustering algorithm is shown in Steps 31 to 43. It groups a list of locations into clusters in a single pass and returns the center of the largest cluster as the result. Specifically, SPClustering scans each location $L_i$ in the location list sequentially and finds the nearest cluster $C_{min}$ for $L_i$. If the minimum distance $d_{min}$ is less than the threshold $c_\tau$, it adds $L_i$ to the nearest cluster $C_{min}$. Otherwise, it creates a new cluster $C_{new}$ containing $L_i$. Because each location is processed exactly once, SPClustering is a linear algorithm.
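A runnable sketch of this single-pass procedure is given below. Two details are assumptions because the text leaves them open: the distance from a location to a cluster is measured to the cluster's running centroid, and plain Euclidean distance on (lat, lon) is used. The names `sp_clustering` and `centroid` are hypothetical.

```python
import math


def centroid(points):
    """Arithmetic mean of a non-empty list of (lat, lon) points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def sp_clustering(locations, threshold):
    """Single-pass clustering; returns the centre of the largest cluster.

    Returns None for an empty location list.
    """
    clusters = []  # each cluster is a list of (lat, lon) points
    for point in locations:
        best, best_dist = None, math.inf
        for cluster in clusters:
            d = math.dist(point, centroid(cluster))
            if d < best_dist:
                best, best_dist = cluster, d
        if best is not None and best_dist < threshold:
            best.append(point)        # join the nearest cluster
        else:
            clusters.append([point])  # start a new cluster
    if not clusters:
        return None
    return centroid(max(clusters, key=len))


checkins = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
home = sp_clustering(checkins, threshold=1.0)
print(home)
```

The result depends on the scan order of the check-ins, which is inherent to single-pass clustering; recomputing each centroid on demand keeps the sketch short at the cost of extra work per comparison.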

B. Stage 2: Updating Iteratively

HLIA iteratively updates the home locations of users who do not have check-in data in Steps 11 to 28. The outer loop (Steps 11 to 28) updates $\sigma_{u_i}^2$ and $\sigma_{v_j}^2$ based on Equations 8 and 9. The inner loop and the assignment that follows it (Steps 18 to 26) update $Lat_{u_i}$ and $Lon_{u_i}$ based on Equations 6 and 7. HLIA stops when the likelihood converges.

Theorem 6.1: The Home Location Identification algorithm HLIA converges.

Proof: In the inner loop, HLIA converges and obtains the $Lat_{u_i}$ and $Lon_{u_i}$ that maximize the likelihood for fixed $\sigma_{u_i}^2$ and $\sigma_{v_j}^2$, as shown in [37]. In the outer loop, HLIA directly calculates new $\sigma_{u_i}^2$ and $\sigma_{v_j}^2$ according to Equations 8 and 9, given the locations of the nodes. Consequently, the likelihood increases monotonically and the algorithm converges.

VII. EXPERIMENTS

A. Experiment Setup

1) Dataset: As described in Section III-B, the Foursquare dataset has 835,896 users, 648,825 venues, 370,477 check-ins and 12,924,609 social graph edges. In the dataset, 138,983 users have check-in data, constituting only 16.7% of all users. For users who have check-in data, the average number of check-ins per user is about 2.7.

In the experiments, we define the ratio of users whose home location is known as $rh = |U_H| / |U|$. We randomly split the users into two parts: a fraction $rh$ of the users have a home location and the remaining $1 - rh$ do not. Following existing methods [7, 10, 20], we select $rh = 80\%$. In this setting, 669,472 users have a home location and 166,424 users do not. Among the 166,424 users without a home location, 27,781 (16.7%) have check-in data.
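The random split can be reproduced with a seeded shuffle. This is only a sketch of one way to do it: the function name `split_users`, the fixed seed and the rounding rule for the cut point are assumptions, and the paper does not describe its exact procedure.

```python
import random


def split_users(users, rh, seed=0):
    """Randomly split users into (known-home, hidden-home) sets with ratio rh."""
    shuffled = list(users)
    random.Random(seed).shuffle(shuffled)  # local generator keeps the split reproducible
    cut = round(rh * len(shuffled))
    return set(shuffled[:cut]), set(shuffled[cut:])


known, hidden = split_users(range(100), rh=0.8)
print(len(known), len(hidden))
```

The home locations of the `hidden` users are then withheld from the model and used only for evaluation.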

a) Methods:

• UDI is the method developed in [20], which predicts a user's location based on an influence model. UDI uses signals such as friendships and venues in tweets.

• Maxvote is the baseline method developed in [30], which predicts a user's location by taking the most popular location of the user. We cannot apply a max-vote scheme directly because location information such as latitude and longitude is continuous, so we first map the check-in list to a city list using the method described in Section III-C.

• ClusterHier is the baseline method developed in [21], which predicts a user's home location by using a hierarchical clustering algorithm to cluster check-ins made at night (shared from 8:00 p.m. to 7:59 a.m. every day).

• Avg is the baseline method developed in [12, 34], which discretizes the world into 25 km by 25 km cells and defines the home location as the average position of the check-ins in the cell with the most check-ins.

• HLIA is our Home Location Identification method.

• $HLIA_{uc}$ is our Home Location Identification method, except that it also updates users who have check-in data in the iteration stage, as UDI does.
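To make the Avg baseline concrete, the sketch below bins check-ins into a regular grid and averages the busiest cell. It is a rough illustration, not the implementation of [12, 34]: the cell size is approximated as 25/111 degrees in both latitude and longitude (ignoring that a degree of longitude shrinks away from the equator), and the name `avg_home` is hypothetical.

```python
import math
from collections import defaultdict

CELL_DEG = 25.0 / 111.0  # roughly 25 km expressed in degrees of latitude (approximation)


def avg_home(checkins):
    """Average position of the check-ins in the grid cell with the most check-ins."""
    cells = defaultdict(list)
    for lat, lon in checkins:
        key = (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
        cells[key].append((lat, lon))
    if not cells:
        return None
    busiest = max(cells.values(), key=len)
    n = len(busiest)
    return (sum(p[0] for p in busiest) / n, sum(p[1] for p in busiest) / n)


# Two nearby check-ins and one far away.
estimate = avg_home([(40.0, -75.1), (40.01, -75.11), (34.0, -118.0)])
print(estimate)
```

Ties between equally busy cells are resolved by insertion order here, which is another detail the baseline description leaves open.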

2) Evaluation Metrics: Following previous work [20], we measure the performance of the methods using the accuracy within an error distance of 100 miles (ACC). Specifically, for a user $u_i$, let $L_{u_i}$ and $\tilde{L}_{u_i}$ be the true and estimated home locations, respectively, and let $Err(u_i)$ be the error distance between $L_{u_i}$ and $\tilde{L}_{u_i}$. Then ACC is defined as in Equation 10:

$$ACC = \frac{|\{u_i \mid u_i \in U_{-H} \wedge Err(u_i) \le 100\}|}{|U_{-H}|} \tag{10}$$
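Equation 10 can be computed as follows. The sketch assumes that the error distance is the great-circle distance in miles, which the paper does not state explicitly; the names `error_miles` and `acc` and the Earth-radius constant are likewise assumptions.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius in miles (assumed constant)


def error_miles(a, b):
    """Great-circle distance in miles between two (lat, lon) pairs in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def acc(true_loc, est_loc, max_error=100.0):
    """Equation 10: fraction of users whose error distance is within `max_error` miles."""
    users = list(true_loc)
    hits = sum(1 for u in users if error_miles(true_loc[u], est_loc[u]) <= max_error)
    return hits / len(users)


true_loc = {"a": (40.0, -75.0), "b": (34.0, -118.0)}
est_loc = {"a": (40.5, -75.0), "b": (40.0, -75.0)}  # "a" is close, "b" is far off
print(acc(true_loc, est_loc))
```

Passing a different `max_error` gives the accuracy at other error distances, which is the quantity varied in the error-distance experiment.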

B. Experiment Results

1) Home Location Identification for $U_{-H} \cap U_C$: The methods Maxvote, ClusterHier and Avg have the limitation that they can only predict the home locations of users who have check-in data. This means that they can only make predictions for 16.7% of the users in $U_{-H}$ in the dataset. We first compare the performance of the methods on users who have check-in data. Table II shows the performance of each method. The results demonstrate that our method HLIA outperforms all existing methods for users who have check-in data. Specifically, HLIA predicts the home locations of users who have check-in data with an accuracy of 92.1%, the best among the compared methods, even though the average number of check-ins per user is only about 2.7.

TABLE II

PERFORMANCE OF HOME LOCATION IDENTIFICATION FOR $U_{-H} \cap U_C$

Method        ACC (%)

HLIA          92.1

Maxvote       91.9

ClusterHier   91.6

Avg           88.8

2) Home Location Identification for $U_{-H}$: In this experiment, we compare the performance of UDI, HLIA and HLIAuc for all users who do not have a home location. Table III shows the performance of each method. The Gain column in the table reports the relative improvement in ACC over UDI: when the ACC of a method and of UDI are $acc_m$ and $acc_u$ respectively, the gain is equal to $\frac{acc_m - acc_u}{acc_u}$.
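As a quick check of this definition, the short Python sketch below computes the relative gain from the ACC values reported in Table III. The function name `relative_gain` is a hypothetical helper introduced here; the result is expressed as a percentage, which is the form used in the Gain column.

```python
def relative_gain(acc_m, acc_u):
    """Relative ACC gain of a method over UDI, as a percentage."""
    return 100.0 * (acc_m - acc_u) / acc_u


if __name__ == "__main__":
    acc_udi = 59.0  # ACC of UDI in Table III
    for name, value in [("UDI", 59.0), ("HLIAuc", 62.3), ("HLIA", 63.1)]:
        print(name, round(relative_gain(value, acc_udi), 1))
```

Because the gain is relative to the ACC of UDI, a gain of 6.9% corresponds to an absolute difference of 4.1 percentage points (63.1% versus 59.0%).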

a) HLIA vs. UDI: HLIA significantly improves on UDI, with a relative ACC gain of 6.9%.

b) HLIA vs. HLIAuc: In the initialization stage of HLIA, we initialize the home locations of users in $U_{-C}$ with random values and the home locations of users in $U_C$ by clustering their check-in data. If we updated the home locations of users in $U_C$ using the randomly initialized locations in the updating stage, the accuracy of the estimated home locations of users in $U_C$ could be affected. The experiments confirm this: comparing HLIA and HLIAuc, we see that updating only the locations of users in $U_{-C}$ in the updating stage of HLIA improves the ACC by 1.3% in relative terms (63.1% versus 62.3%).

Fig. 2. Accuracy under different error distances

TABLE III

PERFORMANCE OF HOME LOCATION IDENTIFICATION FOR $U_{-H}$

Method    ACC (%)   Gain (%)

UDI       59.0      0

HLIAuc    62.3      5.6

HLIA      63.1      6.9

3) Influence of Error Distance: We have used an error distance of 100 miles to measure accuracy, as described in VII-A2. To investigate the influence of the error distance, we measure accuracy at different error-distance thresholds; the result is shown in Figure 2. Our method performs much better than state-of-the-art methods when the error distance is less than 100 miles. For example, the accuracies of our method and UDI are 0.532 and 0.454 respectively when the error distance is 20 miles. The accuracy approaches 1 when the error distance exceeds 2,500 miles.
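A curve like this can be computed by sorting the per-user error distances once and counting how many fall under each threshold. The Python sketch below assumes the error distances have already been computed in miles; the function name `accuracy_curve` and the sample numbers are hypothetical and are not taken from the experiments.

```python
from bisect import bisect_right


def accuracy_curve(errors_miles, thresholds, total_users):
    """Accuracy at each error-distance threshold.

    errors_miles holds one error distance per user with a prediction;
    total_users is the number of evaluated users, which may be larger
    when some users receive no prediction.
    """
    if total_users <= 0:
        raise ValueError("total_users must be positive")
    ordered = sorted(errors_miles)
    # bisect_right counts the errors that are <= t in the sorted list.
    return {t: bisect_right(ordered, t) / total_users for t in thresholds}


if __name__ == "__main__":
    sample_errors = [5, 18, 60, 150, 900]  # made-up error distances
    print(accuracy_curve(sample_errors, [20, 100, 2500], total_users=5))
```

Sorting once keeps the cost of evaluating many thresholds low, and by construction the accuracy can only grow as the threshold increases.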

4) Influence of the Ratio of Users Who Have Home Location: To investigate the influence of the ratio of users who have a home location, we evaluate the methods in another setting where rh = 0.2, which means that only 20% of users have a home location. This setting is closer to the real-world case. Table IV shows the performance of each method. From Table IV, we find that HLIA significantly outperforms UDI, with a relative ACC gain of 47.4%. Comparing Tables III and IV, we find that the advantage of HLIA over UDI grows when fewer users have a home location.
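One way to set up such an experiment is to keep the home location of a fraction rh of the labeled users and hide it for the rest, who then form the evaluation set. The sketch below assumes a uniformly random split, which this section does not specify; the function name `split_by_home_ratio` and the fixed seed are hypothetical.

```python
import random


def split_by_home_ratio(user_ids, rh, seed=0):
    """Split users into (known_home, hidden_home) sets.

    A fraction rh of the users keep their home location; the home
    locations of the remaining users are hidden and used for evaluation.
    """
    if not 0.0 <= rh <= 1.0:
        raise ValueError("rh must lie in [0, 1]")
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    n_known = round(rh * len(shuffled))
    return set(shuffled[:n_known]), set(shuffled[n_known:])


if __name__ == "__main__":
    known, hidden = split_by_home_ratio(range(100), rh=0.2)
    print(len(known), len(hidden))
```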

TABLE IV

INFLUENCE OF RATIO OF USERS WHO HAVE HOME LOCATION

Method    ACC (%)   Gain (%)

UDI       33.1      0

HLIA      48.8      47.4

C. Discussion

Content Based Approach [9, 10, 19, 23, 24] infers home locations from texts in social networks, so it requires text data. Moreover, venue information in texts can be noisy and ambiguous: users may mention a venue merely because it appears in the news, and many places can share the same name. Our method avoids these problems by using check-in data.

Check-in Based Approach [12, 20, 21, 30, 34] infers home locations of users from check-in data. Existing methods such as Maxvote [30], ClusterHier [21] and Avg [12, 34] have the shortcoming that they can only predict home locations of users who have check-in data. However, in the real world, only 12% of adult smartphone owners have used geo-social services to check in at some location [2]. For users who have check-in data, our method HLIA predicts home locations with an accuracy of 92.1%, the best among the compared methods, even though the average number of check-ins per user is only about 2.7. Our method can also predict home locations of users who do not have check-in data: over all users without a home location, of whom only 16.7% have check-in data, it reaches an accuracy of 63.1% in the setting where 20% of users do not have a home location, outperforming state-of-the-art methods by about 6.9%. Compared with Li et al. [20], our method can model closeness in the social structure of LBSNs. Moreover, our method does not need to update the home locations of users who have check-in data in the updating stage.

In summary, our method achieves the best performance among the compared methods in every experiment above.

VIII. CONCLUSION AND FUTURE WORK

Home Location Identification of users in Location-Based Social Networks is important for location-based applications such as personalized search and recommendation. In this paper, we propose a Trust-based influence model called TSU, based on Social relationship data (social relationships, social trust) and User-centric data (check-ins) in LBSNs. We develop a method for this problem based on the TSU model. Extensive experiments on a large-scale dataset demonstrate that our method outperforms state-of-the-art methods by 6.9%. Compared with previous research, we are the first to demonstrate the effectiveness of using social trust, which measures closeness in social structure, for the Home Location Identification problem.

In the future, we will further study how to use time information and social structure in social networks for the Home Location Identification problem. Moreover, we plan to study how to improve location-based services based on our Home Location Identification method and how to protect the privacy of users in social networks.

REFERENCES

[1] Foursquare. https://en.wikipedia.org/wiki/Foursquare.

[2] Pewresearch. http://www.pewinternet.org/2013/09/12/location-based-services.

[3] United States cities by population. https://en.wikipedia.org/wiki/List_of_United_States_cities_by_population.

[4] Fabian Abel, Qi Gao, Geert-Jan Houben, and Ke Tao.

Analyzing user modeling on twitter for personalized

news recommendations. In User Modeling, Adaption and

Personalization, pages 1–12. Springer, 2011.

[5] Amr Ahmed, Yucheng Low, Mohamed Aly, Vanja Josi-

fovski, and Alexander J Smola. Scalable distributed in-

ference of dynamic user interests for behavioral targeting.

In Proceedings of the 17th ACM SIGKDD international

conference on Knowledge discovery and data mining,

pages 114–122. ACM, 2011.

[6] Miltiadis Allamanis, Salvatore Scellato, and Cecilia Mas-

colo. Evolution of a location-based online social network:

analysis and models. In Proceedings of the 2012 ACM

conference on Internet measurement conference, pages

145–158. ACM, 2012.

[7] Lars Backstrom, Eric Sun, and Cameron Marlow. Find

me if you can: improving geographical prediction with

social and spatial proximity. In Proceedings of the 19th

international conference on World wide web, pages 61–

70. ACM, 2010.

[8] David Carmel, Naama Zwerdling, Ido Guy, Shila Ofek-

Koifman, Nadav Har’El, Inbal Ronen, Erel Uziel, Sivan

Yogev, and Sergey Chernov. Personalized social search

based on the user’s social network. In Proceedings of

the 18th ACM conference on Information and knowledge

management, pages 1227–1236. ACM, 2009.

[9] Swarup Chandra, Latifur Khan, and Fahad Bin Muhaya.

Estimating twitter user location using social interactions–

a content based approach. In Privacy, Security, Risk

and Trust (PASSAT) and 2011 IEEE Third International

Conference on Social Computing (SocialCom), 2011

IEEE Third International Conference on, pages 838–843.

IEEE, 2011.

[10] Zhiyuan Cheng, James Caverlee, and Kyumin Lee. You

are where you tweet: a content-based approach to geo-

locating twitter users. In Proceedings of the 19th ACM

international conference on Information and knowledge

management, pages 759–768. ACM, 2010.

[11] Zhiyuan Cheng, James Caverlee, Kyumin Lee, and

Daniel Z Sui. Exploring millions of footprints in location

sharing services. ICWSM, 2011:81–88, 2011.

[12] Eunjoon Cho, Seth A Myers, and Jure Leskovec. Friend-

ship and mobility: user movement in location-based so-

cial networks. In Proceedings of the 17th ACM SIGKDD

international conference on Knowledge discovery and

data mining, pages 1082–1090. ACM, 2011.

[13] Ahmad Foroozani, Mohammed Gharib, Ali Moham-

mad Afshin Hemmatyar, and Ali Movaghar. A novel

human mobility model for manets based on real data.

In Computer Communication and Networks (ICCCN),

2014 23rd International Conference on, pages 1–7. IEEE,

2014.

[14] William B Frakes and Ricardo Baeza-Yates. Information

retrieval: data structures and algorithms. 1992.

[15] Xiaowen Gong, Xu Chen, Kai Xing, Dong-Hoon Shin,

Mengyuan Zhang, and Junshan Zhang. Personalized lo-

cation privacy in mobile networks: A social group utility

approach. In Computer Communications (INFOCOM),

2015 IEEE Conference on, pages 1008–1016. IEEE,

2015.

[16] Hamed Haddadi, Richard Mortier, and Steven Hand.

Privacy analytics. ACM SIGCOMM Computer Commu-

nication Review, 42(2):94–98, 2012.

[17] Justin J Levandoski, Mohamed Sarwat, Ahmed Eldawy,

and Mohamed F Mokbel. Lars: A location-aware rec-

ommender system. In Data Engineering (ICDE), 2012

IEEE 28th International Conference on, pages 450–461.

IEEE, 2012.

[18] Michael Levandowsky and David Winter. Distance

between sets. Nature, 234(5323):34–35, 1971.

[19] Guoliang Li, Jun Hu, Jianhua Feng, and Kian-lee Tan.

Effective location identiﬁcation from microblogs. In

Data Engineering (ICDE), 2014 IEEE 30th International

Conference on, pages 880–891. IEEE, 2014.

[20] Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, and

Kevin Chen-Chuan Chang. Towards social user pro-

ﬁling: uniﬁed and discriminative inﬂuence model for

inferring home locations. In Proceedings of the 18th

ACM SIGKDD international conference on Knowledge

discovery and data mining, pages 1023–1031. ACM,

2012.

[21] Hao Liu, Yaoxue Zhang, Yuezhi Zhou, Di Zhang, Xi-

aoming Fu, and KK Ramakrishnan. Mining checkins

from location-sharing services for client-independent ip

geolocation. In INFOCOM, 2014 Proceedings IEEE,

pages 619–627. IEEE, 2014.

[22] Jiahui Liu, Peter Dolan, and Elin Rønby Pedersen. Per-

sonalized news recommendation based on click behavior.

In Proceedings of the 15th international conference on

Intelligent user interfaces, pages 31–40. ACM, 2010.

[23] Jalal Mahmud, Jeffrey Nichols, and Clemens Drews.

Where is this tweet from? inferring home locations of

twitter users. ICWSM, 12:511–514, 2012.

[24] Jalal Mahmud, Jeffrey Nichols, and Clemens Drews.

Home location identiﬁcation of twitter users. ACM Trans-

actions on Intelligent Systems and Technology (TIST), 5

(3):47, 2014.

[25] Andrew McCallum, Kamal Nigam, et al. A comparison

of event models for naive bayes text classiﬁcation. In

AAAI-98 workshop on learning for text categorization,

volume 752, pages 41–48. Citeseer, 1998.

[26] Alan Mislove, Bimal Viswanath, Krishna P Gummadi,

and Peter Druschel. You are who you know: inferring

user proﬁles in online social networks. In Proceedings of

the third ACM international conference on Web search

and data mining, pages 251–260. ACM, 2010.

[27] Ben Niu, Qinghua Li, Xiaoyan Zhu, and Hui Li. A ﬁne-

grained spatial cloaking scheme for privacy-aware users

in location-based services. In Computer Communication

and Networks (ICCCN), 2014 23rd International Confer-

ence on, pages 1–8. IEEE, 2014.

[28] Ed Novak and Qun Li. Near-pri: Private, proximity based

location sharing. In INFOCOM, 2014 Proceedings IEEE,

pages 37–45. IEEE, 2014.

[29] Sarah Pidcock and Urs Hengartner. Zerosquare: A

privacy-friendly location hub for geosocial applications.

In Proc. 2nd ACM SIGCOMM Workshop Networking,

Systems, and Applications Mobile Handhelds, 2013.

[30] Tatiana Pontes, Marisa Vasconcelos, Jussara Almeida,

Ponnurangam Kumaraguru, and Virgilio Almeida. We

know where you live: privacy characterization of

foursquare behavior. In Proceedings of the 2012 ACM

Conference on Ubiquitous Computing, pages 898–905.

ACM, 2012.

[31] Foster Provost, Brian Dalessandro, Rod Hook, Xiaohan

Zhang, and Alan Murray. Audience selection for on-

line brand advertising: privacy-friendly social network

targeting. In Proceedings of the 15th ACM SIGKDD

international conference on Knowledge discovery and

data mining, pages 707–716. ACM, 2009.

[32] Feng Qiu and Junghoo Cho. Automatic identiﬁcation of

user interest for personalized search. In Proceedings of

the 15th international conference on World Wide Web,

pages 727–736. ACM, 2006.

[33] Mohamed Sarwat, Justin J Levandoski, Ahmed Eldawy,

and Mohamed F Mokbel. Lars*: a scalable and efﬁcient

location-aware recommender system. IEEE Transactions

on Knowledge and Data Engineering (TKDE), 2013.

[34] Salvatore Scellato, Anastasios Noulas, Renaud Lam-

biotte, and Cecilia Mascolo. Socio-spatial properties of

online location-based social networks. ICWSM, 11:329–

336, 2011.

[35] Wei Wei, Xiaojun Zhu, and Qun Li. Lbsnsim: analyzing

and modeling location-based social networks. In INFO-

COM, 2014 Proceedings IEEE, pages 1680–1688. IEEE,

2014.

[36] Ning Xia, Han Hee Song, Yong Liao, Marios Iliofotou,

Antonio Nucci, Zhi-Li Zhang, and Aleksandar Kuz-

manovic. Mosaic: Quantifying privacy leakage in mobile

networks. In ACM SIGCOMM Computer Communication

Review, volume 43, pages 279–290. ACM, 2013.

[37] Zhijun Yin, Rui Li, Qiaozhu Mei, and Jiawei Han. Ex-

ploring social tagging graph for web object classiﬁcation.

In Proceedings of the 15th ACM SIGKDD international

conference on Knowledge discovery and data mining,

pages 957–966. ACM, 2009.

[38] Leah Zhao, Neil Wong Hon Chan, Shanchieh Jay Yang,

and Roy W Melton. Privacy sensitive resource access

monitoring for android systems. In Computer Communi-

cation and Networks (ICCCN), 2015 24th International

Conference on, pages 1–6. IEEE, 2015.

[39] Yuan Zhong, Nicholas Jing Yuan, Wen Zhong, Fuzheng

Zhang, and Xing Xie. You are where you go: Inferring

demographic attributes from location check-ins. In Pro-

ceedings of the Eighth ACM International Conference

on Web Search and Data Mining, pages 295–304. ACM,

2015.

Correlation between Triadic Closure and Homophily Formed over Location-Based Social Networks

Article

Full-text available

Feb 2021

Social Internet of Things (SIoT) is a variation of social networks that adopt the property of peer-to-peer networks, in which connections between the things and social actors are automatically established. SIoT is a part of various organizations that inherit the social interaction, and these organizations include industries, institutions, and other establishments. Triadic closure and homophily are the most commonly used measures to investigate social networks’ formation and nature, where both measures are used exclusively or with statistical models. The triadic closure patterns are mapped for actors’ communication behavior over a location-based social network, affecting the homophily. In this study, we investigate triads emergence in homophilic social networks. This evaluation is based on the empirical review of triads within social networks (SNs) formed on Big Data. We utilized a large location-based dataset for an in-depth analysis, the Chinese telecommunication-based anonymized call detail records (CDRs). Two other openly available datasets, Brightkite and Gowalla, were also studied. We identified and proposed three social triad classes in a homophilic network to feature the correlation between social triads and homophily. The study opened a promising research direction that relates the variation of homophily based on closure triads nature. The homophilic triads are further categorized into transitive and intransitive groups. As our concluding research objective, we examined the relative triadic throughput within a location-based social network for the given datasets. The research study attains significant results highlighting the positive connection between homophily and a specific social triad class.

Rehumanize geoprivacy: from disclosure control to human perception

Article

Full-text available

Feb 2022
GeoJournal

Traditional boundaries between people are vanishing due to the rise of Internet of Things technology. Our smart devices keep us connected to the world, but also monitor our daily lives through an unprecedented amount data collection. As a result, defining privacy has become more complicated. Individuals want to leverage new technology (e.g., making friends through sharing private experiences) and also avoid unwanted consequences (e.g., targeted advertising). In the age of ubiquitous digital content, geoprivacy is unique because concerns in this area are constantly changing and context-dependent. Multiple factors influence people’s location disclosure decisions, including time, culture, demographics, spatial granularity, and trust. Existing research primarily focuses on the computational efforts of protecting geoprivacy, while the variation of geoprivacy perceptions has yet to receive adequate attention in the data science literature. In this work, we explore geoprivacy from a cognate-based perspective and tackle our changing perception of the concept from multiple angles. Our objectives are to rehumanize this field from contextual, cultural, and economic dimensions and highlight the uniqueness of geodata under the broad topic of privacy. It is essential that we understand the spatial variations of geoprivacy perceptions in the era of big data. Masking geographic coordinates can no longer fully anonymize spatial data, and targeted geoprivacy protection needs to be further investigated to improve user experience.

Geospatial Data: From Theory to Practice

Chapter

Sep 2023

Developers Need Protection, Too: Perspectives and Research Challenges for Privacy in Social Coding Platforms

Conference Paper

Full-text available

May 2023

Social Coding Platforms (SCPs) like GitHub have become central to modern software engineering thanks to their collaborative and version-control features. Like in mainstream Online Social Networks (OSNs) such as Facebook, users of SCPs are subjected to privacy attacks and threats given the high amounts of personal and project-related data available in their profiles and software repositories. However, unlike in OSNs, the privacy concerns and practices of SCP users have not been extensively explored nor documented in the current literature. In this work, we present the preliminary results of an online survey (N=105) addressing developers' concerns and perceptions about privacy threats steaming from SCPs. Our results suggest that, although users express concern about social and organisational privacy threats, they often feel safe sharing personal and project-related information on these platforms. Moreover, attacks targeting the inference of sensitive attributes are considered more likely than those seeking to re-identify source-code contributors. Based on these findings, we propose a set of recommendations for future investigations addressing privacy and identity management in SCPs.

Interactive COVID-19 Mobility Impact and Social Distancing Analysis Platform

Article

Sep 2021

The research team has utilized privacy-protected mobile device location data, integrated with COVID-19 case data and census population data, to produce a COVID-19 impact analysis platform that can inform users about the effects of COVID-19 spread and government orders on mobility and social distancing. The platform is being updated daily, to continuously inform decision-makers about the impacts of COVID-19 on their communities, using an interactive analytical tool. The research team has processed anonymized mobile device location data to identify trips and produced a set of variables, including social distancing index, percentage of people staying at home, visits to work and non-work locations, out-of-town trips, and trip distance. The results are aggregated to county and state levels to protect privacy, and scaled to the entire population of each county and state. The research team is making their data and findings, which are updated daily and go back to January 1, 2020, for benchmarking, available to the public to help public officials make informed decisions. This paper presents a summary of the platform and describes the methodology used to process data and produce the platform metrics.

Please Forget Where I Was Last Summer: The Privacy Risks of Public Location (Meta)Data

Conference Paper

Full-text available

Jan 2019

The exposure of location data constitutes a significant privacy risk to users as it can lead to de-anonymization, the inference of sensitive information, and even physical threats. In this paper we present LPAuditor, a tool that conducts a comprehensive evaluation of the privacy loss caused by public location metadata. First, we demonstrate how our system can pinpoint users’ key locations at an unprecedented granularity by identifying their actual postal addresses. Our evaluation on Twitter data highlights the effectiveness of our techniques which outperform prior approaches by 18.9%-91.6% for homes and 8.7%-21.8% for workplaces. Next we present a novel exploration of automated private information inference that uncovers “sensitive” locations that users have visited (pertaining to health, religion, and sex/nightlife). We find that location metadata can provide additional context to tweets and thus lead to the exposure of private information that might not match the users’ intentions. We further explore the mismatch between user actions and information exposure and find that older versions of the official Twitter apps follow a privacy-invasive policy of including precise GPS coordinates in the metadata of tweets that users have geotagged at a coarse-grained level (e.g., city). The implications of this exposure are further exacerbated by our finding that users are considerably privacy-cautious in regards to exposing precise location data. When users can explicitly select what location data is published, there is a 94.6% reduction in tweets with GPS coordinates. As part of current efforts to give users more control over their data, LPAuditor can be adopted by major services and offered as an auditing tool that informs users about sensitive information they (indirectly) expose through location metadata.

Urban Knowledge Graph Aided Mobile User Profiling

Article

Jul 2023

Nowadays, the explosive growth of personalized web applications and the rapid development of artificial intelligence technology have flourished the recent research on mobile user profiling, i.e., inferring the user profile from mobile behavioral data. Particularly, existing studies mainly follow the data-driven paradigm to develop feature engineering and representation learning on such data, which however suffer from the robustness issue, i.e., generalizing poorly across datasets and profiles without considering semantic knowledge therein. In comparison, the rising knowledge-driven paradigm built upon the knowledge graph (KG) offers a potential solution to mitigate such weakness. Therefore, in this paper, we propose a Knowledge Graph aided framework for Mobile User Profiling (KG-MUP). Specifically, to distil semantic knowledge among data, we firstly construct an urban knowledge graph (UrbanKG) with domain entities like users, regions, point of interests (POIs), etc. identified, as well as semantic relations for home, workplace, spatiality, etc. extracted. Moreover, we leverage tensor decomposition and graph neural network to obtain knowledgeable user representations from UrbanKG. In addition, we introduce several customized features to quantify individual mobility characteristics for mobile user profiling. Extensive experiments on three real-world mobility datasets demonstrate that KG-MUP achieves state-of-the-art performance on user profile inference tasks. Moreover, further results also reveal the importance of various semantic knowledge to user profile inference, which provides meaningful insights on user modeling with mobile behavioral data.

Note: Home Location Detection from Mobile Phone Data: Evidence from Togo

Conference Paper

Jun 2022

When Machine Learning Meets Privacy A Survey and Outlook

Article

Full-text available

Mar 2021

The newly emerged machine learning (e.g., deep learning) methods have become a strong driving force to revolutionize a wide range of industries, such as smart healthcare, financial technology, and surveillance systems. Meanwhile, privacy has emerged as a big concern in this machine learning-based artificial intelligence era. This article is a comprehensive study on privacy preservation problems and machine learning. The survey covers three categories of interactions between privacy and machine learning: (i) private machine learning, (ii) machine learning-aided privacy protection, and (iii) machine learning-based privacy attack and corresponding protection schemes. The current research progress in each category is reviewed and the key challenges are identified. Finally, based on our in-depth analysis of the area of privacy and machine learning, we point out future research directions in this field.

A POI Group Recommendation Method in Location-Based Social Networks Based on User Influence

Article

Full-text available

Jan 2021
EXPERT SYST APPL

Group recommendation has attracted researchers’ attention in various domains, specifically such approaches utilizing location-based social networks (LBSNs). However, point of interest (POI) group recommendation faces the challenge of aggregating diverse user preferences, while group members have different influences on the final decision of the group. Besides, the recommendation of spatial items is different from non-spatial items and the unique features of the spatial items such as distance must be considered in the recommendation. In this paper, a POI group recommendation method is proposed to tackle this problem. User influence is modeled fuzzy and taken into account the difference of users’ personality and their preferences when are alone or in a group, by using historical check-in data in LBSNs and in terms of category, distance and time. The proposed method is integrated with the weighted average aggregation to improve the efficiency of the POI group recommendation. Experimental results in a real dataset show improvement in the accuracy of POI group recommendations in varying sizes of groups. The results also get better when the user influence is calculated using the fuzzy approach. Besides, studying user behavior differences to choose the place to visit when alone or in a group shows that i) the flexibility of users in distance is less than time and category. It is also in the category less than time. ii) Time has a greater range of behavioral change than distance and category. iii) Users who actively participate in group decision making have a more significant number of visits in groups than when they are alone.

A novel human mobility model for MANETs based on real data

Conference Paper

Full-text available

Aug 2014

Performance evaluation of mobile networks needs accurate simulation set up including realistic characteristics. The most important issue in mobile networks' simulation is the mobility of the nodes. Since mobile nodes usually are carried by humans, thus, nodes mobility should be modelized as human movement. To the best of our knowledge, none of the existing mobility models have the ability to modelize all the human movement characteristics. In this paper, a new mobility model has been proposed based on the human mobility data collected for more than 6000 hours. The new model captures human mobility properties by introducing hotspot zones, using a graph of hotspot zones as the input area map, dividing day time to some periods and modeling various speeds in different times and spaces. Moreover, it models some other important human mobility features that had been modeled in previous works. To evaluate the performance of the proposed model, it is compared with real collected data.

Mining checkins from location-sharing services for client-independent IP geolocation

Conference Paper

Full-text available

Apr 2014

Accurately determining the geographic location of an Internet host is important for location-aware applications such as location-based advertising and network diagnostics. Despite their fast response time, widely used database-driven geolocation approaches provide only inaccurate locations. Delay measurement based approaches improve the estimation accuracy but still suffer from a limited precision (about 10 km) and a long response time (tens of seconds) to localize a single PC, which cannot meet the demand of precise and real-time geolocation for location-aware applications. In this paper, we propose a new geolocation approach, Checkin-Geo, which exploits geolocation resources fundamentally different from existing database-driven (using DNS, Whois, etc.) or network delay measurement based approaches. In particular, we leverage the location data that users are willing to share in location-sharing services and logs of user logins from PCs for real-time and accurate geolocation. Experimental results show that compared to existing geolocation techniques, Checkin-Geo achieves 1) a median estimation error of 799 meters (an order of magnitude smaller than existing approaches), and 2) a negligible response time, which are promising for accurate location-aware applications.

LBSNSim: Analyzing and modeling location-based social networks

Conference Paper

Full-text available

Apr 2014

The soaring adoption of location-based social networks (LBSNs) makes it possible to analyze human socio-spatial behaviors based on large-scale realistic data, which is important to both the research community and the design of new location-based social applications. However, performing direct measurements on LBSNs is impractical, because of the security mechanisms of existing LBSNs, and high time and resource costs. The problem is exacerbated by the scarcity of available LBSN datasets, which is mainly due to the privacy concerns and the hardness of distributing large-volume data. As a result, only a very few number of LBSN datasets are publicly released. In this paper, we extract and study the universal statistical features of three LBSN datasets, and propose LBSNSim, a trace-driven model for generating synthetic LBSN datasets capturing the properties of the original datasets. Our evaluation shows that LBSNSim provides an accurate representation of target LBSNs.

Privacy Sensitive Resource Access Monitoring for Android Systems

Conference Paper

Aug 2015

Personalized location privacy in mobile networks: A social group utility approach

Conference Paper

Apr 2015

Exploring millions of footprints in location sharing services

Article

Jan 2011

Z. Cheng

A fine-grained spatial cloaking scheme for privacy-aware users in Location-Based Services

Conference Paper

Aug 2014

In Location-Based Services (LBSs) mobile users submit location-related queries to the untrusted LBS server to get service. However, such queries increasingly induce privacy concerns from mobile users. To address this problem, we propose FGcloak, a novel fine-grained spatial cloaking scheme for privacy-aware mobile users in LBSs. Based on a novel use of modified Hilbert Curve in a particular area, our scheme effectively guarantees k-anonymity and at the same time provides larger cloaking region. It also uses a parameter σ for users to make fine-grained control on the system overhead based on the resource constraints of mobile devices. Security analysis and empirical evaluation results verify the effectiveness and efficiency of our scheme.

You Are Where You Go: Inferring Demographic Attributes from Location Check-ins

Article

Feb 2015

User profiling is crucial to many online services. Several recent studies suggest that demographic attributes are predictable from different online behavioral data, such as users' "Likes" on Facebook, friendship relations, and the linguistic characteristics of tweets. But location check-ins, as a bridge of users' offline and online lives, have by and large been overlooked in inferring user profiles. In this paper, we investigate the predictive power of location check-ins for inferring users' demographics and propose a simple yet general location to profile (L2P) framework. More specifically, we extract rich semantics of users' check-ins in terms of spatiality, temporality, and location knowledge, where the location knowledge is enriched with semantics mined from heterogeneous domains including both online customer review sites and social networks. Additionally, tensor factorization is employed to draw out low dimensional representations of users' intrinsic check-in preferences considering the above factors. Meanwhile, the extracted features are used to train predictive models for inferring various demographic attributes. We collect a large dataset consisting of profiles of 159,530 verified users from an online social network. Extensive experimental results based upon this dataset validate that: 1) Location check-ins are diagnostic representations of a variety of demographic attributes, such as gender, age, education background, and marital status; 2) The proposed framework substantially outperforms compared models for profile inference in terms of various evaluation metrics, such as precision, recall, F-measure, and AUC.

Effective location identification from microblogs

Conference Paper

Mar 2014

The rapid development of social networks has resulted in a proliferation of user-generated content (UGC). The UGC data, when properly analyzed, can be beneficial to many applications. For example, identifying a user's locations from microblogs is very important for effective location-based advertisement and recommendation. In this paper, we study the problem of identifying a user's locations from microblogs. This problem is rather challenging because the location information in a microblog is incomplete and we cannot get an accurate location from a local microblog. To address this challenge, we propose a global location identification method, called Glitter. Glitter combines multiple microblogs of a user and utilizes them to identify the user's locations. Glitter not only improves the quality of identifying a user's location but also supplements the location of a microblog so as to obtain an accurate location of a microblog. To facilitate location identification, Glitter organizes points of interest (POIs) into a tree structure where leaf nodes are POIs and non-leaf nodes are segments of POIs, e.g., countries, states, cities, districts, and streets. Using the tree structure, Glitter first extracts candidate locations from each microblog of a user, which correspond to some tree nodes. Then Glitter aggregates these candidate locations and identifies top-k locations of the user. Using the identified top-k user locations, Glitter refines the candidate locations and computes top-k locations of each microblog. To achieve high recall, we enable fuzzy matching between locations and microblogs. We propose an incremental algorithm to support dynamic updates of microblogs. Experimental results on real-world datasets show that our method achieves high quality and good performance, and scales very well.
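
The aggregation step described above can be sketched in simplified form. The abstract does not give Glitter's scoring function, so the sketch below substitutes plain vote counting: each microblog votes once for every tree node, at a chosen depth, that one of its candidate locations falls under. The function name `top_k_locations`, the path-tuple representation of tree nodes, and the `depth` parameter are assumptions made for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# A tree node is written as the path from the root, e.g. ("US", "CA").
Path = Tuple[str, ...]


def top_k_locations(candidates_per_post: Iterable[Iterable[Path]],
                    k: int, depth: int) -> List[Tuple[Path, int]]:
    """Count, for each node at the given depth, how many posts have a
    candidate location under it, and return the k most supported nodes."""
    votes: Counter = Counter()
    for candidates in candidates_per_post:
        # A post votes at most once per node, however many of its
        # candidate locations fall under that node.
        nodes = {path[:depth] for path in candidates if len(path) >= depth}
        votes.update(nodes)
    # Highest vote count first; ties are broken by path for determinism.
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))[:k]
```

Truncating each path to `depth` is what lets evidence from different POIs in the same city or state reinforce each other, which is the benefit of organizing locations as a tree rather than a flat list.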

Near-pri: Private, proximity based location sharing

Conference Paper

Apr 2014

As the ubiquity of smartphones increases, we see an increase in the popularity of location-based services. Specifically, online social networks provide services such as alerting the user of friend co-location and finding a user's k nearest neighbors. Location information is sensitive, which makes privacy a strong concern for location-based systems like these. We have built one such service that allows two parties to share location information privately and securely. Our system allows every user to maintain and enforce their own policy. When one party (Alice) queries the location of another party (Bob), our system uses homomorphic encryption to test whether Alice is within Bob's policy. If she is, Bob's location is shared with Alice only. If she is not, no user location information is shared with anyone. Due to the importance and sensitivity of location information, and the easily deployable design of our system, we offer a useful, practical, and important system to users. Our main contributions are a flexible, practical protocol for private proximity testing, a useful and efficient technique for representing location values, and a working implementation of the system we design in this paper. It is implemented as an Android application with the Facebook online social network used for communication between users.
