Familiar strangers detection in online social networks

Authors:
  • Paris School of business, France
  • Université de Technologie de Troyes UTT

Familiar Strangers detection in online social
networks
Charles PEREZ, Babiga BIRREGAH, Marc LEMERCIER
ICD (Charles Delaunay Institute) - UMR CNRS STMR 6279
University of Technology of Troyes,
12 rue Marie Curie,
10 010 Troyes Cedex
{charles.perez, babiga.birregah, marc.lemercier}@utt.fr
Abstract—Online social networks and microblogging platforms
have attracted a huge number of users over the last decade. On
such platforms, traces of activities are automatically recorded
and stored on remote servers. Open data deriving from these
traces of interactions represent a major opportunity for social
network analysis and mining. This leads to important challenges
when trying to understand and analyse these large-scale networks
better. Recently, many sociological concepts such as friendship,
community, trust and reputation have been transposed and
integrated into online social networks. The recent success of
mobile social networks and the increasing number of nomadic
users of online social networks can contribute to extending the
scope of these concepts. In this paper, we transpose the notion of
the Familiar Stranger, which is a sociological concept introduced
by Stanley Milgram. We propose a framework particularly
adapted to online platforms that allows this concept to be defined.
Various application fields may be considered: entertainment,
services, homeland security, etc. To perform the detection task,
we address the concept of familiarity based on spatio-temporal
and attribute similarities. The paper ends with a case study of
the well-known microblogging platform Twitter.
Index Terms—Familiar Stranger, Social Network Analysis,
Nomadism, Online Social Networks, Smartphones, Geo-location,
Twitter
I. INTRODUCTION
On social networking sites, a user can create a virtual
identity and interact online with other users. By definition,
social networking sites can allow the user to: (1) construct a
public or semi-public profile within the system, (2) manage
a list of other users with whom they share a connection and
(3) view and traverse their list of connections [1]. Although
this definition only contains basic features, social networking
sites have been enriched by many other services such as text,
picture and video publishing or geolocation services. With
the increase in the number of participants, these networks
become more and more complex and can easily integrate
a wide range of sociological concepts such as friendship,
neighbourhood, community, prestige, etc. Figure 1 highlights
some concepts that apply to both the virtual and the physical
worlds. Depending on the context, each concept has relatively
similar meanings.
Fig. 1. Common concepts of online and offline social networks
Geosocial data represent a good example of the connections
between the virtual and physical worlds [2]. Geosocial data can
be defined as geolocated or geotagged data that are generated
from a social platform. These data represent traces of interactions
that help to reconstitute networks in both virtual and
physical worlds. A message sent online by a user with a smart
device (e.g. smartphone, smart tablet) represents a virtual
interaction but also contains geolocation data. Geolocation
can allow the detection of physical proximity between users
which can then contribute to the construction of physical social
networks. The bridge represented by this type of data can help
to enlarge the possibilities of applications and should permit a
better understanding of the relationship between users’ online
and offline lives [3], [4], [5]. Those networks that combine
measures of the physical world with human input are often
referred to as cyber-physical social networks [6]. In this work,
we exploit these cyber-physical social networks by introducing
a framework that aims to detect Familiar Strangers (FS).
The concept of the Familiar Stranger was first introduced
by S. Milgram in 1972 [7]. A Familiar Stranger is a person
whom we observe regularly but with whom we never interact
directly. Examples of Familiar Strangers are the people who take
the same bus as us every day: we encounter them repeatedly but
never interact with them directly (e.g. by talking to them). They are not friends,
but they are more likely to become our friends than simple
strangers. It is important to emphasise that this concept is
sociological and involves several dimensions when adapted
to the online sphere (behavioural, spatial and temporal, etc.).
The growth of digital social networks offers a good opportu-
nity for the investigation of the different dimensions of this
phenomenon with several theoretical and applied challenges.
Various application fields may be considered: entertainment,
services, homeland security, etc. [8], [9].
The remainder of this paper is organised as follows. Section
II provides some definitions of the Familiar Stranger and
discusses their limitations with respect to the original concept
introduced by S. Milgram. Section III presents an overview of
the multi-dimensional model and its usefulness to address the
FS detection. After some preliminary definitions, in Section IV
we introduce a new definition of FS in the context of Online
Social Networks (OSN). Section V presents an algorithm for
detecting FS. An application to Twitter is presented in section
VI and the last section concludes this paper.
II. THE FAMILIAR STRANGER: CONCEPT AND RELATED
WORKS
The Familiar Stranger concept, as described in the reference
literature, has been adapted to many situations. In this section,
we present the most relevant contributions for detecting the
Familiar Stranger.
In his original experiment, S. Milgram proposes a simple
way to highlight the existence of Familiar Strangers in a real-life
social network [10]. He asks his students at the City
University of New York to go to a train station at a
particular time in the morning and take pictures of the people
waiting there. A week later, he asks his students to show
their photographs to the people pictured and to ask them
whom they recognise and with whom they have ever interacted. This
experiment shows that most people are able to clearly identify
many individuals with whom they never interact but whose
faces are familiar. These individuals are neither friends nor
strangers, but Familiar Strangers.
A first approach to automatically detecting such individuals
was proposed by [11]. This approach revisits the S. Milgram
experiment with the use of Bluetooth devices called Jabber-
wockies. These devices are worn by individuals or placed
in static locations such as bus stops or in train stations.
They allow the detection of Familiar Strangers based on both
the neighbourhood of an individual (within 20 meters) and
proximity to a static set of chosen locations. These locations
are chosen based on the places where Familiar Strangers are
more likely to meet (e.g. bus stop, train station).
A drawback of this experiment is the need to place specific
devices on both individuals and locations that are observed.
This implies that the experiment is performed on a specific
set of individuals and in a predefined spatio-temporal context.
[12] have presented a mobility model that takes into account
the duration and frequency of contacts between people to
compute a familiarity metric. This work states that Familiar
Strangers are all pairs of individuals that meet regularly but
do not spend time with each other. The proposed framework is
based on real human mobility datasets and thus fully takes into
account the spatio-temporal aspects. However, this approach
reduces the problem of FS detection purely to spatio-temporal
considerations.
TABLE I
COMPARISON OF FAMILIAR STRANGER DETECTION APPROACHES
              SN    ST    Att   Dyn   Dev
[10]          Phy   Yes   No
[11]          Phy   Yes   No    No    Yes
[12]          Phy   Yes   No    No    Yes
[13]          Dig   No    Yes   No    No
Our proposal  Dig   Yes   Yes   Yes   No
A social network approach based on social identity has also
been proposed in order to formalise the Familiar Stranger
concept [13]. This approach models a social network as a
graph $G(N, E, A)$, where $N$ is a set of nodes (individuals)
linked by a set $E$ of connections (relationships). Each node
possesses a subset of attributes $A_u$ from a collection of
attributes $A$. The approach is based on the analysis of these
attributes by taking into account proximity as a key factor.
These attributes can be generated from the content of social
identity and interactions such as phone conversations, sent
mails, etc. In this context, the notion of Familiar Stranger
is defined based on two requirements. The first requirement
(stranger) aims to eliminate all connections of the targeted
individual with his set of Familiar Strangers. The second
requirement (familiar) ensures that a familiar node possesses
a set of attributes that are required and contained in a goal.
This goal depends on the individual for whom we are looking
for Familiar Strangers. Depending on the purpose, the work of
[13] can be time-consuming, especially if the aim is to detect
all of the Familiar Strangers of any node without limitations.
Although this approach remains focused on attributes that
may contain geographical locations and activities over time,
it does not take into account time and space constraints
such as the geographical notions of neighbourhood, proximity,
distance, and their consistency over time.
The different approaches are classified in Table I according to
five important detection parameters: the social network
used for the detection (SN: physical or digital); whether the
spatio-temporal (ST) aspect is considered in the detection;
the attribute parameter (Att), which distinguishes content-based
works from the others; the dynamic
aspect (Dyn), which reveals the ability of the proposed approach
to integrate new incoming individuals during analysis; and finally
whether the detection approach relies on the use
of a particular device (Dev).
Table II indicates, for each contribution, the meaning given
to the Familiar and Stranger aspects of the Familiar Stranger. It
is clear that many distinctions exist between the interpretation
of the concept depending on the context of the research.
With the recent success of nomadism, geolocation based
services and social networks, our study aims to build a detec-
tion algorithm that only depends on data generated by users
through mobile devices (smartphones, smart tablets). This
approach is based on a specific combination of technological
devices (smartphones) and usage practice (users that enable
geolocation of their statuses).
TABLE II
COMPARISON OF FAMILIAR STRANGER CONCEPT ADAPTATIONS
       Familiar                  Stranger
[10]   Observed repeatedly       No direct interactions
[11]   We repeatedly observe     Do not directly interact with
[12]   High number of contacts and low contacts duration (single entry covering both aspects)
[13]   Exhibit similarity        Not directly connected
The above mentioned works regarding Familiar Stranger
detection take into account the spatio-temporal or attribute
similarity but do not combine these factors in the detection
process. They also require software or particular hardware to
perform this detection. The approach proposed in this paper
takes into account spatio-temporal parameters but also the
social proximity (i.e. similarity induced by node attributes)
of individuals. To the best of our knowledge, this approach is
the first attempt to detect the Familiar Stranger based only on
the data generated by mobile social network users.
III. THE MULTI-DIMENSIONAL MODEL
The FS definition applied by [13] to social networks is
mainly based on the conditions of stranger and familiarity.
We propose to keep these conditions and modify them to
perform FS detection. The notion of familiarity as defined by
Stanley Milgram is multi-dimensional: societal, behavioural,
spatial, temporal, etc. In this context, the main dimensions to
retain in the definition of Familiar Stranger can be found in
the two following statements:
S1. Our FS do not have direct interaction with us
S2. FS are people who seem familiar
S1 requires two familiar people not to have direct interac-
tion; this means that they should not be friends. S2 requires
that they frequent the same neighbourhood regularly and share
some common characteristics.
Some studies, such as [14] and [15], have proposed a
multi-dimensional framework for friend recommendation systems.
The main dimensions of these approaches are presented
in Figure 2, and the dimensions underlying Familiar Stranger
behaviour can be represented and analysed using this model.
The model takes into account the three dimensions that are
involved in mobile social networks and that can contribute
to Familiar Stranger detection online. Layer I represents the
spatio-temporal patterns (i.e. set of positions over time $T = t_0, t_1, t_2$)
that are generated by users who have enabled the
mobile geolocation service proposed on most of the platforms.
The second layer represents the online social graph that reveals
the connections between profiles on a given platform (e.g.
friends). The third layer represents the data that are generated
by users and more specifically the connections among content
that can be extracted from these data. In the case of Twitter
users, the connections between individuals can be deduced
based on the fact that they use the same hashtags (#) or
reference the same profiles (@) in their messages.
Fig. 2. Multi-layer model
The concept of Familiar Stranger as addressed in this work
hinges on the concepts of strangers (as opposed to friends)
contained in layer II. The concept of familiarity is associated
with the similarity of attributes in layer III and with the
spatio-temporal similarity of layer I.
IV. FAMILIAR STRANGER DEFINITION IN ONLINE SOCIAL
NETWORKS
In this section, we present and define a set of concepts that
are required to identify an FS. First, we introduce the concept of
friends and strangers; second, we consider attribute similarity
and third, we address the spatio-temporal dimension.
A. Friends and strangers in online social networks
The notion of friendship, based on the representation of
social ties, exists in both virtual and physical worlds. In
this work, we only consider online friendship based on the
existence of virtual connections. However, many works have
highlighted the correlations between the online and offline
social network of a user [16], [17], [14], [18].
Since most of the online social platforms require the
creation of a link between two people before they can
interact, we propose the identification of strangers based
on whether an edge exists between them. We can
identify two types of platform: those that permit
the creation of directed edges (e.g. Twitter, LiveJournal) and
those that do not (e.g. Facebook, LinkedIn). In most
cases the first category does not need mutual agreement for
creating edges, while the second requires the consent of both
nodes involved. We introduce some preliminary definitions
below.
Definition 1: Friends in OSN
Two nodes $(u, v) \in N^2$ are friends if and only if:
$\{(u, v), (v, u)\} \subset E$
where $(u, v)$ is an arc from node $u$ to node $v$, $E$ is the set
of edges and $N$ the set of nodes.
On Twitter, if a node denoted $u$ follows a node $v$ and $v$
follows $u$, then $u$ and $v$ are considered mutual friends. A
friendship link such as on Facebook is considered as an arc
from $u$ to $v$ ($u$ invites $v$ to be friends) and an arc from $v$ to $u$
($v$ accepts the request) or vice versa. Although some studies
consider that two nodes linked by a unilateral arc are friends
(e.g. [19]), this is not the case in this work. Basically, we will
consider that being strangers is the contrary of being friends.
Definition 2: Strangers in OSN
Two nodes $(u, v) \in N^2$ are strangers if and only if:
$\{(u, v), (v, u)\} \not\subset E$
The definition of strangers on platforms with undirected
links is straightforward since two unconnected nodes will be
considered strangers. On directed platforms such as Twitter,
two nodes are strangers if they are not connected, or if only
one arc exists between them ($u$ follows $v$ or $v$ follows $u$ but
not mutually).
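The two definitions above translate directly into set-membership tests on a directed edge set. The following is a minimal sketch, assuming the follow graph is held in memory as a Python set of (source, target) pairs; the function names and the toy data are illustrative and do not come from the paper.

```python
def are_friends(edges, u, v):
    """Definition 1: u and v are friends iff both arcs (u, v) and (v, u) exist."""
    return (u, v) in edges and (v, u) in edges


def are_strangers(edges, u, v):
    """Definition 2: strangers are exactly the pairs that are not friends."""
    return not are_friends(edges, u, v)


# Hypothetical follow graph: an arc (a, b) means "a follows b".
follows = {("alice", "bob"), ("bob", "alice"), ("carol", "alice")}
```

With this toy graph, alice and bob are friends, while carol and alice remain strangers because only one arc links them.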
B. Content-based similarity
Many techniques for attributes generation and similarity
measures can be found in the literature [20], [21], [22], [23],
[24]. The similarity of interests is computed as a content-
based attribute similarity between two individuals. Since we
require no additional features in our detection approach, this
indicator is necessarily based on the information that can be
publicly retrieved online. Proportional frequencies are one of
the most common and generic ways to represent such patterns
[25]. To obtain such a representation for a social networking
site user, it is necessary to define a set of possible attributes
whose values are discrete and finite (e.g. $A$ = {sport, science,
literature}). These attributes can be chosen depending on
the expected outcomes or can be adapted to the information
retrieved on the platform. Each node $u \in N$ is represented
by a set $A_u$ of $n$ attributes whose values belong to $A$ and
that may include duplicates when occurrences are multiple
(e.g. $A_u$ = {sport, sport, science}). This set of elements is
basically built up from the occurrences of terms observed in
a subset of collected public messages but many alternatives
exist [22], [20], [21]. $A_u$ allows us to build a histogram
where each element of $A$ is associated with its number of
occurrences (e.g. $H_{sport} = 2$, $H_{science} = 1$, $H_{literature} = 0$). Dividing
each of these occurrences by the number $n$ of attributes
permits us to create proportional frequencies. In this work,
we only consider information extracted from public messages
(i.e. activity traces), since all other types of information (e.g.
self descriptions) can be incomplete, false or absent from the
platform. We also assume that significant information is more
likely to be contained in traces of activities (i.e. chats, talks)
than in static content that is often obsolete. Many different
similarity measures can be used to compare a pair of users
$(u, v)$ from their proportional frequencies $(P, Q)$. As stated
in Definition 3, we propose to evaluate the interest similarity
based on Jaccard's coefficient.
Definition 3: Interest similarity
The interest similarity between two nodes $(u, v) \in N^2$ is
defined as:
$$S_I(u, v) = \frac{\sum_{i=1}^{d} P_i Q_i}{\sum_{i=1}^{d} P_i^2 + \sum_{i=1}^{d} Q_i^2 - \sum_{i=1}^{d} P_i Q_i}$$
where $(P, Q)$ are the proportional frequencies of nodes
$(u, v)$ and $d$ is the size of the set of possible attributes.
In Section V we present the integration of this indicator into
the Familiar Stranger algorithm.
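As an illustration of this subsection, the sketch below builds proportional frequencies from a multiset of observed attributes and applies the coefficient of Definition 3, which is the form of Jaccard's coefficient usually applied to real-valued vectors. The function names are hypothetical, and returning 0 when both vectors are empty is an assumption that the paper does not address.

```python
from collections import Counter


def proportional_frequencies(observed, attributes):
    """Map a multiset of observed attribute values to frequencies over `attributes`."""
    counts = Counter(a for a in observed if a in attributes)
    n = sum(counts.values())
    if n == 0:
        return [0.0] * len(attributes)
    return [counts[a] / n for a in attributes]


def interest_similarity(p, q):
    """Definition 3: sum(P*Q) / (sum(P^2) + sum(Q^2) - sum(P*Q))."""
    dot = sum(pi * qi for pi, qi in zip(p, q))
    denom = sum(pi * pi for pi in p) + sum(qi * qi for qi in q) - dot
    # Assumption: two empty profiles are treated as having no similarity.
    return dot / denom if denom else 0.0


A = ["sport", "science", "literature"]
P = proportional_frequencies(["sport", "sport", "science"], A)  # [2/3, 1/3, 0]
Q = proportional_frequencies(["science", "literature"], A)      # [0, 1/2, 1/2]
```

For these two profiles the numerator is 1/6 and the denominator is 5/9 + 1/2 - 1/6 = 8/9, which gives a similarity of 3/16.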
C. Spatio-temporal metric
With the success of online platforms, users’ spatio-temporal
footprints are increasing and become accessible for analysis.
Since the beginning of online social networks, the temporal
aspect has been naturally identified by a timestamp. This
timestamp is determined by the computer or the device’s
internal clock. The spatial aspect started to emerge in recent
years with the use of geolocation by GPS and Wi-Fi and has
been integrated into online social networks [26].
A message can automatically be associated with a time
and a location. This geolocation usually requires the use of
a smartphone or a smart tablet that offers GPS, and the
agreement of the user. When agreement is given, the longitude
and latitude of the user are automatically sent within the
metadata of his or her messages. This feature is now available
on Twitter and Facebook and is the central feature of Mobile
Social Software (a.k.a. MoSoSo [27]) such as Foursquare.
The precision of this geolocation is between 50 and 300 feet.
Many approaches can be used to deduce a spatio-temporal
relation between social network actors. In the literature,
many proposed analyses of spatio-temporal relations
between actors rely on the similarity of their spatio-temporal
patterns [9], [6], [8]. The top similarity scores are identified
as the best candidates for establishing the relation.
The calculation usually refers to a similarity score that is
computed between two patterns. Then, a similarity graph is
computed where nodes are patterns and weighted links are
scores between each pair of users. The final step is to apply an
algorithm for estimating the relative importance of the feature
in the network [28].
In this work, we investigate a specific heuristic that captures
what Stanley Milgram describes as individuals who seem familiar.
The underlying assumption requires two individuals to
meet each other regularly. This relation is clearly related to
the Meeting heuristic as presented in [8] and [29] but with an
additional regularity constraint.
We propose the definition of a context-matching function
between two persons. This function evaluates when two
people are in the same spatio-temporal frame during a
particular time of the experiment $t \in [0, T]$. The geographical
neighbourhood is defined by a radius $R$ of the circle centred
on one of the two individuals analysed.
Definition 4: Geographical neighbourhood
The basic geographical neighbourhood of a node $u \in N$ is
defined as follows:
$$\forall t \in [0, T], \quad Geo^+_t(u) = \{ v \in N \mid \min_{[t, t+\delta t]} d(u, v) \le R \}$$
where $d(u, v)$ is the geographical distance and $R$, $\delta t$ are the
spatio-temporal constraints.
Given $(u, v) \in N^2$, we define the following Boolean function
that identifies when two nodes meet each other:
$$Geo^+_t(u, v) = \begin{cases} 1 & \text{if } v \in Geo^+_t(u) \\ 0 & \text{otherwise} \end{cases}$$
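The Boolean meeting function can be sketched as follows. This is only one possible reading of Definition 4: positions are assumed to be discrete fixes stored as (timestamp, latitude, longitude) tuples, the distance $d$ is assumed to be the great-circle (haversine) distance, and two users are taken to meet when each has at least one fix inside the window $[t, t+\delta t]$ and some pair of those fixes lies within $R$. None of these choices is specified in the paper.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


def meets(fixes_u, fixes_v, t, delta_t, radius_m):
    """Boolean Geo+_t(u, v): True if u and v come within radius_m during [t, t + delta_t].

    fixes_* are lists of (timestamp, lat, lon) tuples (hypothetical format).
    """
    window_u = [(la, lo) for ts, la, lo in fixes_u if t <= ts <= t + delta_t]
    window_v = [(la, lo) for ts, la, lo in fixes_v if t <= ts <= t + delta_t]
    return any(
        haversine_m(la1, lo1, la2, lo2) <= radius_m
        for la1, lo1 in window_u
        for la2, lo2 in window_v
    )
```

Collecting every $t$ for which `meets` returns True yields the meeting list used in the definitions that follow.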
Definition 5: Time meeting list
We define the spatio-temporal list of Time meeting ($LT$)
between two nodes $(u, v) \in N^2$ as:
$$LT(u, v) = \{ t \in [0, T] \mid Geo^+_t(u, v) = 1 \}$$
From the meeting list, we can compute the average
frequency of the meetings between the two individuals and
this can be used to reveal a similarity score. However, the
requirements expressed by Milgram reveal the importance of
the regularity of the meetings. Typically, meeting many times
in the same day is not significant if no meetings are recorded
after this day. This case can illustrate two individuals who
may share the same entertainment over a short period of
time and with strong activity online. In this regard, meeting
regularly over a long period of time is more significant in our
experiment. This case can be illustrated by two individuals
waiting at the same bus stop every day. Thus the consistency
of the relationship over time is a critical factor. The frequency
of meetings is an indicator that can be used to identify
whether a relationship is significant or not but it does not
necessarily reveal its regularity. A high frequency can hide a
very high quantity of meetings in a very small time frame
and no meetings in any other time frames. On the contrary,
a low frequency can hide a regularity of meetings if they are
scattered over a larger time frame. We propose the definition
of the observed periods as follows:
Definition 6: Observed periods
We denote by $LT_i(u, v)$ the $i$th element (i.e. meeting) of the set
$LT(u, v)$ and we define the $i$th observed period $P_i$ between
two meetings as follows: $\forall (u, v) \in N^2, \forall i \in [0, M]$,
$$P_i(u, v) = \begin{cases} |LT_{i+1}(u, v) - LT_i(u, v)| & \text{if } i < M \\ |T - LT_i(u, v)| & \text{if } i = M \end{cases}$$
where $M$ is the quantity of meetings between $u$ and $v$ during
the experiment.
Fig. 3. Representation of periods and meeting lists
Figure 3 illustrates the spatio-temporal list of meeting times
(LTi) and the periods between these meetings (Pi).
We propose the definition of a reference value that
represents the period between meetings that is ideal to
establish that two people are regularly meeting. We denote
this value $P_{ideal}$. A reasonable reference value for Familiar
Stranger detection could lie between one day and one
week, depending on the situation. We then propose a bias
indicator that only measures periods of time that exceed
$P_{ideal}$. The measure of compliance of the observed meetings
with the expected value is detailed below.
Definition 7: Compliance with ideal
$$\forall (u, v) \in N^2, \quad C(u, v) = \frac{1}{T} \sum_{P_i(u, v) > P_{ideal}} \left( P_i(u, v) - P_{ideal} \right)$$
Meeting so infrequently that the average time between
meetings exceeds the ideal value significantly affects the
assumption of familiarity.
Finally, we define the spatio-temporal similarity between
two people as:
Definition 8: Spatio-temporal similarity
$$\forall (u, v) \in N^2, \quad S_{ST}(u, v) = 1 - C(u, v)$$
The spatio-temporal similarity between two users will be
equal to one if they meet frequently enough that the time between
meetings is under the specified ideal ($P_{ideal}$). The
spatio-temporal similarity will be null if no meeting is recorded
during the time of the experiment.
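The chain from meeting list to spatio-temporal similarity can be sketched in a few lines. Two assumptions are made here that the definitions leave open: the element of index 0 is taken to be the start of the experiment (time 0), so that $M$ meetings produce $M + 1$ periods, and the no-meeting case is forced to 0, following the statement in the text. Times are expressed in days and the function names are illustrative.

```python
def observed_periods(meeting_times, T):
    """Definition 6, assuming LT_0 is the start of the experiment (time 0)."""
    times = [0.0] + sorted(meeting_times)
    gaps = [b - a for a, b in zip(times, times[1:])]
    gaps.append(T - times[-1])  # last period runs until the end of the experiment
    return gaps


def compliance(meeting_times, T, p_ideal):
    """Definition 7: total excess of the periods over p_ideal, normalised by T."""
    periods = observed_periods(meeting_times, T)
    return sum(p - p_ideal for p in periods if p > p_ideal) / T


def spatio_temporal_similarity(meeting_times, T, p_ideal):
    """Definition 8, with the no-meeting case forced to 0 as stated in the text."""
    if not meeting_times:
        return 0.0
    return 1.0 - compliance(meeting_times, T, p_ideal)


daily = spatio_temporal_similarity(list(range(1, 11)), T=10, p_ideal=1)
burst = spatio_temporal_similarity([1, 2, 3], T=30, p_ideal=7)
```

Ten daily meetings over ten days give a similarity of 1, whereas three meetings packed into the first three days of a thirty-day experiment leave a final period of 27 days, an excess of 20 days over a one-week ideal, and a similarity of 1/3.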
D. An improved definition of Familiar Stranger
We propose the linear combination of the two similarities
defined in the previous subsections. We define the familiarity
as a linear weighted sum of Interest Similarity ($S_I$) and
Spatio-Temporal Similarity ($S_{ST}$):
Definition 9: Familiarity
$$\forall (u, v) \in N^2, \quad F(u, v) = \alpha S_{ST}(u, v) + \beta S_I(u, v), \quad \text{with } \alpha + \beta = 1$$
The weights $\alpha$ and $\beta$ depend on the situation
analysed and on the expected results. We propose setting
$\alpha = \beta = 0.5$ in order to correspond well with S. Milgram's
sociological conception of familiarity. However, it can be
noted that setting $\alpha = 0$ reduces familiarity to interest
similarity, and thus to a problem with no spatio-temporal
considerations. This approach is then related to [13]. Setting
$\beta = 0$ reduces the problem to spatio-temporal considerations
and such approaches do not need more than data generated by
sensors. The interests of users are not taken into account and
the approach is closer to [11] and [12].
Finally, we propose a new definition of Familiar
Stranger based on the familiarity defined above.
Definition 10: Familiar Stranger in OSN
The set $FS_u$ of FS of a node $u$ should respect two conditions:
Stranger condition:
$\forall v \in FS_u$, $u$ and $v$ are strangers based on Definition 2
Familiar condition:
$\forall v \in FS_u$, $F(u, v) = \alpha S_{ST}(u, v) + \beta S_I(u, v) > K$
with $\alpha = \beta$,
where $K$ is a familiarity threshold
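A minimal sketch of Definitions 9 and 10 follows. It assumes the two similarity scores and the stranger test have already been computed elsewhere; the function names are hypothetical, and the threshold value used in the example is arbitrary since the text does not fix $K$.

```python
def familiarity(s_st, s_i, alpha=0.5, beta=0.5):
    """Definition 9: weighted sum of the two similarities, with alpha + beta = 1."""
    if abs(alpha + beta - 1.0) > 1e-9:
        raise ValueError("alpha and beta must sum to 1")
    return alpha * s_st + beta * s_i


def is_familiar_stranger(s_st, s_i, strangers, k):
    """Definition 10: stranger condition plus familiarity above the threshold k."""
    return bool(strangers) and familiarity(s_st, s_i) > k


# Hypothetical pair: perfectly regular meetings, moderately similar interests.
score = familiarity(1.0, 0.5)  # 0.5 * 1.0 + 0.5 * 0.5 = 0.75
```

With an illustrative threshold of 0.6, this pair is reported as Familiar Strangers only if the two users are not mutual friends.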
V. FAMILIAR STRANGER DETECTION
In this section we propose an algorithm to detect Familiar
Strangers of a given individual. It is important to note that
geographical constraints permit a significant reduction in the
complexity of the problem. The nature of the model means
that any individual who does not appear in the neighbourhood
of the specified person during the time of the experiment is not
analysed by our algorithm. For this reason, there is no need
to crawl a full online social network to detect the Familiar
Strangers of an individual. However, if a node meets this
constraint at least once it will be investigated by our algorithm.
The algorithm basically requires us to locate the target, track
his or her movements, and analyse his or her interests and
those of his or her neighbours. The accuracy of the detection
will then mainly depend on the duration of the experiment and
on the quality of the data and parameters of the experiment.
The inputs of the algorithm (Figure 4) are: the target user $u$,
the coefficients $\alpha$, $\beta$ corresponding to the spatio-temporal and
interest similarities, and the spatio-temporal constraints $R$ and
$\delta t$. The output of the algorithm is a vector containing the list
of top Familiar Stranger candidates for the specified target
user.
During the first steps of the algorithm, the Familiar Stranger
vector is initialised and attributes are generated for the target
node (steps 1-3). These attributes, as described above, can be
generated by different processes but on the basis of publicly
available data. The algorithm enters a loop that corresponds to
the full time span of the experiment. At each specified time, we
collect the position of the target user and store the individuals
that appear in the same spatio-temporal frame (steps 5-6). All
neighbours are potential candidates to be FS and are added
to the list of recorded users. When the experiment ends, we
calculate the familiarity score, and finally validate the stranger
condition (steps 9-12). FS candidates are ranked in the list
$FS_u$ that is returned by the algorithm.
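The egocentric procedure described in this section can be sketched as a short Python function. This is an illustration under stated assumptions rather than the authors' implementation: data collection, scoring and the stranger test are injected as three hypothetical callables, and neighbours are accumulated over all time steps, as the text describes.

```python
def detect_familiar_strangers(target, time_steps, neighbours_at, score, are_strangers):
    """Egocentric detection loop.

    neighbours_at(target, t) -> iterable of users in the target's spatio-temporal frame
    score(target, v)         -> familiarity F(target, v)
    are_strangers(target, v) -> bool
    All three callables are hypothetical hooks supplied by the caller.
    """
    neighbourhood = set()
    for t in time_steps:  # accumulate everyone met during the experiment
        neighbourhood.update(neighbours_at(target, t))
    neighbourhood.discard(target)

    candidates = []
    for v in neighbourhood:  # score each neighbour, keep only strangers
        f = score(target, v)
        if are_strangers(target, v):
            candidates.append((v, f))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# Toy data: who appears near user "a" at each time step, with made-up scores.
sightings = {0: ["b", "c"], 1: ["b"], 2: ["d"]}
scores = {"b": 0.9, "c": 0.4, "d": 0.7}
mutual_friends = {"d"}

ranked = detect_familiar_strangers(
    "a",
    time_steps=[0, 1, 2],
    neighbours_at=lambda u, t: sightings[t],
    score=lambda u, v: scores[v],
    are_strangers=lambda u, v: v not in mutual_friends,
)
```

In this toy run, user d is discarded despite a high score because d is a mutual friend of the target, and the remaining candidates are returned in decreasing order of familiarity.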
Inputs:
  Target user $u$
  The time delay $\delta t$
  The geographical distance $R$
  $\alpha$, $\beta$ weighting coefficients
Output:
  $FS_u$, the sorted vector containing top Familiar Strangers
  candidates of node $u$
1 $FS_u \leftarrow \emptyset$
2 Collect public messages of user $u$
3 $A_u \leftarrow$ Generated attributes from messages
4 for $t$ from start to end
5   Retrieve geo localized updates of the set $\{Neighborhood \cup u\}$
6   $Neighborhood \leftarrow Geo^+_t(u) = \{ v \in N \mid \min_{[t, t+\delta t]} d(u, v) \le R \}$
7 endfor
8 foreach $v \in Neighborhood$
9   Compute $F(u, v) = \alpha S_{ST}(u, v) + \beta S_I(u, v)$
10  if $u$ stranger to $v$
11  then $FS_u \leftarrow v$
12  endif
13 endforeach
14 return sorted $FS_u$
Fig. 4. Egocentric Familiar Stranger Detection
VI. FAMILIAR STRANGER DETECTION ON TWITTER
A. Selection of the platform and candidates
We identify three main requirements in order to be able
to perform the FS algorithm on an online social networking
platform: (1) user data must be publicly available, (2) a
geolocation service should be integrated and (3) target users
and candidates should be active on the platform.
Concerning the first requirement, we can only analyse
platforms that provide a significant amount of public data.
The second requirement is to gain access to spatio-temporal
data and this is now possible with online social networks such
as Twitter and Facebook and using Mobile Social Software
such as Foursquare. The last requirement is necessary to
broaden the scope and interest of the experiment relative to
the usual FS detection methods.
The analysis and comparison of these three conditions
on the main platforms has led us to choose the Twitter
microblogging platform to perform our algorithm. Twitter
hosts about 500 million accounts which generate more than
a million geolocated tweets daily and a large part of those can
be collected in real time through the official Twitter streaming
API.
We have performed a preselection of profiles that meet
requirements (1) and (3). For this purpose, we opened a stream
in a specific zone and collected every profile that sent a tweet in
this area during a given period of time. For each profile, we
collected the last two hundred messages and calculated the
frequency of activity, the ratio of geolocated tweets and the
number of distinct locations associated with the user's tweets.
The individuals who met given thresholds (i.e. who are nomad
users) were selected for the experiment. The thresholds
retained individuals who sent up to ten messages per day and
at least 75% of whose activity was geolocated. These 200 tweets
were necessarily associated with at least 50 distinct positions.
Fig. 5. Geographical footprints of the Twitter users in the San Francisco
Bay
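The preselection thresholds can be expressed as a simple filter. The sketch below assumes each collected tweet is a dictionary with an optional `coords` key holding a (latitude, longitude) tuple, which is a hypothetical format and not the Twitter API schema; the threshold directions follow the wording of the text (up to ten messages per day, at least 75% geolocated, at least 50 distinct positions).

```python
def is_nomad_user(tweets, days_observed):
    """Apply the three preselection thresholds to a user's last collected tweets.

    tweets: list of dicts with an optional 'coords' key holding a (lat, lon) tuple
    (hypothetical format).
    """
    if not tweets or days_observed <= 0:
        return False
    per_day = len(tweets) / days_observed
    located = [t["coords"] for t in tweets if t.get("coords") is not None]
    geo_ratio = len(located) / len(tweets)
    distinct_positions = len(set(located))
    return per_day <= 10 and geo_ratio >= 0.75 and distinct_positions >= 50


# Toy profile: 200 tweets over 40 days, 160 geolocated at 60 distinct positions.
profile = [{"coords": (37.0 + (i % 60) * 0.01, -122.0)} for i in range(160)]
profile += [{} for _ in range(40)]
```

Exact coordinate equality is a crude notion of "distinct position"; in practice positions would probably need to be rounded or clustered first, a choice the text does not specify.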
We performed the experiment in the San Francisco Bay
area from November 2011 to April 2012. During this time
period, fifty thousand users generated
one million geolocated messages. The
geographical footprints generated by the sample of these users
are represented in Figure 5.
B. Measuring familiarity
To compute interest similarity, we collected the tweets of
users and extracted the entities with the help of regular
expressions and a term dictionary. We then built the
proportional frequencies on the basis of the most frequent
entities of the sample and applied a similarity coefficient
to each pair of users.
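A minimal sketch of this interest-similarity step is given below. The entity pattern (hashtags and mentions), the sample dictionary terms and the choice of cosine similarity as the coefficient are assumptions made for illustration; the text above does not fix them.

```python
import math
import re
from collections import Counter
from typing import Dict, List

# Assumption: entities are hashtags and mentions, plus dictionary terms.
ENTITY_PATTERN = re.compile(r"[#@]\w+")
TERM_DICTIONARY = {"music", "football", "cinema"}  # hypothetical terms


def extract_entities(text: str) -> List[str]:
    """Return hashtag/mention entities and dictionary terms found in a tweet."""
    lowered = text.lower()
    entities = ENTITY_PATTERN.findall(lowered)
    # Remove matched entities so "#music" is not counted again as "music".
    remainder = ENTITY_PATTERN.sub(" ", lowered)
    entities.extend(w for w in re.findall(r"[a-z']+", remainder)
                    if w in TERM_DICTIONARY)
    return entities


def top_entities(tweets_by_user: Dict[str, List[str]], n: int) -> List[str]:
    """Most frequent entities over the whole sample."""
    counts = Counter(e for tweets in tweets_by_user.values()
                     for text in tweets for e in extract_entities(text))
    return [entity for entity, _ in counts.most_common(n)]


def interest_profile(tweets: List[str], vocabulary: List[str]) -> List[float]:
    """Proportional frequency of each top entity in one user's tweets."""
    counts = Counter(e for text in tweets for e in extract_entities(text)
                     if e in vocabulary)
    total = sum(counts.values())
    return [counts[e] / total if total else 0.0 for e in vocabulary]


def cosine_similarity(p: List[float], q: List[float]) -> float:
    """Cosine coefficient between two profiles (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0
```

Any other coefficient defined on frequency vectors could replace `cosine_similarity` without changing the rest of the sketch.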
Twitter users cannot be tracked continuously: the only
information accessible on their positions is their discrete
geolocated tweets. To overcome this problem, we set a time
delay parameter (δt) that keeps each position valid during a
specific period of time. Combined with the defined radius (R)
of spatial proximity, this allows us to define with more or
less flexibility the spatio-temporal constraints of
encounters.
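The role of δt and R can be illustrated with a short sketch. It assumes that each position stays valid for δt seconds after the tweet, so that two positions are simultaneous when their timestamps differ by at most δt, and that spatial proximity is measured with the haversine great-circle distance. The `Position` record and the function names are hypothetical.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres


@dataclass(frozen=True)
class Position:
    """Hypothetical record for one geolocated tweet."""
    user: str
    timestamp: float  # seconds since an arbitrary origin
    lat: float        # degrees
    lon: float        # degrees


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def is_encounter(a: Position, b: Position,
                 delta_t: float, radius_m: float) -> bool:
    """True when two distinct users are within radius_m and delta_t of each other."""
    if a.user == b.user:
        return False
    # Validity windows [t, t + delta_t] overlap iff |t_a - t_b| <= delta_t.
    close_in_time = abs(a.timestamp - b.timestamp) <= delta_t
    close_in_space = haversine_m(a.lat, a.lon, b.lat, b.lon) <= radius_m
    return close_in_time and close_in_space
```

With δt = 30 s and R = 500 m, two tweets sent 20 s apart and about 120 m from each other would count as an encounter under these assumptions.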
We have generated the spatio-temporal encounter graph
for the set of selected individuals for distinct spatio-temporal
parameters. On such a graph, a link between two nodes means
that the two users met at least once during the time span of
the experiment. Figure 6 represents the core component of the
spatio-temporal encounter graph for distinct spatio-temporal
constraints. This representation gives an idea of the impact of
the choice of constraints that can be used for the computation.
According to these results, broadening the constraints
leads to an increase in connections and thus to an increase
in candidates based on spatio-temporal similarity between
individuals.
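The construction of the encounter graph, and the effect of broadening the constraints, can be sketched as follows. For brevity the sketch assumes positions already projected to planar coordinates in metres and uses a simple time-sorted scan; both choices, as well as the function names, are assumptions made for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (user, time in seconds, x in metres, y in metres) - hypothetical layout
Observation = Tuple[str, float, float, float]


def encounter_graph(observations: List[Observation],
                    delta_t: float,
                    radius_m: float) -> Dict[str, Set[str]]:
    """Link two users if they were within radius_m and delta_t at least once."""
    graph: Dict[str, Set[str]] = defaultdict(set)
    ordered = sorted(observations, key=lambda obs: obs[1])
    for i, (u, tu, xu, yu) in enumerate(ordered):
        for v, tv, xv, yv in ordered[i + 1:]:
            if tv - tu > delta_t:
                break  # later observations are even further away in time
            if u != v and math.hypot(xu - xv, yu - yv) <= radius_m:
                graph[u].add(v)
                graph[v].add(u)
    return dict(graph)


def edge_count(graph: Dict[str, Set[str]]) -> int:
    """Number of undirected links in the graph."""
    return sum(len(neighbours) for neighbours in graph.values()) // 2


observations = [("a", 0, 0, 0), ("b", 20, 100, 0),
                ("c", 200, 400, 0), ("a", 210, 450, 0)]
narrow = encounter_graph(observations, delta_t=30, radius_m=50)
medium = encounter_graph(observations, delta_t=30, radius_m=500)
broad = encounter_graph(observations, delta_t=300, radius_m=500)
```

On this toy sample the number of links grows from the narrow to the broad setting, which is the behaviour described above for the real data.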
The final step is to compute the Familiar Stranger detection.
In this work, we have set equal weights for the spatio-temporal
and interest similarity indices (i.e. α = β). This allows better
compliance with the Milgram requirements for FS.
Fig. 6. Visualisation of the spatio-temporal encounter network
Fig. 7. Familiarity between users for δt = 30 s and R = 500 m
The final result for the dataset is presented in the familiarity
matrix of Figure 7. In this matrix, each row and each column
corresponds to a unique analysed individual. A black pixel
represents perfect familiarity while a white pixel represents
completely non-familiar users. We can see that the diagonal is
black, which shows that the familiarity between an individual
and himself or herself is always maximal. The matrix is
symmetrical because the familiarity between a user u and a
user v is equal to the familiarity between v and u. We can see
that most of the pixels are light, which means that few users
are familiar to each other. The most familiar people linked to
an individual correspond to the darkest pixels encountered on
the row or on the column of the individual concerned.
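The equal-weight combination and the resulting symmetric matrix can be illustrated with the sketch below. It assumes that familiarity is a weighted sum of the two similarity indices with α = β = 0.5 (so that scores stay in [0, 1]) and that pairwise similarities are stored in dictionaries keyed by unordered user pairs; these conventions and the function names are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

PairScores = Dict[FrozenSet[str], float]


def familiarity(st_similarity: float, interest_similarity: float,
                alpha: float = 0.5, beta: float = 0.5) -> float:
    """Weighted sum of the spatio-temporal and interest similarity indices."""
    return alpha * st_similarity + beta * interest_similarity


def familiarity_matrix(users: List[str],
                       st_scores: PairScores,
                       interest_scores: PairScores,
                       alpha: float = 0.5,
                       beta: float = 0.5) -> List[List[float]]:
    """Symmetric matrix with maximal familiarity (1.0) on the diagonal."""
    index = {user: i for i, user in enumerate(users)}
    size = len(users)
    matrix = [[0.0] * size for _ in range(size)]
    for i in range(size):
        matrix[i][i] = 1.0  # a user is maximally familiar with themselves
    for u, v in combinations(users, 2):
        pair = frozenset((u, v))
        score = familiarity(st_scores.get(pair, 0.0),
                            interest_scores.get(pair, 0.0), alpha, beta)
        matrix[index[u]][index[v]] = score
        matrix[index[v]][index[u]] = score  # familiarity(u, v) == familiarity(v, u)
    return matrix
```

Rendering each cell as a grey level (1.0 as black, 0.0 as white) would give a picture comparable to the one described for Figure 7.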
C. Familiar Strangers
The top Familiar Stranger candidates are deduced from this
figure as the most familiar people that comply with the stranger
assumption. The results confirm that the Familiar Stranger,
even in the context of a single city, is not a commonly observed
phenomenon. Nevertheless, we were able to extract serious
candidates with strong similarities for a significant part of
the individuals. The selection of the familiarity threshold K
remains important for the final selection of candidates. In our
work, the K parameter is set to select the top 10% of familiar
people that comply with the stranger assumption. It is
important to note that this parameter may depend on each
individual’s behaviour, since two different people may have a
different number of Familiar Strangers. To identify a good
threshold value for K, we could ask users to give feedback on
their detected Familiar Strangers. This would permit the
calculation of false-positive and false-negative ratios and
the adaptation of the threshold with regard to the results.
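The selection step and the proposed feedback-based evaluation can be sketched as follows. The sketch assumes that the stranger assumption means the candidate is not among the target's known contacts, that the top 10% is rounded up to a whole number of candidates, and that users would confirm or reject candidates; all names are hypothetical.

```python
import math
from typing import Dict, Iterable, List, Set, Tuple


def familiar_strangers(target: str,
                       scores: Dict[str, float],
                       known_contacts: Set[str],
                       top_fraction: float = 0.10) -> List[str]:
    """Most familiar users for `target` among those who are strangers."""
    strangers = {user: score for user, score in scores.items()
                 if user != target and user not in known_contacts}
    # Highest familiarity first; ties broken by user name for determinism.
    ranked = sorted(strangers, key=lambda user: (-strangers[user], user))
    k = math.ceil(top_fraction * len(ranked))
    return ranked[:k]


def error_ratios(predicted: Iterable[str],
                 confirmed: Iterable[str],
                 candidates: Iterable[str]) -> Tuple[float, float]:
    """False-positive and false-negative ratios from user feedback."""
    predicted, confirmed = set(predicted), set(confirmed)
    negatives = set(candidates) - confirmed
    false_positives = len(predicted - confirmed)
    false_negatives = len(confirmed - predicted)
    fp_ratio = false_positives / len(negatives) if negatives else 0.0
    fn_ratio = false_negatives / len(confirmed) if confirmed else 0.0
    return fp_ratio, fn_ratio
```

Repeating the evaluation for several values of `top_fraction` would be one way to adapt the threshold per user, as suggested above.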
VII. CONCLUSION
This proposal, specifically adapted for online social
networks, attempts to follow more closely the FS sociological
requirements as postulated by S. Milgram in his first studies.
The framework combines spatio-temporal, content-based and
online social graph analysis to take into account the
multi-dimensional aspect of the concept. It has been designed
to be applicable to online social networks using geolocation
services, and an application to Twitter has been proposed.
Although the quantity and accuracy of geolocation data are
still not sufficient to ensure the exhaustive nature of the
results, the growth of mobile social networking applications
and the success of smartphones should permit this problem to
be resolved in the near future. The approach proposed in this
work concerns various application fields such as
entertainment, services or homeland security.
VIII. ACKNOWLEDGMENT
This work is part of the CyNIC (Cybercrime, Nomadism
and IntelligenCe) CPER project supported by the Champagne-
Ardenne region and European Regional Development Fund
(ERDF).
REFERENCES
[1] D. Boyd and N. B. Ellison, “Social Network Sites: Definition,
History, and Scholarship,” Journal of Computer-Mediated
Communication, vol. 13, no. 1-2, Nov. 2007.
[2] S. Elwood, “Spatiality, temporality, and contexts: Geosocial data as
evidence of social interactions and networks,” in Spatio-Temporal
Constraints on Social Networks, 2010.
[3] L. Humphreys, “Mobile Social Networks and Social Practice: A Case
Study of Dodgeball,” Journal of Computer-Mediated Communication,
vol. 13, no. 1, 2007.
[4] A. Mtibaa, A. Chaintreau, J. LeBrun, E. Oliver, A. K. Pietiläinen,
and C. Diot, “Are you moved by your social network application?”
in Proceedings of the first workshop on Online social networks. New
York, NY, USA: ACM, 2008, pp. 67–72.
[5] C. Putnam and B. Kolko, “Getting Online but Still Living Offline: The
Complex Relationship of Technology Adoption and In-person Social
Networks,” in Social Network Analysis and Mining, 2009. ASONAM
’09. International Conference on Advances in, 2009, pp. 33–40.
[6] Z. Yin, M. Gupta, T. Weninger, and J. Han, “A Unified Framework
for Link Recommendation Using Random Walks,” in Advances in
Social Networks Analysis and Mining (ASONAM), 2010 International
Conference on, 2010, pp. 152–159.
[7] S. Milgram, “The Familiar Stranger: An aspect of the urban anonymity,”
Newsletter, vol. Division 8, 1972.
[8] X. Yu, A. Pan, L.-A. Tang, Z. Li, and J. Han, “Geo-Friends
Recommendation in GPS-based Cyber-physical Social Network,” Social
Network Analysis and Mining, International Conference on Advances in,
vol. 0, pp. 361–368, 2011.
[9] D. Quercia and L. Capra, “FriendSensing: recommending friends using
mobile phones,” in Proceedings of the third ACM conference on
Recommender systems. New York, NY, USA: ACM, 2009, pp. 273–276.
[10] S. Milgram, The individual in a social world. Addison-Wesley, 1977,
pp. 322–335.
[11] E. Paulos and E. Goodman, “The familiar stranger: anxiety, comfort,
and play in public places,” in CHI ’04: Proceedings of the SIGCHI
conference on Human factors in computing systems. New York, NY,
USA: ACM, 2004, pp. 223–230.
[12] P. Hui and J. Crowcroft, “Human mobility models and opportunistic
communications system design,” Philosophical Transactions of the
Royal Society A: Mathematical, Physical and Engineering Sciences, vol.
366, no. 1872, pp. 2005–2016, 2008.
[13] N. Agarwal, H. Liu, S. Murthy, A. Sen, and X. Wang, “A social
identity approach to identify familiar strangers in a social network,”
in Proceedings of the 3rd International AAAI Conference of Weblogs
and Social, 2009.
[14] N. Li and G. Chen, “Multi-layered friendship modeling for
location-based mobile social networks,” Mobile and Ubiquitous Systems:
Networking & Services, MobiQuitous, 2009. MobiQuitous ’09. 6th Annual
International, 2009.
[15] V. Agarwal and K. K. Bharadwaj, “A collaborative filtering framework
for friends recommendation in social networks based on interaction
intensity and adaptive user similarity,” Social Network Analysis and
Mining.
[16] V. Kostakos and E. O’Neill, “Cityware: Urban Computing to Bridge
Online and Real-world Social Networks,” 2008.
[17] N. Eagle, A. Sandy Pentland, and D. Lazer, “Inferring friendship
network structure by using mobile phone data,” Proceedings of the
National Academy of Sciences, vol. 106, no. 36, pp. 15274–15278,
2009.
[18] T. Hossmann, F. Legendre, G. Nomikos, and T. Spyropoulos, “Stumbl:
Using Facebook to Collect Rich Datasets for Opportunistic Networking
Research,” Information Forensics and Security, 2009. WIFS 2009, 2011.
[19] A. Wang, “Don’t follow me: Spam detection in twitter,” in Security
and Cryptography (SECRYPT), Proceedings of the 2010 International
Conference on, 2010.
[20] C.-Y. Teng and H.-H. Chen, “Detection of Bloggers’ Interests: Using
Textual, Temporal, and Interactive Features,” in Web Intelligence, 2006,
pp. 366–369.
[21] J. Chen, R. Nairn, L. Nelson, M. Bernstein, and E. Chi, “Short and tweet:
experiments on recommending content from information streams,” in
CHI ’10: Proceedings of the 28th international conference on Human
factors in computing systems. New York, NY, USA: ACM, 2010, pp.
1185–1194.
[22] M. Michelson and S. A. Macskassy, “Discovering users’ topics of
interest on twitter: a first look,” in AND. New York, NY, USA: ACM
Press, 2010, pp. 73–80.
[23] S. Macskassy, “Contextual linking behavior of bloggers: leveraging text
mining to enable topic-based analysis,” Social Network Analysis and
Mining, vol. 1, pp. 355–375, 2011.
[24] P. Bhattacharyya, A. Garg, and S. F. Wu, “Analysis of user keyword
similarity in online social networks,” Social Network Analysis and
Mining, pp. 1–16, 2011.
[25] S.-H. Cha, “Comprehensive Survey on Distance/Similarity Measures
between Probability Density Functions,” International journal of
mathematical models and methods in applied sciences, 2007.
[26] F. Johansson, “Extending Mobile Social Software With Contextual
Information,” pp. 1–11, Jan. 2008.
[27] L. Giuseppe, “Mobile Social Software: Definition, Scope and
Applications,” in EU/IST eChallenges Conference, The Hague (The
Netherlands), 2007.
[28] S. White and P. Smyth, “Algorithms for estimating relative importance
in networks,” in Proceedings of the ninth ACM SIGKDD international
conference on Knowledge discovery and data mining, 2003.
[29] F. Kappe, B. Zaka, and M. Steurer, “Automatically Detecting Points of
Interest and Social Networks from Tracking Positions of Avatars in a
Virtual World,” in Social Network Analysis and Mining, 2009. ASONAM
’09. International Conference on Advances in, 2009, pp. 89–94.