CrowdProbe: Non-invasive Crowd Monitoring with WiFi Probe
HANDE HONG, National University of Singapore, Singapore
GIRISHA DURREL DE SILVA, National University of Singapore, Singapore
MUN CHOON CHAN, National University of Singapore, Singapore
Devices with integrated Wi-Fi chips broadcast beacons for network connection management purposes. Such information can be captured with inexpensive monitors and used to extract user behavior. To understand the behavior of visitors, we deployed our passive monitoring system, CrowdProbe, in a multi-floor museum for six months. We used a Hidden Markov Model (HMM)-based trajectory inference algorithm to infer crowd movement using more than 1.7 million opportunistically obtained probe request frames.
However, as more devices adopt schemes to randomize their MAC addresses in the passive probe session to protect user privacy, it becomes more difficult to track crowds and understand their behavior. In this paper, we use historical transition probabilities to reason about the movement of those randomized devices under spatial and temporal constraints. With CrowdProbe, we are able to achieve sufficient accuracy to understand the movement of visitors carrying devices with randomized MAC addresses.
CCS Concepts: • Networks → Location based services; • Human-centered computing → Mobile phones; • Mathematics of computing → Kalman filters and hidden Markov models;
Additional Key Words and Phrases: Passive tracking, randomization, transition probability, crowd movement
ACM Reference Format:
Hande Hong, Girisha Durrel De Silva, and Mun Choon Chan. 2018. CrowdProbe: Non-invasive Crowd Monitoring with WiFi Probe. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 3, Article 115 (September 2018), 23 pages. https://doi.org/10.1145/3264925
1 INTRODUCTION
Understanding how crowds move and behave has long been a focus of the research community. Such information is vital for managing visitor flow in public areas such as shopping malls, railway stations, and museums. By knowing how people move, we can devise countermeasures to reduce congestion and improve the spatial arrangement. Furthermore, we can predict a visitor's future movement based on statistical patterns.
The most traditional way of tracking is to use pencil and paper to record how users move, along with the corresponding timestamps. Such a method is labor-intensive and tedious, and it is error-prone when the crowd is large. The ubiquity of digital devices and technologies has revolutionized the way we learn about our environment. Video-based recognition is one of the most popular technologies used to observe visitor behavior [4, 22, 24–26]. However, deploying a video-based system is expensive, and the system can perform poorly under limited lighting or when individuals overlap in the same image. Furthermore, people are concerned about privacy and are not willing to be subjected to visual monitoring. To overcome these limitations, researchers have exploited different technologies, including Bluetooth [19, 32], cellular networks [2], and RFID [6, 33].
This is the corresponding author
Authors' addresses: Hande Hong, National University of Singapore, Singapore, honghand@comp.nus.edu.sg; Girisha Durrel De Silva, National University of Singapore, Singapore, girisha@comp.nus.edu.sg; Mun Choon Chan, National University of Singapore, Singapore, chanmc@comp.nus.edu.sg.
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
©2018 Association for Computing Machinery.
2474-9567/2018/9-ART115 $15.00
https://doi.org/10.1145/3264925
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 2, No. 3, Article 115. Publication date: September 2018.
115:2 H. Hong et al.
Due to the widespread deployment of WiFi networks and the availability of WiFi chipsets on smartphones, using WiFi-related information to extract user information has been both popular and shown to be effective [1, 3, 5, 7].
Smartphones periodically broadcast probe request frames to trigger responses from nearby APs. By deploying
WiFi monitors in the environment, we can capture these management frames and extract location information
related to phone owners. Such methods are passive because they require no changes to the mobile devices. Passive
scanning is performed only by the WiFi monitors, with no impact on the operation of the existing infrastructure.
While previous work [15, 23] has shown the feasibility of such methods, iOS and Android have since enabled MAC randomization to protect user privacy. This raises the question of whether such techniques can still be used in practice.
In this paper, we present CrowdProbe, a system deployed in a multi-floor museum that tracked thousands of visitors daily using passive WiFi monitoring over six months. We feed temporally and spatially sparse, passively collected RSS fingerprints to a Hidden Markov Model (HMM)-based model to generate visitor trajectories. Unlike a traditional HMM, our model does not receive regular observations, since probe requests are sent only opportunistically and can be quite sparse. We therefore modify the model to include specific features of museum visitors to improve trajectory inference performance. In addition, we use historical transition probabilities to reason about the movement of randomized devices under spatial and temporal constraints. We summarize our contributions as follows:
• To the best of our knowledge, CrowdProbe is the first large-scale passive WiFi monitoring system deployed in a complex indoor public space. The experience and data gathered over six months can be valuable in bridging research and practical usage.
• We use a Hidden Markov Model (HMM)-based trajectory generation method that makes use of WiFi fingerprinting, spatial constraints, and temporal constraints. With the proposed method, we generate more than 91 thousand traces, which give adequate information to understand visitor behavior in the museum.
• Based on the data accumulated in the visitor traces, we derive visitor transition probabilities and show that this information can be used to accurately reason about the short-term movement of visitors whose mobile devices use randomized MAC addresses.
The rest of the paper is organized as follows. We give background on probe requests and MAC randomization in Section 2. In Section 3, we describe the architecture of the CrowdProbe system and the deployment setting. In Section 4, we present how the data is processed for trajectory inference. We present our trajectory inference algorithm in Section 5. In Section 6, we use the transition probabilities derived from the trajectories to infer the movement of visitors whose mobile devices use randomized MAC addresses. The evaluation of CrowdProbe is given in Section 7. We then present related work and a discussion in Sections 8 and 9. Finally, we summarize the paper in Section 10.
2 BACKGROUND
2.1 Probe Request
Smartphones broadcast probe request frames to trigger responses from nearby APs, with the purpose of speeding up the discovery of surrounding APs. Such frames are management frames; when captured, they yield information such as the network identifier (SSID), the MAC address, and, as measured by the receiver, the signal strength and timestamp. The emission of such frames is unavoidable as long as the device needs to connect to a network. Devices generally send probe request frames when they are not associated. However, when the signal of the currently connected WiFi network becomes weak, the device also starts to send probe frames to find a better network candidate and prepare for handover. This behavior makes probe requests suitable for indoor tracking, as most indoor environments have complex layouts: when a visitor moves around inside the building, the WiFi signal can vary considerably and trigger further probes from the mobile device.
CrowdProbe: Non-invasive Crowd Monitoring with WiFi Probe 115:3
Fig. 1. Probe request interval distribution from data collected in museum [Bar chart; x-axis: probe interval; y-axis: fraction of intervals. 0–5 s: 0.162; 5–10 s: 0.142; 10–20 s: 0.163; 20–30 s: 0.070; 30–60 s: 0.139; 1–2 min: 0.103; 2–3 min: 0.058; 3–5 min: 0.039; 5–10 min: 0.042; >10 min: 0.082]
To understand how frequently probe requests are sent in real-life scenarios, we processed the data collected in the museum and plot the results in Figure 1. As can be seen from the figure, probe request frames are sent with intervals ranging from less than 5 seconds to more than 10 minutes, with 88% of the frames sent within 5 minutes of the previous one. In places like shopping malls, museums, and other public spaces, visitors can spend an hour or more. Probe requests can therefore provide coarse user locations at up to minute-level granularity, and thus help us understand the movement of visitors in these public spaces.
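As a quick arithmetic check of the 88% figure, the sketch below sums the Figure 1 bin fractions whose upper edge is at most 5 minutes. The bin values are read off Figure 1; the helper name `fraction_within` is hypothetical, and the result is only exact at bin edges.

```python
import math

# (upper bin edge in seconds, fraction of probe intervals), read off Figure 1.
PROBE_INTERVAL_BINS = [
    (5, 0.162), (10, 0.142), (20, 0.163), (30, 0.070), (60, 0.139),
    (120, 0.103), (180, 0.058), (300, 0.039), (600, 0.042), (math.inf, 0.082),
]


def fraction_within(limit_s: float) -> float:
    """Cumulative fraction of intervals in bins whose upper edge is <= limit_s.

    Only exact at bin edges (5 s, 10 s, ..., 600 s); between edges it
    under-counts because partially covered bins are ignored.
    """
    return sum(frac for upper, frac in PROBE_INTERVAL_BINS if upper <= limit_s)


if __name__ == "__main__":
    print(f"within 5 min: {fraction_within(300):.3f}")  # 0.876, i.e. about 88%
```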
2.2 MAC Randomization
2.2.1 iOS. From iOS 8 onward, Apple introduced MAC address randomization to prevent passive tracking of devices. Initially, randomized addresses were used only while devices were not associated and in sleep mode [13]. In later versions, the conditions that trigger randomization were extended to include location services and auto-join scans [29]. This means that devices send more probe frames with randomized MAC addresses. From previous work [21], we know that Apple devices appear to implement true randomization across the entire MAC address field.
2.2.2 Android. Following iOS, Google's Android operating system added experimental support for MAC randomization. Full implementation went live in version 6.0, which covers most of the Android user base. However, a recent study shows that MAC randomization is largely absent on Android devices [14], even when the OS version supports this feature. In contrast to Apple devices, some Android devices do not randomize the entire address; for example, Google devices always use randomized addresses with the prefix DA:A1:19.
2.2.3 MAC Randomization Implementation in Practice. We analyzed the museum data with respect to MAC randomization and show the statistics in Table 1. Of all the probe request frames collected from mobile devices, 63% were sent with randomized MAC addresses. If devices have similar probe frequencies, then the fraction of devices that have implemented randomization is also close to 63% of the population. On average, each globally unique MAC address sent 34 probe request frames, whereas each locally assigned MAC address was seen in only 5. While a globally unique address has a one-to-one relationship with an individual device, a device performing randomization can have either a one-to-one or a one-to-many relationship with its addresses in a single day. As a result, most randomized MAC addresses appear in only a limited number of probe requests over a specific time period and are never seen again. Overall, randomized devices play an important role in crowd monitoring: if we cannot properly tackle this problem, more than half of the information is concealed.
Fig. 2. Architecture of CrowdProbe [Diagram blocks: Fingerprint Generation; Outlier; Non-Mobile Device; Global Unique or Long-lived Randomized MAC; Trajectory Inferring; Temporal and Spatial Constraint; Transition Probability; Short-lived Randomized MAC; Crowd Movement for Short-lived Randomized MAC]
Table 1. Data statistics in the museum
Category | Globally unique MAC | Randomized MAC | Non-mobile device
Probe request frames | 1,744,764 | 3,006,941 | 108,262
MAC addresses | 50,953 | 602,133 | 2,373
Probe requests per MAC | 34 | 5 | 45
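The percentages quoted in Section 2.2.3 follow directly from Table 1. The short check below recomputes the randomized share of mobile-device frames and the average frames per MAC; the variable names are illustrative only.

```python
# Counts copied from Table 1.
FRAMES = {"global": 1_744_764, "randomized": 3_006_941, "non_mobile": 108_262}
MACS = {"global": 50_953, "randomized": 602_133, "non_mobile": 2_373}

# Share of mobile-device probe frames that carry a randomized MAC address.
mobile_frames = FRAMES["global"] + FRAMES["randomized"]
randomized_share = FRAMES["randomized"] / mobile_frames

# Average number of probe request frames observed per MAC address.
frames_per_mac = {k: FRAMES[k] / MACS[k] for k in FRAMES}

if __name__ == "__main__":
    print(f"randomized share of mobile frames: {randomized_share:.1%}")
    for k, v in frames_per_mac.items():
        print(f"{k:>10}: {v:.1f} frames per MAC")
```

The share comes out at about 63.3%, and the per-MAC averages are close to the 34, 5, and 45 listed in the table.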
3 ARCHITECTURE AND DEPLOYMENT OF CROWDPROBE
The architecture of CrowdProbe is shown in Figure 2. Multiple WiFi monitors are deployed in various locations, and each WiFi monitor scans for probe request frames. When a user carrying a WiFi-enabled mobile device walks around different exhibition locations, the frames it transmits are captured by the monitors. Ideally, the WiFi monitors should be placed such that frames from a device anywhere within the monitored area can be heard by multiple monitors.
Data collected by the monitors are sent to the server for further processing. The server performs data analysis
to generate crowd movement information:
Fig. 3. Floorplan for the museum and the deployment layout [Floorplan of Levels 1–3 showing the positions of monitors 1–10, location labels A–I (the entrance is at A), overall dimensions of 80 m, 128 m, and 50 m, and an indication of detection range]
• Device Filtering: fingerprints from remote devices, non-mobile devices, and devices of museum staff are filtered out to ensure that the remaining devices are carried by real visitors.
• Fingerprint Generation and Classification: probe request data from multiple monitors are merged to form signal fingerprints. These fingerprints are then divided into two categories: stable MAC and short-lived randomized MAC.
• Trajectory Inference: data from the stable set are used to generate visitors' trajectories based on temporal and spatial constraints. From the generated trajectories, we derive transition probabilities.
• Movement Inference for Randomized Devices: the transition probabilities are used to infer the movement of randomized devices within a short time slot. By combining data from randomized devices and devices with globally unique MACs, we can give a complete view of visitor movement in the museum.
The deployment of CrowdProbe has two components: the front-end WiFi monitors and the back-end servers. We deployed the system in a three-floor museum, which we divide into 9 locations, marked with different colors in Figure 3. Location A is the main entrance and ticket counter. Location I is a cafe providing food and space for visitors to rest. The other seven locations are exhibitions focusing on different topics. Each WiFi monitor is a Raspberry Pi 3 equipped with a D-Link wireless USB adapter (DWA-132). The Raspberry Pi 3 is a low-cost computing platform with a 1.2 GHz quad-core ARM Cortex-A53, 1 GB of LPDDR2-900 SDRAM, and support for 802.11n wireless LAN. Since the embedded WiFi adapter of the Raspberry Pi 3 cannot operate in monitor mode, we use USB WiFi dongles to implement passive scanning. Each monitor can pick up transmissions sent by mobile devices in its vicinity. Note that as mobile devices transmit probes on all channels in the supported spectrum (typically both 2.4 GHz and 5 GHz), a monitor can ideally hear transmissions from all nearby mobile devices by sniffing on a single channel. In practice, however, not all transmissions are received due to packet loss, and it has been noted that hopping between channels does not help to pick up more messages [13]. In our deployment, to maximize the number of captured probe requests, the monitors listen on the same channel as the nearby WiFi APs provided by the museum. To further increase frame reception, we also sniff NULL data frames, which associated devices use for power management [15].
Fig. 4. Monitors deployed in the museum
Figure 4 shows one of the deployed monitors and the device components we used for monitoring. We deployed a total of 10 boxes to cover most of the exhibition locations; their positions are marked with red circle icons in Figure 3. Due to aesthetic requirements from the museum management and the need for access to power, we were not able to place the monitors in the locations that would maximize coverage. Most of the monitors are installed under chairs, in corridors, or behind doors, which is not optimal for data collection. For example, in Figure 3, locations C and D do not have monitors in their central areas. Nevertheless, we are able to cover most of the area sufficiently to understand visitors' movement patterns. The data collection was carried out with approval from the Institutional Review Board (IRB). To protect visitors' privacy, we do not store actual MAC addresses; instead, after checking whether each MAC address is a valid globally unique address or a randomized one, we store only its hashed value.
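The paper does not say which hash function is used. A minimal sketch, assuming a keyed hash (HMAC-SHA256 with a secret key held on the server), is shown below; a keyed hash is preferable to a plain hash here because the 48-bit MAC space, especially with known vendor prefixes, is small enough to brute-force. The function name and key handling are hypothetical.

```python
import hashlib
import hmac
import secrets


def pseudonymize_mac(mac: str, key: bytes) -> str:
    """Keyed hash of a MAC address, normalized so that formatting does not matter."""
    normalized = mac.strip().lower().replace("-", ":")
    return hmac.new(key, normalized.encode("ascii"), hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # Hypothetical key handling: generated once and kept secret on the server.
    key = secrets.token_bytes(32)
    print(pseudonymize_mac("DA:A1:19:3C:4D:5E", key)[:16])
```

Because the same key is reused, the same device maps to the same pseudonym across captures, which is what the later filtering and trajectory steps need.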
In the following sections, we elaborate on the details of each component of CrowdProbe and the corresponding challenges.
4 DEVICE FILTER AND CLASSIFICATION
In this section, we describe the process carried out to increase the likelihood that the collected fingerprints come from visitors to the museum.
4.1 Filtering Remote Devices
Since the museum is located near a street famous for food and bars, the monitors may opportunistically capture probe frames from pedestrians on the street. Such data has to be removed. We handle this by requiring each device to have at least some good-quality (strong) RSS readings. While visitors may pass through signal blind spots with weak RSS, they are unlikely to spend all their time in such areas. If a visitor walks around the museum, there is a good chance that strong RSS signals from the device will be captured.
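The filtering rule above can be sketched as keeping a device only if its strongest observed RSS reaches a threshold. This is an illustration under assumptions: the paper does not state the threshold, so `MIN_GOOD_RSS_DBM = -70` is a hypothetical value, and the input tuple layout is likewise assumed.

```python
from typing import Hashable, Iterable, Set, Tuple

# Hypothetical threshold; the paper does not state the value it uses.
MIN_GOOD_RSS_DBM = -70


def devices_with_strong_signal(
    captures: Iterable[Tuple[Hashable, int, float]],
    min_rss: float = MIN_GOOD_RSS_DBM,
) -> Set[Hashable]:
    """Keep devices whose strongest observed RSS reaches min_rss.

    captures: (device_id, monitor_id, rss_dbm) tuples, e.g. for one day.
    """
    strongest = {}
    for device, _monitor, rss in captures:
        strongest[device] = max(rss, strongest.get(device, rss))
    return {d for d, rss in strongest.items() if rss >= min_rss}


if __name__ == "__main__":
    sample = [("visitor", 1, -85), ("visitor", 2, -62), ("street", 1, -88)]
    print(devices_with_strong_signal(sample))  # {'visitor'}
```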
Fig. 5. MAC address classification
4.2 Filtering Non-mobile Devices
This ltering step is to make sure that the devices detected are from valid mobile device vendors. Note that we
are mainly interested in smartphones carried by mobile users. We make use of the online public database to
match the OUI of MAC addresses collected from probe request frames. Since we need to use the OUI eld of the
device, this step targets on global unique MAC address. Fortunately, non-mobile devices do not have the security
concern to be tracked and thus lack the incentive to implement randomization.
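A sketch of the vendor check, assuming the OUI is the first three octets of a globally unique address. The OUI set below contains hypothetical placeholder entries, not real vendor assignments; in practice it would be built from a public OUI registry restricted to mobile-device vendors. Randomized (locally administered) addresses are passed through, since the step does not apply to them.

```python
from typing import AbstractSet

# Hypothetical placeholder entries; in practice this set would be built from a
# public OUI registry, restricted to vendors of mobile devices.
MOBILE_VENDOR_OUIS = frozenset({"00:11:22", "00:33:44"})


def oui(mac: str) -> str:
    """First three octets of a MAC address, upper-case and colon-separated."""
    return mac.strip().upper().replace("-", ":")[:8]


def passes_vendor_filter(mac: str, mobile_ouis: AbstractSet[str] = MOBILE_VENDOR_OUIS) -> bool:
    """Apply the OUI check to globally unique MACs only.

    Locally administered (randomized) addresses carry no meaningful OUI, so
    this step lets them through unchanged.
    """
    first_octet = int(oui(mac)[:2], 16)
    if first_octet & 0x02:  # U/L bit set: locally administered
        return True
    return oui(mac) in mobile_ouis
```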
4.3 Filtering Security Guard and Sta in Museum
Individuals inside the museum can be visitors or employees of the museum. The dierence between these two
categories is that visitors usually go to the museum occasionally, while employees stay in the museum over
multiple days a week. Thus we keep a list of hashed MAC addresses that were captured by the monitors over
multiple days. This set of devices may belong to employees in the museum or to non-mobile devices, for example,
desktop with a WiFi dongle that comes from mobile phone vendor.
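The staff filter amounts to counting the distinct days on which each hashed MAC is seen. A minimal sketch, assuming sightings arrive as (hashed MAC, timestamp) pairs; the cut-off `MIN_DAYS_SEEN = 3` is hypothetical, since the paper only says "multiple days".

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Hashable, Iterable, Set, Tuple

# Hypothetical cut-off: the paper only says "multiple days".
MIN_DAYS_SEEN = 3


def frequent_devices(
    sightings: Iterable[Tuple[Hashable, datetime]], min_days: int = MIN_DAYS_SEEN
) -> Set[Hashable]:
    """Hashed MACs seen on at least min_days distinct calendar days."""
    days: Dict[Hashable, Set] = defaultdict(set)
    for mac_hash, ts in sightings:
        days[mac_hash].add(ts.date())
    return {m for m, d in days.items() if len(d) >= min_days}
```

The sketch counts over whatever sightings it is given; restricting them to, say, one week would match the "multiple days a week" description more closely.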
4.4 Fingerprint Generation and Classification
After device ltering, data collected from dierent monitors are merged into a signal ngerprint based on the
time stamp. For example, the ngerprint is represented as
®
f
:
{r1,r2,r3, .. ., rn}
, where
ri
is the RSS captured by
monitor
i
. We use the value of -99 to denote missing data when the monitor fails to capture the probe request or
the monitor is too far away. Since all the monitors are connected to the internet to transmit data to the server, we
also have to ensure that the clocks in monitors are well synchronized.
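The merging step can be sketched as grouping captures of the same device into fixed time windows and filling a per-monitor RSS vector, with -99 for monitors that heard nothing. The paper says only that data are merged "based on the time stamp", so the fixed grid and the `window_s` value are assumptions; monitors are indexed from 0 here rather than from 1.

```python
from typing import Dict, Hashable, Iterable, List, Tuple

MISSING_RSS = -99  # value used in the paper for "not captured"


def merge_fingerprints(
    captures: Iterable[Tuple[Hashable, float, int, float]],
    n_monitors: int,
    window_s: float = 1.0,
) -> Dict[Tuple[Hashable, int], List[float]]:
    """Merge per-monitor captures into RSS vectors {r_1, ..., r_n}.

    captures: (device_id, timestamp_s, monitor_index, rss) with synchronized clocks.
    Captures of the same device in the same window form one fingerprint;
    monitors that heard nothing keep MISSING_RSS. window_s is a hypothetical
    choice, and a fixed grid can split a burst that straddles a window edge.
    """
    fingerprints: Dict[Tuple[Hashable, int], List[float]] = {}
    for device, ts, monitor, rss in captures:
        key = (device, int(ts // window_s))
        vec = fingerprints.setdefault(key, [MISSING_RSS] * n_monitors)
        vec[monitor] = max(vec[monitor], rss)  # keep the strongest reading per monitor
    return fingerprints
```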
The last step in preparing the fingerprint data is as follows. We divide all the fingerprint data into two categories, stable MAC and short-lived MAC, as shown in Figure 5. Stable MAC includes the globally unique data and the set of randomized MAC data that do not change their MAC address (long-lived). Long-lived MAC addresses are sent by randomizing devices that nevertheless preserve the same randomized MAC over the entire visit, so we can track these devices as easily as devices with globally unique MACs. Data from the globally unique and long-lived randomized MACs are given as input to generate the trajectories of visitors.
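The two-way split can be sketched as a small decision rule. The paper defines long-lived randomized MACs as those kept over the entire visit but gives no numeric criterion, so the lifetime threshold `MIN_LIFETIME_S` and the `MacSummary` record below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold: the paper says long-lived randomized MACs persist over
# the entire visit, but does not give a numeric criterion.
MIN_LIFETIME_S = 30 * 60


@dataclass
class MacSummary:
    randomized: bool  # locally administered address?
    first_seen_s: float
    last_seen_s: float


def classify_mac(m: MacSummary, min_lifetime_s: float = MIN_LIFETIME_S) -> str:
    """Return 'stable' (globally unique or long-lived randomized) or 'short-lived'."""
    if not m.randomized:
        return "stable"  # globally unique address
    if m.last_seen_s - m.first_seen_s >= min_lifetime_s:
        return "stable"  # long-lived randomized address
    return "short-lived"
```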
5 TRAJECTORY INFERENCE WITH HIDDEN MARKOV MODELS
To infer user movement trajectories, we model the visiting process as a probabilistic state transition process. We adopt the most prevalent method used in passive tracking and indoor localization: the Hidden Markov Model (HMM) [10, 17]. An HMM infers the next state from the previous state, the current observation, and the transition probabilities. In our scenario, the hidden states are the location labels of visitors, and the observations are RSS fingerprint vectors. Thus, we first match each fingerprint to a set of candidate locations. Then we use spatial and temporal constraints to generate the transition probabilities and, finally, the target trajectory.
5.1 Emission Probabilities
The emission probability model defines the probability distribution of the visitor's location across the entire space when each fingerprint is captured. Correctly modeling the emission probability forms the basis of our trajectory inference. To make full use of the signal information in the passive RSS fingerprints from all nearby monitors, we use fingerprint similarity to identify the location. We used four different phones to collect a fingerprint database covering all the exhibition locations. We normalized the fingerprints and calculated their Tanimoto coefficient [30]. Cross-validation is used to evaluate the performance of this fingerprint similarity method: the fingerprint database is separated into a training set and a testing set. We show the results for testing data from different phone models in Table 2. The four phone models (Nexus5, Nexus6, Meizu MX6, Meizu Pro6) all run Android OS. We use Android because we can easily modify the phones to send more probe frames with globally unique MAC addresses.
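The Tanimoto coefficient of two real-valued vectors a and b is a·b / (‖a‖² + ‖b‖² − a·b), which equals 1 for identical non-zero vectors. The paper does not describe its normalization, so the sketch below assumes only an offset of +99, which maps the missing-data value -99 to 0; the reference fingerprints and function names are hypothetical.

```python
from typing import Dict, List, Sequence

MISSING_RSS = -99.0


def _shift(fp: Sequence[float]) -> List[float]:
    """Assumed normalization: offset so that the missing value -99 becomes 0."""
    return [r - MISSING_RSS for r in fp]


def tanimoto(a: Sequence[float], b: Sequence[float]) -> float:
    """Tanimoto coefficient a.b / (|a|^2 + |b|^2 - a.b) of two RSS fingerprints."""
    x, y = _shift(a), _shift(b)
    dot = sum(p * q for p, q in zip(x, y))
    denom = sum(p * p for p in x) + sum(q * q for q in y) - dot
    return dot / denom if denom else 0.0


def similarities(fp: Sequence[float], database: Dict[str, Sequence[float]]) -> Dict[str, float]:
    """Similarity s between a captured fingerprint and each location's reference."""
    return {loc: tanimoto(fp, ref) for loc, ref in database.items()}


if __name__ == "__main__":
    db = {"A": [-62, -78, -99], "B": [-99, -70, -55]}  # hypothetical references
    s = similarities([-60, -80, -99], db)
    print(max(s, key=s.get))  # A
```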
Table 2. Classification result with different phone models (rows: training phone; columns: testing phone)
Train/Test | Mx6 | Pro6 | Nexus5 | Nexus6
Mx6 | 0.88 | 0.68 | 0.66 | 0.72
Pro6 | 0.65 | 0.91 | 0.80 | 0.70
Nexus5 | 0.71 | 0.82 | 0.87 | 0.78
Nexus6 | 0.67 | 0.72 | 0.79 | 0.86
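To make the same-model versus cross-model gap concrete, the snippet below averages the diagonal and off-diagonal entries of Table 2. The summary does not depend on whether rows are read as training or testing phones.

```python
# Accuracy matrix from Table 2: ACCURACY[train][test].
PHONES = ["Mx6", "Pro6", "Nexus5", "Nexus6"]
ACCURACY = [
    [0.88, 0.68, 0.66, 0.72],
    [0.65, 0.91, 0.80, 0.70],
    [0.71, 0.82, 0.87, 0.78],
    [0.67, 0.72, 0.79, 0.86],
]

n = len(PHONES)
same_model = [ACCURACY[i][i] for i in range(n)]
cross_model = [ACCURACY[i][j] for i in range(n) for j in range(n) if i != j]

mean_same = sum(same_model) / len(same_model)
mean_cross = sum(cross_model) / len(cross_model)

if __name__ == "__main__":
    print(f"same model:  mean {mean_same:.3f}")
    print(f"cross model: mean {mean_cross:.3f}, range {min(cross_model)}-{max(cross_model)}")
```

Same-model accuracy averages 0.88, while cross-model accuracy averages about 0.725 and ranges from 0.65 to 0.82.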
As we can see in Table 2, if the training and testing sets come from the same phone model, we achieve close to 90% accuracy. However, when the training and testing phones differ, the accuracy drops drastically, to between 65% and 82%. So, besides multipath effects, antenna gain, and phone placement, differences between phone models also have a negative impact on fingerprint matching. Furthermore, a phone can transmit at different power levels depending on the specific IEEE 802.11 version used [11]. For example, the Samsung Galaxy S4 transmits at 13 dBm using 802.11a but at 12 dBm using 802.11n.
Clearly, ngerprint similarity alone is insucient to improve the accuracy. Our approach is to keep a set of
locations in our emission probabilities. That is to say, we do not decide on a single location for each ngerprint.
Instead, we keep a list of candidates assigning each of the possible candidate a probability. The idea is similar
to particle ltering[
9
], but instead of keeping a large number of random sample, we only keep a limited set of
candidates for higher eciency. So for each ngerprint
fi
,
{r1,r2, .. ., rn}
, we have a list of candidate locations
{l1,l2, .. ., ln}
. We calculate the similarity of each candidate with the corresponding ngerprint in the database.
Then we get a list of ngerprint similarity
{s1,s2, .. ., sn}
. We estimate the conditional probability of a device in
location ljgiven ngerprint fias follow:
ωj=(rj+99)/
n
Õ
1
(rk+99)(1)
p(lj|f i)=ωjsj/
n
Õ
1
(ωksk)(2)
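Equations 1 and 2 translate directly into code. The sketch below assumes, as Eq. 1 implies, that each candidate location $l_j$ is paired with the RSS $r_j$ of its associated monitor; the function name and input layout are hypothetical.

```python
from typing import Dict, Sequence, Tuple

MISSING_RSS = -99.0


def emission_probabilities(candidates: Sequence[Tuple[str, float, float]]) -> Dict[str, float]:
    """Eqs. (1)-(2): p(l_j | f_i) proportional to omega_j * s_j.

    candidates: (location l_j, RSS r_j of the monitor paired with l_j, similarity s_j).
    Assumes at least one r_j above -99 and at least one positive omega_j * s_j.
    """
    raw = [r - MISSING_RSS for _, r, _ in candidates]            # r_j + 99
    total = sum(raw)
    omega = [w / total for w in raw]                             # Eq. (1)
    scores = [w * s for w, (_, _, s) in zip(omega, candidates)]  # omega_j * s_j
    z = sum(scores)
    return {loc: sc / z for (loc, _, _), sc in zip(candidates, scores)}  # Eq. (2)
```

For candidates A (r = -60, s = 0.9), E (r = -80, s = 0.8), and C (r = -99, s = 0.7), the result is about {A: 0.70, E: 0.30, C: 0.0}. Because the normalizing sum in Eq. 1 cancels in Eq. 2, $p(l_j \mid f_i)$ is simply proportional to $(r_j + 99)\,s_j$.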
Fig. 6. Visitors' probabilities of moving to other locations with different time intervals [Bar chart; x-axis: distance moved (0, 1, 2, 3 hops); y-axis: ratio (0–1); series: 3, 5, 10, and 20 min intervals]
With the weight $\omega_j$, we give more confidence to stronger RSS readings. After this step, for each fingerprint, we obtain a list of candidate locations and their emission probabilities, for example, {A: 0.7, B: 0.15, C: 0.07, E: 0.08} or {F: 0.26, G: 0.54, H: 0.21}.
5.2 Transition Probabilities
How the visitor moves between each pair of consecutive fingerprints is modeled by the transition probabilities. We need to decide the probability that the visitor moves between exhibition locations or stays in the same location, based on consecutive fingerprints and their timestamps. In our modeling, we make the following assumptions:
Assumption 1: A visitor's movement in the museum is sufficiently slow compared to the timescale of probe request capture that the WiFi monitors are able to track his/her movement from one location to another.
For CrowdProbe to work well, a user needs to spend enough time in a single location that the device is likely to transmit at least one probe request from each location. To verify this assumption, we collected the transition patterns of museum visitors over time intervals ranging from 3 min to 20 min and plotted the results in Figure 6. The x-axis shows how far a visitor moves, measured in the number of hops from the current location to the destination location. From the figure, we can see that when the time interval is short, say 3 min, the likelihood that a visitor stays in the same location is more than 80%. When the time interval is 20 min, a visitor has a 30% chance of moving to a location two or more hops away. In a 5 min interval, the likelihood of a visitor either staying in the same location or moving to a neighboring location is 93%.
Assumption 2: The longer a visitor spends in an exhibition location, the more likely he/she is to leave for the next exhibition location.
If a visitor has already spent some time, for example, 15 minutes, in the same exhibition location, then he/she is more likely to leave that location than a visitor who has just arrived. Thus, the transition probability should also take into account the time already spent in the current location. Figure 7 shows the decay curves for locations D and E: as more time elapses, more visitors leave.
With this rule, we also solve a classical problem in passive tracking: the handover problem. The handover problem arises when the visitor is near the boundary of two different locations, and the location inferred from the fingerprints can jump back and forth between the two. In our scenario, the museum is a multi-floor building where some of the ceilings between floors are removed for aesthetic reasons. For instance, locations A and E are openly connected without any barrier, so visitors in location A have a high chance of being detected in location E. Sequences of fingerprints can then generate jitters such as AEAEAE in the derived trajectory. By taking the tendency to stay into consideration, more stable transitions can be obtained.
Fig. 7. The visitor decay curves for locations D and E [Line chart; x-axis: time elapsed (min, 0–60); y-axis: percent of visitors remaining (0–1); curves for Location D and Location E]
Based on the above discussion, we define the following:
Staying Tendency describes the inclination of a visitor to stay in the same exhibition location. From Figure 7, we can see that the percentage of visitors remaining and the stay time are inversely related. Thus we define the staying tendency coefficient $\omega_{tend}$ as

$$\omega_{tend} = \frac{\tau_{threshold}}{t + 1} \tag{3}$$

where $t$ is the length of time the visitor has stayed in the current exhibition location. We add 1 to the denominator to handle extremely short durations $t$ (and to avoid division by zero at $t = 0$). The longer the visitor has spent in the same location, the smaller $\omega_{tend}$ becomes. In Figure 7, the curves differ across locations; thus the threshold $\tau_{threshold}$, which indicates the stay time at which a visitor has an equal chance of staying and leaving the current location, should be set per location.
Order of Neighbor is defined as the number of hops (moves between adjacent locations) a person must make to reach a specific exhibition location from the current location. For example, in the floorplan of the museum shown in Figure 3, for location C, the 1st-order neighbors are its immediately adjacent locations (A, D, and I), and its 2nd-order neighbors are the locations immediately adjacent to its 1st-order neighbors (excluding C and C's first-order neighbors). We define $hop_{ij}$ as the number of hops a person needs to make to move from location $i$ to location $j$, which equals the order of neighbor. In particular, we set $hop_{ii} = 1$.
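Since $hop_{ij}$ is a shortest-path distance on the graph of adjacent locations, it can be computed with breadth-first search. In the sketch below, only C's neighbors (A, D, I) come from the text; the remaining adjacency entries are hypothetical, and $hop_{ii}$ is set to 1 as in the paper.

```python
from collections import deque
from typing import Dict, List

# Hypothetical adjacency list; only C's neighbours (A, D, I) are taken from the text.
ADJACENCY: Dict[str, List[str]] = {
    "C": ["A", "D", "I"],
    "A": ["C", "E"],
    "D": ["C"],
    "I": ["C"],
    "E": ["A"],
}


def hop_counts(adjacency: Dict[str, List[str]], source: str) -> Dict[str, int]:
    """hop_{source,j} for every reachable location j, via breadth-first search."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adjacency.get(u, []):
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    hops[source] = 1  # the paper sets hop_ii = 1
    return hops
```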
Based on the map constraints and temporal limitations, with time interval $\tau_{interval}$ between consecutive fingerprints, we define the transition likelihood $LH_{ij}$ and the normalized transition probability $p_{ij}$ between locations $i$ and $j$ as follows:

$$LH_{ij} = \begin{cases} \omega_{tend}/hop_{ii} + \tau_{interval}/\tau_{threshold}, & i = j \\ 1/hop_{ij} + \tau_{interval}/\tau_{threshold}, & \text{otherwise} \end{cases} \tag{4}$$

$$p_{ij} = \frac{LH_{ij}}{\sum_{k=1}^{N} LH_{ik}} \tag{5}$$
where $N$ is the total number of locations. As the time interval $\tau_{interval}$ between consecutive fingerprints increases, the relative difference in likelihood between locations becomes smaller. That is, if the time interval between two fingerprints is small, we give a higher transition probability to nearby locations; if the time interval is large, we give no preference to any transition, as the visitor can walk to any location within such a long duration. Table 3 lists the important parameters used.
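Equations 3 to 5 can be combined into a function that returns one row of the transition matrix. The sketch below is illustrative: the hop table and the numeric values in the tests are hypothetical, and it assumes that the $\tau_{threshold}$ of the current location $i$ is used in both terms of Eq. 4.

```python
from typing import Dict, Sequence


def staying_tendency(tau_threshold: float, t_stay: float) -> float:
    """Eq. (3): omega_tend = tau_threshold / (t + 1)."""
    return tau_threshold / (t_stay + 1)


def transition_row(
    i: str,
    locations: Sequence[str],
    hops: Dict[str, Dict[str, int]],
    tau_threshold: float,
    t_stay: float,
    tau_interval: float,
) -> Dict[str, float]:
    """Eqs. (4)-(5): normalized transition probabilities p_ij out of location i.

    tau_threshold is taken to be the threshold of the current location i
    (an assumption; the paper says it varies by location). All times share one unit.
    """
    omega = staying_tendency(tau_threshold, t_stay)
    lh = {}
    for j in locations:
        base = omega / hops[i][i] if j == i else 1 / hops[i][j]  # Eq. (4)
        lh[j] = base + tau_interval / tau_threshold
    z = sum(lh.values())
    return {j: v / z for j, v in lh.items()}  # Eq. (5)
```

For example, with $\tau_{threshold} = 10$, $t = 4$, and $\tau_{interval} = 2$, the likelihoods for staying, a 1-hop move, and a 2-hop move are 2.2, 1.2, and 0.7, giving probabilities of about 0.54, 0.29, and 0.17. Increasing $\tau_{interval}$ to 100 flattens them to about 0.36, 0.33, and 0.31, matching the behavior described above.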
Table 3. List of some important parameters in this paper
Parameter | Description
$\tau_{min}$ | Minimum stay time required for a visitor at each location
$s_i$ | RSS fingerprint similarity
$\omega_{tend}$ | Staying tendency coefficient
$\tau_{threshold}$ | Stay time at which a visitor has an equal chance of staying and leaving
$\tau_{interval}$ | Time interval between consecutive fingerprints
$hop_{ij}$ | Number of hops a person needs to move from location $i$ to location $j$
$LH_{ij}$ | Transition likelihood between locations $i$ and $j$
$p_{ij}$ | Normalized transition probability between locations $i$ and $j$
5.3 Trajectory Inference
With the transition probability and emission probability available, we use the Viterbi algorithm [12] to find the maximum-probability trajectory. For a series of fingerprints f_1, f_2, ..., f_n, we find the sequence of locations l_1, l_2, ..., l_n that maximizes Equation 6. Since there are only a limited number of candidate locations for each captured fingerprint, the computation is fast. A visitor usually spends quite a long time in a single location, so the sequence of locations contains a lot of redundancy, for example AEEEEEEEEFFFFFGGGGGGGGGGGFADDI. Each letter represents the location of the visitor when a specific probe message was captured by the monitors. We simplify the trajectory by collapsing consecutive duplicate locations and updating the corresponding timestamps. For the above example, we get AEFGFADI.

$$
\operatorname*{arg\,max}_{l_1, l_2, \dots, l_n} \prod_{i<n} p(l_{i+1} \mid f_{i+1}) \cdot p_{l_i l_{i+1}} \tag{6}
$$
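Equation 6 can be maximized with a standard log-space Viterbi recursion, followed by the run-length collapsing described above. This is a generic sketch, not CrowdProbe's code. The emission probabilities p(l | f) are assumed to be given per fingerprint (in the system they come from RSS fingerprint matching), the transition function stands in for p_ij, and the toy numbers are made up. Unlike the product in Equation 6, the sketch also includes the emission term of the first fingerprint.

```python
import math
from itertools import groupby


def viterbi(candidates, transition):
    """Most likely location sequence l_1..l_n for fingerprints f_1..f_n.

    candidates[t]    -- {location: p(location | f_t)}, the emission term; only
                        a few candidate locations per fingerprint
    transition(i, j) -- transition probability p_ij, e.g. from Eq. (5)
    Scores are accumulated in log space to avoid underflow.
    """
    # best[loc] = (log-score of the best path ending at loc, that path)
    best = {loc: (math.log(pe), [loc]) for loc, pe in candidates[0].items()}
    for emission in candidates[1:]:
        step = {}
        for loc, pe in emission.items():
            score, path = max(
                (s + math.log(transition(prev, loc)), p)
                for prev, (s, p) in best.items()
            )
            step[loc] = (score + math.log(pe), path + [loc])
        best = step
    return max(best.values())[1]


def collapse(locations, timestamps):
    """Merge consecutive duplicates into (location, first_seen, last_seen)."""
    visits = []
    for loc, group in groupby(zip(locations, timestamps), key=lambda x: x[0]):
        times = [t for _, t in group]
        visits.append((loc, times[0], times[-1]))
    return visits


# Toy inputs (hypothetical): three fingerprints, and a transition function
# that simply favors staying put.
candidates = [{"A": 0.7, "C": 0.3}, {"A": 0.4, "C": 0.6}, {"C": 0.8, "D": 0.2}]
path = viterbi(candidates, lambda i, j: 0.6 if i == j else 0.2)
print(path)  # ['C', 'C', 'C']

raw = "AEEEEEEEEFFFFFGGGGGGGGGGGFADDI"
visits = collapse(raw, range(len(raw)))
print("".join(loc for loc, _, _ in visits))  # AEFGFADI
```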
6 ONE-HOP MOVEMENT INFERENCE FOR SHORT-LIVED RANDOMIZED DEVICES
From Table 1, we observe that if devices have similar probe frequencies, the number of devices that implement MAC randomization is close to 2/3 of the population. While we can derive crowd movement with the above trajectory inference model using data from devices with Stable MAC addresses, ignoring the large number of devices with randomized MAC addresses would discard a substantial amount of information. Previous work [31] used Information Elements (IE) as signatures to track devices. However, recent work [21] has found that such signatures may change during randomization, and improper use of IE may also cause a high rate of false positives. So, if we cannot track the movement of each randomized device, can we infer the crowd movement in each time period without knowing who the visitors are? In this section, we use the trajectories derived from Stable MAC devices to infer the one-hop crowd movement of short-lived randomized devices.
6.1 Overview
Figure 8 gives an overview of our one-hop movement inference for short-lived randomized devices. We divide each day into separate time slots and generate the corresponding status vector in each time slot. A status vector contains the number of randomized MAC devices whose probe frames are captured at each location; it is a snapshot of the number of short-lived MAC devices in each location. To complete the picture, we add the number of visitors who enter and leave the museum to form the extended status vector: R_In denotes the number of people entering the museum and R_Out denotes the number of people leaving the museum.
Fig. 8. Overview of the movement inference for short-lived randomized devices (status vectors SV_1, ..., SV_N, SV_N+1 per time slot are extended with R_In and R'_Out into SVE_N and SVE_N+1, then combined with the transition probability matrix to give the one-hop crowd movement for short-lived randomized MACs)
Although each visitor has their own preferences for route selection, the choices are generally affected by the layout of exhibits, facilities, interpretive tools, and advertisements. If we assume that people carrying non-randomized phones behave similarly to those carrying randomized phones, the aggregated movement should also be similar. We use this assumption to infer the crowd movement of users carrying devices with randomized MAC addresses from the transition pattern learned from devices with Stable MAC addresses. The following sections describe the algorithm in detail.
6.2 Status Vector and Transition Matrix
Within each time slot, we define a status vector SV that contains the number of randomized probe frames captured in each location, for example {A: 10, B: 3, C: 3, D: 3, E: 1, F: 2, G: 3, H: 0, I: 2}.
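One way to build the per-slot status vectors, sketched under assumptions: frames arrive as hypothetical (timestamp, MAC, location) tuples that have already been filtered to short-lived randomized addresses and localized, and each location counts distinct addresses (the device count described in Section 6.1) rather than raw frames. The 5-minute slot length is the value chosen in Section 7.2.1.

```python
from collections import defaultdict

SLOT_SECONDS = 5 * 60  # 5-minute slots, the length chosen in Section 7.2.1


def status_vectors(frames, locations):
    """Build one status vector per time slot.

    frames    -- iterable of (timestamp_s, mac, location) for probe frames
                 already classified as short-lived randomized MACs
    locations -- ordered list of location labels
    Returns {slot_index: {location: number of distinct MACs seen there}}.
    """
    seen = defaultdict(set)  # (slot, location) -> set of MAC addresses
    for ts, mac, loc in frames:
        seen[(int(ts // SLOT_SECONDS), loc)].add(mac)
    slots = sorted({slot for slot, _ in seen})
    return {s: {loc: len(seen.get((s, loc), ())) for loc in locations} for s in slots}


frames = [
    (10, "r1", "A"), (50, "r1", "A"),   # same address twice: counted once
    (70, "r2", "A"), (200, "r3", "C"),
    (320, "r4", "B"),                   # falls in the second slot
]
print(status_vectors(frames, ["A", "B", "C"]))
# {0: {'A': 2, 'B': 0, 'C': 1}, 1: {'A': 0, 'B': 1, 'C': 0}}
```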
However, this vector does not capture the visitors who enter or leave the museum during the slot. Since the museum has multiple entrances and exits and probe transmission is opportunistic, a new visitor can appear, or a departing visitor can have their last probe frame captured, at any location. We therefore define an extended status vector SVE that adds two virtual locations, "In" and "Out". For every two consecutive vectors, we define the two SVEs as shown in Table 4, where R_A and R'_A are the numbers of devices in location A in time slots N and N+1, respectively. We define the visitor movement between the two time slots as a transition matrix T_N:
Table 4. Extended Status Vector
         A     B     C     D     E     F     G     H     I     In     Out
SVE_N    R_A   R_B   R_C   R_D   R_E   R_F   R_G   R_H   R_I   R_In   0
SVE_N+1  R'_A  R'_B  R'_C  R'_D  R'_E  R'_F  R'_G  R'_H  R'_I  0      R'_Out
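$$
T_N =
\begin{pmatrix}
X_{AA} & X_{AB} & X_{AC} & \cdots & X_{AH} & X_{AI} & 0 & X_{A,Out} \\
X_{BA} & X_{BB} & X_{BC} & \cdots & X_{BH} & X_{BI} & 0 & X_{B,Out} \\
X_{CA} & X_{CB} & X_{CC} & \cdots & X_{CH} & X_{CI} & 0 & X_{C,Out} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
X_{HA} & X_{HB} & X_{HC} & \cdots & X_{HH} & X_{HI} & 0 & X_{H,Out} \\
X_{IA} & X_{IB} & X_{IC} & \cdots & X_{IH} & X_{II} & 0 & X_{I,Out} \\
X_{In,A} & X_{In,B} & X_{In,C} & \cdots & X_{In,H} & X_{In,I} & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0
\end{pmatrix}
$$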
where X_CC denotes the number of visitors that remain in location C, and X_AC denotes the number of people that move from location A to C. No visitor goes from state Out to any location, and no visitor goes from any location to state In; these entries of the matrix are set to 0. To conserve the number of people, all variables must satisfy the following equations:

$$
\begin{aligned}
X_{AA} + X_{AB} + X_{AC} + \dots + X_{A,Out} &= R_A \\
&\;\vdots \\
X_{IA} + X_{IB} + X_{IC} + \dots + X_{I,Out} &= R_I \\
X_{In,A} + X_{In,B} + X_{In,C} + \dots + X_{In,Out} &= R_{In} \\
X_{AA} + X_{BA} + X_{CA} + \dots + X_{In,A} &= R'_A \\
&\;\vdots \\
X_{AI} + X_{BI} + X_{CI} + \dots + X_{In,I} &= R'_I \\
X_{A,Out} + X_{B,Out} + X_{C,Out} + \dots + X_{In,Out} &= R'_{Out} \\
R_{In} - R'_{Out} &= R_{gap}
\end{aligned}
\tag{7}
$$
Now, from the processed randomized MAC data, we can derive the values R_A to R_I, R'_A to R'_I, and R_gap.
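The constraints of Equation 7 can be checked mechanically. In this sketch (hypothetical helper names, two locations A and B for brevity), R_gap is computed from the fact that summing the row equations and summing the column equations of Equation 7 must give the same total, so R_In - R'_Out equals the location total of SVE_N+1 minus that of SVE_N.

```python
def r_gap(r_now, r_next):
    """R_In - R'_Out implied by Eq. (7): both sides must hold the same total."""
    return sum(r_next.values()) - sum(r_now.values())


def satisfies_eq7(X, r_now, r_next, r_in, r_out_next):
    """Check the conservation constraints of Eq. (7) for a candidate T_N.

    X maps 'from' state -> {'to' state: count}; rows are the locations plus
    'In', columns are the locations plus 'Out'. Missing entries count as 0,
    which also covers the structurally zero 'to In' and 'from Out' cells.
    """
    locs = list(r_now)
    rows, cols = locs + ["In"], locs + ["Out"]

    def x(i, j):
        return X.get(i, {}).get(j, 0)

    rows_ok = all(sum(x(i, j) for j in cols) == r_now[i] for i in locs)
    rows_ok = rows_ok and sum(x("In", j) for j in cols) == r_in
    cols_ok = all(sum(x(i, j) for i in rows) == r_next[j] for j in locs)
    cols_ok = cols_ok and sum(x(i, "Out") for i in rows) == r_out_next
    return rows_ok and cols_ok and r_in - r_out_next == r_gap(r_now, r_next)


r_now, r_next = {"A": 3, "B": 1}, {"A": 1, "B": 2}
X1 = {"A": {"A": 1, "B": 1, "Out": 1}, "B": {"Out": 1}, "In": {"B": 1}}
X2 = {"A": {"B": 1, "Out": 2}, "B": {"A": 1}, "In": {"B": 1}}
print(r_gap(r_now, r_next), satisfies_eq7(X1, r_now, r_next, 1, 2),
      satisfies_eq7(X2, r_now, r_next, 1, 2))  # -1 True True
```

Both X1 and X2 pass the check, a small instance of the underdetermination discussed below.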
Suppose there are N locations in the museum (not including In and Out). The above formulation has 2N + 3 equations but N × N + 2N unknowns. Whenever N > 1, we have N(N + 2) > 2N + 3, so many different transition matrices satisfy the equations, and we need a way to find a specific solution that satisfies additional constraints. Our approach is to make use of the data accumulated from devices with globally unique or long-lived randomized addresses, using a two-step process to infer the one-hop movement of short-lived randomized devices.
6.3 Two-step Conversion for Short-Lived Randomized Data
We assume that people carrying non-randomized phones have movement patterns similar to those carrying randomized phones. To exploit this pattern, we apply the same processing described in Section 6.2 to the Stable MAC data set. For every two consecutive time slots, we obtain one ground-truth transition matrix, since these devices keep the same MAC addresses. We sum all these transition matrices and normalize each row to produce the probability matrix T_train, which we assume captures the average user behavior.
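Building T_train amounts to element-wise summation followed by row normalization. A plain-Python sketch, assuming each per-slot ground-truth matrix is given as a list of lists of counts; the handling of rows that never saw a visitor is our assumption.

```python
def train_transition_matrix(count_matrices):
    """Sum per-slot ground-truth count matrices, then normalize each row.

    count_matrices -- list of equally shaped lists of lists, one per pair of
                      consecutive slots (rows = 'from', columns = 'to').
    Rows that never saw a visitor are left as zeros; how the paper handles
    such rows is not stated, so this is an assumption.
    """
    n_rows, n_cols = len(count_matrices[0]), len(count_matrices[0][0])
    total = [[0] * n_cols for _ in range(n_rows)]
    for m in count_matrices:
        for i in range(n_rows):
            for j in range(n_cols):
                total[i][j] += m[i][j]
    t_train = []
    for row in total:
        s = sum(row)
        t_train.append([v / s for v in row] if s else [0.0] * n_cols)
    return t_train


# Two toy slot pairs over two states (made-up counts).
t_train = train_transition_matrix([[[3, 1], [0, 2]], [[1, 1], [2, 0]]])
print(t_train)
```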
Thus in the rst step,
SV EN
is multiplied by the transition matrix
Ttr a in
to generate the expected status vector
SV E
N.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 2, No. 3, Article 115. Publication date: September 2018.
115:14 H. Hong et al.
5321
A B C D
SVEN
1244
SVEN+1
0.5 0.2 0.1 0.2
0.2 0.1 0.5 0.2
0.2 0.4 0.3 0.1
0.1 0.5 0.3 0.1
Probability Transition Matrix Ttrain
First Conversion: SVEN * Ttrain
3332
SVE’N
2 1 0 1
0 0 1 0
0 0 0 0
0 0 0 0
Second Conversion:
-2 -1 1 2
DiffN
0 -1 1 0
0000
1 1 0 3
0 0 1 0
0 0 0 0
0 0 0 0
1 1 0 3
0 0 2 0
0 0 0 0
0 0 0 0
Fig. 9. A simple example for two-step conversion
SV E
N=SV ENTt r a in (8)
As the current occupancy can differ from the average behavior, in the second step we adjust the expected result so that it stays close to SVE'_N while removing its difference from the observed SVE_N+1. For ease of processing, we calculate the difference vector Diff_N by subtracting SVE'_N from SVE_N+1:

$$
Diff_N = SVE_{N+1} - SVE'_N \tag{9}
$$

Each negative value in Diff_N indicates that a certain number of visitors move from that location to other locations, and each positive value indicates that the location attracts visitors from other locations. Thus, for each negative value, we search for available positive values in Diff_N to fill the hole. Based on the transition probability, we assign visitors to move to the corresponding locations until Diff_N becomes a vector of all zeros. This sequence of adjustments completes the second step.
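The second step is described only in prose, so the following is one plausible reading rather than the authors' implementation. Step 1 distributes SVE_N through T_train and rounds to integers (the largest-remainder rounding is our assumption); step 2 repeatedly moves one visitor from an over-filled state to an under-filled one, preferring the move with the highest T_train probability (the greedy order and tie-breaking are also assumptions). The example reuses the four-location vectors and matrix from the Figure 9 example; intermediate values need not match the figure, since its exact rounding rule is not given.

```python
def two_step_conversion(sve_now, sve_next, t_train):
    """Hypothetical sketch of the two-step conversion of Section 6.3.

    sve_now, sve_next -- integer count vectors over the same states, with
                         equal totals (the In/Out states make this hold)
    t_train           -- row-stochastic matrix learned from Stable MAC data
    Returns an integer matrix X whose rows sum to sve_now and whose columns
    sum to sve_next.
    """
    n = len(sve_now)
    # Step 1 (Eq. 8): spread each row's count according to T_train; round
    # with the largest-remainder method so each row still sums to sve_now[i].
    X = []
    for i in range(n):
        raw = [sve_now[i] * p for p in t_train[i]]
        row = [int(v) for v in raw]
        order = sorted(range(n), key=lambda j: raw[j] - row[j], reverse=True)
        for j in order[: sve_now[i] - sum(row)]:
            row[j] += 1
        X.append(row)
    expected = [sum(X[i][j] for i in range(n)) for j in range(n)]  # SVE'_N
    diff = [sve_next[j] - expected[j] for j in range(n)]            # Eq. (9)

    # Step 2: while some state received too many visitors (diff < 0),
    # move one of them to a state that received too few (diff > 0), choosing
    # the (source row, target) pair with the highest T_train probability.
    while any(d < 0 for d in diff):
        over = next(j for j in range(n) if diff[j] < 0)
        _, i, under = max(
            (t_train[i][k], i, k)
            for i in range(n) if X[i][over] > 0
            for k in range(n) if diff[k] > 0
        )
        X[i][over] -= 1
        X[i][under] += 1
        diff[over] += 1
        diff[under] -= 1
    return X


t_train = [
    [0.5, 0.2, 0.1, 0.2],
    [0.2, 0.1, 0.5, 0.2],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.5, 0.3, 0.1],
]
X = two_step_conversion([5, 3, 2, 1], [1, 2, 4, 4], t_train)
for row in X:
    print(row)
```

Whatever the rounding details, the result keeps each row sum equal to SVE_N and each column sum equal to SVE_N+1, so it satisfies the conservation constraints of Section 6.2.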
The nal transition matrix gives an estimation of the crowd movement for people bringing devices with
randomized MAC during this period. Figure 9gives a simple example of the two-step conversion calculation with
only four locations included without In and Out state for ease of explanation. Summing up the matrix for both
Stable MAC and Short-lived Randomized MAC data set, we have an overview of the crowd movement for visitors.
7 EVALUATION
The evaluation has three parts. First, we evaluate the accuracy of trajectory inference using Stable MAC data. We then present the results of inferring the movement of devices with random MAC addresses using the movement patterns of devices with Stable MAC addresses. Finally, we present some interesting findings on crowd movement statistics in the museum.
Fig. 10. Path generation with modified phones for TR1 (inferred trajectories of visitors X1 to X4 plotted against the ground truth, 13:30 to 14:30)
7.1 Accuracy of Trajectory Inference
Two parameters are required for inference. First, we need the minimum stay time τ_min that a visitor must spend in a location for the system to detect the movement. This duration depends on how frequently each device transmits probe frames. Based on the measurements presented earlier, the parameter is set to 5 min.
The second parameter, τ_threshold, is the stay duration at which a visitor has an equal chance to stay or leave. We infer its value from the average stay duration in each location. Since locations differ in area, τ_threshold can vary considerably across them. The measured values of τ_threshold are {A: 18 min, B: 9 min, C: 16 min, D: 13 min, E: 13 min, F: 10 min, G: 32 min, H: 13 min, I: 13 min}.
7.1.1 Ground Truth Collection. To verify the accuracy of trajectory inference, we organized three ground-truth collection tours of the museum; their details are listed in Table 5. In TR1, four people carrying four different phones walked a predefined route. All phones in this tour were modified to emit more probe request frames with their globally unique MAC addresses, and the participants recorded the timestamps at each location. In TR2, we followed a one-hour guided tour with 11 other adult visitors (16 people in total, including the tour guide). All visitors used their own unmodified mobile devices with WiFi switched on. In TR3, we joined a similar guided tour with more young people.
Table 5. Details of the three ground-truth collection tours
Name  Number of People  Young  Old  Identified  Visiting Route  Time
TR1   4                 4      0    4           ACDHDBAIAEFA    1h 30min
TR2   16                7      9    9           ACDH            55min
TR3   18                13     5    13          ACDBAEF         1h 8min
7.1.2 Trajectory Inference Results with Probe Frequency. We show only the results of the first two tours, as the result of the third is similar. The inferred paths for TR1 are shown in Figure 10, with the ground truth plotted at the bottom. The inferred trajectories are quite accurate, with only minor errors: the start and end times for each location differ from the ground truth by at most 3 minutes. This high accuracy is due to the modified phones, which transmit probes at a high frequency.
Fig. 11. Path generation with unmodified phones for TR2 (inferred trajectories of visitors X1, X2, X5, and X6 plotted against the ground truth route A C D H, 14:00 to 15:00)
Figure 11 shows the ground truth and our inferred trajectories for four of the visitors in TR2. The guided tour ended after location H; we include the full traces of the four visitors, which show their personal choices after the guide ended the tour. Compared with Figure 10, only X6 yields a full trajectory without any gap. The trajectories for visitors X1 and X2 both miss location C, possibly because they stayed there only briefly. In visitor X2’s trajectory, there is a period of around 20 minutes without any probe request, during which we cannot determine the location. While we could guess that the visitor remained in location H, such an assumption may lead to a large error, so we leave that period as unknown.
7.1.3 Trajectory Inference Accuracy. We define several metrics to measure the accuracy of trajectory inference:
False Positive: the ratio of locations identified by the algorithm in the trajectories that are not present in the ground-truth trajectories.
Location Recall: the number of correctly derived locations in the trajectory divided by the total number of locations in the ground truth.
Time Length Accuracy: the accuracy of the estimated time length.
Start and End Time Error: the shift in start time and end time for each location identified in the trajectory.
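The False Positive ratio, Location Recall, and start/end time errors can be computed as follows. This is a sketch with assumed matching rules (stated in the docstrings); Time Length Accuracy is left out because the text does not define it precisely. The example sequences are made up.

```python
from collections import Counter


def location_metrics(inferred, truth):
    """False-positive ratio and location recall for one trajectory.

    inferred, truth -- collapsed location sequences such as "ACDH".
    Locations are matched as multisets; whether the paper also considers
    visit order is not stated, so this matching rule is an assumption.
    """
    matched = sum((Counter(inferred) & Counter(truth)).values())
    false_positive = (len(inferred) - matched) / len(inferred)
    recall = matched / len(truth)
    return false_positive, recall


def time_errors(inferred_visits, truth_visits):
    """Mean absolute start and end shift over locations found in both lists.

    *_visits -- lists of (location, start_min, end_min); the k-th visit to a
    location in one list is paired with the k-th visit in the other.
    Returns None when no visit can be paired.
    """
    def index(visits):
        count, out = Counter(), {}
        for loc, start, end in visits:
            out[(loc, count[loc])] = (start, end)
            count[loc] += 1
        return out

    inf, tru = index(inferred_visits), index(truth_visits)
    common = inf.keys() & tru.keys()
    if not common:
        return None
    start = sum(abs(inf[k][0] - tru[k][0]) for k in common) / len(common)
    end = sum(abs(inf[k][1] - tru[k][1]) for k in common) / len(common)
    return start, end


print(location_metrics("ADHA", "ACDH"))  # (0.25, 0.75)
print(time_errors([("A", 0, 10), ("C", 12, 20)],
                  [("A", 0, 8), ("C", 9, 22), ("D", 23, 30)]))  # (1.5, 2.0)
```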
Based on the results of the three tours, we identified 26 trajectories and used them to calculate the accuracy of trajectory inference. We compare three approaches for deriving trajectories:
FP uses only the WiFi fingerprinting method for localization and derives the trajectories directly from these locations.
HMM is similar to CrowdProbe but ignores the movement pattern of visitors inside the museum; that is, the transition probability is set to be the same for all locations.
CrowdProbe is the proposed method.
The results are shown in Figure 12 and Figure 13. CrowdProbe attains a low false-positive ratio of about 0.14, compared with 0.23 for FP and 0.18 for HMM. While CrowdProbe has a recall similar to that of the HMM method, its time length estimation accuracy is much higher, at 0.94. The start and end time errors are around 5 minutes for FP and 3 minutes for HMM; CrowdProbe reduces them to around 2.5 minutes. The improvement comes from reducing the jitter in the handover areas.
Fig. 12. Trajectory generation performance (ratio for False Positive / Recall / Total Time Length: FP 0.23 / 0.82 / 0.72; HMM 0.18 / 0.91 / 0.82; CrowdProbe 0.14 / 0.92 / 0.94)
Fig. 13. Time stamp estimation performance (error in minutes, Start Time Error / End Time Error: FP 5.3 / 4.9; HMM 3.1 / 3.8; CrowdProbe 2.4 / 2.6)
7.2 Short-lived Randomized Device One-hop Transition Evaluation
7.2.1 Time Slicing. We need to choose a time slot length that is long enough to collect sufficient data but short enough not to lose too much information about crowd movement. The basic requirement is that a device with a randomized MAC address sends at least one probe request in each time slot, but not multiple probe frames with different MAC addresses. This value is therefore determined by two factors: the lifetime of a randomized MAC address and the probe frequency.
The only hint we found about the lifetime of randomized MAC addresses is in the configuration file wpa_supplicant.conf used by Android, Windows, and OS client stations, which sets rand_addr_lifetime = 60 (seconds) [20]. That means two different randomized addresses are unlikely to be emitted by the same device within 1 minute, so we can safely set the time slot length to more than 1 minute. From Figure 1, we can see that most devices send at least one probe frame within a 5-minute time slot. With a larger value such as 10 minutes, we are likely to include multiple samples from the same device in each time slot, which may introduce errors into the status vector. With a much smaller value such as 1 minute, we may not receive any probes from many of the devices.
A device may transmit more than one probe frame in the same 5-minute slot. If its (randomized) MAC address remains the same, this is not a problem. However, if the MAC address changes within a single time slot, the same device may be counted as several devices, and we overestimate the number of users. Hence, a relatively short interval of 5 minutes also limits the amount of overestimation due to duplicates.
7.2.2 Evaluation Method. Even though we can derive the transition matrix for randomized devices, we cannot verify the result, because we have no ground truth for the short-lived randomized MAC data. We therefore use the data from Stable MAC devices to check the performance of our approach. The flow of the evaluation is given in Figure 14. We feed the status vectors SVE_N and SVE_N+1 of the Stable MAC devices into the algorithm and obtain the resulting transition matrix. With the Stable MAC data set, we can easily derive the ground-truth transition matrix. We use the following metric to measure the performance of the short-lived MAC device one-hop transition inference:
Transition Accuracy: the number of correct transitions in the transition matrix for the Stable MAC data divided by the total number of transitions in the ground truth. Note that A→A (staying) also counts as one transition.
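A sketch of the Transition Accuracy metric, assuming that each cell of the inferred matrix earns min(inferred, ground truth) correct transitions. The matching rule is not spelled out in the text, so treat this as one possible reading; the function name is hypothetical.

```python
def transition_accuracy(predicted, truth):
    """Share of ground-truth transitions reproduced by an inferred matrix.

    predicted, truth -- square count matrices (lists of lists) over the same
    states, e.g. from the two-step conversion and from Stable MAC ground
    truth. Diagonal entries (staying, e.g. A->A) count as transitions.
    A cell contributes min(predicted, truth) correct transitions; this
    overlap rule is an assumption about how the paper counts matches.
    Returns None if the ground truth contains no transitions.
    """
    correct = sum(
        min(p, t)
        for p_row, t_row in zip(predicted, truth)
        for p, t in zip(p_row, t_row)
    )
    total = sum(sum(row) for row in truth)
    return correct / total if total else None


print(transition_accuracy([[3, 0], [0, 2]], [[2, 1], [0, 2]]))  # 0.8
```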
Fig. 14. Demonstration of our evaluation method for one-hop transition inference (status vectors from passive scanning data for Stable MAC and short-lived MAC devices are converted with the transition probability matrix; the short-lived result has no ground truth, so the method is evaluated by comparing the matrix inferred for Stable MAC data with the Stable MAC ground truth)
Fig. 15. Randomized trajectory inference accuracy with non-randomized data testing result (CDF of accuracy from 0.5 to 1 for July, September, and October)
7.2.3 Results for Short-lived Randomized Device Transitions. By comparing the ground-truth matrix with the one we derive, we can estimate the performance of our short-lived transition inference method. For every two consecutive time slots, we run the evaluation method to verify the effectiveness of the short-lived randomized device one-hop transition inference. Figure 15 plots the CDF of the transition accuracy for July, September, and October 2017. The average accuracy for the three months is 0.80, 0.81, and 0.77, respectively; that is, we correctly infer about 4 out of every 5 transitions. Considering the difficulty of tracking devices with randomized MAC addresses, the accuracy is better than we expected.
Table 6 summarizes the information we can obtain from passive scanning. If devices use Stable MAC addresses, we can derive a lot of information about crowd movement. For devices with randomized MAC addresses, we can derive short-term movement if we supplement the data with statistics from Stable MAC devices in the same venue. However, if only short-lived randomized MAC data are available, we can only count occupancy in each time slot.
Table 6. Information we can obtain from passive scanning
Feature                                     Counting  R_In and R_Out  One-hop Transition  Trajectory Inference  Stay Length Estimation
Stable MAC                                  ✓         ✓               ✓                   ✓                     ✓
Short-Lived MAC with Stable MAC statistics  ✓         ✓               ✓                   ×                     ×
Short-Lived MAC Only                        ✓         ✓               ×                   ×                     ×
Fig. 16. The arrows and their widths represent visitors’ flows between different locations (from 1 percent to 8 percent of movement) across Levels 1 to 3, with monitor locations 1 to 10, location labels A to I, and the entrance marked. The most frequent path is shown in green.
7.3 Findings from Museum Statistics
With the trajectory and transition inference algorithms, we share our findings on visitors’ movement patterns from the museum data. In August 2017, the layout of the museum changed because some artwork was replaced and exhibition location G was blocked for re-installation, so our analysis also covers the impact of these changes.
Although each visitor’s path selection is affected by personal choice, at the macro level the spatial distribution of paths should result from the interplay between the locations of the monitors and the spatial layout of the museum. Figure 16 shows the spatial distribution of visitors in the museum. Most visitors follow the route ACDGHFEA, and a smaller set takes the reverse route AEFGHDCA. Both paths begin and end at location A, the main entrance of the museum. Among all sub-paths, EFG and GFE appear in 35% and 29% of all visitor trajectories, respectively, because of the linear layout of the second floor. The number of visitors who take ACD is twice the number who take ABD. Only 18% of visitors actually visit exhibition location B, and 73% skip exhibition location H, which is located deep inside the museum on the third floor. CrowdProbe enables us to analyze such visitor patterns without labor-intensive surveys.
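Sub-path and location-visit shares like those reported above reduce to substring tests over collapsed trajectories. A minimal sketch with a hypothetical helper name; the four trips are made up (two of them reuse the routes named in the text), so the numbers are not the museum's.

```python
def share(trajectories, pattern):
    """Fraction of collapsed trajectories containing `pattern` contiguously.

    A single-letter pattern gives the share of visitors who visited that
    location at all.
    """
    return sum(pattern in t for t in trajectories) / len(trajectories)


# Hypothetical collapsed trajectories (not the museum data).
trips = ["ACDGHFEA", "AEFGHDCA", "ACDA", "ABDA"]
print(share(trips, "EFG"), share(trips, "ACD"), share(trips, "ABD"))
print("skip H:", 1 - share(trips, "H"))
```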
Fig. 17. Number of exhibition locations visited by visitors (ratio vs. number of locations visited, 1 to 9; July vs. Sep to Dec)
Fig. 18. Average staying time for each exhibition location (time length in minutes for locations A to I; July vs. Sep to Dec)
Figure 17 shows the distribution of the number of exhibition locations visited. 80% of visitors took a tour covering 3 to 6 locations, and very few visited the whole museum. After the re-installation started in August, the average number of locations visited decreased slightly. Figure 18 gives the average staying time for each location. The duration usually ranges from 10 to 20 minutes and is roughly proportional to the area of each location. With the change in August, the time previously spent in location G was redistributed to other exhibition locations, increasing the stay time in almost all other areas.
Fig. 19. Distribution of total time length spent in the museum (ratio per visiting-time bin: under 30 min, 0.5 to 1 h, 1 to 1.5 h, 1.5 to 2 h, 2 to 2.5 h, 2.5 to 3 h, over 3 h, plus the average in minutes; July vs. Sep to Dec)
Fig. 20. Visitor stay time length vs. entry time (stay time in minutes against entry time from 10:00 to 21:00, for weekdays, Fridays, and weekends)
Figure 19 shows the distribution of the total time visitors spent in the museum. 63% and 66% of visitors spent 0.5 to 2 hours at the museum in July and in Sep to Dec, respectively. Only five percent of visitors spent more than 3 hours in the museum. After access to location G was blocked, the average time in the museum dropped slightly from 84 min to 79 min. From Figure 20, we can conclude that visitors stay for shorter periods as the museum's closing time approaches (21:00 on Fridays, 19:00 otherwise). In the morning, around 11 am, visitors tend to stay for less time, possibly because of the approaching lunchtime. In the afternoon, the duration is relatively stable and starts decreasing two hours before closing time.
8 RELATED WORK
Research work on passive tracking can generally be divided into two categories: device-free passive tracking and
device-based passive tracking.
8.1 Device-free Passive Tracking
Radio-based device-free passive tracking relies on the fact that the presence of a human body in an RF environment affects the RF signals, especially in the 2.4 GHz and 5 GHz bands common in WiFi networks. A typical deployment includes signal transmitters and monitoring points. During the training phase, RSS information is collected under different conditions. Later, in the testing phase, the observed RSS fingerprint is matched against the database to infer the number of people and their locations. While this technology is still used only in controlled experimental settings, some research already shows its potential. Nuzzer [28] used a probabilistic approach to handle device-free passive localization of a single intruder. E-eye [34] uses channel state information to identify and distinguish in-home activities. Ichnaea [27] shortened the training period and applied statistical anomaly detection and particle filtering to provide localization capabilities.
Device-free passive tracking usually requires tedious training and can only track a limited number of people. Moreover, if the environment changes, the radio database needs to be retrained and adjusted to fit the new conditions. This limitation hinders wider deployment of device-free passive tracking technology.
8.2 Device-based Passive Tracking
Device-based passive tracking aims to track devices carried by users, especially smartphones. Early work [18] used RFID to estimate visitor positions, visiting patterns, and inter-human relationships in a science museum. Recent work [35] uses Bluetooth to monitor visitors’ length of stay at the Louvre. However, the experiment covered only 8.2% of visitors, which limits the credibility and practicality of its conclusions. Owing to the widespread deployment of WiFi networks and the popularity of smartphones, using WiFi-related information to infer individual information has become both popular and effective. Researchers have proposed a series of ideas that exploit the availability of probe information from mobile devices to track individuals. Musa [23] also used an HMM-based method to estimate smartphone trajectories, similar to our work. However, their system is designed for outdoor road conditions, where vehicles have fixed moving directions and the required granularity is lower than in a complex museum.
Beyond merely tracking location, other work focuses on revealing user relationships, for example by using the list of known SSIDs in probe requests as a fingerprint to decide whether two people are socially linked [3, 7]. A similar method generates spatial-temporal similarity from users’ co-occurrence frequency to infer relationships between them [5, 15]. Di Luzio et al. [8] exploited WiFi probe requests to de-anonymize the origin of participants in large events. To combat such information leakage, major mobile phone vendors introduced MAC randomization and encouraged devices to send probe frames with an empty (unknown) SSID list [16].
After the introduction of MAC randomization, researchers focused on de-anonymizing WiFi frames. Freudiger [13] attempted to use sequence numbers and timing information to link randomized probe messages. Vanhoef [31] used information elements (IE) and the scrambler seeds of the physical layer to track users. Martin [21] took a more aggressive approach, using a control frame attack to expose the globally unique MAC address.
Compared with previous work, CrowdProbe is deployed in a complex indoor environment. We provide a non-invasive method to reveal crowd movement regardless of phone vendor or OS version.
9 DISCUSSION
In our measurements, about 60% of the devices randomized their MAC addresses. As more vendors act to protect user privacy, this ratio will continue to increase. While this trend presents a challenge for CrowdProbe, we would like to highlight that CrowdProbe works as long as there are sufficient statistics from devices that broadcast frames with Stable MAC addresses. We believe this will remain the case for the following reasons. First, a device associated with a WiFi network reverts to its globally unique MAC address [21]. Free WiFi access is often available in public spaces, and some visitors can be expected to connect to it for Internet access. Thus, sufficient statistics with globally unique MAC addresses can still be collected, even if this takes more time. Second, as shown in Figure 5, some devices do randomize their MAC addresses but keep the same randomized address for a sufficiently long duration, up to hours. Such data can be used to infer the transition probability without linking each MAC address to a specific user device. Lastly, we also sniff NULL data frames. These frames are used for power management and do not use randomized MAC addresses, since the current randomization scheme applies only to active scanning by the mobile device. Based on the above, CrowdProbe can continue to collect enough data to infer the transition pattern and the one-hop transitions of devices with randomized MAC addresses.
While CrowdProbe has only been deployed and tested in a museum, the technique could potentially be used in other environments such as shopping malls and transportation hubs. For trajectory inference, all parameters are derived from the data collected at the site, so our algorithm can run in different scenarios as long as sufficient data can be collected.
The sparse transmission of frames limits the accuracy of crowd monitoring, as can be seen from the performance gap between Figure 10 and Figure 11. To elicit more frame transmissions, the authors of [23] propose emulating the SSID of popular or previously visited APs. This technique could also be integrated into our system. However, it triggers the WiFi interface of the mobile device, which interrupts existing connections and drains the battery faster.
10 CONCLUSION
In this paper, we propose an HMM-based visitor trajectory inference method based on passive WiFi monitoring. Moreover, we use the transition probabilities derived from existing trajectories to infer the likely one-hop movement of devices with short-lived randomized MAC addresses. The deployment and evaluation in a multi-floor museum demonstrate the feasibility of the proposed system. We believe that CrowdProbe can also be used in other scenarios. While no single model fits all applications, the experience and lessons from this case study will help bridge research and practice.
REFERENCES
[1] Lada A Adamic and Eytan Adar. 2003. Friends and neighbors on the web. Social networks 25, 3 (2003), 211–230.
[2] Jamal Jokar Arsanjani, Wolfgang Kainz, and Ali Jafar Mousivand. 2011. Tracking dynamic land-use change using spatially explicit Markov Chain based on cellular automata: the case of Tehran. International Journal of Image and Data Fusion 2, 4 (2011), 329–345.
[3] Marco V Barbera, Alessandro Epasto, Alessandro Mei, Vasile C Perta, and Julinda Stefa. 2013. Signals from the crowd: uncovering social relationships through smartphone probes. In IMC. ACM, 265–276.
[4] Ben Benfold and Ian Reid. 2011. Stable multi-target tracking in real-time surveillance video. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 3457–3464.
[5] Ningning Cheng, Prasant Mohapatra, Mathieu Cunche, Mohamed Ali Kaafar, Roksana Boreli, and Srikanth Krishnamurthy. 2012. Inferring user relationship from hidden information in WLANs. In Military Communications Conference (MILCOM 2012). IEEE, 1–6.
[6] Tom Chothia and Vitaliy Smirnov. 2010. A Traceability Attack against e-Passports. In Financial Cryptography, Vol. 6052. Springer, 20–34.
[7] Mathieu Cunche, Mohamed Ali Kaafar, and Roksana Boreli. 2012. I know who you will meet this evening! Linking wireless devices using Wi-Fi probe requests. In WoWMoM. IEEE, 1–9.
[8] Adriano Di Luzio, Alessandro Mei, and Julinda Stefa. 2016. Mind your probes: De-anonymization of large crowds through smartphone WiFi probe requests. In Computer Communications, IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on. IEEE, 1–9.
[9] Arnaud Doucet, Nando De Freitas, Kevin Murphy, and Stuart Russell. 2000. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 176–183.
[10] Sean R Eddy. 1996. Hidden markov models. Current opinion in structural biology 6, 3 (1996), 361–365.
[11] Samsung Electronics. 2013. SAR evaluation report. In SAR evaluation report. Samsung Electronics, 3–4.
[12] G David Forney. 1973. The viterbi algorithm. Proc. IEEE 61, 3 (1973), 268–278.
Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 2, No. 3, Article 115. Publication date: September 2018.
[13] Julien Freudiger. 2015. How talkative is your mobile device?: an experimental study of Wi-Fi probe requests. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks. ACM, 8.
[14] Dan Goodin. 2017. Shielding MAC addresses from stalkers is hard and Android fails miserably at it. https://arstechnica.com/information-technology/2017/03/shielding-mac-addresses-from-stalkers-is-hard-android-is-failing-miserably/. [Online].
[15] Hande Hong, Chengwen Luo, and Mun Choon Chan. 2016. SocialProbe: Understanding Social Interaction Through Passive WiFi Monitoring. In Proceedings of the 13th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. ACM, 94–103.
[16] Xueheng Hu, Lixing Song, Dirk Van Bruggen, and Aaron Striegel. 2015. Is There WiFi Yet? How Aggressive WiFi Probe Requests Deteriorate Energy and Throughput. arXiv preprint arXiv:1502.01222 (2015).
[17] Mohamed Ibrahim and Moustafa Youssef. 2011. A hidden markov model for localization using low-end GSM cell phones. In Communications (ICC), 2011 IEEE International Conference on. IEEE, 1–5.
[18] Takayuki Kanda, Masahiro Shiomi, Laurent Perrin, Tatsuya Nomura, Hiroshi Ishiguro, and Norihiro Hagita. 2007. Analysis of people trajectories with ubiquitous sensors in a science museum. In Robotics and Automation, 2007 IEEE International Conference on. IEEE, 4846–4853.
[19] Thomas Liebig and Armel Ulrich Kemloh Wagoum. 2012. Modelling Microscopic Pedestrian Mobility using Bluetooth. In ICAART (2). 270–275.
[20] Jouni Malinen. 2014. Linux WPA/WPA2/IEEE 802.1X Supplicant. https://w1.fi/wpa_supplicant/. [Online].
[21] Jeremy Martin, Travis Mayberry, Collin Donahue, Lucas Foppe, Lamont Brown, Chadwick Riggins, Erik C Rye, and Dane Brown. 2017. A Study of MAC Address Randomization in Mobile Devices and When it Fails. arXiv preprint arXiv:1703.02874 (2017).
[22] Lyudmila Mihaylova, Paul Brasnett, Nishan Canagarajah, and David Bull. 2007. Object tracking by particle filtering techniques in video sequences. Advances and challenges in multisensor data and information processing 8 (2007), 260–268.
[23] ABM Musa and Jakob Eriksson. 2012. Tracking unmodified smartphones using Wi-Fi monitors. In Proceedings of the 10th ACM conference on embedded network sensor systems. ACM, 281–294.
[24] Nuria M Oliver, Barbara Rosario, and Alex P Pentland. 2000. A Bayesian computer vision system for modeling human interactions. IEEE transactions on pattern analysis and machine intelligence 22, 8 (2000), 831–843.
[25] Oluwatoyin P Popoola and Kejun Wang. 2012. Video-based abnormal human behavior recognition: A review. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 6 (2012), 865–878.
[26] Romer Rosales and Stan Sclaroff. 1999. 3D trajectory recovery for tracking multiple objects and trajectory guided recognition of actions. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., Vol. 2. IEEE, 117–123.
[27] Ahmed Saeed, Ahmed E Kosba, and Moustafa Youssef. 2014. Ichnaea: A low-overhead robust WLAN device-free passive localization system. IEEE Journal of selected topics in signal processing 8, 1 (2014), 5–15.
[28] Moustafa Seifeldin, Ahmed Saeed, Ahmed E Kosba, Amr El-Keyi, and Moustafa Youssef. 2013. Nuzzer: A large-scale device-free passive localization system for wireless environments. IEEE Transactions on Mobile Computing 12, 7 (2013), 1321–1334.
[29] K. Skinner and J. Novak. 2015. Privacy and your app. [Online].
[30] Taffee T Tanimoto. 1958. Elementary mathematical theory of classification and prediction. (1958).
[31] Mathy Vanhoef, Célestin Matte, Mathieu Cunche, Leonardo S Cardoso, and Frank Piessens. 2016. Why MAC address randomization is not enough: An analysis of Wi-Fi network discovery mechanisms. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. ACM, 413–424.
[32] Mathias Versichele, Tijs Neutens, Matthias Delafontaine, and Nico Van de Weghe. 2012. The use of Bluetooth for analysing spatiotemporal dynamics of human movement at mass events: A case study of the Ghent Festivities. Applied Geography 32, 2 (2012), 208–220.
[33] Harald Vogt. 2002. Efficient object identification with passive RFID tags. Pervasive computing (2002), 98–113.
[34] Yan Wang, Jian Liu, Yingying Chen, Marco Gruteser, Jie Yang, and Hongbo Liu. 2014. E-eyes: device-free location-oriented activity identification using fine-grained WiFi signatures. In Proceedings of the 20th annual international conference on Mobile computing and networking. ACM, 617–628.
[35] Yuji Yoshimura, Anne Krebs, and Carlo Ratti. 2017. Noninvasive Bluetooth Monitoring of Visitors' Length of Stay at the Louvre. IEEE Pervasive Computing 16, 2 (2017), 26–34.
Received February 2018; revised July 2018; accepted September 2018
... Numerous frameworks have been introduced to facilitate the prototyping of Wi-Fi tracker systems. These include CrowdProbe [26], mD-Track [27], Widar [28], IndoTrack [29], Beanstalk [30], SenseFlow [31], ARIEL [32], Probr [33], MOBYWIT [34], etc. However, many of these studies lack a comprehensive discussion of all aspects of a Wi-Fi tracking system, including the hardware, software, algorithms, server system, web-based application, and system demonstration. ...
... In comparison to previous studies [18], [21], [26]-[34], this Wi-Fi tracking system offers additional components such as a server system and a user-friendly web-based application. ...
Article
This study aims to design and develop a Wi-Fi tracker system that uses RSSI-based distance parameters for crowd-monitoring applications in indoor settings. The system consists of three main components: 1) an embedded node that runs on a Raspberry Pi Zero W, 2) a real-time localization algorithm, and 3) a server system with an online dashboard. The embedded node scans and collects relevant information from Wi-Fi-connected smartphones, such as MAC addresses, RSSI, timestamps, etc. These data are then transmitted to the server system, where the localization algorithm passively determines the location of devices as long as their Wi-Fi is enabled. The target devices are smartphones, tablets, and laptops, and the algorithm is a nonlinear system approach using the Levenberg–Marquardt method and an Unscented Kalman Filter (UKF). The server and online dashboard (web-based application) serve three functions: displaying and recording device localization results, setting parameters, and visualizing analyzed data. The node hardware was designed for minimum size and portability, giving it the appearance of a consumer electronics product. The system demonstration in this study was conducted to validate its functionality and performance.
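The abstract does not state which RSSI-to-distance model is used. A common choice in RSSI-based ranging is the log-distance path-loss model, d = d0 · 10^((P0 - RSSI) / (10 n)), where P0 is the RSSI at reference distance d0 and n is the path-loss exponent. The sketch below simply inverts that model; the default calibration values are hypothetical and would have to be measured per site. Distances estimated from several nodes would then feed a solver such as Levenberg–Marquardt.

```python
def rssi_to_distance(rssi_dbm, p0_dbm=-40.0, n=2.5, d0_m=1.0):
    """Invert the log-distance path-loss model to estimate distance in metres.

    p0_dbm: RSSI measured at reference distance d0_m (hypothetical calibration)
    n:      path-loss exponent (2 in free space, often higher indoors)
    """
    return d0_m * 10 ** ((p0_dbm - rssi_dbm) / (10.0 * n))


print(rssi_to_distance(-60.0, p0_dbm=-40.0, n=2.0))  # 10.0
```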
... In a manner analogous to SSID profiling, researchers have also investigated the user demographics associated with specific Wi-Fi locations. These investigative methods range from passive scanning techniques [77] to the active extraction of metadata through HTTP accesses. Hong et al. [94] introduce CrowdProbe, a system deployed within a multi-floor museum. This system is designed to achieve a high level of accuracy in understanding the movement patterns of museum visitors who carry devices with RCM. ...
Article
The proliferation of Wi-Fi devices has led to the rise of privacy concerns related to MAC Address-based systems used for people tracking and localization across various applications, such as smart cities, intelligent transportation systems, and marketing. These systems have highlighted the necessity for mobile device manufacturers to implement Randomized And Changing MAC address (RCM) techniques as a countermeasure for device identification. In response to the challenges posed by diverse RCM implementations, the IEEE has taken steps to standardize RCM operations through the 802.11aq Task Group (TG). However, while RCM implementation addresses some concerns, it can disrupt services that span both Layer 2 and upper-layers, which were originally designed assuming static MAC addresses. To address these challenges, the IEEE has established the 802.11bh TG, focusing on defining new device identification methods, particularly for Layer 2 services that require pre-association identification. Simultaneously, the IETF launched the MAC Address Device Identification for Network and Application Services (MADINAS) Working Group to investigate the repercussions of RCM on upper-layer services, including the Dynamic Host Configuration Protocol (DHCP). Concurrently, derandomization techniques have emerged to counteract RCM defense mechanisms. The exploration of these techniques has suggested the need for a broader privacy enhancement framework for WLANs that goes beyond simple MAC address randomization. These findings have prompted the inception of the 802.11bi TG, which aims to compile an exhaustive list of potential privacy vulnerabilities and prerequisites for a more private IEEE 802.11 standard. In this context, this tutorial aims to provide insights into the motivations behind RCM, its implementation, and its evolution over the years. It elucidates the influence of RCM on network processes and services. 
Furthermore, the tutorial delves into the recent progress made within the domains of 802.11bh, 802.11bi, and MADINAS. It offers a thorough analysis of the initial work undertaken by these groups, along with an overview of the relevant research challenges. The tutorial's objective is to inspire the research community to explore innovative approaches and solutions that contribute to the ongoing efforts to enhance WLAN privacy through standardization initiatives.
... Similarly, another framework named CrowdProbe was proposed to monitor crowds in indoor settings. It uses a hidden Markov model to extract the trajectories of individuals from Wi-Fi monitor data [21]. Another relevant work, which uses dynamic positions and events to manage large crowds according to capacity and avoid casualties, was proposed by [22]. ...
Article
The management of large events with hundreds of thousands of individuals has remained a challenge over the years. Crushes and stampedes at mass gatherings have claimed many lives around the world. Given the substantial advances in positional tracking, wearable technology, and wireless communication, many event organizers are adopting these technologies to help manage large events. Intelligent monitoring of crowd movement and timely analysis of evolving conditions may aid in the early detection of critical situations. The current research aims to propose a big data resource framework to model, simulate, and visualize crowd conditions for actual venue settings. A distributed framework is presented to monitor the movement and interaction of individuals in large crowded events through localized sensing and geospatial analysis of massive positional data. The pilgrimage (Hajj), an annual event involving more than a million people, is used as a case study to demonstrate the effectiveness of the proposed framework. The proposed framework has been evaluated with synthetic data covering several useful and frequent scenarios based on this case study.
... Second, the wireless-based (or specifically WiFi-based) approach infers crowd properties by leveraging the relationship between wireless signals and crowd states [11], [12]. Among these, the WiFi channel state information (CSI)-based approach can achieve relatively high accuracy in crowd counting [13], [14], but incurs high costs in obtaining CSI data, a small deployment space, and limited scalability; in contrast, the passive WiFi sensing-based approach, which employs WiFi sniffers to passively sense nearby pedestrians by capturing probe frames sent by their mobile devices, is the most promising owing to its low cost, large coverage, and strong scalability [15], [16], [17], and has been validated as applicable [18], [19] even though only limited accuracy can be obtained. ...
Preprint
Regarding passive WiFi sensing-based crowd analysis, this paper first theoretically investigates its limitations, and then proposes a deep learning-based scheme for returning fine-grained crowd states in large surveillance areas. To this end, three key challenges are addressed: to relieve the influence of the randomness and sparsity induced by passive WiFi sensing, an attention-based deep convolutional autoencoder model is designed to recover accurate crowd density maps in a way similar to image reconstruction; to combat the anonymity caused by MAC randomization, local high-density crowds (LHDCs) are first identified with the density clustering algorithm DM-DBSCAN, and a bidirectional convolutional LSTM-based model is then employed to infer LHDC speeds; to overcome the absence of passive WiFi sensing datasets for model training, three semi-synthetic datasets are produced by emulating passive WiFi sensing with practical pedestrian tracking datasets. Extensive experiments confirm that the proposed scheme significantly outperforms existing WiFi-based methods in crowd density estimation and provides superior crowd speed estimation. More importantly, the scheme can also produce consistent crowd states on a real-world dataset, revealing that it can support accurate, visualized, and real-time crowd monitoring in large surveillance areas.
... An example is the de-anonymization result of two political meetings held a few days before the elections in Italy, which surprisingly coincides with the reported official voting results. In [118], the authors propose using passive monitors to capture WiFi evidence in a museum and extract information about the behavior of its visitors. More than 1.7 million probes were collected during the six months of capture. ...
Article
Smart cities, leveraging IoT technologies, are revolutionizing the quality of life for citizens. However, the massive data generated in these cities also poses significant privacy risks, particularly in de-anonymization and re-identification. This survey focuses on the privacy concerns and commonly used techniques for data protection in smart cities, specifically addressing geolocation data and video surveillance. We categorize the attacks into linking, predictive and inference, and side-channel attacks. Furthermore, we examine the most widely employed de-identification and anonymization techniques, highlighting privacy-preserving techniques and anonymization tools; while these methods can reduce the privacy risks, they are not enough to address all the challenges. In addition, we argue that de-identification must involve properties such as unlinkability, selective disclosure and self-sovereignty. This paper concludes by outlining future research challenges in achieving complete de-identification in smart cities.
... It is based on power measurements and PRs; however, it does not describe the MAC randomization procedure used. The solution for understanding the behavior of visitors in [17] addresses MAC address randomization by using more than 1.7 million PR frames, historical transition probabilities, and a Hidden Markov Model (HMM)-based trajectory inference algorithm. The group behavior detection system introduced in [18] is also based on capturing PRs and tackles MAC address randomization with the method described in [7], based on collective matrix factorization, which reveals the hidden associations by factorizing mobility information and usage patterns simultaneously. ...
Article
Monitoring the presence and movements of individuals or crowds in a given area can provide valuable insight into actual behavior patterns and hidden trends. Therefore, it is crucial in areas such as public safety, transportation, urban planning, disaster and crisis management, and mass events organization, both for the adoption of appropriate policies and measures and for the development of advanced services and applications. In this paper, we propose a non-intrusive privacy-preserving detection of people’s presence and movement patterns by tracking their carried WiFi-enabled personal devices, using the network management messages transmitted by these devices for their association with the available networks. However, due to privacy regulations, various randomization schemes have been implemented in network management messages to prevent easy discrimination between devices based on their addresses, sequence numbers of messages, data fields, and the amount of data contained in the messages. To this end, we proposed a novel de-randomization method that detects individual devices by grouping similar network management messages and corresponding radio channel characteristics using a novel clustering and matching procedure. The proposed method was first calibrated using a labeled publicly available dataset, which was validated by measurements in a controlled rural and a semi-controlled indoor environment, and finally tested in terms of scalability and accuracy in an uncontrolled crowded urban environment. The results show that the proposed de-randomization method is able to correctly detect more than 96% of the devices from the rural and indoor datasets when validated separately for each device. When the devices are grouped, the accuracy of the method decreases but is still above 70% for rural environments and 80% for indoor environments. 
The final verification of the non-intrusive, low-cost solution for analyzing the presence and movement patterns of people, which also provides information on clustered data that can be used to analyze the movements of individuals, in an urban environment confirmed the accuracy, scalability and robustness of the method. However, it also revealed some drawbacks in terms of exponential computational complexity and determination and fine-tuning of method parameters, which require further optimization and automation.
Article
The needs of tourists have become diversified in recent years, and providing tourist information can help satisfy these various needs. Although such information should reflect actual travel behavior, it is not easy to collect data on how tourists visit tourist attractions. Nowadays, it is not uncommon to see tourists looking at their smartphone, searching for information about where to go next. If a recommendation system could suggest the next destination, more tourists would appreciate the information and visit there. Therefore, we envisioned a recommendation system based on records of tourist movements from Wi-Fi packet sensor data. The system would provide tourists with relevant information about the recommended locations. The aim of this research was to investigate the computation method for the recommendations, which is necessary to demonstrate this idea. We also compared the results and characteristics of recommendations made based on cosine similarity. The results showed that, when using Wi-Fi data without cleaning, some tourist spots were strongly recommended because of bias in the data observations. Therefore, a dataset was also used that was restricted to tourists who visited three or more places. As a result of this restriction, the accuracy of the recommendations was improved. Furthermore, we made a recommendation based on a dummy variable that represented whether tourists had visited each location, which enabled recommendations to be generated for locations where it was difficult to observe data.
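The cosine-similarity recommendation described above can be sketched in an item-based form. This is a minimal illustration, not the study's code: it assumes each location is represented by a 0/1 vector over tourists (1 = the tourist was observed there, as in the dummy-variable variant) and includes the "three or more places" filter. Location names and data are invented.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def filter_tourists(visits, min_places=3):
    """Keep only tourists (vector positions) observed at >= min_places locations."""
    n = len(next(iter(visits.values())))
    keep = [t for t in range(n) if sum(vec[t] for vec in visits.values()) >= min_places]
    return {loc: [vec[t] for t in keep] for loc, vec in visits.items()}


def recommend_next(current, visits):
    """Recommend the other location whose visitor vector is most similar."""
    others = [loc for loc in visits if loc != current]
    return max(others, key=lambda loc: cosine(visits[current], visits[loc]))


# Rows: locations; columns: four hypothetical tourists.
visits = {
    "castle": [1, 1, 0, 1],
    "shrine": [1, 1, 0, 0],
    "market": [1, 0, 1, 0],
    "garden": [0, 1, 1, 1],
}
print(recommend_next("castle", filter_tourists(visits, min_places=3)))  # shrine
```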
Article
Media Access Control (MAC) address randomization is a privacy technique whereby mobile devices rotate through random hardware addresses in order to prevent observers from singling out their traffic or physical location from other nearby devices. Adoption of this technology, however, has been sporadic and varied across device manufacturers. In this paper, we present the first wide-scale study of MAC address randomization in the wild, including a detailed breakdown of different randomization techniques by operating system, manufacturer, and model of device. We then identify multiple flaws in these implementations which can be exploited to defeat randomization as performed by existing devices. First, we show that devices commonly make improper use of randomization by sending wireless frames with the true, global address when they should be using a randomized address. We move on to extend the passive identification techniques of Vanhoef et al. to effectively defeat randomization in ~96% of Android phones. Finally, we identify a previously unknown flaw in the way wireless chipsets handle low-level control frames which applies to 100% of devices we tested. This flaw permits an active attack that can be used under certain circumstances to track any existing wireless device.
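Analyses of randomization typically begin by flagging addresses that are likely randomized. A minimal sketch, assuming the common convention that randomized MACs set the locally administered (U/L) bit, i.e. bit 0x02 of the first octet defined by IEEE 802 addressing; this is a heuristic, not proof of randomization, and it says nothing about the flaws studied above.

```python
def is_locally_administered(mac):
    """True if the U/L bit (0x02) of the first octet is set.

    Randomized probe-request MACs conventionally set this bit, so it is a
    useful first filter; it is a heuristic, not proof of randomization.
    """
    first_octet = int(mac.replace("-", ":").split(":")[0], 16)
    return bool(first_octet & 0x02)


for mac in ("da:a1:19:00:00:01", "00:1a:2b:3c:4d:5e", "f6-3b-20-00-00-02"):
    print(mac, is_locally_administered(mac))
```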
Conference Paper
In this paper, we present an approach to extract social behavior and interaction patterns of mobile users by passively monitoring WiFi probe requests and null data frames that are sent by smartphones for network control/management purposes. By analyzing the temporal and spatial correlations of the Received Signal Strength Indicators (RSSI) of packets from these low rate transmissions, we are able to discover proximity relationships, occupancy patterns, and social interactions among users. We evaluate the SocialProbe system using commodity off-the-shelf smartphones and WiFi Access Points in two locations, a research lab and a public dining area. The result shows that the proposed approach is able to obtain reliable social relationships and interactions in a non-intrusive way.
Article
Modeling human behaviors and activity patterns for recognition or detection of special events has attracted significant research interest in recent years. Diverse methods abound for building intelligent vision systems aimed at scene understanding and making correct semantic inferences from the observed dynamics of moving targets. Most applications are in surveillance, video content retrieval, and human-computer interfaces. This paper presents not only an update extending previous related surveys, but also a focus on contextual abnormal human behavior detection, especially in video surveillance applications. The main purpose of this survey is to extensively identify existing methods and characterize the literature in a manner that brings key challenges to attention.
Article
Art museum professionals traditionally rely on observations and surveys to enhance their knowledge of visitor behavior and experience. However, these approaches often produce spatially and temporally limited empirical evidence and measurements. Only recently has the ubiquity of digital technologies revolutionized the ability to collect data about human behavior. Consequently, the greater availability of large-scale datasets based on quantifying visitors' behavior provides new opportunities to apply computational and comparative analytical techniques. In this article, the authors analyze visitor behavior in the Louvre Museum from anonymized longitudinal datasets collected from noninvasive Bluetooth sensors. They examine visitors' length of stay in the museum and consider this relationship with occupation density around artwork. This data analysis increases museum professionals' knowledge and understanding of the visitor experience. This article is part of a special issue on smart cities.
Conference Paper
We present several novel techniques to track (unassociated) mobile devices by abusing features of the Wi-Fi standard. This shows that using random MAC addresses, on its own, does not guarantee privacy. First, we show that information elements in probe requests can be used to fingerprint devices. We then combine these fingerprints with incremental sequence numbers, to create a tracking algorithm that does not rely on unique identifiers such as MAC addresses. Based on real-world datasets, we demonstrate that our algorithm can correctly track as much as 50% of devices for at least 20 minutes. We also show that commodity Wi-Fi devices use predictable scrambler seeds. These can be used to improve the performance of our tracking algorithm. Finally, we present two attacks that reveal the real MAC address of a device, even if MAC address randomization is used. In the first one, we create fake hotspots to induce clients to connect using their real MAC address. The second technique relies on the new 802.11u standard, commonly referred to as Hotspot 2.0, where we show that Linux and Windows send Access Network Query Protocol (ANQP) requests using their real MAC address.
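The combination of fingerprints and incremental sequence numbers can be illustrated with a deliberately simplified sketch; it is not the tracking algorithm of the paper above. It assumes each captured probe request has already been reduced to a fingerprint string (for example a hash of its information elements; the hashing is not shown), a 12-bit 802.11 sequence number, and a timestamp. A probe joins an existing track when the fingerprints match and its sequence number is a small forward step (modulo 4096) from the track's last frame. The thresholds are hypothetical.

```python
from dataclasses import dataclass, field

SEQ_MOD = 4096  # 802.11 sequence numbers are 12 bits wide


@dataclass
class Probe:
    mac: str          # possibly randomized source address
    fingerprint: str  # e.g. hash of the information elements (assumed given)
    seq: int          # 802.11 sequence number
    time: float       # capture time in seconds


@dataclass
class Track:
    probes: list = field(default_factory=list)


def seq_gap(a, b):
    """Forward distance from sequence number a to b, with wraparound."""
    return (b - a) % SEQ_MOD


def link_probes(probes, max_seq_gap=64, max_time_gap=60.0):
    """Greedily group probes into tracks that likely belong to one device."""
    tracks = []
    for p in sorted(probes, key=lambda x: x.time):
        best = None  # (sequence gap, track) of the closest matching track
        for t in tracks:
            last = t.probes[-1]
            if last.fingerprint != p.fingerprint:
                continue
            if p.time - last.time > max_time_gap:
                continue
            gap = seq_gap(last.seq, p.seq)
            if 0 < gap <= max_seq_gap and (best is None or gap < best[0]):
                best = (gap, t)
        if best is None:
            tracks.append(Track([p]))
        else:
            best[1].probes.append(p)
    return tracks


probes = [
    Probe("da:00:00:00:00:01", "X", 4090, 0.0),
    Probe("da:00:00:00:00:02", "X", 3, 10.0),    # new MAC, sequence wrapped
    Probe("da:00:00:00:00:03", "Y", 4, 11.0),    # different fingerprint
    Probe("da:00:00:00:00:04", "X", 500, 12.0),  # sequence jump too large
]
tracks = link_probes(probes)
print([[p.mac for p in t.probes] for t in tracks])
```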
Conference Paper
The IEEE 802.11 standard defines Wi-Fi probe requests as an active mechanism with which mobile devices can request information from access points and accelerate the Wi-Fi connection process. Researchers in previous work have identified privacy hazards associated with Wi-Fi probe requests, such as leaking past access point identifiers and user mobility. Despite several efforts to develop privacy-preserving alternatives, modern mobile devices continue to use Wi-Fi probe requests. In this work, we quantify Wi-Fi probe requests' threat to privacy by conducting an experimental study of the most popular smartphones in different settings. Our objective is to identify how different factors influence the probing frequency and the average number of broadcasted probes. Our conclusions are worrisome: On average, some mobile devices send probe requests as often as 55 times per hour, thus revealing their unique MAC address at high frequency. Even if a mobile device is not charging and in sleep mode, it might broadcast about 2000 probes per hour. We also evaluate a commercially deployed MAC address randomization mechanism, and demonstrate a simple method to re-identify anonymized probes.
Conference Paper
Whenever our smartphones have their WiFi radio interface on, they periodically try to connect to known wireless APs (networks the user has connected to in the past). This is done through WiFi Probe requests: special wireless frames that contain the MAC address of the sending device and, in most cases, the human-readable name-string (SSID) of the known AP. This semantic information, inherent to the network protocol, is sent in the clear and, if sniffed, can help discover important information and phenomena of people and human nature that have nothing to do with technology. In this paper we present the idea of exploiting WiFi probe requests to de-anonymize the origin of participants in large events. We make use of several publicly available datasets containing more than 11M probe requests collected in scenarios of citywide, national (two political meetings), and international religion-related relevance. We show how, by exploiting the semantic information carried by the corresponding WiFi probes, we are able to discover with high accuracy the provenance of the crowds in each event. In particular, the de-anonymization outcome of the two political meetings held a few days before election day in Italy matches surprisingly well the official voting results reported for the two respective parties.