
CrowdProbe: Non-invasive Crowd Monitoring with Wi-Fi Probe

September 2018
Proceedings of the ACM on Interactive Mobile Wearable and Ubiquitous Technologies 2(3):1-23





HANDE HONG∗, National University of Singapore, Singapore

GIRISHA DURREL DE SILVA, National University of Singapore, Singapore

MUN CHOON CHAN, National University of Singapore, Singapore

Devices with integrated Wi-Fi chips broadcast beacons for network connection management purposes. Such information can be captured with inexpensive monitors and used to extract user behavior. To understand the behavior of visitors, we deployed our passive monitoring system, CrowdProbe, in a multi-floor museum for six months. We used a Hidden Markov Model (HMM) based trajectory inference algorithm to infer crowd movement using more than 1.7 million opportunistically obtained probe request frames.

However, as more devices adopt schemes to randomize their MAC addresses in the passive probe session to protect user privacy, it becomes more difficult to track crowds and understand their behavior. In this paper, we make use of historical transition probabilities to reason about the movement of those randomized devices under spatial and temporal constraints. With CrowdProbe, we are able to achieve sufficient accuracy to understand the movement of visitors carrying devices with randomized MAC addresses.

CCS Concepts: • Networks → Location based services; • Human-centered computing → Mobile phones; • Mathematics of computing → Kalman filters and hidden Markov models;

Additional Key Words and Phrases: Passive tracking, randomization, transition probability, Crowd movement

ACM Reference Format:

Hande Hong, Girisha Durrel De Silva, and Mun Choon Chan. 2018. CrowdProbe: Non-invasive Crowd Monitoring with WiFi Probe. Proc. ACM Interact. Mob. Wearable Ubiquitous Technol. 2, 3, Article 115 (September 2018), 23 pages. https://doi.org/10.1145/3264925

1 INTRODUCTION

Understanding how crowds move and behave has long been a focus of the research community. Such information is vital for managing visitor flow in public areas such as shopping malls, railway stations, and museums. By knowing how people move, we can devise countermeasures to reduce congestion and improve the spatial arrangement. Furthermore, we can anticipate a visitor's future movement based on statistical patterns.

The most traditional way of tracking is to use pencil and paper to record how users move, along with the corresponding timestamps. Such a method is labor-intensive and tedious. It is also error-prone when there is a large crowd. The ubiquity of digital devices and technologies has revolutionized the way we learn about our environment. Video-based recognition is one of the most popular technologies used to observe visitor behavior [–]. However, deploying a video-based system is expensive, and the system can perform poorly because of limited lighting and overlapping individuals in the same image. Furthermore, people are concerned about privacy and are not willing to be subjected to visual monitoring. To overcome these limitations, researchers have looked to exploit different technologies, including Bluetooth [19, 32], cellular networks [2], and RFID [6, 33].

∗This is the corresponding author.

Authors' addresses: Hande Hong, National University of Singapore, Singapore, honghand@comp.nus.edu.sg; Girisha Durrel De Silva, National University of Singapore, Singapore, girisha@comp.nus.edu.sg; Mun Choon Chan, National University of Singapore, Singapore, chanmc@comp.nus.edu.sg.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

2474-9567/2018/9-ART115 $15.00

https://doi.org/10.1145/3264925

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 2, No. 3, Article 115. Publication date: September 2018.

Due to the widespread deployment of WiFi networks and the availability of WiFi chipsets on smartphones, using WiFi-related information to extract user information has been both popular and effective [ ]. Smartphones periodically broadcast probe request frames to trigger responses from nearby APs. By deploying WiFi monitors in the environment, we can capture these management frames and extract location information related to phone owners. Such methods are passive because they require no change on the mobile devices. Passive scanning is performed only by the WiFi monitors, with no impact on the operation of existing infrastructure. While previous work [ ] has shown the feasibility of such methods, iOS and Android have enabled MAC randomization to protect user privacy. This raises the question of whether such techniques can still be used in practice.

In this paper, we present CrowdProbe, a system that has been deployed in a multi-floor museum to track thousands of visitors daily using passive WiFi monitoring over six months. We input temporally and spatially sparse, passively collected RSS fingerprints to a Hidden Markov Model (HMM) based model to generate visitor trajectories. Unlike a traditional HMM, we do not obtain regular observations from the system, since probe requests are only sent opportunistically and can be quite sparse. Instead, we modify the model to include specific features of museum visitors to improve trajectory inference performance. In addition, we make use of historical transition probabilities to reason about the movement of randomized devices under spatial and temporal constraints. We summarize our contributions as follows:

• To the best of our knowledge, CrowdProbe is the first large-scale passive WiFi monitoring system deployed in a complex indoor public space. The six months of experience and data we gathered can be valuable in bridging research and practical usage.

• We use a Hidden Markov Model (HMM) based trajectory generation method that makes use of WiFi fingerprinting, spatial constraints, and temporal constraints. With the proposed method, we successfully generate more than 91 thousand traces, which give adequate information to understand visitor behavior in the museum.

• Based on the data accumulated in the visitor traces, we generate visitor transition probabilities and show that this information can be used to accurately reason about the short-term crowd movement of visitors whose mobile devices use randomized MAC addresses.

The rest of the paper is organized as follows. We give the background on probe requests and MAC randomization in Section 2. In Section 3, we describe the architecture of the CrowdProbe system and the deployment setting. In Section 4, we present how the data is processed for trajectory inference. We present our trajectory inference algorithm in Section 5. In Section 6, we use the transition probabilities generated from the trajectories to infer the movement of visitors whose mobile devices use randomized MAC addresses. The evaluation of CrowdProbe is given in Section 7. We then present related work and discussion in Section 8 and Section 9. Finally, we summarize the paper in Section 10.

2 BACKGROUND

2.1 Probe Request

Smartphones broadcast probe request frames to trigger responses from nearby APs, with the purpose of speeding up the discovery of surrounding APs. Such frames are management frames containing information such as the network identifier (SSID), MAC address, signal strength, and time stamp. The emission of such a frame is unavoidable as long as the device needs to connect to the network. Devices generally send probe request frames when they are not associated. However, when the currently connected WiFi signal becomes weak, the device will also start to send probe frames to find a better network candidate and prepare for handover. Such features make probe requests suitable for indoor tracking, as most indoor environments have complex layouts. When a visitor moves around inside the building, the WiFi signal can vary a lot and trigger another probe to be sent from the mobile device.

[Figure 1: bar chart of Percentage (y-axis) against Probe Interval (x-axis). 0-5s: 0.162; 5-10s: 0.142; 10-20s: 0.163; 20-30s: 0.070; 30-60s: 0.139; 1-2min: 0.103; 2-3min: 0.058; 3-5min: 0.039; 5-10min: 0.042; >10min: 0.082]

Fig. 1. Probe request interval distribution from data collected in museum

To understand how frequently probe requests are sent in real-life scenarios, we process the data we collected in the museum and plot the result in Figure 1. As can be seen from the figure, probe request frames are sent with intervals ranging from under 5 seconds to more than 10 min, with 88% of the frames sent within 5 min of the previous one. In places like shopping malls, museums, and other public spaces, visitors can spend up to an hour or more. Probe requests can therefore provide up to minute-level granularity on coarse user locations and thus help us understand the movement of visitors in these public spaces.
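The 88% figure can be checked against the bar labels of Figure 1. The sketch below is only an arithmetic check, not part of CrowdProbe; it assumes the values printed on the bars are listed in the same order as the interval bins on the x-axis, and the names `INTERVAL_FRACTIONS` and `cumulative_fraction` are introduced here for illustration.

```python
# Fractions read from the bar labels of Figure 1. Pairing each value with a
# bin assumes the labels appear in the same order as the x-axis bins.
INTERVAL_FRACTIONS = [
    ("0-5s", 0.162),
    ("5-10s", 0.142),
    ("10-20s", 0.163),
    ("20-30s", 0.070),
    ("30-60s", 0.139),
    ("1-2min", 0.103),
    ("2-3min", 0.058),
    ("3-5min", 0.039),
    ("5-10min", 0.042),
    (">10min", 0.082),
]


def cumulative_fraction(bins, last_bin):
    """Sum the fractions of all bins up to and including last_bin."""
    total = 0.0
    for label, fraction in bins:
        total += fraction
        if label == last_bin:
            return total
    raise ValueError(f"unknown bin: {last_bin}")


within_5_min = cumulative_fraction(INTERVAL_FRACTIONS, "3-5min")
print(f"share of intervals within 5 min: {within_5_min:.3f}")
```

Under that pairing, the bins up to 3-5min sum to about 0.876, which agrees with the 88% quoted in the text, and all ten bins sum to 1.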

2.2 MAC Randomization

2.2.1 iOS. From iOS 8 onward, Apple introduced MAC address randomization to prevent passive tracking of devices. Initially, randomized addresses were used only while the device was not associated and in sleep mode [ ]. In later versions, the conditions that trigger randomization were extended to include location services and auto-join scans [ ]. This means that devices are sending more randomized MAC addresses in probe frames. From previous work [ ], we know that Apple devices seem to implement true randomization across the entire MAC address field.

2.2.2 Android. Following iOS, Google's Android operating system added experimental support for MAC randomization. Full implementation went live in version 6.0, which covers most of the Android user base. However, a recent study shows that MAC randomization is largely absent on Android devices [ ], even when the OS version supports the feature. Unlike Apple devices, some Android devices randomize within a fixed prefix; Google devices, for example, always randomize with the prefix DA:A1:19.
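One common way to separate the two kinds of addresses is the locally administered bit defined by IEEE 802: the second least significant bit of the first octet is 1 for locally assigned addresses and 0 for globally unique ones. The sketch below illustrates that check together with the DA:A1:19 prefix mentioned above. Whether CrowdProbe classifies addresses in exactly this way is an assumption, and the function names are hypothetical.

```python
# Prefix quoted in the text for randomized Google devices.
GOOGLE_RANDOM_PREFIX = bytes.fromhex("DAA119")


def parse_mac(mac: str) -> bytes:
    """Convert 'AA:BB:CC:DD:EE:FF' into 6 raw bytes."""
    parts = mac.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return bytes(int(part, 16) for part in parts)


def is_locally_administered(mac: str) -> bool:
    """True if the local bit (0x02 of the first octet) is set."""
    return bool(parse_mac(mac)[0] & 0x02)


def has_google_random_prefix(mac: str) -> bool:
    """True if the first three octets are DA:A1:19."""
    return parse_mac(mac)[:3] == GOOGLE_RANDOM_PREFIX


print(is_locally_administered("DA:A1:19:00:11:22"))
print(is_locally_administered("00:1A:2B:3C:4D:5E"))
```

Note that 0xDA has the local bit set, so the Google prefix itself falls into the locally administered range.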

2.2.3 MAC Randomization Implementation in Practice. We analyzed the museum data with regard to MAC randomization and show the statistics in Table 1. Among all the probe request frames we have collected, 63% were sent with randomized MAC addresses. If devices have similar probe frequencies, then the share of devices that implement randomization is close to 63% of the population. On average, each globally unique MAC address sent 34 probe request frames, while each locally assigned MAC address was seen in only 5 probe request frames. While a globally unique address has a 1-to-1 relationship with an individual device, a device performing randomization can have either a 1-to-1 or a 1-to-many relationship with MAC addresses in a single day. Thus most randomized MAC addresses appeared in only a limited number of probe requests over a specific time period and were never seen again. Overall, randomized devices play an important role in crowd monitoring. If we are not able to properly tackle this problem, more than half of the information is concealed.

[Figure 2: architecture diagram. Labels: Fingerprint Generation; Outlier; Non-Mobile Device; Trajectory Inferring; Temporal and Spatial Constraint; Transition Probability; Global Unique or Long-lived Randomized MAC; Short-lived Randomized MAC; Crowd Movement for Short-lived Randomized MAC]

Fig. 2. Architecture of CrowdProbe

Table 1. Data statistics in Museum

Category                      Global Unique MAC   Randomized MAC   Not mobile device
Probe Request Frame Number    1,744,764           3,006,941        108,262
MAC Address Number            50,953              602,133          2,373
Probe Requests Per MAC        34                  5                45
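The percentages quoted in Section 2.2.3 can be recomputed from Table 1. The sketch below is a plain arithmetic check; it assumes the 63% share is taken over frames from globally unique and randomized MAC addresses only, leaving out the "Not mobile device" column, which is an interpretation rather than something the text states.

```python
# Frame and MAC address counts copied from Table 1.
TABLE_1 = {
    "global_unique": {"frames": 1_744_764, "macs": 50_953},
    "randomized": {"frames": 3_006_941, "macs": 602_133},
}


def randomized_frame_share(table):
    """Share of frames sent with randomized MACs (mobile devices only)."""
    randomized = table["randomized"]["frames"]
    unique = table["global_unique"]["frames"]
    return randomized / (randomized + unique)


def frames_per_mac(table, category):
    """Average number of probe request frames seen per MAC address."""
    row = table[category]
    return row["frames"] / row["macs"]


print(f"{randomized_frame_share(TABLE_1):.2f}")
print(f"{frames_per_mac(TABLE_1, 'global_unique'):.1f}")
print(f"{frames_per_mac(TABLE_1, 'randomized'):.1f}")
```

With these inputs the randomized share rounds to 0.63, and the per-address averages round to 34 and 5, matching the table.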

3 ARCHITECTURE AND DEPLOYMENT OF CROWDPROBE

The architecture of CrowdProbe is shown in Figure 2. Multiple WiFi monitors are deployed in various locations, and each WiFi monitor scans for probe request frames. When a user carrying a WiFi-enabled mobile device walks around different exhibition locations, the frames transmitted are captured by the monitors. Ideally, the WiFi monitors should be placed so that frames from a device at any location within the monitored area can be heard by multiple monitors.

Data collected by the monitors are sent to the server for further processing. The server performs data analysis to generate crowd movement information:

• Device Filtering: fingerprints from remote devices, non-mobile devices, and devices belonging to museum staff are filtered out to make sure that the remaining devices are carried by real visitors.

• Fingerprint Generation and Classification: probe request data from multiple monitors are merged to form the signal fingerprint. After that, these fingerprints are divided into two categories: stable MAC and short-lived randomized MAC.

• Trajectory Inference: data from the stable set are used to generate visitors' trajectories based on temporal and spatial constraints. From the generated trajectories, we derive transition probabilities.

• Movement Inference for Randomized Devices: the transition probabilities are used to infer the movement of randomized devices within a short time slot. By combining data from the randomized devices and globally unique MAC devices, we can give a complete view of visitor movement in the museum.

[Figure 3: floorplan showing Level 1, Level 2, and Level 3. Legend: Location of Monitor; Location Label; Entrance; Detection Range Indication. Marked dimensions: 80m, 128m, 50m]

Fig. 3. Floorplan for the museum and the deployment layout

The deployment of CrowdProbe has two components: the front-end WiFi monitors and the back-end servers. We deployed the system in a museum with three floors. We divide the museum into 9 locations, marked with different colors in Figure 3. Location A is the main entrance and ticket counter. Location I is a cafe providing food and space for visitors to rest. The other seven locations are exhibitions that focus on different topics. Each WiFi monitor is a Raspberry Pi 3 equipped with one D-Link wireless USB adapter (DWA-132). The Raspberry Pi 3 is a low-cost computing platform with a 1.2 GHz quad-core ARM Cortex-A53 and 1 GB of LPDDR2-900 SDRAM, and it supports 802.11n wireless LAN. Since the embedded WiFi adapter in the Raspberry Pi 3 cannot operate in monitor mode, we instead use USB WiFi dongles to implement passive scanning. Each monitor can pick up transmissions sent by mobile devices in the vicinity. Note that as mobile devices transmit probes on all channels in the supported spectrum (typically both 2.4GHz and 5GHz), a monitor can ideally hear transmissions from all nearby mobile devices by sniffing on a single channel. In practice, due to packet loss, not all transmissions will be received. However, it has been noted that hopping between channels does not help to pick up more messages [ ]. In our deployment, to maximize the number of probe requests captured, the monitors are set to listen on the same channel as the nearby WiFi APs provided by the museum. To increase frame reception, we also sniff NULL data frames, which are used for power management purposes when the devices are associated [15].

Fig. 4. Monitors deployed in the museum

Figure 4 shows one of the monitors deployed and the device components we used for monitoring. We deployed a total of 10 boxes to ensure that we cover most of the exhibition locations. The deployment locations are labeled in Figure 3 with red circle icons. Due to aesthetic requirements set by the museum management and the need to access power, we were not able to deploy the monitors in the locations that would maximize coverage. Most of the monitors are installed under chairs, in corridors, or behind doors, which is not optimal for data collection. For example, in Figure 3, locations C and D do not have monitors in the center area. Nevertheless, we are able to cover most of the area sufficiently to understand visitors' movement patterns. The data collection was carried out with approval from the Institutional Review Board (IRB). To protect the privacy of visitors, we do not store the actual MAC address; instead, we store a hashed value of the MAC address after verifying whether the address is valid or randomized.

In the following sections, we elaborate on the details of each component of CrowdProbe and the corresponding challenges.

4 DEVICE FILTER AND CLASSIFICATION

In this section, we describe the process carried out to increase the likelihood that the fingerprints collected come from visitors to the museum.

4.1 Filtering Remote Devices

Since the museum is located near a street famous for food and bars, the monitors may opportunistically capture probe frames from pedestrians on the street. Such data has to be removed. This is handled by enforcing a minimum RSS quality requirement. While it is possible for visitors to pass through signal blind spots with weak RSS, it is unlikely that they spend all their time in such areas. If a visitor walks around the museum, there is a good chance that strong RSS readings from the device will be captured.
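A minimal sketch of such a filter is given below. The paper does not state the RSS threshold or how many strong readings are required, so the defaults `min_rss=-70` and `min_strong=1` are placeholders chosen purely for illustration, as is the function name.

```python
def is_probable_visitor(rss_readings, min_rss=-70, min_strong=1):
    """Keep a device only if it was heard strongly at least min_strong times.

    rss_readings: all RSS values (dBm) recorded for one device by any monitor.
    min_rss and min_strong are assumed values, not taken from the paper.
    """
    strong = sum(1 for rss in rss_readings if rss >= min_rss)
    return strong >= min_strong


# A passer-by on the street is only ever heard weakly.
print(is_probable_visitor([-90, -92, -88]))
# A visitor passes through blind spots but is heard strongly at some point.
print(is_probable_visitor([-90, -65, -88]))
```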

[Figure 5: classification tree. MAC Address splits into Global Unique and Randomized; Randomized splits into Long-Lived and Short-Lived; Global Unique and Long-Lived together form Stable MAC]

Fig. 5. MAC address classification

4.2 Filtering Non-mobile Devices

This filtering step makes sure that the devices detected are from valid mobile device vendors. Note that we are mainly interested in smartphones carried by mobile users. We use an online public database to match the OUI of the MAC addresses collected from probe request frames. Since this step relies on the OUI field of the device, it applies only to globally unique MAC addresses. Fortunately, non-mobile devices do not share the concern about being tracked and thus lack the incentive to implement randomization.

4.3 Filtering Security Guard and Staff in Museum

Individuals inside the museum can be visitors or employees of the museum. The difference between these two categories is that visitors usually go to the museum occasionally, while employees stay in the museum over multiple days a week. Thus we keep a list of hashed MAC addresses that were captured by the monitors over multiple days and filter them out. This set of devices may belong to employees of the museum or to non-mobile devices, for example, a desktop with a WiFi dongle that comes from a mobile phone vendor.
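The multi-day rule can be sketched as a count of distinct calendar days per hashed MAC address. The cut-off `min_days=3` is an assumed value, since the text only says "multiple days", and `recurring_devices` is a name introduced here.

```python
from collections import defaultdict
from datetime import datetime


def recurring_devices(sightings, min_days=3):
    """Return hashed MACs seen on at least min_days distinct days.

    sightings: iterable of (hashed_mac, datetime) pairs.
    min_days is an assumption; the paper does not give the exact cut-off.
    """
    days_seen = defaultdict(set)
    for hashed_mac, seen_at in sightings:
        days_seen[hashed_mac].add(seen_at.date())
    return {mac for mac, days in days_seen.items() if len(days) >= min_days}


sightings = [
    ("staff01", datetime(2018, 3, 5, 9, 0)),
    ("staff01", datetime(2018, 3, 6, 9, 5)),
    ("staff01", datetime(2018, 3, 7, 8, 55)),
    ("visitor", datetime(2018, 3, 6, 14, 0)),
    ("visitor", datetime(2018, 3, 6, 15, 30)),
]
print(recurring_devices(sightings))
```

Devices returned by this function would then be dropped before fingerprint generation.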

4.4 Fingerprint Generation and Classification

After device filtering, data collected from different monitors are merged into a signal fingerprint based on the time stamp. For example, the fingerprint is represented as {r_1, r_2, r_3, ..., r_n}, where r_i is the RSS captured by monitor i. We use the value -99 to denote missing data when a monitor fails to capture the probe request or is too far away. Since all the monitors are connected to the internet to transmit data to the server, we also have to ensure that the clocks of the monitors are well synchronized.
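The merge step can be sketched as follows, assuming the captures belonging to one probe transmission have already been grouped by time stamp. Keeping the strongest reading when a monitor hears the same probe more than once is a choice made for this sketch, not something the paper specifies.

```python
MISSING_RSS = -99  # value the paper uses for monitors that heard nothing


def build_fingerprint(captures, num_monitors):
    """Merge per-monitor captures of one probe into a fingerprint vector.

    captures: iterable of (monitor_index, rss) pairs, 0-based indices.
    Returns [r_1, ..., r_n] with MISSING_RSS for silent monitors.
    """
    fingerprint = [MISSING_RSS] * num_monitors
    for monitor_index, rss in captures:
        # Assumption: keep the strongest reading per monitor.
        fingerprint[monitor_index] = max(fingerprint[monitor_index], rss)
    return fingerprint


print(build_fingerprint([(0, -60), (2, -75), (0, -58)], num_monitors=4))
```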

The last step in preparing the fingerprint data is as follows. We divide all the fingerprint data into two categories, stable MAC and short-lived MAC, as shown in Figure 5. Stable MAC includes the globally unique data and the set of randomized MAC data that do not change their MAC address (Long-Lived). Long-lived MAC addresses are sent by randomized devices that preserve the same randomized MAC address over the entire visit. We can track these devices as easily as devices with globally unique MACs. Data from the globally unique and long-lived randomized MACs are given as input to generate the trajectories of visitors.

5 TRAJECTORY INFERENCE WITH HIDDEN MARKOV MODELS

To infer user movement trajectories, we model the visiting process as a probability-based state transition process. We adopt the most prevalent method used in passive tracking and indoor localization: the Hidden Markov Model (HMM) [ ]. An HMM models the next state based on the previous state, the current observation, and the transition probability. In our scenario, the hidden states are the location labels of visitors, and the observations are RSS fingerprint vectors. Thus, we first match each fingerprint to a set of locations. Then we make use of spatial and temporal constraints to generate the transition probabilities and finally the target trajectory.

5.1 Emission Probabilities

The emission probability model defines the probability distribution of the visitor's location across the entire space when each fingerprint is captured. Correctly modeling the emission probability forms the basis for our trajectory inference. To make full use of the signal information in the passive RSS fingerprint from all the nearby monitors, we use fingerprint similarity to identify the location. We used four different phones to collect a fingerprint database in all the exhibition locations. We normalized the fingerprints and calculated their Tanimoto coefficient [ ]. Cross-validation is used to evaluate the performance of this fingerprint similarity method. The fingerprint database is separated into a training set and a testing set. We show the results for testing data from different phone models in Table 2. The four phone models (Nexus5, Nexus6, Meizu MX6, Meizu Pro6) all run Android. We use Android because we can easily modify the phone to send more probe frames with a globally unique MAC address.
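For real-valued vectors, the Tanimoto coefficient is commonly defined as T(a, b) = a·b / (|a|² + |b|² − a·b). The sketch below applies it to two RSS fingerprints. The normalization shown, which maps -99 to 0 and 0 dBm to 1, is an assumption, because the paper does not spell out how the fingerprints were normalized.

```python
MISSING_RSS = -99


def normalize(fingerprint):
    """Shift and scale RSS so that -99 maps to 0 and 0 dBm maps to 1.

    This particular normalization is assumed for illustration.
    """
    return [(rss - MISSING_RSS) / -MISSING_RSS for rss in fingerprint]


def tanimoto(a, b):
    """Tanimoto coefficient of two equal-length real-valued vectors."""
    if len(a) != len(b):
        raise ValueError("fingerprints must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Two all-missing fingerprints carry no evidence; treat as dissimilar.
    return dot / denom if denom else 0.0


observed = normalize([-55, -70, -99, -99])
reference = normalize([-58, -72, -95, -99])
print(round(tanimoto(observed, reference), 3))
```

Identical fingerprints score 1, and fingerprints heard by disjoint sets of monitors score 0.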

Table 2. Classification result with different phone models

Train/Test   Mx6    Pro6   Nexus5   Nexus6
Mx6          0.88   0.68   0.66     0.72
Pro6         0.65   0.91   0.80     0.70
Nexus5       0.71   0.82   0.87     0.78
Nexus6       0.67   0.72   0.79     0.86

As we can see in Table 2, if the training set and testing set come from the same phone model, we are able to achieve close to 90% accuracy. However, when training and testing on different phone models, the accuracy drops drastically, to around 70%. So, besides the multi-path effect, antenna gain, and phone placement, phone model differences also have a negative impact on fingerprint matching. Furthermore, a phone can also transmit at different power levels depending on the specific IEEE 802.11 version used [ ]. For example, the Samsung Galaxy S4 sends at 13 dB using 802.11a but at 12 dB using 802.11n.

Clearly, fingerprint similarity alone is insufficient to achieve good accuracy. Our approach is to keep a set of locations in our emission probabilities. That is to say, we do not decide on a single location for each fingerprint. Instead, we keep a list of candidates and assign each candidate a probability. The idea is similar to particle filtering [ ], but instead of keeping a large number of random samples, we keep only a limited set of candidates for higher efficiency. So for each fingerprint {r_1, r_2, ..., r_n}, we have a list of candidate locations {l_1, l_2, ..., l_n}. We calculate the similarity of each candidate with the corresponding fingerprint in the database. Then we get a list of fingerprint similarities {s_1, s_2, ..., s_n}. We estimate the conditional probability of a device being in location l_j given fingerprint f_i as follows:

\[ \omega_j = \frac{r_j + 99}{\sum_{k} (r_k + 99)} \tag{1} \]

\[ p(l_j \mid f_i) = \frac{\omega_j \, s_j}{\sum_{k} \omega_k \, s_k} \tag{2} \]

[Figure 6: plot of Ratio (y-axis) against the number of hops moved (x-axis: 0 hop, 1 hop, 2 hops, 3 hops), with one series per time interval: 3 Min, 5 Min, 10 Min, 20 Min]

Fig. 6. Visitors' probabilities of moving to other locations with different time intervals

With the weight ω_j, we give more confidence to the stronger RSS fingerprints. After this step, for each fingerprint, we are able to generate a list of candidate locations and their emission probabilities. For example, {A: 0.7, B: 0.15, C: 0.07, E: 0.08} or {F: 0.26, G: 0.54, H: 0.21}.
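Equations 1 and 2 can be sketched as below. The sketch assumes that each candidate location l_j comes with one RSS value r_j and one similarity score s_j; how the indices of the fingerprint and the candidate list line up is not fully specified in the text, so this pairing is an interpretation. The function name and the tuple layout are hypothetical.

```python
MISSING_RSS = -99


def emission_probabilities(candidates):
    """Turn candidate locations into emission probabilities.

    candidates: list of (location, rss, similarity) tuples, where rss plays
    the role of r_j and similarity the role of s_j (assumed pairing).
    """
    shifted = [rss - MISSING_RSS for _, rss, _ in candidates]  # r_j + 99
    shift_total = sum(shifted)
    if shift_total <= 0:
        raise ValueError("no candidate has a usable RSS value")
    weights = [value / shift_total for value in shifted]  # Eq. 1
    scores = [w * sim for w, (_, _, sim) in zip(weights, candidates)]
    score_total = sum(scores)
    if score_total <= 0:
        raise ValueError("all candidates have zero similarity")
    # Eq. 2: normalize the weighted similarities.
    return {
        loc: score / score_total
        for (loc, _, _), score in zip(candidates, scores)
    }


probs = emission_probabilities([("A", -50, 0.9), ("B", -70, 0.9)])
print(probs)
```

With equal similarity, the candidate with the stronger RSS receives the larger probability, which is the effect the weight ω_j is meant to have.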

5.2 Transition Probabilities

How the visitor moves between each pair of consecutive fingerprints is modeled by transition probabilities. We need to decide the probability that the visitor moves between exhibition locations or stays in the same location, based on consecutive fingerprints and their time stamps. In our modeling, we make the following assumptions:

Assumption 1: A visitor's movement in the museum is sufficiently slow compared to the timescale of probe request capture such that the WiFi monitor is able to track his/her movement from one location to another.

For CrowdProbe to work well, a user needs to spend enough time in a single location so that the device is likely to transmit at least one probe request from each location. To verify our assumption, we collected the transition patterns of museum visitors over time intervals ranging from 3 min to 20 min and plotted the result in Figure 6. The x-axis shows how far a visitor can move, measured in the number of hops from the current location to a destination location. From the figure, we can see that when the time interval is short, say 3 min, the likelihood that a visitor stays in the same location is more than 80%. When the time interval is 20 min, a visitor has a 30% chance of moving to a location two or more hops away. In a 5 min interval, the likelihood of a visitor either staying in the same location or moving to a neighboring location is 93%.

Assumption 2: The longer a visitor spends in an exhibition location, the more likely he/she is to leave for the next exhibition location.

If a visitor has already spent some time, for example 15 minutes, in an exhibition location, then he/she is more likely to leave than a visitor who has just arrived. Thus, the transition probability should also take into consideration the time already spent in the current location. Figure 7 shows the decay curves for locations D and E. As more time elapses, more visitors leave the location.

With this rule, we also solve a classical problem in passive tracking: the handover problem. The handover problem arises when the visitor is near the boundary of two different locations. The location inferred from the fingerprint can jump back and forth between the two locations. In our scenario, the museum is a multi-floor building where some of the ceilings between floors have been removed for aesthetic reasons. For instance, locations A and E are openly connected without any blocking, so visitors in location A have a high chance of being detected in location E. Sequences of fingerprints can generate jitter like AEAEAE in the derived trajectory. By taking the tendency to stay into consideration, more stable transitions can be obtained.

[Figure 7: plot of Percent of visitors remaining (y-axis) against Time elapsed (Min) (x-axis, 0 to 60), with one curve each for Location D and Location E]

Fig. 7. The visitor decay curves for location D and E

Based on the above discussion, we define the following:

Staying Tendency describes the inclination of the visitor to stay in the same exhibition location. From Figure 7, we can see that the percentage of visitors remaining is roughly inversely proportional to the stay time. Thus we define the staying tendency coefficient ω_tend as

\[ \omega_{tend} = \frac{\tau_{threshold}}{t + 1} \tag{3} \]

where t is the length of time that the visitor has stayed in the current exhibition location. We add 1 in the denominator to handle an extremely short duration t. The longer the visitor has spent in the same location, the smaller the value of ω_tend will be. In Figure 7, we see different curves for different locations. Thus the time length threshold τ_threshold, which indicates the stay time at which a visitor has an equal chance of staying in or leaving the current location, should be set per location.

Order of Neighbor is defined as the number of locations a person must pass through to reach a specific exhibition location from the current location. For example, in the floorplan of the museum shown in Figure 3, the 1st-order neighbors of location C are its immediately adjacent locations (A, D, and I), and its 2nd-order neighbors are the immediately adjacent locations of its 1st-order neighbors (excluding C and C's 1st-order neighbors). We define hop_ij as the number of hops a person needs to travel from location i to location j, which equals the order of neighbor. In particular, we set hop_ii to 1.
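Given an adjacency list of the floorplan, the order of neighbor can be obtained with a breadth-first search. In the sketch below, only the neighbors of C (A, D, and I) come from the text; every other edge in `ADJACENCY` is made up for illustration and does not describe the real museum.

```python
from collections import deque

# Hypothetical adjacency list. Only C's neighbors (A, D, I) are from the
# text; the A-E edge and the absence of other edges are assumptions.
ADJACENCY = {
    "A": ["C", "E"],
    "C": ["A", "D", "I"],
    "D": ["C"],
    "E": ["A"],
    "I": ["C"],
}


def hop_counts(adjacency, start):
    """Breadth-first search returning hop_ij from start to every location."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for neighbor in adjacency[current]:
            if neighbor not in hops:
                hops[neighbor] = hops[current] + 1
                queue.append(neighbor)
    hops[start] = 1  # the paper sets hop_ii to 1
    return hops


print(hop_counts(ADJACENCY, "C"))
```

In this toy graph, A, D, and I are 1st-order neighbors of C and E is a 2nd-order neighbor.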

Based on the map constraint and temporal limitations, with time interval

τin t er val

between consecutive

ngerprint, we dene the transition likelihood

LHi→j

and normalized transition probability

pi→j

between

location iand jas follows:

LHi→j=(ωt end /hopii +τin t erva l /τt hr e sho ld ,i=j

1/hopij +τin t er va l /τth r es ho ld ,other wise (4)

pi→j=LHi→j

ÍN

k=1LHi→k

(5)

Proc. ACM Interact. Mob. Wearable Ubiquitous Technol., Vol. 2, No. 3, Article 115. Publication date: September 2018.

CrowdProbe: Non-invasive Crowd Monitoring with WiFi Probe •115:11

where $N$ is the number of locations. As the time interval $\tau_{interval}$ between consecutive fingerprints increases, the relative difference in likelihood between each pair of locations becomes smaller. That means that if the time interval between two fingerprints is small, we give a higher transition probability to nearby locations. If the time interval is large, we give no preference to any transition, as the visitor can walk to any location within such a long duration. Table 3 gives the list of important parameters used.
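The effect of the time interval on the transition probability can be illustrated with a short sketch of Equations 3 to 5. This is an illustration under stated assumptions rather than the authors' code: Equation 3 is read as $\tau_{threshold}/(t+1)$, $\tau_{threshold}$ is taken to be that of the current location, and the hop counts from C are hypothetical apart from C's 1st-order neighbors (A, D, I).

```python
def staying_tendency(tau_threshold, stayed):
    """Equation 3 as read here: tau_threshold / (t + 1)."""
    return tau_threshold / (stayed + 1)


def transition_probabilities(hops, current, stayed, tau_interval, tau_threshold):
    """Return {location: p_(current -> location)} following Equations 4 and 5.

    hops maps each location j to hop_(current, j), with hops[current] == 1.
    All times share one unit (minutes in the example below).
    """
    temporal = tau_interval / tau_threshold
    likelihood = {}
    for location, hop in hops.items():
        if location == current:
            spatial = staying_tendency(tau_threshold, stayed) / hop
        else:
            spatial = 1.0 / hop
        likelihood[location] = spatial + temporal
    total = sum(likelihood.values())
    return {location: value / total for location, value in likelihood.items()}


# Hypothetical hop counts from location C (B is assumed to be a 2nd-order neighbor).
HOPS_FROM_C = {"C": 1, "A": 1, "D": 1, "I": 1, "B": 2}

short_gap = transition_probabilities(
    HOPS_FROM_C, "C", stayed=15, tau_interval=1, tau_threshold=16)
long_gap = transition_probabilities(
    HOPS_FROM_C, "C", stayed=15, tau_interval=160, tau_threshold=16)

# Ratio of the probability of a 1-hop move (A) to a 2-hop move (B).
print(short_gap["A"] / short_gap["B"], long_gap["A"] / long_gap["B"])
```

With a short interval the nearby location A is clearly preferred over B, while with a long interval the two probabilities are nearly equal, which matches the behavior described in the text.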

Table 3. List of some important parameters in this paper

| Parameter | Description |
|---|---|
| $\tau_{min}$ | The minimum staying time length required for a visitor in each location |
| $s_i$ | RSS fingerprint similarity |
| $\omega_{tend}$ | Staying tendency coefficient |
| $\tau_{threshold}$ | Stay time length at which a visitor has an equal chance to stay or leave |
| $\tau_{interval}$ | Time interval between consecutive fingerprints |
| $hop_{ij}$ | The number of hops a person needs to transit from location $i$ to location $j$ |
| $LH_{i\to j}$ | Transition likelihood between locations $i$ and $j$ |
| $p_{i\to j}$ | Normalized transition probability between locations $i$ and $j$ |

5.3 Trajectory Inference

With the available transition probability and emission probability, we use Viterbi's algorithm [ ] to find the maximum probability trajectory. For a series of fingerprints $f_1, f_2, \ldots, f_n$, we find the sequence of locations $l_1, l_2, \ldots, l_n$ which maximizes Equation 6:

$$\underset{l_1, l_2, \ldots, l_n}{\arg\max} \prod_{i<n} p(l_{i+1} \mid f_i) \cdot p(i \to i+1) \qquad (6)$$

Since we have only a limited number of candidates for each fingerprint captured, the result converges very fast. Usually, a visitor will spend quite a lot of time in a single location, thus the sequence of locations will contain a lot of redundancy, for example AEEEEEEEEFFFFFGGGGGGGGGGGFADDI. Each letter represents the location of the visitor when a specific probe message was captured by the monitors. We simplify the trajectory by collapsing consecutive duplicate locations and updating the corresponding time stamps. For the above example, we get AEFGFADI.
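The decoding and simplification steps can be sketched as follows. This is a textbook log-space Viterbi decoder, not the authors' exact formulation: the function names, the dictionary-based inputs, and the two-location toy data are assumptions for illustration, and time stamp updating is left out.

```python
import math
from itertools import groupby


def _log(p):
    """Logarithm that maps probability 0 to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(emissions, transition, locations):
    """Return the most probable location sequence.

    emissions:  one dict per fingerprint, {location: p(location | fingerprint)}
    transition: dict {(previous, next): probability}
    """
    best = {loc: _log(emissions[0].get(loc, 0.0)) for loc in locations}
    back_pointers = []
    for emission in emissions[1:]:
        new_best, pointers = {}, {}
        for nxt in locations:
            prev = max(
                locations,
                key=lambda p: best[p] + _log(transition.get((p, nxt), 0.0)),
            )
            new_best[nxt] = (
                best[prev]
                + _log(transition.get((prev, nxt), 0.0))
                + _log(emission.get(nxt, 0.0))
            )
            pointers[nxt] = prev
        best = new_best
        back_pointers.append(pointers)
    # Trace back from the best final location.
    path = [max(locations, key=lambda loc: best[loc])]
    for pointers in reversed(back_pointers):
        path.append(pointers[path[-1]])
    return path[::-1]


def simplify(path):
    """Collapse runs of consecutive duplicate locations into one letter each."""
    return "".join(location for location, _ in groupby(path))


# Toy data (hypothetical): two locations and three fingerprints.
LOCATIONS = ["A", "B"]
EMISSIONS = [{"A": 0.9, "B": 0.1}, {"A": 0.6, "B": 0.4}, {"A": 0.2, "B": 0.8}]
TRANSITION = {("A", "A"): 0.7, ("A", "B"): 0.3, ("B", "A"): 0.3, ("B", "B"): 0.7}

decoded = viterbi(EMISSIONS, TRANSITION, LOCATIONS)
print(decoded, simplify(decoded))
print(simplify("AEEEEEEEEFFFFFGGGGGGGGGGGFADDI"))
```

Working in log space avoids numerical underflow when many small probabilities are multiplied over a long fingerprint series.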

6 ONE-HOP MOVEMENT INFERENCE FOR SHORT-LIVED RANDOMIZED DEVICE

From Table 1, we observe that if devices have similar probe frequencies, then the number of devices that implement MAC randomization is close to 2/3 of the population. While we are able to derive the crowd movement based on the above trajectory inference model using data from devices with Stable MAC addresses, ignoring the large number of devices with randomized MAC addresses would lose a substantial amount of information. Previous work [ ] used Information Elements (IE) as signatures to track devices. However, recent work [ ] has found that such signatures may change during randomization. Improper use of IE may also cause a high rate of false positives. So if we cannot track the movement of each randomized device, can we infer the crowd movement in each time period without knowing who they are? In this section, we use the trajectories derived from stable MAC devices to infer the one-hop crowd movement of short-lived randomized devices.

6.1 Overview

Figure 8 gives an overview of our one-hop movement inference for short-lived randomized devices. We cut each day into separate time slots and generate the corresponding status vector in each time slot. A status vector is a snapshot of the number of short-lived randomized MAC devices whose probe frames are captured in each location. To complete the picture, we include the number of visitors who enter and leave the museum to form the extended status vector. $R_{In}$ denotes the number of people entering the museum and $R_{Out}$ denotes the number of people leaving the museum.

[Figure 8: a timeline of status vectors $SV_1, SV_2, SV_3, \ldots, SV_N, SV_{N+1}$; consecutive vectors are extended with $R_{In}$ and $R'_{Out}$ into $SVE_N$ and $SVE_{N+1}$, which are combined with the transition probability matrix as input to produce the one-hop crowd movement for short-lived randomized MACs.]

Fig. 8. Overview of the movement inference for short-lived randomized devices

Although each individual visitor has their own preference for route selection when visiting the museum, the choices are generally affected by the layout of exhibits, facilities, interpretative tools, and advertisements. If we assume that people carrying non-randomized phones behave similarly to those carrying randomized phones, the aggregated movement should be similar. We utilize this assumption to infer the crowd movement of users carrying devices with randomized MAC addresses based on the transition pattern learned from devices with Stable MAC addresses. In the next few sections, we discuss the details of our algorithm.

6.2 Status Vector and Transition Matrix

Within each time slot, we define a status vector $SV$ which contains the number of randomized probe frames captured in each location. An example is a list of per-location counts such as (10, 3, 1, 2, 3, 0, 2).

However, we found that this vector does not capture all the information about visitors that enter or leave the museum during the slot. Since the museum has multiple entrances/exits and probe transmission is opportunistic, a new visitor can appear, or a departing visitor's last probe frame can be captured, in any location. Thus we define the extended version of the status vector, $SVE$, which adds two virtual locations, "In" and "Out". For every two consecutive vectors, we define the two $SVE$s as shown in Table 4, where $R_A$ and $R'_A$ are the numbers of devices within location A in time slots $N$ and $N+1$, respectively. We define the visitor movement between two time slots as a transition matrix $T_N$ as follows:


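Table 4. Extended Status Vector

| | A | B | C | D | E | F | G | H | I | In | Out |
|---|---|---|---|---|---|---|---|---|---|---|---|
| $SVE_N$ | $R_A$ | $R_B$ | $R_C$ | $R_D$ | $R_E$ | $R_F$ | $R_G$ | $R_H$ | $R_I$ | $R_{In}$ | 0 |
| $SVE_{N+1}$ | $R'_A$ | $R'_B$ | $R'_C$ | $R'_D$ | $R'_E$ | $R'_F$ | $R'_G$ | $R'_H$ | $R'_I$ | 0 | $R'_{Out}$ |

$$T_N = \begin{bmatrix}
X_{A\to A} & X_{A\to B} & X_{A\to C} & \cdots & X_{A\to H} & X_{A\to I} & 0 & X_{A\to Out} \\
X_{B\to A} & X_{B\to B} & X_{B\to C} & \cdots & X_{B\to H} & X_{B\to I} & 0 & X_{B\to Out} \\
X_{C\to A} & X_{C\to B} & X_{C\to C} & \cdots & X_{C\to H} & X_{C\to I} & 0 & X_{C\to Out} \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
X_{H\to A} & X_{H\to B} & X_{H\to C} & \cdots & X_{H\to H} & X_{H\to I} & 0 & X_{H\to Out} \\
X_{I\to A} & X_{I\to B} & X_{I\to C} & \cdots & X_{I\to H} & X_{I\to I} & 0 & X_{I\to Out} \\
X_{In\to A} & X_{In\to B} & X_{In\to C} & \cdots & X_{In\to H} & X_{In\to I} & 0 & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & 0
\end{bmatrix}$$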

where $X_{C\to C}$ denotes the number of visitors that remain in location C and $X_{A\to C}$ denotes the number of people that move from location A to C. Note that no visitor goes from state Out to any location, and no visitor goes from any location to state In. Both sets of entries in the matrix are set to 0. To conserve the number of people, all the variables need to satisfy the following equations:

$$\begin{cases}
X_{A\to A} + X_{A\to B} + X_{A\to C} + \cdots + X_{A\to Out} = R_A \\
\cdots \\
X_{I\to A} + X_{I\to B} + X_{I\to C} + \cdots + X_{I\to Out} = R_I \\
X_{In\to A} + X_{In\to B} + X_{In\to C} + \cdots + X_{In\to Out} = R_{In} \\
X_{A\to A} + X_{B\to A} + X_{C\to A} + \cdots + X_{In\to A} = R'_A \\
\cdots \\
X_{A\to I} + X_{B\to I} + X_{C\to I} + \cdots + X_{In\to I} = R'_I \\
X_{A\to Out} + X_{B\to Out} + X_{C\to Out} + \cdots + X_{In\to Out} = R'_{Out} \\
R_{In} - R'_{Out} = R_{gap}
\end{cases} \qquad (7)$$

Now, based on the processing of randomized MAC data, we can derive the values $R$, $R'$, and $R_{gap}$. Suppose we have $N$ locations in the museum (not including In and Out). We have $2 \times N + 3$ equations in the above formulation. However, we have $N \times N + 2 \times N$ unknown values. Whenever $N > 1$, we have $N(N + 2) > 2N + 3$. So there exist many different transition matrices that can satisfy the equations. Thus, we need a way to find a specific solution that satisfies additional constraints. Our approach is to make use of the data accumulated from devices with globally unique and long-lived randomized addresses. The approach uses a two-step process to infer the one-hop movement for short-lived randomized devices.
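As a rough illustration of how underdetermined the system is, the counts can be worked through for the nine exhibition locations A to I used in this deployment, assuming the equation count $2N + 3$ and the unknown count $N \times N + 2N$ discussed in this section:

$$N = 9: \quad 2N + 3 = 21 \text{ equations}, \qquad N \times N + 2N = 81 + 18 = 99 \text{ unknowns}$$

So at least $99 - 21 = 78$ unknowns are left unconstrained by conservation alone, which is why additional constraints are needed to pick a specific transition matrix.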

6.3 Two-step Conversion for Short-Lived Randomized Data

We assume that people carrying non-randomized phones have moving patterns similar to those carrying randomized phones. To utilize such movement patterns, we apply the same processing to the stable MAC data set as described in Section 6.2. For every two consecutive time slots, we are able to get one ground-truth transition matrix since these devices keep the same MAC addresses. We sum up all the transition matrices and normalize each row to generate the probability matrix $T_{train}$. We assume that $T_{train}$ captures the average user behavior.

Thus in the first step, $SVE_N$ is multiplied by the transition matrix $T_{train}$ to generate the expected status vector $SVE'_N$:

$$SVE'_N = SVE_N * T_{train} \qquad (8)$$

[Figure 9: a worked example with four locations A, B, C, D, showing $SVE_N = (5, 3, 2, 1)$, $SVE_{N+1} = (1, 2, 4, 4)$, the probability transition matrix $T_{train}$ with rows (0.5, 0.2, 0.1, 0.2), (0.2, 0.1, 0.5, 0.2), (0.2, 0.4, 0.3, 0.1), (0.1, 0.5, 0.3, 0.1), the first conversion $SVE'_N = (3, 3, 3, 2)$, and the second conversion, in which $Diff_N = (-2, -1, 1, 2)$ is reduced through $(0, -1, 1, 0)$ to $(0, 0, 0, 0)$.]

Fig. 9. A simple example for two-step conversion

As the current occupancy status can differ from the average behavior, in the second step, we find the status vector that is close to $SVE'_N$ and yet minimizes the differences. For ease of processing, we calculate the difference vector $Diff_N$ by subtracting $SVE'_N$ from $SVE_{N+1}$:

$$Diff_N = SVE_{N+1} - SVE'_N \qquad (9)$$

Each negative value in $Diff_N$ indicates that a certain number of visitors move from the current location to other locations. Each positive value means that the location attracts visitors from other locations. Thus, for each negative value, we search for available positive values in $Diff_N$ to fill the hole. Based on the transition probability, we assign the visitors to move to the corresponding other location until $Diff_N$ becomes a vector containing all 0 values. With this sequence of conversions, we finish the second-step conversion.

The final transition matrix gives an estimation of the crowd movement for people carrying devices with randomized MAC addresses during this period. Figure 9 gives a simple example of the two-step conversion with only four locations, omitting the In and Out states for ease of explanation. Summing up the matrices for both the Stable MAC and Short-lived Randomized MAC data sets, we have an overview of the crowd movement for visitors.
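The two conversion steps can be sketched in a few lines. This is one possible reading of the procedure, not the authors' implementation: the function names are hypothetical, the greedy rule (send visitors to the positive entry with the highest transition probability from the source) is an assumption, and the rounding of the expected vector to integers is left out. The numbers are the ones readable from the Figure 9 example.

```python
def expected_vector(sve_n, t_train):
    """First step (Equation 8): multiply the row vector SVE_N by T_train."""
    size = len(sve_n)
    return [sum(sve_n[i] * t_train[i][j] for i in range(size)) for j in range(size)]


def fill_holes(diff, t_train, labels):
    """Second step: move visitors from negative entries to positive ones.

    For each source with a negative value, visitors are sent to the
    destination with a positive value that has the highest transition
    probability from the source (an assumed greedy rule), until the
    difference vector contains only zeros.
    Returns the adjustments {(source, destination): count} and the residual.
    """
    diff = list(diff)
    moves = {}
    for src in range(len(diff)):
        while diff[src] < 0:
            candidates = [dst for dst in range(len(diff)) if diff[dst] > 0]
            if not candidates:
                break  # nothing left to absorb the remaining visitors
            dst = max(candidates, key=lambda d: t_train[src][d])
            count = min(-diff[src], diff[dst])
            key = (labels[src], labels[dst])
            moves[key] = moves.get(key, 0) + count
            diff[src] += count
            diff[dst] -= count
    return moves, diff


LABELS = ["A", "B", "C", "D"]
T_TRAIN = [
    [0.5, 0.2, 0.1, 0.2],
    [0.2, 0.1, 0.5, 0.2],
    [0.2, 0.4, 0.3, 0.1],
    [0.1, 0.5, 0.3, 0.1],
]

expected = expected_vector([5, 3, 2, 1], T_TRAIN)
moves, residual = fill_holes([-2, -1, 1, 2], T_TRAIN, LABELS)
print(expected)
print(moves, residual)
```

With the difference vector from the figure, the sketch first moves two visitors from A to D, leaving (0, -1, 1, 0), and then one visitor from B to C, which agrees with the intermediate vectors shown in the example.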

7 EVALUATION

There are three parts in the evaluation. First, we evaluate the accuracy of trajectory inference using Stable MAC data. We then present the results for inferring the movement of devices with random MAC addresses using the movement patterns of devices with Stable MAC addresses. Finally, we present some interesting crowd movement statistics found in the museum.


[Figure 10: timelines from 13:30 to 14:30 comparing the inferred location sequences of Visitors X1 to X4 with the ground truth.]

Fig. 10. Path generation with modified phones for TR1

7.1 Accuracy of Trajectory Inferring

Two parameters are required in the inference. First, we need the minimum staying time length $\tau_{min}$ that a visitor has to spend in a location for the system to be able to detect the movement. This duration depends on how frequently each device transmits probe frames. Based on the measurements presented earlier, the parameter is set to 5 min.

The second parameter needed, $\tau_{threshold}$, indicates the stay time length at which a visitor has an equal chance to stay or leave. We use the average staying duration in each of the locations to infer the value. Since different locations have different area sizes, $\tau_{threshold}$ can vary a lot between areas. The corresponding values of $\tau_{threshold}$ measured for the different locations are {A: 18 min, B: 9 min, C: 16 min, D: 13 min, E: 13 min, F: 10 min, G: 32 min, H: 13 min, I: 13 min}.

7.1.1 Ground Truth Collection. To verify the accuracy of trajectory inference, we organized three ground truth collection tours to the museum. The details of the tours are listed in Table 5. In TR1, we organized four people carrying four different phones to walk on a predefined route. All the phones in this tour were modified to emit more probe request frames with their globally unique MAC addresses. We required the users to record the time stamps in each of the locations. In TR2, we followed a one-hour guided tour with 11 other adult visitors (16 in total including the tour guide). All the visitors simply used their own mobile devices, which were unmodified phones with WiFi switched on. In TR3, we joined a similar guided tour with more young people involved.

Table 5. Details of the three ground truth collection tours

| Name | Number of People | Young | Old | Identified | Visiting Route | Time |
|---|---|---|---|---|---|---|
| TR1 | 4 | 4 | 0 | 4 | ACDHDBAIAEFA | 1h 30min |
| TR2 | 16 | 7 | 9 | 9 | ACDH | 55min |
| TR3 | 18 | 13 | 5 | 13 | ACDBAEF | 1h 8min |

7.1.2 Trajectory Inferring Result with Probe Frequency. We only show the results of the first two trips, as the result of the third trip is similar. The result of the path inference for TR1 is shown in Figure 10, with the ground truth plotted at the bottom. We found that the inferred trajectories are accurate, with minor errors. The start time and end time for each location differ from the ground truth by a maximum of 3 minutes. Such high accuracy is due to the use of modified phones with a high frequency of probe transmission.

[Figure 11: timelines from 14:00 to 15:00 comparing the inferred location sequences of Visitors X1, X2, X5, and X6 with the ground truth route A, C, D, H.]

Fig. 11. Path generation with unmodified phones for TR2

The result in Figure 11 shows the ground truth and our trajectory inference for four of the visitors during TR2. The ground truth tour ended after visiting location H. We include the full traces of the four visitors, which show their personal choices after the tour guide ended the tour. Compared to the result in Figure 10, only X6 generates the full trajectory without any gap. The trajectories for visitors X1 and X2 both miss location C, which may be because they stayed there only for a short time. In Visitor X2's trajectory, there is a period of around 20 minutes without any probe request emitted, for which we cannot decide on the proper location. While we can guess that the visitor remained in the same location H, such an assumption may lead to a large error. Thus we leave that period of time as unknown.

7.1.3 Trajectory Inferring Accuracy. We define several metrics to measure the accuracy of trajectory inference:

• False Positive: The ratio of the locations identified by the algorithm in the trajectories but not present in the ground truth trajectories.

• Location Recall: The number of locations correctly derived in the trajectory / the total number of locations in the ground truth.

• Time Length Accuracy: Time length estimation accuracy.

• Start and End Time Error: The start time and end time shift errors for each location we identified in the trajectory.
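The first two metrics can be sketched as below. The text does not spell out how repeated visits or visiting order are handled, so this sketch makes two explicit assumptions: locations are matched as a multiset (order is ignored), and the false positive ratio is taken relative to the number of inferred locations. The function name and the example trajectories are hypothetical.

```python
from collections import Counter


def location_metrics(inferred, truth):
    """Return (false_positive_ratio, location_recall) for two trajectories.

    Both trajectories are strings of location letters with consecutive
    duplicates already removed. Matching is done on multisets, which is
    an assumption of this sketch.
    """
    inferred_counts, truth_counts = Counter(inferred), Counter(truth)
    matched = sum((inferred_counts & truth_counts).values())
    false_positive = (len(inferred) - matched) / len(inferred) if inferred else 0.0
    recall = matched / len(truth) if truth else 0.0
    return false_positive, recall


# Hypothetical inferred trajectories compared with a ground truth route ACDH.
missed_one = location_metrics("ADH", "ACDH")
extra_one = location_metrics("ACEDH", "ACDH")
print(missed_one, extra_one)
```

A trajectory that misses location C lowers the recall, while one that inserts an unvisited location E raises the false positive ratio without affecting recall.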

Based on the results of the three trips, we identified 26 trajectories and use them to calculate the accuracy of trajectory inference. We compare the performance of three approaches used to derive trajectories.

• FP: uses only the WiFi fingerprinting method for localization and uses these locations to derive the trajectories.

• HMM: is similar to CrowdProbe but does not consider the movement pattern of visitors inside the museum. That is, the transition probability is set to be the same for all locations.

• CrowdProbe: the proposed method.

The results are shown in Figure 12 and Figure 13. It can be observed that CrowdProbe attains a low false positive ratio close to 0.14, compared to 0.23 for FP and 0.18 for HMM. While CrowdProbe has a recall rate similar to that of the HMM method, its time length estimation accuracy is much higher at 0.94. The start time and end time errors for FP and HMM are around 5 minutes and 3 minutes, respectively. CrowdProbe improves that to around 2.5 minutes. The improvement is due to reducing the jitter in the handover areas.

[Figure 12: bar chart of the ratios for False Positive, Recall, and Total Time Length. FP: 0.23, 0.82, 0.72; HMM: 0.18, 0.91, 0.82; CrowdProbe: 0.14, 0.92, 0.94.]

Fig. 12. Trajectory generation performance

[Figure 13: bar chart of Start Time Error and End Time Error in minutes. FP: 5.3, 4.9; HMM: 3.1, 3.8; CrowdProbe: 2.4, 2.6.]

Fig. 13. Time stamp estimation performance

7.2 Short-lived Randomized Device One-hop Transition Evaluation

7.2.1 Time Slicing. We need to decide on a proper time slot length that lets us infer crowd movement without losing too much information while still collecting sufficient data. The basic requirement for picking the length of the time slot is to ensure that a device with a randomized MAC address sends at least one probe request in each time slot, but not multiple probe frames with different MAC addresses. Thus this value is decided by two factors: the lifetime of the randomized MAC address and the probe frequency.

The only hint we found about the lifetime of a randomized MAC address is in the configuration file wpa_supplicant.conf used by Android and Windows and OS client stations, which indicates rand_addr_lifetime = 60 [ ]. That means any two randomized addresses are not likely to be emitted from the same device within 1 minute. Thus, we can safely set the time slot length to be larger than 1 minute. From Figure 1, we can see that most of the devices send at least one probe frame within a 5-minute time slot. With a larger value like 10 minutes, we are likely to include multiple samples from the same device in each time slot, which may introduce errors in the status vector. With a much smaller value like 1 minute, we may not receive any probes from many of the devices.

A device may transmit more than one probe frame in the same 5-minute slot. If the (randomized) MAC address remains the same, this is not a problem. However, if the MAC address changes within a single time slot, the same device may be counted as different devices and we overestimate the number of users. Hence, a relatively short interval of 5 minutes also limits the amount of overestimation due to duplicates.
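The slicing and counting described above can be sketched as follows. This is an illustrative sketch, not the deployed pipeline: the record layout `(timestamp_seconds, mac, location)`, the function name, and the rule of counting a MAC address at the last location where it was seen within a slot are all assumptions.

```python
from collections import defaultdict

SLOT_SECONDS = 5 * 60  # the 5-minute slot length chosen in the text


def status_vectors(probes, locations, slot_seconds=SLOT_SECONDS):
    """Build one status vector per time slot from captured probe frames.

    probes: iterable of (timestamp_seconds, mac, location) records.
    Each distinct MAC address is counted once per slot, so repeated
    probes with an unchanged address do not inflate the count.
    """
    last_seen = defaultdict(dict)  # slot index -> {mac: location}
    for timestamp, mac, location in sorted(probes):
        slot = int(timestamp // slot_seconds)
        last_seen[slot][mac] = location  # assumed rule: keep the last location
    vectors = {}
    for slot, macs in last_seen.items():
        counts = dict.fromkeys(locations, 0)
        for location in macs.values():
            counts[location] += 1
        vectors[slot] = counts
    return vectors


# Hypothetical capture log with two locations.
PROBES = [
    (10, "m1", "A"),
    (100, "m1", "A"),  # same MAC in the same slot: counted once
    (200, "m2", "A"),
    (250, "m3", "C"),
    (310, "m4", "C"),  # falls into the next 5-minute slot
]

vectors = status_vectors(PROBES, ["A", "C"])
print(vectors)
```

A device that changes its randomized address inside a slot would still be counted twice here, which is the overestimation the short slot length is meant to limit.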

7.2.2 Evaluation Method. Even though we can derive the transition matrix for randomized devices, we are not able to verify the result, since we do not have ground truth data for the short-lived randomized MAC devices. Thus, we instead use the data from Stable MAC devices to check the performance of our approach. The flow of the evaluation is given in Figure 14. We input the status vectors $SVE_N$ and $SVE_{N+1}$ for the Stable MAC devices to the algorithm and get the resulting transition matrix. With the Stable MAC data set, we can easily derive the ground truth transition matrix. We use the following metric to measure the performance of the short-lived MAC device one-hop transition inference.

• Transition Accuracy: The number of correct transitions in the transition matrix for Stable MAC data / the total number of transitions that happen in the ground truth. Note that $A \to A$ is also regarded as one transition.
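One way to compute this metric from two count matrices is sketched below. The text does not define how a "correct transition" is counted cell by cell, so this sketch assumes that, for each origin and destination pair, the overlap between the inferred count and the ground truth count is correct. The function name and the 2x2 matrices are hypothetical.

```python
def transition_accuracy(inferred, truth):
    """Ratio of correctly inferred transitions to ground truth transitions.

    inferred, truth: square matrices (lists of rows) of transition counts,
    where entry [i][j] is the number of visitors moving from i to j.
    Staying in place (i == j) counts as a transition, as in the text.
    """
    correct = sum(
        min(inferred_count, truth_count)
        for inferred_row, truth_row in zip(inferred, truth)
        for inferred_count, truth_count in zip(inferred_row, truth_row)
    )
    total = sum(sum(row) for row in truth)
    return correct / total if total else 1.0


# Hypothetical two-location example with six ground truth transitions.
TRUTH = [[3, 1], [0, 2]]
INFERRED = [[2, 2], [1, 1]]

accuracy = transition_accuracy(INFERRED, TRUTH)
print(accuracy)
```

In this toy case four of the six ground truth transitions are matched by the inferred matrix.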


[Figure 14: passive scanning data yields status vectors for Stable MAC data and for short-lived MAC data. The transition probability matrix converts the short-lived status vectors into a transition matrix that has no ground truth; the same procedure applied to the Stable MAC status vectors yields a transition matrix that is evaluated against the Stable MAC ground truth.]

Fig. 14. Demonstration of our evaluation method for one-hop transition inference

[Figure 15: CDF of transition accuracy (x-axis: Accuracy, 0.5 to 1; y-axis: Ratio) for July, Sep, and Oct.]

Fig. 15. Randomized trajectory inference accuracy with non-randomized data testing result

7.2.3 Result for Short-lived Randomized Device Transition. By comparing the ground truth matrix and the one we derived, we can estimate the performance of our short-lived transition inference method. For each pair of consecutive time slots, we run the evaluation method to verify the effectiveness of the short-lived randomized device one-hop transition inference. We show the results for July, September, and October 2017 by plotting the CDF of the transition accuracy in Figure 15. The average accuracy for the three months is 0.8, 0.81, and 0.77, respectively. That means that out of every 5 transitions, we can correctly infer about 4. Considering the difficulty of tracking devices with randomized MAC addresses, the accuracy is better than we expected.

Table 6 gives a summary of the information we can get from passive scanning. If the device provides a Stable MAC address, we can derive a lot of information about the crowd movement. We can derive short-time movement for devices with randomized MAC addresses if we can supplement the data with statistics of stable MAC devices in the same venue. However, if the information is only from short-lived randomized MAC addresses, we can only do occupancy counting in each time slot.


Table 6. Information we can obtain from passive scanning

| Feature | Counting | $R_{In}$ and $R_{Out}$ | One-hop Transition | Trajectory Inference | Stay Length Estimation |
|---|---|---|---|---|---|
| Stable MAC | ✓ | ✓ | ✓ | ✓ | ✓ |
| Short-Lived MAC with Stable MAC statistics | ✓ | ✓ | ✓ | × | × |
| Short-Lived MAC Only | ✓ | ✓ | × | × | × |

[Figure 16: floor plans of Level 1, Level 2, and Level 3 showing monitor locations, location labels, and the entrance; the legend scales arrow width from 1 percent to 8 percent of movement.]

Fig. 16. The arrows and their widths represent visitors' flows between different locations. The most frequent path is shown in green.

7.3 Findings for Museum Statistics

With the help of the trajectory and transition inference algorithms, we share our findings from processing the museum data regarding the visitors' movement patterns. In August 2017, the layout of the museum was changed due to some artwork being replaced and exhibition location G being blocked for re-installation. Thus our analysis may also include the impact of such changes.

Although each visiting path selection can be affected by personal choice, from a macro view, the spatial distribution of paths should be the result of the interplay between monitor locations and the spatial layout of the museum. Figure 16 gives the spatial distribution of visitors in the museum. From the figure, we can see that a majority of visitors follow the route ACDGHFEA and a smaller set of visitors take the reverse route AEFGHDCA. Both paths begin from and end at location A, which is the main entrance to the museum. Among all the sub-paths, EFG and GFE appear in 35% and 29% of all the visitors' trajectories, respectively. This is because of the linear layout on the second floor of the museum. The number of visitors who pick ACD is twice the number of visitors who pick ABD. Only 18% of people actually visit exhibition location B. 73% of the visitors skip exhibition location H, which is located deep inside the museum on the third floor. CrowdProbe enables us to analyze such visitor patterns without labor-intensive surveys.

[Figure 17: ratio of visitors by number of locations visited (1 to 9), for July and for Sep to Dec.]

Fig. 17. Number of exhibition locations visited by visitors

[Figure 18: average staying time in minutes for exhibition locations A to I, for July and for Sep to Dec.]

Fig. 18. Average staying time for each exhibition location

Figure 17 shows the number of exhibition locations visited. 80% of the visitors took a tour including 3 to 6 locations. Only very few visitors actually visit the whole museum. After the renovation started in August, the average number of locations visited decreased slightly. Figure 18 gives the average staying time for each location. The duration usually ranges from 10 to 20 minutes and is somewhat proportional to the area of each location. With the change in August, the time spent in location G is redistributed to other exhibition locations, causing an increase in staying time in almost all the other areas.

[Figure 19: ratio of visitors by total visiting time (<30min, 0.5-1h, 1-1.5h, 1.5-2h, 2-2.5h, 2.5-3h, >3h) together with the average time length in minutes, for July and for Sep to Dec.]

Fig. 19. Distribution of total time length spent in museum

[Figure 20: visitor stay time length in minutes against entry time from 10:00 to 21:00, for Weekday, Friday, and Weekend.]

Fig. 20. Visitor stay time length vs entry time

Figure 19 shows the distribution of the total time a visitor spent in the museum. 63% and 66% of visitors spent around 0.5 to 2 hours at the museum in July and Sep-Dec, respectively. Only five percent of the visitors spent more than 3 hours in the museum. After access to location G was blocked, the time in the museum dropped slightly from 84 min to 79 min. From Figure 20, we can conclude that visitors stay for a shorter time when they arrive close to the closing time of the museum (21:00 on Friday, 19:00 otherwise). In the morning, around 11 am, visitors tend to stay for less time, which may be because of the approaching lunchtime. In the afternoon, the duration is relatively stable and starts decreasing two hours before the closing time.

8 RELATED WORK

Research work on passive tracking can generally be divided into two categories: device-free passive tracking and

device-based passive tracking.


8.1 Device-free Passive Tracking

The idea of radio-based device-free passive tracking is based on the fact that the presence of the human body in an RF environment affects the RF signals, especially in the 2.4 GHz and 5 GHz bands common in WiFi networks. A typical deployment includes signal transmitters and monitoring points. During the training phase, RSS information is collected under different conditions. Later, in the testing phase, the emerging RSS fingerprint is matched against the database to infer the number of people and their locations. While this technology is still only used in controlled experiment settings, some research work already shows its potential. Nuzzer [ ] used a probabilistic approach to handle the device-free passive localization problem for a single intruder. E-eye [ ] uses channel state information to identify and distinguish in-house activities. Ichnaea [ ] shortened the training period and applied statistical anomaly detection techniques and particle filtering to provide localization capabilities.

Device-free passive tracking usually requires tedious training and can only track a limited number of people. Moreover, if the environment settings change, the radio database needs to be re-trained and adjusted to fit the new changes. This limitation hinders the further deployment of such device-free passive tracking technology.

8.2 Device-based Passive Tracking

Device-based passive tracking aims to track devices that are carried by users, especially smartphones. Early work [ ] used RFID to estimate visitor positions, visiting patterns, and inter-human relationships at a science museum. Recent work [ ] used Bluetooth to monitor visitors' length of stay at the Louvre. However, their experiment only covered 8.2% of the visitors, which affects the credibility and practicality of the conclusion. Due to the widespread deployment of WiFi networks and the popularity of smartphones, the use of WiFi-related information to obtain individual information has been both popular and shown to be effective. Researchers have proposed a series of ideas to exploit the availability of probe information from mobile devices to track individuals. Musa [ ] also used an HMM-based method to estimate smartphone trajectories, which is similar to our work. However, their system is meant for deployment in outdoor road conditions where vehicles have fixed moving directions and the requirement for granularity is lower than in a complex museum.

Besides merely tracking location, more work focuses on revealing user relationships, such as using the list of known SSIDs in probe requests as a fingerprint to decide whether two people are socially linked [ ]. A similar method has been used to generate spatial-temporal similarity based on users' co-occurrence frequency to infer relationships between them [ ]. Adriano [ ] exploited WiFi probe requests to de-anonymize the origin of participants in large events. To combat such information leakage, major mobile phone vendors introduced MAC randomization and encouraged devices to send probe frames with an empty (unknown) SSID list [16].

After the introduction of MAC randomization, researchers focused on de-anonymizing WiFi frames. Freudiger [ ] attempted to use sequence numbers and timing information to link randomized probe messages. Vanhoef [ ] made use of information elements (IE) and scrambler seeds used at the physical layer to track users. Martin [ ] was more aggressive, implementing a control frame attack to expose the globally unique MAC address.

Compared to the previous works, CrowdProbe is deployed in a complex indoor environment. We provide a non-invasive method to reveal the crowd movement regardless of phone vendor or OS version.

9 DISCUSSION

In our measurements, we observe that about 60% of the devices randomized their MAC addresses. As more vendors take action to protect the privacy of users, this ratio will continue to increase. While such a trend presents a challenge for CrowdProbe, we would like to highlight that CrowdProbe can work as long as there are sufficient statistics from devices that broadcast frames with Stable MAC addresses. We believe this will be the case for the following reasons. First, if a device is associated with a WiFi network, it reverts to using its globally unique MAC address [ ]. In many public spaces, free WiFi access is often available. It can be expected that some visitors will connect to the WiFi network for internet access. Thus, one will be able to collect sufficient statistics with globally unique MAC addresses, even though it may take more time. Second, as shown in Figure 5, some devices do randomize their MAC addresses, but they keep the same randomized MAC address over a sufficiently long duration of up to hours. Such data can be used to infer the transition probability without linking each MAC address to a specific user device. Lastly, we also sniff NULL data frames. These frames are used for power management and do not randomize their MAC addresses. The current randomization scheme is implemented only for the active scanning of the mobile device. Based on the above discussion, CrowdProbe can continue to collect enough data to infer the transition pattern and one-hop transitions for devices with randomized MACs.

While CrowdProbe has only been deployed and tested in the museum environment, the technique has the potential to be used in other environments such as shopping malls and transportation hubs. For trajectory inference, all the parameters are based on the data collected on site. Thus, our algorithm will still work in different scenarios as long as sufficient data can be collected.

The sparse nature of frame transmission limits the accuracy of crowd monitoring, as can be seen from the performance gap between Figure 10 and Figure 11. To get more frame transmissions, the authors in [ ] propose to emulate the SSID of a popular or previously visited AP. This technique can also be integrated into our system. However, this technique triggers the use of the WiFi interface of the mobile device, which will interrupt the existing connection and drain the battery at a higher rate.

10 CONCLUSION

In this paper, we propose an HMM-based visitor trajectory inference method based on passive WiFi monitoring. Moreover, we make use of the transition probabilities derived from existing trajectories to generate the possible movement mapping. The deployment and evaluation in a multi-floor museum demonstrated the feasibility of the proposed system. We believe that CrowdProbe can also be used in other scenarios. While there is no fixed model for all applications, the experience and lessons we learned from this case study will help in bridging research and practice.

REFERENCES

[1] Lada A Adamic and Eytan Adar. 2003. Friends and neighbors on the web. Social networks 25, 3 (2003), 211–230.

[2] Jamal Jokar Arsanjani, Wolfgang Kainz, and Ali Jafar Mousivand. 2011. Tracking dynamic land-use change using spatially explicit Markov Chain based on cellular automata: the case of Tehran. International Journal of Image and Data Fusion 2, 4 (2011), 329–345.

[3] Marco V Barbera, Alessandro Epasto, Alessandro Mei, Vasile C Perta, and Julinda Stefa. 2013. Signals from the crowd: uncovering social relationships through smartphone probes. In IMC. ACM, 265–276.

[4] Ben Benfold and Ian Reid. 2011. Stable multi-target tracking in real-time surveillance video. In Computer Vision and Pattern Recognition (CVPR), 2011 IEEE Conference on. IEEE, 3457–3464.

[5] Ningning Cheng, Prasant Mohapatra, Mathieu Cunche, Mohamed Ali Kaafar, Roksana Boreli, and Srikanth Krishnamurthy. 2012. Inferring user relationship from hidden information in wlans. In MILITARY COMMUNICATIONS CONFERENCE, 2012-MILCOM 2012. IEEE, 1–6.

[6] Tom Chothia and Vitaliy Smirnov. 2010. A Traceability Attack against e-Passports. In Financial Cryptography, Vol. 6052. Springer, 20–34.

[7] Mathieu Cunche, Mohamed Ali Kaafar, and Roksana Boreli. 2012. I know who you will meet this evening! linking wireless devices using wi-fi probe requests. In WoWMoM. IEEE, 1–9.

[8] Adriano Di Luzio, Alessandro Mei, and Julinda Stefa. 2016. Mind your probes: De-anonymization of large crowds through smartphone WiFi probe requests. In Computer Communications, IEEE INFOCOM 2016-The 35th Annual IEEE International Conference on. IEEE, 1–9.

[9] Arnaud Doucet, Nando De Freitas, Kevin Murphy, and Stuart Russell. 2000. Rao-Blackwellised particle filtering for dynamic Bayesian networks. In Proceedings of the Sixteenth conference on Uncertainty in artificial intelligence. Morgan Kaufmann Publishers Inc., 176–183.

[10] Sean R Eddy. 1996. Hidden markov models. Current opinion in structural biology 6, 3 (1996), 361–365.

[11] Samsung Electronics. 2013. SAR evaluation report. In SAR evaluation report. Samsung Electronics, 3–4.

[12] G David Forney. 1973. The viterbi algorithm. Proc. IEEE 61, 3 (1973), 268–278.


[13] Julien Freudiger. 2015. How talkative is your mobile device?: an experimental study of Wi-Fi probe requests. In Proceedings of the 8th ACM Conference on Security & Privacy in Wireless and Mobile Networks. ACM, 8.

[14] Dan Goodin. 2017. Shielding MAC addresses from stalkers is hard and Android fails miserably at it. https://arstechnica.com/information-technology/2017/03/shielding-mac-addresses-from-stalkers-is-hard-android-is-failing-miserably/. [Online].

[15] Hande Hong, Chengwen Luo, and Mun Choon Chan. 2016. SocialProbe: Understanding Social Interaction Through Passive WiFi Monitoring. In Proceedings of the 13th International Conference on Mobile and Ubiquitous Systems: Computing, Networking and Services. ACM, 94–103.

[16] Xueheng Hu, Lixing Song, Dirk Van Bruggen, and Aaron Striegel. 2015. Is There WiFi Yet? How Aggressive WiFi Probe Requests Deteriorate Energy and Throughput. arXiv preprint arXiv:1502.01222 (2015).

[17] Mohamed Ibrahim and Moustafa Youssef. 2011. A hidden markov model for localization using low-end GSM cell phones. In Communications (ICC), 2011 IEEE International Conference on. IEEE, 1–5.

[18] Takayuki Kanda, Masahiro Shiomi, Laurent Perrin, Tatsuya Nomura, Hiroshi Ishiguro, and Norihiro Hagita. 2007. Analysis of people trajectories with ubiquitous sensors in a science museum. In Robotics and Automation, 2007 IEEE International Conference on. IEEE, 4846–4853.

[19] Thomas Liebig and Armel Ulrich Kemloh Wagoum. 2012. Modelling Microscopic Pedestrian Mobility using Bluetooth. In ICAART (2). 270–275.

[20] Jouni Malinen. 2014. Linux WPA/WPA2/IEEE 802.1X Supplicant. https://w1.fi/wpa_supplicant/. [Online].

[21] Jeremy Martin, Travis Mayberry, Collin Donahue, Lucas Foppe, Lamont Brown, Chadwick Riggins, Erik C Rye, and Dane Brown. 2017. A Study of MAC Address Randomization in Mobile Devices and When it Fails. arXiv preprint arXiv:1703.02874 (2017).

[22] Lyudmila Mihaylova, Paul Brasnett, Nishan Canagarajah, and David Bull. 2007. Object tracking by particle filtering techniques in video sequences. Advances and challenges in multisensor data and information processing 8 (2007), 260–268.

[23] ABM Musa and Jakob Eriksson. 2012. Tracking unmodified smartphones using wi-fi monitors. In Proceedings of the 10th ACM conference on embedded network sensor systems. ACM, 281–294.

[24] Nuria M Oliver, Barbara Rosario, and Alex P Pentland. 2000. A Bayesian computer vision system for modeling human interactions. IEEE transactions on pattern analysis and machine intelligence 22, 8 (2000), 831–843.

[25] Oluwatoyin P Popoola and Kejun Wang. 2012. Video-based abnormal human behavior recognition - A review. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 42, 6 (2012), 865–878.

[26] Romer Rosales and Stan Sclaroff. 1999. 3D trajectory recovery for tracking multiple objects and trajectory guided recognition of actions. In Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference on., Vol. 2. IEEE, 117–123.

[27] Ahmed Saeed, Ahmed E Kosba, and Moustafa Youssef. 2014. Ichnaea: A low-overhead robust WLAN device-free passive localization system. IEEE Journal of selected topics in signal processing 8, 1 (2014), 5–15.

[28] Moustafa Seifeldin, Ahmed Saeed, Ahmed E Kosba, Amr El-Keyi, and Moustafa Youssef. 2013. Nuzzer: A large-scale device-free passive localization system for wireless environments. IEEE Transactions on Mobile Computing 12, 7 (2013), 1321–1334.

[29] K. Skinner and J. Novak. 2015. Privacy and your app. [Online].

[30] Taffee T Tanimoto. 1958. Elementary mathematical theory of classification and prediction. (1958).

[31] Mathy Vanhoef, Célestin Matte, Mathieu Cunche, Leonardo S Cardoso, and Frank Piessens. 2016. Why MAC address randomization is not enough: An analysis of Wi-Fi network discovery mechanisms. In Proceedings of the 11th ACM on Asia Conference on Computer and Communications Security. ACM, 413–424.

[32] Mathias Versichele, Tijs Neutens, Matthias Delafontaine, and Nico Van de Weghe. 2012. The use of Bluetooth for analysing spatiotemporal dynamics of human movement at mass events: A case study of the Ghent Festivities. Applied Geography 32, 2 (2012), 208–220.

[33] Harald Vogt. 2002. Efficient object identification with passive RFID tags. Pervasive computing (2002), 98–113.

[34] Yan Wang, Jian Liu, Yingying Chen, Marco Gruteser, Jie Yang, and Hongbo Liu. 2014. E-eyes: device-free location-oriented activity identification using fine-grained wifi signatures. In Proceedings of the 20th annual international conference on Mobile computing and networking. ACM, 617–628.

[35] Yuji Yoshimura, Anne Krebs, and Carlo Ratti. 2017. Noninvasive Bluetooth Monitoring of Visitors’ Length of Stay at the Louvre. IEEE Pervasive Computing 16, 2 (2017), 26–34.

Received February 2018; revised July 2018; accepted September 2018

