Estimating Spatial-Busyness Using Wi-Fi Traffic
Filtering
Aaron Jesse Ashmore
FINAL YEAR PROJECT 2020
B.Sc. Single Honours in Multimedia, Mobile and Web Development
Department of Computer Science
Maynooth University
Maynooth, Co. Kildare
Ireland
A thesis submitted in partial fulfilment of the requirements for the
B.Sc. Single Honours in Multimedia, Mobile and Web Development
Supervisor: Dr. Stephen Brown
CONTENTS
DECLARATION 5
ACKNOWLEDGEMENTS 6
ABSTRACT 7
1 INTRODUCTION 7
1.1 Topic addressed in this project 7
1.2 Motivation 7
1.3 Problem statement 8
1.4 Approach 8
1.4.1 Research & Establishment of Problem 8
1.4.2 Development, Investigation & Experimentation 8
1.4.3 Testing & Evaluation 8
1.5 Metrics 8
1.6 Project Overview 9
1.6.1 Implementation Overview 9
1.6.2 Significant Achievements 9
2 TECHNICAL BACKGROUND 9
2.1 Topic Material 9
2.1.1 Overview of people-counting techniques 9
2.1.2 Wireless people-counting techniques 10
2.1.3 Value of RSSI as a metric 10
2.1.4 Applications of static and mobile sensors 10
2.1.5 Difficulties in the area of people counting 10
2.2 Technical Material 11
2.2.1 TypeScript Documentation [21] 11
2.2.2 Docker Compose Documentation [8] 11
2.2.3 Endpoint testing with Jest and Supertest [23] 11
2.2.4 ESP32 Wi-Fi sniffer [24] 11
2.2.5 802.11 frames: A starter guide to learn wireless sniffer traces [25] 11
2.2.6 Airodump-ng [26] 11
2.2.7 Enable Monitor Mode & Packet Injection on the Raspberry Pi [27] 11
2.2.8 MAC Address Randomisation in iOS [28] 12
2.2.9 What does GDPR say about Wi-Fi tracking? [30] 12
3 THE PROBLEM – DETECTING & REPRESENTING BUSYNESS 12
3.1 Technical Problem Overview 12
3.2 Identifying Key Problem Areas 13
3.2.1 Data Sensing & Processing 13
3.2.2 Data Storage 13
3.2.3 Data Analysis 13
3.2.4 Data Visualisation & Representation 13
4 EXPERIMENTS & INVESTIGATIONS 14
4.1 Sensor Hardware Decision – ESP32 vs. Linux (+ compatible chipset) 14
4.1.1 Equipment 14
4.1.2 Considerations/Variables 14
4.1.3 Results 14
4.1.4 Conclusion 14
4.2 Scan Parameters 15
4.3 Correlation between devices and people 15
4.4 How busy is “busy”? Representing a sense of busyness 16
4.4.1 Setup 16
4.4.2 Results 16
4.5 Notable Challenges 16
4.5.1 MAC Address Randomisation – Filtering bad data 16
4.5.2 Monitor-Client Mode Switching – Kernel modifications in the Raspberry Pi 17
4.5.3 Staying Alive - Redundancy in the Raspberry Pi 17
4.5.4 Database/API Optimisation – Reducing query overhead via batch processing 17
4.5.5 Heavy Data – Difficulties and success during high-traffic events 18
5 DESIGN & IMPLEMENTATION SOLUTIONS 18
5.1 Final Design Diagram 18
5.2 Addressing Key Problem Areas 18
5.2.1 Data Sensing & Processing – Measuring occupancy 19
5.2.2 Data Storage 19
5.2.3 Data Analysis 20
5.2.4 Data Visualisation & Representation 21
6 EVALUATION & DISCUSSION 22
6.1 Software Verification & System Correctness 22
6.1.1 Sensor Components 22
6.1.2 Server Components 22
6.1.3 User Interface Components 23
6.2 System Effectiveness & Discussion 23
6.2.1 User Feedback – MSU Events Officer 23
6.2.2 Survey of Estimation Accuracy & Discussion of Busyness Perception 24
6.3 Evaluation Summary & Project Metrics 25
7 CONCLUSION 25
7.1 Limitations of Approach & Threats to Validity 25
7.2 Future Work 25
7.3 Personal Closing Comments 26
1 APPENDIX 27
2 REFERENCES 43
DECLARATION
I hereby certify that this material, which I now submit for assessment on the program of study
as part of the B.Sc. Single Honours in Multimedia, Mobile and Web Development
qualification, is entirely my own work and has not been taken from the work of others - save
and to the extent that such work has been cited and acknowledged within the text of my work.
I hereby acknowledge and accept that this thesis may be distributed to future final year
students, as an example of the standard expected of final year projects.
Signed: Date: 20/03/2020
NOTES & ACKNOWLEDGEMENTS
Please note that this thesis is submitted alongside accompanying source code in the supplied
zip archive. Additionally, it was difficult to adhere to the 20-page limit, especially when there
was a lot of work to discuss. An effect of the page limit is that I had little room for diagrams
and screenshots in the main body. Given that visualisation is a major part of this project, please
refer to the Appendix whenever guided, as this provides important context to what is being
discussed. Apologies for the inconvenience of separating so much material.
I’d like to extend my thanks to the following people:
- My supervisor, Dr. Stephen Brown, who provided timely feedback and guidance
throughout, particularly pulling no punches when reviewing this thesis
- Colin Maher and Sandra Byrne, Events Officer and Manager of the MSU respectively,
who offered the SU Bar as a testing ground for the project and provided vital user
feedback
- My peers, who patiently listened to me waffle about the project, provided moral support
during the tougher moments of misbehaving code, and helped directly with user testing
of the user interface/visualisation platform
- Vanush “Misha” Paturyan, for assisting with deploying the sensors around campus,
namely granting access for each device to the internal IoT network
- Maynooth University and the Department of Computer Science, for affording me the
opportunity to pursue a personally suggested project and providing any required
facilities and resources to do so
- The nameless few who I have forgotten to mention – you know who you are (hopefully)
ABSTRACT
Higher education institutions in Ireland are facing increased student numbers and reduced funding year
on year. This has placed a strain on available campus facilities, impacting student comfort and raising
safety concerns associated with congestion and footfall. This project aims to create a unique perspective
on the increased congestion levels around campus by drawing from two research areas: perceived
crowding, and wireless crowd sensing. A new term, spatial busyness, is defined to describe our
perception of how busy a location is. By utilising the unique nature of MAC Addresses, publicly visible
in all transmitted Wi-Fi packets, it is possible to obtain a count of unique devices in an area. A relative
measure of busyness, a number between 0.0 – 1.0, is constructed by comparing current device counts
with historical patterns. A system was created to evaluate the effectiveness of Wi-Fi scanning, and the
use of relative measurements to convey a sense of busyness was investigated. It was found that while
our human perception of busyness is mostly based on the occupancy level of a location (something the
system measures well), other factors such as noise levels can strongly affect our impression of busyness.
Although this caused some inconsistencies in measurements, the system in general provided
informative and useful data to both target user groups. The system was evaluated as a product in terms
of its applications, strengths, and weaknesses, and the potential reasons for dissatisfactory performance
in certain environments and the challenges in conveying a sense of busyness through words are
explored. These results enable future work as an extension to this project and help to direct research in
improving the interpretation and representation of sensed wireless activity.
1 INTRODUCTION
1.1 TOPIC ADDRESSED IN THIS PROJECT
The main topic addressed in this project is the idea of spatial-busyness (henceforth busyness)
estimation. Like people-counting, it is concerned with density and congestion of people in an area. The
main difference between the two ideas is the detail of measurement. While people-counting involves
identifying the precise number of people in an area, busyness estimation is achieved by measuring the
relative activity in an area and comparing this to both historical data and current measurements in
other areas. With this relative measure, we can apply a categorical rating (e.g. busy, quiet, lively, etc.) to
indicate the estimated busyness in an area.
1.2 MOTIVATION
With the ever-increasing population of the Maynooth University campus, cited at 14,000 in 2016 [1],
comes an increased risk to the health and safety of students in busy areas. These risks are compounded
with reduced third-level funding year-on-year since 2009 [2, Fig. 4.1], increasing strain on available
resources and facilities. Automated people-counting systems are employed commercially all over the
world, but what if we were to apply this idea to a campus environment? With information on crowd-
density in areas around campus, security and management will be better able to predict and identify
potentially hazardous locations in real-time. Students suffering with sensory-issues, agoraphobia or
anxiety, or simply a preference for peace and quiet could avoid crowded environments without needing
to assess the location first-hand. Research in the area of non-participatory people tracking is abundant
[3]–[5], which encourages the technical direction of this project, though the focus of this project differs
from [3]–[5] in that it attempts to create a utility for multiple end users. Additionally, I have identified
a journal article [6] with a similar approach to mine which I will discuss later. The article uses “group
gathering” to describe a similar metric to that being used for this report, however due to some differences
in measurement, I am defining busyness as a new term. This project addresses the question of the
viability of representing the busyness of a location through a cheap and non-participatory method,
generating information pertinent to the safety and comfort of the outlined user groups.
1.3 PROBLEM STATEMENT
Busyness is closely related to perceived crowding, to which there are many factors such as room colour
and visual complexity [7]; I will be taking a high-level approach to busyness estimation. I am expecting
contextual clues of busyness (physical location size, typical occupancy, etc.) will be causal factors of the
data I will be collecting. If true, I will not need to deal with these specific clues directly, simplifying data
collection and analysis. The data collected will be used to obtain relative measures against both current
and historical data to provide categorical descriptions of locations. I will be evaluating both the sensing
and descriptive accuracy of a location’s busyness with on-site reports and surveys. Finally, I hope to
determine the viability of obtaining an accurate estimation of busyness without the subjects’
involvement or storage of personally identifiable information.
The project goals require a full system solution, from hardware and firmware, all the way up
through the full web stack, in addition to some research and experimentation throughout development
of various components. There are four technical problem areas that must be overcome in order to build
a functional, reliable, and useful system: sensing, storage, analysis, and visualisation.
1.4 APPROACH
The project can be broken into 3 main phases: research, development, and evaluation. Outlined below
are the three phases in greater detail, with any relevant sections you may wish to jump to from here.
1.4.1 Research & Establishment of Problem
Existing academic research is explored to provide insight into the problem area and potential issues (see
Section 2.1). Technical research is also undertaken throughout the entire process to aid in the
implementation process (see Section 2.2). This phase also involves preliminary research into some
technologies, such as an evaluation of potential sensing hardware found in Section 1.21 of the Appendix.
Finally, the Problem to be solved is established in Section 3.
1.4.2 Development, Investigation & Experimentation
In this phase, the four key problem areas established in the first phase will be addressed. During the
development process, continuous experimentation is required to inform decisions in the four areas. Any
experiments or investigations relevant to the design and decision process are discussed in Chapter 4,
and the decisions made in each area are discussed in detail in Chapter 5.
1.4.3 Testing & Evaluation
The final phase of the project is to deploy and test the developed hardware and software, and to evaluate
the effectiveness of the system and its utility among both target user groups. A detailed, end-to-end
verification of the correctness of the system can be found in Section 1.23 of the Appendix and a
discussion of the effectiveness of the system can be found in Section 5.2.
1.5 METRICS
I evaluated each of the problem solutions with a variety of metrics such as server CPU time, API response
time, and general data correctness, but to evaluate the system effectiveness requires engagement with
end users and feedback on its utility. As the motivation stems from the usefulness and accuracy of
implementation, I will be using statements, surveys, opinions, and correlation trends as metrics of
evaluating the project’s success. I will gain an insight into potential future work based on statements
and opinions and will be able to evaluate the busyness descriptors with correlation and surveys.
1.6 PROJECT OVERVIEW
The project is quite involved at all implementation levels and required extensive time spent on each
problem area. While it could be classed as an implementation project in terms of generic crowd-sensing,
there is a reasonable amount of research and investigation being conducted throughout. The following
is a list of implementation stages and significant achievements of the project. Many of these topics will
be visited in more detail later.
1.6.1 Implementation Overview
1. Evaluate potential hardware to be used when sensing chosen signal type
2. Develop software to perform scanning and data processing on hardware
3. Perform ground truth testing of hardware/software
4. Design and build an API & database to store and serve data collected by each deployed device
5. Develop a front-end platform to view historical and real-time data
6. Deploy devices around campus to test scalability
7. Review the deployments for accuracy and value
8. Iterate on hardware implementation and front-end platform to improve value and accuracy
1.6.2 Significant Achievements
1. The correlation between devices and people on campus proved to be a valid assumption
2. The value and utility of the system was confirmed by both parties outlined in the motivation
3. Applying sensing data relativistically was the key to achieving an accurate measure of busyness
4. The metric gathering in this project was achieved using a low-cost, all-in-one sensing device1
2 TECHNICAL BACKGROUND
2.1 TOPIC MATERIAL
While the use of the term “busyness” is not common, the idea is not entirely novel. There are many
approaches to, applications of, and concerns about measuring occupancy in an area.
Addressing the problem statement with wireless technologies is common practice, and some
existing research ([3], [6], [8], [9]) aligns very closely with this project in terms of technical approach
and implementation, though the use of the end system differs. However, wireless approaches are by no
means the only way to measure occupancy. Research in the area is vibrant and growing, with commercial
applications [10] and more novel uses of wireless device counting such as Google’s traffic prediction
system [11] providing financial incentives to development beyond the more humanitarian motivations
such as the one outlined in this project.
2.1.1 Overview of people-counting techniques
There are several popular methods to people counting. Traditional methods of counting by hand can be
unreliable, so automated systems have taken their place. Standard approaches such as turnstiles are
simple ways of achieving accurate occupancy metrics, but require fixed entry points and limit the flow
of people [12]. A very common means of measurement is through video surveillance, but this method
presents many difficulties such as resolution, movement, and occlusion. While much progress has been
made in this area [13]–[15], additional methods of counting have emerged in recent years with the
ubiquity of personal wireless devices. Wireless, non-participatory methods include tracking through Wi-
Fi signals ([3], [6], [16]), Bluetooth ([4], [8], [9]), and some more experimental methods such as through
the use of laser-grids [17]. A full history and progression of tracking systems is explored by T. Räty in
their survey of surveillance systems [18].
1 While some research and projects (e.g. [8], [9]) have used a similar hardware-software combination
such as the Raspberry Pi and Airodump-ng, in the case of [9] an external Wi-Fi antenna is used, and
others (e.g. [3], [4], [6]) use more expensive or complicated setups.
2.1.2 Wireless people-counting techniques
For this project, I had narrowed the technique down to a wireless approach but needed to decide between
Bluetooth and Wi-Fi. Abedi et al. [4] compare Bluetooth with Wi-Fi as a method of non-participatory
monitoring and investigate how various properties of the environment and sensor antenna affect
scanning. The various comparisons made between the two media indicate a much stronger use case for
Wi-Fi over Bluetooth. Signal propagation, discovery time, and ubiquity of Bluetooth devices fell very
short of Wi-Fi in testing. Additionally, testing of both the 2.4GHz and 5GHz Wi-Fi spectrum led me to
omit the 5GHz channels entirely from the scanning process2.
2.1.3 Value of RSSI as a metric
A hugely valuable metric in people tracking is RSSI, the Received Signal Strength Indicator of a captured
signal. Its main applications are the use of distance tracking and localisation in areas where other
indicators such as GPS are difficult to obtain. Guvenc et al. [19] used RSSI localisation to optimally
position access points in the University of New Mexico, while Bai et al. [9] used RSSI in combination
with other filtering methods to remove noise from their measurements. Similar to Guvenc et al., E.
Vattapparamban [3] uses many RSSIs detected by multiple sensors to locate one device within a grid.
This project utilises RSSI as a simple threshold to filter out devices beyond a desired distance
from the sensor. I briefly explored the variability of this metric2, but there are much more involved
approaches to analysing RSSI discussed by Adewumi et al. [20].
2.1.4 Applications of static and mobile sensors
Much of the research I encountered involved static sensors, that is, a (set of) ground sensor(s) deployed
in a fixed location to capture data. Having a fixed sensor is important in localisation and for this project
too, where the position of the sensor forms part of the data (location). Abedi et al. [6] take a human-
geographical approach by identifying patterns of human behaviour from data collected by their static
sensor. By tracking device persistence in the sensing area, they were able to track time spent in an area
rather than just an aggregate of devices such as in this project.
While static sensors seem to be the most common setup, the use of mobile sensors has come up
a few times during my research. E. Vattapparamban [3] investigated the possibilities of mobilising their
monitoring system by attaching the sensors to drones. In doing so, they showed the potential localisation
powers of such a system in the use of search and rescue, and theorised both malicious uses of and
counters to this type of surveillance. Another form of mobile sensing is Google’s use of users’ personal
devices as individual mobile sensors to identify levels of traffic congestion [11]. The ubiquity of this data
provides unparalleled insight into the way people move throughout the day.
2.1.5 Difficulties in the area of people counting
Depending on the method of people-counting, the challenges vary in nature and difficulty. Video
surveillance systems suffer from several technical challenges in the capture and analysis of video frames.
With relevance to this project, Bai et al. [9] discuss some of the issues encountered during their wireless-
based approach. A similar problem, where system accuracy varied between monitored locations, is
discussed in 5.2.2; this is a significant difficulty of such blanket approaches to people-counting systems.
2 See 4.3.2
2.2 TECHNICAL MATERIAL
I used a variety of resources to help me design and develop the project. The following is an overview of
these and what I learned from each.
2.2.1 TypeScript Documentation [21]
JavaScript is the language of the web and is very quick to develop in due to its loose typing and simple
syntax; however, these are equally its pitfalls. TypeScript, from Microsoft, aims to address many of the
shortcomings of JavaScript by adding strict typing and language features not found in JavaScript. For
the API this was a trivial decision, as the additional restrictions helped me catch and identify many errors
in my code before even testing. I used the official documentation to help me set it up in my project.
2.2.2 Docker Compose Documentation [8]
Docker is a great technology that containerises applications for quick and predictable deployment.
Docker compose is an additional tool that helps deploy full stack applications that require multiple
containers. You specify rules for their deployment such as how and when they’re mounted,
dependencies, and how they recover from failure. With this resource, I learned how to Dockerise and
deploy my project quickly and reliably3. Specifically, the documentation taught me how containers
communicate with each other, and how I can interact with them as a developer.
2.2.3 Endpoint testing with Jest and Supertest [23]
I needed to verify the functionality of the database and API with endpoint tests, but I was unfamiliar
with the approach when using MongoDB and Express. I found this to be a great resource where I learned
how to use a testing framework called Jest. I used it to test4 various parts of the API such as report
submission and retrieval, among others.
2.2.4 ESP32 Wi-Fi sniffer [24]
The ESP32 was a candidate during my evaluation of potential sensor hardware5. Support for the device
in packet sniffing is not great, and I am not very experienced working in C, so this resource from Łukasz
Podkalicki helped greatly in developing a minimal implementation with which to test.
2.2.5 802.11 frames: A starter guide to learn wireless sniffer traces [25]
To augment the ESP32 resource (2.2.4), I used this guide from Cisco to help me understand what parts
of the packet were being dissected by the example sniffer code. This greatly helped me understand the
structure of the 802.11 frame6.
2.2.6 Airodump-ng [26]
A vital part of the scanning implementation involved a pen-testing toolkit called Airodump-ng.
Airodump-ng, part of this toolkit, is used for packet sniffing with available wireless chips. The
documentation was vital in tweaking scan parameters and wrapping it in my scanning script correctly.
2.2.7 Enable Monitor Mode & Packet Injection on the Raspberry Pi [27]
Monitor mode allows a device to ‘scan’ for wireless packets being transmitted in the area. The basis of
this project hinged on having periods of scanning in monitor mode to capture all unique devices sending
packets. The Raspberry Pi does not have native support for monitor mode; without it, the device would
not be able to scan for Wi-Fi signals in an area. Thankfully, Kody wrote an easy-to-follow guide for
enabling support through a firmware patch called Nexmon7. Without this, I would not have discovered
Kali-Pi, the Linux distribution built for the Pi which includes this modification.
3 See appendix 1.24
4 See appendix 1.2
5 See 4.1
6 See appendix 1.1
2.2.8 MAC Address Randomisation in iOS [28]
This was my primary resource in understanding how randomisation occurs in mobile devices. The key
takeaway from this was the way to tell a randomised address from a ‘real’ one by checking the second-
least-significant bit of the first octet of the address. This is on the basis of the IEEE specification [29, pp. 12],
which indicates that, when set, the address is locally administered, i.e. the device created/assigned this address to itself.
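To make the check concrete, the following is a minimal sketch (my own illustration in Python, not code from the project) of testing the locally-administered bit in the first octet of an address:

    def is_locally_administered(mac: str) -> bool:
        # Bit 1 (the second-least-significant bit) of the first octet is the IEEE
        # 'locally administered' flag; randomised addresses set this bit.
        first_octet = int(mac.split(":")[0], 16)
        return bool(first_octet & 0b00000010)

    print(is_locally_administered("da:a1:19:12:34:56"))  # True - likely randomised
    print(is_locally_administered("84:b8:02:12:34:56"))  # False - globally assigned (Cisco OUI)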
2.2.9 What does GDPR say about Wi-Fi tracking? [30]
GDPR is an important consideration in this project. While care was taken to design a system that
minimises the collection of personal data, I must still observe, count and store MAC addresses to a
degree. This website is a great source for information on GDPR in general, but it particularly clarified
the point that “A MAC address is a personal data at the moment it is combined with other (personal)
data that can be traced back to a person”. To avoid any personally identifiable information being stored,
I hash any MAC addresses before they are transmitted to the server, and never extrapolate collected data
to the individual e.g. through movement tracking, time spent in each location, etc. It is certainly possible
for the data I am collecting to be used for more than basic aggregation, but that is not the intention of
the project; an aspect GDPR considers.
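As a brief illustration of the anonymisation step, the hashing happens on the sensor before a report ever leaves the device; a minimal Python sketch follows (the thesis does not specify the hash function, so SHA-256 here is an assumption):

    import hashlib

    def anonymise_mac(mac: str) -> str:
        # Only this opaque digest, never the raw MAC address, is transmitted and stored.
        return hashlib.sha256(mac.lower().encode("utf-8")).hexdigest()

    print(anonymise_mac("84:B8:02:12:34:56"))

Salting or periodically rotating a secret mixed into the hash would further reduce linkability, though that goes beyond what is described here.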
3 THE PROBLEM – DETECTING & REPRESENTING BUSYNESS
The technical scope of the project can be broken down into four key problem areas: sensing, storage,
analysis, and visualisation. Figure 1 shows the requirements of the technical problem which we can use
to motivate a discussion of the technical solutions.
3.1 TECHNICAL PROBLEM OVERVIEW
The system required for the project must adhere to the problem overview shown in figure 1. The
sensor(s) will be deployed around the campus and will need remote access to the server. The reports
they submit will need to be stored somewhere such as a database. Analyses will be performed on the
stored data and visualised on some platform. This platform must be accessible to the end user and
display the data in a way that is easy to interpret.
Figure 1 – Technical problem overview
7 https://github.com/seemoo-lab/nexmon
3.2 IDENTIFYING KEY PROBLEM AREAS
3.2.1 Data Sensing & Processing
This problem requires two decisions to be made: what is to be sensed; and the hardware required to do
so. A specific goal of the system is to perform sensing without interaction with the subjects to be sensed
i.e. non-participatory and without the subjects’ knowledge of sensing. This is to simplify the problem
and additionally minimise any data privacy concerns. Sensing the presence of people is the goal of this
step but there are several challenges associated with doing this in a non-participatory manner. As
explored, video surveillance is common, but the approach taken here should minimise privacy concerns.
Additionally, the medium to be sensed should not depend too heavily on the physical characteristics of
the location e.g. the presence of walls or occlusion amongst people would cause problems with an
approach involving visual monitoring like video or even laser grids. There is not enough time to evaluate
every potential solution to the problem, so I am restricting what is to be sensed to some form of
abundant, wireless, easily anonymised signal. The trade-off here is a decrease in accuracy of sensing due
to the characteristics of signal propagation and the source of the signal to be sensed. Thankfully, the
choice of data source helps determine the required hardware (sensor) and testing and experimentation
is required to decide on implementation specifics. The hardware chosen also determines to what extent
the data can be processed before being sent to storage due to processing overhead. In this section I also
need to decide on an appropriate schema for the data format implemented on the data storage side to
ensure integrity. Finally, three critical considerations are: to determine what a sensing/scanning process
involves; the interval at which to perform scans; and any additional parameters of the process.
3.2.2 Data Storage
Given the sensors are deployed to different locations, a remote store such as a networked database must
be chosen. The type of database to use is dependent on the data sensed, how easily it interfaces with
APIs, and my personal experience with it. I must continually evaluate the performance of the database
as my data grows.
3.2.3 Data Analysis
Once the data is stored, it will need to be analysed to meet the requirements set out in the problem
statement and motivation. Comparing recent data to historical data, and live data between locations, are the
minimum required operations, but additional analyses relating to visualisation may be required.
Depending on the analysis, it may not be computationally appropriate to run analytics for every query.
3.2.4 Data Visualisation & Representation
Visualisation is a significant part of this project, as it directly addresses the main motivation: providing
data and insights for both the student body and management/security. It is an area of the project
requiring consistent evaluation and adjustment to get right. The platform for visualisation will need to
be easy to use on desktop and mobile, and the visualisations chosen must be informative at a glance.
A vital consideration of the project in order to adhere to the motivation is that the data presented
to a potentially non-technical user must be accessible, familiar, and intelligible. This project aims to
achieve this by translating metrics of the data source (unique device counts) to a sense of busyness.
During data analysis, the relative measure of busyness is quantified to a number between 0.0 and 1.0,
from least to most busy. This does not convey a natural sense of busyness. This project will explore the
idea of presenting a categorical description of busyness, rather than a number. A categorical
representation also provides an ability to skew the interpretation of the 0.0 to 1.0 scale, as this number
may not translate linearly to busyness, and the user should not be required to perform this interpretation
themselves. For example, should a relative measure of 0.3 be classified as “quiet”? Whatever form is
chosen, evaluations to establish its effectiveness and discuss potential alternatives will be performed.
4 EXPERIMENTS & INVESTIGATIONS
The following is a set of the more noteworthy experiments and investigations performed throughout the
project. These were vital during the decision-making process and so will be heavily backreferenced in
the following chapters. In addition, challenges encountered during the process are discussed in 4.5.
Important to note is that not all experiments and investigations discussed took place chronologically
before decisions were made. Some, such as 4.4, were undertaken after an initial implementation cycle.
4.1 SENSOR HARDWARE DECISION – ESP32 VS. LINUX
(+ COMPATIBLE HARDWARE/CHIPSET)
One candidate for the sensor was the ESP32, a small, low-power chip with Wi-Fi support. It was the
primary alternative to using a Linux device with a chipset compatible with Airodump-ng, the packet
sniffing toolkit. This investigation looked at two aspects: performance and ease-of-use. With the data
produced from the tests, along with personal experience with the devices, I was able to evaluate both
potential solutions to this problem. The test took place in the main Eolas lab with 20 people in the room,
with at least one device per person (laptop, likely also phone). Both devices were let scan for 60s,
switching wireless channels every 2s.
4.1.1 Equipment
1. ESP32 running packet sniffing code written in Arduino/C [24]
2. Linux laptop with a USB Wi-Fi solution (Alfa AWUS036NHA)8
4.1.2 Considerations/Variables
1. Channel hop is sequential on the ESP32 but not on the Linux device
2. The Linux device automatically identifies access points and removes them; the ESP32 did not
account for all APs but did filter out any eduroam networks9
3. The Linux device is much more powerful than the ESP32. The higher performance in testing
might not translate to smaller, more portable Linux solutions
4. The antenna used in the Linux setup is more powerful than the built-in antenna on the ESP32
4.1.3 Results
The Alfa gathered 44 unique client addresses – APs filtered by Airodump-ng
The ESP32 gathered 30 unique MAC addresses – Cisco APs manually filtered
The Linux solution is more robust, possibly due to the hardware, but equally due to the existing toolkits
available for this type of project. However, there are other pros and cons which are important to consider
in the decision. Please see appendix 1.21 for pros and cons of each method explored.
4.1.4 Conclusion
With the data and pros/cons evaluated, I decided that the non-ESP32 approach was more appropriate
given the project's time constraints. This result could have been different in other cases, as the test did
not include the exact hardware of a deployable Linux solution. An important note is that use of either
device is not necessarily exclusive and using both may provide greater flexibility in terms of remote
deployment. However, at the time of testing I believed it to be more beneficial to focus on a single
platform first and expand later if necessary.
8 https://www.alfa.com.tw/products_detail/7.htm
9 Using the Cisco OUI "84:B8:02" as a filter for all eduroam APs on campus
4.2 SCAN PARAMETERS
There are a few customisable parameters to Airodump-ng which affect scanning results. Signal
propagation, antenna design, and Wi-Fi filtering specifics are complex areas of research [19], [20], [32].
Due to time constraints, I was unable to perform entirely deductive investigations. Rather, I opted for
inductive but informed approaches to determine the effects of each parameter on the scanning
procedure, the underlying principles and causes of which are not explored in this thesis and are instead
left as an exercise for the reader.
As the data captured included the signal strength of the captured packet (RSSI), I was able to
perform some filtering on this metric in my script. For medium-sized areas, an RSSI threshold of
-80dBm performed best. I settled on this number through some testing of how crowding, noise, and
distance seemed to affect the RSS of a known device to the sensor. Additionally, I tested the effects of
scan length, channel hopping frequency, and channel set:
1. For large areas, an RSSI threshold of >=-85dBm performed best. Small areas worked well with
>=-78dBm
2. For areas of lower occupancy, a scan time of 60 seconds proved enough to capture most signals
3. For areas of higher occupancy, a longer scan time of between 90-120 seconds was better
4. Due to the number of channels to scan in the 5GHz spectrum, scanning was restricted to
2.4GHz. This may result in lost devices, but it seemed to make little difference in the testing
5. Channel hopping occurs at a high frequency, roughly one hop every 500ms, though this isn’t
documented. I found that a more reliable count of devices was taken with a slightly slower rate of
1000ms between hops.
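As a minimal sketch of the RSSI threshold described above (illustrative Python; the observation format and function name are my own assumptions, not the project's actual script):

    # Keep only devices whose strongest observed signal meets the threshold.
    def filter_by_rssi(observations, threshold_dbm=-80):
        strongest = {}
        for mac, rssi in observations:  # (MAC, RSSI in dBm) pairs parsed from a scan
            if mac not in strongest or rssi > strongest[mac]:
                strongest[mac] = rssi
        return {mac for mac, rssi in strongest.items() if rssi >= threshold_dbm}

    sample = [("aa:bb:cc:00:11:22", -62), ("aa:bb:cc:00:11:22", -85), ("dd:ee:ff:33:44:55", -90)]
    print(filter_by_rssi(sample))  # {'aa:bb:cc:00:11:22'}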
4.3 CORRELATION BETWEEN DEVICES AND PEOPLE
The crux of the project relies on the assumption that there is a tangible correlation between the number
of devices and the number of people in an area. While this might feel intuitively correct, especially when
supported by ownership rates of wireless devices among the target demographic [31], it is important to
test this assumption. At the same time, one must resist the urge to people-count; this experiment only
aims to measure a correlation, not obtain a device-person ratio, as it is not relevant to the goal of the
project.
I conducted an experiment to measure the correlation between the number of devices sensed and the
number of people in an area with the following aspects:
The occupancy of the room was continually monitored.
The sensor had been configured to try to capture the devices in the room but not too far beyond10
A count of people in the room was taken during each sensor scan interval over several hours
The graphed data11 shows the count of people in a room against the count of non-randomised devices
found during the scanning process. The data exhibited a correlation factor of 0.67 between device count
and people count. While this factor is not as significant as I had hoped, there is nonetheless a tangible
and utilisable correlation between devices and people, and the number of unique devices is likely a good
relative measure.
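For reference, the reported correlation factor can be computed directly from the paired counts; a small Python sketch with made-up numbers (not the actual measurements, which are in the Appendix):

    from statistics import correlation  # Pearson correlation, Python 3.10+

    people_per_scan  = [12, 15, 18, 22, 19, 25, 30, 28]
    devices_per_scan = [10, 14, 15, 20, 15, 21, 27, 22]
    print(round(correlation(people_per_scan, devices_per_scan), 2))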
10 See 4.2 This was not a completely precise boundary due to the complexities of signal propagation.
11 See appendix 1.14
4.4 HOW BUSY IS “BUSY”? REPRESENTING A SENSE OF BUSYNESS
The title of this section, and the inspiration for this investigation, was a YouGov study [33] in which the
perceived meaning of a categorical rating was related to a quantified, numerical value. In order to convey
a sense of busyness, the appropriate wording must be used. Throughout the project, the single most
common point of confusion during demos and user-testing was what a categorical rating (quiet, calm,
lively, etc.) meant. The following is a brief investigation to determine if a set of appropriate words exist
to rate the busyness of an area. It is important to note that this investigation took place after an initial
scale and set of busyness descriptors were in use for some time on the visualisation platform.
4.4.1 Setup
I created a polling page which would present a busyness rating to the user and ask them to represent
that word on a scale of least to most busy. The results were submitted to my API and stored in a MongoDB
collection for analysis later. The decision to use a linear background gradient for the slider was made after an
initial run of the poll, the results of which have been discarded. The reason for discarding the initial run
was two-fold: I identified a potential biasing factor in the poll colour scheme; and I didn’t provide a way
to not answer a question. The colour scheme used on the slider initially matched the colour choices used
on the dashboard UI12. However, the placement of each colour on the slider had been directing
participants to place the slider thumb in areas defined by the colours e.g. if the middle were orange,
more intense words such as “busy” might appear here despite being only 50% on the slider. The other
issue was a missing “Not Sure” button, forcing participants to answer even if they felt unsure of the
word’s meaning or feeling. I updated the slider to be less biased by using a single colour gradient13.
4.4.2 Results
An infographic14 was generated from the data15 collected from 21 participants. It shows how intensely each
word was rated in relation to a sense of busyness. I had thought of filtering the outliers; however, they
signify the confusion that each word holds and so I believe removing them would be dishonest.
In general, the most universally-agreed-on words were found at the extrema. There was
disagreement around words like “hopping” and “humming”, with strong candidates around the intervals
already being used for busyness categorisation on the front end (discussed in 4.2.4). Evaluating the
standard deviation and mean of each word’s intensity, I chose the following categories for busyness
ratings on the front end:
very quiet (0–0.15), calm (0.15–0.35), comfortable (0.35–0.65), bustling (0.65–0.85), hectic (0.85–1.0)
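Applied in code, the categorisation reduces to a simple lookup over these intervals; a minimal Python sketch of my own using the boundaries above:

    def busyness_category(score: float) -> str:
        # Map a relative busyness score (0.0 - 1.0) onto the chosen category words.
        if score < 0.15:
            return "very quiet"
        if score < 0.35:
            return "calm"
        if score < 0.65:
            return "comfortable"
        if score < 0.85:
            return "bustling"
        return "hectic"

    print(busyness_category(0.30))  # calm
    print(busyness_category(0.72))  # bustling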
4.5 NOTABLE CHALLENGES
As this project involved aspects of both hardware and software, there were quite a few interesting or
noteworthy challenges.
4.5.1 MAC Address Randomisation – Filtering bad data
“Media Access Control (MAC) address randomization is a privacy technique whereby
mobile devices rotate through random hardware addresses in order to prevent
observers from singling out their traffic or physical location from other nearby
devices.” – Martin et al. [34, Ch. Abstract]
12 See appendix 1.7
13 See appendix 1.8
14 See appendix 1.9
15 See appendix 1.10
There are some very involved approaches to identifying exactly which device has been randomising its
addresses, some of which are discussed by Martin et al. [34, Sec. 4.2]. Due to complexity and time
constraints, I decided on a much simpler method of discarding any likely-randomised addresses and
retaining the ‘real’ ones. To do this, I identify any properties of a sensed device which might indicate a
‘real’ MAC address and discard any devices which fail these checks, assuming they are randomised.
The following assumptions are made:
1. If a device is associated with a network, it is not randomised
2. If a device’s MAC address begins with a known manufacturer prefix16, it is not randomised
3. If a device’s MAC address is not locally administered, it is not randomised17
4. If the above checks fail, assume it is randomised
With this method18, I was able to obtain a reasonably stable correlation between the number of devices
and the number of people19. I performed a ground truth experiment by counting the number of people
in an area during each scan interval, and correlating it with the number of detected devices, post-
filtering.
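Expressed as code, the four assumptions above translate into a short filter; the following Python sketch is my own illustration (field names and the OUI-set format are assumptions – the actual pseudocode is in appendix 1.18):

    def is_probably_real(record: dict, known_ouis: set) -> bool:
        mac = record["mac"].lower()
        first_octet = int(mac.split(":")[0], 16)
        if record.get("associated_bssid"):       # 1. associated with a network
            return True
        if mac[:8] in known_ouis:                # 2. known manufacturer prefix, e.g. "84:b8:02"
            return True
        if not (first_octet & 0b00000010):       # 3. not locally administered
            return True
        return False                             # 4. otherwise assume randomised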
4.5.2 Monitor-Client Mode Switching – Kernel modifications in the Raspberry Pi
The Raspberry Pi 3B(+) does not natively support monitor mode on its wireless chipset. A kernel patch
(Nexmon) bundled with a Linux distribution (Kali-Pi) allowed for this mode to be entered. The
Raspberry Pi could now be used for both scanning and data pre-processing. However, due to the
aftermarket nature of this modification, switching between client and monitor mode to allow data
transmission between scans was an arduous task. Instability such as packet loss and complete
disconnections from the network required for submitting scan reports would occur randomly. The only
way to maintain stability was to remove and re-add the wireless chipset driver from the kernel and tidy
up networking services while the script was running20.
4.5.3 Staying Alive - Redundancy in the Raspberry Pi
In the early stages of development, I found that the script would crash often. While many of the culprit
bugs for the crashes would be ironed out over the months, in the meantime I needed a way to guarantee
as best as possible that each deployed Raspberry Pi would stay alive and work as intended during the
scan and submit process. The script running the scan and processing data on the Pi is executed at boot
in a wrapper that force-restarts the device if any unhandled exception occurred. In the script itself, many
potential exceptions are handled such as failure switching wireless mode, failed data transmission, and
processing errors. All scan reports are kept on each device (up to a maximum of 1000) after generation
and backlogged if they failed to send for any reason. In one case, a device failed to transmit data for an
entire day due to an API bug, but once the connection was restored all reports were submitted
retroactively, and no data was lost. Finally, a system was developed to remotely deliver configuration
updates to each Pi uniquely, in case a parameter causes inaccuracies or errors.
16 https://regauth.standards.ieee.org/standards-ra-web/pub/view.html#registries
17 See 2.2.7
18 See appendix 1.18 for pseudocode
19 See 4.3
20 See appendix 1.3
4.5.4 Database/API Optimisation – Reducing query overhead via batch processing
Initially, all queries were naively processed at query-time. This was acceptable at first, as the queries
were for recent data and involved at most a few tens of report summaries and comparisons. As
visualisation complexity grew, each request involved summaries of weeks of data for each device,
comparisons between all days, historic highs and lows, etc. The processing time for these queries quickly
grew and meant page load times were as high as 7s, with each load consuming 4.5s of CPU time on the
server, and with hundreds of kilobytes of JSON transmitted. With compression and batch pre-
processing of large queries, page load times were reduced by 60%, CPU time by 83%, and the size of the
transmitted data by 86%. The trade-off was that batch processing, a heavy operation, had to be performed
at regular intervals to ensure freshness of data resulting in about 30s CPU time total each day – though
this would increase linearly as more devices are added to the system. These significant performance
improvements contribute greatly to the responsiveness of the dashboard on all platforms.
4.5.5 Heavy Data – Difficulties and success during high-traffic events
When a device was first deployed in a high-traffic area (SU Bar), a number of issues cropped up that I
had previously thought possible but was not prepared for. Firstly, the device suffered from overheating,
and the scan windows were picking up very little data due to thermal throttling of the processor and/or
chipset. To quickly resolve this, I mounted some coins onto the surface of the CPU. This remediated the
issue at first, but I eventually switched to dedicated aluminium heatsinks. However, now that the device
was no longer throttling and scans could be performed during high-intensity periods, the device stopped
reporting indefinitely. After scanning during a high-intensity event, the device generated reports larger
than the default payload limits in the Node.js API and would hang on trying to submit this backlogged
report after each scan interval. Adjusting the payload limit and trimming some unnecessary information
(such as randomised MAC addresses) from the reports resolved this issue. Luckily, all reports generated
in the meantime were still stored on the device, thanks to the redundancy described in 4.5.3, and no data
was lost while the issue was present.
5 DESIGN & IMPLEMENTATION SOLUTIONS
Motivated by the discussions of Figure 1, the four problem areas, and investigations performed in
Chapter 4, this chapter will discuss the final design and an overview of how each problem area was
addressed.
5.1 FINAL DESIGN DIAGRAM
The final architecture of the system as motivated by discussion and experiments:
Figure 2 – Final project architecture
5.2 ADDRESSING KEY PROBLEM AREAS
In this section, the key problem areas highlighted in 3.2 are revisited and solutions are presented to each
with reference to relevant experiments or research from Chapter 4.
5.2.1 Data Sensing & Processing – Measuring occupancy
The main concerns with sensing were:
1. Finding an abundant, wireless, and anonymous method of activity measurement
2. Testing and developing a suitable hardware solution for the above method
Wireless devices such as mobile phones, tablets and laptops are ubiquitous on university campuses. The
rate of ownership of these devices among students in the Netherlands, a similar demographic, is as high
as 96% [31]. Given the ubiquity of this signal and ease of scanning for public signals, Wi-Fi packets were
chosen as the data source for the project. Wireless packets are constantly being sent from devices, even
when Wi-Fi is disconnected or disabled in the OS21. Every wireless packet contains a publicly visible unique
device identifier, the MAC address, found in the Data Link layer22. By scanning for and filtering all
packets in an area using a device in monitor mode23, it is possible to compose a list of these unique
identifiers. Unfortunately, not every wireless device is guaranteed to be transmitting during a scan
window, and some devices may be transmitting with multiple randomised addresses at once24.
Addressing these two concerns: devices missed during a scan window can be minimised with a longer
scan window25; and randomised addresses can be filtered with some heuristics for a reasonably reliable26
count of devices in an area. Experiments and investigations around the use of Wi-Fi as a data source,
including which wireless spectrum to scan, were discussed in 4.2 and 4.3.
With respect to hardware, many small IoT devices (such as the ESP32) can filter packets and
extract MAC addresses, but after some testing27 and evaluation of previous experience, the Raspberry Pi
3 was chosen for sensing, pre-processing and transmitting collected data. Additional experiments were
performed to determine some scanning parameters such as scan length and power thresholds at which
to classify a device as “in the area”25. Finally, a script was developed to perform the scan procedure at
five-minute intervals. The process of a scan is:
1. Prepare the device environment by setting the correct date-time, runtime variables and more
2. Check for any remote configuration updates and sync any unsubmitted reports
3. Prepare the device for scanning by entering monitor mode25
4. Delegate Airodump-ng with scan parameters and allow to run for pre-defined scan length
5. Process the collected data into desired format28
6. Switch from monitor mode to client mode and verify internet connection
7. Synchronise any unsubmitted reports, backlogging any failed reports for next time
8. Remain in client mode until 30s before the next scan, upon which re-enter monitor mode
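A condensed, illustrative outline of this cycle is sketched below in Python; the commands and flags shown are standard aircrack-ng/Airodump-ng usage, but the structure is a simplification of the real script (no configuration sync, report backlog, or the driver workarounds described in 4.5.2):

    import csv, subprocess, time

    SCAN_LENGTH_S = 60   # pre-defined scan length (step 4)
    CYCLE_S = 300        # five-minute scan interval

    def scan_once():
        subprocess.run(["rm", "-f", "/tmp/scan-01.csv"])                      # clear previous output
        subprocess.run(["airmon-ng", "start", "wlan0"])                       # step 3: enter monitor mode
        subprocess.run(["timeout", str(SCAN_LENGTH_S), "airodump-ng", "wlan0mon",
                        "--write", "/tmp/scan", "--output-format", "csv"])    # step 4: scan
        with open("/tmp/scan-01.csv", newline="", errors="ignore") as f:
            rows = list(csv.reader(f))                                        # step 5: raw parse only
        subprocess.run(["airmon-ng", "stop", "wlan0mon"])                     # step 6: back to client mode
        return rows   # the real script builds, submits, and backlogs a report (step 7)

    while True:
        scan_once()
        time.sleep(CYCLE_S - SCAN_LENGTH_S)   # simplified step 8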
21 www.xda-developers.com/android-pie-gets-android-oreo-turn-on-wifi-automatically-feature/
22 See appendix 1.1
23 Operating on this mode, the wireless network card is able to capture all types of Wi-Fi Management
packets. https://www.acrylicwifi.com/en/blog/wifi-monitor-mode
24 See both Apple’s and Android’s documentation on randomisation: support.apple.com/en-us/HT201395
& https://source.android.com/devices/tech/connect/wifi-mac-randomization
25 See 4.2
26 See 4.3
27 See 4.1
28 See appendix 1.11
5.2.2 Data Storage
A database is required to store the data collected by deployed sensors. MongoDB, a NoSQL database,
was chosen as the solution for a few reasons:
1. The sensed data is non-relational and is stored and transmitted in JSON
2. Having JSON throughout the full pipeline simplifies design of a JSON-based API
3. JSON is compatible out-of-the-box with any JavaScript used on the front end.
4. I have previous experience setting up and working with MongoDB
The database needs to be accessible remotely from each sensor. This is achieved using Docker containers
deployed on the Computer Science Department server. Two containers are used: one for the Node.js
API; and the other for the MongoDB instance, and only the Node.js container can access the MongoDB
container. Node.js was chosen for the API and web server as it keeps the code for both in a single
workspace. MongoDB schemas in Node.js validate incoming reports29 from each device and integration
tests are run before each redeployment to ensure endpoint functionality30.
5.2.3 Data Analysis
Document-based databases (e.g. MongoDB) can be slow to query for and compare large amounts of data,
common operations during data analysis. To improve performance, two indexes were created over the
data collection: submission time; and the collecting sensor ID. Together, these reduce memory usage in
sorting and comparing31.
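For illustration, the two indexes can be created as shown below (a pymongo sketch; the database, collection, and field names are assumptions rather than the project's actual schema):

    from pymongo import MongoClient, ASCENDING, DESCENDING

    reports = MongoClient("mongodb://localhost:27017")["busyness"]["reports"]
    reports.create_index([("submittedAt", DESCENDING)])  # submission time
    reports.create_index([("sensorId", ASCENDING)])      # collecting sensor ID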
The main task of the analyses is to build a profile of a location in terms of typical activity levels
over time32 and to compare real-time data to historical33. Some aspects of profile-building require large
amounts of data, and so these operations are done as scheduled batch processes using CRON jobs rather
than on a per-query basis, though most visualised analyses will involve some processing on client-side.
One of the later additions to the front-end feature-set was to allow a user to query for predicted
future busyness levels in any monitored locations34. This is done by averaging any historical
measurements of the requested location and time and presenting it to the user as a predicted level of
busyness. The feature is simple with much room for improvement, but the basic concept allows a student
or management to plan ahead of time based on historical patterns without the need to go through the
data themselves.
At all stages it is important that the user interface and API remain responsive, as users would
frequently be accessing the visualisation platform or querying data for up-to-date information. Any
delay in information retrieval from the API could cause slowdowns in the user interface or cause the API
to return stale data. API and database performance metrics were continually evaluated in terms of
CPU usage and response time35 throughout development as data structures evolved.
Calculating Busyness
The busyness metric is relative and fit to a scale of 0.0–1.0: least busy to most busy. To obtain this
number for a specific location, the following steps are performed:
1. Gather all location reports over a period (~3 weeks) - the time range within which to relate
busyness
2. With a rolling average, smooth the unique device counts in the reports gathered36
3. Get the minimum and maximum measurements in the smoothed array
4. To filter noise and anomalies, take the three most recent readings (15 minutes) and generate a
weighted measurement using the following formula: x̄ = 0.15 ∗ x3 + 0.25 ∗ x2 + 0.50 ∗ x1,
where x1 is the most recent measurement
5. Take the weighted measurement and map it between the min/max obtained in (3.) for a relative
measure of busyness between 0.0 – 1.0
29 See appendix 1.11
30 See appendix 1.2
31 https://docs.mongodb.com/manual/core/query-optimization/
32 See appendix 1.4
33 See appendix 1.5
34 See appendix 1.6
35 See 4.5.4
36 See appendix 1.16
As you may have noticed, two important steps stand out during processing: smoothing and weighting.
Smoothing was performed on historical data (step 2) to reduce the effect of outliers in historical data, as
inaccurate extrema could greatly impact the reliability of busyness estimation. The weighted
measurements (step 4) favour the most recent reading to maintain focus on freshness of data and
accuracy of real-time fluctuations in busyness, but also includes two previous readings.
These two steps help smooth any anomalous readings that may be unrepresentative of the
location both in real-time and historical data. Both the averaged historical data and the unweighted
busyness measurement are still stored and can be requested from the API, with both the weighted and
unweighted busyness levels made available to users on the UI37.
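Pulling the five steps together, a compact Python sketch of the calculation follows (my own illustration; the rolling-average window and the clamping to 0.0 – 1.0 are assumptions about details not specified above):

    def relative_busyness(history, window=3):
        # history: chronological unique-device counts for one location (~3 weeks),
        # ending with the most recent reading; returns a value in 0.0 - 1.0.
        smoothed = [sum(history[max(0, i - window + 1):i + 1]) /
                    len(history[max(0, i - window + 1):i + 1])
                    for i in range(len(history))]                      # step 2
        lo, hi = min(smoothed), max(smoothed)                          # step 3
        x1, x2, x3 = history[-1], history[-2], history[-3]
        weighted = 0.50 * x1 + 0.25 * x2 + 0.15 * x3                   # step 4
        if hi == lo:
            return 0.0
        return min(1.0, max(0.0, (weighted - lo) / (hi - lo)))         # step 5

    print(round(relative_busyness([5, 8, 12, 20, 35, 40, 38, 30, 26, 24]), 2))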
5.2.4 Data Visualisation & Representation
The visualisation platform comprises two parts: a live interactive map; and a dashboard containing
analytics, the interactive map in a smaller format, and general information on the project.
A webpage was chosen as the platform, as it is universally accessible from desktops and mobiles and
required the minimum amount of ramp-up in terms of implementation. JavaScript frameworks for the
live map (Mapbox) and analytics/graphs (ChartJS) are used to create and display visualisations, with
the webpage layout being implemented without frameworks.
Prioritising usability and performance, the dashboard features a responsive design to ensure compatibility
with devices of all screen sizes, and care was taken to optimise API requests and data processing38.
Continuous testing throughout development of the UI through user feedback and my own personal real-
world usage of the dashboard on desktop and mobile heavily guided design decisions, such as the
placement of buttons and explanations of certain tools. I also followed best practices and optimisations
suggested by Google Chrome’s Lighthouse tool39.
Representing Busyness
An important role of the visualisation process is to decide on an intelligible representation of the
system’s measurements.
With regards to the live map, I decided to use a heatmap to display two key pieces of
information: the current measure of busyness as calculated in 5.2.3, and how the absolute number of
devices compares between each location. A heatmap provides a quick way to visualise and compare data
without the need for the user to get lost in the numbers. Importantly, however, the heatmap is not a
replacement for the data itself, and all measurements which drive the heatmap are available on the UI
also. The visualisation40, achieved with Mapbox, represents the busyness at that location through the intensity/colour of the heatmap and the absolute number of devices detected through the radius. This two-dimensional representation allows us to make statements such as: “Location X is busier than Location Y, but Y has far more devices”. Such a measurement might indicate that Location Y is physically larger than Location X, as for a location to have more devices but lower busyness suggests potentially a higher
37 See appendix 1.17
38 See 4.5.5
39 https://developers.google.com/web/tools/lighthouse
40 See appendix 1.11
maximum occupancy, or that the demographic which frequents Location Y has a greatly skewed device-to-person ratio. While this project does not attempt to interpret this type of data, presenting it to the user allows them to form their own analyses beyond the scope of the project.
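As an illustration of how these two values can drive the heatmap, the sketch below configures a Mapbox GL JS heatmap layer. The source name 'locations' and the busyness/devices feature properties are assumptions made for the example, and the radius stops are arbitrary; it is a sketch of the approach rather than the project's exact code.

import mapboxgl from 'mapbox-gl';

// Illustrative sketch: assumes a GeoJSON source named 'locations' whose point
// features carry `busyness` (0.0 - 1.0) and `devices` (absolute count) properties.
function addBusynessHeatmap(map: mapboxgl.Map): void {
  const layer: mapboxgl.AnyLayer = {
    id: 'busyness-heat',
    type: 'heatmap',
    source: 'locations',
    paint: {
      // Intensity/colour of the heatmap follows the relative busyness measure
      'heatmap-weight': ['get', 'busyness'],
      // Radius follows the absolute number of devices detected (stops are arbitrary)
      'heatmap-radius': ['interpolate', ['linear'], ['get', 'devices'], 0, 10, 100, 60],
    },
  };
  map.addLayer(layer);
}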
There are many ways to represent a measurement of busyness: a simple bar; a rating-out-of-x system; word categories; the number by itself; and more. With this project, I decided to use a categorical rating (word) to provide insight on busyness in an area, the impact of which I’ll discuss in the evaluation. I initially divided the 0.0 – 1.0 scale linearly in 0.2 increments. During testing, I found this caused large discrepancies between the system’s categorisation of an area and how I personally perceived it.
Following this, I changed the size of the increments to respect the intensity of extrema: 0.15 for each extremum, 0.2 for the closest categories, and a larger 0.3 interval at the centre. Skewing the
intervals rather than categorising linearly was the right direction for initial testing and provided a
reasonable estimation. I later discovered that human perception of busyness is much more complicated
than initially hoped, and the category intervals would be adjusted following surveys. This is detailed in
4.4 and discussed & evaluated in 5.1.1.
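A minimal sketch of this skewed categorisation, using the interval sizes just described and the five category labels that appear on the dashboard (see Appendix 1.22), might look as follows; it is an illustration rather than the project's exact code.

// Sketch of the skewed categorisation: 0.15 at each extreme, 0.2 for the
// neighbouring categories, and a wider 0.3 interval at the centre.
const CATEGORIES: { label: string; upper: number }[] = [
  { label: 'Very Quiet', upper: 0.15 },
  { label: 'Quiet',      upper: 0.35 },
  { label: 'Lively',     upper: 0.65 },
  { label: 'Busy',       upper: 0.85 },
  { label: 'Very Busy',  upper: 1.0 },
];

function categoriseBusyness(busyness: number): string {
  // Clamp to the 0.0-1.0 scale before looking up the category
  const value = Math.min(1, Math.max(0, busyness));
  const match = CATEGORIES.find((c) => value <= c.upper);
  return match ? match.label : CATEGORIES[CATEGORIES.length - 1].label;
}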
Finally, I feel it is important to mention any inspiration for the visualisations. Abedi et al.’s research
directly inspired some of the visualisations employed on my dashboard, such as their stacked area chart
[6, Fig. 5], and the busyness comparison chart41 was both practically and visually inspired by Google’s location busyness indicators42.
6 EVALUATION & DISCUSSION
Due to length constraints, please refer to Appendix 1.23 for the full end-to-end verification of system correctness.
6.1 SOFTWARE VERIFICATION & SYSTEM CORRECTNESS
6.1.1 Sensor Components
Verifying software which interacts with and depends on hardware is difficult. Due to this coupling, I
found it challenging to design formal tests around the scanning script used on the Raspberry Pi. Instead,
I added launch arguments to allow me to run the script in a ‘test’ mode on non-Linux devices and devices
without a wireless chipset so that I could verify the main data flow during development e.g. the data sync
process and report generation. Additionally, any key script parameters are stored in a config.json43 and synced remotely from the server at runtime; this allowed scan parameters to be tested and updated during deployment phases.
6.1.2 Server Components
Schemas on the server side44 are used to validate the data (reports) generated by the sensors to ensure data integrity. If any report fails validation, an HTTP 400 status code (Bad Request) is returned by the server. In this case, the sensor assumes the report is corrupt and moves it to a separate folder for reports that cannot be submitted to the server. I manually check this folder from time to time and have found some
rejected reports in the past which were corrupt due to a hardware error causing invalid data to be logged
in the report output.
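A simplified sketch of this validation step is shown below, assuming an Express route and the Mongoose-backed ReportSchema model used in the endpoint tests (Appendix 1.2); the import path is illustrative and error handling is trimmed for brevity.

import express from 'express';
import { ReportSchema } from './models/report'; // path assumed for illustration

const app = express();
app.use(express.json());

// Sensors POST their generated reports here; invalid reports receive HTTP 400.
app.post('/report', async (req, res) => {
  try {
    // Mongoose validates the report against the schema before persisting it
    const report = new ReportSchema(req.body);
    await report.save();
    res.status(200).send();
  } catch (err) {
    // Validation failure: the sensor treats the report as corrupt and sets it aside
    res.status(400).send();
  }
});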
41 See appendix 1.5
42 See appendix 1.13
43 See appendix 1.15
44 See appendix 1.11
To validate server behaviour, formal API testing was developed for some of the endpoints like
POST and GET45 report. These tests validate the interaction between the sensors and the API, and the
API and the database e.g. reports can be submitted and retrieved, incorrect reports produce the correct
errors, etc. TypeScript [21] was used from the outset on the server side to ensure I was writing safer
code. The endpoint tests were run before each deployment to ensure no core functionality had been
broken by any updates, and report data was frequently checked by hand for sanity. All endpoints were manually tested throughout the project using Postman. The server runs in a Docker container managed with Docker Compose [22] to ensure it stays alive.
6.1.3 User Interface Components
During development, I continually tested the front-end platform on all devices and evaluated its
performance using Google Chrome’s Lighthouse Tool46. These tests verified that the user interface was
functional and responsive (both in performance and responsive design terms) on smaller screens such
as mobile devices, as well as much larger content display screens. The UX/UI was additionally validated
through user testing and feedback to ensure clear and concise design, without sacrificing functionality.
Many informal user testing trials were performed to evaluate the effectiveness of the dashboard layout,
readability, and presentation of visualised data. The effectiveness of the presentation and estimation is
further discussed in 5.2.2.
6.2 SYSTEM EFFECTIVENESS & DISCUSSION
To evaluate the system’s effectiveness, I surveyed both my own and others’ impressions of busyness in the monitored areas to determine accuracy. I also spoke to the Events Officer of the Maynooth Students’ Union for feedback on the project’s deployment in the SU Bar to identify potential applications and improvements. During deployment, I checked monitored areas in person to see if the data being displayed on the dashboard represented the real world accurately47; this also serves as a test of the system’s correctness and data flow. At the time of observation, busyness was tightly coupled with the idea of relative occupancy, so my evaluations were positive. I believe that with the understanding gained during this project I would now evaluate the accuracy differently, the reasons for which follow in 5.2.2.
6.2.1 User Feedback MSU Events Officer
A sensor was deployed in the Students Union bar as a trial run of the project in an uncontrolled
environment. Additionally, this gave the opportunity to evaluate the system as a product. Speaking to
the MSU Events Officer (henceforth User) after a few weeks of deployment, I was able to identify several
uses for the project, as well as some improvements which could be made going forward.
Overall, the User found the data produced to be useful in resource management/allocation, and for use in data-driven discussions. In particular, the User would like to have more areas monitored, such
as meeting rooms. It was suggested that this system could integrate with existing resource management
systems such as their room booking tool to identify rooms that have been booked but unused, or rooms
which are incorrectly allocated e.g. large rooms being mostly used for small meetings. The User
highlighted the system’s potential use in evaluating the effectiveness of campaigns by measuring
numbers gathering at campaign events. This would extend the system’s use beyond measuring busyness.
The User expressed a general interest in assisting with the development and trialling of the
project as a product and made a few suggestions in this regard. The visualisations on the front end are
limited, and a finer-grained breakdown of the data or added interactivity for power-users would benefit
45 See appendix 1.2
46 See appendix 1.20
47 See appendix 1.22
the User greatly. Personally, I feel this was a result of having a single dashboard for both end-users
(management and students). Developing for both, I needed to be careful in keeping the interface
informative yet accessible to both power and casual users. An additional suggestion was to have a second metric or form of reporting, particularly when data anomalies occur. The User suggested that a photograph be taken during high-busyness periods to help them understand where the anomalous data was coming from. The User also made an important point regarding this second report,
in that it acts to build trust with users, providing a justification for the data. Of course, taking pictures
would open an entirely new discussion on privacy concerns, but the greater point of building trust
between the system and its users is important to note.
In summary, the User was happy with the trial and is interested in future iterations of the system.
6.2.2 Survey of Estimation Accuracy & Discussion of Busyness Perception
I created an additional webpage, similar to the busyness poll, to test current readings from the system against
people’s impressions of busyness in each location. Each participant was presented with a page asking
them to rate the current busyness in the area using a slider. The test was run for a short time before a
pattern emerged.
The data I collected48 was unexpected and indicated that how we perceive busyness is more complicated than first thought. While my estimations aligned with the ratings provided by the system, the participants’ ratings did not; I had become accustomed to how the system measured and represented busyness. I found that users consistently rated busyness higher than the system in the SU Bar and lower than the system in the Final Year Lab. Speaking to participants directly, I found that factors beyond just room size and occupancy impacted their perception of busyness. The most cited reason for discrepant ratings was the noise level in an area, which seems obvious in hindsight, as a silent room at maximum occupancy is going to feel a lot less busy than the same room at half occupancy with a loud noise level. I believe that additional metrics such as noise or light levels in an area would greatly improve busyness estimation.
The discrepancies recorded also allude to how non-linear our perception of busyness can be;
perceived busyness increases rapidly as the number of people grows from zero. As mentioned in 5.2.4, I
moved from a linear busyness categorisation to a more bell-curved shape. Despite basing the busyness
intervals on the poll results49, these did not seem representative of people’s perception in the real world.
I believe that that the poll may have been evaluating people’s impression of occupancy/crowding rather
than busyness, and that the intervals generated by the survey were not appropriate. While I couldn’t
collect enough data to confirm this assumption, from discussions with participants it seems the category
intervals need to be skewed towards the upper extreme:
very quiet (0 – 0.10), calm (0.10 – 0.25), comfortable (0.25 – 0.45), bustling (0.45 – 0.75), hectic (0.75 – 1.00)
I speculate that participants lacked contextual clues to rate the implication of each word, as they
were not physically present in an environment representative of each surveyed level of busyness. As a
result, participants may have had to fall back on their understanding of busyness, potentially driven by
an imagination of occupancy. If this is the case, then my system will be inaccurate for the same reason: it is estimating busyness with only a single metric. This goes back to my hopes in the introduction: that contextual clues to perceived busyness would be implicit in the data I collect. It seems this is not the case, though occupancy can still be a good measure, as suggested by other research on perceived crowdedness [35], but for this project the scale must be skewed. Skewing the categorisation scale might provide greater utility too, as data will be classified both in terms of what’s closer to our perception of busyness and an estimation of occupancy level, useful for students and management respectively. Biasing this
48 See appendix 1.19
49 See appendix 1.10
scale could also result in over-estimating the level of perceived busyness. As this system potentially
directly impacts the safety and well-being of students, false positives are preferred to false negatives.
6.3 EVALUATION SUMMARY & PROJECT METRICS
In total, nearly 300 commits were made to the project, contributing to ~4,000 executable lines of code.
Many commits were in blocks, consisting of a new feature and fixes for bugs introduced by that feature.
The top three languages used were Python (38.9%), TypeScript (26.1%), and JavaScript (20.9%). The
correctness of the system has been sufficiently validated. Code relating to various components was shown, the results of testing were discussed, and there is a verified flow of data from sensor to server and from server to the front-end platform. Again, please refer to Appendix 1.23 for a full system verification. The effectiveness of the system was evaluated, but not conclusively verified. Survey results and discussions with users of the system show that more work is required to represent the busyness of a location in a way that aligns more closely with human-perceived busyness, but the data required is likely already being collected.
7 CONCLUSION
Over the project’s duration, I was able to come to several significant conclusions:
1. Busyness is a valuable metric to the parties outlined in the motivation (students & management)
2. People sense busyness primarily as a feeling; some verbal descriptors struggle to convey this
3. There is a utilisable correlation between the number of devices and people in an area
4. A relative measure of busyness is only accurate when the captured data includes occupancy extrema for a location, i.e. during quieter weeks, busyness estimation will be more sensitive and skewed
5. Additional relative metrics such as ambient noise levels could improve estimation effectiveness
6. It is possible to build a low-cost, non-participatory solution to busyness estimation
7.1 LIMITATIONS OF APPROACH & THREATS TO VALIDITY
There are several limitations of the approach taken which threaten the validity of the above conclusions.
1. The use of device-person correlation only works if the ratio remains consistent in an area. For
example, if a location has an emergent device-person ratio of 2:1, then the system implicitly
treats a user with only 1 device as less impactful on the level of busyness despite being an
additional person in the location. Inconsistent ratios can result in inflation and undercounting
2. By using only a single metric for busyness, the system does not consider additional contextual
clues which may impact our human perception of busyness (such as noise)
3. My experiments & investigations into the behaviour of antennae, wireless chipsets, and signal
propagation were very limited, and any assumptions made around these behaviours and
parameters may have negatively impacted the accuracy of measurements
4. The decision to represent busyness categorically (on a word scale) may have been detrimental
to busyness estimation/presentation accuracy, as discussed in 5.2. Alternative representations
could have been a continuous form such as a bar or keeping a discrete but wordless x-out-of-y
system e.g. using stars or other symbols
5. By opting for a non-participatory approach to sensing, I could avoid potential GDPR issues. However, anonymous systems tend to be more limited in accuracy as they can’t provide an exact person count. Additionally, care must be taken when developing such systems. For example, it
was ruled that Google overstepped the mark in their data collection for Google Street View [36]
7.2 FUTURE WORK
Given the large corpus of existing people-tracking research and products, there are many directions in
which to take future work. Additionally, commercial applications [10] of similar people-tracking systems are popular in cities and transport, and particularly in shopping outlets where human behaviour can be harnessed directly to increase sales [37].
In terms of expanding the current implementation, I would like to increase the coverage of the
system to the rest of the campus buildings, the library especially. A similar project/product is Waitz [38],
a busyness monitoring tool for students and staff at USC. Personally, this would be my ideal future vision
of this project in all aspects (UI, coverage, utility, etc.). In discussions with peers, I found that the Maynooth University gym and library were the two most suggested areas to cover with busyness monitoring; these would be the next targets.
The use of people-tracking and busyness monitoring in strategic planning is highly valuable.
Being able to quickly evaluate historical busyness levels with a non-participatory measurement system
provides a data-driven argument for strategic event planners when justifying administrative decisions
(e.g. increased security, footfall improvement, etc.). I would like to expand the existing dashboard toolset
to include report generation for administration purposes, providing deeper insights into the profile of a
location such as time spent in a location, well researched by Abedi et al. [6].
Finally, I would like to improve the accuracy of busyness estimation. By spending more time researching the technical and psychological aspects of this project, such as signal propagation and human-perceived busyness, it could be possible to achieve a much more reliable measurement and representation of a location’s busyness. Additional metrics such as light and noise levels to complement the existing device-count metric would also generate more data to compare and correlate, and likely increase the accuracy of the system.
7.3 PERSONAL CLOSING COMMENTS
This project asked a question: “Can you tell how busy an area is just from the number of devices in that
area?”. As an investigation, whether the answer was yes or no, it would have been successful. However,
I found that it quickly grew from an investigation to somewhat of a passion-project, and I thoroughly
enjoyed working on all aspects of implementation, even the cryptic Linux kernel errors.
Over the course of the project I improved many skills. In terms of the non-technical aspects, the
first which comes to mind is report writing. This thesis is by far the largest and most academically
involved document I’ve written, so learning how to organise my thoughts over such a large space has
been challenging but immensely rewarding. Learning to strictly manage my deadlines relating to the
thesis has also been a valuable learning experience. Technologically, I’ve become much more familiar
with Linux and Raspberry Pis, and my understanding of wireless technologies has improved. While I
already had reasonable experience with web development, my skills with CSS, JavaScript and TypeScript
have been further developed and complemented with additional experience with Node.js and MongoDB.
As my first ‘real’ research project, it was hard not to become attached to the idea that the answer
to that original question would be “yes”. I was striving for a working product, focusing on the end-user
experience, but I also needed to validate my decisions and assumptions both theoretically and
empirically. These two ideals did not coexist peacefully, especially when the data didn’t match my
expectations. At those moments, remembering the point of the project was important: to answer that
original question. Despite data from the busyness survey suggesting that the system was lacking
important contextual clues to estimate more accurately, I believe this can be overcome with better
heuristics. Perhaps time spent in a location is an important factor, as lots of people moving through an
area could make it seem even busier (motion seems to increase perceived busyness), or there is a better
set of intervals or categories to use for busyness visualisation. So in closing, can you tell how busy an
area is just from the number of devices in that area? I would say yes, but as always, there’s more work
to be done to perfect the system within its limitations.
1 APPENDIX
The appendix is quite lengthy, so I have provided a brief overview of its contents here. Of course, the full context of each item is found in the relevant sections of the main body where it is referenced. Due to the amount of source code for the project, it has been excluded from the appendix and can instead be found in the accompanying zip file.
APPENDIX CONTENTS
1. The data format for a generic 802.11 frame, showing the MAC addresses in Address 1-3
2. An example of an automated endpoint test used on the API as part of verification (TypeScript)
3. The script used to switch the RPi wireless chipset between monitor and client mode (Python)
4. Front-end visualisation of each day’s busyness data for comparison between days & times
5. Front-end visualisation of current day’s busyness by the hour, compared to historical patterns
6. Front-end tool to allow users to query for predicted busyness levels by location and time
7. Busyness Poll slider (V1) used by participants to indicate perceived intensity of each word
8. Busyness Poll slider (V2) updated to reduce potential bias caused by colour choice in V1
9. Busyness Poll data visualised as a ridgeline plot with mean and standard deviation
10. Busyness Poll raw data unprocessed
11. JSON schema used to verify the format of reports generated by RPis and stored in MongoDB
12. Front-end live heatmap display of Campus with real-time sensor data (post-processing)
13. Front-end inspiration for historical comparison in Appx. 1.5
14. Graph of devices measured vs. people counted during experiment – correlation of 0.67
15. Example of a sensor configuration file which can be updated remotely on a per-sensor basis
16. Algorithm used to perform a simple moving average on historical busyness data (TypeScript)
17. Front-end view showing both real/weighted busyness measurements and min/max to the user
18. Pseudocode algorithm used when filtering sensor data for ‘randomised’ MACs (Python)
19. The raw data collected during system accuracy evaluation, showing System against User rating
20. Lighthouse benchmark of front-end platform showing high performance and accessibility
21. Pros and Cons evaluated during decision process of sensor hardware related to 4.1.
22. Some personal observations of system accuracy before busyness was evaluated as a sense
23. Full demonstration of end-to-end system correctness, following sensed data through all stages
24. Configuration file used for Docker deployment of both the MongoDB and Node.js containers
1.1 802.11 GENERIC MAC FRAME
https://en.wikipedia.org/wiki/802.11_Frame_Types#/media/File:802.11_frame.png (Buhadram, CC BY-SA 4.0)
1.2 SAMPLE ENDPOINT TEST
Tests the scan report retrieval endpoint used as a basis for many analyses
/*
* Test the /GET route
*/
describe('/GET_report_range', () => {
// Test Report Data
let report: any = {
...
};
it('should receive 404 if no reports found', async (done) => {
// Send request for report and store result
const api = `/report/range?device=${testDevices[0]}&start=${new Date(yesterday).toISOString()}`;
const result = await request(app).get(api).send();
// Verify empty body and correct HTTP status
expect(result.status).toEqual(404);
expect(result.body).toMatchObject({});
done();
});
it('should receive a single report for one device', async (done) => {
// Customise the test report
let insert1 = report;
insert1.summary.device = testDevices[0];
insert1.summary.time = Date.now() - 1000
// Add it to the database
await new ReportSchema(insert1).save();
// Send request for report and store result
const api = `/report/range?device=${testDevices[0]}&start=${new Date(yesterday).toISOString()}`;
const response = await request(app).get(api).send();
// Verify test report is returned from the API.
expect(response.body).toHaveProperty(testDevices[0]);
expect(response.body[testDevices[0]].length).toStrictEqual(1);
done();
});
});
1.3 SCRIPT FOR SWITCHING TO CLIENT MODE ON THE RASPBERRY PI
def set_client():
"""
Sets the Raspberry Pi into Client Mode for internet access (wlan0)
"""
print("\n---- Switching to Client ----\n")
client_interface = interface
if "mon" in client_interface:
client_interface = interface[:-3]
try:
proc.check_output(["ping", "-q", "-c", "1", "-W", "1", "google.com"])
print(f"*** Connected! ***")
except proc.CalledProcessError:
print("-> No existing connection")
print("-> Taking down wifi")
proc.call(["sudo", "airmon-ng", "stop", interface], stdout=proc.DEVNULL)
proc.call(["sudo", "ifconfig", client_interface, "down"])
print("-> Re-registering wlan driver")
proc.call(["sudo", "modprobe", "-r", "brcmfmac"])
proc.call(["sudo", "modprobe", "brcmfmac"])
print("-> Asserting mac address")
proc.call(["sudo", "ifconfig", client_interface, "down"])
proc.call(["sudo", "macchanger", client_interface, "-p"])
print("-> Starting WiFi")
proc.call(["sudo", "ifconfig", client_interface, "up"])
conn_tries = 1
max_tries = 5
while conn_tries <= 3 and get_time_till_next_5_minute() > 75:
print("-> Restarting Networking Service")
proc.call(["sudo", "systemctl", "restart", "networking"])
print(f"-> Done, waiting {conn_tries * max_tries}s to establish connection")
time.sleep(conn_tries * max_tries)
print("-> Running dhclient")
proc.call(["sudo", "dhclient", "-r"], stdout=proc.DEVNULL)
proc.call(["sudo", "dhclient", client_interface])
try:
print("-> Testing network connection")
ping_out = proc.check_output(["ping", "-I", client_interface, "www.google.com", "-c", "3"])
if "ms" not in str(ping_out):
print(f"/// Failed to connect after setting client ({conn_tries}/{max_tries}) ///\n")
conn_tries += 1
else:
print(f"*** Connected! ***")
break
except proc.CalledProcessError:
traceback.print_exc()
print(f"/// Failed to connect after setting client ({conn_tries}/{max_tries})
///\n")
conn_tries += 1
1.4 VISUALISATION TYPICAL ACTIVITY LEVELS OF A LOCATION
1.5 VISUALISATION COMPARISON BETWEEN HISTORICAL & CURRENT DATA
1.6 VISUALISATION FUTURE PREDICTIONS THROUGH USER QUERIES
1.7 BUSYNESS POLL ORIGINAL SLIDER GRADIENT
1.8 BUSYNESS POLL UPDATED SLIDER GRADIENT
1.9 BUSYNESS POLL INFOGRAPHIC OF WORD INTENSITIES
1.10 BUSYNESS POLL THE RESPONSES
word          mean    std     responses
dead          3.29    8.44    21
empty         4.95    9.64    21
very quiet    6.55    8.66    20
quiet         19.14   12.4    21
still         19.24   15.99   21
calm          22.71   8.7     21
comfortable   40.95   17.41   21
astir         52.38   17.26   16
humming       54.79   17.6    19
dynamic       60.89   11.7    19
vibrant       64.75   9.43    20
lively        65.71   10.2    21
hopping       65.79   20.39   19
busy          68.65   12.58   20
buzzing       72.05   15.07   20
bustling      78.05   13.67   19
loud          78.6    13.7    20
very busy     85.05   11.59   21
packed        87.38   10.58   21
hectic        91.26   9.58    19
1.11 REPORT FORMAT GENERATED ON SENSORS AND STORED IN DATABASE
{ summary: {
time: Number,
device: String,
randomised: Number,
unique: Number,
power_limit: Number,
measure_temp: Number,
scan_length: Number,
channel_hop_frequency: Number,
top_devices: [{manufacturer: String, count: Number }]
}, clients: [{
client: String,
association: String,
power: Number,
is_randomised: Boolean
}]}
1.12 MAPBOX DISPLAY OF BUSYNESS USING HEATMAPS
1.13 GOOGLE’S BUSYNESS INDICATOR
1.14 DETECTED DEVICE COUNT AGAINST PEOPLE IN AN AREA
1.15 CONFIG.JSON FORMAT TO ALLOW REMOTE UPDATES OF SCAN PARAMS
A sample configuration
{
"scan_length": 60,
"channel_hop_frequency": 1000,
"power_limit": -80,
"interface": "wlan0",
"endpoint": "http://weather.cs.nuim.ie/server/ashmore/a/report",
"updated": 0
}
1.16 SIMPLE MOVING AVERAGE (TYPESCRIPT) SMOOTHING BUSYNESS
/**
* Performs a simple moving average over input dataset
* https://en.wikipedia.org/wiki/Moving_average#Simple_moving_average
* @param dataset number array of input data on which to perform moving average
* @param movingRange the range over which to average data e.g. 3/5/7/... values
*/
private movingAverage(dataset: number[], movingRange: number) {
let rollingValues: number[] = [];
rollingValues.length = movingRange;
rollingValues.fill(0);
const average = [];
for (const [i, point] of dataset.entries()) {
rollingValues[i % movingRange] = point;
if (i % movingRange === 0 && i !== 0) {
average.push(rollingValues.reduce((a, b) => a + b, 0) / movingRange);
}
}
return average;
}
[Chart belonging to Appendix 1.14: “Devices Detected vs People” – device and people counts plotted over measurements taken over time]
1.17 PRESENTATION OF REAL-TIME AND WEIGHTED BUSYNESS READINGS
1.18 PSEUDOCODE FOR MAC ADDRESS RANDOMISATION FILTERING
def is_randomised(device):
if device.is_associated:
return False
if device.mac.has_manufacturer_prefix:
return False
if not device.mac.is_locally_administered:
return False
return True
1.19 DATA COLLECTED DURING SYSTEM ACCURACY POLL
Final Year Lab
System      User     Error
0.490361    0.22     -0.270361
0.418023    0.217    -0.201023
0.418023    0.237    -0.181023
0.455814    0.278    -0.177814
0.340361    0.172    -0.168361
0.286145    0.223    -0.063145
0.238166    0.18     -0.058166
0.337278    0.142    -0.195278
0.301829    0.067    -0.234829
0.221893    0.302    -0.08011
0.261628    0.177    0.084628
0.261628    0.12     0.141628

SU Bar
System      User     Error
0.519784    0.579    0.0592158
0.341945    0.511    0.1690547
0.341945    0.687    0.3450547
0.341945    0.553    0.2110547
0.363133    0.579    0.2158671
0.404272    0.565    0.1607278
0.378165    0.623    0.2448354
0.376187    0.635    0.2588133
0.368705    0.623    0.254295
0.486702    0.594    0.1072979
0.486702    0.521    0.0342979
1.20 CHROME LIGHTHOUSE PERFORMANCE AND ACCESSIBILITY AUDIT
1.21 PROS & CONS OF ESP32 AND GENERIC LINUX CHIPSET TESTING
This section refers to 4.1 in the main body. The following are the pros and cons of each choice of
device/setup in sensing.
1.21.1 Linux + Airodump-ng
Pros
- Pre-written tools to scan MAC addresses
- Works with any compatible chipset.
o Raspberry Pi with a kernel modification can run client/monitor mode
o If a more performant Wi-Fi chipset is needed, the Raspberry Pi has USB support for
external devices like the Alfa used in testing
- Can run from CLI - writes output file that other programs could read and send to server
- Easy to debug over SSH if required
- Can test toolset/scripts easily on any Unix OS
- Has simultaneous client and monitor mode network access with support for 802.1x
Cons
- Requires more power-hungry hardware, limiting deployment options
- Script is pre-written and may have limitations
- Added complexity of running an OS underneath the sensing script.
- The script must run on boot and only end when requested (the ESP32 does this by default)
1.21.2 ESP32 + Arduino/C
Pros
- Specify behaviour as desired, all code written from scratch
- Device can run on battery as it is low powered
- Very small and portable
- Boots immediately into program in memory, no need to schedule auto start for script
Cons
- Very few code examples and little documentation
- Personal unfamiliarity with C and Arduino devices. Code may not be robust or reliable
- Software and hardware performance are poor, lower than any device capable of running Linux
- Acquiring network access will require tricky redirection or certs, as there is no support for 802.1x
1.22 PERSONAL OBSERVATIONS OF BUSYNESS DURING DEPLOYMENT
This evaluation was performed at a time when it was considered that busyness ≈ occupancy, which is why there are discrepancies between my evaluation of the system and that of participants in the busyness accuracy survey.
System Classification   Personal Evaluation   Observation Notes
Very Quiet              Accurate              Most tables vacant, no people standing
Quiet                   Accurate              Many vacant tables, plentiful standing space
Lively                  Mostly accurate       Feels a little quieter than lively, almost "quiet" (readings to this classification)
Busy                    Accurate              Most tables filled, some empty, lots of standing space
Very Busy               Accurate              All tables filled, very little standing space
1.23 DEMONSTRATION OF END-TO-END SYSTEM CORRECTNESS
Due to length constraints, the full demonstration of system correctness is presented here in the appendix. This section
will demonstrate the flow of data through the system and validate its correctness. Please refer to Figure
2 in the main body for an overview of the system.
On the sensor, two files are generated. First, the output from Airodump-ng is written to a file
for parsing later. Note the highlighted MAC addresses; these will be followed through the system!
Of the three highlighted addresses, the device with RSSI -50dBm (the last one) will be filtered as it is randomised. The reason it was determined to be randomised is that its 2nd character is “A”, it wasn’t associated with a network, and its 3-byte prefix was not found in the OUI reservation list. This is discussed in 4.5.1. You might wonder why the first device was not considered randomised despite not being associated with a network and its second character being “C” (seemingly locally administered). As it happens, "5C:C5:D4" is the OUI reservation for "Intel Corporate"; this is the rule which saved it from being filtered. This is a good example of the MAC address filtering algorithm in action (Appendix 1.18).
Above is the section of the generated report that contains the highlighted addresses from
Airodump-ng’s output. As you can see, the device with -50dBm RSSI has been filtered due to
randomisation. In addition, the report includes the MAC addresses of both clients and APs in a hashed
format. This will allow comparison and filtering if needed in the future but has destroyed the original
MAC Address for privacy.
This is the summary section of the report; the stored client data is not currently used for anything, but the summary is used during data analysis and live visualisations. This report.json is now submitted to the API.
Now that the report has been submitted to the API, it will be validated by the database schema
and if valid will be stored in the database. By utilising the GET_reports_by_range endpoint, we can
find the report submitted at timestamp 1583346320601.
This is proof that the report as generated has been submitted to the server and is query-able. It can now
be used in analysis and live reporting.
Refreshing the front-end platform, we can see in the networking tab that the busyness profile
for the Final Year Lab was requested. We can see that the weighted busyness rating of 13.65 included
the latest report generated with a measurement of 9 unique devices. We can also cross-reference the id
and time of the latest report to verify that this is indeed the report that we have been following through
the system. The network information also tells us some other properties of the profile, such as the min
and max device counts, and the range analysed to obtain the min/max counts.
This concludes the journey of data from source to visualisation. The report is now saved in the
database, ready to be analysed in the future, and will contribute to the profile of the Final Year Lab for the next few weeks (depending on the analysis window).
1.24 DOCKER COMPOSE CONFIGURATION
This is the configuration used to spin up both the API and the MongoDB container on the department
server.
version: '3'
services:
mongo:
image: mongo
restart: always
command: mongod --port 5050
ports:
- "5050:5050"
volumes:
- mongodata:/data/db
environment:
MONGO_INITDB_ROOT_USERNAME: xxxxxxxxxxxxx
MONGO_INITDB_ROOT_PASSWORD: xxxxxxxxxxxxx
web:
build: ./node-api
restart: always
depends_on:
- mongo
ports:
- "5010:5010"
volumes:
mongodata:
driver: local
2 REFERENCES
[1] Maynooth University, “Maynooth at a glance | Maynooth University.” [Online]. Available:
https://www.maynoothuniversity.ie/about-us/maynooth-glance. [Accessed: 17-Mar-2020].
[2] C. Department of Education and Skills, “Education - CSO - Central Statistics Office.” [Online].
Available: https://www.cso.ie/en/releasesandpublications/ep/p-
mip/measuringirelandsprogress2017/ed/. [Accessed: 17-Mar-2020].
[3] E. Vattapparamban, “People Counting and occupancy Monitoring using WiFi Probe Requests
and Unmanned Aerial Vehicles,” FIU Electron. Theses Diss., 2016, doi: 10.25148/etd.FIDC000246.
[4] N. Abedi, A. Bhaskar, and E. Chung, “Bluetooth and Wi-Fi MAC address based crowd data
collection and monitoring: Benefits, challenges and enhancement,” in Australasian Transport
Research Forum, ATRF 2013 - Proceedings, 2013.
[5] S. B. Azmy, N. Zorba, and H. S. Hassanein, “Quality of Coverage: A Novel Approach to
Coverage for Mobile Crowd Sensing Systems,” 2018 Glob. Inf. Infrastruct. Netw. Symp. GIIS 2018, pp.
1–5, 2019, doi: 10.1109/GIIS.2018.8635769.
[6] N. Abedi, A. Bhaskar, and E. Chung, “Tracking spatio-temporal movement of human in terms of space utilization using Media-Access-Control address data,” Appl. Geogr., vol. 51, pp. 72–81, 2014, doi: 10.1016/j.apgeog.2014.04.001.
[7] A. Baum and G. E. Davis, “Spatial and social aspects of crowding perception,” Environ. Behav., vol. 8, no. 4, pp. 527–544, 1976, doi: 10.1177/001391657684003.
[8] R. Schenström and E. Hörnlund, “Indoor Location Surveillance Utilizing Wi-Fi and Bluetooth
Signals,” 2019.
[9] L. Bai, N. Ireson, S. Mazumdar, and F. Ciravegna, “Lessons learned using Wi-Fi and bluetooth as means to monitor public service usage,” in UbiComp/ISWC 2017 - Adjunct Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, 2017, pp. 432–440, doi: 10.1145/3123024.3124417.
[10] SensMax, “People counting system for shopping malls and smart buildings.” [Online].
Available: https://sensmax.eu/solutions/people-counting-system-for-shopping-malls-and-smart-
buildings/. [Accessed: 12-Feb-2020].
[11] B. Brindle, “How Does Google Maps Predict Traffic? | HowStuffWorks,” 11-Feb-2020.
[Online]. Available: https://electronics.howstuffworks.com/how-does-google-maps-predict-
traffic.htm. [Accessed: 12-Feb-2020].
[12] X. Liu, P. H. Tu, J. Rittscher, A. Perera, and N. Krahnstoever, “Detecting and counting people in surveillance applications,” IEEE Int. Conf. Adv. Video Signal Based Surveill. - Proc. AVSS 2005, vol. 2005, pp. 306–311, 2005, doi: 10.1109/AVSS.2005.1577286.
[13] M. A. K. Sagun and B. Bolat, “A novel approach for people counting and tracking from crowd video,” Proc. - 2017 IEEE Int. Conf. Innov. Intell. Syst. Appl. INISTA 2017, no. July, pp. 277–281, 2017, doi: 10.1109/INISTA.2017.8001170.
[14] J. Li, L. Huang, and C. Liu, “Robust people counting in video surveillance: Dataset and system,” 2011 8th IEEE Int. Conf. Adv. Video Signal Based Surveillance, AVSS 2011, pp. 54–59, 2011, doi: 10.1109/AVSS.2011.6027294.
[15] Y. Mao, J. Tong, and W. Xiang, “Estimation of crowd density using multi-local features and regression,” Proc. World Congr. Intell. Control Autom., pp. 6295–6300, 2010, doi: 10.1109/WCICA.2010.5554367.
[16] K. Akkaya, I. Guvenc, R. Aygun, N. Pala, and A. Kadri, “IoT-based occupancy monitoring techniques for energy-efficient smart buildings,” 2015 IEEE Wirel. Commun. Netw. Conf. Work. WCNCW 2015, pp. 58–63, 2015, doi: 10.1109/WCNCW.2015.7122529.
[17] A. Fod, A. Howard, and M. J. Matarić, “A laser-based people tracker,” Proc. - IEEE Int. Conf. Robot. Autom., vol. 3, no. May, pp. 3024–3029, 2002, doi: 10.1109/robot.2002.1013691.
[18] T. D. Räty, “Survey on contemporary remote surveillance systems for public safety,” IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 40, no. 5, pp. 493–515, 2010, doi: 10.1109/TSMCC.2010.2042446.
[19] I. Guvenc, “Enhancements to RSS Based Indoor Tracking Systems Using Kalman Filters,” IEEE Pervasive Comput., no. 505, pp. 91–102, 2003.
[20] O. G. Adewumi, K. Djouani, and A. M. Kurien, “RSSI based indoor and outdoor distance estimation for localization in WSN,” Proc. IEEE Int. Conf. Ind. Technol., pp. 1534–1539, 2013, doi: 10.1109/ICIT.2013.6505900.
[21] Microsoft, “TypeScript in 5 minutes · TypeScript.” [Online]. Available:
https://www.typescriptlang.org/docs/handbook/typescript-in-5-minutes.html. [Accessed: 07-Feb-
2020].
[22] Docker Inc., “Compose file version 3 reference | Docker Documentation.” [Online]. Available:
https://docs.docker.com/compose/. [Accessed: 07-Feb-2020].
[23] Z. Liew, “Endpoint testing with Jest and Supertest | Zell Liew,” 2019. [Online]. Available:
https://zellwk.com/blog/endpoint-testing/. [Accessed: 07-Feb-2020].
[24] Ł. Podkalicki, “ESP32 - WiFi Sniffer | Łukasz Podkalicki,” 23-Jan-2017. [Online]. Available:
https://blog.podkalicki.com/esp32-wifi-sniffer/. [Accessed: 07-Feb-2020].
[25] N. Darchis, “802.11 frames: A starter guide to learn wireless sniffer traces,” 25-Oct-2010.
[Online]. Available: https://community.cisco.com/t5/wireless-mobility-documents/802-11-frames-a-
starter-guide-to-learn-wireless-sniffer-traces/ta-p/3110019. [Accessed: 12-Feb-2020].
[26] Aircrack-ng, “airodump-ng [Aircrack-ng].” [Online]. Available: https://www.aircrack-
ng.org/doku.php?id=airodump-ng. [Accessed: 07-Feb-2020].
[27] K. Kinzie, “How To Enable Monitor Mode & Packet Injection on the Raspberry Pi,” 15-Dec-
2018. [Online]. Available: https://null-byte.wonderhowto.com/how-to/enable-monitor-mode-packet-
injection-raspberry-pi-0189378/. [Accessed: 07-Feb-2020].
[28] Johannes, “MAC Address Randomization on iOS,” 18-Feb-2019. [Online]. Available:
https://www.turais.de/mac-address-randomization-on-ios-
12/?fbclid=IwAR0B24Ktyfrzm2wzc6XnvUok-oZIN19pxMLpqANxvRFTVyNZF1TrPq17LhU.
[Accessed: 07-Feb-2020].
[29] I. T. Standards et al., “Standard Group MAC Addresses: A Tutorial Guide,” vol. 10039, no. Llc,
pp. 14.
[30] Privacy Company, “What does the GDPR say about WiFi tracking?,” 2019. [Online]. Available:
https://www.privacycompany.eu/blogpost-en/what-does-the-gdpr-say-about-wifi-tracking.
[Accessed: 03-Feb-2020].
[31] M. B. W. Kobus, P. Rietveld, and J. N. Van Ommeren, “Ownership versus on-campus use of mobile IT devices by university students,” Comput. Educ., vol. 68, pp. 29–41, 2013, doi: 10.1016/j.compedu.2013.04.003.
[32] T. Mitchell, S. Madgwick, S. Rankine, G. Hilton, A. Freed, and A. Nix, “Making the Most of Wi-
Fi: Optimisations for Robust Wireless Live Music Performance,” Proc. Int. Conf. New Interfaces
Music. Expr., 2014.
[33] M. Smith, “How good is ‘good’? | YouGov,” 11-Oct-2018. [Online]. Available:
https://today.yougov.com/topics/lifestyle/articles-reports/2018/10/11/how-good-good. [Accessed:
09-Feb-2020].
[34] J. Martin et al., “A Study of MAC Address Randomization in Mobile Devices and When it
Fails,” 2017.
[35] T. N. Westover and J. R. Collins, “Perceived crowding in recreation settings: An urban case study,” Leis. Sci., vol. 9, no. 2, pp. 87–99, 1987, doi: 10.1080/01490408709512149.
[36] C. Duffy, “Google privacy lawsuit: Tech giant to pay $13 million over Street View data
collection - CNN,” 25-Jul-2019. [Online]. Available:
https://edition.cnn.com/2019/07/22/tech/google-street-view-privacy-lawsuit-settlement/index.html.
[Accessed: 06-Mar-2020].
[37] D. Oosterlinck, D. F. Benoit, P. Baecke, and N. Van de Weghe, “Bluetooth tracking of humans
in an indoor environment: An application to shopping mall visits,” Appl. Geogr., 2017, doi:
10.1016/j.apgeog.2016.11.005.
[38] Waitz, “Waitz.” [Online]. Available: https://waitz.io/index.html. [Accessed: 02-Mar-2020].
This study investigated ownership and on-campus use of laptops, tablets, and smartphones, using survey information on Dutch university students. We show that 96% of students own at least one of these mobile IT devices (i.e., a laptop, tablet, or smartphone). Using econometric modelling, we also show that student income, parental income, gender, immigrant parents, and household type (e.g., living with parents) have a statistically significant but small effect on mobile IT device ownership. The demand for tablets is relatively income inelastic, and the demand for laptops and smartphones extremely so. Therefore ownership rates are high for all student groups, including lower income students. However, students leave their laptops (and tablets) at home most of the time, mainly because they find it cumbersome to carry a laptop, and the vast majority of students hold the opinion that abolishing computer labs while facilitating laptop use is a bad idea, despite the didactical advantages this may have during lectures. Thus, it appears that the current high ownership rates of mobile IT devices by no means imply students' preference or support for university Bring Your Own Device (BYOD) strategies.