Estimating Spatial-Busyness Using Wi-Fi Traffic
Filtering
Aaron Jesse Ashmore
FINAL YEAR PROJECT 2020
B.Sc. Single Honours in Multimedia, Mobile and Web Development
Department of Computer Science
Maynooth University
Maynooth, Co. Kildare
Ireland
A thesis submitted in partial fulfilment of the requirements for the
B.Sc. Single Honours in Multimedia, Mobile and Web Development
Supervisor: Dr. Stephen Brown
CONTENTS
DECLARATION 5
ACKNOWLEDGEMENTS 6
ABSTRACT 7
1 INTRODUCTION 7
1.1 Topic addressed in this project 7
1.2 Motivation 7
1.3 Problem statement 8
1.4 Approach 8
1.4.1 Research & Establishment of Problem 8
1.4.2 Development, Investigation & Experimentation 8
1.4.3 Testing & Evaluation 8
1.5 Metrics 8
1.6 Project Overview 9
1.6.1 Implementation Overview 9
1.6.2 Significant Achievements 9
2 TECHNICAL BACKGROUND 9
2.1 Topic Material 9
2.1.1 Overview of people-counting techniques 9
2.1.2 Wireless people-counting techniques 10
2.1.3 Value of RSSI as a metric 10
2.1.4 Applications of static and mobile sensors 10
2.1.5 Difficulties in the area of people counting 10
2.2 Technical Material 11
2.2.1 TypeScript Documentation [21] 11
2.2.2 Docker Compose Documentation [8] 11
2.2.3 Endpoint testing with Jest and Supertest [23] 11
2.2.4 ESP32 Wi-Fi sniffer [24] 11
2.2.5 802.11 frames: A starter guide to learn wireless sniffer traces [25] 11
2.2.6 Airodump-ng [26] 11
2.2.7 Enable Monitor Mode & Packet Injection on the Raspberry Pi [27] 11
2.2.8 MAC Address Randomisation in iOS [28] 12
2.2.9 What does GDPR say about Wi-Fi tracking? [30] 12
3 THE PROBLEM – DETECTING & REPRESENTING BUSYNESS 12
3.1 Technical Problem Overview 12
3.2 Identifying Key Problem Areas 13
3.2.1 Data Sensing & Processing 13
3.2.2 Data Storage 13
3.2.3 Data Analysis 13
3.2.4 Data Visualisation & Representation 13
4 EXPERIMENTS & INVESTIGATIONS 14
4.1 Sensor Hardware Decision – ESP32 vs. Linux (+ compatible chipset) 14
4.1.1 Equipment 14
4.1.2 Considerations/Variables 14
4.1.3 Results 14
4.1.4 Conclusion 14
4.2 Scan Parameters 15
4.3 Correlation between devices and people 15
4.4 How busy is “busy”? Representing a sense of busyness 16
4.4.1 Setup 16
4.4.2 Results 16
4.5 Notable Challenges 16
4.5.1 MAC Address Randomisation – Filtering bad data 16
4.5.2 Monitor-Client Mode Switching – Kernel modifications in the Raspberry Pi 17
4.5.3 Staying Alive - Redundancy in the Raspberry Pi 17
4.5.4 Database/API Optimisation – Reducing query overhead via batch processing 17
4.5.5 Heavy Data – Difficulties and success during high-traffic events 18
5 DESIGN & IMPLEMENTATION SOLUTIONS 18
5.1 Final Design Diagram 18
5.2 Addressing Key Problem Areas 18
5.2.1 Data Sensing & Processing – Measuring occupancy 19
5.2.2 Data Storage 19
5.2.3 Data Analysis 20
5.2.4 Data Visualisation & Representation 21
6 EVALUATION & DISCUSSION 22
6.1 Software Verification & System Correctness 22
6.1.1 Sensor Components 22
6.1.2 Server Components 22
6.1.3 User Interface Components 23
6.2 System Effectiveness & Discussion 23
6.2.1 User Feedback – MSU Events Officer 23
6.2.2 Survey of Estimation Accuracy & Discussion of Busyness Perception 24
6.3 Evaluation Summary & Project Metrics 25
7 CONCLUSION 25
7.1 Limitations of Approach & Threats to Validity 25
7.2 Future Work 25
7.3 Personal Closing Comments 26
1 APPENDIX 27
2 REFERENCES 43
DECLARATION
I hereby certify that this material, which I now submit for assessment on the program of study
as part of the B.Sc. Single Honours in Multimedia, Mobile and Web Development
qualification, is entirely my own work and has not been taken from the work of others - save
and to the extent that such work has been cited and acknowledged within the text of my work.
I hereby acknowledge and accept that this thesis may be distributed to future final year
students, as an example of the standard expected of final year projects.
Signed: Date: 20/03/2020
NOTES & ACKNOWLEDGEMENTS
Please note that this thesis is submitted alongside accompanying source code in the supplied
zip archive. Additionally, it was difficult to adhere to the 20-page limit, especially when there
was a lot of work to discuss. An effect of the page limit is that I had little room for diagrams
and screenshots in the main body. Given that visualisation is a major part of this project, please
refer to the Appendix whenever guided, as this provides important context to what is being
discussed. Apologies for the inconvenience of separating so much material.
I’d like to extend my thanks to the following people:
- My supervisor, Dr. Stephen Brown, who provided timely feedback and guidance
throughout, particularly pulling no punches when reviewing this thesis
- Colin Maher and Sandra Byrne, Events Officer and Manager of the MSU respectively,
who offered the SU Bar as a testing ground for the project and provided vital user
feedback
- My peers, who patiently listened to me waffle about the project, provided moral support
during the tougher moments of misbehaving code, and helped directly with user testing
of the user interface/visualisation platform
- Vanush “Misha” Paturyan, for assisting with deploying the sensors around campus,
namely granting access for each device to the internal IoT network
- Maynooth University and the Department of Computer Science, for affording me the
opportunity to pursue a personally suggested project and providing any required
facilities and resources to do so
- The nameless few who I have forgotten to mention – you know who you are (hopefully)
ABSTRACT
Higher education institutions in Ireland are facing increased student numbers and reduced funding year
on year. This has placed a strain on available campus facilities, impacting student comfort and raising
safety concerns associated with congestion and footfall. This project aims to create a unique perspective
on the increased congestion levels around campus by drawing from two research areas: perceived
crowding, and wireless crowd sensing. A new term, spatial busyness, is defined to describe our
perception of how busy a location is. By utilising the unique nature of MAC Addresses, publicly visible
in all transmitted Wi-Fi packets, it is possible to obtain a count of unique devices in an area. A relative
measure of busyness, a number between 0.0 – 1.0, is constructed by comparing current device counts
with historical patterns. A system was created to evaluate the effectiveness of Wi-Fi scanning, and the
use of relative measurements to convey a sense of busyness was investigated. It was found that while
our human perception of busyness is mostly based on the occupancy level of a location (something the
system measures well), other factors such as noise levels can strongly affect our impression of busyness.
Although this caused some inconsistencies in measurements, the system in general provided
informative and useful data to both target user groups. The system was evaluated as a product in terms
of its applications, strengths, and weaknesses, and the potential reasons for dissatisfactory performance
in certain environments and the challenges in conveying a sense of busyness through words are
explored. These results enable future work as an extension to this project and help to direct research in
improving the interpretation and representation of sensed wireless activity.
1 INTRODUCTION
1.1 TOPIC ADDRESSED IN THIS PROJECT
The main topic addressed in this project is the idea of spatial-busyness (henceforth busyness)
estimation. Like people-counting, it is concerned with density and congestion of people in an area. The
main difference between the two ideas is the detail of measurement. While people-counting involves
identifying the precise number of people in an area, busyness estimation is achieved by measuring the
relative activity in an area and comparing this to both historical data and current measurements in
other areas. With this relative measure, we can apply a categorical rating (e.g. busy, quiet, lively, etc.) to
indicate the estimated busyness in an area.
1.2 MOTIVATION
With the ever-increasing population of the Maynooth University campus, cited at 14,000 in 2016 [1],
comes an increased risk to the health and safety of students in busy areas. These risks are compounded
with reduced third-level funding year-on-year since 2009 [2, Fig. 4.1], increasing strain on available
resources and facilities. Automated people-counting systems are employed commercially all over the
world, but what if we were to apply this idea to a campus environment? With information on crowd-
density in areas around campus, security and management will be better able to predict and identify
potentially hazardous locations in real-time. Students suffering with sensory-issues, agoraphobia or
anxiety, or simply a preference for peace and quiet could avoid crowded environments without needing
to assess the location first-hand. Research in the area of non-participatory people tracking is abundant
[3]–[5], which encourages the technical direction of this project, though the focus of this project differs
from [3]–[5] in that it attempts to create a utility for multiple end users. Additionally, I have identified
a journal article [6] with a similar approach to mine which I will discuss later. The article uses “group
gathering” to describe a similar metric to that being used for this report, however due to some differences
in measurement, I am defining busyness as a new term. This project addresses the question of the
viability of representing the busyness of a location through a cheap and non-participatory method,
generating information pertinent to the safety and comfort of the outlined user groups.
1.3 PROBLEM STATEMENT
Busyness is closely related to perceived crowding, to which there are many factors such as room colour
and visual complexity [7]; I will be taking a high-level approach to busyness estimation. I am expecting
contextual clues of busyness (physical location size, typical occupancy, etc.) will be causal factors of the
data I will be collecting. If true, I will not need to deal with these specific clues directly, simplifying data
collection and analysis. The data collected will be used to obtain relative measures against both current
and historical data to provide categorical descriptions of locations. I will be evaluating both the sensing
and descriptive accuracy of a location’s busyness with on-site reports and surveys. Finally, I hope to
determine the viability of obtaining an accurate estimation of busyness without the subjects’
involvement or storage of personally identifiable information.
The project goals require a full system solution, from hardware and firmware, all the way up
through the full web stack, in addition to some research and experimentation throughout development
of various components. There are four technical problem areas that must be overcome in order to build
a functional, reliable, and useful system: sensing, storage, analysis, and visualisation.
1.4 APPROACH
The project can be broken into 3 main phases: research, development, and evaluation. Outlined below
are the three phases in greater detail, with any relevant sections you may wish to jump to from here.
1.4.1 Research & Establishment of Problem
Existing academic research is explored to provide insight into the problem area and potential issues (see
Section 2.1). Technical research is also undertaken throughout the entire process to aid in the
implementation process (see Section 2.2). This phase also involves preliminary research into some
technologies, such as an evaluation of potential sensing hardware found in Section 1.21 of the Appendix.
Finally, the Problem to be solved is established in Section 3.
1.4.2 Development, Investigation & Experimentation
In this phase, the four key problem areas established in the first phase will be addressed. During the
development process, continuous experimentation is required to inform decisions in the four areas. Any
experiments or investigations relevant to the design and decision process are discussed in Chapter 4,
and the decisions made in each area are discussed in detail in Chapter 5.
1.4.3 Testing & Evaluation
The final phase of the project is to deploy and test the developed hardware and software, and to evaluate
the effectiveness of the system and its utility among both target user groups. A detailed, end-to-end
verification of the correctness of the system can be found in Section 1.23 of the Appendix and a
discussion of the effectiveness of the system can be found in Section 5.2.
1.5 METRICS
I evaluated each of the problem solutions with a variety of metrics such as server CPU time, API response
time, and general data correctness, but to evaluate the system effectiveness requires engagement with
end users and feedback on its utility. As the motivation stems from the usefulness and accuracy of
implementation, I will be using statements, surveys, opinions, and correlation trends as metrics of
evaluating the project’s success. I will gain an insight into potential future work based on statements
and opinions and will be able to evaluate the busyness descriptors with correlation and surveys.
1.6 PROJECT OVERVIEW
The project is quite involved at all implementation levels and required extensive time spent on each
problem area. While it could be classed as an implementation project in terms of generic crowd-sensing,
there is a reasonable amount of research and investigation being conducted throughout. The following
is a list of implementation stages and significant achievements of the project. Many of these topics will
be visited in more detail later.
1.6.1 Implementation Overview
1. Evaluate potential hardware to be used when sensing chosen signal type
2. Develop software to perform scanning and data processing on hardware
3. Perform ground truth testing of hardware/software
4. Design and build an API & database to store and serve data collected by each deployed device
5. Develop a front-end platform to view historical and real-time data
6. Deploy devices around campus to test scalability
7. Review the deployments for accuracy and value
8. Iterate on hardware implementation and front-end platform to improve value and accuracy
1.6.2 Significant Achievements
1. The correlation between devices and people on campus proved to be a valid assumption
2. The value and utility of the system was confirmed by both parties outlined in the motivation
3. Applying sensing data relativistically was the key to achieving an accurate measure of busyness
4. The metric gathering in this project was achieved using a low-cost, all-in-one sensing device1
2 TECHNICAL BACKGROUND
2.1 TOPIC MATERIAL
While the use of the term “busyness” is not common, the idea is not entirely novel. There are many
approaches to, applications of, and concerns about measuring occupancy in an area.
Addressing the problem statement with wireless technologies is common practice, and some
existing research ([3], [6], [8], [9]) aligns very closely with this project in terms of technical approach
and implementation, though the use of the end system differs. However, wireless approaches are by no
means the only way to measure occupancy. Research in the area is vibrant and growing, with commercial
applications [10] and more novel uses of wireless device counting such as Google’s traffic prediction
system [11] providing financial incentives to development beyond the more humanitarian motivations
such as the one outlined in this project.
2.1.1 Overview of people-counting techniques
There are several popular methods to people counting. Traditional methods of counting by hand can be
unreliable, so automated systems have taken their place. Standard approaches such as turnstiles are
simple ways of achieving accurate occupancy metrics, but require fixed entry points and limit the flow
of people [12]. A very common means of measurement is through video surveillance, but this method
presents many difficulties such as resolution, movement, and occlusion. While much progress has been
made in this area [13]–[15], additional methods of counting have emerged in recent years with the
ubiquity of personal wireless devices. Wireless, non-participatory methods include tracking through Wi-
Fi signals ([3], [6], [16]), Bluetooth ([4], [8], [9]), and some more experimental methods such as through
the use of laser-grids [17]. A full history and progression of tracking systems is explored by T. Räty in
their survey of surveillance systems [18].
1 While some research and projects (e.g. [8], [9]) have used a similar hardware-software combination
such as the Raspberry Pi and Airodump-ng, in the case of [9] an external Wi-Fi antenna is used, and
others (e.g. [3], [4], [6]) use more expensive or complicated setups.
2.1.2 Wireless people-counting techniques
For this project, I had narrowed the technique down to a wireless approach but needed to decide between
Bluetooth and Wi-Fi. Abedi et al. [4] compare Bluetooth with Wi-Fi as a method of non-participatory
monitoring and investigate how various properties of the environment and sensor antenna affect
scanning. The various comparisons made between the two media indicate a much stronger use case for
Wi-Fi over Bluetooth. Signal propagation, discovery time, and ubiquity of Bluetooth devices fell very
short of Wi-Fi in testing. Additionally, testing of both the 2.4GHz and 5GHz Wi-Fi spectrum led me to
omit the 5GHz channels entirely from the scanning process2.
2.1.3 Value of RSSI as a metric
A hugely valuable metric in people tracking is RSSI, the Received Signal Strength Indicator of a captured
signal. Its main applications are the use of distance tracking and localisation in areas where other
indicators such as GPS are difficult to obtain. Guvenc et al. [19] used RSSI localisation to optimally
position access points in the University of New Mexico, while Bai et al. [9] used RSSI in combination
with other filtering methods to remove noise from their measurements. Similar to Guvenc et al., E.
Vattapparamban [3] uses many RSSIs detected by multiple sensors to locate one device within a grid.
This project utilises RSSI as a simple threshold to filter out devices beyond a desired distance
from the sensor. I briefly explored the variability of this metric2, but there are much more involved
approaches to analysing RSSI discussed by Adewumi et al. [20].
2.1.4 Applications of static and mobile sensors
Much of the research I encountered involved static sensors, that is, a (set of) ground sensor(s) deployed
in a fixed location to capture data. Having a fixed sensor is important in localisation and for this project
too, where the position of the sensor forms part of the data (location). Abedi et al. [6] take a human-
geographical approach by identifying patterns of human behaviour from data collected by their static
sensor. By tracking device persistence in the sensing area, they were able to track time spent in an area
rather than just an aggregate of devices such as in this project.
While static sensors seem to be the most common setup, the use of mobile sensors has come up
a few times during my research. E. Vattapparamban [3] investigated the possibilities of mobilising their
monitoring system by attaching the sensors to drones. In doing so, they showed the potential localisation
powers of such a system in the use of search and rescue, and theorised both malicious uses of and
counters to this type of surveillance. Another form of mobile sensing is Google’s use of users’ personal
devices as individual mobile sensors to identify levels of traffic congestion [11]. The ubiquity of this data
provides unparalleled insight into the way people move throughout the day.
2.1.5 Difficulties in the area of people counting
Depending on the method of people-counting, the challenges vary in nature and difficulty. Video
surveillance systems suffer from several technical challenges in the capture and analysis of video frames.
With relevance to this project, Bai et al. [9] discuss some of the issues encountered during their wireless-
based approach. A similar problem, where system accuracy varied between monitored locations, is
discussed in 5.2.2; this is a significant difficulty of such blanket approaches to people-counting systems.
2 See 4.3.2
2.2 TECHNICAL MATERIAL
I used a variety of resources to help me design and develop the project. The following is an overview of
these and what I learned from each.
2.2.1 TypeScript Documentation [21]
JavaScript is the language of the web and is very quick to develop in due to its loose typing and simple
syntax; however, these are equally its pitfalls. TypeScript, from Microsoft, aims to address many of the
shortcomings of JavaScript by adding strict typing and language features not found in JavaScript. For
the API this was a trivial decision, as the additional restrictions helped me catch and identify many errors
in my code before even testing. I used the official documentation to help me set it up in my project.
2.2.2 Docker Compose Documentation [8]
Docker is a great technology that containerises applications for quick and predictable deployment.
Docker compose is an additional tool that helps deploy full stack applications that require multiple
containers. You specify rules for their deployment such as how and when they’re mounted,
dependencies, and how they recover from failure. With this resource, I learned how to Dockerise and
deploy my project quickly and reliably3. Specifically, the documentation taught me how containers
communicate with each other, and how I can interact with them as a developer.
2.2.3 Endpoint testing with Jest and Supertest [23]
I needed to verify the functionality of the database and API with endpoint tests, but I was unfamiliar
with the approach when using MongoDB and Express. I found this to be a great resource where I learned
how to use a testing framework called Jest. I used it to test4 various parts of the API such as report
submission and retrieval, among others.
2.2.4 ESP32 Wi-Fi sniffer [24]
The ESP32 was a candidate during my evaluation of potential sensor hardware5. Support for the device
in packet sniffing is not great, and I am not very experienced working in C, so this resource from Łukasz
Podkalicki helped greatly in developing a minimal implementation with which to test.
2.2.5 802.11 frames: A starter guide to learn wireless sniffer traces [25]
To augment the ESP32 resource (2.2.4), I used this guide from Cisco to help me understand what parts
of the packet were being dissected by the example sniffer code. This greatly helped me understand the
structure of the 802.11 frame6.
2.2.6 Airodump-ng [26]
A vital part of the scanning implementation involved a pen-testing toolkit called Airodump-ng.
Airodump-ng, part of this toolkit, is used for packet sniffing with available wireless chips. The
documentation was vital in tweaking scan parameters and wrapping it in my scanning script correctly.
2.2.7 Enable Monitor Mode & Packet Injection on the Raspberry Pi [27]
Monitor mode allows a device to ‘scan’ for wireless packets being transmitted in the area. The basis of
this project hinged on having periods of scanning in monitor mode to capture all unique devices sending
packets. The Raspberry Pi does not have native support for monitor mode; without it, the device would
not be able to scan for Wi-Fi signals in an area. Thankfully, Kody wrote an easy-to-follow guide for
enabling support through a firmware patch called Nexmon7. Without this, I would not have discovered
Kali-Pi, the Linux distribution built for the Pi which includes this modification.
3 See appendix 1.24
4 See appendix 1.2
5 See 4.1
6 See appendix 1.1
2.2.8 MAC Address Randomisation in iOS [28]
This was my primary resource in understanding how randomisation occurs in mobile devices. The key
takeaway from this was the way to tell a randomised address from a ‘real’ one by checking the second-
least-significant bit of the first octet of the address. This is on the basis of the IEEE specification [29, pp. 12],
which indicates that, when set, the address is locally administered, i.e. the device created/assigned this address to itself.
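To make the check concrete, the following is a minimal sketch (my own illustration in Python, not code from the project) of testing the locally-administered bit in the first octet of an address:

    def is_locally_administered(mac: str) -> bool:
        # Bit 1 (the second-least-significant bit) of the first octet is the IEEE
        # 'locally administered' flag; randomised addresses set this bit.
        first_octet = int(mac.split(":")[0], 16)
        return bool(first_octet & 0b00000010)

    print(is_locally_administered("da:a1:19:12:34:56"))  # True - likely randomised
    print(is_locally_administered("84:b8:02:12:34:56"))  # False - globally assigned (Cisco OUI)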
2.2.9 What does GDPR say about Wi-Fi tracking? [30]
GDPR is an important consideration in this project. While care was taken to design a system that
minimises the collection of personal data, I must still observe, count and store MAC addresses to a
degree. This website is a great source for information on GDPR in general, but it particularly clarified
the point that “A MAC address is a personal data at the moment it is combined with other (personal)
data that can be traced back to a person”. To avoid any personally identifiable information being stored,
I hash any MAC addresses before they are transmitted to the server, and never extrapolate collected data
to the individual e.g. through movement tracking, time spent in each location, etc. It is certainly possible
for the data I am collecting to be used for more than basic aggregation, but that is not the intention of
the project; an aspect GDPR considers.
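As a brief illustration of the anonymisation step, the hashing happens on the sensor before a report ever leaves the device; a minimal Python sketch follows (the thesis does not specify the hash function, so SHA-256 here is an assumption):

    import hashlib

    def anonymise_mac(mac: str) -> str:
        # Only this opaque digest, never the raw MAC address, is transmitted and stored.
        return hashlib.sha256(mac.lower().encode("utf-8")).hexdigest()

    print(anonymise_mac("84:B8:02:12:34:56"))

Salting or periodically rotating a secret mixed into the hash would further reduce linkability, though that goes beyond what is described here.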
3 THE PROBLEM – DETECTING & REPRESENTING BUSYNESS
The technical scope of the project can be broken down into four key problem areas: sensing, storage,
analysis, and visualisation. Figure 1 shows the requirements of the technical problem which we can use
to motivate a discussion of the technical solutions.
3.1 TECHNICAL PROBLEM OVERVIEW
The system required for the project must adhere to the problem overview shown in figure 1. The
sensor(s) will be deployed around the campus and will need remote access to the server. The reports
they submit will need to be stored somewhere such as a database. Analyses will be performed on the
stored data and visualised on some platform. This platform must be accessible to the end user and
display the data in a way that is easy to interpret.
Figure 1 – Technical problem overview
7 https://github.com/seemoo-lab/nexmon
3.2 IDENTIFYING KEY PROBLEM AREAS
3.2.1 Data Sensing & Processing
This problem requires two decisions to be made: what is to be sensed; and the hardware required to do
so. A specific goal of the system is to perform sensing without interaction with the subjects to be sensed
i.e. non-participatory and without the subjects’ knowledge of sensing. This is to simplify the problem
and additionally minimise any data privacy concerns. Sensing the presence of people is the goal of this
step but there are several challenges associated with doing this in a non-participatory manner. As
explored, video surveillance is common, but the approach taken here should minimise privacy concerns.
Additionally, the medium to be sensed should not depend too heavily on the physical characteristics of
the location e.g. the presence of walls or occlusion amongst people would cause problems with an
approach involving visual monitoring like video or even laser grids. There is not enough time to evaluate
every potential solution to the problem, so I am restricting what is to be sensed to some form of
abundant, wireless, easily anonymised signal. The trade-off here is a decrease in accuracy of sensing due
to the characteristics of signal propagation and the source of the signal to be sensed. Thankfully, the
choice of data source helps determine the required hardware (sensor) and testing and experimentation
is required to decide on implementation specifics. The hardware chosen also determines to what extent
the data can be processed before being sent to storage due to processing overhead. In this section I also
need to decide on an appropriate schema for the data format implemented on the data storage side to
ensure integrity. Finally, three critical considerations are: to determine what a sensing/scanning process
involves; the interval at which to perform scans; and any additional parameters of the process.
3.2.2 Data Storage
Given the sensors are deployed to different locations, a remote store such as a networked database must
be chosen. The type of database to use is dependent on the data sensed, how easily it interfaces with
APIs, and my personal experience with it. I must continually evaluate the performance of the database
as my data grows.
3.2.3 Data Analysis
Once the data is stored, it will need to be analysed to meet the requirements set out in the problem
statement and motivation. Comparing recent data to historical data, and live data between locations, are the
minimum required operations, but additional analyses relating to visualisation may be required.
Depending on the analysis, it may not be computationally appropriate to run analytics for every query.
3.2.4 Data Visualisation & Representation
Visualisation is a significant part of this project, as it directly addresses the main motivation: providing
data and insights for both the student body and management/security. It is an area of the project
requiring consistent evaluation and adjustment to get right. The platform for visualisation will need to
be easy to use on desktop and mobile, and the visualisations chosen must be informative at a glance.
A vital consideration of the project in order to adhere to the motivation is that the data presented
to a potentially non-technical user must be accessible, familiar, and intelligible. This project aims to
achieve this by translating metrics of the data source (unique device counts) to a sense of busyness.
During data analysis, the relative measure of busyness is quantified to a number between 0.0 and 1.0,
from least to most busy. This does not convey a natural sense of busyness. This project will explore the
idea of presenting a categorical description of busyness, rather than a number. A categorical
representation also provides an ability to skew the interpretation of the 0.0 to 1.0 scale, as this number
may not translate linearly to busyness, and the user should not be required to perform this interpretation
themselves. For example, should a relative measure of 0.3 be classified as “quiet”? Whatever form is
chosen, evaluations to establish its effectiveness and discuss potential alternatives will be performed.
4 EXPERIMENTS & INVESTIGATIONS
The following is a set of the more noteworthy experiments and investigations performed throughout the
project. These were vital during the decision-making process and so will be heavily backreferenced in
the following chapters. In addition, challenges encountered during the process are discussed in 4.5.
Important to note is that not all experiments and investigations discussed took place chronologically
before decisions were made. Some, such as 4.4, were undertaken after an initial implementation cycle.
4.1 SENSOR HARDWARE DECISION – ESP32 VS. LINUX
(+ COMPATIBLE HARDWARE/CHIPSET)
One candidate for the sensor was the ESP32, a small, low-power chip with Wi-Fi support. It was the
primary alternative to using a Linux device with a chipset compatible with Airodump-ng, the packet
sniffing toolkit. This investigation looked at two aspects: performance and ease-of-use. With the data
produced from the tests, along with personal experience with the devices, I was able to evaluate both
potential solutions to this problem. The test took place in the main Eolas lab with 20 people in the room,
with at least one device per person (laptop, likely also phone). Both devices were let scan for 60s,
switching wireless channels every 2s.
4.1.1 Equipment
1. ESP32 running packet sniffing code written in Arduino/C [24]
2. Linux laptop with a USB Wi-Fi solution (Alfa AWUS036NHA)8
4.1.2 Considerations/Variables
1. Channel hop is sequential on the ESP32 but not on the Linux device
2. The Linux device automatically identifies access points and removes them; the ESP32 did not
account for all APs but did filter out any eduroam networks9
3. The Linux device is much more powerful than the ESP32. The higher performance in testing
might not translate to smaller, more portable Linux solutions
4. The antenna used in the Linux setup is more powerful than the built-in antenna on the ESP32
4.1.3 Results
The Alfa gathered 44 unique client addresses – APs filtered by Airodump-ng
The ESP32 gathered 30 unique MAC addresses – Cisco APs manually filtered
The Linux solution is more robust, possibly due to the hardware, but equally due to the existing toolkits
available for this type of project. However, there are other pros and cons which are important to consider
in the decision. Please see appendix 1.21 for pros and cons of each method explored.
4.1.4 Conclusion
With the data and pros/cons evaluated, I decided that the non-ESP32 approach was more appropriate
given the project's time constraints. This result could have been different in other cases, as the test did
not include the exact hardware of a deployable Linux solution. An important note is that use of either
device is not necessarily exclusive and using both may provide greater flexibility in terms of remote
deployment. However, at the time of testing I believed it to be more beneficial to focus on a single
platform first and expand later if necessary.
8 https://www.alfa.com.tw/products_detail/7.htm
9 Using the Cisco OUI "84:B8:02" as a filter for all eduroam APs on campus
4.2 SCAN PARAMETERS
There are a few customisable parameters to Airodump-ng which affect scanning results. Signal
propagation, antenna design, and Wi-Fi filtering specifics are complex areas of research [19], [20], [32].
Due to time constraints, I was unable to perform entirely deductive investigations. Rather, I opted for
inductive but informed approaches to determine the effects of each parameter on the scanning
procedure, the underlying principles and causes of which are not explored in this thesis and are instead
left as an exercise for the reader.
As the data captured included the signal strength of the captured packet (RSSI), I was able to
perform some filtering on this metric in my script. For medium-sized areas, an RSSI threshold of
-80dBm performed best. I settled on this number through some testing of how crowding, noise, and
distance seemed to affect the RSS of a known device to the sensor. Additionally, I tested the effects of
scan length, channel hopping frequency, and channel set:
1. For large areas, an RSSI threshold of >=-85dBm performed best. Small areas worked well with
>=-78dBm
2. For areas of lower occupancy, a scan time of 60 seconds proved enough to capture most signals
3. For areas of higher occupancy, a longer scan time of between 90-120 seconds was better
4. Due to the number of channels to scan in the 5GHz spectrum, scanning was restricted to
2.4GHz. This may result in lost devices, but it seemed to make little difference in the testing
5. Channel hopping occurs at a high frequency, roughly one hop every 500ms, though this isn’t
documented. I found that a more reliable count of devices was taken with a slightly slower rate of
1000ms between hops.
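As a minimal sketch of the RSSI threshold described above (illustrative Python; the observation format and function name are my own assumptions, not the project's actual script):

    # Keep only devices whose strongest observed signal meets the threshold.
    def filter_by_rssi(observations, threshold_dbm=-80):
        strongest = {}
        for mac, rssi in observations:  # (MAC, RSSI in dBm) pairs parsed from a scan
            if mac not in strongest or rssi > strongest[mac]:
                strongest[mac] = rssi
        return {mac for mac, rssi in strongest.items() if rssi >= threshold_dbm}

    sample = [("aa:bb:cc:00:11:22", -62), ("aa:bb:cc:00:11:22", -85), ("dd:ee:ff:33:44:55", -90)]
    print(filter_by_rssi(sample))  # {'aa:bb:cc:00:11:22'}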
4.3 CORRELATION BETWEEN DEVICES AND PEOPLE
The crux of the project relies on the assumption that there is a tangible correlation between the number
of devices and the number of people in an area. While this might feel intuitively correct, especially when
supported by ownership rates of wireless devices among the target demographic [31], it is important to
test this assumption. At the same time, one must resist the urge to people-count; this experiment only
aims to measure a correlation, not obtain a device-person ratio, as it is not relevant to the goal of the
project.
I conducted an experiment to measure the correlation between the number of devices sensed and the
number of people in an area with the following aspects:
The occupancy of the room was continually monitored.
The sensor had been configured to try to capture the devices in the room but not too far beyond10
A count of people in the room was taken during each sensor scan interval over several hours
The graphed data11 shows the count of people in a room against the count of non-randomised devices
found during the scanning process. The data exhibited a correlation factor of 0.67 between device count
and people count. While this factor is not as significant as I had hoped, there is nonetheless a tangible
and utilisable correlation between devices and people, and the number of unique devices is likely a good
relative measure.
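For reference, the reported correlation factor can be computed directly from the paired counts; a small Python sketch with made-up numbers (not the actual measurements, which are in the Appendix):

    from statistics import correlation  # Pearson correlation, Python 3.10+

    people_per_scan  = [12, 15, 18, 22, 19, 25, 30, 28]
    devices_per_scan = [10, 14, 15, 20, 15, 21, 27, 22]
    print(round(correlation(people_per_scan, devices_per_scan), 2))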
10 See 4.2 This was not a completely precise boundary due to the complexities of signal propagation.
11 See appendix 1.14
4.4 HOW BUSY IS “BUSY”? REPRESENTING A SENSE OF BUSYNESS
The title of this section, and the inspiration for this investigation, was a YouGov study [33] in which the
perceived meaning of a categorical rating was related to a quantified, numerical value. In order to convey
a sense of busyness, the appropriate wording must be used. Throughout the project, the single most
common point of confusion during demos and user-testing was what a categorical rating (quiet, calm,
lively, etc.) meant. The following is a brief investigation to determine if a set of appropriate words exist
to rate the busyness of an area. It is important to note that this investigation took place after an initial
scale and set of busyness descriptors were in use for some time on the visualisation platform.
4.4.1 Setup
I created a polling page which would present a busyness rating to the user and ask them to represent
that word on a scale of least to most busy. The results were submitted to my API and stored in a MongoDB
collection for analysis later. The decision to use a linear background gradient for the slider was made after an
initial run of the poll, the results of which have been discarded. The reason for discarding the initial run
was two-fold: I identified a potential biasing factor in the poll colour scheme; and I didn’t provide a way
to not answer a question. The colour scheme used on the slider initially matched the colour choices used
on the dashboard UI12. However, the placement of each colour on the slider had been directing
participants to place the slider thumb in areas defined by the colours e.g. if the middle were orange,
more intense words such as “busy” might appear here despite being only 50% on the slider. The other
issue was a missing “Not Sure” button, forcing participants to answer even if they felt unsure of the
word’s meaning or feeling. I updated the slider to be less biased by using a single colour gradient13.
4.4.2 Results
An infographic14 was generated from the data15 collected from 21 participants. It shows how intensely each
word was rated in relation to a sense of busyness. I had thought of filtering the outliers; however, they
signify the confusion that each word holds and so I believe removing them would be dishonest.
In general, the most universally-agreed-on words were found at the extrema. There was
disagreement around words like “hopping” and “humming”, with strong candidates around the intervals
already being used for busyness categorisation on the front end (discussed in 4.2.4). Evaluating the
standard deviation and mean of each word’s intensity, I chose the following categories for busyness
ratings on the front end:
very quiet (0–0.15), calm (0.15–0.35), comfortable (0.35–0.65), bustling (0.65–0.85), hectic (0.85–1.0)
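Applied in code, the categorisation reduces to a simple lookup over these intervals; a minimal Python sketch of my own using the boundaries above:

    def busyness_category(score: float) -> str:
        # Map a relative busyness score (0.0 - 1.0) onto the chosen category words.
        if score < 0.15:
            return "very quiet"
        if score < 0.35:
            return "calm"
        if score < 0.65:
            return "comfortable"
        if score < 0.85:
            return "bustling"
        return "hectic"

    print(busyness_category(0.30))  # calm
    print(busyness_category(0.72))  # bustling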
4.5 NOTABLE CHALLENGES
As this project involved aspects of both hardware and software, there were quite a few interesting or
noteworthy challenges.
4.5.1 MAC Address Randomisation – Filtering bad data
“Media Access Control (MAC) address randomization is a privacy technique whereby
mobile devices rotate through random hardware addresses in order to prevent
observers from singling out their traffic or physical location from other nearby
devices.” – Martin et al. [34, Ch. Abstract]
12 See appendix 1.7
13 See appendix 1.8
14 See appendix 1.9
15 See appendix 1.10
There are some very involved approaches to identifying exactly which device has been randomising its
addresses, some of which are discussed by Martin et al. [34, Sec. 4.2]. Due to complexity and time
constraints, I decided on a much simpler method of discarding any likely-randomised addresses and
retaining the ‘real’ ones. To do this, I identify any properties of a sensed device which might indicate a
‘real’ MAC address and discard any devices which fail these checks, assuming they are randomised.
The following assumptions are made:
1. If a device is associated with a network, it is not randomised
2. If a device’s MAC address begins with a known manufacturer prefix16, it is not randomised
3. If a device’s MAC address is not locally administered, it is not randomised17
4. If the above checks fail, assume it is randomised
With this method18, I was able to obtain a reasonably stable correlation between the number of devices
and the number of people19. I performed a ground truth experiment by counting the number of people
in an area during each scan interval, and correlating it with the number of detected devices, post-
filtering.
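Expressed as code, the four assumptions above translate into a short filter; the following Python sketch is my own illustration (field names and the OUI-set format are assumptions – the actual pseudocode is in appendix 1.18):

    def is_probably_real(record: dict, known_ouis: set) -> bool:
        mac = record["mac"].lower()
        first_octet = int(mac.split(":")[0], 16)
        if record.get("associated_bssid"):       # 1. associated with a network
            return True
        if mac[:8] in known_ouis:                # 2. known manufacturer prefix, e.g. "84:b8:02"
            return True
        if not (first_octet & 0b00000010):       # 3. not locally administered
            return True
        return False                             # 4. otherwise assume randomised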
4.5.2 Monitor-Client Mode Switching – Kernel modifications in the Raspberry Pi
The Raspberry Pi 3B(+) does not natively support monitor mode on its wireless chipset. A kernel patch
(Nexmon) bundled with a Linux distribution (Kali-Pi) allowed for this mode to be entered. The
Raspberry Pi could now be used for both scanning and data pre-processing. However, due to the
aftermarket nature of this modification, switching between client and monitor mode to allow data
transmission between scans was an arduous task. Instability such as packet loss and complete
disconnections from the network required for submitting scan reports would occur randomly. The only
way to maintain stability was to remove and re-add the wireless chipset driver from the kernel and tidy
up networking services while the script was running20.
4.5.3 Staying Alive - Redundancy in the Raspberry Pi
In the early stages of development, I found that the script would crash often. While many of the culprit
bugs for the crashes would be ironed out over the months, in the meantime I needed a way to guarantee
as best as possible that each deployed Raspberry Pi would stay alive and work as intended during the
scan and submit process. The script running the scan and processing data on the Pi is executed at boot
in a wrapper that force-restarts the device if any unhandled exception occurred. In the script itself, many
potential exceptions are handled such as failure switching wireless mode, failed data transmission, and
processing errors. All scan reports are kept on each device (up to a maximum of 1000) after generation
and backlogged if they failed to send for any reason. In one case, a device failed to transmit data for an
entire day due to an API bug, but once the connection was restored all reports were submitted
retroactively, and no data was lost. Finally, a system was developed to remotely deliver configuration
updates to each Pi uniquely, in case a parameter causes inaccuracies or errors.
16 https://regauth.standards.ieee.org/standards-ra-web/pub/view.html#registries
17 See 2.2.7
18 See appendix 1.18 for pseudocode
19 See 4.3
20 See appendix 1.3
4.5.4 Database/API Optimisation – Reducing query overhead via batch processing
Initially, all queries were naively processed at query-time. This was acceptable at first, as the queries
were for recent data and involved at most a few tens of report summaries and comparisons. As
visualisation complexity grew, each request involved summaries of weeks of data for each device,
comparisons between all days, historic highs and lows, etc. The processing time for these queries quickly
grew and meant page load times were as high as 7s, with each load consuming 4.5s of CPU time on the
server, and with hundreds of kilobytes of JSON transmitted. With compression and batch pre-
processing of large queries, page load times were reduced by 60%, CPU time by 83%, and the size of the
transmitted data by 86%. The trade-off was that batch processing, a heavy operation, had to be performed
at regular intervals to ensure freshness of data resulting in about 30s CPU time total each day – though
this would increase linearly as more devices are added to the system. These significant performance
improvements contribute greatly to the responsiveness of the dashboard on all platforms.
4.5.5 Heavy Data – Difficulties and success during high-traffic events
When a device was first deployed in a high-traffic area (SU Bar), a number of issues cropped up that I
had previously thought possible but was not prepared for. Firstly, the device suffered from overheating,
and the scan windows were picking up very little data due to thermal throttling of the processor and/or
chipset. To quickly resolve this, I mounted some coins onto the surface of the CPU. This remediated the
issue at first, but I eventually switched to dedicated aluminium heatsinks. However, now that the device
was no longer throttling and scans could be performed during high-intensity periods, the device stopped
reporting indefinitely. After scanning during a high-intensity event, the device generated reports larger
than the default payload limits in the Node.js API and would hang on trying to submit this backlogged
report after each scan interval. Adjusting the payload limit and trimming some unnecessary information
(such as randomised MAC addresses) from the reports resolved this issue. Luckily, all reports generated
in the meantime were still stored on the device, thanks to the redundancy described in 4.5.3, and no data
was lost while the issue was present.
5 DESIGN & IMPLEMENTATION SOLUTIONS
Motivated by the discussions of Figure 1, the four problem areas, and investigations performed in
Chapter 4, this chapter will discuss the final design and an overview of how each problem area was
addressed.
5.1 FINAL DESIGN DIAGRAM
The final architecture of the system as motivated by discussion and experiments:
Figure 2 – Final project architecture
5.2 ADDRESSING KEY PROBLEM AREAS
In this section, the key problem areas highlighted in 3.2 are revisited and solutions are presented to each
with reference to relevant experiments or research from Chapter 4.
5.2.1 Data Sensing & Processing – Measuring occupancy
The main concerns with sensing were:
1. Finding an abundant, wireless, and anonymous method of activity measurement
2. Testing and developing a suitable hardware solution for the above method
Wireless devices such as mobile phones, tablets and laptops are ubiquitous on university campuses. The
rate of ownership of these devices among students in the Netherlands, a similar demographic, is as high
as 96% [31]. Given the ubiquity of this signal and ease of scanning for public signals, Wi-Fi packets were
chosen as the data source for the project. Wireless packets are constantly being sent from devices, even
when Wi-Fi is disconnected or disabled in the OS21. Every wireless packet contains a publicly visible unique
device identifier, the MAC address, found in the Data Link layer22. By scanning for and filtering all
packets in an area using a device in monitor mode23, it is possible to compose a list of these unique
identifiers. Unfortunately, not every wireless device is guaranteed to be transmitting during a scan
window, and some devices may be transmitting with multiple randomised addresses at once24.
Addressing these two concerns: devices missed during a scan window can be minimised with a longer
scan window25; and randomised addresses can be filtered with some heuristics for a reasonably reliable26
count of devices in an area. Experiments and investigations around the use of Wi-Fi as a data source,
including which wireless spectrum to scan, were discussed in 4.2 and 4.3.
With respect to hardware, many small IoT devices (such as the ESP32) can filter packets and
extract MAC addresses, but after some testing27 and evaluation of previous experience, the Raspberry Pi
3 was chosen for sensing, pre-processing and transmitting collected data. Additional experiments were
performed to determine some scanning parameters such as scan length and power thresholds at which
to classify a device as “in the area”25. Finally, a script was developed to perform the scan procedure at
five-minute intervals. The process of a scan is:
1. Prepare the device environment by setting the correct date-time, runtime variables and more
2. Check for any remote configuration updates and sync any unsubmitted reports
3. Prepare the device for scanning by entering monitor mode25
4. Delegate Airodump-ng with scan parameters and allow to run for pre-defined scan length
5. Process the collected data into desired format28
6. Switch from monitor mode to client mode and verify internet connection
7. Synchronise any unsubmitted reports, backlogging any failed reports for next time
8. Remain in client mode until 30s before the next scan, upon which re-enter monitor mode
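A condensed, illustrative outline of this cycle is sketched below in Python; the commands and flags shown are standard aircrack-ng/Airodump-ng usage, but the structure is a simplification of the real script (no configuration sync, report backlog, or the driver workarounds described in 4.5.2):

    import csv, subprocess, time

    SCAN_LENGTH_S = 60   # pre-defined scan length (step 4)
    CYCLE_S = 300        # five-minute scan interval

    def scan_once():
        subprocess.run(["rm", "-f", "/tmp/scan-01.csv"])                      # clear previous output
        subprocess.run(["airmon-ng", "start", "wlan0"])                       # step 3: enter monitor mode
        subprocess.run(["timeout", str(SCAN_LENGTH_S), "airodump-ng", "wlan0mon",
                        "--write", "/tmp/scan", "--output-format", "csv"])    # step 4: scan
        with open("/tmp/scan-01.csv", newline="", errors="ignore") as f:
            rows = list(csv.reader(f))                                        # step 5: raw parse only
        subprocess.run(["airmon-ng", "stop", "wlan0mon"])                     # step 6: back to client mode
        return rows   # the real script builds, submits, and backlogs a report (step 7)

    while True:
        scan_once()
        time.sleep(CYCLE_S - SCAN_LENGTH_S)   # simplified step 8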
21 www.xda-developers.com/android-pie-gets-android-oreo-turn-on-wifi-automatically-feature/
22 See appendix 1.1
23 Operating on this mode, the wireless network card is able to capture all types of Wi-Fi Management
packets. https://www.acrylicwifi.com/en/blog/wifi-monitor-mode
24 See both Apple’s and Android’s documentation on randomisation: support.apple.com/en-us/HT201395
& https://source.android.com/devices/tech/connect/wifi-mac-randomization
25 See 4.2
26 See 4.3
27 See 4.1
28 See appendix 1.11
5.2.2 Data Storage
A database is required to store the data collected by deployed sensors. MongoDB, a NoSQL database,
was chosen as the solution for a few reasons:
1. The sensed data is non-relational and is stored and transmitted in JSON
2. Having JSON throughout the full pipeline simplifies design of a JSON-based API
3. JSON is compatible out-of-the-box with any JavaScript used on the front end.
4. I have previous experience setting up and working with MongoDB
The database needs to be accessible remotely from each sensor. This is achieved using Docker containers
deployed on the Computer Science Department server. Two containers are used: one for the Node.js
API; and the other for the MongoDB instance, and only the Node.js container can access the MongoDB
container. Node.js was chosen for the API and web server as it keeps the code for both in a single
workspace. MongoDB schemas in Node.js validate incoming reports29 from each device and integration
tests are run before each redeployment to ensure endpoint functionality30.
5.2.3 Data Analysis
Document-based databases (e.g. MongoDB) can be slow to query for and compare large amounts of data,
common operations during data analysis. To improve performance, two indexes were created over the
data collection: submission time; and the collecting sensor ID. Together, these reduce memory usage in
sorting and comparing31.
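For illustration, the two indexes can be created as shown below (a pymongo sketch; the database, collection, and field names are assumptions rather than the project's actual schema):

    from pymongo import MongoClient, ASCENDING, DESCENDING

    reports = MongoClient("mongodb://localhost:27017")["busyness"]["reports"]
    reports.create_index([("submittedAt", DESCENDING)])  # submission time
    reports.create_index([("sensorId", ASCENDING)])      # collecting sensor ID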
The main task of the analyses is to build a profile of a location in terms of typical activity levels
over time32 and to compare real-time data to historical33. Some aspects of profile-building require large
amounts of data, and so these operations are done as scheduled batch processes using CRON jobs rather
than on a per-query basis, though most visualised analyses will involve some processing on client-side.
One of the later additions to the front-end feature-set was to allow a user to query for predicted
future busyness levels in any monitored locations34. This is done by averaging any historical
measurements of the requested location and time and presenting it to the user as a predicted level of
busyness. The feature is simple with much room for improvement, but the basic concept allows a student
or management to plan ahead of time based on historical patterns without the need to go through the
data themselves.
At all stages it is important that the user interface and API remain responsive, as users would
frequently be accessing the visualisation platform or querying data for up-to-date information. Any
delay in information retrieval from the API could cause slowdowns in the user interface or cause the API
to return stale data. API and database performance metrics were continually evaluated in terms of
CPU usage and response time35 throughout development as data structures evolved.
Calculating Busyness
The busyness metric is relative and fit to a scale of 0.0–1.0: least busy to most busy. To obtain this
number for a specific location, the following steps are performed:
1. Gather all location reports over a period (~3 weeks) - the time range within which to relate
busyness
2. With a rolling average, smooth the unique device counts in the reports gathered36
3. Get the minimum and maximum measurements in the smoothed array
4. To filter noise and anomalies, take the three most recent readings (15 minutes) and generate a
weighted measurement using the following formula: x̄ = 0.15 ∗ x3 + 0.25 ∗ x2 + 0.50 ∗ x1,
where x1 is the most recent measurement
5. Take the weighted measurement and map it between the min/max obtained in (3.) for a relative
measure of busyness between 0.0 – 1.0
29 See appendix 1.11
30 See appendix 1.2
31 https://docs.mongodb.com/manual/core/query-optimization/
32 See appendix 1.4
33 See appendix 1.5
34 See appendix 1.6
35 See 4.5.4
36 See appendix 1.16
As you may have noticed, two important steps stand out during processing: smoothing and weighting.
Smoothing was performed on historical data (step 2) to reduce the effect of outliers in historical data, as
inaccurate extrema could greatly impact the reliability of busyness estimation. The weighted
measurements (step 4) favour the most recent reading to maintain focus on freshness of data and
accuracy of real-time fluctuations in busyness, but also includes two previous readings.
These two steps help smooth any anomalous readings that may be unrepresentative of the
location both in real-time and historical data. Both the averaged historical data and the unweighted
busyness measurement are still stored and can be requested from the API, with both the weighted and
unweighted busyness levels made available to users on the UI37.
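Pulling the five steps together, a compact Python sketch of the calculation follows (my own illustration; the rolling-average window and the clamping to 0.0 – 1.0 are assumptions about details not specified above):

    def relative_busyness(history, window=3):
        # history: chronological unique-device counts for one location (~3 weeks),
        # ending with the most recent reading; returns a value in 0.0 - 1.0.
        smoothed = [sum(history[max(0, i - window + 1):i + 1]) /
                    len(history[max(0, i - window + 1):i + 1])
                    for i in range(len(history))]                      # step 2
        lo, hi = min(smoothed), max(smoothed)                          # step 3
        x1, x2, x3 = history[-1], history[-2], history[-3]
        weighted = 0.50 * x1 + 0.25 * x2 + 0.15 * x3                   # step 4
        if hi == lo:
            return 0.0
        return min(1.0, max(0.0, (weighted - lo) / (hi - lo)))         # step 5

    print(round(relative_busyness([5, 8, 12, 20, 35, 40, 38, 30, 26, 24]), 2))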
5.2.4 Data Visualisation & Representation
The visualisation platform comprises two parts: a live interactive map; and a dashboard containing
analytics, the interactive map in a smaller format, and general information on the project.
A webpage was chosen as the platform, as it is universally accessible from desktops and mobiles and
required the minimum amount of ramp-up in terms of implementation. JavaScript frameworks for the
live map (Mapbox) and analytics/graphs (ChartJS) are used to create and display visualisations, with
the webpage layout being implemented without frameworks.
Prioritising usability and performance, the dashboard features a responsive design to ensure compatibility
with devices of all screen sizes, and care was taken to optimise API requests and data processing38.
Continuous testing throughout development of the UI through user feedback and my own personal real-
world usage of the dashboard on desktop and mobile heavily guided design decisions, such as the
placement of buttons and explanations of certain tools. I also followed best practices and optimisations
suggested by Google Chrome’s Lighthouse tool39.
Representing Busyness
An important role of the visualisation process is to decide on an intelligible representation of the
system’s measurements.
With regards to the live map, I decided to use a heatmap to display two key pieces of
information: the current measure of busyness as calculated in 5.2.3, and how the absolute number of
devices compares between each location. A heatmap provides a quick way to visualise and compare data
without the need for the user to get lost in the numbers. Importantly, however, the heatmap is not a
replacement for the data itself, and all measurements which drive the heatmap are available on the UI
also. The visualisation40, achieved with Mapbox, represents the busyness at that location through the intensity/colour of the heatmap and the absolute number of devices detected through the radius. This two-dimensional representation allows us to make statements such as: “Location X is busier than Location Y, but Y has far more devices”. Such a measurement might indicate that Location Y is physically larger than Location X, as for a location to have more devices but lower busyness suggests potentially a higher
37 See appendix 1.17
38 See 4.5.5
39 https://developers.google.com/web/tools/lighthouse
40 See appendix 1.11
maximum occupancy, or that the demographic which frequents Location Y has a greatly skewed device-to-person ratio. While this project does not attempt to interpret this type of data, presenting it to the user allows them to form their own analyses beyond the scope of the project.
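As an illustration of how these two values can drive the heatmap, the sketch below configures a Mapbox GL JS heatmap layer. The source name 'locations' and the busyness/devices feature properties are assumptions made for the example, and the radius stops are arbitrary; it is a sketch of the approach rather than the project's exact code.

import mapboxgl from 'mapbox-gl';

// Illustrative sketch: assumes a GeoJSON source named 'locations' whose point
// features carry `busyness` (0.0 - 1.0) and `devices` (absolute count) properties.
function addBusynessHeatmap(map: mapboxgl.Map): void {
  const layer: mapboxgl.AnyLayer = {
    id: 'busyness-heat',
    type: 'heatmap',
    source: 'locations',
    paint: {
      // Intensity/colour of the heatmap follows the relative busyness measure
      'heatmap-weight': ['get', 'busyness'],
      // Radius follows the absolute number of devices detected (stops are arbitrary)
      'heatmap-radius': ['interpolate', ['linear'], ['get', 'devices'], 0, 10, 100, 60],
    },
  };
  map.addLayer(layer);
}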
There are many ways to represent a measurement of busyness: a simple bar; a rating-out-of-x system; word categories; the number by itself; and more. With this project, I decided to use a categorical rating (word) to provide insight on busyness in an area, the impact of which I’ll discuss in the evaluation. I initially divided the 0.0 – 1.0 scale linearly in 0.2 increments. During testing, I found this caused large discrepancies between the system’s categorisation of an area and how I personally perceived it.
Following this, I changed the size of the increments to respect the intensity of extrema: 0.15 for each extremum, 0.2 for the closest categories, and a larger 0.3 interval at the centre. Skewing the
intervals rather than categorising linearly was the right direction for initial testing and provided a
reasonable estimation. I later discovered that human perception of busyness is much more complicated
than initially hoped, and the category intervals would be adjusted following surveys. This is detailed in
4.4 and discussed & evaluated in 5.1.1.
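A minimal sketch of this skewed categorisation, using the interval sizes just described and the five category labels that appear on the dashboard (see Appendix 1.22), might look as follows; it is an illustration rather than the project's exact code.

// Sketch of the skewed categorisation: 0.15 at each extreme, 0.2 for the
// neighbouring categories, and a wider 0.3 interval at the centre.
const CATEGORIES: { label: string; upper: number }[] = [
  { label: 'Very Quiet', upper: 0.15 },
  { label: 'Quiet',      upper: 0.35 },
  { label: 'Lively',     upper: 0.65 },
  { label: 'Busy',       upper: 0.85 },
  { label: 'Very Busy',  upper: 1.0 },
];

function categoriseBusyness(busyness: number): string {
  // Clamp to the 0.0-1.0 scale before looking up the category
  const value = Math.min(1, Math.max(0, busyness));
  const match = CATEGORIES.find((c) => value <= c.upper);
  return match ? match.label : CATEGORIES[CATEGORIES.length - 1].label;
}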
Finally, I feel it is important to mention any inspiration for the visualisations. Abedi et al.’s research
directly inspired some of the visualisations employed on my dashboard, such as their stacked area chart
[6, Fig. 5], and the busyness comparison chart41 was both practically and visually inspired by Google’s location busyness indicators42.
6 EVALUATION & DISCUSSION
Due to length constraints, please refer to Appendix 1.23 for the full end-to-end verification of system correctness.
6.1 SOFTWARE VERIFICATION & SYSTEM CORRECTNESS
6.1.1 Sensor Components
Verifying software which interacts with and depends on hardware is difficult. Due to this coupling, I
found it challenging to design formal tests around the scanning script used on the Raspberry Pi. Instead,
I added launch arguments to allow me to run the script in a ‘test’ mode on non-Linux devices and devices
without a wireless chipset so that I could verify the main data flow during development e.g. the data sync
process and report generation. Additionally, any key script parameters are stored in a config.json43 and synced remotely from the server at runtime; this allowed scan parameters to be tested and updated during deployment phases.
6.1.2 Server Components
Schemas on the server side44 are used to validate the data (reports) generated by the sensors to ensure data integrity. If any report fails validation, an HTTP 400 status code (Bad Request) is returned by the server. In this case, the sensor assumes the report is corrupt and moves it to a separate folder for reports that cannot be submitted to the server. I manually check this folder from time to time and have found some
rejected reports in the past which were corrupt due to a hardware error causing invalid data to be logged
in the report output.
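A simplified sketch of this validation step is shown below, assuming an Express route and the Mongoose-backed ReportSchema model used in the endpoint tests (Appendix 1.2); the import path is illustrative and error handling is trimmed for brevity.

import express from 'express';
import { ReportSchema } from './models/report'; // path assumed for illustration

const app = express();
app.use(express.json());

// Sensors POST their generated reports here; invalid reports receive HTTP 400.
app.post('/report', async (req, res) => {
  try {
    // Mongoose validates the report against the schema before persisting it
    const report = new ReportSchema(req.body);
    await report.save();
    res.status(200).send();
  } catch (err) {
    // Validation failure: the sensor treats the report as corrupt and sets it aside
    res.status(400).send();
  }
});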
41 See appendix 1.5
42 See appendix 1.13
43 See appendix 1.15
44 See appendix 1.11
To validate server behaviour, formal API testing was developed for some of the endpoints like
POST and GET45 report. These tests validate the interaction between the sensors and the API, and the
API and the database e.g. reports can be submitted and retrieved, incorrect reports produce the correct
errors, etc. TypeScript [21] was used from the outset on the server side to ensure I was writing safer
code. The endpoint tests were run before each deployment to ensure no core functionality had been
broken by any updates, and report data was frequently checked by hand for sanity. All endpoints were manually tested throughout the project using Postman. The server runs in a Docker container managed with Docker Compose [22] to ensure it stays alive.
6.1.3 User Interface Components
During development, I continually tested the front-end platform on all devices and evaluated its
performance using Google Chrome’s Lighthouse Tool46. These tests verified that the user interface was
functional and responsive (both in performance and responsive design terms) on smaller screens such
as mobile devices, as well as much larger content display screens. The UX/UI was additionally validated
through user testing and feedback to ensure clear and concise design, without sacrificing functionality.
Many informal user testing trials were performed to evaluate the effectiveness of the dashboard layout,
readability, and presentation of visualised data. The effectiveness of the presentation and estimation is
further discussed in 5.2.2.
6.2 SYSTEM EFFECTIVENESS & DISCUSSION
To evaluate the system’s effectiveness, I surveyed both my own and others’ impressions of busyness in the monitored areas to determine accuracy. I also spoke to the Events Officer of the Maynooth Students’ Union for feedback on the project’s deployment in the SU Bar to identify potential applications and improvements. During deployment, I checked monitored areas in person to see if the data being displayed on the dashboard represented the real world accurately47; this also serves as a test of the system’s correctness and data flow. At the time of observation, busyness was tightly coupled with the idea of relative occupancy, so my evaluations were positive. I believe that with the understanding gained during this project I would now evaluate the accuracy differently, the reasons for which follow in 5.2.2.
6.2.1 User Feedback MSU Events Officer
A sensor was deployed in the Students Union bar as a trial run of the project in an uncontrolled
environment. Additionally, this gave the opportunity to evaluate the system as a product. Speaking to
the MSU Events Officer (henceforth User) after a few weeks of deployment, I was able to identify several
uses for the project, as well as some improvements which could be made going forward.
Overall, the User found the data produced to be useful in resource management/allocation, and for use in data-driven discussions. In particular, the User would like to have more areas monitored, such
as meeting rooms. It was suggested that this system could integrate with existing resource management
systems such as their room booking tool to identify rooms that have been booked but unused, or rooms
which are incorrectly allocated e.g. large rooms being mostly used for small meetings. The User
highlighted the system’s potential use in evaluating the effectiveness of campaigns by measuring
numbers gathering at campaign events. This would extend the system’s use beyond measuring busyness.
The User expressed a general interest in assisting with the development and trialling of the
project as a product and made a few suggestions in this regard. The visualisations on the front end are
limited, and a finer-grained breakdown of the data or added interactivity for power-users would benefit
45 See appendix 1.2
46 See appendix 1.20
47 See appendix 1.22
the User greatly. Personally, I feel this was a result of having a single dashboard for both end-users
(management and students). Developing for both, I needed to be careful in keeping the interface
informative yet accessible to both power and casual users. An additional suggestion was to have a second metric or form of reporting, particularly when data anomalies occur. The User suggested that a photograph be taken during high-busyness periods to help them understand where the anomalous data was coming from. The User also made an important point regarding this second report,
in that it acts to build trust with users, providing a justification for the data. Of course, taking pictures
would open an entirely new discussion on privacy concerns, but the greater point of building trust
between the system and its users is important to note.
In summary, the User was happy with the trial and is interested in future iterations of the system.
6.2.2 Survey of Estimation Accuracy & Discussion of Busyness Perception
I created an additional webpage, similar to the busyness poll, to test current readings from the system against
people’s impressions of busyness in each location. Each participant was presented with a page asking
them to rate the current busyness in the area using a slider. The test was run for a short time before a
pattern emerged.
The data I collected48 was unexpected and indicated that how we perceive busyness is more complicated than first thought. While my estimations aligned with the ratings provided by the system, the participants’ ratings did not; I had become accustomed to how the system measured and represented busyness. I found that users consistently rated busyness higher than the system in the SU Bar and lower than the system in the Final Year Lab. Speaking to participants directly, I found that factors beyond just room size and occupancy impacted their perception of busyness. The most cited reason for discrepant ratings was the noise level in an area, which seems obvious in hindsight, as a silent room at maximum occupancy is going to feel a lot less busy than the same room at half occupancy with a loud noise level. I believe that additional metrics such as noise or light levels in an area would greatly improve busyness estimation.
The discrepancies recorded also allude to how non-linear our perception of busyness can be;
perceived busyness increases rapidly as the number of people grows from zero. As mentioned in 5.2.4, I
moved from a linear busyness categorisation to a more bell-curved shape. Despite basing the busyness
intervals on the poll results49, these did not seem representative of people’s perception in the real world.
I believe that that the poll may have been evaluating people’s impression of occupancy/crowding rather
than busyness, and that the intervals generated by the survey were not appropriate. While I couldn’t
collect enough data to confirm this assumption, from discussions with participants it seems the category
intervals need to be skewed towards the upper extreme:
very quiet (0 – 0.10), calm (0.10 – 0.25), comfortable (0.25 – 0.45), bustling (0.45 – 0.75), hectic (0.75 – 1.00)
I speculate that participants lacked contextual clues to rate the implication of each word, as they
were not physically present in an environment representative of each surveyed level of busyness. As a
result, participants may have had to fall back on their understanding of busyness, potentially driven by
an imagination of occupancy. If this is the case, then my system will be inaccurate for the same reason: it is estimating busyness with only a single metric. This goes back to my hopes in the introduction: that contextual clues to perceived busyness would be implicit in the data I collect. It seems this is not the case, though occupancy can still be a good measure, as suggested by other research on perceived crowdedness [35], but for this project the scale must be skewed. Skewing the categorisation scale might provide greater utility too, as data will be classified both in terms of what’s closer to our perception of busyness and an estimation of occupancy level, useful for students and management respectively. Biasing this
48 See appendix 1.19
49 See appendix 1.10
scale could also result in over-estimating the level of perceived busyness. As this system potentially
directly impacts the safety and well-being of students, false positives are preferred to false negatives.
6.3 EVALUATION SUMMARY & PROJECT METRICS
In total, nearly 300 commits were made to the project, contributing to ~4,000 executable lines of code.
Many commits were in blocks, consisting of a new feature and fixes for bugs introduced by that feature.
The top three languages used were Python (38.9%), TypeScript (26.1%), and JavaScript (20.9%). The
correctness of the system has been sufficiently validated. Code relating to various components was shown, the results of testing were discussed, and there is a verified flow of data from sensor to server and from server to the front-end platform. Again, please refer to Appendix 1.23 for a full system verification. The effectiveness of the system was evaluated, but not conclusively verified. Survey results and discussions with users of the system show that more work is required to represent the busyness of a location in a way that aligns more closely with human-perceived busyness, but the data required is likely already being collected.
7 CONCLUSION
Over the project’s duration, I was able to come to several significant conclusions:
1. Busyness is a valuable metric to the parties outlined in the motivation (students & management)
2. People sense busyness primarily as a feeling; some verbal descriptors struggle to convey this
3. There is a utilisable correlation between the number of devices and people in an area
4. A relative measure of busyness is only accurate when the captured data includes occupancy extrema for a location, i.e. during quieter weeks, busyness estimation will be more sensitive and skewed
5. Additional relative metrics such as ambient noise levels could improve estimation effectiveness
6. It is possible to build a low-cost, non-participatory solution to busyness estimation
7.1 LIMITATIONS OF APPROACH & THREATS TO VALIDITY
There are several limitations of the approach taken which threaten the validity of the above conclusions.
1. The use of device-person correlation only works if the ratio remains consistent in an area. For
example, if a location has an emergent device-person ratio of 2:1, then the system implicitly
treats a user with only 1 device as less impactful on the level of busyness despite being an
additional person in the location. Inconsistent ratios can result in inflation and undercounting
2. By using only a single metric for busyness, the system does not consider additional contextual
clues which may impact our human perception of busyness (such as noise)
3. My experiments & investigations into the behaviour of antennae, wireless chipsets, and signal
propagation were very limited, and any assumptions made around these behaviours and
parameters may have negatively impacted the accuracy of measurements
4. The decision to represent busyness categorically (on a word scale) may have been detrimental
to busyness estimation/presentation accuracy, as discussed in 5.2. Alternative representations
could have been a continuous form such as a bar or keeping a discrete but wordless x-out-of-y
system e.g. using stars or other symbols
5. By opting for a non-participatory approach to sensing, I could avoid potential GDPR issues. However, anonymous systems tend to be more limited in accuracy as they can’t provide an exact person count. Additionally, care must be taken when developing such systems. For example, it
was ruled that Google overstepped the mark in their data collection for Google Street View [36]
7.2 FUTURE WORK
Given the large corpus of existing people-tracking research and products, there are many directions in
which to take future work. Additionally, commercial applications [10] of similar people-tracking systems are popular in cities and transport, and particularly in shopping outlets where human behaviour can be harnessed directly to increase sales [37].
In terms of expanding the current implementation, I would like to increase the coverage of the
system to the rest of the campus buildings, the library especially. A similar project/product is Waitz [38],
a busyness monitoring tool for students and staff at USC. Personally, this would be my ideal future vision
of this project in all aspects (UI, coverage, utility, etc.). In discussions with peers, I found that the Maynooth University gym and library were the two most suggested areas to cover with busyness monitoring; these would be the next targets.
The use of people-tracking and busyness monitoring in strategic planning is highly valuable.
Being able to quickly evaluate historical busyness levels with a non-participatory measurement system
provides a data-driven argument for strategic event planners when justifying administrative decisions
(e.g. increased security, footfall improvement, etc.). I would like to expand the existing dashboard toolset
to include report generation for administration purposes, providing deeper insights into the profile of a
location such as time spent in a location, well researched by Abedi et al. [6].
Finally, I would like to improve the accuracy of busyness estimation. By spending more time researching the technical and psychological aspects of this project, such as signal propagation and human-perceived busyness, it could be possible to achieve a much more reliable measurement and representation of a location’s busyness. Additional metrics such as light and noise levels to complement the existing device-count metric would also generate more data to compare and correlate, and likely increase the accuracy of the system.
7.3 PERSONAL CLOSING COMMENTS
This project asked a question: “Can you tell how busy an area is just from the number of devices in that
area?”. As an investigation, whether the answer was yes or no, it would have been successful. However,
I found that it quickly grew from an investigation to somewhat of a passion-project, and I thoroughly
enjoyed working on all aspects of implementation, even the cryptic Linux kernel errors.
Over the course of the project I improved many skills. In terms of the non-technical aspects, the
first which comes to mind is report writing. This thesis is by far the largest and most academically
involved document I’ve written, so learning how to organise my thoughts over such a large space has
been challenging but immensely rewarding. Learning to strictly manage my deadlines relating to the
thesis has also been a valuable learning experience. Technologically, I’ve become much more familiar
with Linux and Raspberry Pis, and my understanding of wireless technologies has improved. While I
already had reasonable experience with web development, my skills with CSS, JavaScript and TypeScript
have been further developed and complemented with additional experience with Node.js and MongoDB.
As my first ‘real’ research project, it was hard not to become attached to the idea that the answer
to that original question would be “yes”. I was striving for a working product, focusing on the end-user
experience, but I also needed to validate my decisions and assumptions both theoretically and
empirically. These two ideals did not coexist peacefully, especially when the data didn’t match my
expectations. At those moments, remembering the point of the project was important: to answer that
original question. Despite data from the busyness survey suggesting that the system was lacking
important contextual clues to estimate more accurately, I believe this can be overcome with better
heuristics. Perhaps time spent in a location is an important factor, as lots of people moving through an
area could make it seem even busier (motion seems to increase perceived busyness), or there is a better
set of intervals or categories to use for busyness visualisation. So in closing, can you tell how busy an
area is just from the number of devices in that area? I would say yes, but as always, there’s more work
to be done to perfect the system within its limitations.
1 APPENDIX
The appendix is quite lengthy, so I have provided a brief overview of its contents here. Of course, the full context of each item is found in the relevant sections of the main body where it is referenced. Due to the amount of source code for the project, it has been excluded from the appendix and can instead be found in the accompanying zip file.
APPENDIX CONTENTS
1. The data format for a generic 802.11 frame, showing the MAC addresses in Address 1-3
2. An example of an automated endpoint test used on the API as part of verification (TypeScript)
3. The script used to switch the RPi wireless chipset between monitor and client mode (Python)
4. Front-end visualisation of each day’s busyness data for comparison between days & times
5. Front-end visualisation of current day’s busyness by the hour, compared to historical patterns
6. Front-end tool to allow users to query for predicted busyness levels by location and time
7. Busyness Poll slider (V1) used by participants to indicate perceived intensity of each word
8. Busyness Poll slider (V2) updated to reduce potential bias caused by colour choice in V1
9. Busyness Poll data visualised as a ridgeline plot with mean and standard deviation
10. Busyness Poll raw data unprocessed
11. JSON schema used to verify the format of reports generated by RPis and stored in MongoDB
12. Front-end live heatmap display of Campus with real-time sensor data (post-processing)
13. Front-end inspiration for historical comparison in Appx. 1.5
14. Graph of devices measured vs. people counted during experiment – correlation of 0.67
15. Example of a sensor configuration file which can be updated remotely on a per-sensor basis
16. Algorithm used to perform a simple moving average on historical busyness data (TypeScript)
17. Front-end view showing both real/weighted busyness measurements and min/max to the user
18. Pseudocode algorithm used when filtering sensor data for ‘randomised’ MACs (Python)
19. The raw data collected during system accuracy evaluation, showing System against User rating
20. Lighthouse benchmark of front-end platform showing high performance and accessibility
21. Pros and Cons evaluated during decision process of sensor hardware related to 4.1.
22. Some personal observations of system accuracy before busyness was evaluated as a sense
23. Full demonstration of end-to-end system correctness, following sensed data through all stages
24. Configuration file used for Docker deployment of both the MongoDB and Node.js containers
1.1 802.11 GENERIC MAC FRAME
https://en.wikipedia.org/wiki/802.11_Frame_Types#/media/File:802.11_frame.png (Buhadram, CC BY-SA 4.0)
1.2 SAMPLE ENDPOINT TEST
Tests the scan report retrieval endpoint used as a basis for many analyses
/*
* Test the /GET route
*/
describe('/GET_report_range', () => {
// Test Report Data
let report: any = {
...
};
it('should receive 404 if no reports found', async (done) => {
// Send request for report and store result
const api = `/report/range?device=${testDevices[0]}&start=${new Date(yesterday).toISOString()}`;
const result = await request(app).get(api).send();
// Verify empty body and correct HTTP status
expect(result.status).toEqual(404);
expect(result.body).toMatchObject({});
done();
});
it('should receive a single report for one device', async (done) => {
// Customise the test report
let insert1 = report;
insert1.summary.device = testDevices[0];
insert1.summary.time = Date.now() - 1000
// Add it to the database
await new ReportSchema(insert1).save();
// Send request for report and store result
const api = `/report/range?device=${testDevices[0]}&start=${new Date(yesterday).toISOString()}`;
const response = await request(app).get(api).send();
// Verify test report is returned from the API.
expect(response.body).toHaveProperty(testDevices[0]);
expect(response.body[testDevices[0]].length).toStrictEqual(1);
done();
});
});
1.3 SCRIPT FOR SWITCHING TO CLIENT MODE ON THE RASPBERRY PI
def set_client():
"""
Sets the Raspberry Pi into Client Mode for internet access (wlan0)
"""
print("\n---- Switching to Client ----\n")
client_interface = interface
if "mon" in client_interface:
client_interface = interface[:-3]
try:
proc.check_output(["ping", "-q", "-c", "1", "-W", "1", "google.com"])
print(f"*** Connected! ***")
except proc.CalledProcessError:
print("-> No existing connection")
print("-> Taking down wifi")
proc.call(["sudo", "airmon-ng", "stop", interface], stdout=proc.DEVNULL)
proc.call(["sudo", "ifconfig", client_interface, "down"])
print("-> Re-registering wlan driver")
proc.call(["sudo", "modprobe", "-r", "brcmfmac"])
proc.call(["sudo", "modprobe", "brcmfmac"])
print("-> Asserting mac address")
proc.call(["sudo", "ifconfig", client_interface, "down"])
proc.call(["sudo", "macchanger", client_interface, "-p"])
print("-> Starting WiFi")
proc.call(["sudo", "ifconfig", client_interface, "up"])
conn_tries = 1
max_tries = 5
while conn_tries <= 3 and get_time_till_next_5_minute() > 75:
print("-> Restarting Networking Service")
proc.call(["sudo", "systemctl", "restart", "networking"])
print(f"-> Done, waiting {conn_tries * max_tries}s to establish connection")
time.sleep(conn_tries * max_tries)
print("-> Running dhclient")
proc.call(["sudo", "dhclient", "-r"], stdout=proc.DEVNULL)
proc.call(["sudo", "dhclient", client_interface])
try:
print("-> Testing network connection")
ping_out = proc.check_output(["ping", "-I", client_interface, "www.google.com", "-c", "3"])
if "ms" not in str(ping_out):
print(f"/// Failed to connect after setting client ({conn_tries}/{max_tries}) ///\n")
conn_tries += 1
else:
print(f"*** Connected! ***")
break
except proc.CalledProcessError:
traceback.print_exc()
print(f"/// Failed to connect after setting client ({conn_tries}/{max_tries})
///\n")
conn_tries += 1
1.4 VISUALISATION TYPICAL ACTIVITY LEVELS OF A LOCATION
1.5 VISUALISATION COMPARISON BETWEEN HISTORICAL & CURRENT DATA
1.6 VISUALISATION FUTURE PREDICTIONS THROUGH USER QUERIES
1.7 BUSYNESS POLL ORIGINAL SLIDER GRADIENT
1.8 BUSYNESS POLL UPDATED SLIDER GRADIENT
1.9 BUSYNESS POLL INFOGRAPHIC OF WORD INTENSITIES
1.10 BUSYNESS POLL THE RESPONSES
word          mean    std     responses
dead          3.29    8.44    21
empty         4.95    9.64    21
very quiet    6.55    8.66    20
quiet         19.14   12.4    21
still         19.24   15.99   21
calm          22.71   8.7     21
comfortable   40.95   17.41   21
astir         52.38   17.26   16
humming       54.79   17.6    19
dynamic       60.89   11.7    19
vibrant       64.75   9.43    20
lively        65.71   10.2    21
hopping       65.79   20.39   19
busy          68.65   12.58   20
buzzing       72.05   15.07   20
bustling      78.05   13.67   19
loud          78.6    13.7    20
very busy     85.05   11.59   21
packed        87.38   10.58   21
hectic        91.26   9.58    19
1.11 REPORT FORMAT GENERATED ON SENSORS AND STORED IN DATABASE
{ summary: {
time: Number,
device: String,
randomised: Number,
unique: Number,
power_limit: Number,
measure_temp: Number,
scan_length: Number,
channel_hop_frequency: Number,
top_devices: [{manufacturer: String, count: Number }]
}, clients: [{
client: String,
association: String,
power: Number,
is_randomised: Boolean
}]}
1.12 MAPBOX DISPLAY OF BUSYNESS USING HEATMAPS
1.13 GOOGLE’S BUSYNESS INDICATOR
1.14 DETECTED DEVICE COUNT AGAINST PEOPLE IN AN AREA
1.15 CONFIG.JSON FORMAT TO ALLOW REMOTE UPDATES OF SCAN PARAMS
A sample configuration
{
"scan_length": 60,
"channel_hop_frequency": 1000,
"power_limit": -80,
"interface": "wlan0",
"endpoint": "http://weather.cs.nuim.ie/server/ashmore/a/report",
"updated": 0
}
1.16 SIMPLE MOVING AVERAGE (TYPESCRIPT) SMOOTHING BUSYNESS
/**
* Performs a simple moving average over input dataset
* https://en.wikipedia.org/wiki/Moving_average#Simple_moving_average
* @param dataset number array of input data on which to perform moving average
* @param movingRange the range over which to average data e.g. 3/5/7/... values
*/
private movingAverage(dataset: number[], movingRange: number) {
let rollingValues: number[] = [];
rollingValues.length = movingRange;
rollingValues.fill(0);
const average = [];
for (const [i, point] of dataset.entries()) {
rollingValues[i % movingRange] = point;
if (i % movingRange === 0 && i !== 0) {
average.push(rollingValues.reduce((a, b) => a + b, 0) / movingRange);
}
}
return average;
}
[Chart belonging to Appendix 1.14: “Devices Detected vs People” – device and people counts plotted over measurements taken over time]
1.17 PRESENTATION OF REAL-TIME AND WEIGHTED BUSYNESS READINGS
1.18 PSEUDOCODE FOR MAC ADDRESS RANDOMISATION FILTERING
def is_randomised(device):
if device.is_associated:
return False
if device.mac.has_manufacturer_prefix:
return False
if not device.mac.is_locally_administered:
return False
return True
1.19 DATA COLLECTED DURING SYSTEM ACCURACY POLL
Final Year Lab
System      User     Error
0.490361    0.22     -0.270361
0.418023    0.217    -0.201023
0.418023    0.237    -0.181023
0.455814    0.278    -0.177814
0.340361    0.172    -0.168361
0.286145    0.223    -0.063145
0.238166    0.18     -0.058166
0.337278    0.142    -0.195278
0.301829    0.067    -0.234829
0.221893    0.302    -0.08011
0.261628    0.177    0.084628
0.261628    0.12     0.141628

SU Bar
System      User     Error
0.519784    0.579    0.0592158
0.341945    0.511    0.1690547
0.341945    0.687    0.3450547
0.341945    0.553    0.2110547
0.363133    0.579    0.2158671
0.404272    0.565    0.1607278
0.378165    0.623    0.2448354
0.376187    0.635    0.2588133
0.368705    0.623    0.254295
0.486702    0.594    0.1072979
0.486702    0.521    0.0342979
1.20 CHROME LIGHTHOUSE PERFORMANCE AND ACCESSIBILITY AUDIT
1.21 PROS & CONS OF ESP32 AND GENERIC LINUX CHIPSET TESTING
This section refers to 4.1 in the main body. The following are the pros and cons of each choice of
device/setup in sensing.
1.21.1 Linux + Airodump-ng
Pros
- Pre-written tools to scan MAC addresses
- Works with any compatible chipset.
o Raspberry Pi with a kernel modification can run client/monitor mode
o If a more performant Wi-Fi chipset is needed, the Raspberry Pi has USB support for
external devices like the Alfa used in testing
- Can run from CLI - writes output file that other programs could read and send to server
- Easy to debug over SSH if required
- Can test toolset/scripts easily on any Unix OS
- Has simultaneous client and monitor mode network access with support for 802.1x
Cons
- Requires more power-hungry hardware, limiting deployment options
- Script is pre-written and may have limitations
- Added complexity of running an OS underneath the sensing script.
- The script must run on boot and only end when requested (the ESP32 does this by default)
1.21.2 ESP32 + Arduino/C
Pros
- Specify behaviour as desired, all code written from scratch
- Device can run on battery as it is low powered
- Very small and portable
- Boots immediately into program in memory, no need to schedule auto start for script
Cons
- Very few code examples and little documentation
- Personal unfamiliarity with C and Arduino devices. Code may not be robust or reliable
- Software and hardware performance are poor, lower than any device capable of running Linux
- Acquiring network access will require tricky redirection or certs, as there is no support for 802.1x
1.22 PERSONAL OBSERVATIONS OF BUSYNESS DURING DEPLOYMENT
This evaluation was performed at a time when it was considered that busyness ≈ occupancy, which is why there are discrepancies between my evaluation of the system and that of participants in the busyness accuracy survey.
System Classification   Personal Evaluation   Observation Notes
Very Quiet              Accurate              Most tables vacant, no people standing
Quiet                   Accurate              Many vacant tables, plentiful standing space
Lively                  Mostly accurate       Feels a little quieter than lively, almost "quiet" (readings to this classification)
Busy                    Accurate              Most tables filled, some empty, lots of standing space
Very Busy               Accurate              All tables filled, very little standing space
1.23 DEMONSTRATION OF END-TO-END SYSTEM CORRECTNESS
Due to length constraints, the full demonstration of system correctness is presented here in the appendix. This section
will demonstrate the flow of data through the system and validate its correctness. Please refer to Figure
2 in the main body for an overview of the system.
On the sensor, two files are generated. First, the output from Airodump-ng is written to a file
for parsing later. Note the highlighted MAC addresses; these will be followed through the system!
Of the three highlighted addresses, the device with RSSI -50dBm (the last one) will be filtered as it is randomised. The reason it was determined to be randomised is that its 2nd character is “A”, it wasn’t associated with a network, and its 3-byte prefix was not found in the OUI reservation list. This is discussed in 4.5.1. You might wonder why the first device was not considered randomised despite not being associated with a network and its second character being “C” (seemingly locally administered). As it happens, "5C:C5:D4" is the OUI reservation for "Intel Corporate"; this is the rule which saved it from being filtered. This is a good example of the MAC address filtering algorithm in action (Appendix 1.18).
Above is the section of the generated report that contains the highlighted addresses from
Airodump-ng’s output. As you can see, the device with -50dBm RSSI has been filtered due to
randomisation. In addition, the report includes the MAC addresses of both clients and APs in a hashed
format. This will allow comparison and filtering if needed in the future but has destroyed the original
MAC Address for privacy.
This is the summary section of the report; the stored client data is not currently used for anything, but the summary is used during data analysis and live visualisations. This report.json is now submitted to the API.
Now that the report has been submitted to the API, it will be validated by the database schema
and if valid will be stored in the database. By utilising the GET_reports_by_range endpoint, we can
find the report submitted at timestamp 1583346320601.
This is proof that the report as generated has been submitted to the server and is query-able. It can now
be used in analysis and live reporting.
Refreshing the front-end platform, we can see in the networking tab that the busyness profile
for the Final Year Lab was requested. We can see that the weighted busyness rating of 13.65 included
the latest report generated with a measurement of 9 unique devices. We can also cross-reference the id
and time of the latest report to verify that this is indeed the report that we have been following through
the system. The network information also tells us some other properties of the profile, such as the min
and max device counts, and the range analysed to obtain the min/max counts.
This concludes the journey of data from source to visualisation. The report is now saved in the
database, ready to be analysed in the future, and will contribute to the profile of the Final Year Lab for the next few weeks (depending on the analysis window).
1.24 DOCKER COMPOSE CONFIGURATION
This is the configuration used to spin up both the API and the MongoDB container on the department
server.
version: '3'
services:
mongo:
image: mongo
restart: always
command: mongod --port 5050
ports:
- "5050:5050"
volumes:
- mongodata:/data/db
environment:
MONGO_INITDB_ROOT_USERNAME: xxxxxxxxxxxxx
MONGO_INITDB_ROOT_PASSWORD: xxxxxxxxxxxxx
web:
build: ./node-api
restart: always
depends_on:
- mongo
ports:
- "5010:5010"
volumes:
mongodata:
driver: local
2 REFERENCES
[1] Maynooth University, “Maynooth at a glance | Maynooth University.” [Online]. Available:
https://www.maynoothuniversity.ie/about-us/maynooth-glance. [Accessed: 17-Mar-2020].
[2] C. Department of Education and Skills, “Education - CSO - Central Statistics Office.” [Online].
Available: https://www.cso.ie/en/releasesandpublications/ep/p-
mip/measuringirelandsprogress2017/ed/. [Accessed: 17-Mar-2020].
[3] E. Vattapparamban, “People Counting and occupancy Monitoring using WiFi Probe Requests
and Unmanned Aerial Vehicles,” FIU Electron. Theses Diss., 2016, doi: 10.25148/etd.FIDC000246.
[4] N. Abedi, A. Bhaskar, and E. Chung, “Bluetooth and Wi-Fi MAC address based crowd data
collection and monitoring: Benefits, challenges and enhancement,” in Australasian Transport
Research Forum, ATRF 2013 - Proceedings, 2013.
[5] S. B. Azmy, N. Zorba, and H. S. Hassanein, “Quality of Coverage: A Novel Approach to
Coverage for Mobile Crowd Sensing Systems,” 2018 Glob. Inf. Infrastruct. Netw. Symp. GIIS 2018, pp.
1–5, 2019, doi: 10.1109/GIIS.2018.8635769.
[6] N. Abedi, A. Bhaskar, and E. Chung, “Tracking spatio-temporal movement of human in terms of space utilization using Media-Access-Control address data,” Appl. Geogr., vol. 51, pp. 72–81, 2014, doi: 10.1016/j.apgeog.2014.04.001.
[7] A. Baum and G. E. Davis, “Spatial and social aspects of crowding perception,” Environ. Behav., vol. 8, no. 4, pp. 527–544, 1976, doi: 10.1177/001391657684003.
[8] R. Schenström and E. Hörnlund, “Indoor Location Surveillance Utilizing Wi-Fi and Bluetooth
Signals,” 2019.
[9] L. Bai, N. Ireson, S. Mazumdar, and F. Ciravegna, “Lessons learned using Wi-Fi and bluetooth as means to monitor public service usage,” in UbiComp/ISWC 2017 - Adjunct Proceedings of the 2017 ACM International Joint Conference on Pervasive and Ubiquitous Computing and Proceedings of the 2017 ACM International Symposium on Wearable Computers, 2017, pp. 432–440, doi: 10.1145/3123024.3124417.
[10] SensMax, “People counting system for shopping malls and smart buildings.” [Online].
Available: https://sensmax.eu/solutions/people-counting-system-for-shopping-malls-and-smart-
buildings/. [Accessed: 12-Feb-2020].
[11] B. Brindle, “How Does Google Maps Predict Traffic? | HowStuffWorks,” 11-Feb-2020.
[Online]. Available: https://electronics.howstuffworks.com/how-does-google-maps-predict-
traffic.htm. [Accessed: 12-Feb-2020].
[12] X. Liu, P. H. Tu, J. Rittscher, A. Perera, and N. Krahnstoever, “Detecting and counting people in surveillance applications,” IEEE Int. Conf. Adv. Video Signal Based Surveill. - Proc. AVSS 2005, vol. 2005, pp. 306–311, 2005, doi: 10.1109/AVSS.2005.1577286.
[13] M. A. K. Sagun and B. Bolat, “A novel approach for people counting and tracking from crowd video,” Proc. - 2017 IEEE Int. Conf. Innov. Intell. Syst. Appl. INISTA 2017, no. July, pp. 277–281, 2017, doi: 10.1109/INISTA.2017.8001170.
[14] J. Li, L. Huang, and C. Liu, “Robust people counting in video surveillance: Dataset and system,” 2011 8th IEEE Int. Conf. Adv. Video Signal Based Surveillance, AVSS 2011, pp. 54–59, 2011, doi: 10.1109/AVSS.2011.6027294.
[15] Y. Mao, J. Tong, and W. Xiang, “Estimation of crowd density using multi-local features and regression,” Proc. World Congr. Intell. Control Autom., pp. 6295–6300, 2010, doi: 10.1109/WCICA.2010.5554367.
[16] K. Akkaya, I. Guvenc, R. Aygun, N. Pala, and A. Kadri, “IoT-based occupancy monitoring techniques for energy-efficient smart buildings,” 2015 IEEE Wirel. Commun. Netw. Conf. Work. WCNCW 2015, pp. 58–63, 2015, doi: 10.1109/WCNCW.2015.7122529.
[17] A. Fod, A. Howard, and M. J. Matarić, “A laser-based people tracker,” Proc. - IEEE Int. Conf. Robot. Autom., vol. 3, no. May, pp. 3024–3029, 2002, doi: 10.1109/robot.2002.1013691.
[18] T. D. Räty, “Survey on contemporary remote surveillance systems for public safety,” IEEE Trans. Syst. Man Cybern. Part C Appl. Rev., vol. 40, no. 5, pp. 493–515, 2010, doi: 10.1109/TSMCC.2010.2042446.
[19] I. Guvenc, “Enhancements to RSS Based Indoor Tracking Systems Using Kalman Filters,” IEEE Pervasive Comput., no. 505, pp. 91–102, 2003.
[20] O. G. Adewumi, K. Djouani, and A. M. Kurien, “RSSI based indoor and outdoor distance estimation for localization in WSN,” Proc. IEEE Int. Conf. Ind. Technol., pp. 1534–1539, 2013, doi: 10.1109/ICIT.2013.6505900.
[21] Microsoft, “TypeScript in 5 minutes · TypeScript.” [Online]. Available:
https://www.typescriptlang.org/docs/handbook/typescript-in-5-minutes.html. [Accessed: 07-Feb-
2020].
[22] Docker Inc., “Compose file version 3 reference | Docker Documentation.” [Online]. Available:
https://docs.docker.com/compose/. [Accessed: 07-Feb-2020].
[23] Z. Liew, “Endpoint testing with Jest and Supertest | Zell Liew,” 2019. [Online]. Available:
https://zellwk.com/blog/endpoint-testing/. [Accessed: 07-Feb-2020].
[24] Ł. Podkalicki, “ESP32 - WiFi Sniffer | Łukasz Podkalicki,” 23-Jan-2017. [Online]. Available:
https://blog.podkalicki.com/esp32-wifi-sniffer/. [Accessed: 07-Feb-2020].
[25] N. Darchis, “802.11 frames: A starter guide to learn wireless sniffer traces,” 25-Oct-2010.
[Online]. Available: https://community.cisco.com/t5/wireless-mobility-documents/802-11-frames-a-
starter-guide-to-learn-wireless-sniffer-traces/ta-p/3110019. [Accessed: 12-Feb-2020].
[26] Aircrack-ng, “airodump-ng [Aircrack-ng].” [Online]. Available: https://www.aircrack-
ng.org/doku.php?id=airodump-ng. [Accessed: 07-Feb-2020].
[27] K. Kinzie, “How To Enable Monitor Mode & Packet Injection on the Raspberry Pi,” 15-Dec-
2018. [Online]. Available: https://null-byte.wonderhowto.com/how-to/enable-monitor-mode-packet-
injection-raspberry-pi-0189378/. [Accessed: 07-Feb-2020].
[28] Johannes, “MAC Address Randomization on iOS,” 18-Feb-2019. [Online]. Available:
https://www.turais.de/mac-address-randomization-on-ios-
12/?fbclid=IwAR0B24Ktyfrzm2wzc6XnvUok-oZIN19pxMLpqANxvRFTVyNZF1TrPq17LhU.
[Accessed: 07-Feb-2020].
[29] I. T. Standards et al., “Standard Group MAC Addresses: A Tutorial Guide,” vol. 10039, no. Llc,
pp. 14.
[30] Privacy Company, “What does the GDPR say about WiFi tracking?,” 2019. [Online]. Available:
https://www.privacycompany.eu/blogpost-en/what-does-the-gdpr-say-about-wifi-tracking.
[Accessed: 03-Feb-2020].
[31] M. B. W. Kobus, P. Rietveld, and J. N. Van Ommeren, “Ownership versus on-campus use of mobile IT devices by university students,” Comput. Educ., vol. 68, pp. 29–41, 2013, doi: 10.1016/j.compedu.2013.04.003.
[32] T. Mitchell, S. Madgwick, S. Rankine, G. Hilton, A. Freed, and A. Nix, “Making the Most of Wi-
Fi: Optimisations for Robust Wireless Live Music Performance,” Proc. Int. Conf. New Interfaces
Music. Expr., 2014.
[33] M. Smith, “How good is ‘good’? | YouGov,” 11-Oct-2018. [Online]. Available:
https://today.yougov.com/topics/lifestyle/articles-reports/2018/10/11/how-good-good. [Accessed:
09-Feb-2020].
[34] J. Martin et al., “A Study of MAC Address Randomization in Mobile Devices and When it
Fails,” 2017.
[35] T. N. Westover and J. R. Collins, “Perceived crowding in recreation settings: An urban case study,” Leis. Sci., vol. 9, no. 2, pp. 87–99, 1987, doi: 10.1080/01490408709512149.
[36] C. Duffy, “Google privacy lawsuit: Tech giant to pay $13 million over Street View data
collection - CNN,” 25-Jul-2019. [Online]. Available:
https://edition.cnn.com/2019/07/22/tech/google-street-view-privacy-lawsuit-settlement/index.html.
[Accessed: 06-Mar-2020].
[37] D. Oosterlinck, D. F. Benoit, P. Baecke, and N. Van de Weghe, “Bluetooth tracking of humans
in an indoor environment: An application to shopping mall visits,” Appl. Geogr., 2017, doi:
10.1016/j.apgeog.2016.11.005.
[38] Waitz, “Waitz.” [Online]. Available: https://waitz.io/index.html. [Accessed: 02-Mar-2020].
This study investigated ownership and on-campus use of laptops, tablets, and smartphones, using survey information on Dutch university students. We show that 96% of students own at least one of these mobile IT devices (i.e., a laptop, tablet, or smartphone). Using econometric modelling, we also show that student income, parental income, gender, immigrant parents, and household type (e.g., living with parents) have a statistically significant but small effect on mobile IT device ownership. The demand for tablets is relatively income inelastic, and the demand for laptops and smartphones extremely so. Therefore ownership rates are high for all student groups, including lower income students. However, students leave their laptops (and tablets) at home most of the time, mainly because they find it cumbersome to carry a laptop, and the vast majority of students hold the opinion that abolishing computer labs while facilitating laptop use is a bad idea, despite the didactical advantages this may have during lectures. Thus, it appears that the current high ownership rates of mobile IT devices by no means imply students' preference or support for university Bring Your Own Device (BYOD) strategies.