
GameVibes: Vibration-based Crowd Monitoring for Sports Games through Audience-Game-Facility Association Modeling

Authors:
Yiwen Dong
ywdong@stanford.edu
Stanford University
Stanford, California, USA
Yuyan Wu
Stanford University
Stanford, USA
Jesse R Codling
University of Michigan
Ann Arbor, USA
Jatin Aggarwal
Stanford University
Stanford, USA
Peide Huang
Carnegie Mellon University
Pittsburgh, USA
Wenhao Ding
Carnegie Mellon University
Pittsburgh, USA
Hugo Latapie
Cisco Systems, Inc.
San Jose, USA
Pei Zhang
University of Michigan
Ann Arbor, USA
Hae Young Noh
Stanford University
Stanford, USA
Figure 1: Crowd reactions during an NCAA Pac-12 Basketball Game at Stanford Maples Pavilion.
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for components of this work owned by others than the
author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or
republish, to post on servers or to redistribute to lists, requires prior specific permission
and/or a fee. Request permissions from permissions@acm.org.
BuildSys ’23, November 15–16, 2023, Istanbul, Turkey
©2023 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 979-8-4007-0230-3/23/11...$15.00
https://doi.org/10.1145/3600100.3623750
ABSTRACT
Crowd monitoring involves tracking and analyzing the behavior of
large groups of people in large-scale public spaces, such as sports
games. In sports stadiums, understanding audience reactions to the
games and their distribution around the public facilities is important
for ensuring public safety and security, enhancing the game
experience, and improving crowd management. Recent crowd-crushing
incidents (e.g., Kanjuruhan Stadium disaster, Seoul Halloween
Stampede) have caused 100+ deaths in a single event, calling for
advancements in crowd monitoring methods. Existing monitoring
approaches include manual observation, wearables, video-, audio-,
and WiFi-based sensing. However, few meet the practical needs
due to their limitations in cost, privacy protection, and accuracy.
In this paper, we introduce GameVibes, a novel method for crowd
behavior monitoring using crowd-induced floor vibrations to infer
audience reactions to the game (e.g., clapping, stomping, dancing)
and crowd traffic (i.e., the number of people entering each door).
The main benefits of GameVibes are that it allows continuous,
fine-grained crowd monitoring in a cost-effective and non-intrusive way
and is perceived as more privacy-friendly. Unlike monitoring an
individual person, crowd monitoring involves understanding the
overall behavior of a large population (typically more than 1,000),
leading to high uncertainty in the vibration data. To overcome the
challenge, we first establish the game and facility association to
inform the context of crowd behaviors, including 1) game
associations (temporal context) between the crowd reaction and the game
progress and 2) facility associations (spatial context) between the
crowd traffic and facility layouts. Then, we formulate the crowd
monitoring problem by converting the conceptual graph of the
audience-game-facility association into probabilistic game/facility
association models. Through these models, GameVibes first learns
the latent representations of the game progress and facility layout
through neural network encoders, and then integrates heterogeneous
game/facility information and vibration data to estimate
crowd behaviors. This mitigates the estimation error due to the
uncertainty in vibration data. To evaluate our approach, we conduct
6 real-world deployments for NCAA Pac-12 games at Stanford
Maples Pavilion. Our results show that GameVibes achieves a 0.9
F-1 score in crowd reaction monitoring and 9.3 mean absolute error
in crowd traffic estimation, which correspond to 10% and 12.2%
error reduction, respectively, compared to the baseline methods
without context-specific information.
KEYWORDS
crowd behavior, floor vibration, context, association, sports game
ACM Reference Format:
Yiwen Dong, Yuyan Wu, Jesse R Codling, Jatin Aggarwal, Peide Huang,
Wenhao Ding, Hugo Latapie, Pei Zhang, and Hae Young Noh. 2023. GameVibes:
Vibration-based Crowd Monitoring for Sports Games through
Audience-Game-Facility Association Modeling. In The 10th ACM International
Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation
(BuildSys ’23), November 15–16, 2023, Istanbul, Turkey. ACM, New York, NY,
USA, 12 pages. https://doi.org/10.1145/3600100.3623750
1 INTRODUCTION
Crowd monitoring is the process of tracking and analyzing the
behavior of large groups of people in large-scale public spaces,
such as sports games and shopping malls [24]. In particular, crowd
monitoring in sports stadiums is a critical component in ensuring
public safety and security [37], enhancing the game experience for
the audience [13], and optimizing resource allocation at the
stadiums [21, 28]. Over the years, the mismanagement of crowds has
led to grave consequences. For example, the Kanjuruhan Stadium
disaster and the Seoul Halloween Stampede caused 135 and
159 deaths, respectively [33, 36]. Nevertheless, studies have found
that such incidents can be prevented by proper crowd monitoring
and timely crowd control [8]. By analyzing the reactions and traffic
of the crowd, we can detect and prevent potential threats, such as
riots, stampedes, violence, or terrorist attacks [18]. Moreover, we
can also gain insights into the social and psychological aspects of
crowd behavior, such as emotions and motivations, to understand
the fundamental cause behind such behaviors [38].
While many existing approaches have been developed to monitor crowd
behaviors, few meet the practical needs due to their limitations in
cost, privacy protection, and accuracy [1, 4, 15, 19, 23]. For example,
manual monitoring is the most common approach [22], but it
is labor-intensive, costly, and can be significantly delayed due to
negligence in manual observations. Automated sensing based on video
and audio data suffers from privacy issues because it records the
appearance and voices of the public [5, 25]. On the other hand, WiFi-
and radio frequency-based devices are used to capture the body
motion of individuals [17, 35], but they have difficulty in capturing
the activity among a large group of people due to noise interference
and between-person differences, producing inaccurate results.
In this paper, we introduce GameVibes, a novel system for crowd
monitoring using vibration sensors mounted underneath or on the
floor surfaces. GameVibes captures crowd-induced vibrations to
infer crowd reactions to the game (e.g., clapping, stomping, dancing)
and crowd traffic (i.e., the number of people entering each door).
The primary intuition behind GameVibes is that various types of
crowd behaviors and levels of crowd traffic induce distinct vibration
patterns of the floor, allowing us to characterize and distinguish
crowd behaviors. The main benefits of using floor vibration to infer
human behaviors are that it is cost-efficient, allows continuous,
fine-grained crowd behavior monitoring, and is perceived as more
privacy-friendly than cameras or audio recordings. This sensing
approach has been explored in many existing applications, such as
occupant detection [26, 31], identification [10, 30], activity
recognition [3, 29], localization [27], and health monitoring [11, 12, 20].
However, crowd monitoring is not a trivial problem because it
involves a large group of people (typically more than 1,000), which
causes high uncertainty in crowd-induced vibration data. Unlike
monitoring an individual person, crowd monitoring involves
understanding the overall behavior of a large population whose
behaviors are far from uniform. This is particularly
reflected in two aspects: 1) During the game, floor vibration
induced by crowd reaction is uncertain due to the difference among
individuals and the proportion of people reacting; 2) Before/after
the game, floor vibration induced by crowd traffic is uncertain due
to the large range of possible numbers of pedestrians. As a result,
estimating crowd behaviors may lead to much larger errors than
monitoring an individual person.
To overcome the challenges, we leverage the audience-game-facility
associations, which bridge the context of the game/facility
with crowd reactions and traffic. Specifically, GameVibes establishes
1) game associations (temporal context) between the crowd reaction
and the game progress, such as clapping after the home team’s
goals or stomping to disturb the opponents’ free throws, and 2)
facility associations (spatial context) between the crowd traffic and
facility layouts, such as crowds accumulating around the entry doors
near the food stands. With the established associations, we
formulate the crowd monitoring problem by converting the conceptual
graph of the audience-game-facility association into probabilistic
game/facility association models. Through these models, GameVibes
first learns the latent representations of the game progress and
facility layout through neural network encoders, and then merges
the heterogeneous game/facility information and vibration data
to estimate the crowd behaviors. With the audience-game-facility
association modeling, GameVibes mitigates the estimation error due
to the uncertainty in the vibration data, leading to more accurate
and interpretable crowd monitoring.
The key contributions of this paper are:
• We introduce GameVibes, the first floor-vibration-based crowd
  monitoring system that continuously monitors crowd reactions
  during sports games and estimates crowd traffic over
  various entry locations.
• We characterize the game- and facility-dependent floor
  vibrations induced by the crowd behaviors to develop game
  and facility association models, which provide temporal and
  spatial contexts to the vibration data to allow more accurate
  and interpretable crowd monitoring.
• We evaluate the GameVibes system through 6 real-world
  deployments for NCAA sports games at Stanford Maples
  Pavilion, which validates its effectiveness and robustness
  under various scenarios.
For the rest of this paper, we first characterize crowd-induced
vibrations with game and facility contexts to formulate the
audience-game-facility association model (Section 2), then introduce the
components of the GameVibes system (Section 3). Next, we present
the real-world evaluation and discuss the results (Section 4). After
summarizing the related work (Section 5), we conclude the study
and present the future work (Section 6).
2 CHARACTERIZING CROWD-INDUCED
VIBRATION WITH GAME AND FACILITY
CONTEXTS
In this section, we characterize the relationship among crowd
behaviors, vibration data, and game/facility contexts to develop an
audience-game-facility association model for crowd reaction and
traffic estimation. Specifically, we first discuss how crowd
reaction is affected by game progress and reflected in floor vibration
of the bleachers (Section 2.1), and then analyze how crowd
traffic is influenced by facility layout and captured by floor vibration
(Section 2.2). Then, we establish game and facility associations to
develop probabilistic graphical models that merge heterogeneous
information (game/facility information and vibration data) to make
context-aware estimations of crowd behaviors.
2.1 Crowd Reaction Characterization through
Floor Vibration and Game Progress
To understand how crowd reaction is affected by game progress
and reflected in floor vibration, we first characterize the vibration
signals with respect to crowd reaction to validate the
vibration-based approach for crowd reaction monitoring. Then, we analyze
the relationship between the game progress and the crowd reactions
in order to leverage this relationship as a temporal context to the
vibration data in Section 2.3.
2.1.1 Relationship between Crowd Reaction and Floor Vibration.
We characterize the floor vibrations induced by various types of
crowd reactions, including 1) quiet (no body motion), 2) active
(sitting with upper or lower body movements such as clapping and
foot shuffling), and 3) moving (standing/walking with lower body
movements such as stomping and dancing), as shown in Figure 2.
The vibration induced by the quiet reaction has a low signal
amplitude and noise-like oscillations around the mean. In contrast, the
vibration induced by moving (i.e., stomping) has large amplitudes,
characterized by separated impulses each representing a heavy
step. Other active reactions such as clapping induce floor vibration
indirectly through the bleacher seats, so the signal has a lower
amplitude than moving with a unique frequency representing the
physical properties of the seat-floor connection.
2.1.2 Relationship between Crowd Reaction and Game Progress.
With the relationship between floor vibration and crowd reactions
established, we further analyze how audience reaction changes as the game
progresses. Figure 3 shows the distribution of crowd reaction types
associated with various game events, including home goal,
opponent goal, and game break. We observe that the crowd is mainly
clapping (i.e., active) after the home goal, while remaining quiet
after the opponent’s goal, except for a few opponents’ fans. The
moving (mainly stomping) reaction observed at the opponent’s
goal is mainly caused by noise making to distract the opponent
during the defense, showing support for our home team. Crowd
reactions during the game break are more diverse as people may
choose to take a break by leaving the seating area or stay to enjoy
the entertainment on-court such as kids’ mini-game, T-shirt toss,
and dance cams.
Figure 2: Characterization of vibration signals induced by
crowd reactions, including quiet, active (clapping), and moving
(stomping) (from top to bottom). Both time- and wavelet-domain
plots show clear distinctions among various crowd reaction types.
Figure 3: Crowd reaction distribution (active, moving, quiet) varies
across various events (home goal, opponent goal, game break) at a
sample game. Active (clapping) dominates the home goal event, while
moving (stomping) and quiet dominate the opponent goal event. The
reactions are more evenly distributed during the game break.
2.2 Crowd Traffic Characterization through
Floor Vibration and Facility Layout
To validate the vibration-based approach for crowd traffic
estimation, we first characterize the relationship between floor vibration
and crowd traffic. Then, we analyze how facility layout affects
crowd traffic in order to leverage this influence as a spatial context
to the vibration data in Section 2.3.
Figure 4: Floor vibrations (signal in V over time, Feb 20, 2023)
captured by a sample sensor near an entry door. The impulsive peaks
induced by walking and opening or closing the door are detected by a
peak-picking algorithm, which correlates with the level of crowd traffic.
Figure 5: The difference in facility layout around a) door 1
and b) door 5 leads to distinct crowd traffic as shown in c)
and d), which plot the number of people entering each door and the
number of detected peaks in 10-minute intervals from 5:10 to 5:50 pm.
The crowd traffic is correlated with the number of
peaks detected in the vibration signals.
2.2.1 Relationship between Crowd Traffic and Floor Vibration. The
number of audience members entering through each door can be inferred
from the floor vibration signals captured by sensors deployed on
the floor’s surface beside each door. The physical intuition is that
the movements of the audience passing through the door, including
walking and opening or closing the door, generate vibration in the
floor structures. These vibrations are then detected and recorded by
sensors mounted on the floor surface (refer to Figure 4). We notice
that there is a positive correlation between the number of peaks
and the number of people entering the doors (see Figure 5). This
indicates that the frequency of audience movements (e.g., walking, door
opening/closing) correlates with the number of people entering
through each door during a time interval. For example, more
footsteps and door-opening events will be observed when the crowd
traffic is of higher volume. Therefore, detecting impulsive peaks
in the recorded vibration signals induced by audience movements
allows crowd traffic estimation at each door.
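As a rough illustration of the peak-counting idea, the sketch below counts impulsive peaks in a window of samples. The paper does not specify its peak-picking algorithm, so the function name `count_impulsive_peaks`, the amplitude `threshold`, and the minimum spacing `min_gap` are all assumptions made for this example.

```python
def count_impulsive_peaks(signal, threshold, min_gap):
    """Return indices of impulsive peaks in `signal`.

    Hypothetical peak picker (not the paper's algorithm): a sample is a
    peak if its deviation from the signal mean is a local maximum and is
    at least `threshold`. Peaks closer than `min_gap` samples are merged,
    keeping the larger one.
    """
    mean = sum(signal) / len(signal)
    envelope = [abs(v - mean) for v in signal]
    peaks = []
    for i in range(1, len(envelope) - 1):
        is_local_max = envelope[i] > envelope[i - 1] and envelope[i] >= envelope[i + 1]
        if not (is_local_max and envelope[i] >= threshold):
            continue
        if peaks and i - peaks[-1] < min_gap:
            # Too close to the previous peak: keep whichever is larger.
            if envelope[i] > envelope[peaks[-1]]:
                peaks[-1] = i
        else:
            peaks.append(i)
    return peaks


# Toy signal: flat baseline with two impulses (one positive, one negative).
toy_signal = [0.0] * 20
toy_signal[5] = 1.0
toy_signal[6] = 0.4
toy_signal[15] = -0.8
toy_peaks = count_impulsive_peaks(toy_signal, threshold=0.5, min_gap=3)
```

The number of returned indices per time interval would then serve as the peak count that is compared against the number of people entering the door.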
2.2.2 Relationship between Crowd Traffic and Facility Layout. The
location and distribution of essential facilities such as food stations,
restrooms, and game swag stations play a crucial role in determining crowd
traffic. For example, Figure 5 shows the facility layout around two
sample doors at our evaluation site, obtained from the stadium map
provided by the venue operator. Door 1 is surrounded by two food
stands, a restroom, and a game swag station, which attracts a larger
audience. In contrast, door 5 has fewer facilities around, which
leads to a lower level of crowd traffic. As we compare Figure 5c)
and d), however, we observe that the ratio of detected peaks to
the actual crowd traffic differs across these two doors. This means
that only knowing the number of peaks in the vibration signals
is not sufficient to estimate the number of people. Therefore, we
leverage the distinct layout of facilities surrounding door 1 and door
5 to enhance the accuracy of our crowd traffic estimation through
facility associations, which is introduced in Section 2.3.
Figure 6: Game association examples between crowd-induced
vibrations and the game progress. Each game event such as
the opponent goal, home team goal, and game break start is
matched with active vibration windows over time.
2.3 Formulation of the Audience-Game-Facility
Association Models
To incorporate the influence of game progress and facility layout on
crowd behaviors, we establish game/facility associations to provide
temporal and spatial contexts to the vibration data. Specifically, we
establish 1) the game association between crowd reaction and the
game progress to provide a temporal context, and 2) the facility
association between the crowd traffic and facility layout to provide a
spatial context. With these contexts, we formulate the game/facility
association models to allow more accurate and interpretable
estimation of crowd behaviors.
2.3.1 Establishing Game Associations. The game association is
defined as the relationship between crowd-induced floor vibrations
and the game progress through their occurrences in time sequence.
For example, if the crowd reacts with a round of applause after
a home team’s goal, there is an association based on the time
sequence such that “clapping” occurs concurrently or right after the
“goal” event.
We establish the game associations based on the time sequence of
occurrences between events during the games and crowd-induced
vibration signals. In the previous example, we capture the unique
floor vibration signals induced by the “clapping” motion to establish
an association with the “goal” event, which provides a context of
the game to the vibration data recorded at that time. The game
event types we focus on are mainly the score changes and game
time divisions (i.e., playing periods and game breaks), which are
easily accessible from the stadium operation team. The vibration
signals are divided into a series of 1-second windows for discrete
association. Figure 6 shows a snapshot of the game associations
between game progress and vibration data. A series of
continuous active signal windows (black dots) are matched with various
events (colored dots) in the games. However, not all windows are
matched because these vibrations may be induced by unrecorded
game events, such as extraordinary blocking and passing moments,
or by individuals who are sitting near the sensors. In these cases,
we estimate the crowd reaction through vibration data only.
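The time-sequence matching described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `associate_windows_with_events`, the event labels, and the `max_lag_s` tolerance for "right after" are hypothetical, since the text does not give a matching rule beyond concurrent-or-right-after.

```python
def associate_windows_with_events(window_starts, events, max_lag_s=5.0, win_s=1.0):
    """Match active vibration windows to game events by time order.

    window_starts: start times (seconds) of active 1-second windows.
    events: (time_s, label) pairs taken from the game record.
    A window is associated with the most recent event that occurred
    before the window ends and at most `max_lag_s` seconds before the
    window starts (an assumed tolerance). Unmatched windows get None,
    in which case only vibration data would be used.
    """
    ordered_events = sorted(events)
    associations = []
    for start in window_starts:
        label = None
        for event_time, event_label in ordered_events:
            concurrent_or_earlier = event_time < start + win_s
            recent_enough = start - event_time <= max_lag_s
            if concurrent_or_earlier and recent_enough:
                label = event_label  # later events overwrite earlier ones
        associations.append((start, label))
    return associations


# Toy game record and active windows (times in seconds).
toy_events = [(10.0, "home_goal"), (40.0, "opponent_goal")]
toy_windows = [9.5, 12.0, 20.0, 41.0]
toy_associations = associate_windows_with_events(toy_windows, toy_events)
```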
2.3.2 Establishing Facility Association. The facility association refers
to the relationship between the vibration data recorded at entry
doors and the facilities around those doors based on spatial distance.
For example, if a sensor is placed at an entry door with a food stand
nearby, then the vibration data reflects the crowd traffic around the
food stand, which has a distinct traffic pattern when compared to a
door without any facilities.
We establish the facility associations through location proximity
between the sensor and each facility type. The facilities we consider
include restrooms, game swag stations, food stands, and a student
center. Facilities within a certain walking distance of the doors
are associated with the corresponding sensor, where the distance
threshold is chosen as the overall maximum distance to reach any
facility starting from the closest door. The distance is important
because it reflects the strength of a facility association: the shorter
the distance is, the more influence a facility may have on the crowd
traffic. These associations are encoded as a spatial proximity matrix
to enhance the accuracy of our crowd traffic estimation. The details
of how we leverage the facility layout for crowd traffic estimation
are described in Section 3.4.2.
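The threshold rule and the proximity matrix can be illustrated with a small sketch. The data layout, the function names, and in particular the 1/(1 + distance) weighting are assumptions made for this example; the encoding actually used by GameVibes is the one described in Section 3.4.2.

```python
FACILITY_TYPES = ["restroom", "swag_station", "food_stand", "student_center"]


def distance_threshold(facilities):
    """Largest walking distance needed to reach any facility from its closest door."""
    return max(min(f["dist_to_door"].values()) for f in facilities)


def proximity_matrix(doors, facilities):
    """Build a door-by-facility-type matrix of association strengths.

    Each facility within the distance threshold of a door adds
    1 / (1 + distance) to that door's entry for the facility type
    (a hypothetical weighting: shorter distance, stronger association).
    """
    threshold = distance_threshold(facilities)
    matrix = {door: [0.0] * len(FACILITY_TYPES) for door in doors}
    for facility in facilities:
        column = FACILITY_TYPES.index(facility["type"])
        for door, dist in facility["dist_to_door"].items():
            if dist <= threshold:
                matrix[door][column] += 1.0 / (1.0 + dist)
    return matrix


# Toy layout with made-up walking distances (arbitrary units).
toy_doors = ["door1", "door5"]
toy_facilities = [
    {"type": "food_stand", "dist_to_door": {"door1": 10.0, "door5": 80.0}},
    {"type": "restroom", "dist_to_door": {"door1": 15.0, "door5": 30.0}},
]
toy_threshold = distance_threshold(toy_facilities)
toy_matrix = proximity_matrix(toy_doors, toy_facilities)
```

In this toy layout both facilities are closest to door 1, so door 5 ends up with no associated facility, which mirrors the door 1 versus door 5 contrast discussed in Section 2.2.2.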
2.3.3 Developing Audience-Game-Facility Association Model for
Crowd Monitoring. With the game (temporal context) and facility
(spatial context) associations established in the previous
subsections, we formulate the crowd monitoring problem by developing
a game association and a facility association model that formalize
the relationship among vibration data, game progress, and facility
layout. As summarized in Figure 7, the conceptual graphs (left)
describing audience-game-facility relationships are converted into
the corresponding probabilistic graphical models (right), allowing
probabilistic analysis of the crowd behavior through vibration data.
Specifically, we formulate the game and facility association model
for crowd reaction monitoring (upper) and crowd traffic estimation
(lower), respectively.
Crowd Reaction Monitoring Formulation: The upper left
part of Figure 7 shows the conceptual graph between crowd reaction
and game progress. According to the discussion in Section 2.1, the
game progress affects the crowd reaction, with which the vibration
data can be associated through the sequence of occurrence in time.
To this end, we formulate a probabilistic game association model
among crowd reaction ($Y$), game progress ($G$), and vibration data
($X$) based on the conceptual graph, where dependencies and game
associations are maintained. Assuming that the game record is an
accurate and timely reflection of the game progress, $G$ is regarded
as a deterministic variable in our model. With this formulation, the
objective of crowd reaction monitoring is to estimate $P(Y \mid X, G)$.
Figure 7: Conceptual graph (left) and the corresponding
probabilistic game/facility association model (right). The upper
half describes the relationships among the crowd reaction
($Y$), game progress ($G$), and vibration data ($X$) through the
game association; the lower half summarizes the
relationships among the crowd traffic ($Y$), facility layout ($F$), and
vibration data ($X$) through the facility association. $G$ and $F$
are in squared boxes, representing deterministic variables.
Crowd Traffic Estimation Formulation: Similarly, the
discussion in Section 2.2 shows that the layout of the facility at each door
affects crowd traffic, so the vibration data collected at that door
can be spatially associated with the surrounding facilities (see the
lower part of Figure 7). Based on this conceptual relationship, we
formulate a probabilistic facility association model among crowd
traffic ($Y$), facility layout ($F$), and vibration data ($X$), where the
dependencies and facility associations are maintained. Given that the
proximity of each type of facility around each door is known, $F$ is
also regarded as a deterministic variable in our model. To this end,
the objective of crowd traffic estimation is to compute $P(Y \mid X, F)$.
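One way to read these two objectives is through Bayes' rule, which holds for any joint distribution. The identity below is an illustration for a discrete crowd reaction label, not the estimator GameVibes uses (the system learns the mapping with neural network encoders); the summation index $y$ is notation introduced here.

```latex
P(Y \mid X, G) \;=\;
\frac{P(X \mid Y, G)\, P(Y \mid G)}
     {\sum_{y} P(X \mid Y = y, G)\, P(Y = y \mid G)}
```

Under this reading, $P(Y \mid G)$ carries the temporal context (how likely each reaction is given the game event), while $P(X \mid Y, G)$ describes how that reaction appears in the vibration data. Replacing $G$ with $F$ gives the analogous expression for crowd traffic estimation.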
3 CROWD BEHAVIOR MONITORING
THROUGH AUDIENCE-GAME-FACILITY
ASSOCIATION MODELING
In this section, we first provide an overview of our GameVibes
system and then present each module in the system for crowd
monitoring throughout sports games.
3.1 Overview of GameVibes
Our GameVibes system consists of three modules: 1) Sensing and
Data Pre-processing, 2) Game-informed Crowd Reaction Monitoring,
and 3) Facility-informed Crowd Traffic Estimation. The inputs
of the GameVibes system are the crowd-induced vibration, the game
record, and the facility layout in the stadium.
In the first module, we collect the vibration data through
vibration sensors mounted underneath the floor (for bleachers) or
attached to the floor surface (at the entry doors). The vibration
signals are transmitted wirelessly to a centralized server and stored
on a hard drive. We also pre-process the raw signals through sliding
windows and interpolation algorithms to produce discretized signal
segments for analysis.
Figure 8: Overview of the GameVibes system in 3 modules: sensing
and data pre-processing (Sec. 3.2), game-informed crowd reaction
monitoring (Sec. 3.3), and facility-informed crowd traffic
estimation (Sec. 3.4). GameVibes integrates crowd-induced vibration,
game record, and facility layout to estimate crowd reaction and traffic.
Figure 9: Sensor network of GameVibes. Sensor data are
transmitted wirelessly through routers via Wi-Fi connections in
the stadium.
In the crowd reaction monitoring module, we integrate the game
record with the processed vibration data through the game
association model introduced in Section 2.3. The module estimates crowd
reaction types, including audience status (e.g., quiet, active, moving)
and the specific type of reaction under each status (e.g., clapping,
stomping), which will be discussed in Section 3.3.
Similarly, for crowd traffic estimation, we integrate the facility
layout with the processed vibration data based on the facility
association model in Section 2.3. This module estimates the crowd
traffic in terms of the number of people entering each door, which
will be discussed in Section 3.4.
3.2 Sensing and Data Pre-processing
GameVibes’s geophone-based vibration sensing platform is
developed based on the design that has been successful in previous animal
and human welfare applications [6, 9]. This platform consists of
robust, independent geophone sensing nodes that communicate
over a private WiFi network. The geophones convert the vertical
velocity of the floor into electrical signals which are digitized at
the node and then transmitted to a centralized aggregator. At the
aggregator, each geophone’s data is recorded for later processing.
These data can be analyzed at the aggregator or downloaded for
analysis on a more powerful machine.
The setting of the large-scale sports stadium requires additional
adaptations to the previous sensor network design. Compared to
existing vibration-based sensing platforms, GameVibes represents
a significant increase in scale, both in the deployment area and the
number of occupants. Because of this scale-up, many of the
assumptions made in our previous sensing network [6, 9] no longer applied.
For example, the mains-powered nodes running for months at a
time were changed to battery-powered geophone sensors which
can operate for at most 24 hours. This tradeoff was deemed
acceptable for the stadium environment where individual games are
relatively short, on the order of several hours, and the network can
be re-deployed for each event being monitored.
Operating in a larger area with stricter deployment requirements
informed changes to the network topology as well. Previously, a
simple star network was in use [6, 9], as the default for WiFi-based
systems. Since this was now insufficient, a multi-hop mesh network,
as shown in Figure 9, was used. In this mesh network, multiple
wireless access points service connections to sensor nodes while
passing data between each other on a separate wireless channel.
The fully wireless nature of this setup was necessary not only
for communication range but also to comply with strict visibility
requirements for the public location.
An unexpected challenge to wireless sensing of crowds was the
impact of the crowds themselves on data transmission. During
testing, four wireless routers were able to connect easily across the
basketball court. However, with the addition of occupants, who
both absorb radio waves and introduce electrical noise from
devices on their person, the wireless backhaul connections between
access points started to break down. Fortunately, due to the
self-organizing nature of 802.11s Wi-Fi meshes [16], the number of
access points is flexible, so we added routers to reduce
the mesh hop distance, improving data transmission reliability. For
future deployments, our experience suggests that once connections
are established in a given environment, the maximum hop distance
should be halved to ensure robustness with large crowds.
After collecting the vibration data, we process the signals to
prepare for window-based analysis and mitigate the missing data
issue for subsequent data modeling. First, we segment the signals
into 1-second windows with a time step of 0.5 seconds to avoid the
effect of the activity signals truncated by the window edges. If more
than half of the data is missing in a window, it is excluded from
further analysis. Conversely, if the amount of missing data within
a window is less than half of its duration, a linear interpolation
method is employed to efficiently fill in the missing values.
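The windowing and gap-filling steps can be sketched as follows. The sampling rate `fs`, the use of `None` to mark missing samples, and the function names are assumptions for this example; the text also does not say what happens when exactly half of a window is missing, so this sketch interpolates in that case.

```python
def segment(signal, fs, win_s=1.0, step_s=0.5):
    """Split `signal` into 1-second windows that advance by 0.5 seconds."""
    win = int(round(win_s * fs))
    step = int(round(step_s * fs))
    return [signal[i:i + win] for i in range(0, len(signal) - win + 1, step)]


def fill_or_drop(window):
    """Return the window with gaps linearly interpolated, or None to drop it.

    Missing samples are marked with None (an assumed convention). A window
    is dropped when more than half of its samples are missing. Gaps at the
    window edges are filled with the nearest known value.
    """
    n = len(window)
    known = [i for i, v in enumerate(window) if v is not None]
    if not known or n - len(known) > n / 2:
        return None
    filled = list(window)
    for i in range(n):
        if filled[i] is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = window[right]
        elif right is None:
            filled[i] = window[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = window[left] + weight * (window[right] - window[left])
    return filled


# Toy example: 10 samples at a hypothetical 4 Hz, plus two gappy windows.
toy_windows = segment(list(range(10)), fs=4)
toy_filled = fill_or_drop([1.0, None, 3.0, None])
toy_dropped = fill_or_drop([None, None, None, 4.0])
```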
3.3 Game-informed Crowd Reaction Monitoring
In this section, we introduce the module for crowd reaction
monitoring, which integrates the game record and vibration data to
monitor crowd reactions. Each step of this module will be discussed
in the next few subsections based on Figure 10.
3.3.1 Crowd Reaction Detection and Feature Extraction. We first
detect vibration windows that capture audience motions using the
processed vibration data (noted as Vibe Data). Crowd reaction
detection is performed by comparing each signal window to the noise
signal. To capture the noise characteristics, we select signal
windows spanning approximately 2 minutes during periods of
inactivity (when the stadium is empty). These windows serve as the
reference noise signal. Assuming the noise is normally distributed,
we calculate the mean and standard deviation of the noise signal
for each sensor. The noise mean is then subtracted from each signal
so that the signal is zero-mean. To accommodate the high noise
variance caused by the loud music during the game (approximately
20 times the standard deviation of the noise according to our
observations), we choose 20 times the standard deviation of the
noise signal as the threshold for crowd reaction detection.
Windows with amplitudes surpassing
GameVibes: Vibration-based Crowd Monitoring for Sports Games BuildSys ’23, November 15–16, 2023, Istanbul, Turkey
[Figure 10 diagram blocks: Game Record, Game Association, Game Record Encoder, Vibe Data, Crowd Reaction Detection, Feature Extraction, Feature Selection, Vibe Data Encoder, Concatenation, Crowd Reaction Update Encoder; output: $P(Y \mid X, G)$]
Figure 10: Module for crowd reaction monitoring in
GameVibes. Game associations are first established, and then
encoded by neural networks to extract latent representations
for the game and vibe data. Finally, the representations of
the game record and vibration data are integrated through
an update encoder to estimate crowd reactions.
Figure 11: Activity detection results on sample vibration data.
The algorithm successfully detects 83% of crowd reactions
within the 5-second error range of the recorded reactions.
this threshold are detected as windows with audience motion.
Subsequently, we smooth these identified activities across 5
adjacent windows using a moving median to refine the detection and
enhance the accuracy of our analysis. As shown in Figure 11, this
algorithm successfully detects 83% of crowd reactions within the
5-second error range of the manually labeled reactions.
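A minimal sketch of this detection step is given below. It assumes that the "amplitude" of a window is its peak absolute value after removing the noise mean, and that windows near the start and end of the recording are smoothed over fewer neighbors; neither detail is specified above, and the function names are hypothetical.

```python
import statistics

def noise_stats(noise_samples):
    """Mean and standard deviation of the empty-stadium reference signal."""
    return statistics.fmean(noise_samples), statistics.pstdev(noise_samples)

def detect_active(windows, noise_mean, noise_std, k=20.0):
    """Flag a window as active if any zero-mean sample exceeds k * noise_std."""
    threshold = k * noise_std
    return [max(abs(v - noise_mean) for v in w) > threshold for w in windows]

def median_smooth(flags, width=5):
    """Moving median over `width` adjacent windows (edges use fewer windows)."""
    half = width // 2
    smoothed = []
    for i in range(len(flags)):
        neighborhood = flags[max(0, i - half): i + half + 1]
        smoothed.append(statistics.median_low(neighborhood))
    return smoothed
```

The median filter removes isolated detections and fills isolated gaps, so a single spurious window does not start or end a reaction.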
After the crowd reaction detection, we extract features from the
vibration signals, summarized as follows:
Time-domain Feature: signal energy of each 0.1-second segment.
Frequency-domain Feature: cumulative signal amplitudes in each 10-Hz range after the Fourier Transform.
Time-Frequency-domain Feature: sum of wavelet coefficients within each 10-Hz and 0.1-second grid block.
Together, these features describe the crowd reactions more
comprehensively than any single domain, covering multiple aspects
of the vibration signals, such as their variations in time and
frequency and the dependencies between time and frequency. We then
use a random forest model to rank the features by their importance
to the crowd reaction and select among them. This importance is
determined based on the impurity of crowd reaction types after
splitting the data on a feature: the more a feature decreases the
impurity, the more important the feature is. This provides an
efficient process to reduce the feature dimension while preserving
effective information in vibration data.
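The time- and frequency-domain features and the impurity criterion can be illustrated with a short sketch. It uses a naive discrete Fourier transform for readability and computes the Gini impurity decrease of a single split, whereas a random forest averages such decreases over many trees. The use of Gini impurity, the function names, and the omission of the wavelet feature are assumptions made for brevity.

```python
import cmath
import math

def segment_energy(window, fs, seg_s=0.1):
    """Time-domain feature: signal energy of each 0.1-s segment."""
    seg = int(round(seg_s * fs))
    return [sum(v * v for v in window[i:i + seg])
            for i in range(0, len(window) - seg + 1, seg)]

def band_amplitudes(window, fs, band_hz=10.0):
    """Frequency-domain feature: summed DFT magnitudes in each 10-Hz band."""
    n = len(window)
    bands = [0.0] * int(math.ceil((fs / 2) / band_hz))
    for k in range(n // 2):                       # bins below Nyquist
        coeff = sum(window[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))
        bands[int((k * fs / n) // band_hz)] += abs(coeff)
    return bands

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def impurity_decrease(values, labels, split):
    """Impurity reduction from splitting samples at `values <= split`."""
    left = [y for x, y in zip(values, labels) if x <= split]
    right = [y for x, y in zip(values, labels) if x > split]
    n = len(labels)
    children = sum(len(s) / n * gini(s) for s in (left, right) if s)
    return gini(labels) - children
```

A feature whose split separates the reaction types cleanly yields a large decrease, while an uninformative feature yields a decrease near zero.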
3.3.2 Modeling of Game Record and Vibration Data. We first
establish the game associations between the game record and the
crowd-induced vibration data, and then develop two different
encoders for modeling the game record and vibration data,
respectively.
Game Association Establishing: The game associations are
established between the game record and the active windows of
vibration data based on Section 2.3.1. With the associations, the
game progress (G) and the vibration data (X) are linked through
time, enabling game-informed crowd reaction monitoring.
Game and Vibration Data Modeling: To model the game
record and vibration data, we design two neural networks with
different characteristics to encode the features. First, we leverage
a game record encoder to learn the latent variables representing the
multifaceted influence of the game progress on audience-induced
vibrations (e.g., intensity, duration) by expanding a one-dimensional
score change or game break indicator to a multi-dimensional vector.
Then, we use a vibe data encoder to learn latent representations of
the crowd reactions from vibration data. The game record encoder
is designed as a 1-layer neural network that converts each 1-d game
event into a 32-d vector, reflecting that a single game event
influences the vibration data in multiple ways. The vibe data
encoder is designed as a 3-layer, 256-neuron-wide neural network
because capturing the complex inter-dependency between the selected
features requires a larger number of neurons. In addition, a 40%
dropout is applied to the neural network to mitigate the overfitting
problem. The dropout rate is chosen based on the performance during
preliminary testing on data from one game.
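The layer sizes described above can be made concrete with a small forward-pass sketch in plain Python. Several details here are assumptions that the text does not specify: ReLU activations, the random initialization, applying the 40% dropout to the vibe data encoder only, and the input width `N_FEATURES`. It illustrates the shapes involved and is not the trained model.

```python
import random

def dense(n_in, n_out, rng):
    """Randomly initialized weights and zero biases for one dense layer."""
    scale = (2.0 / n_in) ** 0.5
    weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layers, x, rng=None, dropout=0.0):
    """ReLU forward pass; inverted dropout is applied only when rng is given."""
    for weights, bias in layers:
        x = [max(0.0, sum(w * v for w, v in zip(row, x)) + b)
             for row, b in zip(weights, bias)]
        if rng is not None and dropout > 0.0:
            x = [0.0 if rng.random() < dropout else v / (1.0 - dropout) for v in x]
    return x

rng = random.Random(0)
N_FEATURES = 64  # hypothetical number of selected vibration features

# Game record encoder: one layer mapping a 1-d game event to a 32-d vector.
game_encoder = [dense(1, 32, rng)]
# Vibe data encoder: three layers, each 256 neurons wide.
vibe_encoder = [dense(N_FEATURES, 256, rng), dense(256, 256, rng),
                dense(256, 256, rng)]

g = forward(game_encoder, [1.0])                         # e.g. a score change
x = forward(vibe_encoder, [0.1] * N_FEATURES, rng, 0.4)  # training-mode pass
```

Calling `forward` without `rng` disables dropout, which corresponds to inference.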
3.3.3 Audience-Game Integration for Crowd Reaction Monitoring.
After modeling the game record and vibration data, we concatenate
their learned embeddings and integrate this information through an
update encoder to estimate the conditional distribution
$P(Y \mid X, G)$ as introduced in Section 2.3.3. The encoder is a
3-layer funnel-shaped neural network that gradually transforms the
concatenated embeddings to approximate $P(Y \mid X, G)$. The
resultant conditional probability is represented as a vector with
the same length as the number of audience reaction types.
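The integration step can be sketched as a concatenation followed by a funnel of shrinking layers and a softmax. The hidden widths (128 and 32), the ReLU activations, the softmax output, and the choice of four output classes are illustrative assumptions; the text specifies only that the encoder has three layers and that the output length equals the number of reaction types.

```python
import math
import random

def linear(x, weights, bias):
    """One dense layer without activation."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def softmax(z):
    """Numerically stable softmax, so the outputs sum to one."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def make_funnel(sizes, rng):
    """Random weights for consecutive layers, e.g. 288 -> 128 -> 32 -> 4."""
    return [([[rng.gauss(0.0, (1.0 / a) ** 0.5) for _ in range(a)]
              for _ in range(b)], [0.0] * b)
            for a, b in zip(sizes, sizes[1:])]

def update_encoder(game_emb, vibe_emb, layers):
    """Concatenate both embeddings and map them to class probabilities."""
    h = game_emb + vibe_emb                      # concatenation
    for i, (w, b) in enumerate(layers):
        h = linear(h, w, b)
        if i < len(layers) - 1:
            h = [max(0.0, v) for v in h]         # ReLU on hidden layers
    return softmax(h)
```

With a 32-d game embedding and a 256-d vibration embedding, `make_funnel([288, 128, 32, 4], rng)` gives a three-layer funnel whose output is a 4-element probability vector.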
3.4 Facility-informed Crowd Traffic Estimation
In this section, we introduce the module for crowd traffic
estimation, which integrates the facility layout and vibration data
to estimate the crowd traffic (i.e., headcounts at each door). The
next few subsections present each step in Figure 12.
3.4.1 Crowd Traffic Feature Extraction and Data Augmentation. We
conduct feature extraction based on the floor vibration signals
to capture information related to the level of crowd traffic. As
illustrated in Figure 4 of Section 2.2.1, the peaks of the floor
vibration signal represent the movements of the audience passing
through the door, including walking and opening or closing the
door. Therefore, we detect these peaks and extract features from
them to estimate the crowd traffic. First, we identify peaks by
setting a minimum peak distance of 1 second and selecting a
threshold as the minimum peak height. The threshold peak height for
peak detection is selected based on the correlation score between
the number of detected peaks and the actual count of people
entering, which is
[Figure 12 diagram blocks: Facility Layout, Facility Association, Spatial Proximity Matrix, Vibe Data, Signal Peak Detection, Feature Extraction, Entry Data Augmentation, Vibe Data Encoder, Door Matching, Crowd Traffic Update Encoder; output: $P(Y \mid X, F)$]
Figure 12: Module for crowd traffic estimation in GameVibes.
Facility associations are first established and then modeled
by a spatial proximity matrix. Then, we integrate the spatial
proximity matrix and the vibration data through door matching
and an update encoder to estimate the crowd traffic.
determined through a preliminary analysis of the training data.
The chosen threshold corresponds to the highest correlation score
observed during this analysis. Then, we extract features from the
detected peaks to relate the floor vibration to the level of crowd
traffic, summarized as follows:
Peak Count Feature: the number of peaks detected, which indicates the frequency of people interacting with the door and stepping on the area of the floor near the sensors.
Peak Height Features: the maximum, minimum, average, and standard deviation of the height of the detected peaks, which describe the movement type (e.g., footsteps vs. door opening) and how urgent the movements are.
Peak Time Difference Features: the minimum, maximum, average, and standard deviation of the time differences between adjacent peaks, representing the movement frequencies.
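The peak detection and the three feature groups listed above can be sketched as follows. The sketch assumes the minimum peak height has already been chosen, resolves peaks that are closer than the minimum distance by keeping the taller one, and returns zeros for statistics that are undefined (for example, time differences when fewer than two peaks are found). These choices and the function names are assumptions.

```python
import statistics

def find_peaks(signal, fs, min_height, min_distance_s=1.0):
    """Indices of local maxima above min_height, at least min_distance_s apart.
    When two candidates are too close, the taller one is kept."""
    candidates = [i for i in range(1, len(signal) - 1)
                  if signal[i] >= min_height
                  and signal[i] > signal[i - 1] and signal[i] >= signal[i + 1]]
    min_gap = int(min_distance_s * fs)
    kept = []
    for i in sorted(candidates, key=lambda i: signal[i], reverse=True):
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return sorted(kept)

def peak_features(signal, fs, peaks):
    """Peak count, peak height statistics, and inter-peak time statistics."""
    heights = [signal[i] for i in peaks]
    gaps = [(b - a) / fs for a, b in zip(peaks, peaks[1:])]

    def stats(values):
        if not values:
            return [0.0, 0.0, 0.0, 0.0]
        return [max(values), min(values), statistics.fmean(values),
                statistics.pstdev(values)]

    return [float(len(peaks))] + stats(heights) + stats(gaps)
```

The resulting 9-element vector (one count, four height statistics, four time-difference statistics) is the kind of input the vibe data encoder in this module would receive.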
To overcome the limited size of our dataset, we augment our
samples by merging two 10-minute windows from the original data
to generate a new sample. This is based on the assumption that the
relationship between the crowd traffic and floor vibration at each
door is not affected by time (i.e., it is time-invariant) because
we use the same sensor at the same location throughout the game.
Each new sample is generated by merging the features of the two
windows. The output ground truth of each augmented sample is
obtained by aggregating the number of people entering within these
two windows.
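One way to realize this augmentation is sketched below. The text does not specify how the features of two windows are merged or which pairs are combined, so the sketch makes two assumptions: each sample keeps its raw peak heights so that features can be recomputed after merging, and every pair of windows from the same door is combined. The dictionary keys are hypothetical.

```python
from itertools import combinations

def merge_pair(a, b):
    """Merge two 10-minute samples into one augmented sample.
    Each sample is a dict with 'heights' (peak heights) and 'entries' (headcount)."""
    return {"heights": a["heights"] + b["heights"],
            "entries": a["entries"] + b["entries"]}

def augment(samples):
    """Original samples plus one merged sample for every pair of windows."""
    return samples + [merge_pair(a, b) for a, b in combinations(samples, 2)]
```

With n original windows this produces n(n-1)/2 additional samples, each labeled with the summed headcount of its two source windows.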
3.4.2 Modeling of Facility Layout and Vibration Data. We establish
the facility associations between the facility layout and sensors at
various doors and then leverage a spatial proximity matrix and an
encoder neural network to model facility layout and vibration data,
respectively.
Facility Association Establishing: The facility associations
are established between the facility layout and the sensor at each
door based on the stadium map provided by the venue operator,
as discussed in Section 2.3.2. With these associations, the facility
layout (F) and vibration data (X) are linked through their locations.
Facility Proximity and Vibration Data Modeling: To model
the facility proximity and vibration data, we develop 1) a spatial
proximity matrix and 2) a vibration data encoder neural network,
[Figure 13 floor plan labels: Sen 1 to Sen 6, Door 1 to Door 6, Student Entrance, Southwest Entrance, Southeast Entrance, Student Corner, Food, Game Swags, Restroom; legend: Router, Sensor, Seat Area]
Figure 13: Experiment setup at Stanford Maples Pavilion
with the sensor layout (marked as red dots), router layout
(marked as green devices), facility locations at the concourse
area (described as squares of different colors), and 16 entry
doors connecting the concourse and the game court.
respectively. The spatial proximity matrix is a look-up table of the
weight of each facility type (represented as rows) corresponding
to the sensor at each door (represented as columns). The weight
of each facility is determined by the ratio between the maximum
walking distance to any facility from the closest door (around 20
meters in our case) and the actual distance between the facility
and the door. A higher weight therefore means a shorter distance
between the facility and that door, indicating a stronger
association. These facilities include food stands, game swag
stations, restrooms, and game courts, as shown in Figure 13. On the
other hand, the vibe data encoder is a 2-layer neural network with
32 neurons, chosen because the feature dimension is smaller than in
the previous module; it learns the latent variables of the crowd
traffic from the vibration data.
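The weight rule for the spatial proximity matrix can be written as $w = d_{\max} / d$, where $d_{\max}$ is the maximum walking distance (about 20 m) and $d$ is the distance between a facility and a door. A minimal sketch follows; the distances are invented for illustration and do not come from the stadium map.

```python
MAX_WALK_M = 20.0  # maximum walking distance reported in the text (about 20 m)

def proximity_matrix(distances_m):
    """Build {facility: [weight per door]} from {facility: [distance per door]}.
    weight = MAX_WALK_M / distance, so nearer facilities get larger weights."""
    return {facility: [MAX_WALK_M / d for d in row]
            for facility, row in distances_m.items()}

# Hypothetical distances (meters) from each facility type to three doors.
distances = {
    "food":     [5.0, 20.0, 10.0],
    "restroom": [10.0, 4.0, 20.0],
}
weights = proximity_matrix(distances)
```

Each row of `weights` corresponds to a facility type and each column to a door, matching the row and column convention described above. A distance of zero would need special handling, which this sketch omits.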
3.4.3 Audience-Facility Integration for Crowd Traffic Estimation.
With the vibration data and facility layout modeled at each door, we
match the vibration data with its door and concatenate the learned
embeddings from neural networks for facility-aware updating. The
update encoder is a 2-layer neural network that approximates the
distribution $P(Y \mid X, F)$. The output is the number of people
entering each door.
4 REAL-WORLD EVALUATION AT STANFORD MAPLES PAVILION
To evaluate GameVibes, we conduct 6 real-world deployments for
NCAA Pac-12 women’s and men’s basketball games at Stanford
Maples Pavilion, producing more than 280 hours of vibration data
from 12 sensors. In this section, we first introduce the deployment
setup, and then show the results for crowd monitoring. Furthermore,
we discuss the variables that affect crowd behaviors and results,
including game types, sensor locations, and promotional events.
[Figure 14 plots: a) Crowd Reaction Prediction Results (for Associated Data): F-1 score (axis 0.5 to 1) for Sen 1 to Sen 6; b) Crowd Traffic Estimation Results (for Associated Data): MAE in number of persons (axis 0 to 30) for Door 1 to Door 6; legend: Baseline, Our Method]
Figure 14: Our GameVibes system outperforms the baseline at
all sensor locations for a) crowd reaction monitoring (higher
F-1 score), and b) crowd traffic estimation (lower MAE).
4.1 Deployment Setup
The deployment setup is shown in Figure 13 and involves two
sets of sensors: 1) six interior sensors (Sen 1-6) located underneath
the bleachers at the seating area, and 2) six exterior sensors (Door
1-6) located on the floor of selected entry doors connecting the
concourse and the game court. The sensors are installed before each
game and uninstalled after it for battery changes and functional
checks. All sensors are connected over a wireless mesh through
six routers distributed across the venue as discussed in Section 3.2.
The sampling frequency is set to 500 Hz to maximize the temporal
resolution while ensuring data transmission efficiency.
The ground truths are collected from multiple sources, including
1) a volunteer team of 6-8 people observing the crowd for each
game, 2) the ESPN website for score changes over time, and 3) the
stadium management team for the facility layout. Before and after
the game, the volunteers count the number of people passing the
doors with deployed sensors every 10 minutes. During the game,
the volunteers are spread across the seating areas and record the
crowd reactions around each interior sensor. The labels include 1)
audience status (quiet, active, moving), and 2) audience reactions
(clapping, stomping, dancing, and walking). The volunteers also
record the playing period and the game breaks.
4.2 Overall Performance of GameVibes
Overall, GameVibes has a 0.9 F-1 score in crowd reaction
monitoring and a mean absolute error (MAE) of 9.3 in estimating
headcounts for crowd traffic estimation among various doors.
Compared to the baseline methods without audience-game-facility
association, GameVibes has average improvements of 10% and 12.2%,
respectively. The results are summarized in Figure 14. The
performance increase is mainly because GameVibes incorporates the
temporal and spatial contexts through the game/facility associations
with the vibration data. As the game progress drives the crowd
reactions and the facility layout directs the crowd traffic, these
contexts provide reliable prior information for crowd monitoring.
4.2.1 Crowd Reaction Monitoring Performance. GameVibes has a
0.9 and 0.83 F-1 score in audience status and reaction
classification, respectively. To show the effectiveness of the game
association, we also compare the overall performance with the
performance on windows that have game associations, which show an
average improvement of 10% in the F-1 score. The improvement
indicates that the game context corrects multiple samples that were
misclassified due to the
[Figure 15 plots: a) Improvement for Data w/ Game Associations: F-1 score (axis 0.5 to 1) for Women's Basketball and Men's Basketball; b) Improvement for Data w/ Facility Associations: 1-MAPE (%) / MAPE (%) (axis 0 to 100) for Women's Basketball and Men's Basketball; legend: Baseline, Our Method; annotated improvements: +20.6%, +15.7%, +9.7%, +11.4%]
Figure 15: GameVibes has up to 20.6% and 11.4% improvement
for women’s and men’s basketball games, visualized for a)
data with game associations, and b) data with facility
associations.
highly uncertain data. This is because the latent representations
learned from the game record can indirectly reflect the cause of
crowd reaction variations, such as the active crowd size and the
activity intensity.
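For reference, the F-1 score of a class is $2TP / (2TP + FP + FN)$. The sketch below computes per-class scores and their unweighted (macro) average; whether the reported scores use macro or weighted averaging is not stated here, so the averaging choice is an assumption.

```python
def f1_scores(y_true, y_pred):
    """Per-class F-1 scores and their unweighted (macro) average."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        scores[c] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return scores, sum(scores.values()) / len(scores)
```

For example, `f1_scores(labels, predictions)` with the audience status labels (quiet, active, moving) returns one score per status together with their mean.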
4.2.2 Crowd Traffic Estimation Performance. GameVibes has an
average MAE of 9.3 for crowd traffic estimation, an average
improvement of 12.2% across all doors compared to the baseline
method without facility association. To understand the relative
error for crowd traffic estimation, we also compute the mean
absolute percentage error (MAPE) by averaging MAEs for all the
10-min periods for all games, which is 30.6% on average for all
doors. It is worth noting that MAPE explodes when almost no one
enters a door (which means the denominator is nearly zero), which
often happens during the first 10 minutes of entry.
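The two error metrics and the near-zero denominator issue can be illustrated with a short sketch. It uses the standard per-interval definition of MAPE, and skipping intervals with fewer than `min_count` people is only one possible safeguard that is assumed here for illustration; it is not described as part of the evaluation above.

```python
def mae(actual, predicted):
    """Mean absolute error in number of persons."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def mape(actual, predicted, min_count=1):
    """Mean absolute percentage error over intervals with at least min_count
    people; smaller intervals are skipped to avoid near-zero denominators."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a >= min_count]
    if not pairs:
        raise ValueError("no interval has enough people to compute MAPE")
    return 100.0 * sum(abs(a - p) / a for a, p in pairs) / len(pairs)
```

An interval with 0 actual entries and 3 predicted entries contributes 3 persons to the MAE but has an undefined percentage error, which is why early entry periods can dominate MAPE.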
4.3 Discussion of Variables in GameVibes
In this section, we discuss the variables that affect crowd behaviors
in games, including game types, sensor locations, and promotional
events such as free food and raffles.
4.3.1 Effect of Game Types. The game types affect the distribution
of crowd reaction and traffic, mainly through the popularity and
intensity of the game, and the time when the game happens. Based
on our observation, women’s basketball tends to attract a larger
audience than men’s and therefore leads to a higher level of crowd
traffic and more intensive crowd reactions such as stomping.
Moreover, most women’s basketball games happen on weekend nights,
which further increases the crowd traffic around the food stands
and amplifies crowd reactions more than the games that happen in
the afternoons.
We also observe variations in GameVibes’s crowd monitoring
performance across game types. Figure 15 shows that GameVibes has
slightly different performance for women’s and men’s basketball
games. The lower crowd reaction monitoring performance in women’s
games is due to their 2× larger audience, which leads to noisier
signals and more frequent loss of packets during data transmission.
For crowd traffic estimation, however, men’s games have a larger
percentage error. This is because the size of the audience is
smaller in men’s games, resulting in a large MAPE as the MAE is
divided by the overall smaller size of the audience
[Figure 16 plots: a) Effect of Free Food: cumulative audience entry (%) versus time after door opens (0 to 1 hour), with curves for Other Doors (w/o Food), Student Door (w/ Food), and Student Door (w/o Food); b) Effect of Raffle: cumulative audience exit (%) versus time after door opens (0 to 30 minutes), with curves for Other Doors (w/o Raffle), Student Door (w/ Raffle), and Student Door (w/o Raffle)]
Figure 16: Effect of the promotional events on crowd traffic
pattern, visualized for a) free food before the game, and
b) raffle after the game at the student door. We observe an
earlier increase in cumulative audience entry (%) and a later
increase in cumulative audience exit (%) at the student door
as compared to other doors without promotional events.
entering each door during each 10-minute interval. GameVibes
compensates for these issues through audience-game-facility
association modeling, leading to more balanced performance for both
types of games.
4.3.2 Effect of Sensor Locations. The sensor locations affect the
performance of crowd monitoring mainly through the vibration
data quality, which is determined by various factors, such as the
noise in the surroundings, the floor/bleacher material property, and
the level of interference during data transmission. The inconsistent
performance of the baseline indicates that there are discrepancies
in data quality across various sensor locations/doors (see Figure
14). In contrast, our method with audience-game-facility association
mitigates this issue and produces better and more consistent
performance, especially for crowd reaction monitoring.
4.3.3 Effect of Promotional Events. The promotional events include
free food, raffles, and entertainment sessions to engage the audience
before and after the game. These events mainly affect the crowd
traffic. For example, the student door sometimes has free food and
raffles before/after the game. People tend to arrive early on days
when free food is served, as shown by the high initial increase in
cumulative audience entry (%) at the student door in Figure 16a).
When there is no free food, cumulative audience entry (%) follows a
similar pattern to other doors. Further, people tend to stay back to
collect raffles at the end of the game, as shown by a late increase
in cumulative audience exit (%) in Figure 16b). When there is no
raffle event, cumulative audience exit (%) follows a similar pattern
to other doors, with most people leaving through the door as soon as
the game ends. We plan to study the impact of these promotional
events and incorporate their effect on crowd traffic estimation in
future work.
5 RELATED WORK
To provide a background for this study, we review the existing
literature on human/animal-induced structural vibration sensing
and crowd monitoring through other sensing modalities.
5.1 Human/Animal-Induced Structural Vibration Sensing
The potential of using structural vibrations to infer behaviors of
humans or animals has been explored in many previous studies. Our
prior work has shown promise in using the footstep-induced floor
vibrations for occupancy detection [26, 31], identification [10, 30],
and gait health monitoring [2, 11, 12, 20]. In addition to footsteps,
vibrations induced by human activities can also be used for the
prediction and characterization of activity types and patterns
[3, 7, 29]. Moreover, structural vibration sensing has been
successful in animal health and activity monitoring [6, 9]. These
studies provide a knowledge base on how human/animal-induced
structural vibrations can be used to infer their behaviors, which
inspired the sensor deployment, feature extraction, and evaluation
design of this study.
5.2 Crowd Monitoring through Other Sensing Modalities
Existing methods for crowd monitoring include manual monitoring,
wearable devices, questionnaires, videos, audio recordings, WiFi
and radio frequency signals, and so on. Manual monitoring is the
most common approach for crowd monitoring. It is efficient and
interpretable, but is labor-intensive, costly, and can be
significantly delayed due to negligence in manual observations.
Questionnaires are also used for crowd monitoring. However, this
method is time-consuming and unable to gather timely information
during the events [14]. Centralized communication is introduced for
immediate feedback [32]. While it offers timely observation, the
continuous messaging may intrude on attendees’ experiences. One
study utilized skin interfaces for crowd monitoring [34], but they
are required to be carried by each person and can be intrusive.
Other works use cameras or microphones to capture crowd behaviors
[5, 25]. However, these devices usually come with privacy concerns
and thus may not be allowed in many public spaces. WiFi- and radio
frequency-based devices are used to capture the body motion of
individuals [35]. However, they have difficulty capturing the
activity among a large group of people due to noise interference
and between-person differences, producing less accurate results.
Compared to the existing methods, structural vibration sensing is
non-intrusive, wide-ranged, less sensitive to loud sounds, and is
perceived as more privacy-friendly, allowing continuous monitoring
of crowd behavior in large, noisy indoor spaces.
6 CONCLUSIONS AND FUTURE WORK
In this paper, we introduce GameVibes, a novel system for
vibration-based crowd monitoring. GameVibes is cost-efficient,
wide-ranged, and perceived as more privacy-friendly, which allows
ubiquitous, fine-grained crowd behavior monitoring in the public. We
overcome the challenge of the high uncertainty in crowd-induced
vibrations by modeling audience-game-facility associations. This
allows context-aware and more accurate estimations of crowd reaction
and traffic. We evaluate GameVibes through real-world deployments
for 6 NCAA sports games at Stanford Maples Pavilion and achieve 0.9
F-1 score and 9.3 MAE in crowd reaction monitoring and crowd
traffic estimation, respectively.
For future work, we will first improve the audience-game-facility
association model by incorporating uncertainties in the game progress
records and facility layout due to delayed or missing information
and variations in maps across different games. We will also explore
larger and more complex scenarios (e.g., football games) and
consider the distribution of the audience who support each team.
In addition, we will target downstream applications such as rec-
ognizing the crowd emotion, and detecting and predicting events
concerning public safety.
ACKNOWLEDGMENTS
This work was funded by the U.S. National Science Foundation
(under grant number NSF-CMMI-2026699), Cisco Systems, Inc., and
Stanford CEE Graduate Fellowship. The views and conclusions
contained here are those of the authors and should not be interpreted
as necessarily representing the official policies or endorsements,
either express or implied, of any University, Corporation, or the
National Science Foundation. Special thanks to our volunteers who
have shown remarkable generosity and dedication in recording ground
truths for the games. They are: Akash Doshi, Andrew Jensen, Akhil
Kode, Helen Zhang, Kai Kirk, Jingxiao Liu, Mo Wu, Sanat Mehta,
and Yun Ni.
REFERENCES
[1]
Ali M Al-Shaery, Shroug S Alshehri, Norah S Farooqi, and Mohamed O Khozium.
2020. In-depth survey to detect, monitor and manage crowd. IEEE Access 8 (2020),
209008–209019.
[2]
Majd Alwan, Siddharth Dalal, Steve Kell, Robin Felder, et al
.
2003. Derivation of
basic human gait characteristics from oor vibrations. In 2003 Summer Bioengi-
neering Conference, June. 25–29.
[3]
Majd Alwan, Prabhu Jude Rajendran, Steve Kell, David Mack, Siddharth Dalal,
Matt Wolfe, and Robin Felder. 2006. A smart and passive oor-vibration based
fall detector for elderly. In 2006 2nd International Conference on Information &
Communication Technologies, Vol. 1. IEEE, 1003–1007.
[4]
Maria Andersson, Joakim Rydell, and Jorgen Ahlberg. 2009. Estimation of crowd
behavior using sensor networks and sensor fusion. In 2009 12th International
Conference on Information Fusion. IEEE, 396–403.
[5]
Reza Bahmanyar, Elenora Vig, and Peter Reinartz. 2019. MRCNet: Crowd count-
ing and density map estimation in aerial and ground imagery. arXiv preprint
arXiv:1909.12743 (2019).
[6]
Amelie Bonde, Jesse R. Codling, Kanittha Naruethep, Yiwen Dong, Wachirawich
Siripaktanakon, Sripong Ariyadech, Akkarit Sangpetch, Orathai Sangpetch,
Shijia Pan, Hae Young Noh, and Pei Zhang. 2021. PigNet: Failure-Tolerant
Pig Activity Monitoring System Using Structural Vibration. In Proceedings
of the 20th International Conference on Information Processing in Sensor Net-
works (Co-Located with CPS-IoT Week 2021). ACM, Nashville TN USA, 1–13.
https://doi.org/10.1145/3412382.3458902
[7]
Amelie Bonde, Shijia Pan, Mostafa Mirshekari, Carlos Ruiz, Hae Young Noh,
and Pei Zhang. 2020. OAC: Overlapping oce activity classication through
IoT-sensed structural vibration. In 2020 IEEE/ACM Fifth International Conference
on Internet-of-Things Design and Implementation (IoTDI). IEEE, 216–222.
[8]
Maria Moitinho de Almeida and Johan von Schreeb. 2019. Human stampedes:
An updated review of current literature. Prehospital and disaster medicine 34, 1
(2019), 82–88.
[9]
Yiwen Dong, Amelie Bonde, Jesse R Codling, Adeola Bannis, Jinpu Cao, Asya
Macon, Gary Rohrer, Jeremy Miles, Sudhendu Sharma, Tami Brown-Brandl, et al
.
2023. PigSense: Structural Vibration-based Activity and Health Monitoring
System for Pigs. ACM Transactions on Sensor Networks (2023).
[10]
Yiwen Dong, Jonathon Fagert, and Hae Young Noh. 2023. Characterizing the
variability of footstep-induced structural vibrations for open-world person iden-
tication. Mechanical Systems and Signal Processing 204 (2023), 110756.
[11]
Yiwen Dong, Joanna Jiaqi Zou, Jingxiao Liu, Jonathon Fagert, Mostafa Mirshekari,
Linda Lowes, Megan Iammarino, Pei Zhang, and Hae Young Noh. 2020. MD-Vibe:
physics-informed analysis of patient-induced structural vibration data for moni-
toring gait health in individuals with muscular dystrophy. In Adjunct proceedings
of the 2020 ACM international joint conference on pervasive and ubiquitous com-
puting and proceedings of the 2020 ACM international symposium on wearable
computers. 525–531.
[12]
Jonathon Fagert, Mostafa Mirshekari, Shijia Pan, Linda Lowes, Megan Iammarino,
Pei Zhang, and Hae Young Noh. 2021. Structure-and sampling-adaptive gait
balance symmetry estimation using footstep-induced structural oor vibrations.
Journal of Engineering Mechanics 147, 2 (2021), 04020151.
[13]
Victoria Filingeri, Ken Eason, Patrick Waterson, and Roger Haslam. 2017. Factors
inuencing experience in crowds–the participant perspective. Applied ergonomics
59 (2017), 431–441.
[14]
Renee Glass. 2005. Observer response to contemporary dance. Thinking in four
dimensions: Creativity and cognition in contemporary dance (2005), 107–121.
[15]
Sabrina Haque, Muhammad Sheikh Sadi, Md Erfanul Haque Ra, Md Milon Islam,
and Md Kamrul Hasan. 2020. Real-time crowd detection to prevent stampede. In
Proceedings of International Joint Conference on Computational Intelligence: IJCCI
2018. Springer, 665–678.
[16]
Guido R. Hiertz, Dee Denteneer, Sebastian Max, Rakesh Taori, Javier Cardona,
Lars Berlemann, and Bernhard Walke. 2010. IEEE 802.11s: The WLAN Mesh
Standard. IEEE Wireless Communications 17, 1 (Feb. 2010), 104–111. https:
//doi.org/10.1109/MWC.2010.5416357
[17]
Nicholas Jarvis, John Hata, Nicholas Wayne, Vaskar Raychoudhury, and Md Os-
man Gani. 2019. Miamimapper: Crowd analysis using active and passive indoor
localization through wi- probe monitoring. In Proceedings of the 15th ACM
International Symposium on QoS and Security for Wireless and Mobile Networks.
1–10.
[18]
Wang Jingying. 2021. A survey on crowd counting methods and datasets. In
Advances in Computer, Communication and Computational Sciences: Proceedings
of IC4S 2019. Springer, 851–863.
[19]
Burak Kantarci and Hussein T Mouftah. 2014. Trustworthy sensing for public
safety in cloud-centric internet of things. IEEE Internet of Things Journal 1, 4
(2014), 360–368.
[20]
Ellis Kessler, Vijaya VN Sriram Malladi, and Pablo A Tarazaga. 2019. Vibration-
based gait analysis via instrumented buildings. International Journal of Distributed
Sensor Networks 15, 10 (2019), 1550147719881608.
[21]
Brian F Kingshott. 2014. Crowd management: Understanding attitudes and
behaviors. Journal of Applied Security Research 9, 3 (2014), 273–289.
[22]
Ven Jyn Kok,Mei Kuan Lim, and Chee Seng Chan. 2016. Crowd behavior analysis:
A review where physics meets biology. Neurocomputing 177 (2016), 342–362.
[23]
Ajay Kumar et al
.
2021. Crowd behavior monitoring and analysis in surveillance
applications: a survey. Turkish Journal of Computer and Mathematics Education
(TURCOMAT) 12, 7 (2021), 2322–2336.
[24]
Sonu Lamba and Neeta Nain. 2017. Crowd monitoring and classication: a survey.
In Advances in Computer and Computational Sciences: Proceedings of ICCCCS 2016,
Volume 1. Springer, 21–31.
[25]
Saurabh Maheshwari and Surbhi Heda. 2016. A review on crowd behavior
analysis methods for video surveillance. In Proceedings of the Second Interna-
tional Conference on Information and Communication Technology for Competitive
Strategies. 1–5.
[26]
Mostafa Mirshekari, Jonathon Fagert, Shijia Pan, Pei Zhang, and Hae Young
Noh. 2020. Step-level occupant detection across dierent structures through
footstep-induced oor vibration using model transfer. Journal of Engineering
Mechanics 146, 3 (2020), 04019137.
[27]
Mostafa Mirshekari, Shijia Pan, Jonathon Fagert, Eve M. Schooler, Pei Zhang, and
Hae Young Noh. 2018. Occupant localization using footstep-induced structural
vibration. Mechanical Systems and Signal Processing 112 (2018), 77–97. https:
//doi.org/10.1016/j.ymssp.2018.04.026
[28]
Michael S Molloy, Zane Sherif, Stan Natin, and John McDonnell. 2009. Manage-
ment of mass gatherings. Koenig and Schultz’s Disaster Medicine: Comprehensive
Principles and Practices; Schultz, CH, Koenig, KL, Eds (2009), 265–293.
[29]
Shijia Pan, Mario Berges, Juleen Rodakowski, Pei Zhang, and Hae Young Noh.
2019. Fine-grained recognition of activities of daily living through structural
vibration and electrical sensing. In Proceedings of the 6th ACM International
Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation.
149–158.
[30]
Shijia Pan, Tong Yu, Mostafa Mirshekari, Jonathon Fagert, Amelie Bonde, Ole J
Mengshoel, Hae Young Noh, and Pei Zhang. 2017. Footprintid: Indoor pedestrian
identication through ambient structural vibration sensing. Proceedings of the
ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 1, 3 (2017),
1–31.
[31]
Yves Reuland, Sai GS Pai, Slah Drira, and Ian FC Smith. 2017. Vibration-based
occupant detection using a multiple-model approach. In Dynamics of Civil Structures,
Volume 2: Proceedings of the 35th IMAC, A Conference and Exposition on
Structural Dynamics 2017. Springer, 49–56.
[32]
Nikitas M Sgouros. 2000. Detection, analysis and rendering of audience reactions
in distributed multimedia performance. In Proceedings of the eighth ACM
international conference on Multimedia. 195–200.
[33]
Avinash Sharma, Brian McCloskey, David S Hui, Aayushi Rambia, Adam Zumla,
Tieble Traore, Shuja Shafi, Sherif A El-Kafrawy, Esam I Azhar, Alimuddin Zumla,
et al. 2023. Global mass gathering events and deaths due to crowd surge, stampedes,
crush and physical injuries-lessons from the Seoul Halloween and other
disasters. Travel medicine and infectious disease 52 (2023).
[34]
Catherine J Stevens, Emery Schubert, Rua Haszard Morris, Matt Frear, Johnson
Chen, Sue Healey, Colin Schoknecht, and Stephen Hansen. 2009. Cognition
and the temporal arts: Investigating audience response to dance using PDAs
that record continuous data during live performance. International Journal of
Human-Computer Studies 67, 9 (2009), 800–813.
[35]
Mohammad Yamin, Abdullah M Basahel, and Adnan A Abi Sen. 2018. Managing
crowds with wireless and mobile technologies. Wireless Communications and
Mobile Computing 2018 (2018).
BuildSys ’23, November 15–16, 2023, Istanbul, Turkey Dong, et al.
[36]
Gde Yogadhita and Widiana Agustin. 2023. Football Stampede in Kanjuruhan
Stadium from the Perspective of Disaster Preparedness on Mass Casualty Incident:
A Case Study of Mass Gathering Event. Prehospital and Disaster Medicine 38, S1
(2023), s78–s79.
[37]
Kathryn M Zeitz, Heather M Tan, M Grief, PC Couns, and Christopher J Zeitz.
2009. Crowd behavior at mass gatherings: a literature review. Prehospital and
disaster medicine 24, 1 (2009), 32–38.
[38]
Guijuan Zhang, Dianjie Lu, and Hong Liu. 2018. Strategies to utilize the positive
emotional contagion optimally in crowd evacuation. IEEE Transactions on
Affective Computing 11, 4 (2018), 708–721.