Privacy Implications of Room Climate Data*
Philipp Morgner¹, Christian Müller², Matthias Ring¹, Björn Eskofier¹,
Christian Riess¹, Frederik Armknecht², and Zinaida Benenson¹
¹ Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany
{philipp.morgner, matthias.ring, bjoern.eskofier,
christian.riess, zinaida.benenson}@fau.de
² University of Mannheim, Germany
{christian.mueller, armknecht}@uni-mannheim.de
Abstract. Smart heating applications promise to increase energy efficiency and
comfort by collecting and processing room climate data. While it has been
suspected that the sensed data may leak crucial personal information about the
occupants, this belief has up until now not been supported by evidence.
In this work, we investigate privacy risks arising from the collection of room
climate measurements. We assume that an attacker has access to the most basic
measurements only: temperature and relative humidity. We train machine learning
classifiers to predict the presence and actions of room occupants. On data that
was collected at three different locations, we show that occupancy can be detected
with up to 93.5% accuracy. Moreover, the four actions reading, working on a PC,
standing, and walking can be discriminated with up to 56.8% accuracy, which
is also far better than guessing (25%). Constraining the set of actions yields
even higher prediction rates. For example, we discriminate standing and
walking occupants with 95.1% accuracy. Our results provide evidence that even
the leakage of such 'inconspicuous' data as temperature and relative humidity
can seriously violate privacy.
1 Introduction
The vision of the Internet of Things (IoT) is to enhance work processes, energy
efficiency, and living comfort by interconnecting actuators, mobile devices, and sensors.
These networks of embedded technologies enable applications such as smart heating,
home automation, and smart metering, among many others. Sensors are of crucial
importance in these applications. Data gathered by sensors is used to represent the current
state of the environment; in smart heating, for instance, sensors measure the room
climate. Using this information and a user-defined configuration of the targeted state of
room climate, the application regulates heating, ventilation, and air conditioning.
While the collection of room climate data is obviously essential to enable smart
heating, it may at the same time pose the risk of privacy violations. Consequently, it is
commonly believed among security experts that leaking room climate data may result in
privacy violations and hence that the data needs to be cryptographically protected [39,
12, 4]. However, these claims have not been supported by scientific evidence so far.
Thus, one could question whether in practice additional effort for protecting the data
would be justified.
*Authors' version of the paper published in the Proceedings of the 22nd European Symposium
on Research in Computer Security (ESORICS 2017). DOI: 10.1007/978-3-319-66399-9_18
The current situation with room climate data is comparable to the area of smart
metering [18, 44, 24, 27]. In 1989, Hart [18] was the first to draw attention to the fact
that smart metering appliances can be exploited as surveillance devices. Since then,
research has shown far-reaching privacy violations through fine-granular power con-
sumption monitoring, ranging from occupancy and everyday activities detection [31]
up to recognizing which program a TV was displaying [14].
Various techniques have been proposed over the years to mitigate privacy risks of
smart metering [3, 37, 25, 44, 36]. This issue has become such a grave concern that the
German Federal Office for Information Security published a protection profile for smart
meters in 2014 [2]. By considering privacy implications of smart heating, we hope to
initiate consumer protection research and policy debate in this area, analogous to the
developments in smart metering described above.
Research Questions. In this work, we are the first to investigate room climate data
from the perspective of possible privacy violations. More precisely, we address the fol-
lowing research questions:
Occupancy detection: Can an attacker determine the presence of a person in a room
using only room climate data, i.e., temperature and relative humidity?
Activity recognition: Can an attacker recognize activities of the occupant in the
room using only the temperature and relative humidity data?
Our threat scenario targets buildings with multiple rooms that are similar in size,
layout, furnishing, and positions of the sensors. These properties are typical for office
buildings, dormitories, cruise ships, and hotels, among others. Assuming that an
attacker is able to train a classifier that recognizes pre-defined activities, possible privacy
violations are, e.g., tracking presence and working practices of employees in offices, or
the disclosure of lifestyle and intimate activities in private spaces. All these situations
present intrusions into the privacy of the occupants. In contrast to surveillance cameras
and motion sensors, the occupant does not expect to be monitored. Also, legal
restrictions regarding privacy might apply to surveillance cameras and motion sensors but not
to room climate sensors.
Experiments. To evaluate these threats, we present experiments that consider
occupancy detection and activity recognition based on the analysis of room climate data
from a privacy perspective. We measured room climate data in three office-like rooms
and distinguished between the activities reading, standing, walking, and working on
a laptop. Although we assume that in smart heating applications most likely only one
sensor per room is installed, each room was equipped with several sensors in
order to evaluate different positions of sensors in the room. These sensors measured
temperature and relative humidity at a regular time interval of a few seconds. In our
procedure, an occupant performed a pre-defined sequence of tasks in the experimental
space. In total, we collected almost 90 hours of room climate sensor data from a total
of 36 participants. The collected room climate data was analyzed using an off-the-shelf
machine learning classification algorithm. To reflect realistic settings, we only evaluated
data of a single sensor and did not apply sensor fusion.
Results. Evaluating our collected room climate data, the attacker detects presence of
a person with detection rates up to 93.5% depending on location and the sensor posi-
tion, which is significantly higher than guessing (50%). The attacker can distinguish
between four activities (reading, standing, walking, and working on a laptop) with de-
tection rates up to 56.8%, which is also significantly better than guessing (25%). We
can also distinguish between three activities (sitting, standing and walking) with detec-
tion rates up to 81.0%, as opposed to 33.3% if guessing. Furthermore, we distinguish
between standing and walking with detection rates up to 95.1%. Thus, we show that
the fears of privacy violation by leaking room climate data are well justified. Further-
more, we analyze the influence of the room size, positions of the sensor, and amount
of the measured sensor data on the accuracy. In summary, we provide the first steps in
verifying the common belief that room climate data leaks privacy-sensitive information.
Outline. The remainder of this paper is organized as follows. In Section 2, we give an
overview of related work. Section 3 presents the threat model considered in this work.
In Section 4, we introduce the experimental design and methods. The results of our
experiments are presented and discussed in Sections 5 and 6, respectively. We draw
conclusions in Section 7. Additional information regarding the experimental procedure
can be found in Appendix A.
2 Related Work
Over the last decade, several experiments have been conducted to detect occupancy in
sensor-equipped spaces and to recognize people’s activities as summarized in Table 1.
Activity recognition has been considered for basic activities, such as leaving or arriv-
ing at home, or sleeping [29], as well as for more detailed views, including toileting,
showering and eating [41].
Most of the previous research uses types of sensors that are different from
temperature and relative humidity. For example, CO2 represents a useful source for occupancy
detection and estimation [43]. Additionally, sensors detecting motion based on passive
infrared (PIR) [1, 6, 15, 28, 17, 46], sound [11, 15], barometric pressure [30], and door
switches [8, 9, 45] are utilized for occupancy estimation. For evaluation, different
machine learning techniques are used, e.g., HMM [43], ARHMM [17], ANN [11], and
decision trees [15, 45].
In contrast to previous work, our results rely exclusively on temperature and relative
humidity. Previously published experimental results involved other or additional types
of sensors, such as CO2, acoustics, motion, or lighting (the latter three are referred to as
AML in Table 1), door switches or states of appliances (also gathered with the help of
switches), such as water taps or WC flushes. For this reason, our detection results are
also not directly comparable to these works.
Table 1: Overview of previous experiments on occupancy detection (D), occupancy
estimation (E), which aims at determining the number of people in a room, and activity
recognition (A) with a focus on selected sensors; AML denotes acoustic, motion, and
lighting sensors.

Work                             Target  Rel. Humidity  Temperature  CO2  Ventilation  AML  Switches
van Kasteren et al., 2008 [41]   A
Lam et al., 2009 [28]            E
Dong et al., 2010 [6]            E
Lu et al., 2010 [29]             A
Hailemariam et al., 2011 [15]    D
Han et al., 2012 [17]            E
Zhang et al., 2012 [46]          E
Ekwevugbe et al., 2013 [11]      E
Ebadat et al., 2013 [8]          E
Ai et al., 2014 [1]              E
Wörner et al., 2014 [43]         D
Yang et al., 2014 [45]           D/E
Masood et al., 2015 [30]         E
Ebadat et al., 2015 [9]          E
This work                        D/A
3 Threat Model
The overall goal of our work is to understand the potential privacy implications if room
climate data is accessed by an attacker. The goal of the attacker is to gain informa-
tion about the state of occupancy as well as the activity of the occupants without their
consent.
Obviously, the more information an attacker can gather, the more likely she can
deduce privacy-harming information from the measurements. Therefore, we base our
analysis on the attacker model that considers a room climate system where only one
sensor node is used to derive information. This is a realistic scenario since usually one
sensor node per room is sufficient to monitor the room climate. Moreover, we assume
that this sensor node takes only the two most basic measurements, temperature and
relative humidity. These data are the fundamental properties to describe room climate.
Note that our restricted data is in contrast to existing work (cf. Table 1 and Section 2)
that based their experiments on more types of measurements or used data that is less
common to characterize room climate.
We consider a sensor system that measures the climate of a room, denoted as target
location. At the target location, a temperature and relative humidity sensor is installed
that reports the measured values in regular intervals to a central database. We consider
an attacker model where the attacker has access to this database and aims to derive in-
formation about the occupants at the target location. Furthermore, we assume that the
attacker has access to either the target location itself, or a room similar in size, layout,
sensor positions, and furniture. Such situations are given, for example, at office buildings,
hotels, cruise ships, and student dormitories. This location, denoted as training
location, is used to train the classifier, i.e., a machine learning algorithm that learns
from input data labeled with the ground truth. As the attacker has full control over the
training location, she can freely choose what actions are taking place during the mea-
surements. For example, she could do measurements while no persons are present at the
training location, or one person is present and executes a predefined activity.
There are various scenarios in which an attacker has incentives to collect and analyze
room climate data. For example, the management of a company aims at observing
the presence and working practices of employees in the offices. In another case,
a provider of private spaces (hotels, dormitories, etc.) wants to disclose lifestyle and
intimate activities in these spaces. This information may be utilized for targeted adver-
tising or sold to insurance companies. In any case, the evaluation of room climate data
provides the attacker with the possibility to undermine the privacy of the occupants.
The procedure of these attacks is as follows: First, the attacker collects training data
at a training location, which might be the target location or another room similar in
size, layout, sensor positions, and furniture. The attacker also records the ground truth
for all events that shall be distinguished. Examples of events are occupancy and
non-occupancy, or different activities such as working, walking, and sleeping. The training
data is recorded with a sampling interval of a few seconds and split into windows (i.e., a
temperature curve and a relative humidity curve) of equal length, usually one to
three minutes. Using the collected training data, the attacker trains a machine learning
classifier. After the classifier is trained, it can be used to classify windows of climate
data from the target location to determine the events. The classifier works on previously
collected data, thus reconstructing past events, and also on live-recorded data, thus de-
termining current events “on-the-fly” at the target location.
4 Experimental Design and Methods
We conducted a study to investigate the feasibility of detecting occupancy and inferring
activities in an office environment from temperature and relative humidity. From March
to April 2016, we performed experiments at two locations simultaneously, Location A
and Location B, with a distance of approximately 200 km between them. In addition,
from January to February 2017, we conducted further experiments at a third location,
denoted as Location C, which is located in the same building as Location B.
4.1 Experimental Setup and Tasks
The experimental spaces at the three locations are different in size, layout, and positions
of the sensors. Thus, each target location is also the training location in our study. At
Location A, the room has a floor area of 16.5 m² and was equipped with room climate
sensors at four positions as shown in Figure 1ii. At Location B, the room has a floor
area of 30.8 m², i.e., roughly twice that of Location A, and had room climate
sensors installed at three positions as illustrated in Figure 1i. Location C has a floor
area of 13.9 m² and was equipped with room climate sensors at five positions as shown
in Figure 1iii. In all locations, the room climate sensors measured temperature and
relative humidity. The number of deployed sensors varied due to limitations of hardware
availability.
[Figure 1 panels: (i) Location B (30.8 m²), sensors B1 to B3; (ii) Location A (16.5 m²),
sensors A1 to A4; (iii) Location C (13.9 m²), sensors C1 to C5]
Fig. 1: Floor plans of the experiment spaces including sensor node locations; h indicates
the node's height.
Our goal was to determine to which extent the presence and activities of an
occupant influence the room climate data. Therefore, we measured temperature and relative
humidity during phases of absence as well as phases of occupants' presence. If an
occupant was present, this person had to perform one task or a sequence of tasks. We defined
the following experimental tasks (see also Figure 2):
Read: Sit on an office chair next to a desk and read.
Stand: Stand in the middle of the room and try to avoid movements.
Walk: Walk slowly and randomly through the room.
Work: Sit on an office chair next to a desk and use a laptop, which is located on
the desk.
(i) Read (ii) Stand (iii) Walk (iv) Work
Fig. 2: The defined tasks performed by participants at Location A.
To eliminate confounding factors, we defined location default settings applying to
all locations. Essentially, all windows were required to remain closed and no person was
allowed in the room when not in use for the experiment. The rooms have radiators for
heating, which were adjusted to a constant level. At Location A and B, we used shutters
fixed in such positions that enough light was provided for reading and working.
4.2 Sensor Data Collection
We used a homogeneous hardware and software setup at all locations for data collection,
which is described in the following.
Hardware. At each location, we set up a sensor network consisting of several Moteiv
Tmote Sky sensor nodes with an integrated IEEE 802.15.4-compliant radio [32] as well
as an integrated temperature and relative humidity sensor. The nodes have the Contiki
operating system [7] version 2.7 installed. In addition, we deployed a webcam that took
pictures in a 3-second interval at Location A. These were used for verification during
the data collection phase only, and were not given to the classification algorithms.
Software. For sensor data collection, we customized the Collect-View application
included in Contiki 2.7, which provides a graphical user interface to manage the
sensor network. For our purposes, we implemented an additional control panel offering a
customized logging system. The measurement settings of the Collect-View application
were set to a report interval of 4 seconds with a variance of 1 second, i.e., each
sensor node reported its current values in a time interval of 4 ± 1 seconds. The variance
is a feature provided by Collect-View to decrease the risk of packet collisions during
over-the-air transmissions.
Collected Data. We structured data collection in units and aimed for a good balance
between presence and absence as well as among the different tasks across all units, as this is
needed for the later analysis using machine learning. Each unit has a fixed time duration
t ∈ {10, 30, 60} (in minutes), during which exactly one person was present and executed
predefined activities. If the presence time was t minutes, then the absence time before
and after it was t/2 + 5 minutes each, where 5 minutes served
as buffer. This accounts both for the equal distribution of presence time and absence
time and for the fact that temperature and humidity settle within a 15-minute
period after the 60-minute presence of one person. For a detailed description of the
experimental procedure, we refer to Appendix A.1.
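As a worked example of this timing rule, the following minimal sketch computes the absence phases and the total length of a unit for each presence duration. The helper name `unit_schedule` is hypothetical, and the sketch assumes that a unit consists of exactly one absence phase, one presence phase, and one further absence phase.

```python
def unit_schedule(presence_min, buffer_min=5):
    """Return (absence_before, presence, absence_after, total) in minutes.

    Assumption: absence before and after the presence phase is
    presence/2 + buffer each, as described in the text.
    """
    absence = presence_min / 2 + buffer_min
    total = 2 * absence + presence_min
    return absence, presence_min, absence, total


if __name__ == "__main__":
    for t in (10, 30, 60):
        before, present, after, total = unit_schedule(t)
        print(f"t={t} min: absence {before} + presence {present} "
              f"+ absence {after} = {total} min")
```

For the 60-minute unit, this gives 35 minutes of absence on each side, which leaves room for the 15-minute settling period mentioned above.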
Overall, we collected almost 90 hours of sensor data, of which 40 hours were recorded
with a person being present. A more extensive overview of the amount of measured sensor data
is shown in Table 2. To encourage replication and further investigations, all collected
sensor data is available as open data sets on GitHub.³
Table 2: Measured sensor data of all locations (recorded time in hours, h:mm:ss)

Variable   Value  Location A  Location B  Location C
Occupancy  no     20:38:26    15:21:00    13:21:42
Occupancy  yes    14:41:56    11:33:06    13:44:29
Task       Read    4:46:13     2:56:44     3:19:47
Task       Stand   2:45:27     2:34:20     3:28:27
Task       Walk    2:43:53     2:37:12     3:20:05
Task       Work    4:03:33     3:00:20     3:20:52
4.3 Participants and Ethical Principles
For participating in the experiment, 14 subjects volunteered at Location A, 12 subjects
at Location B, and 10 subjects at Location C as shown in Table 3. Demographic data of
participants was collected in order to facilitate replication and future experiments. All
subjects provided written informed consent after the study protocol was approved by
the data protection office.⁴ We assigned each participant a random ID. All collected
sensor data as well as the demographic data is only linked to this ID.
4.4 Classifier Design
We used classification to predict occupancy and activities in the rooms. We adopt an
approach that has successfully been used in several applications of biosignal process-
ing, namely extraction of a number of statistical descriptors with subsequent feature
selection [26, 21].
The features use measurements from short time windows. We experimented with
windows of different lengths, namely 60 s, 90 s, 120 s, 150 s, and 180 s. The offset
between two consecutive windows was set to 30 s. We excluded all windows in which
the measurements do not all belong to the same activity.
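The windowing described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function name `sliding_windows` and the tuple layout of a sample are assumptions, and windows are cut by timestamp rather than by sample count because the report interval is not perfectly regular.

```python
def sliding_windows(samples, length_s=180, offset_s=30):
    """Cut a labeled time series into overlapping windows.

    samples: list of (timestamp_s, temperature, humidity, label),
             sorted by timestamp (layout is an assumption).
    Yields (label, [(temperature, humidity), ...]) for every window whose
    samples all carry the same label; mixed windows are skipped.
    """
    if not samples:
        return
    start = samples[0][0]
    end = samples[-1][0]
    while start + length_s <= end:
        window = [s for s in samples if start <= s[0] < start + length_s]
        labels = {s[3] for s in window}
        if window and len(labels) == 1:
            yield labels.pop(), [(s[1], s[2]) for s in window]
        start += offset_s


if __name__ == "__main__":
    # Synthetic example: one sample every 4 s, occupant enters at t = 300 s.
    demo = [(t, 21.0, 45.0, "absent" if t < 300 else "present")
            for t in range(0, 600, 4)]
    for label, values in sliding_windows(demo):
        print(label, len(values))
```

Windows that straddle the change from absence to presence are dropped, which mirrors the exclusion rule stated above.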
³ https://github.com/IoTsec/Room-Climate-Datasets
⁴ Ethical review boards at both locations only consider medical experiments.

Table 3: Demographic data of participants; µ denotes the average, σ denotes
the standard deviation.

Characteristic      Location A  Location B  Location C
Gender       f          3           2           5
             m         11          10           5
Weight [kg]  µ         74.9        81.7        63.1
             σ          8.0        12.1        10.0
Height [cm]  µ        175.9       178.4       170.7
             σ          9.2         5.3         9.3
Age          µ         33.7        30.3        25.6
             σ          8.2         4.8         2.8

The feature set was composed of a number of statistical descriptors that were
computed on the temperature and humidity measurements within the measurement windows. These
are mean value, variance, skewness, kurtosis, number of nonzero values, entropy,
difference between maximum and minimum value of the window (i.e., value range),
correlation between temperature and humidity, and mean and slope of the regression line
for the measurement window before the current window. Additionally, we subtracted
from the measurements their least-square linear regression line, and computed all of
the listed statistics on the subtraction residuals. Feature selection was performed using
a sequential forward search [42, Ch. 7.1 & 11.8], with an inner leave-one-subject-out
cross-validation [19, Ch. 7] to determine the performance of each feature set. For
classification, we used the Naïve Bayes classifier. To avoid a bias in the results, we randomly
selected identical numbers of windows per class for training, validation and testing. For
implementation, we used the ECST software [38], which wraps the WEKA library [16].
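A subset of the statistical descriptors listed above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the ECST/WEKA feature code: all function names are hypothetical, variance is computed as the population variance, the slope is taken from the current window instead of the preceding one, and entropy and the nonzero count are omitted for brevity.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def central_moment(xs, k):
    m = mean(xs)
    return sum((x - m) ** k for x in xs) / len(xs)


def skewness(xs):
    var = central_moment(xs, 2)
    return central_moment(xs, 3) / var ** 1.5 if var > 0 else 0.0


def kurtosis(xs):
    var = central_moment(xs, 2)
    return central_moment(xs, 4) / var ** 2 if var > 0 else 0.0


def linear_fit(xs):
    """Least-squares line through (0, xs[0]), (1, xs[1]), ...; returns (slope, intercept)."""
    n = len(xs)
    t_mean = (n - 1) / 2
    x_mean = mean(xs)
    denom = sum((t - t_mean) ** 2 for t in range(n))
    if denom == 0:
        return 0.0, x_mean
    slope = sum((t - t_mean) * (x - x_mean) for t, x in enumerate(xs)) / denom
    return slope, x_mean - slope * t_mean


def correlation(xs, ys):
    """Pearson correlation; 0.0 if one of the series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    norm = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / norm if norm > 0 else 0.0


def window_features(temps, hums):
    """Feature dictionary for one window of temperature and humidity values."""
    feats = {"corr_temp_hum": correlation(temps, hums)}
    for name, xs in (("temp", temps), ("hum", hums)):
        slope, intercept = linear_fit(xs)
        # Residuals after subtracting the least-squares regression line.
        residuals = [x - (slope * t + intercept) for t, x in enumerate(xs)]
        feats.update({
            f"{name}_mean": mean(xs),
            f"{name}_var": central_moment(xs, 2),
            f"{name}_skew": skewness(xs),
            f"{name}_kurt": kurtosis(xs),
            f"{name}_range": max(xs) - min(xs),
            f"{name}_slope": slope,
            f"{name}_resid_var": central_moment(residuals, 2),
        })
    return feats
```

A sequential forward search would then start from an empty feature set and repeatedly add the single feature that improves the cross-validated accuracy the most.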
As performance measures, we use accuracy (i. e., the number of correctly classified
windows divided by the number of all windows), and per-class sensitivity (i. e., the
number of correctly classified windows for a specific class divided by the number of all
windows of this class). Classification accuracy was deemed statistically significant if it
was significantly higher than random guessing which is the best choice if the classifier
could not learn any useful information during training. For each experiment, a binomial
test with significance level p < 0.01 was carried out using the R software [34].
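The performance measures and the significance check can be sketched as follows. The function names are hypothetical, and the one-sided form of the binomial test (probability of reaching at least the observed number of correct windows by guessing) is an assumption; the text above only states that a binomial test was carried out in R.

```python
from math import comb


def accuracy(y_true, y_pred):
    """Correctly classified windows divided by all windows."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def sensitivity(y_true, y_pred, cls):
    """Correctly classified windows of class cls divided by all windows of cls."""
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t == cls]
    return sum(t == p for t, p in relevant) / len(relevant)


def binomial_p_value(successes, trials, p_guess):
    """One-sided P(X >= successes) for X ~ Binomial(trials, p_guess)."""
    return sum(
        comb(trials, k) * p_guess ** k * (1 - p_guess) ** (trials - k)
        for k in range(successes, trials + 1)
    )


if __name__ == "__main__":
    y_true = ["occ"] * 4 + ["no"] * 4
    y_pred = ["occ", "occ", "occ", "no", "no", "no", "no", "occ"]
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    print("accuracy:", accuracy(y_true, y_pred))
    print("sensitivity (occ):", sensitivity(y_true, y_pred, "occ"))
    print("p-value vs. guessing:", binomial_p_value(correct, len(y_true), 0.5))
```

A result would count as statistically significant in the sense used here if the returned p-value is below 0.01.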
Note that neither the features nor the rather simple Naïve Bayes classifier are
particularly tailored to predicting privacy leaks. However, we show that even such an
unoptimized system is able to correctly predict occupancy and action types and hence
produce privacy leaks. Higher detection rates can be expected if more advanced
classifiers are applied to this task.
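To illustrate how simple such a classifier is, the following sketch implements a Gaussian Naïve Bayes model together with a leave-one-subject-out evaluation. It is not the ECST/WEKA implementation used in this work: the class and function names are hypothetical, the Gaussian likelihood per feature is an assumption, and feature selection is left out.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        self.stats = {}
        for label, rows in by_class.items():
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # A small floor keeps every variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(X)), means, variances)
        return self

    def predict(self, row):
        def log_posterior(label):
            log_prior, means, variances = self.stats[label]
            return log_prior + sum(
                -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
                for x, m, var in zip(row, means, variances)
            )
        return max(self.stats, key=log_posterior)


def leave_one_subject_out(X, y, subjects):
    """Accuracy over all windows, each predicted by a model that was
    trained without any window of the same subject."""
    correct = 0
    for held_out in sorted(set(subjects)):
        train = [i for i, s in enumerate(subjects) if s != held_out]
        test = [i for i, s in enumerate(subjects) if s == held_out]
        model = GaussianNaiveBayes().fit([X[i] for i in train],
                                         [y[i] for i in train])
        correct += sum(model.predict(X[i]) == y[i] for i in test)
    return correct / len(X)
```

Holding out all windows of one subject at a time avoids that the classifier is tested on a person it has already seen during training.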
5 Results
In this section, we present the experimental results. First, a visual inspection of the col-
lected data is presented, followed by the machine learning-aided occupancy detection
and activity recognition.
[Figure 3 panels: temperature (°C) and relative humidity (%) over time (0 to 8400 s).
(i) Sensor A1, occupant is present for 60 minutes at Location A; tasks: Stand, Read, Walk, Read, Walk.
(ii) Sensor B2, occupant is present for 60 minutes at Location B; tasks: Stand, Work, Walk.]
Fig. 3: Visualization of two examples of room climate measurements. The grey
background indicates the presence of the occupant in the experimental space.
5.1 Visual Inspection
We started our evaluation by analyzing the raw sensor data. To this end, we implemented
a visualization script in MATLAB that plots this data. The visualizations of two
example measurements are depicted in Figure 3.
The visualizations show an immediate rise of the temperature and humidity as soon
as an occupant enters the room. Furthermore, variations in temperature and humidity
increase rapidly and can be clearly seen. Thus, one can visually distinguish between
phases of occupancy and non-occupancy. One can also notice different patterns during
the performance of the tasks. As Figure 3i shows, an occupant walking in the
experimental space causes a constant increase of temperature and humidity with only small
variations. In contrast, an occupant standing in the room causes the largest variations of
humidity compared to the other defined tasks (cf. Figure 3ii). The effects of the tasks
reading and working on temperature and humidity in the depicted figures are very
similar: both variables tend to increase, showing medium variations. For further analysis of
the data, we used machine learning as outlined in Section 4.4.
5.2 Occupancy Detection
Occupancy detection describes the binary detection of occupants in the experimental
space based on features from windows with length of 180 seconds (cf. Section 4.4).
This is a two-class task, namely to distinguish whether an occupant is present (true)
or not (false). We only considered training and testing data within the same room (but
separated training and testing both by the days and participants of the acquisition). We
randomly selected the same number of positive and negative cases from the data. Thus,
simply guessing the state has a success probability of 50%. However, our classification
results are considerably higher than that. Table 4 shows that the highest accuracies
per location were 93.5% (Location A), 88.5% (Location B), and 91.0% (Location C).
Considering all sensors of all three locations, detection accuracy ranges between 66.8%
(Sensor B3) and 93.5% (Sensor A1) as shown in Figure 4i. All classification accuracies
were statistically significantly different from random guessing. This indicates that an
attacker can reveal the presence of occupants in a target location with a high probability.
Table 4: Classification accuracy for occupancy detection. Notations: 'Occup.',
sensitivity for class occupancy; 'No Occup.', sensitivity for class no occupancy;
'Guess', probability of correct guessing; 'Acc.', classification accuracy.

Scenario   Sensor  Sensitivity [%]       Guess [%]  Acc. [%]
                   Occup.    No Occup.
Occupancy  A1      94.1      93.0        50.0       93.5
           A2      94.5      85.0        50.0       89.7
           A3      92.0      76.4        50.0       84.2
           A4      77.8      79.1        50.0       78.4
           B1      91.9      85.1        50.0       88.5
           B2      85.3      77.2        50.0       81.3
           B3      69.7      63.9        50.0       66.8
           C1      92.9      89.2        50.0       91.0
           C2      89.9      87.4        50.0       88.6
           C3      90.0      82.0        50.0       86.0
           C4      89.8      87.6        50.0       88.7
           C5      92.5      88.8        50.0       90.7
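Because the same number of positive and negative cases was selected, the overall accuracy of a sensor should equal the mean of its two per-class sensitivities. The following sketch checks this relation against the values of Table 4; the dictionary and function names are hypothetical, and small deviations are expected because the table entries are rounded to one decimal place.

```python
# sensor: (sensitivity occupancy, sensitivity no occupancy, reported accuracy), all in %
TABLE_4 = {
    "A1": (94.1, 93.0, 93.5), "A2": (94.5, 85.0, 89.7),
    "A3": (92.0, 76.4, 84.2), "A4": (77.8, 79.1, 78.4),
    "B1": (91.9, 85.1, 88.5), "B2": (85.3, 77.2, 81.3),
    "B3": (69.7, 63.9, 66.8), "C1": (92.9, 89.2, 91.0),
    "C2": (89.9, 87.4, 88.6), "C3": (90.0, 82.0, 86.0),
    "C4": (89.8, 87.6, 88.7), "C5": (92.5, 88.8, 90.7),
}


def balanced_accuracy(sens_pos, sens_neg):
    """Accuracy on a test set with equally many windows per class."""
    return (sens_pos + sens_neg) / 2


if __name__ == "__main__":
    for sensor, (occ, no_occ, acc) in TABLE_4.items():
        print(f"{sensor}: mean sensitivity {balanced_accuracy(occ, no_occ):.2f} %, "
              f"reported accuracy {acc} %")
```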
5.3 Activity Recognition
Activity recognition reports the current activity of an occupant in the experimental
space. The four activity tasks are described in Section 4.1. The recognition results for
these tasks are shown in Figure 4.
Activity4 classifies between the activities Read, Stand, Walk, and Work. As shown in
Figure 4ii, the accuracy of recognizing activities achieved by the machine learning
pipeline ranged from 23.9% (Sensor C1) to 56.8% (Sensor A1). Overall, the accu-
racy of Activity4 was statistically significantly better than the probability of guessing
the correct task (25%) for 8 out of 12 sensors. Thus, the distinction between multiple
activities is possible, but depends on the target location and the position of the sensor.
[Figure 4 panels, each plotting accuracy [%] per sensor, grouped by location (A, B, C):
(i) Occupancy; (ii) Activity4 (read, stand, walk, work); (iii) Activity3 (sit, stand, walk);
(iv) Activity2 (sit, upright); (v) Activity2a (read, work); (vi) Activity2b (stand, walk)]
Fig. 4: Classification accuracy for occupancy detection and activity recognition. In each
diagram, the guessing probability is plotted as a line. Each symbol represents the
accuracy that we achieved with a single sensor. A blue dot marks a statistically significant
result, while a red 'x' represents a statistically insignificant result.
In the next step, we investigated whether an attacker can increase the recognition
accuracy by distinguishing between a smaller set of activities. To this end, we
combined two tasks into a meta task, e.g., the tasks Read and Work became Sit. The model
Activity3 classifies between the tasks Sit, Stand, and Walk. The probability of correct
guessing is thus 33.3%. This model is typical to represent activities of an occupant
in a private space or an office room. For Activity3, the achieved accuracy ranged from
31.8% (Sensor C1) to 81.0% (Sensor A1). Our results were statistically significant for
10 out of the 12 sensors deployed in the three locations. Assuming a known layout of
the target location, the attacker might be able to determine the position of the occupant
in the space and infer activities such as watching TV, exercising, cooking or eating.
The model Activity2 classifies between the tasks Sit and Upright, whereby Sit is, as
before, Read or Work, and Upright combines Stand and Walk. In this classification,
the attacker distinguishes the posture of the occupant. The model
Activity2a classifies between the tasks Read and Work, and the model Activity2b
classifies between the tasks Stand and Walk. Activity2a indicates that an attacker can even
distinguish between sedentary activities, such as reading a book or working on a
laptop. In contrast, Activity2b shows that an attacker can differentiate between standing
and moving activities. Thus, an attacker can detect movements at the target location.
For Activity2, Activity2a, and Activity2b, the probability of guessing the correct class is
50%. Using these models, the attacker can infer various work and life habits.
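The merging of tasks into meta tasks and the resulting guessing probabilities can be sketched as follows. This is a minimal illustration, not the authors' implementation: the dictionary `MODELS` and the helper names `relabel` and `guessing_probability` are hypothetical, and the guessing probability is assumed to be uniform over the classes (1/k for k classes), which matches the 33.3% and 50% stated above.

```python
# Hypothetical mapping from the four recorded tasks to the classes of each model.
# Tasks that a model does not consider are simply absent from its mapping.
MODELS = {
    "Activity3": {"Read": "Sit", "Work": "Sit", "Stand": "Stand", "Walk": "Walk"},
    "Activity2": {"Read": "Sit", "Work": "Sit", "Stand": "Upright", "Walk": "Upright"},
    "Activity2a": {"Read": "Read", "Work": "Work"},
    "Activity2b": {"Stand": "Stand", "Walk": "Walk"},
}


def relabel(task_labels, mapping):
    """Map task labels to the classes of one model, dropping unused tasks."""
    return [mapping[label] for label in task_labels if label in mapping]


def guessing_probability(mapping):
    """Probability of a correct uniform guess: 1 / number of distinct classes."""
    return 1.0 / len(set(mapping.values()))


if __name__ == "__main__":
    recorded = ["Read", "Walk", "Work", "Stand"]
    for name, mapping in MODELS.items():
        print(name, relabel(recorded, mapping), round(guessing_probability(mapping), 3))
```

Dropping unused tasks (rather than raising an error) reflects that Activity2a and Activity2b each look at only two of the four recorded tasks.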
For Activity2, the accuracy ranged from 54.6% (Sensor C2) to 82.1% (Sensor
A1), and all accuracies were statistically significant. For Activity2a, the lowest and
highest accuracies were 54.2% (Sensor B3) and 76.6% (Sensor C2), respectively, with
statistically significant results for 11 out of 12 sensors. For Activity2b, the
achieved accuracy ranged from 53.3% (Sensor C4) to 95.1% (Sensor A1), and the
results for 10 out of 12 sensors were statistically significant.
5.4 Further Observations
Length of Measurement Windows. The length of the measurement windows influences
the accuracy of detection. We evaluated window sizes in the range between 60
and 180 seconds. As an example, we analyzed the average accuracy of occupancy
detection depending on the window size for all three locations. As shown in Figure 5, the
accuracy increases with a longer window size. We achieved the best results with the
longest window size of 180 seconds.
Fig. 5: Average accuracy over all sensors from each location for occupancy detection
depending on the window size. [Plot: x-axis Window Size [s], 60 to 180; y-axis Average
Accuracy [%], 75 to 90; one curve each for Location A, Location B, and Location C.]
This indicates that the highest accuracies are possible if longer time periods are
considered. From a practical perspective, it is not advisable to extend the window size to
a much larger duration than a few minutes since we assume that the performed activity
is consistent for the whole duration of the window.
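The trade-off between window length and the number of available windows can be sketched as follows. This is only an illustration under stated assumptions: the function name `split_into_windows`, the 10-second sampling period, and the use of consecutive, non-overlapping windows are all hypothetical choices, since the windowing details are not given in this section.

```python
def split_into_windows(samples, sample_period_s, window_s):
    """Cut a sequence of readings into consecutive, non-overlapping windows.

    samples:         readings taken at a fixed rate (assumed)
    sample_period_s: seconds between two readings (assumed value, not from the paper)
    window_s:        window length in seconds, e.g. 60 to 180
    Trailing readings that do not fill a whole window are discarded.
    """
    per_window = window_s // sample_period_s
    if per_window < 1:
        raise ValueError("window is shorter than one sample period")
    n_windows = len(samples) // per_window
    return [samples[i * per_window:(i + 1) * per_window] for i in range(n_windows)]


if __name__ == "__main__":
    # Synthetic readings: 30 minutes at one reading every 10 seconds.
    readings = [20.0 + 0.01 * i for i in range(180)]
    for window_s in (60, 90, 120, 150, 180):
        windows = split_into_windows(readings, 10, window_s)
        print(window_s, "s:", len(windows), "windows of", len(windows[0]), "readings")
```

Longer windows give each classification more readings to work with but yield fewer windows per recording, and each window must still fall inside a single activity for its label to be meaningful.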
Selected Features. To assess the capabilities of an attacker who has access only to either
temperature data or relative humidity data, we evaluated whether it might be enough
to collect just one type of room climate data. In the classification process, an
attacker derives a set of features from temperature and relative humidity data and selects
the best-performing features for each sensor and classification goal automatically (cf.
Section 4.4). Our analysis shows that features computed from temperature and relative
humidity are of similar importance. In our evaluation, 57.9% of the selected features are
derived from temperature measurements, and 52.3% from relative humidity
measurements.5
We also compared the features in terms of differences between the three locations
as well as differences between occupancy detection and activity recognition. In all these
cases, there are no significant differences between the importance of temperature and
relative humidity. An attacker restricted to either temperature or relative humidity data
will perform worse than an attacker with access to both.
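The way the two percentages can add up to more than 100% can be illustrated with a small counting sketch. The function names and the toy feature list are hypothetical and serve only to show the bookkeeping: a feature that uses both measurement types is counted once for temperature and once for relative humidity.

```python
def source_shares(selected_features):
    """Return the share of features that use temperature and the share that use humidity.

    selected_features: list of sets, each naming the measurement types a feature
    is computed from (subsets of {"temperature", "humidity"}).
    """
    n = len(selected_features)
    temperature = sum(1 for sources in selected_features if "temperature" in sources)
    humidity = sum(1 for sources in selected_features if "humidity" in sources)
    return temperature / n, humidity / n


def share_using_both(temperature_share, humidity_share):
    """Inclusion-exclusion, assuming every feature uses at least one of the two types."""
    return temperature_share + humidity_share - 1.0


if __name__ == "__main__":
    # Toy selection (not data from the paper): 5 temperature-only features,
    # 4 humidity-only features, and 1 feature that combines both.
    toy = [{"temperature"}] * 5 + [{"humidity"}] * 4 + [{"temperature", "humidity"}]
    t_share, h_share = source_shares(toy)
    print(t_share, h_share, share_using_both(t_share, h_share))
    # Applied to the reported shares of 57.9% and 52.3%:
    print(round(share_using_both(0.579, 0.523), 3))
```

Under the assumption that every selected feature is derived from at least one of the two measurement types, the reported shares would imply that roughly 10.2% of the selected features combine temperature and relative humidity.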
Size and Layout of Rooms. All our locations are office-like rooms, which have a
similar layout (rectangular) but differ in size and furnishing. In our evaluation, the accuracy
decreases as the size of the target location grows. As shown in Figure 5, we had the highest
average accuracy in occupancy detection at Location C, which also has the smallest
floor area of 13.9 m². Location A has a floor area of 16.5 m² and a slightly
lower average accuracy. Location B is almost twice as large (30.8 m²) and shows the
lowest average accuracy of the three locations. Thus, our experiment indicates
that an increasing room size leads to decreasing accuracy on average. An attacker
achieves higher accuracies by monitoring small target locations than by monitoring
larger ones.
Position of Sensors. According to our threat model in Section 3, the attacker controls
the layout of the target location. Thus, we assume an attacker who can decide at which
position in the target location a room climate sensor is installed. We consider how the
position of a room climate sensor influences the accuracy of derived information. For
occupancy detection, we had the best accuracy with a sensor node located at the
center of the ceiling of the target location (Sensors A1, B1, C1). In this position,
the sensor has the largest gathering area to measure the climate of the room. Sensors
mounted to the walls or on shelves perform differently in our experiments. For activity
recognition, the central sensor nodes performed best at Locations A and B, but not at
Location C.
From the attacker perspective, the best position to deploy a room climate sensor is
at the ceiling in the center of the target location. In large rooms, multiple sensors at the
ceiling could be installed, each covering a subsection of the room.
5 Note that some features are based on both temperature and relative humidity, which is why
the sum of both numbers exceeds 100%.
6 Discussion
As our experiments reveal, knowing the temperature and relative humidity of a room
allows an attacker to detect the presence of people and to recognize certain activities with a
significantly higher probability than guessing. By evaluating temperature and relative
humidity curves with a length of 180 seconds, we were able to detect the presence of an
occupant in one of our experimental spaces with an accuracy of 93.5% using a single
sensor. In terms of activity recognition, we distinguished between four activities with an
accuracy of up to 56.8%, between three activities with up to 81.0%, and between two
activities with up to 95.1%. Thus, an attacker focusing on the detection of a specific activity
is more successful than an attacker who aims to classify a broader variety of activities. In the
following, we discuss implications and limitations of our results.
Privacy Implications. We show that an attacker might be able to infer life and work
habits of the occupants from the room climate data. The attacker is able to
distinguish between sitting, standing, and moving, which might already reveal the
position and activities of the occupant in the room. Moreover, the attacker can distinguish
between upright and sedentary activities, between moving and standing, and between
working on a laptop and reading a book.
Given the limited amount of recorded sensor data, the achieved accuracies in
occupancy detection and activity recognition give a clear indication that occupants are
subject to privacy violations according to the threat model described in Section 3. However,
activity recognition is not straightforward since the achieved accuracies differ between
the different sensor positions and locations.
Further experiments are required for a better assessment of the privacy risks induced
by the room climate data. Our work provides promising directions for these
assessments. For example, we demonstrated the existence of the information leak with the
Naïve Bayes classifier. Naïve Bayes is arguably one of the simplest machine learning
classifiers. In future work, it would be interesting to explore upper bounds for the
detection of presence/absence and different activities by using more advanced classifiers
such as the recently popular deep learning algorithms.
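To give an impression of how simple such a classifier is, the following is a minimal, self-contained sketch of a Gaussian Naïve Bayes classifier. It is not the implementation used in our experiments: the Gaussian likelihood, the class name `GaussianNaiveBayes`, the two toy features, and all numbers in the example are assumptions made purely for illustration.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Naive Bayes with one independent Gaussian per feature and class."""

    def fit(self, X, y):
        rows_by_class = defaultdict(list)
        for row, label in zip(X, y):
            rows_by_class[label].append(row)
        self.params = {}
        for label, rows in rows_by_class.items():
            log_prior = math.log(len(rows) / len(X))
            stats = []
            for column in zip(*rows):
                mean = sum(column) / len(column)
                # Small constant avoids a zero variance for constant features.
                var = sum((v - mean) ** 2 for v in column) / len(column) + 1e-9
                stats.append((mean, var))
            self.params[label] = (log_prior, stats)
        return self

    def predict_one(self, row):
        best_label, best_score = None, -math.inf
        for label, (log_prior, stats) in self.params.items():
            # Log posterior up to a constant: log prior + sum of log likelihoods.
            score = log_prior
            for value, (mean, var) in zip(row, stats):
                score += -0.5 * math.log(2 * math.pi * var)
                score -= (value - mean) ** 2 / (2 * var)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    def predict(self, X):
        return [self.predict_one(row) for row in X]


if __name__ == "__main__":
    # Synthetic toy windows: (temperature change, relative humidity change).
    X = [(0.02, 0.10), (0.03, 0.20), (0.01, 0.15),
         (0.30, 1.50), (0.28, 1.20), (0.35, 1.40)]
    y = ["absent"] * 3 + ["present"] * 3
    model = GaussianNaiveBayes().fit(X, y)
    print(model.predict([(0.29, 1.30), (0.02, 0.12)]))
```

The "naïve" part is the assumption that features are independent given the class, which is why the per-feature log likelihoods are simply summed.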
Location-Independent Classification. An important question is whether it is possible
to perform location-independent classification, i.e., to train the classifier with sensor
data of one location and then use it to classify sensor data at a target location that
is not similar to the training location in size, layout, and sensor positions. If this were
possible, the service providers of smart heating applications would be able to detect
occupancy and to recognize activities without having access to the target locations.
According to their privacy statements, popular smart thermostats from Nest [33],
Ecobee [10], and Honeywell [20] send measured climate data to the service providers’
databases. To evaluate these privacy threats, we used the room climate data of the
best-performing sensor of a location as the training data set for other locations. For example, to
classify events of an arbitrary sensor of Location A, we trained the classifier with room
climate data collected by Sensor B1 or Sensor C1. We obtained statistically significant
results for a few combinations in occupancy detection, but the majority of our occupancy
detection results were not significant. For activity recognition, we were not able to obtain
statistically significant results.
However, the possibility of location-independent attackers cannot be excluded. The
absence of significant results in our experiments may be merely due to the limited amount
of data. Future studies should be conducted to gather data from various rooms up to a
point where the combined results hold for arbitrary locations. Having more data from a
multitude of rooms available would help the machine learning classifiers to recognize
and ignore data characteristics that are specific to any one of the experimental rooms.
Consequently, the algorithms could better identify the distinct data characteristics of
the different classes in occupancy detection and activity recognition. This would enable
location-independent classification of room climate data, in which the training location
is not similar to the target location regarding size, layout, furnishing, and positions of
the sensors.
In a representative smart home survey of German consumers from 2015, 34% of the
participants stated that they are interested in technologies for intelligent heating or are
planning to acquire such a system [5]. Another survey with 1,000 US and 600 Canadian
consumers found that for 72% of them, the most desired smart home device would be
a self-adjusting thermostat, and 37% reported that they were likely to purchase one in
the next 12 months [22]. Sharing smart home data with providers and third parties is a
popular idea and a controversial issue for consumers. For example, in a recent representative
survey with 461 American adults by Pew Research [35], the participants were presented
with a scenario of installing a smart thermostat “in return for sharing data about some
of the basic activities that take place in your house like when people are there and when
they move from room to room”. Of all respondents, 55% said that this scenario was not
acceptable for them, 27% said that it was acceptable, and the remaining 17% answered
“it depends”. Furthermore, in a worldwide survey with 9,000 respondents from nine
countries (Australia, Brazil, Canada, France, Germany, India, Mexico, the UK, and the
US), 54% of respondents said that “they might be willing to share their personal data
collected from their smart home with companies in exchange for money” [23].6
We think that the idea of sharing smart home data for various benefits will
continue to be intensively discussed in the future, and therefore, consumers and policy
makers should be made aware of the level of detail inferable from smart home data.
Which rewards are actually beneficial for consumers? Moreover, which kind of data
sharing is ethically permissible? Only by answering these questions will it be possible
to design fair policies and establish beneficial personal data markets [40]. In this work,
we take the first step towards informing the policy for the smart heating scenario.
7 Conclusions
We investigated the common belief that the data collected by room climate sensors
divulge private information about the occupants. To this end, we conducted experiments
that reflect realistic conditions, i.e., considering an attacker who has access to typical
room climate data (temperature and relative humidity) only. Our experiments revealed
that knowing a sequence of temperature and relative humidity measurements already
allows an attacker to detect the presence of people and to recognize certain activities with high
accuracy. Our results confirm that the assumption that room climate data needs
protection is justified: the leakage of such ‘inconspicuous’ sensor data as temperature
and relative humidity can seriously violate privacy in smart spaces. Future work is
required to determine the level of privacy invasion in more depth and to develop appropriate
countermeasures.
6 Methodological details, such as representativeness, breakdown by country, and the exact
formulation of the questions, are not known about this survey.
Acknowledgement
The work is supported by the German Research Foundation (DFG) under Grant AR
671/3-1: WSNSec – Developing and Applying a Comprehensive Security Framework
for Sensor Networks.
References
1. B. Ai, Z. Fan, and R. X. Gao. Occupancy estimation for smart buildings by an auto-regressive
hidden Markov model. In American Control Conference, ACC 2014, Portland, OR, USA,
June 4-6, 2014, pages 2234–2239. IEEE, 2014.
2. BSI. Protection Profile for the Gateway of a Smart Metering System (Smart Meter Gateway
PP). https://www.commoncriteriaportal.org/files/ppfiles/pp0073b_pdf.pdf, Mar. 2014.
3. A. Cavoukian, J. Polonetsky, and C. Wolf. SmartPrivacy for the Smart Grid: embedding pri-
vacy into the design of electricity conservation. Identity in the Information Society, 3(2):275–
294, 2010.
4. Chaos Computer Club: Guidelines for Smart Home Solutions.
https://www.ccc.de/en/updates/2016/smarthome. Feb. 2016 (in German).
5. Deloitte. Ready for Takeoff? Consumer Survey, July 2015.
6. B. Dong, B. Andrews, K. P. Lam, M. Höynck, R. Zhang, Y.-S. Chiou, and D. Benitez.
An information technology enabled sustainability test-bed (ITEST) for occupancy detection
through an environmental sensing network. Energy and Buildings, 42(7):1038–1046, 2010.
7. A. Dunkels, B. Grönvall, and T. Voigt. Contiki – a lightweight and flexible operating
system for tiny networked sensors. In 29th Annual IEEE International Conference on Local
Computer Networks, 2004, pages 455–462. IEEE, 2004.
8. A. Ebadat, G. Bottegal, D. Varagnolo, B. Wahlberg, and K. H. Johansson. Estimation of
building occupancy levels through environmental signals deconvolution. In BuildSys 2013,
Proceedings of the 5th ACM Workshop On Embedded Systems For Energy-Efficient
Buildings, Roma, Italy, November 13-14, 2013, pages 8:1–8:8, 2013.
9. A. Ebadat, G. Bottegal, D. Varagnolo, B. Wahlberg, and K. H. Johansson. Regularized
deconvolution-based approaches for estimating room occupancies. IEEE Trans. Automation
Science and Engineering, 12(4):1157–1168, 2015.
10. Ecobee. Privacy policy & terms of use, April 2015.
11. T. Ekwevugbe, N. Brown, V. Pakka, and D. Fan. Real-time building occupancy sensing using
neural-network based sensor network. In 7th IEEE International Conference on Digital
Ecosystems and Technologies (DEST), 2013, pages 114–119, July 2013.
12. European Union Agency For Network And Information Security. Security and
Resilience of Smart Home Environments – Good practices and recommendations.
https://www.enisa.europa.eu. Dec. 2015.
13. S. Fischer-Hübner and N. Hopper, editors. Privacy Enhancing Technologies - 11th
International Symposium, PETS 2011, Waterloo, ON, Canada, July 27-29, 2011. Proceedings,
volume 6794 of Lecture Notes in Computer Science. Springer, 2011.
14. U. Greveler, P. Glösekötter, B. Justus, and D. Loehr. Multimedia content identification
through smart meter power usage profiles. In Proceedings of the International Conference
on Information and Knowledge Engineering (IKE), 2012.
15. E. Hailemariam, R. Goldstein, R. Attar, and A. Khan. Real-time occupancy detection us-
ing decision trees with multiple sensor types. In 2011 Spring Simulation Multi-conference,
SpringSim ’11, Boston, MA, USA, April 03-07, 2011., pages 141–148, 2011.
16. M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten. The WEKA
Data Mining Software: An Update. SIGKDD Explor. Newsl., 11(1):10–18, Nov. 2009.
17. Z. Han, R. X. Gao, and Z. Fan. Occupancy and indoor environment quality sensing for
smart buildings. In 2012 IEEE International Instrumentation and Measurement Technology
Conference (I2MTC), pages 882–887, May 2012.
18. G. W. Hart. Residential energy monitoring and computerized surveillance via utility power
flows. Technology and Society Magazine, IEEE, 8(2):12–16, 1989.
19. T. Hastie, R. Tibshirani, and J. H. Friedman. The Elements of Statistical Learning. Springer,
New York, NY, USA, 2nd edition, 2009.
20. Honeywell. Honeywell connected home privacy statement, December 2015.
21. V. Huppert, J. Paulus, U. Paulsen, M. Burkart, B. Wullich, and B. Eskofier. Quantification of
Nighttime Micturition With an Ambulatory Sensor-Based System. IEEE Journal of Biomed-
ical and Health Informatics, 20(3):865–872, May 2016.
22. icontrol Networks: 2015 State of the Smart Home Report.
https://www.icontrol.com/blog/2015-state-of-the-smart-home-report.
23. Intel Security: Intel Security’s International Internet of Things Smart Home Survey Shows
Many Respondents Sharing Personal Data for Money. https://newsroom.intel.com/news-
releases/intel-securitys-international-internet-of-things-smart-home-survey. Mar. 2016.
24. M. Jawurek, M. Johns, and F. Kerschbaum. Plug-in privacy for smart metering billing. In
Fischer-Hübner and Hopper [13], pages 192–210.
25. M. Jawurek, F. Kerschbaum, and G. Danezis. SoK: Privacy technologies for smart grids – a
survey of options. Microsoft Res., Cambridge, UK, 2012.
26. U. Jensen, P. Blank, P. Kugler, and B. Eskofier. Unobtrusive and Energy-Efficient Swimming
Exercise Tracking Using On-Node Processing. IEEE Sensors Journal, 16(10):3972–3980,
May 2016.
27. K. Kursawe, G. Danezis, and M. Kohlweiss. Privacy-friendly aggregation for the smart-grid.
In Fischer-Hübner and Hopper [13], pages 175–191.
28. K. P. Lam, M. Höynck, B. Dong, B. Andrews, Y.-S. Chiou, D. Benitez, and J. Choi.
Occupancy detection through an extensive environmental sensor network in an open-plan
office building. In Proc. of Building Simulation 09, an IBPSA Conference, 2009.
29. J. Lu, T. Sookoor, V. Srinivasan, G. Gao, B. Holben, J. Stankovic, E. Field, and K. White-
house. The smart thermostat: using occupancy sensors to save energy in homes. In Proceed-
ings of the 8th ACM Conference on Embedded Networked Sensor Systems, pages 211–224.
ACM, 2010.
30. M. K. Masood, Y. C. Soh, and V. W. Chang. Real-time occupancy estimation using envi-
ronmental parameters. In 2015 International Joint Conference on Neural Networks, IJCNN
2015, Killarney, Ireland, July 12-17, 2015, pages 1–8. IEEE, 2015.
31. A. Molina-Markham, P. Shenoy, K. Fu, E. Cecchet, and D. Irwin. Private memoirs of a smart
meter. In Proceedings of the 2nd ACM Workshop on Embedded Sensing Systems for
Energy-Efficiency in Building, BuildSys ’10, pages 61–66, New York, NY, USA, 2010. ACM.
32. Moteiv Corporation. Tmote Sky Datasheet, 2006.
33. Nest. Privacy statement for nest products and services, March 2016.
34. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for
Statistical Computing, Vienna, Austria, 2014.
35. L. Rainie and M. Duggan. Pew Research: Privacy and Information Sharing.
http://www.pewinternet.org/2016/01/14/privacy-and-information-sharing. Jan. 2016.
36. A. Reinhardt, F. Englert, and D. Christin. Averting the privacy risks of smart metering by
local data preprocessing. Pervasive and Mobile Computing, 16:171–183, 2015.
37. A. Rial and G. Danezis. Privacy-preserving smart metering. In Proceedings of the 10th
Annual ACM Workshop on Privacy in the Electronic Society, WPES ’11, pages 49–60, New
York, NY, USA, 2011. ACM.
38. M. Ring, U. Jensen, P. Kugler, and B. Eskofier. Software-based performance and complexity
analysis for the design of embedded classification systems. In Proceedings of the 21st Inter-
national Conference on Pattern Recognition, ICPR 2012, Tsukuba, Japan, November 11-15,
2012, pages 2266–2269. IEEE Computer Society, 2012.
39. M. Selinger. Test: Smart Home Kits Leave the Door Wide Open – for Ev-
eryone. https://www.av-test.org/en/news/news-single-view/test-smart-home-kits-leave-the-
door-wide-open-for-everyone/. Apr. 2014.
40. S. Spiekermann, A. Acquisti, R. Böhme, and K.-L. Hui. The challenges of personal data
markets and privacy. Electronic Markets, 25(2):161–167, 2015.
41. T. van Kasteren, A. Noulas, G. Englebienne, and B. Kröse. Accurate activity recognition in
a home setting. In Proceedings of the 10th International Conference on Ubiquitous
Computing. ACM, 2008.
42. I. H. Witten, E. Frank, and M. A. Hall. Data Mining: Practical Machine Learning Tools and
Techniques. Morgan Kaufmann, Burlington, MA, USA, 3rd edition, 2011.
43. D. Wörner, T. von Bomhard, M. Roeschlin, and F. Wortmann. Look twice: Uncover hidden
information in room climate sensor data. In 4th International Conference on the Internet of
Things, IoT 2014, Cambridge, MA, USA, October 6-8, 2014, pages 25–30. IEEE, 2014.
44. W. Yang, N. Li, Y. Qi, W. Qardaji, S. McLaughlin, and P. McDaniel. Minimizing private data
disclosures in the smart grid. In Proceedings of the 2012 ACM Conference on Computer and
Communications Security, pages 415–427. ACM, 2012.
45. Z. Yang, N. Li, B. Becerik-Gerber, and M. D. Orosz. A systematic approach to occupancy
modeling in ambient sensor-rich buildings. Simulation, 90(8):960–977, 2014.
46. R. Zhang, K. P. Lam, Y.-S. Chiou, and B. Dong. Information-theoretic environment features
selection for occupancy detection in open office spaces. Building Simulation, 5(2):179–188,
2012.
A Additional Material
A.1 Experimental Procedure
The participants were assigned to at least one experimental unit with fixed presence
times and tasks, and provided with a script for their actions (that is, for how long and
in which order the tasks should be performed). Every participant performed each unit
twice, with the same tasks, but possibly on different days and in a permuted
chronological order. Tasks were performed in blocks of 10, 20, or 30 minutes. Thus, 10-minute
units contained only one task of 10 minutes; 30-minute units consisted of either three
tasks of 10 minutes or one task of 10 plus one of 20 minutes; 60-minute units were composed
of either two tasks of 20 plus two of 10 minutes, or one task each of 10, 20, and 30 minutes.
At the beginning of the presence time for each unit, i.e., the time period where a
person had to be present, the experimental supervisor unlocked the room door to let the
participant in. The participant started with the first task and was instructed by phone
(at Locations A and C) or through the glass pane (at Location B) when it was time to
change activities or to leave the room.
Overall, we defined 22 units per location, consisting of six 60-minute, eight
30-minute, and eight 10-minute units. Furthermore, the distribution of units and tasks
was identical for all locations. Read and Work account for 180 minutes each,
whereas Stand and Walk account for 160 minutes each. A comprehensive overview of the
distribution of tasks, number of tasks (per unit and block), and aggregated values is
provided by Table 5.
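The unit counts and per-task totals stated above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers given in this section; the variable and function names are hypothetical.

```python
# Number of units per location, keyed by unit length in minutes.
UNITS_PER_LOCATION = {60: 6, 30: 8, 10: 8}

# Total minutes per task at one location.
TASK_MINUTES = {"Read": 180, "Work": 180, "Stand": 160, "Walk": 160}


def total_units(units):
    """Number of experimental units at one location."""
    return sum(units.values())


def total_presence_minutes(units):
    """Total presence time at one location, summed over all units."""
    return sum(length * count for length, count in units.items())


if __name__ == "__main__":
    print("units:", total_units(UNITS_PER_LOCATION))
    print("presence minutes:", total_presence_minutes(UNITS_PER_LOCATION))
    print("task minutes:", sum(TASK_MINUTES.values()))
```

Both sums come to 680 minutes per location (6 x 60 + 8 x 30 + 8 x 10 = 180 + 160 + 160 + 180), so the per-task totals account for the entire presence time of the 22 units.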
Table 5: Overview of the number and distribution of tasks and units at one location.
n x t denotes the number n of recorded t-units (i.e., t is the time of presence in minutes,
t ∈ {10, 30, 60}); t_task denotes the defined task block lengths per unit. For instance, in
a total of six 60-minute units, Read and Work account for two 30-minute blocks each,
whereas in a total of eight 10-minute units, all tasks account for two blocks of
10 minutes each.
Units Tasks
nxt ttask Read Stand Walk Work
30 2 – – 2
6x60 20 2 4 2
10 8– –
8x30 20 2 2 2
10 2 6 2 2
8x10 10 2 2 2 2
Total time [min] 180 160 160 180
... In our experiments, we use a real-world dataset taken from [43], which consists of temperature and humidity sensor readings of three to five sensor nodes distributed in three different rooms. ...
... For training and testing the model, we use a similar experimental setting as the original paper [43], i.e. we split the sensor readings of each node in windows of 150 samples each, where each window is obtained by shifting the previous one by 30 samples. Overall, we obtain about 800, 1400 and 450 windows for the human presence task, and 300, 450 and 150 windows for the human activity task in room A, B and C respectively. ...
... It is important to remark that in [43] no communication occurs between nodes. Rather, the authors extract the relevant features from the sensor readings and use these features to train a model -separately for each node-in order to detect the human presence and activity. ...
Article
Full-text available
In several network problems the optimal behavior of the agents (i.e., the nodes of the network) is not known before deployment. Furthermore, the agents might be required to adapt, i.e. change their behavior based on the environment conditions. In these scenarios, offline optimization is usually costly and inefficient, while online methods might be more suitable. In this work, we use a distributed Embodied Evolution approach to optimize spatially distributed, locally interacting agents by allowing them to exchange their behavior parameters and learn from each other to adapt to a certain task within a given environment. Our results on several test scenarios show that the local exchange of information, performed by means of crossover of behavior parameters with neighbors, allows the network to conduct the optimization process more efficiently than the cases where local interactions are not allowed, even when there are large differences on the optimal behavior parameters within each agent’s neighborhood.
... This condition became a particular concern in this research, which is why datasets with complex patterns may be beneficial in training environmental climate prediction models. To represent the SDD and greenhouse, this research used a complex dataset called the Room Climate Dataset, which is composed of various Pearson Correlation Coefficient (PCC) values [8]. This challenging dataset intuitively encourages this research to investigate the great fame of deep learning approaches for predicting sequence-to-sequence cases by implementing the Luong attention mechanism. ...
... The most similar research to ours was done by Gunawan et al., in which they evaluated four deep learning models, which included a two-layered LSTM model, a two-layered GRU model, a Transformer model, and a Transformer model with learnable positional encoding applied to the room climate dataset [8,18]. Unlike this research, which predicted a sequence output of five timestamps in the future, the study conducted by Gunawan et al. predicted only one timestamp in the future, resulting in extremely strong 2 scores. ...
... The Room Climate Dataset, which was acquired from the indoor climate experiment conducted by Morgner et al. [8], is publicly available on GitHub and was used in this research. This research continued the previous research done by Gunawan et al., which used the same dataset [18]. ...
Article
Full-text available
The Solar Dryer Dome (SDD), a solar-powered agronomic facility for drying, retaining, and processing comestible commodities, needs smart systems for optimizing its energy consumption. Therefore, indoor condition variables such as temperature and relative humidity need to be forecasted so that actuators can be scheduled, as the largest energy usage originates from actuator activities such as heaters for increasing indoor temperature and dehumidifiers for maintaining optimal indoor humidity. To build such forecasting systems, prediction models based on deep learning for sequence-to-sequence cases were developed in this research, which may bring future benefits for assisting the SDDs and greenhouses in reducing energy consumption. This research experimented with the complex publicly available indoor climate dataset, the Room Climate dataset, which can be represented as environmental conditions inside an SDD. The main contribution of this research was the implementation of the Luong attention mechanism, which is commonly applied in Natural Language Processing (NLP) research, in time series prediction research by proposing two models with the Luong attention-based sequence-to-sequence (seq2seq) architecture with GRU and LSTM as encoder and decoder layers. The proposed models outperformed the adapted LSTM and GRU baseline models. The implementation of Luong attention had been proven capable of increasing the accuracy of the seq2seq LSTM model by reducing its test MAE by 0.00847 and RMSE by 0.00962 on average for predicting indoor temperature, as well as decreasing 0.068046 MAE and 0.095535 RMSE for predicting indoor humidity. The application of Luong's attention also improved the accuracy of the seq2seq GRU model by reducing the error by 0.01163 in MAE and 0.021996 in RMSE for indoor humidity. 
However, the implementation of Luong attention in seq2seq GRU for predicting indoor temperature showed inconsistent results by reducing approximately 0.003193 MAE and increasing roughly 0.01049 RMSE. Doi: 10.28991/CEJ-2023-09-05-06 Full Text: PDF
... Similar research had been conducted using a modified LSTM dubbed the Greenhouse Climate Prediction-LSTM (GCP-LSTM) [19], Se-riesNet [20], Convex Bidirectional-Extreme Learning Machine (CB-ELM) [21], Bidirectional LSTM, and Bidirectional Selfattentive Encoder-decoder Framework [22]. Our previous research had used Transformer models in addition to the LSTM and GRU, and all models obtained outstanding results with 0.99 R 2 value [23] on the publicly available Room Climate dataset [24]. Nevertheless, our prior study in this case [23] raised an issue despite of the near-perfect evaluation results. ...
... Smart Home Internet of Things (IoT) technologies increasingly collect seemingly inconspicuous data from homes using simple sensors, e.g. for temperature or light etc. Prior work has shown that not only automated data analytics [7,8] but also human sensemaking of such sensor data, as part of human data interaction, can reveal domestic activities [3,10]. Our research using our method Guess the Data [6] furthermore showed that this ability is not limited to experts or members of a household but also possible through using 'situated knowledge' [4] for others, e.g. ...
Preprint
Human data interaction with sensor data from smart homes can cause some implications when it comes to human sensemaking of this data. With our data-driven method Guess the Data for individual and collective data work we revealed in previous work a number of potential pitfalls when interacting with this type of data. We introduce some of the identified, often wicked implications for further discussion.
Conference Paper
Precise predictions of indoor climate conditions are required in the implementation of Smart Solar Dryer Dome (SDD). Trend development of prediction models is discussed in this review from 15 selected research papers (2018-2022) on indoor climate prediction which was obtained from research paper databases The output shows that the most used model for predicting indoor climate is Artificial Neural Network (ANN), especially Recurrent Neural Network (RNN) such as LSTM and GRU. However, there are some potential methods such as Transformer, Combined Support Vector Machines (SVM)-Deep Learning, and sequence-to-sequence which could outperform other commonly used models. Based on findings various opportunities exist to improve the precision of indoor climate prediction, which can bring power consumption efficiency and others benefit to Smart SDD users. Such studies may further be explored to produce more accurate machine learning models.
Conference Paper
Solar Dryer Dome (SDD), an agricultural facility for drying and preserving agricultural products, needs a smart ability to predict the future indoor climate accurately, including indoor temperature and indoor humidity, in order to optimize electricity usage. To overcome these challenges, deep learning has been a widely adopted method. This research aims to forecast the future indoor climate using time series data by implementing a sequence-to-sequence (seq2seq) architecture, which is mostly used in Natural Language Processing (NLP) tasks. The two proposed seq2seq models, Long Short-Term Memory (LSTM) seq2seq and Gated Recurrent Unit (GRU) seq2seq, have proven to be superior to the adapted LSTM and GRU. The results show that the seq2seq GRU model outperforms the adapted GRU baseline model by an average difference of 0.03013 in MAE and the seq2seq LSTM model outperforms the adapted LSTM baseline model by an average difference of 0.00941 in MAE. To the best of our knowledge, this is the first implementation of seq2seq models for indoor climate forecasting on the Room Climate dataset.
Article
Smart buildings are socio-technical systems that bring together building systems, IoT technology and occupants. A multitude of embedded sensors continually collect and share building data on a large scale, which is used to understand and streamline daily operations. Much of this data is highly influenced by the presence of building occupants and could be used to monitor and track their location and activities. The combination of open accessibility to smart building data and the rapid development and enforcement of data protection legislation such as the GDPR and CCPA makes the privacy of smart building occupants a concern. Until now, little if any research exists on occupant privacy in work-based or commercial smart buildings. This paper addresses this gap by conducting two user studies (N = 81 and N = 40) on privacy concerns and preferences about smart buildings. The first study explores the perception of the occupants of a state-of-the-art commercial smart building, and the latter reflects on the concerns and preferences of a more general user group who do not use this building. Our results show that the majority of the participants are not familiar with the types of data being collected, with the fact that it is subtly related to them (only 19.75% of smart building residents (occupants) and 7.5% of non-residents), or with the privacy risks associated with it. After being informed more about smart buildings and the data they collect, over half of our participants said that they would be concerned with how occupancy data is used. These findings show that, despite the more public environment, the levels of privacy concern for some sensors are similar to those of people living in smart homes. The participants called for more transparency in the data collection process and beyond, which means that better policies and regulations should be in place for smart building data.
Article
Knowledge engineering relies on ontologies, since they provide formal descriptions of real-world knowledge. However, ontology development is still a nontrivial task. From the view of knowledge engineering, ontology learning is helpful in generating ontologies semi-automatically or automatically from scratch. It not only improves the efficiency of the ontology development process but also has been recognized as an interesting approach for extending preexisting ontologies with new knowledge discovered from heterogeneous forms of input data. Driven by the great potential of ontology learning, we present an automatic ontology-based model evolution approach to account for highly dynamic environments at runtime. This approach can extend initial models expressed as ontologies to cope with rapid changes encountered in surrounding dynamic environments at runtime. The main contribution of our approach is that it analyzes heterogeneous semi-structured input data for learning an ontology, and it makes use of the learned ontology to extend an initial ontology-based model. Within this approach, we aim to automatically evolve an initial ontology-based model through ontology learning. The approach is illustrated using a proof-of-concept implementation that demonstrates the ontology-based model evolution at runtime. Finally, a threefold evaluation process is carried out to assess the quality of the evolved ontology-based models. First, we consider a feature-based evaluation for evaluating the structure and schema of the evolved models. Second, we adopt a criteria-based evaluation to assess the content of the evolved models. Third, we perform an expert-based evaluation to assess the coverage of the initial and evolved models from an expert’s point of view. The experimental results reveal that the quality of the evolved models is relevant in considering the changes observed in the surrounding dynamic environments at runtime.
Article
We address the problem of estimating the number of people in a room using information available in standard HVAC systems. We propose an estimation scheme based on two phases. In the first phase, we assume the availability of pilot data and identify a model for the dynamic relations occurring between occupancy levels, ${\rm CO}_{2}$ concentration and room temperature. In the second phase, we make use of the identified model to formulate the occupancy estimation task as a deconvolution problem. In particular, we aim at obtaining an estimated occupancy pattern by trading off between adherence to the current measurements and regularity of the pattern. To achieve this goal, we employ a special instance of the so-called fused lasso estimator, which promotes piecewise constant estimates by including an $\ell_{1}$ norm-dependent term in the associated cost function. We extend the proposed estimator to include different sources of information, such as actuation of the ventilation system and door opening/closing events. We also provide conditions under which the occupancy estimator provides correct estimates within a guaranteed probability. We test the estimator running experiments on a real testbed, in order to compare it with other occupancy estimation techniques and assess the value of having additional information sources.
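The trade-off described above can be illustrated with a generic fused-lasso-type deconvolution cost. This is a sketch of the general form only, not the exact cost function of the paper; the symbols $y_t$ (measurement at time $t$), $h$ (impulse response of the identified model), $x_t$ (occupancy level) and $\lambda$ (regularization weight) are assumptions introduced here for illustration.

$$\hat{x} = \arg\min_{x \in \mathbb{R}^{T}} \; \frac{1}{2} \sum_{t=1}^{T} \bigl( y_t - (h * x)_t \bigr)^{2} + \lambda \sum_{t=2}^{T} \lvert x_t - x_{t-1} \rvert$$

The first term rewards adherence to the measurements, while the $\ell_{1}$ penalty on consecutive differences drives most differences $x_t - x_{t-1}$ to exactly zero, which is what yields piecewise constant occupancy estimates. A larger $\lambda$ gives fewer jumps in the estimated pattern.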
Article
Connected sensors are on the march to become pervasive. While they are often deployed for a single purpose, it is worth taking a second look. In this study, we show that the widespread Netatmo weather station, which is intended to monitor and improve indoor climate, can be used to estimate binary occupancy of individual rooms. We collected data from 11 rooms in 3 apartments, including binary occupancy, for several days. We show that CO2 measurements and derivatives thereof qualify as observables to be used in Hidden Markov Models and achieve accuracies well above 75% in most cases. However, we see that the accuracy metric is often misleading for such time-series data, and we therefore consider additional performance metrics as well, which show varying results depending on the respective occupancy patterns of a room.
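To make the idea of decoding binary occupancy from CO2 derivatives with a Hidden Markov Model concrete, here is a minimal sketch of a two-state Viterbi decoder. It is not the study's implementation: the Gaussian emission parameters, the `stay_prob` transition probability, and the function names are all hypothetical values chosen for illustration.

```python
import math


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution evaluated at x."""
    return (-0.5 * math.log(2 * math.pi * std * std)
            - (x - mean) ** 2 / (2 * std * std))


def viterbi_occupancy(co2_ppm, stay_prob=0.95):
    """Most likely 0/1 occupancy sequence (0 = vacant, 1 = occupied).

    The observable is the first difference of the CO2 readings, so the
    result has one entry fewer than the input.
    """
    # Assumed emission model: CO2 falls slowly when vacant, rises when occupied.
    emission = {0: (-5.0, 10.0), 1: (15.0, 10.0)}  # state -> (mean, std)
    log_stay = math.log(stay_prob)
    log_switch = math.log(1.0 - stay_prob)

    deltas = [b - a for a, b in zip(co2_ppm, co2_ppm[1:])]
    if not deltas:
        return []

    # Start from a uniform prior over both states.
    score = [math.log(0.5) + gaussian_logpdf(deltas[0], *emission[s])
             for s in (0, 1)]
    backpointers = []

    for d in deltas[1:]:
        new_score, pointers = [], []
        for s in (0, 1):
            candidates = [score[p] + (log_stay if p == s else log_switch)
                          for p in (0, 1)]
            best_prev = 0 if candidates[0] >= candidates[1] else 1
            new_score.append(candidates[best_prev]
                             + gaussian_logpdf(d, *emission[s]))
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)

    # Backtrack from the best final state.
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy trace in ppm (made-up values): CO2 decays, then rises steadily.
trace = [400, 395, 390, 385, 405, 425, 445, 465]
decoded = viterbi_occupancy(trace)
```

In practice the emission and transition parameters would be estimated from labelled data rather than fixed by hand, and the high `stay_prob` is what keeps the decoded sequence from flickering on single noisy samples.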
Article
Personal data is increasingly conceived as a tradable asset. Markets for personal information are emerging and new ways of valuating individuals’ data are being proposed. At the same time, legal obligations over protection of personal data and individuals’ concerns over its privacy persist. This article outlines some of the economic, technical, social, and ethical issues associated with personal data markets, focusing on the privacy challenges they raise. © 2015, Institute of Information Management, University of St. Gallen.
Book
This book constitutes the refereed proceedings of the 10th International Symposium, PETS 2011, held in Waterloo, Canada, in July 2011. The 15 revised full papers were carefully reviewed and selected from 61 submissions. The papers address design and realization of privacy services for the Internet, other data systems and communication networks. Presenting novel research on all theoretical and practical aspects of privacy technologies, as well as experimental studies of fielded systems, the volume also features novel technical contributions from other communities, such as law, business, and data protection authorities, that present their perspectives on technological issues.
Book
Data Mining: Practical Machine Learning Tools and Techniques, Fourth Edition, offers a thorough grounding in machine learning concepts, along with practical advice on applying these tools and techniques in real-world data mining situations. This highly anticipated fourth edition of the most acclaimed work on data mining and machine learning teaches readers everything they need to know to get going, from preparing inputs, interpreting outputs, and evaluating results, to the algorithmic methods at the heart of successful data mining approaches. Extensive updates reflect the technical changes and modernizations that have taken place in the field since the last edition, including substantial new chapters on probabilistic methods and on deep learning. Accompanying the book is a new version of the popular WEKA machine learning software from the University of Waikato. Authors Witten, Frank, Hall, and Pal include today's techniques coupled with the methods at the leading edge of contemporary research. The book companion website at http://www.cs.waikato.ac.nz/ml/weka/book.html contains PowerPoint slides for Chapters 1-12 (a very comprehensive teaching resource, with many slides covering each chapter of the book); an online appendix on the WEKA workbench (again a very comprehensive learning aid for the open source software that goes with the book); and the table of contents, highlighting the many new sections in the 4th edition, along with reviews of the 1st edition, errata, etc.
The book provides a thorough grounding in machine learning concepts, as well as practical advice on applying the tools and techniques to data mining projects; presents concrete tips and techniques for performance improvement that work by transforming the input or output in machine learning methods; includes a downloadable WEKA software toolkit, a comprehensive collection of machine learning algorithms for data mining tasks, in an easy-to-use interactive interface; and includes open-access online courses that introduce practical applications of the material in the book.
Article
Body-worn sensors for movement analysis in swimming have to be unobtrusive and energy-efficient. We present a swimming exercise tracker for the unobtrusive positioning at the back of the head and an energy-efficient analysis using an on-node implementation. To develop the system, we collected head kinematics from 11 subjects in two 200-m medley races comprising breaks, turns, and four swimming styles. Each subject was equipped with a 6-D inertial measurement unit and completed one session in rested and fatigued state. Data were analyzed with a classification system, whereby different classifiers, window sizes, and feature sets were evaluated. Algorithm selection for on-node processing was performed on the basis of classifier accuracy and computational cost. The algorithm with the best tradeoff in accuracy and computational cost was selected and had a classification rate of 85.4%. Energy consumption of both on-node processing and Bluetooth streaming was evaluated on the Shimmer sensor platform. The results revealed energy savings of over 60% when data were processed on the sensor node. The presented analysis approach can be easily applied to other data analysis tasks, and the presented toolchain can support the rapid development of wearable systems in sports and healthcare.
Article
Among elderly males, benign prostate syndrome (BPS) is the most common urinary disorder. Nocturia is one of the major symptoms of BPS and has a considerable influence on quality of life (QoL). For assessment of BPS (including nocturia), the International Prostate Symptom Score (IPSS) is widely used, but questionnaires are prone to bias. To date, there is no objective measurement system available for nocturia. In this study, we present an unobtrusive and non-stigmatizing device for objective measurement of nighttime micturition. In a preliminary study of 6 males diagnosed with BPS and nighttime micturition ≥ 2 times, we showed that the device is accurate, with an average misdetection rate of 0.32 events and a mean absolute deviation of 3.8% when comparing the average number of nighttime micturition occurrences. In this extended study, an additional 9 males were recorded and data from an occupancy sensor were also included. The results of the preliminary study were confirmed with an average misdetection rate of 0.33 events and a mean absolute deviation of 9.1%. The system can therefore be used to objectively measure nighttime micturition, and thereby provide the basis for treatment, e.g., medication efficacy assessment.