Reinforcement learning of occupant behavior model for cross-building transfer learning to various HVAC control systems

Zhipeng Deng1, Qingyan Chen1,*

1Center for High Performance Buildings (CHPB), School of Mechanical Engineering, Purdue University, 585 Purdue Mall, West Lafayette, IN 47907, USA

*Corresponding author: Qingyan Chen, yanchen@purdue.edu
Abstract

Occupant behavior plays an important role in the evaluation of building performance. However, many contextual factors, such as occupancy, mechanical system and interior design, have a significant impact on occupant behavior. Most previous studies have built data-driven behavior models, which have limited scalability and generalization capability. Our investigation built a policy-based reinforcement learning (RL) model for the behavior of adjusting the thermostat and clothing level. Occupant behavior was modelled as a Markov decision process (MDP). The action and state space in the MDP contained occupant behavior and various impact parameters. The goal of the occupant behavior was a more comfortable environment, and we modelled the reward for the adjustment action as the absolute difference in the thermal sensation vote (TSV) before and after the action. We used Q-learning to train the RL model in MATLAB and validated the model with collected data. After training, the model predicted the behavior of adjusting the thermostat set point with R² from 0.75 to 0.8, and the mean absolute error (MAE) was less than 1.1 °C (2 °F) in an office building. This study also transferred the behavior knowledge of the RL model to other office buildings with different HVAC control systems. The transfer learning model predicted the occupant behavior with R² from 0.73 to 0.8, and the MAE was less than 1.1 °C (2 °F) most of the time. Going from office buildings to residential buildings, the transfer learning model also had an R² over 0.6. Therefore, the RL model combined with transfer learning was able to predict the building occupant behavior accurately with good scalability, and without the need for data collection.

Keywords

Thermal comfort, machine learning, artificial neural network, air temperature, thermostat set point, Q-learning, building performance simulation
1. Introduction

In the United States, buildings account for 41% of primary energy use, mainly for maintaining a comfortable and healthy indoor environment [1]. Unfortunately, current methods for simulating building energy consumption are often inaccurate, and the error can be as high as 150% to 250% [2, 3]. Discrepancies between the simulated and actual energy consumption may arise from the varied occupant behavior in buildings [4, 5]. Therefore, it is important to estimate the impact of occupant behavior on building energy consumption [6].
Occupant behavior in buildings refers to occupants’ movements and their interactions with building components such as thermostats, windows, lights, blinds and internal equipment [7]. The existing methods for exploring the effects of occupant behavior on energy consumption were mostly based on building performance simulations [8]. In these simulations, modelling occupant behavior is challenging due to its complexity [9, 10, 11]. Previous studies have tried to predict the energy consumption in commercial and residential buildings with the use of various occupant behavior models. These models can be divided into three categories: data-driven, physics-based and hybrid models.
In the data-driven category, many researchers have built linear regression models [12], logistic regression models [13, 14], statistical models [15, 16], and artificial neural network (ANN) models [17]. To be specific, Andersen [12] and Fabi [13] collected data on occupants’ heating set points in dwellings and predicted the thermal preference along with indoor environmental quality and heating demand. Langevin’s model [14] used heating set-point data from a one-year field study in an air-conditioned office building. Sun and Hong [16] used a simulation approach to estimate energy savings for five common types of occupant behavior in a real office building across four typical climates. Deng and Chen [17] collected data in an office building for one year to predict occupant behavior in regard to thermostat and clothing level by means of an ANN model. In these studies, the models considered different variables that affect occupant behavior in buildings. However, the generalization capabilities of these data-driven models were not good [18], since the occupant behavior differed from building to building. Some review papers [19, 20] have discussed contextual factors that cause occupant behavior to vary greatly, such as room occupancy, availability and accessibility of an HVAC system, and interior design. The authors observed that it was difficult to apply an occupant behavior model developed for one building to another building. Hong et al. also indicated that, because a large number of data-driven behavior models emerged in scattered locations around the world, they lack standardization and consistency and cannot easily be compared with one another [21]. Moreover, all the data-driven models require sufficient data for training, but the estimation of building energy and modelling of occupant behavior are done mostly during the early design stages, when collecting occupant behavior data is impossible [22]. It is therefore hard to build a data-driven occupant behavior model with satisfactory generalization capability, or to build one at all when no data are available.
As for the physics-based models, a review by Jia et al. [23] pointed out that occupant behavior modelling has progressed from deterministic or static models to more detailed and complex ones. Therefore, many researchers have based their models on the causal relationships of occupant behavior. The driving factors of occupant behavior can be divided into three main types: environmentally related, time related and random factors [20, 24]. Hong et al. developed a DNAS (drivers, needs, actions, systems) framework that standardized the representation of energy-related occupant behavior in buildings [21]. Many researchers have adopted this framework for their behavior studies. For example, dynamic Bayesian networks by Tijani et al. [25] simulated the occupant behavior in office buildings as it relates to indoor air quality. The advantage of the Bayesian network model was in its representation of occupant behavior as probabilistic cause-effect relationships based on prior knowledge. D’Oca et al. [26] built a knowledge discovery database for window-operating behavior in 16 offices. Zhou et al. [27] used an action-based Markov chain approach to predict window-operating actions in office spaces. They found that the Markov chain reflected the actual behavior accurately in an open-plan office and was therefore a beneficial supplemental module for energy simulation software. The Markov chain model depends on the previous state to predict the probability of an event occurring. This characteristic is useful for representing individuals’ actions and motivations [9]. In addition, many researchers have built other kinds of models for different building types and scenarios. For instance, hidden Markov models [23, 28] were used to simulate occupant behavior with unobservable hidden states, and thus these models could be employed under very complicated conditions. Survival models [29] could feature different occupant types to mimic variations in control behavior. Meanwhile, a decision tree model [30, 31] regarded occupant decisions and possible behavior as branched graphical classification. This model was straightforward, but complex causal factors in real situations might give rise to too many branches. In recent years, more complex agent-based models [32-34] have yielded good predictions of occupant behavior with individual differences among occupants. In short, physics-based occupant behavior models with physical meaning have exhibited better generalization capability than data-driven models. Hence, the present study used a Markov decision process (MDP) to model occupant behavior and built a logic-based reinforcement learning model to explore the model’s scalability.
Reinforcement learning (RL) is a machine learning area concerned with the ways in which agents take actions to maximize certain rewards [35]. Off-policy RL can use historical data for training without interacting with the environment. In contrast, policy-based reinforcement learning does not require previous training data because it creates its own experience via random explorations of the environment. As such, this way of learning can obtain rules and knowledge not limited to specific conditions but adaptable to various scenarios. It has been applied successfully to a range of fields, including robot control [36] and playing Go [37]. In the built environment, the RL model has been used to improve building energy efficiency and management when the reward is defined as minimizing building energy consumption [38-40]. For instance, Zhang et al. [38] used deep reinforcement learning to control a radiant heating system in an existing office building and achieved a 16.7% reduction in heating demand. A multi-agent reinforcement learning framework by Kazmi et al. [39] achieved a 20% reduction in the energy required for the hot water systems in over 50 houses. Liang [40] modelled an HVAC scheduling system control as an MDP, and the model did not require prior knowledge of the building thermal dynamics model. Similarly, when the reward is the thermal comfort level of occupants, the RL model can be used to control the thermal comfort and HVAC system in buildings [41, 42]. For example, Yoon et al. [43] built performance-based comfort control for cooling while minimizing the energy consumption. Ruelens and coauthors [44] used model-free RL for a heat-pump thermostat. Their learning agent reduced the energy consumption by 4-9% during 100 winter days and by 9-11% during 80 summer days. Azuatalam et al. [45] applied RL to the optimal control of whole-building HVAC systems while harnessing RL’s demand response capabilities. Similarly, Chen [46] and Ding [47] developed novel deep RL methods to reduce the size of the training data set and the training time. Meanwhile, several previous studies used the RL model for advanced building control [43, 48, 49] and lighting control [50]. In addition, there have been some integrated applications. For example, Valladares et al. [51] used the RL model with a probability of reward combination to improve both the thermal comfort and indoor air quality in buildings. The RL model developed by Brandi et al. [52] optimized indoor temperature control and heating energy consumption in buildings. Ding et al. [53] also employed a novel deep RL framework for optimal control of building subsystems, including HVAC, lighting, blinds and windows. Hence, RL can be used to model the HVAC system for both thermal comfort and energy management. Physics-based and model-free RL also have the potential to model occupant behavior without data, since the logic is very similar. Therefore, this research built an RL model for thermostat set point and clothing level adjustment behavior based on the correlation between thermal sensation and thermally influenced occupant behavior [17].
For modelling occupant behavior in buildings with limited information and no data, transfer learning is a feasible approach [18]. The transfer learning method stores knowledge about one problem and then applies it to a related problem. It has been used for cross-building [54, 55], cross-home [56] and even cross-city [57] energy modelling. For instance, Mocanu et al. [58] transferred a building energy prediction to a new building in a smart grid. Ribeiro et al. [59] used various machine learning methods to predict school building energy and transferred the prediction to other new schools. Gao et al. [60] built a transfer learning model for thermal comfort prediction in multiple cities. Xu et al. [61] conducted transfer learning for HVAC control between buildings with different sizes, numbers of thermal zones, materials, layouts, air conditioner types, and ambient weather conditions. They found that this approach significantly reduced the training time and energy cost. Therefore, based on the potential of transfer learning, we used it to transfer knowledge about occupant behavior from one building to other buildings.
The purpose of the present study was to build an RL occupant behavior model for thermostat and clothing level adjustment in a particular building, and to transfer the model to other buildings with different HVAC control systems. For this purpose, we first built an MDP of the occupant behavior and used a thermal sensation model to build the rewards. We then trained the RL model with the use of Q-learning. Next, we used transfer learning to explore the occupant behavior in several other buildings. We also validated the RL occupant behavior model and the transferred model with data collected from various buildings. Finally, we analyzed the simulated building energy performance with the use of the RL model and the transferred model.
2. Methods

To develop an occupant behavior model, we first modeled the occupant behavior as an MDP and developed the RL model on the basis of this process. Subsequently, we trained the model with the use of a Q-learning algorithm. Next, we transferred the knowledge of the occupant behavior model from one building with manual control to other buildings with thermostat setback and occupancy control systems. Finally, we validated the transfer learning model with collected data. Fig. 1 summarizes the methods and models in this study.
Fig. 1. Flow chart of methods in this study, including the reinforcement learning occupant behavior model, transfer learning model and energy simulation
2.1 Framework of reinforcement learning model
As shown in Fig. 2, in the RL model, an agent can gather information directly from the environment in its different states, take actions in the environment, and compare the results of these actions via the reward function. This cycle is repeated over time until the agent has enough experience to correctly choose the actions that yield the maximum reward. Thus, through interaction with an environment and repeated actions, the RL model can evaluate the consequences of actions by learning from past experience. For a building occupant, the decision to take an action in a specific indoor environment is a similar process to that of the RL model. The MDP is used to describe an environment for reinforcement learning because, in this case, the indoor environment and thermal comfort are fully observable. In this study, the occupant behavior was modelled as a decision-making process in which policy-based RL was used. The building occupant, the occupant behavior, the indoor environment and the improvement in thermal comfort level are the agent, action, state and reward, respectively, in the model. In each state, the logic of occupant behavior is to proactively seek more comfortable conditions in the indoor environment [11]. Numerous factors are related to the occupant behavior, and we will introduce them in detail in the following sections.
Fig. 2. Illustration of the RL model with agent, action space, environment space and rewards.
We modeled the occupant behavior in offices as an MDP, as shown in Fig. 3. In the initial state, the agent had many possible choices of behavior, such as adjusting the thermostat set point by various degrees or adjusting the clothing level. For every action, there was a corresponding feedback reward, such as improvement or deterioration of thermal comfort. The agent took an action to enter a follow-up environment, and this process kept going. The time step size for action prediction was 15 minutes. We took the actual timing of occupant behavior into consideration, because there was a certain delay in the occurrence of the behavior, and the occupant did not act immediately when feeling uncomfortable. We also assumed that the action could take effect in the subsequent time step if the HVAC system was in normal operation. Note that in Fig. 3 we have listed only some possible actions. There may be others, such as reducing the clothing level and making a more extreme adjustment to the thermostat set point. These additional actions are represented by an ellipsis.
The MDP in this study entailed the following specifications:

Environment space: The state contains the information about the indoor environment that occupants use in deciding on the proper action. In this research, the state space included room air temperature, room air relative humidity, thermostat set point, clothing level of occupants, metabolic rate, room occupancy and time of day. Although there are many other factors [20, 24] that impact occupant behavior, we neglected them in order to simplify the structure of the RL model. Here we assumed that the thermal sensation of occupants was not impacted by the time of day. Therefore, time was not included in the TSV and reward calculation. An exception was the transfer learning model for setback and occupancy control in Section 2.3, which moved to a nighttime state at certain times. Generally, time functioned as a label, and it did not contain a numerical value that might influence the RL model and training. In summary, the state space can be expressed as

$S = \{T_{air}, RH_{air}, T_{setpoint}, Clo, Met, occupancy, time\}$ (1)
Action space: The action is the occupant behavior that is performed with the goal of more comfortable conditions. In this research, the action space included raising or lowering the thermostat set point by different degrees, or maintaining the same set point; putting on, keeping the same, or taking off clothes; and arriving. The action space can be expressed as

$A = \{A_{raise}, A_{keep}, A_{lower}, A_{put\ on}, A_{keep}, A_{take\ off}\}$ (2)

where the first three actions $A_{raise}, A_{keep}, A_{lower}$ represent adjustments to the thermostat set point, and the last three actions $A_{put\ on}, A_{keep}, A_{take\ off}$ represent adjustments to the clothing level.
Reward function: The goal of the action is a higher thermal comfort level for the occupants. Therefore, in this research, the reward was modelled as the absolute difference between the initial TSV before the action and the final TSV after the action, which can be expressed as

$R = |TSV_t| - |TSV_{t+1}|$ (3)

where subscripts t and t+1 represent the current and next time steps, respectively. It is clear that in order to maximize the reward R, $TSV_{t+1} = 0$, which means that the desired thermal sensation is neutral after the occupant behavior occurs.
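The reward in Eq. (3) is straightforward to implement. Below is a minimal MATLAB sketch, assuming predictTSV is a handle to any thermal sensation model (such as the ANN model described next) that maps a state to a TSV value; the function and variable names are illustrative:

```matlab
% Minimal sketch of the reward in Eq. (3). predictTSV is an assumed
% function handle that maps a state to a TSV value.
function r = behaviorReward(stateBefore, stateAfter, predictTSV)
    % The reward is the reduction in the magnitude of the thermal
    % sensation vote: positive when the action moves the occupant
    % toward a neutral sensation (TSV = 0).
    r = abs(predictTSV(stateBefore)) - abs(predictTSV(stateAfter));
end
```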
In this research, we predicted the TSV in offices with the use of an ANN model [17, 62] that expresses TSV as a function of four input parameters:

TSV = f (air temperature, relative humidity, clothing insulation, metabolic rate) (4)

where f represents the function of the ANN model. We assumed that the mean radiant temperature was the same as the air temperature, and the air velocity was less than 0.2 m/s. To develop the ANN model, we collected data from over 25 occupants in an office building during the four seasons of 2017. The number of collected data points for training the model was about 5,000. The model had three layers, and there were ten neurons in the hidden layer. We used the Levenberg-Marquardt algorithm to train the model, and it predicted the TSV with a mean absolute error (MAE) of 0.43 after training.
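A shallow network of this kind can be set up in a few lines of MATLAB. The following is a hedged sketch, assuming the Deep Learning Toolbox is available and that the collected data are arranged as a 4 x N input matrix X (air temperature, relative humidity, clothing insulation, metabolic rate) and a 1 x N target vector Y of votes; the variable names are illustrative:

```matlab
% Sketch of the TSV network in Eq. (4): one hidden layer of ten neurons,
% trained with the Levenberg-Marquardt algorithm ('trainlm').
net = fitnet(10, 'trainlm');
net = train(net, X, Y);              % supervised training on collected votes
tsvPredicted = net(X);               % predicted TSV for the inputs
mae = mean(abs(tsvPredicted - Y));   % mean absolute error (about 0.43 here)
```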
Fig. 3. MDP for the occupant behavior of thermostat set point manual control and clothing level adjustment. Each state space includes numerous parameters, as expressed by Eq. (1), and the figure displays only the key parameters. The initial state is followed by many actions, follow-up states and possible subsequent states. In addition to what is shown in the figure, further possibilities are indicated by an ellipsis.
For buildings without a thermal comfort model, the predicted mean vote (PMV) [63] can also be used to model the reward, which is expressed as

$R = |PMV_t| - |PMV_{t+1}|$ (5)

As above, maximizing the reward R requires that $PMV_{t+1} = 0$.
Reward modelling in the RL model for multi-occupant offices with multiple agents [64] was different from that for single-occupant offices. For multi-occupant offices, the modelling was divided into two categories. In one category, the reward of a dominant occupant was maximized. Here, one occupant near the thermostat would adjust the thermostat dominantly, and the others in the room would compromise with this occupant’s preference, as is the case in some workplaces [17, 65]. Thus, the reward was for the dominant individual and can be expressed as

$R = |TSV_{t,dominant}| - |TSV_{t+1,dominant}|$ (6)

During data collection, we also found that in some offices all the occupants had equal control of the thermostat [17]. Therefore, in our other multi-occupant office category, the average reward for all occupants was maximized. The reward was averaged as

$R = \frac{1}{n}\sum_{i}\left(|TSV_{t,i}| - |TSV_{t+1,i}|\right)$ (7)

where n is the number of occupants in the room, and i represents different occupants. For a single-occupant office where only the dominant occupant was in the room, the two categories of reward modelling in Eqs. (6) and (7) were the same.
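The shared-control case in Eq. (7) is a small extension of the single-occupant reward. A minimal MATLAB sketch, assuming tsvBefore and tsvAfter are 1 x n vectors of per-occupant TSV values:

```matlab
% Sketch of the shared-control reward in Eq. (7): the mean improvement
% in thermal sensation over all n occupants of the room.
function r = sharedReward(tsvBefore, tsvAfter)
    r = mean(abs(tsvBefore) - abs(tsvAfter));
end
```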
2.2 Q-learning

After designing the model framework, we needed to train the RL model. One of the available training methods is Q-learning. Here Q means “quality,” a policy function of an action taken in a given state. It can be expressed as the following mapping:

$Q: S \times A \to \mathbb{R}$ (8)
Q-learning is a model-free RL algorithm for learning a policy that tells an agent which actions to take under various circumstances [66]. This learning method has been widely used for training RL models [43, 49, 51, 67, 68]. With the state space, action space and reward modelling described in Section 2.1, we used the Q-learning algorithm to update the quality. The updating equation for Q-learning can be expressed as

$Q_{new}(s_t, a_t) = Q_{old}(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q_{old}(s_t, a_t) \right]$ (9)

where $Q$ is the quality, $s$ the state, $a$ the action, $\alpha$ the learning rate, $r$ the reward, $\gamma$ the discount factor, and $\max_{a} Q(s_{t+1}, a)$ the estimate of the optimal future value. According to this equation, as the training begins, the quality is initialized to arbitrary or uniform values. Then, at each episode $t$ of the training process, the agent in state $s_t$ selects an action $a_t$ with a reward $r_t$ and an estimated future reward for future actions. After the action, the agent enters a new state $s_{t+1}$. When the maximized reward is confirmed, the optimal action is learned and the quality $Q$ is updated. In this process, the RL model gradually learns to take actions in a certain environment, and we can obtain a Q-learning table of states by various actions. Q-learning is similar to the actual decision process for occupant behavior in buildings.
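For a discrete state and action space such as ours, Eq. (9) reduces to a simple table update. The following plain-MATLAB sketch illustrates the idea (the Reinforcement Learning Toolbox used in this study provides an equivalent Q-learning agent); nStates, nActions, nEpisodes and the environment helper stepEnvironment are assumptions for illustration:

```matlab
alpha = 0.3;   % learning rate selected in this study
gamma = 1.0;   % discount factor selected in this study
Q = zeros(nStates, nActions);            % uniform initialization of the quality
for episode = 1:nEpisodes
    s = randi(nStates);                  % random exploration of states
    a = randi(nActions);                 % random exploration of actions
    [sNext, r] = stepEnvironment(s, a);  % assumed environment: next state, reward
    % Tabular Q-learning update, Eq. (9)
    Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(sNext, :)) - Q(s, a));
end
```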
The learning rate and discount factor can impact the learning process. In this study, we selected a learning rate of 0.3 and a discount factor of 1. We used a table of states by various actions because the choices of actions in the MDP were discrete, namely, adjusting the thermostat by different degrees or setting the clothing insulation to certain values. Thus, the discount factor had little impact on the Q-learning result. As for the learning rate, we will provide training results for learning rate variations in Section 3.2. We used the MATLAB 2020a Reinforcement Learning Toolbox [69] to build and train the RL model.
2.3 Transfer learning
After designing and training the RL occupant behavior model, we sought to transfer the model to other buildings with limited information and even with no data. As shown in Fig. 4(a), an ANN model, one of the data-driven models, has a layered structure with input, hidden and output layers. The training process for the ANN model uses data to update the values of the coefficients in the hidden layer. Therefore, the model can only be used for similar buildings with available data. In previous attempts to apply such a model directly to other buildings, the performance was usually not good [18, 21]. In those studies, transfer learning of the ANN model grabbed layers of neural network weights and trained the model again with new data. Predicting behavior in different buildings by transferring a data-driven model therefore requires new data for retraining. Additionally, the meanings of the coefficients inside the models are still unclear to researchers. Therefore, the information in the hidden layer cannot be transferred or used for other buildings. However, as shown in Fig. 4(b), the policy-based RL occupant behavior model is a logical model with physical meaning, and thus it can be partially transferred to other buildings. We transferred the higher-level rules of the RL model, i.e., the logic of thermal actions and the pursuit of thermal comfort, from one building to another. We could do this because even for different buildings and HVAC control systems, the logic of occupant behavior that seeks more comfortable conditions remained the same. Therefore, the feasible actions and rewards of the RL model were similar for different buildings. For example, we built an RL occupant behavior model for a building with manual thermostat control. In other buildings with thermostat setback or occupancy control, occupants might adjust the thermostat set point in different ways. When they left the room or during the night, the building automation system could reset the thermostat set point to save energy. When the occupants reentered the room, they could adjust the set point and override the system operation. The occupants’ overriding of the automation systems might indicate their dissatisfaction [70]. As such, there was a night state before the occupants’ arrival in the morning, when the set point and air temperature were different, as depicted in Fig. 4(b). After the occupants’ arrival, or in the morning, the state space entered the normal initial state. Thus, the transfer learning model structure was similar to the original model, with the same possible actions and rewards in the daytime. We could therefore transfer a portion of the parameters in the action space and the rewards to other buildings. Even without data for these buildings, we could still model and predict the occupant behavior.
For residential buildings, large-scale collection of occupant behavior data has usually been more difficult, because such buildings are generally not equipped with a building automation system (BAS) [17]. The use of questionnaire surveys to gather data has been reported as time-consuming and limited in accuracy [23]. Under this circumstance, building a model by transfer learning was a feasible approach. Similarly, we also transferred the RL occupant behavior model for office buildings to residential buildings. The occupant behavior of manual thermostat control was the same in both types of buildings, but the improved thermal comfort level and reward for actions were different [17]. Moreover, there were other factors that distinguished the occupant behavior in office buildings from that in residential buildings [71, 72]. Therefore, we needed to modify the state space and reward in the transfer learning model for residential buildings.
Fig. 4. Transfer of the occupant behavior model for manual control to other buildings with thermostat setback or occupancy control: (a) the data-driven ANN model cannot be transferred because of the coefficient values in the hidden layer; (b) the policy-based RL model can be transferred, and portions of the action and state space are the same.
For residential buildings, a previous study [17] found that the comfort zone of a building was 1.7 °C (3 °F) higher in summer, and 1.7 °C (3 °F) lower in winter, than the ASHRAE comfort zone [73]. Therefore, we were able to use this information to transfer the thermal sensation and occupant behavior model from the office building to residential buildings. Since the shape of the thermal comfort zone was similar, whereas the impact of air temperature on thermal comfort and occupant behavior was different [17], the logical RL behavior model could be partially transferred. The MDP for manual control of the thermostat was the same in the office building and residential buildings. We transferred the RL occupant behavior model with the use of PMV to calculate the reward as

$R = |PMV_{Residence\_i}| - |PMV_{Residence\_f}|$ (10)

where subscripts i and f denote the states before and after the action.
Here, the PMV in the residence was defined differently from the traditional PMV model because of the different comfort zone. With the 1.7 °C (3 °F) difference in winter and summer, it was calculated as

$PMV_{Residence\_winter} = PMV(T_{air} + 3\ °F, RH, T_r, V, Clo, Met)$ (11)

$PMV_{Residence\_summer} = PMV(T_{air} - 3\ °F, RH, T_r, V, Clo, Met)$ (12)

where the PMV function represents the traditional way of calculating PMV with its six parameters: air temperature, relative humidity, mean radiant temperature, air velocity, clothing insulation and metabolic rate.
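A minimal MATLAB sketch of the shifted residential PMV in Eqs. (11) and (12), assuming pmv is an available implementation of the standard six-parameter PMV model and that temperatures are in °F:

```matlab
% Sketch of Eqs. (11)-(12): residential PMV with the comfort zone
% shifted by 3 F (1.7 C) relative to the office comfort zone [17].
function p = residencePMV(Tair, RH, Tr, V, Clo, Met, isWinter)
    if isWinter
        p = pmv(Tair + 3, RH, Tr, V, Clo, Met);  % winter: zone 3 F lower
    else
        p = pmv(Tair - 3, RH, Tr, V, Clo, Met);  % summer: zone 3 F higher
    end
end
```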
2.4 Data collection for model validation
In order to validate the RL model, this study collected indoor air temperature, relative humidity, thermostat set point, lighting status, occupancy, clothing level of occupants, and data on the occupant behavior of adjusting the thermostat, from the BAS in 20 offices in the Ray W. Herrick Laboratories (HLAB) building at Purdue University in 2018, as shown in Fig. 5(a). Half of the offices were multi-occupant student offices, and the rest were single-occupant faculty offices. The building used a variable air volume (VAV) system for heating and cooling. Each office had an independent VAV box and a thermostat (Siemens 544-760A) that enabled the BAS to control the air temperature in the room. We downloaded the indoor environment data of room air temperature and thermostat set point from the BAS. In addition, we used a questionnaire to record the clothing level of the occupants and their clothing-adjustment behavior in the HLAB building.
We also gathered room air temperature, relative humidity, thermostat set point, lighting and occupancy data in four other office buildings on the Purdue University campus in three seasons of 2018, as shown in Fig. 5(b)-(e). Each building contained more than 100 offices. The HVAC systems in these buildings were similar to those in the HLAB building. However, the HVAC control strategies in the four buildings differed from that in the HLAB building. The HVAC system operated constantly in the HLAB building, and the occupants could adjust the thermostat set point manually. The LWSN building, by contrast, used a thermostat setback that overrode the manual control at night, from 11 PM to 6 AM. Meanwhile, the MSEE, HAAS and STAN buildings used occupancy control for the HVAC system in each room in addition to manual control. Table 1 provides the data collection information for each building, including the number of offices in which data was collected, the HVAC control type, the data collection interval, and the types of data that were collected. The details of the data collection process can be found in [17, 74].
Fig. 5. Photographs of the buildings used for data collection: (a) HLAB building, (b) MSEE building, (c) LWSN building, (d) STAN building and (e) HAAS building
Table 1. Data collection information for each building

HLAB: 20 offices; manual control; 5-min interval. Collected data: room lighting status, number of room occupants, room air temperature and RH, thermostat set point, room CO2 concentration, clothing level, room supply-air flow rate, room supply-air temperature.

LWSN: 106 offices; manual control + thermostat setback; 10-min interval. Collected data: room lighting status, number of room occupants, room air temperature and RH, thermostat set point.

MSEE: 99 offices; manual control + occupancy control; 15-min interval. Collected data: as for LWSN.

STAN: 122 offices; manual control + occupancy control; 15-min interval. Collected data: as for LWSN, plus clothing level.

HAAS: 48 offices; manual control + occupancy control; 15-min interval. Collected data: as for LWSN.
2.5 Building energy simulation with RL model

The purpose of constructing the RL occupant behavior model was to evaluate the impact of occupant behavior on building energy performance. Therefore, we also implemented the RL occupant behavior model in EnergyPlus. We utilized SketchUp to construct the building geometry model in Fig. 6, and then used the model in the EnergyPlus simulations. Table 2 lists the structural and material properties used for the building envelope in the simulations. The structural information was obtained from the HLAB building construction drawings and documents.
Fig. 6. Geometric model of the HLAB building for EnergyPlus simulations
Table 2. Structural and material properties of the HLAB building for the simulations. Layers are listed from exterior to interior; values in parentheses are thickness (mm), conductivity (W/m·K), density (kg/m3) and specific heat (J/kg·K).

Exterior window: clear float glass (6, 0.99, 2528, 880); air cavity (13, 0.026, 1.225, 1010); clear float glass (6, 0.99, 2528, 880).

Exterior wall 1: brick (92.1, 0.89, 1920, 790); air cavity (60.3, 0.026, 1.225, 1010); rigid insulation (50.8, 0.03, 43, 1210); exterior sheathing (12.7, 0.07, 400, 1300); CFMF stud (152.4, 0.062, 57.26, 964); gypsum board (15.9, 0.16, 800, 1090).

Exterior wall 2: aluminum panel (50.8, 45.28, 7824, 500); rigid insulation (50.8, 0.03, 43, 1210); exterior sheathing (12.7, 0.07, 400, 1300); CFMF stud (152.4, 0.062, 57.26, 964); gypsum board (15.9, 0.16, 800, 1090).

Interior gypsum wall: gypsum board (15.9, 0.16, 800, 1090); metal stud (92.1, 0.06, 118, 1048); gypsum board (15.9, 0.16, 800, 1090).

Interior glass wall/door: glass (6, 0.99, 2528, 880).

Interior wood door: wood (44.45, 0.15, 608, 1630).
Fig. 7 depicts the simulation process with the RL occupant behavior model. When the simulation starts, the program first checks whether or not the office is occupied, since the behavior occurs only when there is an occupant inside the office. If so, the agent decides on the action for the next time step based on the Q-learning table. Next, the energy simulation program decides whether or not to adjust the thermostat set point or the clothing level of the occupants. The building energy use will correspond to this decision. Moving to the next time step, the program checks whether or not the simulation time has ended; if not, it again checks if the room is occupied. To obtain a reasonable variation range, we performed the simulation 200 times and analyzed the results [74].
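The per-step logic in Fig. 7 can be sketched in MATLAB as follows; the helper functions (isOccupied, stateIndex, applyRandomness, applyAction) are illustrative stand-ins for the interface to the EnergyPlus co-simulation, not an actual API:

```matlab
% Sketch of the occupant decision loop in Fig. 7, one 15-minute step at
% a time, using the trained Q-learning table Q.
for step = 1:nSteps
    if isOccupied(step)            % behavior occurs only when occupied
        s = stateIndex(step);      % discretize the current indoor state
        [~, a] = max(Q(s, :));     % reward-maximizing action from the table
        a = applyRandomness(a);    % perturb the adjustment by -2 F to +2 F
        applyAction(a, step);      % set point / clothing adjustment
    end
    % EnergyPlus then advances the building model to the next time step
end
```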
Fig. 7. Building energy simulation process incorporating the RL occupant behavior model and Q-learning table of actions
3 Results

3.1 Results of modelling the reward for action
Fig. 8 shows the result of reward modelling when the PMV model and the thermal comfort ANN model were used with Eqs. (3) and (5). The figure depicts the relationship between occupant behavior and the corresponding rewards at various air temperatures when the other parameters were the same. For example, when the air temperature was 19.4 °C (67 °F), the occupant might feel cool in winter. Thus, the reward for raising the thermostat set point was positive most of the time, until the occurrence of overheating caused by an excessive adjustment. For each state, there was one occupant behavior of set point adjustment that led to the maximum reward. The reward situation was similar when the air temperature was high and the occupant lowered the set point. When the air temperature was about 22.8 °C (73 °F), the occupant already felt nearly neutral. In this case, either raising or lowering the set point would lead to a negative reward, and the optimal occupant behavior was to make no adjustment. We used this quantified logic to build the RL model.
Fig. 8. Reward value modelled for different air temperatures in winter by using (a) the PMV model and (b) the thermal comfort ANN model.
3.2 Results of the RL occupant behavior model
Fig. 9 depicts the training process for the RL model with the use of Q-learning. The blue, orange, and yellow curves represent the episode reward, the average reward in nearby episodes, and the quality, respectively. Initially, at the beginning of the training process, the RL model knew nothing about the relationship between the environment, states and actions. Thus, it could only take random actions to explore the relationship, and it received varying rewards. As a result, the episode reward was very low. As the learning process went on, the RL model tried various actions to find a way of maximizing the reward. The quality was updated with the use of Eq. (9). In the examples shown in Fig. 9, the thermostat set point and air temperature were 22.8 °C (73 °F), and the occupant was wearing summer clothing. After training over 300 episodes, the RL model learned to take the action at this state that maximized the reward at 0.61. Fig. 9 also shows that an overly high learning rate made the learning process very unstable, and the quality fluctuated during the training. Meanwhile, a low learning rate would slow down the training process.
Fig. 9. Training of the RL model with the use of Q-learning as the number of episodes increases. The blue, orange, and yellow curves represent the episode reward, the average reward in nearby episodes, and the quality, respectively. (a) learning rate = 0.1; (b) learning rate = 0.3; (c) learning rate = 0.5; (d) learning rate = 0.7.
The trained RL model would always predict the same occupant behavior in the same state and environment, which was unrealistic. Actual office occupant behavior is influenced by many other factors that we did not build into the RL model [24, 28]. Considering all these factors would have led to an overly complex behavior model. A previous study [11] pointed out that behavior models should not only represent deterministic events but also be described by stochastic laws. Additionally, different thermal preferences on the part of occupants would also cause their behavior to differ. Fig. 10 displays the distribution of collected thermostat set point adjustment behavior at different air temperatures in the HLAB offices. In the box-and-whisker charts, the boxes, whiskers and dots represent the standard deviation, upper and lower bounds, and outliers of the occupant behavior, respectively. The air temperature and occupant behavior had a clear negative correlation. The figure indicates that even at the same air temperature and in similar states, the variation range of collected occupant behavior was over ±1.1 °C (2 °F) in both single- and multi-occupant offices in different seasons. Under these conditions, the rewards of different actions did not differ greatly, but the RL model always pursued the action that absolutely maximized the reward. For example, the RL model might predict the occupant behavior of raising the set point by 5 °F, while raising it by 4 °F or 6 °F would also be reasonable behavior in a real scenario. Therefore, based on the results in Fig. 10, we added a randomness of −2 °F to +2 °F to the RL model's final decision to make it more reasonable.
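A minimal sketch of this final-decision step, assuming deltaOptimal holds the set point change (in °F) that maximizes the reward according to the Q-learning table; the variable names are illustrative:

```matlab
% Add a uniform random offset on [-2, +2] F to the reward-maximizing
% set point adjustment, reproducing the observed spread in Fig. 10.
deltaFinal = deltaOptimal + (4 * rand - 2);
```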
Fig. 10. The distribution of thermostat set point adjustment by occupants in: (a) single-occupant offices, (b) multi-occupant offices, (c) winter with Clo = 1, and (d) summer with Clo = 0.57.
3.3 Validation of the RL model
We validated the RL model with the use of data collected in 2018, after adding the randomness to the final decision. Fig. 11 compares the collected occupant behavior with the RL model prediction for the HLAB offices in the four seasons of 2018. Most of the time, the RL prediction results matched the collected data. Table 3 lists all the prediction results for R² and MAE. The R² was around 0.7-0.8, and the MAE was around 1.5-1.9 °F. The overall R² and MAE were 0.79 and 1.68 °F, respectively. We removed some data as outliers when the HVAC system was under maintenance and the occupants lost control. We also compared the performance of the RL model for single- and multi-occupant offices. For single-occupant offices, the R² was 0.8 and the MAE was 1.5 °F. For multi-occupant offices, the R² was 0.78 and the MAE was 1.8 °F. The prediction results for multi-occupant offices were not as good as those for single-occupant offices. In previous studies, a prediction R² of 0.8 was deemed acceptable for an occupant behavior model [74]. Hence, the performance of the RL model was reasonable.
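For reference, the two validation metrics can be computed as follows, with yObs the collected set point adjustments and yPred the RL model predictions (both in °F); the variable names are illustrative:

```matlab
% Mean absolute error and coefficient of determination (R^2).
mae = mean(abs(yPred - yObs));
r2  = 1 - sum((yObs - yPred).^2) / sum((yObs - mean(yObs)).^2);
```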
Fig. 11. Comparison of collected data on the occupant behavior of adjusting the thermostat set point and the RL model prediction for HLAB offices in 2018: (a) winter, (b) spring, (c) summer, and (d) fall.
Table 3. Prediction performance of the RL model for the HLAB offices

Winter 2018: R² = 0.75, MAE = 1.6 °F
Spring 2018: R² = 0.79, MAE = 1.9 °F
Summer 2018: R² = 0.79, MAE = 1.5 °F
Fall 2018: R² = 0.81, MAE = 1.7 °F
Overall: R² = 0.79, MAE = 1.68 °F
3.4 Results of the transfer learning model
After validating the RL model for the HLAB offices, we used the transfer learning model to predict occupant behavior in four other office buildings on the Purdue University campus. Fig. 12 shows the collected occupant behavior data and the RL model prediction in three seasons. The overall R² was 0.7, and the MAE was 1.7 °F. The results were not as good as the model validation results for the same building, presented in Section 3.3, but transfer learning was a feasible method for predicting occupant behavior in different buildings without data.
Fig. 12. Comparison between collected behavior data and behavior predicted by the RL model in four other Purdue University office buildings in 2018 in (a) summer, (b) fall, and (c) winter.
We also used the reward defined in Eqs. (10)-(12) to retrain the RL model for residential buildings. Table 4 shows the prediction performance of the transfer learning model. In the residential buildings, the R² was between 0.6 and 0.7 in the four seasons, and the MAE varied from 2.1 °F to 2.9 °F. The results were worse than for the transfer learning in the other four office buildings. The reason was that cross-type prediction was more difficult than cross-building prediction. In the residential buildings, there were many factors that impacted the occupant behavior differently than in the office buildings [71, 72] but were not considered in the current RL model. One feasible way to further improve the transfer learning model would be to introduce more impact factors in the state space, in addition to re-modeling the reward function. Furthermore, the quality and quantity of the collected data in the residential buildings were not as good as in the office buildings because we used questionnaire surveys in the former. Recording accurate occupant behavior data with corresponding environmental parameters and incorporating the impact factors are directions for improvement in further studies of residential buildings.
Table 4. Prediction performance of the transfer learning model from the HLAB building to residential buildings

Winter: R² = 0.67, MAE = 2.1 °F
Spring: R² = 0.61, MAE = 2.9 °F
Summer: R² = 0.69, MAE = 2.3 °F
Fall: R² = 0.67, MAE = 2.7 °F
3.5 Energy analysis with the RL occupant behavior model
After using the transfer learning model to predict occupant behavior in different buildings, we compared the collected heating and cooling energy use data with the simulation using the RL model in the HLAB building, for two days in winter. In Fig. 13, the box-and-whisker charts represent the simulation results with the use of the RL model and the ANN model. The black curve represents the measured data. For most of the time, the measured energy fluctuated within the lower and upper bounds predicted by the RL model. However, the variation range predicted by the RL model was narrower than that predicted by the ANN model. Table 5 lists the average heating and cooling loads and standard deviations for the different seasons in one year. The reason for the difference between the models was that the logic of the RL model was to improve the thermal comfort level of occupants. Therefore, the predicted occupant behavior was mostly reasonable. The model could not simulate illogical and extreme behavior such as adjusting the thermostat set point to the highest or lowest value for quick heating or cooling [74]. Such behavior can waste a lot of energy.
Fig. 13. Comparison of the collected heating and cooling energy use data and the simulation of manual thermostat control with the RL model in the HLAB building for two days in winter.
Table 5. Comparison of measured data with the heating and cooling loads (kWh) simulated by the ANN and RL models in four seasons

Heating load (winter / spring / summer / fall):
Measurement: 3396 / 2833 / 2102 / 3183
Simulation using ANN model: 3526±108 / 2925±110 / 2275±35 / 3298±68
Simulation using RL model: 3084±67 / 2948±41 / 2239±27 / 3067±24

Cooling load (winter / spring / summer / fall):
Measurement: 857 / 2261 / 2725 / 1205
Simulation using ANN model: 902±170 / 2006±115 / 2597±42 / 1136±90
Simulation using RL model: 863±72 / 1812±56 / 2570±30 / 974±30
We also used the transfer learning RL model to predict the energy use with thermostat setback and occupancy control. Fig. 14 shows all the energy simulation results in summer. The measurement and the simulation using actual behavior exhibited little divergence. Thermostat setback and occupancy control could reduce energy use by about 30% and 70%, respectively. The average energy simulation results using the RL model were almost the same as with the ANN model, but the variation was less with the former model; this finding was similar to the results in Table 5. Hence, it is feasible to use the transfer learning RL model to predict the energy use in other buildings with various HVAC control systems.
Fig. 14. Comparison of the measured heating and cooling loads and the results simulated by different models with thermostat setback and occupancy control in summer.
4 Discussion
In this study, we built an RL model to predict comfort-related occupant behavior in office buildings and validated the model with collected data. We also used transfer learning for cross-building occupant behavior modelling. Although various impact factors were modelled in the state space, including indoor air temperature and relative humidity, room occupancy and time, we neglected factors such as gender [75], cultural background [76], and age [4]. To improve the model’s performance and widen its applicability, we need to determine the quantitative relationship between these factors and the occupant behavior for reward modelling in future studies. In the MDP, the time step size for occupant behavior prediction was 15 minutes. Thus, the impact of occupant behavior on the HVAC system and indoor environment was not immediate; rather, it was somewhat delayed. We assumed that the action could take effect in the subsequent time step if the HVAC system was in normal operation. Actually, based on the collected data and observation [17], after taking an action the occupants tended to wait for a while, being aware of the HVAC response time. Even though the neutral TSV had not been reached, no occupant behavior occurred during this waiting time. If an occupant waited for a long time, such as 3-4 time steps, and still did not feel neutral, then there may have been issues with the HVAC control system or air handling units. In this case, the occupant behavior would be very complicated and personalized, including complaining and making another adjustment, this time to an extremely high or low set point. To improve the learning process and model performance, possible rewards could account for abnormal HVAC operations with longer response times and more time steps. Improving the modelling of thermal comfort and energy efficiency behavior is a potential direction for our future research.
In this study, we assumed that the occupant behavior and TSV decisions were based on the current indoor environment. This assumption was similar to those in the widely recognized PMV thermal comfort model. According to the adaptive thermal comfort model, the outdoor climate and past thermal history may influence occupants’ thermal preference and behavior. This could explain some of the prediction discrepancy exhibited by the current RL occupant behavior model, which was a limitation of the current study. Furthermore, the adaptive thermal comfort model has usually been applied to naturally ventilated rooms. In this study, the buildings were all mechanically ventilated. If we assumed adaptive thermal comfort and considered the outdoor climate and past thermal history, we could still build the MDP and introduce these factors in the state and reward. In this case, the model would be more complex. Applying the adaptive thermal comfort theory and using historical states in the RL model to improve the prediction results is a future research direction. In the present study, we defined the reward as the difference between initial and final TSV, as shown in Eqs. (5)-(7). Such a definition was result-oriented and path-independent, because the intermediate terms cancel when there are many adjustment behaviors. Thus, the occupants could find the set point that maximized the cumulative reward in different ways, which increased the variation in occupant behavior. However, this study considered only comfort-related occupant behavior and not energy-related behavior in offices. This was because the cost of maintaining a comfortable environment in an office is typically not on the minds of occupants [17]. For simulation of energy-saving occupant behavior in other kinds of buildings, the RL model would also require energy parameters for the state space and reward modelling, such as heating and cooling rates and air change rate [77]. Finally, the RL model and transfer learning in this study exhibited good generalization capability and scalability. These models also have potential for other kinds of occupant behavior, such as interactions with windows [24], shades [19], lighting [78] and other indoor appliances.
With the RL model, we tried to model and predict the occupant behavior without collecting data, but rather by building a policy-based MDP. We also used transfer learning to obtain the occupant behavior in other office buildings and in residential buildings with different HVAC systems and very limited information. This cross-building occupant behavior transfer was extremely difficult with the data-driven models. Therefore, the generalization capability of the RL and transfer learning models was better than that of the regression models. Meanwhile, the better generalization capability of the RL model may indicate a lesser ability to make predictions for specific buildings. As a result, the prediction accuracy of the RL model may not be as good as that of the data-driven models.
5 Conclusion

This study built and validated an RL occupant behavior model for an office building and transferred it to other buildings with thermostat setback and occupancy control. We also compared the energy use simulated by the RL model with measured data and predictions by the ANN model for the HLAB offices and four other office buildings on the Purdue University campus. This investigation led to the following conclusions:

1. The policy-based RL occupant behavior model trained by Q-learning was able to learn the logic of occupant behavior and predict the behavior accurately. The results for prediction of set point adjustment exhibited an R² around 0.8 and an MAE less than 2 °F.

2. Transfer learning successfully transferred the logic and part of the occupant behavior model structure to other buildings with different HVAC control systems, such as thermostat setback and occupancy control. We also transferred the RL model from office buildings to residential buildings with a modification to the impact of air temperature on occupant behavior. The prediction performance was good, with R² above 0.6 and MAE less than 3 °F. These transfer learning models did not require data collection. Unlike data-driven models, the transfer learning RL model had physical meaning and strong generalization capability.

3. The results of energy simulation for manual thermostat control, setback and occupancy control with the use of the RL model were similar to the results with the ANN model. The RL simulation accurately reflected the impact of occupant behavior on building energy use, but the variation predicted by the RL model was less than that predicted by the ANN model.
Acknowledgments

The authors would like to thank Dr. Orkan Kurtulus of the Center for High Performance Buildings at Purdue University for his assistance in setting up the building automation system in the HLAB building. We would also like to thank all the occupants of the HLAB offices for their participation and assistance in obtaining the data reported in this study, and Blaine Miller and Chris Sorenson in the Utility Plant Office of Purdue University for providing data for the four Purdue buildings. The data collection in this study was approved by Purdue University Institutional Review Board Protocol #1704019079.
Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

[1] US Department of Energy. Building energy data. (2011).
[2] De Wilde, Pieter. "The gap between predicted and measured energy performance of buildings: A framework for investigation." Automation in Construction 41 (2014): 40-49. https://doi.org/10.1016/j.autcon.2014.02.009
[3] Zou, Patrick XW, Xiaoxiao Xu, Jay Sanjayan, and Jiayuan Wang. "Review of 10 years research on building energy performance gap: Life-cycle and stakeholder perspectives." Energy and Buildings 178 (2018): 165-181. https://doi.org/10.1016/j.enbuild.2018.08.040
[4] Zhang, Yan, Xuemei Bai, Franklin P. Mills, and John CV Pezzey. "Rethinking the role of occupant behavior in building energy performance: A review." Energy and Buildings 172 (2018): 279-294. https://doi.org/10.1016/j.enbuild.2018.05.017
[5] D'Oca, Simona, Tianzhen Hong, and Jared Langevin. "The human dimensions of energy use in buildings: A review." Renewable and Sustainable Energy Reviews 81 (2018): 731-742. https://doi.org/10.1016/j.rser.2017.08.019
[6] Sun, Kaiyu, and Tianzhen Hong. "A framework for quantifying the impact of occupant behavior on energy savings of energy conservation measures." Energy and Buildings 146 (2017): 383-396. https://doi.org/10.1016/j.enbuild.2017.04.065
[7] Hong, Tianzhen, Sarah C. Taylor-Lange, Simona D'Oca, Da Yan, and Stefano P. Corgnati. "Advances in research and applications of energy-related occupant behavior in buildings." Energy and Buildings 116 (2016): 694-702. https://doi.org/10.1016/j.enbuild.2015.11.052
[8] Paone, Antonio, and Jean-Philippe Bacher. "The impact of building occupant behavior on energy efficiency and methods to influence it: A review of the state of the art." Energies 11, no. 4 (2018): 953. https://doi.org/10.3390/en11040953
[9] Yan, Da, William O'Brien, Tianzhen Hong, Xiaohang Feng, H. Burak Gunay, Farhang Tahmasebi, and Ardeshir Mahdavi. "Occupant behavior modeling for building performance simulation: Current state and future challenges." Energy and Buildings 107 (2015): 264-278. https://doi.org/10.1016/j.enbuild.2015.08.032
[10] Hong, Tianzhen, Jared Langevin, and Kaiyu Sun. "Building simulation: Ten challenges." Building Simulation 11, no. 5 (2018): 871-898. Tsinghua University Press. https://doi.org/10.1007/s12273-018-0444-x
[11] Hong, Tianzhen, Da Yan, Simona D'Oca, and Chien-fei Chen. "Ten questions concerning occupant behavior in buildings: The big picture." Building and Environment 114 (2017): 518-530. https://doi.org/10.1016/j.buildenv.2016.12.006
[12] Andersen, R.V., B.W. Olesen, and J. Toftum. "Modelling occupants' heating set-point preferences." In: Building Simulation Conference, 2011, pp. 1416.
[13] Fabi, Valentina, Rune Vinther Andersen, and Stefano Paolo Corgnati. "Influence of occupant's heating set-point preferences on indoor environmental quality and heating demand in residential buildings." HVAC&R Research 19, no. 5 (2013): 635-645. https://doi.org/10.1080/10789669.2013.789372
[14] Langevin, Jared, Jin Wen, and Patrick L. Gurian. "Simulating the human-building interaction: Development and validation of an agent-based model of office occupant behaviors." Building and Environment 88 (2015): 27-45. https://doi.org/10.1016/j.buildenv.2014.11.037
[15] Pfafferott, J., and S. Herkel. "Statistical simulation of user behaviour in low-energy office buildings." Solar Energy 81, no. 5 (2007): 676-682. https://doi.org/10.1016/j.buildenv.2014.11.037
[16] Sun, Kaiyu, and Tianzhen Hong. "A simulation approach to estimate energy savings potential of occupant behavior measures." Energy and Buildings 136 (2017): 43-62. https://doi.org/10.1016/j.enbuild.2016.12.010
[17] Deng, Zhipeng, and Qingyan Chen. "Artificial neural network models using thermal sensations and occupants' behavior for predicting thermal comfort." Energy and Buildings 174 (2018): 587-602. https://doi.org/10.1016/j.enbuild.2018.06.060
[18] Wang, Zhe, and Tianzhen Hong. "Reinforcement learning for building controls: The opportunities and challenges." Applied Energy 269 (2020): 115036. https://doi.org/10.1016/j.apenergy.2020.115036
[19] O'Brien, William, and H. Burak Gunay. "The contextual factors contributing to occupants' adaptive comfort behaviors in offices – A review and proposed modeling framework." Building and Environment 77 (2014): 77-87. https://doi.org/10.1016/j.buildenv.2014.03.024
[20] Stazi, Francesca, Federica Naspi, and Marco D'Orazio. "A literature review on driving factors and contextual events influencing occupants' behaviours in buildings." Building and Environment 118 (2017): 40-66. https://doi.org/10.1016/j.buildenv.2017.03.021
[21] Hong, Tianzhen, Simona D'Oca, William JN Turner, and Sarah C. Taylor-Lange. "An ontology to represent energy-related occupant behavior in buildings. Part I: Introduction to the DNAs framework." Building and Environment 92 (2015): 764-777. https://doi.org/10.1016/j.buildenv.2015.02.019
[22] O'Brien, William, Isabella Gaetani, Sara Gilani, Salvatore Carlucci, Pieter-Jan Hoes, and Jan Hensen. "International survey on current occupant modelling approaches in building performance simulation." Journal of Building Performance Simulation 10, no. 5-6 (2017): 653-671. https://doi.org/10.1080/19401493.2016.1243731
[23] Jia, Mengda, Ravi S. Srinivasan, and Adeeba A. Raheem. "From occupancy to occupant behavior: An analytical survey of data acquisition technologies, modeling methodologies and simulation coupling mechanisms for building energy efficiency." Renewable and Sustainable Energy Reviews 68 (2017): 525-540. https://doi.org/10.1016/j.rser.2016.10.011
[24] Fabi, Valentina, Rune Vinther Andersen, Stefano Corgnati, and Bjarne W. Olesen. "Occupants' window opening behaviour: A literature review of factors influencing occupant behaviour and models." Building and Environment 58 (2012): 188-198. https://doi.org/10.1016/j.buildenv.2012.07.009
[25] Tijani, Khadija, Stephane Ploix, Benjamin Haas, Julie Dugdale, and Quoc Dung Ngo. "Dynamic Bayesian Networks to simulate occupant behaviours in office buildings related to indoor air quality." arXiv preprint arXiv:1605.05966 (2016). https://arxiv.org/ftp/arxiv/papers/1605/1605.05966.pdf
[26] D'Oca, Simona, Stefano Corgnati, and Tianzhen Hong. "Data mining of occupant behavior in office buildings." Energy Procedia 78 (2015): 585-590. https://doi.org/10.1016/j.egypro.2015.11.022
[27] Zhou, Xin, Tiance Liu, Da Yan, Xing Shi, and Xing Jin. "An action-based Markov chain modeling approach for predicting the window operating behavior in office spaces." Building Simulation (2020): 1-15. Tsinghua University Press. https://doi.org/10.1007/s12273-020-0647-9
[28] Andrews, Clinton J., Daniel Yi, Uta Krogmann, Jennifer A. Senick, and Richard E. Wener. "Designing buildings for real occupants: An agent-based approach." IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 41, no. 6 (2011): 1077-1091. https://doi.org/10.1109/TSMCA.2011.2116116
[29] Reinhart, Christoph F. "Lightswitch-2002: A model for manual and automated control of electric lighting and blinds." Solar Energy 77, no. 1 (2004): 15-28. https://doi.org/10.1016/j.solener.2004.04.003
[30] Ryu, Seung Ho, and Hyeun Jun Moon. "Development of an occupancy prediction model using indoor environmental data based on machine learning techniques." Building and Environment 107 (2016): 1-9. https://doi.org/10.1016/j.buildenv.2016.06.039
[31] Zhou, Hao, Lifeng Qiao, Yi Jiang, Hejiang Sun, and Qingyan Chen. "Recognition of air-conditioner operation from indoor air temperature and relative humidity by a data mining approach." Energy and Buildings 111 (2016): 233-241. https://doi.org/10.1016/j.enbuild.2015.11.034
[32] Papadopoulos, Sokratis, and Elie Azar. "Integrating building performance simulation in agent-based modeling using regression surrogate models: A novel human-in-the-loop energy modeling approach." Energy and Buildings 128 (2016): 214-223. https://doi.org/10.1016/j.enbuild.2016.06.079
[33] Azar, Elie, and Carol C. Menassa. "Agent-based modeling of occupants and their impact on energy use in commercial buildings." Journal of Computing in Civil Engineering 26, no. 4 (2012): 506-518. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000158
[34] Lee, Yoon Soo, and Ali M. Malkawi. "Simulating multiple occupant behaviors in buildings: An agent-based modeling approach." Energy and Buildings 69 (2014): 407-416. https://doi.org/10.1016/j.enbuild.2013.11.020
[35] Sutton, Richard S., and Andrew G. Barto. Introduction to Reinforcement Learning (Vol. 135). Cambridge: MIT Press, 1998.
[36] Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/pdf/1509.02971.pdf
[37] Silver, David, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert et al. "Mastering the game of Go without human knowledge." Nature 550, no. 7676 (2017): 354-359. https://doi.org/10.1038/nature24270
[38] Zhang, Zhiang, Adrian Chong, Yuqi Pan, Chenlu Zhang, and Khee Poh Lam. "Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning." Energy and Buildings 199 (2019): 472-490. https://doi.org/10.1016/j.enbuild.2019.07.029
[39] Kazmi, Hussain, Johan Suykens, Attila Balint, and Johan Driesen. "Multi-agent reinforcement learning for modeling and control of thermostatically controlled loads." Applied Energy 238 (2019): 1022-1035. https://doi.org/10.1016/j.apenergy.2019.01.140
[40] Yu, Liang, Weiwei Xie, Di Xie, Yulong Zou, Dengyin Zhang, Zhixin Sun, Linghua Zhang, Yue Zhang, and Tao Jiang. "Deep reinforcement learning for smart home energy management." IEEE Internet of Things Journal 7, no. 4 (2019): 2751-2762. https://doi.org/10.1109/JIOT.2019.2957289
[41] Han, Mengjie, Ross May, Xingxing Zhang, Xinru Wang, Song Pan, Da Yan, and Yuan Jin. "A novel reinforcement learning method for improving occupant comfort via window opening and closing." Sustainable Cities and Society (2020): 102247. https://doi.org/10.1016/j.scs.2020.102247
[42] Han, Mengjie, Ross May, Xingxing Zhang, Xinru Wang, Song Pan, Da Yan, Yuan Jin, and Liguo Xu. "A review of reinforcement learning methodologies for controlling occupant comfort in buildings." Sustainable Cities and Society 51 (2019): 101748. https://doi.org/10.1016/j.scs.2019.101748
[43] Yoon, Young Ran, and Hyeun Jun Moon. "Performance based thermal comfort control (PTCC) using deep reinforcement learning for space cooling." Energy and Buildings 203 (2019): 109420. https://doi.org/10.1016/j.enbuild.2019.109420
[44] Ruelens, Frederik, Sandro Iacovella, Bert J. Claessens, and Ronnie Belmans. "Learning agent for a heat-pump thermostat with a set-back strategy using model-free reinforcement learning." Energies 8, no. 8 (2015): 8300-8318. https://doi.org/10.3390/en8088300
[45] Azuatalam, Donald, Wee-Lih Lee, Frits de Nijs, and Ariel Liebman. "Reinforcement learning for whole-building HVAC control and demand response." Energy and AI 2 (2020): 100020. https://doi.org/10.1016/j.egyai.2020.100020
[46] Chen, Bingqing, Zicheng Cai, and Mario Bergés. "Gnu-RL: A precocial reinforcement learning solution for building HVAC control using a differentiable MPC policy." In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 316-325. 2019. https://doi.org/10.1145/3360322.3360849
[47] Ding, Xianzhong, Wan Du, and Alberto E. Cerpa. "MB2C: Model-based deep reinforcement learning for multi-zone building control." In Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 50-59. 2020. https://doi.org/10.1145/3408308.3427986
[48] Jia, Ruoxi, Ming Jin, Kaiyu Sun, Tianzhen Hong, and Costas Spanos. "Advanced building control via deep reinforcement learning." Energy Procedia 158 (2019): 6158-6163. https://doi.org/10.1016/j.egypro.2019.01.494
[49] Chen, Yujiao, Leslie K. Norford, Holly W. Samuelson, and Ali Malkawi. "Optimal control of HVAC and window systems for natural ventilation through reinforcement learning." Energy and Buildings 169 (2018): 195-205. https://doi.org/10.1016/j.enbuild.2018.03.051
[50] Park, June Young, Thomas Dougherty, Hagen Fritz, and Zoltan Nagy. "LightLearn: An adaptive and occupant centered controller for lighting based on reinforcement learning." Building and Environment 147 (2019): 397-414. https://doi.org/10.1016/j.buildenv.2018.10.028
[51] Valladares, William, Marco Galindo, Jorge Gutiérrez, Wu-Chieh Wu, Kuo-Kai Liao, Jen-Chung Liao, Kuang-Chin Lu, and Chi-Chuan Wang. "Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm." Building and Environment 155 (2019): 105-117. https://doi.org/10.1016/j.buildenv.2019.03.038
[52] Brandi, Silvio, Marco Savino Piscitelli, Marco Martellacci, and Alfonso Capozzoli. "Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings." Energy and Buildings (2020): 110225. https://doi.org/10.1016/j.enbuild.2020.110225
[53] Ding, Xianzhong, Wan Du, and Alberto Cerpa. "OCTOPUS: Deep reinforcement learning for holistic smart building control." In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 326-335. 2019. https://doi.org/10.1145/3360322.3360857
[54] Li, Ao, Fu Xiao, Cheng Fan, and Maomao Hu. "Development of an ANN-based building energy model for information-poor buildings using transfer learning." Building Simulation (2020): 1-13. Tsinghua University Press. https://doi.org/10.1007/s12273-020-0711-5
[55] Mosaico, Gabriele, Matteo Saviozzi, Federico Silvestro, Andrea Bagnasco, and Andrea Vinci. "Simplified state space building energy model and transfer learning based occupancy estimation for HVAC optimal control." In 2019 IEEE 5th International Forum on Research and Technology for Society and Industry (RTSI), pp. 353-358. IEEE, 2019. https://doi.org/10.1109/RTSI.2019.8895544
[56] Ali, SM Murad, Juan Carlos Augusto, and David Windridge. "A survey of user-centred approaches for smart home transfer learning and new user home automation adaptation." Applied Artificial Intelligence 33, no. 8 (2019): 747-774. https://doi.org/10.1080/08839514.2019.1603784
[57] Alam, Mohammad Arif Ul, and Nirmalya Roy. "Unseen activity recognitions: A hierarchical active transfer learning approach." In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 436-446. IEEE, 2017. https://doi.org/10.1109/ICDCS.2017.264
[58] Mocanu, Elena, Phuong H. Nguyen, Wil L. Kling, and Madeleine Gibescu. "Unsupervised energy prediction in a Smart Grid context using reinforcement cross-building transfer learning." Energy and Buildings 116 (2016): 646-655. https://doi.org/10.1016/j.enbuild.2016.01.030
[59] Ribeiro, Mauro, Katarina Grolinger, Hany F. ElYamany, Wilson A. Higashino, and Miriam AM Capretz. "Transfer learning with seasonal and trend adjustment for cross-building energy forecasting." Energy and Buildings 165 (2018): 352-363. https://doi.org/10.1016/j.enbuild.2018.01.034
[60] Gao, Nan, Wei Shao, Mohammad Saiedur Rahaman, Jun Zhai, Klaus David, and Flora D. Salim. "Transfer learning for thermal comfort prediction in multiple cities." arXiv preprint arXiv:2004.14382 (2020). https://arxiv.org/pdf/2004.14382.pdf
[61] Xu, Shichao, Yixuan Wang, Yanzhi Wang, Zheng O'Neill, and Qi Zhu. "One for many: Transfer learning for building HVAC control." In Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 230-239. 2020. https://doi.org/10.1145/3408308.3427617
[62] Deng, Zhipeng, and Qingyan Chen. "Development and validation of a smart HVAC control system for multi-occupant offices by using occupants' physiological signals from wristband." Energy and Buildings 214 (2020): 109872. https://doi.org/10.1016/j.enbuild.2020.109872
[63] ASHRAE. ASHRAE Handbook: Fundamentals. American Society of Heating, Refrigerating and Air-Conditioning Engineers (2017).
[64] Foerster, Jakob, Ioannis Alexandros Assael, Nando De Freitas, and Shimon Whiteson. "Learning to communicate with deep multi-agent reinforcement learning." In Advances in Neural Information Processing Systems, pp. 2137-2145. 2016.
[65] Klein, Laura, Jun-young Kwak, Geoffrey Kavulya, Farrokh Jazizadeh, Burcin Becerik-Gerber, Pradeep Varakantham, and Milind Tambe. "Coordinating occupant behavior for building energy and comfort management using multi-agent systems." Automation in Construction 22 (2012): 525-536. https://doi.org/10.1016/j.autcon.2011.11.012
[66] Melo, Francisco S. "Convergence of Q-learning: A simple proof." Institute of Systems and Robotics, Tech. Rep. (2001): 1-4.
[67] Yang, Lei, Zoltan Nagy, Philippe Goffin, and Arno Schlueter. "Reinforcement learning for optimal control of low exergy buildings." Applied Energy 156 (2015): 577-586. https://doi.org/10.1016/j.apenergy.2015.07.050
[68] Cheng, Zhijin, Qianchuan Zhao, Fulin Wang, Yi Jiang, Li Xia, and Jinlei Ding. "Satisfaction based Q-learning for integrated lighting and blind control." Energy and Buildings 127 (2016): 43-55. https://doi.org/10.1016/j.enbuild.2016.05.067
[69] MathWorks. Reinforcement Learning Toolbox documentation. https://www.mathworks.com/help/reinforcement-learning/
[70] Gunay, H. Burak, William O'Brien, and Ian Beausoleil-Morrison. "A critical review of observation studies, modeling, and simulation of adaptive occupant behaviors in offices." Building and Environment 70 (2013): 31-47. https://doi.org/10.1016/j.buildenv.2013.07.020
[71] Wei, Shen, Rory Jones, and Pieter De Wilde. "Driving factors for occupant-controlled space heating in residential buildings." Energy and Buildings 70 (2014): 36-44. https://doi.org/10.1016/j.enbuild.2013.11.001
[72] Yu, Zhun, Benjamin CM Fung, Fariborz Haghighat, Hiroshi Yoshino, and Edward Morofsky. "A systematic procedure to study the influence of occupant behavior on building energy consumption." Energy and Buildings 43, no. 6 (2011): 1409-1417. https://doi.org/10.1016/j.enbuild.2011.02.002
[73] ASHRAE. "Standard 55-2010, Thermal environmental conditions for human occupancy." American Society of Heating, Refrigerating and Air Conditioning Engineers (2010).
[74] Deng, Zhipeng, and Qingyan Chen. "Simulating the impact of occupant behavior on energy use of HVAC systems by implementing a behavioral artificial neural network model." Energy and Buildings 198 (2019): 216-227. https://doi.org/10.1016/j.enbuild.2019.06.015
[75] Karjalainen, Sami. "Gender differences in thermal comfort and use of thermostats in everyday thermal environments." Building and Environment 42, no. 4 (2007): 1594-1603. https://doi.org/10.1016/j.buildenv.2006.01.009
[76] Montazami, Azadeh, Mark Gaterell, Fergus Nicol, Mark Lumley, and Chryssa Thoua. "Impact of social background and behaviour on children's thermal comfort." Building and Environment 122 (2017): 422-434. https://doi.org/10.1016/j.buildenv.2017.06.002
[77] Ghahramani, Ali, Kanu Dutta, and Burcin Becerik-Gerber. "Energy trade off analysis of optimized daily temperature setpoints." Journal of Building Engineering 19 (2018): 584-591. https://doi.org/10.1016/j.jobe.2018.06.012
[78] Yan, Da, Xiaohang Feng, Yuan Jin, and Chuang Wang. "The evaluation of stochastic occupant behavior models from an application-oriented perspective: Using the lighting behavior model as a case study." Energy and Buildings 176 (2018): 151-162. https://doi.org/10.1016/j.enbuild.2018.07.037
Highlights

1. Reinforcement learning model for predicting occupant behavior in adjusting the thermostat set point and clothing level in an office building.
2. Transfer learning model for transferring occupant behavior from one building to another without data.
3. Transfer learning among buildings of the same type was better than among different types of buildings.
4. The variation range of energy use predicted by the reinforcement learning model was smaller than that predicted by the artificial neural network model.
[Figure text residue from PDF extraction. Recoverable content: a graphical-abstract flowchart linking the RL model design (Q-learning) and its validation in the HLAB building to transfer learning for other office and residential buildings and to energy simulation against collected energy data; an agent-environment diagram with the occupant as agent, the indoor environment as environment, occupant behavior as the action, and improved thermal comfort (Reward = TSV_t - TSV_{t+1}) as the reward; Q-learning state-action tables for thermostat manual control and for setback/occupancy control, with states such as air temperature, relative humidity, clothing level and metabolic rate, and actions ranging from set-point changes of -2 °C to +7 °C to adding clothes; and an energy simulation flowchart that, while the room is occupied, applies the RL Q-learning table of actions at each time step to adjust the set point or clothing level and then evaluates building energy use.]
Article
Building controls are becoming more important and complicated due to the dynamic and stochastic energy demand, on-site intermittent energy supply, as well as energy storage, making it difficult for them to be optimized by conventional control techniques. Reinforcement Learning (RL), as an emerging control technique, has attracted growing research interest and demonstrated its potential to enhance building performance while addressing some limitations of other advanced control techniques, such as model predictive control. This study conducted a comprehensive review of existing studies that applied RL for building controls. It provided a detailed breakdown of the existing RL studies that use a specific variation of each major component of the Reinforcement Learning: algorithm, state, action, reward, and environment. We found RL for building controls is still in the research stage with limited applications (11%) in real buildings. Three significant barriers prevent the adoption of RL controllers in actual building controls: (1) the training process is time consuming and data demanding, (2) the control security and robustness need to be enhanced, and (3) the generalization capabilities of RL controllers need to be improved using approaches such as transfer learning. Future research may focus on developing RL controllers that could be used in real buildings, addressing current RL challenges, such as accelerating training and enhancing control robustness, as well as developing an open-source testbed and dataset for performance benchmarking of RL controllers.