Reinforcement learning of occupant behavior model for cross-building transfer learning to various HVAC control systems

Zhipeng Deng1, Qingyan Chen1,*

1Center for High Performance Buildings (CHPB), School of Mechanical Engineering, Purdue University, 585 Purdue Mall, West Lafayette, IN 47907, USA

*Corresponding author: Qingyan Chen, yanchen@purdue.edu
Abstract

Occupant behavior plays an important role in the evaluation of building performance. However, many contextual factors, such as occupancy, mechanical system and interior design, have a significant impact on occupant behavior. Most previous studies have built data-driven behavior models, which have limited scalability and generalization capability. Our investigation built a policy-based reinforcement learning (RL) model for the behavior of adjusting the thermostat and clothing level. Occupant behavior was modelled as a Markov decision process (MDP). The action and state space in the MDP contained occupant behavior and various impact parameters. The goal of the occupant behavior was a more comfortable environment, and we modelled the reward for the adjustment action as the absolute difference in the thermal sensation vote (TSV) before and after the action. We used Q-learning to train the RL model in MATLAB and validated the model with collected data. After training, the model predicted the behavior of adjusting the thermostat set point with R² from 0.75 to 0.8, and the mean absolute error (MAE) was less than 1.1 °C (2 °F) in an office building. This study also transferred the behavior knowledge of the RL model to other office buildings with different HVAC control systems. The transfer learning model predicted the occupant behavior with R² from 0.73 to 0.8, and the MAE was less than 1.1 °C (2 °F) most of the time. Going from office buildings to residential buildings, the transfer learning model also had an R² over 0.6. Therefore, the RL model combined with transfer learning was able to predict the building occupant behavior accurately with good scalability, and without the need for data collection.

Keywords

Thermal comfort, machine learning, artificial neural network, air temperature, thermostat set point, Q-learning, building performance simulation
1. Introduction

In the United States, buildings account for 41% of primary energy use, mainly for maintaining a comfortable and healthy indoor environment [1]. Unfortunately, current methods for simulating building energy consumption are often inaccurate, and the error can be as high as 150% to 250% [2, 3]. Discrepancies between the simulated and actual energy consumption may arise from the varied occupant behavior in buildings [4, 5]. Therefore, it is important to estimate the impact of occupant behavior on building energy consumption [6].
Occupant behavior in buildings refers to occupants’ movements and their interactions with building components such as thermostats, windows, lights, blinds and internal equipment [7]. The existing methods for exploring the effects of occupant behavior on energy consumption were mostly based on building performance simulations [8]. In these simulations, modelling occupant behavior is challenging due to its complexity [9, 10, 11]. Previous studies have tried to predict the energy consumption in commercial and residential buildings with the use of various occupant behavior models. These models can be divided into three categories: data-driven, physics-based and hybrid models.
In the data-driven category, many researchers have built linear regression models [12], logistic regression models [13, 14], statistical models [15, 16], and artificial neural network (ANN) models [17]. To be specific, Andersen [12] and Fabi [13] collected data on occupants’ heating set points in dwellings and predicted the thermal preference along with indoor environmental quality and heating demand. Langevin’s model [14] used heating set-point data from a one-year field study in an air-conditioned office building. Sun and Hong [16] used a simulation approach to estimate energy savings for five common types of occupant behavior in a real office building across four typical climates. Deng and Chen [17] collected data in an office building for one year to predict occupant behavior in regard to thermostat and clothing level by means of an ANN model. In these studies, the models considered different variables that affect occupant behavior in buildings. However, the generalization capabilities of these data-driven models were not good [18], since the occupant behavior differed from building to building. Some review papers [19, 20] have discussed contextual factors that cause occupant behavior to vary greatly, such as room occupancy, availability and accessibility of an HVAC system, and interior design. The authors observed that it was difficult to apply an occupant behavior model developed for one building to another building. Hong et al. also indicated that, because a large number of data-driven behavior models emerged in scattered locations around the world, they lack standardization and consistency and cannot easily be compared with one another [21]. Moreover, all the data-driven models require sufficient data for training, but the estimation of building energy and modelling of occupant behavior are done mostly during the early design stages, when collecting occupant behavior data is impossible [22]. It is therefore hard to build a data-driven occupant behavior model with satisfactory generalization capability, or to build one at all when no data are available.
As for the physics-based models, a review by Jia et al. [23] pointed out that occupant behavior modelling has progressed from deterministic or static models to more detailed and complex ones. Therefore, many researchers have based their models on the causal relationships of occupant behavior. The driving factors of occupant behavior can be divided into three main types: environmentally related, time related and random factors [20, 24]. Hong et al. developed a DNAS (drivers, needs, actions, systems) framework that standardized the representation of energy-related occupant behavior in buildings [21]. Many researchers have adopted this framework for their behavior studies. For example, dynamic Bayesian networks by Tijani et al. [25] simulated the occupant behavior in office buildings as it relates to indoor air quality. The advantage of the Bayesian network model was in its representation of occupant behavior as probabilistic cause-effect relationships based on prior knowledge. D’Oca et al. [26] built a knowledge discovery database for window-operating behavior in 16 offices. Zhou et al. [27] used an action-based Markov chain approach to predict window-operating actions in office spaces. They found that the Markov chain reflected the actual behavior accurately in an open-plan office and was therefore a beneficial supplemental module for energy simulation software. The Markov chain model depends on the previous state to predict the probability of an event occurring. This characteristic is useful for representing individuals’ actions and motivations [9]. In addition, many researchers have built other kinds of models for different building types and scenarios. For instance, hidden Markov models [23, 28] were used to simulate occupant behavior with unobservable hidden states, and thus these models could be employed under very complicated conditions. Survival models [29] could feature different occupant types to mimic variations in control behavior. Meanwhile, a decision tree model [30, 31] regarded occupant decisions and possible behavior as branched graphical classification. This model was straightforward, but complex causal factors in real situations might give rise to too many branches. In recent years, more complex agent-based models [32-34] have yielded good predictions of occupant behavior with individual differences among occupants. In short, physics-based occupant behavior models with physical meaning have exhibited better generalization capability than data-driven models. Hence, the present study used a Markov decision process (MDP) to model occupant behavior and built a logic-based reinforcement learning model to explore the model’s scalability.
Reinforcement learning (RL) is a machine learning area concerned with the ways in which agents take actions to maximize certain rewards [35]. Off-policy RL can use historical data for training without interacting with the environment. In contrast, policy-based reinforcement learning does not require previous training data because it creates its own experience via random explorations of the environment. As such, this way of learning can obtain rules and knowledge not limited to specific conditions but adaptable to various scenarios. It has been applied successfully to a range of fields, including robot control [36] and playing Go [37]. In the built environment, the RL model has been used to improve building energy efficiency and management when the reward is defined as minimizing building energy consumption [38-40]. For instance, Zhang et al. [38] used deep reinforcement learning to control a radiant heating system in an existing office building and achieved a 16.7% reduction in heating demand. A multi-agent reinforcement learning framework by Kazmi et al. [39] achieved a 20% reduction in the energy required for the hot water systems in over 50 houses. Liang [40] modelled an HVAC scheduling system control as an MDP, and the model did not require prior knowledge of the building thermal dynamics model. Similarly, when the reward is the thermal comfort level of occupants, the RL model can be used to control the thermal comfort and HVAC system in buildings [41, 42]. For example, Yoon et al. [43] built performance-based comfort control for cooling while minimizing the energy consumption. Ruelens and coauthors [44] used model-free RL for a heat-pump thermostat. Their learning agent reduced the energy consumption by 4-9% during 100 winter days and by 9-11% during 80 summer days. Azuatalam et al. [45] applied RL to the optimal control of whole-building HVAC systems while harnessing RL’s demand response capabilities. Similarly, Chen [46] and Ding [47] developed novel deep RL methods to reduce the size of the training data set and the training time. Meanwhile, several previous studies used the RL model for advanced building control [43, 48, 49] and lighting control [50]. In addition, there have been some integrated applications. For example, Valladares et al. [51] used the RL model with a probability of reward combination to improve both the thermal comfort and indoor air quality in buildings. The RL model developed by Brandi et al. [52] optimized indoor temperature control and heating energy consumption in buildings. Ding et al. [53] also employed a novel deep RL framework for optimal control of building subsystems, including HVAC, lighting, blinds and windows. Hence, RL can be used to model the HVAC system for both thermal comfort and energy management. Physics-based and model-free RL also have the potential to model occupant behavior without data, since the logic is very similar. Therefore, this research built an RL model for thermostat set point and clothing level adjustment behavior based on the correlation between thermal sensation and thermally influenced occupant behavior [17].
For modelling occupant behavior in buildings with limited information and no data, transfer learning is a feasible approach [18]. The transfer learning method stores knowledge about one problem and then applies it to a related problem. It has been used for cross-building [54, 55], cross-home [56] and even cross-city [57] energy modelling. For instance, Mocanu et al. [58] transferred a building energy prediction to a new building in a smart grid. Ribeiro et al. [59] used various machine learning methods to predict school building energy and transferred the prediction to other new schools. Gao et al. [60] built a transfer learning model for thermal comfort prediction in multiple cities. Xu et al. [61] conducted transfer learning for HVAC control between buildings with different sizes, numbers of thermal zones, materials, layouts, air conditioner types, and ambient weather conditions. They found that this approach significantly reduced the training time and energy cost. Therefore, based on the potential of transfer learning, we used it to transfer knowledge about occupant behavior from one building to other buildings.
The purpose of the present study was to build an RL occupant behavior model for thermostat and clothing level adjustment in a particular building, and to transfer the model to other buildings with different HVAC control systems. For this purpose, we first built an MDP of the occupant behavior and used a thermal sensation model to build the rewards. We then trained the RL model with the use of Q-learning. Next, we used transfer learning to explore the occupant behavior in several other buildings. We also validated the RL occupant behavior model and the transferred model with data collected from various buildings. Finally, we analyzed the simulated building energy performance with the use of the RL model and the transferred model.
2. Methods

To develop an occupant behavior model, we first modeled the occupant behavior as an MDP and developed the RL model on the basis of this process. Subsequently, we trained the model with the use of a Q-learning algorithm. Next, we transferred the knowledge of the occupant behavior model from one building with manual control to other buildings with thermostat setback and occupancy control systems. Finally, we validated the transfer learning model with collected data. Fig. 1 summarizes the methods and models in this study.
Fig. 1. Flow chart of methods in this study, including the reinforcement learning occupant behavior model, transfer learning model and energy simulation
2.1 Framework of reinforcement learning model
As shown in Fig. 2, in the RL model, an agent can gather information directly from the environment in its different states, take actions in the environment, and compare the results of these actions via the reward function. This cycle is repeated over time until the agent has enough experience to correctly choose the actions that yield the maximum reward. Thus, through interaction with an environment and repeated actions, the RL model can evaluate the consequences of actions by learning from past experience. For a building occupant, the decision to take an action in a specific indoor environment is a similar process to that of the RL model. The MDP is used to describe an environment for reinforcement learning because, in this case, the indoor environment and thermal comfort are fully observable. In this study, the occupant behavior was modelled as a decision-making process in which policy-based RL was used. The building occupant, the occupant behavior, the indoor environment and the improvement in thermal comfort level are the agent, action, state and reward, respectively, in the model. In each state, the logic of occupant behavior is to proactively seek more comfortable conditions in the indoor environment [11]. Numerous factors are related to the occupant behavior, and we will introduce them in detail in the following sections.
Fig. 2. Illustration of the RL model with agent, action space, environment space and rewards.
We modeled the occupant behavior in offices as an MDP, as shown in Fig. 3. In the initial state, the agent had many possible choices of behavior, such as adjusting the thermostat set point by various degrees or adjusting the clothing level. For every action, there was a corresponding feedback reward, such as improvement or deterioration of thermal comfort. The agent took an action to enter a follow-up environment, and this process kept going. The time step size for action prediction was 15 minutes. We took the actual timing of occupant behavior into consideration, because there was a certain delay in the occurrence of the behavior, and the occupant did not act immediately when feeling uncomfortable. We also assumed that the action could take effect in the subsequent time step if the HVAC system was in normal operation. Note that in Fig. 3 we have listed only some possible actions. There may be others, such as reducing the clothing level and making a more extreme adjustment to the thermostat set point. These additional actions are represented by an ellipsis.
The MDP in this study entailed the following specifications:

Environment space: The state contains the information about the indoor environment that occupants use in deciding on the proper action. In this research, the state space included room air temperature, room air relative humidity, thermostat set point, clothing level of occupants, metabolic rate, room occupancy and time of day. Although there are many other factors [20, 24] that impact occupant behavior, we neglected them in order to simplify the structure of the RL model. Here we assumed that the thermal sensation of occupants was not impacted by the time of day. Therefore, time was not included in the TSV and reward calculation. An exception was the transfer learning model for setback and occupancy control in Section 2.3, which moved to a nighttime state at certain times. Generally, time functioned as a label, and it did not contain a numerical value that might influence the RL model and training. In summary, the state space can be expressed as

$S = \{T_{air}, RH_{air}, T_{setpoint}, Clo, Met, occupancy, time\}$ (1)
Action space: The action is the occupant behavior that is performed with the goal of more comfortable conditions. In this research, the action space included raising or lowering the thermostat set point by different degrees, or maintaining the same set point; putting on, keeping the same, or taking off clothes; and arriving. The action space can be expressed as

$A = \{A_{raise}, A_{keep}, A_{lower}, A_{put\ on}, A_{keep}, A_{take\ off}\}$ (2)

where the first three actions $A_{raise}, A_{keep}, A_{lower}$ represent adjustments to the thermostat set point, and the last three actions $A_{put\ on}, A_{keep}, A_{take\ off}$ represent adjustments to the clothing level.
Reward function: The goal of the action is a higher thermal comfort level for the occupants. Therefore, in this research, the reward was modelled as the absolute difference between the initial TSV before the action and the final TSV after the action, which can be expressed as

$R = |TSV_t| - |TSV_{t+1}|$ (3)

where subscripts t and t+1 represent the current and next time steps, respectively. It is clear that in order to maximize the reward R, $TSV_{t+1} = 0$, which means that the desired thermal sensation is neutral after the occupant behavior occurs.
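The reward in Eq. (3) is straightforward to implement. Below is a minimal MATLAB sketch, assuming predictTSV is a handle to any thermal sensation model (such as the ANN model described next) that maps a state to a TSV value; the function and variable names are illustrative:

```matlab
% Minimal sketch of the reward in Eq. (3). predictTSV is an assumed
% function handle that maps a state to a TSV value.
function r = behaviorReward(stateBefore, stateAfter, predictTSV)
    % The reward is the reduction in the magnitude of the thermal
    % sensation vote: positive when the action moves the occupant
    % toward a neutral sensation (TSV = 0).
    r = abs(predictTSV(stateBefore)) - abs(predictTSV(stateAfter));
end
```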
In this research, we predicted the TSV in offices with the use of an ANN model [17, 62] that expresses TSV as a function of four input parameters:

TSV = f (air temperature, relative humidity, clothing insulation, metabolic rate) (4)

where f represents the function of the ANN model. We assumed that the mean radiant temperature was the same as the air temperature, and the air velocity was less than 0.2 m/s. To develop the ANN model, we collected data from over 25 occupants in an office building during the four seasons of 2017. The number of collected data points for training the model was about 5,000. The model had three layers, and there were ten neurons in the hidden layer. We used the Levenberg-Marquardt algorithm to train the model, and it predicted the TSV with a mean absolute error (MAE) of 0.43 after training.
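A shallow network of this kind can be set up in a few lines of MATLAB. The following is a hedged sketch, assuming the Deep Learning Toolbox is available and that the collected data are arranged as a 4 x N input matrix X (air temperature, relative humidity, clothing insulation, metabolic rate) and a 1 x N target vector Y of votes; the variable names are illustrative:

```matlab
% Sketch of the TSV network in Eq. (4): one hidden layer of ten neurons,
% trained with the Levenberg-Marquardt algorithm ('trainlm').
net = fitnet(10, 'trainlm');
net = train(net, X, Y);              % supervised training on collected votes
tsvPredicted = net(X);               % predicted TSV for the inputs
mae = mean(abs(tsvPredicted - Y));   % mean absolute error (about 0.43 here)
```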
Fig. 3. MDP for the occupant behavior of thermostat set point manual control and clothing level adjustment. Each state space includes numerous parameters, as expressed by Eq. (1), and the figure displays only the key parameters. The initial state is followed by many actions, follow-up states and possible subsequent states. In addition to what is shown in the figure, further possibilities are indicated by an ellipsis.
For buildings without a thermal comfort model, the predicted mean vote (PMV) [63] can also be used to model the reward, which is expressed as

$R = |PMV_t| - |PMV_{t+1}|$ (5)

As above, maximizing the reward R requires that $PMV_{t+1} = 0$.
Reward modelling in the RL model for multi-occupant offices with multiple agents [64] was different from that for single-occupant offices. For multi-occupant offices, the modelling was divided into two categories. In one category, the reward of a dominant occupant was maximized. Here, one occupant near the thermostat would adjust the thermostat dominantly, and the others in the room would compromise with this occupant’s preference, as is the case in some workplaces [17, 65]. Thus, the reward was for the dominant individual and can be expressed as

$R = |TSV_{t,dominant}| - |TSV_{t+1,dominant}|$ (6)

During data collection, we also found that in some offices all the occupants had equal control of the thermostat [17]. Therefore, in our other multi-occupant office category, the average reward for all occupants was maximized. The reward was averaged as

$R = \frac{1}{n}\sum_{i}\left(|TSV_{t,i}| - |TSV_{t+1,i}|\right)$ (7)

where n is the number of occupants in the room, and i represents different occupants. For a single-occupant office where only the dominant occupant was in the room, the two categories of reward modelling in Eqs. (6) and (7) were the same.
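The shared-control case in Eq. (7) is a small extension of the single-occupant reward. A minimal MATLAB sketch, assuming tsvBefore and tsvAfter are 1 x n vectors of per-occupant TSV values:

```matlab
% Sketch of the shared-control reward in Eq. (7): the mean improvement
% in thermal sensation over all n occupants of the room.
function r = sharedReward(tsvBefore, tsvAfter)
    r = mean(abs(tsvBefore) - abs(tsvAfter));
end
```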
2.2 Q-learning

After designing the model framework, we needed to train the RL model. One of the available training methods is Q-learning. Here Q means “quality,” a policy function of an action taken in a given state. It can be expressed as the following mapping:

$Q: S \times A \to \mathbb{R}$ (8)
Q-learning is a model-free RL algorithm for learning a policy that tells an agent which actions to take under various circumstances [66]. This learning method has been widely used for training RL models [43, 49, 51, 67, 68]. With the state space, action space and reward modelling described in Section 2.1, we used the Q-learning algorithm to update the quality. The updating equation for Q-learning can be expressed as

$Q_{new}(s_t, a_t) = Q_{old}(s_t, a_t) + \alpha \left[ r_t + \gamma \max_{a} Q(s_{t+1}, a) - Q_{old}(s_t, a_t) \right]$ (9)

where $Q$ is the quality, $s$ the state, $a$ the action, $\alpha$ the learning rate, $r$ the reward, $\gamma$ the discount factor, and $\max_{a} Q(s_{t+1}, a)$ the estimate of the optimal future value. According to this equation, as the training begins, the quality is initialized to arbitrary or uniform values. Then, at each episode $t$ of the training process, the agent in state $s_t$ selects an action $a_t$ with a reward $r_t$ and an estimated future reward for future actions. After the action, the agent enters a new state $s_{t+1}$. When the maximized reward is confirmed, the optimal action is learned and the quality $Q$ is updated. In this process, the RL model gradually learns to take actions in a certain environment, and we can obtain a Q-learning table of states by various actions. Q-learning is similar to the actual decision process for occupant behavior in buildings.
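For a discrete state and action space such as ours, Eq. (9) reduces to a simple table update. The following plain-MATLAB sketch illustrates the idea (the Reinforcement Learning Toolbox used in this study provides an equivalent Q-learning agent); nStates, nActions, nEpisodes and the environment helper stepEnvironment are assumptions for illustration:

```matlab
alpha = 0.3;   % learning rate selected in this study
gamma = 1.0;   % discount factor selected in this study
Q = zeros(nStates, nActions);            % uniform initialization of the quality
for episode = 1:nEpisodes
    s = randi(nStates);                  % random exploration of states
    a = randi(nActions);                 % random exploration of actions
    [sNext, r] = stepEnvironment(s, a);  % assumed environment: next state, reward
    % Tabular Q-learning update, Eq. (9)
    Q(s, a) = Q(s, a) + alpha * (r + gamma * max(Q(sNext, :)) - Q(s, a));
end
```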
The learning rate and discount factor can impact the learning process. In this study, we selected a learning rate of 0.3 and a discount factor of 1. We used a table of states by various actions because the choices of actions in the MDP were discrete, namely, adjusting the thermostat by different degrees or setting the clothing insulation to certain values. Thus, the discount factor had little impact on the Q-learning result. As for the learning rate, we will provide training results for learning rate variations in Section 3.2. We used the MATLAB 2020a Reinforcement Learning Toolbox [69] to build and train the RL model.
2.3 Transfer learning
After designing and training the RL occupant behavior model, we sought to transfer the model to other buildings with limited information and even with no data. As shown in Fig. 4(a), an ANN model, one of the data-driven models, has a layered structure with input, hidden and output layers. The training process for the ANN model uses data to update the values of the coefficients in the hidden layer. Therefore, the model can only be used for similar buildings with available data. In previous attempts to apply such a model directly to other buildings, the performance was usually not good [18, 21]. In those studies, transfer learning of the ANN model grabbed layers of neural network weights and trained the model again with new data. Predicting behavior in different buildings by transferring a data-driven model therefore requires new data for retraining. Additionally, the meanings of the coefficients inside the models are still unclear to researchers. Therefore, the information in the hidden layer cannot be transferred or used for other buildings. However, as shown in Fig. 4(b), the policy-based RL occupant behavior model is a logical model with physical meaning, and thus it can be partially transferred to other buildings. We transferred the higher-level rules of the RL model, i.e., the logic of thermal actions and the pursuit of thermal comfort, from one building to another. We could do this because even for different buildings and HVAC control systems, the logic of occupant behavior that seeks more comfortable conditions remained the same. Therefore, the feasible actions and rewards of the RL model were similar for different buildings. For example, we built an RL occupant behavior model for a building with manual thermostat control. In other buildings with thermostat setback or occupancy control, occupants might adjust the thermostat set point in different ways. When they left the room or during the night, the building automation system could reset the thermostat set point to save energy. When the occupants reentered the room, they could adjust the set point and override the system operation. The occupants’ overriding of the automation systems might indicate their dissatisfaction [70]. As such, there was a night state before the occupants’ arrival in the morning, when the set point and air temperature were different, as depicted in Fig. 4(b). After the occupants’ arrival, or in the morning, the state space entered the normal initial state. Thus, the transfer learning model structure was similar to the original model, with the same possible actions and rewards in the daytime. We could therefore transfer a portion of the parameters in the action space and the rewards to other buildings. Even without data for these buildings, we could still model and predict the occupant behavior.
For residential buildings, large-scale collection of occupant behavior data has usually been more difficult, because such buildings are generally not equipped with a building automation system (BAS) [17]. The use of questionnaire surveys to gather data has been reported as time-consuming and limited in accuracy [23]. Under this circumstance, building a model by transfer learning was a feasible approach. Similarly, we also transferred the RL occupant behavior model for office buildings to residential buildings. The occupant behavior of manual thermostat control was the same in both types of buildings, but the improved thermal comfort level and reward for actions were different [17]. Moreover, there were other factors that distinguished the occupant behavior in office buildings from that in residential buildings [71, 72]. Therefore, we needed to modify the state space and reward in the transfer learning model for residential buildings.
Fig. 4. Transfer of the occupant behavior model for manual control to other buildings with thermostat setback or occupancy control: (a) the data-driven ANN model cannot be transferred because of the coefficient values in the hidden layer; (b) the policy-based RL model can be transferred, and portions of the action and state space are the same.
For residential buildings, a previous study [17] found that the comfort zone of a building was 1.7 °C (3 °F) higher in summer, and 1.7 °C (3 °F) lower in winter, than the ASHRAE comfort zone [73]. Therefore, we were able to use this information to transfer the thermal sensation and occupant behavior model from the office building to residential buildings. Since the shape of the thermal comfort zone was similar, whereas the impact of air temperature on thermal comfort and occupant behavior was different [17], the logical RL behavior model could be partially transferred. The MDP for manual control of the thermostat was the same in the office building and residential buildings. We transferred the RL occupant behavior model with the use of PMV to calculate the reward as

$R = |PMV_{Residence\_i}| - |PMV_{Residence\_f}|$ (10)

where subscripts i and f denote the states before and after the action.
Here, the PMV in the residence was defined differently from the traditional PMV model because of the different comfort zone. With the 1.7 °C (3 °F) difference in winter and summer, it was calculated as

$PMV_{Residence\_winter} = PMV(T_{air} + 3\ °F, RH, T_r, V, Clo, Met)$ (11)

$PMV_{Residence\_summer} = PMV(T_{air} - 3\ °F, RH, T_r, V, Clo, Met)$ (12)

where the PMV function represents the traditional way of calculating PMV with its six parameters: air temperature, relative humidity, mean radiant temperature, air velocity, clothing insulation and metabolic rate.
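A minimal MATLAB sketch of the shifted residential PMV in Eqs. (11) and (12), assuming pmv is an available implementation of the standard six-parameter PMV model and that temperatures are in °F:

```matlab
% Sketch of Eqs. (11)-(12): residential PMV with the comfort zone
% shifted by 3 F (1.7 C) relative to the office comfort zone [17].
function p = residencePMV(Tair, RH, Tr, V, Clo, Met, isWinter)
    if isWinter
        p = pmv(Tair + 3, RH, Tr, V, Clo, Met);  % winter: zone 3 F lower
    else
        p = pmv(Tair - 3, RH, Tr, V, Clo, Met);  % summer: zone 3 F higher
    end
end
```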
2.4 Data collection for model validation
In order to validate the RL model, this study collected indoor air temperature, relative humidity, thermostat set point, lighting status, occupancy, clothing level of occupants, and data on the occupant behavior of adjusting the thermostat, from the BAS in 20 offices in the Ray W. Herrick Laboratories (HLAB) building at Purdue University in 2018, as shown in Fig. 5(a). Half of the offices were multi-occupant student offices, and the rest were single-occupant faculty offices. The building used a variable air volume (VAV) system for heating and cooling. Each office had an independent VAV box and a thermostat (Siemens 544-760A) that enabled the BAS to control the air temperature in the room. We downloaded the indoor environment data of room air temperature and thermostat set point from the BAS. In addition, we used a questionnaire to record the clothing level of the occupants and their clothing-adjustment behavior in the HLAB building.
We also gathered room air temperature, relative humidity, thermostat set point, lighting and occupancy data in four other office buildings on the Purdue University campus in three seasons of 2018, as shown in Fig. 5(b)-(e). Each building contained more than 100 offices. The HVAC systems in these buildings were similar to those in the HLAB building. However, the HVAC control strategies in the four buildings differed from that in the HLAB building. The HVAC system operated constantly in the HLAB building, and the occupants could adjust the thermostat set point manually. The LWSN building, by contrast, used a thermostat setback that overrode the manual control at night, from 11 PM to 6 AM. Meanwhile, the MSEE, HAAS and STAN buildings used occupancy control for the HVAC system in each room in addition to manual control. Table 1 provides the data collection information for each building, including the number of offices in which data was collected, the HVAC control type, the data collection interval, and the types of data that were collected. The details of the data collection process can be found in [17, 74].
Fig. 5. Photographs of the buildings used for data collection: (a) HLAB building, (b) MSEE building, (c) LWSN building, (d) STAN building and (e) HAAS building
Table 1. Data collection information for each building

HLAB: 20 offices; manual control; 5-min interval. Collected data: room lighting status, number of room occupants, room air temperature and RH, thermostat set point, room CO2 concentration, clothing level, room supply-air flow rate, room supply-air temperature.

LWSN: 106 offices; manual control + thermostat setback; 10-min interval. Collected data: room lighting status, number of room occupants, room air temperature and RH, thermostat set point.

MSEE: 99 offices; manual control + occupancy control; 15-min interval. Collected data: as for LWSN.

STAN: 122 offices; manual control + occupancy control; 15-min interval. Collected data: as for LWSN, plus clothing level.

HAAS: 48 offices; manual control + occupancy control; 15-min interval. Collected data: as for LWSN.
2.5 Building energy simulation with RL model

The purpose of constructing the RL occupant behavior model was to evaluate the impact of occupant behavior on building energy performance. Therefore, we also implemented the RL occupant behavior model in EnergyPlus. We utilized SketchUp to construct the building geometry model in Fig. 6, and then used the model in the EnergyPlus simulations. Table 2 lists the structural and material properties used for the building envelope in the simulations. The structural information was obtained from the HLAB building construction drawings and documents.
Fig. 6. Geometric model of the HLAB building for EnergyPlus simulations
Table 2. Structural and material properties of the HLAB building for the simulations. Layers are listed from exterior to interior; values in parentheses are thickness (mm), conductivity (W/m·K), density (kg/m3) and specific heat (J/kg·K).

Exterior window: clear float glass (6, 0.99, 2528, 880); air cavity (13, 0.026, 1.225, 1010); clear float glass (6, 0.99, 2528, 880).

Exterior wall 1: brick (92.1, 0.89, 1920, 790); air cavity (60.3, 0.026, 1.225, 1010); rigid insulation (50.8, 0.03, 43, 1210); exterior sheathing (12.7, 0.07, 400, 1300); CFMF stud (152.4, 0.062, 57.26, 964); gypsum board (15.9, 0.16, 800, 1090).

Exterior wall 2: aluminum panel (50.8, 45.28, 7824, 500); rigid insulation (50.8, 0.03, 43, 1210); exterior sheathing (12.7, 0.07, 400, 1300); CFMF stud (152.4, 0.062, 57.26, 964); gypsum board (15.9, 0.16, 800, 1090).

Interior gypsum wall: gypsum board (15.9, 0.16, 800, 1090); metal stud (92.1, 0.06, 118, 1048); gypsum board (15.9, 0.16, 800, 1090).

Interior glass wall/door: glass (6, 0.99, 2528, 880).

Interior wood door: wood (44.45, 0.15, 608, 1630).
Fig. 7 depicts the simulation process with the RL occupant behavior model. When the simulation starts, the program first checks whether or not the office is occupied, since the behavior occurs only when there is an occupant inside the office. If so, the agent decides on the action for the next time step based on the Q-learning table. Next, the energy simulation program decides whether or not to adjust the thermostat set point or the clothing level of the occupants. The building energy use will correspond to this decision. Moving to the next time step, the program checks whether or not the simulation time has ended; if not, it again checks if the room is occupied. To obtain a reasonable variation range, we performed the simulation 200 times and analyzed the results [74].
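The per-step logic in Fig. 7 can be sketched in MATLAB as follows; the helper functions (isOccupied, stateIndex, applyRandomness, applyAction) are illustrative stand-ins for the interface to the EnergyPlus co-simulation, not an actual API:

```matlab
% Sketch of the occupant decision loop in Fig. 7, one 15-minute step at
% a time, using the trained Q-learning table Q.
for step = 1:nSteps
    if isOccupied(step)            % behavior occurs only when occupied
        s = stateIndex(step);      % discretize the current indoor state
        [~, a] = max(Q(s, :));     % reward-maximizing action from the table
        a = applyRandomness(a);    % perturb the adjustment by -2 F to +2 F
        applyAction(a, step);      % set point / clothing adjustment
    end
    % EnergyPlus then advances the building model to the next time step
end
```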
Fig. 7. Building energy simulation process incorporating the RL occupant behavior model and Q-learning table of actions
3 Results

3.1 Results of modelling the reward for action
Fig. 8 shows the result of reward modelling when the PMV model and the thermal comfort ANN model were used with Eqs. (3) and (5). The figure depicts the relationship between occupant behavior and the corresponding rewards at various air temperatures when the other parameters were the same. For example, when the air temperature was 19.4 °C (67 °F), the occupant might feel cool in winter. Thus, the reward for raising the thermostat set point was positive most of the time, until the occurrence of overheating caused by an excessive adjustment. For each state, there was one occupant behavior of set point adjustment that led to the maximum reward. The reward situation was similar when the air temperature was high and the occupant lowered the set point. When the air temperature was about 22.8 °C (73 °F), the occupant already felt nearly neutral. In this case, either raising or lowering the set point would lead to a negative reward, and the optimal occupant behavior was to make no adjustment. We used this quantified logic to build the RL model.
Fig. 8. Reward value modelled for different air temperatures in winter by using (a) the PMV model and (b) the thermal comfort ANN model.
3.2 Results of the RL occupant behavior model
Fig. 9 depicts the training process for the RL model with the use of Q-learning. The blue, orange, and yellow curves represent the episode reward, the average reward in nearby episodes, and the quality, respectively. Initially, at the beginning of the training process, the RL model knew nothing about the relationship between the environment, states and actions. Thus, it could only take random actions to explore the relationship, and it received varying rewards. As a result, the episode reward was very low. As the learning process went on, the RL model tried various actions to find a way of maximizing the reward. The quality was updated with the use of Eq. (9). In the examples shown in Fig. 9, the thermostat set point and air temperature were 22.8 °C (73 °F), and the occupant was wearing summer clothing. After training over 300 episodes, the RL model learned to take the action at this state that maximized the reward at 0.61. Fig. 9 also shows that an overly high learning rate made the learning process very unstable, and the quality fluctuated during the training. Meanwhile, a low learning rate would slow down the training process.
Fig. 9. Training of the RL model with the use of Q-learning as the number of episodes increases. The blue, orange, and yellow curves represent the episode reward, the average reward in nearby episodes, and the quality, respectively. (a) learning rate = 0.1; (b) learning rate = 0.3; (c) learning rate = 0.5; (d) learning rate = 0.7.
The trained RL model would always predict the same occupant behavior in the same state and environment, which was unrealistic. Actual office occupant behavior is influenced by many other factors that we did not build into the RL model [24, 28]. Considering all these factors would have led to an overly complex behavior model. A previous study [11] pointed out that behavior models should not only represent deterministic events but also be described by stochastic laws. Additionally, different thermal preferences on the part of occupants would also cause their behavior to differ. Fig. 10 displays the distribution of collected thermostat set point adjustment behavior at different air temperatures in the HLAB offices. In the box-and-whisker charts, the boxes, whiskers and dots represent the standard deviation, upper and lower bounds, and outliers of the occupant behavior, respectively. The air temperature and occupant behavior had a clear negative correlation. The figure indicates that even at the same air temperature and in similar states, the variation range of collected occupant behavior was over ±1.1 °C (2 °F) in both single- and multi-occupant offices in different seasons. Under these conditions, the rewards of different actions did not differ greatly, but the RL model always pursued the action that absolutely maximized the reward. For example, the RL model might predict the occupant behavior of raising the set point by 5 °F, while raising it by 4 °F or 6 °F would also be reasonable behavior in a real scenario. Therefore, based on the results in Fig. 10, we added a randomness of −2 °F to +2 °F to the RL model's final decision to make it more reasonable.
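A minimal sketch of this final-decision step, assuming deltaOptimal holds the set point change (in °F) that maximizes the reward according to the Q-learning table; the variable names are illustrative:

```matlab
% Add a uniform random offset on [-2, +2] F to the reward-maximizing
% set point adjustment, reproducing the observed spread in Fig. 10.
deltaFinal = deltaOptimal + (4 * rand - 2);
```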
Fig. 10. The distribution of thermostat set point adjustment by occupants in: (a) single-occupant offices, (b) multi-occupant offices, (c) winter with Clo = 1, and (d) summer with Clo = 0.57.
3.3 Validation of the RL model
We validated the RL model with the use of data collected in 2018, after adding the randomness to the final decision. Fig. 11 compares the collected occupant behavior with the RL model prediction for the HLAB offices in the four seasons of 2018. Most of the time, the RL prediction results matched the collected data. Table 3 lists all the prediction results for R² and MAE. The R² was around 0.7-0.8, and the MAE was around 1.5-1.9 °F. The overall R² and MAE were 0.79 and 1.68 °F, respectively. We removed some data as outliers when the HVAC system was under maintenance and the occupants lost control. We also compared the performance of the RL model for single- and multi-occupant offices. For single-occupant offices, the R² was 0.8 and the MAE was 1.5 °F. For multi-occupant offices, the R² was 0.78 and the MAE was 1.8 °F. The prediction results for multi-occupant offices were not as good as those for single-occupant offices. In previous studies, a prediction R² of 0.8 was deemed acceptable for an occupant behavior model [74]. Hence, the performance of the RL model was reasonable.
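For reference, the two validation metrics can be computed as follows, with yObs the collected set point adjustments and yPred the RL model predictions (both in °F); the variable names are illustrative:

```matlab
% Mean absolute error and coefficient of determination (R^2).
mae = mean(abs(yPred - yObs));
r2  = 1 - sum((yObs - yPred).^2) / sum((yObs - mean(yObs)).^2);
```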
Fig. 11. Comparison of collected data on the occupant behavior of adjusting the thermostat set point and the RL model prediction for HLAB offices in 2018: (a) winter, (b) spring, (c) summer, and (d) fall.
Table 3. Prediction performance of the RL model for the HLAB offices

Winter 2018: R² = 0.75, MAE = 1.6 °F
Spring 2018: R² = 0.79, MAE = 1.9 °F
Summer 2018: R² = 0.79, MAE = 1.5 °F
Fall 2018: R² = 0.81, MAE = 1.7 °F
Overall: R² = 0.79, MAE = 1.68 °F
3.4 Results of the transfer learning model
After validating the RL model for the HLAB offices, we used the transfer learning model to predict occupant behavior in four other office buildings on the Purdue University campus. Fig. 12 shows the collected occupant behavior data and the RL model prediction in three seasons. The overall R² was 0.7, and the MAE was 1.7 °F. The results were not as good as the model validation results for the same building, presented in Section 3.3, but transfer learning was a feasible method for predicting occupant behavior in different buildings without data.
Fig. 12. Comparison between collected behavior data and behavior predicted by the RL model in four other Purdue University office buildings in 2018 in (a) summer, (b) fall, and (c) winter.
We also used the reward defined in Eqs. (10)-(12) to retrain the RL model for residential buildings. Table 4 shows the prediction performance of the transfer learning model. In the residential buildings, the R² was between 0.6 and 0.7 in the four seasons, and the MAE varied from 2.1 °F to 2.9 °F. The results were worse than for the transfer learning in the other four office buildings. The reason was that cross-type prediction was more difficult than cross-building prediction. In the residential buildings, there were many factors that impacted the occupant behavior differently than in the office buildings [71, 72] but were not considered in the current RL model. One feasible way to further improve the transfer learning model would be to introduce more impact factors in the state space, in addition to re-modeling the reward function. Furthermore, the quality and quantity of the collected data in the residential buildings were not as good as in the office buildings because we used questionnaire surveys in the former. Recording accurate occupant behavior data with corresponding environmental parameters and incorporating the impact factors are directions for improvement in further studies of residential buildings.
Table 4. Prediction performance of the transfer learning model from the HLAB building to residential buildings

Winter: R² = 0.67, MAE = 2.1 °F
Spring: R² = 0.61, MAE = 2.9 °F
Summer: R² = 0.69, MAE = 2.3 °F
Fall: R² = 0.67, MAE = 2.7 °F
3.5 Energy analysis with the RL occupant behavior model
After using the transfer learning model to predict occupant behavior in different buildings, we compared the collected heating and cooling energy use data with the simulation using the RL model in the HLAB building, for two days in winter. In Fig. 13, the box-and-whisker charts represent the simulation results with the use of the RL model and the ANN model. The black curve represents the measured data. For most of the time, the measured energy fluctuated within the lower and upper bounds predicted by the RL model. However, the variation range predicted by the RL model was narrower than that predicted by the ANN model. Table 5 lists the average heating and cooling loads and standard deviations for the different seasons in one year. The reason for the difference between the models was that the logic of the RL model was to improve the thermal comfort level of occupants. Therefore, the predicted occupant behavior was mostly reasonable. The model could not simulate illogical and extreme behavior such as adjusting the thermostat set point to the highest or lowest value for quick heating or cooling [74]. Such behavior can waste a lot of energy.
Fig. 13. Comparison of the collected heating and cooling energy use data and the simulation of manual thermostat control with the RL model in the HLAB building for two days in winter.
Table 5. Comparison of measured data with the heating and cooling loads (kWh) simulated by the ANN and RL models in four seasons

Heating load (winter / spring / summer / fall):
Measurement: 3396 / 2833 / 2102 / 3183
Simulation using ANN model: 3526±108 / 2925±110 / 2275±35 / 3298±68
Simulation using RL model: 3084±67 / 2948±41 / 2239±27 / 3067±24

Cooling load (winter / spring / summer / fall):
Measurement: 857 / 2261 / 2725 / 1205
Simulation using ANN model: 902±170 / 2006±115 / 2597±42 / 1136±90
Simulation using RL model: 863±72 / 1812±56 / 2570±30 / 974±30
We also used the transfer learning RL model to predict the energy use with thermostat setback and occupancy control. Fig. 14 shows all the energy simulation results in summer. The measurement and the simulation using actual behavior exhibited little divergence. Thermostat setback and occupancy control could reduce energy use by about 30% and 70%, respectively. The average energy simulation results using the RL model were almost the same as with the ANN model, but the variation was less with the former model; this finding was similar to the results in Table 5. Hence, it is feasible to use the transfer learning RL model to predict the energy use in other buildings with various HVAC control systems.
Fig. 14. Comparison of the measured heating and cooling loads and the results simulated by different models with thermostat setback and occupancy control in summer.
4 Discussion
In this study, we built an RL model to predict comfort-related occupant behavior in office buildings and validated the model with collected data. We also used transfer learning for cross-building occupant behavior modelling. Although various impact factors were modelled in the state space, including indoor air temperature and relative humidity, room occupancy and time, we neglected factors such as gender [75], cultural background [76], and age [4]. To improve the model’s performance and widen its applicability, we need to determine the quantitative relationship between these factors and the occupant behavior for reward modelling in future studies. In the MDP, the time step size for occupant behavior prediction was 15 minutes. Thus, the impact of occupant behavior on the HVAC system and indoor environment was not immediate; rather, it was somewhat delayed. We assumed that the action could take effect in the subsequent time step if the HVAC system was in normal operation. Actually, based on the collected data and observation [17], after taking an action the occupants tended to wait for a while, being aware of the HVAC response time. Even though the neutral TSV had not been reached, no occupant behavior occurred during this waiting time. If an occupant waited for a long time, such as 3-4 time steps, and still did not feel neutral, then there may have been issues with the HVAC control system or air handling units. In this case, the occupant behavior would be very complicated and personalized, including complaining and making another adjustment, this time to an extremely high or low set point. To improve the learning process and model performance, possible rewards could account for abnormal HVAC operations with longer response times and more time steps. Improving the modelling of thermal comfort and energy efficiency behavior is a potential direction for our future research.
In this study, we assumed that the occupant behavior and TSV decisions were based on the current indoor environment. This assumption was similar to those in the widely recognized PMV thermal comfort model. According to the adaptive thermal comfort model, the outdoor climate and past thermal history may influence occupants’ thermal preference and behavior. This could explain some of the prediction discrepancy exhibited by the current RL occupant behavior model, which was a limitation of the current study. Furthermore, the adaptive thermal comfort model has usually been applied to naturally ventilated rooms. In this study, the buildings were all mechanically ventilated. If we assumed adaptive thermal comfort and considered the outdoor climate and past thermal history, we could still build the MDP and introduce these factors in the state and reward. In this case, the model would be more complex. Applying the adaptive thermal comfort theory and using historical states in the RL model to improve the prediction results is a future research direction. In the present study, we defined the reward as the difference between initial and final TSV, as shown in Eqs. (5)-(7). Such a definition was result-oriented and path-independent, because the intermediate terms cancel when there are many adjustment behaviors. Thus, the occupants could find the set point that maximized the cumulative reward in different ways, which increased the variation in occupant behavior. However, this study considered only comfort-related occupant behavior and not energy-related behavior in offices. This was because the cost of maintaining a comfortable environment in an office is typically not on the minds of occupants [17]. For simulation of energy-saving occupant behavior in other kinds of buildings, the RL model would also require energy parameters for the state space and reward modelling, such as heating and cooling rates and air change rate [77]. Finally, the RL model and transfer learning in this study exhibited good generalization capability and scalability. These models also have potential for other kinds of occupant behavior, such as interactions with windows [24], shades [19], lighting [78] and other indoor appliances.
With the RL model, we tried to model and predict the occupant behavior without collecting data, but rather by building a policy-based MDP. We also used transfer learning to obtain the occupant behavior in other office buildings and in residential buildings with different HVAC systems and very limited information. This cross-building occupant behavior transfer was extremely difficult with the data-driven models. Therefore, the generalization capability of the RL and transfer learning models was better than that of the regression models. Meanwhile, the better generalization capability of the RL model may indicate a lesser ability to make predictions for specific buildings. As a result, the prediction accuracy of the RL model may not be as good as that of the data-driven models.
5 Conclusion

This study built and validated an RL occupant behavior model for an office building and transferred it to other buildings with thermostat setback and occupancy control. We also compared the energy use simulated by the RL model with measured data and predictions by the ANN model for the HLAB offices and four other office buildings on the Purdue University campus. This investigation led to the following conclusions:

1. The policy-based RL occupant behavior model trained by Q-learning was able to learn the logic of occupant behavior and predict the behavior accurately. The results for prediction of set point adjustment exhibited an R² around 0.8 and an MAE less than 2 °F.

2. Transfer learning successfully transferred the logic and part of the occupant behavior model structure to other buildings with different HVAC control systems, such as thermostat setback and occupancy control. We also transferred the RL model from office buildings to residential buildings with a modification to the impact of air temperature on occupant behavior. The prediction performance was good, with R² above 0.6 and MAE less than 3 °F. These transfer learning models did not require data collection. Unlike data-driven models, the transfer learning RL model had physical meaning and strong generalization capability.

3. The results of energy simulation for manual thermostat control, setback and occupancy control with the use of the RL model were similar to the results with the ANN model. The RL simulation accurately reflected the impact of occupant behavior on building energy use, but the variation predicted by the RL model was less than that predicted by the ANN model.
Acknowledgments

The authors would like to thank Dr. Orkan Kurtulus of the Center for High Performance Buildings at Purdue University for his assistance in setting up the building automation system in the HLAB building. We would also like to thank all the occupants of the HLAB offices for their participation and assistance in obtaining the data reported in this study, and Blaine Miller and Chris Sorenson in the Utility Plant Office of Purdue University for providing data for the four Purdue buildings. The data collection in this study was approved by Purdue University Institutional Review Board Protocol #1704019079.
Conflict of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
References

[1] US Department of Energy. Building energy data. (2011).
[2] De Wilde, Pieter. "The gap between predicted and measured energy performance of buildings: A framework for investigation." Automation in Construction 41 (2014): 40-49. https://doi.org/10.1016/j.autcon.2014.02.009
[3] Zou, Patrick XW, Xiaoxiao Xu, Jay Sanjayan, and Jiayuan Wang. "Review of 10 years research on building energy performance gap: Life-cycle and stakeholder perspectives." Energy and Buildings 178 (2018): 165-181. https://doi.org/10.1016/j.enbuild.2018.08.040
[4] Zhang, Yan, Xuemei Bai, Franklin P. Mills, and John CV Pezzey. "Rethinking the role of occupant behavior in building energy performance: A review." Energy and Buildings 172 (2018): 279-294. https://doi.org/10.1016/j.enbuild.2018.05.017
[5] D'Oca, Simona, Tianzhen Hong, and Jared Langevin. "The human dimensions of energy use in buildings: A review." Renewable and Sustainable Energy Reviews 81 (2018): 731-742. https://doi.org/10.1016/j.rser.2017.08.019
[6] Sun, Kaiyu, and Tianzhen Hong. "A framework for quantifying the impact of occupant behavior on energy savings of energy conservation measures." Energy and Buildings 146 (2017): 383-396. https://doi.org/10.1016/j.enbuild.2017.04.065
[7] Hong, Tianzhen, Sarah C. Taylor-Lange, Simona D'Oca, Da Yan, and Stefano P. Corgnati. "Advances in research and applications of energy-related occupant behavior in buildings." Energy and Buildings 116 (2016): 694-702. https://doi.org/10.1016/j.enbuild.2015.11.052
[8] Paone, Antonio, and Jean-Philippe Bacher. "The impact of building occupant behavior on energy efficiency and methods to influence it: A review of the state of the art." Energies 11, no. 4 (2018): 953. https://doi.org/10.3390/en11040953
[9] Yan, Da, William O'Brien, Tianzhen Hong, Xiaohang Feng, H. Burak Gunay, Farhang Tahmasebi, and Ardeshir Mahdavi. "Occupant behavior modeling for building performance simulation: Current state and future challenges." Energy and Buildings 107 (2015): 264-278. https://doi.org/10.1016/j.enbuild.2015.08.032
[10] Hong, Tianzhen, Jared Langevin, and Kaiyu Sun. "Building simulation: Ten challenges." Building Simulation 11, no. 5 (2018): 871-898. Tsinghua University Press. https://doi.org/10.1007/s12273-018-0444-x
[11] Hong, Tianzhen, Da Yan, Simona D'Oca, and Chien-fei Chen. "Ten questions concerning occupant behavior in buildings: The big picture." Building and Environment 114 (2017): 518-530. https://doi.org/10.1016/j.buildenv.2016.12.006
[12] Andersen, R.V., B.W. Olesen, and J. Toftum. "Modelling occupants' heating set-point preferences." In: Building Simulation Conference, 2011, pp. 1416.
[13] Fabi, Valentina, Rune Vinther Andersen, and Stefano Paolo Corgnati. "Influence of occupant's heating set-point preferences on indoor environmental quality and heating demand in residential buildings." HVAC&R Research 19, no. 5 (2013): 635-645. https://doi.org/10.1080/10789669.2013.789372
[14] Langevin, Jared, Jin Wen, and Patrick L. Gurian. "Simulating the human-building interaction: Development and validation of an agent-based model of office occupant behaviors." Building and Environment 88 (2015): 27-45. https://doi.org/10.1016/j.buildenv.2014.11.037
[15] Pfafferott, J., and S. Herkel. "Statistical simulation of user behaviour in low-energy office buildings." Solar Energy 81, no. 5 (2007): 676-682. https://doi.org/10.1016/j.buildenv.2014.11.037
[16] Sun, Kaiyu, and Tianzhen Hong. "A simulation approach to estimate energy savings potential of occupant behavior measures." Energy and Buildings 136 (2017): 43-62. https://doi.org/10.1016/j.enbuild.2016.12.010
[17] Deng, Zhipeng, and Qingyan Chen. "Artificial neural network models using thermal sensations and occupants' behavior for predicting thermal comfort." Energy and Buildings 174 (2018): 587-602. https://doi.org/10.1016/j.enbuild.2018.06.060
[18] Wang, Zhe, and Tianzhen Hong. "Reinforcement learning for building controls: The opportunities and challenges." Applied Energy 269 (2020): 115036. https://doi.org/10.1016/j.apenergy.2020.115036
[19] O'Brien, William, and H. Burak Gunay. "The contextual factors contributing to occupants' adaptive comfort behaviors in offices – A review and proposed modeling framework." Building and Environment 77 (2014): 77-87. https://doi.org/10.1016/j.buildenv.2014.03.024
[20] Stazi, Francesca, Federica Naspi, and Marco D'Orazio. "A literature review on driving factors and contextual events influencing occupants' behaviours in buildings." Building and Environment 118 (2017): 40-66. https://doi.org/10.1016/j.buildenv.2017.03.021
[21] Hong, Tianzhen, Simona D'Oca, William JN Turner, and Sarah C. Taylor-Lange. "An ontology to represent energy-related occupant behavior in buildings. Part I: Introduction to the DNAs framework." Building and Environment 92 (2015): 764-777. https://doi.org/10.1016/j.buildenv.2015.02.019
[22] O'Brien, William, Isabella Gaetani, Sara Gilani, Salvatore Carlucci, Pieter-Jan Hoes, and Jan Hensen. "International survey on current occupant modelling approaches in building performance simulation." Journal of Building Performance Simulation 10, no. 5-6 (2017): 653-671. https://doi.org/10.1080/19401493.2016.1243731
[23] Jia, Mengda, Ravi S. Srinivasan, and Adeeba A. Raheem. "From occupancy to occupant behavior: An analytical survey of data acquisition technologies, modeling methodologies and simulation coupling mechanisms for building energy efficiency." Renewable and Sustainable Energy Reviews 68 (2017): 525-540. https://doi.org/10.1016/j.rser.2016.10.011
[24] Fabi, Valentina, Rune Vinther Andersen, Stefano Corgnati, and Bjarne W. Olesen. "Occupants' window opening behaviour: A literature review of factors influencing occupant behaviour and models." Building and Environment 58 (2012): 188-198. https://doi.org/10.1016/j.buildenv.2012.07.009
[25] Tijani, Khadija, Stephane Ploix, Benjamin Haas, Julie Dugdale, and Quoc Dung Ngo. "Dynamic Bayesian Networks to simulate occupant behaviours in office buildings related to indoor air quality." arXiv preprint arXiv:1605.05966 (2016). https://arxiv.org/ftp/arxiv/papers/1605/1605.05966.pdf
[26] D'Oca, Simona, Stefano Corgnati, and Tianzhen Hong. "Data mining of occupant behavior in office buildings." Energy Procedia 78 (2015): 585-590. https://doi.org/10.1016/j.egypro.2015.11.022
[27] Zhou, Xin, Tiance Liu, Da Yan, Xing Shi, and Xing Jin. "An action-based Markov chain modeling approach for predicting the window operating behavior in office spaces." Building Simulation (2020): 1-15. Tsinghua University Press. https://doi.org/10.1007/s12273-020-0647-9
[28] Andrews, Clinton J., Daniel Yi, Uta Krogmann, Jennifer A. Senick, and Richard E. Wener. "Designing buildings for real occupants: An agent-based approach." IEEE Transactions on Systems, Man, and Cybernetics – Part A: Systems and Humans 41, no. 6 (2011): 1077-1091. https://doi.org/10.1109/TSMCA.2011.2116116
[29] Reinhart, Christoph F. "Lightswitch-2002: A model for manual and automated control of electric lighting and blinds." Solar Energy 77, no. 1 (2004): 15-28. https://doi.org/10.1016/j.solener.2004.04.003
[30] Ryu, Seung Ho, and Hyeun Jun Moon. "Development of an occupancy prediction model using indoor environmental data based on machine learning techniques." Building and Environment 107 (2016): 1-9. https://doi.org/10.1016/j.buildenv.2016.06.039
[31] Zhou, Hao, Lifeng Qiao, Yi Jiang, Hejiang Sun, and Qingyan Chen. "Recognition of air-conditioner operation from indoor air temperature and relative humidity by a data mining approach." Energy and Buildings 111 (2016): 233-241. https://doi.org/10.1016/j.enbuild.2015.11.034
[32] Papadopoulos, Sokratis, and Elie Azar. "Integrating building performance simulation in agent-based modeling using regression surrogate models: A novel human-in-the-loop energy modeling approach." Energy and Buildings 128 (2016): 214-223. https://doi.org/10.1016/j.enbuild.2016.06.079
[33] Azar, Elie, and Carol C. Menassa. "Agent-based modeling of occupants and their impact on energy use in commercial buildings." Journal of Computing in Civil Engineering 26, no. 4 (2012): 506-518. https://doi.org/10.1061/(ASCE)CP.1943-5487.0000158
[34] Lee, Yoon Soo, and Ali M. Malkawi. "Simulating multiple occupant behaviors in buildings: An agent-based modeling approach." Energy and Buildings 69 (2014): 407-416. https://doi.org/10.1016/j.enbuild.2013.11.020
[35] Sutton, Richard S., and Andrew G. Barto. Introduction to Reinforcement Learning (Vol. 135). Cambridge: MIT Press, 1998.
[36] Lillicrap, Timothy P., Jonathan J. Hunt, Alexander Pritzel, Nicolas Heess, Tom Erez, Yuval Tassa, David Silver, and Daan Wierstra. "Continuous control with deep reinforcement learning." arXiv preprint arXiv:1509.02971 (2015). https://arxiv.org/pdf/1509.02971.pdf
[37] Silver, David, Julian Schrittwieser, Karen Simonyan, Ioannis Antonoglou, Aja Huang, Arthur Guez, Thomas Hubert et al. "Mastering the game of Go without human knowledge." Nature 550, no. 7676 (2017): 354-359. https://doi.org/10.1038/nature24270
[38] Zhang, Zhiang, Adrian Chong, Yuqi Pan, Chenlu Zhang, and Khee Poh Lam. "Whole building energy model for HVAC optimal control: A practical framework based on deep reinforcement learning." Energy and Buildings 199 (2019): 472-490. https://doi.org/10.1016/j.enbuild.2019.07.029
[39] Kazmi, Hussain, Johan Suykens, Attila Balint, and Johan Driesen. "Multi-agent reinforcement learning for modeling and control of thermostatically controlled loads." Applied Energy 238 (2019): 1022-1035. https://doi.org/10.1016/j.apenergy.2019.01.140
[40] Yu, Liang, Weiwei Xie, Di Xie, Yulong Zou, Dengyin Zhang, Zhixin Sun, Linghua Zhang, Yue Zhang, and Tao Jiang. "Deep reinforcement learning for smart home energy management." IEEE Internet of Things Journal 7, no. 4 (2019): 2751-2762. https://doi.org/10.1109/JIOT.2019.2957289
[41] Han, Mengjie, Ross May, Xingxing Zhang, Xinru Wang, Song Pan, Da Yan, and Yuan Jin. "A novel reinforcement learning method for improving occupant comfort via window opening and closing." Sustainable Cities and Society (2020): 102247. https://doi.org/10.1016/j.scs.2020.102247
[42] Han, Mengjie, Ross May, Xingxing Zhang, Xinru Wang, Song Pan, Da Yan, Yuan Jin, and Liguo Xu. "A review of reinforcement learning methodologies for controlling occupant comfort in buildings." Sustainable Cities and Society 51 (2019): 101748. https://doi.org/10.1016/j.scs.2019.101748
[43] Yoon, Young Ran, and Hyeun Jun Moon. "Performance based thermal comfort control (PTCC) using deep reinforcement learning for space cooling." Energy and Buildings 203 (2019): 109420. https://doi.org/10.1016/j.enbuild.2019.109420
[44] Ruelens, Frederik, Sandro Iacovella, Bert J. Claessens, and Ronnie Belmans. "Learning agent for a heat-pump thermostat with a set-back strategy using model-free reinforcement learning." Energies 8, no. 8 (2015): 8300-8318. https://doi.org/10.3390/en8088300
[45] Azuatalam, Donald, Wee-Lih Lee, Frits de Nijs, and Ariel Liebman. "Reinforcement learning for whole-building HVAC control and demand response." Energy and AI 2 (2020): 100020. https://doi.org/10.1016/j.egyai.2020.100020
[46] Chen, Bingqing, Zicheng Cai, and Mario Bergés. "Gnu-RL: A precocial reinforcement learning solution for building HVAC control using a differentiable MPC policy." In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 316-325. 2019. https://doi.org/10.1145/3360322.3360849
[47] Ding, Xianzhong, Wan Du, and Alberto E. Cerpa. "MB2C: Model-based deep reinforcement learning for multi-zone building control." In Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 50-59. 2020. https://doi.org/10.1145/3408308.3427986
[48] Jia, Ruoxi, Ming Jin, Kaiyu Sun, Tianzhen Hong, and Costas Spanos. "Advanced building control via deep reinforcement learning." Energy Procedia 158 (2019): 6158-6163. https://doi.org/10.1016/j.egypro.2019.01.494
[49] Chen, Yujiao, Leslie K. Norford, Holly W. Samuelson, and Ali Malkawi. "Optimal control of HVAC and window systems for natural ventilation through reinforcement learning." Energy and Buildings 169 (2018): 195-205. https://doi.org/10.1016/j.enbuild.2018.03.051
[50] Park, June Young, Thomas Dougherty, Hagen Fritz, and Zoltan Nagy. "LightLearn: An adaptive and occupant centered controller for lighting based on reinforcement learning." Building and Environment 147 (2019): 397-414. https://doi.org/10.1016/j.buildenv.2018.10.028
[51] Valladares, William, Marco Galindo, Jorge Gutiérrez, Wu-Chieh Wu, Kuo-Kai Liao, Jen-Chung Liao, Kuang-Chin Lu, and Chi-Chuan Wang. "Energy optimization associated with thermal comfort and indoor air control via a deep reinforcement learning algorithm." Building and Environment 155 (2019): 105-117. https://doi.org/10.1016/j.buildenv.2019.03.038
[52] Brandi, Silvio, Marco Savino Piscitelli, Marco Martellacci, and Alfonso Capozzoli. "Deep reinforcement learning to optimise indoor temperature control and heating energy consumption in buildings." Energy and Buildings (2020): 110225. https://doi.org/10.1016/j.enbuild.2020.110225
[53] Ding, Xianzhong, Wan Du, and Alberto Cerpa. "OCTOPUS: Deep reinforcement learning for holistic smart building control." In Proceedings of the 6th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 326-335. 2019. https://doi.org/10.1145/3360322.3360857
[54] Li, Ao, Fu Xiao, Cheng Fan, and Maomao Hu. "Development of an ANN-based building energy model for information-poor buildings using transfer learning." Building Simulation (2020): 1-13. Tsinghua University Press. https://doi.org/10.1007/s12273-020-0711-5
[55] Mosaico, Gabriele, Matteo Saviozzi, Federico Silvestro, Andrea Bagnasco, and Andrea Vinci. "Simplified state space building energy model and transfer learning based occupancy estimation for HVAC optimal control." In 2019 IEEE 5th International Forum on Research and Technology for Society and Industry (RTSI), pp. 353-358. IEEE, 2019. https://doi.org/10.1109/RTSI.2019.8895544
[56] Ali, SM Murad, Juan Carlos Augusto, and David Windridge. "A survey of user-centred approaches for smart home transfer learning and new user home automation adaptation." Applied Artificial Intelligence 33, no. 8 (2019): 747-774. https://doi.org/10.1080/08839514.2019.1603784
[57] Alam, Mohammad Arif Ul, and Nirmalya Roy. "Unseen activity recognitions: A hierarchical active transfer learning approach." In 2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS), pp. 436-446. IEEE, 2017. https://doi.org/10.1109/ICDCS.2017.264
[58] Mocanu, Elena, Phuong H. Nguyen, Wil L. Kling, and Madeleine Gibescu. "Unsupervised energy prediction in a Smart Grid context using reinforcement cross-building transfer learning." Energy and Buildings 116 (2016): 646-655. https://doi.org/10.1016/j.enbuild.2016.01.030
[59] Ribeiro, Mauro, Katarina Grolinger, Hany F. ElYamany, Wilson A. Higashino, and Miriam AM Capretz. "Transfer learning with seasonal and trend adjustment for cross-building energy forecasting." Energy and Buildings 165 (2018): 352-363. https://doi.org/10.1016/j.enbuild.2018.01.034
[60] Gao, Nan, Wei Shao, Mohammad Saiedur Rahaman, Jun Zhai, Klaus David, and Flora D. Salim. "Transfer learning for thermal comfort prediction in multiple cities." arXiv preprint arXiv:2004.14382 (2020). https://arxiv.org/pdf/2004.14382.pdf
[61] Xu, Shichao, Yixuan Wang, Yanzhi Wang, Zheng O'Neill, and Qi Zhu. "One for many: Transfer learning for building HVAC control." In Proceedings of the 7th ACM International Conference on Systems for Energy-Efficient Buildings, Cities, and Transportation, pp. 230-239. 2020. https://doi.org/10.1145/3408308.3427617
[62] Deng, Zhipeng, and Qingyan Chen. "Development and validation of a smart HVAC control system for multi-occupant offices by using occupants' physiological signals from wristband." Energy and Buildings 214 (2020): 109872. https://doi.org/10.1016/j.enbuild.2020.109872
[63] ASHRAE. ASHRAE Handbook: Fundamentals. American Society of Heating, Refrigerating and Air-Conditioning Engineers (2017).
[64] Foerster, Jakob, Ioannis Alexandros Assael, Nando De Freitas, and Shimon Whiteson. "Learning to communicate with deep multi-agent reinforcement learning." In Advances in Neural Information Processing Systems, pp. 2137-2145. 2016.
[65] Klein, Laura, Jun-young Kwak, Geoffrey Kavulya, Farrokh Jazizadeh, Burcin Becerik-Gerber, Pradeep Varakantham, and Milind Tambe. "Coordinating occupant behavior for building energy and comfort management using multi-agent systems." Automation in Construction 22 (2012): 525-536. https://doi.org/10.1016/j.autcon.2011.11.012
[66] Melo, Francisco S. "Convergence of Q-learning: A simple proof." Institute of Systems and Robotics, Tech. Rep. (2001): 1-4.
[67] Yang, Lei, Zoltan Nagy, Philippe Goffin, and Arno Schlueter. "Reinforcement learning for optimal control of low exergy buildings." Applied Energy 156 (2015): 577-586. https://doi.org/10.1016/j.apenergy.2015.07.050
[68] Cheng, Zhijin, Qianchuan Zhao, Fulin Wang, Yi Jiang, Li Xia, and Jinlei Ding. "Satisfaction based Q-learning for integrated lighting and blind control." Energy and Buildings 127 (2016): 43-55. https://doi.org/10.1016/j.enbuild.2016.05.067
[69] MathWorks. Reinforcement Learning Toolbox documentation. https://www.mathworks.com/help/reinforcement-learning/
[70] Gunay, H. Burak, William O'Brien, and Ian Beausoleil-Morrison. "A critical review of observation studies, modeling, and simulation of adaptive occupant behaviors in offices." Building and Environment 70 (2013): 31-47. https://doi.org/10.1016/j.buildenv.2013.07.020
[71] Wei, Shen, Rory Jones, and Pieter De Wilde. "Driving factors for occupant-controlled space heating in residential buildings." Energy and Buildings 70 (2014): 36-44. https://doi.org/10.1016/j.enbuild.2013.11.001
[72] Yu, Zhun, Benjamin CM Fung, Fariborz Haghighat, Hiroshi Yoshino, and Edward Morofsky. "A systematic procedure to study the influence of occupant behavior on building energy consumption." Energy and Buildings 43, no. 6 (2011): 1409-1417. https://doi.org/10.1016/j.enbuild.2011.02.002
[73] ASHRAE. "Standard 55-2010, Thermal environmental conditions for human occupancy." American Society of Heating, Refrigerating and Air Conditioning Engineers (2010).
[74] Deng, Zhipeng, and Qingyan Chen. "Simulating the impact of occupant behavior on energy use of HVAC systems by implementing a behavioral artificial neural network model." Energy and Buildings 198 (2019): 216-227. https://doi.org/10.1016/j.enbuild.2019.06.015
[75] Karjalainen, Sami. "Gender differences in thermal comfort and use of thermostats in everyday thermal environments." Building and Environment 42, no. 4 (2007): 1594-1603. https://doi.org/10.1016/j.buildenv.2006.01.009
[76] Montazami, Azadeh, Mark Gaterell, Fergus Nicol, Mark Lumley, and Chryssa Thoua. "Impact of social background and behaviour on children's thermal comfort." Building and Environment 122 (2017): 422-434. https://doi.org/10.1016/j.buildenv.2017.06.002
[77] Ghahramani, Ali, Kanu Dutta, and Burcin Becerik-Gerber. "Energy trade off analysis of optimized daily temperature setpoints." Journal of Building Engineering 19 (2018): 584-591. https://doi.org/10.1016/j.jobe.2018.06.012
[78] Yan, Da, Xiaohang Feng, Yuan Jin, and Chuang Wang. "The evaluation of stochastic occupant behavior models from an application-oriented perspective: Using the lighting behavior model as a case study." Energy and Buildings 176 (2018): 151-162. https://doi.org/10.1016/j.enbuild.2018.07.037
Highlights

1. Reinforcement learning model for predicting occupant behavior in adjusting the thermostat set point and clothing level in an office building.
2. Transfer learning model for transferring occupant behavior from one building to another without data.
3. Transfer learning among buildings of the same type was better than among different types of buildings.
4. The variation range of energy use predicted by the reinforcement learning model was smaller than that predicted by the artificial neural network model.
[Figure text residue from PDF extraction. Recoverable content: a graphical-abstract flowchart linking the RL model design (Q-learning) and its validation in the HLAB building to transfer learning for other office and residential buildings and to energy simulation against collected energy data; an agent-environment diagram with the occupant as agent, the indoor environment as environment, occupant behavior as the action, and improved thermal comfort (Reward = TSV_t - TSV_{t+1}) as the reward; Q-learning state-action tables for thermostat manual control and for setback/occupancy control, with states such as air temperature, relative humidity, clothing level and metabolic rate, and actions ranging from set-point changes of -2 °C to +7 °C to adding clothes; and an energy simulation flowchart that, while the room is occupied, applies the RL Q-learning table of actions at each time step to adjust the set point or clothing level and then evaluates building energy use.]
Article
Building controls are becoming more important and complicated due to the dynamic and stochastic energy demand, on-site intermittent energy supply, as well as energy storage, making it difficult for them to be optimized by conventional control techniques. Reinforcement Learning (RL), as an emerging control technique, has attracted growing research interest and demonstrated its potential to enhance building performance while addressing some limitations of other advanced control techniques, such as model predictive control. This study conducted a comprehensive review of existing studies that applied RL for building controls. It provided a detailed breakdown of the existing RL studies that use a specific variation of each major component of the Reinforcement Learning: algorithm, state, action, reward, and environment. We found RL for building controls is still in the research stage with limited applications (11%) in real buildings. Three significant barriers prevent the adoption of RL controllers in actual building controls: (1) the training process is time consuming and data demanding, (2) the control security and robustness need to be enhanced, and (3) the generalization capabilities of RL controllers need to be improved using approaches such as transfer learning. Future research may focus on developing RL controllers that could be used in real buildings, addressing current RL challenges, such as accelerating training and enhancing control robustness, as well as developing an open-source testbed and dataset for performance benchmarking of RL controllers.