
Analyze building performance data for energy-efficient building operation

A. Ahmed, J. Ploennigs, Y. Gao & K. Menzel
IRUSE, University College Cork, Ireland

ABSTRACT: Modern buildings contain several sensors and meters to monitor the building performance. This data allows the building performance to be analysed to increase energy efficiency along with user comfort. This paper presents two approaches to analyse building performance data. One solution uses data warehouse techniques to create sophisticated energy consumption aggregations. A second approach implements data mining techniques to estimate the thermal comfort of occupants with a reduced number of sensors. This paper interprets the knowledge gained using, as an example, University College Cork's Environmental Research Institute building to demonstrate the feasibility of this approach.
1 INTRODUCTION
There is great interest in improving energy management in buildings, considering the increasing price of fuel and the global goal of reducing CO2 emissions.
Building Energy Management (BEM) aims at the ef-
fective and efficient usage of energy to maintain
high building performance operation (Capehart et al.
2008, p. 1). One of the current challenges in this
domain is to optimise energy consumption, while
considering occupant comfort (Metz 2007, p. 394).
Building performance analysis emphasizes the
measurement and assessment of various perform-
ance indicators covering the interests of owners, op-
erators, and occupants in aspects like energy, light-
ing, thermal comfort, and maintenance (Augenbroe
& Park 2005).
The continuous development of wired building automation systems and the current emergence of easy-to-integrate wireless solutions have increased the amount of available building performance data (Menzel et al. 2008) to evaluate these indicators.
Traditional database management systems (DBMS)
are nowadays used to store the building monitoring
data. These DBMS lack the ability to create data ag-
gregations and do not support the analysis of build-
ing performance data to deliver reports and action-
able information (Lane 2007, p. 29).
Modern approaches from computer science may
simplify the building performance analysis. Data Warehouses (DW) add data aggregation capabilities to databases to prepare and deliver reports for large data sets (Stackowiak et al. 2007). They also facilitate the use of modern analysis approaches such as Knowledge Discovery in Databases (KDD) and Data Mining (Han & Kamber 2006, p. 35) to discover
previously unknown characteristics, relationships,
dependencies, or trends in data (Rob et al. 2008, p.
744).
The paper introduces a system that incorporates
these two technologies to simplify the building per-
formance analysis. Data Warehouse technologies are used to aggregate building performance data and provide users with a fast and easy way to analyse it manually. This approach is demonstrated in Section 2 for the energy consumption of a real building.
Data mining approaches can be used to analyse
patterns in building performance data, but also to
train models (Section 3). This is demonstrated in Section 4 for the evaluation of thermal comfort, identifying rooms with low comfort using only room temperature sensors. The data mining process is introduced from the building data sources, through data preparation and transformation, to model building, testing, and scoring.
The paper uses real data from the Environmental
Research Institute (ERI 2002). The ERI is an en-
ergy-efficient building with many sustainable energy
features such as solar panels, geothermal heat pumps
and heat recovery systems. The ERI building is used
by multiple research groups from biology, chemis-
try, as well as engineering. It also serves as a
"Living Laboratory" to demonstrate smart building
concepts. The mixed usage with office and labora-
tory spaces and the modern sustainable energy fea-
tures define a wide set of requirements for the build-
ing operator to optimize energy usage while
maintaining steady occupant comfort.
2 DATA WAREHOUSE FOR ENERGY-
EFFICIENT BUILDING OPERATION
Data Warehouses (DW) structure data in pre-
specified materialised views that are defined by di-
mensions and stored in cubes to support data aggre-
gation.
For example, an operator wants to analyse the en-
ergy consumption of a building and needs to know
when the most energy is used (time), where it is used
(location), and by which tenant (organization). This
use case specifies the dimensions of the data warehouse: Time, Location, and Organization. These dimensions are used to structure and access the data in queries, for example: give me the aggregated energy consumption of "last year" (time) for the tenant "IRUSE" (organization) in the "ERI" (location). Such aggregation queries are predefined
in cubes that are spanned by dimensions and the re-
sults are pre-computed in the data warehouse, thus
allowing very fast access to such results. The multi-
dimensional data analysis concept and DW tech-
niques for building performance are further detailed
in Ahmed et al. (2009).
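To make the cube concept concrete, such a dimensional aggregation corresponds to a query over a star schema. The following is a minimal Java/JDBC sketch under assumed, hypothetical table and column names (fact_energy, dim_time, dim_location, dim_organization and their columns are illustrations, not the schema of the ERI data warehouse):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

/** Sketch of the example aggregation query (hypothetical star schema, not the ERI warehouse). */
public class EnergyCubeQuery {

    public static void main(String[] args) throws Exception {
        // Aggregated energy consumption of "last year" for tenant "IRUSE" in the "ERI".
        String sql =
            "SELECT t.year, SUM(f.energy_kwh) AS total_kwh "
          + "FROM fact_energy f "
          + "JOIN dim_time t ON f.time_id = t.time_id "
          + "JOIN dim_location l ON f.location_id = l.location_id "
          + "JOIN dim_organization o ON f.org_id = o.org_id "
          + "WHERE t.year = 2008 AND l.building = 'ERI' AND o.tenant = 'IRUSE' " // 2008 as a placeholder for "last year"
          + "GROUP BY t.year";

        try (Connection con = DriverManager.getConnection("jdbc:oracle:thin:@//host:1521/db", "user", "pwd");
             PreparedStatement stmt = con.prepareStatement(sql);
             ResultSet rs = stmt.executeQuery()) {
            while (rs.next()) {
                System.out.println(rs.getInt("year") + ": " + rs.getDouble("total_kwh") + " kWh");
            }
        }
    }
}

In the data warehouse itself such aggregates are pre-computed in cubes, so the operator's query does not have to scan the raw monitoring records at run time.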
Figure 1. GUI for the building operator
Figure 1 shows the GUI implemented for the ERI
DW. The three energy consumption data categories
that affect the operational costs are electricity (main
power board meter), natural gas (boiler and labora-
tory meters) and water (mains water meter). They
are selectable at the bottom of the GUI. This will be
extended to support the ERI's sustainable energy
systems to allow a comparison of the energy intake.
The operator uses the dimension categories to
specify the data shown in the graph on the top right.
The operator can select the energy consumption for
a whole building, a specific zone (rooms), a tenant
organization, or equipment. The calendar allows specifying the time dimension from years, to months, to single days. These dimensions enable the operator to easily analyse the building's energy consumption
from top level (several years per building), down to
the most detailed level (hourly per room). Due to the
pre-computed queries defined by cubes, the data
warehouse quickly responds with results if the op-
erator modifies a relevant query.
3 DATA MINING CONCEPTS AND APPLICA-
TIONS
Knowledge Discovery in Databases (KDD) and
Data Mining (DM) involve processes to extract or
mine knowledge from large amounts of data (Han &
Kamber 2006, p. 5), providing implicit useful
knowledge (Wang & Huang 2006) to address spe-
cific business problems.
Data Mining approaches can usually be catego-
rised into descriptive and predictive algorithms. De-
scriptive algorithms on the one hand are used for
exploratory data analysis to discover individual pat-
terns, such as associations or clusters. Predictive al-
gorithms on the other hand focus on the creation of
models that allow predicting observations from input
data like classifications, regression models or neural
networks.
Data mining has been used extensively in the
medical field to solve many problems, such as the
association of genes to genetically inherited diseases
(Perez-Iratxeta et al. 2002). In direct marketing, data mining is used to identify likely buyers and to advertise and promote products (Ling & Li 1998), as well as for product placement in shopping centres, where it identifies items that are likely to be purchased together. Data mining has proved successful in reducing the cost of doing business, improving profits, and increasing service quality (Apte et al. 2002). In addition, data mining supports the construction of customers' personal profiles from customer transactional data (Adomavicius & Tuzhilin 2002) by means of knowledge discovery in databases.
In buildings and energy fields, data mining ap-
proaches, like neural networks, are used in modern
building automation to identify usage scenarios
(Lang et al. 2007), or to estimate the energy con-
sumption in residential buildings (Mihalakakou et al.
2002), and in tropical regions (Dong et al. 2005). Data mining has also been used to characterise electric energy consumers (Figueiredo et al. 2005) and to analyse data collected from simulations (Morbitzer et al. 2004) or wireless sensor networks (Wu & Clements-Croome 2007).
Most of these studies focus on the energy con-
sumption of buildings, but few evaluate occupant re-
lated aspects of building performance like the ther-
mal comfort of occupants. One reason may be that thermal comfort is a complex measurement itself, depending, in the case of the Predicted Mean Vote (PMV), on the temperature, humidity, air velocity, occupants' clothing, etc. This requires complex sensor equipment for data gathering, which is not reasonable in all rooms. Data Mining can help to overcome such limitations with its predictive algorithms, as this paper demonstrates.
The objective is to analyse building performance
data and room thermal comfort to evaluate heating
and cooling systems' efficiency. The data used in this
research is the historical sensed data of the ERI. The
ERI has air temperature sensors in each of its 70
rooms, but possesses additional radiant temperature,
humidity, and CO2 sensors in only four rooms. To
evaluate the thermal comfort for all rooms the pre-
dictive models of data mining should be used as dis-
cussed in the next sections.
4 MINING THE BUILDING PERFORMANCE
DATA
Figure 2 shows the mining process of the sensor data
in the ERI building. This includes data acquisition
(gathering) and preparation (data access, data sam-
pling, and data transformation), model building and
evaluation (create model, test model, evaluate and
interpret model), and knowledge deployment (model apply) (Haberstroh 2008, pp. 9-12). All logical definitions and their physical implementation presented in this paper comply with the Oracle Corporation specifications for Oracle Data Miner (ODM)
11g version 1 (Oracle 2008).
Figure 2. The process of mining the ERI sensed data stream.
4.1 Problem definition in terms of Data Mining and
Energy Management
This section defines the problem from the energy
management perspective, then converts this knowl-
edge into a data mining problem definition and
shows the preliminary plan designed to solve it.
As mentioned in Section 1, energy management
is required to provide steady user comfort while re-
ducing energy consumption. Relevant stakeholders
need to evaluate HVAC system efficiency and user
comfort in order to accomplish this task, while keep-
ing the cost of this evaluation as low as possible.
We approach this problem by classifying rooms
based on their thermal comfort into hot, warm,
slightly warm, neutral, slightly cool, cool, and cold.
The classification is based on the Predicted Mean
Vote (PMV) as standardized in the ISO 7730 (2005).
A classification model is created based on the 4 rooms that have the necessary sensors available, as detailed in Section 4.2.3. This model is then applied to all 70
rooms using only air temperature sensors to predict
the comfort class.
4.2 Data acquisition and preparations
4.2.1 Data sources and volumes
Data processing includes cleansing, integration, and
transformation of the sensed data to assure high
quality (Atzmüller 2007, p. 174).
The data source for this research is a collection of data stores of the ERI building performance data, as mentioned in Section 2. The ERI building is a 4500 m² "Living Laboratory" located on the campus of University College Cork, Ireland. It is equipped with multiple types of solar panels, geothermal heat pumps, and an underfloor heating system. Building
Performance Data is provided by 180 wired sensors
of the Building Management System. Additionally, a
test bed for wireless sensors and actuators has been
installed since April 2008 in three phases. Demon-
strator 0 has been operational since June 2008. Table 1 shows the expected sensor data stream volume for the ERI building per year.
Table 1. Expected data volumes in the ERI.
Sensors        Sampling Period    Total records
180 wired      15 minutes         6,307,200
80 wireless    1 minute           42,048,000
Total volume                      48,355,200
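As a plausibility check, the record counts in Table 1 follow directly from the sensor counts and sampling periods; a minimal sketch of this arithmetic:

/** Sketch: expected records per year from sensor count and sampling period. */
public class DataVolume {

    static long recordsPerYear(int sensors, int samplingMinutes) {
        long samplesPerYear = (60L / samplingMinutes) * 24 * 365;
        return sensors * samplesPerYear;
    }

    public static void main(String[] args) {
        long wired = recordsPerYear(180, 15);   // 6,307,200
        long wireless = recordsPerYear(80, 1);  // 42,048,000
        System.out.println(wired + wireless);   // 48,355,200
    }
}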
Currently, there are 190 sensors installed and
working in the ERI building, with 13 different types
of measurements, including indoor environment and
outdoor weather conditions. These sensors are in-
stalled in 109 points in 94 rooms and spaces such as
stairs way, and corridors.
4.2.2 Data collection
Data extracted and retrieved from the building's
monitoring data sources is stored in a table with the
attributes as listed in Table 2. These attributes are
the predictors or the influences that are used to de-
tect the room comfort class.
The data for building and testing the model used
in Section 4.3 was collected for the period of
08/02/2007 to 24/04/2009 and contains 933,235 re-
cords for four rooms in the ERI building.
The data for scoring the model in Section 4.4
represents the period of 13/10/2008 to 01/02/2009
and contains 890,921 records for the air temperature
and outdoor conditions for all rooms in the building.
Table 2. The predictors.
#   Attribute Name           Description
1   MEASURE_ID               A unique id to identify a sensor measure
2   ROOM_ID                  A unique id to identify a room in the ERI
3   ROOM_NAME                A name to identify a room
4   ROOM_SIZE                The volume of a room
5   ROOM_FLOOR               Storey in which a room is located
6   TIME_ID                  The time stamp of a sensor reading
7   COMFORT_CLASS            The predicted comfort class of a room
8   ROOM_TEMPERATURE         Temperature measured in a room
9   OUT_TEMPERATURE          Outside temperature
10  OUT_HUMIDITY             Outside humidity
11  OUT_LIGHT                Outside light
12  OUT_TOTAL_RADIATION      Outside total solar radiation
13  OUT_DIFFUSE_RADIATION    Outside diffuse solar radiation
14  OUT_WIND_DIRECTION       Wind direction
15  OUT_WIND_SPEED           Wind speed
16  ROOM_RAD_TEMP*           Radiant temperature
17  ROOM_HUMIDITY*           Relative humidity
18  ROOM_CO2*                CO2 concentration
*Available for 4 rooms and used only for computing the comfort class
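For illustration, a single record with the predictors of Table 2 can be represented as a simple value class; in the sketch below the field types are assumptions, as the paper lists only attribute names and descriptions:

/** Sketch of one sensor record with the predictors of Table 2 (field types are assumptions). */
public class SensorRecord {
    long measureId;               // MEASURE_ID
    long roomId;                  // ROOM_ID
    String roomName;              // ROOM_NAME
    double roomSize;              // ROOM_SIZE (room volume)
    int roomFloor;                // ROOM_FLOOR
    java.sql.Timestamp timeId;    // TIME_ID
    String comfortClass;          // COMFORT_CLASS (prediction target)
    double roomTemperature;       // ROOM_TEMPERATURE
    double outTemperature;        // OUT_TEMPERATURE
    double outHumidity;           // OUT_HUMIDITY
    double outLight;              // OUT_LIGHT
    double outTotalRadiation;     // OUT_TOTAL_RADIATION
    double outDiffuseRadiation;   // OUT_DIFFUSE_RADIATION
    double outWindDirection;      // OUT_WIND_DIRECTION
    double outWindSpeed;          // OUT_WIND_SPEED
    Double roomRadTemp;           // ROOM_RAD_TEMP (4 rooms only)
    Double roomHumidity;          // ROOM_HUMIDITY (4 rooms only)
    Double roomCo2;               // ROOM_CO2 (4 rooms only)
}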
4.2.3 Data preparations and transformation
This section shows the activity of modifying the
values of some attributes and adding other values as
required to present the appropriate data set for min-
ing. There is no agreed-upon methodology for preparing data for mining, but preparation usually tries to identify and remove outliers, fill null values, and remove noise in the data to improve model quality.
First, outliers are detected and removed. It has been found that the air temperature sensor in one room in the scoring data is broken and delivers readings between -300°C and -200°C. Second, when the Building Management System is reset, it sets all measurements to zero by default. Both outlier sources were removed from the data, leaving 933,235
records for model building and 890,921 records for
scoring.
However, the biggest issue concerns approximately 90% of the records per measurement (rows 8-18 in Table 2), which are NULL in the database. The reason for this is that the timestamps of the sensors are not synchronized and each sensor fills only its own column. Thus, when the air temperature sensor adds a value to the ROOM_TEMPERATURE column, the other measurement columns (rows 9-18) are left empty. For data mining they need to be filled to allow the analysis of correlations.
This is done by linearly interpolating each col-
umn over the timestamp for each room. Let us as-
sume for example the air temperature sensor in room
G01 reads 20.0°C at 4:00pm and 15 minutes later
21.5°C. The relative humidity sensor adds its value
at 4:05pm to the database. For this timestamp the
temperature in G01 can be linearly interpolated to
20.5°C. This linear interpolation is implemented in Java for all continuous measurements in Table 2 for building and scoring the model.
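The interpolation step can be sketched as follows; this is a minimal illustration of the idea, not the authors' Java implementation, assuming readings are given as timestamp/value pairs:

/** Sketch of the linear interpolation used to fill empty measurement columns. */
public class LinearInterpolation {

    // Interpolate a value at time t between two readings (t0, v0) and (t1, v1).
    static double interpolate(long t, long t0, double v0, long t1, double v1) {
        if (t1 == t0) {
            return v0; // identical timestamps: avoid division by zero
        }
        return v0 + (v1 - v0) * (double) (t - t0) / (double) (t1 - t0);
    }

    public static void main(String[] args) {
        // Example from the text: 20.0°C at 4:00pm and 21.5°C at 4:15pm,
        // interpolated at 4:05pm (timestamps given in minutes for simplicity).
        System.out.println(interpolate(5, 0, 20.0, 15, 21.5)); // 20.5
    }
}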
As a last preparation step, the thermal comfort
class needs to be computed for the data used for
model building. The classification is based on the
PMV, which is defined in ISO 7730 and was implemented in Java. The PMV is not an undisputed thermal comfort measure (Nicol & Parsons 2002, Pfafferott et al. 2007), and other approaches try to create more general models (Yao et al. 2009). Nevertheless, the PMV was selected for this example as it shows the complexity of thermal comfort evaluation and is well established. Other thermal
comfort measures can be analyzed in the same way.
The PMV depends on the air temperature, radiant
temperature, relative humidity, air velocity, as well
as the occupant's clothing and activity level. Readings for the air temperature, radiant temperature, and relative humidity are available for four rooms in the database. To compute the PMV, we assume a constant air velocity of 0.1 m/s, which is a representative mean value for naturally ventilated offices (Moujalled et al. 2008). For the activity level we assume office work with 1.2 met. The clothing value is interpolated depending on the outside temperature between 1.0 m²K/W (indoor winter clothing at 0°C) and 0.5 m²K/W (summer clothing at 30°C).
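These assumptions can be collected in a small helper; the sketch below is illustrative, and the clamping of the clothing value outside the 0-30°C range is an assumption, as the paper does not state how outside temperatures beyond this interval are handled:

/** Sketch of the assumed PMV inputs (illustrative, not the authors' implementation). */
public class PmvAssumptions {

    static final double AIR_VELOCITY = 0.1; // m/s, naturally ventilated offices
    static final double ACTIVITY = 1.2;     // met, office work

    // Clothing insulation interpolated between 1.0 (winter clothing at 0°C outside)
    // and 0.5 (summer clothing at 30°C outside); clamped outside this range (assumption).
    static double clothing(double outsideTemperature) {
        double t = Math.max(0.0, Math.min(30.0, outsideTemperature));
        return 1.0 - 0.5 * (t / 30.0);
    }

    public static void main(String[] args) {
        System.out.println(clothing(0.0));  // 1.0
        System.out.println(clothing(15.0)); // 0.75
        System.out.println(clothing(30.0)); // 0.5
    }
}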
Table 3. Comfort classes based on the PMV.
Comfort Class    Classification        No. in Data   Percentage in Data
Hot              3.5 > PMV ≥ 2.5       0             0.0%
Warm             2.5 > PMV ≥ 1.5       4             0.0%
Slightly Warm    1.5 > PMV ≥ 0.5       8,948         1.0%
Neutral          0.5 > PMV ≥ -0.5      772,072       82.7%
Slightly Cool    -0.5 > PMV ≥ -1.5     150,227       16.1%
Cool             -1.5 > PMV ≥ -2.5     1,984         0.2%
Cold             -2.5 > PMV ≥ -3.5     0             0.0%
OutOfRange       otherwise             0             0.0%
The comfort class is assigned from the PMV value according to the classification in Table 3. The table also lists the resulting number of entries in each class. The distributions of the PMV and the room measurements are displayed in Figure 3 for comparison. The distributions of the PMV values are about the same for all four rooms.
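The mapping from PMV to comfort class in Table 3 reduces to a few threshold checks; a minimal sketch using the class names of Table 3:

/** Sketch of the PMV-to-comfort-class mapping defined in Table 3. */
public class ComfortClass {

    static String classify(double pmv) {
        if (pmv >= 3.5 || pmv < -3.5) return "OutOfRange";
        if (pmv >= 2.5)  return "Hot";
        if (pmv >= 1.5)  return "Warm";
        if (pmv >= 0.5)  return "Slightly Warm";
        if (pmv >= -0.5) return "Neutral";
        if (pmv >= -1.5) return "Slightly Cool";
        if (pmv >= -2.5) return "Cool";
        return "Cold"; // -3.5 <= PMV < -2.5
    }

    public static void main(String[] args) {
        System.out.println(classify(0.1));  // Neutral
        System.out.println(classify(-0.8)); // Slightly Cool
    }
}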
Figure 4 shows the results of the attribute impor-
tance analysis of the Oracle Data Miner run on the
computed comfort classes for the model building
data. Attribute Importance identifies the subset of at-
tributes relevant for classification using a Minimum
Description Length Algorithm (Oracle 2008). It is
obvious that the PMV and the related Percentage of
Persons Dissatisfied (PPD) have the biggest influ-
ence on the comfort class. The air and radiant tem-
peratures are next in rank of importance. Other val-
ues are less important for the comfort classification.
Figure 3. Histograms of various measures from the 4 rooms: a) room air temperature, b) room radiant temperature, c) room relative humidity, d) room PMV (μ: mean value; σ: standard deviation; c95: 95% confidence interval).
Figure 4. Influences of the indoor measures in room comfort.
This is relevant for the model building in the next step, as the PMV, PPD, radiant temperature, relative humidity, and CO2 are removed, since they are not available in the other rooms to which the model will be applied. We assume that this is feasible, as the room radiant temperature is strongly correlated to the room air temperature (compare Figures 3a and 3b), the room humidity is correlated to the outside humidity, and the clothing level was related to the outside temperature during the PMV computation. Several tests in the next section will show whether this assumption is correct.
4.3 Building and evaluating the comfort model
Building a data mining model is the process of find-
ing the best algorithm or technique by which the building's sensed data is analysed and represented as
patterns and rules (Harinath & Quinn 2006, p. 485).
The following shows how to classify room com-
fort. This is an overview of building, testing, and
scoring a classification model.
In classification, a model or classifier is constructed to predict the categorical label of a room in a building (Han & Kamber 2006, p. 286). These classes are defined in Section 4.2.3. The classification mining function uses different algorithms, such as decision trees, Naïve Bayes, and support vector machines.
As the attributes in Table 2 can be treated as independent given the comfort class, Naïve Bayes is a suitable algorithm (Fielding 2007, p. 99) to detect room comfort in buildings in this case. Naïve Bayes is a probabilistic classifier based on Bayes' theorem. It simplifies the learning by assuming that the attributes in Table 2 are independent (Abellan et al. 2007) given the room comfort class as the variable to classify.
and support vector machines resulted in poor mod-
els.
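For reference, the Naïve Bayes decision rule that follows from this independence assumption can be written as (a standard formulation, where c is the comfort class and x_1, ..., x_n are the predictors of Table 2):

\hat{c} = \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)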
In the settings phase for building the model, the 'cool' label was used as the preferred target value. The data was split into two subsets of 60% and 40% for training and testing the models; the 40% subset is called a holdout sample or test dataset. The sampling process was disabled, as the model building time was acceptable for our data size. The model was tuned towards a maximum average accuracy, which creates a model that is good at predicting all labels (Huang et al. 2008).
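The 60/40 split itself is a simple random partition of the records; the sketch below is an illustration only, since in this work the split is performed inside Oracle Data Miner:

import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Sketch of a 60/40 train/test split (illustrative; ODM performs this internally). */
public class TrainTestSplit {

    static <T> List<List<T>> split(List<T> records, double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(records);
        Collections.shuffle(shuffled, new Random(seed)); // reproducible random order
        int cut = (int) Math.round(shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(shuffled.subList(0, cut));                // e.g. 60% training set
        parts.add(shuffled.subList(cut, shuffled.size()));  // e.g. 40% holdout/test set
        return parts;
    }
}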
During the building process the model learns
from the sensed data how to distinguish between
comfort classes in order to predict the same classes
when the model is applied to other rooms. The test
metrics of ODM, which are detailed in the following
sections, allow evaluation of the model's quality
(Maimon & Rokach 2005, p. 1241).
4.3.1 Predictive confidence
Predictive confidence is a visual indication of the effectiveness of this model compared to a random guess of the rooms' comfort class. It is a validation of the ability of the model to generalize what it has learned to a different data set (Fernández 2003, p. 152). If the needle in Figure 5 points to the lowest point on the left of the dial, then the model is no better than a random guess (Haberstroh 2008, p. 85). The comfort detection model developed in this study shows an 85.28% predictive improvement over a random guess in predicting the rooms' comfort class. In comparison, a classification model that also takes the rooms' humidity, CO2, and radiant temperature into account reaches a predictive confidence of 89.44%. If only the rooms' air temperature is used for classification, the predictive confidence reduces to 74.75%. This shows, on the one hand, the high importance of the rooms' air temperature for the comfort class. On the other hand, it also demonstrates that the other values considered in this study, like outside measurements and room size, improve the model significantly.
Figure 5. The predictive confidence of the model.
4.3.2 Model accuracy
Model accuracy provides several interpretations of the model's ability to predict the correct class when applied to the test data.
Figure 6. Model accuracy and the confusion matrix.
Figure 6 shows the model accuracy for the comfort classification. The table at the top shows the percentage of values correctly predicted per class. For example, there are 308,623 cases with the comfort class 'neutral' and the model predicts 71.8% of them correctly. The cost is an indication of the damage done by incorrect predictions (Berry & Linoff 2004, p. 79) and is a valuable metric for model comparisons. The displayed model was the best model we could develop, with the lowest cost of predicting the rooms' comfort classes.
The type of errors expected from this model is shown in the confusion matrix in the lower table of Figure 6. Actual (correct) values of the classes are represented by rows and compared against the predictions made by the model in columns. The numbers tell how many cases were correctly predicted or misinterpreted as another class. For example, the first row in Figure 6 indicates that, of the samples with the actual comfort class 'cool', 709 cases were correctly predicted and 55 cases were predicted incorrectly as 'slightly cool'.
Interpreting the confusion matrix shows that incorrect predictions usually fall into classes adjacent to the correct one, i.e. the 'neutral' class is predicted incorrectly as either 'slightly cool' or 'slightly warm'. The rare classes 'warm' and 'cool' have a high percentage of correct predictions. The reason is probably that they are characterized by extreme air temperatures. However, the low number of samples does not allow the generalisation that these classes will also be detected correctly in other data. As the building data contained no 'hot' and 'cold' cases, the model will not be able to classify these classes.
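The per-class accuracies reported in Figure 6 are simply the diagonal entries of the confusion matrix divided by the corresponding row totals; a minimal sketch with hypothetical counts (only the values 709 and 55 of the 'cool' row are taken from the text):

/** Sketch: per-class accuracy (recall) from a confusion matrix (rows = actual classes). */
public class ConfusionMatrix {

    static double[] perClassAccuracy(long[][] matrix) {
        double[] accuracy = new double[matrix.length];
        for (int i = 0; i < matrix.length; i++) {
            long rowTotal = 0;
            for (long count : matrix[i]) {
                rowTotal += count;
            }
            accuracy[i] = rowTotal == 0 ? 0.0 : (double) matrix[i][i] / rowTotal;
        }
        return accuracy;
    }

    public static void main(String[] args) {
        // Classes: 'cool', 'slightly cool' (2x2 excerpt; the second row is a placeholder).
        long[][] matrix = { { 709, 55 }, { 120, 880 } };
        double[] acc = perClassAccuracy(matrix);
        System.out.printf("cool: %.1f%%%n", 100 * acc[0]);          // about 92.8%
        System.out.printf("slightly cool: %.1f%%%n", 100 * acc[1]); // 88.0%
    }
}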
4.4 Knowledge deployment
The created model can be applied to any building performance data that has the same structure and format to predict the comfort class. This applying activity is sometimes referred to as scoring the model (Giovinazzo 2002, p. 168), i.e. using the model on a different data set to predict the classification. This is done for all 70 rooms, excluding the room with the broken temperature sensor, which was cleaned out as explained in Section 4.2.3. The new model allows predicting the thermal comfort class based only on the rooms' air temperature and the building's outside conditions.
Figure 7. Histograms of measures from all rooms: a) room air temperature, b) room PMV.
The distributions of the air temperature of these rooms and the predicted PMV values are shown in Figure 7. The mean air temperature (μ) is 19.9°C, slightly lower than the 21.7°C for the four rooms' data used for model building (see Figure 3a). The standard deviation increases from 1.6°C to 2.2°C as the added rooms increase the variance. This results in a broader PMV distribution in Figure 7b in comparison to Figure 3d, with significantly more 'slightly cool' and 'cool' values.
Figure 8. Sample of output table for applying the model.
A sample of the output table from applying the model is displayed in Figure 8. Each row shows the identifier, the prediction of the most likely class, the probability that this is the right guess, the cost of an incorrect prediction, and the rank used to categorize predictions. The room name was added to ease readability.
The model estimates that it makes correct predictions with a mean probability of 78%. The 'neutral' label is usually predicted with 97% mean probability, 'slightly cool' with 76%, 'slightly warm' with 21%, and 'cool' and 'warm' with 15% mean probability. The reason for this distribution is that the data used for building the model contained mostly cases for 'neutral', which increases the model quality for this case, but the lack of data for the other cases reduces their model quality.
4.5 Knowledge gained and interpretation
As a last step, the PMV distribution for each room was analysed to identify the rooms whose comfort level is predominantly not 'neutral'. See Figure 9 for the location of the rooms. 40 rooms out of 70 were identified as having a mainly 'slightly cool' comfort level, and 5 rooms had a 'cool' comfort level for more than 30% of the cases. Four of these five rooms are located at the south facade on the ground level and three have exterior doors. The ground floor has the highest number of rooms with 'neutral' comfort. One room in the middle of the floor shows abnormal behaviour that should be investigated, as the room has a more than 30% 'cool' comfort level in contrast to its neighbouring rooms.
Figure 9. Comfort levels of the rooms.
In general, the thermal comfort for the scored winter period was 'slightly cool'. The set point temperature for all rooms was 20°C, which corresponds to the mean temperature value shown in Figure 7a. However, to provide better thermal comfort, the set point should be higher. Office hours were not considered during the analysis. The mean air temperature varies in the scoring data by 1.5°C, reaching a minimum of 19.0°C at 2 am and a maximum of 20.5°C at 4 pm.
5 CONCLUSION
Two approaches were introduced to analyse building
performance data for energy-efficient buildings.
The data warehouse solution provides a single re-
pository for building performance data, creates so-
phisticated energy aggregations, and provides
user-friendly interfaces.
The data mining model automates and eases the evaluation of building thermal comfort, while reducing the cost of monitoring equipment. The process from data acquisition and preparation, through model building, to knowledge deployment was examined using real data from the ERI. The results show that the approach is feasible, but more data is needed to train the model for less frequent classes like 'hot' and 'cold'.
Applying data mining techniques to building sensor data will help in stabilising room comfort preferences while optimising energy usage. Therefore, the
correlations between the building energy usage and
thermal comfort will be further examined with a
special focus on the sustainable energy sources of
the ERI. Another future research topic will be the
development of mining models for fault detection
and diagnosis as well as mining models that consider
human comfort feedback along with other influences
in room states, such as the structural properties of
the building and its geometrical specifications. The
extensions of the ERI with a further 80 wireless sen-
sors will increase the data set for analysis and will
also provide more validation data for this model.
These solutions are used by the ITOBO (2007) pro-
ject to increase the value of energy-efficient smart
buildings.
6 ACKNOWLEDGEMENT
Work in the Strategic Research Cluster 'ITOBO' is funded by Science Foundation Ireland with additional contributions from 5 industry partners. Joern Ploennigs is a Feodor Lynen Fellow in Cork and wants to thank the Humboldt Foundation and the German BMBF for their support.
The authors thank Paul Stack, Luke Allan, Brian
Cahill, Civil Engineering UCC; Anika Schumann,
Cork Constraint Computation Centre; and Haithum
Elhadi, U.S. Telecom and Illinois Institute of Tech-
nology for their contribution to this research.
7 REFERENCES
Abellan, J., Cano, A., Masegosa, A. R., & Moral, S. 2007. A
Semi-Naive Bayes Classifier with Grouping of Cases. In K.
Mellouli (Ed.), 9th European Conference, ECSQARU (pp.
477-488). Hammamet, Tunisia: Springer.
Adomavicius, G., & Tuzhilin, A. 2002. Using data mining
methods to build customer profiles. IEEE Computer 34 (2):
74-82.
Ahmed, A., Menzel, K., Ploennigs, J., & Cahill, B. 2009. As-
pects of Multi-dimensional Data Analysis of Building Per-
formance Data Management. 16th European Group for In-
telligent Computing in Engineering International
Workshop. Berlin, Germany, accepted.
Apte, C., Liu, B., Pednault, E. P., & Smyth, P. 2002. Business
Application of Data Mining. Communications of the ACM
45 (8): 49-53.
Atzmüller, M. 2007. Knowledge-intensive Subgroup Mining:
Techniques for Automatic and Interactive Discovery. IOS
Press.
Augenbroe, G., Park, C. S. 2005. Quantification methods of
technical building performance. Building Research and In-
formation 33 (2): 159-72.
Berry, M. J., & Linoff, G. 2004. Data mining techniques: for
marketing, sales, and customer relationship management.
John Wiley and Sons.
Capehart, B. L., Turner, W. C., & Kennedy, W. J. 2008. Guide
to Energy Management. The Fairmont Press.
Crawley, D. B., Hand, J. W., Kummert, M., & Griffith, B. T.
2008. Contrasting the capabilities of building energy per-
formance simulation programs. Building and Environment
43 (4): 661-673 .
Dong, B., Cao, C., & Lee, S. E. 2005. Applying support vector
machines to predict building energy consumption in tropi-
cal region. Energy and Buildings 37 (5): 545-553.
ERI 2002. Environmental Research Institute. Cork, Ireland:
University College Cork, http://eri.ucc.ie.
Fernández, G. 2003. Data mining using SAS applications. CRC
Press.
Fielding, A. 2007. Cluster and classification techniques for the
biosciences. Cambridge University Press.
Figueiredo, V., Rodrigues, F., Vale, Z., & Gouveia, J. B. 2005. An electric energy consumer characterization framework based on data mining techniques. IEEE Transactions on Power Systems 20 (2): 596-602.
Giovinazzo, W. A. 2002. Internet-enabled business intelli-
gence. Prentice Hall PTR.
Haberstroh, R. 2008. Oracle Data Mining Tutorial for Oracle
Data Mining 11g Release 1. Oracle.
Han, J., & Kamber, M. 2006. Data mining: concepts and tech-
niques (2 ed.). Morgan Kaufmann.
Harinath, S., & Quinn, S. R. 2006. Professional SQL server
analysis services 2005 with MDX. John Wiley and Sons.
Huang, B., Cai, Z., Gu, Q., & Chen, C. 2008. Using Support
Vector Regression for Classification. 4th International Con-
ference on Advanced Data Mining and Applications (pp.
581-588). Chengdu, China: Springer.
ISO 7730:2005. Ergonomics of the thermal environment - Ana-
lytical determination and interpretation of thermal comfort
using calculation of the PMV and PPD indices and local
thermal comfort criteria.
ITOBO 2007. Information & Communication Technology for
Sustainable and Optimised Building Operation. Cork, Ire-
land: http://zuse.ucc.ie/itobo/.
Lane, P. 2007. Data Warehousing Guide, 11g Release 1 (11.1),
Oracle Data Base, Oracle.
Lang, R., Bruckner, D., Pratl, G., Velik, R., & Deutsch, T.
2007. Scenario recognition in modern building automation.
7th IFAC International Conference on Fieldbuses & Net-
works in Industrial & Embedded Systems, (pp. 305-312).
Ling, C. X., & Li, C. 1998. Data Mining for Direct Marketing:
Problems and Solutions. 4th International Conference on
Knowledge Discovery and Data Mining, (pp. 73-79).
Maimon, O. Z., & Rokach, L. 2005. Data mining and knowl-
edge discovery handbook. Springer Science & Business.
McCue, C. 2006. Data mining and predictive analysis: intelli-
gence gathering and crime analysis. Butterworth-
Heinemann.
Menzel, K., Pesch, D., O'Flynn, B., Keane, M., & O'Mathuna,
C. 2008. Towards a Wireless Sensor Platform for Energy
Efficient Building Operation. 12th International conference
on Computing in Civil and Building Engineering (pp. 381-
386). Beijing, China : Elsevier B.V.
Metz, B. 2007. IPCC Fourth Assessment Report on the mitiga-
tion of climate change for researchers, students, and poli-
cymakers. University Press.
Mihalakakou, G., Santamouris, M., & Tsangrassoulis, A. 2002.
On the energy consumption in residential buildings. Energy
and Buildings 34 (7): 727-736.
Morbitzer, C., Strachan, P., & Simpson, C. 2004. Data mining analysis of building simulation performance data. Building Services Engineering Research and Technology 35 (3): 253-267.
Moujalled, B., Cantin, R., & Guarracino, G. 2008. Comparison of thermal comfort algorithms in naturally ventilated office buildings. Energy and Buildings 40 (12): 2215-2223.
Nicol, F., Parsons, K. 2002, Special issue on thermal comfort
standards, Energy and Buildings 34 (6): 529-685.
Oracle. 2008. Oracle Data Mining Concepts. Oracle.
Perez-Iratxeta, C., Bork, P., & Andrade, M. A. 2002. Associa-
tion of genes to genetically inherited diseases using data
mining. Nature Genetics 31: 316-319.
Pfafferott, J. U., Herkel, S., Kalz, D. E., Zeuschner, A. 2007
Comparison of low-energy office buildings in summer us-
ing different thermal comfort criteria, Energy and Buildings
39 (7): 750-757.
Rob, P., Coronel, C., & Crockett, K. 2008. Database Systems: Design, Implementation and Management. Cengage Learning EMEA.
Stackowiak, R., Rayman, J., & Greenwald, R. 2007. Oracle
data warehousing and business intelligence solutions. John
Wiley and Sons.
Wang, X., & Huang, J. Z. 2006. A Cased-Based Data Mining
Platform. In G. J. Williams, & S. J. Simoff, A State of the
Art Survey, Data mining: theory, methodology, techniques,
and applications (pp. 28-39). Springer Science & Business.
Witten, I. H., & Frank, E. 2005. Data mining: practical ma-
chine learning tools and techniques (2 ed.). Morgan Kauf-
mann.
Wu, S., & Clements-Croome, D. 2007. Understanding the indoor environment through mining sensory data - A case study. Energy and Buildings 39 (11): 1183-1191.
Yao, R., Li, B., Liu, J. 2009. A theoretical adaptive model of
thermal comfort - Adaptive Predicted Mean Vote (aPMV),
Building and Environment 44 (10): 2089-2096.
In this work, we present a semi-naive Bayes classifier that searches for dependent attributes using different filter approaches. In order to avoid that the number of cases of the compound attributes be too high, a grouping procedure is applied each time after two variables are merged. This method tries to group two or more cases of the new variable into an unique value. In an emperical study, we show as this approach outperforms the naive Bayes classifier in a very robust way and reaches the performance of the Pazzani’s semi-naive Bayes [1] without the high cost of a wrapper search.