Fig 2 - uploaded by Oscar Corcho
Piecewise linear representations 

Source publication
Conference Paper
Full-text available
Sensor network deployments have become a primary source of big data about the real world that surrounds us, measuring a wide range of physical properties in real time. With such large amounts of heterogeneous data, a key challenge is to describe and annotate sensor data with high-level metadata, using and extending models, for instance with ontolog...

Similar publications

Article
Full-text available
Data and metadata in datasets experience many different kinds of change. Values are inserted, deleted or updated; rows appear and disappear; columns are added or repurposed, etc. In such a dynamic situation, users might have many questions related to changes in the dataset, for instance which parts of the data are trustworthy and which are not? Use...
Presentation
Full-text available
Introduction to big data management models in the field of bioinformatics.
Article
Full-text available
With the technological advances of recent years, many new observation platforms have been created and networked to disseminate numerous and diverse observations. These advances also make it possible to connect all kinds of people, facilitating the creation of large-scale, long-term studies. This paper is focused on the marine ob...
Article
Full-text available
Abstract: Nowadays, information technology is spreading very fast because of the digital data generated and exchanged on the Internet every day, and this large volume of data marks the new age of big data management. The world's effective capacity to exchange information through telecommunication network and amount of internet traffic; the data growt...
Article
Full-text available
Big data bioinformatics aims at drawing biological conclusions from huge and complex biological datasets. Added value from the analysis of big data, however, is only possible if the data is accompanied by accurate metadata annotation. Particularly in high-throughput experiments intelligent approaches are needed to keep track of the experimental des...

Citations

... This paper primarily differs from previously conducted research in that it specifically focuses on how OGD, generated by an IoT system, can be used to drive and enable the development of a targeted tool for environmental monitoring. When previously published papers do discuss IoT and OGD, the papers are almost exclusively related to the topic of smart cities, see, for example, Aguilera et al. (2017); Zanella et al. (2014); Ahlgren et al. (2016), or rather explore the generation and use of data from IoT devices (Montori et al., 2017;Borges Neto et al., 2015;Calbimonte et al., 2012). While such papers do, sometimes, focus on the utilization of OGD, they also often refer simply to data generated from IoT devices. ...
Article
Environmental monitoring of rivers is a cornerstone of the European Union's Water Framework Directive. It requires the estimation and reporting of environmental flows in rivers whose characteristics vary widely across the EU member states. This variability has resulted in a fragmentation of estimation and reporting methods for environmental flows and is exhibited by the myriad of regulatory guidelines and estimation procedures. To standardise and systematically evaluate environmental flows at the pan-European scale, we propose to formalise the estimation procedures through automation by reusing existing river monitoring resources. In this work, we explore how sensor-generated hydrological open government data can be repurposed to automate the estimation and monitoring of river environmental flows. In contrast to existing environmental flows estimation methods, we propose a scalable IoT-based architecture and implement its cloud-layer web service. The major contribution of this work is the demonstration of an automated environmental flows system based on open river monitoring data routinely collected by national authorities. Moreover, the proposed system adds value to existing environmental monitoring data, reduces development and operational costs, facilitates streamlining of environmental compliance and allows for any authority with similar data to reuse or scale it with new data and methods. We critically discuss the opportunities and challenges associated with open government data, including its quality. Finally, we demonstrate the proposed system using the Estonian national river monitoring network and define further research directions.
... All this information is collected in a catalog called "METADATA repository". Metadata standards are specifications that intend to define a common understanding of the meaning or semantics of data [25] in such a way as to ensure their interpretation and correct use by their owners and other users [26] and to make their retrieval easier [27]. However, metadata takes on less importance when the data collected remain within a narrow scope and in the case of direct data exchange. ...
Article
Full-text available
Climate change and human activities have a strong impact on lakes and their catchments, so to understand ongoing processes it is fundamental to monitor environmental variables with a spatially well-distributed, high-frequency network and to share data efficiently. Effective sharing and interoperability of environmental information between technicians and end users fosters an in-depth knowledge of the territory and its critical environmental issues. In this paper, we present the approaches and the results obtained during the PITAGORA project (Interoperable Technological Platform for Acquisition, Management and Organization of Environmental data, related to the lake basin). PITAGORA was aimed at developing both instruments and data management, including pre-processing and quality control of raw data to ensure that data are findable, accessible, interoperable, and reusable (FAIR principles). The main results show that the developed instrumentation is low-cost, easily implementable and reliable, and can be applied to the measurement of diverse environmental parameters such as meteorological, hydrological, physico-chemical, and geological. The flexibility of the proposed solutions makes our system adaptable to different monitoring purposes, research, management, and civil protection. Real-time access to environmental information can improve management of a territory and its ecosystems, safety of the population, and sustainable socio-economic development.
... Calbimonte et al. [23] characterize the time-series data using local linear approximations to estimate derivatives. The local linear models are constructed adaptively using a greedy algorithm to minimize the number of piecewise linear segments used to approximate the data. ...
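As a rough illustration of the idea in the snippet above, the following Python sketch builds a piecewise linear approximation greedily: each segment is extended point by point until a least-squares line no longer fits within a tolerance, and the slope of each segment serves as a local derivative estimate. This is a generic sliding-window heuristic, not necessarily the algorithm used by Calbimonte et al.; the names `fit_line` and `greedy_segments` and the `max_error` threshold are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

# (start index, end index, slope, intercept); hypothetical layout for this sketch
Segment = Tuple[int, int, float, float]


def fit_line(ts: Sequence[float], start: int, end: int) -> Tuple[float, float, float]:
    """Least-squares line over ts[start..end] (inclusive).

    Returns (slope, intercept, maximum absolute residual).
    """
    xs = range(start, end + 1)
    n = end - start + 1
    mean_x = sum(xs) / n
    mean_y = sum(ts[start:end + 1]) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov = sum((x - mean_x) * (ts[x] - mean_y) for x in xs)
    slope = cov / var_x if var_x else 0.0  # a single point gets slope 0
    intercept = mean_y - slope * mean_x
    err = max(abs(ts[x] - (slope * x + intercept)) for x in xs)
    return slope, intercept, err


def greedy_segments(ts: Sequence[float], max_error: float) -> List[Segment]:
    """Grow each segment while the fitted line stays within max_error."""
    segments: List[Segment] = []
    start, n = 0, len(ts)
    while start < n:
        end = start
        slope, intercept, _ = fit_line(ts, start, end)
        while end + 1 < n:
            s, b, err = fit_line(ts, start, end + 1)
            if err > max_error:
                break  # the next point would violate the tolerance
            end, slope, intercept = end + 1, s, b
        segments.append((start, end, slope, intercept))
        start = end + 1
    return segments


if __name__ == "__main__":
    # A triangular signal: one rising and one falling segment are expected.
    for seg in greedy_segments([0, 1, 2, 3, 4, 3, 2, 1, 0], max_error=0.1):
        print(seg)
```

A larger `max_error` yields fewer, longer segments at the cost of a coarser approximation; refitting the line at every extension keeps the sketch simple but makes it quadratic in segment length.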
Article
This article presents a Unified Architecture (UA) for automated point tagging of Building Automation System (BAS) data, based on a combination of data-driven approaches. Advanced energy analytics applications—including fault detection and diagnostics and supervisory control—have emerged as a significant opportunity for improving the performance of our built environment. Effective application of these analytics depends on harnessing structured data from the various building control and monitoring systems, but typical BAS implementations do not employ any standardized metadata schema. While standards such as Project Haystack and Brick Schema have been developed to address this issue, the process of structuring the data, i.e., tagging the points to apply a standard metadata schema, has, to date, been a manual process. This process is typically costly, labor-intensive, and error-prone. In this work we address this gap by proposing a UA that automates the process of point tagging by leveraging the data accessible through connection to the BAS, including time-series data and the raw point names. The UA intertwines supervised classification and unsupervised clustering techniques from machine learning and leverages both their deterministic and probabilistic outputs to inform the point tagging process. Furthermore, we extend the UA to embed additional input and output data-processing modules that are designed to address the challenges associated with the real-time deployment of this automation solution. We test the UA on two datasets for real-life buildings: (i) commercial retail buildings and (ii) office buildings from the National Renewable Energy Laboratory (NREL) campus. The proposed methodology correctly applied 85–90% and 70–75% of the tags in each of these test scenarios, respectively for two significantly different building types used for testing UA's fully-functional prototype. 
The proposed UA, therefore, offers a promising approach for automatically tagging BAS data, as it reaches close to 90% accuracy. Building further upon this framework to algorithmically identify equipment types and their relationships is an apt future research direction to pursue.
... Time series classification (TSC) is an important research topic that has been applied to medicine [18], biology [12], etc. Many TSC methods [3,8,10,18,19,21,22] are based on Bag-Of-Patterns (BOP) [10], which represents a time series with the histogram (word counts) of "words" (symbolic strings) converted from its subsequences by some discretization method [3,8,13,20]. BOP can capture local semantics [10], and is robust against noise [12,18] and phase shifts [10]. ...
... First, we extract its subsequences by sliding window. Next, we transform each subsequence into a "word" (symbolic string) with a discretization method [3,8,13,20]. Finally, we obtain the histogram (word counts) of the words as the BOP representation. ...
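The three steps in the snippet above (sliding window, discretization, histogram) can be sketched in Python as follows. The discretization here is a simplified SAX-style scheme with a three-letter alphabet, chosen only for illustration; it is not the SLA method of the cited chapter. The names `to_word` and `bag_of_patterns` and the parameters `window` and `word_len` are assumptions, and the sketch assumes `window` is a multiple of `word_len`.

```python
from collections import Counter
from statistics import mean, pstdev
from typing import Sequence

# Breakpoints that split a standard normal into three equiprobable regions
BREAKPOINTS = (-0.43, 0.43)
ALPHABET = "abc"


def to_word(sub: Sequence[float], word_len: int) -> str:
    """Convert one subsequence into a symbolic word (SAX-style)."""
    mu, sigma = mean(sub), pstdev(sub)
    # z-normalize; a constant subsequence maps to all zeros
    z = [(v - mu) / sigma if sigma > 0 else 0.0 for v in sub]
    chunk = len(z) // word_len  # assumes len(sub) is a multiple of word_len
    letters = []
    for i in range(word_len):
        avg = mean(z[i * chunk:(i + 1) * chunk])   # piecewise aggregate value
        idx = sum(avg > b for b in BREAKPOINTS)    # number of breakpoints below avg
        letters.append(ALPHABET[idx])
    return "".join(letters)


def bag_of_patterns(ts: Sequence[float], window: int, word_len: int) -> Counter:
    """Histogram of words over all sliding windows of the series."""
    words = (to_word(ts[i:i + window], word_len)
             for i in range(len(ts) - window + 1))
    return Counter(words)


if __name__ == "__main__":
    print(bag_of_patterns([0, 1, 2, 3, 0, 1, 2, 3], window=4, word_len=2))
```

Because the histogram discards the positions of the words, two series that contain the same local shapes at different offsets receive similar representations, which is consistent with the robustness to phase shifts mentioned in the citing text.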
Chapter
Full-text available
In time series classification, one of the most popular models is Bag-Of-Patterns (BOP). Most BOP methods run in super-linear time. A recent work proposed a linear time BOP model, yet it has limited accuracy. In this work, we present Hybrid Bag-Of-Patterns (HBOP), which can greatly enhance accuracy while maintaining linear complexity. Concretely, we first propose a novel time series discretization method called SLA, which can retain more information than the classic SAX. We use a hybrid of SLA and SAX to expressively and compactly represent subsequences, which is our most important design feature. Moreover, we develop an efficient time series transformation method that is key to achieving linear complexity. We also propose a novel X-means clustering subroutine to handle subclasses. Extensive experiments on over 100 datasets demonstrate the effectiveness and efficiency of our method.
... Another example of instance-based classification is described in [30]. The authors proposed a method to infer the semantics of time series data from the depictive slopes of their linear approximations. ...
Preprint
Full-text available
Building Management Systems (BMS) are crucial in the drive towards smart sustainable cities. This is due to the fact that they have been effective in significantly reducing the energy consumption of buildings. A typical BMS is composed of smart devices that communicate with one another in order to achieve their purpose. However, the heterogeneity of these devices and their associated meta-data impede the deployment of solutions that depend on the interactions among these devices. Nonetheless, automatically inferring the semantics of these devices using data-driven methods provides an ideal solution to the problems brought about by this heterogeneity. In this paper, we undertake a multi-dimensional study to address the problem of inferring the semantics of IoT devices using machine learning models. Using two datasets with over 67 million data points collected from IoT devices, we developed discriminative models that produced competitive results. Particularly, our study highlights the potential of Image Encoded Time Series (IETS) as a robust alternative to statistical feature-based inference methods. Leveraging just a fraction of the data required by feature-based methods, our evaluations show that this encoding competes with and even outperforms traditional methods in many cases.
... For such reason, in order to automatically classify each feed (defined as "datastream") we proposed a sequential ensemble algorithm that combines classifiers for both numerical and natural language data [10]. The algorithm has been tested on a number of datasets, among which the ThingSpeak dataset that we produced and made openly accessible, and it is shown to outperform canonical approaches in literature such as [11]. ...
... Calbimonte et al. [22] characterize the time series data using local linear approximations to estimate derivatives. The local linear models are constructed adaptively using a greedy algorithm to minimize the number of piecewise linear segments used to approximate the data. ...
Preprint
This article presents a Unified Architecture (UA) for automated point tagging of Building Automation System (BAS) data, based on a combination of data-driven approaches. Advanced energy analytics applications-including fault detection and diagnostics and supervisory control-have emerged as a significant opportunity for improving the performance of our built environment. Effective application of these analytics depends on harnessing structured data from the various building control and monitoring systems, but typical BAS implementations do not employ any standardized metadata schema. While standards such as Project Haystack and Brick Schema have been developed to address this issue, the process of structuring the data, i.e., tagging the points to apply a standard metadata schema, has, to date, been a manual process. This process is typically costly, labor-intensive, and error-prone. In this work we address this gap by proposing a UA that automates the process of point tagging by leveraging the data accessible through connection to the BAS, including time series data and the raw point names. The UA intertwines supervised classification and unsupervised clustering techniques from machine learning and leverages both their deterministic and probabilistic outputs to inform the point tagging process. Furthermore, we extend the UA to embed additional input and output data-processing modules that are designed to address the challenges associated with the real-time deployment of this automation solution. We test the UA on two datasets for real-life buildings: (i) commercial retail buildings and (ii) office buildings from the National Renewable Energy Laboratory (NREL) campus. The proposed methodology correctly applied 85-90% and 70-75% of the tags in each of these test scenarios, respectively.
... The best possible usage of the data requires keeping record of all these metadata [11]. According to [20], raw sensor data has limited usage without any metadata that describes it. As a consequence it is hard to discover, integrate or interpret data without its metadata. ...
Article
Full-text available
Meteorological observation systems are extremely data-driven. However, several factors affect measurements, which requires the use of environmental metrology techniques to increase the quality of measurements, decrease errors, and evaluate measurement uncertainty. In this paper, we propose and develop a framework that integrates, processes, and visualizes sensor data and its associated metadata (for rainfall monitoring). This task is accomplished with a workflow designed to correct raw sensor data, which uses an Elastic Stack-based infrastructure to collect, transform, and store sensor data and metadata. We validated our framework using real precipitation data from a Tipping Bucket Rain Gauge.
... interlinking weather data with energy data or geographical information). We build on existing research, such as [15] [16] and [17]. In [15] the authors integrate weather information together with health records using Linked Data principles and a conversion process similar to ours. ...
... The SSN ontology is reused for their purpose, as data from various sensors needs to be captured. Instead, in [17] examples of data analysis on semantic sensor metadata modelled using the SSN ontology are described. In our work, we designed a core ontology (that can be aligned to the SSN and AEMET ontologies) which specifically targets the integration of weather data with PV energy data and that eases the process of mapping and uplifting common datasets used by researchers in the domain. ...
Preprint
Full-text available
Smart energy systems in general, and solar energy analysis in particular, have recently gained increasing interest. This is mainly due to a stronger focus on smart energy-saving solutions and recent developments in photovoltaic (PV) cells. Various data-driven and machine-learning frameworks are being proposed by the research community. However, these frameworks perform their analysis on, and are designed for, specific, heterogeneous and isolated datasets, distributed across different sites and sources, making it hard to compare results and reproduce the analysis on similar data. We propose an approach based on Web (W3C) standards and Linked Data technologies for representing and converting PV and weather records into a Resource Description Framework (RDF) graph-based data format. This format, and the presented approach, is ideal in a data integration scenario where data needs to be converted into a homogeneous form and different datasets could be interlinked for distributed analysis.