Example of data modeled in RDF format

Source publication

Combining Business Intelligence with Semantic Web: Overview and Challenges

Conference Paper

May 2015

In today's highly complex and dynamic business environment, external data (most often coming from the web) need to be included in traditional On-Line Analytical Processing (OLAP) analysis so that decision-makers are well informed before making decisions. Including external web data requires knowing the exact semantic meaning in order t...

Context 1

... of web data, while SW data have a high density of valuable information that can be used for enriching business analysis (Thi and Nguyen, 2008; Kämpgen and Harth, 2011; Zorrilla et al., 2012; Etcheverry and Vaisman, 2012; Abelló et al., 2013; Ibragimov et al., 2014; Aufaure and Chiky, 2014). Combining BI with SW, however, is not a trivial task because of the scale, complexity and heterogeneity of SW data. It raises the following questions: How can heterogeneous SW data be integrated into a BI system originally designed for factual data? How can multidimensional analyses be carried out over large amounts of SW data in the absence of a suitable model? How should analysis results containing both factual data and SW data be presented? These questions are examples of issues that remain to be resolved. The aim of this paper is to present an up-to-date survey of research results and to outline future research challenges in the BI and SW domains. The rest of the paper is organized as follows. We (i) briefly present the concepts of BI and SW in section 2; (ii) give an overview of recent research results combining BI with SW in sections 3 and 4; (iii) discuss emerging trends and perspectives for future research in section 5. The term Business Intelligence (BI) refers to a set of techniques used for collecting, extracting and analyzing business data to support the decision-making process. Coming from heterogeneous and distributed operational sources, the data used in decision-making are stored in a Data Warehouse after going through a process called ETL (Extraction, Transformation and Loading). Among the different types of data warehouse, the On-Line Analytical Processing (OLAP) data warehouse has been a specific research topic for over a decade. The concepts of OLAP were first proposed in (Codd, Codd, and Salley, 1993); they provide solutions for creating, managing, analyzing and reporting large amounts of multidimensional data interactively.
Among all the data models proposed for OLAP, the Star Schema (Kimball, 1996) is the most widely accepted (Chaudhuri, Dayal, and Narasayya, 2011). At the conceptual level, a Star Schema presents data according to subjects of analysis (facts) and axes of analysis (dimensions). At the logical level, a Star Schema can be built on top of different types of databases: Multidimensional OLAP (MOLAP), Relational OLAP (ROLAP) and Hybrid OLAP (HOLAP). At the physical level, a Star Schema can be implemented in different ways, as long as the implementation conforms to the twelve evaluation rules defined in (Codd, Codd, and Salley, 1993), such as multidimensionality, transparency and accessibility. Together with the multidimensional data model, a set of operators is indispensable for OLAP analysis. They make it possible to change the aggregation level (Drill-down, Roll-up), filter analysis results (Slice, Dice) and change analysis axes (Pivot). (Kimball, 1998) points out that the main advantages of the OLAP model lie in its simplicity and understandability, which let users interact efficiently with large amounts of complex data. Nowadays, OLAP is a well-mastered technology when it comes to homogeneous and structured data in classical data warehouses. However, as factual data provide only limited and partial views of open-world business scenarios (Zorrilla et al., 2012), the data warehouse community is looking for ways to enrich its data collections with external data. To exploit web data accurately, a system needs to be able to read the exact semantic meaning of web-published information. An acknowledged way to publish machine-readable information is to use Semantic Web (SW) technologies. The purpose of SW technologies is to fix a common vocabulary and a set of interpretation constraints (inference rules) so as to express metadata about web information semantically and allow reasoning over it.
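As an illustration of the OLAP operators mentioned in this excerpt, the following minimal Python sketch applies roll-up, drill-down and slice to a tiny in-memory fact table. The fact rows, the axis names and the helper functions `aggregate` and `slice_` are hypothetical and are not taken from the source publication; real OLAP engines implement these operators very differently.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, month, product, amount). All values are made up.
FACTS = [
    (2014, "Jan", "ProductX", 30),
    (2014, "Jan", "ProductY", 20),
    (2014, "Feb", "ProductX", 60),
    (2015, "Jan", "ProductX", 90),
]

# Position of each analysis axis (dimension attribute) inside a fact row.
AXES = {"year": 0, "month": 1, "product": 2}


def aggregate(facts, axes):
    """Sum the measure (last column), grouped by the given analysis axes."""
    totals = defaultdict(int)
    for row in facts:
        key = tuple(row[AXES[axis]] for axis in axes)
        totals[key] += row[3]
    return dict(totals)


def slice_(facts, axis, value):
    """Slice: keep only the facts whose `axis` equals `value`."""
    return [row for row in facts if row[AXES[axis]] == value]


# Drill-down view: totals at the finer (year, month) level.
by_month = aggregate(FACTS, ["year", "month"])

# Roll-up along the time hierarchy: month -> year.
by_year = aggregate(FACTS, ["year"])

# Slice on one product, then roll up to the year level.
product_x_by_year = aggregate(slice_(FACTS, "product", "ProductX"), ["year"])
```

Roll-up and drill-down are here the same grouping function called with a coarser or finer list of axes, which is why both presuppose a known hierarchy (month within year) on the time dimension.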
These technologies provide the capability of annotating web data with semantics, e.g., through RDF and ontologies, hence generating a web of semantically linked data (e.g., the Linked Open Data cloud). Tim Berners-Lee pointed out four principles that SW data should follow: use Uniform Resource Identifiers (URIs) to identify objects; use the Hypertext Transfer Protocol (HTTP) so that people can look up those objects; use the Resource Description Framework (RDF) format as the standard for providing descriptive information about an object; link URIs to others in order to connect individual data into a web of data. Compared to traditional web technologies, which focus mainly on data representation, SW puts a higher value on providing machine-readable information about web resources and the relationships between them. More specifically, SW presents human knowledge through structured collections of information and sets of inference rules (Berners-Lee, Hendler, and Lassila, 2001). The basic data model is RDF, which expresses simple statements about resources using named properties and values (cf. figure 2). Resources described by RDF are not necessarily retrievable on the web; they can be anything with a unique identity, from physical objects to abstract concepts (McBride, 2004). A Triple Store stores RDF data. The set of statements in an RDF Triple Store is composed of URIs, blank nodes and literals. An RDF triple consists of a subject, a predicate and an object: a subject is a web resource identified by a URI or a blank node; an object can be a web resource or a literal that holds a primitive value; a predicate is a binary relationship connecting a subject with an object. For instance, in figure 2 the predicate labeled Concerns associates the resource Sales with another resource ProductX, and another predicate named hasPrice connects the subject ProductX to a textual literal "30", which is the product's price.
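The two statements described for figure 2 can be written down as subject, predicate, object triples. The sketch below is a minimal plain-Python model, not an RDF library: the namespace `http://example.org/`, the `Literal` class and the `match` helper are assumptions introduced for illustration only.

```python
from dataclasses import dataclass

# Hypothetical namespace; the excerpt does not give the URIs used in figure 2.
EX = "http://example.org/"


@dataclass(frozen=True)
class Literal:
    """A literal value, as opposed to a resource identified by a URI."""
    value: str


# The two statements from the figure 2 example, as (subject, predicate, object).
TRIPLES = [
    (EX + "Sales", EX + "Concerns", EX + "ProductX"),
    (EX + "ProductX", EX + "hasPrice", Literal("30")),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Which resource does Sales concern?
concerned = match(TRIPLES, s=EX + "Sales", p=EX + "Concerns")[0][2]

# What is the price of that resource?
price = match(TRIPLES, s=concerned, p=EX + "hasPrice")[0][2]
```

The object of the first triple is reused as the subject of the second, which is how individual statements connect into a graph.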
There exist other SW formats with more expressive power than RDF. Built on top of RDF, the RDF Vocabulary Description Language (also called RDF Schema or RDFS) is a language that defines the terms used in an RDF graph. Equivalent to a schema definition language in the relational and object-oriented data models, RDFS is used to describe classes of resources. In other words, RDFS is a simple ontology definition language that allows taxonomies to be expressed. The concepts of RDFS are described as a set of predefined RDF resources with special meanings. However, the reasoning capacity of RDFS is very limited: only basic inferences about taxonomies are supported (Horrocks, Patel-Schneider, and van Harmelen, 2003). To address this issue, the Web Ontology Working Group of the W3C developed more powerful ontology languages, such as OWL-Lite, OWL-DL and OWL-Full, which allow explicit, formal conceptualizations of domain models to be defined. In general, OWL enhances the expressivity of RDF and RDFS by adding constructs from Description Logic (DL). Hence, OWL is an ontology language with sufficient expressive power that can support efficient reasoning through well-defined syntax and semantics (Antoniou and van Harmelen, 2004). By using the SW formats, web resources can be enriched with annotations and other markup capturing the semantic metadata of resources. However, not all current technologies are fully compatible with this semantic enrichment. For instance, traditional Information Retrieval (IR) technologies cannot directly exploit the annotated semantic meaning of web resources (Finin et al., 2005). On the other hand, new research directions have been proposed to combine traditional research approaches with SW technologies, such as Semantic Information Retrieval (Fernández et al., 2011) and Exploratory OLAP (Abelló et al., 2015). In this paper, we focus only on the emerging research direction that aims at enhancing traditional BI with new SW technologies.
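The "basic inferences about taxonomies" that RDFS supports can be illustrated by following subclass links transitively. The sketch below assumes a hypothetical taxonomy (Laptop, Computer, Product) and, for brevity, at most one direct superclass per class; RDFS itself allows several, and a real reasoner covers more entailment rules than this.

```python
# Hypothetical taxonomy: each class maps to its single direct superclass.
# (RDFS permits multiple superclasses; one is assumed here for brevity.)
SUBCLASS_OF = {"Laptop": "Computer", "Computer": "Product"}

# Hypothetical instance data: the asserted type of each resource.
TYPE_OF = {"ProductX": "Laptop"}


def superclasses(cls):
    """Follow subclass links upward, returning every inferred superclass."""
    result = []
    # The second condition stops the walk if the taxonomy contains a cycle.
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in result:
        cls = SUBCLASS_OF[cls]
        result.append(cls)
    return result


def inferred_types(resource):
    """Asserted type first, then the types inferred from the taxonomy."""
    direct = TYPE_OF[resource]
    return [direct] + superclasses(direct)


types_of_product_x = inferred_types("ProductX")
```

Only the first element of the result is stated explicitly in the data; the remaining ones are derived, which is the kind of taxonomy-level inference the excerpt attributes to RDFS.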
Nowadays, a large body of research tries to merge OLAP analysis with SW technologies at both the data integration and data processing levels. This research direction makes it possible to combine powerful tools and technologies from both domains. But it is not a trivial task, mainly for the following reason: OLAP requires a specialized data model to support multidimensional analysis over aggregated values of measures at different granularity levels. However, SW has no appropriate model that fully satisfies the criteria about hierarchical levels proposed by (Codd, Codd, and Salley, 1993). Carrying out OLAP analysis directly over SW data is difficult and inefficient because of the lack of a suitable data model bridging the gap between the SW and OLAP domains. In fact, OLAP was originally conceived for analysis over homogeneous and stable warehoused data. With the arrival of a profusion of schema-less Web information, data become more and more heterogeneous and volatile. By the volatility of SW data we mean the quick, unceasing and unpredictable changes in SW data sources. Traditional OLAP technologies are challenged when applied to analyses over SW data. To address these issues, many research efforts have been made to combine OLAP with SW. Two types of approaches can be identified (Figure 3). The first approach is OLAP-analysis oriented; it consists of extracting, transforming and then storing multidimensional SW information in traditional OLAP data warehouses (§ 3.1), so that it can be analyzed with existing OLAP tools. The second approach is multidimensional-modeling oriented; its aim is to carry out OLAP analyses directly over RDF-like data modeled in an appropriate multidimensional format (§ 3.2). At the end of the section, we provide a summary table (cf. Table 1) covering all the work mentioned. OLAP analyses are carried out through analysis operators, such as roll-up, drill-down and rotate (Ravat et al., 2008).
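The first approach described above (extract, transform, then store SW information in a conventional warehouse) can be sketched with Python's built-in `sqlite3` module. This is only an illustrative toy, not the method of any surveyed work: the triples, the predicate names `concerns` and `amount`, and the tables `dim_product` and `fact_sales` are all hypothetical.

```python
import sqlite3

# Hypothetical sale observations expressed as (subject, predicate, object).
TRIPLES = [
    ("sale1", "concerns", "ProductX"),
    ("sale1", "amount", "30"),
    ("sale2", "concerns", "ProductX"),
    ("sale2", "amount", "60"),
    ("sale3", "concerns", "ProductY"),
    ("sale3", "amount", "20"),
]


def extract(triples):
    """Extract and transform: group triples by subject into flat records."""
    records = {}
    for subject, predicate, obj in triples:
        records.setdefault(subject, {})[predicate] = obj
    return records


def load(records):
    """Load the records into a minimal star schema (one dimension, one fact)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dim_product (id INTEGER PRIMARY KEY, name TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE fact_sales ("
        "product_id INTEGER REFERENCES dim_product(id), amount REAL)"
    )
    for rec in records.values():
        # Insert the dimension member once, then look up its surrogate key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (name) VALUES (?)",
            (rec["concerns"],),
        )
        (product_id,) = conn.execute(
            "SELECT id FROM dim_product WHERE name = ?", (rec["concerns"],)
        ).fetchone()
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?)",
            (product_id, float(rec["amount"])),
        )
    return conn


conn = load(extract(TRIPLES))

# Once loaded, an ordinary relational aggregation answers the analysis query.
totals = conn.execute(
    "SELECT d.name, SUM(f.amount) FROM fact_sales f "
    "JOIN dim_product d ON d.id = f.product_id "
    "GROUP BY d.name ORDER BY d.name"
).fetchall()
```

The sketch also hints at the cost of this approach: the warehouse holds a materialized copy, so changes in the volatile SW source are not reflected until the extraction is run again.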
Analysis results are usually presented in a Multidimensional Table (MT), which allows several analysis axes to be visualized around a subject. Based on an MT, decision-makers can further carry ...

Context 2

Analysis results are usually presented in a Multidimensional Table (MT), which allows several analysis axes to be visualized around a subject. Based on an MT, decision-makers can apply further OLAP operators to continue their analyses. OLAP operators are only applicable to specialized data structures (Harinarayan, Rajaraman, and Ullman, 1996; Ravat et al., 2008; Etcheverry and Vaisman, 2012); RDF descriptions, however, have no component that directly supports OLAP analysis. For instance, in order to carry out drill-down and roll-up operations, data need to be represented according to hierarchical levels within a dimension. However, even though RDF triples can describe web resources and the relationships between them (instance level), they do not allow revealing hierarchical ...

IoT-Lite: A Lightweight Semantic Model for the Internet of Things

Conference Paper

Jul 2016

Over the past few years the semantics community has developed ontologies to describe concepts and relationships between different entities in various application domains, including Internet of Things (IoT) applications. A key problem is that most of the IoT related semantic descriptions are not as widely adopted as expected. One of the main concern...

Business Intelligence Enhanced by the Web of Data

Thesis

Dec 2017

Jiefu SONG

Classically, Business Intelligence (BI) supports business analyses on data that are materialized beforehand and asynchronously updated in a Data Warehouse (DW). Nevertheless, the Web of data could significantly complement these business analyses. To address the volatility of the Web of data and the variety of the data involved in analyses (DW and Linked Open Data), we propose a unification solution, named Unified Cube. Such a cube blends together multidimensional data without materializing them and is flexible enough to keep only useful data over time. We complete this modeling with an on-the-fly analysis process that queries different sources in a way that is transparent to decision-makers. To illustrate the feasibility and the interest of our proposal, we have carried out experimental assessments and developed prototypes based on real-world data and benchmarks.

Towards an agent-based approach for multidimensional analyses of semantic web data

Conference Paper

Oct 2017

OLAP analytical systems are essential technologies in decision-making processes; they give decision-makers a simpler and faster way to carry out complex analyses. In today's dynamic and competitive business contexts, the internal data stored within companies no longer provide enough information for decision-making processes. Therefore, decision analysis systems could be improved by including external data available through the semantic web in order to provide multiple perspectives to decision makers. In this article, we describe a preliminary approach based on multi-agent systems for the multidimensional analysis of external data coming from the semantic web, and we give a short review of recent research combining business intelligence and semantic web technologies. The proposed approach is based on an evolutionary architecture built on agent technology. The different stages of the analysis are treated as tasks, exposed as services and managed by agents.

Vers un Modèle Unifié de Données Entreposées et de Données Ouvertes Liées : Concepts et Expérimentations

Article

Mar 2017

Nowadays, most Decision Support Systems (DSS) rely on a Data Warehouse (DW) built from an organization's internal production data. However, decision analyses can be significantly improved by adding supplementary information from outside the organization, notably Linked Open Data (LOD). Integrating these data into a DSS can offer decision-makers new points of view. In this article, we describe a new multidimensional model, called Unified Cube, which offers a generic conceptual representation of warehoused data and LOD. A two-step process is proposed for building a Unified Cube. First, schemas published with specific modeling languages are transformed into a conceptual representation based on a common language. The second step consists of associating the previously defined schemas to form a unified schema. An algebraic language is proposed to allow designers to build a Unified Cube according to their needs. To validate our proposals, we show how a Unified Cube (i) is built on real datasets and (ii) allows decision-makers to carry out decision analyses over multiple sources.

Unifying Warehoused Data with Linked Open Data: A Conceptual Modeling Solution

Conference Paper

Sep 2016

Linked Open Data (LOD) have become one of the most important sources of information, allowing business analyses based on warehoused data to be enhanced with external data. However, Data Warehouses (DWs) do not directly cooperate with LOD datasets due to the differences between data models. In this paper, we describe a conceptual multidimensional model, named Unified Cube, which is generic enough to include both warehoused data and LOD. Unified Cubes provide a comprehensive representation of useful data and, more importantly, support well-informed decisions by including multiple data sources in one analysis. To demonstrate the feasibility of our proposal, we present an implementation framework for building Unified Cubes based on DWs and LOD datasets.

Incorporation of Ontologies in Data Warehouse/Business Intelligence Systems - A Systematic Literature Review

Article

Nov 2022

Semantic Web (SW) techniques, such as ontologies, are used in Information Systems (IS) to cope with the growing need for sharing and reusing data and knowledge in various research areas. Despite the increasing emphasis on unstructured data analysis in IS, structured data and its analysis remain critical for organizational performance management. This systematic literature review aims at analyzing the incorporation and impact of ontologies in Data Warehouse/Business Intelligence (DW/BI) systems, contributing to the current literature by providing a classification of works based on the field of each case study, SW techniques used, and the authors' motivations for using them, with a focus on DW/BI design, development and exploration tasks. A search strategy was developed, including the definition of keywords, inclusion and exclusion criteria, and the selection of search engines. Ontologies are mainly defined using the Web Ontology Language (OWL) standard to support multiple DW/BI tasks, such as Dimensional Modeling, Requirement Analysis, Extract-Transform-Load, and BI Application Design. Reviewed authors present a variety of motivations for ontology-driven solutions in DW/BI, such as eliminating or solving data heterogeneity/semantics problems, increasing interoperability, facilitating integration, or providing semantic content for requirements and data analysis. Further, implications for practice and a research agenda are indicated.

Evaluation of Semantic Metadata Pair Modelling Using Data Clustering

Preprint

Sep 2018

Metadata presents a medium for connection, elaboration, examination, and comprehension of relativity between two datasets. Metadata can be enriched to calculate the existence of a connection between different disintegrated datasets. In order to do so, the very first task is to attain a generic metadata representation for domains. This representation narrows down the metadata search space. The metadata search space consists of attributes, tags, semantic content, annotations etc. to perform classification. The existing technologies limit the metadata bandwidth i.e. the operation set for matching purposes is restricted or limited. This research focuses on generating a mapper function called cognate that can find mathematical relevance based on pairs of attributes between disintegrated datasets. Each pair is designed from one of the datasets under consideration using the existing metadata and available meta-tags. After pairs have been generated, samples are constructed using a different combination of pairs. The similarity and relevance between two or more pairs are attained by using a data clustering technique to generate large groups from smaller groups based on similarity index. The search space is divided using a domain divider function and smaller search spaces are created using relativity and tagging as the main concept. For this research, the initial datasets have been limited to textual information. Once all disjoint meta-collection have been generated the approximation algorithm calculates the centers of each meta-set. These centers serve the purpose of meta-pointers i.e. a collection of meta-domain representations. Each pointer can then join a cluster based on the content i.e. meta-content. It also facilitates the process of possible synonyms across cross-functional domains. This can be examined using meta-pointers and graph pools.

EvOLAP Graph – Evolution and OLAP-Aware Graph Data Model: 14th International Conference, BDAS 2018, Held at the 24th IFIP World Computer Congress, WCC 2018, Poznan, Poland, September 18-20, 2018, Proceedings

Chapter

Aug 2018

Towards Answering Provenance-Enabled SPARQL Queries Over RDF Data Cubes

Conference Paper

Nov 2016

The SPARQL 1.1 standard has made it possible to formulate analytical queries in SPARQL. While some approaches have become available for processing analytical queries on RDF data cubes, little attention has been paid to answering provenance-enabled queries over such data. Yet, considering provenance is a prerequisite to being able to validate if a query result is trustworthy. The main challenge for existing triple stores is the way provenance can be encoded in standard triple stores based on context values (named graphs). Hence, in this paper we analyze the suitability of existing triple stores for answering provenance-enabled queries on RDF data cubes, identify their shortcomings, and propose an index to handle the high number of context values that provenance encoding typically entails. Our experimental results using the Star Schema Benchmark show the feasibility and scalability of our index and query evaluation strategies.

Enabling OLAP analyses on the web of data

Conference Paper

Sep 2016

This paper describes a business-oriented analysis environment facilitating analyses of coherent data from Data Warehouses (DWs) and Linked Open Data (LOD) datasets. Specifically, we present a multidimensional modeling solution, named Unified Cube, which provides a single, comprehensive representation of data from multiple sources. Unified Cubes include both concepts close to business terms and user-friendly graphical notations. An implementation framework is proposed to enable unified analyses of warehoused data and LOD. The feasibility of the proposed concepts is illustrated with examples based on real-world datasets.
