
Querying XML Data with SPARQL

August 2009


DOI:10.1007/978-3-642-03573-9_32


Conference: Database and Expert Systems Applications, 20th International Conference, DEXA 2009, Linz, Austria, August 31 - September 4, 2009. Proceedings

Authors:

Nikos Bikakis

Hellenic Mediterranean University

Stavros Christodoulakis

Technical University of Crete



Querying XML Data with SPARQL*

Nikos Bikakis, Nektarios Gioldasis, Chrisa Tsinaraki,

Stavros Christodoulakis

Technical University of Crete, Department of Electronic and Computer Engineering

Laboratory of Distributed Multimedia Information Systems & Applications (TUC/ MUSIC)

University Campus, 73100, Kounoupidiana Chania, Greece

{nbikakis, nektarios, chrisa, stavros}@ced.tuc.gr

Abstract. SPARQL is today the standard access language for Semantic Web data. In recent years XML databases have also acquired industrial importance due to the widespread applicability of XML in the Web. In this paper we present a framework that bridges the heterogeneity gap and creates an interoperable environment where SPARQL queries are used to access XML databases. Our approach assumes that fairly generic mappings between ontology constructs and XML Schema constructs have been automatically derived or manually specified. The mappings are used to automatically translate SPARQL queries to semantically equivalent XQuery queries, which are used to access the XML databases. We present the algorithms and the implementation of the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML databases.

Keywords: Semantic Web, XML Data, Information Integration, Interoperability, Query Translation, SPARQL, XQuery, SPARQL to XQuery translation/transformation, SPARQL2XQuery.

1 Introduction

The Semantic Web has to coexist and interoperate with other software environments, and in particular with legacy databases. The Extensible Markup Language (XML), its derivatives (XPath, XSLT, etc.), and XML Schema have been extensively used to describe the syntax and structure of complex documents. In addition, XML Schema has been extensively used to describe the standards in many business, service, and multimedia application environments. As a result, a large volume of data is stored and managed today directly in the XML format, both to avoid inefficient access and conversion of data and to avoid exposing application users to more than one data model. Database management systems today offer an environment that supports the XML data model and the XQuery access language for managing XML data. In the Web application environment, XML Schema also acts as a wrapper for relational content that may coexist in the databases.

Our working scenario assumes that users and applications of the Semantic Web environment ask for content from underlying XML databases using SPARQL. The SPARQL queries are translated into semantically equivalent XQuery queries, which are (exclusively) used to access and manipulate the data from the XML databases in order to return the requested results to the user or the application. The results are returned in RDF (N3 or RDF/XML) or XML [1] format. To answer the SPARQL queries on top of the XML databases, a mapping at the schema level is required. We support a set of language level correspondences (rules) for mappings between RDFS/OWL and XML Schema. Based on these mappings, our framework is able to translate SPARQL queries into semantically equivalent XQuery expressions, as well as to convert XML data into the RDF format. Our approach provides an important component of any Semantic Web middleware, which enables transparent access to existing XML databases.

* An extended version of this paper is available at [14].

The framework has been smoothly integrated with the XS2OWL framework [9], thus achieving not only the automatic generation of mappings between XML Schemas and OWL ontologies, but also the transformation of XML documents into RDF format.

Various attempts have been made in the literature to address the issue of accessing XML data from within Semantic Web environments [2, 4, 5, 6, 7, 8, 9, 10, 11, 12]. An extended overview of related work can be found in [13].

The rest of the paper is organized as follows: The mappings used for the translation, as well as their encoding, are described in Section 2. Section 3 provides an overview of the query translation process. The paper concludes in Section 4.

2 Mapping OWL to XML Schema

The framework described here allows XML encoded data to be accessed from Semantic Web applications that are aware of some ontology encoded in OWL. To do that, appropriate mappings between the OWL ontology (O) and the XML Schema (XS) should exist. These mappings may be produced either automatically, based on our previous work in the XS2OWL framework [9], or manually through some mapping process carried out by a domain expert. However, the definition of mappings between OWL ontologies and XML Schemas is not the subject of this paper. Thus, we do not focus on the semantic correctness of the defined mappings, and we consider neither what the mapping process is nor how these mappings have been produced.

Such a mapping process has to be guided by language level correspondences. That is, the valid correspondences between the OWL and XML Schema language constructs have to be defined in advance. The language level correspondences adopted in this paper are well accepted in a wide range of data integration approaches [2, 4, 9, 10, 11]. In particular, we support mappings that obey the following language level correspondence rules: a Class of O corresponds to a Complex Type of XS, a DataType Property of O corresponds to a Simple Element or Attribute of XS, and an Object Property of O corresponds to a Complex Element of XS.

Then, at the schema level, mappings between concrete domain conceptualizations

have to be defined (e.g. the employee class is mapped to the worker complex type)

following the correspondences established at the language level.

At the schema level, a mapping relationship between O and XS is a binary association representing a semantic association between their constructs. More than one mapping relationship may be defined for a single ontology construct. That is, a single source ontology construct can be mapped to more than one target XML Schema element (1:n mapping) and vice versa, while more complex mapping relationships can also be supported.

The mappings considered in our work are based on the Consistent Mappings Hypothesis, which states that for each mapped property Pr of O:

a. The domain classes of Pr have been mapped to complex types in XS that contain the elements or attributes that Pr has been mapped to.

b. If Pr is an object property, the range classes of Pr have been mapped to complex types in XS, which are used as types for the elements that Pr has been mapped to.

2.1 Encoding of the Schema Level Mappings

Since we want to translate SPARQL queries into semantically equivalent XQuery expressions that can be evaluated over XML data following a given (mapped) schema, we are interested in addressing XML data representations. Thus, based on the schema level mappings, for each mapped ontology class or property we store a set of XPath expressions (“XPath set” for the rest of this paper) that address all the corresponding instances (XML nodes) at the XML data level. In particular, based on the schema level mappings, we construct:

• A Class XPath Set $X_{C}$ for each mapped class C, containing all the possible XPaths of the complex types to which the class C has been mapped.

• A Property XPath Set $X_{Pr}$ for each mapped property Pr, containing all the possible XPaths of the elements and/or attributes to which Pr has been mapped.

For ontology properties, we are also interested in identifying the property domains and ranges. Thus, for each property we define the $X_{PrD}$ and $X_{PrR}$ sets, where:

• The Property Domains XPath Set $X_{PrD}$ for a property Pr represents the set of the XPaths of the property domain classes.

• The Property Ranges XPath Set $X_{PrR}$ for a property Pr represents the set of the XPaths of the property ranges.

Example 1. Encoding of Mappings

Fig. 1 shows the mappings between an OWL Ontology and an XML Schema.

Fig. 1. Mappings Between OWL & XML

To better explain the defined mappings, Fig. 1 also shows the structure of the XML documents that follow this schema. The encoding of these mappings in our framework is shown in Fig. 2.

Fig. 2. Mappings Encoding

XPath Set Operators. For XPath sets, the following operators are defined in order to formally explain the query translation methodology in the next sections:

• The unary Parent Operator $^{P}$, which, when applied to a set of XPaths X (i.e. $(X)^{P}$), returns the set of the distinct parent XPaths (i.e. the same XPaths without the leaf node). When applied to the root node, the operator returns the same node.

Example 2. Let X = { /a , /a/b , /c/d , /e/f/g , /b/@f }; then $(X)^{P}$ = { /a , /c , /e/f , /b }.

• The binary Right Child Operator ®, which, when applied to two XPath sets X and Y (i.e. X ® Y), returns the members (XPaths) of the right set Y whose parent XPaths are contained in the left set X.

Example 3. Let X = { /a , /c/b } and Y = { /a/d , /a/c , /c/b/p , /c/a/g }; then X ® Y = { /a/d , /a/c , /c/b/p }.

• The binary Append Operator /, which is applied to an XPath set X and a set of node names N (i.e. X / N), resulting in a new set of XPaths Y by appending each member of N to each member of X.

Example 4. Let X = { /a , /a/b } and N = { c , d }; then Y = X / N = { /a/c , /a/d , /a/b/c , /a/b/d }.
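The three operators can be sketched in a few lines of Python. This is only an illustration under simple assumptions, not the framework's implementation: XPaths are modelled as plain strings, XPath sets as Python sets, and the function names `parent`, `right_child`, and `append` are hypothetical.

```python
def parent(xpaths):
    """Parent operator: drop the leaf step of every XPath.

    A single-step (root) XPath is returned unchanged.
    """
    result = set()
    for xpath in xpaths:
        steps = xpath.strip("/").split("/")
        if len(steps) > 1:
            steps = steps[:-1]
        result.add("/" + "/".join(steps))
    return result


def right_child(left, right):
    """Right Child operator: members of `right` whose parent is in `left`."""
    left = set(left)
    return {xpath for xpath in right if parent({xpath}) <= left}


def append(xpaths, names):
    """Append operator: extend every XPath with every node name."""
    return {f"{xpath}/{name}" for xpath in xpaths for name in names}


# The inputs of Examples 2, 3 and 4.
print(sorted(parent({"/a", "/a/b", "/c/d", "/e/f/g", "/b/@f"})))
print(sorted(right_child({"/a", "/c/b"}, {"/a/d", "/a/c", "/c/b/p", "/c/a/g"})))
print(sorted(append({"/a", "/a/b"}, {"c", "d"})))
```

Because the results are sets, duplicate parents (such as /a, which is the parent of both /a and /a/b) collapse automatically.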

XPath Set Relations. We describe here a relation among XPath sets that holds because of the Consistent Mappings Hypothesis described above. We will use this relation later on in the query translation process, and in particular in the variable bindings algorithm (subsection 3.1):

Domain-Range Property Relation: $\forall$ property $Pr$ $\Rightarrow$ $X_{PrD} = (X_{Pr})^{P} = (X_{PrR})^{P}$ and $X_{Pr} = X_{PrR}$

The Domain-Range Property Relation can be easily understood taking into account the hierarchical structure of XML data as well as the Consistent Mappings Hypothesis. It states that for a single property Pr:

• the XPath set of its ranges is equal to its own XPath set (i.e. the instances of its ranges are the XML nodes of the elements that this property has been mapped to).

• the XPath set of its domain classes is equal to the set containing its parent XPaths (i.e. the XPaths of the Complex Types (CTs) that contain the elements that this property has been mapped to).
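As a small illustration of the Domain-Range Property Relation, the domain and range XPath sets of a property can be derived from its own XPath set. The sketch below assumes XPaths are plain strings; the function names and the example paths (a `name` property mapped to two elements) are hypothetical and do not come from Fig. 1.

```python
def parent(xpaths):
    """Parent operator: drop the leaf step; a root XPath maps to itself."""
    result = set()
    for xpath in xpaths:
        steps = xpath.strip("/").split("/")
        result.add("/" + "/".join(steps[:-1] if len(steps) > 1 else steps))
    return result


def domain_range_sets(property_xpaths):
    """Return (X_PrD, X_PrR) derived from the property XPath set X_Pr."""
    x_prr = set(property_xpaths)     # ranges: the property's own XPaths
    x_prd = parent(property_xpaths)  # domains: the parent XPaths
    return x_prd, x_prr


# Hypothetical mapping of a `name` property to two XML elements.
x_pr = {"/persons/person/name", "/staff/employee/name"}
x_prd, x_prr = domain_range_sets(x_pr)
print(sorted(x_prd))
print(sorted(x_prr))
```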

3 Overview of the Query Translation Process

In this section we briefly present the entire translation process using a UML activity diagram. Fig. 3 shows the entire process, which takes as input the given SPARQL query and the defined mappings between the ontology and the XML Schema (encoded as described in the previous sections). The query translation process comprises the activities outlined in the following paragraphs.

[Figure: UML activity diagram. Inputs: SPARQL Query, Mappings. Activities: SPARQL Graph Pattern Normalization; Union-Free Graph Pattern Processing (Determination of Variable Types, Processing Onto-Triples, UF-GP2XQuery with Variables Binding and BGP2XQuery); Union Operator Translation; Solution Sequence Modifiers Translation; Query Form Based Translation. Guards: [Type Conflicts], [Onto-Triples Exist], [More BGPs], [More U-F GPs], [More GPs], [SSMs Exist], [Else].]

Fig. 3. Overview of the SPARQL Translation Process

SPARQL Graph Pattern Normalization. The SPARQL Graph Pattern Normalization activity rewrites the Graph Pattern (GP) of the SPARQL query into an equivalent normal form based on equivalence rules. The normalization relies on the GP expression equivalences proved in [3] and on rewriting techniques. In particular, each GP can be transformed into a sequence P1 UNION P2 UNION P3 UNION … UNION Pn, where Pi (1≤i≤n) is a Union-Free GP (i.e. a GP that does not contain Union operators). This makes the GP translation process simpler and more efficient.
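The idea of the normal form can be sketched as follows. This is a deliberately restricted illustration, not the framework's normalization procedure: it handles only AND and UNION and relies only on the distributivity of AND over UNION, while OPT and FILTER are left out. Graph patterns are modelled as nested tuples, and the function name `normalize` is hypothetical.

```python
def normalize(gp):
    """Return the union-free patterns P1..Pn with gp = P1 UNION ... UNION Pn.

    A pattern is either a string (standing for a triple pattern) or a tuple
    (operator, left, right) with operator "AND" or "UNION".
    """
    if isinstance(gp, str):
        return [gp]
    operator, left, right = gp
    left_parts, right_parts = normalize(left), normalize(right)
    if operator == "UNION":
        return left_parts + right_parts
    if operator == "AND":
        # AND distributes over UNION: combine every pair of alternatives.
        return [("AND", l, r) for l in left_parts for r in right_parts]
    raise ValueError(f"unsupported operator: {operator}")


gp = ("AND", "t1", ("UNION", "t2", ("AND", "t3", ("UNION", "t4", "t5"))))
union_free = normalize(gp)
for pattern in union_free:
    print(pattern)
```

Each element of `union_free` contains no UNION operator, so it can be handed to the Union-Free GP processing independently of the others.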

Union-Free Graph Pattern (UF-GP) Processing. The UF-GP processing translates the constituent UF-GPs into semantically equivalent XQuery expressions. The UF-GP Processing activity is a composite one with various sub-activities, and it is the step in which most of the translation work takes place. It is decomposed into the following sub-activities:

– Determination of Variable Types. For every UF-GP, this activity initially identifies the types of the variables used, in order to detect any conflict arising from the syntax of the user's input, as well as to identify the form of the results for each variable. We define the following variable types: the Class Instance Variable Type (CIVT), the Literal Variable Type (LVT), the Unknown Variable Type (UVT), the Data Type Predicate Variable Type (DTPVT), the Object Predicate Variable Type (OPVT), and the Unknown Predicate Variable Type (UPVT).

We also define the following sets: the Data Type Properties Set (DTPS), which contains all the data type properties of the ontology; the Object Properties Set (OPS), which contains all the object properties of the ontology; the Variables Set (V), which contains all the variables that are used in the UF-GP; and the Literals Set (L), which contains all the literals referenced in the UF-GP.

The determination of the variable types is based on a set of rules applied iteratively for each triple in the given UF-GP. Below we present a subset of these rules, which are used to determine the type ($T_{X}$) of a variable X:

Let S P O be a triple pattern.

1. If $P \in OPS$ and $O \in V$ $\Rightarrow$ $T_{O} = CIVT$. If the predicate is an object property and the object is a variable, then the type of the object variable is CIVT.

2. If $O \in L$ and $P \in V$ $\Rightarrow$ $T_{P} = DTPVT$. If the object is a literal value and the predicate is a variable, then the type of the predicate variable is DTPVT.
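The two rules above can be sketched in Python as follows. The sketch covers only these two rules and makes several assumptions that are not part of the paper: terms are strings, variables start with `?`, literals start with a double quote, and a variable that would receive two different types is reported as a conflict. The names `determine_types` and `ns:worksFor` are hypothetical.

```python
def is_variable(term):
    return term.startswith("?")


def is_literal(term):
    return term.startswith('"')


def determine_types(triples, object_properties):
    """Apply rules 1 and 2 to every triple pattern of a UF-GP."""
    types = {}

    def assign(variable, var_type):
        # A variable that receives two different types signals a conflict.
        if types.setdefault(variable, var_type) != var_type:
            raise ValueError(f"type conflict for {variable}")

    for _subject, predicate, obj in triples:
        if predicate in object_properties and is_variable(obj):
            assign(obj, "CIVT")         # rule 1
        if is_literal(obj) and is_variable(predicate):
            assign(predicate, "DTPVT")  # rule 2
    return types


triples = [("?x", "ns:worksFor", "?d"), ("?x", "?p", '"Crete"')]
types = determine_types(triples, object_properties={"ns:worksFor"})
print(types)
```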

– Processing Onto-Triples. Onto-Triples refer to the ontology structure and/or semantics. The main objective of this activity is to process the Onto-Triples against the ontology (using SPARQL) and, based on this analysis, to bind the correct XPaths to the variables contained in the Onto-Triples (i.e. to assign the relevant XPaths to these variables). These bindings are used in the next steps as input to the Variable Bindings activity.

– UF-GP2XQuery. This activity translates the UF-GP into semantically equivalent XQuery expressions. The concept of a GP, and thus the concept of a UF-GP, is defined recursively. The BGP2XQuery algorithm translates the basic components of a GP (i.e. Basic Graph Patterns, BGPs, which are sequences of triple patterns and filters) into semantically equivalent XQuery expressions (see subsection 3.2). To do that, a variable bindings step (see subsection 3.1) is needed. Finally, the BGPs in the context of a GP have to be properly associated, that is, the SPARQL operators among them have to be applied using XQuery expressions and functions. These operators are OPT, AND, and FILTER, and they are implemented using standard XQuery expressions without any ad hoc processing.

Union Operator Translation. This activity translates the UNION operator that appears among UF-GPs in a GP, using the Let and Return XQuery clauses in order to return the union of the solution sequences produced by the UF-GPs to which the Union operator applies.

Solution Sequence Modifiers Translation. This activity translates the SPARQL solution sequence modifiers using XQuery clauses (Order By, For, Let, etc.) and XQuery built-in functions (see the example in subsection 3.3). The modifiers supported by SPARQL are Distinct, Order By, Reduced, Limit, and Offset.

Query Forms Based Translation. SPARQL has four forms of queries (Select, Ask, Construct and Describe). The structure of the final result differs according to the query form, so the query translation is heavily dependent on the query form. In particular, after the translation of any solution modifier is done, the generated XQuery is enhanced with appropriate expressions in order to achieve the desired structure of the results (e.g. to construct an RDF graph or a result set) according to the query form.

3.1 Variable Bindings

This section describes the variable bindings activity. In the translation process the

term “variable bindings” is used to describe the assignment of the correct XPaths to

the variables referenced in a given Basic Graph Pattern (BGP), thus enabling the

translation of BGP to XQuery expressions. In this activity, Onto-Triples are not taken

into account since their processing has taken place in the previous step.

Definition 1: A triple pattern has the form $(s,p,o) \in (I \cup B \cup V) \times (I \cup V) \times (I \cup B \cup L \cup V)$, where I is a set of IRIs, B is a set of Blank Nodes, V is a set of Variables, and L is the set of RDF Literals. In our approach, however, the individuals in the source ontology are not considered at all (either they do not exist, or they are not used in semantic queries).

Definition 2: A variable contained in a Union-Free Graph Pattern is called a Shared Variable when it is referenced in more than one triple pattern of the same Union-Free Graph Pattern, regardless of its position in those triple patterns.

Variable Bindings Algorithm. When describing data with RDF triples (s,p,o), subjects represent class individuals (RDF nodes), predicates represent properties (RDF arcs), and objects represent class individuals or data type values (RDF nodes). Based on that, and on the Domain-Range Property Relation of the XPath Set Relations paragraph, we have: a) $X_{S} = X_{PrD} = (X_{Pr})^{P} = (X_{PrR})^{P}$, b) $X_{P} = X_{Pr}$, and c) $X_{O} = X_{PrR}$.

Thus it holds that: $X_{S} = X_{PrD} = (X_{Pr})^{P} = (X_{PrR})^{P} \Rightarrow X_{S} = (X_{P})^{P} = (X_{O})^{P}$ (Subject-Predicate-Object Relation)

This relation holds for every single triple pattern. Thus, the variable bindings algorithm uses this relation in order to find the correct bindings for the entire set of triple patterns, starting from the bindings of any single triple pattern part (subject, predicate, or object).

In case of shared variables, the algorithm tries to find the maximum set of bindings (using the operators for XPath sets) that satisfy this relation for the entire set of triple patterns (e.g. the entire BGP). Once this relation holds for the entire BGP, all the instances (in XML) that satisfy the BGP have been addressed.

For shared variables of type LVT, the variable bindings algorithm does not determine the XPaths, since literal equality is independent of the XPath expressions. Thus, the bindings for variables of this type cannot be defined at this step (they are marked as “Not Definable” in the variable bindings rules). Instead, they are handled by the BGP2XQuery algorithm (subsection 3.2), using the mappings and the determined variable bindings.

The algorithm takes as input a BGP, a set of initial bindings, and the types of the variables as determined in the “Determination of Variable Types” activity. The initial bindings are the ones produced by the Onto-Triples processing activity and initialize the bindings of the algorithm. Then, the algorithm performs an iterative process where it determines, at each step, the bindings of the entire BGP (triple by triple). The determination of the bindings is based on the rules described below. This iterative process continues until the bindings for all the variables found in two successive iterations are equal. This means that no further modifications of the variable bindings are to be made and that the current bindings are the final ones.
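The iterative part of the algorithm can be illustrated with a simplified fixpoint loop. The sketch below is not the paper's rule set: it binds only subject variables, it assumes every predicate is a mapped property, and it uses a single rule taken from the Subject-Predicate-Object Relation (the subject's XPaths are the parents of the predicate's XPaths). All names and example paths are hypothetical.

```python
def parent(xpaths):
    """Parent operator: drop the leaf step; a root XPath maps to itself."""
    result = set()
    for xpath in xpaths:
        steps = xpath.strip("/").split("/")
        result.add("/" + "/".join(steps[:-1] if len(steps) > 1 else steps))
    return result


def step(bindings, triples, property_xpaths):
    """One pass over the BGP: narrow each subject to the parents of its predicates."""
    updated = dict(bindings)
    for subject, predicate, _obj in triples:
        candidates = parent(property_xpaths[predicate])
        if subject in updated:
            candidates &= updated[subject]
        updated[subject] = candidates
    return updated


def bind(triples, property_xpaths, initial_bindings):
    """Repeat `step` until two successive iterations give equal bindings."""
    bindings = dict(initial_bindings)
    while True:
        updated = step(bindings, triples, property_xpaths)
        if updated == bindings:
            return updated
        bindings = updated


# Hypothetical mappings: `name` occurs under two complex types, `salary` under one.
property_xpaths = {
    "ns:name": {"/persons/person/name", "/staff/employee/name"},
    "ns:salary": {"/staff/employee/salary"},
}
triples = [("?x", "ns:name", '"John"'), ("?x", "ns:salary", '"100"')]
bindings = bind(triples, property_xpaths, initial_bindings={})
print(bindings)
```

The loop terminates because each pass can only keep or shrink a set of XPaths, so the bindings eventually stop changing.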

Variable Bindings Rules. Based on the possible combinations of S, P and O, there are four different types of triple patterns (ontology instances are not yet supported by our framework):

Type 1: S ∈ V, P ∈ I, O ∈ L. Type 2: S, O ∈ V, P ∈ I. Type 3: S, P ∈ V, O ∈ L. Type 4: S, P, O ∈ V.
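A minimal sketch of this classification is given below. It assumes a simple string encoding that is not part of the paper: variables start with `?`, literals start with a double quote, and any other predicate is taken to be an IRI. The function name `triple_type` is hypothetical.

```python
def is_variable(term):
    return term.startswith("?")


def is_literal(term):
    return term.startswith('"')


def triple_type(subject, predicate, obj):
    """Return the type (1 to 4) of a triple pattern with a variable subject."""
    if not is_variable(subject):
        raise ValueError("only variable subjects are covered by the four types")
    if is_variable(obj):
        return 4 if is_variable(predicate) else 2
    if is_literal(obj):
        return 3 if is_variable(predicate) else 1
    raise ValueError("IRI objects (ontology instances) are not covered")


print(triple_type("?s", "ns:name", '"John"'))
print(triple_type("?s", "ns:worksFor", "?o"))
print(triple_type("?s", "?p", '"John"'))
print(triple_type("?s", "?p", "?o"))
```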

According to the triple pattern type, we have defined a set of rules for the variable bindings. In this section we present a subset of these rules due to space limitations.

In what follows, the symbol ′ in XPath sets denotes the new bindings assigned to the set at each iteration, while the symbol ← denotes the assignment of a new value to the set. All the XPath sets are considered to be initially set to null, and the intersection operation is not affected by a null set. E.g. if X = { null } and Y = { /a/b , /d/e }, then X ∩ Y = { /a/b , /d/e }. The notation “Not Definable” is used for variables of type LVT, as explained above. Consider the triple S P O:

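• If the triple is of Type 1 ⇒ $X_{S}' \leftarrow X_{S} \cap X_{PrD}$

• If the triple is of Type 2 ⇒ $X_{S}' \leftarrow X_{S} \cap X_{PrD} \cap (X_{O})^{P}$

− If $P \in OPS$ ⇒ $X_{O}' \leftarrow X_{S}'$ ® $X_{PrR}$

− If $P \in DTPS$ ⇒ $X_{O}'$ Not Definable (as explained previously)

• If the triple is of Type 3 ⇒ $X_{S}' \leftarrow X_{S} \cap X_{PrD}$ and $X_{P}' \leftarrow X_{S}'$ ® $X_{P}$

• If the triple is of Type 4 ⇒ $X_{S}' \leftarrow X_{S} \cap X_{PrD} \cap (X_{O})^{P}$ and $X_{P}' \leftarrow X_{S}'$ ® $X_{P}$

− If $T_{O} = CIVT$ or $T_{O} = UVT$ ⇒ $X_{O}' \leftarrow X_{P}' \cap X_{O}$

− If $T_{O} = LVT$ ⇒ $X_{O}'$ Not Definable (as explained previously)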
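The rules rely on an intersection that is not affected by null sets. One way to sketch this behaviour is shown below, under the assumption (ours, not the paper's) that a null XPath set is represented by `None`, which keeps it distinct from an empty set of bindings. The function name `intersect` is hypothetical.

```python
def intersect(*xpath_sets):
    """Intersect XPath sets, skipping those that are still null (None)."""
    defined = [set(s) for s in xpath_sets if s is not None]
    if not defined:
        return None  # nothing has been bound yet
    return set.intersection(*defined)


x = None                # X = { null }
y = {"/a/b", "/d/e"}
print(sorted(intersect(x, y)))
```

Keeping `None` separate from `set()` matters here: a null set means "no constraint yet", while an empty set means that no XPath satisfies the constraints collected so far.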

XPath Set Relations for Triple-Patterns. Among the XPath sets of triple patterns there are important relations that can be exploited in the development of the XQuery expressions in order to correctly associate data that have been bound to different variables of triple patterns. The most important relation among XPath sets of triple patterns is that of extension:

Extension Relation: An XPath set A is said to be an extension of an XPath set B if all XPaths in A are descendants of the XPaths of B.

As an example of this relation, consider the XPath set A′ produced when applying the append (/) operator to an original XPath set A with a set of node names: A′ is an extension of A.

The extension relation holds for the results of the variable bindings algorithm (Subject-Predicate-Object Relation) and implies that the XPaths bound to subjects are parents of the XPaths bound to predicates and objects of triple patterns.
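A possible check for the Extension Relation is sketched below. It reads the definition as "every XPath in A is a proper descendant of at least one XPath in B", which is our interpretation of the wording, and it compares XPaths step by step so that /ab/c is not mistaken for a descendant of /a. The function names are hypothetical.

```python
def steps(xpath):
    return xpath.strip("/").split("/")


def is_descendant(xpath, ancestor):
    """True if `ancestor` is a proper prefix of `xpath`, compared step by step."""
    path, prefix = steps(xpath), steps(ancestor)
    return len(path) > len(prefix) and path[:len(prefix)] == prefix


def is_extension(a, b):
    """True if every XPath in `a` descends from some XPath in `b`."""
    return all(any(is_descendant(x, y) for y in b) for x in a)


original = {"/a", "/a/b"}
appended = {"/a/c", "/a/d", "/a/b/c", "/a/b/d"}  # original / {c, d}
print(is_extension(appended, original))
print(is_extension({"/a/c", "/e/f"}, {"/a"}))
```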

3.2 Translating BGPs to XQuery

In this section we describe the translation of BGPs to semantically equivalent XQuery

expressions. The algorithm manipulates a sequence of triple patterns and filters (i.e. a

BGP) and translates them into semantically equivalent XQuery expressions, thus

allowing the evaluation of a BGP on a set of XML data.

Definition 3: Return Variables (RV) are those variables for which the given SPARQL query would return some information. The set of all Return Variables of a SPARQL query constitutes the set $RV \subseteq V$.

The BGP2XQuery Algorithm. We briefly present here the BGP2XQuery algorithm for translating BGPs into semantically equivalent XQuery expressions. The algorithm takes as input the mappings between the ontology and the XML schema, the BGP, the determined variable types, and the variable bindings. The algorithm is not executed triple by triple for a complete BGP. Instead, it processes the subjects, predicates, and objects of all the triples separately. For each variable included in the BGP, BGP2XQuery creates a For or Let XQuery clause using the variable bindings, the input mappings, and the Extension Relation for triple patterns (see subsection 3.1), in order to bind XML data to XQuery variables. The choice between the For and the Let XQuery clauses is based on specific rules so as to create a solution sequence that follows the SPARQL semantics. Moreover, in order to associate bindings from different variables into concrete solutions, the algorithm uses the Extension Relation. Literals included in the BGP are translated using XPath predicates. Due to the complexity that a SPARQL filter may have, the algorithm translates all the filters into XQuery where clauses, although some “simple” ones (e.g. conditions on literals) could be translated using XPath predicates. Moreover, SPARQL operators (built-in functions) included in filter expressions are translated using built-in XQuery functions and operators. However, for some “special” SPARQL operators (like sameTerm, lang, etc.) we have developed native XQuery functions that simulate them.

Finally, the algorithm creates an XQuery Return clause that includes the Return Variables (RV) that were used in the BGP.

There are some cases of shared variables that need special treatment by the algorithm in order to apply the required joins in XQuery expressions. The way the algorithm handles these cases depends on which parts (subject, predicate, object) of the triple patterns these shared variables refer to.

3.3 Example

We demonstrate in this example the use of the described framework in order to allow

a SPARQL query to be evaluated in XML Data (based on Example 1). Fig. 4 shows

how a given SPARQL query is translated by our framework into a semantically

equivalent XQuery.

Fig. 4. SPARQL Query Translation Example

4 Conclusions

We have presented a framework and its software implementation that allow the evaluation of SPARQL queries over XML data which are stored in XML databases and accessed with the XQuery language. The framework assumes that a set of mappings between the OWL ontology and the XML Schema exists, and that these mappings obey certain well-accepted language level correspondences.

The SPARQL2XQuery framework has been implemented as a software service which can be configured with appropriate mappings (between some ontology and XML Schema) and translates input SPARQL queries into semantically equivalent XQuery queries that are answered over the XML database.

5 References

1. Beckett D. (ed.): “SPARQL Query Results XML Format”. W3C Recommendation, 15 January 2008, http://www.w3.org/TR/rdf-sparql-XMLres/

2. Bohring H., Auer S.: “Mapping XML to OWL Ontologies”. Leipziger Informatik-Tage 2005: 147-156

3. Perez J., Arenas M., Gutierrez C.: “Semantics and Complexity of SPARQL”. 5th International Semantic Web Conference (ISWC-06), November 2006.

4. Rodrigues T., Rosa P., Cardoso J.: “Mapping XML to Existing OWL ontologies”. International Conference WWW/Internet 2006, Murcia, Spain, 5-8 October 2006.

5. Farrell J., Lausen H.: “Semantic Annotations for WSDL and XML Schema”. W3C Recommendation, August 2007, http://www.w3.org/TR/sawsdl/

6. Groppe S., Groppe J., Linnemann V., Kukulenz D., Hoeller N., Reinke C.: “Embedding SPARQL into XQuery/XSLT”. SAC 2008: 2271-2278

7. Akhtar W., Kopecký J. et al.: “XSPARQL: Traveling between the XML and RDF Worlds - and Avoiding the XSLT Pilgrimage”. ESWC 2008: 432-447

8. Droop M., Flarer M. et al.: “Embedding XPATH Queries into SPARQL Queries”. In Proc. of the 10th International Conference on Enterprise Information Systems

9. Tsinaraki C., Christodoulakis S.: “Interoperability of XML Schema Applications with OWL Domain Knowledge and Semantic Web Tools”. In Proc. of the ODBASE 2007.

10. Cruz I.R., Huiyong Xiao, Feihong Hsu: “An Ontology-based Framework for XML Semantic Integration”. Database Engineering and Applications Symposium, 2004.

11. Christophides V., Karvounarakis G. et al.: “The ICS-FORTH SWIM: A Powerful Semantic Web Integration Middleware”. In Proc. of the SWDB 2003, pages 381-393.

12. Amann B., Beeri C., Fundulaki I., Scholl M.: “Querying XML Sources Using an Ontology-Based Mediator”. CoopIS/DOA/ODBASE 2002: 429-448

13. Bikakis N., Gioldasis N., Tsinaraki C., Christodoulakis S.: “Semantic Based Access over XML Data”. In Proc. of 2nd World Summit on Knowledge Society 2009 (WSKS2009).

14. Bikakis N., Gioldasis N., Tsinaraki C., Christodoulakis S.: “The SPARQL2XQuery Framework”. Technical Report, http://www.music.tuc.gr/reports/SPARQL2XQUERY.PDF

RDF 1.1: Knowledge Representation and Data Integration Language for the Web

Article

Full-text available

Jan 2020

Resource Description Framework (RDF) can seen as a solution in today’s landscape of knowledge representation research. An RDF language has symmetrical features because subjects and objects in triples can be interchangeably used. Moreover, the regularity and symmetry of the RDF language allow knowledge representation that is easily processed by machines, and because its structure is similar to natural languages, it is reasonably readable for people. RDF provides some useful features for generalized knowledge representation. Its distributed nature, due to its identifier grounding in IRIs, naturally scales to the size of the Web. However, its use is often hidden from view and is, therefore, one of the less well-known of the knowledge representation frameworks. Therefore, we summarise RDF v1.0 and v1.1 to broaden its audience within the knowledge representation community. This article reviews current approaches, tools, and applications for mapping from relational databases to RDF and from XML to RDF. We discuss RDF serializations, including formats with support for multiple graphs and we analyze RDF compression proposals. Finally, we present a summarized formal definition of RDF 1.1 that provides additional insights into the modeling of reification, blank nodes, and entailments.

Strategies for a Semantified Uniform Access to Large and Heterogeneous Data Sources

Thesis

Jan 2021

Mohamed Nadjib Mami

The remarkable advances achieved in both research and development of Data Management as well as the prevalence of high-speed Internet and technology in the last few decades have caused an unprecedented data avalanche. Large volumes of data manifested in a multitude of types and formats are being generated and becoming the new norm. In this context, it is crucial to both leverage existing approaches and propose novel ones to overcome this data size and complexity, and thus facilitate data exploitation. In this thesis, we investigate two major approaches to addressing this challenge: Physical Data Integration and Logical Data Integration. The specific problem tackled is to enable querying large and heterogeneous data sources in an ad hoc manner. In the Physical Data Integration, data is physically and wholly transformed into a canonical unique format, which can then be directly and uniformly queried. In the Logical Data Integration, data remains in its original format and form, and a middleware layer is placed above the data, allowing various schema elements to be mapped to a high-level unifying formal model. The latter enables the querying of the underlying original data in an ad hoc and uniform way, a framework which we call Semantic Data Lake, SDL. Both approaches have their advantages and disadvantages. For example, in the former, a significant effort and cost are devoted to pre-processing and transforming the data to the unified canonical format. In the latter, the cost is shifted to the query processing phases, e.g., query analysis, relevant source detection and results reconciliation. In this thesis we investigate both directions and study their strengths and weaknesses. For each direction, we propose a set of approaches and demonstrate their feasibility via a proposed implementation. In both directions, we appeal to Semantic Web technologies, which provide a set of time-proven techniques and standards that are dedicated to Data Integration.
In the Physical Integration, we suggest an end-to-end blueprint for the semantification of large and heterogeneous data sources, i.e., physically transforming the data to the Semantic Web data standard RDF (Resource Description Framework). A unified data representation, storage and query interface over the data are suggested. In the Logical Integration, we provide a description of the SDL architecture, which allows querying data sources directly in their original form and format without requiring a prior transformation and centralization. For a number of reasons that we detail, we put more emphasis on the virtual approach. We present the effort behind an extensible implementation of the SDL, called Squerall, which leverages state-of-the-art Semantic and Big Data technologies, e.g., RML (RDF Mapping Language) mappings, FnO (Function Ontology) ontology, and Apache Spark. A series of evaluations is conducted to assess the implementation against various metrics and input data scales. In particular, we describe an industrial real-world use case using our SDL implementation. In a preparation phase, we conduct a survey of Query Translation methods in order to back some of our design choices.

MEI2JSON: a pre-processing music scores converter

Article

Jan 2022

MEI2JSON: a pre-processing music scores converter

Article

Jan 2021

Converting music score content from symbolic formats to simplified data formats is useful for artificial intelligence purposes. The conversion can be applied using XSL stylesheets and ontologies to preserve data quality throughout the transformation. In this paper, we propose a new converter capable of transforming music scores encoded in MEI to JSON format for pre-processing purposes and future use in artificial intelligence techniques. The proposed converter uses an eastern music score ontology capable of structuring standard music score content in addition to elements and attributes specific to eastern music. Thus, the converter shares the same support for eastern music scores. We illustrate the conversion process by assessing the performance, the data quality, and the storage of the proposed converter in comparison with a combined approach composed of two state-of-the-art converters.

On Supporting Interoperability between RDF and Property Graph Databases

Thesis

Jun 2021

Harsh Thakkar

Over the last few years, the amount and availability of machine-readable Open, Linked, and Big data on the web has increased. Simultaneously, several data management systems have emerged to deal with the increased amounts of this structured data. RDF and Graph databases are two popular approaches for data management based on modeling, storing, and querying graph-like data. RDF database systems are based on the W3C standard RDF data model and use the W3C standard SPARQL as their de facto query language. Most graph database systems are based on the Property Graph (PG) data model and use the Gremlin language as their query language due to its popularity amongst vendors. Given that both of these approaches have distinct and complementary characteristics (RDF is suited for distributed data integration with built-in world-wide unique identifiers and vocabularies; PGs, on the other hand, support horizontally scalable storage and querying, and are widely used for modern data analytics applications), it becomes necessary to support interoperability amongst them. The main objective of this dissertation is to study and address this interoperability issue. We identified three research challenges that are concerned with the data interoperability, query interoperability, and benchmarking of these databases. First, we tackle the data interoperability problem. We propose three direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a property graph database. We show that the proposed mappings satisfy the desired properties of semantics preservation and information preservation. Based on our analysis (both formal and empirical), we argue that any RDF database can be transformed into a PG database using our approach. Second, to tackle the query interoperability problem, we propose GREMLINATOR, a novel approach for querying PG databases with SPARQL via Gremlin traversals.
In doing so, we first formalize the declarative constructs of the Gremlin language using a consolidated graph relational algebra and define mappings to translate SPARQL queries into Gremlin traversals. GREMLINATOR has been officially integrated as a plugin for the Apache TinkerPop graph computing framework (as sparql-gremlin), which enables users to execute SPARQL queries over a wide variety of OLTP graph databases and OLAP graph processing frameworks. Finally, we tackle the third problem, benchmarking (performance evaluation). We propose a novel framework, the LITMUS Benchmark Suite, that allows a choke-point driven performance comparison and analysis of various databases (PG and RDF-based) using various third-party real and synthetic datasets and queries. We also studied a variety of intrinsic and extrinsic factors (data and system-specific metrics and Key Performance Indicators, KPIs) that influence a given system’s performance. LITMUS incorporates various memory, processor, data quality, indexing, query typology, and data-based metrics for providing a fine-grained evaluation of the benchmark. In conclusion, by filling the research gaps addressed by this dissertation, we have laid a solid formal and practical foundation for supporting interoperability between the RDF and Property graph database technology stacks. The artifacts produced during the term of this dissertation have been integrated into various academic and industrial projects.

ONTOLOGY-BASED DATA ACCESS TO HETEROGENEOUS DATA SOURCES: STATE OF THE ART APPROACHES AND APPLICATIONS

Article

Apr 2022

FLUX: from SQL to GQL query translation tool

Conference Paper

Dec 2020

Chandan Sharma

Approaches for Efficient Query Optimization Using Semantic Web Technologies

Chapter

Mar 2020

A query optimization system proposes an answer-driven approach to information access. Most query optimization systems aim at the information retrieval required by natural language queries. Queries are generally asked within a context, and answers are provided within that specific context. RDF is a general proposition language for the Web, joining data from diverse resources. SPARQL, a query language for RDF, can join data from different databanks, as well as papers, inference engines, or anything else that may reveal its expertise as a guided classified chart. Because of a lack of proper architectural circulation, the existing SPARQL-to-SQL translation techniques carry a lot of restrictions that decrease their toughness, effectiveness, and reliability. These constraints include the generation of ineffective or perhaps incorrect SQL inquiries, lack of official history, and bad applications. This paper proposes a structure, used by an ontology-based moderator system, that provides a well-defined semantic design in which (i) a distinct SPARQL semantics is supplied and used to rewrite the question in SQL; (ii) ontology-based expertise is created for rapid access and for translating and rewriting SPARQL queries to SQL, for reliable information retrieval over large datasets of Semantic Web data; (iii) a hybrid query optimization framework is proposed as a query handling technique for effective access to customized details on the Semantic Web, making use of bundled ontology expertise and an inference engine.

An approach for semantic integration of heterogeneous data sources

Article

Mar 2020

Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to most users. Moreover, the stored data is often redundant, mixed with information only needed to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both data sources to be integrated and the global view.

Mapping XML to existing OWL ontologies

Article

Jul 2008

Nowadays, XML has gained wide recognition and brought interoperability at a syntactic level. Unfortunately, even when using XML to represent data, problems arise when it is necessary to integrate different data sources because XML lacks support for efficient sharing of conceptualization. Emerging Semantic Web technologies, such as ontologies, can enable semantic interoperability. With ontologies, it is possible to formally represent shared domain knowledge models defined with concepts, attributes, relationships and instances. In this paper, we present a notation to map XML Schema to existing OWL ontologies and the qualities an algorithm should have to transform XML documents (instances of the mapped schema) into instances of the mapped ontology.

The ICS-FORTH SWIM: A Powerful Semantic Web Integration Middleware

Conference Paper

Sep 2003

Semantic Web (SW) technology aims to facilitate the integration of legacy data sources spread worldwide. Despite the plethora of SW languages (e.g., RDF/S, DAML+OIL, OWL) recently proposed for supporting large scale information interoperation, the vast majority of legacy sources still rely on relational databases (RDB) published on the Web or corporate intranets as virtual XML. In this paper, we advocate a Datalog framework for mediating high-level queries to relational and/or XML sources using community ontologies expressed in a SW language such as RDF/S. We describe the architecture and the reasoning services of our SW integration middleware, called SWIM, and we present the main design choices and techniques for supporting powerful mappings between different data models, as well as reformulation and optimization of queries expressed against mediation schemas and views.

Mapping XML to OWL ontologies

Conference Paper

Jan 2005

By now, XML has reached wide acceptance as a data exchange format in E-Business. Efficient collaboration between different participants in E-Business is thus only possible when business partners agree on a common syntax and have a common understanding of the basic concepts in the domain. XML covers the syntactic level, but lacks support for efficient sharing of conceptualizations. The Web Ontology Language (OWL (Bec04)) in turn supports the representation of domain knowledge using classes, properties and instances for use in a distributed environment such as the World Wide Web. We present in this paper a mapping between the data model elements of XML and OWL. We give an account of its implementation within a ready-to-use XSLT framework, as well as its evaluation for common use cases.

Semantic Based Access over XML Data

Conference Paper

Sep 2009

The need for semantic processing of information and services has led to the introduction of tools for the description and management of knowledge within organizations, such as RDF, OWL, and SPARQL. However, semantic applications may have to access data from diverse sources across the network. Thus, SPARQL queries may have to be submitted and evaluated against existing XML or relational databases, and the results transferred back to be assembled for further processing. In this paper we describe the SPARQL2XQuery framework, which translates SPARQL queries to semantically equivalent XQuery queries for accessing XML databases from the Semantic Web environment.

Interoperability of XML Schema Applications with OWL Domain Knowledge and Semantic Web Tools

Conference Paper

Nov 2007

Several standards are expressed using XML Schema syntax, since XML is the default standard for data exchange on the Internet. However, several applications need the semantic support offered by domain ontologies and Semantic Web tools like logic-based reasoners. Thus, there is a strong need for interoperability between XML Schema and OWL. This can be achieved if the XML Schema constructs are expressed in OWL, where the enrichment with OWL domain ontologies and further semantic processing are possible. After semantic processing, the derived OWL constructs should be converted back to instances of the original schema. We present in this paper XS2OWL, a model and a system that allow the transformation of XML Schemas to OWL-DL constructs. These constructs can be used to drive the automatic creation of OWL domain ontologies and individuals. The XS2OWL transformation model allows the correct conversion of the derived knowledge from OWL-DL back to XML constructs valid according to the original XML Schemas, in order to be used transparently by the applications that follow the XML Schema syntax of the standards.

Querying XML Sources Using an Ontology-Based Mediator

Conference Paper

Oct 2002
Lect Notes Comput Sci

In this paper we propose a mediator architecture for the querying and integration of Web-accessible XML data sources. Our contributions are (i) the definition of a simple but expressive mapping language, following the local as view approach and describing XML resources as local views of some global schema, and (ii) efficient algorithms for rewriting user queries according to existing source descriptions. The approach has been validated by the STYX prototype.

Semantic annotations for WSDL and XML schema

Article

Jan 2007

Embedding SPARQL into XQuery/XSLT

Conference Paper

Mar 2008

The tree-based languages XQuery and XSLT for XML are widely supported. Many tools do not yet support the new RDF graph query language SPARQL. We propose to embed SPARQL subqueries into XQuery/XSLT, such that XQuery and XSLT benefit from the graph query language constructs of SPARQL, and SPARQL benefits from features of XQuery/XSLT, which SPARQL does not support. The embedding enables XQuery/XSLT tools to handle at the same time XML queries and SPARQL subqueries, and XML and RDF data.

Embedding XPath queries into SPARQL queries

Conference Paper

Jan 2008

While XPath is an established query language developed by the W3C for XML, SPARQL is a new query language developed by the W3C for RDF data. Comparisons between the data models of XML and RDF and between the query languages XPath and SPARQL are missing. Since XML and XPath are earlier recommendations of the W3C than RDF and SPARQL, currently more XML data and XPath queries are used in applications. However, recently available SPARQL query evaluators do not deal with XML data and XPath queries. We have developed a prototype for translating XML data into RDF data and embedding XPath queries into SPARQL queries for the following two reasons: 1) We want to compare the XPath and XQuery data model with the RDF data model and the XPath query language with the SPARQL query language in order to show similarities and differences. 2) We want to enable SPARQL query evaluators to deal with XML data and XPath queries in order to support XPath processing and SPARQL processing in parallel. We have developed a prototype for the source-to-source translations from XML data into RDF data and from XPath queries into SPARQL queries. We have run experiments to measure the execution times of the translations, of XPath queries and of their translated SPARQL queries.

SPARQL Query Results XML Format

Article

Jan 2007
