Querying XML Data with SPARQL

Nikos Bikakis, Nektarios Gioldasis, Chrisa Tsinaraki,
Stavros Christodoulakis
Technical University of Crete, Department of Electronic and Computer Engineering
Laboratory of Distributed Multimedia Information Systems & Applications (TUC/ MUSIC)
University Campus, 73100, Kounoupidiana Chania, Greece
{nbikakis, nektarios, chrisa, stavros}@ced.tuc.gr
Abstract. SPARQL is today the standard access language for Semantic Web data. In
recent years XML databases have also acquired industrial importance due to the
widespread applicability of XML on the Web. In this paper we present a framework
that bridges the heterogeneity gap and creates an interoperable environment where
SPARQL queries are used to access XML databases. Our approach assumes that fairly
generic mappings between ontology constructs and XML Schema constructs have been
automatically derived or manually specified. The mappings are used to automatically
translate SPARQL queries to semantically equivalent XQuery queries, which are used
to access the XML databases. We present the algorithms and the implementation of
the SPARQL2XQuery framework, which is used for answering SPARQL queries over XML
databases.
Keywords: Semantic Web, XML Data, Information Integration, Interoperability,
Query Translation, SPARQL, XQuery, SPARQL to XQuery translation/transformation,
SPARQL2XQuery.
1 Introduction
The Semantic Web has to coexist and interoperate with other software environments,
and in particular with legacy databases. The Extensible Markup Language (XML), its
derivatives (XPath, XSLT, etc.), and XML Schema have been extensively used to
describe the syntax and structure of complex documents. In addition, XML Schema
has been extensively used to describe the standards in many business, service, and
multimedia application environments. As a result, a large volume of data is stored
and managed today directly in the XML format, both to avoid inefficient access and
conversion of data and to avoid exposing application users to more than one data
model. Database management systems today offer an environment that supports the
XML data model and the XQuery access language for managing XML data. In the Web
application environment, XML Schema also acts as a wrapper for relational content
that may coexist in the databases.
Our working scenario assumes that users and applications of the Semantic Web
environment ask for content from underlying XML databases using SPARQL. The
SPARQL queries are translated into semantically equivalent XQuery queries, which
are (exclusively) used to access and manipulate the data from the XML databases in
order to return the requested results to the user or the application. The results are
returned in RDF (N3 or RDF/XML) or XML [1] format. To answer the SPARQL
queries on top of the XML databases, a mapping at the schema level is required. We
support a set of language level correspondences (rules) for mappings between
RDFS/OWL and XML Schema. Based on these mappings, our framework is able to
translate SPARQL queries into semantically equivalent XQuery expressions, as well
as to convert XML data into the RDF format. Our approach provides an important
component of any Semantic Web middleware, which enables transparent access to
existing XML databases. An extended version of this paper is available at [14].
The framework has been smoothly integrated with the XS2OWL framework [9],
thus achieving not only the automatic generation of mappings between XML Schemas
and OWL ontologies, but also the transformation of XML documents into RDF format.
Various attempts have been made in the literature to address the issue of accessing
XML data from within Semantic Web environments [2, 4, 5, 6, 7, 8, 9, 10, 11, 12].
An extended overview of related work can be found in [13].
The rest of the paper is organized as follows: the mappings used for the translation,
as well as their encoding, are described in Section 2. Section 3 provides an overview
of the query translation process. The paper concludes in Section 4.
2 Mapping OWL to XML Schema
The framework described here allows XML encoded data to be accessed from Semantic
Web applications that are aware of some ontology encoded in OWL. To do that,
appropriate mappings between the OWL ontology (O) and the XML Schema (XS)
should exist. These mappings may be produced either automatically, based on our
previous work in the XS2OWL framework [9], or manually through some mapping
process carried out by a domain expert. However, the definition of mappings between
OWL ontologies and XML Schemas is not the subject of this paper. Thus, we do not
focus on the semantic correctness of the defined mappings, and we consider neither
what the mapping process is nor how the mappings have been produced.
Such a mapping process has to be guided by language level correspondences.
That is, the valid correspondences between the OWL and XML Schema language
constructs have to be defined in advance. The language level correspondences that
have been adopted in this paper are well accepted in a wide range of data integration
approaches [2, 4, 9, 10, 11]. In particular, we support mappings that obey the
following language level correspondence rules: a Class of O corresponds to a Complex
Type of XS, a DataType Property of O corresponds to a Simple Element or Attribute of
XS, and an Object Property of O corresponds to a Complex Element of XS.
Then, at the schema level, mappings between concrete domain conceptualizations
have to be defined (e.g. the employee class is mapped to the worker complex type)
following the correspondences established at the language level.
At the schema level, a mapping relationship between O and an XS is a binary
association representing a semantic association between them. More than one
mapping relationship may be defined for a single ontology construct. That is, a
single source ontology construct can be mapped to more than one target XML Schema
element (1:n mapping) and vice versa, while more complex mapping relationships
can also be supported.
The mappings considered in our work are based on the Consistent Mappings
Hypothesis, which states that for each mapped property Pr of O:
a. The domain classes of Pr have been mapped to complex types in XS that
   contain the elements or attributes that Pr has been mapped to.
b. If Pr is an object property, the range classes of Pr have been mapped to
   complex types in XS, which are used as types for the elements that Pr has
   been mapped to.
2.1 Encoding of the Schema Level Mappings
Since we want to translate SPARQL queries into semantically equivalent XQuery
expressions that can be evaluated over XML data following a given (mapped) schema,
we are interested in addressing XML data representations. Thus, based on the schema
level mappings, for each mapped ontology class or property we store a set of XPath
expressions (“XPath set” for the rest of this paper) that address all the corresponding
instances (XML nodes) at the XML data level. In particular, based on the schema
level mappings, we construct:
A Class XPath Set $X_{C}$ for each mapped class C, containing all the possible
XPaths of the complex types to which the class C has been mapped.
A Property XPath Set $X_{Pr}$ for each mapped property Pr, containing all the
possible XPaths of the elements and/or attributes to which Pr has been mapped.
For ontology properties, we are also interested in identifying the property domains
and ranges. Thus, for each property we define the $X_{PrD}$ and $X_{PrR}$ sets, where:
The Property Domains XPath Set $X_{PrD}$ for a property Pr represents the set of the
XPaths of the property domain classes.
The Property Ranges XPath Set $X_{PrR}$ for a property Pr represents the set of the
XPaths of the property ranges.
Example 1. Encoding of Mappings
Fig. 1 shows the mappings between an OWL Ontology and an XML Schema.
Fig. 1. Mappings Between OWL & XML
To better explain the defined mappings, we show in Fig. 1 the structure of the
XML documents that follow this schema. The encoding of these mappings in our
framework is shown in Fig.2.
Fig. 2. Mappings Encoding
XPath Set Operators. For XPath sets, the following operators are defined in order to
formally explain the query translation methodology in the next sections:
The unary Parent Operator $^{P}$, which, when applied to a set of XPaths X (i.e.
$(X)^{P}$), returns the set of the distinct parent XPaths (i.e. the same XPaths
without the leaf node). When applied to the root node, the operator returns the
same node.
Example 2. Let X = { /a , /a/b , /c/d , /e/f/g , /b/@f }; then
$(X)^{P}$ = { /a , /c , /e/f , /b } (both /a and /a/b yield /a, which appears once
in the resulting set).
The binary Right Child Operator ®, which, when applied to two XPath sets X and Y
(i.e. X ® Y), returns the members (XPaths) of the right set Y whose parent XPaths
are contained in the left set X.
Example 3. Let X = { /a , /c/b } and Y = { /a/d , /a/c , /c/b/p , /c/a/g }; then
X ® Y = { /a/d , /a/c , /c/b/p }.
The binary Append Operator /, which is applied to an XPath set X and a set of node
names N (i.e. X / N), resulting in a new set of XPaths Y by appending each member
of N to each member of X.
Example 4. Let X = { /a , /a/b } and N = { c , d }; then
Y = X / N = { /a/c , /a/d , /a/b/c , /a/b/d }.
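The three operators can be sketched over Python sets of XPath strings. This is only an illustrative sketch, not the framework's implementation: the function names are hypothetical, and XPaths are assumed to be simple absolute paths without predicates. The sets used in the usage note are those of Examples 2 to 4, written with a leading slash throughout.

```python
def parent_path(xpath):
    """Drop the leaf step of an absolute XPath; a root path is returned unchanged."""
    steps = xpath.strip("/").split("/")
    if len(steps) <= 1:
        return "/" + steps[0]
    return "/" + "/".join(steps[:-1])


def parent_op(xpaths):
    """Parent operator: the set of distinct parent XPaths of the given set."""
    return {parent_path(x) for x in xpaths}


def right_child_op(left, right):
    """Right child operator: members of `right` whose parent XPath is in `left`."""
    return {y for y in right if parent_path(y) in left}


def append_op(xpaths, names):
    """Append operator: every node name appended to every XPath."""
    return {x + "/" + n for x in xpaths for n in names}
```

For instance, `right_child_op({"/a", "/c/b"}, {"/a/d", "/a/c", "/c/b/p", "/c/a/g"})` keeps `/a/d`, `/a/c` and `/c/b/p` and drops `/c/a/g`, whose parent `/c/a` is not in the left set. Because the results are sets, duplicate parents collapse automatically.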
XPath Set Relations. We describe here a relation among XPath sets that holds
because of the Consistent Mappings Hypothesis described above. We will use this
relation later on in the query translation process, and in particular in the variable
bindings algorithm (subsection 3.1):
Domain-Range Property Relation:
$$\forall\ \text{Property } Pr \;\Rightarrow\; X_{PrR} = X_{Pr} \ \text{ and } \ X_{PrD} = (X_{Pr})^{P} = (X_{PrR})^{P}$$
The Domain-Range Property Relation can be easily understood taking into account
the hierarchical structure of XML data as well as the Consistent Mappings
Hypothesis. It states that for a single property Pr:
the XPath set of its ranges is equal to its own XPath set (i.e. the instances of its
ranges are the XML nodes of the elements that this property has been mapped to).
the XPath set of its domain classes is equal to the set containing its parent XPaths
(i.e. the XPaths of the Complex Types (CTs) that contain the elements that this
property has been mapped to).
3 Overview of the Query Translation Process
In this section we briefly present the entire translation process using a UML activity
diagram. Fig. 3 shows the entire process, which takes as input the given SPARQL
query and the defined mappings between the ontology and the XML Schema (encoded
as described in the previous sections). The query translation process comprises the
activities outlined in the following paragraphs.
[Figure: UML activity diagram. Inputs: SPARQL Query, Mappings. Activities: SPARQL
Graph Pattern Normalization; Union-Free Graph Pattern Processing (Determination of
Variable Types, Processing Onto-Triples, UF-GP2XQuery, Variables Binding,
BGP2XQuery); Union Operator Translation; Solution Sequence Modifiers Translation;
Query Form Based Translation. Guards: Type Conflicts, Onto-Triples Exist, More BGPs,
More U-F GPs, More GPs, SSMs Exist.]
Fig. 3. Overview of the SPARQL Translation Process
SPARQL Graph Pattern Normalization. The SPARQL Graph Pattern Normalization
activity rewrites the Graph Pattern (GP) of the SPARQL query into an equivalent
normal form based on equivalence rules. The SPARQL GP normalization is based on
the GP expression equivalences proved in [3] and on rewriting techniques. In
particular, each GP can be transformed into a sequence P1 UNION P2 UNION P3
UNION … UNION Pn, where each Pi (1 ≤ i ≤ n) is a Union-Free GP (i.e. a GP that does
not contain Union operators). This makes the GP translation process simpler and
more efficient.
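As an illustration of the normal form, the following sketch rewrites a graph pattern that uses only AND and UNION into a list of union-free patterns by distributing AND over UNION. It is a simplified sketch under stated assumptions, not the normalization procedure of the framework: the nested-tuple representation and the function name `union_free` are hypothetical, triple patterns are opaque strings, and OPT and FILTER are deliberately left out.

```python
from itertools import product


def union_free(gp):
    """Return union-free patterns whose UNION is equivalent to `gp`.

    `gp` is either a string (an opaque triple pattern) or a tuple
    (operator, left, right) with operator "AND" or "UNION".
    """
    if isinstance(gp, str):
        return [gp]
    op, left, right = gp
    if op == "UNION":
        # UNION simply concatenates the alternatives of both sides.
        return union_free(left) + union_free(right)
    if op == "AND":
        # AND distributes over UNION: combine every left alternative
        # with every right alternative.
        return [("AND", l, r)
                for l, r in product(union_free(left), union_free(right))]
    raise ValueError("operator not covered by this sketch: " + op)
```

For example, `union_free(("AND", ("UNION", "t1", "t2"), "t3"))` yields the two union-free patterns `("AND", "t1", "t3")` and `("AND", "t2", "t3")`, each of which can then be translated on its own.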
Union-Free Graph Pattern (UF-GP) Processing. The UF-GP processing translates the
constituent UF-GPs into semantically equivalent XQuery expressions. The UF-GP
Processing activity is a composite one with several sub-activities, and it is the step
where most of the translation work takes place. It is decomposed into the following
sub-activities:
Determination of Variable Types. For every UF-GP, this activity first identifies the
types of the variables used, in order to detect any conflict arising from the syntax of
the user's input and to identify the form of the results for each variable. We define
the following variable types: the Class Instance Variable Type (CIVT), the Literal
Variable Type (LVT), the Unknown Variable Type (UVT), the Data Type Predicate
Variable Type (DTPVT), the Object Predicate Variable Type (OPVT), and the Unknown
Predicate Variable Type (UPVT).
We also define the following sets: the Data Type Properties Set (DTPS), which
contains all the data type properties of the ontology; the Object Properties Set
(OPS), which contains all the object properties of the ontology; the Variables Set
(V), which contains all the variables that are used in the UF-GP; and the Literals Set
(L), which contains all the literals referenced in the UF-GP.
The determination of the variable types is based on a set of rules applied
iteratively for each triple in the given UF-GP. Below we present a subset of these
rules, which are used to determine the type ($T_{X}$) of a variable X:
Let S P O be a triple pattern.
1. If P ∈ OPS and O ∈ V ⇒ $T_{O}$ = CIVT. If the predicate is an object property and
the object is a variable, then the type of the object variable is CIVT.
2. If O ∈ L and P ∈ V ⇒ $T_{P}$ = DTPVT. If the object is a literal value and the
predicate is a variable, then the type of the predicate variable is DTPVT.
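The two rules listed above can be sketched as follows. This is a minimal illustration that covers only these two rules; the term encoding (variables start with `?`, literals are double-quoted strings), the function names, and the property name `ns:worksFor` are assumptions made for the example. A conflict is reported when two rules assign different types to the same variable.

```python
def is_var(term):
    """Assumed encoding: variables start with '?'."""
    return term.startswith("?")


def is_literal(term):
    """Assumed encoding: literals are double-quoted strings."""
    return term.startswith('"')


def determine_types(triples, ops):
    """Apply rules 1 and 2 to every triple; `ops` is the Object Properties Set."""
    types = {}

    def assign(var, new_type):
        # Two different types for the same variable are a type conflict.
        if types.get(var, new_type) != new_type:
            raise ValueError("type conflict for " + var)
        types[var] = new_type

    for s, p, o in triples:
        if p in ops and is_var(o):          # rule 1
            assign(o, "CIVT")
        if is_literal(o) and is_var(p):     # rule 2
            assign(p, "DTPVT")
    return types
```

With the triples `("?x", "ns:worksFor", "?d")` and `("?d", "?p", '"Sales"')` and `ops = {"ns:worksFor"}`, the sketch types `?d` as CIVT and `?p` as DTPVT; `?x` stays untyped because neither of the two rules applies to it.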
Processing Onto-Triples. Onto-Triples refer to the ontology structure and/or
semantics. The main objective of this activity is to process the Onto-Triples against
the ontology (using SPARQL) and, based on this analysis, to bind the correct XPaths to
the variables contained in the Onto-Triples (i.e. to assign the relevant XPaths to
these variables). These bindings are used in the next steps as input to the Variable
Bindings activity.
UF-GP2XQuery. This activity translates the UF-GP into semantically equivalent
XQuery expressions. The concept of a GP, and thus the concept of a UF-GP, is
defined recursively. The BGP2XQuery algorithm translates the basic components of a
GP (i.e. Basic Graph Patterns, BGPs, which are sequences of triple patterns and
filters) into semantically equivalent XQuery expressions (see subsection 3.2). To do
that, a variable bindings step (see subsection 3.1) is needed. Finally, the BGPs in the
context of a GP have to be properly associated, that is, the SPARQL operators among
them have to be applied using XQuery expressions and functions. These operators are
OPT, AND, and FILTER, and they are implemented using standard XQuery expressions
without any ad hoc processing.
Union Operator Translation. This activity translates the UNION operator that
appears among UF-GPs in a GP, by using the Let and Return XQuery clauses in order
to return the union of the solution sequences produced by the UF-GPs to which the
Union operator applies.
Solution Sequence Modifiers Translation. This activity translates the SPARQL
solution sequence modifiers using XQuery clauses (Order By, For, Let, etc.) and
XQuery built-in functions (see the example in subsection 3.3). The modifiers
supported by SPARQL are Distinct, Order By, Reduced, Limit, and Offset.
Query Forms Based Translation. SPARQL has four forms of queries (Select, Ask,
Construct and Describe). The structure of the final result differs according to the
query form, so the query translation is heavily dependent on it. In particular, after
the translation of any solution modifier is done, the generated XQuery is enhanced
with appropriate expressions in order to achieve the desired structure of the results
(e.g. to construct an RDF graph or a result set) according to the query form.
3.1 Variable Bindings
This section describes the variable bindings activity. In the translation process the
term “variable bindings” is used to describe the assignment of the correct XPaths to
the variables referenced in a given Basic Graph Pattern (BGP), thus enabling the
translation of BGP to XQuery expressions. In this activity, Onto-Triples are not taken
into account since their processing has taken place in the previous step.
Definition 1: A triple pattern has the form
(s, p, o) ∈ (I ∪ B ∪ V) × (I ∪ V ∪ B) × (I ∪ B ∪ L ∪ V), where I is a set of IRIs,
B is a set of Blank Nodes, V is a set of Variables, and L is the set of RDF Literals.
In our approach, however, the individuals in the source ontology are not considered
at all (either they do not exist, or they are not used in semantic queries).
Definition 2: A variable contained in a Union-Free Graph Pattern is called a
Shared Variable when it is referenced in more than one triple pattern of the same
Union-Free Graph Pattern, regardless of its position in those triple patterns.
Variable Bindings Algorithm. When describing data with RDF triples (s, p, o),
subjects represent class individuals (RDF nodes), predicates represent properties
(RDF arcs), and objects represent class individuals or data type values (RDF nodes).
Based on that, and on the Domain-Range Property Relation (see the XPath Set
Relations paragraph of subsection 2.1), we have:
a) $X_{s} = X_{pD} = (X_{pR})^{P} = (X_{p})^{P}$, b) $X_{p} = X_{pR}$, and
c) $X_{o} = X_{pR}$.
Thus it holds that:
$$X_{s} = X_{pD} = (X_{pR})^{P} = (X_{p})^{P} = (X_{o})^{P} \;\Rightarrow\; X_{s} = (X_{p})^{P} = (X_{o})^{P}$$
(Subject-Predicate-Object Relation)
This relation holds for every single triple pattern. Thus, the variable bindings
algorithm uses this relation in order to find the correct bindings for the entire set
of triple patterns, starting from the bindings of any single triple pattern part
(subject, predicate, or object).
In case of shared variables, the algorithm tries to find the maximum set of bindings
(using the operators for XPath sets) that satisfy this relation for the entire set of
triple patterns (e.g. the entire BGP). Once this relation holds for the entire BGP, all
the instances (in XML) that satisfy the BGP have been addressed.
For shared variables of type LVT, the variable bindings algorithm does not determine
the XPaths, since literal equality is independent of the XPath expressions. Thus, the
bindings for variables of this type cannot be defined at this step (they are marked as
“Not Definable” in the variable bindings rules). Instead, they are handled by the
BGP2XQuery algorithm (subsection 3.2), using the mappings and the determined
variable bindings.
The algorithm takes as input a BGP, a set of initial bindings, and the types of the
variables as determined in the “Determination of Variable Types” activity. The
initial bindings are the ones produced by the Onto-Triples processing activity, and
they initialize the bindings of the algorithm. Then the algorithm performs an
iterative process in which it determines, at each step, the bindings of the entire BGP
(triple by triple). The determination of the bindings is based on the rules described
below. The iterative process continues until the bindings of all the variables are
equal in two successive iterations, which means that no further modifications are
to be made and that the current bindings are the final ones.
Variable Bindings Rules. Based on the possible combinations of S, P and O, there
are four different types of triple patterns (ontology instances are not yet supported
by our framework):
Type 1: S ∈ V, P ∈ I, O ∈ L. Type 2: S, O ∈ V, P ∈ I. Type 3: S, P ∈ V, O ∈ L.
Type 4: S, P, O ∈ V.
According to the triple pattern type, we have defined a set of rules for the variable
bindings. In this section we present a subset of these rules due to space limitations.
In what follows, the symbol ← denotes the assignment of a new value to a set, and the
set on its left-hand side holds the new bindings assigned at each iteration. All the
XPath sets are considered to be initially set to null, and the intersection operation
is not affected by the null set. E.g. if X = { null } and Y = { /a/b , d/e }, then
X ∩ Y = { /a/b , d/e }. The notation “Not Definable” is used for variables of type LVT,
as explained above. Consider the triple S P O:
If the triple is of Type 1 ⇒ $X_{S} \leftarrow X_{PD} \cap X_{S}$
If the triple is of Type 2 ⇒ $X_{S} \leftarrow X_{PD} \cap X_{S} \cap (X_{O})^{P}$
   If P ∈ OPS ⇒ $X_{O} \leftarrow X_{S}$ ® $X_{O}$
   If P ∈ DTPS ⇒ $X_{O} \leftarrow$ Not Definable (as explained previously)
If the triple is of Type 3 ⇒ $X_{S} \leftarrow X_{PD} \cap X_{S}$ and $X_{P} \leftarrow X_{S}$ ® $X_{P}$
If the triple is of Type 4 ⇒ $X_{S} \leftarrow X_{PD} \cap X_{S} \cap (X_{O})^{P}$ and $X_{P} \leftarrow X_{S}$ ® $X_{P}$
   If $T_{O}$ = CIVT or $T_{O}$ = UVT ⇒ $X_{O} \leftarrow X_{P} \cap X_{O}$
   If $T_{O}$ = LVT ⇒ $X_{O} \leftarrow$ Not Definable (as explained previously)
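The iterative process can be illustrated for triples of Type 1 only. The sketch below assumes that the Type 1 rule intersects the current subject bindings with the domain XPath set of the predicate, and that a null (here `None`) set leaves the intersection unaffected, as stated above. It is not the full algorithm: the function names, the property names `ns:name` and `ns:age`, and the XPaths are hypothetical.

```python
def intersect(a, b):
    """Set intersection in which a null (None) operand has no effect."""
    if a is None:
        return b
    if b is None:
        return a
    return a & b


def bind_type1(bgp, domain_xpaths, initial=None):
    """Iterate over the BGP until the bindings stop changing.

    `bgp` holds Type 1 triples (variable subject, IRI predicate, literal
    object); `domain_xpaths` maps each predicate to its domain XPath set.
    """
    bindings = dict(initial or {})
    while True:
        previous = dict(bindings)
        for s, p, _o in bgp:
            bindings[s] = intersect(domain_xpaths[p], bindings.get(s))
        if bindings == previous:   # two successive iterations agree
            return bindings
```

For a BGP with the shared subject variable `?x` in `?x ns:name "John"` and `?x ns:age "30"`, where `ns:name` has the domain XPaths `/persons/person` and `/students/student` and `ns:age` only `/persons/person`, the loop narrows the bindings of `?x` to `/persons/person` and stops after the second pass.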
XPath Set Relations for Triple-Patterns. Among the XPath sets of triple patterns
there are important relations that can be exploited in the development of the XQuery
expressions in order to correctly associate data that have been bound to different
variables of triple patterns. The most important relation among the XPath sets of
triple patterns is that of extension:
Extension Relation: An XPath set A is said to be an extension of an XPath set B if
all XPaths in A are descendants of the XPaths of B.
As an example of this relation, consider the XPath set A′ produced when applying the
append (/) operator to an original XPath set A and a set of node names: A′ is an
extension of A.
The extension relation holds for the results of the variable bindings algorithm
(Subject-Predicate-Object Relation) and implies that the XPaths bound to subjects are
parents of the XPaths bound to predicates and objects of triple patterns.
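A check for the Extension Relation can be sketched as below. The sketch assumes simple absolute XPaths compared as strings, and reads "descendant" as a proper descendant; the function names are hypothetical.

```python
def is_descendant(path, ancestor):
    """True if `path` lies strictly below `ancestor` in the XML hierarchy."""
    return path != ancestor and path.startswith(ancestor.rstrip("/") + "/")


def is_extension(a, b):
    """True if every XPath in `a` is a descendant of some XPath in `b`."""
    return all(any(is_descendant(x, y) for y in b) for x in a)
```

For example, `is_extension({"/a/c", "/a/b/c"}, {"/a", "/a/b"})` holds, which matches the case of a set produced by the append operator, while `is_extension({"/a/b"}, {"/c"})` does not.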
3.2 Translating BGPs to XQuery
In this section we describe the translation of BGPs to semantically equivalent XQuery
expressions. The algorithm manipulates a sequence of triple patterns and filters (i.e. a
BGP) and translates them into semantically equivalent XQuery expressions, thus
allowing the evaluation of a BGP on a set of XML data.
Definition 3: Return Variables (RV) are those variables for which the given
SPARQL query would return some information. The set of all Return Variables of
a SPARQL query constitutes the set RV ⊆ V.
The BGP2XQuery Algorithm. We briefly present here the BGP2XQuery algorithm for
translating BGPs into semantically equivalent XQuery expressions. The algorithm
takes as input the mappings between the ontology and the XML Schema, the BGP, the
determined variable types, and the variable bindings. The algorithm is not executed
triple by triple for a complete BGP. Instead, it processes the subjects, predicates,
and objects of all the triples separately. For each variable included in the BGP,
BGP2XQuery creates a For or Let XQuery clause using the variable bindings, the input
mappings, and the Extension Relation for triple patterns (see subsection 3.1), in
order to bind XML data to XQuery variables. The choice between the For and the Let
XQuery clauses is based on specific rules, so as to create a solution sequence that
follows the SPARQL semantics. Moreover, in order to associate bindings of different
variables into concrete solutions, the algorithm uses the Extension Relation. Literals
included in the BGP are translated using XPath predicates. Due to the complexity
that a SPARQL filter may have, the algorithm translates all the filters into XQuery
where clauses, although some “simple” ones (e.g. conditions on literals) could be
translated using XPath predicates. Moreover, SPARQL operators (built-in functions)
included in filter expressions are translated using built-in XQuery functions and
operators. However, for some “special” SPARQL operators (like sameTerm, lang, etc.)
we have developed native XQuery functions that simulate them.
Finally, the algorithm creates an XQuery Return clause that includes the Return
Variables (RV) that were used in the BGP.
There are some cases of shared variables that need special treatment by the
algorithm in order to apply the required joins in the XQuery expressions. The way
the algorithm handles these cases depends on which parts (subject, predicate, object)
of the triple patterns these shared variables refer to.
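To give a feel for how bindings and the Extension Relation can drive the generation of For clauses, the sketch below emits an XQuery string for a single triple pattern with a variable subject and a variable object. It is a rough illustration only and does not reproduce the BGP2XQuery algorithm: the function names, the document URI `doc('data.xml')`, the `<result>` wrapper, and the XPaths are all assumptions, and filters, literals, the For/Let choice, and shared variables are not handled.

```python
def relative_step(child, parents):
    """Return the part of `child` below the parent XPath that it extends."""
    for parent in parents:
        if child.startswith(parent + "/"):
            return child[len(parent) + 1:]
    raise ValueError("no parent XPath found for " + child)


def translate_triple(subj, obj, subj_xpaths, obj_xpaths, doc="doc('data.xml')"):
    """Emit For clauses for the subject and object variables of one triple."""
    # The subject ranges over the union of its bound (absolute) XPaths.
    subj_expr = " | ".join(doc + p for p in sorted(subj_xpaths))
    # The object is addressed relative to the subject, because its XPaths
    # extend the subject XPaths (Extension Relation).
    steps = sorted({relative_step(c, subj_xpaths) for c in obj_xpaths})
    obj_expr = " | ".join("$" + subj + "/" + s for s in steps)
    return "\n".join([
        "for $" + subj + " in (" + subj_expr + ")",
        "for $" + obj + " in (" + obj_expr + ")",
        "return <result>{ $" + obj + " }</result>",
    ])
```

Calling `translate_triple("x", "n", {"/persons/person"}, {"/persons/person/name"})` produces a query whose second For clause iterates over `$x/name`, so each object value stays associated with the subject node it belongs to.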
3.3 Example
We demonstrate in this example the use of the described framework in order to
evaluate a SPARQL query over XML data (based on Example 1). Fig. 4 shows how a
given SPARQL query is translated by our framework into a semantically equivalent
XQuery.
Fig. 4. SPARQL Query Translation Example
4 Conclusions
We have presented a framework and its software implementation that allows the
evaluation of SPARQL queries over XML data which are stored in XML databases and
accessed with the XQuery language. The framework assumes that a set of mappings
between the OWL ontology and the XML Schema exists and that these mappings obey
certain well-accepted language level correspondences.
The SPARQL2XQuery framework has been implemented as a software service
which can be configured with appropriate mappings (between some ontology and
XML Schema) and translates input SPARQL queries into semantically equivalent
XQuery queries that are answered over the XML database.
5 References
1. Beckett D. (ed.): “SPARQL Query Results XML Format”. W3C Recommendation,
15 January 2008 (http://www.w3.org/TR/rdf-sparql-XMLres/).
2. Bohring H., Auer S.: “Mapping XML to OWL Ontologies”. Leipziger Informatik-Tage
2005: 147-156.
3. Perez J., Arenas M., Gutierrez C.: “Semantics and Complexity of SPARQL”. 5th
International Semantic Web Conference (ISWC-06), November 2006.
4. Rodrigues T., Rosa P., Cardoso J.: “Mapping XML to Existing OWL ontologies”.
International Conference WWW/Internet 2006, Murcia, Spain, 5-8 October 2006.
5. Farrell J., Lausen H.: “Semantic Annotations for WSDL and XML Schema”. W3C
Recommendation, August 2007. Available at http://www.w3.org/TR/sawsdl/
6. Groppe S., Groppe J., Linnemann V., Kukulenz D., Hoeller N., Reinke C.:
“Embedding SPARQL into XQuery/XSLT”. SAC 2008: 2271-2278.
7. Akhtar W., Kopecký J. et al.: “XSPARQL: Traveling between the XML and RDF
Worlds - and Avoiding the XSLT Pilgrimage”. ESWC 2008: 432-447.
8. Droop M., Flarer M. et al.: “Embedding XPATH Queries into SPARQL Queries”. In
Proc. of the 10th International Conference on Enterprise Information Systems.
9. Tsinaraki C., Christodoulakis S.: “Interoperability of XML Schema Applications with
OWL Domain Knowledge and Semantic Web Tools”. In Proc. of the ODBASE 2007.
10. Cruz I.R., Huiyong Xiao, Feihong Hsu: “An Ontology-based Framework for XML
Semantic Integration”. Database Engineering and Applications Symposium, 2004.
11. Christophides V., Karvounarakis G. et al.: “The ICS-FORTH SWIM: A Powerful
Semantic Web Integration Middleware”. In Proc. of the SWDB 2003, pages 381-393.
12. Amann B., Beeri C., Fundulaki I., Scholl M.: “Querying XML Sources Using an
Ontology-Based Mediator”. CoopIS/DOA/ODBASE 2002: 429-448.
13. Bikakis N., Gioldasis N., Tsinaraki C., Christodoulakis S.: “Semantic Based Access
over XML Data”. In Proc. of 2nd World Summit on Knowledge Society 2009 (WSKS2009).
14. Bikakis N., Gioldasis N., Tsinaraki C., Christodoulakis S.: “The SPARQL2XQuery
Framework”. Technical Report, http://www.music.tuc.gr/reports/SPARQL2XQUERY.PDF
... Existing Schema Auto-Vocabulary Representation Matic [107] RDFS, DAML+OIL [108] RDFS, OWL [109] RDFS, OWL [110] n/a [111] RDFS, OWL [98,112] n/a [113] RDFS, OWL [114] RDFS, OWL [115] RDFS, OWL [116] RDFS, OWL [69] RDFS, OWL [117] n/a [118] n/a [119] RDFS, OWL [120] RDFS, OWL [121] RDFS, OWL [122] RDFS [123] RDFS [124] RDFS, OWL [125] RDFS, OWL [126] RDFS, OWL [127] RDFS, OWL [128] RDFS, OWL [129] RDFS, OWL [130,131] RDFS, OWL [132] n/a [133] RDFS [134] RDFS, OWL At the beginning, we focus on solutions [115,116,121,126,127] that use existing vocabulary and/or ontology. This means that the XML data is transformed according to the mapped vocabularies. ...
... Another group of proposals [107,109,111,119,120,[122][123][124][125]130,133,134] do not support mappings between XML Schemas and existing vocabularies. Amann et al. [107] discuss a data integration system, where XML is mapped into vocabulary that supports roles and inheritance. ...
... It supports a set of patterns that enable the translation from XML Schema into ontology. SPARQL2XQuery [111] is a framework that transforms SPARQL [135] query into a XQuery [136] using mapping from vocabulary to XML Schema. It allows query XML databases. ...
Article
Full-text available
Resource Description Framework (RDF) can seen as a solution in today’s landscape of knowledge representation research. An RDF language has symmetrical features because subjects and objects in triples can be interchangeably used. Moreover, the regularity and symmetry of the RDF language allow knowledge representation that is easily processed by machines, and because its structure is similar to natural languages, it is reasonably readable for people. RDF provides some useful features for generalized knowledge representation. Its distributed nature, due to its identifier grounding in IRIs, naturally scales to the size of the Web. However, its use is often hidden from view and is, therefore, one of the less well-known of the knowledge representation frameworks. Therefore, we summarise RDF v1.0 and v1.1 to broaden its audience within the knowledge representation community. This article reviews current approaches, tools, and applications for mapping from relational databases to RDF and from XML to RDF. We discuss RDF serializations, including formats with support for multiple graphs and we analyze RDF compression proposals. Finally, we present a summarized formal definition of RDF 1.1 that provides additional insights into the modeling of reification, blank nodes, and entailments.
... Table 6: XML mapping. This table presents the features of transformation approaches, namely: existing vocabulary (yes/no), schema representation, and level of automation (automatic, semiautomatic, and manual). Schema representation by approach: [3] RDFS, DAML+OIL; [10] RDFS, OWL; [11] RDFS, OWL; [13] n/a; [14] RDFS, OWL; [16,17] n/a; [19] RDFS, OWL; [40] RDFS, OWL; [43] RDFS, OWL; [55] RDFS, OWL; [56] RDFS, OWL; [57] n/a; [60] n/a; [62] RDFS, OWL; [71] RDFS, OWL; [73] RDFS, OWL; [94] RDFS; [95] RDFS; [99] RDFS, OWL; [113] RDFS, OWL; [124] RDFS, OWL; [126] RDFS, OWL; [135] RDFS, OWL; [142] RDFS, OWL; [148,149] RDFS, OWL; [154] n/a; [161] RDFS; [162] RDFS, OWL. The vocabulary generated in that proposal reflects the database semantics. The mappings are stored in an R2O [9] document. ...
... Another group of proposals [3,11,14,62,71,94,95,99,113,149,161,162] do not support mappings between XML Schemas and existing vocabularies. Amann et al. [3] discuss a data integration system, where XML is mapped into a vocabulary that supports roles and inheritance. ...
... It supports a set of patterns that enable the translation from XML Schema into an ontology. SPARQL2XQuery [14] is a framework that transforms a SPARQL [80] query into an XQuery [125] query using mappings from a vocabulary to XML Schema. It allows querying XML databases. ...
Preprint
Resource Description Framework (RDF) can be seen as a solution in today's landscape of knowledge representation research. An RDF language has symmetrical features because subjects and objects in triples can be interchangeably used. Moreover, the regularity and symmetry of the RDF language allow knowledge representation that is easily processed by machines, and because its structure is similar to natural languages, it is reasonably readable for people. RDF provides some useful features for generalized knowledge representation. Its distributed nature, due to its identifier grounding in IRIs, naturally scales to the size of the Web. However, its use is often hidden from view and is, therefore, one of the less well-known of the knowledge representation frameworks. Therefore, we summarise RDF v1.0 and v1.1 to broaden its audience within the knowledge representation community. This article reviews current approaches, tools, and applications for mapping from relational databases to RDF and from XML to RDF. We discuss RDF serializations, including formats with support for multiple graphs, and we analyze RDF compression proposals. Finally, we present a summarized formal definition of RDF 1.1 that provides additional insights into the modeling of reification, blank nodes, and entailments.
... Finally, edges are resolved into joins, also based on gTop mappings. SPARQL-to-XPath/XQuery: SPARQL2XQuery is described in a couple of publications [137-139]. The translation is based on a mapping model between an OWL ontology (existing or user-defined) and XML Schema. ...
... SPARQL-to-XPath/XQuery [137-139,172] ... Others are features provided only by individual efforts. ...
Thesis
The remarkable advances achieved in both research and development of Data Management as well as the prevalence of high-speed Internet and technology in the last few decades have caused an unprecedented data avalanche. Large volumes of data manifested in a multitude of types and formats are being generated and becoming the new norm. In this context, it is crucial to both leverage existing approaches and propose novel ones to overcome this data size and complexity, and thus facilitate data exploitation. In this thesis, we investigate two major approaches to addressing this challenge: Physical Data Integration and Logical Data Integration. The specific problem tackled is to enable querying large and heterogeneous data sources in an ad hoc manner. In the Physical Data Integration, data is physically and wholly transformed into a canonical unique format, which can then be directly and uniformly queried. In the Logical Data Integration, data remains in its original format and form, and a middleware layer is placed above the data, allowing various schema elements to be mapped to a high-level unifying formal model. The latter enables the querying of the underlying original data in an ad hoc and uniform way, a framework which we call Semantic Data Lake, SDL. Both approaches have their advantages and disadvantages. For example, in the former, a significant effort and cost are devoted to pre-processing and transforming the data to the unified canonical format. In the latter, the cost is shifted to the query processing phases, e.g., query analysis, relevant source detection and results reconciliation. In this thesis we investigate both directions and study their strengths and weaknesses. For each direction, we propose a set of approaches and demonstrate their feasibility via a proposed implementation. In both directions, we appeal to Semantic Web technologies, which provide a set of time-proven techniques and standards that are dedicated to Data Integration.
In the Physical Integration, we suggest an end-to-end blueprint for the semantification of large and heterogeneous data sources, i.e., physically transforming the data to the Semantic Web data standard RDF (Resource Description Framework). A unified data representation, storage and query interface over the data are suggested. In the Logical Integration, we provide a description of the SDL architecture, which allows querying data sources directly in their original form and format without requiring a prior transformation and centralization. For a number of reasons that we detail, we put more emphasis on the virtual approach. We present the effort behind an extensible implementation of the SDL, called Squerall, which leverages state-of-the-art Semantic and Big Data technologies, e.g., RML (RDF Mapping Language) mappings, FnO (Function Ontology) ontology, and Apache Spark. A series of evaluations is conducted to assess the implementation along various metrics and input data scales. In particular, we describe an industrial real-world use case using our SDL implementation. In a preparation phase, we conduct a survey of Query Translation methods in order to back some of our design choices.
... This is where frameworks like SPARQL2XQuery took place to accentuate the adjacency and interoperability of both OWL and XML. Therefore, the proposed framework (Bikakis et al., 2009) was able to evaluate SPARQL queries over XML data after mapping XML to OWL schemas. ...
Article
Converting music score content from symbolic formats to simplified data formats is useful for artificial intelligence purposes. The conversion can be applied using XSL stylesheets and ontologies to ensure that data quality is preserved throughout the transformation. In this paper, we propose a new converter capable of transforming music scores encoded in MEI to JSON format for pre-processing purposes and future use in artificial intelligence techniques. The proposed converter uses an eastern music score ontology capable of structuring standard music score content in addition to elements and attributes specific to eastern music. Thus, the converter shares the same support for eastern music scores. We illustrate the conversion process by assessing the performance, the data quality, and the storage of the proposed converter in comparison with a combined approach composed of two state-of-the-art converters.
... The translation from an operator into XQuery constructs is based on transformation rules, which replace the embedded SPARQL constructs with XQuery constructs. 2. In SPARQL2XQuery [146-148], Bikakis et al. propose a translation approach that is based on a mapping model between an OWL ontology (existing or user-defined) and XML Schema. ...
Thesis
Over the last few years, the amount and availability of machine-readable Open, Linked, and Big data on the web has increased. Simultaneously, several data management systems have emerged to deal with the increased amounts of this structured data. RDF and Graph databases are two popular approaches for data management based on modeling, storing, and querying graph-like data. RDF database systems are based on the W3C standard RDF data model and use the W3C standard SPARQL as their de facto query language. Most graph database systems are based on the Property Graph (PG) data model and use the Gremlin language as their query language due to its popularity amongst vendors. Given that both of these approaches have distinct and complementary characteristics – RDF is suited for distributed data integration with built-in world-wide unique identifiers and vocabularies; PGs, on the other hand, support horizontally scalable storage and querying, and are widely used for modern data analytics applications – it becomes necessary to support interoperability amongst them. The main objective of this dissertation is to study and address this interoperability issue. We identified three research challenges that are concerned with the data interoperability, query interoperability, and benchmarking of these databases. First, we tackle the data interoperability problem. We propose three direct mappings (schema-dependent and schema-independent) for transforming an RDF database into a property graph database. We show that the proposed mappings satisfy the desired properties of semantics preservation and information preservation. Based on our analysis (both formal and empirical), we argue that any RDF database can be transformed into a PG database using our approach. Second, we propose a novel approach, GREMLINATOR, for querying PG databases with SPARQL via Gremlin traversals to tackle the query interoperability problem.
In doing so, we first formalize the declarative constructs of Gremlin language using a consolidated graph relational algebra and define mappings to translate SPARQL queries into Gremlin traversals. GREMLINATOR has been officially integrated as a plugin for the Apache TinkerPop graph computing framework (as sparql-gremlin), which enables users to execute SPARQL queries over a wide variety of OLTP graph databases and OLAP graph processing frameworks. Finally, we tackle the third, benchmarking (performance evaluation), problem. We propose a novel framework – LITMUS Benchmark Suite that allows a choke-point driven performance comparison and analysis of various databases (PG and RDF-based) using various third-party real and synthetic datasets and queries. We also studied a variety of intrinsic and extrinsic factors – data and system-specific metrics and Key Performance Indicators (KPIs) that influence a given system’s performance. LITMUS incorporates various memory, processor, data quality, indexing, query typology, and data-based metrics for providing a fine-grained evaluation of the benchmark. In conclusion, by filling the research gaps, addressed by this dissertation, we have laid a solid formal and practical foundation for supporting interoperability between the RDF and Property graph database technology stacks. The artifacts produced during the term of this dissertation have been integrated into various academic and industrial projects.
Chapter
Query optimization systems propose an answer-driven approach to information access. Most query optimization systems aim at the information retrieval required by natural language queries. Queries are generally asked within a context, and answers are provided within that specific context. RDF is a general proposition language for the Web, joining data from diverse resources. SPARQL, a query language for RDF, can join data from different databases, as well as documents, inference engines, or anything else that can express its knowledge as a directed labeled graph. Because they lack a proper architecture, the existing SPARQL-to-SQL translation techniques have a number of limitations that reduce their robustness, efficiency, and reliability. These limitations include the generation of inefficient or even incorrect SQL queries, lack of formal background, and poor implementations. This paper proposes a framework, used by an ontology-based mediator system, that provides a well-defined semantic model in which (i) a distinct SPARQL semantics is used to rewrite the query in SQL; (ii) ontology-based knowledge is built for fast access and for translating queries by rewriting SPARQL to SQL, for efficient information retrieval over large Semantic Web datasets; and (iii) a hybrid query optimization framework is proposed as a query processing technique for effective access to customized information on the Semantic Web, using integrated ontology knowledge and an inference engine.
Article
Integrating data from multiple heterogeneous data sources entails dealing with data distributed among heterogeneous information sources, which can be structured, semi-structured or unstructured, and providing the user with a unified view of these data. Thus, in general, gathering information is challenging, and one of the main reasons is that data sources are designed to support specific applications. Very often their structure is unknown to the large part of users. Moreover, the stored data is often redundant, mixed with information only needed to support enterprise processes, and incomplete with respect to the business domain. Collecting, integrating, reconciling and efficiently extracting information from heterogeneous and autonomous data sources is regarded as a major challenge. In this paper, we present an approach for the semantic integration of heterogeneous data sources, DIF (Data Integration Framework), and a software prototype to support all aspects of a complex data integration process. The proposed approach is an ontology-based generalization of both Global-as-View and Local-as-View approaches. In particular, to overcome problems due to semantic heterogeneity and to support interoperability with external systems, ontologies are used as a conceptual schema to represent both data sources to be integrated and the global view.
Article
Nowadays, XML has reached wide recognition and brought interoperability at a syntactic level. Unfortunately, even when using XML to represent data, problems arise when it is necessary to integrate different data sources because XML lacks support for efficient sharing of conceptualization. Emerging Semantic Web technologies, such as ontologies, can enable semantic interoperability. With ontologies, it is possible to formally represent shared domain knowledge models defined with concepts, attributes, relationships and instances. In this paper, we present a notation to map XML Schema to existing OWL ontologies and the qualities an algorithm should have to transform XML documents (instances of the mapped schema) into instances of the mapped ontology.
Conference Paper
Semantic Web (SW) technology aims to facilitate the integration of legacy data sources spread worldwide. Despite the plethora of SW languages (e.g., RDF/S, DAML+OIL, OWL) recently proposed for supporting large scale information interoperation, the vast majority of legacy sources still rely on relational databases (RDB) published on the Web or corporate intranets as virtual XML. In this paper, we advocate a Datalog framework for mediating high-level queries to relational and/or XML sources using community ontologies expressed in a SW language such as RDF/S. We describe the architecture and the reasoning services of our SW integration middleware, called SWIM, and we present the main design choices and techniques for supporting powerful mappings between different data models, as well as reformulation and optimization of queries expressed against mediation schemas and views.
Conference Paper
By now, XML has reached wide acceptance as a data exchange format in E-Business. Efficient collaboration between different participants in E-Business is thus only possible when business partners agree on a common syntax and have a common understanding of the basic concepts in the domain. XML covers the syntactic level, but lacks support for efficient sharing of conceptualizations. The Web Ontology Language (OWL (Bec04)) in turn supports the representation of domain knowledge using classes, properties and instances for use in a distributed environment such as the World Wide Web. We present in this paper a mapping between the data model elements of XML and OWL. We give an account of its implementation within a ready-to-use XSLT framework, as well as its evaluation for common use cases.
Conference Paper
The need for semantic processing of information and services has led to the introduction of tools for the description and management of knowledge within organizations, such as RDF, OWL, and SPARQL. However, semantic applications may have to access data from diverse sources across the network. Thus, SPARQL queries may have to be submitted and evaluated against existing XML or relational databases, and the results transferred back to be assembled for further processing. In this paper we describe the SPARQL2XQuery framework, which translates SPARQL queries to semantically equivalent XQuery queries for accessing XML databases from the Semantic Web environment.
Conference Paper
Several standards are expressed using XML Schema syntax, since XML is the default standard for data exchange on the Internet. However, several applications need the semantic support offered by domain ontologies and Semantic Web tools like logic-based reasoners. Thus, there is a strong need for interoperability between XML Schema and OWL. This can be achieved if the XML Schema constructs are expressed in OWL, where the enrichment with OWL domain ontologies and further semantic processing are possible. After semantic processing, the derived OWL constructs should be converted back to instances of the original schema. We present in this paper XS2OWL, a model and a system that allow the transformation of XML Schemas to OWL-DL constructs. These constructs can be used to drive the automatic creation of OWL domain ontologies and individuals. The XS2OWL transformation model allows the correct conversion of the derived knowledge from OWL-DL back to XML constructs valid according to the original XML Schemas, in order to be used transparently by the applications that follow the XML Schema syntax of the standards.
Conference Paper
In this paper we propose a mediator architecture for the querying and integration of Web-accessible XML data sources. Our contributions are (i) the definition of a simple but expressive mapping language, following the local as view approach and describing XML resources as local views of some global schema, and (ii) efficient algorithms for rewriting user queries according to existing source descriptions. The approach has been validated by the STYX prototype.
Conference Paper
The tree-based languages XQuery and XSLT for XML are widely supported. Many tools do not yet support the new RDF graph query language SPARQL. We propose to embed SPARQL subqueries into XQuery/XSLT, such that XQuery and XSLT benefit from the graph query language constructs of SPARQL, and SPARQL benefits from features of XQuery/XSLT, which SPARQL does not support. The embedding enables XQuery/XSLT tools to handle at the same time XML queries and SPARQL subqueries, and XML and RDF data.
Conference Paper
While XPath is an established query language developed by the W3C for XML, SPARQL is a new query language developed by the W3C for RDF data. Comparisons between the data models of XML and RDF and between the query languages XPath and SPARQL are missing. Since XML and XPath are earlier recommendations of the W3C than RDF and SPARQL, currently more XML data and XPath queries are used in applications. However, recently available SPARQL query evaluators do not deal with XML data and XPath queries. We have developed a prototype for translating XML data into RDF data and embedding XPath queries into SPARQL queries for the following two reasons: 1) We want to compare the XPath and XQuery data model with the RDF data model and the XPath query language with the SPARQL query language in order to show similarities and differences. 2) We want to enable SPARQL query evaluators to deal with XML data and XPath queries in order to support XPath processing and SPARQL processing in parallel. We have developed a prototype for the source-to-source translations from XML data into RDF data and from XPath queries into SPARQL queries. We have run experiments to measure the execution times of the translations, of XPath queries and of their translated SPARQL queries.