ArticlePDF Available

Survey of directly mapping SQL databases to the Semantic Web

Authors:

Abstract and Figures

The Semantic Web anticipates integrated access to a large number of information sources on the Internet represented as Resource Description Framework (RDF). Given the large number of websites that are backed by SQL databases, methods that automate the translation of those databases to RDF are crucial. One approach, taken by a number of researchers, is to directly map the SQL schema to an equivalent Web Ontology Language (OWL) or RDF Schema representation, which in turn, implies an RDF representation for the relational data. This paper reviews this research, and derives a consolidated, overarching set of translation rules expressible as a stratified Datalog program. We present all the possible key combinations in an SQL schema and consider their implied semantic properties. We review the approaches and characterize them with respect to the scope of their coverage of SQL constructs.
Content may be subject to copyright.
1
Survey of Directly Mapping SQL Databases to the Semantic Web
JUAN F. SEQUEDA1, SYED HAMID TIRMIZI1, OSCAR CORCHO2 AND DANIEL P. MIRANKER1
1Department of Computer Sciences, The University of Texas at Austin, Austin, USA
2Ontology Engineering Group, Universidad Politecnica de Madrid, Madrid, Spain.
Abstract
One of the most relevant aspects to be considered to ensure the success of the Semantic Web is the availability of semantic data. Since the
majority of current websites are backed by relational databases, the transformation of this data Semantic Web languages is a great opportunity
and is being currently addressed in Linked Data approaches. There are two main approaches for this data transformation task: a) to rely on
automatic methods that derive an ontology from a SQL schema and transform data according to this schema, b) to let an expert create a mapping
between the SQL schema and an existing ontology, which is used to make the transformation into RDF. In this paper we survey the first set of
approaches according to a common framework.
Keywords: Semantic Web, Relational Databases, Ontology, OWL, RDF, RDFS
1. Introduction
A goal of the Semantic Web, also known recently as the Web of Data, is to enable web-wide integration of data
coming from various sources. The architecture proposed for the Semantic Web [ref w3c] proposes a standardized
data model for all data types (RDF, which encodes an arbitrary labeled directed graph), adding metadata encoded as
an ontology (in RDFS or OWL), and linking the data to the ontologies, and to other data across the web. A
standardized service protocol (SPARQL) enables the web to appear as a giant database and allow distributed
execution of queries (Hartig et al., 2009). In 2007, He et al. determined that Internet accessible databases contained
up to 500 times more data compared to the static Web and roughly 70% of websites are backed by relational
databases (He et al., 2007). The quantity of data suggests the success of the Semantic Web depends on developing
methods for making relational databases accessible to the Semantic Web. The number of web sites suggests that it
is paramount to develop automatic methods for integrating relational databases with the Semantic Web.
By the end of 2009, there were approximately 10 billion RDF triples published on the web coming from more
than 150 different datasets. Most of this data are originally stored in relational databases and have been manually
mapped to RDF. In response to this, the World Wide Web Consortium (W3C) has formed the Relational Database to
RDF Working Group, which is chartered to recommend a standard mapping language between the relational data
model and the Semantic Web languages.
Broadly stated, there are two architectural approaches to integrating relational databases with the Semantic
Web. In the first approach, which is the subject of this survey, the database content and its SQL schema are mapped
directly to representations in Semantic Web languages i.e. RDFS or OWL (Figure. 1) (Astrova 2004, Astrova et al.,
2007; Buccella et al., 2004; Li et al., 2005; Shen et al., 2006; Stojanovic et al., 2002; Tirmizi et al., 2008). The first
effort to exploit domain semantics from SQL to RDFS was by Stojanovic et al. (2002). The second approach
considers existing domain ontologies. Such systems include an explicit mapping language for mapping database
schema to ontologies. The ontology may be either a shared global or local ontology (Figure 2). (An et al., 2005; An
et al., 2006; Barrasa et al., 2004; Barrasa et al., 2006; Bizer et al., 2004; Chen et al., 2006; Duo et al., 2006; Korotkiy
et al., 2004; Svihla et al., 2004; de Laborda et al., 2006; Laclavik 2006; Xu et al., 2006; Arens et al., 1993; Collet et
al., 1991)
Direct mapping methods are intrinsically automatic. In each of the papers surveyed the database’s SQL schema
is mechanically transformed into an ontology, usually through the use of translation rules, and the relational data is
exposed as RDF instances of the generated ontology. We are careful to say SQL schema as the transformation
methods usually consider primary and foreign key constraints, and can include column and table constraints (e.g.
age > 0). Relational schema often means only table and column. We call the result of a syntactic transformation of a
SQL schema to an ontology a putative ontology.
Criticisms of direct mapping are two-fold: i) the quality of the putative ontology will depend on the quality and
sophistication of the data modeling effort that created the SQL schema and ii) the putative ontology will still require
integration with some global domain ontology if larger-scale interoperability is sought. While the former issue has
merit, it is not true that SQL schema cannot encode application semantics. We, and others have found that if a
database application was developed with attention to E-R or UML modeling tools, then the quality of a putative
2
ontology may replicate the quality of initial logical data model (Lubyte & Tessaris 2009). The work on direct
mapping by Lubyte et al starts with the assumption that the database has been so created.
Direct mapping shifts the mapping problem from a relational schema-to-ontology matching problem to an
ontology-to-ontology matching problem. Even in isolation of relational databases, the successful creation of the
Semantic Web will see the development of successful ontology-to-ontology mappings (Vrandecic & Sure, 2008).
Thus, if the Semantic Web enjoys larger success, direct mapping of relational databases will be sufficient. One can
also anticipate that matching putative ontologies to domain ontologies as a specialized and easier instance of
ontology-to-ontology mapping (Sequeda et al., 2008).
Figure. 1 Direct Mapping of a Relational Database to the Semantic
Web
Figure. 2 Relational Database to Ontology Mapping
A series of examples follow to demonstrate the capability of a SQL database to encode domain semantics using
SQL-DDL and at the same time to expose downside of this approach. If we take a relational database schema, one
can observe the implicit domain semantics that are represented in that schema. We use a University’s relational
database as a running example (Figure. 3). The schema defines table structures concerning students, their
enrollment, and courses being offered by a specific department.
Figure. 3. Example of University’s relational database
By looking at the SQL schema, simple analogies can be observed between the relational schema and an ontology
and possible transformations can be easily generated. The central focus of direct mapping efforts concerns the
semantic interpretation of the combinations of primary and foreign keys. Any direct mapping system must anticipate
every possible association that might be formed by a foreign key and the implication of every possible composition
of those constraints. In section 4 we develop an abstract SQL grammar capable of generating all possible primary
and foreign key combinations, of which there are 10. As we survey the work on direct mapping, we will use the
coverage of these 10 cases as one of features we use to assess the efforts.
For example, if we take a closer look at the relation between Course and Department, a foreign key between these
two relations exists, encoding a one-to-many relationship between the relations. For a user, it is understandable that
this relationship means that a Course is offered by a Department. This exact knowledge can be represented in an
ontology. Figure. 4 shows the SQL-DDL of the relationship between Course and Dept through a foreign key, and its
equivalent graphical ontological representation, with two classes, Course and Department, and a property,
“offeredBy”, that relates the two classes, having as the domain the Course class and as a range the Department class.
3
This is a simple example of how explicit semantics that are represented in an ontology are represented implicitly in a
relational schema. As a result, some examples in a relational schema can have obvious transformations to an
ontology.
CREATE TABLE COURSE{
CID INT PRIMARY KEY,
NAME VARCHAR NOT NULL,
DEPT INT,
CONSTRAINT DEPT_FK FOREIGN KEY(DEPT)
REFERENCES DEPT(DID)}
CREATE TABLE DEPT{
DID INT PRIMARY KEY,
NAME VARCHAR NOT NULL}
Figure. 4 SQL-DDL and Ontology for Course and Dept
However, just because in this example a transformation can be done does not mean that all examples of a syntactic
form work the same way. One particular example is inheritance. When a person looks at the content and structure of
the database, it can be obvious that it maps to an inheritance relationship, but the same syntactic construct (physical
representation) can represent two different logical relationships. Figure 5 illustrates an example representing
inheritance between STUDENT and PERSON using a Foreign Key as a subset of a Primary Key.
CREATE TABLE PERSON{
ID_PERSON INT PRIMARY KEY,
NAME_PERSON VARCHAR NOT NULL}
CREATE TABLE STUDENT{
ID_STUDENT INT,
ID_PERSON INT,
DEGREE_STUDENT VARCHAR NOT NULL,
CONSTRAINT PERSON_FK FOREIGN
KEY(ID_PERSON) REFERENCES
PERSON(ID_PERSON)
PRIMARY KEY(ID_STUDENT, ID_PERSON)
Figure. 5 SQL-DDL and Ontology for Person and Student
In this case, the table PERSON becomes the class PERSON and the table STUDENT becomes a subclass
PERSON. Due to the fact that one of the primary keys of the table STUDENT is also a foreign key to the table
PERSON, one can say that this is an inheritance.
However, this same syntactic construct, which represented inheritance in the previous example may model a
different type of relationship (Figure. 6). Therefore it is not correct to state that this specific syntactic construct
represents inheritance.
CREATE TABLE ORDER{
ID_ORDER INT PRIMARY KEY,
ORDER_NAME VARCHAR NOT NULL}
CREATE TABLE ORDERITEM{
ID_ORDER INT,
ID_ITEM INT,
ITEM_NAME VARCHAR NOT NULL,
CONSTRAINT PERSON_FK FOREIGN
KEY(ID_ORDER) REFERENCES ORDER(ID_ORDER)
PRIMARY KEY(ID_ORDER, ID_ITEM)
Figure. 6 SQL-DDL and Ontology for Order and Order Item
4
The previous example models the relationship between an ORDERITEM, which is part of an ORDER, which
does not represent inheritance. Thus, it is easy to make mistakes while thinking that some specific syntactic
construct may represent a specific ontological construct. Hence, not all explicit syntactic constructs may be correctly
transformed. When viewed together, the papers reviewed in this survey enumerate a large number of alternatives.
Building on these, we present a consolidated direct mapping system (Section 6). The consolidated system covers all
possible associations formed by foreign key relationships and does not create ontology content when the physical
model may be ambiguous.
Before presenting the surveyed work, Section 5, we first present a generic mapping of components of SQL-DDL
to layers in the Semantic Web layer cake, Section 2. Part of that looks into the evolution of the SQL language as a
basis of understanding how domain semantics can be encoded in a SQL model. In addition it creates a context for
assessing the individual efforts and their respective transformation system. In Section 3 we present transformation
patterns and acknowledge other “hard examples” which include the identification of symmetric and transitive
properties, the use of OWL’s someValuesFrom and allValuesFrom and the effects of Open and Closed World
Assumptions.
2. SQL-DDL and the Semantic Web Languages
SQL standards, including its data definition language (DDL), have been evolving since the first proposal in 1986.
For each version, SQL 86-89, 92, 99, 2003, the SQL language standard changes significantly from its predecessor,
increasing the expressiveness of the language each and every time (Pratt 1990, Pratt 1995, SQL3, Sequeda et al.,
2007). It is arguable that in the first standard implementations of SQL, the language was not expressive enough to
explicitly encode the database’s application semantics. Initially the development of the relational model was largely
concerned with the representation of data (Figure. 7a) as n-ary relations and the correctness of syntactic
transformations leveraging relational algebra (Meier 1983). A central concern was the theory of normal forms. A
primary goal was to reduce storage overhead. Similarly early SQL dialect had little concern with application specific
semantics.
Evident in the sequence of SQL standards are increasingly powerful elements for capturing application domain
semantics. These elements include consistency constraints, database views and triggers. The inclusion of trigger
definitions and view definitions in SQL was initially inspired by knowledge-base systems that capture and execute
both forward and backward rule systems (Stonebraker 1986). This is not to say that SQL-DDL is a knowledge
representation language; the semantics of most of its constructs are not directly comparable with constructs of an
ontology language. However, SQL DDL is capable of capturing some semantics of the domain modeled in a SQL
application. Such semantics can be extracted from the schema by identifying grammatical structure.
2.1 The Evolution of SQL Data Models
SQL is organized in three parts. The SQL data definition language (DDL), the SQL data manipulation language
(DML) and the SQL data query language (DQL). The DDL is used to define both the logical and physical
representation of data in a database. To date direct-mapping systems have been concerned with only the DDL. We
provide an overview of the DDL features as they have evolved and show they parallel the expressive hierarchy of
Semantic Web languages. Taking just some literary license, we can structure the chronological development of
increasingly powerful SQL dialects as layers of expressiveness that correspond precisely to the Semantic Web layer
cake.
The first SQL standard, SQL86-89 was limited. For the purpose of this paper, the following succinct description
suffices; starting from relational algebra, a standard set of data types were defined, abbreviated formal relational
notations were replaced full English words (SELECT, CREATE TABLE, etc), and the now familiar accoutrements
of human readable computer languages added (Figure. 7b). The often heard claim that SQL databases do not provide
for the encoding of data semantics is apt for this version of SQL (Pratt 1990).
SQL 92 added data integrity constraints: CHECK, PRIMARY KEY, FOREIGN KEY and UNIQUE [Pratt
1995]. This first extension already enables DBA’s to encode application semantics by connecting relations and
permitting us to know beforehand what values are allowed. In conjunction with the basic table definition of SQL 88,
this version of SQL is the one in common use today (Figure. 7c).
SQL 99 saw the addition of Triggers (SQL3). Triggers are forward-chaining rules whose predicates are
conditioned on changes to table content. Triggers are often used to maintain correctness and to implement
heterogeneous data integration (Ceri & Widom 1993). Figure. 7d is a trigger that maintains an invariant on the
5
database concerning salary inversion. SQL 2003 introduces XML-related features. We find, so far, that the XML-
related extensions enable only the encoding of syntactic properties specific to XML.
a) Relational Model
Before SQL
Employee (name, age)
b) Table Definition
SQL 86-89
CREATE TABLE employee (name VARCHAR(100), age INTEGER)
c) Constraints
SQL 92
CREATE TABLE employee(
name VARCHAR(100) PRIMARY KEY,
salary INTEGER NOT NULL,
type CHAR(8) CHECK(type IN ('TEMP','FULLTIME','CONTRACT'))
dept_name FOREIGN KEY (dept) REFERENCES department (name))
CREATE TABLE department(
name VARCHAR(100) PRIMARY KEY)
d) Triggers
SQL 99
CREATE TRIGGER sal_adjustment
AFTER UPDATE OF salary ON employee REFERENCING OLD AS OLD_EMP NEW AS NEW_EMP
FOR EACH ROW WHEN (NEW_EMP.SALARY > (OLD_EMP.SALARY *1.20))
BEGIN ATOMIC SIGNAL SQLSTATE ‘75001’(‘Invalid Increase: > 20%’); END
Figure. 7 Evolution of SQL DDL
2.2 SQL and the Semantic Web Layer Cake
Knowing the historical context, it is easy to decompose SQL-DDL into a stack representing increasing expressive
power. While analyzing the evolution of SQL-DDL with respect to the Semantic Web stack, we suggest a
correspondence between individual syntactic structures in SQL-DDL and the layers in the Semantic Web stack
(Figure. 8). A material result is a hierarchy of SQL-DDL grammars whose expressive power corresponds to the
stack of Semantic Web languages. This type of layer cake mapping methodology has been achieved between the
OBO (Open Biomedical Ontology) language and the Semantic Web stack (Tirmizi et al., 2009). We anticipate that
the organization of the Semantic Web with respect to the assignment of expressive power to each of its respective
layers embodies a larger organizing principal.
We observe that systems that translate SQL databases to the Semantic Web have different goals with respect to
how much of the expressive power of SQL-DDL targeted and similarly the amount of expressive power of the
Semantic Web is exploited. To be fair, the choices made by individual projects are at least as dependent on the
timeframe of the effort as they are on the research itself. OWL became a W3C recommendation in 2004, thus efforts
that predate that necessarily do not consider OWL although some of them consider other ontology languages. The
structure illustrated in SQL-DDL can express the domain semantics that the relational database represents by the
relationships between the relational database’s relations.
Figure. 8 Layer Cake Correspondence between SQL and the Semantic Web
We begin by demonstrating the implicit semantics that are in SQL-DDL and how they can be made explicit. An
example of a university SQL schema and its corresponding ontology is presented in section 3, thus demonstrating
how a SQL schema and ontology, of the same domain, are related. We continue to discuss incompatibilities between
relational databases and ontologies.
2.2.1 Relational Model and RDF: Mapping Relational Tuples to RDF
The first layer we encounter is the Relational Model, which predates SQL. This layer only expresses the syntactic
structure of the data stored in a relational database; therefore we can map it only to the data layer of the Semantic
6
Web: RDF. To export the content in a relational database to the Semantic Web, it is necessary to create an RDF
representation of this content.
RDF expresses data in triples as a labeled graph in the form of a Subject-Predicate-Object statement. The n-
tuples can be expressed as a series of triple statement in RDF. In Figure. 9, we show how the n-tuple can be
represented as an RDF graph by having the following triple statements: <T1, name, Juan> and <T1, age, 21>.
T1: Employee (Juan, 21)
Figure. 9 RDF representation of Figure. 7a
2.2.2 Table Definition and RDFS
The relational model offers sufficient semantics to create relations between the data. The first version of SQL
(SQL89), introduced the table definition, which creates a SQL schema for the data. Each column has a specific data
type. Likewise, RDFS is a schema for RDF data, which creates classes and properties.
An equally valid representation of a table definition is an n-dimensional graph, where n is the number of column
names. This graph representation is similar to RDFS, where it established a schema to represent data in RDF triple
form. Therefore, by utilizing this version of SQL, its domain semantics can be mapped only to the RDFS layer.
RDFS can represent the table definition (Figure. 7b) by having a class Employee with properties name and age. In
this example, the column name “age” becomes and rdf:Property as shown in Figure. 10.
Figure. 10 RDFS Representation of Figure. 7b
2.2.3 Constraints and OWL
Climbing the Semantic Web layer cake, more expressive power is encountered. OWL offers expressiveness that can
not be expressed in RDFS. Likewise, SQL99 and in conjunction with elements of SQL92, contains new components
that offer more domain semantics. By using PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE and
CHECK, domain semantics are more explicit. These semantics are richer than the basic table definition and
therefore should be mapped to a higher level in the Semantic Web layer cake. We have observed that the use of SQL
with all the possible constraints will map directly to OWL (hence the use of OWL class instead of RDFS class).
OWL is more expressive due to the fact that it can represent specific type of properties and restrictions. The main
similarity between database constraints and OWL is the possibility of using the database’s referential constraint to
establish relationships between tables as owl:ObjectProperty that connects objects. PRIMARY KEY, UNIQUE and
NOT NULL implies owl:FunctionalProperty and the enumerated CHECK constraint implies owl:oneOf (this will be
discussed in the next section).
Following the example in Figure. 7c, the schema and its constraints can be represented in OWL. In a general
assumption, a relation can be considered an OWL class except if the relation is a binary relation, therefore it is an
OWL object property. All attributes that are referential constraints are OWL object properties. Their domain is the
current relation and range is the referenced relation. Attributes that are not referential constraints are OWL data type
properties. In OWL, the “age” column name becomes an owl:DatatypeProperty.
2.2.4 Triggers and Rules
SQL99 introduces a new semantic component: triggers, as shown in Figure 7d. This new component can be
considered as business rules that guarantee the integrity of data in a relational database and derive new information
materializing it. Even though these rules are not used for inference, they may expose rules about the domain. For
example, the trigger in Figure. 7d, explains that the maximum salary increase is of 20%. Triggers like the previous
one may be analyzed to extract the rules and transform them to Semantic Web format. However, no direct mapping
approach integrates triggers in the transformation process. The integration of triggers into direct mapping is still an
open research question.
7
2.3 An Example of Direct Mapping Transformation
Consider a relational database in 3NF for a university (Figure. 11), where we show an example of how an ontology
can be extracted out of the SQL schema.
University Database Schema
create table PERSON {
ID integer primary key,
NAME varchar not null }
create table STUDENT {
ROLLNO integer primary key,
DEGREE varchar,
ID integer unique not null foreign key references PERSON(ID)}
create table PROFESSOR {
ID integer primary key,
TITLE varchar,
constraint PERSON_FK foreign key (ID) references PERSON(ID)}
create table DEPT {
CODE varchar primary key,
NAME varchar unique not null }
create table SEMESTER {
SNO integer primary key,
YEAR date not null,
SESSION varchar check in (‘SPRING’, ‘SUMMER’, ‘FALL’)}
create table COURSE {
CNO integer primary key,
TITLE varchar,
CODE varchar not null foreign key references DEPT(CODE)}
create table OFFER {
ONO integer primary key,
CNO integer foreign key references COURSE(CNO),
SNO integer foreign key references SEMESTER(SNO),
PID integer foreign key references PROFESSOR(ID),
CONO integer foreign key references OFFER(ONO)}
create table STUDY {
ONO integer foreign key references OFFER(ONO),
RNO integer foreign key references STUDENT(ROLLNO),
GRADE varchar,
constraint STUDY_PK primary key (ONO, RNO)}
create table REG {
SID integer foreign key references STUDENT(ID),
SNO integer foreign key references SEMESTER(SNO),
constraint REG_PK primary key (SID, SNO)
}
Figure. 11 Schema of a University Database
The PERSON table contains data about all the people, some of them may be students and present in STUDENT
table, and some may be professors and present in PROFESSOR table. The Dept table lists the departments in the
university where each department has a unique name, and the COURSE table lists the courses for every department.
The Semester table contains a list of semesters which have a year and one of the three seasons, Spring, Summer or
Fall, associated with them. A course could be offered in a particular semester with a particular professor, and
recorded in OFFER table.
Two offered courses could be co-offered, and recorded as a self-relation in the OFFER table. A student could
study an offered course, which is recorded in STUDY table. Also, a student could be registered in a semester with or
without taking a course, and this information is recorded in the Reg table.
For a domain expert, it is easy to recognize the concepts in this database structure that is in 3NF, and to identify
the semantics of their properties and different kinds of relationships that exist between these concepts. Figure 12
shows an ontology corresponding to the given schema, developed by a domain expert.
8
Domain Expert’s Ontology
Ontology(<urn:sql2owl>
ObjectProperty(<REG> domain(<STUDENT>) range(<SEMESTER>))
ObjectProperty(<REG_I> inverseOf(<REG>))
ObjectProperty(<COURSE.DEPTCODE> Functional domain(<COURSE>) range(<DEPT>))
ObjectProperty(<COURSE.DEPTCODE_I> InverseFunctional inverseOf(<COURSE.DEPTCODE>))
ObjectProperty(<OFFER.CONO> Transitive Symmetric domain(<OFFER>) range(<OFFER>))
...
DatatypeProperty(<COURSE.CNO> Functional domain(<COURSE>) range(xsd:integer))
DatatypeProperty(<SEMESTER.YEAR> Functional domain(<SEMESTER>) range(xsd:date))
DatatypeProperty(<SEMESTER.SESSION> Functional domain(<SEMESTER>)
range(oneOf("SPRING" "SUMMER" "FALL")) range(xsd:string))
...
Class(<PERSON> partial ...)
Class(<PROFESSOR> partial <PERSON> ...)
Class(<STUDENT> partial <PERSON>
restriction(<STUDY.RNO_I> minCardinality(0)) ...)
Class(<COURSE> partial restriction(<COURSE.DEPTCODE> cardinality(1))
restriction(<COURSE.CNO> cardinality(1)) ...)
...
)
Figure. 12 Parts of an ontology corresponding to the schema in Figure. 11, developed by a domain expert. The ontology is presented in OWL
Abstract Syntax.
It is apparent that an ontology does not correspond exactly to a relational database. Ontologies can and often
encode domain semantics that are not possible to be encoded in a relational database, or even in SQL-DDL.
Therefore, a translation of a SQL schema to an ontology will suffer of ambiguity, and several disparities, which are
explained in the following section.
3 Transformation Patterns between Relational Databases and Ontologies
Relational databases are capable of efficiently managing large amounts of structured data. Ontologies are very
useful for knowledge representation. Since these two models are aimed towards different requirements, it is
reasonable to expect some disparities among them in terms of basic assumptions. It is important to understand the
mismatches and to make educated choices when confronted with such problems. In the following sections, we
discuss some key issues that affect a direct transformation system. We show an example of how different
normalizations can affect the ontology that is being derived from the database. It follows by a discussion on why it is
hard to identify inheritance and property characteristics in relational schemas, followed by the effects of identifying
different types of relationships and constraints. Then we discuss the effect of the open world assumption in
ontologies, when a relational database with a closed world assumption is translated into an ontology.
3.1 Basic Transformations
Several components of relational databases can be seen as obvious transformations to an ontology. In most cases, a
relation is mapped to an ontological concept unless it is a binary relation. In this case, the binary relation would be
mapped to a property that connects two concepts. A foreign key connects two relations, therefore, the foreign key is
mapped to a property. All attributes that are not foreign keys are also mapped to properties. All direct mapping
systems follow these basic transformation patterns. As a result, we believe that this overlap indicate that the majority
of direct mapping approaches share rules that can be standardized.
9
Table 1. Summary of Relational Database direct mapping to RDFS and OWL
Relational Database
RDFS Ontology
OWL Ontology
Non-binary Relation
RDFS Class
OWL Class
Binary Relation
RDFS Property
OWL Object
Property
Column
RDFS Property
OWL Datatype
Property
Foreign Key
RDFS Property
OWL Object
Property
3.2 Database Normalization
Normalization is the process of organizing data in a database. The objective of normalization is to simplify the
maintenance of data integrity and reduce the amount of space used for the database. A common practice is to design
a relational database in Third Normal Form (3NF). In real-life scenarios, one may come across different types of
normalizations. For example, schemas may be de-normalized for performance reasons. Different normalizations
may lead to question the putative ontology that can be obtained from them. The same way a relational database can
be normalized in different manners, an ontology can be modeled in different ways. However, there is no right or
wrong way of modeling an ontology because it may depend on the context and/or the domain expert. Nevertheless,
when creating a putative ontology, we need to analyze the type of ontology that is derived depending on the
database normalization. The following examples show how the derived ontology appears if a database is in three
different normal forms, 1NF, 2NF and 3NF. Consider the following database in 1NF:
Table 2. Example of a database in 1NF
Author
AuthorID
AuthorName
1
John Smith
2
Jane Doe
Books
ISBN
Title
Publisher
City
Subject
AuthorID
123-456
ABC
Publisher
XYZ
Dallas
Science
Fiction
1
123-456
ABC
Publisher
XYZ
Dallas
Science
Fiction
2
987-654
XYZ
Publisher
ABC
Austin
Auto-
biography
2
Table 2 shows a database in 1NF. There is a table for Authors and Books. The Books table has a foreign key that
points back to the Author table. The corresponding ontology that could be derived from a database in 1NF is shown
in Figure 13.
Figure. 13 Ontology derived from database in 1NF from Table 2
The same way the database lacks higher normalization; a domain expert can consider that the ontology could be
modeled in a better way. For example, the publisher should be a class, and not just literal values. Now, lets consider
the database in 2NF, as shown in the following table
10
Table 3. Example of a database in 2NF
Author
AuthorID
AuthorName
1
John Smith
2
Jane Doe
Books
ISBN
Title
PublisherID
SubjectID
AuthorID
123-456
ABC
2
1
1
123-456
ABC
2
1
2
987-654
XYZ
1
2
2
Subject
SubjectID
Subject
Type
1
Science
Fiction
Fiction
2
Auto-
biography
Non-
Fiction
Publisher
PublisherID
Publisher
City
1
Publisher ABC
Austin
2
Publisher XYZ
Dallas
Figure. 14 Ontology derived from database in 2NF from Table 3
The derived ontology is shown in Figure 14, which shows two new classes. These can automatically generated if the
database is in 2NF. However, if we further normalize the database to 3NF, as shown in Table 4, the derived
ontology is the same ontology that is generated from the database in 2NF, because the table Authorship is
transformed into a relationship between Author and Books.
Table 4. Example of a database in 2NF
Author
AuthorID
AuthorName
1
John Smith
2
Jane Doe
Books
ISBN
Title
PublisherID
SubjectID
123-456
ABC
2
1
987-654
XYZ
1
2
Subject
SubjectID
Subject
Type
1
Science
Fiction
Fiction
2
Auto-
biography
Non-
Fiction
Publisher
PublisherID
Publisher
City
1
Publisher
ABC
Austin
2
Publisher
XYZ
Dallas
Authorship
ISBN
AuthorID
123-456
1
123-456
2
987-654
2
This example does not prove that a putative ontology from a 2NF or 3NF database is the same. However, it does
suggest that a putative ontology from a database may depend on the normalization.
A relational database can be represented in a logical model (UML, ER) or a physical schema (SQL DDL). As a
good software engineering practice, it is quite common to develop logical models for the relational schema, which
then gets translated to a physical schema. Even after deployment, a database undergoes modifications due to
changing application requirements. Such modifications are often not reflected in the logical model. Therefore, the
physical schema, easily expressed in SQL DDL, becomes the most accurate source for the structure of the database.
Direct mapping approaches usually assume that the input database is in 3NF, even though they can still be applied if
they are not in this normal form.
11
3.3 Inheritance Modeling
Given that a SQL schema contains relationships between entities that can be expressed using only foreign keys, we
find it necessary to identify patterns of foreign key definition that can express only the inheritance relationships. In
other words, given a foreign key definition between two entities, is it possible to say that a subclass relationship
exists between the entities involved? If such patterns exist, we can map them to subclass relationships in the
ontology.
Relational databases do not provide a mechanism to express inheritance. However, inheritance hierarchies can be
modeled in a variety of ways in SQL schemas. Our university schema example shows two different modeling
choices for inheritance. STUDENT and PROFESSOR entities are subclasses of PERSON, even though the
relationships have been modeled differently. We present some inheritance modeling possibilities through possible
foreign key patterns and discuss why some inheritance modeling choices are harder to identify automatically.
Foreign key is also the primary key: An example of this case is the PROFESSOR-PERSON relationship in our
university schema. This is a common pattern that identifies inheritance. An exception to this would be vertical
partitioning of tables for performance reasons, as in some data warehousing applications. However, with our
requirement of 3NF, such a scenario would not occur. Therefore, most direct mapping approaches automatically
identify inheritance modeled in this way.
Foreign key and primary key are disjoint: The STUDENT-PERSON relationship in our university schema is an
example of this pattern. This pattern does not uniquely identify inheritance, and therefore cannot be automatically
translated into an inheritance hierarchy in an ontology. A counterexample is the Course-Dept relationship
modeled in the same schema. In fact, this pattern is the most common one used for expressing one-to-many
relationships.
Foreign key is a subset of the primary key: This is another option for modeling inheritance in a relational
database. However, other relationships can also be modeled this way, and therefore it is not a good candidate for
automatic translation to an inheritance hierarchy in the ontology. A counterexample for this pattern is:
ORDER(ONo) ONo is the primary key
ORDERITEM(ONo,INo) (ONo,INo) is the primary key, ONo is a foreign key to Order
In a business domain, the relationship most likely means that an order item is a part of an order, instead of
representing an inheritance relationship between the two entities.
3.4 Symmetric and Transitive Relationships
SQL lacks the expressive power to enforce symmetry and transitivity in relationships. Expressing such properties of
relationships is natural to ontology languages like OWL.
The self-relation on the OFFER entity, that represents co-location of an offered course with another offered
course, has interesting characteristics. First, it is symmetric, because if an offered course A is co-located with an
offered course B, it means B is co-located with A as well. And second, it is transitive, which means that if A is co-
located with B, and B is co-located with C, then A is co-located with C.
Such a self-relationship may or may not imply a symmetric or transitive relationship. Consider the example:
Employee(ID,Name,MgrID), where ID is the primary key, and MgrID is a foreign key to the Employee table itself
that captures the manager’s ID. Clearly, this relationship is not symmetric, because if John is the manager of Peter,
that rules out the possibility of the same Peter being the manager of the same John.
Depending upon the domain semantics, the relationship may or may not be transitive. If an employee’s manager
means any other employee higher in the organization, then being a manager is a transitive relationship. If it means
only the immediate supervisor, then it is not a transitive relationship. The example shows that it is hard to identify
logical characteristics of relationships in a SQL schema without using the domain knowledge.
3.5 Value Constraints
The OWL language has a set of value constraints which puts constraints on the range of a property when applied to a
particular class description. The three value constraints are allValuesFrom, someValuesFrom and hasValue. The
OWL allValuesFrom is analogous to the universal quantifier of predicate logic while, the OWL someValuesFrom is
analogous to the existential quantifier of predicate logic. Consider the following example:
Employee(id, id_role) Admin(id, id_role) Role(id_role,role_name)
12
Employee(id_role) and Admin(id_role) are foreign keys to Role(id_role). One can consider this example as
having two different relationships: Employee has a role and an Admin has a role. Therefore, two Object Properties
would be created. Nevertheless, if both object properties result in meaning the same thing, then it would be
sufficient to have only one Object Property and have an owl:allValuesFrom with the union of the Employee and
Admin table. It is unclear on how to determine if these two Object Properties are the same.
Different value constraints may lead to different interpretations. For example, if we consider an Open World
Assumption, then a property should only have a domain and not a range. If a property has a domain and a range,
then we are considering a Closed World Assumption. We could tighten the closed world in a property by not only
having a domain and range, but also having a allValuesFrom constraint.
3.6 Check Constraint
Check constraints are conditions that validate the data that is being inserted in a table. The enumerated CHECK
constraint can be represented in OWL using owl:oneOf and rdf:List. For example, if we take attribute Session from
the Semester table in Figure 12; its representation in OWL would be the following:
oneof(‘Spring’ ‘Summer’ ‘Fall’)
Nevertheless, the check constraint could represent inheritance. In this case, a “Spring Session” is also a type of
“Session”. On the other hand, OWL is not expressive enough to represent all the possibilities that the semantics of
CHECK offers. Consider having a CHECK constraint to guarantee that the value of a column degree is between 0
and 360. This type of constraint cannot be represented in OWL.
3.7 Open World vs. Closed World
Relational databases usually operate under the closed world (CW) assumption. This means that whatever is not in
the database is considered false. CW is essential for important database concepts like integrity constraints and data
validation (Drummond & Shearer 2006). On the other hand, a knowledge-base community like the Semantic Web
has an open world (OW) approach where whatever is not in the knowledge base is considered unknown. This
assumption is natural for knowledge bases that often contain incomplete knowledge, and grow with the manual
discovery of new domain knowledge and automatic inferences.
Due to this difference, the concept of a constraint has very different meanings in the two worlds (Motik et al.,
2007). In a database setting, a constraint is mainly used for validation and prevents incorrect data from entering the
database. In contrast, in an ontology, a constraint expresses some characteristics of classes or relationships but does
not prevent assertion of any facts. Due to these constraints, some assertions may even result in unintuitive
inferences.
Consider this example: In our university database, the relationship between a course and a department is
expressed by a foreign key constraint. The foreign key constraint will allow the Course relation to have the record
(CS386,CS,Databases) where CS is a department. However, the record (CS386,John,Databases) John is a student
will not be allowed because it violates the integrity constraint. In the ontology, the same constraint appears as
object property CODE, with range restriction of Dept on the relationship. Unlike database constraints, the ontology
constraint will not only allow the assertion of the triple CODE(CS386,John), it will also infer that John is an
instance of Dept, because of the range restriction on CODE property.
While there are obvious differences between closed and open worlds, open worlds can be closed by explicitly
stating required negations. OWL provides a way to state such facts by providing constructs to express disjoints.
When developing an ontology based on a SQL schema, it is very important to keep these differences in mind.
The question of whether the open world should be closed or not, depends upon the domain and application
requirements. If the ontology is for use within a particular application, it might make sense to close the world,
whereas in a data integration setting, it might make more sense to have an open world.
4 Key Combinations of a Direct Mapping Transformation
Since we are inferring semantic properties formed by foreign keys, we have exhaustedly considered all possible
primary and foreign key combinations. This issue is complicated by the possible presence of compound keys (keys
composed of multiple attributes) and the possibilities of multiple overlapping keys.
13
We present all possible combinations of primary and foreign keys in a SQL DDL to ontology transformations
where the rules of the transformation system cover the entire range of possible relations that can be described in a
SQL schema. While it is trivial to translate relations into ontology classes, the existence of foreign keys represents
relationships between corresponding classes, and poses a challenge for any transformation system. Multiple foreign
keys may be present in a table, and each of them may be in a different form, representing different kinds of
relationships, e.g. one-to-one, one-to-many etc., between the entities. The interaction of the foreign keys with
primary keys provides clues to the properties of these relationships.
Theorem 1
The space of relations describable in SQL DDL using various combinations of primary key and foreign key
references between the relations can be partitioned into 10 disjoint cases of key combinations.
The proof involves a syntactic enumeration of the cases and a closure operation over the space of relations. Fig15
provides a useful summary of the theorem and its proof.
Proof: Briefly, we first partition the space by examining the number of foreign keys contained in relations. All
relations without any foreign keys can be easily translated into classes in an ontology. Similarly, relations with more
than two foreign keys usually represent N-ary relationships, and the rules for N-ary relationships are applicable. The
cases for one or two foreign keys are more interesting and give rise to more possibilities including binary relations,
inheritance and new classes. However, for each possible branch, we have carefully defined sets of rules for
producing ontology classes and properties.
A table in a relational database can have either a Primary Key and/or Foreign Key. Both keys can include a number
of attributes, x, where x = 0…n. There can only be one Primary Key in a table, whereas there can be one or more
Foreign Keys. Therefore, our initial starting point is:
1. PK: a relation only has a Primary Key
2. C-PK: a relation only has a composite Primary Key
3. S-FK: a relation only has one Foreign Key
4. N-FK: a relation has at least two or more Foreign Keys
This can be represented in the following grammar:
E PK + T | C-PK +T
E S-FK
E N-FK
T S-FK | N-FK
Space of
relations
0 FKs
FK=PK
FKPK
i FKi=PK
¬(i FKi=PK)
i FKi=PK
FKiFKj=PK (ij)
Otherwise
Has non-FK attributes
All attributes in FKs
All attributes in FKs
Has non-FK attributes
1 FK
2 FKs
>2 FKs
Fig.15 The tree describes the complete space of relations when all possible combinations of primary and
foreign keys are considered
14
By applying the closure operation for LR(0) item sets (Aho et al., 2006), the following elements are obtained:
5. PK + S-FK: a relation has a Primary Key and only one Foreign Key
a) PK = S-FK: the Foreign Key is the Primary Key
b) PK S-FK = 0: the Foreign Key and the Primary Key do not share any attributes
6. PK + N-FK: a relation has a Primary Key and two (2) Foreign Keys
a) PK N-FK = 0: the Foreign Key and the Primary Key do not share any attributes
b) PK
N-FK: one of the Foreign Keys is also the Primary Key
7. PK + N-FK: a relation has a Primary Key and more than two (> 2) Foreign Keys
c) PK N-FK = 0: the Foreign Key and the Primary Key do not share any attributes
d) PK
N-FK: one of the Foreign Keys is also the Primary Key
8. C-PK + S-FK: a relation has a Composite Primary Key and only one Foreign Key.
a) C-PK S-FK = 0: the Foreign Key and the Primary Key do not share any attributes
b) S-FK
C-PK: the Foreign Key is part of the Primary Key
9. C-PK + N-FK: a relation has a Composite Primary Key and two (2) Foreign Keys
a) C-PK N-FK = 0: all the Foreign Keys and the Primary Key do not share any attributes
b) N-FK
C-PK: all the Foreign Keys are part of the Primary Key
c) C-PK N-FK
0, C-PK N-FK
0, N-FK C-PK
0: The Foreign Keys and Primary Key
share common attributes
10. C-PK + N-FK: a relation has a Composite Primary Key and more than two (> 2) Foreign Keys
d) C-PK N-FK = 0: all the Foreign Keys and the Primary Key do not share any attributes
e) N-FK
C-PK: all the Foreign Keys are part of the Primary Key
f) C-PK N-FK
0, C-PK N-FK
0, N-FK C-PK
0: The Foreign Keys and Primary Key
share common attributes
Therefore, a direct mapping transformation system should be complete with respect to all the cases of the theorem of
key combinations.
5 Direct Mapping Approaches
We will use the suggested correspondence between SQL and the Semantic Web as a basis for characterizing and
comparing the different mapping efforts. In addition, we will look to the 10 cases determined by the theorem of key
combinations presented in the last section to help organize the discussion. The coverage of the 10 cases by each
method is addressed in Section 7.
Some surveyed approaches are not formally presented, but rely heavily on examples and written descriptions of
the transformations; very much like the introduction of this paper. As a result, developing precise meanings is
sometimes difficult. We will acknowledge the overlap between the surveyed systems, particularly when such
overlap helped motivate the inclusion of a transform in the consolidated system. We will offer counter-examples to
any specific transformation where the syntactic underpinning of the the preconditions does not uniquely determine
the consequent. The analysis is summarized in Table 5. Some cells may be empty and this is because the system
does not offer any transformation from the given SQL construct. The work of Lubyte et al. is not shown in Table 7
because their input is not SQL-DDL and their output is not in RDFS or OWL.
It is important to note that the relational model of a relational database is the basis to obtain RDF content
(Figure. 9). Data is to RDF as schema is to ontology (RDFS or OWL). For there to be data, it has to be stored in a
relational database that is defined with a schema. Likewise, RDF data is an instance of a specific triple schema that
is part of an ontology. After creating an ontology from the database schema, the data in the relational database can
be extracted and transformed to ontological instances. All the following surveyed approaches accomplish this
objective.
The solution fails when the relational database is updated. Upon an update, the extracted data as ontological
instances will not represent the current status of the relational database. Another option involves translating
SPARQL (RDF query language) to SQL, where one would not have to have to separate sets of data: the relational
data and the ontological instances, instead, the SPARQL query can be translated into SQL, extract the data in the
relational database, and then translate it, using the ontology that was automatically created, and return the data in
RDF format.
15
Table 5. Output of each direct mapping approach given the SQL input
Stojanovic et al
Astrova et al
2004
Buccella et al.
Li et al.
Shen et al.
Astrova et al 2007
Consolidated
System
Ontology
Language
RDFS / F-Logic
RDFS / F-Logic
OWL
OWL
OWL
OWL Full
OWL DL
<table name>
RDFS Class
RDFS Subclass
RDFS Property
RDFS Domain
RDFS Range
RDFS Class
RDFS Subclass
RDFS Property
RDFS Domain
RDFS Range
OWL Class
RDFS Subclass
OWL
ObjectProperty
RDFS Domain
RDFS Range
OWL Class
RDFS Subclass
OWL
ObjectProperty
RDFS Domain
RDFS Range
OWL Class
RDFS Subclass
OWL
ObjectProperty
RDFS Domain
RDFS Range
OWL Class
RDFS Subclass
OWL
ObjectProperty
RDFS Domain
RDFS Range
OWL Class
RDFS Subclass
OWL
ObjectProperty
RDFS Domain
RDFS Range
<column name>
RDFS Property
RDFS Domain
RDFS Property
RDFS Domain
OWL Datatype
Property
OWL Functional
Property
OWL
allValuesFrom
OWL Datatype
Property
OWL Object
Property
RDFS Domain
OWL Datatype
Property
OWL Object
Property
RDFS Domain
OWL Datatype
Property
OWL Object
Property
RDFS Domain
OWL Datatype
Property
OWL Object
Property
OWL Functional
Property
RDFS Domain
<data type>
RDFS Range
RDFS Range
OWL Functional
Property
OWL
AllValuesFrom
XML Schema
Datatypes
RDFS range
RDFS range
OWL Maximum
Cardinality of 1
RDFS range
OWL Maximum
Cardinality of 1
OWL Functional
Property
RDFS range
XML Schema
Datatypes
<referential
constraint
definition>
RDFS Property
RDFS Property
OWL Object
Property
OWL Functional
Property
OWL InverseOf
OWL Object
Property
OWL Minimum
Cardinality of 1
OWL Maximum
Cardinality of 1
OWL Object
Property
OWL Minimum
Cardinality of 1
OWL Maximum
Cardinality of 1
OWL
AllValuesFrom
OWL Object
Property
OWL Cardinality of
1
OWL Symmetric
Property
OWL Transitive
Property
OWL Object
Property
OWL Functional
Property
OWL InverseOf
OWL Minimum
Cardinality of 0
<primary key
definition>
F-Logic axioms
F-Logic axioms
OWL Cardinality
of 1
OWL Minimum
Cardinality of 1
OWL Maximum
Cardinality of 1
OWL Minimum
Cardinality of 1
OWL Maximum
Cardinality of 1
OWL Inverse
Functional Property
OWL Minimum
Cardinality of 1
OWL Cardinality of
1
OWL Functional
Property
16
<unique
constraint
definition>
F-Logic axioms
F-Logic axioms
OWL Maximum
Cardinality of 1
OWL Maximum
Cardinality of 1
OWL Inverse
Functional Property
OWL Inverse
Functional Property
<not null
constraint
definition>
F-Logic axioms
F-Logic axioms
OWL Cardinality
of 1
OWL Minimum
Cardinality of 1
OWL Minimum
Cardinality of 1
OWL Minimum
Cardinality of 1
OWL Cardinality of
1
<check
constraint
definition>
OWL Datatype
Property
OWL hasValue
OWL oneOf
OWL Datatype
Property
OWL oneOf
17
Stojanovic et al.
Stojanovic et al. published the first approach that aimed at extracting domain semantics from SQL-DDL and
transforming it to a Semantic Web representation (Stojanovic et al., 2002). Their primary contribution is an
approach for an (automated) migration of data-intensive websites into the Semantic Web”. It provides rules for
translation of relational schemas to F-Logic and RDF Schema. Their rules consist of creating classes, subclasses and
properties. The only constraints that Stojanovic utilizes are Foreign Keys, which are used to create ontological
relationships. Stojanovic et al.’s work formally defines rules for identification of classes and properties in relational
schemas; the target language is RDF Schema. Its mapping process consists of four steps:
First, capture information from a relational schema through reverse engineering (relations, attributes, attributes
types, primary keys, and foreign keys/inclusion dependencies). Continue by mapping database entities into
ontological entities by a set of mapping rules by alignment of top level terms (which relation name corresponds to
which concept name), relations to concepts and then relation attributes to concept attributes. The mapping rules
consist of identifying Concepts, Inheritance and Relationships. A relationship is a concept except if the relation
expresses a many-to-many relationship. Furthermore, if information is spread across several relations; in this case all
the relations can be integrated into one concept. Inheritance is an inclusion dependency between two relations and
both relations are concepts. This rule conflicts when information is spread across several relations. In this case, the
user must decide which rule to apply, thus making this effort semi-automatic. Figures 16 and 17 represent the
translation of SQL-DDL to RDFS. In this example, two tables that share the same primary key could be mapped
either to one same concept or to subclass relationship. The system depends on a person to resolve the ambiguity. In
the figure, it is mapped to a subclass relationship.
create table STUDENT {
STUDID integer primary key,
NAME varchar}
create table PHDSTUDENT {
STUDID integer primary key,
YEAR date not null }
Figure. 16 SQL-DDL representation of Student and PhD Student
Figure. 17 RDFS result of translating the SQL-DDL in Figure. 15
with Stojanovic et al.’s rules and the SQL-DDL in Figure. 17 with
Astrova et al.’s rules
A many-to-many relationship, one-to-many relationship and one-to-one relationship are converted into properties. If
n-ary relationships exist, these must be transformed into further concepts and a set of binary relations. After the
mappings are manually evaluated, validated and refined, the data can be migrated by creating ontological instances
based on tuples of the relational database. The process consists of two steps: create the instances with a unique
identifier and translate all the attributes, except for foreign keys and then establish relations between instances using
the information contained in the foreign keys in the database tuples
Astrova et al 2004
Astrova et al’s initial approach (Astrova 2004) is very close to (Stojanovic et al., 2002). The motivation for their
approach “is to migrate from data-intensive Web pages into the Semantic Web”. It is also based on a reverse
engineering approach using SQL-DDL as the relational database model and transforming it into an RDFS ontology.
They differentiate themselves from Stojanovic et al. by stating that theyassume data equality and data inclusion,
thus being able to extract only a subset of semantics embedded within a relational database”. Astrova et al.
acknowledges that “hidden” semantics can be discovered by analyzing data overlap (intersection) and data
disjointedness (no intersection).
Astrova et al.’s transformation systems analyzes the key, data and attribute correlations to extract conceptual
schema, which expresses the semantics of a relational database. This process uses a relational database in third
normal form (3NF) and consists of the following classifications and mappings. Firstly, relations can be classified as
either Base (independent of other relations), Dependent (the primary key of a relation depends on another relation,
meaning that it is also a foreign key) or Composite (not Base or Dependent). Secondly, all relations are concepts
except if they are a Composite relation. In this case, it can be mapped to a concept, property or an inheritance. N-ary
18
relations become concepts. Third, all attributes are properties except if they are foreign keys or primary keys and
foreign keys. Then they are mapped to relationships.
Relations can be mapped by a combination of key, data and attribute correlation. Analyzing the different foreign
and primary key combinations of attributes lead to determine if a relation gets mapped to a concepts, properties or
relationships. When the key of the relation and its data are equal across two different relations, but the attributes
differ (Key and Data equality and Attribute disjointedness), the relationship expresses vertical partitioning and both
relations become one concept. However, when the key and attributes are equal across two different relations but the
data is different (Key and Attribute equality and Data disjointedness), the relationship expresses horizontal
partitioning and both relations become one concept. Furthermore, when the keys between two relations are equal
and the data of one relation is included in the other relation (Key equality and Data inclusion), the relationship
between both relations expresses single inheritance. For example, the SQL-DDL in Figure 18 would be transformed
to the RDFS in Figure 17. If the keys are equal across two relations and there is data overlap and data inclusion
between both relations (Key equality, Data overlap and Data inclusion), a new concept is discovered which would
express multiple inheritance. Figure 16 is an example of the SQL-DDL that represents key equality, data overlap and
data inclusion and it would be transformed to the RDFS in Figure 19. As it can be seen, Figure 19 creates a class that
is not necessary, therefore this rule may lead into ambiguity. Moreover, a new concept is also discovered if the keys
are equal, the data is disjoint and the attributes overlap between two relations. In addition, equal keys, data and
attributes between relations expresses a diamond-shape inheritance.
create table STUDENT {
STUDID integer primary key,
NAME varchar}
create table PHDSTUDENT {
STUDID integer primary key references STUDENT,
YEAR date not null }
Figure. 18 Another SQL-DDL representation of Student and PhD
Student
Figure. 19 RDFS result of translating the SQL-DDL in Figure. 15
with Astrova et al.’s rules
In summary, we have explained the two different approaches to transform SQL-DDL to RDFS. These
approaches are very similar because given the same SQL-DDL input, the resulting RDFS is the same. However
there are several mismatches that have been explained in the previous examples: different SQL-DDL can translate to
the same RDFS constructs and the same SQL-DDL can translate to different RDFS constructs. Since these systems
are not formalized transformation systems, they are prone to different interpretations and ambiguities.
Buccella et al
Buccella et al. published the first approach that takes SQL-DDL and transforms it to OWL. The transformation
system presented is not formalized but is based on expository examples. They cover five cases: tables without
constraint, tables with exactly one foreign key, tables with exactly two foreign keys, tables with more than two
foreign keys, and check constraints.
The first case translates tables without constraints are mapped to an OWL Class. All the attributes of the table
are mapped to OWL Functional Datatype Property. If the attribute is NOT NULL, it is mapped to a cardinality
constraint of 1. Primary Keys already have the NOT NULL implicit, therefore they are also mapped to a cardinality
constraint of 1. The Datatype properties do not have domain and range defined because they can be used by other
classes. Furthermore, the restriction owl:allValuesFrom is applied with a String. This approach would lead to
ambiguity. Take the following example in the following example.
create table PERSON {
ID integer primary key,
NAME varchar not null }
create table DEPT {
CODE varchar primary key,
NAME varchar unique not null }
Figure. 20 SQL-DDL of Person table and Dept table
19
Following this rule, when a Datatype Property “hasName” has been created for the Class PERSON, it would not
have the domain PERSON and Range string, because another class could reuse it. The table DEPT also has an
attribute name, therefore the previous Datatype Property created from the table PERSON could be reused for the
new class DEPT. Because it is not explicit that the class PERSON and class DEPT are disjoint, it may be inferred
that a name of a Dept is also a name of a Person which is false. Hence, due to the Open World Assumption, this rule
is ambiguous.
The next case is when a table has exactly one foreign key. This table is then mapped to an OWL Class. The non
foreign key attributes are mapped following the previous rules. There are now two sub cases: 1) if the foreign key is
a subset of the primary key and 2) if the foreign key is not a subset of the primary key. For the first sub case, the
primary key that is not a foreign key is mapped to an OWL functional datatype property with cardinality of 1 and
with the owl:allValuesFrom restriction, with the attributes datatype. The foreign key is mapped to a Functional
object property. This object property does have a domain and range, where the current table is the domain, and the
relationship of the foreign key as the range. The cardinality of 1 is also added and the inverse object property is also
created. This rule is one of the incompatibilities discussed in section 8.1 (Inheritance Modeling). For the second sub
case, the foreign key is mapped to a Functional Object Property with minimum cardinality of 1, the domain is the
current table and the range is the foreign key table. The Inverse property is also created which is not functional.
The third case is when a table has exactly two foreign keys. There are two sub cases: 1) it does not have any
additional attributes or 2) it does have additional attributes. For the first sub case, the table becomes an object
property with domain and range being the foreign key tables respectively. The second sub case would map the table
to an OWL Class and map the attributes to properties by following the respective rule.
If a table has more than two foreign keys, it is mapped to an OWL class and all the attributes are also mapped to
Datatype properties by following the respective rule.
Finally the CHECK constraint is considered. If check constraint has LIKE A%, a class “beginning_with_A” can
be created and that attribute can have a restriction that all values from that property have to be from that new class.
Another possibility is to have the enumerated check constraints which is then mapped to owl:oneOf
Li et al
Li et al’s approach is the first one to transform SQL-DDL to OWL with a combination of some formal notation and
English language. The rules generate ontological classes, properties, hierarchy, cardinality and instances.
Li et al presents a rule to learn an OWL Class which is if several relations are used to describe one entity by
sharing the same primary key, then they are all integrated into one ontological class. This rule is taken in account if
vertical partioning is considered in the relational schema. However, this same rule is used to learn inheritance. Thus,
applying this rule is subject to human interaction.
A rule is presented which creates a “has-part” and “is-part-of” Object Property, if a relation has a foreign key as
a primary key (Figure. 5). Nevertheless, this rule can also be used to represent hierarchy (Figure. 6). Therefore this
rule can have a double meaning. Furthermore, if a relation represents a n-ary relationship, then there is an object
property that connects every single relation with the other relations in the n-ary relationship. Taking this approach
can produce unnecessary relationships. Consider the example with a 3-ary relation from Figure. 11: Professor,
Course and Semester under the relation Offer shown in Figure. 21. This rule would create the relationships
Professor-Course, Professor-Semester, Course-Semester. Is the relationship Professor-Semester needed? Maybe the
other relationships were already created, and now it is repeated.
Figure. 21. Translating the Offer relation of Figure 11 following Li et al
20
OWL Cardinalities are learned from the constraints of attributes in the relations. If an attribute is a primary key
or foreign key then the minimum and maximum cardinality is 1. If an attribute is NOT NULL, the minimum
cardinality is 1. If an attribute is UNIQUE, the maximum cardinality is 1.
We believe that their shortcomings are due to lack of a formal system and thorough examination of examples
capturing a variety of modeling choices in various domains
Shen et al
Shen et al. (Shen et al. 2006) follows Li et al’s learning rules with slight variations. Shen et al does not include Li et
al’s rules of identifying a “has-part” relationship. Furthermore, it only includes a rule of identifying 3-ary
relationships. However it adds the use of OWL allValuesFrom restriction. This is possible if a foreign key is mapped
to an OWL Object Property, then the object property has an OWL allValuesFrom restriction, which refers to the
corresponding inclusion dependency. This may lead to a disparity discussed previously in Section 3.4.
Astrova et al 2007
Astrova et al. (Astrova et al. 2007) provides expository rules and examples to describe a system for automatic
transformation of a relational schema to an OWL ontology, which discovers more semantics than their previous
effort (Astrova et al. 2004). However, the output from this system does not conform to OWL DL restrictions.
Astrova et al’s approach is to map the relational model and the ontological model, which represent the relational
database and the ontology respectively.
A main difference between Astrova et al. and Li et al’s rules is the mapping of n-ary relationships. If the table is
composed of foreign keys to three other tables (3-ary relationship), this table is mapped to a class, instead of creating
object properties between the relationships, as shown in Figure. 22.
Figure. 22 Translating the Offer relation of Figure 11 following Astrova et al
Columns become OWL Datatype properties with a maximum cardinality of 1 unless it is a foreign key. The
property can also be defined as functional, which is the same as saying that the maximum cardinality is 1.
SQL data type is mapped to XML Schema Data types (XSD). If a CHECK constraint that verifies that an
integer is greater than 0 is used, then the XSD used is positiveInteger.
SQL constraints are mapped as follows. The attribute with a UNIQUE constraint is mapped to an inverse
functional property. This rule cannot be true if the target ontology is OWL DL because Inverse Functional
Properties can only be applied to Object Properties and not Data Type Properties (OWL Guide 2004). UNIQUE
describes a key constraint, and would lead to computational intractability even in realistic ontologies, and are not
supported by any implemented OWL reasoners (Lutz et al., 2004). If the ontology is OWL Full, then it can be
accepted.
Attributes with a NOT NULL constraint are mapped to a minimum cardinality of 1. PRIMARY KEY
constraints are mapped to OWL Inverse Functional Property (which encounters the same problem explained
previously) and a minimum cardinality 1.
If the foreign key is part of the primary key, it is also mapped to an object property accompanied with its
cardinality of 1. However, Li et al assigns this case by translating it into a “has-part” Object Property. This same
model can be used for inheritance, as explained in section 8.1. Finally, by taking vertical partioning in account, if the
foreign key is the primary key, then it is mapped to an inheritance.
If a foreign key is a reference to its same table, then it is considered a Symmetric Property. This rule cannot be
always applied, which has been explained in Section 8.2. If a column in a table is a foreign key to the same table
21
accompanied by ON DELETE CASCADE, the relationship consists of a whole part, where the part cannot exist
without the whole; therefore it is a transitive property.
CHECK constraints can be mapped several ways. If a column has a CHECK constraint to a single same value
for all instances, then it is mapped to OWL hasValue restriction. Even though this translation makes sense, this rule
appears to be ambiguous because it depends on a specific example, which is presented in the following example.
CREATE TABLE Project{
type VARCHAR CHECK (type =’Software’)}
<owl:hasValue rdf:datatype=”&xsd;string”>Software</owl:hasValue>
If the CHECK constraint is with enumeration, then it can be mapped to an enumerated data type owl:oneof.
Astrova et al. identifies more semantics from a relational databases schema as compared to Li et al. However,
the output from this system does not conform to OWL DL restrictions. Since the rules have not been formally
defined, the system is susceptible to ambiguities.
Lubyte et al
The work presented by Lubyte & Tessaris (2007) is an ontology extraction algorithm from a relational database and
proven to be an equivalence preserving schema transformation. Therefore, there is no information loss and the
extracted ontology can be used to access the data or evaluate queries correctly (Lubyte & Tessaris 2009). However
as mentioned in the introduction, Lubyte et al. considers the input as a logical schema (such as UML or ER) instead
of a physical schema. Furthermore, the output is not in a Semantic Web language, instead it is to DLR-DB ontology
language. This approach consists of two steps: first it classifies relations by analyzing the constraints and finally the
actual ontology is generated.
In the first step, relations can be classified as one of the following 4 cases: base relation is when the primary key
and foreign key of a relation do not share attributes; specific relation is when a primary key is among the foreign key
and either there is only one foreign key (the primary key and foreign key are the same attribute) or the relation is
referred to by another relation; relationship relation is when the primary key is composed by all the foreign keys.
Finally, all other relations are classified as ambiguous.
Once the relations have been classified, the second step of this approach starts: generating the ontology. The
algorithm presented in (Lubyte & Tessaris 2007) shows how the associations between each of the classified relations
are generated. The base and specific relations become the equivalent to ontological classes. Associations between
the newly formed classes are determined by an attribute that is not a primary key and is a foreign key in one relation
referencing another relation. Additional axioms are generated depending if the values are not nulls or unique.
Inheritance is identified when an ontological class, derived from a base or specific relation has an attribute that is a
primary key and a foreign key that references another relation. In addition, each exclusion dependency on the set of
specific relations produces a disjointness axiom and every covering constraint on the set of relations produces a
covering axiom. Each relationship relation is translated into an ontological class and the foreign key of the relation is
an association to another ontological class. For the case of ambiguous relations, the heuristic in their algorithm
“prefers” to generate an inheritance relationship, however the final user can determine the correct structure.
Even though (Lubyte & Tessaris 2009) also classifies relations based on the combination of primary keys and
foreign keys, it is not possible to consider if this approach is complete with respect to all the cases of Theorem 1
because it may miss cases due to the fact that its input is a logical schema. For example, this approach does not
consider the case if a relation has exactly two foreign keys and they are both primary keys. In this case, the relation
is mapped to an association itself between two classes, because this relation denotes a relationship between two
other relations. In a logical schema, this binary relation would not exist because it would be represented as a direct
link between the two relations. However, when translating the logical schema to the physical schema, this relation
appears. Due to the fact that this work does not target a Semantic Web language, it is not possible for it to be
completely evaluated with respect to all the other previously surveyed approaches.
6 A Consolidated System of Direct Mapping of a Relational Database to the Semantic Web
Building on the related work described in this paper, Tirmizi et al. (2008) defines a consolidated system for
automatic transformation of relational databases into OWL ontologies defined in first order logic (FOL) (Tirmizi et
al., 2008).
22
6.1 Assumptions
In order to translate a relational schema into an ontology, we make the following assumptions:
The relational schema, in its most accurate form, is available in SQL DDL. As a good software engineering
practice, it is quite common to develop logical models for the relational schema. However, even after
deployment, a database undergoes modifications due to changing application requirements. Such modifications
are often not reflected on the logical models. Therefore, the physical model, easily expressed in SQL DDL,
becomes the most accurate source for the structure of the database.
The relational schema is normalized, at least up to third normal form. While all databases might not be well
normalized, it is possible to automate the process of finding functional dependencies within data and to
algorithmically transform a relational schema to third normal form [Du et al. 1999, Wang et al. 2000].
6.2 Predicates and Functions
We have defined a number of predicates and functions to aid the process of defining transformation rules in first
order logic.
There are two sets of predicates in our system. RDB predicates test whether an argument (or a set of arguments)
matches a construct in the domain of relational databases. Such predicates are listed below and the examples are
with respect to Figure. 11:
Rel(r)
-
r is a relation (or table) identified by CREATE TABLE statement
Attr(x,r)
-
x is an attribute in relation r
NN(x,r)
-
x is an attribute (or a set of attributes) in relation r with NOT NULL
constraint(s); for example: NN(NAME,PERSON) holds
Unq(x,r)
-
x is an attribute (or a set of attributes) in relation r with UNIQUE constraint; for
example Unq({NAME},DEPT) holds
Chk(x,r)
-
x is an attribute in relation r with enumerated list (CHECK IN) constraint; for
example Chk(SESSION,SEMESTER) holds
PK(x,r)
-
x is the (single or composite) primary key of relation r identified using the
PRIMARY KEY constraint; for example: PK({OFFER,SID},STUDY) holds;
also: PK(x,r)
Unq(x,r)
NN(x,r)
FK(x,r,y,s)
-
x is a (single or composite) foreign key in relation r and references y in relation
s; for example: FK({ID},STUDENT,{ID},PERSON) is satisfied
NonFK(x,r)
-
x is an attribute in relation r that does not participate in any foreign key; for
example: NonFK(NAME,DEPT) holds, NonFK(SID,REG) does not hold
Ontology predicates test whether an argument (or a set of arguments) matches a construct that can be represented in
an OWL ontology. These predicates are:
Class(m)
-
m is a class
ObjP(p,d,r)
-
p is an object property with domain d and range r
DTP(p,d,r)
-
p is an data type property with domain d and range r
Inv(p,q)
-
when p and q are object properties, p is an inverse of q
FP(p)
-
p is a functional property
IFP(p)
-
p is an inverse functional property, i.e. inverse of a functional property
Crd(p,m,v)
-
the (maximum and minimum) cardinality of property p for class m is v
MinC(p,m,v)
-
the minimum cardinality of property p for class m is v
MaxC(p,m,v)
-
the maximum cardinality of property p for class m is v
Subclass(m,n)
-
m is a subclass of class n
The constructs represented by ontology predicates are described as they appear in the rules mentioned in the
upcoming sections of this paper.
We have also defined the following functions:
fkey(x,r,s)
-
takes a set of attributes x, relations r and s, and returns the foreign key defined
on attributes x in r referencing x; undefined if there is no such foreign key
23
type(x)
-
maps an attribute x to its suitable OWL recommended data type (we discuss
data types in more detail in a later section)
list(x)
-
maps an attribute x to the list of allowed values for it; this function is applicable
only to attributes that have a CHECK IN constraint defined on them, i.e.
Chk(x,r) is true
In addition to the predicates and functions listed above, we describe the concept of a binary relation, written BinRel,
as a relation that only contains two (single or composite) foreign keys that reference other relations. Such tables are
used to resolve many-to-many relationships between entities. Using RDB predicates, we define BinRel as follows:
Rule Set 1:
BinRel(r,s,t)
Rel(r)
FK(xtr,r,_,t)
FK(xsr,r,_,s)
xtr
xsr
Attr(y,r)
¬
NonFK(y,r)
FK(z,r,_,u)
fkey(z,r,u)
{fkey(xsr,r,s),fkey(xtr,r,t)}
This rule states that a binary relation r between two relations s and t exists if r is a relation that has foreign keys to s
and t, and r has no other foreign keys or attributes (each attribute in the relation belongs to one of the two foreign
keys). Note that there is no condition that requires s and t to be different, allowing binary relations that have their
domain equal to their range.
6.3 Transformation Rules and Examples
In this section we present rules and examples for transformation of a relational database to an OWL ontology.
6.3.1 Producing Unique Identifiers (URIs) and Label
Before we discuss the transformation rules, it is important to understand how we can produce identifiers and names
for classes and properties that form the ontology.
The concept of globally unique identifiers is fundamental to OWL ontologies. Therefore, each class or property
in the ontology must have a unique identifier, or URI. While it is possible to use the names from the relational
schema to label the concepts in the ontology, it is necessary to resolve any duplications, either by producing URIs
based on fully qualified names of schema elements, or by producing them randomly. In addition, for human
readability, RDFS labels should be produced for each ontology element containing names of corresponding
relational schema elements.
For the purposes of this paper, we have not used fully qualified names in our examples. When needed, we
append a name with an integer to make it unique, e.g. ID1, ID2 etc.
6.3.2 Transformation of Data Types
Transformations from relational schemas to ontologies require preserving data type information along with the other
semantic information. OWL (and RDF) specifications recommend the use of a subset of XML Schema types (Biron
et al. (Ed.) 2004) in Semantic Web ontologies (Dean & Schreiber (Ed.) 2004, Hayes (Ed.) 2004).
In Table 6 we present a list of commonly used SQL data types along with their corresponding XML Schema
types. During transformation of data type properties, the SQL data types are transformed into the corresponding
XML Schema types.
Table 6 Some common SQL data types and corresponding XML Schema types recommended for OWL
SQL Data Type
XML Schema Type
SQL Data Type
XML Schema Type
INTEGER
xsd:integer
VARCHAR
xsd:string
DECIMAL
xsd:decimal
CHAR
xsd:string
FLOAT
xsd:float
DATE
xsd:date
REAL
xsd:double
TIME
xsd:time
BOOLEAN
xsd:boolean
TIMESTAMP
xsd:dateTime
24
6.3.3 Identifying Classes
According to the OWL Language Guide [Smith et al. (Ed.) 2004], “the most basic concepts in a domain should
correspond to classes …” Therefore we would expect basic entities in the data model to translate into classes in an
OWL ontology.
Given the definition of a binary relation, it is quite straightforward to identify OWL classes from a relational
schema. Any relation that is not a binary relation can be mapped to a class in an OWL ontology, as stated in the rule
below.
Rule Set 2:
Class(r)
Rel(r)
¬
BinRel(r,_,_)
In combination with Rule Set 1, a binary relation has exactly two foreign keys and no other attributes, this very
simple rule covers a number of cases for identifying classes:
All tables that do not have foreign keys should be transformed to classes. In our example schema, the PERSON
table does not have a foreign key, so Rel(PERSON) is true and BinRel(PERSON,_,_) is false. Therefore, we
conclude Class(PERSON), i.e. Person should be mapped to a class. The same reasoning holds for the Dept and
Semester tables.
All tables that have one foreign key should also be transformed to classes. No such tables can satisfy the BinRel
predicate. Using the same rule we conclude that STUDENT, PROFESSOR and COURSE should be mapped to
classes.
The tables with more than two foreign keys should be transformed to classes as well. Such tables may represent
an entity (when they have attributes not appearing in a foreign key) or an N-ary relationship between entities
(where all attributes appear in foreign keys). Fortunately, in OWL, both these cases can be modeled the same
way, i.e. by translating the entity or the N-ary relationship into a class (Noy 2006). From our running example,
OFFER represents an N-ary relationship, and can be modeled as a class using the given rule.
For tables containing exactly two foreign keys, presence of independent attributes qualifies them to be treated as
entities instead of binary relations, and translated to classes in OWL ontologies. The table STUDY, with an
independent attribute Grade, is an example of this case, and is translated to an OWL class.
So, as a result of applying Rule Set 2, we have successfully identified the classes (see Table 7) from our relational
schema.
Table 7 Classes identified from the relational schema by applying Rule Set 2
Classes
Class(PERSON)
Class(STUDENT)
Class(PROFESSOR)
Class(DEPT)
Class(SEMESTER)
Class(COURSE)
Class(STUDY)
Class(OFFER)
6.3.4 Identifying Object Properties
A property is a relationship that lets us assert general facts about the members of classes. There are two major types
of properties, object properties and data type properties (Smith et al. (Ed.) 2004). Properties have a domain, which
defines the subject, and a range, which defines the object (Dean & Schreiber (Ed.) 2004). We will describe data type
properties in the next section.
An object property is a relation between instances of two classes in a particular direction. The direction is from
the domain of the object property to the range of the object property. In practice, it is often useful to define the
direction of an object property in both directions, creating a pair of object properties that are inverses of each other.
OWL provides us the means to mark properties as inverses of each other. In our work, when we translate something
to an object property, say ObjP(r,s,t), it implicitly means we have created an inverse of that property, written r’ in
our notation, such that, ObjP(r’,t,s).
There are two ways of extracting OWL object properties from a relational schema. One of the ways is through
identification of binary relations, which represent many-to-many relationships. The following rule identifies an
object property using a binary relation.
Rule Set 3:
ObjP(r,s,t)
BinRel(r,s,t)
Rel(s)
Rel(t)
¬
BinRel(s,_,_)
¬BinRel(t,_,_)
25
This rule states that a binary relation r between two relations s and t, neither being a binary relation, can be
translated into an OWL object property with domain s and range t. Notice that the rule implies Class(s) and Class(t)
hold true, so the domain and range of the object property can be expressed in terms of corresponding OWL classes.
From our university database schema, only the Reg table fits the condition. Reg is a binary relation between
STUDENT and SEMESTER entities, which are not binary relations. Therefore, ObjP(REG,STUDENT,SEMESTER)
holds, and since we can create inverses, ObjP(REG’,SEMESTER,STUDENT) and Inv(REG,REG’) also hold true.
Foreign key references between tables that are not binary relations represent one-to-one and one-to-many
relationships between entities. A pair of object properties that are inverses of each other and have a maximum
cardinality of 1 can represent one-to-one relationships. Also, one-to-many relationships can be mapped to an object
property with maximum cardinality of 1, and an inverse of that object property with no maximum cardinality
restrictions.
In OWL, a (data type or object) property with minimum cardinality of 0 and maximum cardinality of 1 is called
a functional property, represented as FP in our rules. If an object property is functional, then its inverse is an inverse
functional property, represented as IFP. In addition to specifying cardinality restrictions on properties in general, we
can also specify such restrictions when a property is applied over a particular domain. In our rules, we use ontology
predicates Crd, MinC and MaxC to specify these restrictions. The examples following the rules explain the use of
these predicates.
The following rule set identifies object properties and their characteristics using foreign key references (not
involving binary relations, covered in Rule Set 3) with various combinations of uniqueness and null restrictions. To
simplify the rules, we first define a predicate NonBinFK representing foreign keys not in or referencing binary
relations and then express the rules in terms of this predicate.
Rule Set 4:
NonBinFK(x,s,y,t)
FK(x,s,y,t)
Rel(s)
Rel(t)
¬
BinRel(s,_,_)
¬BinRel(t,_,_)
a.
ObjP(x,s,t), FP(x), MinC(x’,t,0)
NonBinFK(x,s,y,t)
¬
NN(x)
¬
Unq(x)
b.
ObP(x,s,t), FP(x), Crd(x,s,1), MinC(x’,t,0)
NonBinFK(x,s,y,t)
NN(x)
¬
Unq(x)
c.
ObjP(x,s,t), FP(x), FP(x’)
NonBinFK(x,s,y,t)
¬
NN(x)
Unq(x)
d.
ObjP(x,s,t), FP(x), Crd(x,s,1), FP(x’)
NonBinFK(x,s,y,t)
NN(x)
Unq(x)
¬
PK(x,s)
Each rule in Rule Set 4 states that a foreign key represents an object property from the entity containing the
foreign key (domain) to the referenced entity (range). Since a foreign key can reference at most one record
(instance) of the range, the object property is functional. This requieres that inverse of that object property is inverse
functional. One example from our university schema is the foreign key from Study to Student which gives us:
ObjP(RNO,STUDY,STUDENT), FP(RNO), Inv(RNO’,RNO), ObjP(RNO’,STUDENT,STUDY), IFP(RNO’).
Rules 4a and 4b represent variations of one-to-many relationships.
We can apply a stronger restriction on cardinality of the object property if the foreign key is constrained as NOT
NULL. Without this constraint, (rule 4a), the minimum cardinality is 0, which is already covered by functional
property predicate. With this constraint (rule 4b), we can set the maximum and minimum cardinality to 1.
These rules imply, a minimum cardinality restriction of 0 on the inverse property. An instance in the range can be
referenced by any number of instances in the domain, thus, there is no maximum cardinality restriction on the
inverse property.
The other two rules, 4c and 4d, represent one-to-one relationships, modeled by applying a uniqueness constraint
on the foreign key. It means that an instance in the range can relate to at most one object in the domain, making the
inverse property functional too. This also means that the original object property is inverse functional as well.
The difference between rules 4c and 4d is the NOT NULL constraint. Like one-to-many relationships
mentioned above, if NOT NULL is present, it gives us a stronger cardinality restriction on the object property
represented by the foreign key.
Notice that none of the rules allow the foreign key to be the same as the primary key of the domain relation.
Rule 4d restricts this by providing an extra condition, whereas the negation of uniqueness or NOT NULL constraints
in rules 4a-c, by definition, implies this condition.
26
We do not create an object property if the foreign is being equal to the primary key. Instead, we consider it as a
pattern for inheritance and propose a rule for inheritance mapping. On the other hand, we are unable to capture the
inheritance between STUDENT and PERSON, since it is not a unique inheritance pattern, and we transform this
relationship into an object property.
A list showing some object properties and their characteristics obtained from the sample relational schema by
applying Rule Sets 3 and 4 are presented in Table 8.
Table 8 Some object properties identified from the relational schema by applying Rule Sets 3 and 4. An object property P implies the
existence of an inverse property P’. Due to lack of space, we explicitly specify the inverse property only for the first property.
Object Properties
ObjP(REG,STUDENT,SEMESTER), ObjP(REG’,SEMESTER,STUDENT), Inv(REG,REG’)
ObjP(ID1,STUDENT,PERSON), FP(ID1), FP(ID1’), Crd(ID1,STUDENT,1)
ObjP(CODE,COURSE,DEPT), FP(CODE), IFP(CODE’), Crd(CODE,COURSE,1),
MinC(CODE’,DEPT,0)
ObjP(CNO,OFFER,COURSE), FP(CNO), IFP(CNO’), MinC(CNO’,COURSE,0)
ObjP(CONO,OFFER,OFFER), FP(CONO), IFP(CONO’), MinC(CONO’,OFFER,0)
ObjP(RNO,STUDY,STUDENT), FP(RNO), IFP(RNO’), MinC(RNO’,STUDENT,0)
6.3.5 Identifying Data Type Properties
Data type properties are relationship between instances of classes with RDF literals and XML Schema data types.
Like object properties, data type properties can also be functional, and can be specified with cardinality restrictions.
However, unlike object properties, OWL DL does not allow them or their inverses to be inverse functional.
Attributes of relations in a database schema can be mapped to data type properties in the corresponding OWL
ontology. Rule Set 5 identifies data type properties in a relational schema.
Rule Set 5:
a.
DTP(x,r,type(x)), FP(x)
NonFK(x,r)
b.
DTP(x,r,type(x)), FP(x), Crd(x,r,1)
NonFK(x,r)
NN(x,r)
c.
DTP(x,r,type(x)
list(x)), FP(x)
NonFK(x,r)
Chk(x,r)
Rule Set 5 says that attributes that do not contribute towards foreign keys can be mapped to data type properties with
range equal to their mapped OWL type. Since each record can have at most one value per attribute, each data type
property can be marked as a functional property. When an attribute has a NOT NULL constraint, rule 5b allows us
to put an additional cardinality restriction on the property. Rule 5c allows us to infer stronger range restrictions on
attributes with enumerated list (CHECK IN) constraints.
In some cases, it may be possible to apply more than one rule to an attribute. In such cases, all possible rules
should be applied to extract more semantics out of the relational schema. Some data type properties extracted from
our sample university database schema are listed in Table 9.
Table 9. Some data type properties identified from the relational schema by applying Rule Set 5.
Data Type Properties
DTP(ID1,PERSON,xsd:integer), FP(ID1), Crd(ID1,PERSON,1)
DTP(NAME1,PERSON,xsd:string), FP(NAME1), Crd(NAME1,PERSON,1)
DTP(ROLLNO,STUDENT,xsd:integer), FP(ROLLNO), Crd(ROLLNO,STUDENT,1)
DTP(DEGREE,STUDENT,xsd:string), FP(DEGREE)
DTP(SNO,SEMESTER,xsd:integer), FP(SNO), Crd(SNO,SEMESTER,1)
DTP(YEAR,SEMESTER,xsd:date), FP(YEAR), Crd(YEAR,SEMESTER,1)
DTP(SESSION,SEMESTER,xsd:string
{SPRING,SUMMER,FALL}), FP(SESSION)
DTP(GRADE,STUDY,xsd:string), FP(GRADE)
27
6.3.6 Identifying Inheritance
Inheritance allows us to form new classes using already defined classes. It relates a more specific class to a more
general one using subclass relationships (Smith et al. (Ed.) 2004).
Inheritance relationships between entities in a relational schema can be modeled in a variety of ways. As
discussed earlier, since most of these models are not limited to expressing inheritance alone, sometimes it is hard to
identify subclass relationships in a relational schema.
The following rule describes a common case that can be used for inheritance modeling in a normalized database
design.
Rule Set 6:
Subclass(r,s)
Rel(r)
Rel(s)
PK(x,r)
FK(x,r,_,s)
This rule states that an entity represented by a relation r is a subclass of an entity represented by relation s, if the
primary key of r is a foreign key to s. In our sample university schema, we can clearly identify that
Subclass(PROFESSOR,PERSON) holds.
6.4 Implementation
As a result of applying our rules on the given relational schema, we derive the ontology shown in Table 10.
A comparison of the ontologies produced by the domain expert with the one produced automatically using our
rules (shown in Figure 11) shows a number of differences. For example, the rules of the consolidated system are
unable to capture the subclass relationship of STUDENT with PERSON, or the symmetric and transitive
characteristics of the co-location relationship among OFFER instances. These examples clearly show that automatic
translation of a relational schema to an ontology has some limitations, and that these limitations are inline with the
disparities we have identified earlier.
As a result of presenting this consolidated system in FOL, these rules have been implemented in the JESS rule
language; therefore the FOL expression is stratified.
For sake of this example, we have implemented the rules in JESS. To obtain the SQL-DDL, we analyze only the
data dictionary of MySQL databases. The ontology is then generated using the Jena framework.
Table 10 Parts of an ontology corresponding to the University Database, produced automatically by applying the rules on the given SQL
DDL. The output format is OWL Abstract Syntax.
Automatically Produced Ontology
Ontology(<urn:sql2owl>
ObjectProperty(<REG> domain(<STUDENT>) range(<SEMESTER>))
ObjectProperty(<REG_I> inverseOf(<REG>))
ObjectProperty(<COURSE.DEPTCODE> Functional domain(<COURSE>) range(<DEPT>))
ObjectProperty(<COURSE.DEPTCODE_I> InverseFunctional inverseOf(<COURSE.DEPTCODE>))
ObjectProperty(<OFFER.CONO> Functional domain(<OFFER>) range(<OFFER>))
ObjectProperty(<OFFER.CONO_I> InverseFunctional inverseOf(<OFFER.CONO>))
ObjectProperty(<STUDENT.ID> Functional InverseFunctional
domain(<STUDENT>) range(<PERSON>))
ObjectProperty(<STUDENT.ID_I> Functional InverseFunctional
inverseOf(<STUDENT.ID>))
...
DatatypeProperty(<COURSE.CNO> Functional domain(<COURSE>) range(xsd:integer))
DatatypeProperty(<SEMESTER.YEAR> Functional domain(<SEMESTER>) range(xsd:date))
DatatypeProperty(<SEMESTER.SESSION> Functional domain(<SEMESTER>)
range(oneOf("SPRING" "SUMMER" "FALL")) range(xsd:string))
...
28
Class(<PERSON> partial ...)
Class(<PROFESSOR> partial <PERSON> ...)
Class(<STUDENT> partial restriction(<STUDENT.ID> cardinality(1))
restriction(<STUDY.RNO_I> minCardinality(0)) ...)
Class(<COURSE> partial restriction(<COURSE.DEPTCODE> cardinality(1))
restriction(<COURSE.CNO> cardinality(1)) ...)
...
)
7 An Evaluation of Direct Mapping Approaches
We believe that the following are fundamental ingredients for a transformation system aiming for the migration of
relational data to the Semantic Web:
The rules for a transformation system should be specified formally to avoid any syntactic or semantic
ambiguities in the specifications. Rules defined in formal systems like first order logic can be implemented
in languages such as Prolog, Datalog or JESS.
When developing rules for automatic translation of a relational database to an ontology, special care should
be taken to avoid the influence of domain specific examples, thus an automated transformation system
should try to capture the semantics offered by the schema definition language alone. Sometimes, the
influence of examples from a particular domain can result in incorrect rules that are based on enumerating
examples instead of focusing on formal power.
The transformation system should infer semantic properties by considering all the possible combination of
foreign and primary keys, hence satisfying all the cases of Theorem 1.
We have surveyed existing approaches of direct mapping relational databases to RDFS and OWL ontologies. In
Table 11 we show the approaches that have formal rules and can be mapped to the different cases of Theorem 1. If
an approach does not present formal rules, they do not appear in Table 11. Stojanovic et al and Astrova et al were
the first approaches to consider extracting the domain semantics from the SQL-DDL and transforming them into
RDFS. The majority of their rules overlap, meaning that given the same SQL-DDL input, it would output the same
RDFS. The difference between these two approaches on how they each identify inheritance. Stojanovic et al
presents the transformation rules in a formal manner. As shown in Table 11, Stojanovic et al’s rules are not
complete with respect to Theorem 1 in reference when a primary key is a subset of a foreign key. On the other hand,
Astrova et al presents the transformations through examples without any formality, which may lead to ambiguity;
for that reason, it cannot be determined if Astrova et al’s approach is complete with respect to Theorem 1 and is not
shown in Table 11.
Another series of surveyed direct mapping approaches transform the SQL-DDL to OWL ontologies. Li et al
does use a combination of formal notation and English language and has enumerated the rules, thus it is easier to
identify which rules apply to specific cases. Table 11 shows that Li et al’s approach has rules to satisfy each case of
Theorem 1. Furthermore, Shen et al extends on Li et al’s approach, which is also presented in a combination of
formal notation and English language. However, Shen et al does not present rules to satisfy case 7 and 10, as seen in
Table 11, hence this approach is not complete with respect to Theorem 1. On the other hand, Bucella et al and
Astrova et al only present examples of transformation without any formalism at all, therefore it is ambiguous and
there is no formal way to determine if they are complete with respect to Theorem 1, thus not showing up in Table
11. Building on the surveyed work, we also present a consolidated direct mapping system for automatic
transformation of normalized relational schemas represented in SQL DDL into OWL ontologies, which builds upon
the surveyed approaches. We have defined the entire set of transformation rules in first order logic eliminating the
possibility of syntactic and semantic ambiguities and complete with respect to Theorem 1. The use of first order
logic also allows for easy implementation of the system in rule-based languages. In defining the consolidated rules,
we have ensured compatibility with description logics based OWL sublanguage (OWL DL), which is essential to
assuring decidability for reasoning.
29
Table 11 Rules that apply to each of the Closure set item
Relation with
Stojanovic et al.1
Li et al.2
Shen et al. 3
Consolidated
System4
1) PK
Rule C1, C2, C3
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 5
2) C-PK
Rule C1, C2, C3,
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 5
3) S-FK
Rule C1, C2, C3,
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 4, 5
4) N-FK
Rule C1, C2, C3,
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 4, 5
5) PK + S-FK
a) PK = S-FK
Rule C1, C2, C3,
A3, I1
Rule 2, 7, 8, 9, 10, 11
Rule C-1, C-3, P-1.2, R-
2, R-3, R-4
Rule 2, 5, 6
b) PK S-FK = 0
Rule C1, C2, C3,
A2
Rule 2, 3, 7, 9, 10, 11
Rule C-1, P-1.1, P-1.2,
R-2, R-3, R-4
Rule 2, 4, 5
6) PK + N-FK | N = 2
a) PK N-FK = 0
Rule C1, C2, C3,
A1
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 4, 5
b) PK
N-FK
Rule C1, C2, C3,
A4
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 4, 5, 6
7) PK + N-FK | N > 2
a) PK N-FK = 0
Rule C1, C2, C3,
A5
Rule 2, 7, 9, 10, 11
-
Rule 2, 4, 5
b) PK
N-FK
-
Rule 2, 7, 9, 10, 11
-
Rule 2, 4, 5, 6
8) C-PK + S-FK
a) C-PK S-FK = 0
Rule C1, C2, C3,
A2
Rule 2, 3, 7, 9, 10, 11
Rule C-1, P-1.1, P-1.2,
R-2, R-3, R-4
Rule 2, 5, 6
b) S-FK
C-PK:
-
Rule 2, 4, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 4, 5
9) C-PK + N-FK | N = 2
a) C-PK N-FK = 0
Rule C1, C2, C3,
A2
Rule 2, 3, 7, 9, 10, 11
Rule C-1, P-1.1, P-1.2,
R-2, R-3, R-4
Rule 2, 4, 5
b) N-FK
C-PK
Rule C1, C2, C3,
A4
Rule 2, 5, 6, 7, 9, 10,
11
Rule C-1, P-2, P-1.2, R-
2, R-3, R-4
Rule 2, 3, 4, 5
c) C-PK N-FK
0,
C-PK N-FK
0,
N-FK C-PK
0
-
Rule 2, 7, 9, 10, 11
Rule C-1, P-1.2, R-2, R-
3, R-4
Rule 2, 4, 5
10) C-PK + N-FK | N > 2
a) C-PK N-FK = 0
Rule C1, C2, C3,
A2
Rule 2, 3, 7, 9, 10, 11
-
Rule 2, 4, 5
b) N-FK
C-PK
Rule C1, C2, C3,
A4
Rule 2, 5, 6, 7, 9, 10,
11
-
Rule 2, 3, 4, 5
c) C-PK N-FK
0,
C-PK N-FK
0,
N-FK C-PK
0
-
Rule 2, 7, 9, 10, 11
-
Rule 2, 4, 5
1 The rule numbers correspond to the same rule numbers of (Stojanovic et al 2002)
2 The rule numbers correspond to the same rule numbers of (Li et al 2005)
3 The rule numbers correspond to the same rule numbers of [Shen et al 2006]
4 The rule numbers correspond to the same rule numbers of Section 6
30
In Table 5, we present the output of each direct mapping transformation system given the SQL-DDL input. As we
can see, there is a major overlap in the output of all direct mapping systems as presented in Section 3.1. Stojanovic
et al and Astrova el at (1) only output RDFS ontologies. Buccella et al and Shen et al are the only approaches that
considers using owl:AllValuesFrom, however they are used differently. This leads us to acknowledge that
identifying the use of owl:allValuesFrom and owl:someValuesFrom directly from a relational schema is still an open
problem, as expressed in Section 3.4. In addition there is a difference in the usage of the cardinalities that are
identified. Some approaches present the use of owl:maxCardinality and owl:minCardinality while others use only
owl:cardinality.
In addition in Table 12, we also present the specific RDFS and OWL constructs that are created through each
direct mapping approach. We can realize that Stojanovic et al and Astrova et al (1) output the same RDFS
constructs. Li et al and Shen et al also produce the same OWL outputs, for the exception that Shen et al is able to
use owl:allValuesFrom. Astrova et al (2) is the only approach able to output owl:SymmetricProperty,
owl:TransistiveProperty and owl:hasValue. However, the rules to output these specific constructs are ambiguous and
therefore not recommendable. Hence, we acknowledge that identifying symmetric and transitive properties from a
relational schema is also an open problem.
Table 12 RDFS/OWL constructs that are outputted by each direct mapping approach
Stojanovic
et al.
Astrova
et al. (1)
Buccella
et al.
Li et
al.
Shen et
al.
Astrova et
al. (2)
Consolidated
System
owl:class
x
x
x
x
x
rdfs:class
x
x
owl:allValuesFrom
x
x
owl:someValuesFrom
owl:hasValue
x
Cardinality constraint
owl:maxCardinality
x
x
x
x
x
owl:minCardinality
x
x
x
x
x
owl:cardinality
x
x
owl:intersectionOf
owl:unionOf
x
x
owl:complementOf
rdfs:subClassOf
x
x
x
x
x
x
owl:equivalentClass
owl:disjointWith
owl:oneOf
x
x
owl:DatatypeProperty
x
x
x
x
x
owl:ObjectProperty
x
x
x
x
x
RDFS Properties
rdfs:subPropertyOf
rdfs:domain
x
x
x
x
x
x
x
rdfs:range
x
x
x
x
x
x
x
owl:equivalentProperty
owl:inverseOf
x
x
x
x
x
owl:FunctionalProperty
x
x
owl:InverseFunctionalProperty
x
x
owl:TransitiveProperty
x
owl:SymmetricProperty
x
31
8 Discussion and Future Directions
The Semantic Web is showing a constant growth pattern. This has been attributed to the Linked Open Data project5
created by the W3C’s Semantic Web Education and Outreach Interest Group6. The main goal of this project is to
publish data sets as RDF on the web and create links between them. For common web users, the Semantic Web does
not yet have great significance; therefore it is rare that integrating a relational database with the Semantic Web
accrues obvious benefit to the owners of website. It is expected that the Semantic Web will become mainstream and
more accessible to common web users; therefore the need to integrate databases to the Semantic Web will increase.
The W3C foresees this issue and has already organized the Relational Database to RDF (RDB2RDF) working
group7 in order to recommend a standardized mapping language between relational databases to RDF. Given this
situation, it is important to understand the relationship between relational databases and the Semantic Web.
In this work we have surveyed different approaches that automatically expose relational databases to the
Semantic Web. This process, called direct mapping, automatically synthesizes an OWL putative ontology from the
relational database schema and exposes the relational data as RDF instances of the ontology. We introduce the
concept of the putative ontology”, which is a syntactic transformation from the basic implicit domain semantics of
a relational database to an ontology, which is obtained by applying direct mapping rules. We present Theorem 1,
which states that a direct mapping transformation must consider all the possible primary key and foreign key
combinations. We compare all the approaches and identify which ones satisfy Theorem 1. Finally, we present a
consolidated direct mapping system.
Although there are many proposals for direct-mapping SQL schema to ontologies, they all depend on the
richness of the SQL schema with respect to its encoding of domain semantics. Any skepticism of the quality of the
putative ontology is definitely well founded with respect to many legacy databases. Foreign key constraints were
introduced as part of the SQL93 standard. Clearly, all direct mapping systems depend on explicit expression of such
constraints to achieve accurate results. However, starting a number of years ago, databases began to be designed
using sophisticated data modeling tools that capture considerable domain semantics such as foreign keys and check
constraints. If the schema has been developed with such tools and in a normalized manner, the schema can portray
enough semantics to create a semantically rich ontology. If this is the case, then direct mapping is the easiest way to
create RDF from relational database data, by virtue of being completely automated.
On the other hand, the amount of domain semantics captured in SQL-DDL models is highly variable and
occasionally does not contain rich structure. If these types of models are used in a direct mapping system, the
putative ontology that is derived will lack rich semantics. Even though the putative ontology may not have the same
expressivity as a domain ontology, it is not incorrect, hence the name putative. The deficiencies of a putative
ontology become evident when an ontology produced by the system is compared to an ontology produced by a
domain expert.
We expose patterns between relational databases and ontologies, and the difficulties when it comes to
identifying richer semantics in a relational database schema. Firstly we show how normalization can affect the
putative ontology and demonstrate that inheritance in a relational database schema can be modeled in different
ways; therefore a single rule to identify inheritance is not possible. In addition, symmetric and transitive
relationships may be implicit in the domain but are not explicit in the relational schema. Furthermore, identifying
specific value constraints such as “for all” or “there exist” may imply identifying new properties or reusing existing
properties. Continuously, it may also have implications in the open and closed world assumptions. Additionally,
SQL’s check constraint can represent different types of semantics such as enumeration, inheritance or business type
rules. Finally, we recognize that analyzing triggers to extract semantics applicable to ontologies is still an open
problem. Overall, there is also a need to evaluate the quality of a putative ontology and a domain ontology.
We assert that the scope and frequency of success of direct mapping transformation systems is very unlikely to
achieve the scope that database-ontology mapping methods can achieve, however we consider direct mapping the
first step in a two step process. Sequeda et al. presents an application of direct mapping a relational database and
manually mapping the putative ontology to a domain ontology, in order to overcome database heterogeneity
(Sequeda et al., 2009). Direct mapping converts the relational database to ontology mapping problem to an ontology
to ontology mapping problem. Therefore, we consider that the next step in this area is to bridge the gap between
putative and domain ontologies.
In order to bridge this gap, we propose to refine the putative ontology through a bootstrapping architecture
(Sequeda et al., 2008). This refinement process consists of initially matching the putative ontology with an existing
5 http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
6 http://www.w3.org/2001/sw/sweo/
7 http://www.w3.org/2009/08/rdb2rdf-charter.html
32
domain ontology. Due to the fact that the putative ontology lacks semantics and may have some ambiguity with
respect to the domain ontology, we treat the alignment between the putative and domain ontology as an alignment
hypothesis. Querying and analyzing the relational database data can test the alignment hypothesis. After the
hypothesis has been tested, the new knowledge is used to refine the putative ontology. For example, a putative
ontology may not represent inheritance, but its existence could be determined by the mapping to a domain ontology
and the analysis of the content of the relational database. In this process, the putative ontology would be
automatically matched to other domain ontologies. This would be an iteratively process with different domain
ontologies, until the putative ontology has been refined and it completely represents the domain semantics of the
relational database.
Acknowledgments
This research is funded in part by the National Science Foundation grant IIS-0531767 and DBI-0851052 and the
Spanish fundamental R&D project myBigData.
References
Aho, A., Lam, M., Sethi, R., and Ullman, J. 2006. Compilers: Principles, Techniques and Tools. Addison-Wesley.
An, Y., Borgida, A., & Mylopoulos, J. 2005. Inferring Complex Semantic Mappings between Relational Tables and Ontologies from Simple
Correspondences. In Proceedings of On The Move to Meaninful Internet Systems (OTM’05): CoopIS, DOA, and ODBASE, Agia Napa,
Cyprus. Springer Verlag.
An, Y., Mylopoulos, J., & Borgida, A. 2006. Building Semantic Mappings from Databases to Ontologies. In Proceedings of the 21th National
Conference on Artificial Intelligence (AAAI-06) Nectar Track.
Arens, A., Chee, C.Y., Hsu, C., & Knoblock, C. 1993. Retrieving and Integrating Data from Multiple Information Sources. International Journal
on Intelligent and Cooperative Information Systems 2(2), 127-158.
Astrova, I. 2004. Reverse Engineering of Relational Databases to Ontologies. In Proceedings of 1st European Semantic Web Symposium (ESWS
2004). Crete, Greece. Springer Berlin
Astrova, I., Korda, N., & Kalja, A. 2007. Rule-Based Transformation of SQL Relational Databases to OWL Ontologies. In Proceedings of 2nd
International Conference on Metadata & Semantic Research.
Barrasa, J., & Gomez-Perez, A. 2006. Upgrading relational legacy data to the semantic web. In Proceedings of 15th International World Wide
Web Conference. ACM. 1069-1070
Barrasa, J., Corcho, O., & Gomez-Perez, A. 2004. R2O, an Extensible and Semantically Based Database-to-Ontology Mapping Language. In
Proceedings of 2nd Workshop on Semantic Web and Databases.
Bizer, C., & Seaborne, A. 2004. D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs. In Proceedings of 3rd International Semantic
Web Conference. Hiroshima, Japan
Buccella, A., Penabad, M. R., Rodríguez, F. J., Fariña, A., Cechich, A. 2004. From Relational Databases to OWL Ontologies. In Proceedings of
the 6th Russian Conference on Digital Libraries. Pushchino, Rusia.
Ceri, S., Widom, J. 1993. Managing Semantic Heterogeneity with Production Rules and Persistent Queues. In Proceedings of 19th Very Large
Databases Conference. Dublin, Ireland
Chen, H., Wang, Y., Wang, H., Mao, Y., Tang, J., Zhou, C., et al. 2006. Towards a Semantic Web of Relational Databases: a Practical Semantic
Toolkit and an In-Use Case from Traditional Chinese Medicine. In Proceedings of 5th International Semantic Web Conference. Athens, GA,
USA
Collet, C., Huhns, M.N., Hsu, C., & Shen, W. 1991. Resource Integration Using a Large Knowledge Base in Carnot. Computer 12(24), 55-62.
de Laborda, C. P., & Conrad, S. 2005. Relational.OWL: a data and schema representation format based on OWL. In Proceedings of 2nd Asia-
Pacific Conference on Conceptual Modelling. Newcastle, Australia.
de Laborda, C. P., & Conrad, S. 2006. Database to Semantic Web Mapping using RDF Query Languages. In Proceedings of 25th International
Conference on Conceptual Mapping. Tucson, AZ, USA
Drummond, N., & Shearer, R. 2006. The Open World Assumption. eSI Workshop: The Closed World of Databases meets the Open World of the
Semantic Web.
Du, H., & Wery, L. 1999. Micro: A normalization tool for relational database engineers. Journal of Network and Computer Applications 22(4),
215-232 .
Duo, D., Pan, J., Qin, H., & LePendu, P. 2006. Towards Populating and Querying the Semantic Web. In Proceedings of 2nd International
workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2006). Athens, GA, USA
Hartig, O., Bizer, C., Freytag, J.C. 2009. Executing SPARQL Queries over the Web of Linked Data. In Proceedings of the 8th International
Semantic Web Conference (ISWC2009). Chantilly, VA, USA
He, B., Patel, M., Zhang, Z., & Chang, K. 2007. Accessing the deep web. Communications of the ACM , 50 (5), pp. 94-101.
Korotkiy, M., & Top, J. 2004. From Relational Data to RDFS Models. In Proceedings of 2004 International Conference on Web Engineering.
Munich, Germany
Laclavik, M. 2006. RDB2Onto: Relational Database Data to Ontology Individual Mapping. Tools for Acquisition, Organisation and Presenting of
Information and Knowledge.
Li, M., Du, X., & Wang, S. 2005. Learning ontology from relational database. In Proceedings of 4th International Conference on Machine
Learning and Cybernetics.
Lubyte, L., & Tessaris, S. 2007. Extracting Ontologies from Relational Databases. Technical Report KRDB07-04, Free University of Bozen-
Bolzano.
Lubyte, L., & Tessaris, S. 2009. Automatic Extraction of Ontologies Wrapping Relational Data Sources. In Proceedings of 20th Database and
Expert Systems Applications (DEXA2009).
33
Lutz, C., Areces, C., Horricks, I., and Satler, U. 2004. Keys, Nominals, and Concrete Domains. Journal of Artificial Intelligence Research, Vol.
23, pp 667-726.
Meier, D. (1983). The Theory of Relational Databases. Computer Science Press.
Miller, R., Haas, L., & Hernandez, M. 2000. Schema mapping as query discovery. In Proceedings of 26th International Conference on Very Large
Database. Cairo, Egypt.
Motik, B., Horrocks, I., & Sattler, U. 2007. Bridging the gap between OWL and relational databases. In Proceedings of 16th International
Conference on World Wide Web. Banff, Canada
Smith, M. K., Welty, C., & McGuinness, D. L. (Ed.) 2004. OWL Web Ontology Language Guide. Retrieved February 5, 2009, from W3C
Recommendation: http://www.w3.org/TR/owl-guide/
Dean, M., & Schreiber, G. (Ed.) 2004. OWL Web Ontological Language Reference. Retrieved February 5, 2009, from W3C Recommendation:
http://www.w3.org/TR/owl-ref/
Pratt, P. J. 1990. A Guide to SQL. Boston: Boyd & Fraser Publishing Company.
Pratt, P. J. 1995. A Guide to SQL (3rd Edition ed.). Boston: Boyd & Fraser Publishing Company.
Hayes, P. (Ed.) 2004. RDF Semantics. Retrieved February 5, 2009, from W3C Recommendation: http://www.w3.org/TR/rdf-mt/
Sequeda, J.F., Tirmizi, S.H., and Miranker, D.P. 2007. SQL Databases are a Moving Target. Position Paper for W3C Workshop on RDF Access to
Relational Databases, Cambridge, MA, USA
Sequeda, J.F., Tirmizi, S.H., and Miranker, D.P. 2008. A Bootstrapping Architecture for Integration of Relational Databases to the Semantic
Web. In Proceedings of the Poster and Demonstration Session at the 7th International Semantic Web Conference. Karlsruhe, Germany
Sequeda, J.F., Garcia-Castro, A., Corcho, O., Tirmizi, S.H., and Miranker, D.P. 2009. Overcoming database heterogeneity to facilitate social
networks: the Colombian displaced population as a case study. In Proceedings of 18th International Conference on World Wide Web Ibero-
America Track. Madrid, Spain
Shen, G., Huang, Z., Zhu, X., Zhao, X. 2006. Research on the Rules of Mapping from Relational Model to OWL. In Proceedings of the
Workshop on OWL: Experiences and Directions. Athens, GA, USA
SQL3 (ISO-ANSI Working Draft). (n.d.). Retrieved February 5, 2009, from http://www.inf.fu-berlin.de/lehre/SS94/einfdb/SQL3/sqlindex.html
Stonebraker, M. 1986. Triggers and inference in database systems. In on Knowledge Base Management Systems: integrating Artificial
intelligence and Database Technologies. M. L. Brodie and J. Mylopoulos, Eds. Springer Topics In Information Systems. Springer-Verlag
New York, New York, NY, 297-314.
Stojanovic, L., Stojanovic, N., & Volz, R. 2002. Migrating data-intensive web sites into the Semantic Web. In Proceedings of the 2002 ACM
Symposium on Applied Computing (Madrid, Spain, March 11 - 14, 2002). SAC '02. ACM, New York, NY, 1100-1107.
Stojanovic, L., Stojanovic, N., & Volz, R. 2002. A reverse engineering approach for migrating data-intensive web sites to the Semantic Web. In
Proceedings of the IFIP 17th World Computer Congress - Tc12 Stream on intelligent information Processing (August 25 - 30, 2002). M. A.
Musen, B. Neumann, and R. Studer, Eds. IFIP Conference Proceedings, vol. 221. Kluwer B.V., Deventer, The Netherlands, 141-154.
Svihla, M., & Jelinek, I. (2004). Two Layer Mapping from Database to RDF. In Proceedings of Electronic Computers and Informatics (ECI).
Slovakia, Kosice.
Tirmizi, S.H., Sequeda, J.F., & Miranker, D.P. 2008. Translating SQL Applications to the Semantic Web. In Proceedings of the 19th
International Conference on Database and Expert Systems Applications, September 01-05, 2008, Turin, Italy. S. S. Bhowmick, J. Küng,
and R. Wagner, Eds. Lecture Notes In Computer Science, vol. 5181. Springer-Verlag, Berlin, Heidelberg, 450-464.
Tirmizi, S.H., Aitken, S., Moreira, D., Mungall, C., Sequeda, J.F., Shah, N.H., & Miranker, D.P. (2009). OBO & OWL: Roundtrip Ontology
Transformations. In Proceedings of Semantic Web Applications and Tools for Life Sciences Workshop, Amsterdam, The Netherlands.
Vrandecic, D., & Sure, Y. (2008). How to Design Better Ontology Metrics. In Proceedings of the 4th European Conference on Semantic Web:
Research and Applications (Innsbruck, Austria, June 03 - 07, 2007). E. Franconi, M. Kifer, and W. May, Eds. Lecture Notes In Computer
Science, vol. 4519. Springer-Verlag, Berlin, Heidelberg, 311-325.
Wang, S., Shen, J., & Hong, T. 2000. Mining fuzzy functional dependencies from quantitative data. IEEE International Conference on Systems,
Man and Cybernetics, 5, 3600-3606.
Biron, P. V., Permanente, K., & Malhotra, A. (Ed.) 2004. XML Schema Part 2: Datatypes Second Edition. Retrieved February 5, 2009, from
W3C Recommendation: http://www.w3.org/TR/xmlschema-2/
Xu, Z., Zhang, S., & Y, D. 2006. Mapping between Relational Database Schema and OWL Ontology for Deep Annotation. In Proceedings of the
2006 IEEE/WIC/ACM international Conference on Web intelligence.Web Intelligence. IEEE Computer Society, Washington, DC, 548-552.
... Firstly, data within a relational database structure is straightforwardly mappable into the data structure of knowledge graph triples. 38 Secondly, a solid grasp of how to resolve distinct entities can be achieved through ESAMP before populating the nodes of the knowledge graph. Alternative approaches of merging all samples with the same label or considering every possibly-distinct sample to be a unique material are too coarse-and ne-grained, respectively. ...
Article
Full-text available
While the vision of accelerating materials discovery using data driven methods is well-founded, practical realization has been throttled due to challenges in data generation, ingestion, and materials state-aware machine learning. High-throughput experiments and automated computational workflows are addressing the challenge of data generation, and capitalizing on these emerging data resources requires ingestion of data into an architecture that captures the complex provenance of experiments and simulations. In this manuscript, we describe an event-sourced architecture for materials provenance (ESAMP) that encodes the sequence and interrelationships among events occurring in a simulation or experiment. We use this architecture to ingest a large and varied dataset (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments performed using various modalities such as serial, parallel, multi-modal experimentation. Our data architecture tracks the evolution of a material's state, enabling a demonstration of how state-equivalency rules can be used to generate datasets that significantly enhance data-driven materials discovery. Specifically, using state-equivalency rules and parameters associated with state-changing processes in addition to the typically used composition data, we demonstrated marked reduction of uncertainty in prediction of overpotential for oxygen evolution reaction (OER) catalysts. Finally, we discuss the importance of ESAMP architecture in enabling several aspects of accelerated materials discovery such as dynamic workflow design, generation of knowledge graphs, and efficient integration of simulation and experiment.
... Bootstrapping techniques are responsible for extracting a representation of the source schemata and there has been a significant amount of work in this area, as presented in several surveys [2,23,27,52]. Most of the current available efforts, however, do require an a priori available target schema and/or materialize the source data in the global schema [8,14,36]. ...
Article
Full-text available
Virtual data integration is the current approach to go for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema and the mappings between them. Based on them, virtual data integration systems enable fast and on-demand data exploration via query rewriting. Unfortunately, the generation of such constructs is currently performed in a largely manual manner, hindering its feasibility in real scenarios. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental approach grounded on knowledge graphs to generate the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present Nextia DI , a tool implementing our approach. Finally, a comprehensive evaluation is presented to scrutinize our approach.
... Other contributions restrict their attention to the algorithms behind the generation of mappings, notably [36,58,69] for the R2RML language. We mention also some surveys and comparative analyses [16,[77][78][79][80], where the interested reader can further explore the tools and techniques mentioned here. We finally notice that, in our review, we did not find any study introducing an in depth analysis of existing real scenarios of DB-to-ontology mapping, as we do in the present paper, aimed at showing that the identified categories actually reflect the real design choices and methodologies in use by the mapping designers. ...
... However, it suffers from weak semantics [28]. And thereby, there is an urgent need to represent the database semantically to be machine-readable [29], [30]. ...
Article
Full-text available
Electronic Health Records (EHRs) aggregate the entire patient’s data from different systems. Achieving interoperability for distributed EHR systems is expected to improve patient safety and care continuity, and therefore it improves the healthcare industry. However, achieving interoperability is challenging because of many standards, medical terminologies and ontologies, and different data formats. These formats make the integration of different systems an impossible process. If the hospital uses one standard to implement all of its medical systems, it will be no problem integrating them. However, hospitals usually depend on multiple standards and data formats to deliver different systems like hospital information systems, radiology information systems, laboratory information systems, etc. Semantic Web presents new technology for achieving EHRs interoperability. In this paper, we propose a novel ontological model to implement interoperability for distributed EHR environments. The proposed semantic ontology-based model can unify different EHRs data formats. In this study, We unify five different and popular healthcare data formats and standards. In addition, the framework could be extended straightforwardly to accept any other EHR data format. By implementing the proposed in real environments, we provide the physician with a single interface with a single terminology to query and interact with distributed healthcare systems that use different standards and data formats. This process is expected to help the physician to collect patient data from different systems quickly, completely, and correctly. The proposed ontological model has two stages. The first stage of the proposed converts each different input source to OWL ontology. In the second stage, it integrates all those ontologies into a merged crisp one. The integrated ontology includes 3753 axioms, 2606 logical axioms, 186 classes, 136 individuals, 126 datatype properties, and 257 object properties. We use SPARQL Protocol and RDF Query Language (SPARQL) and Description Logic (DL) queries to evaluate the output ontology. The obtained results ensure that the proposed framework helps physicians and specialists make a centralized point for all patients’ data. It could aggregate data with any heterogeneous structures with high precision.
... Other studies have been carried out to give approaches and tools allowing integration of the semantic web and big data. Transforming legacy data from various forms into resource description framework (RDF) is an important first step in enabling RDF-based data integration [6]- [8]. RDF is more and more used as a hinge format for combining disparate data sources. ...
Article
Full-text available
Nowadays, the database field has gotten much more diverse, and as a result, a variety of non-relational (NoSQL) databases have been created, including JSON-document databases and key-value stores, as well as extensible markup language (XML) and graph databases. Due to the emergence of a new generation of data services, some of the problems associated with big data have been resolved. In addition, in the haste to address the challenges of big data, NoSQL abandoned several core databases features that make them extremely efficient and functional, for instance the global view, which enables users to access data regardless of how it is logically structured or physically stored in its sources. In this article, we propose a method that allows us to query non-relational databases based on the ontology-based access data (OBDA) framework by delegating SPARQL protocol and resource description framework (RDF) query language (SPARQL) queries from ontology to the NoSQL database. We applied the method on a popular database called Couchbase and we discussed the result obtained.
Chapter
For decades, ontologies have defined valuable terminology for describing and representing a knowledge domain, capturing relationships between concepts, and improving knowledge management. Ontologies enable the exchange and sharing of information, extending syntactic and semantic interoperability: such advantages are also very useful in the cultural heritage (CH) field. Nowadays, ontologies are often made manually, although various attempts have been made in the literature for their automatic generation (Ontology Learning). This paper proposes a new way for the semi-automatic building of an ontology from a Relational Database (RDB). Following an accurate review of existing methods, we propose the implementation of a Python library capable of converting an RDB into an OWL ontology and importing its data inside the ontology as concepts and properties instances. We present a case study on actual data from the cultural heritage world coming from the REMIAM project of the High Technology District for Cultural Heritage (Distretto ad Alta Tecnologia per i Beni Culturali - DATABENC). Through interviews with experts in the field, a series of valid questions were identified for the experts’ research work and the interrogation of the knowledge base. The questions were then converted into SQL and SPARQL queries to assess the correctness of the method. The ability of the generated ontology to infer new knowledge on accurate data in the RDB will also be highlighted.
Article
Full-text available
A relational database (RDB) is a digital database that uses components (such as constraints, tables, keys, etc.) to manage data in a structured manner. Because of these components, RDBs are considered ’poor’ from a semantic point of view, precisely because of the structure-oriented nature of the components used. One way to eliminate this limitation is to transform the RDB into an ontology. The purpose of this article is to review the different approaches existing in the literature to extract data from an RDB and convert it into ontology instances. Two approaches are used to integrate the mapping between RDBs and ontologies. The first allows ontologies to be extracted from an RDB, the second consists of a mapping of the relational database to an existing ontology. Our proposed review focuses on methods for creating a specific ontology from an RDB. The proposed review examines this field, classifying the methods that will be analyzed according to their inputs and outputs. Such classification may be useful for understanding the usability of methods. The aim is to critically review existing studies to help outline this research topic’s progress and identify methods’ gaps and functionalities.
Chapter
The need for cultural heritage enhancement in the Italian context requires an effort from new technologies. Indeed, various tools can be functional to guarantee the improvement of the artistic artifacts’ maintenance or users’ enjoyment. This paper aims to present an architecture to improve the cultural experience through recommender systems and digital storytelling techniques. Moreover, the construction of a mathematical model allows for the building of personalized paths. An experimental phase, based on the development of a prototype, permits the evaluation of the proposed architecture, and obtained results are promising.KeywordsRecommender systemContext awarenessPath recommendationDigital storytelling
Article
Full-text available
Many description logics (DLs) combine knowledge representation on an abstract, logical level with an interface to "concrete" domains like numbers and strings with built-in predicates such as <, +, and prefix-of. These hybrid DLs have turned out to be useful in several application areas, such as reasoning about conceptual database models. We propose to further extend such DLs with key constraints that allow the expression of statements like "US citizens are uniquely identified by their social security number". Based on this idea, we introduce a number of natural description logics and perform a detailed analysis of their decidability and computational complexity. It turns out that naive extensions with key constraints easily lead to undecidability, whereas more careful extensions yield NExp-TIME-complete DLs for a variety of useful concrete domains.
Article
Full-text available
The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions-the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.