ArticlePDF Available

Survey of directly mapping SQL databases to the Semantic Web

December 2011
The Knowledge Engineering Review 26(04):445-486

December 2011
26(04):445-486

DOI:10.1017/S0269888911000208

Source
DBLP

Authors:

Juan Sequeda

University of Texas at Austin

Syed Ahmed Tirmizi

Oscar Corcho

Universidad Politécnica de Madrid

The Semantic Web anticipates integrated access to a large number of information sources on the Internet represented as Resource Description Framework (RDF). Given the large number of websites that are backed by SQL databases, methods that automate the translation of those databases to RDF are crucial. One approach, taken by a number of researchers, is to directly map the SQL schema to an equivalent Web Ontology Language (OWL) or RDF Schema representation, which in turn, implies an RDF representation for the relational data. This paper reviews this research, and derives a consolidated, overarching set of translation rules expressible as a stratified Datalog program. We present all the possible key combinations in an SQL schema and consider their implied semantic properties. We review the approaches and characterize them with respect to the scope of their coverage of SQL constructs.

. Summary of Relational Database direct mapping to RDFS and OWL

…

Example of a database in 1NF

…

Example of a database in 2NF

…

. Example of a database in 2NF

…

Output of each direct mapping approach given the SQL input

…

Figures - uploaded by Oscar Corcho

Content may be subject to copyright.

Content uploaded by Oscar Corcho

Content may be subject to copyright.

Survey of Directly Mapping SQL Databases to the Semantic Web

JUAN F. SEQUEDA1, SYED HAMID TIRMIZI1, OSCAR CORCHO2 AND DANIEL P. MIRANKER1

1Department of Computer Sciences, The University of Texas at Austin, Austin, USA

2Ontology Engineering Group, Universidad Politecnica de Madrid, Madrid, Spain.

Abstract

One of the most relevant aspects to be considered to ensure the success of the Semantic Web is the availability of semantic data. Since the

majority of current websites are backed by relational databases, the transformation of this data Semantic Web languages is a great opportunity

and is being currently addressed in Linked Data approaches. There are two main approaches for this data transformation task: a) to rely on

automatic methods that derive an ontology from a SQL schema and transform data according to this schema, b) to let an expert create a mapping

between the SQL schema and an existing ontology, which is used to make the transformation into RDF. In this paper we survey the first set of

approaches according to a common framework.

Keywords: Semantic Web, Relational Databases, Ontology, OWL, RDF, RDFS

1. Introduction

A goal of the Semantic Web, also known recently as the Web of Data, is to enable web-wide integration of data

coming from various sources. The architecture proposed for the Semantic Web [ref w3c] proposes a standardized

data model for all data types (RDF, which encodes an arbitrary labeled directed graph), adding metadata encoded as

an ontology (in RDFS or OWL), and linking the data to the ontologies, and to other data across the web. A

standardized service protocol (SPARQL) enables the web to appear as a giant database and allow distributed

execution of queries (Hartig et al., 2009). In 2007, He et al. determined that Internet accessible databases contained

up to 500 times more data compared to the static Web and roughly 70% of websites are backed by relational

databases (He et al., 2007). The quantity of data suggests the success of the Semantic Web depends on developing

methods for making relational databases accessible to the Semantic Web. The number of web sites suggests that it

is paramount to develop automatic methods for integrating relational databases with the Semantic Web.

By the end of 2009, there were approximately 10 billion RDF triples published on the web coming from more

than 150 different datasets. Most of this data are originally stored in relational databases and have been manually

mapped to RDF. In response to this, the World Wide Web Consortium (W3C) has formed the Relational Database to

RDF Working Group, which is chartered to recommend a standard mapping language between the relational data

model and the Semantic Web languages.

Broadly stated, there are two architectural approaches to integrating relational databases with the Semantic

Web. In the first approach, which is the subject of this survey, the database content and its SQL schema are mapped

directly to representations in Semantic Web languages i.e. RDFS or OWL (Figure. 1) (Astrova 2004, Astrova et al.,

2007; Buccella et al., 2004; Li et al., 2005; Shen et al., 2006; Stojanovic et al., 2002; Tirmizi et al., 2008). The first

effort to exploit domain semantics from SQL to RDFS was by Stojanovic et al. (2002). The second approach

considers existing domain ontologies. Such systems include an explicit mapping language for mapping database

schema to ontologies. The ontology may be either a shared global or local ontology (Figure 2). (An et al., 2005; An

et al., 2006; Barrasa et al., 2004; Barrasa et al., 2006; Bizer et al., 2004; Chen et al., 2006; Duo et al., 2006; Korotkiy

et al., 2004; Svihla et al., 2004; de Laborda et al., 2006; Laclavik 2006; Xu et al., 2006; Arens et al., 1993; Collet et

al., 1991)

Direct mapping methods are intrinsically automatic. In each of the papers surveyed the database’s SQL schema

is mechanically transformed into an ontology, usually through the use of translation rules, and the relational data is

exposed as RDF instances of the generated ontology. We are careful to say SQL schema as the transformation

methods usually consider primary and foreign key constraints, and can include column and table constraints (e.g.

age > 0). Relational schema often means only table and column. We call the result of a syntactic transformation of a

SQL schema to an ontology a putative ontology.

Criticisms of direct mapping are two-fold: i) the quality of the putative ontology will depend on the quality and

sophistication of the data modeling effort that created the SQL schema and ii) the putative ontology will still require

integration with some global domain ontology if larger-scale interoperability is sought. While the former issue has

merit, it is not true that SQL schema cannot encode application semantics. We, and others have found that if a

database application was developed with attention to E-R or UML modeling tools, then the quality of a putative

ontology may replicate the quality of initial logical data model (Lubyte & Tessaris 2009). The work on direct

mapping by Lubyte et al starts with the assumption that the database has been so created.

Direct mapping shifts the mapping problem from a relational schema-to-ontology matching problem to an

ontology-to-ontology matching problem. Even in isolation of relational databases, the successful creation of the

Semantic Web will see the development of successful ontology-to-ontology mappings (Vrandecic & Sure, 2008).

Thus, if the Semantic Web enjoys larger success, direct mapping of relational databases will be sufficient. One can

also anticipate that matching putative ontologies to domain ontologies as a specialized and easier instance of

ontology-to-ontology mapping (Sequeda et al., 2008).

Figure. 1 Direct Mapping of a Relational Database to the Semantic

Web

Figure. 2 Relational Database to Ontology Mapping

A series of examples follow to demonstrate the capability of a SQL database to encode domain semantics using

SQL-DDL and at the same time to expose downside of this approach. If we take a relational database schema, one

can observe the implicit domain semantics that are represented in that schema. We use a University’s relational

database as a running example (Figure. 3). The schema defines table structures concerning students, their

enrollment, and courses being offered by a specific department.

Figure. 3. Example of University’s relational database

By looking at the SQL schema, simple analogies can be observed between the relational schema and an ontology

and possible transformations can be easily generated. The central focus of direct mapping efforts concerns the

semantic interpretation of the combinations of primary and foreign keys. Any direct mapping system must anticipate

every possible association that might be formed by a foreign key and the implication of every possible composition

of those constraints. In section 4 we develop an abstract SQL grammar capable of generating all possible primary

and foreign key combinations, of which there are 10. As we survey the work on direct mapping, we will use the

coverage of these 10 cases as one of features we use to assess the efforts.

For example, if we take a closer look at the relation between Course and Department, a foreign key between these

two relations exists, encoding a one-to-many relationship between the relations. For a user, it is understandable that

this relationship means that a Course is offered by a Department. This exact knowledge can be represented in an

ontology. Figure. 4 shows the SQL-DDL of the relationship between Course and Dept through a foreign key, and its

equivalent graphical ontological representation, with two classes, Course and Department, and a property,

“offeredBy”, that relates the two classes, having as the domain the Course class and as a range the Department class.

This is a simple example of how explicit semantics that are represented in an ontology are represented implicitly in a

relational schema. As a result, some examples in a relational schema can have obvious transformations to an

ontology.

CREATE TABLE COURSE{

CID INT PRIMARY KEY,

NAME VARCHAR NOT NULL,

DEPT INT,

CONSTRAINT DEPT_FK FOREIGN KEY(DEPT)

REFERENCES DEPT(DID)}

CREATE TABLE DEPT{

DID INT PRIMARY KEY,

NAME VARCHAR NOT NULL}

Figure. 4 SQL-DDL and Ontology for Course and Dept

However, just because in this example a transformation can be done does not mean that all examples of a syntactic

form work the same way. One particular example is inheritance. When a person looks at the content and structure of

the database, it can be obvious that it maps to an inheritance relationship, but the same syntactic construct (physical

representation) can represent two different logical relationships. Figure 5 illustrates an example representing

inheritance between STUDENT and PERSON using a Foreign Key as a subset of a Primary Key.

CREATE TABLE PERSON{

ID_PERSON INT PRIMARY KEY,

NAME_PERSON VARCHAR NOT NULL}

CREATE TABLE STUDENT{

ID_STUDENT INT,

ID_PERSON INT,

DEGREE_STUDENT VARCHAR NOT NULL,

CONSTRAINT PERSON_FK FOREIGN

KEY(ID_PERSON) REFERENCES

PERSON(ID_PERSON)

PRIMARY KEY(ID_STUDENT, ID_PERSON)

Figure. 5 SQL-DDL and Ontology for Person and Student

In this case, the table PERSON becomes the class PERSON and the table STUDENT becomes a subclass

PERSON. Due to the fact that one of the primary keys of the table STUDENT is also a foreign key to the table

PERSON, one can say that this is an inheritance.

However, this same syntactic construct, which represented inheritance in the previous example may model a

different type of relationship (Figure. 6). Therefore it is not correct to state that this specific syntactic construct

represents inheritance.

CREATE TABLE ORDER{

ID_ORDER INT PRIMARY KEY,

ORDER_NAME VARCHAR NOT NULL}

CREATE TABLE ORDERITEM{

ID_ORDER INT,

ID_ITEM INT,

ITEM_NAME VARCHAR NOT NULL,

CONSTRAINT PERSON_FK FOREIGN

KEY(ID_ORDER) REFERENCES ORDER(ID_ORDER)

PRIMARY KEY(ID_ORDER, ID_ITEM)

Figure. 6 SQL-DDL and Ontology for Order and Order Item

The previous example models the relationship between an ORDERITEM, which is part of an ORDER, which

does not represent inheritance. Thus, it is easy to make mistakes while thinking that some specific syntactic

construct may represent a specific ontological construct. Hence, not all explicit syntactic constructs may be correctly

transformed. When viewed together, the papers reviewed in this survey enumerate a large number of alternatives.

Building on these, we present a consolidated direct mapping system (Section 6). The consolidated system covers all

possible associations formed by foreign key relationships and does not create ontology content when the physical

model may be ambiguous.

Before presenting the surveyed work, Section 5, we first present a generic mapping of components of SQL-DDL

to layers in the Semantic Web layer cake, Section 2. Part of that looks into the evolution of the SQL language as a

basis of understanding how domain semantics can be encoded in a SQL model. In addition it creates a context for

assessing the individual efforts and their respective transformation system. In Section 3 we present transformation

patterns and acknowledge other “hard examples” which include the identification of symmetric and transitive

properties, the use of OWL’s someValuesFrom and allValuesFrom and the effects of Open and Closed World

Assumptions.

2. SQL-DDL and the Semantic Web Languages

SQL standards, including its data definition language (DDL), have been evolving since the first proposal in 1986.

For each version, SQL 86-89, 92, 99, 2003, the SQL language standard changes significantly from its predecessor,

increasing the expressiveness of the language each and every time (Pratt 1990, Pratt 1995, SQL3, Sequeda et al.,

2007). It is arguable that in the first standard implementations of SQL, the language was not expressive enough to

explicitly encode the database’s application semantics. Initially the development of the relational model was largely

concerned with the representation of data (Figure. 7a) as n-ary relations and the correctness of syntactic

transformations leveraging relational algebra (Meier 1983). A central concern was the theory of normal forms. A

primary goal was to reduce storage overhead. Similarly early SQL dialect had little concern with application specific

semantics.

Evident in the sequence of SQL standards are increasingly powerful elements for capturing application domain

semantics. These elements include consistency constraints, database views and triggers. The inclusion of trigger

definitions and view definitions in SQL was initially inspired by knowledge-base systems that capture and execute

both forward and backward rule systems (Stonebraker 1986). This is not to say that SQL-DDL is a knowledge

representation language; the semantics of most of its constructs are not directly comparable with constructs of an

ontology language. However, SQL DDL is capable of capturing some semantics of the domain modeled in a SQL

application. Such semantics can be extracted from the schema by identifying grammatical structure.

2.1 The Evolution of SQL Data Models

SQL is organized in three parts. The SQL data definition language (DDL), the SQL data manipulation language

(DML) and the SQL data query language (DQL). The DDL is used to define both the logical and physical

representation of data in a database. To date direct-mapping systems have been concerned with only the DDL. We

provide an overview of the DDL features as they have evolved and show they parallel the expressive hierarchy of

Semantic Web languages. Taking just some literary license, we can structure the chronological development of

increasingly powerful SQL dialects as layers of expressiveness that correspond precisely to the Semantic Web layer

cake.

The first SQL standard, SQL86-89 was limited. For the purpose of this paper, the following succinct description

suffices; starting from relational algebra, a standard set of data types were defined, abbreviated formal relational

notations were replaced full English words (SELECT, CREATE TABLE, etc), and the now familiar accoutrements

of human readable computer languages added (Figure. 7b). The often heard claim that SQL databases do not provide

for the encoding of data semantics is apt for this version of SQL (Pratt 1990).

SQL 92 added data integrity constraints: CHECK, PRIMARY KEY, FOREIGN KEY and UNIQUE [Pratt

1995]. This first extension already enables DBA’s to encode application semantics by connecting relations and

permitting us to know beforehand what values are allowed. In conjunction with the basic table definition of SQL 88,

this version of SQL is the one in common use today (Figure. 7c).

SQL 99 saw the addition of Triggers (SQL3). Triggers are forward-chaining rules whose predicates are

conditioned on changes to table content. Triggers are often used to maintain correctness and to implement

heterogeneous data integration (Ceri & Widom 1993). Figure. 7d is a trigger that maintains an invariant on the

database concerning salary inversion. SQL 2003 introduces XML-related features. We find, so far, that the XML-

related extensions enable only the encoding of syntactic properties specific to XML.

a) Relational Model

Before SQL

Employee (name, age)

b) Table Definition

SQL 86-89

CREATE TABLE employee (name VARCHAR(100), age INTEGER)

c) Constraints

SQL 92

CREATE TABLE employee(

name VARCHAR(100) PRIMARY KEY,

salary INTEGER NOT NULL,

type CHAR(8) CHECK(type IN ('TEMP','FULLTIME','CONTRACT'))

dept_name FOREIGN KEY (dept) REFERENCES department (name))

CREATE TABLE department(

name VARCHAR(100) PRIMARY KEY)

d) Triggers

SQL 99

CREATE TRIGGER sal_adjustment

AFTER UPDATE OF salary ON employee REFERENCING OLD AS OLD_EMP NEW AS NEW_EMP

FOR EACH ROW WHEN (NEW_EMP.SALARY > (OLD_EMP.SALARY *1.20))

BEGIN ATOMIC SIGNAL SQLSTATE ‘75001’(‘Invalid Increase: > 20%’); END

Figure. 7 Evolution of SQL DDL

2.2 SQL and the Semantic Web Layer Cake

Knowing the historical context, it is easy to decompose SQL-DDL into a stack representing increasing expressive

power. While analyzing the evolution of SQL-DDL with respect to the Semantic Web stack, we suggest a

correspondence between individual syntactic structures in SQL-DDL and the layers in the Semantic Web stack

(Figure. 8). A material result is a hierarchy of SQL-DDL grammars whose expressive power corresponds to the

stack of Semantic Web languages. This type of layer cake mapping methodology has been achieved between the

OBO (Open Biomedical Ontology) language and the Semantic Web stack (Tirmizi et al., 2009). We anticipate that

the organization of the Semantic Web with respect to the assignment of expressive power to each of its respective

layers embodies a larger organizing principal.

We observe that systems that translate SQL databases to the Semantic Web have different goals with respect to

how much of the expressive power of SQL-DDL targeted and similarly the amount of expressive power of the

Semantic Web is exploited. To be fair, the choices made by individual projects are at least as dependent on the

timeframe of the effort as they are on the research itself. OWL became a W3C recommendation in 2004, thus efforts

that predate that necessarily do not consider OWL although some of them consider other ontology languages. The

structure illustrated in SQL-DDL can express the domain semantics that the relational database represents by the

relationships between the relational database’s relations.

Figure. 8 Layer Cake Correspondence between SQL and the Semantic Web

We begin by demonstrating the implicit semantics that are in SQL-DDL and how they can be made explicit. An

example of a university SQL schema and its corresponding ontology is presented in section 3, thus demonstrating

how a SQL schema and ontology, of the same domain, are related. We continue to discuss incompatibilities between

relational databases and ontologies.

2.2.1 Relational Model and RDF: Mapping Relational Tuples to RDF

The first layer we encounter is the Relational Model, which predates SQL. This layer only expresses the syntactic

structure of the data stored in a relational database; therefore we can map it only to the data layer of the Semantic

Web: RDF. To export the content in a relational database to the Semantic Web, it is necessary to create an RDF

representation of this content.

RDF expresses data in triples as a labeled graph in the form of a Subject-Predicate-Object statement. The n-

tuples can be expressed as a series of triple statement in RDF. In Figure. 9, we show how the n-tuple can be

represented as an RDF graph by having the following triple statements: <T1, name, Juan> and <T1, age, 21>.

T1: Employee (Juan, 21)

Figure. 9 RDF representation of Figure. 7a

2.2.2 Table Definition and RDFS

The relational model offers sufficient semantics to create relations between the data. The first version of SQL

(SQL89), introduced the table definition, which creates a SQL schema for the data. Each column has a specific data

type. Likewise, RDFS is a schema for RDF data, which creates classes and properties.

An equally valid representation of a table definition is an n-dimensional graph, where n is the number of column

names. This graph representation is similar to RDFS, where it established a schema to represent data in RDF triple

form. Therefore, by utilizing this version of SQL, its domain semantics can be mapped only to the RDFS layer.

RDFS can represent the table definition (Figure. 7b) by having a class Employee with properties name and age. In

this example, the column name “age” becomes and rdf:Property as shown in Figure. 10.

Figure. 10 RDFS Representation of Figure. 7b

2.2.3 Constraints and OWL

Climbing the Semantic Web layer cake, more expressive power is encountered. OWL offers expressiveness that can

not be expressed in RDFS. Likewise, SQL99 and in conjunction with elements of SQL92, contains new components

that offer more domain semantics. By using PRIMARY KEY, FOREIGN KEY, NOT NULL, UNIQUE and

CHECK, domain semantics are more explicit. These semantics are richer than the basic table definition and

therefore should be mapped to a higher level in the Semantic Web layer cake. We have observed that the use of SQL

with all the possible constraints will map directly to OWL (hence the use of OWL class instead of RDFS class).

OWL is more expressive due to the fact that it can represent specific type of properties and restrictions. The main

similarity between database constraints and OWL is the possibility of using the database’s referential constraint to

establish relationships between tables as owl:ObjectProperty that connects objects. PRIMARY KEY, UNIQUE and

NOT NULL implies owl:FunctionalProperty and the enumerated CHECK constraint implies owl:oneOf (this will be

discussed in the next section).

Following the example in Figure. 7c, the schema and its constraints can be represented in OWL. In a general

assumption, a relation can be considered an OWL class except if the relation is a binary relation, therefore it is an

OWL object property. All attributes that are referential constraints are OWL object properties. Their domain is the

current relation and range is the referenced relation. Attributes that are not referential constraints are OWL data type

properties. In OWL, the “age” column name becomes an owl:DatatypeProperty.

2.2.4 Triggers and Rules

SQL99 introduces a new semantic component: triggers, as shown in Figure 7d. This new component can be

considered as business rules that guarantee the integrity of data in a relational database and derive new information

materializing it. Even though these rules are not used for inference, they may expose rules about the domain. For

example, the trigger in Figure. 7d, explains that the maximum salary increase is of 20%. Triggers like the previous

one may be analyzed to extract the rules and transform them to Semantic Web format. However, no direct mapping

approach integrates triggers in the transformation process. The integration of triggers into direct mapping is still an

open research question.

2.3 An Example of Direct Mapping Transformation

Consider a relational database in 3NF for a university (Figure. 11), where we show an example of how an ontology

can be extracted out of the SQL schema.

University Database Schema

create table PERSON {

ID integer primary key,

NAME varchar not null }

create table STUDENT {

ROLLNO integer primary key,

DEGREE varchar,

ID integer unique not null foreign key references PERSON(ID)}

create table PROFESSOR {

ID integer primary key,

TITLE varchar,

constraint PERSON_FK foreign key (ID) references PERSON(ID)}

create table DEPT {

CODE varchar primary key,

NAME varchar unique not null }

create table SEMESTER {

SNO integer primary key,

YEAR date not null,

SESSION varchar check in (‘SPRING’, ‘SUMMER’, ‘FALL’)}

create table COURSE {

CNO integer primary key,

TITLE varchar,

CODE varchar not null foreign key references DEPT(CODE)}

create table OFFER {

ONO integer primary key,

CNO integer foreign key references COURSE(CNO),

SNO integer foreign key references SEMESTER(SNO),

PID integer foreign key references PROFESSOR(ID),

CONO integer foreign key references OFFER(ONO)}

create table STUDY {

ONO integer foreign key references OFFER(ONO),

RNO integer foreign key references STUDENT(ROLLNO),

GRADE varchar,

constraint STUDY_PK primary key (ONO, RNO)}

create table REG {

SID integer foreign key references STUDENT(ID),

SNO integer foreign key references SEMESTER(SNO),

constraint REG_PK primary key (SID, SNO)

}

Figure. 11 Schema of a University Database

The PERSON table contains data about all the people, some of them may be students and present in STUDENT

table, and some may be professors and present in PROFESSOR table. The Dept table lists the departments in the

university where each department has a unique name, and the COURSE table lists the courses for every department.

The Semester table contains a list of semesters which have a year and one of the three seasons, Spring, Summer or

Fall, associated with them. A course could be offered in a particular semester with a particular professor, and

recorded in OFFER table.

Two offered courses could be co-offered, and recorded as a self-relation in the OFFER table. A student could

study an offered course, which is recorded in STUDY table. Also, a student could be registered in a semester with or

without taking a course, and this information is recorded in the Reg table.

For a domain expert, it is easy to recognize the concepts in this database structure that is in 3NF, and to identify

the semantics of their properties and different kinds of relationships that exist between these concepts. Figure 12

shows an ontology corresponding to the given schema, developed by a domain expert.

Domain Expert’s Ontology

Ontology(<urn:sql2owl>

ObjectProperty(<REG> domain(<STUDENT>) range(<SEMESTER>))

ObjectProperty(<REG_I> inverseOf(<REG>))

ObjectProperty(<COURSE.DEPTCODE> Functional domain(<COURSE>) range(<DEPT>))

ObjectProperty(<COURSE.DEPTCODE_I> InverseFunctional inverseOf(<COURSE.DEPTCODE>))

ObjectProperty(<OFFER.CONO> Transitive Symmetric domain(<OFFER>) range(<OFFER>))

...

DatatypeProperty(<COURSE.CNO> Functional domain(<COURSE>) range(xsd:integer))

DatatypeProperty(<SEMESTER.YEAR> Functional domain(<SEMESTER>) range(xsd:date))

DatatypeProperty(<SEMESTER.SESSION> Functional domain(<SEMESTER>)

range(oneOf("SPRING" "SUMMER" "FALL")) range(xsd:string))

...

Class(<PERSON> partial ...)

Class(<PROFESSOR> partial <PERSON> ...)

Class(<STUDENT> partial <PERSON>

restriction(<STUDY.RNO_I> minCardinality(0)) ...)

Class(<COURSE> partial restriction(<COURSE.DEPTCODE> cardinality(1))

restriction(<COURSE.CNO> cardinality(1)) ...)

...

)

Figure. 12 Parts of an ontology corresponding to the schema in Figure. 11, developed by a domain expert. The ontology is presented in OWL

Abstract Syntax.

It is apparent that an ontology does not correspond exactly to a relational database. Ontologies can and often

encode domain semantics that are not possible to be encoded in a relational database, or even in SQL-DDL.

Therefore, a translation of a SQL schema to an ontology will suffer of ambiguity, and several disparities, which are

explained in the following section.

3 Transformation Patterns between Relational Databases and Ontologies

Relational databases are capable of efficiently managing large amounts of structured data. Ontologies are very

useful for knowledge representation. Since these two models are aimed towards different requirements, it is

reasonable to expect some disparities among them in terms of basic assumptions. It is important to understand the

mismatches and to make educated choices when confronted with such problems. In the following sections, we

discuss some key issues that affect a direct transformation system. We show an example of how different

normalizations can affect the ontology that is being derived from the database. It follows by a discussion on why it is

hard to identify inheritance and property characteristics in relational schemas, followed by the effects of identifying

different types of relationships and constraints. Then we discuss the effect of the open world assumption in

ontologies, when a relational database with a closed world assumption is translated into an ontology.

3.1 Basic Transformations

Several components of relational databases can be seen as obvious transformations to an ontology. In most cases, a

relation is mapped to an ontological concept unless it is a binary relation. In this case, the binary relation would be

mapped to a property that connects two concepts. A foreign key connects two relations, therefore, the foreign key is

mapped to a property. All attributes that are not foreign keys are also mapped to properties. All direct mapping

systems follow these basic transformation patterns. As a result, we believe that this overlap indicate that the majority

of direct mapping approaches share rules that can be standardized.

Table 1. Summary of Relational Database direct mapping to RDFS and OWL

Relational Database

RDFS Ontology

OWL Ontology

Non-binary Relation

RDFS Class

OWL Class

Binary Relation

RDFS Property

OWL Object

Property

Column

RDFS Property

OWL Datatype

Property

Foreign Key

RDFS Property

OWL Object

Property

3.2 Database Normalization

Normalization is the process of organizing data in a database. The objective of normalization is to simplify the

maintenance of data integrity and reduce the amount of space used for the database. A common practice is to design

a relational database in Third Normal Form (3NF). In real-life scenarios, one may come across different types of

normalizations. For example, schemas may be de-normalized for performance reasons. Different normalizations

may lead to question the putative ontology that can be obtained from them. The same way a relational database can

be normalized in different manners, an ontology can be modeled in different ways. However, there is no right or

wrong way of modeling an ontology because it may depend on the context and/or the domain expert. Nevertheless,

when creating a putative ontology, we need to analyze the type of ontology that is derived depending on the

database normalization. The following examples show how the derived ontology appears if a database is in three

different normal forms, 1NF, 2NF and 3NF. Consider the following database in 1NF:

Table 2. Example of a database in 1NF

Author

AuthorID

AuthorName

John Smith

Jane Doe

Books

ISBN

Title

Publisher

City

Subject

AuthorID

123-456

ABC

Publisher

XYZ

Dallas

Science

Fiction

123-456

ABC

Publisher

XYZ

Dallas

Science

Fiction

987-654

XYZ

Publisher

ABC

Austin

Auto-

biography

Table 2 shows a database in 1NF. There is a table for Authors and Books. The Books table has a foreign key that

points back to the Author table. The corresponding ontology that could be derived from a database in 1NF is shown

in Figure 13.

Figure. 13 Ontology derived from database in 1NF from Table 2

The same way the database lacks higher normalization; a domain expert can consider that the ontology could be

modeled in a better way. For example, the publisher should be a class, and not just literal values. Now, lets consider

the database in 2NF, as shown in the following table

Table 3. Example of a database in 2NF

Author

AuthorID

AuthorName

John Smith

Jane Doe

Books

ISBN

Title

PublisherID

SubjectID

AuthorID

123-456

ABC

123-456

ABC

987-654

XYZ

Subject

SubjectID

Subject

Type

Science

Fiction

Auto-

biography

Non-

Fiction

Publisher

PublisherID

Publisher

City

Publisher ABC

Austin

Publisher XYZ

Dallas

Figure. 14 Ontology derived from database in 2NF from Table 3

The derived ontology is shown in Figure 14, which shows two new classes. These can automatically generated if the

database is in 2NF. However, if we further normalize the database to 3NF, as shown in Table 4, the derived

ontology is the same ontology that is generated from the database in 2NF, because the table Authorship is

transformed into a relationship between Author and Books.

Table 4. Example of a database in 2NF

Author

AuthorID

AuthorName

John Smith

Jane Doe

Books

ISBN

Title

PublisherID

SubjectID

123-456

ABC

987-654

XYZ

Subject

SubjectID

Subject

Type

Science

Fiction

Auto-

biography

Non-

Fiction

Publisher

PublisherID

Publisher

City

Publisher

ABC

Austin

Publisher

XYZ

Dallas

Authorship

ISBN

AuthorID

123-456

987-654

This example does not prove that a putative ontology from a 2NF or 3NF database is the same. However, it does

suggest that a putative ontology from a database may depend on the normalization.

A relational database can be represented in a logical model (UML, ER) or a physical schema (SQL DDL). As a

good software engineering practice, it is quite common to develop logical models for the relational schema, which

then gets translated to a physical schema. Even after deployment, a database undergoes modifications due to

changing application requirements. Such modifications are often not reflected in the logical model. Therefore, the

physical schema, easily expressed in SQL DDL, becomes the most accurate source for the structure of the database.

Direct mapping approaches usually assume that the input database is in 3NF, even though they can still be applied if

they are not in this normal form.

3.3 Inheritance Modeling

Given that a SQL schema contains relationships between entities that can be expressed using only foreign keys, we

find it necessary to identify patterns of foreign key definition that can express only the inheritance relationships. In

other words, given a foreign key definition between two entities, is it possible to say that a subclass relationship

exists between the entities involved? If such patterns exist, we can map them to subclass relationships in the

ontology.

Relational databases do not provide a mechanism to express inheritance. However, inheritance hierarchies can be

modeled in a variety of ways in SQL schemas. Our university schema example shows two different modeling

choices for inheritance. STUDENT and PROFESSOR entities are subclasses of PERSON, even though the

relationships have been modeled differently. We present some inheritance modeling possibilities through possible

foreign key patterns and discuss why some inheritance modeling choices are harder to identify automatically.

• Foreign key is also the primary key: An example of this case is the PROFESSOR-PERSON relationship in our

university schema. This is a common pattern that identifies inheritance. An exception to this would be vertical

partitioning of tables for performance reasons, as in some data warehousing applications. However, with our

requirement of 3NF, such a scenario would not occur. Therefore, most direct mapping approaches automatically

identify inheritance modeled in this way.

• Foreign key and primary key are disjoint: The STUDENT-PERSON relationship in our university schema is an

example of this pattern. This pattern does not uniquely identify inheritance, and therefore cannot be automatically

translated into an inheritance hierarchy in an ontology. A counterexample is the Course-Dept relationship

modeled in the same schema. In fact, this pattern is the most common one used for expressing one-to-many

relationships.

• Foreign key is a subset of the primary key: This is another option for modeling inheritance in a relational

database. However, other relationships can also be modeled this way, and therefore it is not a good candidate for

automatic translation to an inheritance hierarchy in the ontology. A counterexample for this pattern is:

ORDER(ONo) – ONo is the primary key

ORDERITEM(ONo,INo) – (ONo,INo) is the primary key, ONo is a foreign key to Order

In a business domain, the relationship most likely means that an order item is a part of an order, instead of

representing an inheritance relationship between the two entities.

3.4 Symmetric and Transitive Relationships

SQL lacks the expressive power to enforce symmetry and transitivity in relationships. Expressing such properties of

relationships is natural to ontology languages like OWL.

The self-relation on the OFFER entity, that represents co-location of an offered course with another offered

course, has interesting characteristics. First, it is symmetric, because if an offered course A is co-located with an

offered course B, it means B is co-located with A as well. And second, it is transitive, which means that if A is co-

located with B, and B is co-located with C, then A is co-located with C.

Such a self-relationship may or may not imply a symmetric or transitive relationship. Consider the example:

Employee(ID,Name,MgrID), where ID is the primary key, and MgrID is a foreign key to the Employee table itself

that captures the manager’s ID. Clearly, this relationship is not symmetric, because if John is the manager of Peter,

that rules out the possibility of the same Peter being the manager of the same John.

Depending upon the domain semantics, the relationship may or may not be transitive. If an employee’s manager

means any other employee higher in the organization, then being a manager is a transitive relationship. If it means

only the immediate supervisor, then it is not a transitive relationship. The example shows that it is hard to identify

logical characteristics of relationships in a SQL schema without using the domain knowledge.

3.5 Value Constraints

The OWL language has a set of value constraints which puts constraints on the range of a property when applied to a

particular class description. The three value constraints are allValuesFrom, someValuesFrom and hasValue. The

OWL allValuesFrom is analogous to the universal quantifier of predicate logic while, the OWL someValuesFrom is

analogous to the existential quantifier of predicate logic. Consider the following example:

Employee(id, id_role) Admin(id, id_role) Role(id_role,role_name)

Employee(id_role) and Admin(id_role) are foreign keys to Role(id_role). One can consider this example as

having two different relationships: Employee has a role and an Admin has a role. Therefore, two Object Properties

would be created. Nevertheless, if both object properties result in meaning the same thing, then it would be

sufficient to have only one Object Property and have an owl:allValuesFrom with the union of the Employee and

Admin table. It is unclear on how to determine if these two Object Properties are the same.

Different value constraints may lead to different interpretations. For example, if we consider an Open World

Assumption, then a property should only have a domain and not a range. If a property has a domain and a range,

then we are considering a Closed World Assumption. We could tighten the closed world in a property by not only

having a domain and range, but also having a allValuesFrom constraint.

3.6 Check Constraint

Check constraints are conditions that validate the data that is being inserted in a table. The enumerated CHECK

constraint can be represented in OWL using owl:oneOf and rdf:List. For example, if we take attribute Session from

the Semester table in Figure 12; its representation in OWL would be the following:

oneof(‘Spring’ ‘Summer’ ‘Fall’)

Nevertheless, the check constraint could represent inheritance. In this case, a “Spring Session” is also a type of

“Session”. On the other hand, OWL is not expressive enough to represent all the possibilities that the semantics of

CHECK offers. Consider having a CHECK constraint to guarantee that the value of a column degree is between 0

and 360. This type of constraint cannot be represented in OWL.

3.7 Open World vs. Closed World

Relational databases usually operate under the closed world (CW) assumption. This means that whatever is not in

the database is considered false. CW is essential for important database concepts like integrity constraints and data

validation (Drummond & Shearer 2006). On the other hand, a knowledge-base community like the Semantic Web

has an open world (OW) approach where whatever is not in the knowledge base is considered unknown. This

assumption is natural for knowledge bases that often contain incomplete knowledge, and grow with the manual

discovery of new domain knowledge and automatic inferences.

Due to this difference, the concept of a constraint has very different meanings in the two worlds (Motik et al.,

2007). In a database setting, a constraint is mainly used for validation and prevents incorrect data from entering the

database. In contrast, in an ontology, a constraint expresses some characteristics of classes or relationships but does

not prevent assertion of any facts. Due to these constraints, some assertions may even result in unintuitive

inferences.

Consider this example: In our university database, the relationship between a course and a department is

expressed by a foreign key constraint. The foreign key constraint will allow the Course relation to have the record

(CS386,CS,Databases) where CS is a department. However, the record (CS386,John,Databases) – John is a student

– will not be allowed because it violates the integrity constraint. In the ontology, the same constraint appears as

object property CODE, with range restriction of Dept on the relationship. Unlike database constraints, the ontology

constraint will not only allow the assertion of the triple CODE(CS386,John), it will also infer that John is an

instance of Dept, because of the range restriction on CODE property.

While there are obvious differences between closed and open worlds, open worlds can be closed by explicitly

stating required negations. OWL provides a way to state such facts by providing constructs to express disjoints.

When developing an ontology based on a SQL schema, it is very important to keep these differences in mind.

The question of whether the open world should be closed or not, depends upon the domain and application

requirements. If the ontology is for use within a particular application, it might make sense to close the world,

whereas in a data integration setting, it might make more sense to have an open world.

4 Key Combinations of a Direct Mapping Transformation

Since we are inferring semantic properties formed by foreign keys, we have exhaustedly considered all possible

primary and foreign key combinations. This issue is complicated by the possible presence of compound keys (keys

composed of multiple attributes) and the possibilities of multiple overlapping keys.

We present all possible combinations of primary and foreign keys in a SQL DDL to ontology transformations

where the rules of the transformation system cover the entire range of possible relations that can be described in a

SQL schema. While it is trivial to translate relations into ontology classes, the existence of foreign keys represents

relationships between corresponding classes, and poses a challenge for any transformation system. Multiple foreign

keys may be present in a table, and each of them may be in a different form, representing different kinds of

relationships, e.g. one-to-one, one-to-many etc., between the entities. The interaction of the foreign keys with

primary keys provides clues to the properties of these relationships.

Theorem 1

The space of relations describable in SQL DDL using various combinations of primary key and foreign key

references between the relations can be partitioned into 10 disjoint cases of key combinations.

The proof involves a syntactic enumeration of the cases and a closure operation over the space of relations. Fig15

provides a useful summary of the theorem and its proof.

Proof: Briefly, we first partition the space by examining the number of foreign keys contained in relations. All

relations without any foreign keys can be easily translated into classes in an ontology. Similarly, relations with more

than two foreign keys usually represent N-ary relationships, and the rules for N-ary relationships are applicable. The

cases for one or two foreign keys are more interesting and give rise to more possibilities including binary relations,

inheritance and new classes. However, for each possible branch, we have carefully defined sets of rules for

producing ontology classes and properties.

A table in a relational database can have either a Primary Key and/or Foreign Key. Both keys can include a number

of attributes, x, where x = 0…n. There can only be one Primary Key in a table, whereas there can be one or more

Foreign Keys. Therefore, our initial starting point is:

1. PK: a relation only has a Primary Key

2. C-PK: a relation only has a composite Primary Key

3. S-FK: a relation only has one Foreign Key

4. N-FK: a relation has at least two or more Foreign Keys

This can be represented in the following grammar:

E  PK + T | C-PK +T

E  S-FK

E  N-FK

T  S-FK | N-FK

Space of

relations

0 FKs

FK=PK

FK≠PK

∃i FKi=PK

¬(∃i FKi=PK)

∃i FKi=PK

FKi∪FKj=PK (i≠j)

Otherwise

Has non-FK attributes

All attributes in FKs

Has non-FK attributes

1 FK

2 FKs

>2 FKs

Fig.15 The tree describes the complete space of relations when all possible combinations of primary and

foreign keys are considered

By applying the closure operation for LR(0) item sets (Aho et al., 2006), the following elements are obtained:

5. PK + S-FK: a relation has a Primary Key and only one Foreign Key

a) PK = S-FK: the Foreign Key is the Primary Key

b) PK ∩ S-FK = 0: the Foreign Key and the Primary Key do not share any attributes

6. PK + N-FK: a relation has a Primary Key and two (2) Foreign Keys

a) PK ∩ N-FK = 0: the Foreign Key and the Primary Key do not share any attributes

b) PK

⊂

N-FK: one of the Foreign Keys is also the Primary Key

7. PK + N-FK: a relation has a Primary Key and more than two (> 2) Foreign Keys

c) PK ∩ N-FK = 0: the Foreign Key and the Primary Key do not share any attributes

d) PK

⊂

N-FK: one of the Foreign Keys is also the Primary Key

8. C-PK + S-FK: a relation has a Composite Primary Key and only one Foreign Key.

a) C-PK ∩ S-FK = 0: the Foreign Key and the Primary Key do not share any attributes

b) S-FK

⊂

C-PK: the Foreign Key is part of the Primary Key

9. C-PK + N-FK: a relation has a Composite Primary Key and two (2) Foreign Keys

a) C-PK ∩ N-FK = 0: all the Foreign Keys and the Primary Key do not share any attributes

b) N-FK

⊆

C-PK: all the Foreign Keys are part of the Primary Key

c) C-PK ∩ N-FK

≠

0, C-PK – N-FK

≠

0, N-FK – C-PK

≠

0: The Foreign Keys and Primary Key

share common attributes

10. C-PK + N-FK: a relation has a Composite Primary Key and more than two (> 2) Foreign Keys

d) C-PK ∩ N-FK = 0: all the Foreign Keys and the Primary Key do not share any attributes

e) N-FK

⊆

C-PK: all the Foreign Keys are part of the Primary Key

f) C-PK ∩ N-FK

≠

0, C-PK – N-FK

≠

0, N-FK – C-PK

≠

0: The Foreign Keys and Primary Key

share common attributes

Therefore, a direct mapping transformation system should be complete with respect to all the cases of the theorem of

key combinations.

5 Direct Mapping Approaches

We will use the suggested correspondence between SQL and the Semantic Web as a basis for characterizing and

comparing the different mapping efforts. In addition, we will look to the 10 cases determined by the theorem of key

combinations presented in the last section to help organize the discussion. The coverage of the 10 cases by each

method is addressed in Section 7.

Some surveyed approaches are not formally presented, but rely heavily on examples and written descriptions of

the transformations; very much like the introduction of this paper. As a result, developing precise meanings is

sometimes difficult. We will acknowledge the overlap between the surveyed systems, particularly when such

overlap helped motivate the inclusion of a transform in the consolidated system. We will offer counter-examples to

any specific transformation where the syntactic underpinning of the the preconditions does not uniquely determine

the consequent. The analysis is summarized in Table 5. Some cells may be empty and this is because the system

does not offer any transformation from the given SQL construct. The work of Lubyte et al. is not shown in Table 7

because their input is not SQL-DDL and their output is not in RDFS or OWL.

It is important to note that the relational model of a relational database is the basis to obtain RDF content

(Figure. 9). Data is to RDF as schema is to ontology (RDFS or OWL). For there to be data, it has to be stored in a

relational database that is defined with a schema. Likewise, RDF data is an instance of a specific triple schema that

is part of an ontology. After creating an ontology from the database schema, the data in the relational database can

be extracted and transformed to ontological instances. All the following surveyed approaches accomplish this

objective.

The solution fails when the relational database is updated. Upon an update, the extracted data as ontological

instances will not represent the current status of the relational database. Another option involves translating

SPARQL (RDF query language) to SQL, where one would not have to have to separate sets of data: the relational

data and the ontological instances, instead, the SPARQL query can be translated into SQL, extract the data in the

relational database, and then translate it, using the ontology that was automatically created, and return the data in

RDF format.

Table 5. Output of each direct mapping approach given the SQL input

Stojanovic et al

Astrova et al

2004

Buccella et al.

Li et al.

Shen et al.

Astrova et al 2007

Consolidated

System

Ontology

Language

RDFS / F-Logic

OWL

OWL Full

OWL DL

RDFS Class

RDFS Subclass

RDFS Property

RDFS Domain

RDFS Range

RDFS Class

RDFS Subclass

RDFS Property

RDFS Domain

RDFS Range

OWL Class

RDFS Subclass

OWL

ObjectProperty

RDFS Domain

RDFS Range

OWL Class

RDFS Subclass

OWL

ObjectProperty

RDFS Domain

RDFS Range

OWL Class

RDFS Subclass

OWL

ObjectProperty

RDFS Domain

RDFS Range

OWL Class

RDFS Subclass

OWL

ObjectProperty

RDFS Domain

RDFS Range

OWL Class

RDFS Subclass

OWL

ObjectProperty

RDFS Domain

RDFS Range

RDFS Property

RDFS Domain

RDFS Property

RDFS Domain

OWL Datatype

Property

OWL Functional

Property

OWL

allValuesFrom

OWL Datatype

Property

OWL Object

Property

RDFS Domain

OWL Datatype

Property

OWL Object

Property

RDFS Domain

OWL Datatype

Property

OWL Object

Property

RDFS Domain

OWL Datatype

Property

OWL Object

Property

OWL Functional

Property

RDFS Domain

RDFS Range

OWL Functional

Property

OWL

AllValuesFrom

XML Schema

Datatypes

RDFS range

OWL Maximum

Cardinality of 1

RDFS range

OWL Maximum

Cardinality of 1

OWL Functional

Property

RDFS range

XML Schema

Datatypes

<referential

constraint

definition>

RDFS Property

OWL Object

Property

OWL Functional

Property

OWL InverseOf

OWL Object

Property

OWL Minimum

Cardinality of 1

OWL Maximum

Cardinality of 1

OWL Object

Property

OWL Minimum

Cardinality of 1

OWL Maximum

Cardinality of 1

OWL

AllValuesFrom

OWL Object

Property

OWL Cardinality of

OWL Symmetric

Property

OWL Transitive

Property

OWL Object

Property

OWL Functional

Property

OWL InverseOf

OWL Minimum

Cardinality of 0

<primary key

definition>

F-Logic axioms

OWL Cardinality

of 1

OWL Minimum

Cardinality of 1

OWL Maximum

Cardinality of 1

OWL Minimum

Cardinality of 1

OWL Maximum

Cardinality of 1

OWL Inverse

Functional Property

OWL Minimum

Cardinality of 1

OWL Cardinality of

OWL Functional

Property

<unique

constraint

definition>

F-Logic axioms

OWL Maximum

Cardinality of 1

OWL Maximum

Cardinality of 1

OWL Inverse

Functional Property

OWL Inverse

Functional Property

<not null

constraint

definition>

F-Logic axioms

OWL Cardinality

of 1

OWL Minimum

Cardinality of 1

OWL Minimum

Cardinality of 1

OWL Minimum

Cardinality of 1

OWL Cardinality of

<check

constraint

definition>

OWL Datatype

Property

OWL hasValue

OWL oneOf

OWL Datatype

Property

OWL oneOf

Stojanovic et al.

Stojanovic et al. published the first approach that aimed at extracting domain semantics from SQL-DDL and

transforming it to a Semantic Web representation (Stojanovic et al., 2002). Their primary contribution is an

“approach for an (automated) migration of data-intensive websites into the Semantic Web”. It provides rules for

translation of relational schemas to F-Logic and RDF Schema. Their rules consist of creating classes, subclasses and

properties. The only constraints that Stojanovic utilizes are Foreign Keys, which are used to create ontological

relationships. Stojanovic et al.’s work formally defines rules for identification of classes and properties in relational

schemas; the target language is RDF Schema. Its mapping process consists of four steps:

First, capture information from a relational schema through reverse engineering (relations, attributes, attributes

types, primary keys, and foreign keys/inclusion dependencies). Continue by mapping database entities into

ontological entities by a set of mapping rules by alignment of top level terms (which relation name corresponds to

which concept name), relations to concepts and then relation attributes to concept attributes. The mapping rules

consist of identifying Concepts, Inheritance and Relationships. A relationship is a concept except if the relation

expresses a many-to-many relationship. Furthermore, if information is spread across several relations; in this case all

the relations can be integrated into one concept. Inheritance is an inclusion dependency between two relations and

both relations are concepts. This rule conflicts when information is spread across several relations. In this case, the

user must decide which rule to apply, thus making this effort semi-automatic. Figures 16 and 17 represent the

translation of SQL-DDL to RDFS. In this example, two tables that share the same primary key could be mapped

either to one same concept or to subclass relationship. The system depends on a person to resolve the ambiguity. In

the figure, it is mapped to a subclass relationship.

create table STUDENT {

STUDID integer primary key,

NAME varchar}

create table PHDSTUDENT {

STUDID integer primary key,

YEAR date not null }

Figure. 16 SQL-DDL representation of Student and PhD Student

Figure. 17 RDFS result of translating the SQL-DDL in Figure. 15

with Stojanovic et al.’s rules and the SQL-DDL in Figure. 17 with

Astrova et al.’s rules

A many-to-many relationship, one-to-many relationship and one-to-one relationship are converted into properties. If

n-ary relationships exist, these must be transformed into further concepts and a set of binary relations. After the

mappings are manually evaluated, validated and refined, the data can be migrated by creating ontological instances

based on tuples of the relational database. The process consists of two steps: create the instances with a unique

identifier and translate all the attributes, except for foreign keys and then establish relations between instances using

the information contained in the foreign keys in the database tuples

Astrova et al 2004

Astrova et al’s initial approach (Astrova 2004) is very close to (Stojanovic et al., 2002). The motivation for their

approach “is to migrate from data-intensive Web pages into the Semantic Web”. It is also based on a reverse

engineering approach using SQL-DDL as the relational database model and transforming it into an RDFS ontology.

They differentiate themselves from Stojanovic et al. by stating that they “assume data equality and data inclusion,

thus being able to extract only a subset of semantics embedded within a relational database”. Astrova et al.

acknowledges that “hidden” semantics can be discovered by analyzing data overlap (intersection) and data

disjointedness (no intersection).

Astrova et al.’s transformation systems analyzes the key, data and attribute correlations to extract conceptual

schema, which expresses the semantics of a relational database. This process uses a relational database in third

normal form (3NF) and consists of the following classifications and mappings. Firstly, relations can be classified as

either Base (independent of other relations), Dependent (the primary key of a relation depends on another relation,

meaning that it is also a foreign key) or Composite (not Base or Dependent). Secondly, all relations are concepts

except if they are a Composite relation. In this case, it can be mapped to a concept, property or an inheritance. N-ary

relations become concepts. Third, all attributes are properties except if they are foreign keys or primary keys and

foreign keys. Then they are mapped to relationships.

Relations can be mapped by a combination of key, data and attribute correlation. Analyzing the different foreign

and primary key combinations of attributes lead to determine if a relation gets mapped to a concepts, properties or

relationships. When the key of the relation and its data are equal across two different relations, but the attributes

differ (Key and Data equality and Attribute disjointedness), the relationship expresses vertical partitioning and both

relations become one concept. However, when the key and attributes are equal across two different relations but the

data is different (Key and Attribute equality and Data disjointedness), the relationship expresses horizontal

partitioning and both relations become one concept. Furthermore, when the keys between two relations are equal

and the data of one relation is included in the other relation (Key equality and Data inclusion), the relationship

between both relations expresses single inheritance. For example, the SQL-DDL in Figure 18 would be transformed

to the RDFS in Figure 17. If the keys are equal across two relations and there is data overlap and data inclusion

between both relations (Key equality, Data overlap and Data inclusion), a new concept is discovered which would

express multiple inheritance. Figure 16 is an example of the SQL-DDL that represents key equality, data overlap and

data inclusion and it would be transformed to the RDFS in Figure 19. As it can be seen, Figure 19 creates a class that

is not necessary, therefore this rule may lead into ambiguity. Moreover, a new concept is also discovered if the keys

are equal, the data is disjoint and the attributes overlap between two relations. In addition, equal keys, data and

attributes between relations expresses a diamond-shape inheritance.

create table STUDENT {

STUDID integer primary key,

NAME varchar}

create table PHDSTUDENT {

STUDID integer primary key references STUDENT,

YEAR date not null }

Figure. 18 Another SQL-DDL representation of Student and PhD

Student

Figure. 19 RDFS result of translating the SQL-DDL in Figure. 15

with Astrova et al.’s rules

In summary, we have explained the two different approaches to transform SQL-DDL to RDFS. These

approaches are very similar because given the same SQL-DDL input, the resulting RDFS is the same. However

there are several mismatches that have been explained in the previous examples: different SQL-DDL can translate to

the same RDFS constructs and the same SQL-DDL can translate to different RDFS constructs. Since these systems

are not formalized transformation systems, they are prone to different interpretations and ambiguities.

Buccella et al

Buccella et al. published the first approach that takes SQL-DDL and transforms it to OWL. The transformation

system presented is not formalized but is based on expository examples. They cover five cases: tables without

constraint, tables with exactly one foreign key, tables with exactly two foreign keys, tables with more than two

foreign keys, and check constraints.

The first case translates tables without constraints are mapped to an OWL Class. All the attributes of the table

are mapped to OWL Functional Datatype Property. If the attribute is NOT NULL, it is mapped to a cardinality

constraint of 1. Primary Keys already have the NOT NULL implicit, therefore they are also mapped to a cardinality

constraint of 1. The Datatype properties do not have domain and range defined because they can be used by other

classes. Furthermore, the restriction owl:allValuesFrom is applied with a String. This approach would lead to

ambiguity. Take the following example in the following example.

create table PERSON {

ID integer primary key,

NAME varchar not null }

create table DEPT {

CODE varchar primary key,

NAME varchar unique not null }

Figure. 20 SQL-DDL of Person table and Dept table

Following this rule, when a Datatype Property “hasName” has been created for the Class PERSON, it would not

have the domain PERSON and Range string, because another class could reuse it. The table DEPT also has an

attribute name, therefore the previous Datatype Property created from the table PERSON could be reused for the

new class DEPT. Because it is not explicit that the class PERSON and class DEPT are disjoint, it may be inferred

that a name of a Dept is also a name of a Person which is false. Hence, due to the Open World Assumption, this rule

is ambiguous.

The next case is when a table has exactly one foreign key. This table is then mapped to an OWL Class. The non

foreign key attributes are mapped following the previous rules. There are now two sub cases: 1) if the foreign key is

a subset of the primary key and 2) if the foreign key is not a subset of the primary key. For the first sub case, the

primary key that is not a foreign key is mapped to an OWL functional datatype property with cardinality of 1 and

with the owl:allValuesFrom restriction, with the attributes datatype. The foreign key is mapped to a Functional

object property. This object property does have a domain and range, where the current table is the domain, and the

relationship of the foreign key as the range. The cardinality of 1 is also added and the inverse object property is also

created. This rule is one of the incompatibilities discussed in section 8.1 (Inheritance Modeling). For the second sub

case, the foreign key is mapped to a Functional Object Property with minimum cardinality of 1, the domain is the

current table and the range is the foreign key table. The Inverse property is also created which is not functional.

The third case is when a table has exactly two foreign keys. There are two sub cases: 1) it does not have any

additional attributes or 2) it does have additional attributes. For the first sub case, the table becomes an object

property with domain and range being the foreign key tables respectively. The second sub case would map the table

to an OWL Class and map the attributes to properties by following the respective rule.

If a table has more than two foreign keys, it is mapped to an OWL class and all the attributes are also mapped to

Datatype properties by following the respective rule.

Finally the CHECK constraint is considered. If check constraint has LIKE A%, a class “beginning_with_A” can

be created and that attribute can have a restriction that all values from that property have to be from that new class.

Another possibility is to have the enumerated check constraints which is then mapped to owl:oneOf

Li et al

Li et al’s approach is the first one to transform SQL-DDL to OWL with a combination of some formal notation and

English language. The rules generate ontological classes, properties, hierarchy, cardinality and instances.

Li et al presents a rule to learn an OWL Class which is if several relations are used to describe one entity by

sharing the same primary key, then they are all integrated into one ontological class. This rule is taken in account if

vertical partioning is considered in the relational schema. However, this same rule is used to learn inheritance. Thus,

applying this rule is subject to human interaction.

A rule is presented which creates a “has-part” and “is-part-of” Object Property, if a relation has a foreign key as

a primary key (Figure. 5). Nevertheless, this rule can also be used to represent hierarchy (Figure. 6). Therefore this

rule can have a double meaning. Furthermore, if a relation represents a n-ary relationship, then there is an object

property that connects every single relation with the other relations in the n-ary relationship. Taking this approach

can produce unnecessary relationships. Consider the example with a 3-ary relation from Figure. 11: Professor,

Course and Semester under the relation Offer shown in Figure. 21. This rule would create the relationships

Professor-Course, Professor-Semester, Course-Semester. Is the relationship Professor-Semester needed? Maybe the

other relationships were already created, and now it is repeated.

Figure. 21. Translating the Offer relation of Figure 11 following Li et al

OWL Cardinalities are learned from the constraints of attributes in the relations. If an attribute is a primary key

or foreign key then the minimum and maximum cardinality is 1. If an attribute is NOT NULL, the minimum

cardinality is 1. If an attribute is UNIQUE, the maximum cardinality is 1.

We believe that their shortcomings are due to lack of a formal system and thorough examination of examples

capturing a variety of modeling choices in various domains

Shen et al

Shen et al. (Shen et al. 2006) follows Li et al’s learning rules with slight variations. Shen et al does not include Li et

al’s rules of identifying a “has-part” relationship. Furthermore, it only includes a rule of identifying 3-ary

relationships. However it adds the use of OWL allValuesFrom restriction. This is possible if a foreign key is mapped

to an OWL Object Property, then the object property has an OWL allValuesFrom restriction, which refers to the

corresponding inclusion dependency. This may lead to a disparity discussed previously in Section 3.4.

Astrova et al 2007

Astrova et al. (Astrova et al. 2007) provides expository rules and examples to describe a system for automatic

transformation of a relational schema to an OWL ontology, which discovers more semantics than their previous

effort (Astrova et al. 2004). However, the output from this system does not conform to OWL DL restrictions.

Astrova et al’s approach is to map the relational model and the ontological model, which represent the relational

database and the ontology respectively.

A main difference between Astrova et al. and Li et al’s rules is the mapping of n-ary relationships. If the table is

composed of foreign keys to three other tables (3-ary relationship), this table is mapped to a class, instead of creating

object properties between the relationships, as shown in Figure. 22.

Figure. 22 Translating the Offer relation of Figure 11 following Astrova et al

Columns become OWL Datatype properties with a maximum cardinality of 1 unless it is a foreign key. The

property can also be defined as functional, which is the same as saying that the maximum cardinality is 1.

SQL data type is mapped to XML Schema Data types (XSD). If a CHECK constraint that verifies that an

integer is greater than 0 is used, then the XSD used is positiveInteger.

SQL constraints are mapped as follows. The attribute with a UNIQUE constraint is mapped to an inverse

functional property. This rule cannot be true if the target ontology is OWL DL because Inverse Functional

Properties can only be applied to Object Properties and not Data Type Properties (OWL Guide 2004). UNIQUE

describes a key constraint, and would lead to computational intractability even in realistic ontologies, and are not

supported by any implemented OWL reasoners (Lutz et al., 2004). If the ontology is OWL Full, then it can be

accepted.

Attributes with a NOT NULL constraint are mapped to a minimum cardinality of 1. PRIMARY KEY

constraints are mapped to OWL Inverse Functional Property (which encounters the same problem explained

previously) and a minimum cardinality 1.

If the foreign key is part of the primary key, it is also mapped to an object property accompanied with its

cardinality of 1. However, Li et al assigns this case by translating it into a “has-part” Object Property. This same

model can be used for inheritance, as explained in section 8.1. Finally, by taking vertical partioning in account, if the

foreign key is the primary key, then it is mapped to an inheritance.

If a foreign key is a reference to its same table, then it is considered a Symmetric Property. This rule cannot be

always applied, which has been explained in Section 8.2. If a column in a table is a foreign key to the same table

accompanied by ON DELETE CASCADE, the relationship consists of a whole part, where the part cannot exist

without the whole; therefore it is a transitive property.

CHECK constraints can be mapped several ways. If a column has a CHECK constraint to a single same value

for all instances, then it is mapped to OWL hasValue restriction. Even though this translation makes sense, this rule

appears to be ambiguous because it depends on a specific example, which is presented in the following example.

CREATE TABLE Project{

type VARCHAR CHECK (type =’Software’)}

<owl:hasValue rdf:datatype=”&xsd;string”>Software</owl:hasValue>

If the CHECK constraint is with enumeration, then it can be mapped to an enumerated data type owl:oneof.

Astrova et al. identifies more semantics from a relational databases schema as compared to Li et al. However,

the output from this system does not conform to OWL DL restrictions. Since the rules have not been formally

defined, the system is susceptible to ambiguities.

Lubyte et al

The work presented by Lubyte & Tessaris (2007) is an ontology extraction algorithm from a relational database and

proven to be an equivalence preserving schema transformation. Therefore, there is no information loss and the

extracted ontology can be used to access the data or evaluate queries correctly (Lubyte & Tessaris 2009). However

as mentioned in the introduction, Lubyte et al. considers the input as a logical schema (such as UML or ER) instead

of a physical schema. Furthermore, the output is not in a Semantic Web language, instead it is to DLR-DB ontology

language. This approach consists of two steps: first it classifies relations by analyzing the constraints and finally the

actual ontology is generated.

In the first step, relations can be classified as one of the following 4 cases: base relation is when the primary key

and foreign key of a relation do not share attributes; specific relation is when a primary key is among the foreign key

and either there is only one foreign key (the primary key and foreign key are the same attribute) or the relation is

referred to by another relation; relationship relation is when the primary key is composed by all the foreign keys.

Finally, all other relations are classified as ambiguous.

Once the relations have been classified, the second step of this approach starts: generating the ontology. The

algorithm presented in (Lubyte & Tessaris 2007) shows how the associations between each of the classified relations

are generated. The base and specific relations become the equivalent to ontological classes. Associations between

the newly formed classes are determined by an attribute that is not a primary key and is a foreign key in one relation

referencing another relation. Additional axioms are generated depending if the values are not nulls or unique.

Inheritance is identified when an ontological class, derived from a base or specific relation has an attribute that is a

primary key and a foreign key that references another relation. In addition, each exclusion dependency on the set of

specific relations produces a disjointness axiom and every covering constraint on the set of relations produces a

covering axiom. Each relationship relation is translated into an ontological class and the foreign key of the relation is

an association to another ontological class. For the case of ambiguous relations, the heuristic in their algorithm

“prefers” to generate an inheritance relationship, however the final user can determine the correct structure.

Even though (Lubyte & Tessaris 2009) also classifies relations based on the combination of primary keys and

foreign keys, it is not possible to consider if this approach is complete with respect to all the cases of Theorem 1

because it may miss cases due to the fact that its input is a logical schema. For example, this approach does not

consider the case if a relation has exactly two foreign keys and they are both primary keys. In this case, the relation

is mapped to an association itself between two classes, because this relation denotes a relationship between two

other relations. In a logical schema, this binary relation would not exist because it would be represented as a direct

link between the two relations. However, when translating the logical schema to the physical schema, this relation

appears. Due to the fact that this work does not target a Semantic Web language, it is not possible for it to be

completely evaluated with respect to all the other previously surveyed approaches.

6 A Consolidated System of Direct Mapping of a Relational Database to the Semantic Web

Building on the related work described in this paper, Tirmizi et al. (2008) defines a consolidated system for

automatic transformation of relational databases into OWL ontologies defined in first order logic (FOL) (Tirmizi et

al., 2008).

6.1 Assumptions

In order to translate a relational schema into an ontology, we make the following assumptions:

• The relational schema, in its most accurate form, is available in SQL DDL. As a good software engineering

practice, it is quite common to develop logical models for the relational schema. However, even after

deployment, a database undergoes modifications due to changing application requirements. Such modifications

are often not reflected on the logical models. Therefore, the physical model, easily expressed in SQL DDL,

becomes the most accurate source for the structure of the database.

• The relational schema is normalized, at least up to third normal form. While all databases might not be well

normalized, it is possible to automate the process of finding functional dependencies within data and to

algorithmically transform a relational schema to third normal form [Du et al. 1999, Wang et al. 2000].

6.2 Predicates and Functions

We have defined a number of predicates and functions to aid the process of defining transformation rules in first

order logic.

There are two sets of predicates in our system. RDB predicates test whether an argument (or a set of arguments)

matches a construct in the domain of relational databases. Such predicates are listed below and the examples are

with respect to Figure. 11:

Rel(r)

r is a relation (or table) identified by CREATE TABLE statement

Attr(x,r)

x is an attribute in relation r

NN(x,r)

x is an attribute (or a set of attributes) in relation r with NOT NULL

constraint(s); for example: NN(NAME,PERSON) holds

Unq(x,r)

x is an attribute (or a set of attributes) in relation r with UNIQUE constraint; for

example Unq({NAME},DEPT) holds

Chk(x,r)

x is an attribute in relation r with enumerated list (CHECK IN) constraint; for

example Chk(SESSION,SEMESTER) holds

PK(x,r)

x is the (single or composite) primary key of relation r identified using the

PRIMARY KEY constraint; for example: PK({OFFER,SID},STUDY) holds;

also: PK(x,r)

→

Unq(x,r)

∧

NN(x,r)

FK(x,r,y,s)

x is a (single or composite) foreign key in relation r and references y in relation

s; for example: FK({ID},STUDENT,{ID},PERSON) is satisfied

NonFK(x,r)

x is an attribute in relation r that does not participate in any foreign key; for

example: NonFK(NAME,DEPT) holds, NonFK(SID,REG) does not hold

Ontology predicates test whether an argument (or a set of arguments) matches a construct that can be represented in

an OWL ontology. These predicates are:

Class(m)

m is a class

ObjP(p,d,r)

p is an object property with domain d and range r

DTP(p,d,r)

p is an data type property with domain d and range r

Inv(p,q)

when p and q are object properties, p is an inverse of q

FP(p)

p is a functional property

IFP(p)

p is an inverse functional property, i.e. inverse of a functional property

Crd(p,m,v)

the (maximum and minimum) cardinality of property p for class m is v

MinC(p,m,v)

the minimum cardinality of property p for class m is v

MaxC(p,m,v)

the maximum cardinality of property p for class m is v

Subclass(m,n)

m is a subclass of class n

The constructs represented by ontology predicates are described as they appear in the rules mentioned in the

upcoming sections of this paper.

We have also defined the following functions:

fkey(x,r,s)

takes a set of attributes x, relations r and s, and returns the foreign key defined

on attributes x in r referencing x; undefined if there is no such foreign key

type(x)

maps an attribute x to its suitable OWL recommended data type (we discuss

data types in more detail in a later section)

list(x)

maps an attribute x to the list of allowed values for it; this function is applicable

only to attributes that have a CHECK IN constraint defined on them, i.e.

Chk(x,r) is true

In addition to the predicates and functions listed above, we describe the concept of a binary relation, written BinRel,

as a relation that only contains two (single or composite) foreign keys that reference other relations. Such tables are

used to resolve many-to-many relationships between entities. Using RDB predicates, we define BinRel as follows:

Rule Set 1:

BinRel(r,s,t)

←

Rel(r)

∧

FK(xtr,r,_,t)

∧

FK(xsr,r,_,s)

∧

xtr

≠

xsr

∧

Attr(y,r)

∧

NonFK(y,r)

∧

FK(z,r,_,u)

∧

fkey(z,r,u)

∈

{fkey(xsr,r,s),fkey(xtr,r,t)}

This rule states that a binary relation r between two relations s and t exists if r is a relation that has foreign keys to s

and t, and r has no other foreign keys or attributes (each attribute in the relation belongs to one of the two foreign

keys). Note that there is no condition that requires s and t to be different, allowing binary relations that have their

domain equal to their range.

6.3 Transformation Rules and Examples

In this section we present rules and examples for transformation of a relational database to an OWL ontology.

6.3.1 Producing Unique Identifiers (URIs) and Label

Before we discuss the transformation rules, it is important to understand how we can produce identifiers and names

for classes and properties that form the ontology.

The concept of globally unique identifiers is fundamental to OWL ontologies. Therefore, each class or property

in the ontology must have a unique identifier, or URI. While it is possible to use the names from the relational

schema to label the concepts in the ontology, it is necessary to resolve any duplications, either by producing URIs

based on fully qualified names of schema elements, or by producing them randomly. In addition, for human

readability, RDFS labels should be produced for each ontology element containing names of corresponding

relational schema elements.

For the purposes of this paper, we have not used fully qualified names in our examples. When needed, we

append a name with an integer to make it unique, e.g. ID1, ID2 etc.

6.3.2 Transformation of Data Types

Transformations from relational schemas to ontologies require preserving data type information along with the other

semantic information. OWL (and RDF) specifications recommend the use of a subset of XML Schema types (Biron

et al. (Ed.) 2004) in Semantic Web ontologies (Dean & Schreiber (Ed.) 2004, Hayes (Ed.) 2004).

In Table 6 we present a list of commonly used SQL data types along with their corresponding XML Schema

types. During transformation of data type properties, the SQL data types are transformed into the corresponding

XML Schema types.

Table 6 Some common SQL data types and corresponding XML Schema types recommended for OWL

SQL Data Type

XML Schema Type

SQL Data Type

XML Schema Type

INTEGER

xsd:integer

VARCHAR

xsd:string

DECIMAL

xsd:decimal

CHAR

xsd:string

FLOAT

xsd:float

DATE

xsd:date

REAL

xsd:double

TIME

xsd:time

BOOLEAN

xsd:boolean

TIMESTAMP

xsd:dateTime

6.3.3 Identifying Classes

According to the OWL Language Guide [Smith et al. (Ed.) 2004], “the most basic concepts in a domain should

correspond to classes …” Therefore we would expect basic entities in the data model to translate into classes in an

OWL ontology.

Given the definition of a binary relation, it is quite straightforward to identify OWL classes from a relational

schema. Any relation that is not a binary relation can be mapped to a class in an OWL ontology, as stated in the rule

below.

Rule Set 2:

Class(r)

←

Rel(r)

∧

BinRel(r,_,_)

In combination with Rule Set 1, a binary relation has exactly two foreign keys and no other attributes, this very

simple rule covers a number of cases for identifying classes:

• All tables that do not have foreign keys should be transformed to classes. In our example schema, the PERSON

table does not have a foreign key, so Rel(PERSON) is true and BinRel(PERSON,_,_) is false. Therefore, we

conclude Class(PERSON), i.e. Person should be mapped to a class. The same reasoning holds for the Dept and

Semester tables.

• All tables that have one foreign key should also be transformed to classes. No such tables can satisfy the BinRel

predicate. Using the same rule we conclude that STUDENT, PROFESSOR and COURSE should be mapped to

classes.

• The tables with more than two foreign keys should be transformed to classes as well. Such tables may represent

an entity (when they have attributes not appearing in a foreign key) or an N-ary relationship between entities

(where all attributes appear in foreign keys). Fortunately, in OWL, both these cases can be modeled the same

way, i.e. by translating the entity or the N-ary relationship into a class (Noy 2006). From our running example,

OFFER represents an N-ary relationship, and can be modeled as a class using the given rule.

• For tables containing exactly two foreign keys, presence of independent attributes qualifies them to be treated as

entities instead of binary relations, and translated to classes in OWL ontologies. The table STUDY, with an

independent attribute Grade, is an example of this case, and is translated to an OWL class.

So, as a result of applying Rule Set 2, we have successfully identified the classes (see Table 7) from our relational

schema.

Table 7 Classes identified from the relational schema by applying Rule Set 2

Classes

Class(PERSON)

Class(STUDENT)

Class(PROFESSOR)

Class(DEPT)

Class(SEMESTER)

Class(COURSE)

Class(STUDY)

Class(OFFER)

6.3.4 Identifying Object Properties

A property is a relationship that lets us assert general facts about the members of classes. There are two major types

of properties, object properties and data type properties (Smith et al. (Ed.) 2004). Properties have a domain, which

defines the subject, and a range, which defines the object (Dean & Schreiber (Ed.) 2004). We will describe data type

properties in the next section.

An object property is a relation between instances of two classes in a particular direction. The direction is from

the domain of the object property to the range of the object property. In practice, it is often useful to define the

direction of an object property in both directions, creating a pair of object properties that are inverses of each other.

OWL provides us the means to mark properties as inverses of each other. In our work, when we translate something

to an object property, say ObjP(r,s,t), it implicitly means we have created an inverse of that property, written r’ in

our notation, such that, ObjP(r’,t,s).

There are two ways of extracting OWL object properties from a relational schema. One of the ways is through

identification of binary relations, which represent many-to-many relationships. The following rule identifies an

object property using a binary relation.

Rule Set 3:

ObjP(r,s,t)

←

BinRel(r,s,t)

∧

Rel(s)

∧

Rel(t)

∧

BinRel(s,_,_)

∧

¬BinRel(t,_,_)

This rule states that a binary relation r between two relations s and t, neither being a binary relation, can be

translated into an OWL object property with domain s and range t. Notice that the rule implies Class(s) and Class(t)

hold true, so the domain and range of the object property can be expressed in terms of corresponding OWL classes.

From our university database schema, only the Reg table fits the condition. Reg is a binary relation between

STUDENT and SEMESTER entities, which are not binary relations. Therefore, ObjP(REG,STUDENT,SEMESTER)

holds, and since we can create inverses, ObjP(REG’,SEMESTER,STUDENT) and Inv(REG,REG’) also hold true.

Foreign key references between tables that are not binary relations represent one-to-one and one-to-many

relationships between entities. A pair of object properties that are inverses of each other and have a maximum

cardinality of 1 can represent one-to-one relationships. Also, one-to-many relationships can be mapped to an object

property with maximum cardinality of 1, and an inverse of that object property with no maximum cardinality

restrictions.

In OWL, a (data type or object) property with minimum cardinality of 0 and maximum cardinality of 1 is called

a functional property, represented as FP in our rules. If an object property is functional, then its inverse is an inverse

functional property, represented as IFP. In addition to specifying cardinality restrictions on properties in general, we

can also specify such restrictions when a property is applied over a particular domain. In our rules, we use ontology

predicates Crd, MinC and MaxC to specify these restrictions. The examples following the rules explain the use of

these predicates.

The following rule set identifies object properties and their characteristics using foreign key references (not

involving binary relations, covered in Rule Set 3) with various combinations of uniqueness and null restrictions. To

simplify the rules, we first define a predicate NonBinFK – representing foreign keys not in or referencing binary

relations – and then express the rules in terms of this predicate.

Rule Set 4:

NonBinFK(x,s,y,t)

≡

FK(x,s,y,t)

∧

Rel(s)

∧

Rel(t)

∧

BinRel(s,_,_)

∧

¬BinRel(t,_,_)

ObjP(x,s,t), FP(x), MinC(x’,t,0)

←

NonBinFK(x,s,y,t)

∧

NN(x)

∧

Unq(x)

ObP(x,s,t), FP(x), Crd(x,s,1), MinC(x’,t,0)

←

NonBinFK(x,s,y,t)

∧

NN(x)

∧

Unq(x)

ObjP(x,s,t), FP(x), FP(x’)

←

NonBinFK(x,s,y,t)

∧

NN(x)

∧

Unq(x)

ObjP(x,s,t), FP(x), Crd(x,s,1), FP(x’)

←

NonBinFK(x,s,y,t)

∧

NN(x)

∧

Unq(x)

∧

PK(x,s)

Each rule in Rule Set 4 states that a foreign key represents an object property from the entity containing the

foreign key (domain) to the referenced entity (range). Since a foreign key can reference at most one record

(instance) of the range, the object property is functional. This requieres that inverse of that object property is inverse

functional. One example from our university schema is the foreign key from Study to Student which gives us:

ObjP(RNO,STUDY,STUDENT), FP(RNO), Inv(RNO’,RNO), ObjP(RNO’,STUDENT,STUDY), IFP(RNO’).

Rules 4a and 4b represent variations of one-to-many relationships.

• We can apply a stronger restriction on cardinality of the object property if the foreign key is constrained as NOT

NULL. Without this constraint, (rule 4a), the minimum cardinality is 0, which is already covered by functional

property predicate. With this constraint (rule 4b), we can set the maximum and minimum cardinality to 1.

• These rules imply, a minimum cardinality restriction of 0 on the inverse property. An instance in the range can be

referenced by any number of instances in the domain, thus, there is no maximum cardinality restriction on the

inverse property.

The other two rules, 4c and 4d, represent one-to-one relationships, modeled by applying a uniqueness constraint

on the foreign key. It means that an instance in the range can relate to at most one object in the domain, making the

inverse property functional too. This also means that the original object property is inverse functional as well.

The difference between rules 4c and 4d is the NOT NULL constraint. Like one-to-many relationships

mentioned above, if NOT NULL is present, it gives us a stronger cardinality restriction on the object property

represented by the foreign key.

Notice that none of the rules allow the foreign key to be the same as the primary key of the domain relation.

Rule 4d restricts this by providing an extra condition, whereas the negation of uniqueness or NOT NULL constraints

in rules 4a-c, by definition, implies this condition.

We do not create an object property if the foreign is being equal to the primary key. Instead, we consider it as a

pattern for inheritance and propose a rule for inheritance mapping. On the other hand, we are unable to capture the

inheritance between STUDENT and PERSON, since it is not a unique inheritance pattern, and we transform this

relationship into an object property.

A list showing some object properties and their characteristics obtained from the sample relational schema by

applying Rule Sets 3 and 4 are presented in Table 8.

Table 8 Some object properties identified from the relational schema by applying Rule Sets 3 and 4. An object property P implies the

existence of an inverse property P’. Due to lack of space, we explicitly specify the inverse property only for the first property.

Object Properties

ObjP(REG,STUDENT,SEMESTER), ObjP(REG’,SEMESTER,STUDENT), Inv(REG,REG’)

ObjP(ID1,STUDENT,PERSON), FP(ID1), FP(ID1’), Crd(ID1,STUDENT,1)

ObjP(CODE,COURSE,DEPT), FP(CODE), IFP(CODE’), Crd(CODE,COURSE,1),

MinC(CODE’,DEPT,0)

ObjP(CNO,OFFER,COURSE), FP(CNO), IFP(CNO’), MinC(CNO’,COURSE,0)

ObjP(CONO,OFFER,OFFER), FP(CONO), IFP(CONO’), MinC(CONO’,OFFER,0)

ObjP(RNO,STUDY,STUDENT), FP(RNO), IFP(RNO’), MinC(RNO’,STUDENT,0)

6.3.5 Identifying Data Type Properties

Data type properties are relationship between instances of classes with RDF literals and XML Schema data types.

Like object properties, data type properties can also be functional, and can be specified with cardinality restrictions.

However, unlike object properties, OWL DL does not allow them or their inverses to be inverse functional.

Attributes of relations in a database schema can be mapped to data type properties in the corresponding OWL

ontology. Rule Set 5 identifies data type properties in a relational schema.

Rule Set 5:

DTP(x,r,type(x)), FP(x)

←

NonFK(x,r)

DTP(x,r,type(x)), FP(x), Crd(x,r,1)

←

NonFK(x,r)

∧

NN(x,r)

DTP(x,r,type(x)

∩

list(x)), FP(x)

←

NonFK(x,r)

∧

Chk(x,r)

Rule Set 5 says that attributes that do not contribute towards foreign keys can be mapped to data type properties with

range equal to their mapped OWL type. Since each record can have at most one value per attribute, each data type

property can be marked as a functional property. When an attribute has a NOT NULL constraint, rule 5b allows us

to put an additional cardinality restriction on the property. Rule 5c allows us to infer stronger range restrictions on

attributes with enumerated list (CHECK IN) constraints.

In some cases, it may be possible to apply more than one rule to an attribute. In such cases, all possible rules

should be applied to extract more semantics out of the relational schema. Some data type properties extracted from

our sample university database schema are listed in Table 9.

Table 9. Some data type properties identified from the relational schema by applying Rule Set 5.

Data Type Properties

DTP(ID1,PERSON,xsd:integer), FP(ID1), Crd(ID1,PERSON,1)

DTP(NAME1,PERSON,xsd:string), FP(NAME1), Crd(NAME1,PERSON,1)

DTP(ROLLNO,STUDENT,xsd:integer), FP(ROLLNO), Crd(ROLLNO,STUDENT,1)

DTP(DEGREE,STUDENT,xsd:string), FP(DEGREE)

DTP(SNO,SEMESTER,xsd:integer), FP(SNO), Crd(SNO,SEMESTER,1)

DTP(YEAR,SEMESTER,xsd:date), FP(YEAR), Crd(YEAR,SEMESTER,1)

DTP(SESSION,SEMESTER,xsd:string

∩

{SPRING,SUMMER,FALL}), FP(SESSION)

DTP(GRADE,STUDY,xsd:string), FP(GRADE)

6.3.6 Identifying Inheritance

Inheritance allows us to form new classes using already defined classes. It relates a more specific class to a more

general one using subclass relationships (Smith et al. (Ed.) 2004).

Inheritance relationships between entities in a relational schema can be modeled in a variety of ways. As

discussed earlier, since most of these models are not limited to expressing inheritance alone, sometimes it is hard to

identify subclass relationships in a relational schema.

The following rule describes a common case that can be used for inheritance modeling in a normalized database

design.

Rule Set 6:

Subclass(r,s)

←

Rel(r)

∧

Rel(s)

∧

PK(x,r)

∧

FK(x,r,_,s)

This rule states that an entity represented by a relation r is a subclass of an entity represented by relation s, if the

primary key of r is a foreign key to s. In our sample university schema, we can clearly identify that

Subclass(PROFESSOR,PERSON) holds.

6.4 Implementation

As a result of applying our rules on the given relational schema, we derive the ontology shown in Table 10.

A comparison of the ontologies produced by the domain expert with the one produced automatically using our

rules (shown in Figure 11) shows a number of differences. For example, the rules of the consolidated system are

unable to capture the subclass relationship of STUDENT with PERSON, or the symmetric and transitive

characteristics of the co-location relationship among OFFER instances. These examples clearly show that automatic

translation of a relational schema to an ontology has some limitations, and that these limitations are inline with the

disparities we have identified earlier.

As a result of presenting this consolidated system in FOL, these rules have been implemented in the JESS rule

language; therefore the FOL expression is stratified.

For sake of this example, we have implemented the rules in JESS. To obtain the SQL-DDL, we analyze only the

data dictionary of MySQL databases. The ontology is then generated using the Jena framework.

Table 10 Parts of an ontology corresponding to the University Database, produced automatically by applying the rules on the given SQL

DDL. The output format is OWL Abstract Syntax.

Automatically Produced Ontology

Ontology(<urn:sql2owl>

ObjectProperty(<REG> domain(<STUDENT>) range(<SEMESTER>))

ObjectProperty(<REG_I> inverseOf(<REG>))

ObjectProperty(<COURSE.DEPTCODE> Functional domain(<COURSE>) range(<DEPT>))

ObjectProperty(<COURSE.DEPTCODE_I> InverseFunctional inverseOf(<COURSE.DEPTCODE>))

ObjectProperty(<OFFER.CONO> Functional domain(<OFFER>) range(<OFFER>))

ObjectProperty(<OFFER.CONO_I> InverseFunctional inverseOf(<OFFER.CONO>))

ObjectProperty(<STUDENT.ID> Functional InverseFunctional

domain(<STUDENT>) range(<PERSON>))

ObjectProperty(<STUDENT.ID_I> Functional InverseFunctional

inverseOf(<STUDENT.ID>))

...

DatatypeProperty(<COURSE.CNO> Functional domain(<COURSE>) range(xsd:integer))

DatatypeProperty(<SEMESTER.YEAR> Functional domain(<SEMESTER>) range(xsd:date))

DatatypeProperty(<SEMESTER.SESSION> Functional domain(<SEMESTER>)

range(oneOf("SPRING" "SUMMER" "FALL")) range(xsd:string))

...

Class(<PERSON> partial ...)

Class(<PROFESSOR> partial <PERSON> ...)

Class(<STUDENT> partial restriction(<STUDENT.ID> cardinality(1))

restriction(<STUDY.RNO_I> minCardinality(0)) ...)

Class(<COURSE> partial restriction(<COURSE.DEPTCODE> cardinality(1))

restriction(<COURSE.CNO> cardinality(1)) ...)

...

)

7 An Evaluation of Direct Mapping Approaches

We believe that the following are fundamental ingredients for a transformation system aiming for the migration of

relational data to the Semantic Web:

• The rules for a transformation system should be specified formally to avoid any syntactic or semantic

ambiguities in the specifications. Rules defined in formal systems like first order logic can be implemented

in languages such as Prolog, Datalog or JESS.

• When developing rules for automatic translation of a relational database to an ontology, special care should

be taken to avoid the influence of domain specific examples, thus an automated transformation system

should try to capture the semantics offered by the schema definition language alone. Sometimes, the

influence of examples from a particular domain can result in incorrect rules that are based on enumerating

examples instead of focusing on formal power.

• The transformation system should infer semantic properties by considering all the possible combination of

foreign and primary keys, hence satisfying all the cases of Theorem 1.

We have surveyed existing approaches of direct mapping relational databases to RDFS and OWL ontologies. In

Table 11 we show the approaches that have formal rules and can be mapped to the different cases of Theorem 1. If

an approach does not present formal rules, they do not appear in Table 11. Stojanovic et al and Astrova et al were

the first approaches to consider extracting the domain semantics from the SQL-DDL and transforming them into

RDFS. The majority of their rules overlap, meaning that given the same SQL-DDL input, it would output the same

RDFS. The difference between these two approaches on how they each identify inheritance. Stojanovic et al

presents the transformation rules in a formal manner. As shown in Table 11, Stojanovic et al’s rules are not

complete with respect to Theorem 1 in reference when a primary key is a subset of a foreign key. On the other hand,

Astrova et al presents the transformations through examples without any formality, which may lead to ambiguity;

for that reason, it cannot be determined if Astrova et al’s approach is complete with respect to Theorem 1 and is not

shown in Table 11.

Another series of surveyed direct mapping approaches transform the SQL-DDL to OWL ontologies. Li et al

does use a combination of formal notation and English language and has enumerated the rules, thus it is easier to

identify which rules apply to specific cases. Table 11 shows that Li et al’s approach has rules to satisfy each case of

Theorem 1. Furthermore, Shen et al extends on Li et al’s approach, which is also presented in a combination of

formal notation and English language. However, Shen et al does not present rules to satisfy case 7 and 10, as seen in

Table 11, hence this approach is not complete with respect to Theorem 1. On the other hand, Bucella et al and

Astrova et al only present examples of transformation without any formalism at all, therefore it is ambiguous and

there is no formal way to determine if they are complete with respect to Theorem 1, thus not showing up in Table

11. Building on the surveyed work, we also present a consolidated direct mapping system for automatic

transformation of normalized relational schemas represented in SQL DDL into OWL ontologies, which builds upon

the surveyed approaches. We have defined the entire set of transformation rules in first order logic eliminating the

possibility of syntactic and semantic ambiguities and complete with respect to Theorem 1. The use of first order

logic also allows for easy implementation of the system in rule-based languages. In defining the consolidated rules,

we have ensured compatibility with description logics based OWL sublanguage (OWL DL), which is essential to

assuring decidability for reasoning.

Table 11 Rules that apply to each of the Closure set item

Relation with

Stojanovic et al.1

Li et al.2

Shen et al. 3

Consolidated

System4

1) PK

Rule C1, C2, C3

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 5

2) C-PK

Rule C1, C2, C3,

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 5

3) S-FK

Rule C1, C2, C3,

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 4, 5

4) N-FK

Rule C1, C2, C3,

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 4, 5

5) PK + S-FK

a) PK = S-FK

Rule C1, C2, C3,

A3, I1

Rule 2, 7, 8, 9, 10, 11

Rule C-1, C-3, P-1.2, R-

2, R-3, R-4

Rule 2, 5, 6

b) PK ∩ S-FK = 0

Rule C1, C2, C3,

Rule 2, 3, 7, 9, 10, 11

Rule C-1, P-1.1, P-1.2,

R-2, R-3, R-4

Rule 2, 4, 5

6) PK + N-FK | N = 2

a) PK ∩ N-FK = 0

Rule C1, C2, C3,

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 4, 5

b) PK

⊂

N-FK

Rule C1, C2, C3,

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 4, 5, 6

7) PK + N-FK | N > 2

a) PK ∩ N-FK = 0

Rule C1, C2, C3,

Rule 2, 7, 9, 10, 11

Rule 2, 4, 5

b) PK

⊂

N-FK

Rule 2, 7, 9, 10, 11

Rule 2, 4, 5, 6

8) C-PK + S-FK

a) C-PK ∩ S-FK = 0

Rule C1, C2, C3,

Rule 2, 3, 7, 9, 10, 11

Rule C-1, P-1.1, P-1.2,

R-2, R-3, R-4

Rule 2, 5, 6

b) S-FK

⊂

C-PK:

Rule 2, 4, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 4, 5

9) C-PK + N-FK | N = 2

a) C-PK ∩ N-FK = 0

Rule C1, C2, C3,

Rule 2, 3, 7, 9, 10, 11

Rule C-1, P-1.1, P-1.2,

R-2, R-3, R-4

Rule 2, 4, 5

b) N-FK

⊆

C-PK

Rule C1, C2, C3,

Rule 2, 5, 6, 7, 9, 10,

Rule C-1, P-2, P-1.2, R-

2, R-3, R-4

Rule 2, 3, 4, 5

c) C-PK ∩ N-FK

≠

C-PK – N-FK

≠

N-FK – C-PK

≠

Rule 2, 7, 9, 10, 11

Rule C-1, P-1.2, R-2, R-

3, R-4

Rule 2, 4, 5

10) C-PK + N-FK | N > 2

a) C-PK ∩ N-FK = 0

Rule C1, C2, C3,

Rule 2, 3, 7, 9, 10, 11

Rule 2, 4, 5

b) N-FK

⊆

C-PK

Rule C1, C2, C3,

Rule 2, 5, 6, 7, 9, 10,

Rule 2, 3, 4, 5

c) C-PK ∩ N-FK

≠

C-PK – N-FK

≠

N-FK – C-PK

≠

Rule 2, 7, 9, 10, 11

Rule 2, 4, 5

1 The rule numbers correspond to the same rule numbers of (Stojanovic et al 2002)

2 The rule numbers correspond to the same rule numbers of (Li et al 2005)

3 The rule numbers correspond to the same rule numbers of [Shen et al 2006]

4 The rule numbers correspond to the same rule numbers of Section 6

In Table 5, we present the output of each direct mapping transformation system given the SQL-DDL input. As we

can see, there is a major overlap in the output of all direct mapping systems as presented in Section 3.1. Stojanovic

et al and Astrova el at (1) only output RDFS ontologies. Buccella et al and Shen et al are the only approaches that

considers using owl:AllValuesFrom, however they are used differently. This leads us to acknowledge that

identifying the use of owl:allValuesFrom and owl:someValuesFrom directly from a relational schema is still an open

problem, as expressed in Section 3.4. In addition there is a difference in the usage of the cardinalities that are

identified. Some approaches present the use of owl:maxCardinality and owl:minCardinality while others use only

owl:cardinality.

In addition in Table 12, we also present the specific RDFS and OWL constructs that are created through each

direct mapping approach. We can realize that Stojanovic et al and Astrova et al (1) output the same RDFS

constructs. Li et al and Shen et al also produce the same OWL outputs, for the exception that Shen et al is able to

use owl:allValuesFrom. Astrova et al (2) is the only approach able to output owl:SymmetricProperty,

owl:TransistiveProperty and owl:hasValue. However, the rules to output these specific constructs are ambiguous and

therefore not recommendable. Hence, we acknowledge that identifying symmetric and transitive properties from a

relational schema is also an open problem.

Table 12 RDFS/OWL constructs that are outputted by each direct mapping approach

Stojanovic

et al.

Astrova

et al. (1)

Buccella

et al.

Li et

al.

Shen et

al.

Astrova et

al. (2)

Consolidated

System

owl:class

rdfs:class

owl:allValuesFrom

owl:someValuesFrom

owl:hasValue

Cardinality constraint

owl:maxCardinality

owl:minCardinality

owl:cardinality

owl:intersectionOf

owl:unionOf

owl:complementOf

rdfs:subClassOf

owl:equivalentClass

owl:disjointWith

owl:oneOf

owl:DatatypeProperty

owl:ObjectProperty

RDFS Properties

rdfs:subPropertyOf

rdfs:domain

rdfs:range

owl:equivalentProperty

owl:inverseOf

owl:FunctionalProperty

owl:InverseFunctionalProperty

owl:TransitiveProperty

owl:SymmetricProperty

8 Discussion and Future Directions

The Semantic Web is showing a constant growth pattern. This has been attributed to the Linked Open Data project5

created by the W3C’s Semantic Web Education and Outreach Interest Group6. The main goal of this project is to

publish data sets as RDF on the web and create links between them. For common web users, the Semantic Web does

not yet have great significance; therefore it is rare that integrating a relational database with the Semantic Web

accrues obvious benefit to the owners of website. It is expected that the Semantic Web will become mainstream and

more accessible to common web users; therefore the need to integrate databases to the Semantic Web will increase.

The W3C foresees this issue and has already organized the Relational Database to RDF (RDB2RDF) working

group7 in order to recommend a standardized mapping language between relational databases to RDF. Given this

situation, it is important to understand the relationship between relational databases and the Semantic Web.

In this work we have surveyed different approaches that automatically expose relational databases to the

Semantic Web. This process, called direct mapping, automatically synthesizes an OWL putative ontology from the

relational database schema and exposes the relational data as RDF instances of the ontology. We introduce the

concept of the “putative ontology”, which is a syntactic transformation from the basic implicit domain semantics of

a relational database to an ontology, which is obtained by applying direct mapping rules. We present Theorem 1,

which states that a direct mapping transformation must consider all the possible primary key and foreign key

combinations. We compare all the approaches and identify which ones satisfy Theorem 1. Finally, we present a

consolidated direct mapping system.

Although there are many proposals for direct-mapping SQL schema to ontologies, they all depend on the

richness of the SQL schema with respect to its encoding of domain semantics. Any skepticism of the quality of the

putative ontology is definitely well founded with respect to many legacy databases. Foreign key constraints were

introduced as part of the SQL93 standard. Clearly, all direct mapping systems depend on explicit expression of such

constraints to achieve accurate results. However, starting a number of years ago, databases began to be designed

using sophisticated data modeling tools that capture considerable domain semantics such as foreign keys and check

constraints. If the schema has been developed with such tools and in a normalized manner, the schema can portray

enough semantics to create a semantically rich ontology. If this is the case, then direct mapping is the easiest way to

create RDF from relational database data, by virtue of being completely automated.

On the other hand, the amount of domain semantics captured in SQL-DDL models is highly variable and

occasionally does not contain rich structure. If these types of models are used in a direct mapping system, the

putative ontology that is derived will lack rich semantics. Even though the putative ontology may not have the same

expressivity as a domain ontology, it is not incorrect, hence the name putative. The deficiencies of a putative

ontology become evident when an ontology produced by the system is compared to an ontology produced by a

domain expert.

We expose patterns between relational databases and ontologies, and the difficulties when it comes to

identifying richer semantics in a relational database schema. Firstly we show how normalization can affect the

putative ontology and demonstrate that inheritance in a relational database schema can be modeled in different

ways; therefore a single rule to identify inheritance is not possible. In addition, symmetric and transitive

relationships may be implicit in the domain but are not explicit in the relational schema. Furthermore, identifying

specific value constraints such as “for all” or “there exist” may imply identifying new properties or reusing existing

properties. Continuously, it may also have implications in the open and closed world assumptions. Additionally,

SQL’s check constraint can represent different types of semantics such as enumeration, inheritance or business type

rules. Finally, we recognize that analyzing triggers to extract semantics applicable to ontologies is still an open

problem. Overall, there is also a need to evaluate the quality of a putative ontology and a domain ontology.

We assert that the scope and frequency of success of direct mapping transformation systems is very unlikely to

achieve the scope that database-ontology mapping methods can achieve, however we consider direct mapping the

first step in a two step process. Sequeda et al. presents an application of direct mapping a relational database and

manually mapping the putative ontology to a domain ontology, in order to overcome database heterogeneity

(Sequeda et al., 2009). Direct mapping converts the relational database to ontology mapping problem to an ontology

to ontology mapping problem. Therefore, we consider that the next step in this area is to bridge the gap between

putative and domain ontologies.

In order to bridge this gap, we propose to refine the putative ontology through a bootstrapping architecture

(Sequeda et al., 2008). This refinement process consists of initially matching the putative ontology with an existing

5 http://esw.w3.org/SweoIG/TaskForces/CommunityProjects/LinkingOpenData

6 http://www.w3.org/2001/sw/sweo/

7 http://www.w3.org/2009/08/rdb2rdf-charter.html

domain ontology. Due to the fact that the putative ontology lacks semantics and may have some ambiguity with

respect to the domain ontology, we treat the alignment between the putative and domain ontology as an alignment

hypothesis. Querying and analyzing the relational database data can test the alignment hypothesis. After the

hypothesis has been tested, the new knowledge is used to refine the putative ontology. For example, a putative

ontology may not represent inheritance, but its existence could be determined by the mapping to a domain ontology

and the analysis of the content of the relational database. In this process, the putative ontology would be

automatically matched to other domain ontologies. This would be an iteratively process with different domain

ontologies, until the putative ontology has been refined and it completely represents the domain semantics of the

relational database.

Acknowledgments

This research is funded in part by the National Science Foundation grant IIS-0531767 and DBI-0851052 and the

Spanish fundamental R&D project myBigData.

References

Aho, A., Lam, M., Sethi, R., and Ullman, J. 2006. Compilers: Principles, Techniques and Tools. Addison-Wesley.

An, Y., Borgida, A., & Mylopoulos, J. 2005. Inferring Complex Semantic Mappings between Relational Tables and Ontologies from Simple

Correspondences. In Proceedings of On The Move to Meaninful Internet Systems (OTM’05): CoopIS, DOA, and ODBASE, Agia Napa,

Cyprus. Springer Verlag.

An, Y., Mylopoulos, J., & Borgida, A. 2006. Building Semantic Mappings from Databases to Ontologies. In Proceedings of the 21th National

Conference on Artificial Intelligence (AAAI-06) Nectar Track.

Arens, A., Chee, C.Y., Hsu, C., & Knoblock, C. 1993. Retrieving and Integrating Data from Multiple Information Sources. International Journal

on Intelligent and Cooperative Information Systems 2(2), 127-158.

Astrova, I. 2004. Reverse Engineering of Relational Databases to Ontologies. In Proceedings of 1st European Semantic Web Symposium (ESWS

2004). Crete, Greece. Springer Berlin

Astrova, I., Korda, N., & Kalja, A. 2007. Rule-Based Transformation of SQL Relational Databases to OWL Ontologies. In Proceedings of 2nd

International Conference on Metadata & Semantic Research.

Barrasa, J., & Gomez-Perez, A. 2006. Upgrading relational legacy data to the semantic web. In Proceedings of 15th International World Wide

Web Conference. ACM. 1069-1070

Barrasa, J., Corcho, O., & Gomez-Perez, A. 2004. R2O, an Extensible and Semantically Based Database-to-Ontology Mapping Language. In

Proceedings of 2nd Workshop on Semantic Web and Databases.

Bizer, C., & Seaborne, A. 2004. D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs. In Proceedings of 3rd International Semantic

Web Conference. Hiroshima, Japan

Buccella, A., Penabad, M. R., Rodríguez, F. J., Fariña, A., Cechich, A. 2004. From Relational Databases to OWL Ontologies. In Proceedings of

the 6th Russian Conference on Digital Libraries. Pushchino, Rusia.

Ceri, S., Widom, J. 1993. Managing Semantic Heterogeneity with Production Rules and Persistent Queues. In Proceedings of 19th Very Large

Databases Conference. Dublin, Ireland

Chen, H., Wang, Y., Wang, H., Mao, Y., Tang, J., Zhou, C., et al. 2006. Towards a Semantic Web of Relational Databases: a Practical Semantic

Toolkit and an In-Use Case from Traditional Chinese Medicine. In Proceedings of 5th International Semantic Web Conference. Athens, GA,

USA

Collet, C., Huhns, M.N., Hsu, C., & Shen, W. 1991. Resource Integration Using a Large Knowledge Base in Carnot. Computer 12(24), 55-62.

de Laborda, C. P., & Conrad, S. 2005. Relational.OWL: a data and schema representation format based on OWL. In Proceedings of 2nd Asia-

Pacific Conference on Conceptual Modelling. Newcastle, Australia.

de Laborda, C. P., & Conrad, S. 2006. Database to Semantic Web Mapping using RDF Query Languages. In Proceedings of 25th International

Conference on Conceptual Mapping. Tucson, AZ, USA

Drummond, N., & Shearer, R. 2006. The Open World Assumption. eSI Workshop: The Closed World of Databases meets the Open World of the

Semantic Web.

Du, H., & Wery, L. 1999. Micro: A normalization tool for relational database engineers. Journal of Network and Computer Applications 22(4),

215-232 .

Duo, D., Pan, J., Qin, H., & LePendu, P. 2006. Towards Populating and Querying the Semantic Web. In Proceedings of 2nd International

workshop on Scalable Semantic Web Knowledge Base Systems (SSWS 2006). Athens, GA, USA

Hartig, O., Bizer, C., Freytag, J.C. 2009. Executing SPARQL Queries over the Web of Linked Data. In Proceedings of the 8th International

Semantic Web Conference (ISWC2009). Chantilly, VA, USA

He, B., Patel, M., Zhang, Z., & Chang, K. 2007. Accessing the deep web. Communications of the ACM , 50 (5), pp. 94-101.

Korotkiy, M., & Top, J. 2004. From Relational Data to RDFS Models. In Proceedings of 2004 International Conference on Web Engineering.

Munich, Germany

Laclavik, M. 2006. RDB2Onto: Relational Database Data to Ontology Individual Mapping. Tools for Acquisition, Organisation and Presenting of

Information and Knowledge.

Li, M., Du, X., & Wang, S. 2005. Learning ontology from relational database. In Proceedings of 4th International Conference on Machine

Learning and Cybernetics.

Lubyte, L., & Tessaris, S. 2007. Extracting Ontologies from Relational Databases. Technical Report KRDB07-04, Free University of Bozen-

Bolzano.

Lubyte, L., & Tessaris, S. 2009. Automatic Extraction of Ontologies Wrapping Relational Data Sources. In Proceedings of 20th Database and

Expert Systems Applications (DEXA2009).

Lutz, C., Areces, C., Horricks, I., and Satler, U. 2004. Keys, Nominals, and Concrete Domains. Journal of Artificial Intelligence Research, Vol.

23, pp 667-726.

Meier, D. (1983). The Theory of Relational Databases. Computer Science Press.

Miller, R., Haas, L., & Hernandez, M. 2000. Schema mapping as query discovery. In Proceedings of 26th International Conference on Very Large

Database. Cairo, Egypt.

Motik, B., Horrocks, I., & Sattler, U. 2007. Bridging the gap between OWL and relational databases. In Proceedings of 16th International

Conference on World Wide Web. Banff, Canada

Smith, M. K., Welty, C., & McGuinness, D. L. (Ed.) 2004. OWL Web Ontology Language Guide. Retrieved February 5, 2009, from W3C

Recommendation: http://www.w3.org/TR/owl-guide/

Dean, M., & Schreiber, G. (Ed.) 2004. OWL Web Ontological Language Reference. Retrieved February 5, 2009, from W3C Recommendation:

http://www.w3.org/TR/owl-ref/

Pratt, P. J. 1990. A Guide to SQL. Boston: Boyd & Fraser Publishing Company.

Pratt, P. J. 1995. A Guide to SQL (3rd Edition ed.). Boston: Boyd & Fraser Publishing Company.

Hayes, P. (Ed.) 2004. RDF Semantics. Retrieved February 5, 2009, from W3C Recommendation: http://www.w3.org/TR/rdf-mt/

Sequeda, J.F., Tirmizi, S.H., and Miranker, D.P. 2007. SQL Databases are a Moving Target. Position Paper for W3C Workshop on RDF Access to

Relational Databases, Cambridge, MA, USA

Sequeda, J.F., Tirmizi, S.H., and Miranker, D.P. 2008. A Bootstrapping Architecture for Integration of Relational Databases to the Semantic

Web. In Proceedings of the Poster and Demonstration Session at the 7th International Semantic Web Conference. Karlsruhe, Germany

Sequeda, J.F., Garcia-Castro, A., Corcho, O., Tirmizi, S.H., and Miranker, D.P. 2009. Overcoming database heterogeneity to facilitate social

networks: the Colombian displaced population as a case study. In Proceedings of 18th International Conference on World Wide Web Ibero-

America Track. Madrid, Spain

Shen, G., Huang, Z., Zhu, X., Zhao, X. 2006. Research on the Rules of Mapping from Relational Model to OWL. In Proceedings of the

Workshop on OWL: Experiences and Directions. Athens, GA, USA

SQL3 (ISO-ANSI Working Draft). (n.d.). Retrieved February 5, 2009, from http://www.inf.fu-berlin.de/lehre/SS94/einfdb/SQL3/sqlindex.html

Stonebraker, M. 1986. Triggers and inference in database systems. In on Knowledge Base Management Systems: integrating Artificial

intelligence and Database Technologies. M. L. Brodie and J. Mylopoulos, Eds. Springer Topics In Information Systems. Springer-Verlag

New York, New York, NY, 297-314.

Stojanovic, L., Stojanovic, N., & Volz, R. 2002. Migrating data-intensive web sites into the Semantic Web. In Proceedings of the 2002 ACM

Symposium on Applied Computing (Madrid, Spain, March 11 - 14, 2002). SAC '02. ACM, New York, NY, 1100-1107.

Stojanovic, L., Stojanovic, N., & Volz, R. 2002. A reverse engineering approach for migrating data-intensive web sites to the Semantic Web. In

Proceedings of the IFIP 17th World Computer Congress - Tc12 Stream on intelligent information Processing (August 25 - 30, 2002). M. A.

Musen, B. Neumann, and R. Studer, Eds. IFIP Conference Proceedings, vol. 221. Kluwer B.V., Deventer, The Netherlands, 141-154.

Svihla, M., & Jelinek, I. (2004). Two Layer Mapping from Database to RDF. In Proceedings of Electronic Computers and Informatics (ECI).

Slovakia, Kosice.

Tirmizi, S.H., Sequeda, J.F., & Miranker, D.P. 2008. Translating SQL Applications to the Semantic Web. In Proceedings of the 19th

International Conference on Database and Expert Systems Applications, September 01-05, 2008, Turin, Italy. S. S. Bhowmick, J. Küng,

and R. Wagner, Eds. Lecture Notes In Computer Science, vol. 5181. Springer-Verlag, Berlin, Heidelberg, 450-464.

Tirmizi, S.H., Aitken, S., Moreira, D., Mungall, C., Sequeda, J.F., Shah, N.H., & Miranker, D.P. (2009). OBO & OWL: Roundtrip Ontology

Transformations. In Proceedings of Semantic Web Applications and Tools for Life Sciences Workshop, Amsterdam, The Netherlands.

Vrandecic, D., & Sure, Y. (2008). How to Design Better Ontology Metrics. In Proceedings of the 4th European Conference on Semantic Web:

Research and Applications (Innsbruck, Austria, June 03 - 07, 2007). E. Franconi, M. Kifer, and W. May, Eds. Lecture Notes In Computer

Science, vol. 4519. Springer-Verlag, Berlin, Heidelberg, 311-325.

Wang, S., Shen, J., & Hong, T. 2000. Mining fuzzy functional dependencies from quantitative data. IEEE International Conference on Systems,

Man and Cybernetics, 5, 3600-3606.

Biron, P. V., Permanente, K., & Malhotra, A. (Ed.) 2004. XML Schema Part 2: Datatypes Second Edition. Retrieved February 5, 2009, from

W3C Recommendation: http://www.w3.org/TR/xmlschema-2/

Xu, Z., Zhang, S., & Y, D. 2006. Mapping between Relational Database Schema and OWL Ontology for Deep Annotation. In Proceedings of the

2006 IEEE/WIC/ACM international Conference on Web intelligence.Web Intelligence. IEEE Computer Society, Washington, DC, 548-552.

ESAMP: Event-Sourced Architecture for Materials Provenance Management and Application to Accelerated Materials Discovery

Article

Full-text available

Jun 2023

While the vision of accelerating materials discovery using data driven methods is well-founded, practical realization has been throttled due to challenges in data generation, ingestion, and materials state-aware machine learning. High-throughput experiments and automated computational workflows are addressing the challenge of data generation, and capitalizing on these emerging data resources requires ingestion of data into an architecture that captures the complex provenance of experiments and simulations. In this manuscript, we describe an event-sourced architecture for materials provenance (ESAMP) that encodes the sequence and interrelationships among events occurring in a simulation or experiment. We use this architecture to ingest a large and varied dataset (MEAD) that contains raw data and metadata from millions of materials synthesis and characterization experiments performed using various modalities such as serial, parallel, multi-modal experimentation. Our data architecture tracks the evolution of a material's state, enabling a demonstration of how state-equivalency rules can be used to generate datasets that significantly enhance data-driven materials discovery. Specifically, using state-equivalency rules and parameters associated with state-changing processes in addition to the typically used composition data, we demonstrated marked reduction of uncertainty in prediction of overpotential for oxygen evolution reaction (OER) catalysts. Finally, we discuss the importance of ESAMP architecture in enabling several aspects of accelerated materials discovery such as dynamic workflow design, generation of knowledge graphs, and efficient integration of simulation and experiment.

Incremental schema integration for data wrangling via knowledge graphs

Article

Full-text available

Jun 2023

Virtual data integration is the current approach to go for data wrangling in data-driven decision-making. In this paper, we focus on automating schema integration, which extracts a homogenised representation of the data source schemata and integrates them into a global schema to enable virtual data integration. Schema integration requires a set of well-known constructs: the data source schemata and wrappers, a global integrated schema and the mappings between them. Based on them, virtual data integration systems enable fast and on-demand data exploration via query rewriting. Unfortunately, the generation of such constructs is currently performed in a largely manual manner, hindering its feasibility in real scenarios. This becomes aggravated when dealing with heterogeneous and evolving data sources. To overcome these issues, we propose a fully-fledged semi-automatic and incremental approach grounded on knowledge graphs to generate the required schema integration constructs in four main steps: bootstrapping, schema matching, schema integration, and generation of system-specific constructs. We also present Nextia DI , a tool implementing our approach. Finally, a comprehensive evaluation is presented to scrutinize our approach.

Conceptually-grounded mapping patterns for Virtual Knowledge Graphs

Article

Mar 2023
DATA KNOWL ENG

Semantic Architecture for Interoperability in Distributed Healthcare Systems

Article

Full-text available

Nov 2022

Electronic Health Records (EHRs) aggregate the entire patient’s data from different systems. Achieving interoperability for distributed EHR systems is expected to improve patient safety and care continuity, and therefore it improves the healthcare industry. However, achieving interoperability is challenging because of many standards, medical terminologies and ontologies, and different data formats. These formats make the integration of different systems an impossible process. If the hospital uses one standard to implement all of its medical systems, it will be no problem integrating them. However, hospitals usually depend on multiple standards and data formats to deliver different systems like hospital information systems, radiology information systems, laboratory information systems, etc. Semantic Web presents new technology for achieving EHRs interoperability. In this paper, we propose a novel ontological model to implement interoperability for distributed EHR environments. The proposed semantic ontology-based model can unify different EHRs data formats. In this study, We unify five different and popular healthcare data formats and standards. In addition, the framework could be extended straightforwardly to accept any other EHR data format. By implementing the proposed in real environments, we provide the physician with a single interface with a single terminology to query and interact with distributed healthcare systems that use different standards and data formats. This process is expected to help the physician to collect patient data from different systems quickly, completely, and correctly. The proposed ontological model has two stages. The first stage of the proposed converts each different input source to OWL ontology. In the second stage, it integrates all those ontologies into a merged crisp one. The integrated ontology includes 3753 axioms, 2606 logical axioms, 186 classes, 136 individuals, 126 datatype properties, and 257 object properties. We use SPARQL Protocol and RDF Query Language (SPARQL) and Description Logic (DL) queries to evaluate the output ontology. The obtained results ensure that the proposed framework helps physicians and specialists make a centralized point for all patients’ data. It could aggregate data with any heterogeneous structures with high precision.

Bridging the gap between the semantic web and big data: answering SPARQL queries over NoSQL databases

Article

Full-text available

Dec 2022
IJECE

Nowadays, the database field has gotten much more diverse, and as a result, a variety of non-relational (NoSQL) databases have been created, including JSON-document databases and key-value stores, as well as extensible markup language (XML) and graph databases. Due to the emergence of a new generation of data services, some of the problems associated with big data have been resolved. In addition, in the haste to address the challenges of big data, NoSQL abandoned several core databases features that make them extremely efficient and functional, for instance the global view, which enables users to access data regardless of how it is logically structured or physically stored in its sources. In this article, we propose a method that allows us to query non-relational databases based on the ontology-based access data (OBDA) framework by delegating SPARQL protocol and resource description framework (RDF) query language (SPARQL) queries from ontology to the NoSQL database. We applied the method on a popular database called Couchbase and we discussed the result obtained.

Method for Ontology Learning from an RDB: Application to the Domain of Cultural Heritage

Chapter

Jan 2024

For decades, ontologies have defined valuable terminology for describing and representing a knowledge domain, capturing relationships between concepts, and improving knowledge management. Ontologies enable the exchange and sharing of information, extending syntactic and semantic interoperability: such advantages are also very useful in the cultural heritage (CH) field. Nowadays, ontologies are often made manually, although various attempts have been made in the literature for their automatic generation (Ontology Learning). This paper proposes a new way for the semi-automatic building of an ontology from a Relational Database (RDB). Following an accurate review of existing methods, we propose the implementation of a Python library capable of converting an RDB into an OWL ontology and importing its data inside the ontology as concepts and properties instances. We present a case study on actual data from the cultural heritage world coming from the REMIAM project of the High Technology District for Cultural Heritage (Distretto ad Alta Tecnologia per i Beni Culturali - DATABENC). Through interviews with experts in the field, a series of valid questions were identified for the experts’ research work and the interrogation of the knowledge base. The questions were then converted into SQL and SPARQL queries to assess the correctness of the method. The ability of the generated ontology to infer new knowledge on accurate data in the RDB will also be highlighted.

Research on Data Transformation Method Based on RDB-RDF Schema Mapping

Article

Jan 2023

金高

Ontology learning from relational database: a review

Article

Full-text available

Sep 2023

A relational database (RDB) is a digital database that uses components (such as constraints, tables, keys, etc.) to manage data in a structured manner. Because of these components, RDBs are considered ’poor’ from a semantic point of view, precisely because of the structure-oriented nature of the components used. One way to eliminate this limitation is to transform the RDB into an ontology. The purpose of this article is to review the different approaches existing in the literature to extract data from an RDB and convert it into ontology instances. Two approaches are used to integrate the mapping between RDBs and ontologies. The first allows ontologies to be extracted from an RDB, the second consists of a mapping of the relational database to an existing ontology. Our proposed review focuses on methods for creating a specific ontology from an RDB. The proposed review examines this field, classifying the methods that will be analyzed according to their inputs and outputs. Such classification may be useful for understanding the usability of methods. The aim is to critically review existing studies to help outline this research topic’s progress and identify methods’ gaps and functionalities.

A Path Recommender System for Enjoyment Improvement of the Cultural Heritage

Chapter

Jul 2023

The need for cultural heritage enhancement in the Italian context requires an effort from new technologies. Indeed, various tools can be functional to guarantee the improvement of the artistic artifacts’ maintenance or users’ enjoyment. This paper aims to present an architecture to improve the cultural experience through recommender systems and digital storytelling techniques. Moreover, the construction of a mathematical model allows for the building of personalized paths. An experimental phase, based on the development of a prototype, permits the evaluation of the proposed architecture, and obtained results are promising.KeywordsRecommender systemContext awarenessPath recommendationDigital storytelling

Dynamic Ontology Generation using Relational Schema

Conference Paper

Dec 2022

Keys, nommals, and concrete domains

Article

Full-text available

Jan 2005

Many description logics (DLs) combine knowledge representation on an abstract, logical level with an interface to "concrete" domains like numbers and strings with built-in predicates such as <, +, and prefix-of. These hybrid DLs have turned out to be useful in several application areas, such as reasoning about conceptual database models. We propose to further extend such DLs with key constraints that allow the expression of statements like "US citizens are uniquely identified by their social security number". Based on this idea, we introduce a number of natural description logics and perform a detailed analysis of their decidability and computational complexity. It turns out that naive extensions with key constraints easily lead to undecidability, whereas more careful extensions yield NExp-TIME-complete DLs for a variety of useful concrete domains.

Linked Data: The Story so Far

Article

Full-text available

Jan 2009

The term Linked Data refers to a set of best practices for publishing and connecting structured data on the Web. These best practices have been adopted by an increasing number of data providers over the last three years, leading to the creation of a global data space containing billions of assertions-the Web of Data. In this article we present the concept and technical principles of Linked Data, and situate these within the broader context of related technological developments. We describe progress to date in publishing Linked Data on the Web, review applications that have been developed to exploit the Web of Data, and map out a research agenda for the Linked Data community as it moves forward.

OWL Web Ontology Language Guide

Technical Report