
The Use of SKOS Vocabularies in Digital Repositories: The DSpace Case

October 2010


DOI:10.1109/ICSC.2010.83

Source
IEEE Xplore

Conference: Semantic Computing (ICSC), 2010 IEEE Fourth International Conference on

Authors:

Georgia D. Solomou

University of Patras

Thesauri are concept schemes that help in efficiently characterizing and retrieving items from digital libraries. SKOS is a data model that provides a standardized way to represent thesauri, and controlled vocabularies in general, using the Resource Description Framework (RDF). DSpace is a digital repository system that can natively ingest and handle thesauri, although not in SKOS format. SKOS support in DSpace is implemented through an add-on provided by the University of Minho. Our initial objective was to apply this add-on to a running DSpace instance. We then tested the updated DSpace installation using a real vocabulary, the Thesaurus of Greek Terms, which we converted to SKOS ourselves. As a final step, we tried to tackle the problems that arose and to propose solutions, which are mostly based on Semantic Web techniques.


The Use of SKOS Vocabularies in Digital Repositories: The DSpace Case

Georgia Solomou

High Performance Information Systems Laboratory

Computer Engineering and Informatics Dpt.

University of Patras

Patras-Rio, Greece

solomou@hpclab.ceid.upatras.gr

Theodore Papatheodorou

High Performance Information Systems Laboratory

Computer Engineering and Informatics Dpt.

University of Patras

Patras-Rio, Greece

tsp@hpclab.ceid.upatras.gr


Keywords: Digital Libraries; Semantic Web; SKOS; Thesauri; Controlled Vocabularies; DSpace

I. INTRODUCTION

More and more cultural and educational institutions rely on well-known digital library systems to store, manage and disseminate their digital assets. These systems often implement facilities that render their content more knowledge-intensive and thus suitable for exploitation by Semantic Web applications.

An extremely popular system implemented for handling

digital collections is DSpace. On top of DSpace many

institutional repositories have been built worldwide, serving

museums, libraries, national archives, etc. A significant feature

of DSpace is the ability to characterize its items using a

predefined set of keywords, namely a controlled vocabulary.

The adoption of a structured controlled vocabulary by a digital library system is of great importance: it helps in properly characterizing the ingested content and plays a fundamental role in effectively indexing, searching and retrieving information. Compared to other types of controlled vocabularies, thesauri are a more powerful choice, as they allow for the explicit declaration of relationships between concepts.

All types of controlled vocabularies, in order to be usable and exchangeable between computer and web applications, need to be expressed as machine-readable data. SKOS is a standardized, interoperable and machine-understandable way of representing controlled vocabularies and using them within the framework of the Semantic Web.

SKOS (Simple Knowledge Organization System) [5] is a

data model for expressing the basic structure of all concept

schemes, like thesauri. It is actually a practical application of

RDF [7] (and RDFS) and its main objective is to enable easy

publication of controlled structured vocabularies for the

Semantic Web. SKOS is a W3C recommendation, hence a

standardized Web technology. Thesauri expressed in SKOS

can potentially provide added-value to Semantic Web

applications.
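To make the data model concrete, the following minimal sketch builds two SKOS concepts and serializes them as RDF/XML using only the Python standard library. The `http://example.org/thesaurus/` namespace, the concept identifiers and the helper name `make_concept` are illustrative assumptions and do not belong to any published vocabulary.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
BASE = "http://example.org/thesaurus/"  # hypothetical namespace

# Ask ElementTree to use the conventional prefixes when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("skos", SKOS)


def make_concept(parent, local_id, labels, broader=None):
    """Append a skos:Concept with preferred labels and an optional broader link."""
    concept = ET.SubElement(
        parent, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": BASE + local_id}
    )
    for lang, text in labels.items():
        label = ET.SubElement(
            concept, f"{{{SKOS}}}prefLabel", {f"{{{XML_NS}}}lang": lang}
        )
        label.text = text
    if broader is not None:
        ET.SubElement(
            concept, f"{{{SKOS}}}broader", {f"{{{RDF}}}resource": BASE + broader}
        )
    return concept


root = ET.Element(f"{{{RDF}}}RDF")
make_concept(root, "courts", {"en": "courts"})
make_concept(root, "civil-courts", {"en": "civil courts"}, broader="courts")
rdf_xml = ET.tostring(root, encoding="unicode")
```

Each concept is identified by a URI, carries language-tagged labels, and points to its broader concept by reference, which is the basic pattern the rest of this paper relies on.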

In this work we focus on SKOS, its prevalence among digital repositories, and the process of converting thesauri to SKOS (SKOSification). In particular, in Section II we present SKOSified thesauri and related tools that attest to the wide applicability of SKOS. In Section III we discuss existing methods for converting controlled vocabularies to SKOS and then show how we adopted this standard for the Thesaurus of Greek Terms. In Section IV we examine the case of incorporating a SKOSified vocabulary in DSpace. Conclusions and future work follow.

II. SKOS USAGE

At present, a considerable number of running projects in the field of culture involve SKOS. For example, ATHENA (http://www.athenaeurope.org/), the European digital library “Europeana” (http://www.europeana.eu/), STERNA (Semantic Web-based Thematic European Reference Network Application, http://www.sterna-net.eu/), STAR (Semantic Technologies for Archaeological Resources, http://hypermedia.research.glam.ac.uk/kos/star/) and many more projects adopt this model as a means to provide more knowledge-intensive data.

In addition to these projects, many more institutions responsible for publishing thesauri show particular interest in adopting the SKOS standard. Moreover, the continuously increasing number of tools built around SKOS (such as editors, validators and converters) encourages and facilitates the SKOSification process.

A. Thesauri in SKOS

The need to migrate knowledge organization systems into

SKOS has long been recognized by organizations that deal with

controlled vocabularies. Some of them have already deployed

an official SKOSified version of their structured vocabularies

whereas others are on the way to do so.

1) Popular SKOSified vocabularies

In this section, we mention some known thesauri that have

been converted to SKOS. These concept schemes are available

to the public and apply to many different areas of knowledge:

• LCSH - Library of Congress Subject Headings [11]. A very popular subject heading system maintained by the US Library of Congress. It offers an online catalogue where users can search and browse thousands of terms. Through this catalogue, users can also obtain the SKOS version of each selected term.

• AGROVOC - The Food and Agriculture Organization Thesaurus (http://aims.fao.org/website/Search-AGROVOC). A multilingual thesaurus that provides terminology for all subject fields in agriculture, forestry, fisheries, food and related domains. For each concept, the corresponding SKOS description is also available, expressed in the RDF/XML serialization syntax.

• UKAT - UK Archival Thesaurus (http://isegserv.itd.rl.ac.uk/skos/ukat/). A subject thesaurus intended for indexing and searching in the UK archive sector. Its SKOSified version is provided as a single XML file that can be downloaded directly from the official UKAT page.

• GEMET - General Multilingual Environmental Thesaurus (http://isegserv.itd.rl.ac.uk/skos/gemet/). A general thesaurus that defines a core terminology for the environment. It is available as a web service and can be accessed online; its SKOS version, though, is not browsable but can be obtained as an XML file.

• AAT - Getty Art and Architecture Thesaurus (http://www.getty.edu/research/conducting_research/vocabularies/aat/). A structured vocabulary used for characterizing any type of cultural material, as well as items of art and architecture. Although the AAT thesaurus can be browsed online, its SKOSification is still at a draft stage, which is why it has not yet been incorporated into the online catalogue.

• WordNet. A lexical database for the English language that may be considered a thesaurus: the expressed concepts are interlinked by various semantic relations. For WordNet 2.0, a partial conversion to SKOS has been proposed.

The aforementioned SKOS implementations are some

representative examples and definitely not the only existing

attempts. In addition to these vocabularies, there are many

others, more or less significant, that try to successfully

accomplish their migration to SKOS. Through this work, we

chose to further analyze the case of the Thesaurus of Greek

Terms, a controlled vocabulary implemented in Greece and

meant to be used by Greek institutions.

2) The Thesaurus of Greek Terms

The National Documentation Center of Greece (EKT) is the

national infrastructure for scientific documentation and online

information. It is an institution responsible for publishing and

handling the first official thesaurus in Greece, the Thesaurus of

Greek Terms (TGT).

The TGT thesaurus is structured as a controlled vocabulary

that allows representation of both vertical (hierarchical) and

horizontal associations between concepts. It is composed of

5227 bilingual (Greek, English) terms that cover a broad field

of knowledge. Its aim is to facilitate institutions in Greece, like

libraries, museums and information centers, in characterizing

and managing their digital material.

Despite the thesaurus’ notable presence in Greece, EKT has not yet proceeded with the SKOSification of this product. For this reason, in Section III we first propose, and afterwards utilize, a SKOSified version of the TGT thesaurus. The conversion process is described in Section III.B.

B. SKOS Specific Tools

Apart from the increasing number of SKOSified thesauri, the wide acceptance of SKOS is also evident from the number of tools built around it. In what follows, we briefly present the best-known of these tools; they fall into two basic categories: editors and validators.

• ThManager (http://thmanager.sourceforge.net/). An open-source tool implemented in Java that aims to facilitate the creation and visualization of SKOS vocabularies.

• SKOSEd [6]. A plug-in for Protégé 4 (an OWL ontology editor) that augments the latter with the ability to create and modify SKOS thesauri. SKOSEd is accompanied by the SKOS API (http://skosapi.sourceforge.net/), a programmatic API implemented in Java that can be used for building SKOS-based applications.

• PoolParty [10]. A commercial system suitable for editing SKOS vocabularies and for managing any type of thesaurus. It is built upon Semantic Web technologies and, among other things, allows for thesaurus management via easy-to-use GUIs. PoolParty also offers a SKOS thesaurus consistency checker service that validates submitted vocabularies against the SKOS recommendation [9].

• W3C Validation Service. An experimental online SKOS validator provided by W3C.

• The MONDECA SKOS Reader. This tool lets users easily navigate and browse a SKOS thesaurus, provided that the thesaurus is published as an accessible SKOS file. It produces readable versions of the imported files and can display concepts in various orders (e.g., hierarchically or alphabetically).

(The SKOS conversion of WordNet referred to in Section II.A is available at http://isegserv.itd.rl.ac.uk/skos/WordNet.zip.)

III. THE SKOSIFICATION PROCESS

The process of converting thesauri to SKOS, as we will see

in this section, is not standardized and depends on the nature of

the candidate vocabulary.

A. Existing Methods

A notable attempt at SKOSification is proposed in [1] by a team at the VU University of Amsterdam. They apply their method to some well-known thesauri, such as MeSH and GTAA. However, although the proposed method works well for controlled vocabularies that are based on older thesaurus recommendations (e.g., the ISO or ANSI/NISO standards), vocabularies with non-standard features cannot be handled.

Apart from the aforementioned case, which proposes a structured method for bringing thesauri to SKOS, other attempts have also been published that aim to fit the needs of a particular vocabulary. For example, [12] is a technical report dealing with the conversion of the TheSoz thesaurus (Thesaurus of the Social Sciences) to SKOS. Moreover, [11] describes the SKOSification of the Library of Congress Subject Headings. Both works present a procedure that requires manual effort for implementing the mapping from the original thesaurus’ elements to SKOS notions. In both cases, an appropriate XSL transformation is finally applied, which accomplishes the migration to SKOS.

B. SKOSifying the TGT Thesaurus

With all this in mind, we proceeded with the

SKOSification of the TGT thesaurus. The latter follows the

structure of any usual subject thesaurus (see Fig. 1). It makes

use of hierarchical (<BT>, <NT>, <MT>), associative (<RT>)

and equivalence (<UF>) relations. In addition, for each term its

English translation is provided (<ET>), as well as its

correspondence to the Dewey Decimal Classification system

(<dewey>).

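(The W3C Validation Service and the MONDECA SKOS Reader presented in Section II.B are available at http://www.w3.org/2004/02/skos/validation and http://client2.mondeca.com/mondecalabs/skosReader.html, respectively.)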

Figure 1. The TGT thesaurus in its original XML format.

First, we manually mapped thesaurus elements to SKOS

notions, paying particular attention to what the SKOS

specification dictates. This mapping is summarized in Table I.

We then constructed an XSL transformation able to convert

TGT to the desired SKOS format, taking into account the

mapping in Table I. The final SKOSified version of TGT can

be accessed online.
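The conversion itself was implemented as an XSL transformation. Purely as an illustration of the mapping in Table I, the following Python sketch (a stand-in for XSLT, not the transformation we used) converts a single <TERM> element into a skos:Concept. The `http://example.org/tgt/` namespace, the idea of minting concept URIs from term labels, and the helper names are assumptions of this sketch; the upper-case element names follow Table I.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"
XML_NS = "http://www.w3.org/XML/1998/namespace"
BASE = "http://example.org/tgt/"  # hypothetical namespace for minted URIs

# TGT elements that point to other terms, and the SKOS property for each.
RELATIONS = {"MT": "broaderTransitive", "BT": "broader",
             "NT": "narrower", "RT": "related"}

SAMPLE = """<TERM>
  <CONTEXT>αστικά δικαστήρια</CONTEXT>
  <MT>Νομικές Επιστήμες</MT>
  <ET>civil courts</ET>
  <BT>δικαστήρια</BT>
  <NT>ειρηνοδικεία</NT>
  <NT>πρωτοδικεία</NT>
  <RT>πολιτική δικονομία</RT>
  <SN>some description</SN>
</TERM>"""


def uri_for(label):
    """Mint a concept URI from a term label (an assumption of this sketch)."""
    return BASE + quote(label.strip())


def add_literal(concept, prop, text, lang=None):
    """Append a SKOS property with a literal value; RDF/XML uses xml:lang."""
    attrs = {f"{{{XML_NS}}}lang": lang} if lang else {}
    ET.SubElement(concept, f"{{{SKOS}}}{prop}", attrs).text = text.strip()


def term_to_concept(term, parent):
    """Apply the Table I mapping to one <TERM> element."""
    label = term.findtext("CONTEXT")
    concept = ET.SubElement(
        parent, f"{{{SKOS}}}Concept", {f"{{{RDF}}}about": uri_for(label)}
    )
    add_literal(concept, "prefLabel", label, "el")
    # The first <ET> is the preferred translation, later ones are alternatives.
    for index, translation in enumerate(term.findall("ET")):
        prop = "prefLabel" if index == 0 else "altLabel"
        add_literal(concept, prop, translation.text, "en")
    for used_for in term.findall("UF"):
        add_literal(concept, "altLabel", used_for.text, "el")
    for tag, prop in RELATIONS.items():
        for target in term.findall(tag):
            ET.SubElement(concept, f"{{{SKOS}}}{prop}",
                          {f"{{{RDF}}}resource": uri_for(target.text)})
    for tag, prop in (("SN", "definition"), ("DEWEY", "notation")):
        for element in term.findall(tag):
            add_literal(concept, prop, element.text)
    return concept


rdf = ET.Element(f"{{{RDF}}}RDF")
concept = term_to_concept(ET.fromstring(SAMPLE), rdf)
```

Note that relations such as <BT> and <NT> only name the target term, so the resulting concept may point to URIs for which no stand-alone skos:Concept is ever emitted.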

IV. THE DSPACE CASE

DSpace is an open-source digital repository system. It is

responsible for the efficient description, preservation,

management, and distribution of any kind of digital material. It

is popular because it supports an extensible core metadata

schema, based on the well-known Dublin Core specification

[4]. Furthermore, DSpace is multilingual, adaptable to

administrator’s needs and able to incorporate novel features.

One such feature is the utilization of controlled vocabularies so

as to better characterize and manipulate its items.

TABLE I. MAPPING TO SKOS ELEMENTS

XML element   Function                                                  SKOS notion
<TERM>        The described term                                        <skos:Concept>
<USER>        Thesaurus’ owner                                          -
<CONTEXT>     Term’s label                                              <skos:prefLabel lang="el">
<MT>          Microthesauri term                                        <skos:broaderTransitive>
<ET> (a)      English translation                                       <skos:prefLabel lang="en">
<ET>          Alternative English translation                           <skos:altLabel lang="en">
<BT>          Broader term                                              <skos:broader>
<NT>          Narrower term                                             <skos:narrower>
<RT>          Related term                                              <skos:related>
<UF>          Opposite of the Used Instead (USE) term                   <skos:altLabel lang="el">
<SN>          A short description                                       <skos:definition>
<DEWEY>       A number indicating the correspondence to Dewey system    <skos:notation>

a. The first occurrence of the <ET> element is considered the preferred translation.

(The SKOSified TGT thesaurus is available at http://swig.hpclab.ceid.upatras.gr/SKOS?action=AttachFile&do=get&target=ekt_to_skos.rdf)

<TERM>
  <CONTEXT>αστικά δικαστήρια</CONTEXT>
  <MT>Νομικές Επιστήμες</MT>
  <ET>civil courts</ET>
  <BT>δικαστήρια</BT>
  <NT>ειρηνοδικεία</NT>
  <NT>πρωτοδικεία</NT>
  <UF>βλ. πολιτικά δικαστήρια</UF>
  <RT>πολιτική δικονομία</RT>
  <SN>some description</SN>
</TERM>

Figure 2. HTML node tree.

A. Controlled Vocabularies in DSpace

DSpace supports controlled vocabularies in order to provide

and restrict a set of keywords that end-users utilize for

describing, searching and browsing items. These keywords are

organized in the form of a tree (taxonomy) which becomes

available to the end-user during the search or submission

process (see Fig. 2).

Controlled vocabularies are fed to DSpace as simple XML files, one file per ingested vocabulary. However, in contrast to the general multilingual philosophy of DSpace, the controlled vocabulary facility lacks multilingual support. To solve this, we have augmented DSpace with the ability to support any number of translations for each vocabulary. Each translation of a vocabulary is fed to, and thus handled by, the system as a separate XML file. Consequently, when end-users select a language for the DSpace interface, they automatically choose the translation in which all available controlled vocabularies will appear.

In order for a controlled vocabulary to be recognized by the

DSpace system, the former should be structured according to a

specific format (“DSpace node schema”). This means that in

order to make DSpace able to support vocabularies in different

formats -other than the “DSpace node schema”- an

intermediate transformation is needed.

According to the DSpace node schema, all information about a vocabulary term is enclosed in a <node> element (see Fig. 3). Only hierarchical relationships (narrower in meaning) can be expressed, using the sub-element <isComposedBy>. Moreover, a simple annotation mechanism is provided by the optional sub-element <hasNote>.
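As a rough illustration of this structure, the sketch below generates a small node-schema document from a dictionary of narrower terms. The element names <node>, <isComposedBy> and <hasNote> come from the description above; the `id` and `label` attribute names, the sample terms and the function name `build_node` are assumptions of this sketch, and the hierarchy is assumed to be free of cycles.

```python
import xml.etree.ElementTree as ET


def build_node(term, narrower, notes=None):
    """Build a <node> element for `term`, nesting its narrower terms."""
    # Attribute names are assumed for illustration only.
    node = ET.Element("node", {"id": term, "label": term})
    children = narrower.get(term, [])
    if children:
        # All narrower terms live inside a single <isComposedBy> wrapper.
        composed = ET.SubElement(node, "isComposedBy")
        for child in children:
            composed.append(build_node(child, narrower, notes))
    if notes and term in notes:
        # Optional free-text annotation.
        ET.SubElement(node, "hasNote").text = notes[term]
    return node


narrower = {
    "courts": ["civil courts"],
    "civil courts": ["magistrates' courts", "courts of first instance"],
}
notes = {"civil courts": "some description"}

tree = build_node("courts", narrower, notes)
xml_text = ET.tostring(tree, encoding="unicode")
```

Because the schema offers no element for broader, related or equivalent terms, anything beyond the narrower hierarchy has to be either dropped or rewritten into that hierarchy before ingestion.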

B. Support of SKOS in DSpace

The Odisseia Research at the University of Minho in

Portugal has implemented an add-on for version 1.4.2 of

DSpace [3] which augments this digital repository system with

the ability to support controlled vocabularies expressed in

SKOS. In particular, the provided add-on makes the following

changes:

…
</isComposedBy>
</node>

Figure 3. The DSpace node schema.

• Enhances the DSpace inherent node schema so as to

manipulate related and preferred (use-instead) terms.

• Allows support for SKOSified thesauri.

• Offers the ability to assign a different vocabulary to each DSpace Community.

Apart from the last feature, which is beyond the scope of this work, we have successfully accommodated the first two in the latest version of DSpace (DSpace 1.6). The modifications we had to make were subtle and did not affect the way DSpace handles SKOSified vocabularies.

The add-on alters the controlled vocabulary ingestion process in DSpace. It adds an intermediate transformation step, implemented by an appropriate XSLT file (see Fig. 4). As a result, support for SKOSified vocabularies is achieved through two successive XSL transformations:

• The first applies to the original SKOS file and produces a valid DSpace node schema.

• The second is responsible for converting the node schema to an HTML node tree (taxonomy).
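The two-stage pipeline can be sketched as follows. This is a Python stand-in written for illustration, not the add-on's XSLT code: stage 1 follows only skos:narrower links, stage 2 renders the node schema as a nested HTML list. The sample vocabulary, the `id` and `label` attributes, the artificial root node and the function names are assumptions of the sketch, and the hierarchy is assumed to be acyclic.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

SKOS_DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:skos="{SKOS}">
  <skos:Concept rdf:about="http://example.org/courts">
    <skos:prefLabel xml:lang="en">courts</skos:prefLabel>
    <skos:narrower rdf:resource="http://example.org/civil-courts"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/civil-courts">
    <skos:prefLabel xml:lang="en">civil courts</skos:prefLabel>
  </skos:Concept>
</rdf:RDF>"""


def skos_to_nodes(rdf_root):
    """Stage 1: SKOS (RDF/XML) to node schema, using skos:narrower only."""
    concepts = {c.get(f"{{{RDF}}}about"): c
                for c in rdf_root.findall(f"{{{SKOS}}}Concept")}
    children = {uri: [n.get(f"{{{RDF}}}resource")
                      for n in c.findall(f"{{{SKOS}}}narrower")]
                for uri, c in concepts.items()}
    has_parent = {kid for kids in children.values() for kid in kids}

    def build(uri):
        label = concepts[uri].findtext(f"{{{SKOS}}}prefLabel")
        node = ET.Element("node", {"id": uri, "label": label})
        # Targets without their own skos:Concept cannot be labelled here.
        kids = [kid for kid in children[uri] if kid in concepts]
        if kids:
            composed = ET.SubElement(node, "isComposedBy")
            for kid in kids:
                composed.append(build(kid))
        return node

    root = ET.Element("node", {"id": "root", "label": "vocabulary"})
    composed = ET.SubElement(root, "isComposedBy")
    for uri in concepts:
        if uri not in has_parent:  # top concepts only
            composed.append(build(uri))
    return root


def nodes_to_html(node):
    """Stage 2: node schema to a nested HTML list."""
    item = ET.Element("li")
    item.text = node.get("label")
    composed = node.find("isComposedBy")
    if composed is not None:
        sublist = ET.SubElement(item, "ul")
        for child in composed.findall("node"):
            sublist.append(nodes_to_html(child))
    return item


nodes = skos_to_nodes(ET.fromstring(SKOS_DOC))
html = ET.tostring(nodes_to_html(nodes), encoding="unicode")
```

Keeping the node schema as the intermediate format means that only the first stage has to know about SKOS, so the existing HTML rendering step can stay unchanged.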

This approach, however, also has a number of problems, which became evident when we tried to import a real SKOS thesaurus into a working DSpace instance enhanced with this add-on. These problems are explained in the following section.

C. The TGT Thesaurus in DSpace

After we had successfully SKOSified the TGT thesaurus, we attempted to incorporate it in a working DSpace 1.6 instance. The result was not satisfactory, as we faced two kinds of problems concerning the construction of the HTML node tree:

1. Some terms appeared in the wrong place in the taxonomy (wrong depth level or repetitions of terms).

2. A number of terms, although present in the SKOS file, were missing from the tree hierarchy.

Figure 4. The controlled vocabulary ingestion process: SKOS vocabulary (RDF/XML) -> XSL transformation (vocabularySKOS2node.xsl) -> DSpace node schema (XML) -> XSL transformation (vocabulary2html.xsl) -> HTML node tree (submission process, subject).

The main reason behind these problems was the inability of the provided XSL transformations to deal with every possible relationship among the described concepts (e.g., there was no provision for broader terms). These problematic transformations, combined with the non-exhaustive (but not semantically inconsistent) implementation of TGT, in which not every possible relationship is asserted, made the situation even worse. In particular, we noticed that thesaurus terms that do not exist as stand-alone concepts and are only referenced through a relation by another concept (hierarchical, associative or equivalence) fail to appear in the final HTML tree. If the TGT thesaurus were complete, the problem of “missing” terms would probably not exist.
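A simple way to see which terms are affected is to compare the set of terms that are defined as stand-alone concepts with the set of terms that are merely referenced. The sketch below does this over a plain dictionary representation of the thesaurus; the data layout, the sample terms and the function name are assumptions made for illustration.

```python
def find_missing_terms(concepts):
    """Return terms that are referenced by a relation but never defined.

    `concepts` maps each stand-alone term to a dictionary of relation
    names ("broader", "narrower", "related", ...) and their target terms.
    """
    defined = set(concepts)
    referenced = {
        target
        for relations in concepts.values()
        for targets in relations.values()
        for target in targets
    }
    return sorted(referenced - defined)


concepts = {
    "courts": {"narrower": ["civil courts"]},
    "civil courts": {
        "broader": ["courts"],
        "narrower": ["magistrates' courts"],
        "related": ["civil procedure"],
    },
}

missing = find_missing_terms(concepts)
```

In this sample, "magistrates' courts" and "civil procedure" are reported, since both occur only as relation targets and never as concepts of their own.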

To tackle the aforementioned issues, we initially modified the XSL transformation that is responsible for converting the imported SKOS files into the DSpace node schema. In this way, we managed to handle the first problem, whereas the second remains unsolved.

More specifically, the transformation implemented by the provided add-on was replaced by a new one, designed to carefully handle all relationships between concepts. As a result, we had neither repetitions of terms nor wrong placement of them. DSpace users are now presented with an accurate taxonomy, at least as far as its structure is concerned. Top concepts appear as top categories in the constructed tree and each sub-term lies beneath them. In other words, each term is placed under its broader concept. A part of the taxonomy tree that is produced by the SKOSified TGT thesaurus when imported in DSpace 1.6 is shown in Fig. 5.
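The placement logic can be sketched independently of XSLT. In the following Python illustration (not the transformation we deployed), broader and narrower assertions are merged into a single parent-to-children map, so that a link stated from either side, or from both, yields exactly one placement. The data layout, the sample terms and the function name are assumptions; terms that are not defined as stand-alone concepts are deliberately skipped, which mirrors the limitation described above.

```python
def build_tree(concepts):
    """Place every defined term under its broader term, without repetitions."""
    children = {term: set() for term in concepts}
    for term, relations in concepts.items():
        for parent in relations.get("broader", []):
            if parent in concepts:
                children[parent].add(term)
        for child in relations.get("narrower", []):
            if child in concepts:
                children[term].add(child)

    # Terms that are nobody's child become the top categories.
    has_parent = set().union(*children.values())
    roots = sorted(set(concepts) - has_parent)

    def subtree(term, path):
        # `path` holds the ancestors of `term`, guarding against cycles.
        return {
            child: subtree(child, path | {child})
            for child in sorted(children[term])
            if child not in path
        }

    return {root: subtree(root, {root}) for root in roots}


concepts = {
    "courts": {"narrower": ["civil courts"]},
    "civil courts": {"broader": ["courts"],
                     "narrower": ["magistrates' courts"]},
    "law": {},
}

tree = build_tree(concepts)
```

Using sets for the children is what removes the repetitions: the link between "courts" and "civil courts" is asserted twice in the sample but produces a single entry in the tree.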

D. Proposed Solutions

A possible and more effective solution to the problem of “missing” terms would be to treat thesauri as ontologies. After all, SKOS is itself defined as an ontology in the Web Ontology Language (OWL) [2]. Such a treatment would allow programmatic access to the thesaurus’ elements. In particular, by exploiting the OWL API, a simpler way to construct the vocabulary’s node tree would be possible, in contrast to the more complex one offered by the XSL transformation. Furthermore, the new constructs offered by the latest version of OWL (OWL 2) would allow for the expression of richer semantic conditions in SKOS, as stated in [8].

Another gain from handling the SKOS thesaurus as an OWL ontology is the possibility of applying OWL reasoners (such as FaCT++ or Pellet). Such a reasoning-based approach allows for inferencing and thus for the automatic handling of non-asserted relationships. Consequently, an inference-based classification and rendering of the thesaurus could be achieved, resulting in no “missing” terms in the tree hierarchy.
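The kind of inference meant here can be illustrated with two rules of the SKOS data model: skos:broader and skos:narrower are inverses of each other, and skos:related is symmetric. The sketch below applies only these two rules in plain Python. It is a toy stand-in for a reasoner, not a use of FaCT++, Pellet or the OWL API, and the data layout, sample terms and function name are assumptions.

```python
# Inverse of each relation: broader/narrower are inverses, related is symmetric.
INVERSE = {"broader": "narrower", "narrower": "broader", "related": "related"}


def infer_relations(concepts):
    """Return asserted plus inferred relations, creating entries as needed."""
    inferred = {
        term: {rel: set(targets) for rel, targets in relations.items()}
        for term, relations in concepts.items()
    }
    for term, relations in concepts.items():
        for rel, targets in relations.items():
            if rel not in INVERSE:
                continue
            for target in targets:
                # A target that was only referenced gains an entry of its own.
                entry = inferred.setdefault(target, {})
                entry.setdefault(INVERSE[rel], set()).add(term)
    return inferred


asserted = {
    "civil courts": {"broader": ["courts"], "related": ["civil procedure"]},
}

inferred = infer_relations(asserted)
```

After this step, "courts" and "civil procedure" exist as entries with relations of their own, so a tree builder that only considers defined terms would no longer skip them. A full OWL reasoner covers far more than these two rules.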

V. CONCLUSIONS AND FUTURE WORK

In this work we tried to show the importance of SKOS, as a

data model able to transfer knowledge organization systems to

the Semantic Web in a quick and simplified manner. Many

popular thesauri have either already migrated to SKOS or are

working on this task.

Following this trend in library science, we took on the task of converting the TGT thesaurus, a controlled vocabulary targeted for use by Greek institutions, to SKOS. Afterwards, we examined how such a SKOSified thesaurus can be exploited by a digital repository system that manages any kind of educational and cultural content. During the SKOSified vocabulary ingestion process we faced several problems, mostly originating from shortcomings of the applied add-on. Another reason for these problems was the non-exhaustive description of the thesaurus. To fix some of these issues, we successfully modified the provided XSL transformation. Nevertheless, the problem of “missing” terms still remains unsolved.

As future work we intend to utilize Semantic Web

technologies in order to manipulate controlled vocabularies in a

more refined way. To the extent of handling thesauri as

ontologies, our final objective is to make them manageable by

OWL reasoners. In this way, we at least expect to overcome the

problem of “missing” terms in the produced HTML hierarchy,

given that inferred relationships can fill in missing descriptions

in a thesaurus.

REFERENCES

[1] M. van Assem, V. Malaisé, A. Miles, and G. Schreiber, “A Method to Convert Thesauri to SKOS”, Proc. 3rd European Semantic Web Conference (ESWC 2006), Springer Berlin/Heidelberg, 2006, pp. 95-106, doi: 10.1007/11762256_10

[2] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. McGuinness, P. Patel-Schneider, and L. Stein, “OWL Web Ontology Language: Reference”, W3C Recommendation, 2004. http://www.w3.org/TR/owl-ref/

[3] S. Costa, M. Ferreira, and A. Alice, “Controlled-Vocabulary Add-on Patch for DSpace 1.4.2”, 2007. http://sourceforge.net/tracker/index.php?func=detail&aid=1833347&group_id=19984&atid=319984

[4] DCMI Usage Board, “DCMI Metadata Terms”, DCMI Recommendation, 2008. http://dublincore.org/documents/dcmi-terms/

[5] A. Isaac, and E. Summers (eds), “SKOS Simple Knowledge Organization System Primer”, W3C Proposed Recommendation, 2009. http://www.w3.org/TR/2009/WD-skos-primer-20090615/

[6] S. Jupp, S. Bechhofer, and R. Stevens, “A Flexible API and Editor for SKOS”, Proc. 7th International Semantic Web Conference (ISWC 2008), 2008

[7] G. Klyne, and J. J. Carroll (eds), “Resource Description Framework (RDF): Concepts and Abstract Syntax”, W3C Recommendation, 2004. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/

[8] D. Koutsomitropoulos, and G. Solomou, “SKOS in OWL 2”, 2009. http://swig.hpclab.ceid.upatras.gr/SKOS

[9] A. Miles, and S. Bechhofer, “SKOS Simple Knowledge Organization System Reference”, W3C Recommendation, 2009. http://www.w3.org/TR/skos-reference/

[10] T. Schandl, and A. Blumauer, “PoolParty: SKOS Thesaurus Management Utilizing Linked Data”, Proc. Extended Semantic Web Conference (ESWC 2010), Springer Berlin/Heidelberg, 2010, pp. 421-425, doi: 10.1007/978-3-642-13489-0_36

[11] E. Summers, A. Isaac, C. Redding, and D. Krech, “LCSH, SKOS and Linked Data”, Proc. International Conference on Dublin Core and Metadata Applications (DC 2008), 2008

[12] B. Zapilko, and Y. Sure, “Converting the TheSoz to SKOS”, GESIS Technical Report 2009/07, GESIS - Leibniz Institute for the Social Sciences, 2009

Figure 5. The TGT thesaurus in DSpace 1.6.

Federated Semantic Search Using Terminological Thesauri for Learning Object Discovery

Article

Full-text available

Jul 2017
J Enterprise Inform Manag

Purpose The purpose of this paper is to propose a framework and system to address the inability to discover new and authentic learning material and the lack of a single access point for search and browsing of remote learning object repositories (LORs). Design/methodology/approach The authors develop a framework for keyword-based query expansion using SKOS domain terminologies and implement a federated search mechanism integrating various disparate LORs within a learning management system (LMS). Findings The authors show that the expanded query achieves improved information gain and it is applied for federated information access, by simultaneously searching within a number of repositories. Results can be seamlessly aggregated back within the LMS and the course context. Practical implications It is possible to retrieve additional learning objects (LOs) and achieve a corresponding increase in recall, while maintaining precision. SKOS expansion behaves well in a scholarly setting, which, combined with federated search, can contribute toward LOs’ discovery at a balanced cost. The system can be easily integrated with other platforms as well, building on open standards and RESTful communication. Originality/value To the authors’ knowledge, this is the first time SKOS-based query expansion is applied in a federated setting, and for the discovery and alignment of learning objects residing within LORs. The results show that this approach can achieve considerable information gain and that it is possible to strike a balance between search effectiveness, query drift and performance.

Reconciliation of patient/doctor vocabulary in a structured resource

Article

Full-text available

Dec 2019
Health Informat J

Today, social media is increasingly used by patients to openly discuss their health. Mining automatically such data is a challenging task because of the non-structured nature of the text and the use of many abbreviations and the slang terms. Our goal is to use Patient Authored Text to build a French Consumer Health Vocabulary on breast cancer field, by collecting various kinds of non-experts’ expressions that are related to their diseases and then compare them to biomedical terms used by health care professionals. We combine several methods of the literature based on linguistic and statistical approaches to extract candidate terms used by non-experts and to link them to expert terms. We use messages extracted from the forum on ‘cancerdusein.org’ and a vocabulary dedicated to breast cancer elaborated by the Institut National Du Cancer. We have built an efficient vocabulary composed of 192 validated relationships and formalized in Simple Knowledge Organization System ontology.

Diseño de una ontología para la gestión de datos heterogéneos en universidades: marco metodológico

Article

Full-text available

Dec 2016

An ontology is a useful tool to deal with the heterogeneity of data and their semantization, with a view to raising the quality of processes of organization, search and retrieval of information in institutional management systems, particularly those implemented in a university web. In view of the wealth of multidisciplinary knowledge treasured by the Universities, a UH Ontology is proposed for the management of data. The paper describes the characteristics and conditions of the heterogeneous data currently managed by various information management systems at the Universities, and proposes a methodological framework for the design of an ontology for the management of heterogeneous data at the institution. The proposal includes the design of the ontology, its classes, annotations, ontological languages and semantic annotation scheme, based on the methodology developed by Noy and McGuinness and the software Protégé as a construction tool.

حتمیة التقارب الرقمی من أجل التكامل المعرفی لبناء الذاكرة الجمعیة لرسم المستقبل للمجتمع

Article

Apr 2023

شریف كامل شاهین

Conversión normalizada (SKOS) de sistemas de organización del conocimiento interoperables en la web

Article

Full-text available

Jan 2020

Este estudio examina de forma exhaustiva la literatura científica dedicada a los procesos de skosificación de vocabularios y sistemas de organización del conocimiento. Se analizan en profundidad 49 trabajos que describen y detallan la transformación de un total de 59 vocabularios controlados convencionales o SOC (Sistemas de Organización del Conocimiento) a Simple Knowledge Organization System (SKOS). Se identifican los puntos clave para hacer el análisis de metodologías de transformación de vocabularios en SKOS para la web y se comparan los estudios para determinar las aproximaciones y parámetros más recomendables para llevar a cabo estos procesos de conversión de vocabularios, cada vez más frecuentes y necesarios en la web semántica y en entornos de linked data (LD). Los resultados señalan que la mayor parte de SOC transformados son tesauros, que los formatos mayoritarios son de texto o registros bibliográficos, que el objetivo más común al cambiar a SKOS es la mejora de la interoperabilidad de los vocabularios, y que los procesos de conversión pueden agruparse mediante tres formas: scripts realizados en distintos lenguajes, transformaciones XSL y lenguajes de mapeo. Se concluye queSKOS es considerado por los autores como una buena opción para mejorar la interoperabilidad de vocabularios controlados.

User Experience of an Institutional Repository in a Private University in Mexico: A Fundamental Component in the Framework of Open Science

Article

Full-text available

Oct 2019

Technology platforms, viewed from the perspective of their users, offer new insight into aspects that could enhance their use. The objective of this study is to provide the instruments and indicators that allow us to obtain empirical evidence of the experience of users of an institutional repository through the user-centered design methodology. The guiding question of the study was: How can society measure the experience of users who use an institutional repository? The authors employed a sequential mixed explanatory methodology and user-centered design, with the use of focus groups and surveys, applied to a sample of students and professors. The findings suggest that three key aspects could be considered to promote satisfactory user experiences in relation to the open educational movement: a) innovation in communication strategies to increase knowledge transfer in an open format, b) the versatility of the technologies, and c) the establishment of norms and regulations for the use of institutional repositories.

Trends and challenges of visual search interfaces in digital libraries and repositories

Article

Feb 2017
ELECTRON LIBR

Purpose: This paper aims to present an overview of the challenges encountered in integrating visual search interfaces into digital libraries and repositories. These challenges come in various forms, including information visualisation, the use of knowledge organisation systems and metadata quality. The main purpose of this study is the identification of criteria for the evaluation and integration of visual search interfaces, proposing guidelines and recommendations to improve information retrieval tasks with emphasis on the educational context. Design/methodology/approach: The information included in this study was collected based on a systematic literature review approach. The main information sources were explored in several digital libraries, including Science Direct, Scopus, ACM and IEEE, and include journal articles, conference proceedings, books, European project reports and deliverables and PhD theses published in an electronic format. A total of 142 studies comprised the review. Findings: There are several issues that authors did not fully discuss in this literature review study; more specifically, aspects associated with access to digital resources in digital libraries and repositories based on human-computer interaction, i.e. usability and learnability of user interfaces; design of a suitable navigation method of search based on simple knowledge organisation schemes; and the usefulness of visual search interfaces to locate relevant resources. Research limitations/implications: The main steps for carrying out a systematic review are drawn from health care; this methodology is not commonly used in fields such as digital libraries and repositories. The authors aimed to apply the fundamentals of the systematic literature review methodology considering the context of this study. Additionally, there are several aspects of accessibility that were not considered in the study, such as accessibility to content for disabled people as defined by ISO/IEC 40500:2012.
Originality/value: No other systematic literature reviews have been conducted in this field. The research presents an in-depth analysis of the criteria associated with searching and navigation methods based on the systematic literature review approach. The analysis is relevant for researchers in the field of digital library and repository creation in that it may direct them to considerations in designing and implementing visual search interfaces based on the use of information visualisation.

Gestión de fondos de archivos con datos enlazados y consultas federadas [Management of archival fonds with linked data and federated queries]

Article

Full-text available

Sep 2016

This paper presents the main Semantic Web technologies that can be useful for managing archival fonds. It examines several international and local projects that start from standardized ISAD-G descriptions to generate ontologies, as well as the availability of LIAM (Linked Archival Metadata), which facilitates the transformation of archival data into RDF (Resource Description Framework). It also analyzes how linked data management enables interoperability between information systems and faceted search over stored document collections described in OWL (Web Ontology Language), SKOS (Simple Knowledge Organization System) and Dublin Core. The authors propose using a CMS (Content Management System) for managing archival fonds that is compatible with SIOC (Semantically-Interlinked Online Communities) and OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), to facilitate the exchange and retrieval of information. Specifically, the paper details the technologies used to develop CoroArchivo, a system that is also evaluated through an experiment that automatically creates ontologies from ISAD-G descriptions stored in DSpace. The tool supports federated queries based on the exclusion and equality classes of the OWL vocabulary.

A Study Of Authority Control in Spanish University Repositories

Article

Full-text available

Jan 2012
KNOWL ORGAN

This paper examines the current state of authority control development in Spanish university repositories. As a decade has now gone by since the initiation of the first projects for institutional repositories in Spain, it would seem a suitable time to draw attention to authority control, an element of the first rank in evaluating the consistency and integrity of systems for retrieving bibliographic information. The work is focused on examining the implementation of authorities in twenty-six Spanish university repositories, taking into account the information provided by the standardization experts working in them. The study considers the responses of the coordinators for these digital collections using a set of analytic criteria set out in the study. The handling of authorities in the group of university repositories studied may be described as uneven. Greater interest may be observed in controlling author entries, with laxer solutions for authority control of subjects. The study suggests the need to establish effective policies for the management of authorities by means of cooperative efforts permitting the building up of corpora of entries for authorities that would aid the processes of cataloguing, metadata creation, and information retrieval in systems based on syntactic and semantic interoperability in which manual intervention should be minimal.

Exploiting Semantics in Subject-Based Searching of DSpace

Conference Paper

Nov 2012

Subjects are grouped in a classification system according to their established natural relationship. Subject-based searching (SS) in institutional repositories (IR) searches objects on the basis of subjects. The existing SS technique in the IR system DSpace only partially exploits the semantics inherent in subject-based classification and misses objects that are relevant to a user query. An ontology describes the semantic relationships that exist among different subjects of classification. Our proposed system employs an ontology for SS in order to exploit the semantic relationships that exist among subjects. We determine the family of a user-given subject, assign different weights to semantic relationships and rank the retrieved objects according to these weights. The proposed system was compared with the existing DSpace SS system. The results show that the system returns more relevant objects than the existing SS system.

LCSH, SKOS and Linked Data

Conference Paper

Full-text available

Jan 2008

A Flexible API and Editor for SKOS

Conference Paper

Full-text available

Jan 2008

This poster presents a programmatic interface (SKOS API) and a plugin for Protégé 4 for editing and working with the Simple Knowledge Organisation System (SKOS). The SKOS API has been designed to work with SKOS models at a high level of abstraction to aid developers of applications that use SKOS. We discuss SKOSEd, a tool for authoring and editing SKOS artefacts. A key aspect of the design of the API and editor is how SKOS relates to OWL and what existing OWL infrastructure can be exploited to work with SKOS.

PoolParty: SKOS Thesaurus Management Utilizing Linked Data

Conference Paper

Full-text available

May 2010

Building and maintaining thesauri are complex and laborious tasks. PoolParty is a Thesaurus Management Tool (TMT) for the Semantic Web, which aims to support the creation and maintenance of thesauri by utilizing Linked Open Data (LOD), text-analysis and easy-to-use GUIs, so thesauri can be managed and utilized by domain experts without needing knowledge about the semantic web. Some aspects of thesaurus management, like the editing of labels, can be done via a wiki-style interface, allowing for lowest possible access barriers to contribution. PoolParty can analyse documents in order to glean new concepts for a thesaurus. Additionally, a thesaurus can be enriched by retrieving relevant information from Linked Data sources. Thesauri can be imported and updated via LOD URIs from external systems, and can also be published as new linked data sources on the semantic web.

A method to convert thesauri to SKOS

Conference Paper

Full-text available

Jun 2006

Thesauri can be useful resources for indexing and retrieval on the Semantic Web, but often they are not published in RDF/OWL. To convert thesauri to RDF for use in Semantic Web applications and to ensure the quality and utility of the conversion a structured method is required. Moreover, if different thesauri are to be interoperable without complicated mappings, a standard schema for thesauri is required. This paper presents a method for conversion of thesauri to the SKOS RDF/OWL schema, which is a proposal for such a standard under development by W3C's Semantic Web Best Practices Working Group. We apply the method to three thesauri: IPSV, GTAA and MeSH. With these case studies we evaluate our method and the applicability of SKOS for representing thesauri.

OWL Web Ontology Language Reference

Article

Full-text available

Feb 2004

The Web Ontology Language OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. This document contains a structured informal description of the full set of OWL language constructs and is meant to serve as a reference for OWL users who want to construct OWL ontologies.

LCSH, SKOS and Linked Data

Article

Full-text available

May 2008

A technique for converting Library of Congress Subject Headings MARCXML to Simple Knowledge Organization System (SKOS) RDF is described. Strengths of the SKOS vocabulary are highlighted, as well as possible points for extension, and the integration of other semantic web vocabularies such as Dublin Core. An application for making the vocabulary available as linked-data on the Web is also described.

Resource Description Framework (RDF): Concepts and Abstract Syntax

Article