The Use of SKOS Vocabularies in Digital Repositories: The DSpace Case

Georgia Solomou
High Performance Information Systems Laboratory
Computer Engineering and Informatics Dpt.
University of Patras
Patras-Rio, Greece
solomou@hpclab.ceid.upatras.gr
Theodore Papatheodorou
High Performance Information Systems Laboratory
Computer Engineering and Informatics Dpt.
University of Patras
Patras-Rio, Greece
tsp@hpclab.ceid.upatras.gr
Abstract—Thesauri are concept schemes that help in efficiently
characterizing and retrieving items from digital libraries. SKOS
is a data model that provides a standardized way to represent
thesauri, and controlled vocabularies in general, using the Resource
Description Framework (RDF). A digital repository system that can
inherently ingest and handle thesauri, although not in SKOS
format, is DSpace. SKOS support in DSpace is implemented
through an add-on provided by the University of Minho. Our
initial objective was to apply this add-on to a running DSpace
instance. We then tested this updated DSpace installation using a
real vocabulary: the Thesaurus of Greek Terms, which we
took on the task of converting to SKOS. As a final step, we tried
to tackle the problems that arose and to propose solutions, which
are mostly based on Semantic Web techniques.
Keywords: Digital Libraries; Semantic Web; SKOS; Thesauri; Controlled Vocabularies; DSpace
I. INTRODUCTION
More and more cultural and educational institutions
rely on well-known digital library systems to store,
manage and disseminate their digital assets. These systems
often implement facilities that render their content more
knowledge-intensive and thus suitable for exploitation by
Semantic Web applications.
An extremely popular system implemented for handling
digital collections is DSpace. On top of DSpace many
institutional repositories have been built worldwide, serving
museums, libraries, national archives, etc. A significant feature
of DSpace is the ability to characterize its items using a
predefined set of keywords, namely a controlled vocabulary.
The adoption of a structured controlled vocabulary by a
digital library system is of great importance: It helps in
properly characterizing the ingested content and plays a
fundamental role in effectively indexing, searching and
retrieving information. Compared to other types of controlled
vocabularies, thesauri are a more powerful choice as they allow
for the explicit declaration of relationships between concepts.
All types of controlled vocabularies, in order to be usable
and exchangeable between computer and web applications,
need to be expressed as machine-readable data. A standardized,
interoperable and machine-understandable way for representing
controlled vocabularies and using them within the framework
of the Semantic Web is SKOS.
SKOS (Simple Knowledge Organization System) [5] is a
data model for expressing the basic structure of all concept
schemes, like thesauri. It is actually a practical application of
RDF [7] (and RDFS) and its main objective is to enable easy
publication of controlled structured vocabularies for the
Semantic Web. SKOS is a W3C recommendation, hence a
standardized Web technology. Thesauri expressed in SKOS
can potentially provide added value to Semantic Web
applications.
In this work we focus on SKOS, its prevalence
among digital repositories, and the process of
converting thesauri to SKOS (SKOSification). In particular, in
section II we present SKOSified thesauri and related tools that
attest to the wide applicability of SKOS. In section III we discuss
existing methods for converting controlled vocabularies
to SKOS and finally show how we adopted this standard for the
Thesaurus of Greek Terms. In section IV we examine the case
of incorporating a SKOSified vocabulary in DSpace.
Conclusions and future work follow.
II. SKOS USAGE
At present, a number of running projects in the
field of culture involve SKOS. For example, ATHENA,¹
the European digital library “Europeana”,² STERNA³
(Semantic Web-based Thematic European Reference Network
Application), STAR⁴ (Semantic Technologies for
Archeological Resources) and many more projects adopt this
model as a means to provide more knowledge-intensive data.
¹ http://www.athenaeurope.org/
² http://www.europeana.eu/
³ http://www.sterna-net.eu/
⁴ http://hypermedia.research.glam.ac.uk/kos/star/
In addition to these projects, many more institutions,
responsible for publishing thesauri, show particular interest in
adopting the SKOS standard. Besides, the continuously
increasing number of tools that are built around SKOS (like
editors, validators and converters) definitely encourages and
facilitates the SKOSification process.
A. Thesauri in SKOS
The need to migrate knowledge organization systems into
SKOS has long been recognized by organizations that deal with
controlled vocabularies. Some of them have already deployed
an official SKOSified version of their structured vocabularies
whereas others are in the process of doing so.
1) Popular SKOSified vocabularies
In this section, we mention some known thesauri that have
been converted to SKOS. These concept schemes are available
to the public and apply to many different areas of knowledge:
• LCSH - Library of Congress Subject Headings [11].
  It is a very popular subject heading system maintained
  by the US Library of Congress. It offers an online
  catalogue where users can search and browse
  thousands of terms. Through this online catalogue,
  users can also obtain the SKOS version of each
  selected term.
• AGROVOC - The Food and Agriculture
  Organization Thesaurus.⁵ It is a multilingual
  thesaurus that provides terminology for all subject
  fields in agriculture, forestry, fisheries, food and
  related domains. For each concept its corresponding
  SKOS description is also available, expressed in the
  RDF/XML serialization syntax.
• UKAT - UK Archival Thesaurus.⁶ It is a subject
  thesaurus intended for indexing and
  searching in the UK archive sector. Its SKOSified
  version is provided as a single XML file that can be
  directly downloaded from the UKAT official page.
• GEMET - General Multilingual Environmental
  Thesaurus.⁷ A general thesaurus which defines a core
  terminology for the environment. It is available as a
  web service and can be accessed online; its SKOS
  format, though, is not browseable but can be
  obtained in the form of an XML file.
• AAT - Getty Arts and Architecture Thesaurus.⁸ It is
  a structured vocabulary used for characterizing any
  type of cultural material, as well as items of art and
  architecture. Although the AAT thesaurus can be
  browsed online, its SKOSification is still in a draft
  stage, which is why it has not yet been
  incorporated into the online catalogue.
• WordNet.⁹ It is a lexical database for the English
  language that may be considered as a thesaurus:
  expressed concepts are interlinked by various semantic
  relations. For WordNet 2.0 a partial conversion to
  SKOS has been proposed.
⁵ http://aims.fao.org/website/Search-AGROVOC
⁶ http://isegserv.itd.rl.ac.uk/skos/ukat/
⁷ http://isegserv.itd.rl.ac.uk/skos/gemet/
⁸ http://www.getty.edu/research/conducting_research/vocabularies/aat/
The aforementioned SKOS implementations are some
representative examples and definitely not the only existing
attempts. In addition to these vocabularies, there are many
others, more or less significant, that try to successfully
accomplish their migration to SKOS. Through this work, we
chose to further analyze the case of the Thesaurus of Greek
Terms, a controlled vocabulary implemented in Greece and
meant to be used by Greek institutions.
2) The Thesaurus of Greek Terms
The National Documentation Center of Greece (EKT) is the
national infrastructure for scientific documentation and online
information. It is an institution responsible for publishing and
handling the first official thesaurus in Greece, the Thesaurus of
Greek Terms (TGT).
The TGT thesaurus is structured as a controlled vocabulary
that allows representation of both vertical (hierarchical) and
horizontal associations between concepts. It is composed of
5227 bilingual (Greek, English) terms that cover a broad field
of knowledge. Its aim is to assist institutions in Greece, like
libraries, museums and information centers, in characterizing
and managing their digital material.
Despite the thesaurus’ notable presence in Greece, EKT
has not yet proceeded with the SKOSification of this product.
For this reason, in section III we first propose, and afterwards
utilize, a SKOSified version of the TGT thesaurus. The
conversion process is described in section III.B.
B. SKOS Specific Tools
Apart from the increasing number of SKOSified thesauri,
the wide acceptance of SKOS is also evident from the
number of tools that are built around it. In what follows, we
briefly present the best-known such tools; they
fall into two basic categories: editors and validators.
• ThManager.¹⁰ It is an open-source tool implemented
  in Java. It aims to facilitate the creation and
  visualization of SKOS vocabularies.
• SKOSEd [6]. It is a plug-in for Protégé 4 (an OWL
  ontology editor) that augments the latter with the
  ability to create and modify SKOS thesauri. SKOSEd is
  accompanied by the SKOS API,¹¹ a programmatic API
  implemented in Java that can be utilized for building
  SKOS-based applications.
• PoolParty [10]. It is a commercial system suitable for
  editing SKOS vocabularies and for managing any type
  of thesauri. It is built upon Semantic Web technologies
  and, among others, allows for thesaurus management via
  easy-to-use GUIs. PoolParty also offers a SKOS
  thesaurus consistency checker service that validates the
  submitted vocabularies for their alignment with the
  SKOS recommendation [9].
• W3C Validation Service.¹² It is an experimental
  online SKOS validator, provided by W3C.
• The MONDECA SKOS Reader.¹³ This tool
  helps users to easily navigate and browse a SKOS
  thesaurus, provided that this thesaurus is published as
  an accessible SKOS file. It produces readable
  versions of the imported files and can display
  concepts in various orders (e.g., hierarchically or
  alphabetically).
⁹ http://isegserv.itd.rl.ac.uk/skos/WordNet.zip
¹⁰ http://thmanager.sourceforge.net/
¹¹ http://skosapi.sourceforge.net/
III. THE SKOSIFICATION PROCESS
The process of converting thesauri to SKOS, as we will see
in this section, is not standardized and depends on the nature of
the candidate vocabulary.
A. Existing Methods
A notable attempt for SKOSification is proposed in [1] by a
team at the VU University of Amsterdam. They apply their
method to some well-known thesauri, like MeSH and GTAA.
However, although the proposed method behaves well for
controlled vocabularies that are based on older thesauri
recommendations (e.g., ISO or ANSI/NISO standard),
vocabularies with non-standard features cannot be handled.
Apart from the aforementioned case, which tries to propose
a structured method for bringing thesauri to SKOS, some other
attempts have also been published, aiming to fit the needs of a
particular vocabulary. For example, [12] is a technical report
dealing with the conversion of the TheSoz thesaurus
(Thesaurus of the Social Sciences) to SKOS. Moreover, [11]
describes the SKOSification of the Library of Congress Subject
Headings. Both works present a procedure that requires manual
effort for implementing the mapping from the original
thesaurus’ elements to SKOS notions. In both cases, an
appropriate XSL transformation is finally applied, which
accomplishes the migration to SKOS.
B. SKOSifying the TGT Thesaurus
With all this in mind, we proceeded with the
SKOSification of the TGT thesaurus. The latter follows the
structure of any usual subject thesaurus (see Fig. 1). It makes
use of hierarchical (<BT>, <NT>, <MT>), associative (<RT>)
and equivalence (<UF>) relations. In addition, for each term its
English translation is provided (<ET>), as well as its
correspondence to the Dewey Decimal Classification system
(<dewey>).
¹² http://www.w3.org/2004/02/skos/validation
¹³ http://client2.mondeca.com/mondecalabs/skosReader.html
Figure 1. The TGT thesaurus in its original XML format.
First, we manually mapped thesaurus elements to SKOS
notions, paying particular attention to what the SKOS
specification dictates. This mapping is summarized in Table I.
We then constructed an XSL transformation able to convert
TGT to the desired SKOS format, taking into account the
mapping in Table I. The final SKOSified version of TGT can
be accessed online.¹⁴
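As an illustration, the mapping of Table I can be sketched in a few lines of Python. This is not the XSL transformation described above; it is a minimal stand-in that assumes each <TERM> is parsed on its own, that tag names are matched case-insensitively, and that broader, narrower and related terms are kept as plain label strings rather than resolved to concept URIs. The function name term_to_skos, the base URI and the second <ET> value in the sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# TGT element (upper-cased) -> (SKOS property, language tag), following Table I.
# <USER> has no SKOS counterpart and is therefore absent from the mapping.
MAPPING = {
    "CONTEXT": ("skos:prefLabel", "el"),
    "MT": ("skos:broaderTransitive", None),
    "ET": ("skos:altLabel", "en"),  # the first <ET> is promoted to prefLabel below
    "BT": ("skos:broader", None),
    "NT": ("skos:narrower", None),
    "RT": ("skos:related", None),
    "UF": ("skos:altLabel", "el"),
    "SN": ("skos:definition", None),
    "DEWEY": ("skos:notation", None),
}


def term_to_skos(term_xml, base="http://example.org/tgt/"):
    """Return (subject, property, value, lang) tuples for one <TERM> element."""
    term = ET.fromstring(term_xml)
    # Hypothetical URI scheme: base URI plus the Greek label with spaces replaced.
    subject = base + term.findtext("CONTEXT").strip().replace(" ", "_")
    statements = [(subject, "rdf:type", "skos:Concept", None)]
    first_et_seen = False
    for child in term:
        key = child.tag.upper()
        if key not in MAPPING:
            continue  # e.g. <USER>
        prop, lang = MAPPING[key]
        if key == "ET" and not first_et_seen:
            prop, first_et_seen = "skos:prefLabel", True
        statements.append((subject, prop, (child.text or "").strip(), lang))
    return statements


SAMPLE = """<TERM>
  <CONTEXT>αστικά δικαστήρια</CONTEXT>
  <USER>EKT</USER>
  <ET>civil courts</ET>
  <ET>civil tribunals</ET>
  <BT>δικαστήρια</BT>
  <dewey>347</dewey>
</TERM>"""

for statement in term_to_skos(SAMPLE):
    print(statement)
```

Keeping the mapping in a single dictionary mirrors the tabular form of Table I, so a change to the table translates into a one-line change in the sketch.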
IV. THE DSPACE CASE
DSpace is an open-source digital repository system. It is
responsible for the efficient description, preservation,
management, and distribution of any kind of digital material. It
is popular because it supports an extensible core metadata
schema, based on the well-known Dublin Core specification
[4]. Furthermore, DSpace is multilingual, adaptable to
administrator’s needs and able to incorporate novel features.
One such feature is the utilization of controlled vocabularies so
as to better characterize and manipulate its items.
TABLE I. MAPPING TO SKOS ELEMENTS
XML element   Function                                                 SKOS notion
<TERM>        The described term                                       <skos:Concept>
<USER>        Thesaurus’ owner                                         -
<CONTEXT>     Term’s label                                             <skos:prefLabel lang="el">
<MT>          Microthesauri term                                       <skos:broaderTransitive>
<ET> (a)      English translation                                      <skos:prefLabel lang="en">
<ET>          Alternative English translation                          <skos:altLabel lang="en">
<BT>          Broader term                                             <skos:broader>
<NT>          Narrower term                                            <skos:narrower>
<RT>          Related term                                             <skos:related>
<UF>          Opposite of the Used Instead (USE) term                  <skos:altLabel lang="el">
<SN>          A short description                                      <skos:definition>
<DEWEY>       A number indicating the correspondence to Dewey system   <skos:notation>
a. The first occurrence of the <ET> element is considered the preferred translation.
¹⁴ http://swig.hpclab.ceid.upatras.gr/SKOS?action=AttachFile&do=get&target=ekt_to_skos.rdf
(Content of Fig. 1: a TGT term in its original XML format.)
<TERM>
  <CONTEXT>αστικά δικαστήρια</CONTEXT>
  <USER>EKT</USER>
  <MT>Νομικές Επιστήμες</MT>
  <ET>civil courts</ET>
  <BT>δικαστήρια</BT>
  <NT>ειρηνοδικεία</NT>
  <NT>πρωτοδικεία</NT>
  <UF>βλ. πολιτικά δικαστήρια</UF>
  <RT>πολιτική δικονομία</RT>
  <SN>some description</SN>
  <dewey>347</dewey>
</TERM>
Figure 2. HTML node tree.
A. Controlled Vocabularies in DSpace
DSpace supports controlled vocabularies in order to provide
and restrict a set of keywords that end-users utilize for
describing, searching and browsing items. These keywords are
organized in the form of a tree (taxonomy) which becomes
available to the end-user during the search or submission
process (see Fig. 2).
Controlled vocabularies are fed to DSpace as simple XML
files: there is one such file per ingested vocabulary. But in
contrast to the general multilingual philosophy of DSpace, the
controlled vocabulary facility lacks multilingual
support. To solve this, we have augmented DSpace with
the ability to support any number of translations for each
vocabulary. Each vocabulary’s translation is fed, and thus
handled by the system, as a separate XML file. Consequently,
when end-users select a language for the DSpace interface, they
automatically choose the translation in which all available
controlled vocabularies will appear.
In order for a controlled vocabulary to be recognized by the
DSpace system, the former should be structured according to a
specific format (“DSpace node schema”). This means that in
order to make DSpace able to support vocabularies in different
formats -other than the “DSpace node schema”- an
intermediate transformation is needed.
According to the DSpace node schema, all information
about a vocabulary term is enclosed in a <node> element
(see Fig. 3). Only hierarchical (narrower in meaning)
relationships can be expressed, using the sub-element
<isComposedBy>. Moreover, a simple annotation
mechanism is provided by the optional sub-element
<hasNote>.
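To make the structure concrete, the following Python fragment builds a small vocabulary in the node schema just described. It is only a sketch: the helper make_node and the sample terms are hypothetical, and we assume here that a note is carried as the text of <hasNote> and that it follows <isComposedBy>, which should be checked against the schema shipped with DSpace.

```python
import xml.etree.ElementTree as ET


def make_node(node_id, label, children=(), note=None):
    """Build one <node> of the DSpace node schema."""
    node = ET.Element("node", {"id": node_id, "label": label})
    if children:
        # Narrower terms are nested inside a single <isComposedBy> element.
        ET.SubElement(node, "isComposedBy").extend(children)
    if note is not None:
        # Assumption: the note is carried as the text of <hasNote>.
        ET.SubElement(node, "hasNote").text = note
    return node


courts = make_node(
    "courts", "courts",
    children=[
        make_node("civil-courts", "civil courts", note="hypothetical note"),
        make_node("criminal-courts", "criminal courts"),
    ],
)
print(ET.tostring(courts, encoding="unicode"))
```

Since <isComposedBy> is the only structural relation available, anything other than a narrower term (a related or use-instead term, for instance) has no place in this tree.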
B. Support of SKOS in DSpace
The Odisseia Research at the University of Minho in
Portugal has implemented an add-on for version 1.4.2 of
DSpace [3] which augments this digital repository system with
the ability to support controlled vocabularies expressed in
SKOS. In particular, the provided add-on makes the following
changes:
<node id="acmccs98" label="ACMCCS98">
  <isComposedBy>
    <node id="A." label="General Literature">
      <isComposedBy>
        <node id="A.0" label="GENERAL"/>
        <node id="A.1" label="INTRODUCTORY AND SURVEY"/>
      </isComposedBy>
    </node>
  </isComposedBy>
</node>
Figure 3. The DSpace node schema.
• Enhances the DSpace inherent node schema so as to
  manipulate related and preferred (use-instead) terms.
• Allows support for SKOSified thesauri.
• Offers the ability to assign a different vocabulary to each
  DSpace Community.
Apart from the last feature, which is beyond the scope of
this work, we have successfully accommodated the first two in
the latest version of DSpace (DSpace 1.6). The modifications
we had to make were minor and did not affect the way
DSpace handles SKOSified vocabularies.
The add-on actually alters the controlled vocabulary
ingestion process in DSpace. It adds an intermediate
transformation step, implemented by an appropriate XSLT file
(see Fig. 4). As a result, the support of SKOSified vocabularies
is finally achieved through two successive XSL
transformations:
• The first applies to the original SKOS file and
  produces a valid DSpace node schema.
• The second is responsible for converting the inherent
  node schema to an HTML node tree (taxonomy).
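The second of these steps can be pictured with a short Python stand-in. The add-on performs it with vocabulary2html.xsl, whose actual output we do not reproduce here; the sketch merely assumes that the taxonomy is rendered as nested <ul>/<li> lists, and the function name node_to_html is hypothetical.

```python
import xml.etree.ElementTree as ET


def node_to_html(node):
    """Render a <node> and its narrower terms as a nested <li>/<ul> tree."""
    item = ET.Element("li")
    item.text = node.get("label")
    composed = node.find("isComposedBy")
    if composed is not None:
        sublist = ET.SubElement(item, "ul")
        for child in composed.findall("node"):
            sublist.append(node_to_html(child))
    return item


NODE_SCHEMA = """<node id="acmccs98" label="ACMCCS98">
  <isComposedBy>
    <node id="A." label="General Literature">
      <isComposedBy>
        <node id="A.0" label="GENERAL"/>
        <node id="A.1" label="INTRODUCTORY AND SURVEY"/>
      </isComposedBy>
    </node>
  </isComposedBy>
</node>"""

html_tree = node_to_html(ET.fromstring(NODE_SCHEMA))
print(ET.tostring(html_tree, encoding="unicode"))
```

Because this step only follows <isComposedBy>, any relationship that the first transformation fails to turn into nesting is lost before the HTML tree is built.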
However, this approach also has a number of problems,
which became evident when we tried to import a real
SKOS thesaurus into a working DSpace instance enhanced with
this add-on. These problems are explained in the
following section.
C. The TGT Thesaurus in DSpace
After we had successfully SKOSified the TGT thesaurus,
we attempted to incorporate it in a DSpace 1.6 working
instance. The result was not satisfactory, as we faced two kinds
of problems concerning the construction of the HTML node
tree:
1. Some terms appeared in the wrong place of the
taxonomy (wrong depth level or repetitions of terms).
2. A number of terms, although present in the SKOS file,
were missing from the tree hierarchy.
Figure 4. The controlled vocabulary ingestion process: SKOS vocabulary (RDF/XML) -> XSL transformation (vocabularySKOS2node.xsl) -> DSpace node schema (XML) -> XSL transformation (vocabulary2html.xsl) -> HTML node tree, used by the submission process and subject search.
The main reason behind these problems was the inability of
the provided XSL transformations to deal with every possible
relationship among described concepts (e.g., there was no
provision for broader terms). These problematic
transformations, along with the non-exhaustive (but not
semantically inconsistent) implementation of TGT, in which
not every possible relationship is asserted, made the situation
even worse. In particular, we noticed that the thesaurus terms
that do not exist as stand-alone concepts and are only
referenced through a relation by another concept (hierarchical,
associative or relation of equivalence) fail to appear in the final
HTML tree. If the TGT thesaurus were complete, the problem of
“missing” terms would probably not exist.
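The terms affected can be listed with a simple check, sketched below in Python. The sketch assumes that the SKOS file is serialized as flat <skos:Concept rdf:about="..."> elements whose relations use rdf:resource, and it covers hierarchical and associative links only; the function name referenced_only and the example.org URIs are hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"
SKOS = "{http://www.w3.org/2004/02/skos/core#}"
RELATIONS = ("broader", "narrower", "related")


def referenced_only(skos_rdfxml):
    """Return URIs that occur as relation targets but never as stand-alone concepts."""
    root = ET.fromstring(skos_rdfxml)
    declared, referenced = set(), set()
    for concept in root.iter(SKOS + "Concept"):
        declared.add(concept.get(RDF + "about"))
        for relation in RELATIONS:
            for link in concept.findall(SKOS + relation):
                referenced.add(link.get(RDF + "resource"))
    return referenced - declared


SAMPLE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <skos:Concept rdf:about="http://example.org/tgt/courts">
    <skos:narrower rdf:resource="http://example.org/tgt/civil_courts"/>
  </skos:Concept>
  <skos:Concept rdf:about="http://example.org/tgt/civil_courts">
    <skos:broader rdf:resource="http://example.org/tgt/courts"/>
    <skos:related rdf:resource="http://example.org/tgt/civil_procedure"/>
  </skos:Concept>
</rdf:RDF>"""

print(referenced_only(SAMPLE))
```

In the sample, civil_procedure is reachable only through a skos:related link, which is exactly the kind of term that fails to appear in the HTML tree.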
To tackle the aforementioned issues, we initially modified
the XSL transformation that is responsible for converting the
imported SKOS files into the DSpace node schema. In this
way, we managed to handle the first problem, whereas the
second remains unsolved.
More specifically, the transformation implemented by the
provided add-on was replaced by a new one, which was
deployed so as to carefully manipulate all relationships
between concepts. As a result we had neither repetitions of
terms nor wrong placement of them. DSpace users are now
presented with an accurate taxonomy, at least as far as its
structure is concerned. Top concepts appear as top categories in
the constructed tree and each sub-term lies beneath them. In
other words, each term is placed under its broader concept. A
part of the taxonomy tree that is produced by the SKOSified
TGT thesaurus when imported in DSpace 1.6, is shown in Fig.
5.
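The placement rule (top concepts first, every other term under its broader concept, no term repeated) can be sketched as follows. This is a plain Python illustration rather than our XSL transformation; it assumes an acyclic hierarchy, keeps only the first asserted parent of a term, and uses the hypothetical function name build_tree and English placeholder labels.

```python
from collections import defaultdict


def build_tree(broader, narrower):
    """Nest terms under their broader concept.

    broader:  (term, broader_term) pairs as asserted in the thesaurus
    narrower: (term, narrower_term) pairs as asserted in the thesaurus
    """
    parent_of = {}
    for child, parent in broader:
        parent_of.setdefault(child, parent)
    for parent, child in narrower:
        # A narrower assertion is read as the inverse broader assertion.
        parent_of.setdefault(child, parent)

    children = defaultdict(list)
    for child, parent in parent_of.items():
        children[parent].append(child)

    concepts = set(parent_of) | set(parent_of.values())
    top_concepts = sorted(c for c in concepts if c not in parent_of)

    def subtree(concept):
        return {c: subtree(c) for c in sorted(children[concept])}

    return {top: subtree(top) for top in top_concepts}


tree = build_tree(
    broader=[("civil courts", "courts")],
    narrower=[
        ("civil courts", "magistrate courts"),
        ("civil courts", "first-instance courts"),
        ("courts", "civil courts"),
    ],
)
print(tree)
```

Reading both relations into a single child-to-parent map is what prevents repetitions: a term asserted once through <skos:broader> and once through the inverse <skos:narrower> still receives a single position in the tree.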
D. Proposed Solutions
A possible and more effective solution to the problem of
“missing” terms would be to consider thesauri as ontologies.
Besides, SKOS is itself defined as an ontology in the Web
Ontology Language (OWL [2][1]). Such a consideration
would allow for programmatic access to the thesaurus’
elements. In particular, by exploiting the OWL API, a simpler
way to construct the vocabulary’s node tree would be possible,
in contrast to the more complex one offered by the XSL
transformation. Furthermore, the new constructs offered by the
latest version of OWL (OWL 2) would allow for the expression
of richer semantic conditions in SKOS, as stated in [8].
Another gain in handling the SKOS thesaurus as an OWL
ontology is the possibility to apply OWL reasoners (like
FaCT++ or Pellet). Such a reasoning-based approach allows for
inferencing and thus for the automatic handling of non-asserted
relationships. Consequently, an inference-based classification
and rendering of the thesaurus could be achieved, resulting in
having no “missing” terms in the tree hierarchy.
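The kind of non-asserted relationship meant here can be illustrated without a reasoner. The Python fragment below is only a stand-in for what FaCT++ or Pellet would derive: it materializes the inverse of skos:broader and skos:narrower and the symmetry of skos:related, which the SKOS data model defines, and nothing more. The function name materialize and the sample labels are hypothetical.

```python
# Inverse and symmetric property pairs as defined by the SKOS data model.
INVERSE = {
    "skos:broader": "skos:narrower",
    "skos:narrower": "skos:broader",
    "skos:related": "skos:related",
}


def materialize(asserted):
    """Add the inverse or symmetric counterpart of every asserted statement."""
    closure = set(asserted)
    for subject, prop, obj in asserted:
        if prop in INVERSE:
            closure.add((obj, INVERSE[prop], subject))
    return closure


asserted = {
    ("civil courts", "skos:broader", "courts"),
    ("civil courts", "skos:related", "civil procedure"),
}
inferred = materialize(asserted) - asserted
for statement in sorted(inferred):
    print(statement)
```

After this step "courts" and "civil procedure" appear as subjects of their own statements, so a tree builder that iterates over described concepts no longer skips them.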
V. CONCLUSIONS AND FUTURE WORK
In this work we tried to show the importance of SKOS, as a
data model able to transfer knowledge organization systems to
the Semantic Web in a quick and simplified manner. Many
popular thesauri have either already migrated to SKOS or are
working on this task.
Following this trend in library science, we took on the
task of converting the TGT thesaurus, a controlled vocabulary
targeted for use by Greek institutions, to SKOS. Afterwards,
we examined how such a SKOSified thesaurus can be
exploited by a digital repository system, which manages any
kind of educational and cultural content. During the SKOSified
vocabulary ingestion process we faced several problems,
mostly originating from the problematic nature of the applied
add-on. Another cause of these problems was the non-exhaustive
description of the thesaurus. To fix some of these
issues, we successfully modified the provided XSL
transformation. Nevertheless, the problem of “missing” terms
still remains unsolved.
As future work we intend to utilize Semantic Web
technologies in order to manipulate controlled vocabularies in a
more refined way. By handling thesauri as
ontologies, our final objective is to make them manageable by
OWL reasoners. In this way, we at least expect to overcome the
problem of “missing” terms in the produced HTML hierarchy,
given that inferred relationships can fill in missing descriptions
in a thesaurus.
Figure 5. The TGT thesaurus in DSpace 1.6.
REFERENCES
[1] M. van Assem, V. Malaisé, A. Miles, and G. Schreiber, “A Method to Convert Thesauri to SKOS”, Proc. 3rd European Semantic Web Conference (ESWC 2006), Springer Berlin/Heidelberg, 2006, pp. 95-106, doi: 10.1007/11762256_10
[2] S. Bechhofer, F. van Harmelen, J. Hendler, I. Horrocks, D. McGuinness, P. Patel-Schneider, and L. Stein, “OWL Web Ontology Language: Reference”, W3C Recommendation, 2004. http://www.w3.org/TR/owl-ref/
[3] S. Costa, M. Ferreira, and A. Alice, “Controlled-Vocabulary Add-on Patch for DSpace 1.4.2”, 2007. http://sourceforge.net/tracker/index.php?func=detail&aid=1833347&group_id=19984&atid=319984
[4] DCMI Usage Board, “DCMI Metadata Terms”, DCMI Recommendation, 2008. http://dublincore.org/documents/dcmi-terms/
[5] A. Isaac and E. Summers (eds), “SKOS Simple Knowledge Organization System Primer”, W3C Proposed Recommendation, 2009. http://www.w3.org/TR/2009/WD-skos-primer-20090615/
[6] S. Jupp, S. Bechhofer, and R. Stevens, “A Flexible API and Editor for SKOS”, Proc. 7th International Semantic Web Conference (ISWC 2008), 2008
[7] G. Klyne and J. J. Carroll (eds), “Resource Description Framework (RDF): Concepts and Abstract Syntax”, W3C Recommendation, 2004. http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
[8] D. Koutsomitropoulos and G. Solomou, “SKOS in OWL 2”, 2009. http://swig.hpclab.ceid.upatras.gr/SKOS
[9] A. Miles and S. Bechhofer, “SKOS Simple Knowledge Organization System Reference”, W3C Recommendation, 2009. http://www.w3.org/TR/skos-reference/
[10] T. Schandl and A. Blumauer, “PoolParty: SKOS Thesaurus Management Utilizing Linked Data”, Proc. Extended Semantic Web Conference (ESWC 2010), Springer Berlin/Heidelberg, 2010, pp. 421-425, doi: 10.1007/978-3-642-13489-0_36
[11] E. Summers, A. Isaac, C. Redding, and D. Krech, “LCSH, SKOS and Linked Data”, Proc. International Conference on Dublin Core and Metadata Applications (DC 2008), 2008
[12] B. Zapilko and Y. Sure, “Converting the TheSoz to SKOS”, GESIS Technical Report 2009/07, GESIS - Leibniz Institute for the Social Sciences, 2009
... Although both of these thesauri cover specific fields of knowledge, they are generic enough and thus sufficient for the characterization of the most common subjects met in these thematic areas. Through an appropriate mapping process [20] we achieved the SKOS transformation of these two thesauri, from their initial XML format into OWL. ...
... To evaluate our system we used the two domain thesauri we developed and described above. In addition, we experimented with larger thesauri, MeSH (Medical Subject Headings), already converted into SKOS [23] and comprising of 23,883 concepts and the complete Thesaurus of Greek Terms (TGT), that we have implemented in SKOS previously [20], comprising of 5,227 terms. Experiments were performed on standard virtualized hardware, using a single CPU and 2GB RAM. ...
Article
Full-text available
Purpose The purpose of this paper is to propose a framework and system to address the inability to discover new and authentic learning material and the lack of a single access point for search and browsing of remote learning object repositories (LORs). Design/methodology/approach The authors develop a framework for keyword-based query expansion using SKOS domain terminologies and implement a federated search mechanism integrating various disparate LORs within a learning management system (LMS). Findings The authors show that the expanded query achieves improved information gain and it is applied for federated information access, by simultaneously searching within a number of repositories. Results can be seamlessly aggregated back within the LMS and the course context. Practical implications It is possible to retrieve additional learning objects (LOs) and achieve a corresponding increase in recall, while maintaining precision. SKOS expansion behaves well in a scholarly setting, which, combined with federated search, can contribute toward LOs’ discovery at a balanced cost. The system can be easily integrated with other platforms as well, building on open standards and RESTful communication. Originality/value To the authors’ knowledge, this is the first time SKOS-based query expansion is applied in a federated setting, and for the discovery and alignment of learning objects residing within LORs. The results show that this approach can achieve considerable information gain and that it is possible to strike a balance between search effectiveness, query drift and performance.
... SKOS itself being an OWL ontology, the representation of SKOS is based on Resource Description Framework (RDF) graphs. Increasingly, vocabularies are implemented SKOS for health and audiovisual applications, 34 for education and culture, 35 for Food and Agriculture (Agrovoc), 36 for activities of the European Union (Eurovoc), 37 for the environment (GEMET) 38 and for the economy (STW). 39 Figure 1 shows the proposed method, structured in six steps. ...
Article
Full-text available
Today, social media is increasingly used by patients to openly discuss their health. Mining automatically such data is a challenging task because of the non-structured nature of the text and the use of many abbreviations and the slang terms. Our goal is to use Patient Authored Text to build a French Consumer Health Vocabulary on breast cancer field, by collecting various kinds of non-experts’ expressions that are related to their diseases and then compare them to biomedical terms used by health care professionals. We combine several methods of the literature based on linguistic and statistical approaches to extract candidate terms used by non-experts and to link them to expert terms. We use messages extracted from the forum on ‘cancerdusein.org’ and a vocabulary dedicated to breast cancer elaborated by the Institut National Du Cancer. We have built an efficient vocabulary composed of 192 validated relationships and formalized in Simple Knowledge Organization System ontology.
... -SKOS, 8 con funciones de tesauro. ...
Article
Full-text available
An ontology is a useful tool to deal with the heterogeneity of data and their semantization, with a view to raising the quality of processes of organization, search and retrieval of information in institutional management systems, particularly those implemented in a university web. In view of the wealth of multidisciplinary knowledge treasured by the Universities, a UH Ontology is proposed for the management of data. The paper describes the characteristics and conditions of the heterogeneous data currently managed by various information management systems at the Universities, and proposes a methodological framework for the design of an ontology for the management of heterogeneous data at the institution. The proposal includes the design of the ontology, its classes, annotations, ontological languages and semantic annotation scheme, based on the methodology developed by Noy and McGuinness and the software Protégé as a construction tool.
Article
Full-text available
Este estudio examina de forma exhaustiva la literatura científica dedicada a los procesos de skosificación de vocabularios y sistemas de organización del conocimiento. Se analizan en profundidad 49 trabajos que describen y detallan la transformación de un total de 59 vocabularios controlados convencionales o SOC (Sistemas de Organización del Conocimiento) a Simple Knowledge Organization System (SKOS). Se identifican los puntos clave para hacer el análisis de metodologías de transformación de vocabularios en SKOS para la web y se comparan los estudios para determinar las aproximaciones y parámetros más recomendables para llevar a cabo estos procesos de conversión de vocabularios, cada vez más frecuentes y necesarios en la web semántica y en entornos de linked data (LD). Los resultados señalan que la mayor parte de SOC transformados son tesauros, que los formatos mayoritarios son de texto o registros bibliográficos, que el objetivo más común al cambiar a SKOS es la mejora de la interoperabilidad de los vocabularios, y que los procesos de conversión pueden agruparse mediante tres formas: scripts realizados en distintos lenguajes, transformaciones XSL y lenguajes de mapeo. Se concluye queSKOS es considerado por los autores como una buena opción para mejorar la interoperabilidad de vocabularios controlados.
Article
Technology platforms, viewed from the perspective of their users, offer new ways to discover aspects that can enhance their use. The objective of this study is to provide the instruments and indicators that allow us to obtain empirical evidence of the experience of users of an institutional repository through the user-centered design methodology. The guiding question of the study was: How can society measure the experience of users who use an institutional repository? The authors employed a sequential explanatory mixed methodology and user-centered design, with the use of focus groups and surveys, applied to a sample of students and professors. The findings suggest that three key aspects could be considered to promote satisfactory user experiences in relation to the open educational movement: a) innovation in communication strategies to increase knowledge transfer in an open format, b) the versatility of the technologies and c) the establishment of norms and regulations for the use of institutional repositories.
Article
Purpose: This paper aims to present an overview of the challenges encountered in integrating visual search interfaces into digital libraries and repositories. These challenges come in various forms, including information visualisation, the use of knowledge organisation systems and metadata quality. The main purpose of this study is the identification of criteria for the evaluation and integration of visual search interfaces, proposing guidelines and recommendations to improve information retrieval tasks with emphasis on the educational context.
Design/methodology/approach: The information included in this study was collected based on a systematic literature review approach. The main information sources were explored in several digital libraries, including Science Direct, Scopus, ACM and IEEE, and include journal articles, conference proceedings, books, European project reports and deliverables and PhD theses published in an electronic format. The review comprised a total of 142 studies.
Findings: There are several issues that authors did not fully discuss in this literature review study; more specifically, aspects associated with access to digital resources in digital libraries and repositories based on human-computer interaction, i.e. usability and learnability of user interfaces; design of a suitable navigation method of search based on simple knowledge organisation schemes; and the usefulness of visual search interfaces to locate relevant resources.
Research limitations/implications: The main steps for carrying out a systematic review are drawn from health care; this methodology is not commonly used in fields such as digital libraries and repositories. The authors aimed to apply the fundamentals of the systematic literature review methodology considering the context of this study. Additionally, there are several aspects of accessibility that were not considered in the study, such as accessibility to content for disabled people as defined by ISO/IEC 40500:2012.
Originality/value: No other systematic literature reviews have been conducted in this field. The research presents an in-depth analysis of the criteria associated with searching and navigation methods based on the systematic literature review approach. The analysis is relevant for researchers in the field of digital library and repository creation in that it may direct them to considerations in designing and implementing visual search interfaces based on the use of information visualisation.
Article
This paper presents the main Semantic Web technologies that can be useful for managing archival holdings. It examines several international and local projects that start from standardized ISAD-G descriptions to generate ontologies, as well as the availability of LIAM (Linked Archival Metadata), which facilitates the transformation of archival data into RDF (Resource Description Framework) format. It also analyzes how linked data management enables interoperability between information systems and faceted search over stored document collections described in OWL (Web Ontology Language), SKOS (Simple Knowledge Organization System) and Dublin Core. The authors propose the use of a CMS (Content Management System) that manages archival holdings and is compatible with SIOC (Semantically-Interlinked Online Communities) and OAI-PMH (Open Archives Initiative Protocol for Metadata Harvesting), in order to facilitate the exchange and retrieval of information. Specifically, the paper details the technologies used to develop CoroArchivo, a system that is also evaluated through an experiment that automatically creates ontologies from ISAD-G descriptions stored in DSpace. The tool developed supports federated queries based on the exclusion and equality classes of the OWL vocabulary.
Article
This paper examines the current state of authority control development in Spanish university repositories. As a decade has now gone by since the initiation of the first projects for institutional repositories in Spain, it would seem a suitable time to draw attention to authority control, an element of the first rank in evaluating the consistency and integrity of systems for retrieving bibliographic information. The work focuses on examining the implementation of authorities in twenty-six Spanish university repositories, taking into account the information provided by the standardization experts working in them. The study considers the responses of the coordinators of these digital collections using a set of analytic criteria set out in the study. The handling of authorities in the group of university repositories studied may be described as uneven. Greater interest may be observed in controlling author entries, with laxer solutions for authority control of subjects. The study suggests the need to establish effective policies for the management of authorities by means of cooperative efforts that permit the building up of corpora of authority entries that would aid the processes of cataloguing, metadata creation, and information retrieval in systems based on syntactic and semantic interoperability in which manual intervention should be minimal.
Conference Paper
Subjects are grouped in a classification system according to their established natural relationship. Subject-based searching (SS) in institutional repositories (IR) searches objects on the basis of subjects. The existing SS technique in the IR system DSpace only partially exploits the semantics inherent in subject-based classification and misses objects that are relevant to a user query. An ontology describes the semantic relationships that exist among the different subjects of a classification. Our proposed system employs an ontology for SS in order to exploit the semantic relationships that exist among subjects. We determine the family of a user-given subject, assign different weights to semantic relationships and rank the retrieved objects according to these weights. The proposed system was compared with the existing DSpace SS system. The results show that the system returns more relevant objects than the existing SS system.
Conference Paper
This poster presents a programmatic interface (SKOS API) and plugin for Protege 4 for editing and working with the Simple Knowledge Organisation System (SKOS). The SKOS API has been designed to work with SKOS models at a high level of abstraction to aid developers of applications that use SKOS. We discuss SKOSEd, a tool for authoring and editing SKOS artefacts. A key aspect to the design of the API and editor is how SKOS relates to OWL and what existing OWL infrastructure can be exploited to work with SKOS.
Conference Paper
Building and maintaining thesauri are complex and laborious tasks. PoolParty is a Thesaurus Management Tool (TMT) for the Semantic Web, which aims to support the creation and maintenance of thesauri by utilizing Linked Open Data (LOD), text analysis and easy-to-use GUIs, so that thesauri can be managed and utilized by domain experts without needing knowledge about the semantic web. Some aspects of thesaurus management, like the editing of labels, can be done via a wiki-style interface, allowing for the lowest possible access barriers to contribution. PoolParty can analyse documents in order to glean new concepts for a thesaurus. Additionally, a thesaurus can be enriched by retrieving relevant information from Linked Data sources; thesauri can be imported and updated via LOD URIs from external systems and can also be published as new linked data sources on the semantic web.
Conference Paper
Thesauri can be useful resources for indexing and retrieval on the Semantic Web, but often they are not published in RDF/OWL. To convert thesauri to RDF for use in Semantic Web applications and to ensure the quality and utility of the conversion, a structured method is required. Moreover, if different thesauri are to be interoperable without complicated mappings, a standard schema for thesauri is required. This paper presents a method for conversion of thesauri to the SKOS RDF/OWL schema, which is a proposal for such a standard under development by W3C's Semantic Web Best Practices Working Group. We apply the method to three thesauri: IPSV, GTAA and MeSH. With these case studies we evaluate our method and the applicability of SKOS for representing thesauri.
Article
The Web Ontology Language OWL is a semantic markup language for publishing and sharing ontologies on the World Wide Web. OWL is developed as a vocabulary extension of RDF (the Resource Description Framework) and is derived from the DAML+OIL Web Ontology Language. This document contains a structured informal description of the full set of OWL language constructs and is meant to serve as a reference for OWL users who want to construct OWL ontologies.
Article
A technique for converting Library of Congress Subject Headings MARCXML to Simple Knowledge Organization System (SKOS) RDF is described. Strengths of the SKOS vocabulary are highlighted, as well as possible points for extension, and the integration of other semantic web vocabularies such as Dublin Core. An application for making the vocabulary available as linked-data on the Web is also described.