Figure 6 - uploaded by Marta Villegas
SPARQL query example 

Source publication
Conference Paper
The proliferation of different metadata schemas and models poses serious interoperability problems. Maintaining isolated repositories with overlapping data is costly in terms of time and effort. In this paper, we describe how we have achieved a Linked Open Data version of metadata descriptions coming from heterogeneous sources, originally encoded...

Context in source publication

Context 1
... with the number of triples involved in each case. Finally, although the MS schema defines ‘language’ information (both names and codes) as simple xs:string type elements, we used the FAO ontology for language codes (ISO 639-3). Currently, our dataset includes 298 triples with language code information, using 19 different codes.

Besides using existing vocabularies, we defined links between our instances and external instances whenever possible. We used the sameAs and seeAlso properties to point to the relevant records in the DBLP, the Virtual International Authority File (VIAF) and the LOC Authorities Names. Table 11 lists the correspondences between agents and documents with external data.

Data categories are by far the most prolific elements in the MS schema. As we have already mentioned, the schema includes up to 807 different enumerations that were translated into instances of relevant classes in the final model (see the example in Figure 5 above). We undertook the task of linking these instances to relevant ISOcat data categories and to DBpedia whenever possible. In Table 12 we list the data categories used: the first column indicates the class; the second column shows the number of instances for each class; and the third and fourth columns show the number of instances linked to ISOcat and DBpedia respectively (we defined a ms:dcr property to link to ISOcat and used the sameAs property with DBpedia).

As explained in the first section of this paper, we come from a scenario where corpora and lexicons are described in the META-SHARE node and stored in our institutional e-Repository, whereas services are registered in the BioCatalogue. In BioCatalogue, a service is defined as “a container for ServiceDeployments and variants”, where a variant is either a SoapService or RestService, and a ServiceDeployment is a particular running instance of the service. Services also include information about testing. On the other hand, in the MS model, a service is defined as a “form in which NLP tasks are realized and delivered corresponding tools”. Services (and tools) are resources which are distinguished from other resources in their resourceComponentType. Because of these disparate descriptions, it is not possible to map the information in the input SoapServices descriptions directly into the target MS toolService component. This implies that we need to merge the MS and BioCatalogue ontologies. The solution is to map the variants of a given BioCatalogue service as different MS resources, i.e. as subtypes of the MS Service. Table 13 shows the resulting merged model.

LOD is the perfect place to merge and combine data. As explained in the preceding section, by reusing data from external vocabularies we were able to re-define some MS components such as Person and Organization in a much more standard way (via FOAF). Similarly, for Documents, the BIBO ontology allowed us to merge our bibliographical database into the dataset. In the original MS schema, documents could only be attached to resources, either as the ‘documentation’ of the whole resource or as ‘reports’ relevant for its validation, usage, annotation and evaluation aspects. In the converted model, documents include articles, manuals, reports, videos, etc. and constitute an important element of the dataset. We assume that everything can be documented either directly (by means of the ms:documentation or ms:validationReport properties, among others) or indirectly (by means of the dc:subject or dcterms:references properties). Thus, any class and instance in the dataset may be linked to some relevant document. For example, the class SOAPservice may be referenced by some video or article. Similarly, a Task instance such as namedEntityRecognition or a Standard instance such as LMF may also be documented. We use the dc:subject and dcterms:references properties to link documents to relevant parts of the dataset.

Integrating disparate, yet related, datasets into a single repository has obvious benefits, especially for the maintainability, interoperability and integrity of the data. All of this has positive consequences for both cost and functionality. We also get an additional advantage from the properties of the RDF/OWL/SPARQL framework, which makes data exploitation simple and efficient. For example, if our user wants to know about Named Entity Recognition (NER), we can get all relevant data with a very simple query (Figure 6). With this simple query we easily retrieve ‘everything that has to do with NER’. In this case we get Articles, Reports and Projects dealing with NER as well as Services performing such a task. In addition, the seeAlso property suggests that we also check namedEntity. All data is stored at the SPARQL endpoint, and we also run a data browser.

The benefits of our exercise can be summarized as follows. On the one hand, the final RDF/OWL model is much simpler than the original XSD schema: it clearly differentiates between Classes and Properties and avoids problems typical of XML syntax, such as semantic ambiguity and order constraints. This is essential for mapping purposes.
On the other hand, the open world assumption of RDF/OWL allows us to integrate objects from different schemas naturally and to add further extensions, which makes merging different models straightforward. We were effectively able to merge the Service and Document ontologies into the MS model in a natural way. Moreover, linking data has an additional benefit: the resulting model includes not only links between the merged catalogues but also links to external data. Last but not least, the RDF/OWL version of the registry (loaded in a SPARQL endpoint) not only provides a single unified repository but also facilitates the exploitation of the metadata records by the end user.

The success of our experiment encourages us to apply it to a larger scenario, such as the CLARIN Component Registry. The idea behind the CMDI approach was that identifying component blocks in the original schemas would improve interoperability among models. However, the proliferation of components in the Component Registry eventually becomes a critical problem. Conversion to RDF/OWL paves the way for more ambitious goals, such as deriving a general ontology that accounts for the different underlying schemas.

The work reported has been co-funded by the "Fons europeu de desenvolupament regional (FEDER), Programa operatiu FEDER de Catalunya 2007-2013, Objective 1".
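
The NER lookup described in the context above can be illustrated with a minimal Python sketch. It is not the query shown in Figure 6, which is not reproduced on this page. The triples, the `ex:` instances, the `ms:task` property and the shape of the generated SPARQL string are all assumptions made for illustration; only the idea of retrieving every resource that points at namedEntityRecognition (for example via dc:subject or dcterms:references) comes from the text.

```python
# Toy illustration only: every triple, prefix and instance name below is a
# hypothetical stand-in, not a record from the actual dataset.
NER = "ms:namedEntityRecognition"

TRIPLES = [
    ("ex:article1", "rdf:type", "bibo:Article"),
    ("ex:article1", "dc:subject", NER),
    ("ex:report1", "rdf:type", "bibo:Report"),
    ("ex:report1", "dcterms:references", NER),
    ("ex:service1", "rdf:type", "ms:SoapService"),
    ("ex:service1", "ms:task", NER),          # ms:task is an assumed property
    ("ex:corpus1", "rdf:type", "ms:Corpus"),
    ("ex:corpus1", "dc:subject", "ms:parsing"),
    (NER, "rdfs:seeAlso", "ms:namedEntity"),
]


def build_query(target):
    """Return a SPARQL string selecting every subject linked to `target`.

    PREFIX declarations are omitted; a real endpoint would need them.
    """
    return (
        "SELECT DISTINCT ?thing ?property ?type WHERE {\n"
        f"  ?thing ?property {target} .\n"
        "  OPTIONAL { ?thing a ?type }\n"
        "}"
    )


def related_to(triples, target):
    """Mimic the query on an in-memory list: who points at `target`, and how?"""
    types = {s: o for s, p, o in triples if p == "rdf:type"}
    return sorted((s, p, types.get(s, "")) for s, p, o in triples if o == target)


if __name__ == "__main__":
    print(build_query(NER))
    for subject, prop, rdf_type in related_to(TRIPLES, NER):
        print(subject, prop, rdf_type)
```

Because the pattern leaves the property unbound, a single query returns articles, reports and services alike, which is the point the authors make about retrieving ‘everything that has to do with NER’.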

Similar publications

Conference Paper
A crucial issue in the development of smart sustainable cities and territories is the use of standards and approaches to ensure interoperability, so that equipment and systems produced by different vendors work together seamlessly, and to reduce costs through economies of scale. In this work, we investigate the core challenges faced when consumin...
Conference Paper
The convergence of Libraries, Archives and Museums (LAM) has been a topic of much discussion in the Digital Library (DL) research field, but their similarities and common points are not yet fully exploited in existing formal models for DL such as the Streams, Structures, Spaces, Scenarios, Societies (5S) model...
Article
This article presents a work-in-progress version of a Dublin Core Application Profile (DCAP) developed to serve the Social and Solidarity Economy (SSE). Studies revealed that this community is interested in implementing both internal interoperability between their Web platforms to build a global SSE e-marketplace, and external interoperability amo...

Citations

... The development of standards can be affected by whether an OWA or CWA paradigm is adopted by the standard designers. An OWA standard semantic framework allows for modularity between different conceptualisations, where an ecosystem of different semantic artefacts can develop (Villegas et al., 2014; Chah, 2018). This system allows anybody to define or iterate on an ontology or its sub-elements. ...
Article
With the rapid evolution of the wind energy sector, there is an ever-increasing need to create value from the vast amounts of data made available both from within the domain and from other sectors. This article addresses the challenges faced by wind energy domain experts in converting data into domain knowledge, connecting and integrating them with other sources of knowledge, and making them available for use in next-generation artificial intelligence systems. To this end, this article highlights the role that knowledge engineering can play in the digital transformation of the wind energy sector. It presents the main concepts underpinning knowledge-based systems and summarises previous work in the areas of knowledge engineering and knowledge representation in a manner that is relevant and accessible to wind energy domain experts. A systematic analysis of the current state of the art on knowledge engineering in the wind energy domain is performed with available tools put into perspective by establishing the main domain actors and their needs, as well as identifying key problematic areas. Finally, recommendations for further development and improvement are provided.
... Vocabulary Of Interlinked Datasets (voiD) [3], Vocabulary of a Friend (VOAF), Data Catalog Vocabulary (DCAT) and DataID [9], and for more model- and domain-specific cataloguing, e.g. the Semantic Web Applications in Neuromedicine Ontology (SWAN) by the Semantic Web Health Care and Life Sciences (HCLS) Interest Group; the Linguistic Metadata (LIME) [28] for OntoLex, the Meta-Share.owl ontology [57], a linked open data version of the XML-based META-SHARE [29]. ...
Article
The need for reusable, interoperable, and interlinked linguistic resources in Natural Language Processing downstream tasks is shown by the increasing efforts to develop standards and metadata suitable for representing several layers of information. Nevertheless, despite these efforts, full compatibility of metadata in linguistic resource production is still far from being reached. Access to resources observing these standards is hindered by (i) missing or incomplete information, (ii) inconsistent ways of coding their metadata, or (iii) lack of maintenance. In this paper, we offer a quantitative and qualitative analysis of descriptive metadata and resource availability in two main metadata repositories: LOD Cloud and Annohub. Furthermore, we introduce a metadata enrichment, which aims at improving resource information, and a metadata alignment to the META-SHARE ontology, suitable for easing the accessibility and interoperability of such resources.
... In this paper we contribute to the interoperability of these repositories by developing an ontology in the Web Ontology Language (OWL) [18] that allows us to represent the metadata schemes of these repositories under an extensible, open-world model. The proposed ontology is based on the ontology developed by Villegas et al. [23] for the University Pompeu Fabra's (UPF) META-SHARE node (covering part of the original schema), which is extended to the complete schema (in order to cover all relevant LRs) and incorporates the consensus reached in the context of the W3C Linked Data for Language Technologies (LD4LT) Community Group. We show how this model interacts with the DCAT [16] vocabulary as well as the most prominent models in the CLARIN VLO data. ...
... We observed that the straightforward application of such a principle may derive unnecessarily verbose graphs. Thus, following Villegas et al. [23], we identified potentially removable nodes before undertaking the actual RDFication process. Embedded complex elements with cardinality of exactly one are identified as potentially removable, provided they contain neither text nor attributes. ...
... The IULA-UPF CLARIN Competence Centre aims to promote and support the use of technology and text analysis tools in the Humanities and Social Sciences research. The centre includes a Catalogue with information on language [23] and the original data generated from the UPF META-SHARE node. The source XML records were converted into RDF and augmented with service descriptions (not included in the UPF META-SHARE node) and relevant documentation (appropriate articles, documentation, sample data and results, illustrative experiments, examples from outstanding projects, illustrative use cases, etc.) to encourage potential users to embrace digital tools. ...
... In the framework of the LD4LT group, the META-SHARE model has been the base for the development of an ontology in OWL; the MS/OWL ontology has been based on the ontology developed by Villegas et al. (Villegas et al., 2014) (covering part of the original schema) and extended to the complete schema (in order to cover all relevant LRs) (McCrae et al., 2015). The transformation from the XSD schema to the OWL ontology involved the transformation of components to classes and that of elements to properties. ...
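
The pruning rule quoted in the excerpts above (an embedded complex element with cardinality of exactly one is potentially removable if it carries neither text nor attributes) can be sketched in Python as follows. This is only an illustration under stated assumptions: the `SchemaElement` class, its fields and the sample element names are hypothetical, and neither cited paper is shown here to implement the rule this way.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SchemaElement:
    """Hypothetical, simplified view of one element declared in a schema."""
    name: str
    min_occurs: int = 1
    max_occurs: Optional[int] = 1      # None stands for "unbounded"
    has_text: bool = False
    attributes: List[str] = field(default_factory=list)
    children: List["SchemaElement"] = field(default_factory=list)


def is_removable(el: SchemaElement) -> bool:
    """Apply the rule: complex, cardinality exactly one, no text, no attributes."""
    is_complex = bool(el.children)
    exactly_one = el.min_occurs == 1 and el.max_occurs == 1
    return is_complex and exactly_one and not el.has_text and not el.attributes


def removable_nodes(root: SchemaElement) -> List[str]:
    """Collect names of embedded (non-root) elements that satisfy the rule."""
    found = []
    for child in root.children:
        if is_removable(child):
            found.append(child.name)
        found.extend(removable_nodes(child))
    return found


# Hypothetical sample schema, used only to exercise the rule.
schema = SchemaElement("resource", children=[
    SchemaElement("identification", children=[
        SchemaElement("name", has_text=True),
    ]),
    SchemaElement("contact", min_occurs=0, max_occurs=None, children=[
        SchemaElement("email", has_text=True),
    ]),
    SchemaElement("size", attributes=["unit"], children=[
        SchemaElement("value", has_text=True),
    ]),
])

if __name__ == "__main__":
    print(removable_nodes(schema))
```

In this sample only `identification` qualifies: `contact` is optional and repeatable, `size` carries an attribute, and the leaf elements are not complex.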
Article
This scientific review paper aims at challenging a common point of view on metadata as a necessary evil and something mandatory to the data creating and dataset publishing process. Metadata are instead presented as a crucial element to ensure the findability of data services and repositories. This paper describes a way through four levels of metadata management and publication, from default unstructured data, through schema-based metadata with literal values and/or URIs, towards linked open (meta)data providing explicit linkage between reliable data resources. Such research was conducted within the European Union's project PoliVisu. Special attention is given to the following: (1) guidance on publication aimed at the broad audience of search engine users and (2) the publication of geo (meta)data not only via standard technologies, such as the OGC Catalogue Service for Web and open data portals, but also through leading search engines (that are Schema.org-based).
Conference Paper
META-SHARE is an infrastructure for sharing Language Resources (LRs) where significant effort has been put into providing carefully curated metadata about LRs. However, in the face of the flood of data that is used in computational linguistics, a manual approach cannot suffice. We present the development of the META-SHARE ontology, which transforms the metadata schema used by META-SHARE into an ontology in the Web Ontology Language (OWL) that can better handle the diversity of metadata found in legacy and crowd-sourced resources. We show how this model can interface with other more general purpose vocabularies for online datasets and licensing, and apply this model to the CLARIN VLO, a large source of legacy metadata about LRs. Furthermore, we demonstrate the usefulness of this approach in two public metadata portals for information about language resources.
Article
Researchers in the Digital Humanities have difficulty accessing and using software tools that assist them in exploiting the texts they study. This is because, in most specialised portals and registries, information about these tools is not linked to information about where to find them and how to use them. The IULA-UPF CLARIN Competence Centre compiles and interlinks the necessary information in a linked data catalogue to offer researchers an integrated way of accessing all the information. This article presents the details of the design and of the selection of materials and description instruments used in building the catalogue.