Chapter
PDF available

OSCAR: A Customisable Tool for Free-Text Search over SPARQL Endpoints: 3rd International Workshop, SAVE-SD 2017, Perth, Australia, April 3, 2017, and 4th International Workshop, SAVE-SD 2018, Lyon, France, April 24, 2018, Revised Selected Papers


Abstract

SPARQL is a very powerful query language for RDF data, which can be used to retrieve data following specific patterns. In order to foster the availability of scholarly data on the Web, several projects and institutions make Web interfaces to SPARQL endpoints available, so as to enable users to search, using SPARQL, for information in the RDF datasets they expose. However, SPARQL is quite complex to learn, and usually it is fully accessible only to experts in Semantic Web technologies, remaining completely obscure to ordinary Web users. In this paper we introduce OSCAR, the OpenCitations RDF Search Application, which is a user-friendly search platform that can be used to search any RDF triplestore providing a SPARQL endpoint, while hiding the complexities of SPARQL. We present its main features and demonstrate how it can be adapted to work with different SPARQL endpoints containing scholarly data, namely those provided by OpenCitations, ScholarlyData and Wikidata. We conclude by discussing the results of a user testing session that reveals the usability of the OSCAR search interface when employed to access information within the OpenCitations Corpus.
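The core mechanism the abstract describes, wrapping a user's free-text string into a SPARQL query and sending it to an endpoint, can be sketched in a few lines. The snippet below is a minimal illustration, not OSCAR's actual code: the endpoint URL and the use of dcterms:title as the searched property are assumptions chosen for the example.

```python
# A minimal sketch of the pattern OSCAR hides from the user: a plain-text
# string is wrapped into a SPARQL query and POSTed to an endpoint.
# The endpoint URL and the dcterms:title property are illustrative
# assumptions; adapt both to the dataset being searched.
import requests

ENDPOINT = "https://opencitations.net/sparql"  # assumed public endpoint

def free_text_search(text, limit=10):
    """Wrap a plain-text string into a title-matching SPARQL query."""
    query = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?resource ?title WHERE {
      ?resource dcterms:title ?title .
      FILTER (CONTAINS(LCASE(?title), LCASE("%s")))
    } LIMIT %d
    """ % (text.replace('"', '\\"'), limit)
    response = requests.post(
        ENDPOINT,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    response.raise_for_status()
    # SPARQL 1.1 JSON results: rows live under results -> bindings
    return [
        (row["resource"]["value"], row["title"]["value"])
        for row in response.json()["results"]["bindings"]
    ]
```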
... In order to make such SPARQL endpoints usable by a broader audience, without obliging such users to become experts in Semantic Web technology, we have developed OSCAR, the OpenCitations RDF Search Application, previously described at the SAVE-SD 2018 Workshop (co-located with The Web Conference 2018) [6]. OSCAR is a user-friendly search platform that can be used with any RDF triplestore providing a SPARQL endpoint, and which is entirely built without the need for integration of external application components. ...
... Currently, OpenCitations provides two different datasets, i.e. the OCC (the OpenCitations Corpus) [14] and COCI (the OpenCitations Index of Crossref open DOI-to-DOI citations). OSCAR is deployed inside the OpenCitations website so as to enable searches on these datasets, thus permitting ordinary Web users to compose and obtain responses to simple textual queries. The original version of OSCAR (Version 1.0), described in [6], was able to accept free-text queries that were analysed so as to understand the user's intent, and then executed in the background by employing the appropriate SPARQL query. Since then, we have developed new features in response to users' needs and the outcomes of the usability studies described in [6]. ...
... This paper reports these new features, made possible by additions to the OSCAR architecture. ...
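The "analyse the free-text query, then run the appropriate SPARQL query" workflow mentioned in these excerpts can be pictured as a rule table that maps recognisable input shapes to query templates. The rules and the ex: property IRI below are invented for illustration and do not reproduce OSCAR's actual configuration.

```python
# Illustrative sketch of the "analyse, then pick a SPARQL template" workflow;
# the first matching rule wins. ex: is a placeholder namespace.
import re

RULES = [
    # A DOI-shaped input triggers an identifier lookup ...
    (re.compile(r"^10\.\d{4,9}/\S+$"),
     'PREFIX ex: <http://example.org/> '
     'SELECT ?r WHERE { ?r ex:doi "%s" }'),
    # ... anything else falls back to a free-text title search.
    (re.compile(r".+"),
     'PREFIX dcterms: <http://purl.org/dc/terms/> '
     'SELECT ?r WHERE { ?r dcterms:title ?t . '
     'FILTER(CONTAINS(LCASE(?t), LCASE("%s"))) }'),
]

def build_query(user_input):
    """Return the SPARQL query produced by the first matching rule."""
    for pattern, template in RULES:
        if pattern.match(user_input):
            return template % user_input.replace('"', '\\"')

print(build_query("10.1038/nature12373"))  # identifier lookup
print(build_query("citations"))            # title search
```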
Article
Full-text available
In this paper we introduce the latest version (Version 2.0) of OSCAR, the OpenCitations RDF Search Application, which has several improved features and extends the query workflow compared with the previous version (Version 1.0) that we presented at the workshop entitled Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination (SAVE-SD 2018), held in conjunction with The Web Conference 2018. OSCAR is a user-friendly search platform that can be used to search any RDF triplestore providing a SPARQL endpoint, while hiding the complexities of SPARQL, thus making the search operations accessible to those who are not experts in Semantic Web technologies. We present here the basic features and the main extensions of this latest version of OSCAR. In addition, we demonstrate how it can be adapted to work with different SPARQL endpoints containing scholarly data, using as examples the OpenCitations Corpus (OCC) and the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI), both provided by OpenCitations, and also the Wikidata dataset provided by the Wikimedia Foundation. We conclude by reporting the usage statistics of OSCAR, retrieved from the OpenCitations website logs, so as to demonstrate its uptake.
... These two interfaces have been developed by means of OSCAR, the OpenCitations RDF Search Application (https://github.com/opencitations/oscar) [11], and LUCINDA, the OpenCitations RDF Resource Browser (https://github.com/opencitations/lucinda), which provide a configurable layer over SPARQL endpoints that permits one to easily create Web interfaces for querying and visualising the results of SPARQL queries. ...
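How such a "configurable layer" might be driven can be suggested with a small declarative description: the interface code stays generic, and a per-dataset configuration supplies the endpoint and the query templates. The keys and values below are invented for illustration; they are not OSCAR's or LUCINDA's actual configuration format.

```python
# Hypothetical configuration: one entry per search category, each carrying a
# SPARQL template with a [[TEXT]] slot for the user's input.
SEARCH_CONFIG = {
    "sparql_endpoint": "https://opencitations.net/sparql",  # assumed URL
    "categories": [
        {
            "name": "document",
            "query_template": (
                "PREFIX dcterms: <http://purl.org/dc/terms/> "
                "SELECT ?iri ?title WHERE { "
                "?iri dcterms:title ?title . "
                'FILTER(CONTAINS(LCASE(?title), LCASE("[[TEXT]]"))) }'
            ),
            "result_fields": ["iri", "title"],
        },
    ],
}

def instantiate(config, category_name, text):
    """Fill the named category's template with the user's text."""
    for category in config["categories"]:
        if category["name"] == category_name:
            return category["query_template"].replace("[[TEXT]]", text)
```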
Preprint
Full-text available
In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci). COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived from the data available in Crossref. These citations are described in RDF by means of the new extended version of the OpenCitations Data Model (OCDM). We introduce the workflow we have developed for creating these data, and also show the additional services that facilitate access to and querying of these data via different access points: a SPARQL endpoint, a REST API, bulk downloads, Web interfaces, and direct access to the citations via HTTP content negotiation. Finally, we present statistics regarding the use of COCI citation data, and we introduce several projects that have already started to use COCI data for different purposes.
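Of the access points listed above, the REST API is the most direct to demonstrate. The sketch below assumes the URL pattern of the COCI API (api/v1/references/{doi}) as documented on the OpenCitations website; treat the exact path and the response fields as assumptions to be checked against the current documentation.

```python
# Outgoing references of a DOI, via the COCI REST API (assumed URL pattern).
import requests

doi = "10.1038/nature12373"  # any Crossref DOI indexed in COCI
response = requests.get(
    "https://opencitations.net/index/coci/api/v1/references/" + doi)
response.raise_for_status()
references = response.json()  # a JSON list, one object per citation link
print(len(references), "references")
if references:
    print("fields:", sorted(references[0]))
```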
Chapter
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in the data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or application context. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community.
Chapter
In this paper we present a Universal API intended to facilitate the access to and reuse of Linked Open Data (LOD). Nowadays, it is difficult to explore heterogeneous data with structured query languages, especially for end users and developers unfamiliar with SPARQL and RDF. Our solution provides universal access to the LOD scenario through a common interface, which automatically generates SPARQL queries to access data from any dataset available online. Moreover, the results returned by this Universal API are restructured and parsed into well-known formats that most developers understand easily, such as JSON or CSV. To make the proposed Web API easy to use, a Web interface guides users towards the desired data, providing appropriate documentation to facilitate the search for relevant information. The main innovation of this approach is offering programmatic access to Linked Open Data through the automatic building of SPARQL queries, without requiring any prior knowledge of the data or of the Semantic Web environment.
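Because SPARQL endpoints already return results in the standard SPARQL 1.1 JSON format, the JSON-to-CSV reshaping the abstract mentions is compact. The following is a generic sketch of that conversion, not the paper's implementation.

```python
# Flatten a SPARQL 1.1 JSON results document into CSV text.
import csv
import io

def sparql_json_to_csv(result):
    variables = result["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(variables)  # header row from the projected variables
    for binding in result["results"]["bindings"]:
        # A variable may be unbound in a row; emit an empty cell for it.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()

example = {
    "head": {"vars": ["title", "year"]},
    "results": {"bindings": [
        {"title": {"type": "literal", "value": "OSCAR"},
         "year": {"type": "literal", "value": "2018"}},
    ]},
}
print(sparql_json_to_csv(example))
```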
Conference Paper
Full-text available
Existing citation indexes of scientific literature, like Scopus, WOS, Google Scholar and OpenAire, collect metadata of papers and authors. By using appropriate queries, it is possible to access these metadata, find specific papers and all associated details, as well as retrieve indices, metrics and information about authors. However, a significant limitation is that they generally do not aggregate the results of subsequent searches and do not offer explicit representations of author/citation relationships between found items. This paper introduces VisualBib, a Web application prototype conceived to support researchers who wish to create, modify, visualize and share bibliographies. Starting with a small set of papers or with a restricted number of authors, it generates, in real time, an interactive visual representation of the corresponding bibliography; the user can explore the network of cited/citing references and dynamically add new papers in order to build up customized bibliographies, which are represented using holistic, aggregated and graphical views.
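The cited/citing network that VisualBib renders is, at bottom, a directed graph over bibliographic records. The following is a hedged sketch of such a structure, illustrative only and not VisualBib's code.

```python
# Papers as nodes, citations as directed edges (citing -> cited).
from collections import defaultdict

class Bibliography:
    def __init__(self):
        self.papers = {}               # paper id -> metadata dict
        self.cites = defaultdict(set)  # paper id -> set of ids it cites

    def add_paper(self, paper_id, title, year):
        self.papers[paper_id] = {"title": title, "year": year}

    def add_citation(self, citing_id, cited_id):
        self.cites[citing_id].add(cited_id)

    def cited_by(self, paper_id):
        """Reverse lookup: papers in this bibliography that cite paper_id."""
        return {p for p, targets in self.cites.items() if paper_id in targets}
```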
Article
Full-text available
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8x faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
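To make the labeling-function idea above concrete, here is a toy example: several noisy heuristics vote on each input and are combined without ground truth. Snorkel's real combiner is a generative model that estimates each function's accuracy; the simple majority vote below is only a stand-in for illustration, and the two heuristics are invented.

```python
# Toy labeling functions for a made-up relevance task, combined by majority vote.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_mentions_sparql(text):
    return POS if "sparql" in text.lower() else ABSTAIN

def lf_very_short(text):
    return NEG if len(text) < 20 else ABSTAIN

def majority_vote(text, lfs):
    """Combine the non-abstaining votes of all labeling functions."""
    votes = [label for lf in lfs if (label := lf(text)) != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

print(majority_vote("OSCAR hides SPARQL from end users",
                    [lf_mentions_sparql, lf_very_short]))  # -> 1 (POS)
```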
Article
Full-text available
There are times when user experience practitioners might consider using the System Usability Scale (SUS), but there is an item that just doesn't work in their context of measurement. For example, the first item is "I think I would like to use this system frequently." If the system under study is one that would only be used infrequently, then there is a concern that including this item would distort the scores, or at best, distract the participant. The results of the current research show that the mean scores of all 10 possible nine-item variants of the SUS are within one point (out of a hundred) of the mean of the standard SUS. Thus, practitioners can leave out any one of the SUS items without having a practically significant effect on the resulting scores, as long as an appropriate adjustment is made to the multiplier (specifically, multiply the sum of the adjusted item scores by 100/36 instead of the standard 100/40, or 2.5, to compensate for the dropped item).
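The adjustment described above is easy to verify numerically. Standard SUS scoring gives odd items a contribution of (response - 1) and even items (5 - response); the sum over ten items (maximum 40) is multiplied by 2.5, and over nine items (maximum 36) by 100/36. The check below is a direct transcription of that rule.

```python
def sus_score(responses, dropped_item=None):
    """responses maps item number (1-10) to a 1-5 rating."""
    contributions = [
        (rating - 1) if item % 2 == 1 else (5 - rating)
        for item, rating in responses.items()
        if item != dropped_item
    ]
    max_sum = 4 * len(contributions)  # 40 for ten items, 36 for nine
    return sum(contributions) * 100 / max_sum

ratings = {i: 4 if i % 2 == 1 else 2 for i in range(1, 11)}
print(sus_score(ratings))                  # ten-item score: 75.0
print(sus_score(ratings, dropped_item=1))  # nine-item variant: also 75.0
```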
Conference Paper
Full-text available
Scholia is a tool to handle scientific bibliographic information through Wikidata. The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. To collect the data, it queries the SPARQL-based Wikidata Query Service. Among several display formats available in Scholia are lists of publications for individual researchers and organizations, plots of publications per year, employment timelines, as well as co-author and topic networks and citation graphs. The Python package implementing the Web service is also able to format Wikidata bibliographic entries for use in LaTeX/BIBTeX. Apart from detailing Scholia, we describe how Wikidata has been used for bibliographic information and we also provide some scientometric statistics on this information.
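A minimal example of the kind of request Scholia issues against the Wikidata Query Service is shown below: retrieving works whose author (Wikidata property P50) is a given item. The function and the User-Agent string are illustrative; the author QID passed in is a placeholder to substitute with any researcher's Wikidata identifier.

```python
# Query the Wikidata Query Service for works authored by a given item.
import requests

def works_of_author(author_qid, limit=5):
    """List work labels for a Wikidata author item, e.g. 'Q937'."""
    query = """
    SELECT ?work ?workLabel WHERE {
      ?work wdt:P50 wd:%s .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    } LIMIT %d
    """ % (author_qid, limit)
    r = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "scholia-example/0.1 (illustrative)"},
    )
    r.raise_for_status()
    return [b["workLabel"]["value"] for b in r.json()["results"]["bindings"]]
```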