Chapter
PDF available

OSCAR: A Customisable Tool for Free-Text Search over SPARQL Endpoints: 3rd International Workshop, SAVE-SD 2017, Perth, Australia, April 3, 2017, and 4th International Workshop, SAVE-SD 2018, Lyon, France, April 24, 2018, Revised Selected Papers


Abstract

SPARQL is a very powerful query language for RDF data, which can be used to retrieve data following specific patterns. In order to foster the availability of scholarly data on the Web, several projects and institutions make Web interfaces to SPARQL endpoints available, so as to enable users to search, using SPARQL, for information in the RDF datasets they expose. However, SPARQL is quite complex to learn, and usually it is fully accessible only to experts in Semantic Web technologies, remaining completely obscure to ordinary Web users. In this paper we introduce OSCAR, the OpenCitations RDF Search Application, which is a user-friendly search platform that can be used to search any RDF triplestore providing a SPARQL endpoint, while hiding the complexities of SPARQL. We present its main features and demonstrate how it can be adapted to work with different SPARQL endpoints containing scholarly data, namely those provided by OpenCitations, ScholarlyData and Wikidata. We conclude by discussing the results of a user testing session that reveals the usability of the OSCAR search interface when employed to access information within the OpenCitations Corpus.
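The core mechanism the abstract describes, wrapping a user's free-text string into a SPARQL query and sending it to an endpoint, can be sketched in a few lines. The snippet below is a minimal illustration, not OSCAR's actual code: the endpoint URL and the use of dcterms:title as the searched property are assumptions chosen for the example.

```python
# A minimal sketch of the pattern OSCAR hides from the user: a plain-text
# string is wrapped into a SPARQL query and POSTed to an endpoint.
# The endpoint URL and the dcterms:title property are illustrative
# assumptions; adapt both to the dataset being searched.
import requests

ENDPOINT = "https://opencitations.net/sparql"  # assumed public endpoint

def free_text_search(text, limit=10):
    """Wrap a plain-text string into a title-matching SPARQL query."""
    query = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    SELECT ?resource ?title WHERE {
      ?resource dcterms:title ?title .
      FILTER (CONTAINS(LCASE(?title), LCASE("%s")))
    } LIMIT %d
    """ % (text.replace('"', '\\"'), limit)
    response = requests.post(
        ENDPOINT,
        data={"query": query},
        headers={"Accept": "application/sparql-results+json"},
    )
    response.raise_for_status()
    # SPARQL 1.1 JSON results: rows live under results -> bindings
    return [
        (row["resource"]["value"], row["title"]["value"])
        for row in response.json()["results"]["bindings"]
    ]
```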
... In order to make such SPARQL endpoints usable by a broader audience, without obliging such users to become experts in Semantic Web technology, we have developed OSCAR, the OpenCitations RDF Search Application, previously described at the SAVE-SD 2018 Workshop (co-located with The Web Conference 2018) [6]. OSCAR is a user-friendly search platform that can be used with any RDF triplestore providing a SPARQL endpoint, and which is entirely built without the need for integration of external application components. ...
... Currently, OpenCitations provides two different datasets, i.e. the OCC (the OpenCitations Corpus) [14] and COCI (the OpenCitations Index of Crossref open DOI-to-DOI citations). OSCAR is deployed inside the OpenCitations website so as to enable searches on these datasets, thus permitting ordinary Web users to compose and obtain responses to simple textual queries. The original version of OSCAR (Version 1.0), described in [6], was able to accept free-text queries that were analysed so as to understand the user's intent, and then executed in the background by employing the appropriate SPARQL query. Since then, we have developed new features in response to users' needs and the outcomes of the usability studies described in [6]. ...
... This paper reports these new features, made possible by additions to the OSCAR architecture. ...
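The "analyse the free-text query, then run the appropriate SPARQL query" workflow mentioned in these excerpts can be pictured as a rule table that maps recognisable input shapes to query templates. The rules and the ex: property IRI below are invented for illustration and do not reproduce OSCAR's actual configuration.

```python
# Illustrative sketch of the "analyse, then pick a SPARQL template" workflow;
# the first matching rule wins. ex: is a placeholder namespace.
import re

RULES = [
    # A DOI-shaped input triggers an identifier lookup ...
    (re.compile(r"^10\.\d{4,9}/\S+$"),
     'PREFIX ex: <http://example.org/> '
     'SELECT ?r WHERE { ?r ex:doi "%s" }'),
    # ... anything else falls back to a free-text title search.
    (re.compile(r".+"),
     'PREFIX dcterms: <http://purl.org/dc/terms/> '
     'SELECT ?r WHERE { ?r dcterms:title ?t . '
     'FILTER(CONTAINS(LCASE(?t), LCASE("%s"))) }'),
]

def build_query(user_input):
    """Return the SPARQL query produced by the first matching rule."""
    for pattern, template in RULES:
        if pattern.match(user_input):
            return template % user_input.replace('"', '\\"')

print(build_query("10.1038/nature12373"))  # identifier lookup
print(build_query("citations"))            # title search
```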
Article
Full-text available
In this paper we introduce the latest version (Version 2.0) of OSCAR, the OpenCitations RDF Search Application, which has several improved features and extends the query workflow compared with the previous version (Version 1.0) that we presented at the workshop entitled Semantics, Analytics, Visualisation: Enhancing Scholarly Dissemination (SAVE-SD 2018), held in conjunction with The Web Conference 2018. OSCAR is a user-friendly search platform that can be used to search any RDF triplestore providing a SPARQL endpoint, while hiding the complexities of SPARQL, thus making the search operations accessible to those who are not experts in Semantic Web technologies. We present here the basic features and the main extensions of this latest version of OSCAR. In addition, we demonstrate how it can be adapted to work with different SPARQL endpoints containing scholarly data, using as examples the OpenCitations Corpus (OCC) and the OpenCitations Index of Crossref open DOI-to-DOI citations (COCI), both provided by OpenCitations, and also the Wikidata dataset provided by the Wikimedia Foundation. We conclude by reporting the usage statistics of OSCAR, retrieved from the OpenCitations website logs, so as to demonstrate its uptake.
... These two interfaces have been developed by means of OSCAR, the OpenCitations RDF Search Application (https://github.com/opencitations/oscar) [11], and LUCINDA, the OpenCitations RDF Resource Browser (https://github.com/opencitations/lucinda), which provide a configurable layer over SPARQL endpoints that permits one to easily create Web interfaces for querying and visualising the results of SPARQL queries. ...
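How such a "configurable layer" might be driven can be suggested with a small declarative description: the interface code stays generic, and a per-dataset configuration supplies the endpoint and the query templates. The keys and values below are invented for illustration; they are not OSCAR's or LUCINDA's actual configuration format.

```python
# Hypothetical configuration: one entry per search category, each carrying a
# SPARQL template with a [[TEXT]] slot for the user's input.
SEARCH_CONFIG = {
    "sparql_endpoint": "https://opencitations.net/sparql",  # assumed URL
    "categories": [
        {
            "name": "document",
            "query_template": (
                "PREFIX dcterms: <http://purl.org/dc/terms/> "
                "SELECT ?iri ?title WHERE { "
                "?iri dcterms:title ?title . "
                'FILTER(CONTAINS(LCASE(?title), LCASE("[[TEXT]]"))) }'
            ),
            "result_fields": ["iri", "title"],
        },
    ],
}

def instantiate(config, category_name, text):
    """Fill the named category's template with the user's text."""
    for category in config["categories"]:
        if category["name"] == category_name:
            return category["query_template"].replace("[[TEXT]]", text)
```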
Preprint
Full-text available
In this paper, we present COCI, the OpenCitations Index of Crossref open DOI-to-DOI citations (http://opencitations.net/index/coci). COCI is the first open citation index created by OpenCitations, in which we have applied the concept of citations as first-class data entities, and it contains more than 445 million DOI-to-DOI citation links derived from the data available in Crossref. These citations are described in RDF by means of the new extended version of the OpenCitations Data Model (OCDM). We introduce the workflow we have developed for creating these data, and also show the additional services that facilitate access to and querying of these data via different access points: a SPARQL endpoint, a REST API, bulk downloads, Web interfaces, and direct access to the citations via HTTP content negotiation. Finally, we present statistics regarding the use of COCI citation data, and we introduce several projects that have already started to use COCI data for different purposes.
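Of the access points listed above, the REST API is the most direct to demonstrate. The sketch below assumes the URL pattern of the COCI API (api/v1/references/{doi}) as documented on the OpenCitations website; treat the exact path and the response fields as assumptions to be checked against the current documentation.

```python
# Outgoing references of a DOI, via the COCI REST API (assumed URL pattern).
import requests

doi = "10.1038/nature12373"  # any Crossref DOI indexed in COCI
response = requests.get(
    "https://opencitations.net/index/coci/api/v1/references/" + doi)
response.raise_for_status()
references = response.json()  # a JSON list, one object per citation link
print(len(references), "references")
if references:
    print("fields:", sorted(references[0]))
```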
Chapter
A variety of schemas and ontologies are currently used for the machine-readable description of bibliographic entities and citations. This diversity, and the reuse of the same ontology terms with different nuances, generates inconsistencies in the data. Adoption of a single data model would facilitate data integration tasks regardless of the data supplier or application context. In this paper we present the OpenCitations Data Model (OCDM), a generic data model for describing bibliographic entities and citations, developed using Semantic Web technologies. We also evaluate the effective reusability of OCDM according to ontology evaluation practices, mention existing users of OCDM, and discuss the use and impact of OCDM in the wider open science community.
Chapter
In this paper we present a Universal API intended to facilitate the access to and reuse of Linked Open Data (LOD). Nowadays, it is difficult to explore heterogeneous data with structured query languages, especially for end users and developers unfamiliar with SPARQL and RDF. Our solution provides universal access to the LOD scenario through a common interface, which automatically generates SPARQL queries to access data from any dataset available online. Moreover, the results returned by this Universal API are restructured and parsed into well-known formats that most developers understand easily, such as JSON or CSV. To make the proposed Web API easy to use, a Web interface guides users towards the desired data, providing appropriate documentation to facilitate the search for relevant information. The main innovation of this approach is offering programmatic access to Linked Open Data through the automatic building of SPARQL queries, without requiring any prior knowledge of the data or of the Semantic Web environment.
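Because SPARQL endpoints already return results in the standard SPARQL 1.1 JSON format, the JSON-to-CSV reshaping the abstract mentions is compact. The following is a generic sketch of that conversion, not the paper's implementation.

```python
# Flatten a SPARQL 1.1 JSON results document into CSV text.
import csv
import io

def sparql_json_to_csv(result):
    variables = result["head"]["vars"]
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(variables)  # header row from the projected variables
    for binding in result["results"]["bindings"]:
        # A variable may be unbound in a row; emit an empty cell for it.
        writer.writerow([binding.get(v, {}).get("value", "") for v in variables])
    return out.getvalue()

example = {
    "head": {"vars": ["title", "year"]},
    "results": {"bindings": [
        {"title": {"type": "literal", "value": "OSCAR"},
         "year": {"type": "literal", "value": "2018"}},
    ]},
}
print(sparql_json_to_csv(example))
```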
Conference Paper
Full-text available
Existing citation indexes of scientific literature, like Scopus, WOS, Google Scholar and OpenAire, collect metadata of papers and authors. By using appropriate queries, it is possible to access these metadata, find specific papers and all associated details, as well as retrieve indices, metrics and information about authors. However, a significant limitation is that they generally do not aggregate the results of subsequent searches and do not offer explicit representations of author/citation relationships between found items. This paper introduces VisualBib, a Web application prototype conceived to support researchers who wish to create, modify, visualize and share bibliographies. Starting with a small set of papers or with a restricted number of authors, it generates, in real time, an interactive visual representation of the corresponding bibliography; the user can explore the network of cited/citing references and dynamically add new papers in order to build up customized bibliographies, which are represented using holistic, aggregated and graphical views.
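The cited/citing network that VisualBib renders is, at bottom, a directed graph over bibliographic records. The following is a hedged sketch of such a structure, illustrative only and not VisualBib's code.

```python
# Papers as nodes, citations as directed edges (citing -> cited).
from collections import defaultdict

class Bibliography:
    def __init__(self):
        self.papers = {}               # paper id -> metadata dict
        self.cites = defaultdict(set)  # paper id -> set of ids it cites

    def add_paper(self, paper_id, title, year):
        self.papers[paper_id] = {"title": title, "year": year}

    def add_citation(self, citing_id, cited_id):
        self.cites[citing_id].add(cited_id)

    def cited_by(self, paper_id):
        """Reverse lookup: papers in this bibliography that cite paper_id."""
        return {p for p, targets in self.cites.items() if paper_id in targets}
```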
Article
Full-text available
Labeling training data is increasingly the largest bottleneck in deploying machine learning systems. We present Snorkel, a first-of-its-kind system that enables users to train state-of-the-art models without hand labeling any training data. Instead, users write labeling functions that express arbitrary heuristics, which can have unknown accuracies and correlations. Snorkel denoises their outputs without access to ground truth by incorporating the first end-to-end implementation of our recently proposed machine learning paradigm, data programming. We present a flexible interface layer for writing labeling functions based on our experience over the past year collaborating with companies, agencies, and research labs. In a user study, subject matter experts build models 2.8x faster and increase predictive performance an average 45.5% versus seven hours of hand labeling. We study the modeling tradeoffs in this new setting and propose an optimizer for automating tradeoff decisions that gives up to 1.8x speedup per pipeline execution. In two collaborations, with the U.S. Department of Veterans Affairs and the U.S. Food and Drug Administration, and on four open-source text and image data sets representative of other deployments, Snorkel provides 132% average improvements to predictive performance over prior heuristic approaches and comes within an average 3.60% of the predictive performance of large hand-curated training sets.
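To make the labeling-function idea above concrete, here is a toy example: several noisy heuristics vote on each input and are combined without ground truth. Snorkel's real combiner is a generative model that estimates each function's accuracy; the simple majority vote below is only a stand-in for illustration, and the two heuristics are invented.

```python
# Toy labeling functions for a made-up relevance task, combined by majority vote.
ABSTAIN, NEG, POS = -1, 0, 1

def lf_mentions_sparql(text):
    return POS if "sparql" in text.lower() else ABSTAIN

def lf_very_short(text):
    return NEG if len(text) < 20 else ABSTAIN

def majority_vote(text, lfs):
    """Combine the non-abstaining votes of all labeling functions."""
    votes = [label for lf in lfs if (label := lf(text)) != ABSTAIN]
    return max(set(votes), key=votes.count) if votes else ABSTAIN

print(majority_vote("OSCAR hides SPARQL from end users",
                    [lf_mentions_sparql, lf_very_short]))  # -> 1 (POS)
```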
Article
Full-text available
There are times when user experience practitioners might consider using the System Usability Scale (SUS), but there is an item that just doesn't work in their context of measurement. For example, the first item is "I think I would like to use this system frequently." If the system under study is one that would only be used infrequently, then there is a concern that including this item would distort the scores, or at best, distract the participant. The results of the current research show that the mean scores of all 10 possible nine-item variants of the SUS are within one point (out of a hundred) of the mean of the standard SUS. Thus, practitioners can leave out any one of the SUS items without having a practically significant effect on the resulting scores, as long as an appropriate adjustment is made to the multiplier (specifically, multiply the sum of the adjusted item scores by 100/36 instead of the standard 100/40, or 2.5, to compensate for the dropped item).
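The adjustment described above is easy to verify numerically. Standard SUS scoring gives odd items a contribution of (response - 1) and even items (5 - response); the sum over ten items (maximum 40) is multiplied by 2.5, and over nine items (maximum 36) by 100/36. The check below is a direct transcription of that rule.

```python
def sus_score(responses, dropped_item=None):
    """responses maps item number (1-10) to a 1-5 rating."""
    contributions = [
        (rating - 1) if item % 2 == 1 else (5 - rating)
        for item, rating in responses.items()
        if item != dropped_item
    ]
    max_sum = 4 * len(contributions)  # 40 for ten items, 36 for nine
    return sum(contributions) * 100 / max_sum

ratings = {i: 4 if i % 2 == 1 else 2 for i in range(1, 11)}
print(sus_score(ratings))                  # ten-item score: 75.0
print(sus_score(ratings, dropped_item=1))  # nine-item variant: also 75.0
```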
Conference Paper
Full-text available
Scholia is a tool to handle scientific bibliographic information through Wikidata. The Scholia Web service creates on-the-fly scholarly profiles for researchers, organizations, journals, publishers, individual scholarly works, and for research topics. To collect the data, it queries the SPARQL-based Wikidata Query Service. Among several display formats available in Scholia are lists of publications for individual researchers and organizations, plots of publications per year, employment timelines, as well as co-author and topic networks and citation graphs. The Python package implementing the Web service is also able to format Wikidata bibliographic entries for use in LaTeX/BIBTeX. Apart from detailing Scholia, we describe how Wikidata has been used for bibliographic information and we also provide some scientometric statistics on this information.
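A minimal example of the kind of request Scholia issues against the Wikidata Query Service is shown below: retrieving works whose author (Wikidata property P50) is a given item. The function and the User-Agent string are illustrative; the author QID passed in is a placeholder to substitute with any researcher's Wikidata identifier.

```python
# Query the Wikidata Query Service for works authored by a given item.
import requests

def works_of_author(author_qid, limit=5):
    """List work labels for a Wikidata author item, e.g. 'Q937'."""
    query = """
    SELECT ?work ?workLabel WHERE {
      ?work wdt:P50 wd:%s .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }
    } LIMIT %d
    """ % (author_qid, limit)
    r = requests.get(
        "https://query.wikidata.org/sparql",
        params={"query": query, "format": "json"},
        headers={"User-Agent": "scholia-example/0.1 (illustrative)"},
    )
    r.raise_for_status()
    return [b["workLabel"]["value"] for b in r.json()["results"]["bindings"]]
```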