Figure - available from: The VLDB Journal
A BGP in SPARQL syntax and as a graph (above), with its evaluation over the graph of Fig. 1 (below)

Source publication
Article
RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend...

Similar publications

Article
Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text as it has no semantic descrip...
Preprint
In this work we create a question answering dataset over the DBLP scholarly knowledge graph (KG). DBLP is an on-line reference for bibliographic information on major computer science publications that indexes over 4.4 million publications published by more than 2.2 million authors. Our dataset consists of 10,000 question answer pairs with the corre...
Article
The popularity of RDF has led to the creation of several datasets (e.g., Yago, DBpedia) with different natures (graph, temporal, spatial). Different extensions have also been proposed for the SPARQL language to provide appropriate processing. The best known is GeoSPARQL, which allows the integration of a set of spatial operators. In this paper, we propo...
Preprint
In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose in this paper a novel use of shapes, by which a set...

Citations

... provide a standard-compliant method for sharing database data, interconnecting disparate datasets within a specific organization to enable cross-dataset SPARQL searches [29]. ...
... It has come to be used as a common method for describing and exchanging data represented as a graph, making it a powerful tool for organizing structured information within a knowledge graph. Data in RDF is represented as triples of the form "subject-predicate-object" [12]. Each part of an RDF triple is individually addressed via a unique URI [12]. This representation allows semantic data to be unambiguously queried and analyzed. ...
... To work with RDF in the context of knowledge graphs, the SPARQL language (SPARQL Protocol and RDF Query Language) is used. It is a semantic query language for databases that is able to retrieve and manipulate data stored in RDF format, specifying patterns and conditions that the data should fulfill [12]. SPARQL queries can be used to extract, update and manipulate RDF data, making it a key technology for semantic applications. ...
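The pattern-based retrieval described in the snippet above can be pictured with a minimal, self-contained Python sketch. The triples, the `ex:` prefix, and the function names are invented for illustration and stand in for a real SPARQL engine, which would evaluate such patterns over indexed storage:

```python
# Minimal sketch: SPARQL-style triple-pattern matching over an
# in-memory set of RDF triples. Data and names are illustrative only.

TRIPLES = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob",   "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:age",   "30"),
}

def match(pattern, triples):
    """Yield variable bindings for one triple pattern.
    Terms starting with '?' are variables; anything else must match
    exactly. (Assumes a variable is not repeated within one pattern.)"""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding

# Analogue of the SPARQL query: SELECT ?x WHERE { ?x ex:knows ex:Bob }
results = list(match(("?x", "ex:knows", "ex:Bob"), TRIPLES))
```

A full engine would additionally join the bindings of several such patterns (a basic graph pattern) and apply filters, ordering, and aggregation on top.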
Article
In this article, we present our path towards building knowledge graphs automatically from Russian texts. We explore various methodologies and libraries to extract triples, which are the fundamental building blocks of knowledge graphs. Our approach involves the use of libraries for analyzing morphological characteristics of words, such as PyMorphy and Yandex Mystem, to construct triples. We also utilize the NLP library spaCy to analyze text and build triples based on semantic relationships recognized by the library. However, we found that in some cases, we could not extract relationships from the text, leading us to use word2vec to define relationships. Unfortunately, the results obtained from word2vec were unsatisfactory and could not be used as relationships. We also encountered the problem of building triples from text due to the use of pronouns. To address this issue, we explored the use of coreference resolution libraries, but unfortunately, there are no working libraries available for the Russian language at this time. Our results highlight both positive and negative outcomes of applying these methodologies and libraries, providing insights into the challenges and opportunities of building knowledge graphs automatically from Russian texts.
... Below we describe the proposed approach from three perspectives: (a) we formally specify the interaction model with states and transitions, (b) we express the query requirements of the model formally using a query-language-independent formalism, thereby facilitating the implementation of the model over different technologies, query languages, and triplestores (see [55] for a survey of triplestores), and (c) we provide the exact specification of the UI and the algorithms followed for facilitating the implementation of the model. ...
... The efficiency of query execution depends on (1) the storage, indexing, and query processing techniques of the triplestore (as discussed in [55]) and (2) the size of the collection. Query optimization is beyond the scope of this work; our focus is on facilitating the formulation of the query. ...
... We should mention that there are works that focus on optimizations for the evaluation of SPARQL analytic queries. Apart from the corresponding parts of the surveys [37,55], we could mention that [60] focuses on queries that include several chain and star patterns. ...
Article
The formulation of analytical queries over Knowledge Graphs in RDF is a challenging task that presupposes familiarity with the syntax of the corresponding query languages and the contents of the graph. To alleviate this problem, we introduce a model for aiding users in formulating analytic queries over complex, i.e., not necessarily star schema-based, RDF Knowledge Graphs. To come up with an intuitive interface, we leverage the familiarity of users with Faceted Search systems. In particular, we start from a general model for Faceted Search over RDF data, and we extend it with actions that enable users to formulate analytic queries, too. Thus, the proposed model can be used not only for formulating analytic queries but also for exploratory purposes, i.e., for locating the desired resources in a Faceted Search manner. We describe the model from various perspectives, i.e., (1) we propose a generic user interface for intuitively analyzing RDF Knowledge Graphs, (2) we define formally the state space of the interaction model and the required algorithms for producing the user interface actions, (3) we present an implementation of the model that showcases its feasibility, and (4) we discuss the results of an evaluation with users that provides evidence for the acceptance of the method by users. Apart from being intuitive for end users, another distinctive characteristic of the proposed model is that it allows the gradual formulation of complex analytic queries (including nested ones).
... In contrast to backend-dependent APIs, SPARQL is a widespread standard for retrieving data stored in RDF format [45][46][47]. This query language is explicitly designed to exploit the triple structure, where the subject, predicate, and object are typically uniform resource identifiers (URIs) or literal values (such as strings or numbers). ...
Article
In this paper, we introduce a novel method to formally represent elements of control engineering knowledge in a suitable data structure. To this end, we first briefly review existing representation methods (RDF, OWL, Wikidata, ORKG). Based on this, we introduce our own approach: The Python-based imperative representation of knowledge (PyIRK) and its application to formulate the Ontology of Control Systems Engineering (OCSE). One of its main features is the possibility to represent the actual content of definitions and theorems as nodes and edges of a knowledge graph, which is demonstrated by selected theorems from Lyapunov’s theory. While the approach is still experimental, the current result already allows the application of methods of automated quality assurance and a SPARQL-based semantic search mechanism. The feature set of the framework is demonstrated by various examples. The paper concludes with a discussion of the limitations and directions for further development.
... The prevalence of Linked Open Data, and the explosion of available information on the Web, have led to an enormous amount of widely available RDF datasets [8]. To store, manage, and query this ever-increasing RDF data, many RDF stores and SPARQL engines have been developed [2], whereas in many cases other, non-RDF-specific big data infrastructures have been leveraged for query processing on RDF knowledge graphs. Apache Spark is a big-data management engine, with ever-increasing interest in using it for efficient query answering over RDF data [1]. ...
... Allegrograph, SHARD, H2RDF). For a complete view of the systems currently available in the domain, the interested reader is referred to the relevant surveys [2,7]. ...
Article
The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, with more and more systems adopting it for efficient query answering. Existing approaches exploiting Spark for querying RDF data, adopt partitioning techniques for reducing the data that need to be accessed in order to improve efficiency. However, simplistic data partitioning fails, on one hand, to minimize data access and on the other hand to group data usually queried together. This is translated into limited improvement in terms of efficiency in query answering. In this paper, we present DIAERESIS, a novel platform that accepts as input an RDF dataset and effectively partitions it, minimizing data access and improving query answering efficiency. To achieve this, DIAERESIS first identifies the top-k most important schema nodes, i.e., the most important classes, as centroids and distributes the other schema nodes to the centroid they mostly depend on. Then, it allocates the corresponding instance nodes to the schema nodes they are instantiated under. Our algorithm enables fine-tuning of data distribution, significantly reducing data access for query answering. We experimentally evaluate our approach using both synthetic and real workloads, strictly dominating existing state-of-the-art, showing that we improve query answering in several cases by orders of magnitude.
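As a rough illustration of the partitioning idea in the abstract above (a sketch, not DIAERESIS's actual algorithm or importance measures), the following Python fragment picks the top-k schema nodes by an importance score and assigns every remaining node to the centroid it depends on most strongly. All node names, scores, and dependence weights are invented:

```python
# Hedged sketch of centroid-based schema partitioning: select the
# top-k most important schema nodes as centroids, then attach each
# remaining schema node to the centroid it depends on most.

def partition(importance, dependence, k):
    """importance: {node: score}; dependence: {(node, centroid): weight}."""
    # Top-k most important schema nodes become centroids.
    centroids = sorted(importance, key=importance.get, reverse=True)[:k]
    parts = {c: {c} for c in centroids}
    for node in importance:
        if node in parts:
            continue  # already a centroid
        # Assign to the centroid with the highest dependence weight.
        best = max(centroids, key=lambda c: dependence.get((node, c), 0.0))
        parts[best].add(node)
    return parts

# Invented example inputs.
importance = {"Person": 0.9, "Paper": 0.8, "Award": 0.2, "City": 0.1}
dependence = {("Award", "Person"): 0.7, ("Award", "Paper"): 0.3,
              ("City", "Person"): 0.6}
parts = partition(importance, dependence, k=2)
```

In the system described above, instance nodes would then be allocated to the partition of the schema node they are instantiated under, so that queries touching one class cluster read only one partition.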
... RDF is fundamental to KGs, using triples (subject-predicate-object) to express statements. SPARQL (SPARQL Protocol and RDF Query Language) [23] facilitates the querying and manipulation of RDF data within a KG. ...
Article
In the realm of Parkinson’s Disease (PD) research, the integration of wearable sensor data with personal health records (PHR) has emerged as a pivotal avenue for patient alerting and monitoring. This study delves into the complex domain of PD patient care, with a specific emphasis on harnessing the potential of wearable sensors to capture, represent and semantically analyze crucial movement data and knowledge. The primary objective is to enhance the assessment of PD patients by establishing a robust foundation for personalized health insights through the development of Personal Health Knowledge Graphs (PHKGs) and the employment of personal health Graph Neural Networks (PHGNNs) that utilize PHKGs. The objective is to formalize the representation of related integrated data, unified sensor and PHR data in higher levels of abstraction, i.e., in a PHKG, to facilitate interoperability and support rule-based high-level event recognition such as patient’s missing dose or falling. This paper, extending our previous related work, presents the Wear4PDmove ontology in detail and evaluates the ontology within the development of an experimental PHKG. Furthermore, this paper focuses on the integration and evaluation of PHKG within the implementation of a Graph Neural Network (GNN). This work emphasizes the importance of integrating PD-related data for monitoring and alerting patients with appropriate notifications. These notifications offer health experts precise and timely information for the continuous evaluation of personal health-related events, ultimately contributing to enhanced patient care and well-informed medical decision-making. Finally, the paper concludes by proposing a novel approach for integrating personal health KGs and GNNs for PD monitoring and alerting solutions.
... Thus, the graph construction lifecycle is not fully automated. In a survey of RDF stores [13], the authors identified several points that need further research (server overload, storage methods, indexing, and caching). In [9], the author notes that using JSON serialization would facilitate accessibility to RDF data. ...
Conference Paper
The COVID-19 pandemic has exposed the importance of government data analysis in responding to such emergencies, but at the same time government data seemed unprepared to meet the requirements of the case, due to the privacy, distribution, and heterogeneous nature of government data. Through this research, we draw the government's attention to the importance of this matter and present a proposal that adopts best practices and standards to achieve this purpose. In the first phase (the objective of this paper), we outline the regulation rules of the data analysis life-cycle through our proposed prototype, which relies on semantic web and natural language processing techniques. In the second phase, we develop the model's algorithms, such as semantic indexing, graph capturing, and the other model units. We take into account the particularities of the Arabic language due to its complex nature, which means that English-language tools will not work efficiently with Arabic. In addition, there is a lack of technical resources and research work.
... RDF allows for the storage of multi-modal scene graphs, including text, image, and video formats, as well as underlying semantic similarities and spatial relations [48]. (Fig. 5: Architecture pursued in this work, based upon [55], for natural language-based compliance control of robots.) Extensions of SPARQL [49], such as VGStore, which builds on the grammar of SPARQL and the Python module pyparsing, can be used to conveniently query multi-modal information on RDF-stored scene graphs [48]. Handling multi-modality is pivotal to enriching pre-trained foundation models (e.g., BERT, CLIP, ChatGPT) [50] provided in the collaboration space layer of the Metaverse in Fig. 2 for a twofold objective. ...
Article
As a digital environment of interconnected virtual ecosystems driven by measured and synthesized data, the Metaverse has so far been mostly considered from its gaming perspective that closely aligns with online edutainment. Although it is still in its infancy and more research as well as standardization efforts remain to be done, the Metaverse could provide considerable advantages for smart robotized applications in the industry. Workflow efficiency, collective decision enrichment even for executives, as well as a natural, resilient, and sustainable robotized assistance for the workforce are potential advantages. Hence, the Metaverse could consolidate the connection between Industry 4.0 and Industry 5.0. This paper identifies and puts forward potential advantages of the Metaverse for robotized applications and highlights how these advantages support goals pursued by the Industry 4.0 and Industry 5.0 visions.
... All resources to reproduce the experiments are available online [23]. ...
... The analysis of SPARQL triplestores has been widely studied in the literature [1,37]. The Berlin SPARQL Benchmark (BSBM) [8], focused on the e-commerce domain, is one of the most well-known benchmarks used to evaluate the performance and scalability of triplestores and virtual knowledge graph construction systems. ...
... In 2022, more than 14 million websites published data in RDF (with almost a 3× increase since 2018) [7]. Since RDF only defines a conceptual data model, a number of different solutions have been proposed to physically implement RDF stores that are able to store and query RDF data efficiently [2,42]. ...
... For example, note that the object of the second triple (S_id[6]) in Fig. 3 is referenced from S_id[7], which keeps the subject of the third triple. However, the modified Ψ in RDFCSA contains Ψ[15] = 2, where SA[2] = 4 points to S_id[4], which contains the subject of the second triple. Consequently, Ψ becomes cyclical within RDF triples. ...
... For example, if we want to retrieve all the triples including subject 1, we will use the triple pattern Q ← (1 ? ?). The initial binary search on SA gives that SA[1, 2] point to s1 = S_id[1] and s2 = S_id[4], which are, respectively, the subjects of the first and second triples. Now, if we want to recover the predicate p1 ... A key property of RDFCSA is that any triple can be easily identified by the position in SA where its subject is located. ...
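The subject-lookup walk-through above can be sketched in plain Python. The id values are invented, and a sorted subject-position index stands in for the binary search on the suffix array SA; this illustrates only the lookup idea, not the actual compressed RDFCSA structures:

```python
# Illustrative sketch: triples are flattened into an id sequence S_id,
# three ids per triple; a sorted index of subject positions plays the
# role of the suffix-array binary search for the pattern (s ? ?).
import bisect

triples = [(1, 10, 20), (1, 11, 21), (2, 10, 22)]   # made-up ids
S_id = [x for t in triples for x in t]

# Subject positions in S_id (every third slot), sorted by subject id.
subject_positions = sorted(range(0, len(S_id), 3), key=lambda p: S_id[p])
keys = [S_id[p] for p in subject_positions]

def pattern_s(s):
    """Return all triples matching (s ? ?)."""
    lo = bisect.bisect_left(keys, s)
    hi = bisect.bisect_right(keys, s)
    # Each matching S_id position identifies its triple: position // 3.
    return [triples[subject_positions[i] // 3] for i in range(lo, hi)]

result = pattern_s(1)
```

The final comment mirrors the property quoted above: the position of a subject occurrence directly identifies its triple, so no extra pointers are needed to recover the full triple.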
Article
RDF compression and querying are consolidated topics in the Web of Data, with a plethora of solutions to efficiently store and query static datasets. However, as RDF data changes along time, it becomes necessary to keep different versions of RDF datasets, in what is called an RDF archive. For large RDF datasets, naive techniques to store these versions lead to significant scalability problems. In this paper, we present v-RDF-SI, one of the first RDF archiving solutions that aim at joining both compression and fast querying. In v-RDF-SI, we extend existing RDF representations based on compact data structures to provide efficient support of version-based queries in compressed space. We present two implementations of v-RDF-SI, named v-RDFCSA and v-HDT, based, respectively, on RDFCSA (an RDF self-index) and HDT (a W3C-supported compressed RDF representation). We experimentally evaluate v-RDF-SI over a public benchmark named BEAR, showing that v-RDF-SI drastically reduces space requirements, being up to 40 times smaller than the baselines provided by BEAR, and 4 times smaller than alternatives based on compact data structures, while yielding significantly faster query times in most cases. On average, the fastest variants of v-RDF-SI outperform the alternatives by almost an order of magnitude.
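The version-based queries such an archive must answer can be pictured with a small Python sketch: each triple carries the half-open interval of versions in which it is alive. The triples and the interval encoding are invented for illustration; v-RDF-SI answers these queries over compressed self-indexed structures rather than a plain dictionary:

```python
# Hedged sketch of version-based queries over an RDF archive: each
# triple maps to (added, removed), meaning it is alive in versions
# added <= v < removed (removed=None means still alive).

archive = {
    ("ex:a", "ex:p", "ex:b"): (0, 3),     # alive in versions 0, 1, 2
    ("ex:a", "ex:p", "ex:c"): (1, None),  # alive from version 1 onward
}

def alive(triple, v):
    added, removed = archive[triple]
    return added <= v and (removed is None or v < removed)

def version_materialise(v):
    """All triples that belong to version v."""
    return {t for t in archive if alive(t, v)}

def diff(v1, v2):
    """(added, deleted) triples between versions v1 and v2."""
    a, b = version_materialise(v1), version_materialise(v2)
    return b - a, a - b
```

Materialising a version and computing deltas between versions are the two basic operations; the challenge addressed above is supporting them directly in compressed space instead of storing every version explicitly.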