Figure - available from: The VLDB Journal
A BGP in SPARQL syntax and as a graph (above), with its evaluation over the graph of Fig. 1 (below)

Source publication
Article
RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend...

Similar publications

Article
Knowledge graphs (KGs) are useful data structures for the integration, retrieval, dissemination, and inference of information in various information domains. One of the main challenges in building KGs is the extraction of named entities (nodes) and their relations (edges), particularly when processing unstructured text as it has no semantic descrip...
Preprint
In this work we create a question answering dataset over the DBLP scholarly knowledge graph (KG). DBLP is an on-line reference for bibliographic information on major computer science publications that indexes over 4.4 million publications published by more than 2.2 million authors. Our dataset consists of 10,000 question answer pairs with the corre...
Article
The popularity of RDF has led to the creation of several datasets (e.g., Yago, DBpedia) with different natures (graph, temporal, spatial). Different extensions have also been proposed for the SPARQL language to provide appropriate processing. The best known is GeoSPARQL, which allows the integration of a set of spatial operators. In this paper, we propo...
Preprint
In constraint languages for RDF graphs, such as ShEx and SHACL, constraints on nodes and their properties in RDF graphs are known as "shapes". Schemas in these languages list the various shapes that certain targeted nodes must satisfy for the graph to conform to the schema. Using SHACL, we propose in this paper a novel use of shapes, by which a set...

Citations

... provide a standard-compliant method for sharing database data, interconnecting disparate datasets within a specific organization to enable cross-dataset SPARQL searches [29]. ...
... It has come to be used as a common method for describing and exchanging data represented as a graph, making it a powerful tool for organizing structured information within a knowledge graph. Data in RDF is represented as triples of the form "subject-predicate-object" [12]. Each part of an RDF triple is individually addressed via a unique URI [12]. This representation allows semantic data to be unambiguously queried and analyzed. ...
... To work with RDF in the context of knowledge graphs, the SPARQL language (SPARQL Protocol and RDF Query Language) is used. It is a semantic query language for databases that is able to retrieve and manipulate data stored in RDF format, specifying patterns and conditions that the data should fulfill [12]. SPARQL queries can be used to extract, update and manipulate RDF data, making it a key technology for semantic applications. ...
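The pattern-based retrieval described in the snippet above can be pictured with a minimal, self-contained Python sketch. The triples, the `ex:` prefix, and the function names are invented for illustration and stand in for a real SPARQL engine, which would evaluate such patterns over indexed storage:

```python
# Minimal sketch: SPARQL-style triple-pattern matching over an
# in-memory set of RDF triples. Data and names are illustrative only.

TRIPLES = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob",   "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:age",   "30"),
}

def match(pattern, triples):
    """Yield variable bindings for one triple pattern.
    Terms starting with '?' are variables; anything else must match
    exactly. (Assumes a variable is not repeated within one pattern.)"""
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break
        else:
            yield binding

# Analogue of the SPARQL query: SELECT ?x WHERE { ?x ex:knows ex:Bob }
results = list(match(("?x", "ex:knows", "ex:Bob"), TRIPLES))
```

A full engine would additionally join the bindings of several such patterns (a basic graph pattern) and apply filters, ordering, and aggregation on top.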
Article
In this article, we present our path towards building knowledge graphs automatically from Russian texts. We explore various methodologies and libraries to extract triples, which are the fundamental building blocks of knowledge graphs. Our approach involves the use of libraries for analyzing morphological characteristics of words, such as PyMorphy and Yandex Mystem, to construct triples. We also utilize the NLP library spaCy to analyze text and build triples based on semantic relationships recognized by the library. However, we found that in some cases, we could not extract relationships from the text, leading us to use word2vec to define relationships. Unfortunately, the results obtained from word2vec were unsatisfactory and could not be used as relationships. We also encountered the problem of building triples from text due to the use of pronouns. To address this issue, we explored the use of coreference resolution libraries, but unfortunately, there are no working libraries available for the Russian language at this time. Our results highlight both positive and negative outcomes of applying these methodologies and libraries, providing insights into the challenges and opportunities of building knowledge graphs automatically from Russian texts.
... Below we describe the proposed approach from three perspectives: (a) we formally specify the interaction model with states and transitions, (b) we express the query requirements of the model formally using a query-language-independent formalism, thereby facilitating the implementation of the model over different technologies, query languages, and triplestores (see [55] for a survey of triplestores), and (c) we provide the exact specification of the UI and the algorithms followed for facilitating the implementation of the model. ...
... The efficiency of query execution depends on (1) the storage, indexing, and query processing techniques of the triplestore (as discussed in [55]) and (2) the size of the collection. Query optimization is beyond the scope of this work; our focus is on facilitating the formulation of the query. ...
... We should mention that there are works that focus on optimizations for the evaluation of SPARQL analytic queries. Apart from the corresponding parts of the surveys [37,55], we could mention that [60] focuses on queries that include several chain and star patterns. ...
Article
The formulation of analytical queries over Knowledge Graphs in RDF is a challenging task that presupposes familiarity with the syntax of the corresponding query languages and the contents of the graph. To alleviate this problem, we introduce a model for aiding users in formulating analytic queries over complex, i.e., not necessarily star schema-based, RDF Knowledge Graphs. To come up with an intuitive interface, we leverage the familiarity of users with Faceted Search systems. In particular, we start from a general model for Faceted Search over RDF data, and we extend it with actions that enable users to formulate analytic queries, too. Thus, the proposed model can be used not only for formulating analytic queries but also for exploratory purposes, i.e., for locating the desired resources in a Faceted Search manner. We describe the model from various perspectives, i.e., (1) we propose a generic user interface for intuitively analyzing RDF Knowledge Graphs, (2) we define formally the state space of the interaction model and the required algorithms for producing the user interface actions, (3) we present an implementation of the model that showcases its feasibility, and (4) we discuss the results of an evaluation with users that provides evidence for the acceptance of the method by users. Apart from being intuitive for end users, another distinctive characteristic of the proposed model is that it allows the gradual formulation of complex analytic queries (including nested ones).
... In contrast to backend-dependent APIs, SPARQL is a widespread standard for retrieving data stored in RDF format [45][46][47]. This query language is explicitly designed to exploit the triple structure, where the subject, predicate, and object are typically uniform resource identifiers (URIs) or literal values (such as strings or numbers). ...
Article
In this paper, we introduce a novel method to formally represent elements of control engineering knowledge in a suitable data structure. To this end, we first briefly review existing representation methods (RDF, OWL, Wikidata, ORKG). Based on this, we introduce our own approach: The Python-based imperative representation of knowledge (PyIRK) and its application to formulate the Ontology of Control Systems Engineering (OCSE). One of its main features is the possibility to represent the actual content of definitions and theorems as nodes and edges of a knowledge graph, which is demonstrated by selected theorems from Lyapunov’s theory. While the approach is still experimental, the current result already allows the application of methods of automated quality assurance and a SPARQL-based semantic search mechanism. The feature set of the framework is demonstrated by various examples. The paper concludes with a discussion of the limitations and directions for further development.
... The prevalence of Linked Open Data, and the explosion of available information on the Web, have led to an enormous amount of widely available RDF datasets [8]. To store, manage, and query this ever-increasing RDF data, many RDF stores and SPARQL engines have been developed [2], whereas in many cases other, non-RDF-specific big data infrastructures have been leveraged for query processing on RDF knowledge graphs. Apache Spark is a big-data management engine, with ever-increasing interest in using it for efficient query answering over RDF data [1]. ...
... Allegrograph, SHARD, H2RDF). For a complete view of the systems currently available in the domain, the interested reader is referred to the relevant surveys [2,7]. ...
Article
The explosion of the web and the abundance of linked data demand effective and efficient methods for storage, management, and querying. Apache Spark is one of the most widely used engines for big data processing, with more and more systems adopting it for efficient query answering. Existing approaches exploiting Spark for querying RDF data, adopt partitioning techniques for reducing the data that need to be accessed in order to improve efficiency. However, simplistic data partitioning fails, on one hand, to minimize data access and on the other hand to group data usually queried together. This is translated into limited improvement in terms of efficiency in query answering. In this paper, we present DIAERESIS, a novel platform that accepts as input an RDF dataset and effectively partitions it, minimizing data access and improving query answering efficiency. To achieve this, DIAERESIS first identifies the top-k most important schema nodes, i.e., the most important classes, as centroids and distributes the other schema nodes to the centroid they mostly depend on. Then, it allocates the corresponding instance nodes to the schema nodes they are instantiated under. Our algorithm enables fine-tuning of data distribution, significantly reducing data access for query answering. We experimentally evaluate our approach using both synthetic and real workloads, strictly dominating existing state-of-the-art, showing that we improve query answering in several cases by orders of magnitude.
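As a rough illustration of the partitioning idea in the abstract above (a sketch, not DIAERESIS's actual algorithm or importance measures), the following Python fragment picks the top-k schema nodes by an importance score and assigns every remaining node to the centroid it depends on most strongly. All node names, scores, and dependence weights are invented:

```python
# Hedged sketch of centroid-based schema partitioning: select the
# top-k most important schema nodes as centroids, then attach each
# remaining schema node to the centroid it depends on most.

def partition(importance, dependence, k):
    """importance: {node: score}; dependence: {(node, centroid): weight}."""
    # Top-k most important schema nodes become centroids.
    centroids = sorted(importance, key=importance.get, reverse=True)[:k]
    parts = {c: {c} for c in centroids}
    for node in importance:
        if node in parts:
            continue  # already a centroid
        # Assign to the centroid with the highest dependence weight.
        best = max(centroids, key=lambda c: dependence.get((node, c), 0.0))
        parts[best].add(node)
    return parts

# Invented example inputs.
importance = {"Person": 0.9, "Paper": 0.8, "Award": 0.2, "City": 0.1}
dependence = {("Award", "Person"): 0.7, ("Award", "Paper"): 0.3,
              ("City", "Person"): 0.6}
parts = partition(importance, dependence, k=2)
```

In the system described above, instance nodes would then be allocated to the partition of the schema node they are instantiated under, so that queries touching one class cluster read only one partition.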
... RDF is fundamental to KGs, using triples (subject-predicate-object) to express statements. SPARQL (SPARQL Protocol and RDF Query Language) [23] facilitates the querying and manipulation of RDF data within a KG. ...
Article
In the realm of Parkinson’s Disease (PD) research, the integration of wearable sensor data with personal health records (PHR) has emerged as a pivotal avenue for patient alerting and monitoring. This study delves into the complex domain of PD patient care, with a specific emphasis on harnessing the potential of wearable sensors to capture, represent and semantically analyze crucial movement data and knowledge. The primary objective is to enhance the assessment of PD patients by establishing a robust foundation for personalized health insights through the development of Personal Health Knowledge Graphs (PHKGs) and the employment of personal health Graph Neural Networks (PHGNNs) that utilize PHKGs. The objective is to formalize the representation of related integrated data, unified sensor and PHR data in higher levels of abstraction, i.e., in a PHKG, to facilitate interoperability and support rule-based high-level event recognition such as patient’s missing dose or falling. This paper, extending our previous related work, presents the Wear4PDmove ontology in detail and evaluates the ontology within the development of an experimental PHKG. Furthermore, this paper focuses on the integration and evaluation of PHKG within the implementation of a Graph Neural Network (GNN). This work emphasizes the importance of integrating PD-related data for monitoring and alerting patients with appropriate notifications. These notifications offer health experts precise and timely information for the continuous evaluation of personal health-related events, ultimately contributing to enhanced patient care and well-informed medical decision-making. Finally, the paper concludes by proposing a novel approach for integrating personal health KGs and GNNs for PD monitoring and alerting solutions.
... Thus, the graph construction lifecycle is not fully automated. In a survey of RDF stores [13], the authors identified several points that need further research (server overload, storage methods, indexing, and caching). In [9], the author notes that using JSON serialization would facilitate accessibility to RDF data. ...
Conference Paper
The COVID-19 pandemic has exposed the importance of government data analysis in responding to such emergencies, but at the same time government data seemed unprepared to meet the requirements of the case, due to the privacy, distribution, and heterogeneous nature of government data. Through this research, we draw the government's attention to the importance of this matter and present a proposal that adopts best practices and standards to achieve this purpose. In the first phase (the objective of this paper), we outline the regulation rules of the data analysis life-cycle through our proposed prototype, which relies on semantic web and natural language processing techniques. In the second phase, we develop the model's algorithms, such as semantic indexing, graph capturing, and the other model units. We take into account the particularities of the Arabic language due to its complex nature, which means that English-language tools will not work efficiently with Arabic. In addition, there is a lack of technical resources and research work.
... RDF allows for the storage of multi-modal scene graphs, including text, image, and video formats, as well as underlying semantic similarities and spatial relations [48]. (Fig. 5: Architecture pursued in this work, based upon [55], for natural language-based compliance control of robots.) Extensions of SPARQL [49], such as VGStore, which builds on the grammar of SPARQL and the Python module pyparsing, can be used to conveniently query multi-modal information on RDF-stored scene graphs [48]. Handling multi-modality is pivotal to enriching pre-trained foundation models (e.g., BERT, CLIP, ChatGPT) [50] provided in the collaboration space layer of the Metaverse in Fig. 2 for a twofold objective. ...
Article
As a digital environment of interconnected virtual ecosystems driven by measured and synthesized data, the Metaverse has so far been mostly considered from its gaming perspective that closely aligns with online edutainment. Although it is still in its infancy and more research as well as standardization efforts remain to be done, the Metaverse could provide considerable advantages for smart robotized applications in the industry. Workflow efficiency, collective decision enrichment even for executives, as well as a natural, resilient, and sustainable robotized assistance for the workforce are potential advantages. Hence, the Metaverse could consolidate the connection between Industry 4.0 and Industry 5.0. This paper identifies and puts forward potential advantages of the Metaverse for robotized applications and highlights how these advantages support goals pursued by the Industry 4.0 and Industry 5.0 visions.
... All resources to reproduce the experiments are available online [23]. ...
... The analysis of SPARQL triplestores has been widely studied in the literature [1,37]. The Berlin SPARQL Benchmark (BSBM) [8], focused on the e-commerce domain, is one of the most well-known benchmarks used to evaluate the performance and scalability of triplestores and virtual knowledge graph construction systems. ...
... In 2022, more than 14 million websites published data in RDF (with almost a 3× increase since 2018) [7]. Since RDF only defines a conceptual data model, a number of different solutions have been proposed to physically implement RDF stores that are able to store and query RDF data efficiently [2,42]. ...
... For example, note that the object of the second triple (S_id[6]) in Fig. 3 is referenced from S_id[7], which keeps the subject of the third triple. However, the modified Ψ in RDFCSA contains Ψ[15] = 2, where SA[2] = 4 points to S_id[4], which contains the subject of the second triple. Consequently, Ψ becomes cyclical within RDF triples. ...
... For example, if we want to retrieve all the triples including subject 1, we will use the triple pattern Q ← (1 ? ?). The initial binary search on SA gives that SA[1, 2] point to s1 = S_id[1] and s2 = S_id[4], which are, respectively, the subjects of the first and second triples. Now, if we want to recover the predicate p1 ... A key property of RDFCSA is that any triple can be easily identified by the position in SA where its subject is located. ...
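The subject-lookup walk-through above can be sketched in plain Python. The id values are invented, and a sorted subject-position index stands in for the binary search on the suffix array SA; this illustrates only the lookup idea, not the actual compressed RDFCSA structures:

```python
# Illustrative sketch: triples are flattened into an id sequence S_id,
# three ids per triple; a sorted index of subject positions plays the
# role of the suffix-array binary search for the pattern (s ? ?).
import bisect

triples = [(1, 10, 20), (1, 11, 21), (2, 10, 22)]   # made-up ids
S_id = [x for t in triples for x in t]

# Subject positions in S_id (every third slot), sorted by subject id.
subject_positions = sorted(range(0, len(S_id), 3), key=lambda p: S_id[p])
keys = [S_id[p] for p in subject_positions]

def pattern_s(s):
    """Return all triples matching (s ? ?)."""
    lo = bisect.bisect_left(keys, s)
    hi = bisect.bisect_right(keys, s)
    # Each matching S_id position identifies its triple: position // 3.
    return [triples[subject_positions[i] // 3] for i in range(lo, hi)]

result = pattern_s(1)
```

The final comment mirrors the property quoted above: the position of a subject occurrence directly identifies its triple, so no extra pointers are needed to recover the full triple.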
Article
RDF compression and querying are consolidated topics in the Web of Data, with a plethora of solutions to efficiently store and query static datasets. However, as RDF data changes along time, it becomes necessary to keep different versions of RDF datasets, in what is called an RDF archive. For large RDF datasets, naive techniques to store these versions lead to significant scalability problems. In this paper, we present v-RDF-SI, one of the first RDF archiving solutions that aim at joining both compression and fast querying. In v-RDF-SI, we extend existing RDF representations based on compact data structures to provide efficient support of version-based queries in compressed space. We present two implementations of v-RDF-SI, named v-RDFCSA and v-HDT, based, respectively, on RDFCSA (an RDF self-index) and HDT (a W3C-supported compressed RDF representation). We experimentally evaluate v-RDF-SI over a public benchmark named BEAR, showing that v-RDF-SI drastically reduces space requirements, being up to 40 times smaller than the baselines provided by BEAR, and 4 times smaller than alternatives based on compact data structures, while yielding significantly faster query times in most cases. On average, the fastest variants of v-RDF-SI outperform the alternatives by almost an order of magnitude.
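The version-based queries such an archive must answer can be pictured with a small Python sketch: each triple carries the half-open interval of versions in which it is alive. The triples and the interval encoding are invented for illustration; v-RDF-SI answers these queries over compressed self-indexed structures rather than a plain dictionary:

```python
# Hedged sketch of version-based queries over an RDF archive: each
# triple maps to (added, removed), meaning it is alive in versions
# added <= v < removed (removed=None means still alive).

archive = {
    ("ex:a", "ex:p", "ex:b"): (0, 3),     # alive in versions 0, 1, 2
    ("ex:a", "ex:p", "ex:c"): (1, None),  # alive from version 1 onward
}

def alive(triple, v):
    added, removed = archive[triple]
    return added <= v and (removed is None or v < removed)

def version_materialise(v):
    """All triples that belong to version v."""
    return {t for t in archive if alive(t, v)}

def diff(v1, v2):
    """(added, deleted) triples between versions v1 and v2."""
    a, b = version_materialise(v1), version_materialise(v2)
    return b - a, a - b
```

Materialising a version and computing deltas between versions are the two basic operations; the challenge addressed above is supporting them directly in compressed space instead of storing every version explicitly.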