Figure 5: Example Vertical Fragment
Source publication
Article
Full-text available
As the volume of RDF data becomes increasingly large, it is essential to design distributed database systems to manage it. For distributed RDF data design, it is common to partition the RDF data into parts, called fragments, which are then distributed. Thus, the distribution design consists of two steps: fragmentation and allocation.

Context in source publication

Context 1
... Given the frequent access pattern p_3 in Figure 4, Figure 5 shows the corresponding vertical fragment. ...
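To make the figure's notion concrete, here is a minimal Python sketch of vertical fragmentation, assuming a pattern is encoded simply as a set of predicates (a stand-in for p_3): triples whose predicates belong to the pattern are grouped into one fragment. The data and encoding are illustrative assumptions, not the paper's exact definitions.

```python
# Hypothetical sketch: a vertical fragment collects the triples relevant
# to one frequent access pattern, here represented as a predicate set.

def vertical_fragment(triples, pattern_predicates):
    """Return the triples matching one frequent access pattern."""
    return [(s, p, o) for (s, p, o) in triples if p in pattern_predicates]

triples = [
    ("alice", "follows", "bob"),
    ("alice", "likes", "post1"),
    ("bob",   "follows", "carol"),
]
# Suppose the mined pattern touches only the "follows" predicate.
p3 = {"follows"}
print(vertical_fragment(triples, p3))
# [('alice', 'follows', 'bob'), ('bob', 'follows', 'carol')]
```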

Citations

... In contrast, we mine complex subquery patterns from historical workloads and only generate the triple partitions whose predicates are contained in the pattern. [52] also explores the intrinsic similarities among the structures of queries in a workload for graph partitioning. While [52] divides a graph into many partitions and designs a cost-aware algorithm to distribute the partitions among sites, we propose a Q-learning-based physical design tuner to choose the triple partitions that need to be migrated from the relational store to the graph store. Therefore, both our problems and approaches are different. ...
Article
To effectively manage the growing knowledge graphs in various domains, knowledge graph storage management has emerged as a hot research topic. Existing methods are classified into relational stores and native graph stores. Relational stores can store large-scale knowledge graphs and are convenient for updating knowledge, but their query performance degrades noticeably when the selectivity of a knowledge graph query is large. Native graph stores are efficient at processing complex knowledge graph queries due to their index-free adjacency property, but they are ill-suited to managing a large-scale knowledge graph because of limited storage budgets or an inflexible updating process. Motivated by this, we propose a dual-store structure that leverages a graph store to accelerate complex query processing in the relational store. However, it is challenging to determine what data to transfer from the relational store to the graph store, and when. To address this problem, we formulate it as a Markov Decision Process and derive a physical design tuner, DOTIL, based on reinforcement learning. With DOTIL, the dual-store structure adapts to dynamically changing workloads. Experimental results on real knowledge graphs demonstrate that the proposed dual-store structure improves query performance by up to 50.11 percent on average compared with the most commonly used relational stores.
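The abstract formulates the migration decision as a Markov Decision Process solved with reinforcement learning. Below is a minimal tabular Q-learning sketch in that spirit; the state encoding (the set of partitions already migrated to the graph store), the action space, and the reward function are illustrative assumptions, not DOTIL's actual design.

```python
import random
from collections import defaultdict

# Tabular Q-learning sketch of a dual-store tuner.
# State: frozenset of partitions already migrated to the graph store.
# Action: migrate one more partition. Reward: stand-in for the measured
# latency gain; a real tuner would observe the running workload.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2
partitions = ["P_follows", "P_likes", "P_knows"]   # hypothetical triple partitions
Q = defaultdict(float)                              # Q[(state, action)]

def reward(action):
    return 1.0 if action == "P_follows" else -0.1   # pretend only P_follows helps

def step(state):
    actions = [p for p in partitions if p not in state] or [None]
    if random.random() < EPSILON:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    nxt = frozenset(state | {action}) if action else state
    best_next = max((Q[(nxt, a)] for a in partitions if a not in nxt), default=0.0)
    Q[(state, action)] += ALPHA * (reward(action) + GAMMA * best_next - Q[(state, action)])
    return nxt

for _ in range(300):                                # short training episodes
    state = frozenset()
    for _ in partitions:
        state = step(state)

print(max(partitions, key=lambda a: Q[(frozenset(), a)]))
# learned first migration: P_follows
```

The sketch omits reverse actions (moving data back to the relational store), which a workload-adaptive tuner would also need.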
... The gain of reassigning a vertex from its source to a target partition is how many more neighbors it has in the target partition than in the source partition. Peng et al. [36,37] propose a workload-driven partitioning method that mines frequent query patterns from a representative query workload and then puts matches of the same frequent pattern into the same fragment to improve workload throughput. ...
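The reassignment gain defined in this snippet translates directly into code; the following sketch computes it (all names are illustrative).

```python
# Gain of moving `vertex` from `source` to `target`: how many more
# neighbors it has in the target partition than in the source partition.

def reassignment_gain(vertex, neighbors, partition_of, source, target):
    in_target = sum(1 for n in neighbors[vertex] if partition_of[n] == target)
    in_source = sum(1 for n in neighbors[vertex] if partition_of[n] == source)
    return in_target - in_source

neighbors = {"v": ["a", "b", "c"]}
partition_of = {"a": 0, "b": 1, "c": 1}
print(reassignment_gain("v", neighbors, partition_of, source=0, target=1))
# 2 - 1 = 1, so moving v to partition 1 gains one local neighbor
```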
... Competitors. Regarding Table 1, we compare WASP against Hermes [33], which is the only strategy with no prior knowledge of query workloads. We also compare against the Peng et al. method [36] as a representative of strategies based on a priori knowledge of the query workload. It has already been shown to achieve faster query response times than WARP [21] and Partout [17]. ...
... However, since Hermes does not consider the weight of active edges, the corresponding decreasing rate is lower than WASP's. On the other hand, unlike WASP and Hermes, which initially partition the graph dataset via a simple hashing strategy, the Peng et al. method [36] partitions the whole graph dataset assuming a priori knowledge of the WatDiv-SW workload. Therefore, the diagram shows an almost steady IPT ratio of less than 0.1, as the Peng et al. method has already assigned the matches of each frequent pattern to the same partition. ...
Article
Full-text available
Streaming graph partitioning methods have recently gained attention due to their ability to scale to very large graphs with limited resources. However, many such methods do not consider workload and graph characteristics. This may degrade the performance of queries by increasing inter-node communication and computational load imbalance. Moreover, existing workload-aware methods cannot consistently provide good performance because they do not consider the dynamic workloads that keep emerging in graph applications. We address these issues by proposing a novel workload-adaptive streaming partitioner named WASP, which aims to achieve low-latency and high-throughput online graph queries. As each workload typically contains frequent query patterns, WASP exploits the existing workload to capture active vertices and edges, which are frequently visited and traversed, respectively. This information is used to heuristically improve the quality of partitions, either by avoiding the concentration of active vertices in a few partitions proportional to their visit frequencies or by reducing the probability of cutting active edges proportional to their traversal frequencies. To assess the impact of WASP on a graph store and to show how easily the approach can be plugged on top of a system, we exploit it in a distributed graph-based RDF store. Our experiments over three synthetic and real-world graph datasets and the corresponding static and dynamic query workloads show that WASP achieves better query performance than state-of-the-art graph partitioners, especially on dynamic query workloads.
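As a rough illustration of this kind of heuristic, the sketch below scores candidate partitions for a streaming vertex by traversal-weighted neighbor affinity, with a load penalty weighted by the vertex's visit frequency. The exact scoring function in WASP differs; everything here is an assumption for illustration only.

```python
# Workload-aware greedy placement sketch: prefer the partition holding
# the most heavily traversed neighbors, discounted by partition load,
# and penalise loading busy partitions with frequently visited vertices.

def place(vertex, edges, visit_freq, traversal_freq, parts, capacity):
    def score(p):
        affinity = sum(traversal_freq.get((vertex, n), 1.0)
                       for n in edges.get(vertex, []) if n in parts[p])
        balance = 1.0 - len(parts[p]) / capacity
        load_penalty = visit_freq.get(vertex, 0.0) * len(parts[p]) / capacity
        return affinity * balance - load_penalty
    best = max(parts, key=score)
    parts[best].add(vertex)
    return best

parts = {0: set(), 1: set()}
edges = {"b": ["a"], "c": ["a"]}
place("a", edges, {}, {}, parts, capacity=4)
print(place("b", edges, {"b": 0.5}, {("b", "a"): 3.0}, parts, capacity=4))
# 0: b joins a's partition because their shared edge is traversed often
```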
... Graphs have recently been adopted to represent complicated structures, and graph databases have been widely applied in many domains, such as chemistry [1,2], images [3,4], knowledge graphs [5,6], social networks [7,8], and XML documents [9,10]. For example, in a social network, individuals (or organizations) and their pairwise relationships are modeled as vertices and edges, respectively. ...
Article
Full-text available
Graphs and graph databases are widely used in many domains, and graph querying is attracting more and more attention. Among these querying problems, subgraph querying is the most compelling one, since it involves the very expensive subgraph isomorphism test. This paper proposes a novel subgraph querying method, PLGCoding, which uses information about shortest paths and Laplacian spectra to filter out false positives. Specifically, we first extract features, including information about vertices, edges, shortest paths, and Laplacian spectra, and encode the extracted features. An index, the PLGCode-Tree, is built over these codes to shrink the candidate set. We then propose a two-step filtering strategy to implement the filtering-and-verification framework and thus generate the answer set. Experimental results comparing PLGCoding with competing methods on a real dataset show that it improves querying efficiency.
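To illustrate the filter-and-verify idea, the sketch below encodes simple per-graph features (vertex/edge counts, degree sequence, Laplacian spectrum) and prunes candidates with count and degree conditions, which are genuinely necessary for a subgraph embedding. How PLGCoding actually combines path and spectral features is more elaborate; this is only the general shape of such a filter.

```python
import numpy as np

# Per-graph feature encoding for a filtering-and-verification pipeline.
# Only the count/degree checks below are used for pruning here; the
# Laplacian spectrum is computed as one more storable feature.

def features(adj):
    deg = adj.sum(axis=1)
    lap = np.diag(deg) - adj                      # graph Laplacian L = D - A
    return {
        "n": adj.shape[0],
        "m": int(deg.sum() // 2),
        "degs": np.sort(deg)[::-1],               # descending degree sequence
        "spectrum": np.sort(np.linalg.eigvalsh(lap)),
    }

def may_contain(big, small):
    """Necessary conditions: if they fail, `small` cannot embed in `big`."""
    if small["n"] > big["n"] or small["m"] > big["m"]:
        return False
    k = small["n"]                                # degree-sequence dominance
    return bool(np.all(small["degs"] <= big["degs"][:k]))

G = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], float)   # triangle
g = np.array([[0, 1], [1, 0]], float)                    # single edge
print(may_contain(features(G), features(g)))   # True: proceed to verification
```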
... Send B_v to S_c; Receive B_v from S_c; ...
... There have been many works on distributed SPARQL query processing, and a very good survey is [12]. Recently, some approaches such as [5], [23], [22], [9], [2], [16], [8], [20], [10] have been proposed. ...
... Second, some approaches [23], [22], [9], [2], [16] are partition-based. They divide an RDF graph into several partitions. ...
Preprint
Full-text available
Partial evaluation has recently been used for processing SPARQL queries over a large resource description framework (RDF) graph in a distributed environment. However, the previous approach is inefficient when dealing with complex queries. In this study, we further improve the "partial evaluation and assembly" framework for answering SPARQL queries over a distributed RDF graph, while providing performance guarantees. Our key idea is to explore the intrinsic structural characteristics of partial matches to filter out irrelevant partial results, while providing performance guarantees on a network trace (data shipment) or the computational cost (response time). We also propose an efficient assembly algorithm to utilize the characteristics of partial matches to merge them and form final results. To improve the efficiency of finding partial matches further, we propose an optimization that communicates variables' candidates among sites to avoid redundant computations. In addition, although our approach is partitioning-tolerant, different partitioning strategies result in different performances, and we evaluate different partitioning strategies for our approach. Experiments over both real and synthetic RDF datasets confirm the superiority of our approach.
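The "assembly" step described in this abstract can be illustrated as a join of partial matches on their shared query variables. The sketch below shows only that core merge; the paper's algorithm adds structure-based pruning of irrelevant partial results and performance guarantees on top of it. The variable and site names are illustrative.

```python
from itertools import product

# Assemble partial matches (dicts from query variables to data vertices)
# produced at different sites: two matches merge when they agree on
# every variable they share.

def compatible(m1, m2):
    return all(m1[v] == m2[v] for v in m1.keys() & m2.keys())

def assemble(site_a, site_b):
    return [{**m1, **m2}
            for m1, m2 in product(site_a, site_b)
            if compatible(m1, m2)]

site_a = [{"?x": "alice", "?y": "bob"}]
site_b = [{"?y": "bob", "?z": "post1"}, {"?y": "carol", "?z": "post2"}]
print(assemble(site_a, site_b))
# [{'?x': 'alice', '?y': 'bob', '?z': 'post1'}]
```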
... Graph-based partitioning is an NP-complete problem [14], and hence hash partitioning heuristics [21,31] are employed instead of graph-based partitioning in order to partition RDF data efficiently. However, sophisticated partitioning techniques [11,15,22,28] cannot guarantee that no data will be shuffled when processing complex queries with multiple joins. Several techniques [23,29] utilize the query workload to enhance the partitioning of RDF data. ...
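A minimal version of the hash partitioning heuristics mentioned here assigns each triple to a site by hashing its subject, so all triples about one subject co-locate. The encoding is illustrative; it also shows why multi-join queries may still shuffle data across sites.

```python
import hashlib

# Subject-hash partitioning: cheap and balanced, but joins that chain
# through objects (e.g. ?x p ?y . ?y q ?z) can still cross site boundaries.

def site_of(subject, num_sites):
    digest = hashlib.md5(subject.encode()).hexdigest()
    return int(digest, 16) % num_sites

triples = [("alice", "follows", "bob"), ("alice", "likes", "post1"),
           ("bob", "follows", "carol")]
parts = {}
for s, p, o in triples:
    parts.setdefault(site_of(s, 4), []).append((s, p, o))
print(parts)  # alice's triples land on one site, bob's possibly on another
```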
... SPARQL queries containing multiple triple patterns are resolved by using merge and index joins. Peng et al. (2016) proposed a method to distribute and allocate the RDF partitions by exploring the intrinsic similarities among the structures of queries in the executed workload to reduce the number of crossing matches and the communication cost during query processing. In particular, the proposed approach mines and selects some of the frequent access patterns that reflect the characteristics of the workload. ...
Article
Full-text available
The Resource Description Framework (RDF) represents a main ingredient and data representation format for Linked Data and the Semantic Web. It supports a generic graph-based data model and data representation format for describing things, including their relationships with other things. As the size of RDF datasets is growing fast, RDF data management systems must be able to cope with growing amounts of data. Even though physically handling RDF data using a relational table is possible, querying a giant triple table becomes very expensive because of the multiple nested joins required for answering graph queries. In addition, the heterogeneity of RDF data poses entirely new challenges to database systems. This article provides a comprehensive study of the state of the art in handling and querying RDF data. In particular, we focus on data storage techniques, indexing strategies, and query execution mechanisms. Moreover, we provide a classification of existing systems and approaches. We also provide an overview of the various benchmarking efforts in this context and discuss some of the open problems in this domain.
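The join cost this abstract mentions is easy to see in miniature: even a two-hop graph pattern over a triple table becomes a self-join of the table with itself, and each additional triple pattern in the query adds another join. A small Python sketch with assumed data:

```python
# Why a giant triple table gets expensive: the pattern
# (?x follows ?y . ?y follows ?z) is a self-join of the triples on ?y.

triples = [("alice", "follows", "bob"),
           ("bob", "follows", "carol"),
           ("carol", "likes", "post1")]

follows = [(s, o) for (s, p, o) in triples if p == "follows"]
two_hop = [(x, y, z)
           for (x, y) in follows
           for (y2, z) in follows
           if y == y2]                 # one join per extra triple pattern
print(two_hop)                         # [('alice', 'bob', 'carol')]
```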
... Workload sensitive partitioners [4,22,23,25,27,31] attempt to optimise the placement of data to suit a particular workload. Such systems may be streaming or non-streaming, but are discussed separately here because they pertain most closely to the work we do with Loom. ...
... In the domain of RDF stores, Peng et al. [23] use frequent subgraph mining ahead of time to select a set of patterns common to a provided SPARQL query workload. They then propose partitioning strategies which ensure that any data matching one of these frequent patterns is allocated wholly within a single partition, thus reducing average query response time at the cost of having to replicate (potentially many) sub-graphs which form part of multiple frequent patterns. ...
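The allocation idea in this snippet, placing every match of a frequent pattern wholly in one partition and replicating subgraphs shared by several matches, can be sketched as follows. The round-robin assignment is a stand-in for the cost-aware allocation; the data is illustrative.

```python
from itertools import cycle

# Each match (a set of triples) is placed entirely in one partition, so a
# triple appearing in matches assigned to different partitions is replicated.

def allocate(pattern_matches, num_partitions):
    partitions = [set() for _ in range(num_partitions)]
    rr = cycle(range(num_partitions))
    for match in pattern_matches:
        partitions[next(rr)] |= match       # whole match in one partition
    return partitions

t1, t2, t3 = ("a", "p", "b"), ("b", "p", "c"), ("a", "q", "c")
parts = allocate([{t1, t2}, {t2, t3}], 2)   # t2 occurs in both matches
print([sorted(p) for p in parts])           # t2 is replicated across partitions
```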
Article
As with general graph processing systems, partitioning data over a cluster of machines improves the scalability of graph database management systems. However, these systems incur additional network cost during the execution of a query workload due to inter-partition traversals. Workload-agnostic partitioning algorithms typically minimise the likelihood of any edge crossing partition boundaries. However, these partitioners are sub-optimal with respect to many workloads, especially queries that may require more frequent traversal of specific subsets of inter-partition edges. Furthermore, they are largely unsuited to operating incrementally on dynamic, growing graphs. We present a new graph partitioning algorithm, Loom, that operates on a stream of graph updates and continuously allocates new vertices and edges to partitions, taking into account a query workload of graph pattern expressions along with their relative frequencies. First, we capture the most common patterns of edge traversals which occur when executing queries. We then compare sub-graphs, which present themselves incrementally in the graph update stream, against these common patterns. Finally, we attempt to allocate each match to a single partition, reducing the number of inter-partition edges within frequently traversed sub-graphs and improving average query performance. Loom is extensively evaluated over several large test graphs with realistic query workloads and various orderings of the graph updates. We demonstrate that, given a workload, our prototype produces partitionings of significantly better quality than the existing streaming graph partitioning algorithms Fennel and LDG.
Chapter
The Resource Description Framework (RDF) is widely used to model web data. The scale and complexity of the modeled data pose performance challenges for RDF triple stores. Workload adaptation is one important strategy for dealing with those challenges at the storage level. In all current adaptation approaches, the workload statistics are built collectively, and the analysis process is not aware of old or recent items in the workloads. However, that does not reflect the timely trends that exist naturally in user queries and causes the analysis process to lag behind rapid workload development. In this work, we model the workload statistics as time series and apply well-known smoothing techniques, allowing the importance of the workload to decay over time. We apply the proposed approach to UniAdapt [1], which follows a unified and comprehensive storage adaptation process.

Keywords: RDF, Triple stores, Workload adaptation
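A minimal sketch of the decaying workload statistics described here: per-pattern query counters are exponentially smoothed per time window, so recent queries outweigh old ones. The smoothing factor and the window encoding are illustrative assumptions, not the chapter's exact model.

```python
# Exponential smoothing of workload statistics: after each window,
# new = ALPHA * count_in_window + (1 - ALPHA) * old, so a pattern that
# stops appearing decays toward zero instead of dominating forever.

ALPHA = 0.5  # weight of the newest window (illustrative choice)

def smooth(history, windows):
    """history: pattern -> smoothed count; windows: list of {pattern: count}."""
    for window in windows:
        for pattern in set(history) | set(window):
            history[pattern] = (ALPHA * window.get(pattern, 0)
                                + (1 - ALPHA) * history.get(pattern, 0.0))
    return history

stats = smooth({}, [{"q1": 10}, {"q1": 0, "q2": 8}, {"q2": 8}])
print(stats)  # q1 decays once it stops appearing; q2 stays high
```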