Figure - available from: The VLDB Journal
A workflow with changes shown in red: using partition statistics, our query optimizer computes data-induced predicates and outputs plans that read less input


Source publication
Article
Using data statistics, we convert predicates on a table into data-induced predicates (diPs) that apply on the joining tables. Doing so substantially speeds up multi-relation queries because the benefits of predicate pushdown can now apply beyond just the tables that have predicates. We use diPs to skip data exclusively during query optimization; i....
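The abstract's core idea — deriving a predicate on a joining table from partition statistics on the filtered table — can be illustrated with a minimal sketch. All names, data structures, and the single-range summarization below are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch of a data-induced predicate (diP): per-partition
# (min, max) statistics on a filter column identify which partitions
# of table A can satisfy a predicate; the join-key ranges of those
# surviving partitions then act as a predicate on table B, letting B
# skip partitions before the join executes.

def overlaps(rng, lo, hi):
    """True if the (min, max) range rng can intersect [lo, hi]."""
    return rng[0] <= hi and rng[1] >= lo

# Table A: per-partition (min, max) stats for the filter and join columns.
a_partitions = [
    {"filter_rng": (0, 10),  "join_rng": (100, 200)},
    {"filter_rng": (20, 40), "join_rng": (300, 400)},
    {"filter_rng": (50, 90), "join_rng": (150, 250)},
]

# Table B: per-partition (min, max) stats for the join column.
b_partitions = [(100, 180), (260, 420), (500, 600)]

# Predicate on A: filter_col BETWEEN 25 AND 35 keeps only partitions
# whose filter-column range can contain qualifying rows.
survivors = [p for p in a_partitions if overlaps(p["filter_rng"], 25, 35)]

# The diP: union of the survivors' join-key ranges, summarized here as
# one bounding range for simplicity.
lo = min(p["join_rng"][0] for p in survivors)
hi = max(p["join_rng"][1] for p in survivors)

# B skips every partition whose join-key range cannot meet the diP.
b_to_read = [r for r in b_partitions if overlaps(r, lo, hi)]
print(b_to_read)  # prints [(260, 420)]
```

Only one of B's three partitions survives, even though B itself carries no predicate — which is the sense in which diPs extend predicate-pushdown benefits beyond the tables that have predicates.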

Similar publications

Article
RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend...

Citations

Article
Large-scale graph processing and stream processing are two distinct paradigms for big-data computation. Graph processing deals with computation on graphs of billions of vertices and edges, but most large-scale graph-processing frameworks assume graphs that do not change over time; at the other end of the spectrum, stream processing operates on a continuous stream of data in real time. Modern graphs change very rapidly, and finding patterns in temporally evolving graphs can reveal insights that traditional graph computations cannot. We propose FlowGraph, a novel framework that finds patterns in dynamic, temporally evolving graphs. Computations on large-scale graphs are iterative and take multiple steps before final results can be calculated, unlike the one-shot computation of stream processing, so the most critical bottleneck of such a system is the time required to process a query. In this work, we propose a query optimization technique that reduces the time required to match a pattern, especially patterns related to the temporal evolution of the graph. For eight clauses, our method reduces execution time by 75%, and we show that this improvement is unaffected by the scale of the graph or by changes to the elements of the given clauses.
Article
Nowadays, the volume of online data stored on websites is constantly increasing, and as network bandwidth expands, users demand ever-faster query response times. To improve query efficiency, many large enterprises use database partitioning to divide huge tables and speed up queries. Partitioning methods based on query workloads have been successful, but they have limitations: they rely heavily on the current workload, and the resulting partitioning structure may need to be rebuilt when the workload changes, a process called database repartitioning. Most current methods repartition by restarting the partitioning module from scratch, which incurs significant overhead in industry because of the high complexity of the partitioning algorithm. Additionally, existing repartitioning models are often triggered by hand-set rules and cannot achieve truly adaptive repartitioning. To address these issues, we propose a multi-tree training sampling model built on the existing tree-shaped structure, which speeds up the qdtree partitioning algorithm and reduces the overhead of repartitioning; we also improve the qdtree structure to make it better suited to our method. For each query received by the partitioning model, a result-return-rate mechanism accumulates an evaluation of the current query against the partition structure, and repartitioning is initiated only after a certain threshold is reached. Furthermore, we use data-redundancy storage to further improve query speed.
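The result-return-rate trigger described in this abstract can be sketched minimally: each query's fit with the current partitioning is scored, the scores accumulate, and repartitioning is initiated only once a threshold is crossed. The class, the penalty formula, and the numbers below are illustrative assumptions, not the paper's actual mechanism:

```python
# Hedged sketch of a threshold-based repartitioning trigger. The
# result-return rate is taken here as the fraction of scanned rows a
# query actually returned; a low rate suggests the current partition
# structure served the query poorly, so we accumulate (1 - rate) as a
# penalty and trigger repartitioning once the penalty crosses a
# threshold, rather than repartitioning on every workload shift.

class RepartitionTrigger:
    def __init__(self, threshold):
        self.threshold = threshold
        self.penalty = 0.0

    def observe(self, rows_returned, rows_scanned):
        """Record one query; return True when repartitioning should start."""
        rate = rows_returned / rows_scanned
        self.penalty += 1.0 - rate
        return self.penalty >= self.threshold

trigger = RepartitionTrigger(threshold=1.5)
print(trigger.observe(900, 1000))  # good fit, penalty 0.1 -> False
print(trigger.observe(100, 1000))  # poor fit, penalty 1.0 -> False
print(trigger.observe(200, 1000))  # penalty 1.8 >= 1.5  -> True
```

Accumulating a penalty instead of reacting to a single bad query is what lets the model avoid the heavy cost of restarting the partitioning algorithm on transient workload noise.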