Figure - available from: The VLDB Journal
A workflow with changes shown in red: using partition statistics, our query optimizer computes data-induced predicates and outputs plans that read less input


Source publication
Article
Using data statistics, we convert predicates on a table into data-induced predicates (diPs) that apply on the joining tables. Doing so substantially speeds up multi-relation queries because the benefits of predicate pushdown can now apply beyond just the tables that have predicates. We use diPs to skip data exclusively during query optimization; i....
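The abstract's core idea — deriving a predicate on a joining table from partition statistics on the filtered table — can be illustrated with a minimal sketch. All names, data structures, and the single-range summarization below are illustrative assumptions, not the paper's actual implementation:

```python
# Hedged sketch of a data-induced predicate (diP): per-partition
# (min, max) statistics on a filter column identify which partitions
# of table A can satisfy a predicate; the join-key ranges of those
# surviving partitions then act as a predicate on table B, letting B
# skip partitions before the join executes.

def overlaps(rng, lo, hi):
    """True if the (min, max) range rng can intersect [lo, hi]."""
    return rng[0] <= hi and rng[1] >= lo

# Table A: per-partition (min, max) stats for the filter and join columns.
a_partitions = [
    {"filter_rng": (0, 10),  "join_rng": (100, 200)},
    {"filter_rng": (20, 40), "join_rng": (300, 400)},
    {"filter_rng": (50, 90), "join_rng": (150, 250)},
]

# Table B: per-partition (min, max) stats for the join column.
b_partitions = [(100, 180), (260, 420), (500, 600)]

# Predicate on A: filter_col BETWEEN 25 AND 35 keeps only partitions
# whose filter-column range can contain qualifying rows.
survivors = [p for p in a_partitions if overlaps(p["filter_rng"], 25, 35)]

# The diP: union of the survivors' join-key ranges, summarized here as
# one bounding range for simplicity.
lo = min(p["join_rng"][0] for p in survivors)
hi = max(p["join_rng"][1] for p in survivors)

# B skips every partition whose join-key range cannot meet the diP.
b_to_read = [r for r in b_partitions if overlaps(r, lo, hi)]
print(b_to_read)  # prints [(260, 420)]
```

Only one of B's three partitions survives, even though B itself carries no predicate — which is the sense in which diPs extend predicate-pushdown benefits beyond the tables that have predicates.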

Similar publications

Article
RDF has seen increased adoption in recent years, prompting the standardization of the SPARQL query language for RDF, and the development of local and distributed engines for processing SPARQL queries. This survey paper provides a comprehensive review of techniques and systems for querying RDF knowledge graphs. While other reviews on this topic tend...

Citations

Article
Large-scale graph processing and stream processing are two distinct paradigms for big-data computation. Graph processing deals with computation on graphs of billions of vertices and edges, but most large-scale graph-processing frameworks assume graphs that do not change over time; at the other end of the spectrum, stream processing operates on a continuous stream of data in real time. Modern graphs change very rapidly, and finding patterns in temporally evolving graphs can reveal insights that traditional graph computations cannot. We propose FlowGraph, a novel framework that finds patterns in dynamic, temporally evolving graphs. Computations on large-scale graphs are iterative and take multiple steps before final results can be calculated, unlike the one-shot computation of stream processing, so the most critical bottleneck of such a system is the time required to process a query. In this work, we propose a query optimization technique that reduces the time required to match a pattern, especially patterns related to the temporal evolution of the graph. For eight clauses, our method reduces execution time by 75%, and we show that this improvement is unaffected by the scale of the graph or by changes to the elements of the given clauses.
Article
Nowadays, the volume of online data stored on websites is constantly increasing, and as network bandwidth expands, users demand ever-faster query response times. To improve query efficiency, many large enterprises use database partitioning to divide huge tables and speed up queries. Partitioning methods based on query workloads have been successful, but they have limitations: they rely heavily on the current workload, and the resulting partitioning structure may need to be rebuilt when the workload changes, a process called database repartitioning. Most current methods repartition by restarting the partitioning module from scratch, which incurs significant overhead in industry because of the high complexity of the partitioning algorithm. Additionally, existing repartitioning models are often triggered by hand-set rules and cannot achieve truly adaptive repartitioning. To address these issues, we propose a multi-tree training sampling model built on the existing tree-shaped structure, which speeds up the qdtree partitioning algorithm and reduces the overhead of repartitioning; we also improve the qdtree structure to make it better suited to our method. For each query received by the partitioning model, a result-return-rate mechanism accumulates an evaluation of the current query against the partition structure, and repartitioning is initiated only after a certain threshold is reached. Furthermore, we use data-redundancy storage to further improve query speed.
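The result-return-rate trigger described in this abstract can be sketched minimally: each query's fit with the current partitioning is scored, the scores accumulate, and repartitioning is initiated only once a threshold is crossed. The class, the penalty formula, and the numbers below are illustrative assumptions, not the paper's actual mechanism:

```python
# Hedged sketch of a threshold-based repartitioning trigger. The
# result-return rate is taken here as the fraction of scanned rows a
# query actually returned; a low rate suggests the current partition
# structure served the query poorly, so we accumulate (1 - rate) as a
# penalty and trigger repartitioning once the penalty crosses a
# threshold, rather than repartitioning on every workload shift.

class RepartitionTrigger:
    def __init__(self, threshold):
        self.threshold = threshold
        self.penalty = 0.0

    def observe(self, rows_returned, rows_scanned):
        """Record one query; return True when repartitioning should start."""
        rate = rows_returned / rows_scanned
        self.penalty += 1.0 - rate
        return self.penalty >= self.threshold

trigger = RepartitionTrigger(threshold=1.5)
print(trigger.observe(900, 1000))  # good fit, penalty 0.1 -> False
print(trigger.observe(100, 1000))  # poor fit, penalty 1.0 -> False
print(trigger.observe(200, 1000))  # penalty 1.8 >= 1.5  -> True
```

Accumulating a penalty instead of reacting to a single bad query is what lets the model avoid the heavy cost of restarting the partitioning algorithm on transient workload noise.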