Figure 3: 3-dimensional angle-based partitioning

Source publication
Conference Paper
Full-text available
Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a paral...

Contexts in source publication

Context 1
... technique first maps the Cartesian coordinate space into a hyperspherical space, and then partitions the data space based on the angular coordinates into N partitions. A visualization of the angle-based partitioning technique for a 3-dimensional space is depicted in Figure 3. ...
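As a concrete illustration of this mapping, here is a minimal sketch (not code from the paper; the function name and the convention tan φi = ||(x_{i+1}, ..., x_d)|| / x_i are assumptions based on the standard hyperspherical transformation):

```python
import math

def to_hyperspherical(point):
    """Map a d-dimensional Cartesian point with non-negative coordinates
    to hyperspherical coordinates (r, phi_1, ..., phi_{d-1}), using the
    standard transformation tan(phi_i) = sqrt(x_{i+1}^2 + ... + x_d^2) / x_i."""
    d = len(point)
    r = math.sqrt(sum(x * x for x in point))
    phis = []
    for i in range(d - 1):
        tail = math.sqrt(sum(x * x for x in point[i + 1:]))
        # For non-negative data every angle lies in [0, pi/2].
        phis.append(math.atan2(tail, point[i]))
    return r, phis
```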
Context 2
... leads to a partitioning where all points that have similar angular coordinates fall in the same partition independently of the radial coordinate, i.e., how far the point is from the origin. For example, consider the 3-dimensional space depicted in Figure 3. The data space is divided into N = 9 partitions using the angular coordinates φ1 and φ2. ...
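A minimal sketch of the resulting partition assignment for d = 3 and N = 9, reusing to_hyperspherical from the sketch above (equal-angle splits are used here for simplicity; as the excerpts below explain, equal angles are not equi-volume, so the real boundaries are computed differently):

```python
def angle_partition(point, k=3, phi_max=math.pi / 2):
    """Assign a 3-dimensional point to one of k*k = N angular partitions
    by splitting each of the two angular coordinates phi1, phi2 into k
    ranges. The radial coordinate r is deliberately ignored, so points
    with similar angles land in the same partition regardless of their
    distance from the origin."""
    _, (phi1, phi2) = to_hyperspherical(point)
    i = min(int(phi1 // (phi_max / k)), k - 1)
    j = min(int(phi2 // (phi_max / k)), k - 1)
    return i * k + j  # partition id in [0, N - 1]
```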
Context 3
... differs for the angle-based partitioning scheme, where the partitioning space (angular coordinates) is derived through a perspective projection of the data space. Consider, for example, Figure 3, where the partitioning space is the surface of the sphere, while the data space is the sphere itself. It is not sufficient to split the angular coordinates into equal parts in order to obtain equi-volume parts of the data space projected into each partition. ...
Context 4
... this is similar to applying the aforementioned grid partitioning scheme on the surface of the part of the hypersphere that contains all points in the dataset. For example, in the case of d = 3 and N = 9 (see Figure 3), there are two angular coordinates (φ1 and φ2) and each of them is divided into 3 parts, thus resulting in 9 partitions. The data space is a part (namely the 1/8-th) of the sphere that encompasses the data points, with volume equal to V3 = πL³/6. ...
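As a quick sanity check of that volume (not from the excerpt itself): the data space is one eighth of a full ball of radius L, so

```latex
V_3 = \frac{1}{8} \cdot \frac{4}{3}\pi L^3 = \frac{\pi L^3}{6}.
```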
Context 5
... of dividing the data space into N equi-volume partitions at once, we separately divide for each angular coordinate φi the data space into k equi-volume partitions. For example, in the case of Figure 3 with d = 3 and N = 9, the data space could be divided into 3 equi-volume slices based on the angular coordinate φ1, which results in the boundary values φ1¹ = 48.24 and φ1² = 70.55. ...
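A minimal numerical sketch of how such equi-volume boundaries can be found, assuming the standard spherical volume element whose φ1 Jacobian factor is sin φ1 (this assumption reproduces values close to the 48.24 and 70.55 quoted above; the small gap presumably comes from the paper's exact formulation and rounding):

```python
import math

def equi_volume_boundaries(k, weight=math.sin, lo=0.0, hi=math.pi / 2, steps=100_000):
    """Numerically invert the cumulative volume along one angular
    coordinate to find the k-1 boundary angles (in degrees) that split
    the data space into k equi-volume slices."""
    dx = (hi - lo) / steps
    total = sum(weight(lo + (i + 0.5) * dx) for i in range(steps)) * dx
    bounds, acc = [], 0.0
    for i in range(steps):
        acc += weight(lo + (i + 0.5) * dx) * dx
        if len(bounds) < k - 1 and acc >= total * (len(bounds) + 1) / k:
            bounds.append(math.degrees(lo + (i + 1) * dx))
    return bounds

print(equi_volume_boundaries(3))  # ~[48.19, 70.53] degrees
```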

Similar publications

Article
Full-text available
Performing online selective queries against graphs is a challenging problem due to the unbounded nature of graph queries, which leads to poor computation locality. It becomes even more difficult when a graph is too large to fit in memory. Although there have been emerging efforts on managing large graphs in a distributed and parallel setting, e.g....

Citations

... ABSP employs a novel strategy of utilizing angles to partition the space into distinct regions. By leveraging this approach, each skyline within the partitioned space can be individually identified, and subsequently, these skylines are amalgamated to form a unified skyline representation [16]. However, it is worth noting that ABSP possesses a drawback in the form of prolonged angle computation, which can contribute to increased processing time and hinder the overall efficiency of the method. ...
Article
Full-text available
In recent years, people have been buying custom-built PCs based on the performance they want and what they will use them for. However, there are many challenges for non-technical users when purchasing a custom-built PC. Not only is the terminology of computer devices unfamiliar to non-experts, but there are also many specifications for different computer devices that need to be considered. Therefore, this paper proposes a method for recommending appropriate device models when purchasing custom-built PCs using a skyline. Because different computer devices have different specifications, we need a method that takes multiple attributes into account. Skyline querying is a technique that considers multiple attributes of an object and indexes them in order of user satisfaction. A grid skyline is a technique that uses grid-based partitioning to reduce the number of dominance-relationship calculations between objects in the existing skyline technique, thus reducing the index construction time. We measured the similarity between the results of the grid skyline and the leaderboard for each model of computer device. In this experiment, the average similarity score against the leaderboard categorized by computer device model was 88 out of 100, showing that the results closely match the actual leaderboard.
... It would be interesting to implement additional algorithms based on other paradigms like ordering [3,4,10,11,17,18,33] or index structures [28,29,40] and to evaluate their strengths and weaknesses in the Apache Spark context. Also, for the partitioning scheme, further options such as angle-based partitioning [25,42] are worth trying. Further specialized algorithms that require a deeper modification of Spark (such as Z-order partitioning [41], which requires the computation of a Z-address for each tuple [30]) are more long-term projects. ...
Preprint
Full-text available
Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from big amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy to use syntax to input skyline queries. Moreover, our empirical results show that this integrated solution of skyline queries by far outperforms a solution based on rewriting into standard SQL.
... For example, the array in Fig. 1 has two dimensions, D 1 (latitude) and D 2 (longitude), and two attributes, A 1 (living population) and A 2 (real estate price). Cell (0, 1) has (16,15) attribute values while cell (0, 2) is an empty cell. ...
... (8) Shuffle: After representative computation, the coordinator gathered all subarrays from all workers, randomly divided the entire set of subarrays into disjoint subsets of equal size, and sent each subset back to a worker without any further operations. The shuffle followed the guidance of distributed methods [16, 17] on how to partition data. ...
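A minimal sketch of that shuffle step (hypothetical names; not code from the cited work):

```python
import random

def shuffle_subarrays(subarrays, num_workers, seed=0):
    """Gather all subarrays at the coordinator, randomly split them into
    num_workers disjoint subsets of roughly equal size, and return one
    subset per worker, mirroring the shuffle step described above."""
    items = list(subarrays)
    random.Random(seed).shuffle(items)
    return [items[w::num_workers] for w in range(num_workers)]
```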
... Tradeoff [63] and FDS [64] focused on reducing communication costs through more rounds. Data-partitioning-based distributed skyline processing has also been studied in [16, 17, 65-67]. Distributed skyline query processing in MapReduce has been studied in [17, 18, 68-71]. ...
Article
Full-text available
Large-scale spatial data have been generated in various fields such as scientific domains and location-based services. Array databases, which model a space as an array, have become one of the means of managing such spatial data. Each cell in an array tends to interact with its neighboring cells along dimensions such as latitude and longitude; therefore, instead of considering a single cell, some applications need to consider the concept of a subarray. In addition, each cell has several attribute values (such as temperature and price) that indicate its features. Based on these two observations, we propose a new type of query, the subarray skyline, that provides a way to find meaningful subarrays or filter out less meaningful subarrays considering attributes. We also introduce an efficient processing method, ReSKY, for subarray skyline query processing. To handle large-scale spatial data, we extend ReSKY to distributed processing. We also propose another version of ReSKY that reduces memory usage during query processing. Through extensive experiments using an array database and real datasets, we show that ReSKY outperforms the existing techniques.
... Based on this point, we use reference vectors to conduct hyper-spherical partitioning to generate a set of partitions. This approach was introduced for optimal data partitioning in parallel skyline queries [56], [57]. The general procedure is as follows: ...
... In discontinuous MaOPs, the PF is composed of several disconnected segments; if there is no intersection between a weight vector and the PF, the best solution obtained by MOEA/D depends on the nearest point Q of the corresponding PF segment, as shown in Fig. 1(a). In contrast, the angle-based partitioning scheme [57] can cover the disconnected segment with higher probability, and the algorithm then uses a specific preference indicator to find the relatively feasible points P and Q on the PF in the partition, as shown in Fig. 1. ...
... As a result, the secondary selection criterion becomes more significant for the survival of individuals. The recent pruning-power measure has great potential to tackle this issue [57]; it calculates the dominance area of a solution over its located partition, which can be used as an indicator to drive the evolution of the population. ...
Article
This paper presents a multi-objective search-based approach to generate balanced maps for real-time strategy games. First, an angle-based pruning-power-indicator-guided evolutionary algorithm is proposed to solve many-objective optimization problems. This algorithm divides the hyper-spherical coordinate space into a set of local partitions, where the pruning power indicator is used to assess the dominance ability of a solution over its located partition, and the radius-penalized angle calculation indicator guides the evolution of the population along a specific partition direction. In this way, both the convergence and diversity of solutions are considered, and the selection pressure toward Pareto fronts is strengthened. We show the effectiveness of the proposed strategies on a set of many-objective benchmark functions. Then, the proposed algorithm is used to generate game maps, a real-world problem derived from procedural content generation, in which four objective functions guide the map generation of MegaGlest based on the principles of fairness, playability, strategy, and interestingness of games. Experimental results on four map generation instances verify the competitiveness and effectiveness of the proposed algorithm.
... As time goes by, massive amounts of data are generated. To obtain the skyline from such massive data, many works [1,7,15,24,27,28] divide the dataset and compute the (reverse) skyline results in parallel. Afrati et al. [1], Mullesgaard et al. [15], Zhang et al. [28], and Islam et al. [7] adopt the grid partition strategy, a space-driven partition method. ...
... Since skyline objects cannot be dominated by others, the OSP scheme can use spatial dominance relationships to reduce the number of subspaces processed. Vlachou et al. [24] adopt the angle partition strategy, which divides a d-dimensional dataset into subsets by applying a grid partition method over the d − 1 angular coordinates. The angle partition strategy is motivated by the observation that skyline tuples are located near the origin, since a smaller value of each attribute is preferred. ...
Article
Full-text available
Given a set of existing products in the market and a set of customer preferences, we set a price for a specific product selected from a pool of candidate products to launch to market to gain the most profit. A customer preference represents his/her basic requirements. The dynamic skyline of a customer preference identifies the products that the customer may purchase. Each time the price of a candidate product is adjusted, it needs to compete with all of the existing products to determine whether it can be one of the dynamic skyline products of some customer preferences. To compute in parallel, we use a Voronoi-Diagram-based partitioning method to separate the set of existing products and that of customer preferences into cells. For these cells, a large number of combinations can be generated. For each price under consideration of a candidate product, we process all the combinations in parallel to determine whether this candidate product can be one of the dynamic skyline products of the customer preferences. We then integrate the results to decide the price for each candidate product to achieve the most profit. To further improve the performance, we design two efficient pruning strategies to avoid computing all combinations. A set of experiments using real and synthetic datasets are performed and the experiment results reveal that the pruning strategies are effective.
... Most methods have proposed grid-based partitioning or angle-based partitioning techniques. The grid-based partitioning technique tends to be widely adopted in distributed environments for computation of the skyline [24]. In this technique, the data space is split into several partitions to divide the workload between different machines. ...
... Recent works are taking an increasing interest in this method. It was initially introduced in [24]. This algorithm uses hyperspherical coordinates of points for partitioning the space. ...
... The angle-based partitioning method was first proposed in [24]. The approach primarily aims to increase the performance of parallel skyline query processing. ...
Article
Full-text available
In recent years, numerous applications have been continuously generating large amounts of uncertain data. Advanced analysis queries such as skyline operators are essential for extracting interesting objects from vast uncertain datasets. Recently, the MapReduce system has been widely used in the area of big data analysis. Although the probabilistic skyline query is not decomposable, this does not mean it cannot be implemented in the MapReduce framework. This paper proposes an effective parallel method called parallel computation of probabilistic skyline query (PCPS) that can compute the probabilistic skyline set in one MapReduce pass. The proposed method takes into account the critical sections and detects data with a high probability of existence through a proposed smart sampling algorithm. PCPS implements a new approach to the fair allocation of input data. The experimental results indicate that our proposed approach can not only reduce the processing time of probabilistic skyline queries, but also achieve fair precision with varying dimensionality degrees.
... Finally, the results are merged to get the global result. Data partitioning is employed by many partitioning schemes, such as grid partitioning [30] and angular partitioning [31]. ...
Preprint
Full-text available
It has recently become a critical issue to provide software development in a service-based conceptual style for business companies. As a powerful technology for service-oriented computing, the composition of web services is investigated. This offers great opportunities to improve IT industries and business processes by forming new value-added services that satisfy users' complex requirements. Unfortunately, the service composition process faces many challenges. These include the difficulty of satisfying users' complex demands, maintaining performance that matches the quality of service (QoS) requirements, and reducing the search space for missing or changeable QoS values. Accordingly, this paper proposes a cloud-based QoS provisioning service composition (CQPC) framework to address these challenges. To prove the concept and the applicability of the CQPC framework, a Hybrid Bio-Inspired QoS provisioning (HBIQP) technique is presented for the operation of the CQPC framework modules. The solution space is reduced by utilizing skyline concepts to achieve faster execution times and keep only reliable and interesting services. The CQPC framework is equipped with two proposed algorithms: (i) the modified highly accurate prediction (MHAP) algorithm to enhance the prediction of the QoS values of the services participating in the composition process, and (ii) the MapReduce fruit fly particle swarm optimization (MR-FPSO) algorithm to handle the composition of web services for large-scale data in the cloud environment. The experimental results demonstrate that the HBIQP technique meets the performance metrics better than other state-of-the-art techniques in terms of average fitness value, accuracy, and execution time.
... There are many different variations of skyline queries trying to solve different problems. For example, ε-skyline [22] allows users to control the number of output skyline points by relaxing or restricting the dominance property; k-skyband queries [20] consider that multiple dominated points may be an option; the angle-based approach [23] computes the skyline by modifying the dominance property; the authors in [24] try to find an approximation of the original skyline; the metric skyline [25] is useful for finding the strongest DNA sequence similarity, where a string is a more appropriate value representation than a vector in Euclidean space; and [26] studies the cardinality and complexity of skyline queries in anti-correlated distributions. For the cases where the set of skyline points is too large, the representative skyline was proposed [27]. ...
Article
Full-text available
One of the most common tasks nowadays in big data environments is the need to classify large amounts of data. There are numerous classification models designed to perform best in different environments and datasets, each with its advantages and disadvantages. However, when dealing with big data, their performance is significantly degraded because they are not designed for, or even capable of, handling very large datasets. The current approach is based on a novel proposal of exploiting the dynamics of skyline queries to efficiently identify the decision boundary and classify big data. A comparison against the popular k-nearest neighbor (k-NN), support vector machines (SVM) and naïve Bayes classification algorithms shows that the proposed method is faster than the k-NN and the SVM. The novelty of this method lies in the fact that only a small number of computations are needed to make a prediction, while its full potential is revealed on very large datasets.
... Therefore, various data-partitioning techniques have been proposed. In particular, Vlachou et al. [18] proposed a skyline query method that has received significant attention and is based on angle-based space partitioning (ABSP). Grid partitioning, which is a traditional partitioning approach, can potentially create a partition that does not contain a skyline. ...
Article
Full-text available
The skyline query has recently attracted a considerable amount of research interest in several fields. The query conducts computations using the domination test, where “domination” means that a data point does not have a worse value than others in any dimension, and has a better value in at least one dimension. Therefore, the skyline query can be used to construct efficient queries based on data from a variety of fields. However, when the number of dimensions or the amount of data increases, naïve skyline queries lead to a degradation in overall performance owing to the higher cost of comparisons among data. Several methods using index structures have been proposed to solve this problem but have not improved the performance of skyline queries because their indices are heavily influenced by the dimensionality and data amount. Therefore, in this study, we propose HI-Sky, a method that can perform quick skyline computations by using the hash index to overcome the above shortcomings. HI-Sky effectively manages data through the hash index and significantly improves performance by effectively eliminating unnecessary data comparisons when computing the skyline. We provide the theoretical background for HI-Sky and verify its improvement in skyline query performance through comparisons with prevalent methods.
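The domination test described in this abstract can be written down directly; a minimal sketch, assuming the usual smaller-is-better convention:

```python
def dominates(p, q):
    """p dominates q: p is no worse than q in every dimension and
    strictly better in at least one (smaller values preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

# Example: (1, 2) dominates (2, 2), but (1, 3) and (3, 1) are incomparable.
assert dominates((1, 2), (2, 2))
assert not dominates((1, 3), (3, 1)) and not dominates((3, 1), (1, 3))
```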
... In order to exploit this architecture, the authors in [13] published APSkyline for skyline computation in a multi-core system. It uses the angle-based partitioning model [20]; APSkyline then adopts the partition-execute-merge framework, where the dataset is split into N partitions (one for each core). The local skyline set for each partition is computed, and finally the skyline set is determined by merging these local skyline sets. ...
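A minimal sketch of that partition-execute-merge paradigm (sequential here for brevity, where APSkyline runs one partition per core; it reuses the dominates predicate sketched above, and assign is any partitioning function such as the angle-based one):

```python
def local_skyline(points):
    """Naive skyline via pairwise dominance tests (quadratic; real
    systems use far more efficient local algorithms)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

def partition_execute_merge(points, assign, n_parts):
    """Split the data into N partitions, compute each local skyline,
    then merge by taking the skyline of the union of local skylines."""
    parts = [[] for _ in range(n_parts)]
    for p in points:
        parts[assign(p)].append(p)
    candidates = [s for part in parts for s in local_skyline(part)]
    return local_skyline(candidates)
```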
Article
Full-text available
The skyline computation is very important in the field of decision making. It provides solutions to help choose among a large dataset with contradictory information, especially when the implemented solution is progressive. As the need for rapid solutions grows, it is worthwhile to exploit the performance of new machines and platforms. In this paper, we present a new divide-and-conquer solution for computing the skyline on GPU (Graphics Processing Units) cards. The proposed partitioning adapts to the characteristics of the GPU, which leads to well-balanced computation and avoids overflows. The dominance tests are performed on point components in parallel, and dominated points are discarded early, unlike other solutions, which keep them for subsequent loops. This comparison scheme avoids thread idleness. We compare our solution with other similar solutions on the same datasets. Experiments show that our proposal is better in terms of computing time and exploitation of GPU parallelism.