Figure 3: 3-dimensional angle-based partitioning

Source publication
Conference Paper
Full-text available
Recently, skyline queries have attracted much attention in the database research community. Space partitioning techniques, such as recursive division of the data space, have been used for skyline query processing in centralized, parallel and distributed settings. Unfortunately, such grid-based partitioning is not suitable in the case of a paral...

Contexts in source publication

Context 1
... technique first maps the Cartesian coordinate space into a hyperspherical space, and then partitions the data space based on the angular coordinates into N partitions. A visualization of the angle-based partitioning technique for a 3-dimensional space is depicted in Figure 3. ...
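As a concrete illustration of this mapping, here is a minimal sketch (not code from the paper; the function name and the convention tan φi = ||(x_{i+1}, ..., x_d)|| / x_i are assumptions based on the standard hyperspherical transformation):

```python
import math

def to_hyperspherical(point):
    """Map a d-dimensional Cartesian point with non-negative coordinates
    to hyperspherical coordinates (r, phi_1, ..., phi_{d-1}), using the
    standard transformation tan(phi_i) = sqrt(x_{i+1}^2 + ... + x_d^2) / x_i."""
    d = len(point)
    r = math.sqrt(sum(x * x for x in point))
    phis = []
    for i in range(d - 1):
        tail = math.sqrt(sum(x * x for x in point[i + 1:]))
        # For non-negative data every angle lies in [0, pi/2].
        phis.append(math.atan2(tail, point[i]))
    return r, phis
```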
Context 2
... leads to a partitioning where all points that have similar angular coordinates fall in the same partition independently of the radial coordinate, i.e., how far the point is from the origin. For example, consider the 3-dimensional space depicted in Figure 3. The data space is divided into N = 9 partitions using the angular coordinates φ1 and φ2. ...
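A minimal sketch of the resulting partition assignment for d = 3 and N = 9, reusing to_hyperspherical from the sketch above (equal-angle splits are used here for simplicity; as the excerpts below explain, equal angles are not equi-volume, so the real boundaries are computed differently):

```python
def angle_partition(point, k=3, phi_max=math.pi / 2):
    """Assign a 3-dimensional point to one of k*k = N angular partitions
    by splitting each of the two angular coordinates phi1, phi2 into k
    ranges. The radial coordinate r is deliberately ignored, so points
    with similar angles land in the same partition regardless of their
    distance from the origin."""
    _, (phi1, phi2) = to_hyperspherical(point)
    i = min(int(phi1 // (phi_max / k)), k - 1)
    j = min(int(phi2 // (phi_max / k)), k - 1)
    return i * k + j  # partition id in [0, N - 1]
```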
Context 3
... differs for the angle-based partitioning scheme, where the partitioning space (angular coordinates) is derived through a perspective projection of the data space. Consider, for example, Figure 3, where the partitioning space is the surface of the sphere, while the data space is the sphere itself. It is not sufficient to split the angular coordinates into equal parts in order to obtain equi-volume parts of the data space projected into each partition. ...
Context 4
... this is similar to applying the aforementioned grid partitioning scheme on the surface of the part of the hypersphere that contains all points in the dataset. For example, in the case of d = 3 and N = 9 (see Figure 3), there are two angular coordinates (φ1 and φ2) and each of them is divided into 3 parts, thus resulting in 9 partitions. The data space is a part (namely the 1/8-th) of the sphere that encompasses the data points, with volume equal to V3 = πL³/6. ...
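As a quick sanity check of that volume (not from the excerpt itself): the data space is one eighth of a full ball of radius L, so

```latex
V_3 = \frac{1}{8} \cdot \frac{4}{3}\pi L^3 = \frac{\pi L^3}{6}.
```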
Context 5
... of dividing the data space into N equi-volume partitions at once, we separately divide for each angular coordinate φi the data space into k equi-volume partitions. For example, in the case of Figure 3 with d = 3 and N = 9, the data space could be divided into 3 equi-volume slices based on the angular coordinate φ1, which results in the boundary values φ1¹ = 48.24 and φ1² = 70.55. ...
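A minimal numerical sketch of how such equi-volume boundaries can be found, assuming the standard spherical volume element whose φ1 Jacobian factor is sin φ1 (this assumption reproduces values close to the 48.24 and 70.55 quoted above; the small gap presumably comes from the paper's exact formulation and rounding):

```python
import math

def equi_volume_boundaries(k, weight=math.sin, lo=0.0, hi=math.pi / 2, steps=100_000):
    """Numerically invert the cumulative volume along one angular
    coordinate to find the k-1 boundary angles (in degrees) that split
    the data space into k equi-volume slices."""
    dx = (hi - lo) / steps
    total = sum(weight(lo + (i + 0.5) * dx) for i in range(steps)) * dx
    bounds, acc = [], 0.0
    for i in range(steps):
        acc += weight(lo + (i + 0.5) * dx) * dx
        if len(bounds) < k - 1 and acc >= total * (len(bounds) + 1) / k:
            bounds.append(math.degrees(lo + (i + 1) * dx))
    return bounds

print(equi_volume_boundaries(3))  # ~[48.19, 70.53] degrees
```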

Similar publications

Article
Full-text available
Performing online selective queries against graphs is a challenging problem due to the unbounded nature of graph queries, which leads to poor computation locality. It becomes even more difficult when a graph is too large to fit in memory. Although there have been emerging efforts on managing large graphs in a distributed and parallel setting, e.g....

Citations

... ABSP employs a novel strategy of utilizing angles to partition the space into distinct regions. By leveraging this approach, each skyline within the partitioned space can be individually identified, and subsequently, these skylines are amalgamated to form a unified skyline representation [16]. However, it is worth noting that ABSP possesses a drawback in the form of prolonged angle computation, which can contribute to increased processing time and hinder the overall efficiency of the method. ...
Article
Full-text available
In recent years, people have been buying custom-built PCs based on the performance they want and what they will use them for. However, there are many challenges for non-technical users when purchasing a custom-built PC. Not only is the terminology of computer devices unfamiliar to non-experts, but there are also many specifications for different computer devices that need to be considered. Therefore, this paper proposes a method for recommending appropriate device models when purchasing custom-built PCs using a skyline. Because different computer devices have different specifications, we need a method that takes multiple attributes into account. Skyline querying is a technique that considers multiple attributes of an object and indexes them in order of user satisfaction. A grid skyline is a technique that uses grid-based partitioning to reduce the number of dominance-relationship calculations between objects in the existing skyline technique, thus reducing the index construction time. We measured the similarity between the results of the grid skyline and the leaderboard for each model of computer device. In this experiment, the average similarity score against the leaderboard categorized by computer device model was 88 out of 100, showing that the results closely match the actual leaderboard.
... It would be interesting to implement additional algorithms based on other paradigms like ordering [3,4,10,11,17,18,33] or index structures [28,29,40] and to evaluate their strengths and weaknesses in the Apache Spark context. Also, for the partitioning scheme, further options such as angle-based partitioning [25,42] are worth trying. Further specialized algorithms that require a deeper modification of Spark (such as Z-order partitioning [41], which requires the computation of a Z-address for each tuple [30]) are more long-term projects. ...
Preprint
Full-text available
Skyline queries are frequently used in data analytics and multi-criteria decision support applications to filter relevant information from big amounts of data. Apache Spark is a popular framework for processing big, distributed data. The framework even provides a convenient SQL-like interface via the Spark SQL module. However, skyline queries are not natively supported and require tedious rewriting to fit the SQL standard or Spark's SQL-like language. The goal of our work is to fill this gap. We thus provide a full-fledged integration of the skyline operator into Spark SQL. This allows for a simple and easy to use syntax to input skyline queries. Moreover, our empirical results show that this integrated solution of skyline queries by far outperforms a solution based on rewriting into standard SQL.
... For example, the array in Fig. 1 has two dimensions, D 1 (latitude) and D 2 (longitude), and two attributes, A 1 (living population) and A 2 (real estate price). Cell (0, 1) has (16,15) attribute values while cell (0, 2) is an empty cell. ...
... (8) Shuffle: After representative computation, the coordinator gathered all subarrays from all workers, randomly divided the entire set of subarrays into disjoint subsets of equal size, and sent each subset back to a worker without any further operations. The shuffle followed the guidance of distributed methods [16, 17] on how to partition data. ...
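A minimal sketch of that shuffle step (hypothetical names; not code from the cited work):

```python
import random

def shuffle_subarrays(subarrays, num_workers, seed=0):
    """Gather all subarrays at the coordinator, randomly split them into
    num_workers disjoint subsets of roughly equal size, and return one
    subset per worker, mirroring the shuffle step described above."""
    items = list(subarrays)
    random.Random(seed).shuffle(items)
    return [items[w::num_workers] for w in range(num_workers)]
```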
... Tradeoff [63] and FDS [64] focused on reducing communication costs through more rounds. Data-partitioning-based distributed skyline processing has also been studied in [16, 17, 65-67]. Distributed skyline query processing in MapReduce has been studied in [17, 18, 68-71]. ...
Article
Full-text available
Large-scale spatial data have been generated in various fields such as scientific domains and location-based services. Array databases, which model a space as an array, have become one of the means of managing such spatial data. Each cell in an array tends to interact with its neighboring cells along dimensions such as latitude and longitude; therefore, instead of considering a single cell, some applications need to consider the concept of a subarray. In addition, each cell has several attribute values (such as temperature and price) that indicate its features. Based on these two observations, we propose a new type of query, the subarray skyline, that provides a way to find meaningful subarrays or filter out less meaningful subarrays considering attributes. We also introduce an efficient processing method, ReSKY, for subarray skyline query processing. To handle large-scale spatial data, we extend ReSKY to distributed processing. We also propose another version of ReSKY that reduces memory usage during query processing. Through extensive experiments using an array database and real datasets, we show that ReSKY outperforms the existing techniques.
... Based on this point, we use reference vectors to conduct hyper-spherical partitioning to generate a set of partitions. This approach was introduced for optimal data partitioning in parallel skyline queries [56], [57]. The general procedure is as follows: ...
... In discontinuous MaOPs, the PF is composed of several disconnected segments; if there is no intersection between a weight vector and the PF, the best solution obtained by MOEA/D depends on the nearest point Q of the corresponding PF segment, as shown in Fig. 1(a). In contrast, the angle-based partitioning scheme [57] can cover the disconnected segment with higher probability, and the algorithm then uses a specific preference indicator to find the relatively feasible points P and Q on the PF in the partition, as shown in Fig. 1. ...
... As a result, the secondary selection criterion becomes more significant for the survival of individuals. The recent pruning-power measure has great potential to tackle this issue [57]; it calculates the dominance area of a solution over its located partition, which can be used as an indicator to drive the evolution of the population. ...
Article
This paper presents a multi-objective search-based approach to generate balanced maps for real-time strategy games. First, an angle-based pruning-power-indicator-guided evolutionary algorithm is proposed to solve many-objective optimization problems. This algorithm divides the hyper-spherical coordinate space into a set of local partitions, where the pruning power indicator is used to assess the dominance ability of a solution over its located partition, and the radius-penalized angle calculation indicator guides the evolution of the population along a specific partition direction. In this way, both the convergence and diversity of solutions are considered, and the selection pressure toward Pareto fronts is strengthened. We show the effectiveness of the proposed strategies on a set of many-objective benchmark functions. Then, the proposed algorithm is used to generate game maps, a real-world problem derived from procedural content generation, in which four objective functions guide the map generation of MegaGlest based on the principles of fairness, playability, strategy, and interestingness of games. Experimental results on four map generation instances verify the competitiveness and effectiveness of the proposed algorithm.
... As time goes by, massive amounts of data are generated. To obtain the skyline from such massive data, many works [1,7,15,24,27,28] divide the dataset and compute the (reverse) skyline results in parallel. Afrati et al. [1], Mullesgaard et al. [15], Zhang et al. [28], and Islam et al. [7] adopt the grid partition strategy, a space-driven partition method. ...
... Since skyline objects cannot be dominated by others, the OSP scheme can use spatial dominance relationships to reduce the number of subspaces processed. Vlachou et al. [24] adopt the angle partition strategy, which divides a d-dimensional dataset into subsets by applying a grid partition method over the d − 1 angular coordinates. The angle partition strategy is motivated by the observation that skyline tuples are located near the origin, since a smaller value of each attribute is preferred. ...
Article
Full-text available
Given a set of existing products in the market and a set of customer preferences, we set a price for a specific product selected from a pool of candidate products to launch to market to gain the most profit. A customer preference represents his/her basic requirements. The dynamic skyline of a customer preference identifies the products that the customer may purchase. Each time the price of a candidate product is adjusted, it needs to compete with all of the existing products to determine whether it can be one of the dynamic skyline products of some customer preferences. To compute in parallel, we use a Voronoi-Diagram-based partitioning method to separate the set of existing products and that of customer preferences into cells. For these cells, a large number of combinations can be generated. For each price under consideration of a candidate product, we process all the combinations in parallel to determine whether this candidate product can be one of the dynamic skyline products of the customer preferences. We then integrate the results to decide the price for each candidate product to achieve the most profit. To further improve the performance, we design two efficient pruning strategies to avoid computing all combinations. A set of experiments using real and synthetic datasets are performed and the experiment results reveal that the pruning strategies are effective.
... Most methods have proposed grid-based partitioning or angle-based partitioning techniques. The grid-based partitioning technique tends to be widely adopted in distributed environments for computation of the skyline [24]. In this technique, the data space is split into several partitions to divide the workload between different machines. ...
... Recent works are taking an increasing interest in this method. It was initially introduced in [24]. This algorithm uses hyperspherical coordinates of points for partitioning the space. ...
... The angle-based partitioning method was first proposed in [24]. The approach primarily aims to increase the performance of parallel skyline query processing. ...
Article
Full-text available
In recent years, numerous applications have been continuously generating large amounts of uncertain data. Advanced analysis queries such as skyline operators are essential for extracting interesting objects from vast uncertain datasets. Recently, the MapReduce system has been widely used in the area of big data analysis. Although the probabilistic skyline query is not decomposable, this does not mean it cannot be implemented in the MapReduce framework. This paper proposes an effective parallel method called parallel computation of probabilistic skyline query (PCPS) that can compute the probabilistic skyline set in one MapReduce pass. The proposed method takes into account the critical sections and detects data with a high probability of existence through a proposed smart sampling algorithm. PCPS implements a new approach to the fair allocation of input data. The experimental results indicate that our proposed approach can not only reduce the processing time of probabilistic skyline queries, but also achieve fair precision with varying dimensionality degrees.
... Finally, the results are merged to get the global result. Data partitioning is employed by many partitioning schemes, such as grid partitioning [30] and angular partitioning [31]. ...
Preprint
Full-text available
It has recently become a critical issue to provide software development in a service-based conceptual style for business companies. As a powerful technology for service-oriented computing, the composition of web services is investigated. This offers great opportunities to improve IT industries and business processes by forming new value-added services that satisfy users' complex requirements. Unfortunately, the service composition process faces many challenges. These include the difficulty of satisfying users' complex demands, maintaining performance that matches the quality of service (QoS) requirements, and reducing the search space for missing or changeable QoS values. Accordingly, this paper proposes a cloud-based QoS provisioning service composition (CQPC) framework to address these challenges. To prove the concept and the applicability of the CQPC framework, a Hybrid Bio-Inspired QoS provisioning (HBIQP) technique is presented for the operation of the CQPC framework modules. The solution space is reduced by utilizing skyline concepts to achieve faster execution times and keep only reliable and interesting services. The CQPC framework is equipped with two proposed algorithms: (i) the modified highly accurate prediction (MHAP) algorithm to enhance the prediction of the QoS values of the services participating in the composition process, and (ii) the MapReduce fruit fly particle swarm optimization (MR-FPSO) algorithm to handle the composition of web services for large-scale data in the cloud environment. The experimental results demonstrate that the HBIQP technique meets the performance metrics better than other state-of-the-art techniques in terms of average fitness value, accuracy, and execution time.
... There are many different variations of skyline queries trying to solve different problems. For example, ε-skyline [22] allows users to control the number of output skyline points by relaxing or restricting the dominance property; k-skyband queries [20] consider that multiple dominated points may be an option; the angle-based approach [23] computes the skyline by modifying the dominance property; the authors in [24] try to find an approximation of the original skyline; the metric skyline [25] is useful for finding the strongest DNA sequence similarity, where a string is a more appropriate value representation than a vector in Euclidean space; and [26] studies the cardinality and complexity of skyline queries in anti-correlated distributions. For the cases where the set of skyline points is too large, the representative skyline was proposed [27]. ...
Article
Full-text available
One of the most common tasks nowadays in big data environments is the need to classify large amounts of data. There are numerous classification models designed to perform best in different environments and datasets, each with its advantages and disadvantages. However, when dealing with big data, their performance is significantly degraded because they are not designed for, or even capable of, handling very large datasets. The current approach is based on a novel proposal of exploiting the dynamics of skyline queries to efficiently identify the decision boundary and classify big data. A comparison against the popular k-nearest neighbor (k-NN), support vector machines (SVM) and naïve Bayes classification algorithms shows that the proposed method is faster than the k-NN and the SVM. The novelty of this method lies in the fact that only a small number of computations are needed to make a prediction, while its full potential is revealed on very large datasets.
... Therefore, various data-partitioning techniques have been proposed. In particular, Vlachou et al. [18] proposed a skyline query method that has received significant attention and is based on angle-based space partitioning (ABSP). Grid partitioning, which is a traditional partitioning approach, can potentially create a partition that does not contain a skyline. ...
Article
Full-text available
The skyline query has recently attracted a considerable amount of research interest in several fields. The query conducts computations using the domination test, where “domination” means that a data point does not have a worse value than others in any dimension, and has a better value in at least one dimension. Therefore, the skyline query can be used to construct efficient queries based on data from a variety of fields. However, when the number of dimensions or the amount of data increases, naïve skyline queries lead to a degradation in overall performance owing to the higher cost of comparisons among data. Several methods using index structures have been proposed to solve this problem but have not improved the performance of skyline queries because their indices are heavily influenced by the dimensionality and data amount. Therefore, in this study, we propose HI-Sky, a method that can perform quick skyline computations by using the hash index to overcome the above shortcomings. HI-Sky effectively manages data through the hash index and significantly improves performance by effectively eliminating unnecessary data comparisons when computing the skyline. We provide the theoretical background for HI-Sky and verify its improvement in skyline query performance through comparisons with prevalent methods.
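The domination test described in this abstract can be written down directly; a minimal sketch, assuming the usual smaller-is-better convention:

```python
def dominates(p, q):
    """p dominates q: p is no worse than q in every dimension and
    strictly better in at least one (smaller values preferred)."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

# Example: (1, 2) dominates (2, 2), but (1, 3) and (3, 1) are incomparable.
assert dominates((1, 2), (2, 2))
assert not dominates((1, 3), (3, 1)) and not dominates((3, 1), (1, 3))
```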
... In order to exploit this architecture, the authors in [13] published APSkyline for skyline computation in a multi-core system. It uses the angle-based partitioning model [20]; APSkyline then adopts the partition-execute-merge framework, where the dataset is split into N partitions (one for each core). The local skyline set for each partition is computed, and finally the skyline set is determined by merging these local skyline sets. ...
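A minimal sketch of that partition-execute-merge paradigm (sequential here for brevity, where APSkyline runs one partition per core; it reuses the dominates predicate sketched above, and assign is any partitioning function such as the angle-based one):

```python
def local_skyline(points):
    """Naive skyline via pairwise dominance tests (quadratic; real
    systems use far more efficient local algorithms)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

def partition_execute_merge(points, assign, n_parts):
    """Split the data into N partitions, compute each local skyline,
    then merge by taking the skyline of the union of local skylines."""
    parts = [[] for _ in range(n_parts)]
    for p in points:
        parts[assign(p)].append(p)
    candidates = [s for part in parts for s in local_skyline(part)]
    return local_skyline(candidates)
```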
Article
Full-text available
The skyline computation is very important in the field of decision making. It provides solutions to help choose among a large dataset with contradictory information, especially when the implemented solution is progressive. As the need for rapid solutions grows, it is worthwhile to exploit the performance of new machines and platforms. In this paper, we present a new divide-and-conquer solution for computing the skyline on GPU (Graphics Processing Units) cards. The proposed partitioning adapts to the characteristics of the GPU, which leads to well-balanced computation and avoids overflows. The dominance tests are performed on point components in parallel, and dominated points are discarded early, unlike other solutions, which keep them for subsequent loops. This comparison scheme avoids thread idleness. We compare our solution with other similar solutions on the same datasets. Experiments show that our proposal is better in terms of computing time and exploitation of GPU parallelism.