To Review and Compare Evolutionary Algorithms
in Optimization of Distributed Database Query
Mohammed Hussein Abdalla
Department of Computer Science
University of Raparin
Rania, Iraq
muhamad_it@uor.edu.krd
Murat Karabatak
Department of Software Engineering,
Firat University,
Elazig, Turkey
mkarabatak@firat.edu.tr
Abstract— Processing queries in distributed databases requires the conversion and transmission of data between local and global sites. A single query can be executed through several different execution plans; these plans are equivalent in the sense that they produce the same result, but they differ in the order in which the operators are executed and in how they are executed, and therefore in their performance. Query optimization in a distributed database is NP-hard and difficult to solve, and exploring all query plans in a large search space is not feasible. The purpose and motivation of this paper are to examine the challenges of query processing in distributed databases and to review the algorithms used to select the query execution plan.
Keywords— Distributed Database, Search Strategy, Query Optimization, Evolutionary Algorithms
I. INTRODUCTION
A database is a set of organized data that is stored according to specific rules and regulations. A distributed database is a set of databases that communicate with each other logically and are distributed over a computer network to improve the performance, reliability, availability, and modularity of distributed systems. Distributed databases are more reliable, more accessible, and perform better. Such a system increases file sharing, security, and extensibility, and increases the local autonomy of an organization's databases that are geographically distributed across different locations, by applying local policies for using the databases [1]. Queries are used to extract information from database tables and analyze it. A query is a useful way of retrieving data: it selects the data the user intends to obtain from different sources and determines how this data is combined [2].
With the development of database technology and the use of distributed databases over computer networks, the issue of query optimization in decentralized databases has arisen. Query optimization in distributed environments involves more problems and complexities than in centralized environments. In a decentralized database, since the data is geographically distributed across different locations, costs such as processing cost and transmission cost over the network are considered in query optimization in addition to the cost of input and output operations. The transmission cost is the cost of transferring the data required by a query from the various sites to the site at which the query is executed.
Retrieving data from various sources is governed by a Distributed Query Plan (DQP) [3]. The specific purpose of distributed query optimization is to determine how the fragments held on the different sites are used so as to reduce the total cost incurred during the execution of a set of queries. By applying this approach, the average query execution time is reduced, which is of great importance in ordinary distributed and multimedia database systems. In this optimization, a strategy is chosen that uses fewer resources and ultimately reduces the running cost [4]. When a user submits a query, it is converted into a standard algebraic form, and the optimizer searches for a series of optimal executable plans for it. As the number of relations required to process a query increases, the number of possible alternative solutions for executing the query increases exponentially. The optimizer therefore has to search a very large space to find an optimal execution plan for a query. In designing optimization algorithms, much attention is paid to the time and cost required to run these plans. Selecting a query execution plan as the number of relations and joins in distributed database tables grows is associated with difficulties, and the selection problem is NP-hard. To solve this problem, research has been carried out using evolutionary algorithms such as the genetic algorithm and particle swarm optimization. However, considering all the parameters of the problem and its constraints while still obtaining the optimal solution is very time consuming and impractical. Therefore, swarm intelligence methods such as the ant colony algorithm have been used to solve this problem. One of the problems with this algorithm is that as the number of relations and the spread of the database increase, the diversity of the algorithm decreases and it converges to a local optimum. Therefore, combining the ant colony algorithm with other evolutionary algorithms such as the genetic algorithm was proposed to solve the query optimization problem, and in further research, meta-heuristic algorithms such as the bee colony algorithm were used. This algorithm is significantly capable of finding high-quality solutions by searching the solution space [5].
The remainder of the paper is structured as follows. In the first part, the query optimization problem is explained and the basic steps of query optimization in distributed databases are examined. In the second part, the optimizer architecture and the problem definition are discussed. In the third part, we examine how data is allocated to different sites and the implementation methods used to produce the optimal plan. In the next section, we evaluate evolutionary algorithms for query optimization, and finally we highlight promising directions for future research.
II. QUERY OPTIMIZATION IN DISTRIBUTED DATABASES
The main purpose of query optimization, which the database management system must perform in order to retrieve data, is to reduce the number of input and output operations. These operations are relatively time-consuming, so reducing their number speeds up query execution. The optimizer works on the basis of the cost of each plan: the database management system estimates the amount of work each plan requires and then selects the plan that requires the least work. The optimizer's estimation is based on the data that the database management system stores about each table and index. This data is used to estimate the number of input/output operations required to retrieve the intended query records [6].
Distributed query processing is carried out in three phases: local processing, simplification, and result assembly. In the local processing phase, selections are performed: the initial algebraic query is decomposed into sections (data analysis) and made available for initial processing at the relevant sites. In the simplification phase, a reducer is used to decrease the size of the relations; a series of joins and semi-joins is applied to minimize the amount of data, i.e., the number of joins that need to be transferred, so that the join operation can be carried out cost-effectively. In the final processing phase, or result assembly, the relations are collected in one place and the final result of the query is created and sent [7]. Query optimization in a distributed database includes four steps:
A. Query Decomposition
B. Data Localization
C. Global Optimization
D. Local Optimization
Query decomposition involves breaking a complex query down into simpler algebraic terms, during which the data is checked and validated.
After the decomposition process, data localization is performed. Localizing the data means making the query data available in the local space that processes it.
Global optimization mainly involves determining the best execution site for each local sub-query. Much research has been done on the various factors involved in this area, such as optimization algorithms, search space, implementation strategies, and cost models for query optimization [8].
Fig. 1. Distributed Database Optimization Model [8].
According to the relational model, query optimization increases the performance of the database management system. The most important goal of optimization is the review, selection, and use of the most appropriate method that can deliver the result of the query to the requester in the shortest possible time. A database usually supports insert, update, and delete operations, but these operations are less common than queries. The cost of a strategy is the sum of the processing costs of its operators. Among these, the most costly to process and optimize is the join operator (⋈), because optimizing a query that contains join operators is much more time consuming than optimizing other relational operators such as select or project. Consequently, query optimization algorithms focus on queries that contain joins [9].
A query can be executed through several methods, which are known as execution plans. These plans produce the same output but differ in time and resource consumption. Among the candidate execution plans, the query optimizer selects the plan with the lowest execution cost in terms of time and resources [10]. The optimizer's tasks are:
• Determining the execution order of relational operations.
• Determining the access methods for the relations involved.
• Determining the relations to join in a given search space by applying appropriate search strategies, so that the performance measurement reflects the results of executing an appropriate and optimal plan.
• Determining the order of data movements between locations (join sites) so as to reduce the amount of data transferred and the cost of the communication network.
Much research has been done on query optimization algorithms that use evolutionary algorithms in distributed databases, and many researchers have addressed this issue.
III. SEARCH SPACE
The search space is the set of alternative executable plans that represent the input query. These plans are equivalent, meaning that they produce the same result, but they differ in the order in which the operators are executed and in how they are executed, and therefore in their performance. The search space is obtained by applying transformation rules, such as those of relational algebra [1]. Each database and the relationships among its tables in a distributed system can be represented by a graph, and any execution plan of a query can be represented by a tree. For example, consider the following query:
Query 1: SELECT <attribute list> FROM F0, F1, F2, F3, F4
WHERE F0.A0 = F1.A0 AND F1.A1 = F2.A1 AND F2.A2 = F3.A2 AND F1.A3 = F3.A3
Figure 2 shows the database graph on which this query is executed.
Fig. 2. Database graph and how to connect it.
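To make the graph view concrete, the following sketch (not taken from the paper; the relation and attribute names follow Query 1) represents the join predicates of Query 1 as an adjacency list, with relations as nodes and join conditions as edges.

import pprint

# A minimal sketch of Query 1's join graph: nodes are relations,
# edges are join predicates. Names follow Query 1 above.
join_graph = {
    "F0": [("F1", "A0")],                                  # F0.A0 = F1.A0
    "F1": [("F0", "A0"), ("F2", "A1"), ("F3", "A3")],
    "F2": [("F1", "A1"), ("F3", "A2")],
    "F3": [("F2", "A2"), ("F1", "A3")],
    "F4": [],                                              # F4 has no join predicate in Query 1
}

# Any query execution plan corresponds to a tree built by repeatedly
# joining relations that are connected by an edge of this graph.
pprint.pprint(join_graph)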
The execution plan of the query on this database is shown in Figure 3.
Fig. 3. QEP query execution plan.
In Figure 3, which shows the query execution plan (QEP), the annotations indicate the site at which each join is performed (Si), the order of the joins (Ji), and the tables used for each join (Si, Fi); an arrow is drawn from the reducing file to the reduced file to show the reduction. In a distributed database that involves many relations and joins, the number of QEPs grows exponentially because of the commutative and associative properties of the join operation (O(N!), where N is the number of joins). The query optimizer must search a large space to produce optimal query plans, which can take the form of left-deep trees, right-deep trees, bushy join trees, binary trees, or zig-zag trees (Fig. 4).
Fig. 4. Forms of query join graphs [11].
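As a rough illustration of this factorial growth, the following sketch (relation names are assumed, following Query 1) enumerates left-deep join orders: every permutation of the joined relations corresponds to a distinct left-deep plan, so five relations already yield 120 orders.

from itertools import permutations
from math import factorial

# Assumed relation names, following Query 1.
relations = ["F0", "F1", "F2", "F3", "F4"]

# Each permutation is a distinct left-deep join order.
left_deep_plans = list(permutations(relations))
print(len(left_deep_plans), "left-deep orders for", len(relations), "relations")
assert len(left_deep_plans) == factorial(len(relations))   # 5! = 120

# One such plan, written as a nested join from left to right.
order = left_deep_plans[0]
plan = order[0]
for rel in order[1:]:
    plan = f"({plan} JOIN {rel})"
print(plan)   # ((((F0 JOIN F1) JOIN F2) JOIN F3) JOIN F4)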
As the number of joins and of distributed data storage sites increases, the solution space grows exponentially, and the optimizer's ability to deal with issues such as join order, join method, and reducing the size and cost of the query data becomes a problem. Therefore, evolutionary algorithms such as ant colony optimization, the genetic algorithm, and particle swarm optimization have been studied to find optimal and near-optimal solutions for large join queries in the search space processed by relational and distributed databases. These algorithms have been very successful owing to their global search ability, their robustness, and their ability to handle various combinatorial optimization problems.
IV. COST FUNCTION
The cost model is used to predict the cost of each execution plan and to compare equivalent plans in order to select the best one. To do this accurately, the cost model must have good knowledge of the distributed execution environment, which it obtains through data statistics and cost functions.
In a distributed database, statistics typically concern the fragments and include their cardinality and size, as well as the number and size of the distinct values of each attribute. To minimize the possibility of estimation error, more accurate statistics, such as histograms of attribute values, are sometimes used at the price of higher maintenance cost. The accuracy of the statistics is maintained by periodic updates.
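As an illustration of how such statistics can be used, the sketch below applies the standard textbook estimate for the size of an equi-join, |R ⋈ S| ≈ |R|·|S| / max(V(A,R), V(A,S)); the statistics values are invented and the formula is a generic estimate, not one specific to this paper.

# A minimal sketch: estimate an equi-join's cardinality from fragment
# statistics (cardinalities and distinct-value counts). Numbers are invented.
def estimate_join_size(card_r, card_s, distinct_r, distinct_s):
    """Standard textbook estimate for |R join S| on attribute A."""
    return (card_r * card_s) // max(distinct_r, distinct_s)

# Hypothetical statistics for two fragments joined on attribute A1.
print(estimate_join_size(card_r=10_000, card_s=2_000,
                         distinct_r=500, distinct_s=400))   # -> 40000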
A proper cost measure is the total cost of processing a query, i.e., the sum of the processing times of the query operators at the different sites. Another measure is the response time of the query, i.e., the time that elapses while the query is executed. Since operators can execute in parallel on different sites, the response time of a query may be significantly less than its total cost.
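These two measures are commonly formalized along the lines of the cost model in [1]; the following is a sketch in that style, with generic coefficients and counters rather than values taken from this paper:

\mathit{Total\_cost} = c_{CPU}\,\#insts + c_{I/O}\,\#ios + c_{MSG}\,\#msgs + c_{TR}\,\#bytes

\mathit{Response\_time} = c_{CPU}\,seq\_\#insts + c_{I/O}\,seq\_\#ios + c_{MSG}\,seq\_\#msgs + c_{TR}\,seq\_\#bytes

Here the seq_ counters include only the operations that must be executed sequentially, so work carried out in parallel on different sites reduces the response time but not the total cost.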
In a distributed database system, the total costs to be minimized include CPU, I/O, and communication costs. The CPU cost is incurred when the CPU performs operations on data in main memory. The I/O cost is the time required to access the disk; it can be minimized by reducing the number of disk accesses through fast data access methods and efficient use of main memory (buffer management). The communication cost is the time required to exchange data between the sites participating in the query. This cost arises from processing messages (formatting/reformatting) and from transferring data over the communication network.
The communication cost component is probably the most important factor in a distributed database. Many early proposals for distributed query optimization assume that the communication cost largely dominates the local processing cost (I/O and CPU cost), and therefore ignore the latter. However, modern distributed processing environments have much faster communication networks, whose bandwidth is comparable to that of disks. Therefore, the solution is to use a weighted combination of these three components, because they all play a significant role in estimating the total cost of a query.
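A minimal sketch of such a weighted combination is shown below; the weight values and cost figures are illustrative only and not taken from the paper.

# A minimal sketch of a weighted combination of the three cost components.
def total_cost(cpu_secs, io_secs, comm_secs,
               w_cpu=1.0, w_io=1.0, w_comm=1.0):
    """Weighted total cost of an execution plan, in seconds."""
    return w_cpu * cpu_secs + w_io * io_secs + w_comm * comm_secs

# On a slow WAN the communication component may be weighted more heavily
# than on a fast LAN; the weights below are invented for illustration.
wan_cost = total_cost(cpu_secs=2.0, io_secs=5.0, comm_secs=8.0, w_comm=3.0)
lan_cost = total_cost(cpu_secs=2.0, io_secs=5.0, comm_secs=8.0, w_comm=0.5)
print(wan_cost, lan_cost)   # 31.0 vs 11.0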
V. EVOLUTIONARY ALGORITHMS
Much research has been done on selecting the best query plan to reduce communication costs. Some of this research is based on dynamic programming, but dynamic programming has scalability problems for large-scale distributed databases. Therefore, evolutionary algorithms were considered to overcome this problem. Evolutionary algorithms have partially solved it by dividing the search space, but their computational complexity leads to increased response times and, in some cases, to local optima. Therefore, swarm intelligence and meta-heuristic algorithms were suggested as a way to escape local optima. These algorithms require less computational time when combined with adaptive techniques to reach the optimal solution. Another advantage of hybrid algorithms is that they avoid rapid convergence to a local optimum, so they are more likely to reach the global optimum. In the following, we examine these studies. The genetic algorithm has been used as an evolutionary algorithm for selecting optimal query execution plans. In this algorithm, query execution plans are generated as candidate solutions, an n-tuple of which represents a chromosome (Figure 5).
The set of chromosomes forms the initial population of this algorithm. Then, among these chromosomes, those that will play the role of parents in the genetic algorithm are selected for recombination using the roulette wheel method. In the roulette wheel method, based on a cost-based fitness function, chromosomes with lower communication and processing costs have a better chance of being selected [8].
Fig. 5. The structure of a chromosome.
In crossover, parts of the parent chromosomes are exchanged to create children using the midpoint method. Since the parent chromosomes represent efficient query execution plans, the children's execution plans are expected to be good in terms of communication and processing costs. The same holds for the mutation process. The algorithm continues until the best query execution plans are produced [12].
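The sketch below illustrates this loop under simplifying assumptions (it is not the authors' implementation): a chromosome is a join order over the relations of Query 1, parents are picked by roulette-wheel selection on a placeholder cost function, children are built with a midpoint-style order crossover, and mutation swaps two positions.

import random

# Assumed relation names, following Query 1.
RELATIONS = ["F0", "F1", "F2", "F3", "F4"]

def plan_cost(plan):
    # Placeholder cost: in practice this would combine the CPU, I/O and
    # communication costs of the plan; here it is an arbitrary stand-in.
    return sum((i + 1) * (int(rel[1]) + 1) for i, rel in enumerate(plan))

def roulette_select(population):
    # Lower cost -> higher fitness -> larger slice of the wheel.
    weights = [1.0 / plan_cost(p) for p in population]
    return random.choices(population, weights=weights, k=2)

def crossover(p1, p2):
    # Keep the first half of p1, then append p2's relations in p2's order,
    # so the child is still a valid join order (a permutation).
    cut = len(p1) // 2
    return list(p1[:cut]) + [r for r in p2 if r not in p1[:cut]]

def mutate(plan, rate=0.1):
    # Occasionally swap two joins in the order.
    if random.random() < rate:
        i, j = random.sample(range(len(plan)), 2)
        plan[i], plan[j] = plan[j], plan[i]
    return plan

population = [random.sample(RELATIONS, len(RELATIONS)) for _ in range(20)]
for _ in range(50):                             # generations
    parents = roulette_select(population)
    child = mutate(crossover(*parents))
    worst = max(range(len(population)), key=lambda i: plan_cost(population[i]))
    population[worst] = child                   # replace the worst plan
print(min(population, key=plan_cost))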
The ant colony algorithm belongs to the swarm intelligence family and is used to select optimal query execution plans based on the shortest run time [13].
The ACO algorithm models the behavior of real ant colonies, which find the shortest path between their food sources and the nest. Ants communicate with each other through chemicals called pheromones, which they use to create and follow a path between the nest and the food source. The shortest path is likely to accumulate the most pheromone, so ants tend to return to the nest along it. As the number of relations in a query increases, more memory and processing are required. The ants' pheromone-marking behavior is used to guide them toward unexplored areas of the search space and to visit all nodes without knowing the topology of the graph, and thereby to find optimal solutions for distributed database queries. These ants calculate the execution time of the executable plans for a given query and quickly provide high-performance, optimal results with the least amount of resources [14].
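A minimal sketch of this idea, with a placeholder cost function standing in for the real execution-time estimate, could look as follows; it is illustrative only and not the algorithm of [13] or [14].

import random

# Assumed relation names, following Query 1.
RELATIONS = ["F0", "F1", "F2", "F3", "F4"]

def plan_cost(plan):
    # Placeholder for the real cost (execution time) of the plan.
    return sum((i + 1) * (int(rel[1]) + 1) for i, rel in enumerate(plan))

# Each ant builds a join order step by step, choosing the next relation with
# probability proportional to the pheromone on the (current, next) edge;
# cheaper plans deposit more pheromone, and all edges slowly evaporate.
pheromone = {(a, b): 1.0 for a in RELATIONS for b in RELATIONS if a != b}
best_plan, best_cost = None, float("inf")

for _ in range(100):                                   # iterations
    for _ in range(10):                                # ants per iteration
        plan = [random.choice(RELATIONS)]
        while len(plan) < len(RELATIONS):
            candidates = [r for r in RELATIONS if r not in plan]
            weights = [pheromone[(plan[-1], r)] for r in candidates]
            plan.append(random.choices(candidates, weights=weights, k=1)[0])
        cost = plan_cost(plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
        for a, b in zip(plan, plan[1:]):               # deposit pheromone
            pheromone[(a, b)] += 1.0 / cost
    for edge in pheromone:                             # evaporation
        pheromone[edge] *= 0.9

print(best_plan, best_cost)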
The particle swarm optimization (PSO) algorithm is a global minimization method that can be used for problems whose solutions are a point or a surface in an n-dimensional space. In this space, candidate solutions (particles) are placed, each is assigned an initial velocity, and communication channels are considered between the particles. The particles then move through the solution space, and their results are evaluated according to a merit (fitness) criterion after each time step. Over time, the particles accelerate toward particles that have a higher merit value and belong to the same communication group [15].
At each step, the position of each particle is updated using two values. The first is the best position the particle itself has achieved so far, called pbest; the second is the best position obtained so far by the whole particle population, denoted gbest.
The new position of each particle is obtained from the following equation:

$x_i(t+1) = x_i(t) + v_i(t+1)$ (1)

Here, the last term is the velocity (the particle displacement rate), which is calculated as follows:

$v_i(t+1) = w\,v_i(t) + c_1 r_1 \big(pbest_i - x_i(t)\big) + c_2 r_2 \big(gbest - x_i(t)\big)$ (2)
r1 and r2 are random numbers in the range [0, 1]; c1 expresses the importance of the particle's own best position and c2 the importance of the best position of its neighbors. Many parameters are involved in running the PSO algorithm, and setting them properly greatly affects its performance. These parameters include the number of particles and the inertia coefficient w [16].
The fitness function (in terms of time) is the sum of the I/O cost, the CPU cost, and the communication cost. These components may carry different weights in different distributed environments (e.g., LAN vs. WAN), and the objective can alternatively be formulated to maximize throughput.
Fig. 6. The structure of the particle swarm algorithm.
Fig. 7. The structure of the fitness function [15].
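The sketch below implements the update rules of equations (1) and (2) on a simple continuous test function; in query optimization the fitness would instead be the weighted I/O, CPU, and communication cost of the plan decoded from each particle's position, and that decoding step is omitted here.

import random

def fitness(x):
    # Simple quadratic test function to minimize; stands in for a plan cost.
    return sum(v * v for v in x)

DIM, N_PARTICLES, W, C1, C2 = 3, 20, 0.7, 1.5, 1.5
pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(N_PARTICLES)]
vel = [[0.0] * DIM for _ in range(N_PARTICLES)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=fitness)[:]

for _ in range(100):                          # iterations
    for i in range(N_PARTICLES):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            # Equation (2): velocity update with inertia w and weights c1, c2.
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            # Equation (1): position update.
            pos[i][d] += vel[i][d]
        if fitness(pos[i]) < fitness(pbest[i]):
            pbest[i] = pos[i][:]
            if fitness(pbest[i]) < fitness(gbest):
                gbest = pbest[i][:]

print(gbest, fitness(gbest))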
In this approach, query execution plans are evaluated not only in terms of the quality of the discovered plan but also in terms of optimization time. PSO algorithms can provide high-quality solutions comparable to those of genetic algorithms. In these algorithms, the particles representing the solutions are moved according to a probability distribution rather than by manipulating the velocity parameter. This mechanism provides a simple and effective way to explore the search space. As new and better global and local solutions are discovered, PSO algorithms keep searching and can produce near-optimal execution plans [17].
VI. CONCLUSION
Fully realizing the combination of evolutionary algorithms for query optimization in distributed databases is still a new field of study. Research is progressing on generating, executing, and implementing combinations of evolutionary algorithms to solve a variety of optimization problems. The results of this research have shown that these evolutionary algorithms produce appropriate and feasible solutions for relational and distributed database management systems in which the size of the queries and the number of joins are large. There are still many research opportunities in producing optimal solutions and in reviewing and modifying search strategies using combinations of evolutionary algorithms for distributed databases, especially when the size and complexity of the relations affect the number of query parameters.
REFERENCES
[1] M. T. Özsu, P. Valduriez, Distributed Query Processing, in Principles of Distributed Database Systems, Cham, (2020), pp. 129-182.
[2] J. Gao, W. Liu, Z. Li, An adaptive strategy for statistics collecting in a distributed database, Frontiers of Computer Science, Vol. 14, 145610, (2020).
[3] J. Gao, W. Liu, Z. Li, J. Zhang, L. Shen, A general fragments allocation method for join query in a distributed database, Information Sciences, Vol. 512, (2020), pp. 1249-1263.
[4] S. M. Mohsin Darwish, A. Younes, Dynamic Cost Ant Colony Algorithm for Optimize Distributed Database Query, Proceedings of the International Conference on Artificial Intelligence and Computer Vision, Vol. 1153, (2020), pp. 170-181.
[5] P. Tiwari and S. V. Chande, Optimal Ant and Join Cardinality for Distributed Query Optimization Using Ant Colony Optimization Algorithm, Emerging Trends in Expert Applications and Security, Advances in Intelligent Systems and Computing, Vol. 841, (2019), pp. 385-392.
[6] D. Das, J. Yan, M. Zait, S. R. Vallur, N. Vyas, R. Krishnamachari, P. Gaharwar, J. Kamp, N. Mukherjee, Query optimization in Oracle 12c database in-memory, Proceedings of the VLDB Endowment, 8(12), (2020), pp. 1770-1781.
[7] P. Tiwari and S. V. Chande, Query Optimization Strategies in Distributed Databases, International Journal of Advances in Engineering Sciences, Vol. 3(3), (2013), pp. 23-29.
[8] M. Sharma, G. Singh, R. Singh, Clinical decision support system query optimizer using hybrid Firefly and controlled Genetic Algorithm, Journal of King Saud University - Computer and Information Sciences, (2018).
[9] J. Gao, W. Liu, Z. Li, An adaptive strategy for statistics collecting in distributed database, Frontiers of Computer Science, Vol. 14, (2020).
[10] A. Aljanaby, E. Abuelrub, M. Odeh, Database Systems Performance Analysis, A Survey of Distributed Query Optimization, The International Arab Journal of Information Technology, Vol. 2, (2005), pp. 48-57.
[11] S. Padia, S. Khulge, A. Gupta, P. Khadilika, Query Optimization Strategies in Distributed Databases, Proceedings of the 2nd International Conference on Emerging Trends in Engineering and Management, (2013).
[12] S. Mirjalili, J. Song Dong, A. S. Sadiq, H. Faris, Genetic Algorithm: Theory, Literature Review, and Application in Image Reconstruction, Nature-Inspired Optimizers, Vol. 811, (2020), pp. 69-85.
[13] S. Mirjalili, J. Song Dong, A. Lewis, Ant Colony Optimizer: Theory, Literature Review, and Application in AUV Path Planning, Nature-Inspired Optimizers, Vol. 811, (2020), pp. 7-21.
[14] D. Kumar, V. K. Jha, An improved query optimization process in big data using ACO-GA algorithm and HDFS map reduce technique, Distributed and Parallel Databases, (2020).
[15] S. Mirjalili, J. Song Dong, A. Lewis, A. S. Sadiq, Particle Swarm Optimization: Theory, Literature Review, and Application in Airfoil Design, Nature-Inspired Optimizers, Vol. 811, (2020), pp. 167-184.
[16] M. Q. Yasin, X. Zhang, R. Haq, Z. Feng, S. Yitagesu, A Comprehensive Study for Essentiality of Graph Based Distributed SPARQL Query Processing, Database Systems for Advanced Applications, Vol. 10829, (2018), pp. 156-170.
[17] T. Dokeroglu, U. Tosun, A. Cosar, Particle Swarm Intelligence as a new heuristic for the optimization of distributed database queries, International Conference on Application of Information and Communication Technologies (AICT), (2012).