To Review and Compare Evolutionary Algorithms
in Optimization of Distributed Database Query
Mohammed Hussein Abdalla
Department of Computer Science
University of Raparin
Rania, Iraq
muhamad_it@uor.edu.krd
Murat Karabatak
Department of Software Engineering,
Firat University,
Elazig, Turkey
mkarabatak@firat.edu.tr
Abstract— Processing queries in distributed databases requires the conversion and transmission of data between local and global sites. A single query can be executed through several different execution plans; these plans are equivalent in the sense that they produce the same result, but they differ in the order in which the operators are executed and in how they are executed, and therefore in their performance. Query optimization in a distributed database is NP-hard and difficult to solve, and exploring all query plans in a large search space is not feasible. The purpose and motivation of this paper are to examine the challenges of query processing in distributed databases and to review the algorithms used to select the query execution plan.
Keywords— Distributed Database, Search Strategy, Query Optimization, Evolutionary Algorithms
I. INTRODUCTION
A database is a set of organized data that is stored according to specific rules and regulations. A distributed database is a set of databases that communicate with each other logically and are distributed over a computer network to improve the performance, reliability, availability, and modularity of distributed systems. Distributed databases are more reliable, more accessible, and perform better. Such a system increases file sharing, security, and extensibility, and increases the local autonomy of an organization's databases that are geographically distributed across different locations, by applying local policies for using the databases [1]. Queries are used to extract information from database tables and analyze it. A query is a useful way of retrieving data: it selects the data the user intends to obtain from different sources and determines how this data is combined [2].
With the development of database technology and the use of distributed databases over computer networks, the issue of query optimization in decentralized databases has arisen. Query optimization in distributed environments involves more problems and complexities than in centralized environments. In a decentralized database, since the data is geographically distributed across different locations, costs such as processing cost and transmission cost over the network are considered in query optimization in addition to the cost of input and output operations. The transmission cost is the cost of transferring the data required by a query from the various sites to the site at which the query is executed.
Retrieving data from various sources is governed by a Distributed Query Plan (DQP) [3]. The specific purpose of distributed query optimization is to determine how the fragments held on the different sites are used so as to reduce the total cost incurred during the execution of a set of queries. By applying this approach, the average query execution time is reduced, which is of great importance in ordinary distributed and multimedia database systems. In this optimization, a strategy is chosen that uses fewer resources and ultimately reduces the running cost [4]. When a user submits a query, it is converted into a standard algebraic form, and the optimizer searches for a series of optimal executable plans for it. As the number of relations required to process a query increases, the number of possible alternative solutions for executing the query increases exponentially. The optimizer therefore has to search a very large space to find an optimal execution plan for a query. In designing optimization algorithms, much attention is paid to the time and cost required to run these plans. Selecting a query execution plan as the number of relations and joins in distributed database tables grows is associated with difficulties, and the selection problem is NP-hard. To solve this problem, research has been carried out using evolutionary algorithms such as the genetic algorithm and particle swarm optimization. However, considering all the parameters of the problem and its constraints while still obtaining the optimal solution is very time consuming and impractical. Therefore, swarm intelligence methods such as the ant colony algorithm have been used to solve this problem. One of the problems with this algorithm is that as the number of relations and the spread of the database increase, the diversity of the algorithm decreases and it converges to a local optimum. Therefore, combining the ant colony algorithm with other evolutionary algorithms such as the genetic algorithm was proposed to solve the query optimization problem, and in further research, meta-heuristic algorithms such as the bee colony algorithm were used. This algorithm is significantly capable of finding high-quality solutions by searching the solution space [5].
The remainder of the paper is structured as follows. In the first part, the query optimization problem is explained and the basic steps of query optimization in distributed databases are examined. In the second part, the optimizer architecture and the problem definition are discussed. In the third part, we examine how data is allocated to different sites and the implementation methods used to produce the optimal plan. In the next section, we evaluate evolutionary algorithms for query optimization, and finally we highlight promising directions for future research.
II. QUERY OPTIMIZATION IN DISTRIBUTED DATABASES
The main purpose of query optimization, which the database management system must perform in order to retrieve data, is to reduce the number of input and output operations. These operations are relatively time-consuming, so reducing their number speeds up query execution. The optimizer works on the basis of the cost of each plan: the database management system estimates the amount of work each plan requires and then selects the plan that requires the least work. The optimizer's estimation is based on the data that the database management system stores about each table and index. This data is used to estimate the number of input/output operations required to retrieve the intended query records [6].
Distributed query processing is carried out in three phases: local processing, simplification, and result assembly. In the local processing phase, selections are performed: the initial algebraic query is decomposed into sections (data analysis) and made available for initial processing at the relevant sites. In the simplification phase, a reducer is used to decrease the size of the relations; a series of joins and semi-joins is applied to minimize the amount of data, i.e., the number of joins that need to be transferred, so that the join operation can be carried out cost-effectively. In the final processing phase, or result assembly, the relations are collected in one place and the final result of the query is created and sent [7]. Query optimization in a distributed database includes four steps:
A. Query Decomposition
B. Data Localization
C. Global Optimization
D. Local Optimization
Query decomposition involves breaking a complex query down into simpler algebraic terms, during which the data is checked and validated.
After the decomposition process, data localization is performed. Localizing the data means making the query data available in the local space that processes it.
Global optimization mainly involves determining the best execution site for each local sub-query. Much research has been done on the various factors involved in this area, such as optimization algorithms, search space, implementation strategies, and cost models for query optimization [8].
Fig. 1. Distributed Database Optimization Model [8].
According to the relational model, query optimization increases the performance of the database management system. The most important goal of optimization is the review, selection, and use of the most appropriate method that can deliver the result of the query to the requester in the shortest possible time. A database usually supports insert, update, and delete operations, but these operations are less common than queries. The cost of a strategy is the sum of the processing costs of its operators. Among these, the most costly to process and optimize is the join operator (⋈), because optimizing a query that contains join operators is much more time consuming than optimizing other relational operators such as select or project. Consequently, query optimization algorithms focus on queries that contain joins [9].
A query can be executed through several methods, which are known as execution plans. These plans produce the same output but differ in time and resource consumption. Among the candidate execution plans, the query optimizer selects the plan with the lowest execution cost in terms of time and resources [10]. The optimizer's tasks are:
• Determining the execution order of relational operations.
• Determining the access methods for the relations involved.
• Determining the relations to join in a given search space by applying appropriate search strategies, so that the performance measurement reflects the results of executing an appropriate and optimal plan.
• Determining the order of data movements between locations (join sites) so as to reduce the amount of data transferred and the cost of the communication network.
Much research has been done on query optimization algorithms that use evolutionary algorithms in distributed databases, and many researchers have addressed this issue.
III. SEARCH SPACE
The search space is the set of alternative executable plans that represent the input query. These plans are equivalent, meaning that they produce the same result, but they differ in the order in which the operators are executed and in how they are executed, and therefore in their performance. The search space is obtained by applying transformation rules, such as those of relational algebra [1]. Each database and the relationships among its tables in a distributed system can be represented by a graph, and any execution plan of a query can be represented by a tree. For example, consider the following query:
Query 1: SELECT <attribute list> FROM F0, F1, F2, F3, F4
WHERE F0.A0 = F1.A0 AND F1.A1 = F2.A1 AND F2.A2 = F3.A2 AND F1.A3 = F3.A3
Figure 2 shows the database graph on which this query is executed.
Fig. 2. Database graph and how to connect it.
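To make the graph view concrete, the following sketch (not taken from the paper; the relation and attribute names follow Query 1) represents the join predicates of Query 1 as an adjacency list, with relations as nodes and join conditions as edges.

import pprint

# A minimal sketch of Query 1's join graph: nodes are relations,
# edges are join predicates. Names follow Query 1 above.
join_graph = {
    "F0": [("F1", "A0")],                                  # F0.A0 = F1.A0
    "F1": [("F0", "A0"), ("F2", "A1"), ("F3", "A3")],
    "F2": [("F1", "A1"), ("F3", "A2")],
    "F3": [("F2", "A2"), ("F1", "A3")],
    "F4": [],                                              # F4 has no join predicate in Query 1
}

# Any query execution plan corresponds to a tree built by repeatedly
# joining relations that are connected by an edge of this graph.
pprint.pprint(join_graph)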
The execution plan of the query on this database is shown in Figure 3.
Fig. 3. QEP query execution plan.
In Figure 3, which shows the query execution plan (QEP), the annotations indicate the site at which each join is performed (Si), the order of the joins (Ji), and the tables used for each join (Si, Fi); an arrow is drawn from the reducing file to the reduced file to show the reduction. In a distributed database that involves many relations and joins, the number of QEPs grows exponentially because of the commutative and associative properties of the join operation (O(N!), where N is the number of joins). The query optimizer must search a large space to produce optimal query plans, which can take the form of left-deep trees, right-deep trees, bushy join trees, binary trees, or zig-zag trees (Fig. 4).
Fig. 4. Forms of query join graphs [11].
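As a rough illustration of this factorial growth, the following sketch (relation names are assumed, following Query 1) enumerates left-deep join orders: every permutation of the joined relations corresponds to a distinct left-deep plan, so five relations already yield 120 orders.

from itertools import permutations
from math import factorial

# Assumed relation names, following Query 1.
relations = ["F0", "F1", "F2", "F3", "F4"]

# Each permutation is a distinct left-deep join order.
left_deep_plans = list(permutations(relations))
print(len(left_deep_plans), "left-deep orders for", len(relations), "relations")
assert len(left_deep_plans) == factorial(len(relations))   # 5! = 120

# One such plan, written as a nested join from left to right.
order = left_deep_plans[0]
plan = order[0]
for rel in order[1:]:
    plan = f"({plan} JOIN {rel})"
print(plan)   # ((((F0 JOIN F1) JOIN F2) JOIN F3) JOIN F4)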
As the number of joins and of distributed data storage sites increases, the solution space grows exponentially, and the optimizer's ability to deal with issues such as join order, join method, and reducing the size and cost of the query data becomes a problem. Therefore, evolutionary algorithms such as ant colony optimization, the genetic algorithm, and particle swarm optimization have been studied to find optimal and near-optimal solutions for large join queries in the search space processed by relational and distributed databases. These algorithms have been very successful owing to their global search ability, their robustness, and their ability to handle various combinatorial optimization problems.
IV. COST FUNCTION
The cost model is used to predict the cost of each execution plan and to compare equivalent plans in order to select the best one. To do this accurately, the cost model must have good knowledge of the distributed execution environment, which it obtains through data statistics and cost functions.
In a distributed database, statistics typically concern the fragments and include their cardinality and size, as well as the number and size of the distinct values of each attribute. To minimize the possibility of estimation error, more accurate statistics, such as histograms of attribute values, are sometimes used at the price of higher maintenance cost. The accuracy of the statistics is maintained by periodic updates.
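As an illustration of how such statistics can be used, the sketch below applies the standard textbook estimate for the size of an equi-join, |R ⋈ S| ≈ |R|·|S| / max(V(A,R), V(A,S)); the statistics values are invented and the formula is a generic estimate, not one specific to this paper.

# A minimal sketch: estimate an equi-join's cardinality from fragment
# statistics (cardinalities and distinct-value counts). Numbers are invented.
def estimate_join_size(card_r, card_s, distinct_r, distinct_s):
    """Standard textbook estimate for |R join S| on attribute A."""
    return (card_r * card_s) // max(distinct_r, distinct_s)

# Hypothetical statistics for two fragments joined on attribute A1.
print(estimate_join_size(card_r=10_000, card_s=2_000,
                         distinct_r=500, distinct_s=400))   # -> 40000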
A proper cost measure is the total cost of processing a query, i.e., the sum of the processing times of the query operators at the different sites. Another measure is the response time of the query, i.e., the time that elapses while the query is executed. Since operators can execute in parallel on different sites, the response time of a query may be significantly less than its total cost.
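These two measures are commonly formalized along the lines of the cost model in [1]; the following is a sketch in that style, with generic coefficients and counters rather than values taken from this paper:

\mathit{Total\_cost} = c_{CPU}\,\#insts + c_{I/O}\,\#ios + c_{MSG}\,\#msgs + c_{TR}\,\#bytes

\mathit{Response\_time} = c_{CPU}\,seq\_\#insts + c_{I/O}\,seq\_\#ios + c_{MSG}\,seq\_\#msgs + c_{TR}\,seq\_\#bytes

Here the seq_ counters include only the operations that must be executed sequentially, so work carried out in parallel on different sites reduces the response time but not the total cost.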
In a distributed database system, the total costs to be minimized include CPU, I/O, and communication costs. The CPU cost is incurred when the CPU performs operations on data in main memory. The I/O cost is the time required to access the disk; it can be minimized by reducing the number of disk accesses through fast data access methods and efficient use of main memory (buffer management). The communication cost is the time required to exchange data between the sites participating in the query. This cost arises from processing messages (formatting/reformatting) and from transferring data over the communication network.
The communication cost component is probably the most important factor in a distributed database. Many early proposals for distributed query optimization assume that the communication cost largely dominates the local processing cost (I/O and CPU cost), and therefore ignore the latter. However, modern distributed processing environments have much faster communication networks, whose bandwidth is comparable to that of disks. Therefore, the solution is to use a weighted combination of these three components, because they all play a significant role in estimating the total cost of a query.
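A minimal sketch of such a weighted combination is shown below; the weight values and cost figures are illustrative only and not taken from the paper.

# A minimal sketch of a weighted combination of the three cost components.
def total_cost(cpu_secs, io_secs, comm_secs,
               w_cpu=1.0, w_io=1.0, w_comm=1.0):
    """Weighted total cost of an execution plan, in seconds."""
    return w_cpu * cpu_secs + w_io * io_secs + w_comm * comm_secs

# On a slow WAN the communication component may be weighted more heavily
# than on a fast LAN; the weights below are invented for illustration.
wan_cost = total_cost(cpu_secs=2.0, io_secs=5.0, comm_secs=8.0, w_comm=3.0)
lan_cost = total_cost(cpu_secs=2.0, io_secs=5.0, comm_secs=8.0, w_comm=0.5)
print(wan_cost, lan_cost)   # 31.0 vs 11.0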
V. EVOLUTIONARY ALGORITHMS
Much research has been done on selecting the best query plan to reduce communication costs. Some of this research is based on dynamic programming, but dynamic programming has scalability problems for large-scale distributed databases. Therefore, evolutionary algorithms were considered to overcome this problem. Evolutionary algorithms have partially solved it by dividing the search space, but their computational complexity leads to increased response times and, in some cases, to local optima. Therefore, swarm intelligence and meta-heuristic algorithms were suggested as a way to escape local optima. These algorithms require less computational time when combined with adaptive techniques to reach the optimal solution. Another advantage of hybrid algorithms is that they avoid rapid convergence to a local optimum, so they are more likely to reach the global optimum. In the following, we examine these studies. The genetic algorithm has been used as an evolutionary algorithm for selecting optimal query execution plans. In this algorithm, query execution plans are generated as candidate solutions, an n-tuple of which represents a chromosome (Figure 5).
The set of chromosomes forms the initial population of this algorithm. Then, among these chromosomes, those that will play the role of parents in the genetic algorithm are selected for recombination using the roulette wheel method. In the roulette wheel method, based on a cost-based fitness function, chromosomes with lower communication and processing costs have a better chance of being selected [8].
Fig. 5. The structure of a chromosome.
In crossover, parts of the parent chromosomes are exchanged to create children using the midpoint method. Since the parent chromosomes represent efficient query execution plans, the children's execution plans are expected to be good in terms of communication and processing costs. The same holds for the mutation process. The algorithm continues until the best query execution plans are produced [12].
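The sketch below illustrates this loop under simplifying assumptions (it is not the authors' implementation): a chromosome is a join order over the relations of Query 1, parents are picked by roulette-wheel selection on a placeholder cost function, children are built with a midpoint-style order crossover, and mutation swaps two positions.

import random

# Assumed relation names, following Query 1.
RELATIONS = ["F0", "F1", "F2", "F3", "F4"]

def plan_cost(plan):
    # Placeholder cost: in practice this would combine the CPU, I/O and
    # communication costs of the plan; here it is an arbitrary stand-in.
    return sum((i + 1) * (int(rel[1]) + 1) for i, rel in enumerate(plan))

def roulette_select(population):
    # Lower cost -> higher fitness -> larger slice of the wheel.
    weights = [1.0 / plan_cost(p) for p in population]
    return random.choices(population, weights=weights, k=2)

def crossover(p1, p2):
    # Keep the first half of p1, then append p2's relations in p2's order,
    # so the child is still a valid join order (a permutation).
    cut = len(p1) // 2
    return list(p1[:cut]) + [r for r in p2 if r not in p1[:cut]]

def mutate(plan, rate=0.1):
    # Occasionally swap two joins in the order.
    if random.random() < rate:
        i, j = random.sample(range(len(plan)), 2)
        plan[i], plan[j] = plan[j], plan[i]
    return plan

population = [random.sample(RELATIONS, len(RELATIONS)) for _ in range(20)]
for _ in range(50):                             # generations
    parents = roulette_select(population)
    child = mutate(crossover(*parents))
    worst = max(range(len(population)), key=lambda i: plan_cost(population[i]))
    population[worst] = child                   # replace the worst plan
print(min(population, key=plan_cost))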
The ant colony algorithm belongs to the swarm intelligence family and is used to select optimal query execution plans based on the shortest run time [13].
The ACO algorithm models the behavior of real ant colonies, which find the shortest path between their food sources and the nest. Ants communicate with each other through chemicals called pheromones, which they use to create and follow a path between the nest and the food source. The shortest path is likely to accumulate the most pheromone, so ants tend to return to the nest along it. As the number of relations in a query increases, more memory and processing are required. The ants' pheromone-marking behavior is used to guide them toward unexplored areas of the search space and to visit all nodes without knowing the topology of the graph, and thereby to find optimal solutions for distributed database queries. These ants calculate the execution time of the executable plans for a given query and quickly provide high-performance, optimal results with the least amount of resources [14].
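A minimal sketch of this idea, with a placeholder cost function standing in for the real execution-time estimate, could look as follows; it is illustrative only and not the algorithm of [13] or [14].

import random

# Assumed relation names, following Query 1.
RELATIONS = ["F0", "F1", "F2", "F3", "F4"]

def plan_cost(plan):
    # Placeholder for the real cost (execution time) of the plan.
    return sum((i + 1) * (int(rel[1]) + 1) for i, rel in enumerate(plan))

# Each ant builds a join order step by step, choosing the next relation with
# probability proportional to the pheromone on the (current, next) edge;
# cheaper plans deposit more pheromone, and all edges slowly evaporate.
pheromone = {(a, b): 1.0 for a in RELATIONS for b in RELATIONS if a != b}
best_plan, best_cost = None, float("inf")

for _ in range(100):                                   # iterations
    for _ in range(10):                                # ants per iteration
        plan = [random.choice(RELATIONS)]
        while len(plan) < len(RELATIONS):
            candidates = [r for r in RELATIONS if r not in plan]
            weights = [pheromone[(plan[-1], r)] for r in candidates]
            plan.append(random.choices(candidates, weights=weights, k=1)[0])
        cost = plan_cost(plan)
        if cost < best_cost:
            best_plan, best_cost = plan, cost
        for a, b in zip(plan, plan[1:]):               # deposit pheromone
            pheromone[(a, b)] += 1.0 / cost
    for edge in pheromone:                             # evaporation
        pheromone[edge] *= 0.9

print(best_plan, best_cost)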
The particle swarm optimization (PSO) algorithm is a global minimization method that can be used for problems whose solutions are a point or a surface in an n-dimensional space. In this space, candidate solutions (particles) are placed, each is assigned an initial velocity, and communication channels are considered between the particles. The particles then move through the solution space, and their results are evaluated according to a merit (fitness) criterion after each time step. Over time, the particles accelerate toward particles that have a higher merit value and belong to the same communication group [15].
At each step, the position of each particle is updated using two values. The first is the best position the particle itself has achieved so far, called pbest; the second is the best position obtained so far by the whole particle population, denoted gbest.
The new position of each particle is obtained from the following equation:

$x_i(t+1) = x_i(t) + v_i(t+1)$ (1)

Here, the last term is the velocity (the particle displacement rate), which is calculated as follows:

$v_i(t+1) = w\,v_i(t) + c_1 r_1 \big(pbest_i - x_i(t)\big) + c_2 r_2 \big(gbest - x_i(t)\big)$ (2)
r1 and r2 are random numbers in the range [0, 1]; c1 expresses the importance of the particle's own best position and c2 the importance of the best position of its neighbors. Many parameters are involved in running the PSO algorithm, and setting them properly greatly affects its performance. These parameters include the number of particles and the inertia coefficient w [16].
The fitness function (in terms of time) is the sum of the I/O cost, the CPU cost, and the communication cost. These components may carry different weights in different distributed environments (e.g., LAN vs. WAN), and the objective can alternatively be formulated to maximize throughput.
Fig. 6. The structure of the particle swarm algorithm.
Fig. 7. The structure of the fitness function [15].
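The sketch below implements the update rules of equations (1) and (2) on a simple continuous test function; in query optimization the fitness would instead be the weighted I/O, CPU, and communication cost of the plan decoded from each particle's position, and that decoding step is omitted here.

import random

def fitness(x):
    # Simple quadratic test function to minimize; stands in for a plan cost.
    return sum(v * v for v in x)

DIM, N_PARTICLES, W, C1, C2 = 3, 20, 0.7, 1.5, 1.5
pos = [[random.uniform(-5, 5) for _ in range(DIM)] for _ in range(N_PARTICLES)]
vel = [[0.0] * DIM for _ in range(N_PARTICLES)]
pbest = [p[:] for p in pos]
gbest = min(pbest, key=fitness)[:]

for _ in range(100):                          # iterations
    for i in range(N_PARTICLES):
        for d in range(DIM):
            r1, r2 = random.random(), random.random()
            # Equation (2): velocity update with inertia w and weights c1, c2.
            vel[i][d] = (W * vel[i][d]
                         + C1 * r1 * (pbest[i][d] - pos[i][d])
                         + C2 * r2 * (gbest[d] - pos[i][d]))
            # Equation (1): position update.
            pos[i][d] += vel[i][d]
        if fitness(pos[i]) < fitness(pbest[i]):
            pbest[i] = pos[i][:]
            if fitness(pbest[i]) < fitness(gbest):
                gbest = pbest[i][:]

print(gbest, fitness(gbest))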
In this approach, query execution plans are evaluated not only in terms of the quality of the discovered plan but also in terms of optimization time. PSO algorithms can provide high-quality solutions comparable to those of genetic algorithms. In these algorithms, the particles representing the solutions are moved according to a probability distribution rather than by manipulating the velocity parameter. This mechanism provides a simple and effective way to explore the search space. As new and better global and local solutions are discovered, PSO algorithms keep searching and can produce near-optimal execution plans [17].
VI. CONCLUSION
Fully realizing the combination of evolutionary algorithms for query optimization in distributed databases is still a new field of study. Research is progressing on generating, executing, and implementing combinations of evolutionary algorithms to solve a variety of optimization problems. The results of this research have shown that these evolutionary algorithms produce appropriate and feasible solutions for relational and distributed database management systems in which the size of the queries and the number of joins are large. There are still many research opportunities in producing optimal solutions and in reviewing and modifying search strategies using combinations of evolutionary algorithms for distributed databases, especially when the size and complexity of the relations affect the number of query parameters.
REFERENCES
[1] M. T. Özsu, P. Valduriez, Distributed Query Processing, in Principles of Distributed Database Systems, Cham, (2020), pp. 129-182.
[2] J. Gao, W. Liu, Z. Li, An adaptive strategy for statistics collecting in a distributed database, Frontiers of Computer Science, Vol. 14, 145610, (2020).
[3] J. Gao, W. Liu, Z. Li, J. Zhang, L. Shen, A general fragments allocation method for join query in a distributed database, Information Sciences, Vol. 512, (2020), pp. 1249-1263.
[4] S. M. Mohsin Darwish, A. Younes, Dynamic Cost Ant Colony Algorithm for Optimize Distributed Database Query, Proceedings of the International Conference on Artificial Intelligence and Computer Vision, Vol. 1153, (2020), pp. 170-181.
[5] P. Tiwari and S. V. Chande, Optimal Ant and Join Cardinality for Distributed Query Optimization Using Ant Colony Optimization Algorithm, Emerging Trends in Expert Applications and Security, Advances in Intelligent Systems and Computing, Vol. 841, (2019), pp. 385-392.
[6] D. Das, J. Yan, M. Zait, S. R. Vallur, N. Vyas, R. Krishnamachari, P. Gaharwar, J. Kamp, N. Mukherjee, Query optimization in Oracle 12c database in-memory, Proceedings of the VLDB Endowment, 8(12), (2020), pp. 1770-1781.
[7] P. Tiwari and S. V. Chande, Query Optimization Strategies in Distributed Databases, International Journal of Advances in Engineering Sciences, Vol. 3(3), (2013), pp. 23-29.
[8] M. Sharma, G. Singh, R. Singh, Clinical decision support system query optimizer using hybrid Firefly and controlled Genetic Algorithm, Journal of King Saud University - Computer and Information Sciences, (2018).
[9] J. Gao, W. Liu, Z. Li, An adaptive strategy for statistics collecting in distributed database, Frontiers of Computer Science, Vol. 14, (2020).
[10] A. Aljanaby, E. Abuelrub, M. Odeh, Database Systems Performance Analysis, A Survey of Distributed Query Optimization, The International Arab Journal of Information Technology, Vol. 2, (2005), pp. 48-57.
[11] S. Padia, S. Khulge, A. Gupta, P. Khadilika, Query Optimization Strategies in Distributed Databases, Proceedings of the 2nd International Conference on Emerging Trends in Engineering and Management, (2013).
[12] S. Mirjalili, J. Song Dong, A. S. Sadiq, H. Faris, Genetic Algorithm: Theory, Literature Review, and Application in Image Reconstruction, Nature-Inspired Optimizers, Vol. 811, (2020), pp. 69-85.
[13] S. Mirjalili, J. Song Dong, A. Lewis, Ant Colony Optimizer: Theory, Literature Review, and Application in AUV Path Planning, Nature-Inspired Optimizers, Vol. 811, (2020), pp. 7-21.
[14] D. Kumar, V. K. Jha, An improved query optimization process in big data using ACO-GA algorithm and HDFS map reduce technique, Distributed and Parallel Databases, (2020).
[15] S. Mirjalili, J. Song Dong, A. Lewis, A. S. Sadiq, Particle Swarm Optimization: Theory, Literature Review, and Application in Airfoil Design, Nature-Inspired Optimizers, Vol. 811, (2020), pp. 167-184.
[16] M. Q. Yasin, X. Zhang, R. Haq, Z. Feng, S. Yitagesu, A Comprehensive Study for Essentiality of Graph Based Distributed SPARQL Query Processing, Database Systems for Advanced Applications, Vol. 10829, (2018), pp. 156-170.
[17] T. Dokeroglu, U. Tosun, A. Cosar, Particle Swarm Intelligence as a new heuristic for the optimization of distributed database queries, International Conference on Application of Information and Communication Technologies (AICT), (2012).