Multiple Users Replica Selection in Data Grids for Fair User Satisfaction:
A Hybrid Approach
Ayman Jaradata,*, Hitham Alhussianb, Ahmed Patelc, and Suliman Mohamed Fatid
aComputer Science Department, College of Science and Human Studies at Hotat Sudair, Al Majmaah University,
Hotat Sudair, Majmaah 11952 - SA
[ay.jaradat@mu.edu.sa]
bCenter for Research in Data Science (CREDAS), Institute of Autonomous Systems, Universiti Teknologi PETRONAS,
32610 Bandar Seri Iskandar, Perak, Malaysia
[seddig.alhussian@utp.edu.my]
c Computer Networks and Security Laboratory (LARCES), State University of Ceará (UECE), Fortaleza, Brazil
[whinchat@gmail.com]
d College of Computer and Information Sciences, Prince Sultan University
11586, Riyadh - SA
[smfati@yahoo.com]
*Corresponding author: Ayman K. Jaradat
Computer Science Department, College of Science and Human Studies at Hotat Sudair, Al Majmaah University,
Hotat Sudair, Majmaah 11952 - SA
[ay.jaradat@mu.edu.sa]
Abstract
Replica selection in data grids aims to select the best replica location based on the quality-of-service (QoS) parameters preferred by the
user. This choice is important because of the limited number of available data resources in comparison with the large number of
users. Typically, user requests are fulfilled in a first-in, first-out manner, which may satisfy the users at the beginning of the queue
more than those at the end. Better results can be achieved by considering the requests of all users simultaneously, thereby leading to
a higher level of overall satisfaction; however, this is a difficult task because the search space grows by orders of magnitude with the
number of users. Therefore, in this study, a hybrid of the genetic algorithm and the user-preference algorithm is proposed to
overcome this problem. The results verify that the proposed hybrid approach significantly outperforms previously used methods.
Keywords: Data grid, Replica Selection, Fair user satisfaction, Multi-objective decision, Quality-of-service (QoS) parameters,
Genetic Algorithm
1. Introduction
Data grids are special types of distributed systems that consist
of geographically discrete resources without any centralised
control. The capability of data grids to offer large-scale sharing
of internet or cloud-accessible data differentiates them from
traditional distributed computing. Applications that deal with
terabyte- and petabyte-scale data repositories are emerging as
critical resources for users. The data-grid infrastructure
potentially needs to support thousands of users, especially
those scientists who collaborate globally with a community of
users scattered across different geographical locations [1]. It is
evident that one virtual organisation alone may not be enough
to manage the massive volumes of data produced by
experiments and simulations. The exponential growth of data-
intensive applications has given rise to a new research field for
computer scientists and researchers to develop efficient
techniques and algorithms for scientific applications that
require an immense amount of data to be accessed, stored,
transferred, analysed, and replicated in geographically
distributed locations [2].
Replication and distribution of data among various grid sites
are required to address the need for increased data accessibility,
reliability, and availability. Replicated data requires a process
that involves the selection of replica locations from numerous
locations based on quality-of-service (QoS) parameters and
user preferences. This process is referred to as replica selection.
It is reported [3] that in data grids, jobs being processed
typically fail because of hardware malfunctions, infected
software, network failures, overloaded resources, or non-
availability of required resources. Furthermore, replicas of the
same dataset may have location-dependent access costs,
similar to tangible goods in actual economies [4]. Some users
attempt to optimise mapping to the required replica based on
the costs of their network operators or hosting services,
especially if a 95th-percentile billing mechanism is imposed
for the services [5]. Cost minimisation can be achieved by
decreasing the frequency of peak consumption or by not
exceeding the budgeted rates.
Replica-selection algorithms proposed in previous studies [5–
12] have attempted to satisfy user requests in a greedy manner
without considering the other users in the queue. This can lead
to the users at the beginning of the scheduling queue being
satisfied at the expense of those at the end of the queue. This
may lead to unfair or disproportionate user satisfaction. Fair
satisfaction, in the context of this study, implies an equal level
of QoS fulfilment for all the users in the queue. For example, if
a user pays $10 and receives services worth $9 translating to
90% satisfaction, then another user who pays $1,000 must also
receive services worth $900, thereby effectively obtaining 90%
satisfaction on the relative scale of satisfaction. In other words,
ensuring equality in terms of user satisfaction is essential to
meet fair QoS fulfilment requirements.
In this study, the replica-selection problem is introduced as a
multi-criterion decision-making problem. A global
optimisation algorithm is proposed to attain optimal efficiency
and fair satisfaction for all grid users. Specifically, the main
contributions of this study are to provide a detailed formulation
of the problem, the design of the solution, and the application
of a genetic algorithm (GA)-based solution.
The remainder of this paper is organised as follows: Section 2
describes the related work necessary to clarify the problem and
the development of approaches to replica selection. Section 3
presents the problem formulation and includes explanatory
examples. Section 4 focuses on the importance of GA as a
proposed solution. Section 5 describes the mathematical
modelling of the problem. Section 6 explains the system
design. Section 7 presents the performance metrics and
evaluation. Section 8 includes the results and discussion.
Finally, Section 9 presents the conclusions of the study.
2. Related Work
The problem of replica selection integrating QoS
parameters and transfer speeds has been investigated by many
researchers [6–8, 13]. A novel approach, called D-system [7],
has been previously proposed; it unifies three QoS parameters
has been previously proposed; it unifies three QoS parameters
(time, availability, and security) in the selection process. D-
system offers a method in which all the grid sites holding the
requisite replica are evaluated by unifying the three QoS
parameters into a single value. The time parameter refers to the
estimated time required to move the replica from the selected
site. The availability parameter is the ratio of the access time
permitted by the site that hosts the replica (declared in its local
policies, which define the times during which it serves others,
for example on weekends or after midnight) to the anticipated
time required to move the replica from that grid site. In a recent study [14],
availability was defined as a measure of the execution time and
bandwidth consumption of data replication to achieve
optimisation. The security parameter consists of the reputation,
capabilities, reliability, and self-protection of a site. In D-
system, the three parameters of a site are based on their values
before the replica location is selected. Each parameter is
assigned a value between 0 and 100. The next step is to use an
ideal site with a rating of 100 for each QoS parameter. Next, the
Euclidean distance between the potential and the ideal sites is
measured. The QoS metric is the distance between the ideal and
the candidate sites. This metric integrates time, availability,
and security and is abbreviated as TAS. The lowest value of
TAS characterises the best site.
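To make this rating concrete, the following minimal Python sketch (the function and site names are illustrative and not taken from [7]) computes the TAS metric of each candidate site as its Euclidean distance from the fixed ideal point (100, 100, 100) and picks the site with the lowest value:

```python
import math

def d_system_tas(time_rate, availability_rate, security_rate):
    """Rate a candidate site by its Euclidean distance (TAS) from the
    ideal site (100, 100, 100); a smaller TAS indicates a better site."""
    return math.sqrt((100 - time_rate) ** 2 +
                     (100 - availability_rate) ** 2 +
                     (100 - security_rate) ** 2)

# Illustrative candidate sites rated 0-100 for time, availability and security.
candidates = {"site_a": (80, 60, 70), "site_b": (90, 50, 65)}
best = min(candidates, key=lambda name: d_system_tas(*candidates[name]))
print(best, round(d_system_tas(*candidates[best]), 2))
```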
D-system adopts a fixed-value ideal node that holds the values
(100, 100, 100) for time, availability, and security as a base for
rating the performances of the sites. However, it has some
limitations that are best illustrated in Figure 1 (which uses two
attributes rather than three) and in Figure 2 (which depicts the
use of three attributes).
Figure 1: D-system Utilizing 2 Attributes
Figure 2: D-system Utilizing 3 Attributes
In Figure 1, B characterises a site rated 50% for time and 50%
for security. Nodes M, B, L, and N are equidistant from the
ideal value and are given the same rate by D-system. Thus, one
of them is chosen randomly. The main limitation of D-system
is that one of the sites that fall on the arc of the quarter
circle is chosen randomly. Moreover, D-system does not
consider site E even if the priority of the user or the application
is time. In this case, D-system should not consider availability
and security; however, it still calculates the rating based on all
three parameters. Similarly, this limitation affects the rating
of site F if the primary concern of the user is security. A similar
scenario is observed in Figure 2, where all the nodes (B, L, M,
and R) fall on the surface of a one-eighth sphere and are rated
the same by D-system. Furthermore, nodes that are vital for
some users, such as E, K, and S, are overlooked. The
limitations of D-system have inspired researchers [7] to
propose a smart replica selection in data grids (RSDG) strategy
to address such limitations. The authors argued that the
limitations of D-system lie in the use of exact numerical
attributes, and they discussed whether it would be better to use
linguistic attributes expressed as grey numbers. Grey numbers
are typically used instead of exact numbers to prevent different
replicas from having identical attribute values.
In their research, the authors used manually derived decision
attributes to formulate the decision table according to prior
observations of the performances of the data grid sites. In
another research [8], the authors improved the RSDG strategy
to allow it to function even in the absence of decision attributes
(historical replica information). However, they were not
concerned with addressing all the requests in the queue to
achieve fair user satisfaction for all the users. Furthermore, in
[15], the authors proposed a replica-selection strategy that
integrates three QoS parameters: time, security, and reliability,
in which a multi-criteria decision-maker is used to select the
best replica. The main objective of this system is to achieve fair
resource allocation amongst users, based on the above-
mentioned QoS parameters, to satisfy them. This approach has
been implemented based on a multiple-criteria decision maker
known as the analytical hierarchy process (AHP) [16] that
relies entirely on the historical average of the QoS attained by
users. For example, if history indicates that security is the QoS
parameter of which a user has so far attained the least, then the
best replica for that user is the one that has the highest security
rating. The first drawback of the system is that it targets a
uniform fairness based on historical information, and it does
not consider user preferences. It aims to deliver the same
portions of QoS to all the users, irrespective of what they desire
or pay for. Second, it considers the average for the users,
irrespective of the number of requests already made (the user
who requested 100 replicas is treated similarly to the one who
requested 1,000 replicas). Third, it performs the selection one
by one, which may result in better resource allocation for users
at the beginning of the queue than for users at the end of the
queue. Fourth, it
does not address site availability, and it is built under the
assumption that users are competing for limited data resources.
Moreover, using this approach, researchers have defined
security as an additional level to protect the data files (which
are typically already secured by the grid toolkits) from
unauthorised users. In fact, relying on the history of events may
prevent the system from selecting the best site because the
decision maker aims at balancing the QoS rather than
improving all the parameters.
The authors of [17] proposed a replica-selection process that
focused on grouping popularity and affinity files as the most
important parameters in selecting replicas to improve data
availability. However, this method relies on only one user’s
decision. Further, the authors of [18] proposed a new replica-
selection algorithm based on the ant-colony model, in which
the replica choice metrics (replica node network bandwidth,
network delay, etc.) are drawn by the ant-colony pheromone.
Their technique first optimises the ant-colony algorithm using
an initialisation formula to make each copy have its own initial
prime value. Next, by prioritising replica nodes on P2P, it
reduces the use of cloud replica nodes and the cost of service
providers. However, the ant-colony algorithm is a very slow
process because it goes through many repetitions to achieve the
best results. Moreover, the decisions are only based on one user
each time, resulting in unfair treatment of users. Another recent
study [19] combines replica creation with replica selection, in
which a dynamic replica-creation algorithm is based on hot
(high demand for access) files, and the replica-selection
algorithm is based on site service capability. The replica-
creation strategy is based on access heat that considers hot data
and guarantees that the demand for hot replicas in the next
cycle is met. In this replica-selection algorithm, the service
capability of the hosting site is considered; therefore, the load
capacity of the replica site and network distance between the
replica site and user are adequately addressed. This is because
the data transferred between the user and the site in terms of
network bandwidth usage has a significant impact on the
reading speed of the user. However, this strategy only focuses
on bandwidth as the central aspect while ignoring other
important factors such as security, cost, availability, and fair
treatment of users.
The authors of [10] enhanced D-system by choosing the site
that has almost equal portions of each QoS parameter as a
balanced solution without addressing all the requests in the
queue to ensure good satisfaction for each user. Next, they
enhanced their work by proposing the user-preference
algorithm (UPA) [11], in which user preferences are
considered in the selection process, and the stock market model
is adopted to achieve equality among users. The user’s
preferred QoS (UPQ) was used as a performance metric, where
UPQ is the Euclidean distance between the desired and the
granted quality rates. For example, if the user orders a 90%
security rate and a 70% security rate is assigned; the distance
would be 20 UPQ. The smallest value of UPQ corresponds to
the best quality. Nevertheless, this approach individually
addresses each request without considering all the requests in
the queue to achieve good satisfaction for each user.
Furthermore, the authors of [20] proposed the use of dynamic
reliability and availability calculation to select the best replica.
Herein, reliability refers to how long the system has been
continuously active, and availability refers to the percentage of
time that the system is in operational mode. They argued that,
although reliability and availability are different, there is a
relationship between them. Reliability measures the time
between failures occurring in the system. Reliability and
availability are calculated and used to make the selection
decision for each request, without considering all the requests
in the queue to achieve good satisfaction for each user. Further,
the authors of [21] proposed centralised dynamic scheduling
strategy-replica placement strategies (CDSS-RPS) to minimise
the implementation cost and data-transfer time. CDSS-RPS
attempts to enhance the replica access time using replication,
based on a number of factors including the number of accesses,
storage capacity of the node, and response time of a nominated
replica. Similarly, this approach does not consider the other
requests in the queue. The authors of [22] proposed a replica-
selection algorithm based on a three-criteria set, in which the
criteria were reliability, security, and response time. The best
replica was the one that produced the highest quality in the
minimum response time. They used a random process and
AHP, but their approach considers the replica requests one by
one based on requests that were initiated first.
As the existing literature demonstrates, no previous research
has addressed all the requests in the queue simultaneously to
achieve good satisfaction for each user. Furthermore, even
though the GA has been used in grid-job scheduling [23–25],
none of the previous researchers used it specifically in replica
selection. This research addresses the problem of replica
selection that involves making an important decision. The
solution focuses on the fair satisfaction of grid users to deliver
their required replicas with certain levels of security, time
expenditure, availability, and cost that are based on their
individual preferences.
In this study, a new multi-user genetic-based replica-selection
algorithm for data grids is proposed. The GA is a meta-
heuristic search method that enables huge solution spaces to be
heuristically searched using evolutionary methods observed in
nature [26]. Four QoS parameters are adopted in this study:
time (T), availability (A), security (S), and cost (C). Each of
these is rated on a scale of 100, where higher values correlate to
higher quality. The four values for each grid site are integrated
into one value. The process of defining and calculating these
QoS values was addressed in previous studies [11, 12, 27].
3. Problem Formulation
The proposed model involves pairing users with replica
locations in which each user must be assigned to only one
location. Each location has its own specifications, and each
user also has specific preferences. The assignment must be as
fair as possible in satisfying all users in a single batch
(scheduling cycle). For example, suppose there are four sites
with different QoS rates and four users with different QoS
requirements. In this study, one QoS parameter, namely replica
transfer speed, was used. If the transfer speed for the best site is
20 Gbps, then it will be rated as 100% available, and another
site with a 15 Gbps transfer speed will consequently be rated as
$\frac{15}{20} \times 100 = 75\%$ available. However, the availability of
the best site can reduce over time owing to many connections
occurring simultaneously, and this can result in another site
with a lower rate becoming the new best site.
As shown in Table 1, the first user requested a transfer speed
rated at 80%. Meanwhile, the best site at that time was rated at
80%. In this case, the user will be 100% satisfied because the
results are identical ($\frac{80}{80} \times 100 = 100\%$). However, this
pairing will degrade the rating of site 1 because some of the
bandwidth and slices of the storage system will be used.
Subsequently, the second user requested a 90% rated site but
was assigned a 70% rated site ($\frac{70}{90} \times 100 \approx 77\%$); thus, the
level of satisfaction was 77%. Requesters 3 and 4 experienced
similar scenarios.
Table 1: Pairing Requests to Sites in an Arbitrary Order.

                        Site 1    Site 2    Site 3    Site 4
Site's Rate             80        70        65        50
Request Number          1         2         3         4
User's Requested Rate   80        90        60        70
User's Satisfaction     100%      77%       100%      71%
The above example demonstrates that there is an inefficient
allocation of resources, despite the relatively high level of
satisfaction. The satisfaction average is (100 + 77 + 100 + 71) /
4 = 87%. The standard deviation (SD) of such an allocation is
0.13, which is relatively unfair because it means that the user
satisfaction is highly variable. However, reordering the
requests in the queue to select the best replica can increase both
satisfaction level and QoS fairness.
In Table 2, the requests are reordered before the matching
process begins. If we start with a 90% requester and the best
available site is rated at 80%, the resulting satisfaction will be
88.87%. Consequently, requester 2, who has a target rate of
80%, will be granted the site rated 70%, resulting in 87.5%
satisfaction. The results of the third and fourth requests are
illustrated in Table 2.
Table 2: Pairing Requests to Sites in a Managed Order.

                         Site 1    Site 2    Site 3    Site 4
Site's Rate              80        70        65        50
Request Number           1         2         3         4
Requested Rate           90        80        70        60
Requester Satisfaction   88.87%    87.5%     92.85%    83.33%
Reordering the selection slightly increases the average
satisfaction level, (88.87 + 87.5 + 92.85 + 83.33) / 4 = 88.14%,
and significantly reduces the SD to 0.028. This indicates that
fairness has improved, because there is only a slight variation
in user satisfaction. This example
demonstrates that reordering the requests to make a selection
can yield a more equitable satisfaction for all users. Applying
this technique to one parameter is not difficult, but
simultaneously considering four parameters (time, availability,
security, and cost) is a very challenging task. For example, if
there are three sites and three requests, there are six different
solutions. The mathematical problem becomes one of
permutations without repetition, i.e., $\frac{n!}{(n-r)!}$. However,
each solution yields different total values and SDs of the UPQs.
Table 3 presents the result of each permutation.
Table 3: Different Pairing Requests to Sites.

Site's rate (T, A, S, C):        Site 1: 70, 60, 40, 60   Site 2: 65, 70, 60, 55   Site 3: 55, 65, 60, 40
User preferences (T, A, S, C):   User 1: 70, 70, 60, 60   User 2: 75, 75, 80, 60   User 3: 80, 60, 60, 70

Order        Site 1    Site 2    Site 3    UPQs          Total UPQs   SD     Total UPQs + SD
1st order    User 1    User 2    User 3    22, 23, 39    84           7.8    91.8
2nd order    User 1    User 3    User 2    22, 23, 36    81           6.4    87.4
3rd order    User 2    User 1    User 3    43, 7, 39     89           16.1   105.1
4th order    User 2    User 3    User 1    43, 23, 25    91           9      100
5th order    User 3    User 1    User 2    24, 7, 36     67           11.9   78.9
6th order    User 3    User 2    User 1    24, 23, 25    72           0.8    72.8
For example, site 1 is rated 70% for time, 60% for availability,
40% for security, and 60% for cost, and user 1 requested 70%
for time, 70% for availability, 60% for security, and 60% for
cost. The Euclidean distance between site 1 and user 1’s
request is calculated as follows:
$$\sqrt{(70-70)^2 + (60-70)^2 + (40-60)^2 + (60-60)^2} = \sqrt{500} \approx 22\ \text{UPQ}$$
Similarly, the Euclidean distance between site 2 and user 2’s
request is 23 UPQ, and the distance between site 3 and user 3’s
request is 39 UPQ. The total of all the UPQs is 22 + 23 + 39 =
84. A smaller UPQ indicates better quality and higher user
satisfaction. Pairing the sites with the users in a different order
(permutation) leads to a lower total UPQ (TUPQ) value. As
evident in row 5 of Table 3, when user 3 is paired with site 1,
user 1 with site 2, and user 2 with site 3, the TUPQ value is 67,
which is significantly lower than those for other combinations,
making this the best combination in terms of UPQ; however, in
terms of fairness, we can see that the UPQs are 24, 7, and 36 for
users 1, 2, and 3, respectively, which indicates that there are
significant differences in user satisfaction levels. The SD of
these three values (24, 7, and 36) is 11.9, which is very high. In
contrast, row 6 has the second-lowest TUPQ value of 72, with
individual UPQ values of 24, 23, and 25. These values (24, 23,
and 25) are more similar to each other, and this provides more
fairness for the users. Even though 67 is lower than 72 and
therefore better, the individual UPQs that make up a total of 72
exhibit less variation in user satisfaction, with an SD of only
0.8, making 72 the better option.
This example simplifies the problem. However, in real-life
scenarios, there are thousands of sites, and only ten are selected
each time, creating a very large search space. For example, to
select 10 out of 1,000 sites, the total number of permutations is
$8.26 \times 10^{59}$, and this requires the use of a robust algorithm.
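The small example above can be reproduced with a short brute-force sketch (illustrative Python; individual UPQs are rounded before summing, as in Table 3):

```python
import itertools
import math
import statistics

sites = {                       # T, A, S, C rates of the three sites in Table 3
    "site1": (70, 60, 40, 60),
    "site2": (65, 70, 60, 55),
    "site3": (55, 65, 60, 40),
}
users = {                       # preferred T, A, S, C of the three users in Table 3
    "user1": (70, 70, 60, 60),
    "user2": (75, 75, 80, 60),
    "user3": (80, 60, 60, 70),
}

def upq(user, site):
    """Euclidean distance between a user's preferences and a site's rates."""
    return math.sqrt(sum((u - s) ** 2 for u, s in zip(user, site)))

# Enumerate every one-to-one pairing of users to sites (n!/(n-r)! orders).
for order in itertools.permutations(users):
    upqs = [round(upq(users[u], sites[s])) for u, s in zip(order, sites)]
    total, sd = sum(upqs), statistics.pstdev(upqs)
    print(order, total, round(sd, 1), round(total + sd, 1))
```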
4. Permutation and Genetic Algorithm
Evolutionary algorithms are typically used in optimisation
problems. For example, the authors of [28] used a GA to
facilitate QoS-based selection. The GA is a stochastic
optimisation method that is very useful for permutation
problems, as reported in [29], owing to the following advantages:
GAs are not prone to being stuck in local optima if the
attributes are prepared appropriately. This is a well-
known advantage of stochastic optimisation that is
particularly useful for permutation problems, as all the
criteria have strong local minima.
GAs demonstrate faster convergence in comparison to
other stochastic optimisation algorithms, specifically for
problems with a high-dimensional space [30].
GAs are very suitable for discrete optimisation because
they naturally use binary series to represent the solution.
GAs are efficient for multi-objective optimisation
because they enable the entire multi-objective optimum
solution set to be evolved in parallel [31].
5. Mathematical Modelling
Let $Z = \{g_1, g_2, g_3, \ldots, g_n\}$ be a set of data grid sites and $U = \{u_1, u_2, u_3, \ldots, u_m\}$ a set of users such that $m \le n$. Each grid site $g_j$ and each user $u_i$ has four parameters (T, A, S, and C). Each parameter value is between 0 and 100. For a grid site, each parameter value represents its rate based on its performance, and for a user, it represents that user's preference.
Each user $u_i$ is assigned to one grid site $g_j$, and the assignment is denoted by equation (1):

$$R_{ij} \in \{0, 1\}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n \qquad (1)$$

To guarantee that each user $u_i$ is assigned to only one grid site $g_j$, equation (2) must be satisfied:

$$\sum_{j=1}^{n} R_{ij} = 1, \quad i = 1, 2, \ldots, m \qquad (2)$$
5.1 Objective Function
The satisfaction model adopted in this research is measured by the Euclidean distance between user $u_i$ and grid site $g_j$, denoted by equation (3):

$$d(u_i, g_j) = \sqrt{(T_{u_i} - T_{g_j})^2 + (A_{u_i} - A_{g_j})^2 + (S_{u_i} - S_{g_j})^2 + (C_{u_i} - C_{g_j})^2} \qquad (3)$$
The shortest distance is considered to be the best. Hence, the
objective functions are created by maximising the levels of
preference of each user in the required grid site. Satisfaction
maximisation is achieved by minimising the Euclidean
distance between each user i and its assigned grid site j.
However, minimising one user’s distance (increasing
satisfaction) may lead to increasing others’ distances
(decreasing satisfaction). Therefore, the solution must be a
trade-off between the satisfaction levels of users.
Now, let us assume that equation (4) is a general vector
representing all decision variables as follows:
$$\sum_{i=1}^{m} d(u_i, g_j), \quad j \in \{1, 2, 3, \ldots, n\} \qquad (4)$$
Then, the following objectives are necessary:
1. Minimise the total Euclidean distances between the users
and the grid sites as denoted by equation (5):
$$\mathrm{Min}\left[\sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right] \qquad (5)$$
2. Minimise the SD between user distances as denoted by
equation (6). This implies that the users shall obtain
similar distances, which guarantees similar satisfaction
levels of users:
$$\mathrm{Min}\left[\mathrm{SD}\left[\sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right]\right] \qquad (6)$$
Therefore, the objective of this study is to minimise the total
distances among the user preferences and grid site rates while
minimising the SD between user distances. This means that the
solution is using multi-objective decision making with two
parameters.
5.2 Scalarisation
Scalarisation refers to the consolidation of various objectives
into one in a manner that repetitively sorts out the single-
objective optimisation problem with different parameters. This
can allow the researchers to obtain all optimal solutions for the
preliminary multi-objective problem. Many scalarisation
methods have been established [32]. To simplify the multi-user
approach of this study, scalarisation was adopted as the best
solution, and it is denoted by equation (7):
$$\mathrm{Min}\left[\alpha\, \mathrm{SD}\left[\sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right] + \beta \sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right] \qquad (7)$$

where $\alpha$ and $\beta$ are used by the data grid administrator to scale up (or down) the SD term or the total distance term, based on experienced observations and their effects.
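As an illustration, the scalarised objective of equation (7) can be expressed as a fitness routine roughly as follows (a minimal Python sketch; it interprets the SD term as the standard deviation of the per-user distances, as described later in Section 6.1, and `assignment[i]` holds the index of the site assigned to user i):

```python
import math
import statistics

def upq(user, site):
    """Euclidean distance between a user's preferred QoS values (T, A, S, C)
    and the QoS rates of the site assigned to that user (equation (3))."""
    return math.sqrt(sum((u - s) ** 2 for u, s in zip(user, site)))

def fitness(assignment, users, sites, alpha=1.0, beta=1.0):
    """Scalarised objective of equation (7): alpha times the SD of the
    per-user distances plus beta times their total (smaller is fitter)."""
    distances = [upq(users[i], sites[j]) for i, j in enumerate(assignment)]
    return alpha * statistics.pstdev(distances) + beta * sum(distances)
```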
6. System Design
The architecture of the data-grid service is divided into two
levels. The upper level includes high-level services that use
low-level or core services. The replica-selection–optimisation
technique is a high-level service; thus, it invokes a number of
core services. Information of an individual resource, or a set of
resources, is collected and maintained by a grid-resource
information service (GRIS) daemon [33]. GRIS is designed to
gather and announce system-configuration metadata describing
that storage system. For example, each storage resource in the
Globus data grid [34] incorporates a GRIS to circulate its
information. Typically, GRIS provides information of
attributes such as storage capacity, seek times, and descriptions
of site-specific policies governing storage-system usage. Some
attributes are dynamic and vary with several frequencies, such
as total space, available space, queue waiting time, and mount
point. Others, such as disk transfer rate, are static.
The new approach, i.e., multiple-user replica-selection hybrid
approach (MRH) is illustrated in Figure 3. It functions by
receiving user requests via the grid resource broker (RB). Then,
the RB retrieves related physical file names and locations from
the replica location service (RLS). Subsequently, the algorithm
receives information about the sites that hold the replicas and
their network status from grid information services such as the
network weather service (NWS) [35], the meta-computing
directory service [36], and the grid file transfer protocol [36].
Further, the algorithm receives
security ratings for each replica location from the grid manager
and receives availability and cost information from the log files
of each replica. Next, each replica location is rated, and the new
algorithm matches the user requests with the best replica
location in a manner that ensures fair user satisfaction.
Figure 3: Data Access Overview in MRH
6.1 Genetic Algorithm
The GA is a meta-heuristic search method that enables huge
solution spaces to be heuristically searched using evolutionary
methods observed in nature. It is based on iterations and a fitness
function. During each iteration cycle, the fitness value of each
individual in the population is evaluated systematically. Next,
the reproduction operations and selections are implemented to
produce a new population that is used again in the following
iteration of the GA. Reproduction consists of crossover and
mutation operations. The entire procedure is reiterated a
number of times, and each iteration is referred to as a
generation.
The GA comprises two main components: the encoding schema
and the evaluation function (also known as the fitness function).
In this research, the chromosome was represented as a vector
of integers. The position of each element indicates the user
who is requesting a replica, whereas the value of each element
specifies the ID of the site that holds the replica assigned to that user.
For example, the tenth element (entitled gene) of the
chromosome presented in Figure 4 denotes the tenth user in the
queue assigned to site 6. To measure the value or the quality of
a solution, the fitness function was implemented. Every
chromosome is associated with a fitness value. In this study, we
used a multi-criteria fitness function, as denoted in Eq. 7. The
Euclidean distance between the site QoS specifications desired
by the user and the assigned site QoS specifications is the first
criterion that must be minimised. The second criterion is the
SD of Euclidean distances between users. Both criteria were
assumed to be equally weighted = β = 1).
Figure 4: GA chromosome encoding. The chromosome is an integer vector indexed by the user's position in the queue (1, 2, 3, …, n); each gene holds the ID of the site assigned to that user (e.g., gene 10 holds the value 6, meaning the tenth user is assigned to site 6).
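A minimal illustration of this encoding (gene values partly taken from Figure 4 and otherwise illustrative):

```python
# One chromosome of length n = 10: the value at index i is the ID of the
# site assigned to the (i + 1)-th user in the queue.
chromosome = [8, 1, 5, 12, 4, 2, 3, 9, 7, 6]   # partly illustrative values

user_position = 10                      # the tenth user in the queue ...
site_id = chromosome[user_position - 1]
print(f"user {user_position} -> site {site_id}")   # ... is assigned to site 6
```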
6.1.1 Selection Operator
The selection operator, as explained in this section, plays a
significant role in exploiting the benefits of the GA. The
selection operator determines the individuals that will be
reproduced in the next generation while aiming to disregard or
replace poor solutions; new offspring are produced with a
predefined probability (the standard value is 70% [37, 38]), as
depicted in Figure 5a. This study used only the simplest form of
crossover, where the crossover point was always located in the
middle of the chromosome. In future research, different
crossover points shall be examined.
6.1.2 Mutation Operator
The mutation operator randomly changes the integer (site
number) of the chromosome. This process is conducted with a
very small probability (e.g., 0.05% [37, 38]), as shown in
Figure 5b, to maintain diversity in the chromosome population
and also to overcome local optima in the search space. The
GA terminates when either a predefined number of generations
has been produced or the fitness of the individuals in the population
converges. The output of the GA is the fittest chromosome in
all the populations that were produced.
Figure 5: a) One point crossover, b) Mutation
6.1.3 Repair Operator
In highly constrained optimisation problems, the crossover and
mutation operators typically produce invalid or infeasible
solutions that waste time. This problem can be solved by
incorporating problem-specific knowledge to either prevent the
genetic operators from producing infeasible solutions or to
repair these solutions when they occur [39]. Replica selection
can constrain the produced solutions to guide the GA during
the search process. In this research, the repair mechanism
began by checking each chromosome in an attempt to search
for any duplication. Next, the located duplications (if there are
any) were removed via the repair mechanism by assigning the
gene that has a duplicated site number with a new randomly
selected site number that is not already included in the
chromosome. Mutation generates sites randomly, which means
it is also prone to site duplication. Therefore, a repair operation
is required after each mutation.
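The three reproduction-related operators can be sketched as follows (illustrative Python, using the probabilities quoted above; `site_ids` is the list of candidate site IDs):

```python
import random

def crossover(parent_a, parent_b):
    """One-point crossover with the cut fixed at the middle of the
    chromosome, as used in this study (Section 6.1.1)."""
    cut = len(parent_a) // 2
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])

def mutate(chromosome, site_ids, rate=0.0005):
    """With a very small probability (0.05%), replace a gene with a
    randomly chosen site ID (Section 6.1.2)."""
    return [random.choice(site_ids) if random.random() < rate else gene
            for gene in chromosome]

def repair(chromosome, site_ids):
    """Replace duplicated site IDs with randomly chosen unused sites so
    that every user is assigned a distinct site (Section 6.1.3)."""
    unused = [s for s in site_ids if s not in set(chromosome)]
    random.shuffle(unused)
    seen, repaired = set(), []
    for gene in chromosome:
        if gene in seen:                 # duplicate: swap in an unused site
            gene = unused.pop()
        seen.add(gene)
        repaired.append(gene)
    return repaired
```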
6.2 Hybridising UPA with GA
Although the GA is reputed to be slow, it has been used in real-
time applications like scheduling in grid computing [25]. The
key is to merge a greedy algorithm with GA. The role of the
greedy algorithm is to fill up the initial population to decrease
the convergence time. Similarly, in this research, several runs
of the UPA were carried out to fill up the initial population, and
the results in terms of time convergence were promising when
compared to the results obtained using only the GA.
The exact sequence of steps in the proposed algorithm is as
follows:
1. The requested replicas and the QoS preferred by the users are collected.
2. The replicas' physical file names and locations are collected from the grid services; see [12] for details.
3. Site information is collected and rated; see [11, 12] for details.
4. The UPA is used to generate the initial population through several runs.
5. The GA is used to pair each user request with the preferred replica location in a manner that achieves fair user satisfaction.
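A condensed sketch of this hybrid loop is shown below (illustrative Python; it reuses `upq` and `fitness` from the Section 5 sketch and `crossover`, `mutate`, and `repair` from the Section 6.1 sketches, and the greedy seeding routine is only a simplified stand-in for the full UPA of [11]):

```python
import random

def greedy_seed(users, sites):
    """Stand-in for one UPA run: serve requests in queue order, greedily
    giving each user the closest still-unassigned site (illustrative only)."""
    remaining = set(range(len(sites)))
    assignment = []
    for user in users:
        best = min(remaining, key=lambda j: upq(user, sites[j]))
        remaining.remove(best)
        assignment.append(best)
    return assignment

def mrh(users, sites, generations=200, pop_size=30):
    """Hybrid loop: seed the population with greedy runs over shuffled
    queues, then evolve with crossover, mutation and repair."""
    site_ids = list(range(len(sites)))
    population = []
    for _ in range(pop_size):
        order = list(range(len(users)))
        random.shuffle(order)                 # a different queue order per run
        greedy = greedy_seed([users[i] for i in order], sites)
        seeded = [0] * len(users)
        for position, user_index in enumerate(order):
            seeded[user_index] = greedy[position]
        population.append(seeded)
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, users, sites))
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child, _ = crossover(a, b)
            children.append(repair(mutate(child, site_ids), site_ids))
        population = parents + children
    return min(population, key=lambda c: fitness(c, users, sites))
```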
7. Performance Metrics and Evaluation
The performance of MRH was evaluated by means of
calculating, analysing, and comparing its outputs with those of
other algorithms. Thus, two new metrics, known as average user
satisfaction and fair user satisfaction, were proposed to
evaluate and analyse the performance of the new algorithm.
7.1 Total User Satisfaction
The user-satisfaction criterion is highly important. In the
proposed model, user satisfaction was determined by the
distance between the QoS preferred by the user and the actual
QoS already assigned to that user. As a result, the metric used
to measure MRH is UPQ. The smaller the value of UPQ, the
better is the MRH performance. The UPQ level for any user is
calculated as denoted by equation (8):
$$UPQ = \sqrt{(T_{u_i} - T_{g_j})^2 + (A_{u_i} - A_{g_j})^2 + (S_{u_i} - S_{g_j})^2 + (C_{u_i} - C_{g_j})^2} \qquad (8)$$

The TUPQ for all users is given by equation (9):

$$TUPQ = \sum_{i=1}^{n} UPQ_i \qquad (9)$$

The average UPQ for the users is another metric, given by equation (10):

$$\overline{UPQ} = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(T_{u_i} - T_{g_j})^2 + (A_{u_i} - A_{g_j})^2 + (S_{u_i} - S_{g_j})^2 + (C_{u_i} - C_{g_j})^2} \qquad (10)$$
7.2 Fair User Satisfaction:
The fair user satisfaction (FUS) metric measures the different
levels of QoS distributed to the users. The UPQs of the users
must be as fair as possible. Discrepancies in the UPQs of the
users must be reduced in an attempt to be fair to each user. As
UPQ is the proposed criterion, the SD metric is the best one to
be used to calculate the fairness level. FUS is calculated as
denoted by equation (11):
$$FUS = \mathrm{SD}(UPQ_1, UPQ_2, \ldots, UPQ_n) \qquad (11)$$
A smaller value of FUS indicates a better MRH performance in
terms of FUS.
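For clarity, the two metrics can be computed as follows (a small Python sketch; the population standard deviation is used, matching the SD values in Table 3):

```python
import statistics

def tupq(upqs):
    """Equation (9): total of the per-user UPQ values in a batch."""
    return sum(upqs)

def fus(upqs):
    """Equation (11): standard deviation of the per-user UPQ values;
    a smaller FUS means the batch was served more fairly."""
    return statistics.pstdev(upqs)

# Row 6 of Table 3: individual UPQs of 24, 23 and 25 give TUPQ 72 and FUS 0.8.
print(tupq([24, 23, 25]), round(fus([24, 23, 25]), 1))
```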
7.3 Evaluation
To evaluate the performance of the replica-selection decision-
making process, a simulation tool is required for evaluating
system trade-offs. Based on this, a search of distributed and
parallel system simulators was conducted. The particular simulation
tools used in this context are those in line with the grid
specifications [40], such as SimGrid, OptorSim, ChicSim,
Bricks, MicroGrid, GridSim, and Monarc. However, none of
the mentioned simulators support multi-request replica
selection, fairness concept, or QoS parameters. In this respect,
OptorSim is the most suitable simulator owing to its ability to
simulate data-replication strategies [41]. Consequently,
OptorSim was used as a base to build our own simulator.
Because no previous study on replica selection has
considered the case of multiple users, and because UPA [11],
the enhanced version of D-system, is the most similar to the system
used in this research, two main differences between MRH and
UPA were identified. First, UPA does not integrate cost.
Second, it pairs users to their preferred sites one by one with no
global considerations.
8. Results and Discussion
Experiments discussed in this section were based on two cases.
The first case compared the performance of the MRH system to
those of UPA and AHP. The second case examined the
scalability of the MRH system. The simulation setup is
presented in Table 4.
Table 4: Experiment Setup Parameters

Number of Users                          10–70
Number of sites that hold the replica    20–200
Population size                          Number of Users
Offspring Producing Probability          70%
Mutation Probability                     0.05%
Crossover                                Uniform
Number of Generations                    Number of Users × Number of sites
8.1 Case (1): Fair user satisfaction and total UPQ
In the first step for both MRH and UPA simulations, 20 grid
sites were assumed, and each site had 4 QoS parameters with
values between 0 and 100. These values were generated
randomly as given in Table 5 and graphically shown in Figure
6.
Figure 6: Sites with their QoS Parameter Values
Table 5: Sites with their QoS Parameter Values

Site Number   T    A    S    C      Site Number   T    A    S    C
1             55   87   95   86     11            54   71   56   96
2             68   95   65   83     12            54   76   76   69
3             88   93   92   90     13            50   51   81   100
4             57   61   77   62     14            70   55   54   99
5             63   88   61   72     15            90   95   96   93
6             81   76   69   66     16            50   92   65   52
7             85   72   90   83     17            51   84   77   50
8             90   53   82   53     18            69   66   60   61
9             55   76   62   59     19            63   51   82   72
10            81   82   72   53     20            59   82   61   95
It was also assumed that 10 users would independently request
one replica each, according to their preferred QoS levels for
each parameter. These values were randomly generated, as
shown in Table 6.
Table 6: Users with their Preferred QoS Parameter Values

User Number   T    A    S    C
1             75   79   68   88
2             71   77   62   95
3             77   89   78   87
4             65   85   93   71
5             73   91   75   90
6             88   84   99   92
7             89   73   77   75
8             89   91   92   100
9             86   73   97   86
10            98   86   95   89
The second step was to run the simulation process using all the
systems. AHP and UPA implement the selections for the users
depending on their positions in the scheduling queue,
beginning from the first and continuing to the last. FUSs and
TUPQs were computed using the MRH system, AHP, and
UPA, as shown in Tables 7 and 8. Figure 7 illustrates the
content of Tables 7 and 8 in terms of TUPQ and Figure 8
illustrates the content of Tables 7 and 8 in terms of FUS.
The same experiment was repeated 10 times with the same data
shown in Table 6 but with different request orders. The results of
these experiments are presented in Figures 7–12, which
demonstrate that the efficiency of the MRH system is better than
those of AHP and UPA. The values resulting from the simulations
were computed, and the efficiency is calculated as denoted by
equation (12):

$$\mathrm{Efficiency} = \frac{O_{av} - U_{av}}{O_{av}} \times 100 \qquad (12)$$

where $O_{av}$ is the other system's value, and $U_{av}$ is the underlying (MRH) system's value.
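For example (a one-line Python check of equation (12) against run 1 of Table 7):

```python
def efficiency(other_value, mrh_value):
    """Equation (12): relative improvement of the underlying (MRH) system
    over another system, expressed as a percentage."""
    return (other_value - mrh_value) / other_value * 100

# Run 1 of Table 7: UPA TUPQ 218.92 vs MRH TUPQ 178.86 gives about 18.30%.
print(round(efficiency(218.92, 178.86), 2))
```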
Table 7: FUSs & TUPQs of the 10 Experiments Using the MRH & UPA Systems

Run    MRH TUPQ   MRH FUS   UPA TUPQ   UPA FUS   Efficiency Based on TUPQ (%)   Efficiency Based on FUS (%)
1      178.86     7.69      218.92     11.59     18.30                          33.65
2      178.86     7.69      230.09     11.55     22.27                          33.42
3      178.86     7.69      186.71     7.85      4.20                           2.04
4      180.87     7.41      230.64     12.01     21.58                          38.30
5      178.86     7.69      238.10     10.79     24.88                          28.73
6      180.87     7.41      218.92     11.59     17.38                          36.07
7      180.87     7.41      235.47     10.45     23.19                          29.09
8      178.86     7.69      197.62     11.77     9.49                           34.66
9      178.86     7.69      190.69     10.72     6.20                           28.26
10     180.87     7.41      212.03     13.55     14.70                          45.31
Table 8: FUSs & TUPQs of the 10 Experiments Using the MRH & AHP Systems

Run    MRH TUPQ   MRH FUS   AHP TUPQ   AHP FUS   Efficiency Based on TUPQ (%)   Efficiency Based on FUS (%)
1      178.86     7.69      269.08     12.86     33.53                          40.20
2      178.86     7.69      292.98     11.60     38.95                          33.71
3      178.86     7.69      335.76     9.19      46.73                          16.32
4      180.87     7.41      296.55     12.61     39.01                          41.24
5      178.86     7.69      307.39     9.05      41.81                          15.03
6      180.87     7.41      307.03     13.76     41.09                          46.15
7      180.87     7.41      243.01     12.61     25.57                          41.24
8      178.86     7.69      287.25     10.59     37.73                          27.38
9      178.86     7.69      290.55     12.81     38.44                          39.97
10     180.87     7.41      289.87     11.20     37.60                          33.84
As illustrated in Figures 7–14, the MRH system performed
better than both UPA and AHP in all the experiments, in terms
of both TUPQ, which corresponds to increased user
satisfaction, and FUS, which indicates that higher quality and
more fairness have been achieved. Figure 7 shows that the TUPQ
values of MRH are always less than those of UPA and AHP,
which means it is closer to full user satisfaction; Figure 8 shows a
similar situation in terms of FUS. The efficiency of MRH
reached 24.88% and 45.31% in terms of user satisfaction and
fairness, respectively, in comparison with UPA. On the other
hand, the efficiency in comparison with AHP reached 46.73%
in terms of user satisfaction, and the fairness reached
46.15%, which highlights the significance of the MRH system.
Based on the above experiments and with respect to UPA, the
average TUPQ enhancement was 16.22%, and the SD was
6.98. With respect to AHP, the average enhancement was
38.05%, and the SD was 5.55. The average FUS enhancement
with respect to UPA was 30.95%, and the SD was 11.38, whereas the
average FUS enhancement with respect to AHP was 33.51%, and the
SD was 10.75. Figures 9 and 10 depict that MRH is always more
efficient than UPA and AHP in terms of TUPQs and FUSs.
Figure 7: TUPQs of the 10 experiments, demonstrating the efficiency of MRH over UPA & AHP (chart: performance of MRH, UPA and AHP based on TUPQs).
Figure 8: FUSs of the 10 experiments, demonstrating the efficiency of MRH over UPA & AHP (chart: performance of MRH, UPA and AHP based on FUSs).
Figure 9: Efficiency of MRH over UPA & AHP based on TUPQ.
Figure 10: Efficiency of MRH over UPA & AHP based on FUS.
Figure 11: TUPQs of the 10 experiments using the MRH and UPA systems.
Figure 12: FUSs of the 10 experiments using the MRH & UPA systems.
Figure 13: MRH versus AHP systems in terms of TUPQ.
Figure 14: MRH versus AHP systems in terms of FUS.
The detailed user and site combinations, ordered from 1 to 10
for simplicity, are presented in Figures 15(a) and 15(b), which
are generated from Table 9. The UPA and AHP match users
and sites based on the user positions in the scheduling queue.
The MRH system provides optimal solutions that are more
stable and always show similar matchings of users and sites,
with little variation arising from the order of the users in the queue
from which the initial population is generated. This indicates that the FUS
and TUPQ values in Tables 7 and 8 were nearly the same,
irrespective of the order of users in the queue. In contrast, the
FUS and TUPQ values obtained from the UPA and the AHP
always noticeably varied based on the user positions in the
queue. The efficiencies of the pairing results obtained from
UPA and AHP were less than that obtained from the MRH
system. Nevertheless, there is a small possibility that the UPA
and AHP accidentally achieve a performance similar to the
performance of the MRH system.
Figure 15(a): Detailed combinations between users and sites (runs 1 to 5).
Figure 15(b): Detailed combinations between users and sites (runs 6 to 10).
Table 9: 10 Experiments Using MRH, AHP & UPA Systems (site assigned to each user)

Run   System   U1   U2   U3   U4   U5   U6   U7   U8   U9   U10
1     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      20   11   7    1    2    15   6    5    19   3
      AHP      6    20   5    3    2    10   1    15   8    7
2     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      2    11   15   12   5    3    6    20   7    1
      AHP      7    20   6    10   15   2    8    3    1    5
3     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      20   11   2    12   5    3    6    1    7    15
      AHP      20   1    6    15   3    10   5    7    8    2
4     MRH      20   11   2    12   5    1    6    15   7    3
      UPA      2    20   7    1    12   3    6    5    19   15
      AHP      6    20   5    3    2    7    15   10   8    1
5     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      2    20   5    1    3    12   6    7    19   15
      AHP      15   2    20   3    5    7    10   1    6    8
6     MRH      20   11   2    12   5    1    6    15   7    3
      UPA      20   11   7    1    2    15   6    5    19   3
      AHP      8    20   6    2    1    7    3    10   5    15
7     MRH      20   11   2    12   5    1    6    15   7    3
      UPA      2    11   3    1    20   12   6    7    19   15
      AHP      5    20   10   3    2    7    6    15   1    8
8     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      5    20   2    1    11   12   6    15   7    3
      AHP      3    20   10   2    6    15   8    5    1    7
9     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      20   11   2    1    5    12   6    15   7    3
      AHP      7    2    10   6    1    3    8    15   20   5
10    MRH      20   11   2    12   5    1    6    15   7    3
      UPA      5    20   3    1    2    12   6    15   7    10
      AHP      7    2    10   20   1    3    8    5    6    15
8.1.1 Statistical Testing
8.1.1.1 TUPQ
Tables 7 and 8 clearly demonstrate that the MRH system was
superior to both UPA and AHP systems. However, statistical
testing is a useful method that can be used to validate the
significance of these results. Therefore, a one-way repeated-measures
analysis of variance (ANOVA) was conducted to
compare the MRH, UPA, and AHP systems based on the
TUPQ metric. The means and SDs are presented in Table 10.
The TUPQ means were 179.6, 215.9, and 291.9 when the MRH
system, UPA, and AHP were used, respectively. To test
whether the differences between the means were significant or
not, a multivariate test was conducted, and the results are
presented in Table 11, which indicates that the system used had
a significant effect on the TUPQ values, especially when the
MRH system was used [Wilks’ Lambda = 0.023, F(2, 8) =
172.6, p < 0.0005, multivariate partial η2 = 0.977].
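Such a test can be reproduced, for instance, with the AnovaRM class from the Python statsmodels package (a sketch under the assumption that the runs act as the repeated "subjects"; the paper does not state which statistical software was used, and AnovaRM reports the univariate repeated-measures F rather than the multivariate Wilks' Lambda statistic quoted above):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# TUPQ values of the 10 runs for each system (Tables 7 and 8).
mrh = [178.86, 178.86, 178.86, 180.87, 178.86,
       180.87, 180.87, 178.86, 178.86, 180.87]
upa = [218.92, 230.09, 186.71, 230.64, 238.10,
       218.92, 235.47, 197.62, 190.69, 212.03]
ahp = [269.08, 292.98, 335.76, 296.55, 307.39,
       307.03, 243.01, 287.25, 290.55, 289.87]

rows = [{"run": r + 1, "system": name, "tupq": value}
        for name, values in (("MRH", mrh), ("UPA", upa), ("AHP", ahp))
        for r, value in enumerate(values)]
df = pd.DataFrame(rows)

# One-way repeated-measures ANOVA with the run treated as the repeated subject.
print(AnovaRM(df, depvar="tupq", subject="run", within=["system"]).fit())
```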
Table 10: TUPQ Descriptive Statistics for the MRH, UPA & AHP Systems

System   Mean       Std. Deviation   Number
MRH      179.6640   1.03796          10
UPA      215.9190   18.70657         10
AHP      291.9470   24.39006         10
Table 11: The Results of the Multivariate Tests based on TUPQ

Effect          Value   F       Hypothesis df   Error df   Significance   Partial Eta Squared
Wilks' Lambda   0.023   172.6   2.0             8.0        0.000          0.977
Additional analyses were conducted to determine the directions
of these differences in TUPQ values, and tests were performed
to shed light on the effects of each system. The results of these
multivariate tests are presented in Table 12.
Table 12: The Results of the Tests of Between-System Effects based on TUPQ

Source      Type III Sum of Squares   df   Mean Square   F          Significance   Partial Eta Squared
Intercept   1575658.336               1    1575658.336   8325.460   0.000          0.999
Error       1703.320                  9    189.258       -          -              -
The one-way repeated measures of ANOVA showed that these
TUPQs were significantly different—F(1, 9) = 8325.4, p <
0.001, partial η2 = 0.999, repeated measures using a Bonferroni
adjustment α= 0.05/3 = 0.017. Moreover, pairwise
comparisons, as presented in Table 13, proved that the TUPQ
values (Euclidean distances) were significantly smaller when
the MRH system was used. Furthermore, there was a
significant reduction in the Euclidean distance when AHP was
compared with UPA. These results indicate that UPA
outperformed AHP, and that in comparison to both of them,
MRH system performed the best. In visual summary form,
Figure 16 presents the mean difference and standard error for
the TUPQ pairwise comparison.
Table 13: TUPQ Pairwise Comparisons

(I) System   (J) System   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
MRH          UPA          -36.255                 5.797        0.000   -53.261              -19.249
MRH          AHP          -112.283                7.810        0.000   -135.192             -89.374
AHP          UPA          -76.028                 11.506       0.000   -109.778             -42.278
Figure 16: Mean difference and standard error for the TUPQ
pairwise comparison
8.1.1.2 FUS
Similarly, one-way repeated measures of ANOVA were
conducted to compare the MRH, UPA, and AHP systems based
on the FUS metric. The means and SDs are presented in Table
14.
The FUS means were 7.58, 11.19, and 11.63 when the MRH
system, UPA, and AHP were used, respectively. To determine
if the difference between the means was significant, a
multivariate test was conducted. As shown in Table 15, there
was a significant effect on FUS depending on the system that
was used, and this was especially evident when using the MRH
system [Wilks’ Lambda = 0.103, F(2, 8) = 34.902, p < 0.0005,
multivariate partial η2 = 0.897].
Table 14: FUS Descriptive Statistics for the MRH, UPA & AHP Systems

System   Mean    Std. Deviation   Number
MRH      7.58    0.145            10
UPA      11.19   1.46             10
AHP      11.63   1.61             10
Table 15: The Results of the Multivariate Tests based on FUS

Effect          Value   F        Hypothesis df   Error df   Sig.    Partial Eta Squared
Wilks' Lambda   0.103   34.902   2.000           8.000      0.000   0.897
More analyses were conducted to determine the directions of
these differences in FUS values, and tests were performed to
shed light on the effects of the different systems. The results of
these multivariate tests are presented in Table 16.
Table 16: The Results of the Tests of Between-System Effects based on FUS

Source      Type III Sum of Squares   df   Mean Square   F          Sig.    Partial Eta Squared
Intercept   3079.115                  1    3079.115      1491.755   0.000   0.994
Error       18.577                    9    2.064         -          -       -
One-way repeated measures of ANOVA indicate that these
FUSs were significantly different: F(1, 9) = 1491.755, p <
0.001, partial η2 = 0.994, repeated-measures using a Bonferroni
adjustment α = 0.05/3 = 0.017. Moreover, pairwise
comparisons, as presented in Table 17, proved that the FUS
values were significantly smaller when the MRH system was
used. These results demonstrate that UPA outperformed AHP,
and that in comparison to both of them, MRH system
performed the best. In visual summary form, Figure 17 presents
the mean difference and standard error for the FUS pairwise
comparison.
Table 17: FUS Pairwise Comparisons

(I) System   (J) System   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
MRH          UPA          -3.609                  0.482        0.000   -5.022               -2.196
MRH          AHP          -4.050                  0.532        0.000   -5.611               -2.489
AHP          UPA          -0.441                  0.532        1.000   -2.003               1.121
Figure 17: Mean difference and standard error for the FUS pairwise comparison.
8.2 Case (2): Scalability test and best replica
Simulations were conducted using various methods, and the
results were compared with those of the UPA and the AHP to
determine the feasibility and scalability of the proposed
system. UPA was expected to be superior to AHP because
UPA was designed specifically for replica selection with
multiple parameters, whereas AHP is a general-purpose
decision model. In this simulation, the total number of grid
sites was one independent variable, and the number of users
was another independent variable. Therefore, nine scenarios
were examined, as shown in Tables 18 and 19 respectively. For
ease of understanding and interpretability, the results are
presented through visualisation: Table 18 is depicted in Figures
18(a) to 18(i), and Table 19 is depicted in Figures 19(a) to 19(i)
respectively.
Table 18: The Performance of 9 Experiments Using the MRH & UPA Systems

Scenario   Set   Number of sites   No. of requests   MRH TUPQ   MRH FUS   UPA TUPQ   UPA FUS   Efficiency Based on TUPQ   Efficiency Based on FUS
1          A     50                10                144.60     1.13      145.39     4.09      0.54%                      72.37%
2          A     50                20                288.74     4.12      313.27     4.40      7.83%                      6.36%
3          A     50                30                406.72     4.81      436.28     6.09      6.78%                      21.02%
4          B     100               20                222.29     2.68      227.18     2.75      2.15%                      2.55%
5          B     100               30                343.20     3.56      367.88     4.20      6.71%                      15.24%
6          B     100               50                705.515    6.00      777.04     6.32      9.20%                      5.06%
7          C     200               30                279.43     2.47      303.72     3.09      8.00%                      20.06%
8          C     200               50                464.10     3.46      488.75     3.48      5.04%                      6.46%
9          C     200               70                644.33     3.35      695.19     3.53      7.32%                      5.10%
Table 19: The Performance of 9 Experiments Using the MRH & AHP Systems

Scenario   Set   Number of sites   No. of requests   MRH TUPQ   MRH FUS   AHP TUPQ   AHP FUS   Efficiency Based on TUPQ   Efficiency Based on FUS
1          A     50                10                144.60     1.13      390.15     10.10     62.94%                     88.81%
2          A     50                20                288.74     4.12      669.87     11.94     56.90%                     65.49%
3          A     50                30                406.72     4.81      995.46     10.27     59.14%                     53.16%
4          B     100               20                222.29     2.68      662.07     9.33      66.43%                     71.28%
5          B     100               30                343.20     3.56      1023.46    13.13     66.47%                     72.89%
6          B     100               50                705.515    6.00      1737.53    12.11     59.40%                     50.45%
7          C     200               30                279.43     2.47      1054.80    12.35     73.51%                     80.00%
8          C     200               50                464.10     3.46      1750.10    10.95     73.48%                     68.40%
9          C     200               70                644.33     3.35      2362.23    11.07     72.72%                     69.74%
The first chromosome used in the MRH system was the one
obtained from UPA; thus, both systems began from the same
point. The results obtained from these simulations demonstrate
that the MRH system was scalable and outperformed the UPA
and AHP in all the scenarios. The fairness efficiency was
highly significant because it reached 72.37% in comparison to
UPA and 88.81% in comparison to AHP. The superiority of the
MRH system can be attributed to its nature as a weighted
algorithm that takes all of the users' requests into consideration
before making any decision. In contrast, UPA and AHP are
both greedy. They satisfy the current user without making any
prior considerations about the remaining users, similar to the
scenario of closest-city selection in the travelling-salesman
problem, which results in very long distances at the end.
Moreover, in terms of the TUPQ performance metric, the MRH
system delivered better results than UPA. The average
improvement value was 5.95% with an SD of 2.8. When
compared with AHP, the average TUPQ improvement value
was 65.66% with an SD of 22.27. On the other hand, the
average FUS improvement when compared with UPA was
16.48% with an SD of 2.87. Compared with AHP, the average
FUS improvement was 68.91% with an SD of 11.95.
Figure 18(a): Performance of MRH-TUPQ vs UPA-TUPQ in Table 18, Set A (50 sites).
Figure 18(b): Performance of MRH-TUPQ vs UPA-TUPQ in Table 18, Set B (100 sites).
Figure 18(c): Performance of MRH-TUPQ vs UPA-TUPQ in Table 18, Set C (200 sites).
Figure 18(d): Performance of MRH-FUS vs UPA-FUS in Table 18, Set A (50 sites).
Figure 18(e): Performance of MRH-FUS vs UPA-FUS in Table 18, Set B (100 sites).
Figure 18(f): Performance of MRH-FUS vs UPA-FUS in Table 18, Set C (200 sites).
Figure 18(g): Efficiency of MRH over UPA, TUPQ vs FUS, in Table 18, Set A (50 sites).
Figure 18(h): Efficiency of MRH over UPA, TUPQ vs FUS, in Table 18, Set B (100 sites).
Figure 18(i): Efficiency of MRH over UPA, TUPQ vs FUS, in Table 18, Set C (200 sites).
Figure 19(a): Performance of MRH-TUPQ vs AHP-TUPQ in Table 19, Set A (50 sites).
Figure 19(b): Performance of MRH-TUPQ vs AHP-TUPQ in Table 19, Set B (100 sites).
Figure 19(c): Performance of MRH-TUPQ vs AHP-TUPQ in Table 19, Set C (200 sites).
Figure 19(d): Performance of MRH-FUS vs AHP-FUS in Table 19, Set A (50 sites).
Figure 19(e): Performance of MRH-FUS vs AHP-FUS in Table 19, Set B (100 sites).
Figure 19(f): Performance of MRH-FUS vs AHP-FUS in Table 19, Set C (200 sites).
Figure 19(g): Efficiency of MRH over AHP, TUPQ vs FUS, in Table 19, Set A (50 sites).
Figure 19(h): Efficiency of MRH over AHP, TUPQ vs FUS, in Table 19, Set B (100 sites).
Figure 19(i): Efficiency of MRH over AHP, TUPQ vs FUS, in Table 19, Set C (200 sites).
9. Conclusion
In this study, a new hybrid replica-selection approach was introduced for the data grid environment. The algorithm integrates the QoS attributes of time, site availability, security, and cost, together with user preferences, into the replica-selection decision-making process. The primary achievement of the proposed approach is that multiple requests in the queue are addressed simultaneously, which was clearly demonstrated and mathematically modelled. A GA was used owing to the complexity of the problem. FUS and the average UPQ are two new metrics proposed to measure the performance of the approach. The simulation experiments adopted and extended several modules of OptorSim. The robustness of the approach was investigated, and the experimental results were presented. The simulation results indicate that the new approach enhanced the performance of the grid environment by reducing both FUS and the average UPQ and by increasing efficiency by up to 25%.
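As a hedged illustration of how the GA fits into this setting, the sketch below encodes a candidate solution as one site index per queued request and seeds the population with the UPA assignment, as described in the results. The operator choices, parameter values, and the externally supplied fitness function (which would combine the average UPQ and FUS) are assumptions for illustration, not the authors' configuration.

import random

def run_ga(num_requests, num_sites, fitness, upa_solution,
           pop_size=30, generations=100, p_mut=0.05, seed=0):
    """Tiny GA over replica assignments: one gene per request, value = site index.
    `fitness` maps a chromosome to a score to be maximised."""
    rng = random.Random(seed)
    # Seed the population with the UPA assignment, then add random chromosomes.
    population = [list(upa_solution)] + [
        [rng.randrange(num_sites) for _ in range(num_requests)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_requests)           # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(num_requests):                  # per-gene mutation
                if rng.random() < p_mut:
                    child[i] = rng.randrange(num_sites)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)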
The slow computational speed of the GA is the main drawback of the proposed system. Therefore, in future work we will investigate methods of reducing the search space using artificial intelligence and machine learning techniques, as well as ways of increasing the computational speed of the GA for real-time applications. These methods will be simulated and prototyped in pilot environments to assess their performance and viability against a comprehensive set of criteria.
References
[1] S. Vazhkudai, S. Tuecke, and I. Foster, "Replica selection in the
globus data grid," in Cluster Computing and the Grid, 2001.
Proceedings. First IEEE/ACM International Symposium on,
2001, pp. 106-113: IEEE.
[2] A. Chervenak et al., "Giggle: a framework for constructing
scalable replica location services," in Supercomputing,
ACM/IEEE 2002 Conference, 2002, pp. 58-58: IEEE.
[3] S. B. Priya, M. Prakash, and K. Dhawan, "Fault tolerance-
genetic algorithm for grid task scheduling using check point," in
Grid and Cooperative Computing, 2007. GCC 2007. Sixth
International Conference on, 2007, pp. 676-680: IEEE.
[4] S. Venugopal and R. Buyya, "An SCP-based heuristic approach
for scheduling distributed data-intensive applications on global
grids," Journal of Parallel and Distributed Computing, vol. 68,
no. 4, pp. 471-487, 2008.
[5] R. M. Rahman, R. Alhajj, and K. Barker, "Replica selection
strategies in data grid," Journal of Parallel and Distributed
Computing, vol. 68, no. 12, pp. 1561-1574, 2008.
[6] A. Jaradat, R. Salleh, and A. Abid, "Imitating K-Means to
Enhance Data Selection," Journal of Applied Sciences, vol. 9,
no. 19, pp. 3569-3574, 2009.
[7] R. M. Almuttairi, R. Wankar, A. Negi, and C. Rao, "Smart
Replica Selection for Data Grids using Rough Set
Approximations (RSDG)," in International Conference on
Computational Intelligence and Communication Networks,
Bhopal 2010, pp. 466-471: IEEE.
[8] R. M. Almuttairi, R. Wankar, A. Negi, and C. Rao, "Replica
Selection in Data Grids Using Preconditioning of Decision
Attributes by K-means Clustering (K-RSDG)," in Second
Vaagdevi International Conference on Information Technology
for Real World Problems, VCON, 2010, vol. 1, pp. 18-23: IEEE.
[9] P. Wendell, J. W. Jiang, M. J. Freedman, and J. Rexford,
"Donar: decentralized server selection for cloud services," in
the ACM SIGCOMM 2010 conference New York, NY, USA,
2010, vol. 40, pp. 231-242: ACM.
[10] A. Jaradat, A. H. M. Amin, and M. N. Zakaria, "Balanced QoS
Replica Selection Strategy to Enhance Data Grid," presented at
the 2nd International Conference on Networking and
Information Technology Hong Kong, China, 2011.
[11] A. Jaradat, A. H. M. Amin, M. Zakaria, and K. J. Golden, "An
Enhanced Grid Performance Data Replica Selection Scheme
Satisfying User Preferences Quality of Service," European
Journal of Scientific Research, vol. 73, no. 4, pp. 527-538,
2012.
[12] A. Jaradat, A. Patel, M. N. Zakaria, and M. A. Amina,
"Accessibility algorithm based on site availability to enhance
replica selection in a data grid environment," Computer Science
and Information Systems, vol. 10, no. 1, pp. 105-132, 2013.
[13] O. Jadaan, W. Abdulal, M. A. Hameed, and A. Jabas,
"Enhancing Data Selection Using Genetic Algorithm," in
Second Vaagdevi International Conference on Information
Technology for Real World Problems, VCON, 2010, pp. 434-
439: IEEE.
[14] C. Hamdeni, T. Hamrouni, and F. B. Charrada, "Evaluation of
site availability exploitation towards performance optimization
in data grids," Cluster Computing, vol. 21, no. 4, pp. 1967-1980,
2018.
[15] H. H. E. Al-Mistarihi and C. H. Yong, "On fairness, optimizing
replica selection in data grids," IEEE transactions on parallel
and distributed systems, vol. 20, no. 8, pp. 1102-1111, 2008.
[16] T. L. Saaty, "Decision making with the analytic hierarchy
process," International Journal of Services Sciences, vol. 1, no.
1, pp. 83-98, 2008.
[17] W. Awang, M. Deris, O. F. Rana, M. Zarina, and A. Rose,
"Affinity Replica Selection in Distributed Systems," in
International Conference on Parallel Computing Technologies,
2019, pp. 385-399: Springer.
[18] G. Yang and Z. Liu, "Replica Selection Algorithm for
Streaming Media," in 2018 International Conference on
Mathematics, Modelling, Simulation and Algorithms (MMSA
2018), 2018: Atlantis Press.
[19] C. Li, J. Tang, and Y. Luo, "Scalable replica selection based on
node service capability for improving data access performance
in edge computing environment," The Journal of
Supercomputing, pp. 1-35, 2019.
[20] A. Abbasi, A. M. Rahmani, and E. Zeinali Khasraghi,
"Reliability and Availability Improvement in Economic Data
Grid Environment Based On Clustering Approach," Journal of
Advances in Computer Engineering and Technology, vol. 1, no.
4, pp. 1-14, 2015.
[21] B. Nazir, F. Ishaq, S. Shamshirband, and A. T. Chronopoulos,
"The Impact of the Implementation Cost of Replication in Data
Grid Job Scheduling," Mathematical and Computational
Applications, vol. 23, no. 2, p. 28, 2018.
[22] R. K. Grace and S. S. Kumar, "Replica Selection Using Random
and AHP Algorithms in Data Grid," International Journal on
Information Sciences & Computing, vol. 11, no. 1, 2017.
[23] S. Song, K. Hwang, and Y. K. Kwok, "Risk-resilient heuristics
and genetic algorithms for security-assured grid job
scheduling," IEEE Transactions on Computers, vol. 55, no. 6,
pp. 703-719, 2006.
[24] J. Carretero, F. Xhafa, and A. Abraham, "Genetic algorithm
based schedulers for grid computing systems," International
Journal of Innovative Computing, Information and Control,
vol. 3, no. 6, pp. 1-19, 2007.
[25] K. Z. Gkoutioudi and H. D. Karatza, "Multi-Criteria Job
Scheduling in Grid Using an Accelerated Genetic Algorithm,"
Journal of Grid Computing, vol. 10, no. 2, pp. 1-13, 2012.
[26] J. H. Holland, Adaptation in natural and artificial systems (no.
53). University of Michigan press, 1975.
[27] V. Vijayakumar and R. Banu, "Security for resource selection in
grid computing based on trust and reputation responsiveness,"
International Journal of Computer Science and Network
Security, vol. 8, no. 11, pp. 107-115, 2008.
[28] H. Sun and Y. Ding, "QoS scheduling of fuzzy strategy grid
workflow based on the bio-network," International Journal of
Computational Science and Engineering, vol. 6, no. 1, pp. 114-
121, 2011.
[29] D. Kolossa, B. U. Köhler, M. Conrath, and R. Orglmeister,
"Optimal Permutation Correction by Multiobjective Genetic
Algorithms," Proceedings of ICA, San Diego, CA, 2001.
[30] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Springer,
1996.
[31] C. M. Fonseca and P. J. Fleming, "An overview of evolutionary
algorithms in multiobjective optimization," Evolutionary
computation, vol. 3, no. 1, pp. 1-16, 1995.
[32] A. Rubinov and R. Gasimov, "Scalarization and nonlinear
scalar duality for vector optimization with preferences that are
not necessarily a pre-order relation," Journal of Global
Optimization, vol. 29, no. 4, pp. 455-477, 2004.
[33] D. H. Kim and K. W. Kang, "Design and implementation of
integrated information system for monitoring resources in grid
computing," in the 10th International Conference on Computer
Supported Cooperative Work in Design, Nanjing, 2006, pp. 1-6:
IEEE.
[34] S. Vazhkudai, S. Tuecke, and I. Foster, "Replica selection in the
globus data grid," in the first IEEE/ACM International
Symposium on Cluster Computing and the Grid, Brisbane, Qld,
2001, pp. 106-113: IEEE.
[35] R. Wolski, "Dynamically forecasting network performance
using the network weather service," Cluster Computing, vol. 1,
no. 1, pp. 119-132, 1998.
[36] S. Fitzgerald, I. Foster, C. Kesselman, G. Von Laszewski, W.
Smith, and S. Tuecke, "A directory service for configuring
high-performance distributed computations," in the 6th IEEE
Symposium on High Performance Distributed Computing,
Portland, Oregon, 1997, pp. 365-375: IEEE.
[37] E. Amaldi, A. Capone, and F. Malucelli, "Optimizing base
station siting in UMTS networks," in the 53rd Vehicular
Technology Conference, Rhodes, 2001, vol. 4, pp. 2828-2832
vol. 4: IEEE.
[38] S. Gaber, M. El-Sharkawi, and M. N. El-deen, "Traditional
genetic algorithm and random-weighted genetic algorithm with
GIS to plan radio network," URISA Journal, vol. 22, no. 1, pp.
205-222, 2010.
[39] T. A. El-Mihoub, A. A. Hopgood, L. Nolle, and A. Battersby,
"Hybrid genetic algorithms: A review," Engineering Letters,
vol. 13, no. 2, pp. 124-137, 2006.
[40] A. Sulistio, C. S. Yeo, and R. Buyya, "A taxonomy of
computer-based simulations and its mapping to parallel and
distributed systems simulation tools," Software: Practice and
Experience, vol. 34, no. 7, pp. 653-673, 2004.
[41] W. H. Bell, D. G. Cameron, A. P. Millar, L. Capozza, K.
Stockinger, and F. Zini, "Optorsim: A grid simulator for
studying dynamic data replication strategies," International
Journal of High Performance Computing Applications, vol. 17,
no. 4, pp. 403-416, 2003.
Authors' Biographies
AYMAN JARADAT received the B.Sc. degree from Yarmouk University, Jordan, in 1989, the M.Sc. degree from Universiti Sains Malaysia in 2007, and the Ph.D. degree from Universiti Teknologi PETRONAS in 2013. He was the Dean of the Faculty of Computer and Information Technology, Al-Madinah International University. He is currently an Assistant Professor with Al Majmaah University. He is specialized in computer science, and his research interests include high-performance computing, grid computing, cloud computing, genetic algorithms, and distributed algorithms and applications.
HITHAM ALHUSSIAN received the B.Sc. and M.Sc. degrees in computer science from the School of Mathematical Sciences, Khartoum University, Sudan, and the Ph.D. degree from Universiti Teknologi PETRONAS, Malaysia, where he is currently a Senior Lecturer with the Computer and Information Sciences Department and a core research member of the Centre for Research in Data Science (CERDAS). His main research interests are real-time parallel and distributed systems, cloud computing, big data mining, and machine learning.
AHMED PATEL received his M.Sc. and Ph.D. degrees in Computer Science from Trinity College Dublin (TCD), University of Dublin, in 1978 and 1984, respectively, specializing in the design, implementation, and performance analysis of packet-switched networks. He is a Research Professor at Universidade Estadual do Ceará, Fortaleza, Brazil, with key research interests in advanced computer networking, the Internet of Things, cloud computing, big data, predictive analysis, the use of advanced computing techniques, the impact of e-social networking, closing the digital-divide ICT gap, and ICT project management. He has published over 272 technical and scientific papers and co-authored three books, two on computer network security and the third on group communications. He co-edited one book on distributed search systems for the Internet and also co-edited and co-authored another book entitled "Securing Information & Communication Systems: Principles, Technologies & Applications". He is a member of the editorial advisory boards of international journals and has participated in Irish, Malaysian, and European funded research projects.
SULIMAN MOHAMED FATI obtained his B.Sc. (2002), M.Sc. (2009), and Ph.D. (2014) degrees from Ain Shams University, Egypt; Cairo University, Egypt; and Universiti Sains Malaysia (USM), Malaysia, respectively. He is currently an Assistant Professor in the College of Computer and Information Sciences, Prince Sultan University, Saudi Arabia. His research interests focus on the Internet of Things, machine learning, social media mining, cloud computing, cloud computing security, and information security. He has authored over 20 journal/conference papers, books, and book chapters. He is a member of several professional bodies, including IEEE, IACSIT, IAENG, and the Institute of Research Engineers and Doctors, USA, and serves as a reviewer for many international journals.