Multiple Users Replica Selection in Data Grids for Fair User Satisfaction:
A Hybrid Approach
Ayman Jaradata,*, Hitham Alhussianb, Ahmed Patelc, and Suliman Mohamed Fatid
aComputer Science Department, College of Science and Human Studies at Hotat Sudair, Al Majmaah University,
Hotat Sudair, Majmaah 11952 - SA
[ay.jaradat@mu.edu.sa]
bCenter for Research in Data Science (CREDAS), Institute of Autonomous Systems, Universiti Teknologi PETRONAS,
32610 Bandar Seri Iskandar, Perak, Malaysia
[seddig.alhussian@utp.edu.my]
c Computer Networks and Security Laboratory (LARCES), State University of Ceará (UECE), Fortaleza, Brazil
[whinchat@gmail.com]
d College of Computer and Information Sciences, Prince Sultan University
11586, Riyadh - SA
[smfati@yahoo.com]
*Corresponding author: Ayman K. Jaradat
Computer Science Department, College of Science and Human Studies at Hotat Sudair, Al Majmaah University,
Hotat Sudair, Majmaah 11952 - SA
[ay.jaradat@mu.edu.sa]
Abstract
Replica selection in data grids aims to select the best replica location based on the quality-of-service (QoS) parameters preferred by the
user. This choice is important because of the limited number of available data resources in comparison with the large number of
users. Typically, user requests are fulfilled in a first-in, first-out manner, which may satisfy the users at the beginning of the queue
more than those at the end. Better results can be achieved by considering the requests of all users simultaneously, thereby leading to
a higher level of overall satisfaction; however, this is a difficult task because the search space grows by orders of magnitude with the
number of users. Therefore, in this study, a hybrid of the genetic algorithm and the user-preference algorithm is proposed to
overcome this problem. The results verify that the proposed hybrid approach significantly outperforms previously used methods.
Keywords: Data grid, Replica Selection, Fair user satisfaction, Multi-objective decision, Quality-of-service (QoS) parameters,
Genetic Algorithm
1. Introduction
Data grids are special types of distributed systems that consist
of geographically discrete resources without any centralised
control. The capability of data grids to offer large-scale sharing
of internet or cloud-accessible data differentiates them from
traditional distributed computing. Applications that deal with
terabyte- and petabyte-scale data repositories are emerging as
critical resources for users. The data-grid infrastructure
potentially needs to support thousands of users, especially
those scientists who collaborate globally with a community of
users scattered across different geographical locations [1]. It is
evident that one virtual organisation alone may not be enough
to manage the massive volumes of data produced by
experiments and simulations. The exponential growth of data-
intensive applications has given rise to a new research field for
computer scientists and researchers to develop efficient
techniques and algorithms for scientific applications that
require an immense amount of data to be accessed, stored,
transferred, analysed, and replicated in geographically
distributed locations [2].
Replication and distribution of data among various grid sites
are required to address the need for increased data accessibility,
reliability, and availability. Replicated data requires a process
that involves the selection of replica locations from numerous
locations based on quality-of-service (QoS) parameters and
user preferences. This process is referred to as replica selection.
It is reported [3] that in data grids, jobs being processed
typically fail because of hardware malfunctions, infected
software, network failures, overloaded resources, or non-
availability of required resources. Furthermore, replicas of the
same dataset may have location-dependent access costs,
similar to tangible goods in actual economies [4]. Some users
attempt to optimise mapping to the required replica based on
the costs of their network operators or hosting services,
especially if a 95th-percentile billing mechanism is imposed
for the services [5]. Cost minimisation can be achieved by
decreasing the frequency of peak consumption or by not
exceeding the budgeted rates.
Replica-selection algorithms proposed in previous studies [5–
12] have attempted to satisfy user requests in a greedy manner
without considering the other users in the queue. This can lead
to the users at the beginning of the scheduling queue being
satisfied at the expense of those at the end of the queue. This
may lead to unfair or disproportionate user satisfaction. Fair
satisfaction, in the context of this study, implies an equal level
of QoS fulfilment for all the users in the queue. For example, if
a user pays $10 and receives services worth $9 translating to
90% satisfaction, then another user who pays $1,000 must also
receive services worth $900, thereby effectively obtaining 90%
satisfaction on the relative scale of satisfaction. In other words,
ensuring equality in terms of user satisfaction is essential to
meet fair QoS fulfilment requirements.
In this study, the replica-selection problem is introduced as a
multi-criterion decision-making problem. A global
optimisation algorithm is proposed to attain optimal efficiency
and fair satisfaction for all grid users. Specifically, the main
contributions of this study are to provide a detailed formulation
of the problem, the design of the solution, and the application
of a genetic algorithm (GA)-based solution.
The remainder of this paper is organised as follows: Section 2
describes the related work necessary to clarify the problem and
the development of approaches to replica selection. Section 3
presents the problem formulation and includes explanatory
examples. Section 4 focuses on the importance of GA as a
proposed solution. Section 5 describes the mathematical
modelling of the problem. Section 6 explains the system
design. Section 7 presents the performance metrics and
evaluation. Section 8 includes the results and discussion.
Finally, Section 9 presents the conclusions of the study.
2. Related Work
The problem of replica selection integrating QoS
parameters and transfer speeds has been investigated by many
researchers [6–8, 13]. A novel approach, called D-system [7],
has been previously proposed; it unifies three QoS parameters
has been previously proposed; it unifies three QoS parameters
(time, availability, and security) in the selection process. D-
system offers a method in which all the grid sites holding the
requisite replica are evaluated by unifying the three QoS
parameters into a single value. The time parameter refers to the
estimated time required to move the replica from the selected
site. The availability parameter is the ratio of the access time
permitted by the site that hosts the replica (declared in its local
policies, which define the times during which it serves others,
for example on weekends or after midnight) to the anticipated
time required to move the replica from that grid site. In a recent study [14],
availability was defined as a measure of the execution time and
bandwidth consumption of data replication to achieve
optimisation. The security parameter consists of the reputation,
capabilities, reliability, and self-protection of a site. In D-
system, the three parameters of a site are based on their values
before the replica location is selected. Each parameter is
assigned a value between 0 and 100. The next step is to use an
ideal site with a rating of 100 for each QoS parameter. Next, the
Euclidean distance between the potential and the ideal sites is
measured. The QoS metric is the distance between the ideal and
the candidate sites. This metric integrates time, availability,
and security and is abbreviated as TAS. The lowest value of
TAS characterises the best site.
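To make this rating concrete, the following minimal Python sketch (the function and site names are illustrative and not taken from [7]) computes the TAS metric of each candidate site as its Euclidean distance from the fixed ideal point (100, 100, 100) and picks the site with the lowest value:

```python
import math

def d_system_tas(time_rate, availability_rate, security_rate):
    """Rate a candidate site by its Euclidean distance (TAS) from the
    ideal site (100, 100, 100); a smaller TAS indicates a better site."""
    return math.sqrt((100 - time_rate) ** 2 +
                     (100 - availability_rate) ** 2 +
                     (100 - security_rate) ** 2)

# Illustrative candidate sites rated 0-100 for time, availability and security.
candidates = {"site_a": (80, 60, 70), "site_b": (90, 50, 65)}
best = min(candidates, key=lambda name: d_system_tas(*candidates[name]))
print(best, round(d_system_tas(*candidates[best]), 2))
```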
D-system adopts a fixed-value ideal node that holds the values
(100, 100, 100) for time, availability, and security as a base for
rating the performances of the sites. However, it has some
limitations that are best illustrated in Figure 1 (which uses two
attributes rather than three) and in Figure 2 (which depicts the
use of three attributes).
Figure 1: D-system Utilizing 2 Attributes
Figure 2: D-system Utilizing 3 Attributes
In Figure 1, B characterises a site rated 50% for time and 50%
for security. Nodes M, B, L, and N are equidistant from the
ideal value and are given the same rate by D-system. Thus, one
of them is chosen randomly. The main limitation of D-system
is that one of the sites that fall on the arc of the quarter
circle is chosen randomly. Moreover, D-system does not
consider site E even if the priority of the user or the application
is time. In this case, D-system should not consider availability
and security; however, it still calculates the rating based on all
three parameters. Similarly, this limitation affects the rating
of site F if the primary concern of the user is security. A similar
scenario is observed in Figure 2, where all the nodes (B, L, M,
and R) fall on the surface of a one-eighth sphere and are rated
the same by D-system. Furthermore, nodes that are vital for
some users, such as E, K, and S, are overlooked. The
limitations of D-system have inspired researchers [7] to
propose a smart replica selection in data grids (RSDG) strategy
to address such limitations. The authors argued that the
limitations of D-system lie in the use of exact numerical
attributes, and they discussed whether it would be better to use
linguistic attributes expressed as grey numbers. Grey numbers
are typically used instead of exact numbers to prevent different
replicas from having identical attribute values.
In their research, the authors used manually derived decision
attributes to formulate the decision table according to prior
observations of the performances of the data grid sites. In
another research [8], the authors improved the RSDG strategy
to allow it to function even in the absence of decision attributes
(historical replica information). However, they were not
concerned with addressing all the requests in the queue to
achieve fair user satisfaction for all the users. Furthermore, in
[15], the authors proposed a replica-selection strategy that
integrates three QoS parameters: time, security, and reliability,
in which a multi-criteria decision-maker is used to select the
best replica. The main objective of this system is to achieve fair
resource allocation amongst users, based on the above-
mentioned QoS parameters, to satisfy them. This approach has
been implemented based on a multiple-criteria decision maker
known as the analytical hierarchy process (AHP) [16] that
relies entirely on the historical average of the QoS attained by
users. For example, if history indicates that security is the QoS
parameter of which a user has so far attained the least, then the
best replica for that user is the one that has the highest security
rating. The first drawback of the system is that it targets a
uniform fairness based on historical information, and it does
not consider user preferences. It aims to deliver the same
portions of QoS to all the users, irrespective of what they desire
or pay for. Second, it considers the average for the users,
irrespective of the number of requests already made (the user
who requested 100 replicas is treated similarly to the one who
requested 1,000 replicas). Third, it performs the selection one
by one, which may result in better resource allocation for users
at the beginning of the queue than for users at the end of the
queue. Fourth, it
does not address site availability, and it is built under the
assumption that users are competing for limited data resources.
Moreover, using this approach, researchers have defined
security as an additional level to protect the data files (which
are typically already secured by the grid toolkits) from
unauthorised users. In fact, relying on the history of events may
prevent the system from selecting the best site because the
decision maker aims at balancing the QoS rather than
improving all the parameters.
The authors of [17] proposed a replica-selection process that
focused on grouping popularity and affinity files as the most
important parameters in selecting replicas to improve data
availability. However, this method relies on only one user’s
decision. Further, the authors of [18] proposed a new replica-
selection algorithm based on the ant-colony model, in which
the replica choice metrics (replica node network bandwidth,
network delay, etc.) are drawn by the ant-colony pheromone.
Their technique first optimises the ant-colony algorithm using
an initialisation formula to make each copy have its own initial
prime value. Next, by prioritising replica nodes on P2P, it
reduces the use of cloud replica nodes and the cost of service
providers. However, the ant-colony algorithm is a very slow
process because it goes through many repetitions to achieve the
best results. Moreover, the decisions are only based on one user
each time, resulting in unfair treatment of users. Another recent
study [19] combines replica creation with replica selection, in
which a dynamic replica-creation algorithm is based on hot
(high demand for access) files, and the replica-selection
algorithm is based on site service capability. The replica-
creation strategy is based on access heat that considers hot data
and guarantees that the demand for hot replicas in the next
cycle is met. In this replica-selection algorithm, the service
capability of the hosting site is considered; therefore, the load
capacity of the replica site and network distance between the
replica site and user are adequately addressed. This is because
the data transferred between the user and the site in terms of
network bandwidth usage has a significant impact on the
reading speed of the user. However, this strategy only focuses
on bandwidth as the central aspect while ignoring other
important factors such as security, cost, availability, and fair
treatment of users.
The authors of [10] enhanced D-system by choosing the site
that has almost equal portions of each QoS parameter as a
balanced solution without addressing all the requests in the
queue to ensure good satisfaction for each user. Next, they
enhanced their work by proposing the user-preference
algorithm (UPA) [11], in which user preferences are
considered in the selection process, and the stock market model
is adopted to achieve equality among users. The user’s
preferred QoS (UPQ) was used as a performance metric, where
UPQ is the Euclidean distance between the desired and the
granted quality rates. For example, if the user orders a 90%
security rate and a 70% security rate is assigned; the distance
would be 20 UPQ. The smallest value of UPQ corresponds to
the best quality. Nevertheless, this approach individually
addresses each request without considering all the requests in
the queue to achieve good satisfaction for each user.
Furthermore, the authors of [20] proposed the use of dynamic
reliability and availability calculation to select the best replica.
Herein, reliability refers to how long the system has been
continuously active, and availability refers to the percentage of
time that the system is in operational mode. They argued that,
although reliability and availability are different, there is a
relationship between them. Reliability measures the time
between failures occurring in the system. Reliability and
availability are calculated and used to make the selection
decision for each request, without considering all the requests
in the queue to achieve good satisfaction for each user. Further,
the authors of [21] proposed centralised dynamic scheduling
strategy-replica placement strategies (CDSS-RPS) to minimise
the implementation cost and data-transfer time. CDSS-RPS
attempts to enhance the replica access time using replication,
based on a number of factors including the number of accesses,
storage capacity of the node, and response time of a nominated
replica. Similarly, this approach does not consider the other
requests in the queue. The authors of [22] proposed a replica-
selection algorithm based on a three-criteria set, in which the
criteria were reliability, security, and response time. The best
replica was the one that produced the highest quality in the
minimum response time. They used a random process and
AHP, but their approach considers the replica requests one by
one based on requests that were initiated first.
As the existing literature demonstrates, no previous research
has addressed all the requests in the queue simultaneously to
achieve good satisfaction for each user. Furthermore, even
though the GA has been used in grid-job scheduling [23–25],
none of the previous researchers used it specifically in replica
selection. This research addresses the problem of replica
selection that involves making an important decision. The
solution focuses on the fair satisfaction of grid users to deliver
their required replicas with certain levels of security, time
expenditure, availability, and cost that are based on their
individual preferences.
In this study, a new multi-user genetic-based replica-selection
algorithm for data grids is proposed. The GA is a meta-
heuristic search method that enables huge solution spaces to be
heuristically searched using evolutionary methods observed in
nature [26]. Four QoS parameters are adopted in this study:
time (T), availability (A), security (S), and cost (C). Each of
these is rated on a scale of 100, where higher values correlate to
higher quality. The four values for each grid site are integrated
into one value. The process of defining and calculating these
QoS values was addressed in previous studies [11, 12, 27].
3. Problem Formulation
The proposed model involves pairing users with replica
locations in which each user must be assigned to only one
location. Each location has its own specifications, and each
user also has specific preferences. The assignment must be as
fair as possible in satisfying all users in a single batch
(scheduling cycle). For example, suppose there are four sites
with different QoS rates and four users with different QoS
requirements. In this study, one QoS parameter, namely replica
transfer speed, was used. If the transfer speed for the best site is
20 Gbps, then it will be rated as 100% available, and another
site with a 15 Gbps transfer speed will consequently be rated as
$\frac{15}{20} \times 100 = 75\%$ available. However, the availability of
the best site can reduce over time owing to many connections
occurring simultaneously, and this can result in another site
with a lower rate becoming the new best site.
As shown in Table 1, the first user requested a transfer speed
rated at 80%. Meanwhile, the best site at that time was rated at
80%. In this case, the user will be 100% satisfied because the
results are identical ($\frac{80}{80} \times 100 = 100\%$). However, this
pairing will degrade the rating of site 1 because some of the
bandwidth and slices of the storage system will be used.
Subsequently, the second user requested a 90% rated site but
was assigned a 70% rated site ($\frac{70}{90} \times 100 \approx 77\%$); thus, the
level of satisfaction was 77%. Requesters 3 and 4 experienced
similar scenarios.
Table 1: Pairing Requests to Sites in an Arbitrary Order.

                        Site 1    Site 2    Site 3    Site 4
Site's Rate             80        70        65        50
Request Number          1         2         3         4
User's Requested Rate   80        90        60        70
User's Satisfaction     100%      77%       100%      71%
The above example demonstrates that there is an inefficient
allocation of resources, despite the relatively high level of
satisfaction. The satisfaction average is (100 + 77 + 100 + 71) /
4 = 87%. The standard deviation (SD) of such an allocation is
0.13, which is relatively unfair because it means that the user
satisfaction is highly variable. However, reordering the
requests in the queue to select the best replica can increase both
satisfaction level and QoS fairness.
In Table 2, the requests are reordered before the matching
process begins. If we start with a 90% requester and the best
available site is rated at 80%, the resulting satisfaction will be
88.87%. Consequently, requester 2, who has a target rate of
80%, will be granted the site rated 70%, resulting in 87.5%
satisfaction. The results of the third and fourth requests are
illustrated in Table 2.
Table 2: Pairing Requests to Sites in a Managed Order.

                         Site 1    Site 2    Site 3    Site 4
Site's Rate              80        70        65        50
Request Number           1         2         3         4
Requested Rate           90        80        70        60
Requester Satisfaction   88.87%    87.5%     92.85%    83.33%
Reordering the selection slightly increases the average
satisfaction level, (88.87 + 87.5 + 92.85 + 83.33) / 4 = 88.14%,
and significantly reduces the SD to 0.028. This indicates that
fairness has improved, because there is only a slight variation
in user satisfaction. This example
demonstrates that reordering the requests to make a selection
can yield a more equitable satisfaction for all users. Applying
this technique to one parameter is not difficult, but
simultaneously considering four parameters (time, availability,
security, and cost) is a very challenging task. For example, if
there are three sites and three requests, there are six different
solutions. The mathematical problem becomes one of
permutations without repetition, i.e., $\frac{n!}{(n-r)!}$. However,
each solution yields different total values and SDs of the UPQs.
Table 3 presents the result of each permutation.
Table 3: Different Pairing Requests to Sites.

Site's rate (T, A, S, C):        Site 1: 70, 60, 40, 60   Site 2: 65, 70, 60, 55   Site 3: 55, 65, 60, 40
User preferences (T, A, S, C):   User 1: 70, 70, 60, 60   User 2: 75, 75, 80, 60   User 3: 80, 60, 60, 70

Order        Site 1    Site 2    Site 3    UPQs          Total UPQs   SD     Total UPQs + SD
1st order    User 1    User 2    User 3    22, 23, 39    84           7.8    91.8
2nd order    User 1    User 3    User 2    22, 23, 36    81           6.4    87.4
3rd order    User 2    User 1    User 3    43, 7, 39     89           16.1   105.1
4th order    User 2    User 3    User 1    43, 23, 25    91           9      100
5th order    User 3    User 1    User 2    24, 7, 36     67           11.9   78.9
6th order    User 3    User 2    User 1    24, 23, 25    72           0.8    72.8
For example, site 1 is rated 70% for time, 60% for availability,
40% for security, and 60% for cost, and user 1 requested 70%
for time, 70% for availability, 60% for security, and 60% for
cost. The Euclidean distance between site 1 and user 1’s
request is calculated as follows:
$$\sqrt{(70-70)^2 + (60-70)^2 + (40-60)^2 + (60-60)^2} = \sqrt{500} \approx 22\ \text{UPQ}$$
Similarly, the Euclidean distance between site 2 and user 2’s
request is 23 UPQ, and the distance between site 3 and user 3’s
request is 39 UPQ. The total of all the UPQs is 22 + 23 + 39 =
84. A smaller UPQ indicates better quality and higher user
satisfaction. Pairing the sites with the users in a different order
(permutation) leads to a lower total UPQ (TUPQ) value. As
evident in row 5 of Table 3, when user 3 is paired with site 1,
user 1 with site 2, and user 2 with site 3, the TUPQ value is 67,
which is significantly lower than those for other combinations,
making this the best combination in terms of UPQ; however, in
terms of fairness, we can see that the UPQs are 24, 7, and 36 for
users 1, 2, and 3, respectively, which indicates that there are
significant differences in user satisfaction levels. The SD of
these three values (24, 7, and 36) is 11.9, which is very high. In
contrast, row 6 has the second-lowest TUPQ value of 72, with
individual UPQ values of 24, 23, and 25. These values (24, 23,
and 25) are more similar to each other, and this provides more
fairness for the users. Even though 67 is lower than 72 and
therefore better, the individual UPQs that make up a total of 72
exhibit less variation in user satisfaction, with an SD of only
0.8, making 72 the better option.
This example simplifies the problem. However, in real-life
scenarios, there are thousands of sites, and only ten are selected
each time, creating a very large search space. For example, to
select 10 out of 1,000 sites, the total number of permutations is
$8.26 \times 10^{59}$, and this requires the use of a robust algorithm.
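The small example above can be reproduced with a short brute-force sketch (illustrative Python; individual UPQs are rounded before summing, as in Table 3):

```python
import itertools
import math
import statistics

sites = {                       # T, A, S, C rates of the three sites in Table 3
    "site1": (70, 60, 40, 60),
    "site2": (65, 70, 60, 55),
    "site3": (55, 65, 60, 40),
}
users = {                       # preferred T, A, S, C of the three users in Table 3
    "user1": (70, 70, 60, 60),
    "user2": (75, 75, 80, 60),
    "user3": (80, 60, 60, 70),
}

def upq(user, site):
    """Euclidean distance between a user's preferences and a site's rates."""
    return math.sqrt(sum((u - s) ** 2 for u, s in zip(user, site)))

# Enumerate every one-to-one pairing of users to sites (n!/(n-r)! orders).
for order in itertools.permutations(users):
    upqs = [round(upq(users[u], sites[s])) for u, s in zip(order, sites)]
    total, sd = sum(upqs), statistics.pstdev(upqs)
    print(order, total, round(sd, 1), round(total + sd, 1))
```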
4. Permutation and Genetic Algorithm
Evolutionary algorithms are typically used in optimisation
problems. For example, the authors of [28] used a GA to
facilitate QoS-based selection. The GA is a stochastic
optimisation method that is very useful for permutation
problems, as reported in [29], owing to the following advantages:
GAs are not prone to being stuck in local optima if the
attributes are prepared appropriately. This is a well-
known advantage of stochastic optimisation that is
particularly useful for permutation problems, as all the
criteria have strong local minima.
GAs demonstrate faster convergence in comparison to
other stochastic optimisation algorithms, specifically for
problems with a high-dimensional space [30].
GAs are very suitable for discrete optimisation because
they naturally use binary series to represent the solution.
GAs are efficient for multi-objective optimisation
because they enable the entire multi-objective optimum
solution set to be evolved in parallel [31].
5. Mathematical Modelling
Let $Z = \{g_1, g_2, g_3, \ldots, g_n\}$ be a set of data grid sites and $U = \{u_1, u_2, u_3, \ldots, u_m\}$ a set of users such that $m \le n$. Each grid site $g_j$ and each user $u_i$ has four parameters (T, A, S, and C). Each parameter value is between 0 and 100. For a grid site, each parameter value represents its rate based on its performance, and for a user, it represents that user's preference.
Each user $u_i$ is assigned to one grid site $g_j$, and the assignment is denoted by equation (1):

$$R_{ij} \in \{0, 1\}, \quad i = 1, 2, \ldots, m, \quad j = 1, 2, \ldots, n \qquad (1)$$

To guarantee that each user $u_i$ is assigned to only one grid site $g_j$, equation (2) must be satisfied:

$$\sum_{j=1}^{n} R_{ij} = 1, \quad i = 1, 2, \ldots, m \qquad (2)$$
5.1 Objective Function
The satisfaction model adopted in this research is measured by the Euclidean distance between user $u_i$ and grid site $g_j$, denoted by equation (3):

$$d(u_i, g_j) = \sqrt{(T_{u_i} - T_{g_j})^2 + (A_{u_i} - A_{g_j})^2 + (S_{u_i} - S_{g_j})^2 + (C_{u_i} - C_{g_j})^2} \qquad (3)$$
The shortest distance is considered to be the best. Hence, the
objective functions are created by maximising the levels of
preference of each user in the required grid site. Satisfaction
maximisation is achieved by minimising the Euclidean
distance between each user i and its assigned grid site j.
However, minimising one user’s distance (increasing
satisfaction) may lead to increasing others’ distances
(decreasing satisfaction). Therefore, the solution must be a
trade-off between the satisfaction levels of users.
Now, let us assume that equation (4) is a general vector
representing all decision variables as follows:
$$\sum_{i=1}^{m} d(u_i, g_j), \quad j \in \{1, 2, 3, \ldots, n\} \qquad (4)$$
Then, the following objectives are necessary:
1. Minimise the total Euclidean distances between the users
and the grid sites as denoted by equation (5):
$$\mathrm{Min}\left[\sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right] \qquad (5)$$
2. Minimise the SD between user distances as denoted by
equation (6). This implies that the users shall obtain
similar distances, which guarantees similar satisfaction
levels of users:
$$\mathrm{Min}\left[\mathrm{SD}\left[\sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right]\right] \qquad (6)$$
Therefore, the objective of this study is to minimise the total
distances among the user preferences and grid site rates while
minimising the SD between user distances. This means that the
solution is using multi-objective decision making with two
parameters.
5.2 Scalarisation
Scalarisation refers to the consolidation of various objectives
into one in a manner that repetitively sorts out the single-
objective optimisation problem with different parameters. This
can allow the researchers to obtain all optimal solutions for the
preliminary multi-objective problem. Many scalarisation
methods have been established [32]. To simplify the multi-user
approach of this study, scalarisation was adopted as the best
solution, and it is denoted by equation (7):
$$\mathrm{Min}\left[\alpha\, \mathrm{SD}\left[\sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right] + \beta \sum_{i=1}^{m} \sum_{j=1}^{n} R_{ij}\, d(u_i, g_j)\right] \qquad (7)$$

where $\alpha$ and $\beta$ are used by the data grid administrator to scale up (or down) the SD term or the total distance term, based on experienced observations and their effects.
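As an illustration, the scalarised objective of equation (7) can be expressed as a fitness routine roughly as follows (a minimal Python sketch; it interprets the SD term as the standard deviation of the per-user distances, as described later in Section 6.1, and `assignment[i]` holds the index of the site assigned to user i):

```python
import math
import statistics

def upq(user, site):
    """Euclidean distance between a user's preferred QoS values (T, A, S, C)
    and the QoS rates of the site assigned to that user (equation (3))."""
    return math.sqrt(sum((u - s) ** 2 for u, s in zip(user, site)))

def fitness(assignment, users, sites, alpha=1.0, beta=1.0):
    """Scalarised objective of equation (7): alpha times the SD of the
    per-user distances plus beta times their total (smaller is fitter)."""
    distances = [upq(users[i], sites[j]) for i, j in enumerate(assignment)]
    return alpha * statistics.pstdev(distances) + beta * sum(distances)
```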
6. System Design
The architecture of the data-grid service is divided into two
levels. The upper level includes high-level services that use
low-level or core services. The replica-selection–optimisation
technique is a high-level service; thus, it invokes a number of
core services. Information of an individual resource, or a set of
resources, is collected and maintained by a grid-resource
information service (GRIS) daemon [33]. GRIS is designed to
gather and announce system-configuration metadata describing
that storage system. For example, each storage resource in the
Globus data grid [34] incorporates a GRIS to circulate its
information. Typically, GRIS provides information of
attributes such as storage capacity, seek times, and descriptions
of site-specific policies governing storage-system usage. Some
attributes are dynamic and vary with several frequencies, such
as total space, available space, queue waiting time, and mount
point. Others, such as disk transfer rate, are static.
The new approach, i.e., multiple-user replica-selection hybrid
approach (MRH) is illustrated in Figure 3. It functions by
receiving user requests via the grid resource broker (RB). Then,
the RB retrieves related physical file names and locations from
the replica location service (RLS). Subsequently, the algorithm
receives information about the sites that hold the replicas and
their network status from grid information services such as the
network weather service (NWS) [35], the meta-computing
directory service [36], and the grid file transfer protocol [36].
Further, the algorithm receives
security ratings for each replica location from the grid manager
and receives availability and cost information from the log files
of each replica. Next, each replica location is rated, and the new
algorithm matches the user requests with the best replica
location in a manner that ensures fair user satisfaction.
Figure 3: Data Access Overview in MRH
6.1 Genetic Algorithm
The GA is a meta-heuristic search method that enables huge
solution spaces to be heuristically searched using evolutionary
methods observed in nature. It is based on iterations and a fitness
function. During each iteration cycle, the fitness value of each
individual in the population is evaluated systematically. Next,
the reproduction operations and selections are implemented to
produce a new population that is used again in the following
iteration of the GA. Reproduction consists of crossover and
mutation operations. The entire procedure is reiterated a
number of times, and each iteration is referred to as a
generation.
The GA comprises two main components: the encoding schema
and the evaluation function (also known as the fitness function).
In this research, the chromosome was represented as a vector
of integers. The position of each element indicates the user
who is requesting a replica, whereas the value of each element
specifies the ID of the site that holds the replica assigned to that user.
For example, the tenth element (entitled gene) of the
chromosome presented in Figure 4 denotes the tenth user in the
queue assigned to site 6. To measure the value or the quality of
a solution, the fitness function was implemented. Every
chromosome is associated with a fitness value. In this study, we
used a multi-criteria fitness function, as denoted in Eq. 7. The
Euclidean distance between the site QoS specifications desired
by the user and the assigned site QoS specifications is the first
criterion that must be minimised. The second criterion is the
SD of Euclidean distances between users. Both criteria were
assumed to be equally weighted = β = 1).
Figure 4: GA chromosome encoding. The chromosome is an integer vector indexed by the user's position in the queue (1, 2, 3, …, n); each gene holds the ID of the site assigned to that user (e.g., gene 10 holds the value 6, meaning the tenth user is assigned to site 6).
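A minimal illustration of this encoding (gene values partly taken from Figure 4 and otherwise illustrative):

```python
# One chromosome of length n = 10: the value at index i is the ID of the
# site assigned to the (i + 1)-th user in the queue.
chromosome = [8, 1, 5, 12, 4, 2, 3, 9, 7, 6]   # partly illustrative values

user_position = 10                      # the tenth user in the queue ...
site_id = chromosome[user_position - 1]
print(f"user {user_position} -> site {site_id}")   # ... is assigned to site 6
```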
6.1.1 Selection Operator
The selection operator, as explained in this section, plays a
significant role in exploiting the benefits of the GA. The
selection operator determines the individuals that will be
reproduced in the next generation while aiming to disregard or
replace poor solutions; new offspring are produced with a
predefined probability (the standard value is 70% [37, 38]), as
depicted in Figure 5a. This study used only the simplest form of
crossover, where the crossover point was always located in the
middle of the chromosome. In future research, different
crossover points shall be examined.
6.1.2 Mutation Operator
The mutation operator randomly changes the integer (site
number) of the chromosome. This process is conducted with a
very small probability (e.g., 0.05% [37, 38]), as shown in
Figure 5b, to maintain diversity in the chromosome population
and also to overcome local optima in the search space. The
GA terminates when either a predefined number of generations
has been produced or the fitness of the individuals in the population
converges. The output of the GA is the fittest chromosome in
all the populations that were produced.
Figure 5: a) One point crossover, b) Mutation
6.1.3 Repair Operator
In highly constrained optimisation problems, the crossover and
mutation operators typically produce invalid or infeasible
solutions that waste time. This problem can be solved by
incorporating problem-specific knowledge to either prevent the
genetic operators from producing infeasible solutions or to
repair these solutions when they occur [39]. Replica selection
can constrain the produced solutions to guide the GA during
the search process. In this research, the repair mechanism
began by checking each chromosome in an attempt to search
for any duplication. Next, the located duplications (if there are
any) were removed via the repair mechanism by assigning the
gene that has a duplicated site number with a new randomly
selected site number that is not already included in the
chromosome. Mutation generates sites randomly, which means
it is also prone to site duplication. Therefore, a repair operation
is required after each mutation.
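The three reproduction-related operators can be sketched as follows (illustrative Python, using the probabilities quoted above; `site_ids` is the list of candidate site IDs):

```python
import random

def crossover(parent_a, parent_b):
    """One-point crossover with the cut fixed at the middle of the
    chromosome, as used in this study (Section 6.1.1)."""
    cut = len(parent_a) // 2
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])

def mutate(chromosome, site_ids, rate=0.0005):
    """With a very small probability (0.05%), replace a gene with a
    randomly chosen site ID (Section 6.1.2)."""
    return [random.choice(site_ids) if random.random() < rate else gene
            for gene in chromosome]

def repair(chromosome, site_ids):
    """Replace duplicated site IDs with randomly chosen unused sites so
    that every user is assigned a distinct site (Section 6.1.3)."""
    unused = [s for s in site_ids if s not in set(chromosome)]
    random.shuffle(unused)
    seen, repaired = set(), []
    for gene in chromosome:
        if gene in seen:                 # duplicate: swap in an unused site
            gene = unused.pop()
        seen.add(gene)
        repaired.append(gene)
    return repaired
```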
6.2 Hybridising UPA with GA
Although the GA is reputed to be slow, it has been used in real-
time applications like scheduling in grid computing [25]. The
key is to merge a greedy algorithm with GA. The role of the
greedy algorithm is to fill up the initial population to decrease
the convergence time. Similarly, in this research, several runs
of the UPA were carried out to fill up the initial population, and
the results in terms of time convergence were promising when
compared to the results obtained using only the GA.
The exact sequence of steps in the proposed algorithm is as
follows:
1. The requested replicas and the QoS preferred by the users are collected.
2. The replicas' physical file names and locations are collected from the grid services; see [12] for details.
3. Site information is collected and rated; see [11, 12] for details.
4. The UPA is used to generate the initial population through several runs.
5. The GA is used to pair each user request with the preferred replica location in a manner that achieves fair user satisfaction.
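A condensed sketch of this hybrid loop is shown below (illustrative Python; it reuses `upq` and `fitness` from the Section 5 sketch and `crossover`, `mutate`, and `repair` from the Section 6.1 sketches, and the greedy seeding routine is only a simplified stand-in for the full UPA of [11]):

```python
import random

def greedy_seed(users, sites):
    """Stand-in for one UPA run: serve requests in queue order, greedily
    giving each user the closest still-unassigned site (illustrative only)."""
    remaining = set(range(len(sites)))
    assignment = []
    for user in users:
        best = min(remaining, key=lambda j: upq(user, sites[j]))
        remaining.remove(best)
        assignment.append(best)
    return assignment

def mrh(users, sites, generations=200, pop_size=30):
    """Hybrid loop: seed the population with greedy runs over shuffled
    queues, then evolve with crossover, mutation and repair."""
    site_ids = list(range(len(sites)))
    population = []
    for _ in range(pop_size):
        order = list(range(len(users)))
        random.shuffle(order)                 # a different queue order per run
        greedy = greedy_seed([users[i] for i in order], sites)
        seeded = [0] * len(users)
        for position, user_index in enumerate(order):
            seeded[user_index] = greedy[position]
        population.append(seeded)
    for _ in range(generations):
        population.sort(key=lambda c: fitness(c, users, sites))
        parents = population[: pop_size // 2]  # keep the fitter half
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child, _ = crossover(a, b)
            children.append(repair(mutate(child, site_ids), site_ids))
        population = parents + children
    return min(population, key=lambda c: fitness(c, users, sites))
```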
7. Performance Metrics and Evaluation
The performance of MRH was evaluated by means of
calculating, analysing, and comparing its outputs with those of
other algorithms. Thus, two new metrics, known as average user
satisfaction and fair user satisfaction, were proposed to
evaluate and analyse the performance of the new algorithm.
7.1 Total User Satisfaction
The user-satisfaction criterion is highly important. In the
proposed model, user satisfaction was determined by the
distance between the QoS preferred by the user and the actual
QoS already assigned to that user. As a result, the metric used
to measure MRH is UPQ. The smaller the value of UPQ, the
better is the MRH performance. The UPQ level for any user is
calculated as denoted by equation (8):
$$UPQ = \sqrt{(T_{u_i} - T_{g_j})^2 + (A_{u_i} - A_{g_j})^2 + (S_{u_i} - S_{g_j})^2 + (C_{u_i} - C_{g_j})^2} \qquad (8)$$

The TUPQ for all users is given by equation (9):

$$TUPQ = \sum_{i=1}^{n} UPQ_i \qquad (9)$$

The average UPQ for the users is another metric, given by equation (10):

$$\overline{UPQ} = \frac{1}{n} \sum_{i=1}^{n} \sqrt{(T_{u_i} - T_{g_j})^2 + (A_{u_i} - A_{g_j})^2 + (S_{u_i} - S_{g_j})^2 + (C_{u_i} - C_{g_j})^2} \qquad (10)$$
7.2 Fair User Satisfaction:
The fair user satisfaction (FUS) metric measures the different
levels of QoS distributed to the users. The UPQs of the users
must be as fair as possible. Discrepancies in the UPQs of the
users must be reduced in an attempt to be fair to each user. As
UPQ is the proposed criterion, the SD metric is the best one to
be used to calculate the fairness level. FUS is calculated as
denoted by equation (11):
$$FUS = \mathrm{SD}(UPQ_1, UPQ_2, \ldots, UPQ_n) \qquad (11)$$
A smaller value of FUS indicates a better MRH performance in
terms of FUS.
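For clarity, the two metrics can be computed as follows (a small Python sketch; the population standard deviation is used, matching the SD values in Table 3):

```python
import statistics

def tupq(upqs):
    """Equation (9): total of the per-user UPQ values in a batch."""
    return sum(upqs)

def fus(upqs):
    """Equation (11): standard deviation of the per-user UPQ values;
    a smaller FUS means the batch was served more fairly."""
    return statistics.pstdev(upqs)

# Row 6 of Table 3: individual UPQs of 24, 23 and 25 give TUPQ 72 and FUS 0.8.
print(tupq([24, 23, 25]), round(fus([24, 23, 25]), 1))
```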
7.3 Evaluation
To evaluate the performance of the replica-selection decision-
making process, a simulation tool is required for evaluating
system trade-offs. Based on this, a search of distributed and
parallel system simulators was conducted. The particular simulation
tools used in this context are those in line with the grid
specifications [40], such as SimGrid, OptorSim, ChicSim,
Bricks, MicroGrid, GridSim, and Monarc. However, none of
the mentioned simulators support multi-request replica
selection, fairness concept, or QoS parameters. In this respect,
OptorSim is the most suitable simulator owing to its ability to
simulate data-replication strategies [41]. Consequently,
OptorSim was used as a base to build our own simulator.
Because no previous study on replica selection has
considered the case of multiple users, and because UPA [11],
the enhanced version of D-system, is the most similar to the system
used in this research, two main differences between MRH and
UPA were identified. First, UPA does not integrate cost.
Second, it pairs users to their preferred sites one by one with no
global considerations.
8. Results and Discussion
Experiments discussed in this section were based on two cases.
The first case compared the performance of the MRH system to
those of UPA and AHP. The second case examined the
scalability of the MRH system. The simulation setup is
presented in Table 4.
Table 4: Experiment Setup Parameters

Number of Users                          10–70
Number of sites that hold the replica    20–200
Population size                          Number of Users
Offspring Producing Probability          70%
Mutation Probability                     0.05%
Crossover                                Uniform
Number of Generations                    Number of Users × Number of sites
8.1 Case (1): Fair user satisfaction and total UPQ
In the first step for both MRH and UPA simulations, 20 grid
sites were assumed, and each site had 4 QoS parameters with
values between 0 and 100. These values were generated
randomly as given in Table 5 and graphically shown in Figure
6.
Figure 6: Sites with their QoS Parameter Values
Table 5: Sites with their QoS Parameter Values

Site Number   T    A    S    C      Site Number   T    A    S    C
1             55   87   95   86     11            54   71   56   96
2             68   95   65   83     12            54   76   76   69
3             88   93   92   90     13            50   51   81   100
4             57   61   77   62     14            70   55   54   99
5             63   88   61   72     15            90   95   96   93
6             81   76   69   66     16            50   92   65   52
7             85   72   90   83     17            51   84   77   50
8             90   53   82   53     18            69   66   60   61
9             55   76   62   59     19            63   51   82   72
10            81   82   72   53     20            59   82   61   95
It was also assumed that 10 users would independently request
one replica each, according to their preferred QoS levels for
each parameter. These values were randomly generated, as
shown in Table 6.
Table 6: Users with their Preferred QoS Parameter Values

User Number   T    A    S    C
1             75   79   68   88
2             71   77   62   95
3             77   89   78   87
4             65   85   93   71
5             73   91   75   90
6             88   84   99   92
7             89   73   77   75
8             89   91   92   100
9             86   73   97   86
10            98   86   95   89
The second step was to run the simulation process using all the
systems. AHP and UPA implement the selections for the users
depending on their positions in the scheduling queue,
beginning from the first and continuing to the last. FUSs and
TUPQs were computed using the MRH system, AHP, and
UPA, as shown in Tables 7 and 8. Figure 7 illustrates the
content of Tables 7 and 8 in terms of TUPQ and Figure 8
illustrates the content of Tables 7 and 8 in terms of FUS.
The same experiment was repeated 10 times with the same data
shown in Table 6 but with different request orders. The results of
these experiments are presented in Figures 7–12, which
demonstrate that the efficiency of the MRH system is better than
those of AHP and UPA. The values resulting from the simulations
were computed, and the efficiency is calculated as denoted by
equation (12):

$$\mathrm{Efficiency} = \frac{O_{av} - U_{av}}{O_{av}} \times 100 \qquad (12)$$

where $O_{av}$ is the other system's value, and $U_{av}$ is the underlying (MRH) system's value.
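For example (a one-line Python check of equation (12) against run 1 of Table 7):

```python
def efficiency(other_value, mrh_value):
    """Equation (12): relative improvement of the underlying (MRH) system
    over another system, expressed as a percentage."""
    return (other_value - mrh_value) / other_value * 100

# Run 1 of Table 7: UPA TUPQ 218.92 vs MRH TUPQ 178.86 gives about 18.30%.
print(round(efficiency(218.92, 178.86), 2))
```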
Table 7: FUSs & TUPQs of the 10 Experiments Using the MRH & UPA Systems

Run    MRH TUPQ   MRH FUS   UPA TUPQ   UPA FUS   Efficiency Based on TUPQ (%)   Efficiency Based on FUS (%)
1      178.86     7.69      218.92     11.59     18.30                          33.65
2      178.86     7.69      230.09     11.55     22.27                          33.42
3      178.86     7.69      186.71     7.85      4.20                           2.04
4      180.87     7.41      230.64     12.01     21.58                          38.30
5      178.86     7.69      238.10     10.79     24.88                          28.73
6      180.87     7.41      218.92     11.59     17.38                          36.07
7      180.87     7.41      235.47     10.45     23.19                          29.09
8      178.86     7.69      197.62     11.77     9.49                           34.66
9      178.86     7.69      190.69     10.72     6.20                           28.26
10     180.87     7.41      212.03     13.55     14.70                          45.31
Table 8: FUSs & TUPQs of the 10 Experiments Using the MRH & AHP Systems

Run    MRH TUPQ   MRH FUS   AHP TUPQ   AHP FUS   Efficiency Based on TUPQ (%)   Efficiency Based on FUS (%)
1      178.86     7.69      269.08     12.86     33.53                          40.20
2      178.86     7.69      292.98     11.60     38.95                          33.71
3      178.86     7.69      335.76     9.19      46.73                          16.32
4      180.87     7.41      296.55     12.61     39.01                          41.24
5      178.86     7.69      307.39     9.05      41.81                          15.03
6      180.87     7.41      307.03     13.76     41.09                          46.15
7      180.87     7.41      243.01     12.61     25.57                          41.24
8      178.86     7.69      287.25     10.59     37.73                          27.38
9      178.86     7.69      290.55     12.81     38.44                          39.97
10     180.87     7.41      289.87     11.20     37.60                          33.84
As illustrated in Figures 7–14, the MRH system performed
better than both UPA and AHP in all the experiments, in terms
of both TUPQ, which corresponds to increased user
satisfaction, and FUS, which indicates that higher quality and
more fairness have been achieved. Figure 7 shows that the TUPQ
values of MRH are always less than those of UPA and AHP,
which means it is closer to full user satisfaction; Figure 8 shows a
similar situation in terms of FUS. The efficiency of MRH
reached 24.88% and 45.31% in terms of user satisfaction and
fairness, respectively, in comparison with UPA. On the other
hand, the efficiency in comparison with AHP reached 46.73%
in terms of user satisfaction, and the fairness reached
46.15%, which highlights the significance of the MRH system.
Based on the above experiments and with respect to UPA, the
average TUPQ enhancement was 16.22%, and the SD was
6.98. With respect to AHP, the average enhancement was
38.05%, and the SD was 5.55. The average FUS enhancement
with respect to UPA was 30.95%, and the SD was 11.38, whereas the
average FUS enhancement with respect to AHP was 33.51%, and the
SD was 10.75. Figures 9 and 10 depict that MRH is always more
efficient than UPA and AHP in terms of TUPQs and FUSs.
Figure 7: TUPQs of the 10 experiments, demonstrating the efficiency of MRH over UPA & AHP (chart: performance of MRH, UPA and AHP based on TUPQs).
Figure 8: FUSs of the 10 experiments, demonstrating the efficiency of MRH over UPA & AHP (chart: performance of MRH, UPA and AHP based on FUSs).
Figure 9: Efficiency of MRH over UPA & AHP based on TUPQ.
Figure 10: Efficiency of MRH over UPA & AHP based on FUS.
Figure 11: TUPQs of the 10 experiments using the MRH and UPA systems.
Figure 12: FUSs of the 10 experiments using the MRH & UPA systems.
Figure 13: MRH versus AHP systems in terms of TUPQ.
Figure 14: MRH versus AHP systems in terms of FUS.
The detailed user and site combinations, ordered from 1 to 10
for simplicity, are presented in Figures 15(a) and 15(b), which
are generated from Table 9. The UPA and AHP match users
and sites based on the user positions in the scheduling queue.
The MRH system provides optimal solutions that are more
stable and always show similar matchings of users and sites,
with little variation arising from the order of the users in the queue
from which the initial population is generated. This indicates that the FUS
and TUPQ values in Tables 7 and 8 were nearly the same,
irrespective of the order of users in the queue. In contrast, the
FUS and TUPQ values obtained from the UPA and the AHP
always noticeably varied based on the user positions in the
queue. The efficiencies of the pairing results obtained from
UPA and AHP were less than that obtained from the MRH
system. Nevertheless, there is a small possibility that the UPA
and AHP accidentally achieve a performance similar to the
performance of the MRH system.
Figure 15(a): Detailed combinations between users and sites (runs 1 to 5).
Figure 15(b): Detailed combinations between users and sites (runs 6 to 10).
Table 9: 10 Experiments Using MRH, AHP & UPA Systems (site assigned to each user)

Run   System   U1   U2   U3   U4   U5   U6   U7   U8   U9   U10
1     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      20   11   7    1    2    15   6    5    19   3
      AHP      6    20   5    3    2    10   1    15   8    7
2     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      2    11   15   12   5    3    6    20   7    1
      AHP      7    20   6    10   15   2    8    3    1    5
3     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      20   11   2    12   5    3    6    1    7    15
      AHP      20   1    6    15   3    10   5    7    8    2
4     MRH      20   11   2    12   5    1    6    15   7    3
      UPA      2    20   7    1    12   3    6    5    19   15
      AHP      6    20   5    3    2    7    15   10   8    1
5     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      2    20   5    1    3    12   6    7    19   15
      AHP      15   2    20   3    5    7    10   1    6    8
6     MRH      20   11   2    12   5    1    6    15   7    3
      UPA      20   11   7    1    2    15   6    5    19   3
      AHP      8    20   6    2    1    7    3    10   5    15
7     MRH      20   11   2    12   5    1    6    15   7    3
      UPA      2    11   3    1    20   12   6    7    19   15
      AHP      5    20   10   3    2    7    6    15   1    8
8     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      5    20   2    1    11   12   6    15   7    3
      AHP      3    20   10   2    6    15   8    5    1    7
9     MRH      20   11   5    12   2    1    6    15   7    3
      UPA      20   11   2    1    5    12   6    15   7    3
      AHP      7    2    10   6    1    3    8    15   20   5
10    MRH      20   11   2    12   5    1    6    15   7    3
      UPA      5    20   3    1    2    12   6    15   7    10
      AHP      7    2    10   20   1    3    8    5    6    15
8.1.1 Statistical Testing
8.1.1.1 TUPQ
Tables 7 and 8 clearly demonstrate that the MRH system was
superior to both UPA and AHP systems. However, statistical
testing is a useful method that can be used to validate the
significance of these results. Therefore, a one-way repeated-measures
analysis of variance (ANOVA) was conducted to
compare the MRH, UPA, and AHP systems based on the
TUPQ metric. The means and SDs are presented in Table 10.
The TUPQ means were 179.6, 215.9, and 291.9 when the MRH
system, UPA, and AHP were used, respectively. To test
whether the differences between the means were significant or
not, a multivariate test was conducted, and the results are
presented in Table 11, which indicates that the system used had
a significant effect on the TUPQ values, especially when the
MRH system was used [Wilks’ Lambda = 0.023, F(2, 8) =
172.6, p < 0.0005, multivariate partial η2 = 0.977].
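Such a test can be reproduced, for instance, with the AnovaRM class from the Python statsmodels package (a sketch under the assumption that the runs act as the repeated "subjects"; the paper does not state which statistical software was used, and AnovaRM reports the univariate repeated-measures F rather than the multivariate Wilks' Lambda statistic quoted above):

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# TUPQ values of the 10 runs for each system (Tables 7 and 8).
mrh = [178.86, 178.86, 178.86, 180.87, 178.86,
       180.87, 180.87, 178.86, 178.86, 180.87]
upa = [218.92, 230.09, 186.71, 230.64, 238.10,
       218.92, 235.47, 197.62, 190.69, 212.03]
ahp = [269.08, 292.98, 335.76, 296.55, 307.39,
       307.03, 243.01, 287.25, 290.55, 289.87]

rows = [{"run": r + 1, "system": name, "tupq": value}
        for name, values in (("MRH", mrh), ("UPA", upa), ("AHP", ahp))
        for r, value in enumerate(values)]
df = pd.DataFrame(rows)

# One-way repeated-measures ANOVA with the run treated as the repeated subject.
print(AnovaRM(df, depvar="tupq", subject="run", within=["system"]).fit())
```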
Table 10: TUPQ Descriptive Statistics for the MRH, UPA & AHP Systems

System   Mean       Std. Deviation   Number
MRH      179.6640   1.03796          10
UPA      215.9190   18.70657         10
AHP      291.9470   24.39006         10
Table 11: The Results of the Multivariate Tests based on TUPQ

Effect          Value   F       Hypothesis df   Error df   Significance   Partial Eta Squared
Wilks' Lambda   0.023   172.6   2.0             8.0        0.000          0.977
Additional analyses were conducted to determine the directions
of these differences in TUPQ values, and tests were performed
to shed light on the effects of each system. The results of these
multivariate tests are presented in Table 12.
Table 12: The Results of the Tests of Between-System Effects based on TUPQ

Source      Type III Sum of Squares   df   Mean Square   F          Significance   Partial Eta Squared
Intercept   1575658.336               1    1575658.336   8325.460   0.000          0.999
Error       1703.320                  9    189.258       -          -              -
The one-way repeated measures of ANOVA showed that these
TUPQs were significantly different—F(1, 9) = 8325.4, p <
0.001, partial η2 = 0.999, repeated measures using a Bonferroni
adjustment α= 0.05/3 = 0.017. Moreover, pairwise
comparisons, as presented in Table 13, proved that the TUPQ
values (Euclidean distances) were significantly smaller when
the MRH system was used. Furthermore, there was a
significant reduction in the Euclidean distance when AHP was
compared with UPA. These results indicate that UPA
outperformed AHP, and that in comparison to both of them,
MRH system performed the best. In visual summary form,
Figure 16 presents the mean difference and standard error for
the TUPQ pairwise comparison.
Table 13: TUPQ Pairwise Comparisons

(I) System   (J) System   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
MRH          UPA          -36.255                 5.797        0.000   -53.261              -19.249
MRH          AHP          -112.283                7.810        0.000   -135.192             -89.374
AHP          UPA          -76.028                 11.506       0.000   -109.778             -42.278
Figure 16: Mean difference and standard error for the TUPQ
pairwise comparison
8.1.1.2 FUS
Similarly, one-way repeated measures of ANOVA were
conducted to compare the MRH, UPA, and AHP systems based
on the FUS metric. The means and SDs are presented in Table
14.
The FUS means were 7.58, 11.19, and 11.63 when the MRH
system, UPA, and AHP were used, respectively. To determine
if the difference between the means was significant, a
multivariate test was conducted. As shown in Table 15, there
was a significant effect on FUS depending on the system that
was used, and this was especially evident when using the MRH
system [Wilks’ Lambda = 0.103, F(2, 8) = 34.902, p < 0.0005,
multivariate partial η2 = 0.897].
Table 14: FUS Descriptive Statistics for the MRH, UPA & AHP Systems

System   Mean    Std. Deviation   Number
MRH      7.58    0.145            10
UPA      11.19   1.46             10
AHP      11.63   1.61             10
Table 15: The Results of the Multivariate Tests based on FUS

Effect          Value   F        Hypothesis df   Error df   Sig.    Partial Eta Squared
Wilks' Lambda   0.103   34.902   2.000           8.000      0.000   0.897
More analyses were conducted to determine the directions of
these differences in FUS values, and tests were performed to
shed light on the effects of the different systems. The results of
these multivariate tests are presented in Table 16.
Table 16: The Results of the Tests of Between-System Effects based on FUS

Source      Type III Sum of Squares   df   Mean Square   F          Sig.    Partial Eta Squared
Intercept   3079.115                  1    3079.115      1491.755   0.000   0.994
Error       18.577                    9    2.064         -          -       -
One-way repeated measures of ANOVA indicate that these
FUSs were significantly different: F(1, 9) = 1491.755, p <
0.001, partial η2 = 0.994, repeated-measures using a Bonferroni
adjustment α = 0.05/3 = 0.017. Moreover, pairwise
comparisons, as presented in Table 17, proved that the FUS
values were significantly smaller when the MRH system was
used. These results demonstrate that UPA outperformed AHP,
and that in comparison to both of them, MRH system
performed the best. In visual summary form, Figure 17 presents
the mean difference and standard error for the FUS pairwise
comparison.
Table 17: FUS Pairwise Comparisons

(I) System   (J) System   Mean Difference (I-J)   Std. Error   Sig.    95% CI Lower Bound   95% CI Upper Bound
MRH          UPA          -3.609                  0.482        0.000   -5.022               -2.196
MRH          AHP          -4.050                  0.532        0.000   -5.611               -2.489
AHP          UPA          -0.441                  0.532        1.000   -2.003               1.121
Figure 17: Mean difference and standard error for the FUS pairwise comparison.
8.2 Case (2): Scalability test and best replica
Simulations were conducted using various methods, and the
results were compared with those of the UPA and the AHP to
determine the feasibility and scalability of the proposed
system. UPA was expected to be superior to AHP because
UPA was designed specifically for replica selection with
multiple parameters, whereas AHP is a general-purpose
decision model. In this simulation, the total number of grid
sites was one independent variable, and the number of users
was another independent variable. Therefore, nine scenarios
were examined, as shown in Tables 18 and 19 respectively. For
ease of understanding and interpretability, the results are
presented through visualisation: Table 18 is depicted in Figures
18(a) to 18(i), and Table 19 is depicted in Figures 19(a) to 19(i)
respectively.
Table 18: The Performance of 9 Experiments Using the MRH & UPA Systems

Scenario   Set   Number of sites   No. of requests   MRH TUPQ   MRH FUS   UPA TUPQ   UPA FUS   Efficiency Based on TUPQ   Efficiency Based on FUS
1          A     50                10                144.60     1.13      145.39     4.09      0.54%                      72.37%
2          A     50                20                288.74     4.12      313.27     4.40      7.83%                      6.36%
3          A     50                30                406.72     4.81      436.28     6.09      6.78%                      21.02%
4          B     100               20                222.29     2.68      227.18     2.75      2.15%                      2.55%
5          B     100               30                343.20     3.56      367.88     4.20      6.71%                      15.24%
6          B     100               50                705.515    6.00      777.04     6.32      9.20%                      5.06%
7          C     200               30                279.43     2.47      303.72     3.09      8.00%                      20.06%
8          C     200               50                464.10     3.46      488.75     3.48      5.04%                      6.46%
9          C     200               70                644.33     3.35      695.19     3.53      7.32%                      5.10%
Table 19: The Performance of 9 Experiments Using the MRH & AHP Systems

Scenario   Set   Number of sites   No. of requests   MRH TUPQ   MRH FUS   AHP TUPQ   AHP FUS   Efficiency Based on TUPQ   Efficiency Based on FUS
1          A     50                10                144.60     1.13      390.15     10.10     62.94%                     88.81%
2          A     50                20                288.74     4.12      669.87     11.94     56.90%                     65.49%
3          A     50                30                406.72     4.81      995.46     10.27     59.14%                     53.16%
4          B     100               20                222.29     2.68      662.07     9.33      66.43%                     71.28%
5          B     100               30                343.20     3.56      1023.46    13.13     66.47%                     72.89%
6          B     100               50                705.515    6.00      1737.53    12.11     59.40%                     50.45%
7          C     200               30                279.43     2.47      1054.80    12.35     73.51%                     80.00%
8          C     200               50                464.10     3.46      1750.10    10.95     73.48%                     68.40%
9          C     200               70                644.33     3.35      2362.23    11.07     72.72%                     69.74%
The first chromosome used in the MRH system was the one
obtained from UPA; thus, both systems began from the same
point. The results obtained from these simulations demonstrate
that the MRH system was scalable and outperformed the UPA
and AHP in all the scenarios. The fairness efficiency was
highly significant because it reached 72.37% in comparison to
UPA and 88.81% in comparison to AHP. The superiority of the
MRH system can be attributed to its nature as a weighted
algorithm that takes all of the users' requests into consideration
before making any decision. In contrast, UPA and AHP are
both greedy. They satisfy the current user without making any
prior considerations about the remaining users, similar to the
scenario of closest-city selection in the travelling-salesman
problem, which results in very long distances at the end.
Moreover, in terms of the TUPQ performance metric, the MRH
system delivered better results than UPA. The average
improvement value was 5.95% with an SD of 2.8. When
compared with AHP, the average TUPQ improvement value
was 65.66% with an SD of 22.27. On the other hand, the
average FUS improvement when compared with UPA was
16.48% with an SD of 2.87. Compared with AHP, the average
FUS improvement was 68.91% with an SD of 11.95.
Figure 18(a): Performance of MRH-TUPQ vs UPA-TUPQ in Table 18, Set A (50 sites).
Figure 18(b): Performance of MRH-TUPQ vs UPA-TUPQ in Table 18, Set B (100 sites).
Figure 18(c): Performance of MRH-TUPQ vs UPA-TUPQ in Table 18, Set C (200 sites).
Figure 18(d): Performance of MRH-FUS vs UPA-FUS in Table 18, Set A (50 sites).
Figure 18(e): Performance of MRH-FUS vs UPA-FUS in Table 18, Set B (100 sites).
Figure 18(f): Performance of MRH-FUS vs UPA-FUS in Table 18, Set C (200 sites).
Figure 18(g): Efficiency of MRH over UPA, TUPQ vs FUS, in Table 18, Set A (50 sites).
Figure 18(h): Efficiency of MRH over UPA, TUPQ vs FUS, in Table 18, Set B (100 sites).
Figure 18(i): Efficiency of MRH over UPA, TUPQ vs FUS, in Table 18, Set C (200 sites).
Figure 19(a): Performance of MRH-TUPQ vs AHP-TUPQ in Table 19, Set A (50 sites).
Figure 19(b): Performance of MRH-TUPQ vs AHP-TUPQ in Table 19, Set B (100 sites).
Figure 19(c): Performance of MRH-TUPQ vs AHP-TUPQ in Table 19, Set C (200 sites).
Figure 19(d): Performance of MRH-FUS vs AHP-FUS in Table 19, Set A (50 sites).
Figure 19(e): Performance of MRH-FUS vs AHP-FUS in Table 19, Set B (100 sites).
Figure 19(f): Performance of MRH-FUS vs AHP-FUS in Table 19, Set C (200 sites).
Figure 19(g): Efficiency of MRH over AHP, TUPQ vs FUS, in Table 19, Set A (50 sites).
Figure 19(h): Efficiency of MRH over AHP, TUPQ vs FUS, in Table 19, Set B (100 sites).
Figure 19(i): Efficiency of MRH over AHP, TUPQ vs FUS, in Table 19, Set C (200 sites).
9. Conclusion
In this study, a new hybrid replica-selection approach was introduced for the data grid environment. The algorithm integrates the QoS attributes of time, site availability, security, and cost, together with user preferences, into the replica-selection decision-making process. The primary achievement of the proposed approach is that multiple requests in the queue are addressed simultaneously, which was clearly demonstrated and mathematically modelled. A GA was used owing to the complexity of the problem. FUS and the average UPQ are two new metrics proposed to measure the performance of the approach. The simulation experiments adopted and extended several modules of OptorSim. The robustness of the approach was investigated, and the experimental results were presented. The simulation results indicate that the new approach enhanced the performance of the grid environment by reducing both FUS and the average UPQ and by increasing efficiency by up to 25%.
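As a hedged illustration of how the GA fits into this setting, the sketch below encodes a candidate solution as one site index per queued request and seeds the population with the UPA assignment, as described in the results. The operator choices, parameter values, and the externally supplied fitness function (which would combine the average UPQ and FUS) are assumptions for illustration, not the authors' configuration.

import random

def run_ga(num_requests, num_sites, fitness, upa_solution,
           pop_size=30, generations=100, p_mut=0.05, seed=0):
    """Tiny GA over replica assignments: one gene per request, value = site index.
    `fitness` maps a chromosome to a score to be maximised."""
    rng = random.Random(seed)
    # Seed the population with the UPA assignment, then add random chromosomes.
    population = [list(upa_solution)] + [
        [rng.randrange(num_sites) for _ in range(num_requests)]
        for _ in range(pop_size - 1)
    ]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]              # truncation selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, num_requests)           # one-point crossover
            child = a[:cut] + b[cut:]
            for i in range(num_requests):                  # per-gene mutation
                if rng.random() < p_mut:
                    child[i] = rng.randrange(num_sites)
            children.append(child)
        population = parents + children
    return max(population, key=fitness)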
The slow computational speed of the GA is the main drawback of the proposed system. Therefore, in future work we will investigate methods of reducing the search space using artificial intelligence and machine learning techniques, as well as ways of increasing the computational speed of the GA for real-time applications. These methods will be simulated and prototyped in pilot environments to assess their performance and viability against a comprehensive set of criteria.
References
[1] S. Vazhkudai, S. Tuecke, and I. Foster, "Replica selection in the
globus data grid," in Cluster Computing and the Grid, 2001.
Proceedings. First IEEE/ACM International Symposium on,
2001, pp. 106-113: IEEE.
[2] A. Chervenak et al., "Giggle: a framework for constructing
scalable replica location services," in Supercomputing,
ACM/IEEE 2002 Conference, 2002, pp. 58-58: IEEE.
[3] S. B. Priya, M. Prakash, and K. Dhawan, "Fault tolerance-
genetic algorithm for grid task scheduling using check point," in
Grid and Cooperative Computing, 2007. GCC 2007. Sixth
International Conference on, 2007, pp. 676-680: IEEE.
[4] S. Venugopal and R. Buyya, "An SCP-based heuristic approach
for scheduling distributed data-intensive applications on global
grids," Journal of Parallel and Distributed Computing, vol. 68,
no. 4, pp. 471-487, 2008.
[5] R. M. Rahman, R. Alhajj, and K. Barker, "Replica selection
strategies in data grid," Journal of Parallel and Distributed
Computing, vol. 68, no. 12, pp. 1561-1574, 2008.
[6] A. Jaradat, R. Salleh, and A. Abid, "Imitating K-Means to
Enhance Data Selection," Journal of Applied Sciences, vol. 9,
no. 19, pp. 3569-3574, 2009.
[7] R. M. Almuttairi, R. Wankar, A. Negi, and C. Rao, "Smart
Replica Selection for Data Grids using Rough Set
Approximations (RSDG)," in International Conference on
Computational Intelligence and Communication Networks,
Bhopal 2010, pp. 466-471: IEEE.
[8] R. M. Almuttairi, R. Wankar, A. Negi, and C. Rao, "Replica
Selection in Data Grids Using Preconditioning of Decision
Attributes by K-means Clustering (K-RSDG)," in Second
Vaagdevi International Conference on Information Technology
for Real World Problems, VCON, 2010, vol. 1, pp. 18-23: IEEE.
[9] P. Wendell, J. W. Jiang, M. J. Freedman, and J. Rexford,
"Donar: decentralized server selection for cloud services," in
the ACM SIGCOMM 2010 conference New York, NY, USA,
2010, vol. 40, pp. 231-242: ACM.
[10] A. Jaradat, A. H. M. Amin, and M. N. Zakaria, "Balanced QoS
Replica Selection Strategy to Enhance Data Grid," presented at
the 2nd International Conference on Networking and
Information Technology Hong Kong, China, 2011.
[11] A. Jaradat, A. H. M. Amin, M. Zakaria, and K. J. Golden, "An
Enhanced Grid Performance Data Replica Selection Scheme
Satisfying User Preferences Quality of Service," European
Journal of Scientific Research, vol. 73, no. 4, pp. 527-538,
2012.
[12] A. Jaradat, A. Patel, M. N. Zakaria, and M. A. Amina,
"Accessibility algorithm based on site availability to enhance
replica selection in a data grid environment," Computer Science
and Information Systems, vol. 10, no. 1, pp. 105-132, 2013.
[13] O. Jadaan, W. Abdulal, M. A. Hameed, and A. Jabas,
"Enhancing Data Selection Using Genetic Algorithm," in
Second Vaagdevi International Conference on Information
Technology for Real World Problems, VCON, 2010, pp. 434-
439: IEEE.
[14] C. Hamdeni, T. Hamrouni, and F. B. Charrada, "Evaluation of
site availability exploitation towards performance optimization
in data grids," Cluster Computing, vol. 21, no. 4, pp. 1967-1980,
2018.
[15] H. H. E. Al-Mistarihi and C. H. Yong, "On fairness, optimizing
replica selection in data grids," IEEE transactions on parallel
and distributed systems, vol. 20, no. 8, pp. 1102-1111, 2008.
[16] T. L. Saaty, "Decision making with the analytic hierarchy
process," International Journal of Services Sciences, vol. 1, no.
1, pp. 83-98, 2008.
[17] W. Awang, M. Deris, O. F. Rana, M. Zarina, and A. Rose,
"Affinity Replica Selection in Distributed Systems," in
International Conference on Parallel Computing Technologies,
2019, pp. 385-399: Springer.
[18] G. Yang and Z. Liu, "Replica Selection Algorithm for
Streaming Media," in 2018 International Conference on
Mathematics, Modelling, Simulation and Algorithms (MMSA
2018), 2018: Atlantis Press.
[19] C. Li, J. Tang, and Y. Luo, "Scalable replica selection based on
node service capability for improving data access performance
in edge computing environment," The Journal of
Supercomputing, pp. 1-35, 2019.
[20] A. Abbasi, A. M. Rahmani, and E. Zeinali Khasraghi,
"Reliability and Availability Improvement in Economic Data
Grid Environment Based On Clustering Approach," Journal of
Advances in Computer Engineering and Technology, vol. 1, no.
4, pp. 1-14, 2015.
[21] B. Nazir, F. Ishaq, S. Shamshirband, and A. T. Chronopoulos,
"The Impact of the Implementation Cost of Replication in Data
Grid Job Scheduling," Mathematical and Computational
Applications, vol. 23, no. 2, p. 28, 2018.
[22] R. K. Grace and S. S. Kumar, "Replica Selection Using Random
and AHP Algorithms in Data Grid," International Journal on
Information Sciences & Computing, vol. 11, no. 1, 2017.
[23] S. Song, K. Hwang, and Y. K. Kwok, "Risk-resilient heuristics
and genetic algorithms for security-assured grid job
scheduling," IEEE Transactions on Computers, vol. 55, no. 6,
pp. 703-719, 2006.
[24] J. Carretero, F. Xhafa, and A. Abraham, "Genetic algorithm
based schedulers for grid computing systems," International
Journal of Innovative Computing, Information and Control,
vol. 3, no. 6, pp. 1-19, 2007.
[25] K. Z. Gkoutioudi and H. D. Karatza, "Multi-Criteria Job
Scheduling in Grid Using an Accelerated Genetic Algorithm,"
Journal of Grid Computing, vol. 10, no. 2, pp. 1-13, 2012.
[26] J. H. Holland, Adaptation in natural and artificial systems (no.
53). University of Michigan press, 1975.
[27] V. Vijayakumar and R. Banu, "Security for resource selection in
grid computing based on trust and reputation responsiveness,"
International Journal of Computer Science and Network
Security, vol. 8, no. 11, pp. 107-115, 2008.
[28] H. Sun and Y. Ding, "QoS scheduling of fuzzy strategy grid
workflow based on the bio-network," International Journal of
Computational Science and Engineering, vol. 6, no. 1, pp. 114-
121, 2011.
[29] D. Kolossa, B. U. Köhler, M. Conrath, and R. Orglmeister,
"Optimal Permutation Correction by Multiobjective Genetic
Algorithms," Proceedings of ICA, San Diego, CA, 2001.
[30] Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs. Springer,
1996.
[31] C. M. Fonseca and P. J. Fleming, "An overview of evolutionary
algorithms in multiobjective optimization," Evolutionary
computation, vol. 3, no. 1, pp. 1-16, 1995.
[32] A. Rubinov and R. Gasimov, "Scalarization and nonlinear
scalar duality for vector optimization with preferences that are
not necessarily a pre-order relation," Journal of Global
Optimization, vol. 29, no. 4, pp. 455-477, 2004.
[33] D. H. Kim and K. W. Kang, "Design and implementation of
integrated information system for monitoring resources in grid
computing," in the 10th International Conference on Computer
Supported Cooperative Work in Design, Nanjing, 2006, pp. 1-6:
IEEE.
[34] S. Vazhkudai, S. Tuecke, and I. Foster, "Replica selection in the
globus data grid," in the first IEEE/ACM International
Symposium on Cluster Computing and the Grid, Brisbane, Qld,
2001, pp. 106-113: IEEE.
[35] R. Wolski, "Dynamically forecasting network performance
using the network weather service," Cluster Computing, vol. 1,
no. 1, pp. 119-132, 1998.
[36] S. Fitzgerald, I. Foster, C. Kesselman, G. Von Laszewski, W.
Smith, and S. Tuecke, "A directory service for configuring
high-performance distributed computations," in the 6th IEEE
Symposium on High Performance Distributed Computing,
Portland, Oregon, 1997, pp. 365-375: IEEE.
[37] E. Amaldi, A. Capone, and F. Malucelli, "Optimizing base
station siting in UMTS networks," in the 53rd Vehicular
Technology Conference, Rhodes, 2001, vol. 4, pp. 2828-2832
vol. 4: IEEE.
[38] S. Gaber, M. El-Sharkawi, and M. N. El-deen, "Traditional
genetic algorithm and random-weighted genetic algorithm with
GIS to plan radio network," URISA Journal, vol. 22, no. 1, pp.
205-222, 2010.
[39] T. A. El-Mihoub, A. A. Hopgood, L. Nolle, and A. Battersby,
"Hybrid genetic algorithms: A review," Engineering Letters,
vol. 13, no. 2, pp. 124-137, 2006.
[40] A. Sulistio, C. S. Yeo, and R. Buyya, "A taxonomy of
computer-based simulations and its mapping to parallel and
distributed systems simulation tools," Software: Practice and
Experience, vol. 34, no. 7, pp. 653-673, 2004.
[41] W. H. Bell, D. G. Cameron, A. P. Millar, L. Capozza, K.
Stockinger, and F. Zini, "Optorsim: A grid simulator for
studying dynamic data replication strategies," International
Journal of High Performance Computing Applications, vol. 17,
no. 4, pp. 403-416, 2003.
Authors' Biographies
AYMAN JARADAT received the B.Sc. degree from Yarmouk University, Jordan, in 1989, the M.Sc. degree from Universiti Sains Malaysia in 2007, and the Ph.D. degree from Universiti Teknologi PETRONAS in 2013. He was the Dean of the Faculty of Computer and Information Technology, Al-Madinah International University. He is currently an Assistant Professor with Al Majmaah University. He is specialized in computer science, and his research interests include high-performance computing, grid computing, cloud computing, genetic algorithms, and distributed algorithms and applications.
HITHAM ALHUSSIAN received the B.Sc. and M.Sc. degrees in computer science from the School of Mathematical Sciences, Khartoum University, Sudan, and the Ph.D. degree from Universiti Teknologi PETRONAS, Malaysia, where he is currently a Senior Lecturer with the Computer and Information Sciences Department and a core research member of the Centre for Research in Data Science (CERDAS). His main research interests are real-time parallel and distributed systems, cloud computing, big data mining, and machine learning.
AHMED PATEL received his M.Sc. and Ph.D. degrees in Computer Science from Trinity College Dublin (TCD), University of Dublin, in 1978 and 1984, respectively, specializing in the design, implementation, and performance analysis of packet-switched networks. He is a Research Professor at Universidade Estadual do Ceará, Fortaleza, Brazil, with key research interests in advanced computer networking, the Internet of Things, cloud computing, big data, predictive analysis, the use of advanced computing techniques, the impact of e-social networking, closing the digital-divide ICT gap, and ICT project management. He has published over 272 technical and scientific papers and co-authored three books, two on computer network security and the third on group communications. He co-edited one book on distributed search systems for the Internet and also co-edited and co-authored another book entitled "Securing Information & Communication Systems: Principles, Technologies & Applications". He is a member of the editorial advisory boards of international journals and has participated in Irish, Malaysian, and European funded research projects.
SULIMAN MOHAMED FATI obtained his B.Sc. (2002), M.Sc. (2009), and Ph.D. (2014) degrees from Ain Shams University, Egypt; Cairo University, Egypt; and Universiti Sains Malaysia (USM), Malaysia, respectively. He is currently an Assistant Professor in the College of Computer and Information Sciences, Prince Sultan University, Saudi Arabia. His research interests focus on the Internet of Things, machine learning, social media mining, cloud computing, cloud computing security, and information security. He has authored over 20 journal/conference papers, books, and book chapters. He is a member of several professional bodies, including IEEE, IACSIT, IAENG, and the Institute of Research Engineers and Doctors, USA, and serves as a reviewer for many international journals.