
Monetary-and-QoS Aware Replica Placements in Cloud-Based Storage Systems

February 2015

2015:672-675

DOI:10.1109/CloudCom.2014.109

Authors:

Lingfang Zeng

Zhejiang Lab

Yang Wang

University of Alberta


This paper proposes a replication cost model and two greedy algorithms, named GS QoS and GS QoS C1, for replica placement in cloud-based storage systems. The model aims to minimize replication cost while fully considering the quality of user access to storage nodes. Both algorithms use a utility measure to guide the placement procedure. Our experimental results show that 1) GS QoS outperforms GS QoS C1, and 2) both algorithms yield more economical results than the existing greedy replica placement algorithm.


Monetary-and-QoS Aware Replica Placements in

Cloud-Based Storage Systems

Lingfang Zeng†, Shijie Xu♭, Yang Wang♭, Xiang Cui§, Tan Wee Kiat§, David Bremner♭, Kenneth Kent♭

†Wuhan National Laboratory for Optoelectronics

School of Computer, Huazhong University of Science and Technology

♭IBM Centre for Advanced Studies (CAS Atlantic)

University of New Brunswick, Fredericton, Canada E3B 5A3

§Department of Electrical and Computer Engineering,

National University of Singapore, Singapore 117576

E-mail: lfzeng@hust.edu.cn, {shijiexu, ywang8, bremner, ken}@unb.ca, cuixiang23@gmail.com

Abstract—This paper proposes a replication cost model and two greedy algorithms, named GS QoS and GS QoS C1, for replica placement in cloud-based storage systems. The model aims to minimize replication cost while fully considering the quality of user access to storage nodes. Both algorithms use a utility measure to guide the placement procedure. Our experimental results show that 1) GS QoS outperforms GS QoS C1; 2) both algorithms yield more economical results than the existing greedy site algorithm.

Index Terms—replication, greedy, cloud storage system

I. INTRODUCTION

Replication technology has been widely used to improve the performance of network-based applications. By serving clients from nearby replicas, it can significantly reduce the overall network latency. Replication also benefits service reliability. In practice, computing resources such as processors, storage, and networks are not failure free, and a failure is usually fatal to a running system. Replica sites therefore have to be selected to serve all clients of the failed nodes so that the service can continue.

Despite these benefits, replication should be designed carefully because of the cost it incurs. In a cloud environment, cloud vendors invest a large amount of money in hardware resources (e.g., data centers, power, network bandwidth, and machines), and then hire employees to monitor and maintain these resources. As a result, service providers have to pay for the resources their applications consume (e.g., network traffic) on cloud platforms. The more resources the service applications consume, the higher the fee charged by the cloud vendors.

Replication in cloud storage systems still deserves study, even though it has been studied intensively in traditional content delivery networks (CDNs) and Grid systems. In contrast to existing CDNs and Grids, replication in cloud storage systems has two distinct characteristics: 1) end users contribute the majority of network traffic because they are the content owners; 2) the data sets are diverse and dynamic. In cloud storage systems (e.g., Dropbox, Tencent Weiyun, and Google Storage), users update their data in the system continuously, and content sizes can range from several MBs to GBs. This traffic is very different from that in traditional CDNs, where the content comes from the service providers and remains the same for a long time. Consequently, the user access traffic patterns in CDNs and Grids are relatively more uniform than those in cloud storage systems.

This paper addresses the issue of replica placement in cloud storage systems. We build a mathematical model to minimize replication cost, and provide two monetary-and-QoS-aware algorithms, named GS QoS and GS QoS C1, to solve this model.

II. RELATED WORK

Numerous works have addressed the replica placement problem in CDNs and Grids [3]. For example, Mansouri et al. [5] provide a selection algorithm for replica allocation, named the Modified DHR Algorithm (MDHRA). Response time is calculated from factors such as data transfer time and storage access latency, and the best replica location is then determined. Li et al. [4] argue that the placement of web proxies is critical to network performance and can significantly reduce the overall latency. They use a tree topology to model the placement problem, but only the download cost is included in their cost model. Xu et al. [7] also offer solutions based on a tree topology, and include both upload and download costs. However, their work does not consider how the replication direction affects the provisioning cost between replica sites. Chen et al. [1] take a different approach that moves away from the usual tree topology, and their cost model includes all three costs, i.e., upload, download, and storage. Although they emphasize the importance of both choosing the set of replica sites and specifying the replication directions, their algorithm is not QoS friendly. All of the above algorithms may work well in traditional CDNs and Grid systems, but they overlook the traffic volume and its cost, which are the two main characteristics of current cloud storage systems.

The replica placement problem can be defined as a procedure that selects N nodes to host m replicas, given a distance matrix D, so that the objectives are optimized. The element d(i, j) of the distance matrix is a distance metric between the ith request location and the jth storage node location. As discussed in [2], this is an NP-hard problem, and many heuristic algorithms have therefore been proposed to solve it. For example, three different greedy algorithms, i.e., a normal greedy algorithm, a simple greedy algorithm, and a heuristic algorithm, are discussed in [6].

III. PLACEMENT MODEL AND ALGORITHMS

In this section, we first provide a model for replica placement in cloud-based storage systems, and then present two new monetary-and-QoS-aware greedy site (GS) algorithms, GS QoS and GS QoS C1, to obtain a cost-efficient placement strategy.

A. Model

There are three kinds of entities in cloud storage systems: cloud storage nodes, client users, and data sets. The cloud nodes provide storage space for the data. If one node fails, a nearby storage node can be used to continue the service. The users, on the other hand, contribute the data access volume of the cloud storage system. They synchronize content from their local nodes to the storage nodes every day, but rarely download. This traffic is therefore typically unidirectional upload traffic, in contrast to the bidirectional replication traffic among cloud nodes or CDN nodes.

The total cost is therefore the sum of the replication cost and the user access cost. The replication cost is the sum of the network traffic costs (i.e., the incoming and outgoing costs at a node) incurred during replication and synchronization, plus the storage cost incurred when a new node is selected for storage. The access cost, on the other hand, refers only to the upload traffic cost at a node when users access it, and its value is proportional to the user access frequency.

Finally, our model is to find a procedure of selecting storage nodes from N given nodes so that the objective cost in Eq. (1) can be minimized:

\[
\text{cost} = \sum_{j \in \text{nodes}} \bigl( C_{\text{users access traffic}}(j) + C_{\text{replication}}(j) \bigr) \tag{1}
\]
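As a rough illustration of Eq. (1), the sketch below sums the access-traffic cost and the replication cost over a set of selected nodes. The names `NodeCost` and `total_cost` and the sample figures are assumptions made for this example; they do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class NodeCost:
    """Cost components of one selected storage node (hypothetical container)."""
    access_traffic: float  # C_users access traffic(j), in dollars
    replication: float     # C_replication(j), in dollars


def total_cost(selected):
    """Eq. (1): sum access-traffic and replication costs over the selected nodes."""
    return sum(node.access_traffic + node.replication for node in selected)


# Made-up figures for two selected nodes.
nodes = [
    NodeCost(access_traffic=1.5, replication=0.5),
    NodeCost(access_traffic=2.0, replication=1.0),
]
print(total_cost(nodes))
```

A placement algorithm would compare this value across candidate selections and prefer the one with the smallest total.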

B. Modiﬁcation to Greedy Site algorithm GS QoS

Algorithm 1 shows the general GS procedure for replica

placements. In this procedure, a node with the highest utility

is ﬁrst selected, and then potential users are assigned to it in

each round. The selection is repeated until all the users are

assigned.
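The greedy loop described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: `greedy_site`, the `reachable` mapping, and the toy utility are hypothetical names, and the extra guard that skips nodes serving no unassigned user is added here only so that the loop always terminates.

```python
def greedy_site(users, candidates, reachable, utility):
    """Generic greedy-site loop (illustrative sketch).

    users      -- iterable of user ids
    candidates -- iterable of node ids (must be sortable, for stable tie-breaks)
    reachable  -- dict: node id -> set of user ids that node can serve
    utility    -- callable(node, unassigned_users) -> float
    Returns (selection order, dict mapping user -> node).
    """
    unassigned = set(users)
    unselected = set(candidates)
    order, assignment = [], {}
    while unassigned and unselected:
        # Pick the unselected node with the highest utility in this round.
        best = max(sorted(unselected), key=lambda j: utility(j, unassigned))
        unselected.remove(best)
        served = reachable[best] & unassigned
        if not served:
            continue  # assumption: ignore nodes that serve no unassigned user
        order.append(best)
        for user in served:
            assignment[user] = best
        unassigned -= served
    return order, assignment


# Toy utility (an assumption): the number of unassigned users a node can serve.
reachable = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4}}
toy_utility = lambda j, unassigned: len(reachable[j] & unassigned)
order, assignment = greedy_site([1, 2, 3, 4], list(reachable), reachable, toy_utility)
print(order, assignment)
```

Passing the utility as a callable mirrors the observation that the utility function is what distinguishes one GS variant from another.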

The function utility(j) (GB/$) distinguishes the different GS algorithms. A common definition of utility is the ratio of the total potential traffic volume at node j to the total cost. A lower utility value indicates less traffic volume per unit of expense. Compared to the GS algorithm in [1], our GS QoS introduces a QoS penalization factor a into the utility function (Algorithm 2), which adheres to the following rule:

\[
a =
\begin{cases}
1, & \text{if the user is not within the QoS distance} \\
\dfrac{QoSD(i,j)}{Q}, & \text{otherwise}
\end{cases}
\tag{2}
\]

As an element of the QoS Distance matrix, QoSD(i, j) is the maximal QoS distance between the user and a node.

Algorithm 1 GS QoS procedure

1: \(E\) is the set of unassigned users

2: \(E_j\) is the current set of users who can be assigned to \(C_j\)

3: \(U_k\) is the kth user and \(C_j\) is the jth cloud node

4: \(w_k\) is the data size of requests from user \(U_k\)

5: while \(E \neq \emptyset\) do

6:     \(j^* = \arg\max_j \text{utility}(j)\), \(j \in\) all unselected nodes

7:     \(E_{j^*}\) is the set of unassigned users for node \(j^*\)

8:     Assign all users in \(E_{j^*}\) to \(j^*\)

9:     Select node \(j^*\)

10:     \(E = E - E_{j^*}\)

11: end while

Algorithm 2 Utility(j) calculation

1: \(D_j\): download price, \(P_j\): upload price, \(S_j\): storage price

2: \(F\) is the synchronization frequency; its default value is 1

3: Data set size \(W\) and replication cost: \(C_{replication}(j) = W \cdot (S_j + P_j \cdot F + D_i \cdot F)\)

4: Traffic cost of serving a user k assigned to node j: \(C_k(j) = a \cdot w_k \cdot D_j\)

5: Analytical size of request objects from user k to j: \(w_k(j) = a \cdot w_k\)

6: Size of all request objects for unassigned users: \(W_T(j) = \sum_{k \in E} w_k(j)\)

7: Analytical utility of site j, with k ranging over all unassigned users: \(\text{utility}(j) = \dfrac{W_T(j)}{\sum_k C_k(j) + C_{replication}(j)}\)
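The utility calculation of Algorithm 2 can be sketched as below. The function names, argument layout, and sample prices are assumptions for illustration. The penalization factor a of Eq. (2) is taken as an input for each user rather than derived from distances, and the total cost is assumed to be positive so that the division is defined.

```python
def replication_cost(W, S_j, P_j, D_i, F=1.0):
    """Step 3: C_replication(j) = W * (S_j + P_j*F + D_i*F)."""
    return W * (S_j + P_j * F + D_i * F)


def utility(W, S_j, P_j, D_j, D_i, requests, F=1.0):
    """Steps 4-7 of Algorithm 2 for one candidate node j.

    requests -- list of (a, w_k) pairs for the unassigned users, where a is
                the QoS penalization factor and w_k the request size in GB.
    Returns traffic volume per dollar (GB/$).
    """
    W_T = sum(a * w_k for a, w_k in requests)            # step 6: total analytical size
    traffic = sum(a * w_k * D_j for a, w_k in requests)  # step 4, summed over users
    return W_T / (traffic + replication_cost(W, S_j, P_j, D_i, F))  # step 7


# Made-up prices ($/GB) and two pending requests, one of them penalized (a = 0.5).
u = utility(W=8.0, S_j=0.25, P_j=0.125, D_j=0.25, D_i=0.125,
            requests=[(1.0, 4.0), (0.5, 8.0)])
print(round(u, 4))
```

In this toy case the analytical volume is 8 GB, the traffic cost is 2 dollars and the replication cost is 4 dollars, so the utility is 8/6 GB per dollar.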

C. Improvement GS QoS C1

It is possible that a selected node does not have a large number of potential users, in which case selecting it would be wasteful. To avoid this case, an additional constraint is used to decide whether to select a new node for replication. In GS QoS C1, we check whether there would be sufficient potential users for the node to be selected. The constraint is:

\[
W_t \ge \sum_{k} w_k \left( \frac{1}{n} + \beta \frac{1}{N} \right)
\]

In this formula, k ranges over all assigned users, β is a coefficient, n is the total number of nodes, and N is the total number of users. This formula implies that the potential volume for the newly selected node should not be less than the average value over all the existing selected nodes.

TABLE I: Summary of instances where GS QoS outperforms

| | GS | GS QoS |
|---|---|---|
| n | 20 | 20 |
| Node selection order | 17, 6, 1, 3, 11, 12, 16, 18, 14, 2, 0, 7, 10, 19, 4, 9 | 17, 16, 14, 6, 12, 3, 1, 10, 18, 7, 11, 0, 4, 9, 2, 19 |
| Number of nodes selected | 16 | 16 |
| Total cost | 9.9 | 9.4 |
| Relative cost | 1 | 0.97456 |

The complete GS QoS C1 algorithm is shown in Algorithm 3:

Algorithm 3 GS QoS C1

1: \(E\) is the set of unassigned users

2: while \(E \neq \emptyset\) do

3:     \(j^* = \arg\max_j \text{utility}(j)\), \(j \in\) all unselected sites

4:     \(E_{j^*}\) is the set of unassigned users; select node \(j^*\) and assign users to it if:

5:     \(W_t = \sum_k w_k,\ k \in E_{j^*}\)

6:     if \(W_t \ge \sum_k w_k \left( \frac{1}{n} + \beta \frac{1}{N} \right)\) select node \(j^*\)

7:     else repeat step 3 to find the next best site

8:     Assign all users in \(E_{j^*}\) to \(j^*\)

9:     Select node \(j^*\)

10:     \(E = E - E_{j^*}\)

11: end while
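The admission test in step 6 of Algorithm 3 can be sketched as a small predicate. The name `passes_c1` and the sample figures are assumptions for illustration; the right-hand sum is taken over the users already assigned, following the description of the constraint in the text.

```python
def passes_c1(candidate_sizes, assigned_sizes, n, N, beta):
    """Check W_t >= sum(assigned w_k) * (1/n + beta/N).

    candidate_sizes -- w_k of the unassigned users the candidate node would serve
    assigned_sizes  -- w_k of all users assigned so far
    n, N            -- total number of nodes and total number of users
    beta            -- tuning coefficient
    """
    W_t = sum(candidate_sizes)
    threshold = sum(assigned_sizes) * (1.0 / n + beta / N)
    return W_t >= threshold


# Made-up figures: the candidate would serve 5 GB and 40 GB are already assigned.
print(passes_c1([2.0, 3.0], [10.0, 20.0, 10.0], n=4, N=8, beta=1.0))  # threshold is 15
print(passes_c1([2.0, 3.0], [10.0, 20.0, 10.0], n=8, N=8, beta=0.0))  # threshold is 5
```

When no user has been assigned yet the threshold is zero, so the first candidate always passes under this reading.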

IV. RESULTS

To make our results comparable, we reuse the existing data from [1], namely the prices and parameters of the cloud storage, in our experiments. The replica sites are assumed to be randomly located in a geographical space. Over repeated tests, we compute statistics and study the impact of the parameters on the replication results.

In addition, we define two terms. One is QoSD(i, j) in the QoS Distance matrix, which is the maximal QoS distance between the user and a node. If the distance between user i and node j exceeds QoSD(i, j), the QoS of the user cannot be satisfied. The other is the relative cost of algorithm i:

\[
\text{relative cost}(i) = \frac{\sum_{GS,k} \bigl( C_{allusers}(k) + C_{replication}(k) \bigr)}{\sum_{i,k} \bigl( C_{allusers}(k) + C_{replication}(k) \bigr)}
\]

According to the relative cost, algorithm i is better than the GS algorithm if its relative cost is greater than 1.
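The relative cost defined above can be computed as in the following sketch. The function name and the per-node cost pairs are assumptions for illustration, not data from the experiments.

```python
def relative_cost(gs_costs, algo_costs):
    """Total GS cost divided by the total cost of the compared algorithm.

    Each argument is a list of (C_allusers(k), C_replication(k)) pairs,
    one pair per node k selected by the respective algorithm.
    """
    gs_total = sum(users + repl for users, repl in gs_costs)
    algo_total = sum(users + repl for users, repl in algo_costs)
    return gs_total / algo_total


# Made-up per-node costs: GS spends 12 in total, the compared algorithm 10.
gs = [(4.0, 2.0), (3.0, 3.0)]
other = [(4.0, 2.0), (2.0, 2.0)]
print(relative_cost(gs, other))
```

A value above 1 means the compared algorithm is cheaper than GS, which matches the interpretation given in the text.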

A. Modiﬁed Greedy Site algorithm GS QoS

Fig. 1: Performance CDF of GS QoS.

Fig. 2: Overall Performance CDF of GS QoS and GS QoS C1.

The y-axis in Fig. 1 is the CDF (cumulative distribution function). According to this figure, GS QoS is better the majority of the time. The relative cost(GS QoS) value is at most 1.5 in 95% and at most 1.0 in 20% of the test cases. This indicates that GS QoS results in a higher cost than GS in only 20% of the test cases.
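Reading a CDF at a threshold, as done for Fig. 1, amounts to counting the share of test cases whose relative cost does not exceed that threshold. The sketch below shows this on made-up samples; `fraction_at_most` and the sample values are assumptions and do not reproduce the paper's measurements.

```python
def fraction_at_most(values, threshold):
    """Empirical CDF: share of observations less than or equal to threshold."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(v <= threshold for v in values) / len(values)


# Hypothetical relative-cost samples from five test runs.
samples = [0.9, 1.0, 1.1, 1.2, 1.6]
print(fraction_at_most(samples, 1.0))
print(fraction_at_most(samples, 1.5))
```

The complement of the value at 1.0 gives the share of runs with a relative cost above 1.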

The value of the relative cost is affected by both the node selection order and the number of selected nodes. In the example of Table I, the relative cost is less than 1 even though the two selected node sets are equivalent. According to this table, if nodes with a lower outgoing cost are selected first, the resulting selection cost is effectively lower. Because the utility in GS QoS is calculated over all unassigned users, it tends to choose sites with a lower outgoing cost first. Like the order, the number of nodes selected for replication also affects the relative cost value.

The relative cost comparisons of GS QoS and GS QoS C1 are shown in Fig. 2 and Fig. 3. In Fig. 2, a relative cost below 1.0 occurs in 18% of the test cases for GS QoS C1 and in 23% for GS QoS. This implies that GS QoS C1 is better than GS QoS, as GS QoS C1 outperforms GS in 82% of the test cases while GS QoS does so in only 77%. Additionally, there is a significant reduction of relative cost when GS QoS C1 outperforms GS QoS, i.e., in instances with relative cost > 1. In Fig. 3, the boxed area for GS QoS C1 is also smaller than that of GS QoS, which further implies that the results of GS QoS C1 are more consistent with GS than those of GS QoS.

Fig. 3: Comparison of boxplots of GS QoS and GS QoS C1.

Fig. 4: Varying Replica Size (W).

The relationship between replica size and relative cost is shown in Fig. 4. According to this figure, the relative cost rises with replica size and then remains steady at 1.1 for both algorithms. The explanation is that the difference between the replication cost and the user access traffic cost diminishes as the replica size grows.

The relative cost is also affected by the QoS distance (Q) and the number of nodes (n), as shown in Fig. 5 and Fig. 6. For both algorithms, the relative cost climbs with the QoS distance (Q) at first, but goes down when Q is greater than 10. This is because a single node can serve nearly all users when Q is large, which in turn favors the utility of GS QoS. This figure also indicates that GS QoS outperforms GS QoS C1.

With regard to the number of nodes, GS QoS is also better than GS QoS C1. According to Fig. 6, the rise in relative cost with an increasing number of nodes for both algorithms is a typical case where an increase in the solution space lowers the performance of a heuristic-based algorithm. However, the performance of the algorithms is still reasonably good, with a relative cost below 1.1 even when there are 40 sites in the cloud.

Fig. 5: Varying QoS Distance (Q).

Fig. 6: Varying number of nodes (n).

V. CONCLUSION

We provide a model for replication placements in the

cloud storage systems and present two new monetary-and-QoS

aware greedy algorithms to minimize the replication costs.

Our results show that both the proposed algorithms not only

have more economical results than those from GS [1] but also

guarantee the QoS for user accesses.

REFERENCES

[1] Fangfei Chen, Katherine Guo, John Lin, and Thomas F. La Porta. Intra-cloud lightning: Building CDNs in the cloud. In INFOCOM, pages 433–441, 2012.

[2] Magnus Karlsson and Christos Karamanolis. Bounds on the replication cost for QoS. Technical report, 2003.

[3] R. Kingsy Grace and R. Manimegalai. Dynamic replica placement and selection strategies in data grids: a comprehensive survey. J. Parallel Distrib. Comput., 74(2):2099–2108, February 2014.

[4] Bo Li, M. J. Golin, G. F. Italiano, Xin Deng, and K. Sohraby. On the optimal placement of web proxies in the Internet. In INFOCOM '99: Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies, volume 3, pages 1282–1290, March 1999.

[5] Najme Mansouri, Gholam Hosein Dastghaibyfard, and Ehsan Mansouri. Combination of data replication and scheduling algorithm for improving data availability in data grids. Journal of Network and Computer Applications, 36(2):711–722, 2013.

[6] Vassiliki Andronikou, Konstantinos Mamouras, Konstantinos Tserpes, Dimosthenis Kyriazis, and Theodora Varvarigou. Dynamic QoS-aware data replication in grid environments based on data importance. Future Generation Computer Systems, 28:544–553, 2011.

[7] Jianliang Xu, Bo Li, and D. L. Lee. Placement problems for transparent data replication proxy services. IEEE Journal on Selected Areas in Communications, 20(7):1383–1398, September 2002.
