Monetary-and-QoS Aware Replica Placements in Cloud-Based Storage Systems

Lingfang Zeng†, Shijie Xu‡, Yang Wang‡, Xiang Cui§, Tan Wee Kiat§, David Bremner‡, Kenneth Kent‡
†Wuhan National Laboratory for Optoelectronics, School of Computer, Huazhong University of Science and Technology
‡IBM Centre for Advanced Studies (CAS Atlantic), University of New Brunswick, Fredericton, Canada E3B 5A3
§Department of Electrical and Computer Engineering, National University of Singapore, Singapore 117576
E-mail: lfzeng@hust.edu.cn, {shijiexu, ywang8, bremner, ken}@unb.ca, cuixiang23@gmail.com
Abstract—This paper proposes a replication cost model and two greedy algorithms, named GS QoS and GS QoS C1, for replica placement in cloud-based storage systems. The model aims to minimize replication cost while fully accounting for the quality of user access to storage nodes. Both algorithms employ a utility measurement to guide the placement procedure. Our experimental results show that 1) GS QoS outperforms GS QoS C1; and 2) both algorithms produce more economical results than the existing greedy site (GS) algorithm.
Index Terms—replication, greedy, cloud storage system
I. INTRODUCTION
Replication technology has been widely used to improve the performance of network-based applications. By serving clients with nearby replicas, it can significantly reduce overall network latency. Replication also benefits service reliability: in practice, computing resources such as processors, storage, and networks are not failure free, and a failure is usually fatal to a running system. Replica sites therefore have to be selected to serve all clients of the failed nodes so that the service remains continuous.
In spite of these benefits, replication must be designed carefully because of the cost it incurs. In a cloud environment, cloud vendors invest a large amount of money in hardware resources (e.g., data centers, power, network bandwidth, and machines), and then hire employees to monitor and maintain these resources. As a result, service providers have to pay for the resources their applications consume (e.g., network traffic) on cloud platforms. The more resources the service applications consume, the higher the fee charged by the cloud vendors.
Although replication has been studied intensively in traditional content delivery networks (CDNs) and Grid systems, it still merits study in cloud storage systems. In contrast to existing CDNs and Grids, the distinct characteristics of replication in cloud storage systems are that 1) end users contribute the majority of the network traffic, as they are the content owners; and 2) the data sets are diverse and dynamic. In cloud storage systems (e.g., Dropbox, Tencent Weiyun, and Google Storage), users upload their data to the system continuously, and content sizes can range from several MBs to GBs. This traffic is entirely different from that in a traditional CDN, where the content comes from the service providers and remains the same for a long time. Consequently, user access traffic patterns in CDNs and Grids are relatively more uniform than those in cloud storage systems.
This paper addresses the problem of replica placement in cloud storage systems. A mathematical model is built to minimize replication cost, and two monetary-and-QoS-aware algorithms, named GS QoS and GS QoS C1, are provided to solve this model.
II. RELATED WORK
Numerous works have addressed the replica placement problem in CDNs and Grids [3]. For example, Mansouri et al. [5] provide a selection algorithm for replica allocation, named the Modified Dynamic Hierarchical Replication Algorithm (MDHRA): response time is calculated from factors such as data transfer time and storage access latency, and the best replica location is then determined. Li et al. [4] argue that the placement of web proxies is critical to network performance and can significantly reduce overall latency; they model the placement problem with a tree topology, but only the download cost is included in their cost model. In another paper [7], Xu et al. also offer solutions on a tree topology and include both upload and download costs; however, their work does not consider how the replication direction affects the provisioning cost between replica sites. In [1], Chen et al. take a different approach that moves away from the usual tree topology, and all three kinds of cost, i.e., upload, download, and storage costs, are included in their cost model. Although they emphasize the importance of both choosing the set of replica sites and specifying the replication directions, their algorithm is not QoS friendly. While all of the above algorithms may work well in traditional CDNs and Grid systems, they overlook the traffic volume and its monetary cost, which are two main characteristics of current cloud storage systems.
The replica placement problem can be defined as selecting, from $N$ given nodes, the nodes to host $m$ replicas, given a distance matrix $D$, so that the objectives are optimized. The element $d(i, j)$ of the distance matrix is a distance metric between the $i$th request location and the $j$th storage node location. As discussed in [2], this is an NP-hard problem, and many heuristic algorithms have therefore been proposed to solve it; for example, three different greedy algorithms, i.e., the normal greedy algorithm, the simple greedy algorithm, and a heuristic algorithm, are discussed in [6].
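To make this setting concrete, the following is a minimal sketch, not taken from the paper, of a simple distance-based greedy placement over such a matrix; the function name and the toy data are our own illustration:

```python
# A minimal sketch (ours, not the paper's algorithm): greedily choose m of the
# N candidate nodes so that the summed user-to-nearest-replica distance drops.
import numpy as np

def greedy_placement(D: np.ndarray, m: int) -> list:
    """D[i, j] is the distance from request location i to storage node j."""
    chosen = []
    nearest = np.full(D.shape[0], np.inf)  # each user's current nearest-replica distance
    for _ in range(m):
        remaining = [j for j in range(D.shape[1]) if j not in chosen]
        # pick the node that minimizes the total distance if added now
        totals = [np.minimum(nearest, D[:, j]).sum() for j in remaining]
        j = remaining[int(np.argmin(totals))]
        chosen.append(j)
        nearest = np.minimum(nearest, D[:, j])
    return chosen

# Toy example: 4 request locations, 3 candidate nodes, m = 2 replicas
D = np.array([[1.0, 5.0, 9.0], [2.0, 4.0, 8.0], [9.0, 1.0, 2.0], [8.0, 2.0, 1.0]])
print(greedy_placement(D, 2))  # picks [1, 0] for this toy matrix
```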
III. PLACEMENT MODEL AND ALGORITHMS

In this section, we first provide a model for replica placement in cloud-based storage systems, and then present two new monetary-and-QoS-aware greedy site (GS) algorithms, GS QoS and GS QoS C1, to obtain an optimal placement strategy.
A. Model
There are in total three kinds of entries in the cloud storage
systems, i.e. cloud storage nodes, client users, and data sets.
The cloud nodes provide storage space for the data. In case
of one-node failure, a nearby storage node can be used to
continue the service. The users, on the other hand, contribute
the data access volume for the cloud storage systems. They
synchronize contents from the local nodes to the storage nodes
everyday, but rarely download. These traffics are typically uni-
directed upload traffic which is different from dual replication
directions inside of the cloud nodes or CDN nodes.
Therefore, the total cost is the sum of the replication cost and the user access cost. The replication cost is the sum of the network traffic costs (i.e., incoming and outgoing costs at a node) incurred during replication and synchronization, plus the storage cost when a new node is selected for storage. The access cost, on the other hand, refers only to the upload traffic cost at a node when users access it, and its value is proportional to the user access frequency.
Finally, our model seeks a procedure for selecting storage nodes from the $N$ given nodes so that the objective cost in Eq. (1) is minimized:

$$\text{cost} = \sum_{j \in \text{nodes}} \big( C_{\text{user access traffic}}(j) + C_{\text{replication}}(j) \big) \qquad (1)$$
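As a reading aid, here is a short, hedged sketch of this objective; `users_of`, `access_cost`, and `replication_cost` are our own placeholder names for the quantities defined above:

```python
# Hedged sketch of Eq. (1); all names are illustrative placeholders.
def total_cost(selected_nodes, users_of, access_cost, replication_cost):
    """Sum, over selected nodes j, of the user access traffic cost at j
    plus the replication cost of j."""
    return sum(
        sum(access_cost(k, j) for k in users_of[j]) + replication_cost(j)
        for j in selected_nodes
    )
```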
B. Modification to the Greedy Site Algorithm: GS QoS

Algorithm 1 shows the general GS procedure for replica placement. In each round, the node with the highest utility is selected, and the potential users are then assigned to it. The selection is repeated until all users are assigned.

The function $utility(j)$ (GB/\$) is what distinguishes the different GS algorithms. A common definition of utility is the ratio of the total potential traffic volume at node $j$ to the total cost; a lower utility value indicates less traffic volume per unit of expense. Compared to the GS algorithm in [1], our GS QoS introduces a QoS penalization factor $a$ into the utility function (Algorithm 2).
Algorithm 1 GS QoS procedure
1: $E$ is the set of unassigned users
2: $E_j$ is the current set of users who can be assigned to $C_j$
3: $U_k$ is the $k$th user and $C_j$ is the $j$th cloud node
4: $w_k$ is the data size of requests from user $U_k$
5: while $E \neq \emptyset$ do
6:   $j = \arg\max_j utility(j)$, over all unselected nodes $j$
7:   $E_j$ is the set of unassigned users for node $j$
8:   Assign all users in $E_j$ to $j$
9:   Select node $j$
10:  $E = E \setminus E_j$
11: end while
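The loop in Algorithm 1 can be sketched as follows. This is our own paraphrase, assuming a two-argument `utility(node, unassigned_users)` callable in the spirit of Algorithm 2 and an `assignable(user, node)` predicate derived from the QoS distance matrix:

```python
# Hedged sketch of Algorithm 1 (GS QoS); utility and assignable are assumed inputs.
def gs_qos(nodes, users, utility, assignable):
    E = set(users)            # E: unassigned users
    candidates = set(nodes)   # nodes not yet selected
    placement = {}            # selected node -> users assigned to it
    while E and candidates:   # guard on candidates avoids looping on unservable users
        j = max(candidates, key=lambda n: utility(n, E))  # highest-utility node
        E_j = {u for u in E if assignable(u, j)}          # users node j can serve
        placement[j] = E_j    # assign all users in E_j to j and select j
        candidates.remove(j)
        E -= E_j              # E = E \ E_j
    return placement
```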
Algorithm 2 $Utility(j)$ calculation
1: $D_j$: download price, $P_j$: upload price, $S_j$: storage price
2: $F$ is the synchronization frequency; its default value is 1
3: Data set size $W$ and replication cost: $C_{\text{replication}}(j) = W(S_j + P_j F + D_i F)$
4: Traffic cost of serving a user $k$ assigned to node $j$: $C_k(j) = a\, w_k D_j$
5: Analytical size of request objects from user $k$ to $j$: $w_k(j) = a\, w_k$
6: Size of all request objects for unassigned users: $W_T(j) = \sum_{k \in E} w_k(j)$
7: Analytical utility of site $j$, with $k$ over all unassigned users: $utility(j) = \dfrac{W_T(j)}{\sum_k C_k(j) + C_{\text{replication}}(j)}$
The factor $a$ adheres to the following rule:

$$a = \begin{cases} 1, & \text{if the user is not within the QoS distance} \\ \dfrac{QoSD(i,j)}{Q}, & \text{otherwise} \end{cases} \qquad (2)$$

As an element of the QoS distance matrix, $QoSD(i, j)$ is the maximal QoS distance between user $i$ and node $j$.
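Putting Algorithm 2 and Eq. (2) together gives the hedged sketch below. The prices $D_j$, $P_j$, $S_j$, the frequency $F$, and the factor $a$ are the paper's quantities; for self-containedness we read the $D_iF$ term of $C_{\text{replication}}(j)$ as the download price at the replication source and pass it in as `src_download_price`, which is an assumption on our part:

```python
# Hedged sketch of Algorithm 2 with the Eq. (2) penalization factor.
def qos_factor(dist, qos_d, Q):
    """Eq. (2): a = 1 if the user is not within the QoS distance
    (dist > QoSD(i, j)), otherwise a = QoSD(i, j) / Q."""
    return 1.0 if dist > qos_d else qos_d / Q

def utility(j, unassigned, w, dist, qos_d, Q, D, P, S, W, F=1.0,
            src_download_price=0.0):
    """utility(j) = W_T(j) / (sum_k C_k(j) + C_replication(j))."""
    c_rep = W * (S[j] + P[j] * F + src_download_price * F)  # C_replication(j)
    a = {k: qos_factor(dist[k][j], qos_d[k][j], Q) for k in unassigned}
    w_t = sum(a[k] * w[k] for k in unassigned)               # W_T(j)
    c_users = sum(a[k] * w[k] * D[j] for k in unassigned)    # sum over k of C_k(j)
    return w_t / (c_users + c_rep)
```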
C. Improvement: GS QoS C1

It is possible that a selected node does not attract a large number of potential users, in which case selecting it is a waste. To avoid this, an additional constraint is used to decide whether to select a new node for replication. In GS QoS C1, we check whether there would be sufficient potential users for the node to be selected. The constraint is:

$$W_t \geq \sum_{k} w_k \left( \frac{1}{n} + \beta \frac{1}{N} \right)$$

where $k$ ranges over all assigned users, $\beta$ is a coefficient, $n$ is the total number of nodes, and $N$ is the total number of users. The formula requires that the potential volume of a newly selected node be no less than the average volume over all the existing selected nodes. The complete GS QoS C1 algorithm is shown in Algorithm 3:
Algorithm 3 GS QoS C1
1: $E$ is the set of unassigned users
2: while $E \neq \emptyset$ do
3:   $j = \arg\max_j utility(j)$, over all unselected sites $j$
4:   $E_j$ is the set of unassigned users for node $j$
5:   $W_t = \sum_{k \in E_j} w_k$
6:   if $W_t \geq \sum_k w_k \left(\frac{1}{n} + \beta \frac{1}{N}\right)$ then select node $j$
7:   else repeat step 3 to find the next best site
8:   Assign all users in $E_j$ to $j$
9:   Select node $j$
10:  $E = E \setminus E_j$
11: end while
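A hedged one-function sketch of this extra selection check, with $n$, $N$, and $\beta$ as defined above; `assigned_volumes` is our placeholder for the $w_k$ of all already-assigned users:

```python
# Hedged sketch of the GS QoS C1 volume constraint (step 6 of Algorithm 3).
def passes_volume_constraint(W_t, assigned_volumes, n, N, beta):
    """Select node j only if W_t >= sum_k w_k * (1/n + beta * 1/N)."""
    return W_t >= sum(assigned_volumes) * (1.0 / n + beta * 1.0 / N)
```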
IV. RESULTS

To make our results comparable, the existing data from [1], i.e., the prices and parameters for cloud storage, are reused in our experiments. The replica sites are assumed to be randomly selected in a geographical space. Over repeated tests, we compute statistics and study the impact of the parameters on the replication results.
In addition, we define two terms. One is $QoSD(i, j)$ in the QoS distance matrix, the maximal QoS distance between a user and a node: if the distance between user $i$ and node $j$ exceeds $QoSD(i, j)$, the QoS of that user cannot be satisfied. The other is the relative cost of algorithm $i$:

$$\text{relative cost}(i) = \frac{\sum_{k} \big( C_{\text{all users}}(k) + C_{\text{replication}}(k) \big)_{GS}}{\sum_{k} \big( C_{\text{all users}}(k) + C_{\text{replication}}(k) \big)_{i}}$$

By this definition, algorithm $i$ is better than the GS algorithm if the relative cost is greater than 1.
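The metric reduces to a ratio of total costs; a one-line sketch with hypothetical per-node cost lists:

```python
# Hedged sketch: relative cost of algorithm i versus GS; values > 1 favor i.
def relative_cost(gs_total_costs, algo_total_costs):
    return sum(gs_total_costs) / sum(algo_total_costs)
```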
A. Modified Greedy Site algorithm GS QoS
The y-axis in Fig. 1 is the CDF (cumulative distribution function). According to this figure, GS QoS is better the majority of the time: the relative cost of GS QoS is at most 1.5 in 95% of the test cases and at most 1.0 in 20% of them. In other words, GS QoS results in a higher cost than GS in only 20% of the test cases.

Fig. 1: Performance CDF of GS QoS.

Fig. 2: Overall performance CDF of GS QoS and GS QoS C1.
The value of the relative cost is affected by both the node selection order and the number of selected nodes. In the example of Table I, the relative cost is less than 1 even though the two selected node sets are equivalent.

TABLE I: Summary of instances where GS QoS outperforms GS

                             GS                                        GS QoS
  n                          20                                        20
  Node selection order       17,6,1,3,11,12,16,18,14,2,0,7,10,19,4,9   17,16,14,6,12,3,1,10,18,7,11,0,4,9,2,19
  Number of nodes selected   16                                        16
  Total cost                 9.9                                       9.4
  Relative cost              1                                         0.97456

According to this table, if nodes with a lower outgoing cost are selected first, the resulting selection cost is effectively lower. Since the utility in GS QoS is computed over all unassigned users, it tends to choose sites with a lower outgoing cost first. Like the order, the number of nodes selected for replication also affects the relative cost.
The relative cost comparisons of GS QoS and GS QoS C1 are shown in Fig. 2 and Fig. 3. In Fig. 2, a relative cost below 1.0 occurs in 18% of the cases for GS QoS C1 and in 23% for GS QoS. This implies that GS QoS C1 fares better here, as it outperforms GS in 82% of the test cases versus only 77% for GS QoS. Additionally, there is a significant reduction of the relative cost in the instances where GS QoS C1 outperforms GS QoS, i.e., the instances with relative cost > 1. From Fig. 3, the boxed area for GS QoS C1 is also smaller than that of GS QoS, which further implies that GS QoS C1 produces results more consistent with GS than GS QoS does.

Fig. 3: Comparison of boxplots of GS QoS and GS QoS C1.

Fig. 4: Varying replica size (W).
The relationship between the replica size and the relative cost is shown in Fig. 4. According to this figure, the relative cost rises with the replica size and then levels off at 1.1 for both algorithms. The explanation is that the gap between the replication cost and the user access traffic cost diminishes as the replica size grows.
The relative cost is also affected by the QoS distance (Q) and the number of nodes (n), as shown in Fig. 5 and Fig. 6. In both algorithms, the relative cost climbs with the QoS distance (Q) at first, but drops when Q is greater than 10. This is because, when Q is large, a single node can serve nearly all users, which in turn favors the utility of GS QoS. The figure also shows that GS QoS outperforms GS QoS C1.

Regarding the number of nodes, GS QoS is again better than GS QoS C1. According to Fig. 6, the rise in relative cost with an increasing number of nodes for both algorithms is a typical case in which a larger solution space lowers the performance of a heuristic-based algorithm. Nevertheless, the performance of both algorithms remains reasonably good, with a relative cost below 1.1 even when there are 40 sites in the cloud.
Fig. 5: Varying QoS Distance (Q).
Fig. 6: Varying number of nodes (n).
V. CONCLUSION
We provide a model for replica placement in cloud storage systems and present two new monetary-and-QoS-aware greedy algorithms to minimize replication costs. Our results show that both proposed algorithms not only produce more economical results than GS [1] but also guarantee the QoS of user accesses.
REFERENCES
[1] Fangfei Chen, Katherine Guo, John Lin, and Thomas F. La Porta. Intra-cloud lightning: Building CDNs in the cloud. In INFOCOM, pages 433–441, 2012.
[2] Magnus Karlsson and Christos Karamanolis. Bounds on the replication cost for QoS. Technical report, 2003.
[3] R. Kingsy Grace and R. Manimegalai. Dynamic replica placement and selection strategies in data grids – a comprehensive survey. J. Parallel Distrib. Comput., 74(2):2099–2108, February 2014.
[4] Bo Li, M. J. Golin, G. F. Italiano, Xin Deng, and K. Sohraby. On the optimal placement of web proxies in the internet. In INFOCOM '99, volume 3, pages 1282–1290, March 1999.
[5] Najme Mansouri, Gholam Hosein Dastghaibyfard, and Ehsan Mansouri. Combination of data replication and scheduling algorithm for improving data availability in data grids. Journal of Network and Computer Applications, 36(2):711–722, 2013.
[6] Vassiliki Andronikou, Konstantinos Mamouras, Konstantinos Tserpes, Dimosthenis Kyriazis, and Theodora Varvarigou. Dynamic QoS-aware data replication in grid environments based on data importance. Future Generation Computer Systems, 28:544–553, 2011.
[7] Jianliang Xu, Bo Li, and D. L. Lee. Placement problems for transparent data replication proxy services. IEEE Journal on Selected Areas in Communications, 20(7):1383–1398, September 2002.