Popularity-Aware Multi-Failure Resilient and
Cost-Effective Replication for High Data
Durability in Cloud Storage
Jinwei Liu, Member, IEEE, Haiying Shen, Senior Member, IEEE, and Husnu S. Narman
Abstract—Large-scale data stores are an increasingly important component of cloud datacenter services. However, cloud storage
systems usually experience data loss, hindering data durability. Three-way random replication is commonly used to achieve better data
durability in cloud storage systems. However, three-way random replication cannot effectively handle correlated machine failures to
prevent data loss. Although Copyset Replication and Tiered Replication can reduce data loss in correlated and independent failures ,
and enhance data durability, they fail to leverage different data popularities to substantially reduce the storage cost and bandwidth cost
caused by replication. To address these issues, we present a popularity-aware multi-failure resilient and cost-effective replication
(PMCR) scheme for high data durability in cloud storage. PMCR splits the cloud storage system into primary tier and backup tier, and
classifies data into hot data, warm data and cold data based on data popularities. To handle both correlated and independent failures,
PMCR stores the three replicas of the same data into one Copyset formed by two servers in the primary tier and one server in the
backup tier. For the third replicas of warm data and cold data in the backup tier, PMCR uses the compression methods to reduce
storage cost and bandwidth cost. Extensive numerical results based on trace parameters and experimental results from real-world
Amazon S3 show that PMCR achieves high data durability, low probability of data loss, and low storage cost and bandwidth cost
compared to previous replication schemes.
Index Terms—Cloud storage, replication, data durability, cost-effectiveness, SLA
1 INTRODUCTION
LARGE-SCALE data stores are an increasingly important
component of cloud datacenter services. Cloud pro-
viders, such as Amazon S3 [1], Google Cloud Storage
(GCS) [2] and Windows Azure [3] offer storage as a ser-
vice. In the storage as a service, users store their data
(i.e., files) into a cloud storage system and retrieve their
data from the system. It is critical for cloud providers to
reduce Service Level Agreement (SLA) violations to pro-
vide high quality of service and reduce the associated
penalties for such services. High data durability is usu-
ally required by cloud storage systems to meet SLAs.
Durability means the data objects that an application has stored into the system are not lost due to machine failures (e.g., disk failure) [4]. For example, services that use the Amazon Dynamo storage system typically require that 99.9 percent of the read and write requests execute within 300 ms [5].
Data loss caused by machine failures typically affects
data durability. Machine failures usually can be catego-
rized into correlated machine failures and non-correlated
machine failures. Correlated machine failures refer to the
events in which multiple nodes (i.e., servers, physical
machines) fail concurrently due to common failure causes [6], [7] (e.g., cluster power outages, workload-triggered software bug manifestations, Denial-of-Service attacks); this type of failure often occurs in large-scale storage systems [8], [9], [10]. Significant data loss is
caused by correlated machine failures [11], [12], which
have been documented by Yahoo! [13], LinkedIn [6] and
Facebook [14]. Non-correlated machine failures refer to
the events in which nodes fail individually (e.g., individ-
ual disk failure, kernel crash). Usually, non-correlated machine failures are caused by factors such as different hardware/software compositions and configurations, and varying network access abilities.
The storage demand in a cloud storage system
increases exponentially [15]. Data popularity is skewed in
cloud storage. The analysis of traces from Yahoo!'s Druid cluster shows that the top 1 percent of data is an order of magnitude more popular than the bottom 40 percent [16]. Due to highly skewed data popularity distributions [16], [17], popular data with considerably higher request frequency (referred to as hot data) [18] could generate heavy load on some nodes [16], which may result in temporary data unavailability. Availability means that the requested data
objects will be able to be returned to users [4]. Actually,
much of the data stored in a cloud system is rarely read
J. Liu is with the Department of Computer and Information Sciences at Florida
A&M University, Tallahassee, FL 32307. E-mail: jinwei.liu@famu.edu.
H. Shen is with the Computer Science Department, University of Virginia,
Charlottesville, VA 22904. E-mail: hs6ms@virginia.edu.
H.S. Narman is with the Computer Science Department, Marshall University,
Huntington, WV 25755. E-mail: narman@marshall.edu.
Manuscript received 8 Aug. 2017; revised 6 Sept. 2018; accepted 14 Sept.
2018. Date of publication 1 Oct. 2018; date of current version 11 Sept. 2019.
(Corresponding author: Haiying Shen.)
Recommended for acceptance by Z. Chen.
Digital Object Identifier no. 10.1109/TPDS.2018.2873384
(commonly referred to as cold data [15], [18], [19]). To
enhance data availability and durability, data replication
is commonly used in cloud storage systems. Replicas of
cold data waste the storage resource and generate consid-
erable storage and bandwidth costs (for data updates and
data requests) [18] that outweigh their effectiveness on
enhancing data durability. Thus, it is important to com-
press and deduplicate unpopular data objects and store
them in low-cost storage medium [15], [20].
Random replication, as a popular replication scheme, has
been widely used in cloud storage systems [11], [21]. Cloud
storage systems, such as Hadoop Distributed File System
(HDFS) [13], RAMCloud [22], Google File System (GFS) [23]
and Windows Azure [24] use random replication to repli-
cate their data in three randomly selected servers from dif-
ferent racks to prevent data loss in a single cluster [11], [21],
[22], [25]. However, three-way random replication cannot handle correlated machine failures well because data loss occurs if any combination of three nodes fails simultaneously [11]. To handle correlated machine failures, Copyset Replication [11] and Tiered Replication [21] have been proposed. However, neither method tries to reduce the storage or bandwidth cost caused by replication, even though data replicas bring about considerably high storage and bandwidth costs. Although many replication schemes have
been proposed to improve data durability [8], [9], [26], [27],
[28], [29], they do not concurrently consider different data
popularities and multiple failures (i.e., correlated and non-
correlated machine failures) to increase data availability
and durability and reduce the storage and bandwidth costs
caused by replication without compromising request delay
greatly.
To address the above issues, in this paper, we aim to design
a cost-effective replication scheme that can achieve high data
durability and availability while reducing storage cost and
bandwidth cost caused by replication. To achieve our goal,
we propose a popularity-aware multi-failure resilient and cost-effective replication scheme (PMCR), which has advantages over previously proposed replication schemes because it concurrently offers the following distinguishing features:
First, it can handle both correlated and non-correlated
machine failures. Second, it compresses rarely used replicas
of unpopular data to reduce storage cost and bandwidth cost
without compromising the data durability, data availability,
and data request delay greatly. We summarize the contribu-
tions of this work below.
We conducted trace data analysis, and the analytical results confirm the existence of read-intensive and write-intensive data, skewed data popularity, and data similarity in cloud storage, which lays a solid foundation for the design of PMCR.
PMCR handles both correlated and independent fail-
ures by storing the three replicas of the same data
into one Copyset formed by two servers in the pri-
mary tier and one server in the backup tier. The pri-
mary tier resides close to primary replicas and is
used for recovering data with low read latency, and
the backup tier is located off-site (e.g., remote loca-
tion) and serves as the disaster recovery site to pro-
tect from site outage or to restore when the local
backup is not available.
PMCR classifies data into hot data, warm data and
cold data based on data popularity, and it signifi-
cantly reduces the storage and bandwidth costs
without compromising data durability, data avail-
ability, and data request delay greatly by selec-
tively compressing the third replicas of data
objects based on data popularity in the backup
tier. For read-intensive data, PMCR uses the Simi-
lar Compression method (SC), which leverages the
similarities among replica chunks and removes
redundant replica chunks; for write-intensive data,
PMCR uses the Delta Compression method (DC),
which records the differences of similar data
objects and between sequential data updates.
Since Balanced Incomplete Block Design (BIBD)
does not always exist for any given combination of
treatment number, replication level and block
size [11], [30], PMCR uses Partially Balanced
Incomplete Block Design (PBIBD) to generate the
sets of nodes for storing the replicas of the data
when the BIBD does not exist, which overcomes the
limitation of BIBD and greatly increases the chance
of generating the sets of nodes.
PMCR enhances SC by eliminating the redundant
chunks between different data objects (rather than
only within one data object) and enhances DC by
recording the differences between different data
objects (rather than only the difference between
sequential updates), and it further reduces the stor-
age and bandwidth costs caused by replication.
We analyzed the system performance of PMCR in
comparison with other replication schemes in terms
of storage cost, data durability and bandwidth cost,
which shows that PMCR outperforms other schemes
in these aspects.
We have conducted extensive numerical analysis
based on trace parameters and experiments on Ama-
zon S3 to compare PMCR with other state-of-the-art
replication schemes. Both numerical and experimen-
tal results show that PMCR achieves high data dura-
bility, low data loss probability and low storage cost
and bandwidth cost.
The remainder of this paper is organized as follows.
Section 2 presents the analysis of the trace data. Section 3
presents the design for PMCR. Section 4 describes the analy-
sis of system performance. Section 5 presents the numerical
and experimental results. Section 6 reviews the related
work. Section 7 concludes this paper with remarks on our
future work.
2 TRACE DATA ANALYSIS
We collected two real-world traces: a public cloud trace from CloudVPS [31] and a substantial amount of block I/O traces
from a private cloud at Florida International University
(FIU). The CloudVPS trace consists of block I/O traces col-
lected from hundreds of VMs on the production system of
the IaaS cloud for several days. The FIU trace contains
around two months of block I/O traces collected from sev-
eral production servers (i.e., webserver). It contains the trace
for the homes workload, web-vm workload and webserver
workload which are for different applications. The homes
workload is from a NFS server that serves the home
directories of the research group at FIU. The research
group activities include software deployment, testing,
experimentation, plotting using software and technical
document preparation. The web-vm workload is collected
from a virtualized system hosting two Computer Science
department web-servers: webmail proxy and online
course management system.
2.1 Different Data Popularities
Fig. 1a shows the different access (including read and write)
frequencies of files in the homes workload. We see
that around 41 percent of files fall in the range of (0,2], and 10 percent of files fall in the range of (8,16]. The result shows
that different files have different access frequencies, and a
small percentage of files have very low access frequency or
extremely high access frequency. Fig. 1b shows the different
access frequencies of files in the web-vm workload. We see
that around 82 percent of files are in the range of (0,2], and 7 percent of files fall in the range of (2,4]. The result also shows
that different files have different access frequencies, and a
small percentage of files have very low access frequency or
extremely high access frequency. Both Figs. 1a and 1b
indicate the skewness of popularity distribution of files.
Based on this observation, PMCR considers the different
popularities of files in file operation, so that the storage and
bandwidth costs can be reduced as much as possible with-
out compromising the data durability and availability and
file request delay greatly.
2.2 Different Data Intensiveness
To show the existence of write-intensive and read-intensive
data in cloud storage system, we measure the FIU web-
server trace and CloudVPS trace. Fig. 2 shows the number
of reads and writes for the FIU webserver trace per week
for a total of 35 days. Overall, across the entire trace, there
are around 30 percent of reads and 70 percent of writes.
From this figure, we see that the I/O patterns of FIU web-
server are dominated by writes, that is, the FIU webserver
data is write-intensive.
Fig. 3 shows the number of reads and writes for different
VMs in CloudVPS. From the figure, we see that the I/O pat-
terns of some VMs are dominated by reads and the I/O pat-
terns of some VMs are dominated by writes. The results
from Figs. 2 and 3 confirm the existence of write-intensive
and read-intensive data in cloud storage systems. Based on
this observation, PMCR uses different compression methods
for write-intensive data and read-intensive data in order to
reduce the storage and bandwidth costs as much as possible.
2.3 Chunk Similarity
We use two FIU workloads, homes workload and web-vm
workload, to analyze the similarity between chunks of data
objects. In cloud storage systems, data objects are usually
stored in the form of chunks. The chunks usually have some
similarity between each other [32], [33]. We grouped the
chunks that have no more than 10, 100 and 1,000 replicas,
respectively, into each group. Then, we calculated the aver-
age number of replicas per chunk in each group (called
workload similarity). The similarity between two chunks
(say A and B) is defined as
$$\mathrm{Sim}(A, B) = \frac{|A \cap B|}{|A|}. \quad (1)$$
Fig. 1. Percent of files with different number of reads/writes.
Fig. 2. I/O patterns of the FIU webserver.
Fig. 3. I/O patterns of the VMs in CloudVPS.
Fig. 4 shows the workload similarity of each group of the
homes workload and web-vm workload. From the figure,
we see that workload similarity exists in each group. The workload similarity of the group with no more than 1,000 replicas is 8.7 and 4.5 in the homes workload and web-vm workload, respectively.
The result shows that data similarity exists among data
chunks as indicated in [34]. This observation motivates the
design of PMCR, which leverages the similarities between
data chunks to eliminate the redundant data chunks in stor-
age and data transmission.
3 SYSTEM DESIGN
In this section, we first introduce some concepts and
assumptions, and then formulate our problem. Finally, we
present the design of PMCR based on the observations from
the workload analysis.
Suppose there are $m$ data objects and each data object is split into $M$ partitions (i.e., chunks) in the cloud storage system [28], [34], [35], [36]. A data object is lost if any of its partitions is lost [11]. We assume there are $N$ servers in the cloud storage system. For analytical tractability, we assume that a server belongs to a rack, a room, a datacenter, a country and a continent. We use a label in the form of "continent-country-datacenter-room-rack-server" to identify the geographic location of a server [28], [37].
Problem Statement. Given data object request probabilities, data object sizes, and failure probabilities, how should the chunks of data objects be replicated so that the probability of data loss, the storage cost and the bandwidth cost are minimized under both correlated and non-correlated failures?
To solve this problem, we build a cost-effective replica-
tion scheme with data compression to maximize the data
durability in both correlated and non-correlated failures
while reducing the cost (storage cost and bandwidth cost).
3.1 PMCR Replication Scheme
3.1.1 Classification of Data Types
PMCR classifies data into three types: hot data, warm data
and cold data based on data popularity. The popularity of a
data object is measured by its visit frequency, i.e., the num-
ber of visits in a time epoch (denoted by vi) [17], [28], [38].
That is, iðÞ ¼ avi, where idenotes the popularity of a
data object, ais a coefficient. Suppose the time is split into
epoches, then the popularity at epoch tþ1can be estimated
based on the popularity value and coefficient bat epoch t
tþ1
iðÞ ¼ bt
iðÞ þ avi:(2)
To determine the popularity type of a data object, PMCR first calculates the popularity of each data object, and then ranks them based on their popularity values. PMCR considers the data objects whose popularity rank is within the top 25 percent as hot data, the data objects with popularity rank in (25%, 50%] as warm data, and the data objects with popularity rank in (50%, 100%] as cold data.
PMCR also needs to determine whether a data object is
read-intensive or write-intensive in order to choose a com-
pression method accordingly. For this purpose, it sets thresh-
olds for read rate and write rate. PMCR logs the number of
reads and writes of each data object in each time epoch. A
data object is write-intensive if its write rate is higher than
the pre-defined write rate threshold, and it is read-intensive
if its read rate is higher than the pre-defined read rate thresh-
old. PMCR determines the read-intensiveness and write-
intensiveness of each data object periodically.
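To make the classification concrete, here is a minimal Python sketch of the popularity update and of the hot/warm/cold and read-/write-intensive classification described above; the coefficients, thresholds, and all function and variable names are illustrative assumptions rather than values taken from PMCR's implementation.

from dataclasses import dataclass

@dataclass
class DataObject:
    name: str
    popularity: float = 0.0   # estimated popularity, updated each epoch
    reads: int = 0            # reads observed in the current epoch
    writes: int = 0           # writes observed in the current epoch

def update_popularity(obj, visits, a=1.0, b=0.5):
    """Per-epoch popularity update: phi(t+1) = b * phi(t) + a * visits."""
    obj.popularity = b * obj.popularity + a * visits

def classify_popularity(objects):
    """Rank objects by popularity: top 25% hot, next 25% warm, rest cold."""
    ranked = sorted(objects, key=lambda o: o.popularity, reverse=True)
    n = len(ranked)
    labels = {}
    for rank, obj in enumerate(ranked):
        if rank < 0.25 * n:
            labels[obj.name] = "hot"
        elif rank < 0.50 * n:
            labels[obj.name] = "warm"
        else:
            labels[obj.name] = "cold"
    return labels

def classify_intensiveness(obj, epoch_seconds, read_rate_th, write_rate_th):
    """An object is read-/write-intensive if its per-epoch rate exceeds the threshold."""
    read_rate = obj.reads / epoch_seconds
    write_rate = obj.writes / epoch_seconds
    return {"read_intensive": read_rate > read_rate_th,
            "write_intensive": write_rate > write_rate_th}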
3.1.2 Replica Placement
PMCR first splits the nodes in the system into two tiers: pri-
mary tier and backup tier. As in the three-way replication,
in PMCR, the first two replicas of all data objects are stored
in primary tier, and the third replicas of data objects are
stored in backup tier. For load balance, the number of nodes
in the primary tier is twice of the number of nodes in the
backup tier. That is, PMCR assigns b2N
3cnodes to the pri-
mary tier, and assigns bN
3cto the backup tier.
To reduce the data loss caused by correlated machine
failures, PMCR adopts the fault-tolerant set (FTS) [11] (i.e.,
Copyset). An FTS is a distinct set of servers that holds all
replicas of a data object’s chunk. Each FTS is a single unit of
failure because at least one data object is lost when an FTS
fails. We will explain the details of FTS in Section 4.2.1.
PMCR then partitions the nodes and uses a Balanced Incomplete Block Design (BIBD)-based (or Partially Balanced Incomplete Block Design (PBIBD)-based) method to generate FTSs.
As shown in Fig. 5, each FTS contains two nodes from the
primary tier and one node from the backup tier, which can
protect against correlated machine failures [21]. PMCR rep-
licates each chunk of every data object in a single FTS.
For example, in Fig. 5, 12 servers are split into two tiers,
and there are 8 FTSs across the primary tier and the backup
tier. The servers with red lines (marked by “P”) are from the
primary tier and the servers with black lines (marked by
“B”) are from the backup tier. PMCR replicates the first two
chunk replicas of data objects on the primary tier, and repli-
cates the third chunk replicas on the backup tier.

Fig. 4. Data similarity in homes workload and web-vm workload.

Fig. 5. Fault-tolerant sets (FTSs) in PMCR. (P: Primary tier, B: Backup tier.)

1. Although putting all replicas of a chunk to the nodes in an FTS can bring about the cost of inter-rack transfer (across oversubscribed switches), it can significantly reduce the data loss probability caused by correlated machine failures by using the BIBD-based method [11].

PMCR
compresses the third replicas of warm data and cold data on
the backup tier to further reduce the storage cost and band-
width cost since they are not frequently visited.
Algorithm 1 shows the pseudocode of the PMCR replica-
tion algorithm. PMCR splits the nodes in the storage system
into primary tier and backup tier (Line 1) [21]. The primary
tier stores the first two chunk replicas of data objects and
the backup tier stores the third replicas of data objects [21].
This three-way replication helps handle correlated machine failures. PMCR uses a BIBD-based or PBIBD-based
method to generate FTS. Each FTS consists of two servers
from the primary tier and one server from the backup tier
(Line 2). Each chunk will be replicated into one FTS to pro-
tect against correlated machine failures. To reduce storage
cost and bandwidth cost without compromising data avail-
ability of popular data, PMCR classifies data into hot data,
warm data and cold data based on popularities of data
objects, and determines whether each data object is read-intensive and/or write-intensive (Line 3), and uses differ-
ent strategies to store the third replicas of data objects in
each data type. Accordingly, PMCR places the replicas of
each chunk into the nodes in an FTS (Lines 5-11).
Algorithm 1. Pseudocode for the PMCR Algorithm
Input: Data objects’ visit frequencies, read and write rates,
thresholds for determining hot data, warm data and
cold data
1 Split the nodes (in the system) into primary tier and backup
tier
2 Use BIBD-based (or PBIBD-based) method to generate
FTSs, each FTS contains two nodes from the primary
tier and one node from the backup tier
3 Compute the popularity of each data object
4 for each data object do
5   if the data is hot data then
6     Store its chunk replicas to the nodes in an FTS; the first two chunk replicas are in the primary tier and the third one is in the backup tier
7   else
8     if the data is read-intensive data then
9       Store its chunk replicas to the nodes in an FTS; the first two chunk replicas are in the primary tier and the third one is in the backup tier using SC
10    if the data is write-intensive data then
11      Store its chunk replicas to the nodes in an FTS; the first two chunk replicas are in the primary tier and the third one is in the backup tier using DC
For hot data, PMCR puts the third replicas on the backup
tier without compression so that the data can be quickly
recovered when the nodes that store the first two replicas fail
(Lines 5-6). To further reduce storage cost and bandwidth
cost, for warm data and cold data, PMCR puts the third repli-
cas on the backup tier using compression (Lines 8-11). It uses
SC to compress read-intensive data (Lines 8-9) and uses DC to
compress write-intensive data (Lines 10-11). We will explain
the details of SC and DC in Sections 3.2 and 3.3, respectively.
The SC method removes the similar chunks within a file or
among the files for storage and transmission to the file
requester, and the file requester recovers the removed chunks
after it receives the compressed file. The DC method stores a
copy of a file and the different parts of other files that are
similar to this file. For a file request, the stored file copy and
the different parts are transmitted to the file requester. In file
update, only the updated parts need to be transmitted to the
replica nodes. As a result, rather than storing the entire data
object, the size of the stored data is greatly reduced with the
SC and DC methods. For read-intensive data, rather than
transmitting the entire file for a data request, the size of data
in transmission is reduced with the SC method. For write-
intensive data, rather than transmitting the entire file, only
the updated parts are transmitted with the DC method. As a
result, the storage and bandwidth costs are greatly reduced.
The data recovery for compressed parts may generate a cer-
tain delay and overhead when processing a data request.
However, the benefits of storage and bandwidth cost saving
from the compression outweigh this downside since the
warm and cold data objects in the backup tier are rarely read.
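The replica placement logic of Algorithm 1 can be sketched in a few lines of Python; the Node class, the FTS tuple layout, and the compress_sc/compress_dc callables are our own simplifications standing in for the SC and DC methods of Sections 3.2 and 3.3, not PMCR's actual implementation.

class Node:
    """Minimal stand-in for a storage server; a real system would write to disk."""
    def __init__(self, name):
        self.name = name
        self.stored = []

    def store(self, data):
        self.stored.append(data)

def place_replicas(chunk, fts, popularity, read_intensive, write_intensive,
                   compress_sc=lambda c: c, compress_dc=lambda c: c):
    """Place three replicas of a chunk into one FTS (two primary-tier nodes and
    one backup-tier node); the third replica of warm/cold data is compressed."""
    primary1, primary2, backup = fts
    primary1.store(chunk)
    primary2.store(chunk)
    if popularity == "hot":
        backup.store(chunk)                  # hot data: third replica uncompressed
    elif read_intensive:
        backup.store(compress_sc(chunk))     # warm/cold read-intensive data: SC
    elif write_intensive:
        backup.store(compress_dc(chunk))     # warm/cold write-intensive data: DC
    else:
        backup.store(chunk)                  # below both thresholds: store as-is

# Example: one FTS made of two primary-tier nodes and one backup-tier node.
fts = (Node("P1"), Node("P2"), Node("B1"))
place_replicas(b"chunk-bytes", fts, popularity="warm",
               read_intensive=True, write_intensive=False)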
3.2 Similar Compression
In SC, similar chunks are grouped together and a certain
number of similar chunks form a block. Then, duplicate
blocks or near-duplicate blocks to a block are removed. Fig. 6
shows an example illustrating the process of grouping simi-
lar chunks and compressing the similar chunks together. In
Fig. 6a, similar blocks including (A, A’, A”), (C, C’), (E, E’)
are grouped together and they are considered as redundant.
In Fig. 6b, for each similar block group, the redundant blocks
are removed and only the first block (including A, B, C, D, E)
is retained. The data within a data object are sometimes
similar to each other [39]. PMCR adopts the SC method to
eliminate the redundant chunks within each data object in
order to reduce the storage cost and bandwidth cost in data
transmission for data requests. Specifically, in PMCR, for
read-intensive data objects in the backup tier, for each group
of similar blocks, only the first block needs to be stored and
all other similar blocks are removed.
Also, PMCR extends the SC method, which originates from [39],
to eliminate the redundant chunks between different data
objects to further reduce the costs. We present examples for
the intra-file compression and inter-file compression. Fig. 7
shows an example of intra-file compression in a file. Similar
blocks are marked in the same color. For example, blocks A and A′ are similar blocks; C and C′ are similar blocks; D and D′ are similar blocks; E and E′ are similar blocks. Fig. 8 shows an example of inter-file compression. Similar blocks are marked in the same color. The blocks C and C′ in the left data object are similar to the block C in the right data object. The block E in the left data object is similar to the block E in the right data object. Similar blocks within
a file or between files are grouped together for compression.
Fig. 6. Similar compression.

That is, except for the first block, the other similar blocks are removed from the storage of a server. An index for a removed
block is created to point to the first similar block. When a file
requester receives the compressed file, it recovers the
removed blocks from the intra-file compression based on
the indices. When a received compressed file contains indi-
ces pointing to similar blocks in other files caused by inter-
file compression, if the file requester has the files, it simply
recovers the removed blocks. Otherwise, it requests these blocks from the cloud to recover the removed blocks. We will explain how to calculate the similarity of blocks in Section 3.4. SC can help reduce the data storage cost due to the reduction of the stored data size. It can also reduce the bandwidth cost of responding to data requests since the removed blocks may not need to be transmitted.
Algorithm 2 shows the pseudocode of the SC algorithm
conducted by each server. Each server first creates chunk
blocks with each block containing similar chunks in a file
(Line 1). Then, it uses Bloom filter to measure the similarity
between chunk blocks and group similar blocks into a group
(Lines 2-8). Specifically, it compares each chunk block with
every other chunk block (Line 3). If the two blocks are similar
to each other, SC groups the blocks together (Lines 4-6).
Finally, the server compresses the similar chunk blocks
grouped together (Line 9).
Algorithm 2. Pseudocode for Similar Compression (SC) Conducted by Each Server
Input: Data chunks of data objects, threshold for determining similarity ($S_{th}$)
1 Create blocks; each block contains similar chunks in a file
2 for each block $blk_s$ do
3   Use Bloom filter to measure the similarity between block $blk_s$ and every other block $blk_t$
4   if $BF(blk_s) \cdot BF(blk_t) > S_{th}$ then
      // Dot product of the two Bloom filters
5     $blk_s$ and $blk_t$ are considered similar to each other
6     Group $blk_s$ and $blk_t$ together
7   else
8     $blk_s$ and $blk_t$ are considered not similar to each other
9 Use intra-file and inter-file compression for each block group
3.3 Delta Compression
Write-intensive data objects have frequent updates. To reduce
the cost caused by replication, PMCR uses Delta Compression
(DC) to compress the third replicas of the data objects in the
backup tier. Fig. 9 uses an example to illustrate the process of
DC. In Fig. 9, chunk B and chunk B’ are similar chunks. The
regions of difference between chunk B and chunk B’ are
marked in orange. DC stores chunk B and the differences for
chunk B’. When chunk B or chunk B’ is updated, only the
updated parts rather than the entire chunk are sent to the rep-
lica servers. Then, the replica servers update the correspond-
ing parts accordingly. To send a chunk to a file requester, the
stored different parts of this chunk and the other parts from
the stored entire chunk (chunk B in the above example) are
transmitted. Since duplicated parts are removed in the storage,
the storage cost is reduced. Also, the bandwidth cost for data
updates and for data responses is reduced. We will explain
how to calculate the similarity of chunks in Section 3.4.
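As a rough illustration of the idea behind DC, the sketch below records only the regions where a similar chunk differs from a stored base chunk and reconstructs the similar chunk from the base plus the recorded differences; it uses Python's difflib for brevity and is a conceptual sketch, not the compression implementation used by PMCR.

import difflib

def delta_encode(base: bytes, similar: bytes):
    """Record only the regions where `similar` differs from `base`."""
    ops = []
    matcher = difflib.SequenceMatcher(None, base, similar)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))            # reuse bytes [i1, i2) of the base
        else:
            ops.append(("insert", similar[j1:j2]))  # store only the differing bytes
    return ops

def delta_decode(base: bytes, ops):
    """Rebuild the similar chunk from the base chunk and the recorded differences."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, i1, i2 = op
            out.extend(base[i1:i2])
        else:
            out.extend(op[1])
    return bytes(out)

# Example: only the differing region of chunk B' is stored alongside chunk B.
chunk_b = b"the quick brown fox jumps over the lazy dog"
chunk_b_prime = b"the quick brown cat jumps over the lazy dog"
delta = delta_encode(chunk_b, chunk_b_prime)
assert delta_decode(chunk_b, delta) == chunk_b_prime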
A user in the cloud storage system sends a read request for
a data object. For each chunk of the data object, PMCR first
checks if it is in the primary tier. If it is, PMCR chooses the replica of the chunk from the node with the shorter geographic distance to the user and returns the chunk to the user. Otherwise, PMCR fetches the replica from the node in the backup tier with a shorter geographic distance to the user and sends it to the user. If the data object is
warm data or cold data, PMCR sends the compressed data
object to the user, then the data object will be decompressed
on the client-side. We will explain the method to measure the
geographic distance between servers in Section 3.5.
When a user sends a write request for a data object,
PMCR first checks the popularity type of the data object. If
the data object is hot data object, PMCR updates the first
two replicas and the third replica without compression.
Otherwise, the data object is warm or cold data object.
Then, PMCR further checks if the data object is read-
intensive or write-intensive. If the data object is read-inten-
sive, PMCR uses SC to compress the third replica in the
backup tier. If the data object is write-intensive, PMCR uses
DC to compress the third replica.
3.4 Similarity Calculation
To remove the redundant replicas of the data chunks in
the backup tier, first we need to find the duplicate (iden-
tical) or similar replicas. In this paper, we use the Bloom
filter technique to detect the similarity between data
blocks or chunks. Compared to other similarity detection
methods, Bloom filter enables fast comparison as match-
ing is a simple bitwise-AND operation and generates
lower computing overhead. Also, the chunks can be
uniquely identified by the SHA-1 hash signature, also
called fingerprint. As the amount of data increases, more
fingerprints need to be generated, which consume more
storage space and incur more time overhead for index
Fig. 7. Intra-file similarity. Fig. 8. Inter-file similarity.
Fig. 9. Delta compression.
searching. To overcome the scalability problem of fingerprint-index search, PMCR groups a certain number of chunks
into a block, and detects the similarity between blocks.
Below, we introduce the Bloom filter for detecting simi-
larity between data blocks and will extend this algorithm
for detecting similarity between data chunks.
Denote $E = \{e_1, \ldots, e_{|E|}\}$ as the set of chunks of a block. As shown in Fig. 10, the Bloom filter for each set $E$ is represented as a bit array of $u$ bits, with all bits initialized to 0 [40]. Each element $e$ ($e \in E$) is hashed using $k$ different hash functions, i.e., $h_1, \ldots, h_k$. The hash functions return values between 1 and $u$, and each hash function maps $e$ to one of the $u$ array positions with a uniform random distribution. To add an element to the set, the Bloom filter feeds it to each of the $k$ hash functions to get $k$ array positions and sets the corresponding $k$ bits in the Bloom filter to 1. If a bit has already been set to 1, it stays 1. Fig. 10 shows an example of the Bloom filter of the set $\{e_1, e_2, e_3, e_4\}$ with $u = 18$ and $k = 3$. The colored arrows indicate the positions in the 18-bit array that each set element is hashed to.
In Bloom filter parlance, the chunks of a block form a set whose elements are the chunks. Data blocks that are similar to each other have a large number of common 1s among their Bloom filters. To find blocks similar to a given block, we compare the Bloom filter of the block with the Bloom filters of all the other blocks. Blocks whose percentage of common 1s is higher than a certain threshold (e.g., 70 percent) are considered similar blocks [33]. For example, data block $A$ has $\{0,1,1,0,1,1,1,1,0,1\}$ as the Bloom filter array for its chunks $A_1$, $A_2$ and $A_3$. Data block $B$ has $\{0,1,1,0,1,1,1,0,1,1\}$ as the Bloom filter array for its chunks $B_1$, $B_2$ and $B_3$. If the threshold is 70 percent, then $A$ and $B$ are similar blocks; if the threshold is 100 percent, they are not. To detect similar chunks, we can consider a block as a chunk and a chunk as a sub-chunk in the above algorithm and apply the same algorithm.
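The sketch below illustrates this Bloom-filter-based similarity detection in Python; the hash construction, the interpretation of "percentage of common 1s", and all names and parameters are illustrative assumptions rather than PMCR's actual implementation.

import hashlib

def bloom_filter(chunks, u=18, k=3):
    """Insert each chunk into a u-bit Bloom filter using k seeded hash functions."""
    bits = [0] * u
    for chunk in chunks:
        for seed in range(k):
            digest = hashlib.sha1(bytes([seed]) + chunk).digest()
            pos = int.from_bytes(digest, "big") % u
            bits[pos] = 1
    return bits

def similar(bf_a, bf_b, threshold=0.7):
    """One reading of the rule above: the fraction of common 1s exceeds the threshold."""
    common = sum(1 for a, b in zip(bf_a, bf_b) if a == 1 and b == 1)
    ones = max(sum(bf_a), sum(bf_b))
    return ones > 0 and common / ones >= threshold

# Two blocks sharing two of their three chunks.
block_a = [b"A1", b"A2", b"A3"]
block_b = [b"A1", b"A2", b"B3"]
print(similar(bloom_filter(block_a), bloom_filter(block_b)))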
3.5 Distance Calculation
We adopt the method in [28] to compute the geographic dis-
tance between servers. The method uses a 6-bit number to
represent the locations of servers. Each bit corresponds to one location part of a server, i.e., continent-country-datacenter-room-rack-server. To calculate the distance between two servers, starting with the most significant bit, each location part of the two servers is compared one by one to compute the geo-similarity between them. If the location
parts are equivalent, the corresponding bit is set to 1. Other-
wise, the corresponding bit is set to 0. Once a bit is set to 0,
all of its lower significant bits are automatically set to 0. For
example, consider two arbitrary servers $S_i$ and $S_j$. If the geo-similarity between them is 111000 (as shown below), it indicates that $S_i$ and $S_j$ are in the same datacenter but not in the same room.

continent  country  datacenter  room  rack  server
    1         1         1         0     0     0

The geographic distance is obtained by applying a binary "NOT" operation to the geo-similarity. In this example, it is

$$\overline{111000} = 000111 = 7 \ \text{(decimal)}.$$
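A small Python sketch of this geo-similarity and distance computation, assuming each server location is given as a six-part tuple (continent, country, datacenter, room, rack, server); the names and example locations are made up.

def geo_similarity(loc_a, loc_b):
    """Compare location parts from most to least significant; once a part
    differs, that bit and all lower-significance bits are 0."""
    bits = []
    matched = True
    for part_a, part_b in zip(loc_a, loc_b):
        matched = matched and (part_a == part_b)
        bits.append(1 if matched else 0)
    return bits

def geo_distance(loc_a, loc_b):
    """Distance = bitwise NOT of the geo-similarity, read as a 6-bit number."""
    sim_bits = geo_similarity(loc_a, loc_b)
    inverted = [1 - b for b in sim_bits]
    return int("".join(map(str, inverted)), 2)

# Servers in the same datacenter but different rooms: similarity 111000, distance 7.
s_i = ("NA", "US", "DC1", "R1", "K3", "S12")
s_j = ("NA", "US", "DC1", "R2", "K7", "S04")
print(geo_distance(s_i, s_j))  # 7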
4 ANALYSIS OF SYSTEM PERFORMANCE
4.1 Storage Cost Reduction
Different storage mediums have different costs per unit size.
For example, SSD is more expensive than disk, and disk is
more expensive than tape. To reduce the storage cost while sat-
isfying the SLA requirements of different applications, we
need to decide the storage mediums for different data objects
in different tiers (i.e., primary tier and backup tier). The pri-
mary tier stores the data objects’ first two replicas that are for
data availability, and the backup tier stores the data objects’
third replicas and it is mainly used to enhance durability. The
first two replicas of the data objects in the primary tier are
always used for failure recovery, and the third replica is used
for failure recovery only if the first two replicas in the primary
tier fail simultaneously. Thus, the replicas in the backup tier
have lower read frequency compared to the replicas in the pri-
mary tier. Therefore, the replicas in the backup tier can be
stored on cheaper storage mediums (e.g., tape, disk), and the
replicas in the primary tier can be stored on relatively fast and
expensive storage mediums (e.g., Memory, SSD). Hot data
with considerably higher request frequency could generate heavy load on some nodes, which may lead to temporary data unavailability, while cold data with lower request frequency
may waste the storage resource and increase the storage cost.
Thus, it is important to choose the storage mediums for storing
data based on the popularities of data objects.
To reduce the storage cost (as shown in Fig. 11), we
choose SSD to store the first two replicas of a hot data object
and choose tape to store its third replica; we choose SSD to
store the first replica of warm data and cold data, and
choose disk to store their second replica, and choose tape to
store their third replica with compression.
In the following, we analyze the storage cost saving of PMCR. Denote $s_i$ as the size of data object $d_i$ without compression. Define $I_c$ as an indicator function representing whether the third replica of a data object needs to be compressed. Given a data object $d_i$, we have

$$I_c(d_i) = \begin{cases} 1, & \text{if data object } d_i \text{ is hot data} \\ 0, & \text{if data object } d_i \text{ is warm data or cold data.} \end{cases} \quad (3)$$
Fig. 10. An example of the Bloom filter of the set $\{e_1, e_2, e_3, e_4\}$.

Fig. 11. Selecting storage mediums for data objects' replicas based on their popularities and the tiers where they are located.

To represent the actual storage consumption of a data object, we define an indicator function $I_s$:
$$I_s(d_i) = \begin{cases} 1, & \text{if data object } d_i \text{ is compressed} \\ 0, & \text{if data object } d_i \text{ is not compressed.} \end{cases} \quad (4)$$

Hence, the storage consumption of data object $d_i$ with compression can be calculated as follows:

$$s'_i = I_s(d_i)\,\frac{s_i}{g} + \left(1 - I_s(d_i)\right) s_i, \quad (5)$$

where $g$ is the compression ratio, which is defined as the ratio between the uncompressed size and the compressed size.
The total storage consumption for three-way replication is

$$O_s = \sum_{i=1}^{m} \left( 2 s_i + I_s(d_i)\, s'_i + \left(1 - I_s(d_i)\right) s_i \right), \quad (6)$$

where $m$ is the number of data objects in the cloud storage system.
Denote $c_i$ ($i \in \{1, 2, 3\}$) as the unit costs of SSD, disk and tape, respectively. The total storage cost (denoted by $C_s$) of PMCR is

$$C_s = \sum_{i=1}^{m} \Big( \big(c_1 + c_1 I_c(d_i) + c_2 (1 - I_c(d_i))\big) s_i + c_3 \big(I_c(d_i)\, s_i + (1 - I_c(d_i))\, s'_i\big) \Big), \quad (7)$$

where $s'_i$ is the storage consumption of data object $d_i$ with compression. Compared to previous replication schemes that consider data popularity but do not use compression [9], PMCR obtains the following storage cost savings:

$$C_s^{s} = \sum_{i=1}^{m} \big(c_1 + c_1 I_c(d_i) + c_2 (1 - I_c(d_i)) + c_3\big)\, s_i - C_s. \quad (8)$$
Compared to replication schemes without compression or consideration of data popularity (assuming all replicas are stored on SSD for fast recovery), PMCR obtains the following storage cost savings:

$$C_s^{s} = \sum_{i=1}^{m} 3 c_1 s_i - C_s. \quad (9)$$
4.2 Data Durability Enhancement
4.2.1 Correlated Machine Failures
Recall that PMCR adopts the FTS [11] to handle correlated machine failures. Each FTS is a single unit of failure because at least one data object is lost when an FTS fails. As the number of FTSs increases, the probability of data loss caused by correlated machine failures increases because the probability that the failed servers constitute at least one FTS increases. Hence, the probability of data loss caused by correlated machine failures can be minimized by minimizing the number of FTSs.
The probability of failure under correlated machine failures is equal to the ratio of the number of FTSs over the maximum possible number of sets:

$$\frac{\#\text{FTSs}}{\max\{\#\text{sets}\}}. \quad (10)$$

Based on the work [11], the probability of failure under correlated machine failures is

$$p_{cor} = \frac{\frac{S}{R-1} \cdot \frac{N}{R}}{\binom{N}{R}}, \quad (11)$$
where $S$ denotes the scatter width (the number of servers that could be used to store the secondary replicas of a chunk) and $R$ denotes the size of an FTS (i.e., the number of servers in one FTS). Based on the work [11], the probability of failure under correlated machine failures in random replication can be obtained by substituting #FTSs in Formula (10) with the number of FTSs created in random replication.
The following example illustrates the process of generating FTSs. Suppose a storage system has $N = 12$ servers, FTS size $R = 3$, and $S = 4$. Using the BIBD-based method, one solution that achieves a small number of FTSs is as follows:

$B_1 = \{0, 1, 2\}$, $B_2 = \{3, 4, 10\}$, $B_3 = \{6, 7, 8\}$, $B_4 = \{9, 10, 11\}$, $B_5 = \{0, 3, 8\}$, $B_6 = \{1, 4, 7\}$, $B_7 = \{2, 5, 11\}$, $B_8 = \{5, 6, 9\}$.

The number of FTSs is 8. Therefore, the probability of data loss caused by correlated machine failures is

$$\#\text{FTSs} \Big/ \binom{N}{R} = 8 \Big/ \binom{12}{3} = 0.036.$$

However, the number of FTSs in random replication is 72. Hence, the probability of data loss caused by correlated machine failures in random replication is

$$\#\text{FTSs} \Big/ \binom{N}{R} = 72 \Big/ \binom{12}{3} = 0.327.$$
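The two probabilities in this example can be reproduced with a few lines of Python; the FTS list is the BIBD solution given above.

from math import comb

N, R = 12, 3
ftss = [{0, 1, 2}, {3, 4, 10}, {6, 7, 8}, {9, 10, 11},
        {0, 3, 8}, {1, 4, 7}, {2, 5, 11}, {5, 6, 9}]

# Probability of data loss = #FTSs / C(N, R); see Formulas (10) and (11).
print(round(len(ftss) / comb(N, R), 3))  # 0.036
print(round(72 / comb(N, R), 3))         # 0.327 for random replication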
There are many methods for constructing BIBDs, but no
single method can create optimal BIBDs for any given combination of $N$ and $R$ [41], [42]. Copyset Replication combines
BIBD and random replication to generate a non-optimal
design. When BIBD-based method cannot find BIBD, PBIBD
can be used to generate the sets of nodes for storing the repli-
cas of the data. PBIBD overcomes the limitation of BIBD and
greatly increases the chance of generating the sets of nodes.
Although the PBIBD is not an optimal approach, it can
increase the probability of successfully generating the FTSs
for the given combination.
4.2.2 Non-Correlated Machine Failures
In non-correlated machine failures, the failure events of machines are statistically independent of each other. They can be categorized into uniform and nonuniform machine failures. In the scenario of uniform machine failures, each machine fails with the same probability, denoted by $p$ ($0 < p < 1$), possibly due to the same computer configuration. A data object is lost if any chunk of the data object is lost, and a chunk is lost only if all the replicas of the chunk are lost. In this analysis, we assume each data object has three replicas. Hence, the chunk loss probability is $p_{uni} = p^3$, and the expected number of chunk losses per data object due to uniform machine failures is

$$E_{p_{uni}} = \left( \sum_{j=1}^{m} M p^3 \right) \Big/ m, \quad (12)$$

where $M$ is the number of chunks of each data object and $m$ is the number of data objects.
In the scenario of nonuniform machine failures, each machine fails with a different probability, denoted by $p_i$ ($0 < p_i < 1$), possibly due to different hardware/software compositions and configurations. We assume replicas of data objects are placed on machines with no concern for individual machine failures. Denote $p_1, \ldots, p_N$ as the failure probabilities of the $N$ servers in the cloud storage system, respectively. According to the work [9], the expected data object failure probability is the same as that on uniform-failure machines with per-machine failure probability equal to $\sum_{i=1}^{N} p_i / N$. Hence, an approximation of the chunk loss probability is $p_{non} = \left( \sum_{i=1}^{N} p_i / N \right)^3$ (the actual data chunk loss probability for specific machines would be $p_i p_j p_k$, where $i$, $j$ and $k$ represent the machines that store the chunk's replicas and $p_i, p_j, p_k$ are the failure probabilities of those machines). The approximate expected number of data chunk losses per data object caused by nonuniform machine failures is

$$E_{p_{non}} = \left( \sum_{j=1}^{m} M \left( \sum_{k=1}^{N} p_k / N \right)^3 \right) \Big/ m. \quad (13)$$
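Equations (12) and (13) reduce to simple arithmetic; the sketch below evaluates both, with the chunk count and the per-machine failure probabilities chosen arbitrarily for illustration.

def expected_loss_uniform(M, p):
    """Equation (12): per-object expected chunk loss is M * p^3 (the sum over
    the m objects cancels with the division by m)."""
    return M * p ** 3

def expected_loss_nonuniform(M, probs):
    """Equation (13): use the average per-machine failure probability."""
    p_avg = sum(probs) / len(probs)
    return M * p_avg ** 3

print(expected_loss_uniform(M=50, p=0.01))                       # about 5e-05
print(expected_loss_nonuniform(M=50, probs=[0.01, 0.02, 0.03]))  # about 4e-04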
4.2.3 Correlated and Non-Correlated Machine Failures
Denote $F$ as the event that a failure occurs, $U_1$ as the event that correlated machine failures occur, $U_2$ as the event that uniform machine failures occur, and $U_3$ as the event that nonuniform machine failures occur. Based on previous works [11], [29], both correlated and non-correlated machine failures (uniform and nonuniform machine failures) exist in cloud storage systems, and any type of machine failure can incur data loss. Then, the probability of data loss caused by machine failures (correlated and non-correlated) is obtained as follows:

$$P(F) = \sum_{i=1}^{3} P(F \mid U_i)\, P(U_i) \quad \left( \sum_{i=1}^{3} P(U_i) = 1 \right), \quad (14)$$

where $P(F \mid U_1)$ ($p_{cor}$ in Formula (11)), $P(F \mid U_2)$ ($p_{uni}$ in Formula (12)) and $P(F \mid U_3)$ ($p_{non}$ in Formula (13)) are the probabilities of a data object loss due to correlated machine failures, uniform machine failures and nonuniform machine failures, respectively. $P(U_1)$, $P(U_2)$ and $P(U_3)$ are the probabilities of the occurrence of correlated machine failures, uniform machine failures and nonuniform machine failures, respectively.
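Equation (14) is a direct application of the law of total probability; a one-function Python sketch with illustrative occurrence probabilities is shown below.

def data_loss_probability(p_cor, p_uni, p_non, pr_u1, pr_u2, pr_u3):
    """Equation (14): total probability of data loss across the three failure types."""
    assert abs(pr_u1 + pr_u2 + pr_u3 - 1.0) < 1e-9
    return p_cor * pr_u1 + p_uni * pr_u2 + p_non * pr_u3

# Illustrative values only.
print(data_loss_probability(0.036, 1e-4, 2e-4, 0.2, 0.4, 0.4))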
4.3 Bandwidth Cost Reduction
Replication can enhance data durability and availability but may incur bandwidth cost because bandwidth is required to keep replicas synchronized [43]. To reduce bandwidth cost, PMCR first categorizes data into read-intensive and write-intensive data based on the historical operations (i.e., reads and writes) on the data [44]. Then PMCR uses SC to compress the third replicas of read-intensive data, and uses DC to compress the third replicas of write-intensive data.

Based on the previous work [45], the data object write overhead is linear in the number of data object replicas. The bandwidth cost of a data object's partition caused by maintaining the consistency between the replicas of the partition can be approximated as the product of the number of replicas of the partition and the average communication cost parameter (denoted by $\mu_{com}$) [28], i.e., $3\mu_{com}$ for three-way replication. Thus, the total bandwidth cost of all data objects caused by maintaining the consistency between their replicas can be calculated as follows:

$$C_b^{c} = \sum_{j=1}^{m} \left( 3 M \mu_{com} \right). \quad (15)$$
Based on the work [46], the fixed average communication cost can be computed as follows:

$$\mu_{com} = s_u \cdot E\!\left[ \sum_{i,j} dis(S_i, S_j) \cdot \sigma \right], \quad (16)$$

where $s_u$ is the average update message size and $dis(S_i, S_j)$ is the geographic distance between the server storing the original copy, $S_i$ (referred to as the primary server), and a replica server $S_j$. The geographic distance is an expectation over all possible distances between the primary server and replica servers, computed from a probabilistic perspective. $\sigma$ is the average communication cost of a unit of data per unit distance.
A storage system should be capable of recovering from the loss of data when failures occur, which preserves the reliability guarantees of the system over time. Failure recovery also results in bandwidth cost. When a node fails, all data chunks it was hosting need to be recreated on a new node (we assume a new node is available to replace the faulty one [18]); that is, the new node needs to download the data stored on the faulty node to repair the data and replace the failed node.
For simplicity, we assume the data in the primary tier and the backup tier is evenly distributed over the servers. Hence, the total bandwidth cost caused by recovering data, denoted by $C_b^{r}$, is

$$C_b^{r} = \left( \frac{\sum_{i=1}^{m} 2 s_i}{\left\lfloor \frac{2N}{3} \right\rfloor} \cdot \left\lceil \frac{2 N P(F)}{3} \right\rceil + \frac{\sum_{i=1}^{m} \left( I_s(d_i)\, s'_i + (1 - I_s(d_i))\, s_i \right)}{\left\lfloor \frac{N}{3} \right\rfloor} \cdot \left\lfloor \frac{N P(F)}{3} \right\rfloor \right) \delta_d, \quad (17)$$

where $s'_i$ and $s_i$ are the sizes of data object $d_i$ with and without compression, respectively, and $N$ is the number of nodes in the cloud storage system. $\sum_{i=1}^{m} 2 s_i / \lfloor \frac{2N}{3} \rfloor$ and $\sum_{i=1}^{m} (I_s(d_i)\, s'_i + (1 - I_s(d_i))\, s_i) / \lfloor \frac{N}{3} \rfloor$ are the average amounts of data stored on a server in the primary tier and the backup tier for three-way replication, respectively. $\lceil \frac{2 N P(F)}{3} \rceil$ and $\lfloor \frac{N P(F)}{3} \rfloor$ are the numbers of failed nodes in the primary tier and the backup tier, respectively. $\delta_d$ is the average communication cost per unit of data between primary servers and replica servers in the storage system, calculated as $E[\sum_{i,j} dis(S_i, S_j) \cdot \sigma]$.
Based on Formulas (15) and (17), the total bandwidth cost caused by consistency maintenance and data recovery is

$$C_b = C_b^{c} + C_b^{r}. \quad (18)$$
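A compact Python sketch of Equations (15), (17) and (18), under the paper's simplifying assumption that data is evenly distributed over the two tiers; every parameter value in the example call is illustrative.

from math import floor, ceil

def consistency_cost(m, M, mu_com):
    """Equation (15): three replicas per chunk kept consistent."""
    return m * 3 * M * mu_com

def recovery_cost(sizes, compressed_sizes, N, p_f, delta_d):
    """Equation (17): data re-downloaded when primary- and backup-tier nodes fail."""
    primary_per_node = sum(2 * s for s in sizes) / floor(2 * N / 3)
    backup_per_node = sum(compressed_sizes) / floor(N / 3)
    failed_primary = ceil(2 * N * p_f / 3)
    failed_backup = floor(N * p_f / 3)
    return (primary_per_node * failed_primary +
            backup_per_node * failed_backup) * delta_d

def total_bandwidth_cost(m, M, mu_com, sizes, compressed_sizes, N, p_f, delta_d):
    """Equation (18): C_b = C_b^c + C_b^r."""
    return consistency_cost(m, M, mu_com) + recovery_cost(
        sizes, compressed_sizes, N, p_f, delta_d)

print(total_bandwidth_cost(m=10000, M=50, mu_com=0.001,
                           sizes=[1.0] * 10000, compressed_sizes=[0.6] * 10000,
                           N=3000, p_f=0.01, delta_d=0.002))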
Compared to previous replication schemes without compression [11], the bandwidth cost saving obtained by PMCR is approximately

$$C_b^{s} = \left( \frac{\sum_{i=1}^{m} s_i}{\left\lfloor \frac{N}{3} \right\rfloor} - \frac{\sum_{i=1}^{m} \left( I_s(d_i)\, s'_i + (1 - I_s(d_i))\, s_i \right)}{\left\lfloor \frac{N}{3} \right\rfloor} \right) \cdot \delta_d. \quad (19)$$
5 PERFORMANCE EVALUATION
We conducted the numerical analysis based on the parame-
ters in [11] (Table 2) derived from the system statistics from
Facebook and HDFS (as illustrated in Table 1) [6], [11], [13],
[14], [22], [48], and also conducted real-world experiments
on Amazon S3.
5.1 Numerical Analysis
We conducted numerical analysis under various scenarios.
We compare our method with the other replication schemes:
Random Replication (RR), Copyset Replication (Copy-
set) [11], Tiered Replication (TR) [21] and WAN Optimized
Replication (WOR) [49]. RR is based on Facebook’s design,
which chooses secondary replica holders from a window of
nodes around the primary node. We use $R$ to denote the number of replicas for each data chunk. Specifically, RR places the primary replica on a random node (say node $i$) in the system, and places the secondary replicas on $R-1$ nodes around the primary node (i.e., nodes $i+1$, $i+2$, ...).
Copyset
splits the nodes into a number of Copysets, and constrains
the replicas of every chunk to a single Copyset so that it can
reduce the frequency of data loss by minimizing the number
of Copysets for correlated machine failures. TR stores the
first two replicas of a data object in the primary tier for pro-
tecting against independent node failures, and stores the
third replica in the backup tier for protecting against corre-
lated failures. WOR uses three-way random replication and
Delta Compression for replication of backup datasets. The
storage medium for the third replica is disk in RR, Copyset
and WOR, and is disk or tape that is randomly chosen in TR.
The number of nodes that experience concurrent failures in
the system was set to 1 percent of the nodes in the system [50].
We randomly generated a 6-bit number from reasonable ranges for each node to represent its location. The distribu-
tions of the file popularity and updates follow those of FIU
trace. Table 2 shows the parameter settings in our analysis
unless otherwise specified.
We first calculate the probability of data loss for each
method. We use Formula (14) to calculate the probability of
data loss for PMCR and Formula (11) for Copyset. We use the
method in [11] to calculate the data loss probability of random
replication for RR and WOR, and use the method in [21] to cal-
culate the data loss probability for TR. Figs. 12a and 12b show
the relationship between the probability of data loss and the
number of nodes in the Facebook and HDFS environments,
respectively. We see that the probability of data loss follows
PMCR < TR < Copyset < RR ≈ WOR. PMCR, TR and Copyset
generate lower probabilities of data loss than RR and WOR
because they constrain the replicas of a data object to an FTS
which can reduce the probability of data loss in correlated
machine failures. TR and PMCR generate lower probabilities
of data loss than Copyset because they separate the primary
data from the backup data by storing the backup data on a
remote site, which can further reduce the correlation in fail-
ures between nodes in the primary tier and the backup
tier [21]. The probability of data loss in PMCR is slightly lower
than TR because PMCR chooses different storage mediums
for data with different popularities, which decreases the prob-
ability of the occurrence of correlated machine failures.
We then calculate the availability of a requested data object as $1 - \sum_{i=1}^{m} r_i M \left( P(F) \right)^3$, where $r_i$ is the normalized probability of requesting data $d_i$. Figs. 13a and 13b show the rela-
tionship between the availability of requested data objects
and the number of nodes in the Facebook and HDFS envi-
ronments, respectively. We observe that the availability fol-
lows PMCR > TR > Copyset > RR ≈ WOR. PMCR, TR and
Copyset produce greater data availability than RR and
WOR because they constrain the replicas of a data object to
an FTS to reduce the probability of data loss caused by cor-
related machine failures and thus increase the availability
of data object requests. PMCR and TR generate higher data
availability than Copyset because they separate the primary
data from the backup data by storing the backup data on a
remote site, which can further reduce the correlation in fail-
ures between nodes in the primary tier and the backup
tier [21]. Therefore, PMCR and TR have higher availability
of requested data objects than Copyset.
TABLE 2
Parameter Settings

Parameter  Meaning                                   Setting
N          # of servers                              1,000-10,000
M          # of chunks of a data object              50 [47]
R          # of servers in each FTS                  3
           # of FTSs containing a pair of servers    1
S          Scatter width                             4
p          Prob. of a server failure                 0.5
m          # of data objects                         10,000-50,000

TABLE 1
Parameters from Publicly Available Data [11]

System    Chunks per node  Cluster size  Scatter width
Facebook  10,000           1,000-5,000   10
HDFS      10,000           100-10,000    200

Fig. 12. Probability of data loss versus the number of nodes.

Fig. 13. Availability of requested data object versus the number of nodes.

We then use Formula (18) to calculate the bandwidth cost for PMCR. For RR, Copyset and TR, we use Formula (18)
without considering compression. For WOR, we use For-
mula (18) with the consideration of compression. Figs. 14a
and 14b show the relationship between the bandwidth cost
and the number of data objects in the Facebook and HDFS
environments, respectively. We observe that the bandwidth
cost increases as the number of data objects increases. This
is because more data objects lead to more data transfers for
data updates and for data requests, which results in higher
bandwidth cost. We also see that the bandwidth cost fol-
lows PMCR < WOR < TR ≈ Copyset ≈ RR. PMCR and WOR
generate lower bandwidth cost than TR, Copyset and RR
because they use compression and deduplication to reduce
the data size in storage, which can reduce the bandwidth
cost for data transfer.
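Formula (18) itself is not reproduced in this excerpt. The sketch below is only a hypothetical stand-in that captures the accounting described here: bytes moved for read requests plus bytes moved to propagate updates to the other two replicas, priced per GB, with an optional compression ratio on the transfer to the backup-tier replica as PMCR and WOR do. The prices and workload numbers are made up.

```python
def bandwidth_cost(objects, price_per_gb, backup_compression=1.0):
    """Hypothetical stand-in for the paper's Formula (18) (not shown here).

    objects is a list of (size_gb, reads, updates) tuples for a three-replica
    layout; backup_compression (<= 1) models compressed transfers of the third,
    backup-tier replica, while 1.0 means no compression (RR, Copyset, TR)."""
    total_gb = 0.0
    for size_gb, reads, updates in objects:
        total_gb += reads * size_gb                         # serving read requests
        total_gb += updates * size_gb                       # update to the 2nd primary-tier replica
        total_gb += updates * size_gb * backup_compression  # update to the backup-tier replica
    return total_gb * price_per_gb

# Example with made-up numbers: 1 GB objects, 100 reads and 10 updates each.
objs = [(1.0, 100, 10)] * 1000
print(bandwidth_cost(objs, price_per_gb=0.09))                          # no compression
print(bandwidth_cost(objs, price_per_gb=0.09, backup_compression=0.4))  # compressed backups
```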
We then use Formula (7) to calculate the storage cost for
PMCR. For RR and Copyset, we use Formula (7) without
considering compression or the selection of different storage
mediums for storing data objects. For WOR, we use For-
mula (7) with the consideration of compression and without
the selection of different storage mediums for storing data
in the backup tier. For TR, we use Formula (7) without con-
sidering compression but with the selection of different
storage mediums for storing replicas in the backup tier.
Figs. 15a and 15b show the relationship between the storage
cost and the number of data objects in the Facebook and
HDFS environments, respectively. We see that the storage
cost increases as the number of data objects increases
because the more data objects there are, the more storage resources are needed to store them. We also see that the stor-
age cost follows PMCR < WOR < TR < Copyset ≈ RR. TR has lower storage cost than Copyset and RR, and higher storage cost than WOR and PMCR. This is because TR uses a less expensive storage medium to store the third replicas of data
objects to reduce the storage cost, which is not considered in
Copyset and RR. WOR utilizes data compression and data
deduplication to reduce storage cost. PMCR has the lowest
storage cost because PMCR considers data popularity and
uses compression to reduce the amount of data stored in the
system, and also chooses less expensive storage mediums to
store unpopular data objects.
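Formula (7) is likewise not shown in this excerpt; the sketch below only illustrates how the compared configurations differ, under our own simplifications: two primary-tier replicas on a faster medium for every object, and a backup-tier replica that stays uncompressed for hot data but is compressed and placed on a cheaper medium for warm and cold data. All prices, the compression ratio and the popularity mix are hypothetical.

```python
def storage_cost(objects, price_fast, price_cheap, compression_ratio=0.5):
    """Illustrative stand-in for the paper's Formula (7) (not shown here).

    Two replicas sit in the primary tier on the faster (more expensive) medium;
    the third replica sits in the backup tier.  Following PMCR's description,
    hot data keeps an uncompressed backup replica, while warm and cold data get
    a compressed replica on a cheaper medium.  Prices are per GB per month and
    purely hypothetical."""
    cost = 0.0
    for size_gb, popularity in objects:          # popularity in {"hot", "warm", "cold"}
        cost += 2 * size_gb * price_fast         # primary-tier replicas
        if popularity == "hot":
            cost += size_gb * price_fast         # uncompressed backup replica
        else:
            cost += size_gb * compression_ratio * price_cheap  # compressed, cheaper medium
    return cost

# Example: 70% cold, 20% warm, 10% hot objects of 1 GB each, made-up prices.
objs = [(1.0, "cold")] * 700 + [(1.0, "warm")] * 200 + [(1.0, "hot")] * 100
print(storage_cost(objs, price_fast=0.023, price_cheap=0.004))
```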
Figs. 16a and 16b show the relationship between the MTTF
(mean time to failure) and the number of nodes in the Facebook
environment and HDFS environment, respectively. In Figs. 16a
and 16b, we see that the MTTF decreases as the number of
nodes increases. This is because the probability of correlated
machine failures increases as the number of nodes increases,
which increases the probability that the nodes fail. We also see
that the MTTF follows PMCR ≈ TR > Copyset ≈ RR ≈ WOR. The MTTF of PMCR, TR and Copyset is greater than that of RR and WOR because PMCR, TR and Copyset constrain the replicas of a data object to an FTS, which reduces the probability of data loss caused by correlated machine failures and thus prolongs the MTTF.
To test the computational overhead of compression in
PMCR, we tested the runtime of PMCR and PMCR without
compression (PMCRW/oC), a variant of PMCR in which
compression is not used. Figs. 17a and 17b show the relation-
ship between the computational overhead (runtime) and the
number of objects in the Facebook environment and HDFS
environment, respectively. In Figs. 17a and 17b, we see that
the runtime of PMCRW/oC is less than that of PMCR, and
the runtime increases as the number of data objects increases.
This is because the compression of data objects introduces additional time overhead, and the larger the number of data objects, the more time is required for compression. We
also see that the runtime of PMCR increases faster than that
of PMCRW/oC as the number of data objects increases.
5.2 Real-World Experimental Results
To further verify the performance of our method in the real-
world environment, we conducted experiments on Amazon
S3. We used three regions of Amazon S3 in the U.S. to gen-
erate geo-distributed storage datacenters. We created the
same number of buckets in each region and each bucket
contained a data server. We varied the number of buckets
from 10 to 30 with step size 5. We generated 50,000 data
objects. The sizes of data objects follow a normal distribu-
tion. We distributed the data objects to servers randomly.
We used the distributions of read and writes from the FIU
Fig. 14. Bandwidth cost versus the number of data objects.
Fig. 15. Performance on storage cost of various methods.
Fig. 16. Performance on MTTF of different methods with scatter
width of 4.
Fig. 17. Performance on computational overhead on compression of
PMCR.
trace to generate reads and writes. The requests were gener-
ated from servers in the Windows Azure eastern region. We consider requests targeting each region with a latency of more than 100 ms as failed requests due to unavailable data objects. In the test, N is the number of simulated data servers. We used the actual data access prices of Amazon S3 [51] to calculate the storage cost and the bandwidth cost.
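As a minimal sketch of how such a testbed could be scripted with boto3 (the AWS SDK for Python), the snippet below creates buckets in three U.S. regions, uploads objects with normally distributed sizes to random buckets, and flags requests slower than 100 ms as failed. The region list, bucket names, object-size parameters and client-side timing are our assumptions; this is not the authors' actual measurement harness, and it omits the replication logic and the Azure-side request generators.

```python
import random
import time

import boto3  # AWS SDK for Python; credentials and permissions must be configured separately

REGIONS = ["us-east-1", "us-west-1", "us-west-2"]  # illustrative choice of three U.S. regions
BUCKETS_PER_REGION = 10                            # the paper varies this from 10 to 30

def create_buckets(prefix="pmcr-exp"):
    """Create the same number of buckets in each region (bucket names are hypothetical)."""
    buckets = []
    for region in REGIONS:
        s3 = boto3.client("s3", region_name=region)
        for i in range(BUCKETS_PER_REGION):
            name = f"{prefix}-{region}-{i}"
            extra = ({} if region == "us-east-1"
                     else {"CreateBucketConfiguration": {"LocationConstraint": region}})
            s3.create_bucket(Bucket=name, **extra)
            buckets.append((region, name))
    return buckets

def upload_objects(buckets, n_objects=50000, mean_kb=256, std_kb=64):
    """Distribute objects with normally distributed sizes to randomly chosen buckets
    (the mean and standard deviation are assumed; the paper does not report them)."""
    clients = {r: boto3.client("s3", region_name=r) for r in REGIONS}
    for i in range(n_objects):
        size = max(1, int(random.gauss(mean_kb, std_kb))) * 1024
        region, bucket = random.choice(buckets)
        clients[region].put_object(Bucket=bucket, Key=f"obj-{i}", Body=b"\0" * size)

def timed_get(region, bucket, key, threshold_ms=100):
    """Fetch an object; a request slower than 100 ms counts as failed (unavailable)."""
    s3 = boto3.client("s3", region_name=region)
    start = time.perf_counter()
    s3.get_object(Bucket=bucket, Key=key)["Body"].read()
    return (time.perf_counter() - start) * 1000 <= threshold_ms
```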
We first measure the probability of data loss for each
method. Figs. 18a and 18b show the relationship between
the probability of data loss and the number of nodes on
Amazon S3 with R = 3 and R = 2, respectively. We see that
the probability of data loss increases as the number of nodes
increases. We also see that the probability of data loss
approximately follows PMCR ≈ TR < Copyset < WOR ≈ RR.
Both our numerical result and real-world experimental
result confirm that PMCR and TR generate the lowest prob-
ability of data loss. PMCR and TR generate relatively lower
probability of data loss than Copyset because they separate
the primary data from the backup data by storing the
backup data on a remote site, which further reduces the cor-
relation in failures between nodes in the primary tier and
the backup tier [21]. Copyset generates lower probability of
data loss than WOR and RR. The reason is that Copyset constrains the replica nodes of every chunk to a single Copyset and reduces the probability of data loss in correlated machine
failures. WOR and RR cannot handle correlated machine
failures and thus have higher probability of data loss. By
examining Figs. 18a and 18b, we find that the probability of
data loss in Fig. 18b is higher than that in Fig. 18a. This is
because fewer replicas for a chunk lead to a higher probabil-
ity that all the servers storing the chunk fail concurrently,
hence a higher probability of data loss.
Figs. 19a and 19b show the relationship between the
availability of requested data objects and the number of
nodes on Amazon S3 with R = 3 and R = 2, respectively.
We see that the availability of requested data objects
decreases as the number of nodes increases. We also see
that the availability of requested data objects follows
PMCR ≈ TR > Copyset > RR ≈ WOR due to the same reasons
explained in Fig. 13. Comparing Figs. 19a and 19b, we find
that the availability of requested data objects in Fig. 19a is
higher than that in Fig. 19b. The reason is that the more the
replicas for a chunk, the lower the probability that all the
machines storing the chunk fail concurrently, leading to
higher availability of requested data objects.
Figs. 20a and 20b show the relationship between the band-
width cost and the number of data objects on Amazon S3 with
R=3 and R=2, respectively. We observe that the bandwidth
cost increases as the number of data objects increases due to
the same reasons explained in Fig. 14. We also see that the
bandwidth cost follows PMCR ≈ WOR < TR ≈ Copyset ≈ RR.
TR generates higher bandwidth cost than WOR and PMCR.
This is because WOR and PMCR compress data objects and
reduce the size of data transferred for data requests and
updates, and therefore reduce the bandwidth cost. Both the
numerical result in Fig. 14 and the real-world experimental
result in Fig. 20 indicate that compression (with deduplica-
tion) in replication is effective in reducing bandwidth cost. By
examining Figs. 20a and 20b, we see that the bandwidth cost
increases as the number of replicas for each data object
increases. This is because more replicas for each data object
lead to more data transfers for data requests and updates.
Figs. 21a and 21b depict the relationship between the
storage cost and the number of data objects on Amazon S3
with R=3 and R=2, respectively. We see that the storage cost
increases as the number of data objects increases due to the
same reasons explained in Fig. 15. We also find that the stor-
age cost follows PMCR < WOR < TR < Copyset ≈ RR. WOR
generates higher storage cost than PMCR and lower storage
cost than TR, Copyset and RR. This is because PMCR and
WOR compress data objects, which reduces the storage
cost, but other methods do not use compression. Moreover,
PMCR considers data popularity, which is neglected in all the other
methods, and chooses less expensive storage media for stor-
ing unpopular data objects, which further reduces the stor-
age cost. Comparing Figs. 21a and 21b, we see that the
Fig. 18. Probability of data loss versus the number of nodes on Amazon
S3 with scatter width S=4.
Fig. 19. Availability of requested data objects versus the number of
nodes on Amazon S3 with scatter width S=4.
Fig. 20. Performance on bandwidth cost of various methods on
Amazon S3.
Fig. 21. Performance on storage cost of various methods on
Amazon S3.
storage cost increases as the number of replicas for each
data object increases. The reason is that more replicas for
each data object lead to higher storage consumption for stor-
ing data objects.
Figs. 22a and 22b show the relationship between the MTTF and the number of nodes on Amazon S3 when R = 3 and R = 2, respectively. We find that the MTTF follows PMCR ≈ TR > Copyset > WOR ≈ RR. The results approximately conform to the numerical results. Copyset, TR and
PMCR have larger MTTF than WOR and RR because they
constrain the replica nodes of every chunk to a single FTS, which reduces the probability of data loss in correlated machine failures.
Copyset has relatively smaller MTTF than TR and PMCR
because TR and PMCR separate the primary data from the backup data and store the backup data on a remote site,
which reduces the correlation in failures between nodes in
the primary tier and the backup tier.
6RELATED WORK
Many methods have been proposed to prevent data loss
caused by correlated or non-correlated machine failures.
Zhong et al. [9] assumed independent machine failures, and
proposed a model that achieves high expected service avail-
ability under a given space constraint, object request popular-
ities, and object sizes. However, this model does not
consider correlated machine failures and hence cannot han-
dle such failures. Nath et al. [8] identified a set of design
principles that system builders can use to tolerate failures.
Cidon et al. [11] proposed Copyset Replication to reduce
the frequency of data loss caused by correlated machine fail-
ures by limiting the replica nodes of many chunks to a sin-
gle Copyset. Chun et al. [4] proposed the Carbonite
replication algorithm for keeping data durable at a low cost.
Carbonite ensures that new copies of data objects are created faster than they are lost to permanent disk failures. Cidon et al. [21] proposed Tiered Replication, which splits the cluster into a primary tier and a backup tier. The first two replicas of the data are stored on the primary tier, which is used to protect against independent node failures; the third replica is stored
on the backup tier, which is used to protect against corre-
lated failures. However, these methods do not try to reduce
the storage cost and bandwidth cost caused by replication.
There is a large body of work on enhancing data avail-
ability and durability. Renesse et al. [52] proposed chain
replication to coordinate clusters of fail-stop storage servers
for supporting large-scale storage services that exhibit
high throughput and availability without sacrificing
strong consistency guarantees. Almeida et al. [53] proposed
ChainReaction, a Geo-distributed key-value datastore, to
offer causal+ consistency, with high performance, fault-
tolerance, and scalability. Zhang et al. [54] proposed Mojim
to provide reliability and availability in large-scale storage
systems while preserving the performance of non-volatile
main memory. Mojim uses a two-tier architecture in which
the primary tier contains a mirrored pair of nodes and the sec-
ondary tier contains one or more secondary backup nodes
with weakly consistent copies of data. Kim et al. [55] proposed
SHADOW systems to provide high availability. SHADOW
systems push the task of managing database replication into
the underlying storage service, and provide write offloading
to free the active database system from the need to update the
persistent database. Colgrove et al. [56] presented Purity, a
all-flash enterprise storage system to support compression,
deduplication and high-availability. Specifically, Purity lever-
ages flash’s ability to perform fast random reads and sequen-
tial writes by compressing data and storing a single instance
of duplicate blocks written to different logical addresses.
However, these works fail to consider data popularity to
reduce the storage cost and bandwidth cost without
compromising data request delay greatly.
In order to reduce the storage cost and bandwidth cost
caused by replication, many methods have been proposed.
Shilane et al. [49] proposed a new method for replicating
backup datasets across a wide area network (WAN). The
method can eliminate duplicate regions of files (deduplica-
tion) and also compress similar regions of files with Delta
compression. The method leverages deduplication locality
to also find similarity matches used for delta compression.
Puttaswamy et al. [57] proposed FCFS, a storage solution
that drastically reduces the cost of operating a file system in
the cloud. FCFS integrates multiple storage services and
dynamically adapts the storage volume sizes of each service
to provide a cost-efficient solution with provable perfor-
mance bounds. However, these methods do not consider
data popularity to reduce the storage cost and bandwidth
cost. Also, these methods neglect correlated machine fail-
ures, which can result in data loss in such failures.
To resolve the problems in the existing replication
schemes, we propose PMCR, which can effectively handle
both correlated and non-correlated machine failures and
also considers different file popularities to increase data
durability and availability and reduce the bandwidth cost
and storage cost without compromising data request delay
greatly.
7CONCLUSION
Previous replication schemes for cloud storage systems con-
sider correlated machine failures or non-correlated machine
failures to reduce data loss. However, although data replicas bring about additional storage and bandwidth costs, few methods aim to maximize data durability and availability while reducing the costs caused by replication (i.e., storage cost and bandwidth cost) and taking data popularity into consideration. In this paper, in order to improve data durabil-
ity and availability, and meanwhile reduce costs caused by
replication, we propose a popularity-aware multi-failure
resilient and cost-effective replication scheme (PMCR).
PMCR classifies data into hot data, warm data and cold
data based on the data popularity. PMCR puts the first two
replicas of data objects in the primary tier and the third replicas in the backup tier. The replicas of the same data are put
into one fault-tolerant set to handle correlated failures.
Fig. 22. Performance on MTTF of different methods with scatter width of
4 on Amazon S3.
PMCR uses SC for read-intensive data and uses DC for
write-intensive data to compress the third replicas of warm
data and cold data in the backup tier to reduce both storage
cost and bandwidth cost. Our extensive numerical analysis
and real-world experimental results on Amazon S3 show
that PMCR outperforms other replication schemes in differ-
ent performance metrics. In the future, we will consider network failures to further reduce data loss and improve data durability. Also, we will consider the
effects of node joining and node leaving. Further, we will
consider energy consumption of machines and design a rep-
lication scheme to save energy.
ACKNOWLEDGMENTS
This research was supported in part by U.S. National Science
Foundation grants OAC-1724845, CNS-1733596 and ACI-
1661378, Microsoft Research Faculty Fellowship 8300751,
and IBM Ph.D. fellowship award 2017. An early version of
this work was presented in the Proceedings of Big Data
2016 [58]. We would like to thank Dr. Rajkumar Buyya, Dr.
Kuang-Ching Wang, Dr. James Z. Wang, Dr. Adam Hoover,
Dr. Wingyan Chung, Dr. Svetlana Poznanović, and Dr. Warren Adams for their help on this work.
REFERENCES
[1] Amazon S3. [Online]. Available: http://aws.amazon.com/s3.
Accessed on: Jan. 2016.
[2] Google cloud storage. [Online]. Available: http://cloud.google.
com/storage. Accessed on: Jan. 2016.
[3] Windows azure. [Online]. Available: http://www.microsoft.
com/windowsazure. Accessed on: Jan. 2016.
[4] B. Chun, F. Dabek, A. Haeberlen, E. Sit, H. Weatherspoon,
M. Kaashoek, J. Kubiatowicz, and R. Morris, “Efficient replica
maintenance for distributed storage systems,” in Proc. USENIX
Conf. Netw. Syst. Des. Implementation, 2006, pp. 4–4.
[5] G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Laksh-
man, A. Pilchin, S. Sivasubramanian, P. Vosshall, and W. Vogels,
“Dynamo: Amazon’s highly available key-value store,” in Proc.
ACM Symp. Operating Syst. Principles, 2007, pp. 205–220.
[6] R. Chansler, “Data availability and durability with the hadoop
distributed file system,” USENIX Mag., vol. 37, pp. 16–22, 2012.
[7] J. Dean, “Evolution and future directions of large-scale storage
and computation systems at Google,” in Proc. ACM Symp. Cloud
Comput., 2010, pp. 1–1.
[8] S. Nath, H. Yu, P. Gibbons, and S. Seshan, “Subtleties in toler-
ating correlated failures in wide-area storage systems,” in
Proc. USENIX Conf. Netw. Syst. Des. Implementation, May 2006,
pp. 225–238.
[9] M. Zhong, K. Shen, and J. Seiferas, “Replication degree customiza-
tion for high availability,” in Proc. ACM SIGOPS/EuroSys Eur.
Conf. Comput. Syst., 2008, pp. 55–68.
[10] A. Haeberlen, A. Mislove, and P. Druschel, “Glacier: Highly
durable, decentralized storage despite massive correlated fail-
ures,” in Proc. USENIX Conf. Netw. Syst. Des. Implementation,
2005, pp. 143–158.
[11] A. Cidon, S. Rumble, R. Stutsman, S. Katti, J. Ousterhout, and
M. Rosenblum, “Copysets: Reducing the frequency of data loss in
cloud storage," in Proc. USENIX Annu. Techn. Conf., 2013, pp. 37–48.
[12] J. Liu and H. Shen, “A low-cost multi-failure resilient replication
scheme for high data availability in cloud storage,” in Proc. Int.
Conf. High Perform. Comput., 2016, pp. 242–251.
[13] K. Shvachko, H. Kuang, S. Radia, and R. Chansler, “The Hadoop
distributed file system,” in Proc. Symp. Mass Storage Syst. Technol.,
2010, pp. 1–10.
[14] D. Borthakur, J. Gray, J. Sarma, K. Muthukkaruppan,
N. Spiegelberg, H. Kuang, K. Ranganathan, D. Molkov, A. Menon,
S. Rash, R. Schmidt, and A. Aiyer, “Apache Hadoop goes realtime
at Facebook,” in Proc. SIGMOD Int. Conf. Manage. Data, 2011,
pp. 1071–1080.
[15] S. Balakrishnan, R. Black, A. Donnelly, P. England, A. Glass,
D. Harper, S. Legtchenko, A. Ogus, E. Peterson, and A. Rows-
tron, “Pelican: A building block for exascale cold data storage,”
in Proc. USENIX Conf. Operating Syst. Des. Implementation, 2014,
pp. 351–365.
[16] M. Ghosh, A. Raina, L. Xu, X. Qian, I. Gupta, and H. Gupta,
“Popular is cheaper: Curtailing memory costs in interactive ana-
lytics engines,” in Proc. ACM SIGOPS/EuroSys Eur. Conf. Comput.
Syst., 2018, Art. no. 40.
[17] G. Ananthanarayanan, S. Agarwal, S. Kandula, A. Greenberg,
I. Stoica, D. Harlan, and E. Harris, “Scarlett: Coping with skewed
content popularity in MapReduce clusters,” in Proc. ACM
SIGOPS/EuroSys Eur. Conf. Comput. Syst., Apr. 2011, pp. 287–300.
[18] F. André, A. Kermarrec, E. Merrer, N. Scouarnec, G. Straub, and
A. Kempen, “Archiving cold data in warehouses with clustered
network coding,” in Proc. ACM SIGOPS/EuroSys Eur. Conf. Com-
put. Syst., 2014, Art. no. 21.
[19] A. March, storage pod 4.0: Direct wire drives - faster, simpler, and
less expensive. Mar. 2014. [Online]. Available: http://blog.
backblaze.com/2014/03/19/backblaze-storage-pod-4/, Accessed
on: Jan. 2016.
[20] G. A. N. Yasa and P. C. Nagesh, “Space savings and design con-
siderations in variable length deduplication,” SIGOPS Operating
Syst. Rev., vol. 46, no. 3, pp. 57–64, 2012.
[21] A. Cidon, R. Escriva, S. Katti, M. Rosenblum, and E. G. Sirer,
“Tiered replication: A cost-effective alternative to full cluster geo-
replication,” in Proc. USENIX Annu. Techn. Conf., 2015, pp. 31–43.
[22] D. Ongaro, S. Rumble, R. Stutsman, J. Ousterhout, and M. Rose-
nblum, “Fast crash recovery in RAMCloud,” in Proc. ACM Symp.
Operating Syst. Principles, 2011, pp. 29–41.
[23] S. Ghemawat, H. Gobioff, and S.-T. Leung, “The Google file
system,” in Proc. ACM Symp. Operating Syst. Principles,2003,
pp. 29–43.
[24] B. Calder, J. Wang, A. Ogus, N. Nilakantan, A. Skjolsvold,
S. McKelvie, Y. Xu, S. Srivastav, J. Wu, H. Simitci, J. Haridas,
C. Uddaraju, H. Khatri, A. Edwards, V. Bedekar, S. Mainali,
R. Abbasi, A. Agarwal, M. F. U. Haq, M. I. U. Haq, D. Bhardwaj,
S. Dayanand, A. Adusumilli, M. McNett, S. Sankaran, and
K. M. L. Rigas, “Windows azure storage: A highly available cloud
storage service with strong consistency,” in Proc. ACM Symp.
Operating Syst. Principles, 2011, pp. 143–157.
[25] Y. Zhang, C. Guo, D. Li, R. Chu, H. Wu, and Y. Xiong,
“CubicRing: Enabling one-hop failure detection and recovery for
distributed in-memory storage systems,” in Proc. USENIX Conf.
Netw. Syst. Des. Implementation, 2015, pp. 529–542.
[26] H. Abu-Libdeh, R. Renesse, and Y. Vigfusson, “Leveraging shard-
ing in the design of scalable replication protocols,” in Proc. ACM
Symp. Cloud Comput., 2013, Art. no. 12.
[27] S. Y. Ko, I. Hoque, B. Cho, and I. Gupta, “Making cloud intermedi-
ate data fault-tolerant,” in Proc. ACM Symp. Cloud Comput., 2010,
pp. 181–192.
[28] N. Bonvin, T. Papaioannou, and K. Aberer, “A self-organized,
fault-tolerant and scalable replication scheme for cloud storage,”
in Proc. ACM Symp. Cloud Comput., 2010, pp. 205–216.
[29] D. Ford, F. Labelle, F. I. Popovici, M. Stokely, V.-A. Truong,
L. Barroso, C. Grimes, and S. Quinlan, “Availability in globally
distributed storage systems,” in Proc. USENIX Conf. Operating
Syst. Des. Implementation, 2010, pp. 61–74.
[30] S. Welham, S. Gezan, S. Clark, and A. Mead, Statistical Methods in
Biology: Design and Analysis of Experiments and Regression. Boca
Raton, FL, USA: CRC Press, 2014.
[31] Cloud VPS. [Online]. Available: https://www.cloudvps.nl/.
Accessed on: Jan. 2016.
[32] R. Koller and R. Rangaswami, “I/O deduplication: Utilizing con-
tent similarity to improve I/O performance,” in Proc. USENIX
Conf. File Storage Technol., 2010, pp. 16–16.
[33] N. Jain, M. Dahlin, and R. Tewari, “TAPER: Tiered approach for
eliminating redundancy in replica synchronization,” in Proc. USE-
NIX Conf. File Storage Technol., 2005, pp. 281–294.
[34] W. Xia, H. Jiang, D. Feng, and Y. Hua, "SiLo: A similarity-locality based near-exact deduplication scheme with low RAM overhead and high throughput," in Proc. USENIX Annu. Techn. Conf., 2011, pp. 26–28.
[35] G. You, S. Hwang, and N. Jain, “Scalable load balancing in cluster
storage systems,” in Proc. Int. Middleware Conf., 2011, pp. 100–119.
[36] Y. Fu, H. Jiang, and N. Xiao, “A scalable inline cluster deduplica-
tion framework for big data protection,” in Proc. Int. Middleware
Conf., 2012, pp. 354–373.
[37] N. Bonvin, T. Papaioannou, and K. Aberer, "Autonomic SLA-
driven provisioning for cloud applications,” in Proc. IEEE/ACM
Int. Symp. Cluster Cloud Grid Comput., 2011, pp. 434–443.
[38] K. Chen and H. Shen, “DSearching: Distributed searching of
mobile nodes in DTNs with floating mobility information,” in
Proc. IEEE INFOCOM, 2014, pp. 2283–2291.
[39] X. Lin, G. Lu, F. Douglis, P. Shilane, and G. Wallace, “Migratory
compression: Coarse-grained data reordering to improve
compressibility,” in Proc. USENIX Conf. File Storage Technol., 2014,
pp. 257–271.
[40] B. H. Bloom, “Space/time trade-offs in hash coding with allow-
able errors," Commun. ACM, vol. 13, no. 7, pp. 422–426, 1970.
[41] S. Houghten, L. Thiel, J. Janssen, and C. Lam, "There is no (46, 6, 1) block design," J. Combinatorial Des., vol. 9, no. 1, pp. 60–71, 2001.
[42] P. Kaski and P. Östergård, "There exists no (15, 5, 4) RBIBD," J. Combinatorial Des., vol. 9, pp. 227–232, 2001.
[43] L. Xu, A. Pavlo, S. Sengupta, J. Li, and G. Ganger, “Reducing repli-
cation bandwidth for distributed document databases,” in Proc.
ACM Symp. Cloud Comput., 2015, pp. 222–235.
[44] D. Arteaga and M. Zhao, “Client-side flash caching for cloud sys-
tems,” in Proc. ACM Int. Syst. Storage Conf., Jun. 2014, pp. 1–11.
[45] J. Kubiatowicz, D. Bindel, Y. Chen, P. Eaton, D. Geels, R. Gummadi, S.
Rhea, H. Weatherspoon, W. Weimer, C. Wells, and B. Zhao,
“OceanStore: An architecture for global-scale persistent storage,” in
Proc. Int. Conf. Archit. Support Program. Languages Operating Syst., 2000,
pp. 190–201.
[46] M. Wittie, V. Pejovic, L. Deek, K. Almeroth, and B. Zhao, “Exploiting
locality of interest in online social networks,” in Proc. ACM Int. Conf.
Emerging Netw. Experiments Technol., 2010, Art. no. 25.
[47] S. Acedański, S. Deb, M. Médard, and R. Koetter, "How good is random linear coding based distributed networked storage," in Proc. 1st Workshop Netw. Coding Theory Appl., 2005, pp. 1–6.
[48] Intelligent block placement policy to decrease probability of data
loss. 2012. [Online]. Available: https://issues.apache.org/jira/
browse/HDFS-1094. Accessed on: Jan. 2016.
[49] P. Shilane, M. Huang, G. Wallace, and W. Hsu, “WAN optimized
replication of backup datasets using stream-informed delta
compression," in Proc. USENIX Conf. File Storage Technol., 2012, p. 5.
[50] V. Rawat, “Reducing failure probability of cloud storage services
using multi-clouds,” Comput. Res. Repository, vol. abs/1310.4919,
2013, http://arxiv.org/abs/1310.4919
[51] Amazon S3 Pricing. [Online]. Available: http://aws.amazon.
com/s3/pricing/. Accessed on: Jan. 2016.
[52] R. Renesse and F. Schneider, “Chain replication for supporting
high throughput and availability,” in Proc. USENIX Conf. Operat-
ing Syst. Des. Implementation, 2004, pp. 7–7.
[53] S. Almeida, J. Leitao, and L. Rodrigues, “ChainReaction: A causal
+ consistent datastore based on chain replication,” in Proc. ACM
SIGOPS/EuroSys Eur. Conf. Comput. Syst., 2013, pp. 85–98.
[54] Y. Zhang, J. Yang, A. Memaripour, and S. Swanson, “Mojim: A
reliable and highly-available non-volatile memory system,” in
Proc. ACM Int. Conf. Archit. Support Program. Languages Operating
Syst., 2015, pp. 3–18.
[55] J. Kim, K. Salem, K. Daudjee, A. Aboulnaga, and X. Pan,
“Database high availability using SHADOW systems,” in Proc.
ACM Symp. Cloud Comput., 2015, pp. 209–221.
[56] J. Colgrove, J. D. Davis, J. Hayes, E. L. Miller, C. Sandvig, R. Sears,
A. Tamches, N. Vachharajani, and F. Wang, “Purity: Building
fast, highly-available enterprise flash storage from commodity
components,” in Proc. ACM SIGMOD Int. Conf. Manage. Data,
2015, pp. 1683–1694.
[57] K. P. N. Puttaswamy, T. Nandagopal, and M. Kodialam, “Frugal
storage for cloud file systems,” in Proc. ACM SIGOPS/EuroSys Eur.
Conf. Comput. Syst., 2012, pp. 71–84.
[58] J. Liu and H. Shen, “A popularity-aware cost-effective replication
scheme for high data durability in cloud storage,” in Proc. IEEE
Int. Conf. Big Data, 2016, pp. 384–389.
Jinwei Liu received the MS degree in Computer
Science from Clemson University, SC, USA and
University of Science and Technology of China,
China. He received his PhD degree in Computer
Engineering from Clemson University, SC, USA, in
2016. He worked as a Postdoctoral Associate at the University of Central Florida and was recognized
for outstanding achievement in academic publica-
tions. He is currently an Assistant Professor in the
Department of Computer and Information Sciences
at Florida A&M University. His research interests
include cloud computing, big data, machine learning and data mining,
cybersecurity, wireless sensor networks, social networks, HPC and IoT.
He was a member of the Program Committees of several international conferences and the Editor-in-Chief and/or Editorial Board Member of several journals. He is a member of the IEEE and the ACM.
Haiying Shen received the BS degree in com-
puter science and engineering from Tongji Uni-
versity, China, in 2000, and the MS and PhD
degrees in computer engineering from Wayne
State University, in 2004 and 2006, respectively.
She is currently an associate professor with the
Computer Science Department, University of
Virginia. Her research interests include cloud
computing and cyber-physical systems. She was
the program co-chair for a number of international
conferences and member of the Program Com-
mittees of many leading conferences. She is a Microsoft faculty fellow of
2010, a senior member of the IEEE and the ACM.
Husnu S. Narman received the BS degree in
mathematics from Abant Izzet Baysal University,
Turkey, in 2006, the MS degree in computer sci-
ence from the University of Texas at San Antonio,
San Antonio, Texas, in 2011, and the PhD degree
in computer science from the University of
Oklahoma, Norman, Oklahoma, in 2016. Cur-
rently, he is a faculty member with Marshall Uni-
versity, Huntington, West Virginia. His research
interests include queuing theory, network man-
agement, network topology, Internet of Things,
LTE, and cloud computing.
"
For more information on this or any other computing topic,
please visit our Digital Library at www.computer.org/publications/dlib.
LIU ET AL.: POPULARITY-AWARE MULTI-FAILURE RESILIENT AND COST-EFFECTIVE REPLICATION FOR HIGH DATA DURABILITY... 2369
Authorized licensed use limited to: Florida A& M University. Downloaded on June 28,2020 at 05:36:43 UTC from IEEE Xplore. Restrictions apply.