Simulation Analysis of Download and Recovery Processes in P2P Storage
Systems
Abdulhalim Dandoush, Sara Alouf and Philippe Nain
INRIA Sophia Antipolis – B.P. 93 – 06902 Sophia Antipolis, France {adandous, salouf, nain}@sophia.inria.fr
July 10, 2019
Abstract
Peer-to-peer storage systems rely on data fragmentation and
distributed storage. Unreachable fragments are continuously
recovered, requiring multiple fragments of data (constituting
a “block”) to be downloaded in parallel. Recent modeling ef-
forts have assumed the recovery process to follow an exponen-
tial distribution, an assumption made mainly in the absence of
studies characterizing the “real” distribution of the recovery
process. This work aims at filling this gap through a simu-
lation study. To that end, we implement the distributed stor-
age protocol in the NS-2 network simulator and run a total of
seven experiments covering a large variety of scenarios. We
show that the fragment download time follows approximately
an exponential distribution. We also show that the block down-
load time and the recovery time essentially follow a hypo-
exponential distribution with many distinct phases (maximum
of as many exponentials). We use expectation maximization
and least square estimation algorithms to fit the empirical dis-
tributions. We also provide a good approximation of the num-
ber of phases of the hypo-exponential distribution that applies
in all scenarios considered. Last, we test the goodness of our
fits using statistical (Kolmogorov-Smirnov test) and graphical
methods.
1 Introduction
The peer-to-peer (P2P) model has proved to be an alternative
to the Client/Server model and a promising paradigm for Grid
computing, file sharing, voice over IP, backup and storage ap-
plications. A major advantage of P2P systems is that peers
can build a virtual overlay network on top of the existing architecture and topology. Each peer receives/provides a service from/to other peers through the overlay network; examples of such services are sharing the capacity of its central processing unit, sharing its bandwidth capacity, sharing its free storage space, and sharing local information about neighbors to help peers locate resources.
P2P storage systems (P2PSS) have emerged as a cheap, scal-
able and self-repairing solution. Such distributed systems rely
on data fragmentation and distributed storage. Files are par-
titioned into fixed-size blocks that are themselves partitioned
into fragments. Fragments are usually stored on different
peers. Given this configuration, a user wishing to retrieve a given piece of data would need to perform multiple downloads, generally in parallel for an enhanced service. To mitigate churn
of peers, redundant fragments are continuously injected in the
system, thus maintaining data redundancy above a minimum
desired level. When the amount of unreachable fragments at-
tains a predefined threshold, a recovery process is initiated.
In this paper, we consider systems relying on erasure codes to generate the redundant fragments. If s denotes the initial number of fragments and r denotes the amount of additional redundant fragments, then any s out of the s+r fragments can be used to generate a new redundant fragment (e.g. [18]). Observe that this notation covers the case of replication-based systems, with s = 1 and r denoting the number of replicas.
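To make the any-s-out-of-(s+r) property concrete, here is a toy sketch of an MDS erasure code based on polynomial evaluation (a Reed-Solomon-style construction over the rationals; the code and its parameters are ours for illustration, not the codes used in the paper):

```python
from fractions import Fraction
from itertools import combinations

def poly(coeffs, x):
    # Evaluate the polynomial whose coefficients are the s data symbols.
    return sum(Fraction(c) * x**k for k, c in enumerate(coeffs))

def rebuild(points, x):
    # Lagrange interpolation: any s points of a degree-(s-1) polynomial
    # determine it, so we can re-evaluate it at any other point x.
    total = Fraction(0)
    for xi, yi in points:
        term = Fraction(yi)
        for xj, _ in points:
            if xj != xi:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

s, r = 4, 2
data = [7, 3, 9, 2]                                        # s data symbols
frags = [(x, poly(data, x)) for x in range(1, s + r + 1)]  # s+r stored fragments

# Any s of the s+r fragments suffice to regenerate any fragment.
for subset in combinations(frags, s):
    for x, y in frags:
        assert rebuild(list(subset), x) == y
```

Real deployments work over finite fields (e.g. GF(2^8)) for compactness, but the s-out-of-(s+r) recovery property illustrated here is the same.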
The recovery process includes the download of a full block of s fragments. P2PSS may rely on a central authority that initiates the recovery process when necessary. This central authority could reconstruct all missing fragments of a given block of data and remotely store them on as many new peers.1 Alternatively, secure agents running on new peers could reconstruct by themselves the missing fragments to be stored on the peers' disks. A more detailed description of P2PSS, their recovery schemes and their policies is presented in Section 2.
1.1 Motivation
There have been recent modeling efforts focusing on the per-
formance analysis of P2PSS in terms of data durability and
data availability. In [17], Ramabhadran and Pasquale analyze
systems that use full replication for data reliability. They de-
velop a Markov chain analysis, then derive an expression for
the lifetime of the replicated state and study the impact of
bandwidth and storage limits on the system. This study relies
on the assumption that the recovery process follows an expo-
nential distribution. Observe that in replication-based systems,
the recovery process lasts mainly for the download of one frag-
ment of data. In other words, the authors of [17] are implicitly assuming that the fragment download time is exponentially distributed.

1 By "new" peers, we refer to peers that do not already store fragments of the same block. We assume the system enforces the rule that a peer can store at most one fragment of any given block.
In our previous work [1], we developed a more general
model than that in [17], which applies to both replicated and
erasure-coded P2PSS. Also, unlike [17], our model accounts
for transient disconnections of peers, namely, the churn in the
system. We also assumed the recovery process to be expo-
nentially distributed. However, this assumption differs sub-
stantially between replicated and erasure-coded P2PSS, as in
the latter systems the recovery process is much more complex
than in the former. Furthermore, the recovery process differs between the centralized and the distributed implementations.
In both studies, findings and conclusions rely on the as-
sumption that the recovery process is exponentially distributed.
However, this assumption is not supported by any experimental
data. To the best of our knowledge, there has been no simula-
tion study characterizing this process in real P2PSS.
This work aims at filling this gap through a simulation anal-
ysis. We believe it is essential to characterize the distribution of
download and recovery processes in P2PSS. Evaluating these
distributions is crucial to validate (or invalidate) the results pre-
sented in the above works and to better understand the avail-
ability and durability of data in these systems.
We will show through intensive simulations of many re-
alistic scenarios that (i) the fragment download time follows
closely an exponential distribution and (ii) fragment download
times are weakly correlated. Given that in erasure-coded sys-
tems, the block download time consists of downloading sev-
eral fragments in parallel, it follows that the recovery process
should follow approximately a hypo-exponential distribution
of several phases. (This is nothing but the sum of several in-
dependent random variables exponentially distributed having
each its own rate [11].) We will show that this is indeed the
case in the simulated data. We find that, in erasure-coded sys-
tems, the exponential assumption made on the block download
time and on the recovery process is not met in most cases that
we considered. Our results suggest that the models presented
in [1] give accurate results on data durability and availability
only in replicated P2PSS (as in [17]); the case of erasure-coded systems was not accurately captured.
Building on the results of this paper, we incorporated into the model of [1] the assumption that fragment download and upload times are exponentially distributed with parameters α and β, respectively. The resulting models, which appeared in [5], characterize data lifetime and availability in P2PSS that use either replication or erasure codes, under more realistic assumptions as supported by this paper.
1.2 Why do we use simulations?
To collect traces of fragment download/upload times, of block
download times and of recovery times, one can choose to per-
form simulations or experimentations either on testbeds or on
real networks. We would like to consider situations where
peers are either homogeneous or heterogeneous, different un-
derlying network topologies, and different propagation delays
in the network. Also, we would like to consider systems with
a large number of peers. To achieve all this with experiments
over real networks is very difficult. Setting up experiments over a dedicated network like PlanetLab [16] would require a long time, and there would be limitations on changing the topology and the peers' characteristics. We find it most attractive to implement the distributed storage protocol in a well-known network simulator and to simulate different scenarios. We chose NS-2 because it is an open-source discrete-event simulator targeted at networking research. NS-2 provides substantial support for the simulation of TCP and routing, and it is well known and well validated.
1.3 Contributions
Our contributions are as follows:

- Implementation of the download and recovery processes in NS-2.
- Evaluation of the fragment/block download time and the recovery process under a variety of conditions: different network topologies, heterogeneity of peers, different propagation delays, and centralized vs. distributed recovery.
- Data fitting: the distributions of the block download time and the recovery time are fitted using the Expectation Maximization (EM) algorithm, and the distribution of the fragment download time is fitted using Maximum Likelihood Estimation (MLE) and Least Square Estimation (LSE).
- A statistical goodness-of-fit test, namely the Kolmogorov-Smirnov test [13].
The rest of this paper is organized as follows. Section 2
overviews the storage protocol that we consider. Section 3 de-
scribes the simulation architecture, the methodology and the
setup of the simulations. In Section 4, some of our experi-
mental results are discussed. Section 5 briefly reviews related
work. Last, Section 6 concludes the paper.
2 System Description
We describe in this section the storage protocol that we want to simulate:

- Files are partitioned into fixed-size blocks (the block size is SB) that are themselves partitioned into s fragments (the fragment size is SF).
- Each block is stored as a total of s+r fragments, r of them redundant and generated using erasure codes. Fixing the block and fragment sizes fixes the values of the parameters s and r for all stored blocks in the system.
- These s+r fragments are stored on s+r different peers.
- Mainly for privacy issues, a peer can store at most one fragment of any block of data.
- The system has perfect knowledge of the location of fragments at any given time, e.g. by using a Distributed Hash Table (DHT) or a central authority. Only the latest known location of each fragment is tracked, whether that is a connected or a disconnected peer.
- To overcome churn and maintain data reliability and availability, unreachable fragments are continuously recovered.
- The number of connected peers at any time is typically much larger than the number of fragments associated with a block of data, i.e., s+r. Therefore, there are always at least s+r new connected peers ready to receive and store fragments of a block of data.
- Once an unreachable fragment is recovered, any other copy of it that "reappears" in the system due to a peer reconnection is simply ignored, as only one location of the fragment (the newest one) is recorded in the system. Similarly, if a fragment is unreachable, the system knows of only one disconnected peer that stores the unreachable fragment.
Two implementations of the recovery process are considered. This process is triggered for each block whose number of unreachable fragments reaches a threshold k.

In the centralized implementation, a central authority will: (1) download in parallel s fragments from the peers that are connected, (2) reconstruct at once all unreachable fragments (by now considered as missing), and (3) upload them all in parallel onto as many new peers for storage. In fact, Step 2 executes in a negligible time compared to the execution times of Steps 1 and 3. Step 1 (resp. Step 3) completes when the last fragment finishes being downloaded (resp. uploaded).
In the distributed implementation, a secure agent on one new peer is notified of the identity of one of the k unreachable fragments, for it to reconstruct. Upon notification, the secure agent (1) downloads s fragments from the peers that are connected to the storage system, (2) reconstructs the specified fragment and stores it on the peer's disk, and (3) discards the s downloaded fragments so as to meet the privacy constraint that only one fragment of a block of data is held by a peer. This operation iterates as long as at least k fragments are sensed unreachable, and stops once the number of missing fragments reaches k-1. The recovery of one fragment lasts mainly for the execution time of Step 1; the recovery completes as soon as the last fragment (out of s) finishes being downloaded.

When k = 1, the recovery process is said to be eager; when k ∈ {2, ..., r}, it is said to be lazy.
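The threshold rule and the stopping condition of the distributed implementation can be summarized in a few lines (a sketch of our own, not the paper's NS-2 code):

```python
def needs_recovery(unreachable: int, k: int) -> bool:
    # A block's recovery is triggered once k of its fragments are unreachable.
    return unreachable >= k

def distributed_recovery_rounds(missing: int, k: int) -> int:
    # Each round, one secure agent downloads s fragments and rebuilds one
    # missing fragment; iteration stops once only k-1 fragments are missing.
    rounds = 0
    while missing >= k:
        missing -= 1
        rounds += 1
    return rounds

assert needs_recovery(1, k=1) and not needs_recovery(1, k=2)
assert distributed_recovery_rounds(missing=5, k=1) == 5  # eager: rebuild all
assert distributed_recovery_rounds(missing=5, k=3) == 3  # lazy: stop at k-1 = 2
```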
3 Simulation Architecture and Setup
3.1 Architecture and Assumptions Overview
We implemented an application layer and a wrapper layer in NS-2 (version 2.33), following the architecture depicted in Fig. 1. The application layer represents the P2PSS application. As for the wrapper layer, it is an intermediate layer that passes data between a transport agent object in NS-2 and the P2PSS application.
We made minor changes to the following NS-2 files: node.cc, node.h, agent.cc, agent.h, tcp-full.cc, and tcp-full.h. We use FullTcp since it supports bidirectional data transfers. We follow the same methodology as the Web cache application presented in the NS Manual (cf. [8, Chap. 40]) and use some of the technical ideas presented in [7].
Implementing a new protocol at the application level of NS-
2 is very well documented in the NS Manual. We will therefore
skip the description of technical details of the implementation,
and refer the interested reader to [8, pp. 344–360].
We consider two different storage applications: a backup-like application and an e-library-like application ("e" stands for "electronic"). In the first, a file stored in the system can be requested for retrieval only by the peer that has produced the file. In the second, any file can be downloaded by any peer in the system. In both applications, the storage protocol follows the description of Section 2. In particular, the s+r peers associated with a given block are chosen uniformly among the peers in the system.
Two types of requests are issued in the system. The first type is issued by the users of the system: a user issues a request to retrieve one of its files in the backup-like application, or a public document in the e-library-like application. The second type consists of management requests. These are issued by the central authority (in the centralized implementation of the recovery process) or by a peer (in the distributed implementation) as soon as the threshold k is reached for any stored block of data.
Figure 1: Simulator architecture. The P2PSS Application exchanges application data with the P2PSS Agent Wrapper through send_data(AppData) and process_data(AppData); the wrapper in turn exchanges bytes with the FullTcp transport Agent through send(bytes) and recv(bytes), and the agent sends and receives packets.

File download requests are translated into (i) a request to the directory service to obtain, for each block of the desired file, a list of at least s peers that store fragments of the block, and (ii) the opening of TCP connections with each peer in the said list to download one fragment. All download requests issued by a given peer form a Poisson process.
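Such a Poisson request stream is easy to generate from i.i.d. exponential inter-arrival times; the following sketch (ours, with an illustrative rate) shows the construction:

```python
import random

def poisson_request_times(rate, horizon, seed=0):
    # Request epochs of one peer: a Poisson process of the given rate,
    # obtained by accumulating exponential inter-arrival times.
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

# Illustrative rate: one request every 80 minutes on average.
times = poisson_request_times(rate=1 / 80.0, horizon=8e5)
assert abs(times[-1] / len(times) - 80.0) < 5  # empirical mean gap near 1/rate
```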
Recovery requests are issued only in the scenarios where there is churn in the network. A recovery request concerning a given block translates into (i) a request to the directory service to obtain a list of at least s peers that store fragments of the said block, and (ii) the opening of TCP connections with each peer in the said list to download one fragment. Once all s fragments have been downloaded, the process proceeds with Steps 2 and 3, according to the implementation, as explained in Section 2.
All peers in the simulator have the architecture reported in
Fig. 1. Peers share their available upload bandwidths and their
free storage volumes.
3.2 Network Topology
Having a representative view of enterprise networks or the Internet topology is very important for a simulator to predict the behavior of a network protocol or application if it were to be deployed. In fact, the simulated topology often influences the outcome of the simulations. Realistic topologies are thus needed to produce realistic simulation results. Most existing simulation studies have used representations of a real topology (e.g. the Arpanet), simple models (e.g. a star topology), or random flat (i.e. non-hierarchical) graphs generated by Waxman's edge-probability function [20].
However, random models offer very little control over the structure of the resulting topologies. In particular, they do not capture the hierarchy that is present in the Internet. Recently, tools such as BRITE [14] and GT-ITM [3] have been designed to generate more complex, hierarchical random graphs that better approximate the Internet's hierarchical structure.
To produce realistic topologies for our simulations, we use
the tool GT-ITM [3] to generate a total of six random graphs.
Three levels of hierarchy are used corresponding to transit do-
mains, stub domains, and local area networks (LANs) attached
to stub domains. Each graph has one transit domain of four
nodes; each of the nodes is connected to two or three other
transit nodes. Each transit node is connected on average to two
stub nodes, and each stub node is in turn connected on average
to four routers. Behind every router there is a certain number of
fully-connected peers constituting a LAN. The first of these six
graphs is depicted in Fig. 2, where we use the notation TN for "transit node" and SN for "stub node". The total number of peers in each graph is in the set {480, 640, 800, 960}.
3.3 Experiments Setup
We ran a total of seven experiments. Experiments 1–6 used the random graphs generated with the GT-ITM tool as detailed before, whereas a simple star topology is used in Experiment 7.

Figure 2: Three-level hierarchical random graph of Experiment 1, with transit nodes (TN), stub nodes (SN), LAN routers and peers; TN-TN links run at 1 Gbps, TN-SN links at 622 Mbps, and SN-router links at 34 to 76 Mbps.

Regarding the intra- and inter-domain capacities, we rely on the information provided by the RENATER [19] and GÉANT [9] web sites. In those networks, the links are well provisioned. To have a more complete study, we will consider, in Experiments 5 and 6, links with smaller capacities, as can be seen in rows 4–6 of Table 1. Propagation delays over TN-SN edges vary from edge to edge, as can be seen in row 7 of Table 1.
Let Cu and Cd denote respectively the upload and download capacity of a peer. To set these values, we rely mainly on the findings of [10] and [12]. The experimental study of file sharing systems and of the Skype P2P voice over IP system [10] found that more than 90% of users have an upload capacity Cu between 30Kbps and 384Kbps. However, the measurement study [12] done on BitTorrent clients in 2007 reports that 70% of peers have an upload capacity Cu between 350Kbps and 1Mbps, and that 10% of peers even have an upload capacity between 10Mbps and 110Mbps. The capacities that we selected in the simulations vary uniformly between the values of the ISDN and ADSL technologies; they can be found in rows 8–9 of Table 1. Observe that, except in Experiment 2, peers are heterogeneous. We attribute the propagation delays over router-peer edges randomly between 1ms and 25ms, as can be seen in row 10 of Table 1.
In Experiments 1, 3, 5 and 6, there is background traffic between three pairs of routers across the common backbone. This traffic consists of random exponential and CBR traffic over UDP and of FTP traffic over TCP.
In each of the experiments, the amount of data transferred between routers and peers in the system during the observed time (that is, from 4e+5 up to 5e+6 seconds) is, on average, 4.5–9 GB of P2P application traffic and, when applicable, 150–350 MB of FTP, 200–400 MB of CBR, and 250–500 MB of exponential traffic. In each of the experiments, the P2P traffic is well distributed over the active peers.

Table 1: Experiments setup
Experiment number             1         2          3         4          5        6          7
Topology                      random    random     random    random     random   random     star
Number of peers               960       480        480       960        640      800        480
TN-TN capacities (Gbps)       1         1          1         1          1        1          —
TN-SN capacities (Mbps)       622       622        622       622        10–34    10–34      —
SN-routers capacities (Mbps)  34–155    34–155     34–155    34–155     4–10     4–10       —
TN-SN delays (ms)             5–25      5–50       5–75      5–50       5–25     5–25       —
Cu of peers (Kbps)            150–1000  256        128–1000  128–1000   256–700  256–1000   256–700
Cd of peers (Kbps)            8×Cu      512        8×Cu      8×Cu       10×Cu    4×Cu       2048
routers-peers delays (ms)     1–20      1–20       1–20      1–20       1–10     1–25       1–25
Background traffic            yes       no         yes       no         yes      yes        no
Application type              e-library e-library  backup    e-library  backup   e-library  e-library
Peers churn                   no        no         no        no         yes      yes        yes
Recovery process              —         —          —         —          dist.    dist.      cent.
r                             —         —          —         —          s        s          s/2
1/λ (min.)                    80        8          80        144e3      160      13         16
SB (MB)                       8         8          8         8          4        8          8
SF (KB)                       1024      1024       1024      1024       512      1024       1024
s                             8         8          8         8          8        8          8
Experiments 3 and 5 simulate a backup-like application whereas the other five experiments simulate an e-library-like application. Churn is considered only in Experiments 5–7; as a consequence, redundancy is added and maintained only in these experiments. The storage overhead r/s is either 1 or 0.5. We consider the distributed implementation of the recovery process in Experiments 5 and 6, and the centralized implementation in Experiment 7; the eager policy (k = 1) is used in all three experiments. In other words, once a peer disconnects from the system, all fragments that were stored on it must be recovered.
Churn is implemented as follows. We assume that each peer alternates between a connected state, which lasts for a duration called "on-time", and a disconnected state, which lasts for a duration called "off-time". We assume in the simulations that the successive on-times (respectively off-times) of a peer are independent and identically distributed random variables with a common exponential distribution function with parameter µ1 > 0 (respectively µ2 > 0). This assumption is in agreement with the analysis in [17]. We consider 1/µ1 = 3 hours and 1/µ2 = 1 hour.
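Under this alternating renewal model, a peer is connected a long-run fraction (1/µ1) / (1/µ1 + 1/µ2) = 3/4 of the time. A short Monte Carlo sketch (ours, not the simulator's code) confirms this:

```python
import random

def connected_fraction(mean_on=3.0, mean_off=1.0, cycles=100_000, seed=1):
    # Alternating renewal churn: exponential on-times (mean 1/mu1, hours)
    # and off-times (mean 1/mu2); returns the long-run fraction of time
    # the peer is connected.
    rng = random.Random(seed)
    on = sum(rng.expovariate(1 / mean_on) for _ in range(cycles))
    off = sum(rng.expovariate(1 / mean_off) for _ in range(cycles))
    return on / (on + off)

# With 1/mu1 = 3 h and 1/mu2 = 1 h, the connected fraction is close to 3/4.
assert abs(connected_fraction() - 0.75) < 0.01
```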
Download requests are generated at each peer according to a Poisson process. This assumption is met in real networks, as found in [10]. We assume all peers have the same request generation rate, denoted λ. We vary the value of λ across the experiments as reported in row 16 of Table 1.
The last setting concerns the files that are stored in the P2PSS. Fragment sizes SF (resp. block sizes SB) in P2P systems are typically between 64KB and 4MB (resp. between 4MB and 9MB). We consider in most of our experiments SF = 1MB and SB = 8MB, except in Experiment 5 where SF = 512KB and SB = 4MB. Therefore s = 8 in all experiments. As for the file size, we assume for now that it is equal to the block size; the file download time is thus actually the block download time. We leave the case of more general file sizes to a future study. Observe that the recovery process is related to the block download time and not to the file download time.
Table 1 summarizes the key settings of the experiments.
4 Experimental Results
In this section, we present the results of our simulations and the inferences that we can draw from them. For each experiment, we collect the fragment download time, the block download time and the recovery time when applicable. In Experiments 5 and 6 (distributed recovery), the two latter durations are collected into the same dataset, as there is no essential difference between them. Having collected these samples, we compute the sample average and use the MLE, LSE and EM algorithms to fit the empirical distributions. Concerning the fragment download time, we perform the Kolmogorov-Smirnov test [13] on the fitted distribution. In the following, we present selected results from Experiments 1, 6 and 7. The results of the other experiments are briefly reported in Tables 2–3.
4.1 Experiment 1
Table 2: Summary of experiments results
Experiment number                               1       2        3        4        5        6        7
Average frag. down. time = 1/α (sec.)           40.35   141.89   44.89    30.66    34.7367  108.86   40.722
Samples number                                  76331   71562    12617    4851     9737     80301    4669
tm (sec.)                                       8.77    33.71    8.631    8.71     6.84     8.743    16.4
1/α̂ (sec.)                                      39.351  124.607  39.622   27.3392  32.106   103.635  32.05
1/α̂, 1/β̂ (sec.)                                 —       —        —        —        —        —        6.22, 5.11
Average of recovery or block down. time (sec.)  102.75  365.73   105.254  82.88    92.4762  278.71   89.848
Samples number                                  9197    8938     1516     602      589      10025    561

Table 3: Block download time or recovery process: validation of the approximations introduced in Eqs. (1)–(3)
Experiment number                    1       2       3       4      5      6       7
Sample average                       102.75  365.73  105.25  82.88  92.48  278.71  89.85
Inferred average from Eqs. (1), (2)  109.66  385.64  122.00  83.33  94.40  295.86  116.89
Relative error (%)                   6.7     5.4     15.9    0.5    2.1    6.2     30.1
Inferred average from Eqs. (4), (3)  106.95  372.38  116.32  83.01  94.10  290.41  92.21
Relative error (%)                   4.1     1.8     10.5    0.2    1.8    4.2     2.6

We have collected 76331 samples of the fragment download time (cf. column 2 of Table 2). The empirical cumulative distribution function (CDF) is depicted in Fig. 3(a). We can see that it is remarkably close to the exponential distribution. Two
exponential distributions are plotted in Fig. 3(a), each having a
different parameter, derived from a different fitting technique.
The two techniques that we used are MLE and LSE. The parameter returned by MLE is nothing but the inverse of the sample average and is denoted α; see row 2 of Table 2.
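Both estimators are simple to state. For an exponential law, the MLE of the rate is the inverse of the sample mean; one common LSE variant (used here as an illustration; the paper's exact LSE formulation is not specified) linearizes the CDF, regressing -log(1 - F_emp(t)) on t through the origin:

```python
import math
import random

def mle_rate(samples):
    # MLE for an exponential law: rate = inverse of the sample mean.
    return len(samples) / sum(samples)

def lse_rate(samples, cap=0.98):
    # LSE after linearization: for an exponential, -log(1 - F(t)) = rate*t,
    # so regress y_i = -log(1 - F_emp(t_i)) on t_i through the origin.
    # The extreme upper tail (F_emp > cap) is dropped as it is very noisy.
    xs = sorted(samples)
    n = len(xs)
    num = den = 0.0
    for i, t in enumerate(xs):
        f = (i + 1) / (n + 1)          # plotting-position empirical CDF
        if f > cap:
            break
        y = -math.log(1 - f)
        num += t * y
        den += t * t
    return num / den

# Sanity check on synthetic data with mean 40.35 s (Experiment 1's average).
rng = random.Random(2)
data = [rng.expovariate(1 / 40.35) for _ in range(50_000)]
assert abs(1 / mle_rate(data) - 40.35) < 1.0
assert abs(1 / lse_rate(data) - 40.35) < 2.0
```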
Beyond the graphical match between the empirical distribution and the exponential distribution, we performed a hypothesis test. Let X be a vector storing the collected fragment download times. The Kolmogorov-Smirnov test compares the vector X with a CDF, denoted cdf (in the present case, the exponential distribution), to determine whether the sample X could have the hypothesized continuous distribution cdf. The null hypothesis is that X has the distribution defined in cdf, the alternative one being that X does not have that distribution. We reject the null hypothesis if the test is significant at the l% level. In Experiment 1, the null hypothesis with α = 1/40.35 is not rejected for l = 7%.
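The test statistic itself is just the largest gap between the empirical CDF and the candidate CDF. A self-contained sketch (ours) against an exponential CDF:

```python
import math
import random

def ks_stat_exponential(samples, rate):
    # One-sample Kolmogorov-Smirnov statistic: sup_t |F_emp(t) - F(t)|
    # with F(t) = 1 - exp(-rate*t), checking both sides of each CDF step.
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, t in enumerate(xs):
        f = 1 - math.exp(-rate * t)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

rng = random.Random(3)
expo = [rng.expovariate(1 / 40.35) for _ in range(10_000)]
unif = [rng.uniform(0, 80.7) for _ in range(10_000)]  # same mean, wrong law
assert ks_stat_exponential(expo, 1 / 40.35) < 0.02    # small gap: plausible fit
assert ks_stat_exponential(unif, 1 / 40.35) > 0.05    # large gap: poor fit
```

The statistic is then compared with the critical value at the chosen significance level (about 1.36/sqrt(n) at the 5% level for large n).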
Looking now at concurrent downloads, we have found that these are weakly correlated and close to being independent. Besides the fact that the total workload is equally distributed over the active peers, there are two main reasons for the weak correlation between concurrent downloads as observed in Experiment 1: (i) the good connectivity of the core network and (ii) the asymmetry in peers' upstream and downstream bandwidths. So, as long as the bottleneck is the upstream capacity of peers, the fragment download times are close to being independent.
Regarding the block download times, we have collected 9197 samples. The sample average is given in row 7 of Table 2. The empirical CDF is plotted in Fig. 3(b). We followed the same methodology and computed the closest exponential distribution using MLE. However, the match between the two distributions appears to be poor and, indeed, the null hypothesis is rejected in this case.
To find a distribution that is more likely to fit the empirical data, we make the following analysis. To get a block of data, s fragments, stored on s different peers, have to be downloaded. This is more efficiently done in parallel, and this is how we implemented it in the simulator. We have seen that the download of a single fragment is well modeled by an exponential random variable with parameter α. Also, concurrent downloads were found to be close to independent. Therefore, the time needed for downloading s fragments in parallel is distributed like the maximum of s "independent" exponential random variables which, due to the memoryless property (see also [11]), is the sum of s independent exponential random variables with parameters sα, (s-1)α, ..., α. This distribution is called the hypo-exponential distribution and its expectation is

    E[T] = (1/α) Σ_{i=1}^{s} 1/i    (1)

where T denotes the block download time (or equivalently the distributed recovery duration).
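This max-of-exponentials argument is easy to check numerically. The sketch below (ours; it plugs in the Experiment 1 values s = 8 and 1/α = 40.35 seconds as an example) compares a Monte Carlo estimate of the parallel download time with the expectation of Eq. (1):

```python
import random

def block_download_time(s, alpha, rng):
    # Time to fetch s fragments in parallel: the maximum of s
    # independent exponential download times with rate alpha.
    return max(rng.expovariate(alpha) for _ in range(s))

s, alpha = 8, 1 / 40.35
rng = random.Random(4)
n = 200_000
mc_mean = sum(block_download_time(s, alpha, rng) for _ in range(n)) / n

# Eq. (1): E[T] = (1/alpha) * sum_{i=1}^{s} 1/i, the mean of the
# hypo-exponential with rates s*alpha, (s-1)*alpha, ..., alpha.
expected = (1 / alpha) * sum(1 / i for i in range(1, s + 1))
assert abs(expected - 109.66) < 0.01  # the value reported for Experiment 1
assert abs(mc_mean - expected) < 1.0  # Monte Carlo agrees with Eq. (1)
```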
In Experiment 1, E[T] = 109.66 seconds, while the sample average is equal to 102.75; cf. column 2 of Table 3. The relative error is 6.7%. The hypo-exponential distribution with s phases and parameters sα, (s-1)α, ..., α is plotted in Fig. 3(b). This distribution has a very good visual match with the empirical CDF of the block download time.
As a next step, we apply an EM algorithm [6] to find the best hypo-exponential distribution with s phases that fits the empirical data. In particular, we use EMpht [15], which is a program for fitting phase-type distributions to collected data. We do not plot the outcome of this program in Fig. 3(b), as it mainly overlaps with the hypo-exponential distribution with s phases and parameters sα, (s-1)α, ..., α that is already plotted there. After performing the Kolmogorov-Smirnov test, we find that the null hypothesis is not rejected for l = 7% (the same significance level as for the fragment download times).
Figure 3: Experiment 1: Fragment and block download times. (a) Exponential fit of the fragment download time distribution: empirical CDF with LSE and MLE exponential fits. (b) Fitting of the block download time distribution: empirical CDF with MLE exponential and hypo-exponential fits.
We conclude the analysis of the first experiment's results with four important points:

- The exponential assumption on the block download time is not met in realistic simulations.
- The fragment download time could be modeled by an exponential distribution with parameter α equal to the inverse of its average.
- Download times are weakly correlated and close to being independent as long as the bottleneck is the upstream capacity of peers.
- As a consequence, the block download time could be modeled by a hypo-exponential distribution with s phases and parameters sα, (s-1)α, ..., α.
4.2 Experiment 6
In this experiment, peers are not always connected. Each time
a peer disconnects from the network, all the fragments that
were stored on his disk will have to be recovered. The recovery
process is implemented in a distributed way.
The empirical CDF of the fragment download time and that
of the block download time or the recovery time are reported
in Fig. 4. Following the same methodology as that used to
analyze the results of Experiment 1, we reach the same con-
clusions. The relevant parameters are reported in column 7 of
Tables 2 and 3. However, the null hypothesis for the block
download time or the recovery process is not always rejected.
This is the case of Experiment 7, as seen next.
[Figure: two CDF plots omitted. (a) Exponential fit of the fragment download time distribution. (b) Fitting of the recovery or block download time distribution: empirical CDF of the recovery or block download time (seconds) with an MLE exponential fit and a hypo-exponential fit.]
Figure 4: Experiment 6: Download and distributed recovery processes.
4.3 Experiment 7
Experiment 7 is the only one that uses a centralized recovery
process. Also, it is the only one using a simple star topology.
In this experiment, the alternative hypothesis on the recovery
process distribution is not rejected.
There is a simple reason for that. We actually know that the download of a single fragment cannot be infinitely small, as suggested by the exponential distribution. Let t_m be the duration of the fastest fragment download among all s downloads. All other (slower) downloads are necessarily bounded below by t_m. The effect of this minimum value can be neglected as long as t_m is negligible with respect to the average fragment download time. Otherwise, we need to consider that the fragment download/upload time is composed of two components: (i) a (constant) minimum delay t_m and (ii) a random variable distributed exponentially with parameter α̂ (resp. β̂). This random variable models the collected data, shifted left by the value of t_m. The minimum delay can be approximated as RTT + (S_F + Headers)/max{C_u}, where RTT stands for round-trip time.
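As a rough numerical illustration of this approximation (all values below are hypothetical, not taken from the experiments):

```python
# Hypothetical inputs for t_m ~ RTT + (S_F + Headers)/max{C_u}:
rtt = 0.040                 # round-trip time, seconds (assumed)
fragment_bytes = 1_000_000  # fragment size S_F, bytes (assumed)
header_bytes = 4_000        # total protocol headers, bytes (assumed)
max_upload_bps = 1_000_000  # max upstream capacity max{C_u}, bits/s (assumed)

# Serialization delay of the fragment plus headers, then one RTT.
t_m = rtt + 8 * (fragment_bytes + header_bytes) / max_upload_bps
print(f"t_m ~ {t_m:.2f} s")  # ~ 8.07 s with these values
```

With such parameters t_m is far from negligible compared to a fragment download time of a few tens of seconds, which is why the shifted model matters in this experiment.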
The value of t_m is clearly visible in Fig. 5(a). We plot in this figure the empirical CDF of the fragment download time and the MLE exponential fits to both the collected and the shifted data. The null hypothesis is rejected for the collected data but not rejected for the shifted data.
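The shift-then-fit procedure can be sketched as follows on synthetic data (the delay and mean below are assumptions, not the measured values): estimate t_m by the sample minimum, subtract it, and fit an exponential by MLE, whose scale estimate is simply the sample mean.

```python
import numpy as np
from scipy import stats

# Synthetic shifted-exponential sample standing in for the collected
# fragment download times (t_m and the mean are assumptions).
rng = np.random.default_rng(1)
t_m_true = 30.0
sample = t_m_true + rng.exponential(32.0, size=5000)

shifted = sample - sample.min()   # estimate t_m by the sample minimum
scale_mle = shifted.mean()        # MLE of the scale: 1/alpha_hat

# KS test of the shifted data against the fitted exponential.
d_stat, p_value = stats.kstest(shifted, "expon", args=(0.0, scale_mle))
print(f"1/alpha_hat = {scale_mle:.1f} s, KS statistic = {d_stat:.3f}")
```

Fitting the unshifted `sample` directly would produce the mismatch visible in Fig. 5(a): an exponential CDF rises immediately from zero, while the empirical CDF stays flat up to t_m.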
[Figure: two CDF plots omitted. (a) Exponential fit of the fragment download time distribution ignoring the minimum value: empirical CDF of the fragment download time (seconds) with MLE exponential fits of the collected and the shifted data. (b) Fitting of the recovery time distribution: empirical CDF of the recovery time (seconds) with an MLE exponential fit and hypo-exponential fits of the collected and the shifted data.]
Figure 5: Experiment 7: Fragment and recovery time, centralized recovery.
The same holds for the recovery process, whose empirical CDF is plotted in Fig. 5(b). Repeating the same analysis as in Section 4.1, and assuming that the fragment upload time follows an exponential distribution with parameter β, the centralized recovery process, denoted T_c, would be modeled by a hypo-exponential distribution with s + k phases (k = 1 in Experiment 7) having expectation

E[T_c] = (1/α) ∑_{i=1}^{s} 1/i + (1/β) ∑_{j=1}^{k} 1/j .   (2)

Considering this distribution, we find that the null hypothesis of the Kolmogorov-Smirnov test for the collected data, with parameters 1/α = 40.72 and 1/β = 6.22, is rejected² at the 6% significance level, while it is not rejected for the shifted data, with parameters 1/α̂ = 32.05 and 1/β̂ = 5.11.
Equations (1) and (2) should then be replaced with

E[T] = t_m + (1/α̂) ∑_{i=1}^{s} 1/i ,   (3)

E[T_c] = t_m + (1/α̂) ∑_{i=1}^{s} 1/i + (1/β̂) ∑_{j=1}^{k} 1/j .   (4)
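Equation (4) reduces to harmonic sums, so it is straightforward to evaluate numerically. The sketch below plugs in the shifted-data parameters quoted for Experiment 7 (1/α̂ = 32.05 s, 1/β̂ = 5.11 s, k = 1); the values of s and t_m are assumptions for illustration only.

```python
# Harmonic number H_n = sum_{i=1}^{n} 1/i.
def harmonic(n: int) -> float:
    return sum(1.0 / i for i in range(1, n + 1))

s, k = 8, 1                 # s is assumed here; k = 1 as in Experiment 7
inv_alpha_hat = 32.05       # 1/alpha_hat, seconds (shifted-data estimate)
inv_beta_hat = 5.11         # 1/beta_hat, seconds (shifted-data estimate)
t_m = 10.0                  # hypothetical minimum delay, seconds

# Eq. (4): E[T_c] = t_m + (1/alpha_hat) H_s + (1/beta_hat) H_k
mean_Tc = t_m + inv_alpha_hat * harmonic(s) + inv_beta_hat * harmonic(k)
print(f"E[Tc] = {mean_Tc:.2f} s")
```

Dropping t_m and using the unshifted parameters instead recovers Eq. (2), which is what the shifted model corrects.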
The averages inferred from Eqs. (1)-(4) are listed in rows 3 and 5 of Table 3, and their relative errors with respect to the sample average are listed in rows 4 and 6 of the same table.
Observe that the inferred average improves across all experiments when considering shifted data. The best improvement is seen in Experiment 7. By considering that the shifted recovery time is hypo-exponentially distributed with s + 1 phases and parameters sα̂, (s−1)α̂, ..., α̂, β̂, the relative error on the inferred average drops from 30.1% to 2.6%.
The conclusion of this discussion is that the exponential assumption on the fragment download/upload time is met in most cases. The same assumption does not hold for the block download time. The recovery time and the block download time are well approximated by a hypo-exponential distribution in most cases.
5 Related work
Although the literature on modeling and simulating P2P sys-
tems and parallel downloading is abundant, the recovery pro-
cess in P2PSS is a subject that has not been analyzed.
In [2], the authors propose a multiple-access protocol to minimize the download time of a document from multiple mirror sites in parallel, using Tornado erasure codes based on the digital fountain idea. A document of size S_B is encoded on each mirror server with redundant information. The encoded document consists of n = s + r different fragments of size S_F, where nS_F > sS_F > S_B. To minimize the number of duplicated packets received at the requester, each mirror encodes the document with Tornado codes and generates all n fragments, then permutes the order of packets before sending, and finally starts to deliver the packets continuously to the
² Even though it is rejected, this distribution is still much closer to the empirical data than the exponential distribution.
requester of the document. The receiver can then reconstruct the document of size S_B after collecting s distinct packets of size S_F from the mirrors.
In [4], the authors have focused on the average download
time of each user in a P2P network while considering the het-
erogeneity of service capacities of peers. They point out that
the common approach of analyzing the average download time
based on average service capacity is fundamentally flawed.
The authors of [7] implement a BitTorrent file-sharing protocol in NS-2 and compare packet-level with flow-level simulation results for the download time of one file among an active peer set. They show that the propagation delay can significantly influence the download performance of BitTorrent.
6 Conclusion
This paper performs a simulation analysis of download and recovery processes in P2PSS. Implementing a storage protocol in NS-2, we set up seven simulations which enable us to collect fragment/block download times and recovery times under a variety of conditions. We show that the exponential assumption on the block download time does not hold. The same assumption on the fragment download/upload time is met in most cases, implying that both the block download time and the recovery process could be modeled by a hypo-exponential distribution with a pre-determined number of phases.
References
[1] S. Alouf, A. Dandoush, and P. Nain. Performance anal-
ysis of peer-to-peer storage systems. In Proc. of 20th
ITC, volume 4516 of Lecture Notes in Computer Science,
pages 642–653, Ottawa, Canada, June 2007.
[2] J. Byers, M. Luby, and M. Mitzenmacher. Accessing
multiple mirror sites in parallel: Using tornado codes to
speed up downloads. In Proc. of IEEE Infocom ’99, pages
21–25, New York, USA, March 1999.
[3] K. Calvert, M. Doar, and E. W. Zegura. Modeling In-
ternet topology. IEEE Communications Magazine, June
1997.
[4] Yuh-Ming Chiu and Do Young Eun. Minimizing
file download time in stochastic peer-to-peer networks.
IEEE/ACM Trans. Netw., 16(2):253–266, 2008.
[5] A. Dandoush, S. Alouf, and P. Nain. Performance analy-
sis of centralized versus distributed recovery schemes in
P2P storage systems. In Proc. of IFIP/TC6 NETWORK-
ING 2009, Aachen, Germany, May 11–15, 2009. to ap-
pear.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc., 39(1):1–37, 1977.
[7] K. Eger, T. Hoßfeld, A. Binzenhöfer, and G. Kunzmann. Efficient simulation of large-scale P2P networks: Packet-level vs. flow-level simulations. In Proc. of UPGRADE-CN'07, Monterey, California, USA, June 2007.
[8] K. Fall and K. Varadhan. The NS manual, the VINT project, UC Berkeley, LBL, USC/ISI, and Xerox PARC. http://www.isi.edu/nsnam/ns/ns-documentation.ht, November 2008.
[9] GÉANT: a pan-European backbone which connects Europe's national research and education networks. http://www.geant.net/server/show/nav.159.
[10] A. Guha, N. Daswani, and R. Jain. An experimental study of the Skype peer-to-peer VoIP system. In Proc. of 5th IPTPS, Santa Barbara, California, February 2006.
[11] P. Harrison and S. Zertal. Queueing models of RAID sys-
tems with maxima of waiting times. Performance Evalu-
ation Journal, 64(7-8):664–689, August 2007.
[12] T. Isdal, M. Piatek, A. Krishnamurthy, and T. Anderson. Leveraging BitTorrent for end host measurements. In Proc. 8th Passive and Active Measurement Conference, Louvain-la-Neuve, Belgium, April 2007.
[13] F. J. Massey. The Kolmogorov-Smirnov test for goodness of fit. J. Am. Statist. Assoc., 46(253):68–78, 1951.
[14] A. Medina, A. Lakhina, I. Matta, and J. Byers. Brite:
Boston University representative Internet topology gen-
erator. http://www.cs.bu.edu/brite/.
[15] M. Olsson. The EMpht-programme. Technical re-
port, Department of Mathematics, Chalmers University
of Technology, 1998.
[16] PlanetLab. An open platform for developing,
deploying, and accessing planetary-scale services.
http://www.planet-lab.org/, 2007.
[17] S. Ramabhadran and J. Pasquale. Analysis of long-
running replicated systems. In Proc. of IEEE Info-
com ’06, Barcelona, Spain, April 2006.
[18] I.S. Reed and G. Solomon. Polynomial codes over certain
finite fields. J. SIAM, 8(2):300–304, June 1960.
[19] Renater: Le Réseau National de télécommunications pour la Technologie, l'Enseignement et la Recherche. http://www.renater.fr.
[20] Bernard M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, 1988.
... For the data downloading and recovery processes, we developed a realistic simulation model and implemented it on top of the ns-2 network simulator [36]. This simulator is able to precisely predict the behavior of these processes, while considering the impact of several constraints such as peer heterogeneity and physical network topology [37]. ...
... Our second contribution in P2P storage systems appears in [37] and characterizes the distribution of download and recovery processes in P2P storage systems. To that end, we implemented the distributed storage protocol in the NS-2 network simulator (the details of our implementation are in [36]). ...
... Our experimental results reported in [37] indicate that the exponential assumption on fragments download/upload time is met in most cases. The same assumption does not hold on the block download time. ...
Thesis
Full-text available
This manuscript recollects some of my contributions since my PhD defense. These were achieved while I was a researcher at Inria Sophia Antipolis Méditerranée in the Project-Team Maestro. My research activities focus on the modeling and performance evaluation of networks. In fact, this term encompasses a wide variety of situations. One may consider a particular layer in the protocols stack like the access protocol to the communication channels, or focus on the application layer and study overlay networks or cache networks. As a researcher, I first became interested in controlling the access to the medium in wireless networks. Subsequently, I considered on one hand the power save mode of mobile devices and on the other hand the problem of evolutionary routing in networks with reduced connectivity. After studying the energy savings at mobile terminals, I turned my focus on the energy consumption of base stations in cellular networks and on the use of renewable energy sources for their electrical power supply. This line of research has led to the stochastic modeling of solar radiation in order to better take into account renewable energy sources in the performance evaluation of communications networks. Meanwhile, I developed a second line of research at the application level of communications systems. I was first interested in peer-to-peer storage systems. I then studied hierarchical networks of caches such as that of the Domain Name System (DNS). I have also done research work in the framework of partnership projects with private companies. I contributed to a study on the active management of flows at the core of the network (project with the former Alcatel-Lucent Bell Labs, now Nokia) and to the performance evaluation of urban train control based on wireless communications (project with Alstom Transport).
... Modeling a multi-phase service as a hypoexponential distribution is justified in some simulation work. The authors in [20] show that the block download time and the recovery time in a P2P Storage Systems essentially follow a hypoexponential distribution with many distinct phases. The delay on the transmission path in a mobile opportunistic network has been experimentally validated in [21,22] to follow a hypoexponential distribution. ...
... The state probability p K ,1 can be derived from either Eqs. (19) or (20). Either derivation yields to the same exact result, and this can be numerically proven. ...
... In fact, we develop a succinct algorithm (shown in Algorithm 1) to obtain recursively all state probabilities which are expressed in Eqs. (11)(12)(13)(14)(15)(16)(17)(18)(19)(20). Now, we can derive important performance measures for the system. ...
Article
Full-text available
Hypoexponential servers are commonly seen in today’s computer and communication networks whereby incoming packets are processed by the network server in multiple stages with each stage having a different processing time. This paper presents an analytical model to capture the behavior and subsequently analyze the performance of these network servers or similarly behaving systems. From our model, we derive key performance measures and features which include CPU utilization, system idleness, mean throughput, packet loss, mean system and queuing packet delays, and mean system and queue sizes. In addition, we present two popular finite queueing models (namely, M / D / 1 / K and M / M / 1 / K) to approximate our hypoexponential model. Results show that the both of these approximate models give close results when the system queue size is large.
... A first quantitative comparison between erasure coding and replication in self-repairing and fault resilient distributed storage systems can be found in Weatherspoon and Kubiatowicz (2002). The solution has been explored in peer to peer systems: in Kameyama and Sato (2007) a family of erasure codes with a very low overhead factor is applied to distributed storage to show the appropriateness of this approach; in Dandoush et al. (2009) the problem of lost data blocks reconstruction in distributed storage systems is studied in peer to peer architectures, providing an interesting characterization in terms of empirical distributions obtained by means of event based simulations; a further application is in Aguilera et al. (2005); an analysis of the combined application of redundancy and erasure coding in DHT is given in Wu et al. (2005); in another analysis (Rodrigues and Liskov (2005)) the authors evaluate the performances of DHT by an analytical approach based on traces of 3 different applications, and conclude that results are actually confirming the expected advantages, but cost an excess of complexity in the design of the overall system because of the implementation of erasure coding. ...
... We suppose that the user sends its request in parallel to all the nodes having the chunk, and then that she reads or write it on the node that answered first. We suppose that chunks are requested at a rate c = 0.02 requests per hour, and that the time required to access a block follows a Normal distribution (see Dandoush et al. (2009)). We set the mean transfer time to µ = 100 msec., and its standard deviation σ = 25 msec. ...
Article
Replication of Data Blocks is one of the main technologies on which Storage Systems in Cloud Computing and Big Data Applications are based. With the heterogeneity of nodes, and an always-changing topology, keeping the reliability of the data contained in the common large-scale distributed file system is an important research challenge. Common approaches are based either on replication of data or erasure codes. The former stores each data block several times in different nodes of the considered infrastructures: the drawback is that this can lead to large overhead and non-optimal resources utilization. Erasure coding instead exploits Maximum Distance Separable codes that minimize the information required to restore blocks in case of node failure: this approach can lead to increased complexity and transfer time due to the fact that several blocks, coming from different sources, are required to reconstruct lost information. In this paper we study, by means of discrete event simulation, the performances that can be obtained by combining both techniques, with the goal of minimizing the overhead and increasing the reliability while keeping the performances. The analysis proves that a careful balance between the application of replication and erasure codes significantly improves reliability and performances avoiding large overheads with respect to the isolated use of replication and redundancy.
... To understand how the recovery process could be better modeled, they have implemented this process in the ns-2 network simulator (cf. [31]) and have performed an intensive simulation analysis of it in [69], [118]. ...
... Building on the findings in [69], [118], A. Dandoush, S. Alouf and P. Nain develop in [68] Markovian models assuming that the fragment download/upload time is exponentially distributed so that the recovery time follows a hypo-exponential distribution with many distinct phases. They find in particular that a distributed recovery scheme is a good implementation choice only in large networks where peers have a good availability. ...
Technical Report
Full-text available
MAESTRO is an INRIA project-team whose members are located in Sophia Antipolis (S. Alouf, K. Avrachenkov, P. Nain, G. Neglia), at LIA in Avignon (E. Altman) and at LIRMM in Montpellier (A.-E. Baert and A. Jean-Marie). MAESTRO is concerned with the modeling, performance evaluation, optimization and control of stochastic Discrete-Event Dynamical Systems (DEDS), with a particular emphasis on networks and their applications. The scientific contributions are both theoretical, with the development of new modeling formalisms, and applied, with the development of software tools for the performance evaluation of DEDS.
... A solution with a massively distributed approach that aims at cost reduction is presented in [20], in which an analytical model is used to analyze the characterization of lost chunks reconstruction processes. Erasure coding has been extensively studied, with its applications, specially in peer to peer systems: [1] provides a good application, [22] presents a quantitative evaluation of the benefits deriving from the adoption of erasure coding and replication strategies in resilient storage subsystems, while [13] presents a low overhead erasure codes family applied to peer to peer systems and [9] presents an empirical simulation based statistical analysis of reconstruction of data blocks in peer to peer architectures. Applications to distributed hash tables can be found in [24] and [17]. ...
Chapter
The efficiency of storage systems is a key factor to ensure sustainability in data centers devoted to provide cloud services. A proper management of storage infrastructures can ensure the best trade off between costs, reliability and quality of service, enabling the provider to be competitive in the market. Heterogeneity of nodes, and the need for frequent expansion and reconfiguration of the subsystems fostered the development of efficient approaches that replace traditional data replication, by exploiting more advanced techniques, such the ones that leverage erasure codes. In this paper we use an ad-hoc discrete event simulation approach to study the performances of replication and erasure coding with different parametric configurations, aiming at the minimization of overheads while obtaining the desired reliability. The approach is demonstrated with a practical application to the erasure coding plugins of the increasingly popular CEPH distributed file system.
... Dandoush et al. in [37] perform a simulation study of the download and the repairing process. They use the NS2 simulator to measure the distribution of the repair time. ...
Article
Large scale peer-to-peer systems are foreseen as a way to provide highly reliable data storage at low cost. To ensure high durability and high resilience over a long period of time the system must add redundancy to the original data. It is well-known that erasure coding is a space efficient solution to obtain a high degree of fault-tolerance by distributing encoded fragments into different peers of the network. Therefore, a repair mechanism needs to cope with the dynamic and unreliable behavior of peers by continuously reconstructing the missing redundancy. Consequently, the system depends on many parameters that need to be well tuned, such as the redundancy factor, the placement policies, and the frequency of data repair. These parameters impact the amount of resources, such as the bandwidth usage and the storage space overhead that are required to achieve a desired level of reliability, i.e., probability of losing data. This thesis aims at providing tools to analyze and predict the performance of general large scale data storage systems. We use these tools to analyze the impact of different choices of system design on different performance metrics. For instance, the bandwidth consumption, the storage space overhead, and the probability of data loss should be as small as possible. Different techniques are studied and applied. First, we describe a simple Markov chain model that harnesses the dynamics of a storage system under the effects of peer failures and of data repair. Then we provide closed-form formulas that give good approximations of the model. These formulas allow us to understand the interactions between the system parameters. Indeed, a lazy repair mechanism is studied and we describe how to tune the system parameters to obtain an efficient utilization of bandwidth. We confirm by comparing to simulations that this model gives correct approximations of the system average behavior, but does not capture its variations over time. 
We then propose a new stochastic model based on a fluid approximation that indeed captures the deviations around the mean behavior. These variations are most of the time neglected by previous works, despite being very important to correctly allocate the system resources. We additionally study several other aspects of a distributed storage system: we propose queuing models to calculate the repair time distribution under limited bandwidth scenarios; we discuss the trade-offs of a Hybrid coding (mixing erasure codes and replication); and finally we study the impact of different ways to distribute data fragments among peers, i.e., placement strategies.
... Dandoush et al. in [DAN09] perform a simulation study of the download and the repairing process. They use the NS2 simulator to measure the distribution of the repair time. ...
Article
In this thesis we study multiple approaches to efficiently accommodating for the future growth of the Internet. The exponential growth of Internet traffic, reported to be as high as 41% in peak throughput in 2012 alone, continues to pose challenges to all interested parties. Therefore, to accommodate the growth, smart management and communication protocols are needed. The basic protocols of the Internet are point-to-point in nature. However, the traffic is largely broadcasting, with projections stating that as much as 80-90% of it will be video by 2016. This discrepancy leads to inefficiency, where multiple copies of essentially the same messages travel in parallel through the same links. In this thesis we study multiple approaches to mitigating this inefficiency. The contributions are organized by layers and phases of the network life. We look into optimal cache provisioning during network design. Next, we move to managing an existing network. We look into putting devices to sleep mode, using caching and cooperation with Content Distribution Networks. In the application layer, we look into maintaining balanced trees for media broadcasting. Finally, we analyze data survivability in a distributed backup system, which can reduce network traffic by putting the backups closer to the client than if using a data center. Our work is based on both theoretical methods, like Markov chains and linear programming, as well as empirical tools, like simulation and experimentation.
Article
Full-text available
For storage and recovery requirements on large-scale seismic waveform data of the National Earthquake Data Backup Center (NEDBC), a distributed cluster processing model based on Kafka message queues is designed to optimize the inbound efficiency of seismic waveform data stored in HBase at NEDBC. Firstly, compare the characteristics of big data storage architectures with that of traditional disk array storage architectures. Secondly, realize seismic waveform data analysis and periodic truncation, and write HBase in NoSQL record form through Spark Streaming cluster. Finally, compare and test the read/write performance of the data processing process of the proposed big data platform with that of traditional storage architectures. Results show that the seismic waveform data processing architecture based on Kafka designed and implemented in this paper has a higher read/write speed than the traditional architecture on the basis of the redundancy capability of NEDBC data backup, which verifies the validity and practicability of the proposed approach.
Chapter
Big Data applications provide new, disruptive tools to advance our knowledge about the mechanisms that characterize complex aspects of reality. Be it a high energy physics experiment or an analysis of social networks data, the strength of the approach is the availability of a huge richness of data; but, at the same time, it is also the main challenge, as this abundance of information must be processed at a bearable cost per information unit and requires higher scale systems to provide enough computing power. This is only possible if the Big Data platform is properly managed and exploited according to the needs of the applications, and a fundamental premise is the capability for a proper performance evaluation of the platform. In this chapter, we provide a glance over the main aspects of performance evaluation for Big Data architectures, together with some examples of model-based evaluation, in order to show how it is possible to characterize big scale architectures to support their correct management, and suggest a methodological coarse grain solution to exploit different conceptual and technical tools to integrate a flexible, model-based, performance analysis supported approach to Big Data systems design, capable of scaling up easily in the core evaluation stage means of Markovian agents.
Conference Paper
We describe a scalable server-array based testbed for simulating various usage scenarios of peer-to-peer (P2P) overlay networks. Each server is responsible for a subset of the simulated peer processes, managed by the mechanisms presented herein. The system follows a star topology, where one master server acts as a point of control for a set of slave servers. We present both the structure of the system and the activities needed before, during, and after a simulation run in order to accomplish automated simulations, where each interesting combination of the variable parameters of the overlay network is evaluated. The functionality of the control scripts is explained in detail. Among other things, the system sets up the required start conditions for a P2P overlay simulation, manages the online-time and the specific P2P activities of each simulated peer, and facilitates the handling of the generated log files, from which the result statistics are derived. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
Technical Report
Full-text available
This report evaluates and compares the performance of two schemes for recovering lost data in a peer-to-peer (P2P) storage systems. The first scheme is centralized and relies on a server that recovers multiple losses at once, whereas the second one is distributed. By representing the state of each scheme by an absorbing Markov chain, we are able to compute their performance in terms of the delivered data lifetime and data availability. Numerical computations are provided to better illustrate the impact of each system parameter on the performance. Depending on the context considered, we provide guidelines on how to tune the system parameters in order to provide a desired data lifetime.
Article
Full-text available
Despite its popularity, relatively little is known about the traffic characteristics of the Skype VoIP system and how they differ from other P2P systems. We describe an experimental study of Skype VoIP traffic conducted over a five month period, where over 82 mil-lion datapoints were collected regarding the population of online clients, the number of supernodes, and their traffic characteristics. This data was collected from September 1, 2005 to January 14, 2006. Experiments on this data were done in a black-box manner, i.e., without knowing the internals or specifics of the Skype system or messages, as Skype encrypts all user traffic and signaling traffic payloads. The results indicate that although the structure of the Skype system appears to be similar to other P2P systems, particu-larly KaZaA, there are several significant differences in traffic. The number of active clients shows diurnal and work-week behavior, correlating with normal working hours regardless of geography. The population of supernodes in the system tends to be relatively stable; thus node churn, a significant concern in other systems, seems less problematic in Skype. The typical bandwidth load on a supernode is relatively low, even if the supernode is relaying VoIP traffic. The paper aims to aid further understanding of a significant, successful P2P VoIP system, as well as provide experimental data that may be useful for future design and modeling of such sys-tems. These results also imply that the nature of a VoIP P2P system like Skype differs fundamentally from earlier P2P systems that are oriented toward file-sharing, and music and video download appli-cations, and deserves more attention from the research community.
Conference Paper
Full-text available
The growing interest in peer-to-peer systems (such as Gnutella) has inspired numerous research activities in this area. Although many demonstrations have been performed that show that the performance of a peer-to-peer system is highly dependent on the underlying network characteristics, much of the evaluation of peer-to-peer proposals has used simplified models that fail to include a detailed model of the underlying network. This can be largely attributed to the complexity in experimenting with a scalable peer-to-peer system simulator built on top of a scalable network simulator with packet-level details. In this work we design and develop a framework for an extensible and scalable peer-to-peer simulation environment that can be built on top of existing packet-level network simulators. The simulation environment is portable to different network simulators, which enables us to simulate a realistic large scale peer-to-peer system using existing parallelization techniques. We demonstrate the use of the simulator for some simple experiments that show how Gnutella system performance can be impacted by the network characteristics.
Article
The test is based on the maximum difference between an empirical and a hypothetical cumulative distribution. Percentage points are tabled, and a lower bound to the power function is charted. Confidence limits for a cumulative distribution are described. Examples are given. Indications that the test is superior to the chi-square test are cited.
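The maximum-difference statistic described in this abstract can be computed directly from a sorted sample. The sketch below is illustrative (the function name, hypothesised exponential CDF, and sample values are our own, not from the cited work); note that the empirical CDF jumps at each data point, so both sides of the jump must be checked.

```python
import math

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDF of `sample` and the
    hypothesised CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, f - i / n, (i + 1) / n - f)
    return d

# Example: compare a small sample against an Exponential(1) hypothesis.
exp_cdf = lambda x: 1.0 - math.exp(-x)
sample = [0.1, 0.5, 0.9, 1.4, 2.2]
D = ks_statistic(sample, exp_cdf)
```

The statistic `D` would then be compared against the tabled percentage points mentioned in the abstract to accept or reject the hypothesis.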
Article
A queueing model is developed that approximates the effect of synchronizations at parallel service completion instants. Exact results are first obtained for the maxima of independent exponential random variables with arbitrary parameters, and this is followed by a corresponding approximation for general random variables, which reduces to the exact result in the exponential case. This approximation is then used in a queueing model of RAID (Redundant Array of Independent Disks) systems, in which accesses to multiple disks occur concurrently and complete only when every disk involved has completed. We consider the two most common RAID variants, RAID0-1 and RAID5, as well as a multi-RAID system in which they coexist. This can be used to model adaptive multi-level RAID systems in which the RAID level appropriate to an application is selected dynamically. The random variables whose maximum has to be computed in these applications are disk response times, which are modelled by the waiting times in M/G/1 queues. To compute the mean value of their maximum requires the second moment of queueing time and we obtain this in terms of the third moment of disk service time, itself a function of seek time, rotational latency and block transfer time. Sub-models for these quantities are investigated and calibrated individually in detail. Validation against a hardware simulator shows good agreement at all traffic intensity levels, including the threshold for practical operation above which performance deteriorates sharply.
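The exact result for the maxima of independent exponential random variables mentioned above can be obtained by inclusion-exclusion over subsets of the rates: E[max] = sum over non-empty subsets S of (-1)^(|S|+1) / sum of the rates in S. A small sketch of that standard formula (the function name is ours):

```python
from itertools import combinations

def mean_max_exponentials(rates):
    """Exact mean of the maximum of independent exponential random
    variables with the given rates, via inclusion-exclusion over
    non-empty subsets of rates."""
    total = 0.0
    for k in range(1, len(rates) + 1):
        for subset in combinations(rates, k):
            # Each subset of size k contributes (-1)^(k+1) / sum(rates in S).
            total += (-1) ** (k + 1) / sum(subset)
    return total

# Sanity check: with equal rates, E[max of n] is the n-th harmonic
# number divided by the common rate.
m = mean_max_exponentials([1.0, 1.0, 1.0])  # -> 1 + 1/2 + 1/3 ≈ 1.8333
```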
Article
Summary: A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
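In the context of the surveyed paper, EM is used to fit exponential-mixture distributions to empirical download times. As a minimal sketch of the idea (not the cited algorithm's general form), the following fits a two-phase hyper-exponential mixture p*Exp(l1) + (1-p)*Exp(l2); the initialisation and iteration count are arbitrary illustrative choices.

```python
import math

def em_hyperexp2(data, iters=200):
    """EM fit of a two-phase hyper-exponential mixture to positive data:
    density p*l1*exp(-l1*x) + (1-p)*l2*exp(-l2*x)."""
    p, l1, l2 = 0.5, 1.0 / min(data), 1.0 / max(data)  # crude initialisation
    for _ in range(iters):
        # E-step: posterior probability that each point came from phase 1.
        r = []
        for x in data:
            a = p * l1 * math.exp(-l1 * x)
            b = (1 - p) * l2 * math.exp(-l2 * x)
            r.append(a / (a + b))
        # M-step: closed-form updates of the mixture parameters.
        s = sum(r)
        p = s / len(data)
        l1 = s / sum(ri * x for ri, x in zip(r, data))
        l2 = (len(data) - s) / sum((1 - ri) * x for ri, x in zip(r, data))
    return p, l1, l2

data = [0.05, 0.1, 0.3, 0.4, 0.7, 0.9, 1.2, 2.5]
p, l1, l2 = em_hyperexp2(data)
```

A useful invariant of the M-step is that the fitted mixture mean p/l1 + (1-p)/l2 matches the sample mean exactly, which makes a quick sanity check possible.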
Conference Paper
We address the problem of using replication to reliably maintain state in a distributed system for time spans that far exceed the lifetimes of individual replicas. This scenario is relevant for any system comprised of a potentially large and selectable number of replicated components, each of which may be highly unreliable, where the goal is to have enough replicas to keep the system "alive" (meaning at least one replica is working or available) for a certain expected period of time, i.e., the system's lifetime. In particular, this applies to recent efforts to build highly available storage systems based on the peer-to-peer paradigm. We model notions of replica loss and replica repair in such systems by a simple Markov chain model, and derive an expression for the lifetime of the replicated state. We then apply this model to study the impact of practical considerations like storage and bandwidth limits on the system, and describe methods to optimally choose system parameters so as to maximize lifetime. Our analysis sheds light on the efficacy of various replication strategies.
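A simplified instance of the replica loss-and-repair model described above is a birth-death Markov chain on the number of live replicas, absorbed when all replicas are lost. The sketch below computes the mean lifetime via a first-step recursion; the single-repair-at-a-time assumption and per-replica failure rate are our own simplifications, not necessarily the cited model's.

```python
def expected_lifetime(n, fail_rate, repair_rate):
    """Mean time until all n replicas are lost, in a birth-death chain
    where each of the i live replicas fails at rate fail_rate and a
    single repair proceeds at rate repair_rate (no repair at state n).

    h[i] is the expected time to go from i live replicas down to i-1;
    first-step analysis gives h[i] = 1/(i*f) + (r/(i*f)) * h[i+1]."""
    h = [0.0] * (n + 1)
    h[n] = 1.0 / (n * fail_rate)   # at full replication only failures occur
    for i in range(n - 1, 0, -1):
        mu = i * fail_rate         # aggregate failure rate with i replicas
        h[i] = 1.0 / mu + (repair_rate / mu) * h[i + 1]
    return sum(h[1:])              # lifetime = time to walk from n down to 0

# Without repair this reduces to the pure-death case:
# expected_lifetime(2, 1.0, 0.0) = 1/2 + 1 = 1.5
```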