Simulation Analysis of Download and Recovery Processes in P2P Storage
Systems
Abdulhalim Dandoush, Sara Alouf and Philippe Nain
INRIA Sophia Antipolis – B.P. 93 – 06902 Sophia Antipolis, France {adandous, salouf, nain}@sophia.inria.fr
July 10, 2019
Abstract
Peer-to-peer storage systems rely on data fragmentation and
distributed storage. Unreachable fragments are continuously
recovered, requiring multiple fragments of data (constituting
a “block”) to be downloaded in parallel. Recent modeling ef-
forts have assumed the recovery process to follow an exponen-
tial distribution, an assumption made mainly in the absence of
studies characterizing the “real” distribution of the recovery
process. This work aims at filling this gap through a simu-
lation study. To that end, we implement the distributed stor-
age protocol in the NS-2 network simulator and run a total of
seven experiments covering a large variety of scenarios. We
show that the fragment download time follows approximately
an exponential distribution. We also show that the block down-
load time and the recovery time essentially follow a hypo-
exponential distribution with many distinct phases (maximum
of as many exponentials). We use expectation maximization
and least square estimation algorithms to fit the empirical dis-
tributions. We also provide a good approximation of the num-
ber of phases of the hypo-exponential distribution that applies
in all scenarios considered. Last, we test the goodness of our
fits using statistical (Kolmogorov-Smirnov test) and graphical
methods.
1 Introduction
The peer-to-peer (P2P) model has proved to be an alternative
to the Client/Server model and a promising paradigm for Grid
computing, file sharing, voice over IP, backup and storage ap-
plications. A major advantage of P2P systems is that peers
can build a virtual overlay network on top of the existing architecture and topology. Each peer receives/provides a service from/to other peers through the overlay network; examples of such services are sharing the capacity of its central processing unit, sharing its bandwidth capacity, sharing its free storage space, and sharing local information about neighbors to help peers locate resources.
P2P storage systems (P2PSS) have emerged as a cheap, scal-
able and self-repairing solution. Such distributed systems rely
on data fragmentation and distributed storage. Files are par-
titioned into fixed-size blocks that are themselves partitioned
into fragments. Fragments are usually stored on different
peers. Given this configuration, a user wishing to retrieve a given piece of data would need to perform multiple downloads, generally in parallel for an enhanced service. To mitigate churn
of peers, redundant fragments are continuously injected in the
system, thus maintaining data redundancy above a minimum
desired level. When the amount of unreachable fragments at-
tains a predefined threshold, a recovery process is initiated.
In this paper, we consider systems relying on erasure codes to generate the redundant fragments. If s denotes the initial number of fragments and r denotes the amount of additional redundant fragments, then any s out of the s+r fragments can be used to generate a new redundant fragment (e.g. [18]). Observe that this notation covers the case of replication-based systems, with s = 1 and r denoting the number of replicas.
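To make the any-s-out-of-(s+r) property concrete, here is a toy sketch of an MDS erasure code based on polynomial evaluation (a Reed-Solomon-style construction over the rationals; the code and its parameters are ours for illustration, not the codes used in the paper):

```python
from fractions import Fraction
from itertools import combinations

def poly(coeffs, x):
    # Evaluate the polynomial whose coefficients are the s data symbols.
    return sum(Fraction(c) * x**k for k, c in enumerate(coeffs))

def rebuild(points, x):
    # Lagrange interpolation: any s points of a degree-(s-1) polynomial
    # determine it, so we can re-evaluate it at any other point x.
    total = Fraction(0)
    for xi, yi in points:
        term = Fraction(yi)
        for xj, _ in points:
            if xj != xi:
                term *= Fraction(x - xj, xi - xj)
        total += term
    return total

s, r = 4, 2
data = [7, 3, 9, 2]                                        # s data symbols
frags = [(x, poly(data, x)) for x in range(1, s + r + 1)]  # s+r stored fragments

# Any s of the s+r fragments suffice to regenerate any fragment.
for subset in combinations(frags, s):
    for x, y in frags:
        assert rebuild(list(subset), x) == y
```

Real deployments work over finite fields (e.g. GF(2^8)) for compactness, but the s-out-of-(s+r) recovery property illustrated here is the same.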
The recovery process includes the download of a full block of s fragments. P2PSS may rely on a central authority that initiates the recovery process when necessary. This central authority could reconstruct all missing fragments of a given block of data and remotely store them on as many new peers.1 Alternatively, secure agents running on new peers could reconstruct by themselves the missing fragments to be stored on the peers' disks. A more detailed description of P2PSS, their recovery schemes and their policies is presented in Section 2.
1.1 Motivation
There have been recent modeling efforts focusing on the per-
formance analysis of P2PSS in terms of data durability and
data availability. In [17], Ramabhadran and Pasquale analyze
systems that use full replication for data reliability. They de-
velop a Markov chain analysis, then derive an expression for
the lifetime of the replicated state and study the impact of
bandwidth and storage limits on the system. This study relies
on the assumption that the recovery process follows an expo-
nential distribution. Observe that in replication-based systems,
the recovery process lasts mainly for the download of one frag-
ment of data. In other words, the authors of [17] are implicitly assuming that the fragment download time is exponentially distributed.

1 By "new" peers, we refer to peers that do not already store fragments of the same block. We assume the system enforces the rule that a peer can store at most one fragment of any given block.
In our previous work [1], we developed a more general
model than that in [17], which applies to both replicated and
erasure-coded P2PSS. Also, unlike [17], our model accounts
for transient disconnections of peers, namely, the churn in the
system. We also assumed the recovery process to be expo-
nentially distributed. However, this assumption differs sub-
stantially between replicated and erasure-coded P2PSS, as in
the latter systems the recovery process is much more complex
than in the former. Furthermore, the recovery process differs between the centralized and the distributed implementations.
In both studies, findings and conclusions rely on the as-
sumption that the recovery process is exponentially distributed.
However, this assumption is not supported by any experimental
data. To the best of our knowledge, there has been no simula-
tion study characterizing this process in real P2PSS.
This work aims at filling this gap through a simulation anal-
ysis. We believe it is essential to characterize the distribution of
download and recovery processes in P2PSS. Evaluating these
distributions is crucial to validate (or invalidate) the results pre-
sented in the above works and to better understand the avail-
ability and durability of data in these systems.
We will show through intensive simulations of many re-
alistic scenarios that (i) the fragment download time follows
closely an exponential distribution and (ii) fragment download
times are weakly correlated. Given that in erasure-coded sys-
tems, the block download time consists of downloading sev-
eral fragments in parallel, it follows that the recovery process
should follow approximately a hypo-exponential distribution
of several phases. (This is nothing but the sum of several in-
dependent random variables exponentially distributed having
each its own rate [11].) We will show that this is indeed the
case in the simulated data. We find that, in erasure-coded sys-
tems, the exponential assumption made on the block download
time and on the recovery process is not met in most cases that
we considered. Our results suggest that the models presented
in [1] give accurate results on data durability and availability
only in replicated P2PSS (as in [17]); the case of erasure-coded systems was not accurately captured.
Building on the results of this paper, we incorporated into the model of [1] the assumption that fragment download and upload times are exponentially distributed with parameters α and β, respectively. The resulting models, which appeared in [5], characterize data lifetime and availability in P2PSS that use either replication or erasure codes, under more realistic assumptions as supported by this paper.
1.2 Why do we use simulations?
To collect traces of fragment download/upload times, of block
download times and of recovery times, one can choose to per-
form simulations or experimentations either on testbeds or on
real networks. We would like to consider situations where
peers are either homogeneous or heterogeneous, different un-
derlying network topologies, and different propagation delays
in the network. Also, we would like to consider systems with
a large number of peers. To achieve all this with experiments
over real networks is very difficult. Setting up experiments over a dedicated network like PlanetLab [16] would require a long time, and there would be limitations on changing the topology and the peers' characteristics. We find it most attractive to implement the distributed storage protocol in a well-known network simulator and to simulate different scenarios. We chose NS-2 because it is an open-source discrete-event simulator targeted at networking research. NS-2 provides substantial support for the simulation of TCP and routing, and it is well known and well validated.
1.3 Contributions
Our contributions are as follows:

- Implementation of the download and recovery processes in NS-2.
- Evaluation of the fragment/block download time and the recovery process under a variety of conditions: different network topologies, heterogeneity of peers, different propagation delays, and centralized vs. distributed recovery.
- Data fitting: the distributions of the block download time and the recovery time are fitted using the Expectation Maximization (EM) algorithm, and the distribution of the fragment download time is fitted using Maximum Likelihood Estimation (MLE) and Least Square Estimation (LSE).
- A statistical goodness-of-fit test, namely the Kolmogorov-Smirnov test [13].
The rest of this paper is organized as follows. Section 2
overviews the storage protocol that we consider. Section 3 de-
scribes the simulation architecture, the methodology and the
setup of the simulations. In Section 4, some of our experi-
mental results are discussed. Section 5 briefly reviews related
work. Last, Section 6 concludes the paper.
2 System Description
We describe in this section the storage protocol that we want to simulate:

- Files are partitioned into fixed-size blocks (the block size is SB) that are themselves partitioned into s fragments (the fragment size is SF).
- Each block is stored as a total of s+r fragments, r of them redundant and generated using erasure codes. Fixing the block and fragment sizes fixes the values of the parameters s and r for all stored blocks in the system.
- These s+r fragments are stored on s+r different peers.
- Mainly for privacy issues, a peer can store at most one fragment of any block of data.
- The system has perfect knowledge of the location of fragments at any given time, e.g. by using a Distributed Hash Table (DHT) or a central authority. Only the latest known location of each fragment is tracked, whether that is a connected or a disconnected peer.
- To overcome churn and maintain data reliability and availability, unreachable fragments are continuously recovered.
- The number of connected peers at any time is typically much larger than the number of fragments associated with a block of data, i.e., s+r. Therefore, there are always at least s+r new connected peers ready to receive and store fragments of a block of data.
- Once an unreachable fragment is recovered, any other copy of it that "reappears" in the system due to a peer reconnection is simply ignored, as only one location of the fragment (the newest one) is recorded in the system. Similarly, if a fragment is unreachable, the system knows of only one disconnected peer that stores the unreachable fragment.
Two implementations of the recovery process are considered. This process is triggered for each block whose number of unreachable fragments reaches a threshold k.

In the centralized implementation, a central authority will: (1) download in parallel s fragments from the peers that are connected, (2) reconstruct at once all unreachable fragments (by now considered as missing), and (3) upload them all in parallel onto as many new peers for storage. In fact, Step 2 executes in a negligible time compared to the execution times of Steps 1 and 3. Step 1 (resp. Step 3) completes when the last fragment finishes being downloaded (resp. uploaded).
In the distributed implementation, a secure agent on one new peer is notified of the identity of one of the k unreachable fragments, for it to reconstruct. Upon notification, the secure agent (1) downloads s fragments from the peers that are connected to the storage system, (2) reconstructs the specified fragment and stores it on the peer's disk, and (3) discards the s downloaded fragments so as to meet the privacy constraint that only one fragment of a block of data is held by a peer. This operation iterates as long as at least k fragments are sensed unreachable, and stops once the number of missing fragments reaches k-1. The recovery of one fragment lasts mainly for the execution time of Step 1; the recovery completes as soon as the last fragment (out of s) finishes being downloaded.

When k = 1, the recovery process is said to be eager; when k ∈ {2, ..., r}, it is said to be lazy.
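The threshold rule and the stopping condition of the distributed implementation can be summarized in a few lines (a sketch of our own, not the paper's NS-2 code):

```python
def needs_recovery(unreachable: int, k: int) -> bool:
    # A block's recovery is triggered once k of its fragments are unreachable.
    return unreachable >= k

def distributed_recovery_rounds(missing: int, k: int) -> int:
    # Each round, one secure agent downloads s fragments and rebuilds one
    # missing fragment; iteration stops once only k-1 fragments are missing.
    rounds = 0
    while missing >= k:
        missing -= 1
        rounds += 1
    return rounds

assert needs_recovery(1, k=1) and not needs_recovery(1, k=2)
assert distributed_recovery_rounds(missing=5, k=1) == 5  # eager: rebuild all
assert distributed_recovery_rounds(missing=5, k=3) == 3  # lazy: stop at k-1 = 2
```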
3 Simulation Architecture and Setup
3.1 Architecture and Assumptions Overview
We implemented an application layer and a wrapper layer in NS-2 (version 2.33), following the architecture depicted in Fig. 1. The application layer represents the P2PSS application. As for the wrapper layer, it is an intermediate layer that passes data between a transport agent object in NS-2 and the P2PSS application.
We made minor changes to the following NS-2 files: node.cc, node.h, agent.cc, agent.h, tcp-full.cc, and tcp-full.h. We use FullTcp since it supports bidirectional data transfers. We follow the same methodology as the Web cache application presented in the NS Manual (cf. [8, Chap. 40]) and use some of the technical ideas presented in [7].
Implementing a new protocol at the application level of NS-
2 is very well documented in the NS Manual. We will therefore
skip the description of technical details of the implementation,
and refer the interested reader to [8, pp. 344–360].
We consider two different storage applications: a backup-like application and an e-library-like application ("e" stands for "electronic"). In the first, a file stored in the system can be requested for retrieval only by the peer that has produced the file. In the second, any file can be downloaded by any peer in the system. In both applications, the storage protocol follows the description of Section 2. In particular, the s+r peers associated with a given block are chosen uniformly among the peers in the system.
Two types of requests are issued in the system. The first type is issued by the users of the system: a user issues a request to retrieve one of its files in the backup-like application, or a public document in the e-library-like application. The second type consists of management requests. These are issued by the central authority (in the centralized implementation of the recovery process) or by a peer (in the distributed implementation) as soon as the threshold k is reached for any stored block of data.
Figure 1: Simulator architecture. The P2PSS Application exchanges application data with the P2PSS Agent Wrapper through send_data(AppData) and process_data(AppData); the wrapper in turn exchanges bytes with the FullTcp transport Agent through send(bytes) and recv(bytes), and the agent sends and receives packets.

File download requests are translated into (i) a request to the directory service to obtain, for each block of the desired file, a list of at least s peers that store fragments of the block, and (ii) the opening of TCP connections with each peer in the said list to download one fragment. All download requests issued by a given peer form a Poisson process.
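Such a Poisson request stream is easy to generate from i.i.d. exponential inter-arrival times; the following sketch (ours, with an illustrative rate) shows the construction:

```python
import random

def poisson_request_times(rate, horizon, seed=0):
    # Request epochs of one peer: a Poisson process of the given rate,
    # obtained by accumulating exponential inter-arrival times.
    rng = random.Random(seed)
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate)
        if t > horizon:
            return times
        times.append(t)

# Illustrative rate: one request every 80 minutes on average.
times = poisson_request_times(rate=1 / 80.0, horizon=8e5)
assert abs(times[-1] / len(times) - 80.0) < 5  # empirical mean gap near 1/rate
```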
Recovery requests are issued only in the scenarios where there is churn in the network. A recovery request concerning a given block translates into (i) a request to the directory service to obtain a list of at least s peers that store fragments of the said block, and (ii) the opening of TCP connections with each peer in the said list to download one fragment. Once all s fragments have been downloaded, the process proceeds with Steps 2 and 3, according to the implementation, as explained in Section 2.
All peers in the simulator have the architecture reported in
Fig. 1. Peers share their available upload bandwidths and their
free storage volumes.
3.2 Network Topology
Having a representative view of enterprise networks or the Internet topology is very important for a simulator to predict the behavior of a network protocol or application if it were to be deployed. In fact, the simulated topology often influences the outcome of the simulations. Realistic topologies are thus needed to produce realistic simulation results. Most existing simulation studies have used representations of a real topology (e.g. the Arpanet), simple models (e.g. a star topology), or random flat (i.e. non-hierarchical) graphs generated by Waxman's edge-probability function [20].
However, random models offer very little control over the structure of the resulting topologies. In particular, they do not capture the hierarchy that is present in the Internet. Recently, tools such as BRITE [14] and GT-ITM [3] have been designed to generate more complex, hierarchical random graphs that better approximate the Internet's hierarchical structure.
To produce realistic topologies for our simulations, we use
the tool GT-ITM [3] to generate a total of six random graphs.
Three levels of hierarchy are used corresponding to transit do-
mains, stub domains, and local area networks (LANs) attached
to stub domains. Each graph has one transit domain of four
nodes; each of the nodes is connected to two or three other
transit nodes. Each transit node is connected on average to two
stub nodes, and each stub node is in turn connected on average
to four routers. Behind every router there is a certain number of
fully-connected peers constituting a LAN. The first of these six
graphs is depicted in Fig. 2, where we use the notation TN for "transit node" and SN for "stub node". The total number of peers in each graph is in the set {480, 640, 800, 960}.
3.3 Experiments Setup
We ran a total of seven experiments. Experiments 1–6 used the random graphs generated with the GT-ITM tool as detailed before, whereas a simple star topology is used in Experiment 7.

Figure 2: Three-level hierarchical random graph of Experiment 1, with transit nodes (TN), stub nodes (SN), LAN routers and peers; TN-TN links run at 1 Gbps, TN-SN links at 622 Mbps, and SN-router links at 34 to 76 Mbps.

Regarding the intra- and inter-domain capacities, we rely on the information provided by the RENATER [19] and GÉANT [9] web sites. In those networks, the links are well provisioned. To have a more complete study, we will consider, in Experiments 5 and 6, links with smaller capacities, as can be seen in rows 4–6 of Table 1. Propagation delays over TN-SN edges vary from edge to edge, as can be seen in row 7 of Table 1.
Let Cu and Cd denote respectively the upload and download capacity of a peer. To set these values, we rely mainly on the findings of [10] and [12]. The experimental study of file sharing systems and of the Skype P2P voice over IP system [10] found that more than 90% of users have an upload capacity Cu between 30Kbps and 384Kbps. However, the measurement study [12] done on BitTorrent clients in 2007 reports that 70% of peers have an upload capacity Cu between 350Kbps and 1Mbps, and that 10% of peers even have an upload capacity between 10Mbps and 110Mbps. The capacities that we selected in the simulations vary uniformly between the values of the ISDN and ADSL technologies; they can be found in rows 8–9 of Table 1. Observe that, except in Experiment 2, peers are heterogeneous. We attribute the propagation delays over router-peer edges randomly between 1ms and 25ms, as can be seen in row 10 of Table 1.
In Experiments 1, 3, 5 and 6, there is background traffic between three pairs of routers across the common backbone. This traffic consists of random exponential and CBR traffic over UDP and of FTP traffic over TCP.
In each of the experiments, the amount of data transferred between routers and peers in the system during the observed time (that is, from 4e+5 up to 5e+6 seconds) is, on average, 4.5–9 GB of P2P application traffic and, when applicable, 150–350 MB of FTP, 200–400 MB of CBR, and 250–500 MB of exponential traffic. In each of the experiments, the P2P traffic is well distributed over the active peers.

Table 1: Experiments setup
Experiment number             1         2          3         4          5        6          7
Topology                      random    random     random    random     random   random     star
Number of peers               960       480        480       960        640      800        480
TN-TN capacities (Gbps)       1         1          1         1          1        1          —
TN-SN capacities (Mbps)       622       622        622       622        10–34    10–34      —
SN-routers capacities (Mbps)  34–155    34–155     34–155    34–155     4–10     4–10       —
TN-SN delays (ms)             5–25      5–50       5–75      5–50       5–25     5–25       —
Cu of peers (Kbps)            150–1000  256        128–1000  128–1000   256–700  256–1000   256–700
Cd of peers (Kbps)            8×Cu      512        8×Cu      8×Cu       10×Cu    4×Cu       2048
routers-peers delays (ms)     1–20      1–20       1–20      1–20       1–10     1–25       1–25
Background traffic            yes       no         yes       no         yes      yes        no
Application type              e-library e-library  backup    e-library  backup   e-library  e-library
Peers churn                   no        no         no        no         yes      yes        yes
Recovery process              —         —          —         —          dist.    dist.      cent.
r                             —         —          —         —          s        s          s/2
1/λ (min.)                    80        8          80        144e3      160      13         16
SB (MB)                       8         8          8         8          4        8          8
SF (KB)                       1024      1024       1024      1024       512      1024       1024
s                             8         8          8         8          8        8          8
Experiments 3 and 5 simulate a backup-like application whereas the other five experiments simulate an e-library-like application. Churn is considered only in Experiments 5–7; as a consequence, redundancy is added and maintained only in these experiments. The storage overhead r/s is either 1 or 0.5. We consider the distributed implementation of the recovery process in Experiments 5 and 6, and the centralized implementation in Experiment 7; the eager policy (k = 1) is used in all three experiments. In other words, once a peer disconnects from the system, all fragments that were stored on it must be recovered.
Churn is implemented as follows. We assume that each peer alternates between a connected state, which lasts for a duration called "on-time", and a disconnected state, which lasts for a duration called "off-time". We assume in the simulations that the successive on-times (respectively off-times) of a peer are independent and identically distributed random variables with a common exponential distribution function with parameter µ1 > 0 (respectively µ2 > 0). This assumption is in agreement with the analysis in [17]. We consider 1/µ1 = 3 hours and 1/µ2 = 1 hour.
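Under this alternating renewal model, a peer is connected a long-run fraction (1/µ1) / (1/µ1 + 1/µ2) = 3/4 of the time. A short Monte Carlo sketch (ours, not the simulator's code) confirms this:

```python
import random

def connected_fraction(mean_on=3.0, mean_off=1.0, cycles=100_000, seed=1):
    # Alternating renewal churn: exponential on-times (mean 1/mu1, hours)
    # and off-times (mean 1/mu2); returns the long-run fraction of time
    # the peer is connected.
    rng = random.Random(seed)
    on = sum(rng.expovariate(1 / mean_on) for _ in range(cycles))
    off = sum(rng.expovariate(1 / mean_off) for _ in range(cycles))
    return on / (on + off)

# With 1/mu1 = 3 h and 1/mu2 = 1 h, the connected fraction is close to 3/4.
assert abs(connected_fraction() - 0.75) < 0.01
```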
Download requests are generated at each peer according to a Poisson process. This assumption is met in real networks, as found in [10]. We assume all peers have the same request generation rate, denoted λ. We vary the value of λ across the experiments as reported in row 16 of Table 1.
The last setting concerns the files that are stored in the P2PSS. Fragment sizes SF (resp. block sizes SB) in P2P systems are typically between 64KB and 4MB (resp. between 4MB and 9MB). We consider in most of our experiments SF = 1MB and SB = 8MB, except in Experiment 5 where SF = 512KB and SB = 4MB. Therefore s = 8 in all experiments. As for the file size, we assume for now that it is equal to the block size; the file download time is thus actually the block download time. We leave the case of more general file sizes to a future study. Observe that the recovery process is related to the block download time and not to the file download time.
Table 1 summarizes the key settings of the experiments.
4 Experimental Results
In this section, we present the results of our simulations and the inferences that we can draw from them. For each experiment, we collect the fragment download time, the block download time and the recovery time when applicable. In Experiments 5 and 6 (distributed recovery), the two latter durations are collected into the same dataset, as there is no essential difference between them. Having collected these samples, we compute the sample average and use the MLE, LSE and EM algorithms to fit the empirical distributions. Concerning the fragment download time, we perform the Kolmogorov-Smirnov test [13] on the fitted distribution. In the following, we present selected results from Experiments 1, 6 and 7. The results of the other experiments are briefly reported in Tables 2–3.
4.1 Experiment 1
Table 2: Summary of experiments results
Experiment number                               1       2        3        4        5        6        7
Average frag. down. time = 1/α (sec.)           40.35   141.89   44.89    30.66    34.7367  108.86   40.722
Samples number                                  76331   71562    12617    4851     9737     80301    4669
tm (sec.)                                       8.77    33.71    8.631    8.71     6.84     8.743    16.4
1/α̂ (sec.)                                      39.351  124.607  39.622   27.3392  32.106   103.635  32.05
1/α̂, 1/β̂ (sec.)                                 —       —        —        —        —        —        6.22, 5.11
Average of recovery or block down. time (sec.)  102.75  365.73   105.254  82.88    92.4762  278.71   89.848
Samples number                                  9197    8938     1516     602      589      10025    561

Table 3: Block download time or recovery process: validation of the approximations introduced in Eqs. (1)–(3)
Experiment number                    1       2       3       4      5      6       7
Sample average                       102.75  365.73  105.25  82.88  92.48  278.71  89.85
Inferred average from Eqs. (1), (2)  109.66  385.64  122.00  83.33  94.40  295.86  116.89
Relative error (%)                   6.7     5.4     15.9    0.5    2.1    6.2     30.1
Inferred average from Eqs. (4), (3)  106.95  372.38  116.32  83.01  94.10  290.41  92.21
Relative error (%)                   4.1     1.8     10.5    0.2    1.8    4.2     2.6

We have collected 76331 samples of the fragment download time (cf. column 2 of Table 2). The empirical cumulative distribution function (CDF) is depicted in Fig. 3(a). We can see that it is remarkably close to the exponential distribution. Two
exponential distributions are plotted in Fig. 3(a), each having a
different parameter, derived from a different fitting technique.
The two techniques that we used are MLE and LSE. The parameter returned by MLE is nothing but the inverse of the sample average and is denoted α; see row 2 of Table 2.
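Both estimators are simple to state. For an exponential law, the MLE of the rate is the inverse of the sample mean; one common LSE variant (used here as an illustration; the paper's exact LSE formulation is not specified) linearizes the CDF, regressing -log(1 - F_emp(t)) on t through the origin:

```python
import math
import random

def mle_rate(samples):
    # MLE for an exponential law: rate = inverse of the sample mean.
    return len(samples) / sum(samples)

def lse_rate(samples, cap=0.98):
    # LSE after linearization: for an exponential, -log(1 - F(t)) = rate*t,
    # so regress y_i = -log(1 - F_emp(t_i)) on t_i through the origin.
    # The extreme upper tail (F_emp > cap) is dropped as it is very noisy.
    xs = sorted(samples)
    n = len(xs)
    num = den = 0.0
    for i, t in enumerate(xs):
        f = (i + 1) / (n + 1)          # plotting-position empirical CDF
        if f > cap:
            break
        y = -math.log(1 - f)
        num += t * y
        den += t * t
    return num / den

# Sanity check on synthetic data with mean 40.35 s (Experiment 1's average).
rng = random.Random(2)
data = [rng.expovariate(1 / 40.35) for _ in range(50_000)]
assert abs(1 / mle_rate(data) - 40.35) < 1.0
assert abs(1 / lse_rate(data) - 40.35) < 2.0
```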
Beyond the graphical match between the empirical distribution and the exponential distribution, we performed a hypothesis test. Let X be a vector storing the collected fragment download times. The Kolmogorov-Smirnov test compares the vector X with a CDF, denoted cdf (in the present case, the exponential distribution), to determine whether the sample X could have the hypothesized continuous distribution cdf. The null hypothesis is that X has the distribution defined in cdf, the alternative one being that X does not have that distribution. We reject the null hypothesis if the test is significant at the l% level. In Experiment 1, the null hypothesis with α = 1/40.35 is not rejected for l = 7%.
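The test statistic itself is just the largest gap between the empirical CDF and the candidate CDF. A self-contained sketch (ours) against an exponential CDF:

```python
import math
import random

def ks_stat_exponential(samples, rate):
    # One-sample Kolmogorov-Smirnov statistic: sup_t |F_emp(t) - F(t)|
    # with F(t) = 1 - exp(-rate*t), checking both sides of each CDF step.
    xs = sorted(samples)
    n = len(xs)
    d = 0.0
    for i, t in enumerate(xs):
        f = 1 - math.exp(-rate * t)
        d = max(d, abs((i + 1) / n - f), abs(i / n - f))
    return d

rng = random.Random(3)
expo = [rng.expovariate(1 / 40.35) for _ in range(10_000)]
unif = [rng.uniform(0, 80.7) for _ in range(10_000)]  # same mean, wrong law
assert ks_stat_exponential(expo, 1 / 40.35) < 0.02    # small gap: plausible fit
assert ks_stat_exponential(unif, 1 / 40.35) > 0.05    # large gap: poor fit
```

The statistic is then compared with the critical value at the chosen significance level (about 1.36/sqrt(n) at the 5% level for large n).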
Looking now at concurrent downloads, we have found that these are weakly correlated and close to being independent. Besides the fact that the total workload is equally distributed over the active peers, there are two main reasons for the weak correlation between concurrent downloads as observed in Experiment 1: (i) the good connectivity of the core network and (ii) the asymmetry in peers' upstream and downstream bandwidths. So, as long as the bottleneck is the upstream capacity of peers, the fragment download times are close to being independent.
Regarding the block download times, we have collected 9197 samples. The sample average is given in row 7 of Table 2. The empirical CDF is plotted in Fig. 3(b). We followed the same methodology and computed the closest exponential distribution using MLE. However, the match between the two distributions appears to be poor and, indeed, the null hypothesis is rejected in this case.
To find a distribution that is more likely to fit the empirical data, we make the following analysis. To get a block of data, s fragments, stored on s different peers, have to be downloaded. This is more efficiently done in parallel, and this is how we implemented it in the simulator. We have seen that the download of a single fragment is well modeled by an exponential random variable with parameter α. Also, concurrent downloads were found to be close to independent. Therefore, the time needed for downloading s fragments in parallel is distributed like the maximum of s "independent" exponential random variables which, due to the memoryless property (see also [11]), is the sum of s independent exponential random variables with parameters sα, (s-1)α, ..., α. This distribution is called the hypo-exponential distribution and its expectation is

    E[T] = (1/α) Σ_{i=1}^{s} 1/i    (1)

where T denotes the block download time (or equivalently the distributed recovery duration).
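This max-of-exponentials argument is easy to check numerically. The sketch below (ours; it plugs in the Experiment 1 values s = 8 and 1/α = 40.35 seconds as an example) compares a Monte Carlo estimate of the parallel download time with the expectation of Eq. (1):

```python
import random

def block_download_time(s, alpha, rng):
    # Time to fetch s fragments in parallel: the maximum of s
    # independent exponential download times with rate alpha.
    return max(rng.expovariate(alpha) for _ in range(s))

s, alpha = 8, 1 / 40.35
rng = random.Random(4)
n = 200_000
mc_mean = sum(block_download_time(s, alpha, rng) for _ in range(n)) / n

# Eq. (1): E[T] = (1/alpha) * sum_{i=1}^{s} 1/i, the mean of the
# hypo-exponential with rates s*alpha, (s-1)*alpha, ..., alpha.
expected = (1 / alpha) * sum(1 / i for i in range(1, s + 1))
assert abs(expected - 109.66) < 0.01  # the value reported for Experiment 1
assert abs(mc_mean - expected) < 1.0  # Monte Carlo agrees with Eq. (1)
```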
In Experiment 1, E[T] = 109.66 seconds, while the sample average is equal to 102.75; cf. column 2 of Table 3. The relative error is 6.7%. The hypo-exponential distribution with s phases and parameters sα, (s-1)α, ..., α is plotted in Fig. 3(b). This distribution has a very good visual match with the empirical CDF of the block download time.
As a next step, we apply an EM algorithm [6] to find the best hypo-exponential distribution with s phases that fits the empirical data. In particular, we use EMpht [15], which is a program for fitting phase-type distributions to collected data. We do not plot the outcome of this program in Fig. 3(b), as it mainly overlaps with the hypo-exponential distribution with s phases and parameters sα, (s-1)α, ..., α that is already plotted there. After performing the Kolmogorov-Smirnov test, we find that the null hypothesis is not rejected for l = 7% (the same significance level as for the fragment download times).
Figure 3: Experiment 1: Fragment and block download times. (a) Exponential fit of the fragment download time distribution: empirical CDF with LSE and MLE exponential fits. (b) Fitting of the block download time distribution: empirical CDF with MLE exponential and hypo-exponential fits.
We conclude the analysis of the first experiment's results with four important points:

- The exponential assumption on the block download time is not met in realistic simulations.
- The fragment download time could be modeled by an exponential distribution with parameter α equal to the inverse of its average.
- Download times are weakly correlated and close to being independent as long as the bottleneck is the upstream capacity of peers.
- As a consequence, the block download time could be modeled by a hypo-exponential distribution with s phases and parameters sα, (s-1)α, ..., α.
4.2 Experiment 6
In this experiment, peers are not always connected. Each time
a peer disconnects from the network, all the fragments that
were stored on his disk will have to be recovered. The recovery
process is implemented in a distributed way.
The empirical CDF of the fragment download time and that
of the block download time or the recovery time are reported
in Fig. 4. Following the same methodology as that used to
analyze the results of Experiment 1, we reach the same con-
clusions. The relevant parameters are reported in column 7 of
Tables 2 and 3. However, the null hypothesis for the block
download time or the recovery process is not always rejected.
This is the case of Experiment 7, as seen next.
[Figure: two CDF plots omitted. (a) Exponential fit of the fragment download time distribution. (b) Fitting of the recovery or block download time distribution: empirical CDF of the recovery or block download time (seconds) with an MLE exponential fit and a hypo-exponential fit.]
Figure 4: Experiment 6: Download and distributed recovery processes.
4.3 Experiment 7
Experiment 7 is the only one that uses a centralized recovery
process. Also, it is the only one using a simple star topology.
In this experiment, the alternative hypothesis on the recovery
process distribution is not rejected.
There is a simple reason for that. We actually know that the download of a single fragment cannot be infinitely small, as suggested by the exponential distribution. Let t_m be the duration of the fastest fragment download among all s downloads. All other (slower) downloads are necessarily bounded below by t_m. The effect of this minimum value can be neglected as long as t_m is negligible with respect to the average fragment download time. Otherwise, we need to consider that the fragment download/upload time is composed of two components: (i) a (constant) minimum delay t_m and (ii) a random variable distributed exponentially with parameter α̂ (resp. β̂). This random variable models the collected data, shifted left by the value of t_m. The minimum delay can be approximated as RTT + (S_F + Headers)/max{C_u}, where RTT stands for round-trip time.
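As a rough numerical illustration of this approximation (all values below are hypothetical, not taken from the experiments):

```python
# Hypothetical inputs for t_m ~ RTT + (S_F + Headers)/max{C_u}:
rtt = 0.040                 # round-trip time, seconds (assumed)
fragment_bytes = 1_000_000  # fragment size S_F, bytes (assumed)
header_bytes = 4_000        # total protocol headers, bytes (assumed)
max_upload_bps = 1_000_000  # max upstream capacity max{C_u}, bits/s (assumed)

# Serialization delay of the fragment plus headers, then one RTT.
t_m = rtt + 8 * (fragment_bytes + header_bytes) / max_upload_bps
print(f"t_m ~ {t_m:.2f} s")  # ~ 8.07 s with these values
```

With such parameters t_m is far from negligible compared to a fragment download time of a few tens of seconds, which is why the shifted model matters in this experiment.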
The value of t_m is clearly visible in Fig. 5(a). We plot in this figure the empirical CDF of the fragment download time and the MLE exponential fits to both the collected and the shifted data. The null hypothesis is rejected for the collected data but not rejected for the shifted data.
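The shift-then-fit procedure can be sketched as follows on synthetic data (the delay and mean below are assumptions, not the measured values): estimate t_m by the sample minimum, subtract it, and fit an exponential by MLE, whose scale estimate is simply the sample mean.

```python
import numpy as np
from scipy import stats

# Synthetic shifted-exponential sample standing in for the collected
# fragment download times (t_m and the mean are assumptions).
rng = np.random.default_rng(1)
t_m_true = 30.0
sample = t_m_true + rng.exponential(32.0, size=5000)

shifted = sample - sample.min()   # estimate t_m by the sample minimum
scale_mle = shifted.mean()        # MLE of the scale: 1/alpha_hat

# KS test of the shifted data against the fitted exponential.
d_stat, p_value = stats.kstest(shifted, "expon", args=(0.0, scale_mle))
print(f"1/alpha_hat = {scale_mle:.1f} s, KS statistic = {d_stat:.3f}")
```

Fitting the unshifted `sample` directly would produce the mismatch visible in Fig. 5(a): an exponential CDF rises immediately from zero, while the empirical CDF stays flat up to t_m.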
[Figure: two CDF plots omitted. (a) Exponential fit of the fragment download time distribution ignoring the minimum value: empirical CDF of the fragment download time (seconds) with MLE exponential fits of the collected and the shifted data. (b) Fitting of the recovery time distribution: empirical CDF of the recovery time (seconds) with an MLE exponential fit and hypo-exponential fits of the collected and the shifted data.]
Figure 5: Experiment 7: Fragment and recovery time, centralized recovery.
The same holds for the recovery process, whose empirical CDF is plotted in Fig. 5(b). Repeating the same analysis as in Section 4.1, and assuming that the fragment upload time follows an exponential distribution with parameter β, the centralized recovery process, denoted T_c, would be modeled by a hypo-exponential distribution with s + k phases (k = 1 in Experiment 7) having expectation

E[T_c] = (1/α) ∑_{i=1}^{s} 1/i + (1/β) ∑_{j=1}^{k} 1/j .   (2)

Considering this distribution, we find that the null hypothesis of the Kolmogorov-Smirnov test for the collected data, with parameters 1/α = 40.72 and 1/β = 6.22, is rejected² at the 6% significance level, while it is not rejected for the shifted data, with parameters 1/α̂ = 32.05 and 1/β̂ = 5.11.
Equations (1) and (2) should then be replaced with

E[T] = t_m + (1/α̂) ∑_{i=1}^{s} 1/i ,   (3)

E[T_c] = t_m + (1/α̂) ∑_{i=1}^{s} 1/i + (1/β̂) ∑_{j=1}^{k} 1/j .   (4)
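Equation (4) reduces to harmonic sums, so it is straightforward to evaluate numerically. The sketch below plugs in the shifted-data parameters quoted for Experiment 7 (1/α̂ = 32.05 s, 1/β̂ = 5.11 s, k = 1); the values of s and t_m are assumptions for illustration only.

```python
# Harmonic number H_n = sum_{i=1}^{n} 1/i.
def harmonic(n: int) -> float:
    return sum(1.0 / i for i in range(1, n + 1))

s, k = 8, 1                 # s is assumed here; k = 1 as in Experiment 7
inv_alpha_hat = 32.05       # 1/alpha_hat, seconds (shifted-data estimate)
inv_beta_hat = 5.11         # 1/beta_hat, seconds (shifted-data estimate)
t_m = 10.0                  # hypothetical minimum delay, seconds

# Eq. (4): E[T_c] = t_m + (1/alpha_hat) H_s + (1/beta_hat) H_k
mean_Tc = t_m + inv_alpha_hat * harmonic(s) + inv_beta_hat * harmonic(k)
print(f"E[Tc] = {mean_Tc:.2f} s")
```

Dropping t_m and using the unshifted parameters instead recovers Eq. (2), which is what the shifted model corrects.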
The averages inferred from Eqs. (1)-(4) are listed in rows 3 and 5 of Table 3, and their relative errors with respect to the sample average are listed in rows 4 and 6 of the same table.
Observe that the inferred average improves across all experiments when considering shifted data. The best improvement is seen in Experiment 7. By considering that the shifted recovery time is hypo-exponentially distributed with s + 1 phases and parameters sα̂, (s−1)α̂, ..., α̂, β̂, the relative error on the inferred average drops from 30.1% to 2.6%.
The conclusion of this discussion is that the exponential assumption on the fragment download/upload time is met in most cases. The same assumption does not hold for the block download time. The recovery time and the block download time are well approximated by a hypo-exponential distribution in most cases.
5 Related work
Although the literature on modeling and simulating P2P sys-
tems and parallel downloading is abundant, the recovery pro-
cess in P2PSS is a subject that has not been analyzed.
In [2], the authors propose a multiple-access protocol to minimize the download time of a document from multiple mirror sites in parallel, using Tornado erasure codes based on the digital fountain idea. A document of size S_B is encoded on each mirror server with redundant information. The encoded document consists of n = s + r different fragments of size S_F, where nS_F > sS_F > S_B. To minimize the number of duplicated packets received at the requester, each mirror encodes the document with Tornado codes and generates all n fragments, then permutes the order of packets before sending, and finally starts to deliver the packets continuously to the
² Even though it is rejected, this distribution is still much closer to the empirical data than the exponential distribution.
requester of the document. The receiver can then reconstruct the document of size S_B after collecting s distinct packets of size S_F from the mirrors.
In [4], the authors have focused on the average download
time of each user in a P2P network while considering the het-
erogeneity of service capacities of peers. They point out that
the common approach of analyzing the average download time
based on average service capacity is fundamentally flawed.
The authors of [7] implement a BitTorrent file-sharing protocol in NS-2 and compare packet-level with flow-level simulation results for the download time of one file among an active peer set. They show that the propagation delay can significantly influence the download performance of BitTorrent.
6 Conclusion
This paper performs a simulation analysis of download and recovery processes in P2PSS. Implementing a storage protocol in NS-2, we set up seven simulations which enable us to collect fragment/block download times and recovery times under a variety of conditions. We show that the exponential assumption on the block download time does not hold. The same assumption on the fragment download/upload time is met in most cases, implying that both the block download time and the recovery process could be modeled by a hypo-exponential distribution with a pre-determined number of phases.
References
[1] S. Alouf, A. Dandoush, and P. Nain. Performance anal-
ysis of peer-to-peer storage systems. In Proc. of 20th
ITC, volume 4516 of Lecture Notes in Computer Science,
pages 642–653, Ottawa, Canada, June 2007.
[2] J. Byers, M. Luby, and M. Mitzenmacher. Accessing
multiple mirror sites in parallel: Using tornado codes to
speed up downloads. In Proc. of IEEE Infocom ’99, pages
21–25, New York, USA, March 1999.
[3] K. Calvert, M. Doar, and E. W. Zegura. Modeling In-
ternet topology. IEEE Communications Magazine, June
1997.
[4] Yuh-Ming Chiu and Do Young Eun. Minimizing
file download time in stochastic peer-to-peer networks.
IEEE/ACM Trans. Netw., 16(2):253–266, 2008.
[5] A. Dandoush, S. Alouf, and P. Nain. Performance analy-
sis of centralized versus distributed recovery schemes in
P2P storage systems. In Proc. of IFIP/TC6 NETWORK-
ING 2009, Aachen, Germany, May 11–15, 2009. to ap-
pear.
[6] A. P. Dempster, N. M. Laird, and D. B. Rubin. Maximum likelihood from incomplete data via the EM algorithm. J. Royal Statist. Soc., 39(1):1–37, 1977.
[7] K. Eger, T. Hoßfeld, A. Binzenhöfer, and G. Kunzmann. Efficient simulation of large-scale P2P networks: Packet-level vs. flow-level simulations. In Proc. of UPGRADE-CN'07, Monterey, California, USA, June 2007.
[8] K. Fall and K. Varadhan. The NS manual, the VINT project, UC Berkeley, LBL, USC/ISI, and Xerox PARC. http://www.isi.edu/nsnam/ns/ns-documentation.ht, November 2008.
[9] GÉANT: a pan-European backbone which connects Europe's national research and education networks. http://www.geant.net/server/show/nav.159.
[10] A. Guha, N. Daswani, and R. Jain. An experimental study of the Skype peer-to-peer VoIP system. In Proc. of 5th IPTPS, Santa Barbara, California, February 2006.
[11] P. Harrison and S. Zertal. Queueing models of RAID sys-
tems with maxima of waiting times. Performance Evalu-
ation Journal, 64(7-8):664–689, August 2007.
[12] T. Isdal, M. Piatek, A. Krishnamurthy, and T. Anderson. Leveraging BitTorrent for end host measurements. In Proc. 8th Passive and Active Measurement Conference, Louvain-la-Neuve, Belgium, April 2007.
[13] F. J. Massey. The Kolmogorov-Smirnov test for goodness of fit. J. Am. Statist. Assoc., 46(253):68–78, 1951.
[14] A. Medina, A. Lakhina, I. Matta, and J. Byers. Brite:
Boston University representative Internet topology gen-
erator. http://www.cs.bu.edu/brite/.
[15] M. Olsson. The EMpht-programme. Technical re-
port, Department of Mathematics, Chalmers University
of Technology, 1998.
[16] PlanetLab. An open platform for developing,
deploying, and accessing planetary-scale services.
http://www.planet-lab.org/, 2007.
[17] S. Ramabhadran and J. Pasquale. Analysis of long-
running replicated systems. In Proc. of IEEE Info-
com ’06, Barcelona, Spain, April 2006.
[18] I.S. Reed and G. Solomon. Polynomial codes over certain
finite fields. J. SIAM, 8(2):300–304, June 1960.
[19] Renater: Le Réseau National de télécommunications pour la Technologie, l'Enseignement et la Recherche. http://www.renater.fr.
[20] Bernard M. Waxman. Routing of multipoint connections. IEEE Journal on Selected Areas in Communications, 6(9):1617–1622, 1988.
... For the data downloading and recovery processes, we developed a realistic simulation model and implemented it on top of the ns-2 network simulator [36]. This simulator is able to precisely predict the behavior of these processes, while considering the impact of several constraints such as peer heterogeneity and physical network topology [37]. ...
... Our second contribution in P2P storage systems appears in [37] and characterizes the distribution of download and recovery processes in P2P storage systems. To that end, we implemented the distributed storage protocol in the NS-2 network simulator (the details of our implementation are in [36]). ...
... Our experimental results reported in [37] indicate that the exponential assumption on fragments download/upload time is met in most cases. The same assumption does not hold on the block download time. ...
Thesis
Full-text available
This manuscript recollects some of my contributions since my PhD defense. These were achieved while I was a researcher at Inria Sophia Antipolis Méditerranée in the Project-Team Maestro. My research activities focus on the modeling and performance evaluation of networks. In fact, this term encompasses a wide variety of situations. One may consider a particular layer in the protocols stack like the access protocol to the communication channels, or focus on the application layer and study overlay networks or cache networks. As a researcher, I first became interested in controlling the access to the medium in wireless networks. Subsequently, I considered on one hand the power save mode of mobile devices and on the other hand the problem of evolutionary routing in networks with reduced connectivity. After studying the energy savings at mobile terminals, I turned my focus on the energy consumption of base stations in cellular networks and on the use of renewable energy sources for their electrical power supply. This line of research has led to the stochastic modeling of solar radiation in order to better take into account renewable energy sources in the performance evaluation of communications networks. Meanwhile, I developed a second line of research at the application level of communications systems. I was first interested in peer-to-peer storage systems. I then studied hierarchical networks of caches such as that of the Domain Name System (DNS). I have also done research work in the framework of partnership projects with private companies. I contributed to a study on the active management of flows at the core of the network (project with the former Alcatel-Lucent Bell Labs, now Nokia) and to the performance evaluation of urban train control based on wireless communications (project with Alstom Transport).
... Modeling a multi-phase service as a hypoexponential distribution is justified in some simulation work. The authors in [20] show that the block download time and the recovery time in a P2P Storage Systems essentially follow a hypoexponential distribution with many distinct phases. The delay on the transmission path in a mobile opportunistic network has been experimentally validated in [21,22] to follow a hypoexponential distribution. ...
... The state probability p K ,1 can be derived from either Eqs. (19) or (20). Either derivation yields to the same exact result, and this can be numerically proven. ...
... In fact, we develop a succinct algorithm (shown in Algorithm 1) to obtain recursively all state probabilities which are expressed in Eqs. (11)(12)(13)(14)(15)(16)(17)(18)(19)(20). Now, we can derive important performance measures for the system. ...
Article
Full-text available
Hypoexponential servers are commonly seen in today’s computer and communication networks whereby incoming packets are processed by the network server in multiple stages with each stage having a different processing time. This paper presents an analytical model to capture the behavior and subsequently analyze the performance of these network servers or similarly behaving systems. From our model, we derive key performance measures and features which include CPU utilization, system idleness, mean throughput, packet loss, mean system and queuing packet delays, and mean system and queue sizes. In addition, we present two popular finite queueing models (namely, M / D / 1 / K and M / M / 1 / K) to approximate our hypoexponential model. Results show that the both of these approximate models give close results when the system queue size is large.
... A first quantitative comparison between erasure coding and replication in self-repairing and fault resilient distributed storage systems can be found in Weatherspoon and Kubiatowicz (2002). The solution has been explored in peer to peer systems: in Kameyama and Sato (2007) a family of erasure codes with a very low overhead factor is applied to distributed storage to show the appropriateness of this approach; in Dandoush et al. (2009) the problem of lost data blocks reconstruction in distributed storage systems is studied in peer to peer architectures, providing an interesting characterization in terms of empirical distributions obtained by means of event based simulations; a further application is in Aguilera et al. (2005); an analysis of the combined application of redundancy and erasure coding in DHT is given in Wu et al. (2005); in another analysis (Rodrigues and Liskov (2005)) the authors evaluate the performances of DHT by an analytical approach based on traces of 3 different applications, and conclude that results are actually confirming the expected advantages, but cost an excess of complexity in the design of the overall system because of the implementation of erasure coding. ...
... We suppose that the user sends its request in parallel to all the nodes having the chunk, and then that she reads or write it on the node that answered first. We suppose that chunks are requested at a rate c = 0.02 requests per hour, and that the time required to access a block follows a Normal distribution (see Dandoush et al. (2009)). We set the mean transfer time to µ = 100 msec., and its standard deviation σ = 25 msec. ...
Article
Replication of Data Blocks is one of the main technologies on which Storage Systems in Cloud Computing and Big Data Applications are based. With the heterogeneity of nodes, and an always-changing topology, keeping the reliability of the data contained in the common large-scale distributed file system is an important research challenge. Common approaches are based either on replication of data or erasure codes. The former stores each data block several times in different nodes of the considered infrastructures: the drawback is that this can lead to large overhead and non-optimal resources utilization. Erasure coding instead exploits Maximum Distance Separable codes that minimize the information required to restore blocks in case of node failure: this approach can lead to increased complexity and transfer time due to the fact that several blocks, coming from different sources, are required to reconstruct lost information. In this paper we study, by means of discrete event simulation, the performances that can be obtained by combining both techniques, with the goal of minimizing the overhead and increasing the reliability while keeping the performances. The analysis proves that a careful balance between the application of replication and erasure codes significantly improves reliability and performances avoiding large overheads with respect to the isolated use of replication and redundancy.
... To understand how the recovery process could be better modeled, they have implemented this process in the ns-2 network simulator (cf. [31]) and have performed an intensive simulation analysis of it in [69], [118]. ...
... Building on the findings in [69], [118], A. Dandoush, S. Alouf and P. Nain develop in [68] Markovian models assuming that the fragment download/upload time is exponentially distributed so that the recovery time follows a hypo-exponential distribution with many distinct phases. They find in particular that a distributed recovery scheme is a good implementation choice only in large networks where peers have a good availability. ...
Technical Report
Full-text available
MAESTRO is an INRIA project-team whose members are located in Sophia Antipolis (S. Alouf, K. Avrachenkov, P. Nain, G. Neglia), at LIA in Avignon (E. Altman) and at LIRMM in Montpellier (A.-E. Baert and A. Jean-Marie). MAESTRO is concerned with the modeling, performance evaluation, optimization and control of stochastic Discrete-Event Dynamical Systems (DEDS), with a particular emphasis on networks and their applications. The scientific contributions are both theoretical, with the development of new modeling formalisms, and applied, with the development of software tools for the performance evaluation of DEDS.
... A solution with a massively distributed approach that aims at cost reduction is presented in [20], in which an analytical model is used to analyze the characterization of lost chunks reconstruction processes. Erasure coding has been extensively studied, with its applications, specially in peer to peer systems: [1] provides a good application, [22] presents a quantitative evaluation of the benefits deriving from the adoption of erasure coding and replication strategies in resilient storage subsystems, while [13] presents a low overhead erasure codes family applied to peer to peer systems and [9] presents an empirical simulation based statistical analysis of reconstruction of data blocks in peer to peer architectures. Applications to distributed hash tables can be found in [24] and [17]. ...
Chapter
The efficiency of storage systems is a key factor to ensure sustainability in data centers devoted to provide cloud services. A proper management of storage infrastructures can ensure the best trade off between costs, reliability and quality of service, enabling the provider to be competitive in the market. Heterogeneity of nodes, and the need for frequent expansion and reconfiguration of the subsystems fostered the development of efficient approaches that replace traditional data replication, by exploiting more advanced techniques, such the ones that leverage erasure codes. In this paper we use an ad-hoc discrete event simulation approach to study the performances of replication and erasure coding with different parametric configurations, aiming at the minimization of overheads while obtaining the desired reliability. The approach is demonstrated with a practical application to the erasure coding plugins of the increasingly popular CEPH distributed file system.
... Dandoush et al. in [37] perform a simulation study of the download and the repairing process. They use the NS2 simulator to measure the distribution of the repair time. ...
Article
Large scale peer-to-peer systems are foreseen as a way to provide highly reliable data storage at low cost. To ensure high durability and high resilience over a long period of time the system must add redundancy to the original data. It is well-known that erasure coding is a space efficient solution to obtain a high degree of fault-tolerance by distributing encoded fragments into different peers of the network. Therefore, a repair mechanism needs to cope with the dynamic and unreliable behavior of peers by continuously reconstructing the missing redundancy. Consequently, the system depends on many parameters that need to be well tuned, such as the redundancy factor, the placement policies, and the frequency of data repair. These parameters impact the amount of resources, such as the bandwidth usage and the storage space overhead that are required to achieve a desired level of reliability, i.e., probability of losing data. This thesis aims at providing tools to analyze and predict the performance of general large scale data storage systems. We use these tools to analyze the impact of different choices of system design on different performance metrics. For instance, the bandwidth consumption, the storage space overhead, and the probability of data loss should be as small as possible. Different techniques are studied and applied. First, we describe a simple Markov chain model that harnesses the dynamics of a storage system under the effects of peer failures and of data repair. Then we provide closed-form formulas that give good approximations of the model. These formulas allow us to understand the interactions between the system parameters. Indeed, a lazy repair mechanism is studied and we describe how to tune the system parameters to obtain an efficient utilization of bandwidth. We confirm by comparing to simulations that this model gives correct approximations of the system average behavior, but does not capture its variations over time. 
We then propose a new stochastic model based on a fluid approximation that indeed captures the deviations around the mean behavior. These variations are most of the time neglected by previous works, despite being very important to correctly allocate the system resources. We additionally study several other aspects of a distributed storage system: we propose queuing models to calculate the repair time distribution under limited bandwidth scenarios; we discuss the trade-offs of a Hybrid coding (mixing erasure codes and replication); and finally we study the impact of different ways to distribute data fragments among peers, i.e., placement strategies.
... Dandoush et al. in [DAN09] perform a simulation study of the download and the repairing process. They use the NS2 simulator to measure the distribution of the repair time. ...
Article
In this thesis we study multiple approaches to efficiently accommodating for the future growth of the Internet. The exponential growth of Internet traffic, reported to be as high as 41% in peak throughput in 2012 alone, continues to pose challenges to all interested parties. Therefore, to accommodate the growth, smart management and communication protocols are needed. The basic protocols of the Internet are point-to-point in nature. However, the traffic is largely broadcasting, with projections stating that as much as 80-90% of it will be video by 2016. This discrepancy leads to inefficiency, where multiple copies of essentially the same messages travel in parallel through the same links. In this thesis we study multiple approaches to mitigating this inefficiency. The contributions are organized by layers and phases of the network life. We look into optimal cache provisioning during network design. Next, we move to managing an existing network. We look into putting devices to sleep mode, using caching and cooperation with Content Distribution Networks. In the application layer, we look into maintaining balanced trees for media broadcasting. Finally, we analyze data survivability in a distributed backup system, which can reduce network traffic by putting the backups closer to the client than if using a data center. Our work is based on both theoretical methods, like Markov chains and linear programming, as well as empirical tools, like simulation and experimentation.
Article
Full-text available
For storage and recovery requirements on large-scale seismic waveform data of the National Earthquake Data Backup Center (NEDBC), a distributed cluster processing model based on Kafka message queues is designed to optimize the inbound efficiency of seismic waveform data stored in HBase at NEDBC. Firstly, compare the characteristics of big data storage architectures with that of traditional disk array storage architectures. Secondly, realize seismic waveform data analysis and periodic truncation, and write HBase in NoSQL record form through Spark Streaming cluster. Finally, compare and test the read/write performance of the data processing process of the proposed big data platform with that of traditional storage architectures. Results show that the seismic waveform data processing architecture based on Kafka designed and implemented in this paper has a higher read/write speed than the traditional architecture on the basis of the redundancy capability of NEDBC data backup, which verifies the validity and practicability of the proposed approach.
Chapter
Big Data applications provide new, disruptive tools to advance our knowledge about the mechanisms that characterize complex aspects of reality. Be it a high energy physics experiment or an analysis of social networks data, the strength of the approach is the availability of a huge richness of data; but, at the same time, it is also the main challenge, as this abundance of information must be processed at a bearable cost per information unit and requires higher scale systems to provide enough computing power. This is only possible if the Big Data platform is properly managed and exploited according to the needs of the applications, and a fundamental premise is the capability for a proper performance evaluation of the platform. In this chapter, we provide a glance over the main aspects of performance evaluation for Big Data architectures, together with some examples of model-based evaluation, in order to show how it is possible to characterize big scale architectures to support their correct management, and suggest a methodological coarse grain solution to exploit different conceptual and technical tools to integrate a flexible, model-based, performance analysis supported approach to Big Data systems design, capable of scaling up easily in the core evaluation stage means of Markovian agents.
Conference Paper
We describe a scalable server-array based testbed for simulating various usage scenarios of peer-to-peer (P2P) overlay networks. Each server is responsible for a subset of the simulated peer processes, managed by the mechanisms presented herein. The system follows a star topology, where one master server acts as a point of control for a set of slave servers. We present both the structure of the system and the activities needed before, during, and after a simulation run in order to accomplish automated simulations, where each interesting combination of the variable parameters of the overlay network is evaluated. The functionality of the control scripts is explained in detail. Among other things, the system sets up the required start conditions for a P2P overlay simulation, manages the online-time and the specific P2P activities of each simulated peer, and facilitates the handling of the generated log files, from which the result statistics are derived. © 2012 ICST Institute for Computer Science, Social Informatics and Telecommunications Engineering.
Technical Report
Full-text available
This report evaluates and compares the performance of two schemes for recovering lost data in a peer-to-peer (P2P) storage systems. The first scheme is centralized and relies on a server that recovers multiple losses at once, whereas the second one is distributed. By representing the state of each scheme by an absorbing Markov chain, we are able to compute their performance in terms of the delivered data lifetime and data availability. Numerical computations are provided to better illustrate the impact of each system parameter on the performance. Depending on the context considered, we provide guidelines on how to tune the system parameters in order to provide a desired data lifetime.
Article
Full-text available
Despite its popularity, relatively little is known about the traffic characteristics of the Skype VoIP system and how they differ from other P2P systems. We describe an experimental study of Skype VoIP traffic conducted over a five month period, where over 82 mil-lion datapoints were collected regarding the population of online clients, the number of supernodes, and their traffic characteristics. This data was collected from September 1, 2005 to January 14, 2006. Experiments on this data were done in a black-box manner, i.e., without knowing the internals or specifics of the Skype system or messages, as Skype encrypts all user traffic and signaling traffic payloads. The results indicate that although the structure of the Skype system appears to be similar to other P2P systems, particu-larly KaZaA, there are several significant differences in traffic. The number of active clients shows diurnal and work-week behavior, correlating with normal working hours regardless of geography. The population of supernodes in the system tends to be relatively stable; thus node churn, a significant concern in other systems, seems less problematic in Skype. The typical bandwidth load on a supernode is relatively low, even if the supernode is relaying VoIP traffic. The paper aims to aid further understanding of a significant, successful P2P VoIP system, as well as provide experimental data that may be useful for future design and modeling of such sys-tems. These results also imply that the nature of a VoIP P2P system like Skype differs fundamentally from earlier P2P systems that are oriented toward file-sharing, and music and video download appli-cations, and deserves more attention from the research community.
Conference Paper
Full-text available
The growing interest in peer-to-peer systems (such as Gnutella) has inspired numerous research activities in this area. Although many demonstrations have been performed that show that the performance of a peer-to-peer system is highly dependent on the underlying network characteristics, much of the evaluation of peer-to-peer proposals has used simplified models that fail to include a detailed model of the underlying network. This can be largely attributed to the complexity in experimenting with a scalable peer-to-peer system simulator built on top of a scalable network simulator with packet-level details. In this work we design and develop a framework for an extensible and scalable peer-to-peer simulation environment that can be built on top of existing packet-level network simulators. The simulation environment is portable to different network simulators, which enables us to simulate a realistic large scale peer-to-peer system using existing parallelization techniques. We demonstrate the use of the simulator for some simple experiments that show how Gnutella system performance can be impacted by the network characteristics.
Article
The test is based on the maximum difference between an empirical and a hypothetical cumulative distribution. Percentage points are tabled, and a lower bound to the power function is charted. Confidence limits for a cumulative distribution are described. Examples are given. Indications that the test is superior to the chi-square test are cited.
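The maximum-difference statistic described in this abstract can be computed directly from a sorted sample. The sketch below is illustrative (the function name, hypothesised exponential CDF, and sample values are our own, not from the cited work); note that the empirical CDF jumps at each data point, so both sides of the jump must be checked.

```python
import math

def ks_statistic(sample, cdf):
    """One-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDF of `sample` and the
    hypothesised CDF `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, f - i / n, (i + 1) / n - f)
    return d

# Example: compare a small sample against an Exponential(1) hypothesis.
exp_cdf = lambda x: 1.0 - math.exp(-x)
sample = [0.1, 0.5, 0.9, 1.4, 2.2]
D = ks_statistic(sample, exp_cdf)
```

The statistic `D` would then be compared against the tabled percentage points mentioned in the abstract to accept or reject the hypothesis.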
Article
A queueing model is developed that approximates the effect of synchronizations at parallel service completion instants. Exact results are first obtained for the maxima of independent exponential random variables with arbitrary parameters, and this is followed by a corresponding approximation for general random variables, which reduces to the exact result in the exponential case. This approximation is then used in a queueing model of RAID (Redundant Array of Independent Disks) systems, in which accesses to multiple disks occur concurrently and complete only when every disk involved has completed. We consider the two most common RAID variants, RAID0-1 and RAID5, as well as a multi-RAID system in which they coexist. This can be used to model adaptive multi-level RAID systems in which the RAID level appropriate to an application is selected dynamically. The random variables whose maximum has to be computed in these applications are disk response times, which are modelled by the waiting times in M/G/1 queues. To compute the mean value of their maximum requires the second moment of queueing time and we obtain this in terms of the third moment of disk service time, itself a function of seek time, rotational latency and block transfer time. Sub-models for these quantities are investigated and calibrated individually in detail. Validation against a hardware simulator shows good agreement at all traffic intensity levels, including the threshold for practical operation above which performance deteriorates sharply.
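The exact result for the maxima of independent exponential random variables mentioned above can be obtained by inclusion-exclusion over subsets of the rates: E[max] = sum over non-empty subsets S of (-1)^(|S|+1) / sum of the rates in S. A small sketch of that standard formula (the function name is ours):

```python
from itertools import combinations

def mean_max_exponentials(rates):
    """Exact mean of the maximum of independent exponential random
    variables with the given rates, via inclusion-exclusion over
    non-empty subsets of rates."""
    total = 0.0
    for k in range(1, len(rates) + 1):
        for subset in combinations(rates, k):
            # Each subset of size k contributes (-1)^(k+1) / sum(rates in S).
            total += (-1) ** (k + 1) / sum(subset)
    return total

# Sanity check: with equal rates, E[max of n] is the n-th harmonic
# number divided by the common rate.
m = mean_max_exponentials([1.0, 1.0, 1.0])  # -> 1 + 1/2 + 1/3 ≈ 1.8333
```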
Article
Summary: A broadly applicable algorithm for computing maximum likelihood estimates from incomplete data is presented at various levels of generality. Theory showing the monotone behaviour of the likelihood and convergence of the algorithm is derived. Many examples are sketched, including missing value situations, applications to grouped, censored or truncated data, finite mixture models, variance component estimation, hyperparameter estimation, iteratively reweighted least squares and factor analysis.
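In the context of the surveyed paper, EM is used to fit exponential-mixture distributions to empirical download times. As a minimal sketch of the idea (not the cited algorithm's general form), the following fits a two-phase hyper-exponential mixture p*Exp(l1) + (1-p)*Exp(l2); the initialisation and iteration count are arbitrary illustrative choices.

```python
import math

def em_hyperexp2(data, iters=200):
    """EM fit of a two-phase hyper-exponential mixture to positive data:
    density p*l1*exp(-l1*x) + (1-p)*l2*exp(-l2*x)."""
    p, l1, l2 = 0.5, 1.0 / min(data), 1.0 / max(data)  # crude initialisation
    for _ in range(iters):
        # E-step: posterior probability that each point came from phase 1.
        r = []
        for x in data:
            a = p * l1 * math.exp(-l1 * x)
            b = (1 - p) * l2 * math.exp(-l2 * x)
            r.append(a / (a + b))
        # M-step: closed-form updates of the mixture parameters.
        s = sum(r)
        p = s / len(data)
        l1 = s / sum(ri * x for ri, x in zip(r, data))
        l2 = (len(data) - s) / sum((1 - ri) * x for ri, x in zip(r, data))
    return p, l1, l2

data = [0.05, 0.1, 0.3, 0.4, 0.7, 0.9, 1.2, 2.5]
p, l1, l2 = em_hyperexp2(data)
```

A useful invariant of the M-step is that the fitted mixture mean p/l1 + (1-p)/l2 matches the sample mean exactly, which makes a quick sanity check possible.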
Conference Paper
We address the problem of using replication to reliably maintain state in a distributed system for time spans that far exceed the lifetimes of individual replicas. This scenario is relevant for any system comprised of a potentially large and selectable number of replicated components, each of which may be highly unreliable, where the goal is to have enough replicas to keep the system "alive" (meaning at least one replica is working or available) for a certain expected period of time, i.e., the system's lifetime. In particular, this applies to recent efforts to build highly available storage systems based on the peer-to-peer paradigm. We model notions of replica loss and replica repair in such systems by a simple Markov chain model, and derive an expression for the lifetime of the replicated state. We then apply this model to study the impact of practical considerations like storage and bandwidth limits on the system, and describe methods to optimally choose system parameters so as to maximize lifetime. Our analysis sheds light on the efficacy of various replication strategies.
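A simplified instance of the replica loss-and-repair model described above is a birth-death Markov chain on the number of live replicas, absorbed when all replicas are lost. The sketch below computes the mean lifetime via a first-step recursion; the single-repair-at-a-time assumption and per-replica failure rate are our own simplifications, not necessarily the cited model's.

```python
def expected_lifetime(n, fail_rate, repair_rate):
    """Mean time until all n replicas are lost, in a birth-death chain
    where each of the i live replicas fails at rate fail_rate and a
    single repair proceeds at rate repair_rate (no repair at state n).

    h[i] is the expected time to go from i live replicas down to i-1;
    first-step analysis gives h[i] = 1/(i*f) + (r/(i*f)) * h[i+1]."""
    h = [0.0] * (n + 1)
    h[n] = 1.0 / (n * fail_rate)   # at full replication only failures occur
    for i in range(n - 1, 0, -1):
        mu = i * fail_rate         # aggregate failure rate with i replicas
        h[i] = 1.0 / mu + (repair_rate / mu) * h[i + 1]
    return sum(h[1:])              # lifetime = time to walk from n down to 0

# Without repair this reduces to the pure-death case:
# expected_lifetime(2, 1.0, 0.0) = 1/2 + 1 = 1.5
```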