
Quantifying Path Exploration in the Internet

April 2009
IEEE/ACM Transactions on Networking 17(2):445-458


DOI:10.1109/TNET.2009.2016390


Authors:

Beichuan Zhang

The University of Arizona

Lixia Zhang

University of California, Los Angeles



IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 2, APRIL 2009 445

Quantifying Path Exploration in the Internet

Ricardo Oliveira, Member, IEEE, Beichuan Zhang, Dan Pei, and Lixia Zhang

Abstract: Previous measurement studies have shown the existence of path exploration and slow convergence in the global Internet routing system, and a number of protocol enhancements have been proposed to remedy the problem. However, existing measurements were conducted only over a small number of testing prefixes. There has been no systematic study to quantify the pervasiveness of Border Gateway Protocol (BGP) slow convergence in the operational Internet, nor any known effort to deploy any of the proposed solutions.

In this paper, we present our measurement results that identify BGP slow convergence events across the entire global routing table. Our data shows that the severity of path exploration and slow convergence varies depending on where prefixes are originated and where the observations are made in the Internet routing hierarchy. In general, routers in tier-1 Internet service providers (ISPs) observe less path exploration, hence they experience shorter convergence delays than routers in edge ASs; prefixes originated from tier-1 ISPs also experience less path exploration than those originated from edge ASs. Furthermore, our data show that the convergence time of route fail-over events is similar to that of new route announcements and is significantly shorter than that of route failures. This observation is contrary to the widely held view from previous experiments but confirms our earlier analytical results. Our effort also led to the development of a path-preference inference method based on the path usage time, which can be used by future studies of BGP dynamics.

Index Terms: AS topology completeness, Border Gateway Protocol (BGP), inter-domain routing, Internet topology.

I. INTRODUCTION

The Border Gateway Protocol (BGP) is the routing protocol used in the global Internet. A number of previous analytical and measurement studies [1]–[3] have shown the existence of BGP path exploration and slow convergence in the operational Internet routing system, which can potentially lead to severe performance problems in data delivery. Path exploration means that, in response to path failures or routing policy changes, some BGP routers may try a number of transient paths before selecting a new best path or declaring unreachability to a destination. Consequently, a long time period may elapse before the whole network eventually converges to the final decision, resulting in slow routing convergence. An example of path exploration is depicted in Fig. 1, where node C’s original path to node E (path 1) fails due to the failure of link D–E. C reacts to the failure by attempting two alternative paths (paths 2 and 3) before it finally gives up. The experiments in [1]–[3] show that some BGP routers can spend up to several minutes exploring a large number of alternate paths before declaring a destination unreachable.

Fig. 1. Path exploration triggered by a fail-down event.

Manuscript received November 29, 2006; revised September 19, 2007; approved by IEEE/ACM TRANSACTIONS ON NETWORKING Editor Z.-L. Zhang. Current version published April 15, 2009. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) under Contract N66001-04-1-8926 and by the National Science Foundation (NSF) under Contract ANI-0221453. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the DARPA or NSF.

R. Oliveira and L. Zhang are with the Computer Science Department, University of California, Los Angeles, CA 90095 USA (e-mail: rveloso@cs.ucla.edu; lixia@cs.ucla.edu).

B. Zhang is with the Computer Science Department, University of Arizona, Tucson, AZ 85721 USA (e-mail: bzhang@arizona.edu).

D. Pei is with AT&T Labs–Research, Florham Park, NJ 07932 USA (e-mail: peidan@research.att.com).

Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.

Digital Object Identifier 10.1109/TNET.2009.2016390

The analytical models used in the previous studies tend to represent worst case scenarios of path exploration [1], [2], and the measurement studies have all been based on controlled experiments with a small number of beacon prefixes. In the Internet operational community, views differ on whether BGP path exploration and slow convergence represent a significant threat to network performance, or whether the severity of the problem, as shown in simulations and controlled experiments, is rare in practice. A systematic study is needed to quantify the pervasiveness and significance of BGP slow convergence in the operational routing system, which is the goal of this paper.

In this paper, we provide measurement results from the BGP log data collected by RouteViews [4] and RIPE [5]. For all the destination prefixes announced in the Internet, we cluster their BGP updates into routing events and classify the events into different convergence classes. We then characterize path exploration and convergence time of each class of events. The results reported in this paper are obtained from BGP logs of January and February 2006, which are representative of data we have examined during other time periods. The main contributions of this paper are summarized as follows.

• We provide the first quantitative assessment of path exploration for all Internet destination prefixes. Our results confirm the wide existence of path exploration and slow convergence in the Internet but also reveal that the extent of the problem depends on where a prefix is originated and where the observation is made in the Internet routing hierarchy. When observed from a top-tier Internet service provider (ISP), there is relatively little path exploration, and this is especially true when the prefixes being observed are also originated from some other top-tier ISPs. On the other hand, an observer in an edge network is likely to notice a much higher degree of path exploration and slow convergence, especially when the prefixes being observed are originated from other edge networks. In other words, the existing different opinions on the extent of path exploration and slow convergence may be a reflection of where one takes measurements and which prefixes are being examined.

• We provide the first measurement and analysis of the convergence times of route change events in the entire operational Internet. Our results show that route fail-over events, where the paths move from shorter or more preferred ones to longer or less preferred ones, have much shorter convergence times than route failure events, where the destinations become unreachable. Moreover, we find that, on average, the durations of the various route convergence events follow this order. Among all routing events, those moving from longer or less preferred paths to shorter or more preferred paths, symbolically denoted as $T_{short}$ events, have the shortest convergence delay. They are closely followed by new prefix announcements (denoted as $T_{up}$ events), which in turn have a convergence delay similar to that of the routing events moving from shorter to longer paths (denoted as $T_{long}$). Finally, route failure events, denoted as $T_{down}$, have a substantially longer delay than all the above events. In short, we have $T_{short} < T_{up} \approx T_{long} \ll T_{down}$ regarding their convergence delays. Note that $T_{long}$ is significantly shorter than $T_{down}$, which is a noticeable departure from widely accepted views based on the previous “worst-case” experiments [1] but is in accordance with our previous theoretical analysis results presented in [6].

• A major challenge in our data analysis is how to differentiate $T_{long}$ and $T_{short}$ events, which requires knowing routers’ path preferences. We have developed a new path ranking algorithm to infer the relative preference of each path among all the alternative paths to the same destination prefix. We believe that our path ranking algorithm can be useful in many other BGP data analysis studies.

The rest of the paper is organized as follows. Section II describes our general methodology and data set, where we develop a path ranking algorithm to classify events into different types. We analyze the extent of path exploration and slow convergence for each type of event in Sections III and IV. Section V discusses related work, and Section VI concludes the paper.

II. METHODOLOGY AND DATA SET

Previous measurement results on BGP slow convergence were obtained through controlled experiments. In these experiments, a small number of “beacon” prefixes are periodically announced and withdrawn by their origin ASs at fixed time intervals [7], [8], and the resulting routing updates are collected at remote monitoring routers and analyzed. In addition to generating announcements and withdrawals ($T_{up}$ and $T_{down}$ events), one can also use a beacon prefix to generate $T_{long}$ events by doing AS prepending [1]. For a given beacon prefix, because one knows exactly what the root cause of each routing update is, and when and where it occurs, one can easily measure the routing convergence time by calculating the difference between the time the root cause is triggered and the time the last update due to the same root cause is observed. Although routing updates for beacon prefixes may also be generated by unexpected path changes in the network, those updates can be clearly identified through the use of anchor prefixes, as explained later in this section. Unfortunately, one cannot assess the overall Internet routing performance from observing the small number of existing beacon prefixes.
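As a rough illustration of the beacon measurement just described, the sketch below computes convergence time as the gap between a known root-cause time and the last update attributed to it. The function name `beacon_convergence_time` and the attribution rule (every update seen before the next scheduled beacon action belongs to the current one) are assumptions made for this example, not details taken from the cited experiments.

```python
def beacon_convergence_time(trigger_time, next_trigger_time, update_times):
    """Seconds between a beacon action and the last update it caused.

    Assumption: any update observed in [trigger_time, next_trigger_time)
    is attributed to the action at trigger_time.
    Returns None if no update was observed in that interval.
    """
    attributed = [t for t in update_times
                  if trigger_time <= t < next_trigger_time]
    if not attributed:
        return None
    return max(attributed) - trigger_time


# Beacon withdrawn at t = 0 s; the next scheduled action is 2 h later.
observed = [4.0, 31.5, 88.0, 7203.0]
print(beacon_convergence_time(0.0, 7200.0, observed))
```

The update at 7203 s falls after the next scheduled action, so it is not counted toward the first event.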

Our observation of routing dynamics is based on a set of routers, termed monitors, that propagate their routing table updates to collector boxes, which store them on disk (e.g., RouteViews [4]). To obtain a comprehensive understanding of BGP path exploration in the operational Internet, we first cluster routing updates from the same monitor and for the same prefix into events, sort all the routing events into several classes, and then measure the duration and number of paths explored for each class of events. Our task is significantly more difficult than measuring the convergence delay of beacon prefixes for the following reasons. First, there is no easy way to tell whether a sequence of routing updates is due to the same or different root causes in order to properly group them into events. Second, upon receiving an update for a prefix, one cannot tell what the root cause of the update is, unlike the case with beacon prefixes. Furthermore, when the path to a given destination prefix changes, it is difficult to determine whether the new path is more or less preferred than the previous one, i.e., whether the prefix experiences a $T_{long}$ or a $T_{short}$ event in our event classification.

To address the above problems, we take advantage of beacon updates to develop and calibrate effective heuristics and then apply them to all the prefixes. In the rest of this section, we first describe our data set, then discuss how we use beacon updates to validate a timer-based mechanism for grouping routing updates into events and how we use beacon updates to develop a usage-based path ranking method, which is then used in our routing event classifications.

A. Data Set and Preprocessing

To develop and calibrate our update grouping and path ranking heuristics, we used eight BGP beacons, one from PSG [7] (psg01) and the other seven from RIPE [8] (rrc01, rrc03, rrc05, rrc07, rrc10, rrc11, and rrc12). All eight beacon prefixes are announced and withdrawn alternately every 2 h. We preprocessed the beacon updates following the methods developed in [3]. First, we removed from the update stream all the duplicate updates, as well as the updates that differ only in COMMUNITY or MED attribute values because they are usually caused by internal dynamics inside the last-hop AS. Second, we used the anchor prefix of each beacon to detect routing changes other than those generated by the beacon origins. An anchor prefix is a separate prefix announced by a beacon prefix’s origin AS and is never withdrawn after its announcement. Thus, it serves as a calibration point to identify routing events that are not originated by the beacon injection/removal mechanism. Because the anchor prefix shares the same origin AS, and hopefully the same routing path, with the beacon prefix, any routing changes that are not associated with the beacon mechanism will trigger routing updates for both the anchor and the beacon prefixes. To remove all beacon updates triggered by such unexpected routing events, for each anchor prefix update at time $t$, we ignore all beacon updates during the time window $[t - W, t + W]$. We set $W$’s value to 5 min, as the results reported in [3] show that the number of beacon updates remains more or less constant for $W > 5$ min. After the above two steps of preprocessing, beacon updates mainly comprise those triggered by the scheduled beacon activity at the origin ASs.
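The two preprocessing steps above can be sketched as follows. This is a minimal illustration under stated assumptions: an update is reduced to a `(time, as_path)` pair, so comparing AS paths stands in for "duplicate, or differing only in COMMUNITY or MED", and the function names are hypothetical. The 5 min window half-width is the value given in the text.

```python
WINDOW_SECONDS = 5 * 60  # half-width of the exclusion window (5 min in the text)


def drop_repeated_paths(updates):
    """Keep only updates whose AS path differs from the previous kept update.

    updates: list of (time, as_path) for one monitor and one beacon prefix,
    sorted by time; as_path is a tuple of AS numbers, or None for a withdrawal.
    Assumption: comparing AS paths alone approximates "duplicate, or
    differing only in COMMUNITY or MED".
    """
    kept = []
    for time, as_path in updates:
        if kept and kept[-1][1] == as_path:
            continue
        kept.append((time, as_path))
    return kept


def drop_near_anchor(beacon_updates, anchor_times, window=WINDOW_SECONDS):
    """Discard beacon updates within `window` seconds of any anchor update."""
    return [(t, path) for t, path in beacon_updates
            if all(abs(t - a) > window for a in anchor_times)]


raw = [(0, (7, 3, 1)), (12, (7, 3, 1)), (40, (7, 5, 1)), (900, None)]
clean = drop_near_anchor(drop_repeated_paths(raw), anchor_times=[300])
print(clean)
```

In this toy input the anchor prefix saw an update at 300 s, so the beacon updates at 0 s and 40 s are treated as side effects of an unrelated routing change and dropped.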

To assess the degree of path exploration for all the prefixes in the global routing table, we used the public BGP data collected from 50 full-table monitoring points by RIPE [5] and RouteViews [4] collectors during the months of January and February 2006. We used the data from January to evaluate the different path comparison metrics, and we later analyzed the events in both months. We removed from the data all the updates that were caused by BGP session resets between the collectors and the monitors, using the minimum collection time method described in [9]. Those updates correspond to BGP routing table transfers between the collectors and the monitors and therefore should not be counted in our study of the convergence process.

The 50 monitors were chosen because each of them provided full routing tables and continuous routing data during our measurement period. One month was chosen as our measurement period based on the assumption that ISPs are unlikely to make many changes to their interconnectivity within a one-month period, so we can assume the AS-level topology did not change much over our measurement period, an assumption that is used in our AS path comparison later in the paper.

B. Clustering Updates Into Events

Some of the previous BGP data analysis studies [10]–[12] developed a timer-based approach to cluster routing updates into events. Based on the observation that BGP updates come in bursts, two adjacent updates for the same prefix are assumed to be due to the same routing event if they are separated by a time interval less than a threshold $T$. A critical step in taking this approach is to find an appropriate value for $T$. A value that is too high can incorrectly group multiple events into one. On the other hand, a value that is too low may divide a single event into multiple ones. Since the root causes of beacon routing events are known, and the beacon update streams contain little noise after the preprocessing, we use beacon prefixes to find an appropriate value for $T$.
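The timer-based clustering described above can be sketched in a few lines. This is an illustrative version, assuming the update timestamps for one monitor and one prefix are already sorted and expressed in seconds; the function name `cluster_updates` is hypothetical.

```python
def cluster_updates(times, threshold):
    """Group sorted update timestamps (seconds) into events.

    Two adjacent updates fall in the same event when their gap is
    strictly less than `threshold` seconds.
    """
    events = []
    for t in times:
        if events and t - events[-1][-1] < threshold:
            events[-1].append(t)
        else:
            events.append([t])
    return events


times = [0, 30, 200, 500, 7200, 7210]
print(cluster_updates(times, threshold=240))
```

With a 240 s threshold, the 300 s gap before the update at 500 s starts a new event, while the bursts around 0 s and 7200 s each stay together.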

Fig. 2 shows the distribution of update interarrival times of the eight beacon prefixes as observed from the 50 monitors. All the curves start flattening out either before or around 4 min (the vertical line in the figure). If we use 4 min as the threshold value to separate updates into different events, i.e., $T = 4$ min, in the worst case (rrc01 beacon) we incorrectly group about 8% of messages of the same event into different events; this corresponds to the interarrival time difference between the cutting point of the rrc01 curve at 4 min and the horizontal tail of the curve. The tail drop of all the curves at 7200 s corresponds to the 2-h interval between the scheduled beacon prefix activities.¹

Fig. 2. CCDF of interarrival times of BGP updates for the eight beacon prefixes as observed from the 50 monitors.

Fig. 3. Difference in number of events per [monitor,prefix] for $T = 2$ and 8 min, relative to $T = 4$ min, during a one-month period.

Although the data for the beacon updates suggests that a threshold of $T = 4$ min may work well for grouping updates into events, no single value of $T$ would be a perfect fit for all the prefixes and all the monitors. Thus, we need to assess how sensitive our results may be to the choice of $T = 4$ min. Fig. 3 compares the result of using $T = 4$ min with that of $T = 2$ min and $T = 8$ min for clustering the updates of all the prefixes collected from all the 50 monitors during our one-month measurement period. Let $E_4(m, p)$ be the number of events identified by monitor $m$ for prefix $p$ using $T = 4$ min. $E_2(m, p)$ and $E_8(m, p)$ are similarly defined but with $T = 2$ min and $T = 8$ min, respectively. Fig. 3 shows the distribution of $E_8(m, p) - E_4(m, p)$ and $E_2(m, p) - E_4(m, p)$, which reflects the impact of using a higher or lower timeout value, respectively. As one can see from the figure, in about 50% of the cases, the three different $T$ values result in the same number of events, and in more than 80% of the cases, the results from using the different $T$ values differ by at most two events. Based on the data, we can conclude that the result of event clustering is insensitive to the choice of $T = 4$ min. This observation is also consistent with previous work. For example, [12] experimented with various timeout threshold values between 2 and 16 min and found no significant difference in the clustering results. In the rest of the paper, we use $T = 4$ min.

¹ The psg01 curve reaches a plateau earlier than the other curves, indicating that it suffers less from slow routing convergence. However, one may note its absence of update interarrivals between 100 and 3600 s, followed by a high number of interarrivals around 3600 s. As hinted in [3], this behavior could be explained by BGP’s route flap damping: 1 h is the default maximum suppression time applied to an unstable prefix when its announcement goes through a router that enforces BGP damping.

Fig. 4. Event taxonomy.
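The sensitivity check on the clustering threshold discussed above can be approximated as follows. The sketch counts events per (monitor, prefix) stream at a base threshold and at two alternatives, then reports the differences. The names `count_events` and `count_differences` and the sample stream are illustrative assumptions; thresholds are in seconds (120, 240, and 480 s for 2, 4, and 8 min).

```python
def count_events(times, threshold):
    """Number of events when a gap >= threshold seconds starts a new event."""
    if not times:
        return 0
    gaps = (b - a for a, b in zip(times, times[1:]))
    return 1 + sum(1 for gap in gaps if gap >= threshold)


def count_differences(streams, base=240, alternatives=(120, 480)):
    """Per (monitor, prefix) key: event count at each alternative threshold
    minus the event count at the base threshold."""
    result = {}
    for key, times in streams.items():
        base_count = count_events(times, base)
        result[key] = tuple(count_events(times, alt) - base_count
                            for alt in alternatives)
    return result


streams = {("monitor-a", "192.0.2.0/24"): [0, 100, 300, 700, 7200]}
print(count_differences(streams))
```

For this stream the shorter threshold splits off one extra event and the longer threshold merges one, so the differences are +1 and -1.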

C. Classifying Routing Events

After the routing updates are grouped into events, we classify the events into different types based on the effect that each event has on the routing path. Let us consider two consecutive events $n$ and $n+1$ for the same prefix observed by the same monitor. We define the path in the last update of event $n$ as the ending path of event $n$, which is also the starting path for event $n+1$. Let $p_{start}$ and $p_{end}$ denote an event’s starting and ending paths, respectively, and $\varepsilon$ denote the path in a withdrawal message (representing an empty path). If the last update in an event is a withdrawal, we have $p_{end} = \varepsilon$. Based on the relation between $p_{start}$ and $p_{end}$ of each event, we classify all the routing events into one of the following categories, as shown in Fig. 4.²

1) Same Path ($T_{spath}$): A routing event is classified as a $T_{spath}$ if its $p_{start} = p_{end}$, and every update in the event reports the same AS path as $p_{start}$, although they may differ in some other BGP attribute such as MED or COMMUNITY value. $T_{spath}$ events typically reflect the routing dynamics inside the monitor’s AS.

2) Path Disturbance ($T_{pdist}$): A routing event is classified as $T_{pdist}$ if its $p_{start} = p_{end}$ and at least one update in the event carries a different AS path. In other words, the AS path is the same before and after the event, with some transient change(s) during the event. $T_{pdist}$ events likely result from multiple root causes, such as a transient failure closely followed by a quick recovery, hence the name of the event type. When multiple root causes occur closely in time, the updates they produce also follow each other very closely, and no timeout value would be able to accurately separate them by root cause. In our study, we identify these $T_{pdist}$ events but do not include them in the convergence analysis.

3) Path Change: A routing event is classified as a path change if its $p_{start} \neq p_{end}$. In other words, the paths before and after the event are different. Path change events are further classified into five categories based on whether the destination becomes available or unavailable, or changes to a more preferred or less preferred path, at the end of the event. Let $pref(p)$ represent a router’s preference of path $p$, with a higher value representing a higher preference.

• $T_{up}$: A routing event is classified as a $T_{up}$ if its $p_{start} = \varepsilon$. A previously unreachable destination becomes reachable through path $p_{end}$ by the end of the event.

• $T_{down}$: A routing event is classified as $T_{down}$ if its $p_{end} = \varepsilon$. That is, a previously reachable destination becomes unreachable by the end of the event.

• $T_{short}$: A routing event is classified as $T_{short}$ if its $p_{start} \neq \varepsilon$, $p_{end} \neq \varepsilon$, and $pref(p_{end}) > pref(p_{start})$, indicating a reachable destination has changed the path to a more preferred one by the end of the event.

• $T_{long}$: A routing event is classified as a $T_{long}$ event if its $p_{start} \neq \varepsilon$, $p_{end} \neq \varepsilon$, and $pref(p_{end}) < pref(p_{start})$, indicating a reachable destination has changed the path to a less preferred one by the end of the event.

• $T_{equal}$: A routing event is classified as $T_{equal}$ if its $p_{start} \neq \varepsilon$, $p_{end} \neq \varepsilon$, and $pref(p_{end}) = pref(p_{start})$. That is, a reachable destination has changed the path by the end of the event, but the starting and ending paths have the same preference.

² To establish a valid starting state, we initialize $p_{start}$ for each (monitor,prefix) pair with the path extracted from the routing table of the corresponding monitor.
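The taxonomy above maps naturally onto a small classification function. The following is a sketch under stated assumptions: paths are tuples of AS numbers, `None` stands for the empty path of a withdrawal, the returned labels are descriptive placeholders rather than the paper's symbols, and `preference` is any caller-supplied ranking function.

```python
def classify_event(start_path, event_paths, preference=None):
    """Classify one routing event for a single (monitor, prefix) pair.

    start_path:  AS path in use before the event, or None if unreachable.
    event_paths: AS paths carried by the event's updates, in order
                 (None stands for a withdrawal); must be non-empty.
    preference:  function mapping a path to a number, higher = more
                 preferred; only needed when both end points are reachable.
    """
    end_path = event_paths[-1]
    if start_path == end_path:
        if all(path == start_path for path in event_paths):
            return "same_path"
        return "path_disturbance"
    if start_path is None:
        return "up"
    if end_path is None:
        return "down"
    before, after = preference(start_path), preference(end_path)
    if after > before:
        return "more_preferred"
    if after < before:
        return "less_preferred"
    return "equal_preference"


# Toy preference: shorter AS paths rank higher (an assumption for the demo).
shorter_is_better = lambda path: -len(path)
print(classify_event((7, 3, 1), [(7, 5, 3, 1), None]))
print(classify_event((7, 3, 1), [(7, 5, 3, 1)], shorter_is_better))
```

The first call ends in a withdrawal and is labeled as a failure; the second ends on a longer path and is labeled as a move to a less preferred path under the toy preference.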

A major challenge in event classification is how to differentiate between $T_{long}$ and $T_{short}$ events, a task that requires judging the relative preference between two given paths. Individual routers use locally configured routing policies to choose the most preferred path among available ones. Because we do not have precise knowledge of the routing policies, we must derive effective heuristics to infer a router’s path preference. It is possible that our heuristics label two paths with equal preference, in which case the event will be classified as $T_{equal}$. However, a good path-ranking heuristic should minimize such ambiguity.

D. Comparing AS Paths

If a routing event has nonempty $p_{start}$ and $p_{end}$, then the relative preference between $p_{start}$ and $p_{end}$ determines whether the event is a $T_{long}$ or $T_{short}$. In the controlled experiments using beacon prefixes, one can create such events by manipulating AS paths. For example, in [1], AS paths with length up to 30 AS hops were used to simulate $T_{long}$ events.

However, in general there has been no good way to infer routers’ preferences among multiple available AS paths to the same destination. Given a set of available paths, a BGP router chooses the most preferred one through a decision process. During this process, the router usually considers several factors in the following order: local preference (which reflects the local routing policy configuration), AS path length, the MED attribute value, IGP cost, and tie-breaking rules. Some of the previous efforts in estimating path preference tried to emulate a BGP router’s decision process to various degrees. For example, [1], [2], and [12] used path length only. Because BGP is not a shortest-path routing protocol, however, it is known that the most preferred BGP paths are not always the shortest paths. In addition, there often exist multiple shortest paths with equal AS hop lengths. There are also a number of other efforts in inferring AS relationships and routing policies. However, as we will show later in this section, none of the existing approaches significantly improves the inference accuracy.

Fig. 5. Usage time per ASPATH-Prefix for router 12.0.1.63, January 2006.
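To make the ordering of the decision process concrete, here is a deliberately simplified sketch that ranks candidate routes by the factors listed above. It is an assumption-laden toy, not a faithful BGP implementation: real routers apply further rules (for example, MED is normally compared only between routes from the same neighboring AS), and the dictionary fields and the `router_id` tie-breaker are hypothetical.

```python
def best_route(routes):
    """Pick the most preferred route under a simplified decision order."""
    return min(
        routes,
        key=lambda r: (
            -r["local_pref"],    # higher local preference wins
            len(r["as_path"]),   # then shorter AS path
            r["med"],            # then lower MED
            r["igp_cost"],       # then lower IGP cost
            r["router_id"],      # hypothetical final tie-breaker
        ),
    )


routes = [
    {"local_pref": 100, "as_path": (3, 1), "med": 0, "igp_cost": 5, "router_id": 1},
    {"local_pref": 200, "as_path": (5, 4, 3, 1), "med": 0, "igp_cost": 9, "router_id": 2},
]
print(best_route(routes)["as_path"])
```

The longer path wins here because its local preference is higher, which illustrates why path length alone can misjudge a router's choice.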

To infer path preference with high accuracy for our event classification, we took a different approach from all the previous studies. Instead of emulating the router’s decision process, we propose to look at the end result of the router’s decision: the usage time of each path. The usage time is defined as the cumulative duration of time that a path remains in the router’s routing table for each destination (or prefix). Assuming that Internet routing is relatively stable most of the time and failures are recovered promptly, the most preferred paths should be used most and thus remain in the routing table for the longest time. Given that our study period is only one month, it is unlikely that significant changes happened to routing policies and/or ISP peering connections in the Internet during this time period. Thus, we conjecture that relative preferences of routing paths remained stable for most, if not all, of the destinations during our study period. Fig. 5 shows the path usage time distribution for the monitor with IP address 12.0.1.63 (AT&T). The total number of distinct ASPATH-prefix pairs that appeared in this router’s routing table during the month is slightly less than 650 000 (corresponding to about 190 000 prefixes). About 23% of the ASPATH-prefix pairs (the 150 000 on the left side of the curve) stayed in the table for the entire measurement period, and about 500 000 ASPATH-prefix pairs appeared in the routing table for only a fraction of the period, ranging from a few days to a small number of seconds.
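The usage-time idea can be illustrated with a short accumulation loop. This is a sketch under assumptions: updates for one (monitor, prefix) pair are given as sorted `(time, as_path)` pairs in seconds, `None` marks a withdrawal, and the function name `usage_times` and its parameters are hypothetical.

```python
from collections import defaultdict


def usage_times(updates, period_start, period_end, initial_path=None):
    """Cumulative seconds each AS path was installed for one (monitor, prefix).

    updates:      list of (time, as_path) sorted by time; None = withdrawal.
    initial_path: path already in the routing table at period_start, if any.
    """
    usage = defaultdict(float)
    current, since = initial_path, period_start
    for time, path in updates:
        if current is not None:
            usage[current] += time - since
        current, since = path, time
    if current is not None:
        usage[current] += period_end - since
    return dict(usage)


updates = [(100, (7, 3, 1)), (200, (7, 5, 3, 1)), (260, (7, 3, 1)), (900, None)]
print(usage_times(updates, period_start=0, period_end=1000))
```

In this toy trace the path (7, 3, 1) accumulates 740 s against 60 s for the transient alternative, so a usage-time ranking would place it first.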

We compare this new Usage Time-based approach with three other existing methods for inferring path preference: Length, Policy, and Policy+Length. Usage Time uses the usage time to rank paths. Length infers path preference according to the AS path length. Policy derives path preference based on inferred inter-AS relationships. We used the algorithm developed in [13] to classify the relationships between ASs into customer, provider, peer, and sibling. A path that goes through a customer is preferred over a path that goes through a peer, which is preferred over a path that goes through a provider.³ Policy+Length infers path preference by using the policies first and then using AS path length for those paths that have the same AS relationship.

Fig. 6. Validation of path-preference metric.
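A comparison in the spirit of Policy+Length might look like the sketch below. It rests on several assumptions that the text does not spell out: the first AS in each path is taken to be the neighbor through which the path goes, sibling relationships are left out, and the `relation` mapping and function name are hypothetical.

```python
RANK = {"customer": 2, "peer": 1, "provider": 0}


def compare_policy_length(path_a, path_b, relation):
    """Return 1 if path_a is preferred, -1 if path_b is, 0 if tied.

    relation: dict mapping a neighbor AS number to 'customer', 'peer',
    or 'provider'. Assumption: path[0] is the neighbor AS.
    """
    rank_a = RANK[relation[path_a[0]]]
    rank_b = RANK[relation[path_b[0]]]
    if rank_a != rank_b:
        return 1 if rank_a > rank_b else -1
    if len(path_a) != len(path_b):
        return 1 if len(path_a) < len(path_b) else -1
    return 0


relation = {10: "customer", 20: "peer", 30: "provider"}
print(compare_policy_length((10, 5, 6, 7), (20, 7), relation))
```

The customer path is preferred even though it is longer; path length only matters when both paths go through neighbors of the same relationship type.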

One challenge in conducting this comparison is how to verify the path-ranking results without knowing the router’s routing policy configurations. We tackle this problem by leveraging our understanding of $T_{down}$ and $T_{up}$ events. During $T_{down}$ events, routers explore multiple paths in the order of decreasing preference; during $T_{up}$ events, routers explore paths in the order of increasing preference. Since we can identify $T_{down}$ and $T_{up}$ events fairly accurately, we can use the information learned from these events to verify the results from different path-ranking methods.

In an ideal scenario where paths explored during a $T_{down}$ (or $T_{up}$) event follow a monotonically decreasing (or increasing) preference order, we can take samples of every consecutive pair of routing updates and rank-order the paths they carried. However, due to the difference in update timing and propagation delays along different paths, the monotonicity does not hold true all the time. For example, we observed path withdrawals appearing in the middle of update sequences during such events. Therefore, instead of comparing the AS paths carried in adjacent updates during a routing event, we compare the paths that occurred during an event with the stable path used either before or after the event. Fig. 6 shows our procedure in detail. All the updates in the figure are for the same prefix. Before the $T_{up}$ event occurs, the router does not have any route to reach the prefix. The first four updates are clustered into a $T_{up}$ event that stabilizes with path $p_4$. After $p_4$ is in use for some period of time, the prefix becomes unreachable. During the $T_{down}$ event, paths $p_5$ and $p_6$ are tried before the final withdrawal update. From this example, we can extract the following pairs of path preference: $pref(p_1) < pref(p_4)$, $pref(p_2) < pref(p_4)$, $pref(p_3) < pref(p_4)$, $pref(p_5) < pref(p_4)$, and $pref(p_6) < pref(p_4)$.
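The pair extraction just described can be sketched as follows. The function name and the string labels `"p1"` to `"p6"` are placeholders chosen for this illustration; the only rule encoded is that every transient path seen during either event is taken to be less preferred than the stable path.

```python
def preference_pairs(up_event_paths, down_event_paths):
    """Return (less_preferred, more_preferred) pairs implied by two events.

    up_event_paths:   paths announced while the prefix comes up, in order;
                      the last one is taken as the stable path.
    down_event_paths: paths tried while the prefix goes down, excluding
                      the final withdrawal.
    """
    stable = up_event_paths[-1]
    transient = list(up_event_paths[:-1]) + list(down_event_paths)
    return [(path, stable) for path in transient if path != stable]


pairs = preference_pairs(["p1", "p2", "p3", "p4"], ["p5", "p6"])
print(pairs)
```

Four announcements followed by two paths tried before the withdrawal yield five pairs, each ranking a transient path below the stable one.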

After extracting path preference pairs from and

events, we apply the four path-ranking methods in comparison

to the same set of routing updates and see whether they produce

the same path-ranking results as we derived from and

events. We keep three counters , , and for

each method. For instance, in the example of Fig. 6, if a method

results in and being worse than , and having the

3We ignore those cases in which we could not establish the policy relation

between two ASs. Such cases happened in less than 1% of the total paths.

Authorized licensed use limited to: Barbara Lange. Downloaded on April 15, 2009 at 01:13 from IEEE Xplore. Restrictions apply.

450 IEEE/ACM TRANSACTIONS ON NETWORKING, VOL. 17, NO. 2, APRIL 2009

Fig. 7. Comparison between , , and of Length,Policy,

and Usage Time metrics for (a) and (b) events of beacon preﬁxes.

same preference of (equal), then for the event we have

, , and . Likewise, for the

event, if a method results in being better than and

being equal to , then we have , ,

and . To quantify the accuracy of different inference

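methods, we define the accuracy of a method as the fraction of
correct cases among all of its comparisons, that is,
correct/(correct + equal + wrong). We use this fraction as a
measure of accuracy in our comparison.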
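The bookkeeping described above can be sketched as follows. This is a minimal illustration, assuming that accuracy is the share of correct verdicts among all comparisons (an assumption consistent with the percentages reported for Fig. 7). The names `score_method` and `length_rank` and the sample paths are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Path = Tuple[str, ...]


def score_method(rank: Callable[[Path], float],
                 pairs: List[Tuple[Path, Path]]) -> Dict[str, float]:
    """Count correct/equal/wrong verdicts of one ranking method.

    Each pair is (preferred, other) as derived from the observed events;
    `rank` returns a score where a larger value means "more preferred".
    """
    counts = {"correct": 0, "equal": 0, "wrong": 0}
    for preferred, other in pairs:
        a, b = rank(preferred), rank(other)
        if a > b:
            counts["correct"] += 1
        elif a == b:
            counts["equal"] += 1
        else:
            counts["wrong"] += 1
    total = sum(counts.values())
    result: Dict[str, float] = dict(counts)
    result["accuracy"] = counts["correct"] / total if total else 0.0
    return result


def length_rank(path: Path) -> float:
    """Length-based ranking: the shorter the path, the higher the score."""
    return -len(path)


stable = ("X", "O")
pairs = [
    (stable, ("X", "A", "O")),                # other path is longer
    (stable, ("Y", "O")),                     # same length
    (("X", "A", "B", "O"), ("X", "C", "O")),  # reference prefers the longer path
]
scores = score_method(length_rank, pairs)
```

With these three hypothetical pairs, the length-based ranking produces one correct, one equal, and one wrong verdict, for an accuracy of one third.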

To compare the four different path-ranking methods, we ﬁrst

applied them to our beacon data set, which contains updates

generated by and events, and computed the values of

, , and for each of the four methods. Fig. 7

shows the result. As one can see from the ﬁgure, Length works

very well in ranking paths explored during events, giving

93% correct cases and 5% equal cases. However, it performs

much worse in ranking the paths explored during events,

producing 40% correct cases and 40% wrong cases. During

events, many “invalid” paths are explored and they are

very likely to be longer than the stable path. However, during

events, only “valid” paths are explored, and their prefer-

ences are not necessarily based on their path lengths.

Policy performs roughly equally for ranking paths during

and events. It does not make many wrong choices

but produces a large number of equal cases (around 70% of

the total). This demonstrates that the inferred AS relationship

Fig. 8. Comparison between accuracy of Length, Policy, and Usage Time metrics.

and routing policies provide insufﬁcient information for path

ranking. They do not take into account many details—such

as trafﬁc engineering, AS internal routing metric, etc.—that

affect actual routes being used. Compared with Length,

Policy+Length has a slightly worse performance with

events and a moderate improvement with events. Our

observations are consistent with a recent study that concludes

that per-AS relationships are not fine-grained enough to compute

routing paths correctly [14].

Usage Time works surprisingly well and outperforms the

other three in both and events. Its is about

96.3% in and 99.4% in events. Its value

is 0 in both and events. This is because we are

measuring the path usage time in units of seconds, which

effectively puts all the paths in strict rank order. We also notice

that for events, about 3.7% of the comparisons are wrong,

whereas for events this number is as low as 0.6%. We

believe this noticeable percentage of wrong comparisons in

events is due to path changes caused by topological changes,

such as a new link established between two ASs as a result of a

customer switching to a new provider. Because the new paths

have low usage time, our Usage Time-based inference will give

them a low rank, although these paths are actually the preferred

ones. Nevertheless, the data conﬁrmed our earlier assumption

that, during our one-month measurement period, there were

no significant changes in Internet topology or routing policies.

Otherwise, we would have seen a much higher percentage of

wrong cases produced by Usage Time.
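To make the idea concrete, a usage-time ranking can be sketched as below. This is only a sketch: it assumes that the usage time of a path is the cumulative number of seconds during which that path was the route held by a monitor for a given prefix within the measurement window. The function names and the sample update sequence are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Path = Tuple[str, ...]
Update = Tuple[int, Optional[Path]]  # (time in seconds, path; None = withdrawal)


def usage_time(updates: List[Update], window_end: int) -> Dict[Path, int]:
    """Total seconds each path was in use for one (monitor, prefix) pair."""
    ordered = sorted(updates, key=lambda u: u[0])
    closing: Update = (window_end, None)
    totals: Dict[Path, int] = defaultdict(int)
    # Each update stays in effect until the next update (or the window end).
    for (start, path), (stop, _) in zip(ordered, ordered[1:] + [closing]):
        if path is not None:
            totals[path] += stop - start
    return dict(totals)


def more_preferred(a: Path, b: Path, totals: Dict[Path, int]) -> Optional[Path]:
    """Return the path with the longer usage time, or None on a tie."""
    ta, tb = totals.get(a, 0), totals.get(b, 0)
    if ta == tb:
        return None
    return a if ta > tb else b


primary, backup = ("X", "O"), ("X", "B", "O")
updates = [(0, primary), (1000, backup), (1060, primary), (5000, None)]
totals = usage_time(updates, window_end=6000)
winner = more_preferred(primary, backup, totals)
```

The sketch also shows why a path introduced late in the window (for example, after a topological change) is ranked low regardless of its actual preference: it simply has had little time to accumulate usage.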

We now examine how the value of varies between

different monitors under each of the four path-ranking methods.

Fig. 8 shows the distribution of for different methods,

with X-axis representing the monitors sorted in decreasing order

of their value. The value of for each monitor is

calculated over all the and events in our beacon data

set. When using the path usage time for path ranking, we observe
an accuracy between 84% and 100% across all the monitors, whereas
when using path length for ranking, we observe that the accuracy
can be as low as 31% for some monitors. Using

policy for path ranking leads to even lower values.


Fig. 9. Number of fail-down events per monitor.

After we developed and calibrated the usage-time-based path-

ranking method using beacon updates, we applied the method,

together with the other three, to the BGP updates for all the

prefixes collected from all the 50 monitors, and we obtained
results similar to those from the beacon update set. Considering

the aggregate of all monitors and all preﬁxes, is 17%

for Policy, 65% for Length, 73% for Policy+Length, and 96.5%

for Usage Time. Thus, we believe usage time works very well

for our purpose and use it throughout our study.

To the best of our knowledge, we are the ﬁrst to propose the

method of using usage time to infer relative path preference.

We believe this new method can be used for many other studies

on BGP routing dynamics. For example, [12] pointed out that

if after a routing event, the stable path is switched from P1 to

P2, the root cause of the event should lie on the better path of

the two. The study used length-only in their path ranking, and

the root cause inference algorithm produced a mixed result. Our

result shows that using length for path ranking gives only about

65% accuracy, and usage time can give more than 96% accuracy.

Using usage time to rank paths can potentially improve the results

of the root-cause inference scheme proposed in [12].

III. CHARACTERIZING EVENTS

After applying the classification algorithm to BGP data, we
count the number of fail-down events observed by each monitor as
a sanity check. A fail-down event means that a previously reachable
prefix becomes unreachable, suggesting that the root cause of
the failure is very likely at the AS that originates the prefix and
should be observed by all the monitors. Therefore, we expect
every monitor to observe roughly the same number of fail-down
events. Fig. 9 shows the number of fail-down events seen by each
monitor. Most monitors observe a similar number of fail-down
events, but there are also a few outliers that observe either too
many or too few. Too many fail-down events can
be due to failures that are close to monitors and partition the
monitors from the rest of the Internet, or to underestimation of
the relative timeout used to cluster updates. Too few fail-down
events can be due to missing data during monitor downtime
or to overestimation of the relative timeout. In order to keep
consistency among all monitors, we decided to exclude the

TABLE I

EVENT STATISTICS FOR JANUARY 2006 (31 DAYS)

TABLE II

EVENT STATISTICS FOR FEBRUARY 2006 (28 DAYS)

Fig. 10. Duration of events for January 2006.

head and tail of the distribution, reducing the data set to 32

monitors.

Now we examine the results of event classiﬁcation. Tables I

and II show the statistics for January and February respectively

for each event class, including the total number of events, the
average event duration, the average number of updates per event,

and the average number of unique paths explored per event. We

exclude events from the table since their percentage is

negligible. Comparing the results from the two months, we note

that the values are very close, as can also be observed by comparing
the distributions of event duration in Figs. 10 and 11.

Given this similarity, we will base our following analysis on Jan-

uary data, although the same observations apply to February.

There are three observations. First, the three high-level event

categories in Fig. 4 have approximately the same number of

events: Path-Change events are about 36% of all the events,

Same-Path 34%, and Path-Disturbance 30%. Breaking down

Path-Change events, we see that the number of balances

that of , and the number of balances that of . This


Fig. 11. Duration of events for February 2006.

Fig. 12. Number of updates per event, January 2006.

Fig. 13. Number of unique paths explored per event, January 2006.

makes sense since failures are recovered with events,

and failures are recovered with events.

Second, the average duration of different types of events can

be ordered as follows:

.⁴ Fig. 10 shows the distributions of event durations,⁵
which also follow the same order. Note that the shape of

the curves is stepwise with jumps at multiples of around 26.5 s.

The next section will explain that this is due to the MinRouteAd-

vertisementInterval (MRAI) timer, which controls the interval

between consecutive updates sent by a router. The default range

of MRAI timer has the average value of 26.5 s, making events

last for multiples of this value. Table I also shows that Path-Disturbance
events have the longest duration and most updates and explore
the most unique paths. This suggests that a Path-Disturbance event
likely contains two events very close in time, e.g., a link failure
followed shortly by its recovery. A study [15] on network failures
inside a tier-1 provider revealed that about 90% of the failures on
high-failure links take less than 3 min to recover, while 50% of
optical-related failures take less than 3.5 min to recover. Therefore,
there are many short-lived network failures, and they can very well
generate Path-Disturbance events. On the other hand, Same-Path
events are much shorter and have fewer updates. This is because
a Same-Path event is likely due to routing changes inside the AS
hosting the monitor and, thus, does not involve interdomain path
exploration.

Third, among the path-changing events, fail-down events last the
longest, have the most updates, and explore the most unique
paths. Figs. 10, 12, and 13 show the distributions of event duration,
number of updates per event, and number of unique paths
explored per event, respectively. The results show that route
fail-down events last considerably longer than route
fail-over events. In fact, Fig. 10 shows that about 60% of
fail-over events have a duration of zero, while 50% of fail-down
events last more than 80 s. In addition, Fig. 12 shows that about
60% of fail-over events have only one update, while about 70% of
fail-down events have three or more updates. Fig. 13 shows that
fail-down events explore more unique paths than fail-over events.
These results are in accordance with our previous analytical
results in [6] but contrary to the results of previous measurement
work [2], which concluded that the duration of fail-over events is
similar to that of fail-down events and longer than that of the
other two Path-Change subcategories. In [6], we showed

that the upper bound of convergence time is proportional

to , where is the MRAI timer value, is the path

length of to the destination after the event, and is the distance

from the failure location to the destination. Since is typically

small for most Internet paths, and could be anywhere between

0 and , the duration of most events should be short. We

believe that the main reason [2] reached a different conclusion is
that they conducted measurements by artificially increasing

to 30 AS hops using AS prepending. The analysis in [6] shows

that an overestimate of would result in a longer con-

vergence time, which would explain why they observed longer

durations for beacon preﬁxes than what we observed for opera-

tional preﬁxes.

A. The Impact of Unstable Preﬁxes

So far we have been treating all destination preﬁxes in the

same way by aggregating them in a single set in our measure-

ments. However, previous work [10] showed that most routing

4The order of and average durations inverts in February 2006,

even though the values remain very close to each other.

5The curve is omitted from the ﬁgure for clarity.


Fig. 14. Duration of events for unstable preﬁxes, January 2006.

Fig. 15. Duration of events for stable preﬁxes, January 2006.

instabilities affect a small number of unstable preﬁxes, and pop-

ular destinations (with high trafﬁc volume) are usually stable.

Therefore, it might be the case that the results we just described

are biased toward those unstable preﬁxes since these preﬁxes are

associated with more events. In order to verify if this is the case,

we classify each preﬁx into one of two classes based on the

number of events associated with it. If we let be the median

of the distribution of the number of events per preﬁx , then

we can classify each prefix as 1) unstable if , or 2)

stable if . From the 205 980 preﬁxes in our set, only

28 954 (or 14%) were classified as unstable, i.e., 14% of prefixes
were responsible for 50% of events. In Figs. 14 and 15, we

show the distribution of event duration for unstable and stable

preﬁxes, respectively. Note that not only are these two distribu-

tions very similar, but they are also very close to the original

distribution of the aggregate in Fig. 10. Based on these observations,
we believe there is no significant bias in the aggregated

results shown before.

IV. POLICIES, TOPOLOGY AND ROUTING CONVERGENCE

In this section, we compare the extent of slow convergence

across different preﬁxes and different monitors to examine the

impacts of routing policies and topology on slow convergence.

Fig. 16. Determining MRAI conﬁguration.

A. MRAI Timer

In order to make fair comparisons of slow convergence ob-

served by different monitors, we need to be able to tell whether

a monitor enables MRAI timer or not. The BGP speciﬁcation

(RFC 4271 [16]) deﬁnes the MRAI as the minimum amount

of time that must elapse between two consecutive updates sent

by a router regarding the same destination preﬁx. Lacking

MRAI timer may lead to signiﬁcantly more update messages

and longer global convergence time [17]. Even though it is

a recommended practice to enable the MRAI timer, not all

routers are conﬁgured this way. Since MRAI timer will affect

observed event duration and number of updates, for the pur-

pose of studying impacts of policies and topology, we should

only make comparisons among MRAI monitors, or among

non-MRAI monitors, but not between MRAI and non-MRAI

monitors.

By default, the MRAI timer is set to 30 s plus a jitter to avoid

unwanted synchronization. The amount of jitter is determined

by multiplying the base value (e.g., 30 s) by a random factor

that is uniformly distributed in the range [0.75, 1]. Assuming

routers are conﬁgured with the default MRAI values, we should

1) not observe consecutive updates spaced by less than
30 × 0.75 = 22.5 s for the same destination prefix, and 2) observe
a considerable number of interarrival times between 22.5 and
30 s, centered around the expected value.

For each monitor, we define a Non-MRAI Likelihood
as the probability of finding consecutive updates for the same
prefix spaced by less than 22 s. Fig. 16 shows this likelihood for all the
50 monitors in our initial set. Clearly, there are monitors with
a very high likelihood and monitors with a very small one. The curve has
a sharp turn, hinting at a major configuration change. Based on

this, we decided to set as a threshold to differen-

tiate MRAI and non-MRAI monitors. Those with

are classiﬁed as MRAI monitors, and those with

are classiﬁed as non-MRAI monitors. However, there could still

be monitors classified as non-MRAI whose MRAI timer is configured
just slightly below the RFC recommendation, which would
therefore be excluded using our method. In order to ensure this

was not the case, we show in Fig. 16 the curve corresponding


to the probability of ﬁnding consecutive updates spaced by less

than 10 s. We note that the 10-s curve is very close to the 22-s

curve, and therefore we are effectively only excluding monitors

that depart signiﬁcantly from the 30-s base value of the RFC.

Using this technique, we detect that 15 routers from the initial

set of 50 are non-MRAI (see the vertical line in Fig. 16), and 10

of them are part of the set of 32 routers we used in the previous

section. We will use the remaining set of 32 − 10 = 22 MRAI monitors in
the next subsection to compare the extent of slow convergence

across monitors.
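The per-monitor test described in this subsection can be sketched as follows. The sketch computes the jittered MRAI range from the default 30-s base and the [0.75, 1] factor, and estimates the Non-MRAI Likelihood as the fraction of same-prefix interarrival times below 22 s. The function names and the sample updates are hypothetical, and the cutoff separating MRAI from non-MRAI monitors is deliberately left to the caller.

```python
from typing import Dict, Iterable, Tuple

MRAI_BASE_S = 30.0          # default base value
JITTER_RANGE = (0.75, 1.0)  # uniform random factor applied to the base


def mrai_bounds(base: float = MRAI_BASE_S,
                jitter: Tuple[float, float] = JITTER_RANGE) -> Tuple[float, float, float]:
    """Return (shortest, longest, midpoint) of the jittered MRAI interval."""
    shortest, longest = base * jitter[0], base * jitter[1]
    return shortest, longest, (shortest + longest) / 2.0


def non_mrai_likelihood(updates: Iterable[Tuple[str, float]],
                        gap_s: float = 22.0) -> float:
    """Fraction of consecutive same-prefix updates spaced by less than gap_s.

    `updates` holds (prefix, arrival time in seconds) for one monitor.
    """
    last_seen: Dict[str, float] = {}
    short = total = 0
    for prefix, t in sorted(updates, key=lambda u: u[1]):
        if prefix in last_seen:
            total += 1
            if t - last_seen[prefix] < gap_s:
                short += 1
        last_seen[prefix] = t
    return short / total if total else 0.0


sample = [("p", 0.0), ("p", 5.0), ("p", 31.0), ("p", 58.0),
          ("q", 0.0), ("q", 25.0)]
bounds = mrai_bounds()
likelihood = non_mrai_likelihood(sample)
```

In the sample, only one of the four same-prefix gaps (5 s) falls below 22 s, so the estimated likelihood is 0.25.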

B. The Impact of Policy and Topology on Routing Convergence

Internet routing is policy-based. The “no-valley” policy [13],

which is based on inter-AS relationships, is the most preva-

lent one in practice. Generally, most ASs have relationships

with their neighbors as provider–customer or peer–peer. In

a provider–customer relationship, the customer AS pays the

provider AS to get access service to the rest of the Internet. In

a peer–peer relationship, the two ASs freely exchange trafﬁc

between their respective customers. As a result, a customer

AS does not forward packets between its two providers, and a

peer–peer link can only be used for trafﬁc between the two in-

cident ASs’ customers. For example, in Fig. 19, paths [C E D],

[C E F], and [C B D] all violate the “no-valley” policy and

generally are not allowed in the Internet.
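A "no-valley" check can be sketched as below: a path may climb customer-to-provider links, cross at most one peer-peer link, and then only descend provider-to-customer links. The small topology used here is hypothetical and is not the one in Fig. 19; the function names are illustrative as well.

```python
from typing import Dict, Iterable, Sequence, Tuple

Rel = Dict[Tuple[str, str], str]


def build_relations(provider_customer: Iterable[Tuple[str, str]],
                    peers: Iterable[Tuple[str, str]]) -> Rel:
    """Build a directed relationship map from (provider, customer) and peer pairs."""
    rel: Rel = {}
    for provider, customer in provider_customer:
        rel[(provider, customer)] = "p2c"
        rel[(customer, provider)] = "c2p"
    for a, b in peers:
        rel[(a, b)] = rel[(b, a)] = "p2p"
    return rel


def is_valley_free(path: Sequence[str], rel: Rel) -> bool:
    """Check the no-valley rule along `path`, in the direction traffic flows."""
    phase = 0  # 0 = climbing, 1 = crossed the single peer link, 2 = descending
    for a, b in zip(path, path[1:]):
        kind = rel[(a, b)]
        if kind == "c2p":
            if phase != 0:
                return False      # climbing again after the peak: a valley
        elif kind == "p2p":
            if phase != 0:
                return False      # peer link after the peak, or a second peer link
            phase = 1
        else:                     # "p2c"
            phase = 2
    return True


# Hypothetical topology: C buys transit from P1 and P2, D from P2,
# P1 buys transit from T, and P1 and P2 peer with each other.
rel = build_relations(
    provider_customer=[("P1", "C"), ("P2", "C"), ("P2", "D"), ("T", "P1")],
    peers=[("P1", "P2")],
)
ok_path = is_valley_free(["C", "P1", "P2", "D"], rel)      # up, peer, down
customer_transit = is_valley_free(["P1", "C", "P2"], rel)  # customer between providers
peer_transit = is_valley_free(["T", "P1", "P2"], rel)      # peer link used for a provider
```

The second and third paths correspond to the two restrictions stated above: a customer does not carry traffic between its providers, and a peer-peer link only carries traffic between the two ASs' customers.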

Based on AS connectivity and relationships, the Internet

routing infrastructure can be viewed as a hierarchy.

• Core: Consisting of a dozen or so tier-1 providers forming the top level of the hierarchy.

• Middle: ASs that provide transit service but are not part of the core.

• Edge: Stub ASs that do not provide transit service (they are customers only).

We collect an Internet AS topology [18], infer inter-AS relation-

ships using the algorithm from [19], and then classify all ASs

into these three tiers. Core ASs are manually selected based on

their connectivity and relationships with other ASs [18], Edge

ASs are those that only appear at the end of AS paths, and the

rest are middle ASs. With this classiﬁcation, we can locate mon-

itors and preﬁx origins with regard to the routing hierarchy.
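The three-tier classification can be sketched as follows. This is a simplified illustration that reads "only appear at the end of AS paths" as "only appear in the last (origin) position", and that collapses repeated AS numbers caused by prepending; both readings are assumptions. The core set is supplied by hand, as in the text, and the AS numbers are hypothetical.

```python
from itertools import groupby
from typing import Dict, Iterable, Sequence, Set


def classify_tiers(as_paths: Iterable[Sequence[int]], core: Set[int]) -> Dict[int, str]:
    """Label every AS seen in `as_paths` as core, middle, or edge."""
    seen: Set[int] = set()
    before_end: Set[int] = set()   # ASs seen in any position other than the last
    for raw in as_paths:
        path = [asn for asn, _ in groupby(raw)]  # collapse AS prepending
        seen.update(path)
        before_end.update(path[:-1])
    tiers: Dict[int, str] = {}
    for asn in seen:
        if asn in core:
            tiers[asn] = "core"
        elif asn in before_end:
            tiers[asn] = "middle"
        else:
            tiers[asn] = "edge"
    return tiers


# Hypothetical AS numbers; 64500 and 64501 play the role of the manually chosen core.
paths = [
    (64500, 64510, 64520),
    (64500, 64511, 64521, 64521),   # origin prepends its own AS number
    (64501, 64510, 64522),
    (64501, 64500, 64511, 64520),
]
tiers = classify_tiers(paths, core={64500, 64501})
```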

Our set of 22 monitors consists of four monitors in the core,

15 in the middle and three at the edge. We would like to have

a more representative set of monitors at the edge, but we only

found this many monitors in this class with consistent data

from the RouteViews and RIPE data archive. The results pre-

sented in this subsection might not be quantitatively accurate

due to the limitation of the monitor set, but we believe they still

qualitatively illustrate the impact of monitor location on slow

convergence.

In the previous section, we showed that fail-down events have
both the longest convergence time and the most path exploration
among all path-change events. Furthermore, in a fail-down event, the
root cause of the failure is most likely inside the destination AS,
and thus all monitors should observe the same set of events.
Therefore, the fail-down events provide a common base for comparison
across monitors and prefixes, and the difference between
convergence time and the number of updates should be most
pronounced. In this subsection, we examine how the location of

Fig. 17. Duration of fail-down events as seen by monitors at different tiers.

Fig. 18. Number of unique paths explored during fail-down events as seen by monitors at different tiers.

preﬁx origins and monitors impact the extent of slow conver-

gence.

Fig. 17 shows the duration of fail-down events seen by monitors

in each tier. The order of convergence time is

, and the medians of convergence times are 60, 84, and

84 s for core, middle, and edge, respectively. Taking into account
that our edge monitor ASs are well connected (one has
three providers in the core, and the other two reach the core
within two AS hops), we believe that, in reality, edge will generally
experience even longer convergence times than the values
we measured. Fig. 18 shows that monitors in the middle and at

the edge explore two or more paths in about 60% of the cases,

whereas monitors in the core explore at most one path in about

65% of the cases.

In a fail-down event, the monitor will not finish the convergence

process until it has explored all alternative paths. Therefore,

the event duration depends on the number of alternative paths

between the event origin and the monitor. In general, due to

no-valley policy [13], tier-1 ASs have fewer paths to explore

than lower tier ASs. For example, in Fig. 19, node D (repre-

senting a tier-1 AS) has only one no-valley path to reach node G

(path 4), while node E has three paths to reach the same destina-

tion: paths 1, 2, and 3. In order to reach a destination, tier-1 ASs


Fig. 19. Topology example.

Fig. 20. Duration of fail-down events observed and originated in different tiers.

can only utilize provider–customer links and peer–peer links to

other tier-1s, but a lower tier AS can also use customer–provider

links and peer–peer links in the middle tier, which leads to more

alternative paths to explore during fail-down events.

We have studied how fail-down events are experienced by monitors
in different tiers. We now study how the origin of the event

impacts the convergence process. Note that we must again di-

vide the results according to the monitor location; otherwise, we

may introduce bias caused by the fact that most of our monitors

are in the middle tier. We use the notation , where is the

tier where the event is originated from and is the tier

of the monitor that observes the event. In our measurements, we

observed that the convergence times of case were close

to the case. Therefore, from these two cases, we will only

show the case where we have a higher percentage of monitors.

For instance, between and cases,

we will only show the latter since our monitor set covers about

27% of the core but only a tiny percentage of the edge. Fig. 20

shows the duration of events for preﬁxes originated and

observed at different tiers. We omit the cases

and for clarity of the ﬁgure since they al-

most overlap with curves and ,

respectively. The ﬁgure shows that the case is the

fastest, and the and cases are the

Fig. 21. Number of paths explored during fail-down events observed and originated in different tiers.

Fig. 22. Median of duration of fail-down events observed and originated in different tiers.

slowest. This observation is also confirmed by Fig. 21, which
shows the number of paths explored during fail-down events. Fig. 22 lists
the median durations of fail-down events originated and observed at
different tiers. Events observed by the core have the shortest
durations, which confirms our previous observation (see Fig. 17).

Note that the convergence is slightly faster than

the convergence. We believe this happens because,
as mentioned before, our set of edge monitors is very
close to the core. Therefore, they may not observe as much path

exploration as the middle monitors, which may have a number

of additional peer links to reach other edge nodes without going

through the core.

Note that we expect that the case reﬂects most

of the slow routing convergence observed in the Internet because

about 80% of the autonomous systems in the Internet are at the

edge, and about 68% of the fail-down events are originated at the

edge, as shown in the next subsection.

C. Origin of Fail-Down Events

We now examine where the fail-down events are originated
in the Internet hierarchy. Since we expect the set of fail-down
events to be common to all the 32 monitors of our data set
(Section III), we will use in this subsection a single monitor,
the router 144.228.241.81 from Sprint. Note that similar results
are obtained from other monitors.

Because our data set spans a one-month period, we do not
know if during this time there was any high-impact event that
triggered an abnormal number of fail-down events, which could
bias our results if we simply use daily count or hourly count.


Fig. 23. Number of fail-down events over time.

TABLE III

EVENTS BY ORIGIN AS

Instead, Fig. 23 plots the cumulative number of fail-down events
as observed by the monitor during January 2006, with a time
granularity of one second. The cumulative number of events grows
linearly, at an approximately constant rate of 3600 fail-down
events per day. This uniform distribution along the time dimension
seems also to suggest that most fail-down events have a
random nature.

Table III shows the breakdown of fail-down events by the tier

from which they are originated. We observe that about 68% of

the events are originated at the edge. However, the edge also an-

nounces a chunk of 56% of the preﬁxes. Therefore, in order to

assess the stability of each tier, and since our identiﬁcation of

events is based on preﬁx, a simple event count is not enough.

A better measure is to divide the number of events originated

at each tier by the total number of preﬁxes originated from that

tier. The row “No. events per preﬁx” in Table III shows that

if the core originates x events per prefix, the middle originates
2x and the edge originates 3x such events, yielding the

interesting proportion 1:2:3. This seems to indicate that, gener-

ally, preﬁxes in the middle are twice as unstable as preﬁxes in

the core, and preﬁxes at the edge are three times as unstable as

preﬁxes in the core.

D. Impact of Fail-Down Convergence

The ultimate goal of routing is to deliver data packets. One

may argue that although fail-down events have the longest convergence
time, they do not make the performance of data delivery

worse because the data packets would be dropped anyway if the

preﬁx is unreachable. However, this is not necessarily true. In

the current Internet, sometimes the same destination network

can be reached via multiple preﬁxes. Therefore, the failure to

Fig. 24. Case where fail-down convergence disrupts data delivery.

reach one preﬁx does not necessarily mean that the destination

is unreachable because it may be reachable via another preﬁx.

Fig. 24 shows a typical example. Network A has two

providers, B and C. To improve the availability of its In-

ternet access, A announces preﬁx 131.179/16 via B and preﬁx

131.179.100/24 via C. In this case, 131.179/16 is called the

“covering preﬁx” [20] of 131.179.100/24. As routing is done

by longest preﬁx match, data trafﬁc destined to 131.179.100/24

normally takes link A–C to enter network A. When link A–C

fails, ideally, data trafﬁc should switch to link A–B quickly

with minimal damage to data delivery performance. However,
the failure of link A–C will result in a fail-down event for
131.179.100/24. Before the convergence process completes,
routers will keep trying obsolete paths to 131.179.100/24 rather
than switching to paths toward 131.179/16. This can result in
packet loss and long delays, which will probably have serious
negative impacts on data delivery performance.
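The interaction between the covering prefix and the more specific prefix can be illustrated with Python's standard `ipaddress` module. The sketch below is only a toy forwarding lookup: the table contents, the "via B"/"via C" labels, and the destination address are illustrative, and a real router uses a different data structure.

```python
import ipaddress
from typing import Dict, Optional, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def longest_prefix_match(table: Dict[Network, str], addr: str) -> Optional[Network]:
    """Return the most specific prefix in `table` that contains `addr`."""
    ip = ipaddress.ip_address(addr)
    matches = [net for net in table if ip in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None


covering = ipaddress.ip_network("131.179.0.0/16")    # announced via provider B
specific = ipaddress.ip_network("131.179.100.0/24")  # announced via provider C
table = {covering: "via B", specific: "via C"}
dest = "131.179.100.7"

# While any route for the /24 remains installed, it wins the lookup.
before = longest_prefix_match(table, dest)

# Only once the /24 is finally withdrawn does traffic fall back to the /16.
del table[specific]
after = longest_prefix_match(table, dest)
```

This is why the length of fail-down convergence matters here: as long as routers still hold some (possibly obsolete) path for the /24, the lookup never reaches the covering /16.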

We analyzed routing tables from RouteViews and RIPE mon-

itors to see how frequent the scenarios illustrated by Fig. 24

are. The result shows that routing announcements like the one

in Fig. 24 are a common practice in the Internet. In the global

routing table, 50% of preﬁxes have covering preﬁxes being an-

nounced through a different provider and are, therefore, vulner-

able to the negative impacts caused by fail-down convergence.

A recent study [21] showed that about 50% of VoIP glitches

as perceived by end users may be caused by BGP slow conver-

gence.

V. RELATED WORK

There are two types of BGP update characterization work in

the literature: passive measurements [10], [12], [22]–[28] and

active measurements [1]–[3]. The work presented in this paper

belongs to the ﬁrst category. We conducted a systematic mea-

surement to classify routing instability events and quantify path

exploration for all the preﬁxes in the Internet. Our measurement

also showed the impact of AS’s tier level on the extent of path

explorations.

Existing measurements of path exploration and slow con-

vergence have all been based on active measurements [1]–[3],

where controlled events were injected into the Internet from

a small number of beacon sites. These measurement results

demonstrated the existence of BGP path exploration and slow

convergence but did not show to what extent they exist on the

Internet under real operational conditions. In contrast, in this

paper, we classify routing events of all preﬁxes, as opposed

to a small number of beacon sites, into different categories,


and for each category we provide measurement results on the

updates per event and event durations. Given that we examine the

updates from multiple peers for all the preﬁxes in the global

routing table, we are able to identify the impact of AS tier

levels on path exploration. Regarding the relation between the

tier levels of origin ASs, our results agree with previous active

measurement work [2] (using a small number of beacon sites)

that preﬁxes originated from tier-1 ASs tend to experience less

slow convergence compared to preﬁxes originated from lower

tier ASs. Moreover, our results also showed that, for the same

preﬁx, routers of different AS tiers observe different degrees of

slow convergence, with tier-1 ASs seeing much less than lower

tier ASs.

Existing passive measurements have studied the instability of

all the preﬁxes. The focuses have been on update interarrival

time, event duration, location of instability, and characterization
of individual updates [10], [12], [22]–[28]. There is no previous
work on classifying routing events according to their effects
(e.g., whether the path becomes better or worse after the event).

Our paper describes a novel path preference heuristic based on

path usage time, and studies in detail the characteristics of dif-

ferent classes of instability events in the Internet.

Our approach shares certain similarities with [10], [12], and

[28] in that we all use a timeout-based approach to group up-

dates into events. Such an approach can mistakenly group up-

dates of multiple root causes that happened close to each other

or overlapped in time into a single event. As we discussed earlier,
the events in our Path-Disturbance category can be examples
of grouping updates of overlapping root causes because the path

to a preﬁx changed at least twice, and often more times, during

one event. We moved a step forward by detecting and separating

these overlapping events into a different category. It is most

likely that those Path-Change events with very long durations

are also overlapping events, and one possible way to identify

them is to set a time threshold on the event duration, which we

plan to do in the future.
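A timeout-based grouping of this kind can be sketched as follows for the updates of a single prefix seen by a single monitor. The function name, the 60-s timeout, and the timestamps are hypothetical and chosen only for illustration; they are not the values used in our measurements.

```python
from typing import Iterable, List


def cluster_updates(timestamps: Iterable[float], timeout_s: float) -> List[List[float]]:
    """Group update arrival times into events.

    An update joins the current event when it arrives within `timeout_s`
    of the previous update; otherwise it starts a new event.
    """
    events: List[List[float]] = []
    for t in sorted(timestamps):
        if events and t - events[-1][-1] <= timeout_s:
            events[-1].append(t)
        else:
            events.append([t])
    return events


def durations(events: List[List[float]]) -> List[float]:
    """Event duration is the time between its first and last update."""
    return [ev[-1] - ev[0] for ev in events]


arrivals = [0.0, 20.0, 45.0, 400.0, 410.0, 2000.0]
events = cluster_updates(arrivals, timeout_s=60.0)
event_durations = durations(events)
```

The sketch makes the limitation discussed above visible: two root causes whose updates arrive within the timeout of each other (such as the updates at 0-45 s, if they stemmed from separate causes) are inevitably merged into one event, and an event with a single update has a duration of zero.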

VI. CONCLUSION

We conducted the ﬁrst systematic measurement study to

quantify the existence of path exploration and slow conver-

gence in the global Internet routing system. We ﬁrst developed a

new path-ranking method based on the usage time of each path

and validated its effectiveness using data from controlled exper-

iments with beacon preﬁxes. We then applied our path-ranking

method to BGP updates of all the preﬁxes in the global routing

table and classiﬁed each observed routing event into three

classes: Path Change, Path Disturbance, and Same Path. For

Path Change events, we further classiﬁed them into 4 subcat-

egories: , , , and . We measured the path

exploration, convergence duration, and update count for each

type of event.

Our work shows several signiﬁcant results. First, although

there is a wide existence of path exploration and slow con-

vergence in the global routing system, the signiﬁcance of the

problem can vary considerably depending on the locations of

both the origin ASs and the observation routers in the routing

system hierarchy. In general, routers in tier-1 ISPs observe less

path exploration and shorter convergence delays than routers in

edge ASs, and preﬁxes originated from tier-1 ISPs also expe-

rience much less slow convergence than those originated from

edge ASs.

Second, events have short duration, in general, that are

comparable to that of and events. This is in accordance
with our previous theoretical analysis results presented in

[6] and is a noticeable departure from widely accepted views

based on the previous experiments [1].

Furthermore, our data show that Same Path events account for about 34% of the total routing events, which seems an alarmingly high value. Since this class of events is most likely caused by internal routing changes within individual ASs, most of them probably should not have occurred in the first place. Further investigations are needed to better understand the causes of the Same Path events. We also observed that about 30% of the routing events are due to transient route changes (captured as Path Disturbance events in our measurement) and are responsible for close to half (47%) of all the routing updates. It would be interesting to identify the causes of these transient routing changes in order to further stabilize the global routing system.

REFERENCES

[1] C. Labovitz, A. Ahuja, A. Bose, and F. Jahanian, “Delayed Internet routing convergence,” IEEE/ACM Trans. Netw., vol. 9, no. 3, pp. 293–306, Jun. 2001.

[2] C. Labovitz, A. Ahuja, R. Wattenhofer, and S. Venkatachary, “The impact of Internet policy and topology on delayed routing convergence,” in Proc. IEEE INFOCOM, Anchorage, AK, Apr. 2001, pp. 537–546.

[3] Z. M. Mao, R. Bush, T. Griffin, and M. Roughan, “BGP beacons,” in Proc. ACM SIGCOMM Internet Meas. Conf. (IMC), Miami Beach, FL, Oct. 2003, pp. 1–14.

[4] “The RouteViews project,” 2005 [Online]. Available: http://www.routeviews.org/

[5] “The RIPE routing information services,” 2008 [Online]. Available: http://www.ris.ripe.net

[6] D. Pei, B. Zhang, D. Massey, and L. Zhang, “An analysis of path-vector routing protocol convergence algorithms,” Comput. Netw., vol. 50, no. 3, pp. 398–421, 2006.

[7] “PSG beacon list,” [Online]. Available: http://www.psg.com/~zmao/BGPBeacon.html

[8] “RIPE beacon list,” [Online]. Available: http://www.ripe.net/ris/docs/beaconlist.html

[9] B. Zhang, V. Kambhampati, M. Lad, D. Massey, and L. Zhang, “Identifying BGP routing table transfers,” in Proc. ACM SIGCOMM MineNet Workshop, Philadelphia, PA, Aug. 2005, pp. 213–218.

[10] J. Rexford, J. Wang, Z. Xiao, and Y. Zhang, “BGP routing stability of popular destinations,” in Proc. ACM SIGCOMM Internet Meas. Workshop (IMW), Marseille, France, 2002, pp. 197–202.

[11] D. Chang, R. Govindan, and J. Heidemann, “The temporal and topological characteristics of BGP path changes,” in Proc. Int. Conf. Netw. Protocols (ICNP), Atlanta, GA, Nov. 2003, pp. 190–199.

[12] A. Feldmann, O. Maennel, Z. M. Mao, A. Berger, and B. Maggs, “Locating Internet routing instabilities,” in Proc. ACM SIGCOMM, Portland, OR, 2004, pp. 205–218.

[13] L. Gao, “On inferring autonomous system relationships in the Internet,” IEEE/ACM Trans. Netw., vol. 9, no. 6, pp. 733–745, Dec. 2001.

[14] W. Mühlbauer, S. Uhlig, B. Fu, M. Meulle, and O. Maennel, “In search for an appropriate granularity to model routing policies,” in Proc. ACM SIGCOMM, Kyoto, Japan, 2007, pp. 145–156.

[15] A. Markopoulou, G. Iannaccone, S. Bhattacharyya, C.-N. Chuah, and C. Diot, “Characterization of failures in an IP backbone network,” in Proc. IEEE INFOCOM, Hong Kong, Mar. 2004, Sprint ATL Research Report.

[16] Y. Rekhter, T. Li, and S. Hares, “Border gateway protocol 4,” Internet Engineering Task Force, RFC 4271, Jan. 2006.

[17] T. G. Griffin and B. J. Premore, “An experimental analysis of BGP convergence time,” in Proc. Int. Conf. Netw. Protocols (ICNP), Riverside, CA, Nov. 2001, pp. 53–61.

[18] B. Zhang, R. Liu, D. Massey, and L. Zhang, “Collecting the Internet AS-level topology,” ACM SIGCOMM Comput. Commun. Rev., vol. 35, no. 1, pp. 53–62, Jan. 2005.

[19] J. Xia and L. Gao, “On the evaluation of AS relationship inferences,” in Proc. IEEE GLOBECOM, Dec. 2004, vol. 3, pp. 1373–1377.

[20] X. Meng, Z. Xu, B. Zhang, G. Huston, S. Lu, and L. Zhang, “IPv4 address allocation and BGP routing table evolution,” in Proc. ACM SIGCOMM Comput. Commun. Rev. (CCR) Special Issue on Internet Vital Statistics, Jan. 2005, pp. 71–80.

[21] N. Kushman, S. Kandula, and D. Katabi, “Can you hear me now?!: It must be BGP,” ACM SIGCOMM Comput. Commun. Rev., vol. 37, no. 2, pp. 75–84, 2007.

[22] C. Labovitz, G. Malan, and F. Jahanian, “Internet routing instability,” in Proc. ACM SIGCOMM, Cannes, France, Sep. 1997, pp. 115–126.

[23] C. Labovitz, R. Malan, and F. Jahanian, “Origins of Internet routing instability,” in Proc. IEEE INFOCOM, New York, NY, Mar. 1999, pp. 218–226.

[24] C. Labovitz, A. Ahuja, and F. Jahanian, “Experimental study of Internet stability and backbone failures,” in Proc. FTCS, Madison, WI, Jun. 1999, pp. 278–285.

[25] L. Wang, X. Zhao, D. Pei, R. Bush, D. Massey, A. Mankin, S. F. Wu, and L. Zhang, “Observation and analysis of BGP behavior under stress,” in Proc. ACM SIGCOMM Internet Meas. Workshop (IMW), Marseille, France, 2002, pp. 183–195.

[26] D. Andersen, N. Feamster, S. Bauer, and H. Balakrishnan, “Topology inference from BGP routing dynamics,” in Proc. ACM SIGCOMM Internet Meas. Workshop (IMW), Marseille, France, 2002, pp. 243–248.

[27] O. Maennel and A. Feldmann, “Realistic BGP traffic for test labs,” in Proc. ACM SIGCOMM, Pittsburgh, PA, 2002, pp. 31–44.

[28] J. Wu, Z. M. Mao, J. Rexford, and J. Wang, “Finding a needle in a haystack: Pinpointing significant BGP routing changes in an IP network,” in Proc. Symp. Netw. Syst. Design Implementation (NSDI), Boston, MA, May 2005, vol. 2, pp. 1–14.

Ricardo Oliveira (M’08) received the B.S. in electrical engineering from the Engineering Faculty of Porto University (FEUP), Porto, Portugal, in 2001 and the M.S. degree in computer science from the University of California, Los Angeles (UCLA) in 2005. He has been pursuing the Ph.D. degree in computer science at UCLA since 2005.

His research interests include Internet topology, next generation routing architectures, and development of Internet monitoring and measurement platforms. He is a student member of the Association for Computing Machinery.

Beichuan Zhang received the B.S. degree from Beijing University, Beijing, China, in 1995 and the Ph.D. degree in computer science from the University of California, Los Angeles in 2003.

He is an Assistant Professor in the Department of Computer Science at the University of Arizona, Tucson. His research interests include Internet routing and topology, multicast, network measurement, and security.

Dan Pei received the B.S. and M.S. degrees from Tsinghua University, Beijing, China, in 1997 and 2000, respectively, and the Ph.D. degree from the University of California, Los Angeles, in 2005.

He is a Researcher at AT&T Research, Florham Park, NJ. His current research interests are network measurement and security.

Lixia Zhang received the Ph.D. degree in computer science from the Massachusetts Institute of Technology, Cambridge.

She was a Member of the research staff at the Xerox Palo Alto Research Center before joining the faculty of the Computer Science Department at the University of California, Los Angeles, in 1995.

Dr. Zhang has served as the Vice Chair of ACM SIGCOMM and as Co-Chair of IEEE Communication Society Internet Technical Committee. She is on the editorial board for the IEEE/ACM TRANSACTIONS ON NETWORKING. She is currently serving on the Internet Architecture Board.
