Recovery techniques in next generation networks

Aun Haider
IIST, Massey University,
Palmerston North, New Zealand.
a.haider@massey.ac.nz
Richard Harris
IIST, Massey University,
Palmerston North, New Zealand.
r.harris@massey.ac.nz
Abstract— This paper provides a concise and up-to-date survey
of major recovery (protection and restoration) techniques for
High Speed (Gbits/s) Next Generation Networks. The survey
includes an overview of both existing and newly proposed
methodologies. Core ideas underlying the Disjoint Paths, Pro-
tection Rings, Protection Cycles, p-cycles, Redundant Trees,
Resilient Routing Layers and IP Fast Reroute technologies are
presented. A comparison of the strengths and weaknesses of
different recovery procedures is provided. Applications of these
recovery procedures to IP based networks are also addressed.
I. INTRODUCTION
A worldwide revolution in ultra high capacity Next Generation
Networks (NGNs) is emerging on the horizon of the
communication network industry. Several such networks are
being developed in various parts of the world, such as Internet 2
and Euro NGI.
The main impetus behind this transition and subsequent
growth can be attributed to the great advances being made
in optical fiber technology. Components such as photonic (all-
optical) switches, optical add/drop multiplexers (OADMs) and
optical cross-connects (OXCs) have fundamentally changed
the methodology for designing high capacity networks. In-
stead of designing optical spans with electrical switch-
ing/regeneration, the current trend is to design networks that
can switch wavelengths without involving electronic signals;
e.g. employing optical burst switching, [1].
Wavelength Division Multiplexing (WDM), [2], and Dense
Wavelength Division Multiplexing (DWDM), [3] and [4],
provide a tremendous amount of bandwidth in a single optical
fiber, [5]. This bandwidth can be divided into non-overlapping
channels that can be operated at very high speed. In such
systems, at the photonic layer (also referred to as layer 0)
a wavelength, λ, is a basic unit of bandwidth which can
be operated at speeds up to 10 Gbits/s, [6]. Currently,
DWDM based fiber optical networks have been operated at
1.6 Terabits/s, i.e. with 160 λs at 10 Gbits/s each, [4].
In order to deliver the benefits of fiber optics right to the
door of customers, several new technologies, such as Passive
Optical Networks (PONs) and Gigabit PONs, and various
other architectures are being developed; they are collectively
known as Fiber-to-the-x (FTTX), [7], where “x” can be a
Node, Exchange, Cabinet, Building or Home.
One of the fundamental requirements in high capacity fiber
optical networks is “survivability”. It refers to the ability
of a network to recover affected traffic in failure environ-
ments and to provide different services continuously. In [8],
a network survivability function (a probability function for
the percentage of total data flow after failure) and related
survivability attributes (such as expected percentage of total
data flow delivered after failure, p-percentile values and worst
case survivability) have been defined. In their study, the Polish
backbone network was chosen for survivability assessment.
Such an approach can be used for quantitative estimation of
survivability for any telecommunication network.
There are two basic types of network failures, i.e. link
and node failure. Link failures usually occur as a result of
cuts in cables and fibers, whereas node failures typically
occur because of hardware failures such as a power supply or
related hardware infrastructure. In the case of optical networks,
there can be a channel failure as well, due to failure of
transmitting/receiving equipment. This paper will mainly focus
on techniques for protection against node and link failures.
Thus, despite all the advantages and future promises, the flip
side for optical networks is that the greater the capacity,
the bigger the challenge of protecting the network from
failures and of meeting resilience assurances. Hence, both the
advantages and the disadvantages of optical networks lie in their
ability to transfer data at extraordinary rates.
A major cause of link failure for optical networks is a fiber
cut, which occurs very frequently all over the world (despite
a large amount of effort to avoid this eventuality), [9]. It has
been estimated that for every 10 km of fiber, a cut is expected
to occur once every 12 years, [10]. The causes of fiber cuts
include construction work (backhoe fade), rodents, fires or
human error. Thus, it is very important to investigate, study
and improve the protection and restoration techniques for Next
Generation Networks.
A network can recover from failure by means of protection
or restoration schemes, where the former operate on smaller
time scales. Protection schemes are generally proactive, since
the backup paths are computed in advance, whereas restoration
schemes are reactive, as backup paths are computed upon
detection of a failure. Thus, restoration offers more flexibility
in deciding the actions for network recovery. For the satisfactory
operation of high capacity networks, such as NGNs,
the ideal recovery time is considered to be within 50 ms. An
interruption of signals for a period of up to 50 ms is perceived
by higher layers as a transmission error only and is usually
managed in a graceful manner; whereas, for longer intervals,
some applications may suffer packet losses, see: pp. 106-112
of [10] for further discussion.
The protection and restoration (recovery) mechanisms can
be applied at the span (link), segment or path level in a high
speed network. At the span level these mechanisms can protect
a pair of adjacent nodes by switching traffic to alternate
spans between the same nodes. Similarly, in the case of
segment or path level recovery, these protection and restoration
mechanisms can switch traffic to alternate segments and paths
between the same origin and destination pairs. At the path
level, new paths, possibly link disjoint from the failed path,
can be selected to reroute the traffic. In general, span switching
is quicker and easier to implement in hardware; however, path
restoration is often more efficient but slower to implement due
to extra overheads. At each of these levels, the protection and
restoration mechanisms should not share resources with the
working system.
The rest of this paper is organised as follows: Section II
presents the basics of NGNs recovery. An overview of previous
work done is given in Section III. Disjoint Paths are discussed
in Section IV. Protection Rings and Mesh based networks are
described in Section V. Protection Cycles and p-cycles are
investigated in Sections VI and VII, respectively. Redundant
Trees, Resilient Routing Layers and IP Fast Reroute are
described in Sections VIII, IX and X, respectively. Also, a
comparison between different structured recovery techniques
has been presented in Section XI. Finally, some concluding
discussion is provided in Section XII.
II. BASICS OF NGNS RECOVERY
For creating NGNs, the International Telecommunication
Union (ITU) and the Internet Engineering Task Force (IETF)
have standardised the Automatically Switched Optical Networks
(ASON) and Generalized Multi-Protocol Label Switching
(GMPLS) specifications, respectively.
ASON is an architecture that defines components, and the
interactions between them, in the optical control plane. It is
mostly a protocol independent specification. GMPLS is an extension
of the Multi Protocol Label Switching (MPLS) concept, [11], that is
concerned with routing and quality of service for NGNs. The
extension of MPLS involves applying the concept to multiple
wavelengths in an optical network.
Fig. 1. Functional planes in ASON and GMPLS based NGNs (Management
Plane, Control Plane, and Transport or Data Plane), shown for
symmetrical and asymmetrical architectures; adapted from [12].
Both ASON and GMPLS contain three functional planes,
[12]: (i) the Transport Plane or Data Plane, which is responsible
for traffic transport and switching; (ii) the Control Plane, which
is responsible for connection and resource management; and (iii)
the Management Plane, for supervision and handling of the entire
system. These three functional planes for an NGN architecture
are depicted in Fig. 1. Readers further interested in
an overview of the strong competition between ASON and
GMPLS technologies are referred to [13].
The main focus of the present paper is to provide a concise
survey of the most recent recovery techniques in the data
plane for ordinary and next generation networks. However,
in subsection II-C we also provide an introduction to recovery
in the control plane for NGNs.
A. Path Protection Methodologies
Based on the availability of network resources, protection
mechanisms can be categorised into the following four major
classes, [14]:
1) 1+1 Protection: In this scheme a dedicated backup
path is predefined to protect a given working path in
a network. Traffic is transmitted along both paths and
a selector mechanism chooses the best signal, and
thus the path.
2) 1:1 Protection: A dedicated backup path is predefined,
but data is switched to the backup path only after the
primary working path fails.
3) 1:N Protection: In this technique, a dedicated backup
path is predefined to protect N primary working paths.
The data will be switched to the backup path if any of
the primary paths fails; after that, the remaining
N−1 paths are un-protected.
4) M:N Protection: M dedicated backup paths, with
1 ≤ M ≤ N, are predefined to protect N ≥ 1 primary paths.
The dedicated 1+1 protection is the fastest (recovery time
within 50 ms), as the traffic is being simultaneously transmitted
over working and backup paths. However, compared to an
un-protected system, it requires twice the amount of network
resources, i.e. 100% redundancy (ratio of spare to working
capacity in a network). This technique has been widely used in
automatic protection switching of premium or high availability
services.
It should be noted that both 1:1 and 1:N are actually special
cases of the M:N technique, [14]. In M:N protection, M
backup paths are used to protect N working
paths. This provides better utilization of resources than 1+1,
since backup paths are used by multiple working paths. Also,
the idle backup resources can be used by low priority traffic to
further enhance the network resource utilization. However, this
improved resource utilization in M:N is obtained at the cost of
additional signalling and increased protection switching time,
which will increase the overall recovery time of the network
against faults.
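The redundancy figures quoted above can be illustrated with a small calculation. The following Python fragment is only a sketch: it assumes that every working and backup path carries the same unit of capacity (an assumption made here for illustration, not stated in [14]), and the function name redundancy and the example values of M and N are hypothetical.

```python
from fractions import Fraction


def redundancy(backup_paths: int, working_paths: int) -> Fraction:
    """Ratio of spare to working capacity.

    Assumption (illustrative only): all paths have equal capacity, so the
    ratio reduces to (number of backup paths) / (number of working paths).
    """
    if working_paths < 1 or backup_paths < 0:
        raise ValueError("need at least one working path and no negative backups")
    return Fraction(backup_paths, working_paths)


# Hypothetical example values for N and M.
schemes = {
    "1+1": redundancy(1, 1),            # dedicated backup, traffic on both paths
    "1:1": redundancy(1, 1),            # dedicated backup, used only after failure
    "1:N (N=4)": redundancy(1, 4),      # one backup shared by four working paths
    "M:N (M=2, N=6)": redundancy(2, 6),
}

for name, ratio in schemes.items():
    print(f"{name}: {float(ratio):.0%} redundancy")
```

Under this equal-capacity assumption, 1+1 and 1:1 both reserve as much spare capacity as working capacity, while sharing M backup paths among N working paths lowers the ratio to M/N.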
A redundant capacity based taxonomy of the major recovery
algorithms surveyed in this paper, i.e. 1+1 and M:N based
Rings, Disjoint Paths, Protection Cycles, p-cycles, Redundant
Trees, Resilient Routing Layers (RRLs) and IP Fast Reroute
(IPFRR), has been sketched in Fig. 2. A number of these
approaches have not yet been discussed but will be considered
in more detail in the following sections.
Fig. 2. Taxonomy of major recovery schemes in NGNs (axis: Redundant
Capacity of Network, from 0% to 100%; labels: 1+1, M:N, <100%, Rings,
Disjoint Paths, Protection Cycles, p-cycles, Redundant Trees, RRLs, IPFRR).
B. Shared Risk Link Groups
All of the protection methodologies presented in the previous
subsection have been designed for protection against
failure of an individual link or a diverse set of links (which
may also have a failure correlation) in a network. However,
there exists a much broader concept known as Shared Risk
Link Groups (SRLGs) for the design of survivable networks.
It refers to situations where a set of links share common
resources (fiber or physical attributes such as cable, conduit,
etc.) and, if one link fails, the other links in the group may
also fail. Thus, all links in an SRLG have a shared risk of
failure. A simple example of SRLGs is shown in Fig. 3.
Fig. 3. An example of four SRLGs (i.e. g1, g2, g3 and g4) formed in an
optical network consisting of 4 nodes and 5 links, where: (a) Optical layer
(b) Physical layer. Failure of g1 or g2 will disconnect nodes 1 and 2 as well
as nodes 1 and 4; whereas, failure of g3 or g4 will disconnect nodes 3 and 4
as well as nodes 1 and 4.
SRLGs have been proposed as a fundamental concept for
failure management in GMPLS. The scope of this paper is
limited to providing a survey of link and node protection
schemes and it does not include protection of networks with
SRLGs. However, readers further interested in protection
schemes involving SRLGs are referred to [15] and its associated
references.
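To make the SRLG idea concrete, the sketch below checks whether a working path and a backup path share any risk group. It is a minimal illustration only: the four-node topology, the group names gA, gB and gC, and the helper names are all hypothetical and are not taken from Fig. 3 or from [15].

```python
def path_links(path):
    """Return the undirected links traversed by a sequence of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def risk_groups(path, srlgs_of_link):
    """Collect every SRLG that at least one link of the path belongs to."""
    groups = set()
    for link in path_links(path):
        groups |= srlgs_of_link.get(link, set())
    return groups


def srlg_disjoint(path_a, path_b, srlgs_of_link):
    """True if the two paths have no shared risk group in common."""
    return not (risk_groups(path_a, srlgs_of_link) & risk_groups(path_b, srlgs_of_link))


# Hypothetical mapping: links 1-2 and 1-4 run through the same conduit (gA).
srlgs_of_link = {
    frozenset({1, 2}): {"gA"},
    frozenset({1, 4}): {"gA"},
    frozenset({2, 3}): {"gB"},
    frozenset({3, 4}): {"gC"},
}

working = [1, 2, 3]
backup = [1, 4, 3]

# The two paths share no link, yet a single conduit cut can break both.
print(path_links(working).isdisjoint(path_links(backup)))   # link disjoint?
print(srlg_disjoint(working, backup, srlgs_of_link))        # SRLG disjoint?
```

The example shows why link disjointness alone is not sufficient when links share physical resources: the two paths use different links but both depend on group gA.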
C. Recovery in the Control Plane
Until recently, it has been assumed that the control plane
was very reliable and, thus, the major focus of recovery
techniques was on the data plane. However, in order to maintain
established connections in the data plane, the recovery
mechanisms in the control plane are also very important. Therefore,
a reliable control plane is necessary for proper operation of
NGNs, as it is responsible for signalling/routing messages and
management of connections/resources. Moreover, the majority
of the protection and restoration mechanisms in the data plane
need efficient signalling from the control plane, [12].
The Resource Reservation Protocol with Traffic Engineering
extensions (RSVP-TE) and Constraint based Routing Label
Distribution Protocol (CR-LDP) are two major control sig-
nalling protocols proposed by the IETF for GMPLS controlled
NGNs. For routing, Intermediate System to Intermediate Sys-
tem (IS-IS), Open Shortest Path First (OSPF) and Border
Gateway Protocol (BGP) are widely used; whereas the Link
Management Protocol is used in managing the signalling
channels and data links between nodes, [16]. For an overview
of an architecture and recovery mechanisms for the control
plane see [12]. It describes control plane recovery procedures
under link and node failures. The control plane recovery
technique in the case of LDP and CR-LDP failures has been
investigated in [17] and [18], respectively. In the remainder of
this paper, we concentrate on surveying the major recovery
techniques in the data plane.
D. Graph Theory and Recovery Techniques
Graph Theory has been widely used to analyse network
recovery algorithms, by abstracting the physical nodes and
links with vertices and edges of a graph. Thus, a complete
physical network is reduced into a graph, which can be
easily analysed for different failure scenarios, involving failure
of various combinations of nodes and links. An important
property of such an abstracted network is the nodal or vertex
degree, which can be defined as the number of links (edges)
incident upon a node (vertex). The average nodal degree
is obtained by taking the arithmetic mean of the nodal
degrees computed at all of the nodes in a graph. For a good
introduction to Graph Theory, please see [19].
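As an illustration of these definitions, the following sketch computes the nodal degrees and the average nodal degree from an edge list. The four-node example graph and the function names are hypothetical, and the sketch assumes that every node appears in at least one link.

```python
from collections import Counter


def nodal_degrees(edges):
    """Count the number of links incident upon each node."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return dict(degree)


def average_nodal_degree(edges):
    """Arithmetic mean of the nodal degrees (isolated nodes are not counted)."""
    degree = nodal_degrees(edges)
    return sum(degree.values()) / len(degree)


# Hypothetical un-directed graph: a triangle A-B-C with a pendant node D.
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]

print(nodal_degrees(edges))
print(average_nodal_degree(edges))
```

Since every link contributes to the degree of exactly two nodes, the sum of all nodal degrees equals twice the number of links, which gives a quick consistency check on any reported degree sequence.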
A graph in which all links are directed, i.e. they have
source and destination node pairs in which the direction of
flow is only from source to destination, is known as a digraph.
Two paths are said to be node disjoint if they do not have a
common node. Also, two paths are said to be link disjoint if
they have no shared link. A multigraph may have more than one
link between a pair of nodes.
A vertex cut of a connected graph refers to a set of vertices
whose removal will make the graph disconnected. Vertex
connectivity is the size (number of vertices removed) of the
smallest possible vertex cut in a graph. A graph is called
k-vertex-connected if its vertex connectivity is at least k.
Analogous concepts can be defined for edges of a graph. Thus, a
bi-connected graph is defined as a connected graph that is not
broken down into disconnected pieces by removing any single
vertex or edge. An articulation point of a connected graph is a
vertex whose removal will disconnect the graph.
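The definition of an articulation point can be checked directly by removing each vertex in turn and testing connectivity. The sketch below does exactly that; it is a brute-force illustration on hypothetical example graphs, not the linear-time depth-first search method normally used in practice.

```python
from collections import deque


def is_connected(adj, removed=frozenset()):
    """Breadth-first search over the graph with the `removed` vertices deleted."""
    nodes = [n for n in adj if n not in removed]
    if not nodes:
        return True
    seen = {nodes[0]}
    queue = deque([nodes[0]])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in removed and v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == len(nodes)


def articulation_points(adj):
    """Vertices whose removal disconnects the remaining graph."""
    return {v for v in adj if not is_connected(adj, removed={v})}


# Hypothetical graphs: a triangle with a pendant node, and a four-node ring.
triangle_with_pendant = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}
ring = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}

print(articulation_points(triangle_with_pendant))
print(articulation_points(ring))
```

A graph with no articulation point, such as the ring, survives any single node failure, whereas node C in the first graph is a single point of failure for node D.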
Flow augmentation refers to the situation in which a group
of links in a path is not carrying traffic at full capacity
and the links are combined together to form a new flow with a
larger capacity, [20]. A spanning tree of a connected graph
is defined as a tree which consists of all vertices and either
some or all edges of the graph. An example of a spanning tree
for a closed network with un-directed links (edges) is
shown in Fig. 4.
Fig. 4. A sample Graph and its Spanning Tree: (a) connected network with
un-directed links (b) Spanning tree. Note that in (a) the nodal degree at A,
B, C, D, E, F is 3, 3, 4, 2, 2, 1, respectively; thus giving an average nodal
degree of 2.5.
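A spanning tree can be obtained by a breadth-first traversal that keeps only the link over which each node is first reached. The sketch below illustrates this on a hypothetical six-node graph; the adjacency lists are made up for the example and do not reproduce the topology of Fig. 4.

```python
from collections import deque


def bfs_spanning_tree(adj, root):
    """Return the tree edges found by a breadth-first search from `root`.

    Assumes `adj` describes a connected un-directed graph; neighbours are
    visited in sorted order so that the result is deterministic.
    """
    parent = {root: None}
    tree_edges = []
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in sorted(adj[u]):
            if v not in parent:          # first time v is reached
                parent[v] = u
                tree_edges.append((u, v))
                queue.append(v)
    return tree_edges


# Hypothetical connected graph with six nodes and seven links.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "E"},
    "D": {"B", "F"},
    "E": {"C", "F"},
    "F": {"D", "E"},
}

tree = bfs_spanning_tree(adj, "A")
print(tree)
```

The resulting tree contains all vertices and exactly one edge fewer than the number of vertices, which is the defining property of a spanning tree of a connected graph.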
Another important concept in analysing network recovery
algorithms is “Completeness”, [9]. The set of decision problems
for which an algorithm exists (or can be developed) that computes
an exact solution in polynomial time is called P, whereas NP is
the set of problems for which a proposed solution can be verified
in polynomial time. A problem is NP-hard if it is at least as hard
as every problem in NP; no polynomial time algorithm is known for
any NP-hard problem. An NP-complete problem is an NP-hard problem
that also belongs to NP; hence, if any one NP-complete problem can
be solved in polynomial time, then every other problem in NP can
also be solved in polynomial time, see: [9]. Readers interested in
more details and the formal theory of NP-completeness are referred
to [21].
A cycle of a graph is a set of edges that forms a closed
path such that the first and last nodes of the path coincide.
A Hamiltonian cycle in a connected graph
G(V, E), where V = {v1, v2, ..., vi, vj, ...} is a set of vertices and
E = {e1, e2, ..., ek, ...} is a set of edges with edge ek represented
by an un-ordered pair of vertices (vi, vj), is a closed walk that
traverses each vertex of G exactly once except the starting
vertex where such a walk ends. For instance, in Fig. 9 the
cycle formed by traversing nodes A, B, C, D, E and A is
a Hamiltonian cycle. For a graph of any network topology,
one of the main issues is to detect a Hamiltonian cycle and
also to find all such cycles, which is an NP-complete
problem, [22].
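The contrast between checking and finding a Hamiltonian cycle can be sketched in a few lines. In the fragment below, verifying a given closed walk takes time proportional to its length, while the search simply tries all vertex orderings and is therefore only usable on very small graphs. The five-node graphs and the function names are hypothetical and do not reproduce Fig. 9.

```python
from itertools import permutations


def is_hamiltonian_cycle(adj, walk):
    """Check a closed walk (first vertex repeated at the end) against `adj`."""
    if len(walk) != len(adj) + 1 or walk[0] != walk[-1]:
        return False
    if set(walk[:-1]) != set(adj):       # every vertex exactly once
        return False
    return all(v in adj[u] for u, v in zip(walk, walk[1:]))


def find_hamiltonian_cycle(adj):
    """Brute-force search over all vertex orderings; exponential running time."""
    nodes = sorted(adj)
    if len(nodes) < 3:
        return None
    start, rest = nodes[0], nodes[1:]
    for order in permutations(rest):
        walk = [start, *order, start]
        if is_hamiltonian_cycle(adj, walk):
            return walk
    return None


# Hypothetical graphs: a five-node ring with one chord, and a star.
ring_with_chord = {
    "A": {"B", "C", "E"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C", "E"},
    "E": {"A", "D"},
}
star = {"X": {"P", "Q", "R"}, "P": {"X"}, "Q": {"X"}, "R": {"X"}}

print(find_hamiltonian_cycle(ring_with_chord))
print(find_hamiltonian_cycle(star))
```

The verification step is what places the problem in NP; the absence of any known search method that avoids exponential growth is what makes it hard in practice.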
1) Menger’s Theorem: From a graph theory point of view,
the basis of failure recovery methods is Menger’s theorem,
[23] and [24]. Formally, it can be stated as, [25]:
In a graph G = (V, E), where V is a set of vertices and
E is a set of edges, there is no cut of size k−1 or less
disconnecting two given nodes s and t, if and only if there
exist at least k edge-disjoint [s,t]-paths in G.
Let s and t be two non-adjacent nodes in G. Then there
is no articulation set Z of size k−1 or less disconnecting
s and t, if and only if there exist at least k node-disjoint
[s,t]-paths in G.
Informally, it can be stated as follows: a graph G is
k-connected (vertex connectivity at least k) if and only if for each
pair of distinct vertices there are at least k disjoint paths in G.
Thus, a bi-connected graph will provide each pair of vertices
with two disjoint paths, [26]. A similar principle also applies to
edges of a graph. Further, [19] provides three different proofs
of this theorem.
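The edge version of the theorem can be explored numerically: the maximum number of edge-disjoint [s,t]-paths equals the maximum flow between s and t when every link has unit capacity. The sketch below uses breadth-first augmenting paths for this purpose; it is an illustrative implementation on hypothetical example graphs, with each un-directed link modelled as two opposite unit-capacity arcs, and is not taken from [25].

```python
from collections import defaultdict, deque


def max_edge_disjoint_paths(edges, s, t):
    """Maximum number of edge-disjoint s-t paths in an un-directed graph."""
    if s == t:
        raise ValueError("s and t must be distinct nodes")
    cap = defaultdict(int)               # residual capacity of each arc
    adj = defaultdict(set)
    for u, v in edges:
        cap[(u, v)] += 1
        cap[(v, u)] += 1
        adj[u].add(v)
        adj[v].add(u)
    flow = 0
    while True:
        # Breadth-first search for an augmenting path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v in adj[u]:
                if v not in parent and cap[(u, v)] > 0:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow
        # Push one unit of flow back along the path that was found.
        v = t
        while parent[v] is not None:
            u = parent[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


# Hypothetical examples: a four-node ring, the ring plus a chord, and a chain.
ring = [(1, 2), (2, 3), (3, 4), (4, 1)]
print(max_edge_disjoint_paths(ring, 1, 3))
print(max_edge_disjoint_paths(ring + [(1, 3)], 1, 3))
print(max_edge_disjoint_paths([(1, 2), (2, 3)], 1, 3))
```

If the function returns k for a pair of nodes, then by the theorem no cut of k−1 or fewer links can separate them, which is exactly the guarantee a protection scheme needs.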
Moreover, in order to perform simulations for testing newly
proposed recovery algorithms, one often needs to randomly
generate graphs for large network topologies. For this purpose,
commonly used network topology generators are: Brite [27]
and Rocket Fuel [28]. A good comparison between various
approaches to topology generation can be found in [29].
Further, a quantitative comparison of graph based models for
the Internet topology has been presented in [30].
III. RELATED PREVIOUS WORK
Extensive work on recovery techniques in both computer
and telecommunication networks has been carried out in the
last few decades. The importance of network recovery has
been further increased recently, by the advent of ultra-high
capacity next generation networks. In the following, some
existing papers, dealing with performance comparison between
various recovery techniques, are reviewed briefly.
In [31], online routing and wavelength assignment of
protection paths in a WDM network with a link-disjoint backup
path has been studied. Both static (fixed route and wavelength)
and dynamic strategies for 1:1 path protection have
been considered, while assuming that working paths cannot
be re-arranged. It has been reported that, contrary to intuition,
the static strategy performs (number of requests met with
increasing number of working paths) better than the dynamic
one, as the latter has a tendency to pack the working and
protection paths into different sets of wavelengths. It deals
only with WDM and does not consider protection at the IP
layer.
For Asynchronous Transfer Mode (ATM) networks, [32]
describes a restoration methodology which employs unused
network capacity, instead of dedicated capacity, for fault
protection. It requires signalling from upstream neighbouring
switches to affected connections for finding an alternate path.
The performance of the proposed scheme has been discussed
on the basis of simulation studies for video on demand
applications. This paper does not consider modern IP based
NGNs, but does demonstrate the application of the idea of
using unused spare capacity for network restoration.
An overview of models, analysis, architecture, framework
and implementation for multi-layered network survivability
has been provided in [33]. For synchronous transfer mode
networks, it employs genetic algorithms (a stochastic search
that mimics the survival-of-the-fittest paradigm observed in
nature) to develop techniques for spare capacity assignment
with disjoint paths.
The protection and restoration of survivable WDM net-
works, with static traffic demands and single link failure, have
been studied in [34] and [35], respectively. In the former work,
Integer Linear Programming (ILP) formulations have been
developed to determine the capacity requirements for path and
link protection/restoration. An analysis of protection-switching
and restoration times, a computation of restoration efficiency, and
a proposal for distributed restoration protocols have been carried
out in [35]. A partial path (in which only a limited number
of incoming lightpaths can change their wavelength towards
the outgoing link) protection scheme for WDM has been
investigated in [36] and [37].
A description of different survivability techniques for optical
networks, both for non-WDM and WDM, has been provided in
[38]. For non-WDM networks, protection schemes have been
categorised into pre-designed and dynamic types, where the
former includes Automatic Protection Switching (APS), Dual
Homing, Self Healing Rings and Mesh Protection. The authors
briefly explain APS (1+1, 1:1 and 1:N) and self healing rings.
A brief qualitative description of dynamic protection has also
been provided. For WDM networks, multilayer protection for
single link failure, using APS, has been explained at length.
However, this paper does not capture other important and
modern protection techniques, such as p-cycles presented in
[39].
Similarly, [40] provides a description of APS schemes
for non-WDM based synchronous optical networks and also
briefly describes the use of rings and mesh schemes in WDM
networks. Thus, neither [38] nor [40] provides an in-depth
and broad survey of the various state-of-the-art protection schemes for
next generation networks employing WDM and DWDM.
The perspective of optical layer survivability in the post-
optical-bubble era has been briefly described in [41]. It has
been suggested that optical protection is viable for simple
dedicated schemes, such as 1+1; whereas for complex shared
ring and mesh networks the optical-electronic-optical based
protection schemes would be more useful. Survivability issues
in the Trans-European Network (till 1999), employing Synchronous
Digital Hierarchy (SDH) and WDM technologies,
have been studied in [42]. Neither [41] nor [42] provides a
complete survey of the various recovery techniques employed
in networks.
Different approaches to protect a mesh based Wave-
length Division Multiplexing (WDM) optical network have
been studied in [10]. These protection schemes (dedicated,
shared and dynamic) have been based on two survivability
paradigms, which are path protection/restoration and link
protection/restoration respectively. The numerical results, ob-
tained by solving an ILP problem formulated for a rep-
resentative network topology with random traffic demands,
show that there is a tradeoff between capacity utilisation and
susceptibility to multiple failures. It has also been reported
that path restoration has better efficiency than link restoration;
whereas the latter has shorter recovery time than the former
approach.
A unified ILP formulation for protection in mesh networks
has been formulated in [43]. Further, with an objective of
minimizing the total required capacity, various protection
schemes (dedicated path 1+1, M:N, shared path, dedicated
span, shared span, p-cycles, edge and node disjoint) have been
considered for the European COST 266 action network, [9].
It has been found that the higher the average nodal degree of
the network, the less capacity will be required for protection. Also,
it has been pointed out that for wavelength path networks
(where wavelength does not change in the network), the
ILP formulation is too complex to solve, thus heuristics are
required.
None of the above mentioned papers includes a thorough
and comprehensive overview of the state-of-the-art technologies
for recovery in NGNs; whereas this paper attempts to provide a
broad survey of major recovery techniques, with a
reasonable level of detail about the underlying core ideas.
In the remainder of this paper, we present the various recovery
techniques shown in Fig. 2.
IV. DISJOINT PATHS
In order to increase reliability of networks, it is desired to
have multiple disjoint paths between different pairs of nodes.
Disjoint paths are very useful when a network is experiencing
sudden topological changes, such as link failures, or dynamic
overload conditions.
When disjoint paths are provided in a network: (i) the source
can simultaneously send packets on both paths to increase
network reliability and survivability; (ii) the source can monitor
the end-to-end performance of different paths and can use the
best path; (iii) the source can split traffic over two paths to
achieve a larger amount of data transfer; and (iv) disjoint paths
can be used as backup paths during node or link failures. Therefore,
computing and establishing disjoint paths is an important
component in the design of survivable networks. An example
of disjoint paths in a simple network is shown in Fig. 5.
Fig. 5. An example of disjoint paths; paths (1, 2, 4, 6) and (1, 3, 5, 6) are
two disjoint paths between nodes 1 and 6.
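The two paths named in the caption of Fig. 5 can be checked for disjointness with a few lines of code. In this sketch, node disjointness is taken to mean that the paths share no node other than their common end points, which is an assumption made here for paths between the same source and destination; the function names and the additional counter-examples are hypothetical.

```python
def links_of(path):
    """Un-directed links traversed by a sequence of nodes."""
    return {frozenset(pair) for pair in zip(path, path[1:])}


def link_disjoint(p, q):
    """True if the two paths have no link in common."""
    return links_of(p).isdisjoint(links_of(q))


def node_disjoint(p, q):
    """True if no intermediate node of one path appears on the other path."""
    return set(p[1:-1]).isdisjoint(q) and set(q[1:-1]).isdisjoint(p)


# The two paths between nodes 1 and 6 given in the caption of Fig. 5.
path_a = (1, 2, 4, 6)
path_b = (1, 3, 5, 6)

print(link_disjoint(path_a, path_b), node_disjoint(path_a, path_b))

# Hypothetical path sharing node 4 with path_a, but no link.
path_c = (1, 3, 4, 5, 6)
print(link_disjoint(path_a, path_c), node_disjoint(path_a, path_c))
```

Node disjointness is the stronger property: two node disjoint paths are always link disjoint, whereas link disjoint paths may still pass through a common node and therefore fail together when that node fails.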
Extensive research has been done on finding disjoint paths
in computer networks. Using the well known Dijkstra’s algorithm
[44], a new algorithm to compute link disjoint paths in a simple
digraph has been presented in [45]. It uses flow augmentation,
described in subsection II-D, to obtain a maximal set of
node disjoint paths with a minimum total length. It proves
that, in general, if there exists a set P_m of m node disjoint
paths of minimum total length and a shortest flow augmenting
path, then one can obtain another set P_{m+1} containing m+1
node disjoint paths with a minimum total length. A similar
algorithm to find disjoint paths with shortest length has also
been presented in [46]. Both [45] and [46] have been widely
used as a basis to develop other algorithms for finding disjoint
paths in a network.
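The flow augmentation idea can be sketched as follows. This is not the algorithm of [45] or [46] itself: it is a simplified illustration that finds link disjoint (rather than node disjoint) paths in a digraph, uses Bellman-Ford instead of Dijkstra’s algorithm so that the negative costs of reversed arcs can be handled directly, and assumes strictly positive arc costs. The function name and the example topology are hypothetical.

```python
def shortest_disjoint_paths(arcs, s, t, k=2):
    """Return k link disjoint s-t paths of minimum total cost, or None.

    `arcs` maps directed arcs (u, v) to strictly positive costs. Each round
    finds a shortest augmenting path in the residual graph, where arcs that
    already carry flow appear reversed with negated cost.
    """
    nodes = {s, t} | {n for arc in arcs for n in arc}
    flow = set()                                   # arcs currently in use
    for _round in range(k):
        residual = []
        for (u, v), cost in arcs.items():
            if (u, v) in flow:
                residual.append((v, u, -cost, (u, v)))
            else:
                residual.append((u, v, cost, (u, v)))
        # Bellman-Ford shortest paths from s in the residual graph.
        dist = {n: float("inf") for n in nodes}
        dist[s] = 0
        pred = {}
        for _pass in range(len(nodes) - 1):
            changed = False
            for a, b, cost, arc in residual:
                if dist[a] + cost < dist[b]:
                    dist[b] = dist[a] + cost
                    pred[b] = (a, arc)
                    changed = True
            if not changed:
                break
        if dist[t] == float("inf"):
            return None                            # fewer than k disjoint paths
        # Augment: add unused arcs, cancel arcs traversed in reverse.
        node = t
        while node != s:
            prev, arc = pred[node]
            flow ^= {arc}
            node = prev
    # Decompose the set of used arcs into k paths from s to t.
    out = {}
    for u, v in flow:
        out.setdefault(u, []).append(v)
    paths = []
    for _path in range(k):
        path, node = [s], s
        while node != t:
            node = out[node].pop()
            path.append(node)
        paths.append(path)
    return paths


# Hypothetical "trap" topology: the single shortest path s-a-b-t blocks
# both alternatives, so removing it and searching again would fail.
arcs = {("s", "a"): 1, ("a", "b"): 1, ("b", "t"): 1, ("s", "b"): 3, ("a", "t"): 3}
pair = shortest_disjoint_paths(arcs, "s", "t")
print(sorted(pair))
```

In this example the second augmenting path traverses arc (a, b) in reverse, which cancels its use and yields the pair s-a-t and s-b-t; this re-arrangement step is what distinguishes flow augmentation from simply deleting the first shortest path and searching again.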
In order to design survivable mesh networks, [47] presents
shortest path edge disjoint and vertex disjoint algorithms. The
application of both algorithms is illustrated by using several
network topologies. It has been stated that both algorithms
are simpler than those in [45] and [46], but no graph theoretic
complexity analysis has been given. For further details the
reader is referred to [48].
The problem of finding maximal link disjoint paths in a
multigraph has been addressed in [20]. An algorithm to find
the maximum number of disjoint paths between two nodes
has been developed by using work presented in [45]. This
paper makes extensive use of Graph Theory to represent the
new algorithm. [49] considers the problem of determining span
disjoint paths, for providing physical diversity, in networks.
The solution path pair obtained is not guaranteed to be optimal,
but an optimal or near optimal design can be obtained in a short
time. The generalisation of the algorithm to more than two
paths has been left as an open research problem.
An efficient algorithm to construct an optimal set of short
disjoint paths has also been developed in [50]. Two
approximations for generating sub-optimal solutions have also
been developed. These approximate algorithms have reduced
computational complexity. It has also been found that,
compared to Dijkstra’s algorithm, [44], these algorithms perform
better for well connected large networks. Further, it has
been reported that all three algorithms have been successfully
employed in the Enhanced Position Location Reporting System
network of the US Army.
Based on [51] and [46], a distributed algorithm for find-
ing two disjoint paths of minimum total length from each
possible source to destination node has been presented in
[52]. It provides each node with sufficient information to
make incremental routing decisions for forwarding packets
over disjoint paths. This paper does not provide simulation or
experimental results for the proposed algorithm. However, in
[53] this algorithm has been implemented and its performance
has been compared with a newer distributed algorithm for
finding disjoint paths. The algorithm presented in [53] has
been shown to require 3-4 times fewer messages to discover
paths of comparable quality.
In [54], the problem of finding the shortest path in a pair
of disjoint paths has been investigated. It has been proved
that this problem is NP-complete in either single link cost
(e.g. dedicated backup bandwidth, such as in 1+1) or in dual
link cost (e.g. shared backup bandwidth, such as in M:N).
A novel heuristic algorithm has also been developed whose
performance has been evaluated by using simulations. This
paper also employs Graph Theory extensively. Experimental
studies of shortest path algorithms adapted to the task of
finding the k successively shortest link disjoint replacement
paths for restoration in a network with n nodes have
been presented in [55]. A list of survey papers on “Shortest
Path Algorithms” has also been provided in [55].
For networks employing MPLS, [11], two methods for local
repair of a label switched path have been described in [56].
Restoration by path concatenation for MPLS paths has been
investigated in [57]. It has been proved that a shortest path after
removing k edges (links) is the concatenation of at most k+1
shortest paths in the original network topology. The theory is
then combined with path concatenation techniques in MPLS to
get a flexible and robust method which has been evaluated on
three large networks. It has been reported that the restoration
scheme performs well in actual networks. However, extension
of this technique to GMPLS has not been investigated yet.
For restoration of a network after failure, the disjoint path
technique can be generalised into k-shortest disjoint paths,
[58]. Recently, probabilistic methods have also been proposed
for the distributed computation of shared backup paths in mesh
optical networks, [59].
Despite its simplicity, network recovery using only sets
of unstructured disjoint backup paths is difficult to manage
and optimise. It also requires notification of failure to healthy
nodes. Hence, it has been suggested that structured recovery
techniques, forming subtopologies of the network, should be
adopted. Such an approach has been presented in [60], [61]
and [62]. It consists of structuring the disjoint paths into a
spanning tree. Further, a communication protocol has been
proposed in [62], which uses k rooted spanning trees with
the property that, for every vertex v, the paths from v to the root
are edge disjoint. An algorithm to find two such trees in a
2-edge connected graph has also been presented. This paper
also makes extensive use of Graph Theory. Another such
method for structured recovery, using multi-trees, has been
presented in [63]; for more details see Section VIII.
V. PROTECTION RINGS AND MESH BASED NETWORKS
Traditionally, there have been two distinct approaches to
survivable network design and its restoration, which are
Ring and Mesh based. Ring based survivable networking
has been widely employed in Synchronous Optical Networks
(SONET)/SDH [64], and optical DWDM networks, [3] and
[65]. The SONET/SDH standard dictates that the total recovery
time after failure must not exceed 60 ms.
Ring architectures are classified according to
the direction of the traffic flow. Uni-directional rings
carry traffic in only one direction, whereas Bi-directional rings
carry network traffic in both directions. Therefore, ring based
networks involve using Uni-directional Path Switched Ring
(UPSR) and Bi-directional Line Switched Rings (BLSR); or
their optical versions: Optical Path Protection Ring (OPPR)
and Optical Shared Protection Ring (OSPR), [66].
Among the major telecommunication carriers, three types
of rings have become very popular: 2-fiber UPSR,
4-fiber BLSR (BLSR/4) and 2-fiber BLSR (BLSR/2). They
are also known as USHR, BSHR/4 and BSHR/2, where SHR
stands for Self Healing Ring, [5]. Sample UPSR and BLSR/4
topologies are shown in Fig. 6. UPSRs are popular topologies
for lower speed local access networks, whereas BLSRs are
widely used in long-haul networks.
Fig. 6. (a) Uni-directional Path-Switched Ring (UPSR) and (b) 4-fiber Bi-directional
Line-Switched Ring (BLSR/4), with four Add Drop Multiplexers (ADMs);
adapted from [5].
As an alternative to protection provided by backup fibers in
SONET rings, [67] presents a WDM based backup recovery
for optical networks. In this technique certain wavelengths
are used to backup other wavelengths. By employing Graph
Theory, an algorithm for node recovery and an algorithm for
link recovery have been presented. However, no simulation or
experimental results have been provided.
In [68], a passively protected 4-fiber Bi-directional self
healing ring architecture has been presented. It uses a SONET
ring to carry working traffic and a passive ring, made of optical
switches and amplifiers, to carry protection traffic in case of
failure of network components. This scheme shows significant
cost savings (14-57%) over the conventional scheme, shown in
Fig. 6b. The application of this scheme has also been demon-
strated practically for OC-24 (1.244 Gbps) transmission, see
[68] and its references.
It has been pointed out in [6] that algorithms to find classical
rings belong to one of the following four classes: (i) circuit
vector space (ii) back tracking algorithms (iii) powers of
adjacency matrix and (iv) edge-digraphs. For more details on
these classes and some other heuristic methods to find rings
please see [6] and its associated references.
Further, three approaches (Span Coverage, Fixed Cost and
Routing, Foundation Design) to the design of Ring-based optical
networks have been presented in [69]. Preliminary results
indicate that these three approaches can provide useful solutions
to the difficult problem of optimal design of Rings.
However, these formulations do not provide the optimal location
of wavelength converters.
Due to the capacity inefficiency, inflexibility and complexity
of ring based networks, there has recently been a growing
trend among network operators to adopt other alternatives such
as mesh based networks, [70]. An example of a Mesh based
network is shown in Fig. 7.
A comparison of important recovery schemes for mesh
networks, i.e. (i) 1+1 APS, (ii) span restoration with and
without joint optimisation of working and spare capacity, (iii)
chain-optimisation (one in which working and spare capacity
are jointly optimised), (iv) shared backup path protection and
(v) true path protection (involving a centralized rerouting),
has been provided in [71]. Two expressions for the ratio of spare
to working capacity for mesh networks have been derived.
Further, the authors have reported results of the tests conducted
on these six schemes by employing the networks given in [72]. For
the design of survivable mesh networks, AMPL [73] models
have been developed, which are then solved using CPLEX
[74]. The schemes in (ii) and (iii) employ dynamic span
restoration, either by distributed pre-planning or by centralized
planning, [75].
Fig. 7. An example of mesh based network, Pan European COST 239 [9].
For mesh based networks, the capacity efficiency is best
for highly connected graphs. It can be noticed that the
average nodal degree of European networks is much higher
(close to 4.5) than that of the US networks (close to 2.5), [70].
Therefore, for sparse mesh networks such as the latter, the capacity
efficiency advantage may be less significant.
A common goal of both ring and mesh based networks is
to provide a fully restorable network, i.e. a network which can
provide almost real-time recovery from single-span failure.
The ring based network has higher restoration speed with
100% redundancy in resources, whereas mesh based networks
have lower restoration speed with lower redundancy requirements.
An intermediate approach (known as p-cycles), aiming
at ring like speeds and mesh like resource utilisation, is
surveyed in Section VII.
VI. PROTECTION CYCLES
The concept of protection cycles for link failures in net-
works with optical-mesh-topologies and bidirectional links,
while performing automatic protection switching, was independently
proposed in [76]. Networks having planar (can be
drawn on a plane without intersecting edges), non-planar
and Eulerian (containing a closed walk that uses each edge of the graph exactly
once) topologies have been considered. Further, it has been
concluded that the proposed recovery system is distributed,
autonomous and network state independent.
For planar topologies, with no intersecting edges except
at vertices, the protection cycles are a set of facial cycles;
for a planar graph with $n$ vertices ($n \geq 3$) and $m$
edges, there are $f = 2 + m - n$ faces, [76]. For non-planar
topologies, a cycle double cover of the graph has to be found.
The cycle double cover of the graph has been defined as a
cycle decomposition, such that each edge appears in exactly
two cycles. For Eulerian digraphs, a cycle decomposition can
always be obtained, see [76] and its references. An example
for a cycle double cover for a non-planar network is shown in
Fig. 8.
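The face count and the double-cover property can be checked on a small case. The sketch below uses the complete graph on four vertices (a hypothetical example, not one from [76]), which is planar and whose four triangular faces form a cycle double cover; the helper names are illustrative.

```python
from collections import Counter


def edge_usage(cycles):
    """Count how many of the given cycles traverse each undirected edge."""
    usage = Counter()
    for cycle in cycles:
        # Pair each node with its successor, wrapping around to close the cycle.
        for u, v in zip(cycle, cycle[1:] + cycle[:1]):
            usage[frozenset((u, v))] += 1
    return usage


def is_cycle_double_cover(edges, cycles):
    """True if every edge of the graph lies on exactly two of the cycles."""
    usage = edge_usage(cycles)
    edge_set = {frozenset(e) for e in edges}
    return set(usage) == edge_set and all(usage[e] == 2 for e in edge_set)


# Complete graph K4, drawn with vertex 4 inside the triangle (1, 2, 3).
edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
faces = [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]

n, m = 4, len(edges)
f = 2 + m - n  # Euler's formula for a connected planar graph
print(f, is_cycle_double_cover(edges, faces))
```

Here f evaluates to 4, matching the four facial cycles, and each of the six edges appears on exactly two of them.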
Fig. 8. An example of protection cycles in a 14-node non-planar network,
[76]; cycle[0]=[1 14][14 13][13 12][12 11][11 10][10 1], cycle[1]=[8 7][7
2][2 3][3 12][12 13][13 8],.., cycle[6]=[7 6][6 5][5 14][14 1][1 2][2 7]
For planar topologies, the maximum numbers of bi-directional
and uni-directional link failures that can be protected
are $\lfloor f/2 \rfloor$ and $f - 1$, respectively. For non-planar topologies with a
2-edge connected graph of order $n$, these values are $\lfloor (n-1)/2 \rfloor$
and $n - 2$. The problem of finding all Eulerian circuits for an
arbitrary Eulerian graph is NP-complete, [21]. The maximum
number of uni- or bi-directional links that can be simultaneously
restored in an Eulerian graph of order $n$ is $\lfloor (n-1)/2 \rfloor$, [76].
In [77], Hamiltonian Cycle Protection has been proposed
for mesh WDM optical networks. It is a simple and efficient
scheme, which aggregates all spare resources of the network
into a single spanning ring which can be separately managed
from working resources and is also independent of working
load. This scheme does not require a complicated optimization
procedure, but it lacks generality and optimality.
VII. PRE-CONFIGURED CYCLES
Pre-configured cycles, widely known as p-cycles, are aimed
at achieving ring-like restoration speeds (50-150 ms) while
retaining the capacity efficiency of mesh-restorable networks.
They were first introduced for optical WDM and SONET
transport networks in [39] and [78]. An updated repository
of papers on p-cycles is maintained at [79].
A p-cycle is constructed from spare capacity available in the
network, and it occupies a unit of spare capacity on each of the
on-cycle spans. Like a self-healing ring, a p-cycle provides one
restoration path for every on-cycle span; unlike a ring,
it also provides two restoration paths for every straddling span (a span
whose two end nodes are on the p-cycle but which is not itself part of the
p-cycle). An example of a p-cycle, BCDE, is shown in Fig. 9.
For the on-cycle span BC, the p-cycle provides only one restoration
path, BEDC; whereas for the straddling span BD the restoration
paths are BED and BCD. In [80], it has been shown (by a
bounding type argument) that p-cycles have a high restoration
efficiency and are superior to preconfigured linear segments
or trees. Also, it has been shown that
fully preconfigured p-cycle networks have the same lower
bound on the ratio of spare to working capacity as on-demand
cross-connected span restorable mesh networks.
Fig. 9. An example of p-cycle: (B, C, D, E, B).
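The difference between on-cycle and straddling spans can be made concrete with a short sketch. It assumes a simple cycle given as an ordered node list and uses the p-cycle (B, C, D, E, B) of Fig. 9; the function name is illustrative only.

```python
def restoration_paths(cycle, span):
    """Return the restoration paths a p-cycle offers for a failed span.

    cycle: nodes in cyclic order, e.g. ["B", "C", "D", "E"] (closed implicitly).
    span:  pair of end nodes of the failed span.
    """
    a, b = span
    if a not in cycle or b not in cycle:
        return []  # the span is neither on-cycle nor straddling
    n = len(cycle)
    i = cycle.index(a)
    # Walk around the cycle in both directions from a until b is reached.
    forward = [cycle[(i + k) % n] for k in range(n)]
    forward = forward[: forward.index(b) + 1]
    backward = [cycle[(i - k) % n] for k in range(n)]
    backward = backward[: backward.index(b) + 1]
    # A two-node walk is the failed on-cycle span itself, so it is discarded.
    return [p for p in (forward, backward) if len(p) > 2]


p_cycle = ["B", "C", "D", "E"]
print(restoration_paths(p_cycle, ("B", "C")))  # on-cycle span
print(restoration_paths(p_cycle, ("B", "D")))  # straddling span
print(restoration_paths(p_cycle, ("A", "B")))  # span not protected by this cycle
```

The on-cycle span BC yields the single path B-E-D-C, the straddling span BD yields B-C-D and B-E-D, and the span AB, whose end node A is not on the cycle, yields no path.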
Two formulations, termed as “max Rp|spare” and “min
spare|Rp=1”, for generating p-cycles have been provided in
[9]. The former methodology produces p-cycles with max-
imum restorability within a given amount and placement
of spare capacity; whereas the latter produces p-cycles that
minimize the total spare capacity required for 100 % p-cycle
restorability. Further, a software package has been provided to
generate p-cycles using both techniques. After the generation
of p-cycles, the next important step is their optimisation with
respect to capacity efficiency and length. For such optimisation
purposes, algorithmic approaches have been proposed in [81]
and [82].
The strategies for dual failure protection by p-cycles have
been investigated in [9]. Both static and re-configurable p-cycles
have been considered. Test case studies on the COST
239 network, shown in Fig. 7, showed that re-configurable
p-cycles allow dual-failure recovery at a low capacity cost,
although this cost is 70 % higher than in the single failure recovery case.
The tradeoff between the number of deployed p-cycles and
survivability to dual fiber duct failures has been investigated
in [83]. It is reported that dual failure recovery can vary significantly
for configurations with different numbers of deployed
p-cycles.
In [84], use of p-cycles has been studied for providing
protection in Multi Protocol Label Switched networks, [11].
The problem complexity has been reduced by pre-selection
of elementary p-cycles. It has been stated that the proposed
p-cycle scheme performs well for routing of bandwidth guar-
anteed bypass tunnels in MPLS networks.
The concept of p-cycle has also been extended to SRLGs,
see subsection II-B, in [85]. An algorithm to generate a basic
p-cycle for 100% restorability in the case of a single SRLG
failure has been presented. It has been noted that SRLG failure
detection undermines fast restoration. Thus, the use of p-cycles
for SRLGs is still in its rudimentary stages and needs to be
investigated further.
A. Adaptation of p-cycles for IP networks
p-cycles were originally designed for protection against
link failures in optical networks only. However, in [66] it
was pointed out that p-cycles can also be used for IP-layer
restoration. Further, it has also been found that in IP networks,
node (router) failure is more common than link failure (due
to fiber cuts), whereas the reverse is true for WDM and
SONET based optical networks. Therefore, p-cycles need to
be modified to cope with node failure in IP networks.
Towards this end, [86] and [87] present virtual protection
cycles for employing the p-cycles technique against network
node failures in IP networks.
Closed logical loops (node-encircling p-cycles) are formed
by virtual circuit techniques, such as those used in MPLS, which
protect links and nodes in an IP network. In this scheme, in
the event of failure, the packets which would normally have
been lost are encapsulated with the IP address of a p-cycle
(forming a p-cycle packet) and re-entered into the routing table,
which directs them to the path defined by the p-cycle. This
scheme does not disturb conventional routing protocols, such
as BGP and OSPF, for global updating of routing tables.
However, despite good resource utilisation, this scheme can
create exceptionally long backup paths in highly connected IP
networks, [88].
B. Hamiltonian p-cycles
A very efficient strategy to protect a network is to provide
a p-cycle on a Hamiltonian cycle, described in subsection
II-D. It has been shown in [89] that Hamiltonian p-cycles
can achieve a logical redundancy (ratio of spare to working
capacity) bound of $1/(\bar{d} - 1)$, where $\bar{d}$ is the average node
degree of the network. In [90], it has been reported that a single
Hamiltonian p-cycle in a homogeneous network (where all
links have the same capacity) will be optimal in redundancy. Also,
a single p-cycle is non-optimal in redundancy if the network
has different link capacities, even if it contains Hamiltonian
cycles. An area of further research is to optimise the design
of such non-homogeneous networks, in which high load spans
can be arranged as straddlers on p-cycles, [90].
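A short counting argument indicates where a bound of the form $1/(\bar{d} - 1)$ comes from. It is a sketch under simplifying assumptions (one unit of spare capacity on each on-cycle span, and every span carrying exactly the working capacity that the cycle can protect) and is not necessarily the derivation used in [89]. Let the network have $N$ nodes and $S$ spans, so that $\bar{d} = 2S/N$. A Hamiltonian p-cycle has $N$ on-cycle spans, each protected for one unit of working capacity, and $S - N$ straddling spans, each protected for two units:

\[
\text{redundancy}
= \frac{\text{spare}}{\text{working}}
= \frac{N}{N + 2(S - N)}
= \frac{N}{2S - N}
= \frac{N}{N\bar{d} - N}
= \frac{1}{\bar{d} - 1}.
\]

For the average nodal degrees quoted in Section V, this gives about $1/3.5 \approx 0.29$ for $\bar{d} = 4.5$ and $1/1.5 \approx 0.67$ for $\bar{d} = 2.5$.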
Based on swarm intelligence, [91] proposes a cross entropy
ant system for solving the max-min optimisation problem of
finding Hamiltonian p-cycles in mesh networks. It is
a biologically inspired system consisting of a large number
of ant-like autonomous components, [92]. These agents work
in a distributed fashion and communicate with each other
by asynchronous means. Such a system can help find a near
optimal solution to NP-hard problems. The reported simulation
results indicate that 80% of link failures can be fully protected
by unused network capacity.
C. Network Planning with p-cycles
Capacity efficient planning of resilient networks with p-
cycles has been investigated in [93]. The process of finding
the optimal set of p-cycles has been explained through a block
diagram. The advantages and disadvantages of p-cycles have
also been discussed briefly. It has been noted that p-cycles are
fast and capacity efficient for single element failures, but their
restoration paths may be longer than the original route. Thus,
it has been suggested that p-cycles may be better suited to well
connected networks. The authors have selected the COST 239 network, shown
in Fig. 7, to demonstrate the capacity efficiency of p-cycles.
Similarly, capacity efficient planning with non-simple p-cycles
(ones in which nodes and links can be traversed more than
once) has been presented in [94]. The
main advantage of the non-simple cycle approach lies in the fact
that, in capacitated or highly sparse meshed networks,
the number of such cycles is larger than the number of elementary simple cycles.
In [95], p-cycles have been introduced in a mixed method
environment for traffic that does not strictly require 1+1 automatic protection
switching but needs a faster switching time than shared
backup path protection. It has been found that significant
savings in capacity can occur in a well-connected network.
In [96], the configuration of p-cycles in a WDM network with a
limited (partial) number of wavelength converters has been
investigated. It has been found that the total number of required
converters can be greatly reduced by a small increase in the spare
capacity of the network for protection by p-cycles. The problem
of optimal configuration of p-cycles in a WDM network has
been modelled in [97]. Two types of paths, i.e. with and
without wavelength conversion, have been considered.
These models are then solved for the COST 239 network (Fig. 7), and
it has been reported that p-cycles achieve high efficiency in
wavelength converting networks.
For sparse wavelength conversion (in which only a sub-
set of network nodes has wavelength conversion capability),
the problem of optimal configuration of p-cycles has been
investigated in [98]. The wavelength converters and the p-cycles are
determined under a sparsity constraint. It has been reported that the
sparse wavelength conversion technique significantly improves
the protection cost over the non-wavelength converting path
approach presented in [96] and [97].
VIII. REDUNDANT TREES
In [99] and [63], two new algorithms have been presented
which create redundant (multiple) trees on arbitrary node-
redundant and link-redundant networks. Menger’s Theorem,
subsection II-D.1, has been applied to construct the algorithms
for generating these redundant trees. These trees have the
property that any node remains connected to the common root
by at least one of the trees in case of either link or node
failure. Two types of trees are defined, primary and secondary
(or blue and red), which together are immune to the failure of a single
vertex or edge. The blue tree is the working tree and the red tree
is used for the recovery process.
The redundant tree approach is basically a path re-routing
mechanism and it requires a dedicated backup path, as in 1+1
protection, given in subsection II-A. The goal is to design
two directed trees such that elimination of any vertex or edge
(except the source) leaves each destination vertex connected to
the source by one of the trees.
A. An example of Redundant Trees
Let us consider a network consisting of 8 vertices (nodes)
and 12 edges (links), as shown in Fig. 10(a), [100]. Directed
trees rooted at the source vertex A, spanning all vertices, are
shown in Fig. 10(b). The Blue tree (working path) and the Red
tree (backup path) have been represented by solid and dotted
lines, respectively. Under normal operating conditions, i.e. no
failures, each node is connected to source node A by means
of the Blue tree. For failure at node G, nodes F and H will no
longer be connected to A through the Blue tree, but through
the Red tree. Similarly, for failure of the link CG, the nodes
G, E, F and H will not stay connected to A through the Blue
tree but instead through the Red tree.
Fig. 10. An example of Redundant Trees, [100]: (a) network topology (b) a
working (blue) and a recovery (red) tree.
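The defining property of a redundant tree pair can be verified mechanically. Since the topology of Fig. 10 is not reproduced here, the sketch below uses a hypothetical four-node ring rooted at A, with the blue tree running one way around the ring and the red tree the other way. Trees are stored as child-to-parent maps, and connectivity is checked by walking towards the root; all names are illustrative and this is not the construction algorithm of [63].

```python
def reaches_root(parent, node, root, failed_nodes=(), failed_links=()):
    """Follow parent pointers from node to root, avoiding failed elements."""
    failed_links = {frozenset(link) for link in failed_links}
    while node != root:
        nxt = parent[node]
        if nxt in failed_nodes or frozenset((node, nxt)) in failed_links:
            return False
        node = nxt
    return True


def survives_single_failures(blue, red, root):
    """Check that after any single node or link failure every surviving
    node still reaches the root through the blue or the red tree."""
    nodes = set(blue) | {root}
    links = {frozenset((c, p)) for tree in (blue, red) for c, p in tree.items()}
    for failed in nodes - {root}:
        for node in nodes - {root, failed}:
            if not (reaches_root(blue, node, root, failed_nodes={failed})
                    or reaches_root(red, node, root, failed_nodes={failed})):
                return False
    for link in links:
        for node in nodes - {root}:
            if not (reaches_root(blue, node, root, failed_links=[link])
                    or reaches_root(red, node, root, failed_links=[link])):
                return False
    return True


# Hypothetical ring A-B-C-D-A with root A (child -> parent maps).
blue = {"B": "A", "C": "B", "D": "C"}
red = {"D": "A", "C": "D", "B": "C"}
print(survives_single_failures(blue, red, "A"))   # the pair is redundant
print(survives_single_failures(blue, blue, "A"))  # a single tree is not
```

Using the same tree twice fails the check, which illustrates why the two trees must reach each node from opposite directions.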
The algorithms presented in [63], to generate redundant
trees, can be modified to take into account cost functions
which can also be implemented in a distributed fashion for
edge redundancy. Any cost function (such as delay minimisa-
tion) can be applied to select cycles and paths. Multiple fail-
ures, spare capacity planning, cost minimisation, link rerouting
and creation of three trees for edge failure have been pointed
out as areas for further research.
Further, building on [101], Quality of Service and Quality of
Protection issues in preplanned recovery using redundant trees
have been investigated in two closely related papers, [100] and
[100]. It has been assumed that average delay, total cost and
bandwidth are independent of each other, which might not be
true in practice. Performance evaluation of these algorithms in
real networks still needs to be done.
Preplanned recovery, using redundant trees, has been heuris-
tically studied in [101]. Two new algorithms have been devel-
oped by using concepts of Graph Theory. Extensive simulation
results, using the C++ LEDA library, show that the proposed
algorithms can reduce average delay in protection trees by 60-
90% compared to the trees produced by algorithms in [63]. It
also lists tractability and computation of single recovery trees
as an open research problem.
The application of Redundant Trees in MPLS networks has
been investigated in [102]. A new scheme has been proposed
that ensures that every protected node is connected to two
protection paths placed in a way that no single link failure
will cause the simultaneous loss of connectivity. Simulation
results have shown that the new algorithm, along with its heuristic
extensions, achieves low cost and low complexity protection.
IX. RESILIENT ROUTING LAYERS
The concept of Resilient Routing Layers (RRLs) has been
presented in [88] and [26]. This technique consists of finding
fully connected topology subsets, called layers, which are
employed to carry traffic during a network failure. The main
idea is that, for each node in the network, there is a subset
of the topology which can safely carry the traffic affected by
failure in that node or in any of its links.
Work similar to RRLs has been presented in [103] and
[104], which deal with networks formed by computer clusters.
RRLs can be used both for connectionless IP networks
and for connection oriented networks, e.g. MPLS. For IP networks,
each packet must be marked according to the layer which is
currently valid. Thus, for $n$ layers, $\log_2 n$ bits are required in
the packet header to identify the current valid layer. In case
of failure, the traffic passing through the failed node will be
shifted to another layer, while other traffic will still be routed
through the full topology, [26].
Layers are formed by removing links from the full network
topology in a manner that each link is safe in at least one layer.
A link is safe in a layer, if it is absent in that layer. Multiple
links can be safe in a particular layer. While generating
layers, a link can be removed unless it is an articulation link
(one whose removal disconnects the topology). RRLs can be generated either
manually or by an automated process, [88]. Three algorithms
for generating RRLs, i.e. minimum, rich and sparse, have been
evaluated in [88]. The network topologies considered are those
obtained from Rocketfuel [28] and Brite Topology generator
[27].
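The layer construction rule described above can be sketched as a simple greedy procedure. This is an illustration only and is not one of the minimum, rich or sparse algorithms evaluated in [88]; the five-link topology and the function names are hypothetical. Each layer starts from the full topology and drops as many not-yet-protected links as possible while staying connected.

```python
def is_connected(nodes, links):
    """Depth-first search check that the links connect all the nodes."""
    adj = {n: set() for n in nodes}
    for u, v in links:
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(nodes)


def build_link_layers(nodes, links):
    """Greedily build layers until every link is absent from some layer."""
    unprotected = list(links)
    layers = []
    while unprotected:
        layer = list(links)
        safe_here = []
        for link in unprotected:
            trial = [l for l in layer if l != link]
            if is_connected(nodes, trial):  # link is not an articulation link
                layer = trial
                safe_here.append(link)
        if not safe_here:
            raise ValueError("remaining links are articulation links")
        unprotected = [l for l in unprotected if l not in safe_here]
        layers.append(layer)
    return layers


# Hypothetical topology: a four-node ring with one chord.
nodes = [1, 2, 3, 4]
links = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3)]
layers = build_link_layers(nodes, links)

# Whole bits needed to mark the layer in a packet header (ceiling of log2 n).
header_bits = (len(layers) - 1).bit_length()
print(len(layers), header_bits)
```

For this topology the greedy procedure needs three layers, so two header bits suffice; a smarter algorithm may need fewer layers.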
A. Examples of RRLs
Let us consider the simple network topology shown in Fig.
11(a), which consists of 6 nodes and 11 links. In order to
make the network safe against link failure, we need to create
layers such that for each link there exists a layer in which it
is absent. Two such layers are shown in Fig. 11(b) and
11(c), where each link has been removed in at least one layer,
[88].
Fig. 11. An example of RRLs for link failure, [88]: (a) full network topology,
(b) Layer 1, and (c) Layer 2.
Similarly, an example of the application of RRLs for protection
against network node failure is provided in Fig. 12, [88]. The
two RRLs are shown in Fig. 12(b) and 12(c), respectively. The
flow of traffic after failure of node 5 is shown in
Fig. 12(d); note that the dotted line is not available in layer
1, which is thus a safe layer for node 5. Also, before the failure of
node 5 all traffic uses the full topology, i.e. Fig. 12(a). In the
case of local recovery, traffic is first routed from node 6 to
4; node 4 then detects the fault at node 5 and switches
traffic to layer 1. Thus, traffic flowing along the path (6, 4, 5,
3) in the healthy network, Fig. 12(a), will be switched to (6, 4,
7, 8, 3). In the case of global recovery, node 6 would
already know that a fault has occurred at node 5, and thus the path (6,
7, 8, 3) will be selected. Different algorithms to generate RRLs
have been presented in [88] and implemented in [105].
In [106], RRLs are proposed for recovery during transient
failures and before IP re-routing has been completed and
converged. It has been reported that RRLs perform close
to optimal with respect to throughput and global recovery,
which may not be possible while using a pre-configured
approach (p-cycles) in connectionless IP networks. It has been
mentioned that RRLs will provide millisecond recovery with
good throughput and link load distributions. It has been noted
that the issues of optimal generation and routing for RRLs need
further investigation.
Fig. 12. An example of RRLs for node failure, [88]: (a) full network topology,
(b) Layer 1, (c) Layer 2 and (d) Traffic paths.
X. IP FAST REROUTE
IP Fast Reroute (IPFRR) is another emerging technology
to rapidly repair failure conditions in IP networks. It can
provide resiliency to high speed networks by quickly (<50
ms) rerouting traffic to precomputed paths. When a link or
a router fails in a network, the routers neighbouring the
failure point will be the first to know about the fault and perform
the repair by steering packets to their destinations through alternate
paths. An evaluation of five different IPFRR techniques
(Links, Loop Free Alternatives, U-Turns, Tunnels and Directed
Tunnels) on five different networks has been presented in
[107]. It has been shown by simulations that combined Loop
Free Alternatives and U-Turns are sufficient to protect 40 to
90 % of directed link faults, whereas protection tunnels can
provide full protection.
A common limitation of most IPFRR mechanisms is
an inability to forward repaired packets around the failure point
for a wide variety of network topologies. To overcome these
limitations, a new mechanism called “IPFRR using not-via ad-
dresses” has been recently proposed in [108]. In this technique
a special address called “not-via address” is attached to each
protected component. A packet addressed to a not-via address
must be delivered to the router advertising that address and not
via the protected component. This technique requires good co-operation
among all of the routers in the network, and
it raises several security concerns (not-via addresses should not
be advertised, should not be used as the primary protection method, and
should be filtered at network entry points), [108]. A simple
example of node repair by IPFRR is given in Fig. 13.
Packets from node S have to reach node D via nodes P and B.
As soon as node S finds that node P has failed, it encapsulates
packets to reach Bp (the address of B not via P) without passing through P. The path from S to Bp
is the shortest path from S to B that does not transit P. When a
packet arrives at B, B removes the encapsulation and forwards
it to the final destination D. Other examples of using IPFRR
for link failure, multi-homed prefixes and compound failures
in Shared Risk Link Groups can be found in [108]. The use of
IPFRR for multiple simultaneous failures is an area of further
study.
Fig. 13. An example of IPFRR for failure at node P in a network fragment,
[108].
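The repair path computation in this example amounts to a shortest path search on the topology with the protected node removed. The sketch below assumes unit link costs and a hypothetical set of links among the nodes S, P, A, B, C and D, since the exact links of Fig. 13 are not reproduced here; the function name is illustrative and does not come from [108].

```python
from collections import deque


def shortest_path_avoiding(links, src, dst, avoid=None):
    """Hop-count shortest path from src to dst that never visits `avoid`."""
    adj = {}
    for u, v in links:
        if avoid in (u, v):
            continue  # drop every link attached to the protected node
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    parent = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adj.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical link set for the network fragment.
links = [("S", "P"), ("P", "B"), ("B", "D"), ("S", "C"), ("C", "A"), ("A", "B")]

normal = shortest_path_avoiding(links, "S", "D")       # path before the failure
repair = shortest_path_avoiding(links, "S", "B", "P")  # route to B not via P
onward = shortest_path_avoiding(links, "B", "D")       # after decapsulation at B
print(normal, repair + onward[1:])
```

With these assumed links, traffic that normally follows S-P-B-D is tunnelled along S-C-A-B and then forwarded from B to D.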
XI. COMPARISON BETWEEN STRUCTURED RECOVERY
TECHNIQUES
Ring based networks are easy to operate and have a very
short restoration time (within the range of 50-60 ms), but
they require at least 100 % redundancy in resources. Also,
the planning and operation of multi-ring based networks
is more complex. A comparison of the basic characteristics of the
UPSR, BLSR/4 and BLSR/2 types of Ring architecture has
been provided on p. 436 of [5]. It has been noted that the UPSR, with
single fiber, fast restoration speed and low node complexity,
does not provide spatial reuse of fiber capacity
(allowing multiple nodes to fully utilise the spare capacity of
a fiber); whereas BLSR/4 and BLSR/2 do provide the spatial
reuse of fiber bandwidth.
A. Ring and Mesh Networks
A quantitative comparison of ring and mesh based networks
has been presented in [109]. It has been reported that a
fully restorable mesh network can require roughly one third
of the total link redundancy of an all-ring design. Further, in
[110] a simulation based quantitative comparison of the end-to-end
availability of service paths in ring and mesh based
networks has been presented. It has been reported that mesh
networks have higher availability than ring networks. Hence, the
findings of this paper support the idea that high redundancy
will not necessarily increase the availability, as there seems
to be no simple link between these two attributes of networks.
Therefore, compared with Rings, Mesh based networks
have a higher potential for being used in the construction of NGNs.
B. Protection Rings and p-cycles
A comparison between the Protection Rings and p-cycles technologies
has been presented on p. 667 of [9]. It has been reported
that p-cycles have better modularity, protection yield (up to two
protection paths), flexibility, capacity redundancy and length of
protection path for straddling span failures. However, the authors
have not provided a reliability assessment of the two techniques.
Therefore, the reliability aspects of both technologies are
surveyed in the next subsection.
1) Reliability Assessment: In [111], reliability assessment
of Protection Rings and p-cycles has been investigated and
the widely accepted dominance of the former technique has
been seriously challenged. In this regard, the methodology
employed is similar to that used in [112], wherein reliability
models for a double ring network, the reliability improvement of a
double ring over a single ring and the effect of concentrator trees
in rings have been presented.
However, for simplicity, it has been assumed in [111] that
link failures are independent of each other, which is
not always true in real network scenarios. Reliability formulae
for p-cycles, the Uni-directional Path Switched Ring (UPSR) and the
4-fiber Bi-directional Line Switched Ring (BLSR) have been
derived. Further, formulae in [113] have been employed to
determine the steady state availability in each case. It has been
noted that the BLSR has the largest resiliency, while the p-cycle has the lowest.
The availability of p-cycles also reduces with an increase in the
number of straddling spans. However, it has been suggested
that p-cycles are more suitable for metropolitan area networks
(having large values of fiber availability) than for wide
area networks. It has also been stated that consideration of
node reliability will make the reliability function of p-cycles
even worse.
The reliability of p-cycles has also been analysed in [114];
expressions for mean time to failure and expected loss of
traffic have been derived. A two state Markov model has been
used to find the expression for mean time to failure as in
[113]. A quality of recovery function, involving redundancy
and availability, has also been introduced. Such functions can
be used at the planning stages of the network. It has been
concluded that p-cycles do not always have good reliability
properties (low value of mean time to failure and high value
of expected loss of traffic). If an operator decides to employ
p-cycles, then it must be ensured that a reasonable trade-
off exists between resource sharing and availability, [114].
Furthermore, based on the quality of recovery function
presented in [114], a framework for assessment of recovery
procedures has been presented in [115].
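The quantities used in such reliability assessments can be illustrated with the standard two-state (up/down) model, in which the steady state availability is MTTF / (MTTF + MTTR). The sketch below is not the set of formulae derived in [111], [113] or [114]. It assumes independent span failures, ignores contention for shared spare capacity, and uses hypothetical MTTF and MTTR values; it only shows why a longer restoration path lowers the availability of a protected span.

```python
import math


def steady_state_availability(mttf_hours, mttr_hours):
    """Two-state model: fraction of time an element is in the up state."""
    return mttf_hours / (mttf_hours + mttr_hours)


def series(*avail):
    """All elements must be up (e.g. every span of a restoration path)."""
    return math.prod(avail)


def parallel(*avail):
    """At least one alternative must be up (working span or its backup)."""
    return 1.0 - math.prod(1.0 - a for a in avail)


# Hypothetical span: fails on average every 999 h and takes 1 h to repair.
span = steady_state_availability(999.0, 1.0)

# On-cycle span of a p-cycle with N spans: the restoration path has N - 1 spans.
short_cycle = parallel(span, series(*[span] * 3))  # N = 4
long_cycle = parallel(span, series(*[span] * 7))   # N = 8
print(span, short_cycle, long_cycle)
```

Both protected values exceed the availability of the unprotected span, and the shorter cycle gives the higher value.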
C. Protection cycles and p-cycles
Comparative properties of the Protection cycles and p-cycles
technologies have been tabulated in [9], pp. 670-673.
It has been shown that p-cycles are far superior to protection
cycles in terms of capacity redundancy, application domain
and conceptual basis. Also, it has been noted in [77] that
p-cycles are a general formulation from which the results for
protection cycles can be derived as a limiting case.
D. p-cycles, Redundant Trees, RRLs and IPFRR
A comparison between RRLs and p-cycles has been presented
in [105]. It has been shown that larger p-cycles
result in prohibitively long backup paths and also
have higher state and addressing needs. It has also been found
that the length of smaller p-cycles grows logarithmically with
the number of network nodes, whereas for RRLs the growth is
sub-logarithmic. Thus, it has been suggested that creating
smaller p-cycles on existing networks will be advantageous.
Simulation results show that, compared to
p-cycles, the protection success ratio of RRLs is 25% higher
and the backup path length is 25% shorter [105].
These results suggest that RRLs are a more economical
recovery scheme than p-cycles.
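The dependence of backup path length on p-cycle size can be made concrete with a small sketch. It assumes the usual p-cycle behaviour: a failed on-cycle span is restored around the remainder of the cycle, while a failed straddling span can use either arc of the cycle. The helper `pcycle_backup_hops`, the node numbering and the assumption that cycle-adjacent nodes are joined by an on-cycle span are illustrative choices made here, not taken from [105].

```python
def pcycle_backup_hops(cycle, span):
    """Return the hop counts of the backup paths a p-cycle offers for a failed span.

    cycle: list of distinct nodes in the order the p-cycle visits them.
    span:  pair (u, v) of cycle nodes whose connecting span has failed.
    """
    n = len(cycle)
    if n < 3:
        raise ValueError("a cycle needs at least three nodes")
    u, v = span
    i, j = cycle.index(u), cycle.index(v)
    if i == j:
        raise ValueError("span endpoints must differ")
    d = abs(i - j)
    short_arc, long_arc = min(d, n - d), max(d, n - d)
    if short_arc == 1:
        # On-cycle span (assumed: adjacent cycle nodes share the on-cycle span).
        # The only backup path is the rest of the cycle.
        return [long_arc]
    # Straddling span: both arcs of the cycle can carry restored traffic.
    return [short_arc, long_arc]


for size in (4, 8, 16, 32):
    ring = list(range(size))
    on_cycle = pcycle_backup_hops(ring, (0, 1))
    straddling = pcycle_backup_hops(ring, (0, size // 2))
    print(size, on_cycle, straddling)
```

Under these assumptions the backup path for an on-cycle failure grows linearly with the cycle size (N - 1 hops for an N-node cycle), which is consistent with the observation that large p-cycles give long backup paths.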
The applicability of RRLs for handling multiple failures
of network components has been investigated in [116]. The
authors report that, compared to Redundant Trees, the fault
tolerance capability of RRLs is often more than double. They
also suggest that an in-depth comparative study of k-failure
recovery by Redundant Trees, p-cycles and RRLs is required.
Further, it can be noted that Protection cycles, p-cycles,
Redundant Trees and Resilient Routing Layers share a common
idea: each creates spanning sub-topologies
over the full topology of the network and employs these
sub-topologies in the event of a failure in the network.
Unlike the other recovery methods surveyed in this
paper, IPFRR has been specifically designed for packet
networks that employ the Internet Protocol to transfer a wide
variety of data. In the case of a router or link failure, this
technology has a very short restoration time (on the order of a
few tens of ms). IPFRR can also be readily adapted for SRLGs (see
Subsection II-B).
Finally, a comparison of different recovery techniques has
been presented in Table I. It shows that p-cycles, RRLs and
IPFRR are the most promising recovery techniques for NGNs.
XII. CONCLUDING DISCUSSION
In this paper an effort has been made to concisely present
and compare the Disjoint Paths, Protection Rings, Protection
Cycles, p-cycles, Redundant Trees and Resilient Routing Layers
techniques for recovery against faults in Next Generation
Networks, and to explain the main ideas behind these
techniques.
The use of disjoint paths forms an unstructured
method of recovery, whereas Protection Rings, Protection
Cycles, p-cycles, Redundant Trees and Resilient Routing Layers
constitute structured methods. Rings have been widely used
in high capacity networks for fast recovery, though they are
expensive. An alternative way to build networks is to adopt
mesh based techniques. p-cycles were developed to get the best
features of both Ring and Mesh based networks. They have
been very popular within the network research community as
well as for implementation in practical networks. Originally,
p-cycles were developed for optical networks, but they can
be adapted for use with IP based networks. In packet based
networks, the long backup paths provided by p-cycles are
considered to be unacceptable. When compared to Rings,
p-cycles have also been shown to be less reliable. However,
p-cycles do provide better capacity efficiency.
TABLE I
COMPARISON OF DIFFERENT PROTECTION TECHNIQUES; [9], [26], [63] AND [108]
Protection Techniques
Attributes      | Disjoint Paths | Protection Rings | Protection Cycles | p-cycles | RRL  | Redundant Trees | IPFRR
Modularity      | Nil            | Good             | Good              | Good     | Good | Good            | Good
Redundancy      | 100%           | 100%             | 100%              | <100%    | Nil  | 100%            | Nil
Node Recovery   | Nil            | Poor             | Poor              | Poor     | Good | Nil             | Good
Link Recovery   | Good           | Good             | Good              | Good     | Good | Good            | Good
Local Recovery  | Yes            | Yes              | Yes               | Yes      | Yes  | Yes             | Yes
Global Recovery | No             | No               | No                | No       | Yes  | No              | Yes
Redundant Trees were originally designed for optical net-
works, but have also been applied to MPLS networks. How-
ever, their applicability to IP based networks needs to be
investigated.
Recently, in the domain of IP based packet networks, the
efficacy of p-cycles has been strongly challenged by the
Resilient Routing Layers approach. These protection layers
can provide protection against both node and link failures, and
they can be constructed and optimised algorithmically. RRL is a
scalable approach: even large networks can be covered by 5
layers. The optimisation of RRLs, and the comparison of their
capacity efficiency with that of p-cycles, need to be
investigated further for IP based networks.
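The claim that layers can be constructed algorithmically can be illustrated with a small greedy sketch. It assumes the usual RRL notion that every layer contains all nodes, and that a node is "safe" in a layer when it retains only one link there, so that it carries no transit traffic in that layer. The functions `is_connected` and `build_layers` are hypothetical names, the example topology is invented, and the greedy strategy is an illustrative heuristic rather than the algorithm of [26] or [105].

```python
def is_connected(nodes, links):
    """True if the undirected graph (nodes, links) is connected."""
    adj = {n: set() for n in nodes}
    for link in links:
        u, v = tuple(link)
        adj[u].add(v)
        adj[v].add(u)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nxt in adj[stack.pop()]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return len(seen) == len(adj)


def build_layers(nodes, links):
    """Greedy sketch: return a list of (layer_links, safe_nodes) pairs."""
    all_links = {frozenset(link) for link in links}
    pending = sorted(nodes)          # nodes that are not yet safe in any layer
    layers = []
    while pending:
        layer, safe_here = set(all_links), []
        for n in pending:
            incident = sorted((l for l in layer if n in l), key=sorted)
            for keep in incident:
                (neighbour,) = keep - {n}
                if neighbour in safe_here:
                    continue         # a safe neighbour is a leaf; skip it
                # Drop every link of n except `keep`, then check connectivity.
                candidate = layer - (set(incident) - {keep})
                if is_connected(nodes, candidate):
                    layer = candidate
                    safe_here.append(n)
                    break
        if not safe_here:
            raise ValueError("remaining nodes cannot be made safe")
        pending = [n for n in pending if n not in safe_here]
        layers.append((layer, safe_here))
    return layers


# Invented 6-node topology: a ring with two chords.
example_nodes = list(range(6))
example_links = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0), (0, 3), (1, 4)]
for index, (layer, safe) in enumerate(build_layers(example_nodes, example_links)):
    print("layer", index, "safe nodes:", safe, "links kept:", len(layer))
```

Each layer stays connected, so any destination remains reachable in the layer chosen after a failure; the sketch makes no attempt to minimise the number of layers or to balance the backup path lengths.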
For pure IP based networks, IPFRR is a very promising
technique which can provide a short recovery time. Among the
various possible IPFRR methodologies, not-via-addresses is
the most recent one. It requires complete coordination
among all routers in a network. The issues and efficacy
of not-via-addresses for networks employing IPv6 need
to be investigated further. The time required to update
the routing tables (packet Forwarding Information Base) when
implementing not-via-addresses, and the subsequent problem of
address lookup, also need to be researched further.
XIII. ACKNOWLEDGEMENT
The authors wish to thank Tarik Cicic (tarikc@simula.no) for
his help in understanding the Java code for RRLs [105]. We are
also thankful to the anonymous reviewers for their constructive
comments.
REFERENCES
[1] Z. Rosberg, H. L. Vu, M. Zukerman, and J. White, “Performance
Analyses of Optical Burst-Switching Networks,” IEEE Journal on
Selected Areas in Communications, vol. 21, no. 7, pp. 1187–1197,
September 2003.
[2] W. Goralski, Optical Networking & WDM. McGraw-Hill, June 2001.
[3] Cisco, 2000, information available at
http://www.cisco.com/univercd/cc/td/doc/product/mels/cm1500/dwdm/
index.htm, http://www.cisco.com/univercd/cc/td/doc/product/mels/dwdm
/dwdm fns.htm.
[4] Xilinx, http://www.xilinx.com/esp/wired/optical/collateral/WDM.pdf.
[5] R. Ramaswami and K. N. Sivarajan, Optical Networks: A Practical
Perspective. Morgan Kaufmann Publishers, Inc., 1998, San Francisco,
USA.
[6] H. Zhang and O. Yang, “Finding Protection Cycles in DWDM Net-
works,” Proc. of IEEE ICC’02, vol. 5, pp. 2756–2760, April-May 2002.
[7] G. E. Keiser, FTTX Concepts and Applications. Wiley-IEEE Press,
January 2006.
[8] W. Molisz, “Survivability Function: A Measure of Disaster-Based
Routing Performance,” IEEE Journal on Selected Areas in Communications,
vol. 22, no. 9, pp. 1876–1883, November 2004.
[9] W. D. Grover, Mesh-Based Survivable Networks, Options and Strate-
gies for Optical, MPLS, SONET, and ATM Networking, 1st ed. Upper
Saddle River, NJ 07458: Prentice Hall PTR, 2004.
[10] S. Ramamurthy, L. Sahasrabuddhe, and B. Mukherjee, “Survivable
WDM Mesh Networks,” IEEE Journal of Lightwave Technology,
vol. 21, no. 4, pp. 870–883, April 2003.
[11] B. S. Davie and Y. Rekhter, MPLS: Technology and Applications.
Morgan Kaufmann Publishers, 2000, San Francisco, USA.
[12] A. Jajszczyk and P. Rozycki, “Recovery of the Control Plane after
Failures in ASON/GMPLS Networks,” IEEE Network, pp. 4–10, January-
February 2006.
[13] N. Larkin, “ASON and GMPLS-The Battle of the Optical Control
Plane, An overview of the ongoing work of IETF and ITU to stan-
dardise optical control plane protocols,” August 2004, available at:
http://www.dataconnection.com/network/download/whitepapers/
asongmpls.pdf.
[14] J. P. Lang and J. Drake, “Mesh Network Resiliency Using GMPLS,”
Proceedings of the IEEE, vol. 90, no. 9, pp. 1559–1564, September
2002.
[15] D. Xu, Y. Xiong, C. Qiao, and G. Li, “Trap Avoidance and Protection
Schemes in Networks with Shared Risk Link Groups,” Journal of
Lightwave Technology, vol. 20, no. 11, pp. 2683–2693, November
2003.
[16] A. Farrel, The Internet and Its Protocols: A Comparative Approach.
500 Sansome Street, Suite 400, San Francisco, CA 94111: Morgan
Kaufmann, 2004.
[17] J. Wu, M. Savoie, and H. Mouftah, “Recovery from control plane
failures in LDP signalling protocol,” Elsevier Optical Switching and
Networking, vol. 2, pp. 148–162, August 2005.
[18] J. Wu, M. Savoie, D. Y. Montuno, and H. T. Mouftah, “Recovery from
Control Plane Failures in the CR-LDP Signalling Protocol,” Proc. of
Design of Reliable Communication Networks, pp. 139–146, October
2003.
[19] R. Diestel, Graph Theory. Springer-Verlag,
2005, also available at http://www.math.uni-
hamburg.de/home/diestel/books/graph.theory/GraphTheoryIII.pdf,
http://www1.cs.columbia.edu/sanders/graphtheory/writings/books.html
and http://en.wikipedia.org/wiki/Graph theory.
[20] J. S. Whalen and J. Kenney, “Finding Maximal Link Disjoint Paths
in a Multigraph,” Proceedings of IEEE GLOBECOM ’90, vol. 1, pp.
470–474, December 1990.
[21] M. R. Garey and D. S. Johnson, Computers and Intractability. W. H.
Freeman and Company, 1979, USA.
[22] M. Serra and K. Kent, “Using FPGAs to solve the Hamiltonian cycle
problem,” Proceedings of International Symposium on Circuits and
Systems, ISCAS ’03, vol. 3, pp. 228–231, May 2003.
[23] K. Menger, “Zur allgemeinen Kurventheorie,” pp. 96–115, 1927.
[24] E. W. Weisstein, “Menger’s n-Arc Theorem,” available at
http://mathworld.wolfram.com/Mengersn-ArcTheorem.html.
[25] M. Stoer, Design of Survivable Networks. Springer-Verlag, 1992,
Lecture Notes in Mathematics.
[26] A. Kvalbein, A. F. Hansen, T. Cicic, S. Gjessing, and O. Lysne, “Re-
silient Routing Layers for Recovery in Packet Networks,” Proceedings
of International Conference on Dependable Systems and Networks
DSN’05, pp. 238–247, June-July 2005.
[27] A. Medina, A. Lakhina, I. Matta, and J. Byers, “BRITE: An Approach
to Universal Topology Generation,” Proceedings of 9-th International
Symposium on Modeling, Analysis and Simulation of Computer and
Telecommunication Systems, pp. 346–353, August 2001,
http://www.cs.bu.edu/brite/index.html.
[28] R. Mahajan, N. Spring, D. Wetherall, and T. Anderson,
“Rocketfuel: An ISP Topology Mapping Engine,” 2002,
http://www.cs.washington.edu/research/networking/rocketfuel/.
[29] H. Tangmunarunkit, R. Govindan, S. Jamin, S. Shenker, and W. Will-
inger, “Network Topology Generators: Degree-Based vs. Structural,”
Proc. of ACM SIGCOMM’02, pp. 147–159, August 2002.
[30] E. W. Zegura, K. L. Calvert, and M. J. Donahoo, “A Quantitative
Comparison of Graph-Based Models for Internet Topology,” IEEE/ACM
Transactions on Networking, vol. 5, no. 6, pp. 770–783, December
1997.
[31] V. Anand and C. Qiao, “Dynamic Establishment of Protection Paths in
WDM Networks Part I,” Proceedings of Ninth International Conference
on Computer Communications and Networks, ICCCN, pp. 198–204,
October 2000.
[32] D. phone K. Hsing, B.-C. Cheng, G. Goncu, and L. Kant, “A
Restoration Methodology Based on Pre-Planned Source Routing in
ATM Networks,” Proceedings of IEEE ICC, vol. 1, pp. 277–282, June
1997.
[33] D. Medhi and D. Tipper, “Multi-Layered Network Survivability
Models, Analysis, Architecture, Framework and Implementation: An
Overview,” Proceedings of DARPA Information Survivability Conference
and Exposition, DISCEX ’00, vol. 1, pp. 173–186, January 2000.
[34] S. Ramamurthy and B. Mukherjee, “Survivable WDM Mesh Networks,
Part I - Protection,” Proceedings of IEEE INFOCOM’99, vol. 2, pp.
744–751, March 1999.
[35] ——, “Survivable WDM Mesh Networks, Part II - Restoration,”
Proceedings of IEEE ICC’99, vol. 3, pp. 2023–2030, June 1999.
[36] G. Xue, W. Zhang, J. Tang, and K. Thulasiraman, “Establishment
of Survivable Connections in WDM Networks using Partial Path
Protection,” Proceedings of IEEE ICC, vol. 3, pp. 1756–1760, May
2005.
[37] H. Wang, E. Modiano, and M. Medard, “Partial Path Protection for
WDM Networks: End-to-End Recovery Using Local Failure Informa-
tion,” Proceedings of Seventh International Symposium on Computers
and Communications, ISCC, pp. 719–725, July 2002.
[38] D. Zhou and S. Subramaniam, “Survivability in Optical Networks,”
IEEE Network, vol. 14, no. 6, pp. 16–23, November/December 2000.
[39] W. D. Grover and D. Stamatelakis, “Cycle-Oriented Distributed Precon-
figuration: Ring-like Speed with Mesh-like Capacity for Self-planning
Network Restoration,” Proceedings of IEEE International Conference
on Communications, ICC 98, vol. 1, pp. 537–543, June 1998.
[40] S. Chalasani and V. Rajaravivarma, “Survivability in Optical Net-
works,” Proceedings of the 35th Southeastern Symposium on System
Theory, pp. 6–10, March 2003.
[41] O. Gerstel and R. Ramaswami, “Optical Layer Survivability: A Post-
Bubble Perspective,” IEEE Communications Magazine, pp. 51–53,
September 2003.
[42] N. Wauters, G. Ocakoglu, K. Struyve, and P. F. Fonseca, “Survivability
in a New Pan-European Carriers Carrier Network Based on WDM and
SDH Technology: Current Implementation and Future Requirements,”
IEEE Communications Magazine, vol. 37, no. 8, pp. 63–69, August
1999.
[43] C. Mauz, “Unified ILP Formulation of Protection in Mesh Networks,”
Proceedings of the 7th International Conference on Telecommunica-
tions, ConTEL 2003, vol. 2, pp. 737–741, June 2003.
[44] E. W. Dijkstra, “A Note on Two Problems in Connexion with Graphs,”
Numerische Mathematik, vol. 1, pp. 269–271, 1959.
[45] J. W. Suurballe, “Disjoint Paths in a Network,” Networks, pp. 125–145,
1974.
[46] J. W. Suurballe and R. E. Tarjan, “A quick method for finding shortest
pairs of disjoint paths,” Networks, vol. 14, pp. 325–336, 1984.
[47] R. Bhandari, “Optimal Physical Diversity Algorithms and Survivable
Networks,” Proceedings of 2nd IEEE Symposium on Computers and
Communications (ISCC ’97), pp. 433–441, July 1997.
[48] ——, Survivable Networks: Algorithms for Diverse Routing. Kluwer
Academic Publishers, 1999.
[49] S. Z. Shaikh, “Span-Disjoint Paths for Physical Diversity in Networks,”
Proceedings of IEEE Symposium on Computers and Communications,
pp. 127–133, June 1995.
[50] D. Torrieri, “Algorithms for Finding an Optimal Set of Short Disjoint
Paths in a Communication Network,” IEEE Transactions on Commu-
nications, vol. 40, no. 11, pp. 1698–1702, November 1992.
[51] J. W. Suurballe, “Disjoint Paths in a Network,” Networks, vol. 4, pp.
125–145, 1974.
[52] R. Ogier and N. Shacham, “A Distributed Algorithm for finding
Shortest Pairs of Disjoint Paths,” Proc. of IEEE INFOCOM’89, vol. 1,
pp. 173–182, April 1989.
[53] D. Sidhu, R. Nair, and S. Abdallah, “Finding Disjoint Paths in Net-
works,” ACM SIGCOMM Computer Communication Review, vol. 21,
no. 4, pp. 43–51, September 1991.
[54] D. Xu, Y. Chen, Y. Xiong, C. Qiao, and X. He, “On Finding Disjoint
Paths in Single and Dual Link Cost Networks,” Proc. IEEE Infocom’04,
vol. 1, pp. 705–715, March 2004.
[55] M. MacGregor and W. Grover, “Optimized k-shortest-paths
Algorithm for Facility Restoration,” Software-Practice and
Experience, vol. 24, no. 9, pp. 823–834, September 1994,
http://www.trlabs.ca/library/pubs/edm-E0208/E0208.pdf.
[56] P. Pan, G. Swallow, and A. Atlas, “Fast Reroute Extensions to
RSVP-TE for LSP Tunnels,” May 2005, category: Standards Track,
http://www.ietf.org/rfc/rfc4090.txt.
[57] A. Bremler-Barr, Y. Afek, H. Kaplan, E. Cohen, and M. Merritt,
“Restoration by Path Concatenation: Fast Recovery of MPLS Paths,”
Proc. ACM Symposium on Principles of Distributed Computing, pp.
43–52, 2000.
[58] D. A. Dunn, W. D. Grover, and M. H. MacGregor, “Comparison of
k-Shortest Paths and Maximum Flow Routing for Network Facility
Restoration,” IEEE Journal on Selected Areas in Communications,
vol. 12, no. 1, pp. 88–99, January 1994.
[59] E. Bouillet and J.-F. Labourdette, “Distributed Computation of Shared
Backup Path in Mesh Optical Networks Using Probabilistic Methods,”
IEEE/ACM Transactions on Networking, vol. 12, no. 5, pp. 920–930,
October 2004.
[60] A. Itai and M. Rodeh, “The Multi-Tree Approach to Reliability in
Distributed Networks,” 25th Annual Symposium on Foundations of
Computer Science, pp. 137–147, October 1984.
[61] ——, “The Multi-Tree Approach to Reliability in Distributed Net-
works,” Information and Computation, vol. 79, no. 1, pp. 43–59,
October 1988.
[62] ——, “The Multi-Tree Approach to Reliability in Distributed Net-
works,” Information and Computation, vol. 79, no. 1, pp. 43–59, 1988,
http://www.cs.technion.ac.il/itai/publications/TwoTrees/TwoTrees.pdf.
[63] M. Medard, S. G. Finn, R. A. Barry, and R. G. Gallager, “Redundant
Trees for Preplanned Recovery in Arbitrary Vertex-Redundant or Edge-
Redundant Graphs,” IEEE/ACM Transactions on Networking, vol. 7,
no. 5, pp. 641–652, October 1999.
[64] C. A. Siller Jr. and M. Shafi, Eds., SONET/SDH: A Sourcebook of
Synchronous Networking. IEEE Press, New York, 1996, IEEE
Communications Society.
[65] S. V. Kartalopoulos, Introduction to DWDM Technology: Data in a
Rainbow. Wiley-IEEE Press, December 1999.
[66] W. D. Grover and D. Stamatelakis, “Bridging the ring-mesh dichotomy
with p-cycles,” pp. 92–104, April 2000, Technical University of Munich,
Germany, http://www.trlabs.ca/library/pubs/edm-E0444/E0444.pdf.
[67] M. Medard, S. G. Finn, and R. A. Barry, “WDM Loop-back Recovery
in Mesh Networks,” Proceedings of IEEE INFOCOM’99, vol. 2, pp.
752–759, March 1999.
[68] T.-H. Wu and W. I. Way, “A Novel Passive Protected SONET
Bidirectional Self-Healing Ring Architecture,” Journal of Lightwave
Technology, vol. 10, no. 9, pp. 1314–1322, September 1992.
[69] G. D. Morley and W. D. Grover, “Current Approaches in the Design
of Ring-based Optical Networks,” Proceedings of the IEEE Canadian
Conference on Electrical and Computer Engineering, pp. 220–225,
May 2000.
[70] W. D. Grover, J. Doucette, M. Clouqueur, D. Leung, and D. Stamatelakis,
“New Options and Insights for Survivable Transport Networks,”
IEEE Communications Magazine, pp. 34–41, January 2002.
[71] J. Doucette and W. D. Grover, “Comparison of Mesh Protection and
Restoration Schemes and the Dependency on Graph Connectivity,”
Proceedings of 3rd International Workshop on the Design of Reliable
Communication Networks (DRCN 2001), pp. 121–128, October 2001,
available at http://www.trlabs.ca/library/pubs/edm-E0486/E0486.pdf.
[72] W. D. Grover and J. Doucette, “Increasing the Efficiency of Span-
restorable Mesh Networks on Low-connectivity Graphs,” Proceedings
of 3rd International Workshop on Design of Reliable Communication
Networks (DRCN 2001), pp. 99–106, October 2001, available at
http://www.trlabs.ca/library/pubs/edm-E0487/E0487.pdf.
[73] AMPL, “A Modeling Language for Mathematical Programming,” 2006,
URLs http://www.ampl.com/, http://www.ampl.com/BOOK/index.html
and http://www.ampl.com/DOWNLOADS/index.html.
[74] ILOG, “CPLEX,” 2006, URLs http://www.ilog.com/products/cplex/
and http://www.ampl.com/DOWNLOADS/cplex80.html#current.
[75] W. D. Grover, Telecommunications Network Management into the
21st Century, Techniques, Standards, Technologies and Applica-
tions. IEEE Press, ISBN 0-7803-1013-6, 1994, chapter 11,
“Distributed Restoration of the Transport Network”, Available at
http://www.trlabs.ca/library/pubs/edm-E0176/E0176.pdf (18.1 MB).
[76] G. Ellinas, A. G. Hailemariam, and T. E. Stern, “Protection Cycles in
Mesh WDM Networks,” IEEE Journal on Selected Areas in Commu-
nication, vol. 18, no. 10, pp. 1924–1937, October 2000.
[77] H. Huang and J. Copeland, “Hamiltonian Cycle Protection: A Novel
Approach to Mesh WDM Optical Network Protection,” IEEE Workshop
on High Performance Switching and Routing, pp. 31–35, May 2001.
[78] D. Stamatelakis, “Theory and Algorithms for Preconfiguration of Spare
Capacity in Mesh Restorable Networks,” Master’s thesis, Depart-
ment of Electrical and Computer Engineering, University of Alberta,
Canada, 1997, available at http://www.trlabs.ca/library/pubs/edm-
E0597T/E0597T.pdf.
[79] W. D. Grover et al., papers available at
http://www.trlabs.ca/trlabs/research/library?new url=people/edm-
grover.html and http://tomato.edm.trlabs.ca/p-cycles/papers.htm.
[80] D. Stamatelakis and W. D. Grover, “Theoretical Underpinnings for the
Efficiency of Restorable Networks Using Preconfigured Cycles (“p-
cycles”),” IEEE Transactions on Communications, vol. 48, no. 8, pp.
1262–1265, August 2000.
[81] J. Doucette, D. He, W. D. Grover, and O. Yang, “Algorithmic Ap-
proaches for Efficient Enumeration of Candidate p-Cycles and Ca-
pacitated p-Cycle Network Design,” Proc. of Fourth International
Workshop on Design of Reliable Communication Networks (DRCN),
pp. 212–220, October 2003, http://www.trlabs.ca/library/pubs/edm-
E0515/E0515.pdf.
[82] C. Liu and L. Ruan, “Finding Good Candidate Cycles for Efficient p-
Cycles Network Design,” Proceedings of International Conference on
Computer Communications and Networks, pp. 321–326, 2004.
[83] D. A. Schupke, “The Tradeoff between the Number of Deployed p-
Cycles and the Survivability to Dual Fiber Duct Failures,” Proc. IEEE
ICC’03, vol. 2, pp. 1428–1432, May 2003.
[84] J. Kang and M. J. Reed, “Bandwidth Protection in MPLS Networks
Using p-Cycle Structure,” Proc. of Workshop on Design of Reliable
Communication Networks (DRCN), pp. 356–362, October 2003.
[85] C. Liu and L. Ruan, “p-Cycle Design in Survivable WDM Networks
with Shared Risk Link Groups (SRLGs),” Proc. of 5-th International
Workshop on Design of Reliable Communication Networks, pp. 207–
212, October 2005.
[86] D. Stamatelakis and W. D. Grover, “IP Layer Restoration and Net-
work Planning Based on Virtual Protection Cycles,” IEEE Journal on
Selected Areas in Communications, vol. 18, no. 10, pp. 1938–1949,
October 2000.
[87] ——, “IP Layer Restoration and Network Planning Based on Virtual
Protection Cycles,” IEEE Journal on Selected Areas in Communica-
tions, vol. 18, no. 10, pp. 1938–1949, October 2000.
[88] A. Kvalbein, A. F. Hansen, T. Cicic, S. Gjessing, and O. Lysne, “Fast
Recovery from Link Failures using Resilient Routing Layers,” Pro-
ceedings of 10th IEEE Symposium on Computers and Communications
(ISCC) 2005, pp. 554–560, 2005.
[89] D. A. Schupke, “On Hamiltonian Cycles as Optimal p-Cycles,” IEEE
Communications Letters, vol. 9, no. 4, pp. 360–362, April 2005.
[90] A. Sack and W. D. Grover, “Hamiltonian p-Cycles for Fiber-Level Pro-
tection in Homogeneous and Semi-Homogeneous Optical Networks,”
IEEE Network, vol. 18, no. 2, pp. 49–56, March/April 2004.
[91] O. Wittner, B. E. Helvik, and V. Nicola, “Internet Failure Protection
using Hamiltonian p-Cycles found by Ant-like Agents,” Proceedings
of 5th International Workshop on Design of Reliable Communication
Networks, DRCN 2005, pp. 437–444, October 2005.
[92] O. Wittner, “Emergent behavior based implements for distributed net-
work management,” Ph.D. dissertation, Dept. of Telematics, Norwegian
University of Science and Technology, November 2003, Available at
http://www.item.ntnu.no/wittner/wittner2003ebdnm.pdf.
[93] C. G. Gruber and D. A. Schupke, “Capacity-efficient Planning of
Resilient Networks with p-Cycles,” Proceedings of 10-th International
Telecommunication Network Strategy and Planning Symposium, June
2002.
[94] C. G. Gruber, “Resilient Networks with Non-Simple p-Cycles,” 10th
International Conference on Telecommunications, ICT, vol. 2, pp.
1027–1032, 2003.
[95] F. J. Blouin, A. Sack, W. Grover, and H. Nasrallah, “Benefits
of p-Cycles in a Mixed Protection and Restoration Approach,”
Proc. of Fourth International Workshop on Design of Reliable
Communication Networks (DRCN), pp. 203–211, October 2003,
http://www.trlabs.ca/library/pubs/edm-E0514/E0514.pdf.
[96] D. A. Schupke and M. C. Scheffel, “Configuration of p-Cycles in WDM
Networks with Partial Wavelength Conversion,” Photonic Network
Communications, vol. 6, no. 3, pp. 239–252, 2003.
[97] D. A. Schupke, C. G. Gruber, and A. Autenrieth, “Optimal Con-
figuration of p-cycles in WDM Networks,” Proceedings of IEEE
International Conference on Communications, ICC, vol. 5, pp. 2761–
2765, April-May 2002.
[98] T. Li and B. Wang, “Optimal Configuration of p-Cycles in WDM
Optical Networks with Sparse Wavelength Conversion,” Proceedings of
IEEE GLOBECOM ’04, vol. 3, pp. 2024–2028, November–December
2004.
[99] S. G. Finn, M. M. Medard, and R. A. Barry, “A Novel Approach to
Automatic Protection Switching Using Trees,” Proceedings of IEEE
ICC, vol. 1, pp. 272–276, June 1997.
[100] G. Xue, L. Chen, and K. Thulasiraman, “QoS Issues in Redundant
Trees for Protection in Vertex-Redundant or Edge Redundant Graphs,”
Proceedings of IEEE International Conference on Communications,
vol. 5, pp. 2766–2770, April-May 2002.
[101] G. Xue, K. Thulasiraman, and L. Chen, “Delay Reduction in Redundant
Trees for Pre-Planned Protection against Single Link/Node Failure in
2-Connected Graphs,” Proceedings of IEEE GLOBECOM ’02, vol. 3,
pp. 2691–2695, November 2002.
[102] R. Bartos and M. Raman, “A Heuristic Approach to Service Restoration
in MPLS Networks,” Proc. IEEE ICC’01, vol. 1, pp. 117–121, June
2001.
[103] I. Theiss and O. Lysne, “FROOTS-fault handling in up/down routed
networks with multiple roots,” Proceedings of International Conference
on High Performance Computing, 2003.
[104] I. Theiss and O. Lysne, “FRoots, A Fault Tolerant and Topology
Agnostic Routing Technique,” IEEE Transactions on Parallel and
Distributed Systems, vol. 17, no. 10, pp. 1136–1150, 2006.
[105] T. Cicic, A. Kvalbein, A. F. Hansen, and S. Gjessing, “Resilient
Routing Layers and p-cycles: Tradeoffs in Network Fault Tolerance,”
Workshop on High Performance Switching and Routing (HPSR) 2005,
pp. 278–282, May 2005, code available at:
http://heim.ifi.uio.no/ tarikc/software/RRL/.
[106] A. F. Hansen, A. Kvalbein, S. Gjessing, T. Cicic, T. Jensen,
O. N. Østerbø, and O. Lysne, “Fast, Effective and Stable IP Recovery
using Resilient Routing Layers,” Proc. of the 19th International
Teletraffic Congress (ITC19), pp. 1631–1640, Aug-Sep 2005,
http://www.simula.no/portal memberdata/tarikc.
[107] P. Francois and O. Bonaventure, “An evaluation of IP-based Fast
Reroute Techniques,” Proc. of ACM CoNEXT’05, pp. 244–245, October
2005.
[108] S. Bryant, M. Shand, and S. Previdi, “IP Fast Reroute Using Not-via
Addresses,” Network Working Group, Internet Draft, December 2006,
expiration date: June 2007, http://www.ietf.org/internet-drafts/draft-
ietf-rtgwg-ipfrr-notvia-addresses-00.txt.
[109] W. D. Grover, “Case Studies of Survivable Ring, Mesh and Mesh-
Arc Hybrid Networks,” Proceedings of IEEE GLOBECOM ’92, vol. 1,
pp. 633–638, December 1992, http://www.trlabs.ca/library/pubs/edm-
E0133/E0133.pdf.
[110] M. Clouqueur and W. D. Grover, “Quantitative Comparison of
End-to-End Availability of Service Paths in Ring and Mesh-
restorable Networks,” Proc. of National Fiber Optic Engineers Con-
ference (NFOEC ’03), pp. 1–10, September 2003, available at
http://www.trlabs.ca/library/pubs/edm-E0511/E0511.pdf.
[111] P. Cholda and A. Jajszczyk, “Reliability Assessment of
Rings and p-Cycles in DWDM Networks,” Proceedings
of 1st Conference on Next Generation Internet Networks
Traffic Engineering (NGI2005), April 2005, [Online],
http://www.kt.agh.edu.pl/cholda/Papers/NGI2005.pdf.
[112] D. Logothetis and K. S. Trivedi, “Reliability Analysis of the Double
Counter-Rotating Ring with Concentrator Attachments,” IEEE/ACM
Transactions on Networking, vol. 2, no. 5, pp. 520–532, October 1994.
[113] M. L. Shooman, Reliability of Computer Systems and Networks. John
Wiley & Sons, Inc., 2002.
[114] P. Cholda and A. Jajszczyk, “Reliability Assessment of p-cycles,”
Proceedings of IEEE Globecom’05, vol. 1, pp. 63–67, Nov-Dec 2005.
[115] P. Cholda, A. Jajszczyk, and K. Wajda, “A Unified Frame-
work for the Assessment of Recovery Procedures,” Proceedings
of IEEE Workshop on High Performance Switching and Rout-
ing (HPSR 2005), pp. 269–273, May 2005, [Online]. Available at
http://www.kt.agh.edu.pl/cholda/Papers/HPSR2005.pdf.
[116] T. Cicic, A. F. Hansen, S. Gjessing, and O. Lysne, “Applicability of
Resilient Routing Layers for k-Fault Network Recovery,” Proceedings
of International Conference on Networking, (ICN), pp. 173–183, April
2005.
16
... In addition, it should be noted that FRR-related technological routing solutions may support the following redundancy schemes depending on resilience requirements [5] and [13]: ...
Article
Full-text available
The paper proposes a Fast ReRoute model with the realization of path and bandwidth protection scheme, which can be used in MPLS for SDN. The task of calculating the set of disjoint primary and backup paths during fast rerouting was reduced to solving the optimization problem of Integer Linear Programming. The advantage of the proposed solution is the possibility of implementing 1:1, 1:2, …, 1:n path protection schemes without the introduction of an additional set of control (routing) variables, which contributes to reducing the dimension of the optimization problem being solved and the computational complexity of its practical implementation. The criterion of optimality of routing solutions contributes to the formation of the primary and backup disjoint paths with the highest possible bandwidth. In this case, the path with the maximum bandwidth will correspond to the primary path, while the remaining paths will be used as backup ones in decreasing order of their bandwidth. The total number of calculated disjoint paths depends on the selected redundancy scheme. The study conducted showed the efficiency and adequacy of the proposed Fast ReRoute model when using various redundancy schemes.
... Upon failures, the affected VN traffic can be rerouted to alternate paths following any predefined policy, e.g., customer priority. This survivability model also allows to delegate failure handling responsibility to an SP, which can then use a Software Defined Network (SDN) controller to employ their own network design and restoration techniques instead of simply relying on the InP [9]- [11]. ...
Article
Full-text available
This paper addresses Connectivity-aware Virtual Network Embedding (CoViNE) problem, which consists in embedding a virtual network (VN) on a substrate network while ensuring VN connectivity (without any bandwidth guarantee) against multiple substrate link failures. CoViNE provides a weaker form of survivability incurring less resource overhead than traditional VN survivability models. To optimally solve CoViNE, we present an Integer Linear Program (ILP), namely CoViNE-opt. CoViNE-opt enumerates an exponential number of edge-cuts in a VN severely limiting its scalability. Therefore, we decompose CoViNE into three sub-problems: i) augmenting a VN with virtual links to provide necessary connectivity, ii) identifying the virtual links that should be embedded disjointly, and iii) computing a VN embedding while satisfying the disjointness constraints. We introduce conflicting set abstraction that allows to address sub-problems (i) and (ii) without enumerating all the edge-cuts of a VN. We propose two novel solutions to CoViNE leveraging conflicting set, namely CoViNE-ILP and CoViNE-fast. CoViNE-ILP uses a heuristic algorithm to address sub-problems (i) and (ii), while an ILP is used for sub-problem (iii). In contrast, CoViNE-fast uses heuristics for solving all three sub-problems. Through simulation, we evaluate the optimality and scalability of our solutions and demonstrate a failure restoration use-case enabled by CoViNE.
... Lightpath protection and restoration in the traditional optical networks is a well-researched topic covered by theoretical, simulation and experimental studies [1], [5]- [6]. It is however necessary to investigate restoration benefits that are offered by SDN architectures. ...
Article
The paper represents a method for ensuring the resilience of a corporate computer network under the influence of various types of threats. This article will provide an overview of the aspects of resilience and existing approaches to ensuring resilient routing. This article is the result of many studies and experiments, and evaluating the final result, it can be noted that this method can successfully reflect the possible importance of a node when it comes to epidemic dynamics for various network models to ensure network resilience. A possible way to solve the problem was to use the theory of linear stationary systems and the phenomenon of propagation in networks as the basis of the method. Complex interdependencies between their elements characterize various systems. The method of synthesizing hardware and software means of ensuring the stability of a corporate computer network consists of such steps as representing networks as a linear stationary system, modelling the stability of a computer network in the context of epidemics by using virtual network expansion, studying the stability of a computer network in the context of uncertain data transmission and virtual network expansion, processing input data received from the modelled computer network, etc. To solve the problem, the method involves the theory of linear stationary systems and the use of the NiR metric, which can successfully reflect the possible importance of a node when it comes to the dynamics of an epidemic for various network models to ensure network resilience. The method is tested by simulations, the results of which show a high correlation with the actual propagation dynamics modeled by SI and SIR processes. NiR also shows a small variance, which means it is reliable for different computer network topologies. The method also involves finding the most critical nodes in a computer network, for which a cascading failure model was used, which models overloaded nodes as non-functional.
Chapter
Survivable routing is one of the most important issues in optical backbone design. Because of the high data rates enabled by wavelength division multiplexing technology, any interruption in service results in the loss of a large amount of application data. Calculating and signaling the protection resources only after a failure has occurred would therefore lead to an unacceptably high delay. The main purpose of this chapter is to discuss the principles of pre-planned protection approaches in mesh optical backbone networks. The Shared Risk Link Group (SRLG) concept is introduced to model physical and geographical dependency among seemingly unrelated link failures. Finally, methods are presented for calculating the exact end-to-end availability of a connection.
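As a rough illustration of end-to-end availability, the sketch below computes the availability of a connection protected by one pre-planned backup path. It is not the chapter's method: it assumes that link failures are independent and that the working and backup paths share no links, which is exactly the assumption that the SRLG concept calls into question. The function names and the sample availabilities are hypothetical.

```python
from math import prod


def path_availability(link_availabilities):
    """Availability of a path: every link on it must be up.

    Assumes independent link failures (an assumption of this sketch).
    """
    return prod(link_availabilities)


def protected_availability(working_links, backup_links):
    """Availability of a connection with one working and one backup path.

    Assumes the two paths are link-disjoint and fail independently,
    so the connection is down only when both paths are down.
    """
    a_working = path_availability(working_links)
    a_backup = path_availability(backup_links)
    return 1.0 - (1.0 - a_working) * (1.0 - a_backup)


# Hypothetical example: a two-link working path and a one-link backup path.
availability = protected_availability([0.99, 0.99], [0.98])
print(f"{availability:.6f}")
```

When two links belong to the same SRLG, their failures are correlated and the product form above overestimates availability, which is why an exact calculation needs to account for the shared risk.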
Article
This paper proposes an algorithm for disaster avoidance control against heavy rainfall. According to weather information, the algorithm reconfigures a logical network (slice), including the migration of virtual machines (VMs), to avoid disasters. It was applied to a nationwide network of 105 nodes and 140 edges, including cases with more than 10,000 slices. Through numerical simulations using actual data of rainfall that caused significant damage in Japan, we found that the probability of service disruption under the proposed control with suitable parameter settings is 10%-30% of that without control, on average. Our experimental system for the proposed control is implemented using software-defined network technology. It can migrate VMs and estimate VM migration time to determine how many VMs should be migrated. By using the experimental system, we found that the control interval has an optimal value, which depends on the processing capacity of the management system.
Chapter
An MPLS Traffic Engineering solution for multipath Fast ReRoute with local and bandwidth protection is proposed. The novelty of the solution lies in the fact that the optimization problem of load balancing during fast rerouting is presented in linear form while providing bandwidth protection for the communication links. In practice, this reduces the computational complexity of determining the routing variables responsible for forming the primary and backup paths, and it provides a balanced load of communication links that meets the requirements of the Traffic Engineering concept. The model implements local (link, node) and bandwidth protection schemes for fast rerouting with load balancing in telecommunication networks. Analysis of the proposed model has confirmed its adequacy and efficiency in obtaining optimal solutions that ensure a balanced load of network communication links and implement the necessary protection schemes for network elements (link, node, and bandwidth).
Article
This chapter focuses on the important practical and theoretical problem of designing survivable communication networks, i.e., communication networks that are still functional after the failure of certain network components. We motivate this topic in Section 2 by using the example of fiber optic communication network design for telephone companies. A very general model (for undirected networks) is presented in Section 3 which includes practical, as well as theoretical problems, including the well-studied minimum spanning tree, Steiner tree, and minimum cost k-connected network design problems. The development of this area starts with outlining structural properties in Section 4 which are useful for the design and analysis of algorithms for designing survivable networks. These lead to worst-case upper and lower bounds. Heuristics that work well in practice are also described in Section 4. Polynomially-solvable special cases of the general survivable network design problem are summarized in Section 5. Section 6 contains polyhedral results from the study of these problems as integer programming models. We present large classes of valid and often facet-defining inequalities. We also summarize the complexity of the separation problem for these inequalities. Finally, we provide complete and nonredundant descriptions of a number of polytopes related to network survivability problems of small dimensions. Section 7 contains computational results using cutting plane approaches based on the polyhedral results of Section 6 and the heuristics described in Section 4. The results show that these methods are efficient and effective in producing optimal or near-optimal solutions to real-world problems. A brief review of the work on survivability models of directed networks is given in Section 8. We also show here how directed models can help to solve undirected cases.
Book
Companies and research labs worldwide are racing to develop Dense Wavelength Division Multiplexing (DWDM) technology, a far-reaching advancement in the fiber optical communications field. To help you keep pace with these latest developments, this all-in-one resource brings you a clear, concise overview of the technology that is transporting and processing vast amounts of information at the speed of light. Until now, no book offered a practical introduction to DWDM advances. INTRODUCTION TO DWDM TECHNOLOGY will help you learn all the essentials for this emerging field:
* Principles of physics underlying optical devices
* Optical components needed to design optical and DWDM systems
* Coding and decoding techniques used in optical communications
* Overview of DWDM systems
* State-of-the-art research trends
Complete with four-color illustrations to show how devices work, this comprehensive book provides an invaluable discussion of DWDM basics necessary for practicing electrical engineers, optical systems designers, technical managers, and undergraduate students in optical communications.
Article
This book presents fundamental passive optical network (PON) concepts, providing you with the tools needed to understand, design, and build these new access networks. The logical sequence of topics begins with the underlying principles and components of optical fiber communication technologies used in access networks. Next, the book progresses from descriptions of PON and fiber-to-the-X (FTTX) alternatives to their application to fiber-to-the-premises (FTTP) networks and, lastly, to essential measurement and testing procedures for network installation and maintenance. An Instructor's Manual presenting detailed solutions to all the problems in the book is available from the Wiley editorial department.
Article
Using Hamiltonian p-cycles, it can be shown that p-cycle design is able to reach the logical redundancy bound of 1/(d̄-1) where d̄ is the average node degree. We formulate two conditions on which the design is able to reach this bound if and only if Hamiltonian p-cycles are used.
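To make the bound concrete, the following minimal sketch evaluates 1/(d̄-1) for a small topology. It assumes an undirected simple graph given as a node count and an edge list; the function names and the four-node example network are hypothetical and are not taken from the cited article.

```python
def average_node_degree(num_nodes, edges):
    """Average node degree of an undirected graph: each edge adds 2 to the degree sum."""
    if num_nodes <= 0:
        raise ValueError("num_nodes must be positive")
    return 2 * len(edges) / num_nodes


def redundancy_bound(num_nodes, edges):
    """Logical redundancy bound 1/(d_avg - 1) stated in the abstract above."""
    d_avg = average_node_degree(num_nodes, edges)
    if d_avg <= 1:
        raise ValueError("bound is only defined for average degree greater than 1")
    return 1.0 / (d_avg - 1.0)


# Hypothetical example: a fully meshed four-node network (6 spans, average degree 3).
full_mesh = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(redundancy_bound(4, full_mesh))  # 0.5
```

For this example the average degree is 3, so the bound is 1/2: under the bound, spare capacity equal to half the working capacity is the least that any design could need on this topology.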
Article
The p-cycle concept offers a capacity-efficient and rapid protection mechanism for mesh-restorable networks. This work investigates the configuration of span protecting p-cycles in wavelength division multiplexing (WDM) networks with limited wavelength conversion. An important point of view is the relation between the costs associated with the number of required wavelength converters, and the protection capacity-efficiency achieved. We formulate mathematical models and solve the respective optimization problems for a pan-European network as a test-case. An interesting finding is that the total number of converters required for the network as a whole can be greatly reduced, with only a small increase in spare capacity for protection by a strategy of associating wavelength converters with the access points between a pure wavelength path (WP) working layer and a set of pure WP p-cycle protection structures.
Article
Publisher Summary: This chapter focuses on the important practical and theoretical problem of designing survivable communication networks, i.e., communication networks that are still functional after the failure of certain network components. A very general model (for undirected networks) is presented which includes practical, as well as theoretical, problems, including the well-studied minimum spanning tree, Steiner tree, and minimum cost k-connected network design problems. The development of this area starts with outlining structural properties which are useful for the design and analysis of algorithms for designing survivable networks. These lead to worst-case upper and lower bounds. Heuristics that work well in practice are also described. Polynomially-solvable special cases of the general survivable network design problem are summarized. The chapter discusses polyhedral results from the study of these problems as integer programming models. The chapter provides complete and nonredundant descriptions of a number of polytopes related to network survivability problems of small dimensions. The computational results using cutting plane approaches based on the polyhedral results are given. The results show that these methods are efficient and effective in producing optimal or near-optimal solutions to real-world problems. A brief review of the work on survivability models of directed networks is given.