IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 62, NO. 1, JANUARY 2014 235
Joint Design on DCN Placement and
Survivable Cloud Service Provision over
All-Optical Mesh Networks
Jie Xiao, Hong Wen, Bin Wu, Xiaohong Jiang, Pin-Han Ho, and Lei Zhang
Abstract—Cloud services based on data center networks
(DCNs) require a transmission infrastructure with high-capacity,
low-latency, low-cost and high-availability, which can be offered
by survivable optical networks. DCN placement is a fundamental
issue in supporting cloud services in optical networks. It concerns
not only the cost of providing cloud services, but also the service
availability against failures via proper service replicas. In this
paper, we jointly optimize DCN placement with service routing
and protection to minimize the network cost, while ensuring fast
protection of all services against any single link failure or service
failure at a particular DCN. An ILP (Integer Linear Program) is
first formulated to achieve optimal joint design. It integrates p-
cycle (preconfigured protection cycle) for fast protection against
a single link failure, and DCN replicas and fast service rerouting
against a service failure. To make the design more scalable, a two-
step heuristic is then proposed for large-size network scenarios.
The first step separately solves the DCN placement and service
routing problem in the failure-free scenario, and the second step
takes fast service protection into account. The proposed design
is validated by extensive numerical experiments.
Index Terms—Cloud services, data center networks (DCNs),
optical networks, routing, survivability.
I. INTRODUCTION
THE rapid growth of broadband communications has led to many new web applications such as online interactive maps, social networks, video streaming, cloud computing and CDN (Content Distribution Network) services. Most of those applications are provided by data center networks (DCNs) [1-2]. A DCN is a warehouse-scale and massively parallel computing and storage resource. It generally consists of thousands of clustered servers. DCN based applications are reshaping the network landscape, by pushing the traditional hierarchical and connectivity-oriented Internet towards a more flat and service-oriented infrastructure [3]. Accordingly, networks provide more direct connections from content/service providers to customers, with services delivered through a network of DCNs (referred to as cloud [4-6]).
Manuscript received March 31, 2013; revised August 20 and November 9, 2013. The editor coordinating the review of this paper and approving it for publication was C. Assi.
J. Xiao is with the School of Computer Science and Technology, Tianjin University, Tianjin, 300072, P. R. China, and with the National Key Lab on Communications, University of Electronic Science and Technology of China, Chengdu, 611731, P. R. China (e-mail: jiexiao001@gmail.com).
H. Wen is with the National Key Laboratory on Communications, University of Electronic Science and Technology of China, Chengdu, Sichuan, 611731, P. R. China (e-mail: sunlike@uestc.edu.cn).
B. Wu and L. Zhang, corresponding author, are with the School of Computer Science and Technology, Tianjin University, Tianjin, 300072, P. R. China (e-mail: binwu.tju@gmail.com, lzhang@tju.edu.cn).
X. Jiang is with the School of Systems Information Science, Future University Hakodate, Hakodate, Japan (e-mail: jiang@fun.ac.jp).
P.-H. Ho is with the ECE Department, University of Waterloo, Waterloo, ON, N2L 3G1, Canada (e-mail: p4ho@uwaterloo.ca).
Digital Object Identifier 10.1109/TCOMM.2013.121313.130240
As cloud services grow rapidly, network-planning is neces-
sary to solve both the service deployment and the reliability
issues. Cloud services can be supported by the anycast service
mode [7-8], where heterogeneous contents or services are
replicated across multiple DCNs located at different nodes,
and a user demand can be served by any DCN that supports
the desired service. Such a distributed service mode brings two
benefits: 1) a demand can be served by a nearby DCN rather
than a remote centralized one, thereby reducing the service
transmission cost and latency; and 2) it improves service
availability, as the demands can still be served by other DCN
replicas upon a specific service failure at a particular DCN.
In fact, a failed service can be simultaneously protected by
multiple nearby replicas, as long as the sum capacity of those
replicas can satisfy the original demand.
Under the anycast mode, DCN placement is important in
balancing between network cost and service latency, as well
as ensuring cloud service availability. On one hand, cloud
service providers wish to direct user demands to nearby DCNs
with the smallest latencies and the minimum transmission
costs. This can be easily achieved by densely distributing the
DCNs in the area of interest. On the other hand, deploying a
DCN incurs not only a basic investment (e.g. warehouse rent
and power supply, etc.), but also the service capacity related
cost (e.g. servers and switches). This requires the number of
DCNs to be minimized. The tradeoff between network cost
and service latency makes DCN placement a fundamental
optimization problem.
In addition to the above tradeoff, service availability is an-
other important issue. It is desired that fast service protection
can be achieved against a component or service failure. In
terms of providing cloud services, anycast mode enhances
the service availability. Since cloud services are generally
deployed over a nationwide area, a wavelength division
multiplexing (WDM) optical network [9-10] is ideal for
service transmission. Although survivability has been extensively
studied in traditional optical networks [11-20], anycast mode
is taken into account only in a few recent works [8] to provide
highly available cloud services.
Given a backbone network topology with a certain amount
of service demands at each node, the DCN placement problem
under the anycast mode should be addressed in three aspects:
1) how many DCNs should be placed; 2) where should the
DCNs be placed; and 3) what are the service and replica
capacities of the DCNs for satisfying all demands and ensuring
service availability against failures. Those problems should be
solved under the objective of minimizing the total network
cost, which includes not only the DCN construction and
the service transmission costs, but also the cost for service
protection. In particular, service transmission cost in optical
networks can be defined as distance related, and thus is also
related to the average service latency. By manipulating among
different cost metrics, the tradeoff between network cost and
service latency can be balanced as well.
In this paper, we study an integrated joint design under
the anycast mode, which takes both DCN placement and fast
cloud service protection into account. An ILP (Integer Linear
Program) is first formulated to achieve optimal joint design. In
terms of service protection, we consider any single link failure
and service failure at a particular DCN. To make the design
more scalable to large-size networks, a two-step heuristic is
proposed, which separately solves the DCN placement and
service routing problem before service protection is consid-
ered.
Compared with the classic node placement problem [21-
23], DCN placement in our proposed joint design is much
more complex. It differs from the former in three key aspects:
1) the classic node placement generally places nodes with
exactly the same configuration [21-23], whereas the service
and replica capacities of the DCNs are to be optimized and
thus different in our work; 2) cloud service brings about the
anycast mode which is not considered in the classic node
placement; and 3) in addition to service and replica capacity
optimization, DCN placement in the joint design is also
coupled with service routing and fast protection. As far as we
know, the proposed joint design provides the first complete and
integrated network-planning solution under the anycast mode
for reliable cloud services, where the classic node placement
can be taken as a basic building brick overlaid by multiple
interplaying factors.
The rest of the paper is organized as follows. We describe
the system model and the problem in Section II, and formulate
the optimal ILP for joint design in Section III. Section IV
presents the heuristic. Section V gives numerical results and
we conclude the paper in Section VI.
II. SYSTEM MODEL AND PROBLEM DESCRIPTION
A. System Model
Consider a WDM optical backbone network with topology $G(V,E)$, where $V$ is the set of all nodes (vertices) and $E$ is the set of all fiber links (edges). DCNs can be placed at a selected subset of network nodes. Assume that a sufficient number of wavelengths can be multiplexed onto each link for high-speed optical transmissions. A set $S=\{s_1, s_2, \ldots, s_{|S|}\}$ denotes the $|S|$ service types provided in the network. Every node has a specific amount of demands on each service type, which is counted in full wavelength granularity. The services in $S$ are hosted at the DCNs and are delivered to the destination nodes through all-optical service paths, which can be achieved by using OXCs (optical cross-connects) at the intermediate nodes for transparent optical connections.
Network cost consists of the DCN construction cost, the service transmission cost and the service protection cost. DCN construction cost is the sum of the costs of all individual DCNs deployed in the network. The cost of a particular DCN includes a basic investment and a capacity related cost measured by $P_s$ per unit of service $s$. Service transmission cost accounts for the cost of all working wavelength capacity for establishing the service paths to satisfy the network-wide demands. Service protection cost accounts for the cost of all spare wavelength capacity reserved against failures, where at most a single service or link failure is assumed in the network. Note that the cost of DCN service replicas is counted in the DCN construction cost rather than in the service protection cost.
In the failure-free scenario, service demands at each node
are served by nearby DCNs to reduce the service transmission
cost. Upon a service failure, the disrupted services must be
recovered using the service replicas hosted by other DCNs at
different nodes (i.e., the anycast service mode), whereas all
unaffected services keep their original paths without the need
of rerouting. Define protection segments as the preconfigured
spare wavelengths between a replicated service and the failed
one. To achieve fast service recovery against a service failure,
the rerouted service paths are set up by connecting the precon-
figured protection segments to the existing original paths of
the disrupted service. As such, only a single optical switching
operation is needed at the node with the failed service, and
reconfigurations at other intermediate nodes of the path can
be avoided for fast protection.
On the other hand, the classic p-cycle [14, 20] is adopted for fast service protection against any single link failure. A p-cycle is a preconfigured ring-like structure consisting of a spare wavelength on each on-cycle link. If both end nodes of a link are on a p-cycle but the link is not an on-cycle link, it is defined as a straddling link. A p-cycle can protect one unit of traffic on each on-cycle link, and two units on each straddling link. Note that pre-cross-connection of spare capacity is the key to achieve fast protection, and p-cycle is not the only choice. Instead, our work can be tailored to adopt other fast protection schemes such as CFP [15] and pre-cross-connected trail (p-trail) [24-25].
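The on-cycle and straddling rules can be illustrated with a short sketch. It assumes one unit-capacity p-cycle given as an ordered node list; the function name `pcycle_protection` and the toy topology are hypothetical and not taken from the paper.

```python
def pcycle_protection(cycle, links):
    """Return {link: protected units} for one unit-capacity p-cycle.

    cycle: list of nodes in ring order, e.g. [1, 2, 3, 4] (4 joins back to 1).
    links: iterable of (u, v) network links, treated as undirected.
    """
    on_cycle_nodes = set(cycle)
    # Consecutive ring nodes (including the closing hop) form the on-cycle links.
    on_cycle_links = {
        frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
        for i in range(len(cycle))
    }
    protection = {}
    for u, v in links:
        if frozenset((u, v)) in on_cycle_links:
            protection[(u, v)] = 1   # on-cycle link: one unit
        elif u in on_cycle_nodes and v in on_cycle_nodes:
            protection[(u, v)] = 2   # straddling link: two units
        else:
            protection[(u, v)] = 0   # not covered by this p-cycle
    return protection


# Toy example: a 4-node ring, one chord (1, 3) and one link leaving the ring.
links = [(1, 2), (2, 3), (3, 4), (4, 1), (1, 3), (4, 5)]
print(pcycle_protection([1, 2, 3, 4], links))
```

In this toy example the chord (1, 3) is a straddling link and receives two units of protection, while link (4, 5) has an end node off the cycle and receives none.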
We also assume that service and link failures will not occur
at the same time, though simultaneous service and link failures
can be allowed by slightly modifying our proposed algorithms
as discussed later.
B. Problem Description
Our objective is to minimize the total network cost as
detailed in Section II.A, subject to all service demands being
satisfied with fast protection against any single link failure
(p-cycle based fast protection) or service failure.
As introduced in Section I, the above problem entails a
complex joint optimization on DCN placement, service routing
and fast protection. In our work, DCN placement differs
from the classic node placement problem [21-23] due to
the (service and replica) capacity optimization of the DCNs
and the anycast mode of cloud services. The problem is
further complicated by the joint optimization on routing and
protection.
Notably, the anycast mode is the pivot that couples all
those interplaying factors into an integrated joint design. Each
service at a DCN is subject to a possible service failure.
In the anycast mode, a failed service can be protected by
multiple nearby DCNs via service replicas, as long as the
sum capacity of those replicas can satisfy the original demand.
Therefore, DCN placement concerns not only the locations to
place the service and replica capacities, but also the amount
and the protection relationship among them. In other words,
DCN placement is constrained by all possible service failure
scenarios and the protection scheme under the anycast mode,
as well as the routing scheme in both failure-free and failure
scenarios. This is much more complex than the traditional
node placement problem [21-23]. The basic investment
for constructing a DCN also favors a smaller number of DCNs and
promotes the collocation of service and replica capacities at
the same DCN. Besides, DCN placement and the routing scheme
determine the traffic load on each link, which interplays with
p-cycle placement against a possible link failure. In our work,
all the above factors are jointly considered under the anycast
mode to generate a cost efficient solution for highly available
cloud services.
III. ILP FOR OPTIMAL JOINT DESIGN
We define the notations used in the ILP in Section III.A, and
formulate the ILP in Section III.B. Our ILP imports a modular
Cycle Exclusion technique from [14] for optimal p-cycle
design without candidate cycle enumeration. It involves some
notions including cycle set, root node and voltage [14], which
will be further explained later in Fig. 1 and the corresponding
context.
A. Notation List
$V$: The set of all nodes in the network.
$E$: The set of all bidirectional links in the network.
$S$: The set of all service types in the network.
$J$: Predefined constant. It is the maximum allowed number of p-cycles for link failure protection. Cycles are indexed by $j \in \{1, 2, \ldots, J\}$.
$B_u$: Predefined constant. It is the basic investment cost for constructing a DCN at a particular node $u$. Note that $B_u$ can take different values at different nodes.
$P_s$: Predefined constant. It is the cost for a unit amount of service $s \in S$ provided by a DCN server.
$C_{uv}$: Predefined constant. It is the cost of a wavelength on link $(u,v)$. We assume $C_{uv} = C_{vu}$ for bidirectional links and it can be either distance-related or hop-count based.
$\beta$: Predefined constant greater than $|S| \times |V| \max\{d_u^s \mid u \in V, s \in S\}$.
$d_u^s$: Predefined constant. It is the amount of demands on service type $s$ at node $u$.
$D_u$: Binary variable. It takes 1 if a DCN is placed at node $u$ and 0 otherwise.
$W_{uv}$: Non-negative integer variable. It is the total number of working wavelengths on link $(u,v)$ for service transmissions plus spare wavelengths for service rerouting against a service failure. Note that the spare wavelengths for p-cycles are not included in $W_{uv}$.
$c_u^s$: Non-negative integer variable. It is the sum of working capacity (when no service fails) and replicated spare capacity of service $s$ at a DCN server placed at node $u$.
$t_u^s$: Non-negative integer variable. It is the working capacity of service $s$ provided by a DCN at node $u$ when no DCN service fails.
$t_{u|n}^s$: Non-negative integer variable. It is the capacity replica of service $s$ provided by a DCN at node $u$ if a DCN at node $n$ fails to provide the same type of service.
$t_{uv}^s$: Non-negative integer variable. It is the wavelength capacity on link $(u,v)$ for transmitting service $s$ from node $u$ to node $v$ when no DCN service fails.
$t_{uv|n}^s$: Non-negative integer variable. It is the spare wavelength capacity of service $s$ on link $(u,v)$ from node $u$ to node $v$ if service $s$ fails at node $n$.
$\alpha$: Predefined fractional constant where $1/|V| \ge \alpha > 0$.
$\theta_{uv}^j$: Binary variable. It takes 1 if link $(u,v)$ is traversed by cycle set $j$ from node $u$ to node $v$, and 0 otherwise.
$z_u^j$: Binary variable. It takes 1 if node $u$ is on cycle set $j$, and 0 otherwise.
$r_u^j$: Binary variable. It takes 1 if node $u$ is a root node on cycle set $j$, and 0 otherwise.
$p_u^j$: Fractional variable. It is the voltage value of node $u$ when constructing cycle set $j$.
$x_{uv}^j$: Binary variable. It takes 1 if link $(u,v)$ can be protected by cycle set $j$, and 0 otherwise.
B. ILP Formulation
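The ILP in (1)-(15) carries out the joint design of DCN placement, service routing and protection (against a link or service failure) to minimize the total network cost, with the set of parameters $\{G(V,E), S, B, P_s, C_{uv}, d_u^s, \beta, J\}$ as the input.
$$\text{minimize} \quad \sum_{(u,v)\in E} C_{uv} W_{uv} + \sum_{j} \sum_{(u,v)\in E} C_{uv}\left(\theta_{uv}^j + \theta_{vu}^j\right) + \sum_{u\in V}\Big(B_u D_u + \sum_{s\in S} P_s c_u^s\Big) \tag{1}$$
Subject to
$$D_u \ge \frac{1}{\beta} \sum_{s\in S} c_u^s, \quad \forall u\in V; \tag{2}$$
$$t_u^s + \sum_{(u,v)\in E} \left(t_{vu}^s - t_{uv}^s\right) = d_u^s, \quad \forall u\in V, \forall s\in S; \tag{3}$$
$$t_{n|n}^s = 0, \quad \forall n\in V, \forall s\in S; \tag{4}$$
$$\sum_{(n,v)\in E} \left(t_{vn|n}^s - t_{nv|n}^s\right) = t_n^s, \quad \forall n\in V, \forall s\in S; \tag{5}$$
$$t_{u|n}^s = \sum_{(u,v)\in E} \left(t_{uv|n}^s - t_{vu|n}^s\right), \quad \forall u,n\in V: u\neq n, \forall s\in S; \tag{6}$$
$$c_u^s \ge t_u^s + t_{u|n}^s, \quad \forall u,n\in V, \forall s\in S; \tag{7}$$
$$W_{uv} \ge \left(t_{uv|n}^s + t_{vu|n}^s\right) + \sum_{s\in S} \left(t_{uv}^s + t_{vu}^s\right), \quad \forall n\in V, \forall s\in S, \forall (u,v)\in E; \tag{8}$$
$$\theta_{uv}^j + \theta_{vu}^j \le 1, \quad \forall (u,v)\in E, \forall j; \tag{9}$$
$$\sum_{(u,v)\in E} \left(\theta_{uv}^j + \theta_{vu}^j\right) = 2 z_u^j, \quad \forall u\in V, \forall j; \tag{10}$$
$$\sum_{u\in V} r_u^j \le 1, \quad \forall j; \tag{11}$$
$$\sum_{(u,v)\in E} \theta_{uv}^j \le 1 + r_u^j, \quad \forall u\in V, \forall j; \tag{12}$$
$$p_v^j - p_u^j \ge \alpha\,\theta_{uv}^j - \left(1 - \theta_{uv}^j\right), \quad \forall (u,v)\in E, \forall j; \tag{13}$$
$$x_{uv}^j \le \frac{1}{2}\left(z_u^j + z_v^j\right), \quad \forall (u,v)\in E, \forall j; \tag{14}$$
$$\sum_{j} \left(2x_{uv}^j - \theta_{uv}^j - \theta_{vu}^j\right) \ge \sum_{s\in S} \left(t_{uv}^s + t_{vu}^s\right), \quad \forall (u,v)\in E. \tag{15}$$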
Objective (1) minimizes the network cost, which consists of three terms. The first term accounts for the capacity of working wavelengths (i.e., service transmission cost) plus the spare wavelength capacity for service rerouting against a service failure. The second term is the spare wavelength capacity dedicated to p-cycle protection against a link failure. The third term accounts for the DCN construction cost, which includes the basic investment and the capacity related costs of all DCNs.
Constraints (2)-(8) formulate DCN placement and service protection against a service failure. In particular, constraint (2) means that if any type of DCN service is provided at a particular node, then we must place a DCN at this node. Constraint (3) formulates the flow conservation property of an arbitrary service at each node in the failure-free scenario. It also ensures that all demands in the network can be served. Constraints (4)-(8) assume that an arbitrary service $s$ fails at node $n$. In particular, constraint (4) means that the replicated capacity of $s$ at node $n$ should be zero. Constraint (5) gives the flow conservation property of service $s$ at node $n$. It says that the net amount of service $s$ emanating from node $n$ must remain the same as if the service had never failed. This makes the service failure transparent to the affected demands, and ensures that the original paths of the disrupted services can be reused by being connected to the preconfigured protection segments. Constraint (6) formulates the flow conservation property at other nodes without service failure. It says that the replicated capacity of service $s$ at those nodes equals the net traffic load of $s$ upon the service failure at node $n$. This supports the anycast mode. Constraint (7) specifies that the DCN capacity of service $s$ at a node must be sufficient to satisfy all demands on $s$ no matter whether a service fails or not. Finally, the required wavelength capacity on each link must be sufficient to support the service transmissions as formulated in (8). Constraint (8) also ensures that all service paths in the failure-free scenario, including those of the disrupted services which are reused upon the failure, remain unchanged.
Fig. 1. Cycle Exclusion mechanism [14]. (Figure: a cycle set where the underlying network topology is omitted, with the root node marked and two cycles excluded to avoid voltage value conflicts.)
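To make the three cost terms of objective (1) concrete, the following sketch evaluates them for a candidate solution. It is only an evaluator, not a solver; the function name `network_cost`, the dictionary layout and the toy numbers are assumptions made for illustration.

```python
def network_cost(C, W, theta, B, D, P, c):
    """Evaluate objective (1) for a candidate solution (no optimization).

    C[(u, v)]        : wavelength cost, one entry per undirected link
    W[(u, v)]        : working plus service-rerouting wavelengths on the link
    theta[j][(u, v)] : 1 if cycle j traverses the link from u to v
    B[u], D[u]       : basic investment and 0/1 placement decision at node u
    P[s], c[u][s]    : unit service cost and total (working + replica) capacity
    """
    # First term: working and service-rerouting wavelengths.
    transmission = sum(C[e] * W.get(e, 0) for e in C)
    # Second term: spare wavelengths of all p-cycles, either direction.
    pcycle = sum(
        C[(u, v)] * (arcs.get((u, v), 0) + arcs.get((v, u), 0))
        for arcs in theta.values()
        for (u, v) in C
    )
    # Third term: basic investment plus capacity related cost of each DCN.
    construction = sum(
        B[u] * D[u] + sum(P[s] * cap for s, cap in c.get(u, {}).items())
        for u in B
    )
    return transmission + pcycle + construction


# Toy triangle network with one p-cycle 1 -> 2 -> 3 -> 1 and one DCN at node 1.
C = {(1, 2): 1, (2, 3): 1, (1, 3): 1}
W = {(1, 2): 2}
theta = {1: {(1, 2): 1, (2, 3): 1, (3, 1): 1}}
B = {1: 10, 2: 10, 3: 10}
D = {1: 1, 2: 0, 3: 0}
P = {"a": 2}
c = {1: {"a": 3}}
print(network_cost(C, W, theta, B, D, P, c))  # 2 + 3 + 16 = 21
```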
Constraints (9)-(15) are dedicated to p-cycle protection of
the services against a link failure, which can be taken as a
separate module imported from [14]. It removes the candidate
cycle enumeration process by using the Cycle Exclusion algo-
rithm [14] (constraints (9)-(13)) to formulate a single cycle at
a time. Constraint (14) says that a link can be protected if its
two end nodes are on the p-cycle. Constraint (15) says that all
services must be protected, where one unit of traffic can be
protected if the failed link is an on-cycle link and two units
can be protected if it is a straddling one.
To make the paper self-contained, we briefly explain the Cycle Exclusion algorithm [14] by referring to Fig. 1. In Cycle Exclusion, a cycle can traverse a link only once in either direction as specified by (9). Constraint (10) requires each node to be incident on either two or zero on-cycle links. When formulating a single p-cycle, constraint (10) may generate multiple disjoint cycles referred to as a Cycle Set. To get a single cycle by excluding all other redundant ones, a unique root node is defined on each cycle set as in (11). By constraint (12), we logically allow the root node to have two outgoing on-cycle links, whereas all other nodes can have at most one. Next, a voltage value is defined for each node, and it must keep increasing at the nodes along the logically directed on-cycle links as required by (13). Due to the cyclic structure of the cycles in a cycle set, if a cycle does not traverse the root node, the voltage values along the cycle will encounter a conflict and thus violate (13). Consequently, all redundant cycles will be excluded from the cycle set, and only the one traversing the unique root node remains as a single p-cycle.
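The voltage argument can be checked on small examples. Voltages that strictly increase along every directed on-cycle link exist exactly when the directed links contain no directed cycle, so the sketch below simply runs a topological sort. This is a simplified view that ignores the numerical bound on $\alpha$; the function name `voltages_exist` and the node labels are hypothetical.

```python
from collections import defaultdict, deque


def voltages_exist(arcs):
    """True if node voltages can strictly increase along every arc.

    Equivalent to the directed graph being acyclic, checked here with
    Kahn's topological sort.
    """
    out = defaultdict(list)
    indeg = defaultdict(int)
    nodes = set()
    for u, v in arcs:
        out[u].append(v)
        indeg[v] += 1
        nodes.update((u, v))
    queue = deque(n for n in nodes if indeg[n] == 0)
    visited = 0
    while queue:
        u = queue.popleft()
        visited += 1
        for v in out[u]:
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    # Nodes left unvisited lie on (or behind) a directed cycle.
    return visited == len(nodes)


# Cycle through the root "r": two outgoing arcs at the root, as allowed by (12).
rooted = [("r", "a"), ("a", "b"), ("r", "c"), ("c", "b")]
# Cycle without a root: every node has exactly one outgoing arc.
rootless = [("x", "y"), ("y", "z"), ("z", "x")]
print(voltages_exist(rooted), voltages_exist(rootless))  # True False
```

The rooted cycle is oriented as two directed paths leaving the root, so an increasing voltage assignment exists. In the rootless cycle every node has one outgoing link, which forces a directed loop and hence the conflict described above.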
Recall that in Section II.A, we have assumed that service and link failures will not occur at the same time. Nevertheless, by changing constraint (15) to the following (16), a simultaneous link and service failure can be allowed.
$$\sum_{j} \left(2x_{uv}^j - \theta_{uv}^j - \theta_{vu}^j\right) \ge W_{uv}, \quad \forall (u,v)\in E. \tag{16}$$
IV. HEURISTIC
Solving the ILP in Section III is not an easy task. Although some large-scale optimization techniques such as column generation [26-28] can be applied, they are still ILP-based or exhaustive search algorithms and thus not fully scalable. To make the design more scalable for large-size networks, in this section we propose a two-step heuristic DPP (DCN Placement and Service Protection) by dividing the problem into two sub-problems. The first step separately solves the DCN placement and service routing problem, and the second step considers service protection against a service or link failure by taking the result of the first step as the input. In what follows, we first present the DPP algorithm in Section IV.A, and then give more detailed theoretical analysis in Section IV.B to validate the DCN placement process in DPP. Finally, Section IV.C discusses the complexity of DPP and how fast protection is achieved in the solutions.
DPP Algorithm: DCN Placement and Service Protection
Input:
  $G(V,E)$, $S$, $B$, $P_s$ for $\forall s \in S$, $C_{uv}$ for $(u,v) \in E$, $d_u^s$ for $\forall u \in V$ and $\forall s \in S$, $J$ and $\alpha$. Define $\emptyset$ as a null set.
Output:
  Location of each DCN, service and replicated capacity of each service type, service transmission capacity and paths for all demands, and protection capacity and paths against a single link or service failure.
1. DCN placement and service routing:
  1.1) Bipartite graph construction:
    Construct a bipartite graph consisting of a left set $L=V$ and a right set $R=V$ of vertices. Use an edge to connect each vertex pair $u$, $v$ with $u \in L$ and $v \in R$. Calculate the shortest path $\mathbf{P}_{uv} \subseteq E$ between $u$, $v$ in $G(V,E)$ and assign the path cost $P_{uv} = \sum_{(s,t) \in \mathbf{P}_{uv}} C_{st}$ as the weight to the edge.
  1.2) DCN placement:
    while ($L \neq \emptyset$)
    {
      $v = \arg\min_{v \in R} \{A_{\min} = \text{Service\_Matching}(L, v, C_v)\}$.
      Place DCN$_v$ at node $v$ in $G(V,E)$.
      Set $R - \{v\} \to R$ and $L - C_v \to L$.
    }
    Subroutine Service_Matching($L$, $v$, $C_v$):
    Service_Matching($L$, $v$, $C_v$)
    {
      Set $B_v \to C$, $0 \to D$, $\emptyset \to C_v$ and $+\infty \to A_{\min}$;
      while ($C_v \neq L$)
      {
        $u = \arg\min_{u \in (L - C_v)} \left\{ A_u = \frac{C + \sum_{s \in S} d_u^s (P_s + P_{uv})}{D + \sum_{s \in S} d_u^s} \right\}$;
        if ($A_u \le A_{\min}$)
        {
          Set $A_u \to A_{\min}$ and $C_v \cup \{u\} \to C_v$;
          Set $C + \sum_{s \in S} d_u^s (P_s + P_{uv}) \to C$;
          Set $D + \sum_{s \in S} d_u^s \to D$;
        }
        else break;
      }
      return $A_{\min}$;
    }
  1.3) Service routing:
    Based on shortest paths, each node in network $G(V,E)$ finds its closest DCN to serve its demands (tie is broken by random choice). In the failure-free scenario, the required capacity $t_u^s$ ($\forall u \in V$, $\forall s \in S$) of service $s$ at DCN$_u$ can thus be determined, and the wavelength capacity required on each link is $\sum_{s \in S} (t_{uv}^s + t_{vu}^s)$ where $t_{uv}^s + t_{vu}^s$ is the amount of traffic $s$ on link $(u,v)$.
2. Service protection:
  2.1) Protection against a service failure:
    2.1.1) Protection rule:
      Let DCN$_u$ be the closest DCN of DCN$_v$. If service $s$ at DCN$_v$ fails, merge the shortest path $\mathbf{P}_{uv}$ (i.e., the protection segment) with the original service paths of $s$ emanated from DCN$_v$. The failed service is protected by DCN$_u$ using the merged paths.
    2.1.2) Spare capacity on the protection segment:
      Under the assumptions in 2.1.1), the required number of spare wavelengths on each link along $\mathbf{P}_{uv}$ is $\max_{s \in S} t_v^s$.
    2.1.3) DCN capacity and replicas:
      Consider the set of nearby DCNs protected by DCN$_u$. The total capacity $c_u^s$ of service $s$ at DCN$_u$ (which includes $t_u^s$ and the replicated capacity of $s$) is determined by $c_u^s = t_u^s + \max_v t_v^s$, where the maximum is taken over the DCNs $v$ in this set.
  2.2) Protection against a link failure:
    Use the ILP module in constraints (9)-(15) to minimize the spare wavelength capacity of p-cycles for link failure protection. p-cycles can thus be placed in the network.
Fig. 2. Pseudo code of the proposed heuristic.
A. Algorithm Description
The pseudo code of the proposed heuristic DPP is given in Fig. 2. In Step 1.1, we first construct a bipartite graph as illustrated by the 4-node example in Fig. 3, where initially both the left and right sets of vertices ($L$ and $R$) include all the nodes in the network. An edge $\langle u, v \rangle$ is used to connect every node pair $\{u, v\}$ between the two sets, and is weighted by the cost $P_{uv}$ of the shortest path between $u$ and $v$ in $G(V,E)$. In our analysis, a vertex $v \in R$ gives a possible location of a DCN (denoted by DCN$_v$). It matches a set of vertices $u \in L$ which denotes the set of nodes that DCN$_v$ should serve, provided that DCN$_v$ can finally be placed at node $v$.
Step 1.2 is dedicated to DCN placement. It checks each vertex $v \in R$ by assuming a DCN$_v$ placed at node $v$, and finds the corresponding set of nodes $C_v \subseteq L$ to be served using a subroutine Service_Matching($L$, $v$, $C_v$). The subroutine takes $L$ and $v$ as inputs and $C_v$ as the output. If adding a node $u \in L$ to $C_v$ can further reduce the amortized service cost $A_u$ in (17), then $u$ is added to $C_v$ (i.e., to be served by DCN$_v$) and the total cost $C$ and demands $D$ are updated accordingly (see the pseudo code for the subroutine in Fig. 2).
$$A_u = \frac{C + \sum_{s\in S} d_u^s \left(P_s + P_{uv}\right)}{D + \sum_{s\in S} d_u^s} \tag{17}$$
By checking all vertices in the right set $R$ using the subroutine Service_Matching($L$, $v$, $C_v$), a node $v$ with the minimum amortized service cost $A_{\min}$ is identified. Then, DCN$_v$ is placed at node $v$, and the sets $R$ and $L$ are updated by $R - \{v\} \to R$ and $L - C_v \to L$, respectively. At this point, Step 1.2 is repeated based on the updated $R$ and $L$, until all vertices $u \in L$ have been properly matched to their serving DCNs. Then, the number of DCNs and their locations can be determined.
Based on the DCN locations, Step 1.3 simply uses shortest
paths to route the services between a demanding node and its
closest DCN, and ensures that all demands in the network can
be satisfied in the failure-free scenario. Then, service routing
and the required server capacity for every service type can be
determined at each DCN.
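Steps 1.1 and 1.2 can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: it assumes the running cost starts at the basic investment of the candidate node, that shortest-path costs are already available in a nested dictionary `dist`, and that every node has some demand. The names `service_matching` and `place_dcns` are hypothetical.

```python
def service_matching(L, v, B, P, d, dist):
    """Greedy subroutine: choose the nodes in L that DCN_v should serve.

    Returns (A_min, C_v). Assumption: the running cost C starts at B[v].
    """
    C, D = B[v], 0.0                      # running cost and running demand
    C_v, A_min = set(), float("inf")

    def extra_cost(u):                    # sum_s d_u^s * (P_s + P_uv)
        return sum(q * (P[s] + dist[u][v]) for s, q in d[u].items())

    def amortized(u):                     # A_u as in (17)
        total = D + sum(d[u].values())
        return (C + extra_cost(u)) / total if total > 0 else float("inf")

    while C_v != set(L):
        u = min((n for n in L if n not in C_v), key=amortized)
        A_u = amortized(u)
        if A_u > A_min:
            break                         # amortized cost stopped decreasing
        A_min = A_u
        C += extra_cost(u)
        D += sum(d[u].values())
        C_v.add(u)
    return A_min, C_v


def place_dcns(nodes, B, P, d, dist):
    """Step 1.2: repeatedly place the DCN with the smallest amortized cost."""
    L, R = set(nodes), set(nodes)
    placement = {}                        # DCN location -> nodes matched to it
    while L:
        v, _, C_v = min(
            ((v,) + service_matching(L, v, B, P, d, dist) for v in sorted(R)),
            key=lambda item: item[1],
        )
        placement[v] = C_v
        R.remove(v)
        L -= C_v
    return placement


# Toy 3-node path 0 - 1 - 2 with unit link costs and one service "s".
dist = {0: {0: 0, 1: 1, 2: 2}, 1: {0: 1, 1: 0, 2: 1}, 2: {0: 2, 1: 1, 2: 0}}
B = {0: 10, 1: 10, 2: 10}
P = {"s": 1}
d = {n: {"s": 1} for n in range(3)}
print(place_dcns([0, 1, 2], B, P, d, dist))  # {1: {0, 1, 2}}
```

With a basic investment that is large relative to the link costs, a single DCN at the middle node serves all three nodes. Raising the link costs well above the basic investment makes the subroutine break after the first node, so that every node ends up with its own DCN.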
Step 2 is for service protection against a single link or
service failure. The process described in Step 2 of Fig. 2 is
quite straightforward and thus not further explained here.
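For readers who prefer code, the capacity rules of Step 2.1 might look as follows. The sketch assumes at least two DCNs, working capacities `t[v][s]` already known from Step 1.3, and one service failing at a time; the function name `protect_services` and the returned structures are hypothetical.

```python
def protect_services(dcns, t, dist):
    """Step 2.1 sketch: each DCN_v is protected by its closest other DCN_u.

    dcns: list of DCN nodes (at least two); t[v][s]: working capacity of
    service s at DCN_v; dist[u][v]: shortest-path cost between u and v.
    Returns (protector, segment_spare, c).
    """
    # 2.1.1: the protector of DCN_v is the closest other DCN.
    protector = {
        v: min((u for u in dcns if u != v), key=lambda u: dist[u][v])
        for v in dcns
    }
    # 2.1.2: spare wavelengths on the protection segment of DCN_v; only one
    # service fails at a time, so the largest working capacity suffices.
    segment_spare = {v: max(t[v].values(), default=0) for v in dcns}
    # 2.1.3: total capacity = own working capacity + largest working
    # capacity of the same service among the DCNs protected by u.
    c = {}
    for u in dcns:
        protected = [v for v in dcns if protector[v] == u]
        services = set(t[u]) | {s for v in protected for s in t[v]}
        c[u] = {
            s: t[u].get(s, 0)
            + max((t[v].get(s, 0) for v in protected), default=0)
            for s in services
        }
    return protector, segment_spare, c


# Toy example with three DCNs on a line: 0 --1-- 1 --2-- 2.
dist = {0: {0: 0, 1: 1, 2: 3}, 1: {0: 1, 1: 0, 2: 2}, 2: {0: 3, 1: 2, 2: 0}}
t = {0: {"a": 3, "b": 1}, 1: {"a": 2}, 2: {"a": 5, "b": 4}}
protector, spare, c = protect_services([0, 1, 2], t, dist)
print(protector, spare, c)
```

Here DCN 1 protects both DCN 0 and DCN 2, so its capacity for service "a" is its own 2 units plus the larger of 3 and 5 units.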
B. Theoretical Analysis on DCN Placement in DPP
DCN placement (Step 1.2 in Fig. 2) invokes a subroutine Service_Matching($L$, $v$, $C_v$) which takes $L$ and $v$ as inputs and $C_v$ as the output. For a given node $v \in R$ and a set $L$, it returns a set $C_v \subseteq L$ with the minimum amortized service cost $A_{\min}$ by assuming that the nodes in $C_v$ are served by DCN$_v$ at node $v$. As discussed in Section IV.A, nodes $u \in L$ are sequentially added to $C_v$ if they can continuously reduce $A_u$ in (17). As indicated by the two circled parts in Fig. 2, the subroutine breaks and returns once this trend changes (i.e., whenever $A_u$ is found larger than $A_{\min} = \min\{A_u\}$ recorded in all previous rounds). In this subsection, we carry out some theoretical analysis to validate Step 1.2 in DPP.
Fig. 3. Illustration of the bipartite graph with $|V|=4$. (Figure: left vertices $u \in L$ and right vertices $v \in R$, each numbered 1 to 4. Each edge $\langle u,v \rangle$ matches a shortest path $\mathbf{P}_{uv} \subseteq E$ between nodes $u$, $v$ in $G(V,E)$, and is weighted by the path cost $P_{uv} = \sum_{(s,t) \in \mathbf{P}_{uv}} C_{st}$.)
Fig. 4. How $A_{\min}$ changes with $|C_v|$ in Service_Matching($L$, $v$, $C_v$). (Figure: $A_{\min}$ plotted against $|C_v| = 1, 2, 3, \ldots, k, k+1, k+2$, with $k_{\min}$ marked.)
We now prove that $A_{\min}$ returned by the subroutine is globally minimal for any $|C_v| \in \{1, 2, \ldots, |L|\}$. By assuming that Service_Matching($L$, $v$, $C_v$) breaks and returns at $|C_v| = k_{\min}$, the subroutine has ensured a decreasing series of $\{A_{\min}\}$ recorded as $|C_v|$ increases up to $k_{\min}$, as shown in Fig. 4. Therefore, we only need to prove that $A_{\min}$ cannot be smaller if $|C_v|$ is allowed to go beyond $k_{\min}$, which can be achieved by defining a version of Service_Matching($L$, $v$, $C_v$) without the two circled parts (see Fig. 2). To this end, we have the following Theorem 1 (proved in the Appendix), which ensures the global minimum of $A_{\min}$ at $|C_v| = k_{\min}$.
Theorem 1: In Service_Matching($L$, $v$, $C_v$), $A_{\min}$ always keeps increasing with $|C_v|$ for $|C_v| > k_{\min}$.
Note that Service_Matching($L$, $v$, $C_v$) calculates the minimum amortized service cost only for a single given node $v \in R$. By repeatedly invoking this subroutine in Step 1.2 of Fig. 2 to minimize $A_{\min}$ over all $v \in R$, DPP identifies a node $v$ with the minimum $A_{\min}$ to place DCN$_v$. Obviously, the following Theorem 2 is correct. It implies that DPP intrinsically adopts a greedy approach for DCN placement.
Theorem 2: When DPP finds a matching $\{v, C_v\}$ in Step 1.2 to place a DCN$_v$, there is no other matching $\{v' \in R, C_{v'} \subseteq L\}$ which can achieve a smaller $A_{\min}$ than that by $\{v, C_v\}$.
Theorems 1 and 2 provide underlying theoretical supports to validate Step 1.2 for DCN placement in DPP. In particular, Theorem 1 ensures that Service_Matching($L$, $v$, $C_v$) can always return a proper set $C_v \subseteq L$ to minimize the amortized service cost $A_{\min}$ for a given node $v \in R$. By checking all nodes $v \in R$ in Step 1.2, Theorem 2 ensures that the specific node $v \in R$ identified for placing DCN$_v$ can achieve the minimum $A_{\min}$ among all nodes $v \in R$.
C. Discussions
We now analyze the complexity of DPP, excluding the ILP based Step 2.2 for p-cycle placement. Without Step 2.2, the complexity of DPP is dominated by two parallel operations: DCN placement in Step 1.2, and the calculation of all-pairs shortest paths used in other steps. In DCN placement, a DCN$_v$ with the minimum amortized service cost can be placed at a node $v$ after all $v \in R$ are checked, which incurs a complexity of $O(|V|)$. Meanwhile, the subroutine Service Matching($L$, $v$, $C_v$) is invoked for each $v \in R$, where each node $u \in L$ is sequentially tested to check whether the amortized service cost can be further reduced, and at most $|V|$ nodes $u \in L$ can be tested. As a result, placing a single DCN requires a complexity of $O(|V|^2)$. Since at most $|V|$ DCNs can be placed, the total complexity of DCN placement is $O(|V|^3)$. On the other hand, it is well known that the complexity of all-pairs shortest path calculation is $O(|V|^3)$, and this calculation runs in parallel with DCN placement rather than inside it. Hence, the total complexity of DPP is $O(|V|^3)$ without considering Step 2.2.
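As an illustration of the $O(|V|^3)$ all-pairs shortest path step, the following Python sketch applies the Floyd-Warshall algorithm to the COST 239 link costs of Fig. 5b. The paper does not state which shortest-path algorithm is used, so Floyd-Warshall is only one standard choice, and the function and variable names are hypothetical. The last lines show how such a distance table can be used to find the closest DCN for a demand node, here node 7 with DCNs at nodes 3 and 5 as in the DPP solution of Fig. 6.

```python
from math import inf

# Link costs in kilometers, copied from Fig. 5b (COST 239: 11 nodes, 26 links).
LINKS = {
    (0, 1): 1310, (0, 2): 760, (0, 3): 390, (0, 6): 740, (1, 2): 550,
    (1, 4): 390, (1, 7): 450, (2, 3): 660, (2, 4): 210, (2, 5): 390,
    (3, 6): 340, (3, 7): 1090, (3, 9): 660, (4, 5): 220, (4, 7): 300,
    (4, 10): 930, (5, 6): 730, (5, 7): 400, (5, 8): 350, (6, 8): 565,
    (6, 9): 320, (7, 8): 600, (7, 10): 820, (8, 9): 730, (8, 10): 320,
    (9, 10): 820,
}

def all_pairs_shortest_paths(n, links):
    """Floyd-Warshall on an undirected graph; three nested loops give O(n^3)."""
    dist = [[0 if i == j else inf for j in range(n)] for i in range(n)]
    for (u, v), cost in links.items():
        dist[u][v] = dist[v][u] = min(dist[u][v], cost)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

dist = all_pairs_shortest_paths(11, LINKS)
closest = min((3, 5), key=lambda dcn: dist[7][dcn])
print(dist[7][5], dist[7][3], closest)  # 400 1090 5
```

The result agrees with the observation in Section V.B that the demands at node 7 are served by the closest DCN5 in the DPP solution.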
Step 2.2 adopts the ILP module in constraints (9)-(15) for p-cycle design. It is imported from [14], which adopts the Cycle Exclusion algorithm to remove the traditional candidate cycle enumeration process and can thus generate p-cycle solutions quickly. Even when the network size is relatively large, the ILP running time is tolerable (e.g., only a few hours for the network in Fig. 7 with 30 nodes and 62 links). If necessary, Step 2.2 can also be replaced by an existing heuristic for p-cycle placement, in which case DPP runs entirely in polynomial time. Besides, if p-cycles are designed based on (16) rather than (15), a simultaneous link and service failure can be tolerated.
We emphasize that both the proposed ILP and the heuristic focus on fast protection against either a service failure or a link failure. For a service failure, preconfigured protection segments are connected to the original paths of the disrupted services (as specified in Step 2.1.1 in Fig. 2). Since only a single operation of local failure detection and optical switching is needed, optical recovery can be very fast. For a link failure, it is well known that p-cycles achieve fast protection. Fast protection is desired when providing cloud services over all-optical networks in order to minimize the service interruption time, and the proposed joint design meets this essential requirement.
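To make the p-cycle protection concrete, the short Python sketch below classifies a link with respect to a given p-cycle. It relies on the standard p-cycle property, which is not restated in this section: an on-cycle link is protected by the remaining part of the cycle (one protection path), whereas a straddling link, whose two end nodes are on the cycle but which is not itself part of it, is protected by both arcs of the cycle (two protection paths). The function name is hypothetical, and the cycle used is the third one listed in Fig. 5e.

```python
def classify_link(cycle, link):
    """Classify `link` relative to a p-cycle.

    cycle: list of nodes in visiting order, closed (first node repeated last).
    Returns 'on-cycle', 'straddling' or 'unprotected'.
    """
    nodes = set(cycle[:-1])
    on_cycle_links = {frozenset(pair) for pair in zip(cycle, cycle[1:])}
    u, v = link
    if frozenset(link) in on_cycle_links:
        return "on-cycle"        # one protection path: the rest of the cycle
    if u in nodes and v in nodes:
        return "straddling"      # two protection paths: both arcs of the cycle
    return "unprotected"         # this cycle cannot protect the link

cycle = [1, 4, 2, 5, 8, 6, 3, 9, 10, 7, 1]   # third p-cycle in Fig. 5e
print(classify_link(cycle, (1, 4)))  # on-cycle
print(classify_link(cycle, (4, 5)))  # straddling
print(classify_link(cycle, (0, 1)))  # unprotected
```

Node 0 is not on this cycle, so links incident to node 0 must be protected by the other two p-cycles in Fig. 5e, both of which pass through node 0.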
Fig. 5. ILP based optimal solution for the joint design with a total network cost of 120805.

(a) Pan-European COST 239 network. DCNs are placed at nodes 4, 6 and 8, with capacities {s1, s2, s3} given as working + replica:
DCN4: 8+4, 8+3, 8+4
DCN6: 7+5, 7+4, 7+5
DCN8: 9+3, 7+4, 9+3

(b) Link cost in kilometers.
Link      Cost    Link      Cost    Link      Cost
(0, 1)    1310    (0, 2)    760     (0, 3)    390
(0, 6)    740     (1, 2)    550     (1, 4)    390
(1, 7)    450     (2, 3)    660     (2, 4)    210
(2, 5)    390     (3, 6)    340     (3, 7)    1090
(3, 9)    660     (4, 5)    220     (4, 7)    300
(4, 10)   930     (5, 6)    730     (5, 7)    400
(5, 8)    350     (6, 8)    565     (6, 9)    320
(7, 8)    600     (7, 10)   820     (8, 9)    730
(8, 10)   320     (9, 10)   820

(c) Simulation parameters.
B = 16000, α = 0.01, β = 1000, S = {s1, s2, s3}, P_s1 = 200, P_s2 = 250, P_s3 = 300, J = 7.

(d) Service demands at each node and routing in the failure-free scenario.
Node (u)          d_u^s1  d_u^s2  d_u^s3   Service paths in the failure-free scenario
0: Copenhagen     1       4       1        s1: 6→0;       s2: 6→0, 4→2→0;  s3: 6→0
1: London         1       2       2        s1: 4→1;       s2: 4→1;         s3: 4→1
2: Amsterdam      2       1       0        s1: 4→2;       s2: 4→2
3: Berlin         1       3       1        s1: 6→3;       s2: 6→3;         s3: 6→3
4: Brussels       2       1       2        s1: 4→4;       s2: 4→4;         s3: 4→4
5: Luxembourg     3       2       3        s1: 4→5;       s2: 4→5, 8→5;    s3: 8→5
6: Prague         4       1       3        s1: 6→6;       s2: 6→6;         s3: 6→6
7: Paris          2       1       5        s1: 8→7;       s2: 4→7;         s3: 4→7, 8→7
8: Zürich         3       0       4        s1: 8→8;                        s3: 8→8
9: Vienna         5       3       2        s1: 6→9, 8→9;  s2: 6→9, 8→9;    s3: 6→9
10: Milan         0       4       1                       s2: 8→10;        s3: 8→10

(e) p-cycle placement.
0-2-1-7-5-4-10-8-6-9-3-0
0-2-1-7-10-4-5-8-6-9-3-0
1-4-2-5-8-6-3-9-10-7-1

(f) Protection segments against a service failure.
Failure at DCN4:  s1: 6→5→4, 6→8→5→4, 8→5→4;  s2: 6→5→4, 6→8→5→4, 8→5→4;  s3: 6→5→4, 6→8→5→4, 8→5→4
Failure at DCN6:  s1: 4→5→6, 4→5→8→6, 8→6;    s2: 4→5→6, 8→6;              s3: 4→5→6, 4→5→8→6, 8→6
Failure at DCN8:  s1: 4→5→8, 6→5→8, 6→8;      s2: 4→5→8, 6→8;              s3: 4→5→8, 6→5→8, 6→8
Fig. 6. DPP heuristic solution for the COST 239 network with a total network cost of 141260 (16.93% above the optimal solution in Fig. 5).

(a) DCN placement and service routing. DCNs are placed at nodes 3 and 5, with capacities {s1, s2, s3} given as working + replica:
DCN3: 11+13, 11+11, 7+17
DCN5: 13+11, 11+11, 17+7

(b) Service protection and the total network cost.
p-cycles (J = 10):
0-1-7-5-2-4-10-8-9-6-3-0
0-3-2-5-7-4-10-8-9-6-0
0-1-7-5-2-4-10-8-9-6-3-0
0-2-5-4-1-7-8-10-9-6-3-0
2-4-10-8-7-5-2
0-3-2-5-4-7-8-9-6-0
0-3-6-0
DCN protection segment: 3 ↔ 2 ↔ 5 (load of 17 on each link)
Total cost: 141260
Gap to optimal in Fig. 5: 16.93%
V. NUMERICAL RESULTS
In this section, we carry out numerical experiments to evaluate the proposed ILP and the heuristic DPP. For simplicity, we assume that the basic investment costs $B_u$ take the same value $B$ for all nodes $u \in V$, although they may differ in practical networks. Since the ILP approach is not scalable, we can solve the ILP to optimality only in small-size networks. This provides a benchmark to gauge the DPP performance. For large-size network scenarios, we focus on checking the feasibility of the DPP solutions.
A. ILP Based Optimal Joint Design in COST 239
CPLEX 11.0 is adopted to solve the ILP in (1)-(15). We consider the typical pan-European COST 239 network with 11 nodes and 26 links as shown in Fig. 5a. Fig. 5b defines the distance-related link cost in kilometers. In the simulation, we set $B = 16000$, $\alpha = 0.01$ and $\beta = 1000$. Three types of services $\{s_1, s_2, s_3\}$ are assumed in the network. The DCN costs of per-unit capacity for $\{s_1, s_2, s_3\}$ are assumed to differ from each other, with $\{P_{s_1}, P_{s_2}, P_{s_3}\} = \{200, 250, 300\}$. Besides, the maximum allowed number of p-cycles is set to $J = 7$. Fig. 5c summarizes the simulation parameters. Service demands $d_u^s$ at each node are listed in the first two columns in Fig. 5d.
The optimal DCN placement and the capacity configuration of each service are shown in Fig. 5a. DCNs are placed at nodes 4, 6 and 8. The capacity of the services at each DCN is $\{c_u^{s_1}, c_u^{s_2}, c_u^{s_3}\}$. Each term $c_u^s$ is expressed by $x + y$, where $x$ denotes the service working capacity in the failure-free scenario and $y$ denotes the capacity of service replicas. The service paths in the failure-free scenario are listed in the last three columns in Fig. 5d, with a number above each arrow indicating the traffic load on the link. Besides, Fig. 5e gives the p-cycles against a link failure, and Fig. 5f lists the preconfigured protection segments against a service failure. The total network cost is 120805.
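The figures reported in Fig. 5 can be cross-checked with a few lines of Python. The sketch below is only a sanity check on the published numbers, not part of the ILP. The demands are copied from Fig. 5d and the $x + y$ capacity labels from Fig. 5a; the assignment of the three labels to DCN4, DCN6 and DCN8 is an inference from the service paths in Fig. 5d and should be treated as an assumption.

```python
# Fig. 5d: node -> (d^{s1}, d^{s2}, d^{s3})
demands = {
    0: (1, 4, 1), 1: (1, 2, 2), 2: (2, 1, 0), 3: (1, 3, 1), 4: (2, 1, 2),
    5: (3, 2, 3), 6: (4, 1, 3), 7: (2, 1, 5), 8: (3, 0, 4), 9: (5, 3, 2),
    10: (0, 4, 1),
}
# Fig. 5a: DCN -> [(working x, replica y) for s1, s2, s3]
# (label-to-DCN assignment inferred from the service paths; an assumption)
capacity = {
    4: [(8, 4), (8, 3), (8, 4)],
    6: [(7, 5), (7, 4), (7, 5)],
    8: [(9, 3), (7, 4), (9, 3)],
}

total_demand = [sum(d[i] for d in demands.values()) for i in range(3)]
total_working = [sum(c[i][0] for c in capacity.values()) for i in range(3)]

# If a service fails at one DCN, the replicas at the other DCNs take over.
replicas_cover_failure = all(
    sum(capacity[other][i][1] for other in capacity if other != failed)
    >= capacity[failed][i][0]
    for failed in capacity
    for i in range(3)
)
print(total_demand, total_working, replicas_cover_failure)
# [24, 22, 24] [24, 22, 24] True
```

The working capacities add up exactly to the total demand of each service, and for every service the replica capacity at the two surviving DCNs matches the working capacity lost at the failed DCN, which is consistent with the anycast-style protection listed in Fig. 5f.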
B. DPP Design in Small-Size Networks
Fig. 6 gives the heuristic solution generated by the DPP algorithm in Fig. 2 for COST 239, with the same set of system parameters as in Fig. 5c and the same service demands as in Fig. 5d. In particular, Fig. 6a shows the DCN placement and service routing, where the shortest-path based service paths are drawn in the network topology using bold arrows. Fig. 6b shows the required p-cycles against a link failure and the preconfigured DCN protection segment against a service failure, as well as the total network cost and the gap to the ILP based optimal solution in Fig. 5. The small gap-to-optimality of 16.93% confirms the good performance of DPP. In addition to COST 239, we have carried out many other experiments with similar network sizes. The results show that the gap-to-optimality of DPP remains stable at about the same level.
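The reported gap follows directly from the two total costs in Fig. 5 and Fig. 6, as the following short Python check shows (variable names are illustrative only).

```python
optimal_cost = 120805   # ILP solution, Fig. 5
dpp_cost = 141260       # DPP heuristic solution, Fig. 6

gap = (dpp_cost - optimal_cost) / optimal_cost
gap_text = f"{gap:.2%}"
print(gap_text)  # 16.93%
```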
Despite the good performance of DPP, some major differences between the solutions in Fig. 5 and Fig. 6 can be observed: 1) The DCN placement schemes are different. There are 3 DCNs in Fig. 5a but 2 in Fig. 6a. In addition, both the DCN locations and capacities are different; 2) Due to the differences in DCN placement and service routing, p-cycle placement is also quite different in the two solutions. In particular, the optimal solution requires 3 p-cycles as in Fig. 5e, whereas the DPP solution requires 7 as in Fig. 6b; 3) Service transmissions are based on shortest paths in DPP but may take detours in the optimal solution due to the more intelligent joint optimization; and 4) In both the failure-free and service failure scenarios, the anycast service mode is better supported in the optimal solution. In contrast, DPP always serves the demands using the closest DCN in the failure-free scenario, or resorts to the closest DCN replica for service protection upon a service failure. Take the circled part in Fig. 5d as an example for the failure-free scenario. In Fig. 5d, the demands at node 7 on service $s_3$ are served simultaneously by nodes 4 and 8 (i.e., anycast), with a distance-related link cost of 300 for 4 → 7 and 600 for 8 → 7 (i.e., detour). In Fig. 6a, the demands at node 7 are served by the closest DCN5. Moreover, Fig. 5f shows that the anycast mode is well supported in the optimal solution upon any service failure, with multiple DCN replicas providing fast service protection at the same time. In contrast, DPP only uses the closest DCN replica (see Step 2.1.1 in Fig. 2).
C. DPP Design in Large-Size Networks
We now apply DPP to the network topology in Fig. 7a with 30 nodes and 62 links, which is taken from [14]. The simulation parameters shown in Fig. 7b are almost the same as those in Fig. 5 and Fig. 6, but the link and path costs are based on a constant cost of 600 for each link. Service demands at each node are listed in Fig. 7c.
Fig. 7a shows the DCN placement and service routing result. In Fig. 7a, each of the five shadowed zones covers the nodes with demands served by the corresponding DCN in the zone, and the bold arrows indicate the service transmission paths in the failure-free scenario. Fig. 7d lists the capacity configuration and the preconfigured protection segments (against a service failure) for each DCN. As shown in Fig. 7e, 7 p-cycles are required for fast protection against a link failure. The total network cost is 278850.
D. Relationship Between B and the Number of DCNs
Fig. 8 shows how the number of DCNs placed in the network changes with the basic investment $B$. In particular, Fig. 8a is obtained using the optimal ILP based on the COST 239 network with the same sets of simulation parameters and service demands as in Fig. 5. Fig. 8b is obtained using DPP based on the same network and simulation settings as in Fig. 7. As expected, the number of DCNs decreases as $B$ increases, because constructing a DCN becomes more expensive and thus fewer DCNs should be deployed. Fig. 8 confirms this trend in both the optimal ILP and the DPP solutions.
VI. CONCLUSION
We studied the joint design of DCN (Data Center Network) placement, service routing and service protection for providing cloud services in all-optical WDM (Wavelength Division Multiplexing) networks. An optimal ILP (Integer Linear Program) was first formulated to achieve joint optimization for minimizing the total network cost. Then, a heuristic algorithm DPP (DCN Placement and Service Protection) was proposed to make the design more scalable in large-size networks. The solutions generated by our algorithms work under the anycast service mode to satisfy all service demands in the network, and minimize the network cost by balancing the costs of DCNs against those of optical wavelengths. With preconfigured protection segments and p-cycles adopted in the proposed scheme, fast service protection can be achieved against a service or link failure. Numerical results validated the correctness of the ILP and demonstrated the good performance of the proposed DPP heuristic.
ACKNOWLEDGMENT
This work is supported by the Major State Basic Research Program of China (973 project No. 2013CB329301 and 2010CB327806), the Natural Science Fund of China (NSFC project No. 61372085, 61032003 and 61271165), the Research Fund for the Doctoral Program of Higher Education of China (RFDP project No. 20120185110025 and 20120185110030), and the Fundamental Research Funds for the Central Universities. It is also supported by Tianjin Key Laboratory of Cognitive Computing and Application, School of Computer Science and Technology, Tianjin University, Tianjin, P. R. China.

Fig. 7. DPP design in large-size network taken from [14] with 30 nodes and 62 links. The total network cost is 278850.

(a) DCN placement and service routing.

(b) Simulation parameters.
B = 16000, α = 0.01, β = 1000, S = {s1, s2, s3}, P_s1 = 200, P_s2 = 250, P_s3 = 300, J = 20, C_uv = 600 for (u, v) ∈ E.

(c) Service demands at each node.
Node (u)  d_u^s1 d_u^s2 d_u^s3    Node (u)  d_u^s1 d_u^s2 d_u^s3    Node (u)  d_u^s1 d_u^s2 d_u^s3
0         1      2      1         10        1      1      1         20        1      2      1
1         2      1      1         11        2      0      2         21        1      0      2
2         1      0      1         12        1      1      1         22        3      1      1
3         1      1      1         13        2      2      1         23        1      0      1
4         1      2      1         14        0      1      1         24        0      3      1
5         1      0      2         15        1      1      2         25        2      1      1
6         2      1      1         16        1      2      1         26        0      2      2
7         3      1      1         17        1      0      1         27        1      1      1
8         1      1      1         18        0      1      1         28        1      0      1
9         1      1      0         19        2      1      1         29        1      1      2

(d) DCN placement and preconfigured protection segments.
DCN: DCN0, DCN25, DCN15, DCN21, DCN5
Working capacity (s1 s2 s3): 9 9 6; 6 5 7; 6 9 8; 7 5 8; 8 3 6
Capacity replica (s1 s2 s3): 0 0 0; 0 0 0; 9 9 8; 0 0 0; 8 5 7
Protection segment: 5-3-0; 15-14-5; 5-14-15; 5-7-17-21; 15-13-25

(e) 7 p-cycles required for fast link protection.
0-1-2-6-19-20-29-21-18-17-22-23-15-24-25-13-11-10-12-14-16-7-5-4-3-0
0-1-2-6-19-20-29-21-18-17-22-23-15-24-25-13-11-10-9-14-16-7-8-5-4-3-0
0-1-2-5-14-8-7-16-23-15-24-26-27-25-13-11-10-12-4-3-0
11-13-15-23-22-21-20-29-28-27-26-24-25-11
0-1-3-4-0
0-3-4-0
0-3-4-0

Fig. 8. The relationship between $B$ and the number of DCNs in the optimal ILP (for COST 239) and the DPP (for the network in Fig. 7a) solutions. (a) Optimal ILP solution for COST 239. (b) DPP solution for the network in Fig. 7a. Vertical axis: number of DCNs.
APPENDIX
PROOF OF THEOREM 1
Theorem 1: In Service Matching($L$, $v$, $C_v$), $A_{\min}$ always keeps increasing with $|C_v|$ for $|C_v| > k_{\min}$.
Proof: In Service Matching($L$, $v$, $C_v$), assume that a node $u$ is added to $C_v$ at a particular stage and $|C_v| = k > k_{\min}$, but $C$ and $D$ are not updated yet. For ease of analysis, at this point we redefine some parameters as follows.
$$A^{k}_{\min} = A_{\min}; \tag{18}$$
$$C_k = C; \tag{19}$$
$$\Delta C_k = \sum_{s \in S} d_u^s (P_s + P_{uv}); \tag{20}$$
$$D_k = D; \tag{21}$$
$$\Delta D_k = \sum_{s \in S} d_u^s. \tag{22}$$
Then, based on (17) we have
$$A^{k}_{\min} = \frac{C_k + \Delta C_k}{D_k + \Delta D_k}. \tag{23}$$
When $C_k$ and $D_k$ are updated, we have
$$C_{k+1} = C_k + \Delta C_k; \tag{24}$$
$$D_{k+1} = D_k + \Delta D_k. \tag{25}$$
We now prove Theorem 1 by induction. Because the subroutine Service Matching($L$, $v$, $C_v$) breaks and returns with $|C_v| = k_{\min}$, we have $A^{k_{\min}}_{\min} < A^{k_{\min}+1}_{\min}$ (see the circled parts in Fig. 2).
Assume that $A^{k}_{\min} < A^{k+1}_{\min}$ holds for a specific $k \geq k_{\min}$. According to (23), we have
$$\frac{C_k + \Delta C_k}{D_k + \Delta D_k} < \frac{C_{k+1} + \Delta C_{k+1}}{D_{k+1} + \Delta D_{k+1}}. \tag{26}$$
From (24)-(26), we get
$$\Delta C_{k+1} D_{k+1} - C_{k+1} \Delta D_{k+1} > 0. \tag{27}$$
On the other hand, since $A^{k+1}_{\min}$ is minimized by adding a specific node $u$ to $C_v$, which makes $|C_v| = k+1$, we have
$$\frac{C_{k+2}}{D_{k+2}} = \frac{C_{k+1} + \Delta C_{k+1}}{D_{k+1} + \Delta D_{k+1}} < \frac{C_{k+1} + \Delta C_{k+2}}{D_{k+1} + \Delta D_{k+2}}, \tag{28}$$
where $\Delta C_{k+2}$ and $\Delta D_{k+2}$ correspond to another node (i.e., the $(k+2)$th node) added to $C_v$ (after node $u$). In other words, the inequality in (28) must hold because that is why node $u$ is chosen as the $(k+1)$th node added to $C_v$. From (27)-(28), we get
$$\Delta C_{k+2} D_{k+2} > C_{k+2} \Delta D_{k+2}, \tag{29}$$
and thus
$$\frac{C_{k+2}}{D_{k+2}} < \frac{C_{k+2} + \Delta C_{k+2}}{D_{k+2} + \Delta D_{k+2}}, \tag{30}$$
which is equivalent to
$$A^{k+1}_{\min} < A^{k+2}_{\min}. \tag{31}$$
This proves Theorem 1 by closing the induction.
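The step from (27)-(28) to (29) is stated without intermediate algebra. One way to fill it in is sketched below, writing $\Delta C_k$ and $\Delta D_k$ for the cost and demand increments defined in (20) and (22). The shorthand $r$ for the middle term of (28) is introduced only for this sketch, and all accumulated demands and demand increments are assumed to be strictly positive.

```latex
\begin{align*}
r &:= \frac{C_{k+1} + \Delta C_{k+1}}{D_{k+1} + \Delta D_{k+1}} = \frac{C_{k+2}}{D_{k+2}}, \\
\text{(27)} &\;\Rightarrow\; \frac{\Delta C_{k+1}}{\Delta D_{k+1}} > \frac{C_{k+1}}{D_{k+1}}
  \;\Rightarrow\; r > \frac{C_{k+1}}{D_{k+1}}
  \;\Rightarrow\; r D_{k+1} - C_{k+1} > 0, \\
\text{(28)} &\;\Rightarrow\; C_{k+1} + \Delta C_{k+2} > r \left( D_{k+1} + \Delta D_{k+2} \right)
  \;\Rightarrow\; \Delta C_{k+2} - r \, \Delta D_{k+2} > r D_{k+1} - C_{k+1} > 0, \\
&\;\Rightarrow\; \Delta C_{k+2} > \frac{C_{k+2}}{D_{k+2}} \, \Delta D_{k+2}
  \;\Rightarrow\; \Delta C_{k+2} D_{k+2} > C_{k+2} \Delta D_{k+2}.
\end{align*}
```

The second line uses the fact that $r$ is the mediant of $C_{k+1}/D_{k+1}$ and $\Delta C_{k+1}/\Delta D_{k+1}$ and therefore lies between them. The last line is exactly (29), and adding $C_{k+2} D_{k+2}$ to both sides and dividing by $D_{k+2}(D_{k+2} + \Delta D_{k+2})$ gives (30).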
REFERENCES
[1] K. Chen, C. Guo, H. Wu, J. Yuan, Z. Feng, Y. Chen, S. Lu, and W.
Wu, “DAC: generic and automatic address configuration for data center
networks,” IEEE/ACM Trans. Networking, vol. 20, no. 1, pp. 84–99,
2012.
[2] M. Bari, R. Boutaba, R. Esteves, L. Granville, M. Podlesny, M. Rabbani,
Q. Zhang, and M. Zhani, “Data center network virtualization: a survey,”
IEEE Commun. Surveys & Tutorials, pre-published, pp. 1–20, 2012.
[3] C. Lam, H. Liu, B. Koley, X. Zhao, V. Kamalov, and V. Gill, “Fiber optic
communication technologies: what’s needed for datacenter network
operations,” IEEE Commun. Mag., vol. 48, no. 7, pp. 32–39, July 2010.
[4] Z. Zheng, T. Zhou, M. Lyu, and I. King, “Component ranking for fault-tolerant cloud applications,” IEEE Trans. Services Computing, pre-published, 2011.
[5] Y. Simmhan, C. Ingen, G. Subramanian, and J. Li, “Bridging the gap
between desktop and the cloud for escience applications,” in Proc. 2010
IEEE International Conference on Cloud Computing, pp. 474–481.
[6] P. Wright, T. Harmer, J. Hawkins, and Y. L. Sun, “A commodity-focused
multi-cloud marketplace exemplar application,” in Proc. 2011 IEEE
International Conference on Cloud Computing, pp. 590–597.
[7] J. Abley and K. Lindqvist, RFC 4786: Operation of Anycast Services, Dec. 2006.
[8] M. F. Habib, M. Tornatore, M. D. Leenheer, F. Dikbiyik, and B.
Mukherjee, “Design of disaster-resilient optical datacenter networks,”
IEEE/OSA J. Lightw. Technol., vol. 30, no. 16, pp. 2563–2573, Aug.
2012.
[9] L. Guo, J. Cao, H. Yu, and L. Li, “Path-based routing provisioning with
mixed shared protection in WDM mesh networks,” IEEE/OSA J. Lightw.
Technol., vol. 24, no. 3, pp. 1129–1141, 2006.
[10] L. Guo, “LSSP: a novel local segment-shared protection for multi-
domain optical mesh networks,” Computer Commun., vol. 30, no. 8,
pp. 1794–1801, 2007.
[11] B. Wu, K. L. Yeung, and P.-H. Ho, “Monitoring cycle design for fast link failure localization in all-optical networks,” IEEE/OSA J. Lightw. Technol., vol. 27, no. 10, pp. 1392–1401, May 2009.
[12] B. Wu, P.-H. Ho, K. L. Yeung, J. Tapolcai, and H. T. Mouftah, “Optical
layer monitoring schemes for fast link failure localization in all-optical
networks,” IEEE Commun. Surveys and Tutorials, vol. 13, no. 1, pp.
114–125, 2011.
[13] B. Wu, P.-H. Ho, and K. L. Yeung, “Monitoring trail: on fast link failure
localization in WDM mesh networks,” IEEE/OSA J. Lightw. Technol.,
vol. 27, no. 18, pp. 4175–4185, Sept. 2009.
[14] B. Wu, K. L. Yeung, and P.-H. Ho, “ILP formulations for p-cycle design without candidate cycle enumeration,” IEEE/ACM Trans. Networking, vol. 18, no. 1, pp. 284–295, Feb. 2010.
[15] B. Wu, P.-H. Ho, K. L. Yeung, J. Tapolcai, and H. T. Mouftah, “CFP: cooperative fast protection,” IEEE/OSA J. Lightw. Technol., vol. 28, no. 7, pp. 1102–1113, Apr. 2010.
[16] L. Guo and L. Li, “A novel survivable routing algorithm with partial
shared-risk link groups (SRLG)-disjoint protection based on differenti-
ated reliability constraints in WDM optical mesh networks,” IEEE/OSA
J. Lightw. Technol., vol. 25, no. 6, pp. 1410–1415, Jun. 2007.
[17] S. S. Ahuja, S. Ramasubramanian, and M. Krunz, “SRLG failure localization in optical networks,” IEEE/ACM Trans. Networking, vol. 19, no. 4, pp. 989–999, 2011.
[18] J. Liu, X. Jiang, H. Nishiyama, and N. Kato, “Reliability assessment
for wireless mesh networks under probabilistic region failure model,”
IEEE Trans. Vehicular Technol., vol. 60, no. 5, pp. 2253–2264, 2011.
[19] X. Wang, X. Jiang, and A. Pattavina, “Assessing network vulnerability
under probabilistic region failure model,” in Proc. 2011 IEEE Inter-
national Conference on High Performance Switching and Routing, pp.
164–170.
[20] M. S. Kiaei, C. Assi, and B. Jaumard, “A survey on the p-cycle
protection method,” IEEE Commun. Surveys and Tutorials, vol. 11, no.
3, pp. 53–70, 2009.
[21] K. L. Yeung and T.-S. P. Yum, “Node placement optimization in
ShuffleNets,” IEEE/ACM Trans. Networking, vol. 6, no. 3, pp. 319–324,
1998.
[22] B. Wang, H. Xu, W. Liu, and H. Liang, “A novel node placement for long belt coverage in wireless networks,” IEEE Trans. Computers, pre-published, 2012.
[23] P. Cheng, C.-N. Chuah, and X. Liu, “Energy-aware node placement in
wireless sensor networks,” in Proc. 2004 IEEE GlobeCom, vol. 5, pp.
3210–3214.
[24] B. Wu, K. L. Yeung, and P.-H. Ho, “ILP formulations for non-simple p-cycle and p-trail design in WDM mesh networks,” Elsevier Computer Networks, vol. 54, no. 5, pp. 716–725, Apr. 2010.
[25] T. Y. Chow, F. Chudak, and A. M. Ffrench, “Fast optical layer mesh protection using pre-cross-connected trails,” IEEE/ACM Trans. Networking, vol. 12, no. 3, pp. 539–548, Jun. 2004.
[26] C. Barnhart, E. L. Johnson, G. L. Nemhauser, M. W. Savelsbergh, and P. H. Vance, “Branch-and-price: column generation for solving huge integer programs,” Operations Research, vol. 46, no. 3, pp. 316–329, 1998.
[27] J. El-Najjar, C. Assi, and B. Jaumard, “Joint routing and scheduling in WiMAX-based mesh networks,” IEEE Trans. Wireless Commun., vol. 9, no. 7, pp. 2371–2381, 2010.
[28] B. Jaumard and H. A. Hoang, “Design and dimensioning of logical survivable topologies against multiple failures,” IEEE/OSA J. Optical Commun. and Networking, vol. 5, no. 1, pp. 23–36, 2013.
Jie Xiao received the B.Eng. degree in Electronic
Information Science and Technology from Central
China Normal University (Wuhan, P. R. China)
in 2010. He is now pursuing his Ph.D degree in
Computer Systems and Networking at Tianjin Uni-
versity (Tianjin, P. R. China) under the supervision
of professor Bin Wu. His research interests in-
clude computer systems and networking, optical and
wireless communications and networking, network
survivability and security issues.
Hong Wen was born in Chengdu, P. R. China. She
received the M.Sc. degree in Electrical Engineering
from Sichuan Union University of Sichuan, P. R.
China, in 1997. She pursued her Ph.D. degree in
Communication and Computer Engineering Dept. at
the Southwest Jiaotong University (Chengdu, P. R.
China). Then she worked as an associate professor
in the National Key Laboratory of Science and
Technology on Communications at UESTC, P. R.
China. From January 2008 to August 2009, she was
a visiting scholar and postdoctoral fellow in the ECE
Dept. at University of Waterloo. Now she holds the professor position at
UESTC, P. R. China. Her major interests focus on wireless communication
systems.
Bin Wu (S'04–M'07) received his Ph.D. degree
in Electrical and Electronic Engineering from The
University of Hong Kong (Pokfulam, Hong Kong)
in 2007. He worked as a postdoctoral research fellow
from 2007-2012 in the ECE Dept. at University of
Waterloo (Waterloo, Canada). He is now a professor
in the School of Computer Science and Technology
at Tianjin University (Tianjin, P. R. China). His
research interests include computer systems and networking, optical and wireless communications and networking, and network survivability.
Xiaohong Jiang (M'03) received his B.S., M.S. and
Ph.D. degrees, all from Xidian University, Xi'an, P.
R. China. He is currently a full professor at Future
University Hakodate, Hakodate, Japan. Dr. Jiang
was an Associate professor at Tohoku University,
Japan from Feb. 2005 to Mar. 2010, an assistant
professor in Japan Advanced Institute of Science and
Technology (JAIST) from Oct. 2001 to Jan. 2005.
He was a JSPS research fellow at JAIST from Oct.
1999 to Oct. 2001, and a research associate in the
University of Edinburgh from Mar. 1999 to Oct.
1999. His research interests include computer communications and networks,
mainly on wireless networks, optical networks, etc. He has published over 190
technical papers at premium international journals and conferences, which
include over 20 papers published in IEEE journals such as IEEE/ACM
TRANSACTIONS ON NETWORKING, IEEE JOURNAL OF SELECTED AREAS
ON COMMUNICATIONS, etc. Dr. Jiang was the winner of the Best Paper
Award and Outstanding Paper Award of IEEE WCNC 2008, IEEE ICC 2005
Optical Networking Symposium, and IEEE/IEICE HPSR 2002. He is a Senior
Member of IEEE.
Pin-Han Ho received his B.Sc. and M.Sc. degrees from the Department of Electrical and Computer Engineering, National Taiwan University, in 1993 and 1995, respectively. He started his Ph.D. studies in 2000 at Queen's University, Kingston, Ontario,
Canada, focusing on optical communications sys-
tems, survivable networking, and QoS routing prob-
lems. He finished his Ph.D. in 2002, and joined the
Electrical and Computer Engineering Department at
the University of Waterloo as an assistant professor
in the same year, where he is currently an associate
professor. He is the author/co-author of more than 100 refereed technical
papers and book chapters, and the co-author of a book on optical networking
and survivability. He is the recipient of the Distinguished Research Excellence
Award in the ECE Department at the University of Waterloo, the Early
Researcher Award in 2005, the Best Paper Award at SPECTS '02 and the ICC '05 Optical Networking Symposium, and the Outstanding Paper Award at HPSR '02.
Lei Zhang (S'03–M'08) received her Ph.D. degree in Computer Science from Auburn University
(Auburn, AL, USA) in 2008. She worked as an
assistant professor from 2008-2011 in the Computer
Science Dept. at Frostburg State University (Frost-
burg, MD, USA). She is now an assistant professor
in the School of Computer Science and Technology
at Tianjin University (Tianjin, P. R. China). Her re-
search interests include computer networks, wireless
communications, distributed algorithms and network
security.
... Datacenter placement is another important problem in supporting cloud services in optical networks. Several approaches to address the datecenter network (DCN) placement problem were proposed [87,97]. In [87] the authors proposed a static disaster-aware DCN placement to avoid placing the DC in risky (i.e., disaster) zones. ...
... So, copies of each file must be stored in multiple data centers, in order to guarantee that the requested contents/files can be delivered to users, even in the event of a disaster affecting a DC. In [97] the authors studied the DCN placement problem, adapting anycast principle which has been demonstrated to enhance the service availability by efficient content replication in ...
Book
Recent studies show that deliberate malicious attacks performed by high-power signals can put large amount of data under risk. We investigate the problem of survivable optical networks resource provisioning scheme against malicious attacks, more specifically crosstalk jamming attacks. These types of attacks may cause service disruption (or possibly service denial). We consider optical networks based on wavelength-division multiplexing (WDM) technology and two types of jamming attacks: in-band and out-of-band attacks. We propose an attack-aware routing and wavelength assignments (RWA) scheme to avoid or reduce the damaging effects of potential attacking signals on individual or multiple legitimate lightpaths traversing the same optical switches and links. An integer linear programs (ILPs) as well as heuristic approaches were proposed to solve the problem. We consider dynamic traffic where each demand is defined by its start time and a duration. Our results show that the proposed approaches were able to limit the vulnerability of lightpaths to jamming attacks. Recently, large-scale failures caused by natural disasters and/or deliberate attacks have left major parts of the networks damaged or disconnected. We also investigate the problem of disaster-aware WDM network resource provisioning in case of disasters. We propose an ILP and efficient heuristic to route the lightpaths in such a way that provides protection against disasters and minimize the network vi resources such as the number of wavelength links used in the network. Our models show that significant resource savings can be achieved while accommodating users demands. In the last few years, optical networks using Space Division Multiplexing (SDM) has been proposed as a solution to the speed bottleneck anticipated in data center (DC) networks. To our knowledge the new challenges of designing such communication systems have not been addressed yet. 
We propose an optimal approach to the problem of developing a path-protection scheme to handle communication requests in DC networks using elastic optical networking and space division multiplexing. We have formulated our problem as an ILP. We have also proposed a heuristic that can handle problems of practical size. Our simulations explore important features of our approach.
... The researchers [19] suggested a solution to improve the quality of service in terms of service availability against failures and cost efficiency by choosing the optimal location for the cloud data center and providing service in cases of failure in the data center networks by formulating a linear program to achieve the optimal design of the joint. It integrates a preconfigure protection cycle to provide failover protection in just one link. ...
Article
Full-text available
Abstract: Massive amounts of heterogeneous data are produced by Internet of Things (IoT) devices utilized in daily life and numerous fields, and these data streams need to be stored, processed, analyzed, and transmitted to the cloud. It usually suffers from missing values and anomalies; system services also suffer from congestion due to slow processors, resulting in low throughput, a high response time, slow decision-making, and data loss, resulting in low quality of service and the deterioration of the system's performance. In this study, propose to integrate the smart controller (SC) with the Message Queuing Telemetry Transport (MQTT) broker and services in the fog node to make decisions automatically to prevent congestion in the system's services and speed up the processing. The IoT stream is inspected in the services for anomalies using one-class support vector machines (OCSVM). Then, using the integrating technique of principal component analysis (PCA) and the k-nearest neighbors (KNN) algorithm in the SC, obtain the best prediction of the efficient number of services that must be deployed in the system. The operating model proposed showed significantly stable system performance in terms of throughput, latency, response time, the amount of data loss, and preventing congestion.
... The ILP formulation models an NP-hard problem, so it scales poorly as the problem instance increases. It is the same case with all mathematical modelling (INLP) (MILP) and (MINLP) [53][54][55]. A sub-optimal solution must be found instead of an exact solution to tackle this issue. ...
Article
Full-text available
Within Edge and Fog computing, edge and fog nodes must be optimally located at the network edge to minimise the network's overall latency. This survey addresses all aspects of these nodes' placement problems. Literature on edge and fog nodes' placement is collected from reputable databases (IEEE Xplore digital library, Scopus, ScienceDirect, and Web of Science) using a search query. Manual search using keywords and the snowball method is also used to get as many related papers as possible. According to defined inclusion criteria, retrieved documents are filtered to 64 articles for eight years (2015–2022). Depending on the optimisation method used, literature is classified into six categories. The first relies on Integer programming, accounting for 20.3% (13/64). The second category depends on heuristic and metaheuristic methods, accounting for 20.3% (13/64). The third category depends on hybrid methods between the two aforementioned categories accounting for 18.7% (12/64). Forth category depends on clustering methods, accounting for 11% (7/64). The fifth category depends on reinforcement learning, accounting for 6.3% (4/64). And the final category depends on the hybrid methods between two or more methods mentioned above, accounting for 23.4% (15/64). Papers have been analysed to get information like the optimisation problem, the method used for solving it, considered parameters, objectives, constraints, implementation tools, and evaluation methods.
... As the next generation power system, smart grid can realize on-line monitoring and real-time information control of important operation parameters in each link of power grid by using advanced digital information and communication technology. However, the access of communication facilities also makes smart grid face potential network attack risk [1][2][3][4][5][6][7][8][9][10][11][12]. In particular, the false data injection (FDI) attack can bypass the traditional bad data detection (BDD) mechanism in the power system. ...
Article
The concealed false data injection (FDI) attack in the smart grid can pass the residual-based bad data test commonly used in power system state estimation without being detected by existing algorithms, causing state estimation errors at the control center and disturbing the normal operation of the power system. In this paper, false data injection attacks and attack detection are carried out in an actual power edge security protection system, a microgrid control system, and an intelligent energy security protection system. The false data attack part simulates an attack on the voltage phase angle, compares it with the ordinary attack method, and demonstrates the success of these attacks in the actual power grid system.
Article
With the growth of data volumes and variety of Internet applications, data centers (DCs) have become an efficient and promising infrastructure for supporting data storage, and providing the platform for the deployment of diversified network services and applications (e.g., video streaming, cloud computing). These applications and services often impose multifarious resource demands (storage, compute power, bandwidth, latency) on the underlying infrastructure. Existing data center architectures lack the flexibility to effectively support these applications, which results in poor support of QoS, deployability, manageability, and defence against security attacks. Data center network virtualization is a promising solution to address these problems. Virtualized data centers are envisioned to provide better management flexibility, lower cost, scalability, better resources utilization, and energy efficiency. In this paper, we present a survey of the current state-of-the-art in data center networks virtualization, and provide a detailed comparison of the surveyed proposals. We discuss the key research challenges for future research and point out some potential directions for tackling the problems related to data center design.
Article
Coverage is an important issue in many wireless networks. In this paper, we address the problem of node placement for ensuring complete coverage in a long belt scenario and propose a novel placement approach to minimize the number of nodes needed. In our work, each node is assumed to cover a disk area centered at itself with a fixed radius, and a divide-and-cover node placement method is proposed. In the proposed method, a long belt is divided into sub-belts (if necessary), and a string of nodes is then placed parallel to the long side of each sub-belt to completely cover it. We then determine the optimal distance between two adjacent nodes in a string and the number of such strings to minimize the number of nodes for complete belt coverage. Theoretical proofs and analysis show that, compared with other node placements including the well-known regular triangular-lattice placement, the proposed method can achieve lower node density in some cases when the belt height is not very large. A combination of the proposed method and the triangular-lattice placement is then proposed, and the optimal ranges of the belt height over which each achieves the lowest node density are computed.
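As a rough illustration of the geometry behind a single string of nodes, the sketch below computes the largest node spacing that still covers the full belt height when all nodes sit on the belt's centre line. This is only the simplest one-string case under assumptions not taken from the abstract (nodes on the centre line, first node half a spacing from the belt end); it is not the paper's divide-and-cover optimization, and the function names are hypothetical.

```python
import math


def single_string_spacing(radius: float, belt_height: float) -> float:
    """Largest spacing between adjacent nodes on the centre line such that
    two neighbouring disks still meet at the belt's upper and lower edges.

    Two disks of radius r whose centres are d apart intersect at a distance
    sqrt(r^2 - (d/2)^2) from the centre line; requiring this to be at least
    h/2 gives d <= 2 * sqrt(r^2 - (h/2)^2).
    """
    half_height = belt_height / 2.0
    if half_height >= radius:
        # Assumption of this sketch: a taller belt needs more than one string.
        raise ValueError("a single string needs belt_height < 2 * radius")
    return 2.0 * math.sqrt(radius ** 2 - half_height ** 2)


def single_string_node_count(radius: float, belt_height: float,
                             belt_length: float) -> int:
    """Number of nodes in one string, with the first node placed half a
    spacing from the belt end so that the corners are also covered."""
    spacing = single_string_spacing(radius, belt_height)
    return math.ceil(belt_length / spacing)
```

For instance, with a hypothetical radius of 5 and belt height of 6, the spacing evaluates to 8, so a belt of length 100 would need 13 nodes in this one-string layout.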
Article
In IP-over-WDM networks, protection can be offered at the optical layer or at the IP layer. Today, it is well acknowledged that synergies need to be developed between the IP and optical layers in order to optimize the resource utilization and to reduce the costs and the energy consumption of future networks. In this paper, we study the design of logical survivable topologies for service recovery against multiple failures, including SRLG (shared risk link group) failures in IP-over-WDM networks. We propose a new optimization model, called surlog_cgilp, based on a column generation path formulation. It is highly scalable and allows the exact solution of several benchmark instances, which have only been solved with the help of heuristics so far. In the numerical experiments, we investigate the dimensioning of the physical links assuming IP restoration against multiple-link failures. We observe that the redundancy ratios (recovery over primary ratios for the bandwidth requirements) that are obtained are similar to the redundancy ratios reported for optical protection.
Article
The concept of p-cycle (preconfigured protection cycle) allows fast and efficient span protection in wavelength division multiplexing (WDM) mesh networks. To design p-cycles for a given network, conventional algorithms need to enumerate cycles in the network to form a candidate set, and then use an integer linear program (ILP) to find a set of p-cycles from the candidate set. Because the size of the candidate set increases exponentially with the network size, candidate cycle enumeration introduces a huge number of ILP variables and slows down the optimization process. In this paper, we focus on p-cycle design without candidate cycle enumeration. Three ILPs for solving the problem of spare capacity placement (SCP) are first formulated. They are based on recursion, flow conservation, and cycle exclusion, respectively. We show that the number of ILP variables/constraints in our cycle exclusion approach only increases linearly with the network size. Then, based on cycle exclusion, we formulate an ILP for solving the joint capacity placement (JCP) problem. Numerical results show that our ILPs are very efficient in generating p-cycle solutions.
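To make the span-protection idea concrete, the sketch below classifies the spans of a small network against one unit-capacity p-cycle. It relies on the standard p-cycle property that an on-cycle span gets one protection path (the rest of the cycle) and a straddling span gets two (both arcs of the cycle). It is not any of the ILPs described above, and the function name and data layout are assumptions made for illustration.

```python
def protection_per_span(cycle, spans):
    """Return, for each span, how many protection paths one unit p-cycle offers.

    cycle: list of nodes in visiting order (the cycle closes back to cycle[0]).
    spans: iterable of (u, v) node pairs, treated as undirected.
    Result values: 1 = on-cycle span, 2 = straddling span, 0 = not protected.
    """
    # Undirected set of spans that lie on the cycle itself.
    on_cycle = {
        frozenset((cycle[i], cycle[(i + 1) % len(cycle)]))
        for i in range(len(cycle))
    }
    cycle_nodes = set(cycle)

    protection = {}
    for u, v in spans:
        if frozenset((u, v)) in on_cycle:
            protection[(u, v)] = 1  # the remaining part of the cycle
        elif u in cycle_nodes and v in cycle_nodes:
            protection[(u, v)] = 2  # both arcs of the cycle are usable
        else:
            protection[(u, v)] = 0  # at least one end node is off the cycle
    return protection
```

With a hypothetical cycle A-B-C-D, the span (A, B) is on-cycle, the chord (A, C) is straddling, and a span to an off-cycle node E is not protected by this cycle.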
Conference Paper
The node placement problem in ShuffleNets is a combinatorial optimization problem. An efficient node placement algorithm called the gradient algorithm is proposed. A communication cost function between a node pair is defined and the gradient algorithm places the node pairs one by one based on the gradient of the cost function. Then two lower bounds on the traffic weighted mean internodal distance h̄ are proposed. The performance of the gradient algorithm is compared to the lower bounds as well as some algorithms in the literature. Significant reduction of h̄ is obtained with the use of the gradient algorithm, especially for highly skewed traffic distributions. For a ShuffleNet with N=64 nodes, the h̄ found is only 22% above the lower bound for the uniform random traffic distribution, and 14.7% for a highly skewed traffic distribution with skew factor γ=100.
Article
Survivability against disasters, both natural and deliberate attacks, that span large geographical areas is becoming a major challenge in communication networks. Cloud services delivered by datacenter networks yield new opportunities to provide protection against disasters. Cloud services require a network substrate with high capacity, low latency, high availability, and low cost, which can be delivered by optical networks. In such networks, path protection against network failures is generally ensured by providing a backup path to the same destination (i.e., a datacenter), which is link-disjoint to the primary path. This protection fails against a disaster whose area disrupts both the primary and backup paths. Also, protection against destination (datacenter) node failure is not ensured by a generic protection scheme. Moreover, content/service protection is a fundamental problem in a datacenter network, as the failure of a datacenter should not cause the disappearance of a specific content/service from the network. So content placement, routing, and protection of paths and content should be addressed together. In this work, we propose an integrated Integer Linear Program (ILP) to design an optical datacenter network, which solves the above-mentioned problems simultaneously. We show that our disaster protection scheme exploiting anycasting provides more protection, but uses less capacity than dedicated single-link failure protection. We show that a reasonable number of datacenters and selective content replicas with intelligent network design can provide survivability to disasters while supporting user demands. We also propose ILP relaxations and heuristics to solve the problem for large networks.
Article
A monitoring cycle (m-cycle) is a preconfigured optical loop-back connection of supervisory wavelengths with a dedicated monitor. In an all-optical network (AON), if a link fails, the supervisory optical signals in the set of m-cycles covering this link will be disrupted. The link failure can be localized using the alarm code generated by the corresponding monitors. In this paper, we first formulate an optimal integer linear program (ILP) for m-cycle design. The objective is to minimize the monitoring cost, which consists of the monitor cost and the bandwidth cost (i.e., supervisory wavelength-links). To reduce the ILP running time, a heuristic ILP is also formulated. To the best of our knowledge, this is the first effort in m-cycle design using ILP, and it leads to two contributions: 1) nonsimple m-cycles are considered; and 2) an efficient tradeoff is allowed between the monitor cost and the bandwidth cost. Numerical results show that our ILP-based approach outperforms the existing m-cycle design algorithms with a significant performance gain.
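The alarm-code idea can be sketched in a few lines: each link's code has one bit per m-cycle, set when that cycle traverses the link, and a single link failure can be localized only if every link has a distinct, non-zero code. The sketch below assumes the m-cycles are already given as lists of links; it does not perform the ILP-based design, and the function names are hypothetical.

```python
def alarm_codes(m_cycles, links):
    """Map each link to its alarm code.

    m_cycles: list of cycles, each a list of (u, v) links it traverses.
    links: list of (u, v) links to monitor, treated as undirected.
    Bit i of a link's code is 1 if m-cycle i covers that link, else 0.
    """
    cycle_sets = [{frozenset(link) for link in cycle} for cycle in m_cycles]
    return {
        link: tuple(1 if frozenset(link) in covered else 0
                    for covered in cycle_sets)
        for link in links
    }


def localizes_every_link(codes):
    """True if every link raises at least one alarm and no two links share
    the same alarm code."""
    values = list(codes.values())
    return all(any(code) for code in values) and len(set(values)) == len(values)
```

As a usage sketch, take a four-node ring 1-2-3-4 with chord 1-3, monitored by the two triangles and the outer ring. Links (1, 2) and (2, 3) are then traversed by exactly the same cycles and receive the same code, so this cycle set cannot tell them apart; it does distinguish (1, 2), (1, 3), and (3, 4) from one another.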
Article
Wireless networks in an open environment are exposed to various large region threats, e.g., natural disasters and malicious attacks. Available works on region failures generally adopt "deterministic" failure models, which fail to reflect some key features of a real region failure. In this paper, we provide a more general "probabilistic" region failure model to capture the key features of a region failure and apply it to the reliability assessment of wireless mesh networks. To facilitate such an assessment, we develop a grid-partition-based scheme to estimate the expected flow capacity degradation from a random region failure. We then establish a theoretical framework to determine a suitable grid partition such that a specified estimation error requirement is satisfied. The grid partition technique is also useful for identifying the vulnerable zones of a network, which can guide network designers to initiate proper network protection against such failures. This paper helps us understand the network reliability under a region failure and facilitates the design and maintenance of future highly survivable wireless networks. Index Terms: Network reliability, region failure, wireless mesh networks.