Bandwidth on Demand for Inter-Data Center
Communication
Ajay Mahimkar, Angela Chiu, Robert Doverspike, Mark D. Feuer, Peter Magill,
Emmanuil Mavrogiorgis, Jorge Pastor, Sheryl L. Woodward, Jennifer Yates
AT&T Labs Research
{mahimkar,chiu,rdd,mdfeuer,pete,emaurog,jorel,sheri,jyates}@research.att.com
ABSTRACT
Cloud service providers use replication across geographically distributed data centers to improve end-to-end performance as well as to offer high reliability under failures. Content replication often involves the transfer of huge data sets over the wide area network and demands high backbone transport capacity. In this paper, we discuss how a Globally Reconfigurable Intelligent Photonic Network (GRIPhoN) between data centers could improve operational flexibility for cloud service providers. The proposed GRIPhoN architecture is an extension of earlier work [34] and can provide a bandwidth-on-demand service ranging from low data rates (e.g., 1 Gbps) to high data rates (e.g., 10-40 Gbps). The inter-data center communication network, which is currently statically provisioned, could be dynamically configured based on demand. Today's backbone optical networks can take several weeks to provision a customer's private line connection. GRIPhoN would enable cloud operators to dynamically set up and tear down their connections (at sub-wavelength or wavelength rates) within a few minutes. GRIPhoN also offers cost-effective restoration capabilities at wavelength rates and automated bridge-and-roll of private line connections to minimize the impact of planned maintenance activities.
Categories and Subject Descriptors
C.2.1 [Computer-Communication Networks]: Network Architecture and Design - Network communications
General Terms
Design, Performance, Reliability
Keywords
Inter-data center communication, ROADM, OTN
The views expressed are those of the authors and do not reflect the official
policy or position of the Department of Defense or the U.S. Government
and are classified under distribution statement A (Approved for Public
Release, Distribution Unlimited). Permission to make digital or hard copies
of all or part of this work for personal or classroom use is granted without
fee provided that copies are not made or distributed for profit or commercial
advantage and that copies bear this notice and the full citation on the first
page. To copy otherwise, to republish, to post on servers or to redistribute
to lists, requires prior specific permission and/or a fee.
HotNets '11, November 14-15, 2011, Cambridge, MA, USA.
Copyright 2011 ACM 978-1-4503-1059-8/11/11 ...$10.00.
1. INTRODUCTION
In the past few years, we have seen the rapid growth of cloud service offerings from companies such as Amazon [4], IBM [21], Yahoo [32], Apple [2], Microsoft [5], Google [15] and Facebook [10]. These cloud service providers (CSPs) use multiple geographically distributed data centers to improve end-to-end performance as well as to offer high availability under failures. Massive amounts of content are being collected by the data centers, and the CSPs often replicate this content on a regular basis across multiple data centers. Inter-data center replication and redundancy impose high bandwidth requirements on the inter-data center wide area network.
Traditionally, a CSP leases or owns a dedicated line between its data centers. Greenberg et al. [16] report that wide area transport is expensive and costs more than the internal network of a data center. This is also why some CSPs do not operate multiple geographically distributed data centers [20]. The peak traffic volumes between data centers are dominated by background, non-interactive, bulk data transfers (as also observed by Chen et al. [6]). The CSP runs backup and replication applications to transfer bulk data between its data centers. The scale of this data can range from several terabytes (e.g., emerging scientific and industrial applications) to petabytes (e.g., Google's Distributed Peta-Scale Data Transfer [36]). A recent survey conducted by Forrester, Inc. [14] further highlights that a majority of CSPs perform bulk data transfers among three or more data centers, and projects that inter-data-center transport requirements will double or triple in the next two to four years.
There is a great deal of research literature on achieving full bisection bandwidth within a data center with improved network performance (e.g., VL2 [17], DCell [19], BCube [18], MDCube [31], PortLand [25], c-Through [29], Helios [11], Proteus [28]). However, only a few recent studies address inter-data center bulk transfers [1, 6, 8, 22, 23]. Chen et al. [6] characterize inter-data center traffic using Yahoo! datasets. NetStitcher [22] takes the interesting approach of stitching together unutilized bandwidth across different data centers by using multi-path and multi-hop store-and-forward scheduling, effectively achieving inter-data center bulk transfers with existing capacity.
Our Approach. In this paper, we take a completely different approach to achieving dynamic inter-data center communication. We propose GRIPhoN - a Globally Reconfigurable Intelligent Photonic Network that would offer a Bandwidth on Demand (BoD) service in the core network for efficient inter-data center communication. We believe we are the first to address the inter-data center capacity issue from the carrier's perspective. The motivation behind BoD comes from the variability in traffic demands for communication across data centers. Non-interactive bulk data transfers between data centers are typically performed by the cloud operators and have different patterns than interactive, end-user-driven traffic. This gives us the opportunity to explore the use of different data rates at different times - for example, a high data rate (10-40 Gbps) between data centers for non-interactive data transfers and a low rate (1-10 Gbps) for supporting interactive sessions. GRIPhoN provides a platform for offering such dynamic connectivity. The inter-data center communication network, which was previously statically provisioned, can now be viewed as adjustable. GRIPhoN offers flexibility to the CSP in dynamically adjusting the bandwidth between its geographically distributed data centers based on demand. The carrier also benefits from the intelligent re-use of the pool of resources across multiple customers.
BoD Service Vision and Today’s Reality. We now outline
the dynamic service vision of GRIPhoN and compare it to
today’s reality.
1. Dynamic configurable-rate services. The vision behind GRIPhoN is to offer dynamic multi-rate services for communication between geographically distributed data centers. Having a choice among multiple data rates offers flexibility to the CSPs in dynamically selecting the right bandwidth based on demand. Today, carriers offer BoD private-line services only in limited architectures and usually at rates of 622 Mbps or less.
2. Rapid establishment of new connections. Dynamic bandwidth adjustments require rapid connection provisioning. This is achievable today at low data rates by re-configuring electronic circuit switches [9]. However, provisioning times for connections which require a full wavelength in the backbone are orders of magnitude slower than needed. This is primarily because there has been no call for faster times, and hence neither the Element Management Systems (EMS) nor the optical hardware is optimized for speed.
3. Reduced outage time. Following any network failure, it is important to quickly restore the service. For low-data-rate services, restoration times are on the order of milliseconds. However, no restoration is usually available today for full-wavelength services. There are two alternatives for private-line customers: either buy expensive 1+1 protection, where traffic is re-routed to a backup if the primary connection fails, or wait for the carrier to manually restore connections, which means long outage times (typically 4 to 12 hours).
4. Minimal impact during maintenance. Maintenance is a significant aspect of managing and operating large networks. Carriers would like to ensure minimal or no impact of maintenance on performance. Since wavelength connection management is handled manually today, there is a non-negligible impact on service.
GRIPhoN Contributions. GRIPhoN aims at bridging the gap between the dynamic service vision and today's reality, as shown in Table 1. By offering dynamic configurable-rate services, GRIPhoN enables CSPs to actively adjust their inter-data center connections. Such a BoD service is not new to large carriers, at least for lower data rates; lower-data-rate services are already offered, for example the Optical Mesh Service (OMS) [9, 26, 27]. GRIPhoN scales these concepts to very high data rates and offers the first BoD service demonstration that can select data rates from sub-wavelength connections (e.g., 1 Gbps) to full wavelength connections (e.g., 10-40 Gbps). The sub-wavelength connections are provided by OTN (Optical Transport Network) switches in the network's OTN layer. Full wavelength connections are established in the photonic layer by using colorless and non-directional reconfigurable optical add/drop multiplexers (ROADMs). A CSP leases dedicated optical access to the GRIPhoN core network at multiple data center locations and dynamically sets up optical connections between them. GRIPhoN enables dynamic and rapid connection management capabilities with the automated control of fiber cross-connects (FXC) to route signals to either the photonic or OTN layer. This enables a CSP to utilize wavelength and/or sub-wavelength resources.

GRIPhoN also offers cost-effective restoration capabilities at wavelength rates via automatic fault identification and dynamic re-establishment of connections. This reinstates customer connections far faster than repair of the underlying fault. Though not as fast as 1+1 protection, this approach is also far less expensive. Finally, by using automated bridge-and-roll [34] of private line connections, GRIPhoN minimizes the impact of planned maintenance.
Comparison to prior work on dynamic optical networks. In contrast to CANARIE [3], CHEETAH [35], DRAGON [24], DWDM-RAM [13] and Lambda Grid [33], which are initiatives of research and education networks that serve universities and national laboratories, GRIPhoN is intended for the backbone network of a major carrier. Providing dynamic wavelength services on an inter-city commercial network presents challenges not only in the eventual scale that must be managed, but also in the transition phase from today's static network. Efficient network implementation across multiple layers and multiple customers, cost-effective service restoration, and conformance with commercial operational practices have received less attention in the research and education initiatives, whereas these issues are the primary focus in GRIPhoN.
2. BANDWIDTH ON DEMAND SERVICE
In this section, we first present a simplified view of the services and network layers offered by the carrier. We then describe the design of the BoD service offered by GRIPhoN that can be utilized by cloud service providers to dynamically adjust the bandwidth available between their data centers.
BoD service vision | Today's reality | GRIPhoN proposal
Dynamic configurable-rate | Maximum rate well below full wavelength rate | Rate configurable over wide range; integrated services using OTN, FXC and wavelength switching
Rapid establishment of new connections | Takes several weeks for highest data rates | Automated Fiber Cross-connect (FXC) and ROADMs enable full wavelength connections in minutes
Reduced outage times | None (unless 1+1) for full wavelength rates | Automated outage detection and dynamic re-provisioning of impacted connections
Minimal impact during maintenance | Non-negligible impact on service | Automated bridge-and-roll [34]

Table 1: Bandwidth on Demand (BoD) service vision, today's reality and GRIPhoN proposal.
Figure 1: Carrier's view of current services & network layers.
2.1 Carrier's view of services & network layers
Fig. 1 provides a simplified representation of how today's technology layers are interrelated and how the service categories map to them. Most large carriers' current core transport networks consist of a Wideband Digital Cross-connect System (W-DCS) Layer, SONET Layer, DWDM Layer, and Fiber Layer.
Consider the network layers from the bottom up. At the very base is the fiber-optic layer. This layer consists of fiber-optic cables connecting the various nodes in the network. Laying these cables between cities is a huge capital investment, and hence this layer is very static. Built upon this fiber base is the transport layer. Dense wavelength-division multiplexing (DWDM) is utilized in the core network because of its huge capacity compared with all other technologies. A modern DWDM system utilizes anywhere from 40 to 100 wavelengths, each carrying signals at rates ranging from 10 to 100 Gbps. Sub-wavelength channels at 2.5 Gbps or 10 Gbps can be provided via muxponders. These wavelength connections are bidirectional and multiplexed together onto a fiber pair. Hence the transport layer is known as the DWDM layer. DWDM systems were initially point-to-point systems, with all traffic terminating at the two end nodes. If some connections were destined to travel further down the line, they would be electronically regenerated before transmission on the next leg of their path. In recent years, ROADM technologies for DWDM transport networks have been deployed due to their capital and operational savings. A ROADM network typically includes a set of multi-degree ROADM nodes connected via fibers to form a mesh topology. Traffic may be added or dropped, regenerated, or expressed through at each ROADM. Optical transponders (OTs) are connected to the ports of the ROADM to transmit and receive line-side optical signals and convert them to standard client-side optical signals. Optical-to-Electrical-to-Optical (OEO) regeneration is needed when the distance between terminating nodes exceeds the limit for adequate signal quality, known as the optical reach. When that happens, optical regenerators (REGENs) are used at one or more intermediate nodes. ROADMs are now being deployed with add/drop ports which are both "colorless" (so that any OT can be tuned to provide a signal at any wavelength) and "non-directional" (any OT's signal can be used on any of the ROADM's inter-node fiber pairs; this is also referred to as "steerable" or "directionless").
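To make the colorless and non-directional properties concrete, the sketch below models a multi-degree ROADM node in Python. It is an illustrative abstraction under our own naming (RoadmNode, Transponder and the example wavelength label are not from this paper or any vendor API): any add/drop transponder can be tuned to any wavelength and steered onto any of the node's inter-node fiber pairs.

```python
# Minimal sketch of a multi-degree ROADM with colorless, non-directional
# add/drop ports. Class and attribute names are illustrative assumptions.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Transponder:
    name: str
    wavelength: Optional[str] = None   # colorless: tunable to any channel
    degree: Optional[str] = None       # non-directional: steerable to any degree

@dataclass
class RoadmNode:
    name: str
    degrees: List[str]                                  # inter-node fiber pairs
    add_drop: List[Transponder] = field(default_factory=list)

    def provision(self, ot: Transponder, wavelength: str, degree: str) -> None:
        """Tune an add/drop transponder to a wavelength and steer it to a degree."""
        if degree not in self.degrees:
            raise ValueError(f"{self.name} has no degree {degree}")
        ot.wavelength, ot.degree = wavelength, degree
        self.add_drop.append(ot)

# Example: a 3-degree ROADM steering one OT onto the fiber pair toward node II.
node_i = RoadmNode("ROADM-I", degrees=["to-II", "to-III", "to-IV"])
node_i.provision(Transponder("OT-1"), wavelength="ch-34", degree="to-II")
print(node_i.add_drop[0])
```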
The SONET (Synchronous Optical Network) layer rides on top of the DWDM layer, with Broadband DCSs that cross-connect at the STS-1 rate as its most common network element. The Add-Drop Multiplexer (ADM) is a special case of a DCS with 2 degrees that is used to form SONET rings. The SONET layer provides connections at rates from STS-1 (52 Mbps) to OC-192 (10 Gbps). It carries both TDM and data traffic and provides an automatic protection/restoration mechanism to switch traffic from working circuits to backup circuits in less than a second. The Wideband Digital Cross-connect System (W-DCS) layer is above the SONET layer and consists of DCS-3/1s and other DCSs that cross-connect at rates greater than DS0 but below DS3. It provides n x DS1 (1.5 Mbps) TDM connections. Ethernet Virtual Circuits (EVCs) provide virtual links with guaranteed bandwidth. Ethernet private lines are links between customer routers or Ethernet switches, usually consisting of Gigabit Ethernet interfaces at the customer ends, encapsulated and rate-limited into pipes consisting of virtually concatenated SONET STS-1s. Circuit-based BoD services use virtual concatenation of channels fed from a dedicated access or metro pipe to the customer. With current services and network layers, the carrier offers BoD only at the SONET layer, not at the DWDM layer. With the GRIPhoN vision using future services & network layers, BoD at high data rates would be offered at the OTN layer as well as the DWDM layer.
Figure 2: Carrier's view of future services & network layers.

Fig. 2 provides a view of such future services and network layers from the carrier's perspective. One of the key assumptions of this service evolution model is that the transport of Guaranteed Bandwidth connections can be categorized by bandwidth: below 1 Gbps is transported via the IP layer as EVCs; 1 Gbps up to the core wavelength rate is transported by the sub-wavelength layer as Ethernet Private Lines, most likely encapsulated into concatenated TDM pipes; high-rate private-line services (TDM connections at wavelength rate) are carried directly over the DWDM (Dense Wavelength Division Multiplexing) layer. The OTN layer is introduced as the sub-wavelength layer that provides higher switching capacity and better scalability than today's SONET/Broadband DCS layer. The OTN switches cross-connect at the ODU0 rate (1.25 Gbps) and can support both TDM and Ethernet packet-based client signals. Using ITU-standardized digitally framed signals with digital overhead, the OTN layer supports connection management as well as Forward Error Correction for enhanced system performance. Compared to using muxponders in the DWDM layer to provide sub-wavelength connections, the OTN layer with its switching capability can achieve more efficient packing of wavelengths in the transport network. Moreover, it can provide automatic sub-second shared-mesh restoration similar to today's SONET layer.
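As a rough illustration of this rate-based layering, the sketch below maps a requested guaranteed-bandwidth rate to the layer that would carry it. The 40 Gbps wavelength rate and the rounding of sub-wavelength requests up to whole ODU0 slots are our own illustrative assumptions based on the figures quoted in the text, not a carrier provisioning rule.

```python
# Illustrative mapping of a guaranteed-bandwidth request to a carrying layer,
# following the categorization described in the text. Thresholds are assumed.
import math

WAVELENGTH_RATE_GBPS = 40.0   # core wavelength rate assumed in this sketch
ODU0_GBPS = 1.25              # OTN switching granularity (ODU0)

def select_layer(rate_gbps: float) -> str:
    if rate_gbps < 1.0:
        return "IP layer (Ethernet Virtual Circuit)"
    if rate_gbps < WAVELENGTH_RATE_GBPS:
        slots = math.ceil(rate_gbps / ODU0_GBPS)
        return f"OTN sub-wavelength layer ({slots} x ODU0)"
    return "DWDM layer (full-wavelength private line)"

for rate in (0.5, 1.0, 10.0, 40.0):
    print(f"{rate:5.1f} Gbps -> {select_layer(rate)}")
```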
2.2 GRIPhoN Design
Fig. 3 shows an overview of the GRIPhoN target service architecture that enables BoD service for dynamic inter-data center communication. The data center premises connect to the carrier's network via a fixed, dedicated access pipe. In order to allow for better grooming of the provided bandwidth, we partition the carrier's network into two separate layers: (i) the Optical Transport Network (OTN) layer, which provides low data rate connections (e.g., 1 Gbps), and (ii) the Dense Wavelength Division Multiplexing (DWDM) layer, which provides high data rate connections (e.g., 40 Gbps). This allows a CSP to adjust the bandwidth according to its exact needs. For example, it can use lower-speed circuits to augment a high-speed circuit, combining two 1G OTN circuits and one 10G DWDM wavelength to achieve a total bandwidth of 12G instead of consuming a second 10G wavelength.
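The 12G example above can be read as a simple grooming calculation; the sketch below reproduces it under the same circuit rates used in the example (10G DWDM wavelengths, 1G OTN circuits), rounding only the sub-wavelength remainder up rather than the whole demand.

```python
# Sketch of the grooming example from the text: cover a demand with full
# wavelengths plus 1G OTN circuits for the remainder, instead of rounding the
# whole demand up to the next 10G wavelength. Rates follow the text's example.
WAVELENGTH_GBPS = 10
OTN_CIRCUIT_GBPS = 1

def decompose(demand_gbps: int) -> dict:
    wavelengths = demand_gbps // WAVELENGTH_GBPS
    remainder = demand_gbps - wavelengths * WAVELENGTH_GBPS
    otn_circuits = -(-remainder // OTN_CIRCUIT_GBPS)   # ceiling division
    return {"10G DWDM wavelengths": wavelengths, "1G OTN circuits": otn_circuits}

# A 12 Gbps demand: one 10G wavelength plus two 1G OTN circuits.
print(decompose(12))   # {'10G DWDM wavelengths': 1, '1G OTN circuits': 2}
```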
Figure 3: BoD for inter-data center communication using GRIPhoN.

Reconfigurable Fiber Cross-Connect (FXC). In order to efficiently provide BoD services at wavelength rates, it is necessary to have a switch on the client side of the OT [12, 30]. A client-side switch allows for dynamic sharing of transponders, which is useful in keeping costs low. While this switch could be electronic, the low cost, small footprint, and low power consumption of a fiber cross-connect (FXC) make it an attractive technology. Unfortunately, an FXC is incapable of grooming traffic. Therefore, to provide BoD services at rates below the data rate of a single wavelength, electronic switching is necessary. This is provided by the OTN switch, a part of the OTN layer of the GRIPhoN network. This layer rides on top of the DWDM layer. When a connection is requested, the FXC, under the control of the GRIPhoN controller, directs the signal either to an OT, to be carried directly on the DWDM layer, or to a port on the OTN switch, where it can be combined with other OTN signals before transmission over the DWDM layer.
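A minimal, self-contained sketch of this routing decision is shown below. The FXC model, port names and the 10 Gbps wavelength-rate threshold are our own assumptions for illustration; they are not a vendor API or the GRIPhoN controller's actual interface.

```python
# Sketch of the FXC decision: cross-connect a client signal either to an
# optical transponder port (carried directly on the DWDM layer) or to an OTN
# switch port (groomed with other signals first). Names/rates are assumptions.
class FiberCrossConnect:
    def __init__(self, ot_ports, otn_ports):
        self.ot_ports = list(ot_ports)       # free transponder-facing ports
        self.otn_ports = list(otn_ports)     # free OTN-switch-facing ports
        self.connections = {}                # client port -> network-side port

    def cross_connect(self, client_port, rate_gbps, wavelength_rate_gbps=10.0):
        pool = self.ot_ports if rate_gbps >= wavelength_rate_gbps else self.otn_ports
        if not pool:
            raise RuntimeError("no free port for the requested service")
        network_port = pool.pop(0)
        self.connections[client_port] = network_port
        return network_port

fxc = FiberCrossConnect(ot_ports=["OT-1", "OT-2"], otn_ports=["OTN-1", "OTN-2"])
print(fxc.cross_connect("client-A", rate_gbps=10))   # -> OT-1 (full wavelength)
print(fxc.cross_connect("client-B", rate_gbps=1))    # -> OTN-1 (sub-wavelength)
```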
GRIPhoN controller. Connection establishment and release based on requests from the CSP are handled by the GRIPhoN controller. The GRIPhoN controller communicates with the network elements via the appropriate vendor-supplied EMS. The controller is responsible for keeping track of the available network resources in its database; communicating with the network elements (FXC controllers, OTN switch EMS, ROADM EMS and NTE controllers) in order to create or tear down the connections ordered by the CSPs; capacity and resource management; inventory database management; and failure detection, localization and automated restoration. To minimize service interruption during network re-configurations due to restoration, the GRIPhoN controller executes a bridge-and-roll operation [7, 34] that first creates a full new wavelength path (the "bridge") while the original connection is still in use and then quickly "rolls" the traffic onto the new path when ready. The bridge-and-roll results in an almost hitless movement of traffic prior to scheduled maintenance or during reversion following a failure restoration (moving traffic from backup paths to repaired primaries). One constraint of the bridge-and-roll operation, however, is that the new wavelength path has to be resource-disjoint from the old path.
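The sketch below captures the bridge-and-roll sequence in a few lines of Python. The controller class and its bookkeeping are hypothetical stand-ins (the real controller drives FXCs, ROADMs and OTN switches through their EMSs); the point is the ordering of steps and the resource-disjointness check described above.

```python
# Sketch of bridge-and-roll: fully provision a new, resource-disjoint path (the
# "bridge"), switch traffic onto it (the "roll"), then release the old path.
# The controller below only records state; names are illustrative assumptions.
class BridgeAndRollController:
    def __init__(self):
        self.paths = {}                      # connection id -> list of links

    def provision(self, conn_id, path):
        self.paths[conn_id] = list(path)

    def bridge_and_roll(self, conn_id, new_path):
        old_path = self.paths[conn_id]
        # Constraint from the text: the new path must not share resources
        # (links) with the path it replaces.
        if set(old_path) & set(new_path):
            raise ValueError("new path must be resource-disjoint from old path")
        self.provision(conn_id + "#bridge", new_path)   # bridge: build new path
        self.paths[conn_id] = new_path                  # roll: move the traffic
        del self.paths[conn_id + "#bridge"]             # bridge becomes primary
        return old_path                                 # old resources to release

ctrl = BridgeAndRollController()
ctrl.provision("DC1-DC2", ["I-II", "II-III"])
released = ctrl.bridge_and_roll("DC1-DC2", ["I-IV", "IV-III"])
print(ctrl.paths["DC1-DC2"], "released:", released)
```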
Customer Graphical User Interface (GUI). Each customer has a graphical user interface to GRIPhoN to visualize and manage its connections. The customer only sees the channelized or un-channelized interfaces (for sub-wavelength or wavelength connections, respectively) of the NTE on its premises. The GUI provides capabilities for connection management (setting up or tearing down connections on demand) and simple fault management from the customer viewpoint, such as showing the status of connections affected by outages, localizing the fault, and indicating when restoration is performed. The complexity of the GRIPhoN network (access pipes, carrier equipment, network layers, GRIPhoN controller) is hidden from the customer.

Figure 4: GRIPhoN Testbed.
3. TESTBED
In this section, we describe our laboratory prototype implementation of GRIPhoN and present preliminary results on wavelength connection management. Fig. 4 shows our GRIPhoN testbed with three customer premises and the core GRIPhoN network with DWDM and OTN layers. The DWDM layer consists of Reconfigurable Optical Add/Drop Multiplexers (ROADMs) to provide wavelength switching (currently at 10 Gbps, with plans to go to 40 Gbps). In our prototype, we use two 3-degree ROADMs and two 2-degree ROADMs. Wavelength-tunable optical transponders (OTs) are installed at the ROADM add/drop ports and are used to set up end-to-end wavelength connections. Client-side FXCs allow for dynamic sharing of OTs and REGENs. The OTN layer is in the process of installation. Each of the three customer premises sites that could host a data center facility includes servers, Ethernet switches, low-speed multiplexers (1 Gbps/10 Gbps), and a 10 Gbps/40 Gbps Muxponder (10/40 MXP). The servers provide video-on-demand (VoD) content across multiple facilities. The 1/10 Gbps multiplexer aggregates traffic from multiple Ethernet switches and transmits it over a high-speed (10 Gbps) channelized line. The 10/40 Gbps Muxponder emulates Network Terminating Equipment (NTE) and has four 10 Gbps ports on the client side and a 40 Gbps transmission rate on the line side (towards the carrier). The line side is the "fat pipe" shown in Fig. 3, and it emulates a metro network which brings customer traffic to the core network. Central Office terminals (COTs) would receive the customer data; in our prototype this is emulated by another 10/40 MXP.
Path length (hops)                      | 1 (I-IV) | 2 (I-III-IV) | 3 (I-II-III-IV)
Connection establishment time (seconds) | 62.48    | 65.67        | 70.94

Table 2: Dependence of wavelength connection establishment times on the path length in the ROADM layer.
We have constructed a customer GUI that has capabilities for dynamically setting up and tearing down connections at chosen rates. It shows four 10 Gbps ports at each customer premises. In this paper, we focus on DWDM layer experiments. The 10 Gbps connection is established from the customer to the Core PoP (Core Point-of-Presence) over the customer's fat pipe, controlled through the EMS of the 40 Gbps link. The wavelength connection that will be used to traverse the backbone network is set up between a pair of OTs installed at the source and destination ROADM nodes (in this case, in their respective core PoPs). The establishment of a wavelength connection takes 60 to 70 seconds. There are two contributions to this time: (i) ROADM Element Management System (EMS) configuration steps, and (ii) optical tasks, such as ROADM reconfiguration, laser tuning, power balancing and link equalization. The times associated with both components at present are not constrained by any fundamental limitations; rather, they represent a lack of current carrier requirements for speed. We are now working with equipment suppliers to further understand the setup times and ways to reduce them. The 60-70 seconds for wavelength connection establishment is orders of magnitude better than today's provisioning time in the DWDM layer. This is primarily achievable due to the automated reconfiguration of fiber cross-connects and ROADMs by the GRIPhoN controller. Tearing down a wavelength connection takes around 10 seconds. We also performed a preliminary analysis of the dependence of the connection provisioning times on the path lengths (number of hops) in the ROADM (or DWDM) layer. Table 2 summarizes the results over ten iterations. As the path length increases, the connection provisioning time increases.
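As a back-of-the-envelope reading of Table 2, the sketch below fits a line to the three measured averages; the roughly 4 s per additional ROADM hop and roughly 58 s fixed overhead are our own illustrative fit, not an analysis reported in the paper.

```python
# Least-squares line through the Table 2 averages (measured values from the
# text); the per-hop / fixed-overhead split is only an illustration.
hops = [1, 2, 3]
times = [62.48, 65.67, 70.94]        # wavelength setup time in seconds

n = len(hops)
mean_h, mean_t = sum(hops) / n, sum(times) / n
slope = sum((h - mean_h) * (t - mean_t) for h, t in zip(hops, times)) \
        / sum((h - mean_h) ** 2 for h in hops)
intercept = mean_t - slope * mean_h
print(f"~{slope:.1f} s per additional hop, ~{intercept:.1f} s fixed overhead")
```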
4. RESEARCH CHALLENGES
The BoD services offered by GRIPhoN introduce an en-
tirely new set o f research and operational challenges. An
effective, integrated network design or restoration process
across IP, OTN and DWDM layers necessitates cross-layer
management. The dynamic services, the intelligent and au-
tonomous network, and the integration of multiple network
layers together present several challenges:
Network resource planning. Ensuring adequate network resources to support anticipated demand from the CSPs is made more difficult by the existence of dynamic services. In order to support rapid connection provisioning and faster restoration, the carrier must plan ahead for where and when to deploy spare resources (especially OTs). Obviously, it would be very expensive for the carrier to provision for all possible usage scenarios. Thus, the carrier needs to forecast demand and carefully manage the pool of GRIPhoN resources. The carrier should also ensure isolation of services across different CSPs. At first glance, this resource planning may seem similar to the planning performed in providing plain old telephony service (POTS), with resources (phone circuits) statistically shared by multiple users. However, in this network the number of users is smaller and the cost of a line is far greater, making accurate planning far more critical.
Network re-grooming. One attractive application of GRIPhoN that is tolerant of the connection times demonstrated in this work is network grooming. As the GRIPhoN network grows, additional routes between nodes will be added. This will make paths that were previously unavailable more appropriate for some connections than the originally established paths. The carrier may then want to re-provision the inter-data center communication network with better paths (reducing latency and/or off-loading the original paths). The process of re-provisioning connections to achieve an improved network configuration is called re-grooming. In order to perform re-grooming with minimal impact to the CSP, the GRIPhoN bridge-and-roll can be used to migrate the wavelength connections [34].
DWDM layer management. The connection establishment times we have demonstrated are far slower than any fundamental limitations of the DWDM layer. Reducing the connection establishment time will place additional requirements on both the physical hardware and the software control used in the DWDM layer. The optical transport system must be able to turn wavelengths on and off and route them appropriately without affecting other connections. This has implications for the entire DWDM layer, from how quickly a new wavelength is turned on to the power-transient tolerance of the optical line (including both amplifiers and receivers). The latter requirement is already being addressed by carriers requiring that a cable cut in one part of a mesh network not affect traffic in another part of the network. Achieving a DWDM layer with dramatically faster end-to-end connection times in a cost-effective manner requires that the entire system's dynamics be considered.
5. SUMMARY
In this paper, we presented the design of the Globally Reconfigurable Intelligent Photonic Network (GRIPhoN) between data centers, which can provide a BoD service ranging from low data rates (e.g., 1 Gbps) to wavelength rates (e.g., 40 Gbps). GRIPhoN provides flexibility to cloud service providers to dynamically set up and tear down wavelength connections between their geographically distributed data centers when performing tasks such as content replication or non-interactive bulk data transfers.
Acknowledgement
We thank Adel Saleh, the DARPA Program Manager of the CORONET Program, for his inception of the program and for his guidance. We appreciate the support of the DARPA CORONET Program, Contract N00173-08-C-2011, and the U.S. Army RDE Contracting Center, Adelphi Contracting Division, 2800 Powder Mill Rd., Adelphi, MD, under contract W911QX-10-C00094. We thank Amin Vahdat (our shepherd), Rakesh Sinha and the HotNets anonymous reviewers for their insightful feedback. We also thank Fujitsu and Ciena for their equipment and technical support.
6. REFERENCES
[1] S. Agarwal, J. Dunagan, N. Jain, S. Saroiu, A. Wolman, and H. Bhogan. Volley: Automated data placement for geo-distributed cloud services. In NSDI, 2010.
[2] Apple iCloud. http://www.apple.com/icloud/.
[3] B. S. Arnaud, J. Wu, and B. Kalali. Customer-controlled and -managed optical networks. Journal of Lightwave Technology, 2003.
[4] Amazon Simple Storage Service. aws.amazon.com/s3/.
[5] Windows Azure. http://www.microsoft.com/windowsazure/.
[6] Y. Chen, S. Jain, V. K. Adhikari, Z.-L. Zhang, and K. Xu. A first look at inter-data center traffic characteristics via Yahoo! datasets. In IEEE INFOCOM, 2011.
[7] A. L. Chiu, G. Choudhury, G. Clapp, R. Doverspike, J. W. Gannett, J. G. Klincewicz, G. Li, R. A. Skoog, J. Strand, A. von Lehmen, and D. Xu. Network design and architectures for highly dynamic next-generation IP-over-optical long distance networks. Journal of Lightwave Technology, 2009.
[8] M. Chowdhury, M. Zaharia, J. Ma, M. I. Jordan, and I. Stoica. Managing data transfers in computer clusters with Orchestra. In ACM SIGCOMM, 2011.
[9] R. Doverspike. Practical aspects of bandwidth-on-demand in optical networks. In Panel on Emerging Networks, Service Provider Summit, OFC, 2007.
[10] Facebook statistics. www.facebook.com/press/info.php?statistics.
[11] N. Farrington, G. Porter, S. Radhakrishnan, H. H. Bazzaz, V. Subramanya, Y. Fainman, G. Papen, and A. Vahdat. Helios: A hybrid electrical/optical switch architecture for modular data centers. In ACM SIGCOMM, 2010.
[12] M. D. Feuer, D. C. Kilper, and S. L. Woodward. ROADMs and their system applications. In Optical Fiber Telecommunications VB. Academic Press, New York, 2008.
[13] S. Figueira, S. Naiksata, H. Cohen, D. Cutrell, P. Daspit, D. Gutierrez, and D. B. Hoang. DWDM-RAM: Enabling Grid services with dynamic optical networks. In IEEE International Symposium on Cluster Computing and the Grid, 2004.
[14] Forrester Research. http://info.infineta.com/l/5622/2011-01-27/Y26.
[15] Google. http://www.google.com/corporate/datacenter/index.html.
[16] A. Greenberg, J. Hamilton, D. A. Maltz, and P. Patel. The cost of a cloud: Research problems in data center networks. ACM SIGCOMM CCR, 2009.
[17] A. Greenberg, J. R. Hamilton, N. Jain, S. Kandula, C. Kim, P. Lahiri, D. A. Maltz, P. Patel, and S. Sengupta. VL2: A scalable and flexible data center network. In ACM SIGCOMM, 2009.
[18] C. Guo, G. Lu, D. Li, H. Wu, X. Zhang, Y. Shi, C. Tian, Y. Zhang, and S. Lu. BCube: A high performance, server-centric network architecture for modular data centers. In ACM SIGCOMM, 2009.
[19] C. Guo, H. Wu, K. Tan, L. Shi, Y. Zhang, and S. Lu. DCell: A scalable and fault-tolerant network structure for data centers. In ACM SIGCOMM, 2008.
[20] Perspectives - James Hamilton's blog: Inter-datacenter replication & geo-redundancy. http://perspectives.mvdirona.com/2010/05/10/InterDatacenterReplicationGeoRedundancy.aspx.
[21] IBM SmartCloud. http://www.ibm.com/cloud-computing/us/en/.
[22] N. Laoutaris, M. Sirivianos, X. Yang, and P. Rodriguez. Inter-datacenter bulk transfers with NetStitcher. In ACM SIGCOMM, 2011.
[23] N. Laoutaris, G. Smaragdakis, P. Rodriguez, and R. Sundaram. Delay tolerant bulk data transfers on the Internet. In ACM SIGMETRICS, 2009.
[24] T. Lehman, J. Sobieski, and B. Jabbari. DRAGON: A framework for service provisioning in heterogeneous grid networks. IEEE Communications Magazine, 2006.
[25] R. Niranjan Mysore, A. Pamboris, N. Farrington, N. Huang, P. Miri, S. Radhakrishnan, V. Subramanya, and A. Vahdat. PortLand: A scalable fault-tolerant layer 2 data center network fabric. In ACM SIGCOMM, 2009.
[26] K. Oikonomou and R. Sinha. Network design and cost analysis of optical VPNs. In OFC, 2006.
[27] Optical Mesh Service (OMS). http://www.business.att.com/wholesale/Service/data-networking-wholesale/long-haul-access-wholesale/optical-mesh-service-wholesale/.
[28] A. Singla, A. Singh, K. Ramachandran, L. Xu, and Y. Zhang. Proteus: A topology malleable data center network. In ACM HotNets, 2010.
[29] G. Wang, D. G. Andersen, M. Kaminsky, M. Kozuch, T. S. E. Ng, K. Papagiannaki, and M. Ryan. c-Through: Part-time optics in data centers. In ACM SIGCOMM, 2010.
[30] S. L. Woodward, M. D. Feuer, J. L. Jackel, and A. Agarwal. Massively-scaleable highly-dynamic optical node design. In OFC/NFOEC, 2010.
[31] H. Wu, G. Lu, D. Li, C. Guo, and Y. Zhang. MDCube: A high performance network structure for modular data center interconnection. In ACM CoNEXT, 2009.
[32] Yahoo! http://www.yahoo.com/.
[33] O. Yu, A. Li, Y. Cao, L. Yin, M. Liao, and H. Xu. Multi-domain lambda grid data portal for collaborative grid applications. Future Generation Computer Systems, 2006.
[34] X. J. Zhang, M. Birk, A. Chiu, R. Doverspike, M. D. Feuer, P. Magill, E. Mavrogiorgis, J. Pastor, S. L. Woodward, and J. Yates. Bridge-and-roll demonstration in GRIPhoN (Globally Reconfigurable Intelligent Photonic Network). In OFC, 2010.
[35] X. Zheng, M. Veeraraghavan, N. S. V. Rao, Q. Wu, and M. Zhu. CHEETAH: Circuit-switched high-speed end-to-end transport architecture testbed. IEEE Communications Magazine, 2005.
[36] D. Ziegler. Distributed peta-scale data transfer. http://www.cs.huji.ac.il/~dhay/IND2011.html.