Dynamic Path Bandwidth Allocation for 1000×10-Scale Optical Layer-2 Switch Network based on Hierarchical Timeslot Allocation Algorithm and Timeslot Converter
Kyota HATTORI, Masahiro NAKAGAWA, Naoki KIMISHIMA, Masaru KATAYAMA, and Akira MISAWA
NTT Network Service Systems Laboratories, NTT Corporation, hattori.kyota@lab.ntt.co.jp
Abstract We are developing an optical layer-2 switch network that achieves dynamic path bandwidth allocation (DPBA) for efficient traffic aggregation in metro NWs. We present experimental results showing that the DPBA cycle follows traffic variations on a NW scale of 1000×10.
Introduction
Network (NW) traffic is increasing exponentially, so it is becoming ever more necessary to develop NWs that are cost-effective in terms of both equipment cost and power consumption. To address this trend, we are developing an optical layer-2 switch network (OL2SW-NW) [1] that can efficiently aggregate traffic in a large-scale metro NW. The OL2SW-NW is based on a WDM/TDM ring NW of L WDM channels. This allows the bandwidth of the wavelengths in the NW to be shared among the ground paths between 1000 aggregation switches (SWs) on the access NW and 10 IP routers (RRs) on the core NW. The OL2SW-NW shares NW bandwidth effectively, as a TDM-PON [2] does, by allocating timeslots (TSs) to each ground path and adding/dropping data according to the allocated TSs (Fig. 1). The OL2SW-NW can dynamically change the bandwidth of each ground path by changing the number of TSs allocated to it according to the amount of traffic on the path at every fixed interval. To achieve this bandwidth allocation, traffic information on all ground paths is collected at a central TS scheduler (SCH) for TS allocation (TSA). Hence, collecting the traffic information from all ground paths and the TSA processing at the SCH become bottlenecks in large-scale NWs.
In this paper, we show experimental results of dynamic path bandwidth allocation (DPBA) in a 1000×10-scale NW using a novel TSA algorithm and TS converters (TSCs).
Architecture of OL2SW-NW and NW model
The OL2SW-NW achieves DPBA for the ground paths between 1000 SWs and 10 RRs by deploying a SCH (Fig. 2). We call each node constituting the OL2SW-NW an OL2SW; the OL2SW connected to SWs is called an A-OL2SW, and the one connected to RRs is called a C-OL2SW. We assume that the RRs achieve load balancing across the 10 nodes by applying virtualization technology [3]. Therefore, a SW communicates with a maximum of 10 RRs.
Fig. 1: Concept of optical layer-2 SW network
Fig. 2: Logical network of optical layer-2 SW network
Each OL2SW has a TSC, which converts variable-length Ethernet frames (Ethers) from the SWs and RRs into burst signals (Bursts) according to the allocated TSs.
The SCH executes the TSA for all paths among the TSCs. The following NW model is used to estimate the number of paths and the number of wavelengths that the TSA computation must handle. First, we assume that each SW has one 10G Ethernet interface (10GE) accommodating 1000 users and that the average traffic from a user is 1 Mbps. To accommodate 1000 SWs, the required NW capacity is thus 1 Tbps, so the OL2SW-NW needs 100 (= L) wavelengths if the capacity of a wavelength is 10 Gbps. Here, we assume that each RR aggregates the traffic from the 1000 SWs to at most 100 Gbps, which means that each RR requires ten 10GEs. A TSC is connected one-to-one to each 10GE of a SW or RR and transmits Bursts from a 10G burst interface (10GBT) according to the allocated TSs. We assume that a TSC in an A-OL2SW does not communicate simultaneously with more than one TSC at the same C-OL2SW and that a TSC in a C-OL2SW accommodates at most 100 paths, in order to reduce the computational time of the TSA at the SCH. In addition, there is no loopback communication from one SW to another or from one RR to another. In this case, the SCH has to compute the TSA for 20k paths covering both upstream and downstream.
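As a quick check of this dimensioning, the following minimal sketch (in Python; the variable names are ours, introduced only for illustration) recomputes the stated capacity, wavelength count, and path count:

# Sketch: recomputing the NW-model figures from the stated assumptions.
N_SW = 1000        # aggregation switches on the access NW
N_RR = 10          # IP routers on the core NW
USERS_PER_SW = 1000
MBPS_PER_USER = 1  # average traffic per user

capacity_gbps = N_SW * USERS_PER_SW * MBPS_PER_USER / 1000  # 1000 Gbps = 1 Tbps
wavelengths = int(capacity_gbps / 10)                       # 10 Gbps each -> 100 (= L)
paths = N_SW * N_RR * 2                                     # up + down -> 20,000 (20k)
print(capacity_gbps, wavelengths, paths)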
Fig. 3: Proposed DPBA based on TSC and SCH
Fig. 4: Schematic of Hierarchical TSA algorithm
Operation of DPBA for OL2SW-NW
At every fixed period T, all TSCs are allocated TSs for all destinations according to the amount of traffic so that the allocation follows the traffic variations. A TSC prepares one virtual queue (VQ) per destination to achieve burst transmission for each destination; the number of VQs, M, is 10 for an A-OL2SW's TSC and 100 for a C-OL2SW's TSC. When a TSC receives Ethers from a SW or RR through the 10GE, it identifies the destination from the MAC address of each Ether and inserts the Ether into the VQ for that destination. The TSC then measures the queue lengths (QLs) of the VQs and notifies the SCH of them. The SCH calculates the number of TSs assigned to each TSC based on the QLs and sends the TS information (TS-table) to the TSCs. All TSCs can then switch to the new TS allocation simultaneously by updating the TS-table at every cycle T.
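To make the cycle concrete, here is a minimal sketch (ours, not the authors' implementation; the proportional allocation rule and all names are illustrative assumptions) of the TSC-side QL report and an SCH-side allocation:

# Sketch of one DPBA cycle: TSCs report per-destination queue lengths (QLs),
# and the SCH divides the TSs of a frame among destinations in proportion
# to the reported QLs (assumed rule, for illustration only).
def report_qls(virtual_queues):
    """TSC side: measure the queue length (bytes) of each per-destination VQ."""
    return {dest: sum(len(f) for f in q) for dest, q in virtual_queues.items()}

def allocate_ts(qls, ts_per_frame=100):
    """SCH side: proportional TS allocation; leftover TSs go to the longest queues."""
    total = sum(qls.values()) or 1
    alloc = {d: ql * ts_per_frame // total for d, ql in qls.items()}
    leftover = ts_per_frame - sum(alloc.values())
    for d in sorted(qls, key=qls.get, reverse=True)[:leftover]:
        alloc[d] += 1
    return alloc  # TS-table entry: destination -> number of TSs per frame

# Example: three destinations with skewed traffic.
vqs = {"RR1": [b"x" * 1500] * 40, "RR2": [b"x" * 1500] * 10, "RR3": []}
print(allocate_ts(report_qls(vqs)))  # {'RR1': 80, 'RR2': 20, 'RR3': 0}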
Requirements of DPBA cycle
The traffic variation in the metro NW considered in this paper is assumed to be a rate change of 100 Mbps over periods of 1 s to 1 min [4]. To achieve DPBA, T must be set on the same order as the traffic variation; we therefore take 1 s as a suitable value of T. This means that the OL2SW-NW must complete the DPBA cycle for 20k paths within 1 s. Here, we define Ta as the time from sending the QL to setting the TS-table at the TSC (Fig. 3). For T to be set to 1 s, Ta must be within 1 s for all TSCs. We therefore evaluate whether Ta stays within 1 s, as required to achieve DPBA every 1 s for 20k paths.
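Written as a budget (the decomposition and symbols are ours, chosen to match the measurements reported later):

T_a = D_{QL} + T_{TSA} + D_{table} \le T = 1\,\mathrm{s},

where D_{QL} is the time to collect the QLs from all TSCs, T_{TSA} is the TSA computation time at the SCH, and D_{table} is the time to distribute the TS-tables.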
Hierarchical TSA algorithm
We introduce a hierarchical TSA that divides the large-scale NW-wide TSA into small-scale local TSAs (Fig. 4). The small-scale TSAs are then calculated independently, which reduces the calculation time. First, we define a set of adjacent OL2SWs as a "node group (N-G)" to form the TSA hierarchy; each OL2SW thus belongs to exactly one N-G. We call a pair of N-Gs an N-G pair (GP). We then compute a TSA for the paths within each GP (in-TSA) and a TSA across the GPs (gp-TSA). Finally, we obtain the overall TS schedule by matching the results of the in-TSAs to those of the gp-TSA. In this process, the number of paths and links handled by each TSA is smaller than in the original TSA, which saves computation time. Let G be the number of N-Gs set up over the N OL2SWs. The products of the number of paths and the number of links handled in the in-TSA and the gp-TSA scale as O((N/G)^3) and O(G^3), respectively. Since the number of GPs is O(G^2), this algorithm has a calculation time that scales as O(G^2 (N/G)^3 + G^3). Thus, we should set G to an integer around N^{3/4} to minimize the calculation time.
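The choice G ≈ N^{3/4} follows from minimizing this cost; a short derivation (our reconstruction, up to constant factors):

f(G) = G^2 (N/G)^3 + G^3 = N^3/G + G^3,
f'(G) = -N^3/G^2 + 3G^2 = 0 \;\Rightarrow\; G^4 = N^3/3 \;\Rightarrow\; G = (N^3/3)^{1/4} = \Theta(N^{3/4}).

For N = 1010, N^{3/4} ≈ 180, which matches the value of G used in the experiment below.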
Control TS for QL and TS-table
The QLs must be collected from the 1010 OL2SWs, and the TS-tables must be set in all 1010 OL2SWs. However, if all OL2SWs sent their QLs at arbitrary times, collisions between the QLs from different OL2SWs would occur.
Tab. 1: Formats of QL and TS-table

           Field        Size (bit)  Required number (A-OL2SW)   Required number (C-OL2SW)
 QL        TSC-ID       4           1 (TSC)                     10 (TSCs)
           VQ-ID        7           10 (VQs) x 1 (TSC)          100 (VQs) x 10 (TSCs)
           Q-size       24          10 (VQs) x 1 (TSC)          100 (VQs) x 10 (TSCs)
           Sum          -           less than 0.1 KB            3.9 KB
 TS-table  Inport-ID    7           101 (WDM: 100 + 10GBT: 1)   110 (WDM: 100 + 10GBT: 10)
           Output port  7           101 (Inports) x 100 (TSs)   110 (Inports) x 100 (TSs)
           Sum          -           8.9 KB                      9.9 KB
Fig. 5: (A) Experimental setup, (B) prototype of TSC, (C) QL for 10 destinations according to allocated TS, and (D) D-plane's signals from a TSC-D according to traffic variations at every T (= 500 ms)
(Setup conditions shown in Fig. 5: Poisson traffic with variations of up to 100 Mbps and rates ranging from 100 Mbps to 1 Gbps; each TSC-C/TSC-D prototype provides one 10GE, one 10GBT, and four 1GE interfaces and sends/receives 1010 C-TSs.)
Fig. 6: (A) Delay among TSC-Cs vs. the number of A-OL2SWs, and (B) TSA time vs. the number of paths
Therefore, all OL2SWs are allocated control TSs (C-TSs) separate from the data plane's TSs. Each OL2SW sends and receives its QL and TS-table using the allocated C-TS. To achieve this, the TSC consists of a TSC-D (TSC for data), which converts data Ethers into Bursts, and a TSC-C (TSC for control), which converts the Ethers of the QL and TS-table into Bursts. Furthermore, to reduce the bandwidth consumed by the C-TSs, one TSC-C is deployed per OL2SW. Thus, the 1010 OL2SWs can send and receive QLs and TS-tables through the C-TSs allocated to all OL2SWs.
Size of QL and TS-table
We define the QL and TS-table formats necessary for the DPBA in order to derive the time required to collect the QLs and set the TS-tables for the 1010 OL2SWs. The formats are given in Tab. 1. The QL message comprises a TSC-ID and a VQ-ID, which identify the TSC-D and the VQ, and a Q-size representing the queue length. Data fields of 4 bits and 7 bits are needed for the TSC-ID and the VQ-ID to identify the 10 TSC-Ds of an OL2SW and the 100 VQs of a TSC-D, respectively. Assuming the QL is sent every 10 ms, the maximum queue growth in 10 ms for 10-Gbps traffic is 12.5 MB, so a 24-bit data field is needed to express the Q-size in bytes. From the above, the QL size for an A-OL2SW and a C-OL2SW is less than 0.1 KB and 3.9 KB, respectively. If we set the size of a C-TS to 10 μs (= 12.5 KB), the C-TSs can carry the QLs of all 1010 OL2SWs in 10.1 ms. In contrast, the TS-table must be set for every OL2SW input port (Inport), whose number equals the number of 10GBTs plus the number of wavelengths [1]. There are 101 A-OL2SW Inports and 110 C-OL2SW Inports if the OL2SW accommodates 100 (= L) wavelengths, so a 7-bit data field is needed for the Inport. Assuming 100 TSs, each of which must be set for every Inport and specified with an output port, the TS-table sizes for an A-OL2SW and a C-OL2SW are 8.9 KB and 9.9 KB, respectively. Therefore, if we allocate one C-TS per OL2SW, the C-TSs can carry the TS-tables of all 1010 OL2SWs in 10.1 ms.
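A minimal sketch (ours) that reproduces these size and timing estimates from the field layout described above; note that the C-OL2SW TS-table evaluates to about 9.7 KB under this layout, slightly below the reported 9.9 KB, so the actual format presumably carries fields not described here:

# Sketch: message-size and timing estimates for the QL and TS-table.
def ql_size_bytes(n_tscs, n_vqs_per_tsc):
    # TSC-ID (4 bit) per TSC; VQ-ID (7 bit) + Q-size (24 bit) per VQ.
    bits = 4 * n_tscs + (7 + 24) * n_vqs_per_tsc * n_tscs
    return bits / 8

def ts_table_size_bytes(n_inports, n_ts=100):
    # Inport-ID (7 bit) per Inport; output port (7 bit) per (Inport, TS).
    bits = 7 * n_inports + 7 * n_inports * n_ts
    return bits / 8

print(ql_size_bytes(1, 10))      # A-OL2SW QL: ~39 B (< 0.1 KB)
print(ql_size_bytes(10, 100))    # C-OL2SW QL: 3880 B (~3.9 KB)
print(ts_table_size_bytes(101))  # A-OL2SW TS-table: ~8.9 KB
print(ts_table_size_bytes(110))  # C-OL2SW TS-table: ~9.7 KB (9.9 KB reported)

# A 10-us C-TS at 10 Gbps carries 12.5 KB, enough for either message,
# so serving 1010 OL2SWs one C-TS each takes:
print(1010 * 10e-6 * 1e3, "ms")  # 10.1 ms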
DPBA experiment for 1000×10-scale NW
We now explain the experimental results of DPBA for the 1000×10-scale NW. The experimental setup is shown in Fig. 5(A). We set up three TSC-Ds for A-OL2SW #1, A-OL2SW #2, and C-OL2SW #1, which were connected to TSC-Cs #1, #2, and #3, respectively. The prototypes of a TSC-C and a TSC-D are shown in Fig. 5(B). We allocated one C-TS to each TSC-C. To emulate the QLs from the remaining 998 A-OL2SWs and 9 C-OL2SWs, we connected a traffic generator (TG) to TSC-C #4 and allocated 1007 C-TSs. The SCH was implemented on a personal computer with a single-core 2.3-GHz CPU. In the proposed algorithm, G is set to 180 based on the size of N (= 1010) to obtain the minimum computation time. For both the in-TSA and the gp-TSA, we use a first-fit TS assignment algorithm.
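For reference, a generic first-fit TS assignment can be sketched as follows (our illustration of the first-fit rule; the paper does not publish its implementation): each path requests a number of TSs, and each request is placed in the earliest slot where all links of the path are still free.

# Sketch of first-fit timeslot assignment (generic rule, illustrative only).
from collections import defaultdict

def first_fit(paths, n_ts=100):
    # schedule[link][t] is True when timeslot t on that link is taken.
    schedule = defaultdict(lambda: [False] * n_ts)
    allocation = {}
    for path_id, links, demand in paths:   # demand = number of TSs requested
        slots = []
        for t in range(n_ts):
            if all(not schedule[l][t] for l in links):
                for l in links:
                    schedule[l][t] = True
                slots.append(t)
                if len(slots) == demand:
                    break
        allocation[path_id] = slots  # may fall short if the NW is full
    return allocation

# Example: two paths sharing link "B" must get disjoint TSs on it.
print(first_fit([(1, ["A", "B"], 2), (2, ["B", "C"], 2)]))  # {1: [0, 1], 2: [2, 3]}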
Fig. 6(A) plots the maximum delay time (D) between sending a QL at point (a) of Fig. 5(A) and receiving it at point (b) of Fig. 5(A) when the number of A-OL2SWs is varied, with the number of C-OL2SWs at the emulator fixed at 10. Fig. 6(A) indicates that D increases in proportion to the number of A-OL2SWs because each A-OL2SW requires a 10-μs C-TS. D was 10.2 ms at maximum even when controlling 1000 A-OL2SWs. The time to send the TS-tables was likewise 10.2 ms at maximum for 1000 A-OL2SWs (not shown in the graph). Therefore, a round-trip time within 20.4 ms between all TSC-Cs is achievable.
To evaluate the TSA time at the SCH, we measured the time from the arrival of all QLs at the SCH to the sending of the TS-tables for all OL2SWs. Fig. 6(B) plots the TSA time against the number of paths, where each case was executed 1000 times. The TSA time increased in proportion to the number of paths [5]. The maximum TSA time was within 450 ms even when computing for 20k paths.
The above results indicate that the maximum Ta was 470 ms. Therefore, a DPBA cycle of 1 s is achievable on a metro NW scale of 1000×10 against the defined traffic variations. Furthermore, setting T to 500 ms, we confirmed both functions: sending the QLs for 10 destinations according to the allocated TS (Fig. 5(C)) and changing the bandwidth at a TSC-D according to the traffic variations at every T (Fig. 5(D)).
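Substituting the measured values into the budget introduced above (notation ours):

T_a = D_{QL} + T_{TSA} + D_{table} \approx 10.2\,\mathrm{ms} + 450\,\mathrm{ms} + 10.2\,\mathrm{ms} \approx 470\,\mathrm{ms} < T = 1\,\mathrm{s}.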
Conclusions
We evaluated DPBA in the OL2SW-NW. We experimentally verified that a DPBA cycle of 500 ms was achieved on a NW scale of 1000×10.
References
[1] K. Hattori et al., Proc. ECOC 2012, We.3.D.5.
[2] ITU-T Recommendation G.984.2 (2006).
[3] Y. Wang et al., Proc. SIGCOMM '08, pp. 231-242.
[4] G. Xie et al., Proc. ITC 2007, pp. 666-677.
[5] S. Subramaniam et al., Proc. SPIE 3843, 2 (1999).