Dynamic Path Bandwidth Allocation for 1000×10-Scale Optical Layer-2 Switch Network based on Hierarchical Timeslot Allocation Algorithm and Timeslot Converter
Kyota HATTORI, Masahiro NAKAGAWA, Naoki KIMISHIMA, Masaru KATAYAMA, and Akira MISAWA
NTT Network Service Systems Laboratories, NTT Corporation, hattori.kyota@lab.ntt.co.jp
Abstract We are developing an optical layer-2 switch network that achieves dynamic path bandwidth allocation (DPBA) for efficient traffic aggregation in metro NWs. We present experimental results showing that the DPBA cycle follows traffic variations on a NW scale of 1000×10.
Introduction
Network (NW) traffic is increasing exponentially, so it is becoming ever more necessary to develop NWs that are cost-effective in terms of both equipment cost and power consumption. To address this trend, we are developing an optical layer-2 switch network (OL2SW-NW) [1] that can efficiently aggregate traffic in a large-scale metro NW. The OL2SW-NW is based on a WDM/TDM ring NW of L WDM channels. This allows the bandwidth of the wavelengths in the NW to be shared among the ground paths between 1000 aggregation switches (SWs) on the access NW and 10 IP routers (RRs) on the core NW. The OL2SW-NW shares NW bandwidth effectively, as a TDM-PON [2] does, by allocating timeslots (TSs) to each ground path and adding/dropping data according to the allocated TSs (Fig. 1). The OL2SW-NW can dynamically change the bandwidth of each ground path by changing the number of TSs allocated to it according to the amount of traffic on the path at every fixed interval. To achieve this bandwidth allocation, traffic information on all ground paths is collected at a central TS scheduler (SCH) for TS allocation (TSA). Hence, collecting the traffic information from all ground paths and the TSA processing at the SCH become bottlenecks in large-scale NWs.
In this paper, we show experimental results of dynamic path bandwidth allocation (DPBA) in a 1000×10-scale NW using a novel TSA algorithm and TS converters (TSCs).
Architecture of OL2SW-NW and NW model
The OL2SW-NW achieves DPBA for the ground paths between 1000 SWs and 10 RRs by deploying a SCH (Fig. 2). We call each node constituting the OL2SW-NW an OL2SW; the OL2SW connected to SWs is called an A-OL2SW, and the one connected to RRs is called a C-OL2SW. We assume that the RRs achieve load balancing across the 10 nodes by applying virtualization technology [3]. Therefore, a SW communicates with a maximum of 10 RRs.
Fig. 1: Concept of optical layer-2 SW network
Fig. 2: Logical network of optical layer-2 SW network
Each OL2SW has a TSC, which converts variable-length Ethernet frames (Ethers) from the SWs and RRs into burst signals (Bursts) according to the allocated TSs.
The SCH executes the TSA for all paths among the TSCs. The following NW model is used to estimate the number of paths and the number of wavelengths that the TSA computation must handle. First, we assume that each SW has one 10G Ethernet interface (10GE) accommodating 1000 users and that the average traffic from a user is 1 Mbps. To accommodate 1000 SWs, the required NW capacity is thus 1 Tbps, so the OL2SW-NW needs 100 (= L) wavelengths if the capacity of a wavelength is 10 Gbps. Here, we assume that each RR aggregates the traffic from the 1000 SWs to at most 100 Gbps, which means that each RR requires ten 10GEs. A TSC is connected one-to-one to each 10GE of a SW or RR and transmits Bursts from a 10G burst interface (10GBT) according to the allocated TSs. We assume that a TSC in an A-OL2SW does not communicate simultaneously with more than one TSC at the same C-OL2SW and that a TSC in a C-OL2SW accommodates at most 100 paths, in order to reduce the computational time of the TSA at the SCH. In addition, there is no loopback communication from one SW to another or from one RR to another. In this case, the SCH has to compute the TSA for 20k paths covering both upstream and downstream.
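As a quick check of this dimensioning, the following minimal sketch (in Python; the variable names are ours, introduced only for illustration) recomputes the stated capacity, wavelength count, and path count:

# Sketch: recomputing the NW-model figures from the stated assumptions.
N_SW = 1000        # aggregation switches on the access NW
N_RR = 10          # IP routers on the core NW
USERS_PER_SW = 1000
MBPS_PER_USER = 1  # average traffic per user

capacity_gbps = N_SW * USERS_PER_SW * MBPS_PER_USER / 1000  # 1000 Gbps = 1 Tbps
wavelengths = int(capacity_gbps / 10)                       # 10 Gbps each -> 100 (= L)
paths = N_SW * N_RR * 2                                     # up + down -> 20,000 (20k)
print(capacity_gbps, wavelengths, paths)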
Fig. 3: Proposed DPBA based on TSC and SCH
Fig. 4: Schematic of Hierarchical TSA algorithm
Operation of DPBA for OL2SW-NW
At every fixed period T, all TSCs are allocated TSs for all destinations according to the amount of traffic so that the allocation follows the traffic variations. A TSC prepares one virtual queue (VQ) per destination to achieve burst transmission for each destination; the number of VQs, M, is 10 for an A-OL2SW's TSC and 100 for a C-OL2SW's TSC. When a TSC receives Ethers from a SW or RR through the 10GE, it identifies the destination from the MAC address of each Ether and inserts the Ether into the VQ for that destination. The TSC then measures the queue lengths (QLs) of the VQs and notifies the SCH of them. The SCH calculates the number of TSs assigned to each TSC based on the QLs and sends the TS information (TS-table) to the TSCs. All TSCs can then switch to the new TS allocation simultaneously by updating the TS-table at every cycle T.
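To make the cycle concrete, here is a minimal sketch (ours, not the authors' implementation; the proportional allocation rule and all names are illustrative assumptions) of the TSC-side QL report and an SCH-side allocation:

# Sketch of one DPBA cycle: TSCs report per-destination queue lengths (QLs),
# and the SCH divides the TSs of a frame among destinations in proportion
# to the reported QLs (assumed rule, for illustration only).
def report_qls(virtual_queues):
    """TSC side: measure the queue length (bytes) of each per-destination VQ."""
    return {dest: sum(len(f) for f in q) for dest, q in virtual_queues.items()}

def allocate_ts(qls, ts_per_frame=100):
    """SCH side: proportional TS allocation; leftover TSs go to the longest queues."""
    total = sum(qls.values()) or 1
    alloc = {d: ql * ts_per_frame // total for d, ql in qls.items()}
    leftover = ts_per_frame - sum(alloc.values())
    for d in sorted(qls, key=qls.get, reverse=True)[:leftover]:
        alloc[d] += 1
    return alloc  # TS-table entry: destination -> number of TSs per frame

# Example: three destinations with skewed traffic.
vqs = {"RR1": [b"x" * 1500] * 40, "RR2": [b"x" * 1500] * 10, "RR3": []}
print(allocate_ts(report_qls(vqs)))  # {'RR1': 80, 'RR2': 20, 'RR3': 0}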
Requirements of DPBA cycle
The traffic variation in the metro NW considered in this paper is assumed to be a rate change of 100 Mbps over periods of 1 s to 1 min [4]. To achieve DPBA, T must be set on the same order as the traffic variation; we therefore take 1 s as a suitable value of T. This means that the OL2SW-NW must complete the DPBA cycle for 20k paths within 1 s. Here, we define Ta as the time from sending the QL to setting the TS-table at the TSC (Fig. 3). For T to be set to 1 s, Ta must be within 1 s for all TSCs. We therefore evaluate whether Ta stays within 1 s, as required to achieve DPBA every 1 s for 20k paths.
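Written as a budget (the decomposition and symbols are ours, chosen to match the measurements reported later):

T_a = D_{QL} + T_{TSA} + D_{table} \le T = 1\,\mathrm{s},

where D_{QL} is the time to collect the QLs from all TSCs, T_{TSA} is the TSA computation time at the SCH, and D_{table} is the time to distribute the TS-tables.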
Hierarchical TSA algorithm
We introduce a hierarchical TSA that divides the large-scale NW-wide TSA into small-scale local TSAs (Fig. 4). The small-scale TSAs are then calculated independently, which reduces the calculation time. First, we define a set of adjacent OL2SWs as a "node group (N-G)" to form the TSA hierarchy; each OL2SW thus belongs to exactly one N-G. We call a pair of N-Gs an N-G pair (GP). We then compute a TSA for the paths within each GP (in-TSA) and a TSA across the GPs (gp-TSA). Finally, we obtain the overall TS schedule by matching the results of the in-TSAs to those of the gp-TSA. In this process, the number of paths and links handled by each TSA is smaller than in the original TSA, which saves computation time. Let G be the number of N-Gs set up over the N OL2SWs. The products of the number of paths and the number of links handled in the in-TSA and the gp-TSA scale as O((N/G)^3) and O(G^3), respectively. Since the number of GPs is O(G^2), this algorithm has a calculation time that scales as O(G^2 (N/G)^3 + G^3). Thus, we should set G to an integer around N^{3/4} to minimize the calculation time.
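The choice G ≈ N^{3/4} follows from minimizing this cost; a short derivation (our reconstruction, up to constant factors):

f(G) = G^2 (N/G)^3 + G^3 = N^3/G + G^3,
f'(G) = -N^3/G^2 + 3G^2 = 0 \;\Rightarrow\; G^4 = N^3/3 \;\Rightarrow\; G = (N^3/3)^{1/4} = \Theta(N^{3/4}).

For N = 1010, N^{3/4} ≈ 180, which matches the value of G used in the experiment below.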
Control TS for QL and TS-table
The QLs must be collected from the 1010 OL2SWs, and the TS-tables must be set in all 1010 OL2SWs. However, if all OL2SWs sent their QLs at arbitrary times, collisions between the QLs from different OL2SWs would occur.
Tab. 1: Formats of QL and TS-table

           Field        Size (bit)  Required number (A-OL2SW)   Required number (C-OL2SW)
 QL        TSC-ID       4           1 (TSC)                     10 (TSCs)
           VQ-ID        7           10 (VQs) x 1 (TSC)          100 (VQs) x 10 (TSCs)
           Q-size       24          10 (VQs) x 1 (TSC)          100 (VQs) x 10 (TSCs)
           Sum          -           less than 0.1 KB            3.9 KB
 TS-table  Inport-ID    7           101 (WDM: 100 + 10GBT: 1)   110 (WDM: 100 + 10GBT: 10)
           Output port  7           101 (Inports) x 100 (TSs)   110 (Inports) x 100 (TSs)
           Sum          -           8.9 KB                      9.9 KB
Fig. 5: (A) Experimental setup, (B) prototype of TSC, (C) QL for 10 destinations according to allocated TS, and (D) D-plane's signals from a TSC-D according to traffic variations at every T (= 500 ms)
(Setup conditions shown in Fig. 5: Poisson traffic with variations of up to 100 Mbps and rates ranging from 100 Mbps to 1 Gbps; each TSC-C/TSC-D prototype provides one 10GE, one 10GBT, and four 1GE interfaces and sends/receives 1010 C-TSs.)
Fig. 6: (A) Delay among TSC-Cs vs. the number of A-OL2SWs, and (B) TSA time vs. the number of paths
Therefore, all OL2SWs are allocated control TSs (C-TSs) separate from the data plane's TSs. Each OL2SW sends and receives its QL and TS-table using the allocated C-TS. To achieve this, the TSC consists of a TSC-D (TSC for data), which converts data Ethers into Bursts, and a TSC-C (TSC for control), which converts the Ethers of the QL and TS-table into Bursts. Furthermore, to reduce the bandwidth consumed by the C-TSs, one TSC-C is deployed per OL2SW. Thus, the 1010 OL2SWs can send and receive QLs and TS-tables through the C-TSs allocated to all OL2SWs.
Size of QL and TS-table
We define the QL and TS-table formats necessary for the DPBA in order to derive the time required to collect the QLs and set the TS-tables for the 1010 OL2SWs. The formats are given in Tab. 1. The QL message comprises a TSC-ID and a VQ-ID, which identify the TSC-D and the VQ, and a Q-size representing the queue length. Data fields of 4 bits and 7 bits are needed for the TSC-ID and the VQ-ID to identify the 10 TSC-Ds of an OL2SW and the 100 VQs of a TSC-D, respectively. Assuming the QL is sent every 10 ms, the maximum queue growth in 10 ms for 10-Gbps traffic is 12.5 MB, so a 24-bit data field is needed to express the Q-size in bytes. From the above, the QL size for an A-OL2SW and a C-OL2SW is less than 0.1 KB and 3.9 KB, respectively. If we set the size of a C-TS to 10 μs (= 12.5 KB), the C-TSs can carry the QLs of all 1010 OL2SWs in 10.1 ms. In contrast, the TS-table must be set for every OL2SW input port (Inport), whose number equals the number of 10GBTs plus the number of wavelengths [1]. There are 101 A-OL2SW Inports and 110 C-OL2SW Inports if the OL2SW accommodates 100 (= L) wavelengths, so a 7-bit data field is needed for the Inport. Assuming 100 TSs, each of which must be set for every Inport and specified with an output port, the TS-table sizes for an A-OL2SW and a C-OL2SW are 8.9 KB and 9.9 KB, respectively. Therefore, if we allocate one C-TS per OL2SW, the C-TSs can carry the TS-tables of all 1010 OL2SWs in 10.1 ms.
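A minimal sketch (ours) that reproduces these size and timing estimates from the field layout described above; note that the C-OL2SW TS-table evaluates to about 9.7 KB under this layout, slightly below the reported 9.9 KB, so the actual format presumably carries fields not described here:

# Sketch: message-size and timing estimates for the QL and TS-table.
def ql_size_bytes(n_tscs, n_vqs_per_tsc):
    # TSC-ID (4 bit) per TSC; VQ-ID (7 bit) + Q-size (24 bit) per VQ.
    bits = 4 * n_tscs + (7 + 24) * n_vqs_per_tsc * n_tscs
    return bits / 8

def ts_table_size_bytes(n_inports, n_ts=100):
    # Inport-ID (7 bit) per Inport; output port (7 bit) per (Inport, TS).
    bits = 7 * n_inports + 7 * n_inports * n_ts
    return bits / 8

print(ql_size_bytes(1, 10))      # A-OL2SW QL: ~39 B (< 0.1 KB)
print(ql_size_bytes(10, 100))    # C-OL2SW QL: 3880 B (~3.9 KB)
print(ts_table_size_bytes(101))  # A-OL2SW TS-table: ~8.9 KB
print(ts_table_size_bytes(110))  # C-OL2SW TS-table: ~9.7 KB (9.9 KB reported)

# A 10-us C-TS at 10 Gbps carries 12.5 KB, enough for either message,
# so serving 1010 OL2SWs one C-TS each takes:
print(1010 * 10e-6 * 1e3, "ms")  # 10.1 ms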
DPBA experiment for 1000×10-scale NW
We now explain the experimental results of DPBA for the 1000×10-scale NW. The experimental setup is shown in Fig. 5(A). We set up three TSC-Ds for A-OL2SW #1, A-OL2SW #2, and C-OL2SW #1, which were connected to TSC-Cs #1, #2, and #3, respectively. The prototypes of a TSC-C and a TSC-D are shown in Fig. 5(B). We allocated one C-TS to each TSC-C. To emulate the QLs from the remaining 998 A-OL2SWs and 9 C-OL2SWs, we connected a traffic generator (TG) to TSC-C #4 and allocated 1007 C-TSs. The SCH was implemented on a personal computer with a single-core 2.3-GHz CPU. In the proposed algorithm, G is set to 180 based on the size of N (= 1010) to obtain the minimum computation time. For both the in-TSA and the gp-TSA, we use a first-fit TS assignment algorithm.
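For reference, a generic first-fit TS assignment can be sketched as follows (our illustration of the first-fit rule; the paper does not publish its implementation): each path requests a number of TSs, and each request is placed in the earliest slot where all links of the path are still free.

# Sketch of first-fit timeslot assignment (generic rule, illustrative only).
from collections import defaultdict

def first_fit(paths, n_ts=100):
    # schedule[link][t] is True when timeslot t on that link is taken.
    schedule = defaultdict(lambda: [False] * n_ts)
    allocation = {}
    for path_id, links, demand in paths:   # demand = number of TSs requested
        slots = []
        for t in range(n_ts):
            if all(not schedule[l][t] for l in links):
                for l in links:
                    schedule[l][t] = True
                slots.append(t)
                if len(slots) == demand:
                    break
        allocation[path_id] = slots  # may fall short if the NW is full
    return allocation

# Example: two paths sharing link "B" must get disjoint TSs on it.
print(first_fit([(1, ["A", "B"], 2), (2, ["B", "C"], 2)]))  # {1: [0, 1], 2: [2, 3]}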
Fig. 6(A) plots the maximum delay time (D) between sending a QL at point (a) of Fig. 5(A) and receiving it at point (b) of Fig. 5(A) when the number of A-OL2SWs is varied, with the number of C-OL2SWs at the emulator fixed at 10. Fig. 6(A) indicates that D increases in proportion to the number of A-OL2SWs because each A-OL2SW requires a 10-μs C-TS. D was 10.2 ms at maximum even when controlling 1000 A-OL2SWs. The time to send the TS-tables was likewise 10.2 ms at maximum for 1000 A-OL2SWs (not shown in the graph). Therefore, a round-trip time within 20.4 ms between all TSC-Cs is achievable.
To evaluate the TSA time at the SCH, we measured the time from the arrival of all QLs at the SCH to the sending of the TS-tables for all OL2SWs. Fig. 6(B) plots the TSA time against the number of paths, where each case was executed 1000 times. The TSA time increased in proportion to the number of paths [5]. The maximum TSA time was within 450 ms even when computing for 20k paths.
The above results indicate that the maximum Ta was 470 ms. Therefore, a DPBA cycle of 1 s is achievable on a metro NW scale of 1000×10 against the defined traffic variations. Furthermore, setting T to 500 ms, we confirmed both functions: sending the QLs for 10 destinations according to the allocated TS (Fig. 5(C)) and changing the bandwidth at a TSC-D according to the traffic variations at every T (Fig. 5(D)).
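Substituting the measured values into the budget introduced above (notation ours):

T_a = D_{QL} + T_{TSA} + D_{table} \approx 10.2\,\mathrm{ms} + 450\,\mathrm{ms} + 10.2\,\mathrm{ms} \approx 470\,\mathrm{ms} < T = 1\,\mathrm{s}.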
Conclusions
We evaluated DPBA in the OL2SW-NW. We experimentally verified that a DPBA cycle of 500 ms was achieved on a NW scale of 1000×10.
References
[1] K. Hattori et al., Proc. ECOC 2012, We.3.D.5.
[2] ITU-T Recommendation G.984.2 (2006).
[3] Y. Wang et al., Proc. SIGCOMM '08, pp. 231-242.
[4] G. Xie et al., Proc. ITC 2007, pp. 666-677.
[5] S. Subramaniam et al., Proc. SPIE 3843, 2 (1999).