Reconfiguring multicast sessions in elastic optical networks adaptively with graph-aware deep reinforcement learning

Journal of Optical Communications and Networking

July 2021
13(11)

DOI:10.1364/JOCN.431225

Authors:

Baojia Li

Tencent

Rentao Gu

Beijing University of Posts and Telecommunications

Zuqing Zhu

University of Science and Technology of China

With the fast deployment of datacenters (DCs), bandwidth-intensive multicast services are becoming more and more popular in metro and wide-area networks, to support dynamic applications such as DC synchronization and backup. Hence, this work studies the problem of how to formulate and reconfigure multicast sessions in an elastic optical network (EON) dynamically. We propose a deep reinforcement learning (DRL) model based on graph neural networks to solve the sub-problem of multicast session selection in a more universal and adaptive manner. The DRL model abstracts topology information of the EON and the current provisioning scheme of a multicast session as graph-structured data, and analyzes the data to intelligently determine whether the session should be selected for reconfiguration. We evaluate our proposal with extensive simulations that consider different EON topologies, and the results confirm its effectiveness and universality. Specifically, the results show that it can balance the trade-off between the number of reconfiguration operations and blocking performance much better than existing algorithms, and the DRL model trained in one EON topology can easily adapt to solve the problem of dynamic multicast session reconfiguration in other topologies, without being redesigned or retrained.

Architecture and operation principle of our graph-aware DRL model.

Training of the DRL model in the A3C framework.

EON topologies used in simulations. (a) 14-node NSFNET topology. (b) 28-node US Backbone (USB) topology. (c) 32-node European Backbone (EUB) topology. (d) 20-node random topology (RT).

Training performance (NSFNET, 25 Erlangs). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session. (c) Average reward.

Results of dynamic operations (NSFNET, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

Reconfiguring Multicast Sessions in Elastic Optical Networks Adaptively with Graph-Aware Deep Reinforcement Learning

Xiaojian Tian, Baojia Li, Rentao Gu, and Zuqing Zhu, Senior Member, IEEE

Abstract: With the fast deployment of datacenters (DCs), bandwidth-intensive multicast services are becoming increasingly popular in metro and wide-area networks, where they support dynamic applications such as DC synchronization and backup. Hence, this work studies how to formulate and reconfigure multicast sessions in an elastic optical network (EON) dynamically. We propose a deep reinforcement learning (DRL) model based on graph neural networks (GNNs) to solve the sub-problem of multicast session selection in a more universal and adaptive manner. The DRL model abstracts the topology information of the EON and the current provisioning scheme of a multicast session as graph-structured data, and analyzes the data to determine whether the session should be selected for reconfiguration. We evaluate our proposal with extensive simulations that consider different EON topologies, and the results confirm its effectiveness and universality. Specifically, the results show that it balances the tradeoff between the number of reconfiguration operations and blocking performance much better than the existing algorithms, and that the DRL model trained in one EON topology can adapt to solve the problem of dynamic multicast session reconfiguration in other topologies, without being redesigned or retrained.

Index Terms: Optical multicast, Elastic optical networks (EONs), Network reconfiguration, Deep reinforcement learning (DRL), Graph neural network (GNN).

I. INTRODUCTION

In recent years, the rise of cloud services and live video streaming has made multicast services increasingly popular in the Internet [1]. This trend has become even more pronounced since 2020, because of the surge in demand for video conferencing and online classroom services during the epidemic. Meanwhile, due to the fast deployment of datacenters (DCs) all over the world, multicast services have also become popular in metro and wide-area networks [2], especially for bandwidth-intensive applications such as DC synchronization and backup and distributed scientific computing [3]. This has put great pressure on DC interconnects (DCIs) and made multicast provisioning in DCIs an attractive research topic.

With the tremendous bandwidth in each optical fiber, optical networking plays an important role in DCIs, and a recent study [4] even suggested that an optical-circuit-switched architecture could be more scalable and cost-effective for regional DCIs than a natural packet-switched network. More promisingly, the advances in flexible-grid elastic optical networks (EONs) can further improve the performance of optical switching in terms of spectrum efficiency, adaptivity, and application-awareness [5–7]. Note that for bandwidth-intensive and long-lasting applications (e.g., DC backup), realizing multicast directly in the optical domain offers benefits such as lower bandwidth/protocol overheads and higher achievable throughput [8]. The agility of EONs would further enhance these benefits, which has motivated researchers to study how to provision multicast services in EONs and to propose various algorithms [9–14].

X. Tian, B. Li and Z. Zhu are with the School of Information Science and Technology, University of Science and Technology of China, Hefei, Anhui 230027, P. R. China (email: zqzhu@ieee.org).

R. Gu is with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China.

Manuscript received on May 9, 2021.

Meanwhile, the semi-permanent optical layer in telecommunication networks might not adapt to the dynamic applications and traffic in DCIs [15]. Therefore, a dynamic optical layer with fast reconfiguration speed is desired. For instance, the standardization effort in [16] suggested that, to properly support inter-DC communications, a dynamic optical network should be reconfigurable within a few milliseconds. Following this trend, researchers have considered different dynamic operation scenarios for EONs, e.g., reconfiguration to accommodate time-varying unicast traffic [17, 18], spectrum defragmentation [19], lightpath restoration [20], and spectrum retuning for bulk data transfers [21]. Because multicast services in DCIs are dynamic, each multicast session might also need to be updated continually to maintain the optimality of its service provisioning scheme (i.e., the one that consumes the least spectrum resources) [22]. For example, during a one-to-many DC backup, each destination DC joins the multicast session when the data of interest to it starts to be transferred, and leaves the session when its data transfer is done.

The problem of how to formulate and reconfigure multicast sessions in EONs dynamically was previously studied in [22]. Specifically, the authors divided the problem into two sub-problems, i.e., session selection and session reconfiguration, and designed algorithms to solve them. The session selection algorithm finds the most “critical” multicast sessions to reconfigure, i.e., those whose provisioning schemes waste the most spectrum resources when compared with the optimal ones (in other words, those that are farthest from their optima). After the sessions have been selected, the session reconfiguration step reconfigures them with either full or partial rearrangement to free up unnecessary spectrum usage. Note that the reconfiguration of multicast sessions should be evaluated from two perspectives, i.e., the number of reconfiguration operations and the overall blocking probability of multicast sessions. Specifically, by invoking more reconfiguration operations, we can generally readjust the provisioning schemes of multicast sessions better and save more spectrum resources, and thus a lower blocking probability will be obtained in the future. Hence, to maximize the efficiency of the reconfiguration, we should use the fewest reconfiguration operations to achieve the largest reduction in blocking probability. However, to the best of our knowledge, how to optimize this tradeoff has not been fully explored yet.

In the reconfiguration of multicast sessions, the sub-problem of session selection is the one more relevant to the aforementioned tradeoff. Nevertheless, the heuristic approaches developed in [22] (i.e., the D-/Q-value based selection strategies) cannot universally adapt to dynamic EON environments, and the questions of how to choose between them and how to determine their key parameters can only be tackled empirically. This motivates us to revisit the sub-problem in this work. Note that deep reinforcement learning (DRL) can obtain statistically optimal solutions for complex and time-varying problems without explicit programming [23]. Hence, we replace the heuristic approaches for session selection with a DRL-based algorithm, and expect it to balance the tradeoff between the number of reconfiguration operations and blocking probability better.

Note that, in order to select multicast sessions in an EON to reconfigure, we need to process graph-structured data, which can hardly be handled well by neural networks (NNs) with linear structures. This is because certain important information buried in the graph-structured data can be lost, and DRL models that use NNs with linear structures need to be redesigned and retrained when the EON’s topology changes. Fortunately, graph neural networks (GNNs) [24] can fulfill these requirements much better, as they operate directly on graph-structured data and can capture the complex relations in it for network-related applications [25].

In this work, we propose a DRL model based on GNNs to solve the sub-problem of multicast session selection in a more universal and adaptive way. The DRL model takes the topology information of the EON and the current provisioning scheme of a multicast session as the input, abstracts them as graph-structured data, and analyzes the data to determine whether the multicast session should be selected for reconfiguration. We evaluate the proposed graph-aware DRL model with extensive simulations that consider different EON topologies. The simulation results confirm the effectiveness and universality of our proposal, and show that it balances the tradeoff between the number of reconfiguration operations and blocking probability much better than the existing heuristic approaches, without empirical parameter adjustments.

The rest of the paper is organized as follows. Section II briefly surveys the related work. Section III describes the network model and the operation principle of the dynamic reconfiguration of multicast sessions in EONs. The graph-aware DRL model for session selection is designed in Section IV, and its performance evaluation is discussed in Section V. Finally, Section VI summarizes the paper.

II. RELATED WORK

Multicast in the optical domain has been studied since the inception of wavelength-division-multiplexing (WDM) networks, and Sahasrabuddhe et al. [8] first came up with the concept of the light-tree for it. One can refer to the survey in [26] for a complete review of optical multicast in fixed-grid WDM networks. The proposals of flexible-grid EONs [5–7] leverage bandwidth-variable transponders (BV-Ts) and bandwidth-variable wavelength-selective switches (BV-WSS’) to manage the spectrum allocation in the optical layer with a fine granularity of 12.5 GHz or even less, and thus make the optical layer more spectrum-efficient and adaptive. Meanwhile, the flexible spectrum management in EONs transforms the well-known routing and wavelength assignment (RWA) problem in WDM networks into a more complex one, i.e., routing and spectrum assignment (RSA) [27]. Hence, the provisioning of optical multicast should be revisited for EONs.

In [9], the authors proposed two multicast-capable RSA (MC-RSA) algorithms for EONs and analyzed their performance. Liu et al. [12] improved the performance of MC-RSA by leveraging layered auxiliary graphs. Nevertheless, these two studies did not consider adaptive modulation-level selection in EONs. Multicast provisioning with impairment-aware routing, modulation and spectrum assignment (RMSA) was addressed in [11], where the authors designed two integer linear programming (ILP) models and a few heuristics. Then, multicast-capable RMSA (MC-RMSA) algorithms that support distance-adaptive transmission were developed in [13]. The authors of [14] introduced the light-forest to further improve the performance of MC-RMSA and proposed a polynomial-time approximation algorithm. In addition to these algorithmic contributions, the idea of software-defined EON (SD-EON) was leveraged in [28] to experimentally demonstrate the control plane operations for optical multicast.

However, the aforementioned studies all assumed that the optical switches are multicast-capable (MC), i.e., that they support light-splitting. Note that MC optical switches usually have complicated architectures and thus can be relatively expensive [26]. Therefore, it might not be cost-effective to build an EON with them, since the majority of the communications in the EON will still be for unicast services. This issue can be addressed by realizing multicast with multicast-incapable (MI) optical switches, i.e., establishing a logic light-tree for each multicast session with multiple unicast lightpaths [10]. Specifically, the study in [10] proposed a spectrum-flexible member-only relay (OL-M-SFMOR) scheme for this purpose.

Another benefit of realizing multicast with MI optical switches is that the multicast sessions can be reconfigured locally and more easily. This is because multicast with MC optical switches requires all the branches of a light-tree to have the same spectrum assignment, whereas the OL-M-SFMOR scheme does not [10]. In [22], the authors studied how to formulate and reconfigure multicast sessions dynamically, assuming that OL-M-SFMOR is used in an EON built with MI optical switches. Nevertheless, as explained in Section I, the algorithms proposed in [22] for multicast session selection still have a few drawbacks, which motivate us to revisit the sub-problem in this work and try to solve it better with a novel graph-aware DRL model.

Previously, Li et al. [29] designed a deep neural network (DNN) to predict the performance of multicast light-trees. However, the DNN uses a linear architecture, which is not good at processing graph-structured data, and the study did not address multicast reconfiguration. Due to their promising performance in processing graph-structured data, GNNs have attracted great attention [25], especially for complex optimizations in networks [30, 31].

III. PROBLEM DESCRIPTION

In this section, we explain the network model and the operation principle of dynamic multicast reconfiguration in EONs.

A. Network Model

The topology of an EON for DCI is modeled as a directed graph $G(V, E)$, where $V$ and $E$ are the sets of DCs and fiber links, respectively. Here, similar to the case in [22], we assume that the EON is built with MI optical switches. On each link $e \in E$, there are $F$ frequency slots (FS’), each of which has a bandwidth of 12.5 GHz. The BV-Ts that terminate each fiber link are assumed to be sliceable [32], which means that as long as there are sufficient spectrum resources on a link, its BV-Ts can always be sliced to facilitate the requested lightpath transmissions.

We model each multicast session as $MR(s, D, b, t)$, where $s \in V$ denotes the source, $D$ represents the set of destinations, $b$ is the bandwidth demand in Gbps, and $t$ stands for its life-time. In this work, we consider a dynamic EON environment in which each multicast session $MR(s, D, b, t)$ can arrive and leave on-the-fly, and during its life-time $t$, the DCs in $D$ can change over time too. Hence, when a new multicast session first comes in, we leverage the OL-M-SFMOR scheme in [10] to set up several lightpaths that establish a logic light-tree, such that each destination in $D$ can receive $b$ Gbps from the source $s$ through one or more lightpaths. Here, each lightpath serving the multicast session can only start and end at its member nodes (i.e., those in $\{s\} \cup D$) to save BV-Ts, according to the principle of OL-M-SFMOR [10]. As the optical signal is transmitted all-optically only within each lightpath, the RSA schemes of different lightpaths in the logic light-tree are independent, i.e., the spectrum assignments on different branches of the light-tree can be different.
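The session model above can be sketched as a small data structure. This is only an illustrative sketch: the class name `MulticastSession`, its field names, and the helper `is_member_only` are hypothetical and do not come from the paper or from [10]; a lightpath is assumed to be given as a sequence of node names.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass
class MulticastSession:
    """MR(s, D, b, t): source, destination set, bandwidth (Gbps), life-time."""
    source: str
    destinations: FrozenSet[str]
    bandwidth_gbps: float
    lifetime: float

    def members(self) -> FrozenSet[str]:
        # Member nodes are the source plus all current destinations, {s} U D.
        return frozenset({self.source}) | self.destinations


def is_member_only(session: MulticastSession, lightpath: List[str]) -> bool:
    """Check the member-only rule (assumed representation: node sequence).

    A lightpath may traverse any node, but it must start and end
    at member nodes of the session.
    """
    members = session.members()
    return (
        len(lightpath) >= 2
        and lightpath[0] in members
        and lightpath[-1] in members
    )
```

For example, with source `A` and destinations `B` and `C`, the path `A, X, B` satisfies the rule because only its end points need to be members, while `X, B` does not.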

After the initial provisioning of the multicast session, the DCs in $D$ can change over time. When a DC leaves the session or a new DC joins it, the lightpaths in the logic light-tree are updated, still with OL-M-SFMOR. Nevertheless, this might gradually degrade the optimality of the logic light-tree, i.e., the RSA schemes of certain lightpaths in it can become sub-optimal and waste spectrum resources. Therefore, we need to reconfigure the multicast session adaptively from time to time.

B. Dynamic Reconﬁguration of Multicast Sessions

We use Algorithm 1 to explain the operation principle of dynamic formulation and reconfiguration of multicast sessions [22]. Lines 2-15 explain how to formulate multicast sessions dynamically. Then, the reconfiguration of multicast sessions is triggered periodically to maintain the optimality of the logic light-trees of in-service multicast sessions. Here, we need to solve two sub-problems for the reconfiguration, i.e., session selection (Line 17) and session reconfiguration (Line 18). The session selection needs to find the most “critical” multicast sessions to reconfigure, i.e., those whose logic light-trees are farthest from their optima. The session reconfiguration rearranges the logic light-trees of the selected sessions to save spectrum resources, which can be done with either full or partial rearrangement [22]. Specifically, the full rearrangement recalculates the logic light-tree of each selected session with OL-M-SFMOR, while the partial rearrangement only chooses certain lightpaths in the logic light-tree of each selected session to reconfigure, according to the average cost of the lightpaths in it¹.

Algorithm 1: Dynamic provisioning of multicast sessions

 1  while the EON is operational do
 2      for each newly-arrived session MR_i(s, D, b, t) do
 3          try to set up a logic light-tree for it with OL-M-SFMOR;
 4          if the light-tree cannot be established then
 5              mark MR_i as blocked;
 6          end
 7      end
 8      for each existing session MR_j(s, D, b, t) do
 9          if t has expired then
10              remove MR_j and free its resources;
11          end
12          if D has changed then
13              update its logic light-tree with OL-M-SFMOR;
14          end
15      end
16      if it is time to reconfigure multicast sessions then
17          select existing multicast sessions to reconfigure;
18          reconfigure the selected multicast sessions;
19      end
20  end
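One pass of the outer loop of Algorithm 1 could be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: sessions are plain dictionaries with hypothetical keys `expires_at` and `dest_changed`, and all network operations (`provision`, `release`, `update_tree`, `select`, `reconfigure`, `is_reconfig_time`) are hypothetical callbacks standing in for OL-M-SFMOR and the selection and reconfiguration algorithms.

```python
def run_time_step(now, arrivals, in_service, provision, release,
                  update_tree, is_reconfig_time, select, reconfigure):
    """Run one iteration of the while-loop in Algorithm 1.

    Returns the list of newly-arrived sessions that were blocked.
    """
    blocked = []

    # Serve newly-arrived sessions.
    for session in arrivals:
        if provision(session):          # hypothetical: True if a light-tree was set up
            in_service.append(session)
        else:
            blocked.append(session)

    # Maintain existing sessions (iterate over a copy, the list is mutated).
    for session in list(in_service):
        if session["expires_at"] <= now:
            release(session)            # free its spectrum resources
            in_service.remove(session)
            continue                    # an expired session needs no update
        if session["dest_changed"]:
            update_tree(session)        # hypothetical: update the logic light-tree
            session["dest_changed"] = False

    # Periodic reconfiguration: session selection, then session reconfiguration.
    if is_reconfig_time(now):
        for session in select(in_service):
            reconfigure(session)

    return blocked
```

Passing the selection strategy in as `select` keeps the loop unchanged whether a heuristic or a learned policy decides which sessions to reconfigure.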

Throughout the aforementioned process, we need to balance the tradeoff between the number of lightpath reroutings and the overall blocking probability of multicast sessions. The sub-problem of session selection is the one more relevant to this tradeoff. Hence, in the following, we first review the D-/Q-value based selection strategies designed in [22], analyze their drawbacks, and then explain the principle of our graph-aware DRL based selection algorithm.

¹ Here, the cost of a lightpath was defined in [22]; it depends on the lightpath’s spectrum usage and the number of hops of its routing path.

The D-value of a logic light-tree is the hop-count of its longest source-to-destination branch [22]:

$$D(T) = \max_{d \in D} \operatorname{hops}(s \to d), \tag{1}$$

where $T$ is the light-tree for multicast session $MR(s, D, b, t)$ and $\operatorname{hops}(\cdot)$ returns the hop-count of a routing path. With the definition in (1), the D-value based selection (DTS) strategy first calculates the average D-value of all the in-service multicast sessions, and then selects those whose D-values are larger than the average to reconfigure. As DTS only considers the branch lengths of each logic light-tree but does not address the overall tree structure or the spectrum assignment on the links, it might not always select the most critical sessions to reconfigure.
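The DTS rule built on (1) can be illustrated with a short sketch. The data layout is an assumption made for illustration only: each session is represented by a dictionary that maps every destination to its routing path from the source, given as a node sequence, so that the hop-count is the number of nodes minus one. The function names are hypothetical.

```python
def d_value(paths_by_destination):
    """D(T) in (1): hop-count of the longest source-to-destination branch."""
    return max(len(path) - 1 for path in paths_by_destination.values())


def dts_select(sessions):
    """Select the sessions whose D-value is larger than the average D-value.

    `sessions` maps a session id to its {destination: path} dictionary.
    """
    d_values = {sid: d_value(paths) for sid, paths in sessions.items()}
    average = sum(d_values.values()) / len(d_values)
    return [sid for sid, d in d_values.items() if d > average]
```

With three sessions whose D-values are 2, 4, and 1, the average is 7/3, so only the session with D-value 4 is selected.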

The Q-value of a logic light-tree considers the overall tree structure and the spectrum assignment on its links [22]:

$$Q(T) = \frac{\operatorname{hops}(T^*) \cdot \operatorname{hidx}(T^*)}{\operatorname{hops}(T) \cdot \operatorname{hidx}(T)}, \tag{2}$$

where $T^*$ is the logic light-tree that is calculated with OL-M-SFMOR based on the current network status, and $\operatorname{hidx}(\cdot)$ returns the highest index of the used FS’ on a light-tree. With (2), the Q-value based selection (QTS) strategy first chooses a threshold $Q_{lb}$, and then selects the sessions whose Q-values are smaller than $Q_{lb}$ to reconfigure. Although QTS considers more information about a logic light-tree than DTS, the information is still limited, and the value of $Q_{lb}$ can only be determined empirically, which is rather difficult in a dynamic EON or for EONs with various topologies.
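The QTS rule built on (2) can be sketched in the same spirit. To avoid guessing how [22] evaluates $\operatorname{hops}(\cdot)$ and $\operatorname{hidx}(\cdot)$ on a whole tree, the sketch below takes those quantities as precomputed numbers; the function names, the tuple layout, and the threshold used in the example are hypothetical.

```python
def q_value(hops_cur, hidx_cur, hops_opt, hidx_opt):
    """Q(T) in (2): ratio of the recomputed tree T* to the current tree T."""
    return (hops_opt * hidx_opt) / (hops_cur * hidx_cur)


def qts_select(stats, q_lb):
    """Select the sessions whose Q-value is smaller than the threshold q_lb.

    `stats` maps a session id to (hops(T), hidx(T), hops(T*), hidx(T*)).
    """
    return [sid for sid, s in stats.items() if q_value(*s) < q_lb]
```

A Q-value close to 1 means the current tree is close to what OL-M-SFMOR would compute now, and smaller values indicate a larger gap. For instance, a session with hops(T) = 10, hidx(T) = 40, hops(T*) = 8, and hidx(T*) = 25 has Q = 200/400 = 0.5 and would be selected under an illustrative threshold of 0.8.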

To address the issues of DTS and QTS, this work proposes to select multicast sessions in a self-adaptive manner with graph-aware DRL. More specifically, the DRL model takes the topology information of the EON and the current provisioning scheme of a multicast session as the input, abstracts them as graph-structured data, and analyzes the data with GNNs to determine whether the multicast session should be selected for reconfiguration. Meanwhile, after offline training, the DRL model continues to be trained online so that it can optimize its decision-making automatically and adaptively according to the reward feedback from a dynamic EON environment, i.e., its effectiveness and universality can be guaranteed without empirical parameter adjustments.

IV. GRAPH-AWARE DRL BASED APPROACH

This section elaborates on our graph-aware DRL model for multicast session selection. Note that, to determine whether a multicast session should be selected for reconfiguration, we need to process graph-structured data (i.e., the tree topology and the spectrum usage on its links). This task is suitable for GNNs, because NNs with linear structures normally only handle data in Euclidean domains well [25].

A. System Architecture

We assume that the EON for DCI is operated with software-defined networking (SDN), which means that the control plane consists of a centralized controller that handles the tasks for network control and management (NC&M). Our graph-aware DRL model obtains the information about the EON and the multicast sessions in it from the controller, and selects the most critical sessions to reconfigure.

Fig. 1 explains the operation principle of the graph-aware DRL model, with its workflow illustrated by step numbers. Dynamic requests regarding multicast sessions (i.e., new multicast sessions and changes to in-service sessions) are first processed by the request handler, which dispatches them to both the traffic engineering database (TED) and the service provisioning module. As explained in Algorithm 1, the service provisioning module serves the dynamic requests and updates their provisioning results in the TED. Then, the reconfiguration of multicast sessions is triggered periodically, and it starts with the TED sending the current network status to the feature engineering module, which abstracts the network status into a state that consists of graph-structured data. The DRL agent uses two GNNs, i.e., the actor GNN (A-GNN) and the critic GNN (C-GNN), to analyze the state of each in-service multicast session and select certain sessions to reconfigure.

[Figure 1: block diagram with the request handler, traffic engineering database, service provisioning, feature engineering, DRL agent (A-GNN and C-GNN), session reconfiguration, reward calculation, experience buffer, online training, and global GNN modules.]

Fig. 1. Architecture and operation principle of our graph-aware DRL model.

Next, the action from the DRL agent (i.e., the selected multicast sessions) is forwarded to the session reconfiguration module, which works with the service provisioning module to reconfigure the selected sessions. After this, the TED sends the new network status to the reward calculation module to obtain the reward of the last action taken by the DRL agent. Then, we organize the state, action and reward as a training sample, and store it in the experience buffer. When enough entries of experience have been accumulated, the online training module invokes a training process to update the global GNN, which in turn updates the parameters of the A-GNN and C-GNN in the DRL agent accordingly.

B. Preprocessing of Data

To prepare the input to the graph-aware DRL model, we abstract the topology information of the EON and the current provisioning scheme of a multicast session as graph-structured data $G(V, \tilde{V}, E, \tilde{E})$, where $V$ and $E$ still represent the sets of DC nodes and fiber links in the EON, respectively, while $\tilde{V}$ and $\tilde{E}$ denote the features of the nodes and links in $V$ and $E$, respectively, regarding the current provisioning scheme of a multicast session $MR(s, D, b, t)$. Specifically, according to the current logic light-tree $T$ of $MR$, we classify the nodes in $V$ into 5 categories, i.e., the source $s$, the destinations in $D$, intermediate nodes on $T$ that used to be destinations, normal intermediate nodes, and nodes that are not on $T$. Then, the feature of a node $v \in V$ can be described with a corresponding vector $\tilde{v} \in \tilde{V}$ in one-hot coding. Here, we use 5 bits to represent the aforementioned node categories, respectively. For instance, if we have a node $v = s$, its feature vector $\tilde{v}$ should be $[1, 0, 0, 0, 0]$, and if the node $v$ is a destination, its feature vector $\tilde{v}$ should be $[0, 1, 0, 0, 0]$. On the other hand, the feature of a link $e \in E$ is defined as $\tilde{e} = f/F$, where $f$ is the number of unused FS’ on link $e$ and $F$ is the total number of FS’ there.
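The feature construction described above might look like the following sketch. Two things are assumptions rather than statements from the paper: the bit positions of the last three categories (the text only fixes the source and destination encodings, so the remaining bits simply follow the listing order), and the representation of a link as a list of booleans marking which FS’ are in use. All names are hypothetical.

```python
# Category order follows the listing in the text; the last three positions
# are an assumption, since only the first two encodings are given explicitly.
CATEGORIES = ("source", "destination", "former_destination",
              "intermediate", "off_tree")


def classify_node(v, source, destinations, former_destinations, tree_nodes):
    """Assign node v to one of the 5 categories for the current light-tree."""
    if v == source:
        return "source"
    if v in destinations:
        return "destination"
    if v not in tree_nodes:
        return "off_tree"
    if v in former_destinations:
        return "former_destination"
    return "intermediate"


def node_feature(category):
    """One-hot vector of length 5 for a node category."""
    vec = [0] * len(CATEGORIES)
    vec[CATEGORIES.index(category)] = 1
    return vec


def link_feature(slot_in_use):
    """Link feature f / F: fraction of unused frequency slots on the link."""
    return slot_in_use.count(False) / len(slot_in_use)
```

For a link with 100 FS’ of which 25 are occupied, `link_feature` returns 0.75.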

Here, for simplicity, we do not consider distance-adaptive modulation selection, and assume that all the lightpaths in each logic light-tree use the lowest modulation level (i.e., BPSK). Note that, to consider distance-adaptive modulation selection, the only difference is that our DRL model would need to learn the relation between the transmission distance of a lightpath and the number of FS’ that it uses. Hence, when preprocessing the graph-structured data $G(V, \tilde{V}, E, \tilde{E})$, we would need to include the length of each fiber link as an attribute, modify the feature of each link in $E$ and $\tilde{E}$ accordingly, and redesign the GNNs in the DRL model to accommodate the changes. This will be considered in our future work.

C. Structure of GNN

We design the GNNs used in our DRL model based on the graph convolutional network (GCN) [33]. The GCN takes the graph-structured data $G(V, \tilde{V}, E, \tilde{E})$ as the input, and performs two types of operations on the data, i.e., message transfer and information reduction. For the two types of operations, we define two functions as follows. The message function calculates the message to be sent from node $v$ to node $u$:

$$\operatorname{msg}(v, u) = \tilde{v} \cdot \tilde{e}, \quad v, u \in V, \; e = (v, u) \in E, \tag{3}$$

where nodes $v$ and $u$ are connected with a link $e = (v, u)$ in $G(V, \tilde{V}, E, \tilde{E})$. The reduction function reduces the messages that each node in $V$ receives from its neighbors:

$$\operatorname{rdu}(v) = \sum_{\{u : (u, v) \in E\}} \operatorname{msg}(u, v), \quad v \in V. \tag{4}$$

Then, we send $\{\operatorname{rdu}(v), \forall v \in V\}$ through a linear network in the GCN to obtain the new feature vector of each node $v$, and the transfer function from layer-$l$ to layer-$(l+1)$ is defined as

$$\tilde{v}^{(l+1)} = \sigma\left(W \cdot \operatorname{rdu}^{(l)}(v) + b\right), \tag{5}$$

where $W$ and $b$ denote the weight matrix and bias of the linear network, respectively, $\sigma(\cdot)$ is the nonlinear transfer function, and $\operatorname{rdu}^{(l)}(v)$ represents the reduced information for node $v$ obtained in layer-$l$ of the linear network.

After several GCN layers, we introduce a pooling layer to aggregate the processed graph-structured data into a single vector that represents it. Specifically, we select a pooling layer that averages the feature vectors of all the nodes as

$$\tilde{G} = \frac{1}{|V|} \sum_{v \in V} \tilde{v}^{(k)}, \tag{6}$$

where $\tilde{G}$ is the obtained vector, $|V|$ is the number of nodes in $V$, and $k$ is the number of GCN layers. Finally, we pass $\tilde{G}$ through several linear layers to obtain the final output.
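Equations (3) to (6) can be traced with a small NumPy sketch of the forward pass. It is an illustration under assumptions, not the authors' network: nodes are indexed by integers, directed links are `(u, v)` index pairs with one scalar feature each, ReLU is used as a stand-in for the unspecified nonlinearity $\sigma(\cdot)$, and the final linear layers after pooling are omitted. The function names are hypothetical.

```python
import numpy as np


def gcn_layer(node_feats, edges, edge_feats, W, b):
    """One GCN layer following (3)-(5).

    node_feats: array of shape (n, d_in), one row per node.
    edges:      list of directed links (u, v) as node-index pairs.
    edge_feats: list with one scalar link feature per entry of `edges`.
    W, b:       weight matrix of shape (d_out, d_in) and bias of shape (d_out,).
    """
    reduced = np.zeros_like(node_feats, dtype=float)
    for (u, v), e in zip(edges, edge_feats):
        # (3): message from u to v is u's feature scaled by the link feature.
        # (4): node v sums the messages from its incoming neighbours.
        reduced[v] += node_feats[u] * e
    # (5): linear network followed by a nonlinearity (ReLU is an assumption).
    return np.maximum(0.0, reduced @ W.T + b)


def graph_embedding(node_feats, edges, edge_feats, layers):
    """Stack GCN layers, then mean-pool over all nodes as in (6)."""
    h = np.asarray(node_feats, dtype=float)
    for W, b in layers:
        h = gcn_layer(h, edges, edge_feats, W, b)
    return h.mean(axis=0)
```

Because the layer parameters do not depend on the number of nodes and the mean in (6) yields a fixed-length vector for any $|V|$, the same parameters can be applied to graphs of different sizes, which is consistent with the topology-independence the paper aims for.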

*OREDO1HWZRUN$*11&*117KUHDG$*11&*11([SHULHQFH%XIIHU(QYLURQPHQW,QWHUDFW

*UDGLHQW

3DUDPHWHUV

3DUDPHWHU8SGDWHV7KUHDG$*11&*11([SHULHQFH%XIIHU(QYLURQPHQW,QWHUDFW7KUHDGQ$*11&*11([SHULHQFH%XIIHU(QYLURQPHQWQ,QWHUDFW

Fig. 2. Training of DRL model in A3C framework.

D. Design of Graph-aware DRL Model

We design the four basic elements of the DRL model as follows.

• Agent: The DRL agent is based on the asynchronous advantage actor-critic (A3C) framework [34], which uses multiple pairs of A-GNN and C-GNN for parallel online training in several threads. For each pair, the A-GNN provides an action policy $\pi(S)$ based on the graph-structured state $S$, and chooses the appropriate action according to $\pi(S)$. The C-GNN is responsible for learning the value of state $S$ and evaluating the action from the A-GNN based on it.

• State: The state $S$ contains the topology information of the EON and the current provisioning scheme of a multicast session; it is exactly the graph-structured data $G(V, \tilde{V}, E, \tilde{E})$ obtained by the data preprocessing.

• Action: The action is modeled with a binary variable $a$, i.e., $a = 1$ if the multicast session should be selected for reconfiguration, and $a = 0$ otherwise.

• Reward: We define the reward as

$$r = -k_1 \cdot N_{re} + k_2 \cdot [\operatorname{slots}(T) - \operatorname{slots}(T^*)] + k_3 \cdot [\operatorname{cuts}(T) - \operatorname{cuts}(T^*)], \tag{7}$$

where $k_1$, $k_2$ and $k_3$ are positive coefficients for normalization, $N_{re}$ represents the number of lightpath reroutings needed to reconfigure the multicast session, $T$ and $T^*$ denote the logic light-trees for the multicast session before and after the reconfiguration, respectively, $\operatorname{slots}(\cdot)$ returns the number of FS’ used by a logic light-tree, and $\operatorname{cuts}(\cdot)$ returns the number of spectrum cuts [19] caused by a logic light-tree. Hence, the reward in (7) decreases with the number of lightpath reroutings, and increases with the spectrum usage and spectrum cuts saved by the reconfiguration. In other words, by maximizing the reward, our graph-aware DRL model tries to invoke the smallest number of lightpath reroutings on a multicast session to achieve the largest savings in spectrum usage and spectrum cuts.
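The reward in (7) is simple enough to write out directly. In the sketch below the function name, the argument names, and the default coefficient values are hypothetical; the paper only states that $k_1$, $k_2$ and $k_3$ are positive normalization coefficients.

```python
def reward(n_reroutes, slots_before, slots_after, cuts_before, cuts_after,
           k1=1.0, k2=1.0, k3=1.0):
    """Reward in (7): penalize reroutings, credit saved slots and saved cuts.

    The default coefficients are placeholders, not values from the paper.
    """
    return (-k1 * n_reroutes
            + k2 * (slots_before - slots_after)
            + k3 * (cuts_before - cuts_after))
```

As a worked example with illustrative coefficients $k_1 = 1$, $k_2 = 0.5$, $k_3 = 0.25$: rerouting 2 lightpaths to reduce the slot usage from 20 to 14 and the spectrum cuts from 5 to 3 gives $r = -2 + 0.5 \cdot 6 + 0.25 \cdot 2 = 1.5$. If nothing is rerouted and the tree is unchanged, every term in (7) vanishes and the reward is 0.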

As shown in Fig. 2, we duplicate the A-GNN and C-GNN

into several copies, use one copy as the global GNN, and put

each of the others in a training thread to expedite the training

process. Speciﬁcally, each training thread uses its A-GNN and

C-GNN to interact with an EON environment independently

to obtain training samples. In an iterative manner, the global

GNN collects the gradients generated by the training threads,

leverages them to update the parameters of its A-GNN and

C-GNN, and synchronizes the updated parameters to the A-

GNNs and C-GNNs in the training threads. As each thread

is trained independently to obtain the gradients, the major

beneﬁt of this approach is that it effectively reduces the

correlations among training samples. Meanwhile, the multi-

thread training can make full use of available computing

resources to accelerate the online training.
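The interaction between the global GNN and the training threads can be sketched as follows. This is only an illustration of the push-gradients-then-synchronize pattern: the parameters are a flat list of floats instead of A-GNN and C-GNN weights, the gradient is that of a toy quadratic loss instead of the actor-critic loss, and all class and function names are assumptions made for this example.

```python
import threading


class GlobalModel:
    """Stand-in for the global GNN: a flat list of shared parameters."""

    def __init__(self, params, lr=0.01):
        self.params = list(params)
        self.lr = lr
        self.updates = 0
        self._lock = threading.Lock()

    def snapshot(self):
        with self._lock:
            return list(self.params)

    def apply_gradients(self, grads):
        """Apply one thread's gradients and return the updated parameters."""
        with self._lock:
            self.params = [p - self.lr * g for p, g in zip(self.params, grads)]
            self.updates += 1
            return list(self.params)


def toy_gradients(params):
    # Gradient of sum(p**2); a placeholder for the actor-critic loss gradient.
    return [2.0 * p for p in params]


def run_thread(global_model, iterations):
    local = global_model.snapshot()                  # start from the global copy
    for _ in range(iterations):
        grads = toy_gradients(local)                 # computed locally
        local = global_model.apply_gradients(grads)  # push, then synchronize


def train(n_threads=4, iterations=50):
    model = GlobalModel([1.0, -2.0])
    threads = [threading.Thread(target=run_thread, args=(model, iterations))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return model
```

The lock serializes updates to the shared parameters, while each thread is free to compute its gradients at its own pace, which is the property that decorrelates the training samples.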

Algorithm 2 explains the training process in a thread in detail, where we use T to record the number of training iterations, and T_max is the upper limit on training iterations. Lines 3-11 use the local A-GNN and C-GNN of the thread to interact with its own EON environment, for collecting training samples. Then, when enough training samples have been collected, Lines 12-20 perform one iteration of the training. Specifically, the gradients are first calculated locally with the obtained training samples (Lines 13-15), then they are forwarded to the global GNN (Line 16), and finally the thread updates the parameters of its A-GNN and C-GNN according to the feedback from the global GNN and prepares itself for the next iteration of training (Lines 17-19).
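The control flow of Algorithm 2 can be sketched in Python as below. The environment interface (`time_to_reconfigure`, `in_service_sessions`, `get_state`, `apply`, `advance`) and the three callables are hypothetical placeholders for the EON simulator, the A-GNN forward pass, the loss and gradient computation, and the exchange with the global GNN; in particular, `advance` is an assumption added here so that simulated time moves between reconfiguration instants.

```python
def train_thread(env, select_action, compute_gradients, push_to_global,
                 t_max, buffer_size):
    """One training thread, following the structure of Algorithm 2."""
    buffer = []        # experience buffer of (state, action, reward) samples
    iteration = 0      # T in Algorithm 2
    while iteration < t_max:
        if env.time_to_reconfigure():
            for session in env.in_service_sessions():
                state = env.get_state(session)
                action = select_action(state)        # A-GNN picks a in {0, 1}
                reward = env.apply(session, action)  # act, then observe reward
                buffer.append((state, action, reward))
        if len(buffer) >= buffer_size:               # "experience buffer is full"
            grads = compute_gradients(buffer)        # fresh gradients from the loss
            push_to_global(grads)                    # global GNN updates, syncs back
            buffer.clear()
            iteration += 1
        env.advance()                                # assumed: step simulated time
    return iteration
```

Passing the model-specific parts in as callables keeps the loop itself independent of how the A-GNN and C-GNN are implemented.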

V. PERFORMANCE EVALUATION

In this section, we conduct extensive numerical simulations

to evaluate our proposed approach based on graph-aware DRL.

A. Simulation Setup

The simulations use the four topologies in Fig. 3 for the

EONs for DCIs, to conﬁrm the universality of our proposal in

terms of topologies. The capacity of each fiber link is assumed to be F = 100 FS’, where each FS has a bandwidth of 12.5 GHz to deliver 12.5 Gbps throughput. For each multicast session MR(s, D, b, t), s and D are randomly selected from the nodes in the EON, D contains [2, 5] destinations initially, the bandwidth demand b is uniformly distributed within [50, 200] Gbps, and the life-time t follows the exponential distribution with an average of 500 time-units. As the multicast

sessions are dynamic, we generate new multicast sessions

according to the Poisson trafﬁc model, and for each in-service

multicast, destinations can join or leave dynamically during

its life-time. Speciﬁcally, the service time of each destination

follows the exponential distribution, and it leaves its multicast

session when the service time expires, while new destinations

are generated with the Poisson distribution. In Section V-E,

we will change the settings mentioned above and run more

simulations to further verify the universality of our proposal.
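A minimal sketch of a request generator matching the setup above is given below. It assumes that the traffic load in Erlangs equals the arrival rate times the average life-time (the definition stated in Section V-E), that the source is excluded from the destination set, and that the number of FS' is obtained by simple rounding up at 12.5 Gbps per FS without guard bands. The dynamic joining and leaving of destinations is omitted, and all names are illustrative.

```python
import math
import random

FS_GBPS = 12.5          # throughput per frequency slot, from the setup
MEAN_LIFETIME = 500.0   # average session life-time in time-units


def slots_needed(bandwidth_gbps):
    """Number of FS' needed for a demand (assumed: no guard bands)."""
    return math.ceil(bandwidth_gbps / FS_GBPS)


def generate_sessions(nodes, load_erlangs, horizon, rng):
    """Poisson arrivals over [0, horizon) at the rate implied by the load."""
    rate = load_erlangs / MEAN_LIFETIME       # new sessions per time-unit
    sessions = []
    now = rng.expovariate(rate)
    while now < horizon:
        source = rng.choice(nodes)
        others = [n for n in nodes if n != source]
        sessions.append({
            "arrival": now,
            "source": source,
            "destinations": rng.sample(others, rng.randint(2, 5)),
            "bandwidth_gbps": rng.uniform(50.0, 200.0),
            "lifetime": rng.expovariate(1.0 / MEAN_LIFETIME),
        })
        now += rng.expovariate(rate)          # exponential inter-arrival time
    return sessions
```

For example, a load of 25 Erlangs with an average life-time of 500 time-units corresponds to 0.05 new sessions per time-unit on average.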

The reconfiguration of multicast sessions is invoked every 100 time-units, and this interval is empirically set. The simulations compare our proposal based on graph-aware DRL with the heuristics for session selection in [22] (i.e., DTS and QTS), and consider both the partial and full rearrangements for session reconfiguration. To ensure sufficient statistical accuracy, we average the results from 5 independent runs to obtain each data point.

Algorithm 2: Training process of a thread

 1  T = 0;
 2  while T < T_max do
 3      if it is the time to reconfigure multicast sessions then
 4          for each in-service multicast session MR_i do
 5              get state S_i of MR_i;
 6              put S_i into the A-GNN to get an action a_i;
 7              apply a_i to the EON environment;
 8              calculate reward r_i;
 9              push {S_i, a_i, r_i} to experience buffer;
10          end
11      end
12      if experience buffer is full then
13          reset the gradients as 0;
14          calculate the loss with A-GNN and C-GNN using the training samples in experience buffer;
15          get the gradients with the loss;
16          send the gradients to the global GNN;
17          update the parameters of A-GNN and C-GNN according to the feedback from the global GNN;
18          empty the experience buffer;
19          T = T + 1;
20      end
21  end

B. Training Performance

We ﬁrst evaluate the training performance of our DRL

model. Note that, the DRL model needs to ﬁrst go through

the ofﬂine training that optimizes its parameters initially, to

make it suitable for being put into online operation/training.

Hence, we study the performance of the ofﬂine training in

this subsection, and will consider that of the online operation/training in subsequent ones. Figs. 4(a) and 4(b) show how the overall blocking probability and the average number of lightpath reroutings per session change in the training process, respectively, for the case in the NSFNET topology with the traffic load at 25 Erlangs. For comparisons, we also plot the results from DTS-based and QTS-based algorithms in Fig. 4. Here, all the algorithms assume that partial rearrangement is used to reconfigure the selected multicast sessions, and thus they are labeled with “-P”. In the following, the algorithms labeled with “-F” and “-P” mean that they accomplish multicast session reconfiguration with the full and partial rearrangements in [22], respectively. Meanwhile, for QTS-based algorithms, we can choose their thresholds on Q-value for session selection (i.e., $Q_{lb}$), and thus they are also labeled with their $Q_{lb}$ values. For instance, the QTS-P-0.8 in Fig. 4 means that the multicast reconfiguration uses QTS to select multicast sessions with $Q_{lb} = 0.8$, and reconfigures them with partial rearrangement.

The results in Fig. 4 show that compared with QTS-P-0.8, DTS-P achieves a lower blocking probability by invoking many more lightpath reroutings per session. Meanwhile, after being trained with more than 5,000 episodes, our DRL-P can obtain a blocking probability that is as low as that of QTS-P-0.8, while its average number of lightpath reroutings per session is lower than that of QTS-P-0.8 in Fig. 4(b). In other words, by utilizing its graph-aware intelligence, our DRL-P can balance the tradeoff between overall blocking probability and average lightpath reroutings per session much better than the two benchmarks that use deterministic strategies.

Fig. 3. EON topologies used in simulations. (a) 14-node NSFNET topology. (b) 28-node US Backbone (USB) topology. (c) 32-node European Backbone (EUB) topology. (d) 20-node random topology (RT).

Moreover, to clearly see how the average value of DRL-P’s reward correlates with the metrics in Figs. 4(a) and 4(b) in the training, we plot it in Fig. 4(c). Here, we empirically set the positive coefficients in (7) as $k_1 = 6.0$, $k_2 = 1.0$ and $k_3 = 2.0$. It can be seen that the average reward generally increases as the overall blocking probability and the average number of lightpath reroutings per session decrease in Figs. 4(a) and 4(b), respectively. Note that, we also check other traffic loads in NSFNET and the cases with full rearrangement, and confirm that our DRL-based approach can always achieve training performance similar to that in Fig. 4. Hence, the results are omitted due to the page limit.

Fig. 4. Training performance (NSFNET, 25 Erlangs). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

TABLE I
AVERAGE RUNNING TIME OF OFFLINE TRAINING (SECONDS)

Topology   NSFNET     USB        EUB        RT
DRL-P      25167.35   34698.01   34944.14   32342.45
DRL-F      31541.22   40423.58   45508.86   36843.80

Table I lists the running time of the ofﬂine training that

makes our DRL model suitable for online operation/training.

We observe that for the EONs with the NSFNET, US Back-

bone (USB), European Backbone (EUB), and random (RT)

topologies, the running time actually increases with the size

of the topology. Meanwhile, the DRL model that is for the

full rearrangement scheme usually takes longer ofﬂine training

time than that for the partial rearrangement scheme, regardless

of the topology. These trends are expected, because when

the topology of the EON becomes larger or the multicast

reconfiguration changes from partial rearrangement to full rearrangement, the problem of multicast reconfiguration actually becomes more complex.

Fig. 5. Results of dynamic operations (NSFNET, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

C. Performance in Dynamic Network Environments

Next, we evaluate the performance of our DRL-based approach by putting the graph-aware DRL model, which has passed the offline training, in a dynamic network environment with the NSFNET topology, and compare its performance with DTS-based and QTS-based algorithms. Figs. 5 and 6 show the simulation results for the cases using partial and full rearrangements, respectively. Here, “NR” denotes the case without multicast session reconfiguration. Note that, in Figs.

5(a) and 6(a), when the trafﬁc load is above 35 Erlangs,

the blocking probabilities from the algorithms with multicast

session reconﬁguration can actually exceed the practical range

of the blocking probability in a real-world EON. Although the

trafﬁc loads exceed what should be considered in a real-world

EON, we still simulate them to get a complete picture about

how the algorithms will perform at various trafﬁc loads. The

DTS-based algorithm still provides the lowest overall blocking

probability with the largest number of lightpath reroutings

per session. By combining the results in the figures, we can conclude that to keep the overall blocking probabilities comparable to those of DTS-based and QTS-based algorithms, our DRL model always requires the smallest number of lightpath reroutings per session, for all the simulation scenarios considered in Figs. 5 and 6. Hence, our graph-aware

DRL-based approach can effectively reduce the operational

complexity of dynamic multicast session reconﬁguration, with-

out sacriﬁcing much performance on request blocking.

Fig. 6. Results of dynamic operations (NSFNET, full rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

Fig. 7. Tradeoff between blocking probability and average lightpath reroutings per session (NSFNET, 40 Erlangs). (a) Partial rearrangement. (b) Full rearrangement.

Moreover, we notice that the QTS-based algorithm can change the value of $Q_{lb}$ to balance the tradeoff between blocking probability and average lightpath reroutings per session. Hence, we change $Q_{lb}$ to obtain different pairs of blocking probability and average lightpath reroutings per session, and plot the results in Fig. 7, when the traffic load is set as 40 Erlangs. Here, we take average lightpath reroutings per session and blocking probability as the X-axis and Y-axis, respectively, to illustrate the tradeoff more clearly. It can be seen that regardless of whether partial or full rearrangement is used, the data point for the results from the DRL model is always below the curve for the results from the QTS-based algorithm. This verifies that the DRL model balances the tradeoff better than QTS, regardless of the choice of $Q_{lb}$. In addition to 40 Erlangs, the simulations also check other traffic loads, and similar trends can be obtained.
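The "below the curve" comparison can be made concrete with a small helper. The sketch below assumes the QTS results are available as a list of (average reroutings per session, blocking probability) pairs, one per threshold value, and that linear interpolation between neighboring points is an acceptable reading of the curve; both the function names and this interpolation choice are assumptions, and no values from the figures are used.

```python
def blocking_on_curve(curve, reroutings):
    """Interpolate a tradeoff curve of (reroutings, blocking) points linearly."""
    pts = sorted(curve)
    if len(pts) < 2:
        raise ValueError("need at least two points to form a curve")
    if not pts[0][0] <= reroutings <= pts[-1][0]:
        raise ValueError("point lies outside the range covered by the curve")
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= reroutings <= x1:
            if x1 == x0:
                return min(y0, y1)
            return y0 + (y1 - y0) * (reroutings - x0) / (x1 - x0)


def is_below_curve(curve, point):
    """True if the point needs less blocking than the curve at equal reroutings."""
    x, y = point
    return y < blocking_on_curve(curve, x)
```

A point for which `is_below_curve` holds achieves a lower blocking probability than the interpolated benchmark at the same number of reroutings, which is the sense in which one operating point dominates a tradeoff curve.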

D. Universality across Different Topologies

We then evaluate the universality of our graph-aware DRL-based approach across different topologies. The operation principle of our graph-aware DRL model ensures that the DRL model trained in one EON topology can be directly applied to solve the problem of dynamic multicast session reconfiguration in others. Specifically, we only need to abstract the new topology information of the EON and the provisioning scheme of each multicast session as graph-structured data $G(V, \tilde{V}, E, \tilde{E})$ and input the data to the trained DRL model, while the DRL model does not need to be redesigned or retrained. To verify this, the simulations apply the DRL model trained in NSFNET to solve the problem of dynamic multicast session reconfiguration in the other topologies in Fig. 3.

Fig. 8. Results of dynamic operations (USB, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.
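Why a graph-based model can be reused on a different topology can be illustrated with a generic message-passing layer whose weights are shared by all nodes, so that their shapes depend only on the feature dimensions and not on the number of nodes or links. The sketch below uses plain mean aggregation with a sigmoid readout; it is not the A-GNN architecture of this work, and all names and the choice of aggregation are assumptions made for illustration.

```python
import math


def mean_aggregate_layer(adjacency, features, w_self, w_neigh):
    """One message-passing step with weights shared by every node.

    adjacency: dict mapping node -> list of neighbor nodes
    features:  dict mapping node -> list of floats (length d_in)
    w_self, w_neigh: d_out x d_in weight matrices as nested lists
    """
    def matvec(w, x):
        return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

    out = {}
    for node, x in features.items():
        neigh = adjacency.get(node, [])
        if neigh:
            mean = [sum(features[n][k] for n in neigh) / len(neigh)
                    for k in range(len(x))]
        else:
            mean = [0.0] * len(x)
        z = [a + b for a, b in zip(matvec(w_self, x), matvec(w_neigh, mean))]
        out[node] = [max(0.0, v) for v in z]          # ReLU activation
    return out


def select_probability(adjacency, features, w_self, w_neigh, w_out):
    """Graph-level readout: mean-pool node embeddings, then a sigmoid."""
    h = mean_aggregate_layer(adjacency, features, w_self, w_neigh)
    pooled = [sum(v[k] for v in h.values()) / len(h)
              for k in range(len(w_self))]
    logit = sum(w * p for w, p in zip(w_out, pooled))
    return 1.0 / (1.0 + math.exp(-logit))
```

The same `w_self`, `w_neigh` and `w_out` can be applied to a 14-node graph and to a 28-node graph without any change, because the pooling step removes the dependence on graph size.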

Fig. 8 shows the results for the dynamic operations in USB,

when partial rearrangement is considered. We can see that the results follow trends similar to those in Fig. 5. To further clarify the adaptability of our DRL model, we take the case of traffic load at 25 Erlangs in USB as an example, and plot how the performance metrics change over the simulation time in Fig. 9. As we directly apply the DRL model trained in NSFNET to the EON with the USB topology, a zero-shot transfer learning (i.e., applying a trained DRL model to an unseen environment for the same task [35]) is actually considered. It can be seen that due to the adaptability of our DRL model, it achieves relatively good results on both performance metrics at the beginning of the online operation/training, and both the overall blocking probability and the average number of lightpath reroutings per session change only slightly afterwards.

Figs. 10 and 11 illustrate the results obtained by directly applying the DRL model trained in NSFNET to the EONs with EUB and RT topologies, respectively. The results still follow trends similar to those in Fig. 5. Note that, when the EON topology changes, we might need to change the value of $Q_{lb}$ (i.e., the threshold on Q-value for session selection) for QTS-based algorithms empirically. This is the reason why we simulate QTS-P-0.9 in EUB (as shown in Fig. 10). On the other hand, with its graph-aware intelligence, our DRL model can adapt to different topologies without such manual adjustments. Although the results in Figs. 8-11 are all about the cases that use partial rearrangement, we also check those with full rearrangement and confirm that our DRL-based approach achieves similar performance in them too. Therefore, the simulations confirm the universality of our DRL model across different topologies.

Fig. 9. Performance on zero-shot transfer learning (USB, 25 Erlangs). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

Fig. 10. Results of dynamic operations (EUB, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

Table II lists the average running time per multicast session reconfiguration of the algorithms. Here, for our DRL model, the running time is only for its online operation/training, because the offline training should be finished before the DRL model can be put into operation and its running time has already been summarized in Table I. The results in Table II suggest that the running time of all the algorithms is comparable and short enough to adapt to dynamic operations. The running time of our DRL model is less than that of the QTS-based algorithm in all the simulation scenarios, while the DTS-based algorithm runs the fastest, as it only makes decisions according to the depth of each logical light-tree. Meanwhile, the running time of each algorithm generally increases with the size of the topology, and when switching from partial rearrangement to full rearrangement.

Fig. 11. Results of dynamic operations (RT, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

TABLE II
AVERAGE RUNNING TIME PER MULTICAST RECONFIGURATION (SECONDS)

Topology   NSFNET   USB      EUB      RT
DRL-P      0.1943   0.2494   0.2383   0.2908
QTS-P      0.2964   0.3941   0.4040   0.3986
DTS-P      0.0693   0.0900   0.1016   0.0867
DRL-F      0.2104   0.2823   0.3022   0.3110
QTS-F      0.3620   0.4648   0.4927   0.4448
DTS-F      0.1995   0.2723   0.2882   0.2687

E. Generalization to Various EON Settings

Finally, we consider more simulation settings to verify that our proposed graph-aware DRL model can adapt to various EON settings. First of all, we notice that the assumption of the Poisson traffic model might not hold in today’s Internet. Hence, we design a new simulation scenario, in which the multicast sessions are generated dynamically in a bursty manner, i.e., they come in according to the realistic ON/OFF pattern for bursty Internet traffic [36]. Note that, we still quantify the traffic load of the multicast sessions with Erlangs, i.e., the product of the average number of new sessions per unit-time and the average lifetime of each session in time-units. The results of the simulations with NSFNET are shown in Fig. 12, and by comparing them with those in Fig. 5, we can see similar trends. Meanwhile, as the bursty traffic model is more likely to cause session blockings, the blocking probability of each algorithm in Fig. 12 is higher. Nevertheless, our DRL model still retains its advantage of significantly reducing the number of reconfiguration operations without sacrificing the performance on blocking probability. With the bursty traffic model, we also simulate other EON topologies and test the algorithms with full rearrangement, and the results always follow trends similar to those in Fig. 12.

Fig. 12. Results of dynamic operations with bursty multicast sessions (NSFNET, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

Secondly, we increase the number of FS’ on each fiber link to 200, for simulating the EONs with more spectrum resources. The results of the simulations with NSFNET are shown in Fig. 13, and by comparing them with those in Fig. 5, we still see similar trends. Meanwhile, since there are more spectrum resources in the EON, we need to increase the traffic load to see the same blocking probability. Our DRL model still exhibits advantages over the heuristics, which suggests that its advantage is not affected by the change of spectrum resources in the EON. With the new setting of spectrum resources, we also simulate other EON topologies and test the algorithms with full rearrangement, and the results always follow trends similar to those in Fig. 13.

Thirdly, considering the fact that in a real-world EON, there are unicast and anycast lightpaths coexisting with multicast sessions, we design a realistic simulation scenario in which unicast and anycast lightpaths are used as the background traffic of multicast sessions. Specifically, to create a stressful scenario for our DRL model, we make the total bandwidth demands of unicast, anycast, and multicast account for 25%, 25% and 50% of the overall bandwidth usage in the EON, respectively. The results of the simulations with NSFNET are shown in Fig. 14, and by comparing them with those in Fig. 5, we can see that the blocking probability of multicast sessions becomes lower. This is because for the same traffic load, unicast and anycast lightpaths generally require fewer spectrum resources than multicast sessions, and thus the total spectrum usage is actually smaller. Meanwhile, for the same reason, the gap on blocking probability between the multicast reconfiguration algorithms and the case without multicast reconfiguration (i.e., NR) becomes smaller too. Note that, compared with the QTS-based and DTS-based algorithms, our DRL model still invokes a smaller number of lightpath reroutings per session to maintain almost the same blocking probability. This verifies the effectiveness of our DRL model in the EON environment that contains mixed types of traffic demands.

Fig. 13. Results of dynamic operations with 200 FS’ per fiber link (NSFNET, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

Fig. 14. Results of dynamic operations with unicast/anycast background traffic (NSFNET, partial rearrangement). (a) Overall blocking probability. (b) Average number of lightpath reroutings per session.

VI. CONCLUSION

In this work, we revisited the problem of how to formulate

and reconﬁgure multicast sessions in an EON, and proposed

a DRL model based on GNNs that can solve the sub-problem

of multicast session selection in a more universal and adap-

tive way. Speciﬁcally, we abstracted the state information

of each multicast session as graph-structured data, which

can be directly analyzed by our graph-aware DRL model.

Then, the graph-based reasoning capability of our proposal

made sure that the state information of each multicast session

can be analyzed in depth for dynamic reconﬁguration, and

facilitated the universality across different topologies. Hence,

an important takeaway is that our graph-aware design of the

DRL model made its architecture and operation independent of

the EON’s topology, and thus avoided the hassle of redesigning

its architecture to adapt to different EON topologies.

Simulation results veriﬁed that compared with the exist-

ing deterministic algorithms based on DTS and QTS, our

graph-aware DRL-based approach can significantly reduce

the average lightpath reroutings per multicast session while

maintaining the overall blocking probability approximately

at the same level. This suggested that our proposal can

balance the tradeoff between the number of reconﬁguration

operations and blocking performance much better than the

existing algorithms. Moreover, our simulations also conﬁrmed

that the DRL model trained in one EON environment can

easily adapt to solve the problem of dynamic multicast session

reconﬁguration in EONs with various settings (e.g., different

topologies, spectrum resources, trafﬁc models and request

types). Therefore, the universality of our proposal helped to

effectively save the time and effort that are needed to adjust

the DRL model according to an EON’s setting, and provided

a more realistic solution for network automation.

ACKNOWLEDGMENTS

This work was supported in part by the National Key

R&D Program of China (2020YFB1806400), NSFC project

61871357, SPR Program of CAS (XDC02070300), and Fun-

damental Funds for Central Universities (WK3500000006).

REFERENCES

[1] Cisco Visual Networking Index, 2017-2022. [Online]. Available:

https://www.cisco.com/c/en/us/solutions/collateral/service-provider/

visual-networking-index-vni/white-paper-c11-741490.html.

[2] P. Lu, L. Zhang, X. Liu, J. Yao, and Z. Zhu, “Highly efﬁcient data

migration and backup for Big Data applications in elastic optical inter-

data-center networks,” IEEE Netw., vol. 29, pp. 36–42, Sept./Oct. 2015.

[3] N. Laoutaris, M. Sirivianos, X. Yang, and P. Rodriguez, “Inter-datacenter

bulk transfers with NetStitcher,” in Proc. of ACM SIGCOMM 2011, pp.

74–85, Aug. 2011.

[4] V. Dukic, C. Gkantsidis, T. Karagiannis, F. Parmigiani, A. Singla,

M. Filer, J. Cox, A. Ptasznik, N. Harland, W. Saunders, and C. Belady,

“Beyond the mega-data center: networking multi-data center regions,”

in Proc. of ACM SIGCOMM 2020, pp. 765–781, Aug. 2020.

[5] O. Gerstel, M. Jinno, A. Lord, and B. Yoo, “Elastic optical networking:

a new dawn for the optical layer?” IEEE Commun. Mag., vol. 50, pp.

s12–s20, Feb. 2012.

[6] Z. Zhu, W. Lu, L. Zhang, and N. Ansari, “Dynamic service provisioning

in elastic optical networks with hybrid single-/multi-path routing,” J.

Lightw. Technol., vol. 31, pp. 15–22, Jan. 2013.

[7] L. Gong and Z. Zhu, “Virtual optical network embedding (VONE) over

elastic optical networks,” J. Lightw. Technol., vol. 32, pp. 450–460, Feb.

2014.

[8] L. Sahasrabuddhe and B. Mukherjee, “Light trees: optical multicasting

for improved performance in wavelength routed networks,” IEEE Com-

mun. Mag., vol. 37, pp. 67–73, Feb. 1999.

[9] Q. Wang and L. Chen, “Performance analysis of multicast trafﬁc over

spectrum elastic optical networks,” in Proc. of OFC 2012, pp. 1–3, Mar.

2012.

[10] X. Liu, L. Gong, and Z. Zhu, “On the spectrum-efﬁcient overlay multi-

cast in elastic optical networks built with multicast-incapable switches,”

IEEE Commun. Lett., vol. 17, pp. 1860–1863, Sept. 2013.

[11] L. Gong, X. Zhou, X. Liu, W. Zhao, W. Lu, and Z. Zhu, “Efﬁcient

resource allocation for all-optical multicasting over spectrum-sliced

elastic optical networks,” J. Opt. Commun. Netw., vol. 5, pp. 836–847,

Aug. 2013.

[12] X. Liu, L. Gong, and Z. Zhu, “Design integrated RSA for multicast

in elastic optical networks with a layered approach,” in Proc. of

GLOBECOM 2013, pp. 2346–2351, Dec. 2013.

[13] K. Walkowiak, R. Goscien, M. Klinkowski, and M. Wozniak, “Optimiza-

tion of multicast trafﬁc in elastic optical networks with distance-adaptive

transmission,” IEEE Commun. Lett., vol. 18, pp. 2117–2120, Dec. 2014.

[14] Z. Zhu, X. Liu, Y. Wang, W. Lu, L. Gong, S. Yu, and N. Ansari,

“Impairment- and splitting-aware cloud-ready multicast provisioning in

elastic optical networks,” IEEE/ACM Trans. Netw., vol. 25, pp. 1220–

1234, Apr. 2017.

[15] A. Mahimkar, A. Chiu, R. Doverspike, M. Feuer, P. Magill, E. Mavro-

giorgis, J. Pastor, S. Woodward, and J. Yates, “Bandwidth on demand

for inter-data center communication,” in Proc. of HOTNETS 2011, pp.

1–6, Nov. 2011.

[16] A. Malis, B. Wilson, G. Clapp, and V. Shukla, “Requirements for very

fast setup of GMPLS label switched paths (LSPs),” RFC 7709, Nov.

2015. [Online]. Available: https://tools.ietf.org/html/rfc7709.

[17] A. Castro, L. Velasco, M. Ruiz, M. Klinkowski, J. Fernandez-Palacios,

and D. Careglio, “Dynamic routing and spectrum (re) allocation in future

ﬂexgrid optical networks,” Comput. Netw., vol. 56, pp. 2869–2883, Aug.

2012.

[18] M. Klinkowski, M. Ruiz, L. Velasco, D. Careglio, V. Lopez, and

J. Comellas, “Elastic spectrum allocation for time-varying trafﬁc in

ﬂexgrid optical networks,” IEEE J. Sel. Areas Commun., vol. 31, pp.

26–38, Dec. 2012.

[19] Y. Yin, H. Zhang, M. Zhang, M. Xia, Z. Zhu, S. Dahlfort, and S. Yoo,

“Spectral and spatial 2D fragmentation-aware routing and spectrum

assignment algorithms in elastic optical networks,” J. Opt. Commun.

Netw., vol. 5, pp. A100–A106, Oct. 2013.

[20] Y. Sone, A. Watanabe, W. Imajuku, Y. Tsukishima, B. Kozicki,

H. Takara, and M. Jinno, “Bandwidth squeezed restoration in spectrum-

sliced elastic optical path networks (SLICE),” J. Opt. Commun. Netw.,

vol. 3, pp. 223–233, Mar. 2012.

[21] W. Lu and Z. Zhu, “Malleable reservation based bulk-data transfer

to recycle spectrum fragments in elastic optical networks,” J. Lightw.

Technol., vol. 33, pp. 2078–2086, May 2015.

[22] M. Zeng, Y. Li, W. Fang, W. Lu, X. Liu, H. Yu, and Z. Zhu, “Control

plane innovations to realize dynamic formulation of multicast sessions in

inter-DC software-deﬁned elastic optical networks,” Opt. Switch. Netw.,

vol. 23, pp. 259–269, Jan. 2017.

[23] R. Gu, Z. Yang, and Y. Ji, “Machine learning for intelligent optical

networks: A comprehensive survey,” J. Netw. Comput. Appl., vol. 157,

pp. 1–22, May 2020.

[24] F. Scarselli, M. Gori, A. Tsoi, M. Hagenbuchner, and G. Monfardini,

“The graph neural network model,” IEEE Trans. Neur. Net., vol. 20, pp.

61–80, Jan. 2009.

[25] J. Zhou, G. Cui, S. Hu, Z. Zhang, C. Yang, Z. Liu, L. Wang, C. Li, and

M. Sun, “Graph neural networks: A review of methods and applications,”

AI Open, vol. 1, pp. 57–81, Jan. 2021.

[26] A. Ding and G. Poo, “A survey of optical multicast over WDM

networks,” Comput. Commun., vol. 26, pp. 193–200, Feb. 2003.

[27] K. Christodoulopoulos, I. Tomkos, and E. Varvarigos, “Elastic band-

width allocation in ﬂexible OFDM-based optical networks,” J. Lightw.

Technol., vol. 29, pp. 1354–1366, May 2011.

[28] P. Zhu, J. Li, Y. Chen, X. Chen, Z. Wu, D. Ge, Z. Chen, and Y. He,

“Experimental demonstration of EON node supporting reconﬁgurable

optical superchannel multicasting,” Opt. Express, vol. 23, pp. 20495–20504, Aug. 2015.

[29] X. Li, L. Zhang, J. Wei, and S. Huang, “Deep neural network based

OSNR and availability predictions for multicast light-trees in optical

WDM networks,” Opt. Express, vol. 27, pp. 10648–10669, 2020.

[30] K. Rusek, J. Suarez-Varela, A. Mestres, P. Barlet-Ros, and A. Cabellos-

Aparicio, “Unveiling the potential of graph neural networks for network

modeling and optimization in SDN,” in Proc. of SOSR 2019, pp. 140–

151, Oct. 2019.

[31] P. Sun, J. Lan, J. Li, Z. Guo, Y. Hu, and T. Hu, “Efﬁcient ﬂow migration

for NFV with graph-aware deep reinforcement learning,” Comput. Netw.,

vol. 183, p. 107575, Sept. 2020.

[32] N. Sambo, P. Castoldi, A. D’Errico, E. Riccardi, A. Pagano, M. Moreolo,

J. Fabrega, D. Raﬁque, A. Napoli, S. Frigerio, E. Salas, G. Zervas,

M. Nolle, J. Fischer, A. Lord, and J. Gimenez, “Next generation sliceable

bandwidth variable transponders,” IEEE Commun. Mag., vol. 53, no. 2,

pp. 163–171, 2015.

[33] M. Defferrard, X. Bresson, and P. Vandergheynst, “Convolutional neural

networks on graphs with fast localized spectral ﬁltering,” arXiv preprint

arXiv:1606.09375, pp. 1–9, Feb. 2017.

[34] V. Mnih, A. Badia, M. Mirza, A. Graves, T. Lillicrap, T. Harley,

D. Silver, and K. Kavukcuoglu, “Asynchronous methods for deep

reinforcement learning,” in Proc. of ICML 2016, pp. 1928–1937, Jun.

2016.

[35] Z. Zhang and V. Saligrama, “Zero-shot learning via semantic similarity

embedding,” in Proc. of ICCV 2015, pp. 4166–4174, Dec. 2015.

[36] X. Yang, “Designing traffic profiles for bursty Internet traffic,” in Proc.

of GLOBECOM 2002, pp. 1–6, Dec. 2002.

A preview of this full-text is provided by Optica Publishing Group.

Content available from Journal of Optical Communications and Networking

This content is subject to copyright. Terms and conditions apply.

Coordinating Multiple Light-Trails in Multicast Elastic Optical Networks With Adaptive Modulation

Article

Full-text available

Feb 2023

Optical multicasting has been considered resource efficient for multicast services. Light-tree and light-trail are two technologies that support optical multicasting while the former requires many splitters and thus experiences significant power loss. In this paper, we consider using the light-trail technology for the accommodation of multicast requests in elastic optical networks with adaptive modulation. For better spectrum efficiency, we consider accommodating each multicast by multiple light-trails. We formulate the problem by Mixed Integer Linear Programming (MILP) and propose efficient heuristic algorithms. For the impact of accommodation sequence on the algorithm performance, apart from the traditional sequence among different requests, we consider an additional sequence among the destinations of a multicast. For efficient multicast accommodation, we propose several strategies and compare their performances through a range of cases. To avoid a destination occupying excessive resources in certain cases of joining multiple light-trails, we propose an efficient algorithm to delete some duplicated destinations. Numerical results show that the proposed heuristic algorithms significantly outperform a benchmark algorithm and one performs close to the optimal MILP. Also, the algorithm for deleting certain destination replicas largely reduces the spectrum and transmitter usages, up to 41% and 20% for the cases considered, respectively.

Content Connectivity Based Polyhedron Protection Against Multiple Link Failures in Optical Data Center Networks

Chapter

Full-text available

Dec 2022

To further improve the resource efficiency of the p-polyhedron protection scheme against multi-link failures in optical data center networks (ODCNs), the content connectivity is considered when constructing the p-polyhedron structure. In this paper, the content connectivity-based polyhedron protection (CCPP) scheme is proposed. An ILP model and a heuristic algorithm are developed to realize the CCPP scheme. Numerical results show that the proposed CCPP scheme has a lower network redundancy. Moreover, the network redundancy of the CCPP scheme is positively correlated with the degree of content connectivity.

Dynamic slicing of multidimensional resources in DCI-EON with penalty-aware deep reinforcement learning

Article

Full-text available

Jan 2024
J OPT COMMUN NETW

With the increasing demand for dynamic cloud computing services, data center interconnections based on elastic optical networks (DCI-EON) require efficient allocation methods for spectrum, access IP bandwidth, and compute resources. Dynamic slicing of multidimensional resources in DCI-EON has emerged as a promising solution. However, improper reallocation of resources can diminish the benefits of slice reconfiguration, and different resource reconfiguration techniques can lead to varying degrees of service degradation for existing services. In this paper, we propose a prediction-based dynamic slicing approach (DS-DRL-RW) that leverages penalty-aware deep reinforcement learning (DRL) to optimize resource allocation while considering the trade-off between the benefits and penalties of slice reconfiguration. DS-DRL-RW employs statistical prediction to obtain a coarse-grained solution for dynamic slicing that does not differentiate among multidimensional resources. Subsequently, through focused DRL training based on the coarse-grained solution, the accurate result for multidimensional resource slicing is achieved. Moreover, DS-DRL-RW comprehensively considers the benefits and penalties associated with different reconfiguration techniques after slice reconfiguration, enabling the determination of a suitable reconfiguration strategy. Simulation results demonstrate that DS-DRL-RW improves training efficiency and reduces the blocking rate of dynamic services by integrating slice traffic prediction and DRL. It effectively addresses both direct penalties from reconfiguration and indirect penalties from resource waste, thereby enhancing multidimensional resource utilization. DS-DRL-RW effectively handles the diverse penalties associated with various reconfiguration techniques and selects the appropriate reconfiguration strategy. Furthermore, DS-DRL-RW prioritizes the different quality requirements of services in slices, such as completion time, to avoid service degradation.

Multi-Agent DRL for Distributed Routing and Data Scheduling in Interplanetary Networks

Conference Paper

Full-text available

Aug 2023

With the fast development of deep space exploration missions, data transfer in interplanetary networks (IPNs) is gaining increasing attention. In this work, we propose a deep reinforcement learning (DRL) based routing and data scheduling approach, which leverages a multi-agent setup for distributed operations and aims to balance the trade-off between the average end-to-end (E2E) latency and the delivery ratio of interplanetary data transfers (IP-DTs). Specifically, DRL agents based on asynchronous advantage actor-critic (A3C) are deployed on each IPN node to handle the routing and data scheduling of IP-DTs there separately. Simulation results confirm that our proposal can handle the routing and data scheduling of IP-DTs more adaptively and balance the trade-off between the delivery ratio and average E2E latency better than the benchmarks.

Resource Allocation in Multicore Elastic Optical Networks: A Deep Reinforcement Learning Approach

Article

Full-text available

Mar 2023
COMPLEXITY

A deep reinforcement learning (DRL) approach is applied, for the first time, to solve the routing, modulation, spectrum, and core allocation (RMSCA) problem in dynamic multicore fiber elastic optical networks (MCF-EONs). To do so, a new environment was designed and implemented to emulate the operation of MCF-EONs, taking into account the modulation format-dependent reach and intercore crosstalk (XT), and four DRL agents were trained to solve the RMSCA problem. The blocking performance of the trained agents was compared through simulation to three baseline RMSCA heuristics. Results obtained for the NSFNet and COST239 network topologies under different traffic loads show that the best-performing agent achieves, on average, up to a four-times decrease in blocking probability with respect to the best-performing baseline heuristic method.

Study of Optical Networks, 5G, Artificial Intelligence and Their Applications

Preprint

Full-text available

Jan 2023

This paper discusses the application of artificial intelligence (AI) technology in optical communication networks and 5G. It primarily introduces representative applications of AI technology and potential risks of AI technology failure caused by the openness of optical communication networks, and proposes some coping strategies, mainly including modeling AI systems through modularization and miniaturization, combining with traditional classical network modeling and planning methods, and improving the effectiveness and interpretability of AI technology. At the same time, it proposes response strategies based on network protection for the possible failure and attack of AI technology.

TABDeep: A two-level action branch architecture-based deep reinforcement learning for distributed sub-tree scheduling of online multicast sessions in EON

Article

Feb 2024
COMPUT NETW

DRL assisted multi-objective algorithm for multicast scheduling in elastic optical network

Article

Nov 2023
COMPUT NETW

Bulk Transfers With GCN Scheduling In Digital Twin Networks

Conference Paper

May 2023

Frequency Dispersion Index based Spectrum Defragmentation for Multicast Services in Fixed/Flex-Grid Optical Networks

Conference Paper

Jul 2023

Deep neural network based OSNR and availability predictions for multicast light-trees in optical WDM networks

Article

Full-text available

Mar 2020
OPT EXPRESS

The quality of transmission (QoT) of a light-tree is influenced by a variety of physical impairments including attenuation, dispersion, amplified spontaneous emission (ASE), nonlinear effects, light-splitting, etc. Moreover, a light-tree has multiple destinations that lie at different distances from the source node, so the QoT of the received optical signal differs from one destination to another. Since the optical network is a live network, the real-time network state is difficult to obtain. Therefore, it is difficult to accurately and rapidly determine the QoT or availability of a light-tree. However, knowing the QoT or availability of a light-tree in advance not only guarantees the quality of service (QoS) but also helps network optimization design. This paper studies the problems of optical signal-to-noise ratio (OSNR) and availability prediction for multicast light-trees by leveraging a deep neural network (DNN) in optical WDM networks. DNN-based OSNR and availability prediction methods are developed and implemented. Numerical results show that the DNN-based OSNR prediction method reaches an accuracy of about 95%, and the DNN-based availability prediction method reaches an accuracy greater than 98%. These two methods provide a fast decision approach for light-tree construction.

Machine Learning for Intelligent Optical Networks: A Comprehensive Survey

Article

Full-text available

Feb 2020

With the rapid development of the Internet and communication systems, in both services and technologies, communication networks have been suffering from increasing complexity. It is imperative to improve intelligence in communication networks, and several aspects have been incorporating Artificial Intelligence (AI) and Machine Learning (ML). The optical network, which plays an important role in both the core and access segments of communication networks, also faces great challenges of system complexity and the requirement of manual operations. To overcome the current limitations and address the issues of future optical networks, it is essential to deploy more intelligence capability to enable autonomous and flexible network operations. ML techniques have proven superior at solving complex problems, and thus recently, ML techniques have been used for many optical network applications. In this paper, a detailed survey of existing applications of ML for intelligent optical networks is presented. The applications of ML are classified in terms of their use cases, which are categorized into optical network control and resource management, and optical network monitoring and survivability. These applications are analyzed and compared according to the ML techniques used. Besides, a tutorial for ML applications is provided, covering common ML algorithms, paradigms of ML, and motivations for applying ML. Lastly, challenges and possible solutions of ML application in intelligent optical networks are also discussed, with the intent to inspire future innovations in leveraging ML to build intelligent optical networks.

Unveiling the potential of Graph Neural Networks for network modeling and optimization in SDN

Conference Paper

Full-text available

Apr 2019

Network modeling is a critical component for building self-driving Software-Defined Networks, particularly to find optimal routing schemes that meet the goals set by administrators. However, existing modeling techniques do not meet the requirements to provide accurate estimations of relevant performance metrics such as delay and jitter. In this paper we propose a novel Graph Neural Network (GNN) model able to understand the complex relationship between topology, routing and input traffic to produce accurate estimates of the per-source/destination pair mean delay and jitter. GNNs are tailored to learn and model information structured as graphs and as a result, our model is able to generalize over arbitrary topologies, routing schemes and variable traffic intensity. In the paper we show that our model provides accurate estimates of delay and jitter (worst case R² = 0.86) when testing against topologies, routing and traffic not seen during training. In addition, we present the potential of the model for network operation by presenting several use-cases that show its effective use in per-source/destination pair delay/jitter routing optimization and its generalization capabilities by reasoning in topologies and routing schemes not seen during training.

Asynchronous Methods for Deep Reinforcement Learning

Article

Full-text available

Feb 2016

We propose a conceptually simple and lightweight framework for deep reinforcement learning that uses asynchronous gradient descent for optimization of deep neural network controllers. We present asynchronous variants of four standard reinforcement learning algorithms and show that parallel actor-learners have a stabilizing effect on training allowing all four methods to successfully train neural network controllers. The best performing method, an asynchronous variant of actor-critic, surpasses the current state-of-the-art on the Atari domain while training for half the time on a single multi-core CPU instead of a GPU. Furthermore, we show that asynchronous actor-critic succeeds on a wide variety of continuous motor control problems as well as on a new task involving finding rewards in random 3D mazes using a visual input.

Graph neural networks: A review of methods and applications

Article

Jan 2020

Lots of learning tasks require dealing with graph data which contains rich relation information among elements. Modeling physics systems, learning molecular fingerprints, predicting protein interface, and classifying diseases demand a model to learn from graph inputs. In other domains such as learning from non-structural data like texts and images, reasoning on extracted structures (like the dependency trees of sentences and the scene graphs of images) is an important research topic which also needs graph reasoning models. Graph neural networks (GNNs) are neural models that capture the dependence of graphs via message passing between the nodes of graphs. In recent years, variants of GNNs such as graph convolutional network (GCN), graph attention network (GAT), graph recurrent network (GRN) have demonstrated ground-breaking performances on many deep learning tasks. In this survey, we propose a general design pipeline for GNN models and discuss the variants of each component, systematically categorize the applications, and propose four open problems for future research.

Efficient flow migration for NFV with Graph-aware deep reinforcement learning

Article

Dec 2020
COMPUT NETW

Network Function Virtualization (NFV) enables flexible deployment of network services as applications. However, it is a big challenge to guarantee the Quality of Service (QoS) under unpredictable network traffic while minimizing the processing resources. One typical solution is to realize NF scale-out, scale-in and load balancing by elastically migrating the related traffic flows. However, it is difficult to optimally migrate flows considering the resources and QoS constraints. In this paper, we propose DeepMigration to efficiently and dynamically migrate traffic flows among different NF instances. DeepMigration is a Deep Reinforcement Learning (DRL)-based solution coupled with a Graph Neural Network (GNN). By taking advantage of the graph-based relationship deduction ability of our customized GNN and the self-evolution ability gained from the experience training of DRL, DeepMigration can accurately model the cost (e.g., migration latency) and the benefit (e.g., reducing the number of NF instances) of flow migration among different NF instances and employ dynamic and effective flow migration policies generated by the neural networks to improve the QoS. Experiment results show that DeepMigration reduces the migration latency and saves up to 71.6% of the computation time compared with the state of the art.

Beyond the mega-data center: networking multi-data center regions

Conference Paper

Jul 2020

Impairment- and Splitting-Aware Cloud-Ready Multicast Provisioning in Elastic Optical Networks

Article

Oct 2016

It is known that multicast provisioning is important for supporting cloud-based applications, and as the traffic from these applications is increasing quickly, we may rely on optical networks to realize high-throughput multicast. Meanwhile, the flexible-grid elastic optical networks (EONs) achieve agile access to the massive bandwidth in optical fibers, and hence can provision variable bandwidths to adapt to the dynamic demands from cloud-based applications. In this paper, we consider all-optical multicast in EONs in a practical manner and focus on designing impairment- and splitting-aware multicast provisioning schemes. We first study the procedure of adaptive modulation selection for a light-tree, and point out that the multicast scheme in EONs is fundamentally different from that in the fixed-grid wavelength-division multiplexing (WDM) networks. Then, we formulate the problem of impairment- and splitting-aware routing, modulation and spectrum assignment (ISa-RMSA) for all-optical multicast in EONs and analyze its hardness. Next, we analyze the advantages brought by the flexibility of routing structures and discuss the ISa-RMSA schemes based on light-trees and light-forests. Our study suggests that for ISa-RMSA, the light-forest based approach can use less bandwidth than the light-tree based one, while still satisfying the quality of transmission (QoT) requirement. Therefore, we establish the minimum light-forest problem for optimizing a light-forest in ISa-RMSA. Finally, we design several time-efficient ISa-RMSA algorithms, and prove that one of them can solve the minimum light-forest problem with a fixed approximation ratio. Index Terms: Elastic optical networks (EONs), All-optical multicast, Routing, modulation and spectrum assignments (RMSA), Impairment, Approximation algorithm.

Control Plane Innovations to Realize Dynamic Formulation of Multicast Sessions in Inter-DC Software-Defined Elastic Optical Networks

Article

Apr 2016

It is known that to support the applications such as datacenter backup and migration, multicast should be supported efficiently in inter-datacenter (inter-DC) networks to carry the corresponding point-to-multiple-point communications. Moreover, due to the traffic dynamics in inter-DC networks, we might have to consider the case that the multicast members can join or leave a multicast session dynamically. Therefore, in this work, we try to leverage control plane innovations to realize dynamic formulation of multicast sessions in inter-DC software-defined elastic optical networks (SD-EONs), which are equipped with multicast-incapable bandwidth-variable wavelength selective switches (MI-BV-WSS'). Here, one key issue to address is that the continuous changing of multicast group members can degrade the optimality of a multicast-tree. Hence, we propose to rearrange the multicast-trees adaptively to reduce their spectrum usage. Meanwhile, we try to minimize the frequency of rearrangements to avoid unnecessary operation complexity. Based on these considerations, we propose several multicast-tree rearrangement algorithms for updating multicast sessions dynamically with lightpath reroutings in inter-DC SD-EONs. Both partial and full multicast-tree rearrangements are studied. Simulation results indicate that the proposed algorithms can rearrange the multicast-trees intelligently such that the blocking probability can be reduced effectively with the least lightpath reroutings. Next, based on these theoretical investigations, we consider how to implement the proposed algorithms in the control plane of an inter-DC SD-EON. We extend the OpenFlow (OF) protocol to support the dynamic formulation of multicast sessions and also design the functional models in the control plane elements to realize multicast-tree rearrangements. Experiment results verify the effectiveness of our proposed algorithms and system design.

Highly Efficient Data Migration and Backup for Big Data Applications in Elastic Optical Inter-Data-Center Networks

Article

Sep 2015

This article discusses the technologies for realizing highly efficient data migration and backup for big data applications in elastic optical inter-data-center (inter-DC) networks. We first describe the impacts of big data applications on underlying network infrastructure and introduce the concept of flexible-grid elastic optical inter-DC networks. Then we model the data migration in such networks as dynamic anycast and propose several efficient algorithms. Joint resource defragmentation is also discussed to further improve network performance. For efficient data backup, we leverage a mutual backup model and investigate how to avoid the prolonged negative impacts on DCs' normal operation by minimizing the DC backup window.

Recommended publications

DRL-assisted Light-tree Reconfiguration for Dynamic Multicast in Elastic Optical Networks

DRL-assisted Dynamic Reconfiguration of Multicast Sessions in Elastic Optical Networks

Reconfiguring Multicast Sessions in EONs Adaptively with Deep Reinforcement Learning

GNN-based Hierarchical Deep Reinforcement Learning for NFV-Oriented Online Resource Orchestration in...