Joint Selection of Local Trainers and Resource
Allocation for Federated Learning in Open RAN
Intelligent Controllers
Amardip Kumar Singh, Kim Khoa Nguyen
Synchromedia Lab, École de Technologie Supérieure, Montreal, Canada
amardip-kumar.singh.1@ens.etsmtl.ca, kim-khoa.nguyen@etsmtl.ca
Abstract—Recently, Federated Learning (FL) has been applied in various research domains, especially because of its privacy-preserving and decentralized approach to model training. However, very few FL applications have been developed for the Radio Access Network (RAN) due to the lack of efficient deployment models. Open RAN (O-RAN) promises to meet the demands of 5G services through its disaggregated, hierarchical, and distributed network function processing framework. Moreover, it comes with built-in intelligent controllers that instill smart decision-making into the RAN. In this paper, we propose a framework named O-RANFed to deploy and optimize FL tasks in O-RAN to provide 5G slicing services. To improve the performance of FL, we formulate a joint mathematical optimization model of local learner selection and resource allocation that is solved in every training iteration. We solve this non-convex problem using the decomposition method. First, we propose a slicing-based and deadline-aware client selection algorithm. Then, we solve the reduced resource allocation problem using the successive convex approximation (SCA) method. Our simulation results show that the proposed model outperforms state-of-the-art FL methods such as FedAvg and FedProx in terms of convergence, learning time, and resource costs.
Index Terms—Federated Learning, O-RAN, 5G, Resource
Allocation, RAN Intelligent Controller, Network Slicing, RIC
I. INTRODUCTION
Federated Learning (FL) is a new approach fusing the concepts of distributed computing and Artificial Intelligence [1]. Unlike traditional machine learning, in which models are trained on a centralized data set, FL does not require the data sets to be located at one central point. It collaborates with local computing nodes through a global aggregation point by transferring only the model update vectors in a periodic communication mode. Therefore, it respects the privacy of the local data sets, distributes the burden of computing, and still trains a centralized model [1]. These features enable many applications of FL in edge computing, where users' devices serve as local trainers. However, FL is yet to be well investigated in carriers' networks due to the lack of a standard implementation model. The characteristics of FL pave the way for its use in the 5G RAN domain, particularly to enhance radio resource management policies [2].
5G network services promise to deliver a fast and reliable user experience at massive scale. This places a burden on improving how the access network is managed. Using the operational and maintenance data that is collected periodically, RAN performance can be improved by incorporating AI capabilities. O-RAN is a newly introduced radio access network architectural framework that supports different use cases of 5G services through network slicing and disaggregation. The key element of O-RAN is its RAN Intelligent Controllers (RICs), which monitor and support guaranteed performance for different slice user groups [3]. On the other hand, O-RAN operates on a multi-vendor, shared-resource system. Therefore, it has to function at minimal resource cost and yet with guaranteed service delivery. These constraints pose heavy challenges for the O-RAN RICs.
In this paper, we investigate the possibility of FL model training in O-RAN to provide 5G slicing services. 5G services are governed by slicing the physical network into several logical, isolated, self-adaptable networks hosted by general-purpose processors located at cloud-based data centres [4]. 5G defines three classes of services based on their quality of service (QoS) metrics: ultra-Reliable Low Latency Communication (uRLLC), extreme Mobile Broad Band (eMBB), and massive Machine Type Communications (mMTC). Accordingly, the physical RAN infrastructure is also divided into three logical slices of network elements that are assigned and maintained dynamically [1]. The two kinds of O-RAN intelligent controllers, namely the near-real-time RAN Intelligent Controller (near-RT-RIC) and the non-real-time RIC (Non-RT-RIC), coordinate with each other to select the slice-specific local training points and then train the FL models.
In Fig. 1, we map the FL framework onto the RIC specifications of O-RAN [5]. The slice operational data is collected and saved into distributed databases through the E2 interface. The O1 interface transfers data to the near-RT-RICs for local processing. The exchange of FL parameters is enabled by the A1 interface between the Non-RT-RIC and the near-RT-RICs. Since the FL tasks are distributed over edge cloud sites in O-RAN RICs and shared by various operators, the resources required to facilitate this learning process need to be optimized. In addition, the FL model is trained iteratively, so the learning time is also an issue. Due to the stochastic nature of this distributed learning process, minimizing the learning time while guaranteeing the accuracy of the global model is challenging. Hence, resource usage cost reduction and FL time minimization should be dealt with jointly by the O-RAN intelligent controllers.
[Fig. 1. Federated Learning setup for O-RAN Intelligent Controllers: near-RT-RICs host local ML models over the uRLLC, eMBB, and mMTC slices (each with an O-CU-UP/O-CU-CP/O-DU/O-RU stack and databases), and connect through the A1, O1, O2, E1, E2, and F1 interfaces to the Non-RT-RIC, which hosts the RAN data analytics & AI platform within the Service & Management Orchestration functions; local model update vectors are uploaded and the aggregated global model vector is broadcast.]

A plethora of recent works have focused on the adaptive
optimization of FL under communication-constrained edge computing. Prior works in [6], [7], and [8] have investigated different resource-constrained federated learning approaches. Their objective is to minimize the energy usage of edge devices involved in FL training by allocating optimal transmission power. The work in [8] is too general: it can only be implemented in a single edge (with a single base station) and cannot be used for a carrier network of multiple edges. Moreover, the problem of resource allocation from the perspective of the Service Management and Orchestration (SMO), which is important for O-RAN, has not been considered so far. In this paper, we take into account new parameters of the O-RAN architecture to design an algorithm that selects local trainers, then allocates resources to the selected trainers, and we propose an aggregation method. Our contributions in this paper can be summarized as follows:
• A mathematical formulation of the joint optimal resource allocation and local trainer selection problem for O-RANFed learning tasks, together with a solution to this non-convex optimization problem using the decomposition method.
• An O-RAN slicing-based and deadline-aware algorithm to select representative instances of near-RT-RICs as local model participants in each global iteration of FL.
• An FL algorithm, called O-RANFed, for O-RAN slicing services in which the near-RT-RICs host the local training instances and the Non-RT-RIC hosts the global aggregation point of the ML model.
To the best of our knowledge, this is the first work that optimizes ML training in a federated setting in O-RAN. The remainder of this paper is organized as follows. In
Section II, the system model and the problem formulation are
presented. In Section III, we describe our proposed solution
approach. In Section IV, we present the numerical results to
evaluate the performance of our proposed solution. Finally, we
conclude the paper and discuss our future work.
II. SYSTEM MODEL AND PROBLEM FORMULATION
Consider an O-RAN system with a single regional cloud and a set $\mathcal{M}$ of $M$ distributed edges cooperatively performing an FL algorithm. In this FL setup, each edge cloud uses its locally collected training data to train a local FL model. The Non-RT-RIC at the regional cloud integrates the local FL models from the participating edge clouds and generates an aggregated FL model. This aggregated FL model is then used to improve the local FL model of each near-RT-RIC, enabling the local models to collaboratively perform a learning algorithm without transferring their training data. We refer to the aggregated FL model generated from the local FL models as the global FL model. As illustrated in Fig. 1, the uplink from the near-RT-RICs to the Non-RT-RIC is used to send the local FL model update parameters, and the downlink is used to broadcast the global FL model in the global rounds of training.
A. The Learning Model
In this model, each near-RT-RIC collects a dataset $D_i = [x_{i,1}, \dots, x_{i,S_i}]$ of input data, where $S_i$ is the number of input samples collected by near-RT-RIC $i$ and each element $x_{is}$ is the FL model's input vector. Let $y_{is}$ be the output of $x_{is}$. For simplicity, we consider an FL model with a single output, which can be readily generalized to a case with multiple outputs. The output data vector for training the FL model of near-RT-RIC $i$ is $y_i = [y_{i,1}, \dots, y_{i,S_i}]$. We assume that the data collected by each near-RT-RIC is different from that of the other near-RT-RICs, i.e., $x_i \neq x_j$ for $i \neq j$, $i, j \in \mathcal{M}$. So, each local trainer trains the model using a different dataset. This is in line with the real scenario, as each local near-RT-RIC collects the operational data from its corresponding slice-specific users. We define a vector $g_i$ to capture the parameters of the local FL model trained on $x_i$ and $y_i$; $g_i$ determines the local FL model of each near-RT-RIC $i$. For example, in a linear regression prediction algorithm, $x_{is}^{T} g_i$ represents the output, and $g_i$ determines the prediction accuracy. The training process of an FL model solves:
$$\min_{g_1, \dots, g_M} \ \frac{1}{S} \sum_{i=1}^{M} \sum_{s=1}^{S_i} f(g_i, x_{is}, y_{is}) \qquad (1)$$
$$\text{s.t.} \quad g_1 = g_2 = \dots = g_M = g, \quad \forall i \in \mathcal{M} \qquad (1a)$$
where $S = \sum_{i=1}^{M} S_i$ is the total size of the training data of all near-RT-RICs, $g$ is the global FL model generated by the Non-RT-RIC, and $f(g_i, x_{is}, y_{is})$ is a loss function that captures the FL prediction accuracy. Different FL algorithms use different loss functions. Constraint (1a) ensures that, once the FL model converges, all of the near-RT-RICs and the Non-RT-RIC share the same model. In each round, the Non-RT-RIC transmits the parameters $g$ of the global FL model to its connected near-RT-RICs so that they can train their local FL models. Then the near-RT-RICs transmit their local FL models to the Non-RT-RIC to update the global FL model. The update of each near-RT-RIC $i$'s local FL model $g_i$ thus depends on all near-RT-RICs' local FL models. How the local FL model $g_i$ is updated depends on the learning algorithm; for example, one can use gradient descent or randomized coordinate descent. The update of the global model $g$ is given by:
$$g = \sum_{i=1}^{M} \frac{S_i \, g_i}{S} \qquad (2)$$
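For concreteness, the aggregation rule (2) can be sketched in a few lines of Python; the flat-vector representation of the local models and the NumPy usage are our own illustration, not part of the paper.

# A minimal sketch of the global aggregation step (2), assuming each local
# model g_i is a flat NumPy vector and S_i is its local sample count.
import numpy as np

def aggregate_global_model(local_models, sample_counts):
    """Weighted average of local FL models: g = sum_i (S_i / S) * g_i."""
    S = sum(sample_counts)
    g = np.zeros_like(local_models[0], dtype=float)
    for g_i, S_i in zip(local_models, sample_counts):
        g += (S_i / S) * g_i
    return g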
Since we consider wireless transmissions over the A1 interface between the near-RT-RICs and the Non-RT-RIC [5], there is a resource constraint on the communication model, which in turn affects the performance of the FL algorithm. Therefore, we need to jointly consider these two aspects.
B. FL Resource Model
In each global iteration, the O-RAN system has to decide which local training points, i.e., which near-RT-RICs, participate. This is because at each time interval only a limited number of clients can participate due to delay constraints originating from the control loops of O-RAN. The selected clients then upload their local FL model updates over the wireless medium. We define a binary variable $a^t_m \in \{0, 1\}$ to decide whether or not trainer $m$ is selected in round $t$, and $a^t = (a^t_1, \dots, a^t_M)$ collects the overall trainer selection decisions. A near-RT-RIC selected in round $t$, i.e., $a^t_m = 1$, consumes compute resources to train locally on the collected data. At the same time, the selected trainers also consume bandwidth resources to transmit their update vectors. We consider orthogonal frequency division multiple access (OFDMA) for local model uploading with a total bandwidth $B$. Let $b^t_m \in [0, 1]$ be the bandwidth allocation ratio for trainer $m$ in round $t$; hence its allocated bandwidth is $b^t_m B$. Let $b^t = (b^t_1, \dots, b^t_M)$. The bandwidth allocation must satisfy $\sum_{m \in \mathcal{M}} b^t_m = 1$, $\forall t$. Clearly, if $a^t_m = 0$, i.e., trainer $m$ is not selected in round $t$, then no bandwidth is allocated to it, i.e., $b^t_m = 0$. On the other hand, if $a^t_m = 1$, then we require at least a minimum bandwidth $b_{min}$ to be allocated to trainer $m$, i.e., $b^t_m \ge b_{min}$. To make the problem feasible, we assume $b_{min} \le \frac{1}{M}$. Therefore, the total resource cost for using communication bandwidth is:
$$R^{co} = \sum_{m=1}^{M} R^{co}_m = \sum_{m=1}^{M} \sum_{t=1}^{T} a^t_m \, b^t_m \, B \, p^{tr} \qquad (3)$$
over $T$ global rounds, where $p^{tr}$ is the unit cost of bandwidth usage. For each near-RT-RIC $m$, let $R^{cp}_m$ denote its local training compute resource cost in every round, which depends on its computing host and dataset. To process the local dataset, each near-RT-RIC uses the CPU cycle frequency of its host. Let the CPU power of the $m$th host be $f_m$ cycles/s and the per-unit-time usage cost be $p^c$. Then the total compute resource cost is:
$$R^{cp} = \sum_{m=1}^{M} R^{cp}_m = \sum_{m=1}^{M} \sum_{t=1}^{T} a^t_m \, \frac{D_m \, c_m}{f_m} \, p^c \qquad (4)$$
where $c_m$ is the number of CPU cycles required to process one bit of data and $D_m$ is the size of the local dataset of trainer $m$.
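As an illustration, the two per-round cost terms can be accumulated as below; the variable names mirror (3) and (4), while the loop layout is our own assumption.

# A sketch of the per-round resource cost terms in (3) and (4); a_t is the
# binary selection vector, b_t the bandwidth fractions, and D, c, f collect
# the per-trainer dataset sizes, cycles/bit, and CPU frequencies.
def round_resource_cost(a_t, b_t, B, p_tr, D, c, f, p_c):
    r_co = sum(a * b * B * p_tr for a, b in zip(a_t, b_t))    # bandwidth cost (3)
    r_cp = sum(a * (D_m * c_m / f_m) * p_c                    # compute cost (4)
               for a, D_m, c_m, f_m in zip(a_t, D, c, f))
    return r_co + r_cp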
C. FL Accuracy Model
The target for each local model is to attain a level of accuracy $\theta \in (0, 1)$, defined as below:

$$\|\nabla f^t_m(g^t_m)\| \le \theta \, \|\nabla f^t_m(g^{t-1}_m)\|, \quad \forall m \in \{1, 2, \dots, M\} \qquad (5)$$

A near-RT-RIC takes several iterations, called local iterations, to attain this accuracy. For the global model placed at the Non-RT-RIC, the target is to attain the optimal model weights that reach an $\epsilon$ level of global accuracy, defined as below:

$$|F(g^t) - F(g^*)| \le \epsilon, \quad \forall t \ge T \qquad (6)$$
Constraint (6) states that $g^*$ is the optimal model parameter, i.e., for every global round beyond $T$, the difference between the loss function values falls within the defined accuracy level. Here, $F(\cdot)$ denotes the global loss function, defined over all the local loss functions as:

$$F(g) := \sum_{i=1}^{M} \frac{|S_i|}{S} \, f_i(g_i) \qquad (7)$$
In [9], it is proven that the number of global iterations required to attain an $\epsilon$ level of global accuracy and a local accuracy $\theta$ can be upper bounded by:

$$K(\epsilon, \theta) = \mathcal{O}\!\left(\frac{\log(1/\epsilon)}{1 - \theta}\right) \qquad (8)$$
We use this relationship among the local accuracy level, the global model accuracy, and the upper limit on the number of required global rounds to model the FL time. In order to ensure the convergence of the gradient descent approximation, the following assumptions are made on the loss function at each near-RT-RIC training point:
(i) $F_i(g)$ is convex.
(ii) $F_i(g)$ is $\rho$-Lipschitz, i.e., $\|F_i(g) - F_i(g')\| \le \rho \, \|g - g'\|$ for any $g, g'$.
(iii) $F_i(g)$ is $\beta$-smooth, i.e., $\|\nabla F_i(g) - \nabla F_i(g')\| \le \beta \, \|g - g'\|$ for any $g, g'$.
(iv) For any $g$ and $i$, the difference between the global and local gradients is bounded by $\|\nabla F_i(g) - \nabla F(g)\| \le \delta_i$, with $\delta := \frac{\sum_i S_i \, \delta_i}{S}$.
These assumptions are in line with recent works [10], [11], [6], [12] on the convergence analysis of FL.
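A small helper makes the bound (8) concrete, together with the multiplier $\mu$ from constraint (12f) below; the ceiling and the default $\mu = 1$ are our assumptions.

# A sketch of the global-round bound K(eps, theta) in (8), using the
# multiplication factor mu from constraint (12f); the ceiling is our choice.
import math

def global_rounds(eps: float, theta: float, mu: float = 1.0) -> int:
    assert 0.0 < eps < 1.0 and 0.0 < theta < 1.0
    return math.ceil(mu * math.log(1.0 / eps) / (1.0 - theta))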
D. Latency Model
We consider synchronous communication; in other words, all the near-RT-RICs send their local update vectors to the Non-RT-RIC before the $t$th round of global aggregation starts. Therefore, before entering this communication round, every near-RT-RIC must finish its local ML processing. In each global round, the FL tasks span three operations: (i) computation, (ii) communication of the local updates to the Non-RT-RIC over the uplink, and (iii) broadcast communication to all the involved near-RT-RICs over the downlink. Let the computation time required for one local round at the $m$th near-RT-RIC be $T^{cp}_m$, and let there be $K_l$ local iterations in each interval of global communication. Then, the computation time in one global iteration round is $K_l \, T^{cp}_m$. Let $T^{co}_m$ be the communication time required to transfer the local update vector from the $m$th near-RT-RIC to the Non-RT-RIC in the uplink phase, and let $d_m$ be the data size of the update vector of the $m$th trainer. Therefore, the learning time in one global round of FL for the $m$th local FL model trainer is:

$$T_m = K_l \, T^{cp}_m + T^{co}_m, \quad m \in \mathcal{M} \qquad (9)$$

where $T^{co}_m$ is calculated as:

$$T^{co}_m = \frac{d_m}{b^t_m \, B}, \quad m \in \mathcal{M} \qquad (10)$$
In the downlink phase, we do not consider the delay because it is negligible compared to the uplink delay, owing to high-speed downlink communication. Let $K$ be the total number of global rounds needed to attain the global accuracy, as established in (8). Therefore, the total learning time can be modeled as:

$$T^{total} = K \cdot T^{max} = K \cdot \max\{T_m; \ \forall m \in \mathcal{M}\} \qquad (11)$$
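The per-round and total learning times in (9)-(11) can be sketched directly; taking $T^{cp}_m = D_m c_m / f_m$ follows constraint (12e) below, and the rest is a transcription of the formulas.

# A sketch of the latency model (9)-(11): per-trainer round time and the
# synchronous total over K global rounds; T^cp_m is taken as D_m*c_m/f_m
# as in constraint (12e).
def total_learning_time(K, K_l, D, c, f, d, b_t, B):
    round_times = []
    for D_m, c_m, f_m, d_m, b_m in zip(D, c, f, d, b_t):
        t_cp = D_m * c_m / f_m                 # local computation per iteration
        t_co = d_m / (b_m * B)                 # uplink transfer time (10)
        round_times.append(K_l * t_cp + t_co)  # per-round time (9)
    return K * max(round_times)                # total time (11)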
E. Problem Formulation
Our goal is to jointly minimize the resource cost and the learning time under the constraints of model accuracy and the available compute and bandwidth resources. This can be done by optimizing the selection of local trainers (i.e., near-RT-RICs), the bandwidth allocation, and the number of local training rounds, as formulated in the optimization model (12).

$$\mathbf{P}: \quad \min_{a^t, b^t, \theta, K_l} \ \left\{ (1 - \rho) \, R^{total} + \rho \, T^{total} \right\} \qquad (12)$$
subject to:
$$0 < \theta < 1, \qquad (12a)$$
$$\sum_{m=1}^{M} a^t_m \, b^t_m \, B \le B, \qquad (12b)$$
$$\sum_{m=1}^{M} b^t_m = 1, \qquad (12c)$$
$$b_{min} \le b^t_m \le 1, \quad \forall m \in \mathcal{M}, \qquad (12d)$$
$$\max_m \left\{ \frac{c_m \, D_m}{f_m} + T^{co}_m \right\} = T^{max}, \qquad (12e)$$
$$K = \mu \cdot \frac{\log(1/\epsilon)}{1 - \theta}, \qquad (12f)$$
$$a^t_m \in \{0, 1\}. \qquad (12g)$$
The objective function (12) has two components balanced by a trade-off parameter $\rho$, because the two goals are conflicting: the total resource cost $R^{total} = R^{cp} + R^{co}$ and the FL training time $T^{total}$ given by (11). Minimizing the resource cost naturally leads to a higher learning time, and vice versa. Constraint (12a) limits the local accuracy level. Constraint (12b) bounds the total bandwidth allocated to the FL tasks. Constraint (12c) presents the definition of $b^t_m$, i.e., the sum of the bandwidth fractions must be 1. (12d) sets the bounds of the fractional bandwidth allocation. Since we have assumed the synchronous communication mode for update vectors in each global round, (12e) imposes this criterion. The relationship between the local accuracy and the number of global rounds is stated in (12f), where $\mu$ is a multiplication factor. (12g) defines the domain of the binary decision variable.
III. PROPOSED SOLUTION
Problem (12) is a non-convex optimization problem because of its non-convex objective function and constraints (12e)-(12g). So, we decompose the problem into two sub-problems and then use an iterative solution to reach the optimal solution. We first solve the trainer selection problem and then use this solution to allocate resources optimally to the selected local trainers. Fig. 2 shows the scheme of the proposed solution.
[Fig. 2. Schematic diagram of the proposed solution: the original problem (12), minimizing resource cost and FL time subject to optimal client selection and bandwidth allocation, is decomposed into slicing-based and deadline-aware client selection (13), solved using Algorithm 1, and resource allocation for the selected near-RT-RICs (14), solved using the SCA method; the two solutions are then used to implement Algorithm 2.]

Due to the variation in traffic patterns across the different slicing services of O-RAN, the local FL models might encounter an inconsistency problem. This may lead to a degradation in prediction accuracy. We take this differentiation into account and propose a trainer selection algorithm that respects the formation of slices in O-RAN while maintaining deadline awareness.
A. Local Trainers’ Selection
According to the specifications defined by the O-RAN Alliance, the collected RAN operational data can be separated based on slice-user groups. Each near-RT-RIC is then fed with slice-specific network data. The selection of a near-RT-RIC corresponding to a slice must be incorporated in each iteration of the gradient descent training of the model. However, not all local models can be accommodated in each iteration because of the deadline constraint (13a) and the limited computational and bandwidth resources assigned to this learning task. So, we propose Algorithm 1 for this selection, in alignment with the O-RAN slice definition. In this algorithm, we categorize the set of near-RT-RICs into three classes corresponding to the eMBB, uRLLC, and mMTC slicing services.
Our objective in this trainer selection algorithm is to maximize the number of near-RT-RICs participating in each global round and allow the Non-RT-RIC to aggregate all the received data. This is based on the idea that a larger fraction of trainers in each round saves the total time required for a global FL model to attain the desired accuracy [13]. Let $\mathcal{N} (\subseteq \mathcal{M})$ be the set of selected near-RT-RICs, $t_{round}$ be the deadline for each global round, $t_1$ be the time elapsed in performing Algorithm 1, and $t_{agg}$ be the time taken to aggregate the update parameters at the Non-RT-RIC.
Algorithm 1: Deadline-aware and Slicing-based Local Trainers' Selection
1: Input: $\mathcal{M}$: set of all near-RT-RICs
2: Initialize $\mathcal{N}_u, \mathcal{N}_e, \mathcal{N}_m = \emptyset$
3: for $t^i_{round}$ defined for $i \in \{\mathcal{N}_u, \mathcal{N}_e, \mathcal{N}_m\}$ do
4:   while $|\mathcal{N}| > 0$ do
5:     $x \leftarrow \arg\min_{n \in \mathcal{N}} \frac{1}{2}(t^{k-1}_n + \alpha \, t^k_n(\text{estimated}))$
6:     $t \leftarrow t_1 + t_{agg} + t^k_n$
7:     $\mathcal{N} \leftarrow \mathcal{N} \setminus \{x\}$
8:     if $t < t^i_{round}$ then
9:       $t \leftarrow t + t^k_n$
10:    end if
11:  end while
12: end for
13: Output: $\mathcal{N} = \mathcal{N}_u \cup \mathcal{N}_e \cup \mathcal{N}_m$
Therefore, the mathematical optimization problem for trainer selection becomes:
$$\max_{\mathcal{N}} \ |\mathcal{N}| \qquad (13)$$
$$\text{s.t.} \quad t_1 + t_{agg} + \frac{1}{2}\left(t^{k-1}_n + \alpha \, t^k_n\right) \le t_{round}. \qquad (13a)$$
(13) is a combinatorial optimization problem, which makes it non-trivial. So, we employ a greedy heuristic to solve this problem, as shown in Algorithm 1. We repeat the steps in each global round until we reach the desired accuracy. Here, constraint (13a) prevents the violation of the deadline for every near-RT-RIC in each global round. The deadline is assigned separately for each slice-user group, while the total deadline in each round is varied experimentally to observe its impact on the overall learning time of the FL model.
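A minimal sketch of the greedy heuristic behind Algorithm 1 follows; the blended time estimate mirrors line 5 of the algorithm, while the per-trainer history dictionaries and the cheapest-first ordering are our reading of the pseudocode, not the authors' exact implementation.

# A sketch of the deadline-aware greedy selection of Algorithm 1, assuming
# per-trainer histories t_prev (last observed round time) and t_est
# (estimated next round time); alpha blends the two as on line 5.
def select_trainers(candidates, t_prev, t_est, t1, t_agg, t_round, alpha=1.0):
    selected, t = [], t1 + t_agg
    # cheapest-first greedy order on the blended time estimate
    order = sorted(candidates, key=lambda n: 0.5 * (t_prev[n] + alpha * t_est[n]))
    for n in order:
        cost = 0.5 * (t_prev[n] + alpha * t_est[n])
        if t + cost <= t_round:        # deadline constraint (13a)
            selected.append(n)
            t += cost
    return selected

This would be run once per slice class ($\mathcal{N}_u$, $\mathcal{N}_e$, $\mathcal{N}_m$) with its own deadline $t^i_{round}$, taking the union of the three outputs as in the last step of Algorithm 1.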
B. Resource Allocation
From the trainer selection phase, we obtain $a^t$, i.e., a binary-valued vector of the selected trainers in the $k$th global round. The next phase is to allocate the compute and bandwidth resources to support local training, parameter uploading, model aggregation, and the broadcast of the updated model weights. For this, we solve the optimization problem (12) with the variable $a^t$ known. Still, (12) is a non-convex optimization problem whose exact solution is infeasible using traditional methods. Therefore, we employ an approximation approach with equivalent surrogate functions. The multiplication factor $\mu$ in (12f) is chosen such that the whole numerator part equals 1. (12e) is replaced by an inequality preserving the same lower bound on the $T^{max}$ value. With these changes, and substituting the defining expressions, the optimization problem (12) reduces to the following mathematical form:
$$\mathbf{P1}: \ \min_{b^t, \theta, K_l} \ \Big\{ (1 - \rho) \Big[ \sum_{t=1}^{T} a^t_m \, b^t_m \, B \, p^{tr} + K_l \sum_{t=1}^{T} a^t_m \, \frac{D_m \, c_m}{f_m} \, p^c \Big] + \rho \, \frac{1}{1 - \theta} \, K_l \, T^{max} \Big\} \qquad (14)$$
subject to: (12a), (12b), (12c), (12d), and (12e).
The number of local iterations $K_l$ in each global round is determined experimentally, as required to attain the local accuracy value $\theta$. We solve this problem using the Successive Convex Approximation (SCA) method.
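Since the paper does not spell out its surrogate functions, the sketch below only illustrates the SCA iteration pattern on the bandwidth shares of the selected trainers: the convex uplink-delay term is kept exact and a proximal term anchors each convex sub-problem at the previous iterate. The simplified objective (total delay instead of the max in (12e)) and all parameter names are our assumptions.

# A hedged SCA-style sketch for the bandwidth part of problem (14). Each
# outer iteration minimizes a convex surrogate (exact delay term plus a
# proximal anchor at the previous iterate b_k) over the feasible set
# (12c)-(12d). Illustrative only, not the paper's exact surrogate.
import numpy as np
from scipy.optimize import minimize

def sca_bandwidth(d, B, b_min, iters=20):
    d = np.asarray(d, dtype=float)
    M = len(d)
    b = np.full(M, 1.0 / M)                        # feasible starting point
    for _ in range(iters):
        b_k = b.copy()
        def surrogate(x):
            delay = np.sum(d / (x * B))            # convex uplink-delay term
            prox = 0.5 * np.sum((x - b_k) ** 2)    # keeps x close to b_k
            return delay + prox
        res = minimize(surrogate, b_k, method='SLSQP',
                       bounds=[(b_min, 1.0)] * M,
                       constraints=({'type': 'eq',
                                     'fun': lambda x: np.sum(x) - 1.0},))
        b = res.x
    return b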
C. Federated Training in O-RAN RICs (O-RANFed)
Using the solutions of trainer selection and resource allocation in (14), we train the FL model as described in Algorithm 2. In each global round, a subset of participating local trainers is selected first, followed by resource allocation, and then the interaction of the local FL models with the global FL model. This loop continues for $K$ iterations, which is the maximum number of global rounds required to attain the prefixed accuracy of the model.
Algorithm 2: O-RANFed
1: Initialize: untrained local model at each near-RT-RIC $i \in \mathcal{M}$;
2: for $k \le K(\epsilon, \theta)$ do
3:   Non-RT-RIC uses Algorithm 1 for client selection;
4:   Compute and bandwidth resources are assigned to the selected near-RT-RICs ($\mathcal{N}$);
5:   Each near-RT-RIC trains on its local data until it achieves an accuracy $\theta$ and obtains $g_{i,k}$;
6:   Model update parameters are sent by the edge clouds to the Non-RT-RIC;
7:   Non-RT-RIC aggregates the local weights through (2);
8:   Non-RT-RIC broadcasts the aggregated parameters;
9:   Non-RT-RIC calculates the global accuracy attained (6);
10: end for
11: The finally trained model is sent to the SMO for deployment
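Putting the pieces together, the control loop of Algorithm 2 can be sketched as follows; train_local and global_accuracy are hypothetical placeholders for steps 5 and 9, and select_trainers, sca_bandwidth, and aggregate_global_model refer to the earlier sketches.

# A high-level sketch of the O-RANFed loop in Algorithm 2. train_local and
# global_accuracy are hypothetical placeholders for the local training step
# and the global accuracy check (6); ctx bundles the model parameters.
def oranfed(clients, g, K, eps, theta, ctx):
    for k in range(K):                                    # k <= K(eps, theta)
        N = select_trainers(clients, **ctx['selection'])  # step 3 (Alg. 1)
        b = sca_bandwidth([ctx['d'][n] for n in N],       # step 4 (SCA); b would
                          ctx['B'], ctx['b_min'])         # drive uplink scheduling
        local_models, sizes = [], []
        for i in N:                                       # steps 5-6
            g_i, S_i = train_local(i, g, theta)           # local training to theta
            local_models.append(g_i)
            sizes.append(S_i)
        g = aggregate_global_model(local_models, sizes)   # step 7, eq. (2)
        # step 8 broadcasts g; step 9 stops early once (6) is satisfied
        if global_accuracy(g) <= eps:
            break
    return g                                              # sent to the SMO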
D. Complexity Analysis
O-RANFed consists of trainer selection in step 3 and assignment of resources in step 4. Steps 5 to 9 train the FL model iteratively. So, its complexity can be analysed in two parts. In the first part, (13) is solved using Algorithm 1, which has a time complexity of $O(L)$, where $L$ is the cardinality of the set $\mathcal{M}$. In the second part, (14) is solved using the SCA approach with complexity $O(J_{SCA})$ [14], where $J_{SCA}$ is the total number of iterations within the SCA algorithm.
IV. NUMERICAL RESULTS
TABLE I
SIMULATION SETTINGS

Parameter   Value              Parameter   Value
N           50                 B           1 MHz
c_m         15 cycles/bit      f_m         ~U(1, 1.6) GHz
p^tr        1                  p^c         1
D_m         ~U(5, 10) MB       d           1
b_min       0.1 MHz            ρ           (0, 1)
Federated Learning Task: We trained a prediction model in which each near-RT-RIC processes time-series data containing the volume of traffic requested by its corresponding slice over a period of one month. The dataset represents hourly operational data of lower-level network traffic. Using this dataset, the trained model predicts the amount of traffic required in the next hour. We used a Long Short-Term Memory (LSTM) based neural network with 4 layers to train this regression model. We ran this training on an Intel(R) Core(TM) i5-8265U CPU. The model attains approximately 96.3% accuracy in the centralized ML setting. Therefore, the global accuracy of the FL model is taken as 0.96.
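A minimal sketch of the local traffic predictor follows; the paper states only a 4-layer LSTM regression model, so the layer widths, the 24-hour input window, and the Keras framing are our assumptions.

# A sketch of the per-near-RT-RIC traffic predictor: a 4-layer LSTM
# regressor over a sliding window of hourly traffic volumes. Layer sizes
# and the window length are illustrative assumptions.
import tensorflow as tf

def build_local_model(window: int = 24) -> tf.keras.Model:
    model = tf.keras.Sequential([
        tf.keras.layers.Input(shape=(window, 1)),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(64, return_sequences=True),
        tf.keras.layers.LSTM(32, return_sequences=True),
        tf.keras.layers.LSTM(32),
        tf.keras.layers.Dense(1),          # next-hour traffic volume
    ])
    model.compile(optimizer='adam', loss='mse')
    return model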
Wireless Network: In order to compare our proposed model with state-of-the-art FL methods, we considered a compatible wireless network setting, as described in Table I. For simplicity, all near-RT-RICs have the same data processing rate $c_m$. We used a uniform random generator to assign the CPU frequency $f_m$ of each host. The maximum bandwidth capacity $B$ is 1 MHz, whereas the minimum $b_{min}$ is kept at 0.1. To better present the results, and without loss of generality, the communication and compute costs $(p^{tr}, p^c)$ are set to unit value. The local dataset size is distributed uniformly in the range of (5, 10) MB. As a benchmark, we consider the FedAvg [13] algorithm with a fixed number of clients ($N = 50$), which is the maximum number of near-RT-RICs in each global round. Another prominent FL method is FedProx [15], which uses a probability distribution to select the number of clients in each global round. These two methods are suitable for comparative analysis, as one sets an upper limit on client selection while the other follows a variable selection policy that differs from our proposed O-RANFed.

[Fig. 3. Trainer selection pattern] [Fig. 4. Accuracy convergence]
[Fig. 5. Resource cost comparison] [Fig. 6. Learning time cost]

Fig. 3 presents a comparison of the
three FL approaches in terms of the number of clients selected in each global round with respect to the total learning time elapsed in the training process. FedAvg serves as the baseline, keeping a constant value. O-RANFed gradually attains the maximum number of clients as time progresses, which shows its efficiency over FedProx. Then, we compare the accuracy achieved after each global round by each of the FL methods. In Fig. 4, O-RANFed takes a significantly smaller number of global rounds to achieve the same accuracy compared to the two other methods. This helps O-RANFed save FL time as well as resources.
In terms of the objective costs (resource and time), O-RANFed performs better than FedAvg and FedProx. Figs. 5 and 6 show the learning time and the resources consumed by each method. In these figures, the behaviour of the FL methods is plotted against the Pareto coefficient $\rho$. We can see that the learning time of O-RANFed is the lowest, and it is much lower for higher values of $\rho$. Moreover, the resource cost required by O-RANFed is the lowest, and it is lower for smaller values of $\rho$.
V. CONCLUSION
In this paper, we proposed a federated learning method designed for the O-RAN slicing environment. Our model takes into account the importance of slice-specific local trainers as well as the resource allocation for performing FL tasks. The simulation results show that FL can be implemented to predict the data traffic of different slices in O-RAN. Our proposed model outperforms state-of-the-art FL methods in terms of learning time and resource cost in the simulations. Therefore, it can be deployed in the control loops of O-RAN to guarantee slice QoS. In the future, we will investigate the location of distributed data collection points to improve O-RANFed in a highly distributed environment.
ACKNOWLEDGMENT
The authors thank Mitacs, Ciena, and ENCQOR for funding
this research under the IT13947 grant.
REFERENCES
[1] S. Abdulrahman et al., "A survey on federated learning: The journey from centralized to distributed on-site learning and beyond," IEEE IoT Journal, vol. 8, no. 7, pp. 5476–5497, 2021.
[2] Z. Zhao et al., "Federated-learning-enabled intelligent fog radio access networks: Fundamental theory, key techniques, and future trends," IEEE MCW, vol. 27, no. 2, pp. 22–28, 2020.
[3] S. K. Singh et al., "The evolution of radio access network towards open-RAN: Challenges and opportunities," in IEEE WCNCW, 2020, pp. 1–6.
[4] O-RAN Alliance, "O-RAN-WG1.OAM-Architecture-v02.00," 2019.
[5] H. Lee et al., "Hosting AI/ML workflows on O-RAN RIC platform," in IEEE GC Wkshps, 2020, pp. 1–6.
[6] H. H. Yang et al., "Scheduling policies for federated learning in wireless networks," IEEE TCOMM, vol. 68, no. 1, pp. 317–333, 2020.
[7] W. Shi et al., "Joint device scheduling and resource allocation for latency constrained wireless federated learning," IEEE TWC, vol. 20, no. 1, pp. 453–467, 2021.
[8] C. T. Dinh et al., "Federated learning over wireless networks: Convergence analysis and resource allocation," IEEE/ACM TNET, vol. 29, no. 1, pp. 398–409, 2021.
[9] J. Konečný et al., "Federated optimization: Distributed machine learning for on-device intelligence," arXiv preprint arXiv:1610.02527, 2016.
[10] Z. Yang et al., "Energy efficient federated learning over wireless communication networks," IEEE TWC, vol. 20, no. 3, pp. 1935–1949, 2021.
[11] S. Wang et al., "Adaptive federated learning in resource constrained edge computing systems," IEEE JSAC, vol. 37, no. 6, pp. 1205–1221, 2019.
[12] M. Chen et al., "A joint learning and communications framework for federated learning over wireless networks," IEEE TWC, vol. 20, no. 1, pp. 269–283, 2021.
[13] B. McMahan et al., "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[14] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.
[15] T. Li et al., "Federated optimization in heterogeneous networks," arXiv preprint arXiv:1812.06127, 2018.