Quality- and Availability-Based Device
Scheduling and Resource Allocation
for Federated Edge Learning
Wanli Wen, Yi Zhang, Chen Chen, Yunjian Jia, Lu Luo, and Lei Tang
Abstract—To achieve an efficient federated edge learning
(FEEL) system, the scheme of device scheduling and resource
allocation should jointly perceive the device availability, wireless
channel quality, and local gradient quality. The existing literature on FEEL rarely considers these three aspects simultaneously, so the schemes proposed therein still leave room for improving the efficiency of FEEL, which motivates our work. In this paper, by
mathematically modeling the device availability, wireless channel
quality, and gradient quality, and deriving the convergence
bound for model training in the FEEL system, we formulate a
joint device scheduling and resource allocation problem, aiming
to improve the FEEL efficiency. The formulated problem is
a challenging non-convex problem. By exploring its structural
properties and utilizing the KKT conditions, we obtain an optimal
solution in closed-form. The analytical results enable us to gain
some important insights into how the device availability, wireless
channel quality, and gradient quality affect device scheduling and
resource allocation in the FEEL system.
Index Terms—Federated edge learning, device availability,
channel and gradient quality, scheduling, resource allocation.
I. INTRODUCTION
With the development of machine learning (ML), there is a
growing trend to deploy ML algorithms at the wireless edge
to extract useful knowledge from massive data generated on
end-user devices such as smartphones and cars. For traditional
ML algorithms, the training data needs to be gathered at the
edge server, such as the base station or access point, for
model training. However, the devices may be reluctant to share
sensitive data with the server due to concerns about privacy
disclosure. To address this issue, federated edge learning
(FEEL) has been proposed in recent years [1]–[3]. The model
training process of FEEL is an iterative process, where an
iteration is also called a communication round. In an arbitrary
round, each device downloads a global ML model from the
server, computes an updated model/gradient based on the local
dataset, and then submits the resultant model/gradient to the
server for parameter aggregation. An improved global model is then sent back from the server to the devices for another round of model training. Since FEEL does not expose the devices’ data during model training, it can well protect data privacy, which has attracted widespread attention from the industry and academia [4].

This work is sponsored by the National Natural Science Foundation of China under Grant 61971077, the Natural Science Foundation of Chongqing, China under Grants cstc2021jcyj-msxmX0458 and cstc2021jcyj-msxmX0480, and the open research fund of the National Mobile Communications Research Laboratory, Southeast University, under Grant 2022D06. (Corresponding author: Yunjian Jia.)
Wanli Wen is with the School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China, and also with the National Mobile Communications Research Laboratory, Southeast University, Nanjing, China (wanli_wen@cqu.edu.cn).
Yi Zhang, Chen Chen, and Yunjian Jia are with the School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China (never_zy@cqu.edu.cn, c.chen@cqu.edu.cn, yunjian@cqu.edu.cn).
Lu Luo and Lei Tang are with CSSC Haizhuang Windpower Co., Ltd., Chongqing, China (lu.luo@hzwindpower.com, tanglei@hzwindpower.com).
To achieve an efficient FEEL system with high training performance and low training energy consumption, device scheduling and wireless resource allocation should be carefully designed. The existing research in this direction can be roughly divided into three categories: device scheduling [5]–[7], resource allocation [8], and joint device scheduling and resource allocation [9]–[15]. Specifically, the authors in [5] analyzed the convergence of FEEL under some conventional scheduling schemes. In [6], an online device scheduling method based on the theory of multi-armed bandits was proposed to minimize the training latency of FEEL. The authors in [7] proposed a device scheduling scheme based on the quality of the wireless channel and the gradient. The authors in [8] investigated the trade-off between the training latency and energy consumption of FEEL. In [9]–[15], several different joint device scheduling and resource allocation problems were formulated to improve the training performance [9]–[12], [15] or to save energy [12]–[14].
However, in practical wireless networks, the design of device scheduling and resource allocation in FEEL systems faces some major challenges. For instance, some devices may temporarily leave the training process for reasons such as losing connection, making phone calls, or having a low battery, so the devices may not always be available to participate in the training process. Furthermore, some devices may suffer from poor wireless channel and/or model/gradient quality, so scheduling these devices will not only consume more energy for training but also prolong the training time. Therefore, to achieve an efficient FEEL system in practical wireless networks, the scheme of device scheduling and resource allocation should jointly perceive the device availability, wireless channel quality, and local gradient quality. Nonetheless, the existing literature on FEEL rarely considers these three aspects simultaneously, so the schemes proposed therein still leave room for improving the efficiency of FEEL, which motivates this letter.
Our main contributions are twofold: 1) A novel scheme of
device scheduling and resource allocation is devised for FEEL,
which can simultaneously perceive the device availability,
wireless channel quality, and gradient quality. 2) We obtain
some important theoretical insights into how the device availability, wireless channel quality, and gradient quality affect the
device scheduling and resource allocation in the FEEL system.
ĞǀŝĐĞϭĞǀŝĐĞϱĞǀŝĐĞϲĞǀŝĐĞϮĞǀŝĐĞϰĞǀŝĐĞϯ
n
Z
Ö
n
J
hŶĂǀĂŝůĂďůĞĞǀŝĐĞĞǀŝĐĞůƵƐƚĞƌ'ƌĂĚŝĞŶƚŽŵƉƵƚĂƚŝŽŶ'ƌĂĚŝĞŶƚŐŐƌĞŐĂƚŝŽŶ
n
Z
Ö
n
J
ÖÖ
n
n
§·¨¸©¹
J
J
n
Z
6
ĚŐĞƐĞƌǀĞƌ
Ön
J
n
Z
Fig. 1. An illustration of the FEEL system with one edge server
and K= 6 edge devices. Devices {1,3,5}and {2,4,6}belong to
two different clusters, respectively, where devices 3, 4, and 6 are
temporarily unavailable to participate in model training in round n.
II. SYSTEM MODEL

We consider an FEEL system composed of one edge server and $K$ devices, denoted by $\mathcal{K} \triangleq \{1, 2, \cdots, K\}$. The server and the devices are each equipped with one antenna. Let $\mathcal{D}_k \triangleq \{\xi_d\}_{d=1}^{|\mathcal{D}_k|}$ be the dataset of device $k$, where $|\mathcal{D}_k|$ denotes the cardinality of $\mathcal{D}_k$ and $\xi_d$ is the $d$-th data point in $\mathcal{D}_k$. Denote $\mathcal{D} \triangleq \bigcup_{k \in \mathcal{K}} \mathcal{D}_k$ as the whole dataset. An example of this system is depicted in Fig. 1. In the FEEL system, we aim to learn a supervised machine learning (ML) model over $\mathcal{D}$, which is mathematically possible by solving the following problem:
$$\mathbf{w}^\star \triangleq \arg\min_{\mathbf{w}} L(\mathbf{w}), \qquad (1)$$
where the vector $\mathbf{w}$ is a parameter relating to the specific ML model and $L(\mathbf{w}) \triangleq \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| L_k(\mathbf{w})$ denotes the global loss function over $\mathcal{D}$. Here, $L_k(\mathbf{w}) \triangleq \frac{1}{|\mathcal{D}_k|} \sum_{d=1}^{|\mathcal{D}_k|} l_k(\mathbf{w}, \xi_d)$ is the local loss function over $\mathcal{D}_k$, where $l_k(\mathbf{w}, \xi_d)$ is the loss function for the data point $\xi_d \in \mathcal{D}_k$. In the context of FEEL, solving the problem in (1) consists of a series of iterations, also known as communication rounds. Denote $\mathbf{w}^n$ as the model vector after the $n$-th round with $n = 1, 2, \cdots$, and $\mathbf{w}^0$ as the initial model vector. Then, in each round $n$, the solving process contains three stages: i) Gradient Calculation, ii) Gradient Submission, and iii) Gradient Aggregation. In the following, we elaborate on each of these stages.
A. Gradient Calculation

In this stage, device $k \in \mathcal{K}$ computes the gradient of $L_k(\mathbf{w})$ at $\mathbf{w} = \mathbf{w}^n$, where $\mathbf{w}^n$ is the global model vector broadcast from the server. Note that in practice, device $k$ may not always be available to perform model training due to various reasons, e.g., losing connection to the server, making phone calls, or having a low battery. To reflect this behavior, we introduce a binary random variable $X_k \in \{0, 1\}$ to model the availability state of device $k$.¹ Specifically, $X_k = 1$ means that device $k$ is available to compute its gradient, and $X_k = 0$ otherwise. Let $\rho_k \triangleq \Pr(X_k = 1)$ represent the availability probability of device $k$ and $\boldsymbol{\rho} \triangleq (\rho_k)_{k \in \mathcal{K}}$ the corresponding probability vector. Then, based on $\rho_k$, device $k$ generates a specific availability state, denoted by $x_k^n \in \{0, 1\}$, in the $n$-th communication round. As a result, the computed gradient at device $k$ is given by $\hat{\mathbf{g}}_k^n = x_k^n \mathbf{g}_k^n$, where $\mathbf{g}_k^n \triangleq \nabla_{\mathbf{w}} L_k(\mathbf{w})|_{\mathbf{w} = \mathbf{w}^n}$ with $\nabla$ denoting the gradient operator.

¹ Through modeling the device availability, the impact of the transmission of the global model vector from the edge server to the devices can be captured in our system model.
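To make the availability model concrete, the following minimal NumPy sketch (with hypothetical helper names, not code from the paper) draws the availability states $x_k^n \sim \mathrm{Bernoulli}(\rho_k)$ and forms the masked local gradients $\hat{\mathbf{g}}_k^n = x_k^n \mathbf{g}_k^n$.

```python
import numpy as np

def draw_availability(rho, rng):
    """Draw x_k^n ~ Bernoulli(rho_k) for every device k."""
    return (rng.random(rho.shape) < rho).astype(float)

def masked_local_gradients(local_grads, x):
    """Return g_hat_k^n = x_k^n * g_k^n for all devices.

    local_grads: (K, d) array holding the true local gradients g_k^n.
    x:           length-K vector of availability states x_k^n in {0, 1}.
    """
    return x[:, None] * local_grads

# Toy usage: 6 devices, a 4-dimensional model, arbitrary availability probabilities.
rng = np.random.default_rng(0)
rho = np.array([0.9, 0.2, 0.8, 0.5, 0.7, 0.3])
g = rng.standard_normal((6, 4))          # stand-in for the true local gradients
x = draw_availability(rho, rng)
g_hat = masked_local_gradients(g, x)     # rows of unavailable devices are all zero
```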
B. Gradient Submission

In this stage, device $k$ expends a certain amount of energy to upload $\hat{\mathbf{g}}_k^n$ to the server via the wireless channel. Due to the constraints of device availability, channel quality, and gradient quality, it is important to select the appropriate devices to perform gradient submission in each communication round.

1) Device Scheduling: Let $C$ be the number of devices that are scheduled for gradient submission. To select $C$ different devices, we consider that every $C$ devices in $\mathcal{K}$ form a cluster, which exactly generates $M \triangleq \binom{K}{C}$ clusters in total, denoted by $\mathcal{M} \triangleq \{1, 2, \cdots, M\}$. Let $\mathcal{K}_m$ denote the set of $C$ devices included in cluster $m \in \mathcal{M}$; note that each device is thus contained in exactly $\Pi \triangleq \binom{K-1}{C-1}$ clusters. Apparently, scheduling cluster $m$ is equivalent to scheduling the $C$ devices in $\mathcal{K}_m$. Let $p_m^n$ be the scheduling probability of cluster $m$ in round $n$, where
$$0 \le p_m^n \le 1, \quad \forall m \in \mathcal{M}, \qquad (2)$$
$$\sum_{m \in \mathcal{M}} p_m^n = 1. \qquad (3)$$
Define $\mathbf{p}^n \triangleq (p_m^n)_{m \in \mathcal{M}}$ to be the device scheduling design. Note that the scheduling design illustrated in our work is different from those proposed in the existing literature on FEEL. Here, we focus on scheduling a cluster of devices instead of a single device, and thus scheduling one cluster is equivalent to scheduling multiple devices. Later, we shall see that this scheduling design greatly facilitates the construction of an unbiased global gradient in the Gradient Aggregation stage.
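As an illustration of this cluster-based design (a sketch, not the authors' code), the snippet below enumerates all $M = \binom{K}{C}$ clusters, verifies that each device belongs to $\Pi = \binom{K-1}{C-1}$ of them, and samples one cluster per round from a scheduling design $\mathbf{p}^n$.

```python
import numpy as np
from itertools import combinations
from math import comb

K, C = 6, 2
clusters = list(combinations(range(K), C))   # K_m for m = 1, ..., M
M = len(clusters)
assert M == comb(K, C)

# Each device is contained in exactly Pi = binom(K-1, C-1) clusters.
Pi = comb(K - 1, C - 1)
assert all(sum(k in Km for Km in clusters) == Pi for k in range(K))

# A valid scheduling design p^n: entries in [0, 1] that sum to one (uniform here).
p = np.full(M, 1.0 / M)

# Scheduling in round n means drawing one cluster index m with probability p_m^n.
rng = np.random.default_rng(1)
m = rng.choice(M, p=p)
print(f"Round schedules cluster {m}: devices {clusters[m]}")
```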
2) Energy Consumption: We consider time division multiple access in this work.² In the $n$-th communication round, let $H_k^n$ and $P_k^n$ represent the channel power gain and transmission power of device $k$, respectively.³ Then, the transmission rate of device $k$ can be calculated as $R_k^n \triangleq B \log_2\left(1 + P_k^n H_k^n / \sigma^2\right)$ (in bps). Here, $B$ is the available bandwidth and $\sigma^2$ denotes the noise power. Denote by $\ell$ the number of bits required to encode the gradient. To ensure that device $k$ in cluster $m$ can successfully submit its local gradient to the server, $R_k^n$ should satisfy $R_k^n = i_k(m) x_k^n \ell / t_k^n(m)$, where $i_k(m) \in \{0, 1\}$ is used to check whether device $k$ is in cluster $m$, i.e., $i_k(m) = 1$ if $k \in \mathcal{K}_m$, and $i_k(m) = 0$ otherwise. Additionally, $t_k^n(m)$ is the allocated time for device $k$ to perform gradient submission and satisfies
$$0 \le t_k^n(m) \le i_k(m) x_k^n T, \quad \forall k \in \mathcal{K}, \; m \in \mathcal{M}, \qquad (4)$$
$$\sum_{k \in \mathcal{K}} t_k^n(m) = T \max_{k \in \mathcal{K}_m} \{x_k^n\}, \quad \forall m \in \mathcal{M}. \qquad (5)$$
Here, $T$ denotes the time duration of the gradient submission stage. As a result, within the time duration $t_k^n(m)$, the transmission energy consumed by device $k$ can be calculated as
$$E_k(t_k^n(m)) \triangleq P_k^n t_k^n(m) = \frac{t_k^n(m)}{H_k^n} f\!\left(\frac{i_k(m) x_k^n \ell}{t_k^n(m)}\right),$$
where $f(x) \triangleq \sigma^2 \left(2^{x/B} - 1\right)$. The total transmission energy consumption of all devices is given by $E(\mathbf{t}^n(m)) = \sum_{k \in \mathcal{K}} E_k(t_k^n(m))$ with $\mathbf{t}^n(m) \triangleq (t_k^n(m))_{k \in \mathcal{K}}$. Since cluster $m$ is selected in accordance with the probability $p_m^n$, by using the total probability theorem, the average total transmission energy consumption of all devices is given by
$$\bar{E}(\mathbf{p}^n, \mathbf{t}^n) \triangleq \sum_{m \in \mathcal{M}} p_m^n E(\mathbf{t}^n(m)), \qquad (6)$$
where $\mathbf{t}^n \triangleq (\mathbf{t}^n(m))_{m \in \mathcal{M}}$ denotes the resource allocation design. Let $\hat{E}$ denote the computing energy consumed by all devices in the Gradient Calculation stage. Note that $\hat{E}$ does not depend on $(\mathbf{p}^n, \mathbf{t}^n)$. Then, with (6), the total energy consumption of all devices can be expressed as
$$\tilde{E}(\mathbf{p}^n, \mathbf{t}^n) \triangleq \bar{E}(\mathbf{p}^n, \mathbf{t}^n) + \hat{E}. \qquad (7)$$

² The analysis and optimization framework proposed in this paper can easily be extended to other advanced access technologies such as orthogonal frequency division multiple access and non-orthogonal multiple access.
³ The server needs to estimate the channel information by using various methods. The estimation error may cause system performance loss. However, based on Berge's Maximum Theorem, it is easy to prove that the performance loss can be arbitrarily small as long as the estimation error is small.
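As a concrete illustration of this energy model (a sketch with placeholder values, not the paper's code), the snippet below evaluates $E_k(t_k^n(m))$, $E(\mathbf{t}^n(m))$, and the average energy $\bar{E}(\mathbf{p}^n, \mathbf{t}^n)$ in (6).

```python
import numpy as np

B = 10e6        # bandwidth in Hz (placeholder)
sigma2 = 1e-9   # noise power in W (placeholder)
ell = 9e5       # number of bits used to encode a gradient (placeholder)

def f(x):
    """f(x) = sigma^2 * (2^{x/B} - 1): power needed to sustain a rate of x bps."""
    return sigma2 * (2.0 ** (x / B) - 1.0)

def device_energy(t, H, i_km, x):
    """E_k(t_k^n(m)) = (t / H) * f(i_k(m) x_k^n ell / t), zero if nothing is sent."""
    if t <= 0.0 or i_km == 0 or x == 0:
        return 0.0
    return (t / H) * f(i_km * x * ell / t)

def cluster_energy(t_alloc, H, members, x):
    """E(t^n(m)): total transmission energy when cluster m is scheduled."""
    return sum(device_energy(t_alloc[k], H[k], int(k in members), x[k])
               for k in range(len(H)))

def average_energy(p, t_alloc_all, H, clusters, x):
    """Bar{E}(p^n, t^n) in (6): expectation of E(t^n(m)) over the cluster choice."""
    return sum(p[m] * cluster_energy(t_alloc_all[m], H, clusters[m], x)
               for m in range(len(clusters)))
```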
C. Gradient Aggregation

In this stage, the server aggregates the gradients from the scheduled devices and then generates a new global model for the next round of local training. Moreover, the aggregation of gradients shall be unbiased so as to achieve convergence of FEEL. To this end, we devise a novel gradient aggregation scheme as follows.

Gradient Aggregation Scheme: In case of scheduling cluster $m$, the server calculates the global gradient, denoted by $\hat{\mathbf{g}}_m^n$, based on
$$\hat{\mathbf{g}}_m^n = \frac{1}{|\mathcal{D}| \Pi p_m^n} \sum_{k \in \mathcal{K}} i_k(m) \frac{|\mathcal{D}_k|}{\rho_k} \hat{\mathbf{g}}_k^n. \qquad (8)$$
The following result establishes the unbiasedness of $\hat{\mathbf{g}}_m^n$, which will greatly help to prove the convergence of FEEL.

Lemma 1 (Unbiasedness of $\hat{\mathbf{g}}_m^n$): $\hat{\mathbf{g}}_m^n$ is an unbiased estimate of the ground-truth global gradient $\mathbf{g}^n \triangleq \nabla_{\mathbf{w}} L(\mathbf{w}^n)$.

Proof: See Appendix A.

Then, based on the global gradient $\hat{\mathbf{g}}_m^n$, the ML model in round $n+1$ can be calculated as
$$\mathbf{w}^{n+1} = \mathbf{w}^n - \eta \hat{\mathbf{g}}_m^n. \qquad (9)$$
Here, $\eta > 0$ denotes the learning rate.
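For concreteness, here is a minimal sketch of the aggregation rule (8) and the update (9); the array shapes and helper names are illustrative rather than part of the paper.

```python
import numpy as np

def aggregate_global_gradient(g_hat, cluster, p_m, rho, data_sizes, Pi):
    """Aggregation rule (8) when cluster m is scheduled.

    g_hat:      (K, d) array of masked local gradients g_hat_k^n.
    cluster:    iterable with the device indices in K_m.
    p_m:        scheduling probability p_m^n of the selected cluster.
    rho:        length-K availability probabilities rho_k.
    data_sizes: length-K local dataset sizes |D_k|.
    Pi:         number of clusters containing each device, binom(K-1, C-1).
    """
    D = data_sizes.sum()
    weights = np.zeros(len(rho))
    for k in cluster:
        weights[k] = data_sizes[k] / rho[k]      # i_k(m) |D_k| / rho_k
    return (weights[:, None] * g_hat).sum(axis=0) / (D * Pi * p_m)

def update_model(w, g_hat_m, eta):
    """Model update (9): w^{n+1} = w^n - eta * g_hat_m^n."""
    return w - eta * g_hat_m
```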
D. One-Round Convergence Bound

The phases of Gradient Calculation, Gradient Submission, and Gradient Aggregation will be repeated until the convergence of FEEL. Using Lemma 1, we have the following result.

Lemma 2 (One-Round Convergence Bound of FEEL): If $\nabla_{\mathbf{w}} L(\mathbf{w})$ is Lipschitz continuous with a positive modulus $\mu$, then we have
$$\mathbb{E}\left[L(\mathbf{w}^{n+1}) - L(\mathbf{w}^\star)\right] \le \mathbb{E}\left[L(\mathbf{w}^n) - L(\mathbf{w}^\star)\right] - \eta \|\mathbf{g}^n\|^2 + \frac{1}{2} \mu \eta^2 \mathbb{E}\left[g(\mathbf{p}^n)\right], \qquad (10)$$
where $\mathbf{w}^\star$ denotes an optimal solution of the problem in (1) and $g(\mathbf{p}^n) \triangleq \frac{1}{(|\mathcal{D}| \Pi)^2} \sum_{m \in \mathcal{M}} \frac{C}{p_m^n} \sum_{k \in \mathcal{K}_m} \left(\frac{|\mathcal{D}_k|}{\rho_k}\right)^2 \|\hat{\mathbf{g}}_k^n\|^2$.

Proof: See Appendix B.

In Lemma 2, $g(\mathbf{p}^n)$ is directly related to the device scheduling $\mathbf{p}^n$. In particular, a smaller $g(\mathbf{p}^n)$ (e.g., due to higher device availability $\rho_k$) will lead to faster convergence of FEEL.
III. PROBLEM ESTABLISHMENT AND SOLUTION

A. Problem Establishment

Based on (7) and (10), we observe that the scheduling and resource allocation design $(\mathbf{p}^n, \mathbf{t}^n)$ has an impact on both the energy consumption and the convergence of FEEL. This observation leads to a natural question: how to design an appropriate scheduling and resource allocation scheme that can minimize the energy consumption of FEEL while simultaneously accelerating its convergence? To answer this question, we establish an optimization problem as follows.

Problem 1 (Joint Device Scheduling and Resource Allocation):
$$\min_{\mathbf{p}^n, \mathbf{t}^n} \; (1 - \lambda) \tilde{E}(\mathbf{p}^n, \mathbf{t}^n) + \lambda g(\mathbf{p}^n) \quad \text{s.t.} \;\; (2), (3), (4), (5),$$
where $\lambda \in [0, 1]$ is a weight coefficient. Let $(\mathbf{p}^{n\star}, \mathbf{t}^{n\star})$ be an optimal solution of Problem 1. Note that in Problem 1, since $\hat{E}$ is independent of $(\mathbf{p}^n, \mathbf{t}^n)$, it is a constant with respect to $(\mathbf{p}^n, \mathbf{t}^n)$. So, from now on, we exclude the term $\hat{E}$ from the objective function for simplicity.

Problem 1 is a challenging non-convex problem. To solve Problem 1 optimally, we propose to decompose it into two subproblems, namely, a Resource Allocation subproblem and a Device Scheduling subproblem, by using the structural properties of Problem 1. The two subproblems are specified below.
Problem 2 (Resource Allocation for Each $m \in \mathcal{M}$):
$$\mathbf{t}^{n\star}(m) \triangleq \arg\min_{\mathbf{t}^n(m)} \; E(\mathbf{t}^n(m))$$
$$\text{s.t.} \quad 0 \le t_k^n(m) \le i_k(m) x_k^n T, \; \forall k \in \mathcal{K}, \qquad (11)$$
$$\sum_{k \in \mathcal{K}} t_k^n(m) = T \max_{k \in \mathcal{K}_m} \{x_k^n\}, \qquad (12)$$
where $\mathbf{t}^{n\star}(m)$ denotes an optimal solution. Note that we have $\mathbf{t}^{n\star} = (\mathbf{t}^{n\star}(m))_{m \in \mathcal{M}}$.

Problem 3 (Device Scheduling for Given $\mathbf{t}^{n\star}$):
$$\mathbf{p}^{n\star} = \arg\min_{\mathbf{p}^n} \; (1 - \lambda) \bar{E}(\mathbf{p}^n, \mathbf{t}^{n\star}) + \lambda g(\mathbf{p}^n) \quad \text{s.t.} \;\; (2), (3).$$
The relationship between Problem 1 and Problems 2 and 3 is as follows. It is easy to verify that if $\mathbf{t}^n(m)$ and $\mathbf{p}^n$ are in the feasible sets of Problems 2 and 3, respectively, then the point $(\mathbf{p}^n, \mathbf{t}^n)$ is a feasible point of Problem 1, and vice versa. So, Problem 1 and Problems 2 and 3 have identical feasible sets. Moreover, the point $(\mathbf{p}^{n\star}, \mathbf{t}^{n\star})$ is optimal for Problem 1 if and only if it is optimal for Problems 2 and 3. Thus, we conclude that Problem 1 and Problems 2 and 3 are equivalent. On this basis, to solve Problem 1, we only need to solve Problems 2 and 3 separately without losing any optimality, as elaborated in the sequel.
B. Problem Solution
First, we solve Problem 2. Since Problem 2 is convex, we
can get an optimal solution of it by using the KKT conditions,
as summarized below. Note that we have omitted the details
of the proof due to page limitation.
Algorithm 1 The Algorithm to Solve Problem 1
1: Require: $\mathcal{K}$, $B$, $\lambda$, $C$, $T$, $\sigma^2$, $\mathcal{D}_k$, $\rho_k$, $H_k^n$, $\hat{\mathbf{g}}_k^n$, and $x_k^n$.
2: Obtain $\mathbf{t}^{n\star}$ by solving Problem 2 via Lemma 3.
3: Obtain $\mathbf{p}^{n\star}$ by solving Problem 3 via Lemma 4.
4: Return: $(\mathbf{p}^{n\star}, \mathbf{t}^{n\star})$
Lemma 3 (Optimal Solution of Problem 2): An optimal solution of Problem 2 is given by
$$t_k^{n\star}(m) = \min\left\{ \max\left\{ \frac{\ell \ln 2 / B}{W_0\!\left(\frac{H_k^n \bar{\nu} - \sigma^2}{\sigma^2 e}\right) + 1}, \, 0 \right\}, \; i_k(m) x_k^n T \right\},$$
where $W_0(\cdot)$ denotes the principal branch of the Lambert W function and $\bar{\nu}$ satisfies $\sum_{k \in \mathcal{K}} t_k^{n\star}(m) = T \max_{k \in \mathcal{K}_m} \{x_k^n\}$.
Remark 1 (Quality-Aware Resource Allocation): We can observe from Lemma 3 that the time allocation $t_k^{n\star}(m)$ is only aware of the channel quality $H_k^n$ and is independent of the gradient quality $\|\hat{\mathbf{g}}_k^n\|$ and the device availability $\rho_k$. In particular, $t_k^{n\star}(m)$ decreases with a growing channel quality of the available device $k$ in cluster $m$, since a higher-quality channel can support a larger transmission rate.
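The closed form in Lemma 3 can be evaluated numerically, for instance as in the sketch below: for a trial multiplier $\bar{\nu}$ the candidate times follow from the Lambert W expression, and $\bar{\nu}$ is adjusted by bisection until the allocated times meet the budget in (5). This is a hedged sketch of one possible implementation (using SciPy's `lambertw`), not the authors' code, and the bisection bracket is a heuristic.

```python
import numpy as np
from scipy.special import lambertw

def time_allocation(H, in_cluster, x, B, sigma2, ell, T,
                    nu_lo=1e-12, nu_hi=1e3, iters=100):
    """Evaluate the Lemma 3 allocation for one cluster by bisecting on nu_bar.

    H:          length-K channel power gains H_k^n.
    in_cluster: length-K indicators i_k(m) in {0, 1}.
    x:          length-K availability states x_k^n in {0, 1}.
    Returns the per-device submission times t_k^n(m) in seconds.
    """
    caps = in_cluster * x * T                    # upper bounds i_k(m) x_k^n T in (4)
    budget = T if caps.max() > 0 else 0.0        # T * max_{k in K_m} x_k^n in (5)
    if budget == 0.0:
        return np.zeros_like(H, dtype=float)

    def alloc(nu_bar):
        arg = np.maximum((H * nu_bar - sigma2) / (sigma2 * np.e), -1.0 / np.e)
        w0 = np.real(lambertw(arg))              # principal branch W_0
        t = (ell * np.log(2) / B) / (w0 + 1.0 + 1e-30)
        return np.minimum(np.maximum(t, 0.0), caps)

    for _ in range(iters):                       # larger nu_bar -> smaller times
        nu_mid = 0.5 * (nu_lo + nu_hi)
        if alloc(nu_mid).sum() > budget:
            nu_lo = nu_mid
        else:
            nu_hi = nu_mid
    return alloc(0.5 * (nu_lo + nu_hi))
```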
Next, we solve Problem 3. Similar to Problem 2, Problem 3 is also convex, and thus we have the following result.

Lemma 4 (Optimal Solution of Problem 3): An optimal solution of Problem 3 is given by
$$p_m^{n\star} = \min\left\{ \max\left\{ \sqrt{\frac{\lambda v_m}{(1 - \lambda) E(\mathbf{t}^{n\star}(m)) + \bar{\phi}}}, \, 0 \right\}, \; 1 \right\},$$
where $v_m \triangleq \frac{C}{(|\mathcal{D}| \Pi)^2} \sum_{k \in \mathcal{K}_m} \left(\frac{|\mathcal{D}_k|}{\rho_k}\right)^2 \|\hat{\mathbf{g}}_k^n\|^2$ and $\bar{\phi}$ satisfies $\sum_{m \in \mathcal{M}} p_m^{n\star} = 1$.
Remark 2 (Quality- and Availability-Aware Device Scheduling): From Lemma 4, we can observe that the device scheduling probability $p_m^{n\star}$ is determined by the channel quality (captured in $E(\mathbf{t}^{n\star}(m))$) and by the gradient quality as well as the device availability (both captured in $v_m$). In particular, $p_m^{n\star}$ increases with the improvement of the channel quality and the gradient quality of all available devices in cluster $m$. This can be explained as follows: high-quality gradients contain richer local data information that can contribute to the convergence of FEEL, and high-quality channels can support the submission of more high-quality gradients. The probability $p_m^{n\star}$ also increases with a decreasing availability probability of all active devices in cluster $m$. This is because lazy devices with low availability have more fresh data, and scheduling these devices will accelerate the convergence of FEEL.
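Analogously to Lemma 3, the scheduling probabilities of Lemma 4 can be evaluated by bisecting on the multiplier $\bar{\phi}$ until they sum to one, as in the sketch below (assuming $0 < \lambda < 1$ and precomputed $v_m$ and $E(\mathbf{t}^{n\star}(m))$; a hedged illustration, not the authors' code).

```python
import numpy as np

def scheduling_probabilities(v, E, lam, iters=100):
    """Evaluate the Lemma 4 solution by bisecting on the multiplier phi_bar.

    v:   length-M vector of v_m (gradient/availability statistic of each cluster).
    E:   length-M vector of E(t^{n*}(m)) (optimal transmission energy per cluster).
    lam: weight coefficient lambda, assumed to lie strictly between 0 and 1.
    """
    def p_of(phi):
        denom = np.maximum((1.0 - lam) * E + phi, 1e-30)
        return np.clip(np.sqrt(lam * v / denom), 0.0, 1.0)

    # phi_bar just above -(1 - lam) * min(E) pushes some p_m to 1 (sum >= 1),
    # while a very large phi_bar drives all p_m toward 0; bisect in between.
    phi_lo = -(1.0 - lam) * E.min() + 1e-12
    phi_hi = phi_lo + 1e6 * (lam * v.max() + 1.0)
    for _ in range(iters):
        phi_mid = 0.5 * (phi_lo + phi_hi)
        if p_of(phi_mid).sum() > 1.0:
            phi_lo = phi_mid
        else:
            phi_hi = phi_mid
    return p_of(0.5 * (phi_lo + phi_hi))
```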
Finally, by combining Lemmas 3 and 4, we can obtain an optimal joint device scheduling and resource allocation scheme $(\mathbf{p}^{n\star}, \mathbf{t}^{n\star})$, as summarized in Algorithm 1. Note that Algorithm 1 is executed by the edge server during the Gradient Submission stage of FEEL. Since $(\mathbf{p}^{n\star}, \mathbf{t}^{n\star})$ is obtained in closed form, Algorithm 1 can be very efficient and scalable, and has the potential to solve large-scale problems. We believe that the edge server has sufficient processing power to execute Algorithm 1.
IV. NUMERICAL RESULTS
Fig. 2. Impact of cluster size. (a) Test accuracy. (b) Energy consumption. [Curves shown for C = 2, 4, 6, and 10; figure graphics omitted in this text version.]
Fig. 3. Performance comparison with baselines. (a) MNIST dataset. (b) Fashion-MNIST dataset. [Curves shown for the proposed scheme and Baselines 1–3; figure graphics omitted in this text version.]
The simulation settings are as follows: $K = 10$, $B = 10$ MHz, $\sigma^2 = 10^{-9}$ W, $T = 60$ ms, $\lambda = 1 \times 10^{-6}$, and $H_k^n$ is modeled as an independent exponentially distributed random variable with mean $\hat{H} = 10^{-5}$, where $\hat{H}$ reflects the mean channel quality.
We use FEEL to train a classification model on the MNIST (built into Matlab 2021a) and Fashion-MNIST datasets, respectively. The MNIST dataset contains 10000 handwritten images of the digits 0 to 9, with 1000 images per digit. The Fashion-MNIST dataset consists of 70000 images of fashion items from 10 categories, including “t-shirt”, “bag”, and so on, with 7000 images per category. The learning rate $\eta$ and the momentum weight are set to 0.001 and 0, respectively. Then, using Matlab, we obtain the size of the gradient $\hat{\mathbf{g}}_k^n$ as $\ell = 9 \times 10^5$ bits (on the MNIST dataset) or $\ell = 1 \times 10^6$ bits (on the Fashion-MNIST dataset). To ensure that the training data distribution of each device is non-IID, we first randomly assign a label to each device, then randomly select $|\mathcal{D}_k|$ images from all images under this label as the training set, and use the rest as the test set. Here, we set $|\mathcal{D}_k| = 800$ if $k$ is odd and $|\mathcal{D}_k| = 200$ otherwise. We use the test set to evaluate the classification performance of FEEL during model training. We employ a convolutional neural network (CNN) to perform the image classification task; the concrete architecture of the CNN is the same as that in [1]. All numerical results are averaged over 100 trials.
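The label-based non-IID partition described above can be sketched as follows (a simplified Python illustration with hypothetical array names, not the authors' Matlab code; devices sharing a label draw disjoint samples here, which is one possible reading of the description).

```python
import numpy as np

def label_based_partition(labels, num_devices, sizes, rng):
    """Give each device one label and sample its |D_k| training images from that label.

    labels:      length-N array with the class label of every image.
    num_devices: number of devices K.
    sizes:       length-K array with |D_k| for each device.
    Returns per-device training index arrays and the remaining (test) indices.
    """
    classes = np.unique(labels)
    device_labels = rng.choice(classes, size=num_devices)    # one label per device
    train_idx, used = [], set()
    for k in range(num_devices):
        pool = np.array([i for i in np.where(labels == device_labels[k])[0]
                         if i not in used])
        chosen = rng.choice(pool, size=sizes[k], replace=False)
        used.update(chosen.tolist())
        train_idx.append(chosen)
    test_idx = np.array([i for i in range(len(labels)) if i not in used])
    return train_idx, test_idx

# Toy usage: 70000 images with 10 classes; |D_k| = 800 for odd k, 200 otherwise.
rng = np.random.default_rng(0)
toy_labels = rng.integers(0, 10, size=70000)
sizes = np.array([800 if (k + 1) % 2 == 1 else 200 for k in range(10)])
train_idx, test_idx = label_based_partition(toy_labels, 10, sizes, rng)
```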
Fig. 2 shows the impact of the cluster size C on the accuracy
and energy consumption of the FEEL system. From Fig. 2, we
can see that the larger Cis, the higher the test accuracy is, but
at the cost of consuming more energy. This indicates that in
practical applications, the number of scheduled devices should
be adjusted appropriately to balance the accuracy and energy
consumption.
Fig. 3 compares the performance of FEEL under the proposed Algorithm 1 with those under three representative baselines, where $C = 2$ for the MNIST dataset and $C = 6$ for the Fashion-MNIST dataset, and $\rho_k = 0.8$ if $k$ is odd and $\rho_k = 0.2$ otherwise. Here, Baseline 1 refers to a uniform scheduling design, i.e., $p_m^n = 1/M$; Baseline 2 is a scheduling design with availability and gradient quality awareness, i.e., $p_m^n = G_m^n / \sum_{m \in \mathcal{M}} G_m^n$, where $G_m^n \triangleq \frac{1}{C} \sum_{k \in \mathcal{K}_m} \|\hat{\mathbf{g}}_k^n\|$ represents the mean gradient norm of cluster $m$; Baseline 3
adopts a scheduling design with availability and channel quality awareness, i.e., $p_m^n = J_m^n / \sum_{m \in \mathcal{M}} J_m^n$, where $J_m^n \triangleq \frac{1}{C} \sum_{k \in \mathcal{K}_m} x_k^n H_k^n$ denotes the mean channel power gain of cluster $m$. Note that the baselines also consider the optimization of resource allocation via Lemma 3. From Fig. 3, it is clear that the proposed scheme is significantly better than the baselines in terms of test accuracy and energy consumption on both the MNIST and Fashion-MNIST datasets. The underlying reasons are as follows. Baseline 1 fails to perceive the device availability and completely overlooks the wireless channel quality as well as the updated gradient quality during model training. Although Baseline 2 (Baseline 3) perceives the device availability and is conscious of the gradient quality (channel quality), the channel quality (gradient quality) is completely ignored. In contrast, as pointed out in Remark 1 and Remark 2, our proposed scheme can well adapt to the changes of device availability, wireless channel quality, and local gradient quality, so it achieves higher accuracy and lower energy consumption.
V. CONCLUSIONS
In this paper, we first mathematically model the device
availability, wireless channel quality, and gradient quality, and
derive the convergence bound of FEEL. Then, we formulate a
joint device scheduling and resource allocation problem, aiming to improve the FEEL efficiency. The formulated problem is
a challenging non-convex problem. By exploring its structural
properties and utilizing the KKT conditions, we obtain an
optimal solution in closed-form. Finally, the analytical results
enable us to gain some important insights into how the device
availability, wireless channel quality, and gradient quality
affect the device scheduling and resource allocation in the
FEEL system.
APPENDIX A
PROOF OF LEMMA 1

By taking the derivative of $L(\mathbf{w})$ at $\mathbf{w} = \mathbf{w}^n$, we have $\mathbf{g}^n \triangleq \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \nabla_{\mathbf{w}} L_k(\mathbf{w})|_{\mathbf{w} = \mathbf{w}^n} = \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \mathbf{g}_k^n$. Then, by taking the expectation of $\hat{\mathbf{g}}_m^n$, we have
$$\mathbb{E}[\hat{\mathbf{g}}_m^n] = \sum_{u \in \mathcal{M}} \frac{p_u^n}{|\mathcal{D}| \Pi p_u^n} \sum_{k \in \mathcal{K}} i_k(u) |\mathcal{D}_k| \rho_k^{-1} \mathbb{E}[x_k^n] \mathbf{g}_k^n \overset{(a)}{=} \frac{1}{|\mathcal{D}| \Pi} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \mathbf{g}_k^n \sum_{u \in \mathcal{M}} i_k(u) \overset{(b)}{=} \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \mathbf{g}_k^n,$$
where (a) follows from $\mathbb{E}[x_k^n] = \rho_k$ and (b) is due to $\sum_{u \in \mathcal{M}} i_k(u) = \Pi$ for each $k \in \mathcal{K}$. Therefore, we have $\mathbb{E}[\hat{\mathbf{g}}_m^n] = \mathbf{g}^n$, namely, $\hat{\mathbf{g}}_m^n$ is an unbiased estimate of $\mathbf{g}^n$, which completes the proof.
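As a quick numerical sanity check of Lemma 1 (not part of the paper), one can average the aggregated gradient in (8) over many simulated rounds and compare it against $\mathbf{g}^n = \frac{1}{|\mathcal{D}|} \sum_{k} |\mathcal{D}_k| \mathbf{g}_k^n$; the sketch below uses arbitrary toy values.

```python
import numpy as np
from itertools import combinations
from math import comb

rng = np.random.default_rng(0)
K, C, d = 5, 2, 3
clusters = list(combinations(range(K), C))
M, Pi = len(clusters), comb(K - 1, C - 1)
rho = rng.uniform(0.4, 0.9, size=K)                     # availability probabilities rho_k
sizes = rng.integers(100, 500, size=K).astype(float)    # dataset sizes |D_k|
g = rng.standard_normal((K, d))                         # true local gradients g_k^n
p = np.full(M, 1.0 / M)                                 # a valid scheduling design p^n
D = sizes.sum()

g_true = (sizes[:, None] * g).sum(axis=0) / D           # ground-truth global gradient g^n
acc, rounds = np.zeros(d), 200000
for _ in range(rounds):
    x = (rng.random(K) < rho).astype(float)             # availability states x_k^n
    m = rng.choice(M, p=p)                               # scheduled cluster
    w = np.zeros(K)
    for k in clusters[m]:
        w[k] = sizes[k] / rho[k]                         # i_k(m) |D_k| / rho_k
    acc += (w[:, None] * (x[:, None] * g)).sum(axis=0) / (D * Pi * p[m])
print(np.allclose(acc / rounds, g_true, atol=2e-2))      # empirical mean ~= g^n
```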
APPENDIX B
PROOF OF LEMMA 2

First, since $\nabla_{\mathbf{w}} L(\mathbf{w})$ is Lipschitz continuous with a positive modulus $\mu$, we have
$$L(\mathbf{w}^{n+1}) \le L(\mathbf{w}^n) + (\mathbf{w}^{n+1} - \mathbf{w}^n)^T \mathbf{g}^n + \frac{\mu}{2} \left\|\mathbf{w}^{n+1} - \mathbf{w}^n\right\|^2 \overset{(a)}{=} L(\mathbf{w}^n) + (-\eta \hat{\mathbf{g}}_m^n)^T \mathbf{g}^n + \frac{\mu}{2} \left\|-\eta \hat{\mathbf{g}}_m^n\right\|^2.$$
Here, $(\cdot)^T$ is the transposition operator and (a) follows from (9). Then, taking the expectation on both sides of the above inequality, we have
$$\mathbb{E}\left[L(\mathbf{w}^{n+1}) - L(\mathbf{w}^\star)\right] \overset{(b)}{\le} \mathbb{E}\left[L(\mathbf{w}^n) - L(\mathbf{w}^\star)\right] - \eta \|\mathbf{g}^n\|^2 + \frac{\mu \eta^2}{2} \mathbb{E}\left[\|\hat{\mathbf{g}}_m^n\|^2\right] = \mathbb{E}\left[L(\mathbf{w}^n) - L(\mathbf{w}^\star)\right] - \eta \|\mathbf{g}^n\|^2 + \frac{\mu \eta^2}{2 (|\mathcal{D}| \Pi)^2} \sum_{m \in \mathcal{M}} \frac{\mathcal{E}_m}{p_m^n},$$
where $\mathcal{E}_m \triangleq \mathbb{E}\left[\left\| \sum_{k \in \mathcal{K}_m} \frac{|\mathcal{D}_k|}{\rho_k} \hat{\mathbf{g}}_k^n \right\|^2\right]$ and (b) follows from the unbiasedness of $\hat{\mathbf{g}}_m^n$. Next, using the generalized triangle inequality of the second kind, $\left\|\sum_{j=1}^{n} \mathbf{x}_j\right\|^2 \le n \sum_{j=1}^{n} \|\mathbf{x}_j\|^2$, we have $\mathcal{E}_m \le C \sum_{k \in \mathcal{K}_m} \left(\frac{|\mathcal{D}_k|}{\rho_k}\right)^2 \mathbb{E}\left[\|\hat{\mathbf{g}}_k^n\|^2\right]$. Finally, using the definition of $g(\mathbf{p}^n)$, we complete the proof.
REFERENCES
[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, “Communication-efficient learning of deep networks from decentralized data,” in Proc. Int. Conf. Artificial Intell. Stat. (AISTATS), vol. 54, 2017, pp. 1273–1282.
[2] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, “Toward an intelligent edge: Wireless communication meets machine learning,” IEEE Commun. Mag., vol. 58, no. 1, pp. 19–25, Jan. 2020.
[3] Y. Liu, Y. Zhu, and J. J. Yu, “Resource-constrained federated learning with heterogeneous data: Formulation and analysis,” IEEE Trans. Netw. Sci. Eng., pp. 1–1, 2021.
[4] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang, “Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory,” IEEE Internet Things J., vol. 6, no. 6, pp. 10700–10714, 2019.
[5] H. H. Yang, Z. Liu, T. Q. S. Quek, and H. V. Poor, “Scheduling policies for federated learning in wireless networks,” IEEE Trans. Commun., vol. 68, no. 1, pp. 317–333, Jan. 2020.
[6] W. Xia, T. Q. S. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu, “Multi-armed bandit-based client scheduling for federated learning,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7108–7123, Nov. 2020.
[7] J. Leng, Z. Lin, M. Ding, P. Wang, D. Smith, and B. Vucetic, “Client scheduling in wireless federated learning based on channel and learning qualities,” IEEE Wireless Commun., pp. 1–1, 2022.
[8] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy efficient federated learning over wireless communication networks,” IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1935–1949, Mar. 2021.
[9] W. Shi, S. Zhou, and Z. Niu, “Device scheduling with fast convergence for wireless federated learning,” in Proc. IEEE ICC, Jun. 2020, pp. 1–6.
[10] J. Xu and H. Wang, “Client selection and bandwidth allocation in wireless federated learning networks: A long-term perspective,” IEEE Trans. Wireless Commun., vol. 20, no. 2, pp. 1188–1200, Feb. 2021.
[11] M. M. Wadu, S. Samarakoon, and M. Bennis, “Joint client scheduling and resource allocation under channel uncertainty in federated learning,” IEEE Trans. Commun., pp. 1–1, 2021.
[12] J. Ren, Y. He, D. Wen, G. Yu, K. Huang, and D. Guo, “Scheduling for cellular federated edge learning with importance and channel awareness,” IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7690–7703, Nov. 2020.
[13] Q. Zeng, Y. Du, K. Huang, and K. K. Leung, “Energy-efficient radio resource allocation for federated edge learning,” in Proc. IEEE ICC Workshops, Jun. 2020, pp. 1–6.
[14] W. Wen, Z. Chen, H. H. Yang, W. Xia, and T. Q. S. Quek, “Joint scheduling and resource allocation for hierarchical federated edge learning,” IEEE Trans. Wireless Commun., pp. 1–1, 2022.
[15] H.-S. Lee, “Device selection and resource allocation for layerwise federated learning in wireless networks,” IEEE Syst. J., pp. 1–4, 2022.