Quality- and Availability-Based Device
Scheduling and Resource Allocation
for Federated Edge Learning
Wanli Wen, Yi Zhang, Chen Chen, Yunjian Jia, Lu Luo, and Lei Tang
Abstract—To achieve an efficient federated edge learning
(FEEL) system, the scheme of device scheduling and resource
allocation should jointly perceive the device availability, wireless
channel quality, and local gradient quality. The existing literature on FEEL rarely considers these three aspects simultaneously, so the proposed schemes still leave room to improve the efficiency of FEEL, which motivates our work. In this paper, by
mathematically modeling the device availability, wireless channel
quality, and gradient quality, and deriving the convergence
bound for model training in the FEEL system, we formulate a
joint device scheduling and resource allocation problem, aiming
to improve the FEEL efficiency. The formulated problem is
a challenging non-convex problem. By exploring its structural
properties and utilizing the KKT conditions, we obtain an optimal solution in closed form. The analytical results enable us to gain
some important insights into how the device availability, wireless
channel quality, and gradient quality affect device scheduling and
resource allocation in the FEEL system.
Index Terms—Federated edge learning, device availability,
channel and gradient quality, scheduling, resource allocation.
I. INTRODUCTION
With the development of machine learning (ML), there is a
growing trend to deploy ML algorithms at the wireless edge
to extract useful knowledge from massive data generated on
end-user devices such as smartphones and cars. For traditional
ML algorithms, the training data needs to be gathered at the
edge server, such as the base station or access point, for
model training. However, the devices may be reluctant to share
sensitive data with the server due to concerns about privacy
disclosure. To address this issue, federated edge learning
(FEEL) has been proposed in recent years [1]–[3]. The model
training process of FEEL is an iterative process, where an
iteration is also called a communication round. In an arbitrary
round, each device downloads a global ML model from the
server, computes an updated model/gradient based on the local
dataset, and then submits the resultant model/gradient to the
server for parameter aggregation. An improved global model is
This work is sponsored by the National Natural Science Foundation of
China under Grant 61971077, the Natural Science Foundation of Chongqing,
China under Grant cstc2021jcyj-msxmX0458 and cstc2021jcyj-msxmX0480,
and the open research fund of National Mobile Communications Research
Laboratory, Southeast University under Grant 2022D06. (Corresponding author: Yunjian Jia)
Wanli Wen is with the School of Microelectronics and Communication
Engineering, Chongqing University, Chongqing, China, and also with the
National Mobile Communications Research Laboratory, Southeast University,
Nanjing, China (wanli_wen@cqu.edu.cn).
Yi Zhang, Chen Chen, and Yunjian Jia are with the School of Microelectronics and Communication Engineering, Chongqing University, Chongqing, China (never_zy@cqu.edu.cn, c.chen@cqu.edu.cn, yunjian@cqu.edu.cn).
Lu Luo and Lei Tang are with the CSSC Haizhuang Windpower Co., Ltd.,
Chongqing, China (lu.luo@hzwindpower.com, tanglei@hzwindpower.com).
then sent back from the server to the devices for another round
of model training. Since FEEL does not expose the devices’
data during model training, it can well protect data privacy,
which has attracted widespread attention from the industry
and academia [4].
To achieve an efficient FEEL system with high training performance and low training energy consumption, device scheduling and wireless resource allocation should be carefully designed. The existing research in this direction can be roughly divided into three categories: device scheduling [5]–[7], resource allocation [8], and joint device scheduling and resource allocation [9]–[15]. Specifically, the authors in [5] analyzed the convergence of FEEL under several conventional scheduling schemes. In [6], an online device scheduling method based on multi-armed bandit theory was proposed to minimize the training latency of FEEL. The authors in [7] proposed a device scheduling scheme based on the quality of the wireless channel and the gradient. The authors in [8] investigated the trade-off between the training latency and energy consumption of FEEL. In [9]–[15], several different optimization problems of joint device scheduling and resource allocation were formulated to improve the training performance [9]–[12], [15] or to save energy [12]–[14].
However, in practical wireless networks, the design of device scheduling and resource allocation in FEEL systems faces some major challenges. For instance, some devices may temporarily leave the training process for reasons such as losing connection, making phone calls, or running low on battery, so the devices may not always be available to participate in the training process. Furthermore, some devices may suffer from poor wireless channel quality and/or poor model/gradient quality, so scheduling these devices will not only consume more energy for training but also prolong the training time. Therefore, to achieve an efficient FEEL system in practical wireless networks, the device scheduling and resource allocation scheme should jointly perceive the device availability, wireless channel quality, and local gradient quality. Nonetheless, the existing literature on FEEL rarely considers these three aspects simultaneously, so the proposed schemes still leave room to improve the efficiency of FEEL, which motivates this letter.
Our main contributions are twofold: 1) A novel device scheduling and resource allocation scheme is devised for FEEL, which simultaneously perceives the device availability, wireless channel quality, and gradient quality. 2) We obtain some important theoretical insights into how the device availability, wireless channel quality, and gradient quality affect device scheduling and resource allocation in the FEEL system.
This article has been accepted for publication in IEEE Communications Letters. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/LCOMM.2022.3194558
© 2022 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.
Authorized licensed use limited to: CHONGQING UNIVERSITY. Downloaded on August 16,2022 at 03:39:16 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. An illustration of the FEEL system with one edge server and $K = 6$ edge devices. Devices $\{1, 3, 5\}$ and $\{2, 4, 6\}$ belong to two different clusters, respectively, where devices 3, 4, and 6 are temporarily unavailable to participate in model training in round $n$.
II. SYSTEM MODEL
We consider an FEEL system composed of one edge server and $K$ devices, denoted by $\mathcal{K} \triangleq \{1, 2, \cdots, K\}$. The server and the devices are each equipped with one antenna. Let $\mathcal{D}_k \triangleq \{\xi_d\}_{d=1}^{|\mathcal{D}_k|}$ be the dataset of device $k$, where $|\mathcal{D}_k|$ denotes the cardinality of $\mathcal{D}_k$ and $\xi_d$ is the $d$-th data point in $\mathcal{D}_k$. Denote $\mathcal{D} \triangleq \bigcup_{k \in \mathcal{K}} \mathcal{D}_k$ as the whole dataset. An example of this system is depicted in Fig. 1. In the FEEL system, we aim to learn a supervised machine learning (ML) model over $\mathcal{D}$, which amounts to solving the following problem:
$$\mathbf{w}^* \triangleq \arg\min_{\mathbf{w}} L(\mathbf{w}), \qquad (1)$$
where the vector $\mathbf{w}$ is the parameter of the specific ML model and $L(\mathbf{w}) \triangleq \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| L_k(\mathbf{w})$ denotes the global loss function over $\mathcal{D}$. Here, $L_k(\mathbf{w}) \triangleq \frac{1}{|\mathcal{D}_k|} \sum_{d=1}^{|\mathcal{D}_k|} l_k(\mathbf{w}, \xi_d)$ is the local loss function over $\mathcal{D}_k$, where $l_k(\mathbf{w}, \xi_d)$ is the loss function for the data point $\xi_d \in \mathcal{D}_k$. In the context of FEEL, solving the problem in (1) consists of a series of iterations, also known as communication rounds. Denote $\mathbf{w}^n$ as the model vector after the $n$-th round, with $n = 1, 2, \cdots$, and $\mathbf{w}^0$ the initial model vector. Then, each round $n$ contains three stages: i) Gradient Calculation, ii) Gradient Submission, and iii) Gradient Aggregation. In the following, we elaborate on each of these stages.
A. Gradient Calculation

In this stage, device $k \in \mathcal{K}$ computes the gradient of $L_k(\mathbf{w})$ at $\mathbf{w} = \mathbf{w}^n$, where $\mathbf{w}^n$ is the global model vector broadcast by the server. Note that, in practice, device $k$ may not always be available to perform model training due to various reasons, e.g., losing connection to the server, making phone calls, or running low on battery. To reflect this behavior, we introduce a binary random variable $X_k \in \{0, 1\}$ to model the availability state of device $k$.¹ Specifically, $X_k = 1$ means that device $k$ is available to compute the gradient, and $X_k = 0$ otherwise. Let $\rho_k \triangleq \Pr(X_k = 1)$ represent the availability probability of device $k$ and $\boldsymbol{\rho} \triangleq (\rho_k)_{k \in \mathcal{K}}$ the corresponding probability vector. Then, based on $\rho_k$, device $k$ generates a specific availability state, denoted by $x_k^n \in \{0, 1\}$, in the $n$-th communication round. As a result, the computed gradient at device $k$ is given by $\hat{\mathbf{g}}_k^n = x_k^n \mathbf{g}_k^n$, where $\mathbf{g}_k^n \triangleq \nabla_{\mathbf{w}} L_k(\mathbf{w})|_{\mathbf{w}=\mathbf{w}^n}$, with $\nabla$ denoting the gradient operator.

¹Through modeling the device availability, the impact of the transmission of the global model vector from the edge server to the devices can be captured in our system model.
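As a minimal numerical illustration of this availability model (a sketch with made-up values; the helper name `local_gradients` and the toy gradients are ours, not the paper's), the Bernoulli draw $x_k^n$ simply masks each local gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def local_gradients(grads, rho, rng):
    """Draw availability states x_k ~ Bernoulli(rho_k) and mask each
    device's local gradient: g_hat_k = x_k * g_k."""
    x = (rng.random(len(rho)) < rho).astype(int)   # availability states x_k
    g_hat = [xk * gk for xk, gk in zip(x, grads)]
    return x, g_hat

# toy example: 4 devices with 3-dimensional gradients
grads = [np.full(3, k + 1.0) for k in range(4)]
rho = np.array([0.9, 0.5, 0.5, 0.1])               # availability probabilities rho_k
x, g_hat = local_gradients(grads, rho, rng)
```

An unavailable device thus contributes an all-zero gradient for that round, which is exactly what the aggregation rule later compensates for through $\rho_k$.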
B. Gradient Submission

In this stage, device $k$ expends a certain amount of energy to upload $\hat{\mathbf{g}}_k^n$ to the server via the wireless channel. Due to the constraints of device availability, channel quality, and gradient quality, it is important to select the appropriate devices to perform gradient submission in each communication round.

1) Device Scheduling: Let $C$ be the number of devices that are scheduled for gradient submission. To select $C$ different devices, we consider that every $C$ devices in $\mathcal{K}$ form a cluster, which generates exactly $M \triangleq \binom{K}{C}$ clusters in total, denoted by $\mathcal{M} \triangleq \{1, 2, \cdots, M\}$. Let $\mathcal{K}_m$ denote the set of $C$ devices included in cluster $m \in \mathcal{M}$. Apparently, scheduling cluster $m$ is equivalent to scheduling the $C$ devices in $\mathcal{K}_m$. Let $p_m^n$ be the scheduling probability of cluster $m$ in round $n$, where
$$0 \le p_m^n \le 1, \quad m \in \mathcal{M}, \qquad (2)$$
$$\sum_{m \in \mathcal{M}} p_m^n = 1. \qquad (3)$$
Define $\mathbf{p}^n \triangleq (p_m^n)_{m \in \mathcal{M}}$ to be the device scheduling design. Note that the scheduling design illustrated in our work differs from those proposed in the existing literature on FEEL. Here, we focus on scheduling a cluster of users instead of a single user, so scheduling one cluster is equivalent to scheduling multiple users. Later, we shall see that this scheduling design greatly facilitates the construction of an unbiased global gradient in the Gradient Aggregation phase.
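The cluster construction above can be sketched in a few lines (a toy enumeration under our reading that every $C$-subset of $\mathcal{K}$ is a cluster, which also gives each device membership in the same number of clusters):

```python
from itertools import combinations
from math import comb

K, C = 6, 3
devices = range(1, K + 1)

# every C-subset of the K devices is a cluster: M = binom(K, C) in total
clusters = [set(c) for c in combinations(devices, C)]
M = len(clusters)

# by symmetry each device belongs to binom(K-1, C-1) clusters
Pi = sum(1 for c in clusters if 1 in c)
assert M == comb(K, C) and Pi == comb(K - 1, C - 1)
print(M, Pi)   # 20 10
```

This per-device membership count is the constant that appears as $\Pi$ in the aggregation rule of Section II-C.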
2) Energy Consumption: We consider time division multiple access in this work.² In the $n$-th communication round, let $H_k^n$ and $P_k^n$ represent the channel power gain and transmission power of device $k$, respectively.³ Then, the transmission rate of device $k$ can be calculated as $R_k^n \triangleq B \log_2(1 + P_k^n H_k^n / \sigma^2)$ (in bps). Here, $B$ is the available bandwidth and $\sigma^2$ denotes the noise power. Denote by $\ell$ the number of bits required to encode the gradient. To ensure that device $k$ in cluster $m$ can successfully submit its local gradient to the server, $R_k^n$ should satisfy $R_k^n = i_k(m) x_k^n \ell / t_k^n(m)$, where $i_k(m) \in \{0, 1\}$ indicates whether device $k$ is in cluster $m$, i.e., $i_k(m) = 1$ if $k \in \mathcal{K}_m$ and $i_k(m) = 0$ otherwise. Additionally, $t_k^n(m)$ is the time allocated to device $k$ for gradient submission, which satisfies
$$0 \le t_k^n(m) \le i_k(m) x_k^n T, \quad k \in \mathcal{K},\ m \in \mathcal{M}, \qquad (4)$$
$$\sum_{k \in \mathcal{K}} t_k^n(m) = T \max_{k \in \mathcal{K}_m} \{x_k^n\}, \quad m \in \mathcal{M}. \qquad (5)$$
Here, $T$ denotes the time duration of the gradient submission stage. As a result, within the time duration $t_k^n(m)$, the transmission energy consumed by device $k$ can be calculated as
$$E_k(t_k^n(m)) \triangleq P_k^n t_k^n(m) = \frac{t_k^n(m)}{H_k^n} f\!\left(\frac{i_k(m) x_k^n \ell}{t_k^n(m)}\right),$$
where $f(x) \triangleq \sigma^2 (2^{x/B} - 1)$. The total transmission energy consumption of all devices is given by $E(\mathbf{t}^n(m)) = \sum_{k \in \mathcal{K}} E_k(t_k^n(m))$ with $\mathbf{t}^n(m) \triangleq (t_k^n(m))_{k \in \mathcal{K}}$. Since cluster $m$ is selected with probability $p_m^n$, by the law of total probability, the average total transmission energy consumption of all devices is given by
$$\bar{E}(\mathbf{p}^n, \mathbf{t}^n) \triangleq \sum_{m \in \mathcal{M}} p_m^n E(\mathbf{t}^n(m)), \qquad (6)$$
where $\mathbf{t}^n \triangleq (\mathbf{t}^n(m))_{m \in \mathcal{M}}$ denotes the resource allocation design. Let $\hat{E}$ denote the computing energy consumed by all devices in the Gradient Calculation phase. Note that $\hat{E}$ does not depend on $(\mathbf{p}^n, \mathbf{t}^n)$. Then, with (6), the total energy consumption of all devices can be expressed as
$$\tilde{E}(\mathbf{p}^n, \mathbf{t}^n) \triangleq \bar{E}(\mathbf{p}^n, \mathbf{t}^n) + \hat{E}. \qquad (7)$$

²The analysis and optimization framework proposed in this paper can easily be extended to other advanced access technologies such as orthogonal frequency division multiple access and non-orthogonal multiple access.

³The server needs to estimate the channel information by using various methods. The estimation error may cause system performance loss. However, based on Berge's Maximum Theorem, it is easy to prove that the performance loss can be arbitrarily small as long as the estimation error is small.
C. Gradient Aggregation

In this stage, the server aggregates the gradients from the scheduled devices and then generates a new global model for the next round of local training. Moreover, the aggregation of gradients shall be unbiased so as to achieve convergence of FEEL. To this end, we devise a novel gradient aggregation scheme as follows.

Gradient Aggregation Scheme: In case of scheduling cluster $m$, the server calculates the global gradient, denoted by $\hat{\mathbf{g}}_m^n$, based on
$$\hat{\mathbf{g}}_m^n = \frac{1}{|\mathcal{D}| \Pi p_m^n} \sum_{k \in \mathcal{K}} i_k(m) \frac{|\mathcal{D}_k|}{\rho_k} \hat{\mathbf{g}}_k^n, \qquad (8)$$
where $\Pi$ denotes the number of clusters that contain a given device, i.e., $\Pi \triangleq \binom{K-1}{C-1}$ under the cluster construction in Section II-B. The following result establishes the unbiasedness of $\hat{\mathbf{g}}_m^n$, which will greatly help to prove the convergence of FEEL.

Lemma 1 (Unbiasedness of $\hat{\mathbf{g}}_m^n$): $\hat{\mathbf{g}}_m^n$ is an unbiased estimate of the ground-truth global gradient $\mathbf{g}^n \triangleq \nabla_{\mathbf{w}} L(\mathbf{w}^n)$.

Proof: See Appendix A. ■

Then, based on the global gradient $\hat{\mathbf{g}}_m^n$, the ML model in round $n+1$ can be calculated as
$$\mathbf{w}^{n+1} = \mathbf{w}^n - \eta \hat{\mathbf{g}}_m^n. \qquad (9)$$
Here, $\eta > 0$ denotes the learning rate.
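Lemma 1 can be sanity-checked numerically. The sketch below (toy sizes, a uniform scheduling distribution, and the helper `aggregate_once` are all our illustrative assumptions) simulates the rule in (8) over many rounds and compares the empirical mean against the ground-truth gradient:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
K, C, d = 4, 2, 3
D = np.array([100.0, 200.0, 300.0, 400.0])      # dataset sizes |D_k|
rho = np.array([0.9, 0.6, 0.5, 0.8])            # availability probabilities
g = rng.normal(size=(K, d))                     # fixed true local gradients g_k

clusters = list(combinations(range(K), C))      # M = binom(K, C) clusters
M = len(clusters)
Pi = M * C // K                                 # binom(K-1, C-1), via M*C = K*Pi
p = np.full(M, 1.0 / M)                         # uniform scheduling for the check

def aggregate_once(rng):
    """One round: draw availabilities, sample a cluster, aggregate as in (8)."""
    x = (rng.random(K) < rho).astype(float)
    m = rng.choice(M, p=p)
    s = sum(D[k] / rho[k] * x[k] * g[k] for k in clusters[m])
    return s / (D.sum() * Pi * p[m])

est = np.mean([aggregate_once(rng) for _ in range(50_000)], axis=0)
truth = (D[:, None] * g).sum(axis=0) / D.sum()  # g^n = (1/|D|) sum_k |D_k| g_k
```

The empirical mean `est` approaches `truth` as the number of simulated rounds grows, matching the unbiasedness claim.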
D. One-Round Convergence Bound

The phases of Gradient Calculation, Gradient Submission, and Gradient Aggregation are repeated until the convergence of FEEL. Using Lemma 1, we have the following result.

Lemma 2 (One-Round Convergence Bound of FEEL): If $\nabla_{\mathbf{w}} L(\mathbf{w})$ is Lipschitz continuous with a positive modulus $\mu$, then we have
$$\mathbb{E}\left[L(\mathbf{w}^{n+1}) - L(\mathbf{w}^*)\right] \le \mathbb{E}\left[L(\mathbf{w}^n) - L(\mathbf{w}^*)\right] - \eta \|\mathbf{g}^n\|^2 + \frac{1}{2}\mu\eta^2\, \mathbb{E}\left[g(\mathbf{p}^n)\right], \qquad (10)$$
where $\mathbf{w}^*$ denotes an optimal solution of the problem in (1) and $g(\mathbf{p}^n) \triangleq \frac{1}{(|\mathcal{D}|\Pi)^2} \sum_{m \in \mathcal{M}} \frac{C}{p_m^n} \sum_{k \in \mathcal{K}_m} \left(\frac{|\mathcal{D}_k|}{\rho_k}\right)^2 \|\hat{\mathbf{g}}_k^n\|^2$.

Proof: See Appendix B. ■

In Lemma 2, $g(\mathbf{p}^n)$ is directly related to the device scheduling $\mathbf{p}^n$. In particular, a smaller $g(\mathbf{p}^n)$ (e.g., due to higher device availability $\rho_k$) leads to faster convergence of FEEL.
III. PROBLEM ESTABLISHMENT AND SOLUTION
A. Problem Establishment
Based on (7) and (10), we observe that the scheduling
and resource allocation design (pn,tn)has an impact on
both the energy consumption and the convergence of FEEL.
This observation leads to a natural question: how to design
an appropriate scheduling and resource allocation scheme
that can minimize the energy consumption of FEEL while
simultaneously accelerating its convergence? To answer this
question, we establish an optimization problem as follows.
Problem 1 (Joint Device Scheduling and Resource Allocation):
$$\min_{\mathbf{p}^n, \mathbf{t}^n} \ (1-\lambda)\tilde{E}(\mathbf{p}^n, \mathbf{t}^n) + \lambda g(\mathbf{p}^n) \quad \text{s.t.} \ (2), (3), (4), (5),$$
where $\lambda \in [0, 1]$ is a weight coefficient. Let $(\mathbf{p}^{*n}, \mathbf{t}^{*n})$ be an optimal solution of Problem 1. Note that in Problem 1, since $\hat{E}$ is independent of $(\mathbf{p}^n, \mathbf{t}^n)$, it is a constant with respect to $(\mathbf{p}^n, \mathbf{t}^n)$. So, from now on, we exclude the term $\hat{E}$ from the objective function for simplicity.
Problem 1 is a challenging non-convex problem. To solve Problem 1 optimally, we propose to decompose it into two subproblems, namely, the Resource Allocation subproblem and the Device Scheduling subproblem, by exploiting the structural properties of Problem 1. The two subproblems are specified below.
Problem 2 (Resource Allocation for Each $m \in \mathcal{M}$):
$$\mathbf{t}^{*n}(m) \triangleq \arg\min_{\mathbf{t}^n(m)} \ E(\mathbf{t}^n(m)) \quad \text{s.t.}$$
$$0 \le t_k^n(m) \le i_k(m) x_k^n T, \quad k \in \mathcal{K}, \qquad (11)$$
$$\sum_{k \in \mathcal{K}} t_k^n(m) = T \max_{k \in \mathcal{K}_m} \{x_k^n\}, \qquad (12)$$
where $\mathbf{t}^{*n}(m)$ denotes an optimal solution. Note that we have $\mathbf{t}^{*n} = (\mathbf{t}^{*n}(m))_{m \in \mathcal{M}}$.

Problem 3 (Device Scheduling for Given $\mathbf{t}^{*n}$):
$$\mathbf{p}^{*n} = \arg\min_{\mathbf{p}^n} \ (1-\lambda)\bar{E}(\mathbf{p}^n, \mathbf{t}^{*n}) + \lambda g(\mathbf{p}^n) \quad \text{s.t.} \ (2), (3).$$
The relationship between Problem 1 and Problems 2 and 3 is as follows. It is easy to verify that if $\mathbf{t}^n(m)$ and $\mathbf{p}^n$ are in the feasible sets of Problems 2 and 3, respectively, then the point $(\mathbf{p}^n, \mathbf{t}^n)$ is a feasible point of Problem 1, and vice versa. So, Problem 1 and Problems 2 and 3 have identical feasible sets. Moreover, the point $(\mathbf{p}^n, \mathbf{t}^n)$ is optimal for Problem 1 if and only if it is optimal for Problems 2 and 3. Thus, we conclude that Problem 1 and Problems 2 and 3 are equivalent. On this basis, to solve Problem 1, we only need to solve Problems 2 and 3 separately without losing any optimality, as elaborated in the sequel.
B. Problem Solution
First, we solve Problem 2. Since Problem 2 is convex, we can obtain an optimal solution using the KKT conditions, as summarized below. Note that we omit the details of the proof due to the page limitation.
Algorithm 1 The Algorithm to Solve Problem 1
1: Require: $K$, $B$, $\lambda$, $C$, $T$, $\sigma^2$, $|\mathcal{D}_k|$, $\rho_k$, $H_k^n$, $\|\hat{\mathbf{g}}_k^n\|$, and $x_k^n$.
2: Obtain $\mathbf{t}^{*n}$ by solving Problem 2 via Lemma 3.
3: Obtain $\mathbf{p}^{*n}$ by solving Problem 3 via Lemma 4.
4: Return: $(\mathbf{p}^{*n}, \mathbf{t}^{*n})$.
Lemma 3 (Optimal Solution of Problem 2): An optimal solution of Problem 2 is given by
$$t_k^{*n}(m) = \min\left\{ \max\left\{ \frac{\ell \ln 2 / B}{W_0\!\left(\frac{H_k^n \bar{\nu} - \sigma^2}{\sigma^2 e}\right) + 1},\ 0 \right\},\ i_k(m) x_k^n T \right\},$$
where $W_0(\cdot)$ denotes the principal branch of the Lambert W function and $\bar{\nu}$ satisfies $\sum_{k \in \mathcal{K}} t_k^{*n}(m) = T \max_{k \in \mathcal{K}_m} \{x_k^n\}$.
Remark 1 (Quality-Aware Resource Allocation): We can observe from Lemma 3 that the time allocation $t_k^{*n}(m)$ depends only on the channel quality $H_k^n$ and is independent of the gradient quality $\|\hat{\mathbf{g}}_k^n\|$ and the device availability $\rho_k$. In particular, $t_k^{*n}(m)$ decreases as the channel quality of the available device $k$ in cluster $m$ grows, since a higher-quality channel can support a larger transmission rate.
Next, we solve Problem 3. Similar to Problem 2, Problem 3 is also convex, and thus we have the following result.

Lemma 4 (Optimal Solution of Problem 3): An optimal solution of Problem 3 is given by
$$p_m^{*n} = \min\left\{ \max\left\{ \sqrt{\frac{\lambda v_m}{(1-\lambda) E(\mathbf{t}^{*n}(m)) + \bar{\varphi}}},\ 0 \right\},\ 1 \right\},$$
where $v_m \triangleq \frac{C}{(|\mathcal{D}|\Pi)^2} \sum_{k \in \mathcal{K}_m} \left(\frac{|\mathcal{D}_k|}{\rho_k}\right)^2 \|\hat{\mathbf{g}}_k^n\|^2$ and $\bar{\varphi}$ satisfies $\sum_{m \in \mathcal{M}} p_m^{*n} = 1$.
Remark 2 (Quality- and Availability-Aware Device Scheduling): From Lemma 4, we can observe that the device scheduling probability $p_m^{*n}$ is determined by the channel quality (captured in $E(\mathbf{t}^{*n}(m))$), the gradient quality, and the device availability (both captured in $v_m$). In particular, $p_m^{*n}$ increases as the channel quality and the gradient quality of all available devices in cluster $m$ improve. This can be explained as follows: high-quality gradients contain richer local data information that contributes to the convergence of FEEL, and high-quality channels can support the submission of more high-quality gradients. Moreover, $p_m^{*n}$ increases as the availability probability of all active devices in cluster $m$ decreases. This is because lazy devices with low availability have more fresh data, and scheduling these devices will accelerate the convergence of FEEL.
Finally, by combining Lemmas 3 and 4, we can obtain an optimal joint device scheduling and resource allocation scheme $(\mathbf{p}^{*n}, \mathbf{t}^{*n})$, as summarized in Algorithm 1. Note that Algorithm 1 is executed by the edge server in Stage 2 of FEEL. Since $(\mathbf{p}^{*n}, \mathbf{t}^{*n})$ is obtained in closed form, Algorithm 1 is very efficient and scalable and has the potential to solve large-scale problems. We believe that the edge server has sufficient processing power to execute Algorithm 1.
IV. NUMERICAL RESULTS
The simulation settings are given below: $K = 10$, $B = 10$ MHz, $\sigma^2 = 10^{-9}$ W, $T = 60$ ms, $\lambda = 1 \times 10^{-6}$, and $H_k$
Fig. 2. Impact of cluster size. (a) Test accuracy. (b) Energy consumption. Curves are shown for C = 2, 4, 6, 10.
Fig. 3. Performance comparison with baselines: (a) MNIST dataset; (b) Fashion-MNIST dataset. Curves are shown for the proposed scheme and Baselines 1–3.
is modeled as an independent, exponentially distributed random variable with mean $\hat{H} = 10^{-5}$, where $\hat{H}$ reflects the mean channel quality.
We use FEEL to train a classification model on the MNIST (built into Matlab 2021a) and Fashion-MNIST datasets, respectively. The MNIST dataset contains 10000 handwritten images of the digits 0 to 9, where each digit has 1000 images. The Fashion-MNIST dataset consists of 70000 images of fashion items from 10 categories, including "t-shirt", "bag", and so on, with 7000 images per category. The learning rate $\eta$ and the momentum weight are set to 0.001 and 0, respectively. Then, using Matlab, we obtain the size of the gradient $\hat{\mathbf{g}}_k^n$ as $\ell = 9 \times 10^5$ bits (on the MNIST dataset) or $\ell = 1 \times 10^6$ bits (on the Fashion-MNIST dataset). To ensure that the training data distribution of each device is non-IID, we first randomly assign a label to each device, then randomly select $|\mathcal{D}_k|$ images from all images under this label as the training set, and use the rest as the test set. Here, we set $|\mathcal{D}_k| = 800$ if $k$ is odd and $|\mathcal{D}_k| = 200$ otherwise. We use the test set to evaluate the classification performance of FEEL during model training. We employ a convolutional neural network (CNN) to perform the image classification task. The concrete architecture of the CNN is the same as that in [1]. All numerical results are averaged over 100 trials.
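The non-IID partition described above can be sketched as follows (a toy reproduction of the described procedure, not the authors' code; `non_iid_split` is a hypothetical helper name):

```python
import numpy as np

rng = np.random.default_rng(0)

def non_iid_split(labels, sizes, rng):
    """Each device k is assigned a randomly chosen label and draws
    sizes[k] distinct samples from that label's pool (one label per device)."""
    classes = np.unique(labels)
    splits = []
    for size in sizes:
        lab = rng.choice(classes)                 # the device's single label
        pool = np.flatnonzero(labels == lab)
        splits.append(rng.choice(pool, size=size, replace=False))
    return splits

# toy dataset: 10 classes with 1000 samples each; |D_k| alternates 800/200
labels = np.repeat(np.arange(10), 1000)
sizes = [800 if k % 2 == 0 else 200 for k in range(10)]
splits = non_iid_split(labels, sizes, rng)
```

Each device thus sees a single class, which is what makes the per-device gradient quality (and hence the scheduling decision) vary so strongly across rounds.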
Fig. 2 shows the impact of the cluster size $C$ on the accuracy and energy consumption of the FEEL system. From Fig. 2, we can see that the larger $C$ is, the higher the test accuracy, but at the cost of consuming more energy. This indicates that in practical applications, the number of scheduled devices should be adjusted appropriately to balance accuracy and energy consumption.
Fig. 3 compares the performance of FEEL under the proposed Algorithm 1 with those under three representative baselines, where $C = 2$ for the MNIST dataset and $C = 6$ for the Fashion-MNIST dataset, and $\rho_k = 0.8$ if $k$ is odd and $\rho_k = 0.2$ otherwise. Here, Baseline 1 refers to a uniform scheduling design, i.e., $p_m^n = 1/M$; Baseline 2 is a scheduling design with availability and gradient-quality awareness, i.e., $p_m^n = G_m^n / \sum_{m' \in \mathcal{M}} G_{m'}^n$, where $G_m^n \triangleq \frac{1}{C} \sum_{k \in \mathcal{K}_m} \|\hat{\mathbf{g}}_k^n\|$ represents the mean gradient norm of cluster $m$; Baseline 3
adopts a scheduling design with availability and channel-quality awareness, i.e., $p_m^n = J_m^n / \sum_{m' \in \mathcal{M}} J_{m'}^n$, where $J_m^n \triangleq \frac{1}{C} \sum_{k \in \mathcal{K}_m} x_k^n H_k^n$ denotes the mean channel power gain of cluster $m$. Note that the baselines also consider the optimization of resource allocation via Lemma 3. From Fig. 3, it is
clear that the proposed scheme is significantly better than the baselines in terms of test accuracy and energy consumption on both the MNIST and Fashion-MNIST datasets. The underlying reasons are given as follows. Baseline 1 fails to perceive the device availability and completely overlooks the wireless channel quality as well as the updated gradient quality during model training. Although Baseline 2 (Baseline 3) perceives the device availability and is conscious of the gradient quality (channel quality), the channel quality (gradient quality) is completely ignored. In contrast, as pointed out in Remark 1 and Remark 2, our proposed scheme can well adapt to changes in the device availability, wireless channel quality, and local gradient quality, so it achieves higher accuracy and lower energy consumption.
V. CONCLUSIONS
In this paper, we first mathematically model the device
availability, wireless channel quality, and gradient quality, and
derive the convergence bound of FEEL. Then, we formulate a
joint device scheduling and resource allocation problem, aiming to improve the FEEL efficiency. The formulated problem is
a challenging non-convex problem. By exploring its structural
properties and utilizing the KKT conditions, we obtain an
optimal solution in closed form. Finally, the analytical results
enable us to gain some important insights into how the device
availability, wireless channel quality, and gradient quality
affect the device scheduling and resource allocation in the
FEEL system.
APPENDIX A
PROOF OF LEMMA 1
By taking the derivative of $L(\mathbf{w})$ at $\mathbf{w} = \mathbf{w}^n$, we have $\mathbf{g}^n \triangleq \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \nabla_{\mathbf{w}} L_k(\mathbf{w})|_{\mathbf{w}=\mathbf{w}^n} = \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \mathbf{g}_k^n$. Then, by taking the expectation of $\hat{\mathbf{g}}_m^n$, we have
$$\mathbb{E}[\hat{\mathbf{g}}_m^n] = \sum_{u \in \mathcal{M}} \frac{p_u^n}{|\mathcal{D}| \Pi p_u^n} \sum_{k \in \mathcal{K}} i_k(u) |\mathcal{D}_k| \rho_k^{-1} \mathbb{E}[x_k^n] \mathbf{g}_k^n \overset{(a)}{=} \frac{1}{|\mathcal{D}| \Pi} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \mathbf{g}_k^n \sum_{u \in \mathcal{M}} i_k(u) \overset{(b)}{=} \frac{1}{|\mathcal{D}|} \sum_{k \in \mathcal{K}} |\mathcal{D}_k| \mathbf{g}_k^n,$$
where (a) follows from $\mathbb{E}[x_k^n] = \rho_k$ and (b) is due to $\sum_{u \in \mathcal{M}} i_k(u) = \Pi$ for each $k \in \mathcal{K}$. Therefore, we have $\mathbb{E}[\hat{\mathbf{g}}_m^n] = \mathbf{g}^n$, namely, $\hat{\mathbf{g}}_m^n$ is an unbiased estimate of $\mathbf{g}^n$, which completes the proof. ■
APPENDIX B
PROOF OF LEMMA 2
First, since $\nabla_{\mathbf{w}} L(\mathbf{w})$ is Lipschitz continuous with a positive modulus $\mu$, we have
$$L(\mathbf{w}^{n+1}) \le L(\mathbf{w}^n) + \left(\mathbf{w}^{n+1} - \mathbf{w}^n\right)^T \mathbf{g}^n + \frac{\mu}{2}\left\|\mathbf{w}^{n+1} - \mathbf{w}^n\right\|^2 \overset{(a)}{=} L(\mathbf{w}^n) + \left(-\eta\hat{\mathbf{g}}_m^n\right)^T \mathbf{g}^n + \frac{\mu}{2}\left\|-\eta\hat{\mathbf{g}}_m^n\right\|^2.$$
Here, $(\cdot)^T$ is the transposition operator and (a) follows from (9). Then, taking the expectation on both sides of the above inequality, we have
$$\mathbb{E}\left[L(\mathbf{w}^{n+1}) - L(\mathbf{w}^*)\right] \overset{(b)}{\le} \mathbb{E}\left[L(\mathbf{w}^n) - L(\mathbf{w}^*)\right] - \eta\|\mathbf{g}^n\|^2 + \frac{\mu\eta^2}{2}\mathbb{E}\left[\|\hat{\mathbf{g}}_m^n\|^2\right] = \mathbb{E}\left[L(\mathbf{w}^n) - L(\mathbf{w}^*)\right] - \eta\|\mathbf{g}^n\|^2 + \frac{\mu\eta^2}{2(|\mathcal{D}|\Pi)^2}\sum_{m \in \mathcal{M}}\frac{1}{p_m^n}\Delta,$$
where $\Delta \triangleq \mathbb{E}\left[\left\|\sum_{k \in \mathcal{K}_m}\frac{|\mathcal{D}_k|}{\rho_k}\hat{\mathbf{g}}_k^n\right\|^2\right]$ and (b) follows from the unbiasedness of $\hat{\mathbf{g}}_m^n$. Next, using the generalized triangle inequality of the second kind, $\|\sum_{j=1}^n \mathbf{x}_j\|^2 \le n\sum_{j=1}^n\|\mathbf{x}_j\|^2$, we have $\Delta \le C\sum_{k \in \mathcal{K}_m}\left(\frac{|\mathcal{D}_k|}{\rho_k}\right)^2\mathbb{E}\left[\|\hat{\mathbf{g}}_k^n\|^2\right]$. Finally, using the definition of $g(\mathbf{p}^n)$, we complete the proof. ■
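The "generalized triangle inequality of the second kind" used in the last step is easy to verify numerically (a quick random spot-check, not part of the proof):

```python
import numpy as np

rng = np.random.default_rng(2)

# check ||sum_j x_j||^2 <= n * sum_j ||x_j||^2 on random vectors
n, d = 5, 3
xs = rng.normal(size=(n, d))
lhs = np.linalg.norm(xs.sum(axis=0)) ** 2
rhs = n * sum(np.linalg.norm(x) ** 2 for x in xs)
```

The inequality follows from Cauchy-Schwarz (or Jensen's inequality applied to the squared norm), so `lhs <= rhs` holds for any choice of vectors.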
REFERENCES
[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-Efficient Learning of Deep Networks from Decentralized Data," in Proc. Int. Conf. Artificial Intell. Stat. (AISTATS), vol. 54, 2017, pp. 1273–1282.
[2] G. Zhu, D. Liu, Y. Du, C. You, J. Zhang, and K. Huang, “Toward
an intelligent edge: Wireless communication meets machine learning,”
IEEE Commun. Mag., vol. 58, no. 1, pp. 19–25, Jan. 2020.
[3] Y. Liu, Y. Zhu, and J. J. Yu, “Resource-constrained federated learning
with heterogeneous data: Formulation and analysis,” IEEE Trans. Netw.
Sci. Eng., pp. 1–1, 2021.
[4] J. Kang, Z. Xiong, D. Niyato, S. Xie, and J. Zhang, "Incentive mechanism for reliable federated learning: A joint optimization approach to combining reputation and contract theory," IEEE Internet Things J., vol. 6, no. 6, pp. 10700–10714, 2019.
[5] H. H. Yang, Z. Liu, T. Q. S. Quek, and H. V. Poor, “Scheduling policies
for federated learning in wireless networks,” IEEE Trans. Commun.,
vol. 68, no. 1, pp. 317–333, Jan. 2020.
[6] W. Xia, T. Q. S. Quek, K. Guo, W. Wen, H. H. Yang, and H. Zhu,
“Multi-armed bandit-based client scheduling for federated learning,”
IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7108–7123, Nov.
2020.
[7] J. Leng, Z. Lin, M. Ding, P. Wang, D. Smith, and B. Vucetic, “Client
scheduling in wireless federated learning based on channel and learning
qualities,” IEEE Wirel. Commun., pp. 1–1, 2022.
[8] Z. Yang, M. Chen, W. Saad, C. S. Hong, and M. Shikh-Bahaei, “Energy
efficient federated learning over wireless communication networks,”
IEEE Trans. Wireless Commun., vol. 20, no. 3, pp. 1935–1949, Mar.
2021.
[9] W. Shi, S. Zhou, and Z. Niu, “Device scheduling with fast convergence
for wireless federated learning,” in Proc. IEEE ICC, Jun. 2020, pp. 1–6.
[10] J. Xu and H. Wang, “Client selection and bandwidth allocation in
wireless federated learning networks: A long-term perspective,” IEEE
Trans. Wireless Commun., vol. 20, no. 2, pp. 1188–1200, Feb. 2021.
[11] M. M. Wadu, S. Samarakoon, and M. Bennis, “Joint client scheduling
and resource allocation under channel uncertainty in federated learning,”
IEEE Trans. Commun., pp. 1–1, 2021.
[12] J. Ren, Y. He, D. Wen, G. Yu, K. Huang, and D. Guo, "Scheduling for cellular federated edge learning with importance and channel awareness," IEEE Trans. Wireless Commun., vol. 19, no. 11, pp. 7690–7703, Nov. 2020.
[13] Q. Zeng, Y. Du, K. Huang, and K. K. Leung, “Energy-efficient radio
resource allocation for federated edge learning,” in Proc. IEEE ICC
Workshops, Jun. 2020, pp. 1–6.
[14] W. Wen, Z. Chen, H. H. Yang, W. Xia, and T. Q. S. Quek, "Joint scheduling and resource allocation for hierarchical federated edge learning," IEEE Trans. Wireless Commun., pp. 1–1, 2022.
[15] H.-S. Lee, “Device selection and resource allocation for layerwise
federated learning in wireless networks,” IEEE Systems Journal, pp.
1–4, 2022.