Resource Allocation in URLLC with Online Learning for Mobile Users
Jie Zhang, Chengjian Sun and Chenyang Yang
School of Electronics and Information Engineering,
Beihang University, Beijing, China
Email: {zhang15021245,sunchengjian,cyyang}@buaa.edu.cn

This work was supported by the key project of National Natural Science Foundation of China (NSFC) under Grant 61731002.
Abstract—Neural networks (NNs) have been applied to solve various problems in ultra-reliable and low-latency communications (URLLC). Given the stringent quality of service (QoS) requirements, the time for training and running NNs is not negligible, and ensuring reliability with learning-based solutions is challenging, especially in a dynamic environment. In this paper, we propose an online learning method that fine-tunes NNs trained without supervision to ensure the reliability of URLLC for mobile users. As an example, we consider a joint power and bandwidth allocation problem that minimizes the bandwidth required to satisfy the QoS of each user. A "learning-to-optimize" method with offline training is provided for comparison. Simulation results show that the proposed online learning method achieves system performance comparable to the offline training method, while the time consumed for online training and inference is about 25% of the 1 ms latency bound for the considered setup. Moreover, the online learning method adapts quickly to abrupt changes of the average packet arrival rate and can ensure reliability by setting the required overall packet loss probability slightly conservatively. By contrast, the offline training method yields much worse reliability when the arrival rate varies.
Index Terms—Online training, real-time inference, URLLC
I. INTRODUCTION
Ultra-reliable and low-latency communication (URLLC) continues to be investigated for the sixth generation (6G) mobile systems due to various technical challenges [1].
Resource allocation plays a key role in satisfying the quality of service (QoS) requirement. To use the spectrum efficiently in wireless channels, the resources should be allocated frequently to adapt to the time-varying communication environment. To facilitate real-time decisions, the "learning-to-optimize" framework proposed in [2] can be used for URLLC, where a neural network (NN), well-trained in an offline manner, learns the mapping from the environmental parameters to the optimal resource allocation.
In general, NNs cannot generalize well in dynamic environments, since training samples cannot be gathered from all scenarios, and hence need to be re-trained. For example, the NNs in [3] and [4] need to be re-trained periodically on timescales of minutes and seconds, respectively. To avoid the time used for generating labels in the offline-training phase required by the framework in [2], which is prohibitive for the functional optimization problems [5] that often appear in URLLC, unsupervised learning can be applied [4, 6, 7]. However, training the NNs still consumes computing time, which has not previously been taken into account and evaluated for URLLC.
URLLC has a stringent end-to-end (E2E) time budget, say 1 ms. Hence, the re-training time of NN-based solutions cannot be ignored, and low-complexity online training methods are urgently needed for URLLC to adapt to environmental variations. Reinforcement learning (RL) is a powerful tool for online learning and has been used to support URLLC in dynamic environments [8]. However, RL is designed to solve problems formulated as Markov decision processes (MDPs), and hence consumes unnecessary computational resources when solving the non-MDP problems that widely exist in URLLC.
In this paper, we propose an online learning method under the framework of unsupervised deep learning for non-MDP problems. We take a downlink resource allocation problem in URLLC as an example, where the transmit power and bandwidth are allocated among users according to their small-scale and large-scale channel gains, respectively. The power allocation policy is learned by an NN, which is trained online together with the optimization of the bandwidth allocation, so as to adapt to the time-varying large-scale channel gains of mobile users. An offline training method is provided for comparison. Simulation results show that, with only a few iterations for each observation of the large-scale channel gains, the online learning method achieves performance comparable to the offline method. The total time used for online training and inference is even shorter than the inference time of the offline method, and both are much shorter than the latency requirement.
II. PROBLEM FORMULATION AND EXISTING SOLUTION
Consider a downlink (DL) orthogonal frequency division multiple access system supporting URLLC, where a base station (BS) equipped with $N_t$ antennas serves $K$ single-antenna mobile users with maximal transmit power $P_{\max}$.

The small-scale channels are time-varying with coherence time $T_s$. Within the duration $T_s$, multiple frames, each with duration $T_f$, are used for DL and uplink (UL) transmission. The duration for DL transmission is $\tau$. The large-scale channels are also time-varying; they can be regarded as unchanged within a duration $T_L$ but vary among durations. The relation of the timescales is shown in Fig. 1.
Fig. 1. Timescales of channel variations and frame duration.
Since the packet size $u$ in URLLC is usually small, the bandwidth required for transmitting each packet is less than the channel coherence bandwidth, i.e., the channel is flat fading. Since the E2E delay requirement in URLLC is typically shorter than $T_s$, the channel is quasi-static and time diversity cannot be exploited. To guarantee transmission reliability, we consider frequency hopping, where each user is assigned different subchannels in adjacent frames. When the frequency interval between adjacent subchannels exceeds the coherence bandwidth, the small-scale channels of a user among frames are independent.
In URLLC, the blocklength of channel coding is finite due to the short transmission duration. To characterize the impact of decoding errors on reliability, the achievable rate in the finite blocklength regime is required. In quasi-static flat fading channels, when channel state information is available at the BS and a user (say the $k$th user), the achievable rate (in packets/frame) can be accurately approximated by [9],

$$s_k \approx \frac{\tau W_k}{u \ln 2}\left[\ln\left(1+\frac{\alpha_k g_k P_k}{N_0 W_k}\right)-\sqrt{\frac{V_k}{\tau W_k}}\, Q_G^{-1}(\varepsilon_k^c)\right] \qquad (1)$$

where $W_k$ and $P_k$ are the bandwidth and transmit power allocated to the $k$th user, $\varepsilon_k^c$ is the decoding error probability, $\alpha_k$ and $g_k$ are the large-scale and small-scale channel gains of the $k$th user, respectively, $N_0$ is the single-sided noise spectral density, $Q_G^{-1}(x)$ is the inverse of the Gaussian Q-function, and $V_k = 1-\left[1+\frac{\alpha_k g_k P_k}{N_0 W_k}\right]^{-2}$ [9]. If the signal-to-noise ratio (SNR) $\frac{\alpha_k g_k P_k}{N_0 W_k} \geq 5$ dB, the approximation $V_k \approx 1$ is accurate [10]. Since high SNR is required for URLLC, this approximation is accurate.
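To make (1) concrete, the following Python sketch (our illustration, not the authors' code; the function name and argument layout are assumptions) evaluates the achievable rate for given resources and channel gains:

```python
import numpy as np
from scipy.stats import norm

def achievable_rate(tau, W, u, alpha, g, P, N0, eps_c):
    """Achievable rate s_k (packets/frame) in the finite blocklength
    regime, Eq. (1); eps_c is the decoding error probability."""
    snr = alpha * g * P / (N0 * W)
    V = 1.0 - (1.0 + snr) ** (-2)   # channel dispersion; ~1 when SNR >= 5 dB
    q_inv = norm.isf(eps_c)         # inverse Gaussian Q-function Q_G^{-1}
    return tau * W / (u * np.log(2)) * (
        np.log1p(snr) - np.sqrt(V / (tau * W)) * q_inv)
```

At high SNR one may simply set `V = 1.0`, as noted above.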
Packets for each user arrive at the buffer of the BS randomly
and may accumulate into a queue. We consider that the packets
for different users wait in different queues.
The QoS requirements of URLLC can be characterized by a delay bound $D_{\max}$ and an overall packet loss probability $\varepsilon_{\max}$. The delays of UL transmission, backhaul and processing have been studied in the literature [11, 12]. By further removing the time used for online learning and inference, herein $D_{\max}$ is the DL delay, which consists of the queueing delay (denoted as $D_k^q$ for the $k$th user), the transmission delay $D^t$ and the decoding delay $D^c$. $D^t = T_f$ and $D^c$ is a constant [13]. All these delay components are measured in frames. Due to the random packet arrivals, $D_k^q$ is random. To ensure the delay requirement, $D_k^q$ should be bounded by $D_{\max}^q \triangleq D_{\max} - D^t - D^c$.
If the queueing delay of a packet exceeds $D_{\max}^q$, the packet becomes useless. The queueing delay violation probability of the $k$th user, denoted as $\varepsilon_k^q \triangleq \Pr\{D_k^q > D_{\max}^q\}$, can be bounded by $\varepsilon_k^q < e^{-\theta_k B_k^E D_{\max}^q} \triangleq \varepsilon_k^{q,UB}$ [14], where $\theta_k$ is the QoS exponent that satisfies $C_k^E \geq B_k^E$, $C_k^E$ is the effective capacity depending on the service process, which can be expressed as $C_k^E = -\frac{1}{\theta_k}\ln \mathbb{E}_g\{e^{-\theta_k s_k}\}$ (packets/frame) [14], and $B_k^E$ is the effective bandwidth depending on the packet arrival process. Here, $\mathbb{E}_g\{\cdot\}$ denotes the expectation over the small-scale channel gains.
The overall reliability requirement can be characterized by $1-(1-\varepsilon_k^c)(1-\varepsilon_k^q) \approx \varepsilon_k^c+\varepsilon_k^q \leq \varepsilon_{\max}$. This approximation is very accurate, because the values of $\varepsilon_k^c$ and $\varepsilon_k^q$ are very small in URLLC. The queueing delay requirement $(D_{\max}^q, \varepsilon_k^q)$ can be satisfied if $\varepsilon_k^{q,UB}$ is satisfied. Then, the overall reliability requirement can be ensured if $\varepsilon_{\max} = \varepsilon_k^c + e^{-\theta_k B_k^E D_{\max}^q}$. For simplicity, we assume $\varepsilon_k^c = \varepsilon_{\max}/2$ as in [11]. Then, the QoS exponent $\theta_k$ satisfying $(D_{\max}^q, \varepsilon_k^q)$ can be obtained from $e^{-\theta_k B_k^E D_{\max}^q} = \varepsilon_{\max}/2$, with which the QoS of URLLC can be ensured if $-\frac{1}{\theta_k}\ln \mathbb{E}_g\{e^{-\theta_k s_k}\} \geq B_k^E$. For example, when the packets of the $k$th user arrive according to a Poisson process with average arrival rate $a_k$, $B_k^E = \frac{a_k}{\theta_k}(e^{\theta_k}-1)$ (packets/frame) and $\theta_k = \ln\left[1-\frac{\ln(\varepsilon_{\max}/2)}{a_k D_{\max}^q}\right]$ [11].
To exploit multi-user diversity, we allocate transmit power to each user according to the small-scale channel gains of all users $\mathbf{g} \triangleq \{g_1, \ldots, g_K\}$. To reduce the complexity, the bandwidth is allocated among users according to their large-scale channel gains [4]. To improve the resource efficiency, we minimize the total bandwidth required to ensure the QoS by optimizing the power and bandwidth allocation,
$$\min_{P_k(\mathbf{g}),\, W_k}\;\sum_{k=1}^{K} W_k \qquad (2)$$

$$\mathrm{s.t.}\;\; -\frac{1}{\theta_k}\ln \mathbb{E}_g\{e^{-\theta_k s_k}\} \geq B_k^E \qquad (2a)$$

$$\sum_{k=1}^{K} P_k \leq P_{\max},\quad P_k \geq 0,\quad W_k \geq 0 \qquad (2b)$$
Problem (2) involves two timescales and is a functional optimization problem. Besides, the constraint in (2a) has no closed-form expression. To solve such a challenging problem, an unsupervised deep learning method was proposed in [4] to find $P_k(\mathbf{g})$ and $W_k$, $k = 1, \cdots, K$, for every given value of $\boldsymbol{\alpha} \triangleq \{\alpha_1, \ldots, \alpha_K\}$. In particular, problem (2) is first transformed into its primal-dual problem as follows,

$$\max_{h(\mathbf{g}),\, \lambda_k}\; \min_{P_k(\mathbf{g}),\, W_k}\; \sum_{k=1}^{K}\left[W_k + \lambda_k\left(\mathbb{E}_g\{e^{-\theta_k s_k}\} - e^{-\theta_k B_k^E}\right)\right] + \int_{\mathbb{R}_+^K} h(\mathbf{g})\left(\sum_{k=1}^{K} P_k(\mathbf{g}) - P_{\max}\right) d\mathbf{g} \qquad (3)$$

$$\mathrm{s.t.}\;\; \sum_{k=1}^{K} P_k(\mathbf{g}) \leq P_{\max} \qquad (3a)$$

$$P_k(\mathbf{g}) \geq 0,\quad W_k \geq 0,\quad h(\mathbf{g}) \geq 0,\quad \lambda_k \geq 0 \qquad (3b)$$
where the objective is the Lagrangian function of problem (2), and $h(\mathbf{g})$ and $\lambda_k$ are the Lagrangian multipliers. Then, $P_k(\mathbf{g})/P_{\max}$ is parameterized as an NN with model parameters $\omega$, denoted as $N(\mathbf{g};\omega)$. Since the required bandwidth decreases when more power is allocated, the equality in (3a) holds. Thereby, by applying the Softmax function in the output layer, $N(\mathbf{g};\omega)$ can satisfy the maximal transmit power constraint. $W_k$ and the parameterized form of $P_k(\mathbf{g})$ are optimized from,

$$\max_{\lambda_k}\; \min_{\omega,\, W_k}\; L \triangleq \sum_{k=1}^{K}\left[W_k + \lambda_k\left(\mathbb{E}_g\{e^{-\theta_k s_k}\} - e^{-\theta_k B_k^E}\right)\right] \qquad (4)$$

$$\mathrm{s.t.}\;\; W_k \geq 0,\quad \lambda_k \geq 0 \qquad (4a)$$

By taking the objective function in (4) as the loss function, the optimal power and bandwidth allocation can be found with stochastic gradient methods, as shown in [4].
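As an illustration of this parameterization, the sketch below (ours; PyTorch, the layer sizes, and the helper names are assumptions) builds $N(\mathbf{g};\omega)$ with a Softmax output and evaluates the loss in (4), with the expectation over $\mathbf{g}$ replaced by a batch average:

```python
import torch
import torch.nn as nn

K = 10  # number of users

# N(g; omega): maps small-scale gains to power shares summing to one,
# so P_k = P_max * N(g; omega)_k satisfies the total power constraint.
policy = nn.Sequential(
    nn.Linear(K, 20), nn.ReLU(),
    nn.Linear(20, 20), nn.ReLU(),
    nn.Linear(20, K), nn.Softmax(dim=-1),
)

def loss_fn(g, W, lam, theta, B_E, P_max, rate_fn):
    """Empirical version of the objective in (4); rate_fn is assumed to
    be a differentiable implementation of Eq. (1)."""
    P = P_max * policy(g)                      # (N_b, K) power allocation
    s = rate_fn(P, W, g)                       # (N_b, K) achievable rates
    e_cap = torch.exp(-theta * s).mean(dim=0)  # estimates E_g{e^{-theta s}}
    return (W + lam * (e_cap - torch.exp(-theta * B_E))).sum()
```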
III. LEARNING RESOURCE ALLOCATION IN AN OFFLINE AND ONLINE MANNER
When the large-scale channel gains change, $N(\mathbf{g};\omega)$ needs to be re-trained and the bandwidth allocation needs to be re-optimized. To avoid the re-training, one can extend the idea in [2] to learn $\mathbf{P}(\mathbf{g},\boldsymbol{\alpha}) = [P_1(\mathbf{g},\boldsymbol{\alpha}), \cdots, P_K(\mathbf{g},\boldsymbol{\alpha})]$ and $\mathbf{W}(\boldsymbol{\alpha}) = [W_1(\boldsymbol{\alpha}), \cdots, W_K(\boldsymbol{\alpha})]$ with offline-trained NNs. Alternatively, we can learn $\mathbf{P}(\mathbf{g}) = [P_1(\mathbf{g}), \cdots, P_K(\mathbf{g})]$ with an online-trained NN and optimize $\mathbf{W} = [W_1, \cdots, W_K]$ by tracking $\boldsymbol{\alpha}$, sampled with period $T_L$, in an online manner.
A. Learning to Optimize Problem (2) with Offline Training
According to the proof in [7], problem (2) can be equivalently transformed into the following form,

$$\max_{h(\mathbf{g},\boldsymbol{\alpha}),\, \lambda_k(\boldsymbol{\alpha})}\; \min_{P_k(\mathbf{g},\boldsymbol{\alpha}),\, W_k(\boldsymbol{\alpha})}\; \bar{L} \triangleq \mathbb{E}_{\boldsymbol{\alpha}}\Bigg\{\sum_{k=1}^{K}\left[W_k(\boldsymbol{\alpha}) + \lambda_k(\boldsymbol{\alpha})\left(\mathbb{E}_g\{e^{-\theta_k \hat{s}_k}\} - e^{-\theta_k B_k^E}\right)\right] + \int_{\mathbb{R}_+^K} h(\mathbf{g},\boldsymbol{\alpha})\left(\sum_{k=1}^{K} P_k(\mathbf{g},\boldsymbol{\alpha}) - P_{\max}\right) d\mathbf{g}\Bigg\}$$

$$\mathrm{s.t.}\;\; \text{(3a)},\quad P_k(\mathbf{g},\boldsymbol{\alpha}) \geq 0,\quad W_k(\boldsymbol{\alpha}) \geq 0,\quad h(\mathbf{g},\boldsymbol{\alpha}) \geq 0,\quad \lambda_k(\boldsymbol{\alpha}) \geq 0$$

where $\mathbb{E}_{\boldsymbol{\alpha}}\{\cdot\}$ denotes the expectation over the large-scale channel gains.
We approximate $P_k(\mathbf{g},\boldsymbol{\alpha})/P_{\max}$, $W_k(\boldsymbol{\alpha})$ and $\lambda_k(\boldsymbol{\alpha})$, $k = 1, \cdots, K$, with NNs denoted as $N_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$, $N_W(\boldsymbol{\alpha};\omega_W)$ and $N_\lambda(\boldsymbol{\alpha};\omega_\lambda)$, respectively. To ensure positive bandwidths and Lagrangian multipliers, we choose SoftPlus as the output layer of $N_W(\boldsymbol{\alpha};\omega_W)$ and $N_\lambda(\boldsymbol{\alpha};\omega_\lambda)$. To satisfy the maximal transmit power constraint, we choose SoftMax as the output layer of $N_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$. Then, by taking the Lagrangian function as the loss function, we can train $\omega_P$ and $\omega_W$ by stochastic gradient descent (SGD) and train $\omega_\lambda$ by stochastic gradient ascent (SGA) as follows,

$$\omega_P(t+1) = \omega_P(t) - \phi_P(t)\nabla_{\omega_P}\hat{L}(t) = \omega_P(t) - \phi_P(t)\, P_{\max} \nabla_{\omega_P} N_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)\, \nabla_P \hat{L}(t)$$

$$\omega_W(t+1) = \omega_W(t) - \phi_W(t)\nabla_{\omega_W}\hat{L}(t) = \omega_W(t) - \phi_W(t)\, \nabla_{\omega_W} N_W(\boldsymbol{\alpha};\omega_W)\, \nabla_W \hat{L}(t)$$

$$\omega_\lambda(t+1) = \omega_\lambda(t) + \phi_\lambda(t)\nabla_{\omega_\lambda}\hat{L}(t) = \omega_\lambda(t) + \phi_\lambda(t)\, \nabla_{\omega_\lambda} N_\lambda(\boldsymbol{\alpha};\omega_\lambda)\, \nabla_\lambda \hat{L}(t)$$
where $\hat{L}(t) \triangleq \frac{1}{N_b}\sum_{n=1}^{N_b}\sum_{k=1}^{K}\left[W_k + \lambda_k\left(e^{-\theta_k \hat{s}_{k,n}^{(t)}} - e^{-\theta_k B_k^E}\right)\right]$, $\hat{s}_{k,n}^{(t)}$ is computed by (1) with a realization of the large-scale channel gain and a realization of the small-scale channel gain, and $N_b$ is the batch size in each iteration. The gradient matrices $\nabla_{\omega_W} N_W(\boldsymbol{\alpha};\omega_W)$, $\nabla_{\omega_P} N_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$ and $\nabla_{\omega_\lambda} N_\lambda(\boldsymbol{\alpha};\omega_\lambda)$ can be computed through backward propagation, and $\nabla_W \hat{L}(t)$, $\nabla_P \hat{L}(t)$ and $\nabla_\lambda \hat{L}(t)$ can be computed as
$$\nabla_W \hat{L}(t) = \left\{1 - \frac{1}{N_b}\sum_{n=1}^{N_b} \lambda_{k,n}^{t}\, \theta_k \frac{\partial \hat{s}_{k,n}^{(t)}}{\partial W_k}\, e^{-\theta_k \hat{s}_{k,n}^{(t)}},\quad k = 1, \cdots, K\right\}$$

$$\nabla_P \hat{L}(t) = \left\{-\frac{1}{N_b}\sum_{n=1}^{N_b} \lambda_{k,n}^{t}\, \theta_k \frac{\partial \hat{s}_{k,n}^{(t)}}{\partial P_k}\, e^{-\theta_k \hat{s}_{k,n}^{(t)}},\quad k = 1, \cdots, K\right\}$$

$$\nabla_\lambda \hat{L}(t) = \left\{\frac{1}{N_b}\sum_{n=1}^{N_b} \left(e^{-\theta_k \hat{s}_{k,n}^{(t)}} - e^{-\theta_k B_k^E}\right),\quad k = 1, \cdots, K\right\}$$
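A compact way to realize these SGD/SGA updates in practice is to backpropagate the empirical Lagrangian once and flip the sign of the gradients of $\omega_\lambda$. A sketch (ours; the network definitions and `L_hat_fn` are assumptions, with `L_hat_fn` an empirical estimate of $\bar{L}$ as defined above):

```python
import torch
import torch.nn as nn

K = 10

def mlp(n_in, n_out, out_act):
    return nn.Sequential(nn.Linear(n_in, 40), nn.ReLU(),
                         nn.Linear(40, 40), nn.ReLU(),
                         nn.Linear(40, n_out), out_act)

net_P = mlp(2 * K, K, nn.Softmax(dim=-1))  # P(g, alpha) / P_max
net_W = mlp(K, K, nn.Softplus())           # W(alpha) >= 0
net_lam = mlp(K, K, nn.Softplus())         # lambda(alpha) >= 0

opt_pri = torch.optim.SGD([*net_P.parameters(), *net_W.parameters()], lr=1e-3)
opt_dual = torch.optim.SGD(net_lam.parameters(), lr=1e-3)

def train_step(alpha, g, L_hat_fn):
    """One iteration: SGD on (omega_P, omega_W), SGA on omega_lambda."""
    P = net_P(torch.cat([g, alpha], dim=-1))
    L_hat = L_hat_fn(P, net_W(alpha), net_lam(alpha), alpha, g)
    opt_pri.zero_grad(); opt_dual.zero_grad()
    L_hat.backward()
    opt_pri.step()                 # descent on the primal parameters
    for p in net_lam.parameters():
        p.grad.neg_()              # sign flip turns the SGD step into ascent
    opt_dual.step()
```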
B. Learning to Optimize (2) by Online Training and Tracking
When the large-scale channel gains change slightly, it is unnecessary to retrain the NN from scratch, since the model parameters only need fine-tuning.
To learn $P_k(\mathbf{g})$ and $W_k$ in an online manner, we can use a stochastic gradient method to solve problem (4). In particular, for the $l$th sampled value of $\boldsymbol{\alpha}$, we fine-tune the model parameters $\omega$ (i.e., train $N(\mathbf{g};\omega)$) and update $W_k$ and $\lambda_k$ over $N$ iterations in the $l$th round as follows,

$$\omega^l(t+1) = \omega^l(t) - \phi(t)\nabla_\omega \hat{L}(t) = \omega^l(t) - \phi(t)\, P_{\max} \nabla_\omega N(\mathbf{g};\omega)\, \nabla_P \hat{L}(t)$$

$$W_k^l(t+1) = \left[W_k^l(t) - \phi(t)\frac{\partial \hat{L}(t)}{\partial W_k}\right]^+$$

$$\lambda_k^l(t+1) = \left[\lambda_k^l(t) + \phi(t)\frac{\partial \hat{L}(t)}{\partial \lambda_k}\right]^+$$

where $[x]^+ \triangleq \max\{x, 0\}$ ensures non-negative values, $\hat{L}(t)$ has the same form as defined in Section III-A but herein $\hat{s}_{k,n}^{(t)}$ is the achievable rate computed with the $n$th realization of $g_k$ given the $l$th sampled value of $\boldsymbol{\alpha}$, and $N_b$ is the batch size in each iteration. When the $(l+1)$th sampled value of $\boldsymbol{\alpha}$ is observed, the $(l+1)$th round of online training is initialized as

$$\omega^{l+1}(1) = \omega^l(N),\quad \lambda_k^{l+1}(1) = \lambda_k^l(N),\quad W_k^{l+1}(1) = W_k^l(N)$$

For the first round, the values of $\omega$, $\lambda_k$ and $W_k$ can be set via pre-training.
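The warm-started online round then reduces to a short loop. A sketch (ours, under the same assumptions as above): `W` and `lam` are tensors carried over between rounds, `sample_g` draws $N_b$ realizations of $\mathbf{g}$, and `L_hat_fn` is the empirical loss of problem (4):

```python
import torch

def online_round(net, W, lam, sample_g, L_hat_fn, N=11, phi=1e-3):
    """lth round: fine-tune N(g; omega) and update (W, lambda) for N
    iterations, starting from the previous round's values (warm start)."""
    opt = torch.optim.SGD(net.parameters(), lr=phi)
    for _ in range(N):
        g = sample_g()                       # N_b realizations of g
        L_hat = L_hat_fn(net(g), W, lam, g)
        opt.zero_grad()
        for v in (W, lam):
            if v.grad is not None:
                v.grad.zero_()
        L_hat.backward()
        opt.step()                           # SGD step on omega
        with torch.no_grad():
            W -= phi * W.grad                # descent step on W ...
            W.clamp_(min=0.0)                # ... with [x]^+ projection
            lam += phi * lam.grad            # ascent step on lambda ...
            lam.clamp_(min=0.0)              # ... with [x]^+ projection
    return W, lam                            # warm start for round l+1
```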
C. Procedure and Online Computational Complexity
In the two-timescale policy, the bandwidth allocation is updated with period $T_L$ when $\boldsymbol{\alpha}$ changes, and the power allocation is updated in each frame $T_f$ when $\mathbf{g}$ changes.

With the offline training method, the functions $P_k(\mathbf{g},\boldsymbol{\alpha})$ and $W_k(\boldsymbol{\alpha})$ are obtained once the NNs have been trained with random realizations of $\mathbf{g}$ and $\boldsymbol{\alpha}$. We only consider the time used to obtain one value of $\mathbf{P}(\mathbf{g},\boldsymbol{\alpha})$ and $\mathbf{W}(\boldsymbol{\alpha})$ via forward propagation of $N_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$ and $N_W(\boldsymbol{\alpha};\omega_W)$, respectively denoted as $t_P^{\mathrm{off}}$ and $t_W^{\mathrm{off}}$. The fraction of time consumed for inference per unit time is

$$\eta_{\mathrm{off}} = \frac{\frac{T_L}{T_f} t_P^{\mathrm{off}} + t_W^{\mathrm{off}}}{T_L} \times 100\% \qquad (5)$$
With online learning, $P_k(\mathbf{g})$ and $W_k$ are obtained for each sampled value of $\boldsymbol{\alpha}$ with $N$ iterations. Denote the time used by each iteration as $t^{\mathrm{on}}$, and the time used for forward propagation of $N(\mathbf{g};\omega)$ to obtain one value of $\mathbf{P}(\mathbf{g})$ as $t_P^{\mathrm{on}}$. Then, the fraction of time consumed for online training and resource allocation per unit time is

$$\eta_{\mathrm{on}} = \frac{\frac{T_L}{T_f} t_P^{\mathrm{on}} + N t^{\mathrm{on}}}{T_L} \times 100\% \qquad (6)$$
IV. SIMULATION RESULTS
In this section, we compare the performance of the offline training method in Section III-A (called Method-B) and the online learning method in Section III-B (called Method-C). Since the optimal solution of problem (2) is unavailable, we use the method in [4] as a baseline (called Method-A).
Fig. 2. Simulation setup: cell radius $D_1$ = 250 m, and $D_2$ = 50 m.
We consider $K = 10$ users, which are equally spaced on the segment OA of a road when they start to request the URLLC service, as shown in Fig. 2. $P_{\max}$ = 36 dBm, $N_t$ = 32, and $N_0$ = $-173$ dBm/Hz. The path loss model is $35.3 + 37.6\lg(d_k)$ dB, where $d_k$ is the distance between the $k$th user and the BS, and the shadowing is zero-mean log-normal with 8 dB standard deviation and 50 m correlation distance. The small-scale channels are subject to Rayleigh fading. To show the impact of the speed of environmental change on online learning, we consider four groups of users with different velocities, where the users in each group have the same velocity. $M$ = 100 test locations (and hence large-scale channel gains) are uniformly selected on the road for evaluation. The remaining simulation parameters are shown in Table I. This simulation setup is used unless otherwise specified.

The fine-tuned hyper-parameters of the NNs are as follows. All NNs have four hidden layers. The numbers of nodes per hidden layer for Method-B and Method-C are 40 and 20, respectively. For offline training, each sample includes a large-scale channel gain (generated from a random location in the cell with the path loss model and shadowing) and a small-scale channel gain (generated according to the Rayleigh distribution). For online learning and for testing the offline training method, the large-scale and small-scale channel gains are generated according to the simulation setup.
TABLE I
SIMULATION PARAMETERS

Overall packet loss probability $\varepsilon_{\max}$:           $10^{-5}$
Frame duration $T_f$ and DL transmission time $\tau$:            0.1 ms and 0.05 ms
DL delay bound $D_{\max}$:                                       10 frames (1 ms)
Transmission delay $D^t$ and decoding delay $D^c$:               1 frame each (i.e., 0.1 ms)
Packet size $u$:                                                 20 bytes (160 bits)
Average packet arrival rate $a$:                                 0.2 packets/frame
Sampling period of large-scale channel gain $T_L$:               0.1 s
To evaluate the performance of the online learning and offline training methods in terms of minimizing the bandwidth and satisfying the constraints, we use the average relative error of the Lagrangian function as a metric, $\varepsilon_L^{AX} = \frac{1}{M}\sum_{m=1}^{M}\frac{|\hat{L}_X(m) - \hat{L}_A(m)|}{\hat{L}_A(m)}$, $X = B$ or $C$, where $\hat{L}_A(m)$ and $\hat{L}_X(m)$ are the Lagrangian functions of Method-A and Method-X obtained at the $m$th test location, respectively. For a fair comparison, we use the Lagrangian multipliers of Method-A to compute the Lagrangian functions of the other two methods, such that all methods achieve the same performance in terms of satisfying the constraints.
In Fig. 3, we provide the performance of the online learning method versus the number of iterations $N$. The results are averaged over 50 rounds of independent training. Only a few iterations are required for online learning to achieve the same performance as offline training, even in high-speed scenarios (e.g., $N = 5$ for 40 km/h, and $N = 11$ for 80 km/h).
Fig. 3. Performance of online learning versus $N$.
To evaluate the accuracy of the learning methods, we define the average relative error of the learned bandwidth as $\varepsilon_W^{AX} = \frac{1}{KM}\sum_{m=1}^{M}\sum_{k=1}^{K}\frac{|W_k^X(m) - W_k^A(m)|}{W_k^A(m)}$, $X = B$ or $C$, where $W_k^A(m)$ and $W_k^X(m)$ are the bandwidths allocated to the $k$th user by Method-A and Method-X at the $m$th test location, respectively. The average relative error of the learned power allocation is defined in the same way. The results are provided in Table II.
We can find that the relative errors are small, though the error of the learned power allocation is larger. For a given number of iterations, the accuracy of online training decreases as the velocity increases.
TABLE II
RELATIVE ERRORS OF THE LEARNED BANDWIDTH AND POWER

                     Method-B              Method-C (N = 11)
                     Bandwidth   Power     Bandwidth   Power
v = 10 km/h          0.91%       3.66%     0.27%       1.45%
v = 20 km/h          0.91%       3.66%     0.38%       2.02%
v = 40 km/h          0.91%       3.66%     0.57%       2.74%
v = 80 km/h          0.91%       3.66%     0.81%       3.86%
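The relative-error metrics above reduce to an elementwise average; a trivial sketch (ours, with an assumed function name):

```python
import numpy as np

def avg_relative_error(x_method, x_baseline):
    """Mean of |X - A| / A over all entries, e.g. W_k^X(m) vs. W_k^A(m)
    for k = 1..K and m = 1..M, with Method-A as the baseline."""
    x_method, x_baseline = np.asarray(x_method), np.asarray(x_baseline)
    return np.mean(np.abs(x_method - x_baseline) / x_baseline)
```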
It is noteworthy that, to improve the generalization ability, the offline training method can also be designed to further learn the mapping from other environmental parameters to the optimal solution. Nonetheless, in practice there are always parameters that cannot be learned in this way. To demonstrate what happens in a dynamic environment, we simulate a simple scenario, where the average packet arrival rate $a$ is 0.2 packets/frame from 0 s to 5 s and changes to 0.3 packets/frame from 5 s to 10 s, and the velocities of all users are set to 40 km/h. The overall packet loss probabilities of the worst user achieved by the different learning methods are provided in Fig. 4, where the large-scale channel gain only includes the path loss in this figure, to emphasize the impact of the time-varying arrival rate. The offline training method yields much worse reliability after $a$ changes, indicating poor generalization to the packet arrival rate. Although the online learning method also cannot ensure $\varepsilon_{\max} = 10^{-5}$, due to the errors in estimating $\hat{L}(t)$ and the stochastic gradient method, its overall packet loss probability is always less than $2\varepsilon_{\max}$ and is robust to the abrupt change of $a$. This suggests that the reliability of the online learning method can be ensured by setting $\varepsilon_{\max}$ slightly conservatively during training, say as $0.5 \times 10^{-5}$.
Fig. 4. Achieved overall reliability with time-varying $\boldsymbol{\alpha}$ and $a$, $N = 5$.
In Table III, we compare the computational complexity of the offline and online training methods, where $t_P^{\mathrm{off}}$, $t_W^{\mathrm{off}}$, $t_P^{\mathrm{on}}$ and $t^{\mathrm{on}}$ are measured on an Intel® Core™ i7-8700K CPU @ 3.70 GHz. Recall from Table I that the sampling period of the large-scale channel gain is $T_L$ = 100 ms, so $T_L/T_f$ = 1000. Then $\eta_{\mathrm{on}}$ = 25.95% means that about 26 ms is used for online training $N(\mathbf{g};\omega)$, obtaining one value of $\mathbf{W}$, and obtaining 1000 values of $\mathbf{P}$. Recalling the definition of $\eta_{\mathrm{on}}$, this means that on average 0.26 ms is used in every 1 ms (i.e., the time budget for learning and inference is about 1/4 of the E2E delay bound). For Method-B, on average 0.28 ms is used for inference in every 1 ms, without considering the time for offline training.
TABLE III
COMPUTATIONAL COMPLEXITY, ALL USERS MOVE AT 80 KM/H

Offline training method:             $t_P^{\mathrm{off}}$ = 0.028 ms   $t_W^{\mathrm{off}}$ = 0.026 ms   $\eta_{\mathrm{off}}$ = 28.03%
Online training method (N = 11):     $t_P^{\mathrm{on}}$ = 0.025 ms    $t^{\mathrm{on}}$ = 0.086 ms      $\eta_{\mathrm{on}}$ = 25.95%
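As a sanity check, plugging the measured times into (5) and (6), with $T_L$ = 100 ms and $T_f$ = 0.1 ms from Table I, reproduces the reported fractions:

```python
T_L, T_f = 100.0, 0.1                  # ms, from Table I
t_P_off, t_W_off = 0.028, 0.026        # ms, from Table III
t_P_on, t_on, N = 0.025, 0.086, 11     # ms, from Table III

eta_off = ((T_L / T_f) * t_P_off + t_W_off) / T_L * 100   # -> 28.03 %
eta_on = ((T_L / T_f) * t_P_on + N * t_on) / T_L * 100    # -> 25.95 %
```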
V. CONCLUSION
In this paper, we jointly optimized the bandwidth and power allocation in URLLC with deep learning, where the NNs are trained either online or offline. Simulation results showed that the proposed online learning method can achieve the system performance of offline training with a few steps of online updating and training, and is robust to changes of the environmental parameters. Evaluated on a regular computer, the time consumed for online training the NN and optimizing the resource allocation is on average one fourth of the E2E delay bound. By setting the overall packet loss probability slightly conservatively, the reliability can be guaranteed by the online learning method. In conclusion, our results suggest that online deep learning can be applied for real-time resource allocation in dynamic environments with low online computational complexity.
REFERENCES
[1] C. She, R. Dong, Z. Gu, Z. Hou, Y. Li, W. Hardjawana, C. Yang, L. Song, and B. Vucetic, "Deep learning for ultra-reliable and low-latency communications in 6G networks," IEEE Netw., vol. 34, no. 5, pp. 219–225, 2020.
[2] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos, "Learning to optimize: Training deep neural networks for interference management," IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5438–5453, Oct. 2018.
[3] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, "Deep learning coordinated beamforming for highly-mobile millimeter wave systems," IEEE Access, vol. 6, pp. 37328–37348, May 2018.
[4] C. Sun and C. Yang, "Unsupervised deep learning for ultra-reliable and low-latency communications," in Proc. IEEE GLOBECOM, 2019.
[5] E. Zeidler, Nonlinear Functional Analysis and Its Applications III: Variational Methods and Optimization. Springer Science & Business Media, 2013.
[6] M. Eisen, C. Zhang, L. F. Chamon, D. D. Lee, and A. Ribeiro, "Learning optimal resource allocations in wireless systems," IEEE Trans. Signal Process., vol. 67, no. 10, pp. 2775–2790, May 2019.
[7] C. Sun and C. Yang, "Learning to optimize with unsupervised learning: Training deep neural networks for URLLC," in Proc. IEEE PIMRC, 2019.
[8] A. T. Z. Kasgari, W. Saad, M. Mozaffari, and H. V. Poor, "Experienced deep reinforcement learning with generative adversarial networks (GANs) for model-free ultra reliable low latency communication," IEEE Trans. Commun., 2020.
[9] W. Yang, G. Durisi, T. Koch, and Y. Polyanskiy, "Quasi-static multiple-antenna fading channels at finite blocklength," IEEE Trans. Inf. Theory, vol. 60, no. 7, pp. 4232–4264, Jul. 2014.
[10] S. Schiessl, J. Gross, and H. Al-Zubaidy, "Delay analysis for wireless fading channels with finite blocklength channel coding," in Proc. ACM MSWiM, 2015.
[11] C. She, C. Yang, and T. Q. S. Quek, "Joint uplink and downlink resource configuration for ultra-reliable and low-latency communications," IEEE Trans. Commun., vol. 66, no. 5, pp. 2266–2280, May 2018.
[12] B. Makki, T. Svensson, G. Caire, and M. Zorzi, "Fast HARQ over finite blocklength codes: A technique for low-latency reliable communication," IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 194–209, Jan. 2019.
[13] M. Condoluci, T. Mahmoodi, E. Steinbach, and M. Dohler, "Soft resource reservation for low-delayed teleoperation over mobile networks," IEEE Access, vol. 5, pp. 10445–10455, May 2017.
[14] J. Tang and X. Zhang, "Quality-of-service driven power and rate adaptation over wireless links," IEEE Trans. Wireless Commun., vol. 6, no. 8, pp. 3058–3068, 2007.