Resource Allocation in URLLC with Online
Learning for Mobile Users
Jie Zhang, Chengjian Sun and Chenyang Yang
School of Electronics and Information Engineering,
Beihang University, Beijing, China
Email: {zhang15021245,sunchengjian,cyyang}@buaa.edu.cn
Abstract—Neural networks (NNs) have been applied to solve various problems in ultra-reliable and low-latency communications (URLLC). Given the stringent quality-of-service (QoS) requirement, the time for training and running NNs is not negligible, and ensuring reliability with learning-based solutions is challenging, especially in a dynamic environment. In this paper, we propose an online learning method that fine-tunes NNs trained without supervision to ensure the reliability of URLLC for mobile users. As an example, we consider a joint power and bandwidth allocation problem that minimizes the bandwidth required to satisfy the QoS of each user. A “learning-to-optimize” method with offline training is provided for comparison. Simulation results show that the proposed online learning method achieves system performance comparable to the offline training method, while the time consumed for online training and inference is about 25% of the 1 ms latency bound in the considered setup. Moreover, the online learning method adapts quickly to abrupt changes of the average packet arrival rate and can ensure reliability by setting the required overall packet loss probability slightly conservatively. By contrast, the offline training method yields much worse reliability when the arrival rate varies.
Index Terms—Online training, real-time inference, URLLC
I. INTRODUCTION
Ultra-reliable and low-latency communication (URLLC) continues to be investigated in the sixth-generation mobile systems due to various technical challenges [1].
Resource allocation plays a key role in satisfying the quality of service (QoS) requirement. To use spectrum efficiently in wireless channels, the resources should be allocated frequently to adapt to the time-varying communication environment. To facilitate real-time decisions, the “learning-to-optimize” framework proposed in [2] can be used for URLLC, where a neural network (NN) well-trained in an offline manner learns the mapping from the environmental parameters to the optimal resource allocation.
In general, NNs do not generalize well in dynamic environments, since training samples cannot be gathered from all scenarios, and hence they need to be re-trained. For example, the NNs in [3] and [4] need to be re-trained periodically on timescales of minutes and seconds, respectively. To avoid the time used for generating labels in the offline-training phase required by the framework in [2], which is prohibitive for the functional optimization problems [5] that often appear in URLLC, unsupervised learning can be applied [4, 6, 7]. However, training the NNs still consumes computing time, which has not yet been taken into account and evaluated for URLLC.
This work was supported by the key project of the National Natural Science Foundation of China (NSFC) under Grant 61731002.
URLLC has a stringent end-to-end (E2E) time budget, say 1 ms. Hence, the re-training time of NN-based solutions cannot be ignored, and low-complexity online training methods are urgently needed for URLLC to adapt to environmental variations. Reinforcement learning (RL) is a powerful tool for online learning, and has been used to support URLLC in dynamic environments [8]. However, RL is designed to solve problems formulated as Markov decision processes (MDPs), and hence consumes unnecessary computational resources when solving the non-MDP problems widely existing in URLLC.
In this paper, we propose an online learning method under the framework of unsupervised deep learning for non-MDP problems. We take a downlink resource allocation problem in URLLC as an example, where the transmit power and bandwidth are allocated among users according to their small-scale and large-scale channel gains, respectively. The power allocation policy is learned by an NN, which is trained online together with the optimization of the bandwidth allocation, so as to adapt to the time-varying large-scale channel gains of mobile users; an offline training method is provided for comparison. Simulation results show that, with only a few iterations for each observation of the large-scale channel gains, the online learning method achieves performance comparable to the offline method. The total time used for online training and inference is even shorter than the inference time of the offline method, and both are much shorter than the latency requirement.
II. PROBLEM FORMULATION AND EXISTING SOLUTION
Consider a downlink (DL) orthogonal frequency division multiple access system supporting URLLC, where a base station (BS) equipped with $N_t$ antennas serves $K$ single-antenna mobile users with maximal transmit power $P_{\max}$.
The small-scale channels are time-varying with coherence time $T_s$. Within the duration $T_s$, multiple frames, each with duration $T_f$, are used for DL and uplink (UL) transmission. The duration for DL transmission is $\tau$. The large-scale channels are also time-varying; they can be regarded as unchanged within a duration $T_L$ but vary among durations. The relation of the timescales is shown in Fig. 1.
Fig. 1. Timescales of channel variations and frame duration.
Since the packet size $u$ in URLLC is usually small, the bandwidth required for transmitting each packet is less than the channel coherence bandwidth, i.e., the channel is flat fading. Since the E2E delay requirement in URLLC is typically shorter than $T_s$, the channel is quasi-static and time diversity cannot be exploited. To guarantee transmission reliability, we consider frequency hopping, where each user is assigned different subchannels in adjacent frames. When the frequency interval between adjacent subchannels exceeds the coherence bandwidth, the small-scale channels of a user in different frames are independent.
In URLLC, the blocklength of channel coding is finite due to the short transmission duration. To characterize the impact of decoding errors on reliability, the achievable rate in the finite blocklength regime is required. In quasi-static flat fading channels, when channel state information is available at the BS and a user (say the $k$th user), the achievable rate (in packets/frame) can be accurately approximated by [9],
$$s_k \approx \frac{\tau W_k}{u \ln 2}\left[\ln\left(1+\frac{\alpha_k g_k P_k}{N_0 W_k}\right)-\sqrt{\frac{V_k}{\tau W_k}}\,Q_G^{-1}(\varepsilon_k^c)\right] \quad (1)$$
where $W_k$ and $P_k$ are the bandwidth and transmit power allocated to the $k$th user, $\varepsilon_k^c$ is the decoding error probability, $\alpha_k$ and $g_k$ are the large-scale and small-scale channel gains of the $k$th user, respectively, $N_0$ is the single-sided noise spectral density, $Q_G^{-1}(x)$ is the inverse of the Gaussian Q-function, and $V_k = 1-\frac{1}{\left[1+\alpha_k g_k P_k/(N_0 W_k)\right]^2}$ [9]. If the signal-to-noise ratio (SNR) $\frac{\alpha_k g_k P_k}{N_0 W_k}\geq 5$ dB, $V_k\approx 1$ is accurate [10]. Since a high SNR is required in URLLC, this approximation is accurate.
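As a sanity check, the rate in (1) can be evaluated numerically. The sketch below is illustrative only: the bandwidth, SNR, and decoding error probability are assumed values (not taken from the paper), and $Q_G^{-1}$ is computed by bisection.

```python
import math

def Q(x):
    """Gaussian Q-function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def Q_inv(eps, lo=0.0, hi=20.0):
    """Invert the (decreasing) Q-function by bisection."""
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Q(mid) > eps:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def rate_fbl(W, snr, tau, u, eps_c):
    """Achievable rate in packets/frame from (1), with V_k as in the text."""
    V = 1.0 - 1.0 / (1.0 + snr) ** 2
    return tau * W / (u * math.log(2)) * (
        math.log(1.0 + snr) - math.sqrt(V / (tau * W)) * Q_inv(eps_c)
    )

# Assumed illustrative values: W = 1 MHz, SNR = 10 dB, eps_c = 5e-6;
# tau = 0.05 ms and u = 160 bits follow the simulation setup.
s = rate_fbl(W=1e6, snr=10.0, tau=0.05e-3, u=160.0, eps_c=5e-6)
shannon = 0.05e-3 * 1e6 / (160.0 * math.log(2)) * math.log(11.0)
print(s, shannon)  # the finite-blocklength rate is below the Shannon rate
```

The gap between the two printed values is the finite-blocklength penalty, i.e., the $\sqrt{V_k/(\tau W_k)}\,Q_G^{-1}(\varepsilon_k^c)$ term in (1).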
Packets for each user arrive at the buffer of the BS randomly
and may accumulate into a queue. We consider that the packets
for different users wait in different queues.
The QoS requirements of URLLC can be characterized by a delay bound $D_{\max}$ and an overall packet loss probability $\varepsilon_{\max}$. The delays of UL transmission, backhaul and processing have been studied in the literature [11, 12]. By further removing the time used for online learning and inference, herein $D_{\max}$ is the DL delay, which consists of the queueing delay (denoted as $D_k^q$ for the $k$th user), the transmission delay $D_t$ and the decoding delay $D_c$. $D_t = T_f$ and $D_c$ is a constant [13]. All these delay components are measured in frames. Due to the random packet arrival, $D_k^q$ is random. To ensure the delay requirement, $D_k^q$ should be bounded by $D_{\max}^q \triangleq D_{\max}-D_t-D_c$.
If the queueing delay of a packet exceeds $D_{\max}^q$, the packet becomes useless. The queueing delay violation probability of the $k$th user, denoted as $\varepsilon_k^q \triangleq \Pr\{D_k^q > D_{\max}^q\}$, can be bounded by $\varepsilon_k^q < e^{-\theta_k B_k^E D_{\max}^q} \triangleq \varepsilon_k^{q,UB}$ [14], where $\theta_k$ is the QoS exponent that satisfies $C_k^E \geq B_k^E$, $C_k^E$ is the effective capacity depending on the service process, which can be expressed as $C_k^E = -\frac{1}{\theta_k}\ln \mathbb{E}_g\{e^{-\theta_k s_k}\}$ (packets/frame) [14], and $B_k^E$ is the effective bandwidth depending on the packet arrival process. Here, $\mathbb{E}_g\{\cdot\}$ denotes the expectation over the small-scale channel gains.
The overall reliability requirement can be characterized by $1-(1-\varepsilon_k^c)(1-\varepsilon_k^q)\approx \varepsilon_k^c+\varepsilon_k^q \leq \varepsilon_{\max}$. This approximation is very accurate, because the values of $\varepsilon_k^c$ and $\varepsilon_k^q$ are very small in URLLC. The queueing delay requirement $(D_{\max}^q, \varepsilon_k^q)$ is satisfied if $\varepsilon_k^{q,UB}$ is satisfied. Then, the overall reliability requirement can be ensured if $\varepsilon_{\max} = \varepsilon_k^c + e^{-\theta_k B_k^E D_{\max}^q}$. For simplicity, we assume $\varepsilon_k^c = \varepsilon_{\max}/2$ as in [11]. Then, the QoS exponent $\theta_k$ satisfying $(D_{\max}^q, \varepsilon_k^q)$ can be obtained from $e^{-\theta_k B_k^E D_{\max}^q} = \varepsilon_{\max}/2$, with which the QoS of URLLC can be ensured if $-\frac{1}{\theta_k}\ln \mathbb{E}_g\{e^{-\theta_k s_k}\} \geq B_k^E$. For example, when the packets of the $k$th user arrive according to a Poisson process with average arrival rate $a_k$, $B_k^E = \frac{a_k}{\theta_k}\left(e^{\theta_k}-1\right)$ (packets/frame) and $\theta_k = \ln\left[1-\frac{\ln(\varepsilon_{\max}/2)}{a_k D_{\max}^q}\right]$ [11].
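For the Poisson example, $\theta_k$ and $B_k^E$ are available in closed form. The sketch below plugs in the paper's setup ($\varepsilon_{\max}=10^{-5}$, $a_k=0.2$ packets/frame, $D_{\max}^q = 10-1-1 = 8$ frames) and verifies that the resulting pair indeed satisfies $e^{-\theta_k B_k^E D_{\max}^q}=\varepsilon_{\max}/2$.

```python
import math

def qos_exponent(a, Dq_max, eps_max):
    """QoS exponent for Poisson arrivals, solving e^{-theta*B*Dq} = eps_max/2."""
    return math.log(1.0 - math.log(eps_max / 2.0) / (a * Dq_max))

def effective_bandwidth(a, theta):
    """Effective bandwidth of a Poisson arrival process (packets/frame)."""
    return a / theta * (math.exp(theta) - 1.0)

eps_max = 1e-5
a = 0.2        # average arrival rate, packets/frame
Dq_max = 8     # Dmax - Dt - Dc = 10 - 1 - 1 frames

theta = qos_exponent(a, Dq_max, eps_max)
B = effective_bandwidth(a, theta)
print(theta, B)
print(math.exp(-theta * B * Dq_max))  # equals eps_max/2 = 5e-6
```

Substituting $B_k^E$ into the exponent gives $\theta_k B_k^E D_{\max}^q = a_k(e^{\theta_k}-1)D_{\max}^q = -\ln(\varepsilon_{\max}/2)$, so the check holds exactly by construction.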
To exploit multi-user diversity, we allocate transmit power to each user according to the small-scale channel gains of all users $\mathbf{g} \triangleq \{g_1, \ldots, g_K\}$. To reduce the complexity, the bandwidth is allocated among users according to their large-scale channel gains [4]. To improve the resource efficiency, we minimize the total bandwidth required to ensure the QoS by optimizing the power and bandwidth allocation,
$$\min_{P_k(\mathbf{g}),\,W_k} \; \sum_{k=1}^{K} W_k \quad (2)$$
$$\text{s.t.}\;\; -\frac{1}{\theta_k}\ln \mathbb{E}_g\{e^{-\theta_k s_k}\} \geq B_k^E \quad (2a)$$
$$\sum_{k=1}^{K} P_k \leq P_{\max},\quad P_k \geq 0,\quad W_k \geq 0 \quad (2b)$$
Problem (2) involves two timescales and is a functional optimization problem. Besides, the constraint (2a) does not have a closed-form expression. To solve this challenging problem, an unsupervised deep learning method was proposed in [4] to find $P_k(\mathbf{g})$ and $W_k$, $k=1,\cdots,K$, for every given value of $\boldsymbol{\alpha} \triangleq \{\alpha_1,\ldots,\alpha_K\}$. In particular, problem (2) is first transformed into its primal-dual problem as follows,
$$\max_{h(\mathbf{g}),\,\lambda_k}\;\min_{P_k(\mathbf{g}),\,W_k}\;\sum_{k=1}^{K}\left[W_k+\lambda_k\left(\mathbb{E}_g\{e^{-\theta_k s_k}\}-e^{-\theta_k B_k^E}\right)\right]+\int_{\mathbb{R}_+^K} h(\mathbf{g})\left(\sum_{k=1}^{K}P_k(\mathbf{g})-P_{\max}\right)\mathrm{d}\mathbf{g} \quad (3)$$
$$\text{s.t.}\;\;\sum_{k=1}^{K}P_k(\mathbf{g})\leq P_{\max} \quad (3a)$$
$$P_k(\mathbf{g})\geq 0,\quad W_k\geq 0,\quad h(\mathbf{g})\geq 0,\quad \lambda_k\geq 0 \quad (3b)$$
where the objective is the Lagrangian function of problem (2), and $h(\mathbf{g})$ and $\lambda_k$ are the Lagrange multipliers. Then, $P_k(\mathbf{g})/P_{\max}$ is parameterized as an NN with model parameters $\omega$, denoted as $\mathcal{N}(\mathbf{g};\omega)$. Since the required bandwidth decreases when more power is allocated, the equality in (3a) holds. Thereby, by applying the Softmax function in the output layer, $\mathcal{N}(\mathbf{g};\omega)$ satisfies the maximal transmit power constraint. $W_k$ and the parameterized form of $P_k(\mathbf{g})$ are optimized from,
$$\max_{\lambda_k}\;\min_{\omega,\,W_k}\; L \triangleq \sum_{k=1}^{K}\left[W_k+\lambda_k\left(\mathbb{E}_g\{e^{-\theta_k s_k}\}-e^{-\theta_k B_k^E}\right)\right] \quad (4)$$
$$\text{s.t.}\;\; W_k\geq 0,\quad \lambda_k\geq 0 \quad (4a)$$
By taking the objective function in (4) as the loss function, the optimal power and bandwidth allocation can be found by stochastic gradient methods, as shown in [4].
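The primal-dual structure of (4), descent on the primal variables and ascent on the multipliers, can be sketched in a few lines of NumPy. This is a toy illustration, not the paper's implementation: a softmax-parameterized power split stands in for the NN $\mathcal{N}(\mathbf{g};\omega)$, a simplified concave rate stands in for (1), finite-difference gradients replace backpropagation, and all constants ($K$, $\theta_k$, $B_k^E$, step size) are assumed for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
K, Nb, Pmax, phi = 3, 64, 1.0, 0.05
theta = np.ones(K)                       # QoS exponents (illustrative)
BE = np.array([0.5, 0.7, 0.9])           # effective bandwidths (illustrative)

def softmax(w):
    e = np.exp(w - w.max())
    return e / e.sum()

def empirical_L(omega, W, lam, g):
    """Sample-average Lagrangian of (4) over a batch of channel gains g."""
    P = Pmax * softmax(omega)            # power allocation, sums to Pmax
    s = 10.0 * W * np.log1p(g * P / np.maximum(W, 1e-9))  # simplified rate
    con = np.exp(-theta * s).mean(axis=0) - np.exp(-theta * BE)
    return np.sum(W + lam * con)

def num_grad(x, f, h=1e-5):
    """Finite-difference gradient (a real NN would use backpropagation)."""
    gr = np.zeros_like(x)
    for i in range(x.size):
        d = np.zeros_like(x); d[i] = h
        gr[i] = (f(x + d) - f(x - d)) / (2.0 * h)
    return gr

omega, W, lam = np.zeros(K), np.ones(K), np.ones(K)
for t in range(500):
    g = rng.exponential(1.0, size=(Nb, K))   # Rayleigh fading power gains
    # SGD on the primal variables (omega, W), SGA on the multipliers lambda,
    # with projection onto the non-negative orthant for W and lambda.
    omega = omega - phi * num_grad(omega, lambda o: empirical_L(o, W, lam, g))
    W = np.maximum(W - phi * num_grad(W, lambda w: empirical_L(omega, w, lam, g)), 0.0)
    lam = np.maximum(lam + phi * num_grad(lam, lambda l: empirical_L(omega, W, l, g)), 0.0)

print(W, lam)   # non-negative primal and dual iterates after training
```

The multipliers grow while a QoS constraint is violated, pushing the corresponding bandwidth up, and shrink once it is satisfied, mirroring the max-min interplay in (4).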
III. LEARNING RESOURCE ALLOCATION IN OFFLINE AND ONLINE MANNER
When the large-scale channel gains change, $\mathcal{N}(\mathbf{g};\omega)$ needs to be re-trained and the bandwidth allocation needs to be re-optimized. To avoid the re-training, one can extend the idea in [2] to learn $\mathbf{P}(\mathbf{g},\boldsymbol{\alpha})=[P_1(\mathbf{g},\boldsymbol{\alpha}),\cdots,P_K(\mathbf{g},\boldsymbol{\alpha})]$ and $\mathbf{W}(\boldsymbol{\alpha})=[W_1(\boldsymbol{\alpha}),\cdots,W_K(\boldsymbol{\alpha})]$ with offline-trained NNs. Alternatively, we can learn $\mathbf{P}(\mathbf{g})=[P_1(\mathbf{g}),\cdots,P_K(\mathbf{g})]$ with an online-trained NN and optimize $\mathbf{W}=[W_1,\cdots,W_K]$ by tracking $\boldsymbol{\alpha}$, sampled with period $T_L$, in an online manner.
A. Learning to Optimize Problem (2) with Offline Training
According to the proof in [7], problem (2) can be equivalently transformed into the following form,
$$\max_{h(\mathbf{g},\boldsymbol{\alpha}),\,\lambda_k(\boldsymbol{\alpha})}\;\min_{P_k(\mathbf{g},\boldsymbol{\alpha}),\,W_k(\boldsymbol{\alpha})}\;\bar{L}\triangleq \mathbb{E}_{\boldsymbol{\alpha}}\Bigg\{\sum_{k=1}^{K}\left[W_k(\boldsymbol{\alpha})+\lambda_k(\boldsymbol{\alpha})\left(\mathbb{E}_g\{e^{-\theta_k \hat{s}_k}\}-e^{-\theta_k B_k^E}\right)\right]+\int_{\mathbb{R}_+^K}h(\mathbf{g},\boldsymbol{\alpha})\left(\sum_{k=1}^{K}P_k(\mathbf{g},\boldsymbol{\alpha})-P_{\max}\right)\mathrm{d}\mathbf{g}\Bigg\}$$
$$\text{s.t.}\;\;(3a),\quad P_k(\mathbf{g},\boldsymbol{\alpha})\geq 0,\quad W_k(\boldsymbol{\alpha})\geq 0,\quad h(\mathbf{g},\boldsymbol{\alpha})\geq 0,\quad \lambda_k(\boldsymbol{\alpha})\geq 0$$
where $\mathbb{E}_{\boldsymbol{\alpha}}\{\cdot\}$ denotes the expectation over the large-scale channel gains.
We approximate $P_k(\mathbf{g},\boldsymbol{\alpha})/P_{\max}$, $W_k(\boldsymbol{\alpha})$ and $\lambda_k(\boldsymbol{\alpha})$, $k=1,\cdots,K$, with NNs denoted as $\mathcal{N}_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$, $\mathcal{N}_W(\boldsymbol{\alpha};\omega_W)$ and $\mathcal{N}_\lambda(\boldsymbol{\alpha};\omega_\lambda)$, respectively. To ensure positive bandwidths and Lagrange multipliers, we choose SoftPlus as the output layer of $\mathcal{N}_W(\boldsymbol{\alpha};\omega_W)$ and $\mathcal{N}_\lambda(\boldsymbol{\alpha};\omega_\lambda)$. To satisfy the maximal transmit power constraint, we choose SoftMax as the output layer of $\mathcal{N}_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$. Then, by taking the Lagrangian function as the loss function, we can train $\omega_P$ and $\omega_W$ by stochastic gradient descent (SGD) and train $\omega_\lambda$ by stochastic gradient ascent (SGA) as follows,
$$\omega_P(t+1)=\omega_P(t)-\phi_P(t)\nabla_{\omega_P}\hat{L}(t)=\omega_P(t)-\phi_P(t)P_{\max}\nabla_{\omega_P}\mathcal{N}_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)\nabla_P\hat{L}(t)$$
$$\omega_W(t+1)=\omega_W(t)-\phi_W(t)\nabla_{\omega_W}\hat{L}(t)=\omega_W(t)-\phi_W(t)\nabla_{\omega_W}\mathcal{N}_W(\boldsymbol{\alpha};\omega_W)\nabla_W\hat{L}(t)$$
$$\omega_\lambda(t+1)=\omega_\lambda(t)+\phi_\lambda(t)\nabla_{\omega_\lambda}\hat{L}(t)=\omega_\lambda(t)+\phi_\lambda(t)\nabla_{\omega_\lambda}\mathcal{N}_\lambda(\boldsymbol{\alpha};\omega_\lambda)\nabla_\lambda\hat{L}(t)$$
where $\hat{L}(t)\triangleq \frac{1}{N_b}\sum_{n=1}^{N_b}\sum_{k=1}^{K}\left[W_k+\lambda_k\left(e^{-\theta_k \hat{s}_{k,n}^{(t)}}-e^{-\theta_k B_k^E}\right)\right]$, $\hat{s}_{k,n}^{(t)}$ is computed by (1) with a realization of the large-scale channel gain and a realization of the small-scale channel gain, and $N_b$ is the batch size in each iteration. The gradient matrices $\nabla_{\omega_W}\mathcal{N}_W(\boldsymbol{\alpha};\omega_W)$, $\nabla_{\omega_P}\mathcal{N}_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$ and $\nabla_{\omega_\lambda}\mathcal{N}_\lambda(\boldsymbol{\alpha};\omega_\lambda)$ can be computed through backward propagation, and $\nabla_W\hat{L}(t)$, $\nabla_P\hat{L}(t)$ and $\nabla_\lambda\hat{L}(t)$ can be computed as
$$\nabla_W\hat{L}(t)=\left\{1-\frac{1}{N_b}\sum_{n=1}^{N_b}\lambda_{k,n}^{(t)}\,\theta_k\,\frac{\partial \hat{s}_{k,n}^{(t)}}{\partial W_k}\,e^{-\theta_k \hat{s}_{k,n}^{(t)}},\;k=1,\cdots,K\right\}$$
$$\nabla_P\hat{L}(t)=\left\{-\frac{1}{N_b}\sum_{n=1}^{N_b}\lambda_{k,n}^{(t)}\,\theta_k\,\frac{\partial \hat{s}_{k,n}^{(t)}}{\partial P_k}\,e^{-\theta_k \hat{s}_{k,n}^{(t)}},\;k=1,\cdots,K\right\}$$
$$\nabla_\lambda\hat{L}(t)=\left\{\frac{1}{N_b}\sum_{n=1}^{N_b}\left(e^{-\theta_k \hat{s}_{k,n}^{(t)}}-e^{-\theta_k B_k^E}\right),\;k=1,\cdots,K\right\}$$
B. Learning to Optimize (2) by Online Training and Tracking
When the large-scale channel gains change only slightly, it is unnecessary to retrain the NNs from scratch; the model parameters only need fine-tuning.
To learn $P_k(\mathbf{g})$ and $W_k$ in an online manner, we can use a stochastic gradient method to solve problem (4). In particular, for the $l$th sampled value of $\boldsymbol{\alpha}$, we fine-tune the model parameters $\omega$ (i.e., train $\mathcal{N}(\mathbf{g};\omega)$) and update $W_k$ and $\lambda_k$ over $N$ iterations in the $l$th round as follows,
$$\omega^l(t+1)=\omega^l(t)-\phi(t)\nabla_\omega\hat{L}(t)=\omega^l(t)-\phi(t)P_{\max}\nabla_\omega\mathcal{N}(\mathbf{g};\omega)\nabla_P\hat{L}(t)$$
$$W_k^l(t+1)=\left[W_k^l(t)-\phi(t)\frac{\partial\hat{L}(t)}{\partial W_k}\right]^+$$
$$\lambda_k^l(t+1)=\left[\lambda_k^l(t)+\phi(t)\frac{\partial\hat{L}(t)}{\partial \lambda_k}\right]^+$$
where $[x]^+\triangleq\max\{x,0\}$ keeps the variables non-negative, $\hat{L}(t)$ has the same form as defined in Section III-A but herein $\hat{s}_{k,n}^{(t)}$ is the achievable rate computed with the $n$th realization of $g_k$ given the $l$th sampled value of $\boldsymbol{\alpha}$, and $N_b$ is the batch size in each iteration. When the $(l+1)$th sampled value of $\boldsymbol{\alpha}$ is observed, the $(l+1)$th round of online training is initialized as
$$\omega^{l+1}(1)=\omega^l(N),\quad \lambda_k^{l+1}(1)=\lambda_k^l(N),\quad W_k^{l+1}(1)=W_k^l(N)$$
For the first round, the values of $\omega$, $\lambda_k$ and $W_k$ can be set via pre-training.
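The warm-start schedule above, only $N$ projected gradient iterations per round, initialized from the previous round's iterate, can be illustrated on a toy scalar tracking problem. Everything here is an assumption for illustration: a quadratic per-round loss $(x-\alpha_l)^2$ stands in for $\hat{L}$, and the drift of $\alpha_l$ mimics slowly varying large-scale gains.

```python
import numpy as np

def fine_tune(x, alpha, N, phi=0.3):
    """Run N projected gradient steps on the round's loss (x - alpha)^2."""
    for _ in range(N):
        # gradient is 2(x - alpha); [x]^+ = max{x, 0} keeps x non-negative
        x = max(x - phi * 2.0 * (x - alpha), 0.0)
    return x

N = 5                                   # iterations per round, as in Fig. 3
alphas = 1.0 + 0.05 * np.arange(20)     # slow drift of the environment
x = 0.0                                 # round 1: a (poor) pre-trained start
errors = []
for alpha in alphas:
    x = fine_tune(x, alpha, N)          # warm start from the previous round
    errors.append(abs(x - alpha))

print(errors[0], errors[-1])  # warm-started rounds track alpha more closely
```

Because each round starts near the previous optimum, the residual error after $N$ steps shrinks by roughly the per-step contraction factor raised to the power $N$, which is why a few iterations per observation of $\boldsymbol{\alpha}$ suffice.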
C. Procedure and Online Computational Complexity
In the two-timescale policy, the bandwidth allocation is updated with period $T_L$ when $\boldsymbol{\alpha}$ changes, and the power allocation is updated in each frame $T_f$ when $\mathbf{g}$ changes.
With the offline training method, the functions $P_k(\mathbf{g},\boldsymbol{\alpha})$ and $W_k(\boldsymbol{\alpha})$ can be obtained once the NNs have been trained with random realizations of $\mathbf{g}$ and $\boldsymbol{\alpha}$. We only consider the time used to obtain one value of $\mathbf{P}(\mathbf{g},\boldsymbol{\alpha})$ and $\mathbf{W}(\boldsymbol{\alpha})$ via forward propagation of $\mathcal{N}_P(\mathbf{g},\boldsymbol{\alpha};\omega_P)$ and $\mathcal{N}_W(\boldsymbol{\alpha};\omega_W)$, respectively denoted as $t_P^{\mathrm{off}}$ and $t_W^{\mathrm{off}}$. The fraction of time consumed for inference per unit time is
$$\eta_{\mathrm{off}}=\frac{\frac{T_L}{T_f}t_P^{\mathrm{off}}+t_W^{\mathrm{off}}}{T_L}\times 100\% \quad (5)$$
With online learning, $P_k(\mathbf{g})$ and $W_k$ are obtained for each sampled value of $\boldsymbol{\alpha}$ with $N$ iterations. Denote the time used by each training iteration as $t^{\mathrm{on}}$, and the time used for forward propagation of $\mathcal{N}(\mathbf{g};\omega)$ to obtain one value of $\mathbf{P}(\mathbf{g})$ as $t_P^{\mathrm{on}}$. Then, the fraction of time consumed for online training and resource allocation per unit time is
$$\eta_{\mathrm{on}}=\frac{\frac{T_L}{T_f}t_P^{\mathrm{on}}+N t^{\mathrm{on}}}{T_L}\times 100\% \quad (6)$$
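Plugging the timings reported later in Table III into (5) and (6), with $T_L=100$ ms and $T_f=0.1$ ms so that $T_L/T_f=1000$, reproduces the reported fractions:

```python
# Reproduce eta_off and eta_on in (5)-(6) from the timings in Table III.
TL, Tf = 100.0, 0.1               # ms (TL = 0.1 s, Tf = 0.1 ms)
t_off_P, t_off_W = 0.028, 0.026   # ms, offline inference times
t_on_P, t_on, N = 0.025, 0.086, 11  # ms, online times and iterations

eta_off = ((TL / Tf) * t_off_P + t_off_W) / TL * 100  # -> about 28.03 %
eta_on = ((TL / Tf) * t_on_P + N * t_on) / TL * 100   # -> about 25.95 %
print(eta_off, eta_on)
```

Note that both fractions are dominated by the per-frame power inference term $(T_L/T_f)\,t_P$; the once-per-$T_L$ bandwidth inference and the $N$ online training iterations contribute less than 1 ms per 100 ms.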
IV. SIMULATION RESULTS
In this section, we compare the performance of the offline training method in Section III-A (called Method-B) and the online learning method in Section III-B (called Method-C). Since the optimal solution of problem (2) is not available, we use the method in [4] as a baseline (called Method-A).
Fig. 2. Simulation setup: cell radius $D_1=250$ m, and $D_2=50$ m.
We consider $K=10$ users, which are equally spaced on the OA segment of a road when they start to request the URLLC service, as shown in Fig. 2. $P_{\max}=36$ dBm, $N_t=32$, and $N_0=-173$ dBm/Hz. The path loss model is $35.3+37.6\lg(d_k)$, where $d_k$ is the distance between the $k$th user and the BS, and the shadowing is zero-mean log-normal with 8 dB standard deviation and 50 m correlation distance. The small-scale channels are Rayleigh fading. To show the impact of the speed of environmental change on online learning, we consider four groups of users with different velocities, where the users in each group have the same velocity. $M=100$ test locations (and hence large-scale channel gains) are uniformly selected on the road for evaluation. The remaining simulation parameters are shown in Table I. This simulation setup is used unless otherwise specified.
The fine-tuned hyper-parameters of the NNs are as follows. All the NNs have 4 hidden layers. The numbers of nodes in the hidden layers for Method-B and Method-C are 40 and 20, respectively. For offline training, each sample includes a large-scale channel gain (generated at a randomly selected location in the cell with the path loss model and shadowing) and a small-scale channel gain (generated according to the Rayleigh distribution). For online learning and for testing the offline training method, the large-scale and small-scale channel gains are generated according to the simulation setup.
TABLE I
SIMULATION PARAMETERS

Overall packet loss probability $\varepsilon_{\max}$: $10^{-5}$
Frame duration $T_f$ and DL transmission time $\tau$: 0.1 ms and 0.05 ms
DL delay bound $D_{\max}$: 10 frames (1 ms)
Transmission delay $D_t$ and decoding delay $D_c$: 1 frame each (i.e., 0.1 ms)
Packet size $u$: 20 bytes (160 bits)
Average packet arrival rate $a$: 0.2 packets/frame
Sampling period of large-scale channel gain $T_L$: 0.1 s
To evaluate the performance of the online learning and offline training methods in terms of minimizing the bandwidth and satisfying the constraints, we use the average relative error of the Lagrangian function as a metric,
$$\varepsilon_L^{AX}=\frac{1}{M}\sum_{m=1}^{M}\frac{|\hat{L}_X(m)-\hat{L}_A(m)|}{\hat{L}_A(m)},\quad X=B \text{ or } C,$$
where $\hat{L}_A(m)$ and $\hat{L}_X(m)$ are the Lagrangian functions of Method-A and Method-X obtained at the $m$th test location, respectively. For a fair comparison, we use the Lagrange multipliers of Method-A to compute the Lagrangian functions of the other two methods, such that all methods achieve the same performance in terms of satisfying the constraints.
In Fig. 3, we provide the performance of the online learning
method versus the number of iterations N. The results are
averaged over 50 rounds of independent training. It is shown
that only a few iterations are required for online learning to
achieve the same performance as offline training, even in high-speed scenarios (e.g., $N=5$ for 40 km/h and $N=11$ for 80 km/h).
Fig. 3. Performance of online learning versus $N$.
To evaluate the accuracy of the learning methods, we define the average relative error of the learned bandwidth as
$$\varepsilon_W^{AX}=\frac{1}{KM}\sum_{m=1}^{M}\sum_{k=1}^{K}\frac{|W_k^X(m)-W_k^A(m)|}{W_k^A(m)},\quad X=B \text{ or } C,$$
where $W_k^A(m)$ and $W_k^X(m)$ are the bandwidths allocated to the $k$th user by Method-A and Method-X at the $m$th test location, respectively.
The average relative error of the learned power allocation is defined in the same way. The results are provided in Table II. We can see that the relative errors are small, though the error of the learned power allocation is larger. For a given number of iterations, the accuracy of online training decreases as the velocity increases.
TABLE II
RELATIVE ERRORS OF THE LEARNED BANDWIDTH AND POWER

Method          | Method-B           | Method-C ($N=11$)
Resources       | Bandwidth | Power  | Bandwidth | Power
$v=10$ km/h     | 0.91%     | 3.66%  | 0.27%     | 1.45%
$v=20$ km/h     | 0.91%     | 3.66%  | 0.38%     | 2.02%
$v=40$ km/h     | 0.91%     | 3.66%  | 0.57%     | 2.74%
$v=80$ km/h     | 0.91%     | 3.66%  | 0.81%     | 3.86%
It is noteworthy that, to improve the generalization ability, the offline training method can also be designed to further learn the mapping from other environmental parameters to the optimal solution. Nonetheless, in practice there are always parameters that cannot be learned in this way. To demonstrate what happens in a dynamic environment, we simulate a simple scenario, where the average packet arrival rate $a$ is 0.2 packets/frame from 0 s to 5 s and changes to 0.3 packets/frame from 5 s to 10 s, and the velocities of all users are set to 40 km/h. The overall packet loss probabilities of the worst user achieved by the different learning methods are provided in Fig. 4, where the large-scale channel gain only includes the path loss, to emphasize the impact of the time-varying arrival rate. It is shown that the offline training method yields much worse reliability after $a$ changes, indicating poor generalization to the packet arrival rate. Although the online learning method also cannot ensure $\varepsilon_{\max}=10^{-5}$, due to the errors in estimating $\hat{L}(t)$ and the stochastic gradient method, its overall packet loss probability is always less than $2\varepsilon_{\max}$ and is robust to the abrupt change of $a$. This suggests that the reliability of the online learning method can be ensured by setting $\varepsilon_{\max}$ a little conservatively during training, say as $0.5\times 10^{-5}$.
Fig. 4. Achieved overall reliability with time-varying $\boldsymbol{\alpha}$ and $a$, $N=5$.
In Table III, we compare the computational complexity of the offline and online training methods, where $t_P^{\mathrm{off}}$, $t_W^{\mathrm{off}}$, $t_P^{\mathrm{on}}$ and $t^{\mathrm{on}}$ are measured on an Intel® Core™ i7-8700K CPU @3.70 GHz. Recall from Table I that the sampling period of the large-scale channel gain is $T_L=100$ ms and $T_L/T_f=1000$. Then, $\eta_{\mathrm{on}}=25.95\%$ means that about 26 ms is used for online training of $\mathcal{N}(\mathbf{g};\omega)$, obtaining one optimal value of $\mathbf{W}$ and 1000 optimal values of $\mathbf{P}$. Further recalling the definition of $\eta_{\mathrm{on}}$, this means that on average 0.26 ms is used in every 1 ms (i.e., the time budget for learning and inference is 1/4 of the E2E delay bound). For Method-B, on average 0.28 ms is used for inference in every 1 ms, without considering the time for offline training.
TABLE III
COMPUTATIONAL COMPLEXITY, ALL USERS MOVE AT 80 KM/H

Offline Training Method          | $t_P^{\mathrm{off}}$ | $t_W^{\mathrm{off}}$ | $\eta_{\mathrm{off}}$
Computational Time               | 0.028 ms             | 0.026 ms             | 28.03%

Online Training Method ($N=11$)  | $t_P^{\mathrm{on}}$  | $t^{\mathrm{on}}$    | $\eta_{\mathrm{on}}$
Computational Time               | 0.025 ms             | 0.086 ms             | 25.95%
V. CONCLUSION
In this paper, we jointly optimized the bandwidth and power allocation in URLLC with deep learning, where the NNs are trained either online or offline. Simulation results showed that the proposed online learning method can achieve the system performance of offline training with a few steps of online update and training, and is robust to changes of the environmental parameters. Evaluated on a regular computer, the time consumed by online training of an NN and optimizing the resource allocation is on average one fourth of the E2E delay bound. By setting the overall packet loss probability a little conservatively, the reliability can be guaranteed by the online learning method. In conclusion, our results suggest that online deep learning can be applied for real-time resource allocation in dynamic environments with low online computational complexity.
REFERENCES
[1] C. She, R. Dong, Z. Gu, Z. Hou, Y. Li, W. Hardjawana, C. Yang,
L. Song, and B. Vucetic, “Deep learning for ultra-reliable and low-
latency communications in 6G networks,” IEEE Netw., vol. 34, no. 5,
pp. 219–225, 2020.
[2] H. Sun, X. Chen, Q. Shi, M. Hong, X. Fu, and N. D. Sidiropoulos,
“Learning to optimize: Training deep neural networks for interference
management,” IEEE Trans. Signal Process., vol. 66, no. 20, pp. 5348–
5453, Oct. 2018.
[3] A. Alkhateeb, S. Alex, P. Varkey, Y. Li, Q. Qu, and D. Tujkovic, “Deep
learning coordinated beamforming for highly-mobile millimeter wave
systems,” IEEE Access, vol. 6, pp. 37 328–37 348, May 2018.
[4] C. Sun and C. Yang, “Unsupervised deep learning for ultra-reliable and
low-latency communications,” IEEE GLOBECOM, 2019.
[5] E. Zeidler, Nonlinear functional analysis and its applications: III:
variational methods and optimization. Springer Science & Business
Media, 2013.
[6] M. Eisen, C. Zhang, L. F. Chamon, D. D. Lee, and A. Ribeiro, “Learning
optimal resource allocations in wireless systems,” IEEE Trans. Signal
Process., vol. 67, no. 10, pp. 2775–2790, May 2019.
[7] C. Sun and C. Yang, “Learning to optimize with unsupervised learning:
Training deep neural networks for URLLC,” IEEE PIMRC, 2019.
[8] A. T. Z. Kasgari, W. Saad, M. Mozaffari, and H. V. Poor, “Experi-
enced deep reinforcement learning with generative adversarial networks
(GANs) for model-free ultra reliable low latency communication,” IEEE
Trans. Commun., pp. 1–1, 2020.
[9] W. Yang, G. Durisi, T. Koch, et al., “Quasi-static multiple-antenna fading
channels at finite blocklength,” IEEE Trans. Inf. Theory, vol. 60, no. 7,
pp. 4232–4264, Jul. 2014.
[10] J. G. S. Schiessl and H. Al-Zubaidy, “Delay analysis for wireless fading
channels with finite blocklength channel coding,” ACM MSWiM, 2015.
[11] C. She, C. Yang, and T. Q. S. Quek, “Joint uplink and downlink resource
configuration for ultra-reliable and low-latency communications,” IEEE
Trans. Commun., vol. 66, no. 5, pp. 2266–2280, May 2018.
[12] B. Makki, T. Svensson, G. Caire, and M. Zorzi, “Fast HARQ over finite
blocklength codes: A technique for low-latency reliable communication,”
IEEE Trans. Wireless Commun., vol. 18, no. 1, pp. 194–209, Jan 2019.
[13] M. Condoluci, T. Mahmoodi, E. Steinbach, and M. Dohler, “Soft re-
source reservation for low-delayed teleoperation over mobile networks,”
IEEE Access, vol. 5, pp. 10 445–10 455, May 2017.
[14] J. Tang and X. Zhang, “Quality-of-service driven power and rate
adaptation over wireless links,” IEEE Trans. Wireless Commun., vol. 6,
no. 8, pp. 3058–3068, 2007.