
Optimizing Information Freshness via Multiuser Scheduling with Adaptive NOMA/OMA

Qian Wang, He Chen, Changhong Zhao, Yonghui Li, Petar Popovski and Branka Vucetic

The work of H. Chen is supported by the CUHK direct grant under the project code 4055126. Part of the paper was presented at IEEE ISIT 2020 [1].
Q. Wang is with the School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, Australia, and the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China. The work was done when she was a visiting student at CUHK (email: qian.wang2@sydney.edu.au).
H. Chen and C. Zhao are with the Department of Information Engineering, The Chinese University of Hong Kong, Hong Kong SAR, China (email: {he.chen, chzhao}@ie.cuhk.edu.hk).
Y. Li and B. Vucetic are with the School of Electrical and Information Engineering, The University of Sydney, Sydney, NSW 2006, Australia (email: {yonghui.li, branka.vucetic}@sydney.edu.au).
P. Popovski is with the Department of Electronic Systems, Faculty of Engineering and Science, APNet Section, Aalborg University, 9220 Aalborg, Denmark (email: petarp@es.aau.dk).
arXiv:2007.04072v1 [cs.IT] 7 Jul 2020

Abstract
This paper considers a wireless network with a base station (BS) conducting timely status updates to multiple clients via adaptive non-orthogonal multiple access (NOMA)/orthogonal multiple access (OMA). Specifically, the BS is able to adaptively switch between NOMA and OMA for the downlink transmission to optimize the information freshness of the network, characterized by the Age of Information (AoI) metric. If the BS chooses OMA, it can only serve one client within each time slot and should decide which client to serve; if the BS chooses NOMA, it can serve more than one client at the same time and needs to decide the power allocated to the served clients. For the simple two-client case, we formulate a Markov Decision Process (MDP) problem and develop the optimal policy for the BS to decide whether to use NOMA or OMA for each downlink transmission based on the instantaneous AoI of both clients. The optimal policy is shown to have a switching-type property with obvious decision switching boundaries. A near-optimal policy with lower computation complexity is also devised. For the more general multi-client scenario, inspired by the proposed near-optimal policy, we formulate a nonlinear optimization problem to determine the optimal power allocated to each client by maximizing the expected AoI drop of the network in each time slot. We resolve the formulated problem by approximating it as a convex optimization problem. We also derive the upper bound of the gap between the approximate convex problem and the original nonlinear, nonconvex problem. Simulation results validate the effectiveness of the adopted approximation. The performance of the adaptive NOMA/OMA scheme obtained by solving the convex optimization is shown to be close to that of the max-weight policy solved by exhaustive search. Besides, the adaptive NOMA/OMA scheme achieves significant performance improvement compared with the OMA scheme, especially when the number of clients in the network is large and the transmission SNR is high.
Index Terms
Information freshness, Age of Information, multiuser scheduling, non-orthogonal multiple access,
Markov decision process and power allocation.
I. INTRODUCTION
Recently, researchers have shown enormous interest (see, e.g., [2]–[19]) in a new performance metric, termed Age of Information (AoI), thanks to its capability to characterize the timeliness of data transmission in status update systems. The timeliness of status updates is of great importance, especially in real-time monitoring applications, in which the dynamics of the monitored processes need to be well grasped at the monitor side for further actions. The AoI is defined as the time elapsed since the generation time of the latest received status update at the destination [2]. According to this definition, the AoI is jointly determined by the transmission interval and the transmission delay.
Early work on the analysis and optimization of AoI in various networks has mainly focused
on the simple single-source system model [2]–[11]. Recent efforts on AoI optimization pay
more attention to the more general multi-source systems [12]–[19]. For systems with multiple
sources, the AoI of each user depends on the transmission scheduling of all devices. In this
line of research, the authors in [12] considered a base station (BS) receiving status updates
from multiple nodes with a generate-at-will status arrival model in the uplink. A BS serving
status updates to multiple nodes in the downlink with the randomly generated status update was
investigated in [14]. Both of them derived the lower bound of the weighted sum of the expected
AoI of the considered network and compared the lower bound with that of various suboptimal
scheduling policies, including Whittle index policy and max-weight policy, etc. The authors in
[16] also considered systems with stochastic status update arrivals and derived the Whittle index
policy in closed form. A decentralized policy was proposed in [16], which was shown to achieve
near-optimal performance. Another branch of this research line is to analyze and optimize the
AoI of the networks with random access protocols. Particularly, the AoI performance of slotted
ALOHA was investigated in [15], [18] and that of Carrier Sense Multiple Access (CSMA) was
investigated in [17].
All aforementioned studies on AoI have concentrated on the orthogonal multiple access (OMA) scheme. That is, only one status update packet can be delivered and received in each time slot. Very recently, the authors in [20] investigated for the first time the potential of applying non-orthogonal multiple access (NOMA) to reduce the average AoI of a two-node network. The results in [20] showed that OMA and NOMA can outperform each other in different setups. In fact, NOMA has been regarded as a promising technique to deal with large-scale Internet of Things (IoT) deployment [21]–[24]. The basic idea of NOMA is to leverage the power domain to enable multiple clients to be served at the same time or frequency band. Compared to OMA, NOMA has the potential to reduce AoI by improving spectrum utilization efficiency. Specifically, more than one client can be served by the BS using NOMA, resulting in a possible AoI drop of more than one client. However, in OMA, only the served client may have an AoI drop and the AoI of all other clients will increase. In this context, a natural question arises: how should a multiuser system adaptively switch between OMA and NOMA modes to minimize the long-term average weighted sum of AoI of the network? To the best of the authors' knowledge, the answer to this question remains unknown in the literature. The NOMA scheme allows the BS to serve more clients in each time slot at the cost of a high transmission error probability, while the OMA scheme serves at most one client in each time slot with a smaller transmission error probability. This makes the optimal multiuser scheduling problem with adaptive NOMA/OMA non-trivial. In Fig. 1 we depict an example of the AoI evolution under the adopted adaptive NOMA/OMA scheduling for a two-client network. We can observe from Fig. 1 that, when the age difference between clients is relatively small, the BS may take a risk and serve both clients in order to achieve a small AoI for both clients in the next time slot. When the age difference between clients is large with one age being small, the BS tends to use OMA to serve the client with the larger AoI.
Motivated by the gap above, in this paper we consider a wireless network with a BS that conducts timely status updates to multiple clients in a time-slotted manner. The BS is able to adaptively switch between NOMA and OMA for the downlink transmission. To reduce the AoI, the BS needs to decide which scheme (i.e., NOMA or OMA) to use at the beginning of each time slot. For the OMA scheme, the BS should further decide which client to serve. For the NOMA scheme, the BS needs to further decide the power allocated to each scheduled client. That is, when using NOMA, the BS decides which clients to serve by allocating non-zero power for status update transmission to these clients; the remaining unselected clients are allocated zero power.
[Figure 1: An illustration of AoI evolution for a two-client network under the adopted adaptive NOMA/OMA scheduling. Horizontal axis: time slot; vertical axis: AoI, with one curve per client; the schedule row marks the slots in which the NOMA scheme serves both clients and the slots in which the OMA scheme serves client 1 or client 2.]
A. Contributions
The main contributions of this paper lie in the following two aspects:
• For the two-client scenario, we develop the optimal policy for the BS to decide whether to use NOMA or OMA for each downlink transmission based on the instantaneous AoI of both clients by formulating a Markov Decision Process (MDP) problem. We prove the existence of the optimal stationary and deterministic policy, and perform action elimination to reduce the action space for lower computation complexity. The optimal policy is shown to have a switching-type property with obvious decision switching boundaries. A suboptimal policy with lower computation complexity is also proposed, which can achieve near-optimal performance, as shown by the simulation results.
• For the multi-client scenario, the optimal policy is not computationally tractable due to the state space growing exponentially with the number of clients, the coupled AoI evolution across clients, and the large action space formed by the different combinations of power allocated to each client. To adaptively switch between NOMA and OMA, we formulate a nonlinear optimization problem to determine the optimal power allocated to each client by maximizing the weighted sum of the expected AoI drop of the network within each time slot, inspired by the near-optimal policy and the max-weight policy in [12]–[14]. We resolve the formulated problem by approximating it as a convex optimization problem. We also derive the upper bound of the gap between the approximate convex problem and the original nonlinear, nonconvex problem. Simulation results show the effectiveness of the adopted approximation. The performance of the adaptive NOMA/OMA scheme obtained by solving the convex optimization problem is shown to be close to that of the max-weight policy solved by exhaustive search. Besides, the adaptive NOMA/OMA scheme can achieve significantly lower average AoI compared with the OMA scheme, especially when the number of clients in the network is large and the transmission SNR is high.
B. Related Work
We note that the MDP method has been widely used in designing optimal scheduling policies for average AoI minimization [3]–[6], [19]. In multiuser systems, the states of the system are jointly determined by the AoI values of all users, and the MDP method becomes intractable as the number of users increases. This is because an increasing number of users leads to an exponentially exploding state space and enormous computation complexity, known as the curse of dimensionality [25]. Thus, several attempts [12]–[14], [16], [17], [19] have been made to seek low-complexity scheduling algorithms. The Whittle index policy has been investigated in [12], [13], [16], [19], where the indexability of the considered problems was proved. This policy demonstrated near-optimal performance in numerical simulations. To implement the Whittle index policy, the Whittle index function needs to be derived beforehand and the user with the largest Whittle index is scheduled to update its status. However, it can be challenging to prove indexability and derive a closed-form Whittle index function for many problems [26]. To address these issues, the authors in [27] proposed an Approximate Index Policy. On the other hand, the max-weight policy has been studied in [12]–[14] and the upper bound of its average age performance was analyzed. Simulation results in [12], [13] showed a negligible performance gap between the max-weight policy and the optimal policy, and similar performance between the Whittle index policy and the max-weight policy.
All the aforementioned work focused on the OMA scheme, i.e., different users cannot update their status simultaneously. The potential of the NOMA scheme for reducing AoI was first investigated in [20] considering a simple two-user network. The analytical expressions of the total average AoI of the network using the NOMA scheme and that of conventional OMA environments were derived via Stochastic Hybrid Systems (SHS) and compared in different setups. The simulation results illustrated the advantage of NOMA over OMA for the case of relatively high spectral efficiency. The authors in [20] focused on analyzing the AoI of a two-user network that always uses NOMA, and investigated the potential of the NOMA scheme by comparing it with the AoI of the same network adopting the OMA scheme. In contrast, our work considers how to dynamically schedule the communications in a more general multiuser system by adaptively switching between OMA and NOMA modes to minimize the AoI of the network. The considered system is more practical due to the increased number of users, and the scheduling scheme is more comprehensive, covering which user(s) to schedule and the corresponding power allocation.
C. Organization
The rest of the paper is organized as follows. Section II introduces the system model. We study the optimal policy for the two-client scenario and propose a near-optimal policy in Section III. Section IV studies the multi-client scenario. Numerical results are presented in Section V to validate the theoretical analysis and the effectiveness of the proposed adaptive NOMA/OMA scheme. Finally, conclusions are drawn in Section VI.
II. SYSTEM MODEL
We consider a multiuser wireless network, in which a BS conducts timely status updates to $N$ clients in a slotted manner. At the beginning of each time slot, the BS can generate a status update packet for each client, which is known as generate-at-will in the literature [3], [4], [12]. An adaptive NOMA/OMA transmission scheme is adopted by the BS. Specifically, the BS can adaptively switch between NOMA and OMA for the downlink transmission. With NOMA, it is possible for more than one client to receive their packets simultaneously within one time slot. At the end of each time slot, if client $i$ has received its packet successfully from the BS, it will send an acknowledgment (ACK) to the BS. The ACK link from all clients to the BS is considered to be error-free and delay-free.
We use Age of Information (AoI) [2] to characterize the timeliness of the information received at each client. AoI is defined as the time elapsed since the generation time of the latest received information at the destination side. Mathematically, the AoI of client $i$ in time slot $t$, denoted by $\Delta_i(t)$, is $t - u_i(t)$, where $u_i(t)$ denotes the generation time of the latest received status update at time $t$. According to the considered generate-at-will model, if client $i$ has successfully received its status update from the BS, its AoI will decrease to 1; otherwise its AoI increases by 1. Mathematically, we have
$$\Delta_i(t+1) = \begin{cases} \Delta_i(t) + 1, & v_i(t) = 0, \\ 1, & v_i(t) = 1, \end{cases} \tag{1}$$
where $v_i(t)$ is the indicator that is equal to 1 when client $i$ receives its status update correctly from the BS in time slot $t$, and $v_i(t) = 0$ otherwise. The weighted sum of the expected AoI of all clients is adopted to measure the network-wide information timeliness, which is given by
$$\bar{\Delta} = \limsup_{T \to \infty} \frac{1}{T}\, \mathbb{E}\left[\sum_{i=1}^{N} \sum_{t=1}^{T} w_i \Delta_i(t)\right], \tag{2}$$
where $w_i$ is the weight coefficient of client $i$ with $\sum_{i=1}^{N} w_i = 1$, and the expectation is taken over all possible system dynamics.
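The recursion in (1) and the time average in (2) can be illustrated with a short simulation. The following is a minimal sketch, not the scheduling model of this paper: it assumes, purely for illustration, that each client receives an update independently in every slot with a fixed probability, and the names `step_aoi`, `average_weighted_aoi` and `success_prob` are hypothetical.

```python
import random


def step_aoi(ages, delivered):
    """One slot of recursion (1): reset to 1 on delivery, otherwise age by 1."""
    return [1 if ok else age + 1 for age, ok in zip(ages, delivered)]


def average_weighted_aoi(weights, success_prob, horizon, seed=0):
    """Finite-horizon estimate of the weighted-sum AoI in (2).

    Assumption (not from the paper): client i is delivered an update in each
    slot independently with probability success_prob[i]; all ages start at 1.
    """
    rng = random.Random(seed)
    ages = [1] * len(weights)
    total = 0.0
    for _ in range(horizon):
        # Accumulate the weighted AoI observed in the current slot.
        total += sum(w * age for w, age in zip(weights, ages))
        # Draw the delivery indicators v_i(t) and apply (1).
        delivered = [rng.random() < p for p in success_prob]
        ages = step_aoi(ages, delivered)
    return total / horizon
```

With certain delivery in every slot, every age stays at 1 and the estimate equals the sum of the weights; with no delivery at all, the ages grow linearly and the estimate over a horizon of $T$ slots is $(T+1)/2$.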
For ease of understanding, we first consider the two-client scenario, i.e., $N = 2$. We will later extend our design to the general case with more clients. In the OMA mode, the BS only conducts transmission to a single client. In this context, if time slot $t$ is assigned for the transmission to client $i$, $i \in \{1, 2\}$, the signal received at client $i$ can be written as
$$y_i^{O}(t) = h_i(t) \sqrt{P}\, s_i(t) + n_i(t), \tag{3}$$
where $P$ is the constant transmission power of the BS; $s_i$ is the status update message from the BS to client $i$; $h_i$ is the channel coefficient between the BS and client $i$. Specifically,
$$h_i = \sqrt{d_i^{-\tau}}\, g_i, \tag{4}$$
where the normalized distance $d_i = c_i / c_0$, with $c_i$ and $c_0$ denoting the distance between client $i$ and the BS and the baseline distance, respectively. Parameter $\tau$ denotes the path loss exponent and $g_i \sim \mathcal{CN}(0, 1)$, with $\mathcal{CN}$ denoting the complex normal distribution. Without loss of generality, we consider $c_1 < c_2$, i.e., $\mathbb{E}[|h_1|^2] > \mathbb{E}[|h_2|^2]$. Random variable $n_i$ is the complex additive Gaussian noise with variance $\sigma_i^2$. For simplicity, we assume the variance of $n_i$ is identical for both clients, i.e., $\sigma_i^2 = \sigma^2$, $\forall i$. After receiving the signal, the information can be decoded in an interference-free manner with an SNR $\gamma_i = |h_i|^2 \rho$, where $\rho = P / \sigma^2$ is the transmission SNR. Then, the rate for client $i$ can be expressed as $R_i^{OMA} = \log_2(1 + \gamma_i)$. The outage probability at client $i$ using OMA is given by
$$P_i^{O} = 1 - \mathbb{P}\left(R_i^{OMA} \ge R_i\right) = 1 - \exp\left(-\frac{(2^{R_i} - 1)\, d_i^{\tau}}{\rho}\right), \tag{5}$$
where $R_i$ is the target rate of client $i$. For simplicity, we assume that the target rates of both clients are the same, i.e., $R_1 = R_2 = R$. Note that the framework developed for the two-client scenario can be readily extended to the case with distinct target rates.
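As a sanity check on (5), the closed form can be compared against a Monte Carlo estimate that draws the channel gain directly from the model in (4). The sketch below assumes base-2 logarithms for the rate (consistent with the $2^{R}$ terms in the outage expression) and uses the fact that $|g_i|^2$ is exponentially distributed with unit mean when $g_i \sim \mathcal{CN}(0,1)$; the function names are illustrative only.

```python
import math
import random


def oma_outage(rate, dist, tau, rho):
    """Closed-form OMA outage probability, following (5)."""
    return 1.0 - math.exp(-(2.0 ** rate - 1.0) * dist ** tau / rho)


def simulated_oma_outage(rate, dist, tau, rho, trials=50000, seed=1):
    """Monte Carlo estimate of the same outage probability."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        # |h|^2 = d^{-tau} |g|^2, with |g|^2 ~ Exp(1) for g ~ CN(0, 1).
        gain = dist ** (-tau) * rng.expovariate(1.0)
        # Outage occurs when the achievable rate falls below the target.
        if math.log2(1.0 + gain * rho) < rate:
            failures += 1
    return failures / trials
```

For a moderate number of trials, the two values should agree to within sampling error, for instance when comparing `oma_outage(1.0, 1.5, 2.0, 10.0)` with `simulated_oma_outage(1.0, 1.5, 2.0, 10.0)`.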
On the other hand, when NOMA is conducted in time slot $t$, the signals to different clients are combined in the power domain at the BS by allocating different power levels to them. Through successive interference cancellation (SIC), it is possible for two clients to successfully recover their corresponding information in the same time slot. We consider fixed power transmission, and the observation at client $i$ can be expressed as
$$y_i^{N}(t) = h_i(t)\left(\sqrt{\alpha_1 P}\, s_1(t) + \sqrt{\alpha_2 P}\, s_2(t)\right) + n_i(t), \tag{6}$$
where $\alpha_i$ is the power allocation coefficient, and we readily have $\alpha_1 + \alpha_2 = 1$ to achieve the best possible performance. It is assumed that the BS only has the knowledge of statistical channel state information (CSI) of its channels to both clients, while the clients as receivers have perfect knowledge of CSI, as in [24], [28]. Thus, we have $\alpha_1 < \alpha_2$ according to the NOMA principle. Then, client 2 (i.e., the far user) decodes its message from the BS directly by treating $s_1$ as interference. The received SINR can be written as $\gamma_{22} = \alpha_2 |h_2|^2 \rho / (\alpha_1 |h_2|^2 \rho + 1)$. Therefore, the outage probability of client 2 using NOMA is given by
$$P_2^{N} = 1 - \mathbb{P}\left(\log_2(1 + \gamma_{22}) \ge R\right) = 1 - \exp\left(-\frac{(2^{R} - 1)\, d_2^{\tau}}{\rho\left(\alpha_2 - \alpha_1 (2^{R} - 1)\right)}\right), \tag{7}$$
where we enforce $\alpha_2 - \alpha_1 (2^{R} - 1) > 0$, i.e., $\alpha_2 > \frac{2^{R} - 1}{2^{R}}$.
Client 1 (i.e., the near user) conducts SIC. Specifically, client 1 first decodes $s_2$, as client 2 does, by treating $s_1$ as interference. The received SINR of client 1 when decoding $s_2$, denoted by $\gamma_{12}$, can thus be similarly expressed as $\gamma_{12} = \alpha_2 |h_1|^2 \rho / (\alpha_1 |h_1|^2 \rho + 1)$. Once $s_2$ is successfully decoded, client 1 then decodes $s_1$ without interference, and the resultant SNR is $\gamma_{11} = \alpha_1 |h_1|^2 \rho$. The outage probability of client 1 using NOMA can thus be calculated as
$$\begin{aligned} P_1^{N} &= 1 - \mathbb{P}\left(\log_2(1 + \gamma_{12}) \ge R \ \&\ \log_2(1 + \gamma_{11}) \ge R\right) \\ &= 1 - \exp\left(-\max\left\{\frac{(2^{R} - 1)\, d_1^{\tau}}{\rho\left(\alpha_2 - \alpha_1 (2^{R} - 1)\right)},\ \frac{(2^{R} - 1)\, d_1^{\tau}}{\rho\, \alpha_1}\right\}\right). \end{aligned} \tag{8}$$
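The two NOMA outage expressions (7) and (8) can be evaluated together for a given power split. The sketch below is an illustration under the stated two-client model; the function name `noma_outage_two_clients` is hypothetical, and returning an outage probability of 1 when the decoding condition $\alpha_2 - \alpha_1(2^R - 1) > 0$ is violated (or when client 1 is given no power) is a convention chosen here for the example.

```python
import math


def noma_outage_two_clients(alpha2, rate, d1, d2, tau, rho):
    """Return (P1, P2), the NOMA outage probabilities of the near and far client.

    alpha2 is the power fraction given to the far client 2; client 1 gets
    alpha1 = 1 - alpha2.
    """
    theta = 2.0 ** rate - 1.0
    alpha1 = 1.0 - alpha2
    margin = alpha2 - alpha1 * theta
    if margin <= 0.0:
        # s2 can never be decoded, so both clients are in outage.
        return 1.0, 1.0
    # Far client: decodes s2 while treating s1 as interference, see (7).
    p2 = 1.0 - math.exp(-theta * d2 ** tau / (rho * margin))
    if alpha1 <= 0.0:
        # No power for client 1 means its own message cannot be delivered.
        return 1.0, p2
    # Near client: must decode s2 first and then s1, see (8).
    worst = max(theta * d1 ** tau / (rho * margin),
                theta * d1 ** tau / (rho * alpha1))
    p1 = 1.0 - math.exp(-worst)
    return p1, p2
```

For example, `noma_outage_two_clients(0.75, 1.0, 1.0, 2.0, 2.0, 10.0)` evaluates both expressions for a 25/75 power split between the near and the far client.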
Comparing the above outage probability expressions between the NOMA and OMA schemes, we can find that NOMA offers more chance for the BS to transmit fresh status updates to both clients at the cost of a higher outage probability. Thus, to maintain the freshness of the information received at each client, at the beginning of each time slot, the BS needs to carefully decide whether to use the NOMA or OMA scheme. In addition, the outage probability of NOMA is determined by the power allocation among the two clients. As such, when using NOMA, the BS should appropriately allocate power for the transmission to each client. The power allocated to each client is considered to be discrete in the two-client system. Specifically, the power allocated to client $i$, denoted by $p_i$, can only take values from the discrete set $\{0, p, 2p, 3p, \ldots, Lp\}$ with $p = P/L$ and $p_1 + p_2 = P$, as $\alpha_1 = 1 - \alpha_2$. That is, $\alpha_i$ can take values from $\{0, \frac{1}{L}, \frac{2}{L}, \frac{3}{L}, \ldots, 1\}$. As client 2 is far from the BS (i.e., $c_1 < c_2$), to effectively use NOMA, $\alpha_2$ should be larger than $\alpha_1$ when applying NOMA, i.e., $\alpha_2 > 0.5$. Combining it with the previous condition $\alpha_2 > \frac{2^{R} - 1}{2^{R}}$, one can deduce that $\alpha_2$ can only take values from $\left\{0,\ \max\left\{\frac{1}{2} + \frac{1}{L},\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil \frac{1}{L}\right\},\ \max\left\{\frac{1}{2} + \frac{1}{L},\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil \frac{1}{L}\right\} + \frac{1}{L},\ \ldots,\ 1\right\}$.
Let $\alpha_2(t)$ denote the power allocation coefficient for client 2 in time slot $t$. Specifically, $\alpha_2(t) = 0$ or $\alpha_2(t) = 1$ indicates that the BS uses the OMA scheme, conducting orthogonal transmission to client 1 and client 2, respectively; otherwise, the BS uses the NOMA scheme, serving both clients with the amount of power $\alpha_2(t) P$ allocated to client 2 and $(1 - \alpha_2(t)) P$ to client 1.
Let $\pi$ denote the stationary transmission policy at the BS, which maps system states to the action space. Denoting $a_t$ as the action at time slot $t$, $a_t \in \left\{0,\ \max\left\{\left\lceil \frac{L}{2} \right\rceil + 1,\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil\right\},\ \ldots,\ L\right\}$ indicates that the BS allocates an amount $a_t p$ of power to client 2. If $a_t = 0$, the BS chooses the OMA scheme and only transmits information to client 1; if $a_t = L$, the BS chooses the OMA scheme and transmits information to client 2; otherwise, the BS chooses the NOMA scheme, with $a_t p$ allocated to client 2 and $P - a_t p$ allocated to client 1. Our design objective is to find the optimal policy to be adopted by the BS that can adaptively switch between the NOMA and OMA schemes to minimize the weighted sum of the expected AoI for both clients. The problem can be formally formulated as follows.
Problem 1.
$$\min_{\pi}\ \bar{\Delta}(\pi). \tag{9}$$
III. OPTIMAL AND NEAR-OPTIMAL POLICIES FOR TWO-CLIENT SYSTEM
In this section, we resolve Problem 1 by formulating it as an MDP problem and investigate
the age-optimal policy that minimizes the weighted sum of the expected AoI of both clients. By
analyzing the structural results of the optimal policy, we then devise a near-optimal policy with
lower computation complexity.
A. MDP Formulation
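We first recast Problem 1 into an MDP problem, described by a tuple $\{\mathcal{S}, \mathcal{A}, \mathcal{P}, r\}$, where
• State space $\mathcal{S} = \mathbb{Z}^{+} \times \mathbb{Z}^{+}$: The state in time slot $t$ is composed of the instantaneous AoI of both clients, $s_t \triangleq (\Delta_{1,t}, \Delta_{2,t})$.
• Action space $\mathcal{A} = \left\{0,\ \max\left\{\left\lceil \frac{L}{2} \right\rceil + 1,\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil\right\},\ \ldots,\ L\right\}$: the detailed description of action $a_t \in \mathcal{A}$ has been provided at the end of the previous section.
• Transition probabilities $\mathcal{P}$: $\mathbb{P}(s_{t+1} \mid s_t, a_t)$ is the probability of the transition from state $s_t$ to $s_{t+1}$ when taking action $a_t$. According to the outage probability of both clients using either NOMA or OMA given in Section II, we have the following transition probabilities,
$$\begin{aligned} \mathbb{P}\left((1, \Delta_2 + 1) \mid (\Delta_1, \Delta_2), a = 0\right) &= 1 - P_1^{O}, \\ \mathbb{P}\left((\Delta_1 + 1, \Delta_2 + 1) \mid (\Delta_1, \Delta_2), a = 0\right) &= P_1^{O}, \\ \mathbb{P}\left((\Delta_1 + 1, 1) \mid (\Delta_1, \Delta_2), a = L\right) &= 1 - P_2^{O}, \\ \mathbb{P}\left((\Delta_1 + 1, \Delta_2 + 1) \mid (\Delta_1, \Delta_2), a = L\right) &= P_2^{O}, \end{aligned} \tag{10}$$
and for $a = i$ with $i \ne 0, L$,
$$\begin{aligned} \mathbb{P}\left((1, \Delta_2 + 1) \mid (\Delta_1, \Delta_2), a = i\right) &= \left(1 - P_1^{N}(a)\right) P_2^{N}(a), \\ \mathbb{P}\left((\Delta_1 + 1, 1) \mid (\Delta_1, \Delta_2), a = i\right) &= \left(1 - P_2^{N}(a)\right) P_1^{N}(a), \\ \mathbb{P}\left((1, 1) \mid (\Delta_1, \Delta_2), a = i\right) &= \left(1 - P_1^{N}(a)\right)\left(1 - P_2^{N}(a)\right), \\ \mathbb{P}\left((\Delta_1 + 1, \Delta_2 + 1) \mid (\Delta_1, \Delta_2), a = i\right) &= P_1^{N}(a)\, P_2^{N}(a), \end{aligned} \tag{11}$$
where $P_1^{N}(a)$ and $P_2^{N}(a)$ are the outage probabilities of client 1 and client 2, respectively, using NOMA with $\alpha_1 = 1 - \frac{a}{L}$ and $\alpha_2 = \frac{a}{L}$. Note that in (10) and (11), the time index $t$ of the state $(\Delta_{1,t}, \Delta_{2,t})$ and action $a_t$ is omitted for brevity.
• $r : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is the one-stage reward function of state-action pairs, defined as $r(s_t, a_t) = w_1 \Delta_{1,t} + w_2 \Delta_{2,t}$.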
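The transition probabilities in (10) and (11) can be collected in a single routine by treating the OMA actions as the special cases in which the unserved client has outage probability 1. The following is a minimal sketch under that convention; the helper names `outage_pair` and `transition_distribution` are hypothetical, and the outage formulas are those of (5), (7) and (8).

```python
import math


def outage_pair(a, L, rate, d1, d2, tau, rho):
    """(P1, P2) for action a: a = 0 is OMA to client 1, a = L is OMA to client 2."""
    theta = 2.0 ** rate - 1.0
    if a == 0:
        return 1.0 - math.exp(-theta * d1 ** tau / rho), 1.0
    alpha2 = a / L
    alpha1 = 1.0 - alpha2
    margin = alpha2 - alpha1 * theta      # must be positive to decode s2
    if margin <= 0.0:
        return 1.0, 1.0
    p2 = 1.0 - math.exp(-theta * d2 ** tau / (rho * margin))
    if a == L:
        return 1.0, p2                    # margin = 1, so p2 equals the OMA value
    worst = max(1.0 / margin, 1.0 / alpha1)
    return 1.0 - math.exp(-theta * d1 ** tau * worst / rho), p2


def transition_distribution(state, a, L, rate, d1, d2, tau, rho):
    """Map each reachable next state to its probability, following (10)-(11)."""
    age1, age2 = state
    p1, p2 = outage_pair(a, L, rate, d1, d2, tau, rho)
    outcomes = {
        (1, age2 + 1): (1.0 - p1) * p2,          # only client 1 succeeds
        (age1 + 1, 1): p1 * (1.0 - p2),          # only client 2 succeeds
        (1, 1): (1.0 - p1) * (1.0 - p2),         # both succeed (NOMA only)
        (age1 + 1, age2 + 1): p1 * p2,           # both fail
    }
    # Drop outcomes that cannot occur under the chosen action.
    return {s: pr for s, pr in outcomes.items() if pr > 0.0}
```

With an OMA action only two next states remain, matching the two lines of (10) for that action, while a NOMA action yields the four outcomes of (11).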
Given any initial state $s_0$, the infinite-horizon average reward of any feasible policy $\pi$ can be expressed as
$$C(\pi, s_0) = \limsup_{T \to \infty} \frac{1}{T} \sum_{k=0}^{T} \mathbb{E}^{\pi}_{s_0}\left[r(s_k, a_k) \mid s_0\right]. \tag{12}$$
We are now ready to transform Problem 1 to the following MDP problem.
Problem 2.
$$\min_{\pi}\ C(\pi, s_0). \tag{13}$$
To proceed, we first investigate the existence of an optimal stationary and deterministic policy of Problem 2 and arrive at the following theorem.
Theorem 1. There exists a constant $J^{*}$, a bounded function $h(\Delta_1, \Delta_2) : \mathcal{S} \to \mathbb{R}$ and a stationary and deterministic policy $\pi^{*}$, satisfying the average reward optimality equation,
$$J^{*} + h(\Delta_1, \Delta_2) = \min_{a \in \mathcal{A}} \left(w_1 \Delta_1 + w_2 \Delta_2 + \mathbb{E}\left[h(\hat{\Delta}_1, \hat{\Delta}_2)\right]\right), \tag{14}$$
$\forall (\Delta_1, \Delta_2) \in \mathcal{S}$, where $\pi^{*}$ is the optimal policy, $J^{*}$ is the optimal average reward, and $(\hat{\Delta}_1, \hat{\Delta}_2)$ is the next state after $(\Delta_1, \Delta_2)$ taking action $a$.
Proof. See Appendix A.
According to Theorem 1, the optimal policy is stationary and deterministic, i.e., it is time-invariant and deterministically selects an action in each time slot with no randomization.
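An optimality equation of the form (14) is commonly solved numerically by relative value iteration. The sketch below is one such generic procedure, not necessarily the algorithm used by the authors: it assumes the AoI of each client is truncated at a hypothetical cap `cap` so that the state space is finite, fixes $(1, 1)$ as the reference state, and runs a fixed number of sweeps without a formal convergence check. All function and parameter names are illustrative.

```python
import math


def outage_pair(a, L, rate, d1, d2, tau, rho):
    """(P1, P2) for action a: a = 0 is OMA to client 1, a = L is OMA to client 2."""
    theta = 2.0 ** rate - 1.0
    if a == 0:
        return 1.0 - math.exp(-theta * d1 ** tau / rho), 1.0
    alpha2 = a / L
    alpha1 = 1.0 - alpha2
    margin = alpha2 - alpha1 * theta
    if margin <= 0.0:
        return 1.0, 1.0
    p2 = 1.0 - math.exp(-theta * d2 ** tau / (rho * margin))
    if a == L:
        return 1.0, p2
    worst = max(1.0 / margin, 1.0 / alpha1)
    return 1.0 - math.exp(-theta * d1 ** tau * worst / rho), p2


def relative_value_iteration(actions, L, w, rate, d1, d2, tau, rho,
                             cap=15, iters=200):
    """Approximate J* and a policy on the truncated grid {1, ..., cap}^2."""
    probs = {a: outage_pair(a, L, rate, d1, d2, tau, rho) for a in actions}
    states = [(x, y) for x in range(1, cap + 1) for y in range(1, cap + 1)]
    h = dict.fromkeys(states, 0.0)

    def q_value(x, y, a):
        p1, p2 = probs[a]
        # Truncation at `cap` is an assumption made to keep the grid finite.
        nx, ny = min(x + 1, cap), min(y + 1, cap)
        future = ((1 - p1) * p2 * h[(1, ny)] + p1 * (1 - p2) * h[(nx, 1)]
                  + (1 - p1) * (1 - p2) * h[(1, 1)] + p1 * p2 * h[(nx, ny)])
        return w[0] * x + w[1] * y + future

    J = 0.0
    for _ in range(iters):
        new_h = {(x, y): min(q_value(x, y, a) for a in actions)
                 for (x, y) in states}
        J = new_h[(1, 1)]                    # reference state pins h(1, 1) = 0
        h = {s: v - J for s, v in new_h.items()}
    policy = {(x, y): min(actions, key=lambda a: q_value(x, y, a))
              for (x, y) in states}
    return J, policy
```

A call such as `relative_value_iteration([0, 6, 7, 8, 9, 10], 10, (0.5, 0.5), 1.0, 1.0, 2.0, 2.0, 10.0)` returns an estimate of the optimal average cost together with a table mapping each truncated state to an action. The cap should be chosen large enough that the policy near the boundary does not affect the states of interest.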
B. Action Elimination
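In this subsection, we establish action elimination by analyzing the properties of the formulated MDP problem, which can reduce the action space of each state for lower computation complexity.
According to (7) and (8), and the fact that $\alpha_1 + \alpha_2 = 1$, the outage probability of client 2 using NOMA (i.e., $P_2^{N}$) is decreasing in $\alpha_2$, i.e., $P_2^{N}(a)$ is decreasing in action $a$ when $\max\left\{\left\lceil \frac{L}{2} \right\rceil + 1,\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil\right\} < a < L$. However, the outage probability of client 1 using NOMA (i.e., $P_1^{N}$) is decreasing in $\alpha_2$ when $\frac{2^{R} - 1}{2^{R}} < \alpha_2 < \frac{2^{R}}{2^{R} + 1}$ and is increasing in $\alpha_2$ when $\frac{2^{R}}{2^{R} + 1} < \alpha_2 < 1$. The turning point $\alpha_2 = \frac{2^{R}}{2^{R} + 1}$ is where the two terms inside the maximum in (8) coincide, i.e., $\alpha_2 - \alpha_1 (2^{R} - 1) = \alpha_1$. That is, $P_1^{N}(a)$ is decreasing in $a$ when $a \in \left\{\max\left\{\left\lceil \frac{L}{2} \right\rceil + 1,\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil\right\},\ \ldots,\ \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor\right\}$ and increasing in $a$ when $a \in \left\{\left\lceil \frac{2^{R} L}{2^{R} + 1} \right\rceil,\ \left\lceil \frac{2^{R} L}{2^{R} + 1} \right\rceil + 1,\ \ldots,\ L - 1\right\}$. As such, the action $a = \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor$ performs better in reducing the AoI of both clients, with a lower outage probability for both, compared with the other actions in $\left\{\max\left\{\left\lceil \frac{L}{2} \right\rceil + 1,\ \left\lceil \frac{(2^{R} - 1) L}{2^{R}} \right\rceil\right\},\ \ldots,\ \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor\right\}$. Thus, the action space can be reduced to $a \in \left\{0,\ \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor,\ \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor + 1,\ \ldots,\ L\right\}$.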
C. Structural Results on Optimal Policy
In this subsection, we derive two structural results of the optimal policy that offer an effective way to reduce the offline computation complexity and the online implementation hardware requirement.
Theorem 2. The optimal policy $\pi^{*}$ has a switching-type structure. That is, denoting $c$ and $d$ as any actions from the action space $\left\{0,\ \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor,\ \left\lfloor \frac{2^{R} L}{2^{R} + 1} \right\rfloor + 1,\ \ldots,\ L\right\}$,
• If $\pi^{*}((\Delta_1, \Delta_2)) = c$, then $\pi^{*}((\Delta_1, \Delta_2 + z)) = d$, where $z$ is any positive integer and $d \ge c$,
• If $\pi^{*}((\Delta_1, \Delta_2)) = c$, then $\pi^{*}((\Delta_1 + z, \Delta_2)) = d$, where $z$ is any positive integer and $d \le c$.
Proof. See Appendix B.
Given the structure of the optimal policy, only the decision switching boundary is needed for implementation, rather than storing each state-action pair in the optimal policy, which significantly reduces the memory required of the hardware. In addition, based on the structure, a special algorithm can be developed as in [5, Algorithm 1] to reduce the complexity of calculating the optimal policy.
D. Near-optimal Policy
In this subsection, we propose a near-optimal policy with lower computation complexity compared with that of the optimal MDP policy. Inspired by the max-weight policy in [13], the proposed suboptimal policy makes use of the transition probability of the underlying MDP and only maximizes the weighted sum of the expected AoI drop within each time slot, i.e., the weighted sum of the expected difference between the age of the current state and the possible age of the next state. According to (10) and (11), given the current state $s = (\Delta_1, \Delta_2)$, the expected AoI drop, denoted by $\mathbb{E}[\eta(s, a)]$, can be expressed as
$$\mathbb{E}[\eta(s, a)] = \begin{cases} w_1 (1 - P_1^{O}) \Delta_1 - 1, & \text{if } a = 0; \\ w_2 (1 - P_2^{O}) \Delta_2 - 1, & \text{if } a = L; \\ w_1 (1 - P_1^{N}(a)) \Delta_1 + w_2 (1 - P_2^{N}(a)) \Delta_2 - 1, & \text{otherwise.} \end{cases} \tag{15}$$
Here the constant $-1$ arises because the age of every client grows by one slot regardless of the transmission outcome and $w_1 + w_2 = 1$. Then, the action of state $s$ in the proposed suboptimal policy $\bar{\pi}$ can be given by
$$\bar{\pi}(s) = \arg\max_{a}\ \mathbb{E}[\eta(s, a)]. \tag{16}$$
The suboptimal policy is simple and easy to implement. Moreover, as we show via the numerical results in Section V, the suboptimal policy can achieve near-optimal performance. In addition, the suboptimal policy can be readily extended to the continuous power scenario, i.e., in each time slot, finding the optimal power allocated to each client to maximize the weighted sum of the expected AoI drop, where $P_1^{N}(a)$ and $P_2^{N}(a)$ in (15) are replaced by the outage probability of each client using NOMA with continuous power allocated to client 2.
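The one-step rule in (15) and (16) amounts to a small search over the action set. The sketch below is illustrative only; it treats the OMA actions as the cases in which the unserved client has outage probability 1, which reproduces the first two branches of (15), and the names `outage_pair`, `expected_aoi_drop` and `near_optimal_action` are hypothetical.

```python
import math


def outage_pair(a, L, rate, d1, d2, tau, rho):
    """(P1, P2) for action a: a = 0 is OMA to client 1, a = L is OMA to client 2."""
    theta = 2.0 ** rate - 1.0
    if a == 0:
        return 1.0 - math.exp(-theta * d1 ** tau / rho), 1.0
    alpha2 = a / L
    alpha1 = 1.0 - alpha2
    margin = alpha2 - alpha1 * theta
    if margin <= 0.0:
        return 1.0, 1.0
    p2 = 1.0 - math.exp(-theta * d2 ** tau / (rho * margin))
    if a == L:
        return 1.0, p2
    worst = max(1.0 / margin, 1.0 / alpha1)
    return 1.0 - math.exp(-theta * d1 ** tau * worst / rho), p2


def expected_aoi_drop(state, a, w, L, rate, d1, d2, tau, rho):
    """Weighted expected AoI drop of (15); assumes w[0] + w[1] = 1."""
    p1, p2 = outage_pair(a, L, rate, d1, d2, tau, rho)
    return w[0] * (1.0 - p1) * state[0] + w[1] * (1.0 - p2) * state[1] - 1.0


def near_optimal_action(state, actions, w, L, rate, d1, d2, tau, rho):
    """Greedy rule (16): pick the action with the largest expected AoI drop."""
    return max(actions,
               key=lambda a: expected_aoi_drop(state, a, w, L, rate, d1, d2, tau, rho))
```

Evaluating `near_optimal_action` over a grid of states gives a quick picture of where the greedy rule prefers NOMA and where it falls back to OMA, at a cost of one pass over the action set per state.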
IV. EXTENSION TO MULTIPLE CLIENTS N > 2
Recall that the BS aims to deliver status updates to all clients in a timely manner. To that end, the BS needs to carefully decide the transmission power allocated to each client as explained in Section III. However, since the state space explodes exponentially as the number of clients and the power discretization levels increase, the MDP method elaborated in Section III is no longer computationally tractable for multi-client scenarios.
In this section, we extend the near-optimal policy proposed in Section III-D to the general case with a BS delivering timely status updates to $N$ clients ($N > 2$) in a slotted manner using the adaptive NOMA/OMA principle. At the beginning of each time slot, the BS needs to schedule transmission to clients. That is, the BS decides which client(s) to transmit to and allocates the transmission power to them. At the end of each time slot, if client $i$ has received its packet successfully from the BS, it will send an ACK to the BS. The observation at the $i$th client in time slot $t$ is given by
$$y_i(t) = h_i(t) \sum_{j=1}^{N} \sqrt{p_j(t)}\, s_j(t) + n_i(t), \tag{17}$$
where $s_j$ denotes the message from the BS to client $j$ and $h_i$ denotes the channel coefficient between the BS and client $i$ as in (4). Without loss of generality, we consider the sorted distances $c_1 > c_2 > \ldots > c_N$, i.e., $\mathbb{E}[|h_1|^2] < \mathbb{E}[|h_2|^2] < \ldots < \mathbb{E}[|h_N|^2]$. Variable $p_j$ is the transmission power allocated to the message intended for client $j$, which satisfies the power limit $\bar{p}$, i.e., $\sum_{i=1}^{N} p_i \le \bar{p}$, and $n_i \sim \mathcal{CN}(0, \sigma_i^2)$ is the complex additive Gaussian noise at client $i$. For simplicity, we assume the variance of $n_i$ is identical for all clients, i.e., $\sigma_i^2 = \sigma^2$, $\forall i$.
Denote by $\mathcal{N}$ the set of all clients in the system, i.e., $\mathcal{N} = \{1, 2, \ldots, N\}$. Any subset $\mathcal{K} \subseteq \mathcal{N}$ denotes a possible set of clients to be served in each time slot. According to the NOMA principle, in the subset of clients selected to be served, a client with a smaller distance is assigned a larger decoding order index [29], [30]. Each selected client employs the successive interference cancellation (SIC) technique to first decode the messages for clients with a smaller decoding order index in the selected client set, and to remove the inter-user interference if the decoding is correct. Denote $\lambda_i$ as the indicator that equals 1 when client $i$ is selected to be served, and equals 0 otherwise. Thus, if $K$ clients are selected to be served, then $\sum_{i=1}^{N} \lambda_i = K$. Let $m(k)$ denote the original client index among the $K$ selected clients whose decoding order is $k$, i.e., $\lambda_{m(k)} = 1$, $\forall k \in \{1, 2, \ldots, K\}$. Here $m(\cdot)$ is a one-to-one mapping from the set $\{1, 2, \ldots, K\}$ to the set $\{1, 2, \ldots, N\}$, where $K \le N$. The sequence $\{m(k)\}_{k=1,2,\ldots,K}$ consists of the set of clients selected for receiving status updates. Besides, according to the decoding order of NOMA, we have $m(1) < m(2) < \ldots < m(K)$.
Given the set of clients $\{m(k)\}_{k=1,2,\dots,K}$ to be served, denote by $R^{m(j)}_{m(i)}$ the rate at which client $m(j)$
detects client $m(i)$'s message. We consider $j \ge i$, indicating $m(j) \ge m(i)$. To correctly detect
client $m(i)$'s message, client $m(j)$ should first successfully remove the interference from the clients
in $\{m(k)\}_{k=1,2,\dots,K}$ whose decoding order index is smaller than that of $m(i)$. Thus, the expression of
$R^{m(j)}_{m(i)}$ is given by [28]–[30]
$$R^{m(j)}_{m(i)} = \log_2\left(1 + \frac{|h_{m(j)}|^2\, p_{m(i)}}{|h_{m(j)}|^2 \sum_{k=i+1}^{K} p_{m(k)} + \sigma^2}\right). \tag{18}$$
As the BS does not have perfect knowledge of the CSI, outage may occur in the considered system.
We say that an outage occurs at client $m(j)$ if it cannot detect its own message or the message of any client $m(i)$ with
a smaller decoding index, $m(j) > m(i)$, in the selected client set
[28], [31]. Assuming that the BS transmits one message to each client at the same fixed target
rate $R$, the outage probability of client $m(j)$ can be expressed as [29], [30]
$$P^o_{m(j)} = 1 - \Pr\left(R^{m(j)}_{m(1)} \ge R, \dots, R^{m(j)}_{m(j)} \ge R\right) = 1 - \exp\left(-d^{\tau}_{m(j)} \max_{k=1,2,\dots,j} \left\{ \frac{(2^R - 1)\sigma^2}{p_{m(k)} - (2^R - 1) \sum_{i=k+1}^{K} p_{m(i)}} \right\}\right). \tag{19}$$
We can see from (19) that if $p_{m(k)} - (2^R - 1) \sum_{i=k+1}^{K} p_{m(i)} \le 0$, the outage probability of client
$m(j)$ is always 1. Thus, for any client $m(k)$ selected to be served, i.e., $p_{m(k)} \neq 0$, the
following condition must be satisfied:
$$p_{m(k)} > (2^R - 1) \sum_{i=k+1}^{K} p_{m(i)}. \tag{20}$$
Otherwise, an outage always occurs and the allocated power is wasted. Moreover, if client
$i$ is not served, i.e., $i \notin \{m(k)\}_{k=1,2,\dots,K}$ and $p_i = 0$, its outage probability is 1; otherwise, its
outage probability is smaller than 1. Mathematically, we have
$$E[v_i(t)] = \begin{cases} 0, & i \notin \{m(k)\}_{k=1,2,\dots,K}, \\ 1 - P^o_i, & i \in \{m(k)\}_{k=1,2,\dots,K}. \end{cases} \tag{21}$$
Recall that $v_i(t)$ is the indicator that equals 1 when client $i$ successfully receives its status
update from the BS in time slot $t$. Let $p(t) = \{p_1(t), p_2(t), \dots, p_N(t)\}$ denote the
transmission power allocated to each client, satisfying $\sum_{i=1}^{N} p_i(t) \le \bar{p}$. Given $\{m(k)\}_{k=1,2,\dots,K}$, we
have $\sum_{i=1}^{K} p_{m(i)}(t) \le \bar{p}$ and $p_i(t) = 0$, $\forall i \notin \{m(k)\}_{k=1,2,\dots,K}$.
Note that the special case $K = 1$ means that only one client is served, i.e., client
$m(1)$ is served using the OMA scheme. The corresponding outage probability becomes
$1 - \exp\left(-d^{\tau}_{m(1)} \frac{(2^R - 1)\sigma^2}{p_{m(1)}}\right)$, as in (5).
We now extend our near-optimal policy (i.e., the problem in (16)) to the multiple-client scenario
by formulating the following power allocation problem.
Problem 3.
$$\max_{p(t)} \; \sum_{i=1}^{N} \left(1 - P^o_i(p(t))\right) w_i \Delta_i(t) \quad \text{s.t.} \quad (20), \;\; \sum_{i=1}^{N} p_i(t) \le \bar{p}, \;\; p_i(t) \ge 0. \tag{22}$$
We note that in the above optimization problem, the instantaneous AoI of all clients in time
slot $t$ affects the power allocated to each client. Clients with a smaller AoI are less likely to
be served, as the resulting AoI drop is insignificant.
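To make the objective of Problem 3 concrete, the following minimal Python sketch evaluates the outage probabilities in (19)–(21) and the weighted expected AoI drop in (22) for a given power vector. The function names are hypothetical, and the sketch assumes that clients are indexed farthest first (so that a larger index means a larger decoding order) and that the exponential form in (19) holds as stated.

```python
import math

def outage_probabilities(p, d, tau, R, sigma2):
    """Outage probability of every client under allocation p, following (19)-(21).

    Assumption: d[0] > d[1] > ... (farthest first), so the served clients,
    taken in increasing index order, are already in decoding order.
    """
    r = 2.0 ** R - 1.0
    served = [i for i, power in enumerate(p) if power > 0.0]
    outage = [1.0] * len(p)              # unserved clients are always in outage
    worst = 0.0                          # running max over k = 1..j in (19)
    for pos, i in enumerate(served):
        interference = sum(p[n] for n in served[pos + 1:])
        margin = p[i] - r * interference
        if margin <= 0.0:                # condition (20) is violated
            worst = math.inf
        else:
            worst = max(worst, r * sigma2 / margin)
        if math.isfinite(worst):
            outage[i] = 1.0 - math.exp(-(d[i] ** tau) * worst)
    return outage

def expected_aoi_drop(p, ages, weights, d, tau, R, sigma2):
    """Objective of Problem 3: sum_i (1 - P_i^o) * w_i * Delta_i."""
    outage = outage_probabilities(p, d, tau, R, sigma2)
    return sum((1.0 - po) * w * age for po, w, age in zip(outage, weights, ages))
```

With a single nonzero entry in `p`, the sketch reduces to the OMA expression above; an allocation that violates (20) for some served client drives the outage probability of that client and of every client decoded after it to 1.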
A. Effective power allocation
In this subsection, we solve Problem 3 to obtain an effective power allocation that minimizes
the weighted sum of the expected AoI, in two steps. Step 1: design an optimal power allocation
scheme to serve a fixed number of clients; that is, given $K$, find the optimal $\{m(k)\}_{k=1,2,\dots,K}$ and
$p_{m(1)}, p_{m(2)}, \dots, p_{m(K)}$. Step 2: choose the optimal $K \in \{1, 2, \dots, N\}$ that achieves the maximum
objective value. The two steps are described in detail below.
1) Step 1: Optimal power allocation for transmission to a fixed number $K$ of clients:
Given $K$, i.e., the number of clients to serve, the BS should decide which group of clients
to serve, i.e., $\{m(k)\}_{k=1,2,\dots,K}$, and the power allocated to them, i.e., $p_{m(1)}, p_{m(2)}, \dots, p_{m(K)}$.
Recall that the power allocated to the unselected clients is 0.
As in [29, Eq. (15)], we rewrite the power constraint under condition (20) in the following form,
which is easier to work with:
$$\sum_{k=1}^{K} \hat{p}_{m(k)} (r + 1)^{k-1} \le \bar{p}, \tag{23}$$
where $r = 2^R - 1$ and
$$\hat{p}_{m(k)} = p_{m(k)} - r \sum_{i=k+1}^{K} p_{m(i)}. \tag{24}$$
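The change of variables in (24) and the budget expression in (23) can be checked numerically. The sketch below, with hypothetical helper names, maps the powers of the served clients (listed in decoding order) to the transformed variables, inverts the map by back-substitution, and evaluates the left-hand side of (23), which equals the total transmitted power.

```python
def to_transformed(p_served, r):
    """Apply (24): p_hat[k] = p[k] - r * sum_{i>k} p[i], clients in decoding order."""
    p_hat = []
    tail = 0.0                      # sum of p[i] over i > k, built from the back
    for power in reversed(p_served):
        p_hat.append(power - r * tail)
        tail += power
    return p_hat[::-1]

def from_transformed(p_hat, r):
    """Invert (24) by back-substitution, starting from the last decoding index."""
    p = []
    tail = 0.0
    for value in reversed(p_hat):
        power = value + r * tail
        p.append(power)
        tail += power
    return p[::-1]

def weighted_budget(p_hat, r):
    """Left-hand side of (23): sum_k (r + 1)^(k-1) * p_hat[k]."""
    return sum((r + 1.0) ** k * value for k, value in enumerate(p_hat))
```

For example, with $r = 1$ and powers (1.5, 0.4, 0.1), the transformed variables are (1.0, 0.3, 0.1), all positive as required by (20), and the weighted budget is 1.0 + 2(0.3) + 4(0.1) = 2.0, the sum of the original powers.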
The outage probability of the selected client $m(k)$ can be expressed as
$$P^o_{m(k)} = 1 - \Pr\left(R^{m(k)}_{m(1)} \ge R, \dots, R^{m(k)}_{m(k)} \ge R\right) = 1 - \exp\left(-d^{\tau}_{m(k)} r\sigma^2 \max_{t=1,2,\dots,k} \frac{1}{\hat{p}_{m(t)}}\right), \quad k \in \{1, 2, \dots, K\}. \tag{25}$$
The outage probability of every unselected client is always equal to 1. Recall that $c_1 > c_2 >
\dots > c_N$, which implies $d^{\tau}_1 > d^{\tau}_2 > \dots > d^{\tau}_N$. We thus have $d^{\tau}_{m(1)} > d^{\tau}_{m(2)} > \dots > d^{\tau}_{m(K)}$. Note that
only the selected clients may experience an AoI drop, while the AoI of each unselected client increases by
one; therefore, the one-step weighted sum of the expected AoI drop of the network is
that of the selected clients. Hence, for a given $K$, Problem 3 can be rewritten as
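Problem 4.
$$\max_{p(t)} \; \sum_{k=1}^{K} \left(1 - P^o_{m(k)}(p(t))\right) w_{m(k)} \Delta_{m(k)}(t) \quad \text{s.t.} \quad (20), \;\; \sum_{k=1}^{K} p_{m(k)}(t) \le \bar{p}, \;\; p_i(t) = 0, \; \forall i \notin \{m(k)\}_{k=1,2,\dots,K}. \tag{26}$$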
To further simplify the above problem, we apply the variable transformation in (23) and (24),
and Problem 4 can be transformed into the following equivalent form:
Problem 5.
$$\max_{\hat{p}(t), \{m(k)\}} \; \sum_{k=1}^{K} \left(1 - P^o_{m(k)}(\hat{p}(t))\right) w_{m(k)} \Delta_{m(k)}(t) \quad \text{s.t.} \quad (23), \;\; \hat{p}(t) = (\hat{p}_{m(1)}, \hat{p}_{m(2)}, \dots, \hat{p}_{m(K)}), \;\; \hat{p}_{m(k)} > 0, \; \forall k \in \{1, 2, \dots, K\}. \tag{27}$$
This problem consists of two parts: 1) selecting which $K$ clients to serve, i.e., $\{m(k)\}$; and 2)
determining the transformed power variables of these $K$ clients, i.e., $\hat{p}(t)$, given $\Delta(t) = \{\Delta_1(t), \Delta_2(t), \dots, \Delta_N(t)\}$,
$d^{\tau}_1 > d^{\tau}_2 > \dots > d^{\tau}_N$, $r$ and $\sigma^2$.
Suppose that $\{m(k)\}_{k=1,2,\dots,K}$ is known. We then solve Problem 5 as follows (the time
index $t$ is dropped hereafter for notational simplicity):
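Problem 6.
$$\max_{\hat{p}} \; \sum_{k=1}^{K} \exp\left(-d^{\tau}_{m(k)} r\sigma^2 \max\left\{\frac{1}{\hat{p}_{m(1)}}, \dots, \frac{1}{\hat{p}_{m(k)}}\right\}\right) w_{m(k)} \Delta_{m(k)} \tag{28a}$$
$$\text{s.t.} \quad \sum_{k=1}^{K} (r + 1)^{k-1} \hat{p}_{m(k)} \le \bar{p}, \tag{28b}$$
$$\hat{p}_{m(k)} > 0, \quad k = 1, \dots, K. \tag{28c}$$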
To solve Problem 6, we first establish the following lemma.
Lemma 1. Adding the constraint
$$\hat{p}_{m(1)} \ge \hat{p}_{m(2)} \ge \dots \ge \hat{p}_{m(K)}$$
to Problem 6 does not change the optimal value of its objective (28a).
Proof. See Appendix C.
By Lemma 1, we focus on the following problem, which has the same optimal objective value as
Problem 6 and can be solved in a simple and tractable way.
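Problem 7.
$$\max_{\hat{p}} \; \sum_{k=1}^{K} w_{m(k)} \Delta_{m(k)} \exp\left(-\frac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}_{m(k)}}\right) \tag{29a}$$
$$\text{s.t.} \quad \sum_{k=1}^{K} (r + 1)^{k-1} \hat{p}_{m(k)} \le \bar{p}, \tag{29b}$$
$$\hat{p}_{m(1)} \ge \hat{p}_{m(2)} \ge \dots \ge \hat{p}_{m(K)}, \tag{29c}$$
$$\hat{p}_{m(k)} > 0, \quad k = 1, \dots, K. \tag{29d}$$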
To proceed, we first investigate the properties of the objective function (29a) in Problem 7.
We define
$$G_k(\hat{p}_{m(k)}) := w_{m(k)} \Delta_{m(k)} \exp\left(-\frac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}_{m(k)}}\right).$$
The following properties hold for the functions $G_k(\cdot)$, $k = 1, \dots, K$:
• $\lim_{\hat{p}_{m(k)} \to 0^+} G_k(\hat{p}_{m(k)}) = 0$. For convenience, we define $G_k(0) = 0$;
• $\lim_{\hat{p}_{m(k)} \to +\infty} G_k(\hat{p}_{m(k)}) = w_{m(k)} \Delta_{m(k)}$;
• $G_k(\cdot)$ is strictly monotonically increasing on $(0, +\infty)$, which can be verified by checking
$G'_k(\cdot)$;
• $G_k(\cdot)$ is strictly convex on $[0, d^{\tau}_{m(k)} r\sigma^2 / 2)$ and strictly concave on $[d^{\tau}_{m(k)} r\sigma^2 / 2, +\infty)$, which can
be verified by checking $G''_k(\cdot)$.
Figure 2: Understanding of the convex approximation.
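Inspired by the properties above, we propose a convex upper approximation of $G_k(\cdot)$ as
follows. We find a constant $\tilde{p}_{m(k)} > 0$ for each $k = 1, \dots, K$, and replace the segment of
$G_k(\cdot)$ on $[0, \tilde{p}_{m(k)}]$ by the straight line segment connecting the two points $(0, G_k(0) = 0)$ and
$(\tilde{p}_{m(k)}, G_k(\tilde{p}_{m(k)}))$. At the same time, the straight line segment is tangent to $G_k(\cdot)$ at
the point $(\tilde{p}_{m(k)}, G_k(\tilde{p}_{m(k)}))$. Therefore, $\tilde{p}_{m(k)}$ can be calculated from
$$\frac{G_k(\tilde{p}_{m(k)}) - G_k(0)}{\tilde{p}_{m(k)} - 0} = G'_k(\tilde{p}_{m(k)}),$$
which leads to $\tilde{p}_{m(k)} = d^{\tau}_{m(k)} r\sigma^2$. Hence, a convex upper approximation of $G_k(\cdot)$ is
$$\tilde{G}_k(\hat{p}_{m(k)}) := \begin{cases} \dfrac{w_{m(k)} \Delta_{m(k)} e^{-1}}{d^{\tau}_{m(k)} r\sigma^2}\, \hat{p}_{m(k)}, & 0 \le \hat{p}_{m(k)} < d^{\tau}_{m(k)} r\sigma^2, \\[2mm] w_{m(k)} \Delta_{m(k)} \exp\left(-\dfrac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}_{m(k)}}\right), & \hat{p}_{m(k)} \ge d^{\tau}_{m(k)} r\sigma^2. \end{cases}$$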
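As an illustration of the piecewise construction, the following Python sketch implements $G_k$ and its upper approximation. The names `g_exact` and `g_approx` are hypothetical; the argument `a` stands for the constant $d^{\tau}_{m(k)} r\sigma^2$ and `gain` for $w_{m(k)} \Delta_{m(k)}$.

```python
import math

def g_exact(p_hat, a, gain):
    """G_k(p_hat) = gain * exp(-a / p_hat), with the convention G_k(0) = 0."""
    return 0.0 if p_hat <= 0.0 else gain * math.exp(-a / p_hat)

def g_approx(p_hat, a, gain):
    """Upper approximation: the line through the origin that is tangent to G_k
    at p_hat = a on [0, a), and G_k itself on [a, +inf)."""
    if p_hat < a:
        return gain * math.exp(-1.0) * p_hat / a
    return g_exact(p_hat, a, gain)
```

At `p_hat = a` both branches give `gain / e`, so the approximation is continuous, and on `[0, a)` the line lies on or above the original curve.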
For ease of understanding, we illustrate an example of the adopted convex approximation
in Fig. 2. We can then solve the following convex problem as an approximation of Problem 7:
Problem 8.
$$\max_{\hat{p}} \; \sum_{k=1}^{K} \tilde{G}_k(\hat{p}_{m(k)}) \tag{30a}$$
$$\text{s.t.} \quad \sum_{k=1}^{K} (r + 1)^{k-1} \hat{p}_{m(k)} \le \bar{p}, \tag{30b}$$
$$\hat{p}_{m(1)} \ge \hat{p}_{m(2)} \ge \dots \ge \hat{p}_{m(K)}, \tag{30c}$$
$$\hat{p}_{m(k)} \ge 0, \quad k = 1, \dots, K. \tag{30d}$$
Let $\hat{p}^o = (\hat{p}^o_{m(1)}, \dots, \hat{p}^o_{m(K)})$ be an optimal solution obtained by solving Problem 8, and
denote the optimal objective value of Problem 8 by $\tilde{U}^o = \tilde{U}(\hat{p}^o) := \sum_{k=1}^{K} \tilde{G}_k(\hat{p}^o_{m(k)})$. Note that
$\hat{p}^o$ is also a feasible solution to Problem 7. Moreover, denote the objective value of Problem 7 at
$\hat{p}^o$ by $U^o = U(\hat{p}^o) = \sum_{k=1}^{K} G_k(\hat{p}^o_{m(k)})$. Then the optimal objective value of Problem 7, denoted
by $U^*$, is bounded as $U^o \le U^* \le \tilde{U}^o$. The following corollary provides an upper bound on the
suboptimality gap $U^* - U^o$ for Problem 7.
Corollary 1. The gap between the optimal objective value of Problem 7 and that of Problem 8
is bounded by $e^{-2} \sum_{k=1}^{K} w_{m(k)} \Delta_{m(k)}$. Mathematically, $U^* - U^o \le e^{-2} \sum_{k=1}^{K} w_{m(k)} \Delta_{m(k)}$.
Proof. See Appendix D.
We realize that it could be difficult to derive the closed-form solution to both Problem 7
and Problem 8. However, compared to Problem 7, Problem 8 can be solved efficiently via any
convex optimization solver. Besides, Corollary 1 offers the upper bound of the suboptimality
gap between Problem 8 and Problem 7.
Moreover, for a fixed total number $N$ of clients and a fixed number $K$ of clients to be served,
there are in total $C_N^K$ possible sequences $\{m(k)\}_{k=1,2,\dots,K}$. By traversing all these combinations and
solving Problem 8 for each of them, we can find the best set of $K$ clients to be served, $\mathcal{K} =
\{m(k)\}_{k=1,2,\dots,K}$. It is worth emphasizing that we compare the combinations by substituting
their solutions into the original objective (29a) rather than the approximate objective (30a), and then select the one with the maximum objective value.
2) Step 2: Optimal number of clients to be served: By comparing the best performance
for every $K \in \{1, 2, \dots, N\}$, we can find the optimal value $K^*$, the corresponding set of clients to be
served, $\mathcal{K}^* = \{m(k)\}_{k=1,2,\dots,K^*}$, and the corresponding values $\hat{p}_{m(1)}, \hat{p}_{m(2)}, \hat{p}_{m(3)}, \dots, \hat{p}_{m(K^*)}$. Again, the candidates
for all $K \in \{1, 2, \dots, N\}$ are compared by substituting them into the objective of Problem 3. Then, according
to the relationship between $\{p_{m(k)}\}$ and $\{\hat{p}_{m(k)}\}$ in (24), we can convert $\{\hat{p}_{m(k)}\}$ into the power allocated
to each client, obtaining $\{p_{m(k)}\}$ and $p_i = 0$ if $i \notin \mathcal{K}^*$.
To summarize our method, the pseudocode of the overall algorithm for solving Problem 3
is given in Algorithm 1.
Algorithm 1 Calculate power allocated to each client
Input: $\Delta(t) = \{\Delta_1(t), \Delta_2(t), \dots, \Delta_N(t)\}$, $(d_1, d_2, \dots, d_N)$, $r$, $\tau$ and $\sigma^2$.
for $K = 1$ to $N$ do
    $\eta_K = 0$;
    for $j = 1$ to $C_N^K$ do
        $\{m(k)\}_{k=1,2,\dots,K} = \{m_j(k)\}_{k=1,2,\dots,K}$;    (the $j$th subset of $\mathcal{N}$ with $K$ clients)
        $\{\hat{p}_{m(k)}\}$ := solution to Problem 8;    (solve Problem 8 with a convex optimization tool)
        if $\eta_K < \sum_{k=1}^{K} G_k(\hat{p}_{m(k)})$ then
            $\eta_K = \sum_{k=1}^{K} G_k(\hat{p}_{m(k)})$;
            $\{m^*_K(k)\}_{k=1,2,\dots,K} = \{m(k)\}_{k=1,2,\dots,K}$;
            $\hat{p}^*_K = \{\hat{p}_{m(k)}\}_{k=1,2,\dots,K}$;
        end if
    end for
end for
$K^* = \arg\max_{K=1,2,\dots,N} \eta_K$;
$\mathcal{K}^* = \{m^*_{K^*}(k)\}_{k=1,2,\dots,K^*}$;    (the set of served clients)
convert $\hat{p}^*_{K^*}$ to $p^*_{K^*}$ using (24);    (power allocated to the clients in $\mathcal{K}^*$)
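The enumeration in Algorithm 1 can be sketched in a few lines of Python. This is only an outline under stated assumptions: the function names are hypothetical, clients are indexed farthest first, and the Problem 8 solver is passed in as a callable. The default `equal_split` is a simple feasible placeholder (equal transformed power on every served client), not the convex optimization step used in the paper.

```python
import math
from itertools import combinations

def equal_split(subset, a, gain, r, p_bar):
    """Placeholder for the Problem 8 solver (an assumption, not the paper's
    solver): equal transformed power, which satisfies (30b)-(30d)."""
    total = sum((r + 1.0) ** k for k in range(len(subset)))
    return [p_bar / total] * len(subset)

def allocate_power(ages, weights, d, tau, R, sigma2, p_bar, solver=equal_split):
    """Enumerate every K and every K-subset, keep the best value of (29a),
    and return (served clients, power vector p)."""
    n = len(ages)
    r = 2.0 ** R - 1.0
    a = [(d[i] ** tau) * r * sigma2 for i in range(n)]
    gain = [weights[i] * ages[i] for i in range(n)]
    best_value, best_subset, best_p_hat = -1.0, (), []
    for K in range(1, n + 1):
        # combinations() yields increasing indices, i.e. m(1) < ... < m(K)
        for subset in combinations(range(n), K):
            p_hat = solver(subset, a, gain, r, p_bar)
            value = sum(gain[i] * math.exp(-a[i] / ph) if ph > 0.0 else 0.0
                        for i, ph in zip(subset, p_hat))
            if value > best_value:
                best_value, best_subset, best_p_hat = value, subset, p_hat
    # invert (24) by back-substitution to recover the physical powers
    p = [0.0] * n
    tail = 0.0
    for i, ph in zip(reversed(best_subset), reversed(best_p_hat)):
        p[i] = ph + r * tail
        tail += p[i]
    return best_subset, p
```

Any solver that returns a point feasible for (30b)–(30d) can be substituted for the placeholder; the enumeration itself visits $2^N - 1$ subsets, which is why it is practical only for a small number of clients.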
V. NUMERICAL RESULTS AND DISCUSSIONS
In this section, simulation results are provided to evaluate the effectiveness of the proposed
adaptive NOMA/OMA scheme for both two-client and multi-client scenarios.
A. Two-client scenario
This subsection provides numerical results to verify the analytical results for the two-client
scenario presented in Section III. We set path loss exponent τ= 2 and the target data rate R= 1
in all simulations. The SNR in this subsection refers to the transmission SNR ρ.
Figure 3: Age-optimal policy and suboptimal policy: (a) MDP optimal policy; (b) suboptimal policy. Each point represents a state $s = (\Delta_1, \Delta_2)$.
The colored area indicates the action for each state, i.e., $a = 0$ for states in the blue area, $a = 7$ for
states in the orange area, $a = 8$ for states in the purple area, $a = 9$ for states in the green area,
and $a = 10$ for states in the red area, where $L = 10$ and $\mathcal{A} = \{0, 6, 7, 8, 9, 10\}$.
Figure 4: The performance comparison of different policies versus SNR for the two-client
scenario with $w_1 = w_2 = 0.5$. Curves: optimal adaptive NOMA/OMA, optimal NOMA, suboptimal adaptive NOMA/OMA, and optimal OMA, for $d_1 = 2, d_2 = 4$ and for $d_1 = 3, d_2 = 6$.
We follow [32] and apply the Relative Value Iteration (RVI) method on a truncated finite state space
($\Delta_i \le 100$, $\forall i$) to approximate the countably infinite state space. The optimal policy and the
suboptimal policy are illustrated in Fig. 3, where the SNR is 18 dB, the normalized distances of the two
clients are $d_1 = 2$ and $d_2 = 4$, and the weights of the two clients are $w_1 = w_2 = 0.5$. We
can observe the switching structure of the optimal policy, which verifies Theorem 2. Moreover, we
can see that the proposed suboptimal policy is similar to the optimal policy.
Fig. 4 compares the weighted sum of the expected AoI of the two clients under four policies: the optimal
policy using the adaptive NOMA/OMA scheme (optimal adaptive NOMA/OMA), the policy
that always uses NOMA for transmission (optimal NOMA, with $a \in \{\max\{\lceil L/2 \rceil + 1, \lceil (2^R - 1)L / 2^R \rceil\}, \dots, L - 1\}$),
the proposed suboptimal policy, and the optimal OMA policy in which the
BS adaptively selects one client to serve (optimal OMA, with $a \in \{0, L\}$).
Two cases are considered: 1) $d_1 = 2$ and $d_2 = 4$; 2) $d_1 = 3$ and $d_2 = 6$. The remaining system
parameters are the same as in Fig. 3. We run each simulation for $10^6$ time slots
at each transmission SNR. We can see from Fig. 4 that the proposed suboptimal policy
achieves near-optimal performance: its weighted sum of the expected AoI almost coincides with
that of the optimal adaptive NOMA/OMA policy, especially when the outage probabilities of the two
clients are small. Specifically, the performance of the suboptimal policy is closer
to that of the optimal adaptive NOMA/OMA policy when $d_1 = 2$ and $d_2 = 4$ than
when $d_1 = 3$ and $d_2 = 6$, and the gap between the AoI performance of the suboptimal policy
and that of the optimal adaptive NOMA/OMA policy narrows as the SNR increases.
Moreover, we can see from Fig. 4 that when the SNR is small, e.g., SNR < 15 dB, the performance of the optimal
adaptive NOMA/OMA scheme and that of the optimal OMA scheme are almost the same.
This is because a low SNR leads to a high outage probability for both OMA and
NOMA, and the situation is even worse for NOMA. As such, both the optimal adaptive NOMA/OMA
policy and the suboptimal policy avoid the NOMA scheme and use the OMA scheme instead.
Thus, these two policies have similar performance. As the SNR increases, the weighted sum of the
expected AoI of the optimal OMA policy approaches 1.5 when $w_1 = w_2 = 0.5$. This is the best
performance achievable under the OMA scheme: as the outage probability of each client approaches 0,
the instantaneous age of each client alternates between 1 and 2.
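The high-SNR limits quoted here can be reproduced with a toy simulation. The sketch below is a simplified model, not the simulator used for Fig. 4: it assumes zero outage, that the age of a served client resets to 1, and that every other age grows by one per slot. The function names are hypothetical.

```python
def average_weighted_aoi(weights, serve, slots=1000):
    """Average weighted AoI when every scheduled transmission succeeds."""
    ages = [1] * len(weights)
    total = 0.0
    for _ in range(slots):
        served = serve(ages, weights)
        # a served client's age resets to 1, all other ages grow by one
        ages = [1 if i in served else age + 1 for i, age in enumerate(ages)]
        total += sum(w * age for w, age in zip(weights, ages))
    return total / slots

def oma_max_weight(ages, weights):
    """OMA: serve only the client with the largest weighted age."""
    return {max(range(len(ages)), key=lambda i: weights[i] * ages[i])}

def noma_all(ages, weights):
    """NOMA: serve every client in the same slot."""
    return set(range(len(ages)))
```

With two clients and equal weights of 0.5, the OMA rule alternates between the clients, so the ages cycle through (1, 2) and (2, 1) and the average is 1.5; serving both clients in every slot keeps both ages at 1 and the average at 1.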
Furthermore, we can see from Fig. 4 that the performance of the optimal adaptive NOMA/OMA
policy, the suboptimal policy and the NOMA policy are relatively close when the SNR is large,
e.g., SNR ≥ 20 dB. This is because both the optimal adaptive NOMA/OMA policy and the suboptimal
policy are then more likely to choose NOMA and transmit to both clients at the same time. When
the SNR is large enough, the weighted sum of the expected AoI under both the optimal adaptive NOMA/OMA policy
and the suboptimal policy approaches 1, as the instantaneous AoI of each client is always
1 thanks to the almost complete absence of outage for both clients under NOMA at high SNR. The BS thus always
chooses the NOMA scheme to transmit to both clients. In addition, NOMA outperforms
optimal OMA when SNR > 16 dB for $d_1 = 2$ and $d_2 = 4$, and when SNR > 19 dB for $d_1 = 3$ and
$d_2 = 6$. This shows the benefit of NOMA for timely status updates when the SNR is large.
B. Multi-client scenario
In this subsection, we evaluate the effectiveness of the approximation of the max-weight policy
in the multi-client scenario. We run all simulations for $10^5$ time slots at each
transmission SNR $\rho = \bar{p}/\sigma^2$. We consider a BS delivering timely status
updates to 5 clients with normalized distances $d_i = 6 - i$, $i \in \{1, 2, \dots, 5\}$. We set the path loss exponent
$\tau = 2$ and the target data rate $R = 1$. Fig. 5 illustrates the performance of different policies under
different transmission SNRs, including: 1) the max-weight policy under adaptive NOMA/OMA solved
by exhaustive search in each time slot (MW-N/OMA), 2) the approximated convex optimization
policy (termed AP-N/OMA), 3) the approximated convex optimization policy under NOMA with
a fixed number $K$ of served clients (termed AP-NOMA-F-$K$), and 4) the OMA scheme that serves the client
with the maximum expected age drop, as in [14] (termed MW-OMA).
We can see that, similar to the results of the two-client scenario, when the SNR is low
the AoI performance of the fixed-$K$ NOMA schemes (i.e., AP-NOMA-F-$K$) is poor compared with
that of the MW-OMA scheme, due to the relatively large outage probability of NOMA at low SNR.
Specifically, when the SNR $\rho \le 13$ dB, the performance of AP-NOMA-F-$K_1$
is worse than that of AP-NOMA-F-$K_2$ if $K_1 > K_2$. As the transmission SNR increases, the
performance of AP-NOMA-F-$K$ improves. The rationale is that when the transmission
SNR is sufficiently large, a NOMA scheme that serves more clients achieves a lower
AoI. When the SNR $\rho \ge 29$ dB, the performance of AP-NOMA-F-$K_1$ is better than
that of AP-NOMA-F-$K_2$ if $K_1 > K_2$. Compared with AP-NOMA-F-$K$ and MW-OMA,
the proposed AP-N/OMA scheme, which adaptively switches between NOMA and OMA, achieves
better overall AoI performance, as it allocates power to each client in a more flexible way. In
addition, the small gap between the MW-N/OMA policy and AP-N/OMA shows the effectiveness
of the proposed approximation method, which reduces the computational complexity while achieving
near-optimal performance.
Figure 5: The performance comparison of different policies versus SNR for the multi-client scenario,
$N = 5$ with $w_i = 1/N$, $\forall i \in \{1, 2, \dots, 5\}$. Curves: AP-NOMA-F-5, AP-NOMA-F-4, AP-NOMA-F-3, AP-NOMA-F-2, MW-OMA, AP-N/OMA, and MW-N/OMA.
Fig. 6 plots the weighted sum of the expected AoI for the MW-OMA policy, the
AP-N/OMA policy and the MW-N/OMA policy versus the number of clients in the network. We consider
networks with an increasing number of clients, $N \in \{2, 3, 4, 5, 6\}$, where the normalized
distance of the $i$th client in a system with $N$ clients is $d_i = N + 1 - i$ and the weight is $w_i =
1/N$. As shown in Fig. 6, the performance of the AP-N/OMA scheme is close to that of MW-N/OMA.
Moreover, compared with the MW-OMA scheme, it achieves a significant performance improvement.
In addition, the AoI of the AP-N/OMA scheme grows more slowly with the number
of clients in the network than that of the MW-OMA scheme. The performance gap between
MW-OMA and AP-N/OMA and that between MW-OMA and MW-N/OMA both increase as the
number of clients in the network increases. This shows the potential of the adaptive NOMA/OMA
scheme to reduce the AoI in multi-client networks. The rationale is
that in the MW-OMA scheme, only one client can be served and thus potentially have its AoI drop, while the AoI of all other
clients certainly increases. As the number of clients in the network grows,
more clients see their AoI increase, and the age of the network therefore increases. With adaptive
NOMA/OMA, in contrast, more than one client can be served in each time slot, so the AoI grows
more slowly as the number of clients in the network increases.
Figure 6: Simulation of the network with different numbers of clients $N$, with $w_i = 1/N$, $\forall i$, and
transmission SNR $\rho = 20$ dB. Curves: MW-OMA, AP-N/OMA, and MW-N/OMA.
VI. CONCLUSIONS
In this paper, we considered a wireless network with a base station (BS) conducting timely
transmissions to multiple clients in a time-slotted manner. The BS can adaptively switch between
NOMA and OMA for the downlink transmission to minimize the AoI of the network.
We studied both the two-client scenario and the multi-client scenario. For the two-client scenario, we
developed an optimal policy for the BS to decide whether to use NOMA or OMA for each downlink
transmission based on the instantaneous AoI of both clients, in order to minimize the weighted
sum of the expected AoI of the network. This was achieved by formulating and solving a
Markov Decision Process (MDP) problem. We proved the existence of an optimal stationary and
deterministic policy. Action elimination was conducted to reduce the computational complexity.
The optimal policy was shown to have a switching-type property with clear decision boundaries.
A suboptimal policy with lower computational complexity was also proposed, which was shown to
achieve near-optimal performance according to simulation results.
For the multi-client scenario, inspired by the proposed near-optimal policy, we formulated
a nonlinear optimization problem to determine the optimal power allocated to each client by
maximizing the expected AoI drop of the network in each time slot. We solved the
formulated problem by approximating it as a convex optimization problem. An upper bound on
the gap between the approximate convex problem and the original nonlinear, nonconvex problem
was derived. Simulation results validated the effectiveness of the approximation. The performance
of the adaptive NOMA/OMA scheme obtained by solving the convex optimization problem was shown to be close to that
of the max-weight policy solved by exhaustive search. Moreover, the adaptive NOMA/OMA scheme
achieved a significantly lower AoI than the OMA scheme, especially when the number
of clients in the network is large and the transmission SNR is high.
APPENDIX A
PROOF OF THEOREM 1
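We prove this theorem by verifying that Assumptions 3.1, 3.2 and 3.3 in [33] hold. As the action
space for each state is finite, Assumption 3.2 holds, and we only need to verify the following
two conditions.
1) There exist positive constants $\beta < 1$, $M$ and $m$, and a measurable function $\omega(s) \ge 1$,
$s = (\Delta_1, \Delta_2) \in \mathcal{S}$, such that the reward function of the MDP problem, $r(s, a) = w_1 \Delta_1 + w_2 \Delta_2$, satisfies
$|r(s, a)| \le M\omega(s)$ for all state-action pairs $(s, a)$ and
$$\sum_{\hat{s} \in \mathcal{S}} \omega(\hat{s}) P(\hat{s} | s, a) \le \beta\omega(s) + m, \quad \text{for all } (s, a). \tag{31}$$
2) There exist two value functions $v_1, v_2 \in B_\omega(\mathcal{S})$ and some state $s_0 \in \mathcal{S}$ such that
$$v_1(s) \le h_\alpha(s) \le v_2(s), \quad \text{for all } s \in \mathcal{S} \text{ and } \alpha \in (0, 1), \tag{32}$$
where $h_\alpha(s) = V_\alpha(s) - V_\alpha(s_0)$, $B_\omega(\mathcal{S}) := \{u : \|u\|_\omega < \infty\}$ denotes a Banach space, and
$\|u\|_\omega := \sup_{s \in \mathcal{S}} \omega(s)^{-1} |u(s)|$ denotes the weighted supremum norm.
For condition 1, we show that when $\omega(s) = w_1 \Delta_1 + w_2 \Delta_2$ and $m > 1$, there exists a $\beta$ such
that
$$\max_a \left\{ \frac{w_1 P^O_1 \Delta_1 + w_2 \Delta_2 + 1 - m}{w_1 \Delta_1 + w_2 \Delta_2}, \; \frac{w_1 \Delta_1 + w_2 P^O_2 \Delta_2 + 1 - m}{w_1 \Delta_1 + w_2 \Delta_2}, \; \frac{w_1 P^N_1(a) \Delta_1 + w_2 P^N_2(a) \Delta_2 + 1 - m}{w_1 \Delta_1 + w_2 \Delta_2} \right\} \le \beta < 1,$$
which meets condition 1. To prove condition 2 in our problem, we show that when $\omega(s) = w_1 \Delta_1 + w_2 \Delta_2$, there
exists a $\kappa$ with $\frac{w_1 \Delta_1 + w_2 \Delta_2 + 1}{w_1 \Delta_1 + w_2 \Delta_2} \le \kappa < \infty$ such that $\sum_{\hat{s} \in \mathcal{S}} \omega(\hat{s}) P(\hat{s} | s, a) \le \kappa\omega(s)$ for all $(s, a)$, and that for $d \in D^{MD}$,
where $D^{MD}$ denotes the set of Markovian and deterministic decision rules, $\sum_{\hat{s} \in \mathcal{S}} \omega(\hat{s}) P_d(\hat{s} | s, a) \le
\omega(s) + 1 \le (1 + 1)\omega(s)$, so that $\alpha^T \sum_{\hat{s} \in \mathcal{S}} \omega(\hat{s}) P^T_d(\hat{s} | s, a) \le \alpha^T(\omega(s) + T) < \alpha^T(1 + T)\omega(s)$.
Hence, for each $\alpha$, $0 \le \alpha < 1$, there exist an $\eta$, $0 \le \eta < 1$, and an integer $T$ such that
$$\alpha^T \sum_{\hat{s} \in \mathcal{S}} \omega(\hat{s}) P^T_\pi(\hat{s} | s, a) \le \eta\omega(s) \tag{33}$$
for $\pi = (d_1, \dots, d_T)$, where $d_k \in D^{MD}$, $1 \le k \le T$. Then, according to Proposition 6.10.1 in [34],
for each $\pi \in \Pi^{MD}$, where $\Pi^{MD}$ denotes the set of Markovian deterministic policies, and each $s \in \mathcal{S}$,
$$|V_\alpha(s)| \le \frac{1}{1 - \eta} \left[1 + \alpha\kappa + \dots + (\alpha\kappa)^{T-1}\right] \omega(s). \tag{34}$$
We can thus further prove condition 2. This completes the proof.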
APPENDIX B
PROOF OF THEOREM 2
The switching-type policy is the same as a policy that is monotonically nondecreasing in
$\Delta_2$ when $\Delta_1$ is fixed, and monotonically nonincreasing in $\Delta_1$ when $\Delta_2$ is fixed. To
prove the monotonicity of the optimal policy of the MDP problem in $\Delta_2$, we verify that the
following four conditions given in [34, Theorem 8.11.3] hold.
a) The reward function $r(s, a)$ is nondecreasing in $s$ for all $a \in \mathcal{A}$;
b) $q(k | s, a) = \sum_{j=k}^{\infty} p(j | s, a)$ is nondecreasing in $s$ for all $k \in \mathcal{S}$ and $a \in \mathcal{A}$, where $p(j | s, a)$
is the state transition probability $P(s_{t+1} = j | s_t = s, a_t = a)$, given in (10) and (11);
c) $r(s, a)$ is a subadditive function on $\mathcal{S} \times \mathcal{A}$; and
d) $q(k | s, a)$ is a subadditive function on $\mathcal{S} \times \mathcal{A}$ for all $k \in \mathcal{S}$.
To verify these conditions, we first order the states by $\Delta_2$, i.e., $s^+ \ge s^-$ if $\Delta_2^+ \ge \Delta_2^-$, where
$s^+ = (\cdot, \Delta_2^+)$ and $s^- = (\cdot, \Delta_2^-)$. The one-step reward function of the MDP is
$$r(s, a) = w_1 \Delta_1 + w_2 \Delta_2. \tag{35}$$
It is obvious that condition a) is satisfied. According to the transition probabilities in (10)
and (11), if the current state is $s = (\Delta_1, \Delta_2)$, the possible next states are $s_1 = (\cdot, \Delta_2 + 1)$ (including
$(1, \Delta_2 + 1)$ and $(\Delta_1 + 1, \Delta_2 + 1)$) and $s_2 = (\cdot, 1)$ (including $(1, 1)$ and $(\Delta_1 + 1, 1)$). Based on
(10) and (11), we have
$$q(k | s, a = 0) = \begin{cases} 0, & \text{if } k > s_1, \\ 1, & \text{otherwise}, \end{cases} \tag{36}$$
$$q(k | s, a = i, 0 < i < L) = \begin{cases} 0, & \text{if } k > s_1, \\ P^N_2(i), & \text{if } s_1 \ge k > s_2, \\ 1, & \text{if } k \le s_2, \end{cases} \tag{37}$$
$$q(k | s, a = L) = \begin{cases} 0, & \text{if } k > s_1, \\ P^O_2, & \text{if } s_1 \ge k > s_2, \\ 1, & \text{if } k \le s_2. \end{cases} \tag{38}$$
Thus, condition b) is immediate.
To verify the remaining two conditions, we give the definition of subadditivity below.
Definition 1. (Subadditivity [34]) A multivariable function $Q(\delta, a) : \mathcal{S} \times \mathcal{A} \to \mathbb{R}$ is subadditive
in $(\delta, a)$ if, for all $\delta^+ \ge \delta^-$ and $a^+ \ge a^-$,
$$Q(\delta^+, a^+) + Q(\delta^-, a^-) \le Q(\delta^+, a^-) + Q(\delta^-, a^+) \tag{39}$$
holds.
According to (35), condition c) follows. For the last condition, we verify whether
$$q(k | s^+, a^+) + q(k | s^-, a^-) \le q(k | s^+, a^-) + q(k | s^-, a^+), \tag{40}$$
with $s^+ = (\Delta_1, \Delta_2^+)$ and $s^- = (\Delta_1, \Delta_2^-)$, where $\Delta_2^+ \ge \Delta_2^-$ and $a^+ \ge a^-$. As there are three
types of actions in (36)–(38), we consider four cases: (1) $a^+ = i$, $a^- = 0$; (2) $a^+ = L$, $a^- = 0$; (3) $a^+ = L$,
$a^- = i$; and (4) $a^+ = i$, $a^- = j$ for $0 \le j \le i \le L$, $\forall i, j$. According to (36)–(38), we can verify
that condition d) holds. As all four conditions hold, the optimal policy is monotonically
nondecreasing in $\Delta_2$ when $\Delta_1$ is fixed. The proof of the monotonicity of the optimal policy of the
MDP problem in $\Delta_1$ is similar and is thus omitted for brevity. This completes the proof.
APPENDIX C
PROOF OF LEMMA 1
Consider any feasible point $(\hat{p}_{m(1)}, \dots, \hat{p}_{m(K)})$ of Problem 6. Suppose $\hat{p}_{m(1)} < \hat{p}_{m(2)}$. By
decreasing $\hat{p}_{m(2)}$ to the same value as $\hat{p}_{m(1)}$, we construct another point:
$$\hat{p}' = (\hat{p}'_{m(1)}, \hat{p}'_{m(2)}, \hat{p}'_{m(3)}, \dots, \hat{p}'_{m(K)}) = (\hat{p}_{m(1)}, \hat{p}_{m(1)}, \hat{p}_{m(3)}, \dots, \hat{p}_{m(K)}), \tag{41}$$
which is still feasible in terms of (28b)–(28c). Moreover, it can be verified that
$$\max\left\{\frac{1}{\hat{p}'_{m(1)}}, \dots, \frac{1}{\hat{p}'_{m(k)}}\right\} = \max\left\{\frac{1}{\hat{p}_{m(1)}}, \dots, \frac{1}{\hat{p}_{m(k)}}\right\} \tag{42}$$
for all $k = 1, \dots, K$. Hence the optimal objective value (28a) does not change if we add the constraint
$\hat{p}_{m(2)} \le \hat{p}_{m(1)}$ to Problem 6. The same argument implies that adding $\hat{p}_{m(k)} \le \hat{p}_{m(k-1)}$ for all
$k = 2, \dots, K$ to Problem 6 does not change its optimal objective value. This completes the proof.
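The argument of this proof can be illustrated numerically. In the sketch below (hypothetical helper names), replacing every transformed power by the running minimum of the entries up to that position yields a non-increasing vector without changing any of the running maxima that enter (28a), and without increasing any entry, so (28b) remains satisfied.

```python
def enforce_ordering(p_hat):
    """Replace each entry by the running minimum, giving a non-increasing vector."""
    ordered, current = [], float("inf")
    for value in p_hat:
        current = min(current, value)
        ordered.append(current)
    return ordered

def running_max_inverse(p_hat):
    """max(1/p_hat[0], ..., 1/p_hat[k]) for every k, as it appears in (28a)."""
    result, current = [], 0.0
    for value in p_hat:
        current = max(current, 1.0 / value)
        result.append(current)
    return result
```

For instance, the point (0.3, 0.5, 0.2, 0.4) is mapped to (0.3, 0.3, 0.2, 0.2), and both points produce the same sequence of running maxima of the reciprocals.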
APPENDIX D
PROOF OF COROLLARY 1
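For each $k = 1, \dots, K$, there is a unique point $\hat{p}'_{m(k)} \in (0, d^{\tau}_{m(k)} r\sigma^2 / 2]$ such that
$$G'_k(\hat{p}'_{m(k)}) = w_{m(k)} \Delta_{m(k)} \exp\left(-\frac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}'_{m(k)}}\right) \times \frac{d^{\tau}_{m(k)} r\sigma^2}{(\hat{p}'_{m(k)})^2} = \frac{w_{m(k)} \Delta_{m(k)} e^{-1}}{d^{\tau}_{m(k)} r\sigma^2}. \tag{43}$$
The difference $\tilde{G}_k(\hat{p}_{m(k)}) - G_k(\hat{p}_{m(k)})$ is maximized at $\hat{p}_{m(k)} = \hat{p}'_{m(k)}$, where it equals
$$\tilde{G}_k(\hat{p}'_{m(k)}) - G_k(\hat{p}'_{m(k)}) = \frac{w_{m(k)} \Delta_{m(k)} e^{-1}}{d^{\tau}_{m(k)} r\sigma^2}\, \hat{p}'_{m(k)} - w_{m(k)} \Delta_{m(k)} \exp\left(-\frac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}'_{m(k)}}\right)$$
$$= w_{m(k)} \Delta_{m(k)} \exp\left(-\frac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}'_{m(k)}}\right) \left(\frac{d^{\tau}_{m(k)} r\sigma^2}{\hat{p}'_{m(k)}} - 1\right) = w_{m(k)} \Delta_{m(k)} e^{-\alpha}(\alpha - 1),$$
where $\alpha := d^{\tau}_{m(k)} r\sigma^2 / \hat{p}'_{m(k)} \in [2, +\infty)$, and the second equality uses (43). It is easy to verify that
$e^{-\alpha}(\alpha - 1)$ is monotonically decreasing on $\alpha \in [2, +\infty)$, and therefore its upper bound is
attained at $\alpha = 2$, i.e.,
$$\tilde{G}_k(\hat{p}'_{m(k)}) - G_k(\hat{p}'_{m(k)}) \le w_{m(k)} \Delta_{m(k)} e^{-2}.$$
Therefore, we have
$$\tilde{U}^o - U^o \le \sum_{k=1}^{K} \left[\tilde{G}_k(\hat{p}'_{m(k)}) - G_k(\hat{p}'_{m(k)})\right] \le e^{-2} \sum_{k=1}^{K} w_{m(k)} \Delta_{m(k)}. \tag{44}$$
This upper bound on $\tilde{U}^o - U^o$ is also an upper bound on the gap $U^* - U^o$ for Problem 7,
as $U^o \le U^* \le \tilde{U}^o$. This completes the proof.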
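As a numerical companion to this proof, the sketch below locates the stationary point defined by (43) in terms of $\alpha$ and evaluates the normalized gap $e^{-\alpha}(\alpha - 1)$. Written in terms of $\alpha$, (43) reads $\alpha^2 e^{-\alpha} = e^{-1}$; the bisection bracket $[2, 10]$ and the function name are assumptions made for this illustration.

```python
import math

def stationary_alpha(lo=2.0, hi=10.0, iterations=100):
    """Bisection for alpha >= 2 solving alpha^2 * exp(-alpha) = exp(-1)."""
    def f(alpha):
        return alpha * alpha * math.exp(-alpha) - math.exp(-1.0)
    for _ in range(iterations):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:        # f is decreasing on [2, +inf)
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

alpha = stationary_alpha()
# gap between the approximation and G_k, divided by w_{m(k)} * Delta_{m(k)}
normalized_gap = math.exp(-alpha) * (alpha - 1.0)
bound = math.exp(-2.0)          # the constant used in Corollary 1
```

The computed `normalized_gap` can then be compared with `bound` to see how much slack the constant $e^{-2}$ leaves for a single term of the sum in (44).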
REFERENCES
[1] Q. Wang, H. Chen, Y. Li, and B. Vucetic, “Minimizing age of information via hybrid noma/oma,arXiv preprint
arXiv:2001.04042, 2020.
[2] S. Kaul, R. Yates, and M. Gruteser, “Real-time status: How often should one update?” in 2012 Proceedings IEEE
INFOCOM. IEEE, 2012, pp. 2731–2735.
[3] Q. Wang, H. Chen, Y. Gu, Y. Li, and B. Vucetic, “Minimizing the age of information of cognitive radio-based iot systems
under a collision constraint,” arXiv preprint arXiv:2001.02482, 2020.
[4] E. T. Ceran, D. Gündüz, and A. György, “Average age of information with hybrid arq under a resource constraint,” IEEE
Transactions on Wireless Communications, vol. 18, no. 3, pp. 1900–1913, 2019.
[5] B. Wang, S. Feng, and J. Yang, “To skip or to switch? minimizing age of information under link capacity constraint,” in
2018 IEEE 19th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC). IEEE,
2018, pp. 1–5.
[6] Q. Wang, H. Chen, Y. Li, Z. Pang, and B. Vucetic, “Minimizing age of information for real-time monitoring in resource-
constrained industrial iot networks,” arXiv preprint arXiv:1912.07186, 2019.
[7] S. K. Kaul, R. D. Yates, and M. Gruteser, “Status updates through queues,” in 2012 46th Annual Conference on Information
Sciences and Systems (CISS). IEEE, 2012, pp. 1–6.
[8] Y. Gu, H. Chen, Y. Zhou, Y. Li, and B. Vucetic, “Timely status update in internet of things monitoring systems: An
age-energy tradeoff,IEEE Internet of Things Journal, 2019.
[9] Y. Sun, E. Uysal-Biyikoglu, R. D. Yates, C. E. Koksal, and N. B. Shroff, “Update or wait: How to keep your data fresh,
IEEE Transactions on Information Theory, vol. 63, no. 11, pp. 7492–7508, 2017.
[10] M. Costa, M. Codreanu, and A. Ephremides, “On the age of information in status update systems with packet management,”
IEEE Transactions on Information Theory, vol. 62, no. 4, pp. 1897–1910, 2016.
[11] Y. Gu, H. Chen, C. Zhai, Y. Li, and B. Vucetic, “Minimizing age of information in cognitive radio-based iot systems:
Underlay or overlay?” IEEE Internet of Things Journal, 2019.
[12] I. Kadota, A. Sinha, and E. Modiano, “Optimizing age of information in wireless networks with throughput constraints,”
in IEEE INFOCOM 2018-IEEE Conference on Computer Communications. IEEE, 2018, pp. 1844–1852.
[13] I. Kadota, A. Sinha, E. Uysal-Biyikoglu, R. Singh, and E. Modiano, “Scheduling policies for minimizing age of information
in broadcast wireless networks,” IEEE/ACM Transactions on Networking (TON), vol. 26, no. 6, pp. 2637–2650, 2018.
[14] I. Kadota and E. Modiano, “Minimizing the age of information in wireless networks with stochastic arrivals,arXiv preprint
arXiv:1905.07020, 2019.
[15] R. D. Yates and S. K. Kaul, “Status updates over unreliable multiaccess channels,” in 2017 IEEE International Symposium
on Information Theory (ISIT). IEEE, 2017, pp. 331–335.
[16] Z. Jiang, B. Krishnamachari, S. Zhou, and Z. Niu, “Can decentralized status update achieve universally near-optimal age-
of-information in wireless multiaccess channels?” in 2018 30th International Teletraffic Congress (ITC 30), vol. 1. IEEE,
2018, pp. 144–152.
[17] A. Maatouk, M. Assaad, and A. Ephremides, “Minimizing the age of information in a csma environment,arXiv preprint
arXiv:1901.00481, 2019.
[18] H. Chen, Y. Gu, and S.-C. Liew, “Age-of-information dependent random access for massive iot networks,arXiv preprint
arXiv:2001.04780, 2020.
[19] Y.-P. Hsu, E. Modiano, and L. Duan, “Scheduling algorithms for minimizing age of information in wireless broadcast
networks with random arrivals,IEEE Transactions on Mobile Computing, 2019.
[20] A. Maatouk, M. Assaad, and A. Ephremides, “Minimizing the age of information: Noma or oma?” arXiv preprint
arXiv:1901.03020, 2019.
[21] Z. Ding, Y. Liu, J. Choi, Q. Sun, M. Elkashlan, I. Chih-Lin, and H. V. Poor, “Application of non-orthogonal multiple
access in lte and 5g networks,” IEEE Communications Magazine, vol. 55, no. 2, pp. 185–191, 2017.
[22] Y. Saito, Y. Kishiyama, A. Benjebbour, T. Nakamura, A. Li, and K. Higuchi, “Non-orthogonal multiple access (noma) for
cellular future radio access,” in 2013 IEEE 77th vehicular technology conference (VTC Spring). IEEE, 2013, pp. 1–5.
[23] Z. Dong, H. Chen, J.-K. Zhang, and L. Huang, “On non-orthogonal multiple access with finite-alphabet inputs in z-
channels,” IEEE Journal on Selected Areas in Communications, vol. 35, no. 12, pp. 2829–2845, 2017.
[24] Y. Yu, H. Chen, Y. Li, Z. Ding, and B. Vucetic, “On the performance of non-orthogonal multiple access in short-packet
communications,” IEEE Communications Letters, vol. 22, no. 3, pp. 590–593, 2017.
[25] W. B. Powell, Approximate Dynamic Programming: Solving the curses of dimensionality. John Wiley & Sons, 2007, vol.
703.
[26] J. Gittins, K. Glazebrook, and R. Weber, Multi-armed bandit allocation indices. John Wiley & Sons, 2011.
[27] J. Sun, Z. Jiang, S. Zhou, and Z. Niu, “Optimizing information freshness in broadcast network with unreliable links and
random arrivals: An approximate index policy,” in IEEE INFOCOM 2019-IEEE Conference on Computer Communications
Workshops (INFOCOM WKSHPS). IEEE, 2019, pp. 115–120.
[28] J. Cui, Z. Ding, and P. Fan, “A novel power allocation scheme under outage constraints in NOMA systems,” IEEE Signal
Processing Letters, vol. 23, no. 9, pp. 1226–1230, 2016.
[29] P. Xu, Y. Yuan, Z. Ding, X. Dai, and R. Schober, “On the outage performance of non-orthogonal multiple access with
1-bit feedback,” IEEE Transactions on Wireless Communications, vol. 15, no. 10, pp. 6716–6730, 2016.
[30] P. Xu and K. Cumanan, “Optimal power allocation scheme for non-orthogonal multiple access with α-fairness,” IEEE
Journal on Selected Areas in Communications, vol. 35, no. 10, pp. 2357–2369, 2017.
[31] Z. Ding, Z. Yang, P. Fan, and H. V. Poor, “On the performance of non-orthogonal multiple access in 5G systems with
randomly deployed users,” IEEE Signal Processing Letters, vol. 21, no. 12, pp. 1501–1505, 2014.
[32] L. I. Sennott, Stochastic dynamic programming and the control of queueing systems. John Wiley & Sons, 2009, vol. 504.
[33] X. Guo and Q. Zhu, “Average optimality for Markov decision processes in Borel spaces: a new condition and approach,”
Journal of Applied Probability, vol. 43, no. 2, pp. 318–334, 2006.
[34] M. L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014.