A Reinforcement Learning Approach to Dynamic Spectrum Access in Internet-of-Things Networks
Han Cha and Seong-Lyun Kim
School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
Email: {chan, slkim}@ramo.yonsei.ac.kr
Abstract—To support the wireless communication traffic of Internet-of-Things (IoT) systems in terms of massive connectivity, dynamic spectrum access (DSA) is an important issue. This paper proposes a spectrum sensor-aided DSA system based on a reinforcement learning (RL) algorithm that aims at efficient spectrum usage for an IoT network operating over an incumbent network. Due to the small form factor of IoT devices, they do not have spectrum sensing capability. To support DSA for such IoT devices, we introduce a sensor-aided DSA system that enhances spatial spectrum reusability by means of an RL algorithm. With the RL algorithm, the proposed DSA system provides a self-organizing feature for a massive number of IoT devices. We show the performance of the proposed RL based DSA system for various densities of IoT devices utilizing a slotted ALOHA protocol whose spectrum access probability is learned by the proposed DSA system. We also show that the performance of the proposed RL based DSA system surpasses that of the distributed Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol for channel access coordination. We further present the consistent performance of the incumbent user when the IoT devices access the spectrum band with the learned spectrum access probability.
Index Terms—Reinforcement learning, dynamic spectrum access, random MAC, Internet-of-Things, area spectral efficiency.
I. INTRODUCTION
In recent years, numerous applications have emerged that require tremendous wireless communication traffic. Machine-to-machine (M2M) communication is one of the most traffic-demanding applications in terms of its huge connectivity requirement. Internet-of-Things (IoT) technology is a key enabler for realizing M2M communication: with IoT technology, vast numbers of devices are connected through the Internet, enabling them to exchange information in real time. To support these connections wirelessly, an explosive demand for spectrum resources is inevitable.
Dynamic spectrum access (DSA) coupled with random access is one of the promising technologies to satisfy such demands of IoT networks. A DSA-aided IoT network utilizes the spectrum bandwidth more efficiently by exploiting underutilized spectrum that is not used by incumbent users. This opportunity-aware spectrum access is supported by spectrum sensing functionality, which requires high-performance components: a digital-to-analog converter, a signal processor, an RF unit, a single/double link structure, and so on. It is difficult to equip IoT devices with such complex architecture because of low device cost requirements. To overcome this spectrum sensing capability constraint, dedicated spectrum sensors support the DSA of IoT devices [1], [2].
Fig. 1: A system model of the Internet-of-Things (IoT) network utilizing the spectrum sensor-aided dynamic spectrum access framework. Spectrum sensors provide the spectrum usage of incumbent users to IoT devices at each sensor location. IoT devices utilize this information for spectrum access.
Spectrum sensors collect the aggregate interference values at their positions and report them to a central unit that controls the spectrum access of the IoT devices. The central unit determines the spectrum access probabilities of the IoT devices at a given moment based on the collected aggregate interference information. When the IoT devices utilize the DSA scheme, the central unit must consider the harmful interference from IoT devices to other IoT devices as well as to the incumbent user. Unfortunately, the central unit may have no information about the locations of the IoT devices or of the incumbent user. To determine the spectrum access probabilities of the IoT devices, the central unit needs to investigate the impact of each IoT device on the entire network.
In this context, the reinforcement learning (RL) technique is suitable for handling unknown information. An RL system tries to learn actions that maximize a numerical reward by interacting with a random environment iteratively [3]. RL has been applied to various fields of mobile communications: resource allocation for mobile cellular networks [4], [5], cell outage management for dense heterogeneous networks [6], and channel selection for D2D communications [7]. In the cognitive spectrum access area, the papers [8]-[9] consider the dynamic multichannel access scenario for users performing an RL process. In [10] and [9], the cognitive radio system is designed for a single user. In contrast, this paper considers a dynamic spectrum access scenario for a large number of low-capability users while also enhancing spatial spectrum reusability.
Since the numerical reward has stochastic characteristics, the central unit has to repeat trials based on an RL framework to find proper access probabilities for the IoT devices. The rest of the paper is organized as follows. First, we present our system model and an optimization problem that maximizes the area spectral efficiency of the DSA IoT network in Sections II and III. Second, we introduce the reinforcement learning DSA procedure conducted by a central unit in Section IV. Finally, in Section V we investigate the performance of the RL based DSA system for various densities of secondary transmitters (STXs) as the learning horizon varies. We also provide a performance comparison between the proposed RL based DSA system and Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA). In this section, we also show the impact of the IoT network on the incumbent network with respect to average spectral efficiency.
II. SYSTEM MODEL
A. Network Model
Consider a DSA IoT network where transmitters communicate with receivers over an incumbent user network. We define the incumbent user network as the primary user network and the IoT network as the secondary user network. The $N$ secondary transmitters (STXs) are governed by a central unit that determines the spectrum access probabilities using the aggregate interference obtained from spectrum sensors. Each STX, which has infinitely backlogged data, transmits its packets to its paired secondary receiver (SRX) through a wireless channel. We denote the activation vector as $\mathbf{a} = [a_1, a_2, \cdots, a_N]$, where the component $a_k$ is one if the $k$th STX is active and zero otherwise.

The STXs operate according to the value of the activation vector generated by the central unit or by the STX itself. Except during the learning process, each STX generates the activation vector, i.e., determines packet transmission, by itself. We assume that the primary transmitters (PTXs) always transmit their packets to the primary receivers (PRXs).
B. Channel Model
The transmitted signal experiences path-loss attenuation with exponent $\alpha$ as well as Rayleigh fading with unit mean, i.e., $h \sim \exp(1)$. The fading coefficient of the $k$th secondary pair is $h_k$. We assume an identical distance $d$ between every paired STX and SRX. The primary and secondary networks share a common wireless channel, interfering with each other. When the $k$th STX transmits a packet, the signal-to-interference-plus-noise ratio (SINR) $\gamma_k$ at the paired $k$th SRX is given by:
$$\gamma_k(\mathbf{a}) = \frac{P_2 h_k d^{-\alpha}}{\sum_{i \neq k} a_i P_2 h_{ik} d_{ik}^{-\alpha} + I_k + \sigma^2}, \tag{1}$$
where $I_k$ is the aggregate interference from the primary network at the $k$th SRX, $h_{ik}$ is the fading coefficient from the $i$th STX to the $k$th SRX, $d_{ik}$ is the distance between the $i$th STX and the $k$th SRX, $P_2$ denotes the transmit power of an STX, and $\sigma^2$ represents the thermal noise power.
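As a concrete illustration of Eq. (1), the following sketch computes the SINR vector for a randomly drawn toy topology. The transmit power, primary interference, and noise constants here are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy topology (assumed values): N secondary pairs in a 50 m x 50 m area with
# pair distance d = 3 m; the interference and noise constants are illustrative.
N, d, alpha = 20, 3.0, 4.0
P2 = 200.0                      # STX transmit power, roughly 23 dBm in mW
I = np.full(N, 1e-6)            # aggregate primary interference at each SRX
sigma2 = 1e-9                   # thermal noise power

stx = rng.uniform(0.0, 50.0, size=(N, 2))
theta = rng.uniform(0.0, 2.0 * np.pi, size=N)
srx = stx + d * np.column_stack((np.cos(theta), np.sin(theta)))

def sinr(a, h, h_cross):
    """Eq. (1): SINR at every SRX for activation vector a, own-link fading
    h (shape (N,)) and cross-link fading h_cross (shape (N, N))."""
    d_ik = np.linalg.norm(stx[:, None, :] - srx[None, :, :], axis=2)
    gain = P2 * h_cross * d_ik ** (-alpha)        # received power, STX i -> SRX k
    signal = P2 * h * d ** (-alpha)
    mask = a[:, None] * (1.0 - np.eye(N))         # active interferers i != k
    interference = (mask * gain).sum(axis=0)
    return signal / (interference + I + sigma2)

h = rng.exponential(1.0, size=N)                  # Rayleigh power fading ~ Exp(1)
h_cross = rng.exponential(1.0, size=(N, N))
a = np.ones(N, dtype=int)                         # all STXs active
gamma = sinr(a, h, h_cross)
```

The exponential draws model the unit-mean Rayleigh power fading; deactivating an STX simply zeroes its row in the interference sum.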
III. PROBLEM FORMULATION
In the DSA IoT network, it is important to increase the number of concurrent transmissions while guaranteeing the quality of each individual transmission. To this end, our prime concern is to find the optimal access probability vector $\mathbf{p}^*$ that maximizes the area spectral efficiency (ASE), defined as the sum of data rates per unit bandwidth in a unit area [11]. To find $\mathbf{p}^*$, we formulate the following optimization problem:
$$\mathbf{p}^* = \arg\max_{\mathbf{p}} \; \log_2(1+\beta) \cdot \mathbb{E}\Big[\sum_k \mathbb{1}_{\gamma_k \geq \beta}\Big], \tag{P1}$$
$$\text{s.t. } 0 \leq p_k \leq 1, \quad k = 1, \cdots, N,$$
where $N$ denotes the number of STXs and $\beta > 0$ is the target SINR threshold of an SRX. Note that $\mathbb{1}_{\gamma_k \geq \beta}$ is an indicator function yielding one if $\gamma_k \geq \beta$ and zero otherwise. The term $\mathbb{E}[\sum_k \mathbb{1}_{\gamma_k \geq \beta}]$ represents the average number of successful transmissions. The objective function represents the average ASE as a function of the access probabilities of the STXs.
The challenge in finding the optimal access probabilities of the STXs arises from the stochastic nature of communication channels. The central unit knows neither the locations of the STXs nor those of the PTXs. Owing to this lack of global information, including channel coefficients, the problem becomes intractable. The stochastic nature of communication channels is effectively handled by evaluating the objective function repeatedly, which is a key property of the RL framework [12]. Therefore, we propose an RL based DSA system suitable for stochastic environments.
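Since the objective of (P1) is an expectation, it is naturally estimated by repeated sampling. The minimal Monte Carlo sketch below illustrates this for a given access-probability vector; the random channel inside `sample_sinr` is a stand-in for the paper's topology and Eq. (1), and its constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# beta matches the paper's 3 dB target; the channel model is a stand-in.
N = 20
beta = 10.0 ** (3.0 / 10.0)      # 3 dB target SINR threshold
L = 50.0                          # network side length, so the area is L^2

def sample_sinr(a):
    # Stand-in random channel: Rayleigh-faded signal over interference that
    # grows with the number of other active STXs. Replace with Eq. (1) for
    # a concrete topology.
    h = rng.exponential(1.0, size=N)
    interference = (a.sum() - a) * 0.05 * rng.exponential(1.0, size=N)
    return h / (interference + 0.1)

def average_ase(p, trials=2000):
    """Estimate the (P1) objective: log2(1+beta) * E[sum_k 1{gamma_k >= beta}],
    normalized by the network area."""
    successes = 0.0
    for _ in range(trials):
        a = (rng.random(N) < p).astype(int)       # Bernoulli access decisions
        gamma = sample_sinr(a)
        successes += np.sum(a * (gamma >= beta))  # only active STXs can succeed
    return np.log2(1.0 + beta) * (successes / trials) / L**2

ase_all = average_ase(np.ones(N))        # every STX always transmits
ase_half = average_ase(np.full(N, 0.5))  # each STX transmits with probability 0.5
```

Comparing such estimates across candidate vectors $\mathbf{p}$ is exactly the kind of repeated evaluation that the RL framework in Section IV automates.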
IV. REINFORCEMENT LEARNING BASED DYNAMIC SPECTRUM ACCESS SYSTEM
In this section, we introduce the learning operation of the sensor-aided DSA IoT network, which is performed by the central unit. We first introduce the operation of the central unit and then describe the learning procedure of the proposed RL based DSA system.
A. The Central Unit
Let us consider a central unit that interacts with the environment along time steps denoted by $t$, as presented in Fig. 2. The central unit takes an action by controlling the spectrum access of the STXs and gathers the transmission results of the secondary pairs. With these results, the central unit calculates the benefit of the transmission of each STX. After that, the central unit adjusts the access probabilities of the STXs toward a higher ASE
value. This procedure is based on the REINFORCE learning algorithm [12], which has an advantage in handling stochastic environments. Now we describe the learning model.

Fig. 2: The reinforcement learning system. The central unit interacts with the wireless network by controlling the spectrum access of the secondary transmitters.
In every time step, the central unit produces the activation vector $\mathbf{a}(t) = [a_1(t), a_2(t), \cdots, a_N(t)]$, whose components are Bernoulli random variables. The probabilities of those random variables are defined by the access probability vector $\mathbf{p}(t) = [p_1(t), p_2(t), \cdots, p_N(t)]$, i.e., $\mathbf{a}(t) = \mathrm{Bernoulli}(\mathbf{p}(t))$. The access probability vector $\mathbf{p}(t)$ is related to the internal state vector $\mathbf{w}(t)$ by means of the modified sigmoid function:
$$p_k(t) = \frac{1}{1 + e^{-w_k(t)}}, \quad \text{where } t = 1, 2, \ldots \tag{2}$$
The central unit finds the optimal access probability vector $\mathbf{p}^*$ by learning the internal state vector $\mathbf{w}(t)$ according to the following update rules:
$$w_k(0) = -\ln\frac{1 - p_k(0)}{p_k(0)}, \tag{3}$$
$$w_k(t+1) = w_k(t) + \alpha(t)\{u(t) - \bar{u}(t)\}G_k(t), \tag{4}$$
$$G_k(t) = \frac{(-1)^{a_k(t)+1}}{1 + e^{w_k(t)(-1)^{a_k(t)+1}}}, \tag{5}$$
$$\alpha(t+1) = \frac{\alpha}{t + 100}, \tag{6}$$
where $\alpha(t)$ is a learning rate that monotonically decreases with $t$, and the function $G_k(t)$ is a gradient of $p_k(t)$ with respect to $w_k(t)$. The utility function $u(t)$ and the average utility function $\bar{u}(t)$ are defined as follows:
$$u(t) = \log_2(1+\beta) \cdot \sum_k \mathbb{1}_{\gamma_k(\mathbf{a}(t)) \geq \beta}, \tag{7}$$
$$\bar{u}(t+1) = (1-\lambda)\bar{u}(t) + \lambda u(t), \quad \text{where } 0 < \lambda \leq 1, \tag{8}$$
where $\lambda$ is the proportion with which the utility value at timeslot $t$ is applied to the average utility function. We define the baseline as the average utility function, which enables stable performance enhancement of the secondary network during the reinforcement learning procedure.
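The update rules (2) through (8) can be sketched as plain functions. This is a minimal illustration of their algebra; the probabilities and step sizes used in the checks are arbitrary values, not values from the paper.

```python
import numpy as np

def access_prob(w):
    # Eq. (2): sigmoid mapping internal state to access probability.
    return 1.0 / (1.0 + np.exp(-w))

def init_internal_state(p0):
    # Eq. (3): inverse of Eq. (2), used to initialize w(0) from p(0).
    return -np.log((1.0 - p0) / p0)

def gradient(w, a):
    # Eq. (5): gradient term; sign depends on whether the STX was active.
    s = (-1.0) ** (a + 1)
    return s / (1.0 + np.exp(s * w))

def update_w(w, a, u, u_bar, lr):
    # Eq. (4): REINFORCE step using the utility u and baseline u_bar.
    return w + lr * (u - u_bar) * gradient(w, a)

def update_baseline(u_bar, u, lam=0.1):
    # Eq. (8): exponential moving-average baseline.
    return (1.0 - lam) * u_bar + lam * u

p0 = np.full(5, 0.5)
w0 = init_internal_state(p0)
assert np.allclose(access_prob(w0), p0)   # Eqs. (2) and (3) are inverses

# A positive advantage (u > u_bar) with all STXs active raises every w_k.
w1 = update_w(w0, np.ones(5, dtype=int), 1.0, 0.0, 0.1)
assert np.all(w1 > w0)
```

The sign structure of Eq. (5) is what makes the update self-correcting: a transmission that outperforms the baseline pushes the corresponding probability up, while an idle slot during a good round pushes it down.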
B. Proposed Reinforcement Learning Procedure
The proposed reinforcement learning procedure proceeds as follows:

Fig. 3: Network topology, where the network size is 50 m × 50 m and the number of secondary pairs is 20. We extensively simulate numbers of secondary pairs from 20 to 70. Note that the pentagrams represent spectrum sensors. We number the PTXs so as to distinguish them from each other. Unless otherwise mentioned, the locations of all PTXs, STXs, SRXs, and sensors are fixed.
Algorithm 1 Learning procedure of the central unit
1: Initializes $\mathbf{p}(0)$, $\mathbf{w}(0)$, $\mathbf{a}(0)$ with $S_k$, $k = 1, 2, \ldots, N$.
2: Informs $\mathbf{a}(0)$ to the STXs.
3: The STXs transmit according to $\mathbf{a}(0)$.
4: for $t = 0$ to $T_l$ do
5:   Collects $\mathbb{1}_{\gamma_k(\mathbf{a}(t)) \geq \beta}$, $k = 1, 2, \ldots, N$.
6:   Updates $u(t)$ according to Eq. (7).
7:   Updates $\mathbf{w}(t+1)$ according to Eq. (4).
8:   Updates $\bar{u}(t+1)$ according to Eq. (8).
9:   Calculates $\mathbf{p}(t+1)$ according to Eq. (2).
10:  Produces $\mathbf{a}(t+1)$ according to $\mathrm{Bernoulli}(\mathbf{p}(t+1))$.
11:  Informs $\mathbf{a}(t+1)$ to the STXs.
12:  The STXs transmit according to $\mathbf{a}(t+1)$.
13: end for
a) Spectrum sensing: In this period, the central unit collects the interference values for calculating the initial access probabilities of the STXs. The sensor nearest to each STX reports its measured aggregate interference value to the central unit. We denote the aggregate interference measured by the sensor for the $k$th STX as $S_k$.
b) Initializing: The central unit initializes the access probability and internal state vectors. The central unit determines the initial access probability $\mathbf{p}(0)$ of the STXs from the interference levels $S_k$ measured by the $k$th sensors, where $k = 1, 2, \ldots, N$. Then the central unit calculates the initial internal state values $\mathbf{w}(0)$ from $\mathbf{p}(0)$ according to the inverse of Eq. (2).
c) Access probability learning: In this period, the central unit interacts with the environment, i.e., the wireless network, utilizing the REINFORCE learning algorithm. The central unit controls the transmission of the STXs along the time steps $t = 1, 2, \ldots, T_l$. After the transmissions of the STXs, the central unit collects the transmission results of the STXs to assess the benefit of each STX's transmission. The access probability learning procedure of the central unit is described in Algorithm 1. Note that $\mathrm{Bernoulli}(p)$ is one with probability $p$ and zero otherwise.
d) Determining the final access probability: After the REINFORCE learning algorithm is done, the central unit determines the final access probability vector of the STXs $\mathbf{p}^l$ as follows:
$$\mathbf{p}^l = \mathbf{p}(T_l). \tag{9}$$
Note that the access probabilities of the STXs converge to one or zero for large enough $T_l$. A proof of this convergence behavior of the access probabilities is presented in [12]. We utilize this behavior for determining the access probabilities of the STXs in the learning procedure. With the final access probability, the $k$th STX transmits its packets according to the slotted ALOHA protocol with access probability $p^l_k$.
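Putting the steps of Algorithm 1 together, the following is a self-contained sketch of the learning loop on a toy problem. The utility here is a stand-in for Eq. (7), and the "good" subset of STXs, the learning-rate schedule, and all constants are assumptions chosen only to make the convergence behavior visible.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setting (all values are illustrative): 8 STXs, of which the first 4
# are worth activating; the utility below is a stand-in for Eq. (7).
N, T_l, alpha0, lam = 8, 3000, 0.05, 0.1
good = np.zeros(N)
good[:4] = 1.0

def utility(a):
    # Reward activating "good" STXs, penalize activating the others,
    # plus a little noise to mimic a stochastic environment.
    return float(a @ good - 0.5 * a @ (1.0 - good)) + rng.normal(0.0, 0.1)

p = np.full(N, 0.5)               # p(0), e.g. obtained from sensor measurements
w = np.log(p / (1.0 - p))         # Eq. (3): inverse sigmoid
u_bar = 0.0                       # average-utility baseline

for t in range(T_l):
    a = (rng.random(N) < p).astype(int)     # a(t) = Bernoulli(p(t))
    u = utility(a)                          # stand-in for Eq. (7)
    s = (-1.0) ** (a + 1)                   # +1 if a_k = 1, -1 otherwise
    G = s / (1.0 + np.exp(s * w))           # Eq. (5): gradient term
    lr = alpha0 / (1.0 + t / 1000.0)        # decaying learning rate (assumed schedule)
    w = w + lr * (u - u_bar) * G            # Eq. (4): REINFORCE step with baseline
    u_bar = (1.0 - lam) * u_bar + lam * u   # Eq. (8): baseline update
    p = 1.0 / (1.0 + np.exp(-w))            # Eq. (2)

p_l = p                                     # Eq. (9): final access probabilities
```

In line with the convergence behavior noted above, the learned probabilities drift toward one for STXs whose activation raises the utility and toward zero for the rest.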
In the next section, we evaluate the performance of the
proposed spectrum access procedure.
V. SIMULATION RESULTS
A. General Settings
Consider a network within a square of side length $L_{net}$, set to 50 m (Fig. 3). The STXs and sensors are distributed randomly in the network with densities $\lambda_s$ and $\lambda_{ssr}$, respectively. Unless otherwise mentioned, the density of sensors $\lambda_{ssr}$ is 0.24 in all simulation cases. The pair distance of secondary users $d$ is 3 m. We assume that the secondary users use a QAM constellation, i.e., the target SINR threshold of an SRX is 3 dB. TABLE I summarizes the simulation parameters. As shown in Fig. 3, the location and the number of PTXs are the same in all simulations. We obtain the average ASE of the secondary users by executing slotted ALOHA transmissions of STX-SRX pairs with the final access probabilities, repeated 100,000 times.
We compare the performance of the proposed RL based DSA system with a reference system with respect to average ASE. We implement the reference system as the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol, which is based on Clear Channel Assessment (CCA) and a random back-off mechanism. We assume that the CSMA/CA protocol has a fixed carrier sensing range that is double the pair distance $d$, as in the conventional setting [13], i.e., 6 m.
The initial access probabilities of the STXs are determined by means of the sense-and-predict (SaP) method presented in [14], [15]. The interference level at an STX differs from that at a sensor due to their different locations. When sensor values are used to obtain the spectrum access probabilities of the STXs, this difference in interference level hinders an accurate decision. The SaP method aims at overcoming this difference by predicting the interference level at the STXs using the spatial
correlation of interference. With the predicted interference level, the SaP method provides the successful transmission probability of the STXs by means of a stochastic geometry approach. We refer to this successful transmission probability as OP.

Fig. 4: Final access probabilities; the density of STXs is 0.004. Each line denotes the transition of the final access probability of an STX along $T_l$. The STXs located in sparse areas, free from the impact of PTXs, have near-to-one final access probabilities. These STXs capture spatial transmission opportunities.
TABLE I: Simulation parameters

Parameters | Values
Network size ($L_{net} \times L_{net}$) | 50 m × 50 m
STX node density ($\lambda_s$) | 0.008, 0.012, ..., 0.028
Spectrum sensor node density ($\lambda_{ssr}$) | 0.24
Distance between STX-SRX pair ($d$) | 3 m
Target SINR threshold ($\beta$) | 3 dB
Transmit power of STX ($P_2$) | 23 dBm
Transmit power of PTX ($P_1$) | 43 dBm
$T_l$ | 500, 1000, ..., 15000
Initial learning rate ($\alpha$) | 0.025
Initial access probability ($\mathbf{p}(0)$) | OP [14]
Learning rate ($\alpha(t)$) | $\alpha/(t + 100)$
Carrier sensing range for CSMA/CA | 6 m
Distance between PTX and PRX | 10 m
B. Average Area Spectral Efficiency Validation
In Fig. 5, we evaluate the performance of the RL based DSA system for various STX densities. As the density of STXs increases, the average ASE after the learning procedure improves. This is because more STXs are located in sparse areas far away from the PTXs, as can be seen in Fig. 3. These STXs only have to consider the other STXs around them, so more STXs can freely communicate with their paired SRXs. After $T_l$ reaches a value of around 5,000, the performance enhancement slows down in every simulation case, so we need to choose a proper $T_l$ considering the trade-off between learning cost and performance gain.

Fig. 5: The average area spectral efficiencies for various STX densities ($\lambda_s$ from 0.008 to 0.028). The higher the density of STXs, the larger the performance gain due to the learning procedure.
In Fig. 6, the performance of the RL based DSA system surpasses that of CSMA/CA, as mentioned before. If the STXs utilize the CSMA/CA protocol, STXs located in sparse areas lose transmission opportunities while waiting for the CCA procedure. In the RL based DSA system, STXs located in sparse areas have final access probabilities close to 1 (see Fig. 4), which captures spatial transmission opportunities [16]. The STXs that have near-to-one final access probabilities transmit their packets whenever a transmission is needed. Therefore, the performance of the RL based DSA system surpasses that of the CSMA/CA protocol.
C. Investigating the Impact on the Primary Network
With the slotted ALOHA protocol utilizing the final access probabilities obtained by the RL based DSA system, the secondary network has a marginal impact on the primary network, as can be seen in Fig. 7. Nevertheless, as $T_l$ gets larger, the average spectral efficiency of the primary users slightly deteriorates. As discussed in the previous section, a modest value of $T_l$ yields sufficient performance enhancement of the secondary network. Therefore, selecting a proper value of $T_l$ may help enhance the aggregate performance across the primary and secondary networks, which can be optimized in future work.
VI. CONCLUSION
In this paper, we propose a reinforcement learning (RL) based dynamic spectrum access (DSA) system that aims at efficient spectrum usage for Internet-of-Things (IoT) networks. The limitations of IoT devices are presented and the architecture of the sensor-aided DSA IoT network is described. The main objective of the RL based DSA system is enhancing the spatial
spectrum reusability of the IoT network, which is evaluated by the average area spectral efficiency (ASE).

Fig. 6: Performance comparison between the proposed RL based DSA system and CSMA/CA. The final access probabilities at $T_l = 10{,}000$ are used to obtain the performance of the proposed RL based DSA system.

Fig. 7: Average spectral efficiencies of the PTXs when the density of STXs is 0.004, i.e., the number of STXs is 20. The distance between PTXs and PRXs is 10 m.

We investigate the
performance of the proposed RL based DSA system for various densities of IoT devices. The longer the learning period, the greater the performance enhancement of the IoT network, but the gain becomes marginal beyond a certain learning period. We show that the impact of the IoT network on the incumbent network is marginal with respect to the average spectral efficiency of the incumbent network, although a longer learning period leads to a slight performance degradation of the incumbent network. Hence, as a future research topic, a proper learning period should be selected so as to reduce both the impact on the incumbent network and the learning cost spent on marginal performance enhancement of the IoT network. Also, a system architecture without a dedicated control channel has to be considered to implement the RL based DSA system in real wireless communication networks.
ACKNOWLEDGEMENT
This work was partly supported by Institute for Information
& communications Technology Planning & Evaluation (IITP)
grant funded by the Korea government (MSIT) (No. 2018-0-
00923, Scalable Spectrum Sensing for Beyond 5G Communi-
cation) and IITP grant funded by the MSIT (No.2018-0-00170,
Virtual Presence in Moving Objects through 5G).
REFERENCES
[1] G. A. Akpakwu, B. J. Silva, G. P. Hancke, and A. M. Abu-Mahfouz,
“A survey on 5g networks for the internet of things: communication
technologies and challenges,” IEEE Access, vol. 6, pp. 3619–3647, Dec.
2017.
[2] Z. Zhang, W. Zhang, S. Zeadally, Y. Wang, and Y. Liu, “Cognitive
radio spectrum sensing framework based on multi-agent architecture for
5g networks,” Wireless Communications, vol. 22, no. 6, pp. 34–39, Dec.
2015.
[3] R. S. Sutton and A. G. Barto, “Reinforcement learning: an introduction,”
Cambridge, MA: MIT Press, Mar. 1998.
[4] G. Alnwaimi, S. Vahid, and K. Moessner, “Dynamic heterogeneous
learning games for opportunistic access in lte-based macro/femtocell
deployments,” IEEE Transactions on Wireless Communications, vol. 14,
no. 4, pp. 2294–2308, Apr. 2015.
[5] F. Bernardo, R. Agust, J. Perez-Romero, and O. Sallent, “An application
of reinforcement learning for efficient spectrum usage in next-generation
mobile cellular networks,” IEEE Transactions on Systems, Man, and
Cybernetics-PART C: Applications and Reviews, vol. 40, no. 4, pp. 477–
484, Jul. 2010.
[6] O. Onireti, A. Zoha, J. Moysen, A. Imran, L. Giupponi, M. A. Imran,
and A. Abu-Dayya, “A cell outage management framework for dense
heterogeneous networks,” IEEE Transactions on Vehicular Technology,
vol. 65, no. 4, pp. 2097–2113, Apr. 2016.
[7] S. Maghsudi and S. Stanczak, “Channel selection for network-assisted
d2d communication via no-regret bandit learning with calibrated fore-
casting,” IEEE Transactions on Wireless Communications, vol. 14, no. 3,
pp. 1309–1322, Mar. 2015.
[8] R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, and J. Palicot, “Multi-
armed bandit learning in iot networks: learning helps even in non-
stationary settings,” in Proc. 12th EAI International Conference on
Cognitive Radio Oriented Wireless Networks (CROWNCOM) 2017,
Lisbon, Portugal, pp. 173–185, Feb. 2018.
[9] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforce-
ment learning for dynamic multichannel access in wireless networks,”
IEEE Transactions on Cognitive Communications and Networking,
vol. 4, no. 2, pp. 257–264, Jun. 2018.
[10] V. Raj, I. Dias, T. Tholeti, and S. Kalyani, “Spectrum access in cognitive
radio using a two stage reinforcement learning approach,” IEEE Journal
of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 20–34, Jan.
2018.
[11] D. M. Kim and S. L. Kim, “Exploiting regional differences: A spatially adaptive random access,” IEEE Transactions on Wireless Communications, vol. 14, no. 8, pp. 4342–4352, Aug. 2015.
[12] V. V. Phansalkar and M. A. L. Thathachar, “Local and global optimiza-
tion algorithms for generalized learning automata,” Neural Computer,
vol. 7, no. 5, pp. 950–973, Sep. 1995.
[13] K. Xu, M. Gerla, and S. Bae, “How effective is the ieee 802.11
rts/cts handshake in ad hoc networks,” in Proc. ’02. IEEE Global
Telecommunications Conference, 2002, Taipei, Taiwan, Nov. 2002.
[14] J. Kim, S. W. Ko, H. Cha, and S. L. Kim, “Sense-and-predict: oppor-
tunistic mac based on spatial interference correlation for cognitive radio
networks,” in Proc. 2017 IEEE International Symposium on Dynamic
Spectrum Access Networks (DySPAN), Baltimore, MD, USA, Mar. 2017.
[15] ——, “Testbed verification of spectrum access opportunity detection in cognitive radio networks,” in Proc. IEEE Asia-Pacific Conference on Communications (APCC), Perth, Australia, Dec. 2017.
[16] T. Novlan, J. D. Matyjas, B. L. Ng, and J. Zhang, “Spatial spectrum sensing-based device-to-device cellular networks,” IEEE Transactions on Wireless Communications, vol. 15, no. 11, pp. 7299–7313, Nov. 2016.
... Unsupervised learning: K-means [192]; DBSCAN [180]; Gaussian mixture model [177]; Affinity Propagation [178] Reinforcement Learning: Q-learning [186][187][188]; SARSA [189]; DQN [189][190]; DDQN [191]; ...
... It is shown that the proposed distributed slot allocation scheme outperforms the centralized approach in terms of average SIR and convergence capabilities. Then, a spectrum sensors-aided IoT network is considered In [187], where to enhance the spatial spectrum re-usability, an RL-based dynamic spectrum access (DSA) algorithm is proposed which learns the spectrum access probability for the slotted ALOHA protocol utilized by IoT devices. The proposed RL-based DSA approach is claimed to outperform the distributed Carrier Sensing Multiple Access with Collision Avoidance (CSMA/CA) protocol regarding area spectral efficiency. ...
... Specifically, for LoRa-enabled IoT communication networks, a DNN-based algorithm can be used to assign suitable spreading factors to IoT devices for uplink communication to mitigate the effect of interference and enhance the packet success ratio [182]. Random channel access probability for Slotted-ALOHA protocol can be controlled with RL [187] or DRL [190] techniques for the IoT devices to access the shared channel for uplink communication. ...
Article
Full-text available
Fifth-generation and Beyond (5GB) networks are transformational technologies to revolutionize future wireless communications in terms of massive connectivity, higher capacity, lower latency, and ultra-high reliability. To this end, 5GB networks are designed as a coalescence of various schemes and enabling technologies such as unmanned aerial vehicles (UAV)-assisted networks, vehicular networks, heterogeneous cellular networks (HCNs), Internet of things (IoT), device-to-device (D2D) communication, millimeter-wave (mm-wave), massive multiple-input multiple-output (mMIMO), non-orthogonal multiple access (NOMA), re-configurable intelligent surface (RIS) and Terahertz (THz) communications. Due to the scarcity of licensed bands and the co-existence of multiple technologies in unlicensed bands, interference management is a pivotal factor in enhancing the user experience and quality of service (QoS) in future-generation networks. However, due to the highly complex scenarios, conventional interference mitigation techniques may not be suitable in 5GB networks. To cope with this, researchers have investigated artificial intelligence (AI)-based interference management techniques to tackle complex environments. Existing surveys either focus on conventional interference management methods or AI-based interference management only for a specific scheme or technology. This survey article complements the existing survey literature by providing a detailed review of AI-based intentional-interference management such as jamming detection and mitigation, and AI-enabled unintentional-interference mitigation techniques from the standpoints of UAV-assisted networks, vehicular networks, HCNs, D2D, IoT, mmWave-MIMO, NOMA, and THz communications. 
While identifying and presenting the AI-based techniques for interference management in 5G and beyond networks, this article also points out the challenges, open issues, and future research directions to adopt AI-enabled techniques to curtail the effects of interference in 5GB and towards 6G networks.
... The prediction is quantified in terms of a spatial interference correlation between the two locations. In [9], the authors proposed a dynamic spectrum access system based on a reinforcement learning algorithm. The algorithm collects the value of sum interference at the positions of spectrum sensors and informs the values to a central unit aimed at efficient spectrum usage for an Internet-of-things network over the incumbent network. ...
... 8 Estimate SINR based on the known SNR aŝ γ = σ t+1 /(1 +Î t+1 ). 9 Allocate blocklength based on the estimated SINR via (8). 10 After observation of I t+1 , update the transition matrix P via (6). ...
Conference Paper
Full-text available
In designing ultra-reliable low-latency communication (URLLC) services in 5G-and-beyond systems, link adaptation (LA) plays a vital role in adjusting transmission parameters under channel and interference dynamics. Without capturing such dynamics (e.g., relying on average estimates), the LA algorithms fail to simultaneously meet the strict reliability and latency bounds of mission-critical applications. To this end, this paper focuses on interference prediction-based adaptive resource allocation of one-shot URLLC transmission, wherein our solution deviates from the conventional average-based interference estimation schemes. We predict the next interference value based on the interference distribution estimation using a discrete-time Markov chain (DTMC). Further, to exploit the time correlation of each interference source, we model the correlated interference variations as a second-order DTMC to achieve higher prediction accuracy. While accounting for the risk sensitivity of interference estimates, the prediction outcome is then used for appropriate resource allocation of a URLLC transmission under link outage constraints. We evaluate the complete solution, given in the form of an algorithm, using Monte-Carlo simulations, and compare it with the first-order baseline counterpart. The analysis shows that the second-order interference estimate can fulfill the target outage as low as 10 −7 and improve the outage probability more than ten times in some scenarios compared to the baseline scheme while keeping the same amount of resource usage.
... In [122], the authors propose a sensor-aided dynamic spectrum access (DSA) system to enhance spectrum reusability by applying a reinforcement learning algorithm. DSA integration with random access is a promising approach to satisfy the needs of massive IoT networks. ...
... DSA schemes achieve higher spectrum efficiency by exploiting the sparse spectrum used by incumbent users. The main idea in [122] is optimization of the access probability in a centralized manner. The central unit learns the spectrum access probability of IoT devices in order to regulate IoT devices' activity. ...
Article
Driven by the need to ensure connectivity for an unprecedentedly huge number of IoT devices with no human intervention, massive connectivity has recently become one of the main research areas in IoT studies. Conventional wireless communication technologies are designed for Human-to-Human (H2H) communication, which leads to major problems in primary access, channel utilization and spectrum efficiency when massive numbers of devices require connectivity. Current random access procedures are based on four-step handshaking with control messages, which contradicts the requirements of IoT applications in terms of small data payloads and low complexity, and the targeted channel utilization and spectrum efficiency cannot be achieved using traditional orthogonal approaches. The goal of our work is therefore to review the most recent developments and critically evaluate the existing work on the evolution of network access methods in the new communication era. The paper covers three major aspects. First, the primary random access procedures proposed for IoT communications are discussed. The second aspect focuses on approaches for integrating existing random multiple access schemes with non-orthogonal multiple access (NOMA) methods; this integration opens a new research trend in the field of massive connectivity. By operating in domains beyond the physical one, such as the code and power domains, NOMA integration targets increased channel utilization and spectrum efficiency to complement the flexibility of random access. On the other hand, the design of efficient algorithms for massive connectivity in IoT is also challenged by traffic models that are highly application- and environment-dependent; a new angle for tackling this problem has emerged thanks to the extensive developments in machine learning and the possibility of incorporating it in communication networks.
Thus, the final aspect this review paper addresses is the newly emerging research direction of incorporating machine learning (ML) methods to provide efficient IoT connectivity. Breakthrough ML techniques allow wireless networking devices to perform transmissions by learning and building knowledge about the communication and networking environment. The paper concludes with a critical evaluation of the large body of work accumulated in this area in recent years and an outline of major open research issues.
... In [14], the authors proposed a Reinforcement Learning (RL)-based DSA technique for spectrum allocation to IoT users in a cellular network. The authors successfully showed that the DSA technique can identify underutilized spectrum in the network and reuse it for a sensor-aided IoT network, thereby enhancing spectrum reusability. ...
Article
The popularity of mobile broadband connectivity continues to grow, and thus future wireless networks are expected to serve a very large number of users demanding huge capacity. Employing larger spectral bandwidth and installing more access points to enhance capacity is not enough to tackle this challenge, due to the related costs and the interference issues involved. Frequency resources are therefore becoming one of the most valuable assets, requiring proper utilization and fair distribution. Traditional frequency resource management strategies are often based on static approaches and are agnostic to the instantaneous demand of the network. These static approaches tend to cause congestion in a few cells while wasting precious resources in others, and are thus not efficient enough to deal with the capacity challenge of future networks. In this paper we present a dynamic access-aware bandwidth allocation approach, which follows the dynamic traffic requirements of each cell and allocates the required bandwidth accordingly from a common spectrum pool that gathers the entire system bandwidth. We evaluate our proposal by means of real network traffic traces. The evaluation results presented in this paper depict the performance gain of the proposed dynamic access-aware approach over two traditional approaches in terms of utilization and served traffic. Moreover, to acquire knowledge about access network requirements, we present a machine learning-based approach that predicts the state of the network and is used to manage the available spectrum accordingly.
Our comparative results show that, in terms of spectrum allocation accuracy and utilization efficiency, a well designed machine learning-based bandwidth allocation mechanism not only outperforms common static approaches, but even achieves the performance (with a relative error close to 0.04) of an ideal dynamic system with perfect knowledge of future traffic requirements.
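The pool-based allocation idea can be sketched as a simple demand-proportional split. The policy and the per-cell floor below are assumptions for illustration, not the paper's mechanism, whose demands would come from the ML predictor:

```python
def allocate_bandwidth(total_bw, demands):
    """Share a common spectrum pool in proportion to each cell's
    (predicted) traffic demand, with a small floor so that no cell
    starves. Illustrative sketch; the floor fraction is assumed."""
    floor = 0.05 * total_bw / len(demands)   # minimal per-cell guarantee (assumed)
    remaining = total_bw - floor * len(demands)
    total_demand = sum(demands) or 1.0       # avoid division by zero
    return [floor + remaining * d / total_demand for d in demands]

alloc = allocate_bandwidth(100.0, [30, 10, 60])
```

Cells with higher predicted demand receive proportionally more of the pool, while the entire system bandwidth is always fully assigned.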
Article
This paper investigates a new medium access control (MAC) protocol for multi-channel heterogeneous networks (HetNets) based on deep reinforcement learning (DRL), referred to as multi-channel deep-reinforcement learning multiple access (MC-DLMA). Specifically, we consider a HetNet where different radio networks adopt different MAC protocols to transmit data packets to a common access point on different wireless channels. Three key challenges for the MC-DLMA node are that (i) no environmental knowledge is available in advance; (ii) the channels in HetNets are allocated to nodes using different MAC protocols; and (iii) the capacities of different channels may differ. The main goal of MC-DLMA is to find an optimal access policy for transmitting on those pre-allocated channels and to achieve more efficient spectrum utilization. Due to the complex temporal correlation of spectrum states in HetNets, the traditional DRL technique, e.g., the original deep Q-network (DQN) algorithm, is no longer applicable to our problem. In our MC-DLMA design, an advanced class of recurrent neural network, termed the Gated Recurrent Unit (GRU), is embedded into the original DQN technique to aggregate observations over time and reason about the underlying temporal features in multi-channel HetNets. Furthermore, we analytically derive the optimal spectrum access patterns and the optimal throughputs in various HetNet scenarios. With judicious definitions of the state, action, and reward function in the parlance of the DRL framework, simulation results show that MC-DLMA can (i) find the optimal spectrum access strategies in various HetNets, (ii) outperform the random access policy, the Whittle index policy, and the original DQN, (iii) perform cooperative transmission in a fully distributed manner in the presence of multiple agents, and (iv) adapt well to environmental changes.
Conference Paper
Detecting the spectrum access opportunity (OP) of a secondary user is a key technique in cognitive radio (CR) networks. In particular, in a CR scenario where dedicated spectrum sensors are installed to check spectrum utilization, the OP at locations where no sensor is installed cannot be estimated. To cope with this issue, this paper proposes an OP map, in which a centralized server estimates the OP of secondary users based on the interference measurements of the sensors. The OP is estimated by analyzing the spatial correlation of interference. The accuracy of OP detection is validated through MATLAB simulations in conjunction with testbed experiments using universal software radio peripherals (USRPs) at Yonsei University, Seoul, South Korea.
Article
We consider a dynamic multichannel access problem, where multiple correlated channels follow an unknown joint Markov model. A user at each time slot selects a channel to transmit data and receives a reward based on the success or failure of the transmission. The objective is to find a policy that maximizes the expected long-term reward. The problem is formulated as a partially observable Markov decision process (POMDP) with unknown system dynamics. To overcome the challenges of unknown system dynamics as well as prohibitive computation, we apply the concept of reinforcement learning and implement a Deep Q-Network (DQN) that can deal with a large state space without any prior knowledge of the system dynamics. We provide an analytical study of the optimal policy for fixed-pattern channel switching with known system dynamics and show through simulations that DQN can achieve the same optimal performance without knowing the system statistics. We compare the performance of DQN with a Myopic policy and a Whittle-index-based heuristic through both simulations and real data traces, and show that DQN achieves near-optimal performance in more complex situations. Finally, we propose an adaptive DQN approach with the capability to adapt its learning in time-varying, dynamic scenarios.
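A tabular toy version conveys the learning loop. Two sticky good/bad channels stand in for the correlated Markov channels, and a Q-table over the last action and its outcome stands in for the DQN; both are assumed simplifications, since the paper's DQN handles far larger state spaces:

```python
import random

def q_learning_channel_access(n_slots=50000, alpha=0.1, eps=0.1, gamma=0.9, seed=1):
    """Tabular stand-in for the paper's DQN: two channels whose good/bad
    states follow independent sticky two-state Markov chains. The agent
    only observes its own last action and outcome, so the true channel
    state is hidden, as in the POMDP formulation."""
    rng = random.Random(seed)
    stay = 0.9                                     # P(channel keeps its state)
    chan = [rng.random() < 0.5 for _ in range(2)]  # True = channel is good
    Q = {}                                         # (observation, action) -> value
    obs = (0, 0)                                   # (last action, last outcome)
    rewards = []
    for _ in range(n_slots):
        if rng.random() < eps:                     # epsilon-greedy exploration
            a = rng.randrange(2)
        else:
            a = max((0, 1), key=lambda x: Q.get((obs, x), 0.0))
        r = 1.0 if chan[a] else 0.0                # success iff channel is good
        nxt = (a, int(r))
        best = max(Q.get((nxt, x), 0.0) for x in (0, 1))
        Q[(obs, a)] = Q.get((obs, a), 0.0) + alpha * (r + gamma * best - Q.get((obs, a), 0.0))
        chan = [s if rng.random() < stay else (not s) for s in chan]
        obs = nxt
        rewards.append(r)
    return sum(rewards[n_slots // 2:]) / (n_slots // 2)

avg = q_learning_channel_access()
```

On sticky channels the learned policy approaches win-stay/lose-shift, so the average reward in the second half of training clearly exceeds the 0.5 achieved by random channel selection.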
Article
The Internet of Things (IoT) is a promising technology which aims to revolutionize and connect the global world via heterogeneous smart devices through seamless connectivity. The current demand for Machine-Type Communications (MTC) has resulted in a variety of communication technologies with diverse service requirements to achieve the modern IoT vision. More recent cellular standards like Long-Term Evolution (LTE) have been introduced for mobile devices but are not well suited for low-power, low data rate devices such as IoT devices. To address this, a number of IoT standards are emerging. The Fifth Generation (5G) mobile network, in particular, aims to address the limitations of previous cellular standards and be a potential key enabler for the future IoT. In this paper, the state of the art of IoT application requirements along with their associated communication technologies is surveyed. Additionally, the 3rd Generation Partnership Project (3GPP) cellular-based Low-Power Wide Area (LPWA) solutions to support and enable the new service requirements for Massive to Critical IoT use cases are discussed in detail, including Extended Coverage Global System for Mobile Communications for the Internet of Things (EC-GSM-IoT), enhanced Machine-Type Communications (eMTC), and Narrowband Internet of Things (NB-IoT). Furthermore, 5G New Radio (NR) enhancements for new service requirements and enabling technologies for the IoT are introduced. This paper presents a comprehensive review of emerging and enabling technologies, with a main focus on 5G mobile networks, which are envisaged to support the exponential traffic growth of the IoT. The challenges and open research directions pertinent to the deployment of Massive to Critical IoT applications are also presented, with a view toward an efficient context-aware congestion control (CACC) mechanism.
Article
With the advent of the 5th generation of wireless standards and an increasing demand for higher throughput, methods to improve the spectral efficiency of wireless systems have become very important. In the context of cognitive radio, a substantial increase in throughput is possible if the secondary user can make smart decisions regarding which channel to sense and when or how often to sense it. Here, we propose an algorithm that not only selects a channel for data transmission but also predicts how long the channel will remain unoccupied, so that the time spent on channel sensing can be minimized. Our algorithm learns in two stages - a reinforcement learning approach for channel selection and a Bayesian approach to determine the optimal duration for which sensing can be skipped. Comparisons with other learning methods are provided through extensive simulations. We show that the number of sensing operations is minimized with a negligible increase in primary interference; this implies that less energy is spent by the secondary user on sensing and that higher throughput is achieved by saving on sensing.
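The Bayesian second stage can be sketched under a simplified model, assumed here for illustration: idle periods are geometric with unknown per-slot continuation probability q, a Beta(1,1) prior is placed on q, and sensing is skipped for as long as the posterior-mean estimate keeps the idle probability above a confidence threshold.

```python
import math

def sensing_skip_slots(idle_continue, idle_end, confidence=0.9):
    """Assumed toy rule, not the paper's exact criterion: with observed
    counts of idle continuations and idle endings, the Beta(1,1)
    posterior mean estimates q, and sensing is skipped for the largest
    k slots with q_hat**k >= confidence."""
    q_hat = (idle_continue + 1) / (idle_continue + idle_end + 2)  # posterior mean
    k = int(math.floor(math.log(confidence) / math.log(q_hat)))
    return max(k, 0)

k = sensing_skip_slots(idle_continue=98, idle_end=2, confidence=0.9)
```

The longer the channel has been observed to stay idle, the larger q_hat becomes and the more sensing slots can safely be skipped; a channel that frequently becomes busy yields k = 0, i.e., sense every slot.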
Conference Paper
Opportunity detection at secondary transmitters (TXs) is a key technique enabling cognitive radio (CR) networks. Such detection, however, cannot guarantee reliable communication at secondary receivers (RXs), especially when the association distance is long. To cope with this issue, this paper proposes a novel MAC called sense-and-predict (SaP), in which each secondary TX decides whether or not to access the channel based on a prediction of the interference level at its RX. First, we characterize the spatial interference correlation in probabilistic form using stochastic geometry, and utilize it to maximize the area spectral efficiency (ASE) of the secondary network while guaranteeing the service quality of the primary network. Through simulations and testbed experiments using USRPs, SaP is shown to always improve ASE compared with conventional TX-based sensing.
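The stochastic-geometry ingredient - interference from a Poisson field evaluated at the RX rather than at the sensing TX - can be illustrated with a small simulation. Unit transmit powers, an unbounded power-law path loss, and the specific density and geometry below are assumptions for illustration:

```python
import math
import random

def poisson_sample(lam, rng):
    """Draw a Poisson(lam) variate using Knuth's multiplication method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def drop_ppp(density, radius, rng):
    """Sample a homogeneous Poisson point process on a disk."""
    n = poisson_sample(density * math.pi * radius ** 2, rng)
    pts = []
    for _ in range(n):
        r, t = radius * math.sqrt(rng.random()), 2 * math.pi * rng.random()
        pts.append((r * math.cos(t), r * math.sin(t)))
    return pts

def interference_at(point, interferers, alpha=4.0):
    """Aggregate interference at `point` from unit-power interferers
    under a power-law path loss with exponent alpha."""
    x0, y0 = point
    return sum(((x - x0) ** 2 + (y - y0) ** 2) ** (-alpha / 2)
               for (x, y) in interferers)

rng = random.Random(7)
field = drop_ppp(density=0.5, radius=10.0, rng=rng)
i_tx = interference_at((0.0, 0.0), field)   # what the TX itself senses
i_rx = interference_at((2.0, 0.0), field)   # what its RX actually experiences
```

Because i_tx and i_rx generally differ when the TX-RX distance is non-negligible, TX-side sensing alone can mispredict the RX's channel quality; SaP exploits the spatial correlation between the two levels to make the access decision.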
Article
Due to the fixed frequency spectrum division policy, the radio spectrum resource is becoming increasingly scarce: many licensed frequency bands are not always fully utilized, while unlicensed users have no permission to use them. Cognitive radio (CR) has emerged as a promising solution for efficient radio spectrum utilization. One of the most important techniques in CR is spectrum sensing, which provides the real-time occupancy of available frequency bands to secondary users (SUs). However, current CR frameworks require SU terminals to conduct spectrum sensing and upload their sensing results to a fusion center (FC). This approach leads to many architectural challenges, such as high design complexity, increased hardware costs, inefficient resource usage, and high energy consumption. To address these challenges, we present a novel CR spectrum sensing framework that introduces a new functional entity, called the spectrum agent (SA), to perform spectrum sensing tasks on behalf of SUs. We describe in detail the architecture and spectrum sensing mechanism of this new framework, which provides a seamless integration of CR with next-generation (5G) cellular networks.
Article
In this paper, we present a novel cell outage management (COM) framework for heterogeneous networks (HetNets) with split control and data planes - a candidate architecture for meeting future capacity, quality-of-service and energy-efficiency demands. In such an architecture, the control and data functionalities are not necessarily handled by the same node. The control base stations (BSs) manage the transmission of control information and user equipment (UE) mobility, while the data BSs handle UE data. An implication of this split architecture is that an outage of a BS in one plane has to be compensated by other BSs in the same plane. Our COM framework addresses this challenge by incorporating two distinct cell outage detection (COD) algorithms to cope with the idiosyncrasies of both the data and control planes. The COD algorithm for control cells leverages the relatively larger number of UEs in the control cell to gather large-scale minimization of drive tests (MDT) report data, and detects outage by applying machine learning and anomaly detection techniques. To improve outage detection accuracy, we also investigate and compare the performance of two anomaly detection algorithms, i.e., a k-nearest-neighbor and a local-outlier-factor-based anomaly detector, within the control COD. For data cell COD, on the other hand, we propose a heuristic grey-prediction-based approach, which can work with the small number of UEs in the data cell by exploiting the fact that the control BS manages UE-data BS connectivity and receives a periodic update of the reference signal received power (RSRP) statistic between the UEs and the data BSs in its coverage. The detection accuracy of the heuristic data COD algorithm is further improved by exploiting the Fourier series of the residual error that is inherent to the grey prediction model. Our COM framework integrates these two COD algorithms with a cell outage compensation (COC) algorithm which can be applied to both planes.
Our COC solution utilizes an actor-critic (AC) based reinforcement learning (RL) algorithm, which optimizes the capacity and coverage of the identified outage zone in a plane by adjusting the antenna gain and transmission power of the surrounding BSs in that plane. The simulation results show that the proposed framework can detect both data and control cell outages, and also compensate for the detected outage in a reliable manner.
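The k-nearest-neighbor detector used in the control-plane COD can be sketched generically: score each sample by its distance to its k-th nearest neighbor and flag large scores as anomalies. The 2-D toy points below are assumptions for illustration; the paper's actual features are MDT report statistics.

```python
import math

def knn_anomaly_scores(points, k=3):
    """Score each sample by its distance to its k-th nearest neighbour;
    samples far from any cluster get large scores and are candidate
    anomalies (generic k-NN distance scoring, not the paper's tuned
    pipeline)."""
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores

# Four clustered measurements plus one far-away outlier (hypothetical data).
data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_anomaly_scores(data, k=3)
```

The clustered points receive small scores while the isolated point's score is orders of magnitude larger, so a simple threshold separates normal measurements from outage candidates.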
Article
Ultra-densification is one of the main features of 5G networks. In an ultra-dense network, interference management and spectrum allocation are challenging issues. Spectrum sensing in cognitive radio networks is a distributed and efficient way to resolve these issues in ultra-dense networks. However, most studies on spectrum sensing only focus on sensing temporal spectrum opportunities, where one or multiple primary users are active, which does not make full use of spectrum opportunities in the spatial domain. To overcome the shortcomings of conventional temporal spectrum sensing, we study the problem of spatial spectrum sensing, which senses spatial spectrum opportunities in wireless networks. In this paper, the performance of spatial spectrum sensing and its application to sensing-based device-to-device (D2D) cellular networks are analyzed using stochastic geometry. Specifically, by modeling the locations of active transmitters as a Poisson point process, the spatial spectrum sensing problem is formulated within the framework of detection theory. Closed-form expressions are obtained for the sensing threshold and for the probabilities of spatial detection and false alarm. Furthermore, analytical throughputs for D2D users and cellular users under both channel inversion and constant power allocation are derived. The optimal sensing radius that maximizes the defined network metric is obtained numerically. Finally, simulation and numerical results are presented to verify our theoretical analysis.
Article
Interference is one of the most limiting factors when trying to achieve high spectral efficiency in the deployment of heterogeneous networks (HNs). In this paper, the HN is modeled as a layer of closed-access LTE femtocells (FCs) overlaid upon an LTE radio access network. Within the context of dynamic learning games, this work proposes a novel heterogeneous, multiobjective, fully distributed strategy based on a reinforcement learning (RL) model (CODIPAS-HRL) for FC self-configuration and self-optimization. The self-organization capability enables the FCs to autonomously and opportunistically sense the radio environment using different learning strategies and tune their parameters accordingly, in order to operate while avoiding interference to both network tiers and satisfying certain quality-of-service requirements. The proposed model reduces the learning cost associated with each learning strategy. We also study the convergence behavior under different learning rates and derive a new accuracy metric in order to compare the different learning strategies. The simulation results show the convergence of the learning model to a solution concept based on satisfaction equilibrium, under the uncertainty of the HN environment. We show that intra- and inter-tier interference can be significantly reduced, resulting in higher cell throughputs.