A Reinforcement Learning Approach to Dynamic Spectrum Access
in Internet-of-Things Networks
Han Cha and Seong-Lyun Kim
School of Electrical and Electronic Engineering, Yonsei University, Seoul, Korea
Email: {chan, slkim}@ramo.yonsei.ac.kr
Abstract—To support the wireless communication traffic of Internet-of-Things (IoT) systems in terms of massive connectivity, dynamic spectrum access (DSA) is an important issue. This paper proposes a spectrum sensor-aided DSA system based on a reinforcement learning (RL) algorithm that aims at efficient spectrum usage for an IoT network overlaid on an incumbent network. Due to the small form factor of IoT devices, they do not have spectrum sensing capability. To support DSA for IoT devices, we introduce a sensor-aided DSA system that enhances spatial spectrum reusability by means of an RL algorithm. With the RL algorithm, the proposed DSA system provides a self-organizing feature for a massive number of IoT devices. We show the performance of the proposed RL based DSA system for various densities of IoT devices utilizing a slotted ALOHA protocol whose spectrum access probability is learned by the proposed DSA system. We also show that the performance of the proposed RL based DSA system surpasses that of the distributed Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol for channel access coordination. Finally, we show that the performance of the incumbent user remains consistent when the IoT devices access the spectrum band with the learned spectrum access probability.
Index Terms—Reinforcement learning, dynamic spectrum access, random MAC, Internet-of-Things, area spectral efficiency.
I. INTRODUCTION
In recent years, numerous applications have emerged that require tremendous wireless communication traffic. Machine-to-machine (M2M) communication is one of the most traffic-demanding applications in terms of its huge connectivity requirement, and Internet-of-Things (IoT) technology is a key enabler for realizing M2M communication. With IoT technology, vast numbers of devices are connected through the Internet and exchange information in real time. Supporting these connections wirelessly creates an explosive demand for spectrum resources.
A dynamic spectrum access (DSA) scheme coupled with random access is one of the promising technologies for satisfying this demand in IoT networks. A DSA-aided IoT network uses the spectrum bandwidth more efficiently by exploiting underutilized spectrum left idle by incumbent users. This opportunity-aware spectrum access relies on spectrum sensing functionality, which requires high-performance components: an analog-to-digital converter, a signal processor, an RF unit, a single/double link structure, and so on. It is difficult to equip IoT devices with such a complex architecture because of their low-cost requirement. To overcome this spectrum sensing constraint, dedicated spectrum sensors support the DSA of IoT devices [1], [2].
Fig. 1: A system model of the Internet-of-Things (IoT) network utilizing the spectrum sensor-aided dynamic spectrum access framework. Spectrum sensors provide the spectrum usage of incumbent users, measured at each sensor location, to IoT devices, which utilize this information for spectrum access.
Spectrum sensors collect the aggregate interference at their positions and report the values to a central unit that controls the spectrum access of the IoT devices. Based on the collected interference information, the central unit determines the spectrum access probability of the IoT devices at a given moment. When IoT devices utilize the DSA scheme, the central unit must account for the harmful interference that IoT devices cause to other IoT devices as well as to the incumbent user. Unfortunately, the central unit may have no information about the locations of the IoT devices or of the incumbent user. To determine the spectrum access probability of the IoT devices, the central unit therefore needs to investigate the impact of each IoT device on the entire network.
In this context, the reinforcement learning (RL) technique is suitable for handling unknown information. An RL system learns an action that maximizes a numerical reward by interacting with a random environment iteratively [3]. RL has been applied to various fields of mobile communications: resource allocation for mobile cellular networks [4], [5], cell outage management for dense heterogeneous networks [6], and channel selection for D2D communications [7]. In the cognitive spectrum access area, [8] and [9] consider the dynamic multichannel access scenario for users performing an RL process. In [10] and [9], the cognitive radio system is designed for a single user. In contrast, this paper considers a dynamic spectrum access scenario for a large number of low-capability users while also enhancing spatial spectrum reusability.
Since the numerical reward is stochastic, the central unit has to repeat trials within an RL framework to find a proper access probability for the IoT devices. The rest of the paper is organized as follows. First, we present our system model and an optimization problem that maximizes the area spectral efficiency of the DSA IoT network in Sections II and III. Second, we introduce the reinforcement learning DSA procedure conducted by a central unit in Section IV. Finally, in Section V we investigate the performance of the RL based DSA system for various densities of STXs along the finalized learning step. We also provide a performance comparison between the proposed RL based DSA system and Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA), and we show the impact of the IoT network on the incumbent network with respect to average spectral efficiency.
II. SYSTEM MODEL
A. Network Model
Consider a DSA IoT network where transmitters communicate with receivers over an incumbent user network. We define the incumbent user network as the primary network and the IoT network as the secondary network. The N secondary transmitters (STXs) are governed by a central unit that determines their spectrum access probabilities from the aggregate interference obtained from the spectrum sensors. Each STX, which has infinite backlogged data to transmit, sends its packets to its paired secondary receiver (SRX) through a wireless channel. We denote the activation vector as a = [a_1, a_2, ..., a_N], where the component a_k is one if the kth STX is active and zero otherwise.

The STXs operate according to the value of the activation vector generated by the central unit or by the STX itself. Outside the learning process, each STX generates its own activation value, i.e., determines its packet transmission itself. We assume that the primary transmitters (PTXs) always transmit packets to their primary receivers (PRXs).
B. Channel Model
The transmitted signal experiences path-loss attenuation with exponent α as well as Rayleigh fading with unit mean, i.e., h ~ exp(1). The fading coefficient of the kth secondary pair is h_k. We assume an identical distance d between each paired STX and SRX. The primary and secondary networks share a common wireless channel, interfering with each other. When the kth STX transmits a packet, the signal-to-interference-plus-noise ratio (SINR) γ_k at the paired kth SRX is given by:

\gamma_k(\mathbf{a}) = \frac{P_2 h_k d^{-\alpha}}{\sum_{i \neq k} a_i P_2 h_{ik} d_{ik}^{-\alpha} + I_k + \sigma^2},   (1)

where I_k is the aggregate interference from the primary network at the kth SRX, h_{ik} is the fading coefficient from the ith STX to the kth SRX, d_{ik} is the distance between the ith STX and the kth SRX, P_2 denotes the transmit power of an STX, and σ^2 represents the thermal noise power.
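As a concrete illustration, the per-receiver SINR of Eq. (1) can be computed as follows. This is a minimal Python sketch; the fading matrix `h`, the cross-distance matrix `d_cross`, and the primary interference `I_primary` are assumed inputs, not quantities specified by the paper.

```python
import numpy as np

def sinr(k, a, h, d_pair, d_cross, I_primary, P2=0.2, alpha=3.5, noise=1e-13):
    """Compute the SINR of the k-th SRX per Eq. (1).

    a        : binary activation vector of the N STXs
    h        : N x N fading matrix with h[i, k] ~ Exp(1); h[k, k] is the pair's gain
    d_pair   : STX-SRX pair distance d (identical for all pairs)
    d_cross  : N x N matrix of distances d_ik from STX i to SRX k
    I_primary: aggregate interference I_k from the primary network at SRX k
    """
    signal = P2 * h[k, k] * d_pair ** (-alpha)
    cross = sum(a[i] * P2 * h[i, k] * d_cross[i, k] ** (-alpha)
                for i in range(len(a)) if i != k)
    return signal / (cross + I_primary + noise)
```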
III. PROBLEM FORMULATION
In the DSA IoT network, it is important to increase the number of concurrent transmissions while guaranteeing the quality of each individual transmission. To this end, our prime concern is to find the optimal access probability p* that maximizes the area spectral efficiency (ASE), defined as the sum of data rates per unit bandwidth in a unit area [11]. To find p*, we formulate the following optimization problem:

\mathbf{p}^* = \arg\max_{\mathbf{p}} \; \log_2(1+\beta) \cdot \mathbb{E}\Big[\sum_k \mathbf{1}_{\gamma_k \geq \beta}\Big],   (P1)
\text{s.t. } 0 \leq p_k \leq 1, \; \forall k = 1, \cdots, N,

where N denotes the number of STXs and β > 0 is the target SINR threshold of an SRX. Note that 1_{γ_k ≥ β} is an indicator function yielding one if γ_k ≥ β and zero otherwise. The term E[Σ_k 1_{γ_k ≥ β}] represents the average number of successful transmissions, so the objective function represents the average ASE achieved under the access probabilities of the STXs.
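Since the objective of (P1) is an expectation over random activations and channels, it can be estimated by Monte Carlo averaging. A sketch, assuming a hypothetical `sinr_sample` helper that hides the channel model of Eq. (1):

```python
import numpy as np

def average_ase(p, sinr_sample, beta=2.0, area=2500.0, trials=1000, seed=0):
    """Monte Carlo estimate of the (P1) objective: log2(1+beta) times the
    average number of successful transmissions, normalized per unit area.

    p           : access probability vector of the N STXs
    sinr_sample : callable(a) -> array of per-SRX SINRs for activation a
                  (hypothetical helper standing in for Eq. (1))
    """
    rng = np.random.default_rng(seed)
    successes = 0.0
    for _ in range(trials):
        a = (rng.random(len(p)) < p).astype(int)      # Bernoulli(p) activations
        successes += np.sum(sinr_sample(a) >= beta)   # indicator 1{gamma_k >= beta}
    return np.log2(1.0 + beta) * successes / trials / area
```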
The challenge in finding the optimal access probability of the STXs arises from the stochastic nature of communication channels. The central unit knows neither the locations of the STXs nor those of the PTXs. Given this lack of global information, including the channel coefficients, the problem becomes intractable. The stochastic nature of communication channels is effectively handled by evaluating the objective function repeatedly, which is a key property of the RL framework [12]. Therefore, we propose an RL based DSA system suitable for this stochastic environment.
IV. REINFORCEMENT LEARNING BASED DYNAMIC SPECTRUM ACCESS SYSTEM
In this section we introduce the learning operation of the sensor-aided DSA IoT network, which is performed by the central unit. We first describe the operation of the central unit and then the learning procedure of the proposed RL based DSA system.

A. The Central Unit

Consider a central unit that interacts with the environment along time steps denoted by t, as presented in Fig. 2. The central unit takes an action by controlling the spectrum access of the STXs, and gathers the transmission results of the secondary pairs. From these results, the central unit calculates the benefit of each STX's transmission. The central unit then adjusts the access probabilities of the STXs toward values that produce a higher ASE. This procedure is based on the REINFORCE learning
Fig. 2: The reinforcement learning system. The central unit interacts with the wireless network by controlling the spectrum access of the secondary transmitters.
algorithm [12], which has an advantage in handling stochastic environments. We now describe the learning model.
In every time step, the central unit produces the activation vector a(t) = [a_1(t), a_2(t), ..., a_N(t)], whose components are Bernoulli random variables. The probabilities of these random variables are given by the access probability vector p(t) = [p_1(t), p_2(t), ..., p_N(t)], i.e., a(t) = Bernoulli(p(t)). The access probability vector p(t) is related to the internal state vector w(t) by means of the sigmoid function:

p_k(t) = \frac{1}{1 + e^{-w_k(t)}}, \quad t = 1, 2, \ldots   (2)
The central unit finds the optimal access probability vector p* by learning the internal state vector w(t) according to the following update rules:

w_k(0) = \ln\frac{p_k(0)}{1 - p_k(0)},   (3)
w_k(t+1) = w_k(t) + \alpha(t)\{u(t) - \bar{u}(t)\} G_k(t),   (4)
G_k(t) = \frac{(-1)^{a_k(t)+1}}{1 + e^{w_k(t)(-1)^{a_k(t)+1}}},   (5)
\alpha(t+1) = \alpha - \Delta \cdot t,   (6)

where α(t) is a learning rate that monotonically decreases by ∆, and G_k(t) is the gradient of the log-probability of the realized action a_k(t) with respect to w_k(t). The utility function u(t) and the average utility function ū(t) are defined as follows:

u(t) = \log_2(1+\beta) \cdot \sum_k \mathbf{1}_{\gamma_k \geq \beta}(\mathbf{a}(t)),   (7)
\bar{u}(t+1) = (1-\lambda)\bar{u}(t) + \lambda u(t), \quad 0 < \lambda \leq 1,   (8)

where λ is the proportion with which the utility value at timeslot t is incorporated into the average utility function. We define the baseline as the average utility function, which enables stable performance enhancement of the secondary network during the reinforcement learning procedure.
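One update of Eqs. (4), (5) and (8) can be sketched in a few lines of Python. The utility u(t) and the baseline ū(t) are supplied externally here; the default λ = 0.1 is an illustrative assumption, not a value taken from the paper.

```python
import numpy as np

def reinforce_step(w, a, u, u_bar, lr, lam=0.1):
    """One REINFORCE update of the internal state.

    w     : internal state vector w(t)
    a     : sampled 0/1 activation vector a(t)
    u     : utility u(t) from Eq. (7)
    u_bar : running baseline u_bar(t)
    lr    : learning rate alpha(t)
    Returns the updated (w, u_bar) and the new access probabilities of Eq. (2).
    """
    sign = (-1.0) ** (a + 1)                  # +1 if a_k = 1, -1 if a_k = 0
    G = sign / (1.0 + np.exp(w * sign))       # score-function gradient, Eq. (5)
    w_next = w + lr * (u - u_bar) * G         # Eq. (4)
    u_bar_next = (1 - lam) * u_bar + lam * u  # baseline update, Eq. (8)
    p_next = 1.0 / (1.0 + np.exp(-w_next))    # sigmoid, Eq. (2)
    return w_next, u_bar_next, p_next
```

Note how the sign term rewards an above-baseline utility by pushing p_k up when the STX transmitted (a_k = 1) and down when it stayed silent.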
B. Proposed Reinforcement Learning Procedure

The proposed reinforcement learning procedure comprises the following steps:
Fig. 3: Network topology where the network size is 50 m × 50 m and the number of secondary pairs is 20. We extensively simulate with the number of secondary pairs ranging from 20 to 70. The pentagrams represent spectrum sensors. We number the PTXs so as to distinguish them from each other. Unless otherwise mentioned, the locations of all PTXs, STXs, SRXs, and sensors are fixed.
Algorithm 1 Learning procedure of the central unit
1: Initialize p(0), w(0), a(0) with S_k, k = 1, 2, ..., N.
2: Inform a(0) to the STXs.
3: The STXs transmit according to a(0).
4: for t = 0 to T_l do
5:   Collect 1_{γ_k ≥ β}(a(t)), k = 1, 2, ..., N.
6:   Update u(t) according to Eq. (7).
7:   Update w(t+1) according to Eq. (4).
8:   Update ū(t+1) according to Eq. (8).
9:   Calculate p(t+1) according to Eq. (2).
10:  Produce a(t+1) = Bernoulli(p(t+1)).
11:  Inform a(t+1) to the STXs.
12:  The STXs transmit according to a(t+1).
13: end for
a) Spectrum sensing: In this period, the central unit collects the interference values used to calculate the initial access probabilities of the STXs. The sensor nearest to each STX reports its measured aggregate interference value to the central unit. We denote the aggregate interference measured by the kth sensor as S_k.
b) Initializing: The central unit initializes the access probability and internal state vectors. It determines the initial access probability p(0) of the STXs from the interference levels S_k measured by the sensors, k = 1, 2, ..., N. It then calculates the initial internal state values w(0) from p(0) according to the inverse of Eq. (2).
c) Access probability learning: In this period, the central unit interacts with the environment, i.e., the wireless network, utilizing the REINFORCE learning algorithm. The central unit controls the transmissions of the STXs along the time steps t = 1, 2, ..., T_l. After each transmission, the central unit collects the transmission results of the STXs to assess the benefit of each STX's transmission. The access probability learning procedure of the central unit is described in Algorithm 1. Note that Bernoulli(p) is one with probability p and zero otherwise.
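Putting steps a) through c) together, the loop of Algorithm 1 can be sketched as follows. The `utility` callable is a stand-in for collecting transmission results and evaluating Eq. (7), and the learning-rate decrement follows the α/(T_l + 100) setting of Table I; the λ value is an illustrative assumption.

```python
import numpy as np

def learn_access_probabilities(p0, utility, T_l, alpha0=0.025, lam=0.1, seed=0):
    """Sketch of Algorithm 1: learn the access probabilities over T_l steps.

    p0      : initial access probabilities derived from sensed interference
    utility : callable(a) -> scalar u(t), an assumed stand-in for Eq. (7)
    T_l     : number of learning steps
    """
    rng = np.random.default_rng(seed)
    w = np.log(p0 / (1.0 - p0))                     # inverse of the sigmoid in Eq. (2)
    p = p0.copy()
    u_bar = 0.0
    delta = alpha0 / (T_l + 100)                    # decrement from Table I
    for t in range(T_l):
        a = (rng.random(len(p)) < p).astype(int)    # produce a(t) = Bernoulli(p(t))
        u = utility(a)                              # collect transmission results
        sign = (-1.0) ** (a + 1)
        G = sign / (1.0 + np.exp(w * sign))         # Eq. (5)
        lr = alpha0 - delta * t                     # decaying learning rate, Eq. (6)
        w = w + lr * (u - u_bar) * G                # Eq. (4)
        u_bar = (1 - lam) * u_bar + lam * u         # Eq. (8)
        p = 1.0 / (1.0 + np.exp(-w))                # Eq. (2)
    return p                                        # final probabilities p_l, Eq. (9)
```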
d) Determining the final access probability: After the REINFORCE learning algorithm terminates, the central unit determines the final access probability vector p_l of the STXs as follows:

p_l = p(T_l).   (9)

Note that the access probabilities of the STXs converge to one or zero for large enough T_l. The proof of this convergence behavior is presented in [12]. We utilize this behavior to determine the access probabilities of the STXs in the learning procedure. With the final access probability, the kth STX transmits its packets according to a slotted ALOHA protocol with access probability p_{l,k}.
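The resulting slotted ALOHA operation can be sketched as follows. The `has_packet` queue flags are an illustrative assumption for generality; the paper itself assumes infinite backlogged data, i.e., all flags set to one.

```python
import numpy as np

def slotted_aloha(p_final, n_slots, has_packet, seed=0):
    """Slotted ALOHA with the learned final access probabilities p_l.

    Each STX with a queued packet transmits in a slot independently with its
    learned probability p_{l,k}. Returns an n_slots x N 0/1 decision matrix.
    """
    rng = np.random.default_rng(seed)
    N = len(p_final)
    decisions = np.zeros((n_slots, N), dtype=int)
    for t in range(n_slots):
        draw = rng.random(N) < p_final            # Bernoulli(p_l) per STX
        decisions[t] = draw.astype(int) * has_packet
    return decisions
```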
In the next section, we evaluate the performance of the
proposed spectrum access procedure.
V. SIMULATION RESULTS
A. General Settings
Consider a network within a square of side length L_net = 50 m (Fig. 3). The STXs and sensors are distributed randomly in the network with densities λ_s and λ_ssr, respectively. Unless otherwise mentioned, the sensor density λ_ssr is 0.24 in all simulation cases. The pair distance d of the secondary users is 3 m. We assume that the secondary users use a QAM constellation, i.e., the target SINR threshold of an SRX is 3 dB. Table I summarizes the simulation parameters. As shown in Fig. 3, the locations and number of PTXs are the same in all simulations. We obtain the average ASE of the secondary users by executing slotted ALOHA transmissions of the STX-SRX pairs with the final access probabilities, repeated 100,000 times.
We compare the performance of the proposed RL based DSA system against a reference system with respect to average ASE. We implement the reference system as the Carrier Sense Multiple Access with Collision Avoidance (CSMA/CA) protocol, which is based on Clear Channel Assessment (CCA) and a random back-off mechanism. We assume that the CSMA/CA protocol has a fixed carrier sensing range of double the pair distance d, i.e., 6 m, as a conventional setting [13].

The initial access probability of the STXs is determined by means of the sense-and-predict (SaP) method presented in [14], [15]. The interference level at an STX differs from that at the sensors due to the difference in location. When sensor values are used to obtain the spectrum access probabilities of the STXs, this difference in interference level hinders an accurate decision. The SaP method aims to overcome this difference by predicting the interference level at each STX using the spatial correlation of interference. With the predicted interference
Fig. 4: Final access probabilities when the density of STXs is 0.004. Each line denotes the transition of the final access probability of an STX along T_l. The STXs located in sparse areas that are free from the impact of PTXs have near-to-one final access probabilities; these STXs capture spatial transmission opportunities.
level, the SaP method provides the successful transmission probability of each STX by means of a stochastic geometry approach. We refer to this successful transmission probability as OP.
TABLE I: Simulation parameters

Parameters | Values
Network size (L_net × L_net) | 50 m × 50 m
STX node density (λ_s) | 0.008, 0.012, ..., 0.028
Spectrum sensor node density (λ_ssr) | 0.24
Distance between STX-SRX pair (d) | 3 m
Target SINR threshold (β) | 3 dB
Transmit power of STX (P_2) | 23 dBm
Transmit power of PTX (P_1) | 43 dBm
T_l | 500, 1000, ..., 15000
Initial learning rate (α) | 0.025
Initial access probability (p(0)) | OP [14]
∆ | α/(T_l + 100)
Carrier sensing range for CSMA/CA | 6 m
Distance between PTX and PRX | 10 m
B. Average Area Spectral Efficiency Validation
In Fig. 5, we evaluate the performance of the RL based DSA system for various STX densities. As the density of STXs increases, the average ASE achieved after the learning procedure improves. This is because more STXs are located in sparse areas far away from the PTXs, as seen in Fig. 3. These STXs need only consider the other STXs around them, so more STXs can communicate freely with their paired SRXs. After T_l reaches a value of around 5,000, the performance enhancement slows down in every simulation case. We therefore need to choose a
Fig. 5: The average area spectral efficiency for various STX densities. The higher the density of STXs, the larger the performance gain from the learning procedure.
proper T_l as a trade-off between learning cost and performance gain.
In Fig. 6, the performance of the RL based DSA system surpasses that of CSMA/CA, as mentioned before. If the STXs utilize the CSMA/CA protocol, STXs located in sparse areas lose transmission opportunities while waiting for the CCA procedure. In the RL based DSA system, STXs located in sparse areas have final access probabilities close to 1 (see Fig. 4), which captures spatial transmission opportunities [16]. The STXs with near-to-one final access probabilities transmit their packets whenever a transmission is needed. Therefore, the performance of the RL based DSA system surpasses that of the CSMA/CA protocol.
C. Investigating the Impact on the Primary Network

With the slotted ALOHA protocol utilizing the final access probabilities obtained by the RL based DSA system, the secondary network has only a marginal impact on the primary network, as seen in Fig. 7. Nevertheless, as T_l gets larger, the average spectral efficiency of the primary user slightly deteriorates. As discussed in the previous section, a modest value of T_l yields sufficient performance enhancement of the secondary network. Therefore, selecting a proper value of T_l may help enhance the aggregate performance across the primary and secondary networks, which can be optimized in future work.
VI. CONCLUSION
In this paper, we proposed a reinforcement learning (RL) based dynamic spectrum access (DSA) system that aims at efficient spectrum usage for Internet-of-Things (IoT) networks. We presented the limitations of IoT devices and described the architecture of the sensor-aided DSA IoT network. The main objective of the RL based DSA system is to enhance the spatial spectrum reusability of the IoT network, which is evaluated by the
Fig. 6: Performance comparison between the proposed RL based DSA system and CSMA/CA. The final access probabilities at T_l = 10,000 are used for obtaining the performance of the proposed RL based DSA system.
Fig. 7: Average spectral efficiencies of the PTXs when the density of STXs is 0.004, i.e., the number of STXs is 20. The distance between PTXs and PRXs is 10 m.
average area spectral efficiency (ASE). We investigated the performance of the proposed RL based DSA system for various densities of IoT devices. The longer the learning period, the more the IoT network's performance improves, but the gain becomes marginal beyond a certain learning period. We showed that the impact of the IoT network on the incumbent network is marginal with respect to the incumbent network's average spectral efficiency, although a longer learning period causes slight performance degradation of the incumbent network. Selecting a proper learning period, so as to reduce both the impact on the incumbent network and the learning cost spent on marginal performance gains of the IoT network, remains a future research topic. In addition, a system architecture without a dedicated control channel must be considered to implement the RL based DSA system in real wireless communication networks.
ACKNOWLEDGEMENT
This work was partly supported by Institute for Information
& communications Technology Planning & Evaluation (IITP)
grant funded by the Korea government (MSIT) (No. 2018-0-
00923, Scalable Spectrum Sensing for Beyond 5G Communi-
cation) and IITP grant funded by the MSIT (No.2018-0-00170,
Virtual Presence in Moving Objects through 5G).
REFERENCES
[1] G. A. Akpakwu, B. J. Silva, G. P. Hancke, and A. M. Abu-Mahfouz, "A survey on 5G networks for the Internet of Things: communication technologies and challenges," IEEE Access, vol. 6, pp. 3619–3647, Dec. 2017.
[2] Z. Zhang, W. Zhang, S. Zeadally, Y. Wang, and Y. Liu, "Cognitive radio spectrum sensing framework based on multi-agent architecture for 5G networks," IEEE Wireless Communications, vol. 22, no. 6, pp. 34–39, Dec. 2015.
[3] R. S. Sutton and A. G. Barto, “Reinforcement learning: an introduction,”
Cambridge, MA: MIT Press, Mar. 1998.
[4] G. Alnwaimi, S. Vahid, and K. Moessner, "Dynamic heterogeneous learning games for opportunistic access in LTE-based macro/femtocell deployments," IEEE Transactions on Wireless Communications, vol. 14, no. 4, pp. 2294–2308, Apr. 2015.
[5] F. Bernardo, R. Agust, J. Perez-Romero, and O. Sallent, “An application
of reinforcement learning for efficient spectrum usage in next-generation
mobile cellular networks,” IEEE Transactions on Systems, Man, and
Cybernetics-PART C: Applications and Reviews, vol. 40, no. 4, pp. 477–
484, Jul. 2010.
[6] O. Onireti, A. Zoha, J. Moysen, A. Imran, L. Giupponi, M. A. Imran,
and A. Abu-Dayya, “A cell outage management framework for dense
heterogeneous networks,” IEEE Transactions on Vehicular Technology,
vol. 65, no. 4, pp. 2097–2113, Apr. 2016.
[7] S. Maghsudi and S. Stanczak, "Channel selection for network-assisted D2D communication via no-regret bandit learning with calibrated forecasting," IEEE Transactions on Wireless Communications, vol. 14, no. 3, pp. 1309–1322, Mar. 2015.
[8] R. Bonnefoi, L. Besson, C. Moy, E. Kaufmann, and J. Palicot, "Multi-armed bandit learning in IoT networks: learning helps even in non-stationary settings," in Proc. 12th EAI International Conference on Cognitive Radio Oriented Wireless Networks (CROWNCOM) 2017, Lisbon, Portugal, pp. 173–185, Feb. 2018.
[9] S. Wang, H. Liu, P. H. Gomes, and B. Krishnamachari, “Deep reinforce-
ment learning for dynamic multichannel access in wireless networks,”
IEEE Transactions on Cognitive Communications and Networking,
vol. 4, no. 2, pp. 257–264, Jun. 2018.
[10] V. Raj, I. Dias, T. Tholeti, and S. Kalyani, “Spectrum access in cognitive
radio using a two stage reinforcement learning approach,” IEEE Journal
of Selected Topics in Signal Processing, vol. 12, no. 1, pp. 20–34, Jan.
2018.
[11] D. M. Kim and S. L. Kim, “Exploiting regional differences: A spatially
adaptive random access,” IEEE Transaction on Wireless Communica-
tions, vol. 14, no. 8, pp. 4342–4352, Aug. 2015.
[12] V. V. Phansalkar and M. A. L. Thathachar, "Local and global optimization algorithms for generalized learning automata," Neural Computation, vol. 7, no. 5, pp. 950–973, Sep. 1995.
[13] K. Xu, M. Gerla, and S. Bae, "How effective is the IEEE 802.11 RTS/CTS handshake in ad hoc networks," in Proc. IEEE Global Telecommunications Conference, Taipei, Taiwan, Nov. 2002.
[14] J. Kim, S. W. Ko, H. Cha, and S. L. Kim, "Sense-and-predict: opportunistic MAC based on spatial interference correlation for cognitive radio networks," in Proc. 2017 IEEE International Symposium on Dynamic Spectrum Access Networks (DySPAN), Baltimore, MD, USA, Mar. 2017.
[15] ——, “Testbed verification of spectrum access opportunity detection in
cognitive radio networks,” in Proc. IEEE Asia-Pacific Conference on
Communications (APCC), Perth, Australia, Dec. 2017.
[16] T. Novlan, J. D. Matyjas, B. L. Ng, and J. Zhang, “Spatial spectrum
sensing-based device-to-device cellular networks,” IEEE Transactions on
Wireless Communications, vol. 15, no. 11, pp. 7299–7313, Nov. 2016.