Improved Decentralized Q-learning Algorithm for
Interference Reduction in LTE-Femtocells
Meryem Simsek, Andreas Czylwik
Department of Communication Systems
University of Duisburg-Essen
Bismarckstrasse 81, 47057 Duisburg, Germany
Email: {simsek,czylwik}@nts.uni-due.de
Ana Galindo-Serrano, Lorenza Giupponi
Centre Tecnològic de Telecomunicacions de Catalunya (CTTC)
Barcelona, Spain 08860
Email: {ana.maria.galindo, lorenza.giupponi}@cttc.es
Abstract—Femtocells are receiving considerable interest in mobile communications as a strategy to overcome indoor coverage problems as well as to improve the efficiency of current macrocell systems. Nevertheless, the detrimental factor in such networks is co-channel interference between macrocells and femtocells, as well as among neighboring femtocells, which can dramatically decrease the overall capacity of the network. In this paper we propose a Reinforcement Learning (RL) framework, based on an improved decentralized Q-learning algorithm, for femtocells sharing the macrocell spectrum. Since the major drawback of Q-learning is its slow convergence, we propose a smart initialization procedure. The proposed algorithm is compared with a basic Q-learning algorithm and with power control (PC) algorithms from the literature, e.g., fixed power allocation and received power based PC. The goal is to show the performance improvement and the enhanced convergence.
Index Terms—Femtocell system, interference management,
multi-agent system, decentralized Q-learning.
I. INTRODUCTION
The next generation mobile network (NGMN) aims to efficiently deploy low cost and low power cellular base stations (BSs), known as femtocells, in the subscriber's home environment. NGMN aims to eliminate dead spots in homes or offices and to let multiple users efficiently use limited frequency resources by providing a better wireless environment that enables high capacity data transmission services. However, for co-channel and closed access femtocell deployments, mitigating the interference caused by femtocells to the existing macrocellular network is a major concern. This interference cannot be fully avoided, but it should be reduced as much as possible.
A number of different deployment configurations have been considered for femtocells [2]. Corresponding scenarios are, for example, open or closed access, dedicated or co-channel deployment, and fixed or adaptive downlink transmit power. Especially closed access femtocells which are deployed on the same channel as the macro network are considered the worst case interference scenario. A key requirement for co-channel femtocell deployment is to keep the increase in interference caused by femtocells low enough to ensure a low impact on the performance of the existing macrocellular network, while still ensuring enough transmit power for femto BSs to achieve the target coverage and services. Femtocells that use co-channel allocation with macrocells can considerably increase wireless coverage and system capacity, especially for indoor and cell-edge users. However, this benefit is realized only when the interference between femtocells and macrocells is well managed. Since the algorithm used to control the femtocell transmit power is left as an implementation detail, a variety of models have been analyzed. In the initial analysis in [3], all femto BSs transmit with equal maximum power. This improves indoor coverage while rapidly decreasing the macrocell performance. In [4], for example, the femto BS adjusts its maximum downlink transmit power as a function of air interface measurements to avoid interfering with macrocell user equipments (UEs). Examples of such measurements are the total received interference, the reference signal received power (RSRP) of the most dominant macro BS, etc. This scheme is open loop and will be referred to as the received power based power control (PC) algorithm in this paper. Further PC schemes can be found in [5–7].
Due to the selfish nature of femtocells and uncertainty on
their number and locations, self-organization techniques are
needed. Self-organization will allow femtocells to integrate
themselves into the network of the operator, learn about
their environment (neighbouring cells, interference) and tune
their parameters (power, frequency) accordingly. As a result,
distributed interference management was considered in [8] by
using a powerful learning technique known as Reinforcement
Learning (RL). Here, Q-learning is applied to the distributed
femtocell setting in the form of decentralized Q-learning.
RL [9; 10] describes a learning scenario in which an agent tries to improve its behavior by taking actions in its environment and receiving a reward for performing well or a punishment for failure. Multi-agent RL has been applied in many fields, such as artificial intelligence, to solve multi-agent coordination and collaboration problems, since it is a promising approach for establishing autonomous agents that improve their performance with experience. A fundamental problem of its standard algorithm is that, although many tasks can asymptotically be learned by adopting the Markov Decision Process (MDP) framework, in practice they are not solvable in a reasonable amount of time.
Therefore, in this paper we show an improvement of the Q-learning algorithm presented in [8] by introducing a new initialization method, which leads to enhanced convergence. Based on the Long Term Evolution (LTE) femtocell system level simulation environment we have presented in [11], we show the performance of our proposed algorithm. We compare our results with the performance of algorithms from the literature, such as those in [3], [4] and [8].
The paper is organized as follows: In Section II, we summarize well-known PC algorithms in order to introduce our proposed improved decentralized Q-learning algorithm in Section III. In Section IV, we describe our simulation environment and discuss the simulation results. Finally, we conclude the paper in Section V.
II. POWER CONTROL ALGORITHMS FOR FEMTOCELLS
In this section we describe some PC algorithms that have been proposed for femtocells and that we will use for performance comparison with our proposed algorithm.
We consider K macrocells, where m_K macro UEs are randomly located inside the macro coverage area. The macrocells are deployed in an urban area and coexist with L femtocells. Each femtocell provides service to its m_L associated femto UEs. We consider that the total bandwidth BW is divided into subchannels with bandwidth Δf = 15 kHz. Orthogonal frequency division multiplexing (OFDM) symbols are grouped into resource blocks (RBs). Both macrocells and femtocells operate in the same frequency band and have the same amount R of available RBs. We consider proportional fair scheduling, in which all RBs are allocated to UEs. In this paper we focus on the downlink operation. For simplicity we neglect the time index in the following algorithms.
We denote by p_r^{k,M} and p_r^{l,F} the downlink transmit power of macro BS k and femto BS l in RB r, respectively. The maximum transmit powers of macro and femto BSs are p_max^M and p_max^F, respectively. In the following, transmit powers are denoted by p and the corresponding power levels in dBm by P. Signal-to-interference-plus-noise power ratios (SINR) are denoted by γ and the corresponding values in dB by Γ.
A. Fixed Power Allocation
The fixed power allocation method is the most basic and common power allocation scheme, in which the total transmit power of each BS is equally divided among the subcarriers of the system. Assuming there are 12 subcarriers per RB, the transmit power per RB is:
$$p_r^{k,M} = 12 \cdot \frac{p_{\max}^{M}}{12 \cdot R} \quad \text{and} \quad p_r^{l,F} = 12 \cdot \frac{p_{\max}^{F}}{12 \cdot R}. \qquad (1)$$
Due to the usage of the maximum transmit power, this scheme improves the femto UE throughput with the drawback of interfering with the macro UEs. The macrocellular performance is therefore expected to be reduced.
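As a small illustration of (1), the per-RB transmit power can be computed directly from the maximum transmit power and the number of RBs; the factor of 12 subcarriers per RB cancels out. The following Python sketch is not part of the paper; the function name and the example values (a 20 dBm femto BS with R = 6 RBs, matching the 1.4 MHz bandwidth used later) are ours.

```python
import math

def fixed_power_per_rb_dbm(p_max_dbm, num_rbs):
    """Fixed power allocation of (1): split the maximum transmit power
    equally over all RBs; the 12 subcarriers per RB cancel out."""
    p_max_mw = 10 ** (p_max_dbm / 10.0)            # dBm -> mW
    p_rb_mw = 12.0 * p_max_mw / (12.0 * num_rbs)   # = p_max / R
    return 10.0 * math.log10(p_rb_mw)              # mW -> dBm

# Example: a femto BS with 20 dBm maximum power and R = 6 RBs
print(round(fixed_power_per_rb_dbm(20.0, 6), 1))   # ~12.2 dBm per RB
```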
B. Received Power Based Power Control Algorithm
To address the interference management in heterogeneous
networks with co-channel deployment of macro and femto
cells a received power based PC algorithm was proposed in
[4]. In this algorithm the femto BS l adjusts its maximum
Algorithm 1 Decentralized Q-learning.
Initialize:
  for each s ∈ S, a ∈ A do
    initialize the Q-value representation mechanism Q^l(s, a)
  end for
  evaluate the starting state s ∈ S
Learning:
  loop
    generate a random number r between 0 and 1
    if (r < ε) then
      select an action a ∈ A randomly
    else
      select the action a ∈ A with the minimum Q-value
    end if
    execute a
    receive an immediate cost c
    observe the next state s'
    update the table entry as follows:
      Q^l(s, a) ← (1 − α) Q^l(s, a) + α [c + λ min_{a'} Q^l(s', a')]
    s ← s'
  end loop
In this algorithm, the femto BS l adjusts its maximum downlink transmit power as a function of air interface measurements and sets it according to:

$$P_{\max}^{l,F} = \max\left[\min\left(\alpha \cdot P_{m}^{l} + \beta,\; P_{\max}^{l,F}\right),\; P_{\min}^{l,F}\right], \qquad (2)$$
where P_min^{l,F} is the minimum transmit power level of femto BS l, P_m^l is the received power level from the strongest co-channel macro BS, α = 0.8 and β = 40 dB. The parameter α is a linear scalar that allows altering the slope of the power control mapping curve and the adjustment to different macrocell sizes; β is a parameter expressed in dB that can be used for altering the exact range of P_m^l covered by the dynamic range of the power control. This PC algorithm is open-loop and is promising in allowing an adequate femtocell coverage area without causing significant performance degradation to the macrocells. However, only the maximum transmit power is adapted; no RB-based power adaptation is considered.
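To make the mapping in (2) concrete, the sketch below evaluates it for a single femto BS. It is only an illustration under our own assumptions: the function and parameter names are ours, and the minimum power level P_min^{l,F}, which is not specified numerically above, is set to an arbitrary −10 dBm.

```python
def received_power_based_pc(p_m_dbm, p_max_dbm=20.0, p_min_dbm=-10.0,
                            alpha=0.8, beta_db=40.0):
    """Open-loop femto power cap of (2):
    P_max^{l,F} = max( min(alpha * P_m^l + beta, P_max^{l,F}), P_min^{l,F} ),
    where p_m_dbm is the received power level from the strongest
    co-channel macro BS and p_min_dbm is an assumed value."""
    return max(min(alpha * p_m_dbm + beta_db, p_max_dbm), p_min_dbm)

# Example: the strongest macro BS is received at -60 dBm, so the femto BS
# caps its maximum transmit power at 0.8 * (-60) + 40 = -8 dBm.
print(received_power_based_pc(-60.0))
```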
C. Basic Decentralized Q-learning Algorithm
In the Q-learning algorithm [14], agents learn based on the state of the environment and a cost value. The agents learn by taking actions and using feedback from the environment. The Q-value Q(s, a) in Q-learning is an estimate of the value of future costs if the agent takes a particular action a when it is in a particular state s. By exploring the environment, the agents create a table of Q-values for each state and each possible action. Except when making an exploratory move in the case of an ε-greedy policy, the agents select the action with the minimum Q-value.
The Q-learning algorithm with an ε-greedy policy has three parameters: the learning rate α (0 ≤ α ≤ 1), the discount factor λ (0 ≤ λ ≤ 1) and the ε-greedy parameter, which is usually very small (0.01 ≤ ε ≤ 0.05). The learning rate parameter limits how quickly learning can occur; it controls how quickly the Q-values can change with each state/action transition. If the learning rate is too small, learning will occur very slowly. If the rate is too high, the algorithm might not converge. The discount factor controls the value placed on future costs [9]. If the value is low, immediate costs are optimized, while values closer to 1 cause the learning algorithm to more strongly count future costs. The value of ε is the probability of taking a non-greedy (exploratory) action in the ε-greedy action selection method. A non-zero value of ε ensures that all state/action pairs will be explored as the number of trials goes to infinity. If ε = 0 the algorithm might miss optimal solutions.
The distributed femtocell scenario can be mathematically formulated by means of a stochastic game. We design our basic decentralized Q-learning based on [8]. Let S = {s_{r,1}, s_{r,2}, ..., s_{r,n}} be the set of possible states, and A = {a_{r,1}, a_{r,2}, ..., a_{r,m}} the set of possible actions that each femto BS l may choose with respect to RB r. The interactions between the multi-agent system and the environment at each time instant t corresponding to RB r consist of the following sequence:
- The agent l senses the state s_r^l = s ∈ S.
- Based on s, agent l selects an action a_r^l = a ∈ A.
- As a result, the environment makes a transition to the new state s' ∈ S.
- The transition to the state s' generates a cost c_r^l = c ∈ ℝ for agent l.
- The cost c is fed back to the agent and the process is repeated.
A summary of the Q-learning procedure is given in Algorithm 1.
Within our system model, the SINR at macro UE m allocated in RB r of macrocell k at time t is:

$$\gamma_r^m = \frac{p_r^{k(m),M}\, h_r^{k,m,MM}}{\underbrace{\sum_{j=1,\, j\neq k}^{K} p_r^{j(m),M}\, h_r^{j,m,MM}}_{I_M} + \underbrace{\sum_{l=1}^{L} p_r^{l,F}\, h_r^{l,m,FM}}_{I_F} + \sigma^2}. \qquad (3)$$

Here h_r^{k,m,MM} indicates the link gain between the transmitting macro BS k and its macro UE m; h_r^{j,m,MM} indicates the link gain between the transmitting macro BS j and macro UE m in the macrocell of BS k; h_r^{l,m,FM} indicates the link gain between the transmitting femto BS l and macro UE m of macrocell k; σ² is the noise power. I_M and I_F are the interferences caused by the macro BSs and the femto BSs, respectively.
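A minimal sketch of (3) in linear units is given below; the function and argument names are ours, and the conversion to dB (Γ = 10 log₁₀ γ) is left to the caller.

```python
import numpy as np

def macro_ue_sinr(p_serv, h_serv, p_macro, h_macro, p_femto, h_femto, noise):
    """SINR of (3) at macro UE m on RB r (all quantities in linear scale).

    p_serv, h_serv  : transmit power and link gain of the serving macro BS k
    p_macro, h_macro: powers and link gains of the K-1 interfering macro BSs
    p_femto, h_femto: powers and link gains of the L interfering femto BSs
    noise           : noise power sigma^2
    """
    i_macro = float(np.dot(p_macro, h_macro))   # I_M
    i_femto = float(np.dot(p_femto, h_femto))   # I_F
    return (p_serv * h_serv) / (i_macro + i_femto + noise)
```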
In the following, we sum up the Q-learning algorithm as it was proposed in [8].

State: At time t, for femtocell l and RB r, the state is defined as:

s_r^l = {I_r, P_tot^l},

where I_r specifies the level of aggregated interference generated by the femtocell system in RB r. The set of possible values is based on:

$$I_r = \begin{cases} 0, & \text{if } \Gamma_r^m < \Gamma_{\text{target}} - 2\ \text{dB} \\ 1, & \text{if } \Gamma_{\text{target}} - 2\ \text{dB} \le \Gamma_r^m \le \Gamma_{\text{target}} + 2\ \text{dB} \\ 2, & \text{otherwise,} \end{cases}$$

where Γ_r^m is the instantaneous SINR measured at macro user m for RB r and Γ_target = 20 dB represents the minimum value of SINR that can be perceived by the macro users. P_tot^l = Σ_{r=1}^{R} P_r^l denotes the total transmit power of femtocell l in all RBs at time t. The set of possible values is based on:

$$P_{\text{tot}}^l = \begin{cases} 0, & \text{if } P_{\text{tot}}^l < P_{\max} - 6\ \text{dBm} \\ 1, & \text{if } P_{\max} - 6\ \text{dBm} \le P_{\text{tot}}^l \le P_{\max} + 6\ \text{dBm} \\ 2, & \text{otherwise,} \end{cases}$$

where P_max = 20 dBm is the maximum transmit power that a femto BS can transmit.
Action: The set of possible actions consists of the 60 power levels that a femto BS can assign to RB r. These power levels range from −80 to 20 dBm effective radiated power (ERP), with 1 dBm granularity between 0 dBm and 20 dBm, 2 dBm granularity between −40 dBm and 0 dBm, and 4 dBm granularity between −80 dBm and −40 dBm.
Cost: The cost c_r^l incurred due to the assignment of action a in state s for femtocell l is:

$$c_r^l = \begin{cases} 500, & \text{if } P_{\text{tot}}^l > P_{\max} \\ (\Gamma_r^m - \Gamma_{\text{target}})^2, & \text{otherwise,} \end{cases}$$

where Γ_r^m is the instantaneous SINR value measured at macro UE m allocated at RB r. The rationale behind this cost function is that the total transmit power of each femtocell must not exceed the allowed value P_max, and the SINR at the macro UE m should be close to the selected target Γ_target.
With respect to the Q-learning algorithm, the learning rate
is α = 0.5 and the discount factor is λ = 0.9.
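For concreteness, the following Python sketch mirrors one femto BS agent of the basic algorithm: the state quantization of I_r and P_tot^l, the cost function, ε-greedy action selection and the cost-minimizing update of Algorithm 1. It is only a sketch under our own assumptions; the class and function names are ours, the ε value is picked from the range quoted above, a zero-initialized dictionary stands in for the Q-value representation mechanism, and the interaction with the environment (measuring Γ_r^m and P_tot^l) is omitted.

```python
import random
from collections import defaultdict

# Parameters as stated above for the basic algorithm (epsilon is an assumption).
ALPHA, LAMBDA, EPSILON = 0.5, 0.9, 0.05
GAMMA_TARGET_DB = 20.0
P_MAX_DBM = 20.0

def interference_state(gamma_m_db):
    """I_r component of the state: macro UE SINR quantized around the target."""
    if gamma_m_db < GAMMA_TARGET_DB - 2.0:
        return 0
    if gamma_m_db <= GAMMA_TARGET_DB + 2.0:
        return 1
    return 2

def power_state(p_tot_dbm):
    """P_tot^l component of the state: quantized total femto transmit power."""
    if p_tot_dbm < P_MAX_DBM - 6.0:
        return 0
    if p_tot_dbm <= P_MAX_DBM + 6.0:
        return 1
    return 2

def cost(gamma_m_db, p_tot_dbm):
    """Cost of the basic algorithm: heavy penalty above P_max, otherwise the
    squared deviation of the macro UE SINR from the target."""
    if p_tot_dbm > P_MAX_DBM:
        return 500.0
    return (gamma_m_db - GAMMA_TARGET_DB) ** 2

class BasicQAgent:
    """One femto BS/RB agent running the loop of Algorithm 1 (cost minimization)."""
    def __init__(self, actions):
        self.actions = actions        # candidate power levels in dBm
        self.q = defaultdict(float)   # Q[(state, action)], zero-initialized

    def select_action(self, state):
        if random.random() < EPSILON:                 # exploratory move
            return random.choice(self.actions)
        return min(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, c, next_state):
        best_next = min(self.q[(next_state, a)] for a in self.actions)
        self.q[(state, action)] = ((1.0 - ALPHA) * self.q[(state, action)]
                                   + ALPHA * (c + LAMBDA * best_next))
```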
III. IMPROVED DECENTRALIZED Q-LEARNING
ALGORITHM
The interactions between the multi-agent system and the
environment at each time instant t, corresponding to RB r for
our improved decentralized Q-learning algorithm, consist of
the following elements:
State: In our improved decentralized Q-learning algorithm we define the state with a finer granularity than in the basic Q-learning algorithm. At time t, for femtocell l and RB r, the state is defined as the quantized value of the SINR:

$$s_r^l = \begin{cases} -10, & \text{if } \Gamma_r^m \le -8\ \text{dB} \\ -6, & \text{if } -8\ \text{dB} < \Gamma_r^m \le -4\ \text{dB} \\ -2, & \text{if } -4\ \text{dB} < \Gamma_r^m \le 0\ \text{dB} \\ \;\;\vdots & \\ 38, & \text{if } \Gamma_r^m > 40\ \text{dB.} \end{cases}$$
Actions: The set of possible actions consists of the 25 power levels that a femto BS can assign to RB r. These power levels range from −80 dBm to 20 dBm ERP with 4 dBm granularity, where 20 dBm is the maximum power that a femto BS can transmit. The quantized power levels that are available for femtocell l in RB r at each time instant t are accordingly defined as:

$$P_{r,Q}^{l,F} \in \{a_{r,1}, \ldots, a_{r,m}\} = \{-80, -76, -72, \ldots, 20\}\ \text{dBm}.$$

Notice that the set of actions has the same granularity as the state.
Cost: For the Q-value update after our proposed initialization algorithm, we use the same cost function as in the basic Q-learning algorithm, in order to show how the performance of Q-learning can be enhanced through the proposed initialization procedure. The cost function that we define for the initialization of the Q-table is introduced in the next subsection.
The major drawback of the Q-learning algorithm is its slow convergence: the learning approach requires all state-action pairs to be visited at least once (and preferably a considerable number of times) during the learning process in order to determine an optimal policy. In order to overcome this drawback, a new initialization procedure for the Q-learning algorithm is proposed, which shows a convergence/performance enhancement. When visiting a state for the first time, we do not only update the Q-value of a single state-action pair, but add estimates of the cost function for all other possible actions of the current state, so that the Q-table is initialized faster.

The major change in our algorithm is not the definition of different states and actions, but the way of initializing the Q-table, as stated before. Accordingly, the learning loop in Algorithm 1 is the same as in the basic decentralized Q-learning algorithm. In the following we therefore focus only on the initialization part.
A. Initialization of the Q-table
Fig. 1 illustrates the proposed initialization procedure of the Q-table. The x-y plane in Fig. 1 represents the actions and the states, where each (x, y) point is a state-action pair. The target SINR, Γ_target, that can be perceived by a macro UE is assumed to be 20 dB. This corresponds to state 18, as shown in Fig. 1.
Assuming that the instantaneous SINR Γ_r^m of macro UE m in RB r corresponds to state s_i in Fig. 1, action a_i is selected according to the Q-learning algorithm (see Algorithm 1). The difference Δs = s_i − 18 from state s_i to our target state 18 can be expressed as Γ_r^m − Γ_target.
Fig. 1. Description of Q-table initialization. (Axes: action [dBm] vs. state [dB]; the figure marks the quadratic cost (Γ − Γ_target)², the target Γ_target, the visited pair (s_i, a_i), the differences Δs and Δa, and the resulting action a_new.)

The interference at macro UE m in RB r is caused by both I_M and I_F. Since we assume a fixed power distribution at the macro BSs, this interference can be reduced by controlling the transmit power at the femto BSs. Increasing the transmit power of the femto BS in RB r will reduce the SINR and lead to a state closer to state 18, while at the same time better coverage and
services at femtocells are obtained. This power increase is illustrated in Fig. 1 as Δa. Neglecting inter-cell interference and additive noise, the optimum action is to increase the femtocell transmit power level by Δs. Therefore, the cost function, which has to be assigned to each state-action pair, has to be minimal for the action which already includes the transmit power level increase by Δs.
The obtained cost value for state s_i and selected action a_i is given by

$$c_r^l\big(P_{r,Q}^{l,F} = a_i\big) = \big(\Gamma_r^m - \Gamma_{\text{target}}\big)^2. \qquad (4)$$
Because of the quadratic cost function in (4), a quadratic increase of the initially estimated cost function with respect to the actions is assumed as well. This initial estimated cost function is depicted in Fig. 1.
The objective of finding an action a_new that obtains zero cost can be expressed as c_r^l(P_{r,Q}^{l,F} = a_new) = 0, where a_new is

a_new = a_i + Δa.
Accordingly, the new cost function can be expressed as

$$c_{r,\text{new}}^l\big(P_{r,Q}^{l,F}\big) = \big(P_{r,Q}^{l,F} - a_{\text{new}}\big)^2 = \big(P_{r,Q}^{l,F} - a_i - (\Gamma_r^m - \Gamma_{\text{target}})\big)^2. \qquad (5)$$

Using this new cost function, the Q-table is filled in state s_i for each action a ∈ A, which corresponds to the quantized power level P_{r,Q}^{l,F} in (5).
B. Q-value Update after Initialization
In the following iterations of the Q-learning algorithm, the Q-values of unvisited states s ∈ S, s ≠ s_i, are initialized in the same way. For states that have been visited before, the basic Q-learning algorithm is used, i.e., the corresponding Q-value Q^l(s, a) is updated as shown in the learning loop of Algorithm 1.
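The initialization of Section III-A can be sketched in Python as follows. The function names, the exact bin edges of the state quantization, and the example values are our own reading of the description above; the learning loop itself is unchanged with respect to Algorithm 1 and is therefore omitted.

```python
import math

GAMMA_TARGET_DB = 20.0
ACTIONS_DBM = list(range(-80, 21, 4))   # quantized femto power levels in dBm (ERP)

def quantized_state(gamma_m_db):
    """State of the improved algorithm: the macro UE SINR quantized into 4 dB
    bins labelled ..., -6, -2, 2, ..., so that the 20 dB target falls into
    state 18 (bin edges are our reading of the quantization above)."""
    if gamma_m_db <= -8.0:
        return -10
    if gamma_m_db > 40.0:
        return 38
    return int(4 * math.ceil(gamma_m_db / 4.0) - 2)

def init_q_row(q_table, state, action_i, gamma_m_db):
    """Q-table initialization of Section III-A: on the first visit to `state`,
    with selected action a_i and measured SINR Gamma_r^m, estimate the cost (5)
    for *all* actions instead of updating a single state-action pair. The
    zero-cost action is a_new = a_i + (Gamma_r^m - Gamma_target)."""
    a_new = action_i + (gamma_m_db - GAMMA_TARGET_DB)
    for a in ACTIONS_DBM:
        q_table[(state, a)] = (a - a_new) ** 2    # c_new of (5)

# Example: first visit to the state of a macro UE measured at 28 dB SINR while
# the femto BS transmits at -40 dBm in this RB. The 8 dB headroom above the
# 20 dB target makes a_new = -32 dBm the cheapest action, i.e., the femto BS
# may raise its power by 8 dB while pushing the macro UE SINR towards the target.
q = {}
init_q_row(q, quantized_state(28.0), -40, 28.0)
print(min(q, key=q.get))   # -> (26, -32)
```

After this first-visit initialization, subsequent visits to the same state update Q^l(s, a) with the basic rule, as described above.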
IV. SIMULATION RESULTS AND DISCUSSION
An urban scenario is considered to validate the proposed Q-learning algorithm and compare its performance with algorithms from the literature. The simulation scenario is depicted in Fig. 2 and is based on the 3GPP TSG (Technical Specification Group) RAN (Radio Access Network) WG4 (Working Group 4) simulation assumptions and parameters that have been proposed in [2] and [13]. The simulation parameters are summarized in TABLE I.
Fig. 2. Simulation scenario (three macro sectors with macro UEs and femtocells, each femtocell serving one femto UE; distances in m).
TABLE I
SIMULATION PARAMETERS.

Parameter                               | Value
Cellular layout                         | Hexagonal grid, 3 sectors per site, reuse 1
Intersite distance                      | 500 m
Femtocell deployment scenario           | Dual-stripe model [12]
Number of sites                         | 1
Number of macro UEs per sector          | 3
Number of femto blocks (FB) per sector  | 1
Number of femto BSs per femto block     | 4
Number of femto UEs per femto BS        | 1
Carrier frequency                       | 2 GHz
System bandwidth                        | 1.4 MHz
Distance dependent path loss            | see [12]
Shadowing standard deviation            | 8 dB
Shadowing correlation                   | 0.5 between cells, 1 between sectors
Max. macro (femto) BS transmit power    | P_max^M = 46 dBm (P_max^F = 20 dBm)
Traffic model                           | Full buffer
Scheduling algorithm                    | Proportional fair
Macro UE speed                          | 3 km/h
Max. distance of macro UE to FB center  | 50 m
Fig. 3. Average throughput of femto UE for 20 times 50 ms (cumulative distribution of the average throughput per femto UE in bps/Hz for Fixed PC, Received Power Based PC, Q-learning Basic and Q-learning New).
System level simulations were performed with the LTE-femtocell system level simulator presented in [11]. For the dual-stripe deployment scenario, twenty random drops were simulated, each with simulation times of 50 ms and 500 ms, which correspond to 50 and 500 transmission time intervals (TTI), respectively. In each of the drops we force the macro UEs to be randomly distributed within a radius of 50 m from the femtocell block center in order to be placed within the coverage area of the femtocells.
For each PC algorithm the same random scenarios and chan-
nels were analyzed. Fig. 3 and Fig. 4 depict the cumulative
distribution function of average throughput after 50 TTIs of
each random scenario for femto and macro UEs, respectively,
for each of the presented PC algorithms. Since the curves for 50 and 500 TTIs show a similar behaviour, we only present results for the 50 TTI simulations. For macro UEs, additional simulations were performed
in the same deployment scenario, but without femtocells.
This curve will be used as a reference in order to show the
performance degradation in macrocells when femtocells are
activated as interferers.
As expected, the fixed PC algorithm shows the highest data rate in the femtocells (Fig. 3). The drawback of this algorithm can be seen in Fig. 4: the key requirement of co-channel femtocells, not to impact the existing macrocell network, cannot be fulfilled by this algorithm. For the received power based PC algorithm, the average macro UE throughput is very close to the baseline curve. Thus, significantly less performance can be obtained for the femtocells than in the case of fixed PC. The worst results were obtained for the basic Q-learning algorithm, since it requires longer periods of learning. Neither macro UEs nor femto UEs can get high data rates. This is due to the bad convergence of this algorithm: within 50 TTIs (and also 500 TTIs) the basic Q-learning algorithm cannot show its effectiveness.
Fig. 4. Average throughput of macro UE for 20 times 50 ms (cumulative distribution of the average throughput per macro UE in bps/Hz for No Femtocell, Fixed PC, Received Power Based PC, Q-learning Basic and Q-learning New, with a zoomed inset).

Using the proposed initialization procedure yields a performance increase for femto and macro UEs compared to the
basic Q-learning algorithm. Our algorithm shows significantly better performance for the femto UEs while at the same time increasing the macro UE performance. Compared to the other algorithms, we can point out that our proposed Q-learning algorithm shows better performance for femto UEs than the received power based PC and lower performance than the fixed PC algorithm. In terms of macro UE performance, our algorithm shows a similar behaviour to the received power based PC, i.e., it is close to the performance of the reference case, while it is better than the fixed PC algorithm. Thus, our proposed Q-learning algorithm shows the best tradeoff between the femto and macro UE average throughput and fits best with the key requirement of co-channel femtocells.
V. CONCLUSION
In this paper we have presented a new decentralized Q-learning approach for interference management in a macrocellular network overlaid by femtocells, aiming to improve the system performance. The main drawback of the Q-learning approach is its slow convergence. To mitigate this drawback, we have introduced a new initialization procedure for Q-learning. We have shown in a 3GPP compliant system level simulation environment that, with respect to the common Q-learning algorithm, our proposal yields significant gains in terms of average UE throughput for both macro and femto UEs. In addition, we have compared our results with a basic fixed PC algorithm, transmitting with maximum power, and with an open-loop received power based PC algorithm. We have shown that our proposal fits best with the key requirement for co-channel femtocells, which is to keep the increase in interference caused by femtocells low enough to ensure a low impact on the performance of the existing macrocellular network, while achieving the target coverage at the femtocells.
REFERENCES
[1] S. R. Saunders, S. Carlaw, A. Giustina, R. R. Bhat, V. S. Rao and R. Siegberg, Femtocells: Opportunities and Challenges for Business and Technology. Great Britain: John Wiley & Sons Ltd., 2009.
[2] 3GPP TS 25.820, ”3rd Generation Partnership Project;
Technical Specification Group Radio Access Network; 3G
Home NodeB Study Item Technical Report (Release 8)”,
V8.2.0 (2008-09).
[3] 3GPP TSG-RAN WG4 R4-070902, ”Initial home NodeB
coexistence simulation results”, Nokia Siemens Networks,
(2007-06)
[4] 3GPP TSG-RAN WG4 R4-094245, ”Interference control
for LTE Rel-9 HeNB cells”, Nokia Siemens Networks,
(2009-11)
[5] 3GPP TSG-RAN WG4 R4-071540, ”LTE Home Node B
downlink simulation results with flexible Home Node B
power”, Nokia Siemens Networks, (2007-10)
[6] 3GPP TSG-RAN WG4 R4-071578, ”Simulation results
of macro-cell and co-channel Home NodeB with power
configuration and open access”, Alcatel-Lucent, (2007-10)
[7] 3GPP TSG-RAN WG4 R4-071621, ”HNB Coexistence
Scenario Evaluation”, Qualcomm Europe, (2007-10)
[8] A. Galindo-Serrano and L. Giupponi, "Distributed Q-learning for Interference Control in OFDMA-based Femtocell Networks", IEEE 71st Vehicular Technology Conference, 2010, pp. 1-5.
[9] R. S. Sutton and A. G. Barto, Reinforcement Learning:
An Introduction, Cambridge, 1998
[10] L. P. Kaelbling, M. L. Littman and A. W. Moore, "Reinforcement learning: A survey", Journal of Artificial Intelligence Research, vol. 4, pp. 237-285, 1996.
[11] M. Simsek et al., "An LTE-femtocell dynamic system level simulator", in Proc. IEEE Smart Antennas (WSA), 2010 International ITG Workshop, Feb. 2010, pp. 66-71.
[12] 3GPP TR 36.814, ”3rd Generation Partnership Project;
Technical Specification Group Radio Access Network;
Further advancements for E-UTRA physical layer aspects
(Release 9)”, V9.0.0 (2010-03).
[13] Femto Forum, (2008, Dec.) Interference Management in UMTS Femtocells [Online]. Available: http://www.femtoforum.org/femto/Files/File/FF UMTS-Interference Management.pdf.
[14] C. J. C. H. Watkins, Learning from Delayed Rewards,
PhD thesis, Cambridge University, England, 1989.