
On decision making for dynamic configuration adaptation problem in cognitive radio equipments: a multi-armed bandit based approach

March 2010


Authors:

Wassim Jouini

École Supérieure d'Electricité

Christophe Moy

Université de Rennes 1

Jacques Palicot

École Supérieure d'Electricité

…



Wassim Jouini, Christophe Moy, Jacques Palicot,

SUPELEC/IETR,

France

{wassim.jouini, christophe.moy, jacques.palicot}@supelec.fr

Abstract— We introduce in this paper the notion of “design space” as a conceptual object that defines a set of cognitive radio decision making problems by their constraints rather than by their degrees of freedom. Our analysis identifies three dimensions of constraints: those related to the environment, to the equipment and to the user. Moreover, we define and use the notion of a priori knowledge to show that the configuration adaptation decision making problems tackled by the radio community often share the same design space but differ in the a priori knowledge they assume to be available. Consequently, we suggest “a priori knowledge” as a classification criterion to discriminate the main techniques proposed in the literature to solve configuration adaptation decision making problems. In the rest of the paper, we further study a particular decision making framework where no (or limited) a priori information is provided to the cognitive radio equipment. An approach based on tools borrowed from the multi-armed bandit community is discussed. Finally, our simulation results highlight that, by customizing algorithms developed for the multi-armed bandit problem, efficient engineering solutions to some problems met in cognitive radio can indeed be built.

Index Terms— Cognitive radio, decision making problems, dynamic configuration adaptation, multi-armed bandit, Upper Confidence Bound, design space, a priori knowledge, survey.

I. INTRODUCTION

Recent hardware advances have made it possible to design software solutions to problems that in the past required hardwired signal processing devices. With this added software layer, equipments based on this technology, referred to as software defined radios (SDR), are able to control a large set of parameters to operate with great flexibility and efficiency (e.g., change the bandwidth of the devices, switch from one communication protocol to another, minimize the energy consumption of a device, etc.).

Soon after the emergence of the SDR field, several scientists studied how best to control these parameters, leading to the emergence of a new research field named Cognitive Radio [1].

The concept of Cognitive Radio (CR) presents itself as the technology that will have the autonomy and the cognitive abilities to become aware of its environment as well as of its own operational abilities. The purpose of this new concept is to meet the user’s expectations, i.e., maximizing his profit without compromising the efficiency of the network. Thus, it presupposes the capacity to collect information from its surrounding environment (perception), to digest it (learning, decision making and prediction problems) and to act in the best possible way by considering several constraints and the available information. Therefore, it is a new paradigm of wireless communication whose purpose is to combine Software Defined Radio technologies and cognitive abilities in order to achieve Cognitive Radio equipments.

Sensing [2][3] and reconfiguration [4][5] have been quite intensively investigated in the community and are out of the scope of this paper. However, on the decision making side, only a few methods have been suggested by the community, and most of them are still in their infancy. Overall, the promises of this new technology are as high as the challenges it raises.

The purpose of this paper is twofold. On the one hand, we aim at presenting a quick survey of the decision making challenges the CR community has been dealing with during the last 10 years, as well as the main solutions and tools suggested in the CR literature to deal with these challenges. This survey focuses on decision making and learning challenges at the level of the CR equipment. On the other hand, we complement this survey by tackling a particular online decision making issue where the CR equipment operates in an unknown environment [6].

The outline of the rest of this paper is the following. We start by introducing and defining a conceptual object referred to as design space in Section II. The main purpose of this object is to suggest that the cognitive radio design problem is defined by a set of constraints rather than by its degrees of freedom. Our analysis identifies three dimensions of constraints: those related to the environment, to the equipment and to the user. In Section III, we define and use the notion of a priori knowledge to show that the configuration adaptation decision making problems tackled by the radio community often share the same design space but differ in the a priori knowledge they assume to be available. Consequently, we suggest “a priori knowledge” as a classification criterion to discriminate the main techniques proposed in the literature to solve these problems. In Section IV, we further detail one particular decision making tool borrowed from the machine learning community. We suggest using it in a cognitive radio context when dealing with environments where almost no a priori knowledge is available and where the performance evaluation is uncertain. Section V presents several simulations to validate our approach on an academic dynamic configuration adaptation problem. Finally, Section VI concludes.

II. COGNITIVE RADIO DESIGN SPACE

A. Cognitive radio design related constraints

A Cognitive Radio (CR) equipment can be defined as a communication system aware of its environment as well as of its operational abilities, and capable of using them intelligently. Consequently, it is assumed that the device has the ability to collect information through its sensors and that it can use that information to adapt itself to its surrounding environment, as described in Figure 1. This presupposes cognitive abilities enabling CR equipments to process all the collected information in order to make appropriate decisions [1][7].

Fig. 1. Cognitive radio decision making context.

When designing such CR equipments, the main challenge is to find an appropriate way to correctly dimension their cognitive abilities according to their environment as well as to their purpose (i.e., providing a certain service to the user). Several papers in the literature have already addressed this matter; however, their description of the problem has usually remained fuzzy (e.g., [1][8][9]). We summarize their analysis by defining three “constraints” on which the design of a CR equipment depends: first, the constraints imposed by the surrounding environment; then, the constraints related to the user’s expectations; and finally, the constraints inherent to the equipment. These constraints help dimension the CR decision making engine. Consequently, an a priori formulation of these elements helps the designer implement the right tools in order to obtain a flexible and adequate cognitive radio.

1) The environment constraints: since a cognitive radio is a wireless device that operates in a surrounding communicating environment, it shall respect the rules of that environment (e.g., allocated frequency bands, tolerated interference, etc.). Thus the behavior of cognitive radio equipments is strongly shaped by the constraints imposed by the environment. As a matter of fact, if the environment allows no degrees of freedom to the equipment, the latter has no choice but to obey and thus loses all cognitive behavior. On the other hand, if no constraints are imposed by the environment, the cognitive radio will still be constrained by its own operational abilities and by the expectations of the user.

2) User’s expectations: when using his wireless device for a particular application (voice communication, data, streaming and so on), the user expects a certain quality of service. Depending on the expected quality of service, the cognitive radio can identify several criteria to optimize, such as minimizing the bit error rate, minimizing energy consumption, maximizing spectral efficiency, etc. If the user is too greedy and imposes too many objectives, the design problem might become intractable because of the constraints imposed by the surrounding environment and by the platform of the cognitive radio. However, if the user expects nothing, then again there is no need for a flexible cognitive radio. It is usually assumed that the user is reasonable, in the sense that he will accept the best he can get at minimum cost as long as the quality of service provided is above a certain level.

Fig. 2. Cognitive radio decision making design space.

3) Equipment’s operational abilities: these limitations are perhaps the most obvious, since one cannot ask the cognitive radio equipment to adapt itself beyond what it can perform (sense and/or act). It is usually assumed in the cognitive radio literature that the equipment is an ideal software defined radio and thus that it has all the flexibility needed for the designed framework. In a real application, the efficiency of cognitive radio equipments depends of course on the degrees of freedom (or equivalently the constraints) inherent to the wireless platform used to communicate. Examples of commonly analyzed degrees of freedom include modulation, pulse shape, symbol rate, transmit power, etc.

B. Design space

We denote by cognitive radio design space an abstract three-dimensional space that characterizes the CR decision making engine, as shown in Figure 2. It is abstract in the sense that it has no rigorous mathematical meaning: it is only used to illustrate, visually and conceptually, the dependencies of the CR decision making engine on the “design dimensions”: environment, parameters (usually referred to as knobs) and objectives (or criteria defined from the user’s expectations).

In Figure 2, we represent two sub-spaces referred to as the actual design space and the virtual design space. On the one hand, the virtual design space refers to the upper bound support of the design space, where every dimension is considered independently of the others. Its volume can be interpreted as the largest space of decision problems one could define from the three dimensions. On the other hand, the actual design space is included in the virtual design space. It results from the reduction of the design space when taking into account the correlation between the constraints imposed by each dimension of the design space. For instance, some environment constraints such as “imposed fixed waveform” might disable some objectives such as “find a waveform that maximizes the spectral efficiency”.

C. Dynamic Configuration Adaptation (DCA)

As an illustrative example that we will use for the rest of the paper, we define the design space of the so-called dynamic configuration adaptation (DCA) problem. Within this framework, we assume that the environment constrains the cognitive radio by allowing only K possible configurations. This condition characterizes the environment and the equipment. Moreover, we assume that there exist M ≥ 1 objectives that evaluate how well the equipment performs in meeting the user’s expectations.

To conclude, we usually observe in the literature that these characterizations are made implicitly; then final assumptions are made to define the decision making framework. These assumptions concern what we refer to as the “a priori model knowledge”. In the next section, we introduce and explain the notion of a priori knowledge and present a brief state of the art on decision making for cognitive radio configuration adaptation using the particular DCA design space. We show that, although the design space is the same, the community suggests different approaches to tackle the defined decision making problems depending on the a priori model knowledge.

III. DYNAMIC CONFIGURATION ADAPTATION PROBLEM: CHALLENGES AND SUGGESTED APPROACHES

The a priori knowledge is a set of assumptions made by the designer on the amount and representation of the information available to the decision making engine when it first deals with the environment. As a matter of fact, “knowledge” is defined by the Oxford English Dictionary as: (i) expertise and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject, (ii) what is known in a particular field or in total; facts and information, or (iii) awareness or familiarity gained by experience of a fact or situation. Consequently, within the cognitive radio framework, we can define the a priori knowledge as the set of theoretical or practical assumptions provided by the designer to the CR decision making engine. These assumptions, if accurate, provide the CR with valuable information on the problem it has to deal with. These remarks lead us to suggest that the decision making problems the cognitive radio will have to deal with are defined by the pair {design space, a priori knowledge}. The more accurate the a priori knowledge, the more efficient the cognitive radio can be.

In the next subsections we briefly describe the different approaches provided by the community, depending on the a priori knowledge assumed relevant to tackle the environment the CR might face during its lifetime. Figure 3 suggests a classification of these techniques depending on the a priori knowledge provided to the cognitive decision making engine.

A. Expert approach

The expert approach relies on the large amount of knowledge collected by telecommunication engineers and researchers. This knowledge is based on theoretical considerations and practical measurements of the environment and of radio communication parameters. It was first suggested by Mitola in his Ph.D. dissertation on cognitive radio [1]. Through intensive off-line simulations, expert systems are provided with a set of inference rules. These rules are then used on-line to adapt the equipment depending on the context faced by cognitive radio equipments. Thus, the more knowledge is available, the better the equipment can adapt itself to its surrounding dynamic environment. However, this knowledge is useful only as long as the cognitive radio can represent it in a way that enables it to exploit that knowledge and to react to the environment by adequately adapting its operating configuration.

For that purpose, Mitola suggested representing the knowledge of cognitive radio equipments using a new language dedicated to radio communication: the “Radio Knowledge Representation Language” (RKRL) [1]. This representation of knowledge uses semantic web technologies such as XML (eXtensible Markup Language), RDF (Resource Description Framework) and OWL (Web Ontology Language). The expert knowledge based approach had a large success, especially due to the XG (neXt Generation) project supported by DARPA (e.g., [10] and, for spectrum sharing, [11]). As a matter of fact, if the knowledge is well represented and provided to the equipment as a set of rules, the decision making process becomes very simple. However, this approach has a few drawbacks:

• The behavior of the designed system is not adapted to a particular user but to all users and to a set of probable environments. Moreover, in order to provide the CR decision making engine with valuable and extensive knowledge, a significant amount of effort is needed from the designer.

• Expert knowledge is mainly based on models. Thus the system might behave poorly when it faces unexpected dynamics in the environment.

The techniques based on expert systems can, however, be supported by several other tools that help them acquire new knowledge about the environment or avoid conflicts between different configuration adaptation rules.
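To make the rule-based idea concrete, the following minimal Python sketch represents expert knowledge as a list of hypothetical inference rules mapping a sensed context (here only an SNR value in dB) to a configuration name. The thresholds, rule names and configuration names are invented for illustration; they do not come from RKRL or from [1]. The sketch also shows a simple check that flags contexts where several rules recommend different configurations, one of the conflicts that complementary tools are meant to resolve.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, float]

@dataclass
class Rule:
    """A hypothetical expert rule: if condition(context) holds, use configuration."""
    name: str
    condition: Callable[[Context], bool]
    configuration: str

# Hypothetical rule base produced off-line (thresholds are illustrative only).
RULES: List[Rule] = [
    Rule("low_snr", lambda ctx: ctx["snr_db"] < 5.0, "BPSK_robust"),
    Rule("mid_snr", lambda ctx: 5.0 <= ctx["snr_db"] < 15.0, "QPSK"),
    Rule("high_snr", lambda ctx: ctx["snr_db"] >= 15.0, "16QAM"),
    # Deliberately overlapping rule, to illustrate a conflict between rules.
    Rule("power_saving", lambda ctx: ctx["snr_db"] >= 12.0, "QPSK_low_power"),
]

def matching_rules(context: Context, rules: List[Rule]) -> List[Rule]:
    """Return every rule whose condition holds in the given context."""
    return [r for r in rules if r.condition(context)]

def select_configuration(context: Context, rules: List[Rule]) -> Optional[str]:
    """On-line use of the rule base: apply the first matching rule, if any."""
    matches = matching_rules(context, rules)
    return matches[0].configuration if matches else None

def has_conflict(context: Context, rules: List[Rule]) -> bool:
    """Flag contexts where the matching rules recommend different configurations."""
    return len({r.configuration for r in matching_rules(context, rules)}) > 1

if __name__ == "__main__":
    for snr in (2.0, 10.0, 13.0, 18.0):
        ctx = {"snr_db": snr}
        print(snr, select_configuration(ctx, RULES), has_conflict(ctx, RULES))
```

With these toy rules, a 13 dB context is flagged as conflicting because both the mid-SNR rule and the power-saving rule apply; the first-match policy silently hides that ambiguity, which is why conflict detection is useful.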

B. Exploration based decision making: Genetic Algorithms

In some contexts, one can consider that a priori knowledge is available on the complex relationships existing between the observed metrics, the parameters to adapt and the criteria to satisfy, as described in Figure 4. In this case the problem appears to be a multi-criteria optimization problem. Within this framework, the CR decision making engine aims at finding the best parameters to meet the user’s expectations by solving a set of equations (see Table II, Figure 4). This problem is known to be complex for several reasons:

• There exists no universal definition of optimality in this case. Thus the solutions of this problem are satisfactory (or not) with respect to a certain function, usually named fitness, that evaluates how well the criteria are satisfied.

• Consequently, a large space of possible “good” configurations is usually available.

• The criteria are correlated and can be in conflict (e.g., Figure 4).

If we assume that the previously mentioned off-line expert rule extraction phase has not been accomplished (or only partially), an exploration of the space of possible configurations is needed.

Fig. 3. Suggested decision making techniques depending on the assumed a priori knowledge.

Fig. 4. Multi-criteria optimization problem [12].

This cognitive radio decision making framework was first analyzed by Christian James Rieser and Thomas W. Rondeau, who suggested the use of Genetic Algorithms (GA) to tackle it [8][12]. Genetic algorithms were first designed to mimic Darwin’s evolutionary theory and are well known for their capacity to adapt to a changing environment. Their work showed that, under this design space and with the described a priori knowledge, genetic algorithms provide cognitive radios with an efficient and flexible decision making engine.
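As an illustration of the GA principle only, the sketch below evolves a population of candidate configurations encoded as (modulation index, transmit power index) pairs and scores them with a hypothetical weighted-sum fitness that trades a rate criterion against power consumption, with a toy feasibility rule making the two criteria conflict. The encoding, fitness, weights and operators (tournament selection, uniform crossover, random-reset mutation, elitism) are generic textbook choices assumed here; they are not the specific design of [8][12].

```python
import random
from typing import List, Tuple

Genome = Tuple[int, ...]  # (modulation index, power index): hypothetical encoding

BITS_PER_SYMBOL = [1, 2, 4, 6]         # e.g. BPSK, QPSK, 16QAM, 64QAM
POWER_LEVELS_DBM = [0, 5, 10, 15, 20]  # hypothetical transmit power levels

def fitness(g: Genome, w_rate: float = 1.0, w_power: float = 0.2) -> float:
    """Toy weighted-sum fitness: reward bits/symbol, penalize transmit power.
    Toy feasibility rule: higher-order modulations need more power."""
    bits, power = BITS_PER_SYMBOL[g[0]], POWER_LEVELS_DBM[g[1]]
    if power < 3 * bits:  # conflicting criteria: a high rate requires high power
        return 0.0
    return w_rate * bits - w_power * power / 5.0

def random_genome(rng: random.Random) -> Genome:
    return (rng.randrange(len(BITS_PER_SYMBOL)), rng.randrange(len(POWER_LEVELS_DBM)))

def tournament(pop: List[Genome], rng: random.Random, k: int = 3) -> Genome:
    """Select the fittest of k randomly drawn individuals."""
    return max(rng.sample(pop, k), key=fitness)

def crossover(a: Genome, b: Genome, rng: random.Random) -> Genome:
    """Uniform crossover: each gene comes from either parent with probability 1/2."""
    return tuple(x if rng.random() < 0.5 else y for x, y in zip(a, b))

def mutate(g: Genome, rng: random.Random, rate: float = 0.3) -> Genome:
    """Random-reset mutation: each gene is redrawn with probability `rate`."""
    sizes = (len(BITS_PER_SYMBOL), len(POWER_LEVELS_DBM))
    return tuple(rng.randrange(n) if rng.random() < rate else x for x, n in zip(g, sizes))

def genetic_search(pop_size: int = 20, generations: int = 200, seed: int = 0) -> Genome:
    rng = random.Random(seed)
    pop = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(pop, key=fitness)  # elitism: never lose the best individual
        children = [mutate(crossover(tournament(pop, rng), tournament(pop, rng), rng), rng)
                    for _ in range(pop_size - 1)]
        pop = [elite] + children
    return max(pop, key=fitness)

if __name__ == "__main__":
    best = genetic_search()
    print("best genome:", best, "fitness:", fitness(best))
```

With this toy fitness, the only feasible 64-QAM setting (20 dBm) has the highest score, so the search is expected to return the genome (3, 4). In a real CR, evaluating such a fitness would require the analytical models that this class of a priori knowledge provides.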

C. Learning approaches: exploration and exploitation

As we argued in the previous subsections, and as several other authors note [13][9], “Many CR proposals, such as [12][13][14], rely on a priori characterization of these performance metrics which are often derived from analytical models. Unfortunately, [. . . ], this approach is not always practical due to e.g., limiting modeling assumption, non-ideal behaviors in real-life scenarios, and poor scalability” [13]. To avoid these limitations and to tackle more realistic scenarios, many methods based on learning techniques have been suggested: Artificial Neural Networks (ANN), Evolving Connectionist Systems (ECS), statistical learning, regression models and so on. All of these approaches have their pros and cons; however, they all have in common that they mainly rely on the real environment to infer decision making rules for CR equipments. Since these learning tools aim at representing the functional relationship between the environment (through the sensed metrics), the system’s parameters and the criteria to satisfy, they need direct interaction with the environment in order to build a posteriori knowledge about it. In this paper we subclassify these methods depending on the way they learn and exploit their rules. On the one hand (i), we find a set of techniques that separate the exploration and exploitation phases. On the other hand (ii), we find more flexible techniques that combine both processes.

In the first case (i), we find several tools, such as Artificial Neural Networks or statistical learning, already used in other domains requiring some cognitive abilities (robotics, video games, etc.). These methods have two phases: a phase of pure “exploration”, in which the CR decision making engine learns and infers (explicitly or implicitly) decision making rules, followed by a second phase in which it uses this a posteriori knowledge to make decisions. Since these learning techniques rely on an initial learning phase, a large amount of data and computational power is needed in order to extract reliable knowledge. This difficulty is well known for ANN, for instance, and it also holds for statistical learning. As noticed by Weingart in his paper [15], the provided techniques are still computationally prohibitive and not yet ready to be used in a real equipment. However, if the first phase is well achieved, the second phase is usually very simple and does not require much time or energy [13].
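In its simplest bandit form, the two-phase scheme of case (i) can be illustrated by an explore-then-commit rule: each configuration is tried a fixed number of times during a pure exploration phase, after which the empirically best one is used exclusively, with no further learning. This is a generic sketch assuming a callable `pull(k)` that returns the reward obtained with configuration k (indexed from 0); it does not reproduce any of the ANN or statistical learning techniques cited above, and the reward model in the usage lines is hypothetical.

```python
import random
from typing import Callable, List

def explore_then_commit(pull: Callable[[int], float], n_configs: int,
                        n_explore: int, horizon: int) -> List[int]:
    """Phase 1: try each configuration n_explore times (pure exploration).
    Phase 2: commit to the configuration with the best sample mean.
    Returns the sequence of selected configurations."""
    sums = [0.0] * n_configs
    choices: List[int] = []
    for k in range(n_configs):
        for _ in range(n_explore):
            sums[k] += pull(k)
            choices.append(k)
    best = max(range(n_configs), key=lambda k: sums[k] / n_explore)
    # Exploitation phase: the a posteriori knowledge is frozen, only `best` is used.
    choices.extend([best] * (horizon - len(choices)))
    return choices

if __name__ == "__main__":
    rng = random.Random(1)
    means = [0.2, 0.5, 0.8]  # hypothetical success probabilities per configuration
    pull = lambda k: 1.0 if rng.random() < means[k] else 0.0
    seq = explore_then_commit(pull, n_configs=3, n_explore=50, horizon=1000)
    print("committed to configuration", seq[-1])
```

The cost of the first phase is explicit here: n_configs × n_explore slots are spent before any exploitation, and a wrong commitment is never corrected afterwards.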

In the second case (ii), we find promising techniques recently introduced to the community that still need to be further investigated [9][6]. These techniques try to provide the CR with a flexible and incremental learning decision making engine. In the case of the ECS based decision making engine, Colson suggested the use of an evolving neural network [16][17]. Unlike the usual ANN, the ECS-NN can change its structure without “forgetting” already learned knowledge. Thus new rules can be learned by adding new neurons to the neural structure. In order to be efficient, the architecture proposed in [9] needs some expert advice (a priori knowledge) on the available configurations. This added information ranks the different configurations based on some criteria (robustness, spectral efficiency, etc.), but without knowing a priori which one is most adequate when facing a certain environment. The tools suggested in [6], however, assume that no a priori knowledge is provided and that the performance of the equipment can only be estimated by trying a specific configuration. These tools are based on the so-called Multi-Armed Bandit (MAB) framework and will be further detailed in Section IV.

To conclude this first part of the paper, we would like to emphasize that the classification proposed in this paper shows that a CR equipment cannot depend on only one core decision making tool but rather on a pool of techniques. Every time it faces an environment, the equipment needs an estimation of its a priori knowledge and of the reliability of that knowledge. To tackle a particular context, the general process can be summarized through three questions: What can’t I do (design space)? What do I already know (a priori knowledge)? And what technique should I select to solve the decision making problem?

In the next section we further detail a particular case of partial monitoring under uncertainty known as the multi-armed bandit framework. Within this framework we assume that we only have very limited a priori knowledge about the environment and about the CR itself, which makes sense within a CR framework. The purpose of the method suggested in that section is to balance exploration and exploitation without interrupting the communication process, i.e., while providing a certain service to the user.

Fig. 5. Slot representation for a radio equipment controlled by a cognitive decision making engine. A slot is divided into 4 periods. During the first period, the cognitive decision making engine senses the environment and chooses the next configuration. If the new configuration is different from the current one, a reconfiguration is carried out during the second period before communicating. If a reconfiguration is not needed, the CR equipment keeps the current configuration to communicate. At the end of every slot, the cognitive decision making engine computes a reward that evaluates its performance during the communication process. It is assumed here that $\tau_1 + \tau_2 + \tau_4$ is small with respect to $\tau_3$.

IV. DYNAMIC CONFIGURATION ADAPTATION PROBLEM

A. General Framework

The general framework tackled in this section is described in Figure 6. A particular case of this problem was introduced to the CR community in a previous paper [6]. In this section we extend the framework to a more realistic scenario. Within this framework (as for the one previously analyzed in [6]), the problem appears as a particular instance of the well-known multi-armed bandit problem.

1- CR equipment:

• K possible configurations $C_k$, $k \in \{1, \ldots, K\}$, satisfying the operational constraints but with unknown performance.

• A cognitive decision making engine that can learn and make decisions to help the CR equipment improve its behavior.

2- Time representation:

• Time is divided into slots $t = 0, 1, 2, \ldots$ (Figure 5).

• At the beginning of every slot $t$, the cognitive decision making engine decides whether or not to reconfigure the CR equipment.

3- Environment and performance evaluation:

• Typical observations: SNR, BER, network load, throughput, spectrum bands, etc.

• A numerical signal is computed at the end of every slot $t$ and informs the cognitive decision making engine of the performance of the CR equipment. The numerical signal obtained when using configuration $C_k$ is a function of the observations and of the configuration.

• The numerical results computed with a configuration $C_k$ are assumed to be i.i.d. and drawn from an unknown stochastic distribution $\theta_k$.

Fig. 6. Description of the Dynamic Configuration Adaptation problem.

A multi-armed bandit is a simple machine learning problem based on an analogy with the traditional slot machine (one-armed bandit) but with more than one lever. When pulled at a time $t = 0, 1, 2, \ldots$, each lever (or machine) $k \in \{1, \ldots, K\}$ provides a reward $r_t$ drawn from a distribution $\theta_k$ associated with that specific lever. The objective of the gambler is to maximize the sum of the collected rewards through iterative pulls. It is classically assumed that the gambler has no initial knowledge about the levers. However, it is important to understand that many CR applications may provide some information that should be used to design better policies. For the sake of generality, and in order to cope with the worst situations, we deliberately ignore some of that information. The crucial tradeoff the gambler faces at each trial is between “exploitation” of the lever that has the highest expected payoff and “exploration” to get more information about the expected payoffs of the other levers. In this paper we assume that the payoffs drawn from a given machine are independent and identically distributed (i.i.d.) and that the independence of the rewards also holds across machines. However, the reward distributions of the different machines $\{\theta_1, \theta_2, \ldots, \theta_K\}$ are not assumed to be the same. We invite the reader to refer to the previously mentioned paper [6] for more details about the equivalence between this CR decision making problem and the MAB framework.
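For simulation purposes, the i.i.d. assumptions of this framework can be encoded in a few lines. The sketch below assumes, purely for illustration, Bernoulli reward distributions $\theta_k$ with hypothetical means (arms are indexed from 0 in the code); any bounded distribution could be substituted. Each pull is independent of the past and of the other machines, and the distributions differ from one machine to another.

```python
import random
from typing import List

class BernoulliBandit:
    """K-armed bandit with i.i.d. Bernoulli rewards: arm k pays 1 w.p. means[k]."""

    def __init__(self, means: List[float], seed: int = 0):
        if not all(0.0 <= m <= 1.0 for m in means):
            raise ValueError("Bernoulli means must lie in [0, 1]")
        self.means = list(means)
        self.rng = random.Random(seed)  # one generator: draws are independent

    @property
    def n_arms(self) -> int:
        return len(self.means)

    def pull(self, k: int) -> float:
        """Draw a reward r_t from theta_k, independently of all previous draws."""
        return 1.0 if self.rng.random() < self.means[k] else 0.0

if __name__ == "__main__":
    bandit = BernoulliBandit([0.3, 0.6, 0.45], seed=42)
    rewards = [bandit.pull(1) for _ in range(10_000)]
    print("empirical mean of arm 1:", sum(rewards) / len(rewards))
```

The gambler (the decision making engine) only sees the rewards returned by `pull`; the `means` attribute plays the role of the unknown distributions and is reserved to the simulator, e.g., for computing the regret.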

In the case of CR problems, these distributions $\theta_k$ depend on external parameters that the environment reveals at the beginning of every slot (for instance the SNR in Section V). Thus the dynamics of the problem are the following: first, the equipment senses the context of the environment (e.g., the current SNR); then, depending on the outcome of this sensing, it chooses a configuration to try. At the end of the transmitted slot, the CR computes a signal that evaluates its performance during that specific slot. Finally, the CR decision making engine takes the newly collected information into account to update its configuration selection policy.
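The per-slot dynamics just described (sense, decide, communicate and evaluate, update) can be sketched as a generic loop. The names `sense_context`, `evaluate_slot`, `run_slots` and the `Policy` interface are hypothetical placeholders introduced for this sketch; a uniformly random policy stands in for the learning engine, which the approach described next replaces by UCB indexes computed per context cluster.

```python
import random
from typing import Callable, List, Protocol, Tuple

class Policy(Protocol):
    """Hypothetical interface of a cognitive decision making engine."""
    def select(self, context: float) -> int: ...
    def update(self, context: float, config: int, reward: float) -> None: ...

class RandomPolicy:
    """Placeholder engine: picks a configuration uniformly at random, learns nothing."""
    def __init__(self, n_configs: int, seed: int = 0):
        self.n_configs = n_configs
        self.rng = random.Random(seed)

    def select(self, context: float) -> int:
        return self.rng.randrange(self.n_configs)

    def update(self, context: float, config: int, reward: float) -> None:
        pass  # a learning policy would update its statistics here

def run_slots(policy: Policy, sense_context: Callable[[], float],
              evaluate_slot: Callable[[float, int], float],
              n_slots: int) -> List[Tuple[float, int, float]]:
    """Run n_slots slots and return the (context, configuration, reward) history."""
    history = []
    for _ in range(n_slots):
        snr = sense_context()                # 1) sense the context (e.g. the SNR)
        config = policy.select(snr)          # 2) choose (or keep) a configuration
        reward = evaluate_slot(snr, config)  # 3) communicate, then compute the reward
        policy.update(snr, config, reward)   # 4) update the selection policy
        history.append((snr, config, reward))
    return history

if __name__ == "__main__":
    rng = random.Random(7)
    hist = run_slots(RandomPolicy(3), lambda: rng.uniform(0.0, 20.0),
                     lambda snr, k: float(k == 2), n_slots=5)
    print(hist)
```

Keeping the environment (`sense_context`, `evaluate_slot`) separate from the policy makes it straightforward to swap decision engines while reusing the same simulation loop.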

B. Suggested approach

The framework tackled in [6] corresponds to the problem described here with a fixed context (e.g., a fixed SNR value for all slots). However, when the context changes, we cannot assume a priori that the acquired knowledge is still valid in the new context. There are thus two possible solutions. On the one hand, we could use statistical learning to try to infer a relationship between the performance of a configuration in one context and the performance of the same configuration in a different context. These methods can be efficient; however, this often comes at the cost of a large overhead in terms of computation time and memory. On the other hand, we could assume, if possible, that for two “close” contexts (e.g., $SNR_1 = 9$ dB and $SNR_2 = 9.05$ dB) the performance of a configuration does not change much. Then we can group several contexts to form a cluster. That would enable us to divide the context space into several clusters (e.g., we can represent a large interval of SNRs by several clusters: $[0, 20] = [0, 1] \cup [1, 2] \cup \cdots \cup [19, 20]$), and finally to address the learning problem locally in every cluster as one MAB problem with a fixed context. Consequently, we can duplicate the tools already used in the case of one MAB problem to deal with the case where we have several MAB problems, one in every cluster. Within this framework, every cluster has its own learning algorithm to estimate the best configuration on average over the contexts it contains, the quality of this estimate depending on the cluster size.
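The clustering step can be sketched as follows, assuming uniform 1 dB clusters over [0, 20] dB as in the example above. Since the intervals in the text share their endpoints, the sketch uses half-open bins [i, i+1) and assigns 20 dB (and anything above) to the last cluster; each cluster keeps its own, independent play counts and reward sums, i.e., its own MAB problem. The function and class names are illustrative.

```python
import math
from typing import List

def snr_to_cluster(snr_db: float, low: float = 0.0, high: float = 20.0,
                   width: float = 1.0) -> int:
    """Map an SNR value to the index of its cluster [low + i*width, low + (i+1)*width).
    Values outside [low, high] are clipped to the first or last cluster."""
    n_clusters = math.ceil((high - low) / width)
    idx = math.floor((snr_db - low) / width)
    return min(max(idx, 0), n_clusters - 1)

class ClusteredStats:
    """One independent set of bandit statistics per context cluster."""

    def __init__(self, n_clusters: int, n_configs: int):
        self.counts: List[List[int]] = [[0] * n_configs for _ in range(n_clusters)]
        self.sums: List[List[float]] = [[0.0] * n_configs for _ in range(n_clusters)]

    def update(self, cluster: int, config: int, reward: float) -> None:
        self.counts[cluster][config] += 1
        self.sums[cluster][config] += reward

    def sample_mean(self, cluster: int, config: int) -> float:
        n = self.counts[cluster][config]
        return self.sums[cluster][config] / n if n else 0.0
```

With these defaults, 9 dB and 9.05 dB, the two “close” contexts of the example, fall into the same cluster (index 9) and therefore share the same statistics, while a reward observed at 10.5 dB only updates cluster 10.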

In this paper we prefer the latter approach, which is very intuitive and does not consume much of the already limited computational resources of a CR equipment. The learning tools used are the same as those already presented in [6]. In order to make this paper as self-sufficient as possible, we present the main ideas of the so-called Upper Confidence Bound (UCB) indexes in the next paragraph.

At every instant $t$, an upper confidence bound index is computed for every machine $k$. This upper confidence bound index, denoted by $B_{k,t,T_k(t)}$, is computed from the information $i_t$ gathered up to slot number $t$ and gives an optimistic estimation of the expected reward of machine $k$.

Let $B_{k,t,T_k(t)}$ denote the index of the policies we are dealing with:

$$B_{k,t,T_k(t)} = \bar{X}_{k,T_k(t)} + A_{k,t,T_k(t)} \tag{1}$$

where $\bar{X}_{k,T_k(t)}$ is the sample mean of machine $k$ after it has been played $T_k(t)$ times up to step $t$, and $A_{k,t,T_k(t)}$ is an upper confidence bias added to the sample mean.

A policy $\pi$ computes these indexes from $i_t$, from which it deduces an action $a_t$ as follows:

$$a_t = \pi(i_t) = \arg\max_{k} \left( B_{k,t,T_k(t)} \right) \tag{2}$$

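Parameters: $K$, exploration coefficient $\alpha$
Input: $i_t$
Output: $a_t$
Algorithm:
If $t < K$: return $a_t = t + 1$ (each configuration is tried once)
Else:
• $T_k(t) \leftarrow \sum_{m=0}^{t-1} \mathbb{1}\{I_m = k\}, \ \forall k$
• $A_{k,t,T_k(t)} \leftarrow \sqrt{\alpha \ln(t) / T_k(t)}, \ \forall k$
• $B_{k,t,T_k(t)} \leftarrow \frac{1}{T_k(t)} \sum_{m=0}^{t-1} r_m \mathbb{1}\{I_m = k\} + A_{k,t,T_k(t)}, \ \forall k$
• return $a_t = \arg\max_{k} \left( B_{k,t,T_k(t)} \right)$

Fig. 7. A tabular version of the UCB1 algorithm for selecting the next configuration $a_t$.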
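A minimal Python transcription of the UCB1 algorithm of Figure 7 is sketched below, using the bias $\sqrt{b^2 \alpha \ln(t) / T_k(t)}$, which reduces to the bias of Figure 7 when b = 1. It keeps running counts and reward sums instead of recomputing the sums over the whole history, which yields the same indexes; configurations are indexed from 0, and the initialization phase plays each configuration once. The defaults alpha = 2 and b = 1 (the classical UCB1 choice for rewards in [0, 1]) are assumptions for illustration.

```python
import math
import random
from typing import List

class UCB1:
    """UCB1 index policy: B_k = sample mean + sqrt(b^2 * alpha * ln(t) / T_k(t))."""

    def __init__(self, n_configs: int, alpha: float = 2.0, b: float = 1.0):
        self.alpha, self.b = alpha, b
        self.counts: List[int] = [0] * n_configs    # T_k(t)
        self.sums: List[float] = [0.0] * n_configs  # sum of rewards collected with k
        self.t = 0                                  # number of slots elapsed

    def index(self, k: int) -> float:
        mean = self.sums[k] / self.counts[k]
        bias = math.sqrt(self.b ** 2 * self.alpha * math.log(self.t) / self.counts[k])
        return mean + bias

    def select(self) -> int:
        if self.t < len(self.counts):  # initialization: try each configuration once
            return self.t
        return max(range(len(self.counts)), key=self.index)

    def update(self, k: int, reward: float) -> None:
        self.counts[k] += 1
        self.sums[k] += reward
        self.t += 1

if __name__ == "__main__":
    rng = random.Random(0)
    means = [0.2, 0.5, 0.7]  # hypothetical Bernoulli means, unknown to the policy
    policy = UCB1(len(means))
    for _ in range(5000):
        k = policy.select()
        policy.update(k, 1.0 if rng.random() < means[k] else 0.0)
    print("plays per configuration:", policy.counts)
```

In the clustered setting described above, one such object would simply be instantiated per context cluster, and the cluster index of the sensed SNR would decide which object is queried and updated.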

We describe hereafter two specific upper confidence biases $A_{k,t,T_k(t)}$ that will be used in our simulations. Assuming that the rewards are upper bounded by a positive real $b > 0$, we have:

1) UCB1: defined by $A_{k,t,T_k(t)}$ such that [18]:

$$A_{k,t,T_k(t)} = \sqrt{\frac{b^2 \alpha \ln(t)}{T_k(t)}} \tag{3}$$

2) UCB-V: defined by $A_{k,t,T_k(t)}$ such that [19]:

$$A_{k,t,T_k(t)} = \sqrt{\frac{2 \xi V_k(t) \ln(t)}{T_k(t)}} + 3 c b \frac{\xi \ln(t)}{T_k(t)} \tag{4}$$

where $V_k(t)$ refers to the empirical variance of configuration $k$ in the particular cluster considered, and $\xi$ and $c$ are positive exploration parameters.

Finally, both of these indexes are very simple to compute, as can be seen in Figure 7, where a tabular version of the UCB algorithm is given for the UCB1 index. The case of the UCB-V index is strictly equivalent.
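Compared with UCB1, the UCB-V bias of Equation (4) additionally requires the empirical variance $V_k(t)$ of each configuration, which can be maintained incrementally. The sketch below computes that bias with Welford's online mean and variance update; the default values xi = 1.2, c = 1 and b = 1 are illustrative assumptions, and `RunningStats`, `ucbv_bias` and `ucbv_index` are hypothetical names.

```python
import math

class RunningStats:
    """Welford's online mean/variance of the rewards of one configuration."""

    def __init__(self) -> None:
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def push(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the updated mean

    @property
    def variance(self) -> float:
        """Empirical (1/n-normalized) variance V_k(t)."""
        return self.m2 / self.n if self.n else 0.0

def ucbv_bias(stats: RunningStats, t: int, b: float = 1.0,
              xi: float = 1.2, c: float = 1.0) -> float:
    """A_{k,t,T_k(t)} = sqrt(2*xi*V_k*ln t / T_k) + 3*c*b*xi*ln t / T_k, as in Eq. (4).
    Requires stats.n >= 1 and t >= 1."""
    log_t = math.log(t)
    return (math.sqrt(2 * xi * stats.variance * log_t / stats.n)
            + 3 * c * b * xi * log_t / stats.n)

def ucbv_index(stats: RunningStats, t: int, **kwargs: float) -> float:
    """B_{k,t,T_k(t)} = sample mean + UCB-V bias, as in Eq. (1)."""
    return stats.mean + ucbv_bias(stats, t, **kwargs)
```

Using `ucbv_index` in place of the UCB1 index in the selection step of Figure 7 (once every configuration has been played at least once) gives the UCB-V policy; configurations whose rewards vary little receive a smaller exploration bonus.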

C. Performance Evaluation.

To evaluate the performance of these policies, it is convenient to use the notion of “regret”. The general idea behind the regret can be summarized as follows: if the gambler knew a priori which arm was the best, he would only pull that one and hence maximize the expected sum of collected rewards. However, since he lacks that essential information, he suffers an unavoidable loss due to suboptimal pulls. Similarly, if it were possible to find an adequate clustering of the environment’s context such that every cluster has one and only one “best candidate”, then the gambler could always play this best candidate. However, since we usually do not have this optimal division of the context space, we suffer a second loss due to the gap between the optimal division of the context space and the actual one.

Fig. 8. Performance of the different configurations depending on the SNR.

For our application in a cognitive radio context, we adapt the expression of the regret and suggest the form in Equation (5). Let $I_m$ denote the configuration selected at slot number $m$. The regret of a policy $\pi \in \Pi$ at time $t$ (after $t$ decisions) is then defined as follows:

$$\mathbb{E}[R^{\pi}_t] = \sum_{m=0}^{t-1} \left(\mu^*(SNR_m) - \mu_{I_m}(SNR_m)\right) \qquad (5)$$

where $\mu_k(SNR_m)$ is the expected performance of configuration $k$ at slot number $m$ under context $SNR_m$, and $\mu^*(SNR_m) = \max_k \{\mu_k(SNR_m)\}$.
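Equation (5) can be evaluated in simulation, where the expected performances are known to the experimenter (though not to the decision maker). A minimal Python sketch, assuming a hypothetical callable `mu(k, snr)` that returns $\mu_k(SNR)$: it computes the regret of a single run, and averaging it over independent runs is one way to estimate the expectation, as in an average cumulated regret curve.

```python
from typing import Callable, Sequence


def cumulated_regret(snrs: Sequence[float], chosen: Sequence[int],
                     mu: Callable[[int, float], float], n_configs: int) -> float:
    """Sum over slots m of mu*(SNR_m) - mu_{I_m}(SNR_m), following Eq. (5).

    snrs   -- context SNR_m observed at each slot m
    chosen -- configuration I_m selected at each slot m (0-based)
    mu     -- hypothetical oracle: mu(k, snr) is the expected performance of configuration k
    """
    total = 0.0
    for snr, k in zip(snrs, chosen):
        best = max(mu(j, snr) for j in range(n_configs))  # mu*(SNR_m)
        total += best - mu(k, snr)                         # loss of the selected configuration
    return total
```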

In the next section, we apply the algorithms described herein to the DCA problem, implement the proposed approach and discuss its parameters. Then, through several simulations, we show that the implemented approach empirically exhibits a logarithmic regret.

V. SIMULATIONS

A. Experimental protocol

For the simulations, we used 5 different configurations denoted by $\{C_1, C_2, \ldots, C_5\}$. The curves in Figure 8 represent the throughputs $T_{C_k}$ of the different configurations $C_k$, $k \in \{1, 2, \ldots, 5\}$, as a function of the SNR. Their expressions are inspired by real radio communication problems; however, for the sake of generality, we only use them as tools for the simulations.

Usually, radio equipment is dimensioned to provide a service within a certain interval of SNRs, which leads to a worst-case analysis. Thus, if we expect the designed system to provide the user with the highest throughput at a low SNR (around 6 dB in this case), then $C_1$ would be the chosen configuration. In a cognitive radio context, however, we aim at finding a way to “jump” from one curve to another depending on the SNR, in order to stay on the curve that maximizes the performance of the equipment for all SNRs.

Let us define $j = [454\ \ 940\ \ 454\ \ \tfrac{454}{2}\ \ 940]$, where $j(1)$ is a parameter associated with $C_1$, $j(2)$ with configuration $C_2$, and so on. Similarly, let $M = [4\ \ 8\ \ 8\ \ 16\ \ 16]$ and $n_p = 20$. The performance criterion used (i.e., the throughput in our case) then has the following form:

$$T_{C_k}(SNR) = \frac{j(k)\,\log_2(M(k))}{j(k) + n_p}\left[1 - \left(1 - \frac{1}{\sqrt{M(k)}}\right)\operatorname{erfc}\left(\sqrt{\frac{3\,SNR\,\log_2(M(k))}{2\,(M(k)-1)}}\right)\right]$$

Throughout the rest of this paper, we consider that the throughput estimations received by the CR decision making engine are drawn from Bernoulli distributions $\theta_k$ such that, for all SNR,

$$T_{C_k}(SNR) = \mathbb{E}[\theta_k(SNR)] \qquad (6)$$

Moreover, we consider that the DCA problem exists only for a bounded $SNR \in [SNR_{min}, SNR_{max}]$ and that the SNR is a random variable. In this case, the variable $SNR_{dB} = 10\log_{10}(SNR)$ is assumed to follow a Gaussian distribution with mean 10 dB and standard deviation 4 dB, with $SNR_{dB,min} = 6$ dB and $SNR_{dB,max} = 14$ dB. The SNR interval was divided into 24 equal clusters in order to obtain a good learning resolution.
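The context model above can be sketched in Python as follows. The text does not say how SNR draws outside [6, 14] dB are handled; here we assume they are simply redrawn (rejection sampling, i.e. a truncated Gaussian). Each of the 24 clusters is then 1/3 dB wide, and in a per-cluster scheme each cluster index would select its own set of UCB statistics. All names are hypothetical.

```python
import math
import random

SNR_DB_MIN, SNR_DB_MAX = 6.0, 14.0   # bounds of the SNR interval (dB), from the text
SNR_DB_MEAN, SNR_DB_STD = 10.0, 4.0  # Gaussian parameters (dB), from the text
N_CLUSTERS = 24                      # number of equal-width clusters, from the text


def sample_snr_db(rng: random.Random) -> float:
    """Draw SNR_dB ~ N(10, 4^2) restricted to [6, 14] dB by rejection (our assumption)."""
    while True:
        x = rng.gauss(SNR_DB_MEAN, SNR_DB_STD)
        if SNR_DB_MIN <= x <= SNR_DB_MAX:
            return x


def cluster_index(snr_db: float) -> int:
    """Map an SNR_dB value in [6, 14] to one of the 24 equal-width clusters (0..23)."""
    idx = math.floor((snr_db - SNR_DB_MIN) * N_CLUSTERS / (SNR_DB_MAX - SNR_DB_MIN))
    # Clamp so that the upper bound 14 dB falls into the last cluster.
    return min(max(idx, 0), N_CLUSTERS - 1)


def snr_linear(snr_db: float) -> float:
    """Invert SNR_dB = 10 log10(SNR) to obtain the linear SNR used in the throughput formula."""
    return 10.0 ** (snr_db / 10.0)
```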

The parameters used for the UCB algorithms were chosen to make sure that these algorithms have logarithmic regret. Indeed, the theoretical analyses provided in [18][19] show that parameters $\alpha \leq 1$ (case of UCB1) and $\{\xi \leq 1$ and/or $c \leq 1/3\}$ (case of UCBV) are risky and could lead to a bad learning behavior of the algorithm. Thus, we implemented our work using the critical values $\alpha = 1$ and $\{\xi = 1$ and/or $c = 1/3\}$. Moreover, we chose $b = 4$ as an upper bound on the possible rewards (cf. Figure 8); indeed, in this case $b$ is larger than any possible outcome of the transmission process.

Fig. 9. Average cumulated regret when using UCB algorithms to tackle DCA problems.

B. Results

Figure 9 shows the evolution of the average cumulated regret for the different UCB policies. For both policies, the cumulated regret first increases rather rapidly with the slot number and then more and more slowly. This shows that CR decision making engines based on UCB policies are able to process past information appropriately, so that configurations leading to high rewards are favored over time. Moreover, by choosing a cluster size small enough to give a good local approximation of the configuration performances, yet not too small (otherwise the algorithm would spend most of its time exploring), we see that the proposed algorithms have logarithmic regrets, which are known to be order optimal for the classic MAB problem. Figure 10 shows that, although the designed system faces a randomly changing environment with a high variance, it manages to learn the different optimal configurations depending on the context. In both figures, UCBV behaves more satisfactorily than UCB1, probably because UCBV takes advantage of the variance to orient its learning behavior.

However, several questions remain regarding the dependency of the designed algorithm on its parameters, such as the cluster size. Indeed, the system needs to adapt its clusters to optimize its behavior. Moreover, it does not exploit the sparse information on the environment by sharing it across clusters. These and several other questions are currently under investigation, and the results will be reported in future work.

Fig. 10. Percentage of times a UCB-based policy selects the optimal configuration.

VI. CONCLUSIONS

In this paper, we presented a brief yet original state of the art on the different configuration adaptation challenges faced by the cognitive radio decision making community. We suggested that most of these challenges share the same constraints but differ by the a priori knowledge they assume available. Consequently, we suggested the “a priori knowledge” as a classification criterion to discriminate the main techniques proposed in the literature to solve configuration adaptation decision making problems. Moreover, we tackled the configuration adaptation decision making problem when no (or very limited) a priori information is provided to the CR equipment. We argued that this problem is a particular instance of the well-known multi-armed bandit paradigm and can be efficiently addressed through UCB algorithms. Our simulation results have highlighted that, by customizing algorithms developed for solving the multi-armed bandit, efficient engineering solutions to some problems met in cognitive radio can indeed be built.

ACKNOWLEDGMENTS

This work was supported by the European Commission in the framework of the FP7 Network of Excellence in Wireless COMmunications NEWCOM++ (contract n. 216715).

REFERENCES

[1] J. Mitola. Cognitive radio: An integrated agent architecture for software defined radio. PhD Thesis, Royal Inst. of Technology (KTH), 2000.
[2] R. Hachemani, J. Palicot, and C. Moy. A new standard recognition sensor for cognitive radio terminal. EUSIPCO'07, Poznan, Poland, 3-7 September 2007.
[3] C. Moy, A. Bisiaux, and S. Paquelet. An ultra-wide band umbilical cord for cognitive radio systems. PIMRC'05, Berlin, September 2005.
[4] A. Kountouris and C. Moy. Reconfiguration in software radio systems. Second Karlsruhe Workshop on Software Radios, Karlsruhe, Germany, 20-21 March 2002.
[5] J.P. Delahaye, P. Leray, C. Moy, and J. Palicot. Managing Dynamic Partial Reconfiguration on Heterogeneous SDR Platforms. SDR Forum Technical Conference 2005, Anaheim (USA), November 2005.
[6] W. Jouini, D. Ernst, C. Moy, and J. Palicot. Multi-armed bandit based policies for cognitive radio's decision making issues. In Proceedings of the 3rd International Conference on Signals, Circuits and Systems (SCS), November 2009.
[7] S. Haykin. Cognitive radio: brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23, no. 2:201–220, Feb 2005.
[8] C.J. Rieser. Biologically Inspired Cognitive Radio Engine Model Utilizing Distributed Genetic Algorithms for Secure and Robust Wireless Communications and Networking. PhD thesis, Virginia Tech, 2004.
[9] N. Colson, A. Kountouris, A. Wautier, and L. Husson. Cognitive decision making process supervising the radio dynamic reconfiguration. In Proceedings of Cognitive Radio Oriented Wireless Networks and Communications, page 7, 2008.
[10] DARPA XG Working Group. The XG vision. Request for comments. BBN Technologies, Cambridge MA, USA, Tech. Rep. Version 2.0, January 2004.
[11] L. Berlemann, S. Mangold, and B. H. Walke. Policy-based reasoning for spectrum sharing in radio networks. In Proceedings of IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), Baltimore, MD, USA, November 2005.
[12] T. W. Rondeau, D. Maldonado, D. Scaperoth, and C.W. Bostian. Cognitive radio formulation and implementation. IEEE Proceedings CROWNCOM, Mykonos, Greece, 2006.
[13] N. Baldo and M. Zorzi. Fuzzy logic for cross-layer optimization in cognitive radio networks. IEEE Consumer Communications and Networking Conference, January 2007.
[14] C. Clancy, J. Hecker, and E. Stuntebeck. Applications of machine learning to cognitive radio networks. IEEE Wireless Communications Magazine, 14, 2007.
[15] T. Weingart, D. Sicker, and D. Grunwald. A statistical method for reconfiguration of cognitive radios. IEEE Wireless Commun. Mag., vol. 14, no. 4, pp. 34–40, August 2007.
[16] N. Kasabov. ECOS: Evolving connectionist systems and the ECO learning paradigm. International Conference on Neural Information Processing, Kitakyushu, Japan, Oct. 1998.
[17] N. Kasabov. Evolving connectionist systems: the knowledge engineering approach. 2nd ed. New York: Springer, 2007.
[18] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite time analysis of multi-armed bandit problems. Machine Learning, 47(2/3):235–256, 2002.
[19] J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In Proceedings of the 18th International Conference on Algorithmic Learning Theory, 2007.
