On decision making for dynamic configuration adaptation problem in cognitive radio equipments: a multi-armed bandit based approach

Wassim Jouini, Christophe Moy, Jacques Palicot,
SUPELEC/IETR,
France
{wassim.jouini, christophe.moy, jacques.palicot}@supelec.fr
Abstract: We introduce in this paper the notion of “design space” as a conceptual object that defines a set of cognitive radio decision making problems by their constraints rather than by their degrees of freedom. In our analysis, we identified three dimensions of constraints: the constraints related to the environment, to the equipment and to the user. Moreover, we define and use the notion of a priori knowledge to show that the configuration adaptation decision making problems tackled by the radio community often share the same design space but differ in the a priori knowledge they assume to be available. Consequently, we suggest in this paper “a priori knowledge” as a classification criterion to discriminate between the main techniques proposed in the literature to solve configuration adaptation decision making problems. In the rest of the paper, we further study a particular decision making framework where no (or limited) a priori information is provided to the cognitive radio equipment. An approach based on tools borrowed from the multi-armed bandit community is discussed. Finally, our simulation results highlight that, by customizing algorithms developed for the multi-armed bandit problem, efficient engineering solutions to some problems met in cognitive radio can indeed be built.
Index Terms: Cognitive radio, decision making problems, dynamic configuration adaptation, multi-armed bandit, Upper Confidence Bound, design space, a priori knowledge, survey.
I. INTRODUCTION
Recent hardware advances have made it possible to design software solutions to problems that in the past required hardwired signal processing devices. With this added software layer, equipments based on this technology, referred to as software defined radios (SDR), are able to control a large set of parameters in order to operate with great flexibility and efficiency (e.g., change the bandwidth of the device, switch from one communication protocol to another, minimize the energy consumption of a device, etc.). Soon after the emergence of the SDR field, several researchers studied how best to control these parameters, which led to the emergence of a new research field named Cognitive Radio [1].
The concept of Cognitive Radio (CR) presents itself as the technology that will have the autonomy and the cognitive abilities to become aware of its environment as well as of its own operational abilities. The purpose of this new concept is to meet the user’s expectations, i.e., maximizing his profit without compromising the efficiency of the network. Thus, it presupposes the capacity to collect information from its surrounding environment (perception), to digest it (learning, decision making and prediction problems) and to act in the best possible way by considering several constraints and the available information. Therefore, it is a new paradigm of wireless communication whose purpose is to combine Software Defined Radio technologies and cognitive abilities in order to build Cognitive Radio equipments.
Sensing [2][3] and reconfiguration [4][5] have been investigated quite intensively in the community and are out of the scope of this paper. On the decision making side, however, only a few methods have been suggested by the community, and most of them are still in their infancy. Ultimately, the promises of this new technology are as high as the challenges it sets.
The purpose of this paper is twofold. On the one hand, we aim at presenting a quick survey of the decision making challenges the CR community has been dealing with during the last 10 years, as well as the main solutions and tools suggested in the CR literature to deal with these challenges. This survey focuses on decision making and learning challenges at the level of CR equipments. On the other hand, we complement this survey by tackling a particular online decision making issue where the CR equipment operates in an unknown environment [6].
The rest of this paper is organized as follows. We start by introducing and defining a conceptual object referred to as design space in Section II. The main purpose of this object is to suggest that the cognitive radio design problem is defined by a set of constraints rather than by its degrees of freedom. In our analysis, we identified three dimensions of constraints: the constraints related to the environment, to the equipment and to the user. Moreover, in Section III, we define and use the notion of a priori knowledge to show that the configuration adaptation decision making problems tackled by the radio community often share the same design space but differ in the a priori knowledge they assume to be available. Consequently, in the same section, we suggest “a priori knowledge” as a classification criterion to discriminate between the main techniques proposed in the literature to solve configuration adaptation decision making problems. In Section IV, we further detail one particular decision making tool borrowed from the machine learning community. We suggest using it in a cognitive radio context when dealing with environments where almost no a priori knowledge is available and where the performance evaluation is uncertain. Section V presents several simulations to validate our approach on an academic dynamic configuration adaptation problem. Finally, Section VI concludes.
II. COGNITIVE RADIO DESIGN SPACE
A. Cognitive radio design related constraints
A Cognitive Radio (CR) equipment can be defined as a communication system aware of its environment as well as of its operational abilities, and capable of using them intelligently. Consequently, it is assumed that the device has the ability to collect information through its sensors and that it can use that information to adapt itself to its surrounding environment, as described in Figure 1. This presupposes cognitive abilities enabling CR equipments to process all the collected information in order to make appropriate decisions [1][7].
Fig. 1. Cognitive radio decision making context.
When designing such CR equipments, the main challenge is to find an appropriate way to dimension their cognitive abilities correctly according to their environment as well as to their purpose (i.e., providing a certain service to the user). Several papers in the literature have already addressed this matter; however, their description of the problem usually remained vague (e.g., [1][8][9]). We summarize their analysis by defining three “constraints” on which the design of a CR equipment depends: first, the constraints imposed by the surrounding environment; then, the constraints related to the user’s expectations; and finally, the constraints inherent to the equipment. These constraints help dimension the CR decision making engine. Consequently, an a priori formulation of these elements helps the designer implement the right tools in order to obtain a flexible and adequate cognitive radio.
1) The environment constraints: since a cognitive radio is a wireless device that operates in a surrounding communicating environment, it shall respect the rules of that environment (e.g., allocated frequency bands, tolerated interference, etc.). Thus the behavior of cognitive radio equipments is strongly shaped by the constraints imposed by the environment. As a matter of fact, if the environment allows no degrees of freedom to the equipment, the latter has no choice but to obey and thus loses all cognitive behavior. On the other hand, if no constraints are imposed by the environment, the cognitive radio will still be constrained by its own operational abilities and by the expectations of the user.
2) User’s expectations: when using his wireless device for a particular application (voice communication, data, streaming and so on), the user expects a certain quality of service. Depending on the expected quality of service, the cognitive radio can identify several criteria to optimize, such as minimizing the bit error rate, minimizing energy consumption, maximizing spectral efficiency, etc. If the user is too greedy and imposes too many objectives, the design problem might become intractable because of the constraints imposed by the surrounding environment and the platform of the cognitive radio. However, if the user expects nothing, then again there is no need for a flexible cognitive radio. Usually it is assumed that the user is reasonable, in the sense that he will accept the best he can get at a minimum cost as long as the quality of service provided is above a certain level.
Fig. 2. Cognitive radio decision making design space.
3) Equipment’s operational abilities: these limitations are perhaps the most obvious, since one cannot ask the cognitive radio equipment to adapt itself beyond what it can perform (sense and/or act). It is usually assumed in the cognitive radio literature that the equipment is an ideal software defined radio and thus that it has all the flexibility needed for the designed framework. In a real application, the efficiency of cognitive radio equipments depends, of course, on the degrees of freedom (or equivalently the constraints) inherent to the wireless platform used to communicate. Examples of commonly analyzed degrees of freedom include modulation, pulse shape, symbol rate, transmit power, etc.
B. Design space
We call cognitive radio design space an abstract three-dimensional space that characterizes the CR decision making engine, as shown in Figure 2. It is indeed abstract since it does not have any rigorous mathematical meaning; it is only used to illustrate, visually and conceptually, the dependencies of the CR decision making engine on the “design dimensions”: environment, parameters (usually referred to as knobs) and objectives (or criteria defined from the user’s expectations).
In Figure 2, we represent two subspaces referred to as the actual design space and the virtual design space. On the one hand, the virtual design space refers to the upper bound support of the design space, where every dimension is considered independently of the others. Its volume can be interpreted as the largest space of decision problems one could define from the three dimensions. On the other hand, the actual design space is included in the virtual design space. It results from the reduction of the design space when the correlations between the different constraints imposed by every dimension are taken into account. For instance, some constraints on the environment, such as an “imposed fixed waveform”, might disable some objectives, such as “find a waveform that maximizes the spectral efficiency”.
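The design space is explicitly not a rigorous mathematical object, but the difference between the virtual and the actual design space can still be pictured with a toy enumeration. In this purely illustrative sketch, all dimension values are hypothetical. The virtual space is taken as the Cartesian product of the three dimensions, and the actual space keeps only the combinations that survive one cross-dimension rule, namely the “imposed fixed waveform” example above.

```python
from itertools import product

# Hypothetical, purely illustrative values for the three design dimensions.
ENVIRONMENT = ["imposed fixed waveform", "free waveform choice"]
KNOBS = ["modulation", "transmit power"]
OBJECTIVES = [
    "find a waveform that maximizes the spectral efficiency",
    "minimize energy consumption",
]


def is_consistent(environment: str, knob: str, objective: str) -> bool:
    """Return False when correlated constraints rule a combination out.

    A single rule is encoded, mirroring the example in the text: an imposed
    fixed waveform disables objectives that require choosing the waveform.
    """
    if environment == "imposed fixed waveform" and "waveform" in objective:
        return False
    return True


# Virtual design space: every dimension taken independently (Cartesian product).
virtual_space = list(product(ENVIRONMENT, KNOBS, OBJECTIVES))

# Actual design space: the combinations that survive the cross-dimension rule.
actual_space = [combo for combo in virtual_space if is_consistent(*combo)]

print(len(virtual_space), len(actual_space))  # 8 6
```

Each additional correlation rule would shrink the actual space further, while the virtual space stays fixed.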
C. Dynamic Configuration Adaptation (DCA)
As an illustrative example that we will use for the rest of the paper, we define the design space of the so-called dynamic configuration adaptation (DCA) problem. Within this framework, we assume that the environment constrains the cognitive radio by allowing only $K$ possible configurations to be used. This condition characterizes the environment and the equipment. Moreover, we assume that there exist $M \geq 1$ objectives that evaluate how well the equipment performs in meeting the user’s expectations.
To conclude, we usually observe in the literature that these characterizations are made implicitly, and that final assumptions are then made to define the decision making framework. These assumptions concern what we refer to as the “a priori model knowledge”. In the next section, we introduce and explain the notion of a priori knowledge and present a brief state of the art on decision making for cognitive radio configuration adaptation using the particular DCA design space. We show that, although the design space is the same, the community suggests different approaches to tackle the defined decision making problems depending on the a priori model knowledge.
III. DYNAMIC CONFIGURATION ADAPTATION PROBLEM: CHALLENGES AND SUGGESTED APPROACHES
The a priori knowledge is a set of assumptions made by the designer on the amount and representation of the information available to the decision making engine when it first deals with the environment. As a matter of fact, “knowledge” is defined by the Oxford English Dictionary as: (i) expertise and skills acquired by a person through experience or education; the theoretical or practical understanding of a subject, (ii) what is known in a particular field or in total; facts and information, or (iii) awareness or familiarity gained by experience of a fact or situation. Consequently, within the cognitive radio framework, we can define the a priori knowledge as the set of theoretical or practical assumptions provided by the designer to the CR decision making engine. These assumptions, if accurate, provide the CR with valuable information on the problem to deal with. These remarks lead us to suggest that the decision making problems the cognitive radio will have to deal with are defined by the set {design space, a priori knowledge}. The more accurate the a priori knowledge, the more efficient the cognitive radio can be.
In the next subsections, we briefly describe the different approaches provided by the community depending on the a priori knowledge assumed relevant to tackle the environments the CR might face during its lifetime. Figure 3 shows a suggested classification of these techniques depending on the a priori knowledge provided to the cognitive decision making engine.
A. Expert approach
The expert approach relies on the large amount of knowledge collected by telecommunication engineers and researchers. This knowledge is based on theoretical considerations and practical measurements of the environment and of radio communication parameters. It was first suggested by Mitola in his Ph.D. dissertation on cognitive radio [1]. Through intensive off-line simulations, expert systems are provided with a set of inference rules. These rules are then used on-line to adapt the equipment depending on the context faced by cognitive radio equipments. Thus, the more knowledge is available, the better the equipment can adapt itself to its surrounding dynamic environment. However, this knowledge is only useful as long as the cognitive radio can represent it in a way that enables it to exploit that knowledge and to react to the environment by adequately adapting its operating configuration.
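The on-line half of the expert approach can be pictured as first-match evaluation of if-then rules. In the sketch below, the off-line rule extraction is assumed to have already produced the rule base. The thresholds, configuration names and the `Rule`/`infer` helpers are all hypothetical, and none of this reflects RKRL syntax.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

Context = Dict[str, float]


@dataclass
class Rule:
    """An if-then inference rule, assumed to be extracted off-line by experts."""
    name: str
    condition: Callable[[Context], bool]
    configuration: str


# Hypothetical rule base: thresholds and configuration names are illustrative only.
RULES: List[Rule] = [
    Rule("low SNR", lambda c: c["snr_db"] < 6.0, "QPSK, high power"),
    Rule("medium SNR", lambda c: 6.0 <= c["snr_db"] < 14.0, "16-QAM"),
    Rule("high SNR", lambda c: c["snr_db"] >= 14.0, "64-QAM"),
]


def infer(context: Context, rules: List[Rule]) -> Optional[str]:
    """On-line step: return the configuration of the first rule that fires."""
    for rule in rules:
        if rule.condition(context):
            return rule.configuration
    return None  # no rule anticipated this situation


print(infer({"snr_db": 10.0}, RULES))  # 16-QAM
```

Once the rules exist, the decision itself is cheap. An unanticipated input, however, matches no rule and yields `None`. For example, a NaN measurement fails every comparison. This is a small instance of the model-dependence drawback listed for this approach.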
For that purpose, Mitola suggested representing the knowledge of cognitive radio equipments using a new language dedicated to radio communication: the “Radio Knowledge Representation Language” (RKRL) [1]. This representation of knowledge uses Semantic Web technologies such as XML (eXtensible Markup Language), RDF (Resource Description Framework) and OWL (Web Ontology Language). The expert knowledge based approach had a large success, especially due to the XG (neXt Generation) project supported by DARPA (e.g., [10] and, for spectrum sharing, [11]). As a matter of fact, if the knowledge is well represented and provided to the equipment as a set of rules, the decision making process becomes very simple. However, this approach has a few drawbacks:
• The behavior of the designed system is not adapted to a particular user but to all users and to a set of probable environments. Moreover, in order to provide the CR decision making engine with large and valuable knowledge, a significant amount of effort is needed from the designer.
• Expert knowledge is mainly based on models. Thus the system might behave poorly when facing unexpected dynamics in the environment.
The techniques based on expert systems can, however, be supported by several other tools that help them acquire new knowledge on the environment or avoid conflicts between different configuration adaptation rules.
B. Exploration based decision making: Genetic Algorithms
In some contexts, one can consider that a priori knowledge is available on the complex relationships existing between the observed metrics, the parameters to adapt and the criteria to satisfy, as described in Figure 4. In this case, the problem appears to be a multi-criteria optimization problem. Within this framework, the CR decision making engine aims at finding the best parameters to meet the user’s expectations by solving a set of equations (see Table II in Figure 4). This problem is known to be complex for several reasons:
• There exists no universal definition of optimality in this case. Thus, the solutions of this problem are satisfactory (or not) with respect to a certain function, usually named fitness, that evaluates how well the criteria are satisfied. As a consequence, a large space of possible “good” configurations is usually available.
• The criteria are correlated and can be in conflict (e.g., Figure 4).
If we assume that the previously mentioned off-line expert rule extraction phase has not been accomplished (or only partially), an exploration of the space of possible configurations is needed.
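Both points in the list can be seen on a toy example: the criteria pull toward different configurations, and only a chosen fitness function ranks them. The sketch rests on several assumptions. The knob values and criteria formulas are hypothetical, and the weighted sum is just one common scalarization. None of it is the model of Figure 4.

```python
import math

# Hypothetical knob values: modulation order and normalized transmit power.
MODULATIONS = [4, 16, 64]
POWERS = [0.25, 0.5, 1.0]


def criteria(mod_order: int, power: float) -> tuple:
    """Toy criteria scores in [0, 1]; higher is better for each of them."""
    throughput = math.log2(mod_order) / math.log2(max(MODULATIONS))
    # Denser constellations are assumed less robust for the same power.
    robustness = min(1.0, power * min(MODULATIONS) / mod_order)
    energy_saving = 1.0 - power
    return throughput, robustness, energy_saving


def fitness(mod_order: int, power: float, weights=(0.4, 0.4, 0.2)) -> float:
    """Weighted-sum fitness: one common way (not the only one) to scalarize criteria."""
    return sum(w * s for w, s in zip(weights, criteria(mod_order, power)))


configs = [(m, p) for m in MODULATIONS for p in POWERS]

# Best configuration for each criterion taken alone: they all differ,
# which is the conflict between criteria mentioned above.
best_per_criterion = [
    max(configs, key=lambda c, i=i: criteria(*c)[i]) for i in range(3)
]
best_overall = max(configs, key=lambda c: fitness(*c))
print(best_per_criterion, best_overall)
```

Changing `weights` changes `best_overall`. In other words, the notion of “optimal” configuration is defined by the fitness, not by the problem alone.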
This cognitive radio decision making framework was first analyzed by Christian James Rieser and Thomas W. Rondeau, who suggested the use of Genetic Algorithms (GA) to tackle it [8][12]. Genetic algorithms were originally designed to mimic Darwin’s theory of evolution and are well known for their capacity to adapt to a changing environment. Their work showed that, under this design space and with the described a priori knowledge, genetic algorithms provide cognitive radios with an efficient and flexible decision making engine.
Fig. 3. Suggested decision making techniques depending on the assumed a priori knowledge.
Fig. 4. Multi-criteria optimization problem [12].
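To make the exploration of the configuration space concrete, here is a minimal genetic algorithm over two discrete knobs. The operators are generic textbook choices: tournament selection, uniform crossover, per-gene mutation and elitism. The knobs, the toy fitness, all parameter values and the helper names are assumptions for illustration. This is not the engine of [8][12].

```python
import math
import random

# Hypothetical discrete knobs; a genome holds one index per knob.
MODULATIONS = [4, 16, 64]
POWERS = [0.25, 0.5, 0.75, 1.0]
SIZES = (len(MODULATIONS), len(POWERS))


def fitness(genome):
    """Toy weighted-sum fitness (illustrative, not the model used in [8][12])."""
    m, p = MODULATIONS[genome[0]], POWERS[genome[1]]
    throughput = math.log2(m) / math.log2(64)
    robustness = min(1.0, 4.0 * p / m)
    energy_saving = 1.0 - p
    return 0.4 * throughput + 0.4 * robustness + 0.2 * energy_saving


def tournament(population, rng, size=2):
    """Return the fittest of `size` randomly drawn individuals."""
    return max(rng.sample(population, size), key=fitness)


def crossover(a, b, rng):
    """Uniform crossover: each gene is copied from one parent or the other."""
    return [x if rng.random() < 0.5 else y for x, y in zip(a, b)]


def mutate(genome, rng, rate=0.2):
    """Replace each gene by a random valid value with probability `rate`."""
    return [rng.randrange(n) if rng.random() < rate else g for g, n in zip(genome, SIZES)]


def genetic_search(pop_size=8, generations=20, seed=0):
    rng = random.Random(seed)
    population = [[rng.randrange(n) for n in SIZES] for _ in range(pop_size)]
    initial_best = max(fitness(g) for g in population)
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: the best genome always survives
        children = [
            mutate(crossover(tournament(population, rng), tournament(population, rng), rng), rng)
            for _ in range(pop_size - 1)
        ]
        population = [elite] + children
    best = max(population, key=fitness)
    return best, fitness(best), initial_best


best, best_fitness, initial_best = genetic_search()
print(MODULATIONS[best[0]], POWERS[best[1]], round(best_fitness, 4))
```

Elitism makes the best fitness non-decreasing across generations. On such a tiny space, brute force would of course suffice; a GA becomes interesting when the number of knob combinations is too large to enumerate.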
C. Learning approaches: exploration and exploitation
As we argued in the previous subsections, and as several other authors [13][9] note, “Many CR proposals, such as [12][13][14], rely on a priori characterization of these performance metrics which are often derived from analytical models. Unfortunately, [. . . ], this approach is not always practical due to e.g., limiting modeling assumption, non-ideal behaviors in real-life scenarios, and poor scalability” [13]. To avoid these limitations and to tackle more realistic scenarios, many methods based on learning techniques have been suggested: Artificial Neural Networks (ANN), Evolving Connectionist Systems (ECS), statistical learning, regression models and so on. All of these approaches have their pros and cons; however, they all have in common that they mainly rely on the real environment to try to infer decision making rules for CR equipments. Since these learning tools aim at representing the functional relationship between the environment (through the sensed metrics), the system’s parameters and the criteria to satisfy, they need direct interaction with the environment in order to build a posteriori knowledge of it. In this paper, we subclassify these methods depending on the way they learn and exploit their rules. On the one hand (i), we find a set of techniques that separate the exploration and exploitation phases. On the other hand (ii), we find other, more flexible techniques that combine both processes.
In the first case (i), we find several tools, such as Artificial Neural Networks or statistical learning, that are already used in other domains requiring some cognitive abilities (robotics, video games, etc.). These methods have two phases: a phase of pure “exploration”, during which the CR decision making engine learns and infers (explicitly or implicitly) decision making rules, and a second phase, during which it uses this a posteriori knowledge to make decisions. Since these learning techniques rely on a first learning phase, a large amount of data and computational power is needed in order to extract reliable knowledge. This difficulty is well known for ANN, for instance, and it also holds for statistical learning. As noted by Weingart in [15], the proposed techniques are still computationally prohibitive and not yet ready to be used in real equipment. However, if the first phase is well achieved, the second phase is usually very simple and does not require much time or energy [13].
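The separation of phases in case (i) can be illustrated with a deliberately simplified analogue: an “explore-then-commit” rule. It first tries every configuration a fixed number of times, then exploits the empirically best one with no further learning. This is not an ANN or a statistical learning method. The `pull` callback, the exploration budget and the demo means are hypothetical.

```python
import random


def explore_then_commit(pull, n_configs, n_explore, horizon):
    """Two-phase sketch: pure exploration, then pure exploitation.

    pull(k) returns the reward of configuration k for one slot (hypothetical).
    Requires horizon >= n_configs * n_explore.
    """
    sums = [0.0] * n_configs
    total, t = 0.0, 0
    # Phase 1: try every configuration n_explore times, only to gather knowledge.
    for k in range(n_configs):
        for _ in range(n_explore):
            reward = pull(k)
            sums[k] += reward
            total += reward
            t += 1
    # Phase 2: commit to the empirically best configuration, with no further learning.
    best = max(range(n_configs), key=lambda k: sums[k])
    while t < horizon:
        total += pull(best)
        t += 1
    return best, total


# Usage with hypothetical Bernoulli rewards (means unknown to the engine).
rng = random.Random(1)
means = [0.3, 0.7, 0.5]
best, total = explore_then_commit(lambda k: float(rng.random() < means[k]), 3, 50, 1000)
print(best, total)
```

The cost of the first phase is paid up front and is fixed. If the environment drifts after the commitment, nothing is relearned, which motivates the combined techniques of case (ii).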
In the second case (ii), we find promising techniques that were recently introduced to the community and still need further investigation [9][6]. These techniques try to provide the CR with a flexible and incremental learning decision making engine. In the case of an ECS based decision making engine, Colson suggested the use of an evolving neural network [16][17]. Unlike usual ANNs, the ECS-NN can change its structure without “forgetting” already learned knowledge. Thus, new rules can be learned by adding new neurons to the neural structure. In order to be efficient, the architecture proposed in [9] needs some expert advice (a priori knowledge) on the available configurations. This additional information ranks the different configurations based on some criteria (robustness, spectral efficiency, etc.), but without knowing a priori which one is most adequate when facing a certain environment. The tools suggested in [6], however, assume that no a priori knowledge is provided and that the performance of the equipment can only be estimated by trying a specific configuration. These tools are based on the so-called Multi-Armed Bandit (MAB) framework and will be further detailed in Section IV.
To conclude this first part of the paper, we would like to emphasize that the classification proposed in this paper shows that a CR equipment cannot depend on a single core decision making tool but rather on a pool of techniques. Every time it faces an environment, the equipment needs an estimation of its a priori knowledge and of its reliability. To tackle a particular context, the general process can be summarized through three questions: What can’t I do (design space)? What do I already know (a priori knowledge)? And what technique should I select to solve the decision making problem?
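The last two questions can be read as a dispatch from the available a priori knowledge to a family of techniques, following the classification of Section III. The sketch below is only illustrative. The `APrioriKnowledge` flags and their priority order are assumptions; they are not a rule stated in the paper, and a real equipment would also have to weigh the reliability of each piece of knowledge.

```python
from dataclasses import dataclass


@dataclass
class APrioriKnowledge:
    """Hypothetical summary of what the designer hands to the decision engine."""
    expert_rules: bool = False      # off-line extracted inference rules (III-A)
    criteria_model: bool = False    # known metrics/knobs/criteria relations (III-B)
    offline_learning: bool = False  # data and compute for a separate learning phase (III-C, i)
    config_ranking: bool = False    # expert ranking of configurations, as in [9]


def select_technique(knowledge: APrioriKnowledge) -> str:
    """Answer "what technique should I select?" from "what do I already know?".

    The priority order below is an illustrative assumption, not a rule from the paper.
    """
    if knowledge.expert_rules:
        return "expert system"
    if knowledge.criteria_model:
        return "genetic algorithm"
    if knowledge.offline_learning:
        return "two-phase learning (e.g., ANN, statistical learning)"
    if knowledge.config_ranking:
        return "evolving connectionist system"
    return "multi-armed bandit (e.g., UCB)"


print(select_technique(APrioriKnowledge()))  # multi-armed bandit (e.g., UCB)
```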
In the next section, we further detail a particular case of partial monitoring under uncertainty known as the multi-armed bandit framework. Within this framework, we assume that only very limited a priori knowledge is available on the environment and on the CR itself, which makes sense within a CR framework. The purpose of the method suggested in that section is to offer a balance between the exploration and exploitation phases without interrupting the communication process, i.e., while providing a certain service to the user.
Fig. 5. Slot representation for a radio equipment controlled by a cognitive decision making engine. A slot is divided into 4 periods. During the first period, the cognitive decision making engine senses the environment and chooses the next configuration. If the new configuration is different from the current one, a reconfiguration is carried out during the second period before communicating. If a reconfiguration is not needed, the CR equipment keeps the current configuration to communicate. At the end of every slot, the cognitive decision making engine computes a reward that evaluates its performance during the communication process. It is assumed here that $\tau_1 + \tau_2 + \tau_4$ is small with respect to $\tau_3$.
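The assumption in the Fig. 5 caption can be checked with a little arithmetic on the share of a slot actually spent communicating. The sketch assumes that the reconfiguration period $\tau_2$ is only spent when the configuration changes. The durations used are hypothetical.

```python
def communication_fraction(tau1, tau2, tau3, tau4, reconfigure):
    """Share of a slot spent communicating (third period of Fig. 5).

    The reconfiguration period tau2 is only spent when the engine switches
    to a configuration different from the current one.
    """
    overhead = tau1 + tau4 + (tau2 if reconfigure else 0.0)
    return tau3 / (tau3 + overhead)


# Hypothetical durations (ms) with tau1 + tau2 + tau4 small compared with tau3.
print(communication_fraction(1.0, 2.0, 95.0, 2.0, reconfigure=True))   # 0.95
print(communication_fraction(1.0, 2.0, 95.0, 2.0, reconfigure=False))
```

When the overhead is small with respect to $\tau_3$, switching configurations costs little airtime. This is what lets the engine explore during normal operation.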
IV. DYNAMIC CONFIGURATION ADAPTATION PROBLEM
A. General Framework
The general framework tackled in this section is described in Figure 6. A particular case of this problem was introduced to the CR community in a previous paper [6]. In this section, we extend the framework to a more realistic scenario. Within this framework (as in the one previously analyzed in [6]), the problem appears as a particular instance of the well-known multi-armed bandit problem.
A multi-armed bandit is a simple machine learning problem based on an analogy with the traditional slot machine (one-armed bandit) but with more than one lever. When pulled at a time $t = 0, 1, 2, \ldots$, each lever (or machine) $k \in \{1, \ldots, K\}$ provides a reward $r_t$ drawn from a distribution $\theta_k$ associated with that specific lever. The objective of the gambler is to maximize the sum of the collected rewards through iterative pulls. It is classically assumed that the gambler has no initial knowledge about the levers. However, it is important to understand that many CR applications may provide some information that should be used to design better policies. For the sake of generality, and in order to cope with the worst situations, we ignore some of that information on purpose. The crucial tradeoff the gambler faces at each trial is between “exploitation” of the lever that has the highest expected payoff and “exploration” to get more information about the expected payoffs of the other levers. In this paper, we assume that the payoffs drawn from a given machine are independent and identically distributed (i.i.d.) and that the rewards are also independent across machines. However, the reward distributions of the different machines $\{\theta_1, \theta_2, \ldots, \theta_K\}$ are not assumed to be the same. We invite the reader to refer to the previously mentioned paper [6] for more details about the equivalence between this CR decision making problem and the MAB framework.

1- CR equipment:
• $K$ possible configurations $C_k$, $k \in \{1, \ldots, K\}$, verifying the operational constraints but with unknown performance.
• A cognitive decision making engine that can learn and make decisions to help the CR equipment improve its behavior.
2- Time representation:
• Time is divided into slots $t = 0, 1, 2, \ldots$ (Figure 5).
• At the beginning of every slot $t$, the cognitive decision making engine decides whether or not to reconfigure the CR equipment.
3- Environment and performance evaluation:
• Typical observations: SNR, BER, network load, throughput, spectrum bands, etc.
• A numerical signal is computed at the end of every slot $t$ and informs the cognitive decision making engine of the performance of the CR equipment. The numerical signal obtained when using configuration $C_k$ is a function of the observations and of the configuration.
• The numerical results computed with a configuration $C_k$ are assumed to be i.i.d. and drawn from an unknown stochastic distribution $\theta_k$.
Fig. 6. Description of the Dynamic Configuration Adaptation problem.
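The reward model just described can be mimicked by a minimal simulated bandit. Each arm returns i.i.d. rewards, and arms are independent of each other. Several choices here are assumptions: Bernoulli rewards, the `BernoulliBandit` name, and 0-indexed arms instead of $k \in \{1, \ldots, K\}$. The text only requires bounded i.i.d. rewards from unknown distributions $\theta_k$.

```python
import random


class BernoulliBandit:
    """K-armed bandit: i.i.d. rewards per arm, independent across arms.

    Bernoulli rewards are an illustrative assumption; arms are 0-indexed.
    """

    def __init__(self, means, seed=None):
        self.means = list(means)          # the theta_k parameters, unknown to the gambler
        self.rng = random.Random(seed)

    @property
    def n_arms(self):
        return len(self.means)

    def pull(self, k):
        """Draw one reward r_t in {0, 1} from arm k."""
        return 1.0 if self.rng.random() < self.means[k] else 0.0


env = BernoulliBandit([0.2, 0.8], seed=1)
rewards = [env.pull(1) for _ in range(2000)]
print(sum(rewards) / len(rewards))  # close to 0.8
```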
In the case of CR problems, these distributions $\theta_k$ depend on external parameters that the environment reveals at the beginning of every slot (for instance, the SNR in Section V). Thus, the dynamics of the problem are as follows: first, the equipment senses the context of the environment (e.g., the current SNR); then, depending on the outcome of this sensing, it chooses a configuration to try. At the end of the transmission slot, the CR computes a signal that evaluates its performance during that specific slot. Finally, the CR decision making engine takes the newly collected information into account to update its configuration selection policy.
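The four steps above form a per-slot loop that any selection policy can plug into. In the sketch below, the `Policy` interface, the uniform SNR draw and `toy_reward` are hypothetical. `RandomPolicy` is only a placeholder that a learning policy, such as the UCB indexes presented next, would replace.

```python
import random
from typing import List, Protocol, Tuple


class Policy(Protocol):
    """Interface assumed for any configuration selection policy."""

    def select(self, snr_db: float) -> int: ...

    def update(self, snr_db: float, k: int, reward: float) -> None: ...


class RandomPolicy:
    """Placeholder that ignores feedback; a learning policy would replace it."""

    def __init__(self, n_configs: int, rng: random.Random):
        self.n_configs, self.rng = n_configs, rng

    def select(self, snr_db: float) -> int:
        return self.rng.randrange(self.n_configs)

    def update(self, snr_db: float, k: int, reward: float) -> None:
        pass


def toy_reward(k: int, snr_db: float, rng: random.Random) -> float:
    """Hypothetical slot reward: success probability grows with SNR, shrinks with k."""
    p = min(1.0, max(0.0, (snr_db - 4.0 * k) / 10.0))
    return 1.0 if rng.random() < p else 0.0


def run_slots(policy: Policy, n_slots: int, rng: random.Random) -> List[Tuple[float, int, float]]:
    history = []
    for _ in range(n_slots):
        snr_db = rng.uniform(0.0, 20.0)      # 1) sense the context (hypothetical SNR model)
        k = policy.select(snr_db)            # 2) choose (and, if needed, reconfigure)
        reward = toy_reward(k, snr_db, rng)  # 3) communicate, then evaluate the slot
        policy.update(snr_db, k, reward)     # 4) update the selection policy
        history.append((snr_db, k, reward))
    return history


rng = random.Random(0)
history = run_slots(RandomPolicy(3, rng), 100, rng)
print(sum(r for _, _, r in history) / len(history))
```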
B. Suggested approach
The framework tackled in [6] corresponds to the problem described here with a fixed context (e.g., a fixed SNR value for all slots). However, when the context changes, we cannot assume a priori that the acquired knowledge is still valid in the new context. Thus, there are two possible solutions. On the one hand, we could use statistical learning to try to infer a relationship between the performance of one configuration in one context and the performance of the same configuration in a different context. These methods can be efficient; however, this often comes at the cost of a large overhead in terms of computation time and memory. On the other hand, we could assume, if possible, that for two “close” contexts (e.g., $SNR_1 = 9$ dB and $SNR_2 = 9.05$ dB) the performance of a configuration does not change much. We can then group several contexts to form a cluster, which enables us to divide the context space into several clusters (e.g., a large interval of SNRs can be represented by several clusters: $[0, 20] = [0, 1] \cup [1, 2] \cup \ldots \cup [19, 20]$). Finally, the learning problem is addressed locally in every cluster as one MAB problem with a fixed context. Consequently, the tools already used for a single MAB problem can be duplicated to deal with several MAB problems, one in every cluster. Within this framework, every cluster has its own learning algorithm to estimate the best configuration on average, depending on the cluster size.
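Here is a minimal sketch of the clustering step. It assumes 1 dB-wide clusters over [0, 20] dB, with out-of-range SNRs clipped to the first or last cluster (an assumption), and one independent set of bandit statistics per cluster. The helper names are hypothetical. Any per-cluster MAB algorithm (for instance UCB1) could consume these statistics.

```python
import math


def snr_to_cluster(snr_db, low=0.0, high=20.0, width=1.0):
    """Index i of the cluster [low + i*width, low + (i+1)*width) containing snr_db.

    Values outside [low, high] are clipped to the first/last cluster (an assumption).
    """
    n_clusters = int(round((high - low) / width))
    i = math.floor((snr_db - low) / width)
    return min(max(i, 0), n_clusters - 1)


class ClusteredStatistics:
    """One independent MAB problem per cluster: play counts and reward sums."""

    def __init__(self, n_configs, **cluster_kwargs):
        self.n_configs = n_configs
        self.cluster_kwargs = cluster_kwargs
        self.stats = {}  # cluster index -> (counts T_k, reward sums)

    def cluster_stats(self, snr_db):
        c = snr_to_cluster(snr_db, **self.cluster_kwargs)
        if c not in self.stats:
            self.stats[c] = ([0] * self.n_configs, [0.0] * self.n_configs)
        return self.stats[c]

    def update(self, snr_db, k, reward):
        counts, sums = self.cluster_stats(snr_db)
        counts[k] += 1
        sums[k] += reward


learner = ClusteredStatistics(n_configs=5)
learner.update(9.0, 2, 1.0)    # 9 dB and 9.05 dB share cluster [9, 10)
learner.update(9.05, 2, 0.0)
learner.update(15.0, 1, 1.0)
print(sorted(learner.stats))   # [9, 15]
```

Memory grows with the number of clusters times the number of configurations, and each cluster learns only from its own slots. This is the price of avoiding a model that links performance across contexts.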
In this paper, we prefer the latter approach, which is very intuitive and does not consume much of the already limited computational resources of a CR equipment. The learning tools used are the same as those already presented in [6]. In order to make this paper as self-contained as possible, we present the main ideas of the so-called Upper Confidence Bound (UCB) indexes in the next paragraph.
At every instant $t$, an upper confidence bound index is computed for every machine $k$. This index, denoted by $B_{k,t,T_k(t)}$, is computed from the information $i_t$ gathered until slot number $t$ and gives an optimistic estimation of the expected reward of machine $k$.
Let $B_{k,t,T_k(t)}$ denote the index of the policies we are dealing with:

$$B_{k,t,T_k(t)} = \bar{X}_{k,T_k(t)} + A_{k,t,T_k(t)} \tag{1}$$

where $\bar{X}_{k,T_k(t)}$ is the sample mean of machine $k$ after being played $T_k(t)$ times at step $t$, and $A_{k,t,T_k(t)}$ is an upper confidence bias added to the sample mean.
A policy $\pi$ computes these indexes from $i_t$ and deduces from them an action $a_t$ as follows:

$$a_t = \pi(i_t) = \arg\max_{k} B_{k,t,T_k(t)} \tag{2}$$
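Parameters: $K$, exploration coefficient $\alpha$
Input: $i_t$
Output: $a_t$
Algorithm:
If $t < K$: return $a_t = t + 1$
Else:
    $T_k(t) \leftarrow \sum_{m=0}^{t-1} \mathbb{1}\{I_m = k\}, \ \forall k$
    $A_{k,t,T_k(t)} \leftarrow \sqrt{\alpha \ln(t) / T_k(t)}, \ \forall k$
    $B_{k,t,T_k(t)} \leftarrow \frac{1}{T_k(t)} \sum_{m=0}^{t-1} r_m \mathbb{1}\{I_m = k\} + A_{k,t,T_k(t)}, \ \forall k$
    return $a_t = \arg\max_k B_{k,t,T_k(t)}$
Fig. 7. A tabular version of the UCB1 algorithm for selecting the next configuration $a_t$.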
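The tabular UCB1 of Fig. 7 translates almost line by line into a small class. Instead of re-summing the history at each step, it keeps running counts $T_k(t)$ and reward sums. The sketch makes some assumptions. Configurations are 0-indexed. Rewards lie in [0, 1], so $b = 1$. The default $\alpha = 2$ gives the classical bias $\sqrt{2 \ln(t) / T_k(t)}$ of [18]. The Bernoulli demo environment is hypothetical.

```python
import math
import random


class UCB1:
    """Sketch of the UCB1 rule of Fig. 7, with configurations indexed 0..K-1.

    Assumes rewards in [0, 1] (b = 1); alpha = 2 gives sqrt(2 ln t / T_k(t)).
    """

    def __init__(self, n_configs: int, alpha: float = 2.0):
        self.alpha = alpha
        self.counts = [0] * n_configs   # T_k(t)
        self.sums = [0.0] * n_configs   # sum of rewards r_m collected with configuration k
        self.t = 0                      # number of decisions taken so far

    def select(self) -> int:
        n_configs = len(self.counts)
        if self.t < n_configs:          # initialization: try each configuration once
            return self.t

        def index(k):                   # B = sample mean + A, Eq. (1) with Eq. (3), b = 1
            mean = self.sums[k] / self.counts[k]
            return mean + math.sqrt(self.alpha * math.log(self.t) / self.counts[k])

        return max(range(n_configs), key=index)

    def update(self, k: int, reward: float) -> None:
        self.counts[k] += 1
        self.sums[k] += reward
        self.t += 1


# Usage with hypothetical Bernoulli configurations (means unknown to the policy).
rng = random.Random(0)
means = [0.3, 0.5, 0.8]
policy = UCB1(len(means))
for _ in range(3000):
    k = policy.select()
    policy.update(k, 1.0 if rng.random() < means[k] else 0.0)
print(policy.counts)
```

In the clustered setting, one such object would be kept per SNR cluster, and only the object of the current cluster would be queried and updated.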
We describe hereafter two specific upper confidence biases $A_{k,t,T_k(t)}$ that will be used in our simulations. Assuming that the rewards are upper bounded by a positive real number $b > 0$, we have:
1) UCB1: $A_{k,t,T_k(t)}$ is defined as follows [18]:

$$A_{k,t,T_k(t)} = \sqrt{\frac{b^2 \, \alpha \ln(t)}{T_k(t)}} \tag{3}$$
2) UCB-V: $A_{k,t,T_k(t)}$ is defined as follows [19]:

$$A_{k,t,T_k(t)} = \sqrt{\frac{2 \xi \, V_k(t) \ln(t)}{T_k(t)}} + 3 c \, b \, \xi \frac{\ln(t)}{T_k(t)} \tag{4}$$

where $V_k(t)$ refers to the empirical variance of configuration $k$ in the particular cluster considered.
Finally, both indexes are very simple to compute, as can be seen in Figure 7, which gives a tabular version of the UCB algorithm for the UCB1 index. The UCB-V case is strictly analogous, with only the bias term changed.
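Both biases can be compared side by side. The sketch below implements Eq. (3) and Eq. (4) for one configuration in one cluster. The parameter values $b = 1$, $\alpha = 2$, $\xi = 1$ and $c = 1$ are placeholders, not the values used in our simulations. $V_k(t)$ is assumed to be the mean of squared deviations from the sample mean.

```python
import math
from typing import Sequence


def ucb1_bias(n: int, t: int, b: float = 1.0, alpha: float = 2.0) -> float:
    """Eq. (3): sqrt(b^2 * alpha * ln(t) / T_k(t)) with n = T_k(t) > 0."""
    return math.sqrt(b ** 2 * alpha * math.log(t) / n)


def ucbv_bias(rewards: Sequence[float], t: int, b: float = 1.0,
              xi: float = 1.0, c: float = 1.0) -> float:
    """Eq. (4) for one configuration in one cluster.

    rewards holds the T_k(t) > 0 rewards observed with this configuration;
    V_k(t) is taken as the empirical variance (mean of squared deviations).
    """
    n = len(rewards)
    mean = sum(rewards) / n
    variance = sum((r - mean) ** 2 for r in rewards) / n
    log_t = math.log(t)
    return math.sqrt(2.0 * xi * variance * log_t / n) + 3.0 * c * b * xi * log_t / n


# Same number of trials, different spread: the steadier configuration gets a smaller bias.
steady = [0.5] * 10
erratic = [0.0, 1.0] * 5
print(ucbv_bias(steady, t=100), ucbv_bias(erratic, t=100), ucb1_bias(10, t=100))
```

UCB1 gives both configurations the same bias because it only looks at $T_k(t)$. UCB-V explores a low-variance configuration less, which can pay off when some configurations have very stable performance.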
C. Performance Evaluation.
To evaluate the performance of these policies, it is convenient to use the notion of “regret”. The general idea behind the regret can be summarized as follows: if the gambler knew a priori which arm was the best, he would only pull that one and hence maximize the expected sum of collected rewards. However, since he lacks that essential information, he suffers an unavoidable loss due to suboptimal pulls. Similarly, if it were possible to find an adequate clustering of the environment’s context such that every cluster has one and only one “best candidate”, then the gambler could always play this best candidate. However, since we usually do not have this optimal division of the context space, we suffer a second loss due to the gap between the optimal division of the context space and the actual one.
Fig. 8. Performance of the different configurations depending on the SNR.
For our application in a cognitive radio context, we adapt the expression of the regret and suggest the form given in Equation (5). Let $I_m$ denote the configuration selected at slot number $m$. The regret of a policy $\pi \in \Pi$ at time $t$ (after $t$ decisions) is then defined as follows:
$$\mathbb{E}[R_t^{\pi}] = \sum_{m=0}^{t-1} \left(\mu^*(SNR_m) - \mu_{I_m}(SNR_m)\right) \qquad (5)$$
where $\mu_k(SNR_m)$ is the expected performance of configuration $k$ at slot number $m$ under context $SNR_m$, and $\mu^*(SNR_m) = \max_k \{\mu_k(SNR_m)\}$.
In the next section, we apply the algorithms described above to the DCA problem, implement the proposed approach and discuss its parameters. Then, through several simulations, we show that the implemented approach empirically exhibits a logarithmic regret.
V. SIMULATIONS
A. Experimental protocol
For the simulations, we used 5 different configurations denoted by $\{C_1, C_2, \ldots, C_5\}$. The curves in Figure 8 represent the throughputs $T_{C_k}$ of the different configurations $C_k$, $k \in \{1, 2, \ldots, 5\}$, as a function of the SNR. Their expressions are inspired by real radio communication problems; however, for the sake of generality, we only use them as tools for the simulations.
Usually, radio equipment is dimensioned to provide a service within a certain interval of SNRs, which leads to a worst-case analysis. Thus, if we expect the designed system to provide the user with the highest throughput at a low SNR (around 6 dB in this case), then $C_1$ would be the chosen configuration. However, in a Cognitive Radio context, we aim at finding a way to "jump" from one curve to another depending on the SNR, in order to stay on the curve that maximizes the performance of the equipment for all SNRs.
Let us define $j = [454,\ 940,\ \tfrac{454}{2},\ \tfrac{454}{2},\ 940]$, where $j(1)$ is a parameter associated with $C_1$, $j(2)$ with configuration $C_2$, and so on. Similarly, let $M = [4,\ 8,\ 8,\ 16,\ 16]$ and $n_p = 20$. The performance criterion used (i.e., the throughput in our case) then has the following form:
$$T_{C_k}(SNR) = \frac{j(k)\,\log_2(M(k))}{j(k) + n_p}\left[1 - \left(1 - \frac{1}{\sqrt{M(k)}}\right)\operatorname{erfc}\left(\sqrt{\frac{3\,SNR\,\log_2(M(k))}{2\,(M(k)-1)}}\right)\right]$$
In the rest of this paper, we consider that the throughput estimates received by the CR decision making engine are drawn from Bernoulli distributions $\theta_k$ such that, for every SNR,
$$T_{C_k}(SNR) = \mathbb{E}[\theta_k(SNR)] \qquad (6)$$
Moreover, we consider that the DCA problem exists only for a bounded $SNR \in [SNR_{min}, SNR_{max}]$ and that the SNR is a random variable. In this case, the variable $SNR_{dB} = 10 \log_{10}(SNR)$ is assumed to follow a Gaussian distribution with mean 10 dB and standard deviation 4 dB, with $SNR_{dB,min} = 6$ dB and $SNR_{dB,max} = 14$ dB. The SNR interval was divided into 24 equal clusters in order to obtain a good learning resolution.
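The context model just described can be sketched as follows. The text does not state how draws outside [6, 14] dB are handled, nor whether the 24 clusters are equal in dB or in linear SNR; this sketch assumes rejection sampling and equal-width clusters in dB, and all names are hypothetical. Each cluster would then hold its own UCB statistics, as in Fig. 7.

```python
import random

# Context model of the experimental protocol (values from the text).
SNR_DB_MEAN, SNR_DB_STD = 10.0, 4.0
SNR_DB_MIN, SNR_DB_MAX = 6.0, 14.0
N_CLUSTERS = 24


def sample_snr_db(rng: random.Random) -> float:
    """Draw SNR_dB ~ N(10, 4^2) restricted to [6, 14] dB.

    Assumption: draws outside the interval are rejected and redrawn.
    """
    while True:
        x = rng.gauss(SNR_DB_MEAN, SNR_DB_STD)
        if SNR_DB_MIN <= x <= SNR_DB_MAX:
            return x


def cluster_of(snr_db: float) -> int:
    """Index (0..23) of the equal-width dB cluster containing snr_db (assumed binning)."""
    idx = int((snr_db - SNR_DB_MIN) * N_CLUSTERS / (SNR_DB_MAX - SNR_DB_MIN))
    return min(max(idx, 0), N_CLUSTERS - 1)  # the upper edge 14 dB goes to the last cluster


def db_to_linear(snr_db: float) -> float:
    """SNR = 10^(SNR_dB / 10), the inverse of SNR_dB = 10 log10(SNR)."""
    return 10.0 ** (snr_db / 10.0)


if __name__ == "__main__":
    rng = random.Random(0)
    for _ in range(5):
        s = sample_snr_db(rng)
        print(f"{s:5.2f} dB -> cluster {cluster_of(s)}, linear SNR {db_to_linear(s):.2f}")
```

With 24 clusters over 8 dB, each cluster spans one third of a dB under this assumption, which illustrates the trade-off discussed in the results: narrow clusters give a good local approximation of the performance curves, but each cluster then sees fewer samples to learn from.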
The parameters of the UCB algorithms were chosen to ensure that these algorithms have logarithmic regret. Indeed, the theoretical analyses provided in [18][19] show that parameters $\alpha < 1$ (case of UCB1) and $\{\xi < 1$ and/or $c < 1/3\}$ (case of UCBV) are risky and could lead to a bad learning behavior of the algorithm. Thus, we implemented our work using the critical values $\alpha = 1$ and $\{\xi = 1$ and/or $c = 1/3\}$. Moreover, we chose $b = 4$ as an upper bound of the possible rewards (cf. Figure 8); in this case, $b$ is indeed larger than any possible outcome of the transmission process.
Fig. 9. Average cumulated regret when using UCB algorithms to tackle DCA problems.
B. Results
Figure 9 shows the evolution of the average cumulated regret for the different UCB policies. For both policies, the cumulated regret first increases rather rapidly with the slot number and then more and more slowly. This shows that the CR decision making engines based on UCB policies process past information appropriately, so that configurations leading to high rewards are favored over time. Moreover, by choosing a cluster size small enough to give a good local approximation of the configuration performances, yet not too small (otherwise the algorithm would spend most of its time exploring), we observe that the proposed algorithms have logarithmic regret, which is known to be order optimal for the classic MAB problem. Figure 10 shows that, although the designed system faces a randomly changing environment with a high variance, it manages to learn the different optimal configurations depending on the context. In both figures, UCBV behaves more satisfactorily than UCB1, probably because UCBV takes advantage of the variance to orient its learning behavior. However, several questions remain regarding the dependency of the designed algorithm on its parameters, such as the cluster size. As a matter of fact, the system needs to adapt its clusters to optimize its behavior. Moreover, it does not exploit the sparse information gathered on the environment by sharing it between clusters. These questions and several others are currently under investigation, and the results will be presented to the community in a future work.
Fig. 10. Percentage of times a UCB-based policy selects the
optimal configuration.
VI. CONCLUSIONS
In this paper, we presented a brief yet original state of the art of the different configuration adaptation challenges faced by the cognitive radio decision making community. We suggested that most of these challenges share the same constraints but differ by the a priori knowledge they assume available. Consequently, we suggested "a priori knowledge" as a classification criterion to discriminate the main techniques proposed in the literature to solve configuration adaptation decision making problems. Moreover, we tackled the configuration adaptation decision making problem when no (or very limited) a priori information is provided to the CR equipment. We argued that this problem is a particular instance of the well-known multi-armed bandit paradigm and can be efficiently addressed through UCB algorithms. Our simulation results have highlighted that by customizing algorithms developed for solving the multi-armed bandit, efficient engineering solutions to some problems met in cognitive radio can indeed be built.
ACKNOWLEDGMENTS
This work was supported by the European Com-
mission in the framework of the FP7 Network of Ex-
cellence in Wireless COMmunications NEWCOM++
(contract n. 216715).
REFERENCES
[1] J. Mitola. Cognitive radio: An integrated agent architecture for software defined radio. PhD Thesis, Royal Inst. of Technology (KTH), 2000.
[2] R. Hachemani, J. Palicot, and C. Moy. A new standard recognition sensor for cognitive radio terminal. EUSIPCO'07, Poznan, Poland, 3-7 September 2007.
[3] C. Moy, A. Bisiaux, and S. Paquelet. An ultra-wide band umbilical cord for cognitive radio systems. PIMRC'05, Berlin, September 2005.
[4] A. Kountouris and C. Moy. Reconfiguration in software radio systems. Second Karlsruhe Workshop on Software Radios, Karlsruhe, Germany, 20-21 March 2002.
[5] J.P. Delahaye, P. Leray, C. Moy, and J. Palicot. Managing Dynamic Partial Reconfiguration on Heterogeneous SDR Platforms. SDR Forum Technical Conference'05, Anaheim (USA), November 2005.
[6] W. Jouini, D. Ernst, C. Moy, and J. Palicot. Multi-armed bandit based policies for cognitive radio's decision making issues. In Proceedings of the 3rd International Conference on Signals, Circuits and Systems (SCS), November 2009.
[7] S. Haykin. Cognitive radio: brain-empowered wireless communications. IEEE Journal on Selected Areas in Communications, 23(2):201–220, February 2005.
[8] C.J. Rieser. Biologically Inspired Cognitive Radio Engine Model Utilizing Distributed Genetic Algorithms for Secure and Robust Wireless Communications and Networking. PhD thesis, Virginia Tech, 2004.
[9] N. Colson, A. Kountouris, A. Wautier, and L. Husson. Cognitive decision making process supervising the radio dynamic reconfiguration. In Proceedings of Cognitive Radio Oriented Wireless Networks and Communications, page 7, 2008.
[10] DARPA XG Working Group. The XG vision. Request for comments. BBN Technologies, Cambridge MA, USA, Tech. Rep. Version 2.0, January 2004.
[11] L. Berlemann, S. Mangold, and B. H. Walke. Policy-based reasoning for spectrum sharing in radio networks. In Proceedings of IEEE International Symposium on New Frontiers in Dynamic Spectrum Access Networks (DySPAN), Baltimore, MD, USA, November 2005.
[12] T. W. Rondeau, D. Maldonado, D. Scaperoth, and C.W. Bostian. Cognitive radio formulation and implementation. IEEE Proceedings CROWNCOM, Mykonos, Greece, 2006.
[13] N. Baldo and M. Zorzi. Fuzzy logic for cross-layer optimization in cognitive radio networks. IEEE Consumer Communications and Networking Conference, January 2007.
[14] C. Clancy, J. Hecker, and E. Stuntebeck. Applications of machine learning to cognitive radio networks. IEEE Wireless Communications Magazine, 14, 2007.
[15] T. Weingart, D. Sicker, and D. Grunwald. A statistical method for reconfiguration of cognitive radios. IEEE Wireless Commun. Mag., vol. 14, no. 4, pp. 34–40, August 2007.
[16] N. Kasabov. ECOS: Evolving connectionist systems and the eco learning paradigm. International Conference on Neural Information Processing, Kitakyushu, Japan, October 1998.
[17] N. Kasabov. Evolving connectionist systems: the knowledge engineering approach. 2nd ed. New York: Springer, 2007.
[18] P. Auer, N. Cesa-Bianchi, and P. Fischer. Finite-time analysis of the multiarmed bandit problem. Machine Learning, 47(2/3):235–256, 2002.
[19] J.-Y. Audibert, R. Munos, and C. Szepesvári. Tuning bandit algorithms in stochastic environments. In Proceedings of the 18th International Conference on Algorithmic Learning Theory, 2007.