ArticlePDF Available

Utility-Based Resource Allocation for Multi-Channel Decentralized Networks

October 2014
IEEE Transactions on Communications 62(10):3610-3620

October 2014
62(10):3610-3620

DOI:10.1109/TCOMM.2014.2357028

Authors:

Min Sheng

Dow Chemical Company

Chao Xu

Xidian University

Xijun Wang

Sun Yat-Sen University

Yan Zhang

Chinese Academy of Sciences

Show all 6 authorsHide

The architecture of decentralization makes future wireless networks more flexible and scalable. However, due to the lack of the central authority (e.g., BS or AP), the limitation of spectrum resource, and the coupling among different users, designing efficient resource allocation strategies for decentralized networks faces a great challenge. In this paper, we address the distributed channel selection and power control problem for a decentralized network consisting of multiple users, i.e., transmit-receiver pairs. Particularly, we first take the users' interactions into account and formulate the distributed resource allocation problem as a non-cooperative transmission control game (NTCG). Then, a utility-based transmission control algorithm (UTC) is developed based on the formulated game. Our proposed algorithm is completely distributed as there is no information exchange among different users and hence, is especially appropriate for this decentralized network. Furthermore, we prove that the global optimal solution can be asymptotically obtained with the devised algorithm, and more importantly, in contrast to existing utility-based algorithms, our method does not require that the converging point is one Nash equilibrium (NE) of the formulated game. In this light, our algorithm can be adopted to achieve efficient resource allocation in more general use cases.

Illustration of a decentralized network, where each user consists of one transmitting node and one receiving node.

…

Convergence comparison of TEL developed in [4] and our algorithm UTC, where there are N = 25 users sharing K = 5 channels. (a) Overall utility U vs the number of iterations T . (b) Average rate ¯ R vs the number of iterations T . global optimal solution of P rather than reach a NE of the formulated game.

…

Overall utility U vs. the number of users N .

…

Performance comparison in terms of satisfaction.

…

Figures - uploaded by Xijun Wang

Content may be subject to copyright.

Content uploaded by Xijun Wang

Content may be subject to copyright.

3610 IEEE TRA NSA CTI ON S ON COM MUN ICATION S, VOL. 62, NO. 10, OCTOBER 2014

Utility-Based Resource Allocation for Multi-Channel

Decentralized Networks

Min Sheng, Member, IEEE, Chao Xu, Xijun Wang, Member, IEEE, Yan Zhang, Member, IEEE,

Weijia Han, Member, IEEE, and Jiandong Li, Senior Member, IEEE

Abstract—The architecture of decentralization makes future

wireless networks more ﬂexible and scalable. However, due to the

lack of the central authority (e.g., BS or AP), the limitation of spec-

trum resource, and the coupling among different users, designing

efﬁcient resource allocation strategies for decentralized networks

faces a great challenge. In this paper, we address the distributed

channel selection and power control problem for a decentralized

network consisting of multiple users, i.e., transmit-receiver pairs.

Particularly, we ﬁrst take the users’ interactions into account and

formulate the distributed resource allocation problem as a non–

cooperative transmission control game (NTCG). Then, a utility-

based transmission control algorithm (UTC) is developed based

on the formulated game. Our proposed algorithm is completely

distributed as there is no information exchange among different

users and hence, is especially appropriate for this decentralized

network. Furthermore, we prove that the global optimal solution

can be asymptotically obtained with the devised algorithm, and

more importantly, in contrast to existing utility-based algorithms,

our method does not require that the converging point is one

Nash equilibrium (NE) of the formulated game. In this light, our

algorithm can be adopted to achieve efﬁcient resource allocation

in more general use cases.

Index Terms—Decentralized networks, distributed resource

allocation, learning, game theory.

I. INTRODUCTION

DECENTRALIZED networks are the infrastructure-less

wireless networks consisting of multiple transmit-receive

pairs, where each transmitter could dynamically adjust its trans-

mission parameters and transmit data to its receiver [1]–[4].

Compared to the conventional networks with the control of cen-

tral authorities, e.g., BSs or APs, decentralized networks have

more ﬂexibility and scalability, and hence, span a large number

of real-world implementations, e.g., military communications,

disaster relief or sensor networking [2], [4], [5].

Manuscript received January 27, 2014; revised June 18, 2014; accepted

August 24, 2014. Date of publication September 11, 2014; date of current ver-

sion October 17, 2014. This work was supported in part by the National Natural

Science Foundation of China under Grants 61231008, 61172079, 61201141,

61301176, and 91338114, by the 863 Project under Grant 2014AA01A701,

and by the 111 Project under Grant B08038. The associate editor coordinating

the review of this paper and approving it for publication was Y. J. Zhang.

The authors are with the State Key Laboratory of ISN, Xidian University,

Xi’an 710071, China (e-mail: msheng@mail.xidian.edu.cn; cxu@mail.xidian.

edu.cn; xijunwang@xidian.edu.cn; yanzhang@xidian.edu.cn; alfret@gmail.

com; jdli@mail.xidian.edu.cn).

Color versions of one or more of the ﬁgures in this paper are available online

at http://ieeexplore.ieee.org.

Digital Object Identiﬁer 10.1109/TCOMM.2014.2357028

The main characteristics of a decentralized network can be

summarized as follows.

1) The lack of central controller. In such an infrastructure-

less network, each transmitter is responsible for tuning

its transmission strategy, e.g., transmission frequency,

bandwidth, power, modulation, etc., based on its local

observation. Therefore, self-organization is one funda-

mental capability for a decentralized network [6], [7].

2) The limitation of spectrum resource. The available chan-

nels are limited in a decentralized network, and hence,

users should compete for this precious resource to im-

prove their individual performance, e.g., transmission rate

or energy efﬁciency, thereby satisfying their individual

QoS requirement.

3) The coupling among different users. Interference occurs

when different users transmit on the same channel simul-

taneously. Therefore, each user’s performance could be

tuned by properly adjusting the operational parameters of

other users. In other words, the users are coupled.

According to the above three characteristics, there exist

two kinds of conﬂicts in a decentralized network. One is the

conﬂict between different users which is caused by the last

two characteristics, i.e., the limitation of spectrum resource and

coupling among different users. The other one is the conﬂict

between system performance and individual requirement which

is mainly introduced by the lack of a central controller. In fact,

these two conﬂicts always make a decentralized network oper-

ate at an inefﬁcient point, which is termed as price of anarchy

(PoA). For instance, considering some users who operate on

the same channel, if all of them want to maximize their own

transmission rate through power control, then the maximum

transmit power will be adopted by everyone. Obviously, this

is not an efﬁcient power control scheme for this system [8], [9].

In this light, to exploit the beneﬁts promised by the decen-

tralized networks, it is essential to design distributed resource

allocation strategies which should fully consider these two

conﬂicts. Fortunately, game theory which provides a suitable

paradigm to analyze the interrelationship between decision

makers, can be naturally adopted to deal with the ﬁrst conﬂict

[2], [6]–[8], [10], [11]. However, designing globally optimal

or even Pareto-efﬁcient (Pareto-optimal)1distributed resource

allocation algorithms for a decentralized network is still an open

problem [2], [6], [7].

1Generally speaking, it is easy to prove that the global optimal solution is

also Pareto-optimal, but not vice versa.

See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

SHENG et al.: UTILITY-BASED RESOURCE ALLOCATION FOR MULTI-CHANNEL DECENTRALIZED NETWORKS 3611

In this paper, we consider a multi-user multi-channel decen-

tralized network, where each user (consisting of a transmitter

and receiver pair) is capable of performing channel selection

and power allocation to satisfy its transmission rate require-

ment. In addition, to avoid the high communication overhead,

we focus on the network where there is no information ex-

change among different users, i.e., no common control channel

(CCC) is introduced. We note that this consideration makes the

scenario more practical but on the other hand, brings us more

difﬁculties in designing efﬁcient resource allocation strategies

[2]–[4], [12]–[14].

Because of the limitation of spectrum resource and coupling

among different users, not all the rate requirements of users

(i.e., transmit-receive pairs) can be simultaneously satisﬁed

[4]. Furthermore, recalling that there is no central controller

being responsible for scheduling users’ transmission, it is a

great challenge to provide hard rate guarantee to every user

in this decentralized network. For this reason, as studied in

previous work [15]–[18], we consider softening users’ require-

ments and use a sigmoid function to measure their satisfaction.

Speciﬁcally, one user has very limited satisfaction when its

transmission rate is below the requirement, but the satisfaction

rapidly reaches an asymptotic value when its transmission

rate is above the requirement. Based on this, we formulate

the distributed channel selection and power control problem

as a non-cooperative transmission control game (NTCG). To

overcome the lack of communication between different users,

a utility-based learning approach is adopted2and a Utility-

based Transmission Control algorithm (UTC) is developed,

with which each user can conﬁgure its operational parameters

just by measuring local interference. More importantly, al-

though there is no guarantee that the Nash equilibrium (NE) for

NTCG always exists, it is proved that the decentralized network

could operate at a global optimal point by implementing UTC.

Finally, simulation results verify the validity of our analysis

and demonstrate that the performance of our algorithm (e.g.,

convergence speed, achieved overall utility, etc.) are better than

that of the existing distributed algorithms.

The remainder of this paper is organized as follows. In

Section II, the related work is presented. Section III describes

the system model and formulates the distributed channel se-

lection and power control problem. In Section IV, we develop

a utility-based transmission control algorithm and analyze its

complexity as well as efﬁciency. Finally, numerical and sim-

ulation results are presented and analyzed in Section V, and

conclusions are drawn in Section VI.

II. RELATED WORK

The game theoretic approach has been applied extensively to

design distributed resource allocation schemes in wireless com-

munication systems from both the perspective of transmission

rate as well as energy efﬁciency [9], [19]–[22]. In [19]–[21], the

concerning problem has been formulated as a potential game

[23], and then a best response dynamic (BRD) was adopted to

2The deﬁnition of utility-based learning approaches will be formally given

in next section.

achieve a pure-strategy NE. However, as discussed in the sem-

inal work [23], a potential game always admits multiple pure-

strategy NE solutions. Hence, for such a game the operating

point achieved by BRD totally depends on the starting point

and may be inefﬁcient. To improve the efﬁciency of the devised

strategy, pricing technique was introduced in [9], [22] and the

Pareto efﬁciency of the achieved NE is proved.

We note that all of the above schemes require CCC for infor-

mation exchange among different agents. Hence, they are not

suitable for the decentralized network, and developing the so-

called utility-based or payoff-based learning algorithms is nec-

essary. Speciﬁcally, when implementing this type of algorithms,

each user only needs to access the history of its own actions and

utilities, and would make its decision with the local information

[24]. To this end, some distributed schemes based on stochastic

learning, no-regret learning and reinforcement learning have

been proposed in [12]–[14], respectively. It should be noted

that all the algorithms devised in [12]–[14] are utility-based,

but the converging solution is a probability distribution over the

set of available strategies. Therefore, the performance can only

be evaluated from a statistical perspective in [12]–[14], i.e., the

performance of each implementation is unpredictable [4].

Recently, some studies begin to focus on developing the

utility-based resource allocation strategy which can asymp-

totically converge to a ﬁxed conﬁguration (e.g., pure-strategy

NE) instead of a probability distribution [3], [4]. In [3], the

distributed channel selection problem was formulated as a

potential game, and then a utility-based learning algorithm was

proposed, which could converge to a pure NE. Furthermore, not

only channel selection but also power control was considered

in [4], and another utility-based strategy was designed for one

class of non-cooperative games. To be more speciﬁc, under

the assumption that the set of NE for the proposed game is

not empty and there is at least one NE maximizing the social

welfare (i.e., sum of the utilities of all users), the proposed

distributed channel selection and power control scheme can

asymptotically converge to the global optimal solution [4].

Actually, the above assumption is less plausible in many

general cases. The reason lies in two folds: 1. For a non-

cooperative game there is no guarantee that the pure-strategy

NE always exists [25].3Particularly, one example can be found

in [21], which is termed as the signal-to-interference-plus-noise

ratio (SINR) maximization game. 2. Even if the formulated

non-cooperative game admits a NE, the Pareto-efﬁciency of its

NE is hard to guarantee [19]–[21], [25]. Obviously, it is more

difﬁcult to satisfy the more severe requirement that there exists

some NE which can maximize the social welfare. In this work, a

novel utility-based resource allocation algorithm is developed.

More importantly, it has been proved that, even if there is no

NE for the formulated game, we can also asymptotically obtain

the globally optimal solution with our proposed algorithm.

III. SYSTEM MODEL AND PROBLEM FORMULATION

As depicted in Fig. 1, we consider a decentralized net-

work featuring Ncommunicating users, each consisting of a

3Since the mixed-strategy NE will not be considered in this work, we use NE

to denote the pure-strategy NE hereafter for brevity.

3612 IEEE TRA NSA CTI ON S ON COM MUN ICATION S, VOL. 62, NO. 10, OCTOBER 2014

Fig. 1. Illustration of a decentralized network, where each user consists of one

transmitting node and one receiving node.

transmit-receive pair. Particularly, to transmit data, every user

will choose one channel from the Korthogonal channels, each

of which has bandwidth B0. We consider that each channel can

be assigned to multiple users and meanwhile, the interference

occurs when each channel is simultaneously utilized by more

than one user. Without loss of generality, we suppose N≥K.

For notational simplicity, let vectors Nand Kdenote the set

of users and channels, respectively, i.e., N={1,2,···,N}

and K={1,2,···,K}. Additionally, we denote the channel

selected by user nby cn∈K. In this paper, we consider that

there is no CCC or central authority for coordination among

users. That is, all users are autonomous.

Let G∈RN×N×Kbe the channel power gain matrix, where

n,m represents the channel gain between transmitter nand

receiver mon channel k. We assume the channel condition is

static during the underlying operational period, e.g., the quasi-

static scenario. The additive noise is modelled as a zero-mean

Gaussian random variable, and then, for user n, its signal-to-

interference-plus-noise ratio (SINR) can be expressed as

γn=pngcn

n,n

Icn

n+B0N0

=pngcn

n,n



m∈N ,m=n

δ(cm,c

n)pmgcn

m,n +B0N0

,(1)

where Inrepresents the interference caused to user n,pnis the

transmit power of user n, and N0is the noise power density.

Besides that, the indicative function δ(cm,c

n)is adopted to

indicate whether the same channel is used by user mand n

simultaneously or not: if cm=cn,δ(cm,c

n)=1; otherwise

δ(cm,c

n)=0. In this paper, we consider that each user n

can choose the transmit power pnfrom a ﬁnite set Pn=

{p1

n,p

n,···,p

max

n}[4], [12].

Based on the above, the achievable transmission rate of user

ncan be expressed as

Rn=B0log2(1 + γn).(2)

Adopting different channels and power levels, one user will

obtain different achievable rates. According to (1) and (2), if

user ntransmits on channel cn,Rncan be maximized with

power pmax

nwhen there is no interference. Therefore, the upper

bound of the rate Rnfor user ncan be deﬁned as

Rmax

n=maxB0log21+pmax

ngcn

n,n

B0N0|cn∈K

.(3)

Moreover, we consider that each user nhas rate requirement

Rmin

nto satisfy its QoS requirement and assume that 0≤

Rmin

n≤Rmax

Intuitively, in this network not all the users’ rate requirements

can be guaranteed when they transmit simultaneously, espe-

cially for the case where all the users’ rate requirements are high

[4]. For instance, if Rmin

n=Rmax

n,∀n∈N, then there are at

most Ktransmissions being permitted. Here, to get around this

problem, we consider softening the user’s rate requirement and

measure its degree of satisfaction with a sigmoid function. In

fact, this approach has been widely adopted in radio resource

management [15]–[18]. To this end, the utility of each individ-

ual user can be expressed as

Un(Rn)= 1

1+e−βn(Rn−Rmin

n),∀n∈N,(4)

where βnis a constant deciding the steepness of the satisfactory

curve, and moreover, both Rnand Rmin

nare considered to have

units Mbps. It is clear from the above equation that Un(Rn)

is a monotonic increasing function with respect to Rn, i.e.,

individual users will feel more satisﬁed when they have higher

rate. Furthermore, since lim

Rn→0Un(Rn)= 1

1+eβnRmin

n>0and

lim

Rn→∞ Un(Rn)=1, the utility of each user nis scaled between

0 and 1, i.e., Un(Rn)∈(0,1). We note that although the higher

utility means the higher spectral efﬁciency for given bandwidth,

the value of the former can not directly reﬂect the value of the

latter. Therefore, in the simulation results, not only the overall

utility Ubut also the average rate ¯

Rare recorded and shown to

evaluate the efﬁciency of different algorithms.

Before starting a transmission, each individual user should

decide to adopt which power level and transmit on which

channel. For notational simplicity, we refer to a pair of channel

index and power level as a strategy sn, i.e.,

sn=(cn,p

n)∈S

n,Sn=K×P

n,∀n∈N.(5)

From (1), (2), and (4), we note that each user’s rate is affected

by the transmissions of other users and meanwhile, higher rate

brings higher satisfaction to a user. Therefore, to improve the

degree of satisfaction or utility, each user should choose its own

strategy by considering the actions of other users. That is, there

is a coupling among the strategies employed by different users.

To well study the conﬂict among different users, NTCG has

been formulated, and hereafter, the terms user and player will

be used interchangeably.

Deﬁnition: NTCG: NTCG can be represented by the tuple

G=ΓN,(Sn)n∈N ,(Un)n∈N .(6)

Particularly, Ndenotes the set of players which is identical to

the user set. For each player n, its strategy space Snis deﬁned

as shown in (5). Given a strategy proﬁle

(sn)n∈N =(s1,s2,···,sN)∈(Sn)n∈N (7)

the utility function of each player nis

Un(sn)n∈N =UnRn(sn)n∈N ,∀n∈N,(8)

SHENG et al.: UTILITY-BASED RESOURCE ALLOCATION FOR MULTI-CHANNEL DECENTRALIZED NETWORKS 3613

where Rn((sn)n∈N )represents the achievable rate when player

nadopts the strategy sn=(cn,p

n), i.e.,

Rn(sn)n∈N =log

21+ pngcn

n,n

Icn

n(s−n)+B0N0).(9)

In (9), s−n=(s1,···,sn−1,sn+1,···,sN)is the strategy pro-

ﬁle of all players other than player n, and Icn

n(s−n)represents

the interference caused to player non channel cn.

Obtaining the optimal channel selection and power control

strategy for this decentralized network is equivalent to solving

the following combinatorial problem Pwhich is NP-hard.

P:max

c,p

n∈N

Un(cn,p

n)n∈N(10)

s.t. c∈{(c1,c

2,···,c

N)|∀cn∈K,∀n∈N},(11)

p∈{(p1,p

2,···,p

N)|∀pn∈P

n,∀n∈N}.(12)

The objective function (10) means that our objective is to

maximize the social welfare or overall utility, which is deter-

mined by both the achievable and required rate of users, i.e.,

(R1,R

2,···,R

N)and (Rmin

1,R

min

2,···,R

min

N). In addition,

constraints (11) and (12) specify each individual user’s avail-

able channel and power level sets, respectively.

Unfortunately, the above problem is an integer programming

which is extremely difﬁcult to solve. Moreover, due to the

fact that there is no central authority controlling the users in

this decentralized network, developing a completely distributed

algorithm to obtain the optimal solution of Pis important and

non-trivial.

IV. DISTRIBUTED ALGORITHM DESIGN

In this section, we would develop a utility-based algorithm

for NTCG to achieve the solution of Pshown in Section III.

We ﬁrst prove the uncertainty of the existence of NE for

NTCG, and then develop a utility-based algorithm. At the

end of this section, we will investigate the complexity of the

proposed algorithm and ﬁnally, prove that our algorithm can

asymptotically converges to the global optimal solution under

the given condition, no matter whether this solution is a NE of

the formulated game or not.

A. NE for NTCG

Recalling that there is no CCC for exchanging information

among different players, the utility-based learning algorithm is

therefore considered to be more appropriate for this distributed

environment. Actually, in the recent work [4], a similar problem

has also been studied and moreover, an efﬁcient utility-based

learning algorithm has been proposed. Particularly, for the

formulated game, authors in [4] has proved that if there exists

a NE which can maximize the social welfare, this NE can be

achieved with their proposed distributed algorithm. To check

whether the algorithm devised in [4] also can be adopted to

solve our problem Por not, we would ﬁrst discuss the existence

of NE for NTCG.

For a non-cooperative game, (pure-strategy) NE is a standard

solution standing for the equilibrium state, under which no

player can unilaterally improve its own utility by choosing

a different strategy [25]. Mathematical speaking, if a proﬁle

s∗=(s∗

1,s∗

2,···,s∗

N)in the strategy space (Sn)n∈N is a NE,

then we have

Uns∗

n,s∗

−n≥Unsn,s∗

−n,∀sn∈S

n,∀n∈N,(13)

where s∗

−n=(s∗

1,···,s∗

n−1,s∗

n+1,···,s∗

N).

Theorem 1: There is no guarantee that the NE for NTCG

always exists.

Proof: For each player n,wehave

arg

max Un(sn,s−n)

=arg

max 1

1+e−ωn(Rn(sn,s−n)−Rmin

=arg

(pmax

n,cn)

max Rn((pmax

n,c

n),s−n)

=arg

(pmax

n,cn)

max γn((pmax

n,c

n),s−n).(14)

It is noted from the above equation that NTCG is identical

to the SINR-maximization game which is introduced in [21].

According to the conclusion drawn from a “toy” two-user case

in [21], the existence of the NE for the SINR-maximization

game can not be guaranteed, which further indicates that the

NE for NTCG may not exist too. Note that a counter example

can be easily derived with the parameters given in Table I in

[21], and hence it is omitted here.

Now, the proof is complete. 

To this end, we can see that the utility-based algorithm

developed in [4] can not be directly applied to solve the problem

addressed in this work. Hence, a novel utility-based algorithm

will be devised in the following subsection.

B. Utility-Based Distributed Transmission Control Algorithm

When devising a utility-based learning algorithm, there are

two components should be elaborated for each player: the

state proﬁle and learning model (dynamics) [2], [24]. More

speciﬁcally, the former depicts each player’s available local

information, and the latter tells the users how to make their

decisions based on this information. In this subsection, we ﬁrst

deﬁne the proper state proﬁle and learning model for each

player in NTCG in detail. Then, a utility-based distributed

transmission control algorithm is proposed.4

1) State Proﬁle: At each decision moment t∈{1,2,···},

we consider describing the state proﬁle of player nwith a

triplet Ln(t)=(sn(t),U

n(t),α

n(t)), where Sn(t),Un(t), and

αn(t)∈{0,1}represent its strategy, utility, and mood, re-

spectively. We note that the binary variable αn(t)is used to

elaborate players’ desire for changing the currently adopted

4The utility-based learning approach implemented in this paper can also be

viewed as a state-based learning approach. The term “utility-based” is adopted

by [2], [24] and the references therein.

3614 IEEE TRA NSA CTI ON S ON COM MUN ICATION S, VOL. 62, NO. 10, OCTOBER 2014

strategy, which will be speciﬁed in detail when introducing the

learning model.

2) Learning Model: Motivated by Marden’s work [26], a

utility-based learning model is adopted in this paper, with which

each individual player ncan update its sn(t),Un(t)and αn(t)

in sequence at each decision moment t. To be speciﬁc, at the

beginning of time t, individual player nﬁrst needs to determine

the probability distribution over the set of its available strategies

(i.e., mixed-strategy)

Qn(t)=q1

n(t),q

n(t),···,q

|Sn|

n(t),(15)

where |·|represents the cardinality of a set, and qj

n(t)is the

probability of choosing the jth strategy at time t, i.e.,

n(t)≥0,∀j∈{1,2,···,|Sn|} ,

|Sn|



j=1

n(t)=1.(16)

In other words, the probability distribution Qn(t)is adopted to

describe the players’ dynamics. Here, player nwould update

Qn(t)based on its previous mood αn(t−1) and action sn(t−

1). Particularly, if αn(t−1) = 0,

qi(fn)

n(t)= 1

|Sn|,∀fn∈S

n,(17)

where i(fn)denotes the index of strategy fnin Sn.Therule

shown in (17) means that if the previous mood is 0 the player

will choose each strategy with equal probability. On the other

hand, if αn(t−1) = 1

qi(fn)

n(t)=εw

|Sn|−1,∀fn∈S

n,fn=sn(t−1)

1−εw,otherwise, (18)

where εis a constant belonging to (0, 1) and wis a constant

greater than N. The above equation means that if the previous

mood is 1 then the player will change its strategy to a different

one (i.e., fn=sn(t−1)) with probability εw

|Sn|−1. Meanwhile,

the same strategy (i.e., fn=sn(t−1)) will be adopted with

probability 1−εw. Since εwis generally much less than 1,

equation (18) represents that the player will the a different

strategy with a relatively smaller probability if its mood is 1,

i.e., 1−εwεw

|Sn|−1. The main motivation behind utilizing

(17) and (18) to update Qn(t)is that this rule guarantees that

each individual player would more like to choose the strategy

making its mood be 1.

After that, player nwill choose an action sn(t)based on

the probability distribution Qn(t), calculate its utility Un(t)by

measuring the interference, and ﬁnally update mood αn(t)with

Algorithm 1.

Algorithm 1 Mood updating algorithm

1: if αn(t−1) = 1 then

2: if (sn(t)=sn(t−1)) and (Un(t)=Un(t−1)) then

3: Set αn(t)to 1

4: else

5: Go to 10.

6: end if

7: else

8: Go to 10.

9: end if

10: Set αn(t)to 1 and 0 with the probability ρ1=ε1−Un(t)

and ρ0=1−ρ1, respectively.

3) UTC: Now, based on the above described state pro-

ﬁle and learning model, UTC is developed and shown in

Algorithm 2, where players can update their strategies in

parallel. Similar to [4], the stop criterion of this algorithm

can be one of the following: 1) the preset maximum iteration

number Tis reached or 2) for each player n, the variation of its

utility during a period is trivial.

Algorithm 2 UTC

1: Initialize iteration count t=0, personality αn(t)=0

and strategy counter Vn=(v1

n,v

n,···,v

|Sn|

n)=

(0)1×(|Sn|),∀n∈N. Each player nrandomly chooses its

initial strategy sn(t)and then, measures its utility Un(t).

2: repeat

3: Set t=t+1

4: for n=1to Nusers do

5: Update state proﬁle Ln(t):

6: if αn(t−1) = 0 then

7: Calculate Qn(t)with (17).

8: else

9: Calculate Qn(t)with (18).

10: end if

11: Choose a strategy sn(t), measure the utility Un(t),

and update its mood αn(t).

12: Update strategies count Vn:

13: if αn(t)=1then

14: Update Vnwith (19).

15: end if

16: end for

17: until the stop criterion is satisﬁed.

18: Each player ndecides its strategy sD

naccording to (20).

During the initialization of Algorithm 2, each player nwill

randomly choose its own strategy, set its moods to 0, and

initialize the strategy counter Vn, where (0)1×(|Sn|)represents

the |Sn|-dimension null vector. We note that elements in vector

Vnis used to count the times of αn=1 when different

strategies are adopted. For instance, vi

nrepresents the times

that the ith strategy makes the mood of player nbe 1. When

the initialization is completed, the algorithm goes into a loop,

in which each individual player nwill ﬁrst update its state

proﬁle Ln(t)=(sn(t),U

n(t),α

n(t)) with the devised utility-

based learning model at each iteration. We note that the SINR

estimation can be done by sending a pilot or training sequence

SHENG et al.: UTILITY-BASED RESOURCE ALLOCATION FOR MULTI-CHANNEL DECENTRALIZED NETWORKS 3615

from the transmitter to receiver in practice [27]. Therefore, the

utility can be measured by each autonomous user. Then, the

strategy counter Vn=(v1

n,v

n,···,v

|Sn|

n)is updated based on

the current mood αn(t).Ifαn(t)=1,

vi(sn(t))

n=vi(sn(t))

n+1,∀sn(t)∈Sn,(19)

where vi(sn(t))

nis the i(sn(t))th entry in vector Vn. Intuitively,

this updating rule implies that each player would like to record

the strategy which makes its mood be 1. When the loop is

exited, individual players will make their ﬁnal decisions:

n=arg

snvi(sn)

n=maxv1

n,v

n,···,v

|Sn|

n,∀n∈N.(20)

From (20), we note that the strategy recorded most frequently

will be eventually adopted by users.

The reasons why we choose the above decision rule are two

folds. Firstly, it only requires simple comparison operations

when making ﬁnal decision as shown in (20). Secondly, it can

make the solution of problem Pbe asymptotically achieved

under the given condition, which will be proved in the following

subsection. We note that with the adopted learning model, sys-

tem dynamics can be depicted as a perturbed Markov process

and the parameter ε>0is the perturbation factor. Therefore,

to show that the optimal strategy proﬁle can be converged, it

is essential to prove that the learning process of our algorithm

will lead to a stochastically stable strategy proﬁle which can

maximize the overall utility. The similar idea has also been

adopted by work [4] when designing the utility-based learning

algorithm. However, authors in [4] adopted a quaternary vari-

able instead of a binary variable to depict each user’s mood,

which introduces a much larger state space to capture system

dynamics and hence, makes the convergence speed of their

algorithm slower than that of ours. This will be illustrated

through simulation results as shown in the following section.

Moreover, it is worth noting that Algorithm 2 is simple and

completely distributed. In particular, when each player updates

its own state proﬁle, it does not require any prior informa-

tion of other players, thereby avoiding a large communication

overhead.

C. Complexity and Efﬁciency Analysis of UTC

In this subsection, we ﬁrst present the complexity analysis

for the proposed algorithm UTC. Then, we will analyze its

efﬁciency and give the main result in Theorem 2.

The main blocks of UTC are two parts. The ﬁrst one is the

loop from line 2 to line 17, which is independently executed

by each player. The second one is the step in line 18, in which

each player nneeds to make its own ﬁnal decision with (20).

Note that the ﬁrst main part (i.e., from line 2 to line 17)

only involves basic arithmetic operations and random number

generation, and hence has a computational complexity of O(1)

for each iteration. In addition, (20) requires the player nto

compare the all Snelements in the vector Vn. Therefore, the

complexity of this algorithm explicitly depends on both the

stop criterion of the loop and the size of the player’s strategy

space. Particularly, for the two different stop criterions ear-

lier described, the complexities are O(T+L)and O(E+L),

respectively, where Tis the preset maximum iteration num-

ber, L=max{|S1|,|S2|,···,|SN|}, and Eis the convergence

speed of the algorithm. Moreover, it should be noted that the

convergence speed Eis related to the value of parameter ε,

which will be further discussed at the end of this subsection.

Theorem 2: Let (sO

n)n∈N ∈(Sn)n∈N denote the solution of

problem P, i.e.,

sO

nn∈N =arg

(sn)n∈N

max 

n∈N

Un(sn)n∈N .(21)

When (sO

n)n∈N is unique and εis sufﬁciently small, i.e., ε→0,

the solution of UTC asymptotically converges to (sO

n)n∈N , i.e.,

Pr lim

T→∞,ε→0sD

nn∈N =sO

nn∈N =1,(22)

where Tis the number of iterations.

Proof: The proof is given in Appendix A. 

We note that there is no requirement that the optimal solution

(sO

n)n∈N is a NE for the formulated game, and hence, this

efﬁcient point may be ignored by existing utility-based resource

allocation algorithms which are designed to reach a NE [2]–

[4]. In addition, it is worth noting that when the parameter ε

is given, a much larger state space will make the convergence

speed of the proposed algorithm much slower, i.e., there is a

curse of dimensionality. This is mainly due to the fact that the

considered resource allocation is essentially a combinatorial

problem which is generally NP hard. On the other hand, there is

a tradeoff between the efﬁciency and the convergence speed of

our algorithm, which can be made by adjusting ε. Speciﬁcally,

a smaller εwill lead to a slower convergence speed, but the

algorithm is more likely to converge to the global optimal

solution (sO

n)n∈N . For this reason, if εis properly set then

our algorithm still works in the scenario where the size of

state space becomes large. In other words, a tradeoff between

the convergence speed and accuracy can be properly made to

implement this algorithm in practice. This conclusion will be

conﬁrmed with simulation results in the following section.

V. R ESULTS AND ANALYSIS

A. Simulation Scenario

To evaluate the performance of our proposed algorithm, we

conduct simulations of a decentralized network consisting of

Ntransmit-receive pairs, which are randomly deployed in a

circular region of radius rm. Meanwhile, the distance between

each transmit-receive pair is a uniform random variable be-

tween 0 and Dm. We assume that all the channels undergo

identically and independently log-normal shadow fading as

well as path loss and moreover, the path loss exponent αand

the shadow fading standard deviation σψare set to 3 and

4 dB, respectively. We note that this channel model has been

conﬁrmed empirically to accurately model the variation in

received power in some outdoor and indoor radio propagation

environments, see e.g., [28] and references therein. In addition,

the duration of a shadow fade lasts for multiple seconds or

minutes, and hence changes at a much slower time-scale [27].

3616 IEEE TRA NSA CTI ON S ON COM MUN ICATION S, VOL. 62, NO. 10, OCTOBER 2014

TAB L E I

SIMULATION PARAMETERS

We consider a three-level power set for each user, i.e., low,

medium, and high power levels, which are set to −20 dBW,

−10 dBW, and 0 dBW, respectively. Besides that, for each user

n, the minimal rate requirement Rmin

nand steepness of the

sigmoid function ωnare set to 1

10 Rmax

nand 10, respectively.

In addition, each individual simulation result is obtained by

averaging over 1000 independent realizations of the users’

locations and channel conditions. Unless speciﬁed otherwise,

the simulation parameters are adopted as listed in Table I.

B. Convergence of UTC

Before delving into the performance of the proposed dis-

tributed resource allocation algorithm UTC, we ﬁrst investigate

its convergence behavior and examine the impact of the algo-

rithm parameter ε. According to Theorem 2, we provide the

maximum overall utility as shown in (10) as a benchmark result.

To solve the problem Pwithin an acceptable period of time, a

simpliﬁed scenario is considered in this simulation. Particularly,

we focus on the case that there are K=5 channels, and

meanwhile, all the users transmit with the high power level,

i.e., Pn={0dBW},∀n∈N. When there are N=Kand

N=2Kusers, the simulation results are illustrated in Fig. 2(a)

and (b), respectively.

It can be seen from Fig. 2 that when εbecomes smaller, the

convergence speed of UTC is slower but the achieved overall

utility is higher. Besides that, although there is a small gap

between the performance of UTC and that of enumeration, our

algorithm converges with much fewer iterations than that re-

quired by the latter (i.e., 

n∈N

|Sn|=

n∈N

K|Pn|). Considering

the scenario consisting of 10 users as an example, enumeration

needs 510 = 9765625 iterations but our algorithm converges in

about 40 and 100 iterations when εis set to 10−3and 10−5,

respectively. Moreover, if εis set to 10−5, the relative difference

between the overall utility achieved by enumeration and that

achieved with our algorithm is only around 0.4%. Recalling

Theorem 2, we note that this gap may stem from the fact that

εis not small enough and meanwhile there is no guarantee that

the optimal solution of Pis always unique in each round of

simulation.

Next, we compare the convergence behavior of UTC and that

of the Trial and Error Learning algorithm (TEL) proposed in

[4]. For fair comparison, both the perturbation factor εused

in UTC and that in TEL are set to 0.01. Furthermore, the

Fig. 2. Convergence of UTC with respect to ε, where the number of channels

is K=5 and the overall utility is U=

n∈N

Un.(a)N=K=5 users.

(b) N=2K=10users.

necessary mapping functions suggested in [4] are also adopted

in our simulation,5i.e., G(x)=−0.2x+0.2and F(x)=−0.2/

N+0.2/N . During this simulation, there are N=25 users and

K=5 channels, and additionally, for each user nthe available

transmit power set Pnis set as shown in Table I. The conver-

gence in terms of overall utility Uand of average transmission

rate ¯

R=

n∈N

Nare illustrated in Fig. 3(a) and (b), respectively.

From the simulation results, we note that our algorithm UTC

converges much faster than TEL. The main reasons are two

folds. On one hand, TEL introduces four states to depict each

user’s mood, but just two states are adopted in our algorithm.

This difference indicates that TEL has a much larger state

space to capture system dynamics, which further means that

each player has to search more states before making the ﬁnal

decision. On the other hand, as discussed previously, the main

result in [4] is that when ε→0and there is at least one NE

which can maximize the overall utility, TEL will asymptotically

converge to a NE with which the maximum overall utility can

be achieved. However, there is no guarantee that the NE for

NTCG always exists, i.e., the main result given in Theorem 1

in Section IV. Besides that, we can see that both higher overall

utility and average rate can be achieved by UTC. This is mainly

due to the fact that the goal of our algorithm is to ﬁnd the

5The designing requirements of these two mapping functions G(x)and

F(x)are given in (6) and (7) in [4], and meanwhile, two instances are suggested

by simulations below the two equations, respectively.

SHENG et al.: UTILITY-BASED RESOURCE ALLOCATION FOR MULTI-CHANNEL DECENTRALIZED NETWORKS 3617

Fig. 3. Convergence comparison of TEL developed in [4] and our algorithm

UTC, where there are N=25 users sharing K=5 channels. (a) Overall

utility Uvs the number of iterations T. (b) Average rate ¯

Rvs the number of

iterations T.

global optimal solution of Prather than reach a NE of the

formulated game.

C. Performance Comparison

In this section we will evaluate the performance of our

algorithm UTC with the following metrics:

• Overall utility U: The sum utility of the all players, i.e.,

U=

n∈N

Un.

• Average transmission rate ¯

R: The average transmission

rate achieved by the users, i.e., ¯

R=

n∈N

• User satisfaction ratio ηs: The ratio of users whose rate

requirements are met over the total number of users N, i.e.,

ηs=|N0|

N,∀n∈N

0,R

n≥Rmin

n. Note that |·| denotes

the cardinality of a set.

We compare our algorithm UTC with three distributed

schemes which are presented as follows.

• Random: With this algorithm, each user nwill randomly

choose a strategy snfrom its strategy space Sn=K×P

Therefore, the performance of this method can be regard

as the baseline.

• Greedy transmission control (GTC): This greedy based al-

gorithm is proposed in [29], with which each user needs to

Fig. 4. Overall utility Uvs. the number of users N.

Fig. 5. Average rate ¯

Rvs. the number of users N.

measure the interference on all channels and then transmits

on the channel having the minimum interference with the

maximum transmit power. This process is repeated until

the stop criterion is satisﬁed.

• TEL: This utility-based distributed learning algorithm is

developed in [4]. Note that when implementing this al-

gorithm, the corresponding mapping functions G(x)and

F(x)are set the same as those adopted in previous

subsection B.

When running UTC and TEL, we use the parameter setting sug-

gested in [4] and hence, set the perturbation factor εto 10−2.In

addition, for fair comparison, all the algorithms are executed in

parallel and the maximum iteration number Tis set to 104[4].

Figs. 4 and 5 illustrate the overall utility Uand the corre-

sponding average rate ¯

Rversus the number of users, respec-

tively. As it can be observed from the simulation results, when

there are more transmitting users, the improvement of the over-

all utility is gradually slow down and meanwhile, the average

rate becomes lower. This is because that when the density of

user increases there is more interference in this network, and in

return, both the achieved utility and transmission rate of each

user would decrease. In addition, we can see that when the

number of communicating users in this network is small (for

example N≤15), the performance of GTC is good. However,

when there are more users its performance becomes worse.

This is mainly caused by the greedy behavior of users when

implementing this algorithm, i.e., they always transmit with the

3618 IEEE TRA NSA CTI ON S ON COM MUN ICATION S, VOL. 62, NO. 10, OCTOBER 2014

Fig. 6. Performance comparison in terms of satisfaction.

maximum power to improve their own utilities. As discussed

in previous studies [8], [9], such a greedy based method may

cause severe interference in the system and ﬁnally may become

an inefﬁcient resource allocation strategy.

Additionally, we note that both our algorithm and TEL

perform much better than the baseline algorithm (i.e., Random).

Meanwhile, compared with TEL, there is also an improvement

in performance by implementing our algorithm. For instance,

when there are N=50users, UTC has around 9.7% higher

overall utility (i.e., from 40.08 to 44.77) and 12.4% higher

average rate (i.e., from 0.884 Mbps to 0.994 Mbps) than TEL,

respectively. Therefore, we can conclude that the interference

mitigation capability of UTC is the best among these four

distributed algorithms. It should be noted that the reason for this

improvement is similar to that stated in the previous subsection.

Next, we compare the performance of these four algorithms

from the perspective of user satisfaction, which is demonstrated

in Fig. 6. Particularly, Fig. 6(a) illustrates the user satisfaction

rate ηsversus the number of users N, and Fig. 6(b) compares

the cumulative distribution function (CDF) of the number of

satisﬁed users (i.e., |N0|=ηs·N) for the four algorithms when

there are N=50 users. Three observations can be made from

Fig. 6. Firstly, not all the rate requirements of users can be

met, especially for the case where the number of users is

large. We note that this result is consistent with the statement

given in Section III. Secondly, ηsdecreases with respect to N,

which is due to the fact that more users will result in higher

interference in this network. Thirdly, compared with the other

resource allocation schemes, more users can satisfy their rate

requirements with our algorithm. Furthermore, combining the

results shown in Figs. 4–6, we can see that our algorithm is able

to achieve better performance on both the system and individual

level. This is mainly for the reason that, both the selﬁshness

of users and welfare of the whole network are well considered

when developing our algorithm.

VI. CONCLUSION

In this paper, we have addressed the issue of distributed

channel selection and power control in decentralized networks

and meanwhile, proposed a distributed resource allocation al-

gorithm where no information exchange is introduced. More

importantly, we have theoretically proved that the networks

can asymptotically operate at the global optimal point with

our proposed algorithm under the given condition. Simulation

results veriﬁed the validity of our analysis and demonstrated

that our algorithm always performed better than the existing

ones in terms of different metrics. One possible extension in

the future work is to consider the time varying characteristic of

the network topology and speed up the convergence time.

APPENDIX A

PROOF OF THEOREM 2

Proof: The learning model in Algorithm 2 introduces a

Markov process over the ﬁnite state space Z=n∈N (Sn×

Un×A

n), where Unis the ﬁnite range of Unover all strategy

proﬁles (sn)n∈N ∈(Sn)n∈N , and An={0,1}is the set of

mood. Accordingly, for every scalar ε>0such a Markov pro-

cess is perturbed and meanwhile, we denote such a “perturbed”

Markov process by MPε. Before giving the proof in detail, we

start by introducing some necessary deﬁnitions.

Deﬁnition 2: Interdependence: An N-person game

Γ(N,(Sn)n∈N ,(Un)n∈N )is interdependent if, for every

strategy proﬁle (sn)n∈N ∈(Sn)n∈N and every subset of

players H⊆N, there exists a player g/∈H and a choice of

strategies (s

h)h∈H ∈(Sh)h∈H such that

Ug(s

h)h∈H ,(s)n∈{N /H}=Ug(sh)h∈H ,(s)n∈{N /H}.(23)

In other words, given a strategy proﬁle (sn)n∈N ∈(Sn)n∈N ,

every subset of players Hcan cause a utility (welfare) change

for some player in {N /H} by performing a proper change in

their strategies.

Deﬁnition 3: Stochastically stable states: For a perturbed

Markov process MPε, the elements of the support of the lim-

iting stationary distribution are referred to as the stochastically

stable states. Speciﬁcally, a state T∈Zis stochastically stable

if and only if lim

ε→0π(T,ε)>0, where π(T,ε)is stationary

distribution of the perturbed process.

We divide the proof of Theorem 2 into 3 steps, S1–S3, which

we now elaborate formally.

Step S1:

Proposition 1: UTCG Gis an interdependent game.

Proof: According to (4) and (8), the utility of each player

Unis an increasing function with respect to its achievable

rate Rn. We consider a situation that a single player hcan

SHENG et al.: UTILITY-BASED RESOURCE ALLOCATION FOR MULTI-CHANNEL DECENTRALIZED NETWORKS 3619

change its strategy and other players stay the same. Let

we suppose that the current strategy proﬁle is (sn)n∈N =

((cn,p

n))n∈N , and then, the according situation can be di-

vided into three disjoint cases: 1. ∀n∈{N/{h}},c

h=cn;

2. ∀n∈{N/{h}},c

h=cn;3.∃n∈{N/{h}},c

h=cn, and

meanwhile ∃m∈{N/{h}},c

h=cm.

In the ﬁrst case, for any player g∈{N/{h}}, if player h

change its strategy to s

h=(c

h,p

h)where c

h=cg, player g

will suffer higher interference and achieve lower rate Rg, which

will ﬁnally make its utility Ugreduced. In the second case, if

there exists a player g∈{N/{h}} whose channel index cg=

ch,hcan change its strategy to realize c

h=cg, which implies

that player gwill suffer lower interference and obtain higher

utility. Additionally, similar to the above discussions, it is easy

to prove that there is a player g∈{N/{h}} whose welfare can

be changed when player hproperly changes its strategy in the

third case. 

Therefore, the game UTCG Gis interdependent.

Step S2:

Proposition 2: For the introduced perturbed Markov process

MPε=Z=

n∈N

(Sn×U

n×A

n), if and only if a state T=

((sn)n∈N ,(un)n∈N ,(αn)n∈N )∈Z is stochastically stable,

then the strategy proﬁle can maximize the social welfare, i.e.,

(sn)n∈N =arg

(fn)n∈N

max 

n∈N

Un(fn)n∈N .(24)

Moreover, in such a state, we have un=Un((sn)n∈N )and

αn=1,∀n∈N. Therefore, such a state can be equally rep-

resented as T=((sn)n∈N ,(αn=1)

n∈N).

Proof: According to the conclusion of Proposition 1 and

the proof of Theorem 1 in [26], Proposition 2 can be proved

with the theory of resistance trees for regular perturbed Markov

decision processes, which can be found in [30]. The detailed

proof and related deﬁnitions can be founded in [26]. Here, we

only provide an outline of the proof which consists of four

steps.

First, for the unperturbed process MP0, we need to prove

that the recurrence classes are all singletons T∈C

0and D0,

where C0denotes the subset of states in which each agent’s

mood is 1 and the benchmark action and utility are aligned.

In other words, if ((sn)n∈N ,(un)n∈N ,(αn)n∈N )∈C

0then

un=Un((sn)n∈N )and αn=1. Additionally, D0represents

the set of states in which the mood of everyone is 0, i.e., if

((sn)n∈N ,(un)n∈N ,(αn)n∈N )∈D

0then un=Un((sn)n∈N )

and αn=0. This proof can be completed based on

Proposition 1, i.e., the interdependence of UTCG.

Second, it is needed to prove that the stochastic potential

of each state T∈C

0is γ(T)=w(|C0|−1) + 

n∈N

(1 −un),

where wis a constant larger than N. Here, the stochastic

potential γ(T)is the minimum resistance over all trees rooted at

the state T[30]. This conclusion can be drawn by showing that

the upper bound and lower bound for the stochastic potential

γ(T)are the same.

Then, we need to apply the criterion for determining the

stochastically stable states introduced in [30]. To be speciﬁc,

the criterion shows that the stochastically stable states are

precisely those states contained in the recurrence classes with

the minimum stochastic potential.

Finally, by way of contradiction, it can be shown that only

the recurrence classes (i.e., all singletons) in C0can be the

candidates for stochastically stable states. Hence, the main

conclusion in this proposition can be directly drawn, since

arg

T∈C

min γ(T)= arg

T∈C

min w(|C0|−1) + 

n∈N

(1 −un)

=arg

T∈C

max 

n∈N

Un(sn)n∈N .(25)

Step S3:

If the social optimal solution is unique, then there is only

one stochastically stable state for the perturbed Markov process

MPε. According to Proposition 2, if the stochastically stable

state T∈Zis unique, then we have

lim

ε→0π(T,ε) = lim

ε→0Pr (sn)n∈N ,(αn=1)

n∈N 

= lim

ε→0

n∈N

Pr(sn,α

n=1)

= lim

T→∞,ε→0

n∈N

t(sn,α

n=1)

T=1,(26)

where t(sn,α

n=1) is the number of occurrences of the

corresponding state during period T. In Algorithm 2, each

player will choose the most frequently recorded strategy which

makes its mood equal to 1 (as shown in (20)). Therefore, the

unique efﬁcient strategy proﬁle can be achieved by applying

the proposed distributed approach.

Now, the proof is completed. 

REFERENCES

[1] J. Huang, R. Berry, and M. Honig, “Distributed interference compensation

for wireless networks,” IEEE J. Sel. Areas Commun., vol. 24, no. 5,

pp. 1074–1084, May 2006.

[2] L. Rose, S. Lasaulce, S. Perlaza, and M. Debbah, “Learning equilib-

ria with partial information in decentralized wireless networks,” IEEE

Commun. Mag., vol. 49, no. 8, pp. 136–142, Aug. 2011.

[3] Q. Wu et al., “Distributed channel selection in time-varying radio environ-

ment: Interference mitigation game with uncoupled stochastic learning,”

IEEE Trans. Veh. Technol., vol. 62, no. 9, pp. 4524–4538, Nov. 2013.

[4] L. Rose, S. Perlaza, C. Le Martret, and M. Debbah, “Self-organization in

decentralized networks: A trial and error learning approach,” IEEE Trans.

Wireless Commun., vol. 13, no. 1, pp. 268–279, Jan. 2014.

[5] W. Kiess and M. Mauve, “A survey on real-world implementations of

mobile ad-hoc networks,” Ad Hoc Netw., vol. 5, no. 3, pp. 324–339,

Apr. 2007.

[6] O. Aliu, A. Imran, M. Imran, and B. Evans, “A survey of self organisation

in future cellular networks,” IEEE Commun. Surveys Tuts., vol. 15, no. 1,

pp. 336–361, 2013.

[7] M. Peng, D. Liang, Y. Wei, J. Li, and H.-H. Chen, “Self-conﬁguration

and self-optimization in LTE-advanced heterogeneous networks,” IEEE

Commun. Mag., vol. 51, no. 5, pp. 36–45, May 2013.

[8] A. MacKenzie and S. Wicker, “Game theory and the design of self-

conﬁguring, adaptive wireless networks,” IEEE Commun. Mag., vol. 39,

no. 11, pp. 126–131, Nov. 2001.

[9] F. Wang, M. Krunz, and S. Cui, “Price-based spectrum management in

cognitive radio networks,” IEEE J. Sel. Topics Signal Process.,vol.2,

no. 1, pp. 74–87, Feb. 2008.

[10] A. Zappone, Z. Chong, E. Jorswieck, and S. Buzzi, “Energy-aware com-

petitive power control in relay-assisted interference wireless networks,”

IEEE Trans. Wireless Commun., vol. 12, no. 4, pp. 1860–1871, Apr. 2013.

[11] G. Bacci, E. V. Belmega, P. Mertikopoulos, and L. Sanguinetti, “Energy-

aware competitive link adaptation in small-cell networks,” in Proc.

Int. Workshop Resource Allocation Wireless Netw., Hammamet, Tunisia,

May 2014, pp. 1–8.

3620 IEEE TRA NSA CTI ON S ON COM MUN ICATION S, VOL. 62, NO. 10, OCTOBER 2014

[12] M. Bennis, S. Perlaza, P. Blasco, Z. Han, and H. Poor, “Self-organization

in small cell networks: A reinforcement learning approach,” IEEE Trans.

Wireless Commun., vol. 12, no. 7, pp. 3202–3212, Jul. 2013.

[13] P. S. Sastry, V. V. Phansalkar, and M. Thathachar, “Decentralized learning

of Nash equilibria in multi-person stochastic games with incomplete in-

formation,” IEEE Trans. Syst., Man, Cybern., vol. 24, no. 5, pp. 769–777,

May 1994.

[14] Z. Han, C. Pandana, and K. Liu, “Distributive opportunistic spectrum ac-

cess for cognitive radio using correlated equilibrium and no-regret learn-

ing,” in Proc. IEEE WCNC, Kowloon, Hong Kong, 2007, pp. 11–15.

[15] M. Xiao, N. Shroff, and E. K. P. Chong, “A utility-based power-control

scheme in wireless cellular systems,” IEEE/ACM Trans. Netw., vol. 11,

no. 2, pp. 210–221, Apr. 2003.

[16] J. Zhang and Q. Zhang, “Stackelberg game for utility-based coopera-

tive cognitiveradio networks,” Proc. Proc. 10th ACM Int. Symp. Mobile

Ad Hoc Netw. Comput., pp. 23–32, 2009.

[17] H. Lin, M. Chatterjee, S. Das, and K. Basu, “ARC: An integrated ad-

mission and rate control framework for competitive wireless CDMA data

networks using noncooperative games,” IEEE Trans. Mobile Comput.,

vol. 4, no. 3, pp. 243–258, May/Jun. 2005.

[18] D. T. Ngo, L. B. Le, T. Le-Ngoc, E. Hossain, and D. I. Kim, “Distributed

interference management in two-tier CDMA femtocell networks,” IEEE

Trans. Wireless Commun., vol. 11, no. 3, pp. 979–989, Mar. 2012.

[19] Q. D. La, Y. Chew, and B.-H. Soong, “An interference-minimization

potential game for OFDMA-based distributed spectrum sharing systems,”

IEEE Trans. Veh. Technol., vol. 60, no. 7, pp. 3374–3385, Sep. 2011.

[20] Q. D. La, Y. Chew, and B.-H. Soong, “Performance analysis of down-

link multi-cell OFDMA systems based on potential game,” IEEE Trans.

Wireless Commun., vol. 11, no. 9, pp. 3358–3367, Sep. 2012.

[21] S. Buzzi, G. Colavolpe, D. Saturnino, and A. Zappone, “Potential games

for energy-efﬁcient power control and subcarrier allocation in uplink

multicell OFDMA systems,” IEEE J. Sel. Topics Signal Process.,vol.6,

no. 2, pp. 89–103, Apr. 2012.

[22] C. Xu, M. Sheng, C. Yang, X. Wang, and L. Wang, “Pricing-based mul-

tiresource allocation in OFDMA cognitive radio networks: An energy

efﬁciency perspective,” IEEE Trans. Veh. Technol., vol. 63, no. 5,

pp. 2336–2348, Jun. 2014.

[23] D. Monderer and L. Shapley, “Potential games,” Games Econ. Behavior,

vol. 14, no. 1, pp. 124–143, May 1996.

[24] R. Cominetti, E. Melo, and S. Sorin, “A payoff-based learning procedure

and its application to trafﬁc games,” Games Econ. Behavior, vol. 70, no. 1,

pp. 71–83, Sep. 2010.

[25] R. B. Myerson, Game Theory: Analysis of Conﬂict. Cambridge, MA,

USA: Harvard Univ. Press, 2013.

[26] J. R. Marden, L. Y. Pao, and H. P. Young, “Achieving Pareto optimality

through distributed learning,” Dept. Econ., Univ. Oxford, Oxford, U.K.,

Jul. 2011, Tech. Rep.

[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.

Cambridge, U.K.: Cambridge Univ. Press, 2005.

[28] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge

Univ. Press, 2004.

[29] B. Babadi and V. Tarokh, “Gadia: A greedy asynchronous distributed in-

terference avoidance algorithm,” IEEE Trans. Inf. Theory, vol. 56, no. 12,

pp. 6228–6252, Dec. 2010.

[30] H. P. Young, “The evolution of conventions,” Econometrica, vol. 61, no. 1,

pp. 57–84, Jan. 1993.

Min Sheng (M’03) received the M.S. and Ph.D.

degrees in communication and information systems

from Xidian University, Shaanxi, China, in 1997 and

2000, respectively.

She is currently a Full Professor at the Broadband

Wireless Communications Laboratory, the School

of Telecommunications Engineering, Xidian Univer-

sity. Her general research interests include mobile

ad hoc networks, wireless sensor networks, wireless

mesh networks, third generation (3G)/fourth genera-

tion (4G) mobile communication systems, dynamic

radio resource management (RRM) for integrated services, cross-layer al-

gorithm design and performance evaluation, cognitive radio and networks,

cooperative communications, and medium access control (MAC) protocols. She

has published two books and over 50 papers in refereed journals and conference

proceedings.

Dr. Sheng was the New Century Excellent Talents in University by the

Ministry of Education of China, and obtained the Young Teachers Award by

the Fok Ying-Tong Education Foundation, China, in 2008.

Chao Xu received the B.S. degree in electronic infor-

mation engineering from Xidian University, Xi’an,

China, in 2009, where he is currently working toward

the Ph.D. degree in communication and information

systems with the Institute of Information and Sci-

ence, Broadband Wireless Communications Labora-

tory, School of Telecommunications Engineering.

From June to September 2014, he was a visiting

student with the Singapore University of Technology

and Design, Singapore, under the supervision of

Prof. Tony Q. S. Quek. His research interests focus

on dynamic radio resource management, cognitive radio and networks, energy

efﬁcient transmission, distributed algorithm design, and the applications of

game theory and learning theory in wireless communications.

Xijun Wang (M’12) received the B.S. degree with

distinction in telecommunications engineering from

Xidian University, Xi’an, Shaanxi, China, in 2005.

He received the Ph.D. degree in electronic engi-

neering from Tsinghua University, in January 2012,

Beijing, China.

Since 2012, he has been with the School of Tele-

communications Engineering, Xidian University,

where he is currently an Assistant Professor. His

research interests include wireless communications

and cognitive radios and interference management.

Dr. Wang served as a Publicity Chair of IEEE/CIC ICCC 2013. He was a reci-

pient of the 2005 “Outstanding Graduate of Shaanxi Province” Award, the Excel-

lent Paper Award at 6th International Student Conference on Advanced Science

and Technology in 2011, the Best Paper Award at IEEE/CIC ICCC 2013.

Yan Zhang (M’12) received B.S. and Ph.D. degrees

from Xidian University, Xi’an, China, in 2005 and

2010, respectively. He is currently an Associate Pro-

fessor in Xidian University.

His research interests include cooperative cogni-

tive networks, self-organizing networks, media ac-

cess protocol design, energy-efﬁcient transmission

and dynamic radio resource management (RRM) in

heterogeneous networks.

Weijia Han (S’07–M’11) received the B.S. degree

from Northwest University, China, the M.S. degree

from Queen’s University Belfast, UK, and the Ph.D.

degree from Xidian University, Xi’an, China. He

is currently a Lecturer in Xidian University, Xi’an,

China.

His research interests include sensing in cognitive

radio networks, resource management and network

optimization, cognitive media access protocol and

algorithm design.

Jiandong Li (SM’05) received the B.S., M.S., and

Ph.D. degrees in communications and electronic sys-

tems from Xidian University, Xi’an, China, in 1982,

1985, and 1991, respectively.

In 1985, he joined Xidian University, where he

has been a Professor since 1994 and the Vice-

President since 2012. His current research inter-

ests and projects consist of mobile communications,

broadband wireless systems, ad hoc networks, cog-

nitive and software radio, self-organizing networks,

and game theory for wireless networks.

Dr. Li is a Senior Member of the China Institute of Electronics and a Fellow

of the China Institute of Communication. He was a member of the PCN Special-

ist Group for the China 863 Communication High Technology Program between

January 1993 and October 1994 and from 1999 to 2000.He is also a member of the

Communication Specialist Group for The Ministry of Industry and Information.

Ambient Backscatter Communication System Empowered by Matching Game and Machine Learning for Enabling Massive IoT Over 6G HetNets

Article

Full-text available

Jan 2024

Ambient backscatter communication (ABC) is considered as a promising paradigm for meeting the 6G massive Internet of Things (IoT) requirements which is expected to revolutionize our world. In this paper, a new multimode matching game and machine learning-based IoT ambient backscatter communication scheme is proposed to maximize the ABC system rate and capacity over the LTE and Wi-Fi multi-RAT heterogeneous network, thereby supporting the 6G green massive IoT communication. The proposed algorithm is designed to support different rate and capacity requirements for different massive Machine Type Communication (mMTC) use cases such as sensor networks, smart grid, agriculture and low data rate Ultra Reliable Low Latency Communication (URLLC) use cases such as Tactile Interaction. The proposed optimization algorithm runs into two phases, the first one is a matching game-based algorithm that selects the optimum association between the IoT tags and the primary users (PU) downlink signals from a specific base station which maximizes the IoT tags rate while minimizing the resulting interference to the PU. Each IoT tag can ride the PU downlink signal using one of three different riding modes according to the required IoT ABC system rate and capacity, whereas mode 1 allows multiple IoT tags to ride the whole PU downlink signal resource blocks, in mode 2 each IoT tag can ride only one subcarrier from the PU downlink signal resource blocks, while in mode 3 multiple IoT tags can ride the same subcarrier from the PU downlink signal resource blocks. In addition, unmanned aerial vehicles (UAVs) flying HetNodes equipped with LTE and Wi-Fi receivers are used as backscatter receivers to receive the IoT tags uplink backscattered signals, so the second optimization phase is formulated to maximize the total sum rate of the ABC system by dividing its service area into clusters using the enhanced unsupervised k-means algorithm, also the enhanced k-means algorithm finds the optimum location of each cluster’s serving UAV flying HetNode that maximizes the channels gain between the IoT tags and the serving UAV flying HetNode in order to maximize the total system rate. The system model was implemented within the MATLAB environment where simulations across the various scenarios are conducted to assess the effectiveness of the proposed algorithm. Simulation results and the performance analysis demonstrated that the proposed algorithm can support the required rate for the most mMTC and low data rate URLLC IoT applications with average IoT tag rates in the range of 15 Kbps to 115 Kbps, and outperforms the algorithm-free riding technique in the case of massive IoT applications. The proposed mode 2 (first enhanced mode) achieves the best performance in terms of the average IoT tags rate and the total system rate with the lowest interference to the primary system users, on the other hand, mode 3 (second enhanced mode) improves the system capacity with maximum IoT tags satisfaction ratio. The capacity and satisfaction ratio of the proposed mode 3 outperforms mode 1 by 300% and 138% respectively, and outperforms mode 2 by 2,000% and 420% respectively. The proposed algorithm reduces the interference power to the PUs on the average by 1:(15.69×10 <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">-12</sup> ) relative to the algorithm-free riding technique. From the result, we can conclude that the proposed algorithm supports different IoT applications and achieves the required data rates with minimal effect on the primary system keeping the PU’s data rate within the required range compared to the algorithm-free riding technique with the cost of higher time complexity.

Deep learning‐assisted reconfigurable intelligent surface for enhancing 6G mobile networks

Article

Full-text available

Nov 2023

This article presents the role of Reconfigurable Intelligent Surface (RIS) using the Multi‐Input Multi‐Output (MIMO) technique as the enabling technology to boost the achievable data rate for mobile networks of the sixth generation (6G). The RIS has been adopted to mitigate the interference at the Cell Edge User (CEU) placed where two adjacent cells are separated. That is by reflecting the incident interference signal in the CEU direction out of phase with the basic interference signals coming from the interfering BS towards the CEU. This article adopts an efficient solution for designing the RIS redirecting (reflection) matrix with trivial training overhead using Deep Learning (DL) technology. Whereas a few of the reflecting elements in RIS are chosen to be active (attached to the baseband), whilst the majority are chosen to be passive, in which the active element's channels are known and used as medium indicators and indicate further the positions of the transmitter and receiver. This article illustrated the efficacy of the adopted framework with DL, by changing the training parameters regarding the data rates that can be achieved, the Spectral Energy Efficiency (SEE), and the Satisfaction Rate (SR). As a result, the proposed model improves the achievable data rate by a near average of 97% above the reference model. Furthermore, it enhances the achievable data rate above the baseline model that assumes no RIS used by an average of 115%. Therefore, The DL method demonstrated that the proposed model is promising for enhancing 6G Mobile Networks.

Transfer Learning for Autonomous Cell Activation Based on Relational Reinforcement Learning With Adaptive Reward

Article

Mar 2021

With the increasing threat of global warming due to high energy consumption of wireless network infrastructure, cell activation complements the capabilities of next-generation wireless technology. In this article, we propose an energy consumption optimization strategy based on deep reinforcement learning (DRL) and transfer learning (TL) techniques. We implement an adaptive reward to autonomously adjust parameters in a reward function to balance energy consumption and quality of service (QoS) requirement of users during the learning process. We further formulate a cell activation/deactivation problem as a Markov decision process and set up our proposed relational DRL model to meet the QoS requirements of users with a minimum number of active remote radio heads under a traffic model defined to simulate a real-world scenario. A weighted TL algorithm has been developed in DRL to validate sample data from a source task. Extensive simulations reveal that the proposed scheme based on the adaptive reward has better performance in balancing the QoS requirement of users and system energy consumption. Finally, based on our simulation results, we conclude that combining DRL with TL speeds up the learning process.

Hierarchical DRL-empowered Network Slicing in Space-Air-Ground Networks

Conference Paper

Dec 2023

SNAF: DRL-based Interdependent E2E Resource Slicing Scheme for a Virtualized Network

Article

Jul 2023

The fifth generation (5G) mobile cellular network relies on network slicing (NS) to satisfy the diverse quality of service (QoS) requirements of various service providers operating on a standard shared infrastructure. However, the synchronization of radio access network (RAN) and core network (CN) slicing has not been well-studied as an interdependent resource allocation problem. This work proposes a novel slice-to-node access factor (SNAF)-based end-to-end (E2E) slice resource provisioning scheme and deep reinforcement learning (DRL)-based real-time resource allocation algorithm for E2E interdependent resource slicing and allocation, respectively, specifically for RAN and CN. To ensure effective resource slicing and allocation, we consider the versatile user equipment (UEs) QoS requirements on transmission delay and data rate. Notably, the SNAF-based scheme provides proper resource provisioning and traffic synchronization, while the DRL-based algorithm allocates radio resources based on affordable traffic and backhaul resources. Based on the 5G air interface, we conduct system-level simulations to evaluate the performance of our proposed methods from various perspectives. Simulation results confirm that our proposed SNAF and DRL-based interdependent E2E resource slicing and allocation techniques achieve better E2E traffic-resource synchronization, and improve the QoS satisfaction with minimal resource utilization compared to other existing benchmark schemes.

Partition Placement and Resource Allocation for Multiple DNN-Based Applications in Heterogeneous IoT Environments

Article

Jun 2023

The evolution of the Internet of Things (IoT) has been driving the explosive growth of deep neural network (DNN)-based applications and processing demands. Hence, edge computing has emerged as a potential solution to meet these processing requirements. However, emerging IoT applications have increasingly demanded to run multiple DNNs to extract multifaceted knowledge, requiring more computational resources and increasing response time. Consequently, edge nodes cannot act as a complete substitute for the previous cloud paradigm, owing to their relatively limited resources. To address this problem, we propose to incorporate nearby IoT devices when allocating resources to multiple DNN models. Furthermore, the optimization of resource allocation can be hindered by the heterogeneity of IoT devices, which affects the delay performance of DNN-based computing. In this context, we propose a DNN partition placement and resource allocation strategy that considers different processing powers, memory, and battery levels for heterogeneous IoT devices. We evaluate the performance of the proposed strategy through extensive simulations. Simulation results reveal that the proposed strategy outperforms other existing solutions in terms of end-to-end delay, service probability, and energy consumption. The proposed solution was further simulated in a Kubernetes testbed consisting of actual devices to assess its feasibility.

SNAF-based Interdependent E2E Network Resource Slicing Scheme for a Virtualized Network

Conference Paper

Aug 2022

QoE-Driven Distributed Resource Optimization for Mixed Reality in Dynamic TDD Systems

Article

Nov 2022

With the full development of intelligent mobile communications, wireless mixed reality (MR) provides a more visually immersive experience and stronger interaction with environments than virtual reality (VR) and augmented reality (AR). However, the asymmetric characteristic of wireless MR traffic creates a huge challenge to current mobile networks. Dynamic time division duplex (D-TDD) is considered as a promising technology to improve wireless MR users’ quality of experience (QoE) due to its potentials and advantages in delivering asymmetric traffic. Therefore, in this paper, we propose a QoE-driven distributed multidimensional resource allocation (MRA) supplemented by inter-cell interference (ICI) mitigation scheme for wireless MR in multi-cell D-TDD systems. First, to improve QoE of MR users, we formulate the joint optimization of subframe configuration, channel assignment and computation offloading as a mixed-integer nonlinear programming problem. A novel fully-decentralized multi-agent deep Q-network (DQN) algorithm is developed to solve the problem. Then, to mitigate ICI, a water filling based power control algorithm is investigated to minimize the total power of each small base station and its associated MR users. Simulation results demonstrate that our proposed scheme improves QoE of MR users in a realizable way as compared to existing schemes.

AoI Optimal Dynamic Power Control for IoT networks: A DRDPG Approach

Conference Paper

Aug 2021

Trial and Error Learning for Dynamic Distributed Channel Allocation in Random Medium

Article

Jun 2021

This paper considers the problem of fully distributed channel allocation in clustered wireless networks when the propagation medium is random. We extend here the existing Trial and Error (TE) framework developed in the deterministic case and for which strong convergence properties hold. We prove that using directly this solution in the random context leads to unsatisfactory solutions. Then we propose an adaptation of the original Trial and Error Learning (TEL) algorithm, called Robust TEL (RTEL), assuming that the random channel effects translate into a bounded stochastic disturbance of the utility function. The solution consists in introducing thresholds in the transitions of the TEL’s Finite State Controller (FSC). We prove that this new solution restores the good convergence property inherited from the TEL. Furthermore, we provide analysis of the stochastic utilities in the Rayleigh fading case in order to check the bounded assumption. Finally, we develop an online algorithm that dynamically estimates the optimal threshold values to adapt to the instantaneous disturbance. Numerical results corroborate our theoretical claims.

Energy-Aware Competitive Link Adaptation in Small-Cell Networks (Invited Paper)

Conference Paper

Full-text available

May 2014

This work proposes a distributed power allocation scheme for maximizing the energy efficiency in the uplink of non-cooperative small-cell networks based on orthogonal frequency-division multiple-access technology. This is achieved by modeling user terminals as rational agents that engage in a non-cooperative game in which every terminal selects the power loading so as to maximize its own utility (the user's throughput per Watt of transmit power) while satisfying minimum rate constraints. In this framework, we prove the existence of a Debreu equilibrium (also known as generalized Nash equilibrium) and we characterize the structure of the corresponding power allocation profile using techniques drawn from fractional programming. To attain the equilibrium in a distributed fashion, we also propose a method based on an iterative water-filling best response process. Numerical simulations are then used to assess the convergence of the proposed algorithm and the performance of its end-state as a function of the system parameters.

Self-Organization in Decentralized Networks: A Trial and Error Learning Approach

Article

Full-text available

Jan 2014

In this paper, the problem of channel selection and power control is jointly analyzed in the context of multiple-channel clustered ad-hoc networks, i.e., decentralized networks in which radio devices are arranged into groups (clusters) and each cluster is managed by a central controller (CC). This problem is modeled by game in normal form in which the corresponding utility functions are designed for making some of the Nash equilibria (NE) to coincide with the solutions to a global network optimization problem. In order to ensure that the network operates in the equilibria that are globally optimal, a learning algorithm based on the paradigm of trial and error learning is proposed. These results are presented in the most general form and therefore, they can also be seen as a framework for designing both games and learning algorithms with which decentralized networks can operate at global optimal points using only their available local knowledge. The pertinence of the game design and the learning algorithm are highlighted using specific scenarios in decentralized clustered ad hoc networks. Numerical results confirm the relevance of using appropriate utility functions and trial and error learning for enhancing the performance of decentralized networks.

A Survey of Self Organisation in Future Cellular Networks

Article

Full-text available

Jan 2013

This article surveys the literature over the period of the last decade on the emerging field of self organisation as applied to wireless cellular communication networks. Self organisation has been extensively studied and applied in adhoc networks, wireless sensor networks and autonomic computer networks; however in the context of wireless cellular networks, this is the first attempt to put in perspective the various efforts in form of a tutorial/survey. We provide a comprehensive survey of the existing literature, projects and standards in self organising cellular networks. Additionally, we also aim to present a clear understanding of this active research area, identifying a clear taxonomy and guidelines for design of self organising mechanisms. We compare strength and weakness of existing solutions and highlight the key research areas for further development. This paper serves as a guide and a starting point for anyone willing to delve into research on self organisation in wireless cellular communication networks.

The evolution of conventions

Article

Jan 1993
ECONOMETRICA

H.P. Young

Fundamentals of Wireless Communication (Hardcover)

Article

May 2005

The past decade has seen many advances in physical layer wireless communication theory and their implementation in wireless systems. This textbook takes a unified view of the fundamentals of wireless communication and explains the web of concepts underpinning these advances at a level accessible to an audience with a basic background in probability and digital communication. Topics covered include MIMO (multi-input, multi-output) communication, space-time coding, opportunistic communication, OFDM and CDMA. The concepts are illustrated using many examples from real wireless systems such as GSM, IS-95 (CDMA), IS-856 (1 x EV-DO), Flash OFDM and UWB (ultra-wideband). Particular emphasis is placed on the interplay between concepts and their implementation in real systems. An abundant supply of exercises and figures reinforce the material in the text. This book is intended for use on graduate courses in electrical and computer engineering and will also be of great interest to practising engineers.

Distributed Channel Selection in Time-Varying Radio Environment: Interference Mitigation Game With Uncoupled Stochastic Learning

Article

Nov 2013

Pricing-Based Multiresource Allocation in OFDMA Cognitive Radio Networks: An Energy Efficiency Perspective

Article

Jun 2014

Both the orthogonal frequency-division multiple access (OFDMA) and cognitive radio (CR) technologies offer great flexibility and feasibility for future green wireless communications. In this paper, an energy-efficient multiresource-allocation scheme is proposed for OFDMA CR networks (CRNs) with multiple secondary transmitters (STs). To maximize the energy efficiency (EE) and guarantee the primary transmitter's (PT's) quality-of-service (QoS) requirement, a linear pricing technique is employed to handle both the inter-ST coupling (the spectrum competition among STs) and intra-ST coupling (the correlation between the available transmit power and assigned subchannels for each ST). Furthermore, a distributed algorithm is devised to harvest the multiresource and multiuser gains, and the multiresource allocation is transformed to 1-D pricing-factor profile searching. Simulation results demonstrate that the proposed strategy brings higher EE. Additionally, by adjusting the pricing factors, a different performance can be achieved by the STs with different priorities.

Achieving pareto optimality through distributed learning

Conference Paper

Dec 2012

We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person game with generic payoffs. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms.

Self-Configuration and Self-Optimization in LTE-Advanced Heterogeneous Networks

Article

May 2013

Self-organizing network, or SON, technology, which is able to minimize human intervention in networking processes, was proposed to reduce the operational costs for service providers in future wireless systems. As a cost-effective means to significantly enhance capacity, heterogeneous deployment has been defined in the 3GPP LTEAdvanced standard, where performance gains can be achieved through increasing node density with low-power nodes, such as pico, femto, and relay nodes. The SON has great potential for application in future LTE-Advanced heterogeneous networks, also called HetNets. In this article, state-of-the-art research on self-configuring and self-optimizing HetNets are surveyed, and their corresponding SON architectures are introduced. In particular, we discuss the issues of automatic physical cell identifier assignment and radio resource configuration in HetNets based on selfconfiguring SONs. As for self-optimizing SONs, we address the issues of optimization strategies and algorithms for mobility management and energy saving in HetNets. At the end of the article, we show a testbed designed for evaluating SON technology, with which the performance gain of SON algorithms is demonstrated.

Self-Organization in Small Cell Networks: A Reinforcement Learning Approach

Article

Jul 2013

In this paper, a decentralized and self-organizing mechanism for small cell networks (such as micro-, femto- and picocells) is proposed. In particular, an application to the case in which small cell networks aim to mitigate the interference caused to the macrocell network, while maximizing their own spectral efficiencies, is presented. The proposed mechanism is based on new notions of reinforcement learning (RL) through which small cells jointly estimate their time-average performance and optimize their probability distributions with which they judiciously choose their transmit configurations. Here, a minimum signal to interference plus noise ratio (SINR) is guaranteed at the macrocell user equipment (UE), while the small cells maximize their individual performances. The proposed RL procedure is fully distributed as every small cell base station requires only an observation of its instantaneous performance which can be obtained from its UE. Furthermore, it is shown that the proposed mechanism always converges to an epsilon Nash equilibrium when all small cells share the same interest. In addition, this mechanism is shown to possess better convergence properties and incur less overhead than existing techniques such as best response dynamics, fictitious play or classical RL. Finally, numerical results are given to validate the theoretical findings, highlighting the inherent tradeoffs facing small cells, namely exploration/exploitation, myopic/foresighted behavior and complete/incomplete information.

Utility-Based Resource Allocation for Multi-Channel Decentralized Networks

Abstract and Figures

Recommended publications

Evaluation of AFFINISOL HPMCAS HP Polymers in Hot Melt Extrusion Applications

Cooperative Decentralized Resource Allocation in Heterogeneous Wireless Access Medium

Distributed Subchannel Allocation for Interference Mitigation in OFDMA Femtocells: A Utility-Based L...

CORASMA Program on Cognitive Radio for Tactical Networks: High Fidelity Simulator and First Results...

Trial and Error Learning for Dynamic Distributed Channel Allocation in Random Medium

On the decentralized management of scrambling codes in small cell networks

Robust Energy Efficiency Maximization in Cognitive Radio Networks: The Worst-Case Optimization Appro...