3610 IEEE TRANSACTIONS ON COMMUNICATIONS, VOL. 62, NO. 10, OCTOBER 2014
Utility-Based Resource Allocation for Multi-Channel
Decentralized Networks
Min Sheng, Member, IEEE, Chao Xu, Xijun Wang, Member, IEEE, Yan Zhang, Member, IEEE,
Weijia Han, Member, IEEE, and Jiandong Li, Senior Member, IEEE
Abstract—The architecture of decentralization makes future
wireless networks more flexible and scalable. However, due to the
lack of the central authority (e.g., BS or AP), the limitation of spec-
trum resource, and the coupling among different users, designing
efficient resource allocation strategies for decentralized networks
faces a great challenge. In this paper, we address the distributed
channel selection and power control problem for a decentralized
network consisting of multiple users, i.e., transmit-receive pairs.
Particularly, we first take the users’ interactions into account and
formulate the distributed resource allocation problem as a non–
cooperative transmission control game (NTCG). Then, a utility-
based transmission control algorithm (UTC) is developed based
on the formulated game. Our proposed algorithm is completely
distributed as there is no information exchange among different
users and hence, is especially appropriate for this decentralized
network. Furthermore, we prove that the global optimal solution
can be asymptotically obtained with the devised algorithm, and
more importantly, in contrast to existing utility-based algorithms,
our method does not require that the converging point is one
Nash equilibrium (NE) of the formulated game. In this light, our
algorithm can be adopted to achieve efficient resource allocation
in more general use cases.
Index Terms—Decentralized networks, distributed resource
allocation, learning, game theory.
I. INTRODUCTION
DECENTRALIZED networks are the infrastructure-less
wireless networks consisting of multiple transmit-receive
pairs, where each transmitter could dynamically adjust its trans-
mission parameters and transmit data to its receiver [1]–[4].
Compared to the conventional networks with the control of cen-
tral authorities, e.g., BSs or APs, decentralized networks have
more flexibility and scalability, and hence, span a large number
of real-world implementations, e.g., military communications,
disaster relief or sensor networking [2], [4], [5].
Manuscript received January 27, 2014; revised June 18, 2014; accepted
August 24, 2014. Date of publication September 11, 2014; date of current ver-
sion October 17, 2014. This work was supported in part by the National Natural
Science Foundation of China under Grants 61231008, 61172079, 61201141,
61301176, and 91338114, by the 863 Project under Grant 2014AA01A701,
and by the 111 Project under Grant B08038. The associate editor coordinating
the review of this paper and approving it for publication was Y. J. Zhang.
The authors are with the State Key Laboratory of ISN, Xidian University,
Xi’an 710071, China (e-mail: msheng@mail.xidian.edu.cn; cxu@mail.xidian.
edu.cn; xijunwang@xidian.edu.cn; yanzhang@xidian.edu.cn; alfret@gmail.
com; jdli@mail.xidian.edu.cn).
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCOMM.2014.2357028
The main characteristics of a decentralized network can be
summarized as follows.
1) The lack of a central controller. In such an infrastructure-
less network, each transmitter is responsible for tuning
its transmission strategy, e.g., transmission frequency,
bandwidth, power, modulation, etc., based on its local
observation. Therefore, self-organization is one funda-
mental capability for a decentralized network [6], [7].
2) The limitation of spectrum resource. The available chan-
nels are limited in a decentralized network, and hence,
users should compete for this precious resource to im-
prove their individual performance, e.g., transmission rate
or energy efficiency, thereby satisfying their individual
QoS requirement.
3) The coupling among different users. Interference occurs
when different users transmit on the same channel simul-
taneously. Therefore, each user’s performance could be
tuned by properly adjusting the operational parameters of
other users. In other words, the users are coupled.
According to the above three characteristics, there exist
two kinds of conflicts in a decentralized network. One is the
conflict between different users which is caused by the last
two characteristics, i.e., the limitation of spectrum resource and
coupling among different users. The other one is the conflict
between system performance and individual requirement which
is mainly introduced by the lack of a central controller. In fact,
these two conflicts often drive a decentralized network to operate at an inefficient point; this efficiency loss is quantified by the price of anarchy (PoA). For instance, consider some users operating on the same channel: if each of them tries to maximize its own transmission rate through power control, then every user will adopt the maximum transmit power. Obviously, this is not an efficient power control scheme for the system [8], [9].
In this light, to exploit the benefits promised by the decen-
tralized networks, it is essential to design distributed resource
allocation strategies which should fully consider these two
conflicts. Fortunately, game theory which provides a suitable
paradigm to analyze the interrelationship between decision
makers, can be naturally adopted to deal with the first conflict
[2], [6]–[8], [10], [11]. However, designing globally optimal
or even Pareto-efficient (Pareto-optimal)¹ distributed resource
allocation algorithms for a decentralized network is still an open
problem [2], [6], [7].
¹Generally speaking, it is easy to prove that the global optimal solution is also Pareto-optimal, but not vice versa.
0090-6778 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
In this paper, we consider a multi-user multi-channel decen-
tralized network, where each user (consisting of a transmitter
and receiver pair) is capable of performing channel selection
and power allocation to satisfy its transmission rate require-
ment. In addition, to avoid the high communication overhead,
we focus on the network where there is no information ex-
change among different users, i.e., no common control channel
(CCC) is introduced. We note that this consideration makes the
scenario more practical but on the other hand, brings us more
difficulties in designing efficient resource allocation strategies
[2]–[4], [12]–[14].
Because of the limitation of spectrum resource and coupling
among different users, not all the rate requirements of users
(i.e., transmit-receive pairs) can be simultaneously satisfied
[4]. Furthermore, recalling that there is no central controller
being responsible for scheduling users’ transmission, it is a
great challenge to provide hard rate guarantee to every user
in this decentralized network. For this reason, as studied in
previous work [15]–[18], we consider softening users’ require-
ments and use a sigmoid function to measure their satisfaction.
Specifically, one user has very limited satisfaction when its
transmission rate is below the requirement, but the satisfaction
rapidly reaches an asymptotic value when its transmission
rate is above the requirement. Based on this, we formulate
the distributed channel selection and power control problem
as a non-cooperative transmission control game (NTCG). To
overcome the lack of communication between different users,
a utility-based learning approach is adopted² and a Utility-based Transmission Control algorithm (UTC) is developed,
with which each user can configure its operational parameters
just by measuring local interference. More importantly, al-
though there is no guarantee that the Nash equilibrium (NE) for
NTCG always exists, it is proved that the decentralized network
could operate at a global optimal point by implementing UTC.
Finally, simulation results verify the validity of our analysis
and demonstrate that the performance of our algorithm (e.g., convergence speed and achieved overall utility) is better than that of existing distributed algorithms.
The remainder of this paper is organized as follows. In
Section II, the related work is presented. Section III describes
the system model and formulates the distributed channel se-
lection and power control problem. In Section IV, we develop
a utility-based transmission control algorithm and analyze its
complexity as well as efficiency. Finally, numerical and sim-
ulation results are presented and analyzed in Section V, and
conclusions are drawn in Section VI.
II. RELATED WORK
The game theoretic approach has been applied extensively to
design distributed resource allocation schemes in wireless com-
munication systems from both the perspective of transmission
rate as well as energy efficiency [9], [19]–[22]. In [19]–[21], the problem of interest was formulated as a potential game [23], and then a best response dynamic (BRD) was adopted to
²The definition of utility-based learning approaches will be formally given in the next section.
achieve a pure-strategy NE. However, as discussed in the seminal work [23], a potential game may admit multiple pure-strategy NEs. Hence, for such a game, the operating point achieved by BRD depends entirely on the starting point and may be inefficient. To improve the efficiency of the devised strategy, a pricing technique was introduced in [9], [22], and the Pareto efficiency of the achieved NE was proved.
We note that all of the above schemes require CCC for infor-
mation exchange among different agents. Hence, they are not
suitable for the decentralized network, and developing the so-
called utility-based or payoff-based learning algorithms is necessary. Specifically, when implementing this type of algorithm, each user only needs to access the history of its own actions and utilities, and makes its decisions with this local information [24].
learning, no-regret learning and reinforcement learning have
been proposed in [12]–[14], respectively. It should be noted
that all the algorithms devised in [12]–[14] are utility-based,
but the converging solution is a probability distribution over the
set of available strategies. Therefore, the performance can only
be evaluated from a statistical perspective in [12]–[14], i.e., the
performance of each implementation is unpredictable [4].
Recently, some studies have begun to focus on developing utility-based resource allocation strategies that asymptotically converge to a fixed configuration (e.g., a pure-strategy NE) instead of a probability distribution [3], [4]. In [3], the
distributed channel selection problem was formulated as a
potential game, and then a utility-based learning algorithm was
proposed, which could converge to a pure NE. Furthermore, not
only channel selection but also power control was considered
in [4], and another utility-based strategy was designed for one
class of non-cooperative games. To be more specific, under
the assumption that the set of NE for the proposed game is
not empty and there is at least one NE maximizing the social
welfare (i.e., sum of the utilities of all users), the proposed
distributed channel selection and power control scheme can
asymptotically converge to the global optimal solution [4].
Actually, the above assumption is less plausible in many general cases. The reasons are twofold: 1) for a non-cooperative game, there is no guarantee that a pure-strategy NE always exists [25];³ one example is the signal-to-interference-plus-noise ratio (SINR) maximization game in [21]; 2) even if the formulated non-cooperative game admits an NE, the Pareto efficiency of this NE is hard to guarantee [19]–[21], [25]. Obviously, it is even more difficult to satisfy the stronger requirement that some NE maximizes the social welfare. In this work, a novel utility-based resource allocation algorithm is developed. More importantly, we prove that even if the formulated game has no NE, the globally optimal solution can still be asymptotically obtained with our proposed algorithm.
III. SYSTEM MODEL AND PROBLEM FORMULATION
As depicted in Fig. 1, we consider a decentralized network featuring N communicating users, each consisting of a
³Since the mixed-strategy NE will not be considered in this work, we use NE to denote the pure-strategy NE hereafter for brevity.
Fig. 1. Illustration of a decentralized network, where each user consists of one
transmitting node and one receiving node.
transmit-receive pair. Particularly, to transmit data, every user chooses one channel from the K orthogonal channels, each of which has bandwidth B_0. Each channel can be assigned to multiple users, and interference occurs when a channel is simultaneously utilized by more than one user. Without loss of generality, we suppose N ≥ K. For notational simplicity, let N and K denote the sets of users and channels, respectively, i.e., N = {1, 2, · · · , N} and K = {1, 2, · · · , K}. Additionally, we denote the channel selected by user n by c_n ∈ K. In this paper, we consider that there is no CCC or central authority for coordination among users. That is, all users are autonomous.
Let G ∈ ℝ^{N×N×K} be the channel power gain matrix, where g^k_{n,m} represents the channel gain between transmitter n and receiver m on channel k. We assume the channel condition is static during the underlying operational period, e.g., the quasi-static scenario. The additive noise is modeled as a zero-mean Gaussian random variable; then, for user n, the signal-to-interference-plus-noise ratio (SINR) can be expressed as

$$\gamma_n = \frac{p_n g^{c_n}_{n,n}}{I^{c_n}_n + B_0 N_0} = \frac{p_n g^{c_n}_{n,n}}{\sum_{m \in \mathcal{N},\, m \neq n} \delta(c_m, c_n)\, p_m g^{c_n}_{m,n} + B_0 N_0}, \qquad (1)$$

where I^{c_n}_n represents the interference caused to user n, p_n is the transmit power of user n, and N_0 is the noise power density. Besides that, the indicator function δ(c_m, c_n) shows whether users m and n use the same channel simultaneously: if c_m = c_n, δ(c_m, c_n) = 1; otherwise, δ(c_m, c_n) = 0. In this paper, we consider that each user n can choose its transmit power p_n from a finite set P_n = {p^1_n, p^2_n, · · · , p^max_n} [4], [12].
Based on the above, the achievable transmission rate of user n can be expressed as

$$R_n = B_0 \log_2(1 + \gamma_n). \qquad (2)$$

Adopting different channels and power levels, a user obtains different achievable rates. According to (1) and (2), if user n transmits on channel c_n, R_n is maximized with power p^max_n when there is no interference. Therefore, the upper bound of the rate R_n for user n can be defined as

$$R^{max}_n = \max\left\{ B_0 \log_2\!\left(1 + \frac{p^{max}_n g^{c_n}_{n,n}}{B_0 N_0}\right) \;\middle|\; c_n \in \mathcal{K} \right\}. \qquad (3)$$

Moreover, we consider that each user n has a rate requirement R^min_n to satisfy its QoS, and assume that 0 ≤ R^min_n ≤ R^max_n.
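To make (1)–(3) concrete, the following minimal Python sketch evaluates the SINR, the achievable rate, and the interference-free rate cap for a toy network. All numerical values and the gain-array layout g[k][m][r] are hypothetical examples for illustration, not the paper's simulation parameters.

```python
import math

B0 = 1.0e6   # channel bandwidth in Hz (example value)
N0 = 1.0e-9  # noise power spectral density in W/Hz (example value)

def sinr(n, c, p, g):
    """SINR of user n as in (1); g[k][m][r] is the gain from
    transmitter m to receiver r on channel k, c[m] the channel
    choice and p[m] the transmit power of user m."""
    cn = c[n]
    interference = sum(p[m] * g[cn][m][n]
                       for m in range(len(c)) if m != n and c[m] == cn)
    return p[n] * g[cn][n][n] / (interference + B0 * N0)

def rate(n, c, p, g):
    """Achievable rate (2): R_n = B0 * log2(1 + gamma_n)."""
    return B0 * math.log2(1.0 + sinr(n, c, p, g))

def rate_cap(n, p_max, g, K):
    """Upper bound (3): the interference-free rate at maximum power,
    maximized over all K channels."""
    return max(B0 * math.log2(1.0 + p_max * g[k][n][n] / (B0 * N0))
               for k in range(K))
```

When two users pick different channels, the interference sum is empty and the rate of each user meets its cap (3); co-channel operation strictly lowers the SINR.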
Intuitively, in this network not all users' rate requirements can be guaranteed when they transmit simultaneously, especially when all users' rate requirements are high [4]. For instance, if R^min_n = R^max_n, ∀n ∈ N, then at most K transmissions can be permitted. Here, to get around this problem, we soften each user's rate requirement and measure its degree of satisfaction with a sigmoid function; this approach has been widely adopted in radio resource management [15]–[18]. To this end, the utility of each individual user can be expressed as

$$U_n(R_n) = \frac{1}{1 + e^{-\beta_n\left(R_n - R^{min}_n\right)}}, \qquad \forall n \in \mathcal{N}, \qquad (4)$$

where β_n is a constant deciding the steepness of the satisfaction curve, and both R_n and R^min_n are measured in Mbps. It is clear from (4) that U_n(R_n) is a monotonically increasing function of R_n, i.e., individual users feel more satisfied when they have a higher rate. Furthermore, since lim_{R_n→0} U_n(R_n) = 1/(1 + e^{β_n R^min_n}) > 0 and lim_{R_n→∞} U_n(R_n) = 1, the utility of each user n is scaled between 0 and 1, i.e., U_n(R_n) ∈ (0, 1). We note that although a higher utility means a higher spectral efficiency for a given bandwidth, the value of the former cannot directly reflect the value of the latter. Therefore, in the simulation results, not only the overall utility U but also the average rate R̄ are recorded to evaluate the efficiency of different algorithms.
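A minimal sketch of the sigmoid satisfaction measure (4); the steepness beta = 2.0 and the requirement values used below are arbitrary examples, not parameters from the paper.

```python
import math

def utility(rate_mbps, r_min_mbps, beta=2.0):
    """Sigmoid satisfaction (4): small below the rate requirement,
    exactly 0.5 at the requirement, saturating towards 1 above it."""
    return 1.0 / (1.0 + math.exp(-beta * (rate_mbps - r_min_mbps)))
```

Note that U_n(R^min_n) = 0.5 and U_n(0) = 1/(1 + e^{β R^min_n}) > 0, matching the limits stated in the text.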
Before starting a transmission, each individual user should decide which power level to adopt and which channel to transmit on. For notational simplicity, we refer to a pair of channel index and power level as a strategy s_n, i.e.,

$$s_n = (c_n, p_n) \in \mathcal{S}_n, \qquad \mathcal{S}_n = \mathcal{K} \times \mathcal{P}_n, \qquad \forall n \in \mathcal{N}. \qquad (5)$$

From (1), (2), and (4), each user's rate is affected by the transmissions of the other users, while a higher rate brings higher satisfaction. Therefore, to improve its degree of satisfaction, or utility, each user should choose its own strategy by considering the actions of the other users; that is, the strategies employed by different users are coupled. To study this conflict among different users, NTCG is formulated; hereafter, the terms user and player are used interchangeably.
Definition (NTCG): NTCG can be represented by the tuple

$$G = \left\langle \mathcal{N}, (\mathcal{S}_n)_{n \in \mathcal{N}}, (U_n)_{n \in \mathcal{N}} \right\rangle. \qquad (6)$$

Particularly, N denotes the set of players, which is identical to the user set. For each player n, its strategy space S_n is defined as shown in (5). Given a strategy profile

$$(s_n)_{n \in \mathcal{N}} = (s_1, s_2, \cdots, s_N) \in (\mathcal{S}_n)_{n \in \mathcal{N}}, \qquad (7)$$

the utility function of each player n is

$$U_n\big((s_n)_{n \in \mathcal{N}}\big) = U_n\Big(R_n\big((s_n)_{n \in \mathcal{N}}\big)\Big), \qquad \forall n \in \mathcal{N}, \qquad (8)$$
where R_n((s_n)_{n∈N}) represents the achievable rate when player n adopts the strategy s_n = (c_n, p_n), i.e.,

$$R_n\big((s_n)_{n \in \mathcal{N}}\big) = B_0 \log_2\!\left(1 + \frac{p_n g^{c_n}_{n,n}}{I^{c_n}_n(s_{-n}) + B_0 N_0}\right). \qquad (9)$$

In (9), s_{-n} = (s_1, \cdots, s_{n-1}, s_{n+1}, \cdots, s_N) is the strategy profile of all players other than player n, and I^{c_n}_n(s_{-n}) represents the interference caused to player n on channel c_n.
Obtaining the optimal channel selection and power control strategy for this decentralized network is equivalent to solving the following combinatorial problem P, which is NP-hard:

$$\mathbf{P}: \quad \max_{\mathbf{c}, \mathbf{p}} \; \sum_{n \in \mathcal{N}} U_n\big((c_n, p_n)_{n \in \mathcal{N}}\big) \qquad (10)$$

$$\text{s.t.} \quad \mathbf{c} \in \{(c_1, c_2, \cdots, c_N) \mid c_n \in \mathcal{K}, \; \forall n \in \mathcal{N}\}, \qquad (11)$$

$$\phantom{\text{s.t.}} \quad \mathbf{p} \in \{(p_1, p_2, \cdots, p_N) \mid p_n \in \mathcal{P}_n, \; \forall n \in \mathcal{N}\}. \qquad (12)$$

The objective function (10) means that our goal is to maximize the social welfare, or overall utility, which is determined by both the achievable rates (R_1, R_2, \cdots, R_N) and the required rates (R^min_1, R^min_2, \cdots, R^min_N) of the users. Constraints (11) and (12) specify each individual user's available channel set and power level set, respectively.

Unfortunately, the above problem is an integer program, which is extremely difficult to solve. Moreover, since there is no central authority controlling the users in this decentralized network, developing a completely distributed algorithm that obtains the optimal solution of P is important and non-trivial.
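To see why a distributed method is needed, note that P can only be solved exactly by enumerating the joint strategy space, whose size |S_n|^N = (K|P_n|)^N grows exponentially in N. A sketch of such exhaustive search follows; the callable `total_utility` is a hypothetical stand-in for the social-welfare sum in (10).

```python
from itertools import product

def brute_force(N, channels, power_levels, total_utility):
    """Exhaustively search the joint strategy space of P (10)-(12).
    Feasible only for tiny instances; total_utility maps a profile
    ((c_1, p_1), ..., (c_N, p_N)) to the social welfare."""
    strategies = list(product(channels, power_levels))  # S_n = K x P_n
    best_profile, best_value = None, float("-inf")
    for profile in product(strategies, repeat=N):       # |S_n|^N profiles
        value = total_utility(profile)
        if value > best_value:
            best_profile, best_value = profile, value
    return best_profile, best_value
```

Even a modest instance with K = 4 channels, 4 power levels, and N = 10 users already yields 16^10 ≈ 10^12 profiles, which motivates the learning-based approach of Section IV.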
IV. DISTRIBUTED ALGORITHM DESIGN
In this section, we develop a utility-based algorithm for NTCG to achieve the solution of P shown in Section III. We first show that the existence of an NE for NTCG is uncertain, and then develop a utility-based algorithm. At the end of this section, we investigate the complexity of the proposed algorithm and prove that it asymptotically converges to the global optimal solution under the given condition, no matter whether this solution is an NE of the formulated game or not.
A. NE for NTCG
Recalling that there is no CCC for exchanging information among different players, a utility-based learning algorithm is considered more appropriate for this distributed environment. In the recent work [4], a similar problem was studied and an efficient utility-based learning algorithm was proposed. Particularly, for the formulated game, the authors of [4] proved that if there exists an NE which maximizes the social welfare, this NE can be achieved with their proposed distributed algorithm. To check whether the algorithm devised in [4] can also be adopted to solve our problem P, we first discuss the existence of an NE for NTCG.
For a non-cooperative game, a (pure-strategy) NE is a standard solution concept standing for an equilibrium state in which no player can unilaterally improve its own utility by choosing a different strategy [25]. Mathematically speaking, if a profile s* = (s*_1, s*_2, \cdots, s*_N) in the strategy space (S_n)_{n∈N} is an NE, then we have

$$U_n\big(s^*_n, s^*_{-n}\big) \geq U_n\big(s_n, s^*_{-n}\big), \qquad \forall s_n \in \mathcal{S}_n, \; \forall n \in \mathcal{N}, \qquad (13)$$

where s^*_{-n} = (s^*_1, \cdots, s^*_{n-1}, s^*_{n+1}, \cdots, s^*_N).
Theorem 1: There is no guarantee that an NE for NTCG always exists.

Proof: For each player n, since U_n in (4) is strictly increasing in R_n, and R_n in (9) is increasing in the transmit power p_n, the best response of player n always adopts the maximum power p^max_n. Hence, we have

$$\arg\max_{s_n} U_n(s_n, s_{-n}) = \arg\max_{s_n} \frac{1}{1 + e^{-\beta_n\left(R_n(s_n, s_{-n}) - R^{min}_n\right)}} = \arg\max_{(p^{max}_n, c_n)} R_n\big((p^{max}_n, c_n), s_{-n}\big) = \arg\max_{(p^{max}_n, c_n)} \gamma_n\big((p^{max}_n, c_n), s_{-n}\big). \qquad (14)$$

It follows from (14) that NTCG is identical to the SINR-maximization game introduced in [21]. According to the conclusion drawn from a "toy" two-user case in [21], the existence of an NE for the SINR-maximization game cannot be guaranteed, which indicates that an NE for NTCG may not exist either. A counterexample can be easily derived with the parameters given in Table I of [21], and hence is omitted here. This completes the proof.
Therefore, the utility-based algorithm developed in [4] cannot be directly applied to the problem addressed in this work, and a novel utility-based algorithm is devised in the following subsection.
B. Utility-Based Distributed Transmission Control Algorithm
When devising a utility-based learning algorithm, two components must be elaborated for each player: the state profile and the learning model (dynamics) [2], [24]. The former describes each player's available local information, and the latter tells each player how to make decisions based on this information. In this subsection, we first define the state profile and learning model for each player in NTCG in detail. Then, a utility-based distributed transmission control algorithm is proposed.⁴

1) State Profile: At each decision moment t ∈ {1, 2, · · ·}, we describe the state profile of player n with a triplet L_n(t) = (s_n(t), U_n(t), α_n(t)), where s_n(t), U_n(t), and α_n(t) ∈ {0, 1} represent its strategy, utility, and mood, respectively. The binary variable α_n(t) captures the player's desire to change its currently adopted strategy, which will be specified in detail when introducing the learning model.

⁴The utility-based learning approach implemented in this paper can also be viewed as a state-based learning approach. The term "utility-based" is adopted by [2], [24] and the references therein.
2) Learning Model: Motivated by Marden's work [26], a utility-based learning model is adopted in this paper, with which each individual player n updates s_n(t), U_n(t), and α_n(t) in sequence at each decision moment t. To be specific, at the beginning of time t, player n first determines the probability distribution over the set of its available strategies (i.e., a mixed strategy)

$$Q_n(t) = \Big(q^1_n(t), q^2_n(t), \cdots, q^{|\mathcal{S}_n|}_n(t)\Big), \qquad (15)$$

where |·| represents the cardinality of a set, and q^j_n(t) is the probability of choosing the j-th strategy at time t, i.e.,

$$q^j_n(t) \geq 0, \;\; \forall j \in \{1, 2, \cdots, |\mathcal{S}_n|\}, \qquad \sum_{j=1}^{|\mathcal{S}_n|} q^j_n(t) = 1. \qquad (16)$$

In other words, the probability distribution Q_n(t) describes the player's dynamics. Player n updates Q_n(t) based on its previous mood α_n(t-1) and action s_n(t-1). Particularly, if α_n(t-1) = 0,

$$q^{i(f_n)}_n(t) = \frac{1}{|\mathcal{S}_n|}, \qquad \forall f_n \in \mathcal{S}_n, \qquad (17)$$

where i(f_n) denotes the index of strategy f_n in S_n. The rule in (17) means that if the previous mood is 0, the player chooses each strategy with equal probability. On the other hand, if α_n(t-1) = 1,

$$q^{i(f_n)}_n(t) = \begin{cases} \dfrac{\varepsilon^w}{|\mathcal{S}_n| - 1}, & \forall f_n \in \mathcal{S}_n, \; f_n \neq s_n(t-1), \\[2mm] 1 - \varepsilon^w, & \text{otherwise}, \end{cases} \qquad (18)$$

where ε is a constant belonging to (0, 1) and w is a constant greater than N. Equation (18) means that if the previous mood is 1, the player changes its strategy to a different one (i.e., f_n ≠ s_n(t-1)) with probability ε^w/(|S_n| - 1), and keeps the same strategy (i.e., f_n = s_n(t-1)) with probability 1 - ε^w. Since ε^w is generally much less than 1, i.e., 1 - ε^w ≫ ε^w/(|S_n| - 1), (18) implies that a player whose mood is 1 switches to a different strategy with a relatively small probability. The main motivation behind updating Q_n(t) with (17) and (18) is that this rule guarantees that each individual player prefers strategies making its mood be 1.
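The update rules (17) and (18) can be sketched as follows. The values eps = 0.1 and w = 3 are examples chosen to satisfy ε ∈ (0, 1) and w > N for a toy network with N < 3; the paper's own parameter settings may differ.

```python
def strategy_distribution(num_strategies, prev_index, prev_mood,
                          eps=0.1, w=3):
    """Mixed strategy Q_n(t): uniform (17) when the previous mood is 0;
    sticky with a small exploration mass eps**w (18) when it is 1."""
    if prev_mood == 0:
        return [1.0 / num_strategies] * num_strategies     # rule (17)
    explore = eps ** w                                     # total switch mass
    q = [explore / (num_strategies - 1)] * num_strategies  # (18), f_n != s_n(t-1)
    q[prev_index] = 1.0 - explore                          # keep current strategy
    return q
```

A content player (mood 1) thus keeps its strategy with probability 1 - ε^w and explores only rarely, while a discontent player (mood 0) searches the whole strategy space uniformly.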
After that, player n chooses an action s_n(t) according to the probability distribution Q_n(t), calculates its utility U_n(t) by measuring the interference, and finally updates its mood α_n(t) with Algorithm 1.
Algorithm 1 Mood updating algorithm
1: if α_n(t-1) = 1 then
2:   if (s_n(t) = s_n(t-1)) and (U_n(t) = U_n(t-1)) then
3:     Set α_n(t) to 1.
4:   else
5:     Go to 10.
6:   end if
7: else
8:   Go to 10.
9: end if
10: Set α_n(t) to 1 and 0 with probability ρ_1 = ε^{1-U_n(t)} and ρ_0 = 1 - ρ_1, respectively.
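Algorithm 1 can be sketched as follows. Note that in line 10, ρ_1 = ε^{1-U_n(t)} grows with the utility (since ε < 1), so high-utility outcomes are more likely to leave the player content. Here ε = 0.1 is an example value, and the random source `rng` is injected only so the sketch is testable.

```python
import random

def update_mood(prev_mood, prev_strategy, prev_utility,
                strategy, utility, eps=0.1, rng=random.random):
    """Mood update of Algorithm 1: a content player (mood 1) whose
    strategy and utility are both unchanged stays content; otherwise
    the mood is redrawn, becoming 1 with probability eps**(1 - utility)."""
    if prev_mood == 1 and strategy == prev_strategy and utility == prev_utility:
        return 1                         # lines 1-3: stay content
    rho1 = eps ** (1.0 - utility)        # line 10: larger utility -> larger rho1
    return 1 if rng() < rho1 else 0
```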
3) UTC: Now, based on the above state profile and learning model, UTC is developed and shown in Algorithm 2, where players update their strategies in parallel. Similar to [4], the stop criterion of this algorithm can be one of the following: 1) the preset maximum iteration number T is reached, or 2) for each player n, the variation of its utility over a period is negligible.
Algorithm 2 UTC
1: Initialize iteration count t = 0, mood α_n(t) = 0, and strategy counter V_n = (v^1_n, v^2_n, \cdots, v^{|S_n|}_n) = (0)_{1×|S_n|}, ∀n ∈ N. Each player n randomly chooses its initial strategy s_n(t) and then measures its utility U_n(t).
2: repeat
3:   Set t = t + 1.
4:   for n = 1 to N users do
5:     Update state profile L_n(t):
6:     if α_n(t-1) = 0 then
7:       Calculate Q_n(t) with (17).
8:     else
9:       Calculate Q_n(t) with (18).
10:    end if
11:    Choose a strategy s_n(t), measure the utility U_n(t), and update the mood α_n(t).
12:    Update strategy counter V_n:
13:    if α_n(t) = 1 then
14:      Update V_n with (19).
15:    end if
16:  end for
17: until the stop criterion is satisfied.
18: Each player n decides its strategy s^D_n according to (20).
During the initialization of Algorithm 2, each player n randomly chooses its own strategy, sets its mood to 0, and initializes the strategy counter V_n, where (0)_{1×|S_n|} represents the |S_n|-dimensional null vector. The elements of V_n count the number of times α_n = 1 under different strategies; for instance, v^i_n records the number of times the i-th strategy has made the mood of player n equal to 1. When the initialization is completed, the algorithm enters a loop, in which each individual player n first updates its state profile L_n(t) = (s_n(t), U_n(t), α_n(t)) with the devised utility-based learning model at each iteration. We note that the SINR estimation can be done in practice by sending a pilot or training sequence from the transmitter to the receiver [27]; therefore, the utility can be measured by each autonomous user. Then, the strategy counter V_n = (v^1_n, v^2_n, \cdots, v^{|S_n|}_n) is updated based on the current mood α_n(t). If α_n(t) = 1,

$$v^{i(s_n(t))}_n = v^{i(s_n(t))}_n + 1, \qquad s_n(t) \in \mathcal{S}_n, \qquad (19)$$

where v^{i(s_n(t))}_n is the i(s_n(t))-th entry of the vector V_n. Intuitively, this updating rule implies that each player records the strategies that make its mood equal to 1. When the loop is exited, individual players make their final decisions:

$$s^D_n = \arg_{s_n} \left\{ v^{i(s_n)}_n = \max\left\{v^1_n, v^2_n, \cdots, v^{|\mathcal{S}_n|}_n\right\} \right\}, \qquad \forall n \in \mathcal{N}. \qquad (20)$$

From (20), the strategy recorded most frequently is eventually adopted by each user.
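A sketch of the counting rule (19) and the decision rule (20): each player increments a counter only for visits that left it in mood 1 and finally adopts the most frequently recorded strategy. Strategies are identified here by their index i(s_n), as in the text.

```python
def record(counter, strategy_index, mood):
    """Rule (19): increment the counter of the current strategy
    only when the resulting mood is 1."""
    if mood == 1:
        counter[strategy_index] += 1

def final_decision(counter):
    """Rule (20): adopt the strategy whose counter is largest."""
    return max(range(len(counter)), key=counter.__getitem__)
```

This requires only O(|S_n|) memory per player and a single pass of comparisons at the end, consistent with the complexity analysis in the next subsection.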
The reasons for choosing the above decision rule are twofold. First, it only requires simple comparison operations when making the final decision, as shown in (20). Second, it allows the solution of problem P to be asymptotically achieved under the given condition, which will be proved in the following subsection. We note that with the adopted learning model, the system dynamics can be described as a perturbed Markov process, with the parameter ε > 0 as the perturbation factor. Therefore, to show convergence to the optimal strategy profile, it is essential to prove that the learning process of our algorithm leads to a stochastically stable strategy profile that maximizes the overall utility. A similar idea was also adopted in [4] when designing a utility-based learning algorithm. However, the authors of [4] adopted a quaternary variable instead of a binary variable to describe each user's mood, which introduces a much larger state space to capture the system dynamics and hence makes the convergence speed of their algorithm slower than that of ours. This will be illustrated through the simulation results in the following section. Moreover, it is worth noting that Algorithm 2 is simple and completely distributed. In particular, when each player updates its own state profile, it does not require any prior information about other players, thereby avoiding a large communication overhead.
C. Complexity and Efficiency Analysis of UTC
In this subsection, we first present the complexity analysis
for the proposed algorithm UTC. Then, we will analyze its
efficiency and give the main result in Theorem 2.
UTC consists of two main blocks. The first is the loop from line 2 to line 17, which is executed independently by each player. The second is the step in line 18, in which each player $n$ makes its own final decision according to (20). Note that the first block (i.e., lines 2 to 17) involves only basic arithmetic operations and random number generation, and hence has a computational complexity of $O(1)$ per iteration. In addition, (20) requires player $n$ to compare all $|\mathcal{S}_n|$ elements of the vector $V_n$. Therefore, the complexity of this algorithm depends on both the stopping criterion of the loop and the size of each player's strategy space. In particular, for the two stopping criteria described earlier, the complexities are $O(T + L)$ and $O(E + L)$, respectively, where $T$ is the preset maximum number of iterations, $L = \max\{|\mathcal{S}_1|, |\mathcal{S}_2|, \ldots, |\mathcal{S}_N|\}$, and $E$ is the number of iterations required for convergence. Moreover, it should be noted that $E$ depends on the value of the parameter $\varepsilon$, which will be further discussed at the end of this subsection.
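As a sketch of the final decision in line 18, each player picks the strategy most frequently recorded while its mood was 1. Representing $V_n$ as a simple log of (strategy, mood) pairs is an assumption of this sketch; the comparison itself mirrors the $|\mathcal{S}_n|$-element scan of (20).

```python
from collections import Counter

def final_decision(history):
    """Pick the strategy recorded most often while the player's mood was 1.

    history: list of (strategy, mood) pairs logged during the learning loop.
    Mirrors the comparison over the elements of V_n in (20); the log
    representation is an assumption of this sketch.
    """
    counts = Counter(s for s, mood in history if mood == 1)
    if not counts:  # never content: fall back to the last played strategy
        return history[-1][0]
    return counts.most_common(1)[0][0]
```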
Theorem 2: Let $(s_n^O)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$ denote the solution of problem $\mathcal{P}$, i.e.,

$$(s_n^O)_{n\in\mathcal{N}} = \arg\max_{(s_n)_{n\in\mathcal{N}}} \sum_{n\in\mathcal{N}} U_n\big((s_n)_{n\in\mathcal{N}}\big). \tag{21}$$

When $(s_n^O)_{n\in\mathcal{N}}$ is unique and $\varepsilon$ is sufficiently small, i.e., $\varepsilon \to 0$, the solution of UTC asymptotically converges to $(s_n^O)_{n\in\mathcal{N}}$, i.e.,

$$\Pr\Big\{\lim_{T\to\infty} (s_n^D)_{n\in\mathcal{N}} = (s_n^O)_{n\in\mathcal{N}}\Big\} = 1, \tag{22}$$

where $T$ is the number of iterations.
Proof: The proof is given in Appendix A.
We note that the optimal solution $(s_n^O)_{n\in\mathcal{N}}$ is not required to be a NE of the formulated game; hence, this efficient point may be missed by existing utility-based resource allocation algorithms, which are designed to reach a NE [2]–[4]. In addition, it is worth noting that for a given $\varepsilon$, a larger state space makes the convergence of the proposed algorithm much slower, i.e., there is a curse of dimensionality. This is mainly because the considered resource allocation problem is essentially combinatorial and generally NP-hard. On the other hand, there is a tradeoff between the efficiency and the convergence speed of our algorithm, which can be tuned by adjusting $\varepsilon$. Specifically, a smaller $\varepsilon$ leads to slower convergence, but the algorithm is more likely to converge to the global optimal solution $(s_n^O)_{n\in\mathcal{N}}$. For this reason, if $\varepsilon$ is properly set, our algorithm still works when the state space becomes large; in other words, the tradeoff between convergence speed and accuracy can be set appropriately for a practical implementation. This conclusion will be confirmed by the simulation results in the following section.
V. RESULTS AND ANALYSIS
A. Simulation Scenario
To evaluate the performance of our proposed algorithm, we conduct simulations of a decentralized network consisting of $N$ transmit-receive pairs randomly deployed in a circular region of radius $r$ m. The distance between each transmit-receive pair is uniformly distributed between 0 and $D$ m. We assume that all channels undergo independent and identically distributed log-normal shadowing as well as path loss, with the path-loss exponent $\alpha$ and the shadowing standard deviation $\sigma_\psi$ set to 3 and 4 dB, respectively. We note that this channel model has been empirically confirmed to accurately capture the variation in received power in several outdoor and indoor radio propagation environments; see, e.g., [28] and references therein. In addition, a shadow fade lasts for multiple seconds or minutes, and hence varies on a much slower time scale [27].
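The channel model described above can be sampled as follows. Constant factors such as antenna gains and the reference distance are omitted, so the function below is an illustrative sketch rather than the exact link budget used in the simulations.

```python
import random

def channel_gain(distance_m, alpha=3.0, sigma_db=4.0):
    """Sample a link gain with path loss and log-normal shadowing.

    Linear gain = d^(-alpha) * 10^(psi/10), with psi ~ N(0, sigma_db) in dB.
    Antenna gains and the reference-distance constant are omitted here.
    """
    psi_db = random.gauss(0.0, sigma_db)  # shadowing term in dB
    return distance_m ** (-alpha) * 10 ** (psi_db / 10)
```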
TABLE I
SIMULATION PARAMETERS
We consider a three-level power set for each user, i.e., low, medium, and high power levels, which are set to −20 dBW, −10 dBW, and 0 dBW, respectively. Besides that, for each user $n$, the minimal rate requirement $R_n^{\min}$ and the steepness of the sigmoid function $\omega_n$ are set to $\frac{1}{10}R_n^{\max}$ and 10, respectively. In addition, each individual simulation result is obtained by averaging over 1000 independent realizations of the users' locations and channel conditions. Unless specified otherwise, the simulation parameters are adopted as listed in Table I.
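Since the exact sigmoid in (8) is not reproduced in this excerpt, the following sketch uses a standard logistic form consistent with the stated parameters (steepness $\omega_n = 10$, satisfaction 0.5 exactly at the minimal rate); the precise expression in the paper may differ.

```python
import math

def sigmoid_utility(rate, r_min, omega=10.0):
    """Sigmoid satisfaction of a rate requirement.

    A standard logistic form, U = 1 / (1 + exp(-omega * (R / R_min - 1))),
    chosen so that U = 0.5 at R = R_min; this form is an assumption and
    the exact expression in (8) of the paper may differ.
    """
    return 1.0 / (1.0 + math.exp(-omega * (rate / r_min - 1.0)))
```

This shape rewards rates above the requirement with a utility close to 1 and penalizes rates well below it with a utility close to 0, matching the increasing-utility property used in the proof of Proposition 1.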
B. Convergence of UTC
Before delving into the performance of the proposed distributed resource allocation algorithm UTC, we first investigate its convergence behavior and examine the impact of the algorithm parameter $\varepsilon$. Based on Theorem 2, we provide the maximum overall utility, as defined in (10), as a benchmark result. To solve problem $\mathcal{P}$ within an acceptable period of time, a simplified scenario is considered in this simulation. In particular, we focus on the case of $K = 5$ channels in which all users transmit at the high power level, i.e., $\mathcal{P}_n = \{0\ \mathrm{dBW}\}$, $\forall n \in \mathcal{N}$. The simulation results for $N = K$ and $N = 2K$ users are illustrated in Fig. 2(a) and (b), respectively.
It can be seen from Fig. 2 that as $\varepsilon$ becomes smaller, the convergence of UTC is slower but the achieved overall utility is higher. Besides that, although there is a small gap between the performance of UTC and that of enumeration, our algorithm converges in far fewer iterations than the latter requires (i.e., $\prod_{n\in\mathcal{N}} |\mathcal{S}_n| = \prod_{n\in\mathcal{N}} K|\mathcal{P}_n|$). Taking the scenario with 10 users as an example, enumeration needs $5^{10} = 9{,}765{,}625$ iterations, whereas our algorithm converges in about 40 and 100 iterations when $\varepsilon$ is set to $10^{-3}$ and $10^{-5}$, respectively. Moreover, when $\varepsilon$ is set to $10^{-5}$, the relative difference between the overall utility achieved by enumeration and that achieved by our algorithm is only around 0.4%. Recalling Theorem 2, this gap may stem from the fact that $\varepsilon$ is not small enough and, in addition, there is no guarantee that the optimal solution of $\mathcal{P}$ is unique in every simulation round.
Next, we compare the convergence behavior of UTC with that of the Trial and Error Learning (TEL) algorithm proposed in [4]. For a fair comparison, the perturbation factor $\varepsilon$ used in UTC and that used in TEL are both set to 0.01. Furthermore, the
Fig. 2. Convergence of UTC with respect to $\varepsilon$, where the number of channels is $K = 5$ and the overall utility is $U = \sum_{n\in\mathcal{N}} U_n$. (a) $N = K = 5$ users. (b) $N = 2K = 10$ users.
necessary mapping functions suggested in [4] are also adopted in our simulation,⁵ i.e., $G(x) = 0.2x + 0.2$ and $F(x) = 0.2x/N + 0.2/N$. In this simulation, there are $N = 25$ users and $K = 5$ channels, and additionally, for each user $n$ the available transmit power set $\mathcal{P}_n$ is set as shown in Table I. The convergence in terms of the overall utility $U$ and the average transmission rate $\bar{R} = \frac{1}{N}\sum_{n\in\mathcal{N}} R_n$ is illustrated in Fig. 3(a) and (b), respectively.
From the simulation results, we note that our algorithm UTC converges much faster than TEL. The reasons are twofold. On one hand, TEL introduces four states to describe each user's mood, whereas only two states are adopted in our algorithm. TEL therefore has a much larger state space for capturing the system dynamics, which means that each player has to search more states before making its final decision. On the other hand, as discussed previously, the main result of [4] is that when $\varepsilon \to 0$ and there is at least one NE maximizing the overall utility, TEL asymptotically converges to a NE achieving the maximum overall utility. However, there is no guarantee that a NE of the NTCG always exists, as shown by Theorem 1 in Section IV. Besides that, we can see that UTC achieves both a higher overall utility and a higher average rate. This is mainly due to the fact that the goal of our algorithm is to find the
⁵The design requirements for the two mapping functions $G(x)$ and $F(x)$ are given in (6) and (7) of [4], and two instances are suggested via simulations below those equations, respectively.
Fig. 3. Convergence comparison of TEL developed in [4] and our algorithm UTC, where $N = 25$ users share $K = 5$ channels. (a) Overall utility $U$ vs. the number of iterations $T$. (b) Average rate $\bar{R}$ vs. the number of iterations $T$.
global optimal solution of $\mathcal{P}$ rather than to reach a NE of the formulated game.
C. Performance Comparison
In this section, we evaluate the performance of our algorithm UTC with the following metrics:
• Overall utility $U$: the sum utility of all players, i.e., $U = \sum_{n\in\mathcal{N}} U_n$.
• Average transmission rate $\bar{R}$: the average transmission rate achieved by the users, i.e., $\bar{R} = \frac{1}{N}\sum_{n\in\mathcal{N}} R_n$.
• User satisfaction ratio $\eta_s$: the ratio of the number of users whose rate requirements are met to the total number of users $N$, i.e., $\eta_s = |\mathcal{N}_0| / N$ with $\mathcal{N}_0 = \{n \in \mathcal{N} : R_n \ge R_n^{\min}\}$, where $|\cdot|$ denotes the cardinality of a set.
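The three metrics can be computed from per-user utilities and rates in a few lines; the function below is a direct transcription of the definitions above.

```python
def metrics(utilities, rates, r_min):
    """Overall utility, average rate, and satisfaction ratio for N users.

    utilities: per-user utilities U_n; rates: per-user rates R_n;
    r_min: per-user minimal rate requirements R_n^min.
    """
    n = len(rates)
    overall_utility = sum(utilities)                      # U
    avg_rate = sum(rates) / n                             # R-bar
    satisfied = sum(1 for r, rm in zip(rates, r_min) if r >= rm)
    return overall_utility, avg_rate, satisfied / n       # (U, R-bar, eta_s)
```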
We compare our algorithm UTC with the following three distributed schemes:
• Random: each user $n$ randomly chooses a strategy $s_n$ from its strategy space $\mathcal{S}_n = \mathcal{K} \times \mathcal{P}_n$. The performance of this method therefore serves as the baseline.
• Greedy transmission control (GTC): this greedy algorithm was proposed in [29]; each user measures the interference on all channels and then transmits on the channel with the minimum interference at the maximum transmit power. This process is repeated until the stopping criterion is satisfied.
• TEL: the utility-based distributed learning algorithm developed in [4]. When implementing this algorithm, the mapping functions $G(x)$ and $F(x)$ are set the same as those adopted in the previous subsection.
When running UTC and TEL, we use the parameter setting suggested in [4] and hence set the perturbation factor $\varepsilon$ to $10^{-2}$. In addition, for a fair comparison, all algorithms are executed in parallel and the maximum number of iterations $T$ is set to $10^4$ [4].

Fig. 4. Overall utility $U$ vs. the number of users $N$.
Fig. 5. Average rate $\bar{R}$ vs. the number of users $N$.
Figs. 4 and 5 illustrate the overall utility $U$ and the corresponding average rate $\bar{R}$ versus the number of users, respectively. As observed from the simulation results, when there are more transmitting users, the growth of the overall utility gradually slows down and the average rate decreases. This is because, as the user density increases, there is more interference in the network, and in turn both the utility and the transmission rate achieved by each user decrease. In addition, we can see that when the number of communicating users in the network is small (e.g., $N \le 15$), GTC performs well. However, its performance degrades as the number of users grows. This is mainly caused by the greedy behavior of the users under this algorithm, i.e., they always transmit with the
Fig. 6. Performance comparison in terms of satisfaction.
maximum power to improve their own utilities. As discussed in previous studies [8], [9], such a greedy method may cause severe interference in the system and ultimately yield an inefficient resource allocation.
Additionally, we note that both our algorithm and TEL perform much better than the baseline (Random). Meanwhile, compared with TEL, our algorithm also improves performance. For instance, with $N = 50$ users, UTC achieves around 9.7% higher overall utility (from 40.08 to 44.77) and 12.4% higher average rate (from 0.884 Mbps to 0.994 Mbps) than TEL. Therefore, we conclude that the interference mitigation capability of UTC is the best among these four distributed algorithms. It should be noted that the reason for this improvement is similar to that stated in the previous subsection.
Next, we compare the performance of these four algorithms from the perspective of user satisfaction, as demonstrated in Fig. 6. In particular, Fig. 6(a) illustrates the user satisfaction ratio $\eta_s$ versus the number of users $N$, and Fig. 6(b) compares the cumulative distribution function (CDF) of the number of satisfied users (i.e., $|\mathcal{N}_0| = \eta_s \cdot N$) for the four algorithms when there are $N = 50$ users. Three observations can be made from Fig. 6. First, not all users' rate requirements can be met, especially when the number of users is large; this result is consistent with the statement given in Section III. Second, $\eta_s$ decreases with $N$, which is due to the fact that more users result in higher interference in the network. Third, compared with the other resource allocation schemes, more users satisfy their rate requirements under our algorithm. Furthermore, combining the results shown in Figs. 4–6, we can see that our algorithm achieves better performance at both the system and the individual level, mainly because both the selfishness of individual users and the welfare of the whole network are taken into account in its design.
VI. CONCLUSION
In this paper, we have addressed the problem of distributed channel selection and power control in decentralized networks and proposed a distributed resource allocation algorithm that requires no information exchange. More importantly, we have theoretically proved that, under the given condition, the network can asymptotically operate at the global optimal point with our proposed algorithm. Simulation results verified the validity of our analysis and demonstrated that our algorithm consistently outperforms the existing ones across different metrics. A possible extension for future work is to consider the time-varying characteristics of the network topology and to speed up convergence.
APPENDIX A
PROOF OF THEOREM 2
Proof: The learning model in Algorithm 2 induces a Markov process over the finite state space $\mathcal{Z} = \prod_{n\in\mathcal{N}} (\mathcal{S}_n \times \mathcal{U}_n \times \mathcal{A}_n)$, where $\mathcal{U}_n$ is the finite range of $U_n$ over all strategy profiles $(s_n)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$ and $\mathcal{A}_n = \{0, 1\}$ is the set of moods. For every scalar $\varepsilon > 0$, this Markov process is perturbed; we denote the perturbed Markov process by $MP_\varepsilon$. Before giving the detailed proof, we introduce some necessary definitions.

Definition 2 (Interdependence): An $N$-person game $\Gamma(\mathcal{N}, (\mathcal{S}_n)_{n\in\mathcal{N}}, (U_n)_{n\in\mathcal{N}})$ is interdependent if, for every strategy profile $(s_n)_{n\in\mathcal{N}} \in (\mathcal{S}_n)_{n\in\mathcal{N}}$ and every proper subset of players $\mathcal{H} \subset \mathcal{N}$, there exist a player $g \notin \mathcal{H}$ and a choice of strategies $(s'_h)_{h\in\mathcal{H}} \in (\mathcal{S}_h)_{h\in\mathcal{H}}$ such that

$$U_g\big((s'_h)_{h\in\mathcal{H}}, (s_n)_{n\in\mathcal{N}\setminus\mathcal{H}}\big) \neq U_g\big((s_h)_{h\in\mathcal{H}}, (s_n)_{n\in\mathcal{N}\setminus\mathcal{H}}\big). \tag{23}$$

In other words, given a strategy profile $(s_n)_{n\in\mathcal{N}}$, every subset of players $\mathcal{H}$ can cause a utility (welfare) change for some player in $\mathcal{N}\setminus\mathcal{H}$ by a suitable change of their own strategies.

Definition 3 (Stochastically stable states): For a perturbed Markov process $MP_\varepsilon$, the elements of the support of the limiting stationary distribution are referred to as the stochastically stable states. Specifically, a state $T \in \mathcal{Z}$ is stochastically stable if and only if $\lim_{\varepsilon\to 0} \pi(T) > 0$, where $\pi(T)$ is the stationary distribution of the perturbed process.

We divide the proof of Theorem 2 into three steps, S1–S3, which we now elaborate.

Step S1:
Proposition 1: The NTCG $\mathcal{G}$ is an interdependent game.
Proof: According to (4) and (8), the utility $U_n$ of each player is an increasing function of its achievable rate $R_n$. Consider the situation in which a single player $h$ changes its strategy while all other players keep theirs fixed, and suppose the current strategy profile is $(s_n)_{n\in\mathcal{N}} = ((c_n, p_n))_{n\in\mathcal{N}}$. The situation can then be divided into three disjoint cases: 1) $c_h \neq c_n$, $\forall n \in \mathcal{N}\setminus\{h\}$; 2) $c_h = c_n$, $\forall n \in \mathcal{N}\setminus\{h\}$; 3) $c_h = c_n$ for some $n \in \mathcal{N}\setminus\{h\}$ and $c_h \neq c_m$ for some $m \in \mathcal{N}\setminus\{h\}$.

In the first case, for any player $g \in \mathcal{N}\setminus\{h\}$, if player $h$ changes its strategy to $s'_h = (c'_h, p'_h)$ with $c'_h = c_g$, player $g$ will suffer higher interference and achieve a lower rate $R_g$, which in turn reduces its utility $U_g$. In the second case, there exists a player $g \in \mathcal{N}\setminus\{h\}$ whose channel index satisfies $c_g = c_h$, and $h$ can change its strategy so that $c'_h \neq c_g$, which implies that player $g$ will suffer lower interference and obtain a higher utility. Similarly, it is easy to show that in the third case there is a player $g \in \mathcal{N}\setminus\{h\}$ whose welfare changes when player $h$ properly changes its strategy.

Therefore, the NTCG $\mathcal{G}$ is interdependent. ∎
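The interdependence property of Definition 2 can be checked by brute force for a toy game. This is purely illustrative (it is not part of the paper's method and is only feasible for very small games): it enumerates every strategy profile and every proper subset of players, and looks for a joint deviation that changes the utility of some player outside the subset.

```python
from itertools import product

def is_interdependent(strategy_sets, utilities):
    """Brute-force check of the interdependence property (Definition 2).

    strategy_sets: list of strategy lists, one per player.
    utilities: function mapping a strategy profile (tuple) to a utility tuple.
    Exponential in the number of players; illustrative only.
    """
    n = len(strategy_sets)
    players = range(n)
    for profile in product(*strategy_sets):
        for mask in range(1, 2 ** n - 1):  # every proper nonempty subset H
            H = [i for i in players if mask >> i & 1]
            found = False
            for dev in product(*(strategy_sets[i] for i in H)):
                new = list(profile)
                for i, s in zip(H, dev):
                    new[i] = s
                u_old = utilities(tuple(profile))
                u_new = utilities(tuple(new))
                # Some player g outside H must see a utility change.
                if any(u_old[g] != u_new[g] for g in players if g not in H):
                    found = True
                    break
            if not found:
                return False
    return True
```

For example, a two-player game in which each player's utility equals the other player's action is interdependent, while a game in which each utility depends only on the player's own action is not.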
Step S2:
Proposition 2: For the perturbed Markov process $MP_\varepsilon$ over $\mathcal{Z} = \prod_{n\in\mathcal{N}} (\mathcal{S}_n \times \mathcal{U}_n \times \mathcal{A}_n)$, a state $T = ((s_n)_{n\in\mathcal{N}}, (u_n)_{n\in\mathcal{N}}, (\alpha_n)_{n\in\mathcal{N}}) \in \mathcal{Z}$ is stochastically stable if and only if its strategy profile maximizes the social welfare, i.e.,

$$(s_n)_{n\in\mathcal{N}} = \arg\max_{(f_n)_{n\in\mathcal{N}}} \sum_{n\in\mathcal{N}} U_n\big((f_n)_{n\in\mathcal{N}}\big). \tag{24}$$

Moreover, in such a state, $u_n = U_n\big((s_n)_{n\in\mathcal{N}}\big)$ and $\alpha_n = 1$, $\forall n \in \mathcal{N}$. Therefore, such a state can be equivalently represented as $T = ((s_n)_{n\in\mathcal{N}}, (\alpha_n = 1)_{n\in\mathcal{N}})$.
Proof: Based on Proposition 1 and the proof of Theorem 1 in [26], Proposition 2 can be proved with the theory of resistance trees for regular perturbed Markov processes [30]. The detailed proof and related definitions can be found in [26]. Here, we only provide an outline of the proof, which consists of four steps.

First, for the unperturbed process $MP_0$, we need to prove that the recurrence classes are the singletons $T \in \mathcal{C}_0$ together with $\mathcal{D}_0$, where $\mathcal{C}_0$ denotes the subset of states in which each agent's mood is 1 and the benchmark action and utility are aligned; in other words, if $((s_n)_{n\in\mathcal{N}}, (u_n)_{n\in\mathcal{N}}, (\alpha_n)_{n\in\mathcal{N}}) \in \mathcal{C}_0$, then $u_n = U_n\big((s_n)_{n\in\mathcal{N}}\big)$ and $\alpha_n = 1$. Additionally, $\mathcal{D}_0$ represents the set of states in which everyone's mood is 0, i.e., if $((s_n)_{n\in\mathcal{N}}, (u_n)_{n\in\mathcal{N}}, (\alpha_n)_{n\in\mathcal{N}}) \in \mathcal{D}_0$, then $u_n = U_n\big((s_n)_{n\in\mathcal{N}}\big)$ and $\alpha_n = 0$. This proof can be completed based on Proposition 1, i.e., the interdependence of the NTCG.

Second, we need to prove that the stochastic potential of each state $T \in \mathcal{C}_0$ is $\gamma(T) = w(|\mathcal{C}_0| - 1) + \sum_{n\in\mathcal{N}} (1 - u_n)$, where $w$ is a constant larger than $N$. Here, the stochastic potential $\gamma(T)$ is the minimum resistance over all trees rooted at state $T$ [30]. This conclusion is drawn by showing that the upper and lower bounds on $\gamma(T)$ coincide.

Then, we apply the criterion for determining the stochastically stable states introduced in [30]: the stochastically stable states are precisely those contained in the recurrence classes with minimum stochastic potential.

Finally, by way of contradiction, it can be shown that only the recurrence classes (i.e., the singletons) in $\mathcal{C}_0$ are candidates for stochastically stable states. Hence, the main conclusion of this proposition follows directly, since

$$\arg\min_{T\in\mathcal{C}_0} \gamma(T) = \arg\min_{T\in\mathcal{C}_0} \Big[ w(|\mathcal{C}_0| - 1) + \sum_{n\in\mathcal{N}} (1 - u_n) \Big] = \arg\max_{T\in\mathcal{C}_0} \sum_{n\in\mathcal{N}} U_n\big((s_n)_{n\in\mathcal{N}}\big). \tag{25}$$
Step S3:
If the socially optimal solution is unique, then there is only one stochastically stable state of the perturbed Markov process $MP_\varepsilon$. According to Proposition 2, if the stochastically stable state $T \in \mathcal{Z}$ is unique, then we have

$$\lim_{\varepsilon\to 0} \pi(T) = \lim_{\varepsilon\to 0} \Pr\big\{(s_n)_{n\in\mathcal{N}}, (\alpha_n = 1)_{n\in\mathcal{N}}\big\} = \lim_{\varepsilon\to 0} \prod_{n\in\mathcal{N}} \Pr(s_n, \alpha_n = 1) = \lim_{T\to\infty} \prod_{n\in\mathcal{N}} \frac{t(s_n, \alpha_n = 1)}{T} = 1, \tag{26}$$

where $t(s_n, \alpha_n = 1)$ is the number of occurrences of the corresponding state during a period of $T$ iterations. In Algorithm 2, each player chooses the most frequently recorded strategy that makes its mood equal to 1 (as shown in (20)). Therefore, the unique efficient strategy profile is achieved by the proposed distributed approach.

This completes the proof. ∎
REFERENCES
[1] J. Huang, R. Berry, and M. Honig, “Distributed interference compensation
for wireless networks,” IEEE J. Sel. Areas Commun., vol. 24, no. 5,
pp. 1074–1084, May 2006.
[2] L. Rose, S. Lasaulce, S. Perlaza, and M. Debbah, “Learning equilibria with partial information in decentralized wireless networks,” IEEE Commun. Mag., vol. 49, no. 8, pp. 136–142, Aug. 2011.
[3] Q. Wu et al., “Distributed channel selection in time-varying radio environment: Interference mitigation game with uncoupled stochastic learning,” IEEE Trans. Veh. Technol., vol. 62, no. 9, pp. 4524–4538, Nov. 2013.
[4] L. Rose, S. Perlaza, C. Le Martret, and M. Debbah, “Self-organization in
decentralized networks: A trial and error learning approach,” IEEE Trans.
Wireless Commun., vol. 13, no. 1, pp. 268–279, Jan. 2014.
[5] W. Kiess and M. Mauve, “A survey on real-world implementations of
mobile ad-hoc networks,” Ad Hoc Netw., vol. 5, no. 3, pp. 324–339,
Apr. 2007.
[6] O. Aliu, A. Imran, M. Imran, and B. Evans, “A survey of self organisation
in future cellular networks,” IEEE Commun. Surveys Tuts., vol. 15, no. 1,
pp. 336–361, 2013.
[7] M. Peng, D. Liang, Y. Wei, J. Li, and H.-H. Chen, “Self-configuration
and self-optimization in LTE-advanced heterogeneous networks,” IEEE
Commun. Mag., vol. 51, no. 5, pp. 36–45, May 2013.
[8] A. MacKenzie and S. Wicker, “Game theory and the design of self-
configuring, adaptive wireless networks,” IEEE Commun. Mag., vol. 39,
no. 11, pp. 126–131, Nov. 2001.
[9] F. Wang, M. Krunz, and S. Cui, “Price-based spectrum management in cognitive radio networks,” IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 74–87, Feb. 2008.
[10] A. Zappone, Z. Chong, E. Jorswieck, and S. Buzzi, “Energy-aware competitive power control in relay-assisted interference wireless networks,” IEEE Trans. Wireless Commun., vol. 12, no. 4, pp. 1860–1871, Apr. 2013.
[11] G. Bacci, E. V. Belmega, P. Mertikopoulos, and L. Sanguinetti, “Energy-
aware competitive link adaptation in small-cell networks,” in Proc.
Int. Workshop Resource Allocation Wireless Netw., Hammamet, Tunisia,
May 2014, pp. 1–8.
[12] M. Bennis, S. Perlaza, P. Blasco, Z. Han, and H. Poor, “Self-organization
in small cell networks: A reinforcement learning approach,” IEEE Trans.
Wireless Commun., vol. 12, no. 7, pp. 3202–3212, Jul. 2013.
[13] P. S. Sastry, V. V. Phansalkar, and M. Thathachar, “Decentralized learning of Nash equilibria in multi-person stochastic games with incomplete information,” IEEE Trans. Syst., Man, Cybern., vol. 24, no. 5, pp. 769–777, May 1994.
[14] Z. Han, C. Pandana, and K. Liu, “Distributive opportunistic spectrum access for cognitive radio using correlated equilibrium and no-regret learning,” in Proc. IEEE WCNC, Kowloon, Hong Kong, 2007, pp. 11–15.
[15] M. Xiao, N. Shroff, and E. K. P. Chong, “A utility-based power-control
scheme in wireless cellular systems,” IEEE/ACM Trans. Netw., vol. 11,
no. 2, pp. 210–221, Apr. 2003.
[16] J. Zhang and Q. Zhang, “Stackelberg game for utility-based cooperative cognitive radio networks,” in Proc. 10th ACM Int. Symp. Mobile Ad Hoc Netw. Comput., 2009, pp. 23–32.
[17] H. Lin, M. Chatterjee, S. Das, and K. Basu, “ARC: An integrated admission and rate control framework for competitive wireless CDMA data networks using noncooperative games,” IEEE Trans. Mobile Comput., vol. 4, no. 3, pp. 243–258, May/Jun. 2005.
[18] D. T. Ngo, L. B. Le, T. Le-Ngoc, E. Hossain, and D. I. Kim, “Distributed
interference management in two-tier CDMA femtocell networks,” IEEE
Trans. Wireless Commun., vol. 11, no. 3, pp. 979–989, Mar. 2012.
[19] Q. D. La, Y. Chew, and B.-H. Soong, “An interference-minimization
potential game for OFDMA-based distributed spectrum sharing systems,”
IEEE Trans. Veh. Technol., vol. 60, no. 7, pp. 3374–3385, Sep. 2011.
[20] Q. D. La, Y. Chew, and B.-H. Soong, “Performance analysis of down-
link multi-cell OFDMA systems based on potential game,” IEEE Trans.
Wireless Commun., vol. 11, no. 9, pp. 3358–3367, Sep. 2012.
[21] S. Buzzi, G. Colavolpe, D. Saturnino, and A. Zappone, “Potential games for energy-efficient power control and subcarrier allocation in uplink multicell OFDMA systems,” IEEE J. Sel. Topics Signal Process., vol. 6, no. 2, pp. 89–103, Apr. 2012.
[22] C. Xu, M. Sheng, C. Yang, X. Wang, and L. Wang, “Pricing-based multiresource allocation in OFDMA cognitive radio networks: An energy efficiency perspective,” IEEE Trans. Veh. Technol., vol. 63, no. 5, pp. 2336–2348, Jun. 2014.
[23] D. Monderer and L. Shapley, “Potential games,” Games Econ. Behavior,
vol. 14, no. 1, pp. 124–143, May 1996.
[24] R. Cominetti, E. Melo, and S. Sorin, “A payoff-based learning procedure
and its application to traffic games,” Games Econ. Behavior, vol. 70, no. 1,
pp. 71–83, Sep. 2010.
[25] R. B. Myerson, Game Theory: Analysis of Conflict. Cambridge, MA,
USA: Harvard Univ. Press, 2013.
[26] J. R. Marden, L. Y. Pao, and H. P. Young, “Achieving Pareto optimality
through distributed learning,” Dept. Econ., Univ. Oxford, Oxford, U.K.,
Jul. 2011, Tech. Rep.
[27] D. Tse and P. Viswanath, Fundamentals of Wireless Communication.
Cambridge, U.K.: Cambridge Univ. Press, 2005.
[28] A. Goldsmith, Wireless Communications. Cambridge, U.K.: Cambridge
Univ. Press, 2004.
[29] B. Babadi and V. Tarokh, “GADIA: A greedy asynchronous distributed interference avoidance algorithm,” IEEE Trans. Inf. Theory, vol. 56, no. 12, pp. 6228–6252, Dec. 2010.
[30] H. P. Young, “The evolution of conventions,” Econometrica, vol. 61, no. 1,
pp. 57–84, Jan. 1993.
Min Sheng (M’03) received the M.S. and Ph.D.
degrees in communication and information systems
from Xidian University, Shaanxi, China, in 1997 and
2000, respectively.
She is currently a Full Professor at the Broadband Wireless Communications Laboratory, School of Telecommunications Engineering, Xidian University. Her general research interests include mobile ad hoc networks, wireless sensor networks, wireless mesh networks, third-generation (3G)/fourth-generation (4G) mobile communication systems, dynamic radio resource management (RRM) for integrated services, cross-layer algorithm design and performance evaluation, cognitive radio and networks, cooperative communications, and medium access control (MAC) protocols. She has published two books and over 50 papers in refereed journals and conference proceedings.
Dr. Sheng was selected for the New Century Excellent Talents in University program by the Ministry of Education of China and received the Young Teachers Award from the Fok Ying-Tong Education Foundation, China, in 2008.
Chao Xu received the B.S. degree in electronic information engineering from Xidian University, Xi’an, China, in 2009, where he is currently working toward the Ph.D. degree in communication and information systems with the Institute of Information and Science, Broadband Wireless Communications Laboratory, School of Telecommunications Engineering.
From June to September 2014, he was a visiting student with the Singapore University of Technology and Design, Singapore, under the supervision of Prof. Tony Q. S. Quek. His research interests focus on dynamic radio resource management, cognitive radio and networks, energy-efficient transmission, distributed algorithm design, and the applications of game theory and learning theory in wireless communications.
Xijun Wang (M’12) received the B.S. degree with distinction in telecommunications engineering from Xidian University, Xi’an, Shaanxi, China, in 2005, and the Ph.D. degree in electronic engineering from Tsinghua University, Beijing, China, in January 2012.
Since 2012, he has been with the School of Telecommunications Engineering, Xidian University, where he is currently an Assistant Professor. His research interests include wireless communications, cognitive radios, and interference management.
Dr. Wang served as a Publicity Chair of IEEE/CIC ICCC 2013. He was a recipient of the 2005 “Outstanding Graduate of Shaanxi Province” Award, the Excellent Paper Award at the 6th International Student Conference on Advanced Science and Technology in 2011, and the Best Paper Award at IEEE/CIC ICCC 2013.
Yan Zhang (M’12) received the B.S. and Ph.D. degrees from Xidian University, Xi’an, China, in 2005 and 2010, respectively. He is currently an Associate Professor at Xidian University.
His research interests include cooperative cognitive networks, self-organizing networks, media access protocol design, energy-efficient transmission, and dynamic radio resource management (RRM) in heterogeneous networks.
Weijia Han (S’07–M’11) received the B.S. degree from Northwest University, China, the M.S. degree from Queen’s University Belfast, UK, and the Ph.D. degree from Xidian University, Xi’an, China. He is currently a Lecturer at Xidian University, Xi’an, China.
His research interests include sensing in cognitive radio networks, resource management and network optimization, and cognitive media access protocol and algorithm design.
Jiandong Li (SM’05) received the B.S., M.S., and Ph.D. degrees in communications and electronic systems from Xidian University, Xi’an, China, in 1982, 1985, and 1991, respectively.
In 1985, he joined Xidian University, where he has been a Professor since 1994 and the Vice President since 2012. His current research interests and projects consist of mobile communications, broadband wireless systems, ad hoc networks, cognitive and software radio, self-organizing networks, and game theory for wireless networks.
Dr. Li is a Senior Member of the China Institute of Electronics and a Fellow of the China Institute of Communication. He was a member of the PCN Specialist Group for the China 863 Communication High Technology Program between January 1993 and October 1994 and from 1999 to 2000. He is also a member of the Communication Specialist Group for the Ministry of Industry and Information.
... It is clear from the above equation that λ k R b k,m is a monotonic increasing function with respect to R b k,m , i.e., individual IoT tags will feel more satisfied when they have a higher rate .λ k of each IoT tag i is scaled between 0 and 1, i.e., λ k R b k,m ∈ (0, 1) and β k = 10 [46]. Fig. 4 demonstrates the IoT tag data satisfaction rate versus the different number of tags for 20 PUs, and the rate threshold R th IoT = 10 Kbps. ...
Article
Full-text available
Ambient backscatter communication (ABC) is considered as a promising paradigm for meeting the 6G massive Internet of Things (IoT) requirements which is expected to revolutionize our world. In this paper, a new multimode matching game and machine learning-based IoT ambient backscatter communication scheme is proposed to maximize the ABC system rate and capacity over the LTE and Wi-Fi multi-RAT heterogeneous network, thereby supporting the 6G green massive IoT communication. The proposed algorithm is designed to support different rate and capacity requirements for different massive Machine Type Communication (mMTC) use cases such as sensor networks, smart grid, agriculture and low data rate Ultra Reliable Low Latency Communication (URLLC) use cases such as Tactile Interaction. The proposed optimization algorithm runs into two phases, the first one is a matching game-based algorithm that selects the optimum association between the IoT tags and the primary users (PU) downlink signals from a specific base station which maximizes the IoT tags rate while minimizing the resulting interference to the PU. Each IoT tag can ride the PU downlink signal using one of three different riding modes according to the required IoT ABC system rate and capacity, whereas mode 1 allows multiple IoT tags to ride the whole PU downlink signal resource blocks, in mode 2 each IoT tag can ride only one subcarrier from the PU downlink signal resource blocks, while in mode 3 multiple IoT tags can ride the same subcarrier from the PU downlink signal resource blocks. 
In addition, unmanned aerial vehicles (UAVs) flying HetNodes equipped with LTE and Wi-Fi receivers are used as backscatter receivers to receive the IoT tags uplink backscattered signals, so the second optimization phase is formulated to maximize the total sum rate of the ABC system by dividing its service area into clusters using the enhanced unsupervised k-means algorithm, also the enhanced k-means algorithm finds the optimum location of each cluster’s serving UAV flying HetNode that maximizes the channels gain between the IoT tags and the serving UAV flying HetNode in order to maximize the total system rate. The system model was implemented within the MATLAB environment where simulations across the various scenarios are conducted to assess the effectiveness of the proposed algorithm. Simulation results and the performance analysis demonstrated that the proposed algorithm can support the required rate for the most mMTC and low data rate URLLC IoT applications with average IoT tag rates in the range of 15 Kbps to 115 Kbps, and outperforms the algorithm-free riding technique in the case of massive IoT applications. The proposed mode 2 (first enhanced mode) achieves the best performance in terms of the average IoT tags rate and the total system rate with the lowest interference to the primary system users, on the other hand, mode 3 (second enhanced mode) improves the system capacity with maximum IoT tags satisfaction ratio. The capacity and satisfaction ratio of the proposed mode 3 outperforms mode 1 by 300% and 138% respectively, and outperforms mode 2 by 2,000% and 420% respectively. The proposed algorithm reduces the interference power to the PUs on the average by 1:(15.69×10 <sup xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">-12</sup> ) relative to the algorithm-free riding technique. 
From the results, we conclude that the proposed algorithm supports different IoT applications and achieves the required data rates with minimal effect on the primary system, keeping the PUs' data rates within the required range compared to the algorithm-free riding technique, at the cost of higher time complexity.
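The paper's enhanced k-means for UAV placement is not specified in the abstract; a minimal sketch of plain k-means over 2-D tag positions, with the resulting centroids taken as candidate UAV HetNode hover positions, is given below. Treating the centroid as the rate-maximizing position is an assumption that holds only under a simple distance-based path-loss model, and the naive "first k points" initialization is illustrative.

```python
def kmeans(points, k, iters=50):
    """Plain k-means: cluster IoT tag positions and return the centroids,
    used here as candidate UAV hovering positions (a centroid minimizes
    the mean squared tag-to-UAV distance, a proxy for average channel
    gain under a distance-based path-loss assumption).
    Naive init: the first k points serve as initial centroids."""
    centroids = list(points[:k])
    for _ in range(iters):
        # Assign each tag to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k),
                    key=lambda c: (p[0] - centroids[c][0]) ** 2
                                + (p[1] - centroids[c][1]) ** 2)
            clusters[i].append(p)
        # Move each centroid to the mean of its cluster.
        centroids = [
            (sum(x for x, _ in cl) / len(cl), sum(y for _, y in cl) / len(cl))
            if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids

# Two well-separated tag groups yield one UAV position per group.
tags = [(0, 0), (10, 10), (1, 0), (0, 1), (11, 10), (10, 11)]
uav_positions = kmeans(tags, 2)
```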
... In addition to the above, the SR is formulated with a sigmoid function to estimate the user's satisfaction [53]. The rate satisfaction metric is presented in the following equation: ...
Article
Full-text available
This article presents the role of a Reconfigurable Intelligent Surface (RIS) using the Multi-Input Multi-Output (MIMO) technique as an enabling technology to boost the achievable data rate for sixth-generation (6G) mobile networks. The RIS is adopted to mitigate interference at the Cell Edge User (CEU) located where two adjacent cells meet, by reflecting the incident interference signal toward the CEU out of phase with the interference arriving directly from the interfering BS. This article adopts an efficient solution for designing the RIS redirecting (reflection) matrix with trivial training overhead using Deep Learning (DL) technology: a few of the reflecting elements in the RIS are chosen to be active (attached to the baseband) while the majority are passive, and the active elements' channels are known and used as medium indicators that further indicate the positions of the transmitter and receiver. The article illustrates the efficacy of the adopted DL framework by varying the training parameters and evaluating the achievable data rates, the Spectral Energy Efficiency (SEE), and the Satisfaction Rate (SR). As a result, the proposed model improves the achievable data rate by an average of nearly 97% over the reference model, and by an average of 115% over the baseline model that assumes no RIS is used. The DL method thus demonstrates that the proposed model is promising for enhancing 6G mobile networks.
... The value of satisfaction lies in the range [0, 1]. The utility function represents the user's preference in the network by mapping the achievable rate to the level of UE satisfaction [28]. Considering that dynamics in UE behavior, traffic demand per connection, and service elasticity result in a probabilistic output, we use a sigmoid function to express the satisfaction of users in terms of rate and delay. ...
Article
With the increasing threat of global warming due to high energy consumption of wireless network infrastructure, cell activation complements the capabilities of next-generation wireless technology. In this article, we propose an energy consumption optimization strategy based on deep reinforcement learning (DRL) and transfer learning (TL) techniques. We implement an adaptive reward to autonomously adjust parameters in a reward function to balance energy consumption and quality of service (QoS) requirement of users during the learning process. We further formulate a cell activation/deactivation problem as a Markov decision process and set up our proposed relational DRL model to meet the QoS requirements of users with a minimum number of active remote radio heads under a traffic model defined to simulate a real-world scenario. A weighted TL algorithm has been developed in DRL to validate sample data from a source task. Extensive simulations reveal that the proposed scheme based on the adaptive reward has better performance in balancing the QoS requirement of users and system energy consumption. Finally, based on our simulation results, we conclude that combining DRL with TL speeds up the learning process.
Article
The fifth generation (5G) mobile cellular network relies on network slicing (NS) to satisfy the diverse quality of service (QoS) requirements of various service providers operating on a standard shared infrastructure. However, the synchronization of radio access network (RAN) and core network (CN) slicing has not been well-studied as an interdependent resource allocation problem. This work proposes a novel slice-to-node access factor (SNAF)-based end-to-end (E2E) slice resource provisioning scheme and deep reinforcement learning (DRL)-based real-time resource allocation algorithm for E2E interdependent resource slicing and allocation, respectively, specifically for RAN and CN. To ensure effective resource slicing and allocation, we consider the versatile user equipment (UEs) QoS requirements on transmission delay and data rate. Notably, the SNAF-based scheme provides proper resource provisioning and traffic synchronization, while the DRL-based algorithm allocates radio resources based on affordable traffic and backhaul resources. Based on the 5G air interface, we conduct system-level simulations to evaluate the performance of our proposed methods from various perspectives. Simulation results confirm that our proposed SNAF and DRL-based interdependent E2E resource slicing and allocation techniques achieve better E2E traffic-resource synchronization, and improve the QoS satisfaction with minimal resource utilization compared to other existing benchmark schemes.
Article
The evolution of the Internet of Things (IoT) has been driving the explosive growth of deep neural network (DNN)-based applications and processing demands. Hence, edge computing has emerged as a potential solution to meet these processing requirements. However, emerging IoT applications have increasingly demanded to run multiple DNNs to extract multifaceted knowledge, requiring more computational resources and increasing response time. Consequently, edge nodes cannot act as a complete substitute for the previous cloud paradigm, owing to their relatively limited resources. To address this problem, we propose to incorporate nearby IoT devices when allocating resources to multiple DNN models. Furthermore, the optimization of resource allocation can be hindered by the heterogeneity of IoT devices, which affects the delay performance of DNN-based computing. In this context, we propose a DNN partition placement and resource allocation strategy that considers different processing powers, memory, and battery levels for heterogeneous IoT devices. We evaluate the performance of the proposed strategy through extensive simulations. Simulation results reveal that the proposed strategy outperforms other existing solutions in terms of end-to-end delay, service probability, and energy consumption. The proposed solution was further simulated in a Kubernetes testbed consisting of actual devices to assess its feasibility.
Article
With the full development of intelligent mobile communications, wireless mixed reality (MR) provides a more visually immersive experience and stronger interaction with environments than virtual reality (VR) and augmented reality (AR). However, the asymmetric characteristic of wireless MR traffic creates a huge challenge to current mobile networks. Dynamic time division duplex (D-TDD) is considered as a promising technology to improve wireless MR users’ quality of experience (QoE) due to its potentials and advantages in delivering asymmetric traffic. Therefore, in this paper, we propose a QoE-driven distributed multidimensional resource allocation (MRA) supplemented by inter-cell interference (ICI) mitigation scheme for wireless MR in multi-cell D-TDD systems. First, to improve QoE of MR users, we formulate the joint optimization of subframe configuration, channel assignment and computation offloading as a mixed-integer nonlinear programming problem. A novel fully-decentralized multi-agent deep Q-network (DQN) algorithm is developed to solve the problem. Then, to mitigate ICI, a water filling based power control algorithm is investigated to minimize the total power of each small base station and its associated MR users. Simulation results demonstrate that our proposed scheme improves QoE of MR users in a realizable way as compared to existing schemes.
Article
This paper considers the problem of fully distributed channel allocation in clustered wireless networks when the propagation medium is random. We extend the existing Trial and Error (TE) framework, developed for the deterministic case, for which strong convergence properties hold. We prove that using this solution directly in the random context leads to unsatisfactory solutions. We then propose an adaptation of the original Trial and Error Learning (TEL) algorithm, called Robust TEL (RTEL), assuming that the random channel effects translate into a bounded stochastic disturbance of the utility function. The solution consists in introducing thresholds in the transitions of the TEL's Finite State Controller (FSC). We prove that this new solution restores the good convergence property inherited from the TEL. Furthermore, we analyze the stochastic utilities in the Rayleigh fading case in order to verify the boundedness assumption. Finally, we develop an online algorithm that dynamically estimates the optimal threshold values to adapt to the instantaneous disturbance. Numerical results corroborate our theoretical claims.
Conference Paper
Full-text available
This work proposes a distributed power allocation scheme for maximizing the energy efficiency in the uplink of non-cooperative small-cell networks based on orthogonal frequency-division multiple-access technology. This is achieved by modeling user terminals as rational agents that engage in a non-cooperative game in which every terminal selects the power loading so as to maximize its own utility (the user's throughput per Watt of transmit power) while satisfying minimum rate constraints. In this framework, we prove the existence of a Debreu equilibrium (also known as generalized Nash equilibrium) and we characterize the structure of the corresponding power allocation profile using techniques drawn from fractional programming. To attain the equilibrium in a distributed fashion, we also propose a method based on an iterative water-filling best response process. Numerical simulations are then used to assess the convergence of the proposed algorithm and the performance of its end-state as a function of the system parameters.
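The iterative water-filling best response builds on the classical single-user water-filling step over parallel subchannels, which can be sketched as below. Here `inv_gains` are the per-subchannel noise-to-gain ratios, and the water level is found by bisection; this is a generic textbook sketch, not the paper's rate-constrained, energy-efficiency game formulation.

```python
def water_filling(inv_gains, p_total, tol=1e-9):
    """Single-user water-filling: allocate p_n = max(0, mu - inv_gains[n])
    on each subchannel, where the water level mu is found by bisection
    so that the allocations sum to the power budget p_total."""
    lo, hi = 0.0, max(inv_gains) + p_total  # mu is bracketed in [lo, hi]
    while hi - lo > tol:
        mu = (lo + hi) / 2
        used = sum(max(0.0, mu - ig) for ig in inv_gains)
        if used > p_total:
            hi = mu  # water level too high, spend less
        else:
            lo = mu  # budget not exhausted, raise the level
    mu = (lo + hi) / 2
    return [max(0.0, mu - ig) for ig in inv_gains]

# Better subchannels (lower inverse gain) receive more power.
powers = water_filling([1.0, 2.0], p_total=3.0)  # → [2.0, 1.0] (approx.)
```

In the game-theoretic setting of the paper, each terminal would repeat such a step as a best response to the interference produced by the others, iterating until the process settles.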
Article
Full-text available
In this paper, the problem of channel selection and power control is jointly analyzed in the context of multiple-channel clustered ad hoc networks, i.e., decentralized networks in which radio devices are arranged into groups (clusters) and each cluster is managed by a central controller (CC). This problem is modeled as a game in normal form in which the utility functions are designed so that some of the Nash equilibria (NE) coincide with the solutions to a global network optimization problem. In order to ensure that the network operates in the equilibria that are globally optimal, a learning algorithm based on the paradigm of trial and error learning is proposed. These results are presented in the most general form and can therefore also be seen as a framework for designing both games and learning algorithms with which decentralized networks can operate at globally optimal points using only locally available knowledge. The pertinence of the game design and the learning algorithm is highlighted using specific scenarios in decentralized clustered ad hoc networks. Numerical results confirm the relevance of using appropriate utility functions and trial and error learning for enhancing the performance of decentralized networks.
Article
Full-text available
This article surveys the literature of the last decade on the emerging field of self-organisation as applied to wireless cellular communication networks. Self-organisation has been extensively studied and applied in ad hoc networks, wireless sensor networks, and autonomic computer networks; in the context of wireless cellular networks, however, this is the first attempt to put the various efforts in perspective in the form of a tutorial/survey. We provide a comprehensive survey of the existing literature, projects, and standards in self-organising cellular networks. Additionally, we aim to present a clear understanding of this active research area, identifying a clear taxonomy and guidelines for the design of self-organising mechanisms. We compare the strengths and weaknesses of existing solutions and highlight the key research areas for further development. This paper serves as a guide and a starting point for anyone willing to delve into research on self-organisation in wireless cellular communication networks.
Article
The past decade has seen many advances in physical layer wireless communication theory and their implementation in wireless systems. This textbook takes a unified view of the fundamentals of wireless communication and explains the web of concepts underpinning these advances at a level accessible to an audience with a basic background in probability and digital communication. Topics covered include MIMO (multi-input, multi-output) communication, space-time coding, opportunistic communication, OFDM and CDMA. The concepts are illustrated using many examples from real wireless systems such as GSM, IS-95 (CDMA), IS-856 (1 x EV-DO), Flash OFDM and UWB (ultra-wideband). Particular emphasis is placed on the interplay between concepts and their implementation in real systems. An abundant supply of exercises and figures reinforce the material in the text. This book is intended for use on graduate courses in electrical and computer engineering and will also be of great interest to practising engineers.
Article
Both the orthogonal frequency-division multiple access (OFDMA) and cognitive radio (CR) technologies offer great flexibility and feasibility for future green wireless communications. In this paper, an energy-efficient multiresource-allocation scheme is proposed for OFDMA CR networks (CRNs) with multiple secondary transmitters (STs). To maximize the energy efficiency (EE) and guarantee the primary transmitter's (PT's) quality-of-service (QoS) requirement, a linear pricing technique is employed to handle both the inter-ST coupling (the spectrum competition among STs) and intra-ST coupling (the correlation between the available transmit power and assigned subchannels for each ST). Furthermore, a distributed algorithm is devised to harvest the multiresource and multiuser gains, and the multiresource allocation is transformed to 1-D pricing-factor profile searching. Simulation results demonstrate that the proposed strategy brings higher EE. Additionally, by adjusting the pricing factors, a different performance can be achieved by the STs with different priorities.
Conference Paper
We propose a simple payoff-based learning rule that is completely decentralized, and that leads to an efficient configuration of actions in any n-person game with generic payoffs. The algorithm requires no communication. Agents respond solely to changes in their own realized payoffs, which are affected by the actions of other agents in the system in ways that they do not necessarily understand. The method can be applied to the optimization of complex systems with many distributed components, such as the routing of information in networks and the design and control of wind farms.
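A heavily simplified sketch of a payoff-based rule in this spirit is given below. It is not the authors' exact algorithm (which involves mood states and carefully tuned experimentation rates): here an agent occasionally experiments with a random action and keeps it only if its realized payoff matches or beats its running benchmark, using no information about the other agents.

```python
import random

def payoff_based_step(agent, actions, payoff, eps, rng):
    """One step of a simplified payoff-based learning rule.
    `agent` is a dict with keys 'action' and 'benchmark'; `payoff`
    returns the agent's realized payoff for a played action (which,
    in a game, would also depend on the unobserved actions of others)."""
    if rng.random() < eps:
        trial = rng.choice(actions)   # experiment with a random action
    else:
        trial = agent['action']       # otherwise replay the current one
    u = payoff(trial)
    if u >= agent['benchmark']:       # keep only non-regressive changes
        agent['action'], agent['benchmark'] = trial, u
    return agent

# Illustrative single-agent run: the payoff peaks at action 3,
# and repeated experimentation locks the agent onto it.
rng = random.Random(0)
agent = {'action': 0, 'benchmark': float('-inf')}
for _ in range(300):
    payoff_based_step(agent, list(range(6)), lambda a: -abs(a - 3), 0.5, rng)
```

With many agents sharing a system-level payoff, the same local rule can steer the whole system toward efficient joint configurations, which is the phenomenon the paper analyzes.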
Article
Self-organizing network (SON) technology, which is able to minimize human intervention in networking processes, was proposed to reduce the operational costs for service providers in future wireless systems. As a cost-effective means to significantly enhance capacity, heterogeneous deployment has been defined in the 3GPP LTE-Advanced standard, where performance gains can be achieved by increasing node density with low-power nodes, such as pico, femto, and relay nodes. SON has great potential for application in future LTE-Advanced heterogeneous networks, also called HetNets. In this article, state-of-the-art research on self-configuring and self-optimizing HetNets is surveyed, and the corresponding SON architectures are introduced. In particular, we discuss the issues of automatic physical cell identifier assignment and radio resource configuration in HetNets based on self-configuring SONs. As for self-optimizing SONs, we address optimization strategies and algorithms for mobility management and energy saving in HetNets. At the end of the article, we show a testbed designed for evaluating SON technology, with which the performance gain of SON algorithms is demonstrated.
Article
In this paper, a decentralized and self-organizing mechanism for small cell networks (such as micro-, femto- and picocells) is proposed. In particular, an application to the case in which small cell networks aim to mitigate the interference caused to the macrocell network, while maximizing their own spectral efficiencies, is presented. The proposed mechanism is based on new notions of reinforcement learning (RL) through which small cells jointly estimate their time-average performance and optimize their probability distributions with which they judiciously choose their transmit configurations. Here, a minimum signal to interference plus noise ratio (SINR) is guaranteed at the macrocell user equipment (UE), while the small cells maximize their individual performances. The proposed RL procedure is fully distributed as every small cell base station requires only an observation of its instantaneous performance which can be obtained from its UE. Furthermore, it is shown that the proposed mechanism always converges to an epsilon Nash equilibrium when all small cells share the same interest. In addition, this mechanism is shown to possess better convergence properties and incur less overhead than existing techniques such as best response dynamics, fictitious play or classical RL. Finally, numerical results are given to validate the theoretical findings, highlighting the inherent tradeoffs facing small cells, namely exploration/exploitation, myopic/foresighted behavior and complete/incomplete information.