A Personalized QoE-Aware Handover Decision based
on Distributed Reinforcement Learning
Behrouz Shahgholi Ghahfarokhi1,*, Naser Movahhedinia2
{shahgholi, naserm}@eng.ui.ac.ir
1Department of Information Technology Engineering, University of Isfahan, Isfahan, Iran
2Department of Computer Engineering, University of Isfahan, Isfahan, Iran
*Corresponding Author (Email: shahgholi@eng.ui.ac.ir, Tel: +98-311-7934094)
Abstract
Recent developments in heterogeneous mobile networks and growing demands for a variety of real-time and multimedia applications have emphasized the necessity of more intelligent handover decisions. Addressing the context knowledge of mobile devices, users, applications, and networks is the subject of context-aware handoff decision, a recent effort toward this aim. However, user perception has not been adequately addressed in the area of context-aware handover decision making. Mobile users may judge the Quality of Service (QoS) differently depending on their environmental conditions and their personal and psychological characteristics. This reality is exploited in this paper to introduce a personalized, user-centric handoff decision method that decides on the time and target of handover based on User Perceived Quality (UPQ) feedbacks. UPQ degradations are mainly due to (1) exiting the coverage of the serving Point of Attachment (PoA) or (2) QoS degradation of the serving access network. Using the UPQ metric, the proposed method obviates the need to track rapidly varying network QoS parameters and avoids the complexity and overhead of gathering and managing other context information. Moreover, considering the underlying network and geographical map, the proposed method inherently exploits the trajectory information of mobile users for handover decision. UPQ degradation is due not only to the user's own behaviour, but also to the behaviours of other users. As such, the Multi-Agent Reinforcement Learning (MARL) paradigm has been adopted for target PoA selection. The decision algorithm is based on the WoLF-PHC learning method, where UPQ is used as a delayed reward for training. The proposed handoff decision has been implemented under the IEEE 802.21 framework using the NS2 network simulator. The results show better performance of the proposed method compared to conventional methods, assuming regular movement of mobile users.
Keywords
User Perceived Quality, Context-Aware Handover, QoE-Aware Handover, Distributed
Reinforcement Learning.
Abbreviations
AHD Adaptive Handover Decision
AHP Analytic Hierarchy Process
IS Information Server
MADM Multi Attribute Decision Making
MARL Multi Agent Reinforcement Learning
MCHO Mobile Controlled Handover
MICS Media Independent Command Service
MIES Media Independent Event Service
MIH Media Independent Handover
MIIS Media Independent Information Service
MIP Mobile IP
MN Mobile Node
MOS Mean Opinion Score
PHC Policy Hill Climbing
PoA Point of Attachment
PQE Perceived Quality Evaluator
PSNR Peak Signal to Noise Ratio
QoCE Quality of Customer Experience
QoE Quality of Experience
QoS Quality of Service
QoUE Quality of User Experience
RSS Received Signal Strength
SAW Simple Additive Weighting
SCM Spatial Conceptual Map
SG Stochastic Game
SSNR Segmental Signal to Noise Ratio
TLV Type-Length-Value
UPQ User Perceived Quality
WEA Way Elementary Areas
WoLF Win or Learn Fast
1 Introduction
The evolution of wireless mobile networks has
necessitated incorporating more intelligence in multi-
technology vertical handoff management. Entering the
ubiquitous computing era, mobile users need to be
Always Best Connected (ABC) to diverse access technologies anywhere and at any time. As such, handoff decision is an essential point of attention in next generation wireless mobile networks. Recently, context-awareness
has been employed in handover decision making. The
context-aware handoff can be defined as a handover
procedure which selects a target access node based not
only on the signal quality (as is done traditionally), but
also on a wide knowledge of the mobile side and the
network side information, in order to take an intelligent
and optimized decision [1]. However, the role of users in context-aware handovers has been limited to expressing their preferences and requirements as decision parameters. The telecom market is facing a migration
from network centricity towards user-centricity [2] and
this emphasizes that users should have greater control
over automatic handover decision to select the access
network with which they are most satisfied.
In recent communication networks, the Quality of
Service (QoS) approach is increasingly being replaced by the Quality of Experience (QoE) approach, which is
defined as the overall acceptability of the service as
perceived by the user [3]. QoE covers two main aspects,
namely Quality of User Experience (QoUE) and Quality
of Customer Experience (QoCE) [4]. In this study, we
focus on QoUE or User Perceived Quality (UPQ). UPQ is
not only related to network QoS factors, but also to the
user preferences and application requirements [5] [3], the
capabilities of mobile device [6] [3] [7], environmental
conditions (e.g. surrounding audio interference in
conversations or light conditions for videos) [8] [7] and
subjective factors such as the emotional and
psychological state of user [6] [3]. In this paper, UPQ has
been considered as a novel and noteworthy metric for handover decision that embodies a wide range of context knowledge.
The authors of [3] have emphasized the importance of QoE in next generation networks and have proposed a general framework for end-to-end QoE assurance, where QoE is considered as a metric for network and application management and adjustment. In [9], the authors have demonstrated the importance of defining the concept of UPQ and linking it to specific wireless data network parameters. In [10], UPQ has been considered beside the QoS parameters for adaptive configuration of protocols.
To the best of our knowledge, only a few works have employed the UPQ metric in the handover procedure. Reference [11] merely exploits UPQ degradations as a trigger for handover initiation, and [63] employs the minimum QoE of ongoing users in candidate networks as an indicator for target PoA selection, accepting the signalling overhead of broadcasting the minimum estimated QoE to Mobile Nodes (MNs).
The handover decision may be performed in a
centralized fashion or delegated to MNs. The former requires the mobile-side context information to be transferred to the network, while the latter requires the network-side context to be obtained by MNs. Herein,
gathering information of different access networks,
especially the QoS parameters which are rapidly
changing, is a complication for context-aware decision
methods. While some of the related works (e.g. [12],
[13], [11], [14]) have not addressed context gathering,
some others (e.g. [15], [16]) have presented complicated and inefficient mechanisms for it. The context transfer
overhead is the drawback of context-awareness with
respect to the concept of green wireless technology and
MNs' power restriction. Therefore, some recent studies (e.g. [43], [55]) have tried to reduce this
overhead although some signalling overhead is
unavoidable.
To indirectly exploit the context information related
to user satisfaction level and to avoid the overhead of
gathering the dynamic network context in context-aware
Mobile Controlled HandOvers (MCHO), a personalized
QoE-aware handover decision method has been
introduced in this paper. In the proposed method, MN is
responsible for handoff initiation and target Point of
Attachment (PoA) selection. The main contribution of
this paper is an adaptive target PoA selection that learns its strategy from UPQ feedbacks without significant signalling. The UPQ degradations may be due to the
behaviour of MN (e.g. moving out of the coverage of the
PoA) or due to the behaviours of other MNs in its society
(where their behaviours affect the QoS of the serving
PoA). In other words, a handoff decision performed by an
MN may affect the quality of others' perception. Hence,
the target selection mechanism should learn its skill like
an agent in a multi-agent environment regarding both the
behaviour of mobile user (mobility) and the behaviour of
other MNs (mobility and entrance/exit). In this way, a tight combination of mobility prediction and context-awareness is formed in target PoA selection.
The proposed target selection mechanism is based on
Multi-Agent Reinforcement Learning (MARL) concept
which considers the UPQ parameter for its training stage.
The intelligent handover decision should be such that the
shared resources of access networks are exploited
efficiently and without additional communications
between MNs. This problem needs a combination of
cooperation and competition between isolated agents
(decision makers). Accordingly, the WoLF-PHC [17] algorithm has been employed as an adaptive and convergent algorithm for learning the handover decision skill.
This paper exploits such an intelligent handover
decision mechanism in a handover management model
proposed under the Media Independent Handover (MIH) framework. The paper presents details of the algorithms performed upon reception of different MIH-originated handover triggers and also the UPQ degradation trigger.
The remainder of this paper is organized as follows. In the next section, the research background is presented. Section 3 reviews previous work on handover decision. Section 4 presents our proposed method, and section 5 demonstrates simulation results. Section 6 discusses the advantages and shortcomings of the proposed method, and finally the paper is concluded in section 7.
2 Research Background
In this section, after introducing the MIH framework,
the UPQ evaluation methods are described and in
subsection 2.3, a survey of MARL methods is presented.
2.1 Media Independent Handover
IEEE 802.21 [18] is a recent IEEE effort aiming to provide a general interface, called MIH, for handover and interoperability between heterogeneous networks. One of the main ideas behind IEEE
802.21 is to provide a common interface for managing
events emerged from different network devices and
dispatching control messages to them [19]. The standard
specifies the MIH Function that is responsible for this
generalization and provides three primary services as
below [19]:
- The MIES (Media Independent Event Service), which provides support for both local and remote link layer event notifications to the upper layers of an MN.
- The MICS (Media Independent Command Service), which is used to gather information about the status of connected links and to execute mobility and connectivity decisions at layer 2.
- The MIIS (Media Independent Information Service), which provides discovery and distribution of network information within a geographic area. The role of MIIS is to provide information about the available networks and PoAs through Information Elements (IEs) accessed from an Information Server (IS).
These services are provided to any mobility and
handoff management method in upper layers (namely
MIH Users). Note that the 802.21 standard neither specifies rules or policies for handover decision nor determines whether the handover has to be terminal- or network-initiated [20].
2.2 User Perceived Quality Evaluation
As justified in section 1, attending to the UPQ is valuable for improving handover performance in next generation wireless networks. One of the significant trends in QoE estimation over the last few years concerns multimedia traffic. Referring to the level of quality experienced by the user of multimedia services, QoE has been recognized as a key measure for multimedia transmission in traditional and recent telecommunication
standards such as ITU-T P.800 [21], ITU-T Rec. J.246
[22], G.1070 [23], and J.247 [24]. Moreover, many
efforts addressed in the literature have focused on the
estimation of QoE for audiovisual traffic. Two main
directions are being explored: objective and subjective
testing of perceived QoS. The methodology to capture the
subjective quality of perception for speech is based on
MOS (Mean Opinion Score) proposed by ITU-T in P.800
[21]. In P.910 [25], the main recommendations regarding
MOS methods have been described for video quality. To
determine MOS, a number of listeners/viewers rate the
quality of audio/video transmitted through a
communication system. Ranging from 1 (worst) to 5 (best), MOS is the arithmetic mean of all the individual scores [26]. Although the subjective
method is more realistic, the complexity and cost of the
required tests usually make it laborious [26].
In a sporadic form of subjective assessment, network users may individually report their UPQ for personalized use. However, direct UPQ feedback of
users adds interaction complexity to users and system.
Hence, estimating UPQ from the content and also the
behaviour of the user is preferred in some applications.
A number of standards and models have been
specified in the ITU-T for objective evaluation of video
and audio quality. Examples of traditional standards are P.861 [27] and P.862 [28] for evaluating voice quality, and J.144 [29] for video quality
assessment. Recently, several new standards have also
been defined within ITU-T for objective UPQ evaluation
including G.1070 [23], J.246 [22], and J.247 [24]. In [30]
a decision-tree bas ed method has been proposed to model
the dependency of network and application QoS to QoE
using subjective quality feedbacks. In [7], Bayesian
networks are employed for QoE modeling based on a
variety of context parameters. Authors of [31] have
proposed an exponential relation that estimates QoE from
QoS parameters. Reference [32] offers a combination of
objective and subjective measures to better manage the
complexity of QoE metric.
The objective UPQ assessment methods can be categorized into full-reference, no-reference, and reduced-reference evaluation. Full-reference methods (e.g. J.247
for video and P.862 for audio) compare the received
audiovisual stream against original stream and check for
differences. In contrast, no-reference or reference-free
methods (e.g. [33]) analyze the received stream without
any comparison. Between these two extremes are
reduced-reference methods (e.g. J.246 for video) that do
not consider the original stream, but require some
characteristics describing it. Although full-reference
methods have greater precision, their high processing requirements and their need for the original multimedia content prevent them from being used in real-time (in-service)
applications such as handover decision. Hence, full-
reference methods are only useful for pre-service quality
assessments and simulations. Instead, high performance
no-reference methods can be used for in-service
applications such as real deployment of the method
proposed in this paper.
As illustrated in [34], there are three classes of
objective voice quality evaluation metrics: the network
parameter based metrics, the psycho-acoustic metrics and
the elementary metrics. The network parameter based
metrics do not consider the voice signal, while the
psycho-acoustic metrics transform the voice signals to a
reduced representation to retain only the perceptually
significant aspects. In contrast, elementary objective metrics rely on low-complexity signal processing techniques to predict the subjective voice quality with respect to the original signal. The Segmental Signal to Noise Ratio (SSNR) is a simple and widely used elementary metric for objective evaluation of voice quality [34].
A similar classification can be imagined for objective video quality assessment, in which psycho-vision methods
try to recognize or predict impairments by analyzing the
inherent characteristics of video such as blockiness or
jerkiness. Some of these methods evaluate the quality in
the spatial domain after decompressing the video stream
while some others perform this evaluation faster in
compressed domain (e.g. [35]). In contrast, elementary
metrics require the original video to estimate the UPQ
using simple image processing techniques. The most
widespread elementary method for video quality
assessment is the calculation of Peak Signal to Noise
Ratio (PSNR) [36] that performs better than most of the
traditional objective methods [64]. In this paper, video traffic and the PSNR metric have been used to evaluate our proposed handover decision method, as described in section 5.
2.3 Multi-Agent Reinforcement Learning
Reinforcement learning is a well-known method for solving problems in which a single agent learns to choose optimal actions to achieve its goal. Here, the task of the system is to learn a target function π that maps the current state s to the optimal action a = π(s). The learner is only provided with a reward value after executing an action, and the goal of the system is to attain the optimal target function π* that maximizes the overall reward.
Q-Learning is one of the algorithms for learning π*
that is based on an evaluation function Q(s, a). The value
of Q(s, a) should be the maximum cumulative reward that
can be achieved starting from state s and performing
action a as the first action. In other words, the amount of
estimated reward obtained upon executing action a in
state s (namely Q(s,a)) depends not only on the
immediate reward from executing a, but also on the
rewards gained from later actions that will be possible
after executing a. Q is assumed to be an n by m matrix, where n is the number of states and m is the number of possible actions, and its elements must be maximized during the learning phase. There are several algorithms for updating the Q matrix; an update rule for non-deterministic environments is as follows [66]:

$Q_{t+1}(s,a) = (1-\beta)\,Q_t(s,a) + \beta\left(r + \gamma \max_{a'} Q_t(s',a')\right)$   (1)

where β is a small constant called the learning rate, r stands for the reward achieved after executing action a, s' is the new state after executing this action, and γ ($0 \le \gamma \le 1$) is the discount factor.
In Q-Learning, the strategy of the agent is to greedily select the action $a = \arg\max_{a'} Q(s,a')$ for the current state s. Therefore, $\max_{a'} Q_t(s,a')$ in relation (1) represents the reward that may be obtained assuming the greedy strategy. As this future reward is an expectation, it is added to the current reward r weighted by the discount factor. Since the environment is non-deterministic, a learning rate is also used so that the value of Q(s,a) is updated smoothly toward the new estimate rather than replaced directly.
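As an illustration of relation (1), the following minimal tabular Q-learner applies the non-deterministic update rule; the class and method names are illustrative, and states and actions are assumed to be hashable values.

from collections import defaultdict

class QLearner:
    """Tabular Q-Learning with the update of relation (1)."""

    def __init__(self, actions, beta=0.1, gamma=0.9):
        self.actions = list(actions)
        self.beta = beta      # learning rate
        self.gamma = gamma    # discount factor, 0 <= gamma <= 1
        self.q = defaultdict(float)  # Q(s, a), keyed by (state, action)

    def greedy_action(self, s):
        # Greedy strategy: a = argmax_a' Q(s, a').
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next):
        # Q(s,a) <- (1 - beta) Q(s,a) + beta (r + gamma max_a' Q(s',a')).
        best_future = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] = ((1 - self.beta) * self.q[(s, a)]
                          + self.beta * (r + self.gamma * best_future))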
The Stochastic Game (SG) framework can be introduced as a multi-agent, multi-state representation of the environment. A Stochastic Game is a tuple $(n, S, A_{1..n}, T, R_{1..n})$, where n is the number of players (agents), S is the set of states, $A_i$ is the set of actions available to player i (and $A = A_1 \times \dots \times A_n$ is the joint action space), $T: S \times A \times S \to [0,1]$ is the transition function, and $R_i: S \times A \to \mathbb{R}$ is the reward function of the ith agent [17]. A player's strategy may be a pure strategy, which selects actions deterministically, or a mixed strategy, which selects actions according to a probability distribution over all possible actions. In the mixed-strategy game considered in this paper, a stationary policy is to
be learnt as $\pi: S \times A \to [0,1]$, mapping states to a probability distribution over actions such that the player's discounted future rewards are maximized.
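As a small illustration of a mixed strategy, the sketch below samples an action from a probability distribution π(s, ·); the dictionary representation of the policy row is an assumption made for brevity.

import random

def sample_action(policy_row):
    """Draw an action from a mixed strategy pi(s, .) given as a
    {action: probability} mapping whose values sum to one."""
    r = random.random()
    cumulative = 0.0
    for action, probability in policy_row.items():
        cumulative += probability
        if r <= cumulative:
            return action
    return action  # guard against floating-point round-off

# A pure strategy is the degenerate case with probability 1 on one action.
print(sample_action({"PoA_1": 0.2, "PoA_2": 0.5, "PoA_3": 0.3}))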
MARL algorithms are learning algorithms used by agents to find an optimal strategy for SGs. Two
properties are desirable for an MARL algorithm. First, it must be rational, i.e., if the other players' policies converge to stationary policies, the learner converges to a policy that is a best response to the others' policies. Second, the learner must be convergent, i.e., it certainly converges to a stationary policy. If all players use a rational and convergent learning algorithm, then the players are guaranteed to converge to a Nash Equilibrium [17].
MARL algorithms can be classified along several
dimensions such as task type. The type of task targeted by
the learning algorithm leads to the classification of
MARL techniques into those addressing fully
cooperative, fully competitive, or mixed SGs [37]. In a
fully cooperative SG, the agents have the same reward
and the learning goal is to maximize the common reward.
In fully competitive or zero-sum SGs (two-player), one's
reward is always the negative of the other's. However, in
mixed games, there is no constraint on the reward of the
agents. The mixed SG model is appropriate for self-
interested agents [37]. Figure 1 shows this taxonomy and
sample MARL algorithms in each category for both static
and dynamic games. Static (or stateless) games are those SGs with S = ∅.
From another point of view, there are two forms of MARL: isolated learning and interactive learning.
In the isolated form of learning, each agent learns to
optimize its reinforcement from the environment without
any communication. In contrast, interactive agents
explicitly communicate to decide on individual and group
actions [38]. An example of an isolated learning algorithm is WoLF-PHC, while Team-Q, AWESOME, and Nash-Q are not communication-free.
Also, MARL algorithms can be classified into homogeneous and heterogeneous algorithms. In
homogeneous algorithms, all the agents should play the
game with the same learning method (self-play) while in
heterogeneous algorithms, some agents may have
different rational methods for action selection. Examples
of self-play algorithms are Team-Q and Nash-Q while
AWESOME and WoLF-PHC are not self-play [37]. In
the handover decision problem, using learning algorithms that are suitable for heterogeneous environments allows some of the MNs to use traditional decision methods. Assuming that those traditional methods make rational decisions, the distributed learning algorithm converges to
the optimal solution in such environments too. Based on these characteristics, the WoLF-PHC algorithm has been chosen for our proposed method, which is described further in section 4.
3 Related Works
Traditional handoff decision methods such as [39, 40, 41, 42, 44] mostly used link and trajectory information. However, the next generation of handover decision methods addressed more metrics in decision making. For
example, a multi-service vertical handoff decision
algorithm has been introduced in [45] to judge target
networks based on a wider variety of user and network
metrics including QoS parameters. MADM (Multi Attribute Decision Making) methods are well-known decision techniques that have been employed in many handover decision schemes. A comparison of some
MADM methods has been reported in [46] considering
bandwidth, delay, jitter, and Bit Error Rate (BER) as
decision parameters. Zhang has proposed a fuzzy MADM
based vertical handoff in [47]. On the same ground,
Hongyan et al. [48] and Chan et al. [49] have proposed
fuzzy based MADM processes to perform PoA and
interface selection based on the cost constraints and
application priorities specified by users.
Figure 1. A taxonomy of MARL algorithms based on task type [37]. [The figure shows a tree splitting MARL algorithms into fully cooperative (static: JAL, FMQ; dynamic: Team-Q, Distributed-Q, OAL), fully competitive (Minimax-Q), and mixed games (static: Fictitious Play, MetaStrategy, IGA, WoLF-IGA, GIGA, GIGA-WoLF, AWESOME, Hyper-Q; dynamic: single-agent RL, Nash-Q, CE-Q, Asymmetric-Q, NSCP, WoLF-PHC, PD-WoLF, EXORL).]
Recent handoff management methods are context-aware; that is, they look into a wider knowledge of the underlying context and its changes. The context
information is usually classified into network-side
information and mobile-side information. Depending on
location of handover decision (MN or access network),
part of this context must be transferred there. Gathering
context information is a difficulty for context-aware
handovers and adds complexity and signalling overhead
to MNs and access networks.
Authors of [50] have proposed a context-based
application handover that exploits user’s presence,
location, available network interfaces, network
availability, network priority, communication status,
terminal features, and installed applications. Tramcar [12]
is a cross-layer context-aware architecture that utilizes
price, power consumption, network conditions, user
preference, and network performance in an analytic
decision function. The decision function employs dynamic context information from accessible networks; however, the method of gathering this part of the context is not specified.
Authors of [57] have presented a context-aware
handoff mechanism for ubiquitous computing
environments that includes an MADM-based and a
Genetic Algorithm (GA) based target PoA selection
service. The MADM-based method aims at providing higher QoS performance, while the goal of the GA-based method is to reduce the number of handovers while still satisfying the requirements. The GA-based method takes the mobility of the user into account.
The handover procedure in [51] includes a handover
trigger method which is based on context changes (MN
exits or enters a cell; QoS degrades below acceptable
threshold, or the user requests a handover). It also proposes QoS-based target selection and the adaptation of communication streams. A Context Manager in that
method gathers, manages, and evaluates context
information. However, being aware of network context modifications (i.e. QoS parameters) imposes heavy signalling overhead due to the rapidly varying nature of network resources.
Ahmed et al. have proposed the architecture of a
context-aware mobile-initiated and controlled vertical
handover decision model [13]. The proposed access network evaluation and ranking is a five-stage process based on the Analytic Hierarchy Process (AHP). Similarly, the
authors of [52] have proposed a combined decision
method which uses Fuzzy Logic to decide about the
handover initiation and AHP to decide about the target
access network. Those papers have not clearly described
the context collection mechanism.
In [11] a policy-based handover management method
has been presented. The handover decision is performed in the backbone of the wireless network, and the authors have assumed a network context monitor to obtain access network information. However, details of gathering
network side context are missing. The proposed scheme
in [53] considers context of services and user intention, in
addition to network information. That paper has discussed the method of gathering application requirements and user intention; however, network context gathering has not been considered.
In [14], an autonomic handover manager has been proposed that is based on autonomic computing.
They have considered a context server in the backbone that collects the network information from context
repositories distributed in different access networks and
provides them to MNs. However, they have stated that:
"this is not currently realistic because it is difficult and
rather impossible to share the information among the
service providers [14]." Nonetheless, they have assumed
that such a context gathering mechanism will be possible
in 4G networks.
In [15], a general framework has been put forward for context-aware handover decision. In their architecture,
handover decision points are responsible for deciding
about handover destination while context collection
points collect, compile, and deliver the relevant context
information to the handover decision points. Reference
[16] presents an integrated approach for context
management based on active networking technology.
However, this method increases the complexity of MNs
and adds signalling overhead.
MIH framework provides static context of access
networks through its MIIS service. However, some of the
works in the literature have extended the MIH standard to
provide other context parameters such as dynamic
network context. An enhanced MIH framework has been introduced in [54] to gather more context information; however, this method is more complex compared to the standard MIH framework and imposes heavy signalling overhead on the network. Authors of [56] have exploited
extended MIH_MN_HO_Candidate_Query and
MIH_N2N_HO_Query_Resources MICS primitives
defined by IEEE 802.21 to ask the neighbouring PoAs
about their dynamic information.
Authors of [55] have proposed an extended context-
aware information server for MIH-based handover. That
paper has specifically focused on dynamic network
context gathering and an information update algorithm
has been presented for PoAs to send their context to IS.
The drawback of this method is that small variations of the resources (e.g. due to the unsteady nature of variable bit rate traffic) result in a huge number of update packets when the access networks' resources are near saturation. The authors have proposed a handoff-aware
network context gathering based on MIH framework in
[43] to reduce the signalling overhead of dynamic
network context gathering. In that method, PoAs update their resource information whenever an MN attaches to or detaches from them, so the signalling overhead is still considerable.
That paper also proposes the pre-fetching of dynamic
network context before handover decision starts.
The consideration of the user in previous context-aware methods is mostly limited to applying the user's requirements and preferences as thresholds and weights. Authors
of [2] have emphasized a user-centric approach for handover decision. However, the user preference vector
is the only user relevant context in their method which is
modified according to the situation of user and the class
of application. Reference [11] has considered UPQ as a
novel user relevant parameter just for handover detection.
In [63], estimated QoE of ongoing users in candidate
networks has been considered as an indicator to select the
best access network. In that method, PoAs estimate the
minimum QoE of users using pseudo-Subjective Quality
Assessment (PSQA) method and broadcast it to all users
within their range. This estimate is used beside price and
mobility metrics for target PoA selection. However, it implies heavy signalling overhead, and the estimated QoE is not user or application dependent.
From the above review, the following can be concluded:
1) Although the contribution of users to recent handover decision methods has increased, this contribution is mostly limited to expressing user preferences and requirements. UPQ has not received due attention in handover decision; in particular, the UPQ metric has never been utilized for target PoA selection.
2) Context gathering and management is a major difficulty of context-aware handovers, and this difficulty is even more challenging for dynamic network context. We will show that addressing the UPQ metric exempts the handover decision from being
aware of that part of the context and its gathering complexity.
3) The movement information of users has not usually been considered in recent context-aware methods. Although mobility prediction has been considered in some previous context-aware works (e.g. [57, 58]), using movement information beside other context parameters in the handoff decision remains a shortcoming of context-aware handovers. We will show that our proposed method inherently learns the trajectory of MNs beside other parts of the context for target PoA selection.
4 Proposed Handoff Decision
In this section, we introduce the proposed mobile
controlled personalized handover management model that
employs the UPQ metric for handover decision.
Since UPQ reflects the quality of attachment to a PoA only after handing over to it, it cannot be used for target PoA selection directly. One may imagine that the UPQ level of other users in neighbouring PoAs is a good metric for target PoA selection; however, accessing the UPQ level of other users is not practical due to its complexity and heavy signalling overhead. Therefore, reinforcement learning methods are a suitable solution for employing such a parameter in handover decision indirectly. The underlying network is a shared resource exploited by MNs (like a multi-agent system), so the quality of perception is related not only to the decisions of the MN itself, but also to the handovers of other MNs. The decision mechanism should learn its skills while being indirectly aware of the behaviour of other MNs. Hence, a suitable MARL algorithm has been utilized for target PoA selection to take this awareness into account inherently.
The proposed model includes an Adaptive Handoff Decision (AHD) module and a Perceived Quality Evaluation (PQE) module, as shown in Figure 2. The AHD module determines the appropriate time of handover (from events) and also the target PoA (using the Learner). Handoff triggers are constructed from two sources: link layer events and the UPQ degradation event. The MIH framework provides link events for handoff triggering. Likewise, the PQE module is responsible for providing the UPQ level to AHD and also for initiating handoff triggers whenever UPQ degrades below a satisfying level. Users can subjectively report the UPQ level and UPQ degradations to PQE; otherwise, PQE evaluates the UPQ level objectively and compares it to a defined threshold for handoff initiation triggers. AHD employs an MARL algorithm (the Learner component) to decide about the target PoA, where the UPQ metric is used as a reward/penalty to train it. AHD applies the result of the handover decision through MIH and also reports it to higher layer mobility management protocols (such as MIP).
In the remainder of this section, we introduce the employed learning algorithm and the state space representation in sections 4.1 and 4.2, respectively. Subsection 4.3 describes the proposed extensions to MIH, and finally, details of the event handlers are discussed in subsection 4.4.
Figure 2. The proposed model for personalized handover decision in mobile nodes. [The figure shows a block diagram of the MN: the link layer (UMTS/802.11/802.16/...) under the MIH Function (MIES, MIIS, MICS); the AHD module, containing the Event Handlers and the Learner; the PQE module, fed by the multimedia application and supplying the UPQ level/UPQ trigger to AHD; and the mobility management protocols (MIP, SIP, ...), which receive the decision result.]
4.1 Employed MARL Algorithm
The target PoA selection problem needs a combination of cooperation and competition between isolated MNs that change their location over time. Therefore, this
problem best matches the definition of dynamic mixed SGs. Also, regarding the communication overhead required to share actions, rewards, and other structures in interactive learning algorithms, isolated learning methods are the acceptable ones for handover decision.
As stated in [59], single-agent RL algorithms such as Q-Learning can be directly applied to mixed games. However, "non-stationarity of the problem invalidates most of the single-agent RL theoretical guarantees [37]". Considering the overview presented in section 2.3, the WoLF-PHC [17] algorithm is an isolated, heterogeneous learning algorithm for dynamic mixed SGs and has been chosen for our proposed handover decision model.
WoLF-PHC is an improvement on the PHC (Policy Hill Climbing) algorithm that adds the convergence property to PHC's innate rationality. PHC is an extension of Q-Learning that maintains the same Q matrix, with values updated as in single-agent Q-Learning. In addition to basic Q-Learning, PHC maintains a policy matrix π containing the probability of selecting each action in the current state. It performs hill-climbing search in the space of mixed policies [17] with respect to the Q values. The policy matrix is therefore updated according to relation (2), where δ ∈ (0, 1] is a learning rate and |Ai| is the number of possible actions for the ith agent.
$$\pi(s,a) \leftarrow \pi(s,a) + \Delta_{sa},\qquad
\Delta_{sa} =
\begin{cases}
-\delta_{sa} & \text{if } a \neq \arg\max_{a'} Q(s,a')\\
\sum_{a' \neq a} \delta_{sa'} & \text{otherwise}
\end{cases}
\qquad (2)$$

where $\delta_{sa} = \min\left(\pi(s,a),\ \frac{\delta}{|A_i|-1}\right)$.
In relation (2), the probability of selecting action a in state s is updated with respect to the learning rate. If, after execution of action a, the value of Q(s,a) is the maximum among the elements of the sth row of Q, this probability is increased to improve the chance of selecting this action in the future. The probability is diminished if the action has not caused Q(s,a) to be the maximum among the elements of the sth row. This is the idea of hill climbing in policy space, where the agents greedily increase the chance of those actions showing a better contribution to the cumulative reward.
PHC is rational, and the proof follows from that of Q-learning [17]. However, PHC is not always convergent and fails to reach the Nash Equilibrium in some cases [17].
The WoLF (Win or Learn Fast) principle aids PHC in convergence by giving other players more time to adapt to changes in our player's strategy that at first appear beneficial, while allowing the player to adapt more quickly to other players' strategy changes when they are harmful. Algorithm 1 shows WoLF-PHC in detail. In essence, WoLF makes agents learn quickly when they are losing and cautiously when they are winning. Winning or losing is determined from the average policy matrix (π̄), which is an approximation of the equilibrium policy [17]. WoLF-PHC therefore defines two learning rates, δl for losing and δw for winning, where δl > δw. This technique causes the agent to adapt more quickly when it is performing poorly, and to be careful when it is performing better than expected, since other agents may change their policies suddenly [17]. WoLF-PHC is still rational, since only the speed of learning changes compared to PHC. This rationality, along with the convergence property, guarantees attaining the Nash Equilibrium. Details of the proofs on how the WoLF idea guarantees convergence to the equilibrium are out of the scope of this paper; the reader may refer to [17], which also shows the feasibility of WoLF-PHC through some practical examples.
The consequences of the WoLF principle are beneficial for the handover problem, where MNs are eager to learn the best strategy quickly. This paper employs the WoLF-PHC algorithm for adaptive learning of target PoA selection in the proposed handover decision model. It must be noted that Algorithm 1 is not very time consuming, as it belongs to the class of linear time complexity algorithms. Therefore, using it in the proposed model does not impose a
heavy computational load on MNs. Using WoLF-PHC, MNs do not need to exchange any information, since each MN learns the others' policies from its directly observed reward and its average policy matrix. It should be noted that converging to the equilibrium is time consuming in stochastic games with mixed strategies. Although WoLF enhances the convergence speed by using variable learning rates, convergence speed is still a restriction that needs more research.
Algorithm 1: WoLF-PHC algorithm for agent i
0. Let β ∈ (0, 1] and δl > δw ∈ (0, 1] be learning rates.
1. For each s ∈ S and a ∈ Ai:
   Q(s, a) = 0; π(s, a) = 1/|Ai|; C(s) = 0
2. Repeat forever:
   a. Select action a according to mixed strategy π(s) and execute it.
   b. Obtain the relevant reward r.
   c. Observe the new state s'.
   d. Update Q according to relation (1).
   e. Update the estimate of the average policy π̄:
      C(s) ← C(s) + 1
      ∀a' ∈ Ai: π̄(s, a') ← π̄(s, a') + (π(s, a') − π̄(s, a')) / C(s)
   f. Update π(s, a) as in relation (2), where
      δ = δw if Σa' π(s, a') Q(s, a') > Σa' π̄(s, a') Q(s, a'), and δ = δl otherwise.
   g. s ← s'
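For concreteness, the following compact Python sketch mirrors Algorithm 1 for one agent; the class layout, the default hyper-parameter values, and the dictionary representation of the Q, π, and π̄ matrices are illustrative assumptions (at least two actions are assumed).

import random
from collections import defaultdict

class WoLFPHC:
    """A sketch of Algorithm 1 (WoLF-PHC) for a single agent."""

    def __init__(self, actions, beta=0.1, gamma=0.9, delta_w=0.01, delta_l=0.04):
        self.actions = list(actions)
        self.beta, self.gamma = beta, gamma            # relation (1)
        self.delta_w, self.delta_l = delta_w, delta_l  # delta_l > delta_w
        uniform = 1.0 / len(self.actions)
        self.q = defaultdict(float)                    # Q(s, a)
        self.pi = defaultdict(lambda: uniform)         # pi(s, a)
        self.avg_pi = defaultdict(lambda: uniform)     # average policy
        self.count = defaultdict(int)                  # C(s)

    def select(self, s):
        # Step 2.a: sample an action from the mixed strategy pi(s, .).
        return random.choices(self.actions,
                              weights=[self.pi[(s, a)] for a in self.actions])[0]

    def update(self, s, a, r, s_next):
        # Step 2.d: Q update of relation (1).
        best_future = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] = ((1 - self.beta) * self.q[(s, a)]
                          + self.beta * (r + self.gamma * best_future))
        # Step 2.e: update the average policy estimate.
        self.count[s] += 1
        for b in self.actions:
            self.avg_pi[(s, b)] += (self.pi[(s, b)] - self.avg_pi[(s, b)]) / self.count[s]
        # Step 2.f: WoLF rule, learn fast (delta_l) when losing.
        winning = (sum(self.pi[(s, b)] * self.q[(s, b)] for b in self.actions)
                   > sum(self.avg_pi[(s, b)] * self.q[(s, b)] for b in self.actions))
        delta = self.delta_w if winning else self.delta_l
        # Relation (2): hill-climb toward the greedy action.
        best = max(self.actions, key=lambda b: self.q[(s, b)])
        step = delta / (len(self.actions) - 1)
        for b in self.actions:
            if b != best:
                moved = min(self.pi[(s, b)], step)
                self.pi[(s, b)] -= moved
                self.pi[(s, best)] += moved

In the proposed model, the actions would be the candidate PoAs and r the delayed UPQ reward of relation (5) introduced in section 4.4.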
4.2 State Space Representation
Before using the MARL method, the state space in which the MN lies should be modeled. In the simplest
way, the state of the MN can be determined based on the
value of the following spatial and temporal context
parameters:
- Currently serving PoA address
- RSS level of currently serving PoA
- Time (Days of a week and hours of a day)
Serving PoA and its RSS level enable the learner to
approximately distinguish between different locations in
the environment. As RSS is a continuous parameter,
some thresholds have been employed for it (with respect
to PoA configuration) to segregate the state space.
Temporal information is also used to discriminate between different states that may arise from the time-varying context and the time-varying behaviour of mobile users.
When no positioning service is available to MNs, the serving PoA and RSS level are the best choices for determining the state of the MN. Assuming those parameters, each MN needs to know all of the PoAs and their RSS thresholds, which are provided through the MIH framework as discussed in the next subsection.
However, the global position of the MN can be utilized to discriminate different locations more accurately. This requires the mobile devices to be GPS-enabled (for outdoor applications) or sensors to detect the location of mobile devices (e.g. in ubiquitous network models).
location, the map of the environment is required. This
study adopts Spatial Conceptual Map (SCM) [60] model
which transforms the real map into an abstract view. The
SCM has already been used in some previous works for
mobility prediction [58] or path planning [57]. An SCM
contains representation of landmark objects, Oi (e.g.
buildings or rooms) and way areas, Wi. A way area is
partitioned into a set of Way Elementary Areas (WEA),
αi according to landmark objects and way crossings [58].
In this paper, we also partition the way areas according to
the coverage of PoAs such that the list of available PoAs
does not change during the movement of an MN within a WEA. A characterization function (Co) is associated with each landmark object and WEA, listing the accessible PoAs as shown below:
Co(Oi) = { list of PoAs accessible in Oi } (3)
Co(αi) = { list of PoAs accessible in αi } (4)
The list of accessible PoAs is obtained from the positions and coverages of PoAs accessed through MIIS. This characterization function is used to make the learner select the target PoA only from the PoAs accessible at the current location of the MN. Therefore, the following
parameters are proposed for state space representation if a
positioning mechanism is available:
- Currently serving PoA address
- Current location of MN (Oi or αi)
- Movement direction of MN (if is available)
- Time
With the aid of SCM model, the movement direction of
MNs can also be utilized for state space representation. It
is noteworthy that the movement direction of MNs is limited with respect to the orientation of WEAs in the SCM. For example, in an east-west WEA, the movement direction of an MN may be east or west, and two states are separable by movement direction.
One set of matrices ( ) is assumed per each user
and per each service. The first dimension of th ese
matrices is the number of possible states (N) and the
second dimension is the number of all PoAs in the
wireless domain (M). So, they are N by M matrices where
rows are representing states and columns are representing
PoAs as possible actions.
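As an illustration of this state space, the sketch below encodes the spatial and temporal parameters into a row index of those matrices and restricts the candidate actions through the characterization function Co; all names and the specific encoding are assumptions made for illustration.

def state_index(serving_poa, rss, weekday, hour, rss_thresholds):
    """Map (serving PoA, quantized RSS, day of week, hour of day) to a
    single row index. serving_poa is an integer PoA identifier and
    rss_thresholds is the per-PoA list from the RSS Thresholds List IE."""
    rss_band = sum(1 for t in rss_thresholds if rss > t)  # quantize RSS
    bands = len(rss_thresholds) + 1
    return ((serving_poa * bands + rss_band) * 7 + weekday) * 24 + hour

# With SCM positioning, the actions are limited by the characterization
# function of the current WEA or landmark object (relations (3)-(4)):
co = {"alpha_3": ["PoA_1", "PoA_2"], "O_7": ["PoA_2"]}
candidate_actions = co["alpha_3"]  # the learner selects only among these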
4.3 Extensions to MIH
Knowing all possible states of the environment requires complete knowledge of all PoAs in the wireless domain. To resolve this, a simple approach is to provide all accessible PoAs of the geographical domain in a newly defined IE container (ALL_PoAs_Report).
Hence, a new request message with a unique Type
identifier (Type_ALL_PoAs_Report) has been defined as
shown in Table 1. In response to that request, a reply
message is returned that includes all PoAs and some
information about them. The format of the response TLV
has been shown in Table 2. All of the IEs used in the proposed response message are ones defined in IEEE 802.21, except two that are proposed here. One of the newly defined IEs is the RSS Thresholds List, which includes the RSS thresholds used for separating different states as described in 4.2. The other one includes the neighbouring PoAs of each PoA. This container is requested from the IS once at the beginning to identify the wireless domain and is used for creating the learner matrices.
Table 1. All PoAs Request TLV
Type: Type_ALL_PoAs_Report
Length: Variable
Value: MN MAC Address
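A minimal sketch of how the request of Table 1 could be serialized as a Type-Length-Value triple is shown below; the numeric type code, the field widths, and the helper name are assumptions, since the paper only defines the logical format.

import struct

TYPE_ALL_POAS_REPORT = 0x70  # placeholder: the real Type identifier is
                             # whatever the extended standard would assign

def build_all_poas_request(mn_mac: bytes) -> bytes:
    """Pack the Table 1 TLV: a one-byte Type, a two-byte Length (network
    byte order), and the MN MAC address as the Value."""
    return struct.pack("!BH", TYPE_ALL_POAS_REPORT, len(mn_mac)) + mn_mac

request = build_all_poas_request(bytes.fromhex("00163e2a7b01"))

The response of Table 2 would be parsed analogously, one fixed set of fields per advertised PoA.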
4.4 Handover Decision Algorithm
This section describes the handover decision algorithm, which is based on the WoLF-PHC learning method. In the
proposed handover decision, target PoA selection
(Algorithm 1; step 2.a) is simultaneously performed
along with the training process (Algorithm 1; steps 2.b to
2.f). The training/selection process contains the following
steps:
- When a handoff initiation trigger reaches the AHD
(UPQ degradation or Link Going Down), it selects the
best neighbouring PoA regarding the current state of
MN. Neighbouring PoAs are obtained from the MIIS or the SCM representation.
- AHD stores the current state, the current value of
mean UPQ level, and the selected PoA.
- The handover to the selected PoA will be performed if
the relevant link is available (or is detected in MIH
Link Detected event or scan confirm).
Table 2. All PoAs Response TLV
Type: Type_ALL_PoAs_Report
Length: Variable
Value: Number of PoAs, followed by one entry per PoA:
PoA address | PoA coverage | PoA subnet information | PoA RSS thresholds list | PoA neighbours list | PoA position
- If the handover to the selected PoA is performed, AHD has to update its matrices with respect to the earlier state and the recently selected PoA. The difference between the current and the previous (stored) mean UPQ level is used as the training reward. However, this update cannot be performed immediately after handover execution; it is postponed so that the effect of the handover can be observed over the time between the current and the next handover decision.
- If the handover to the selected PoA is not performed and the UPQ level is degrading, the AHD must still punish the learner for this selection, which probably is not in the movement path of the MN and has not been detected.
It must be noticed that the quality of each action (target PoA selection) depends not only on the variation of the mean UPQ, but also on the length of time that the user has experienced degradations or improvements due to the recent action. To account for this, AHD multiplies the variation of the mean UPQ by the time between the current and the next handover decision to obtain the training reward:

r = (NOW − Time_d) × (UPQ_NOW − UPQ_old)   (5)

where Time_d is the time at which the current PoA was selected, UPQ_old is the mean UPQ level at decision time, UPQ_NOW is the current level of the mean UPQ, and NOW is the current time at which a new decision is underway.
The training and target selection processes are divided among several event handlers. For simplicity of presentation, we assume that there is only one flow in each MN at a time. One structure, called the selection structure, is defined in these event handlers per flow. The selection structure stores the last selected PoA, the state of the MN, the mean UPQ level when the selection was performed, and the time of selection. It also contains a member that indicates whether the handover to the selected PoA has been executed yet.
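The selection structure and the delayed reward of relation (5) can be sketched as follows; the field and function names are illustrative assumptions, not taken from the implementation.

from dataclasses import dataclass

@dataclass
class Selection:
    """Per-flow selection structure described above."""
    poa: str           # last selected PoA
    state: int         # MN state when the selection was made
    upq: float         # mean UPQ level at decision time (UPQ_old)
    time: float        # Time_d, when the PoA was selected
    performed: bool    # whether the handover was actually executed

def training_reward(sel: Selection, upq_now: float, now: float) -> float:
    # Relation (5): weight the UPQ change by how long it was experienced.
    return (now - sel.time) * (upq_now - sel.upq)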
Algorithm 2 shows the detailed procedure proposed for the UPQ degradation event, which is called UPQ_Trigger.
When a UPQ_trigger event arrives while the previous
selection has not yet been evaluated (selection age is less
than Sel_Threshold), no operation is performed. In fact,
UPQ_Trigger's event handler avoids consecutive target
selections and handovers based on Sel_Threshold
parameter. Otherwise, the algorithm updates the WoLF-
PHC matrices with respect to the previous decision
(stored in selection structure) and a new PoA is selected
according to WoLF-PHC policy. The algorithm stores the
time, the selected PoA, the current level of mean UPQ,
the current state, and the flow in selection structure, and
then tries to perform the handover if the selected PoA is now accessible through an interface. Otherwise, the algorithm requests MIH to perform a link scan on the relevant interface; the selected PoA remains in the selection structure in the hope of being detected later.
The MIH Link_Going_Down event handler is similar to that of UPQ_Trigger. Although a UPQ degradation trigger is usual for flows that are using a going-down link, sometimes no UPQ degradation is reported, depending on the transferred content (e.g. for a series of MPEG video frames that differ little from each other). Therefore, the MIH Link_Going_Down event needs to be handled as well.
The situation of the Link_Down event is more subtle. This event occurs when a link is no longer available. Since the handover decision is based on a learning algorithm, the handler of this event should not perform any handover (during the training stage). In fact, this event is produced after the UPQ_Trigger or Link_Going_Down events, where a decision about the handover has already been taken. If that decision is not effective, the learner should be punished to improve its future selection in this state. Hence, the only task in this event may be requesting a link scan on the disconnected interface. Of course, the Link_Down event may redirect the flows of the down interface to another interface if the training process is no longer in progress or has reached a steady state.
Algorithm 2: UPQ_Trigger (flow, UPQLevel, currentState)
if (NOW - selection.time > Sel_Threshold) {
    if (selection.performed == True or UPQLevel < selection.UPQ)
        Learner.update(NOW, UPQLevel, currentState, selection)
    newPoA = Learner.select(currentState, {neighbors})
    selection.PoA = newPoA
    selection.UPQ = UPQLevel
    selection.time = NOW
    selection.flow = flow
    selection.state = currentState
    selection.performed = False
    selection.addressPrefix = AddPrefix(newPoA)
    if (typeOf(newPoA) != typeOf(flow.currentPoA)) {
        for each interface in IFManager.interfaces
            if (interface.PoA == newPoA) {
                switch flow to interface
                complete the handover in upper layers
                selection.performed = True
                go to end
            }
        MIH.scanReq(relevantInterface(newPoA))
    } else {
        if (newPoA != flow.currentPoA)
            MIH.scanReq(relevantInterface(newPoA))
        else
            selection.performed = True
    }
}
end
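For orientation, the following compact Python sketch shows what a WoLF-PHC learner of the kind invoked in Algorithm 2 through Learner.update and Learner.select can look like, following the WoLF-PHC scheme of [17]. The class interface, hyper-parameter values, and the small epsilon exploration term are illustrative assumptions, not the authors' implementation.

    import random
    from collections import defaultdict

    class WoLFPHC:
        """Sketch of a WoLF-PHC learner over candidate target PoAs."""
        def __init__(self, actions, alpha=0.2, gamma=0.9,
                     delta_win=0.01, delta_lose=0.04, epsilon=0.05):
            self.actions = list(actions)              # candidate PoAs
            self.alpha, self.gamma = alpha, gamma     # Q-learning rates
            self.delta_win, self.delta_lose = delta_win, delta_lose
            self.epsilon = epsilon                    # exploration probability
            n = len(self.actions)
            self.q = defaultdict(lambda: dict.fromkeys(self.actions, 0.0))
            self.pi = defaultdict(lambda: dict.fromkeys(self.actions, 1.0 / n))
            self.avg_pi = defaultdict(lambda: dict.fromkeys(self.actions, 1.0 / n))
            self.visits = defaultdict(int)

        def select(self, state):
            """Sample a target PoA from the mixed policy pi(state, .)."""
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            r, acc = random.random(), 0.0
            for a in self.actions:
                acc += self.pi[state][a]
                if r <= acc:
                    return a
            return self.actions[-1]

        def update(self, state, action, reward, next_state):
            # 1) Q-learning update driven by the delayed UPQ reward of Eq. (5).
            best_next = max(self.q[next_state].values())
            self.q[state][action] += self.alpha * (
                reward + self.gamma * best_next - self.q[state][action])
            # 2) Update the average policy seen so far in this state.
            self.visits[state] += 1
            c = self.visits[state]
            for a in self.actions:
                self.avg_pi[state][a] += (self.pi[state][a] - self.avg_pi[state][a]) / c
            # 3) "Win or Learn Fast": learn slowly when winning, fast when losing.
            expected = sum(self.pi[state][a] * self.q[state][a] for a in self.actions)
            average = sum(self.avg_pi[state][a] * self.q[state][a] for a in self.actions)
            delta = self.delta_win if expected > average else self.delta_lose
            # 4) Hill-climb: move probability mass toward the greedy action.
            greedy = max(self.q[state], key=self.q[state].get)
            for a in self.actions:
                if a != greedy:
                    step = min(self.pi[state][a], delta / (len(self.actions) - 1))
                    self.pi[state][a] -= step
                    self.pi[state][greedy] += step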
Algorithm 3 shows the event handler for the Link_Detected event, which is emitted when an interface detects a new PoA. The algorithm configures the newly detected PoA if it has already been selected according to the learner's advice (stored in the selection structure); otherwise, the relevant interface is configured for probable future use (without considering the learner's advice), provided that the type of the interface differs from that of the selection and the interface is not currently being used by any data flow. It is also evident from the algorithm that we do not perform target selection in this event. This is because the goal of the proposed method is solely to improve the quality of perception, so the MN only considers a new PoA when its flows are endangered (in Link_Going_Down or UPQ_Trigger), in order to avoid unnecessary handovers. Handing over whenever a new PoA with better characteristics is detected may sometimes lead to quality degradations. A similar algorithm is performed for the PoAs detected in the scan confirm event.
Algorithm 3: Link_Detected (detectedPoA, interface)
if (selection.performed == False and selection.PoA == detectedPoA)
    configure interface to connect through detectedPoA
else {
    if ((typeOf(detectedPoA) == typeOf(selection.PoA)) or flow.interface == interface)
        go to end
    configure interface to connect through detectedPoA
}
end
Link_Up is another event; it occurs when a new link is configured with a PoA in the Link_Detected or Scan_Confirm event handlers. Regarding the Link_Up event handler (Algorithm 4), if the new PoA is the anticipated one stored in the selection structure, the MN's interface must be configured with a new IP address. As discussed in section 4.3, MNs obtain the address prefixes of PoAs from the IS server (Table 2), so the interface can be configured without waiting for the MIP access router discovery process. Moreover, the binding update must be accomplished for the data flow, and the performed field of the selection structure should be set once the handover is performed.
In addition to these events, the Link_Parameter_Report event is used to obtain the RSS level, which is used for state determination if the first method of state space representation has been chosen.
Algorithm 4: Link_Up (linkPoA, interface)
if (selection.performed == False and selection.PoA == linkPoA)
    - configure the interface IP address using selection.addressPrefix
    - hand selection.flow over to interface (if it is not already using it)
    - complete the MIP handover procedure
    - selection.performed = True
else if (typeOf(selection.PoA) != typeOf(linkPoA))
    - configure the IP address of interface using the MIP procedure
end
5 Simulations
To evaluate the feasibility of the proposed handoff decision method, this section presents the results of simulations performed in the NS-2.28 network simulator using the NIST mobility package [61]. The NIST mobility package provides layer-2 handover for the traditional NS-2 implementations of WiFi and WiMAX, in addition to a basic implementation of the MIH framework (event and command services). We have extended the NIST MIH to support MIIS. In the implemented IS server, we have provided a table that stores a container for all PoAs in the simulation environment and their relevant context information.
Video traffic has been chosen for our evaluation. The transferred video flows are real video streams encoded in MPEG format. To evaluate UPQ in simulation environments, the feasible approach is objective evaluation, and the PSNR method is chosen to estimate the UPQ level of the received video stream. Calculating PSNR requires the original video, which is conveniently accessible in simulations. In real implementations, however, reference-free methods (or subjective evaluation) should be used.
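For reference, for 8-bit video the per-frame PSNR between an original M×N frame F and the received frame F′ follows the standard definition:

MSE = (1 / (M × N)) × Σ_{i=1..M} Σ_{j=1..N} [F(i,j) − F′(i,j)]²
PSNR = 10 × log10(255² / MSE)  dB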
The mechanism chosen for evaluating real video quality over the simulated network is similar to the method proposed in [36]. A log file is constructed from the real transmission of an MPEG video file and used as a sender trace file (in NS-2) to generate the corresponding simulated traffic at the transmitting node. Once the receiver gets the video packets from the simulated network, a receiver trace file is generated that describes the time and status of the received packets. The receiver trace file is employed along with the sender trace file and the original video stream to continuously reconstruct the erroneous received video stream. Thus, the original and the received video streams are available for frame-by-frame PSNR calculations.
Evaluation of PSNR should be performed continuously during the simulation. In our case, the evaluation is performed every N received video frames (or the commensurate time step) to rapidly report any UPQ degradation. We do not calculate the PSNR of the entire received video stream at each time step. Instead, a moving average window with a length of W frames is used: the PSNR values of the last W received video frames are averaged and reported to the AHD at each evaluation step.
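A minimal sketch of this evaluation loop follows; the class and callback names are illustrative placeholders, and the default values of N and W are only examples (Section 5.1 uses N = 3 and W = 6).

    from collections import deque

    class UPQMonitor:
        """Report the W-frame moving-average PSNR to the AHD every N frames."""
        def __init__(self, n=3, w=6, report_to_ahd=print):
            self.n = n                       # evaluation period, in frames
            self.window = deque(maxlen=w)    # last W per-frame PSNR values
            self.since_report = 0
            self.report = report_to_ahd      # callback into the AHD

        def on_frame(self, frame_psnr):
            self.window.append(frame_psnr)
            self.since_report += 1
            if self.since_report >= self.n:
                self.since_report = 0
                self.report(sum(self.window) / len(self.window))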
In the remainder of this section, the results of two different simulation scenarios are presented. The next subsection gives the results of a typical simulation scenario that compares the proposed method to traditional and context-aware ones. Subsection 5.2 presents the results of simulating a more complex scenario using the SCM model, to evaluate the feasibility of adapting the proposed method to real environments.
5.1 Simulation of a Typical Scenario
The first simulation environment is assumed to be covered by two WiMAX base stations and four WiFi access points, as shown in Figure 3. These PoAs are connected to the router R1 via 100 Mbps trunks. The coverage radius of the WiFi access points is about 50 meters, while the coverage radius of the WiMAX stations is about 500 meters. The physical-layer propagation model is the Two-Ray Ground model. The WiFi access points operate at a data rate of 11 Mbps, and the WiMAX nodes are based on IEEE 802.16e. Three mobile nodes have been assumed in our simulations, with MIPv6 as their layer-3 mobility management protocol. These MNs are multi-interface nodes supporting both WiFi and WiMAX technologies. In addition, three fixed Crowding Nodes (CWNs) have been placed in the coverage areas of AP1, AP2, and AP3.
The MNs have different movement patterns with a fixed speed of 20 m/s. MN2 starts from position (470, 980), moves to (600, 1050), and then moves to position (1400, 1000). MN1 starts from (480, 1000) and moves to
(650, 950), and then to (1200, 1000). MN3 starts from
position (1100, 1000) and moves to position (450, 1000).
In this simulation, the first proposed method for state space modelling (section 4.2) is chosen, and one RSS threshold is used to separate the different states around PoAs. The RSS threshold is -61 dB for WiFi APs and -80 dB for WiMAX BSs, with respect to the adjusted transmission power.
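An illustrative mapping from an RSS reading to a discrete state under this first representation is sketched below; the exact state encoding of section 4.2 may differ, and the function and constant names are assumptions.

    RSS_THRESHOLD_DB = {"wifi": -61.0, "wimax": -80.0}  # thresholds of this scenario

    def rss_state(poa_id, poa_type, rss_db):
        """Map an RSS reading to a coarse state around a PoA: one threshold
        splits the area around each PoA into a near and a far region."""
        region = "near" if rss_db >= RSS_THRESHOLD_DB[poa_type] else "far"
        return (poa_id, region)   # e.g. ("AP0", "near")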
Three video flows have been considered in this simulation, where MN1, MN2, and MN3 are the destinations of three MPEG video streams (in QCIF frame size) with a maximum packet size of 1000 bytes and a frame rate of 30 fps, carried over UDP connections. These video flows originate from the Corresponding Node (CN), and the maximum required bandwidth of each one is about 0.3 Mbps. In addition, a CBR flow is transferred from each Crowding Node (CWNi) to the CN at a rate of 6 Mbps to diminish the capacity of the WiFi access points at specific times, and hence the quality of the video streams transferred through those APs.
For instantaneous UPQ evaluation, N and W are set to 3 and 6, respectively. Considering the reference PSNR of the original video, the PSNR threshold for handover initiation is assumed to be 25 dB. This threshold has been determined experimentally to reflect an acceptable user satisfaction level. Also, Sel_Threshold in Algorithm 2 is set to 0.5 second.
To evaluate our proposed model, we first compare its performance with a traditional handover decision method implemented under the MIH framework (called simple handover). This simple method employs the Link Up, Link Down, Link Detected, and Link Going Down events and initiates the handover only on these layer-2 triggers. In this method, WiFi PoAs are preferred over WiMAX ones, and the target PoA is selected based on RSS quality. The signalling overhead of the proposed method is not considerable compared to this basic method, so the two methods are compared only in terms of service quality measures. The aim of this comparison is to evaluate the proposed method against another handover mechanism with similar signalling overhead, in order to show the capability of the proposed method after convergence. Secondly, the proposed method is compared to a usual context-aware handover (called conventional handover) in terms of both the provided quality and the signalling overhead, to show the expense of context gathering in context-aware methods.
Figure 3. First Simulation Environment [topology: CN sends Flow 1 (video, to MN2), Flow 2 (video, to MN1), and Flow 3 (video, to MN3) through router R1 and the IS server; BS0 (WiMAX, MAC=8) at (1000,1000); BS1 (WiMAX, MAC=10) at (1300,1000); AP0 (WiFi 11 Mbps, MAC=1) at (500,1000); AP1 (WiFi 11 Mbps, MAC=2) at (640,980); AP2 (WiFi 11 Mbps, MAC=3) at (650,1020); AP3 (WiFi 11 Mbps, MAC=4) at (750,1000); multi-interface MN1 at (480,1000), MN2 at (470,980), and MN3 at (1100,1000), each moving at 20 m/s; CWN1-CWN3 near AP1-AP3]
5.1.1 Comparing to Traditional Handoff
At first, the simulation has been carried out using the simple handover mechanism in the MNs, and the results have been gathered for about 30 seconds (about 900 frames of the video). Then, the proposed handover method has been employed and, after 6 training stages, the simulation results have been compared to those obtained from the simple method. Figure 4 shows that the overall PSNR levels of the video flows have been improved using our proposed method. For the proposed method, this metric is calculated from the 6th training stage.
Under simple handover, flow 1 and flow 2 experience major degradations when MN1 and MN2 select AP1 or AP2 upon leaving the coverage of AP0 (around frame 200 and subsequent ones). Choosing AP1 to carry flow 2, MN1 faces the poor quality of service provided by this PoA, although its signal strength is fine. On the other hand, as MN2 selects AP1 and then AP2, subsequent degradations occur for flow 1. A similar degradation also takes place around AP3 for the same reasons. MN1 and MN2 are also involved in a ping-pong effect between BS0 and BS1 during the rest of the simulation.
In contrast, the PSNR of the video flows improves under our proposed method. Both MN1 and MN2 have learnt to select BS0 when their quality of perception degrades during their connection to AP0, and to stay connected to BS0 thereafter, resulting in little PSNR degradation. Similarly, for flow 3, simple handover causes service degradations when MN3 enters the coverage area of AP3 (for frames 600 to 800). This degradation does not happen for the proposed method, since MN3 avoids handing over to this PoA.
Figure 4. Final PSNR level of video flows (Flow 1, Flow 2, and Flow 3; PSNR in dB versus frame number) for the proposed and simple handover decision methods
We have also compared the quality of the video flows in terms of frame jitter and frame loss as QoS metrics. Figures 5 to 7 compare the frame jitter of the video flows for the simple handover decision and the proposed method. These figures show that video frames experience larger delay variations under the simple handover decision, due to inefficient target selection and unnecessary handovers. Table 3 presents a frame loss comparison under the two methods. As shown in this table, the number of lost video frames is reduced remarkably once the proposed method has been trained for a sufficient number of iterations.
Figure 5. Jitter comparison of Flow 1 under simple and proposed decision methods
Figure 6. Jitter comparison of Flow 2 under simple and proposed decision methods
Figure 7. Jitter comparison of Flow 3 under simple and proposed decision methods
Table 3. Frame loss comparison of video flows under simple and proposed handover decisions (after 6 trials)

          Simple Handover   Proposed Handover
Flow 1    271               1
Flow 2    368               1
Flow 3    211               2
Another important metric for comparing handover methods, one directly related to the quality of the received video, is the number of handovers. Table 4 compares the number of handovers incurred by the two methods. A large number of unnecessary handovers degrades service quality in addition to diminishing utilization.
In conclusion, the combination of the three techniques, namely UPQ-based handover initiation, the proposed algorithms for trigger management, and the learning-based target selection, brings about a significant improvement in UPQ compared to the simple handover method.
Table 4. Comparing the number of completed handovers

       Simple Handover   Proposed Handover
MN1    10                1
MN2    7                 1
MN3    4                 2
The behaviour of the proposed method implies that its decision skill improves as the mobile node continues to learn the environment. Figure 8 shows the overall PSNR level for one of the video flows (flow 1), which improves as the learning procedure is reiterated, showing that the MARL mechanism gradually learns to select the best PoA.
Figure 8. PSNR level of Flow 1 video frames after each of three training runs (runs #1, #3, and #6)
The average PSNR levels of all the video flows have
been examined over six training runs in Figure 9. This
figure shows that the videos' mean PSNR level rises as the number of training stages increases. After six trials, the learning system converges to a steady-state value, and the UPQ calculations are needed less frequently, meaning that the computation load of the MNs can be reduced.
Figure 9. Mean PSNR level of video flows in
consecutive learning trials
5.1.2 Comparing to Context-Aware Handoff
In this subsection, the proposed method is compared to a conventional context-aware method in which network QoS parameters and user preferences are utilized as context parameters in an MADM-based decision maker. Only the QoS parameters are chosen as context parameters, to allow a fair comparison to the proposed method, which focuses on quality (by exploiting the PSNR metric). The chosen metrics are available bandwidth (abw), number of users (load), packet delay, and packet loss. As mentioned in section 2, MADM methods are a popular technique in context-aware and vertical handover decisions (e.g. [13,47,62]). Here, the Simple Additive Weighting (SAW) algorithm has been selected for MADM. SAW requires a preference vector that indicates the relative preference for each metric according to the user's intentions. A decision matrix is formed to show the capabilities of the candidate PoAs in terms of the decision parameters. The SAW algorithm employs the preference vector and the decision matrix to rank the candidate PoAs. The preference vector is given below and has been adjusted such that the MNs select the best target PoAs in this scenario:
P = [P_loss  P_abw  P_delay  P_load] = [0.22  0.36  0.28  0.14]    (6)
The aim of choosing this method for comparison is to show the effect of context access latency on user-perceived quality. It must be pointed out that all of the context-aware handover methods proposed in the literature suffer from context access latency. Therefore, the MADM-based method has been adjusted to make the best selection among the candidates, so as to isolate the effect of context access latency. One may think that pre-fetching dynamic context (as in recent works such as [43]) eliminates context access latency, but the freshness of the context data is not guaranteed at decision time, especially in crowded networks where network resources change rapidly.
The decision metrics are dynamic and need to be obtained repeatedly from the access networks. Gathering those network parameters is motivated by the method proposed in [56], which is based on the MIH framework. Herein, MNs should ask the IS for the neighbouring PoAs and their static context (after a successful handover) and then ask, through the currently serving PoA, for the dynamic context of each neighbouring PoA (whenever a handover decision is underway). The mentioned method performs target selection when Link Going Down, Link Down, or Link Detected events arrive. To have a fair comparison to the proposed learning-based target selection, the UPQ_Trigger event has also been considered as a handoff initiation source in the conventional method.
The simulation has been repeated for the above conventional method and the results compared to those obtained for the proposed method. Figure 10 shows the PSNR level of the received video flows using the conventional method described above and compares it to the proposed method. Although the preference vector has been adjusted such that MN1 and MN2 select BS0 upon exiting the coverage of AP0, the PSNR degradation is more considerable using the conventional method. This is due to the handover latency of the conventional method, which waits for the collection of dynamic information about neighbouring
PoAs before performing target selection. A similar degradation happens for flow 3 during its handover from BS1 to BS0 and finally to AP0, as shown in Figure 10. Using the conventional method, flow 1 and flow 2 also experience quality degradations during the handover from BS0 to BS1, since this method selects BS1 as a better PoA whenever it is detected. The proposed method does not perform such an unnecessary handover, as shown in Figure 10, since it only makes handover decisions when the UPQ degrades.
Figure 10. PSNR level of received video frames (Flow 1, Flow 2, and Flow 3; PSNR in dB versus frame number) under the conventional method and the proposed method
In the remainder of this subsection, we show that the proposed method incorporates QoE awareness into target selection without considerable signalling overhead. Using the proposed handover model, each MN performs only one transaction (one request/response for the ALL_PoAs_Report IE container, as stated in section 4.3) at the beginning of its activity, to obtain the list of PoAs and the static context about them. Each MN then learns its handover decision strategy, which is stored locally in the learning matrices constructed from this static context, without additional signalling (as explained in section 4.2). In contrast, in the conventional method the MN should look for the neighbouring PoAs and their static context (one transaction with the IS), and then ask through the serving PoA for the dynamic context of the neighbouring PoAs (namely, one transaction between the MN and the serving PoA, plus one transaction between the serving PoA and each neighbouring PoA). The first transaction is performed whenever the MN's handover to a new PoA is completed, while the other transactions must be performed whenever a handoff decision is necessary due to handoff initiation triggers such as UPQ degradation or Link Going Down. Therefore, the total number of transactions depends on the rate of completed handoffs (Completed_HO) and the rate of handoff triggers (HO_Triggers).
Table 5 presents a general signalling comparison of both methods, assuming n mobile nodes in the environment and m neighbouring PoAs per PoA on average. In those relations, the rate of handoff initiation triggers of the i-th mobile node is denoted HO_Triggers_i, and the rate of completed handoffs executed by the i-th mobile node is denoted Completed_HO_i, where the initial connection to the wireless network is also counted as a completed handoff.
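To make the totals in Table 5 concrete, the following is a minimal sketch; the function name is a placeholder, and the per-MN rates are passed in as plain lists.

    def signalling_transactions(completed_ho, ho_triggers, m):
        """Total transaction counts from Table 5: completed_ho[i] and
        ho_triggers[i] are the per-MN rates, m is the average number of
        neighbouring PoAs per PoA."""
        proposed = len(completed_ho) * 1   # one IS transaction per MN, once
        conventional = sum(c + (m + 1) * t
                           for c, t in zip(completed_ho, ho_triggers))
        return proposed, conventional

    # Example: 5 MNs, each with 3 completed handoffs and 6 triggers, m = 4
    # gives proposed = 5 and conventional = 5 * (3 + 5 * 6) = 165 transactions.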
Assuming identical handoff initiation and execution rates for all the MNs, Figure 11 plots the signalling overhead versus the number of MNs for the proposed method, the conventional handover, and the method of [43] under different ratios of HO_Triggers to Completed_HO. As Figure 11 shows, the signalling transactions of the conventional method and of its enhanced version in [43] grow considerably faster than those of the proposed method as the number of MNs increases. Moreover, as the number of unsuccessful handovers (the ratio of handoff initiation triggers to completed handovers)
rises, this signalling overhead becomes even more considerable for the conventional method and for the method of [43]. The overhead of the proposed method, however, does not depend on this ratio, as shown in Table 5.
Figure 11. Signalling transactions comparison (number of signalling transactions versus number of MNs) for the conventional method and the method of [43] under HO_Triggers/Completed_HO ratios of 1 and 2, and for the proposed method (identical for all ratios)
In the above comparisons, we have not considered the size of the signalling messages. Although the proposed ALL_PoAs_Report IE container needs larger signalling messages than the standard IE container used by the conventional method, these messages are transferred to the MNs only once, at the beginning, to describe the wireless access domain. Therefore, the size of the signalling messages is not a considerable drawback for the proposed method. In the conventional method, by contrast, the IE containers that carry neighbouring information may be fetched by each MN many times during its movement (depending on the rate of handoff executions), which leads to an increasing cumulative overhead as time goes on.
The signalling overhead of the proposed method has also been compared to the conventional method in the simulated scenario: the number of MIH messages related to network context retrieval is 42 in the conventional method, versus 6 for our proposed method.
5.2 Simulation of an Environment Represented by SCM
The second simulated environment is a floor of a building covered by 6 WiFi access points. Figure 12 shows the SCM representation of the simulated environment. The access points are connected to a correspondent node through an access router, and their physical data rate is limited to 2 Mbps. The coverage of these access points is adjusted to about 50 meters. 10 mobile nodes have been placed in this environment, as shown in Figure 12. MN0 and MN1 move to room O6, while MN6, MN7, and MN8 move to room O1. MN4 moves to room O2 and MN3 moves to room O1.
Table 5. Signalling transaction comparison between the proposed method and the conventional context-gathering method

Number of transactions                          Proposed method   Conventional context-aware method
Between MNs and IS                              n × 1             Σ_{i=1..n} Completed_HO_i
Between MNs and serving PoA                     0                 Σ_{i=1..n} HO_Triggers_i
Between serving PoA and neighbouring PoAs       0                 m × Σ_{i=1..n} HO_Triggers_i
Imposed to MNs                                  n × 1             Σ_{i=1..n} (Completed_HO_i + HO_Triggers_i)
Total                                           n × 1             Σ_{i=1..n} (Completed_HO_i + (m+1) × HO_Triggers_i)
The destinations of MN5, MN2, and MN9 are O3, O1, and O3, respectively. All MNs move at a fixed speed of 15 m/s. The characteristics of the rooms and WEAs have been determined with respect to the positions of the access points. For example, for α1, Co(α1) is {AP0, AP1, AP2}, while for α2, Co(α2) is {AP1, AP2}.
Each MN receives a video flow (in QCIF frame format with a maximum packet size of 128 bytes and a frame rate of 30 fps) over a UDP connection. These video flows originate from the CN, and the maximum required bandwidth of each one is about 0.3 Mbps. For instantaneous UPQ evaluation, N and W are both set to 6. Considering the reference PSNR of the original video, the PSNR threshold for handover initiation is assumed to be 20 dB. The Sel_Threshold parameter is set to 0.8 second in this simulation.
The simulation of the proposed method has been repeated for multiple consecutive runs; Figure 13 shows the mean PSNR level of each video flow after each run. This figure indicates that the UPQ levels of the received flows improve overall. Although the mean UPQ level of some video flows oscillates across repetitions, such variations are natural during the training of the learner in this scenario, where resources are limited and a handover may affect the quality of other users.
Figure 14 shows the PSNR of MN7's received video in the 2nd, 5th, and 9th iterations of the simulation. To better show that the overall performance of the handover decision improves as the simulation is repeated, Figure 15 plots the average of the mean PSNR of all video flows for each run of the simulation. This figure also emphasizes the ability of the proposed method to learn the handover decision skill based on the UPQ metric.
Figure 12. Second Simulation Environment [floor plan with rooms O1-O6, WEAs α1-α5, access points AP0-AP5, and mobile nodes MN0-MN9]
Figure 13. Mean PSNR level of received video flows (Video0-Video9, received by MN0-MN9) during consecutive runs of the simulation
Figure 14. PSNR level of video frames received by MN7 in different consecutive runs of the simulation (runs #2, #5, and #9)
Figure 15. Average of the mean PSNR levels of all video flows versus the simulation run number
We have also obtained results from multiple tests with different random generator seeds. Figure 16 shows the mean PSNR level of the video flows of MN4, MN7, and MN8 during 15 training runs. Each sample is obtained by averaging over 15 tests with different random seeds, and Figure 16 includes 95% confidence intervals for the means. As the figure shows, the UPQ level of the received video flows increases overall, while the confidence interval shrinks over the training runs.
Figure 17 shows the average of the mean PSNR of all videos during training runs with different seeds, along with the variance between the mean PSNR levels of the video flows received by the MNs. One may conclude from this figure that at the beginning of training, both the average and the variance of the videos' PSNR are low. As the MNs compete for resources during training, the variance increases for some runs but begins to decrease as the learner tends toward the steady state. Moreover, the growing average demonstrates the ability of the proposed method to improve its strategy during the experiments under more realistic conditions.
Figure 16. Mean PSNR levels of the MN4, MN7, and MN8 video flows versus the number of training runs (with 95% confidence intervals)
Figure 17. Average and variance of the mean PSNR levels of all video flows versus the number of training runs
After 15 training runs, we exploited the learner for a different MPEG video stream with the same characteristics but less picture motion. The UPQ threshold for this video stream is adjusted to 25 dB. Figure 18 shows the mean PSNR level of the video received by each MN, and the average over all PSNRs, under the proposed method, and compares it to the mean PSNR level obtained using the basic handover decision
(introduced in the previous subsection). Although UPQ is content dependent, this figure shows that the trained learner is still capable of satisfying the user much better than the traditional method with similar signalling requirements. In the next section, we discuss the features of the proposed method further.
Figure 18. Mean PSNR level of the received video under the proposed handover decision and the basic handover decision
Finally, we investigate the convergence time of the proposed method for varying numbers of MNs, in terms of the number of training runs the MNs need to learn a suitable strategy. Different numbers of MNs (3, 5, 7, 10, 13, and 15) have been placed in a random arrangement, and the simulation runs have been repeated for each case until the average PSNR reaches the defined threshold (26 dB) for all contributing MNs. The number of runs for each case is reported in Figure 19. It can be concluded from this figure that as the number of MNs increases, the convergence time of the algorithm rises accordingly. The convergence time is a limitation of our proposed method, which we discuss further in the next section.
Figure 19. Convergence time of the proposed method versus the number of MNs
6 Discussions
In general, monitoring the quality metrics in our simulations shows that the proposed handover method improves the quality of the received video traffic, especially compared to a decision method with similar signalling overhead. In this section, the advantages and limitations of the proposed handover decision are discussed in more detail.
The main advantage of the proposed method is that it imports new context knowledge into the handover decision. As discussed earlier, UPQ is also related to some non-measurable and subjective parameters, such as the characteristics of the physical environment, the expectations and emotions of the user, the content, and so on. Although current objective UPQ evaluation methods are not capable of considering those parameters completely, direct feedback from mobile users could be utilized in real deployments to adapt the handover decision toward the most satisfying PoA. Evaluating the proposed method with real user satisfaction feedback is worth considering in future work.
Another advantage of the proposed method is that it removes the side effects of context access latency and its signalling overhead. Moreover, the proposed method removes the necessity of knowing QoS-related context knowledge about the application, user, and device, since the quality of user experience implicitly reflects them. Therefore, the proposed method has incorporated
QoE awareness into the target PoA selection without remarkable signalling.
A further advantage of the proposed method is that it considers the movement of mobile users in the handover decision; that is, the proposed method inherently combines mobility prediction with QoE awareness. This is because the probability of selecting PoAs that are not on the regular movement path of the MN decreases during the learning process, as selecting those PoAs degrades the attained reward (UPQ) owing to bad link quality or disconnections. Conversely, this probability increases over time for PoAs that are on the regular movement path of the MN.
The major limitation of the proposed method is the time required for the algorithm to converge to the steady state, which is still considerable for large environments with many mobile users. Improving the proposed method to reduce its convergence time thus remains an open challenge. Combining the proposed method with conventional context-aware methods through the advice-taking idea [65] is a proposal that will be considered in our future work.
One may imagine that the computation load imposed on the MN (if an objective UPQ evaluation method is chosen) or the user-device interaction complexity (if direct feedback of the user satisfaction level is exploited) is one of the main drawbacks of the proposed method. However, such overheads are not significant, for the following reasons:
1) Users are more familiar with a qualitative representation of their needs (their UPQ) than a quantitative one (representing preferences and requirements in terms of QoS parameters, as in context-aware methods), and this persuades users to accept the interaction complexity of direct UPQ feedback.
2) The decision strategy improves as the MN lives in the environment, so the UPQ evaluation periods may be made longer as the mean UPQ increases and its deviation decreases. Therefore, the interaction complexity and computation overhead decrease gradually.
Finally, it is worth noting that although the behaviour of mobile users is mostly regular in practice, and our simulations have accordingly been performed under the assumption of regular movement patterns, the proposed method could also be extended to environments involving mobile users with irregular behaviours.
7 Conclusions
This paper presents a personalized QoE-aware handover management model that employs UPQ to manage user satisfaction more elegantly. The method utilizes UPQ instead of network quality parameters, eliminating the complexity and overhead of the network context gathering and management procedures. In the proposed method, which benefits from UPQ as a delayed reward, an MARL scheme is employed for target PoA selection under the MIH framework. The simulation of a typical scenario with video streams as the traffic sources shows that the proposed approach delivers higher performance than the traditional link-based handover decision scheme, which, like the proposed method, requires no context transfer signalling. The proposed method has also shown better performance than a conventional context-aware method in terms of both service quality and signalling overhead. We have also shown the ability to learn the trajectories of MNs, and the feasibility of the proposed method for conventional environments, by simulating a more complex scenario with more mobile nodes.
Although the proposed method is rather slow to converge and does not operate optimally during its learning period, it can be applied to infuse more intelligence into traditional context-aware methods. Employing the proposed method in a more intelligent handover management model, for more complex scenarios in which some mobile users behave irregularly, is an issue to be considered in future work. Furthermore, using the proposed method in a more advanced handover mechanism
with multi-homing capability, and incorporating other parameters such as power consumption and price into the decision parameters, are among our future research directions. Moreover, a real implementation of the proposed method can be addressed as another future work, to evaluate its performance with no-reference objective assessment methods, pseudo-subjective assessment methods, and also subjective measures.
References
1. Prehofer, C., Nafisi, N., & Wei, Q. A framework for context-aware handover decisions. In 14th IEEE Proceedings on Personal, Indoor and Mobile Radio Communications, PIMRC, 2003 (Vol. 3, pp. 2794-2798)
2. Nguyen-Vuong, Q.-T., Agoulmine, N., & Ghamri-Doudane,
Y. (2008). A user-centric and context-aware solution to
interface management and access network selection in
heterogeneous wireless environments. Computer Networks,
52(18), 3358-3372.
3. Jingjing, Z., & Ansari, N. (2011). On assuring end-to-end
QoE in next generation networks: challenges and a possible
solution. Communications Magazine, 49(7), 185-191.
4. Kilkki, K. (2008). Quality of experience in communications
ecosystem. Journal of Universal Computer Science, 14(5),
615-624.
5. Stankiewicz, R., & Jajszczyk, A. (2011). A survey of QoE
assurance in converged networks. Computer Networks,
55(7), 1459-1473.
6. Herman, H., Rahman, A. A., Syahbana, Y. A., & Bakar, K.
A. Nonlinearity Modelling of QoE for Video Streaming
over Wireless and Mobile Network. In Second
International Conference on Intelligent Systems, Modelling
and Simulation (ISMS), 2011 (pp. 313-317)
7. Mitra, K., Zaslavsky, A., & Aahlund, C. A probabilistic
context-aware approach for quality of experience
measurement in pervasive systems. In Proceedings of the
ACM Symposium on Applied Computing, 2011 (pp. 419-
424)
8. Winkler, S., & Mohandas, P. (2008). The evolution of video quality measurement: from PSNR to hybrid metrics. IEEE Transactions on Broadcasting, 54, 660-668.
9. Saliba, J., Beresford, A., Ivanovich, M., & Fitzpatrick, P.
(2005). User-perceived quality of service in wireless data
networks. Personal Ubiquitous Comput., 9(6), 413-422.
10. Magoulas, G. D., & Ghinea, G. Neural network-based
interactive multicriteria decision making in a quality of
perception-oriented management scheme. In International
Joint Conference on Neural Networks, 2001 (Vol. 4, pp.
2536-2541 vol.2534)
11. Ghahfarokhi, B. S., & Movahhedinia, N. (2011). A context-aware handover decision based on user perceived quality of service trigger. Wireless Communication and Mobile Computing, 11, 723-741.
12. Hasswa, A., Nasser, N., & Hassanein, H. (2007). A seamless context-aware architecture for fourth generation wireless networks. Wirel. Pers. Commun., 43(3), 1035-1049.
13. Ahmed, T., Kyamakya, K., & Ludwig, M. Architecture of a
Context-Aware Vertical Handover Decision Model and Its
Performance Analysis for GPRS - WiFi Handover. 11th
IEEE Symposium on Computers and Communications, 2007
(pp. 795-801).
14. Kang, J.-M., Ju, H.-T., & Hong, J. (2006). Towards
Autonomic Handover Decision Management in 4G
Networks. In Autonomic Management of Mobile
Multimedia Services (pp. 145-157).
15. Prehofer, C., Nafisi, N., & Wei, Q. A framework for
context-aware handover decisions. In IEEE International
Symposium on Personal, Indoor and Mobile Radio
Communications, 2003 (pp. 2794-2798)
16. Wei, Q., Farkas, K., Prehofer, C., Mendes, P., & Plattner, B.
(2006). Context-aware handover using active network
technology. Comput. Netw., 50(15), 2855-2872.
17. Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artif. Intell., 136(2), 215-250.
18. IEEE 802.21, IEEE Standard for Local and Metropolitan
Area Networks: Media Independent Handover Services,
2008
19. Cacace, F., & Vollero, L. Managing mobility and adaptation
in upcoming 802.21 enabled devices. In 4th international
workshop on Wireless mobile applications and services on
WLAN hotspots, 2006 (pp. 1-10)
20. Oliva, A. d. l., Melia, T., Vidal, A., Bernardos, C. J., Soto,
I., & Banchs, A. (2007). IEEE 802.21 enabled mobile
terminals for optimized WLAN/3G handovers: a case study.
SIGMOBILE Mob. Comput. Commun. Rev., 11(2), 29-40.
21. ITU-T Rec. P.800; Methods for subjective determination of
transmission quality, 1996
22. ITU-T Rec. J.246; Perceptual audiovisual quality
measurement techniques for multimedia services over
digital cable television networks in presence of reduced
bandwidth reference, 2008
23. ITU-T Rec. G.1070; Opinion model for videophone
applications, 2007
24. ITU-T Rec. J.247; Objective perceptual multimedia video
quality measurement in the presence of a full reference,
2008
25. ITU-T Rec. P.910; Subjective video quality assessment method for multimedia applications, 1999
26. Reves, X. User perceived Quality Evaluation in a B3G
Network Testbed. 15th IST Mobile and Wireless Summit,
2006
27. ITU-T Rec. P.861; Objective quality measurement of
telephone-band speech codecs, 1998
28. ITU-T Rec. P.862; Perceptual Evaluation of Speech Quality (PESQ): An objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs, 2001
29. ITU-T Rec. J.144, Objective perceptual video quality
measurement techniques for digital cable television in
presence of full reference, 2004
30. Menkovski, V., Exarchakos, G., & Liotta, A. Machine
Learning Approach for Quality of Experience Aware
Networks. In 2nd International Conference on Intelligent
Networking and Collaborative Systems, 2010 (pp. 461-466)
31. Fiedler, M., Hossfeld, T., & Tran-Gia, P. (2010). A Generic
Quantitative Relationship Between Quality of Experience
and Quality of Service. IEEE Network, 24(2), 36-41.
32. Brooks, P., & Hestnes, B. r. (2010). User measures of
quality of experience: why being objective and quantitative
is important. IEEE Network, 24(2), 8-13.
33. Mahdi, A., & Picovici, D. (2010). New single-ended
objective measure for non-intrusive speech quality
evaluation. Signal, Image and Video Processing, 4(1), 23-
38.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
28
34. Rein, S., Fitzek, F. H. P., & Reisslein, M. (2005). Voice
quality evaluation in wireless packet communication
systems: a tutorial and performance results for ROHC.
Wireless Communications, 12(1), 60-67.
35. Jiang, X., Wang, Y., & Wang, C. No-reference video quality
assessment for MPEG-2 video streams using BP neural
networks. In 2nd International Conference on Interaction
Sciences: Information Technology, Culture and Human,
2009 (pp. 307-311)
36. Klaue, J., Rathke, B., & Wolisz, A. (2003). EvalVid - A Framework for Video Transmission and Quality Evaluation. In Computer Performance (pp. 255-272).
37. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A
Comprehensive Survey of Multiagent Reinforcement
Learning. IEEE Transactions on Systems, Man, and
Cybernetics, Part C: Applications and Reviews, 38(2), 156-
172.
38. Weiss, G. (1999). Multiagent Systems: A Modern Approach
to Distributed Artificial Intelligence: MIT Press.
39. Sleem, A., & Kumar, A. (2005). Handoff management in
wireless data networks using topography-aware mobility
prediction. Journal of Parallel and Distributed Computing,
65(8), 963-982.
40. Ghahfarokhi, B. S., & Movahhedinia, N. (2007). QoS
provisioning by EFuNNs-based handoff planning in cellular
MPLS networks. Comput. Commun., 30(13), 2676-2685.
41. Onel, T., Ersoy, C., Cayirci, E., & Parr, G. (2004). A
multicriteria handoff decision scheme for the next
generation tactical communications systems. Computer
Networks, 46(5), 695-708.
42. Edwards, G., & Sankar, R. (1998). Microcellular handoff
using fuzzy techniques. Wirel. Netw., 4(5), 401-409.
43. Ghahfarokhi, B. S., & Movahhedinia, N. (2012). Context-
Aware Handover Decision in an Enhanced Media
Independent Handover Framework. Wireless Personal
Communications., published online, DOI 10.1007/s11277-
012-0543-4.
44. Sharma, S., Baek, I., & Chiueh, T.-C. (2007). OmniCon: a
Mobile IP-based vertical handoff system for wireless LAN
and GPRS links. Softw. Pract. Exper., 37(7), 779-798.
45. Zhu, F., & McNair, J. (2006). Multiservice vertical handoff decision algorithms. EURASIP J. Wirel. Commun. Netw., 2006(2), 52-52.
46. Stevens-Navarro, E., & Wong, V. W. S. Comparison
between Vertical Handoff Decision Algorithms for
Heterogeneous Wireless Networks. In IEEE 63rd Vehicular
Technology Conference, VTC, 2006 (Vol. 2, pp. 947-951)
47. Wenhui, Z. Handover decision using fuzzy MADM in
heterogeneous networks. In IEEE Wireless
Communications and Networking Conference, 2004 (Vol. 2,
pp. 653-658)
48. Bing, H., He, C., & Jiang, L. Intelligent signal processing of
mobility management for heterogeneous networks. In
Proceedings of the International Conference on Neural
Networks and Signal Processing, 2003 (Vol. 2, pp. 1578-
1581)
49. Chan, P. M. L., Sheriff, R. E., Hu, Y. F., Conforto, P., & Tocci, C. (2002). Mobility management incorporating fuzzy logic for heterogeneous IP environment. Communications Magazine, 39(12), 42-51.
50. Inoue, M., Mahmud, K., Murakami, H., Hasegawa, M., &
Morikawa, H. (2005). Context-Based Network and
Application Management on Seamless Networking
Platform. Wirel. Pers. Commun., 35(1-2), 53-70.
51. Indulska, J., & Balasubramaniam, S. Context-aware vertical handovers between WLAN and 3G networks. In IEEE 59th Vehicular Technology Conference, 2004 (Vol. 5, pp. 3019-3023)
52. Kassar, M., Kervella, B., & Pujolle, G. (2008). An overview of vertical handover decision strategies in heterogeneous wireless networks. Computer Communications, 31(10), 2607-2620.
53. Hong, C.-P., Weems, C. C., & Kim, S.-D. (2008). An
effective vertical handoff scheme based on service
management for ubiquitous computing. Comput. Commun.,
31(9), 1739-1750.
54. Wang, Y., Zhang, P., Zhou, Y., Yuan, J., Liu, F., & Li, G. Handover Management in Enhanced MIH Framework for Heterogeneous Wireless Networks Environment. Wireless Personal Communications, 52(3), 615-636.
55. Neves, P., Soares, J., Sargento, S., Pires, H., & Fontes, F.
(2011). Context-aware media independent information
server for optimized seamless handover procedures.
Computer Networks, 55(7), 1498-1519.
56. Mussabbir, Q. B., Wenbing, Y., Zeyun, N., & Xiaoming, F.
(2007). Optimized FMIPv6 Using IEEE 802.21 MIH
Services in Vehicular Networks. IEEE Transactions on
Vehicular Technology, 56(6), 3397-3407.
57. Wang, C.-Y., Huang, H.-Y., & Hwang, R.-H. Mobility
management in ubiquitous environments. Personal
Ubiquitous Comput., 15(3), 235-251.
58. Samaan, N., & Karmouch, A. (2005). A mobility prediction
architecture based on contextual knowledge and spatial
conceptual maps. IEEE Transactions on Mobile Computing,
4(6), 537-551.
59. Sen, S., Sekaran, M., & Hale, J. Learning to coordinate
without sharing information. In twelfth national conference
on artificial intelligence, 1994, (vol. 1, pp. 426-431)
60. Kettani, D., & Moulin, B. (1999). A spatial model based on
the notions of spatial conceptual map and of object’s
influence areas. Paper presented at the Spat Inf Theory Cogn
Comput Found Geogr Inf Sci, 1999 (pp. 401-416)
61. National Institute of Standards and Technology. http://w3.antd.nist.gov/seamlessandsecure/pubtool.shtml, visited on July 2011.
62. Balasubramaniam, S., & Indulska, J. (2004). Vertical
handover supporting pervasive computing in future wireless
networks. Computer Communications, 27(8), 708-719.
63. Piamrat, K., et al. QoE-aware vertical handover in wireless
heterogeneous networks, In Wireless Communications and
Mobile Computing Conference (IWCMC), 2011 (pp.95-
100).
64. A. Takahashi, D. Hands, and V. Barriac (2008).
Standardization activities in the ITU for a QoE assessment
of IPTV. Communications Magazine, vol. 46, 78-84.
65. M. Rovatsos & A. Belesiotis. Advice taking in multiagent reinforcement learning. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems (AAMAS), 2007 (pp. 1342-1344).
66. T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
... Nowadays, the MARL is widely applied in AlphaGo, intelligent multi-robot systems, road traffic signal control, and distributed system control [33][34][35][36][37]. Here it is introduced as an enabling solution for the intelligent cognitive processing, for its superior performances of parallel optimization and dynamic strategy-selecting. ...
Article
Full-text available
The publisher's note contains a correction to [Opt. Express 29 32333 (2021)10.1364/OE.438439]. The article was corrected on 17 June 2022.
... Nowadays, the MARL is widely applied in AlphaGo, intelligent multi-robot systems, road traffic signal control, and distributed system control [33][34][35][36][37]. Here it is introduced as an enabling solution for the intelligent cognitive processing, for its superior performances of parallel optimization and dynamic strategy-selecting. ...
Article
Full-text available
Highly reliable wireless train-ground communication immune to the electromagnetic interferences (EMIs) is of critical importance for the security and efficiency of high-speed railways (HSRs). However, the rapid development of HSRs (>52,000 km all over the world) brings great challenges on the conventional EMIs mitigation strategies featuring non-real-time and passive. In this paper, the convergence of radio-over-fiber distributed antenna architecture (RoF-DAA) and reinforcement learning technologies is explored to empower a real-time, cognitive and efficient wireless communication solution for HSRs, with strong immunity to EMIs. A centralized communication system utilizes the RoF-DAA to connect the center station (CS) and distributed remote radio units (RRUs) along with the railway track-sides to collect electromagnetic signals from environments. Real-time recognition of EMIs and interactions between the CS and RRUs are enabled by the RoF link featuring broad bandwidth and low transmission loss. An intelligent proactive interference avoidance scheme is proposed to perform EMI-immunity wireless communication. Then an improved Win or learn Fast-Policy Hill Climbing (WoLF-PHC) multi-agent reinforcement learning algorithm is adopted to dynamically select and switch the operation frequency bands of RRUs in a self-adaptive mode, avoiding the frequency channel contaminated by the EMIs. In proof-of-concept experiments and simulations, EMIs towards a single RRU and multiple RRUs in the same cluster and towards two adjacent RRUs in distinct clusters are effectively avoided for the Global System for Mobile communications–Railway (GSM-R) system in HSRs. The proposed system has a superior performance in terms of circumventing either static or dynamic EMIs, serving as an improved cognitive radio scheme to ensuring high security and high efficiency railway communication.
... Relates to the current and the previous (stored) mean user-perceived quality (QoE) level, and length of the time the user has experienced degradation or improvements due to the recent action ns-2 4 Mobile nodes [35] Medium access control [50]. It creates a normalized QoE value using bandwidth, burst level, delay and jitter. ...
Article
Wireless networks show several challenges not found in wired networks, due to the dynamics of data transmission. Besides, home wireless networks are managed by non-technical people, and providers do not implement full management services because of the difficulties of manually managing thousands of devices. Thus, automatic management mechanisms are desirable. However, such control mechanisms are hard to achieve in practice because we do not always have a model of the process to be controlled, or the behavior of the environment is dynamic. Thus, the control must adapt to changing conditions, and it is necessary to identify the quality of the control executed from the perspective of the user of the network service. This article proposes a control loop for transmission power and channel selection, based on Software Defined Networking and Reinforcement Learning (RL), and capable of improving Web Quality of Experience metrics, thus benefiting the user. We evaluate a prototype in which some Access Points are controlled by a single controller or by independent controllers. The control loop uses the predicted Mean Opinion Score (MOS) as a reward, thus the system needs to classify the web traffic. We proposed a semi-supervised learning method to classify the web sites into three classes (light, average and heavy) that groups pages by their complexity, i.e. number and size of page elements. These classes define the MOS predictor used by the control loop. The proposed web site classifier achieves an average score of 87%±1%, classifying 500 unlabeled examples with only fifteen known examples, with a sub-second runtime. Further, the RL control loop achieves higher Mean Opinion Score (up to 167% in our best result) than the baselines. The page load time of clients browsing heavy web sites is improved by up to 6.6x.
... It assumes a stochastic environment in which MUs arrive simultaneously in groups and do not need to know any information about the other MUs in their group or any previous decision taken by them. Two reinforcement learning methods are used: stochastic approximation with the Sastry algorithm [10] and the Q-learning algorithm [11]. It applies only bandwidth as the handoff decision parameter to calculate the utility of a WN. ...
Article
Full-text available
When a group of Mobile Users (MUs) equipped with multi-mode or multi-homed terminals, such as passengers on board a bus, train, or car, moves from one wireless network (WN) to another within a heterogeneous wireless network (HWN) environment and requests vertical handoffs simultaneously, a group vertical handoff (GVHO) occurs. The prevailing research in the literature is mainly concerned with forced GVHO driven by network aspects such as signal strength and bandwidth, while in reality user-initiated GVHO, driven by user aspects such as price, power consumption, and velocity along with the respective user preferences, is more important for performing vertical handoffs in HWNs. In user-initiated GVHO, selecting the mutually best WN-MU pair that maximises the network revenue of the constituent WN as well as the user satisfaction of each MU in the group, while minimising the simultaneous selection of one WN by multiple MUs of the group, is a challenging problem. This paper proposes a GVHO decision model based on a non-cooperative game that uses multiple handoff decision attributes, with their respective user preferences calculated dynamically in real time, as the game strategies; the group MUs select the best available WNs for vertical handoff at the Nash equilibrium. The performance of the proposed model is evaluated in terms of the number of GVHOs, the price of anarchy, and the price of stability for both the group of MUs and the WNs. Simulation results show that the proposed model yields the minimum number of GVHOs compared with existing GVHO models while maximising user satisfaction and network revenue.
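To make the game-theoretic selection concrete, the sketch below runs unilateral best-response iterations until no MU gains by switching, i.e. a pure-strategy Nash equilibrium of the WN selection game. The payoff, which discounts a user's utility when many group members pick the same WN, is an illustrative assumption rather than the paper's exact revenue and satisfaction functions.

```python
import random

def payoff(mu, wn, assignment, utility, capacity):
    """Utility of MU `mu` on network `wn`, discounted when the WN is
    overloaded. The linear congestion discount is an assumption."""
    load = 1 + sum(1 for m, w in assignment.items() if m != mu and w == wn)
    return utility[mu][wn] * min(1.0, capacity[wn] / load)

def best_response_equilibrium(mus, wns, utility, capacity, max_rounds=100):
    """Iterate unilateral best responses; a round with no strict
    improvement is a pure-strategy Nash equilibrium of the game."""
    assignment = {mu: random.choice(wns) for mu in mus}
    for _ in range(max_rounds):
        stable = True
        for mu in mus:
            best = max(wns, key=lambda w: payoff(mu, w, assignment,
                                                 utility, capacity))
            if (payoff(mu, best, assignment, utility, capacity) >
                    payoff(mu, assignment[mu], assignment, utility, capacity)):
                assignment[mu], stable = best, False
        if stable:
            return assignment
    return assignment   # may not have converged within max_rounds
```

The congestion discount is what keeps multiple MUs from piling onto the same WN, mirroring the paper's goal of minimising simultaneous selection of one network by the whole group.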
... The user preference for an attribute is driven by user compulsions in HWNs. For example, a performance-savvy user gives higher preference to network bandwidth, while a price-savvy user gives higher preference to network usage cost compared with other decision attributes. These user preferences are pre-estimated according to the mobile user's expected Quality of Experience (QoE) [15] and remain fixed, i.e. static, throughout an application's execution time. However, in our view, in order to complete a task or application, the user preferences should also be driven by real-time system compulsions. ...
Article
Full-text available
In Heterogeneous Wireless Networks (HWNs), seamless Vertical Handoff (VHO) to the best available network is significant in providing Quality of Experience to Mobile Users. The selection of the best available network is based on multiple contrasting handoff decision attributes along with their respective user preferences. In the literature, the user preferences used in various network selection techniques are pre-fixed, i.e. static and arbitrary, without any standard theoretical basis. This paper proposes a method to moderate these static user preferences in real time according to the current values of the respective handoff decision attributes, making them dynamic and realistic. The effect of dynamic user preferences on network selection for vertical handoff is evaluated with prominent Multi Attribute Decision Making (MADM) methods, namely Simple Additive Weighting, Multiplicative Exponential Weighting, Technique for Order Preference by Similarity to Ideal Solution, and Grey Relational Analysis. Simulations are performed using both static user preference weights obtained from the user and the proposed dynamic user preference weights. The simulation results show that, for all considered MADM methods, the number of vertical handoffs needed to complete an application is lower with dynamic user preference weights than with static ones. This demonstrates the effectiveness of the proposed dynamic user preferences in network selection for VHOs in HWNs.
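The sketch below shows one plausible pairing of weight moderation with Simple Additive Weighting, the first MADM method listed. The moderation rule, attribute choices, and example figures are assumptions for illustration; the paper's exact moderation formula is not reproduced here.

```python
def moderate_weights(static_w, current, ideal):
    """Scale each static preference weight by the relative gap between the
    attribute's current and ideal values, then renormalize. The specific
    moderation rule here is an illustrative assumption."""
    raw = [w * (1.0 + abs(c - i) / i)
           for w, c, i in zip(static_w, current, ideal)]
    total = sum(raw)
    return [r / total for r in raw]

def saw_rank(networks, weights, benefit):
    """Simple Additive Weighting: min-max normalize each attribute column,
    then score each candidate. benefit[j] is True when more is better."""
    cols = list(zip(*networks.values()))
    scores = {}
    for name, attrs in networks.items():
        total = 0.0
        for j, (w, x) in enumerate(zip(weights, attrs)):
            lo, hi = min(cols[j]), max(cols[j])
            norm = 1.0 if hi == lo else (x - lo) / (hi - lo)
            total += w * (norm if benefit[j] else 1.0 - norm)
        scores[name] = total
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Attributes: bandwidth (Mbps, benefit), cost (cents/MB), delay (ms);
# all candidate values below are made up for the example.
nets = {"WLAN": (54, 2, 60), "UMTS": (2, 10, 120), "LTE": (30, 8, 40)}
w = moderate_weights([0.5, 0.3, 0.2], current=(10, 6, 80), ideal=(50, 1, 20))
print(saw_rank(nets, w, benefit=[True, False, False]))  # WLAN ranks first
```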
Chapter
This chapter describes the current trends of mobile devices in education, the applications of mobile technologies in learning, an overview of Mobile Learning (m-learning), and the importance of m-learning in global education. M-learning encourages both blended learning and collaborative learning, allowing learners at different locations to get in touch with their peers or other teams to discuss and learn. The m-learning environment is about access to content, peers, experts, portfolio artifacts, credible sources, and previous thinking on relevant topics. Given the convenience of m-learning, less time is spent getting trained, and overall costs are lowered as a result. With m-learning, learners are able to learn in their own style at their own pace. M-learning provides easy access to learning at any place and any time, which is more convenient for learners.
Article
Full-text available
The major application areas of reinforcement learning (RL) have traditionally been game playing and continuous control. In recent years, however, RL has been increasingly applied in systems that interact with humans. RL can personalize digital systems to make them more relevant to individual users. Challenges in personalization settings may be different from challenges found in traditional application areas of RL. An overview of work that uses RL for personalization, however, is lacking. In this work, we introduce a framework of personalization settings and use it in a systematic literature review. Besides setting, we review solutions and evaluation strategies. Results show that RL has been increasingly applied to personalization problems and realistic evaluations have become more prevalent. RL has become sufficiently robust to apply in contexts that involve humans and the field as a whole is growing. However, it seems not to be maturing: the ratios of studies that include a comparison or a realistic evaluation are not showing upward trends and the vast majority of algorithms are used only once. This review can be used to find related work across domains, provides insights into the state of the field and identifies opportunities for future work.
Chapter
The Internet of Things (IoT) is transforming the agriculture industry and enables farmers to deal with the vast challenges in the industry. Internet of Farming (IoF) applications increase the quantity, quality, sustainability, and cost-effectiveness of agricultural production. Farmers leverage IoF to remotely monitor sensors that can detect soil moisture, crop growth, and livestock feed levels, to remotely manage and control smart connected harvesters and irrigation equipment, and to utilize artificial-intelligence-based tools to analyze operational data combined with third-party information, such as weather services, to provide new insights and improve decision making. The Internet of Farming relies on data gathered from the sensors of a Wireless Sensor Network (WSN). The WSN requires reliable connectivity to provide accurate predictions for the farming system. This chapter proposes a strategy that provides always best connectivity (ABC). The strategy considers a routing protocol that supports Low-power and Lossy Networks (LLNs) with minimum energy usage. Two scenarios are presented.
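A minimal reading of such energy-aware LLN routing is the parent selection sketch below, in the spirit of RPL objective functions: each node picks the neighbour with the lowest combined rank, link quality (ETX), and energy cost. The field names and the particular cost blend are assumptions, not the chapter's protocol.

```python
def select_parent(candidates):
    """Pick the neighbour minimizing an RPL-like rank plus an
    energy-aware link cost. All fields are illustrative assumptions."""
    def cost(n):
        # ETX penalizes lossy links; the battery term (battery in [0, 1])
        # steers traffic away from nearly depleted nodes
        return n["rank"] + n["etx"] + (1.0 - n["battery"])
    return min(candidates, key=cost)

neighbours = [
    {"id": "A", "rank": 256, "etx": 1.2, "battery": 0.9},
    {"id": "B", "rank": 256, "etx": 1.0, "battery": 0.2},
]
# "A" wins: its slightly worse ETX is outweighed by its healthier battery
print(select_parent(neighbours)["id"])
```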
Article
Full-text available
One emerging characteristic of electronic devices is the increasing number of connectivity interfaces (a.k.a. NICs) towards the outside world. That obviously translates into a set of technical issues related to their management in order to provide seamless connectivity when connections move from one interface to another. IEEE 802.21 is a recent effort of the IEEE that aims at providing a general interface for the management of NICs. In this paper we discuss how the upcoming standard may be effectively exploited in a mobile context in order to hide network heterogeneity from end users. To accomplish this task, we propose a centralized element called the Mobility Manager (MM), interfacing with the 802.21 sublayer and responsible for the application of connectivity policies. Based on a real testbed, we show that the new standard and the MM can be used to improve the network performance experienced by the end user. Moreover, we show how the MM can interact with adaptive applications in order to further improve the range of usability of real-time applications.
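The sketch below illustrates one way such a Mobility Manager could evaluate an ordered policy list when an 802.21-style link notification arrives. The event names, policy format, and thresholds are assumptions for illustration; the paper's actual policy language is not reproduced here.

```python
# Ordered connectivity policies: first matching policy wins.
# The format and the field names are assumptions.
POLICIES = [
    {"app": "voip",    "prefer": ["wifi", "umts"],          "min_quality": 0.6},
    {"app": "default", "prefer": ["wifi", "umts", "gprs"],  "min_quality": 0.0},
]

def on_link_event(event, links, active_app):
    """React to an MIH-style notification (e.g. LINK_GOING_DOWN) by
    re-evaluating policies; return the interface to switch to, if any."""
    if event["type"] in ("LINK_GOING_DOWN", "LINK_DOWN"):
        links[event["interface"]]["up"] = False
    policy = next(p for p in POLICIES if p["app"] in (active_app, "default"))
    for iface in policy["prefer"]:
        link = links.get(iface)
        if link and link["up"] and link["quality"] >= policy["min_quality"]:
            return iface
    return None  # no usable interface under the current policy

links = {"wifi": {"up": True, "quality": 0.8},
         "umts": {"up": True, "quality": 0.7}}
print(on_link_event({"type": "LINK_GOING_DOWN", "interface": "wifi"},
                    links, "voip"))   # falls back to "umts"
```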
Article
Full-text available
In the GRAAD project we aim at developing a knowledge-based system which manipulates spatial and temporal knowledge while simulating the kind of behaviour that people adopt when describing a route. A route description is essentially a narrative text in which sentences are prescriptions given by the speaker to an addressee: they describe a succession of actions that the addressee will have to carry out when following the route in the described environment. Hence, temporal and spatial knowledge are "interleaved" in a route description. In this paper we present an approach for generating route descriptions using spatial conceptual maps and a simulation of a virtual pedestrian's movements in these maps. We show how the notion of influence area enables us to transform spatial relations of neighborhood and orientation into topological relations. A way can be partitioned into a succession of typical segments (intersections with other ways, intersections with crossable objects, intersections with landmark objects' influence areas) which are well suited for natural language descriptions. A route is specified as a succession of way segments pertaining to one or several ways, some of them being used to generate the natural language description. We show how the equations of the virtual pedestrian's trajectory can be used to select the proper movement verbs used in the route description.
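As a toy illustration of how an influence area turns metric proximity into a topological inside/outside relation along a route, the sketch below samples a polyline and labels each sample with the landmarks whose influence area contains it; runs of identically labelled samples then form the way segments suited to description. Circular influence areas and uniform sampling are simplifying assumptions.

```python
import math

def segment_route(route, landmarks, step=1.0):
    """Walk the polyline `route` (list of (x, y) points) and label each
    sample with the landmarks whose circular influence area contains it.
    landmarks: name -> (cx, cy, radius). Circles are a simplification."""
    samples = []
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        n = max(int(length / step), 1)
        for k in range(n + 1):
            px = x1 + (x2 - x1) * k / n
            py = y1 + (y2 - y1) * k / n
            inside = [name for name, (cx, cy, r) in landmarks.items()
                      if math.hypot(px - cx, py - cy) <= r]
            samples.append(((px, py), inside))
    return samples

# A straight way passing a hypothetical landmark's influence area
samples = segment_route([(0, 0), (10, 0)], {"church": (5, 1, 2)})
print([labels for _, labels in samples])  # [] ... ['church'] ... []
```

Consecutive samples sharing a label form a segment "within" the landmark's influence, which is the kind of topological fact a generator can verbalize ("walk past the church").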
Article
Contents: Introduction; QoS concepts and standards; IETF multimedia protocols; Semantic approach for QoS management in home networks; Conclusion; Bibliography.
Article
The next generation in mobility management will enable different mobile networks to interoperate with each other to ensure terminal and personal mobility and global portability of network services. However, in order to ensure global mobility, the deployment and integration of both satellite and terrestrial components are necessary. This article focuses on issues related to mobility management in a future mobile communications system, in a scenario where a multisegment access network is integrated into an IP core network by exploiting the principles of Mobile IP. In particular, attention is given to the requirements for location, address, and handover management. In a heterogeneous environment, the need to perform handover between access networks imposes particular constraints on the type of information available to the terminal and the network. In this case, consideration needs to be given to parameters other than radio characteristics, such as achievable quality of service and user preference. This article proposes a new approach to handover management by applying the fuzzy logic concept to a heterogeneous environment, and concludes with a presentation of mobility management signaling protocols.
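A minimal sketch of such a fuzzy handover decision is shown below: triangular membership functions fuzzify the received signal strength and the achievable QoS of the alternative network, two rules fire, and a weighted average defuzzifies the result into a handover desirability in [0, 1]. The breakpoints and the rule base are assumptions, not the article's actual fuzzy system.

```python
def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def handover_desirability(rss_dbm, alt_qos):
    """Blend 'serving RSS is weak' and 'alternative QoS is good' into one
    score in [0, 1]. Breakpoints and rules are illustrative assumptions;
    alt_qos is assumed normalized to [0, 1]."""
    rss_weak = tri(rss_dbm, -100, -90, -75)
    rss_ok   = tri(rss_dbm, -85, -70, -50)
    qos_good = tri(alt_qos, 0.5, 1.0, 1.5)
    # Rule 1: IF RSS weak AND alternative QoS good THEN hand over
    # Rule 2: IF RSS adequate THEN stay
    fire_ho, fire_stay = min(rss_weak, qos_good), rss_ok
    # Weighted-average defuzzification over {handover: 1, stay: 0}
    total = fire_ho + fire_stay
    return fire_ho / total if total else 0.0

print(handover_desirability(-88, 0.9))  # weak signal, good target: near 1
print(handover_desirability(-65, 0.9))  # healthy signal: near 0
```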
Article
This Recommendation describes methods and procedures for conducting subjective evaluations of transmission quality. The main revision encompassed by this version of this Recommendation is the addition of an annex describing the Comparison Category Rating (CCR) procedure. Other modifications have been made to align this Recommendation with the recent revision of Recommendation P.830.
Article
Recent developments in heterogeneous mobile networks emphasize the necessity of more intelligent and context-aware handover decisions. However, the complexity and overhead of collecting and managing context information are the main difficulties in context-aware handovers. The Media Independent Handover (MIH) framework proposed by IEEE 802.21 only provides static context of access networks through its information service. This paper elaborates the idea of handoff-aware network context gathering for renewal of dynamic context in the MIH information server. An extension to the MIH framework is proposed to efficiently accommodate the dynamic context of access networks along with the ordinary static context in the information server (IS). The paper presents an analytical evaluation of the proposed context gathering method in terms of context access latency and signalling overhead. The paper also presents a policy-based context-aware handover model based on the proposed extension. A well-defined policy format is proposed for straightforward description of users', devices', and applications' preferences and requirements. In contrast to traditional policy-based methods, a multi-policy scheme is proposed that exploits rank aggregation methods to employ a set of matching policies in target point of attachment selection. Simulations have been carried out in NS2 to verify the performance of the proposed context gathering method and the proposed handover decision model. Simulation results show improved performance on the considered evaluation metrics.
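One standard rank aggregation method that could serve the multi-policy scheme is a Borda count, sketched below: each matching policy contributes a full ranking of the candidate PoAs, and positions are converted into points. The abstract does not state which aggregation method is used, so Borda here is an assumption, and the PoA names are made up.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine several preference orders over candidate PoAs.
    rankings: list of lists, each ordered best-first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, poa in enumerate(ranking):
            scores[poa] += n - pos   # best gets n points, worst gets 1
    return sorted(scores, key=scores.get, reverse=True)

# Three matching policies (say user, device, application) each rank
# the candidate PoAs; the aggregate order picks the target.
print(borda_aggregate([["WiFi-2", "LTE-1", "WiFi-1"],
                       ["LTE-1", "WiFi-2", "WiFi-1"],
                       ["WiFi-2", "WiFi-1", "LTE-1"]]))
# -> ['WiFi-2', 'LTE-1', 'WiFi-1']
```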
Article
Wi-Fi based hotspots offer mobile users broadband wireless Internet connectivity in public work spaces and corporate/university campuses. Despite the aggressive deployment of these hotspots in recent years, high-speed wireless Internet access remains restricted to small geographical areas due to the limited physical coverage of wireless LANs. On the other hand, despite their lower throughput, cellular networks have a significantly wider coverage and are thus much more available. Recognizing that 2.5G or 3G cellular networks can effectively complement wireless LANs, we set out to develop a vertical handoff system that allows mobile users to seamlessly fall back to such cellular networks as the general packet radio service (GPRS) or 3G whenever wireless LAN connectivity is not available. The resulting handoff mechanism allows a network connection of a mobile node to operate over multiple wireless access networks in a way that is transparent to end user applications. In this paper, we present the design, implementation, and evaluation of a fully operational vertical handoff system, called OmniCon, which enables mobile nodes to automatically switch between wireless LAN and GPRS, based on wireless LAN availability, by introducing a simple extension to the existing Mobile IP implementation. We discuss the design issues in the proposed vertical handoff system for heterogeneous networks, including connection setup problems due to network address translation, and the disparity in link characteristics between wireless LANs and GPRS. A detailed performance evaluation study of the OmniCon prototype demonstrates its ability to migrate active network connections between these two wireless technologies with low handoff latency and close to zero packet loss.
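The sketch below captures the availability-driven switching logic in simplified form: a monitor prefers WLAN, falls back to GPRS when the WLAN signal disappears, and uses two thresholds for hysteresis so the connection does not flap at the coverage edge. The thresholds and the polling loop are assumptions; OmniCon itself operates through a Mobile IP extension rather than application-level code like this.

```python
import time

WLAN_JOIN_RSSI = -75   # dBm; assumed threshold for switching to WLAN
WLAN_DROP_RSSI = -85   # dBm; assumed threshold for falling back to GPRS

def monitor(scan_wlan_rssi, switch_to, poll_s=1.0):
    """Prefer WLAN; fall back to GPRS when the WLAN signal drops.
    scan_wlan_rssi returns the current RSSI in dBm, or None when no AP
    is visible; switch_to is a placeholder for the handoff mechanism."""
    active = "gprs"
    switch_to(active)
    while True:
        rssi = scan_wlan_rssi()
        if active == "gprs" and rssi is not None and rssi >= WLAN_JOIN_RSSI:
            active = "wlan"       # WLAN became usable: switch up
            switch_to(active)
        elif active == "wlan" and (rssi is None or rssi <= WLAN_DROP_RSSI):
            active = "gprs"       # WLAN lost or too weak: fall back
            switch_to(active)
        time.sleep(poll_s)
```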
Article
A context-based adaptive communication system is introduced for use in heterogeneous networks. Context includes the user's presence, location, available network interfaces, network availability, network priority, communication status, terminal features, and installed applications. An experimental system was developed to clarify the feasibility of using context information to flexibly control networks and applications. The system operates on a seamless networking platform we developed for heterogeneous networks. By using contexts, the system can inform the caller and callee, before communication occurs, of the applications they can access through the network. Changes in contexts can switch an ongoing application to another during actual communication. These functions provide unprecedented styles of communication. A business scenario for a seamless networking provider is also presented.