A Personalized QoE-Aware Handover Decision based
on Distributed Reinforcement Learning
Behrouz Shahgholi Ghahfarokhi1,*, Naser Movahhedinia2
{shahgholi, naserm}@eng.ui.ac.ir
1Department of Information Technology Engineering, University of Isfahan, Isfahan, Iran
2Department of Computer Engineering, University of Isfahan, Isfahan, Iran
*Corresponding Author (Email: shahgholi@eng.ui.ac.ir, Tel: +98-311-7934094)
Abstract
Recent developments in heterogeneous mobile networks and growing demands for a variety of real-time and multimedia applications have emphasized the necessity of more intelligent handover decisions. Addressing the context knowledge of mobile devices, users, applications, and networks is the subject of context-aware handoff decision, a recent effort toward this aim. However, user perception has not been adequately addressed in the area of context-aware handover decision making. Mobile users may judge the Quality of Service (QoS) differently depending on their environmental conditions and their personal and psychological characteristics. This reality is exploited in this paper to introduce a personalized, user-centric handoff decision method that decides on the time and target of handover based on User Perceived Quality (UPQ) feedbacks. UPQ degradations are mainly due to (1) exiting the coverage of the serving Point of Attachment (PoA) or (2) QoS degradation of the serving access network. Using the UPQ metric, the proposed method obviates the need to track rapidly varying network QoS parameters and avoids the complexity and overhead of gathering and managing other context information. Moreover, considering the underlying network and geographical map, the proposed method inherently exploits the trajectory information of mobile users for handover decision. UPQ degradation is due not only to the user's own behaviour, but also to the behaviours of other users. As such, the Multi-Agent Reinforcement Learning (MARL) paradigm has been adopted for target PoA selection. The decision algorithm is based on the WoLF-PHC learning method, where UPQ is used as a delayed reward for training. The proposed handoff decision has been implemented under the IEEE 802.21 framework using the NS2 network simulator. The results show better performance of the proposed method compared to conventional methods, assuming regular movement of mobile users.
Keywords
User Perceived Quality, Context-Aware Handover, QoE-Aware Handover, Distributed
Reinforcement Learning.
Abbreviations
AHD Adaptive Handover Decision
AHP Analytic Hierarchy Process
IS Information Server
MADM Multi Attribute Decision Making
MARL Multi Agent Reinforcement Learning
MCHO Mobile Controlled Handover
MICS Media Independent Command Service
MIES Media Independent Event Service
MIH Media Independent Handover
MIIS Media Independent Information Service
MIP Mobile IP
MN Mobile Node
MOS Mean Opinion Score
PHC Policy Hill Climbing
PoA Point of Attachment
PQE Perceived Quality Evaluator
PSNR Peak Signal to Noise Ratio
QoCE Quality of Customer Experience
QoE Quality of Experience
QoS Quality of Service
QoUE Quality of User Experience
RSS Received Signal Strength
SAW Simple Additive Weighting
SCM Spatial Conceptual Map
SG Stochastic Game
SSNR Segmental Signal to Noise Ratio
TLV Type-Length-Value
UPQ User Perceived Quality
WEA Way Elementary Areas
WoLF Win or Learn Fast
1 Introduction
The evolution of wireless mobile networks has
necessitated incorporating more intelligence in multi-
technology vertical handoff management. Entering the
ubiquitous computing era, mobile users need to be
Always Best Connected (ABC) to diverse access technologies anywhere and at any time. As such, handoff decision is an essential point of attention in next generation wireless mobile networks. Recently, context-awareness
has been employed in handover decision making. The
context-aware handoff can be defined as a handover
procedure which selects a target access node based not
only on the signal quality (as is done traditionally), but
also on a wide knowledge of the mobile side and the
network side information, in order to take an intelligent
and optimized decision [1]. However, the role of users in context-aware handovers has been limited to expressing their preferences and requirements as decision parameters. The telecom market is facing a migration
from network centricity towards user-centricity [2] and
this emphasizes that users should have greater control
over automatic handover decision to select the access
network with which they are most satisfied.
In recent communication networks, the Quality of
Service (QoS) approach is increasingly being replaced by the Quality of Experience (QoE) approach, which is
defined as the overall acceptability of the service as
perceived by the user [3]. QoE covers two main aspects,
namely Quality of User Experience (QoUE) and Quality
of Customer Experience (QoCE) [4]. In this study, we
focus on QoUE or User Perceived Quality (UPQ). UPQ is
not only related to network QoS factors, but also to the
user preferences and application requirements [5] [3], the
capabilities of mobile device [6] [3] [7], environmental
conditions (e.g. surrounding audio interference in
conversations or light conditions for videos) [8] [7] and
subjective factors such as the emotional and
psychological state of user [6] [3]. In this paper, UPQ has
been considered as a novel and noteworthy metric for handover decision that embodies a wide range of context knowledge.
The authors of [3] have emphasized the importance of QoE in next generation networks and have proposed a general framework for end-to-end QoE assurance, where QoE is considered as a metric for network and application management and adjustment. In [9], the authors have demonstrated the importance of defining the concept of UPQ and linking it to specific wireless data network parameters. In [10], UPQ has been considered beside the QoS parameters for adaptive configuration of protocols.
To the best of our knowledge, only a few works have employed the UPQ metric in the handover procedure. Reference [11] merely exploits UPQ degradations as a trigger for handover initiation, and [63] employs the minimum QoE of ongoing users in candidate networks as an indicator for target PoA selection, accepting the signalling overhead of broadcasting the minimum estimated QoE to Mobile Nodes (MNs).
The handover decision may be performed in a
centralized fashion or delegated to MNs. The former requires the mobile-side context information to be transferred to the network, while the latter requires the network-side context to be obtained by MNs. Herein,
gathering information of different access networks,
especially the QoS parameters which are rapidly
changing, is a complication for context-aware decision
methods. While some of the related works (e.g. [12],
[13], [11], [14]) have not addressed context gathering,
some others (e.g. [15], [16]) have presented complicated and inefficient mechanisms for it. The context transfer
overhead is the drawback of context-awareness with
respect to the concept of green wireless technology and
MNs' power restriction. Therefore, some recent studies (e.g. [43], [55]) have tried to reduce this
overhead although some signalling overhead is
unavoidable.
To indirectly exploit the context information related
to user satisfaction level and to avoid the overhead of
gathering the dynamic network context in context-aware
Mobile Controlled HandOvers (MCHO), a personalized
QoE-aware handover decision method has been
introduced in this paper. In the proposed method, MN is
responsible for handoff initiation and target Point of
Attachment (PoA) selection. The main contribution of
this paper is an adaptive target PoA selection that learns its strategy from UPQ feedbacks without significant signalling. The UPQ degradations may be due to the
behaviour of MN (e.g. moving out of the coverage of the
PoA) or due to the behaviours of other MNs in its society
(where their behaviours affect the QoS of the serving
PoA). In other words, a handoff decision performed by an
MN may affect the quality of others' perception. Hence,
the target selection mechanism should learn its skill like
an agent in a multi-agent environment regarding both the
behaviour of mobile user (mobility) and the behaviour of
other MNs (mobility and entrance/exit). In this way, a tight combination of mobility prediction and context-awareness is formed in target PoA selection.
The proposed target selection mechanism is based on
Multi-Agent Reinforcement Learning (MARL) concept
which considers the UPQ parameter for its training stage.
The intelligent handover decision should be such that the
shared resources of access networks are exploited
efficiently and without additional communications
between MNs. This problem needs a combination of
cooperation and competition between isolated agents
(decision makers). Accordingly, the WoLF-PHC [17] algorithm has been employed as an adaptive and convergent algorithm for learning the handover decision skill.
This paper exploits such an intelligent handover
decision mechanism in a handover management model
proposed under the Media Independent Handover (MIH) framework. The paper presents details of the algorithms performed upon reception of different MIH-originated handover triggers and also the UPQ degradation trigger.
The remainder of this paper is organized as follows. In the next section, the research background is presented. Section 3 reviews previous work on handover decision. Section 4 presents our proposed method, and section 5 demonstrates simulation results. Section 6 discusses the advantages and shortcomings of the proposed method, and finally the paper is concluded in section 7.
2 Research Background
In this section, after introducing the MIH framework,
the UPQ evaluation methods are described and in
subsection 2.3, a survey of MARL methods is presented.
2.1 Media Independent Handover
IEEE 802.21 [18] is a recent IEEE effort aiming to provide a general interface, called MIH, for handover and interoperability between heterogeneous networks. One of the main ideas behind IEEE
802.21 is to provide a common interface for managing
events emerged from different network devices and
dispatching control messages to them [19]. The standard
specifies the MIH Function that is responsible for this
generalization and provides three primary services as
below [19]:
- The MIES (Media Independent Event Service), which provides support for both local and remote link layer event notifications to the upper layers of an MN.
- The MICS (Media Independent Command Service), which is used to gather information about the status of connected links and to execute mobility and connectivity decisions at layer 2.
- The MIIS (Media Independent Information Service), which provides discovery and distribution of network information within a geographic area. The role of MIIS is to provide information about the available networks and PoAs through Information Elements (IEs) accessed from an Information Server (IS).
These services are provided to any mobility and
handoff management method in upper layers (namely
MIH Users). Note that the 802.21 standard neither specifies rules or policies for handover decision nor determines whether the handover has to be terminal- or network-initiated [20].
2.2 User Perceived Quality Evaluation
As justified in section 1, attending to the UPQ is valuable for improving handover performance in next generation wireless networks. One of the significant trends in QoE estimation over the last few years concerns multimedia traffic. Referring to the level of quality experienced by the user of multimedia services, QoE has been recognized as a key measure for multimedia transmission in traditional and recent telecommunication
standards such as ITU-T P.800 [21], ITU-T Rec. J.246
[22], G.1070 [23], and J.247 [24]. Moreover, many
efforts addressed in the literature have focused on the
estimation of QoE for audiovisual traffic. Two main
directions are being explored: objective and subjective
testing of perceived QoS. The methodology to capture the
subjective quality of perception for speech is based on
MOS (Mean Opinion Score) proposed by ITU-T in P.800
[21]. In P.910 [25], the main recommendations regarding
MOS methods have been described for video quality. To
determine MOS, a number of listeners/viewers rate the
quality of audio/video transmitted through a
communication system. Ranging from 1 (worst) to 5 (best), MOS is the arithmetic mean of all the individual scores [26]. Although the subjective
method is more realistic, the complexity and cost of the
required tests usually make it laborious [26].
In a sporadic form of subjective assessment, network users may individually report their UPQ for personalized use. However, direct UPQ feedback of
users adds interaction complexity to users and system.
Hence, estimating UPQ from the content and also the
behaviour of the user is preferred in some applications.
A number of standards and models have been
specified in the ITU-T for objective evaluation of video
and audio quality. Examples of traditional standards are P.861 [27] and P.862 [28] for evaluating voice quality, and J.144 [29] for video quality
assessment. Recently, several new standards have also
been defined within ITU-T for objective UPQ evaluation
including G.1070 [23], J.246 [22], and J.247 [24]. In [30]
a decision-tree bas ed method has been proposed to model
the dependency of network and application QoS to QoE
using subjective quality feedbacks. In [7], Bayesian
networks are employed for QoE modeling based on a
variety of context parameters. Authors of [31] have
proposed an exponential relation that estimates QoE from
QoS parameters. Reference [32] offers a combination of
objective and subjective measures to better manage the
complexity of QoE metric.
The objective UPQ assessment methods can be categorized into full-reference, no-reference, and reduced-reference evaluation. Full-reference methods (e.g. J.247
for video and P.862 for audio) compare the received
audiovisual stream against original stream and check for
differences. In contrast, no-reference or reference-free
methods (e.g. [33]) analyze the received stream without
any comparison. Between these two extremes are
reduced-reference methods (e.g. J.246 for video) that do
not consider the original stream, but require some
characteristics describing it. Although full-reference
methods have greater precision, their high processing requirements and their need for the original multimedia content prevent them from being used in real-time (in-service)
applications such as handover decision. Hence, full-
reference methods are only useful for pre-service quality
assessments and simulations. Instead, high performance
no-reference methods can be used for in-service
applications such as real deployment of the method
proposed in this paper.
As illustrated in [34], there are three classes of
objective voice quality evaluation metrics: the network
parameter based metrics, the psycho-acoustic metrics and
the elementary metrics. The network parameter based
metrics do not consider the voice signal, while the
psycho-acoustic metrics transform the voice signals to a
reduced representation to retain only the perceptually
significant aspects. In contrast, elementary objective metrics rely on low-complexity signal processing techniques to predict the subjective voice quality with respect to the original signal. The Segmental Signal to Noise Ratio (SSNR) is a simple and widely used elementary metric for objective evaluation of voice quality [34].
A similar classification can be imagined for objective video quality assessment, in which psycho-vision methods
try to recognize or predict impairments by analyzing the
inherent characteristics of video such as blockiness or
jerkiness. Some of these methods evaluate the quality in
the spatial domain after decompressing the video stream
while some others perform this evaluation faster in
compressed domain (e.g. [35]). In contrast, elementary
metrics require the original video to estimate the UPQ
using simple image processing techniques. The most
widespread elementary method for video quality
assessment is the calculation of Peak Signal to Noise
Ratio (PSNR) [36] that performs better than most of the
traditional objective methods [64]. In this paper, video traffic and the PSNR metric have been used to evaluate our proposed handover decision method, as described in section 5.
2.3 Multi-Agent Reinforcement Learning
Reinforcement learning is a well-known method for solving problems in which a single agent learns to choose optimal actions to achieve its goal. Here, the task of the system is to learn a target function π that maps the current state s to the optimal action a = π(s). The learner is only provided with a reward value after executing an action, and the goal of the system is to attain the optimal target function π* that maximizes the overall reward.
Q-Learning is one of the algorithms for learning π*
that is based on an evaluation function Q(s, a). The value
of Q(s, a) should be the maximum cumulative reward that
can be achieved starting from state s and performing
action a as the first action. In other words, the amount of
estimated reward obtained upon executing action a in
state s (namely Q(s,a)) depends not only on the
immediate reward from executing a, but also on the
rewards gained from later actions that will be possible
after executing a. Q is assumed to be an n by m matrix, where n is the number of states and m is the number of possible actions, and its elements must be maximized during the learning phase. There are several algorithms for updating the Q matrix; an update rule for non-deterministic environments is as follows [66]:

$Q_{t+1}(s,a) = (1-\beta)\,Q_t(s,a) + \beta\left(r + \gamma \max_{a'} Q_t(s',a')\right)$   (1)

where β is a small constant called the learning rate, r stands for the reward achieved after executing action a, s' is the new state after executing this action, and γ ($0 \le \gamma \le 1$) is the discount factor.
In Q-Learning, the strategy of the agent is to greedily select the action $a = \arg\max_{a'} Q(s,a')$ for the current state s. Therefore, $\max_{a'} Q_t(s,a')$ in relation (1) represents the reward that may be obtained assuming the greedy strategy. As this future reward is an expectation, it is added to the current reward r weighted by the discount factor. Since the environment is non-deterministic, a learning rate is also used so that the value of Q(s,a) is updated smoothly toward the new estimate rather than replaced directly.
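As an illustration of relation (1), the following minimal tabular Q-learner applies the non-deterministic update rule; the class and method names are illustrative, and states and actions are assumed to be hashable values.

from collections import defaultdict

class QLearner:
    """Tabular Q-Learning with the update of relation (1)."""

    def __init__(self, actions, beta=0.1, gamma=0.9):
        self.actions = list(actions)
        self.beta = beta      # learning rate
        self.gamma = gamma    # discount factor, 0 <= gamma <= 1
        self.q = defaultdict(float)  # Q(s, a), keyed by (state, action)

    def greedy_action(self, s):
        # Greedy strategy: a = argmax_a' Q(s, a').
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s_next):
        # Q(s,a) <- (1 - beta) Q(s,a) + beta (r + gamma max_a' Q(s',a')).
        best_future = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] = ((1 - self.beta) * self.q[(s, a)]
                          + self.beta * (r + self.gamma * best_future))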
The Stochastic Game (SG) framework can be introduced as a multi-agent, multi-state representation of the environment. A Stochastic Game is a tuple $(n, S, A_{1..n}, T, R_{1..n})$, where n is the number of players (agents), S is the set of states, $A_i$ is the set of actions available to player i (and $A = A_1 \times \dots \times A_n$ is the joint action space), $T: S \times A \times S \to [0,1]$ is the transition function, and $R_i: S \times A \to \mathbb{R}$ is the reward function of the ith agent [17]. A player's strategy may be a pure strategy, which selects actions deterministically, or a mixed strategy, which selects actions according to a probability distribution over all possible actions. In the mixed-strategy game considered in this paper, a stationary policy is to
be learnt as $\pi: S \times A \to [0,1]$, mapping states to a probability distribution over actions such that the player's discounted future rewards are maximized.
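As a small illustration of a mixed strategy, the sketch below samples an action from a probability distribution π(s, ·); the dictionary representation of the policy row is an assumption made for brevity.

import random

def sample_action(policy_row):
    """Draw an action from a mixed strategy pi(s, .) given as a
    {action: probability} mapping whose values sum to one."""
    r = random.random()
    cumulative = 0.0
    for action, probability in policy_row.items():
        cumulative += probability
        if r <= cumulative:
            return action
    return action  # guard against floating-point round-off

# A pure strategy is the degenerate case with probability 1 on one action.
print(sample_action({"PoA_1": 0.2, "PoA_2": 0.5, "PoA_3": 0.3}))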
MARL algorithms are learning algorithms used by agents to find an optimal strategy for SGs. Two
properties are desirable for an MARL algorithm. First, it must be rational, i.e., if the other players' policies converge to stationary policies, the learner converges to a policy that is a best response to the others' policies. Second, the learner must be convergent, i.e., it certainly converges to a stationary policy. If all players use a rational and convergent learning algorithm, then the players are guaranteed to converge to a Nash Equilibrium [17].
MARL algorithms can be classified along several
dimensions such as task type. The type of task targeted by
the learning algorithm leads to the classification of
MARL techniques into those addressing fully
cooperative, fully competitive, or mixed SGs [37]. In a
fully cooperative SG, the agents have the same reward
and the learning goal is to maximize the common reward.
In fully competitive or zero-sum SGs (two-player), one's
reward is always the negative of the other's. However, in
mixed games, there is no constraint on the reward of the
agents. The mixed SG model is appropriate for self-
interested agents [37]. Figure 1 shows this taxonomy and
sample MARL algorithms in each category for both static
and dynamic games. Static (or stateless) games are those SGs with S = ∅.
From another point of view, there are two forms of MARL: isolated learning and interactive learning.
In the isolated form of learning, each agent learns to
optimize its reinforcement from the environment without
any communication. In contrast, interactive agents
explicitly communicate to decide on individual and group
actions [38]. An example of an isolated learning algorithm is WoLF-PHC, while Team-Q, AWESOME, and Nash-Q are not communication-free.
Also, MARL algorithms can be classified into homogeneous and heterogeneous algorithms. In
homogeneous algorithms, all the agents should play the
game with the same learning method (self-play) while in
heterogeneous algorithms, some agents may have
different rational methods for action selection. Examples
of self-play algorithms are Team-Q and Nash-Q while
AWESOME and WoLF-PHC are not self-play [37]. In
the handover decision problem, using learning algorithms that are suitable for heterogeneous environments allows some of the MNs to use traditional decision methods. Assuming that those traditional methods make rational decisions, the distributed learning algorithm converges to
the optimal solution in such environments too. Based on these characteristics, the WoLF-PHC algorithm has been chosen for our proposed method, which is described further in section 4.
3 Related Works
Traditional handoff decision methods such as [39, 40, 41, 42, 44] mostly used link and trajectory information. However, the next generation of handover decision methods addressed more metrics in decision making. For
example, a multi-service vertical handoff decision
algorithm has been introduced in [45] to judge target
networks based on a wider variety of user and network
metrics including QoS parameters. MADM (Multi Attribute Decision Making) methods are well-known decision techniques that have been employed in many handover decision schemes. A comparison of some
MADM methods has been reported in [46] considering
bandwidth, delay, jitter, and Bit Error Rate (BER) as
decision parameters. Zhang has proposed a fuzzy MADM
based vertical handoff in [47]. On the same ground,
Hongyan et al. [48] and Chan et al. [49] have proposed
fuzzy based MADM processes to perform PoA and
interface selection based on the cost constraints and
application priorities specified by users.
Figure 1. A taxonomy of MARL algorithms based on task type [37]. [The figure shows a tree splitting MARL algorithms into fully cooperative (static: JAL, FMQ; dynamic: Team-Q, Distributed-Q, OAL), fully competitive (Minimax-Q), and mixed games (static: Fictitious Play, MetaStrategy, IGA, WoLF-IGA, GIGA, GIGA-WoLF, AWESOME, Hyper-Q; dynamic: single-agent RL, Nash-Q, CE-Q, Asymmetric-Q, NSCP, WoLF-PHC, PD-WoLF, EXORL).]
Recent handoff management methods are context-aware; that is, they look into a wider knowledge of the underlying context and its changes. The context
information is usually classified into network-side
information and mobile-side information. Depending on
location of handover decision (MN or access network),
part of this context must be transferred there. Gathering
context information is a difficulty for context-aware
handovers and adds complexity and signalling overhead
to MNs and access networks.
Authors of [50] have proposed a context-based
application handover that exploits user’s presence,
location, available network interfaces, network
availability, network priority, communication status,
terminal features, and installed applications. Tramcar [12]
is a cross-layer context-aware architecture that utilizes
price, power consumption, network conditions, user
preference, and network performance in an analytic
decision function. The decision function employs dynamic context information from accessible networks; however, the method of gathering this part of the context is not specified.
Authors of [57] have presented a context-aware
handoff mechanism for ubiquitous computing
environments that includes an MADM-based and a
Genetic Algorithm (GA) based target PoA selection
service. The MADM-based method aims at providing higher QoS performance, while the goal of the GA-based method is to reduce the number of handovers while still satisfying the requirements. The GA-based method takes the mobility of the user into account.
The handover procedure in [51] includes a handover
trigger method which is based on context changes (MN
exits or enters a cell; QoS degrades below acceptable
threshold, or the user requests a handover). It also proposes QoS-based target selection and the adaptation of communication streams. A Context Manager in that
method gathers, manages, and evaluates context
information. However, being aware of network context modifications (i.e. QoS parameters) imposes heavy signalling overhead due to the rapidly varying nature of network resources.
Ahmed et al. have proposed the architecture of a
context-aware mobile-initiated and controlled vertical
handover decision model [13]. The proposed access network evaluation and ranking is a five-stage process based on the Analytic Hierarchy Process (AHP). Similarly, the
authors of [52] have proposed a combined decision
method which uses Fuzzy Logic to decide about the
handover initiation and AHP to decide about the target
access network. Those papers have not clearly described
the context collection mechanism.
In [11] a policy-based handover management method
has been presented. The handover decision is performed in the backbone of the wireless network, and the authors have assumed a network context monitor to obtain access network information. However, details of gathering
network side context are missing. The proposed scheme
in [53] considers context of services and user intention, in
addition to network information. That paper has discussed the method of gathering application requirements and user intention; however, network context gathering has not been considered.
In [14], an autonomic handover manager has been proposed that is based on autonomic computing.
They have considered a context server in the backbone that collects the network information from context
repositories distributed in different access networks and
provides them to MNs. However, they have stated that:
"this is not currently realistic because it is difficult and
rather impossible to share the information among the
service providers [14]." Nonetheless, they have assumed
that such a context gathering mechanism will be possible
in 4G networks.
In [15], a general framework has been put forward for context-aware handover decision. In their architecture,
handover decision points are responsible for deciding
about handover destination while context collection
points collect, compile, and deliver the relevant context
information to the handover decision points. Reference
[16] presents an integrated approach for context
management based on active networking technology.
However, this method increases the complexity of MNs
and adds signalling overhead.
MIH framework provides static context of access
networks through its MIIS service. However, some of the
works in the literature have extended the MIH standard to
provide other context parameters such as dynamic
network context. An enhanced MIH framework has been introduced in [54] to gather more context information; however, this method is more complex compared to the standard MIH framework and imposes heavy signalling overhead on the network. Authors of [56] have exploited
extended MIH_MN_HO_Candidate_Query and
MIH_N2N_HO_Query_Resources MICS primitives
defined by IEEE 802.21 to ask the neighbouring PoAs
about their dynamic information.
Authors of [55] have proposed an extended context-
aware information server for MIH-based handover. That
paper has specifically focused on dynamic network
context gathering and an information update algorithm
has been presented for PoAs to send their context to IS.
The drawback of this method is that small variations of the resources (e.g. due to the unsteady nature of variable bit rate traffic) result in a huge number of update packets when the access networks' resources are near saturation. The authors have proposed a handoff-aware
network context gathering based on MIH framework in
[43] to reduce the signalling overhead of dynamic
network context gathering. In that method, PoAs update their resource information whenever an MN attaches to or detaches from them, so the signalling overhead is still considerable.
That paper also proposes the pre-fetching of dynamic
network context before handover decision starts.
The consideration of the user in previous context-aware methods is mostly limited to applying the user's requirements and preferences as thresholds and weights. Authors
of [2] have emphasized a user-centric approach for handover decision. However, the user preference vector
is the only user relevant context in their method which is
modified according to the situation of user and the class
of application. Reference [11] has considered UPQ as a
novel user relevant parameter just for handover detection.
In [63], estimated QoE of ongoing users in candidate
networks has been considered as an indicator to select the
best access network. In that method, PoAs estimate the
minimum QoE of users using pseudo-Subjective Quality
Assessment (PSQA) method and broadcast it to all users
within their range. This estimate is used beside price and
mobility metrics for target PoA selection. However, it implies heavy signalling overhead, and the estimated QoE is not user or application dependent.
From the above review, the following can be concluded:
1) Although the contribution of users to recent handover decision methods has increased, this contribution is mostly limited to expressing user preferences and requirements. UPQ has not received due attention in handover decision; in particular, the UPQ metric has never been utilized for target PoA selection.
2) Context gathering and management is a major difficulty of context-aware handovers, and this difficulty is even more challenging for dynamic network context. We will show that addressing the UPQ metric exempts the handover decision from being
aware of that part of the context and its gathering complexity.
3) The movement information of users has not usually been considered in recent context-aware methods. Although mobility prediction has been considered in some previous context-aware works (e.g. [57, 58]), using movement information beside other context parameters in the handoff decision remains a shortcoming of context-aware handovers. We will show that our proposed method inherently learns the trajectory of MNs beside other parts of the context for target PoA selection.
4 Proposed Handoff Decision
In this section, we introduce the proposed mobile
controlled personalized handover management model that
employs the UPQ metric for handover decision.
Since UPQ reflects the quality of attachment to a PoA only after handing over to it, it cannot be used for target PoA selection directly. One may imagine that the UPQ level of other users in neighbouring PoAs is a good metric for target PoA selection; however, accessing the UPQ level of other users is not practical due to its complexity and heavy signalling overhead. Therefore, reinforcement learning methods are a suitable solution for employing such a parameter in handover decision indirectly. The underlying network is a shared resource exploited by MNs (like a multi-agent system), so the quality of perception is related not only to the decisions of the MN itself, but also to the handovers of other MNs. The decision mechanism should learn its skills while being indirectly aware of the behaviour of other MNs. Hence, a suitable MARL algorithm has been utilized for target PoA selection to take this awareness into account inherently.
The proposed model includes an Adaptive Handoff Decision (AHD) module and a Perceived Quality Evaluation (PQE) module, as shown in Figure 2. The AHD module determines the appropriate time of handover (from events) and also the target PoA (using the Learner). Handoff triggers are constructed from two sources: link layer events and the UPQ degradation event. The MIH framework provides link events for handoff triggering. Likewise, the PQE module is responsible for providing the UPQ level to AHD and also for initiating handoff triggers whenever UPQ degrades below a satisfying level. Users can subjectively report the UPQ level and UPQ degradations to PQE; otherwise, PQE evaluates the UPQ level objectively and compares it to a defined threshold for handoff initiation triggers. AHD employs an MARL algorithm (the Learner component) to decide about the target PoA, where the UPQ metric is used as a reward/penalty to train it. AHD applies the result of the handover decision through MIH and also reports it to higher layer mobility management protocols (such as MIP).
In the remainder of this section, we introduce the employed learning algorithm and the state space representation in sections 4.1 and 4.2, respectively. Subsection 4.3 describes the proposed extensions to MIH, and finally, details of the event handlers are discussed in subsection 4.4.
Figure 2. The proposed model for personalized handover decision in mobile nodes. [The figure shows a block diagram of the MN: the link layer (UMTS/802.11/802.16/...) under the MIH Function (MIES, MIIS, MICS); the AHD module, containing the Event Handlers and the Learner; the PQE module, fed by the multimedia application and supplying the UPQ level/UPQ trigger to AHD; and the mobility management protocols (MIP, SIP, ...), which receive the decision result.]
4.1 Employed MARL Algorithm
The target PoA selection problem needs a combination of cooperation and competition between isolated MNs that change their location over time. Therefore, this
problem best matches the definition of dynamic mixed SGs. Also, regarding the communication overhead required to share actions, rewards, and other structures in interactive learning algorithms, isolated learning methods are the acceptable ones for handover decision.
As stated in [59], single-agent RL algorithms such as Q-Learning can be directly applied to mixed games. However, "non-stationarity of the problem invalidates most of the single-agent RL theoretical guarantees [37]". Considering the overview presented in section 2.3, the WoLF-PHC [17] algorithm is an isolated, heterogeneous learning algorithm for dynamic mixed SGs and has been chosen for our proposed handover decision model.
WoLF-PHC is an improvement on the PHC (Policy Hill Climbing) algorithm that adds the convergence property to PHC's innate rationality. PHC is an extension of Q-Learning that maintains the same Q matrix, with values updated as in single-agent Q-Learning. In addition to basic Q-Learning, PHC maintains a policy matrix π containing the probability of selecting each action in the current state. It performs hill-climbing search in the space of mixed policies [17] with respect to the Q values. The policy matrix is therefore updated according to relation (2), where δ ∈ (0, 1] is a learning rate and |Ai| is the number of possible actions for the ith agent.
$$\pi(s,a) \leftarrow \pi(s,a) + \Delta_{sa},\qquad
\Delta_{sa} =
\begin{cases}
-\delta_{sa} & \text{if } a \neq \arg\max_{a'} Q(s,a')\\
\sum_{a' \neq a} \delta_{sa'} & \text{otherwise}
\end{cases}
\qquad (2)$$

where $\delta_{sa} = \min\left(\pi(s,a),\ \frac{\delta}{|A_i|-1}\right)$.
In relation (2), the probability of selecting action a in state s is updated with respect to the learning rate. If, after execution of action a, the value of Q(s,a) is the maximum among the elements of the sth row of Q, this probability is increased to improve the chance of selecting this action in the future. The probability is diminished if the action has not caused Q(s,a) to be the maximum among the elements of the sth row. This is the idea of hill climbing in policy space, where the agents greedily increase the chance of those actions showing a better contribution to the cumulative reward.
PHC is rational, and the proof follows from that of Q-learning [17]. However, PHC is not always convergent and fails to reach the Nash Equilibrium in some cases [17].
The WoLF (Win or Learn Fast) principle aids PHC in convergence by giving other players more time to adapt to changes in our player's strategy that at first appear beneficial, while allowing the player to adapt more quickly to other players' strategy changes when they are harmful. Algorithm 1 shows WoLF-PHC in detail. In essence, WoLF makes agents learn quickly when they are losing and cautiously when they are winning. Winning or losing is determined from the average policy matrix (π̄), which is an approximation of the equilibrium policy [17]. WoLF-PHC therefore defines two learning rates, δl for losing and δw for winning, where δl > δw. This technique causes the agent to adapt more quickly when it is performing poorly, and to be careful when it is performing better than expected, since other agents may change their policies suddenly [17]. WoLF-PHC is still rational, since only the speed of learning changes compared to PHC. This rationality, along with the convergence property, guarantees attaining the Nash Equilibrium. Details of the proofs on how the WoLF idea guarantees convergence to the equilibrium are out of the scope of this paper; the reader may refer to [17], which also shows the feasibility of WoLF-PHC through some practical examples.
The consequences of the WoLF principle are beneficial for the handover problem, where MNs are eager to learn the best strategy quickly. This paper employs the WoLF-PHC algorithm for adaptive learning of target PoA selection in the proposed handover decision model. It must be noted that Algorithm 1 is not very time consuming, as it belongs to the class of linear time complexity algorithms. Therefore, using it in the proposed model does not impose a
heavy computational load on MNs. Using WoLF-PHC, MNs do not need to exchange any information, since each MN learns the others' policies from its directly observed reward and its average policy matrix. It should be noted that converging to the equilibrium is time consuming in stochastic games with mixed strategies. Although WoLF enhances the convergence speed by using variable learning rates, convergence speed is still a restriction that needs more research.
Algorithm 1: WoLF-PHC algorithm for agent i
0. Let β ∈ (0, 1] and δl > δw ∈ (0, 1] be learning rates.
1. For each s ∈ S and a ∈ Ai:
   Q(s, a) = 0; π(s, a) = 1/|Ai|; C(s) = 0
2. Repeat forever:
   a. Select action a according to mixed strategy π(s) and execute it.
   b. Obtain the relevant reward r.
   c. Observe the new state s'.
   d. Update Q according to relation (1).
   e. Update the estimate of the average policy π̄:
      C(s) ← C(s) + 1
      ∀a' ∈ Ai: π̄(s, a') ← π̄(s, a') + (π(s, a') − π̄(s, a')) / C(s)
   f. Update π(s, a) as in relation (2), where
      δ = δw if Σa' π(s, a') Q(s, a') > Σa' π̄(s, a') Q(s, a'), and δ = δl otherwise.
   g. s ← s'
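For concreteness, the following compact Python sketch mirrors Algorithm 1 for one agent; the class layout, the default hyper-parameter values, and the dictionary representation of the Q, π, and π̄ matrices are illustrative assumptions (at least two actions are assumed).

import random
from collections import defaultdict

class WoLFPHC:
    """A sketch of Algorithm 1 (WoLF-PHC) for a single agent."""

    def __init__(self, actions, beta=0.1, gamma=0.9, delta_w=0.01, delta_l=0.04):
        self.actions = list(actions)
        self.beta, self.gamma = beta, gamma            # relation (1)
        self.delta_w, self.delta_l = delta_w, delta_l  # delta_l > delta_w
        uniform = 1.0 / len(self.actions)
        self.q = defaultdict(float)                    # Q(s, a)
        self.pi = defaultdict(lambda: uniform)         # pi(s, a)
        self.avg_pi = defaultdict(lambda: uniform)     # average policy
        self.count = defaultdict(int)                  # C(s)

    def select(self, s):
        # Step 2.a: sample an action from the mixed strategy pi(s, .).
        return random.choices(self.actions,
                              weights=[self.pi[(s, a)] for a in self.actions])[0]

    def update(self, s, a, r, s_next):
        # Step 2.d: Q update of relation (1).
        best_future = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] = ((1 - self.beta) * self.q[(s, a)]
                          + self.beta * (r + self.gamma * best_future))
        # Step 2.e: update the average policy estimate.
        self.count[s] += 1
        for b in self.actions:
            self.avg_pi[(s, b)] += (self.pi[(s, b)] - self.avg_pi[(s, b)]) / self.count[s]
        # Step 2.f: WoLF rule, learn fast (delta_l) when losing.
        winning = (sum(self.pi[(s, b)] * self.q[(s, b)] for b in self.actions)
                   > sum(self.avg_pi[(s, b)] * self.q[(s, b)] for b in self.actions))
        delta = self.delta_w if winning else self.delta_l
        # Relation (2): hill-climb toward the greedy action.
        best = max(self.actions, key=lambda b: self.q[(s, b)])
        step = delta / (len(self.actions) - 1)
        for b in self.actions:
            if b != best:
                moved = min(self.pi[(s, b)], step)
                self.pi[(s, b)] -= moved
                self.pi[(s, best)] += moved

In the proposed model, the actions would be the candidate PoAs and r the delayed UPQ reward of relation (5) introduced in section 4.4.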
4.2 State Space Representation
Before using the MARL method, the state space in which the MN lies should be modeled. In the simplest
way, the state of the MN can be determined based on the
value of the following spatial and temporal context
parameters:
- Currently serving PoA address
- RSS level of currently serving PoA
- Time (Days of a week and hours of a day)
Serving PoA and its RSS level enable the learner to
approximately distinguish between different locations in
the environment. As RSS is a continuous parameter,
some thresholds have been employed for it (with respect
to PoA configuration) to segregate the state space.
Temporal information is also used to discriminate between different states that may arise from the time-varying context and the time-varying behaviour of mobile users.
When no positioning service is available to MNs, the serving PoA and RSS level are the best choices for determining the state of the MN. Assuming those parameters, each MN needs to know all of the PoAs and their RSS thresholds, which are provided through the MIH framework as discussed in the next subsection.
However, the global position of the MN can be utilized to discriminate different locations more accurately. This requires the mobile devices to be GPS-enabled (for outdoor applications) or sensors to detect the location of mobile devices (e.g. in ubiquitous network models).
location, the map of the environment is required. This
study adopts Spatial Conceptual Map (SCM) [60] model
which transforms the real map into an abstract view. The
SCM has already been used in some previous works for
mobility prediction [58] or path planning [57]. An SCM
contains representation of landmark objects, Oi (e.g.
buildings or rooms) and way areas, Wi. A way area is
partitioned into a set of Way Elementary Areas (WEA),
αi according to landmark objects and way crossings [58].
In this paper, we also partition the way areas according to
the coverage of PoAs such that the list of available PoAs
does not change during the movement of an MN within a WEA. A characterization function (Co) is associated with each landmark object and WEA, listing the accessible PoAs as shown below:
Co(Oi) = { list of PoAs accessible in Oi } (3)
Co(αi) = { list of PoAs accessible in αi } (4)
The list of accessible PoAs is obtained from the positions and coverages of PoAs accessed through MIIS. This characterization function is used to make the learner select the target PoA only from the PoAs accessible at the current location of the MN. Therefore, the following
parameters are proposed for state space representation if a
positioning mechanism is available:
- Currently serving PoA address
- Current location of MN (Oi or αi)
- Movement direction of MN (if is available)
- Time
With the aid of SCM model, the movement direction of
MNs can also be utilized for state space representation. It
is noteworthy that the movement direction of MNs is limited with respect to the orientation of WEAs in the SCM. For example, in an east-west WEA, the movement direction of an MN may be east or west, and two states are separable by movement direction.
One set of matrices ( ) is assumed per each user
and per each service. The first dimension of th ese
matrices is the number of possible states (N) and the
second dimension is the number of all PoAs in the
wireless domain (M). So, they are N by M matrices where
rows are representing states and columns are representing
PoAs as possible actions.
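As an illustration of this state space, the sketch below encodes the spatial and temporal parameters into a row index of those matrices and restricts the candidate actions through the characterization function Co; all names and the specific encoding are assumptions made for illustration.

def state_index(serving_poa, rss, weekday, hour, rss_thresholds):
    """Map (serving PoA, quantized RSS, day of week, hour of day) to a
    single row index. serving_poa is an integer PoA identifier and
    rss_thresholds is the per-PoA list from the RSS Thresholds List IE."""
    rss_band = sum(1 for t in rss_thresholds if rss > t)  # quantize RSS
    bands = len(rss_thresholds) + 1
    return ((serving_poa * bands + rss_band) * 7 + weekday) * 24 + hour

# With SCM positioning, the actions are limited by the characterization
# function of the current WEA or landmark object (relations (3)-(4)):
co = {"alpha_3": ["PoA_1", "PoA_2"], "O_7": ["PoA_2"]}
candidate_actions = co["alpha_3"]  # the learner selects only among these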
4.3 Extensions to MIH
Knowing all possible states of the environment requires complete knowledge of all PoAs in the wireless domain. To resolve this, a simple approach is to provide all accessible PoAs of the geographical domain in a newly defined IE container (ALL_PoAs_Report).
Hence, a new request message with a unique Type
identifier (Type_ALL_PoAs_Report) has been defined as
shown in Table 1. In response to that request, a reply
message is returned that includes all PoAs and some
information about them. The format of the response TLV
has been shown in Table 2. All of the IEs used in the proposed response message are ones defined in IEEE 802.21, except two that are proposed here. One of the newly defined IEs is the RSS Thresholds List, which includes the RSS thresholds used for separating different states as described in 4.2. The other one includes the neighbouring PoAs of each PoA. This container is requested from the IS once at the beginning to identify the wireless domain and is used for creating the learner matrices.
Table 1. All PoAs Request TLV
Type: Type_ALL_PoAs_Report
Length: Variable
Value: MN MAC Address
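A minimal sketch of how the request of Table 1 could be serialized as a Type-Length-Value triple is shown below; the numeric type code, the field widths, and the helper name are assumptions, since the paper only defines the logical format.

import struct

TYPE_ALL_POAS_REPORT = 0x70  # placeholder: the real Type identifier is
                             # whatever the extended standard would assign

def build_all_poas_request(mn_mac: bytes) -> bytes:
    """Pack the Table 1 TLV: a one-byte Type, a two-byte Length (network
    byte order), and the MN MAC address as the Value."""
    return struct.pack("!BH", TYPE_ALL_POAS_REPORT, len(mn_mac)) + mn_mac

request = build_all_poas_request(bytes.fromhex("00163e2a7b01"))

The response of Table 2 would be parsed analogously, one fixed set of fields per advertised PoA.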
4.4 Handover Decision Algorithm
This section describes the handover decision algorithm, which is based on the WoLF-PHC learning method. In the
proposed handover decision, target PoA selection
(Algorithm 1; step 2.a) is simultaneously performed
along with the training process (Algorithm 1; steps 2.b to
2.f). The training/selection process contains the following
steps:
- When a handoff initiation trigger reaches the AHD
(UPQ degradation or Link Going Down), it selects the
best neighbouring PoA regarding the current state of
MN. Neighbouring PoAs are obtained from the MIIS or the SCM representation.
- AHD stores the current state, the current value of
mean UPQ level, and the selected PoA.
- The handover to the selected PoA will be performed if
the relevant link is available (or is detected in MIH
Link Detected event or scan confirm).
Table 2. All PoAs Response TLV
Type: Type_ALL_PoAs_Report
Length: Variable
Value: Number of PoAs, followed by one entry per PoA:
PoA address | PoA coverage | PoA subnet information | PoA RSS thresholds list | PoA neighbours list | PoA position
- If the handover to the selected PoA is performed, AHD has to update its matrices with respect to the earlier state and the recently selected PoA. The difference between the current and the previous (stored) mean UPQ level is used as the training reward. However, this update cannot be performed immediately after handover execution; it is postponed so that the effect of the handover can be observed over the time between the current and the next handover decision.
- If the handover to the selected PoA is not performed and the UPQ level is degrading, the AHD must still punish the learner for this selection, which probably is not in the movement path of the MN and has not been detected.
It must be noticed that the quality of each action (target PoA selection) depends not only on the variation of the mean UPQ, but also on the length of time that the user has experienced degradations or improvements due to the recent action. To account for this, AHD multiplies the variation of the mean UPQ by the time between the current and the next handover decision to obtain the training reward:

r = (NOW − Time_d) × (UPQ_NOW − UPQ_old)   (5)

where Time_d is the time at which the current PoA was selected, UPQ_old is the mean UPQ level at decision time, UPQ_NOW is the current level of the mean UPQ, and NOW is the current time at which a new decision is underway.
The training and target selection processes are divided among several event handlers. For simplicity of presentation, we assume that there is only one flow in each MN at a time. One structure, called the selection structure, is defined in these event handlers per flow. The selection structure stores the last selected PoA, the state of the MN, the mean UPQ level when the selection was performed, and the time of selection. It also contains a member that indicates whether the handover to the selected PoA has been executed yet.
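The selection structure and the delayed reward of relation (5) can be sketched as follows; the field and function names are illustrative assumptions, not taken from the implementation.

from dataclasses import dataclass

@dataclass
class Selection:
    """Per-flow selection structure described above."""
    poa: str           # last selected PoA
    state: int         # MN state when the selection was made
    upq: float         # mean UPQ level at decision time (UPQ_old)
    time: float        # Time_d, when the PoA was selected
    performed: bool    # whether the handover was actually executed

def training_reward(sel: Selection, upq_now: float, now: float) -> float:
    # Relation (5): weight the UPQ change by how long it was experienced.
    return (now - sel.time) * (upq_now - sel.upq)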
Algorithm 2 shows the detailed procedure proposed for the UPQ degradation event, which is called UPQ_Trigger.
When a UPQ_trigger event arrives while the previous
selection has not yet been evaluated (selection age is less
than Sel_Threshold), no operation is performed. In fact,
UPQ_Trigger's event handler avoids consecutive target
selections and handovers based on Sel_Threshold
parameter. Otherwise, the algorithm updates the WoLF-
PHC matrices with respect to the previous decision
(stored in selection structure) and a new PoA is selected
according to WoLF-PHC policy. The algorithm stores the
time, the selected PoA, the current level of mean UPQ,
the current state, and the flow in selection structure, and
then tries to perform the handover if the selected PoA is now accessible through an interface. Otherwise, the algorithm requests MIH to perform a link scan on the relevant interface; the selected PoA remains in the selection structure in the hope of being detected later.
The MIH Link_Going_Down event handler is similar to that of UPQ_Trigger. Although a UPQ degradation trigger is usual for flows that are using a going-down link, sometimes no UPQ degradation is reported, depending on the transferred content (e.g. for a series of MPEG video frames that differ little from each other). Therefore, the MIH Link_Going_Down event needs to be handled as well.
The situation of the Link_Down event is more subtle. This event occurs when a link is no longer available. Since the handover decision is based on a learning algorithm, the handler of this event should not perform any handover (during the training stage). In fact, this event is produced after the UPQ_Trigger or Link_Going_Down events, where a decision about the handover has already been taken. If that decision is not effective, the learner should be punished to improve its future selection in this state. Hence, the only task in this event may be requesting a link scan on the disconnected interface. Of course, the Link_Down event may redirect the flows of the down interface to another interface if the training process is no longer in progress or has reached a steady state.
Algorithm 2: UPQ_Trigger (flow, UPQLevel, currentState)
if (NOW - selection.time > Sel_Threshold) {
    if (selection.performed == True or UPQLevel < selection.UPQ)
        Learner.update(NOW, UPQLevel, currentState, selection)
    newPoA = Learner.select(currentState, {neighbors})
    selection.PoA = newPoA
    selection.UPQ = UPQLevel
    selection.time = NOW
    selection.flow = flow
    selection.state = currentState
    selection.performed = False
    selection.addressPrefix = AddPrefix(newPoA)
    if (typeOf(newPoA) != typeOf(flow.currentPoA)) {
        for each interface in IFManager.interfaces
            if (interface.PoA == newPoA) {
                switch flow to interface
                complete the handover in upper layers
                selection.performed = True
                go to end
            }
        MIH.scanReq(relevantInterface(newPoA))
    } else {
        if (newPoA != flow.currentPoA)
            MIH.scanReq(relevantInterface(newPoA))
        else
            selection.performed = True
    }
}
end
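For orientation, the following compact Python sketch shows what a WoLF-PHC learner of the kind invoked in Algorithm 2 through Learner.update and Learner.select can look like, following the WoLF-PHC scheme of [17]. The class interface, hyper-parameter values, and the small epsilon exploration term are illustrative assumptions, not the authors' implementation.

    import random
    from collections import defaultdict

    class WoLFPHC:
        """Sketch of a WoLF-PHC learner over candidate target PoAs."""
        def __init__(self, actions, alpha=0.2, gamma=0.9,
                     delta_win=0.01, delta_lose=0.04, epsilon=0.05):
            self.actions = list(actions)              # candidate PoAs
            self.alpha, self.gamma = alpha, gamma     # Q-learning rates
            self.delta_win, self.delta_lose = delta_win, delta_lose
            self.epsilon = epsilon                    # exploration probability
            n = len(self.actions)
            self.q = defaultdict(lambda: dict.fromkeys(self.actions, 0.0))
            self.pi = defaultdict(lambda: dict.fromkeys(self.actions, 1.0 / n))
            self.avg_pi = defaultdict(lambda: dict.fromkeys(self.actions, 1.0 / n))
            self.visits = defaultdict(int)

        def select(self, state):
            """Sample a target PoA from the mixed policy pi(state, .)."""
            if random.random() < self.epsilon:
                return random.choice(self.actions)
            r, acc = random.random(), 0.0
            for a in self.actions:
                acc += self.pi[state][a]
                if r <= acc:
                    return a
            return self.actions[-1]

        def update(self, state, action, reward, next_state):
            # 1) Q-learning update driven by the delayed UPQ reward of Eq. (5).
            best_next = max(self.q[next_state].values())
            self.q[state][action] += self.alpha * (
                reward + self.gamma * best_next - self.q[state][action])
            # 2) Update the average policy seen so far in this state.
            self.visits[state] += 1
            c = self.visits[state]
            for a in self.actions:
                self.avg_pi[state][a] += (self.pi[state][a] - self.avg_pi[state][a]) / c
            # 3) "Win or Learn Fast": learn slowly when winning, fast when losing.
            expected = sum(self.pi[state][a] * self.q[state][a] for a in self.actions)
            average = sum(self.avg_pi[state][a] * self.q[state][a] for a in self.actions)
            delta = self.delta_win if expected > average else self.delta_lose
            # 4) Hill-climb: move probability mass toward the greedy action.
            greedy = max(self.q[state], key=self.q[state].get)
            for a in self.actions:
                if a != greedy:
                    step = min(self.pi[state][a], delta / (len(self.actions) - 1))
                    self.pi[state][a] -= step
                    self.pi[state][greedy] += step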
Algorithm 3 shows the event handler for the Link_Detected event, which is emitted when an interface detects a new PoA. The algorithm configures the newly detected PoA if it has already been selected according to the learner's advice (stored in the selection structure); otherwise, the relevant interface is configured for probable future use (without considering the learner's advice), provided that the type of the interface differs from that of the selection and the interface is not currently being used by any data flow. It is also evident from the algorithm that we do not perform target selection in this event. This is because the goal of the proposed method is solely to improve the quality of perception, so the MN only considers a new PoA when its flows are endangered (in Link_Going_Down or UPQ_Trigger), in order to avoid unnecessary handovers. Handing over whenever a new PoA with better characteristics is detected may sometimes lead to quality degradations. A similar algorithm is performed for the PoAs detected in the scan confirm event.
Algorithm 3: Link_Detected (detectedPoA, interface)
if (selection.performed == False and selection.PoA == detectedPoA)
    configure interface to connect through detectedPoA
else {
    if ((typeOf(detectedPoA) == typeOf(selection.PoA)) or flow.interface == interface)
        go to end
    configure interface to connect through detectedPoA
}
end
Link_Up is another event; it occurs when a new link is configured with a PoA in the Link_Detected or Scan_Confirm event handlers. Regarding the Link_Up event handler (Algorithm 4), if the new PoA is the anticipated one stored in the selection structure, the MN's interface must be configured with a new IP address. As discussed in section 4.3, MNs obtain the address prefixes of PoAs from the IS server (Table 2), so the interface can be configured without waiting for the MIP access router discovery process. Moreover, the binding update must be accomplished for the data flow, and the performed field of the selection structure should be set once the handover is performed.
In addition to these events, the Link_Parameter_Report event is used to obtain the RSS level, which is used for state determination if the first method of state space representation has been chosen.
Algorithm 4: Link_Up (linkPoA, interface)
if (selection.performed == False and selection.PoA == linkPoA)
    - configure the interface IP address using selection.addressPrefix
    - hand selection.flow over to interface (if it is not already using it)
    - complete the MIP handover procedure
    - selection.performed = True
else if (typeOf(selection.PoA) != typeOf(linkPoA))
    - configure the IP address of interface using the MIP procedure
end
5 Simulations
To evaluate the feasibility of the proposed handoff decision method, this section presents the results of simulations performed in the NS-2.28 network simulator using the NIST mobility package [61]. The NIST mobility package provides layer-2 handover for the traditional NS-2 implementations of WiFi and WiMAX, in addition to a basic implementation of the MIH framework (event and command services). We have extended the NIST MIH to support MIIS. In the implemented IS server, we have provided a table that stores a container for all PoAs in the simulation environment and their relevant context information.
Video traffic has been chosen for our evaluation. The transferred video flows are real video streams encoded in MPEG format. To evaluate UPQ in simulation environments, the feasible approach is objective evaluation, and the PSNR method is chosen to estimate the UPQ level of the received video stream. Calculating PSNR requires the original video, which is conveniently accessible in simulations. In real implementations, however, reference-free methods (or subjective evaluation) should be used.
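For reference, for 8-bit video the per-frame PSNR between an original M×N frame F and the received frame F′ follows the standard definition:

MSE = (1 / (M × N)) × Σ_{i=1..M} Σ_{j=1..N} [F(i,j) − F′(i,j)]²
PSNR = 10 × log10(255² / MSE)  dB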
The mechanism chosen for evaluating real video quality over the simulated network is similar to the method proposed in [36]. A log file is constructed from the real transmission of an MPEG video file and used as a sender trace file (in NS-2) to generate the corresponding simulated traffic at the transmitting node. Once the receiver gets the video packets from the simulated network, a receiver trace file is generated that describes the time and status of the received packets. The receiver trace file is employed along with the sender trace file and the original video stream to continuously reconstruct the erroneous received video stream. Thus, the original and the received video streams are available for frame-by-frame PSNR calculations.
Evaluation of PSNR should be performed continuously during the simulation. In our case, the evaluation is performed every N received video frames (or the commensurate time step) to rapidly report any UPQ degradation. We do not calculate the PSNR of the entire received video stream at each time step. Instead, a moving average window with a length of W frames is used: the PSNR values of the last W received video frames are averaged and reported to the AHD at each evaluation step.
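A minimal sketch of this evaluation loop follows; the class and callback names are illustrative placeholders, and the default values of N and W are only examples (Section 5.1 uses N = 3 and W = 6).

    from collections import deque

    class UPQMonitor:
        """Report the W-frame moving-average PSNR to the AHD every N frames."""
        def __init__(self, n=3, w=6, report_to_ahd=print):
            self.n = n                       # evaluation period, in frames
            self.window = deque(maxlen=w)    # last W per-frame PSNR values
            self.since_report = 0
            self.report = report_to_ahd      # callback into the AHD

        def on_frame(self, frame_psnr):
            self.window.append(frame_psnr)
            self.since_report += 1
            if self.since_report >= self.n:
                self.since_report = 0
                self.report(sum(self.window) / len(self.window))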
In the remainder of this section, the results of two different simulation scenarios are presented. The next subsection gives the results of a typical simulation scenario that compares the proposed method to traditional and context-aware ones. Subsection 5.2 presents the results of simulating a more complex scenario using the SCM model, to evaluate the feasibility of adapting the proposed method to real environments.
5.1 Simulation of a Typical Scenario
The first simulation environment is assumed to be covered by two WiMAX base stations and four WiFi access points, as shown in Figure 3. These PoAs are connected to the router R1 via 100 Mbps trunks. The coverage radius of the WiFi access points is about 50 meters, while the coverage radius of the WiMAX stations is about 500 meters. The physical-layer propagation model is the Two-Ray Ground model. The WiFi access points operate at a data rate of 11 Mbps, and the WiMAX nodes are based on IEEE 802.16e. Three mobile nodes have been assumed in our simulations, with MIPv6 as their layer-3 mobility management protocol. These MNs are multi-interface nodes supporting both WiFi and WiMAX technologies. In addition, three fixed Crowding Nodes (CWNs) have been placed in the coverage areas of AP1, AP2, and AP3.
The MNs have different movement patterns with a fixed speed of 20 m/s. MN2 starts from position (470, 980), moves to (600, 1050), and then moves to position (1400, 1000). MN1 starts from (480, 1000) and moves to
(650, 950), and then to (1200, 1000). MN3 starts from
position (1100, 1000) and moves to position (450, 1000).
In this simulation, the first proposed method for state space modelling (section 4.2) is chosen, and one RSS threshold is used to separate the different states around PoAs. The RSS threshold is -61 dB for WiFi APs and -80 dB for WiMAX BSs, with respect to the adjusted transmission power.
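An illustrative mapping from an RSS reading to a discrete state under this first representation is sketched below; the exact state encoding of section 4.2 may differ, and the function and constant names are assumptions.

    RSS_THRESHOLD_DB = {"wifi": -61.0, "wimax": -80.0}  # thresholds of this scenario

    def rss_state(poa_id, poa_type, rss_db):
        """Map an RSS reading to a coarse state around a PoA: one threshold
        splits the area around each PoA into a near and a far region."""
        region = "near" if rss_db >= RSS_THRESHOLD_DB[poa_type] else "far"
        return (poa_id, region)   # e.g. ("AP0", "near")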
Three video flows have been considered in this simulation, where MN1, MN2, and MN3 are the destinations of three MPEG video streams (in QCIF frame size) with a maximum packet size of 1000 bytes and a frame rate of 30 fps, carried over UDP connections. These video flows originate from the Corresponding Node (CN), and the maximum required bandwidth of each one is about 0.3 Mbps. In addition, a CBR flow is transferred from each Crowding Node (CWNi) to the CN at a rate of 6 Mbps to diminish the capacity of the WiFi access points at specific times, and hence the quality of the video streams transferred through those APs.
For instantaneous UPQ evaluation, N and W are set to 3 and 6, respectively. Considering the reference PSNR of the original video, the PSNR threshold for handover initiation is assumed to be 25 dB. This threshold has been determined experimentally to reflect an acceptable user satisfaction level. Also, Sel_Threshold in Algorithm 2 is set to 0.5 second.
To evaluate our proposed model, we first compare its performance with a traditional handover decision method implemented under the MIH framework (called simple handover). This simple method employs the Link Up, Link Down, Link Detected, and Link Going Down events and initiates the handover only on these layer-2 triggers. In this method, WiFi PoAs are preferred over WiMAX ones, and the target PoA is selected based on RSS quality. The signalling overhead of the proposed method is not considerable compared to this basic method, so the two methods are compared only in terms of service quality measures. The aim of this comparison is to evaluate the proposed method against another handover mechanism with similar signalling overhead, in order to show the capability of the proposed method after convergence. Secondly, the proposed method is compared to a usual context-aware handover (called conventional handover) in terms of both the provided quality and the signalling overhead, to show the expense of context gathering in context-aware methods.
Figure 3. First Simulation Environment [topology: CN sends Flow 1 (video, to MN2), Flow 2 (video, to MN1), and Flow 3 (video, to MN3) through router R1 and the IS server; BS0 (WiMAX, MAC=8) at (1000,1000); BS1 (WiMAX, MAC=10) at (1300,1000); AP0 (WiFi 11 Mbps, MAC=1) at (500,1000); AP1 (WiFi 11 Mbps, MAC=2) at (640,980); AP2 (WiFi 11 Mbps, MAC=3) at (650,1020); AP3 (WiFi 11 Mbps, MAC=4) at (750,1000); multi-interface MN1 at (480,1000), MN2 at (470,980), and MN3 at (1100,1000), each moving at 20 m/s; CWN1-CWN3 near AP1-AP3]
5.1.1 Comparing to Traditional Handoff
At first, the simulation has been carried out using the simple handover mechanism in the MNs, and the results have been gathered for about 30 seconds (about 900 frames of the video). Then, the proposed handover method has been employed and, after 6 training stages, the simulation results have been compared to those obtained from the simple method. Figure 4 shows that the overall PSNR levels of the video flows have been improved using our proposed method. For the proposed method, this metric is calculated from the 6th training stage.
Under simple handover, flow 1 and flow 2 experience major degradations when MN1 and MN2 select AP1 or AP2 upon leaving the coverage of AP0 (around frame 200 and subsequent ones). Choosing AP1 to carry flow 2, MN1 faces the poor quality of service provided by this PoA, although its signal strength is fine. On the other hand, as MN2 selects AP1 and then AP2, subsequent degradations occur for flow 1. A similar degradation also takes place around AP3 for the same reasons. MN1 and MN2 are also involved in a ping-pong effect between BS0 and BS1 during the rest of the simulation.
In contrast, the PSNR of the video flows improves under our proposed method. Both MN1 and MN2 have learnt to select BS0 when their quality of perception degrades during their connection to AP0, and to stay connected to BS0 thereafter, resulting in little PSNR degradation. Similarly, for flow 3, simple handover causes service degradations when MN3 enters the coverage area of AP3 (for frames 600 to 800). This degradation does not happen for the proposed method, since MN3 avoids handing over to this PoA.
Figure 4. Final PSNR level of video flows (Flow 1, Flow 2, and Flow 3; PSNR in dB versus frame number) for the proposed and simple handover decision methods
We have also compared the quality of the video flows in terms of frame jitter and frame loss as QoS metrics. Figures 5 to 7 compare the frame jitter of the video flows for the simple handover decision and the proposed method. These figures show that video frames experience larger delay variations under the simple handover decision, due to inefficient target selection and unnecessary handovers. Table 3 presents a frame loss comparison under the two methods. As shown in this table, the number of lost video frames is reduced remarkably once the proposed method has been trained for a sufficient number of iterations.
Figure 5. Jitter comparison of Flow 1 under simple and proposed decision methods
Figure 6. Jitter comparison of Flow 2 under simple and proposed decision methods
Figure 7. Jitter comparison of Flow 3 under simple and proposed decision methods
Table 3. Frame loss comparison of video flows under simple and proposed handover decisions (after 6 trials)

          Simple Handover   Proposed Handover
Flow 1    271               1
Flow 2    368               1
Flow 3    211               2
Another important metric for comparing handover methods, one directly related to the quality of the received video, is the number of handovers. Table 4 compares the number of handovers incurred by the two methods. A large number of unnecessary handovers degrades service quality in addition to diminishing utilization.
In conclusion, the combination of the three techniques, namely UPQ-based handover initiation, the proposed algorithms for trigger management, and the learning-based target selection, brings about a significant improvement in UPQ compared to the simple handover method.
Table 4. Comparing the number of completed handovers

       Simple Handover   Proposed Handover
MN1    10                1
MN2    7                 1
MN3    4                 2
The behaviour of the proposed method implies that its decision skill improves as the mobile node continues to learn the environment. Figure 8 shows the overall PSNR level for one of the video flows (flow 1), which improves as the learning procedure is reiterated, showing that the MARL mechanism gradually learns to select the best PoA.
Figure 8. PSNR level of Flow 1 video frames after each of three training runs (runs #1, #3, and #6)
The average PSNR levels of all the video flows have
been examined over six training runs in Figure 9. This
figure shows that the videos' mean PSNR level rises as the number of training stages increases. After six trials, the learning system converges to a steady-state value, and the UPQ calculations are needed less frequently, meaning that the computation load of the MNs can be reduced.
Figure 9. Mean PSNR level of video flows in
consecutive learning trials
5.1.2 Comparing to Context-Aware Handoff
In this subsection, the proposed method is compared to a conventional context-aware method in which network QoS parameters and user preferences are utilized as context parameters in an MADM-based decision maker. Only the QoS parameters are chosen as context parameters, to allow a fair comparison to the proposed method, which focuses on quality (by exploiting the PSNR metric). The chosen metrics are available bandwidth (abw), number of users (load), packet delay, and packet loss. As mentioned in section 2, MADM methods are a popular technique in context-aware and vertical handover decisions (e.g. [13,47,62]). Here, the Simple Additive Weighting (SAW) algorithm has been selected for MADM. SAW requires a preference vector that indicates the relative preference for each metric according to the user's intentions. A decision matrix is formed to show the capabilities of the candidate PoAs in terms of the decision parameters. The SAW algorithm employs the preference vector and the decision matrix to rank the candidate PoAs. The preference vector is given below and has been adjusted such that the MNs select the best target PoAs in this scenario:
P = [P_loss  P_abw  P_delay  P_load] = [0.22  0.36  0.28  0.14]    (6)
The aim of choosing this method for comparison is to show the effect of context access latency on user-perceived quality. It must be pointed out that all of the context-aware handover methods proposed in the literature suffer from context access latency. Therefore, the MADM-based method has been adjusted to make the best selection among the candidates, so as to isolate the effect of context access latency. One may think that pre-fetching dynamic context (as in recent works such as [43]) eliminates context access latency, but the freshness of the context data is not guaranteed at decision time, especially in crowded networks where network resources change rapidly.
The decision metrics are dynamic and need to be obtained repeatedly from the access networks. Gathering those network parameters is motivated by the method proposed in [56], which is based on the MIH framework. Herein, MNs should ask the IS for the neighbouring PoAs and their static context (after a successful handover) and then ask, through the currently serving PoA, for the dynamic context of each neighbouring PoA (whenever a handover decision is underway). The mentioned method performs target selection when Link Going Down, Link Down, or Link Detected events arrive. To have a fair comparison to the proposed learning-based target selection, the UPQ_Trigger event has also been considered as a handoff initiation source in the conventional method.
The simulation has been repeated for the above conventional method and the results compared to those obtained for the proposed method. Figure 10 shows the PSNR level of the received video flows using the conventional method described above and compares it to the proposed method. Although the preference vector has been adjusted such that MN1 and MN2 select BS0 upon exiting the coverage of AP0, the PSNR degradation is more considerable using the conventional method. This is due to the handover latency of the conventional method, which waits for the collection of dynamic information about neighbouring
PoAs before performing target selection. A similar degradation happens for flow 3 during its handover from BS1 to BS0 and finally to AP0, as shown in Figure 10. Using the conventional method, flow 1 and flow 2 also experience quality degradations during the handover from BS0 to BS1, since this method selects BS1 as a better PoA whenever it is detected. The proposed method does not perform such an unnecessary handover, as shown in Figure 10, since it only makes handover decisions when the UPQ degrades.
Figure 10. PSNR level of received video frames (Flow 1, Flow 2, and Flow 3; PSNR in dB versus frame number) under the conventional method and the proposed method
In the remainder of this subsection, we show that the proposed method incorporates QoE awareness into target selection without considerable signalling overhead. Using the proposed handover model, each MN performs only one transaction (one request/response for the ALL_PoAs_Report IE container, as stated in section 4.3) at the beginning of its activity, to obtain the list of PoAs and the static context about them. Each MN then learns its handover decision strategy, which is stored locally in the learning matrices constructed from this static context, without additional signalling (as explained in section 4.2). In contrast, in the conventional method the MN should look for the neighbouring PoAs and their static context (one transaction with the IS), and then ask through the serving PoA for the dynamic context of the neighbouring PoAs (namely, one transaction between the MN and the serving PoA, plus one transaction between the serving PoA and each neighbouring PoA). The first transaction is performed whenever the MN's handover to a new PoA is completed, while the other transactions must be performed whenever a handoff decision is necessary due to handoff initiation triggers such as UPQ degradation or Link Going Down. Therefore, the total number of transactions depends on the rate of completed handoffs (Completed_HO) and the rate of handoff triggers (HO_Triggers).
Table 5 presents a general signalling comparison of both methods, assuming n mobile nodes in the environment and m neighbouring PoAs per PoA on average. In those relations, the rate of handoff initiation triggers of the i-th mobile node is denoted HO_Triggers_i, and the rate of completed handoffs executed by the i-th mobile node is denoted Completed_HO_i, where the initial connection to the wireless network is also counted as a completed handoff.
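To make the totals in Table 5 concrete, the following is a minimal sketch; the function name is a placeholder, and the per-MN rates are passed in as plain lists.

    def signalling_transactions(completed_ho, ho_triggers, m):
        """Total transaction counts from Table 5: completed_ho[i] and
        ho_triggers[i] are the per-MN rates, m is the average number of
        neighbouring PoAs per PoA."""
        proposed = len(completed_ho) * 1   # one IS transaction per MN, once
        conventional = sum(c + (m + 1) * t
                           for c, t in zip(completed_ho, ho_triggers))
        return proposed, conventional

    # Example: 5 MNs, each with 3 completed handoffs and 6 triggers, m = 4
    # gives proposed = 5 and conventional = 5 * (3 + 5 * 6) = 165 transactions.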
Assuming identical handoff initiation and execution rates for all the MNs, Figure 11 plots the signalling overhead versus the number of MNs for the proposed method, the conventional handover, and the method of [43] under different ratios of HO_Triggers to Completed_HO. As Figure 11 shows, the signalling transactions of the conventional method and of its enhanced version in [43] grow considerably faster than those of the proposed method as the number of MNs increases. Moreover, as the number of unsuccessful handovers (the ratio of handoff initiation triggers to completed handovers)
rises, this signalling overhead becomes even more considerable for the conventional method and for the method of [43]. The overhead of the proposed method, however, does not depend on this ratio, as shown in Table 5.
Figure 11. Signalling transactions comparison (number of signalling transactions versus number of MNs) for the conventional method and the method of [43] under HO_Triggers/Completed_HO ratios of 1 and 2, and for the proposed method (identical for all ratios)
In the above comparisons, we have not considered the size of the signalling messages. Although the proposed ALL_PoAs_Report IE container needs larger signalling messages than the standard IE container used by the conventional method, these messages are transferred to the MNs only once, at the beginning, to describe the wireless access domain. Therefore, the size of the signalling messages is not a considerable drawback for the proposed method. In the conventional method, by contrast, the IE containers that carry neighbouring information may be fetched by each MN many times during its movement (depending on the rate of handoff executions), which leads to an increasing cumulative overhead as time goes on.
The signalling overhead of the proposed method has also been compared to the conventional method in the simulated scenario: the number of MIH messages related to network context retrieval is 42 in the conventional method, versus 6 for our proposed method.
5.2 Simulation of an Environment Represented by SCM
The second simulated environment is a floor of a building covered by 6 WiFi access points. Figure 12 shows the SCM representation of the simulated environment. The access points are connected to a correspondent node through an access router, and their physical data rate is limited to 2 Mbps. The coverage of these access points is adjusted to about 50 meters. 10 mobile nodes have been placed in this environment, as shown in Figure 12. MN0 and MN1 move to room O6, while MN6, MN7, and MN8 move to room O1. MN4 moves to room O2 and MN3 moves to room O1.
Table 5. Signalling transaction comparison between the proposed method and the conventional context-gathering method

Number of transactions                          Proposed method   Conventional context-aware method
Between MNs and IS                              n × 1             Σ_{i=1..n} Completed_HO_i
Between MNs and serving PoA                     0                 Σ_{i=1..n} HO_Triggers_i
Between serving PoA and neighbouring PoAs       0                 m × Σ_{i=1..n} HO_Triggers_i
Imposed to MNs                                  n × 1             Σ_{i=1..n} (Completed_HO_i + HO_Triggers_i)
Total                                           n × 1             Σ_{i=1..n} (Completed_HO_i + (m+1) × HO_Triggers_i)
The destinations of MN5, MN2, and MN9 are O3, O1, and O3, respectively. All MNs move at a fixed speed of 15 m/s. The characteristics of the rooms and WEAs have been determined with respect to the positions of the access points. For example, for α1, Co(α1) is {AP0, AP1, AP2}, while for α2, Co(α2) is {AP1, AP2}.
Each MN receives a video flow (in QCIF frame format with a maximum packet size of 128 bytes and a frame rate of 30 fps) over a UDP connection. These video flows originate from the CN, and the maximum required bandwidth of each one is about 0.3 Mbps. For instantaneous UPQ evaluation, N and W are both set to 6. Considering the reference PSNR of the original video, the PSNR threshold for handover initiation is assumed to be 20 dB. The Sel_Threshold parameter is set to 0.8 second in this simulation.
The simulation of the proposed method has been repeated for multiple consecutive runs; Figure 13 shows the mean PSNR level of each video flow after each run. This figure indicates that the UPQ levels of the received flows improve overall. Although the mean UPQ level of some video flows oscillates across repetitions, such variations are natural during the training of the learner in this scenario, where resources are limited and a handover may affect the quality of other users.
Figure 14 shows the PSNR of MN7's received video in the 2nd, 5th, and 9th iterations of the simulation. To better show that the overall performance of the handover decision improves as the simulation is repeated, Figure 15 plots the average of the mean PSNR of all video flows for each run of the simulation. This figure also emphasizes the ability of the proposed method to learn the handover decision skill based on the UPQ metric.
Figure 12. Second Simulation Environment [floor plan with rooms O1-O6, WEAs α1-α5, access points AP0-AP5, and mobile nodes MN0-MN9]
Figure 13. Mean PSNR level of received video flows (Video0-Video9, received by MN0-MN9) during consecutive runs of the simulation
Figure 14. PSNR level of video frames received by MN7 in different consecutive runs of the simulation (runs #2, #5, and #9)
Figure 15. Average of the mean PSNR levels of all video flows versus the simulation run number
We have also obtained results from multiple tests with different random generator seeds. Figure 16 shows the mean PSNR level of the video flows of MN4, MN7, and MN8 during 15 training runs. Each sample is obtained by averaging over 15 tests with different random seeds, and Figure 16 includes 95% confidence intervals for the means. As the figure shows, the UPQ level of the received video flows increases overall, while the confidence interval shrinks over the training runs.
Figure 17 shows the average of the mean PSNR of all videos during training runs with different seeds, along with the variance between the mean PSNR levels of the video flows received by the MNs. One may conclude from this figure that at the beginning of training, both the average and the variance of the videos' PSNR are low. As the MNs compete for resources during training, the variance increases for some runs but begins to decrease as the learner tends toward the steady state. Moreover, the growing average demonstrates the ability of the proposed method to improve its strategy during the experiments under more realistic conditions.
Figure 16. Mean PSNR levels of the MN4, MN7, and MN8 video flows versus the number of training runs (with 95% confidence intervals)
Figure 17. Average and variance of the mean PSNR levels of all video flows versus the number of training runs
After 15 training runs, we exploited the learner for a different MPEG video stream with the same characteristics but less picture motion. The UPQ threshold for this video stream is adjusted to 25 dB. Figure 18 shows the mean PSNR level of the video received by each MN, and the average over all PSNRs, under the proposed method, and compares it to the mean PSNR level obtained using the basic handover decision
(introduced in the previous subsection). Although UPQ is content dependent, this figure shows that the trained learner is still capable of satisfying the user much better than the traditional method with similar signalling requirements. In the next section, we discuss the features of the proposed method further.
Figure 18. Mean PSNR level of the received video under the proposed handover decision and the basic handover decision
Finally, we investigate the convergence time of the proposed method for varying numbers of MNs, in terms of the number of training runs the MNs need to learn a suitable strategy. Different numbers of MNs (3, 5, 7, 10, 13, and 15) have been placed in a random arrangement, and the simulation runs have been repeated for each case until the average PSNR reaches the defined threshold (26 dB) for all contributing MNs. The number of runs for each case is reported in Figure 19. It can be concluded from this figure that as the number of MNs increases, the convergence time of the algorithm rises accordingly. The convergence time is a limitation of our proposed method, which we discuss further in the next section.
Figure 19. Convergence time of the proposed method versus the number of MNs
6 Discussions
In general, monitoring the quality metrics in our simulations shows that the proposed handover method improves the quality of the received video traffic, especially compared to a decision method with similar signalling overhead. In this section, the advantages and limitations of the proposed handover decision are discussed in more detail.
The main advantage of the proposed method is that it imports new context knowledge into the handover decision. As discussed earlier, UPQ is also related to some non-measurable and subjective parameters, such as the characteristics of the physical environment, the expectations and emotions of the user, the content, and so on. Although current objective UPQ evaluation methods are not capable of considering those parameters completely, direct feedback from mobile users could be utilized in real deployments to adapt the handover decision toward the most satisfying PoA. Evaluating the proposed method with real user satisfaction feedback is worth considering in future work.
Another advantage of the proposed method is that it removes the side effects of context access latency and its signalling overhead. Moreover, the proposed method removes the necessity of knowing QoS-related context knowledge about the application, user, and device, since the quality of user experience implicitly reflects them. Therefore, the proposed method has incorporated
QoE awareness into the target PoA selection without remarkable signalling.
A further advantage of the proposed method is that it considers the movement of mobile users in the handover decision; that is, the proposed method inherently combines mobility prediction with QoE awareness. This is because the probability of selecting PoAs that are not on the regular movement path of the MN decreases during the learning process, as selecting those PoAs degrades the attained reward (UPQ) owing to bad link quality or disconnections. Conversely, this probability increases over time for PoAs that are on the regular movement path of the MN.
The major limitation of the proposed method is the time required for the algorithm to converge to the steady state, which is still considerable for large environments with many mobile users. Improving the proposed method to reduce its convergence time thus remains an open challenge. Combining the proposed method with conventional context-aware methods through the advice-taking idea [65] is a proposal that will be considered in our future work.
One may imagine that the computation load imposed on the MN (if an objective UPQ evaluation method is chosen) or the user-device interaction complexity (if direct feedback of the user satisfaction level is exploited) is one of the main drawbacks of the proposed method. However, such overheads are not significant, for the following reasons:
1) Users are more familiar with a qualitative representation of their needs (their UPQ) than a quantitative one (representing preferences and requirements in terms of QoS parameters, as in context-aware methods), and this persuades users to accept the interaction complexity of direct UPQ feedback.
2) The decision strategy improves as the MN lives in the environment, so the UPQ evaluation periods may be made longer as the mean UPQ increases and its deviation decreases. Therefore, the interaction complexity and computation overhead decrease gradually.
Finally, it is worth noting that although the behaviour of mobile users is mostly regular in practice, and our simulations have accordingly been performed under the assumption of regular movement patterns, the proposed method could also be extended to environments involving mobile users with irregular behaviours.
7 Conclusions
This paper presents a personalized QoE-aware handover management model that employs UPQ to manage user satisfaction more elegantly. The method utilizes UPQ instead of network quality parameters, eliminating the complexity and overhead of the network context gathering and management procedures. In the proposed method, which benefits from UPQ as a delayed reward, an MARL scheme is employed for target PoA selection under the MIH framework. The simulation of a typical scenario with video streams as the traffic sources shows that the proposed approach delivers higher performance than the traditional link-based handover decision scheme, which, like the proposed method, requires no context transfer signalling. The proposed method has also shown better performance than a conventional context-aware method in terms of both service quality and signalling overhead. We have also shown the ability to learn the trajectories of MNs, and the feasibility of the proposed method for conventional environments, by simulating a more complex scenario with more mobile nodes.
Although the proposed method is rather slow to converge and does not operate optimally during its learning period, it can be applied to infuse more intelligence into traditional context-aware methods. Employing the proposed method in a more intelligent handover management model, for more complex scenarios in which some mobile users behave irregularly, is an issue to be considered in future work. Furthermore, using the proposed method in a more advanced handover mechanism
with multi-homing capability, and incorporating other parameters such as power consumption and price into the decision parameters, are among our future research directions. Moreover, a real implementation of the proposed method can be addressed as another future work, to evaluate its performance with no-reference objective assessment methods, pseudo-subjective assessment methods, and also subjective measures.
References
1. Prehofer, C., Nafisi, N., & Wei, Q. A framework for context-aware handover decisions. In 14th IEEE Proceedings on Personal, Indoor and Mobile Radio Communications, PIMRC, 2003 (Vol. 3, pp. 2794-2798)
2. Nguyen-Vuong, Q.-T., Agoulmine, N., & Ghamri-Doudane,
Y. (2008). A user-centric and context-aware solution to
interface management and access network selection in
heterogeneous wireless environments. Computer Networks,
52(18), 3358-3372.
3. Jingjing, Z., & Ansari, N. (2011). On assuring end-to-end
QoE in next generation networks: challenges and a possible
solution. Communications Magazine, 49(7), 185-191.
4. Kilkki, K. (2008). Quality of experience in communications
ecosystem. Journal of Universal Computer Science, 14(5),
615-624.
5. Stankiewicz, R., & Jajszczyk, A. (2011). A survey of QoE
assurance in converged networks. Computer Networks,
55(7), 1459-1473.
6. Herman, H., Rahman, A. A., Syahbana, Y. A., & Bakar, K.
A. Nonlinearity Modelling of QoE for Video Streaming
over Wireless and Mobile Network. In Second
International Conference on Intelligent Systems, Modelling
and Simulation (ISMS), 2011 (pp. 313-317)
7. Mitra, K., Zaslavsky, A., & Aahlund, C. A probabilistic
context-aware approach for quality of experience
measurement in pervasive systems. In Proceedings of the
ACM Symposium on Applied Computing, 2011 (pp. 419-
424)
8. Winkler, S., & Mohandas, P. (2008). The evolution of video quality measurement: from PSNR to hybrid metrics. IEEE Transactions on Broadcasting, 54, 660-668.
9. Saliba, J., Beresford, A., Ivanovich, M., & Fitzpatrick, P.
(2005). User-perceived quality of service in wireless data
networks. Personal Ubiquitous Comput., 9(6), 413-422.
10. Magoulas, G. D., & Ghinea, G. Neural network-based
interactive multicriteria decision making in a quality of
perception-oriented management scheme. In International
Joint Conference on Neural Networks, 2001 (Vol. 4, pp.
2536-2541 vol.2534)
11. Ghahfarokhi, B. S., & Movahhedinia, N. (2011). A context-aware handover decision based on user perceived quality of service trigger. Wireless Communication and Mobile Computing, 11, 723-741.
12. Hasswa, A., Nasser, N., & Hassanein, H. (2007). A seamless context-aware architecture for fourth generation wireless networks. Wirel. Pers. Commun., 43(3), 1035-1049.
13. Ahmed, T., Kyamakya, K., & Ludwig, M. Architecture of a
Context-Aware Vertical Handover Decision Model and Its
Performance Analysis for GPRS - WiFi Handover. 11th
IEEE Symposium on Computers and Communications, 2007
(pp. 795-801).
14. Kang, J.-M., Ju, H.-T., & Hong, J. (2006). Towards
Autonomic Handover Decision Management in 4G
Networks. In Autonomic Management of Mobile
Multimedia Services (pp. 145-157).
15. Prehofer, C., Nafisi, N., & Wei, Q. A framework for
context-aware handover decisions. In IEEE International
Symposium on Personal, Indoor and Mobile Radio
Communications, 2003 (pp. 2794-2798)
16. Wei, Q., Farkas, K., Prehofer, C., Mendes, P., & Plattner, B.
(2006). Context-aware handover using active network
technology. Comput. Netw., 50(15), 2855-2872.
17. Bowling, M., & Veloso, M. (2002). Multiagent learning using a variable learning rate. Artif. Intell., 136(2), 215-250.
18. IEEE 802.21, IEEE Standard for Local and Metropolitan
Area Networks: Media Independent Handover Services,
2008
19. Cacace, F., & Vollero, L. Managing mobility and adaptation
in upcoming 802.21 enabled devices. In 4th international
workshop on Wireless mobile applications and services on
WLAN hotspots, 2006 (pp. 1-10)
20. Oliva, A. d. l., Melia, T., Vidal, A., Bernardos, C. J., Soto,
I., & Banchs, A. (2007). IEEE 802.21 enabled mobile
terminals for optimized WLAN/3G handovers: a case study.
SIGMOBILE Mob. Comput. Commun. Rev., 11(2), 29-40.
21. ITU-T Rec. P.800; Methods for subjective determination of
transmission quality, 1996
22. ITU-T Rec. J.246; Perceptual audiovisual quality
measurement techniques for multimedia services over
digital cable television networks in presence of reduced
bandwidth reference, 2008
23. ITU-T Rec. G.1070; Opinion model for videophone
applications, 2007
24. ITU-T Rec. J.247; Objective perceptual multimedia video
quality measurement in the presence of a full reference,
2008
25. ITU-T Rec. P.910; Subjective video quality assessment method for multimedia applications, 1999
26. Reves, X. User perceived Quality Evaluation in a B3G
Network Testbed. 15th IST Mobile and Wireless Summit,
2006
27. ITU-T Rec. P.861; Objective quality measurement of
telephone-band speech codecs, 1998
28. ITU-T Rec. P.862; Perceptual Evaluation of Speech Quality (PESQ): An objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs, 2001
29. ITU-T Rec. J.144, Objective perceptual video quality
measurement techniques for digital cable television in
presence of full reference, 2004
30. Menkovski, V., Exarchakos, G., & Liotta, A. Machine
Learning Approach for Quality of Experience Aware
Networks. In 2nd International Conference on Intelligent
Networking and Collaborative Systems, 2010 (pp. 461-466)
31. Fiedler, M., Hossfeld, T., & Tran-Gia, P. (2010). A Generic
Quantitative Relationship Between Quality of Experience
and Quality of Service. IEEE Network, 24(2), 36-41.
32. Brooks, P., & Hestnes, B. r. (2010). User measures of
quality of experience: why being objective and quantitative
is important. IEEE Network, 24(2), 8-13.
33. Mahdi, A., & Picovici, D. (2010). New single-ended
objective measure for non-intrusive speech quality
evaluation. Signal, Image and Video Processing, 4(1), 23-
38.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
28
34. Rein, S., Fitzek, F. H. P., & Reisslein, M. (2005). Voice
quality evaluation in wireless packet communication
systems: a tutorial and performance results for ROHC.
Wireless Communications, 12(1), 60-67.
35. Jiang, X., Wang, Y., & Wang, C. No-reference video quality
assessment for MPEG-2 video streams using BP neural
networks. In 2nd International Conference on Interaction
Sciences: Information Technology, Culture and Human,
2009 (pp. 307-311)
36. Klaue, J., Rathke, B., & Wolisz, A. (2003). EvalVid - A Framework for Video Transmission and Quality Evaluation. In Computer Performance (pp. 255-272).
37. Busoniu, L., Babuska, R., & De Schutter, B. (2008). A
Comprehensive Survey of Multiagent Reinforcement
Learning. IEEE Transactions on Systems, Man, and
Cybernetics, Part C: Applications and Reviews, 38(2), 156-
172.
38. Weiss, G. (1999). Multiagent Systems: A Modern Approach
to Distributed Artificial Intelligence: MIT Press.
39. Sleem, A., & Kumar, A. (2005). Handoff management in
wireless data networks using topography-aware mobility
prediction. Journal of Parallel and Distributed Computing,
65(8), 963-982.
40. Ghahfarokhi, B. S., & Movahhedinia, N. (2007). QoS
provisioning by EFuNNs-based handoff planning in cellular
MPLS networks. Comput. Commun., 30(13), 2676-2685.
41. Onel, T., Ersoy, C., Cayirci, E., & Parr, G. (2004). A
multicriteria handoff decision scheme for the next
generation tactical communications systems. Computer
Networks, 46(5), 695-708.
42. Edwards, G., & Sankar, R. (1998). Microcellular handoff
using fuzzy techniques. Wirel. Netw., 4(5), 401-409.
43. Ghahfarokhi, B. S., & Movahhedinia, N. (2012). Context-
Aware Handover Decision in an Enhanced Media
Independent Handover Framework. Wireless Personal
Communications., published online, DOI 10.1007/s11277-
012-0543-4.
44. Sharma, S., Baek, I., & Chiueh, T.-C. (2007). OmniCon: a
Mobile IP-based vertical handoff system for wireless LAN
and GPRS links. Softw. Pract. Exper., 37(7), 779-798.
45. Zhu, F., & McNair, J. (2006). Multiservice vertical handoff decision algorithms. EURASIP J. Wirel. Commun. Netw., 2006(2), 52-52.
46. Stevens-Navarro, E., & Wong, V. W. S. Comparison
between Vertical Handoff Decision Algorithms for
Heterogeneous Wireless Networks. In IEEE 63rd Vehicular
Technology Conference, VTC, 2006 (Vol. 2, pp. 947-951)
47. Wenhui, Z. Handover decision using fuzzy MADM in
heterogeneous networks. In IEEE Wireless
Communications and Networking Conference, 2004 (Vol. 2,
pp. 653-658)
48. Bing, H., He, C., & Jiang, L. Intelligent signal processing of
mobility management for heterogeneous networks. In
Proceedings of the International Conference on Neural
Networks and Signal Processing, 2003 (Vol. 2, pp. 1578-
1581)
49. Chan, P. M. L., Sheriff, R. E., Hu, Y. F., Conforto, P., & Tocci, C. (2002). Mobility management incorporating fuzzy logic for heterogeneous IP environment. Communications Magazine, 39(12), 42-51.
50. Inoue, M., Mahmud, K., Murakami, H., Hasegawa, M., &
Morikawa, H. (2005). Context-Based Network and
Application Management on Seamless Networking
Platform. Wirel. Pers. Commun., 35(1-2), 53-70.
51. Indulska, J., & Balasubramaniam, S. Context-aware vertical handovers between WLAN and 3G networks. In IEEE 59th Vehicular Technology Conference, 2004 (Vol. 5, pp. 3019-3023)
52. Kassar, M., Kervella, B., & Pujolle, G. (2008). An overview of vertical handover decision strategies in heterogeneous wireless networks. Computer Communications, 31(10), 2607-2620.
53. Hong, C.-P., Weems, C. C., & Kim, S.-D. (2008). An
effective vertical handoff scheme based on service
management for ubiquitous computing. Comput. Commun.,
31(9), 1739-1750.
54. Wang, Y., Zhang, P., Zhou, Y., Yuan, J., Liu, F., & Li, G. Handover Management in Enhanced MIH Framework for Heterogeneous Wireless Networks Environment. Wireless Personal Communications, 52(3), 615-636.
55. Neves, P., Soares, J., Sargento, S., Pires, H., & Fontes, F.
(2011). Context-aware media independent information
server for optimized seamless handover procedures.
Computer Networks, 55(7), 1498-1519.
56. Mussabbir, Q. B., Wenbing, Y., Zeyun, N., & Xiaoming, F.
(2007). Optimized FMIPv6 Using IEEE 802.21 MIH
Services in Vehicular Networks. IEEE Transactions on
Vehicular Technology, 56(6), 3397-3407.
57. Wang, C.-Y., Huang, H.-Y., & Hwang, R.-H. Mobility
management in ubiquitous environments. Personal
Ubiquitous Comput., 15(3), 235-251.
58. Samaan, N., & Karmouch, A. (2005). A mobility prediction
architecture based on contextual knowledge and spatial
conceptual maps. IEEE Transactions on Mobile Computing,
4(6), 537-551.
59. Sen, S., Sekaran, M., & Hale, J. Learning to coordinate
without sharing information. In twelfth national conference
on artificial intelligence, 1994, (vol. 1, pp. 426-431)
60. Kettani, D., & Moulin, B. (1999). A spatial model based on
the notions of spatial conceptual map and of object’s
influence areas. Paper presented at the Spat Inf Theory Cogn
Comput Found Geogr Inf Sci, 1999 (pp. 401-416)
61. National Institute of Standards and Technology. http://w3.antd.nist.gov/seamlessandsecure/pubtool.shtml, visited on July 2011.
62. Balasubramaniam, S., & Indulska, J. (2004). Vertical
handover supporting pervasive computing in future wireless
networks. Computer Communications, 27(8), 708-719.
63. Piamrat, K., et al. QoE-aware vertical handover in wireless
heterogeneous networks, In Wireless Communications and
Mobile Computing Conference (IWCMC), 2011 (pp.95-
100).
64. A. Takahashi, D. Hands, and V. Barriac (2008).
Standardization activities in the ITU for a QoE assessment
of IPTV. Communications Magazine, vol. 46, 78-84.
65. M. Rovatsos & A. Belesiotis. Advice taking in multiagent reinforcement learning. In Proceedings of the 6th international joint conference on Autonomous agents and multiagent systems (AAMAS), 2007 (pp. 1342-1344).
66. T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.
... Nowadays, the MARL is widely applied in AlphaGo, intelligent multi-robot systems, road traffic signal control, and distributed system control [33][34][35][36][37]. Here it is introduced as an enabling solution for the intelligent cognitive processing, for its superior performances of parallel optimization and dynamic strategy-selecting. ...
Article
Full-text available
The publisher's note contains a correction to [Opt. Express 29 32333 (2021)10.1364/OE.438439]. The article was corrected on 17 June 2022.
... Nowadays, the MARL is widely applied in AlphaGo, intelligent multi-robot systems, road traffic signal control, and distributed system control [33][34][35][36][37]. Here it is introduced as an enabling solution for the intelligent cognitive processing, for its superior performances of parallel optimization and dynamic strategy-selecting. ...
Article
Full-text available
Highly reliable wireless train-ground communication immune to the electromagnetic interferences (EMIs) is of critical importance for the security and efficiency of high-speed railways (HSRs). However, the rapid development of HSRs (>52,000 km all over the world) brings great challenges on the conventional EMIs mitigation strategies featuring non-real-time and passive. In this paper, the convergence of radio-over-fiber distributed antenna architecture (RoF-DAA) and reinforcement learning technologies is explored to empower a real-time, cognitive and efficient wireless communication solution for HSRs, with strong immunity to EMIs. A centralized communication system utilizes the RoF-DAA to connect the center station (CS) and distributed remote radio units (RRUs) along with the railway track-sides to collect electromagnetic signals from environments. Real-time recognition of EMIs and interactions between the CS and RRUs are enabled by the RoF link featuring broad bandwidth and low transmission loss. An intelligent proactive interference avoidance scheme is proposed to perform EMI-immunity wireless communication. Then an improved Win or learn Fast-Policy Hill Climbing (WoLF-PHC) multi-agent reinforcement learning algorithm is adopted to dynamically select and switch the operation frequency bands of RRUs in a self-adaptive mode, avoiding the frequency channel contaminated by the EMIs. In proof-of-concept experiments and simulations, EMIs towards a single RRU and multiple RRUs in the same cluster and towards two adjacent RRUs in distinct clusters are effectively avoided for the Global System for Mobile communications–Railway (GSM-R) system in HSRs. The proposed system has a superior performance in terms of circumventing either static or dynamic EMIs, serving as an improved cognitive radio scheme to ensuring high security and high efficiency railway communication.
... Relates to the current and the previous (stored) mean user-perceived quality (QoE) level, and length of the time the user has experienced degradation or improvements due to the recent action ns-2 4 Mobile nodes [35] Medium access control [50]. It creates a normalized QoE value using bandwidth, burst level, delay and jitter. ...
Article
Wireless networks show several challenges not found in wired networks, due to the dynamics of data transmission. Besides, home wireless networks are managed by non-technical people, and providers do not implement full management services because of the difficulties of manually managing thousands of devices. Thus, automatic management mechanisms are desirable. However, such control mechanisms are hard to achieve in practice because we do not always have a model of the process to be controlled, or the behavior of the environment is dynamic. Thus, the control must adapt to changing conditions, and it is necessary to identify the quality of the control executed from the perspective of the user of the network service. This article proposes a control loop for transmission power and channel selection, based on Software Defined Networking and Reinforcement Learning (RL), and capable of improving Web Quality of Experience metrics, thus benefiting the user. We evaluate a prototype in which some Access Points are controlled by a single controller or by independent controllers. The control loop uses the predicted Mean Opinion Score (MOS) as a reward, thus the system needs to classify the web traffic. We proposed a semi-supervised learning method to classify the web sites into three classes (light, average and heavy) that groups pages by their complexity, i.e. number and size of page elements. These classes define the MOS predictor used by the control loop. The proposed web site classifier achieves an average score of 87%±1%, classifying 500 unlabeled examples with only fifteen known examples, with a sub-second runtime. Further, the RL control loop achieves higher Mean Opinion Score (up to 167% in our best result) than the baselines. The page load time of clients browsing heavy web sites is improved by up to 6.6x.
... It assumes a stochastic environment in which MUs arrive simultaneously in groups and do not need to know any information about the other MUs in their group or any previous decision taken by them. Two reinforcement learning methods are used: stochastic approximation with the Sastry algorithm [10] and the Q-learning algorithm [11]. It applies only bandwidth as the handoff decision parameter to calculate the utility of a WN. ...
Article
Full-text available
When a group of Mobile Users (MUs) equipped with multi-mode or multi-homed terminals, such as passengers on board a bus, train, or car, moves from one wireless network (WN) to another within a heterogeneous wireless network (HWN) environment and requests vertical handoffs simultaneously, a group vertical handoff (GVHO) occurs. The prevailing research in the literature is mainly concerned with forced GVHO driven by network aspects such as signal strength and bandwidth, while in reality user-initiated GVHO, driven by user aspects such as price, power consumption, and velocity along with the respective user preferences, is more important for performing vertical handoffs in HWNs. In user-initiated GVHO, selecting the mutually best WN-MU pair that maximises the network revenue of the constituent WN as well as the user satisfaction of each MU in the group, while minimising the simultaneous selection of one WN by multiple MUs of the group, is a challenging problem. This paper proposes a GVHO decision model based on a non-cooperative game that uses multiple handoff decision attributes, with their respective user preferences calculated dynamically in real time, as the game strategies; the group MUs select the best available WNs for vertical handoff at the Nash equilibrium. The performance of the proposed model is evaluated in terms of the number of GVHOs, the price of anarchy, and the price of stability for both the group of MUs and the WNs. Simulation results show that the proposed model yields the minimum number of GVHOs compared with existing GVHO models while maximising user satisfaction and network revenue.
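To make the game-theoretic selection concrete, the sketch below runs unilateral best-response iterations until no MU gains by switching, i.e. a pure-strategy Nash equilibrium of the WN selection game. The payoff, which discounts a user's utility when many group members pick the same WN, is an illustrative assumption rather than the paper's exact revenue and satisfaction functions.

```python
import random

def payoff(mu, wn, assignment, utility, capacity):
    """Utility of MU `mu` on network `wn`, discounted when the WN is
    overloaded. The linear congestion discount is an assumption."""
    load = 1 + sum(1 for m, w in assignment.items() if m != mu and w == wn)
    return utility[mu][wn] * min(1.0, capacity[wn] / load)

def best_response_equilibrium(mus, wns, utility, capacity, max_rounds=100):
    """Iterate unilateral best responses; a round with no strict
    improvement is a pure-strategy Nash equilibrium of the game."""
    assignment = {mu: random.choice(wns) for mu in mus}
    for _ in range(max_rounds):
        stable = True
        for mu in mus:
            best = max(wns, key=lambda w: payoff(mu, w, assignment,
                                                 utility, capacity))
            if (payoff(mu, best, assignment, utility, capacity) >
                    payoff(mu, assignment[mu], assignment, utility, capacity)):
                assignment[mu], stable = best, False
        if stable:
            return assignment
    return assignment   # may not have converged within max_rounds
```

The congestion discount is what keeps multiple MUs from piling onto the same WN, mirroring the paper's goal of minimising simultaneous selection of one network by the whole group.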
... The user preference for an attribute is driven by user compulsions in HWNs. For example, a performance-savvy user gives higher preference to network bandwidth, while a price-savvy user gives higher preference to network usage cost compared with other decision attributes. These user preferences are pre-estimated according to the mobile user's expected Quality of Experience (QoE) [15] and remain fixed, i.e. static, throughout an application's execution time. However, in our view, in order to complete a task or application, the user preferences should also be driven by real-time system compulsions. ...
Article
Full-text available
In Heterogeneous Wireless Networks (HWNs), seamless Vertical Handoff (VHO) to the best available network is significant in providing Quality of Experience to Mobile Users. The selection of the best available network is based on multiple contrasting handoff decision attributes along with their respective user preferences. In the literature, the user preferences used in various network selection techniques are pre-fixed, i.e. static and arbitrary, without any standard theoretical basis. This paper proposes a method to moderate these static user preferences in real time according to the current values of the respective handoff decision attributes, making them dynamic and realistic. The effect of dynamic user preferences on network selection for vertical handoff is evaluated with prominent Multi Attribute Decision Making (MADM) methods, namely Simple Additive Weighting, Multiplicative Exponential Weighting, Technique for Order Preference by Similarity to Ideal Solution, and Grey Relational Analysis. Simulations are performed using both static user preference weights obtained from the user and the proposed dynamic user preference weights. The simulation results show that, for all considered MADM methods, the number of vertical handoffs needed to complete an application is lower with dynamic user preference weights than with static ones. This demonstrates the effectiveness of the proposed dynamic user preferences in network selection for VHOs in HWNs.
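The sketch below shows one plausible pairing of weight moderation with Simple Additive Weighting, the first MADM method listed. The moderation rule, attribute choices, and example figures are assumptions for illustration; the paper's exact moderation formula is not reproduced here.

```python
def moderate_weights(static_w, current, ideal):
    """Scale each static preference weight by the relative gap between the
    attribute's current and ideal values, then renormalize. The specific
    moderation rule here is an illustrative assumption."""
    raw = [w * (1.0 + abs(c - i) / i)
           for w, c, i in zip(static_w, current, ideal)]
    total = sum(raw)
    return [r / total for r in raw]

def saw_rank(networks, weights, benefit):
    """Simple Additive Weighting: min-max normalize each attribute column,
    then score each candidate. benefit[j] is True when more is better."""
    cols = list(zip(*networks.values()))
    scores = {}
    for name, attrs in networks.items():
        total = 0.0
        for j, (w, x) in enumerate(zip(weights, attrs)):
            lo, hi = min(cols[j]), max(cols[j])
            norm = 1.0 if hi == lo else (x - lo) / (hi - lo)
            total += w * (norm if benefit[j] else 1.0 - norm)
        scores[name] = total
    return sorted(scores.items(), key=lambda kv: -kv[1])

# Attributes: bandwidth (Mbps, benefit), cost (cents/MB), delay (ms);
# all candidate values below are made up for the example.
nets = {"WLAN": (54, 2, 60), "UMTS": (2, 10, 120), "LTE": (30, 8, 40)}
w = moderate_weights([0.5, 0.3, 0.2], current=(10, 6, 80), ideal=(50, 1, 20))
print(saw_rank(nets, w, benefit=[True, False, False]))  # WLAN ranks first
```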
Chapter
This chapter describes the current trends of mobile devices in education, the applications of mobile technologies in learning, an overview of Mobile Learning (m-learning), and the importance of m-learning in global education. M-learning encourages both blended learning and collaborative learning, allowing learners at different locations to get in touch with their peers or other teams to discuss and learn. The m-learning environment is about access to content, peers, experts, portfolio artifacts, credible sources, and previous thinking on relevant topics. Given the convenience of m-learning, less time is spent getting trained, and overall costs are lowered as a result. With m-learning, learners are able to learn in their own style at their own pace. M-learning provides easy access to learning at any place and any time, which is more convenient for learners.
Article
Full-text available
The major application areas of reinforcement learning (RL) have traditionally been game playing and continuous control. In recent years, however, RL has been increasingly applied in systems that interact with humans. RL can personalize digital systems to make them more relevant to individual users. Challenges in personalization settings may be different from challenges found in traditional application areas of RL. An overview of work that uses RL for personalization, however, is lacking. In this work, we introduce a framework of personalization settings and use it in a systematic literature review. Besides setting, we review solutions and evaluation strategies. Results show that RL has been increasingly applied to personalization problems and realistic evaluations have become more prevalent. RL has become sufficiently robust to apply in contexts that involve humans and the field as a whole is growing. However, it seems not to be maturing: the ratios of studies that include a comparison or a realistic evaluation are not showing upward trends and the vast majority of algorithms are used only once. This review can be used to find related work across domains, provides insights into the state of the field and identifies opportunities for future work.
Chapter
The Internet of Things (IoT) is transforming the agriculture industry and enables farmers to deal with the vast challenges in the industry. Internet of Farming (IoF) applications increase the quantity, quality, sustainability, and cost-effectiveness of agricultural production. Farmers leverage IoF to remotely monitor sensors that can detect soil moisture, crop growth, and livestock feed levels, to remotely manage and control smart connected harvesters and irrigation equipment, and to utilize artificial-intelligence-based tools to analyze operational data combined with third-party information, such as weather services, to provide new insights and improve decision making. The Internet of Farming relies on data gathered from the sensors of a Wireless Sensor Network (WSN). The WSN requires reliable connectivity to provide accurate predictions for the farming system. This chapter proposes a strategy that provides always best connectivity (ABC). The strategy considers a routing protocol that supports Low-power and Lossy Networks (LLNs) with minimum energy usage. Two scenarios are presented.
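A minimal reading of such energy-aware LLN routing is the parent selection sketch below, in the spirit of RPL objective functions: each node picks the neighbour with the lowest combined rank, link quality (ETX), and energy cost. The field names and the particular cost blend are assumptions, not the chapter's protocol.

```python
def select_parent(candidates):
    """Pick the neighbour minimizing an RPL-like rank plus an
    energy-aware link cost. All fields are illustrative assumptions."""
    def cost(n):
        # ETX penalizes lossy links; the battery term (battery in [0, 1])
        # steers traffic away from nearly depleted nodes
        return n["rank"] + n["etx"] + (1.0 - n["battery"])
    return min(candidates, key=cost)

neighbours = [
    {"id": "A", "rank": 256, "etx": 1.2, "battery": 0.9},
    {"id": "B", "rank": 256, "etx": 1.0, "battery": 0.2},
]
# "A" wins: its slightly worse ETX is outweighed by its healthier battery
print(select_parent(neighbours)["id"])
```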
Article
Full-text available
One emerging characteristic of electronic devices is the increasing number of connectivity interfaces (a.k.a. NICs) towards the outside world. That obviously translates into a set of technical issues related to their management in order to provide seamless connectivity when connections move from one interface to another. IEEE 802.21 is a recent effort of the IEEE that aims at providing a general interface for the management of NICs. In this paper we discuss how the upcoming standard may be effectively exploited in a mobile context in order to hide network heterogeneity from end users. To accomplish this task, we propose a centralized element called the Mobility Manager (MM), interfacing with the 802.21 sublayer and responsible for the application of connectivity policies. Based on a real testbed, we show that the new standard and the MM can be used to improve the network performance experienced by the end user. Moreover, we show how the MM can interact with adaptive applications in order to further improve the range of usability of real-time applications.
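The sketch below illustrates one way such a Mobility Manager could evaluate an ordered policy list when an 802.21-style link notification arrives. The event names, policy format, and thresholds are assumptions for illustration; the paper's actual policy language is not reproduced here.

```python
# Ordered connectivity policies: first matching policy wins.
# The format and the field names are assumptions.
POLICIES = [
    {"app": "voip",    "prefer": ["wifi", "umts"],          "min_quality": 0.6},
    {"app": "default", "prefer": ["wifi", "umts", "gprs"],  "min_quality": 0.0},
]

def on_link_event(event, links, active_app):
    """React to an MIH-style notification (e.g. LINK_GOING_DOWN) by
    re-evaluating policies; return the interface to switch to, if any."""
    if event["type"] in ("LINK_GOING_DOWN", "LINK_DOWN"):
        links[event["interface"]]["up"] = False
    policy = next(p for p in POLICIES if p["app"] in (active_app, "default"))
    for iface in policy["prefer"]:
        link = links.get(iface)
        if link and link["up"] and link["quality"] >= policy["min_quality"]:
            return iface
    return None  # no usable interface under the current policy

links = {"wifi": {"up": True, "quality": 0.8},
         "umts": {"up": True, "quality": 0.7}}
print(on_link_event({"type": "LINK_GOING_DOWN", "interface": "wifi"},
                    links, "voip"))   # falls back to "umts"
```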
Article
Full-text available
In the GRAAD project we aim at developing a knowledge-based system which manipulates spatial and temporal knowledge while simulating the kind of behaviour that people adopt when describing a route. A route description is essentially a narrative text in which sentences are prescriptions given by the speaker to an addressee: they describe a succession of actions that the addressee will have to carry out when following the route in the described environment. Hence, temporal and spatial knowledge are "interleaved" in a route description. In this paper we present an approach for generating route descriptions using spatial conceptual maps and a simulation of a virtual pedestrian's movements in these maps. We show how the notion of influence area enables us to transform spatial relations of neighborhood and orientation into topological relations. A way can be partitioned into a succession of typical segments (intersections with other ways, intersections with crossable objects, intersections with landmark objects' influence areas) which are well suited for natural language descriptions. A route is specified as a succession of way segments pertaining to one or several ways, some of them being used to generate the natural language description. We show how the equations of the virtual pedestrian's trajectory can be used to select the proper movement verbs used in the route description.
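As a toy illustration of how an influence area turns metric proximity into a topological inside/outside relation along a route, the sketch below samples a polyline and labels each sample with the landmarks whose influence area contains it; runs of identically labelled samples then form the way segments suited to description. Circular influence areas and uniform sampling are simplifying assumptions.

```python
import math

def segment_route(route, landmarks, step=1.0):
    """Walk the polyline `route` (list of (x, y) points) and label each
    sample with the landmarks whose circular influence area contains it.
    landmarks: name -> (cx, cy, radius). Circles are a simplification."""
    samples = []
    for (x1, y1), (x2, y2) in zip(route, route[1:]):
        length = math.hypot(x2 - x1, y2 - y1)
        n = max(int(length / step), 1)
        for k in range(n + 1):
            px = x1 + (x2 - x1) * k / n
            py = y1 + (y2 - y1) * k / n
            inside = [name for name, (cx, cy, r) in landmarks.items()
                      if math.hypot(px - cx, py - cy) <= r]
            samples.append(((px, py), inside))
    return samples

# A straight way passing a hypothetical landmark's influence area
samples = segment_route([(0, 0), (10, 0)], {"church": (5, 1, 2)})
print([labels for _, labels in samples])  # [] ... ['church'] ... []
```

Consecutive samples sharing a label form a segment "within" the landmark's influence, which is the kind of topological fact a generator can verbalize ("walk past the church").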
Article
Contents: Introduction; QoS concepts and standards; IETF multimedia protocols; Semantic approach for QoS management in home networks; Conclusion; Bibliography.
Article
The next generation in mobility management will enable different mobile networks to interoperate with each other to ensure terminal and personal mobility and global portability of network services. However, in order to ensure global mobility, the deployment and integration of both satellite and terrestrial components are necessary. This article focuses on issues related to mobility management in a future mobile communications system, in a scenario where a multisegment access network is integrated into an IP core network by exploiting the principles of Mobile IP. In particular, attention is given to the requirements for location, address, and handover management. In a heterogeneous environment, the need to perform handover between access networks imposes particular constraints on the type of information available to the terminal and the network. In this case, consideration needs to be given to parameters other than radio characteristics, such as achievable quality of service and user preference. This article proposes a new approach to handover management by applying the fuzzy logic concept to a heterogeneous environment, and concludes with a presentation of mobility management signaling protocols.
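A minimal sketch of such a fuzzy handover decision is shown below: triangular membership functions fuzzify the received signal strength and the achievable QoS of the alternative network, two rules fire, and a weighted average defuzzifies the result into a handover desirability in [0, 1]. The breakpoints and the rule base are assumptions, not the article's actual fuzzy system.

```python
def tri(x, a, b, c):
    """Triangular membership function rising on [a, b], falling on [b, c]."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def handover_desirability(rss_dbm, alt_qos):
    """Blend 'serving RSS is weak' and 'alternative QoS is good' into one
    score in [0, 1]. Breakpoints and rules are illustrative assumptions;
    alt_qos is assumed normalized to [0, 1]."""
    rss_weak = tri(rss_dbm, -100, -90, -75)
    rss_ok   = tri(rss_dbm, -85, -70, -50)
    qos_good = tri(alt_qos, 0.5, 1.0, 1.5)
    # Rule 1: IF RSS weak AND alternative QoS good THEN hand over
    # Rule 2: IF RSS adequate THEN stay
    fire_ho, fire_stay = min(rss_weak, qos_good), rss_ok
    # Weighted-average defuzzification over {handover: 1, stay: 0}
    total = fire_ho + fire_stay
    return fire_ho / total if total else 0.0

print(handover_desirability(-88, 0.9))  # weak signal, good target: near 1
print(handover_desirability(-65, 0.9))  # healthy signal: near 0
```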
Article
This Recommendation describes methods and procedures for conducting subjective evaluations of transmission quality. The main revision encompassed by this version of this Recommendation is the addition of an annex describing the Comparison Category Rating (CCR) procedure. Other modifications have been made to align this Recommendation with the recent revision of Recommendation P.830.
Article
Recent developments in heterogeneous mobile networks emphasize the necessity of more intelligent and context-aware handover decisions. However, the complexity and overhead of collecting and managing context information are the main difficulties in context-aware handovers. The Media Independent Handover (MIH) framework proposed by IEEE 802.21 only provides static context of access networks through its information service. This paper elaborates the idea of handoff-aware network context gathering for renewal of dynamic context in the MIH information server. An extension to the MIH framework is proposed to efficiently accommodate the dynamic context of access networks along with the ordinary static context in the information server (IS). The paper presents an analytical evaluation of the proposed context gathering method in terms of context access latency and signalling overhead. The paper also presents a policy-based context-aware handover model based on the proposed extension. A well-defined policy format is proposed for straightforward description of users', devices', and applications' preferences and requirements. In contrast to traditional policy-based methods, a multi-policy scheme is proposed that exploits rank aggregation methods to employ a set of matching policies in target point of attachment selection. Simulations have been carried out in NS2 to verify the performance of the proposed context gathering method and the proposed handover decision model. Simulation results show improved performance on the considered evaluation metrics.
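One standard rank aggregation method that could serve the multi-policy scheme is a Borda count, sketched below: each matching policy contributes a full ranking of the candidate PoAs, and positions are converted into points. The abstract does not state which aggregation method is used, so Borda here is an assumption, and the PoA names are made up.

```python
from collections import defaultdict

def borda_aggregate(rankings):
    """Combine several preference orders over candidate PoAs.
    rankings: list of lists, each ordered best-first."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for pos, poa in enumerate(ranking):
            scores[poa] += n - pos   # best gets n points, worst gets 1
    return sorted(scores, key=scores.get, reverse=True)

# Three matching policies (say user, device, application) each rank
# the candidate PoAs; the aggregate order picks the target.
print(borda_aggregate([["WiFi-2", "LTE-1", "WiFi-1"],
                       ["LTE-1", "WiFi-2", "WiFi-1"],
                       ["WiFi-2", "WiFi-1", "LTE-1"]]))
# -> ['WiFi-2', 'LTE-1', 'WiFi-1']
```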
Article
Wi-Fi based hotspots offer mobile users broadband wireless Internet connectivity in public work spaces and corporate/university campuses. Despite the aggressive deployment of these hotspots in recent years, high-speed wireless Internet access remains restricted to small geographical areas due to the limited physical coverage of wireless LANs. On the other hand, despite their lower throughput, cellular networks have a significantly wider coverage and are thus much more available. Recognizing that 2.5G or 3G cellular networks can effectively complement wireless LANs, we set out to develop a vertical handoff system that allows mobile users to seamlessly fall back to such cellular networks as the general packet radio service (GPRS) or 3G whenever wireless LAN connectivity is not available. The resulting handoff mechanism allows a network connection of a mobile node to operate over multiple wireless access networks in a way that is transparent to end user applications. In this paper, we present the design, implementation, and evaluation of a fully operational vertical handoff system, called OmniCon, which enables mobile nodes to automatically switch between wireless LAN and GPRS, based on wireless LAN availability, by introducing a simple extension to the existing Mobile IP implementation. We discuss the design issues in the proposed vertical handoff system for heterogeneous networks, including connection setup problems due to network address translation, and the disparity in link characteristics between wireless LANs and GPRS. A detailed performance evaluation study of the OmniCon prototype demonstrates its ability to migrate active network connections between these two wireless technologies with low handoff latency and close to zero packet loss.
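The sketch below captures the availability-driven switching logic in simplified form: a monitor prefers WLAN, falls back to GPRS when the WLAN signal disappears, and uses two thresholds for hysteresis so the connection does not flap at the coverage edge. The thresholds and the polling loop are assumptions; OmniCon itself operates through a Mobile IP extension rather than application-level code like this.

```python
import time

WLAN_JOIN_RSSI = -75   # dBm; assumed threshold for switching to WLAN
WLAN_DROP_RSSI = -85   # dBm; assumed threshold for falling back to GPRS

def monitor(scan_wlan_rssi, switch_to, poll_s=1.0):
    """Prefer WLAN; fall back to GPRS when the WLAN signal drops.
    scan_wlan_rssi returns the current RSSI in dBm, or None when no AP
    is visible; switch_to is a placeholder for the handoff mechanism."""
    active = "gprs"
    switch_to(active)
    while True:
        rssi = scan_wlan_rssi()
        if active == "gprs" and rssi is not None and rssi >= WLAN_JOIN_RSSI:
            active = "wlan"       # WLAN became usable: switch up
            switch_to(active)
        elif active == "wlan" and (rssi is None or rssi <= WLAN_DROP_RSSI):
            active = "gprs"       # WLAN lost or too weak: fall back
            switch_to(active)
        time.sleep(poll_s)
```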
Article
A context-based adaptive communication system is introduced for use in heterogeneous networks. Context includes the user's presence, location, available network interfaces, network availability, network priority, communication status, terminal features, and installed applications. An experimental system was developed to clarify the feasibility of using context information to flexibly control networks and applications. The system operates on a seamless networking platform we developed for heterogeneous networks. By using contexts, the system can inform the caller and callee, before communication occurs, of the applications they can access through the network. Changes in contexts can switch an ongoing application to another during actual communication. These functions provide unprecedented styles of communication. A business scenario for a seamless networking provider is also presented.