Efficient Support of Video Streaming to Mobile
Devices with Utilization of Multiple Radio Interfaces
and Scalable Video Coding
Dissertation
Submitted in partial fulfillment of the requirements for the degree of
Master of Technology
by
Chayan Sarkar
Roll No : 09305069
under the guidance of
Dr. Stephan Rein
Prof. Adam Wolisz
Prof. Kameswari Chebrolu
Department of Computer Science and Engineering
Indian Institute of Technology Bombay
Telecommunication Networks Group
Technische Universität Berlin
June 2011
Abstract
An annoying experience in streaming multimedia to a mobile user is fluctuating media quality due to the varying characteristics of the wireless link. This thesis aims to improve the mobile user's media experience through the utilization of multiple wireless interfaces of the user's terminal and proper handling of the scalable video stream. Scalable video coding creates several codec layers in a multimedia content and allows the content to be streamed in multiple flows, from which the stream receivers can select a subset according to their quality needs.
This work provides a framework for delivering a stable video quality to mobile users. Appropriate software extensions are introduced at the client terminal and the streaming server to meet this goal without changing the existing streaming server or scalable video player software. A tool is designed for dynamic bandwidth estimation in WLAN that mixes probe packets into the video stream to induce less additional traffic. Depending on the bandwidth availability, a subset of the codec layers of a scalable video stream is received via WLAN, and the remaining layers are either switched to another interface of the receiver (if available) or discarded. Using a set of customized UDP control messages, a new signaling method is established to support the framework.
We verify that the new bandwidth estimator gives accurate results with at most 7.5% under- or overestimation, while the switch of codec layers is triggered reasonably and in time in response to the varying available bandwidth. The layer switching stabilizes within a few hundred milliseconds up to two seconds. The PSNR measurements indicate that the switch of codec layers and the probing packets do not affect the objective quality, while the new utilization of multiple interfaces improves the general user experience.
Acknowledgement
I would like to express my sincere gratitude to my advisors Dr. Stephan Rein and Prof. Adam Wolisz of Technical University Berlin and Prof. Kameswari Chebrolu of Indian Institute of Technology, Bombay. I am deeply indebted to them for the guidance and encouragement that they provided throughout the duration of the project. They have constantly motivated me to come up with my own ideas. I would also like to thank Karsten Grüneberg of Fraunhofer Heinrich Hertz Institute, Berlin and Sven Wiethölter of Technical University Berlin for their help at different points in time.
Chayan Sarkar
IIT Bombay
Monday, June 27, 2011
Contents

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Challenges
  1.4 Contributions
  1.5 Outline
2 Related Work
3 Principles and Features of Scalable Video Streaming
  3.1 Scalability
  3.2 Streaming a scalable video
    3.2.1 Codec layers and streamed flows
    3.2.2 Streaming session protocols
    3.2.3 Traffic during a streaming session
  3.3 How scalable video encoding helps
4 Proposed System Architecture
  4.1 System Architecture
  4.2 Dynamic switching of scalable video content
  4.3 Synchronization of multiple video flows of a streaming session
5 Available Bandwidth Estimation
  5.1 General bandwidth estimation technique
    5.1.1 Limitations of packet dispersion technique in wireless networks
  5.2 WBest: A bandwidth estimation tool for IEEE 802.11 based wireless networks
    5.2.1 Limitations of WBest
  5.3 EStream: A new bandwidth estimation tool for WLAN
    5.3.1 Content of probe packets
    5.3.2 Continuous bandwidth estimation
  5.4 Cost comparison between EStream and WBest due to intruding traffic
  5.5 Bandwidth estimation for UMTS network
6 Signaling
  6.1 Solving NAT issue
  6.2 Control messages
  6.3 Packet handling by software modules
    6.3.1 Inside server-module
    6.3.2 Inside client-module
  6.4 Collaboration between the server-module and the client-module
7 Experiments, Results and Evaluation
  7.1 Measurement setup
  7.2 Measurements and results
    7.2.1 Evaluation of EStream
    7.2.2 Evaluation of scalable video adaptation via network-aware utilization of multiple interfaces
8 Future Work
9 Conclusion
Appendices
A PSNR comparison
List of Figures

3.1 Scalability directions of scalable video coding
3.2 Hierarchical organization of frames in SVC
3.3 RTSP video-on-demand streaming
4.1 System architecture - overview
4.2 System architecture - inside the software extensions
4.3 Extended NALU header for SVC
4.4 RTP payload format - STAP-A (type 24) NAL unit
4.5 RTP payload format - FU-A (type 28) NAL unit
5.1 Packet dispersion
5.2 Packet forwarding at the last hop wireless link
5.3 Mixing of probe and data packets
6.1 Network address translation
6.2 Structure used to store registered interface information by the server-module
6.3 Structure used by the client-module to store information about each streaming flow
6.4 Control message format
6.5 Mini structure used to inform and store flow distribution policy
6.6 Combined operating point value using three scalable identifiers
6.7 Pseudo code to decide a packet's fate at the server-module
6.8 Message body format of probe request message
6.9 Packet traversal inside server-module
6.10 Packet traversal inside client-module
6.11 Message exchange during a streaming session
7.1 General experimental testbed
7.2 Packet dispersion rate with varying packet size
7.3 WBest v/s EStream - III
7.4 Sample video frame - I
7.5 Sample video frame - II
7.6 Sample video frame - III
7.7 PSNR comparison between received video and original video - I
7.8 Inter-packet delay - I
7.9 Inter-packet delay - II
7.10 PSNR comparison between received video and original video - II
7.11 PSNR comparison between received video and original video - III
7.12 PSNR comparison between received video and original video - IV
7.13 PSNR comparison between received video and original video - V
7.14 PSNR comparison between received video and original video - VI
Chapter 1
Introduction
1.1 Motivation
Fact - I: Multimedia content distribution takes a significant part of the IP traffic. As predicted by the well-known Cisco study [2], Internet video will account for 62 percent of consumer Internet traffic by the end of 2015. By the same time, Wi-Fi and mobile devices will account for 54 percent of IP traffic. Media content access from mobile devices is gaining popularity day by day, and a large amount of multimedia traffic is accounted for by video streaming. The amount of video traffic in wireless networks imposes the challenge of providing sufficient bandwidth to mobile users. Due to the varying characteristics of the wireless link, the available bandwidth of a mobile user changes very frequently. An annoying experience in mobile multimedia streaming is fluctuating media quality due to the varying bandwidth of the wireless network. If the available bandwidth is not sufficient, packets are delayed or even lost, resulting in a drop of the perceived quality of the video.
Fact - II: Modern mobile devices are equipped with multiple radio interfaces to provide a wide range of connectivity options to the users. Wi-Fi and UMTS are the two most common interfaces, available on almost every mobile device. The UMTS network is being deployed widely and will be available almost everywhere within a short period of time. Though it offers wide connectivity, it can provide only a low bandwidth to its users. On the other hand, IEEE 802.11 based WLAN is available only in certain hot-spots such as airports, shops, offices, and educational institutes. WLAN can provide a high available bandwidth within a short connectivity range, but the bandwidth availability can change quickly with environmental influences such as physical obstacles and interference from other users. In short, a high bandwidth cannot be ensured for mobile users through a single access network.
1.2 Problem Statement
If multiple access networks are available to a user, bandwidth aggregation among these networks can effectively provide a higher bandwidth. But sometimes each access network provides such a low bandwidth that even the aggregate cannot avoid packet loss. Also, multiple access networks are not always available in real-life scenarios. Under these circumstances, it may be necessary to selectively discard frames to minimize the effect of their loss on the overall video quality. Please note that in this work we have assumed that the client device is equipped with two network interfaces - WiFi and UMTS.
In this thesis we aim to provide a stable user experience for streamed media by a combined approach of (i) adapting the content quality to the currently available network resources and (ii) smartly utilizing multiple wireless interfaces. For the content adaptation we use the extensions to the H.264 AVC standard for Scalable Video Coding (SVC) [18], which allow the video to be encoded into multiple hierarchical layers, each of them providing additional quality. When the SVC layers are streamed in multiple flows, the client can select a subset of the flows with respect to the available network resources and its needs. As modern mobile devices are equipped with multiple radio interfaces, we utilize the current resources of the individual interfaces to receive a suitable codec flow. In case the respective access networks are available, multiple wireless interfaces (WLAN, UMTS) are utilized simultaneously if the user experience can thereby be improved.
The requirements for the design of such an adaptive SVC multimedia delivery according to the client's available access networks and their current resources are given as follows: (i) the available network resources have to be estimated, (ii) it must be possible to switch a codec layer flow to the appropriate interface (with optional switching on/off of some layers), and (iii) the switch of a codec layer has to be triggered reasonably in time with network resource changes in order to achieve a stable user experience.
1.3 Challenges
To reach the desired goal, there are many challenges that need to be overcome.

As in any other client-server setting, in video streaming a user receives video data using only one access network. To gain a higher overall bandwidth, bandwidth aggregation needs to be performed among multiple access networks (if available). Bandwidth aggregation can be achieved by splitting the video data and re-routing it over different paths so that the receiver receives the data via multiple access networks. But several questions arise in this regard: (i) who will split the data? (ii) where will the data be re-routed, i.e., how will the splitter learn the addresses of the interfaces of the receiver? (iii) how will the data be distributed among multiple paths?
Due to the varying characteristics of the wireless link, the available bandwidth changes very frequently. If the sender sends data to a user at a higher rate than the available bandwidth, packets are delayed or even lost and the overall video quality is degraded. On the other hand, a video with a low data rate cannot provide a high perceived quality. So the data needs to be sent at the maximum data rate available to the user to achieve the highest perceived video quality. Only an accurate estimation of the available bandwidth can ensure that the right amount of traffic is transmitted correctly and on time. As the bandwidth fluctuates within a short period of time in a wireless network, the bandwidth availability needs to be monitored continuously during a video streaming session. The lack of a suitable bandwidth estimation tool for wireless networks during a video streaming creates another challenge.
To maintain a stable video quality in times of low available bandwidth, the video content needs to be adapted. Scalable video coding creates several codec layers in a video, and these codec layers can be transmitted in multiple flows. By receiving a subset of the flows, content adaptation can be performed. In general, streaming servers are not aware of the scalability of a video content. Somewhere in the flow path (from the streaming server to the client), awareness of the scalable content needs to be installed so that the scalable content can be adapted according to the user's needs. The challenge is where to place this awareness of the scalable content so that it can adapt the stream according to a user's requirements.
This work resolves these challenges to reach the goal. A detailed solution for each of the obstacles is described in the subsequent chapters. Please note that the WLAN access network is prioritized in this work due to its better energy efficiency and lower monetary expense.
1.4 Contributions
The main contributions of this work can be summarized in line with the requirements
defined in section 1.2.
We achieve (i), the wireless bandwidth estimation, by using the packet-pair technique, which sends probe packets from the server to the client upon request. Specifically, we design a new WLAN bandwidth estimation tool that mixes probe packets into the video data stream. Thereby less probing data is induced, resulting in more accurate results without disturbing the perceived video quality. The probe packets even make the video transmission more robust against packet loss, as relevant video data is copied into the body of the probe packets. The bandwidth estimation is triggered periodically to monitor the available bandwidth, while packet loss is taken as an additional trigger to detect fluctuating bandwidth in time.
We enable (ii), the codec layer switch, by a new signaling method with specified UDP control messages, which are exchanged between the client and the streaming server. Specifically, a switch can mean routing a flow to the required interface during the streaming session setup, re-routing a flow to a new wireless interface, or switching a flow off or on, thereby either stopping or resuming its transmission. The signaling method is applied by the introduced software extension modules on the client and server side, where the switch of flows is triggered on the client side and executed on the server side. Importantly, with the introduced software extensions, no change in the RTP streaming server software or the SVC media player on the client is necessary. The new signaling method furthermore solves the Network Address Translation (NAT) problem, which arises because wireless network interfaces are generally hidden behind a NAT, and supports the bandwidth estimation in (i).
For reasonably triggering a switch as required in (iii), we periodically monitor the
available WLAN bandwidth and react to the measured bandwidth variations in
time.
To summarize the work, we have developed two software extensions for the server and the client device. They collaboratively measure the available bandwidth at the user's terminal. The new estimation tool measures the available bandwidth during a streaming session with better accuracy. The decision of distributing video layers among multiple interfaces is taken at the client device, and the server re-routes the packets to multiple interfaces of the client terminal. The collaboration, i.e., the information exchange and decision making, is accomplished by using a new signaling method developed in this work. To evaluate the system, experiments are done in a controlled environment to avoid interference from other users. To measure the streaming video quality, the video player dumps the video data into a file while playing the stream. A PSNR comparison is done offline between the dumped video and the original video to estimate the perceived video quality.
1.5 Outline
The rest of the thesis is organized as follows. In the next chapter we review related work. Chapter 3 describes the principles of scalable video streaming; the terminology of scalable video encoding as well as the streaming of video content is described in this chapter. The proposed system architecture to support a stable quality of the streaming video by simultaneously using multiple radio interfaces is discussed in chapter 4. The underlying principles of bandwidth estimation and a new customized tool for WLAN bandwidth estimation are described in chapter 5. A new signaling mechanism is developed to provide dynamic stream adaptation and to support video stream reception via multiple interfaces; this signaling mechanism is described in chapter 6. In chapter 7, we evaluate our system: the experimental testbed and methodologies are described, and the results are analyzed. There are many areas that can be improved to provide a stable quality of a streaming video in real-life scenarios, and our work provides a base for such further developments; chapter 8 describes future working areas. Finally, the thesis is concluded in chapter 9.
Chapter 2
Related Work
Bandwidth aggregation for a mobile device with multiple radio interfaces and the provision of a suitable architecture to use the interfaces simultaneously is discussed in [12, 13, 17, 22, 26]. In [17], the authors propose a dynamic bandwidth aggregation (DBA) proxy (situated in the Internet) which handles the packet aggregation and scheduling. The DBA proxy breaks the end-to-end argument of communication by creating two separate connections between the sender and the receiver. The DBA proxy monitors the channel condition over the wireless link, but the wireless channel condition can be properly observed at the receiver only. This work does not consider real-time or multimedia traffic, and the system is tested using simulations only. In [12], the authors consider real-time traffic, but they do not monitor the wireless link to make decisions; they assume the channel condition parameters are available. In [13], the authors provide a proxy-based bandwidth aggregation technique with reduced IP packet reordering. This work mainly emphasizes packet scheduling for bandwidth aggregation; the use of scalable video coding in our work makes the packet scheduling easier. Also, they do not provide any bandwidth estimation mechanism. In [7], an H.264 video stream is split into segments which are received using WLAN and UMTS. The segments are distributed based on the throughput and RTT of the networks; the results are verified via simulations only. None of these works considers scalable video streaming. As we will explain in chapter 3, scalable video provides features to send separate flows of different codec layers, which makes it preferable for multiple interface utilization.
Related literature on scalable video streaming is reviewed as follows. In [18], a method is described for scalable video adaptation under changing available bandwidth in heterogeneous networks. Streaming over two networks simultaneously is not considered, and the measured available bandwidth is not verified against the actually available one. In [9], deterministic packet scheduling algorithms are derived. TCP is selected as the transport protocol and each packet is scheduled depending on timing information, which makes the packet scheduling computationally expensive. The algorithms are evaluated via simulation only, against the rate control algorithms defined in the Datagram Congestion Control Protocol (DCCP) standard. In [20], a scheme for scalable video transmission over multiple wireless networks is detailed. The work neither provides a signaling method for multi-path streaming nor considers the available bandwidth to schedule the packets. The focus there is on service provision to a group of users who are connected via a multi-homed access point; streaming to an individual multi-homed device is not considered.
Chapter 3
Principles and Features of Scalable
Video Streaming
Scalable Video Coding (SVC) is the name for the Annex G extension of the H.264/MPEG-4 Advanced Video Coding (AVC) video compression standard. In this chapter we briefly review the principles and features of scalable video streaming in the context of this thesis.
3.1 Scalability
Streaming servers normally have to serve a large number of users with different screen resolutions and network bandwidths. As a result, the objective of video coding for Internet streaming has changed to optimizing the video quality over a given bit rate range instead of at a single bit rate. SVC addresses this problem by encoding the video in several layers, where the first layer (base layer) contains the minimum data and the remaining layers (enhancement layers) include refinements to the base layer. This makes scalability possible, as a receiver can receive a subset of the layers, ignoring the rest, depending on its current bit rate availability.
Scalable video coding provides scalability in three directions: (i) bit rate or signal-to-noise ratio (SNR) scalability, (ii) frame rate or temporal scalability, and (iii) spatial scalability (figure 3.1 [21]). SNR scalability is a technique to encode a video into two layers at the same frame rate and the same spatial resolution, but with different quantization accuracy. Temporal scalability is a technique to encode a video sequence into two layers at the same spatial resolution, but with different frame rates. Finally, spatial scalability is a technique to encode a video into two layers at the same frame rate, but with different spatial resolutions [5].
The encoder organizes the frames of a video in a hierarchical order to provide different levels of scalability. The hierarchical organization of frames in a scalable video is shown in figure 3.2. The small (blue) rectangles are the frames, and the arrows signify dependency among the frames.

Figure 3.1: Scalability directions of scalable video coding

Figure 3.2: Hierarchical organization of frames among SVC layers: arrows represent dependency, each small rectangle represents a frame, and frames within a large rounded rectangle represent two SNR layers.

Within each larger rectangle, the two smaller rectangles represent two quality layers (SNR scalability); there can be more than two quality layers. The lower frame contains a small number of bits to represent each pixel of the frame, and the upper quality layers add a few extra bits of information to enhance the picture
quality. Please note that there can be more than two spatial layers. The two larger rectangles with the same frame number, separated by the spatial layer line, represent two spatial layers in the video. That means the frame below the line represents one (smaller) dimension of the picture and the frame above the line represents another (larger) dimension of the same picture; the higher frame contains only the extra information needed to draw the higher-dimension picture. The (near) horizontal arrows represent dependency among frames with different frame numbers. In this figure, we show a hierarchy with a Group-of-Pictures (GOP) size of four. Every fourth frame (0, 4, 8, 12, etc.) is marked as a key frame by the encoder (red rectangles). As the key frames can be decoded independently, they alone can reproduce the video at the lowest frame rate. If, along with the key frames, the other even-numbered frames (2, 6, 10, etc.) are decoded, then the frame rate as well as the video quality can be improved. If all the frames of a particular spatial layer are received, the maximum video quality can be achieved. In this way, with GOP size 4, three temporal layers can be created. The general relationship between the number of temporal layers and the GOP size is
number of temporal layers = log(GOP) / log(2) + 1          (3.1)
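For example, a GOP size of 8 yields log(8)/log(2) + 1 = 4 temporal layers, while the GOP size of 4 used in figure 3.2 yields 3.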
The hierarchy of frames creates several codec layers within a scalable video. A codec layer is identified by three scalability parameters: the dependency identifier (DID) for spatial scalability, the temporal identifier (TID) for temporal scalability, and the quality identifier (QID) for SNR scalability. Frames at the same level of the hierarchy belong to the same codec layer. A layer situated higher in the hierarchy depends on the lower layer(s) for decoding.
3.2 Streaming a scalable video
After encoding a video, it is hinted and stored in the repository of a streaming server. Hinting a video adds some extra information to the encoded data file. Hinting creates tracks within a video and helps the server to stream the content (using the additional data). During transmission, each track is transmitted as a separate data flow between the server and the client [21].
3.2.1 Codec layers and streamed flows
Hinting a video usually creates a separate track for each type of media content (e.g., audio, video, subtitles). In the case of scalable video content, multiple tracks are created for a single video. Multiple track creation helps to adapt the video content, as adaptation can easily be accomplished by receiving a subset of the video flows. For each codec layer, a separate flow (track) can be created between the streaming server and a client. However, a larger number of flows increases the complexity at the receiving side in terms of synchronization, monitoring, etc. So multiple codec layers are sent together in a single flow (as they are included in a single track during hinting). In [3], the authors investigated the effects of multi-dimensional scalability on human perception in order to provide an automated scalable video adaptation procedure. Their findings indicate that switching a temporal layer flow on or off is perceived more clearly than switching an SNR layer flow on or off. Therefore, generally all the temporal layers for a particular frame size are transmitted in a single flow, whereas the different quality layers (SNR layers) for a particular frame size are transmitted in different flows. Different spatial layers are targeted at users with different display screen resolutions, and so they are transmitted as separate flows. A user with the smallest display screen resolution (supported by the video) receives only the lowest spatial layer, whereas a user with a higher screen resolution has to receive a higher spatial layer to achieve a better video quality.
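As an illustration of this convention, a video hinted with two spatial layers and two SNR layers per spatial layer would typically yield four video flows, each carrying all the temporal layers of one (spatial layer, SNR layer) combination.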
3.2.2 Streaming session protocols
Scalable video transmission follows the Network Abstraction Layer (NAL) concept [25]. A video streaming session has two parts: session setup and video data transmission. For the session setup we use the RTSP protocol [24]. The session setup is done by a mutual agreement between the streaming server and the client program (video player). After receiving the request for a video, the server describes the video using the Session Description Protocol (SDP) as plain text in an RTSP message [8]. The video data is sent as RTP packets. For each data flow (track), a separate RTP session is created. The RTP protocol is coupled with the RTCP protocol, which monitors the RTP session [23]. For each RTP session, two consecutive ports are used for data transfer: the even port is used for the RTP packets and the next higher odd port for the RTCP packets. In this work, the RTSP protocol uses TCP, and RTP uses UDP as the transport layer protocol. Other options are possible as well.
3.2.3 Traffic during a streaming session
The setup messages and the video data transfer during a video streaming session are shown in figure 3.3.

Figure 3.3: Message exchange between a server and a client during RTSP video-on-demand streaming

During the session setup phase, a set of RTSP messages is exchanged between the server and the client; different types of RTSP messages serve different purposes. First, the client asks for a video using an OPTIONS message. After getting the reply about the availability of the video, the client requests to DESCRIBE the video. As mentioned earlier, the SDP protocol is used to describe a video. The description contains information about the number of video tracks available in the video, the number of codec layers along with the codec information available in the video, the required bandwidth for each track in the stream, etc. For each track available in the video, a separate video flow or RTP session is established between the streaming server and the client. As mentioned earlier, a track contains one or multiple codec layers. The two parties agree upon the RTP session port numbers using SETUP messages. When all RTP sessions are set up, the client requests the video data using a PLAY message. During the video data transfer, each video flow is sent to its respective ports. After completing the data transmission, the session is closed using a TEARDOWN RTSP message.
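For illustration, a minimal RTSP exchange for such a session might look as follows; the URL, session identifier, and port numbers are hypothetical, and the headers are abbreviated:

    C->S: OPTIONS rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 1
    S->C: RTSP/1.0 200 OK
          CSeq: 1
          Public: OPTIONS, DESCRIBE, SETUP, PLAY, TEARDOWN
    C->S: DESCRIBE rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 2
    S->C: RTSP/1.0 200 OK
          CSeq: 2
          Content-Type: application/sdp
          (SDP body: tracks, codec layers, per-track bandwidth)
    C->S: SETUP rtsp://server.example/video.mp4/trackID=1 RTSP/1.0
          CSeq: 3
          Transport: RTP/AVP;unicast;client_port=5000-5001
    S->C: RTSP/1.0 200 OK
          CSeq: 3
          Session: 12345
          Transport: RTP/AVP;unicast;client_port=5000-5001;server_port=6000-6001
    (one SETUP exchange per track)
    C->S: PLAY rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 4
          Session: 12345
    (RTP flows run on the agreed ports until the end of the video)
    C->S: TEARDOWN rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 5
          Session: 12345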
3.3 How scalable video encoding helps
A mobile device with multiple network interfaces can achieve a higher bandwidth by aggregating the bandwidth provided by the individual access networks. However, the packets of a video stream need to be scheduled efficiently over the multiple interfaces to really gain from the bandwidth aggregation [10]. Sometimes bandwidth aggregation cannot completely avoid packet loss, and selectively discarding packets (content adaptation) may maintain a better overall video quality [11]. Scalable video coding makes packet distribution over multiple interfaces (and, if required, frame discarding) easier to implement. As the scalable video is streamed in multiple flows, the flows can be distributed among multiple routes so that the receiver receives them via multiple access networks. In the case of content adaptation, it can easily be accomplished by discarding a video flow. One can argue that without using scalable video coding, the video could be encoded and hinted into multiple tracks using other encoding methods. But other encoding methods do not use a layering structure that stores the encoded video frames according to their importance and priority in the video. As a result, hinting the video into multiple tracks becomes a complex job. The adaptation also becomes very difficult, as one has to decide the priority of the data at the packet level (for each individual packet, depending on its content) before discarding it.
We have seen the main principles and features of scalable video coding and how the streaming of a scalable video works. In the next chapter we propose a system architecture comprising two software extensions, for the streaming server and the client device respectively. These software extensions are aware of scalable video streams and manipulate the scalable stream appropriately to reach the goals.
Chapter 4
Proposed System Architecture
The proposed system architecture provides an adaptive SVC multimedia delivery by utilizing the multiple access networks according to their current resources. To reiterate the requirements, the proposed system should support the following functionalities: (i) continuous estimation of the available bandwidth without affecting the video traffic, (ii) dynamic switching of codec layers to a suitable interface, i.e., receiving a scalable video using multiple access networks, and (iii) switching a codec layer on or off, i.e., dynamic scalable video content adaptation, to achieve a stable user experience.
4.1 System Architecture
One of the major constraints of the system design is to meet the requirements without changing the existing streaming server and SVC player software. The system architecture provides two software extensions, the server-module and the client-module, at the server and the client respectively. An overview of the system architecture is shown in figure 4.1. These modules work at the lower layers of the protocol stack of the respective devices and maintain transparency between the streaming server and the video player, so that neither is aware of the existence of these modules.

Figure 4.1: Overview of the system architecture: software extensions are added in the server and the client

The client sends the request for a video using only one access network, and the streaming server also sends video data to only one interface of the client. Furthermore, the streaming server does not have any knowledge about the scalability of the video content. The server-module, on the other hand, is aware of scalable video coding and the codec layers available in a video. It also knows the addresses of the multiple network interfaces of the client, so it distributes the codec layers of a video among the client's interfaces. At the client, the client-module merges the video data received via the multiple access networks and forwards it to the client program, thus maintaining transparency between the streaming server and the client program.
Figure 4.2 gives a detailed view of the two software modules. They work collaboratively to fulfill the requirements. The control-centers of the two modules exchange control messages at different phases of a streaming session and establish mutual agreement on the addresses of the multiple receiving interfaces of the client, the distribution of the codec layers of a stream among the client's interfaces, the switching of codec layers between two interfaces (or switching a layer on/off), etc. Details of the signaling technique are described in chapter 6. The WLAN b/w monitor section of the client-module monitors the available bandwidth at the client interfaces. The probe-sender of the server-module sends probe packets on request to help estimate the available bandwidth. More details about available bandwidth estimation are given in chapter 5.

Figure 4.2: Overview of the software modules introduced by the system architecture
The client-module monitors the available bandwidth continuously and changes the codec layer distribution policy accordingly. The server-module receives updates about the layer distribution policy from the client. According to this policy, the packet-interceptor of the server-module distributes the flows of a streaming session among the multiple interfaces of the client. Please note that the server-module cannot make decisions on its own; rather, it acts according to the policy defined by the client-module. The only job of the server-module is to determine the codec layer of a video packet and treat the packet according to the policy (described in section 4.2). The client-module, on the other hand, not only makes the codec layer distribution policy, but also merges the codec layers received via separate flow paths (access networks) (described in section 4.3).
4.2 Dynamic switching of scalable video content
Switching of scalable video content refers to switching one or multiple codec layers from one access network to another, or switching a codec layer on or off. In [14], the authors provide a static scalable video content adaptation technique (based on [28]). In their work, the adaptation is done on a WiFi router. The use of TCP as the transport protocol makes the adaptation process a complex task: they create two separate connections, one between the client and the WiFi router and another between the WiFi router and the streaming server. Their adaptation technique is based on switching one or multiple codec layers on or off. In the case of switching off a layer, the WiFi router creates acknowledgement packets and sends them to the streaming server. In our work, we provide a simple but dynamic scalable video adaptation technique implemented at the server itself (the server-module), so no network element needs to be aware of the scalable content. In this work, the video layers are not only switched on or off, they can also be switched to another access network. We thus present a dynamic switching of scalable video content based on bandwidth availability.
The connection setup phase of a streaming session provides information about the video, which includes the number of video flows available in the stream, the codec information of each flow, the bandwidth requirement of each flow, etc. The client-module stores this information. This work provides a bandwidth estimation technique to measure the available bandwidth of a user. After measuring the available bandwidth, and depending on the bandwidth requirements of each flow, the layer reception policy (content adaptation as well as switching to another interface) is decided. As the continuous bandwidth estimation followed by the flow reception policy is performed during the video streaming, the dynamism of the system is maintained. The layer adaptation policy is communicated to the server-module. It is then the task of the server-module to determine the codec information of a packet and act according to the policy communicated by the client-module.
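As an illustration, the reception policy could be derived with a simple greedy rule: fill the preferred WLAN interface first, spill the remaining flows over to UMTS, and switch off whatever does not fit. This is only a sketch under assumed data structures and names (flow_t, decide_policy); the thesis states the policy goals but does not prescribe this exact rule. Flows are assumed to be ordered by codec-layer dependency, base layer first.

    /* Hypothetical sketch of the layer-reception policy decision in the
     * client-module. Flows are ordered base layer first. */
    enum route { VIA_WLAN, VIA_UMTS, SWITCHED_OFF };

    typedef struct {
        int kbps_required;   /* per-flow bandwidth from the SDP description */
        enum route route;    /* decision for this flow                      */
    } flow_t;

    static void decide_policy(flow_t *flows, int n,
                              int wlan_kbps, int umts_kbps)
    {
        int wlan_used = 0, umts_used = 0;
        for (int i = 0; i < n; i++) {
            if (wlan_used + flows[i].kbps_required <= wlan_kbps) {
                flows[i].route = VIA_WLAN;       /* preferred interface  */
                wlan_used += flows[i].kbps_required;
            } else if (umts_used + flows[i].kbps_required <= umts_kbps) {
                flows[i].route = VIA_UMTS;       /* spill-over interface */
                umts_used += flows[i].kbps_required;
            } else {
                flows[i].route = SWITCHED_OFF;   /* content adaptation   */
            }
        }
    }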
The RTP payload format for SVC is based on the NALU concept (described in [28]). Each NAL unit contains a one-byte header (figure 4.3). The 5-bit type field of the NALU header determines the NALU type and therefore the payload format. Besides the usual one-byte NALU header, a 3-byte extended header provides the scalability information (figure 4.3). From this additional information, the receiver can extract the {DTQ} or {DID, TID, QID} values and decide the codec layer of the packet.
Figure 4.3: Extended NALU header for SVC - 3 extra bytes are used along with the usual 1-byte header
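For illustration, the {DID, TID, QID} values can be extracted from the 3-byte extension with a few bit operations. The bit layout below follows the SVC NALU header extension (dependency_id and quality_id in the second extension byte, temporal_id in the top bits of the third); the function and type names are hypothetical:

    /* Sketch: extract the codec-layer identifiers from the 3-byte SVC
     * NALU header extension; ext points at its first byte. */
    typedef struct { unsigned did, tid, qid; } dtq_t;

    static dtq_t parse_svc_extension(const unsigned char *ext)
    {
        dtq_t v;
        v.did = (ext[1] >> 4) & 0x07;  /* dependency_id: spatial layer  */
        v.qid =  ext[1]       & 0x0F;  /* quality_id: SNR layer         */
        v.tid = (ext[2] >> 5) & 0x07;  /* temporal_id: temporal layer   */
        return v;
    }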
In total, 32 different NALU types (0-31) can exist, of which types 0, 30, and 31 are (as yet) undefined. The NALU type, i.e., the payload format, defines three basic payload structures [27]. NALU types 1-23 are reserved for single NAL unit packets, i.e., the RTP packet contains a single NAL unit in its payload. If multiple NALUs fit into a single RTP packet, they are aggregated into a single RTP packet payload; NALU types 24-27 are reserved for such aggregation packets. On the other hand, if the size of a NALU exceeds the RTP payload size, it has to be fragmented into multiple fragmentation units (FUs); NALU types 28-29 are used for FUs. NALU types 14, 15, and 20 are used for SVC NAL units; for these NALU types, the 3 extra header bytes are attached to provide the scalability information. In this work, only NALU types 24 and 28 are used beyond type 23.

Figure 4.4: RTP payload format - STAP-A (type 24) NAL unit

Figure 4.5: RTP payload format - FU-A (type 28) NAL unit

A single-time aggregation packet (STAP) puts NALUs with the same timestamp into the same RTP packet. The payload format of STAP-A (NALU type 24) contains a 1-byte NALU header followed by the size of the first NALU (2 bytes) and the NALU itself; subsequently, the size of the next NALU and that NALU itself are appended to the RTP payload, and so on (figure 4.4). Each aggregated NALU (in a STAP-A NALU) has a type in the range 1-23. The payload format of the FU-A fragmentation mode (NALU type 28) consists of a 1-byte NALU header (the FU indicator) followed by a 1-byte FU header. The FU indicator identifies the RTP payload as a fragmentation unit, and the FU header signals whether the fragment is the first, the last, or an intermediate one. The lower 5 bits of the FU header signify a NALU type of 1-23. In the case of an SVC NALU type, only the first fragment contains the scalability information (the 3 additional bytes), and this information also applies to the subsequent fragments. So the scalability information of the first fragment is stored to identify the codec layer of the subsequent fragments.
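Reusing parse_svc_extension and dtq_t from the sketch above, a classifier in the server-module might locate the scalability bytes for the NALU types used in this work roughly as follows (a simplified sketch without bounds checking; names hypothetical):

    static int nalu_type(unsigned char hdr) { return hdr & 0x1F; }

    /* Returns 1 and fills *out if the RTP payload p itself carries SVC
     * scalability information, 0 otherwise (e.g., for non-first FU-A
     * fragments, where the stored first-fragment information is reused). */
    static int classify_payload(const unsigned char *p, dtq_t *out)
    {
        int t = nalu_type(p[0]);
        if (t == 14 || t == 20) {                /* SVC NAL unit        */
            *out = parse_svc_extension(p + 1);   /* 3-byte extension    */
            return 1;
        }
        if (t == 24) {                           /* STAP-A aggregation  */
            const unsigned char *u = p + 1 + 2;  /* skip header + size  */
            if (nalu_type(u[0]) == 14 || nalu_type(u[0]) == 20) {
                *out = parse_svc_extension(u + 1);
                return 1;
            }
            return 0;
        }
        if (t == 28 && (p[1] & 0x80)) {          /* FU-A, S bit set     */
            if (nalu_type(p[1]) == 20) {         /* fragmented SVC type */
                *out = parse_svc_extension(p + 2);
                return 1;
            }
        }
        return 0;
    }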
4.3 Synchronization of multiple video flows of a streaming session
The dynamic resource allocation of the UMTS network imposes a problem of unsynchronized video layers when different layers are received via different access networks. A UMTS user is assigned more bandwidth based on the traffic pattern (the highest achievable bandwidth obviously being limited by the contract between the service provider and the user). When the user starts using the UMTS network, initially a low bandwidth is assigned, and the allotment is increased as the traffic increases. In the case of a video streaming, when some codec layers are received via UMTS, the first packet (or first few packets) gets delayed. This delay can vary from hundreds of microseconds (say, 200 µs) to a couple of seconds (say, 1.5 seconds). As the other layers are received via another interface, they reach the user on time and the decoder starts decoding those packets. Each video data packet contains a timestamp by which the decoder synchronizes it. Because of the initial delay in the UMTS network, the packets received via UMTS may miss their decoding deadline. So the codec layers become unsynchronized and the decoder assumes the packets are lost; as a result, the video quality gets degraded. Any packet received after its decoding deadline is simply discarded by the decoder.
To solve this problem, a synchronization buffer is created at the client, integrated with the client-module. The video player usually maintains a video buffer to order the data packets. However, it is observed that the video playout buffer is not capable of synchronizing the video data received via multiple interfaces (when a higher-rate video faces a long initial delay). A simple buffer management technique helps to synchronize the data packets. The synchronizing buffer is manipulated using the following two steps: (i) packets are stored in the synchronizing buffer for X units of time before being forwarded to the video player; (ii) from the synchronizing buffer, packets are forwarded to the video player as follows: (a) if the current packet's timestamp is within the buffer-window, the packet is sent at rate Y, (b) else, if the packet's timestamp is smaller than the lower bound of the buffer-window, it is sent immediately, (c) else the packet waits for a small amount of time before being considered again.
The value of X is a little less than the video player timeout interval. The buffer-window is adjusted using the timestamps of the RTP packets. During encoding, the last frame of each group-of-pictures (GOP) is marked as a key frame. If the GOP size is 8, then the key frames in the video are frames 0, 8, 16, 24, etc. The streaming server first sends the key frame, and the remaining frames are forwarded according to their hierarchical order. The data packets do not contain the frame number; they are arranged according to the timestamp, and a key frame is identified using the timestamp itself. The timestamp increases from frame to frame at a fixed rate: since RTP video timestamps use a 90 kHz clock, a video with a 30 fps frame rate has a timestamp increase of 90000/30 = 3000 units per frame, whereas a video with a 25 fps frame rate has an increase of 90000/25 = 3600 units. So the time gap between two key frames is constant_time_gap * GOP. The upper bound of the buffer-window is the timestamp of the latest key frame of the base layer; the lower bound is the upper bound minus 16 * constant_time_gap. After receiving a key frame of the base layer, the upper and lower limits of the buffer-window are recalculated. As the GOP size can be at most 16 for scalable video coding, this value is taken to calculate the lower bound. Experiments show that this value is sufficient to provide the required synchronization.
The value of Y is calculated periodically during a streaming session. It is initially set to the average video rate. The actual packet reception rate is then calculated by the client-module in a periodic manner, and the value of Y is updated by taking the average of the previous value of Y and the currently measured packet reception rate. The initial waiting time (X) takes care of the initial packet delay in the UMTS network: some packets are stored in the buffer during this period, so before packet reception via UMTS starts, the buffered packets are forwarded. As the video data packets need to be forwarded before the decoding deadline, the synchronizing buffer can run empty (due to a large initial delay in UMTS), and as a result the codec layers become unsynchronized. As the timestamps of the packets received via UMTS are smaller than the lower bound of the buffer-window (due to the initial delay), they are forwarded immediately, while the packets of the other layers are forwarded at rate Y. After some time, the timestamps of the previously unsynchronized packets fall within the buffer-window and they become synchronized with the other layers.
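A minimal sketch of the forwarding decision, assuming 90 kHz RTP timestamps and hypothetical names for the window bounds; the thesis specifies the policy, not this exact code:

    #include <stdint.h>

    enum action { SEND_AT_RATE_Y, SEND_IMMEDIATELY, WAIT_AND_RETRY };

    /* window_hi: timestamp of the latest base-layer key frame;
     * window_lo = window_hi - 16 * ts_gap, with ts_gap = 90000 / fps. */
    static enum action forward_decision(uint32_t ts,
                                        uint32_t window_lo,
                                        uint32_t window_hi)
    {
        if (ts >= window_lo && ts <= window_hi)
            return SEND_AT_RATE_Y;    /* in window: pace at rate Y     */
        if (ts < window_lo)
            return SEND_IMMEDIATELY;  /* late, e.g. initial UMTS delay */
        return WAIT_AND_RETRY;        /* ahead of window: hold briefly */
    }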
Support for dynamic switching of scalable video content by this proposed system archi-
tecture is dependent on bandwidth estimation. The next chapter discusses the bandwidth
estimation technique during a video streaming session.
Chapter 5
Available Bandwidth Estimation
A video streaming requires a certain bandwidth guarantee for a stable video quality. If the access network cannot ensure the required bandwidth for a streaming video, the video quality over this network will be poor. An available bandwidth lower than required leads to delayed delivery of packets and often to packet loss. The quality of a video stream is not only affected by packet loss; delayed delivery of packets also has a greater impact on a streaming session than on other kinds of data transmission.
The bandwidth available to a user in a wireless network can change very frequently due to channel fading and errors caused by physical obstacles. Therefore, the perceived quality of a streaming video over a wireless network also changes frequently. Fluctuating video quality can be annoying to a user. To avoid video quality fluctuation, the bandwidth availability needs to be ensured.
One possible option is the reservation of the required bandwidth for the streaming session. But bandwidth reservation is not always possible, because bandwidth allocation and deallocation require complex algorithms to avoid wasting resources. Bandwidth reservation would also restrict the number of simultaneous users in a network.
Another possible option is bandwidth aggregation among multiple wireless networks to ensure the required bandwidth. Modern mobile devices are equipped with multiple wireless interfaces. If one access network cannot provide the required bandwidth for a video streaming session, then the video data reception can be shared by multiple access networks, i.e., multiple interfaces can be used simultaneously to receive the video data. The amount of video data received via one access network is decided based upon the bandwidth available to the user from that network; the rest of the video data can be received via the other access networks.
If each of the access networks available to a user can provide only a very low bandwidth, then bandwidth aggregation among these networks cannot ensure the required bandwidth for a streaming video, and packet loss and delay are inevitable in this situation. Also, multiple access networks are not always available in real life. Under these circumstances, the video content needs to be adapted, i.e., video frames need to be selectively discarded so that the overall video quality remains as unaffected as possible. To decide the amount of data that can be sent over an access network, the available bandwidth of the user needs to be measured accurately.
5.1 General bandwidth estimation technique
Packet dispersion techniques have been commonly used to estimate the available bandwidth. An illustration of the technique is shown in figure 5.1 [15]. There are three links between a sender and a receiver, having capacities C1, C2, and C3 respectively, where C1 > C3 > C2.

Figure 5.1: Packet dispersion in a multi-link path

The effective capacity of the flow path is C2, as this link has the smallest capacity. To estimate the capacity, two packets are sent back-to-back from the sender. The first link takes L/C1 time units to forward a packet, where L is the packet size; therefore the time gap between the two packets becomes L/C1 after the first link. Similarly, the second link takes L/C2 time units to forward a packet. As C1 > C2, we have L/C2 > L/C1; as a result, the time gap between the two back-to-back packets becomes L/C2 after the second link. The third link forwards the first packet in L/C3 time units. As the gap between the two packets is L/C2 (> L/C3), the forwarder has to wait for the second packet after forwarding the first packet on the third link; it can forward the second packet only L/C2 time units later. So the time gap between the two packets still remains L/C2. The receiver measures the time gap between the two packets. The effective capacity of the flow path is then calculated as
calculated by -
capacity =packet length
packet gap
C=L
L
C2
=C2 (5.1)
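For example, if two back-to-back packets of L = 1460 bytes arrive with a measured gap of 1 ms, the estimated capacity is (1460 * 8) bits / 0.001 s ≈ 11.7 Mbit/s.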
5.1.1 Limitations of packet dispersion technique in wireless networks
However, packet dispersion techniques were developed for wired networks. They give inaccurate results in a wireless environment because the wireless capacity varies over short periods due to environmental conditions. Poor channel conditions, including a low received signal or a high bit error rate due to path loss, fading, interference, contention, etc., trigger dynamic rate adaptation and retransmission at the MAC layer with random back-off. Also, in a saturated WLAN, a fluid flow model is not applicable because of the probability-based fairness of channel access across WLAN nodes [15].
As the available bandwidth in a wireless network fluctuates very frequently, we need to estimate it properly for a proper utilization of the available resources. If we use only a small portion of the available bandwidth, the available resources remain underutilized; if we send data at a higher rate than is available, the network becomes congested and packets get lost. In a video streaming application, packet loss can have a large impact on the overall video display, depending on the priority of the lost packet.
5.2 WBest: A bandwidth estimation tool for IEEE 802.11 based wireless networks
WBest is a wireless bandwidth estimation tool designed for fast, non-intrusive, accurate estimation of the available bandwidth in IEEE 802.11 networks [16]. The algorithm used in this tool has two main parts. In the first step, a packet-pair technique estimates the maximum effective capacity of the flow path, irrespective of the traffic along the path. In the second step, a packet-train technique estimates the achievable throughput to infer the available bandwidth, which varies depending on the presence of other traffic. The tool assumes that the last hop is a wireless LAN (WLAN) and that the last hop has the minimum capacity as well as the minimum available bandwidth.
In this algorithm, a client (wbest-receiver) requests probe packets and the probe sender (wbest-sender) sends them to the receiver. During the measurement process, packet-pairs are requested first. Upon receiving the request, the sender sends two packets (a packet-pair) back-to-back, and the dispersion of these two probe packets is measured by the receiver. From the packet dispersion model discussed in section 5.1, it is clear that the packet dispersion rate reflects the effective capacity of the flow path between the sender and the receiver, namely the capacity of the link with the lowest capacity along the flow path. As it is assumed that the last hop is wireless and has the lowest capacity, the dispersion rate of the packet-pair reflects the capacity of the wireless link. Considering that there can be packet loss or delay in the wireless network, 30 such packet-pairs are sent by the sender to calculate the capacity accurately. The dispersion rate of each correctly received packet pair is calculated, and to remove outliers, the median of these dispersion rates is taken as the capacity of the wireless link.
Figure 5.2: Packet forwarding at the last hop wireless link
After calculating the flow path capacity, the receiver requests a packet train at the rate of the flow path capacity (the wireless link capacity). A packet train refers to a continuous flow of packets at a constant rate. The sender sends 30 packets (a packet train of length 30) at the requested rate. From these 30 packets, 29 dispersion rates are calculated, and the mean of these rates is taken as the available train rate. The forwarding of probing traffic at the last-hop wireless link is shown in figure 5.2 [16]. The available bandwidth is calculated as follows.
Suppose the capacity of the link is C and the load in the network (cross traffic) is L. Then the available bandwidth A is

A = C - L,  or  L = C - A          (5.2)

When the packet train is sent at rate C, the total incoming traffic rate at the wireless access point (AP) is C + L. The AP can only forward data at its maximum capacity C, and some share of the AP capacity is taken by the other traffic, say L'. Assuming that downstream AP traffic is processed as a FIFO queue, the downstream probing traffic obtains the same share of the total traffic after the AP queue as before it. So we can say

C' / (C' + L') = C / (C + L)          (5.3)

where C' is the share that the probe traffic gets. Now, C' + L' = C, so equation 5.3 can be rewritten as

C' / C = C / (C + L)          (5.4)

Replacing L using equation 5.2, we can rewrite the equation as

C' / C = C / (C + (C - A))
or, C^2 / C' = 2C - A
or, A = 2C - C^2 / C'
or, A = C (2 - C / C')          (5.5)

The capacity C of the wireless link is calculated using the packet-pair technique, and the probe share (average train rate) C' is calculated using the packet train. The available bandwidth is then calculated using equation 5.5.
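As a sketch of this computation (not the WBest source code), the two steps combine as follows; all rates are in the same unit (e.g., bit/s) and the helper names are hypothetical:

    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Median dispersion rate of the correctly received packet pairs. */
    static double capacity_from_pairs(double *rates, int n)
    {
        qsort(rates, n, sizeof(double), cmp_double);
        return n % 2 ? rates[n / 2]
                     : (rates[n / 2 - 1] + rates[n / 2]) / 2.0;
    }

    /* Mean of the 29 dispersion rates of a 30-packet train sent at rate C. */
    static double train_rate(const double *rates, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += rates[i];
        return sum / n;
    }

    /* Equation 5.5: A = C * (2 - C / C'). */
    static double available_bandwidth(double C, double C_prime)
    {
        return C * (2.0 - C / C_prime);
    }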
5.2.1 Limitations of WBest
A few assumptions behind the WBest measurement do not hold during a video streaming session: (i) WBest measures the available bandwidth for a device (connected to a WLAN) that is in an idle state, i.e., no packets are entering or leaving the device during the measurement process; (ii) all probe packets used in a WBest measurement have a fixed size (1460 bytes).
As the bandwidth estimation is based on probe packet dispersion, it increases the traffic on the last hop (the wireless link). If these probe packets create congestion, even for a short period of time, the video traffic will be affected for that period. The congestion will also affect the probing traffic itself, which leads to an inaccurate bandwidth estimate. So the requirement is to measure the available bandwidth accurately while introducing a minimum amount of probe traffic and without affecting the video traffic.
Equal-sized packets will not estimate the available bandwidth accurately. As packet dispersion depends on packet size, equal-sized probe packets lead to an improper bandwidth estimate [6]. Small packets have a lower dispersion rate and large packets have a higher dispersion rate (see figure 7.2). The problem is that if the probe packets are larger than the data packets, the estimated bandwidth is higher than the bandwidth achievable with the smaller data packet size. Similarly, if the probe packets are smaller than the data packets, the estimated bandwidth is smaller than the achievable bandwidth.
5.3 EStream : A new bandwidth estimation tool for
WLAN
With the necessity of estimating the available bandwidth in a WLAN during a streaming session, a new bandwidth estimation tool for WLAN was created. The tool is called EStream, as it continuously estimates the available bandwidth in the WLAN throughout a streaming session (Estimation of available bandwidth in WLAN during Streaming). The EStream algorithm is based on the technique used by the WBest tool for measuring the available bandwidth in a WLAN. Though WBest measures the available bandwidth more accurately than other bandwidth estimation tools, it is not suitable for estimating bandwidth during a streaming session. The EStream implementation makes the required modifications while keeping the basic algorithm the same as in WBest.
In a video streaming session (RTSP/RTP streaming), packets of unequal size are prevalent. During video encoding, some video frames are designated as key frames; after encoding, key frames contain more data than non-key frames. Scalable video coding additionally uses a hierarchical prediction structure, which also leads to unequal frame sizes. So not all encoded frames contain the same amount of data. For RTP packets, each frame has a separate time-stamp which helps the decoder to arrange the frames and play them at the right time. The IP packet payload is limited by the maximum segment size (MSS). If a frame (after encoding) is larger than the MSS, it is sent using multiple packets. Suppose the MSS (excluding headers) for a LAN is 1460 bytes and two consecutive frames have sizes of 1600 bytes and 600 bytes. Clearly the first frame needs to be split into two packets: the first packet contains 1460 bytes and the remaining 140 bytes are sent in a second packet. In principle, these 140 bytes could be merged with the second frame and sent together; but as they have separate time-stamps, they cannot be merged. There is no fixed size for key and non-key frames after encoding; rather, it depends entirely on the encoding algorithm and the content of the video. So the video data packet size is variable.
To reduce the amount of probe traffic while estimating the available bandwidth accurately, EStream uses a customized estimation technique which avoids congestion effects on the video stream. During the measurement period, EStream inserts a few probe packets into the stream and treats all packets (probe and data) as probe packets, which decreases the number of separate probe packets needed. It also chooses the probe packet size adaptively (according to the data packet size) to infer the proper available bandwidth.
Figure 5.3: How video and probe packets are mixed during the packet-pair and packet-train techniques, as compared to a normal video traffic pattern
The EStream implementation is integrated with the software modules: the probe-sender of the server-module is the sending side of EStream, and the WLAN b/w monitor of the client-module is the receiving side. During the probing period, the probe-sender mixes the probe packets with the data packets. WBest inserts 60 packets (30 pairs) into the network during its packet-pair technique, each probe packet with the same size of 1460 bytes. EStream, after receiving a request, instead inserts a probe packet immediately after an actual data packet in the stream; 30 such probe packets are inserted after 30 data packets. Each probe packet has the same size as the data packet it follows. The general video data pattern is shown in figure 5.3a, and the modified stream with packet-pair probes in figure 5.3b. At the receiver, if both packets of a pair are received correctly, they are treated as a packet-pair and the dispersion rate for this pair is calculated. After that, the data packet is forwarded to the client video player. The two packets belonging to the same pair are identified by the sequence number, as the probe packet uses the same sequence number as the data packet. This process clearly reduces the amount of probe traffic, as only 30 extra packets are sent and not all of them have the maximum size.
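The receiver-side handling of a mixed packet-pair can be sketched as follows; this is a simplified illustration, and the type and function names are not taken from the actual client-module. The pair is matched on the sequence number, the dispersion rate is computed from the arrival times, and exactly one copy of the (duplicated) payload is forwarded to the player:

struct rx_packet {
    unsigned short seq;  /* RTP sequence number */
    double arrival;      /* arrival time in seconds */
    int size;            /* payload size in bytes */
};

/* Hypothetical hand-off of a video packet to the player. */
void forward_to_player(const struct rx_packet *p);

/* Returns the dispersion rate (bits per second) of a matched pair, or -1
 * if the two packets do not form a pair. Exactly one copy of the video
 * data is delivered; the duplicate is discarded after the measurement. */
double handle_pair(const struct rx_packet *first, const struct rx_packet *second)
{
    if (first->seq != second->seq)
        return -1.0;              /* not a pair: handle packets separately */
    forward_to_player(first);     /* deliver one copy of the video data */
    return (second->size * 8.0) / (second->arrival - first->arrival);
}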
The packet train technique differs considerably between WBest and EStream. As a train, WBest sends 30 packets (at the train sending rate specified in the request message), each probe packet with the same size of 1460 bytes. EStream does not send 30 consecutive probe packets, as consecutive packets would create congestion and the video data (as well as the probe traffic) would be affected. As a result, it would neither estimate the available bandwidth accurately nor deliver the video data properly. So EStream sends multiple small packet trains to avoid the congestion (figure 5.3c). In fact, it sends two packets as a packet train (train length 2). They are not sent back-to-back; rather, they are separated by a gap depending on the train sending rate. The first packet of the train is the video packet and the second packet is the probe. After receiving the two packets, the dispersion rate is calculated. To obtain 30 dispersion rates (like WBest), 30 such packet trains are created. The average of all these rates is taken as the available train rate. Then the available bandwidth is calculated using equation 5.5.
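On the sender side, the pacing of a length-2 sparse train follows directly from the requested rate: the probe can be scheduled one packet transmission time after the data packet. A minimal sketch under these assumptions (send_packet is a hypothetical send routine; using usleep for the gap is a simplification):

#include <unistd.h>

/* Hypothetical routine that transmits one UDP packet. */
void send_packet(const void *pkt, int size_bytes);

/* Send one sparse train of length 2: the video packet itself, then an
 * equal-sized probe after the gap dictated by the requested rate. */
void send_sparse_train(const void *pkt, int size_bytes, double rate_bps)
{
    double gap_s = (size_bytes * 8.0) / rate_bps;  /* one packet time at rate */
    send_packet(pkt, size_bytes);                  /* the video data packet */
    usleep((useconds_t)(gap_s * 1e6));             /* wait the inter-packet gap */
    send_packet(pkt, size_bytes);                  /* duplicate sent as probe */
}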
5.3.1 Content of probe packets
The content selection of the probe packets plays a crucial role in the EStream technique. The purpose of a probe packet is solely to measure the dispersion rate of the packets, which does not depend on its content. As the probe packets are inserted into the stream, they may create congestion in the network (for a small period of time). This can lead to packet loss, which would not harm the bandwidth estimation but rather make it reflect the actual situation. A loss of video data, however, would affect the video quality. To retain the video quality, the probability of delivering the video data is doubled by duplicating the content of the data packet in the following probe packet. If either of the two packets is received correctly, the video quality remains the same. If both are received correctly, only one of them is forwarded to the video player, and the second packet is discarded by the client-module after calculating the dispersion rate.
5.3.2 Continuous bandwidth estimation
As the available bandwidth in a wireless network can vary frequently, it needs to be estimated continuously during a streaming session. In general, EStream measures the available bandwidth by requesting packet-pairs and packet-trains every 10 seconds. The capacity estimate of the WLAN (using packet-pairs) can vary a bit between successive measurements, so a moving average of the capacity estimates of all measurements (up to that point) is taken as the capacity of the link. If the available bandwidth drops immediately after one measurement, the low bandwidth availability would only be identified after 10 seconds. EStream uses a quick response scheme to tackle this: during the 10-second period between two successive estimations, it monitors the stream (the sequence numbers of the packets) to identify packet loss. If packet loss is identified, it immediately requests a packet-train to measure the available bandwidth, taking the capacity from the last measurement as the capacity of the link. If the bandwidth has really dropped, this is reflected immediately; if it is a false alarm (the packet was lost for some other reason), the bandwidth estimate will reflect that as well.
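Between two periodic measurements, the quick-response check amounts to watching for gaps in the RTP sequence numbers of the stream. A sketch of the idea (request_packet_sparse is a hypothetical trigger for the sparse packet-train request):

/* Hypothetical trigger that sends a PACKET-SPARSE request (see chapter 6). */
void request_packet_sparse(void);

/* Called for every received data packet between two periodic measurements;
 * a gap in the sequence numbers indicates packet loss and immediately
 * triggers a new estimation of the available bandwidth. Unsigned 16-bit
 * arithmetic handles sequence number wrap-around. */
void check_sequence(unsigned short seq)
{
    static int have_last = 0;
    static unsigned short last_seq;

    if (have_last && (unsigned short)(seq - last_seq) > 1)
        request_packet_sparse();  /* possible loss: re-estimate right away */

    last_seq = seq;
    have_last = 1;
}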
5.4 Cost comparison between EStream and WBest
due to intruding traffic
Suppose a video stream requires 1 Mbps bandwidth and the bandwidth is measured every 10 seconds. In 10 seconds, the total amount of video traffic (VT) is

VT = 10 × 1 × 10^6 bits

In the case of WBest, each measurement cycle requires a total of 90 probe packets (30 packet-pairs and a packet-train of length 30) to estimate the bandwidth. As WBest uses probe packets of size 1460 bytes, the amount of probe traffic (PT) is

PT = 1460 × 8 × 90 bits

Therefore, the traffic increase (TI) is

TI = PT / VT = (1460 × 8 × 90) / (10 × 10^6) × 100% = 10.512%    (5.6)

In the case of EStream, each measurement cycle requires a total of 60 probe packets (30 probe packets for the 30 packet-pairs and another 30 probe packets for the 30 sparse packet trains of length 2). With an average packet size of 1000 bytes for the test case, the amount of probe traffic is

PT = 1000 × 8 × 60 bits

so the traffic increase is

TI = PT / VT = (1000 × 8 × 60) / (10 × 10^6) × 100% = 4.8%    (5.7)

In the worst case, if all data packets have the maximum size, the traffic increase due to the EStream measurement is still only

TI = PT / VT = (1460 × 8 × 60) / (10 × 10^6) × 100% = 7.008%    (5.8)
5.5 Bandwidth estimation for UMTS network
Most of the time, the maximum bandwidth available in a UMTS network is fixed by the contract between the operator and the user. The UMTS network provides dynamic bandwidth to the users depending on the traffic load, but the operator assures that the user can get up to a certain amount of bandwidth (depending on the contract). In other words, bandwidth up to this maximum limit is available to the user. Considering this fact (and also the lack of a bandwidth estimation tool for UMTS), we have not measured the available bandwidth for the UMTS interface. In a couple of experiments, we received video streams entirely over the UMTS network; as long as the required data rate of the stream was within the data rate assured by the UMTS network (2 Mbps), we were able to receive the video.
As the device is now capable of monitoring the bandwidth availability, the decision about the video flow distribution is made by the client-module, which then informs the server-module of the decision. In the next chapter, we will discuss the collaboration between the two software modules.
Chapter 6
Signaling
Signaling is the most important aspect of the proposed system. The two software modules, server-module and client-module, are added as software extensions at the server and the client respectively. These modules make the decisions needed to fulfill the goals, i.e. when to send probe packets to estimate the bandwidth availability, which codec layers need to be forwarded, how codec layers are distributed among the multiple interfaces of the client, etc. The server-module cannot make any decision on its own; rather, it acts on the commands of the client-module. As a server can serve a number of clients simultaneously, the decision making is placed in the client-module to reduce complexity at the server. To perform each action, the modules signal each other by exchanging a set of UDP messages. This chapter defines the UDP messages that are used for signaling and describes when and how these messages are exchanged to reach the desired goals.
As in any other multimedia streaming over UDP, network address translation (NAT) poses a challenge to the communication. Multimedia service providers often deploy a specialized STUN server to solve this problem. In this work, our signaling technique takes care of the issue without deploying a separate STUN server. This chapter first describes how NAT creates a problem in an RTSP video-on-demand system and how it can be solved. Then the format of the UDP control messages is defined in section 6.2; the different control messages and their purposes are also described in that section. How packets are intercepted and handled by the software modules is described in section 6.3. Finally, the collaboration between the two modules to complete a streaming session, while fulfilling the requirements, is described in section 6.4.
6.1 Solving NAT issue
An RTSP video streaming session between the streaming server and a client should proceed without any problem if the client has good network connectivity. But often a client situated behind a NAT is not able to receive a single video data packet of a streaming session, even though it has good network connectivity. Before discussing why a client behind a NAT cannot receive the RTSP streaming video data, let us first discuss what NAT is and why a client behind a NAT can receive most other types of packets.
Due to the lack of IPv4 address space, not all network elements can have a globally unique IP address. Network administrators often use Network Address Translation (NAT) to provide Internet access to a large number of users with a single publicly addressable IP address. There is a set of private IP addresses which can be reused in different local area networks. Behind a NAT, a user is assigned a private IP address (unique within a local network).
Figure 6.1: Network Address Translation in a gateway - how a packet's header is changed according to the address mapping maintained at the gateway
When the user tries to access remote content (outside of the local network), the source IP of the packet is changed at the gateway of the network. The address translation is shown in figure 6.1. An outgoing packet with source IP and port pair (X.X.X.X:A) is changed to (Y.Y.Y.Y:B) at the gateway. When the packet reaches its destination, the reply is sent to (Y.Y.Y.Y:B). At the gateway, the NAT maintains the mapping (Y.Y.Y.Y:B -> X.X.X.X:A). So it changes the destination of the packet from (Y.Y.Y.Y:B) to (X.X.X.X:A), forwards it to the local network, and the packet reaches its intended destination. There are different kinds of NAT implementations which impose different restrictions on the mapping of incoming packet addresses. But in all kinds of NAT implementations, the sender is always able to send a message to the recipient if the recipient itself initiated the communication.
Like other types of communication, a client behind a NAT should be able to receive the data of a streaming session. But in most cases it cannot receive any video data. To find the reason and a possible solution, let us first describe how an RTSP streaming session works.
(1) A streaming server listens on a particular port (RTSP on 554) for serving requests from clients.
(2) A client requests a video from the streaming server on port 554 using RTSP/TCP. The client uses an arbitrary free port (say 11224) for this communication.
(3) The server sends its reply back to port 11224. The reply contains the number of tracks (and also other information) available in the scalable video.
(4) For each track in the video, a separate flow (RTP session) is created between the server and the client. As mentioned earlier, multiple codec layers of a scalable video can be streamed in multiple flows. The client informs the server of the receiving port numbers for a flow. Suppose the client says, "Send me track-1 on ports 32154-32155". The even port is used for RTP packets and the odd port for RTCP packets. For each track, the client informs the server of a pair of such receiving ports.
(5) In reply, the server says, "Ok, I am sending track-1 from ports 6970-6971 to ports 32154-32155". These messages are exchanged using RTSP/TCP as plain text in the TCP message body. So far no messages have been exchanged between ports 6970-6971 and 32154-32155 in either direction.
(6) When the client says "play", the server starts sending data using RTP/UDP to the agreed ports. So the data of track-1 is sent in packets with source and destination ports 6970 and 32154 respectively.
(7) In the same way, all the data packets are sent to the agreed ports of the client. After sending all the data packets, the session is terminated using a set of RTSP/TCP messages.
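For illustration, steps (4) and (5) correspond to an RTSP SETUP exchange of roughly the following form (abbreviated; the URL and session identifier are made up, while the port numbers are those of the example above):

SETUP rtsp://streaming-server/video.mp4/track1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP;unicast;client_port=32154-32155

RTSP/1.0 200 OK
CSeq: 3
Session: 12345678
Transport: RTP/AVP;unicast;client_port=32154-32155;server_port=6970-6971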
Now let us examine why a device behind a NAT cannot receive the data of a streaming session. In this section, we also provide the solution direction that is used in the signaling mechanism to tackle the issue.
(1) The communication in steps (1-4) of the previous paragraph remains the same even if the client is behind a NAT. In connection with the previous discussion, when the client sends a request from port 11224 to port 554 of the streaming server, the source port (and source IP) of this request message is changed to some other port number by the NAT. Suppose the source port number is changed from 11224 to 23234. Note that the source IP is also changed from the private IP (X) to the public IP (Y).
(2) The streaming server only sees that a request has come from the IP address Y and port number 23234. So it sends its reply back to Y on port number 23234. When this message arrives at the NAT, it knows that destination port 23234 means that the message is destined for port 11224 on client X. So it changes the destination IP and port number and forwards the message to the client.
(3) The client is unaware of the presence of the NAT and continues to communicate with the server. Considering the previous situation, the client agrees with the server that track-1 will be sent from ports 6970-6971 to ports 32154-32155. As this agreement is made using RTSP/TCP in the message body (as plain text), the NAT is not aware of the agreed ports.
(4) When the client says "play", the server sends the data of track-1 to port 32154. As no messages have been sent from port 32154, the NAT has no mapping for these packets. So when a packet with destination port 32154 arrives at the NAT, it does not know where to forward the data, and ultimately the data does not reach its destination.
(5) To solve this problem, a dummy message (control message) can be sent from the client to the server. When the server replies, "Ok, I am sending track-1 from ports 6970-6971 to ports 32154-32155", two dummy messages are sent from ports 32154 and 32155 of the client to ports 6970 and 6971 of the server respectively. At the NAT, the source ports of these control messages are changed to some other ports (and the source IP from X to Y), say 42426 for the RTP port. When the server receives these control messages, it notes down the port numbers for track-1. Now the RTP data packets for track-1 are sent from port 6970 to port 42426 (instead of 32154). The NAT knows that destination port 42426 means client X on port 32154 and changes the destination address accordingly. So the data finally reaches its destination.
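The essence of step (5) is that the client must originate one datagram from each agreed media port so that the NAT installs a mapping for it. A minimal sketch in C (the port numbers are those of the running example; in the actual system the control message is sent by the client-module, and this fragment only illustrates the NAT-mapping principle):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Send one dummy datagram from local_port (e.g. 32154) to the server's
 * announced port (e.g. 6970), so that the NAT creates a mapping that the
 * server's RTP packets can later traverse in the reverse direction. */
int punch_hole(const char *server_ip, int server_port, int local_port)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in local = {0}, remote = {0};

    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(local_port);
    bind(s, (struct sockaddr *)&local, sizeof(local));  /* fix the source port */

    remote.sin_family = AF_INET;
    remote.sin_port = htons(server_port);
    inet_pton(AF_INET, server_ip, &remote.sin_addr);

    return (int)sendto(s, "punch", 5, 0,
                       (struct sockaddr *)&remote, sizeof(remote));
}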
(6) These control messages are not exchanged between the client and the server themselves; rather, the client-module sends a specific type of control message to the server-module to tackle the NAT issue. A scalable video can have multiple tracks (or flows), and according to step 5, two control messages would need to be sent for each track, and the server-module would need to maintain the forwarding ports for each track. Instead of sending multiple control messages, the client-module sends only two control messages - one for RTP packets and another for RTCP packets. The server-module maintains the IP address and the RTP and RTCP ports in a structure shown in figure 6.2 (the other fields of the structure are described later). It forwards all RTP packets of a streaming session to the same RTP port. Each flow of a streaming session can be identified by its synchronization source identifier (ssrc). The client-module maintains the ssrc of a flow along with the RTP and RTCP source and destination port numbers (agreed upon during connection setup) in a structure shown in figure 6.3 (the other fields of the structure are described later). Upon receiving a packet, it determines the track number of the packet and forwards the packet to the video player after adjusting the port numbers. As there is no other way of distinguishing RTP and RTCP packets, they are sent to separate ports.
Figure 6.2: Structure used by the server-module to store information about a registered interface of a client - a separate structure is used for each interface
Figure 6.3: Structure used by the client-module to store information about each streaming flow
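Based on the fields named above and in section 6.2, the two bookkeeping structures can be pictured roughly as follows; the field names and widths are illustrative and not taken from the actual implementation:

#include <stdint.h>
#include <netinet/in.h>

/* Kept by the server-module per registered client interface (cf. figure 6.2). */
struct reg_interface {
    int32_t reg_id;        /* registration-id assigned on CTRL-RTP */
    struct in_addr ip;     /* registered IP address of the interface */
    uint16_t rtp_port;     /* all RTP packets of this client go to this port */
    uint16_t rtcp_port;    /* all RTCP packets go to this port */
    int32_t probe_count;   /* pending probe packets to mix into the stream */
    int32_t probe_value;   /* requested train sending rate (see section 6.2) */
};

/* Kept by the client-module per video flow (cf. figure 6.3). */
struct flow_entry {
    uint32_t ssrc;                /* identifies the flow */
    uint16_t rtp_src, rtp_dst;    /* agreed RTP source/destination ports */
    uint16_t rtcp_src, rtcp_dst;  /* agreed RTCP source/destination ports */
    int16_t op1, op2;             /* flow distribution policy (see section 6.2) */
};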
6.2 Control messages
As mentioned earlier, the client-module and the server-module collaborate by exchanging various control messages. The control messages are exchanged at different points in time, i.e. before starting a streaming session, in the initial phase of the streaming session, during the streaming session, in the closing phase of the streaming session, and after the streaming session has completed. A separate type of control message is used to accomplish each specific task.
Figure 6.4: Control message format - general format of the UDP messages used by the
new signaling technique
The general format of a control message is shown in figure 6.4. A control message has two parts - a message header and a message body. The message header has a fixed size of 8 bytes. The message body is optional; not every message contains one. The length of the message body can vary from 0 to 1016 bytes, so a control message can have a maximum size of 1024 bytes. The message header contains two fields - type and tab. The type field determines the purpose of the message and thereby the content of the message body and the action of the recipient after receiving the message. The value of the tab field depends on the message type. Table 6.1 summarizes the list of control messages and their key purposes.

Message type       Purpose of the message
CTRL-RTP           Register the IP address of an interface and the RTP port number
CTRL-RTCP          Register the RTCP port number
CTRL-SELECT        Distribute flows among multiple interfaces
CTRL-REMOVE        Remove a flow distribution entry
CTRL-UNREGISTER    Unregister an interface
PACKET-PAIR        Request packet-pairs
PACKET-TRAIN       Request a single long packet train
PACKET-SPARSE      Request multiple small packet trains
Table 6.1: List of control messages and their purpose
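For illustration, the 8-byte header can be thought of as two fields. The thesis fixes only the total header size and the two field names, so the field widths below (two 32-bit signed integers) and the struct layout are assumptions:

#include <stdint.h>

#define CTRL_MAX_BODY 1016   /* body length varies from 0 to 1016 bytes */

/* Illustrative control message layout (widths assumed): an 8-byte header
 * (type and tab) followed by an optional body. */
struct ctrl_msg {
    int32_t type;                 /* message type, e.g. CTRL-RTP, PACKET-PAIR */
    int32_t tab;                  /* meaning depends on the type field */
    uint8_t body[CTRL_MAX_BODY];  /* actual length taken from the UDP datagram */
};

The following is a detailed description of the control messages.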
CTRL-RTP: If a client intends to receive the video data of a streaming session using two interfaces, the server-module needs to be informed of this fact as well as of the IP addresses of all interfaces. This type of message is used to register all available interfaces (e.g. WiFi, UMTS, etc.) of a device with the server-module. As discussed earlier, to tackle the NAT issue, the communication needs to be initiated from the client side; this type of control message also solves that problem. Upon receiving a CTRL-RTP message, the server-module makes a database entry for the sending IP address and port number. After successful registration, the server-module replies with a unique registration-id. Any future communication from the client-module related to that interface must specify this registration-id. When the client sends this type of message, the tab field has no significance and is set to a negative value; the server uses the tab field to convey the registration-id. This type of message does not have a message body.
CTRL-RTCP: This type of message has the same purpose as the CTRL-RTP message. The RTP protocol is coupled with the RTCP protocol to control and monitor an RTP session. Each RTP session is communicated between a pair of ports on both the server and the client side: RTP packets are communicated between the even ports, and the immediately higher odd ports are used for RTCP communication. Except for the port number, there is no other way to distinguish between an RTP and an RTCP packet, so RTP and RTCP messages must be sent to separate ports. As a result, a separate CTRL-RTCP message is sent to the server-module to inform it of the RTCP port; the RTCP packets are then sent to this port by the server-module. The tab field specifies the registration-id of the interface, known from the reply to the CTRL-RTP message. This type of message also does not contain a message body.
CTRL-SELECT: This type of message is used to convey the codec layer distribution policy (among the multiple interfaces of the client) for a streaming session. Multiple mini structures (shown in figure 6.5) form the message body. The tab field of the control message header signifies the number of mini structures (the number of tracks available in the video) contained in the message body. Each mini structure contains the (flow distribution) policy information about one track of the video.
Figure 6.5: Mini structure used to inform and store the flow distribution policy
The ssrc (synchronization source identifier) value identifies the track to which the information relates. The registration-ids of the two interfaces of the client (assuming only two interfaces are available to a user) are conveyed by the id1 and id2 values. As mentioned earlier, a track can contain multiple codec layers. If the codec layers belonging to a particular track are distributed among the two interfaces, the distribution is done according to the op values. As the codec layers can be identified by their operating-point value, the op1 and op2 values contain the maximum permissible operating-point value to be received by the client interfaces denoted by id1 and id2 respectively. The three scalable identifiers (DID, QID and TID) are combined to form a single operating point value (figure 6.6). If one interface of the client can receive all operating points of the track, the corresponding op field is specified as ALL-OP and the other one as NO-OP. These values can also be used to switch one or multiple flows on or off: if both op fields contain the value NO-OP, all packets belonging to the flow are discarded.
Figure 6.6: Combined operating point value using the three scalable identifiers
To switch a flow from one interface to another, the op values of the respective interfaces are swapped between ALL-OP and NO-OP.
Figure 6.7: Pseudo code to decide a packet's fate at the server-module
After receiving a CTRL-SELECT message (for the first time), the server-module makes a database entry for each mini structure in the message body. Subsequent CTRL-SELECT messages (sent whenever the distribution policy needs to change) update the database entries. The client-module also stores the flow distribution policy, in the structure shown in figure 6.3; its op1 and op2 values are the same as those used in the mini structure of the CTRL-SELECT message body.
The algorithm used by the server-module to decide a packet's fate is described in figure 6.7 as pseudo code. Let us take an example to discuss the role of the op fields in distributing codec layers among multiple interfaces. Suppose a video track contains four codec layers with operating point values {1,0,0}, {1,0,1}, {1,0,2} and {1,0,3}, and all the layers can be received by interface-1 of the client. Then the op1 and op2 fields contain the values ALL-OP and NO-OP respectively. After some time the track needs to be switched from interface-1 to interface-2; now the op1 and op2 fields contain NO-OP and ALL-OP respectively. Again after some time, it is decided that the first layer ({1,0,0}) will be received via the first interface, the second and third layers ({1,0,1}, {1,0,2}) via the second interface, and the fourth layer will be dropped. In this case, the op1 and op2 fields contain 256 and 16630 respectively. These values are calculated by putting the scalable identifiers in binary form into the structure of figure 6.6. Note that ALL-OP is equal to 32767 and NO-OP is equal to -32768.
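To make the example concrete, the mini structure of figure 6.5 and the decision logic of figure 6.7 can be sketched in C as follows. This is a reconstruction from the textual description, not the actual pseudo code; only the ALL-OP and NO-OP constants are taken from the text:

#include <stdint.h>

#define ALL_OP  32767
#define NO_OP  (-32768)

/* Mini structure of the CTRL-SELECT message body (cf. figure 6.5). */
struct flow_policy {
    uint32_t ssrc;  /* identifies the track (flow) */
    int32_t id1;    /* registration-id of the first interface */
    int32_t id2;    /* registration-id of the second interface */
    int16_t op1;    /* highest operating point to send to interface id1 */
    int16_t op2;    /* highest operating point to send to interface id2 */
};

/* Decide a packet's fate at the server-module (cf. figure 6.7): returns the
 * registration-id of the receiving interface, or -1 to discard the packet. */
int packet_fate(const struct flow_policy *p, int16_t op)
{
    if (op <= p->op1)
        return p->id1;  /* operating point selected for interface 1 */
    if (op <= p->op2)
        return p->id2;  /* operating point selected for interface 2 */
    return -1;          /* not selected for any interface: discard */
}

With the values of the example above, a packet with operating point 256 would be forwarded via the first interface, packets with operating points up to 16630 via the second interface, and all higher operating points would be discarded.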
CTRL-REMOVE: When a streaming session is completed, the CTRL-REMOVE message is sent by the client-module to remove the database entries created by the CTRL-SELECT message. The message body lists the ssrc of each video flow of the streaming session. The tab field contains the number of flows associated with the streaming session.
CTRL-UNREGISTER: This type of control message is sent to unregister an interface from the server-module. On receiving it, the server-module removes the corresponding entry from its database. The tab value contains the registration-id of the interface to be unregistered. This type of control message does not have a message body.
Apart from these control messages, which handle address registration, the NAT issue, layer adaptation, etc., there are a few further control messages related to available bandwidth estimation. To estimate the available bandwidth for an interface, the client-module requests probe packets from the server-module. The tab field of the message header specifies the registration-id of the interface for which the available bandwidth is to be estimated. Upon receiving the probe packets, the bandwidth-estimator of the client-module calculates the available bandwidth for that interface. Different kinds of probe messages are requested at different points in time of a streaming session, and for each type of probe request a different type of control message is sent. All probe request messages use a common message body (figure 6.8).
Figure 6.8: Format of the control message body used by the probe request messages
As mentioned earlier, the server-module stores the registered interface information in a structure (shown in figure 6.2). The probe-count and probe-value fields of this structure are filled in when a probe request message is received for that interface. The following is the list of the different probe request messages, their purposes, and the content of the message body.
PACKET-PAIR: As specified in chapter 5, the packet-pair technique is used to estimate the capacity of a flow path. This type of control message is used to request packet-pairs. In the message body, probe-count specifies the number of packet-pairs to be sent. The probe-value field has no significance in a packet-pair request and is set to zero.
PACKET-TRAIN: After estimating the capacity using the packet-pair technique, the packet-train technique is used to estimate the available bandwidth. The probe-count contains the length of the packet train, i.e. the number of packets to be sent as a train. The probe-value signifies the sending rate of the packet train, which is the capacity of the flow path calculated using the packet-pair technique.
PACKET-SPARSE: A customized bandwidth estimation technique, the sparse packet-train, is used in this work to estimate the available bandwidth during a streaming session: a long packet-train is divided into multiple small packet trains of length 2 (see chapter 5 for details). This type of control message is used to request sparse packet-trains. The probe-count contains the number of trains to be sent and the probe-value the sending rate (the capacity of the flow path) of the packet trains.
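As an example, a request for 30 sparse trains at an estimated flow-path capacity of 4 Mbps could be assembled as follows, reusing the illustrative 8-byte header layout sketched above (the field widths are assumptions, and byte-order handling is omitted for brevity):

#include <stdint.h>
#include <string.h>

/* Common body of the probe request messages (cf. figure 6.8). */
struct probe_body {
    int32_t probe_count;  /* number of trains (or pairs) requested */
    int32_t probe_value;  /* sending rate in bits per second; 0 for pairs */
};

/* Fill a PACKET-SPARSE request: 30 sparse trains at a capacity of 4 Mbps. */
size_t build_sparse_request(uint8_t *buf, int32_t type_sparse, int32_t reg_id)
{
    int32_t header[2] = { type_sparse, reg_id };  /* type and tab fields */
    struct probe_body body = { 30, 4000000 };

    memcpy(buf, header, sizeof(header));
    memcpy(buf + sizeof(header), &body, sizeof(body));
    return sizeof(header) + sizeof(body);
}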
6.3 Packet handling by software modules
The software extensions (server-module and client-module) process all packets related to a streaming session before passing them on to the next protocol layer. To be precise, all incoming packets are processed after local routing is done, and all outgoing packets are processed before routing is applied. iptables is a user space application program that allows a system administrator to configure the tables provided by the Linux kernel firewall (implemented as different Netfilter modules) [1]. By setting appropriate rules in these tables, the packets of a streaming session are queued in a Netfilter queue. libnetfilter_queue is a user space library which provides functions to process the enqueued packets in user space [19]. The subsequent sections describe how the different packets are processed by each software module after being copied to user space using libnetfilter_queue.
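For illustration, the outgoing RTP packets of the streaming server could be diverted to queue number 1 with a rule such as

iptables -A OUTPUT -p udp --sport 6970:6971 -j NFQUEUE --queue-num 1

(the port range is an assumption), and a minimal libnetfilter_queue consumer looks roughly as follows (error handling and the nfq_bind_pf setup are omitted):

#include <arpa/inet.h>
#include <libnetfilter_queue/libnetfilter_queue.h>
#include <linux/netfilter.h>  /* NF_ACCEPT */
#include <stdint.h>
#include <sys/socket.h>

/* Callback: the packet-interceptor logic would inspect or modify the
 * packet here before issuing a verdict. */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    uint32_t id = ntohl(ph->packet_id);
    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}

int main(void)
{
    char buf[4096];
    struct nfq_handle *h = nfq_open();
    struct nfq_q_handle *qh = nfq_create_queue(h, 1, &cb, NULL);
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);  /* copy the whole packet */

    int fd = nfq_fd(h);
    for (;;) {
        int rv = recv(fd, buf, sizeof(buf), 0);
        if (rv >= 0)
            nfq_handle_packet(h, buf, rv);        /* dispatches to cb() */
    }
}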
6.3.1 Inside server-module
Figure 6.9 provides insight into the server-module. All outgoing video data packets from the streaming server, related to any streaming session, are queued in netfilter queue number 1.
Figure 6.9: Packet traversal inside the server-module - how different packets are handled by the different sections of the server-module
They are intercepted (processed) by the packet-interceptor of the server-module. By looking at the packet's synchronization source identifier (ssrc) and operating point, and by consulting the control-center, the destination address of the receiving interface for that packet is decided. The ssrc is used to identify a flow; the control-center maintains a database which provides the destination address of the packets belonging to a particular video flow. The operating point identifies the codec layer of the packet's content, which is used to adapt the scalable video content. If the client has previously requested probe packets to measure the available bandwidth of the receiving interface, i.e. the probe-count for that interface contains a positive value, the packet is queued in the probe-mix buffer (netfilter queue 2), and a probe packet of the same size (and same content) is sent along with the original data packet. After sending each probe packet, the probe-count is reduced by one. If no probe packets are requested, the data packet is forwarded directly to the client. The UDP headers of all packets (probe and data) are changed to the registered address of the receiving interface.
6.3.2 Inside client-module
Figure 6.10 provides insight into the client-module. All packets related to a streaming session are queued and intercepted by the packet-interceptor of the client-module.
Figure 6.10: Packet traversal inside the client-module - how different packets are handled by the different sections of the client-module
Incoming packets from the streaming server are queued in netfilter queue number 1 and outgoing packets in queue number 2. The control-center of the client-module collects the useful information (ssrc, source and destination ports, required bandwidth, etc. of each video flow) from the SDP/RTSP packets. The RTSP packets themselves are forwarded unaltered by the packet-interceptor. Video data packets are received via the different interfaces with changed packet headers, and their source and destination port numbers are adjusted to the expected port numbers. As the listening port of the client application differs from the received packet's destination port (to tackle the NAT issue), the port numbers are adjusted to the listening port by the client-module to maintain transparency between the client application and the server. If the receiving interface of the packet is expecting probe packets to estimate the available bandwidth of the interface, the packet is forwarded to the bandwidth-estimator before the port numbers are adjusted. As the data packets and probe packets are mixed together, the bandwidth-estimator treats all packets as probe packets; it is the job of the bandwidth-estimator to forward only the data packets to the video player. In fact, the bandwidth-estimator only calculates the packet dispersion rate to estimate the available bandwidth; then it forwards the data packet to the upper layer and discards the probe packet. After the data packets' port numbers have been changed, they are queued in the synchronizing buffer before being forwarded to the video player. The dynamic bandwidth allocation mechanism of the UMTS network can cause an initial delay in packet reception; as a result, the codec layers can become unsynchronized. Packets received via the WiFi and UMTS interfaces are queued in queues number 3 and 4 respectively. These queues (buffers) release the packets in synchronized order. Packet synchronization is done using the RTP timestamps of the data packets (see section 4.3).
6.4 Collaboration between the server-module and the
client-module
This section describes how the software modules work together to fulfill the requirements through the timely exchange and handling of the control messages. The server-module runs all the time (as long as the streaming server is running). The client-module (on a particular client device) can run all the time or can be started before a streaming session begins. The data transmission (video data and control messages) during a streaming session between a client and the streaming server is shown in figure 6.11.
Registering multiple interfaces of the client - If a client has multiple interfaces available to receive a video stream, the server-module needs to be informed of the IP addresses of all of them. The client-module registers the IP address and RTP port of each of its interfaces using the CTRL-RTP message. The server-module maintains a database of the registered interfaces of the different clients. After receiving a CTRL-RTP message, the server-module makes a new entry in the database with the source IP address and source port number of the message. It also assigns a registration-id to the entry and replies with this registration-id to the client. Any future communication related to a registered interface refers to it by the registration-id. To register the RTCP port, the CTRL-RTCP message is used; its tab field contains the registration-id of the interface, known from the reply to the CTRL-RTP message.
Initial available bandwidth estimation - Right before a streaming session starts, the available bandwidth of the WiFi interface is measured using a method similar to that of the WBest tool.
Figure 6.11: Message exchange between a server and a client during a streaming session in the presence of the software modules
The packet-pair technique is initiated using the PACKET-PAIR control message. The tab field specifies the registration-id of the interface for which the packet-pairs are requested; the message body specifies the number of packet-pairs, i.e. the probe-count. After estimating the capacity using the packet-pair technique, a PACKET-TRAIN message is used to request a packet-train. The message body specifies the length of the packet-train (probe-count) and the capacity of the flow path (probe-value). From the dispersion of the packet-train and equation 5.5, the available bandwidth of the interface is calculated.
Session setup and parameter gathering for a streaming session - When a streaming session starts, a set of RTSP request/reply messages is first exchanged between the client and the server. The server provides all required information about the video stream at this stage; the text-based SDP protocol is used to describe the stream. The packet-interceptor of the client-module collects all information about the codec layers and flows of the scalable video from this message. After receiving the description, the client and the streaming server set up the flows by mutual agreement (they agree upon the source and destination ports for each flow).
Initial selection of a path for each flow - When all flows are set up, the client-module decides how to distribute the video flows over its interfaces depending on the initial bandwidth estimate. The flow distribution policy is conveyed to the server-module using the CTRL-SELECT message. Multiple mini structures (figure 6.5), containing the policy information for each flow of the stream, form the CTRL-SELECT message body. The server-module stores the flow distribution policy for each flow in a table, called the flow-table.
Video data transfer - When the client sends the play request, the streaming server starts sending the video data. All data packets pass through the server-module, which decides each packet's destination (IP address and port number) by looking into the flow-table and the registered interface database. The flow-table maintains a mapping of registration-ids against each video flow; using this reference, the server-module finds the IP address and port number for the packets of a video flow and changes the packet's header according to the registered address. At the client, the client-module receives the packets via multiple interfaces and adjusts the packet headers accordingly. In this way, the software modules maintain transparency between the client and the streaming server.
Continuous bandwidth estimation and flow redistribution - During the video data reception, the bandwidth-estimator measures the available bandwidth of the WiFi interface periodically, using the customized estimation technique based on multiple small packet-trains (see chapter 5). To request the multiple small packet trains, the PACKET-SPARSE control message is used. If there is a significant change in the available bandwidth (measured using the sparse packet-train technique), the flow distribution policy is modified, and the new flow distribution is conveyed to the server-module using a CTRL-SELECT message.
Session closing - When a streaming session is over, the server-module no longer needs to maintain any information about the flows of that session. So the client-module sends the CTRL-REMOVE message to clear the entries from the flow-table.
Unregistering interfaces - If the client-module will not request any more streaming video from the server, the interfaces can be unregistered by sending the CTRL-UNREGISTER message. After receiving the message, the server-module removes the entry from its database.
In the next chapter, we will discuss the experimentation methodology and evaluate the results.
Chapter 7
Experiments, Results and Evaluation
All system components are implemented for the Linux operating system. The server-module and the client-module are implemented and tested on Ubuntu 10.04 with Linux kernel version 2.6.32. For user space packet handling, the libnetfilter_queue library is used; it requires a kernel that includes the nfnetlink_queue subsystem, i.e. Linux kernel version 2.6.14 or later. The software modules are implemented in C. The signaling between the two modules is realized with socket programming. The Darwin Streaming Server 5.5.5 (DSS) software is used on the streaming server. The videos are encoded using the JSVM (version 9.19.7) software. A customized VLC player, extended by an H.264 scalable video decoder plugin, is used as the video player.
7.1 Measurement setup
The general setup of the experimental testbed is shown in figure 7.1. The Darwin Streaming Server 5.5.5 (DSS) is installed on a Dell desktop computer with an Intel Pentium III
Figure 7.1: General experimental testbed used for all experiments - some components are not used in some experiments
(Coppermine) processor, 996.634 MHz clock speed, 500 MB RAM, and the Ubuntu 10.04 (Kernel 2.6.32-21-generic) operating system. It is connected to the TU Berlin network via a 100 Mbps Ethernet LAN. A Dell laptop (Latitude-D600) is used as an FTP server, which listens on a port to send UDP packets at a constant rate. The UDP packet size can be varied, i.e. a (dummy) client can request fixed-size (1460 bytes of data) or variable-size packets. The FTP server is likewise connected via Ethernet LAN. An Apple Airport base station is used as the wireless access point (WiFi AP in figure 7.1). It is connected via a 10 Mbps Ethernet LAN to the TU Berlin network and provides a wireless link capacity (IEEE 802.11b) of 5.5 Mbps. A Dell laptop (Latitude-D600) is used as a dummy client; it downloads data from the FTP server at a specified rate to create load in the WLAN. A Dell laptop (Studio-1435) with an Intel Core 2 Duo (T6400) processor, 2 GHz clock speed, 3 GB RAM, and the Ubuntu 10.04 (Kernel 2.6.32-24-generic) operating system is used as the streaming video receiver (shown as VLC client in figure 7.1). The VLC player is used to request and receive RTSP video streams from the Darwin streaming server. This client has a built-in Broadcom WiFi card to connect to the WLAN (Apple access point). An external USB UMTS stick is used to access the UMTS network. The service is obtained from the German service provider O2, which provides an assured bit rate of 2 Mbps (according to the contract).
7.2 Measurements and results
The proposed system provides the best results if the proposed tool (EStream) estimates the bandwidth accurately. To verify the accuracy of EStream, it is used to estimate the available bandwidth in a controlled environment where the actual available bandwidth can be deduced from the total induced load in the WLAN. The VLC client and the dummy client are the only computers connected to the access point used in the testbed. To avoid interference from other wireless users, all experiments are conducted inside the lab either after 10pm or before 6am. The Maximum Segment Size (MSS) for an IEEE 802.11 wireless network as well as for a UMTS network is 1500 bytes (including the packet header). So a UDP packet can contain at most 1472 bytes of data (minimum 20 bytes of IP header + fixed 8 bytes of UDP header), whereas a TCP packet can contain at most 1460 bytes of data (minimum 20 bytes IP header + minimum 20 bytes TCP header). As the WBest tool uses 1460 bytes as the maximum data size, we also consider 1460 bytes the maximum data size to allow a comparison (though all of our probe packets and video data are transmitted using UDP).
7.2.1 Evaluation of EStream
EStream estimates bandwidth based on the packet dispersion technique (chapter 5). The packet dispersion rate depends on the packet size. In figure 7.2, the dispersion rate for 20 different packet sizes is shown.
Figure 7.2: Packet dispersion rate with varying packet size (dispersion rate in Mbps against packet size in bytes)
The packet sizes were chosen randomly. For each packet size, 100 packet pairs are sent with a gap of 20 ms between the pairs, and 100 dispersion rates are calculated from them. The median of these 100 rates is taken as the dispersion rate for that packet size. Note that the packet pairs are sent in an idle WLAN, i.e. the probe packets are the only traffic in the WLAN. From figure 7.2, it is clear that small probe packets make the estimation indicate a lower bandwidth, while larger probe packets make it indicate a higher bandwidth.
The capacity of the WLAN is also measured using packets of different sizes. To measure the capacity of the WLAN, a continuous flow of packets is sent from the FTP server to the dummy client. Initially, the packets are sent at a very high rate (say 55 Mbps) and the receiving rate is measured at the receiver (dummy client). Then the sending rate is adjusted (reduced/increased step by step) until the sending and receiving rates are (almost) the same. In our testbed, the measured WLAN capacity is 4.02 Mbps when variable-size packets (each packet size chosen randomly) are used. On the other hand, the WLAN capacity is measured as 5.2 Mbps when only maximum-size packets are used. In our experiments we have seen that if the data packets are smaller and are sent at a rate of 5.2 Mbps, they are not received at that rate by the client; delivery is delayed, and for longer transmissions some of the data packets are lost (due to buffer overflow at the access point). For this reason, EStream chooses the probe packet size adaptively, so that the estimated bandwidth properly matches the stream.
The limitations of WBest and the required modifications for EStream are discussed in chapter 5. To evaluate the accuracy of EStream, different loads are imposed on the WLAN (by the dummy client) to produce different bandwidth availabilities for the client. Here we provide four load scenarios. The loads induced by the dummy client (by requesting traffic from the FTP server at a constant rate) are 0.5 Mbps, 1.25 Mbps, 2.4 Mbps, and 3 Mbps. The WLAN capacity is 4.02 Mbps. The measured load values (at the dummy client) are 0.5 Mbps, 1.24 Mbps, 2.38 Mbps, and 2.99 Mbps respectively for the four load scenarios. So the available bandwidth for the VLC client should be 3.52 Mbps, 2.78 Mbps, 1.64 Mbps, and 1.03 Mbps respectively. These values are shown as the ground truth in figure 7.3. The difference between the requested load rate and the measured load rate at the dummy client is due to the different clock resolutions and computations at the server and client side.
Figure 7.3: WBest vs. EStream - comparison of the bandwidth estimated by the two tools under different loads in the network (available bandwidth in Mbps against load in the WLAN in Mbps, together with the ground truth); the required bandwidth of the videos in each case is less than, but close to, the available bandwidth
During video streaming, the available bandwidth is measured periodically every 10 seconds under each of these four load scenarios. The 95% confidence intervals over 50 measurement samples (for each load scenario) are plotted in figure 7.3. For comparison, the WBest tool is applied to all the given cases in separate measurement series. In these measurements the purpose is to estimate the available bandwidth, so we have used video streams which require less bandwidth than the available bandwidth in each scenario. The streaming video rates used in the four cases are 3.1 Mbps, 2.4 Mbps, 1.45 Mbps and 0.5 Mbps respectively.
The measured bandwidth is shown in figure 7.3. The measurements indicate that the EStream tool estimates the available bandwidth with a maximum underestimation error of 0.3 Mbps, while the WBest tool gives a maximum error of 2.7 Mbps. From figure 7.3, it is clear that EStream estimates the bandwidth more precisely. The reason is that EStream uses an adaptive probe packet size (the same size as the data packets) and mixes the probe packets with the data packets, which reduces the amount of additional traffic in the network. Moreover, the probe packets carry duplicated video data, which adds reliability to the video data transmission.
7.2.2 Evaluation of scalable video adaptation via network aware
utilization of multiple interfaces
As a test video, we use the Paris sequence (available at http://media.xiph.org/video/derf). It has a total of 1065 frames at a rate of 30 frames per second (fps). We encode the video in two spatial layers: the base layer in the Quarter Common Intermediate Format (QCIF) with a frame size of 176x144 pixels, and one enhancement layer in the CIF format with 352x288 pixels. Each of these layers is streamed in a separate flow, where the base and enhancement layer require at most 0.2 and 0.7 Mbps respectively. The described testbed with a WLAN capacity of 4 Mbps is used. The video streaming measurements are conducted for three different setups: (I) receiving the video with the WLAN interface only, without any layer adaptation; (II) receiving the video with the WLAN interface only, but with layer adaptation; and (III) receiving the video with two interfaces simultaneously, with both layer adaptation and switching of streams. During the streaming session, the videos are played by the VLC player. The VLC player also dumps the raw video frames while decoding them, thus allowing for a computation of the objective video quality (PSNR comparison). The PSNR values are calculated for the received video in all three cases, with the original video taken as the reference.
Static content adaptation and distribution for multiple interfaces
To test the video quality gain when content adaptation is used, the streaming video is received without content adaptation and with content adaptation in two separate cases (cases I and II). A constant load of 3.5 Mbps is imposed on the WLAN via the dummy client. The remaining available bandwidth is (4 - 3.5) Mbps = 0.5 Mbps, which should be sufficient to stream the base layer (0.2 Mbps) but not the enhancement layer (0.7 Mbps). In case (I), both video layers are received via WiFi, though the network could transmit only the base layer (required bandwidth 0.2 Mbps) correctly. In case (II), content adaptation is applied and therefore the enhancement layer (required bandwidth 0.7 Mbps) is discarded. A sample frame of the received video of cases (I) and (II) is shown in figures 7.4 and 7.5 respectively. The video quality in case (II) is much better than in case (I). The video quality can be improved further if the enhancement layer is received via the UMTS interface instead of being discarded (case (III)); a sample video frame for case (III) is shown in figure 7.6.
Figure 7.4: A sample frame of the Paris sequence - the video quality is very low, as all the codec layers of the stream are received via the WiFi interface in spite of a low available bandwidth
Figure 7.5: A sample frame of the Paris sequence when the video is received via the WiFi interface only; due to the low available bandwidth, only a subset of the codec layers is received
Figure 7.6: A sample frame of the Paris sequence when the video is received via WiFi and UMTS simultaneously; due to the low available bandwidth in the WLAN, some of the codec layers are received via the UMTS interface
To further illustrate the overall video quality in all three cases, the PSNR comparison of the received videos is shown in figure 7.7. The blue line shows the PSNR comparison between the video received in case I and the original video, whereas the green line shows the comparison for case II. Though both layers are received in case I, the video quality is worse than in case II (where only the QCIF layer is received). The reason is the low bandwidth availability: in case I, the video data is sent at a higher rate than the available bandwidth, so the WLAN becomes congested and some packets are lost. This loss applies to both video layers (QCIF and CIF), but the decoder cannot afford to lose packets from the base layer (QCIF), so the video quality becomes worse in this case. There is some fluctuation at the beginning of the blue line, because a few packets of the
Figure 7.7: PSNR comparison between the received video and the original video in three cases with high load in the WLAN (PSNR in dB against frame number)
base layer are still received properly; then the congestion effect stabilizes and, as a result, the poor video quality stabilizes as well. If we instead receive the entire CIF layer via the UMTS interface (case III), we obtain the best and most stable video quality (red line). In this case, the video quality is the same as that of the offline decoded video. The small drop in the red line around frame number 70, due to some packet loss or delay, is a purely random event and has no visible effect on the perceived video quality.
Synchronization of codec layers
In the previous experiment for case (III), the two video layers are received via the two interfaces and the layers are distributed statically (predefined); no switching of a layer from one interface to another is done. But when layer switching occurs, as in the case of dynamic distribution, the layers can become unsynchronized, and as a result the video quality degrades. If both layers are received via WiFi, the UMTS network remains idle. Due to the dynamic bandwidth adaptation of the UMTS network, when one layer is switched to the UMTS network there is a large gap between the last packet received via WiFi and the first packet received via UMTS. As there is no such large inter-packet gap for the remaining layers, the switched layer becomes unsynchronized with the rest. The inter-packet gap for the switched layer is shown in figure 7.8.
Figure 7.8: Inter-packet delay of a video flow while it is switched between WiFi and UMTS (inter-packet delay in µs against packet number)
Figure 7.9: Inter-packet delay normalized by the use of a low level buffer (inter-packet delay in µs against packet number)
To solve this issue, an additional synchronizing buffer is maintained at the client (described in chapter 4). The inter-packet delay of the switched layer in the presence of the synchronizing buffer is shown in figure 7.9. Note that only the inter-packet delay of the enhancement layer (the layer being switched) is shown. The switching is done at predefined times by the server-module itself, without any signaling from the client-module, at packet numbers 500, 720, 810, 890 and 2000.
To check how the video quality degrades due to unsynchronized layers, the PSNR comparison of the received videos (with and without the synchronization buffer) is shown in figure 7.10. The enhancement layer is switched for the first time from WiFi to UMTS after
Figure 7.10: PSNR comparison of original video and received video with and without
the synchronization buffer; the enhancement layer is switched between WiFi and UMTS
multiple times (PSNR value in dB over frame number)
Note that the switching is based on packet number rather than frame number, because
the frame number of a packet's content is not known to the server-module. From the
dumped content of the streamed video, it turns out that packet number 500 contains
data of frame number 229, and indeed the video quality degrades after frame 230.
Though the enhancement layer frames are received, they arrive after their decoding
deadline (i.e., out of synchronization with the base layer) and are discarded by the
decoder. As the base layer is received on time, the video keeps playing, but with
degraded quality. When the enhancement layer is switched back to WiFi after packet
number 720, the packets (and hence the frames) arrive in time again and the video
quality is restored around frame number 340. The second time the enhancement layer
is switched from WiFi to UMTS, after packet number 810, there is not much delay, so
the video quality is not degraded. The third time the layer is switched from WiFi to
UMTS, there is again a large inter-packet delay; as a result, the layers become
unsynchronized once more and the video quality degrades.
Dynamic content adaptation and distribution for multiple interfaces
To check the dynamic behavior of the proposed system, the load on the WLAN is
imposed after the streaming session has started. Content adaptation for case (II) and
content distribution for case (III) are performed dynamically according to the varying
available bandwidth of the WLAN. The results for the three cases are given in Figure
7.11, which shows the PSNR comparison for the received videos.
Figure 7.11: PSNR comparison between received video and original video in three cases
with constant high load in the WLAN - (i) dynamic distribution of video flows between
WiFi and UMTS, (ii) dynamic content adaptation while receiving the video flows using
only one interface (WiFi), (iii) receiving all video flows using the WiFi interface (PSNR
value in dB over frame number)
From frame number 50 onwards (case (I)) and from frame number 125 onwards (cases
(II) and (III)), a constant WLAN load of 3.5 Mbps is induced by the dummy client. As
in the case of static content adaptation, the available bandwidth is then 0.5 Mbps, so
only the base layer (required bandwidth 0.2 Mbps) can be received properly. Note that
the measurements for each setup are conducted separately and that the WLAN loads
are triggered manually, hence not at the same instants of time. Without adaptation
(setup (I)) the quality drops to 17 dB right after the load is induced (frame number 50)
and stays below 20 dB for the rest of the measurement (with the exception of some
occasional peaks). For setup (II) the quality drops from frame 125 onwards to at most
20 dB and slightly improves from frame number 170 onwards (by roughly 1.25 dB) while
staying stable. When using multiple interfaces (setup (III)) the quality similarly drops
to 21.25 dB but is completely restored within 50 frames (1.7 seconds) at frame number
170, achieved by the switch of the enhancement layer from WLAN to UMTS.
To verify that our system also reacts to newly released bandwidth, we conduct a second
measurement series for setups (I), (II) and (III), in which we induce a load of 3.5 Mbps
(as in the previous measurements) shortly after starting the experiment but remove it
again in the last third of the experiment time. Figures 7.12, 7.13 and 7.14 illustrate the
PSNR comparisons of cases I, II and III, respectively.
Figure 7.12: PSNR comparison between received video and original video when all video
flows are received using the WiFi interface without any adaptation (PSNR value in dB
over frame number; the load duration is marked)
For usage of one interface only (setup (I)), the WLAN load is induced at frame number
135 and removed at frame 720. The video quality drops from frame number 130 onwards
and is accordingly restored after frame number 720.
Figure 7.13: PSNR comparison between received video and original video when video
flows are received using the WiFi interface only, with content adaptation (PSNR value
in dB over frame number; the load duration is marked)
For usage of the WLAN interface with content adaptation (setup (II)), the WLAN load
is active between frame numbers 300 and 800. During that time the quality drops to 22
dB (and remains stable), while it is restored to 35 dB once the load is taken off. The
drift between frame numbers 300 and 420 occurs because some frames of the CIF layer
are received during this period (and some are not); after some time the system detects
the lower available bandwidth and decides not to receive the CIF layer any longer, so
the video quality stabilizes. When utilizing two interfaces (setup (III)), the WLAN load
is induced between frames 50 and 640 (see Figure 7.14). The video quality fluctuates
between frame numbers 50 and 90 due to the lower available bandwidth in the WLAN;
then the system detects the lower bandwidth, switches the CIF layer to the UMTS
network, and the video quality is restored. The later switch of the enhancement layer
back from UMTS to WLAN does not result in any change of video quality.
Our measurements indicate that our system detects the loss of available bandwidth
and triggers the suitable adaptation within 1.5 seconds on average (2 seconds at most).
Switching off a layer or re-routing it to a different interface does not result in a quality
degradation, and the involved delay is mainly due to the bandwidth estimation process:
it always takes some time to estimate the available bandwidth reliably without
Figure 7.14: PSNR comparison between received video and original video when all video
flows are received using the WiFi and UMTS interfaces (PSNR value in dB over frame
number; the load duration is marked)
affecting the streaming video. From the plots in Figures 7.11 and 7.14 it is clear that
when the available bandwidth of one interface is not sufficient, using two interfaces
simultaneously yields a better video quality. Likewise, when the interfaces together
cannot provide the required bandwidth, flows can be switched off to obtain a better
video quality. The system also detects when sufficient bandwidth becomes available
again, and accordingly switches enhancement layers back on or re-routes them back to
the WLAN interface, so that the quality provided by the coding is received.
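The adaptation behavior observed in these measurements can be summarized by a
simple periodic decision rule. The following Python sketch illustrates it; the function
and constant names, the enhancement-layer rate and the fall-back behavior are
assumptions made for illustration, while the base-layer requirement of 0.2 Mbps is the
value from the experiments:

    # Simplified sketch of the periodic client-side decision (names, the CIF
    # rate and the fall-back behaviour are assumptions; 0.2 Mbps is the
    # measured base-layer requirement).  Returns the interface assignment
    # per (base layer, enhancement layer).
    BASE_RATE = 0.2e6   # QCIF base layer, bit/s
    ENH_RATE = 0.3e6    # CIF enhancement layer, bit/s (assumed)

    def decide(est_wlan_bps, umts_available):
        if est_wlan_bps >= BASE_RATE + ENH_RATE:
            return ("wlan", "wlan")          # WLAN can carry both layers
        if est_wlan_bps >= BASE_RATE:
            if umts_available:
                return ("wlan", "umts")      # re-route the enhancement layer (case III)
            return ("wlan", None)            # drop the enhancement layer (case II)
        # not even the base layer fits into the WLAN: fall back to UMTS entirely
        return ("umts", "umts") if umts_available else ("wlan", None)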
Chapter 8
Future Work
There are several areas in which the system can be improved in order to better meet
the goal of this thesis in all real-life scenarios.
Improvement of system response time - The response time of the system depends
mainly on the bandwidth estimation process and the signaling procedure. The
time taken by the signaling procedure is more or less fixed. As the system takes
1.2 seconds on average to detect a drop in available bandwidth, this can cause
fluctuations in video quality in a highly fluctuating WLAN. Hence the time
required to estimate the available bandwidth of the WLAN needs to be minimized.
Available bandwidth measurement for the UMTS network - In this work, we have
presented a bandwidth estimation technique only for WiFi. The nature of a UMTS
network differs from that of a WiFi network, so the bandwidth estimation technique
used for WiFi cannot be applied to UMTS as it is; a similar technique might be
used with the required modifications. The similarities and dissimilarities between
WiFi and UMTS networks need to be investigated thoroughly to determine the
changes (if any) required in the bandwidth estimation technique for UMTS.
Provide seamless video streaming irrespective of user mobility - Using a mobile
device, a user is expected to roam around while watching the streaming video. As
a WiFi network provides only a small coverage area (depending on the environment,
such as indoor or outdoor, obstacles, etc.), the user may lose connectivity while
moving, whereas the UMTS network is expected to provide connectivity everywhere.
If the client device detects a connection black-out in advance and switches the codec
layers to the UMTS network, the streaming session can be continued. We have tried
an approach based on received signal strength: threshold points for a connectivity
black-out were identified from the received signal strength, and the approach was
able to switch the codec layers in time, before connectivity was lost, to provide a
continuous streaming session (a sketch of such a trigger is given below). These
experiments were done only in an outdoor environment. As the received signal
strength alone is not an appropriate indicator for triggering a vertical handoff, other
parameters need to be considered in the handoff decision. This approach also does
not consider horizontal handoff between WiFi access points. Extensive experiments
need to be done to provide uninterrupted video streaming for moving users.
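As a minimal illustration, the following Python sketch shows a threshold-based trigger
with hysteresis; the threshold values are hypothetical and do not reproduce the
thresholds identified in our outdoor experiments:

    # Hypothetical RSS-threshold trigger with hysteresis (values are
    # assumptions, not the experimentally identified thresholds).
    SWITCH_OUT_DBM = -80    # below this, move layers to UMTS before the black-out
    SWITCH_BACK_DBM = -70   # above this, it is safe to move layers back to WiFi

    def handoff_decision(rss_dbm, layers_on_wifi):
        if layers_on_wifi and rss_dbm < SWITCH_OUT_DBM:
            return "switch_to_umts"
        if not layers_on_wifi and rss_dbm > SWITCH_BACK_DBM:
            return "switch_back_to_wifi"
        return "stay"

The gap between the two thresholds prevents the layers from oscillating between the
interfaces when the signal strength hovers around a single threshold.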
Smart prediction of WiFi network availability for moving users - Foreknowledge
of user mobility can be useful to provide uninterrupted video streaming. If the
device has specific knowledge about the user's mobility and the WiFi connectivity
along the user's mobility path, the available WiFi connectivity can be utilized in a
better way. Based on the user's position (from an integrated GPS device) and
movement (from an accelerometer), the WiFi connectivity can be estimated.
Exploring user’s mobility history - An user’s mobility history can be utilized
to build a WiFi connectivity map along a mobility path. Suppose an user is moving
along a path while watching a streaming video without any previous knowledge
about the WiFi connectivity. During this movement a possible connectivity map
can be formed, so that the next time the user roams around this area, this map can
be used.
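The following Python sketch illustrates the idea; the grid granularity and the data
layout are assumptions made for illustration:

    from collections import defaultdict

    class ConnectivityMap:
        # Toy WiFi connectivity map: GPS fixes are snapped to grid cells of
        # roughly 100 m (the granularity is an assumption) and the observed
        # signal strength samples are stored per cell.

        def __init__(self, cell_deg=0.001):
            self.cell_deg = cell_deg
            self.samples = defaultdict(list)

        def _cell(self, lat, lon):
            return (round(lat / self.cell_deg), round(lon / self.cell_deg))

        def record(self, lat, lon, rss_dbm):
            self.samples[self._cell(lat, lon)].append(rss_dbm)

        def expected_rss(self, lat, lon):
            # Mean observed RSS for this cell, or None if never visited.
            cell = self.samples.get(self._cell(lat, lon))
            return sum(cell) / len(cell) if cell else None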
Optimization of power usage - Power consumption is a major concern for mobile
devices. Simultaneous usage of multiple interfaces increases the power consumption,
and scanning for possible connectivity options (access points) also consumes a lot
of power. The multiple interfaces therefore need to be used smartly, so that the
power consumption is minimized as far as possible.
Minimize jitter due to handoff by smart data reception - When one or more codec
layers (but not all) are switched from one access network to another, the layers
received via the different access networks need to be synchronized. The
synchronization of multiple layers is discussed in Chapter 4; in this work it is
achieved with an additional buffer at the client. However, the synchronization
process has not been verified extensively and needs to be refined accordingly.
Chapter 9
Conclusion
While scalable video coding (SVC) and an RTP payload format to stream a selected
content quality to the client have recently been standardized, the methods for a network-
aware and dynamic content adaptation have not been available. In this work, we have
introduced a customized WLAN bandwidth estimation and a signaling method to enable
adaptive scalable video streaming over multiple wireless interfaces (WLAN, UMTS). SVC
features distribution of codec layers in separate flows and thereby in principle supports the
seamless switch to a different quality when multiple interfaces are used simultaneously.
We have introduced an available bandwidth estimation method that induces less probing
traffic by mixing probe packets in between the video stream to utilize video data packets
for the measurement. The introduced probing packets do contain relevant video data
and make the video transmission more robust against packet loss. We specify a signaling
method to enable the bandwidth estimation and the switch of codec layers to the required
interface, both to be triggered by the client. The signaling method solves the NAT prob-
lem - that is, it makes the generally hidden client interfaces addressable to receive the
selected flow on each of them. On the client side the bandwidth is estimated periodically
and a suitable decision is taken accordingly. The introduced software extensions for the
streaming server and the client device do not require any change to the existing
streaming server software or to the client's media player software.
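The essential step of the mentioned NAT traversal is that each client interface first
sends a control datagram to the server, which creates the address mapping through
which the server can stream the selected flow back. The following Python sketch
illustrates this registration step only; the message format, the server address and the
port number are assumptions and do not reproduce the customized control messages of
the framework:

    import socket

    SERVER = ("streaming.example.org", 5004)   # hypothetical server address

    def register_interface(local_ip, flow_id):
        # Send a registration datagram from one specific interface.  The
        # outgoing packet creates a NAT binding; the server learns the public
        # address and port from the datagram source and sends the selected
        # flow back through that binding.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((local_ip, 0))                  # pick the interface by its address
        s.sendto(f"REGISTER {flow_id}".encode(), SERVER)
        return s                               # keep open to receive the flow

    # e.g. wlan_sock = register_interface("192.168.1.23", flow_id=1)
    #      umts_sock = register_interface("10.64.3.7",   flow_id=2)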
Our evaluation measurements indicate that our bandwidth estimation is more precise
than general packet dispersion techniques. This is due to the controlled insertion of
probing packets into the stream, whereby sources of measurement error are minimized.
Furthermore, the PSNR measurements show that the video quality is not affected by
the small amount of probing traffic. It is also verified that a switch of an enhancement
codec layer from WLAN to UMTS and vice versa is triggered in time after a change in
the available WLAN bandwidth, and that the switch does not result in a quality drop
but in the expected quality provided by the coding. It is furthermore demonstrated
that the flows arriving at the two interfaces simultaneously are sufficiently synchronized
by the introduced queuing systems on the server and the client side.
Appendices
Appendix A
PSNR comparison
To evaluate the system, the streaming video quality needs to be measured in several
scenarios. Though only human perception can truly assess video quality, such
assessment is not always feasible: human perception varies from person to person, and
subjective evaluation takes a lot of time. Therefore an objective video quality
measurement technique is usually used for quick assessment. The Peak Signal-to-Noise
Ratio (PSNR) is the simplest and most widely used video quality comparison method.
The PSNR is calculated between two similar quantities and expressed in decibels (dB).
The PSNR between the original video (reference video) and the transmitted video (the
received video in a streaming session) is calculated using the following equation:
    PSNR(dB) = 10 \log_{10} \left( \frac{255^2}{\mathrm{MSE}} \right)    (A.1)

where MSE refers to the Mean-Square Error, which is calculated using

    \mathrm{MSE} = \frac{1}{wh} \sum_{i=1}^{w} \sum_{j=1}^{h} \left( A_{ij} - B_{ij} \right)^2    (A.2)
Here, w is the width of the frame, h is the height of the frame, and A_{ij} and B_{ij}
refer to the pixel values (YUV or RGB) of the reference and the transmitted video,
respectively. In equation A.1 it is assumed that a pixel value is represented using 8 bits
only.
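Equations A.1 and A.2 translate directly into a few lines of code. The following sketch
(Python with NumPy; the function name is our own) computes the PSNR of one
received 8-bit frame against its reference frame:

    import numpy as np

    def psnr_db(ref, recv):
        # PSNR between two 8-bit frames (e.g. the Y plane), per equations A.1/A.2.
        a = ref.astype(np.float64)
        b = recv.astype(np.float64)
        mse = np.mean((a - b) ** 2)               # equation A.2
        if mse == 0.0:
            return float("inf")                   # identical frames
        return 10.0 * np.log10(255.0 ** 2 / mse)  # equation A.1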
However, the general PSNR calculation does not take packet loss into consideration. If
a frame is lost, the transmitted video and the original video become unsynchronized; as
a result, frames are no longer compared with their corresponding reference frames,
which leads to wrong conclusions about the video quality. In [4], the authors provide an
objective video quality measurement tool based on PSNR which takes packet loss into
account. In this work, we have used a synchronization tool which synchronizes the
received video before calculating the PSNR values: a freeze frame is inserted wherever a
frame has been lost.
Bibliography
[1] http://en.wikipedia.org/wiki/Iptables.
[2] Cisco visual networking index: Forecast and methodology 2009-2014. Technical re-
port, 2010.
[3] A. Eichhorn and P. Ni. Pick your Layers wisely - A Quality Assessment of H.264
Scalable Video Coding for Mobile Devices. In IEEE International Conference on
Communications, pages 1019–1025, 2009.
[4] An (Jack) Chan, Kai Zeng, Prasant Mohapatra, Sung-Ju Lee, and Sujata Banerjee.
Metrics for evaluating video streaming quality in lossy IEEE 802.11 wireless networks.
March 2010.
[5] Roya Choupani, Stephan Wong, and Mehmet Tolun. Scalable Video Coding: A
Technical Report. Technical report, September 2007.
[6] Constantinos Dovrolis, Parameswaran Ramanathan, and David Moore. Packet Dis-
persion techniques and capacity estimation. In IEEE Conference on Local Computer
Networks, October 2008.
[7] Kristian Evensen, Dominik Kaspar, Carsten Griwodz, Pål Halvorsen, Audun F.
Hansen, and Paal Engelstad. Improving the Performance of Quality-Adaptive Video
Streaming over Multiple Heterogeneous Access Networks. In Second annual ACM
conference on Multimedia systems, pages 57–68, February 2011.
[8] M. Handley and V. Jacobson. SDP: Session Description Protocol. Internet draft,
April 1998.
[9] Cheng-Hsin Hsu, Nikolaos M. Freris, Jatinder Pal Singh, and Xiaoqing Zhu. Rate
Control and Stream Adaptation for Scalable Video Streaming over Multiple Access
Networks. In 18th International Packet Video Workshop, pages 33–40, December
2010.
[10] Dan Jurca and Pascal Frossard. Video Packet Selection and Scheduling for Multipath
Streaming. In IEEE Transactions on Multimedia, volume 9, April 2007.
[11] K. Chebrolu and R. R. Rao. Selective Frame Discard for Interactive Video. In IEEE In-
ternational Conference on Communications, volume 7, pages 4097–4102, June 2004.
[12] K. Chebrolu and R. R. Rao. Bandwidth Aggregation for Real-Time Applications in
Heterogeneous Wireless Networks. In Mobile Computing, IEEE Transactions, April
2006.
[13] K. Evensen, D. Kaspar, P. Engelstad, A. F. Hansen, C. Griwodz, and P. Halvorsen. A
Network-Layer Proxy for Bandwidth Aggregation and Reduction of IP Packet Re-
ordering. In IEEE Conference on Local Computer Networks, October 2009.
[14] Ingo Kofler, Martin Prangl, Robert Kuschnig, and Hermann Hellwagner. An
H.264/SVC-based adaptation proxy on a WiFi router. In 18th International Work-
shop on Network and Operating Systems Support for Digital Audio and Video, May
2008.
[15] M. Li, M. Claypool, and R. Kinicki. Packet Dispersion in IEEE 802.11 Wireless Net-
works. In IEEE Conference on Local Computer Networks, October 2008.
[16] M. Li, M. Claypool, and R. Kinicki. WBest: a Bandwidth Estimation Tool for IEEE
802.11 Wireless Networks. In IEEE Conference on Local Computer Networks, Octo-
ber 2008.
[17] Yu Hsiang Lin, Shiao-Li Tsao, Ya-Lian Cheng, and Chih-Min Yu. Dynamic Band-
width Aggregation for a Mobile Device with Multiple Interfaces. In The 8th Inter-
national Symposium on Communications, 2005.
[18] Pik Jian Low, M.S.S.M. Shahrom, M.F.A. Fauzi, M.Y. Alias, M.H.L. Abdullah,
K. Anuar, A. T. Samsudin, M. Amil, and S. N. Yahya. Design and Implementation of
Adaptive Scalable Streaming System over Heterogeneous Network. In IEEE Interna-
tional Conference on Signal and Image Processing Applications, page 84, November
2009.
[19] Pablo Neira and Harald Welte. http://www.netfilter.org/projects/libnetfilter_conntrack/index.html.
[20] James Nightingale, Qi Wang, and Christos Grecos. Optimised Transmission of H.264
Scalable Video Streams over Multiple Paths in Mobile Networks. In IEEE Transac-
tions on Consumer Electronics, volume 56, pages 2161–2169, November 2010.
[21] P. Amon, T. Rathgen, and D. Singer. File Format for Scalable Video Coding. In IEEE
Transactions on Circuits and Systems for Video Technology, volume 17, page 1174,
September 2007.
[22] J. Puttonen, G. Fekete, T. Vaaramaki, and T. Hamalainen. Multiple Interface Man-
agement of Multihomed Mobile Hosts in Heterogeneous Wireless Environments. In
Eighth International Conference on Networks, pages 324–331, 2009.
[23] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol
for Real-Time Applications. Internet draft, January 1996.
[24] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming Protocol (RTSP).
Internet draft, April 1998.
[25] Thomas Stockhammer, Miska M. Hannuksela, and Stephan Wenger. H.26L/JVT
coding Network Abstraction Layer and IP-based Transport. In International Con-
ference on Image Processing, volume 2, pages II–485, December 2002.
[26] Cheng-Lin Tsao and Raghupathy Sivakumar. On Effectively Exploiting Multiple
Wireless Interfaces in Mobile Hosts. In The 5th international conference on Emerging
networking experiments and technologies, 2009.
[27] S. Wenger, M.M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer. RTP
Payload Format for H.264 Video. Internet draft, February 2005.
[28] S. Wenger, Y.K. Wang, T. Schierl, and A. Eleftheriadis. RTP Payload Format for
Scalable Video Coding. Internet draft, February 2011.