Efficient Support of Video Streaming to Mobile
Devices with Utilization of Multiple Radio Interfaces
and Scalable Video Coding
Dissertation
Submitted in partial fulfillment of the requirements for the degree of
Master of Technology
by
Chayan Sarkar
Roll No : 09305069
under the guidance of
Dr. Stephan Rein
Prof. Adam Wolisz
Prof. Kameswari Chebrolu
Department of Computer Science and Engineering
Indian Institute of Technology Bombay
Telecommunication Networks Group
Technische Universität Berlin
June 2011
Abstract
An annoying experience in streaming multimedia to a mobile user is fluctuating media quality due to the varying characteristics of the wireless link. This thesis aims to improve the mobile user's media experience through the utilization of multiple wireless interfaces of the user's terminal and proper handling of the scalable video stream. Scalable video coding creates several codec layers in a multimedia content and allows the content to be streamed in multiple flows, from which the stream receivers can select a subset according to their quality needs.
This work provides a framework for delivering a stable video quality to mobile users. Appropriate software extensions are introduced at the client terminal and the streaming server to meet this goal without changing the existing streaming server or scalable video player software. A tool is designed for dynamic bandwidth estimation in WLAN that mixes probe packets into the video stream to induce less additional traffic. Depending on the bandwidth availability, a subset of the codec layers of a scalable video stream is received via WLAN, and the remaining layers are either switched to another interface of the receiver (if available) or discarded. Using a set of customized UDP control messages, a new signaling method is established to support the framework.
We verify that the new bandwidth estimator gives accurate results with at most 7.5% under- or overestimation, while the switch of codec layers is triggered reasonably and in time in response to the varying available bandwidth. The layer switching stabilizes within a few hundred milliseconds up to two seconds. The PSNR measurements indicate that the switch of codec layers and the probing packets do not affect the objective quality, while the new utilization of multiple interfaces improves the general user experience.
Acknowledgement
I would like to express my sincere gratitude to my advisors Dr. Stephan Rein and Prof. Adam Wolisz of Technical University Berlin and Prof. Kameswari Chebrolu of Indian Institute of Technology, Bombay. I am deeply indebted to them for the guidance and encouragement that they provided throughout the duration of the project. They have constantly motivated me to come up with my own ideas. I would also like to thank Karsten Grüneberg of Fraunhofer Heinrich Hertz Institute, Berlin and Sven Wiethölter of Technical University Berlin for their help at different points in time.
Chayan Sarkar
IIT Bombay
Monday, June 27, 2011
Contents

1 Introduction
  1.1 Motivation
  1.2 Problem Statement
  1.3 Challenges
  1.4 Contributions
  1.5 Outline
2 Related Work
3 Principles and Features of Scalable Video Streaming
  3.1 Scalability
  3.2 Streaming a scalable video
    3.2.1 Codec layers and streamed flows
    3.2.2 Streaming session protocols
    3.2.3 Traffic during a streaming session
  3.3 How scalable video encoding helps
4 Proposed System Architecture
  4.1 System Architecture
  4.2 Dynamic switching of scalable video content
  4.3 Synchronization of multiple video flows of a streaming session
5 Available Bandwidth Estimation
  5.1 General bandwidth estimation technique
    5.1.1 Limitations of packet dispersion technique in wireless networks
  5.2 WBest: A bandwidth estimation tool for IEEE 802.11 based wireless networks
    5.2.1 Limitations of WBest
  5.3 EStream: A new bandwidth estimation tool for WLAN
    5.3.1 Content of probe packets
    5.3.2 Continuous bandwidth estimation
  5.4 Cost comparison between EStream and WBest due to intruding traffic
  5.5 Bandwidth estimation for UMTS network
6 Signaling
  6.1 Solving NAT issue
  6.2 Control messages
  6.3 Packet handling by software modules
    6.3.1 Inside server-module
    6.3.2 Inside client-module
  6.4 Collaboration between the server-module and the client-module
7 Experiments, Results and Evaluation
  7.1 Measurement setup
  7.2 Measurements and results
    7.2.1 Evaluation of EStream
    7.2.2 Evaluation of scalable video adaptation via network-aware utilization of multiple interfaces
8 Future Work
9 Conclusion
Appendices
A PSNR comparison
List of Figures

3.1 Scalability directions of scalable video coding
3.2 Hierarchical organization of frames in SVC
3.3 RTSP video-on-demand streaming
4.1 System architecture - overview
4.2 System architecture - inside the software extensions
4.3 Extended NALU header for SVC
4.4 RTP payload format - STAP-A (type 24) NAL unit
4.5 RTP payload format - FU-A (type 28) NAL unit
5.1 Packet dispersion
5.2 Packet forwarding at the last hop wireless link
5.3 Mixing of probe and data packets
6.1 Network address translation
6.2 Structure used to store registered interface information by the server-module
6.3 Structure used by the client-module to store information about each streaming flow
6.4 Control message format
6.5 Mini structure used to inform and store flow distribution policy
6.6 Combined operating point value using three scalable identifiers
6.7 Pseudo code to decide a packet's fate at the server-module
6.8 Message body format of probe request message
6.9 Packet traversal inside server-module
6.10 Packet traversal inside client-module
6.11 Message exchange during a streaming session
7.1 General experimental testbed
7.2 Packet dispersion rate with varying packet size
7.3 WBest v/s EStream - III
7.4 Sample video frame - I
7.5 Sample video frame - II
7.6 Sample video frame - III
7.7 PSNR comparison between received video and original video - I
7.8 Inter-packet delay - I
7.9 Inter-packet delay - II
7.10 PSNR comparison between received video and original video - II
7.11 PSNR comparison between received video and original video - III
7.12 PSNR comparison between received video and original video - IV
7.13 PSNR comparison between received video and original video - V
7.14 PSNR comparison between received video and original video - VI
Chapter 1
Introduction
1.1 Motivation
Fact - I: Multimedia content distribution takes a significant part of the IP traffic. As predicted by the well-known Cisco study [2], Internet video will account for 62 percent of consumer Internet traffic by the end of 2015. By the same time, Wi-Fi and mobile devices will account for 54 percent of IP traffic. Media content access from mobile devices is gaining popularity day by day, and a large amount of multimedia traffic is accounted for by video streaming. The amount of video traffic in wireless networks imposes the challenge of providing sufficient bandwidth to mobile users. Due to the varying characteristics of the wireless link, the available bandwidth of a mobile user changes very frequently. An annoying experience in mobile multimedia streaming is fluctuating media quality due to the varying bandwidth of the wireless network. If the available bandwidth is not sufficient, packets are delayed or even lost, resulting in a drop of the perceived quality of the video.
Fact - II: Modern mobile devices are equipped with multiple radio interfaces to provide a wide range of connectivity options to the users. Wi-Fi and UMTS are the two most common interfaces, available on almost every mobile device. The UMTS network is being deployed widely and will be available almost everywhere within a short period of time. Though it offers wide connectivity, it can provide only a low bandwidth to its users. On the other hand, IEEE 802.11 based WLAN is available only in certain hot-spots such as airports, shops, offices, and educational institutes. WLAN can provide a high available bandwidth within a short connectivity range, but the bandwidth availability can change quickly with environmental influences such as physical obstacles and interference from other users. In short, a high bandwidth cannot be ensured for mobile users through a single access network.
1.2 Problem Statement
If multiple access networks are available to a user, bandwidth aggregation among these networks can effectively provide a higher bandwidth. But sometimes each access network provides such a low bandwidth that even the aggregate cannot avoid packet loss. Also, multiple access networks are not always available in real-life scenarios. Under these circumstances, it may be necessary to selectively discard frames to minimize the effect of their loss on the overall video quality. Please note that in this work we have assumed that the client device is equipped with two network interfaces - WiFi and UMTS.
In this thesis we aim to provide a stable user experience for streamed media by a combined approach of (i) adapting the content quality to the currently available network resources and (ii) smartly utilizing multiple wireless interfaces. For the content adaptation we use the extensions to the H.264 AVC standard for Scalable Video Coding (SVC) [18], which allow the video to be encoded into multiple hierarchical layers, each of them providing additional quality. When the SVC layers are streamed in multiple flows, the client can select a subset of the flows with respect to the available network resources and its needs. As modern mobile devices are equipped with multiple radio interfaces, we utilize the current resources of the individual interfaces to receive a suitable codec flow. In case the respective access networks are available, multiple wireless interfaces (WLAN, UMTS) are utilized simultaneously if the user experience can thereby be improved.
The requirements for the design of such an adaptive SVC multimedia delivery according to the client's available access networks and their current resources are given as follows: (i) the available network resources have to be estimated, (ii) it must be possible to switch a codec layer flow to the appropriate interface (with optional switching on/off of some layers), and (iii) the switch of a codec layer has to be triggered reasonably in time with network resource changes in order to achieve a stable user experience.
1.3 Challenges
To reach the desired goal, there are many challenges that need to be overcome.

As in any other client-server setting, in video streaming a user receives video data using only one access network. To gain a higher overall bandwidth, bandwidth aggregation needs to be performed among multiple access networks (if available). Bandwidth aggregation can be achieved by splitting the video data and re-routing it over different paths so that the receiver receives the data via multiple access networks. But several questions arise in this regard: (i) who will split the data? (ii) where will the data be re-routed, i.e., how will the splitter learn the addresses of the interfaces of the receiver? (iii) how will the data be distributed among multiple paths?
Due to the varying characteristics of the wireless link, the available bandwidth changes very frequently. If the sender sends data to a user at a higher rate than the available bandwidth, packets are delayed or even lost and the overall video quality is degraded. On the other hand, a video with a low data rate cannot provide a high perceived quality. So the data needs to be sent at the maximum data rate available to the user to achieve the highest perceived video quality. Only an accurate estimation of the available bandwidth can ensure that the right amount of traffic is transmitted correctly and on time. As the bandwidth fluctuates within a short period of time in a wireless network, the bandwidth availability needs to be monitored continuously during a video streaming session. The lack of a suitable bandwidth estimation tool for wireless networks during a video streaming creates another challenge.
To maintain a stable video quality in times of low available bandwidth, the video content needs to be adapted. Scalable video coding creates several codec layers in a video, and these codec layers can be transmitted in multiple flows. By receiving a subset of the flows, content adaptation can be performed. In general, streaming servers are not aware of the scalability of a video content. Somewhere in the flow path (from the streaming server to the client), awareness of the scalable content needs to be installed so that the scalable content can be adapted according to the user's needs. The challenge is where to place this awareness of the scalable content so that it can adapt the stream according to a user's requirements.
This work resolves these challenges to reach the goal. A detailed solution for each of the obstacles is described in the subsequent chapters. Please note that the WLAN access network is prioritized in this work due to its better energy efficiency and lower monetary expense.
1.4 Contributions
The main contributions of this work can be summarized in line with the requirements
defined in section 1.2.
We achieve (i), the wireless bandwidth estimation, by using the packet-pair technique, which sends probe packets from the server to the client upon request. Specifically, we design a new WLAN bandwidth estimation tool that mixes probe packets into the video data stream. Thereby less probing data is induced, resulting in more accurate results without disturbing the perceived video quality. The probe packets even make the video transmission more robust against packet loss, as relevant video data is copied into the body of the probe packets. The bandwidth estimation is triggered periodically to monitor the available bandwidth, while packet loss is taken as an additional trigger to detect fluctuating bandwidth in time.
We enable (ii), the codec layer switch, by a new signaling method with specified UDP control messages, which are exchanged between the client and the streaming server. Specifically, a switch can mean routing a flow to the required interface during the streaming session setup, re-routing a flow to a new wireless interface, or switching a flow off or on, thereby either stopping or resuming its transmission. The signaling method is applied by the introduced software extension modules on the client and server side, where the switch of flows is triggered on the client side and executed on the server side. Importantly, with the introduced software extensions, no change in the RTP streaming server software or the SVC media player on the client is necessary. The new signaling method furthermore solves the Network Address Translation (NAT) problem, which arises because wireless network interfaces are generally hidden behind a NAT, and supports the bandwidth estimation in (i).
For reasonably triggering a switch as required in (iii), we periodically monitor the
available WLAN bandwidth and react to the measured bandwidth variations in
time.
To summarize the work, we have developed two software extensions for the server and the client device. They collaboratively measure the available bandwidth at the user's terminal. The new estimation tool measures the available bandwidth during a streaming session with better accuracy. The decision of distributing video layers among multiple interfaces is taken at the client device, and the server re-routes the packets to multiple interfaces of the client terminal. The collaboration, i.e., the information exchange and decision making, is accomplished by using a new signaling method developed in this work. To evaluate the system, experiments are done in a controlled environment to avoid interference from other users. To measure the streaming video quality, the video player dumps the video data into a file while playing the stream. A PSNR comparison is done offline between the dumped video and the original video to estimate the perceived video quality.
1.5 Outline
The rest of the thesis is organized as follows. In the next chapter we review related work. Chapter 3 describes the principles of scalable video streaming; the terminology of scalable video encoding as well as the streaming of video content is described in this chapter. The proposed system architecture to support a stable quality of the streaming video by simultaneously using multiple radio interfaces is discussed in chapter 4. The underlying principles of bandwidth estimation and a new customized tool for WLAN bandwidth estimation are described in chapter 5. A new signaling mechanism is developed to provide dynamic stream adaptation and to support video stream reception via multiple interfaces; this signaling mechanism is described in chapter 6. In chapter 7, we evaluate our system: the experimental testbed and methodologies are described, and the results are analyzed. There are many areas that can be improved to provide a stable quality of a streaming video in real-life scenarios, and our work provides a base for such further developments; chapter 8 describes future working areas. Finally, the thesis is concluded in chapter 9.
Chapter 2
Related Work
Bandwidth aggregation for a mobile device with multiple radio interfaces and the provision of a suitable architecture to use the interfaces simultaneously is discussed in [12, 13, 17, 22, 26]. In [17], the authors propose a dynamic bandwidth aggregation (DBA) proxy (situated in the Internet) which handles the packet aggregation and scheduling. The DBA proxy breaks the end-to-end argument of communication by creating two separate connections between the sender and the receiver. The DBA proxy monitors the channel condition over the wireless link, but the wireless channel condition can be properly observed at the receiver only. This work does not consider real-time or multimedia traffic, and the system is tested using simulations only. In [12], the authors consider real-time traffic, but they do not monitor the wireless link to make decisions; they assume the channel condition parameters are available. In [13], the authors provide a proxy-based bandwidth aggregation technique with reduced IP packet reordering. This work mainly emphasizes packet scheduling for bandwidth aggregation; the use of scalable video coding in our work makes the packet scheduling easier. Also, they do not provide any bandwidth estimation mechanism. In [7], an H.264 video stream is split into segments which are received using WLAN and UMTS. The segments are distributed based on the throughput and RTT of the networks; the results are verified via simulations only. None of these works considers scalable video streaming. As we will explain in chapter 3, scalable video provides features to send separate flows of different codec layers, which makes it preferable for multiple interface utilization.
Related literature on scalable video streaming is reviewed as follows. In [18], a method is described for scalable video adaptation under changing available bandwidth in heterogeneous networks. Streaming over two networks simultaneously is not considered, and the measured available bandwidth is not verified against the actually available one. In [9], deterministic packet scheduling algorithms are derived. TCP is selected as the transport protocol and each packet is scheduled depending on timing information, which makes the packet scheduling computationally expensive. The algorithms are evaluated via simulation only, against the rate control algorithms defined in the Datagram Congestion Control Protocol (DCCP) standard. In [20], a scheme for scalable video transmission over multiple wireless networks is detailed. The work neither provides a signaling method for multi-path streaming nor considers the available bandwidth to schedule the packets. The focus there is on service provision to a group of users who are connected via a multi-homed access point; streaming to an individual multi-homed device is not considered.
Chapter 3
Principles and Features of Scalable
Video Streaming
Scalable Video Coding (SVC) is the name for the Annex G extension of the H.264/MPEG-4 Advanced Video Coding (AVC) video compression standard. In this chapter we briefly review the principles and features of scalable video streaming in the context of this thesis.
3.1 Scalability
Streaming servers normally have to serve a large number of users with different screen resolutions and network bandwidths. As a result, the objective of video coding for Internet streaming has changed to optimizing the video quality over a given bit rate range instead of at a single bit rate. SVC addresses this problem by encoding the video in several layers, where the first layer (base layer) contains the minimum data and the remaining layers (enhancement layers) include refinements to the base layer. This makes scalability possible, as a receiver can receive a subset of the layers, ignoring the rest, depending on its current bit rate availability.
Scalable video coding provides scalability in three directions: (i) bit rate or signal-to-noise ratio (SNR) scalability, (ii) frame rate or temporal scalability, and (iii) spatial scalability (figure 3.1 [21]). SNR scalability is a technique to encode a video into two layers at the same frame rate and the same spatial resolution, but with different quantization accuracy. Temporal scalability is a technique to encode a video sequence into two layers at the same spatial resolution, but with different frame rates. Finally, spatial scalability is a technique to encode a video into two layers at the same frame rate, but with different spatial resolutions [5].
The encoder organizes the frames of a video in a hierarchical order to provide different levels of scalability. The hierarchical organization of frames in a scalable video is shown in figure 3.2. The small (blue) rectangles are the frames, and the arrows signify dependency among the frames.

Figure 3.1: Scalability directions of scalable video coding

Figure 3.2: Hierarchical organization of frames among SVC layers: arrows represent dependency, each small rectangle represents a frame, and frames within a large rounded rectangle represent two SNR layers.

Within each larger rectangle, the two smaller rectangles represent two quality layers (SNR scalability); there can be more than two quality layers. The lower frame contains a small number of bits to represent each pixel of the frame, and the upper quality layers add a few extra bits of information to enhance the picture
quality. Please note that there can be more than two spatial layers. The two larger rectangles with the same frame number, separated by the spatial layer line, represent two spatial layers in the video. That means the frame below the line represents one (smaller) dimension of the picture and the frame above the line represents another (larger) dimension of the same picture; the higher frame contains only the extra information needed to draw the higher-dimension picture. The (near) horizontal arrows represent dependency among frames with different frame numbers. In this figure, we show a hierarchy with a Group-of-Pictures (GOP) size of four. Every fourth frame (0, 4, 8, 12, etc.) is marked as a key frame by the encoder (red rectangles). As the key frames can be decoded independently, they alone can reproduce the video at the lowest frame rate. If, along with the key frames, the other even-numbered frames (2, 6, 10, etc.) are decoded, then the frame rate as well as the video quality can be improved. If all the frames of a particular spatial layer are received, the maximum video quality can be achieved. In this way, with GOP size 4, three temporal layers can be created. The general relationship between the number of temporal layers and the GOP size is
number of temporal layers = log(GOP) / log(2) + 1          (3.1)
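For example, a GOP size of 8 yields log(8)/log(2) + 1 = 4 temporal layers, while the GOP size of 4 used in figure 3.2 yields 3.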
The hierarchy of frames creates several codec layers within a scalable video. A codec layer is identified by three scalability parameters: the dependency identifier (DID) for spatial scalability, the temporal identifier (TID) for temporal scalability, and the quality identifier (QID) for SNR scalability. Frames at the same level of the hierarchy belong to the same codec layer. A layer situated higher in the hierarchy depends on the lower layer(s) for decoding.
3.2 Streaming a scalable video
After encoding a video, it is hinted and stored in the repository of a streaming server. Hinting a video adds some extra information to the encoded data file. Hinting creates tracks within a video and helps the server to stream the content (using the additional data). During transmission, each track is transmitted as a separate data flow between the server and the client [21].
3.2.1 Codec layers and streamed flows
Hinting a video usually creates a separate track for each type of media content (e.g., audio, video, subtitles). In the case of scalable video content, multiple tracks are created for a single video. Multiple track creation helps to adapt the video content, as adaptation can easily be accomplished by receiving a subset of the video flows. For each codec layer, a separate flow (track) can be created between the streaming server and a client. However, a larger number of flows increases the complexity at the receiving side in terms of synchronization, monitoring, etc. So multiple codec layers are sent together in a single flow (as they are included in a single track during hinting). In [3], the authors investigated the effects of multi-dimensional scalability on human perception in order to provide an automated scalable video adaptation procedure. Their findings indicate that switching a temporal layer flow on or off is perceived more clearly than switching an SNR layer flow on or off. Therefore, generally all the temporal layers for a particular frame size are transmitted in a single flow, whereas the different quality layers (SNR layers) for a particular frame size are transmitted in different flows. Different spatial layers are targeted at users with different display screen resolutions, and so they are transmitted as separate flows. A user with the smallest display screen resolution (supported by the video) receives only the lowest spatial layer, whereas a user with a higher screen resolution has to receive a higher spatial layer to achieve a better video quality.
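As an illustration of this convention, a video hinted with two spatial layers and two SNR layers per spatial layer would typically yield four video flows, each carrying all the temporal layers of one (spatial layer, SNR layer) combination.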
3.2.2 Streaming session protocols
Scalable video transmission follows the Network Abstraction Layer (NAL) concept [25]. A video streaming session has two parts: session setup and video data transmission. For the session setup we use the RTSP protocol [24]. The session setup is done by a mutual agreement between the streaming server and the client program (video player). After receiving the request for a video, the server describes the video using the Session Description Protocol (SDP) as plain text in an RTSP message [8]. The video data is sent as RTP packets. For each data flow (track), a separate RTP session is created. The RTP protocol is coupled with the RTCP protocol, which monitors the RTP session [23]. For each RTP session, two consecutive ports are used for data transfer: the even port is used for the RTP packets and the next higher odd port for the RTCP packets. In this work, the RTSP protocol uses TCP, and RTP uses UDP as the transport layer protocol. Other options are possible as well.
3.2.3 Traffic during a streaming session
The setup messages and the video data transfer during a video streaming session are shown in figure 3.3.

Figure 3.3: Message exchange between a server and a client during RTSP video-on-demand streaming

During the session setup phase, a set of RTSP messages is exchanged between the server and the client; different types of RTSP messages serve different purposes. First, the client asks for a video using an OPTIONS message. After getting the reply about the availability of the video, the client requests to DESCRIBE the video. As mentioned earlier, the SDP protocol is used to describe a video. The description contains information about the number of video tracks available in the video, the number of codec layers along with the codec information available in the video, the required bandwidth for each track in the stream, etc. For each track available in the video, a separate video flow or RTP session is established between the streaming server and the client. As mentioned earlier, a track contains one or multiple codec layers. The two parties agree upon the RTP session port numbers using SETUP messages. When all RTP sessions are set up, the client requests the video data using a PLAY message. During the video data transfer, each video flow is sent to its respective ports. After completing the data transmission, the session is closed using a TEARDOWN RTSP message.
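For illustration, a minimal RTSP exchange for such a session might look as follows; the URL, session identifier, and port numbers are hypothetical, and the headers are abbreviated:

    C->S: OPTIONS rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 1
    S->C: RTSP/1.0 200 OK
          CSeq: 1
          Public: OPTIONS, DESCRIBE, SETUP, PLAY, TEARDOWN
    C->S: DESCRIBE rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 2
    S->C: RTSP/1.0 200 OK
          CSeq: 2
          Content-Type: application/sdp
          (SDP body: tracks, codec layers, per-track bandwidth)
    C->S: SETUP rtsp://server.example/video.mp4/trackID=1 RTSP/1.0
          CSeq: 3
          Transport: RTP/AVP;unicast;client_port=5000-5001
    S->C: RTSP/1.0 200 OK
          CSeq: 3
          Session: 12345
          Transport: RTP/AVP;unicast;client_port=5000-5001;server_port=6000-6001
    (one SETUP exchange per track)
    C->S: PLAY rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 4
          Session: 12345
    (RTP flows run on the agreed ports until the end of the video)
    C->S: TEARDOWN rtsp://server.example/video.mp4 RTSP/1.0
          CSeq: 5
          Session: 12345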
3.3 How scalable video encoding helps
A mobile device with multiple network interfaces can achieve a higher bandwidth by aggregating the bandwidth provided by the individual access networks. However, the packets of a video stream need to be scheduled efficiently over the multiple interfaces to really gain from the bandwidth aggregation [10]. Sometimes bandwidth aggregation cannot completely avoid packet loss, and selectively discarding packets (content adaptation) may maintain a better overall video quality [11]. Scalable video coding makes packet distribution over multiple interfaces (and, if required, frame discarding) easier to implement. As the scalable video is streamed in multiple flows, the flows can be distributed among multiple routes so that the receiver receives them via multiple access networks. In the case of content adaptation, it can easily be accomplished by discarding a video flow. One can argue that without using scalable video coding, the video could be encoded and hinted into multiple tracks using other encoding methods. But other encoding methods do not use a layering structure that stores the encoded video frames according to their importance and priority in the video. As a result, hinting the video into multiple tracks becomes a complex job. The adaptation also becomes very difficult, as one has to decide the priority of the data at the packet level (for each individual packet, depending on its content) before discarding it.
We have seen the main principles and features of scalable video coding and how the streaming of a scalable video works. In the next chapter we propose a system architecture comprising two software extensions, for the streaming server and the client device respectively. These software extensions are aware of scalable video streams and manipulate the scalable stream appropriately to reach the goals.
Chapter 4
Proposed System Architecture
The proposed system architecture provides an adaptive SVC multimedia delivery by utilizing the multiple access networks according to their current resources. To reiterate the requirements, the proposed system should support the following functionalities: (i) continuous estimation of the available bandwidth without affecting the video traffic, (ii) dynamic switching of codec layers to a suitable interface, i.e., receiving a scalable video using multiple access networks, and (iii) switching a codec layer on or off, i.e., dynamic scalable video content adaptation, to achieve a stable user experience.
4.1 System Architecture
One of the major constraints of the system design is to meet the requirements without changing the existing streaming server and SVC player software. The system architecture provides two software extensions, the server-module and the client-module, at the server and the client respectively. An overview of the system architecture is shown in figure 4.1. These modules work at the lower layers of the protocol stack of the respective devices and maintain transparency between the streaming server and the video player, so that neither is aware of the existence of these modules.

Figure 4.1: Overview of the system architecture: software extensions are added in the server and the client

The client sends the request for a video using only one access network, and the streaming server also sends video data to only one interface of the client. Furthermore, the streaming server does not have any knowledge about the scalability of the video content. The server-module, on the other hand, is aware of scalable video coding and the codec layers available in a video. It also knows the addresses of the multiple network interfaces of the client, so it distributes the codec layers of a video among the client's interfaces. At the client, the client-module merges the video data received via the multiple access networks and forwards it to the client program, thus maintaining transparency between the streaming server and the client program.
Figure 4.2 gives a detailed view of the two software modules. They work collaboratively to fulfill the requirements. The control-centers of the two modules exchange control messages at different phases of a streaming session and establish mutual agreement on the addresses of the multiple receiving interfaces of the client, the distribution of the codec layers of a stream among the client's interfaces, the switching of codec layers between two interfaces (or switching a layer on/off), etc. Details of the signaling technique are described in chapter 6. The WLAN b/w monitor section of the client-module monitors the available bandwidth at the client interfaces. The probe-sender of the server-module sends probe packets on request to help estimate the available bandwidth. More details about available bandwidth estimation are given in chapter 5.

Figure 4.2: Overview of the software modules introduced by the system architecture
The client-module monitors the available bandwidth continuously and changes the codec layer distribution policy accordingly. The server-module receives updates about the layer distribution policy from the client. According to this policy, the packet-interceptor of the server-module distributes the flows of a streaming session among the multiple interfaces of the client. Please note that the server-module cannot make decisions on its own; rather, it acts according to the policy defined by the client-module. The only job of the server-module is to determine the codec layer of a video packet and treat the packet according to the policy (described in section 4.2). The client-module, on the other hand, not only makes the codec layer distribution policy, but also merges the codec layers received via separate flow paths (access networks) (described in section 4.3).
4.2 Dynamic switching of scalable video content
Switching of scalable video content refers to switching one or multiple codec layers from one access network to another, or switching a codec layer on or off. In [14], the authors provide a static scalable video content adaptation technique (based on [28]). In their work, the adaptation is done on a WiFi router. The use of TCP as the transport protocol makes the adaptation process a complex task: they create two separate connections, one between the client and the WiFi router and another between the WiFi router and the streaming server. Their adaptation technique is based on switching one or multiple codec layers on or off. In the case of switching off a layer, the WiFi router creates acknowledgement packets and sends them to the streaming server. In our work, we provide a simple but dynamic scalable video adaptation technique implemented at the server itself (the server-module), so no network element needs to be aware of the scalable content. In this work, the video layers are not only switched on or off, they can also be switched to another access network. We thus present a dynamic switching of scalable video content based on bandwidth availability.
The connection setup phase of a streaming session provides information about the video, which includes the number of video flows available in the stream, the codec information of each flow, the bandwidth requirement of each flow, etc. The client-module stores this information. This work provides a bandwidth estimation technique to measure the available bandwidth of a user. After measuring the available bandwidth, and depending on the bandwidth requirements of each flow, the layer reception policy (content adaptation as well as switching to another interface) is decided. As the continuous bandwidth estimation followed by the flow reception policy is performed during the video streaming, the dynamism of the system is maintained. The layer adaptation policy is communicated to the server-module. It is then the task of the server-module to determine the codec information of a packet and act according to the policy communicated by the client-module.
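As an illustration, the reception policy could be derived with a simple greedy rule: fill the preferred WLAN interface first, spill the remaining flows over to UMTS, and switch off whatever does not fit. This is only a sketch under assumed data structures and names (flow_t, decide_policy); the thesis states the policy goals but does not prescribe this exact rule. Flows are assumed to be ordered by codec-layer dependency, base layer first.

    /* Hypothetical sketch of the layer-reception policy decision in the
     * client-module. Flows are ordered base layer first. */
    enum route { VIA_WLAN, VIA_UMTS, SWITCHED_OFF };

    typedef struct {
        int kbps_required;   /* per-flow bandwidth from the SDP description */
        enum route route;    /* decision for this flow                      */
    } flow_t;

    static void decide_policy(flow_t *flows, int n,
                              int wlan_kbps, int umts_kbps)
    {
        int wlan_used = 0, umts_used = 0;
        for (int i = 0; i < n; i++) {
            if (wlan_used + flows[i].kbps_required <= wlan_kbps) {
                flows[i].route = VIA_WLAN;       /* preferred interface  */
                wlan_used += flows[i].kbps_required;
            } else if (umts_used + flows[i].kbps_required <= umts_kbps) {
                flows[i].route = VIA_UMTS;       /* spill-over interface */
                umts_used += flows[i].kbps_required;
            } else {
                flows[i].route = SWITCHED_OFF;   /* content adaptation   */
            }
        }
    }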
The RTP payload format for SVC is based on the NALU concept (described in [28]). Each NAL unit contains a one-byte header (figure 4.3). The 5-bit type field of the NALU header determines the NALU type and therefore the payload format. Besides the usual one-byte NALU header, a 3-byte extended header provides the scalability information (figure 4.3). From this additional information, the receiver can extract the {DTQ} or {DID, TID, QID} values and decide the codec layer of the packet.
Figure 4.3: Extended NALU header for SVC - 3 extra bytes are used along with the usual 1-byte header
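For illustration, the {DID, TID, QID} values can be extracted from the 3-byte extension with a few bit operations. The bit layout below follows the SVC NALU header extension (dependency_id and quality_id in the second extension byte, temporal_id in the top bits of the third); the function and type names are hypothetical:

    /* Sketch: extract the codec-layer identifiers from the 3-byte SVC
     * NALU header extension; ext points at its first byte. */
    typedef struct { unsigned did, tid, qid; } dtq_t;

    static dtq_t parse_svc_extension(const unsigned char *ext)
    {
        dtq_t v;
        v.did = (ext[1] >> 4) & 0x07;  /* dependency_id: spatial layer  */
        v.qid =  ext[1]       & 0x0F;  /* quality_id: SNR layer         */
        v.tid = (ext[2] >> 5) & 0x07;  /* temporal_id: temporal layer   */
        return v;
    }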
In total, 32 different NALU types (0-31) can exist, of which types 0, 30, and 31 are (as yet) undefined. The NALU type, i.e., the payload format, defines three basic payload structures [27]. NALU types 1-23 are reserved for single NAL unit packets, i.e., the RTP packet contains a single NAL unit in its payload. If multiple NALUs fit into a single RTP packet, they are aggregated into a single RTP packet payload; NALU types 24-27 are reserved for such aggregation packets. On the other hand, if the size of a NALU exceeds the RTP payload size, it has to be fragmented into multiple fragmentation units (FUs); NALU types 28-29 are used for FUs. NALU types 14, 15, and 20 are used for SVC NAL units; for these NALU types, the 3 extra header bytes are attached to provide the scalability information. In this work, only NALU types 24 and 28 are used beyond type 23.

Figure 4.4: RTP payload format - STAP-A (type 24) NAL unit

Figure 4.5: RTP payload format - FU-A (type 28) NAL unit

A single-time aggregation packet (STAP) puts NALUs with the same timestamp into the same RTP packet. The payload format of STAP-A (NALU type 24) contains a 1-byte NALU header followed by the size of the first NALU (2 bytes) and the NALU itself; subsequently, the size of the next NALU and that NALU itself are appended to the RTP payload, and so on (figure 4.4). Each aggregated NALU (in a STAP-A NALU) has a type in the range 1-23. The payload format of the FU-A fragmentation mode (NALU type 28) consists of a 1-byte NALU header (the FU indicator) followed by a 1-byte FU header. The FU indicator identifies the RTP payload as a fragmentation unit, and the FU header signals whether the fragment is the first, the last, or an intermediate one. The lower 5 bits of the FU header signify a NALU type of 1-23. In the case of an SVC NALU type, only the first fragment contains the scalability information (the 3 additional bytes), and this information also applies to the subsequent fragments. So the scalability information of the first fragment is stored to identify the codec layer of the subsequent fragments.
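Reusing parse_svc_extension and dtq_t from the sketch above, a classifier in the server-module might locate the scalability bytes for the NALU types used in this work roughly as follows (a simplified sketch without bounds checking; names hypothetical):

    static int nalu_type(unsigned char hdr) { return hdr & 0x1F; }

    /* Returns 1 and fills *out if the RTP payload p itself carries SVC
     * scalability information, 0 otherwise (e.g., for non-first FU-A
     * fragments, where the stored first-fragment information is reused). */
    static int classify_payload(const unsigned char *p, dtq_t *out)
    {
        int t = nalu_type(p[0]);
        if (t == 14 || t == 20) {                /* SVC NAL unit        */
            *out = parse_svc_extension(p + 1);   /* 3-byte extension    */
            return 1;
        }
        if (t == 24) {                           /* STAP-A aggregation  */
            const unsigned char *u = p + 1 + 2;  /* skip header + size  */
            if (nalu_type(u[0]) == 14 || nalu_type(u[0]) == 20) {
                *out = parse_svc_extension(u + 1);
                return 1;
            }
            return 0;
        }
        if (t == 28 && (p[1] & 0x80)) {          /* FU-A, S bit set     */
            if (nalu_type(p[1]) == 20) {         /* fragmented SVC type */
                *out = parse_svc_extension(p + 2);
                return 1;
            }
        }
        return 0;
    }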
4.3 Synchronization of multiple video flows of a streaming session
The dynamic resource allocation of the UMTS network imposes a problem of unsynchronized video layers when different layers are received via different access networks. A UMTS user is assigned more bandwidth based on the traffic pattern (the highest achievable bandwidth obviously being limited by the contract between the service provider and the user). When the user starts using the UMTS network, initially a low bandwidth is assigned, and the allotment is increased as the traffic increases. In the case of a video streaming, when some codec layers are received via UMTS, the first packet (or first few packets) gets delayed. This delay can vary from hundreds of microseconds (say, 200 µs) to a couple of seconds (say, 1.5 seconds). As the other layers are received via another interface, they reach the user on time and the decoder starts decoding those packets. Each video data packet contains a timestamp by which the decoder synchronizes it. Because of the initial delay in the UMTS network, the packets received via UMTS may miss their decoding deadline. So the codec layers become unsynchronized and the decoder assumes the packets are lost; as a result, the video quality gets degraded. Any packet received after its decoding deadline is simply discarded by the decoder.
To solve this problem, a synchronization buffer is created at the client, integrated with the client-module. The video player usually maintains a video buffer to order the data packets. However, it is observed that the video playout buffer is not capable of synchronizing the video data received via multiple interfaces (when a higher-rate video faces a long initial delay). A simple buffer management technique helps to synchronize the data packets. The synchronizing buffer is manipulated using the following two steps: (i) packets are stored in the synchronizing buffer for X units of time before being forwarded to the video player; (ii) from the synchronizing buffer, packets are forwarded to the video player as follows: (a) if the current packet's timestamp is within the buffer-window, the packet is sent at rate Y, (b) else, if the packet's timestamp is smaller than the lower bound of the buffer-window, it is sent immediately, (c) else the packet waits for a small amount of time before being considered again.
The value of X is a little less than the video player timeout interval. The buffer-window is adjusted using the timestamps of the RTP packets. During encoding, the last frame of each group-of-pictures (GOP) is marked as a key frame. If the GOP size is 8, then the key frames in the video are frames 0, 8, 16, 24, etc. The streaming server first sends the key frame, and the remaining frames are forwarded according to their hierarchical order. The data packets do not contain the frame number; they are arranged according to the timestamp, and a key frame is identified using the timestamp itself. The timestamp increases from frame to frame at a fixed rate: since RTP video timestamps use a 90 kHz clock, a video with a 30 fps frame rate has a timestamp increase of 90000/30 = 3000 units per frame, whereas a video with a 25 fps frame rate has an increase of 90000/25 = 3600 units. So the time gap between two key frames is constant_time_gap * GOP. The upper bound of the buffer-window is the timestamp of the latest key frame of the base layer; the lower bound is the upper bound minus 16 * constant_time_gap. After receiving a key frame of the base layer, the upper and lower limits of the buffer-window are recalculated. As the GOP size can be at most 16 for scalable video coding, this value is taken to calculate the lower bound. Experiments show that this value is sufficient to provide the required synchronization.
The value of Y is calculated periodically during a streaming session. It is initially set to the average video rate. The actual packet reception rate is then calculated by the client-module in a periodic manner, and the value of Y is updated by taking the average of the previous value of Y and the currently measured packet reception rate. The initial waiting time (X) takes care of the initial packet delay in the UMTS network: some packets are stored in the buffer during this period, so before packet reception via UMTS starts, the buffered packets are forwarded. As the video data packets need to be forwarded before the decoding deadline, the synchronizing buffer can run empty (due to a large initial delay in UMTS), and as a result the codec layers become unsynchronized. As the timestamps of the packets received via UMTS are smaller than the lower bound of the buffer-window (due to the initial delay), they are forwarded immediately, while the packets of the other layers are forwarded at rate Y. After some time, the timestamps of the previously unsynchronized packets fall within the buffer-window and they become synchronized with the other layers.
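A minimal sketch of the forwarding decision, assuming 90 kHz RTP timestamps and hypothetical names for the window bounds; the thesis specifies the policy, not this exact code:

    #include <stdint.h>

    enum action { SEND_AT_RATE_Y, SEND_IMMEDIATELY, WAIT_AND_RETRY };

    /* window_hi: timestamp of the latest base-layer key frame;
     * window_lo = window_hi - 16 * ts_gap, with ts_gap = 90000 / fps. */
    static enum action forward_decision(uint32_t ts,
                                        uint32_t window_lo,
                                        uint32_t window_hi)
    {
        if (ts >= window_lo && ts <= window_hi)
            return SEND_AT_RATE_Y;    /* in window: pace at rate Y     */
        if (ts < window_lo)
            return SEND_IMMEDIATELY;  /* late, e.g. initial UMTS delay */
        return WAIT_AND_RETRY;        /* ahead of window: hold briefly */
    }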
Support for dynamic switching of scalable video content by this proposed system archi-
tecture is dependent on bandwidth estimation. The next chapter discusses the bandwidth
estimation technique during a video streaming session.
Chapter 5
Available Bandwidth Estimation
A video streaming requires a certain bandwidth guarantee for a stable video quality. If the access network cannot ensure the required bandwidth for a streaming video, the video quality over this network will be poor. An available bandwidth lower than required leads to delayed delivery of packets and often to packet loss. The quality of a video stream is not only affected by packet loss; delayed delivery of packets also has a greater impact on a streaming session than on other kinds of data transmission.
The bandwidth available to a user in a wireless network can change very frequently due to channel fading and errors caused by physical obstacles. Therefore, the perceived quality of a streaming video over a wireless network also changes frequently. Fluctuating video quality can be annoying to a user. To avoid video quality fluctuation, the bandwidth availability needs to be ensured.
One possible option is the reservation of the required bandwidth for the streaming session. But bandwidth reservation is not always possible, because bandwidth allocation and deallocation require complex algorithms to avoid wasting resources. Bandwidth reservation would also restrict the number of simultaneous users in a network.
Another possible option is bandwidth aggregation among multiple wireless networks to ensure the required bandwidth. Modern mobile devices are equipped with multiple wireless interfaces. If one access network cannot provide the required bandwidth for a video streaming session, then the video data reception can be shared by multiple access networks, i.e., multiple interfaces can be used simultaneously to receive the video data. The amount of video data received via one access network is decided based upon the bandwidth available to the user from that network; the rest of the video data can be received via the other access networks.
If each of the access networks available to a user can provide only a very low bandwidth, then bandwidth aggregation among these networks cannot ensure the required bandwidth for a streaming video, and packet loss and delay are inevitable in this situation. Also, multiple access networks are not always available in real life. Under these circumstances, the video content needs to be adapted, i.e., video frames need to be selectively discarded so that the overall video quality remains as unaffected as possible. To decide the amount of data that can be sent over an access network, the available bandwidth of the user needs to be measured accurately.
5.1 General bandwidth estimation technique
Packet dispersion techniques have been commonly used to estimate the available bandwidth. An illustration of the technique is shown in figure 5.1 [15]. There are three links between a sender and a receiver, having capacities C1, C2, and C3 respectively, where C1 > C3 > C2.

Figure 5.1: Packet dispersion in a multi-link path

The effective capacity of the flow path is C2, as this link has the smallest capacity. To estimate the capacity, two packets are sent back-to-back from the sender. The first link takes L/C1 time units to forward a packet, where L is the packet size; therefore the time gap between the two packets becomes L/C1 after the first link. Similarly, the second link takes L/C2 time units to forward a packet. As C1 > C2, we have L/C2 > L/C1; as a result, the time gap between the two back-to-back packets becomes L/C2 after the second link. The third link forwards the first packet in L/C3 time units. As the gap between the two packets is L/C2 (> L/C3), the forwarder has to wait for the second packet after forwarding the first packet on the third link; it can forward the second packet only L/C2 time units later. So the time gap between the two packets still remains L/C2. The receiver measures the time gap between the two packets. The effective capacity of the flow path is then calculated as
calculated by -
capacity =packet length
packet gap
C=L
L
C2
=C2 (5.1)
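For example, if two back-to-back packets of L = 1460 bytes arrive with a measured gap of 1 ms, the estimated capacity is (1460 * 8) bits / 0.001 s ≈ 11.7 Mbit/s.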
5.1.1 Limitations of packet dispersion technique in wireless networks
However, packet dispersion techniques were developed for wired networks. They give inaccurate results in a wireless environment because the wireless capacity varies over short periods due to environmental conditions. Poor channel conditions, including a low received signal or a high bit error rate due to path loss, fading, interference, contention, etc., trigger dynamic rate adaptation and retransmission at the MAC layer with random back-off. Also, in a saturated WLAN, a fluid flow model is not applicable because of the probability-based fairness of channel access across WLAN nodes [15].
As the available bandwidth in a wireless network fluctuates very frequently, we need to estimate it properly for a proper utilization of the available resources. If we use only a small portion of the available bandwidth, the available resources remain underutilized; if we send data at a higher rate than is available, the network becomes congested and packets get lost. In a video streaming application, packet loss can have a large impact on the overall video display, depending on the priority of the lost packet.
5.2 WBest: A bandwidth estimation tool for IEEE 802.11 based wireless networks
WBest is a wireless bandwidth estimation tool designed for fast, non-intrusive, accurate estimation of the available bandwidth in IEEE 802.11 networks [16]. The algorithm used in this tool has two main parts. In the first step, a packet-pair technique estimates the maximum effective capacity of the flow path, irrespective of the traffic along the path. In the second step, a packet-train technique estimates the achievable throughput to infer the available bandwidth, which varies depending on the presence of other traffic. The tool assumes that the last hop is a wireless LAN (WLAN) and that the last hop has the minimum capacity as well as the minimum available bandwidth.
In this algorithm, a client (wbest-receiver) requests probe packets and the probe sender (wbest-sender) sends them to the receiver. During the measurement process, packet-pairs are requested first. Upon receiving the request, the sender sends two packets (a packet-pair) back-to-back, and the dispersion of these two probe packets is measured by the receiver. From the packet dispersion model discussed in section 5.1, it is clear that the packet dispersion rate reflects the effective capacity of the flow path between the sender and the receiver, namely the capacity of the link with the lowest capacity along the flow path. As it is assumed that the last hop is wireless and has the lowest capacity, the dispersion rate of the packet-pair reflects the capacity of the wireless link. Considering that there can be packet loss or delay in the wireless network, 30 such packet-pairs are sent by the sender to calculate the capacity accurately. The dispersion rate of each correctly received packet pair is calculated, and to remove outliers, the median of these dispersion rates is taken as the capacity of the wireless link.
Figure 5.2: Packet forwarding at the last hop wireless link
After calculating the flow path capacity, the receiver requests a packet train at the rate of the flow path capacity (the wireless link capacity). A packet train refers to a continuous flow of packets at a constant rate. The sender sends 30 packets (a packet train of length 30) at the requested rate. From these 30 packets, 29 dispersion rates are calculated, and the mean of these rates is taken as the available train rate. The forwarding of probing traffic at the last-hop wireless link is shown in figure 5.2 [16]. The available bandwidth is calculated as follows.
Suppose the capacity of the link is C and the load in the network (cross traffic) is L. Then the available bandwidth A is

A = C - L,  or  L = C - A          (5.2)

When the packet train is sent at rate C, the total incoming traffic rate at the wireless access point (AP) is C + L. The AP can only forward data at its maximum capacity C, and some share of the AP capacity is taken by the other traffic, say L'. Assuming that downstream AP traffic is processed as a FIFO queue, the downstream probing traffic obtains the same share of the total traffic after the AP queue as before it. So we can say

C' / (C' + L') = C / (C + L)          (5.3)

where C' is the share that the probe traffic gets. Now, C' + L' = C, so equation 5.3 can be rewritten as

C' / C = C / (C + L)          (5.4)

Replacing L using equation 5.2, we can rewrite the equation as

C' / C = C / (C + (C - A))
or, C^2 / C' = 2C - A
or, A = 2C - C^2 / C'
or, A = C (2 - C / C')          (5.5)

The capacity C of the wireless link is calculated using the packet-pair technique, and the probe share (average train rate) C' is calculated using the packet train. The available bandwidth is then calculated using equation 5.5.
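As a sketch of this computation (not the WBest source code), the two steps combine as follows; all rates are in the same unit (e.g., bit/s) and the helper names are hypothetical:

    #include <stdlib.h>

    static int cmp_double(const void *a, const void *b)
    {
        double x = *(const double *)a, y = *(const double *)b;
        return (x > y) - (x < y);
    }

    /* Median dispersion rate of the correctly received packet pairs. */
    static double capacity_from_pairs(double *rates, int n)
    {
        qsort(rates, n, sizeof(double), cmp_double);
        return n % 2 ? rates[n / 2]
                     : (rates[n / 2 - 1] + rates[n / 2]) / 2.0;
    }

    /* Mean of the 29 dispersion rates of a 30-packet train sent at rate C. */
    static double train_rate(const double *rates, int n)
    {
        double sum = 0.0;
        for (int i = 0; i < n; i++)
            sum += rates[i];
        return sum / n;
    }

    /* Equation 5.5: A = C * (2 - C / C'). */
    static double available_bandwidth(double C, double C_prime)
    {
        return C * (2.0 - C / C_prime);
    }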
5.2.1 Limitations of WBest
A few assumptions behind the WBest measurement do not hold during a video streaming session: (i) WBest measures the available bandwidth for a device (connected to a WLAN) that is in an idle state, i.e., no packets are entering or leaving the device during the measurement process; (ii) all probe packets used in a WBest measurement have a fixed size (1460 bytes).
As the bandwidth estimation is based on probe packet dispersion, it increases the traffic on the last hop (the wireless link). If these probe packets create congestion, even for a short period of time, the video traffic will be affected for that period. The congestion will also affect the probing traffic itself, which leads to an inaccurate bandwidth estimate. So the requirement is to measure the available bandwidth accurately while introducing a minimum amount of probe traffic and without affecting the video traffic.
Equal-sized packets will not estimate the available bandwidth accurately. As packet dispersion depends on packet size, equal-sized probe packets lead to an improper bandwidth estimate [6]. Small packets have a lower dispersion rate and large packets have a higher dispersion rate (see figure 7.2). The problem is that if the probe packets are larger than the data packets, the estimated bandwidth is higher than the bandwidth achievable with the smaller data packet size. Similarly, if the probe packets are smaller than the data packets, the estimated bandwidth is smaller than the achievable bandwidth.
5.3 EStream : A new bandwidth estimation tool for
WLAN
With the necessity of estimating the available bandwidth in a WLAN during a streaming session, a new bandwidth estimation tool for WLAN was created. The tool is called EStream, as it continuously estimates the available bandwidth in the WLAN throughout a streaming session (Estimation of available bandwidth in WLAN during Streaming). The EStream algorithm is based on the technique used by the WBest tool for measuring the available bandwidth in a WLAN. Though WBest measures the available bandwidth more accurately than other bandwidth estimation tools, it is not suitable for estimating bandwidth during a streaming session. The EStream implementation makes the required modifications while keeping the basic algorithm the same as in WBest.
In a video streaming session (RTSP/RTP streaming), packets of unequal size are prevalent. During video encoding, some video frames are designated as key frames; after encoding, key frames contain more data than non-key frames. Scalable video coding additionally uses a hierarchical prediction structure, which also leads to unequal frame sizes. So not all encoded frames contain the same amount of data. For RTP packets, each frame has a separate time-stamp which helps the decoder to arrange the frames and play them at the right time. The IP packet payload is limited by the maximum segment size (MSS). If a frame (after encoding) is larger than the MSS, it is sent using multiple packets. Suppose the MSS (excluding headers) for a LAN is 1460 bytes and two consecutive frames have sizes of 1600 bytes and 600 bytes. Clearly the first frame needs to be split into two packets: the first packet contains 1460 bytes and the remaining 140 bytes are sent in a second packet. In principle, these 140 bytes could be merged with the second frame and sent together; but as they have separate time-stamps, they cannot be merged. There is no fixed size for key and non-key frames after encoding; rather, it depends entirely on the encoding algorithm and the content of the video. So the video data packet size is variable.
To reduce the amount of probe traffic while estimating the available bandwidth accurately, EStream uses a customized estimation technique which avoids congestion effects on the video stream. During the measurement period, EStream inserts a few probe packets into the stream and treats all packets (probe and data) as probe packets, which decreases the number of separate probe packets needed. It also chooses the probe packet size adaptively (according to the data packet size) to infer the proper available bandwidth.
Figure 5.3: How video and probe packets are mixed during the packet-pair and packet-train techniques, as compared to a normal video traffic pattern
The EStream implementation is integrated with the software modules: the probe-sender of the server-module is the sending side of EStream, and the WLAN b/w monitor of the client-module is the receiving side. During the probing period, the probe-sender mixes the probe packets with the data packets. WBest inserts 60 packets (30 pairs) into the network during its packet-pair technique, each probe packet with the same size of 1460 bytes. EStream, after receiving a request, instead inserts a probe packet immediately after an actual data packet in the stream; 30 such probe packets are inserted after 30 data packets. Each probe packet has the same size as the data packet it follows. The general video data pattern is shown in figure 5.3a, and the modified stream with packet-pair probes in figure 5.3b. At the receiver, if both packets of a pair are received correctly, they are treated as a packet-pair and the dispersion rate for this pair is calculated. After that, the data packet is forwarded to the client video player. The two packets belonging to the same pair are identified by the sequence number, as the probe packet uses the same sequence number as the data packet. This process clearly reduces the amount of probe traffic, as only 30 extra packets are sent and not all of them have the maximum size.
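The receiver-side handling of a mixed packet-pair can be sketched as follows; this is a simplified illustration, and the type and function names are not taken from the actual client-module. The pair is matched on the sequence number, the dispersion rate is computed from the arrival times, and exactly one copy of the (duplicated) payload is forwarded to the player:

struct rx_packet {
    unsigned short seq;  /* RTP sequence number */
    double arrival;      /* arrival time in seconds */
    int size;            /* payload size in bytes */
};

/* Hypothetical hand-off of a video packet to the player. */
void forward_to_player(const struct rx_packet *p);

/* Returns the dispersion rate (bits per second) of a matched pair, or -1
 * if the two packets do not form a pair. Exactly one copy of the video
 * data is delivered; the duplicate is discarded after the measurement. */
double handle_pair(const struct rx_packet *first, const struct rx_packet *second)
{
    if (first->seq != second->seq)
        return -1.0;              /* not a pair: handle packets separately */
    forward_to_player(first);     /* deliver one copy of the video data */
    return (second->size * 8.0) / (second->arrival - first->arrival);
}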
The packet train technique differs considerably between WBest and EStream. As a train, WBest sends 30 packets (at the train sending rate specified in the request message), each probe packet with the same size of 1460 bytes. EStream does not send 30 consecutive probe packets, as consecutive packets would create congestion and the video data (as well as the probe traffic) would be affected. As a result, it would neither estimate the available bandwidth accurately nor deliver the video data properly. So EStream sends multiple small packet trains to avoid the congestion (figure 5.3c). In fact, it sends two packets as a packet train (train length 2). They are not sent back-to-back; rather, they are separated by a gap depending on the train sending rate. The first packet of the train is the video packet and the second packet is the probe. After receiving the two packets, the dispersion rate is calculated. To obtain 30 dispersion rates (like WBest), 30 such packet trains are created. The average of all these rates is taken as the available train rate. Then the available bandwidth is calculated using equation 5.5.
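On the sender side, the pacing of a length-2 sparse train follows directly from the requested rate: the probe can be scheduled one packet transmission time after the data packet. A minimal sketch under these assumptions (send_packet is a hypothetical send routine; using usleep for the gap is a simplification):

#include <unistd.h>

/* Hypothetical routine that transmits one UDP packet. */
void send_packet(const void *pkt, int size_bytes);

/* Send one sparse train of length 2: the video packet itself, then an
 * equal-sized probe after the gap dictated by the requested rate. */
void send_sparse_train(const void *pkt, int size_bytes, double rate_bps)
{
    double gap_s = (size_bytes * 8.0) / rate_bps;  /* one packet time at rate */
    send_packet(pkt, size_bytes);                  /* the video data packet */
    usleep((useconds_t)(gap_s * 1e6));             /* wait the inter-packet gap */
    send_packet(pkt, size_bytes);                  /* duplicate sent as probe */
}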
5.3.1 Content of probe packets
The content selection of the probe packets plays a crucial role in the EStream technique. The purpose of a probe packet is solely to measure the dispersion rate of the packets, which does not depend on its content. As the probe packets are inserted into the stream, they may create congestion in the network (for a small period of time). This can lead to packet loss, which would not harm the bandwidth estimation but rather make it reflect the actual situation. A loss of video data, however, would affect the video quality. To retain the video quality, the probability of delivering the video data is doubled by duplicating the content of the data packet in the following probe packet. If either of the two packets is received correctly, the video quality remains the same. If both are received correctly, only one of them is forwarded to the video player, and the second packet is discarded by the client-module after calculating the dispersion rate.
5.3.2 Continuous bandwidth estimation
As the available bandwidth in a wireless network can vary frequently, it needs to be estimated continuously during a streaming session. In general, EStream measures the available bandwidth by requesting packet-pairs and packet-trains every 10 seconds. The capacity estimate of the WLAN (using packet-pairs) can vary a bit between successive measurements, so a moving average of the capacity estimates of all measurements (up to that point) is taken as the capacity of the link. If the available bandwidth drops immediately after one measurement, the low bandwidth availability would only be identified after 10 seconds. EStream uses a quick response scheme to tackle this: during the 10-second period between two successive estimations, it monitors the stream (the sequence numbers of the packets) to identify packet loss. If packet loss is identified, it immediately requests a packet-train to measure the available bandwidth, taking the capacity from the last measurement as the capacity of the link. If the bandwidth has really dropped, this is reflected immediately; if it is a false alarm (the packet was lost for some other reason), the bandwidth estimate will reflect that as well.
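Between two periodic measurements, the quick-response check amounts to watching for gaps in the RTP sequence numbers of the stream. A sketch of the idea (request_packet_sparse is a hypothetical trigger for the sparse packet-train request):

/* Hypothetical trigger that sends a PACKET-SPARSE request (see chapter 6). */
void request_packet_sparse(void);

/* Called for every received data packet between two periodic measurements;
 * a gap in the sequence numbers indicates packet loss and immediately
 * triggers a new estimation of the available bandwidth. Unsigned 16-bit
 * arithmetic handles sequence number wrap-around. */
void check_sequence(unsigned short seq)
{
    static int have_last = 0;
    static unsigned short last_seq;

    if (have_last && (unsigned short)(seq - last_seq) > 1)
        request_packet_sparse();  /* possible loss: re-estimate right away */

    last_seq = seq;
    have_last = 1;
}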
5.4 Cost comparison between EStream and WBest
due to intruding traffic
Suppose a video stream requires 1 Mbps bandwidth and the bandwidth is measured every 10 seconds. In 10 seconds, the total amount of video traffic (VT) is

VT = 10 × 1 × 10^6 bits

In the case of WBest, each measurement cycle requires a total of 90 probe packets (30 packet-pairs and a packet-train of length 30) to estimate the bandwidth. As WBest uses probe packets of size 1460 bytes, the amount of probe traffic (PT) is

PT = 1460 × 8 × 90 bits

Therefore, the traffic increase (TI) is

TI = PT / VT = (1460 × 8 × 90) / (10 × 10^6) × 100% = 10.512%    (5.6)

In the case of EStream, each measurement cycle requires a total of 60 probe packets (30 probe packets for the 30 packet-pairs and another 30 probe packets for the 30 sparse packet trains of length 2). With an average packet size of 1000 bytes for the test case, the amount of probe traffic is

PT = 1000 × 8 × 60 bits

so the traffic increase is

TI = PT / VT = (1000 × 8 × 60) / (10 × 10^6) × 100% = 4.8%    (5.7)

In the worst case, if all data packets have the maximum size, the traffic increase due to the EStream measurement is still only

TI = PT / VT = (1460 × 8 × 60) / (10 × 10^6) × 100% = 7.008%    (5.8)
5.5 Bandwidth estimation for UMTS network
Most of the time, the maximum bandwidth available in a UMTS network is fixed by the contract between the operator and the user. The UMTS network provides dynamic bandwidth to the users depending on the traffic load, but the operator assures that the user can get up to a certain amount of bandwidth (depending on the contract). In other words, bandwidth up to this maximum limit is available to the user. Considering this fact (and also the lack of a bandwidth estimation tool for UMTS), we have not measured the available bandwidth for the UMTS interface. In a couple of experiments, we received video streams entirely over the UMTS network; as long as the required data rate of the stream was within the data rate assured by the UMTS network (2 Mbps), we were able to receive the video.
As the device is now capable of monitoring the bandwidth availability, the decision about the video flow distribution is made by the client-module, which then informs the server-module of the decision. In the next chapter, we will discuss the collaboration between the two software modules.
Chapter 6
Signaling
Signaling is the most important aspect of the proposed system. The two software modules, server-module and client-module, are added as software extensions at the server and the client respectively. These modules make the decisions needed to fulfill the goals, i.e. when to send probe packets to estimate the bandwidth availability, which codec layers need to be forwarded, how codec layers are distributed among the multiple interfaces of the client, etc. The server-module cannot make any decision on its own; rather, it acts on the commands of the client-module. As a server can serve a number of clients simultaneously, the decision making is placed in the client-module to reduce complexity at the server. To perform each action, the modules signal each other by exchanging a set of UDP messages. This chapter defines the UDP messages that are used for signaling and describes when and how these messages are exchanged to reach the desired goals.
As in any other multimedia streaming over UDP, network address translation (NAT) poses a challenge to the communication. Multimedia service providers often deploy a specialized STUN server to solve this problem. In this work, our signaling technique takes care of the issue without deploying a separate STUN server. This chapter first describes how NAT creates a problem in an RTSP video-on-demand system and how it can be solved. Then the format of the UDP control messages is defined in section 6.2; the different control messages and their purposes are also described in that section. How packets are intercepted and handled by the software modules is described in section 6.3. Finally, the collaboration between the two modules to complete a streaming session, while fulfilling the requirements, is described in section 6.4.
6.1 Solving NAT issue
An RTSP video streaming session between the streaming server and a client should proceed without any problem if the client has good network connectivity. But often a client situated behind a NAT is not able to receive a single video data packet of a streaming session, even though it has good network connectivity. Before discussing why a client behind a NAT cannot receive the RTSP streaming video data, let us first discuss what NAT is and why a client behind a NAT can receive most other types of packets.
Due to the lack of IPv4 address space, not all network elements can have a globally unique IP address. Network administrators often use Network Address Translation (NAT) to provide Internet access to a large number of users with a single publicly addressable IP address. There is a set of private IP addresses which can be reused in different local area networks. Behind a NAT, a user is assigned a private IP address (unique within a local network).
Figure 6.1: Network Address Translation in a gateway - how a packet's header is changed according to the address mapping maintained at the gateway
When the user tries to access remote content (outside of the local network), the source IP of the packet is changed at the gateway of the network. The address translation is shown in figure 6.1. An outgoing packet with source IP and port pair (X.X.X.X:A) is changed to (Y.Y.Y.Y:B) at the gateway. When the packet reaches its destination, the reply is sent to (Y.Y.Y.Y:B). At the gateway, the NAT maintains the mapping (Y.Y.Y.Y:B -> X.X.X.X:A). So it changes the destination of the packet from (Y.Y.Y.Y:B) to (X.X.X.X:A), forwards it to the local network, and the packet reaches its intended destination. There are different kinds of NAT implementations which impose different restrictions on the mapping of incoming packet addresses. But in all kinds of NAT implementations, the sender is always able to send a message to the recipient if the recipient itself initiated the communication.
Like other types of communication, a client behind a NAT should be able to receive the data of a streaming session. But in most cases it cannot receive any video data. To find the reason and a possible solution, let us first describe how an RTSP streaming session works.
(1) A streaming server listens on a particular port (RTSP on 554) for serving requests from clients.
(2) A client requests a video from the streaming server on port 554 using RTSP/TCP. The client uses an arbitrary free port (say 11224) for this communication.
(3) The server sends its reply back to port 11224. The reply contains the number of tracks (and also other information) available in the scalable video.
(4) For each track in the video, a separate flow (RTP session) is created between the server and the client. As mentioned earlier, multiple codec layers of a scalable video can be streamed in multiple flows. The client informs the server of the receiving port numbers for a flow. Suppose the client says, "Send me track-1 on ports 32154-32155". The even port is used for RTP packets and the odd port for RTCP packets. For each track, the client informs the server of a pair of such receiving ports.
(5) In reply, the server says, "Ok, I am sending track-1 from ports 6970-6971 to ports 32154-32155". These messages are exchanged using RTSP/TCP as plain text in the TCP message body. So far no messages have been exchanged between ports 6970-6971 and 32154-32155 in either direction.
(6) When the client says "play", the server starts sending data using RTP/UDP to the agreed ports. So the data of track-1 is sent in packets with source and destination ports 6970 and 32154 respectively.
(7) In the same way, all the data packets are sent to the agreed ports of the client. After sending all the data packets, the session is terminated using a set of RTSP/TCP messages.
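For illustration, steps (4) and (5) correspond to an RTSP SETUP exchange of roughly the following form (abbreviated; the URL and session identifier are made up, while the port numbers are those of the example above):

SETUP rtsp://streaming-server/video.mp4/track1 RTSP/1.0
CSeq: 3
Transport: RTP/AVP;unicast;client_port=32154-32155

RTSP/1.0 200 OK
CSeq: 3
Session: 12345678
Transport: RTP/AVP;unicast;client_port=32154-32155;server_port=6970-6971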
Now let us examine why a device behind a NAT cannot receive the data of a streaming session. In this section, we also provide the solution direction that is used in the signaling mechanism to tackle the issue.
(1) The communication in steps (1-4) of the previous paragraph remains the same even if the client is behind a NAT. In connection with the previous discussion, when the client sends a request from port 11224 to port 554 of the streaming server, the source port (and source IP) of this request message is changed to some other port number by the NAT. Suppose the source port number is changed from 11224 to 23234. Note that the source IP is also changed from the private IP (X) to the public IP (Y).
(2) The streaming server only sees that a request has come from the IP address Y and port number 23234. So it sends its reply back to Y on port number 23234. When this message arrives at the NAT, it knows that destination port 23234 means that the message is destined for port 11224 on client X. So it changes the destination IP and port number and forwards the message to the client.
(3) The client is unaware of the presence of the NAT and continues to communicate with the server. Considering the previous situation, the client agrees with the server that track-1 will be sent from ports 6970-6971 to ports 32154-32155. As this agreement is made using RTSP/TCP in the message body (as plain text), the NAT is not aware of the agreed ports.
(4) When the client says "play", the server sends the data of track-1 to port 32154. As no messages have been sent from port 32154, the NAT has no mapping for these packets. So when a packet with destination port 32154 arrives at the NAT, it does not know where to forward the data, and ultimately the data does not reach its destination.
(5) To solve this problem, a dummy message (control message) can be sent from the client to the server. When the server replies, "Ok, I am sending track-1 from ports 6970-6971 to ports 32154-32155", two dummy messages are sent from ports 32154 and 32155 of the client to ports 6970 and 6971 of the server respectively. At the NAT, the source ports of these control messages are changed to some other ports (and the source IP from X to Y), say 42426 for the RTP port. When the server receives these control messages, it notes down the port numbers for track-1. Now the RTP data packets for track-1 are sent from port 6970 to port 42426 (instead of 32154). The NAT knows that destination port 42426 means client X on port 32154 and changes the destination address accordingly. So the data finally reaches its destination.
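The essence of step (5) is that the client must originate one datagram from each agreed media port so that the NAT installs a mapping for it. A minimal sketch in C (the port numbers are those of the running example; in the actual system the control message is sent by the client-module, and this fragment only illustrates the NAT-mapping principle):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Send one dummy datagram from local_port (e.g. 32154) to the server's
 * announced port (e.g. 6970), so that the NAT creates a mapping that the
 * server's RTP packets can later traverse in the reverse direction. */
int punch_hole(const char *server_ip, int server_port, int local_port)
{
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    struct sockaddr_in local = {0}, remote = {0};

    local.sin_family = AF_INET;
    local.sin_addr.s_addr = htonl(INADDR_ANY);
    local.sin_port = htons(local_port);
    bind(s, (struct sockaddr *)&local, sizeof(local));  /* fix the source port */

    remote.sin_family = AF_INET;
    remote.sin_port = htons(server_port);
    inet_pton(AF_INET, server_ip, &remote.sin_addr);

    return (int)sendto(s, "punch", 5, 0,
                       (struct sockaddr *)&remote, sizeof(remote));
}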
(6) These control messages are not exchanged between the client and the server themselves; rather, the client-module sends a specific type of control message to the server-module to tackle the NAT issue. A scalable video can have multiple tracks (or flows), and according to step 5, two control messages would need to be sent for each track, and the server-module would need to maintain the forwarding ports for each track. Instead of sending multiple control messages, the client-module sends only two control messages - one for RTP packets and another for RTCP packets. The server-module maintains the IP address and the RTP and RTCP ports in a structure shown in figure 6.2 (the other fields of the structure are described later). It forwards all RTP packets of a streaming session to the same RTP port. Each flow of a streaming session can be identified by its synchronization source identifier (ssrc). The client-module maintains the ssrc of a flow along with the RTP and RTCP source and destination port numbers (agreed upon during connection setup) in a structure shown in figure 6.3 (the other fields of the structure are described later). Upon receiving a packet, it determines the track number of the packet and forwards the packet to the video player after adjusting the port numbers. As there is no other way of distinguishing RTP and RTCP packets, they are sent to separate ports.
Figure 6.2: Structure used by the server-module to store information about a registered interface of a client - a separate structure is used for each interface
Figure 6.3: Structure used by the client-module to store information about each streaming flow
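Based on the fields named above and in section 6.2, the two bookkeeping structures can be pictured roughly as follows; the field names and widths are illustrative and not taken from the actual implementation:

#include <stdint.h>
#include <netinet/in.h>

/* Kept by the server-module per registered client interface (cf. figure 6.2). */
struct reg_interface {
    int32_t reg_id;        /* registration-id assigned on CTRL-RTP */
    struct in_addr ip;     /* registered IP address of the interface */
    uint16_t rtp_port;     /* all RTP packets of this client go to this port */
    uint16_t rtcp_port;    /* all RTCP packets go to this port */
    int32_t probe_count;   /* pending probe packets to mix into the stream */
    int32_t probe_value;   /* requested train sending rate (see section 6.2) */
};

/* Kept by the client-module per video flow (cf. figure 6.3). */
struct flow_entry {
    uint32_t ssrc;                /* identifies the flow */
    uint16_t rtp_src, rtp_dst;    /* agreed RTP source/destination ports */
    uint16_t rtcp_src, rtcp_dst;  /* agreed RTCP source/destination ports */
    int16_t op1, op2;             /* flow distribution policy (see section 6.2) */
};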
6.2 Control messages
As mentioned earlier, the client-module and the server-module collaborate by exchanging various control messages. The control messages are exchanged at different points in time, i.e. before starting a streaming session, in the initial phase of the streaming session, during the streaming session, in the closing phase of the streaming session, and after the streaming session has completed. A separate type of control message is used to accomplish each specific task.
Figure 6.4: Control message format - general format of the UDP messages used by the
new signaling technique
The general format of a control message is shown in figure 6.4. A control message has two parts - a message header and a message body. The message header has a fixed size of 8 bytes. The message body is optional; not every message contains one. The length of the message body can vary from 0 to 1016 bytes, so a control message can have a maximum size of 1024 bytes. The message header contains two fields - type and tab. The type field determines the purpose of the message and thereby the content of the message body and the action of the recipient after receiving the message. The value of the tab field depends on the message type. Table 6.1 summarizes the list of control messages and their key purposes.

Message type       Purpose of the message
CTRL-RTP           Register the IP address of an interface and the RTP port number
CTRL-RTCP          Register the RTCP port number
CTRL-SELECT        Distribute flows among multiple interfaces
CTRL-REMOVE        Remove a flow distribution entry
CTRL-UNREGISTER    Unregister an interface
PACKET-PAIR        Request packet-pairs
PACKET-TRAIN       Request a single long packet train
PACKET-SPARSE      Request multiple small packet trains
Table 6.1: List of control messages and their purpose
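For illustration, the 8-byte header can be thought of as two fields. The thesis fixes only the total header size and the two field names, so the field widths below (two 32-bit signed integers) and the struct layout are assumptions:

#include <stdint.h>

#define CTRL_MAX_BODY 1016   /* body length varies from 0 to 1016 bytes */

/* Illustrative control message layout (widths assumed): an 8-byte header
 * (type and tab) followed by an optional body. */
struct ctrl_msg {
    int32_t type;                 /* message type, e.g. CTRL-RTP, PACKET-PAIR */
    int32_t tab;                  /* meaning depends on the type field */
    uint8_t body[CTRL_MAX_BODY];  /* actual length taken from the UDP datagram */
};

The following is a detailed description of the control messages.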
CTRL-RTP: If a client intends to receive the video data of a streaming session using two interfaces, the server-module needs to be informed of this fact as well as of the IP addresses of all interfaces. This type of message is used to register all available interfaces (e.g. WiFi, UMTS, etc.) of a device with the server-module. As discussed earlier, to tackle the NAT issue, the communication needs to be initiated from the client side; this type of control message also solves that problem. Upon receiving a CTRL-RTP message, the server-module makes a database entry for the sending IP address and port number. After successful registration, the server-module replies with a unique registration-id. Any future communication from the client-module related to that interface must specify this registration-id. When the client sends this type of message, the tab field has no significance and is set to a negative value; the server uses the tab field to convey the registration-id. This type of message does not have a message body.
CTRL-RTCP: This type of message has the same purpose as the CTRL-RTP message. The RTP protocol is coupled with the RTCP protocol to control and monitor an RTP session. Each RTP session is communicated between a pair of ports on both the server and the client side: RTP packets are communicated between the even ports, and the immediately higher odd ports are used for RTCP communication. Except for the port number, there is no other way to distinguish between an RTP and an RTCP packet, so RTP and RTCP messages must be sent to separate ports. As a result, a separate CTRL-RTCP message is sent to the server-module to inform it of the RTCP port; the RTCP packets are then sent to this port by the server-module. The tab field specifies the registration-id of the interface, known from the reply to the CTRL-RTP message. This type of message also does not contain a message body.
CTRL-SELECT: This type of message is used to convey the codec layer distribution policy (among the multiple interfaces of the client) for a streaming session. Multiple mini structures (shown in figure 6.5) form the message body. The tab field of the control message header signifies the number of mini structures (the number of tracks available in the video) contained in the message body. Each mini structure contains the (flow distribution) policy information about one track of the video.
Figure 6.5: Mini structure used to inform and store the flow distribution policy
The ssrc (synchronization source identifier) value identifies the track to which the information relates. The registration-ids of the two interfaces of the client (assuming only two interfaces are available to a user) are conveyed by the id1 and id2 values. As mentioned earlier, a track can contain multiple codec layers. If the codec layers belonging to a particular track are distributed among the two interfaces, the distribution is done according to the op values. As the codec layers can be identified by their operating-point value, the op1 and op2 values contain the maximum permissible operating-point value to be received by the client interfaces denoted by id1 and id2 respectively. The three scalable identifiers (DID, QID and TID) are combined to form a single operating point value (figure 6.6). If one interface of the client can receive all operating points of the track, the corresponding op field is specified as ALL-OP and the other one as NO-OP. These values can also be used to switch one or multiple flows on or off: if both op fields contain the value NO-OP, all packets belonging to the flow are discarded.
Figure 6.6: Combined operating point value using the three scalable identifiers
To switch a flow from one interface to another, the op values of the respective interfaces are swapped between ALL-OP and NO-OP.
Figure 6.7: Pseudo code to decide a packet's fate at the server-module
After receiving a CTRL-SELECT message (for the first time), the server-module makes a database entry for each mini structure in the message body. Subsequent CTRL-SELECT messages (sent whenever the distribution policy needs to change) update the database entries. The client-module also stores the flow distribution policy, in the structure shown in figure 6.3; its op1 and op2 values are the same as those used in the mini structure of the CTRL-SELECT message body.
The algorithm used by the server-module to decide a packet's fate is described in figure 6.7 as pseudo code. Let us take an example to discuss the role of the op fields in distributing codec layers among multiple interfaces. Suppose a video track contains four codec layers with operating point values {1,0,0}, {1,0,1}, {1,0,2} and {1,0,3}, and all the layers can be received by interface-1 of the client. Then the op1 and op2 fields contain the values ALL-OP and NO-OP respectively. After some time the track needs to be switched from interface-1 to interface-2; now the op1 and op2 fields contain NO-OP and ALL-OP respectively. Again after some time, it is decided that the first layer ({1,0,0}) will be received via the first interface, the second and third layers ({1,0,1}, {1,0,2}) via the second interface, and the fourth layer will be dropped. In this case, the op1 and op2 fields contain 256 and 16630 respectively. These values are calculated by putting the scalable identifiers in binary form into the structure of figure 6.6. Note that ALL-OP is equal to 32767 and NO-OP is equal to -32768.
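To make the example concrete, the mini structure of figure 6.5 and the decision logic of figure 6.7 can be sketched in C as follows. This is a reconstruction from the textual description, not the actual pseudo code; only the ALL-OP and NO-OP constants are taken from the text:

#include <stdint.h>

#define ALL_OP  32767
#define NO_OP  (-32768)

/* Mini structure of the CTRL-SELECT message body (cf. figure 6.5). */
struct flow_policy {
    uint32_t ssrc;  /* identifies the track (flow) */
    int32_t id1;    /* registration-id of the first interface */
    int32_t id2;    /* registration-id of the second interface */
    int16_t op1;    /* highest operating point to send to interface id1 */
    int16_t op2;    /* highest operating point to send to interface id2 */
};

/* Decide a packet's fate at the server-module (cf. figure 6.7): returns the
 * registration-id of the receiving interface, or -1 to discard the packet. */
int packet_fate(const struct flow_policy *p, int16_t op)
{
    if (op <= p->op1)
        return p->id1;  /* operating point selected for interface 1 */
    if (op <= p->op2)
        return p->id2;  /* operating point selected for interface 2 */
    return -1;          /* not selected for any interface: discard */
}

With the values of the example above, a packet with operating point 256 would be forwarded via the first interface, packets with operating points up to 16630 via the second interface, and all higher operating points would be discarded.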
CTRL-REMOVE: When a streaming session is completed, the CTRL-REMOVE message is sent by the client-module to remove the database entries created by the CTRL-SELECT message. The message body lists the ssrc of each video flow of the streaming session. The tab field contains the number of flows associated with the streaming session.
CTRL-UNREGISTER: This type of control message is sent to unregister an interface from the server-module. On receiving it, the server-module removes the corresponding entry from its database. The tab value contains the registration-id of the interface to be unregistered. This type of control message does not have a message body.
Apart from these control messages, which handle address registration, the NAT issue, layer adaptation, etc., there are a few further control messages related to available bandwidth estimation. To estimate the available bandwidth for an interface, the client-module requests probe packets from the server-module. The tab field of the message header specifies the registration-id of the interface for which the available bandwidth is to be estimated. Upon receiving the probe packets, the bandwidth-estimator of the client-module calculates the available bandwidth for that interface. Different kinds of probe messages are requested at different points in time of a streaming session, and for each type of probe request a different type of control message is sent. All probe request messages use a common message body (figure 6.8).
Figure 6.8: Format of the control message body used by the probe request messages
As mentioned earlier, the server-module stores the registered interface information in a structure (shown in figure 6.2). The probe-count and probe-value fields of this structure are filled in when a probe request message is received for that interface. The following is the list of the different probe request messages, their purposes, and the content of the message body.
PACKET-PAIR: As specified in chapter 5, the packet-pair technique is used to estimate the capacity of a flow path. This type of control message is used to request packet-pairs. In the message body, probe-count specifies the number of packet-pairs to be sent. The probe-value field has no significance in a packet-pair request and is set to zero.
PACKET-TRAIN: After estimating the capacity using the packet-pair technique, the packet-train technique is used to estimate the available bandwidth. The probe-count contains the length of the packet train, i.e. the number of packets to be sent as a train. The probe-value signifies the sending rate of the packet train, which is the capacity of the flow path calculated using the packet-pair technique.
PACKET-SPARSE: A customized bandwidth estimation technique, the sparse packet-train, is used in this work to estimate the available bandwidth during a streaming session: a long packet-train is divided into multiple small packet trains of length 2 (see chapter 5 for details). This type of control message is used to request sparse packet-trains. The probe-count contains the number of trains to be sent and the probe-value the sending rate (the capacity of the flow path) of the packet trains.
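As an example, a request for 30 sparse trains at an estimated flow-path capacity of 4 Mbps could be assembled as follows, reusing the illustrative 8-byte header layout sketched above (the field widths are assumptions, and byte-order handling is omitted for brevity):

#include <stdint.h>
#include <string.h>

/* Common body of the probe request messages (cf. figure 6.8). */
struct probe_body {
    int32_t probe_count;  /* number of trains (or pairs) requested */
    int32_t probe_value;  /* sending rate in bits per second; 0 for pairs */
};

/* Fill a PACKET-SPARSE request: 30 sparse trains at a capacity of 4 Mbps. */
size_t build_sparse_request(uint8_t *buf, int32_t type_sparse, int32_t reg_id)
{
    int32_t header[2] = { type_sparse, reg_id };  /* type and tab fields */
    struct probe_body body = { 30, 4000000 };

    memcpy(buf, header, sizeof(header));
    memcpy(buf + sizeof(header), &body, sizeof(body));
    return sizeof(header) + sizeof(body);
}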
6.3 Packet handling by software modules
The software extensions (server-module and client-module) process all packets related to a streaming session before passing them on to the next protocol layer. To be precise, all incoming packets are processed after local routing is done, and all outgoing packets are processed before routing is applied. iptables is a user space application program that allows a system administrator to configure the tables provided by the Linux kernel firewall (implemented as different Netfilter modules) [1]. By setting appropriate rules in these tables, the packets of a streaming session are queued in a Netfilter queue. libnetfilter_queue is a user space library which provides functions to process the enqueued packets in user space [19]. The subsequent sections describe how the different packets are processed by each software module after being copied to user space using libnetfilter_queue.
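For illustration, the outgoing RTP packets of the streaming server could be diverted to queue number 1 with a rule such as

iptables -A OUTPUT -p udp --sport 6970:6971 -j NFQUEUE --queue-num 1

(the port range is an assumption), and a minimal libnetfilter_queue consumer looks roughly as follows (error handling and the nfq_bind_pf setup are omitted):

#include <arpa/inet.h>
#include <libnetfilter_queue/libnetfilter_queue.h>
#include <linux/netfilter.h>  /* NF_ACCEPT */
#include <stdint.h>
#include <sys/socket.h>

/* Callback: the packet-interceptor logic would inspect or modify the
 * packet here before issuing a verdict. */
static int cb(struct nfq_q_handle *qh, struct nfgenmsg *nfmsg,
              struct nfq_data *nfa, void *data)
{
    struct nfqnl_msg_packet_hdr *ph = nfq_get_msg_packet_hdr(nfa);
    uint32_t id = ntohl(ph->packet_id);
    return nfq_set_verdict(qh, id, NF_ACCEPT, 0, NULL);
}

int main(void)
{
    char buf[4096];
    struct nfq_handle *h = nfq_open();
    struct nfq_q_handle *qh = nfq_create_queue(h, 1, &cb, NULL);
    nfq_set_mode(qh, NFQNL_COPY_PACKET, 0xffff);  /* copy the whole packet */

    int fd = nfq_fd(h);
    for (;;) {
        int rv = recv(fd, buf, sizeof(buf), 0);
        if (rv >= 0)
            nfq_handle_packet(h, buf, rv);        /* dispatches to cb() */
    }
}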
6.3.1 Inside server-module
Figure 6.9 provides insight into the server-module. All outgoing video data packets from the streaming server, related to any streaming session, are queued in netfilter queue number 1.
Figure 6.9: Packet traversal inside the server-module - how different packets are handled by the different sections of the server-module
They are intercepted (processed) by the packet-interceptor of the server-module. By looking at the packet's synchronization source identifier (ssrc) and operating point, and by consulting the control-center, the destination address of the receiving interface for that packet is decided. The ssrc is used to identify a flow; the control-center maintains a database which provides the destination address of the packets belonging to a particular video flow. The operating point identifies the codec layer of the packet's content, which is used to adapt the scalable video content. If the client has previously requested probe packets to measure the available bandwidth of the receiving interface, i.e. the probe-count for that interface contains a positive value, the packet is queued in the probe-mix buffer (netfilter queue 2), and a probe packet of the same size (and same content) is sent along with the original data packet. After sending each probe packet, the probe-count is reduced by one. If no probe packets are requested, the data packet is forwarded directly to the client. The UDP headers of all packets (probe and data) are changed to the registered address of the receiving interface.
6.3.2 Inside client-module
Figure 6.10 provides insight into the client-module. All packets related to a streaming session are queued and intercepted by the packet-interceptor of the client-module.
Figure 6.10: Packet traversal inside the client-module - how different packets are handled by the different sections of the client-module
Incoming packets from the streaming server are queued in netfilter queue number 1 and outgoing packets in queue number 2. The control-center of the client-module collects the useful information (ssrc, source and destination ports, required bandwidth, etc. of each video flow) from the SDP/RTSP packets. The RTSP packets themselves are forwarded unaltered by the packet-interceptor. Video data packets are received via the different interfaces with changed packet headers, and their source and destination port numbers are adjusted to the expected port numbers. As the listening port of the client application differs from the received packet's destination port (to tackle the NAT issue), the port numbers are adjusted to the listening port by the client-module to maintain transparency between the client application and the server. If the receiving interface of the packet is expecting probe packets to estimate the available bandwidth of the interface, the packet is forwarded to the bandwidth-estimator before the port numbers are adjusted. As the data packets and probe packets are mixed together, the bandwidth-estimator treats all packets as probe packets; it is the job of the bandwidth-estimator to forward only the data packets to the video player. In fact, the bandwidth-estimator only calculates the packet dispersion rate to estimate the available bandwidth; then it forwards the data packet to the upper layer and discards the probe packet. After the data packets' port numbers have been changed, they are queued in the synchronizing buffer before being forwarded to the video player. The dynamic bandwidth allocation mechanism of the UMTS network can cause an initial delay in packet reception; as a result, the codec layers can become unsynchronized. Packets received via the WiFi and UMTS interfaces are queued in queues number 3 and 4 respectively. These queues (buffers) release the packets in synchronized order. Packet synchronization is done using the RTP timestamps of the data packets (see section 4.3).
6.4 Collaboration between the server-module and the
client-module
This section describes how the software modules work together to fulfill the requirements through the timely exchange and handling of the control messages. The server-module runs all the time (as long as the streaming server is running). The client-module (on a particular client device) can run all the time or can be started before a streaming session begins. The data transmission (video data and control messages) during a streaming session between a client and the streaming server is shown in figure 6.11.
Registering multiple interfaces of the client - If a client has multiple interfaces available to receive a video stream, the server-module needs to be informed of the IP addresses of all of them. The client-module registers the IP address and RTP port of each of its interfaces using the CTRL-RTP message. The server-module maintains a database of the registered interfaces of the different clients. After receiving a CTRL-RTP message, the server-module makes a new entry in the database with the source IP address and source port number of the message. It also assigns a registration-id to the entry and replies with this registration-id to the client. Any future communication related to a registered interface refers to it by the registration-id. To register the RTCP port, the CTRL-RTCP message is used; its tab field contains the registration-id of the interface, known from the reply to the CTRL-RTP message.
Initial available bandwidth estimation - Right before a streaming session starts, the available bandwidth of the WiFi interface is measured using a method similar to that of the WBest tool.
Figure 6.11: Message exchange between a server and a client during a streaming session in the presence of the software modules
The packet-pair technique is initiated using the PACKET-PAIR control message. The tab field specifies the registration-id of the interface for which the packet-pairs are requested; the message body specifies the number of packet-pairs, i.e. the probe-count. After estimating the capacity using the packet-pair technique, a PACKET-TRAIN message is used to request a packet-train. The message body specifies the length of the packet-train (probe-count) and the capacity of the flow path (probe-value). From the dispersion of the packet-train and equation 5.5, the available bandwidth of the interface is calculated.
Session setup and parameter gathering for a streaming session - When a streaming session starts, a set of RTSP request/reply messages is first exchanged between the client and the server. The server provides all required information about the video stream at this stage; the text-based SDP protocol is used to describe the stream. The packet-interceptor of the client-module collects all information about the codec layers and flows of the scalable video from this message. After receiving the description, the client and the streaming server set up the flows by mutual agreement (they agree upon the source and destination ports for each flow).
Initial selection of a path for each flow - When all flows are set up, the client-module decides how to distribute the video flows over its interfaces depending on the initial bandwidth estimate. The flow distribution policy is conveyed to the server-module using the CTRL-SELECT message. Multiple mini structures (figure 6.5), containing the policy information for each flow of the stream, form the CTRL-SELECT message body. The server-module stores the flow distribution policy for each flow in a table, called the flow-table.
Video data transfer - When the client sends the play request, the streaming server starts sending the video data. All data packets pass through the server-module, which decides each packet's destination (IP address and port number) by looking into the flow-table and the registered interface database. The flow-table maintains a mapping of registration-ids against each video flow; using this reference, the server-module finds the IP address and port number for the packets of a video flow and changes the packet's header according to the registered address. At the client, the client-module receives the packets via multiple interfaces and adjusts the packet headers accordingly. In this way, the software modules maintain transparency between the client and the streaming server.
Continuous bandwidth estimation and flow redistribution - During the video data reception, the bandwidth-estimator measures the available bandwidth of the WiFi interface periodically, using the customized estimation technique based on multiple small packet-trains (see chapter 5). To request the multiple small packet trains, the PACKET-SPARSE control message is used. If there is a significant change in the available bandwidth (measured using the sparse packet-train technique), the flow distribution policy is modified, and the new flow distribution is conveyed to the server-module using a CTRL-SELECT message.
Session closing - When a streaming session is over, the server-module no longer needs to maintain any information about the flows of that session. So the client-module sends the CTRL-REMOVE message to clear the entries from the flow-table.
Unregistering interfaces - If the client-module will not request any more streaming video from the server, the interfaces can be unregistered by sending the CTRL-UNREGISTER message. After receiving the message, the server-module removes the entry from its database.
In the next chapter, we will discuss the experimentation methodology and evaluate the results.
Chapter 7
Experiments, Results and Evaluation
All system components are implemented for the Linux operating system. The server-module and the client-module are implemented and tested on Ubuntu 10.04 with Linux kernel version 2.6.32. For user space packet handling, the libnetfilter_queue library is used; it requires a kernel that includes the nfnetlink_queue subsystem, i.e. Linux kernel version 2.6.14 or later. The software modules are implemented in C. The signaling between the two modules is realized with socket programming. The Darwin Streaming Server 5.5.5 (DSS) software is used on the streaming server. The videos are encoded using the JSVM (version 9.19.7) software. A customized VLC player, extended by an H.264 scalable video decoder plugin, is used as the video player.
7.1 Measurement setup
The general setup of the experimental testbed is shown in figure 7.1. The Darwin Streaming Server 5.5.5 (DSS) is installed on a Dell desktop computer with an Intel Pentium III
Figure 7.1: General experimental testbed used for all experiments - some components are not used in some experiments
(Coppermine) processor, 996.634 MHz clock speed, 500 MB RAM, and the Ubuntu 10.04 (Kernel 2.6.32-21-generic) operating system. It is connected to the TU Berlin network via a 100 Mbps Ethernet LAN. A Dell laptop (Latitude-D600) is used as an FTP server, which listens on a port to send UDP packets at a constant rate. The UDP packet size can be varied, i.e. a (dummy) client can request fixed-size (1460 bytes of data) or variable-size packets. The FTP server is likewise connected via Ethernet LAN. An Apple Airport base station is used as the wireless access point (WiFi AP in figure 7.1). It is connected via a 10 Mbps Ethernet LAN to the TU Berlin network and provides a wireless link capacity (IEEE 802.11b) of 5.5 Mbps. A Dell laptop (Latitude-D600) is used as a dummy client; it downloads data from the FTP server at a specified rate to create load in the WLAN. A Dell laptop (Studio-1435) with an Intel Core 2 Duo (T6400) processor, 2 GHz clock speed, 3 GB RAM, and the Ubuntu 10.04 (Kernel 2.6.32-24-generic) operating system is used as the streaming video receiver (shown as VLC client in figure 7.1). The VLC player is used to request and receive RTSP video streams from the Darwin streaming server. This client has a built-in Broadcom WiFi card to connect to the WLAN (Apple access point). An external USB UMTS stick is used to access the UMTS network. The service is obtained from the German service provider O2, which provides an assured bit rate of 2 Mbps (according to the contract).
7.2 Measurements and results
The proposed system provides the best results if the proposed tool (EStream) estimates the bandwidth accurately. To verify the accuracy of EStream, it is used to estimate the available bandwidth in a controlled environment where the actual available bandwidth can be deduced from the total induced load in the WLAN. The VLC client and the dummy client are the only computers connected to the access point used in the testbed. To avoid interference from other wireless users, all experiments are conducted inside the lab either after 10pm or before 6am. The Maximum Segment Size (MSS) for an IEEE 802.11 wireless network as well as for a UMTS network is 1500 bytes (including the packet header). So a UDP packet can contain at most 1472 bytes of data (minimum 20 bytes of IP header + fixed 8 bytes of UDP header), whereas a TCP packet can contain at most 1460 bytes of data (minimum 20 bytes IP header + minimum 20 bytes TCP header). As the WBest tool uses 1460 bytes as the maximum data size, we also consider 1460 bytes the maximum data size to allow a comparison (though all of our probe packets and video data are transmitted using UDP).
7.2.1 Evaluation of EStream
EStream estimates bandwidth based on the packet dispersion technique (chapter 5). The packet dispersion rate depends on the packet size. In figure 7.2, the dispersion rate for 20 different packet sizes is shown.
Figure 7.2: Packet dispersion rate with varying packet size (dispersion rate in Mbps against packet size in bytes)
The packet sizes were chosen randomly. For each packet size, 100 packet pairs are sent with a gap of 20 ms between the pairs, and 100 dispersion rates are calculated from them. The median of these 100 rates is taken as the dispersion rate for that packet size. Note that the packet pairs are sent in an idle WLAN, i.e. the probe packets are the only traffic in the WLAN. From figure 7.2, it is clear that small probe packets make the estimation indicate a lower bandwidth, while larger probe packets make it indicate a higher bandwidth.
The capacity of the WLAN is also measured using packets of different sizes. To measure the capacity of the WLAN, a continuous flow of packets is sent from the FTP server to the dummy client. Initially, the packets are sent at a very high rate (say 55 Mbps) and the receiving rate is measured at the receiver (dummy client). Then the sending rate is adjusted (reduced/increased step by step) until the sending and receiving rates are (almost) the same. In our testbed, the measured WLAN capacity is 4.02 Mbps when variable-size packets (each packet size chosen randomly) are used. On the other hand, the WLAN capacity is measured as 5.2 Mbps when only maximum-size packets are used. In our experiments we have seen that if the data packets are smaller and are sent at a rate of 5.2 Mbps, they are not received at that rate by the client; delivery is delayed, and for longer transmissions some of the data packets are lost (due to buffer overflow at the access point). For this reason, EStream chooses the probe packet size adaptively, so that the estimated bandwidth properly matches the stream.
The limitations of WBest and the required modifications for EStream are discussed in chapter 5. To evaluate the accuracy of EStream, different loads are imposed on the WLAN (by the dummy client) to produce different bandwidth availabilities for the client. Here we provide four load scenarios. The loads induced by the dummy client (by requesting traffic from the FTP server at a constant rate) are 0.5 Mbps, 1.25 Mbps, 2.4 Mbps, and 3 Mbps. The WLAN capacity is 4.02 Mbps. The measured load values (at the dummy client) are 0.5 Mbps, 1.24 Mbps, 2.38 Mbps, and 2.99 Mbps respectively for the four load scenarios. So the available bandwidth for the VLC client should be 3.52 Mbps, 2.78 Mbps, 1.64 Mbps, and 1.03 Mbps respectively. These values are shown as the ground truth in figure 7.3. The difference between the requested load rate and the measured load rate at the dummy client is due to the different clock resolutions and computations at the server and client side.
Figure 7.3: WBest vs. EStream - comparison of the bandwidth estimated by the two tools under different loads in the network (available bandwidth in Mbps against load in the WLAN in Mbps, together with the ground truth); the required bandwidth of the videos in each case is less than, but close to, the available bandwidth
During video streaming, the available bandwidth is measured periodically every 10 seconds under each of these four load scenarios. The 95% confidence intervals over 50 measurement samples (for each load scenario) are plotted in figure 7.3. For comparison, the WBest tool is applied to all the given cases in separate measurement series. In these measurements the purpose is to estimate the available bandwidth, so we have used video streams which require less bandwidth than the available bandwidth in each scenario. The streaming video rates used in the four cases are 3.1 Mbps, 2.4 Mbps, 1.45 Mbps and 0.5 Mbps respectively.
The measured bandwidth is shown in figure 7.3. The measurements indicate that the EStream tool estimates the available bandwidth with a maximum underestimation error of 0.3 Mbps, while the WBest tool gives a maximum error of 2.7 Mbps. From figure 7.3, it is clear that EStream estimates the bandwidth more precisely. The reason is that EStream uses an adaptive probe packet size (the same size as the data packets) and mixes the probe packets with the data packets, which reduces the amount of additional traffic in the network. Moreover, the probe packets carry duplicated video data, which adds reliability to the video data transmission.
7.2.2 Evaluation of scalable video adaptation via network aware
utilization of multiple interfaces
As a test video, we use the Paris sequence (available at http://media.xiph.org/video/derf). It has a total of 1065 frames at a rate of 30 frames per second (fps). We encode the video in two spatial layers: the base layer in the Quarter Common Intermediate Format (QCIF) with a frame size of 176x144 pixels, and one enhancement layer in the CIF format with 352x288 pixels. Each of these layers is streamed in a separate flow, where the base and enhancement layer require at most 0.2 and 0.7 Mbps respectively. The described testbed with a WLAN capacity of 4 Mbps is used. The video streaming measurements are conducted for three different setups: (I) receiving the video with the WLAN interface only, without any layer adaptation; (II) receiving the video with the WLAN interface only, but with layer adaptation; and (III) receiving the video with two interfaces simultaneously, with both layer adaptation and switching of streams. During the streaming session, the videos are played by the VLC player. The VLC player also dumps the raw video frames while decoding them, thus allowing for a computation of the objective video quality (PSNR comparison). The PSNR values are calculated for the received video in all three cases, with the original video taken as the reference.
Static content adaptation and distribution for multiple interfaces
To test the video quality gain when content adaptation is used, the streaming video is received without content adaptation and with content adaptation in two separate cases (cases I and II). A constant load of 3.5 Mbps is imposed on the WLAN via the dummy client. The remaining available bandwidth is (4 - 3.5) Mbps = 0.5 Mbps, which should be sufficient to stream the base layer (0.2 Mbps) but not the enhancement layer (0.7 Mbps). In case (I), both video layers are received via WiFi, though the network could transmit only the base layer (required bandwidth 0.2 Mbps) correctly. In case (II), content adaptation is applied and therefore the enhancement layer (required bandwidth 0.7 Mbps) is discarded. A sample frame of the received video of cases (I) and (II) is shown in figures 7.4 and 7.5 respectively. The video quality in case (II) is much better than in case (I). The video quality can be improved further if the enhancement layer is received via the UMTS interface instead of being discarded (case (III)); a sample video frame for case (III) is shown in figure 7.6.
Figure 7.4: A sample frame of the Paris sequence - the video quality is very low, as all the codec layers of the stream are received via the WiFi interface in spite of a low available bandwidth
Figure 7.5: A sample frame of the Paris sequence when the video is received via the WiFi interface only; due to the low available bandwidth, only a subset of the codec layers is received
Figure 7.6: A sample frame of the Paris sequence when the video is received via WiFi and UMTS simultaneously; due to the low available bandwidth in the WLAN, some of the codec layers are received via the UMTS interface
To further illustrate the overall video quality in all three cases, the PSNR comparison of the received videos is shown in figure 7.7. The blue line shows the PSNR comparison between the video received in case I and the original video, whereas the green line shows the comparison for case II. Though both layers are received in case I, the video quality is worse than in case II (where only the QCIF layer is received). The reason is the low bandwidth availability: in case I, the video data is sent at a higher rate than the available bandwidth, so the WLAN becomes congested and some packets are lost. This loss applies to both video layers (QCIF and CIF), but the decoder cannot afford to lose packets from the base layer (QCIF), so the video quality becomes worse in this case. There is some fluctuation at the beginning of the blue line, because a few packets of the
Figure 7.7: PSNR comparison between the received video and the original video in three cases with high load in the WLAN (PSNR in dB against frame number)
base layer are still received properly; then the congestion effect stabilizes and, as a result, the poor video quality stabilizes as well. If we instead receive the entire CIF layer via the UMTS interface (case III), we obtain the best and most stable video quality (red line). In this case, the video quality is the same as that of the offline decoded video. The small drop in the red line around frame number 70, due to some packet loss or delay, is a purely random event and has no visible effect on the perceived video quality.
Synchronization of codec layers
In the previous experiment for case (III), the two video layers are received via the two interfaces and the layers are distributed statically (predefined); no switching of a layer from one interface to another is done. But when layer switching occurs, as in the case of dynamic distribution, the layers can become unsynchronized, and as a result the video quality degrades. If both layers are received via WiFi, the UMTS network remains idle. Due to the dynamic bandwidth adaptation of the UMTS network, when one layer is switched to the UMTS network there is a large gap between the last packet received via WiFi and the first packet received via UMTS. As there is no such large inter-packet gap for the remaining layers, the switched layer becomes unsynchronized with the rest. The inter-packet gap for the switched layer is shown in figure 7.8.
Figure 7.8: Inter-packet delay of a video flow while it is switched between WiFi and UMTS (inter-packet delay in µs against packet number)
Figure 7.9: Inter-packet delay normalized by the use of a low level buffer (inter-packet delay in µs against packet number)
To solve this issue, an additional synchronizing buffer is maintained at the client (described in chapter 4). The inter-packet delay of the switched layer in the presence of the synchronizing buffer is shown in figure 7.9. Note that only the inter-packet delay of the enhancement layer (the layer being switched) is shown. The switching is done at predefined times by the server-module itself, without any signaling from the client-module, at packet numbers 500, 720, 810, 890 and 2000.
To check how the video quality degrades due to unsynchronized layers, the PSNR comparison of the received videos (with and without the synchronization buffer) is shown in figure 7.10. The enhancement layer is switched for the first time from WiFi to UMTS after
Figure 7.10: PSNR comparison of original video and received video with and without
the synchronization buffer; the enhancement layer is switched between WiFi and UMTS
multiple times (PSNR value in dB over frame number)
Note that the switching is based on packet number rather than frame number, because
the frame number of a packet's content is not known to the server-module. From the
dumped content of the streamed video, it turns out that packet number 500 contains
data of frame number 229, and indeed the video quality degrades after frame 230.
Though the enhancement layer frames are received, they arrive after their decoding
deadline (i.e., out of synchronization with the base layer) and are discarded by the
decoder. As the base layer is received on time, the video keeps playing, but with
degraded quality. When the enhancement layer is switched back to WiFi after packet
number 720, the packets (and hence the frames) arrive in time again and the video
quality is restored around frame number 340. The second time the enhancement layer
is switched from WiFi to UMTS, after packet number 810, there is not much delay, so
the video quality is not degraded. The third time the layer is switched from WiFi to
UMTS, there is again a large inter-packet delay; as a result, the layers become
unsynchronized once more and the video quality degrades.
Dynamic content adaptation and distribution for multiple interfaces
To check the dynamic behavior of the proposed system, the load on the WLAN is
imposed after the streaming session has started. Content adaptation for case (II) and
content distribution for case (III) are performed dynamically according to the varying
available bandwidth of the WLAN. The results for the three cases are given in Figure
7.11, which shows the PSNR comparison for the received videos.
Figure 7.11: PSNR comparison between received video and original video in three cases
with constant high load in the WLAN - (i) dynamic distribution of video flows between
WiFi and UMTS, (ii) dynamic content adaptation while receiving the video flows using
only one interface (WiFi), (iii) receiving all video flows using the WiFi interface (PSNR
value in dB over frame number)
From frame number 50 onwards (case (I)) and from frame number 125 onwards (cases
(II) and (III)), a constant WLAN load of 3.5 Mbps is induced by the dummy client. As
in the case of static content adaptation, the available bandwidth is then 0.5 Mbps, so
only the base layer (required bandwidth 0.2 Mbps) can be received properly. Note that
the measurements for each setup are conducted separately and that the WLAN loads
are triggered manually, hence not at the same instants of time. Without adaptation
(setup (I)) the quality drops to 17 dB right after the load is induced (frame number 50)
and stays below 20 dB for the rest of the measurement (with the exception of some
occasional peaks). For setup (II) the quality drops from frame 125 onwards to at most
20 dB and slightly improves from frame number 170 onwards (by roughly 1.25 dB) while
staying stable. When using multiple interfaces (setup (III)) the quality similarly drops
to 21.25 dB but is completely restored within 50 frames (1.7 seconds) at frame number
170, achieved by the switch of the enhancement layer from WLAN to UMTS.
To verify that our system also reacts to newly released bandwidth, we conduct a second
measurement series for setups (I), (II) and (III), in which we induce a load of 3.5 Mbps
(as in the previous measurements) shortly after starting the experiment but remove it
again in the last third of the experiment time. Figures 7.12, 7.13 and 7.14 illustrate the
PSNR comparisons of cases I, II and III, respectively.
Figure 7.12: PSNR comparison between received video and original video when all video
flows are received using the WiFi interface without any adaptation (PSNR value in dB
over frame number; the load duration is marked)
For usage of one interface only (setup (I)), the WLAN load is induced at frame number
135 and removed at frame 720. The video quality drops from frame number 130 onwards
and is accordingly restored after frame number 720.
Figure 7.13: PSNR comparison between received video and original video when video
flows are received using the WiFi interface only, with content adaptation (PSNR value
in dB over frame number; the load duration is marked)
For usage of the WLAN interface with content adaptation (setup (II)), the WLAN load
is active between frame numbers 300 and 800. During that time the quality drops to 22
dB (and remains stable), while it is restored to 35 dB once the load is taken off. The
drift between frame numbers 300 and 420 occurs because some frames of the CIF layer
are received during this period (and some are not); after some time the system detects
the lower available bandwidth and decides not to receive the CIF layer any longer, so
the video quality stabilizes. When utilizing two interfaces (setup (III)), the WLAN load
is induced between frames 50 and 640 (see Figure 7.14). The video quality fluctuates
between frame numbers 50 and 90 due to the lower available bandwidth in the WLAN;
then the system detects the lower bandwidth, switches the CIF layer to the UMTS
network, and the video quality is restored. The later switch of the enhancement layer
back from UMTS to WLAN does not result in any change of video quality.
Our measurements indicate that our system detects the loss of available bandwidth
and triggers the suitable adaptation within 1.5 seconds on average (2 seconds at most).
Switching off a layer or re-routing it to a different interface does not result in a quality
degradation, and the involved delay is mainly due to the bandwidth estimation process:
it always takes some time to estimate the available bandwidth reliably without
Figure 7.14: PSNR comparison between received video and original video when all video
flows are received using the WiFi and UMTS interfaces (PSNR value in dB over frame
number; the load duration is marked)
affecting the streaming video. From the plots in Figures 7.11 and 7.14 it is clear that
when the available bandwidth of one interface is not sufficient, using two interfaces
simultaneously yields a better video quality. Likewise, when the interfaces together
cannot provide the required bandwidth, flows can be switched off to obtain a better
video quality. The system also detects when sufficient bandwidth becomes available
again, and accordingly switches enhancement layers back on or re-routes them back to
the WLAN interface, so that the quality provided by the coding is received.
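The adaptation behavior observed in these measurements can be summarized by a
simple periodic decision rule. The following Python sketch illustrates it; the function
and constant names, the enhancement-layer rate and the fall-back behavior are
assumptions made for illustration, while the base-layer requirement of 0.2 Mbps is the
value from the experiments:

    # Simplified sketch of the periodic client-side decision (names, the CIF
    # rate and the fall-back behaviour are assumptions; 0.2 Mbps is the
    # measured base-layer requirement).  Returns the interface assignment
    # per (base layer, enhancement layer).
    BASE_RATE = 0.2e6   # QCIF base layer, bit/s
    ENH_RATE = 0.3e6    # CIF enhancement layer, bit/s (assumed)

    def decide(est_wlan_bps, umts_available):
        if est_wlan_bps >= BASE_RATE + ENH_RATE:
            return ("wlan", "wlan")          # WLAN can carry both layers
        if est_wlan_bps >= BASE_RATE:
            if umts_available:
                return ("wlan", "umts")      # re-route the enhancement layer (case III)
            return ("wlan", None)            # drop the enhancement layer (case II)
        # not even the base layer fits into the WLAN: fall back to UMTS entirely
        return ("umts", "umts") if umts_available else ("wlan", None)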
Chapter 8
Future Work
There are several areas in which the system can be improved in order to better meet
the goal of this thesis in all real-life scenarios.
Improvement of system response time - The response time of the system depends
mainly on the bandwidth estimation process and the signaling procedure. The
time taken by the signaling procedure is more or less fixed. As the system takes
1.2 seconds on average to detect a drop in available bandwidth, this can cause
fluctuations in video quality in a highly fluctuating WLAN. Hence the time
required to estimate the available bandwidth of the WLAN needs to be minimized.
Available bandwidth measurement for the UMTS network - In this work, we have
presented a bandwidth estimation technique only for WiFi. The nature of a UMTS
network differs from that of a WiFi network, so the bandwidth estimation technique
used for WiFi cannot be applied to UMTS as it is; a similar technique might be
used with the required modifications. The similarities and dissimilarities between
WiFi and UMTS networks need to be investigated thoroughly to determine the
changes (if any) required in the bandwidth estimation technique for UMTS.
Provide seamless video streaming irrespective of user mobility - Using a mobile
device, a user is expected to roam around while watching the streaming video. As
a WiFi network provides only a small coverage area (depending on the environment,
such as indoor or outdoor, obstacles, etc.), the user may lose connectivity while
moving, whereas the UMTS network is expected to provide connectivity everywhere.
If the client device detects a connection black-out in advance and switches the codec
layers to the UMTS network, the streaming session can be continued. We have tried
an approach based on received signal strength: threshold points for a connectivity
black-out were identified from the received signal strength, and the approach was
able to switch the codec layers in time, before connectivity was lost, to provide a
continuous streaming session (a sketch of such a trigger is given below). These
experiments were done only in an outdoor environment. As the received signal
strength alone is not an appropriate indicator for triggering a vertical handoff, other
parameters need to be considered in the handoff decision. This approach also does
not consider horizontal handoff between WiFi access points. Extensive experiments
need to be done to provide uninterrupted video streaming for moving users.
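As a minimal illustration, the following Python sketch shows a threshold-based trigger
with hysteresis; the threshold values are hypothetical and do not reproduce the
thresholds identified in our outdoor experiments:

    # Hypothetical RSS-threshold trigger with hysteresis (values are
    # assumptions, not the experimentally identified thresholds).
    SWITCH_OUT_DBM = -80    # below this, move layers to UMTS before the black-out
    SWITCH_BACK_DBM = -70   # above this, it is safe to move layers back to WiFi

    def handoff_decision(rss_dbm, layers_on_wifi):
        if layers_on_wifi and rss_dbm < SWITCH_OUT_DBM:
            return "switch_to_umts"
        if not layers_on_wifi and rss_dbm > SWITCH_BACK_DBM:
            return "switch_back_to_wifi"
        return "stay"

The gap between the two thresholds prevents the layers from oscillating between the
interfaces when the signal strength hovers around a single threshold.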
Smart prediction of WiFi network availability for moving users - Foreknowledge
of user mobility can be useful to provide uninterrupted video streaming. If the
device has specific knowledge about the user's mobility and the WiFi connectivity
along the user's mobility path, the available WiFi connectivity can be utilized in a
better way. Based on the user's position (from an integrated GPS device) and
movement (from an accelerometer), the WiFi connectivity can be estimated.
Exploring user’s mobility history - An user’s mobility history can be utilized
to build a WiFi connectivity map along a mobility path. Suppose an user is moving
along a path while watching a streaming video without any previous knowledge
about the WiFi connectivity. During this movement a possible connectivity map
can be formed, so that the next time the user roams around this area, this map can
be used.
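The following Python sketch illustrates the idea; the grid granularity and the data
layout are assumptions made for illustration:

    from collections import defaultdict

    class ConnectivityMap:
        # Toy WiFi connectivity map: GPS fixes are snapped to grid cells of
        # roughly 100 m (the granularity is an assumption) and the observed
        # signal strength samples are stored per cell.

        def __init__(self, cell_deg=0.001):
            self.cell_deg = cell_deg
            self.samples = defaultdict(list)

        def _cell(self, lat, lon):
            return (round(lat / self.cell_deg), round(lon / self.cell_deg))

        def record(self, lat, lon, rss_dbm):
            self.samples[self._cell(lat, lon)].append(rss_dbm)

        def expected_rss(self, lat, lon):
            # Mean observed RSS for this cell, or None if never visited.
            cell = self.samples.get(self._cell(lat, lon))
            return sum(cell) / len(cell) if cell else None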
Optimization of power usage - Power consumption is a major concern for mobile
devices. Simultaneous usage of multiple interfaces increases the power consumption,
and scanning for possible connectivity options (access points) also consumes a lot
of power. The multiple interfaces therefore need to be used smartly, so that the
power consumption is minimized as far as possible.
Minimize jitter due to handoff by smart data reception - When one or more codec
layers (but not all) are switched from one access network to another, the layers
received via the different access networks need to be synchronized. The
synchronization of multiple layers is discussed in Chapter 4; in this work it is
achieved with an additional buffer at the client. However, the synchronization
process has not been verified extensively and needs to be refined accordingly.
Chapter 9
Conclusion
While scalable video coding (SVC) and an RTP payload format to stream a selected
content quality to the client have recently been standardized, the methods for a network-
aware and dynamic content adaptation have not been available. In this work, we have
introduced a customized WLAN bandwidth estimation and a signaling method to enable
adaptive scalable video streaming over multiple wireless interfaces (WLAN, UMTS). SVC
features distribution of codec layers in separate flows and thereby in principle supports the
seamless switch to a different quality when multiple interfaces are used simultaneously.
We have introduced an available bandwidth estimation method that induces less probing
traffic by mixing probe packets in between the video stream to utilize video data packets
for the measurement. The introduced probing packets do contain relevant video data
and make the video transmission more robust against packet loss. We specify a signaling
method to enable the bandwidth estimation and the switch of codec layers to the required
interface, both to be triggered by the client. The signaling method solves the NAT prob-
lem - that is, it makes the generally hidden client interfaces addressable to receive the
selected flow on each of them. On the client side the bandwidth is estimated periodically
and a suitable decision is taken accordingly. The introduced software extensions for the
streaming server and the client device do not require any change to the existing
streaming server software or to the client's media player software.
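The essential step of the mentioned NAT traversal is that each client interface first
sends a control datagram to the server, which creates the address mapping through
which the server can stream the selected flow back. The following Python sketch
illustrates this registration step only; the message format, the server address and the
port number are assumptions and do not reproduce the customized control messages of
the framework:

    import socket

    SERVER = ("streaming.example.org", 5004)   # hypothetical server address

    def register_interface(local_ip, flow_id):
        # Send a registration datagram from one specific interface.  The
        # outgoing packet creates a NAT binding; the server learns the public
        # address and port from the datagram source and sends the selected
        # flow back through that binding.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.bind((local_ip, 0))                  # pick the interface by its address
        s.sendto(f"REGISTER {flow_id}".encode(), SERVER)
        return s                               # keep open to receive the flow

    # e.g. wlan_sock = register_interface("192.168.1.23", flow_id=1)
    #      umts_sock = register_interface("10.64.3.7",   flow_id=2)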
Our evaluation measurements indicate that our bandwidth estimation is more precise
than general packet dispersion techniques. This is due to the controlled insertion of
probing packets into the stream, whereby sources of measurement error are minimized.
Furthermore, the PSNR measurements show that the video quality is not affected by
the small amount of probing traffic. It is also verified that a switch of an enhancement
codec layer from WLAN to UMTS and vice versa is triggered in time after a change in
the available WLAN bandwidth, and that the switch does not result in a quality drop
but in the expected quality provided by the coding. It is furthermore demonstrated
that the flows arriving at the two interfaces simultaneously are sufficiently synchronized
by the introduced queuing systems on the server and the client side.
Appendices
Appendix A
PSNR comparison
To evaluate the system, the streaming video quality needs to be measured in several
scenarios. Though only human perception can truly assess video quality, such
assessment is not always feasible: human perception varies from person to person, and
subjective evaluation takes a lot of time. Therefore an objective video quality
measurement technique is usually used for quick assessment. The Peak Signal-to-Noise
Ratio (PSNR) is the simplest and most widely used video quality comparison method.
The PSNR is calculated between two similar quantities and expressed in decibels (dB).
The PSNR between the original video (reference video) and the transmitted video (the
received video in a streaming session) is calculated using the following equation:
    PSNR(dB) = 10 \log_{10} \left( \frac{255^2}{\mathrm{MSE}} \right)    (A.1)

where MSE refers to the Mean-Square Error, which is calculated using

    \mathrm{MSE} = \frac{1}{wh} \sum_{i=1}^{w} \sum_{j=1}^{h} \left( A_{ij} - B_{ij} \right)^2    (A.2)
Here, w is the width of the frame, h is the height of the frame, and A_{ij} and B_{ij}
refer to the pixel values (YUV or RGB) of the reference and the transmitted video,
respectively. In equation A.1 it is assumed that a pixel value is represented using 8 bits
only.
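Equations A.1 and A.2 translate directly into a few lines of code. The following sketch
(Python with NumPy; the function name is our own) computes the PSNR of one
received 8-bit frame against its reference frame:

    import numpy as np

    def psnr_db(ref, recv):
        # PSNR between two 8-bit frames (e.g. the Y plane), per equations A.1/A.2.
        a = ref.astype(np.float64)
        b = recv.astype(np.float64)
        mse = np.mean((a - b) ** 2)               # equation A.2
        if mse == 0.0:
            return float("inf")                   # identical frames
        return 10.0 * np.log10(255.0 ** 2 / mse)  # equation A.1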
However, the general PSNR calculation does not take packet loss into consideration. If
a frame is lost, the transmitted video and the original video become unsynchronized; as
a result, frames are no longer compared with their corresponding reference frames,
which leads to wrong conclusions about the video quality. In [4], the authors provide an
objective video quality measurement tool based on PSNR which takes packet loss into
account. In this work, we have used a synchronization tool which synchronizes the
received video before calculating the PSNR values: a freeze frame is inserted wherever a
frame has been lost.
Bibliography
[1] http://en.wikipedia.org/wiki/Iptables.
[2] Cisco visual networking index: Forecast and methodology 2009-2014. Technical re-
port, 2010.
[3] A. Eichhorn and P. Ni. Pick your Layers wisely - A Quality Assessment of H.264
Scalable Video Coding for Mobile Devices. In IEEE International Conference on
Communications, pages 1019–1025, 2009.
[4] An (Jack) Chan, Kai Zeng, Prasant Mohapatra, Sung-Ju Lee, and Sujata Banerjee.
Metrics for evaluating video streaming quality in lossy IEEE 802.11 wireless networks.
March 2010.
[5] Roya Choupani, Stephan Wong, and Mehmet Tolun. Scalable Video Coding: A
Technical Report. Technical report, September 2007.
[6] Constantinos Dovrolis, Parameswaran Ramanathan, and David Moore. Packet Dis-
persion techniques and capacity estimation. In IEEE Conference on Local Computer
Networks, October 2008.
[7] Kristian Evensen, Dominik Kaspar, Carsten Griwodz, Pål Halvorsen, Audun F.
Hansen, and Paal Engelstad. Improving the Performance of Quality-Adaptive Video
Streaming over Multiple Heterogeneous Access Networks. In Second annual ACM
conference on Multimedia systems, pages 57–68, February 2011.
[8] M. Handley and V. Jacobson. SDP: Session Description Protocol. Internet draft,
April 1998.
[9] Cheng-Hsin Hsu, Nikolaos M. Freris, Jatinder Pal Singh, and Xiaoqing Zhu. Rate
Control and Stream Adaptation for Scalable Video Streaming over Multiple Access
Networks. In 18th International Packet Video Workshop, pages 33–40, December
2010.
[10] Dan Jurca and Pascal Frossard. Video Packet Selection and Scheduling for Multipath
Streaming. In IEEE Transactions on Multimedia, volume 9, April 2007.
[11] K. Chebrolu and R. R. Rao. Selective Frame Discard for Interactive Video. In IEEE In-
ternational Conference on Communications, volume 7, pages 4097–4102, June 2004.
[12] K. Chebrolu and R. R. Rao. Bandwidth Aggregation for Real-Time Applications in
Heterogeneous Wireless Networks. In Mobile Computing, IEEE Transactions, April
2006.
[13] K. Evensen, D. Kaspar, P. Engelstad, A. F. Hansen, C. Griwodz, and P. Halvorsen. A
Network-Layer Proxy for Bandwidth Aggregation and Reduction of IP Packet Re-
ordering. In IEEE Conference on Local Computer Networks, October 2009.
[14] Ingo Kofler, Martin Prangl, Robert Kuschnig, and Hermann Hellwagner. An
H.264/SVC-based adaptation proxy on a WiFi router. In 18th International Work-
shop on Network and Operating Systems Support for Digital Audio and Video, May
2008.
[15] M. Li, M. Claypool, and R. Kinicki. Packet Dispersion in IEEE 802.11 Wireless Net-
works. In IEEE Conference on Local Computer Networks, October 2008.
[16] M. Li, M. Claypool, and R. Kinicki. WBest: a Bandwidth Estimation Tool for IEEE
802.11 Wireless Networks. In IEEE Conference on Local Computer Networks, Octo-
ber 2008.
[17] Yu Hsiang Lin, Shiao-Li Tsao, Ya-Lian Cheng, and Chih-Min Yu. Dynamic Band-
width Aggregation for a Mobile Device with Multiple Interfaces. In The 8th Inter-
national Symposium on Communications, 2005.
[18] Pik Jian Low, M.S.S.M. Shahrom, M.F.A. Fauzi, M.Y. Alias, M.H.L. Abdullah,
K. Anuar, A. T. Samsudin, M. Amil, and S. N. Yahya. Design and Implementation of
Adaptive Scalable Streaming System over Heterogeneous Network. In IEEE Interna-
tional Conference on Signal and Image Processing Applications, page 84, November
2009.
[19] Pablo Neira and Harald Welte. http://www.netfilter.org/projects/libnetfilter_conntrack/index.html.
[20] James Nightingale, Qi Wang, and Christos Grecos. Optimised Transmission of H.264
Scalable Video Streams over Multiple Paths in Mobile Networks. In IEEE Transac-
tions on Consumer Electronics, volume 56, pages 2161–2169, November 2010.
[21] P. Amon, T. Rathgen, and D. Singer. File Format for Scalable Video Coding. In IEEE
Transactions on Circuits and Systems for Video Technology, volume 17, page 1174,
September 2007.
[22] J. Puttonen, G. Fekete, T. Vaaramaki, and T. Hamalainen. Multiple Interface Man-
agement of Multihomed Mobile Hosts in Heterogeneous Wireless Environments. In
Eighth International Conference on Networks, pages 324–331, 2009.
[23] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. RTP: A Transport Protocol
for Real-Time Applications. Internet draft, January 1996.
[24] H. Schulzrinne, A. Rao, and R. Lanphier. Real Time Streaming Protocol (RTSP).
Internet draft, April 1998.
[25] Thomas Stockhammer, Miska M. Hannuksela, and Stephan Wenger. H.26L/JVT
coding Network Abstraction Layer and IP-based Transport. In International Con-
ference on Image Processing, volume 2, pages II–485, December 2002.
[26] Cheng-Lin Tsao and Raghupathy Sivakumar. On Effectively Exploiting Multiple
Wireless Interfaces in Mobile Hosts. In The 5th international conference on Emerging
networking experiments and technologies, 2009.
[27] S. Wenger, M.M. Hannuksela, T. Stockhammer, M. Westerlund, and D. Singer. RTP
Payload Format for H.264 Video. Internet draft, February 2005.
[28] S. Wenger, Y.K. Wang, T. Schierl, and A. Eleftheriadis. RTP Payload Format for
Scalable Video Coding. Internet draft, February 2011.