Hybrid P2P-CDN Architecture for Live Video
Streaming: An Online Learning Approach
Reza Farahani*, Abdelhak Bentaleb§, Ekrem Çetinkaya*,
Christian Timmerer*, Roger Zimmermann§, and Hermann Hellwagner*
*Christian Doppler Laboratory ATHENA, Institute of Information Technology, Alpen-Adria-Universität Klagenfurt, Austria
§School of Computing, National University of Singapore, Singapore
Abstract—Designing a cost-effective, scalable, and flexible
architecture that supports low latency and high quality live
video streaming is still a challenge for Over-The-Top (OTT)
service providers. To cope with this issue, this paper leverages
Peer-to-Peer (P2P), Content Delivery Network (CDN), edge computing,
Network Function Virtualization (NFV), and distributed
video transcoding paradigms to introduce a hybRId P2P-CDN
arcHiTecture for livE video stReaming (RICHTER). We first
introduce RICHTER’s multi-layer architecture and design an
action tree that considers all feasible resources provided by peers,
edge, and CDN servers for serving peer requests with minimum
latency and maximum quality. We then formulate the problem
as an optimization model executed at the edge of the network.
We present an Online Learning (OL) approach that leverages an
unsupervised Self Organizing Map (SOM) to (i) alleviate the time
complexity issue of the optimization model and (ii) make it a
suitable solution for large-scale scenarios, by enabling decisions
for groups of requests instead of for single requests. Finally, we
implement the RICHTER framework, conduct our experiments on
a large-scale cloud-based testbed including 350 HAS players, and
compare its effectiveness with baseline systems. The experimental
results illustrate that RICHTER outperforms baseline schemes in
terms of users’ Quality of Experience (QoE), latency, and network
utilization, by at least 59%, 39%, and 70% respectively.
Index Terms—HAS; Edge Computing; NFV; CDN; P2P; Low
Latency; QoE; Video Transcoding; Online Learning.
I. INTRODUCTION
Motivation: The proliferation of novel video streaming
technologies, advancement of networking paradigms, and
steadily increasing numbers of users who prefer to watch video
content over the Internet rather than using classical TV have
made video the predominant traffic on the Internet. Among all
types of video traffic, live video streaming has become signif-
icantly popular, accounting for about 17% of the total video
traffic by 2022 [1]. HTTP Adaptive Streaming (HAS) delivery
systems, (e.g., based on the Dynamic Adaptive Streaming over
HTTP (DASH) standard or Apple’s HTTP Live Streaming
(HLS)) have become the prevalent technologies employed by
OTT service providers (e.g., Facebook, YouTube, Twitch) for
live video streaming delivery [2]. In HAS, videos are split
into short segments with fixed duration, and each segment
is encoded at various qualities/bitrates (i.e., representations);
then, HAS clients adapt to the available bandwidth and/or
playout buffer status to download appropriate segments from
CDN servers, using an adaptive bitrate algorithm [2]. Although
utilizing CDN services to scale HAS delivery systems has been
a step forward, the tremendous growth in demand for high-quality,
low-latency live video creates several challenges for OTT
services. For instance, when CDN servers are overloaded,
OTT services fail to deliver satisfactory quality and latency to
end-users [3]. Recent studies have revealed that using clients’
capabilities within a P2P network to form hybrid P2P-CDN
video delivery systems addresses the aforementioned issues
and brings many advantages, like alleviating network con-
gestion, increasing streaming stability, and reducing delivery
costs [4]–[6]. Considering these benefits, many companies,
e.g., Peer5 and Livepeer, have been utilizing peer-assisted
networks with promising networking protocols (e.g., WebRTC)
to offload CDNs and accomplish the aforementioned
goals. However, some works [7] reveal that existing hybrid P2P-CDN
live streaming systems do not consider the full capability of
peers to provide high quality and low latency live streaming,
consequently suffering from inefficient resource utilization and
poor user QoE. Therefore, the primary motivation for
our work is devising a hybrid P2P-CDN live streaming system
to (i) employ both computing and bandwidth capabilities
provided by the P2P network, (ii) leverage modern networking
paradigms (i.e., NFV and edge computing) and an OL-based
approach to utilize P2P and CDN resources efficiently, and (iii)
satisfy HAS client requests with high QoE and low latency.
Related Work: Hybrid P2P-CDN systems generally include
three main components: (i) the media servers (i.e., origin or
CDN servers) for distributing the video contents to the peers,
(ii) peers that stream the same video contents with the same
quality, and (iii) a tracker server including a matching table to
find the best peers with minimum latency who are watching
the same video content and quality level. Some works like [8]
customize HAS players and propose such a hybrid system
in order to reduce CDN bandwidth usage and transmission
costs. Muscat et al. [9] utilize the server push functionality
of HTTP/2 to propose a hybrid P2P-CDN low latency live
video streaming system. In our previous works [10]–[12],
we propose edge- and Software-Defined Networking (SDN-)
assisted video streaming frameworks that do not leverage P2P
capability and focus on Video on Demand (VoD) scenarios.
Nacakli et al. [13] leverage SDN and edge computing to
present a novel hybrid P2P-CDN service that is hosted at SDN-
enabled edge data centers. Our previous work [14] proposes
a hybrid P2P-CDN architecture for low latency live video
streaming, but without an implementation, an evaluation, or an
online learning approach. Ma et al. [15] propose machine
learning-based approaches for hybrid P2P-CDN systems that
enable their trackers to perform peer selection. However, their
system does not employ edge- and OL-supported approaches
Hybrid Event - 2022 IEEE Global Communications Conference: Communications Software and Multimedia
GLOBECOM 2022 - 2022 IEEE Global Communications Conference | 978-1-6654-3540-6/22/$31.00 ©2022 IEEE | DOI: 10.1109/GLOBECOM48099.2022.10001091
Figure 1: RICHTER system architecture (layers: Media Organization Layer with encoding and packaging; CDN Layer; Edge Layer with a Virtual Tracker Server (VTS), edge transcoder, and partial cache (PC) near the gNodeB; P2P Layer with seeders, leechers, and peer transcoders; CMCD and CMSD messages exchanged between layers)
and does not include transcoding-based actions. To the best of
our knowledge, none of the existing hybrid P2P-CDN video
streaming frameworks proposes a system to (i) use peers’
potential idle computational resources for serving HAS clients
through running video transcoding and (ii) make the virtual
edge trackers intelligent by employing an OL approach.
Contributions: To tackle these challenges, in this paper,
we leverage HAS, P2P, CDN, NFV, and edge computing
technologies and propose a hybRId P2P-CDN arcHiTecture
for livE video stReaming (RICHTER). Our solution aims to
minimize HAS clients’ latency and network costs. Besides
considering resource limitations, we design an Action Tree
including all possible actions for serving clients’ requests
employed by Virtual Tracker Servers (VTSs) at the edge of
a P2P-CDN network. We formulate the problem as a mixed-
integer linear programming (MILP) optimization model. Because
the proposed MILP model is NP-hard, we
design an OL-based approach that uses an unsupervised SOM
technique [16] for action selection decisions. To test the
practical deployment of our solution, we implement RICHTER
and analyze its performance through experiments conducted
in a large-scale testbed including 350 clients and compare its
results with selected baseline approaches. The experimental
results demonstrate the effectiveness of RICHTER for achiev-
ing high users’ QoE, low latency, and optimized network
utilization.
Paper Outline: The remainder of this paper is structured
as follows. Section II-A explains the proposed architecture;
we formulate the problem as a MILP optimization model in
Section II-B, and explain our proposed OL-enabled method in
Section II-C. The evaluation setup, methods, metrics, and results
are described in Section III. Section IV concludes the paper
and gives an outlook on future work.
II. RICHTER DESIGN
A. System Model
The proposed architecture of RICHTER includes four core
layers and is shown in Fig. 1.
Media Organization Layer. In this layer, the raw live
videos are encoded and packaged into DASH format, then
stored on the origin server. Note that this layer is able to
package the encoded videos to other formats like HLS or
Common Media Application Format (CMAF).
Figure 2: RICHTER action tree (root: Clients; leaves numbered 1-7: (1) Peer, (2) Peer (Tran.), (3) VTS (PC.), (4) VTS (Tran.), (5) Origin Server, (6) CDN Server and VTS (Tran.), (7) CDN Server)
CDN Layer. This layer is constructed by a group of CDN
servers (either OTT servers or a purchased service from CDN
providers), each of which contains various parts of video
sequences. Inspired by the Consumer Technology Association
CTA-5004 standard [17], [18], CDN servers periodically in-
form the edge layer about their cache occupancy via Common
Media Server Data (CMSD) messages.
P2P Layer. Given the continuous increases in smartphone
capabilities, e.g., high-bandwidth access to the Internet, en-
ergy resources, and hardware-accelerated video transcoding,
RICHTER utilizes the peers’ idle resources to provide a
distributed video transcoding approach besides video trans-
mission. Like most hybrid P2P-CDN schemes, we construct
the P2P layer based on the tree-mesh structure, including
two types of peers: Seeders and Leechers. In this scheme,
seeders’ requests can be served by all nodes (i.e., CDNs,
origin, edge, or other seeders) except leechers, while leechers’
requests can be served by all nodes. Inspired by the CTA-5004
standard [17], peers periodically inform the edge layer about
their cache occupancies through Common Media Client Data
(CMCD) messages and receive updates from the edge layer
via CMSD messages.
Edge Layer. This layer leverages the capabilities of NFV
and edge computing and presents virtualized edge components
called Virtual Tracker Servers (VTSs) close to base stations
(e.g., gNodeB in 5G). Note that, in the proposed system,
during a live session, clients’ requests are directed to a VTS,
and then they get responses based on the VTS’s decisions.
As shown in Fig. 1, a VTS is equipped with transcoding
and partial cache functions to serve clients’ requests from
existing higher content qualities (by transcoding) or directly
from cached qualities, respectively. Note that because the VTS
has a broader view of both P2P and CDN layers (based on the
received CMCD/CMSD messages and monitored information),
it can track clients’ requests and store a mapping between
all transmitted content and all served clients in its peer-map
lists. Thus, it must respond to the following vital questions
whenever it needs to decide to serve received requests:
1) Where is the optimal place (i.e., adjacent peers, VTS, CDN
servers, or origin server) in terms of lowest latency for
fetching each client’s requested content quality level from,
while efficiently utilizing the available resources?
2) What is the optimal approach for responding to the requested
quality level (i.e., fetch or transcode)?
Among other tasks, a VTS monitors the system frequently to
obtain precise information about the available resources (e.g.,
bandwidth, peers’ computational and power resources), and
peers’ joining/leaving times. Therefore, when a VTS receives
a new request, it can find the optimal solution (i.e., in terms
of minimum latency) from the action tree (Fig. 2) (action
numbering as in the figure): (1) Use the P2P network and
transmit the requested quality directly from the best adjacent
peer with maximum stability (i.e., the least recent joining
time). (2) Transcode the requested quality from a higher
quality at the most stable adjacent peer and transmit it through
the P2P network. (3) Fetch the requested quality directly from
the edge, i.e., the VTS. (4) Transcode the requested quality
from a higher quality at the VTS. (5) Fetch the requested
quality from the origin server. (6) Fetch a higher quality from
the best CDN server and transcode it at the VTS. (7) Fetch
the requested quality from the best CDN server.
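The seven actions above can be sketched as a small data model. This is only an illustration under stated assumptions: the `Action`, `Candidate`, and `best_candidate` names, the node labels, and the latency estimates are hypothetical and do not come from the RICHTER implementation. It shows how a VTS could rank already-feasible candidates by fetching plus transcoding time.

```python
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    """Leaves of the action tree, numbered as in Fig. 2."""
    PEER_FETCH = 1               # fetch the requested quality from an adjacent peer
    PEER_TRANSCODE = 2           # transcode from a higher quality at an adjacent peer
    VTS_FETCH = 3                # fetch the requested quality from the VTS
    VTS_TRANSCODE = 4            # transcode from a higher quality held at the VTS
    ORIGIN_FETCH = 5             # fetch the requested quality from the origin server
    CDN_FETCH_VTS_TRANSCODE = 6  # fetch a higher quality from a CDN server, transcode at the VTS
    CDN_FETCH = 7                # fetch the requested quality from the best CDN server


@dataclass
class Candidate:
    node: str                  # hypothetical node label
    action: Action
    fetch_s: float             # estimated fetching time in seconds
    transcode_s: float = 0.0   # estimated transcoding time in seconds

    @property
    def latency_s(self) -> float:
        return self.fetch_s + self.transcode_s


def best_candidate(candidates):
    """Return the candidate with minimum latency, or None if the list is empty."""
    return min(candidates, key=lambda c: c.latency_s, default=None)


# Hypothetical estimates for one request.
options = [
    Candidate("peer-17", Action.PEER_FETCH, 0.40),
    Candidate("vts", Action.VTS_TRANSCODE, 0.10, 0.25),
    Candidate("cdn-2", Action.CDN_FETCH, 0.90),
]
choice = best_candidate(options)
```

With these made-up numbers the VTS transcoding option wins, because its combined fetching and transcoding time is lower than a direct peer fetch.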
B. Problem Formulation
We introduce an MILP optimization model that includes
four groups of constraints: Action Selection (AS), Serving Time
(ST), Origin/Peer (CP), and Resource Usage (RU).
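(i) AS constraint. Constraint (1) selects an appropriate
action from the proposed action tree (Fig. 2) for the request
issued by peer $j$. It chooses a suitable value of the binary
variable $X^{q,t}_{i,j}$ (refer to Table I for the definition):
$$\sum_{i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\}} \sum_{t \in \mathcal{T}} \sum_{q \in \mathcal{Q}_j} X^{q,t}_{i,j} \times \alpha^{q}_{i,j} = 1, \quad \forall j \in \mathcal{P} \tag{1}$$
(ii) ST constraint. Constraint (2) determines the transmitting
time $T^{q}_{i,j}$ needed to transmit quality level $q \in \mathcal{Q}_j$ from source node $i$
to peer $j$:
$$\frac{\sum_{t \in \mathcal{T}} X^{q,t}_{i,j} \times \delta^{q}_{j}}{\omega_{i,j}} \le T^{q}_{i,j}, \quad \forall j \in \mathcal{P},\ i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\},\ q \in \mathcal{Q}_j \tag{2}$$
Constraint (3) determines the required transcoding time $\tau^{q}_{i,j}$
at node $i$ in case of serving the quality requested by peer $j$
from a higher quality $q$ by transcoding:
$$\sum_{t \in \mathcal{T} \setminus \{0\}} \sum_{q \in \mathcal{Q}_j} X^{q,t}_{i,j} \times \mu^{q}_{i,j} \le \tau^{q}_{i,j}, \quad \forall j \in \mathcal{P},\ i \in \{\mathcal{P} \cup \mathcal{V}\} \setminus \{j\} \tag{3}$$
Therefore, the serving time, namely $\Psi$, i.e., fetching time
plus transcoding time, can be expressed as follows:
$$\sum_{i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\}} \sum_{j \in \mathcal{P}} \sum_{q \in \mathcal{Q}_j} \left( T^{q}_{i,j} + \tau^{q}_{i,j} \right) \le \Psi \tag{4}$$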
(iii) CP constraint. Constraint (5) forces the model to fetch
the exact quality q∗from the origin or CDNs when one of them
is chosen to serve peer j∈ P.
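Table I: Notation for RICHTER
Notation | Description
Input Parameters
$\mathcal{C}$ | Set of $k$ CDN servers and an origin server (i.e., $c = 0$)
$\mathcal{P}$ | Set of $n$ peers including $s$ seeders and $l$ leechers in subsets $\mathcal{P}_1$ and $\mathcal{P}_2$, respectively
$\mathcal{V}$ | Virtual Tracker Servers (VTSs)
$\mathcal{Q}_j$ | Set of possible quality levels for serving quality $q^{*}$ requested by $j \in \mathcal{P}$, where $\mathcal{Q}_j = \{q^{*}, q^{*}+1, \ldots, q^{*}_{max}\}$ and $q^{*}_{max}$ is the maximum quality level for the demanded segment
$\mathcal{T}$ | Set of possible transcoding statuses, where $\mathcal{T} = \{0, 1, 2\}$ and $t = 1$ or $t = 2$ if the requested quality is transcoded from a higher quality $q \in \mathcal{Q}_j$ at a VTS $i \in \mathcal{V}$ or a peer $i \in \mathcal{P} \setminus \{j\}$, respectively; otherwise $t = 0$
$\mathcal{R}$ | Set of $\rho$ peer regions
$Q$ | Set of quality level queues in the VTS
$\alpha^{q}_{i,j}$ | Available quality levels in $i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\}$; $\alpha^{q}_{i,j} = 1$ means node $i$ hosts quality $q$ requested by peer $j \in \mathcal{P}$; otherwise $\alpha^{q}_{i,j} = 0$
$\omega_{i,j}$ | Available bandwidth on path between $i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\}$ and $j \in \mathcal{P}$
$\theta^{q}_{i,j}$ | Required resources (i.e., CPU usage in %) for transcoding quality $q \in \mathcal{Q}_j$ into the quality requested by $j \in \mathcal{P}$ in $i \in \{\mathcal{P} \cup \mathcal{V}\} \setminus \{j\}$
$\eta^{q}_{i,j}$ | Required power (in milliampere-hour) for transcoding quality $q \in \mathcal{Q}_j$ into the quality requested by $j \in \mathcal{P}$ in $i \in \mathcal{P} \setminus \{j\}$
$\mu^{q}_{i,j}$ | Required time (in seconds) for transcoding quality $q \in \mathcal{Q}_j$ into the quality requested by $j \in \mathcal{P}$ in $i \in \{\mathcal{P} \cup \mathcal{V}\} \setminus \{j\}$
$\delta^{q}_{j}$ | Size of segment in quality $q$ (in bytes) requested by $j \in \mathcal{P}$
$\lambda^{q}_{j}$ | Bitrate for quality level $q \in \mathcal{Q}_j$ requested by $j \in \mathcal{P}$
$\Omega_i$ | Available computation resources (available CPU) of $i \in \{\mathcal{P} \cup \mathcal{V}\}$
$\phi_i$ | Available power resources of $i \in \mathcal{P}$
Variables
$X^{q,t}_{i,j}$ | Binary variable where $X^{q,t}_{i,j} = 1$ indicates source $i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\}$ transmits quality $q \in \mathcal{Q}_j$ requested by peer $j \in \mathcal{P}$ with transcoding status $t$; otherwise $X^{q,t}_{i,j} = 0$
$\tau^{q}_{i,j}$ | Required transcoding time at source $i \in \{\mathcal{P} \cup \mathcal{V}\} \setminus \{j\}$ to serve quality $q \in \mathcal{Q}_j$ requested by $j \in \mathcal{P}$
$T^{q}_{i,j}$ | Required time of transmitting quality level $q \in \mathcal{Q}_j$ in response to peer $j \in \mathcal{P}$ from server $i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\}$
$\Psi$ | Serving time consisting of $\tau^{q}_{i,j}$ and $T^{q}_{i,j}$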
$$\sum_{i \in \mathcal{C}} \sum_{q \in \mathcal{Q}_j} X^{q,t=0}_{i,j} \times q = q^{*}, \quad \forall j \in \mathcal{P} \tag{5}$$
Note that $i = 0$ in Eq. (5) denotes the origin server.
Moreover, we should prevent seeders from fetching requested
qualities from leechers, expressed in Eq. (6):
$$\sum_{i \in \mathcal{P}_2} \sum_{t \in \mathcal{T} \setminus \{1\}} \sum_{q \in \mathcal{Q}_j} X^{q,t}_{i,j} \times q = 0, \quad \forall j \in \mathcal{P}_1 \tag{6}$$
(iv) RU constraint. Constraint (7) guarantees that the
required bandwidth for transmitting segments on the link
between nodes $i$ and $j$ must respect the available bandwidth:
$$\sum_{t \in \mathcal{T}} \sum_{q \in \mathcal{Q}_j} X^{q,t}_{i,j} \times \lambda^{q}_{j} \le \omega_{i,j}, \quad \forall j \in \mathcal{P},\ i \in \{\mathcal{P} \cup \mathcal{V} \cup \mathcal{C}\} \setminus \{j\} \tag{7}$$
Constraint (8) limits the maximum required processing
capacity for the transcoding operation to the available
computational resource.
$$\sum_{j \in \mathcal{P}} \sum_{q \in \mathcal{Q}_j} \left( X^{q,t=1}_{i \in \mathcal{V},j} + X^{q,t=2}_{i \in \mathcal{P} \setminus \{j\},j} \right) \times \theta^{q}_{i,j} \le \Omega_i, \quad \forall i \in \{\mathcal{P} \cup \mathcal{V}\} \tag{8}$$
Similarly, constraint (9) limits the maximum required peers’
power resources for running the transcoding function to the
available power resource.
$$\sum_{j \in \mathcal{P}} \sum_{q \in \mathcal{Q}_j} X^{q,t=2}_{i,j} \times \eta^{q}_{i,j} \le \phi_i, \quad \forall i \in \mathcal{P} \setminus \{j\} \tag{9}$$
MILP Optimization Model. The following model minimizes
the requests’ serving times (i.e., fetching time plus
transcoding time), denoted by $\Psi$:
$$\begin{aligned} &\text{Minimize } \Psi \\ &\text{s.t. constraints Eq. (1)} - \text{Eq. (9)} \\ &\text{vars. } T^{q}_{i,j},\ \tau^{q}_{i,j},\ \Psi \ge 0,\quad X^{q,t}_{i,j} \in \{0, 1\} \end{aligned} \tag{10}$$
By running the MILP model (10), an optimal action will be
selected for each request issued by peer j∈ P such that the
total serving time is minimized. However, the MILP model
(10) is NP-hard [19], and suffers from high time complexity.
The next section introduces an OL-based approach based on
SOM [16] to remedy this issue.
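To make the constraints more concrete, the following sketch checks the resource constraints (7)-(9) for the candidate options of a single request and ranks the feasible ones by transmission time plus transcoding time, in the spirit of Eqs. (2)-(4). It is not the MILP model and not a solver: the joint model minimizes the total serving time over all peers, whereas this greedy per-request check only illustrates what each term means. The `Option` type, the field names, and all numbers are hypothetical, and the segment size is expressed in bits (Table I uses bytes) so that it divides cleanly by a bandwidth in bits per second.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Option:
    node: str                  # hypothetical node label
    t: int                     # transcoding status: 0 none, 1 at a VTS, 2 at a peer
    seg_bits: float            # delta: segment size, in bits for unit consistency
    bitrate_bps: float         # lambda: bitrate of the transmitted quality
    bandwidth_bps: float       # omega: available bandwidth towards the requesting peer
    transcode_s: float = 0.0   # mu: transcoding time in seconds
    cpu_need: float = 0.0      # theta: required CPU in %
    cpu_avail: float = 100.0   # Omega: available CPU in %
    power_need: float = 0.0    # eta: required power in mAh
    power_avail: float = 0.0   # phi: available power in mAh


def feasible(o: Option) -> bool:
    if o.bitrate_bps > o.bandwidth_bps:               # bandwidth, cf. constraint (7)
        return False
    if o.t in (1, 2) and o.cpu_need > o.cpu_avail:    # computation, cf. constraint (8)
        return False
    if o.t == 2 and o.power_need > o.power_avail:     # peer power, cf. constraint (9)
        return False
    return True


def serving_time(o: Option) -> float:
    # Transmission time, cf. (2), plus transcoding time, cf. (3).
    return o.seg_bits / o.bandwidth_bps + (o.transcode_s if o.t else 0.0)


def pick(options) -> Optional[Option]:
    ok = [o for o in options if feasible(o)]
    return min(ok, key=serving_time, default=None)


# A two-second segment at 0.791 Mbps is roughly 1.582e6 bits.
options = [
    Option("peer-3", 0, 1.582e6, 0.791e6, 0.5e6),                 # too little bandwidth
    Option("vts", 1, 1.582e6, 0.791e6, 3.78e6, 0.3, cpu_need=20.0),
    Option("cdn-1", 0, 1.582e6, 0.791e6, 2.0e6),
]
selected = pick(options)
```

Here the peer option is rejected by the bandwidth check, and the VTS transcoding option (about 0.72 s) beats the CDN fetch (about 0.79 s).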
C. Proposed Online Learning Approach
We design an OL-based solution depicted in Fig. 3(a),
which works in a time-slotted fashion. As shown in Fig. 3(b),
the proposed time slot structure consists of two intervals:
(i) Collecting Data (CD) and (ii) Serving Requests (SR). In
addition to the transcoding and partial cache functions, a
VTS is equipped with the four following modules: Resource
monitoring Module (RM), Manager Module (MM), Queuing
Module (QM), and OL Agent. Moreover, the VTS hosts
multiple queues, one each per peer region, live video channel,
and bitrate in each channel. In the CD interval, the following
modules are called to prepare inputs for the OL agent:
RM. This module is responsible for collecting received
CMCD and CMSD messages, monitoring available resources
(i.e., bandwidth, power, computation, joining/leaving times),
queues, and notifying the MM module.
MM. This module is used to (i) receive HTTP requests from
players, (ii) extract regions based on IP addresses, requested
channels, and bitrates from the incoming HTTP requests, (iii)
aggregate and forward the incoming HTTP requests and the
extracted information (i.e., region/channel/bitrate) to the QM
module, (iv) update the OL agent based on the items received
by the RM, (v) control the correctness of decisions made by
the OL agent before fetching or transcoding qualities from
nodes, (vi) communicate with the peers, CDN and/or origin
server regarding the decisions made by the OL agent, and
(vii) store popular segments fetched from CDN/origin server
into the partial cache. Note that the MM module immediately
responds to a requested segment that exists in the partial cache.
Furthermore, it includes an on-the-fly list that is used to prevent
delivering a request to the QM module if a response to the
request is in flight from the CDN/origin server.
QM. This module receives extracted features of requests from
the MM module and places requests in separate queues based
on peer regions, requested channel IDs, and bitrates.
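A minimal sketch of the QM behavior described above, assuming one FIFO queue per (region, channel, bitrate) key. The `Request` and `QueuingModule` names and the `by_priority` helper are hypothetical. The ordering helper mirrors the rule stated for the OL agent, where the queue holding more requests is evaluated first.

```python
from collections import defaultdict, deque
from typing import NamedTuple


class Request(NamedTuple):
    peer_id: str
    region: int
    channel: str
    bitrate_kbps: int


class QueuingModule:
    """Keeps one FIFO queue per (region, channel, bitrate) combination."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enqueue(self, req: Request) -> None:
        self.queues[(req.region, req.channel, req.bitrate_kbps)].append(req)

    def by_priority(self, region: int):
        """Non-empty queue keys of one region, longest queue first."""
        keys = [k for k in self.queues if k[0] == region and self.queues[k]]
        return sorted(keys, key=lambda k: len(self.queues[k]), reverse=True)


qm = QueuingModule()
qm.enqueue(Request("p1", 1, "CH I", 791))
qm.enqueue(Request("p2", 1, "CH II", 262))
qm.enqueue(Request("p3", 1, "CH I", 791))
qm.enqueue(Request("p4", 2, "CH I", 791))
order = qm.by_priority(1)
```

Grouping by key is what lets a single decision cover every request waiting in the same queue rather than one request at a time.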
Considering the system’s current state, i.e., available in-
formation on resources and queues of requests provided by
the MM module, the OL agent in the SR interval must run
multiple threads of an OL algorithm (one thread per peer
region) to answer the questions mentioned in Section II-A.
Since SOM [16] (i) is one of the widely used techniques for
unsupervised classification problems, (ii) can be applied to
solve NP-hard problems [20], (iii) does not require a prepared
dataset for supervised model training, (iv) allows online real-
time decision making, and (v) evolves its model quickly over
time, it is adopted as the request management solution in the
OL agent. For each queue Qb∈ Q with requested bitrate level
b, a set of SOM neurons (black circles in Fig. 3(a)) is created,
each of which is a feasible node holding the requested quality
(i.e., same-region peer, VTS, CDN/origin server) or a higher
quality (i.e., same-region peer, VTS) for serving bthrough
fetching or running transcoding, respectively. Note that since
more than one queue can proceed and might violate all/several
resource constraints (e.g., the bandwidth, computation, or
power limitations), they are evaluated in a priority order where
the queue with a higher number of requests comes first.
Each SOM’s neuron has two features (i.e., feature map)
that are defined as a <latency, penalty> tuple. The latency
feature indicates fetching plus transcoding times, while the
penalty feature is used to penalize the neuron whenever the
agent makes an incorrect decision (due to violating one/several
constraints (1)-(9)). For the sake of simplicity, we assume that
each violating action increases the sum of penalties by one.
Moreover, in order to represent the SOM features in the same
space, we use normalized features in the range between 0
and 1. When the SOM thread is executed, it will consider
the neurons’ feature map and classify neurons to find the
best matching unit (BMU) with the maximum reward, i.e.,
minimum <latency, penalty> values. The Euclidean distance
function $D_{Q_b}(i, j) = \sqrt{\sum_{n=1}^{2} w^{n}_{Q_b} (i[n] - j[n])^2}$ is used as a simple
discriminant function to calculate the best matching
of the features used in each neuron $j$ compared to BMU
$i$, where $w^{n}_{Q_b}$ in weight matrix $w_{Q_b}$ is used for the $n$th
feature of each feature list. Usually, after selecting the BMU,
the corresponding neuron and its neighbors must be updated.
Note that the neighborhood function employed in the SOM
is the Gaussian distribution function $H_{Q_b}(i, j) = e^{-\frac{D_{Q_b}(i, j)^2}{2\sigma^2}}$,
where $\sigma$ is the learning rate. Finally, an output list of tuples
$(N, A, R, V)$ sorted in ascending order (in terms of latency)
is sent to the MM module, where each tuple indicates the
determined node $N$, action $A$, the maximum number of
requests $R$ that can be served via that node/action, and a
violation signal $V$, respectively (Fig. 3(c)). For instance, tuple
$(p_1, 1, 2, 0)$ of the output list shows that peer 1 using action 1 can
serve two requests without violating the defined constraints.
The MM module follows the SOM decisions for serving
requests with tuples with $V = 0$, while it ignores tuples with
$V \ge 1$. Note that the MM module updates the inputs of the OL
agent (i.e., available resources) regarding the accepted outputs
of the OL agent since the SOM threads might execute several
times during two consecutive CD intervals.
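The distance and neighborhood functions above can be sketched in a few lines. This is an illustration, not the authors’ SOM implementation: the neuron names and raw feature values are made up, min-max scaling is one possible way to normalize features into [0, 1], and reading the BMU as the neuron closest to the ideal point (0, 0) is an assumed interpretation of "minimum <latency, penalty> values". The weights (0.5, 0.5) and $\sigma = 0.01$ follow the values reported in the evaluation setup.

```python
import math


def weighted_distance(a, b, weights):
    """D(i, j) = sqrt(sum_n w_n * (i[n] - j[n])^2) over the two features."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def gaussian_neighborhood(distance, sigma):
    """H(i, j) = exp(-D(i, j)^2 / (2 * sigma^2))."""
    return math.exp(-(distance ** 2) / (2 * sigma ** 2))


def normalize(values):
    """Min-max scale raw values into [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]


def find_bmu(neurons, weights=(0.5, 0.5)):
    """Return the neuron closest to the ideal <latency, penalty> point (0, 0)."""
    ideal = (0.0, 0.0)
    return min(neurons, key=lambda name: weighted_distance(neurons[name], ideal, weights))


# Hypothetical neurons for one queue: raw latency in seconds and penalty counts.
names = ["vts", "peer-17", "cdn-2"]
latency = normalize([0.35, 0.40, 0.90])
penalty = normalize([0, 1, 0])
neurons = dict(zip(names, zip(latency, penalty)))

bmu = find_bmu(neurons)
# Neighborhood strength of every neuron relative to the BMU.
influence = {
    n: gaussian_neighborhood(weighted_distance(neurons[n], neurons[bmu], (0.5, 0.5)), sigma=0.01)
    for n in names
}
```

With such a small $\sigma$, the neighborhood value drops to essentially zero for any neuron that is not very close to the BMU, so an update driven by it would stay local.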
Figure 3: (a) Proposed online learning structure, (b) time slot structure, and (c) a sample of the OL agent output. Panel (a) shows the RM, MM, and QM modules, the partial cache, per-region queues organized by channel and quality, and one OL agent thread per region whose feature map covers actions 1-7 over CDN/origin, VTS, and peer nodes; panel (b) shows alternating Collecting Data (CD) and Serving Requests (SR) intervals over time; panel (c) lists sample (N, A, R, V) tuples: (p1, 1, 2, 0), (c1, 7, 900, 0), (v, 4, 100, 0), (p2, 2, 2, 1), (v, 3, 500, 0), (o, 5, 300, 0).
This process will be repeated in each SR interval until the
live streaming session ends and all queues are served. Assume
$\rho$, $\beta$, and $\gamma$ indicate the number of peer regions, number of
live channels, and number of bitrates per channel, respectively. In the worst
case, the time complexity of the multi-thread SOM method
employed by the OL agent would be $O(\rho \times \beta \times \gamma)$ in each
time slot.
III. PERFORMANCE EVALUATION
Evaluation Setup: To assess the effectiveness of RICHTER
in a realistic large-scale environment, InternetMCI¹ is considered
as a real backbone network topology. We instantiate
our testbed including 375 elements, i.e., 350 AStream²
DASH players running the BOLA [21] adaptive bitrate (ABR)
algorithm (seven groups of 50 peers), five Apache HTTP
servers (i.e., four CDN servers with a total cache size of
40% of the video dataset and an origin server, containing all
video sequences), 19 OpenFlow (OF) backbone switches, 45
backbone layer-2 links, and a VTS server (with a partial cache
size of only 5% of the video sequences) on the CloudLab [22]
environment. Each element is run on Ubuntu 18.04 LTS inside
Xen virtual machines. RICHTER is independent of the caching
policy and is compatible with various caching strategies. For
simplicity, Least Recently Used (LRU) is considered in all
CDN and partial caches as the cache replacement policy. Note
that we assume each peer can cache five segments of the
videos at most. We implement all modules of VTS in Python
to serve clients’ requests for five live channels (i.e., CH I–
CH V). Each live channel plays a unique video [23] with 300
seconds duration, comprising two-second segments in bitrate
ladder {(0.089, 320p), (0.262, 480p), (0.791, 720p), (2.4, 1080p),
(4.2, 1080p)} [Mbps, content resolution].
¹ http://www.topology-zoo.org/dataset.html; last access: 2022-05-16.
² https://github.com/pari685/AStream; last access: 2022-05-16.
The Docker image jrottenberg/ffmpeg³ is utilized to measure
the segment transcoding time on the VTS. To measure the
transcoding time on the heterogeneous P2P network, we run
the transcoding function via FFmpegKit⁴ on an iPhone 11
(Apple A13 Bionic, iOS 15.3), a Xiaomi Mi11 (Snapdragon
888, Android 11), and a PC (Apple M1, macOS 12.0.1).
Moreover, power consumption is measured via device tools,
such as Android Energy Profiler and Android Battery Manager.
The bandwidth of all links in different paths from the CDN
and origin servers to the VTS are set to 50 and 100 Mbps,
respectively. To emulate the mobile network conditions, we
assume 250 peers initiate the experiments, and then, every
three seconds, a new peer joins the sessions. The VTS directs
the first peer to the best CDN server (in terms of lowest
latency), while other participating peers can be connected
on both CDN and P2P links. A real 4G network trace [24]
collected on bus rides is employed for links between peers to
edge servers in all experiments. The average bandwidth of this
trace is approximately 3780 kbps with a standard deviation
of 3190 kbps. The channel access probability is generated
following a Zipf distribution with the skew parameter $\alpha = 0.7$,
i.e., the probability of an incoming request for the $i$th channel
in each peer group is given as $prob(i) = \frac{1/i^{\alpha}}{\sum_{j=1}^{K} 1/j^{\alpha}}$, where
$K = 5$. The learning rate and weighting parameters associated
with latency and penalty are set to 0.01, 0.5, and 0.5,
respectively.
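The channel access probabilities follow directly from the Zipf formula above. The short sketch below evaluates it for $K = 5$ and $\alpha = 0.7$; the function name is hypothetical.

```python
def zipf_probabilities(k: int, alpha: float):
    """prob(i) = (1 / i^alpha) / sum_{j=1..k} (1 / j^alpha), for i = 1..k."""
    weights = [1.0 / (i ** alpha) for i in range(1, k + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = zipf_probabilities(5, 0.7)
for channel, p in enumerate(probs, start=1):
    print(f"channel {channel}: {p:.3f}")
```

With this skew, the most popular channel attracts roughly 36% of the requests and the least popular one roughly 12%.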
Evaluation Methods: The results achieved by
RICHTER are compared with the following baseline methods:
(i) Non Hybrid (NOH): regular CDN-based streaming
with no P2P support. (ii) Non Transcoding-enabled Hybrid
(NTH): As in most works, there is no transcoding capability
in this approach. In an NTH-based system, peers can only
be served via one of the actions 1, 5, or 7 (Fig. 2).
(iii) Edge Caching/Transcoding Hybrid (ECT): In this approach,
transcoding at the peer side is not considered, and
requests can be served via all actions except action 2. For fair
comparisons, our testbed with a similar setup is used in all
systems. Moreover, the NOH, NTH, and ECT systems employ
lightweight heuristic approaches to answer the questions
mentioned in Section II-A by considering Eqs. (1)–(10).
³ https://hub.docker.com/r/jrottenberg/ffmpeg; last access: 2022-05-16.
⁴ https://tanersener.github.io/ffmpeg-kit/; last access: 2022-05-16.
Figure 4: Evaluation results for the NOH, NTH, ECT, and RICHTER systems for 350 clients (panels (a)-(e); plotted metrics: ASB (Mbps), AQS, ANS, ASD (sec.), APQ, AST (sec.), CHR, ETR, and BTL (Gbps))
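The action restrictions of the compared systems can be summarized as a lookup table over the action numbers of Fig. 2. The NTH, ECT, and RICHTER sets follow the description above. The NOH set is an assumption: the text only says that NOH is CDN-based with no P2P support, so it is modeled here as fetching from the origin or a CDN server only. The names `ALLOWED_ACTIONS` and `is_allowed` are hypothetical.

```python
ALL_ACTIONS = frozenset(range(1, 8))  # actions 1..7 of the action tree

ALLOWED_ACTIONS = {
    "NOH": frozenset({5, 7}),      # assumption: origin or CDN fetch only
    "NTH": frozenset({1, 5, 7}),   # peer fetch, origin fetch, CDN fetch
    "ECT": ALL_ACTIONS - {2},      # everything except peer-side transcoding
    "RICHTER": ALL_ACTIONS,        # full action tree
}


def is_allowed(system: str, action: int) -> bool:
    """Whether a system may serve a request with the given action number."""
    return action in ALLOWED_ACTIONS[system]
```

For example, `is_allowed("ECT", 2)` is false while `is_allowed("RICHTER", 2)` is true, which is the only difference between those two systems in terms of available actions.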
Evaluation Metrics: The performance of these systems is
evaluated through the following metrics: (i) Average Segment
Bitrate (ASB) of all the downloaded segments; (ii) Average
Number of Quality Switches (AQS), the average number of
segments whose bitrate level changed compared to the pre-
vious one; (iii) Average Stall Duration (ASD), the average
of total video freeze time of all clients; (iv) Average Number
of Stalls (ANS), the average number of rebuffering events;
(v) Average Perceived Overall QoE (APQ) calculated by the
ITU-T Rec. P.1203 model in mode 0⁵; (vi) Average Serving
Time (AST), defined as the overall time for serving all clients,
including fetching time plus transcoding time; (vii) Backhaul
Traffic Load (BTL), the volume of segments downloaded
from the origin server; (viii) Edge Transcoding Ratio (ETR),
the fraction of segments transcoded at the VTS or peers;
(ix) Cache Hit Ratio (CHR), defined as the fraction of seg-
ments fetched from the CDN or edge servers or peers. Each
experiment is executed 20 times, and the average and standard
deviation values are reported in the experimental results.
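As an illustration of the first four metrics, the sketch below derives ASB, AQS, ANS, and ASD from per-client playback logs. The log format (a list of per-segment bitrates plus a list of stall durations per client) and the function names are assumptions. ASB is pooled over all downloaded segments, while the other three are averaged per client, which is one plausible reading of the definitions above.

```python
from statistics import mean


def count_switches(bitrates):
    """Number of segments whose bitrate differs from the previous segment."""
    return sum(1 for prev, cur in zip(bitrates, bitrates[1:]) if cur != prev)


def aggregate(clients):
    """clients: list of (segment_bitrates_mbps, stall_durations_s) per client."""
    all_bitrates = [b for bitrates, _ in clients for b in bitrates]
    return {
        "ASB": mean(all_bitrates),                                # Mbps, over all segments
        "AQS": mean(count_switches(b) for b, _ in clients),       # switches per client
        "ANS": mean(len(stalls) for _, stalls in clients),        # stalls per client
        "ASD": mean(sum(stalls) for _, stalls in clients),        # stall seconds per client
    }


# Two hypothetical clients with four segments each.
clients = [
    ([0.791, 0.791, 2.4, 2.4], [0.5]),
    ([0.262, 0.791, 0.262, 0.262], []),
]
result = aggregate(clients)
```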
Evaluation Results: Transcoding on peers must
be fast enough that it does not impose a significant delay on the live
system, and it must not consume much battery; otherwise, the clients’
requests may use other actions that congest the network
and edge server. In the first scenario, we run experiments
to investigate the latency and energy overheads of running
transcoding tasks on peers. To evaluate the latency overheads,
we measure transcoding times for a five-minute video in
different resolutions/bitrates on the mobile devices. In fact,
transcoding demands decoding video into raw frames and then
re-encoding those frames into new frames. Thus, transcoding
time at the peer-side is equal to the encoding time due to
leveraging the video processing that is already being done
to capture or view video. As shown in Table II, running
transcoding for the whole video takes 8.5–254.2 seconds on
these devices (0.056–1.69 seconds per segment) and is fast
enough in action 2. In another experiment, we measure the
battery consumption of peers when they (i) play a video, (ii)
transcode a video from a higher quality, or (iii) play video
⁵ https://github.com/itu-p1203/itu-p1203; last access: 2022-05-16.
Table II: Average transcoding times (in seconds) for a 5-min. video on peers
Resolution | Bitrate | iOS | Android | PC
1080p → 240p | 4219k → 89k | 34.2 | 53.25 | 18.85
1080p → 360p | 4219k → 262k | 42.7 | 61 | 24.9
1080p → 720p | 4219k → 791k | 166.5 | 130.9 | 53.1
1080p → 1080p | 4219k → 2484k | 254.2 | 249.6 | 87.2
1080p → 240p | 2484k → 89k | 35 | 55.1 | 18.9
1080p → 360p | 2484k → 262k | 45 | 62.7 | 21.8
1080p → 720p | 2484k → 791k | 172.2 | 132.5 | 52
720p → 240p | 791k → 89k | 16.25 | 34.75 | 10.8
720p → 360p | 791k → 262k | 25.1 | 49.3 | 15.4
360p → 240p | 262k → 89k | 8.5 | 19.5 | 8.8
I, transcode video II, and transmit video III, simultaneously.
The average values for five-minute videos are approximately
0.8%, 0.4%, and 1.3% of peers’ battery usage, respectively.
Thus, a combination of playing, transcoding, and transmitting
tasks does not put a significant burden on the peers’ batteries
compared to the energy used to play or transcode video.
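The per-segment transcoding times quoted for Table II can be approximately reproduced with the short sketch below. The two-second segment duration is an assumption: it is inferred from the ratio between the quoted whole-video and per-segment times and is not stated in this section. The function names are illustrative only.

```python
VIDEO_DURATION_S = 5 * 60   # five-minute test video (from the text)
SEGMENT_DURATION_S = 2.0    # ASSUMPTION: inferred from the quoted figures


def per_segment_time(total_transcode_s: float,
                     video_s: float = VIDEO_DURATION_S,
                     segment_s: float = SEGMENT_DURATION_S) -> float:
    """Average transcoding time per segment for a whole-video measurement."""
    num_segments = video_s / segment_s   # 150 segments under the assumption
    return total_transcode_s / num_segments


def keeps_up_with_live(total_transcode_s: float,
                       video_s: float = VIDEO_DURATION_S,
                       segment_s: float = SEGMENT_DURATION_S) -> bool:
    """True if a segment is transcoded faster than it takes to play it."""
    return per_segment_time(total_transcode_s, video_s, segment_s) < segment_s


# Fastest and slowest entries of Table II (in seconds).
fastest = per_segment_time(8.5)     # 360p -> 240p on iOS
slowest = per_segment_time(254.2)   # 1080p -> 1080p (4219k -> 2484k) on iOS
```

Under this assumption, even the slowest entry stays below the segment duration, which is consistent with the statement that peer transcoding is fast enough for a live session.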
In the second scenario, we evaluate RICHTER’s effective-
ness in terms of the aforementioned metrics and compare the
results with the baseline systems. As illustrated in Fig. 4(a–c),
RICHTER downloads segments with a higher ASB, decreases
AQS and ANS, shortens ASD, and thus improves APQ and
AST by at least 59% and 39%, respectively, compared to the
baseline approaches (Fig. 4(c)). The shorter ASD and AST
values in turn significantly reduce the average latency.
These gains stem from RICHTER utilizing all available peer
resources for serving clients. The performance of
RICHTER regarding the CHR, BTL, and ETR metrics is
shown in Fig. 4(d–e). Note that a cache miss event occurs
when (i) the requested or higher quality levels are not
available in the partial caches or on CDN servers, (ii) the
available bandwidth is insufficient to fetch the requested
or higher quality levels from CDN servers or peers, or (iii)
the available edge or peer processing capabilities are not
sufficient to transcode the requested quality from a higher
quality. The CHR and BTL metrics indicate that RICHTER
outperforms the other systems due to its ability to fetch the
requested or higher quality levels in a hybrid system or by using
distributed transcoding. Although RICHTER downloads fewer segments
from the origin server and improves backhaul bandwidth usage
(by about 70%) compared to ECT, it uses more computational
resources at the edge and P2P layers because it employs a
distributed transcoding approach.
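One way to read an "at least X%" claim is as the smallest relative gain over all baselines. The sketch below illustrates that reading. It is an assumption about how such percentages are typically derived, not the exact procedure used for Fig. 4, and the function names and sample values in the usage note are hypothetical.

```python
from typing import Dict


def relative_improvement(baseline: float, proposed: float,
                         higher_is_better: bool = True) -> float:
    """Relative gain of `proposed` over `baseline`, in percent."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    # For metrics such as latency or backhaul traffic, lower values are better.
    delta = proposed - baseline if higher_is_better else baseline - proposed
    return 100.0 * delta / abs(baseline)


def at_least_improvement(baselines: Dict[str, float], proposed: float,
                         higher_is_better: bool = True) -> float:
    """Smallest gain over all baselines, i.e. the 'at least' figure."""
    return min(relative_improvement(value, proposed, higher_is_better)
               for value in baselines.values())
```

For example, with hypothetical backhaul volumes of 200 and 100 units for two baselines and 50 units for the proposed system, `at_least_improvement({"A": 200.0, "B": 100.0}, 50.0, higher_is_better=False)` evaluates to 50.0.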
IV. CONCLUSION AND FUTURE WORK
This paper presents RICHTER, a hybrid P2P-CDN archi-
tecture for HAS-based live video streaming services. We (i)
introduce an action tree that defines all the possible actions
to serve HAS clients (from peers, CDN, edge, or origin
servers) with maximum users’ QoE and minimum latency,
(ii) formulate the action decision problem as an MILP op-
timization, and (iii) solve the formulated problem using an
online learning, SOM-based approach. Experimental results
on a large-scale testbed indicate the superiority of RICHTER
compared to its competitors. Extending the proposed action
tree and employing a reinforcement learning approach are our
future research directions.
ACKNOWLEDGMENT
The financial support of the Austrian Federal Ministry for
Digital and Economic Affairs, the National Foundation for
Research, Technology and Development, and the Christian
Doppler Research Association is gratefully acknowledged.
Christian Doppler Laboratory ATHENA: https://athena.itec.aau.at/.
This research has been supported in part by the
Singapore Ministry of Education Academic Research Fund
Tier 2 under MOE’s official grant number MOE2018-T2-1-103.
REFERENCES
[1] Sandvine, “The Global Internet Phenomena Report,” White Paper,
January 2022. [Online]. Available: https://www.sandvine.com/phenomena
[2] A. Bentaleb, B. Taani, A. C. Begen, C. Timmerer, and R. Zimmermann,
“A survey on bitrate adaptation schemes for streaming media over
HTTP,” IEEE Communications Surveys & Tutorials, 2018.
[3] A. A. Barakabitze, N. Barman, A. Ahmad, S. Zadtootaghaj, L. Sun,
M. G. Martini, and L. Atzori, “QoE management of multimedia
streaming services in future networks: a tutorial and survey,” IEEE
Communications Surveys & Tutorials, 2019.
[4] N.-N. Dao, A.-T. Tran, N. H. Tu, T. T. Thanh, V. N. Q. Bao, and S. Cho,
“A Contemporary Survey on Live Video Streaming from a Computation-
Driven Perspective,” ACM Computing Surveys (CSUR), 2022.
[5] R. Farahani, “CDN and SDN Support and Player Interaction for HTTP
Adaptive Video Streaming,” in Proc. 12th ACM Multimedia Systems
Conf., 2021.
[6] S. Budhkar and V. Tamarapalli, “An overlay management strategy
to improve QoS in CDN-P2P live streaming systems,” Peer-to-Peer
Networking and Applications, 2020.
[7] N. Anjum, D. Karamshuk, M. Shikh-Bahaei, and N. Sastry, “Survey on
peer-assisted content delivery networks,” Computer Networks, 2017.
[8] H. Yousef, J. Le Feuvre, P.-L. Ageneau, and A. Storelli, “Enabling
adaptive bitrate algorithms in hybrid CDN/P2P networks,” in Proc. 11th
ACM Multimedia Systems Conf., 2020.
[9] N. Muscat and C. J. Debono, “A Hybrid CDN-P2P Architecture for
Live Video Streaming,” in IEEE EUROCON Int’l. Conf. on Smart
Technologies, 2021.
[10] R. Farahani, F. Tashtarian, A. Erfanian, C. Timmerer, M. Ghanbari,
and H. Hellwagner, “ES-HAS: An Edge- and SDN-Assisted Framework
for HTTP Adaptive Video Streaming,” in Proc. 31st ACM NOSSDAV
Workshop, 2021.
[11] R. Farahani, F. Tashtarian, H. Amirpour, C. Timmerer, M. Ghanbari,
and H. Hellwagner, “CSDN: CDN-Aware QoE Optimization in SDN-
Assisted HTTP Adaptive Video Streaming,” in Proc. 46th IEEE Conf. on
Local Computer Networks (LCN), 2021.
[12] R. Farahani, F. Tashtarian, C. Timmerer, M. Ghanbari, and H. Hellwagner,
“LEADER: A Collaborative Edge- and SDN-Assisted Framework
for HTTP Adaptive Video Streaming,” in Proc. IEEE Int’l. Conf. on
Communications (ICC), 2022.
[13] S. Nacakli and A. M. Tekalp, “Controlling P2P-CDN live streaming
services at SDN-enabled multi-access edge datacenters,” IEEE Trans.
on Multimedia, 2020.
[14] R. Farahani, H. Amirpour, F. Tashtarian, A. Bentaleb, C. Timmerer,
H. Hellwagner, and R. Zimmermann, “RICHTER: hybrid P2P-CDN
architecture for low latency live video streaming,” in Proc. of the 1st
Mile-High Video Conference, 2022, pp. 87–88.
[15] Z. Ma, S. Roubia, F. Giroire, and G. Urvoy-Keller, “When Locality is not
enough: Boosting Peer Selection of Hybrid CDN-P2P Live Streaming
Systems using Machine Learning,” in Network Traffic Measurement and
Analysis Conf (IFIP TMA), 2021.
[16] T. Kohonen, “Self-Organizing Maps,” Springer Science & Business
Media, 2012.
[17] CTA-5004, “Web application video ecosystem–common media client
data.” 2020. [Online]. Available:
https://cdn.cta.tech/cta/media/media/resources/standards/pdfs/cta-5004-final.pdf
[18] CTA-WAVE, “Common-media-server Data.” 2021. [Online]. Available:
https://github.com/cta-wave/common-media-client-data/issues/19
[19] M. R. Garey et al., Computers and Intractability: A Guide to the Theory
of NP-Completeness. W.H. Freeman, 1979.
[20] A. Bentaleb, M. N. Akcay, M. Lim, A. C. Begen, and R. Zimmermann,
“Catching the moment with LoL+ in Twitch-like low-latency live
streaming platforms,” IEEE Trans. on Multimedia, 2021.
[21] K. Spiteri, R. Urgaonkar, and R. K. Sitaraman, “BOLA: Near-optimal
bitrate adaptation for online videos,” in 35th IEEE Int’l. Conf. on
Computer Communications, 2016.
[22] R. Ricci, E. Eide, and C. Team, “Introducing CloudLab: Scientific
infrastructure for advancing cloud architectures and applications,” ;login:
The Magazine of USENIX & SAGE, 2014.
[23] S. Lederer, C. Müller, and C. Timmerer, “Dynamic adaptive streaming
over HTTP dataset,” in Proc. 3rd ACM Multimedia Systems Conf., 2012.
[24] D. Raca, J. J. Quinlan, A. H. Zahran, and C. J. Sreenan, “Beyond
throughput: a 4G LTE dataset with channel and context metrics,” in
Proc. 9th ACM Multimedia Systems Conf., 2018.
Hybrid Event - 2022 IEEE Global Communications Conference: Communications Software and Multimedia