Adapting Caching to Audience Retention Rate:
Which Video Chunk to Store?
Lorenzo Maggi*, Lazaros Gkatzikis*, Georgios Paschos*, and Jérémie Leguay*
*Mathematical and Algorithmic Sciences Lab, France Research Center, Huawei Technologies Co. Ltd.
Abstract—Rarely do users watch online content entirely. We study how to take this into account to improve the performance of cache systems for video-on-demand and video-sharing platforms, in terms of traffic reduction on the core network. We exploit the notion of "audience retention rate", introduced by mainstream online content platforms, which measures the popularity of different parts of the same video content. We first characterize the performance limits of a cache able to store parts of videos, when the popularity and the audience retention rate of each video are available to the cache manager. We then relax the assumption of known popularity and propose an LRU (Least Recently Used) cache replacement policy that operates on the first chunks of each video. We characterize its performance by extending the well-known Che's approximation to this case. We prove that refining the chunk granularity improves the performance of the chunk-LRU policy. It is shown numerically that even for a small number of chunks (N = 20), the gains of chunk-LRU over the standard LRU policy that caches entire files remain significant, and they are almost optimal.
Index Terms—cache, audience retention rate, chunk, LRU
I. INTRODUCTION
Content Distribution Networks (CDN) and Video-on-Demand applications use network caches to store the most popular contents near the user and reduce backhaul bandwidth expenditure. The future projections for the cost of memory and bandwidth promote the use of caching to satisfy the ever-increasing network traffic [1]. Since the bandwidth-saving potential of caching is restricted by the number of files that fit in the cache (the cache capacity), it is interesting to maximize the caching effectiveness under this constraint. Here we consider the use of partial caching, a technique according to which we may cache specific parts of files instead of whole ones.
We focus on video files, which represent a significant fraction of the global Internet traffic (64% according to [2]). Videos are the most representative example of contents that are only partially retrieved, since specific parts of a video are viewed more than others. Typically, the average user will "crawl" several video files before watching one in its entirety. The above implies that most of the time it is not necessary to cache the entire video. Indeed, Fig. 1 shows the video watch-time from a trace of 7000 YouTube videos. The histogram emphasizes that the vast majority of files are only partially watched, and motivates the design of caching algorithms that avoid caching rarely accessed video parts, e.g., the tail.
Figure 1: Histogram of watch-time in YouTube (based on a data sample of 7000 video files from [5]). On average 60% of a file is watched.

Optimization of caching is often based on file popularity. Storing the most popular files results in more cache hits, which reduces the traffic on the core network. Nevertheless, not all parts of a file are equally popular [3]. Hence, a natural generalization of "store the most popular files" is to split the video files into chunks and "store the most popular chunks" instead. To differentiate the popularity of each video chunk we use the metric of the audience retention rate [4], which measures the popularity of different parts of the same file. It has many advantages: it is file specific, it is available in most content distribution platforms, e.g., YouTube [4], and it evolves very slowly over time, which facilitates its estimation¹. The latter is not generally true for chunk popularities, which are affected by the time-varying popularity of the corresponding file.
In this paper we establish a link between the audience retention rate and the efficiency of partial caching. Our approach is based on decomposing popularity into video popularity and video retention rate. More specifically, we address the following questions: i) How much bandwidth could we save via partial caching of video content? and ii) Is this gain achievable by practical caching algorithms?
A. Related Work
Partial caching techniques were first reported in the context of proxy caching, where it was proposed to store the file headers to improve latency performance [6]. To capture both latency and bandwidth improvements, [7] splits the files into segments of exponentially increasing size. More generally, it is possible to cache specific chunks in order to capture the different popularity of sections within a file (a.k.a. internal popularity) [3], [8].

¹The quasi-static nature of audience retention rate relates to file particularities, e.g., a movie may become uninteresting towards the end.
Intuitively, extreme chunking (e.g., at the byte level) offers finer granularity and potentially leads to the optimal caching performance. However, tracking popularity at such fine granularity is impractical and leads to algorithms of prohibitively high complexity [9]. A series of works suggest splitting each file into a small number of chunks and treating each chunk independently [7], [10]. Alternatively, it has been proposed to model internal popularity as a parametric k-transformed Zipf distribution [9], [11]. Knowing the distribution type simplifies the estimation task, but still requires parameter estimation individually for each file. Deducing the optimal size and number of chunks is not straightforward. It was recently shown that restricting to n homogeneous chunks incurs a loss which is bounded by O(n^{-2}) [8]. Alternative heuristic approaches suggest that only a specific segment of each file should be cached and dynamically adjust its size. For instance, [12] proposes a segmentation scheme where initially the whole object is cached but the segment size is gradually set equal to its estimated average watch-time. Similar adaptive strategies have also been considered for peer-to-peer networks [13], where, starting from a small segment, the portion to be cached is increased according to the number of requests and the watch-time. The caching of several segments of each file was proposed in [14], since users may be interested only in specific, non-contiguous parts of files. In this case the segment size has to be selected accordingly.
In this paper we prove that the performance of partial caching indeed improves when the file is split into chunks. We develop an analytical framework for LRU performance under partial caching, and we use it to show that the performance gains of partial caching remain significant even for a small number of chunks. To the best of the authors' knowledge, there are no studies assessing analytically the actual performance of such cache management strategies and their inherent performance limits under the partial viewing assumption.
B. Main contributions
We first investigate a trace of YouTube data [5] and conclude that partial caching has great potential to improve performance, mainly because: (i) the average video watch-time is no more than 70%, and (ii) the longer the video, the lower its average watch-time. Motivated by this, in Section IV we present an analysis of traffic bandwidth reduction which is based on the audience retention rate. Combining the theoretical analysis with the YouTube data, we show that in realistic settings the traffic reduction of partial caching over traditional caching may reach up to 50%.
The above analysis compares the performance limits of the two caching approaches assuming known popularity and retention rates. Therefore, it is also interesting to investigate the bandwidth benefits of partial caching in a more realistic setting. In Section V we design a class of practical chunk-LRU (Least Recently Used) policies, which split files into different chunks and always drop (i.e., never cache) the last chunk at the tail of files. Chunk-LRU policies harness the realistic gain of partial caching due to video watch-time. Moreover, we gain intuition into designing optimal chunking and we show that the maximum performance can be approached with a small number of chunks of equal size.
Our main technical contributions to the literature are:
- We formulate the traffic reduction optimization problem and provide a waterfilling algorithm to solve it efficiently. For the special case where users watch each video continuously until they abandon it, we derive the optimal waterfilling partial allocation in closed form. It consists of caching a compact interval [0; ν] of the file, where ν is given in closed form.
- We propose a novel chunk-LRU algorithm that splits each file into N+1 chunks, where the last one is never cached. We build an analytical framework to analyze the chunk-LRU performance under partial viewing, subject to Che's approximation for LRU performance [15].
- We provide a sufficient condition on retention rates such that sub-splitting chunks is always beneficial.
- We characterize the optimal performance of chunk-LRU as a simple optimization problem over the tail drop factor and with infinitesimal chunking.
II. YOUTUBE VIDEO WATCH-TIME
In this section we examine YouTube access traces² [5] in order to analyze the average video watch-time, which is the portion (∈ [0; 1]) of each file watched by the users. Watch-times are crucial for caching: using partial caching we may avoid caching rarely watched parts of videos and use the freed cache space to store more files.
Since most strategies try to cache the most popular files, we first investigate the relationship between average watch-time and file popularity. We classify videos into 10 groups according to their average daily views. Fig. 2 depicts the estimated probability density function of watch-time for three representative groups: the 10% most popular videos, the 10% least popular, and the intermediate ones. Interestingly, we observe that the more popular a video is, the higher its average watch-time. However, even for the most popular ones, on average only 72% of each video is watched, which leaves room for caching optimization.
Figure 2: Watch-time distribution for different classes of video popularity. The average watch-time of a video increases with its popularity.

²The dataset is publicly available and was crawled using the YouTube Data API in 2013. It contains information about 7000 files, including daily views, watch-time, duration, genre, and title of each file.
Figure 3: Average watch-time is increasing with the popularity of files, but steeply decreasing with their duration.
Next, we investigate the relationship between watch-time and video duration. The latter is a critical parameter for caching due to the cache capacity constraint, which ultimately determines caching performance. If longer videos are only partially watched, avoiding caching their unwatched parts will yield a greater benefit. In Fig. 3 we depict with dots the YouTube data for the 20% most popular files. In order to identify how the watch-time is affected by the video duration and its popularity, we use locally weighted polynomial regression [16] to fit a smoothed surface to the corresponding data. Notice that the most beneficial regime for caching purposes corresponds to the upper left corner of the plot, namely highly popular videos of long duration. We observe that in this region the average watch-time is around 0.7. In addition, independently of the video popularity, watch-time decreases rapidly with video duration.
We then group the available data into 10 classes according to their popularity and duration (with a small/large threshold of 200 sec). We report the details of the derived classes in Table I, namely, for each class, the average watch-time, the fraction of videos belonging to the class, and its average duration in seconds. We observe that large and popular videos amount to a non-negligible percentage of 5%. In addition, the average watch-time of large files is significantly smaller than that of smaller ones. To precisely evaluate the impact of watch-time on caching, we use these data in the subsequent Sections IV and V to quantify the theoretical maximum and the practically feasible caching performance.
III. SYSTEM MODEL
We consider a communication system where users download video contents from the network. Let M = {1, . . . , M} be the video content (or simply, video) catalog. Each video i ∈ M is of size S_i bytes. Content requests are generated according to the well-known Independent Reference Model (IRM) [17], under which the requests for the videos in M are independent of each other. We call p_i the probability that video i is requested, given that a video request has arrived. Equivalently, the sequence of video requests can be thought of as M independent homogeneous Poisson processes with intensity rates proportional to the probability vector {p_i}_i. For convenience of notation, we assume that the probabilities are in decreasing order, i.e., p_1 ≥ p_2 ≥ ··· ≥ p_M.

Table II: Table of notation symbols

M : video catalog, of cardinality |M| = M
C : cache size
p_i : popularity of video i
R_i(τ) : audience retention rate of video i
π_i(τ) : viewing abandonment p.d.f. of video i
S_i : size of video i
B(Y) : traffic on the core network when the portion Y_i of video i is statically stored in the cache (see Eq. (2))
N+1 : number of chunks for chunk-LRU
B* : minimum core network traffic achieved by optimal partial caching
[x_{k-1}; x_k] : k-th chunk of a video
x : collection of chunk boundaries
ν : tail drop factor for chunk-LRU; the last chunk [ν; 1] is never stored in the cache
h_{k,i} : hit rate of the k-th chunk of video i
t_C : characteristic time for chunk-LRU
B_cLRU(x, ν) : traffic on the core network with chunk-LRU (see Eq. (9)), subject to the chunking x and the tail drop factor ν
B*_cLRU : optimal traffic performance of chunk-LRU (see Eq. (11))
One cache of size C bytes is deployed in the network³. Whenever a requested video is found in the cache, the cache itself can directly serve the user. Otherwise, the video needs to be retrieved through the core network, which provides access to a central video content store containing the entire video catalog; see Fig. 4. Hence, good caching performance has a profound impact on the traffic reduction on the core network. The goal of this paper is to determine the extra bandwidth benefits that may be gained by exploiting the fact that videos are rarely watched entirely.
Figure 4: System model
A. Viewing Behavior Model: Audience Retention Rate
To mathematically analyze the impact of watch-time, we introduce the central notion of audience retention rate R_i(τ). According to YouTube's definition, the audience retention rate R_i(τ) measures the percentage of users that are still watching video i at the corresponding (normalized) instant τ, out of the overall number of views [4]. As we will see, in our analysis the retention rate has a prominent role in determining the caching performance.

Typically, a user may watch video i from instant a_i(1) up to b_i(1), then possibly skip to a_i(2) and watch until b_i(2), and so forth⁴. The watched part W_i, which equals the minimum portion of video i that the user needs to download, is the union of all watch intervals:

    W_i = ∪_j [a_i(j); b_i(j)].

We call |W_i| the watch-time of a user watching video i. For ease of notation we consider a_i, b_i ∈ [0; 1] as portions of the whole video duration. The audience retention rate⁵ function R_i(τ) can then be formally defined as the probability that a user has watched the (normalized) instant τ of the video, i.e.,

    R_i(τ) = Pr(τ ∈ W_i),  τ ∈ [0; 1].

Alternatively, we may think of R_i(τ) as the fraction of users that watch the (normalized) instant τ of video i. We remark that, thanks to the definition of R_i, we can easily evaluate the average watch-time for video i as ∫_0^1 R_i(τ) dτ.

³Our analysis can be extended to a cache hierarchy by letting p_i express the probability that a request for video i is missed by the caches at all the child nodes [1].
⁴We remark that such intervals may also overlap, i.e., a user may rewind the video and watch a part of it multiple times. We assume that, if this occurs, the user can directly retrieve the video portion that she has already watched from her terminal's cache.

Table I: The characteristics of each class of videos (each cell: av. watch-time / fraction of population / av. duration in sec). These data will be used to derive realistic and class-specific retention rates for our numerical evaluation.

Popularity   Small videos          Large videos
Lowest       0.52 / 0.179 / 81     0.37 / 0.020 / 220
Low          0.60 / 0.162 / 112    0.47 / 0.036 / 220
Medium       0.64 / 0.153 / 128    0.57 / 0.045 / 223
High         0.67 / 0.152 / 130    0.60 / 0.047 / 222
Highest      0.72 / 0.145 / 124    0.65 / 0.053 / 235
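To make the definition of R_i operational, the sketch below (our illustration, on hypothetical toy data) estimates the retention rate on a grid from per-user watch intervals [a_i(j); b_i(j)] and recovers the average watch-time as ∫_0^1 R_i(τ) dτ.

```python
import numpy as np

def retention_rate(users_intervals, grid):
    """Empirical R(tau): fraction of users whose watched part W covers tau.

    users_intervals: one entry per user, each a list of (a, b) watch
    intervals with 0 <= a <= b <= 1; the union of a user's intervals is W.
    """
    R = np.zeros_like(grid)
    for intervals in users_intervals:
        covered = np.zeros_like(grid, dtype=bool)
        for a, b in intervals:
            covered |= (grid >= a) & (grid <= b)
        R += covered
    return R / len(users_intervals)

grid = np.linspace(0.0, 1.0, 101)
users = [[(0.0, 0.6)], [(0.0, 0.3), (0.8, 1.0)], [(0.0, 1.0)]]  # toy data
R = retention_rate(users, grid)
avg_watch_time = np.trapz(R, grid)  # average watch-time = integral of R
```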
Next we devise a realistic and more specific viewing behavior model, and we derive its relationship to the audience retention rate.
1) Viewing Abandonment Model: This is a special instance of the viewing model presented above. It assumes that users always start watching each video i from its beginning, and abandon it after a random time portion b_i ∈ [0; 1]. Hence, in this case the watched part W_i takes the simple form W_i = [0; b_i], and thus b_i equals the watch-time. We call π_i(·) the probability density function of the abandonment time variable b_i. The relationship between the abandonment distribution π_i and the audience retention rate R_i is described by the expression:

    R_i(τ) = 1 − ∫_0^τ π_i(t) dt.    (1)

Hence, in this case the audience retention rate R_i(τ) measures the fraction of users with watch-time higher than τ for the particular video i. We first observe from (1) that R_i is inherently non-increasing, with R_i(0) = 1. We also remark that, under the viewing abandonment assumption, the audience retention rate R_i uniquely describes the random watch behavior [0; b_i] of a user via π_i. This observation does not hold for the general case described in Section III-A, where the same retention rate R_i may result from an arbitrary distribution of watch behaviors.
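Eq. (1) makes R_i immediate to compute from any abandonment density. As a small sketch (ours), here is the retention function of the truncated exponential density that reappears in Corollary 2; note that it satisfies R(0) = 1 and, in this case, R(1) = 0.

```python
import numpy as np

def retention_from_abandonment(lmbda, tau):
    """R(tau) = 1 - int_0^tau pi(t) dt for pi truncated exponential on [0; 1]."""
    # CDF of pi(t) = lmbda * exp(-lmbda * t) / (1 - exp(-lmbda))
    cdf = (1.0 - np.exp(-lmbda * tau)) / (1.0 - np.exp(-lmbda))
    return 1.0 - cdf

tau = np.linspace(0.0, 1.0, 101)
R = retention_from_abandonment(lmbda=2.0, tau=tau)
assert np.isclose(R[0], 1.0) and np.isclose(R[-1], 0.0)
```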
In order to come up with a realistic audience retention rate function from the estimated parameters in Tab. I, for our numerical investigations in Sections IV-C and V-D we assume that the viewing abandonment model holds.

⁵Our definition is in accordance with the definition of audience retention (or "engagement") rate by Wistia.com [18]. YouTube's audience retention rate [4] actually counts video rewinds as multiple views inside the same video.
Figure 5: Instance of audience retention rate from YouTube.
IV. PERFORMANCE LIMITS OF PARTIAL CACHING
This section analyzes the performance limits of partial caching in the context of audience retention rate. Our performance metric is core network traffic, and we tackle the off-line problem of finding the optimal static (partial) file cache allocation⁶. In particular, we compare the maximum network traffic saved by caching entire videos versus caching arbitrary portions of each of them. In both cases it is idealistically assumed that the video popularity distribution {p_i}_{i∈M} and the audience retention rate functions {R_i}_{i∈M} are perfectly known to the cache manager. This analysis serves as an upper bound for any cache management strategy with more limited information, such as the one devised in Section V.
Let us first formalize our problem. We define the partial allocation Y_i ⊆ [0; 1] of video i to be the collection of (possibly) non-adjacent bytes that are selected to be permanently stored in the cache. Under a partial allocation Y_i, any request for the remaining portions [0; 1] \ Y_i needs to be served by the origin video store. Due to the specific retention rate of this video, this happens with probability ∫_{[0;1]\Y_i} R_i(τ) dτ. Therefore, under a partial allocation vector Y, we may express the expected traffic on the core network per request B(Y) as

    B(Y) = Σ_{i∈M} S_i p_i ∫_{[0;1]\Y_i} R_i(τ) dτ.    (2)

Considering the video sizes S_i and the cache size C, a partial allocation vector Y is feasible whenever Σ_{i∈M} S_i ∫_{Y_i} 1 dx = C. Our goal is to select a feasible vector Y that minimizes the incurred traffic B(Y), i.e.,

    Y* = argmin_Y B(Y)    (3)
    s.t. Σ_{i∈M} S_i ∫_{Y_i} 1 dx = C,
         Y_i ⊆ [0; 1].
If users always watch the whole video, i.e., R_i(τ) = 1 for all τ ∈ [0; 1] and i ∈ M, then the optimization (3) takes a simple form, which is solved by the well-known "store the most popular videos" policy. In this case, we would fully store (Y_i = [0; 1]) the videos of highest p_i up to the cache capacity, and store no portion of the rest (Y_i = ∅ otherwise). As indicated by the previous section, however, in reality this is not the case; hence we expect Y* to bring a certain improvement, which we evaluate in Section IV-C.

⁶We remark that in our analysis of the optimal traffic B(Y) we assume that the allocations Y are already present in the cache, and we do not account for the traffic needed to fill the cache. To incorporate this aspect, one can regard B(Y) as the expected traffic achieved asymptotically as the number of requests tends to infinity.
Technically speaking, if we lift any assumption on the shape of the audience retention rate, the best cache allocation intuitively prescribes to partition all videos at the finest granularity (at the byte level, say), order the resulting pieces according to their popularity, and fill the cache with the most popular bytes. We now provide an equivalent waterfilling characterization of the optimal partial video allocation Y*. The main advantage of this formulation lies in the fact that it leads to an efficient algorithm to compute Y*, which we present at the end of the section.
Theorem 1. The optimal partial video allocation Y* can be expressed as

    Y*_i(µ) = {τ : p_i R_i(τ) ≥ µ}  ∀i ∈ M,    (4)

where µ is such that Σ_{i∈M} S_i |Y*_i(µ)| = C, and |·| denotes the size⁷ of a subset of [0; 1].

⁷Formally defined as the Lebesgue measure.

Informally speaking, the water level µ determines a popularity threshold above which a byte of any video deserves to be stored in the cache.
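Theorem 1 translates directly into a numerical procedure: bisect on the water level µ until the cached mass matches C. A minimal sketch (ours; it assumes vectorized retention functions, a catalog larger than the cache, and discretizes [0; 1] on a uniform grid):

```python
import numpy as np

def optimal_allocation(p, S, R_list, C, grid, tol=1e-9):
    """Bisection on the water level mu of Theorem 1.

    R_list[i] is a vectorized retention function on [0; 1]. Returns the
    measure |Y_i| of the cached set of each video, computed on `grid`.
    """
    dx = grid[1] - grid[0]

    def cached_sizes(mu):
        # |Y_i(mu)| = measure of {tau : p_i R_i(tau) >= mu}
        return np.array([np.sum(p[i] * R(grid) >= mu) * dx
                         for i, R in enumerate(R_list)])

    lo, hi = 0.0, float(np.max(p))    # p_i R_i(tau) never exceeds max(p)
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.dot(S, cached_sizes(mu)) > C:
            lo = mu                   # water level too low: cache overfull
        else:
            hi = mu
    return cached_sizes(hi)
```

For non-increasing R_i (the abandonment model), the returned sets are the intervals [0; η*_i] of Corollary 1 below.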
A. Viewing Abandonment Model
In the special case of the viewing abandonment model, we have already observed that the audience retention rate R_i is non-increasing for all i ∈ M. This allows us to specialize our result in Theorem 1 as follows.
Corollary 1. Consider the viewing abandonment model with strictly decreasing R_i for all i ∈ M. The optimal video allocation writes Y*_i = [0; η*_i] for all i ∈ M, where

    η*_i(µ) = 1                  if p_i R_i(1) ≥ µ,
              0                  if p_i ≤ µ,
              R_i^{-1}(µ/p_i)    otherwise,      (µ ≥ 0)

and µ is such that Σ_{i∈M} S_i η*_i(µ) = C.    (5)
A remarkable observation here is that the optimum bandwidth performance is achieved by splitting every video into only two parts and caching the first one. We may determine the exact splits if the abandonment distribution is given. For instance, if π_i is a truncated exponential with parameter λ_i, i.e.,

    π_i(τ) = λ_i e^{-λ_i τ} / (1 − e^{-λ_i}),  τ ∈ [0; 1],

then the following holds.
Corollary 2. Under the exponential viewing abandonment model the optimal video allocation writes Y*_i = [0; η*_i] for all i ∈ M, where

    η*_i(µ) = [ −(1/λ_i) ln( (µ/p_i)(1 − e^{-λ_i}) + e^{-λ_i} ) ]^+,      (µ ≥ 0)

and µ is such that Σ_{i∈M} S_i η*_i(µ) = C.    (6)
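In the truncated-exponential case no set computation is needed at all: Corollary 2 gives each η*_i in closed form as a function of the scalar µ, which can again be found by bisection. A short sketch (ours):

```python
import numpy as np

def eta_exponential(mu, p, lmbda):
    """Closed form of Corollary 2: eta_i = [-(1/lambda_i) ln(...)]^+ ."""
    arg = (mu / p) * (1.0 - np.exp(-lmbda)) + np.exp(-lmbda)
    return np.maximum(-np.log(arg) / lmbda, 0.0)

def water_level(p, S, lmbda, C, tol=1e-9):
    """Bisect on mu so that sum_i S_i * eta_i(mu) = C (p, S, lmbda: arrays)."""
    lo, hi = 0.0, float(np.max(p))
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.dot(S, eta_exponential(mu, p, lmbda)) > C:
            lo = mu
        else:
            hi = mu
    return hi
```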
B. Computation of Optimal Performance

To solve (3), we observe that it can be expressed as a separable convex optimization problem with linear and box constraints. If we further assume that the functions R_i do not have any plateau, then the objective function becomes strictly convex, and we can adapt the algorithm presented in (Section 7.2, [19]) to our scope in order to efficiently compute the optimal partial video cache allocation Y*. We present below a high-level description of the algorithm; the interested reader may find the implementation details in the Appendix.
Waterfilling algorithm

Set k := 0. Set M(0) := M.
while M(k) ≠ ∅:
  refine the search of the set of indices M(k) for which the optimal solution is deemed to lie in the interior of the box constraint;
  if the approximated solution for a video i ∈ M(k) falls beyond the box [0; 1], round it to the nearest boundary; it is now optimal and is discarded from M(k);
  set k := k + 1
end
C. Performance Evaluation with Real Data

In order to evaluate the performance of the optimal partial allocation in a realistic scenario, we utilize the average watch-time parameters shown in Tab. I. In Fig. 6 we compare the core network traffic B* = B(Y*) generated by the optimal partial caching strategy with the traffic produced by the most natural strategy, which stores the most popular videos in their entirety. We observe that remarkable gains from partial caching are achieved for cache-to-catalog size ratios higher than 10^{-2}, which we typically find in current CDN scenarios.

We then show in Fig. 7 the optimal portion of videos that should be stored according to the same optimal caching strategy, for different values of the cache size. Interestingly, only very popular videos are stored in their entirety, even for large cache sizes.
We finally remark that in this paper we normalize all the core network traffic figures with respect to the minimum bandwidth per video request B_nc required to serve the users when no cache is deployed in the system, which equals

    B_nc = Σ_{i=1}^{M} S_i p_i ∫_0^1 R_i(τ) dτ.    (7)
Figure 6: Core traffic generated by the optimal partial caching strategy in a realistic scenario vs. the traffic produced by storing the most popular videos in their entirety. We show in red the resulting performance gain of the first strategy. We utilize the parameters obtained via the real data shown in Tab. I. The video popularity distribution follows a Zipf law with parameter 0.8 [17]. S denotes the average video size.

Figure 7: Optimal portion of videos that should be stored according to the same optimal caching strategy as in Fig. 6. Given a certain C/SM, the video with popularity rank x should be stored from its beginning up to portion y.
V. A PRACTICAL CHUNK-LRU SCHEME FOR DECREASING RETENTION RATES
After analyzing the best performance, which can only be achieved with full information on the system parameters, we turn to the study of a practical cache update scheme that shows good performance even when the popularity p_i and the audience retention rate R_i are unknown for each video i.

It is widely understood that the Least Recently Used (LRU) cache replacement policy represents a good trade-off between hit-rate performance and implementation complexity in a real scenario where no statistics on video popularity are available to the cache manager. Moreover, thanks to its short memory, it reacts quickly to variations in video popularity. In its simplest form, though, each time a video is requested (even only partially) by a user and is not found in the cache, LRU prescribes to cache it in its entirety (and to update the LRU recency table accordingly). Since users rarely watch videos entirely, as previously observed, standard LRU would generate extra traffic in the core network and would waste precious cache space storing unpopular portions of files.
Figure 8: Video split into N+1 chunks. Only the first N are considered for chunk-LRU; the last one is never stored in the cache.

In order to counter this, we propose a new cache management policy that generalizes the classic LRU policy. We first suggest splitting each video into N+1 consecutive and non-overlapping chunks. We denote by [x_{k-1}; x_k] the k-th chunk. Moreover, we argue that the last (i.e., the (N+1)-th) chunk of each video, which is the least popular part under the assumption of decreasing audience retention rate, should never be stored in the cache, even if requested by a user. Intuitively, this frees up space for more popular chunks of less popular videos. We call ν the tail drop factor, which pinpoints the position of the last chunk. Hence, the first N chunks of each video are stored only if requested, and are then evicted from the cache in an LRU fashion.

Remark 1. For the sake of analytical simplicity we assume that the chunk splitting x, ν does not depend on the identity of the file. We leave this as a future extension.
Performing LRU on the first N chunks presents two main benefits. On the one hand, it reduces the extra traffic on the core network caused by the retrieval of video portions that are not requested. For instance, whenever a user watches a video from its beginning up to portion b, only the first k̄ = min{k : x_k ≥ b} chunks are downloaded. Hence, only the portion x_{k̄} − b is stored in the cache without being accessed. On the other hand, we exploit the fact that the tail of a video is generally less popular than the rest [9]. Hence, by systematically discarding the tail of each video we avoid evicting from the cache the first chunks, which are likely to be more popular.⁸

We now formally describe our algorithm, which uses as input the chunking of files and the tail drop factor. The impact of these parameters on the actual performance is analyzed in the following subsections.
chunk-LRU Algorithm

Step 1 (Initialization):
1.1) Set the tail drop factor ν ∈ (0; 1].
1.2) Partition each video i ∈ M into N+1 chunks of the form [x_0 = 0; x_1], [x_1; x_2], ..., [x_{N-1}; x_N = ν], [x_N = ν; x_{N+1} = 1], where x_k ∈ [0; 1] (see Fig. 8).
1.3) An initial chunk request recency vector is available.
Step 2: A request for a packet of video i ∈ M belonging to its k-th chunk [x_{k-1}; x_k] arrives.
2.1) If k = N+1, then the request is handled by the core network and the cache is not updated (i.e., the tail is never cached).
2.2) Else, if 1 ≤ k ≤ N, then
  2.2.1) If the requested chunk is stored in the cache, then the cache sends the packet to the user.
  2.2.2) If the requested chunk is not stored in the cache, then it is retrieved from the core network and then stored in the cache, after evicting the minimum number of least recently used chunks. Finally, the cache sends the packet to the user.
2.3) The recency vector of the chunks stored in the cache is updated in an LRU fashion.
2.4) Return to Step 2.

⁸Additionally, although this is not the focus of this paper, performing LRU on chunks would allow keeping track of the evolution of the popularity of each chunk. Nevertheless, the resulting benefits would be minor, since the retention rate varies on a time scale much slower than the video popularity dynamics.
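For illustration (ours; unit file sizes and chunk-level accounting), the policy can be prototyped with an ordered dictionary serving as the LRU recency table:

```python
from collections import OrderedDict

def simulate_chunk_lru(requests, x, cache_size):
    """Simulate chunk-LRU. `x` = [0, x_1, ..., x_N = nu] are the chunk
    boundaries; the tail [nu; 1] is appended as the never-cached chunk.
    `requests` yields (video id, abandonment point b). Returns core traffic."""
    bounds = list(x) + [1.0]
    cache = OrderedDict()                 # (video, chunk index) -> chunk size
    used = traffic = 0.0
    for i, b in requests:
        for k in range(len(bounds) - 1):
            if bounds[k] >= b:            # user abandons before this chunk
                break
            size = bounds[k + 1] - bounds[k]
            if k == len(bounds) - 2:      # tail chunk: core serves it, no caching
                traffic += b - bounds[k]
                continue
            if (i, k) in cache:
                cache.move_to_end((i, k))       # hit: refresh recency
            elif size <= cache_size:
                traffic += size                 # miss: fetch from the core
                while used + size > cache_size:
                    _, old = cache.popitem(last=False)  # evict least recent
                    used -= old
                cache[(i, k)] = size
                used += size
    return traffic
```

Feeding it the IRM stream of Section III together with abandonment points drawn from π_i reproduces, in spirit, the setting evaluated in Section V-D.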
A. Chunk-LRU Performance under Viewing Abandonment

Having described our chunk-LRU algorithm, we now turn to the analysis of its performance. To this purpose, in this section we assume that the viewing abandonment model holds. Moreover, in order to derive our analytical results we make the common simplifying assumption that all videos have the same size S = S_i. This is well justified by the fact that we can break large videos into equal-size fragments and perform chunk-LRU over the chunks of the video fragments.
We first observe that, under the viewing abandonment model (Section III-A1), the probability that the k-th chunk of video i is requested by a user, given that the user has already started watching video i, equals R_i(x_{k-1}) = ∫_{x_{k-1}}^1 π_i(τ) dτ. Since the requests for video i follow by assumption a Poisson process of intensity (proportional to) p_i, the request process for the k-th chunk is also Poisson, with reduced intensity p_i R_i(x_{k-1}). Thus, thanks to an adaptation of the popular Che's approximation [15], we can compute the hit rate of a specific chunk, i.e., the probability that the chunk is found in the cache when requested.

Let us elaborate on this. Che's approximation was originally proposed in [15] to compute the hit rate of files whose request sequences follow independent Poisson processes. It approximates the characteristic time t_C, measuring the time that a file spends in the cache, as a constant. When shifting the request granularity from the video to the chunk level, the independence property of the request streams is unavoidably lost. Nevertheless, we can still rely on the intuition that, when the cache size is significantly larger than the video size, the characteristic time of each chunk is approximately equal and constant; hence Che's approximation still holds, as shown to be valid in [1]. Therefore, the hit rate h_{k,i} of the k-th chunk of video i can be approximated as

    h_{k,i} = 1 − e^{-p_i R_i(x_{k-1}) t_C},

where the characteristic time t_C obeys the following relation [17]:

    C/S = Σ_{k=1}^{N} Δx_k Σ_{i=1}^{M} h_{k,i},    (8)
where Δx_k = x_k − x_{k-1}. Finally, the expected traffic per video request B_cLRU forwarded to the core network when the chunk-LRU cache management policy is employed writes

    B_cLRU(x, ν) = S Σ_{i=1}^{M} p_i [ Σ_{k=1}^{N} R_i(x_{k-1}) (1 − h_{k,i}) Δx_k + ∫_ν^1 R_i(τ) dτ ],    (9)

where x = {x_1, . . . , x_{N-1}}.
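Since the right-hand side of Eq. (8) is monotone in t_C, the characteristic time can be obtained with a standard scalar root finder, after which Eq. (9) is a direct evaluation. A sketch of both steps (ours; inputs are NumPy arrays, and it presumes ν ≥ C/(MS) so that a root exists in the bracket):

```python
import numpy as np
from scipy.optimize import brentq

def characteristic_time(p, R_at, x, C_over_S):
    """Root of Eq. (8). R_at[i, k] = R_i(x_{k-1}); x = [x_0=0, ..., x_N=nu]."""
    dx = np.diff(x)                                  # Delta x_k
    rates = p[:, None] * R_at                        # p_i R_i(x_{k-1})
    f = lambda t: np.sum(dx * np.sum(1.0 - np.exp(-rates * t), axis=0)) - C_over_S
    return brentq(f, 1e-12, 1e12)                    # f is increasing in t

def chunk_lru_traffic(p, R_at, R_tail, x, t_C, S=1.0):
    """Evaluate Eq. (9). R_tail[i] = int_nu^1 R_i(tau) dtau."""
    dx = np.diff(x)
    miss = np.exp(-p[:, None] * R_at * t_C)          # 1 - h_{k,i}
    return S * np.sum(p * (np.sum(R_at * miss * dx, axis=1) + R_tail))
```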
B. Benefits of Chunk Sub-Splitting

We now focus on the impact of the chunk size on the chunk-LRU performance, measured as the traffic B_cLRU generated at the core network. Intuitively speaking, shrinking the chunk size should translate into better traffic performance, since this reduces the traffic surplus generated when users do not watch a chunk in its entirety. Nevertheless, this intuition alone is not a proof, since modifying the chunk size also has an impact on the characteristic time t_C in a non-trivial way via Eq. (8).

Before stating the main result of this section, we first need to introduce some notation. Let t_C^{(1)} and t_C^{(∞)} be the characteristic times when only one chunk (i.e., [0; ν]) and chunks of infinitesimal size dx (say, at the byte level) are employed, respectively. More formally, t_C^{(1)} and t_C^{(∞)} are the unique roots of the two following equations, respectively:

    C/S = ν Σ_{i=1}^{M} (1 − e^{-p_i t_C^{(1)}}),
    C/S = Σ_{i=1}^{M} ∫_0^ν (1 − e^{-p_i R_i(x) t_C^{(∞)}}) dx.

Moreover, we say that the chunk split x′ is a sub-split with respect to x whenever ∪_i {x_i} ⊂ ∪_i {x′_i}. We finally observe that if ν = C/(MS) then the cache can store the first portion ν of all the videos; hence, it is reasonable to constrain ν within the interval [C/(MS); 1].

We are now ready to prove that any refinement of the chunk granularity produces a decrease in the expected traffic load on the core network.

Theorem 2. Let ν ∈ [C/(MS); 1] and let x be a video chunk split. Assume that

    (d/dτ) Σ_{i=1}^{M} p_i R_i(τ) e^{-p_i R_i(τ) t_C} < 0,  ∀t_C ∈ [t_C^{(1)}; t_C^{(∞)}], ∀τ ∈ [0; 1].    (10)

Then, any video chunk sub-split x′ outperforms x in terms of traffic generated on the core network, i.e., the following holds:

    B_cLRU(x′, ν) < B_cLRU(x, ν).

Numerical experiments suggest that our sufficient condition (10) is very loose, and that it generally holds for realistic popularity distributions and retention rates. It is not satisfied only in pathological cases where the popularity distribution is extremely concentrated around a few popular files and the cache size is very small, close to the size of a single file.
C. Optimal Performance of Chunk-LRU

In this section we focus on the computation of the best performance of chunk-LRU, optimized over the chunk split and the tail drop factor ν. We will utilize it as a benchmark for the performance evaluation of practical chunk-LRU policies in realistic scenarios in Section V-D.

In order to come up with the best performance achievable by chunk-LRU, we need to solve the following optimization problem:

    B*_cLRU = min_{N, x, ν, t_C} B_cLRU(x, ν)    (11)
    s.t. C/S = Σ_{k=1}^{N} Δx_k Σ_{i=1}^{M} (1 − e^{-p_i R_i(x_{k-1}) t_C}),
         C/(MS) ≤ ν ≤ 1,
         0 = x_0 ≤ x_1 ≤ ··· ≤ x_{N-1} ≤ x_N = ν.

It follows from Theorem 2 that, if condition (10) holds, then the bandwidth utilization of any video chunk split x with ν ∈ [C/(MS); 1] is lower bounded by the performance B^∞_cLRU(ν) of the infinitesimal split (say, at the byte level). This greatly simplifies the formulation of (11) into a two-variable constrained optimization problem (see Eq. (12)). Below we formalize this result.
Corollary 3. Assume that condition (10) holds. For any video chunk split x and tail drop factor ν, the traffic performance B_cLRU(x, ν) is lower bounded by the performance B^∞_cLRU of the infinitesimal chunking approach:

    B^∞_cLRU ≤ B_cLRU(x, ν),

where B^∞_cLRU is computed as

    B^∞_cLRU = min_{ν, t_C} Σ_{i=1}^{M} [ ∫_0^ν p_i R_i(x) e^{-p_i R_i(x) t_C} dx + ∫_ν^1 p_i R_i(τ) dτ ]    (12)
    s.t. C/S = Σ_{i=1}^{M} ∫_0^ν (1 − e^{-p_i R_i(x) t_C}) dx,
         C/(MS) ≤ ν ≤ 1.
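Under condition (10), solving (12) reduces to a one-dimensional search over ν, with t_C recovered from the constraint at each candidate. A brute-force sketch (ours; for brevity it assumes a single retention function R shared by all videos, the class-specific R_i of Tab. I being a direct extension):

```python
import numpy as np
from scipy.optimize import brentq

def optimal_infinitesimal_chunk_lru(p, R, C_over_S, M, pts=200, nu_pts=50):
    """Grid search over nu for Eq. (12); R is a vectorized retention function."""
    best = (np.inf, None)
    for nu in np.linspace(C_over_S / M + 1e-6, 1.0, nu_pts):
        xs = np.linspace(0.0, nu, pts)
        rates = p[:, None] * R(xs)[None, :]          # p_i R(x)
        g = lambda t: np.trapz(np.sum(1.0 - np.exp(-rates * t), axis=0), xs) \
            - C_over_S
        t_C = brentq(g, 1e-12, 1e12)                 # constraint of Eq. (12)
        taus = np.linspace(nu, 1.0, pts)
        obj = np.sum(np.trapz(rates * np.exp(-rates * t_C), xs, axis=1)) \
            + np.sum(p * np.trapz(R(taus), taus))
        if obj < best[0]:
            best = (obj, nu)
    return best   # (B_cLRU^infinity, optimal nu)
```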
We stress the fact that B^∞_cLRU is the lowest core network traffic achievable by a chunk-LRU cache management policy.

Thanks to the formulation in (12), we can prove the following two results via standard Lagrangian optimization techniques.

Corollary 4. If R_i is continuous and R_i(1) = 0 for all i ∈ M, then the optimal ν* < 1.

Corollary 5. If R_i(τ) = 1 for all τ ∈ [0; 1] and i ∈ M, then standard LRU (only one chunk for each video and ν = 1) achieves optimal performance.

The former result states that if users never watch videos in their entirety, then it is always optimal to never cache a non-negligible portion of each file, i.e., ν* < 1. The latter claims that, as intuition suggests, if all users watch the whole video then the best chunk-LRU policy is the standard LRU.
D. Performance Evaluation with Real Data

In this section we numerically evaluate the core network traffic performance of the proposed class of chunk-LRU cache management policies. We compare them with the optimal performance B* under full information derived in Section IV. We also take the performance of standard LRU as a second term of comparison. As in Section IV, we consider the audience retention rate scenario shown in Tab. I, estimated from a real YouTube dataset, with the only difference that the file size is assumed to be uniform. We show our results⁹ in Fig. 9. We first notice that, as hinted by Theorem 2, the traffic generated by chunk-LRU decreases as the number N of chunks increases (N = 4, 20). The infinitesimal chunk size approach (N = ∞) is shown to achieve the optimal performance B^∞_cLRU, as claimed in Corollary 3. Notably, chunk-LRU performs close to its optimal performance even with a limited number of chunks (N = 20, or even N = 4). Moreover, a suboptimal value of the tail drop factor ν = 1 still performs close to optimal for N sufficiently high (see Sect. V-E for further details). On the other hand, as expected, standard LRU performs poorly. In fact, the traffic generated by retrieving parts of files that are not requested by the users outweighs the benefits obtained through cache hits, even for medium-size caches. This explains why the traffic generated by LRU can be even higher than the traffic without any cache deployed.

The best tail drop factor ν* = ν*(N) used to produce Fig. 9 is optimized for each value of N and cache size C, as shown in Fig. 10. We notice that ν* is closely related to the average watch-time, since it captures the portion of files with the lowest popularity, which needs to be systematically discarded from the cache. For small cache sizes, simulations show that ν* is lower than the watch-time: in fact, to compensate for the reduced cache size, low values of ν* allow squeezing into the cache a significant number of different, and popular, file headers.
E. Tuning the Chunk-LRU Parameters

Although the optimization of the chunk-LRU parameters is beyond the scope of this paper, we next provide guidelines on how reasonable values could be selected.

a) Choosing the number of chunks: Increasing the number of chunks translates into an increase of the frequency at which the cache content and the associated recency list are updated, as well as an increase of the recency table size. Therefore, the design of the optimal number N of chunks in real systems should capture the trade-off between the actual performance of the policy (for which high values of N are preferable, see Cor. 3) and the required processing/memory resources, which increase with N. Our numerical results in Fig. 9 suggest that even a small number of chunks (around 4), which would result in a low-complexity policy, can achieve reasonably good traffic performance.
b) Choosing the tail drop factor ν: The exact optimal value ν*(N) can be computed by solving the problem in (12) only if all the system parameters, i.e., the file popularities p_i and the retention rates R_i, are known to the cache controller. For comparison purposes¹⁰, we then show in Fig. 9 the performance achieved in the extreme case where the cache manager is agnostic to p_i and R_i and the tail drop parameter ν is blindly set to 1, i.e., no chunks are ever discarded. Remarkably, if the number of chunks is sufficiently high (N = 20 in this case), the loss in performance incurred by such a sub-optimal choice is limited: the fine granularity of the chunk splitting compensates for the loss incurred by setting ν = 1.

⁹The traffic performance is normalized w.r.t. the traffic B_nc generated when no cache is present, as in Section IV. The chunk-LRU policies have chunks of equal size.
¹⁰If the full information assumption holds, then using chunk-LRU would be highly suboptimal, since the theoretically optimal solution computed in Section IV can actually be implemented.

Figure 9: Normalized core network traffic generated by chunk-LRU vs. the theoretical optimum B* and vs. standard LRU. The optimal ν* = ν*(N) is computed for each value of N and cache size C, as depicted in Fig. 10. We also evaluate the performance achieved when the sub-optimal value ν = 1 is utilized. The video popularity distribution follows a Zipf law with parameter 0.8 [17].

Figure 10: Optimal tail drop factor ν* for different numbers of chunks N = 4, 20, ∞. We notice that the optimal ν*(N) is within a neighborhood of the average watch-time of 0.61.
Remark 2. We claim that a reasonable choice of ν (< 1) can still be made in realistic scenarios, based on an estimation of the parameters p_i, R_i. First of all, the optimal ν* is not strictly a function of the popularity of each individual video, but only of the rank-dependent popularity p_i of the i-th most popular video, for each i. It has been shown [17] that such a rank-popularity relation depends on the class of traffic and is slowly varying over time, hence it is easily predictable. Secondly, we argue that the video retention rate functions R_i vary on a much slower time scale than that of video popularity, which greatly facilitates their estimation.
VI. CONCLUSIONS

In this paper we investigated the potential of partial caching towards minimizing core network traffic. Our numerical results based on real YouTube access data reveal that big caches benefit the most from such strategies, namely up to 50% over the classic approach of storing the most popular files. Interestingly, partial caching is beneficial even when the actual popularity of videos is not known. In this case, practical chunk-based LRU strategies which never cache the tail of videos were shown to perform well as long as a sufficient number of chunks is used.

The introduction of the audience retention rate in caching decisions opens up interesting research directions. The retention rate is generally available in online video distribution systems and evolves very slowly over time. Thus, it can be used to decompose the problems of file popularity estimation and optimal chunking without loss of optimality. In this context, the generalization of existing caching mechanisms so as to optimally exploit the benefits of partial caching is an interesting topic for future study.
VII. APPENDIX

A. Waterfilling Algorithm

Algorithm to compute η*

Step 1 (Initialization): Let k := 0, C(0) := C, M(0) := M, M^µ_a := ∅, M^µ_b := ∅. Define R̃″_i : R → R as a strictly decreasing extension of R″_i = p_i R′_i over the whole real axis, i.e., R̃″_i(τ) = p_i R′_i(τ) for all τ ∈ [0; 1], and R̃″_i is strictly decreasing over R.
Step 2: Compute µ(k) via the equation Σ_{i∈M(k)} S_i [R̃″_i]^{-1}(µ(k)) = C(k). Compute the sets M^{µ(k)}_a = {i : [R̃″_i]^{-1}(µ(k)) < 0}, M^{µ(k)}_b = {i : [R̃″_i]^{-1}(µ(k)) > 1}, and M^{µ(k)} = {i : 0 ≤ [R̃″_i]^{-1}(µ(k)) ≤ 1}. Compute δ(µ(k)) = Σ_{i∈M^{µ(k)}_b} S_i + Σ_{i∈M^{µ(k)}} S_i [R″_i]^{-1}(µ(k)) − C(k).
Step 3: If δ(µ(k)) = 0 or M^{µ(k)} = ∅, then set µ* := µ(k), M^µ_a := M^µ_a ∪ M^{µ(k)}_a, M^µ_b := M^µ_b ∪ M^{µ(k)}_b, M^µ := M^{µ(k)}, and go to Step 6. Else, if δ(µ(k)) > 0, go to Step 4. Else, if δ(µ(k)) < 0, go to Step 5.
Step 4: Set η*_i := 0 for all i ∈ M^{µ(k)}_a. Set C(k+1) := C(k). Compute M(k+1) := M(k) \ M^{µ(k)}_a, M^µ_a := M^µ_a ∪ M^{µ(k)}_a, k := k + 1. Go to Step 2.
Step 5: Set η*_i := 1 for all i ∈ M^{µ(k)}_b. Compute C(k+1) := C(k) − Σ_{i∈M^{µ(k)}_b} S_i, M(k+1) := M(k) \ M^{µ(k)}_b, M^µ_b := M^µ_b ∪ M^{µ(k)}_b, k := k + 1. Go to Step 2.
Step 6: Set η*_i := 0 for all i ∈ M^µ_a; η*_i := 1 for all i ∈ M^µ_b; η*_i := [R̃″_i]^{-1}(µ*) for all i ∈ M^µ. Stop.
B. Proof of Theorem 1

Proof. As a first step, let us define f_i : [0; 1] → [0; 1] as a one-to-one function such that the permuted audience retention rate function R′_i(τ) := R_i(f_i^{-1}(τ)) is non-increasing. The function f_i is a permutation that orders the video parts in order of decreasing popularity, such that f_i(τ) < f_i(τ′) if and only if R_i(τ) > R_i(τ′)¹¹; R′_i is the outcome of such a permutation. As a second step, we reformulate the optimization problem in (3) as

    Y* = argmax_Y Σ_{i∈M} S_i ∫_{Y_i} p_i R_i(τ) dτ    (13)
    s.t. Σ_{i∈M} S_i ∫_{Y_i} 1 dτ = C,
         Y_i ⊆ [0; 1].

We can recast the bandwidth saving optimization problem in (13) in terms of the permuted retention rates R′_i, by considering only right-intervals of 0 of the kind Y_i = [0; η_i], as follows:

    max_{η∈R^M} Σ_{i∈M} p_i S_i ∫_0^{η_i} R′_i(τ) dτ    (14)
    s.t. Σ_{i∈M} η_i S_i = C,
         η_i ∈ [0; 1].

In fact, it is not profitable to consider a larger search domain, e.g., more complicated subsets Y_i of [0; 1]: for any collection of subsets Y it is possible to replace each Y_i with the interval [0; |Y_i|] with an increase of the objective function, while feasibility is still preserved. We can further simplify (14) by defining the function R″_i(τ) := p_i R′_i(τ), as follows:

    min_{η∈R^M} Σ_{i∈M} ∫_0^{η_i} −R″_i(τ) dτ    (15)
    s.t. Σ_{i∈M} η_i S_i = C,
         η_i ∈ [0; 1].

We notice that (d/dη_i) ∫_0^{η_i} −R″_i(τ) dτ = −p_i R′_i(η_i), which is non-decreasing in η_i. Thus we recognize in (15) a convex optimization problem with linear and box constraints, where the objective function is separable in the optimization variables η. It is known that such problems can be solved via a classic water-filling technique (see [19], Chapter 6): more specifically, there exists a positive "water level" µ such that the optimal portions η*(µ) can be computed as

    η*_i(µ) = 1                    if min_{τ∈[0;1]} R″_i(τ) ≥ µ,
              0                    if max_{τ∈[0;1]} R″_i(τ) ≤ µ,
              [R″_i]^{-1}(µ)       else,
    with Σ_{i∈M} S_i η*_i(µ) = C.    (16)

By rewriting (16) in terms of R′_i we obtain the expressions

    η*_i = 1                       if p_i min_{τ∈[0;1]} R′_i(τ) ≥ µ,
           0                       if p_i max_{τ∈[0;1]} R′_i(τ) ≤ µ,
           [R′_i]^{-1}(µ/p_i)      else,
    with Σ_{i∈M} S_i |Y*_i| = C,

and we can finally claim that

    Y*_i = f_i^{-1}([0; η*_i]) = {τ : p_i R_i(τ) ≥ µ}  ∀i ∈ M.

The thesis follows.

¹¹We notice that such an f_i always exists, even though it is not unique, since it can arbitrarily break ties among equally popular parts of a single video; in general it is discontinuous.
C. Proof of Corollary 1

Proof. Since R_i is already strictly decreasing, we can consider f_i(τ) = τ and R′_i = R_i. Moreover, in this case min_{τ∈[0;1]} R_i(τ) = R_i(1) and max_{τ∈[0;1]} R_i(τ) = R_i(0) = 1. The thesis easily follows.
D. Proof of Corollary 2

Proof. Define

    R̃_i^{-1}(τ) = −(1/λ_i) ln( τ(1 − e^{-λ_i}) + e^{-λ_i} ).

We notice that R̃_i^{-1}(µ/p_i) = R_i^{-1}(µ/p_i) when 0 < µ ≤ p_i, and that R̃_i^{-1}(µ/p_i) < 0 whenever µ > p_i. Then, we can rewrite (5) as

    η*_i = [ R̃_i^{-1}(µ/p_i) ]^+,
    Σ_{i∈M} S_i η*_i = C.

The thesis easily follows.
E. Proof of Theorem 2

Proof. Let us first introduce the function

    ξ^{(t_C)}(τ) = Σ_{i=1}^{M} p_i R_i(τ) e^{-p_i R_i(τ) t_C}.

We then denote by I(f)|_x, where f is a continuous function defined over R, the integral approximation of f via Riemann sums of the type:

    I(f)|_x = Σ_{k=1}^{N} f(x_{k-1}) Δx_k.

We notice that, if f is increasing (decreasing), then I(f)|_x < (>) I(f)|_{x′} for any sub-splitting x′. We can now rewrite B_cLRU(x, ν) as (compare with (9))

    B_cLRU(x, ν) = I(ξ^{(t_C)})|_x
    s.t. Mν − C/S = I(h^{(t_C)})|_x,

where h^{(t_C)}(τ) = Σ_{i=1}^{M} e^{-p_i R_i(τ) t_C}. Since h^{(t_C)}(τ) is increasing in τ, it easily follows from an induction argument that the characteristic time of any chunk splitting is found within [t_C^{(1)}; t_C^{(∞)}].

Consider now a sub-splitting x′ with associated characteristic time t′_C. Since h^{(t_C)}(τ) is increasing, then I(h^{(t_C)})|_{x′} > I(h^{(t_C)})|_x. Also, since I(h^{(t′_C)})|_{x′} = I(h^{(t_C)})|_x and h^{(t)}(τ) is decreasing in t, then t′_C > t_C. We then have

    B_cLRU(x, ν) = I(ξ^{(t_C)})|_x > I(ξ^{(t′_C)})|_x > I(ξ^{(t′_C)})|_{x′} = B_cLRU(x′, ν),

where the second inequality follows from the fact that ξ^{(t)}(τ) is decreasing in τ for any value t of the characteristic time. The thesis is proven.
F. Proof of Corollary 4

Proof. The derivative with respect to ν of the objective function in (12), along the direction in which the constraint remains satisfied, writes

    q(ν) = −Σ_{i=1}^{M} (1 − e^{-p_i R_i(ν) t_C}) p_i R_i(ν) + (A/B) Σ_{i=1}^{M} (1 − e^{-p_i R_i(ν) t_C}),    (17)

where

    A = ∫_0^ν Σ_{i=1}^{M} p_i² R_i²(τ) e^{-p_i R_i(τ) t_C} dτ > 0,
    B = ∫_0^ν Σ_{i=1}^{M} p_i R_i(τ) e^{-p_i R_i(τ) t_C} dτ > 0.

Let us calculate q(1−). Since R_i is continuous with R_i(1) = 0, a first-order expansion of 1 − e^{-p_i R_i(ν) t_C} around ν = 1 yields

    q(ν) ≈ t_C (1 − ν) [ (A/B) Σ_{i=1}^{M} p_i |R′_i(1)| − t_C (1 − ν) Σ_{i=1}^{M} p_i² |R′_i(1)|² ],

with R′_i denoting the derivative of R_i. Since A > 0 and B > 0, the leading term is positive; hence q(1−) > 0 and the thesis is proven.
G. Proof of Corollary 5

Proof. We first observe that, if R_i(τ) = 1, then for all ν we have B_cLRU([0; ν], ν) = B_cLRU(x, ν) for any chunk splitting x. Then it suffices to prove that q(ν) < 0 holds for all ν ∈ (0; 1), i.e., (multiplying q(ν) by B > 0 and setting R_i ≡ 1) that the following inequality holds:

    [Σ_{i=1}^{M} p_i² e^{-p_i t_C}] [Σ_{i=1}^{M} (1 − e^{-p_i t_C})] − [Σ_{i=1}^{M} p_i (1 − e^{-p_i t_C})] [Σ_{i=1}^{M} p_i e^{-p_i t_C}] < 0.
REFERENCES

[1] J. Roberts and N. Sbihi, "Exploring the memory-bandwidth tradeoff in an information-centric network," in Proc. of ITC, 2013, pp. 1–9.
[2] "Cisco visual networking index: Forecast and methodology, 2014–2019," http://www.cisco.com/c/en/us/solutions/collateral/service-provider/ip-ngn-ip-next-generation-network/white_paper_c11-481360.html.
[3] K. W. Hwang, D. Applegate, A. Archer, V. Gopalakrishnan, S. Lee, V. Misra, K. K. Ramakrishnan, and D. F. Swayne, "Leveraging video viewing patterns for optimal content placement," in Proc. of IFIP Networking, 2012, pp. 44–58.
[4] http://support.google.com/youtube/answer/1715160?hl=en-GB.
[5] M. Zeni, D. Miorandi, and F. De Pellegrini, "YOUStatAnalyzer: a tool for analysing the dynamics of YouTube content popularity," in Proc. of VALUETOOLS '13. ICST, 2013, pp. 286–289.
[6] S. Sen, J. Rexford, and D. Towsley, "Proxy prefix caching for multimedia streams," in Proc. of IEEE INFOCOM '99, vol. 3, Mar 1999, pp. 1310–1319.
[7] K.-L. Wu, P. Yu, and J. Wolf, "Segmentation of multimedia streams for proxy caching," IEEE Transactions on Multimedia, vol. 6, no. 5, pp. 770–780, Oct 2004.
[8] L. Wang, S. Bayhan, and J. Kangasharju, "Optimal chunking and partial caching in information-centric networks," Computer Communications, vol. 61, pp. 48–57, 2015.
[9] J. Yu, C. T. Chou, Z. Yang, X. Du, and T. Wang, "A dynamic caching algorithm based on internal popularity distribution of streaming media," Multimedia Systems, vol. 12, no. 2, pp. 135–149, 2006.
[10] K. Agrawal, T. Venkatesh, and D. Medhi, "A dynamic popularity-based partial caching scheme for video on demand service in IPTV networks," in Proc. of COMSNETS '14, Jan 2014, pp. 1–8.
[11] S.-H. Lim, Y.-B. Ko, G.-H. Jung, J. Kim, and M.-W. Jang, "Inter-chunk popularity-based edge-first caching in content-centric networking," IEEE Communications Letters, vol. 18, no. 8, pp. 1331–1334, Aug 2014.
[12] S. Chen, H. Wang, X. Zhang, B. Shen, and S. Wee, "Segment-based proxy caching for Internet streaming media delivery," IEEE Multimedia, vol. 12, no. 3, pp. 59–67, 2005.
[13] M. Hefeeda and O. Saleh, "Traffic modeling and proportional partial caching for peer-to-peer systems," IEEE/ACM Transactions on Networking, vol. 16, no. 6, pp. 1447–1460, Dec 2008.
[14] U. Devi, R. Polavarapu, M. Chetlur, and S. Kalyanaraman, "On the partial caching of streaming video," in Proc. of IEEE IWQoS, June 2012, pp. 1–9.
[15] H. Che, Y. Tung, and Z. Wang, "Hierarchical web caching systems: Modeling, design and experimental results," IEEE Journal on Selected Areas in Communications, vol. 20, no. 7, pp. 1305–1314, 2002.
[16] W. S. Cleveland, "Robust locally weighted regression and smoothing scatterplots," Journal of the American Statistical Association, vol. 74, no. 368, pp. 829–836, 1979.
[17] C. Fricker, P. Robert, and J. Roberts, "A versatile and accurate approximation for LRU cache performance," in Proc. of the 24th International Teletraffic Congress (ITC 24), Sept 2012, pp. 1–8.
[18] http://wistia.com/doc/audience-engagement-graph.
[19] S. M. Stefanov, Separable Programming: Theory and Methods. Springer Science & Business Media, 2013, vol. 53.
... One aspect that has not received sufficient attention is the fact that videos are often not viewed in their entirety. Empirical studies have shown that on average, only 60% of a video is watched by viewers [8], [9]. A direct consequence of this is that not all parts of a video are equally popular and this motivates dividing videos into chunks/segments for caching [8], [10]- [13]. ...
... Empirical studies have shown that on average, only 60% of a video is watched by viewers [8], [9]. A direct consequence of this is that not all parts of a video are equally popular and this motivates dividing videos into chunks/segments for caching [8], [10]- [13]. ...
... Several empirical studies have found that most users do not watch videos in their entirety [8], [9], [11], [14], [15]. In [8], the authors state that, on average, only 60% of a video is watched by viewers. ...
Conference Paper
Numerous empirical studies have shown that users of video-on-demand platforms do not always watch videos in their entirety. A direct consequence of this is that not all parts of a video are equally popular. Motivated by this, we explore the benefits of dividing files into smaller segments for caching. We treat incoming requests as requests for segments of files and propose a Markovian request model which captures the time-correlation in requests. We characterize the fundamental limit on the performance of caching policies which only cache full files. Next, we propose and analyze the performance of policies which cache partial files. Using this, we characterize the potential for improvement in performance due to caching partial files and analyze its dependence on various system parameters like cache size and the popularity profile of the files being cached.
... For efficient caching and delivery, this nonuniform viewing behaviour calls for partial caching, where only the most viewed portion of each video file is cached. Audience retention rate aware partial caching is shown to improve the performance of uncoded caching in [15]. ...
... To model this, we employ the notion of audience retention rate, defined as the fraction of users that request chunk W ij among all the users that have requested W i , denoted by p ij , for i ∈ [N ] and j ∈ [B] [15]. Alternatively, we can regard p ij as the probability that a user who requested video W i will watch the jth chunk 3 . ...
... The detailed proof can be found in Appendix A. (1 − q ij )F/B bits of chunk W ij if it is requested. We note that for any demand combination, the Uncoded scheme sends the same number of bits as the RAN delivery scheme, for the placement scheme described in Section III-A, which results in the same average delivery rate given in (15). ...
Preprint
Full-text available
Most results on coded caching focus on a static scenario, in which a fixed number of users synchronously place their requests from a content library, and the performance is measured in terms of the latency in satisfying all of these demands. In practice, however, users start watching an online video content asynchronously over time, and often abort watching a video before it is completed. The latter behaviour is captured by the notion of audience retention rate, which measures the portion of a video content watched on average. In order to bring coded caching one step closer to practice, asynchronous user demands are considered in this paper, by allowing user demands to arrive randomly over time, and both the popularity of video files, and the audience retention rates are taken into account. A decentralized partial coded caching (PCC) scheme is proposed, together with two cache allocation schemes; namely the optimal cache allocation (OCA) and the popularity-based cache allocation (PCA), which allocate users' caches among different chunks of the video files in the library. Numerical results validate that the proposed PCC scheme, either with OCA or PCA, outperforms conventional uncoded caching as well as the state-of-the-art decentralized caching schemes, which consider only the file popularities, and are designed for synchronous demand arrivals. An information-theoretical lower bound on the average delivery rate is also presented.
... We denote with w_n the utility obtained when file n is requested and found in the cache (also called a hit). This file-dependent utility can be used to model bandwidth savings from cache hits [25], QoS improvement [24], or any other cache-related benefit. We will also be concerned with the special case w_n = w for all n ∈ N, i.e., cache hit ratio maximization. ...
... Why does caching of file fractions make sense? Large video files are composed of chunks stored independently; see the literature on partial caching [25]. Also, the fractional variables may represent caching probabilities [2], [26], or coded equations of chunks [24]. ...
Preprint
Full-text available
This paper introduces a novel caching analysis that, contrary to prior work, makes no modeling assumptions for the file request sequence. We cast the caching problem in the framework of Online Linear Optimization (OLO), and introduce a class of minimum regret caching policies, which minimize the losses with respect to the best static configuration in hindsight when the request model is unknown. These policies are very important since they are robust to popularity deviations in the sense that they learn to adjust their caching decisions when the popularity model changes. We first prove a novel lower bound for the regret of any caching policy, improving existing OLO bounds for our setting. Then we show that the Online Gradient Ascent (OGA) policy guarantees a regret that matches the lower bound, hence it is universally optimal. Finally, we shift our attention to a network of caches arranged to form a bipartite graph, and show that the Bipartite Subgradient Algorithm (BSA) has no regret.
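The fractional-caching view in the excerpts above turns hit-utility maximization with known request rates into a linear program: maximize the sum of λ_n w_n x_n subject to Σ_n s_n x_n ≤ C and 0 ≤ x_n ≤ 1. Its optimum is the classical greedy-by-density fractional-knapsack solution, sketched below under assumed per-file request rates λ_n and sizes s_n (these symbols are not part of the quoted excerpt):

```python
def fractional_cache(rates, utilities, sizes, capacity):
    """Maximize sum(rates[n] * utilities[n] * x[n])
    s.t. sum(sizes[n] * x[n]) <= capacity and 0 <= x[n] <= 1.
    Greedy by utility density is optimal for this LP
    (the classical fractional knapsack)."""
    n = len(rates)
    by_density = sorted(range(n),
                        key=lambda i: rates[i] * utilities[i] / sizes[i],
                        reverse=True)
    x, room = [0.0] * n, capacity
    for i in by_density:
        take = min(1.0, room / sizes[i])   # fill fully, or partially at the end
        x[i] = take
        room -= take * sizes[i]
        if room <= 0:
            break
    return x
```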
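As a rough illustration of the OGA policy in the abstract above: update the fractional configuration along the reward gradient of the current request, then project back onto the feasible set {x : 0 ≤ x_n ≤ 1, Σ_n x_n = C}. The step size, the linear reward model, and the bisection-based projection below are illustrative choices, not the paper's exact construction.

```python
def project(x, capacity, lo=0.0, hi=1.0, iters=60):
    """Euclidean projection onto {y : lo <= y_i <= hi, sum(y) = capacity}.
    The solution has the form y_i = clip(x_i - tau); tau is found by
    bisection, since the total occupancy is non-increasing in tau."""
    clip = lambda v: max(lo, min(hi, v))
    a, b = min(x) - hi, max(x) - lo          # bracket for tau
    for _ in range(iters):
        tau = (a + b) / 2
        if sum(clip(v - tau) for v in x) > capacity:
            a = tau                          # too full: raise tau
        else:
            b = tau
    return [clip(v - tau) for v in x]

def oga_step(x, requested, weight, capacity, eta=0.1):
    """One online-gradient-ascent update: the reward at time t is
    weight * x[requested], so the gradient is `weight` on that coordinate."""
    x = list(x)
    x[requested] += eta * weight
    return project(x, capacity)
```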
... The average percentage viewed continues dropping for longer videos (45% for 20-minute videos). Maggi et al. (2018) also report that the average video watch-time is less than 70%, and that the longer the video, the shorter its watch-time. ...
Article
Full-text available
This article explores video-viewing behavior when videos are wrapped in interactive content, in the case of iOtok, a 13-episode web documentary series. The interaction and viewing data were collected over a period of one year, providing a dataset of more than 12,200 total video views by 6000 users. Standard metrics (video views, percentage viewed, number of sessions) show higher active participation for registered users compared to unregistered users. Results also indicate that serialization over multiple weeks is an effective strategy for audience building over a long period of time without negatively affecting video views. In the viewing behavior analysis, we focused on three perspectives: (i) regularity (watching on a weekly basis or not), (ii) intensity (number of videos per session), and (iii) order of watching. We performed per-perspective and combined analyses involving manual coding techniques, rule-based algorithms, and k-means clustering to reveal different user profiles (intermittent, exemplary, detached, enthusiastic users, and nibblers) and to highlight further differences in viewing behavior (e.g., post-series users binge-watched more than concurrent users during the first 13 weeks, while the series was being released weekly). We discuss how these results can be used to inform the design and promotion of future web documentaries.
... Along with the most popular content, cached only in a few SBSs, the joint caching and scheduling algorithm also caches less popular content, which, when delivered to the users, improves the reconstruction quality of the views. It is worth noting that this range of capacity values is of great practical interest, as SBSs are typically assumed to cache only 5-10% of the total video catalogue [13], [14]. The performance of all the algorithms becomes limited by the insufficient transmission capacity of the network. ...
Conference Paper
The emergence of novel interactive multimedia applications with high data rate and low latency requirements has led to a drastic increase in the video data traffic over wireless cellular networks. Endowing the small base stations of a macro-cell with caches that can store some of the content is a promising technology to cope with the increasing pressure on the backhaul connections, and to reduce the delay for demanding video applications. In this work, delivery of an interactive multiview video over a heterogeneous cellular network is studied. Unlike existing works that focus on the optimization of the delivery delay and ignore the video characteristics, the caching and scheduling policies are jointly optimized, taking into account the quality of the delivered video and the video delivery time constraints. We formulate our joint caching and scheduling problem via submodular set function maximization and propose efficient greedy approaches to find a well-performing joint caching and scheduling policy. Numerical evaluations show that our solution significantly outperforms benchmark algorithms based on popularity caching and independent scheduling.
... Along with the most popular content, cached only in a few SBSs, the joint caching and scheduling algorithm also caches the less popular content, which, when delivered to the users, improves the reconstruction quality of the views. It is worth noting that this range of capacity values is of great practical interest, as SBSs are typically assumed to cache only 5-10% of the total video catalogue [19], [20]. The performance of all the algorithms becomes limited by the insufficient transmission capacity of the network. ...
Article
The emergence of novel interactive multimedia applications with high rate and low latency requirements has led to a drastic increase in the video data traffic over wireless cellular networks. Endowing the small base stations of a macro-cell with caches that can store some of the content is a promising technology to cope with the increasing pressure on the backhaul connections, and to reduce the delay for demanding video applications. In this work, delivery of an interactive multiview video to a set of wireless users is studied in a heterogeneous cellular network. Unlike existing works that focus on the optimization of the delivery delay and ignore the video characteristics, the caching and scheduling policies are jointly optimized, taking into account the quality of the delivered video and the video delivery time constraints. We formulate our joint caching and scheduling problem as the minimization of the average expected video distortion, and show that this problem is NP-hard. We then provide an equivalent formulation based on submodular set function maximization and propose a greedy solution with a (1/2)(1 - 1/e) approximation guarantee. The evaluation of the proposed joint caching and scheduling policy shows that it significantly outperforms benchmark algorithms based on popularity caching and independent scheduling. Another important contribution of this paper is a new constant approximation ratio for greedy submodular set function maximization subject to a d-dimensional knapsack constraint.
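The flavor of such greedy solutions can be conveyed with a generic density-greedy sketch for monotone submodular maximization under a knapsack (budget) constraint, applied here to a toy weighted-coverage objective. This is a stand-in illustration, not the paper's algorithm; in particular, the cited (1/2)(1 - 1/e) guarantee is obtained by taking the better of the density-greedy output and the best single feasible element.

```python
def greedy_knapsack(ground, cost, budget, marginal_gain):
    """Density greedy for monotone submodular maximization under a budget:
    repeatedly add the feasible element with the best marginal gain per
    unit cost, until no element improves the objective within budget."""
    chosen, spent = set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for e in ground - chosen:
            if spent + cost[e] <= budget:
                ratio = marginal_gain(chosen, e) / cost[e]
                if ratio > best_ratio:
                    best, best_ratio = e, ratio
        if best is None:
            return chosen
        chosen.add(best)
        spent += cost[best]

# toy weighted-coverage objective (monotone submodular): each cacheable
# item "covers" a set of users, and each user has a utility weight
covers = {"a": {1, 2}, "b": {2, 3, 4}, "c": {4, 5}}
weight = {1: 3.0, 2: 1.0, 3: 2.0, 4: 2.0, 5: 1.0}

def gain(chosen, e):
    covered = set().union(*(covers[s] for s in chosen)) if chosen else set()
    return sum(weight[u] for u in covers[e] - covered)

picked = greedy_knapsack(set(covers), {"a": 1.0, "b": 2.0, "c": 1.0},
                         budget=3.0, marginal_gain=gain)
```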
Article
Given the rapid growth of user-generated videos, internet traffic has been heavily dominated by online video streaming. Caching videos on edge servers in close proximity to users has been an effective approach to reduce the backbone traffic and the request response time, as well as to improve the video quality on the user side. Video popularity, however, can be highly dynamic over time. The cost of cache replacement at edge servers, particularly that related to service interruption during replacement, is not yet well understood. This paper presents a novel lightweight video caching algorithm for edge servers, seeking to optimize the hit rate with real-time decisions and minimized cost. Inspired by recent advances in deep Q-learning, our DQN-based online video caching (DQN-OVC) makes effective use of the rich and readily available information from users and networks. We decompose the Q-value function as a product of the video value function and the action function, which significantly reduces the state space. We instantiate the action function for cost-aware caching decisions with low complexity so that the cached videos can be updated continuously and instantly with dynamic video popularity. We used video traces from Tencent, one of the largest online video providers in China, to evaluate the performance of our DQN-OVC and to compare it with state-of-the-art solutions. The results demonstrate that DQN-OVC significantly outperforms the baseline algorithms in the edge caching context.
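As a loud simplification of the deep Q-learning machinery described in this abstract, the sketch below runs tabular Q-learning on an admit/skip decision at each cache miss. The state encoding, per-action rewards, and all hyperparameters are hypothetical; a real DQN-OVC-style system would replace the table with a neural network and the toy reward with measured hits and replacement costs.

```python
import random
from collections import defaultdict

def q_admission(transitions, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    """Tabular Q-learning for an admit(1)/skip(0) decision on each cache
    miss. Each transition is (state, rewards, next_state), where `state`
    is a coarse feature of the missed video (e.g. a popularity bucket) and
    `rewards[a]` is the observed payoff of action a (later hits minus a
    replacement cost). All of this is a hypothetical toy stand-in for the
    paper's decomposed deep Q-network."""
    rng = random.Random(seed)
    Q = defaultdict(float)                      # Q[(state, action)]
    for state, rewards, nxt in transitions:
        a = rng.choice((0, 1)) if rng.random() < eps else \
            max((0, 1), key=lambda act: Q[(state, act)])
        target = rewards[a] + gamma * max(Q[(nxt, 0)], Q[(nxt, 1)])
        Q[(state, a)] += alpha * (target - Q[(state, a)])
    return Q
```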
Conference Paper
There exist many aspects involved in a video turning viral on YouTube. These include properties of the video such as the attractiveness of its title and thumbnail, the recommendation policy of YouTube, marketing and advertising policies, and the influence that the video's creator or owner has in social networks. In this work, we study the audience retention measures provided by YouTube to video creators, which may provide valuable information for improving videos and for better understanding viewers' potential interest in them. We then study the question of when a video is too long and could gain from being shortened. We examine the consistency between several existing audience retention measures. We end with a proposal for a new audience retention measure and identify its advantages.
Conference Paper
Full-text available
Understanding the dynamics of on-line content popularity is an active research field with applications in sectors as diverse as media advertising, content replication and caching, and on-line marketing. In most cases, scientists have focused on user-generated content, which is freely accessible through different on-line services. Among such services, the incumbent one is YouTube. This online platform was launched in 2005 and currently features more than 6 billion hours of video watched every month (almost one hour per person on Earth), with more than 100 hours of video uploaded every minute and 1 billion unique users per month. In order to analyze or predict content popularity, statistics about viewers, watch time and shares must be retrieved. The YouTube APIs, however, do not allow third parties to retrieve such information in an open and accessible way. To overcome this problem, we have developed a framework, based on Web scraping techniques and big data tools, for the collection and analysis of YouTube video content popularity at scale. Our framework, called YOUStatAnalyzer, enables researchers to create their own datasets according to a number of different search criteria, and to analyse them to extract relevant features and significant statistics.
Conference Paper
Full-text available
As IP becomes the predominant choice for video delivery, storing the ever-increasing number of videos for delivery will become a challenge. In this paper we focus on how to take advantage of user viewing patterns to place content in provider networks so as to reduce their storage and network utilization. We first characterize user viewing behavior using data collected from a nationally deployed Video-on-Demand service. We show that users watch only a small portion of videos (not just short clips, but even full-length movies). We use this information and a highly flexible Mixed Integer Programming (MIP) formulation to solve the placement problem, in contrast to traditional popularity-based placement and caching strategies. We perform detailed simulations using real traces of user viewing sessions (including stream control operations such as Pause, Skip, etc.). Our results show that segment-based placement yields substantial savings both in storage and in network bandwidth. For example, compared to a simple caching scheme using full videos, our MIP-based placement using segments can achieve up to a 71% reduction in peak link bandwidth usage.
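A drastically reduced version of such a placement formulation is a 0/1 knapsack over segments: cache the segments that save the most backhaul bytes subject to a storage budget. The sketch below uses the pulp modeling library as an assumed tool and omits everything network-specific (link constraints, multiple caches, session traces) that the paper's MIP includes.

```python
import pulp

def place_segments(demand_bytes, sizes, storage):
    """0/1 placement MIP: maximize backhaul bytes saved by cached segments
    subject to a storage budget; a heavily simplified, single-cache
    stand-in for the paper's placement formulation."""
    segs = range(len(sizes))
    prob = pulp.LpProblem("segment_placement", pulp.LpMaximize)
    x = pulp.LpVariable.dicts("x", segs, cat="Binary")
    prob += pulp.lpSum(demand_bytes[i] * x[i] for i in segs)   # objective
    prob += pulp.lpSum(sizes[i] * x[i] for i in segs) <= storage
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [i for i in segs if x[i].value() == 1]
```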
Article
Full-text available
In a 2002 paper, Che and co-authors proposed a simple approach for estimating the hit rates of a cache operating the least recently used (LRU) replacement policy. The approximation proves remarkably accurate and is applicable to quite general distributions of object popularity. This paper provides a mathematical explanation for the success of the approximation, notably in configurations where the intuitive arguments of Che, et al clearly do not apply. The approximation is particularly useful in evaluating the performance of current proposals for an information centric network where other approaches fail due to the very large populations of cacheable objects to be taken into account and to their complex popularity law, resulting from the mix of different content types and the filtering effect induced by the lower layers in a cache hierarchy.
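The approximation itself is easy to compute: under the independent reference model with per-object request rates λ_n, the occupancy constraint Σ_n (1 − e^(−λ_n T)) = C pins down the characteristic time T, and the hit probability of object n is then h_n = 1 − e^(−λ_n T). The sketch below solves for T by bisection; the Zipf parameters are illustrative.

```python
import math

def che_hit_rates(rates, cache_size, iters=100):
    """Che's approximation for LRU: find the characteristic time T solving
    sum_n (1 - exp(-rates[n] * T)) = cache_size, then return the per-object
    hit probabilities h_n = 1 - exp(-rates[n] * T)."""
    assert cache_size < len(rates)       # otherwise everything fits
    occupancy = lambda T: sum(1 - math.exp(-r * T) for r in rates)
    lo, hi = 0.0, 1.0
    while occupancy(hi) < cache_size:    # bracket the root
        hi *= 2
    for _ in range(iters):               # bisection on the monotone occupancy
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if occupancy(mid) < cache_size else (lo, mid)
    T = (lo + hi) / 2
    return [1 - math.exp(-r * T) for r in rates]

# Zipf(0.8) popularity over 10^4 objects, cache holding 10^3 of them
rates = [1.0 / k ** 0.8 for k in range(1, 10001)]
h = che_hit_rates(rates, cache_size=1000)
overall_hit_ratio = sum(r * hn for r, hn in zip(rates, h)) / sum(rates)
```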
Conference Paper
Caching video objects closer to the users in the delivery of on-demand video services in IPTV networks reduces the load on the network and improves the latency of video delivery. Partial caching of video objects is attractive due to the space constraints of the cache, and also because some parts of a video may be more popular than others. However, fixed segment-based caching of videos does not take into account the changing popularity of the segments or changes in the viewing patterns of the users. In this work, we propose a partial caching strategy that considers the changes in the popularity of segments over time and the access patterns of the users to compute the utility of the objects in the cache. We also propose to partition the cache to prevent popular objects (which may be accessed only at long intervals) from being evicted by unpopular ones that arrive at a higher short-term frequency. We measured the popularity distribution and the ageing of popularity from two online datasets and use the resulting parameters in simulations. Our simulation results show that the proposed caching scheme improves the byte hit ratio compared to LRU caching, both for static and dynamic object pools and under ageing of popularity.
Article
Caching is widely used to reduce network traffic and improve user experience. Traditionally caches store complete objects, but video files and the recent emergence of information-centric networking have highlighted a need for understanding how partial caching could be beneficial. In partial caching, objects are divided into chunks which are cached either independently or by exploiting common properties of chunks of the same file. In this paper, we identify why partial caching is beneficial, and propose a way to quantify the benefit. We develop an optimal n-chunking algorithm for an s-byte file and compare it with near-optimal homogeneous chunking. Our analytical results and comparison lead to the surprising conclusion that neither sophisticated partial caching algorithms nor high-complexity optimal chunking are needed in information-centric networks. Instead, a simple utility-based in-network caching algorithm and low-complexity homogeneous chunking are sufficient to achieve most of the benefits of partial caching.
Article
Content-centric networking (CCN) is considered promising for the efficient support of ever-increasing streaming multimedia services. Inter-chunk popularity-based caching is a key requirement for CCN multimedia services, because some chunks of a content file tend to be requested more frequently than others. For multimedia content, the early chunks often have higher popularity than later ones, as users may interrupt and abort playback before the content finishes. This paper presents a novel cache replication scheme, which places the more popular chunks ahead on the edge router and establishes cache pipelining on the relaying routers along the path to reduce user-perceived delay. Simulation results show that the proposed scheme incurs less delay and reduces the overall redundant network traffic while guaranteeing a higher cache hit ratio.
Article
An information-centric network should realize significant economies by exploiting a favourable memory-bandwidth tradeoff: it is cheaper to store copies of popular content close to users than to fetch them repeatedly over the Internet. We evaluate this tradeoff for some simple cache network structures under realistic assumptions concerning the size of the content catalogue and its popularity distribution. Derived cost formulas reveal the relative impact of various cost, traffic and capacity parameters, allowing an appraisal of possible future network architectures. Our results suggest it probably makes more sense to envisage the future Internet as a loosely interconnected set of local data centers than a network like today's with routers augmented by limited capacity content stores.
Article
Video objects are much larger in size than traditional web objects and tend not to be viewed in their entirety. Hence, caching them partially is a promising approach. Also, the projected growth in video traffic over wireless cellular networks calls for resource-efficient caching mechanisms at the wireless edge to lower traffic over the cellular backhaul and peering links and the associated costs. An evaluation of traditional partial caching solutions proposed in the literature shows that known solutions are not robust to video viewing patterns, increasing object pool size, changing object popularity, or limitations in the resources available for caching at the wireless network elements. In this paper, to overcome these limitations, we propose a novel approach that adopts a flexible segmentation policy and generalizes both LRU and LFU when applied to segmented accesses; in our simulations, it is shown to significantly lower wireless backhaul traffic (by around 20-30%, and in some cases even more).
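The common baseline underlying such segment-aware policies is an LRU cache whose replacement unit is a (video, segment) pair rather than a whole file, so heavily watched opening segments survive while rarely watched tails get evicted. A minimal sketch:

```python
from collections import OrderedDict

class SegmentLRU:
    """LRU cache whose units are (video, segment) pairs, so the heavily
    watched early segments of a video can stay cached while its rarely
    watched tail is evicted."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()           # (video, segment) -> None

    def request(self, video, segment):
        """Return True on a hit; insert (and evict if full) on a miss."""
        key = (video, segment)
        if key in self.store:
            self.store.move_to_end(key)      # mark as most recently used
            return True
        if len(self.store) >= self.capacity:
            self.store.popitem(last=False)   # evict least recently used
        self.store[key] = None
        return False
```

A viewing session that stops after segment k touches segments 0..k, so over many sessions the hits concentrate on low segment indices, which is exactly the behavior whole-file LRU cannot exploit.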
Article
Proxy caching of large multimedia objects at the edge of the Internet has become increasingly important for reducing network latency. For a large media object, such as a two-hour video, treating the whole media file as a single object for caching is not appropriate. In this paper, we study three media segmentation approaches to proxy caching: fixed, pyramid, and skyscraper. Blocks of a media stream are grouped into segments for cache management. The cache admission and replacement policies attach different caching priorities to individual segments, taking into account the access frequency of the media object and the segment's distance from the start of the media. These caching policies give preferential treatment to the beginning segments, so that most user requests can be played back quickly from the proxy servers without delay. Event-driven simulations are conducted to evaluate the segmentation approaches and compare them with whole-media caching. The results show that: 1) compared with whole-media caching, segmentation-based caching is more effective not only in increasing the byte-hit ratio but also in lowering the fraction of requests that require a delayed start; 2) pyramid segmentation, where segment size increases exponentially, is the best segmentation approach; and 3) segmentation-based caching is especially advantageous when the cache size is limited, when the set of hot media objects changes over time, when the media file size is large, and when there are a large number of distinct media objects.
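A minimal sketch of the fixed and pyramid segmentations discussed above, assuming the common doubling instantiation of "segment size increases exponentially" (the exact growth factor, and the skyscraper variant, are not fixed by the abstract):

```python
def fixed_segments(n_blocks, seg_len):
    """Fixed segmentation: consecutive groups of seg_len blocks."""
    sizes = []
    while sum(sizes) < n_blocks:
        sizes.append(min(seg_len, n_blocks - sum(sizes)))
    return sizes

def pyramid_segments(n_blocks, base=1, factor=2):
    """Pyramid segmentation: segment sizes grow geometrically (here
    doubling), so the beginning of the video is split into small,
    individually cacheable segments while the tail is coarse-grained."""
    sizes, size = [], base
    while sum(sizes) < n_blocks:
        sizes.append(min(size, n_blocks - sum(sizes)))
        size *= factor
    return sizes

# a 100-block video: [1, 2, 4, 8, 16, 32, 37] under the doubling pyramid
print(pyramid_segments(100))
```

The fine granularity of the early segments is what lets the cache keep just the popular openings of many videos, matching the preferential treatment of beginning segments described in the abstract.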