AutoPlex: Inter-Session Multiplexing Congestion Control for
Large-Scale Live Video Services
Bo Wu
Tencent Technologies
Tong Li
Renmin University of China
Cheng Luo
Tencent Technologies
Changkui Ouyang
Tencent Technologies
Xinle Du
Tsinghua University
Fuyu Wang
Tencent Technologies
ABSTRACT
The rapid advances in live video services introduce an urgent need to enhance network transmission performance, especially by designing an efficient congestion control (CC) scheme. Unfortunately, previous rule-based CC methods cannot adapt well to various network conditions and statuses, while machine-learning-powered CC paradigms often suffer from non-trivial system overhead and unstable effects.
In this paper, we first conduct a large-scale network measurement of 800+ million live video streams, and find that the QoS metrics of better-performed sessions show similarity within the same user group. We then propose AutoPlex, an inter-session multiplexing CC framework that makes full use of this similarity and automatically adjusts CC parameters (i.e., pacing rate and congestion window size). AutoPlex supports user-defined policies that act as criteria to learn the QoS features of better-performed sessions. We implement the proposed AutoPlex prototype based on the QUIC protocol and the BBR algorithm, and conduct experiments on a real live CDN proxy. The experimental results demonstrate the potential of AutoPlex for the transmission optimization of live video applications: the average (or 90th-percentile) retransmission ratio is reduced by 24%~27% (or 32%~40%), while the average value of goodput/rtt is improved by 14%~32%.
CCS CONCEPTS
• Networks → Transport protocols;
KEYWORDS
Congestion Control; Network Measurement
ACM Reference Format:
Bo Wu, Tong Li, Cheng Luo, Changkui Ouyang, Xinle Du, and Fuyu Wang.
2022. AutoPlex: Inter-Session Multiplexing Congestion Control for Large-
Scale Live Video Services. In ACM SIGCOMM 2022 Workshop on Network-
Application Integration (NAI ’22), August 22, 2022, Amsterdam, Netherlands.
ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3538401.3546596
Tong Li is the corresponding author (tong.li@ruc.edu.cn).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
NAI ’22, August 22, 2022, Amsterdam, Netherlands
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9395-9/22/08. . . $15.00
https://doi.org/10.1145/3538401.3546596
1 INTRODUCTION
In the past few years, live video services (LVS) such as TikTok Live, YouTube Live, and Twitch have experienced rapid growth, and the weekly coverage of live streaming worldwide reached 30.4% in Q3 of 2021 [1]. However, real network conditions vary across regions, resulting in vastly different network QoS among these regions. In particular, poor network QoS has negative impacts on both LVS consumers and CDN vendors. For example, users might complain or stop using an LVS because of deteriorated QoE (e.g., video freezing), while CDN vendors are also unable to accept the cost increases caused by, e.g., higher retransmission ratios.
The existing mechanisms mainly focus on designing and optimizing congestion control (CC) algorithms to enhance the transmission performance of LVS traffic. On the one hand, rule-based CC schemes (e.g., Reno [2], Cubic [3] and BBR [4]) can deliver better performance in some specific network statuses or services. However, the way they set CC parameters (e.g., the maximum pacing rate) as fixed values obviously cannot adapt to different network conditions, which is the main reason for the huge QoS differences between regions. For example, a 10Mbps maximum pacing rate might incur a lower retransmission ratio over a 10Mbps bottleneck bandwidth (BtlBW) than over a 1Mbps BtlBW. On the other hand, machine learning (ML) enables more flexible CC parameter configuration for different network conditions [5–7]. However, the way these schemes construct an ML model for each user group (UG) cannot cope with the large number of UGs that exist in a large-scale network. For example, over 18000 UGs (grouped based on the "state-city-ASN" rule) can be obtained in Brazil, in which 27.3% of national LVS sessions would occupy at least 100 ML models to achieve traffic transmission optimization (see §3.2). This is practically undeployable due to the non-trivial memory and computation overhead caused by the numerous ML models. Therefore, a deployable CC scheme that adapts parameter configurations to different network conditions is highly desired.
In this paper, we make large-scale network measurements of LVS traffic transmissions, from which we find that the same CC parameter configuration can bring different network QoS performance among various UGs (see §3.3). Besides, we also learn that the better-performed sessions with lower retransmission ratios (retran_ratios) or larger goodput/rtt values¹ have similar QoS features in the same UG (see §3.4). Based on the above measurement and analysis, we propose a novel CC framework named AutoPlex, which automatically multiplexes the QoS metrics of better-performed sessions in the last period and makes adaptive CC parameter configurations for next-period LVS traffic.
¹ The value of goodput/rtt is usually leveraged to evaluate CC performance [8], and it is used as a target for LVS transmission optimization.
Taking BBR as an example, the maximum pacing rate (max_pacing_rate) and the maximum congestion window (max_cwnd) can be configured based on the little-changed QoS values, such as goodput, maximum inflight size (max_inflight) and max_cwnd, that better-performed LVS sessions exhibited in the last period. Compared to rule-based CC methods, AutoPlex enables adaptive parameter configurations for different UGs with various network conditions, without incurring heavyweight ML overhead. Meanwhile, AutoPlex supports user-defined policies to achieve directional optimization for a specific QoS metric, which act as selection criteria to extract better-performed LVS sessions. For example, the QoS metrics of LVS traffic with lower retran_ratio can be leveraged to configure CC parameters if the policy is defined to reduce retran_ratio. We implement the proposed AutoPlex and partially deploy it in a live CDN proxy that uses the QUIC protocol and the BBR algorithm for LVS traffic transmission. The experimental results demonstrate the potential of AutoPlex for LVS applications. Concretely, the average (or 90th-percentile) retran_ratio can be reduced by 24%~27% (or 32%~40%), while the average value of goodput/rtt is improved by 14%~32%.
The remainder of this paper is organized as follows. §2 intro-
duces the related work about congestion control. §3 introduces the
background and motivation. Then, the design details of AutoPlex
are depicted in §4, and experimental evaluation is described in §5.
Finally, we conclude this article in §6.
2 RELATED WORK
Reno [2] is the most well-known scheme and proposes the key concepts of CC parameters such as the congestion window (cwnd), inflight size, initial window, etc. Many TCP variants [3, 4, 9–14] are aimed at specific networks and modify or improve the settings of these parameters, based on their specific insights or assumptions. For example, Vegas [9] sets cwnd based on RTT changes in networks with stable delay. BIC [10] uses binary search to set cwnd, and Cubic [3] uses a cubic function. Verus [11] and Sprout [12] are designed specifically for wireless networks to better fit the frequent changes in bandwidth. BBR [4] estimates the maximum bandwidth to set the pacing rate and computes the bandwidth-delay product (bdp) to set cwnd. TACK [15] computes the bdp and the minimum RTT to set the ACK frequency rather than cwnd in wireless network cases. Although these schemes improve TCP's adaptability to different network scenarios, for a specific scheme, its underlying logic and way of setting CC parameters remain inflexible.
Machine learning (ML) enables more flexible CC parameter configuration for different network conditions [5–7]. We roughly divide the ML-powered schemes into intra-session learning-based and inter-session learning-based schemes.
Intra-session learning-based schemes learn the network conditions (as well as status) within the same session and adjust the CC parameters in real time. For example, Vivace [16] and PCC [5] adjust sending rates in real time and determine the size of the increment according to the gradient of a performance utility function. RemyCC [17] iteratively searches for a state-action mapping table to maximize an objective function. However, this intra-session estimation still suffers from inflexibility since it relies on the stability and predictability of the underlying network.
Table 1: User group (UG) amount and session ratio with 13.56 million live video streams in Brazil.

Grouping Rules   UG Num.   Session Ratio (Top 10 / Top 50 / Top 100)
State-City-ASN   18620     11.8% / 22.2% / 27.3%
State-ASN        10326     16.7% / 27.7% / 32.9%
ASN               8329     30.9% / 35.7% / 38.3%
In contrast, inter-session learning-based schemes learn the network conditions across different sessions and adjust CC parameters in the context of the same UG. For example, Indigo [8] uses an LSTM model trained across a wide range of scenarios, guided by reaching the optimal operating point of the network. Orca [18] combines a learning model and a rule-based scheme to ensure recovery from wrong equilibria. However, as discussed before, the way they construct an ML model for each UG cannot cope with the large number of UGs that exist in a large-scale network (see §3.2). In this paper, we propose AutoPlex to improve CC adaptability while reducing the computation and memory overhead of inter-session learning-based schemes.
3 MOTIVATION
3.1 LVS Demands Flexible CC Schemes
It is well studied that no one-size-fits-all scheme exists in CC [19]. Specifically, for worldwide LVS, which covers a wide variety of network conditions across regions and countries, no single scheme can adequately prevail. Take a famous live platform as an example: the average delay in Malaysia is 86ms, while in Hainan Province of China it is 36ms. Besides, the average available bandwidth in Hainan is 7.3Mbps for each LVS stream, compared to 3.2Mbps in Turkey. Moreover, LVS streams incur average packet loss rates of 5.2% in Turkey and 3.8% in Brazil, respectively. Even within the same region such as Brazil, the average, 50th- and 90th-percentile delays (available bandwidths) are 52ms (3.9Mbps), 27ms (5.4Mbps) and 169ms (15.2Mbps), respectively².
As discussed above, rule-based CC schemes (e.g., Reno, CUBIC, Sprout, BBR, etc.) show inflexibility in their underlying logic and the way they set CC parameters. Different choices of CC parameter settings are therefore expected to fare differently in different contexts (e.g., UGs). Recently, it has been demonstrated that ML-powered schemes show great potential to improve the flexibility of CC [5–8, 17, 18].
² The data mentioned above are all based on our large-scale network measurements of LVS traffic transmission.
3.2 UG-based and ML-powered Schemes Do Not Scale Well
To pursue a more stable and precise control effect, many existing studies focus on dividing user groups (UGs) based on the client's region and ISP/AS number (ASN) information and providing specific parameters for each UG [20][6]. However, the non-trivial memory and computation overhead incurred by ML models can become the bottleneck of real deployment [21], especially when the number of user groups becomes very large.
Figure 1: The QoS differences of LVS streams that belong to three different UGs. (a) Retransmission ratio; (b) Smooth RTT; (c) Goodput.
Table 1 shows the number of UGs obtained under different grouping rules in Brazil. With a more fine-grained grouping rule (e.g., state-city-ASN), more UGs are obtained, while even with the most coarse-grained rule (i.e., ASN)³, the UG number still reaches 8329. Meanwhile, we also count the top 10, 50, and 100 UGs by their session amounts and find that leveraging 100 ML models for the top 100 UGs covers only 27.3%~38.3% of all LVS sessions. Compared to this limited benefit, the overhead introduced by the ML models is unacceptable in complex networks, especially with a large number of UGs. This reveals that UG-based and ML-powered schemes do not scale well.
³ In this paper, we regard ASN (instead of ISP) as the basis for user grouping, as ASN can better reflect the session similarity within the same UG. With user grouping, each user or client IP address can be exactly assigned to one UG.
3.3 LVS Session Performance Varies in Different UGs
We make large-scale network measurements for 3 months and record the session data of 800+ million LVS streams, which contains the client IP address and QoS metrics, e.g., the minimum, maximum and smooth RTT (denoted by min_rtt, max_rtt and srtt, respectively), goodput, max_cwnd, max_inflight, retran_ratio, etc. We use "state-city-ASN" as the grouping rule and construct an <IP, UG> mapping table to extract the QoS values of every UG.
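To make the grouping step concrete, the following minimal Python sketch groups per-session QoS records by the "state-city-ASN" key (the record field names and the in-memory dictionary layout are illustrative assumptions, not our production pipeline):

    from collections import defaultdict

    def ug_key(record):
        # "state-city-ASN" grouping rule; field names are assumed for illustration.
        return (record["state"], record["city"], record["asn"])

    def group_sessions_by_ug(records):
        # Collect the per-session QoS values of every UG for one time period.
        ug_qos = defaultdict(list)
        for r in records:
            ug_qos[ug_key(r)].append({
                "goodput": r["goodput"],            # Mbps
                "srtt": r["srtt"],                  # ms
                "retran_ratio": r["retran_ratio"],
                "max_cwnd": r["max_cwnd"],
                "max_inflight": r["max_inflight"],
            })
        return ug_qos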
Figure 1 shows the differences in retran_ratio, srtt and goodput between different UGs, from the perspective of the average, 50th-, 75th- and 90th-percentile values. We can see that LVS sessions in different UGs perform differently in terms of these three QoS metrics. For example, the LVS sessions of UG 2 yield a smaller and more stable retran_ratio, while a lower latency (evaluated by srtt) is actually achieved in UG 1 (instead of in UG 2 and 3) across the average, 50th-, 75th- and 90th-percentile values. Meanwhile, UG 3 always holds a smaller goodput than the other two UGs, which might be caused by its higher retran_ratio. Therefore, adaptively configuring a series of CC parameters for each UG is highly desirable to meet different network conditions and optimize LVS transmissions.
3.4 Better-Performed Sessions Perform
Similarly in the Same UG
To explore potential associations of session QoS within the same UG, we randomly extract six days of LVS traffic data from UG 1. The time interval T is set to 15 minutes, and the LVS traffic within specific ranges of retran_ratio and goodput/srtt is extracted as better- or worse-performed LVS sessions (defined below). Figure 2 shows that better-performed sessions substantially outperform worse-performed streams in terms of retran_ratio (0.03% vs. 2.9%) and goodput/srtt (872Mbps/s vs. 53Mbps/s) on average.
Better-performed sessions are defined as the LVS streams with 5th- to 20th-percentile retran_ratios or descending goodput/srtt in all last-period sessions.
Worse-performed sessions are defined as the LVS streams with 75th- to 90th-percentile retran_ratios or descending goodput/srtt in all last-period sessions.
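A minimal sketch of this percentile-based extraction is given below (Python; the helper name and the assumption that each session is a dict of QoS values are ours, not part of the deployed system):

    def sessions_by_percentile(sessions, metric, lo_pct, hi_pct, reverse=False):
        # Return the sessions whose rank by `metric` falls between the
        # lo_pct-th and hi_pct-th percentiles of the last-period population.
        ranked = sorted(sessions, key=lambda s: s[metric], reverse=reverse)
        n = len(ranked)
        lo, hi = int(n * lo_pct / 100), int(n * hi_pct / 100)
        return ranked[lo:hi]

    # last_period_sessions: per-session QoS dicts collected in the previous period (assumed).
    # Better-performed: 5th- to 20th-percentile retran_ratio (ascending order);
    # for goodput/srtt the same slice is taken over the descending order.
    better = sessions_by_percentile(last_period_sessions, "retran_ratio", 5, 20)
    worse = sessions_by_percentile(last_period_sessions, "retran_ratio", 75, 90)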
Figure 2: Better- and worse-performed LVS sessions. (a) Average retransmission ratio; (b) Average goodput/srtt.
To explore the similarity between better-performed LVS sessions in the same UG, we analyze the changes of goodput, max_inflight and max_cwnd in two adjacent time periods. Figure 3(a) depicts the cumulative goodput change rates of better- and worse-performed sessions in the same UG. We find that better-performed sessions have little-changed goodput in two adjacent periods, whose average change rate is 25%~30% of that of worse-performed sessions. The cumulative changes of max_cwnd and max_inflight for better- and worse-performed sessions are shown in Figures 3(b) and 3(c), in which the curve slope represents the change rate relative to last-period values. We can see that better-performed LVS sessions introduce smoother max_cwnd and max_inflight changes than worse-performed traffic. For example, worse-performed sessions cause 3.0×~5.1× larger changes of max_cwnd and 1.3×~3.0× larger changes of max_inflight than better-performed LVS streams. Therefore, better-performed LVS sessions in the same UG perform similarly with little-changed QoS values, which can be leveraged to automatically configure CC parameters for the next period.
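The change rates reported in Figure 3 can be reproduced by comparing the mean QoS values of adjacent periods, as the following sketch illustrates (we assume the per-period means are already aggregated; the function names are ours):

    def change_rate(prev, curr):
        # Relative change of a QoS value between two adjacent periods.
        return abs(curr - prev) / prev if prev else 0.0

    def cumulative_change(period_means, key):
        # period_means: chronologically ordered per-period mean QoS dicts.
        total, curve = 0.0, []
        for prev, curr in zip(period_means, period_means[1:]):
            total += change_rate(prev[key], curr[key])
            curve.append(total)  # the slope reflects how fast the metric drifts
        return curve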
Figure 3: The QoS similarities of better-performed LVS sessions in the same UG. (a) Cumulative goodput change rates; (b) Cumulative max_cwnd changes; (c) Cumulative max_inflight changes.
4 THE AUTOPLEX DESIGN
4.1 AutoPlex Framework
AutoPlex enhances LVS transmission performance by automatically multiplexing the inter-session QoS similarities of better-performed LVS sessions in the last period, which are used to configure next-period CC parameters. In particular, the maximum values of cwnd and pacing_rate are configured to stay little-changed, based on the QoS values of the latest better-performed sessions. To meet diverse requirements, the AutoPlex framework supports user-defined QoS policies (e.g., lowering retran_ratio or promoting the value of goodput/srtt), which are followed to learn the similar QoS metrics of better-performed LVS sessions. Figure 4 depicts the AutoPlex framework, which contains three modules for measurement, decision and execution. Note that these modules can be deployed on the same CDN proxy server or on different servers.
Figure 4: The framework of AutoPlex (measurement, decision and execution modules, connected through the user-defined QoS policies and the <IP, UG> and <UG, parameters> mapping tables).
The measurement module performs data collection over all LVS sessions and obtains the last-period QoS values of each UG by looking up the pre-established <IP, UG> mapping table. Following the user-defined QoS policies, the decision module obtains the better-performed LVS sessions and records their similar QoS values (e.g., goodput, max_cwnd and max_inflight), which will be utilized for next-period CC parameter configurations. When receiving a new LVS request, the execution module extracts the remote IP and gains the UG identifier UG_ID by inquiring the pre-established <IP, UG> table. With the <UG, parameters> mapping table, the CC parameters (i.e., max_cwnd and max_pacing_rate) of this UG are obtained and then configured for the response of this LVS session. In AutoPlex, the measurement module counts the QoS values of each UG for every time interval T (e.g., T = 30 mins), which are utilized by the decision module to maintain the latest <UG, parameters> mapping table in the execution module.
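The per-request behavior of the execution module can be summarized by the sketch below (the two mapping tables are modeled as plain dictionaries and the fallback to default parameters is an assumption for illustration; in the real proxy the chosen values are applied inside the QUIC/BBR stack):

    def on_new_lvs_request(client_ip, ip_to_ug, ug_to_params, default_params):
        # Execution module: pick CC parameters for the responding LVS session.
        ug_id = ip_to_ug.get(client_ip)                   # <IP, UG> mapping table
        params = ug_to_params.get(ug_id, default_params)  # <UG, parameters> mapping table
        return {"max_cwnd": params["max_cwnd"],
                "max_pacing_rate": params["max_pacing_rate"]}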
Algorithm 1 CC parameter configuration pseudo code.
1: function CCConfiguration()
2: Require:
3:     P_usr, Q_ug
4: Compute:
5:     Q_better ← Extract(Q_ug, P_usr)
6:     goodput, max_inflight, max_cwnd ← mean(Q_better)
7:     pacing_rate_last = goodput
8:     max_cwnd_last = (max_inflight + max_cwnd) / 2
9: Operate:
10:    pacing_rate_cpt = pacing_gain × max_bw
11:    pacing_rate = f(pacing_rate_cpt, pacing_rate_last)
12:    inflight_cap = f(inflight_cap, max_cwnd_last)
13: end function
4.2 Design Details
AutoPlex focuses on achieving automated configuration of CC parameters by multiplexing the similar QoS values of those LVS sessions that performed better in the last period. In this paper, the more fine-grained grouping rule (i.e., state-city-ASN) is leveraged to classify LVS consumers. Algorithm 1 shows the pseudo code for CC parameter configuration, where the user-defined policies P_usr and each UG's QoS values Q_ug (i.e., retran_ratio, goodput, srtt, max_inflight and max_cwnd) act as inputs for the parameter computations.
Better-performed session extraction. AutoPlex enables the decision module to extract the similar QoS values (denoted by Q_better) of better-performed LVS sessions in each UG. In Algorithm 1, the function Extract(·) sorts Q_ug according to the P_usr-related metric (e.g., retran_ratio) and then obtains Q_better from the i-th to j-th percentile values of the sorted Q_ug. For example, i = 5 and j = 20 are set to extract Q_better from Q_ug (ascending by retran_ratio) in §3.4. Then, the average values of goodput, max_inflight and max_cwnd are obtained by taking the mean over Q_better.
CC parameter configuration. With the goodput, max_inflight and max_cwnd of better-performed sessions, AutoPlex first computes pacing_rate_last and max_cwnd_last, which reflect the average pacing_rate and our desired max_cwnd, respectively, in the last period. Then the pacing rate and the inflight cap can be configured for next-period LVS traffic transmissions, which will be elaborated in §4.3.
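Lines 5–8 of Algorithm 1 thus reduce the extracted set Q_better to these two quantities. A sketch of the computation (assuming Q_better is a list of per-session QoS dicts and a simple arithmetic mean is used):

    def decision_outputs(q_better):
        # Average the QoS values of the better-performed sessions (line 6 of Algorithm 1).
        def mean(key):
            return sum(s[key] for s in q_better) / len(q_better)
        goodput = mean("goodput")
        max_inflight = mean("max_inflight")
        max_cwnd = mean("max_cwnd")
        pacing_rate_last = goodput                       # line 7
        max_cwnd_last = (max_inflight + max_cwnd) / 2    # line 8
        return pacing_rate_last, max_cwnd_last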
Figure 5: The experimental evaluation (CDN servers, the CDN proxy, and UG 1~UG 4 connected over the Internet).
4.3 AutoPlex-based BBR Implementation
In this section, we give an example of how BBR makes use of the
two outputs of AutoPlex, i.e., pacing_rate_last and max_cwnd_last,
to adjust its pacing rate and inflight cap.
Adjust pacing rate. A BBR sender controls its pacing_rate with the help of pacing and an estimated maximum bandwidth max_bw, where the estimate of max_bw is based on a windowed maximum filter of the delivery rate that the receiver experiences. The pacing rate is varied in an eight-phase cycle using a pacing_gain of 5/4, 3/4, 1, 1, 1, 1, 1, 1, where each phase lasts for an RTT. The BBR implementation based on AutoPlex computes the pacing rate (i.e., pacing_rate_cpt) as line 11 of Algorithm 1 shows. In this paper, f(·) is a customizable function chosen according to the optimization goal. For lower retran_ratio or higher goodput/srtt, we set f(·) to a function that takes the minimum value when a ≤ b and the average value when a > b, as follows:

f(a, b) =
\begin{cases}
a, & a \le b \\
\frac{a+b}{2}, & a > b
\end{cases}
\qquad (1)
Adjust inflight cap. Although BBR does not use a congestion window or ACK clocking to control the amount of inflight data, it uses an inflight data limit (e.g., 2·bdp), which we call the inflight cap (denoted by inflight_cap) in this paper. The BBR implementation based on AutoPlex computes the inflight cap as line 12 of Algorithm 1 shows. This keeps the maximum value of inflight_cap little-changed, especially when the computed inflight_cap exceeds the last-period max_cwnd_last.
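Putting Eq. (1) together with lines 10–12 of Algorithm 1, the per-period adjustment an AutoPlex-based BBR sender performs can be sketched as follows (pacing_gain, max_bw and the current inflight_cap come from the BBR state; wiring into a real BBR implementation is omitted):

    def f(a, b):
        # Eq. (1): take the smaller value when a <= b, otherwise the average.
        return a if a <= b else (a + b) / 2

    def adjust_bbr(pacing_gain, max_bw, inflight_cap, pacing_rate_last, max_cwnd_last):
        pacing_rate_cpt = pacing_gain * max_bw                  # line 10
        pacing_rate = f(pacing_rate_cpt, pacing_rate_last)      # line 11
        inflight_cap = f(inflight_cap, max_cwnd_last)           # line 12
        return pacing_rate, inflight_cap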
Discussion. In a real-world deployment, we can activate AutoPlex only for some selected LVS traffic, e.g., high-priority sessions from VIP users. In this case, low-priority sessions are leveraged to explore better QoS metrics, which are then multiplexed by the high-priority sessions. This partial deployment strategy enhances the convergence of AutoPlex. Note that, even when the same CC parameters are configured, better- and worse-performed LVS sessions in the same UG can still be identified as network conditions and statuses change. Therefore, the idea of multiplexing inter-session QoS metrics for CC parameter configurations can always be applied over time.
Figure 6: Retransmission ratios of AutoPlex (Baseline vs. AutoPlex for UG 1~UG 4). (a) Average retran_ratio; (b) 50th-percentile retran_ratio; (c) 75th-percentile retran_ratio; (d) 90th-percentile retran_ratio.
5 EXPERIMENTAL EVALUATION
In this section, we implement the AutoPlex prototype and partially deploy it in a real live CDN proxy (as Figure 5 shows) that is developed based on Nginx [22] and hosts the measurement, decision and execution modules of AutoPlex⁴. We randomly select four UGs (i.e., UG 1~UG 4) to evaluate AutoPlex using the following metrics: retran_ratio and goodput/srtt, which can be set as user-defined policies. The time period is set to 30 mins, and the LVS sessions in our evaluation all use the QUIC protocol and the BBRv1 scheme (as the baseline) for transmission.
5.1 Retransmission Ratio
In the AutoPlex framework, users can define their own preference as an optimization goal (e.g., lowering retran_ratio) for LVS traffic transmission. In this section, we select better-performed sessions with 5th- to 20th-percentile retran_ratio for the CC parameter configuration of next-period LVS traffic.
Figure 6 depicts the average, 50th-, 75th- and 90th-percentile retran_ratio changes of different UGs, respectively, when deploying the AutoPlex prototype. We can see that AutoPlex introduces lower retran_ratio, where the average, 75th- and 90th-percentile retran_ratios are reduced by 24%~27%, 28%~41% and 32%~44%, respectively, compared to the existing BBR scheme. Note that the 50th-percentile retran_ratio does not gain an obvious optimization, as it always stays close to optimal values (i.e., 0.37%~0.70%) that are difficult to optimize further. Meanwhile, AutoPlex brings differentiated reduction ratios for different UGs when optimizing their retran_ratios. This is because traditional fixed CC parameter settings for entire networks introduce various baseline values for each UG, while AutoPlex adjusts CC parameters based on each UG's network conditions.
⁴ In this paper, we mainly focus on LVS transmission optimization from the CDN proxy to user clients. The video streaming interaction itself is beyond the scope of our research.
Figure 7: Goodput/srtt values of AutoPlex (Baseline vs. AutoPlex for UG 1~UG 4). (a) Average goodput/srtt; (b) 50th-percentile goodput/srtt; (c) 75th-percentile goodput/srtt; (d) 90th-percentile goodput/srtt.
5.2 Ratio of Goodput to SRTT
AutoPlex enhances goodput/srtt by multiplexing the QoS values of better-performed LVS sessions to configure max_cwnd and max_pacing_rate for the next LVS sessions. Note that the sessions with 80th- to 95th-percentile values of goodput/srtt in the last period are extracted as better-performed LVS sessions.
Figure 7 shows the values of goodput/srtt in different UGs. We can see that AutoPlex achieves a better optimization for the value of goodput/srtt, in which the average, 50th- and 75th-percentile values are all significantly enhanced (i.e., by 14%~32%, 44%~128%, and 19%~56%, respectively). By contrast, the 90th-percentile values are not improved obviously. This might be because these values are already close to the optimum and have limited room for further optimization. AutoPlex introduces differentiated optimization results for different UGs. For example, 32% and 14% improvements of average goodput/srtt are obtained for UG 1 and UG 3, respectively. For the metric of goodput/srtt, AutoPlex achieves obvious optimization for poorer performances (e.g., 50th- and 75th-percentile values) while keeping plausible improvements for already-better performances.
6 CONCLUSION AND FUTURE WORK
In this paper, we take a first step to design, analyze, implement and evaluate an inter-session multiplexing congestion control framework named AutoPlex, which is based on our large-scale network measurements of live video streams. AutoPlex enables adaptive CC parameter configurations by multiplexing the QoS values of better-performed LVS traffic in the last period. Besides, user-defined policies are also supported in AutoPlex, acting as criteria to learn the QoS features of better-performed LVS traffic. We complete the experimental evaluation by implementing and deploying AutoPlex in a real live CDN proxy, and the results demonstrate the great potential that AutoPlex brings to optimizing LVS traffic transmission.
In the future, machine learning such as DRL can be leveraged to select better-performed LVS sessions based on the QoS performance gains of prior session selections. Besides, exploring more parameter settings [23, 24] by multiplexing inter-session QoS values is also a natural extension of AutoPlex.
ACKNOWLEDGMENTS
We would like to thank the NAI reviewers and Dr. Kai Gao for their extensive and valuable feedback. We also thank Senzhen Liu for his kind comments during our experimental evaluation.
REFERENCES
[1] Max Wilbert. Top 22 live streaming platforms: Everything you need to know. https://www.dacast.com/blog/, 2022.
[2] Van Jacobson. Congestion avoidance and control. In ACM SIGCOMM Computer Communication Review, volume 18, pages 314–329. ACM, 1988.
[3] Sangtae Ha, Injong Rhee, and Lisong Xu. CUBIC: A new TCP-friendly high-speed TCP variant. ACM SIGOPS Operating Systems Review, 42(5):64–74, 2008.
[4] Neal Cardwell, Yuchung Cheng, C Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. BBR: Congestion-based congestion control: Measuring bottleneck bandwidth and round-trip propagation time. ACM Queue, 14(5):20–53, 2016.
[5] Mo Dong, Qingxi Li, Doron Zarchy, P Brighten Godfrey, and Michael Schapira. PCC: Re-architecting congestion control for consistent high performance. In USENIX NSDI, pages 395–408, 2015.
[6] Xiaohui Nie, Youjian Zhao, Zhihan Li, Guo Chen, Kaixin Sui, Jiyang Zhang, Zijie Ye, and Dan Pei. Dynamic TCP initial windows and congestion control schemes through reinforcement learning. IEEE JSAC, 37(6):1231–1247, 2019.
[7] Xu Li, Feilong Tang, Jiacheng Liu, Laurence T Yang, Luoyi Fu, and Long Chen. AUTO: Adaptive congestion control based on multi-objective reinforcement learning for the satellite-ground integrated network. In USENIX ATC, 2021.
[8] Francis Y Yan, Jestin Ma, Greg D Hill, Deepti Raghavan, Riad S Wahby, Philip Levis, and Keith Winstein. Pantheon: The training ground for internet congestion-control research. In USENIX ATC, pages 731–743, 2018.
[9] Lawrence S Brakmo, Sean W O'Malley, and Larry L Peterson. TCP Vegas: New techniques for congestion detection and avoidance, volume 24. ACM, 1994.
[10] Lisong Xu, Khaled Harfoush, and Injong Rhee. Binary increase congestion control (BIC) for fast long-distance networks. In IEEE INFOCOM, volume 4, pages 2514–2524, 2004.
[11] Yasir Zaki, Thomas Pötsch, Jay Chen, Lakshminarayanan Subramanian, and Carmelita Görg. Adaptive congestion control for unpredictable cellular networks. In ACM SIGCOMM, pages 509–522, 2015.
[12] Keith Winstein, Anirudh Sivaraman, and Hari Balakrishnan. Stochastic forecasts achieve high throughput and low delay over cellular networks. In USENIX NSDI, pages 459–472, 2013.
[13] Venkat Arun and Hari Balakrishnan. Copa: Practical delay-based congestion control for the internet. In USENIX NSDI, pages 329–342, 2018.
[14] Tong Li, Kai Zheng, and Ke Xu. Acknowledgment on demand for transport control. IEEE Internet Computing, 25(2):109–115, 2021.
[15] Tong Li, Kai Zheng, Ke Xu, Rahul Arvind Jadhav, Tao Xiong, Keith Winstein, and Kun Tan. TACK: Improving wireless transport performance by taming acknowledgments. In ACM SIGCOMM, pages 15–30, 2020.
[16] Mo Dong, Tong Meng, Doron Zarchy, Engin Arslan, Yossi Gilad, Brighten Godfrey, and Michael Schapira. PCC Vivace: Online-learning congestion control. In USENIX NSDI, pages 343–356, 2018.
[17] Keith Winstein and Hari Balakrishnan. TCP ex machina: Computer-generated congestion control. ACM SIGCOMM Computer Communication Review, 43(4):123–134, 2013.
[18] Soheil Abbasloo, Chen-Yu Yen, and H Jonathan Chao. Classic meets modern: A pragmatic learning-based congestion control for the internet. In ACM SIGCOMM, 2020.
[19] Michael Schapira and Keith Winstein. Congestion-control throwdown. In ACM HotNets, pages 122–128, 2017.
[20] Junchen Jiang, Shijie Sun, Vyas Sekar, and Hui Zhang. Pytheas: Enabling data-driven quality of experience optimization using group-based exploration-exploitation. In USENIX NSDI, 2017.
[21] Minsik Cho and Daniel Brand. MEC: Memory-efficient convolution for deep neural network. In International Conference on Machine Learning, pages 815–824. PMLR, 2017.
[22] F5 Networks Inc. NGINX: Advanced load balancer, web server, & reverse proxy. https://www.nginx.com, 2022.
[23] Tong Li, Kai Zheng, Ke Xu, Rahul Arvind Jadhav, Tao Xiong, Keith Winstein, and Kun Tan. Revisiting acknowledgment mechanism for transport control: Modeling, analysis, and implementation. IEEE/ACM TON, 29(6):2678–2692, 2021.
[24] Hui Xie and Li Tong. Revisiting loss recovery for high-speed transmission. In IEEE WCNC, pages 1–6, 2022.