AutoPlex: Inter-Session Multiplexing Congestion Control for
Large-Scale Live Video Services
Bo Wu
Tencent Technologies
Tong Li
Renmin University of China
Cheng Luo
Tencent Technologies
Changkui Ouyang
Tencent Technologies
Xinle Du
Tsinghua University
Fuyu Wang
Tencent Technologies
ABSTRACT
The rapid advances in live video services introduce an urgent need to enhance network transmission performance, especially by designing an efficient congestion control (CC) scheme. Unfortunately, previous rule-based CC methods cannot adapt well to various network conditions and statuses, while machine-learning-powered CC paradigms often suffer from non-trivial system overhead and unstable effects.
In this paper, we first conduct a large-scale network measurement of 800+ million live video streams, and find that the QoS metrics of better-performed sessions show similarity within the same user group. We then propose AutoPlex, an inter-session multiplexing CC framework that makes full use of this similarity and automatically adjusts CC parameters (i.e., pacing rate and congestion window size). AutoPlex supports user-defined policies that act as criteria to learn the QoS features of better-performed sessions. We implement the proposed AutoPlex prototype based on the QUIC protocol and the BBR algorithm, and conduct experiments on a real live CDN proxy. The experimental results demonstrate the potential of AutoPlex for the transmission optimization of live video applications: the average (or 90th-percentile) retransmission ratio is reduced by 24%~27% (or 32%~40%), while the average value of goodput/rtt is improved by 14%~32%.
CCS CONCEPTS
• Networks → Transport protocols;
KEYWORDS
Congestion Control; Network Measurement
ACM Reference Format:
Bo Wu, Tong Li, Cheng Luo, Changkui Ouyang, Xinle Du, and Fuyu Wang.
2022. AutoPlex: Inter-Session Multiplexing Congestion Control for Large-
Scale Live Video Services. In ACM SIGCOMM 2022 Workshop on Network-
Application Integration (NAI ’22), August 22, 2022, Amsterdam, Netherlands.
ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3538401.3546596
Tong Li is the corresponding author (tong.li@ruc.edu.cn).
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.
NAI ’22, August 22, 2022, Amsterdam, Netherlands
©2022 Association for Computing Machinery.
ACM ISBN 978-1-4503-9395-9/22/08. . . $15.00
https://doi.org/10.1145/3538401.3546596
1 INTRODUCTION
In the past few years, live video services (LVS) such as TikTok Live, YouTube Live, and Twitch have experienced rapid growth, and the weekly coverage of live streaming worldwide reached 30.4% in Q3 of 2021 [1]. However, real network conditions vary across regions, resulting in vastly different network QoS among these regions. In particular, poor network QoS has negative impacts on both LVS consumers and CDN vendors. For example, users might complain or stop using an LVS because of deteriorated QoE (e.g., video freezing), while CDN vendors are also unable to accept the cost increases caused by, e.g., higher retransmission ratios.
The existing mechanisms mainly focus on designing and optimizing congestion control (CC) algorithms to enhance the transmission performance of LVS traffic. On the one hand, rule-based CC schemes (e.g., Reno [2], Cubic [3] and BBR [4]) can deliver better performance in some specific network statuses or services. However, the way they set CC parameters (e.g., the maximum pacing rate) as fixed values obviously cannot adapt to different network conditions, which is the main reason for the huge QoS differences between regions. For example, a 10Mbps maximum pacing rate might incur a lower retransmission ratio over a 10Mbps bottleneck bandwidth (BtlBW) than over a 1Mbps BtlBW. On the other hand, machine learning (ML) enables more flexible CC parameter configuration for different network conditions [5–7]. However, the way these schemes construct an ML model for each user group (UG) cannot cope with the large number of UGs that exist in a large-scale network. For example, over 18000 UGs (grouped based on the "state-city-ASN" rule) can be obtained in Brazil, in which 27.3% of national LVS sessions would occupy at least 100 ML models to achieve traffic transmission optimization (see §3.2). This is practically undeployable due to the non-trivial memory and computation overhead caused by the numerous ML models. Therefore, a deployable CC scheme that adapts parameter configurations to different network conditions is highly desired.
In this paper, we make large-scale network measurements of LVS traffic transmissions, from which we find that the same CC parameter configuration can bring different network QoS performance among various UGs (see §3.3). Besides, we also learn that the better-performed sessions with lower retransmission ratios (retran_ratios) or larger goodput/rtt values¹ have similar QoS features in the same UG (see §3.4). Based on the above measurement and analysis, we propose a novel CC framework named AutoPlex, which automatically multiplexes the QoS metrics of better-performed sessions in the last period and makes adaptive CC parameter configurations for next-period LVS traffic.
¹ The value of goodput/rtt is usually leveraged to evaluate CC performance [8], and it is used as a target for LVS transmission optimization.
Taking BBR as an example, the maximum pacing rate (max_pacing_rate) and the maximum congestion window (max_cwnd) can be configured based on the little-changed QoS values, such as goodput, maximum inflight size (max_inflight) and max_cwnd, that better-performed LVS sessions exhibited in the last period. Compared to rule-based CC methods, AutoPlex enables adaptive parameter configurations for different UGs with various network conditions, without incurring heavyweight ML overhead. Meanwhile, AutoPlex supports user-defined policies to achieve directional optimization for a specific QoS metric, which act as selection criteria to extract better-performed LVS sessions. For example, the QoS metrics of LVS traffic with lower retran_ratio can be leveraged to configure CC parameters if the policy is defined to reduce retran_ratio. We implement the proposed AutoPlex and partially deploy it in a live CDN proxy that uses the QUIC protocol and the BBR algorithm for LVS traffic transmission. The experimental results demonstrate the potential of AutoPlex for LVS applications. Concretely, the average (or 90th-percentile) retran_ratio can be reduced by 24%~27% (or 32%~40%), while the average value of goodput/rtt is improved by 14%~32%.
The remainder of this paper is organized as follows. §2 intro-
duces the related work about congestion control. §3 introduces the
background and motivation. Then, the design details of AutoPlex
are depicted in §4, and experimental evaluation is described in §5.
Finally, we conclude this article in §6.
2 RELATED WORK
Reno [2] is the most well-known scheme and proposes the key concepts of CC parameters such as the congestion window (cwnd), inflight size, initial window, etc. Many TCP variants [3, 4, 9–14] are aimed at specific networks and modify or improve the settings of these parameters, based on their specific insights or assumptions. For example, Vegas [9] sets cwnd based on RTT changes in networks with stable delay. BIC [10] uses binary search to set cwnd, and Cubic [3] uses a cubic function. Verus [11] and Sprout [12] are designed specifically for wireless networks to better fit the frequent changes in bandwidth. BBR [4] estimates the maximum bandwidth to set the pacing rate and computes the bandwidth-delay product (bdp) to set cwnd. TACK [15] computes the bdp and the minimum RTT to set the ACK frequency rather than cwnd in wireless network cases. Although these schemes improve TCP's adaptability to different network scenarios, for a specific scheme, its underlying logic and way of setting CC parameters remain inflexible.
Machine learning (ML) enables more flexible CC parameter configuration for different network conditions [5–7]. We roughly divide the ML-powered schemes into intra-session learning-based and inter-session learning-based schemes.
Intra-session learning-based schemes learn the network conditions (as well as status) within the same session and adjust the CC parameters in real time. For example, Vivace [16] and PCC [5] adjust sending rates in real time and determine the size of the increment according to the gradient of a performance utility function. RemyCC [17] iteratively searches for a state-action mapping table to maximize an objective function. However, this intra-session estimation still suffers from inflexibility since it relies on the stability and predictability of the underlying network.
Table 1: User group (UG) amount and session ratio with 13.56 million live video streams in Brazil.

Grouping Rules   UG Num.   Session Ratio (Top 10 / Top 50 / Top 100)
State-City-ASN   18620     11.8% / 22.2% / 27.3%
State-ASN        10326     16.7% / 27.7% / 32.9%
ASN               8329     30.9% / 35.7% / 38.3%
In contrast, inter-session learning-based schemes learn the network conditions across different sessions and adjust CC parameters in the context of the same UG. For example, Indigo [8] uses an LSTM model trained across a wide range of scenarios, guided by reaching the optimal operating point of the network. Orca [18] combines a learning model and a rule-based scheme to ensure recovery from wrong equilibria. However, as discussed before, the way they construct an ML model for each UG cannot cope with the large number of UGs that exist in a large-scale network (see §3.2). In this paper, we propose AutoPlex to improve CC adaptability while reducing the computation and memory overhead of inter-session learning-based schemes.
3 MOTIVATION
3.1 LVS Demands Flexible CC Schemes
It is well studied that no one-size-fits-all scheme exists in CC [19]. Specifically, for worldwide LVS, which covers a wide variety of network conditions across regions and countries, no single scheme can adequately prevail. Take a famous live platform as an example: the average delay in Malaysia is 86ms, while in Hainan Province of China it is 36ms. Besides, the average available bandwidth in Hainan is 7.3Mbps for each LVS stream, compared to 3.2Mbps in Turkey. Moreover, LVS streams incur average packet loss rates of 5.2% in Turkey and 3.8% in Brazil, respectively. Even within the same region such as Brazil, the average, 50th- and 90th-percentile delays (available bandwidths) are 52ms (3.9Mbps), 27ms (5.4Mbps) and 169ms (15.2Mbps), respectively².
As discussed above, rule-based CC schemes (e.g., Reno, CUBIC, Sprout, BBR, etc.) show inflexibility in their underlying logic and the way they set CC parameters. Different choices of CC parameter settings are therefore expected to fare differently in different contexts (e.g., UGs). Recently, it has been demonstrated that ML-powered schemes show great potential to improve the flexibility of CC [5–8, 17, 18].
² The data mentioned above are all based on our large-scale network measurements of LVS traffic transmission.
3.2 UG-based and ML-powered Schemes Do Not Scale Well
To pursue a more stable and precise control effect, many existing studies focus on dividing user groups (UGs) based on the client's region and ISP/AS number (ASN) information and providing specific parameters for each UG [20][6]. However, the non-trivial memory and computation overhead incurred by ML models can become the bottleneck of real deployment [21], especially when the number of user groups becomes very large.
Figure 1: The QoS differences of LVS streams that belong to three different UGs. (a) Retransmission ratio; (b) Smooth RTT; (c) Goodput.
Table 1 shows the number of UGs obtained under different grouping rules in Brazil. With a more fine-grained grouping rule (e.g., state-city-ASN), more UGs are obtained, while even with the most coarse-grained rule (i.e., ASN)³, the UG number still reaches 8329. Meanwhile, we also count the top 10, 50, and 100 UGs by their session amounts and find that leveraging 100 ML models for the top 100 UGs covers only 27.3%~38.3% of all LVS sessions. Compared to this limited benefit, the overhead introduced by the ML models is unacceptable in complex networks, especially with a large number of UGs. This reveals that UG-based and ML-powered schemes do not scale well.
³ In this paper, we regard ASN (instead of ISP) as the basis for user grouping, as ASN can better reflect the session similarity within the same UG. With user grouping, each user or client IP address can be exactly assigned to one UG.
3.3 LVS Session Performance Varies in Different UGs
We make large-scale network measurements for 3 months and record the session data of 800+ million LVS streams, which contains the client IP address and QoS metrics, e.g., the minimum, maximum and smooth RTT (denoted by min_rtt, max_rtt and srtt, respectively), goodput, max_cwnd, max_inflight, retran_ratio, etc. We use "state-city-ASN" as the grouping rule and construct an <IP, UG> mapping table to extract the QoS values of every UG.
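To make the grouping step concrete, the following minimal Python sketch groups per-session QoS records by the "state-city-ASN" key (the record field names and the in-memory dictionary layout are illustrative assumptions, not our production pipeline):

    from collections import defaultdict

    def ug_key(record):
        # "state-city-ASN" grouping rule; field names are assumed for illustration.
        return (record["state"], record["city"], record["asn"])

    def group_sessions_by_ug(records):
        # Collect the per-session QoS values of every UG for one time period.
        ug_qos = defaultdict(list)
        for r in records:
            ug_qos[ug_key(r)].append({
                "goodput": r["goodput"],            # Mbps
                "srtt": r["srtt"],                  # ms
                "retran_ratio": r["retran_ratio"],
                "max_cwnd": r["max_cwnd"],
                "max_inflight": r["max_inflight"],
            })
        return ug_qos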
Figure 1 shows the differences in retran_ratio, srtt and goodput between different UGs, from the perspective of the average, 50th-, 75th- and 90th-percentile values. We can see that LVS sessions in different UGs perform differently in terms of these three QoS metrics. For example, the LVS sessions of UG 2 yield a smaller and more stable retran_ratio, while a lower latency (evaluated by srtt) is actually achieved in UG 1 (instead of in UG 2 and 3) across the average, 50th-, 75th- and 90th-percentile values. Meanwhile, UG 3 always holds a smaller goodput than the other two UGs, which might be caused by its higher retran_ratio. Therefore, adaptively configuring a series of CC parameters for each UG is highly desirable to meet different network conditions and optimize LVS transmissions.
3.4 Better-Performed Sessions Perform
Similarly in the Same UG
To explore potential associations of session QoS within the same UG, we randomly extract six days of LVS traffic data from UG 1. The time interval T is set to 15 minutes, and the LVS traffic within specific ranges of retran_ratio and goodput/srtt is extracted as better- or worse-performed LVS sessions (defined below). Figure 2 shows that better-performed sessions substantially outperform worse-performed streams in terms of retran_ratio (0.03% vs. 2.9%) and goodput/srtt (872Mbps/s vs. 53Mbps/s) on average.
Better-performed sessions are defined as the LVS streams with 5th- to 20th-percentile retran_ratios or descending goodput/srtt in all last-period sessions.
Worse-performed sessions are defined as the LVS streams with 75th- to 90th-percentile retran_ratios or descending goodput/srtt in all last-period sessions.
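A minimal sketch of this percentile-based extraction is given below (Python; the helper name and the assumption that each session is a dict of QoS values are ours, not part of the deployed system):

    def sessions_by_percentile(sessions, metric, lo_pct, hi_pct, reverse=False):
        # Return the sessions whose rank by `metric` falls between the
        # lo_pct-th and hi_pct-th percentiles of the last-period population.
        ranked = sorted(sessions, key=lambda s: s[metric], reverse=reverse)
        n = len(ranked)
        lo, hi = int(n * lo_pct / 100), int(n * hi_pct / 100)
        return ranked[lo:hi]

    # last_period_sessions: per-session QoS dicts collected in the previous period (assumed).
    # Better-performed: 5th- to 20th-percentile retran_ratio (ascending order);
    # for goodput/srtt the same slice is taken over the descending order.
    better = sessions_by_percentile(last_period_sessions, "retran_ratio", 5, 20)
    worse = sessions_by_percentile(last_period_sessions, "retran_ratio", 75, 90)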
Figure 2: Better- and worse-performed LVS sessions. (a) Average retransmission ratio; (b) Average goodput/srtt.
To explore the similarity between better-performed LVS sessions in the same UG, we analyze the changes of goodput, max_inflight and max_cwnd in two adjacent time periods. Figure 3(a) depicts the cumulative goodput change rates of better- and worse-performed sessions in the same UG. We find that better-performed sessions have little-changed goodput in two adjacent periods, whose average change rate is 25%~30% of that of worse-performed sessions. The cumulative changes of max_cwnd and max_inflight for better- and worse-performed sessions are shown in Figures 3(b) and 3(c), in which the curve slope represents the change rate relative to last-period values. We can see that better-performed LVS sessions introduce smoother max_cwnd and max_inflight changes than worse-performed traffic. For example, worse-performed sessions cause 3.0×~5.1× larger changes of max_cwnd and 1.3×~3.0× larger changes of max_inflight than better-performed LVS streams. Therefore, better-performed LVS sessions in the same UG perform similarly with little-changed QoS values, which can be leveraged to automatically configure CC parameters for the next period.
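The change rates reported in Figure 3 can be reproduced by comparing the mean QoS values of adjacent periods, as the following sketch illustrates (we assume the per-period means are already aggregated; the function names are ours):

    def change_rate(prev, curr):
        # Relative change of a QoS value between two adjacent periods.
        return abs(curr - prev) / prev if prev else 0.0

    def cumulative_change(period_means, key):
        # period_means: chronologically ordered per-period mean QoS dicts.
        total, curve = 0.0, []
        for prev, curr in zip(period_means, period_means[1:]):
            total += change_rate(prev[key], curr[key])
            curve.append(total)  # the slope reflects how fast the metric drifts
        return curve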
Figure 3: The QoS similarities of better-performed LVS sessions in the same UG. (a) Cumulative goodput change rates; (b) Cumulative max_cwnd changes; (c) Cumulative max_inflight changes.
4 THE AUTOPLEX DESIGN
4.1 AutoPlex Framework
AutoPlex enhances LVS transmission performance by automatically multiplexing the inter-session QoS similarities of better-performed LVS sessions in the last period, which are used to configure next-period CC parameters. In particular, the maximum values of cwnd and pacing_rate are configured to stay little-changed, based on the QoS values of the latest better-performed sessions. To meet diverse requirements, the AutoPlex framework supports user-defined QoS policies (e.g., lowering retran_ratio or promoting the value of goodput/srtt), which are followed to learn the similar QoS metrics of better-performed LVS sessions. Figure 4 depicts the AutoPlex framework, which contains three modules for measurement, decision and execution. Note that these modules can be deployed on the same CDN proxy server or on different servers.
Figure 4: The framework of AutoPlex (measurement, decision and execution modules, connected through the user-defined QoS policies and the <IP, UG> and <UG, parameters> mapping tables).
The measurement module performs data collection over all LVS sessions and obtains the last-period QoS values of each UG by looking up the pre-established <IP, UG> mapping table. Following the user-defined QoS policies, the decision module obtains the better-performed LVS sessions and records their similar QoS values (e.g., goodput, max_cwnd and max_inflight), which will be utilized for next-period CC parameter configurations. When receiving a new LVS request, the execution module extracts the remote IP and gains the UG identifier UG_ID by inquiring the pre-established <IP, UG> table. With the <UG, parameters> mapping table, the CC parameters (i.e., max_cwnd and max_pacing_rate) of this UG are obtained and then configured for the response of this LVS session. In AutoPlex, the measurement module counts the QoS values of each UG for every time interval T (e.g., T = 30 mins), which are utilized by the decision module to maintain the latest <UG, parameters> mapping table in the execution module.
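The per-request behavior of the execution module can be summarized by the sketch below (the two mapping tables are modeled as plain dictionaries and the fallback to default parameters is an assumption for illustration; in the real proxy the chosen values are applied inside the QUIC/BBR stack):

    def on_new_lvs_request(client_ip, ip_to_ug, ug_to_params, default_params):
        # Execution module: pick CC parameters for the responding LVS session.
        ug_id = ip_to_ug.get(client_ip)                   # <IP, UG> mapping table
        params = ug_to_params.get(ug_id, default_params)  # <UG, parameters> mapping table
        return {"max_cwnd": params["max_cwnd"],
                "max_pacing_rate": params["max_pacing_rate"]}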
Algorithm 1 CC parameter configuration pseudo code.
1: function CCConfiguration()
2: Require:
3:     P_usr, Q_ug
4: Compute:
5:     Q_better ← Extract(Q_ug, P_usr)
6:     goodput, max_inflight, max_cwnd ← mean(Q_better)
7:     pacing_rate_last = goodput
8:     max_cwnd_last = (max_inflight + max_cwnd) / 2
9: Operate:
10:    pacing_rate_cpt = pacing_gain × max_bw
11:    pacing_rate = f(pacing_rate_cpt, pacing_rate_last)
12:    inflight_cap = f(inflight_cap, max_cwnd_last)
13: end function
4.2 Design Details
AutoPlex focuses on achieving automated configuration of CC parameters by multiplexing the similar QoS values of those LVS sessions that performed better in the last period. In this paper, the more fine-grained grouping rule (i.e., state-city-ASN) is leveraged to classify LVS consumers. Algorithm 1 shows the pseudo code for CC parameter configuration, where the user-defined policies P_usr and each UG's QoS values Q_ug (i.e., retran_ratio, goodput, srtt, max_inflight and max_cwnd) act as inputs for the parameter computations.
Better-performed session extraction. AutoPlex enables the decision module to extract the similar QoS values (denoted by Q_better) of better-performed LVS sessions in each UG. In Algorithm 1, the function Extract(·) sorts Q_ug according to the P_usr-related metric (e.g., retran_ratio) and then obtains Q_better from the i-th to j-th percentile values of the sorted Q_ug. For example, i = 5 and j = 20 are set to extract Q_better from Q_ug (ascending by retran_ratio) in §3.4. Then, the average values of goodput, max_inflight and max_cwnd are obtained by taking the mean over Q_better.
CC parameter configuration. With the goodput, max_inflight and max_cwnd of better-performed sessions, AutoPlex first computes pacing_rate_last and max_cwnd_last, which reflect the average pacing_rate and our desired max_cwnd, respectively, in the last period. Then the pacing rate and the inflight cap can be configured for next-period LVS traffic transmissions, which will be elaborated in §4.3.
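Lines 5–8 of Algorithm 1 thus reduce the extracted set Q_better to these two quantities. A sketch of the computation (assuming Q_better is a list of per-session QoS dicts and a simple arithmetic mean is used):

    def decision_outputs(q_better):
        # Average the QoS values of the better-performed sessions (line 6 of Algorithm 1).
        def mean(key):
            return sum(s[key] for s in q_better) / len(q_better)
        goodput = mean("goodput")
        max_inflight = mean("max_inflight")
        max_cwnd = mean("max_cwnd")
        pacing_rate_last = goodput                       # line 7
        max_cwnd_last = (max_inflight + max_cwnd) / 2    # line 8
        return pacing_rate_last, max_cwnd_last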
Figure 5: The experimental evaluation (CDN servers, the CDN proxy, and UG 1~UG 4 connected over the Internet).
4.3 AutoPlex-based BBR Implementation
In this section, we give an example of how BBR makes use of the
two outputs of AutoPlex, i.e., pacing_rate_last and max_cwnd_last,
to adjust its pacing rate and inflight cap.
Adjust pacing rate. A BBR sender controls its pacing_rate with the help of pacing and an estimated maximum bandwidth max_bw, where the estimate of max_bw is based on a windowed maximum filter of the delivery rate that the receiver experiences. The pacing rate is varied in an eight-phase cycle using a pacing_gain of 5/4, 3/4, 1, 1, 1, 1, 1, 1, where each phase lasts for an RTT. The BBR implementation based on AutoPlex computes the pacing rate (i.e., pacing_rate_cpt) as line 11 of Algorithm 1 shows. In this paper, f(·) is a customizable function chosen according to the optimization goal. For lower retran_ratio or higher goodput/srtt, we set f(·) to a function that takes the minimum value when a ≤ b and the average value when a > b, as follows:

f(a, b) =
\begin{cases}
a, & a \le b \\
\frac{a+b}{2}, & a > b
\end{cases}
\qquad (1)
Adjust inflight cap. Although BBR does not use a congestion window or ACK clocking to control the amount of inflight data, it uses an inflight data limit (e.g., 2·bdp), which we call the inflight cap (denoted by inflight_cap) in this paper. The BBR implementation based on AutoPlex computes the inflight cap as line 12 of Algorithm 1 shows. This keeps the maximum value of inflight_cap little-changed, especially when the computed inflight_cap exceeds the last-period max_cwnd_last.
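Putting Eq. (1) together with lines 10–12 of Algorithm 1, the per-period adjustment an AutoPlex-based BBR sender performs can be sketched as follows (pacing_gain, max_bw and the current inflight_cap come from the BBR state; wiring into a real BBR implementation is omitted):

    def f(a, b):
        # Eq. (1): take the smaller value when a <= b, otherwise the average.
        return a if a <= b else (a + b) / 2

    def adjust_bbr(pacing_gain, max_bw, inflight_cap, pacing_rate_last, max_cwnd_last):
        pacing_rate_cpt = pacing_gain * max_bw                  # line 10
        pacing_rate = f(pacing_rate_cpt, pacing_rate_last)      # line 11
        inflight_cap = f(inflight_cap, max_cwnd_last)           # line 12
        return pacing_rate, inflight_cap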
Discussion. In a real-world deployment, we can activate AutoPlex only for some selected LVS traffic, e.g., high-priority sessions from VIP users. In this case, low-priority sessions are leveraged to explore better QoS metrics, which are then multiplexed by the high-priority sessions. This partial deployment strategy enhances the convergence of AutoPlex. Note that, even when the same CC parameters are configured, better- and worse-performed LVS sessions in the same UG can still be identified as network conditions and statuses change. Therefore, the idea of multiplexing inter-session QoS metrics for CC parameter configurations can always be applied over time.
Figure 6: Retransmission ratios of AutoPlex (Baseline vs. AutoPlex for UG 1~UG 4). (a) Average retran_ratio; (b) 50th-percentile retran_ratio; (c) 75th-percentile retran_ratio; (d) 90th-percentile retran_ratio.
5 EXPERIMENTAL EVALUATION
In this section, we implement the AutoPlex prototype and partially deploy it in a real live CDN proxy (as Figure 5 shows) that is developed based on Nginx [22] and hosts the measurement, decision and execution modules of AutoPlex⁴. We randomly select four UGs (i.e., UG 1~UG 4) to evaluate AutoPlex using the following metrics: retran_ratio and goodput/srtt, which can be set as user-defined policies. The time period is set to 30 mins, and the LVS sessions in our evaluation all use the QUIC protocol and the BBRv1 scheme (as the baseline) for transmission.
5.1 Retransmission Ratio
In the AutoPlex framework, users can define their own preference as an optimization goal (e.g., lowering retran_ratio) for LVS traffic transmission. In this section, we select better-performed sessions with 5th- to 20th-percentile retran_ratio for the CC parameter configuration of next-period LVS traffic.
Figure 6 depicts the average, 50th-, 75th- and 90th-percentile retran_ratio changes of different UGs, respectively, when deploying the AutoPlex prototype. We can see that AutoPlex introduces lower retran_ratio, where the average, 75th- and 90th-percentile retran_ratios are reduced by 24%~27%, 28%~41% and 32%~44%, respectively, compared to the existing BBR scheme. Note that the 50th-percentile retran_ratio does not gain an obvious optimization, as it always stays close to optimal values (i.e., 0.37%~0.70%) that are difficult to optimize further. Meanwhile, AutoPlex brings differentiated reduction ratios for different UGs when optimizing their retran_ratios. This is because traditional fixed CC parameter settings for entire networks introduce various baseline values for each UG, while AutoPlex adjusts CC parameters based on each UG's network conditions.
⁴ In this paper, we mainly focus on LVS transmission optimization from the CDN proxy to user clients. The video streaming interaction itself is beyond the scope of our research.
Figure 7: Goodput/srtt values of AutoPlex (Baseline vs. AutoPlex for UG 1~UG 4). (a) Average goodput/srtt; (b) 50th-percentile goodput/srtt; (c) 75th-percentile goodput/srtt; (d) 90th-percentile goodput/srtt.
5.2 Ratio of Goodput to SRTT
AutoPlex enhances goodput/srtt by multiplexing the QoS values of better-performed LVS sessions to configure max_cwnd and max_pacing_rate for the next LVS sessions. Note that the sessions with 80th- to 95th-percentile values of goodput/srtt in the last period are extracted as better-performed LVS sessions.
Figure 7 shows the values of goodput/srtt in different UGs. We can see that AutoPlex achieves a better optimization for the value of goodput/srtt, in which the average, 50th- and 75th-percentile values are all significantly enhanced (i.e., by 14%~32%, 44%~128%, and 19%~56%, respectively). By contrast, the 90th-percentile values are not improved obviously. This might be because these values are already close to the optimum and have limited room for further optimization. AutoPlex introduces differentiated optimization results for different UGs. For example, 32% and 14% improvements of average goodput/srtt are obtained for UG 1 and UG 3, respectively. For the metric of goodput/srtt, AutoPlex achieves obvious optimization for poorer performances (e.g., 50th- and 75th-percentile values) while keeping plausible improvements for already-better performances.
6 CONCLUSION AND FUTURE WORK
In this paper, we take a first step to design, analyze, implement and evaluate an inter-session multiplexing congestion control framework named AutoPlex, which is based on our large-scale network measurements of live video streams. AutoPlex enables adaptive CC parameter configurations by multiplexing the QoS values of better-performed LVS traffic in the last period. Besides, user-defined policies are also supported in AutoPlex, acting as criteria to learn the QoS features of better-performed LVS traffic. We complete the experimental evaluation by implementing and deploying AutoPlex in a real live CDN proxy, and the results demonstrate the great potential that AutoPlex brings to optimizing LVS traffic transmission.
In the future, machine learning such as DRL can be leveraged to select better-performed LVS sessions based on the QoS performance gains of prior session selections. Besides, exploring more parameter settings [23, 24] by multiplexing inter-session QoS values is also a natural extension of AutoPlex.
ACKNOWLEDGMENTS
We would like to thank the NAI reviewers and Dr. Kai Gao for their extensive and valuable feedback. We also thank Senzhen Liu for his kind comments during our experimental evaluation.
REFERENCES
[1] Max Wilbert. Top 22 live streaming platforms: Everything you need to know. https://www.dacast.com/blog/, 2022.
[2] Van Jacobson. Congestion avoidance and control. In ACM SIGCOMM Computer Communication Review, volume 18, pages 314–329. ACM, 1988.
[3] Sangtae Ha, Injong Rhee, and Lisong Xu. CUBIC: A new TCP-friendly high-speed TCP variant. ACM SIGOPS Operating Systems Review, 42(5):64–74, 2008.
[4] Neal Cardwell, Yuchung Cheng, C Stephen Gunn, Soheil Hassas Yeganeh, and Van Jacobson. BBR: Congestion-based congestion control: Measuring bottleneck bandwidth and round-trip propagation time. ACM Queue, 14(5):20–53, 2016.
[5] Mo Dong, Qingxi Li, Doron Zarchy, P Brighten Godfrey, and Michael Schapira. PCC: Re-architecting congestion control for consistent high performance. In USENIX NSDI, pages 395–408, 2015.
[6] Xiaohui Nie, Youjian Zhao, Zhihan Li, Guo Chen, Kaixin Sui, Jiyang Zhang, Zijie Ye, and Dan Pei. Dynamic TCP initial windows and congestion control schemes through reinforcement learning. IEEE JSAC, 37(6):1231–1247, 2019.
[7] Xu Li, Feilong Tang, Jiacheng Liu, Laurence T Yang, Luoyi Fu, and Long Chen. AUTO: Adaptive congestion control based on multi-objective reinforcement learning for the satellite-ground integrated network. In USENIX ATC, 2021.
[8] Francis Y Yan, Jestin Ma, Greg D Hill, Deepti Raghavan, Riad S Wahby, Philip Levis, and Keith Winstein. Pantheon: The training ground for internet congestion-control research. In USENIX ATC, pages 731–743, 2018.
[9] Lawrence S Brakmo, Sean W O'Malley, and Larry L Peterson. TCP Vegas: New techniques for congestion detection and avoidance, volume 24. ACM, 1994.
[10] Lisong Xu, Khaled Harfoush, and Injong Rhee. Binary increase congestion control (BIC) for fast long-distance networks. In IEEE INFOCOM, volume 4, pages 2514–2524, 2004.
[11] Yasir Zaki, Thomas Pötsch, Jay Chen, Lakshminarayanan Subramanian, and Carmelita Görg. Adaptive congestion control for unpredictable cellular networks. In ACM SIGCOMM, pages 509–522, 2015.
[12] Keith Winstein, Anirudh Sivaraman, and Hari Balakrishnan. Stochastic forecasts achieve high throughput and low delay over cellular networks. In USENIX NSDI, pages 459–472, 2013.
[13] Venkat Arun and Hari Balakrishnan. Copa: Practical delay-based congestion control for the internet. In USENIX NSDI, pages 329–342, 2018.
[14] Tong Li, Kai Zheng, and Ke Xu. Acknowledgment on demand for transport control. IEEE Internet Computing, 25(2):109–115, 2021.
[15] Tong Li, Kai Zheng, Ke Xu, Rahul Arvind Jadhav, Tao Xiong, Keith Winstein, and Kun Tan. TACK: Improving wireless transport performance by taming acknowledgments. In ACM SIGCOMM, pages 15–30, 2020.
[16] Mo Dong, Tong Meng, Doron Zarchy, Engin Arslan, Yossi Gilad, Brighten Godfrey, and Michael Schapira. PCC Vivace: Online-learning congestion control. In USENIX NSDI, pages 343–356, 2018.
[17] Keith Winstein and Hari Balakrishnan. TCP ex machina: Computer-generated congestion control. ACM SIGCOMM Computer Communication Review, 43(4):123–134, 2013.
[18] Soheil Abbasloo, Chen-Yu Yen, and H Jonathan Chao. Classic meets modern: A pragmatic learning-based congestion control for the internet. In ACM SIGCOMM, 2020.
[19] Michael Schapira and Keith Winstein. Congestion-control throwdown. In ACM HotNets, pages 122–128, 2017.
[20] Junchen Jiang, Shijie Sun, Vyas Sekar, and Hui Zhang. Pytheas: Enabling data-driven quality of experience optimization using group-based exploration-exploitation. In USENIX NSDI, 2017.
[21] Minsik Cho and Daniel Brand. MEC: Memory-efficient convolution for deep neural network. In International Conference on Machine Learning, pages 815–824. PMLR, 2017.
[22] F5 Networks Inc. NGINX: Advanced load balancer, web server, & reverse proxy. https://www.nginx.com, 2022.
[23] Tong Li, Kai Zheng, Ke Xu, Rahul Arvind Jadhav, Tao Xiong, Keith Winstein, and Kun Tan. Revisiting acknowledgment mechanism for transport control: Modeling, analysis, and implementation. IEEE/ACM TON, 29(6):2678–2692, 2021.
[24] Hui Xie and Li Tong. Revisiting loss recovery for high-speed transmission. In IEEE WCNC, pages 1–6, 2022.