Article

TCP ex Machina: Computer-Generated Congestion Control

Authors: Keith Winstein and Hari Balakrishnan

Abstract

This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint’s reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion control algorithms to run at the endpoints. In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm—the control rules for the independent endpoints—that tries to achieve this objective. In simulations with ns-2, Remy-generated algorithms outperformed human-designed end-to-end techniques, including TCP Cubic, Compound, and Vegas. In many cases, Remy’s algorithms also outperformed methods that require intrusive in-network changes, including XCP and Cubic-over-sfqCoDel (stochastic fair queueing with CoDel for active queue management). Remy can generate algorithms both for networks where some parameters are known tightly a priori, e.g. datacenters, and for networks where prior knowledge is less precise, such as cellular networks. We characterize the sensitivity of the resulting performance to the specificity of the prior knowledge, and the consequences when real-world conditions contradict the assumptions supplied at design-time.
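Concretely, a Remy-generated controller ("RemyCC") behaves like a lookup from a small congestion memory to a sending action. The sketch below is a minimal illustration of that idea rather than the authors' implementation: the memory features (ack_ewma, send_ewma, rtt_ratio) and the action fields (window multiplier, window increment, minimum intersend interval) follow the paper's description, while the rule table contents and helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    ack_ewma: float     # EWMA of inter-ACK intervals (ms)
    send_ewma: float    # EWMA of inter-send intervals echoed by ACKs (ms)
    rtt_ratio: float    # current RTT / minimum RTT seen so far

@dataclass
class Action:
    window_mult: float       # multiplier applied to the congestion window
    window_incr: float       # additive increment (packets)
    min_intersend_ms: float  # pacing: minimum gap between packet sends

# A hypothetical two-rule table; a real RemyCC contains many more rules,
# each covering a rectangular region of the memory space.
RULE_TABLE = [
    (lambda m: m.rtt_ratio < 1.2, Action(1.0, 1.0, 0.0)),  # lightly loaded: grow the window
    (lambda m: True,              Action(0.8, 0.0, 2.0)),  # queue building: back off and pace
]

def on_ack(memory: Memory, cwnd: float):
    """Look up the action for the current memory and apply it to the window."""
    for predicate, action in RULE_TABLE:
        if predicate(memory):
            new_cwnd = max(1.0, cwnd * action.window_mult + action.window_incr)
            return new_cwnd, action.min_intersend_ms
    return cwnd, 0.0

print(on_ack(Memory(ack_ewma=5.0, send_ewma=5.0, rtt_ratio=1.1), cwnd=10))
```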


... In addition, machine learning can also assist decision-making that facilitates network scheduling (B. Mao et al., 2017) and parameter adaptation according to the current states of the environment (Dong, Li, Zarchy, Godfrey, & Schapira, 2015; Winstein & Balakrishnan, 2013). Second, many network problems require interaction with complex system environments. ...
... Finally, each network scenario may have different characteristics (e.g., traffic patterns and network states), and researchers usually need to solve the problem independently for each scenario. Machine learning can provide new possibilities for building a general model with a uniform training method (Dong et al., 2015; Winstein & Balakrishnan, 2013). ...
... Efficient resource management and network adaptation are key to improving network system performance. This requires traffic scheduling together with congestion control (Dong et al., 2015; Winstein & Balakrishnan, 2013) and routing control (B. Mao et al., 2017). ...
Chapter
Full-text available
The effects on our lives of network systems, which have reached the most remote corners of the world, are more important today than ever before. In efforts to remedy the shortcomings of this enormous system, which spans all areas of life including production and transportation, and to make it more efficient in terms of resource usage and performance, it is impossible not to use Artificial Intelligence (AI) techniques, whose impact is felt in every field. Significant progress has been made in work on "Artificial Intelligence in Computer Networks", whose history goes back many years, but much more work is needed. AI will enable the development of systems that can operate without requiring maintenance by reducing human intervention in network systems, and will reduce human-induced risks. It will also allow network elements to reshape themselves according to instantaneous needs, thereby enabling more efficient use of resources. At the physical layer, AI has contributed to efforts to reduce losses and errors, especially in data transmission. At the other layers, work has been done on delivering IP-based packets to their destinations faster and with fewer resources according to their contents, and on allowing network resources to reconfigure themselves autonomously when needed by preventing traffic congestion. Another important area for AI in network systems is cybersecurity: by monitoring the entire system, AI can detect possible abnormal state changes, attacks, and security vulnerabilities. This study examines in which areas and how AI is used in computer networks, and also discusses the challenges AI faces while contributing to the development of computer networks.
... For example, packet loss-based CCs like Cubic [4] cannot distinguish packet drops caused by congestion or non-congestion-related events [7]. Researchers have tried to construct CC algorithms with machine learning approaches to address these limitations [7][8][9][10][11]. The insight is that the CC decisions are dependent on traffic patterns and network circumstances, which can be exploited by deep reinforcement learning (RL) to learn a policy for each scenario. ...
... Researchers have also investigated the use of machine learning to construct better heuristics. Indigo [10] and Remy [11] use offline learning to obtain high-performance CC algorithms. PCC [28] and PCC Vivace [9] opt for online learning to avoid any hardwired mappings between states and actions. ...
... internal state = 1), 7 and 8 realize two stages of recovery, where the latency inflation ratio starts plateauing and then starts reducing. 11 indicates that stable conditions have been achieved again and the agent is at an optimal sending rate. The internal state is flipped back again to 0 after this recovery. ...
Preprint
Recent advances in TCP congestion control (CC) have achieved tremendous success with deep reinforcement learning (RL) approaches, which use feedforward neural networks (NN) to learn complex environment conditions and make better decisions. However, such "black-box" policies lack interpretability and reliability, and often, they need to operate outside the traditional TCP datapath due to the use of complex NNs. This paper proposes a novel two-stage solution to achieve the best of both worlds: first to train a deep RL agent, then distill its (over-)parameterized NN policy into white-box, light-weight rules in the form of symbolic expressions that are much easier to understand and to implement in constrained environments. At the core of our proposal is a novel symbolic branching algorithm that enables the rule to be aware of the context in terms of various network conditions, eventually converting the NN policy into a symbolic tree. The distilled symbolic rules preserve and often improve performance over state-of-the-art NN policies while being faster and simpler than a standard neural network. We validate the performance of our distilled symbolic rules on both simulation and emulation environments. Our code is available at https://github.com/VITA-Group/SymbolicPCC.
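As a rough sketch of the distillation stage described above (not the authors' symbolic branching algorithm), one can sample network states, label them with the trained NN policy's actions, and fit a small interpretable model to the labels; the state features, the stand-in policy, and the sampling ranges below are all assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def nn_policy(state: np.ndarray) -> float:
    """Stand-in for a trained RL policy: maps (latency_ratio, loss, rate_delta) to a rate change."""
    latency_ratio, loss, rate_delta = state
    return float(np.tanh(1.5 - latency_ratio - 10 * loss + 0.2 * rate_delta))

# 1) Sample states the agent is likely to encounter and label them with the NN policy.
states = rng.uniform([1.0, 0.0, -1.0], [3.0, 0.1, 1.0], size=(5000, 3))
actions = np.array([nn_policy(s) for s in states])

# 2) Distill into a small, human-readable tree (the "white-box" surrogate).
tree = DecisionTreeRegressor(max_depth=3).fit(states, actions)
print("distillation MSE:", np.mean((tree.predict(states) - actions) ** 2))
```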
... Third, current congestion control algorithms are either handcrafted or trained offline, and as a result, they use fixed mappings between network events and congestion control responses. The fixed set of rules is either manually designed ( e.g., BBR [1] and CUBIC [7]) or previously learned actions on observed states from simulated environments (e.g., Remy [8], Indigo [9], Aurora [5], Eagle [3] and Orca [4]). Since these fixed rules may not always apply or the network's dynamics may deviate from those simulated environments, these congestion control algorithms may not perform well on a wide variety of environments, and hence can lack generalization. ...
... Current learningbased congestion control algorithms do not fully solve issues in heuristics. They try to mitigate issues of heuristics by learning congestion control rules through interacting with simulated environments, such as Remy [8], Indigo [9], Eagle [3] and Orca [4]. Since learning occurs on a small subset of environments, it is a challenge to generalize well to a wider variety of environments. ...
... To evaluate the performance of Pareto, we used the Pantheon experimental testbed [9], which is designed to assess new congestion control algorithms by comparing them with existing work. Pantheon has been widely used since its launch [2], [3], [8]. Pantheon provides an emulated network environment based on Mahimahi shells, and results from Pantheon are reproducible and accurately reflect real-world results. ...
Article
Modern-day computer networks are highly diverse and dynamic, calling for fair and adaptive network congestion control algorithms with the objective of achieving the best possible throughput, latency, and inter-flow fairness. Yet, prevailing congestion control algorithms, such as hand-tuned heuristics or those fueled by deep reinforcement learning agents, may struggle to perform well on multiple diverse networks. Besides, many algorithms are unable to adapt to time-varying real-world networking environments; and some algorithms mistakenly overlooked the need of explicitly taking inter-flow fairness into account, and just measured it as an afterthought. In this paper, we propose a new staged training process to train Pareto , a new congestion control algorithm that generalizes well to a wide variety of environments. Different from existing congestion control algorithms running reinforcement learning agents, Pareto is trained for fairness using the first multi-agent reinforcement learning framework that is communication-free. Pareto continues training online adapting to newly observed environments in the real-world. Our extensive array of experiments shows that Pareto (i) performs well in a wide variety of environments, (ii) offers the best fairness when it comes to competing with other flows sharing the same network link, and (iii) improves its performance with online learning to surpass the state-of-the-art.
... With the recent success of machine learning/deep learning on image recognition, video analytics, and natural language processing, there is strong interest in applying machine learning to solving networking problems [5]. The results in [27], [28] provide interesting insights into machine-generated congestion protocols. However, these learning algorithms require offline training based on prior knowledge of the network and can only be adopted in limited situations. ...
... Many TCP variants, such as [17], [33], make control decisions based only on the current state, since the basic assumption is the next state only depends on the current state, no matter what congestion signal they use (i.e., delay or loss). In a recent work [27], the exponential average over historical delay is utilized. This is an intuitive solution to the problem of congestion control because the current state is delayed and partially observable. ...
... We next define a utility function for training and evaluation of the proposed algorithm. For resource allocation/traffic engineering, the α-fairness function is widely adopted [27]. In this model, the utility function is defined as ...
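For context, the standard α-fair utility of a rate x that this snippet alludes to is usually written as follows (individual papers use slight variants):

```latex
U_{\alpha}(x) =
\begin{cases}
\dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \neq 1,\\[4pt]
\log x, & \alpha = 1,
\end{cases}
```

where α = 1 yields proportional fairness and α → ∞ approaches max-min fairness.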
Article
Full-text available
As wired/wireless networks become more and more complex, the fundamental assumptions made by many existing TCP variants may not hold true anymore. In this paper, we develop a model-free, smart congestion control algorithm based on deep reinforcement learning (DRL), which has a high potential in dealing with the complex and dynamic network environment. We present TCP-Drinc, acronym for Deep ReInforcement learNing based Congestion control, which learns from past experience in the form of a set of measured features to decide how to adjust the congestion window size. We present the TCP-Drinc design and validate its performance with extensive ns-3 simulations and comparison with five benchmark schemes.
... Congestion control is amongst the most extensively studied topics in this area, and its importance keeps growing as Internet services and applications (live video, AR/VR, edge computing, IoT, etc.) become ever more demanding and the number of network users steeply rises. Indeed, recent years have witnessed a surge of interest in the design and analysis of congestion control algorithms and protocols (see, e.g., [28,29,6,4,7]). ...
... Congestion control protocols typically fall into two main categories: (1) protocols designed (either handcrafted [29,2] or automatically generated, e.g., by Remy [28]) for a specific network environment, or a predetermined range of such environments (say, mobile networks, satellite networks, datacenter networks, etc.), and (2) "all purpose" protocols designed to perform well across a broad range of environments, e.g., PCC [6,7]. While protocols in the first category might achieve high performance when the network matches their design assumptions, they can suffer from poor performance when this is not so. ...
... Recent algorithms have proposed utility functions that their congestion control algorithm is intended to optimize for. We train our agent to optimize two such functions (the "power" function, a common objective in congestion control, a version of which was used in Remy and Copa [28,3], R_power, ...).
[Figure 7 caption: Fairness of various congestion control schemes on a 32 Mbps, 32 ms latency link with 0% random loss and a 500 KB buffer. Each test was run for two minutes.] ...
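For reference, the "power" objective mentioned here is commonly written as throughput divided by delay, and Remy-style objectives use its logarithmic form with a delay-sensitivity weight; exact forms and weights vary across papers:

```latex
\text{power} = \frac{\text{throughput}}{\text{delay}},
\qquad
U = \log(\text{throughput}) - \delta\,\log(\text{delay}),
```

where δ = 1 recovers the logarithm of power.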
Preprint
Full-text available
We present and investigate a novel and timely application domain for deep reinforcement learning (RL): Internet congestion control. Congestion control is the core networking task of modulating traffic sources' data-transmission rates so as to efficiently and fairly allocate network resources. Congestion control is fundamental to computer networking research and practice, and has recently been the subject of extensive attention in light of the advent of challenging Internet applications such as live video, augmented and virtual reality, Internet-of-Things, and more. We build on the recently introduced Performance-oriented Congestion Control (PCC) framework to formulate congestion control protocol design as an RL task. Our RL framework opens up opportunities for network practitioners, and even application developers, to train congestion control models that fit their local performance objectives based on small, bootstrapped models, or complex, custom models, as their resources and requirements merit. We present and discuss the challenges that must be overcome so as to realize our long-term vision for Internet congestion control.
... Among several 5G multi-connectivity schemes [2], multipath transport protocols, such as multipath Transmission Control Protocol (MPTCP) [3] and multipath QUIC (MPQUIC) [4], have recently gained significant attention. In particular, this is due to the Technical Specification (TS) 23.501 (Release 16) by the 3rd Generation Partnership Project (3GPP) [5], which discusses how 5G systems can take advantage of multipath transport protocols to support the Access Traffic Steering, Switching and Splitting (ATSSS) architecture, ultimately enabling multi-connectivity between 3GPP access, such as Long Term Evolution (LTE) and 5G New Radio (NR), and non-3GPP Wireless Local Area Networks (WLAN), such as WiFi. ...
... Therefore, the assumption is that offline data includes a complete enough set of environment characteristics that could be experienced when the model/policy is actually used. To mention a few examples, offline learning is used to derive offline data-based policies for congestion control using an optimization approach [23], Adaptive Bit Rate (ABR) streaming using DQN [24] or Asynchronous Advantage Actor Critic (A3C) [25], and device resource management using DQN [26] or Support Vector Machine (SVM) [27]. To the best of our knowledge, offline learning is not currently used for multipath scheduling. ...
Preprint
Full-text available
Multipath transport protocols enable the concurrent use of different network paths, benefiting a fast and reliable data transmission. The scheduler of a multipath transport protocol determines how to distribute data packets over different paths. Existing multipath schedulers either conform to predefined policies or to online trained policies. The adoption of millimeter wave (mmWave) paths in 5th Generation (5G) networks and Wireless Local Area Networks (WLANs) introduces time-varying network conditions, under which the existing schedulers struggle to achieve fast and accurate adaptation. In this paper, we propose FALCON, a learning-based multipath scheduler that can adapt fast and accurately to time-varying network conditions. FALCON builds on the idea of meta-learning where offline learning is used to create a set of meta-models that represent coarse-grained network conditions, and online learning is used to bootstrap a specific model for the current fine-grained network conditions towards deriving the scheduling policy to deal with such conditions. Using trace-driven emulation experiments, we demonstrate FALCON outperforms the best state-of-the-art scheduler by up to 19.3% and 23.6% in static and mobile networks, respectively. Furthermore, we show FALCON is quite flexible to work with different types of applications such as bulk transfer and web services. Moreover, we observe FALCON has a much faster adaptation time compared to all the other learning-based schedulers, reaching almost an 8-fold speedup compared to the best of them. Finally, we have validated the emulation results in real-world settings illustrating that FALCON adapts well to the dynamicity of real networks, consistently outperforming all other schedulers.
... Many authors have applied learning-based techniques to improve TCP congestion control, with promising results being reported [5]-[10]. The range of proposals goes from optimization heuristics, such as Remy [11], which searches for the best congestion window size and intersend time based on the current state of the network, to classical Reinforcement Learning (RL) based on tabular Q-learning using Sparse Distributed Memories to approximate state-action values [12]. More recent proposals mainly rely on deep RL algorithms, be it learning congestion control policies from scratch [13]-[20], or hybrids that learn from or cooperate with classical algorithms [21], [22]. ...
... More recent proposals mainly rely on deep RL algorithms, be it learning congestion control policies from scratch [13]-[20], or hybrids that learn from or cooperate with classical algorithms [21], [22]. Most of these works report improvements in comparison to classical TCP algorithms [11], [12], [18], mainly in metrics such as fairness among senders, latency, and overall link utilization. ...
Article
Full-text available
Centralized Radio Access Networks (C-RANs) are improving their cost-efficiency through packetized fronthaul networks. Such a vision requires network congestion control algorithms to deal with sub-millisecond delay budgets while optimizing link utilization and fairness. Classic congestion control algorithms have struggled to optimize these goals simultaneously in such scenarios. Therefore, many Reinforcement Learning (RL) approaches have recently been proposed to deal with such limitations. However, many challenges exist when considering the deployment of RL policies in the real world. This paper deals with the real-time inference challenge, where a deployed policy has to output actions in microseconds. The experiments here evaluate the tradeoff between inference time and performance for a TD3 (Twin-Delayed Deep Deterministic Policy Gradient) policy baseline and simpler Decision Tree (DT) policies extracted from TD3 via a process of policy distillation. The results indicate that DTs with a suitable depth can maintain performance similar to that of the TD3 baseline. Additionally, we show that by converting the distilled DTs to rules in C++, we can make inference time nearly negligible, i.e., on a sub-microsecond time scale. The proposed method enables the use of state-of-the-art RL techniques in congestion control scenarios with tight inference-time and computational constraints.
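The conversion the abstract describes can be pictured as flattening a distilled decision tree into nested conditionals that cost only a handful of comparisons at run time. The sketch below (in Python for consistency with the other examples; the paper emits C++ rules) trains a hypothetical depth-2 policy tree and prints it as if/else source text; the features, training data, and output format are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical distilled policy tree: inputs are (rtt_ratio, loss_rate), output is a rate multiplier.
X = np.array([[1.0, 0.0], [1.1, 0.0], [2.0, 0.01], [3.0, 0.05]])
y = np.array([1.25, 1.10, 0.90, 0.70])
policy = DecisionTreeRegressor(max_depth=2).fit(X, y)

def tree_to_rules(tree, feature_names, node=0, indent="    "):
    """Recursively emit the fitted tree as nested if/else source text."""
    t = tree.tree_
    if t.children_left[node] == -1:  # leaf node
        return f"{indent}return {t.value[node][0][0]:.3f}\n"
    name, thr = feature_names[t.feature[node]], t.threshold[node]
    left = tree_to_rules(tree, feature_names, t.children_left[node], indent + "    ")
    right = tree_to_rules(tree, feature_names, t.children_right[node], indent + "    ")
    return (f"{indent}if {name} <= {thr:.3f}:\n{left}"
            f"{indent}else:\n{right}")

print("def rate_multiplier(rtt_ratio, loss_rate):")
print(tree_to_rules(policy, ["rtt_ratio", "loss_rate"]), end="")
```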
... ML applications in performance optimization concern congestion control (e.g. [Geurts 2004, Winstein 2013], [Mijumbi 2014, Mao 2016, Tayyaba 2020]). Finally, when it comes to networking security, one of the fields where ML has been leveraged a lot is intrusion detection systems (e.g. ...
... It must satisfy the targeted performance goal of providing high throughput (especially goodput in this study) γ for the elephant flow and low completion time τ for the incast traffic. Inspired by the network power metric [Floyd 2008, Winstein 2013], which pursues a similar goal, we define U(τ, γ) = log(γ) − log(τ). ...
Thesis
With the exponential growth in technology performance, the modern world has become highly connected, digitized, and diverse. Within this hyper-connected world, communication networks and the Internet are part of our daily life and play many important roles. However, ever-growing Internet services and applications and massive traffic growth make networks so complex that traditional management functions, mainly governed by human operations, fail to keep the network operational. In this context, Software-Defined Networking (SDN) emerges as a new architecture for network management. It makes networks programmable by bringing flexibility to their control and management. Even if network management is eased, it remains tricky to handle due to the continuous growth of network complexity, so management tasks remain complex. Faced with this, the concept of self-driving networking arose. It consists of leveraging recent technological advancements and scientific innovation in Artificial Intelligence (AI)/Machine Learning (ML) together with SDN. Compared to traditional management approaches using only analytic mathematical models and optimization, this new paradigm is a data-driven approach: management operations leverage the ability of ML to exploit hidden patterns in data to create knowledge. This SDN-AI/ML association, which promises to simplify network management, requires many challenges to be addressed. Self-driving networking, or full network automation, is the "Holy Grail" of this association. In this thesis, two of these challenges retain our attention. The first is efficient data collection with SDN, especially real-time telemetry. For this challenge, we propose COCO (COnfidence-based COllection), a low-overhead, near-real-time data collection framework for SDN. Data of interest is collected efficiently from the data plane to the control plane, where it is used either by traditional management applications or by machine-learning-based algorithms. Second, we tackle the effectiveness of using machine learning to handle complex management tasks. We consider application performance optimization in data centers and propose a machine-learning-based incast performance inference for settings where analytical models struggle to provide general and expert-knowledge-free performance models. With this ML performance model, smart buffering schemes or other QoS optimization algorithms could dynamically optimize traffic performance. These ML-based management schemes are built upon SDN, leveraging its centralized global view, telemetry capabilities, and management flexibility. Our efficient data collection framework and the machine-learning-based performance optimization show promising results. We expect that improved SDN monitoring with AI/ML analytics capabilities can considerably augment network management and mark a big step in the self-driving network journey.
... Xu et al. [93] defined the state as a vector of throughput and delay of all sessions, and the action as the set of split ratios for different sessions. The reward is defined as the sum of the utility functions of all sessions, which are related to end-to-end throughput and delay in the form of the α-fairness model [108]. The parameter α controls the tradeoff between fairness and efficiency. ...
... The action is the change of sending rate, and the reward is represented by a utility function that favors improved throughput but penalizes increases in latency and packet loss. After being trained and tested in Gym using the PPO algorithm, the Aurora agent is shown to perform better than or comparably to benchmarks including BBR [139], PCC-Vivace [140], RemyCC [108], and Copa [141]. A drawback of Aurora is that each agent is treated as an independent learner [142], and coordinated multi-agent decision-making is also unavailable. ...
Article
After decades of unprecedented development, modern networks have evolved far beyond expectations in terms of scale and complexity. In many cases, traditional traffic engineering (TE) approaches fail to address the quality of service (QoS) requirements of modern networks. In recent years, deep reinforcement learning (DRL) has proved to be a feasible and effective solution for autonomously controlling and managing complex systems. Massive growth in the use of DRL applications in various domains is beginning to benefit the communications industry. In this paper, we firstly provide a comprehensive overview of DRL-based TE. Then, we present a detailed literature review on applications of DRL for TE including three fundamental issues: routing optimization, congestion control, and resource management. Finally, we discuss our insights into the challenges and future research perspectives of DRL-based TE.
... Of late, deterministic efforts have been explored to minimize operational overhead and improve traffic prediction accuracy by using features of the data stream, instead of relying on the traffic volume [23]. As compared to the TSF methods, traffic forecasting as a non-time series forecasting (Non-TSF) problem can be modeled using other approaches and characteristics [22], [24]. Similarly, instead of depending only on the traffic volume, efforts have been made to create a frequency domain-based model for network traffic streams [25]. ...
... As a result, TCP decreases its transmission rate unnecessarily, at every single observed packet loss, reducing end-to-end control of the bandwidth in different networks. Therefore, TCP throughput for wireless networks can be increased by correctly determining the cause of packet loss and lowering the rate of transmission whenever congestion is observed [24], [30]. Currently, there is no method for TCP congestion control to identify the cause of the packet loss. ...
Article
Full-text available
Recent years have seen a surge in the use of technology for executing transactions in both online and offline modes. Various industries like banking, e-commerce, and private organizations use networks for the exchange of confidential information and resources. Network security is thus of utmost importance, with the expectation of effective and efficient analysis of the network traffic. Wireless Mesh Networks are effective in communicating information over a vast span with minimal costs. A network is evaluated based on its security, accessibility, and extent of interoperability. Artificial Intelligence techniques like machine learning and deep learning have found widespread use to solve a range of challenging, real-world problems. These techniques are well known for their ability to detect issues or patterns in traffic along with advancements in computing capabilities. Extensive research is being carried out to improve the performance of Wireless Mesh Networks. This survey aims to provide a disinterested overview of the application of different artificial intelligence techniques to enhance network performance. We focus on approaches that address the three fundamental problems in networking: traffic prediction, traffic routing, and congestion control. Our paper also includes the bibliometric analysis of the literature, highlighting the ongoing efforts in terms of statistics across multiple metrics. This survey aims to provide researchers in this community with a reliable compendium to get a brief yet succinct understanding of the current progress in the domain.
... Over the last few years, machine learning approaches have gained momentum for throughput prediction. The proposed solutions include the prediction of shorter horizons [5,6,7] and longer horizons [8]. The typical approach includes applying the Recurrent Neural Network (RNN) architecture for prediction tasks [9]. ...
Article
Full-text available
AI-driven data analysis methods have garnered attention in enhancing the performance of wireless networks. One such application is the prediction of downlink throughput in mobile cellular networks. Accurate throughput predictions have demonstrated significant application benefits, such as improving the quality of experience in adaptive video streaming. However, the high degree of variability in cellular link behaviour, coupled with device mobility and diverse traffic demands, presents a complex problem. Numerous published studies have explored the application of machine learning to address this problem, displaying potential when trained and evaluated with traffic traces collected from operational networks. The focus of this paper is an empirical investigation of machine learning-based throughput prediction that runs in real-time on a smartphone, and its evaluation with video streaming in a range of real-world cellular network settings. We report on a number of key challenges that arise when performing prediction “in the wild”, dealing with practical issues one encounters with online data (not traces) and the limitations of real smartphones. These include data sampling, distribution shift, and data labelling. We describe our current solutions to these issues and quantify their efficacy, drawing lessons that we believe will be valuable to network practitioners planning to use such methodologies in operational cellular networks.
... However, no single CCA can adequately prevail across all environments [2,3]. To satisfy the increasingly diverse application requirements over highly complex network conditions, learning-based CCAs have gained much attention recently [4][5][6][7][8]. However, their exploration-based models may make mistakes or take dangerous actions, resulting in poor performance. ...
... Consequently, DRL has also been adopted to solve resource allocation problems in cloud systems. In the context of cloud systems, RL has been applied to design congestion control protocols [38] and develop simple resource management systems by treating the problem as learning packing tasks with multiple resource demands [39]. Luan et al. [40] proposed a GPU cluster scheduler that leverages DRL for intelligent locality-aware scheduling of deep learning training jobs. ...
Preprint
Full-text available
This paper addresses the important need for advanced techniques in continuously allocating workloads on shared infrastructures in data centers, a problem arising due to the growing popularity and scale of cloud computing. It particularly emphasizes the scarcity of research ensuring guaranteed capacity in capacity reservations during large-scale failures. To tackle these issues, the paper presents scalable solutions for resource management. It builds on the prior establishment of capacity reservation in cluster management systems and the two-level resource allocation problem addressed by the Resource Allowance System (RAS). Recognizing the limitations of Mixed Integer Linear Programming (MILP) for server assignment in a dynamic environment, this paper proposes the use of Deep Reinforcement Learning (DRL), which has been successful in achieving long-term optimal results for time-varying systems. Because directly applying DRL algorithms to large-scale instances with millions of decision variables is impractical, a novel two-level design that utilizes a DRL-based algorithm is introduced to solve the optimal server-to-reservation assignment, taking into account fault tolerance, server movement minimization, and network affinity requirements. The paper explores the interconnection of these levels and the benefits of such an approach for achieving long-term optimal results in the context of large-scale cloud systems. We further show in the experiment section that our two-level DRL approach outperforms the MIP solver and heuristic approaches and exhibits significantly reduced computation time compared to the MIP solver. Specifically, our two-level DRL approach performs 15% better than the MIP solver on minimizing the overall cost. Also, it uses only 26 seconds to execute 30 rounds of decision making, while the MIP solver needs nearly an hour.
... It is a decision-tree-based classifier that analyzes the delay and inter-arrival times of ACKs. TCP ex Machina's Remy program [16] produces computer-generated congestion control algorithms. It analyzes prior network assumptions and traffic for a specific objective, such as achieving higher throughput. ...
Article
Full-text available
TCP (Transmission Control Protocol) provides connection-oriented and reliable communication. TCP's default treatment of any loss as a sign of network congestion degrades end-to-end performance in wireless networks, especially in MANETs (Mobile Ad hoc Networks). TCP should identify various non-congestion losses, such as channel loss and route failure loss, to act accordingly. Over the years, researchers have proposed machine learning based network protocols for accurate and efficient decision making. Reinforcement learning is better suited to dynamic networks with unpredictable traffic and topology. TCP-RLLD (TCP with Reinforcement Learning based Loss Differentiation) is an end-to-end transport layer solution to predict the cause of a packet loss. TCP's default treatment of any loss as a congestion loss is overruled by TCP-RLLD to avoid unnecessary reduction of the transmission rate. TCP-RLLD is evaluated with multiple TCP variants for Mobile Ad hoc Networks. The extensive evaluation is performed with the NS-3 simulator. This paper discusses the TCP-RLLD architecture along with details of the performance improvement.
... To enhance the fairness among flows, we apply the function log(·) to throughput and the functions (·)^α1 and (·)^α2 (α1, α2 ≥ 1) to latency and packet loss ratio, respectively. Such a fairness-enhanced utility function is in line with many previous works like [15], [22]. In this paper, we set α1 = 1 and α2 = 1. ...
Preprint
Full-text available
This work is under review at IEEE Transactions on Parallel and Distributed Systems.
... As learning techniques became popular, there were attempts to automatically perform the task of congestion control. Winstein et al. [58] designed Remy, a distributed congestion control solution for heterogeneous and dynamic network environments. Remy formulates congestion control as an optimization problem and implements an offline mapping from all possible events to good actions using a dynamic programming approach. ...
Article
Full-text available
Bandwidth prediction is critical in any Real-time Communication (RTC) service or application. This component decides how much media data can be sent in real time. Subsequently, the video and audio encoder dynamically adapts the bitrate to achieve the best quality without congesting the network and causing packets to be lost or delayed. To date, several RTC services have deployed the heuristic-based Google Congestion Control (GCC), which performs well under certain circumstances and falls short in some others. In this paper, we leverage the advancements in reinforcement learning and propose BoB (Bang-on-Bandwidth) — a hybrid bandwidth predictor for RTC. At the beginning of the RTC session, BoB uses a heuristic-based approach. It then switches to a learning-based approach. BoB predicts the available bandwidth accurately and improves bandwidth utilization under diverse network conditions compared to the two winning solutions of the ACM MMSys'21 grand challenge on bandwidth estimation in RTC. An open-source implementation of BoB is publicly available for further testing and research.
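The hybrid behaviour described in the abstract can be sketched as a controller that starts from a heuristic estimate and hands over to a learned predictor once enough feedback reports have accumulated; the warm-up threshold, the delay-gradient heuristic, and the dummy learned model below are placeholders, not BoB's actual logic.

```python
class HybridBandwidthPredictor:
    """Start with a heuristic estimate, switch to a learned model once warmed up."""

    def __init__(self, learned_model, warmup_reports: int = 50):
        self.learned_model = learned_model        # e.g., a trained RL policy or regressor
        self.warmup_reports = warmup_reports
        self.reports = []

    def heuristic(self, report: dict) -> float:
        # Delay-gradient style back-off: shrink the estimate when queuing delay grows.
        backoff = 0.85 if report["delay_gradient"] > 0 else 1.05
        return report["receive_rate"] * backoff

    def predict(self, report: dict) -> float:
        self.reports.append(report)
        if len(self.reports) < self.warmup_reports:
            return self.heuristic(report)
        return self.learned_model(report)         # learned predictor takes over

# Usage with a dummy "learned" model:
predictor = HybridBandwidthPredictor(learned_model=lambda r: r["receive_rate"],
                                     warmup_reports=2)
print(predictor.predict({"receive_rate": 1.2e6, "delay_gradient": 0.0}))  # heuristic phase
print(predictor.predict({"receive_rate": 1.1e6, "delay_gradient": 0.1}))  # learned phase
```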
... This rate parameter is adapted to the path capabilities to determine a sending rate, so that the probability that packets get queued is small. A congestion control algorithm designed by an automatic learning process from timestamps of sent and received packets and RTT estimates is presented in [29]. The results show that algorithms that are trained automatically outperform designed algorithms such as TCP Reno or TCP Cubic. ...
Article
Full-text available
Selection of the optimal transmission rate in packet-switched best-effort networks is challenging. Typically, senders do not have any information about the end-to-end path and should not congest the connection while at the same time fully utilizing it. The pursuit of these goals has led to congestion control protocols such as TCP Reno, TCP Cubic, or TCP BBR that adapt the sending rate according to extensive measurements of the path characteristics by monitoring packets and related acknowledgments. To improve and speed up this adaptation, we propose and evaluate a machine learning approach for the prediction of sending rates from measurements of metrics provided by the TCP stack. For the prediction, a neural network is trained and evaluated. The prediction is implemented in the TCP stack to speed up TCP slow start. For a customizable and performant implementation, the extended Berkeley Packet Filter is used to extract relevant data from the kernel-space TCP stack, to forward the monitoring data to a user-space data rate prediction, and to feed the prediction result back to the stack. Results from an online experiment show improvements in flow completion time of up to 30%.
... For example, Vivace [16] and PCC [5] adjust sending rates in real time and determine the size of the increment according to the gradient of a performance utility function. RemyCC [17] iteratively searches for a state-action mapping table to maximize an objective function. However, this intra-session estimation still suffers from inflexibility since it relies on the stability and predictability of the underlying network. ...
... ML benefits various networking applications, e.g., congestion control [63], [64], intrusion detection systems [55], [65], traffic classification [66], [67], and task scheduling [7], [9]. It allows inferring system states from networking features. ...
Article
Full-text available
In order to dynamically manage and update networking policies in cloud data centers, Virtual Network Functions (VNFs) use, and therefore actively collect, networking state information, and in the process incur additional control signaling and management overhead, especially in larger data centers. In the meantime, VNFs in production prefer distributed and straightforward heuristics over advanced learning algorithms to avoid intractable additional processing latency under high-performance and low-latency networking constraints. This paper identifies the challenges of deploying learning algorithms in the context of cloud data centers, and proposes Aquarius to bridge the application of machine learning (ML) techniques to distributed systems and service management. Aquarius passively yet efficiently gathers reliable observations, and enables the use of ML techniques to collect, infer, and supply accurate networking state information without incurring additional signaling and management overhead. It offers fine-grained and programmable visibility to distributed VNFs, and enables both open- and closed-loop control over networking systems. This paper illustrates the use of Aquarius with a traffic classifier, an auto-scaling system, and a load balancer, and demonstrates the use of three different ML paradigms (unsupervised, supervised, and reinforcement learning) within Aquarius for network state inference and service management. Testbed evaluations show that Aquarius suitably improves network state visibility and brings notable performance gains for various scenarios with low overhead.
... Improving RL for networking: Some of our findings regarding the lack of generalization corroborate those in previous work [14,19,24,31,42,52]. To improve RL for networking use cases, prior work has attempted to apply and customize techniques from the ML literature. ...
Preprint
As deep reinforcement learning (RL) showcases its strengths in networking and systems, its pitfalls also come to the public's attention--when trained to handle a wide range of network workloads and previously unseen deployment environments, RL policies often manifest suboptimal performance and poor generalizability. To tackle these problems, we present Genet, a new training framework for learning better RL-based network adaptation algorithms. Genet is built on the concept of curriculum learning, which has proved effective against similar issues in other domains where RL is extensively employed. At a high level, curriculum learning gradually presents more difficult environments to the training, rather than choosing them randomly, so that the current RL model can make meaningful progress in training. However, applying curriculum learning in networking is challenging because it remains unknown how to measure the "difficulty" of a network environment. Instead of relying on handcrafted heuristics to determine the environment's difficulty level, our insight is to utilize traditional rule-based (non-RL) baselines: If the current RL model performs significantly worse in a network environment than the baselines, then the model's potential to improve when further trained in this environment is substantial. Therefore, Genet automatically searches for the environments where the current model falls significantly behind a traditional baseline scheme and iteratively promotes these environments as the training progresses. Through evaluating Genet on three use cases--adaptive video streaming, congestion control, and load balancing, we show that Genet produces RL policies which outperform both regularly trained RL policies and traditional baselines in each context, not only under synthetic workloads but also in real environments.
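The environment-selection loop sketched in the abstract can be approximated as: score each candidate environment by the reward gap between a rule-based baseline and the current RL policy, then promote the environments with the largest gaps into the training set. Everything below (the environment encoding and the toy scoring functions) is a placeholder, not Genet's implementation.

```python
import random

def select_training_envs(candidate_envs, eval_rl, eval_baseline, k=3):
    """Promote environments where the RL policy trails the rule-based baseline the most."""
    gaps = []
    for env in candidate_envs:
        gap = eval_baseline(env) - eval_rl(env)   # positive gap => RL has headroom to improve
        gaps.append((gap, env))
    gaps.sort(key=lambda pair: pair[0], reverse=True)
    return [env for _, env in gaps[:k]]

# Toy usage: environments are (bandwidth_mbps, rtt_ms, loss) tuples with made-up scores.
random.seed(0)
envs = [(random.choice([1, 10, 100]), random.choice([10, 100]), random.choice([0.0, 0.02]))
        for _ in range(10)]
rl_score = lambda e: 1.0 / (1 + e[1] / 50 + 20 * e[2])   # struggles with loss and high RTT
baseline_score = lambda e: 1.0 / (1 + e[1] / 100)
print(select_training_envs(envs, rl_score, baseline_score))
```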
... Reinforcement learning maintains a balance between exploration and exploitation to obtain optimal results [7]. Also, new horizons of decision making have been opened by the power of reinforcement learning that were earlier not possible using supervised machine learning methods. Reinforcement learning has been used in resource management, where it was found helpful for job scheduling in computer clusters [8], relay selection in Internet telephony [9], traffic congestion control [10], and bit rate adaptation in video streaming [11], and other variants of reinforcement learning methods have been adopted for resource allocation in the field of games [5] and business process management [6]. This paper uses different variants of deep reinforcement learning algorithms, mainly based on Q-learning approaches, for the purpose of resource allocation. The paper suggests tools and techniques for solving the problem of resource allocation using reinforcement learning, and contributes towards the development of reinforcement learning models for resource allocation in control systems and robotics. ...
Preprint
Full-text available
Reinforcement learning has applications in mechatronics, robotics, and other resource-constrained control systems. The problem of resource allocation is primarily solved using traditional predefined techniques and modern deep learning methods. The drawback of predefined and most deep learning methods for resource allocation is that they fail to meet the requirements in uncertain system environments. The problem of resource allocation in an uncertain system environment, subject to certain criteria, can be approached using deep reinforcement learning. Reinforcement learning also has the ability to adapt to new, uncertain environments over prolonged periods of time. The paper provides a detailed comparative analysis of various deep reinforcement learning methods by applying different components to modify the architecture of reinforcement learning, with the use of noisy layers, prioritized replay, bagging, duelling networks, and other related combinations, to obtain improvements in terms of performance and reduction of computational cost. The paper identifies that the problem of resource allocation in an uncertain environment can be effectively solved using a Noisy Bagging duelling double deep Q network, achieving an efficiency of 97.7% by maximizing reward with significant exploration in the given simulated environment for resource allocation.
... Another philosophy that has recently gained traction is learning-driven congestion control: although they are not yet mature for widespread use, the Remy [14] and Performance-oriented Congestion Control (PCC) [15] protocols are two important examples of this recent trend. The main issue that learning-driven congestion control has to face is generalization, since the mechanisms are often tied to knowledge about a specific scenario or a limited training set and cannot be used out of the box on the wider Internet without major performance losses. ...
Article
Full-text available
The new possibilities offered by 5G and beyond networks have led to a change in the focus of congestion control from capacity maximization for web browsing and file transfer to latency-sensitive interactive and real-time services, and consequently to a renaissance of research on the subject, whose most well-known result is Google’s Bottleneck Bandwidth and Round-trip propagation time (BBR) algorithm. BBR’s promise is to operate at the optimal working point of a connection, with the minimum Round Trip Time (RTT) and full capacity utilization, striking the balance between resource use efficiency and latency performance. However, while it provides significant performance improvements over legacy mechanisms such as Cubic, it can significantly overestimate the capacity of fast-varying mobile connections, leading to unreliable service and large potential swings in the RTT. Our BBR-S algorithm replaces the max filter that causes this overestimation issue with an Adaptive Tobit Kalman Filter (ATKF), an innovation on the Kalman filter that can deal with unknown noise statistics and saturated measurements, achieving a 40% reduction in the average RTT over BBR, which increases to 60% when considering worst-case latency, while maintaining over 95% of the throughput in 4G and 5G networks.
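A plain scalar Kalman filter over noisy delivery-rate samples illustrates the style of estimator BBR-S builds on; the actual algorithm uses an Adaptive Tobit Kalman Filter that additionally handles unknown noise statistics and saturated (censored) measurements, which this sketch does not attempt. The noise variances below are assumed constants.

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter tracking bottleneck bandwidth from noisy rate samples."""

    def __init__(self, x0: float, p0: float = 1.0, q: float = 0.05, r: float = 0.5):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process and measurement noise variances (assumed known here)

    def update(self, measured_rate: float) -> float:
        self.p += self.q                   # predict: bandwidth modeled as a slow random walk
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (measured_rate - self.x)
        self.p *= (1 - k)
        return self.x

kf = ScalarKalman(x0=10.0)                        # Mbps
for sample in [10.2, 9.8, 14.0, 10.1, 9.9]:       # 14.0 is a spike a max filter would latch onto
    print(round(kf.update(sample), 2))
```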
... The idea of this paper is inspired by the fact that practitioners typically prefer to learn from the coarse-grained patterns observed in previous problem instances and then optimize over a class of algorithms that achieves high performance given those patterns. This approach has been of interest in empirical studies for a long time [10,11,12,13,14,15]. However, developing a theoretical understanding of it has received attention only recently, after the seminal work of Gupta and Roughgarden [16,17] and its follow-ups [18,19,20,21]. ...
Article
Full-text available
The design of online algorithms has tended to focus on algorithms with worst-case guarantees, e.g., bounds on the competitive ratio. However, it is well-known that such algorithms are often overly pessimistic, performing sub-optimally on non-worst-case inputs. In this paper, we develop an approach for data-driven design of online algorithms that maintain near-optimal worst-case guarantees while also performing learning in order to perform well for typical inputs. Our approach is to identify policy classes that admit global worst-case guarantees, and then perform learning using historical data within the policy classes. We demonstrate the approach in the context of two classical problems, online knapsack and online set cover, proving competitive bounds for rich policy classes in each case. Additionally, we illustrate the practical implications via a case study on electric vehicle charging.
... These machine learning approaches assume that the network topology is known to the learning algorithm and do not deal with optimizing routing over opaque networks. There has been recent work on using machine learning for flow scheduling [38], congestion control [39]- [42] and optimization in video streaming [43]. In addition, several papers also study the application of RL on routing and network performance optimization problems [44]- [46]. ...
Article
Full-text available
Applications such as virtual reality and online gaming require low delays for acceptable user experience. A key task for over-the-top (OTT) service providers who provide these applications is sending traffic through the networks to minimize delays. OTT traffic is typically generated from multiple data centers which are multi-homed to several network ingresses. However, information about the path characteristics of the underlying network from the ingresses to destinations is not explicitly available to OTT services. These can only be inferred from external probing. In this paper, we combine network tomography with machine learning to minimize delays. We consider this problem in a general setting where traffic sources can choose a set of ingresses through which their traffic enters a black box network. The problem in this setting can be viewed as a reinforcement learning problem with strict linear constraints on a continuous action space. Key technical challenges to solving this problem include the high dimensionality of the problem and handling constraints that are intrinsic to networks. Evaluation results show that our methods achieve up to 60% delay reductions in comparison to standard heuristics. Moreover, the methods we develop can be used in a centralized manner or in a distributed manner by multiple independent agents.
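One generic way to respect a "split ratios are non-negative and sum to one" constraint on a continuous RL action is to project the agent's raw output onto the probability simplex; this is a standard projection sketch, not necessarily the constraint-handling method used in the paper.

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - (css - 1) / idx > 0][-1]     # largest index keeping components positive
    tau = (css[rho - 1] - 1) / rho
    return np.maximum(v - tau, 0.0)

# A raw continuous action from an RL agent, mapped to valid per-ingress traffic split ratios.
raw_action = np.array([0.9, -0.2, 0.6, 0.1])
ratios = project_to_simplex(raw_action)
print(ratios, ratios.sum())                    # non-negative and sums to 1
```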
... The idea of this paper is inspired by the fact that practitioners typically prefer to learn from the coarse-grained patterns observed in previous problem instances and then optimize over a class of algorithms that achieves high performance given those patterns. This approach has been of interest in empirical studies for a long time [10,11,12,13,14,15]. However, developing a theoretical understanding of it has received attention only recently, after the seminal work of Gupta and Roughgarden [16,17] and its follow-ups [18,19,20,21]. ...
Preprint
Full-text available
The design of online algorithms has tended to focus on algorithms with worst-case guarantees, e.g., bounds on the competitive ratio. However, it is well-known that such algorithms are often overly pessimistic, performing sub-optimally on non-worst-case inputs. In this paper, we develop an approach for data-driven design of online algorithms that maintain near-optimal worst-case guarantees while also performing learning in order to perform well for typical inputs. Our approach is to identify policy classes that admit global worst-case guarantees, and then perform learning using historical data within the policy classes. We demonstrate the approach in the context of two classical problems, online knapsack and online set cover, proving competitive bounds for rich policy classes in each case. Additionally, we illustrate the practical implications via a case study on electric vehicle charging.
... The Deep Reinforcement Learning Congestion Control (DRL-CC) [80] algorithm jointly sets the congestion window for all active flows and all paths and achieves high fairness in a wired network scenario with multiple active flows. The authors in [77] propose and evaluate the Remy tool, which generates congestion control algorithms to run at the endpoints rather than manually formulating each endpoint's reaction to congestion signals. Remy is a heuristic search algorithm that maintains rule tables in which states are mapped to actions. ...
Preprint
Full-text available
Networking protocols are designed through long and laborious human effort. Machine Learning (ML)-based solutions have been developed for communication protocol design to avoid manual efforts to tune individual protocol parameters. While other proposed ML-based methods mainly focus on tuning individual protocol parameters (e.g., adjusting the contention window), our main contribution is to propose a novel Deep Reinforcement Learning (DRL)-based framework to systematically design and evaluate networking protocols. We decouple a protocol into a set of parametric modules, each representing a main protocol functionality that is used as DRL input, to better understand the generated protocol design optimizations and analyze them in a systematic fashion. As a case study, we introduce and evaluate DeepMAC, a framework in which a MAC protocol is decoupled into a set of blocks across popular flavors of 802.11 WLANs (e.g., 802.11 b/a/g/n/ac). We are interested in seeing which blocks are selected by DeepMAC across different networking scenarios and whether DeepMAC is able to adapt to network dynamics.
... There are works that optimize congestion control strategies through machine intelligence. Remy [7] uses offline training to find the optimal mapping from observed network states to control actions. Other works [8], [9] use deep reinforcement learning approaches. ...
Preprint
Recently, much effort has been devoted by researchers from both academia and industry to developing novel congestion control methods. LearningCC is presented in this letter, in which the congestion control problem is solved by a reinforcement learning approach. Instead of adjusting the congestion window with a fixed policy, there are several options for an endpoint to choose from. Predicting the best option is a hard task. Each option is mapped to an arm of a bandit machine. The endpoint can learn to determine the optimal choice through a trial-and-error method. Experiments are performed on the ns-3 platform to verify the effectiveness of LearningCC by comparing it with other benchmark algorithms. Results indicate it can achieve lower transmission delay than loss-based algorithms. In particular, we found that LearningCC brings significant improvement on links suffering from random loss.
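The bandit view in the abstract maps each congestion-window adjustment option to an arm and learns arm values by trial and error; the option set, reward signal, and ε-greedy rule below are illustrative assumptions rather than LearningCC's exact design.

```python
import random

# Each "arm" is a candidate multiplicative adjustment to the congestion window.
ARMS = [0.5, 0.8, 1.0, 1.05, 1.25]

class EpsilonGreedyCC:
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = [0] * len(ARMS)
        self.values = [0.0] * len(ARMS)   # running mean reward per arm

    def choose(self) -> int:
        if random.random() < self.epsilon:
            return random.randrange(len(ARMS))          # explore
        return max(range(len(ARMS)), key=lambda i: self.values[i])  # exploit

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# One control interval: pick an arm, apply it to cwnd, observe a reward such as
# measured throughput minus a delay penalty, then update the arm's value estimate.
agent, cwnd = EpsilonGreedyCC(), 10.0
arm = agent.choose()
cwnd *= ARMS[arm]
agent.update(arm, reward=1.0)   # placeholder reward from measured throughput/delay
print(arm, cwnd)
```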
... These machine learning approaches assume that the network topology is known to the learning algorithm and do not deal with optimizing routing over opaque networks. There has been recent work on using machine learning for flow scheduling [23], congestion control [24], [25], [26], [27] and optimization in video streaming [28]. To our knowledge, this is the first work that uses the newly developed actor-critic reinforcement techniques for optimizing load distribution using tomographic information only. ...
Preprint
Applications such as virtual reality and online gaming require low delays for acceptable user experience. A key task for over-the-top (OTT) service providers who provide these applications is sending traffic through the networks to minimize delays. OTT traffic is typically generated from multiple data centers which are multi-homed to several network ingresses. However, information about the path characteristics of the underlying network from the ingresses to destinations is not explicitly available to OTT services. These can only be inferred from external probing. In this paper, we combine network tomography with machine learning to minimize delays. We consider this problem in a general setting where traffic sources can choose a set of ingresses through which their traffic enters a black box network. The problem in this setting can be viewed as a reinforcement learning problem with constraints on a continuous action space, which to the best of our knowledge have not been investigated by the machine learning community. Key technical challenges to solving this problem include the high dimensionality of the problem and handling constraints that are intrinsic to networks. Evaluation results show that our methods achieve up to 60% delay reductions in comparison to standard heuristics. Moreover, the methods we develop can be used in a centralized manner or in a distributed manner by multiple independent agents.
... Compound TCP was designed for high-speed, long-RTT links, but in the presence of a high bit error rate (BER) its performance is degraded [19]. Approaches like TCP ex Machina [20] suggest using computer-generated congestion control algorithms. Further, Copa, a practical delay-based congestion control algorithm, was proposed in [21] and compared with Cubic and bottleneck bandwidth and round-trip propagation time (BBR) [22]. ...
Preprint
Full-text available
Due to the widespread popularity and usage of Internet of Things (IoT)-enabled devices, there is an exponential increase in the data traffic generated by these devices. Most of these devices communicate with each other over heterogeneous links with constraints such as latency, throughput, and interference from concurrent transmissions. This places an extra burden on the underlying communication infrastructure to manage the traffic within these constraints between source and destination. However, most existing applications use different Transmission Control Protocol (TCP) variants for traffic management between these devices, depending only on the state of the sender, irrespective of the application type and link characteristics. Each operating system (OS) uses one TCP variant for all applications, irrespective of path characteristics. Hence, a single TCP variant cannot be the best fit for every link, which results in degraded throughput. Moreover, it cannot use the full capacity of the available link for different applications and network links, especially in heterogeneous networks such as IoT. To cope with these challenges, in this paper we propose an Adaptive and Dynamic TCP Interface Architecture (ADYTIA). ADYTIA allows the usage of different TCP variants based on application and link characteristics, irrespective of the physical links of the entire path. It selects among TCP variants according to their design principles, across heterogeneous technologies, platforms, and applications. ADYTIA is implemented in NS-2 and in the Linux kernel for real testbed experiments. Its ability to select the most suitable TCP variant results in a 20% to 80% improvement in throughput compared to the existing default single TCP variant on Linux and Windows.
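To make the idea of per-flow variant selection concrete, a toy rule-based selector is sketched below; the thresholds and the mapping from link characteristics to TCP variants are purely illustrative assumptions, not ADYTIA's actual decision logic.

def select_tcp_variant(rtt_ms, loss_rate, bandwidth_mbps, latency_sensitive):
    # Illustrative thresholds only; not ADYTIA's actual rules.
    if loss_rate > 0.01 and rtt_ms > 100:
        return "westwood"                     # losses likely non-congestive
    if latency_sensitive:
        return "vegas"                        # delay-based, keeps queues short
    if bandwidth_mbps * rtt_ms / 8.0 > 1000:  # bandwidth-delay product in KB
        return "cubic"                        # scalable growth for high-BDP paths
    return "reno"

print(select_tcp_variant(rtt_ms=150, loss_rate=0.02,
                         bandwidth_mbps=10, latency_sensitive=False))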
... In its initial design, TCP was not meant to operate in wireless environments, where links often face random effects and, depending on the congestion control, TCP drastically reduces its sending rate, with a long-term detrimental impact [27], [28], [29], [30], [31], [32], [33]. For example, TCP's Additive Increase Multiplicative Decrease (AIMD) mechanism reduces the sending rate by 50% within one Round-Trip Time (RTT) and increases it by roughly one packet per RTT. ...
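A tiny worked sketch of the AIMD behaviour quoted above (window halved on any loss, grown by about one packet per RTT) shows why a single random wireless loss is so costly:

def aimd_step(cwnd, loss_detected):
    # Halve on loss (multiplicative decrease), otherwise add one packet per RTT.
    return max(cwnd / 2.0, 1.0) if loss_detected else cwnd + 1.0

cwnd, trace = 10.0, []
for loss in [False] * 5 + [True] + [False] * 5:   # one random loss in the middle
    cwnd = aimd_step(cwnd, loss)
    trace.append(cwnd)
print(trace)   # grows 11, 12, 13, 14, 15, drops to 7.5 after the single loss, then recovers slowly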
Preprint
Full-text available
Over the past years, TCP has gone through numerous updates to provide performance enhancement under diverse network conditions. However, with respect to losses, little can be achieved with legacy TCP detection and recovery mechanisms. Both fast retransmission and retransmission timeout take at least one extra round trip time to perform, and this might significantly impact the performance of latency-sensitive applications, especially in lossy or high-delay networks. While forward error correction (FEC) is not a new initiative in this direction, the majority of the approaches consider FEC inside the application. In this paper, we design and implement a framework where FEC is integrated within TCP. Our main goal with this design choice is to enable latency-sensitive applications over TCP in high-delay and lossy networks while remaining application-agnostic. We further incorporate this design into multipath TCP (MPTCP), where we focus particularly on heterogeneous settings, considering the fact that TCP recovery mechanisms further escalate head-of-line blocking in multipath. We evaluate the performance of the proposed framework and show that such a framework can bring significant benefits compared to legacy TCP and MPTCP for latency-sensitive real application traffic, such as video streaming and web services.
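The simplest way to see how FEC can avoid a retransmission round trip is a block code with one XOR parity packet, which can rebuild any single lost packet in the block. The sketch below illustrates only that generic idea, not the specific coding scheme integrated into TCP in this paper.

from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    # Parity packet = XOR of all k equal-sized data packets.
    return reduce(xor_bytes, packets)

def recover(packets, parity, lost_index):
    # XOR of the parity with the surviving packets yields the lost one.
    survivors = [p for i, p in enumerate(packets) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)

data = [b"seg1", b"seg2", b"seg3", b"seg4"]       # equal-sized segments
parity = make_parity(data)
rebuilt = recover(data, parity, lost_index=2)     # pretend the third segment was lost
assert rebuilt == data[2]                         # recovered without a retransmission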
... Remy: Remy [30] decides using "a tabular method" and collects experience from a network simulator under given network assumptions; however, like all TCP variants, when the real network deviates from Remy's input assumptions, performance degrades. Pensieve: Mao et al. [17] develop a system that uses deep reinforcement learning to select bitrates for future video chunks. ...
Preprint
Real-time video streaming is now one of the main applications in all network environments. Due to throughput fluctuations under various network conditions, how to adaptively choose a proper bitrate has become an important and interesting issue. To tackle this problem, most existing adaptive bitrate control methods aim to provide high video bitrates rather than high video quality. Nevertheless, we notice that there exists a trade-off between sending bitrate and video quality, which motivates us to focus on how to strike a balance between them. In this paper, we propose QARC (video Quality Awareness Rate Control), a rate control algorithm that aims to achieve higher perceptual video quality with a possibly lower sending rate and transmission latency. Starting from scratch, QARC uses a deep reinforcement learning (DRL) algorithm to train a neural network to select future bitrates based on previously observed network status and past video frames. To overcome the "state explosion problem", we design a neural network to predict future perceptual video quality as a vector, which takes the place of raw pictures in the DRL inputs. We evaluate QARC over a trace-driven emulation, outperforming existing approaches with improvements in average video quality of 18%-25% and decreases in average latency of 23%-45%. Comparing QARC with an offline optimal high-bitrate method under various network conditions also yields solid results.
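The key trick described in this abstract is replacing raw frames in the RL state with a small predicted-quality vector. A rough sketch of such a state construction is shown below; the predictor stub, shapes, and names are assumptions rather than QARC's actual architecture.

import numpy as np

def predict_quality(recent_frames, candidate_bitrates):
    # Stub for the learned quality predictor: one score per candidate bitrate.
    return np.tanh(np.asarray(candidate_bitrates, dtype=float) / 3000.0)

def build_state(past_throughput_mbps, past_delay_ms, recent_frames, candidate_bitrates):
    quality_vec = predict_quality(recent_frames, candidate_bitrates)   # replaces raw pixels
    return np.concatenate([past_throughput_mbps, past_delay_ms, quality_vec])

state = build_state(
    past_throughput_mbps=np.array([2.1, 1.8, 2.4]),
    past_delay_ms=np.array([45.0, 60.0, 52.0]),
    recent_frames=None,                             # raw frames would be passed here
    candidate_bitrates=[300, 750, 1200, 1850, 2850, 4300],
)
print(state.shape)                                  # compact input for the policy network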
... We do not compare with those approaches, as they are hard to deploy. More recently, a completely different approach called RemyCC [26] was proposed, which provides a computer-generated congestion control scheme based on a model of the network in which the algorithm will be used. As such a model is usually not known, especially for the Internet, RemyCC is hard to deploy when competing with other (unknown) congestion control schemes. ...
Article
Congestion control has been an open research issue for more than two decades. More and more applications with narrow latency requirements are emerging that are not well addressed by existing proposals. In this paper we present TCP Scalable Increase Adaptive Decrease (SIAD), a new congestion control scheme supporting both high speed and low latency. More precisely, our algorithm aims to provide high utilization under various networking conditions, and therefore would allow operators to configure small buffers for low latency support. To provide full scalability with high-speed networks, we designed TCP SIAD based on a new approach that aims for a fixed feedback rate independent of the available bandwidth. Further, our approach provides a configuration knob for the feedback rate. This can be used by a higher-layer control loop to influence the capacity share, potentially at the cost of higher congestion, e.g., for applications that need a minimum rate. We evaluated TCP SIAD against well-known high-speed congestion control schemes, such as Scalable TCP and High Speed TCP, as well as H-TCP, which among other goals targets small buffers. We show that only SIAD is able to utilize the bottleneck with arbitrary buffer sizes while avoiding a standing queue. Moreover, we demonstrate the capacity sharing of SIAD depending on the configured feedback rate and a high robustness of TCP SIAD to non-congestion-related loss.
... TCP Remy [102] is the first example of machine learning-based congestion control: the authors define a Markov model of the channel and an objective function with a fairness parameter α that can be tuned to set the aggressiveness of the protocol (α = 1 corresponds to proportional fairness, while α = 0 does not consider fairness at all and α = ∞ achieves max-min fairness), and use a machine learning algorithm to define the behavior of the congestion control mechanism. The inputs given to the mechanism are the ratio between the most recent RTT and the lowest RTT measured during the connection, an Exponentially Weighted Moving Average (EWMA) of the interarrival times of the latest ACKs, and an EWMA of the sending times of those same packets. ...
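For reference, the α mentioned in this excerpt is the standard α-fairness parameter; the usual α-fair utility family from which the quoted special cases come is

U_\alpha(x) =
\begin{cases}
\dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \neq 1, \\[1ex]
\log x, & \alpha = 1,
\end{cases}

so that α = 0 reduces to maximizing raw throughput with no regard for fairness, α = 1 yields proportional fairness, and α → ∞ approaches max-min fairness. Remy's full objective additionally penalizes delay; only the throughput-fairness family is shown here.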
Article
Full-text available
Network load balancing (NLB) is an important element of the construction and management of fault tolerance in communication networks. At present, there are many balancing algorithms, both for standard approaches to networking and for software-defined networks (SDN). An asymmetric transport protocol that uses anycast to connect to several servers in parallel was developed. The article presents a general description of SDN, TCP, and the first asymmetric transport protocol, Trickles, as well as an experiment in a network simulator comparing Trickles with several TCP implementations. A new algorithm for the operation of the asymmetric transport protocol, based on the experimental results, is suggested. Several ways of using asymmetric transport protocols in the context of SDN are discussed.
Article
Full-text available
Intrusion detection systems (IDS) are one of the solutions deployed against harmful attacks. Moreover, attackers constantly change their tools and techniques, and implementing an effective IDS is a difficult task. In this paper, several methods are identified and reviewed to assess various machine learning algorithms, along with the conditions under which it is worthwhile to apply ML in wireless communication and which ML techniques are suitable. Traditional approaches are also summarized and their performance compared with ML-based techniques.
Chapter
Breast cancer is most common in the middle-aged female population and is the fourth most dangerous cancer overall. In recent years, the number of breast cancer patients has increased significantly, so early diagnosis has become a necessary task in cancer research to facilitate the subsequent clinical management of patients. Early detection of a tumor can stop its growth and save lives. In machine learning classification, cases are classified into two types: benign or malignant. Different preprocessing techniques, such as filling missing values, applying correlation coefficients, the synthetic minority oversampling technique (SMOTE), and tenfold cross-validation, are applied to obtain reliable accuracy estimates. The main aim of this study is to identify key features of the dataset and evaluate the performance of different machine learning algorithms, namely random forest, logistic regression, support vector machine, decision tree, Gaussian Naive Bayes, and k-nearest neighbors. Based on the results, the classification model that gives the highest accuracy is selected as the best model for cancer prediction.
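A hedged sketch of the kind of pipeline this chapter describes (SMOTE for class imbalance, tenfold cross-validation, and a comparison of the listed classifiers) follows; the scikit-learn breast-cancer dataset is used only as a stand-in, since the chapter's actual dataset and preprocessing details are not given here.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset (assumption)
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=5000),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in models.items():
    # Oversampling is done inside each fold to avoid leaking test data.
    pipe = Pipeline([("smote", SMOTE(random_state=0)), ("clf", clf)])
    scores = cross_val_score(pipe, X, y, cv=cv)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")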
Article
In this paper, we analyze a model for the Transmission Control Protocol (TCP) with a non-adaptive virtual queue (VQ) and an adaptive virtual queue (AVQ) management policy. Within the class of transport protocols, we focus on Compound TCP as it is the default protocol in the Windows operating system. We start by conducting a local stability analysis of the underlying fluid models. For the VQ policy, we show that small virtual buffers play an important role in ensuring stability, whereas the AVQ policy can readily lose local stability as the link capacity, the feedback delay, or the link's damping factor gets large. With both queue policies, the protocol parameters of Compound TCP also influence stability. Furthermore, in both models, we explicitly show that as parameters vary, the loss of local stability occurs via a Hopf bifurcation. For the AVQ policy, we are also able to analytically verify whether the Hopf bifurcation is supercritical and determine the stability of the bifurcating limit cycles. Packet-level simulations, conducted over two topologies using the network simulator (NS2), confirm the existence of stable limit cycles in the queue size.
Conference Paper
Most unwritten languages today have no known grammar and are instead governed by "unspoken rules". Similarly, we think that the young discipline of networking is still a practice that lacks a deep understanding of the rules that govern it. This situation results in a loss of time and effort. First, since the rules are unspoken, they are not systematically reused. Second, since there is no grammar, it is impossible to assert whether a sentence is correct; comparing two networking approaches or solutions is sometimes synonymous with endless religious debate. Drawing the proper conclusion from this claim, we advocate that networking research should spend more effort on better understanding its rules as a first step toward automatically reusing them. To illustrate our claim, we focus in this paper on one broad family of networking connectivity problems. We show how different instances of this problem, which were solved in parallel with no explicit knowledge reuse, can be derived from a small set of facts and rules implemented in a knowledge-based system.
Conference Paper
The paper proposes a global optimization approach to the network resource allocation problem, where the objective is to maximize the overall data flow through a shared network. In the proposed approach, the utility functions of agents may have different forms, which allows a more realistic modeling of phenomena occurring in computer networks. To solve the optimization problem, a modified gradient projection method has been applied.
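As a generic illustration of gradient projection on a toy resource-allocation instance (heterogeneous agent utilities sharing a single capacity), a sketch is given below; it is a textbook-style example under assumed utilities and a single-link model, not the paper's modified algorithm or network model.

import numpy as np

CAPACITY = 10.0
utility_gradients = [                      # derivatives of three different utility forms
    lambda x: 1.0 / (x + 1e-9),            # log utility (proportional fairness)
    lambda x: 1.0,                         # linear utility (pure throughput)
    lambda x: 0.5 / np.sqrt(x + 1e-9),     # square-root utility (diminishing returns)
]

def project_capped(x, capacity=CAPACITY):
    # Euclidean projection onto {x >= 0, sum(x) <= capacity}.
    x = np.maximum(x, 0.0)
    if x.sum() <= capacity:
        return x
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > (css - capacity))[0][-1]
    theta = (css[rho] - capacity) / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

x = np.full(3, 1.0)                        # initial allocation
for _ in range(2000):
    grad = np.array([g(xi) for g, xi in zip(utility_gradients, x)])
    x = project_capped(x + 0.1 * grad)     # ascent step followed by projection

print(np.round(x, 2), "total =", round(float(x.sum()), 2))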
Article
The Transmission Control Protocol (TCP) has evolved from its initial form, defined in RFC 793 in 1981, to cope with the evolution of IP networks in general and of the Internet in particular. Over the years, several factors have led to the design of successive TCP variants: the increasing disparity of end hosts, the variety of data link characteristics (optical, wireless cellular, or satellite), the increase of the delay-bandwidth product in data networks, and the use of multiple paths at the same time. In this context, some TCP variants tune the congestion control and avoidance mechanisms to adapt them to specific situations, while others make use of TCP options. In practice, such enhanced versions of TCP can be unusable because of the presence of intermediate elements such as firewalls along the path between the two end hosts. Such elements can filter some TCP options or tamper with the way congestion is managed, introducing unacceptable jitter into the IP flows. In such cases, most TCP variants are designed to fall back to a generic form of TCP. In many situations, this generic TCP version is not the best fit, while another TCP variant could be used to deal with the transient problem. To address this issue, we introduce an original offer/answer (O/A) mechanism allowing end hosts to dynamically identify a suitable TCP variant able to satisfy the specific constraints of each type of packet flow.
Conference Paper
The Internet is currently undergoing a significant change. The majority of Internet transfers have historically occurred between wired network devices. The popularity of WiFi, combined with the uptake of smartphones and tablets, has changed this assumption, and the majority of Internet connections will soon utilise a wireless link. Congestion avoidance controls the rate at which packets leave a TCP sender. This research re-evaluates different congestion avoidance mechanisms over real-world wired and wireless links. This paper does not assert that present mechanisms are poor, nor that wireless has never been a consideration, but that the switch to a mobile and wireless majority necessitates a review, with wireless characteristics at the forefront of design considerations. The results of this study show that TCP Cubic and TCP Hybla perform similarly and generally outperform Veno and Westwood in both wired and wireless scenarios. All tested algorithms featured large delays over 3G links. The results suggest that queuing on the 3G links added between 370 ms and 570 ms of delay. It is suggested that additional research into congestion avoidance and buffering mechanisms over wireless links is needed.