Article

TCP ex Machina: Computer-Generated Congestion Control

Authors: Keith Winstein and Hari Balakrishnan

Abstract

This paper describes a new approach to end-to-end congestion control on a multi-user network. Rather than manually formulate each endpoint’s reaction to congestion signals, as in traditional protocols, we developed a program called Remy that generates congestion control algorithms to run at the endpoints. In this approach, the protocol designer specifies their prior knowledge or assumptions about the network and an objective that the algorithm will try to achieve, e.g., high throughput and low queueing delay. Remy then produces a distributed algorithm—the control rules for the independent endpoints—that tries to achieve this objective. In simulations with ns-2, Remy-generated algorithms outperformed human-designed end-to-end techniques, including TCP Cubic, Compound, and Vegas. In many cases, Remy’s algorithms also outperformed methods that require intrusive in-network changes, including XCP and Cubic-over-sfqCoDel (stochastic fair queueing with CoDel for active queue management). Remy can generate algorithms both for networks where some parameters are known tightly a priori, e.g. datacenters, and for networks where prior knowledge is less precise, such as cellular networks. We characterize the sensitivity of the resulting performance to the specificity of the prior knowledge, and the consequences when real-world conditions contradict the assumptions supplied at design-time.
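Concretely, a Remy-generated controller ("RemyCC") behaves like a lookup from a small congestion memory to a sending action. The sketch below is a minimal illustration of that idea rather than the authors' implementation: the memory features (ack_ewma, send_ewma, rtt_ratio) and the action fields (window multiplier, window increment, minimum intersend interval) follow the paper's description, while the rule table contents and helper names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Memory:
    ack_ewma: float     # EWMA of inter-ACK intervals (ms)
    send_ewma: float    # EWMA of inter-send intervals echoed by ACKs (ms)
    rtt_ratio: float    # current RTT / minimum RTT seen so far

@dataclass
class Action:
    window_mult: float       # multiplier applied to the congestion window
    window_incr: float       # additive increment (packets)
    min_intersend_ms: float  # pacing: minimum gap between packet sends

# A hypothetical two-rule table; a real RemyCC contains many more rules,
# each covering a rectangular region of the memory space.
RULE_TABLE = [
    (lambda m: m.rtt_ratio < 1.2, Action(1.0, 1.0, 0.0)),  # lightly loaded: grow the window
    (lambda m: True,              Action(0.8, 0.0, 2.0)),  # queue building: back off and pace
]

def on_ack(memory: Memory, cwnd: float):
    """Look up the action for the current memory and apply it to the window."""
    for predicate, action in RULE_TABLE:
        if predicate(memory):
            new_cwnd = max(1.0, cwnd * action.window_mult + action.window_incr)
            return new_cwnd, action.min_intersend_ms
    return cwnd, 0.0

print(on_ack(Memory(ack_ewma=5.0, send_ewma=5.0, rtt_ratio=1.1), cwnd=10))
```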


... In addition, machine learning can also assist decision-making that facilitates network scheduling (B. Mao et al., 2017) and parameter adaptation according to the current states of the environment (Dong, Li, Zarchy, Godfrey, & Schapira, 2015; Winstein & Balakrishnan, 2013). Second, many network problems require interaction with complex system environments. ...
... Finally, each network scenario may have different characteristics (e.g., traffic patterns and network states), and researchers usually need to solve the problem independently for each scenario. Machine learning can provide new possibilities for building a general model with a uniform training method (Dong et al., 2015; Winstein & Balakrishnan, 2013). ...
... Efficient resource management and network adaptation are key to improving network system performance. This requires traffic scheduling together with congestion control (Dong et al., 2015; Winstein & Balakrishnan, 2013) and routing control (B. Mao et al., 2017). ...
Chapter
Full-text available
The effects on our lives of network systems, which have reached the most remote corners of the world, are more important today than ever before. In efforts to remedy the shortcomings of this enormous system, which spans all areas of life including production and transportation, and to make it more efficient in terms of resource usage and performance, it is impossible not to use Artificial Intelligence (AI) techniques, whose impact is felt in every field. Significant progress has been made in work on "Artificial Intelligence in Computer Networks", whose history goes back many years, but much more work is needed. AI will enable the development of systems that can operate without requiring maintenance by reducing human intervention in network systems, and will reduce human-induced risks. It will also allow network elements to reshape themselves according to instantaneous needs, thereby enabling more efficient use of resources. At the physical layer, AI has contributed to efforts to reduce losses and errors, especially in data transmission. At the other layers, work has been done on delivering IP-based packets to their destinations faster and with fewer resources according to their contents, and on allowing network resources to reconfigure themselves autonomously when needed by preventing traffic congestion. Another important area for AI in network systems is cybersecurity: by monitoring the entire system, AI can detect possible abnormal state changes, attacks, and security vulnerabilities. This study examines in which areas and how AI is used in computer networks, and also discusses the challenges AI faces while contributing to the development of computer networks.
... For example, packet loss-based CCs like Cubic [4] cannot distinguish packet drops caused by congestion or non-congestion-related events [7]. Researchers have tried to construct CC algorithms with machine learning approaches to address these limitations [7][8][9][10][11]. The insight is that the CC decisions are dependent on traffic patterns and network circumstances, which can be exploited by deep reinforcement learning (RL) to learn a policy for each scenario. ...
... Researchers have also investigated the use of machine learning to construct better heuristics. Indigo [10] and Remy [11] use offline learning to obtain high-performance CC algorithms. PCC [28] and PCC Vivace [9] opt for online learning to avoid any hardwired mappings between states and actions. ...
... internal state = 1), 7 and 8 realize two stages of recovery, where the latency inflation ratio starts plateauing and then starts reducing. 11 indicates that stable conditions have been achieved again and the agent is at an optimal sending rate. The internal state is flipped back again to 0 after this recovery. ...
Preprint
Recent advances in TCP congestion control (CC) have achieved tremendous success with deep reinforcement learning (RL) approaches, which use feedforward neural networks (NN) to learn complex environment conditions and make better decisions. However, such "black-box" policies lack interpretability and reliability, and often, they need to operate outside the traditional TCP datapath due to the use of complex NNs. This paper proposes a novel two-stage solution to achieve the best of both worlds: first to train a deep RL agent, then distill its (over-)parameterized NN policy into white-box, light-weight rules in the form of symbolic expressions that are much easier to understand and to implement in constrained environments. At the core of our proposal is a novel symbolic branching algorithm that enables the rule to be aware of the context in terms of various network conditions, eventually converting the NN policy into a symbolic tree. The distilled symbolic rules preserve and often improve performance over state-of-the-art NN policies while being faster and simpler than a standard neural network. We validate the performance of our distilled symbolic rules on both simulation and emulation environments. Our code is available at https://github.com/VITA-Group/SymbolicPCC.
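As a rough sketch of the distillation stage described above (not the authors' symbolic branching algorithm), one can sample network states, label them with the trained NN policy's actions, and fit a small interpretable model to the labels; the state features, the stand-in policy, and the sampling ranges below are all assumptions for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

def nn_policy(state: np.ndarray) -> float:
    """Stand-in for a trained RL policy: maps (latency_ratio, loss, rate_delta) to a rate change."""
    latency_ratio, loss, rate_delta = state
    return float(np.tanh(1.5 - latency_ratio - 10 * loss + 0.2 * rate_delta))

# 1) Sample states the agent is likely to encounter and label them with the NN policy.
states = rng.uniform([1.0, 0.0, -1.0], [3.0, 0.1, 1.0], size=(5000, 3))
actions = np.array([nn_policy(s) for s in states])

# 2) Distill into a small, human-readable tree (the "white-box" surrogate).
tree = DecisionTreeRegressor(max_depth=3).fit(states, actions)
print("distillation MSE:", np.mean((tree.predict(states) - actions) ** 2))
```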
... Third, current congestion control algorithms are either handcrafted or trained offline, and as a result, they use fixed mappings between network events and congestion control responses. The fixed set of rules is either manually designed ( e.g., BBR [1] and CUBIC [7]) or previously learned actions on observed states from simulated environments (e.g., Remy [8], Indigo [9], Aurora [5], Eagle [3] and Orca [4]). Since these fixed rules may not always apply or the network's dynamics may deviate from those simulated environments, these congestion control algorithms may not perform well on a wide variety of environments, and hence can lack generalization. ...
... Current learningbased congestion control algorithms do not fully solve issues in heuristics. They try to mitigate issues of heuristics by learning congestion control rules through interacting with simulated environments, such as Remy [8], Indigo [9], Eagle [3] and Orca [4]. Since learning occurs on a small subset of environments, it is a challenge to generalize well to a wider variety of environments. ...
... To evaluate the performance of Pareto, we used the Pantheon experimental testbed [9], which is designed to assess new congestion control algorithms by comparing them with existing work. Pantheon has been widely used since its launch [2], [3], [8]. Pantheon provides an emulated network environment based on Mahimahi shells, and results from Pantheon are reproducible and accurately reflect real-world results. ...
Article
Modern-day computer networks are highly diverse and dynamic, calling for fair and adaptive network congestion control algorithms with the objective of achieving the best possible throughput, latency, and inter-flow fairness. Yet, prevailing congestion control algorithms, such as hand-tuned heuristics or those fueled by deep reinforcement learning agents, may struggle to perform well on multiple diverse networks. Besides, many algorithms are unable to adapt to time-varying real-world networking environments; and some algorithms mistakenly overlooked the need of explicitly taking inter-flow fairness into account, and just measured it as an afterthought. In this paper, we propose a new staged training process to train Pareto , a new congestion control algorithm that generalizes well to a wide variety of environments. Different from existing congestion control algorithms running reinforcement learning agents, Pareto is trained for fairness using the first multi-agent reinforcement learning framework that is communication-free. Pareto continues training online adapting to newly observed environments in the real-world. Our extensive array of experiments shows that Pareto (i) performs well in a wide variety of environments, (ii) offers the best fairness when it comes to competing with other flows sharing the same network link, and (iii) improves its performance with online learning to surpass the state-of-the-art.
... With the recent success of machine learning/deep learning on image recognition, video analytics, and natural language processing, there is strong interest in applying machine learning to solving networking problems [5]. The results in [27], [28] provide interesting insights into machine-generated congestion protocols. However, these learning algorithms require offline training based on prior knowledge of the network and can only be adopted in limited situations. ...
... Many TCP variants, such as [17], [33], make control decisions based only on the current state, since the basic assumption is the next state only depends on the current state, no matter what congestion signal they use (i.e., delay or loss). In a recent work [27], the exponential average over historical delay is utilized. This is an intuitive solution to the problem of congestion control because the current state is delayed and partially observable. ...
... We next define a utility function for training and evaluation of the proposed algorithm. For resource allocation/traffic engineering, the α-fairness function is widely adopted [27]. In this model, the utility function is defined as ...
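For context, the standard α-fair utility of a rate x that this snippet alludes to is usually written as follows (individual papers use slight variants):

```latex
U_{\alpha}(x) =
\begin{cases}
\dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \neq 1,\\[4pt]
\log x, & \alpha = 1,
\end{cases}
```

where α = 1 yields proportional fairness and α → ∞ approaches max-min fairness.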
Article
Full-text available
As wired/wireless networks become more and more complex, the fundamental assumptions made by many existing TCP variants may not hold true anymore. In this paper, we develop a model-free, smart congestion control algorithm based on deep reinforcement learning (DRL), which has a high potential in dealing with the complex and dynamic network environment. We present TCP-Drinc, acronym for Deep ReInforcement learNing based Congestion control, which learns from past experience in the form of a set of measured features to decide how to adjust the congestion window size. We present the TCP-Drinc design and validate its performance with extensive ns-3 simulations and comparison with five benchmark schemes.
... Congestion control is amongst the most extensively studied topics in this area, and its importance keeps growing as Internet services and applications (live video, AR/VR, edge computing, IoT, etc.) become ever more demanding and the number of network users steeply rises. Indeed, recent years have witnessed a surge of interest in the design and analysis of congestion control algorithms and protocols (see, e.g., [28,29,6,4,7]). ...
... Congestion control protocols typically fall into two main categories: (1) protocols designed (either handcrafted [29,2] or automatically generated, e.g., by Remy [28]) for a specific network environment, or a predetermined range of such environments (say, mobile networks, satellite networks, datacenter networks, etc.), and (2) "all purpose" protocols designed to perform well across a broad range of environments, e.g., PCC [6,7]. While protocols in the first category might achieve high performance when the network matches their design assumptions, they can suffer from poor performance when this is not so. ...
... Recent algorithms have proposed utility functions that their congestion control algorithm is intended to optimize for. We train our agent to optimize two such functions (the "power" function, a common objective in congestion control, a version of which was used in Remy and Copa [28,3], R_power, ...).
[Figure 7 caption: Fairness of various congestion control schemes on a 32 Mbps, 32 ms latency link with 0% random loss and a 500 KB buffer. Each test was run for two minutes.] ...
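For reference, the "power" objective mentioned here is commonly written as throughput divided by delay, and Remy-style objectives use its logarithmic form with a delay-sensitivity weight; exact forms and weights vary across papers:

```latex
\text{power} = \frac{\text{throughput}}{\text{delay}},
\qquad
U = \log(\text{throughput}) - \delta\,\log(\text{delay}),
```

where δ = 1 recovers the logarithm of power.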
Preprint
Full-text available
We present and investigate a novel and timely application domain for deep reinforcement learning (RL): Internet congestion control. Congestion control is the core networking task of modulating traffic sources' data-transmission rates so as to efficiently and fairly allocate network resources. Congestion control is fundamental to computer networking research and practice, and has recently been the subject of extensive attention in light of the advent of challenging Internet applications such as live video, augmented and virtual reality, Internet-of-Things, and more. We build on the recently introduced Performance-oriented Congestion Control (PCC) framework to formulate congestion control protocol design as an RL task. Our RL framework opens up opportunities for network practitioners, and even application developers, to train congestion control models that fit their local performance objectives based on small, bootstrapped models, or complex, custom models, as their resources and requirements merit. We present and discuss the challenges that must be overcome so as to realize our long-term vision for Internet congestion control.
... Among several 5G multi-connectivity schemes [2], multipath transport protocols, such as multipath Transmission Control Protocol (MPTCP) [3] and multipath QUIC (MPQUIC) [4], have recently gained significant attention. In particular, this is due to the Technical Specification (TS) 23.501 (Release 16) by the 3rd Generation Partnership Project (3GPP) [5], which discusses how 5G systems can take advantage of multipath transport protocols to support the Access Traffic Steering, Switching and Splitting (ATSSS) architecture, ultimately enabling multi-connectivity between 3GPP access, such as Long Term Evolution (LTE) and 5G New Radio (NR), and non-3GPP Wireless Local Area Networks (WLAN), such as WiFi. ...
... Therefore, the assumption is that offline data includes a complete enough set of environment characteristics that could be experienced when the model/policy is actually used. To mention a few examples, offline learning is used to derive offline data-based policies for congestion control using an optimization approach [23], Adaptive Bit Rate (ABR) streaming using DQN [24] or Asynchronous Advantage Actor Critic (A3C) [25], and device resource management using DQN [26] or Support Vector Machine (SVM) [27]. To the best of our knowledge, offline learning is not currently used for multipath scheduling. ...
Preprint
Full-text available
Multipath transport protocols enable the concurrent use of different network paths, benefiting a fast and reliable data transmission. The scheduler of a multipath transport protocol determines how to distribute data packets over different paths. Existing multipath schedulers either conform to predefined policies or to online trained policies. The adoption of millimeter wave (mmWave) paths in 5th Generation (5G) networks and Wireless Local Area Networks (WLANs) introduces time-varying network conditions, under which the existing schedulers struggle to achieve fast and accurate adaptation. In this paper, we propose FALCON, a learning-based multipath scheduler that can adapt fast and accurately to time-varying network conditions. FALCON builds on the idea of meta-learning where offline learning is used to create a set of meta-models that represent coarse-grained network conditions, and online learning is used to bootstrap a specific model for the current fine-grained network conditions towards deriving the scheduling policy to deal with such conditions. Using trace-driven emulation experiments, we demonstrate FALCON outperforms the best state-of-the-art scheduler by up to 19.3% and 23.6% in static and mobile networks, respectively. Furthermore, we show FALCON is quite flexible to work with different types of applications such as bulk transfer and web services. Moreover, we observe FALCON has a much faster adaptation time compared to all the other learning-based schedulers, reaching almost an 8-fold speedup compared to the best of them. Finally, we have validated the emulation results in real-world settings illustrating that FALCON adapts well to the dynamicity of real networks, consistently outperforming all other schedulers.
... Many authors have applied learning-based techniques to improve TCP congestion control, with promising results being reported [5]-[10]. The range of proposals goes from optimization heuristics, such as Remy [11], which searches for the best congestion window size and intersend time based on the current state of the network, to classical Reinforcement Learning (RL) based on tabular Q-learning using Sparse Distributed Memories to approximate state-action values [12]. More recent proposals mainly rely on deep RL algorithms, be it learning congestion control policies from scratch [13]-[20], or hybrids that learn from or cooperate with classical algorithms [21], [22]. ...
... More recent proposals mainly rely on deep RL algorithms, be it learning congestion control policies from scratch [13]-[20], or hybrids that learn from or cooperate with classical algorithms [21], [22]. Most of these works report improvements in comparison to classical TCP algorithms [11], [12], [18], mainly in metrics such as fairness among senders, latency, and overall link utilization. ...
Article
Full-text available
Centralized Radio Access Networks (C-RANs) are improving their cost-efficiency through packetized fronthaul networks. Such a vision requires network congestion control algorithms to deal with sub-millisecond delay budgets while optimizing link utilization and fairness. Classic congestion control algorithms have struggled to optimize these goals simultaneously in such scenarios. Therefore, many Reinforcement Learning (RL) approaches have recently been proposed to deal with such limitations. However, many challenges exist when considering the deployment of RL policies in the real world. This paper deals with the real-time inference challenge, where a deployed policy has to output actions in microseconds. The experiments here evaluate the tradeoff between inference time and performance for a TD3 (Twin-Delayed Deep Deterministic Policy Gradient) policy baseline and simpler Decision Tree (DT) policies extracted from TD3 via a process of policy distillation. The results indicate that DTs with a suitable depth can maintain performance similar to that of the TD3 baseline. Additionally, we show that by converting the distilled DTs to rules in C++, we can make inference time nearly negligible, i.e., on a sub-microsecond time scale. The proposed method enables the use of state-of-the-art RL techniques in congestion control scenarios with tight inference-time and computational constraints.
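The conversion the abstract describes can be pictured as flattening a distilled decision tree into nested conditionals that cost only a handful of comparisons at run time. The sketch below (in Python for consistency with the other examples; the paper emits C++ rules) trains a hypothetical depth-2 policy tree and prints it as if/else source text; the features, training data, and output format are illustrative assumptions.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Hypothetical distilled policy tree: inputs are (rtt_ratio, loss_rate), output is a rate multiplier.
X = np.array([[1.0, 0.0], [1.1, 0.0], [2.0, 0.01], [3.0, 0.05]])
y = np.array([1.25, 1.10, 0.90, 0.70])
policy = DecisionTreeRegressor(max_depth=2).fit(X, y)

def tree_to_rules(tree, feature_names, node=0, indent="    "):
    """Recursively emit the fitted tree as nested if/else source text."""
    t = tree.tree_
    if t.children_left[node] == -1:  # leaf node
        return f"{indent}return {t.value[node][0][0]:.3f}\n"
    name, thr = feature_names[t.feature[node]], t.threshold[node]
    left = tree_to_rules(tree, feature_names, t.children_left[node], indent + "    ")
    right = tree_to_rules(tree, feature_names, t.children_right[node], indent + "    ")
    return (f"{indent}if {name} <= {thr:.3f}:\n{left}"
            f"{indent}else:\n{right}")

print("def rate_multiplier(rtt_ratio, loss_rate):")
print(tree_to_rules(policy, ["rtt_ratio", "loss_rate"]), end="")
```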
... ML applications in performance optimization concern congestion control (e.g. [Geurts 2004, Winstein 2013], [Mijumbi 2014, Mao 2016, Tayyaba 2020]). Finally, when it comes to networking security, one of the fields where ML has been leveraged a lot is intrusion detection systems (e.g. ...
... It must satisfy the targeted performance goal of providing high throughput (especially goodput in this study) γ for the elephant flow and low completion time τ for the incast traffic. Inspired by the network power metric [Floyd 2008, Winstein 2013], which pursues a similar goal, we define U(τ, γ) = log(γ) − log(τ). ...
Thesis
With the exponential growth in technology performance, the modern world has become highly connected, digitized, and diverse. Within this hyper-connected world, communication networks and the Internet are part of our daily life and play many important roles. However, ever-growing Internet services and applications and massive traffic growth make networks so complex that traditional management functions, mainly governed by human operations, fail to keep the network operational. In this context, Software-Defined Networking (SDN) emerges as a new architecture for network management. It makes networks programmable by bringing flexibility to their control and management. Even if network management is eased, it remains tricky to handle due to the continuous growth of network complexity, so management tasks remain complex. Faced with this, the concept of self-driving networking arose. It consists of leveraging recent technological advancements and scientific innovation in Artificial Intelligence (AI)/Machine Learning (ML) together with SDN. Compared to traditional management approaches using only analytic mathematical models and optimization, this new paradigm is a data-driven approach: management operations leverage the ability of ML to exploit hidden patterns in data to create knowledge. This SDN-AI/ML association, which promises to simplify network management, requires many challenges to be addressed. Self-driving networking, or full network automation, is the "Holy Grail" of this association. In this thesis, two of these challenges retain our attention. The first is efficient data collection with SDN, especially real-time telemetry. For this challenge, we propose COCO (COnfidence-based COllection), a low-overhead, near-real-time data collection framework for SDN. Data of interest is collected efficiently from the data plane to the control plane, where it is used either by traditional management applications or by machine-learning-based algorithms. Second, we tackle the effectiveness of using machine learning to handle complex management tasks. We consider application performance optimization in data centers and propose a machine-learning-based incast performance inference for settings where analytical models struggle to provide general and expert-knowledge-free performance models. With this ML performance model, smart buffering schemes or other QoS optimization algorithms could dynamically optimize traffic performance. These ML-based management schemes are built upon SDN, leveraging its centralized global view, telemetry capabilities, and management flexibility. Our efficient data collection framework and the machine-learning-based performance optimization show promising results. We expect that improved SDN monitoring with AI/ML analytics capabilities can considerably augment network management and mark a big step in the self-driving network journey.
... Xu et al. [93] defined the state as a vector of throughput and delay of all sessions, and the action as the set of split ratios for different sessions. The reward is defined as the sum of the utility functions of all sessions, which are related to end-to-end throughput and delay in the form of the α-fairness model [108]. The parameter α controls the tradeoff between fairness and efficiency. ...
... The action is the change of sending rate, and the reward is represented by a utility function that favors improved throughput but penalizes increases in latency and packet loss. After being trained and tested in Gym using the PPO algorithm, the Aurora agent is shown to perform better than or comparably to benchmarks including BBR [139], PCC-Vivace [140], RemyCC [108], and Copa [141]. A drawback of Aurora is that each agent is treated as an independent learner [142], and coordinated multi-agent decision-making is also unavailable. ...
Article
After decades of unprecedented development, modern networks have evolved far beyond expectations in terms of scale and complexity. In many cases, traditional traffic engineering (TE) approaches fail to address the quality of service (QoS) requirements of modern networks. In recent years, deep reinforcement learning (DRL) has proved to be a feasible and effective solution for autonomously controlling and managing complex systems. Massive growth in the use of DRL applications in various domains is beginning to benefit the communications industry. In this paper, we firstly provide a comprehensive overview of DRL-based TE. Then, we present a detailed literature review on applications of DRL for TE including three fundamental issues: routing optimization, congestion control, and resource management. Finally, we discuss our insights into the challenges and future research perspectives of DRL-based TE.
... Of late, deterministic efforts have been explored to minimize operational overhead and improve traffic prediction accuracy by using features of the data stream, instead of relying on the traffic volume [23]. As compared to the TSF methods, traffic forecasting as a non-time series forecasting (Non-TSF) problem can be modeled using other approaches and characteristics [22], [24]. Similarly, instead of depending only on the traffic volume, efforts have been made to create a frequency domain-based model for network traffic streams [25]. ...
... As a result, TCP decreases its transmission rate unnecessarily, at every single observed packet loss, reducing end-to-end control of the bandwidth in different networks. Therefore, TCP throughput for wireless networks can be increased by correctly determining the cause of packet loss and lowering the rate of transmission whenever congestion is observed [24], [30]. Currently, there is no method for TCP congestion control to identify the cause of the packet loss. ...
Article
Full-text available
Recent years have seen a surge in the use of technology for executing transactions in both online and offline modes. Various industries like banking, e-commerce, and private organizations use networks for the exchange of confidential information and resources. Network security is thus of utmost importance, with the expectation of effective and efficient analysis of the network traffic. Wireless Mesh Networks are effective in communicating information over a vast span with minimal costs. A network is evaluated based on its security, accessibility, and extent of interoperability. Artificial Intelligence techniques like machine learning and deep learning have found widespread use to solve a range of challenging, real-world problems. These techniques are well known for their ability to detect issues or patterns in traffic along with advancements in computing capabilities. Extensive research is being carried out to improve the performance of Wireless Mesh Networks. This survey aims to provide a disinterested overview of the application of different artificial intelligence techniques to enhance network performance. We focus on approaches that address the three fundamental problems in networking: traffic prediction, traffic routing, and congestion control. Our paper also includes the bibliometric analysis of the literature, highlighting the ongoing efforts in terms of statistics across multiple metrics. This survey aims to provide researchers in this community with a reliable compendium to get a brief yet succinct understanding of the current progress in the domain.
... Over the last few years, machine learning approaches have gained momentum for throughput prediction. The proposed solutions include the prediction of shorter horizons [5,6,7] and longer horizons [8]. The typical approach includes applying the Recurrent Neural Network (RNN) architecture for prediction tasks [9]. ...
Article
Full-text available
AI-driven data analysis methods have garnered attention in enhancing the performance of wireless networks. One such application is the prediction of downlink throughput in mobile cellular networks. Accurate throughput predictions have demonstrated significant application benefits, such as improving the quality of experience in adaptive video streaming. However, the high degree of variability in cellular link behaviour, coupled with device mobility and diverse traffic demands, presents a complex problem. Numerous published studies have explored the application of machine learning to address this problem, displaying potential when trained and evaluated with traffic traces collected from operational networks. The focus of this paper is an empirical investigation of machine learning-based throughput prediction that runs in real-time on a smartphone, and its evaluation with video streaming in a range of real-world cellular network settings. We report on a number of key challenges that arise when performing prediction “in the wild”, dealing with practical issues one encounters with online data (not traces) and the limitations of real smartphones. These include data sampling, distribution shift, and data labelling. We describe our current solutions to these issues and quantify their efficacy, drawing lessons that we believe will be valuable to network practitioners planning to use such methodologies in operational cellular networks.
... However, no single CCA can adequately prevail across all environments [2,3]. To satisfy the increasingly diverse application requirements over highly complex network conditions, learning-based CCAs have gained much attention recently [4][5][6][7][8]. However, their exploration-based models may make mistakes or take dangerous actions, resulting in poor performance. ...
... Consequently, DRL has also been adopted to solve resource allocation problems in cloud systems. In the context of cloud systems, RL has been applied to design congestion control protocols [38] and develop simple resource management systems by treating the problem as learning packing tasks with multiple resource demands [39]. Luan et al. [40] proposed a GPU cluster scheduler that leverages DRL for intelligent locality-aware scheduling of deep learning training jobs. ...
Preprint
Full-text available
This paper addresses the important need for advanced techniques in continuously allocating workloads on shared infrastructures in data centers, a problem arising due to the growing popularity and scale of cloud computing. It particularly emphasizes the scarcity of research ensuring guaranteed capacity in capacity reservations during large-scale failures. To tackle these issues, the paper presents scalable solutions for resource management. It builds on the prior establishment of capacity reservation in cluster management systems and the two-level resource allocation problem addressed by the Resource Allowance System (RAS). Recognizing the limitations of Mixed Integer Linear Programming (MILP) for server assignment in a dynamic environment, this paper proposes the use of Deep Reinforcement Learning (DRL), which has been successful in achieving long-term optimal results for time-varying systems. Because directly applying DRL algorithms to large-scale instances with millions of decision variables is impractical, a novel two-level design that utilizes a DRL-based algorithm is introduced to solve the optimal server-to-reservation assignment, taking into account fault tolerance, server movement minimization, and network affinity requirements. The paper explores the interconnection of these levels and the benefits of such an approach for achieving long-term optimal results in the context of large-scale cloud systems. We further show in the experiment section that our two-level DRL approach outperforms the MIP solver and heuristic approaches and exhibits significantly reduced computation time compared to the MIP solver. Specifically, our two-level DRL approach performs 15% better than the MIP solver on minimizing the overall cost. Also, it uses only 26 seconds to execute 30 rounds of decision making, while the MIP solver needs nearly an hour.
... It is a decision-tree-based classifier that analyzes the delay and inter-arrival times of ACKs. TCP ex Machina's Remy program [16] produces computer-generated congestion control algorithms. It analyzes prior network assumptions and traffic for a specific objective, such as achieving higher throughput. ...
Article
Full-text available
TCP (Transmission Control Protocol) provides connection-oriented and reliable communication. TCP's default treatment of any loss as a sign of network congestion degrades end-to-end performance in wireless networks, especially in MANETs (Mobile Ad hoc Networks). TCP should identify various non-congestion losses, such as channel loss and route failure loss, to act accordingly. Over the years, researchers have proposed machine learning based network protocols for accurate and efficient decision making. Reinforcement learning is better suited to dynamic networks with unpredictable traffic and topology. TCP-RLLD (TCP with Reinforcement Learning based Loss Differentiation) is an end-to-end transport layer solution to predict the cause of a packet loss. TCP's default treatment of any loss as a congestion loss is overruled by TCP-RLLD to avoid unnecessary reduction of the transmission rate. TCP-RLLD is evaluated with multiple TCP variants for Mobile Ad hoc Networks. The extensive evaluation is performed with the NS-3 simulator. This paper discusses the TCP-RLLD architecture along with details of the performance improvement.
... To enhance the fairness among flows, we apply the function log(·) to throughput and the functions (·)^α1 and (·)^α2 (α1, α2 ≥ 1) to latency and packet loss ratio, respectively. Such a fairness-enhanced utility function is in line with many previous works like [15], [22]. In this paper, we set α1 = 1 and α2 = 1. ...
Preprint
Full-text available
This work is under review at IEEE Transactions on Parallel and Distributed Systems.
... As learning techniques became popular, there were attempts to automatically perform the task of congestion control. Winstein et al. [58] designed Remy, a distributed congestion control solution for heterogeneous and dynamic network environments. Remy formulates congestion control as an optimization problem and implements an offline mapping from all possible events to good actions using a dynamic programming approach. ...
Article
Full-text available
Bandwidth prediction is critical in any Real-time Communication (RTC) service or application. This component decides how much media data can be sent in real time. Subsequently, the video and audio encoder dynamically adapts the bitrate to achieve the best quality without congesting the network and causing packets to be lost or delayed. To date, several RTC services have deployed the heuristic-based Google Congestion Control (GCC), which performs well under certain circumstances and falls short in some others. In this paper, we leverage the advancements in reinforcement learning and propose BoB (Bang-on-Bandwidth) — a hybrid bandwidth predictor for RTC. At the beginning of the RTC session, BoB uses a heuristic-based approach. It then switches to a learning-based approach. BoB predicts the available bandwidth accurately and improves bandwidth utilization under diverse network conditions compared to the two winning solutions of the ACM MMSys'21 grand challenge on bandwidth estimation in RTC. An open-source implementation of BoB is publicly available for further testing and research.
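The hybrid behaviour described in the abstract can be sketched as a controller that starts from a heuristic estimate and hands over to a learned predictor once enough feedback reports have accumulated; the warm-up threshold, the delay-gradient heuristic, and the dummy learned model below are placeholders, not BoB's actual logic.

```python
class HybridBandwidthPredictor:
    """Start with a heuristic estimate, switch to a learned model once warmed up."""

    def __init__(self, learned_model, warmup_reports: int = 50):
        self.learned_model = learned_model        # e.g., a trained RL policy or regressor
        self.warmup_reports = warmup_reports
        self.reports = []

    def heuristic(self, report: dict) -> float:
        # Delay-gradient style back-off: shrink the estimate when queuing delay grows.
        backoff = 0.85 if report["delay_gradient"] > 0 else 1.05
        return report["receive_rate"] * backoff

    def predict(self, report: dict) -> float:
        self.reports.append(report)
        if len(self.reports) < self.warmup_reports:
            return self.heuristic(report)
        return self.learned_model(report)         # learned predictor takes over

# Usage with a dummy "learned" model:
predictor = HybridBandwidthPredictor(learned_model=lambda r: r["receive_rate"],
                                     warmup_reports=2)
print(predictor.predict({"receive_rate": 1.2e6, "delay_gradient": 0.0}))  # heuristic phase
print(predictor.predict({"receive_rate": 1.1e6, "delay_gradient": 0.1}))  # learned phase
```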
... This rate parameter is adapted to the path capabilities to determine a sending rate, so that the probability that packets get queued is small. A congestion control algorithm designed by an automatic learning process from timestamps of sent and received packets and RTT estimates is presented in [29]. The results show that algorithms that are trained automatically outperform designed algorithms such as TCP Reno or TCP Cubic. ...
Article
Full-text available
Selection of the optimal transmission rate in packet-switched best-effort networks is challenging. Typically, senders do not have any information about the end-to-end path and should not congest the connection while at the same time fully utilizing it. The pursuit of these goals has led to congestion control protocols such as TCP Reno, TCP Cubic, or TCP BBR that adapt the sending rate according to extensive measurements of the path characteristics by monitoring packets and related acknowledgments. To improve and speed up this adaptation, we propose and evaluate a machine learning approach for the prediction of sending rates from measurements of metrics provided by the TCP stack. For the prediction, a neural network is trained and evaluated. The prediction is implemented in the TCP stack to speed up TCP slow start. For a customizable and performant implementation, the extended Berkeley Packet Filter is used to extract relevant data from the kernel-space TCP stack, to forward the monitoring data to a user-space data rate prediction, and to feed the prediction result back to the stack. Results from an online experiment show improvements in flow completion time of up to 30%.
... For example, Vivace [16] and PCC [5] adjust sending rates in real time and determine the size of the increment according to the gradient of a performance utility function. RemyCC [17] iteratively searches for a state-action mapping table to maximize an objective function. However, this intra-session estimation still suffers from inflexibility since it relies on the stability and predictability of the underlying network. ...
... ML benefits various networking applications, e.g., congestion control [63], [64], intrusion detection systems [55], [65], traffic classification [66], [67], and task scheduling [7], [9]. It allows inferring system states from networking features. ...
Article
Full-text available
In order to dynamically manage and update networking policies in cloud data centers, Virtual Network Functions (VNFs) use, and therefore actively collect, networking state information, and in the process incur additional control signaling and management overhead, especially in larger data centers. In the meantime, VNFs in production prefer distributed and straightforward heuristics over advanced learning algorithms to avoid intractable additional processing latency under high-performance and low-latency networking constraints. This paper identifies the challenges of deploying learning algorithms in the context of cloud data centers, and proposes Aquarius to bridge the application of machine learning (ML) techniques to distributed systems and service management. Aquarius passively yet efficiently gathers reliable observations, and enables the use of ML techniques to collect, infer, and supply accurate networking state information without incurring additional signaling and management overhead. It offers fine-grained and programmable visibility to distributed VNFs, and enables both open- and closed-loop control over networking systems. This paper illustrates the use of Aquarius with a traffic classifier, an auto-scaling system, and a load balancer, and demonstrates the use of three different ML paradigms (unsupervised, supervised, and reinforcement learning) within Aquarius for network state inference and service management. Testbed evaluations show that Aquarius suitably improves network state visibility and brings notable performance gains for various scenarios with low overhead.
... Improving RL for networking: Some of our findings regarding the lack of generalization corroborate those in previous work [14,19,24,31,42,52]. To improve RL for networking use cases, prior work has attempted to apply and customize techniques from the ML literature. ...
Preprint
As deep reinforcement learning (RL) showcases its strengths in networking and systems, its pitfalls also come to the public's attention--when trained to handle a wide range of network workloads and previously unseen deployment environments, RL policies often manifest suboptimal performance and poor generalizability. To tackle these problems, we present Genet, a new training framework for learning better RL-based network adaptation algorithms. Genet is built on the concept of curriculum learning, which has proved effective against similar issues in other domains where RL is extensively employed. At a high level, curriculum learning gradually presents more difficult environments to the training, rather than choosing them randomly, so that the current RL model can make meaningful progress in training. However, applying curriculum learning in networking is challenging because it remains unknown how to measure the "difficulty" of a network environment. Instead of relying on handcrafted heuristics to determine the environment's difficulty level, our insight is to utilize traditional rule-based (non-RL) baselines: If the current RL model performs significantly worse in a network environment than the baselines, then the model's potential to improve when further trained in this environment is substantial. Therefore, Genet automatically searches for the environments where the current model falls significantly behind a traditional baseline scheme and iteratively promotes these environments as the training progresses. Through evaluating Genet on three use cases--adaptive video streaming, congestion control, and load balancing, we show that Genet produces RL policies which outperform both regularly trained RL policies and traditional baselines in each context, not only under synthetic workloads but also in real environments.
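The environment-selection loop sketched in the abstract can be approximated as: score each candidate environment by the reward gap between a rule-based baseline and the current RL policy, then promote the environments with the largest gaps into the training set. Everything below (the environment encoding and the toy scoring functions) is a placeholder, not Genet's implementation.

```python
import random

def select_training_envs(candidate_envs, eval_rl, eval_baseline, k=3):
    """Promote environments where the RL policy trails the rule-based baseline the most."""
    gaps = []
    for env in candidate_envs:
        gap = eval_baseline(env) - eval_rl(env)   # positive gap => RL has headroom to improve
        gaps.append((gap, env))
    gaps.sort(key=lambda pair: pair[0], reverse=True)
    return [env for _, env in gaps[:k]]

# Toy usage: environments are (bandwidth_mbps, rtt_ms, loss) tuples with made-up scores.
random.seed(0)
envs = [(random.choice([1, 10, 100]), random.choice([10, 100]), random.choice([0.0, 0.02]))
        for _ in range(10)]
rl_score = lambda e: 1.0 / (1 + e[1] / 50 + 20 * e[2])   # struggles with loss and high RTT
baseline_score = lambda e: 1.0 / (1 + e[1] / 100)
print(select_training_envs(envs, rl_score, baseline_score))
```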
... Reinforcement learning maintains a balance between exploration and exploitation to obtain optimal results [7]. Also, new horizons of decision making have been opened by the power of reinforcement learning that were earlier not possible using supervised machine learning methods. Reinforcement learning has been used in resource management, where it was found helpful for job scheduling in computer clusters [8], relay selection in Internet telephony [9], traffic congestion control [10], and bit rate adaptation in video streaming [11], and other variants of reinforcement learning methods have been adopted for resource allocation in the field of games [5] and business process management [6]. This paper uses different variants of deep reinforcement learning algorithms, mainly based on Q-learning approaches, for the purpose of resource allocation. The paper suggests tools and techniques for solving the problem of resource allocation using reinforcement learning, and contributes towards the development of reinforcement learning models for resource allocation in control systems and robotics. ...
Preprint
Full-text available
Reinforcement learning has applications in mechatronics, robotics, and other resource-constrained control systems. The problem of resource allocation is primarily solved using traditional predefined techniques and modern deep learning methods. The drawback of predefined and most deep learning methods for resource allocation is that they fail to meet the requirements in uncertain system environments. The problem of resource allocation in an uncertain system environment, subject to certain criteria, can be approached using deep reinforcement learning. Reinforcement learning also has the ability to adapt to new, uncertain environments over prolonged periods of time. The paper provides a detailed comparative analysis of various deep reinforcement learning methods by applying different components to modify the architecture of reinforcement learning, with the use of noisy layers, prioritized replay, bagging, duelling networks, and other related combinations, to obtain improvements in terms of performance and reduction of computational cost. The paper identifies that the problem of resource allocation in an uncertain environment can be effectively solved using a Noisy Bagging duelling double deep Q network, achieving an efficiency of 97.7% by maximizing reward with significant exploration in the given simulated environment for resource allocation.
... Another philosophy that has recently gained traction is learning-driven congestion control: although they are not yet mature for widespread use, the Remy [14] and Performance-oriented Congestion Control (PCC) [15] protocols are two important examples of this recent trend. The main issue that learning-driven congestion control has to face is generalization, since the mechanisms are often tied to knowledge about a specific scenario or a limited training set and cannot be used out of the box on the wider Internet without major performance losses. ...
Article
Full-text available
The new possibilities offered by 5G and beyond networks have led to a change in the focus of congestion control from capacity maximization for web browsing and file transfer to latency-sensitive interactive and real-time services, and consequently to a renaissance of research on the subject, whose most well-known result is Google’s Bottleneck Bandwidth and Round-trip propagation time (BBR) algorithm. BBR’s promise is to operate at the optimal working point of a connection, with the minimum Round Trip Time (RTT) and full capacity utilization, striking the balance between resource use efficiency and latency performance. However, while it provides significant performance improvements over legacy mechanisms such as Cubic, it can significantly overestimate the capacity of fast-varying mobile connections, leading to unreliable service and large potential swings in the RTT. Our BBR-S algorithm replaces the max filter that causes this overestimation issue with an Adaptive Tobit Kalman Filter (ATKF), an innovation on the Kalman filter that can deal with unknown noise statistics and saturated measurements, achieving a 40% reduction in the average RTT over BBR, which increases to 60% when considering worst-case latency, while maintaining over 95% of the throughput in 4G and 5G networks.
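A plain scalar Kalman filter over noisy delivery-rate samples illustrates the style of estimator BBR-S builds on; the actual algorithm uses an Adaptive Tobit Kalman Filter that additionally handles unknown noise statistics and saturated (censored) measurements, which this sketch does not attempt. The noise variances below are assumed constants.

```python
class ScalarKalman:
    """Minimal 1-D Kalman filter tracking bottleneck bandwidth from noisy rate samples."""

    def __init__(self, x0: float, p0: float = 1.0, q: float = 0.05, r: float = 0.5):
        self.x, self.p = x0, p0   # state estimate and its variance
        self.q, self.r = q, r     # process and measurement noise variances (assumed known here)

    def update(self, measured_rate: float) -> float:
        self.p += self.q                   # predict: bandwidth modeled as a slow random walk
        k = self.p / (self.p + self.r)     # Kalman gain
        self.x += k * (measured_rate - self.x)
        self.p *= (1 - k)
        return self.x

kf = ScalarKalman(x0=10.0)                        # Mbps
for sample in [10.2, 9.8, 14.0, 10.1, 9.9]:       # 14.0 is a spike a max filter would latch onto
    print(round(kf.update(sample), 2))
```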
... The idea of this paper is inspired by the fact that practitioners typically prefer to learn from the coarse-grained patterns observed in previous problem instances and then optimize over a class of algorithms that achieves high performance given those patterns. This approach has been of interest in empirical studies for a long time [10,11,12,13,14,15]. However, developing a theoretical understanding of it has received attention only recently, after the seminal work of Gupta and Roughgarden [16,17] and its follow-ups [18,19,20,21]. ...
Article
Full-text available
The design of online algorithms has tended to focus on algorithms with worst-case guarantees, e.g., bounds on the competitive ratio. However, it is well-known that such algorithms are often overly pessimistic, performing sub-optimally on non-worst-case inputs. In this paper, we develop an approach for data-driven design of online algorithms that maintain near-optimal worst-case guarantees while also performing learning in order to perform well for typical inputs. Our approach is to identify policy classes that admit global worst-case guarantees, and then perform learning using historical data within the policy classes. We demonstrate the approach in the context of two classical problems, online knapsack and online set cover, proving competitive bounds for rich policy classes in each case. Additionally, we illustrate the practical implications via a case study on electric vehicle charging.
... These machine learning approaches assume that the network topology is known to the learning algorithm and do not deal with optimizing routing over opaque networks. There has been recent work on using machine learning for flow scheduling [38], congestion control [39]- [42] and optimization in video streaming [43]. In addition, several papers also study the application of RL on routing and network performance optimization problems [44]- [46]. ...
Article
Full-text available
Applications such as virtual reality and online gaming require low delays for acceptable user experience. A key task for over-the-top (OTT) service providers who provide these applications is sending traffic through the networks to minimize delays. OTT traffic is typically generated from multiple data centers which are multi-homed to several network ingresses. However, information about the path characteristics of the underlying network from the ingresses to destinations is not explicitly available to OTT services. These can only be inferred from external probing. In this paper, we combine network tomography with machine learning to minimize delays. We consider this problem in a general setting where traffic sources can choose a set of ingresses through which their traffic enters a black box network. The problem in this setting can be viewed as a reinforcement learning problem with strict linear constraints on a continuous action space. Key technical challenges to solving this problem include the high dimensionality of the problem and handling constraints that are intrinsic to networks. Evaluation results show that our methods achieve up to 60% delay reductions in comparison to standard heuristics. Moreover, the methods we develop can be used in a centralized manner or in a distributed manner by multiple independent agents.
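One generic way to respect a "split ratios are non-negative and sum to one" constraint on a continuous RL action is to project the agent's raw output onto the probability simplex; this is a standard projection sketch, not necessarily the constraint-handling method used in the paper.

```python
import numpy as np

def project_to_simplex(v: np.ndarray) -> np.ndarray:
    """Euclidean projection of v onto {x : x_i >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]                       # sort descending
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    rho = idx[u - (css - 1) / idx > 0][-1]     # largest index keeping components positive
    tau = (css[rho - 1] - 1) / rho
    return np.maximum(v - tau, 0.0)

# A raw continuous action from an RL agent, mapped to valid per-ingress traffic split ratios.
raw_action = np.array([0.9, -0.2, 0.6, 0.1])
ratios = project_to_simplex(raw_action)
print(ratios, ratios.sum())                    # non-negative and sums to 1
```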
... The idea of this paper is inspired by the fact that practitioners typically prefer to learn from the coarse-grained patterns observed in previous problem instances and then optimize over a class of algorithms that achieves high performance given those patterns. This approach has been of interest in empirical studies for a long time [10,11,12,13,14,15]. However, developing a theoretical understanding of it has received attention only recently, after the seminal work of Gupta and Roughgarden [16,17] and its follow-ups [18,19,20,21]. ...
Preprint
Full-text available
The design of online algorithms has tended to focus on algorithms with worst-case guarantees, e.g., bounds on the competitive ratio. However, it is well-known that such algorithms are often overly pessimistic, performing sub-optimally on non-worst-case inputs. In this paper, we develop an approach for data-driven design of online algorithms that maintain near-optimal worst-case guarantees while also performing learning in order to perform well for typical inputs. Our approach is to identify policy classes that admit global worst-case guarantees, and then perform learning using historical data within the policy classes. We demonstrate the approach in the context of two classical problems, online knapsack and online set cover, proving competitive bounds for rich policy classes in each case. Additionally, we illustrate the practical implications via a case study on electric vehicle charging.
... The Deep Reinforcement Learning Congestion Control (DRL-CC) [80] algorithm jointly sets the congestion window for all active flows and all paths and achieves high fairness in a wired network scenario with multiple active flows. The authors in [77] propose and evaluate the Remy tool, which generates congestion control algorithms to run at the endpoints rather than manually formulating each endpoint's reaction to congestion signals. Remy is a heuristic search algorithm that maintains rule tables in which states are mapped to actions. ...
Preprint
Full-text available
Networking protocols are designed through long and laborious human effort. Machine Learning (ML)-based solutions have been developed for communication protocol design to avoid manual efforts to tune individual protocol parameters. While other proposed ML-based methods mainly focus on tuning individual protocol parameters (e.g., adjusting the contention window), our main contribution is to propose a novel Deep Reinforcement Learning (DRL)-based framework to systematically design and evaluate networking protocols. We decouple a protocol into a set of parametric modules, each representing a main protocol functionality that is used as DRL input, to better understand the generated protocol design optimizations and analyze them in a systematic fashion. As a case study, we introduce and evaluate DeepMAC, a framework in which a MAC protocol is decoupled into a set of blocks across popular flavors of 802.11 WLANs (e.g., 802.11 b/a/g/n/ac). We are interested in seeing which blocks are selected by DeepMAC across different networking scenarios and whether DeepMAC is able to adapt to network dynamics.
... There are works that optimize congestion control strategies through machine intelligence. Remy [7] uses offline training to find the optimal mapping from observed network states to control actions. Other works [8], [9] use deep reinforcement learning approaches. ...
Preprint
Recently, much effort has been devoted by researchers from both academia and industry to developing novel congestion control methods. LearningCC is presented in this letter, in which the congestion control problem is solved by a reinforcement learning approach. Instead of adjusting the congestion window with a fixed policy, there are several options for an endpoint to choose from. Predicting the best option is a hard task. Each option is mapped to an arm of a bandit machine. The endpoint can learn to determine the optimal choice through a trial-and-error method. Experiments are performed on the ns-3 platform to verify the effectiveness of LearningCC by comparing it with other benchmark algorithms. Results indicate it can achieve lower transmission delay than loss-based algorithms. In particular, we found that LearningCC brings significant improvement on links suffering from random loss.
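The bandit view in the abstract maps each congestion-window adjustment option to an arm and learns arm values by trial and error; the option set, reward signal, and ε-greedy rule below are illustrative assumptions rather than LearningCC's exact design.

```python
import random

# Each "arm" is a candidate multiplicative adjustment to the congestion window.
ARMS = [0.5, 0.8, 1.0, 1.05, 1.25]

class EpsilonGreedyCC:
    def __init__(self, epsilon: float = 0.1):
        self.epsilon = epsilon
        self.counts = [0] * len(ARMS)
        self.values = [0.0] * len(ARMS)   # running mean reward per arm

    def choose(self) -> int:
        if random.random() < self.epsilon:
            return random.randrange(len(ARMS))          # explore
        return max(range(len(ARMS)), key=lambda i: self.values[i])  # exploit

    def update(self, arm: int, reward: float) -> None:
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

# One control interval: pick an arm, apply it to cwnd, observe a reward such as
# measured throughput minus a delay penalty, then update the arm's value estimate.
agent, cwnd = EpsilonGreedyCC(), 10.0
arm = agent.choose()
cwnd *= ARMS[arm]
agent.update(arm, reward=1.0)   # placeholder reward from measured throughput/delay
print(arm, cwnd)
```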
... These machine learning approaches assume that the network topology is known to the learning algorithm and do not deal with optimizing routing over opaque networks. There has been recent work on using machine learning for flow scheduling [23], congestion control [24], [25], [26], [27] and optimization in video streaming [28]. To our knowledge, this is the first work that uses the newly developed actor-critic reinforcement techniques for optimizing load distribution using tomographic information only. ...
Preprint
Applications such as virtual reality and online gaming require low delays for acceptable user experience. A key task for over-the-top (OTT) service providers who provide these applications is sending traffic through the networks to minimize delays. OTT traffic is typically generated from multiple data centers which are multi-homed to several network ingresses. However, information about the path characteristics of the underlying network from the ingresses to destinations is not explicitly available to OTT services. These can only be inferred from external probing. In this paper, we combine network tomography with machine learning to minimize delays. We consider this problem in a general setting where traffic sources can choose a set of ingresses through which their traffic enters a black box network. The problem in this setting can be viewed as a reinforcement learning problem with constraints on a continuous action space, which to the best of our knowledge have not been investigated by the machine learning community. Key technical challenges to solving this problem include the high dimensionality of the problem and handling constraints that are intrinsic to networks. Evaluation results show that our methods achieve up to 60% delay reductions in comparison to standard heuristics. Moreover, the methods we develop can be used in a centralized manner or in a distributed manner by multiple independent agents.
... Compound TCP was designed for high-speed, long-RTT links, but in the presence of a high bit error rate (BER) its performance is degraded [19]. Approaches like TCP ex Machina [20] suggest using computer-generated congestion control algorithms. Further, Copa, a practical delay-based congestion control algorithm, was proposed in [21] and compared with Cubic and bottleneck bandwidth and round-trip propagation time (BBR) [22]. ...
Preprint
Full-text available
Due to the widespread popularity and usage of Internet of Things (IoT)-enabled devices, there is an exponential increase in the data traffic generated by these devices. Most of these devices communicate with each other over heterogeneous links with constraints such as latency, throughput, and interference from concurrent transmissions. This places an extra burden on the underlying communication infrastructure to manage the traffic within these constraints between source and destination. However, most existing applications use different Transmission Control Protocol (TCP) variants for traffic management between these devices, depending only on the state of the sender, irrespective of the application type and link characteristics. Each operating system (OS) uses one TCP variant for all applications, irrespective of path characteristics. Hence, a single TCP variant cannot be the best fit for every link, which results in degraded throughput. Moreover, it cannot use the full capacity of the available link for different applications and network links, especially in heterogeneous networks such as IoT. To cope with these challenges, in this paper we propose an Adaptive and Dynamic TCP Interface Architecture (ADYTIA). ADYTIA allows the usage of different TCP variants based on application and link characteristics, irrespective of the physical links of the entire path. It selects among TCP variants according to their design principles, across heterogeneous technologies, platforms, and applications. ADYTIA is implemented in NS-2 and in the Linux kernel for real testbed experiments. Its ability to select the most suitable TCP variant results in a 20% to 80% improvement in throughput compared to the existing default single TCP variant on Linux and Windows.
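To make the idea of per-flow variant selection concrete, a toy rule-based selector is sketched below; the thresholds and the mapping from link characteristics to TCP variants are purely illustrative assumptions, not ADYTIA's actual decision logic.

def select_tcp_variant(rtt_ms, loss_rate, bandwidth_mbps, latency_sensitive):
    # Illustrative thresholds only; not ADYTIA's actual rules.
    if loss_rate > 0.01 and rtt_ms > 100:
        return "westwood"                     # losses likely non-congestive
    if latency_sensitive:
        return "vegas"                        # delay-based, keeps queues short
    if bandwidth_mbps * rtt_ms / 8.0 > 1000:  # bandwidth-delay product in KB
        return "cubic"                        # scalable growth for high-BDP paths
    return "reno"

print(select_tcp_variant(rtt_ms=150, loss_rate=0.02,
                         bandwidth_mbps=10, latency_sensitive=False))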
... In its initial design, TCP was not meant to operate in wireless environments, where links often face random effects and, depending on the congestion control, TCP drastically reduces its sending rate, with a long-term detrimental impact [27], [28], [29], [30], [31], [32], [33]. For example, TCP's Additive Increase Multiplicative Decrease (AIMD) mechanism reduces the sending rate by 50% within one Round-Trip Time (RTT) and increases it by roughly one packet per RTT. ...
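A tiny worked sketch of the AIMD behaviour quoted above (window halved on any loss, grown by about one packet per RTT) shows why a single random wireless loss is so costly:

def aimd_step(cwnd, loss_detected):
    # Halve on loss (multiplicative decrease), otherwise add one packet per RTT.
    return max(cwnd / 2.0, 1.0) if loss_detected else cwnd + 1.0

cwnd, trace = 10.0, []
for loss in [False] * 5 + [True] + [False] * 5:   # one random loss in the middle
    cwnd = aimd_step(cwnd, loss)
    trace.append(cwnd)
print(trace)   # grows 11, 12, 13, 14, 15, drops to 7.5 after the single loss, then recovers slowly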
Preprint
Full-text available
Over the past years, TCP has gone through numerous updates to provide performance enhancement under diverse network conditions. However, with respect to losses, little can be achieved with legacy TCP detection and recovery mechanisms. Both fast retransmission and retransmission timeout take at least one extra round trip time to perform, and this might significantly impact the performance of latency-sensitive applications, especially in lossy or high-delay networks. While forward error correction (FEC) is not a new initiative in this direction, the majority of the approaches consider FEC inside the application. In this paper, we design and implement a framework where FEC is integrated within TCP. Our main goal with this design choice is to enable latency-sensitive applications over TCP in high-delay and lossy networks while remaining application-agnostic. We further incorporate this design into multipath TCP (MPTCP), where we focus particularly on heterogeneous settings, considering the fact that TCP recovery mechanisms further escalate head-of-line blocking in multipath. We evaluate the performance of the proposed framework and show that such a framework can bring significant benefits compared to legacy TCP and MPTCP for latency-sensitive real application traffic, such as video streaming and web services.
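The simplest way to see how FEC can avoid a retransmission round trip is a block code with one XOR parity packet, which can rebuild any single lost packet in the block. The sketch below illustrates only that generic idea, not the specific coding scheme integrated into TCP in this paper.

from functools import reduce

def xor_bytes(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def make_parity(packets):
    # Parity packet = XOR of all k equal-sized data packets.
    return reduce(xor_bytes, packets)

def recover(packets, parity, lost_index):
    # XOR of the parity with the surviving packets yields the lost one.
    survivors = [p for i, p in enumerate(packets) if i != lost_index]
    return reduce(xor_bytes, survivors, parity)

data = [b"seg1", b"seg2", b"seg3", b"seg4"]       # equal-sized segments
parity = make_parity(data)
rebuilt = recover(data, parity, lost_index=2)     # pretend the third segment was lost
assert rebuilt == data[2]                         # recovered without a retransmission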
... Remy: Remy [30] decides using "a tabular method" and collects experience from a network simulator under given network assumptions; however, like all TCP variants, when the real network deviates from Remy's input assumptions, performance degrades. Pensieve: Mao et al. [17] develop a system that uses deep reinforcement learning to select bitrates for future video chunks. ...
Preprint
Real-time video streaming is now one of the main applications in all network environments. Due to throughput fluctuations under various network conditions, how to adaptively choose a proper bitrate has become an important and interesting issue. To tackle this problem, most existing adaptive bitrate control methods aim to provide high video bitrates rather than high video quality. Nevertheless, we notice that there exists a trade-off between sending bitrate and video quality, which motivates us to focus on how to strike a balance between them. In this paper, we propose QARC (video Quality Awareness Rate Control), a rate control algorithm that aims to achieve higher perceptual video quality with a possibly lower sending rate and transmission latency. Starting from scratch, QARC uses a deep reinforcement learning (DRL) algorithm to train a neural network to select future bitrates based on previously observed network status and past video frames. To overcome the "state explosion problem", we design a neural network to predict future perceptual video quality as a vector, which takes the place of raw pictures in the DRL inputs. We evaluate QARC over a trace-driven emulation, outperforming existing approaches with improvements in average video quality of 18%-25% and decreases in average latency of 23%-45%. Comparing QARC with an offline optimal high-bitrate method under various network conditions also yields solid results.
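The key trick described in this abstract is replacing raw frames in the RL state with a small predicted-quality vector. A rough sketch of such a state construction is shown below; the predictor stub, shapes, and names are assumptions rather than QARC's actual architecture.

import numpy as np

def predict_quality(recent_frames, candidate_bitrates):
    # Stub for the learned quality predictor: one score per candidate bitrate.
    return np.tanh(np.asarray(candidate_bitrates, dtype=float) / 3000.0)

def build_state(past_throughput_mbps, past_delay_ms, recent_frames, candidate_bitrates):
    quality_vec = predict_quality(recent_frames, candidate_bitrates)   # replaces raw pixels
    return np.concatenate([past_throughput_mbps, past_delay_ms, quality_vec])

state = build_state(
    past_throughput_mbps=np.array([2.1, 1.8, 2.4]),
    past_delay_ms=np.array([45.0, 60.0, 52.0]),
    recent_frames=None,                             # raw frames would be passed here
    candidate_bitrates=[300, 750, 1200, 1850, 2850, 4300],
)
print(state.shape)                                  # compact input for the policy network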
... We do not compare with those approaches, as they are hard to deploy. More recently, a completely different approach called RemyCC [26] was proposed, which provides a computer-generated congestion control scheme based on a model of the network in which the algorithm will be used. As such a model is usually not known, especially for the Internet, RemyCC is hard to deploy when competing with other (unknown) congestion control schemes. ...
Article
Congestion control has been an open research issue for more than two decades. More and more applications with narrow latency requirements are emerging that are not well addressed by existing proposals. In this paper we present TCP Scalable Increase Adaptive Decrease (SIAD), a new congestion control scheme supporting both high speed and low latency. More precisely, our algorithm aims to provide high utilization under various networking conditions, and therefore would allow operators to configure small buffers for low latency support. To provide full scalability with high-speed networks, we designed TCP SIAD based on a new approach that aims for a fixed feedback rate independent of the available bandwidth. Further, our approach provides a configuration knob for the feedback rate. This can be used by a higher-layer control loop to influence the capacity share, potentially at the cost of higher congestion, e.g., for applications that need a minimum rate. We evaluated TCP SIAD against well-known high-speed congestion control schemes, such as Scalable TCP and High Speed TCP, as well as H-TCP, which among other goals targets small buffers. We show that only SIAD is able to utilize the bottleneck with arbitrary buffer sizes while avoiding a standing queue. Moreover, we demonstrate the capacity sharing of SIAD depending on the configured feedback rate and a high robustness of TCP SIAD to non-congestion-related loss.
... TCP Remy [102] is the first example of machine learning-based congestion control: the authors define a Markov model of the channel and an objective function with a fairness parameter α that can be tuned to set the aggressiveness of the protocol (α = 1 corresponds to proportional fairness, while α = 0 does not consider fairness at all and α = ∞ achieves max-min fairness), and use a machine learning algorithm to define the behavior of the congestion control mechanism. The inputs given to the mechanism are the ratio between the most recent RTT and the lowest RTT measured during the connection, an Exponentially Weighted Moving Average (EWMA) of the interarrival times of the latest ACKs, and an EWMA of the sending times of those same packets. ...
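For reference, the α mentioned in this excerpt is the standard α-fairness parameter; the usual α-fair utility family from which the quoted special cases come is

U_\alpha(x) =
\begin{cases}
\dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \neq 1, \\[1ex]
\log x, & \alpha = 1,
\end{cases}

so that α = 0 reduces to maximizing raw throughput with no regard for fairness, α = 1 yields proportional fairness, and α → ∞ approaches max-min fairness. Remy's full objective additionally penalizes delay; only the throughput-fairness family is shown here.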
Article
Full-text available
Network load balancing (NLB) is an important element of the construction and management of fault tolerance in communication networks. At present, there are many balancing algorithms, both for standard approaches to networking and for software-defined networks (SDN). An asymmetric transport protocol that uses anycast to connect to several servers in parallel was developed. The article presents a general description of SDN, TCP, and the first asymmetric transport protocol, Trickles, as well as an experiment in a network simulator comparing Trickles with several TCP implementations. A new algorithm for the operation of the asymmetric transport protocol, based on the experimental results, is suggested. Several ways of using asymmetric transport protocols in the context of SDN are discussed.
Article
Full-text available
Intrusion detection systems (IDS) are one of the solutions deployed against harmful attacks. Moreover, attackers constantly change their tools and techniques, and implementing an effective IDS is a difficult task. In this paper, several methods are identified and reviewed to assess various machine learning algorithms, along with the conditions under which it is worthwhile to apply ML in wireless communication and which ML techniques are suitable. Traditional approaches are also summarized and their performance compared with ML-based techniques.
Chapter
Breast cancer is most common in the middle-aged female population and is the fourth most dangerous cancer overall. In recent years, the number of breast cancer patients has increased significantly, so early diagnosis has become a necessary task in cancer research to facilitate the subsequent clinical management of patients. Early detection of a tumor can stop its growth and save lives. In machine learning classification, cases are classified into two types: benign or malignant. Different preprocessing techniques, such as filling missing values, applying correlation coefficients, the synthetic minority oversampling technique (SMOTE), and tenfold cross-validation, are applied to obtain reliable accuracy estimates. The main aim of this study is to identify key features of the dataset and evaluate the performance of different machine learning algorithms, namely random forest, logistic regression, support vector machine, decision tree, Gaussian Naive Bayes, and k-nearest neighbors. Based on the results, the classification model that gives the highest accuracy is selected as the best model for cancer prediction.
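A hedged sketch of the kind of pipeline this chapter describes (SMOTE for class imbalance, tenfold cross-validation, and a comparison of the listed classifiers) follows; the scikit-learn breast-cancer dataset is used only as a stand-in, since the chapter's actual dataset and preprocessing details are not given here.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline

X, y = load_breast_cancer(return_X_y=True)   # stand-in dataset (assumption)
models = {
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "logistic_regression": LogisticRegression(max_iter=5000),
    "svm": SVC(),
    "decision_tree": DecisionTreeClassifier(random_state=0),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
}
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
for name, clf in models.items():
    # Oversampling is done inside each fold to avoid leaking test data.
    pipe = Pipeline([("smote", SMOTE(random_state=0)), ("clf", clf)])
    scores = cross_val_score(pipe, X, y, cv=cv)
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")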
Article
In this paper, we analyze a model for the Transmission Control Protocol (TCP) with a non-adaptive virtual queue (VQ) and an adaptive virtual queue (AVQ) management policy. Within the class of transport protocols, we focus on Compound TCP as it is the default protocol in the Windows operating system. We start by conducting a local stability analysis of the underlying fluid models. For the VQ policy, we show that small virtual buffers play an important role in ensuring stability, whereas the AVQ policy can readily lose local stability as the link capacity, the feedback delay, or the link's damping factor gets large. With both queue policies, the protocol parameters of Compound TCP also influence stability. Furthermore, in both models, we explicitly show that as parameters vary, the loss of local stability occurs via a Hopf bifurcation. For the AVQ policy, we are also able to analytically verify whether the Hopf bifurcation is supercritical and determine the stability of the bifurcating limit cycles. Packet-level simulations, conducted over two topologies using the network simulator (NS2), confirm the existence of stable limit cycles in the queue size.
Conference Paper
Most unwritten languages today have no known grammar and are instead governed by "unspoken rules". Similarly, we think that the young discipline of networking is still a practice that lacks a deep understanding of the rules that govern it. This situation results in a loss of time and effort. First, since the rules are unspoken, they are not systematically reused. Second, since there is no grammar, it is impossible to assert whether a sentence is correct; comparing two networking approaches or solutions is sometimes synonymous with endless religious debate. Drawing the proper conclusion from this claim, we advocate that networking research should spend more effort on better understanding its rules as a first step toward automatically reusing them. To illustrate our claim, we focus in this paper on one broad family of networking connectivity problems. We show how different instances of this problem, which were solved in parallel with no explicit knowledge reuse, can be derived from a small set of facts and rules implemented in a knowledge-based system.
Conference Paper
The paper proposes a global optimization approach to the network resource allocation problem, where the objective is to maximize the overall data flow through a shared network. In the proposed approach, the utility functions of agents may have different forms, which allows a more realistic modeling of phenomena occurring in computer networks. To solve the optimization problem, a modified gradient projection method has been applied.
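As a generic illustration of gradient projection on a toy resource-allocation instance (heterogeneous agent utilities sharing a single capacity), a sketch is given below; it is a textbook-style example under assumed utilities and a single-link model, not the paper's modified algorithm or network model.

import numpy as np

CAPACITY = 10.0
utility_gradients = [                      # derivatives of three different utility forms
    lambda x: 1.0 / (x + 1e-9),            # log utility (proportional fairness)
    lambda x: 1.0,                         # linear utility (pure throughput)
    lambda x: 0.5 / np.sqrt(x + 1e-9),     # square-root utility (diminishing returns)
]

def project_capped(x, capacity=CAPACITY):
    # Euclidean projection onto {x >= 0, sum(x) <= capacity}.
    x = np.maximum(x, 0.0)
    if x.sum() <= capacity:
        return x
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > (css - capacity))[0][-1]
    theta = (css[rho] - capacity) / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

x = np.full(3, 1.0)                        # initial allocation
for _ in range(2000):
    grad = np.array([g(xi) for g, xi in zip(utility_gradients, x)])
    x = project_capped(x + 0.1 * grad)     # ascent step followed by projection

print(np.round(x, 2), "total =", round(float(x.sum()), 2))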
Article
The Transmission Control Protocol (TCP) has evolved from its initial form, defined in RFC 793 in 1981, to cope with the evolution of IP networks in general and of the Internet in particular. Over the years, several factors have led to the design of successive TCP variants: the increasing disparity of end hosts, the variety of data link characteristics (optical, wireless cellular, or satellite), the increase of the delay-bandwidth product in data networks, and the use of multiple paths at the same time. In this context, some TCP variants tune the congestion control and avoidance mechanisms to adapt them to specific situations, while others make use of TCP options. In practice, such enhanced versions of TCP can be unusable because of the presence of intermediate elements such as firewalls along the path between the two end hosts. Such elements can filter some TCP options or tamper with the way congestion is managed, introducing unacceptable jitter into the IP flows. In such cases, most TCP variants are designed to fall back to a generic form of TCP. In many situations, this generic TCP version is not the best fit, while another TCP variant could be used to deal with the transient problem. To address this issue, we introduce an original offer/answer (O/A) mechanism allowing end hosts to dynamically identify a suitable TCP variant able to satisfy the specific constraints of each type of packet flow.
Conference Paper
The Internet is currently undergoing a significant change. The majority of Internet transfers have historically occurred between wired network devices. The popularity of WiFi, combined with the uptake of smartphones and tablets, has changed this assumption, and the majority of Internet connections will soon utilise a wireless link. Congestion avoidance controls the rate at which packets leave a TCP sender. This research re-evaluates different congestion avoidance mechanisms over real-world wired and wireless links. This paper does not assert that present mechanisms are poor, nor that wireless has never been a consideration, but that the switch to a mobile and wireless majority necessitates a review, with wireless characteristics at the forefront of design considerations. The results of this study show that TCP Cubic and TCP Hybla perform similarly and generally outperform Veno and Westwood in both wired and wireless scenarios. All tested algorithms featured large delays over 3G links. The results suggest that queuing on the 3G links added between 370 ms and 570 ms of delay. It is suggested that additional research into congestion avoidance and buffering mechanisms over wireless links is needed.