Article

Abstract

The TCP protocol is used by the majority of the network applications on the Internet. TCP performance is strongly influenced by its congestion control algorithms that limit the amount of transmitted traffic based on the estimated network capacity and utilization. Because the freely available Linux operating system has gained popularity especially in the network servers, its TCP implementation affects many of the network interactions carried out today. This paper describes the fundamentals of the Linux TCP design, concentrating on the congestion control algorithms. The Linux TCP implementation supports SACK, TCP timestamps, Explicit Congestion Notification, and techniques to undo congestion window adjustments after incorrect congestion notifications. This paper describes the basic concepts in Linux TCP and its congestion control engine. In addition to features specified by IETF, Linux has implementation details beyond the specifications aimed to further improve its performance. The paper presents how Linux TCP differs from the traditional TCP. Finally, we discuss whether it would be reasonable to implement TCP as a kernel module or as a user library.

Article
We analyze two alternative retransmission timers for the Transmission Control Protocol (TCP). We first study the retransmission timer of TCP-Lite which is considered to be the current de facto standard for TCP implementations. After revealing four major problems of TCP-Lite's retransmission timer, we propose a new timer, named the Eifel retransmission timer, that eliminates these. The strength of our work lies in its hybrid analysis methodology. We develop models of both retransmission timers for the class of network-limited TCP bulk data transfers in steady state. Using those models, we predict the problems of TCP-Lite's retransmission timer and develop the Eifel retransmission timer. We then validate our model-based analysis through measurements in a real network that yield the same results.
Article
We propose an enhancement to TCP's error recovery scheme, which we call the Eifel algorithm. It eliminates the retransmission ambiguity, thereby solving the problems caused by spurious timeouts and spurious fast retransmits. It can be incrementally deployed as it is backwards compatible and does not change TCP's congestion control semantics. In environments where spurious retransmissions occur frequently, the algorithm can improve the end-to-end throughput by several tens of percent. An exact quantification is, however, highly dependent on the path characteristics over time. The Eifel algorithm finally makes TCP truly wireless-capable without the need for proxies between the end points. Another key novelty is that the Eifel algorithm provides for the implementation of a more optimistic retransmission timer because it reduces the penalty of a spurious timeout to a single (in the common case) spurious retransmission.
Article
RFC 2001 [RFC2001] documents the following four intertwined TCP congestion control algorithms: Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery. RFC 2581 [RFC2581] explicitly allows certain modifications of these algorithms, including modifications that use the TCP Selective Acknowledgement (SACK) option [MMFR96], and modifications that respond to "partial acknowledgments" (ACKs which cover new data, but not all the data outstanding when loss was detected) in the absence of SACK. This document describes a specific algorithm for responding to partial acknowledgments, referred to as NewReno. This response to partial acknowledgments was first proposed by Janey Hoe in [Hoe95]. 1. Introduction For the typical implementation of the TCP Fast Recovery algorithm described in [RFC2581] (first implemented in the 1990 BSD Reno release, and referred to as the Reno algorithm in [FF96]), the TCP data sender only retransmits a packet after a retransmit timeout has occurred, or afte...
Article
This paper uses simulations to explore the benefits of adding selective acknowledgments (SACK) and selective repeat to TCP. We compare Tahoe and Reno TCP, the two most common reference implementations for TCP, with two modified versions of Reno TCP. The first version is New-Reno TCP, a modified version of TCP without SACK that avoids some of Reno TCP's performance problems when multiple packets are dropped from a window of data. The second version is SACK TCP, a conservative extension of Reno TCP modified to use the SACK option being proposed in the Internet Engineering Task Force (IETF). We describe the congestion control algorithms in our simulated implementation of SACK TCP and show that while selective acknowledgments are not required to solve Reno TCP's performance problems when multiple packets are dropped, the absence of selective acknowledgments does impose limits to TCP's ultimate performance. In particular, we show that without selective acknowledgments, TCP implementations are constrained to either retransmit at most one dropped packet per round-trip time, or to retransmit packets that might have already been successfully delivered.
Article
This note defines an extension of the Selective Acknowledgement (SACK) Option [RFC2018] for TCP. RFC 2018 specified the use of the SACK option for acknowledging out-of-sequence data not covered by TCP's cumulative acknowledgement field. This note extends RFC 2018 by specifying the use of the SACK option for acknowledging duplicate packets. This note suggests that when duplicate packets are received, the first block of the SACK option field can be used to report the sequence numbers of the packet that triggered the acknowledgement. This extension to the SACK option allows the TCP sender to infer the order of packets received at the receiver, allowing the sender to infer when it has unnecessarily retransmitted a packet. A TCP sender could then use this information for more robust operation in an environment of reordered packets, ACK loss, packet replication, and/or early retransmit timeouts.
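The sender-side check implied by this extension can be sketched as follows. This is an illustrative sketch and not code from the note: the function name `is_dsack`, the representation of SACK blocks as `(left, right)` integer pairs, and the omission of 32-bit sequence wraparound are all assumptions made for brevity. The rule it applies is the commonly described one: the first block reports a duplicate if it starts below the cumulative acknowledgement, or if it is contained in the second block.

```python
def is_dsack(cumulative_ack, sack_blocks):
    """Return True if the first SACK block appears to report duplicate data.

    Assumptions (hypothetical, for illustration only):
    - sack_blocks is a list of (left_edge, right_edge) integer pairs,
      in the order they appear in the SACK option;
    - sequence numbers do not wrap around.
    """
    if not sack_blocks:
        return False

    first_left, first_right = sack_blocks[0]

    # Case 1: the first block starts below the cumulative ACK, so it
    # describes data the receiver had already acknowledged.
    if first_left < cumulative_ack:
        return True

    # Case 2: the first block is covered by the second block, so it
    # describes data that was already held out of order.
    if len(sack_blocks) > 1:
        second_left, second_right = sack_blocks[1]
        if second_left <= first_left and first_right <= second_right:
            return True

    return False
```

A sender that sees `is_dsack(...)` return True for a segment it retransmitted has evidence that the retransmission was unnecessary, which is the inference the note describes.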
Article
TCP may experience poor performance when multiple packets are lost from one window of data. With the limited information available from cumulative acknowledgments, a TCP sender can only learn about a single lost packet per round trip time. An aggressive sender could choose to retransmit packets early, but such retransmitted segments may have already been successfully received. A Selective Acknowledgment (SACK) mechanism, combined with a ...
Article
This document defines the standard algorithm that Transmission Control Protocol (TCP) senders are required to use to compute and manage their retransmission timer. It expands on the discussion in section 4.2.3.1 of RFC 1122 and upgrades the requirement of supporting the algorithm from a SHOULD to a MUST.
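The abstract does not reproduce the algorithm itself. As a rough sketch of the standard computation (smoothed RTT plus four times the RTT variance, with gains of 1/8 and 1/4, and exponential backoff on timeout), the following may help. The class name, parameter names, and the default values for clock granularity, initial RTO, and upper bound are assumptions chosen for illustration; the specification's exact initial value differs between revisions.

```python
class RtoEstimator:
    """Retransmission timer state in seconds (illustrative sketch)."""

    ALPHA = 1 / 8  # gain applied to the smoothed RTT
    BETA = 1 / 4   # gain applied to the RTT variance
    K = 4          # variance multiplier

    def __init__(self, clock_granularity=0.1, initial_rto=3.0,
                 min_rto=1.0, max_rto=60.0):
        # All defaults here are assumptions for the example.
        self.g = clock_granularity
        self.min_rto = min_rto
        self.max_rto = max_rto
        self.srtt = None
        self.rttvar = None
        self.rto = initial_rto

    def on_rtt_sample(self, r):
        """Update the estimator with a new RTT measurement r."""
        if self.srtt is None:
            # First measurement initializes both estimators.
            self.srtt = r
            self.rttvar = r / 2
        else:
            # The variance is updated before the smoothed RTT,
            # using the old smoothed RTT value.
            self.rttvar = ((1 - self.BETA) * self.rttvar
                           + self.BETA * abs(self.srtt - r))
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * r
        rto = self.srtt + max(self.g, self.K * self.rttvar)
        self.rto = min(self.max_rto, max(self.min_rto, rto))
        return self.rto

    def on_timeout(self):
        """Back off the timer exponentially after it expires."""
        self.rto = min(self.max_rto, self.rto * 2)
        return self.rto
```

With a first sample of 0.5 s the sketch yields an RTO of 1.5 s; a second identical sample shrinks the variance term and the RTO drops to 1.25 s.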
Article
In October of '86, the Internet had the first of what became a series of 'congestion collapses'. During this period, the data throughput from LBL to UC Berkeley (sites separated by 400 yards and three IMP hops) dropped from 32 Kbps to 40 bps. Mike Karels and I were fascinated by this sudden factor-of-thousand drop in bandwidth and embarked on an investigation of why things had gotten so bad. We wondered, in particular, if the 4.3BSD (Berkeley UNIX) TCP was mis-behaving or if it could be tuned to work better under abysmal network conditions. The answer to both of these questions was “yes”. Since that time, we have put seven new algorithms into the 4BSD TCP: (i) round-trip-time variance estimation, (ii) exponential retransmit timer backoff, (iii) slow-start, (iv) more aggressive receiver ack policy, (v) dynamic window sizing on congestion, (vi) Karn's clamped retransmit backoff, and (vii) fast retransmit. Our measurements and the reports of beta testers suggest that the final product is fairly good at dealing with congested conditions on the Internet. This paper is a brief description of (i) - (v) and the rationale behind them. (vi) is an algorithm recently developed by Phil Karn of Bell Communications Research, described in [KP87]. (vii) is described in a soon-to-be-published RFC. Algorithms (i) - (v) spring from one observation: The flow on a TCP connection (or ISO TP-4 or Xerox NS SPP connection) should obey a 'conservation of packets' principle. And, if this principle were obeyed, congestion collapse would become the exception rather than the rule. Thus congestion control involves finding places that violate conservation and fixing them. By 'conservation of packets' I mean that for a connection 'in equilibrium', i.e., running stably with a full window of data in transit, the packet flow is what a physicist would call 'conservative': A new packet isn't put into the network until an old packet leaves.
The physics of flow predicts that systems with this property should be robust in the face of congestion. Observation of the Internet suggests that it was not particularly robust. Why the discrepancy? There are only three ways for packet conservation to fail: (1) the connection doesn't get to equilibrium, or (2) a sender injects a new packet before an old packet has exited, or (3) the equilibrium can't be reached because of resource limits along the path. In the following sections, we treat each of these in turn.
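To make the 'conservation of packets' idea concrete, here is a small sketch of an ACK-clocked sender window. It follows the textbook formulation of slow-start and congestion avoidance and is not taken from the abstract above: the class name, the packet-counted window, the default threshold, and the floor of two packets on the threshold are assumptions for illustration.

```python
class AckClockedWindow:
    """Congestion window counted in packets (illustrative sketch)."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0           # start with one packet in flight
        self.ssthresh = ssthresh  # assumed initial threshold

    def can_send(self, packets_in_flight):
        # Conservation: a new packet enters the network only when
        # the window has room, i.e. after an old packet has left.
        return packets_in_flight < int(self.cwnd)

    def on_ack(self):
        if self.cwnd < self.ssthresh:
            # Slow-start: one extra packet per ACK, which doubles
            # the window roughly once per round trip.
            self.cwnd += 1.0
        else:
            # Congestion avoidance: about one extra packet per
            # round trip.
            self.cwnd += 1.0 / self.cwnd

    def on_timeout(self):
        # A timeout is treated as a congestion signal: remember half
        # the window and restart from one packet.
        self.ssthresh = max(self.cwnd / 2.0, 2.0)
        self.cwnd = 1.0
```

In this sketch the sender never transmits on its own schedule: `can_send` only becomes true again when an ACK either removes a packet from flight or grows the window.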
Conference Paper
We have developed a Forward Acknowledgment (FACK) congestion control algorithm which addresses many of the performance problems recently observed in the Internet. The FACK algorithm is based on first principles of congestion control and is designed to be used with the proposed TCP SACK option. By decoupling congestion control from other algorithms such as data recovery, it attains more precise control over the data flow in the network. We introduce two additional algorithms to improve the behavior in specific situations. Through simulations we compare FACK to both Reno and Reno with SACK. Finally, we consider the potential performance and impact of FACK in the Internet.
Article
The TCP protocol is used by the majority of the network applications on the Internet. TCP performance is strongly influenced by its congestion control algorithms that limit the amount of transmitted traffic based on the estimated network capacity and utilization. Because the freely available Linux operating system has gained popularity especially in the network servers, its TCP implementation affects many of the network interactions carried out today. We describe the fundamentals of the Linux TCP design, concentrating on the congestion control algorithms. The Linux TCP implementation supports SACK, TCP timestamps, Explicit Congestion Notification, and techniques to undo congestion window adjustments after incorrect congestion notifications.
Article
TCP's congestion window controls the number of packets a TCP flow may have in the network at any time. However, long periods when the sender is idle or application-limited can lead to the invalidation of the congestion window, in that the congestion window no longer reflects current information about the state of the network. In this paper we propose a simple modification to TCP's congestion control algorithms to decay the congestion window cwnd after the transition from a sufficiently-long application-limited period, while using the slow-start threshold ssthresh to save information about the previous value of the congestion window. An invalid congestion window also results when the congestion window is increased (i.e., in TCP's slow-start or congestion avoidance phases) during application-limited periods, when the previous value of the congestion window might never have been fully utilized. We propose that the TCP sender should not increase the congestion window when the TCP ...
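The decay described here can be sketched as below. The abstract does not give the decay rate, so the details are assumptions based on the commonly cited form of congestion window validation: the window is halved once per retransmission timeout of idle time, and ssthresh is raised to at least three quarters of the old window so the previous operating point is not forgotten. The function name and the one-segment restart window are hypothetical.

```python
def decay_after_idle(cwnd, ssthresh, idle_time, rto, restart_window=1):
    """Return (cwnd, ssthresh) after an idle period.

    Assumptions (illustrative only): cwnd and ssthresh are counted in
    segments, idle_time and rto are in seconds, and the window is
    halved once for every full rto that elapsed while idle.
    """
    if idle_time < rto:
        # Not idle long enough for the window to be considered stale.
        return cwnd, ssthresh

    # Save information about the previous window in ssthresh.
    ssthresh = max(ssthresh, (3 * cwnd) // 4)

    # Halve the window for each elapsed rto, never going below the
    # restart window.
    for _ in range(int(idle_time // rto)):
        cwnd = max(cwnd // 2, restart_window)

    return cwnd, ssthresh
```

For example, a 16-segment window with ssthresh 8 that sits idle for 3.5 timeouts ends at a 2-segment window with ssthresh 12, so the sender can slow-start back toward its earlier rate instead of probing from scratch.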
Article
This document specifies the incorporation of ECN (Explicit Congestion Notification) to TCP and IP, including ECN's use of two bits in the IP header. We begin by describing TCP's use of packet drops as an indication of congestion. Next we explain that with the addition of active queue management (e.g., RED) to the Internet infrastructure, where routers detect congestion before the queue overflows, routers are no longer limited to packet drops as an indication of congestion. Routers can instead set the Congestion Experienced (CE) codepoint in the IP header of packets from ECN-capable transports. We describe when the CE codepoint is to be set in routers, and describe modifications needed to TCP to make it ECN-capable. Modifications to other transport protocols (e.g., unreliable unicast or multicast, reliable multicast, other reliable unicast transport protocols) could be considered as those protocols are developed and advance through the standards process. We also describe in this document the issues involving the use of ECN within IP tunnels, and within IPsec tunnels in particular. One of the guiding principles for this document is that, to the extent possible, the mechanisms specified here be incrementally deployable. One challenge to the principle of incremental deployment has been the prior existence of some IP tunnels that were not compatible with the use of ECN. As ECN becomes deployed, noncompatible IP tunnels will have to be upgraded to conform to this document. This document is intended to obsolete RFC 2481, "A Proposal to add Explicit Congestion Notification (ECN) to IP", which defined ECN as an Experimental Protocol for the Internet Communit...
Article
This memo presents a set of TCP extensions to improve performance over large bandwidth*delay product paths and to provide reliable operation over very high-speed paths. It defines new TCP options for scaled windows and timestamps, which are designed to provide compatible interworking with TCP's that do not implement the extensions. The timestamps are used for two distinct mechanisms: RTTM (Round Trip Time Measurement) and PAWS (Protect Against Wrapped Sequences). Selective acknowledgments are not included in this memo. This memo combines and supersedes RFC-1072 and RFC-1185, adding additional clarification and more detailed specification. Appendix C summarizes the changes from the earlier RFCs.
J. Hoe. Start-up Dynamics of TCP's Congestion Control and Avoidance Schemes. Master's thesis, Massachusetts Institute of Technology, June 1995.
S. Floyd, J. Mahdavi, M. Mathis, and M. Podolsky. An Extension to the Selective Acknowledgment (SACK) Option for TCP. RFC 2883, July 2000.
E. Blanton, M. Allman, K. Fall, and L. Wang. A Conservative SACK-based Loss Recovery Algorithm for TCP. Internet draft "draft-allman-tcp-sack-13.txt", Oct. 2002. Work in progress.
M. Mathis, J. Mahdavi, S. Floyd, and A. Romanow. TCP Selective Acknowledgement Options. RFC 2018, Oct. 1996.
W. Stevens. TCP/IP Illustrated, Volume 1; The Protocols. Addison Wesley, 1995.