TCPM Working Group                                            S. Schuetz
Internet-Draft                                                       NEC
Intended status: Standards Track                               L. Eggert
Expires: September 6, 2007                                         Nokia
                                                                 W. Eddy
                                                                Y. Swami
                                                                   K. Le
                                                           March 5, 2007

      TCP Response to Lower-Layer Connectivity-Change Indications

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.
   This document may not be modified, and derivative works of it may not
   be created, except to publish it as an RFC and to translate it into
   languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on September 6, 2007.

Copyright Notice

   Copyright (C) The IETF Trust (2007).

Schuetz, et al.         Expires September 6, 2007               [Page 1]

Internet-Draft  TCP Response to Connectivity Indications      March 2007


Abstract

   When the path characteristics between two hosts change abruptly, TCP
   can experience significant delays before resuming transmission in an
   efficient manner or TCP can behave unfairly to competing traffic.
   This document describes TCP extensions that improve transmission
   behavior in response to advisory, lower-layer connectivity-change
   indications.  The proposed TCP extensions modify the local behavior
   of TCP and introduce a new TCP option to signal locally received
   connectivity-change indications to remote peers.  Performance gains
   result from more efficient transmission behavior, and the extensions
   are no more aggressive than a freshly-started connection.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . . .  3
   2.  Motivation and Overview  . . . . . . . . . . . . . . . . . . .  4
   3.  Connectivity-Change Indications  . . . . . . . . . . . . . . .  6
   4.  TCP Response to Connectivity-Change Indications  . . . . . . .  7
     4.1.  Connectivity-Change Indication TCP Option  . . . . . . . .  8
     4.2.  Generation and Processing of Connectivity-Change
           Indication TCP Options . . . . . . . . . . . . . . . . . .  9
     4.3.  Re-Probing Path Characteristics  . . . . . . . . . . . . . 13
     4.4.  Speculative Retransmission . . . . . . . . . . . . . . . . 14
   5.  Security Considerations  . . . . . . . . . . . . . . . . . . . 14
   6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . . . 15
   7.  Acknowledgments  . . . . . . . . . . . . . . . . . . . . . . . 15
   8.  References . . . . . . . . . . . . . . . . . . . . . . . . . . 15
     8.1.  Normative References . . . . . . . . . . . . . . . . . . . 15
     8.2.  Informative References . . . . . . . . . . . . . . . . . . 16
   Editorial Comments . . . . . . . . . . . . . . . . . . . . . . . .
   Appendix A.  Background: Classification of Connectivity
                Disruptions . . . . . . . . . . . . . . . . . . . . . 18
     A.1.  Short Connectivity Disruptions . . . . . . . . . . . . . . 20
     A.2.  Long Connectivity Disruptions  . . . . . . . . . . . . . . 21
   Appendix B.  Document Revision History . . . . . . . . . . . . . . 24
   Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 24
   Intellectual Property and Copyright Statements . . . . . . . . . . 26

1.  Introduction

   The Transmission Control Protocol (TCP) [RFC0793] generally assumes
   that the end-to-end path between two hosts has characteristics that
   are relatively stable over the lifetime of a connection.  Although
   TCP's congestion control algorithms [RFC2581] can adapt to changes to
   the path characteristics after several round-trip times, they fail to
   support efficient operation in the few round-trip times immediately
   after a significant path change.  This is due to the granularity of
   TCP's sampling mechanisms.  Significant changes to path connectivity
   include loss or reestablishment of connectivity, and drastic, abrupt
   changes in round-trip time (RTT) or available bandwidth.
   Connectivity changes that occur on such short time-scales are
   becoming more common, due to host mobility or intermittent network
   connectivity.

   This document describes a set of complementary TCP extensions that
   improve behavior when transmitting over paths whose characteristics
   can change on short time-scales.  TCP implementations that support
   these extensions respond to receiving generic, link-technology-
   independent, per-connection "path characteristics have changed" (or
   short: "connectivity-change") indications from lower layers.  A
   connectivity-change indication signals that the characteristics of
   the end-to-end path between the local node and its peer have changed
   in some undefined way.  The response mechanisms proposed for TCP act
   on this information in a conservative fashion.  The specific response
   depends on the state of a connection.

   Importantly, adding response mechanisms to lower-layer information
   follows an established precedent.  TCP
   and other transport protocols already react to information and
   signals from lower layers; the proposed connectivity-change
   indications thus extend an established interface between layers in
   the protocol stack.  TCP measures the end-to-end path to implicitly
   derive network-layer information.  TCP also directly reacts to
   network-layer signals delivered via ICMP, for example, "Port
   Unreachable" or the now-deprecated "Source Quench" [RFC1122].
   Explicit Congestion Notification (ECN) [RFC3168] and Quick-Start
   [I-D.ietf-tsvwg-quickstart] are other sources of network-layer
   information for which response mechanisms for TCP have been defined.
   Connectivity-change indications are yet another source of lower-layer
   information that TCP can use to improve its operation.

   A second important point to note is that the TCP response mechanisms
   to connectivity-change indications are purely optional efficiency
   improvements.  In the absence of connectivity-change indications, a
   TCP that implements these changes behaves identically to an
   unmodified TCP.  When lower layers provide connectivity-change
   indications that trigger the response mechanisms, they enhance TCP
   operation based on the explicit lower-layer information that is
   signaled.  These response mechanisms do not increase the
   aggressiveness of TCP.

   Note that the IAB has recently described architectural issues of
   "link indications" [I-D.iab-link-indications].  The authors feel that
   this term is not quite accurate in this environment, because
   transport mechanisms should remain link-technology-agnostic.
   However, transport protocols have always acted on network-layer
   information and signals, such as measured path characteristics or
   ICMP-signaled conditions.  Because of the growing proliferation of
   shim layers between the traditional network and transport layers,
   this document uses the term "lower-layer indication" to remain
   independent of specific network or shim layers.

   Note that it is currently an open question as to whether additional
   lower-layer indications can provide further information to transport
   protocols.  Also, this document only describes response mechanisms
   for TCP, although other transport protocols may benefit from similar
   response mechanisms to react to connectivity-change indications.

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].

2.  Motivation and Overview

   Several proposed network-layer extensions support host mobility,
   including Mobile IPv4 [RFC3344], Mobile IPv6 [RFC3775] and HIP
   [I-D.ietf-hip-mm].  Typically, they shield transport-layer protocols
   from mobility events and enable them to sustain established
   connections across mobility events.  However, the path
   characteristics that established connections experience after a
   mobility event may have changed drastically and on short time-scales.
   Congestion control, RTT and path-MTU state gathered over an old path
   before the move generally have no meaning for the new path.  Because
   TCP uses stale information when resuming transmission over the new
   path, it can be either too aggressive or highly inefficient.  Similar
   conditions may be found when fail-overs occur for multihomed hosts
   through the shim6 protocol.  Appendix A provides background on the
   types of scenarios that the technology described in this document is
   designed to work within.

   TCP already forces a slow-start restart in some cases where the
   network state becomes unknown, such as after an idle period or heavy
   losses.  A first part of the response specified in this document
   involves a similar return to initial slow-start state in response to
   connectivity-change indications that are received while a connection
   is transmitting in steady-state.  Note that this behavior is more
   conservative than the standard TCP response or lack of response.
   Some performance gains with the proposed mechanisms are due to either
   avoiding overloading the new path, which typically incurs an RTO, or
   using slow-start to quickly detect new capacity far above the
   previous steady-state operating point.

   A second response component improves TCP operation in the presence of
   temporary connectivity disruptions.  These disruptions can occur
   independently of mobility events and, for example, may be due to
   insufficient wireless access coverage or nomadic computer use.
   Connectivity disruptions can severely decrease TCP performance.  The
   main reason for this decrease is TCP's retransmission behavior after
   a connectivity disruption [SCHUETZ].  TCP uses periodic
   retransmission attempts in exponentially increasing intervals, which
   can unnecessarily delay retransmissions after connectivity returns.
   In the extreme case, TCP connections can even abort, if the
   disruption is longer than the TCP "user timeout."  (Connection aborts
   are out of scope for this document but can be prevented by the TCP
   User Timeout Option [I-D.ietf-tcpm-tcp-uto].)
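
   As a rough illustration of this delay (a sketch only, not part of
   the proposed mechanisms), the following Python fragment computes
   when the next regular retransmission would occur after connectivity
   returns.  The function name, the doubling of the RTO on every
   expiration, the 1-second starting RTO and the 60-second cap are
   assumptions chosen for the example.

```python
def next_retransmission_time(restored_at: float,
                             initial_rto: float = 1.0,
                             max_rto: float = 60.0) -> float:
    """Return the time of the first RTO-driven retransmission at or after
    `restored_at`, in seconds since the original transmission."""
    t = 0.0
    rto = initial_rto
    while True:
        t += rto                      # timer expires, segment is retransmitted
        if t >= restored_at:
            return t
        rto = min(rto * 2, max_rto)   # exponential back-off (assumed cap)


# With these assumed values, retransmissions occur at 1, 3, 7, 15, 31 and
# 63 seconds.  If connectivity returns at 32 seconds, the connection stays
# idle until the retransmission at 63 seconds.
idle = next_retransmission_time(32.0) - 32.0
```

   Under these assumptions, most of the idle period is spent waiting
   for a timer rather than for the network.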

   This second response action executes when receiving a connectivity-
   change indication while a connection is stalled in exponential back-
   off.  It improves TCP retransmission behavior after connectivity is
   restored through an immediate speculative retransmission attempt
   [footnote-1].  Similar to the first response component, the second
   one also increases TCP performance through a more intelligent
   transmission behavior that uses periods of connectivity more
   efficiently.  In comparison to startup of a new connection, it does
   not cause significant amounts of additional traffic and it does not
   change TCP's congestion control algorithms.

   Finally, this draft specifies a third response component, which is a
   new TCP option that notifies the connection's remote peer of a
   connectivity-change event detected locally.  This is useful because
   connectivity-change indications typically require appropriate
   responses at both ends of a connection, but may only be received or
   detected by one end.  The other parts of the response to a
   connectivity-change indication are independent of the indication's
   source (locally notified or remotely signaled) and depend only on the
   specific indication and the state of the connection for which it was
   received.

3.  Connectivity-Change Indications

   The focus of this document is on specifying TCP response mechanisms
   to lower-layer "path characteristics have changed" indications.  This
   section briefly describes how different network- and shim-layer
   mechanisms underneath the transport layer may provide these
   "connectivity-change" indications to TCP.  This section is included
   for clarification only; details on connectivity indication sources
   are out of scope of this document.

   When lower layers detect a connectivity-change event, they generate
   corresponding connectivity-change indications.  Lower-layer events
   that could trigger such an indication include (but are not limited
   to):

   o  the IP address of the local outbound interface used for a given
      connection has changed, e.g., due to DHCP [RFC2131] or IPv6 router
      advertisements [RFC2460]

   o  link-layer connectivity of the local outbound interface used for a
      given connection has changed, e.g., link-layer "link up" event

   o  the local outbound interface used for a given connection has
      changed, due to routing changes or link-layer connectivity changes
      at other interfaces (including tunnel establishment or teardown,
      e.g., in response to IKE events [RFC4306])

   o  a Mobile IP binding update has completed [RFC3775]

   o  a HIP readdressing update has completed [I-D.ietf-hip-mm]

   o  a path-change signal from the network has arrived (possible in
      theory, depends on network capabilities)

   o  other notifications as defined by the IETF's Detecting Network
      Attachment (DNA) working group have occurred

   Note that the list above only describes some potential sources for
   connectivity-change events.  Other sources exist, but the details on
   when to generate such events are out of the scope of this document,
   which focuses on the TCP response mechanisms when such events are
   received.

4.  TCP Response to Connectivity-Change Indications

   A TCP connection can receive a connectivity-change indication (CCI)
   either from its local stack ("local CCI") or through a new
   "connectivity-change indication TCP option" from its peer ("remote
   CCI").  Section 4.1 specifies this new TCP option.  In either case,
   upon reception of a CCI, the TCP response mechanisms defined in this
   document re-probe path characteristics or perform a speculative
   retransmission, depending on whether the connection is currently
   stalled in exponential back-off or transmitting in steady-state.  A
   connection is "stalled in exponential back-off", if at least one
   segment was retransmitted due to an RTO expiration but has not been
   ACK'ed yet.

   The remainder of this section first defines the format of the new CCI
   option in Section 4.1 and then describes the two TCP response
   mechanisms triggered by receiving CCIs - re-probing path
   characteristics and speculative retransmission - in Section 4.3 and
   Section 4.4.

   To implement the RLCI mechanism defined in this document, TCP
   implementations MUST maintain five new state variables per TCP
   connection [footnote-2]:

   LOCAL_CCI_COUNT
      Counts (modulo 8) the number of local CCIs received for a
      connection.  Starting from value 7, it is decremented on each
      local CCI; after reaching 0, it wraps back to 7.

   REMOTE_CCI_COUNT
      Holds a copy of the last CCI counter value advertised by the peer
      through a CCI TCP option.  This is initialized to 7, and is
      updated in response to remote CCIs according to the rules defined
      in Section 4.2.

   LOCAL_CCI_ACTIVE
      Boolean flag, true if the local TCP stack is currently executing a
      response mechanism after having received a local CCI, and false
      otherwise.

   REMOTE_CCI_ACTIVE
      Boolean flag, true if the local TCP stack is currently executing a
      response mechanism after having received a remote CCI, and false
      otherwise.

   REMOTE_CCI_SNDNXT
      Retains a copy of SND.NXT [RFC0793] at the time the most recent
      remote CCI was received.
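
   For illustration only, the five per-connection state variables
   could be held in a small container such as the following Python
   sketch.  The class name, the method name and the initial value of 0
   for REMOTE_CCI_SNDNXT are assumptions; the counter initialization
   and the modulo-8 wrap follow the descriptions above.

```python
from dataclasses import dataclass

CCI_COUNT_INIT = 7  # both counters start at 7


@dataclass
class CciState:
    """Per-connection CCI state (hypothetical container)."""
    local_cci_count: int = CCI_COUNT_INIT    # LOCAL_CCI_COUNT
    remote_cci_count: int = CCI_COUNT_INIT   # REMOTE_CCI_COUNT
    local_cci_active: bool = False           # LOCAL_CCI_ACTIVE
    remote_cci_active: bool = False          # REMOTE_CCI_ACTIVE
    remote_cci_sndnxt: int = 0               # REMOTE_CCI_SNDNXT (assumed start)

    def decrement_local_count(self) -> int:
        """Decrement LOCAL_CCI_COUNT modulo 8, wrapping from 0 back to 7."""
        self.local_cci_count = (self.local_cci_count - 1) % 8
        return self.local_cci_count
```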

4.1.  Connectivity-Change Indication TCP Option

   Connectivity-change indications (CCIs) are generally asymmetric,
   i.e., they may occur or be detected by one end but not the other.
   The basic idea behind the CCI TCP option is to signal the occurrence
   of local CCIs to the other end, so that it can respond to them.

                                 1                   2
             0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
            |   Kind = X    |  Length = 3   |RES| CNT | ECNT|
            +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

    Figure 1: Format of the connectivity-change indication TCP option.

   Figure 1 shows the format of the CCI TCP option.  It contains these
   fields:

   Kind (8 bits)
      The TCP option number X [RFC0793] allocated by IANA upon
      publication of this document (see Section 6).

   Length (8 bits)
      Length of the TCP option in octets [RFC0793]; its value MUST be 3.

   RES (2 bits)
      Reserved bits.  The sender SHOULD set these to zero and the
      receiver MUST ignore them.

   CNT (3 bits)
      Current value of LOCAL_CCI_COUNT of the local end sending the
      option.

   ECNT (3 bits)
      Echoed value of CNT, i.e., the value of CNT in the last CCI option
      received from the other end.
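
   The field layout above can be sketched as a pair of Python helpers.
   This is a non-normative illustration under stated assumptions: the
   Kind value 253 is only a placeholder because the real option number
   X has not been assigned, the function names are hypothetical, and
   the leftmost field in Figure 1 is assumed to occupy the most
   significant bits of the third octet.

```python
from typing import Tuple

CCI_OPTION_KIND = 253    # placeholder only; the real Kind (X) is unassigned
CCI_OPTION_LENGTH = 3    # the option is always three octets long


def encode_cci_option(cnt: int, ecnt: int) -> bytes:
    """Build the three-octet option from two 3-bit counters."""
    if not (0 <= cnt <= 7 and 0 <= ecnt <= 7):
        raise ValueError("CNT and ECNT are 3-bit values (0..7)")
    # RES (the two most significant bits) is sent as zero.
    third_octet = (cnt << 3) | ecnt
    return bytes([CCI_OPTION_KIND, CCI_OPTION_LENGTH, third_octet])


def decode_cci_option(raw: bytes) -> Tuple[int, int]:
    """Return (CNT, ECNT); the RES bits are ignored on receipt."""
    if (len(raw) != CCI_OPTION_LENGTH or raw[0] != CCI_OPTION_KIND
            or raw[1] != CCI_OPTION_LENGTH):
        raise ValueError("not a well-formed CCI option")
    third_octet = raw[2]
    cnt = (third_octet >> 3) & 0x7
    ecnt = third_octet & 0x7
    return cnt, ecnt
```

   Masking on receipt rather than rejecting non-zero RES bits mirrors
   the rule that a receiver ignores the reserved bits.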

   The CCI TCP option contains a counter (CNT) that represents the
   number of times the sending side has received local connectivity-
   change indications.  At the beginning of a connection,
   LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE are both false.

   A host opening a connection includes a CCI option in its SYN segment
   with the initial LOCAL_CCI_COUNT of 7 to advertise support for the
   option.  A host receiving a SYN MUST NOT include a CCI option in its
   SYN-ACK unless it has received a CCI option in the corresponding SYN.
   A host MUST NOT process any following CCI options unless one was
   included in both the SYN and SYN-ACK.

   After the SYN exchange, a host SHOULD send a CCI option only after
   receiving a new local connectivity-change indication, or in response
   to receiving a new CCI option from the other end.  Section 4.2
   describes the processing rules in detail.

   A host MUST include a CCI option in all outgoing segments whenever
   LOCAL_CCI_ACTIVE is true or REMOTE_CCI_ACTIVE is true (or both).  A
   host MUST NOT include a CCI option in any segments whenever
   LOCAL_CCI_ACTIVE is false and REMOTE_CCI_ACTIVE is false, i.e. the
   host is not processing any connectivity-change indications.  When
   sending any CCI option, CNT MUST be set to the current
   LOCAL_CCI_COUNT and ECNT MUST be set to the current REMOTE_CCI_COUNT.
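
   The negotiation and inclusion rules above reduce to three small
   predicates.  The following Python sketch is illustrative only; the
   function names are hypothetical and do not appear in this document.

```python
def synack_may_carry_cci(syn_had_cci: bool) -> bool:
    """A SYN-ACK may carry a CCI option only if the SYN carried one."""
    return syn_had_cci


def cci_enabled(syn_had_cci: bool, synack_had_cci: bool) -> bool:
    """Later CCI options are processed only if SYN and SYN-ACK both had one."""
    return syn_had_cci and synack_had_cci


def must_include_cci_option(local_cci_active: bool,
                            remote_cci_active: bool) -> bool:
    """True: every outgoing segment carries the option.  False: none does."""
    return local_cci_active or remote_cci_active
```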

4.2.  Generation and Processing of Connectivity-Change Indication TCP
      Options

   Processing of a connectivity-change indication can be separated into
   two parts:

   1.  Processing in "initiator" mode, i.e., when a host receives a
       local CCI and forwards it to the other end through a CCI TCP
       option.

   2.  Processing in "responder" mode, i.e., when a host receives a
       remote CCI in a CCI TCP option from the other end.

   Section 4.2.1 and Section 4.2.2 describe the state machines at an
   initiator and a responder, respectively.  Note that a single host can
   be both initiator and responder at the same time, if a local CCI and
   a remote CCI happen to occur at the same time.

   The following events, conditions and actions are used in the
   definition of the two state machines:

   Events

   E_LOCAL_CCI
      Local end received a local CCI.

   E_REMOTE_CCI
      Local end received information about a remote CCI, i.e., received
      a TCP segment that includes a CCI TCP option.

   E_NONE
      Local end received a TCP segment that does not include a CCI TCP
      option.

   Conditions

   C_NEW_REMOTE_CCI
      Received CCI option signals a new remote CCI, i.e., CNT !=
      REMOTE_CCI_COUNT.

   C_ECHOED_LOCAL_CCI
      Received CCI option echoes the local CCI counter, i.e., ECNT ==
      LOCAL_CCI_COUNT.

   C_LOCAL_PROGRESS
      Local end made progress since receiving the last remote CCI, i.e.,
      data sent after REMOTE_CCI_SNDNXT was recorded has been ACK'ed.

   Actions

   A_DECREMENT_LOCAL
      Decrement the local CCI counter, i.e., LOCAL_CCI_COUNT =
      LOCAL_CCI_COUNT - 1.  LOCAL_CCI_COUNT wraps from 0 to 7.

   A_FORCE_SEND
      Force transmission of a segment that MUST include a CCI option.
      The segment can either be an outstanding retransmission, a new
      data segment or a pure ACK.

   A_UPDATE_REMOTE_COUNT
      Update remote CCI counter according to received CCI option, i.e.,
      REMOTE_CCI_COUNT = CNT.

   A_UPDATE_SNDNXT
      Store the segment number of the next data segment, i.e., set
      REMOTE_CCI_SNDNXT = SND.NXT.
4.2.1.  Initiator Mode Processing

   This section describes the initiator mode processing of a TCP host
   implementing RLCI.  In initiator mode, a host needs to signal the
   last received local CCI to its peer, until the peer echoes reception
   of that CCI.  Figure 2 shows the corresponding state machine.

   At the beginning of a connection, i.e., before the first local CCI is
   received, LOCAL_CCI_ACTIVE is false.  This remains the case until the
   local end receives a local CCI (E_LOCAL_CCI).  When that happens, it
   decrements LOCAL_CCI_COUNT (A_DECREMENT_LOCAL), forces a segment to
   be sent to the peer (A_FORCE_SEND) and LOCAL_CCI_ACTIVE becomes true.
   Note that this also implies that all subsequent outgoing segments
   MUST contain a CCI TCP option until LOCAL_CCI_ACTIVE is false (and
   possibly until REMOTE_CCI_ACTIVE is false, in case it became true
   during the local CCI processing).

                       E_LOCAL_CCI =>
                         A_DECREMENT_LOCAL
                         A_FORCE_SEND
                 +-------------------------+    +-----+
                 |                         |    |     |
                 |                         V    V     |
          +------------------+  +------------------+  |
          | LOCAL_CCI_ACTIVE |  | LOCAL_CCI_ACTIVE |  |
          |     == false     |  |     == true      |  |
          +------------------+  +------------------+  |
                 ^ ^                     | |    |     |
                 | |                     | |    |     |
                 | +---------------------+ |    ------+
                 |          E_NONE         |  E_LOCAL_CCI =>
                 |                         |    A_DECREMENT_LOCAL
                 +-------------------------+    A_FORCE_SEND
                      E_REMOTE_CCI &&
                      C_ECHOED_LOCAL_CCI == true

             Figure 2: State machine for initiator processing.

   When receiving a local CCI (E_LOCAL_CCI) while LOCAL_CCI_ACTIVE is
   true, a host remains in this state but needs to perform the actions
   A_DECREMENT_LOCAL and A_FORCE_SEND again.  LOCAL_CCI_ACTIVE remains
   true until the host receives a segment carrying the CCI TCP option
   (E_REMOTE_CCI) that echoes the current LOCAL_CCI_COUNT in the ECNT
   field of the option (C_ECHOED_LOCAL_CCI).  In this case,
   LOCAL_CCI_ACTIVE becomes false.
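
   The initiator transitions can be sketched as a small Python state
   machine.  This is an illustrative model, not a definitive
   implementation: the class and method names are hypothetical,
   actions are only recorded in a list instead of touching a real TCP
   stack, and the E_NONE transition is taken from the label shown in
   Figure 2.

```python
from enum import Enum
from typing import List, Optional


class Event(Enum):
    E_LOCAL_CCI = 1    # a local CCI was received
    E_REMOTE_CCI = 2   # a segment with a CCI option arrived
    E_NONE = 3         # a segment without a CCI option arrived


class Initiator:
    def __init__(self) -> None:
        self.local_cci_count = 7        # LOCAL_CCI_COUNT
        self.local_cci_active = False   # LOCAL_CCI_ACTIVE
        self.actions: List[str] = []    # record of executed actions

    def handle(self, event: Event, ecnt: Optional[int] = None) -> None:
        if event is Event.E_LOCAL_CCI:
            # Same actions whether or not the flag was already true.
            self.local_cci_count = (self.local_cci_count - 1) % 8
            self.actions += ["A_DECREMENT_LOCAL", "A_FORCE_SEND"]
            self.local_cci_active = True
        elif not self.local_cci_active:
            return                      # nothing to do while inactive
        elif event is Event.E_NONE:
            self.local_cci_active = False
        elif event is Event.E_REMOTE_CCI and ecnt == self.local_cci_count:
            self.local_cci_active = False   # C_ECHOED_LOCAL_CCI holds
```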

4.2.2.  Responder Mode Processing

   This section describes the responder mode processing of CCIs for a
   TCP host implementing the CCI TCP option.  In responder mode, a host
   echoes the last received remote CCI to its peer, until it can be sure
   that the peer correctly received the echo.  Figure 3 shows the
   corresponding state machine.

   At the beginning of a connection, REMOTE_CCI_ACTIVE is false, i.e.,
   the local host is not processing any remote CCIs.  When it receives a
   TCP segment with a CCI TCP option (E_REMOTE_CCI) signaling a new
   remote CCI (C_NEW_REMOTE_CCI), it updates REMOTE_CCI_COUNT with the
   value of the CNT field in the received option
   (A_UPDATE_REMOTE_COUNT), stores the segment number of the next data
   segment in REMOTE_CCI_SNDNXT (A_UPDATE_SNDNXT) and sets
   REMOTE_CCI_ACTIVE to true.  Note that this also implies that all
   subsequent outgoing segments MUST contain a CCI TCP option until
   REMOTE_CCI_ACTIVE is false (and possibly until LOCAL_CCI_ACTIVE is
   false, in case it became true during the remote CCI processing).

             E_REMOTE_CCI &&
             C_NEW_REMOTE_CCI == true =>
               A_UPDATE_REMOTE_COUNT
               A_UPDATE_SNDNXT
             +-------------------------+    +-----+
             |                         |    |     |
             |                         V    V     |
    +-------------------+  +-------------------+  |
    | REMOTE_CCI_ACTIVE |  | REMOTE_CCI_ACTIVE |  |
    |     == false      |  |      == true      |  |
    +-------------------+  +-------------------+  |
             ^ ^                     | |    |     |
             | |                     | |    |     |
             | +---------------------+ |    ------+
             |          E_NONE         |    E_REMOTE_CCI &&
             |                         |    C_NEW_REMOTE_CCI == true =>
             +-------------------------+      A_UPDATE_REMOTE_COUNT
             E_REMOTE_CCI &&                  A_UPDATE_SNDNXT
             C_NEW_REMOTE_CCI == false &&
             C_LOCAL_PROGRESS == true

             Figure 3: State machine for responder processing.

   When a host where REMOTE_CCI_ACTIVE is true receives a remote CCI TCP
   option (E_REMOTE_CCI) that signals a new remote CCI
   (C_NEW_REMOTE_CCI), it updates REMOTE_CCI_COUNT with the value of the
   CNT field in the received option (A_UPDATE_REMOTE_COUNT), stores the
   segment number of the next data segment in REMOTE_CCI_SNDNXT
   (A_UPDATE_SNDNXT) and leaves REMOTE_CCI_ACTIVE set to true.

   A host sets REMOTE_CCI_ACTIVE to false only in one of the following
   two cases.  First, if it receives a TCP segment that does not include
   a CCI TCP option (E_NONE).  This signals that LOCAL_CCI_ACTIVE is
   false at the other end, from which the host can conclude that the
   other end has completed processing of the CCI.  Second, if it
   receives a CCI TCP option (E_REMOTE_CCI) that does not signal a new
   remote CCI (C_NEW_REMOTE_CCI == false) and the connection has made
   progress since the last remote CCI (C_LOCAL_PROGRESS).  In this case,
   data segments sent after the last remote CCI have already been
   ACK'ed, i.e., a full round-trip carrying the CCI option has completed
   and the peer must have received the echoed ECNT value in at least one
   of the segments sent since the last remote CCI.  Therefore, the local
   host can terminate responder mode processing.

   Note: The second transition is required for the case when both hosts
   are in responder mode at the same time.  Neither will stop including
   CCI TCP options in their segments, because REMOTE_CCI_ACTIVE is true
   on both sides.  This can happen, e.g., when both hosts receive local
   CCIs at (nearly) the same time and signal it to each other using a
   CCI TCP option.
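
   The responder transitions can be sketched in Python as follows.
   This is an illustration under stated assumptions, not the
   specification itself: the class and method names are hypothetical,
   C_LOCAL_PROGRESS is modeled as "SND.UNA has advanced beyond
   REMOTE_CCI_SNDNXT", and sequence-number wraparound is ignored for
   brevity.

```python
from typing import Optional


class Responder:
    def __init__(self) -> None:
        self.remote_cci_count = 7        # REMOTE_CCI_COUNT
        self.remote_cci_active = False   # REMOTE_CCI_ACTIVE
        self.remote_cci_sndnxt = 0       # REMOTE_CCI_SNDNXT (assumed start)

    def on_segment(self, cnt: Optional[int], snd_nxt: int, snd_una: int) -> None:
        """Process one received segment; cnt is None if it has no CCI option."""
        if cnt is None:                              # E_NONE
            self.remote_cci_active = False
        elif cnt != self.remote_cci_count:           # C_NEW_REMOTE_CCI
            self.remote_cci_count = cnt              # A_UPDATE_REMOTE_COUNT
            self.remote_cci_sndnxt = snd_nxt         # A_UPDATE_SNDNXT
            self.remote_cci_active = True
        elif self.remote_cci_active and snd_una > self.remote_cci_sndnxt:
            self.remote_cci_active = False           # assumed C_LOCAL_PROGRESS
```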

4.3.  Re-Probing Path Characteristics

   When a TCP connection receives a connectivity-change indication and
   is not currently stalled in exponential back-off, it MUST re-probe
   the path characteristics to prevent causing congestion by
   transmitting based on stale path state.  In principle, this is
   similar to the initial slow-start: the sender MUST NOT transmit more
   than the default initial window (INIT_WINDOW) of data after a CCI is
   received and MUST reset the congestion control state (CWND and
   SS_THRESH), round-trip time measurement (RTTM) state, and RTO timer
   as if this were a new connection [RFC2581][RFC2988].  If Path MTU
   Discovery (PMTUD) is activated, PMTUD state MUST also be reset.

   One difference from an initial slow-start is that after a CCI, the
   connection may have segments in flight towards the destination along
   a previous path.  Therefore, after a CCI, congestion control MUST
   ignore any stale ACKs received and MUST update the congestion window
   solely based on ACKs for data that was sent after the CCI was
   received.  A stale ACK is an ACK for data sent before the host began
   processing the CCI, i.e., for data sent while both LOCAL_CCI_ACTIVE
   and REMOTE_CCI_ACTIVE were false.  In practice, a decent heuristic to
   disambiguate stale and fresh ACKs is to treat all ACKs received while
   either LOCAL_CCI_ACTIVE or REMOTE_CCI_ACTIVE is true as stale; a host
   SHOULD apply this heuristic.  This works assuming there is little
   large-scale reordering, because the packet that triggers the local
   state machine back into an inactive state will generally be received
   after all stale packets.  In some scenarios this assumption may not
   hold, but it seems reasonable for the vast majority of scenarios
   where the stale path is cleared of packets in less time than one or
   two RTTs on the new path.

   For each stale ACK received, a host MUST NOT adjust the congestion
   window and MUST NOT send any new data into the network.  This SHOULD
   continue until both LOCAL_CCI_ACTIVE and REMOTE_CCI_ACTIVE are false
   or there is a timeout.  When that occurs, the sender should consider
   any un-ACK'ed segments below the highest received ACK as lost and
   discount them from the segments in flight.  The sender MUST use slow-
   start based loss recovery for these segments.
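
   The reset and the stale-ACK heuristic described in this section can
   be sketched in Python as follows.  This is a simplified
   illustration, not a definitive implementation: the class and
   function names are hypothetical, and the default initial window of
   two segments, the initial SS_THRESH of 65535 bytes and the initial
   RTO of 3 seconds are assumed values that a real stack would take
   from its own configuration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathState:
    cwnd: int                 # congestion window in bytes (CWND)
    ssthresh: int             # slow-start threshold in bytes (SS_THRESH)
    srtt: Optional[float]     # smoothed RTT; None means "no sample yet"
    rttvar: Optional[float]   # RTT variance; None means "no sample yet"
    rto: float                # retransmission timeout in seconds


def reprobe_path(state: PathState, smss: int, init_window_segments: int = 2,
                 initial_ssthresh: int = 65535,
                 initial_rto: float = 3.0) -> None:
    """Reset path state as if the connection were new (assumed defaults)."""
    state.cwnd = init_window_segments * smss
    state.ssthresh = initial_ssthresh
    state.srtt = None
    state.rttvar = None
    state.rto = initial_rto


def on_ack(state: PathState, smss: int, local_cci_active: bool,
           remote_cci_active: bool) -> bool:
    """Process one ACK; return False if it was treated as stale."""
    if local_cci_active or remote_cci_active:
        return False                  # stale: do not touch cwnd, send nothing
    if state.cwnd < state.ssthresh:
        state.cwnd += smss            # slow-start growth for a fresh ACK
    return True
```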

4.4.  Speculative Retransmission

   The basic idea behind the speculative retransmission is to allow TCP
   to resume stalled connections as soon as it receives an indication
   that connectivity to previously unreachable peers may have returned.

   When a TCP connection receives a connectivity-change indication -
   either from the local stack or in a connectivity-change TCP option
   from the peer - and is currently stalled, it MUST immediately
   initiate the standard retransmission procedure, just as if the RTO
   for the connection had expired.
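
   The choice between the two response mechanisms depends only on
   whether the connection is stalled in exponential back-off.  A
   minimal Python sketch of that dispatch is shown below; the class,
   its field and the recorded action names are hypothetical, and a
   real stack would invoke its retransmission and reset routines
   instead of appending to a log.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Connection:
    # Segments retransmitted after an RTO expiration and not yet ACK'ed.
    rto_retransmitted_unacked: int = 0
    log: List[str] = field(default_factory=list)

    def stalled_in_backoff(self) -> bool:
        return self.rto_retransmitted_unacked > 0

    def on_cci(self) -> None:
        """Respond to a local or remote CCI."""
        if self.stalled_in_backoff():
            # Act as if the RTO had just expired.
            self.log.append("retransmit_now")
        else:
            # Steady state: re-probe the path (Section 4.3).
            self.log.append("reprobe_path")
```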

5.  Security Considerations

   The only foreseen security considerations with the techniques
   presented in this document result from either an attacker's ability
   to spoof valid TCP segments with options that seemingly indicate
   connectivity changes, or an attacker's ability to generate bogus
   connectivity change indications locally.  An attacker might produce a
   stream of such false indicators that could keep a connection in slow-
   start at the initial window.  One possible defense against this type
   of attack is to rate-limit the response to connectivity indicators
   (whether local or remote).  This is also probably less serious than
   other attacks such an empowered adversary could perform, like
   resetting the connection or injecting data.  A similar effect could
   be achieved without the new option by forging duplicate ACKs that
   would keep a sender in loss recovery.  If both sets of IP addresses,
   port numbers, and sequence numbers are guessable for a connection,
   then the connection should use an approved means (such as IPsec)
   [I-D.ietf-tcpm-tcp-antispoof] for protection against spoofed
   segments.
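
   One way to realize the rate limiting mentioned above is sketched
   below in Python.  This is an illustration, not a recommendation: the
   class name IndicationRateLimiter and the use of a fixed minimum
   interval between accepted indications are assumptions, and this
   document does not specify a particular limit.

```python
class IndicationRateLimiter:
    """Hypothetical limiter: act on at most one indication per interval."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval   # seconds between accepted indications
        self.last_accepted = None          # time of the last accepted indication

    def allow(self, now: float) -> bool:
        """Return True if an indication arriving at time `now` may be acted on."""
        if (self.last_accepted is not None
                and now - self.last_accepted < self.min_interval):
            return False                   # too soon: ignore this indication
        self.last_accepted = now
        return True
```

   The same limiter instance would be consulted for both local and
   remote indications, so that an attacker cannot bypass the limit by
   alternating between the two sources.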

6.  IANA Considerations

   This section is to be interpreted according to [RFC2434].

   This document does not define any new namespaces.  It uses an 8-bit
   TCP option number maintained by IANA at
   http://www.iana.org/assignments/tcp-parameters.  IANA is requested to
   assign a new TCP option number upon publication of this document.

7.  Acknowledgments

   This draft combines and obsoletes [I-D.swami-tcp-lmdr] and
   [I-D.eggert-tcpm-tcp-retransmit-now].  The authors would like to
   thank Mark Allman, Marcus Brunner, Shashikant Maheshwari, Kacheong
   Poon, Juergen Quittek, Stefan Schmid and Joe Touch for their comments
   and suggestions on the two previous drafts.

   Simon Schuetz is partly funded by Ambient Networks, a research
   project supported by the European Commission under its Sixth
   Framework Program.

   Wesley Eddy's work on this document was performed at NASA's Glenn
   Research Center, while in support of the NASA Space Communications
   Architecture Working Group (SCAWG), and the FAA/Eurocontrol Future
   Communications Study (FCS).

8.  References

8.1.  Normative References

              Mathis, M. and J. Heffner, "Packetization Layer Path MTU
              Discovery", draft-ietf-pmtud-method-11 (work in progress),
              December 2006.

   [RFC0793]  Postel, J., "Transmission Control Protocol", STD 7,
              RFC 793, September 1981.

   [RFC1191]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
              November 1990.

   [RFC1981]  McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
              for IP version 6", RFC 1981, August 1996.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119, March 1997.

   [RFC2434]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 2434,
              October 1998.

   [RFC2581]  Allman, M., Paxson, V., and W. Stevens, "TCP Congestion
              Control", RFC 2581, April 1999.

   [RFC2988]  Paxson, V. and M. Allman, "Computing TCP's Retransmission
              Timer", RFC 2988, November 2000.

8.2.  Informative References

   [DUKE]     Duke, M., Henderson, T., and J. Meegan, "Experience with
              'Link-UP Notification' Over a Mobile Satellite Link",
              ACM Computer Communication Review, Vol. 34, No. 3,
              July 2004.

   [EDDY]     Eddy, W. and Y. Swami, "Adapting End-host Congestion
              Control for Mobility", NASA Glenn Research Center
              Technical Report, CR-2005-213838, July 2005.

              Dawkins, S., "End-to-end, Implicit 'Link-Up'
              Notification", draft-dawkins-trigtran-linkup-01 (work in
              progress), October 2003.

   [I-D.eggert-tcpm-tcp-retransmit-now]
              Eggert, L., "TCP Extensions for Immediate
              Retransmissions", draft-eggert-tcpm-tcp-retransmit-now-02
              (work in progress), June 2005.

              Aboba, B., "Architectural Implications of Link
              Indications", draft-iab-link-indications-10 (work in
              progress), March 2007.

              Yegin, A., "Link-layer Event Notifications for Detecting
              Network Attachments", draft-ietf-dna-link-information-06
              (work in progress), February 2007.

              Nikander, P., "End-Host Mobility and Multihoming with the
              Host Identity Protocol", draft-ietf-hip-mm-04 (work in
              progress), June 2006.

   [I-D.ietf-tcpimpl-restart]
              Hughes, A., Touch, J., and J. Heidemann, "Issues in TCP
              Slow-Start Restart After Idle",
              draft-ietf-tcpimpl-restart-00 (work in progress),
              March 1998.

   [I-D.ietf-tcpm-tcp-antispoof]
              Touch, J., "Defending TCP Against Spoofing Attacks",
              draft-ietf-tcpm-tcp-antispoof-06 (work in progress),
              February 2007.

              Eggert, L. and F. Gont, "TCP User Timeout Option",
              draft-ietf-tcpm-tcp-uto-04 (work in progress),
              October 2006.

              Floyd, S., "Quick-Start for TCP and IP",
              draft-ietf-tsvwg-quickstart-07 (work in progress),
              October 2006.

   [I-D.swami-tcp-lmdr]
              Swami, Y., "Lightweight Mobility Detection and Response
              (LMDR) Algorithm for TCP", draft-swami-tcp-lmdr-07 (work
              in progress), March 2006.

   [KOODLI]   Koodli, R. and C. Perkins, "Fast Handovers and Context
              Transfers in Mobile Networks", ACM Computer Communication
              Review, Vol. 31, No. 5, October 2001.

   [OTT]      Ott, J. and D. Kutscher, "Drive-thru Internet: IEEE
              802.11b for Automobile Users", Proc. Infocom 2004,
              March 2004.

   [RFC1122]  Braden, R., "Requirements for Internet Hosts -
              Communication Layers", STD 3, RFC 1122, October 1989.

   [RFC2131]  Droms, R., "Dynamic Host Configuration Protocol",
              RFC 2131, March 1997.

   [RFC2460]  Deering, S. and R. Hinden, "Internet Protocol, Version 6
              (IPv6) Specification", RFC 2460, December 1998.

   [RFC3168]  Ramakrishnan, K., Floyd, S., and D. Black, "The Addition
              of Explicit Congestion Notification (ECN) to IP",
              RFC 3168, September 2001.

   [RFC3344]  Perkins, C., "IP Mobility Support for IPv4", RFC 3344,
              August 2002.

   [RFC3775]  Johnson, D., Perkins, C., and J. Arkko, "Mobility Support
              in IPv6", RFC 3775, June 2004.

   [RFC3819]  Karn, P., Bormann, C., Fairhurst, G., Grossman, D.,
              Ludwig, R., Mahdavi, J., Montenegro, G., Touch, J., and L.
              Wood, "Advice for Internet Subnetwork Designers", BCP 89,
              RFC 3819, July 2004.

   [RFC4306]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
              RFC 4306, December 2005.

   [SCHUETZ]  Schuetz, S., Eggert, L., Schmid, S., and M. Brunner,
              "Protocol Enhancements for Intermittently Connected
              Hosts", ACM Computer Communication Review, Vol. 35, No. 3,
              July 2005.

   [SCOTT]    Scott, J. and G. Mapp, "Link layer-based TCP optimisation
              for disconnecting networks", ACM Computer Communication
              Review, Vol. 33, No. 5, October 2003.

Editorial Comments

   [footnote-1]  The authors have heard the idea of triggering
                 retransmits based on connectivity events of directly-
                 connected links being attributed to Phil Karn ("kick"
                 operation in the KA9Q TCP stack).  A thread from the
                 PILC mailing list in 2000 discusses some thoughts on
                 this (http://www.isi.edu/pilc/list/archive/0691.html).

   [footnote-2]  Although this specification introduces five new per-
                 connection state variables, a preliminary
                 implementation of an earlier revision of this mechanism
                 [I-D.swami-tcp-lmdr] only required around a hundred
                 lines of kernel code.

Appendix A.  Background: Classification of Connectivity Disruptions

   Connectivity disruptions can occur in many different situations.
   They can be due to wireless interference, movement out of a wireless
   coverage area, switching between access networks, or simply due to
   unplugging an Ethernet cable.  Depending on the situation in which
   they occur, the implications of connectivity disruptions are
   different and must be handled appropriately.  This section attempts
   to classify different types of connectivity disruptions and discusses
   their implications and impact on TCP.

   Two main properties of connectivity disruptions affect how TCP reacts
   to them: their duration and whether the path characteristics have
   significantly changed after they end.  This document distinguishes
   between "short" and "long" disruptions and "changed" and "unchanged"
   path characteristics.  Note that these two categories are orthogonal
   to each other, i.e., four types of connectivity disruptions exist.

   Connectivity disruptions are "short" for a given TCP connection if
   connectivity returns before the RTO fires for the first time, i.e.,
   while TCP is still in steady state.  In this case, standard TCP
   recovers lost data segments through Fast Retransmit and lost ACKs
   through successfully delivered later ACKs.  Appendix A.1 briefly
   describes this case.

   Connectivity disruptions are "long" for a given TCP connection if
   the RTO fires at least once before connectivity returns, i.e., when
   TCP is in exponential back-off.  In this case, TCP can be inefficient
   in its retransmission scheme, as described in Appendix A.2.

   Whether or not path characteristics change when connectivity returns
   is a second important factor for TCP's retransmission scheme.
   Standard TCP implicitly assumes that path characteristics remain
   unchanged across short disruptions by performing Fast Retransmit
   using the path parameters collected before the disruption.  For long
   disruptions, standard TCP is more conservative and performs slow-
   start, re-probing the path characteristics from scratch.  However,
   this behavior can still be inefficient, depending on when the
   slow-start is initiated.

   These implicit assumptions can cause standard TCP to misbehave or
   perform inefficiently in some scenarios.  Figure 4 illustrates the
   standard TCP behavior.

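       +----------+-----------------------+-----------------------+
       | Short    | Fast Retransmit using | Fast Retransmit using |
       | Duration | currently collected   | currently collected   |
       | < RTO    | path characteristics  | path characteristics  |
       +----------+-----------------------+-----------------------+
       | Long     |                       |                       |
       | Duration | Slow-start            | Slow-start            |
       | >= RTO   |                       |                       |
       +----------+-----------------------+-----------------------+
                      Unchanged Path          Changed Path
                      Characteristics         Characteristics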

                     Figure 4: Standard TCP behavior.
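
   The classification and the standard TCP response summarized in
   Figure 4 can be expressed as the following Python sketch.  The
   function name and the returned strings are chosen for illustration
   only.

```python
def classify_disruption(duration: float, rto: float, path_changed: bool):
    """Classify a disruption and name standard TCP's response to it.

    duration and rto are in seconds; a disruption is "long" if the RTO
    fires at least once before connectivity returns.
    """
    length = "short" if duration < rto else "long"
    path = "changed" if path_changed else "unchanged"
    # Standard TCP's response depends only on the duration, not on
    # whether the path characteristics have changed.
    response = "fast retransmit" if length == "short" else "slow-start"
    return length, path, response
```

   Note that path_changed never influences the response, which is
   exactly the implicit assumption discussed above.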

A.1.  Short Connectivity Disruptions

   One common cause of short connectivity disruptions that result in a
   change of the end-to-end path characteristics is transparent network
   layer mobility, via protocols such as Mobile IP, NEMO, or HIP.  These
   protocols generally hide mobility events from the transport layer,
   but cannot mask the resulting changes to the end-to-end path that
   established TCP connections transmit over.

   Consider a Mobile IP scenario as shown in Figure 5.  At time T, a
   mobile node MN attaches to access network Net-1, connected to the
   Internet through access router AR-1 and has the care-of address
   <Net-1, MN>.  It establishes a TCP connection to the correspondent
   node CN.  While MN attaches to AR-1, packets between CN and <Net-1,
   MN> follow PATH-1 (via Cloud-1 and AR-1).  Assume that at some time
   T+1, MN moves and then attaches to Net-2, which is reachable through
   AR-2 with the care-of address <Net-2, MN>.  While MN attaches to
   AR-2, all packets between CN and <Net-2, MN> follow PATH-2 (through
   Cloud-2 and AR-2).


                       /---------\   +------+
                       |         |   |      | Net-1
                   +---+ Cloud-1 +---+ AR-1 +-----> MN (time=T)
                   |   |         |   |      |
                   |   \----+----/   +---+--+        |
                   |        |                        |
         CN <------+        | PATH-3                 |
                   |        |                        |
                   |   /----V----\   +-------+       V
                   |   |         |   |       |
                   +---+ Cloud-2 +---+ AR-2  +-----> MN (time=T+1)
                       |         |   |       | Net-2
                       \---------/   +-------+


                        Figure 5: Mobility example.

   During a transient disconnected period, MN may have disconnected from
   Net-1 and not yet attached to Net-2.  Consequently, AR-1 may not be
   able to deliver packets to MN.  This could result in a burst of
   packet losses.  Several approaches for "fast" or "seamless" handovers
   exist that involve adding machinery to the ARs to buffer and redirect
   packets originally sent to Net-1 towards Net-2, rather than dropping
   them (e.g., [KOODLI]).

   As long as MN remains in Net-1, standard congestion control
   algorithms [RFC2581] are sufficient.  However, once MN moves from
   Net-1 to Net-2, two different scenarios are possible depending on
   network topology:

   o  In the first scenario, with standard Mobile IPv4, all packets
      destined to <Net-1, MN> are dropped by AR-1 once MN has moved.
      Since the latency involved in establishing a new tunnel to the
      home agent (HA) is on the order of one RTT (2*RTT in the case of
      Mobile IPv6), roughly an entire window's worth of data and ACKs
      will be dropped by AR-1.
      Because of this burst loss, CN and MN are likely to incur
      expensive retransmission timeouts.

   o  In the second scenario, with a fast handover mechanism in place,
      losses are masked through buffering and tunneling between routers
      AR-1 and AR-2.  The exact sequence of buffering and forwarding
      between the ARs is not guaranteed to occur in a manner consistent
      with the available bandwidth of PATH-3 or conformant to TCP's
      clocking expectations.  This can cause TCP's behavior over PATH-2
      to be based on the unrelated properties of PATH-1 and PATH-3.

   After attaching to Net-2, reception of stale ACKs (for data sent on
   PATH-1) will cause MN to incorrectly inflate its congestion window.
   These stale ACKs do not provide any indication of the congestion
   along PATH-2.  CN's congestion window becomes similarly inflated by
   ACKs that MN sends for data segments redirected over PATH-3.  If the
   congestion windows from PATH-1 are already too big for PATH-2, this
   can overload Net-2 or PATH-2, causing packet loss and timeouts.

   On the other hand, if the available bandwidth along PATH-2 is greater
   than along PATH-1, and if the sender is in congestion avoidance, it
   will need potentially many RTTs before utilizing the available path
   capacity.  This is due to relatively slow bandwidth increase during
   congestion avoidance caused by a stale SS_THRESH (see [EDDY]).
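
   A rough, illustrative calculation shows the difference.  The Python
   sketch below assumes the idealized growth rates of [RFC2581], i.e.,
   about one segment per RTT in congestion avoidance and a doubling per
   RTT in slow-start.  The function names and the example window sizes
   are hypothetical.

```python
def rtts_congestion_avoidance(cwnd: int, target: int) -> int:
    """RTTs needed to grow cwnd to target at +1 segment per RTT."""
    return max(0, target - cwnd)


def rtts_slow_start(cwnd: int, target: int) -> int:
    """RTTs needed to grow cwnd to at least target by doubling per RTT."""
    rtts = 0
    while cwnd < target:
        cwnd *= 2
        rtts += 1
    return rtts
```

   For example, growing from 10 to 100 segments takes 90 RTTs in
   congestion avoidance but only 4 RTTs in slow-start under these
   assumptions.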

A.2.  Long Connectivity Disruptions

   For long disruptions, standard TCP performs slow-start after
   connectivity returns, because the retransmission timeout (RTO) has
   expired.  This conservative strategy avoids overloading the new path.
   However, TCP's general exponential back-off retransmission strategy
   can time these slow-starts such that performance decreases.

   When a long connectivity disruption occurs along the path between a
   host and its peer while the host is transmitting data, it stops
   receiving ACKs.  After the RTO expires, the host attempts to
   retransmit the first unacknowledged segment.  TCP implementations
   that follow the recommended RTO management proposed in [RFC2988]
   double the RTO after each retransmission attempt until it exceeds 60
   seconds.  This scheme causes a host to attempt to retransmit across
   established connections roughly once a minute.  (More frequently
   during the first minute or two of the connectivity disruption, while
   the RTO is still being backed off.)
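
   The resulting retransmission schedule can be illustrated with a
   short Python sketch.  The initial RTO of one second, the 300-second
   horizon, and the function name are assumptions made for this
   example; the 60-second cap follows the description above.

```python
def retransmission_times(initial_rto: float = 1.0,
                         max_rto: float = 60.0,
                         horizon: float = 300.0) -> list:
    """Times (seconds after the original transmission) of retransmissions."""
    times = []
    rto = initial_rto
    t = rto                           # first RTO expiry
    while t <= horizon:
        times.append(t)
        rto = min(rto * 2, max_rto)   # exponential back-off, capped
        t += rto                      # next expiry
    return times
```

   With these assumptions, retransmissions occur 1, 3, 7, 15, 31, 63,
   123, 183 and 243 seconds after the original transmission, i.e.,
   once a minute as soon as the cap has been reached.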

   When the long connectivity disruption ends, standard TCP
   implementations still wait until the RTO expires before attempting
   retransmission.  Figure 6 illustrates this behavior.  Depending on
   when connectivity becomes available again, this can waste up to a
   minute of connectivity for TCPs that implement the recommended RTO
   management described in [RFC2988].  For TCP implementations that do
   not implement [RFC2988], even longer connectivity periods may be
   wasted.  For example, Linux uses 120 seconds as the maximum RTO by
   default.

          Sequence
          number      X = Successfully transmitted segment
           ^          O = Lost segment
           |     :                     :              : X
           |     :                     :              :X
           |     OO O  O    O        O :              X
           |    X:                     :              :
           |   X :                     :<------------>:
           |  X  :                     :    Wasted    :
           | X   :                     :  connection  :
           |X    :                     :     time     :
           +-----:---------------------:--------------:----------->
                 :                     :              :       Time
            Connectivity          Connectivity       TCP
               gone                  back         retransmit

       Figure 6: Standard TCP behavior in the presence of disrupted
                               connectivity.
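
   The wasted connection time shown in Figure 6 can be estimated with
   the following Python sketch.  It assumes an initial RTO of one
   second that doubles up to a 60-second cap, with all times measured
   from the original transmission.  The function name and defaults are
   hypothetical.

```python
def wasted_time(connectivity_back: float,
                initial_rto: float = 1.0,
                max_rto: float = 60.0) -> float:
    """Seconds between connectivity returning and the next retransmission."""
    rto = initial_rto
    next_retransmit = rto
    # Step through the back-off schedule until the first retransmission
    # that happens at or after the moment connectivity is back.
    while next_retransmit < connectivity_back:
        rto = min(rto * 2, max_rto)
        next_retransmit += rto
    return next_retransmit - connectivity_back
```

   For instance, if connectivity returns 64 seconds after the original
   transmission, the next retransmission is scheduled at 123 seconds,
   so 59 seconds of connectivity go unused.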

   This retransmission behavior is not efficient, especially in
   scenarios where connectivity periods are short and connectivity
   disruptions are frequent [OTT].  Experiments show that TCP
   performance across a path with frequent disruptions is significantly
   worse than across a similar path without disruptions [SCHUETZ].

   In the ideal case, TCP would attempt a retransmission as soon as
   connectivity to its peer was re-established.  Figure 7 illustrates
   the ideal behavior.

          Sequence
          number      X = Successfully transmitted segment
           ^          O = Lost segment
           |     :                     : X            :
           |     :                     :X             :
           |     OO O  O    O        O X              :
           |    X:                     :              :
           |   X :                     :<------------>:
           |  X  :                     :  Efficiency  :
           | X   :                     :  improvement :
           |X    :                     :              :
           +-----:---------------------:--------------:----------->
                 :                     :              :       Time
            Connectivity          Connectivity      Next
               gone             back = immediate  scheduled
                                 TCP retransmit   retransmit

         Figure 7: Ideal TCP behavior in the presence of disrupted
                               connectivity.

   The ideal behavior is difficult to achieve for arbitrary connectivity
   disruptions.  One obviously problematic approach would use higher-
   frequency retransmission attempts to enable earlier detection of
   whether connectivity has returned.  This can generate significant
   amounts of extra traffic.  Other proposals attempt to trigger faster
   retransmissions by retransmitting buffered or newly-crafted segments
   from inside the network.

   Note that scenarios exist where path characteristics remain unchanged
   after long connectivity disruptions.  In this case, even an
   intelligently scheduled slow-start is inefficient, because TCP could
   safely resume transmitting at the old rate instead of slow-starting.
   Although originally developed to avoid line-rate bursts, techniques
   for the well-known "slow-start after idle" case
   [I-D.ietf-tcpimpl-restart] may be useful to further improve
   performance after a disruption ends in such a scenario.  This
   document does not currently describe this additional optimization,
   and an open question remains on how unchanged path characteristics
   after long connectivity disruptions could be validated by an end

Appendix B.  Document Revision History

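   +----------+--------------------------------------------------------+
   | Revision | Comments                                               |
   +----------+--------------------------------------------------------+
   | 00       | Initial version. This document is a merge of and       |
   |          | obsoletes [I-D.eggert-tcpm-tcp-retransmit-now] and     |
   |          | [I-D.swami-tcp-lmdr].                                  |
   +----------+--------------------------------------------------------+
   | 01       | Major revision of the description of the               |
   |          | connectivity-change indication TCP option and its      |
   |          | processing in Section 4. Other formatting changes to   |
   |          | the document include moving some background material   |
   |          | to the appendix.                                       |
   +----------+--------------------------------------------------------+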

Authors' Addresses

   Simon Schuetz
   NEC Network Laboratories
   Kurfuerstenanlage 36
   Heidelberg  69115

   Phone: +49 6221 4342 165
   Fax:   +49 6221 4342 155
   Email: simon.schuetz@netlab.nec.de
   URI:   http://www.netlab.nec.de/

   Lars Eggert
   Nokia Research Center
   P.O. Box 407
   Nokia Group  00045

   Phone: +358 50 48 24461
   Email: lars.eggert@nokia.com
   URI:   http://research.nokia.com/people/lars_eggert/

   Wesley M. Eddy
   Verizon Federal Network Systems
   NASA Glenn Research Center
   21000 Brookpark Road, MS 54-5
   Cleveland, OH  44135

   Email: weddy@grc.nasa.gov

   Yogesh Prem Swami
   Nokia Research Center, Dallas
   955 Page Mill Road
   Palo Alto, California  94304

   Phone: +1 972 374 0669
   Email: yogesh.swami@nokia.com

   Khiem Le
   Nokia Research Center, Dallas
   6000 Connection Drive
   Irving, TX  75603

   Phone: +1 972 342 3502
   Email: khiem.le@nokia.com

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at


   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).
