draft-ietf-mptcp-multiaddressed-06.txt   draft-ietf-mptcp-multiaddressed-07.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Cisco Internet-Draft Cisco
Intended status: Experimental C. Raiciu Intended status: Experimental C. Raiciu
Expires: August 3, 2012 University Politehnica of Expires: September 27, 2012 University Politehnica of
Bucharest Bucharest
M. Handley M. Handley
University College London University College London
O. Bonaventure O. Bonaventure
Universite catholique de Universite catholique de
Louvain Louvain
January 31, 2012 March 26, 2012
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-multiaddressed-06 draft-ietf-mptcp-multiaddressed-07
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network, and thus improve user improve resource usage within the network, and thus improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
skipping to change at page 1, line 49 skipping to change at page 1, line 49
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 3, 2012. This Internet-Draft will expire on September 27, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 40 skipping to change at page 2, line 40
2.2. Associating a new subflow with an existing MPTCP 2.2. Associating a new subflow with an existing MPTCP
connection . . . . . . . . . . . . . . . . . . . . . . . . 9 connection . . . . . . . . . . . . . . . . . . . . . . . . 9
2.3. Informing the other Host about another potential 2.3. Informing the other Host about another potential
address . . . . . . . . . . . . . . . . . . . . . . . . . 9 address . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4. Data transfer using MPTCP . . . . . . . . . . . . . . . . 10 2.4. Data transfer using MPTCP . . . . . . . . . . . . . . . . 10
2.5. Requesting a change in a path's priority . . . . . . . . . 11 2.5. Requesting a change in a path's priority . . . . . . . . . 11
2.6. Closing an MPTCP connection . . . . . . . . . . . . . . . 11 2.6. Closing an MPTCP connection . . . . . . . . . . . . . . . 11
2.7. Notable features . . . . . . . . . . . . . . . . . . . . . 11 2.7. Notable features . . . . . . . . . . . . . . . . . . . . . 11
3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 12 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 12
3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 13 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 13
3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 17 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 16
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 21 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 21
3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 23 3.3.1. Data Sequence Mapping . . . . . . . . . . . . . . . . 22
3.3.2. Data Acknowledgements . . . . . . . . . . . . . . . . 26 3.3.2. Data Acknowledgements . . . . . . . . . . . . . . . . 25
3.3.3. Closing a Connection . . . . . . . . . . . . . . . . . 27 3.3.3. Closing a Connection . . . . . . . . . . . . . . . . . 27
3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 28 3.3.4. Receiver Considerations . . . . . . . . . . . . . . . 28
3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 29 3.3.5. Sender Considerations . . . . . . . . . . . . . . . . 29
3.3.6. Reliability and Retransmissions . . . . . . . . . . . 30 3.3.6. Reliability and Retransmissions . . . . . . . . . . . 30
3.3.7. Congestion Control Considerations . . . . . . . . . . 31 3.3.7. Congestion Control Considerations . . . . . . . . . . 31
3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . . 32 3.3.8. Subflow Policy . . . . . . . . . . . . . . . . . . . . 31
3.4. Address Knowledge Exchange (Path Management) . . . . . . . 33 3.4. Address Knowledge Exchange (Path Management) . . . . . . . 33
3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 34 3.4.1. Address Advertisement . . . . . . . . . . . . . . . . 34
3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . . 37 3.4.2. Remove Address . . . . . . . . . . . . . . . . . . . . 36
3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 38 3.5. Fast Close . . . . . . . . . . . . . . . . . . . . . . . . 37
3.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 39 3.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 42 3.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 42
3.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 42 3.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 42
3.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 42 3.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 42
3.8.2. Delayed Subflow Start . . . . . . . . . . . . . . . . 43 3.8.2. Delayed Subflow Start . . . . . . . . . . . . . . . . 43
3.8.3. Failure Handling . . . . . . . . . . . . . . . . . . . 44 3.8.3. Failure Handling . . . . . . . . . . . . . . . . . . . 44
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 44 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 44
5. Security Considerations . . . . . . . . . . . . . . . . . . . 46 5. Security Considerations . . . . . . . . . . . . . . . . . . . 46
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 47 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 47
7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 50 7. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 50
skipping to change at page 8, line 15 skipping to change at page 8, line 15
2. Operation Overview 2. Operation Overview
This section presents a single description of standard MPTCP This section presents a single description of standard MPTCP
operation, with reference to the protocol operation. Considerable operation, with reference to the protocol operation. Considerable
reference is made to symbolic names of MPTCP options throughout this reference is made to symbolic names of MPTCP options throughout this
section - these are subtypes of the IANA-assigned MPTCP option (see section - these are subtypes of the IANA-assigned MPTCP option (see
Section 8), and their formats are defined in the detailed protocol Section 8), and their formats are defined in the detailed protocol
specification which follows in Section 3. specification which follows in Section 3.
A Multipath TCP connection provides a bidirectional bytestream A Multipath TCP connection provides a bidirectional bytestream
between two hosts communicating hosts like normal TCP and thus does between two hosts communicating like normal TCP and thus does not
not require any change to the applications. However, Multipath TCP require any change to the applications. However, Multipath TCP
enables the hosts to use different paths with different IP addresses enables the hosts to use different paths with different IP addresses
to exchange packets belonging to the MPTCP connection. A Multipath to exchange packets belonging to the MPTCP connection. A Multipath
TCP connection appears like a normal TCP connection to an TCP connection appears like a normal TCP connection to an
application. However, to the network layer each MPTCP subflow looks application. However, to the network layer each MPTCP subflow looks
like a regular TCP flow whose segments carry a new TCP option type. like a regular TCP flow whose segments carry a new TCP option type.
Multipath TCP manages the creation, removal and utilization of these Multipath TCP manages the creation, removal and utilization of these
subflows to send data. The number of subflows that are managed subflows to send data. The number of subflows that are managed
within a Multipath TCP connection is not fixed and it can fluctuate within a Multipath TCP connection is not fixed and it can fluctuate
during the lifetime of the Multipath TCP connection. during the lifetime of the Multipath TCP connection.
skipping to change at page 10, line 23 skipping to change at page 10, line 23
Further details in Section 3.4.2. Further details in Section 3.4.2.
Host-A Host-B Host-A Host-B
------ ------ ------ ------
REMOVE_ADDR -> REMOVE_ADDR ->
[IP#-A2's Address ID] [IP#-A2's Address ID]
2.4. Data transfer using MPTCP 2.4. Data transfer using MPTCP
To ensure reliable, in-order delivery of data over subflows that may To ensure reliable, in-order delivery of data over subflows that may
appear and disappear at anytime, MPTCP uses a 64-bit Data Sequence appear and disappear at any time, MPTCP uses a 64-bit Data Sequence
Number (DSN) to number all data sent over the MPTCP connection. Each Number (DSN) to number all data sent over the MPTCP connection. Each
subflow has its own 32-bit sequence number space and a MPTCP option subflow has its own 32-bit sequence number space and an MPTCP option
allows to map the subflow sequence space to the data sequence space. maps the subflow sequence space to the data sequence space. In this
In this way, data can be retransmitted on different subflows (mapped way, data can be retransmitted on different subflows (mapped to the
to the same DSN) in the event of failure. same DSN) in the event of failure.
The "Data Sequence Signal" option which carries this mapping can also The "Data Sequence Signal" option which carries this "Data Sequence
carry a connection-level acknowledgement (the "Data ACK") for the Mapping", which consists of the subflow sequence number, data
received DSN. sequence number, and length for which this mapping is valid. This
option can also carry a connection-level acknowledgement (the "Data
ACK") for the received DSN.
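The mapping described in this section (subflow sequence number, data sequence number, and length) can be illustrated with a minimal Python sketch. The class name `DataSequenceMapping`, its field names, and the sample numbers are hypothetical, and subflow sequence-number wraparound is ignored for brevity; this is not the option's wire format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataSequenceMapping:
    """Hypothetical in-memory form of one Data Sequence Mapping."""
    subflow_seq: int  # first subflow sequence number covered by the mapping
    dsn: int          # 64-bit Data Sequence Number of that same octet
    length: int       # number of octets for which the mapping is valid

    def to_dsn(self, seq: int) -> Optional[int]:
        """Translate a subflow sequence number to a DSN, or None if unmapped."""
        offset = seq - self.subflow_seq
        if 0 <= offset < self.length:
            return (self.dsn + offset) % 2**64  # DSN space is 64 bits wide
        return None


# Original transmission on one subflow.
first = DataSequenceMapping(subflow_seq=1000, dsn=5_000_000_000, length=500)
# Retransmission of the same data on a different subflow: new subflow
# sequence numbers, same DSN.
retry = DataSequenceMapping(subflow_seq=42, dsn=5_000_000_000, length=500)
```

Because both mappings point at the same DSN range, the receiver can place the retransmitted octets at the same position in the connection-level bytestream regardless of which subflow delivered them.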
With MPTCP, all subflows share the same receive buffer and advertise With MPTCP, all subflows share the same receive buffer and advertise
the same receive window. There are two levels of acknowledgement in the same receive window. There are two levels of acknowledgement in
MPTCP. Regular TCP acknowledgements are used on each subflow to MPTCP. Regular TCP acknowledgements are used on each subflow to
acknowledge the reception of the segments sent over the subflow acknowledge the reception of the segments sent over the subflow
independently of their DSN. In addition, there are connection-level independently of their DSN. In addition, there are connection-level
acknowledgements for the data sequence space. These acknowledgements acknowledgements for the data sequence space. These acknowledgements
track the advancement of the bytestream and slide the receiving track the advancement of the bytestream and slide the receiving
window. window.
skipping to change at page 14, line 26 skipping to change at page 14, line 26
receiver of the TCP packet (which can be either host). If the SYN receiver of the TCP packet (which can be either host). If the SYN
flag is set, a single key is included; if only an ACK flag is set, flag is set, a single key is included; if only an ACK flag is set,
both keys are present. both keys are present.
B's Key is echoed in the ACK in order to allow the listener (host B) B's Key is echoed in the ACK in order to allow the listener (host B)
to act statelessly until the TCP connection reaches the ESTABLISHED to act statelessly until the TCP connection reaches the ESTABLISHED
state. If the listener acts in this way, however, it MUST generate state. If the listener acts in this way, however, it MUST generate
its key in a verifiable fashion, allowing it to verify that it its key in a verifiable fashion, allowing it to verify that it
generated the key when it is echoed in the ACK. generated the key when it is echoed in the ACK.
As TCP when using SYN cookies, the MPTCP handshake would be This exchange allows the safe passage of MPTCP options on SYN packets
vulnerable if the third ACK (containing the MP_CAPABLE option) is to be determined. If any of these options are dropped, MPTCP SHOULD
lost. In order to ensure reliable delivery of the third ACK, a gracefully fall back to regular single-path TCP, as documented in
server MUST respond with an ACK segment on receipt of this, which may Section 3.6. Note that new subflows MUST NOT be established (using
contain data, or will be a pure ACK if it does not have any data to the process documented in Section 3.2) until a DSS option has been
send immediately. If the initiator does not receive this ACK within successfully received across the path (as documented in Section 3.3).
an RTO, it MUST re-send the ACK containing MP_CAPABLE.
In effect, an MPTCP connection is in a "PRE_ESTABLISHED" state while
awaiting this ACK, and only upon receipt of the ACK will it move to
the ESTABLISHED state. When in the PRE_ESTABLISHED state, a host can
send data, but MUST NOT attempt to create additional subflows. Only
in the ESTABLISHED state does it know that MPTCP options are
correctly passed in both directions on the path. If MPTCP options
fail to be passed, an implementation SHOULD undertake fallback as
documented in Section 3.6.
The first four bits of the first octet in the MP_CAPABLE option The first four bits of the first octet in the MP_CAPABLE option
(Figure 4) define the MPTCP option subtype (see Section 8; for (Figure 4) define the MPTCP option subtype (see Section 8; for
MP_CAPABLE, this is 0), and the remaining four bits of this octet MP_CAPABLE, this is 0), and the remaining four bits of this octet
specify the MPTCP version in use (for this specification, this is specify the MPTCP version in use (for this specification, this is
0). 0).
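As a small illustration of the split described above, the following Python sketch separates the first octet into its two four-bit fields. The function name `parse_first_octet` is hypothetical, and the sketch assumes the octet has already been extracted from the option.

```python
from typing import Tuple


def parse_first_octet(octet: int) -> Tuple[int, int]:
    """Return (subtype, version) from the first octet described in the text."""
    if not 0 <= octet <= 0xFF:
        raise ValueError("expected a single octet")
    subtype = (octet >> 4) & 0x0F  # first (most significant) four bits
    version = octet & 0x0F         # remaining four bits
    return subtype, version
```

For MP_CAPABLE under this specification both fields are 0, so the octet is simply 0x00.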
The second octet is reserved for flags. The leftmost bit - labeled C The second octet is reserved for flags. The leftmost bit - labeled C
- indicates "Checksum required", and SHOULD be set to 1 unless - indicates "Checksum required", and SHOULD be set to 1 unless
specifically overridden (for example, if the system administrator has specifically overridden (for example, if the system administrator has
skipping to change at page 19, line 12 skipping to change at page 18, line 43
(the SYN/ACK). This is to allow both hosts to have exchanged random (the SYN/ACK). This is to allow both hosts to have exchanged random
data to be used as the message before generating the MAC. In both data to be used as the message before generating the MAC. In both
cases, the MAC algorithm is HMAC as defined in [12], using the SHA-1 cases, the MAC algorithm is HMAC as defined in [12], using the SHA-1
hash algorithm [4] (thus generating a 160-bit / 20 octet HMAC). Due hash algorithm [4] (thus generating a 160-bit / 20 octet HMAC). Due
to option space limitations, the MAC included in the SYN/ACK is to option space limitations, the MAC included in the SYN/ACK is
truncated to the leftmost 64 bits, but this is acceptable since while truncated to the leftmost 64 bits, but this is acceptable since while
in an attacker-initiated attack, the attacker can retry many times; in an attacker-initiated attack, the attacker can retry many times;
if the attacker is the responder, he only has one chance to get the if the attacker is the responder, he only has one chance to get the
MAC correct. MAC correct.
The initiator's authentication information is sent in its first ACK, The initiator's authentication information is sent in its first ACK
and is shown in Figure 7. The same reliability algorithm for this (the third packet of the handshake), and this is shown in Figure 7.
packet as for the MP_CAPABLE ACK is applied: receipt of this packet This data needs to be sent reliably, and therefore receipt of this
MUST trigger an ACK in response, and the packet MUST be retransmitted packet MUST trigger an ACK in response, and the packet MUST be
if this ACK is not received. In other words, sending the ACK/MP_JOIN retransmitted if this ACK is not received. In other words, sending
packet places the subflow in the PRE_ESTABLISHED state, and it moves the ACK/MP_JOIN packet places the subflow in the PRE_ESTABLISHED
to the ESTABLISHED state only on receipt of an ACK from the receiver. state, and it moves to the ESTABLISHED state only on receipt of an
The reserved bits in this option MUST be set to zero by the sender. ACK from the receiver. It is not permitted to send data while in the
PRE_ESTABLISHED state. The reserved bits in this option MUST be set
to zero by the sender.
The key for the MAC algorithm, in the case of the message transmitted The key for the MAC algorithm, in the case of the message transmitted
by Host A, will be Key-A followed by Key-B, and in the case of Host by Host A, will be Key-A followed by Key-B, and in the case of Host
B, Key-B followed by Key-A. These are the keys that were exchanged B, Key-B followed by Key-A. These are the keys that were exchanged
in the original MP_CAPABLE handshake. The message in each case is in the original MP_CAPABLE handshake. The message in each case is
the concatenations of Random Number for each host (denoted by R): for the concatenations of Random Number for each host (denoted by R): for
Host A, R-A followed by R-B; and for Host B, R-B followed by R-A. Host A, R-A followed by R-B; and for Host B, R-B followed by R-A.
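The key and message ordering above can be sketched in Python with the standard `hmac` and `hashlib` modules. The function names and the sample key and random-number values are hypothetical (8-octet keys and 4-octet random numbers are assumed for the example); only the HMAC-SHA-1 construction, the concatenation order, and the 64-bit truncation come from the text.

```python
import hashlib
import hmac


def join_mac(local_key: bytes, remote_key: bytes,
             local_rand: bytes, remote_rand: bytes) -> bytes:
    """Full 160-bit HMAC-SHA-1: key = local||remote keys, msg = local||remote R."""
    return hmac.new(local_key + remote_key,
                    local_rand + remote_rand,
                    hashlib.sha1).digest()


def truncated_join_mac(local_key: bytes, remote_key: bytes,
                       local_rand: bytes, remote_rand: bytes) -> bytes:
    """Leftmost 64 bits of the MAC, as carried in the SYN/ACK."""
    return join_mac(local_key, remote_key, local_rand, remote_rand)[:8]


# Sample values, for illustration only.
key_a, key_b = bytes(range(8)), bytes(range(8, 16))
r_a, r_b = b"\x01\x02\x03\x04", b"\x0a\x0b\x0c\x0d"

mac_a = join_mac(key_a, key_b, r_a, r_b)            # sent by Host A
mac_b = truncated_join_mac(key_b, key_a, r_b, r_a)  # sent by Host B

# Host A recomputes Host B's truncated MAC and compares in constant time.
b_is_authentic = hmac.compare_digest(
    mac_b, truncated_join_mac(key_b, key_a, r_b, r_a))
```

Swapping the order of keys and random numbers per host means the two directions produce different MACs, so one host's value cannot simply be reflected back to it.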
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 26, line 20 skipping to change at page 26, line 5
remainder of the connection (see Section 3.6). This is achieved by remainder of the connection (see Section 3.6). This is achieved by
setting the data-level length field to the reserved value of 0. The setting the data-level length field to the reserved value of 0. The
checksum, in such a case, will also be set to zero. checksum, in such a case, will also be set to zero.
3.3.2. Data Acknowledgements 3.3.2. Data Acknowledgements
To provide full end-to-end resilience, MPTCP provides a connection- To provide full end-to-end resilience, MPTCP provides a connection-
level acknowledgement, to act as a cumulative ACK for the connection level acknowledgement, to act as a cumulative ACK for the connection
as a whole. This is the "Data ACK" field of the DSS option as a whole. This is the "Data ACK" field of the DSS option
(Figure 9). The Data ACK is analogous to the behaviour of the (Figure 9). The Data ACK is analogous to the behaviour of the
standard TCP cumulative ACK in TCP SACK - indicating how much data standard TCP cumulative ACK - indicating how much data has been
has been successfully received (with no holes). The Data ACK successfully received (with no holes). This is in comparison to the
specifies the next Data Sequence Number it expects to receive. subflow-level ACK, which acts analogous to TCP SACK, given that there
may still be holes in the data stream at the connection level. The
Data ACK specifies the next Data Sequence Number it expects to
receive.
The Data ACK, as for the DSN, can be sent as the full 64 bit value, The Data ACK, as for the DSN, can be sent as the full 64 bit value,
or as the lower 32 bits. If data is received with a 64 bit DSN, it or as the lower 32 bits. If data is received with a 64 bit DSN, it
MUST be acknowledged with a 64 bit Data ACK. If the DSN received is MUST be acknowledged with a 64 bit Data ACK. If the DSN received is
32 bits, it is valid for the implementation to choose whether to send 32 bits, it is valid for the implementation to choose whether to send
a 32 bit or 64 bit Data ACK. a 32 bit or 64 bit Data ACK.
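The width rule above can be expressed as a short Python sketch. The function name `data_ack_field` and the `prefer_short` flag are hypothetical; the sketch only encodes the value in network byte order and does not build the surrounding DSS option.

```python
def data_ack_field(next_expected_dsn: int, received_dsn_bits: int,
                   prefer_short: bool = True) -> bytes:
    """Encode a Data ACK as 4 or 8 octets following the rule in the text."""
    if received_dsn_bits not in (32, 64):
        raise ValueError("DSN width must be 32 or 64 bits")
    if received_dsn_bits == 64 or not prefer_short:
        # A 64-bit DSN MUST be answered with a 64-bit Data ACK.
        return (next_expected_dsn % 2**64).to_bytes(8, "big")
    # For a 32-bit DSN the implementation may send only the lower 32 bits.
    return (next_expected_dsn & 0xFFFFFFFF).to_bytes(4, "big")
```

The `prefer_short` flag models the implementation choice left open for the 32-bit case; it has no effect when the received DSN was 64 bits wide.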
The Data ACK proves that the data, and all required MPTCP signaling, The Data ACK proves that the data, and all required MPTCP signaling,
has been received and accepted by the remote end. One key use of the has been received and accepted by the remote end. One key use of the
Data ACK signal is that it is used to indicate the left edge of the Data ACK signal is that it is used to indicate the left edge of the
skipping to change at page 36, line 51 skipping to change at page 36, line 42
connection attempts to this address/port combination for this connection attempts to this address/port combination for this
connection. A sender that wants to trigger a new incoming connection connection. A sender that wants to trigger a new incoming connection
attempt on a previously advertised address/port combination can attempt on a previously advertised address/port combination can
therefore refresh ADD_ADDR information by sending the option again. therefore refresh ADD_ADDR information by sending the option again.
During normal MPTCP operation, it is unlikely that there will be During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR to be included along with sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 3.3.1). Therefore, it is those for data sequence numbering (Section 3.3.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR option expected that an MPTCP implementation will send the ADD_ADDR option
on separate ACKs. As discussed earlier, however, an MPTCP on separate ACKs. As discussed earlier, however, an MPTCP
implementation MUST NOT treat duplicate ACKs with MPTCP options as implementation MUST NOT treat duplicate ACKs with any MPTCP option
indications of congestion [9], and an MPTCP implementation SHOULD NOT apart from DSS as indications of congestion [9], and an MPTCP
send more than two duplicate ACKs in a row for signaling purposes. implementation SHOULD NOT send more than two duplicate ACKs in a row
for signaling purposes.
3.4.2. Remove Address 3.4.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously- If, during the lifetime of an MPTCP connection, a previously-
announced address becomes invalid (e.g. if the interface disappears), announced address becomes invalid (e.g. if the interface disappears),
the affected host SHOULD announce this so that the peer can remove the affected host SHOULD announce this so that the peer can remove
subflows related to this address. subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 13), which will remove a previously-added address (or list of (Figure 13), which will remove a previously-added address (or list of
skipping to change at page 41, line 13 skipping to change at page 41, line 10
must be immediately closed with an RST, featuring an MP_FAIL option must be immediately closed with an RST, featuring an MP_FAIL option
(Figure 15), which defines the Data Sequence Number at the start of (Figure 15), which defines the Data Sequence Number at the start of
the segment (defined by the Data Sequence Mapping) which had the the segment (defined by the Data Sequence Mapping) which had the
checksum failure. checksum failure.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Kind | Length=12 |Subtype| (reserved) | | Kind | Length=12 |Subtype| (reserved) |
+---------------+---------------+-------+----------------------+ +---------------+---------------+-------+----------------------+
| Data Sequence Number (8 octets) : | |
+--------------------------------------------------------------+ | Data Sequence Number (8 octets) |
: Data Sequence Number (continued) | | |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
Figure 15: Fallback (MP_FAIL) option Figure 15: Fallback (MP_FAIL) option
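The layout in Figure 15 can be reproduced with Python's `struct` module, as a rough sketch. The function name `pack_mp_fail` is hypothetical, and the Kind and Subtype values are passed in by the caller because they are assigned elsewhere in the specification; the sample values used in the example call are placeholders, not values taken from this text.

```python
import struct


def pack_mp_fail(kind: int, subtype: int, dsn: int) -> bytes:
    """Build the 12-octet option of Figure 15 in network byte order."""
    if not 0 <= kind <= 0xFF:
        raise ValueError("Kind is one octet")
    if not 0 <= subtype <= 0x0F:
        raise ValueError("Subtype is four bits")
    if not 0 <= dsn < 2**64:
        raise ValueError("Data Sequence Number is 8 octets")
    reserved = 0  # 12 reserved bits following the subtype
    # B: Kind, B: Length=12, H: Subtype (4 bits) + reserved (12 bits), Q: DSN
    return struct.pack("!BBHQ", kind, 12, (subtype << 12) | reserved, dsn)


# Placeholder Kind and Subtype values, for illustration only.
option = pack_mp_fail(kind=0xFE, subtype=0x6, dsn=0x0102030405060708)
```

The `!` prefix selects big-endian encoding without padding, so the packed result is exactly the 12 octets announced in the Length field.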
Failed data will not be DATA_ACKed and so will be re-transmitted on The receiver MUST discard all data following the data sequence number
other subflows (Section 3.3.6). specified. Failed data will not be DATA_ACKed and so will be re-
transmitted on other subflows (Section 3.3.6).
A special case is when there is a single subflow and it fails with a A special case is when there is a single subflow and it fails with a
checksum error. If it is known that all unacknowledged data in checksum error. If it is known that all unacknowledged data in
flight is contiguous, an infinite mapping can be applied to the flight is contiguous (which will usually be the case with a single
subflow without the need to close it first, and essentially turn off subflow), an infinite mapping can be applied to the subflow without
all further MPTCP signaling. In this case, if a receiver identifies the need to close it first, and essentially turn off all further
a checksum failure when there is only one path, it will send back an MPTCP signaling. In this case, if a receiver identifies a checksum
MP_FAIL option on the subflow-level ACK. The sender will receive failure when there is only one path, it will send back an MP_FAIL
this, and if all unacknowledged data in flight is contiguous, will option on the subflow-level ACK, referring to the data-level sequence
signal an infinite mapping (if the data is not contiguous, the sender number of the start of the segment on which the checksum error was
MUST send an RST). This infinite mapping will be a DSS option detected. The sender will receive this, and if all unacknowledged
(Section 3.3) on the first new packet, containing a Data Sequence data in flight is contiguous, will signal an infinite mapping. This
Mapping that acts retroactively, referring to the start of the infinite mapping will be a DSS option (Section 3.3) on the first new
subflow sequence number of the last segment that was known to be packet, containing a Data Sequence Mapping that acts retroactively,
delivered intact. From that point onwards data can be altered by a referring to the start of the subflow sequence number of the last
middlebox without affecting MPTCP, as the data stream is equivalent segment that was known to be delivered intact. From that point
to a regular, legacy TCP session. onwards data can be altered by a middlebox without affecting MPTCP,
as the data stream is equivalent to a regular, legacy TCP session.
In the rare case that the data is not contiguous (which could happen
when there is only one subflow but it is retransmitting data from a
subflow that has recently been uncleanly closed), the receiver MUST
close the subflow with an RST with MP_FAIL. The receiver MUST
discard all data that follows the data sequence number specified.
The sender MAY attempt to create a new subflow belonging to the same
connection, and if it chooses to do so, SHOULD place the single
subflow immediately in fallback mode by setting an infinite data
sequence mapping. This mapping will begin from the data-level
sequence number that was declared in the MP_FAIL.
After a sender signals an infinite mapping it MUST only use subflow After a sender signals an infinite mapping it MUST only use subflow
ACKs to clear its send buffer. This is because Data ACKs may become ACKs to clear its send buffer. This is because Data ACKs may become
misaligned with the subflow ACKs when middleboxes insert or delete misaligned with the subflow ACKs when middleboxes insert or delete
data. The receiver SHOULD stop generating Data ACKs after it receives data. The receiver SHOULD stop generating Data ACKs after it receives
an infinite mapping. an infinite mapping.
When a connection is in fallback mode, only one subflow can send data When a connection is in fallback mode, only one subflow can send data
at a time. Otherwise, the receiver would not know how to reorder the at a time. Otherwise, the receiver would not know how to reorder the
data; in practice this means that all MPTCP subflows will have to be data; in practice this means that all MPTCP subflows will have to be
skipping to change at page 45, line 8 skipping to change at page 45, line 16
the subflow. To allow the receiver to reorder application data, the subflow. To allow the receiver to reorder application data,
an additional data-level sequence space is used. In this data- an additional data-level sequence space is used. In this data-
level sequence space, the initial SYN and the final DATA_FIN level sequence space, the initial SYN and the final DATA_FIN
occupy one octet of sequence space. There is an explicit mapping occupy one octet of sequence space. There is an explicit mapping
of data sequence space to subflow sequence space, which is of data sequence space to subflow sequence space, which is
signalled through TCP options in data packets. signalled through TCP options in data packets.
ACK: The ACK field in the TCP header acknowledges only the subflow ACK: The ACK field in the TCP header acknowledges only the subflow
sequence number, not the data-level sequence space. sequence number, not the data-level sequence space.
Implementations SHOULD NOT attempt to infer a data-level Implementations SHOULD NOT attempt to infer a data-level
acknowledgement from the subflow ACKs. Instead an explicit data- acknowledgement from the subflow ACKs. This separates subflow-
level ACK is This separates subflow- and connection-level and connection-level processing at an end host.
processing at an end host.
Duplicate ACK: A duplicate ACK that includes MPTCP signaling MUST Duplicate ACK: A duplicate ACK that includes any MPTCP signaling
NOT be treated as a signal of congestion. To avoid any non-MPTCP- (with the exception of the DSS option) MUST NOT be treated as a
aware entities also mistakenly seeing duplicate ACKs in such signal of congestion. To avoid any non-MPTCP-aware entities also
cases, MPTCP SHOULD NOT send more than two duplicate ACKs mistakenly seeing duplicate ACKs in such cases, MPTCP SHOULD NOT
containing MPTCP signals in a row. send more than two duplicate ACKs containing MPTCP signals in a
row.
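
A minimal sketch of the duplicate-ACK rule as worded in the -07 text (right-hand column), offered as an illustration only. The option names, function names, and the list-of-strings representation are hypothetical and are not taken from the draft. Whether the two-in-a-row limit also covers duplicate ACKs that carry only a DSS option is not stated here; this sketch assumes it covers any MPTCP option.

```python
# Hypothetical sketch of the duplicate-ACK handling described above.
# Option names and helper names are illustrative assumptions only.

DSS = "DSS"                      # the one option exempted by the -07 wording
MAX_SIGNAL_DUPACKS_IN_A_ROW = 2  # "SHOULD NOT send more than two ... in a row"


def dupack_may_signal_congestion(mptcp_options):
    """Receiver of a duplicate ACK: may it count towards congestion?

    A duplicate ACK carrying any MPTCP signaling other than DSS must not
    be treated as a congestion signal. A duplicate ACK with no MPTCP
    options, or with DSS only, is handled as in regular TCP.
    """
    return all(opt == DSS for opt in mptcp_options)


def may_send_signal_on_dupack(consecutive_signal_dupacks_sent):
    """Sender of duplicate ACKs: is one more signal-bearing dup ACK allowed?

    Keeps the run of signal-bearing duplicate ACKs at two or fewer so that
    non-MPTCP-aware entities do not see three duplicates in a row.
    """
    return consecutive_signal_dupacks_sent < MAX_SIGNAL_DUPACKS_IN_A_ROW
```

For example, under these assumptions a duplicate ACK carrying an address advertisement would be ignored by congestion control, while one carrying only a DSS option would be counted as usual.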
Receive Window: The receive window in the TCP header indicates the Receive Window: The receive window in the TCP header indicates the
amount of free buffer space for the whole data-level connection amount of free buffer space for the whole data-level connection
(as opposed to for this subflow) that is available at the (as opposed to for this subflow) that is available at the
receiver. This is the same semantics as regular TCP, but to receiver. This is the same semantics as regular TCP, but to
maintain these semantics the receive window must be interpreted at maintain these semantics the receive window must be interpreted at
the sender as relative to the sequence number given in the the sender as relative to the sequence number given in the
DATA_ACK rather than the subflow ACK in the TCP header. In this DATA_ACK rather than the subflow ACK in the TCP header. In this
way the original flow control role is preserved. Note that some way the original flow control role is preserved. Note that some
middleboxes may change the receive window, and so a host must use middleboxes may change the receive window, and so a host must use
skipping to change at page 50, line 20 skipping to change at page 50, line 24
sequence number instead of using the sequence number in the sequence number instead of using the sequence number in the
segment. In this way, the mapping is independent of the packets segment. In this way, the mapping is independent of the packets
that carry it. that carry it.
o The Receive Window may be shrunk by some middleboxes at the o The Receive Window may be shrunk by some middleboxes at the
subflow level. MPTCP will use the maximum window at data-level, subflow level. MPTCP will use the maximum window at data-level,
but will also obey subflow specific windows. but will also obey subflow specific windows.
7. Acknowledgements 7. Acknowledgements
The authors were supported by Trilogy The authors were originally supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372) (http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh partially funded by the European Community under its Seventh
Framework Program. The views expressed here are those of the Framework Program. The views expressed here are those of the
author(s) only. The European Commission is not liable for any use author(s) only. The European Commission is not liable for any use
that may be made of the information in this document. that may be made of the information in this document.
Alan Ford was supported by Roke Manor Research. Alan Ford was originally supported by Roke Manor Research.
The authors gratefully acknowledge significant input into this The authors gratefully acknowledge significant input into this
document from Sebastien Barre, Christoph Paasch, and Andrew McDonald. document from Sebastien Barre, Christoph Paasch, and Andrew McDonald.
The authors also wish to acknowledge reviews and contributions from The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock, Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo,
Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing, Lawrence Conroy, Yoshifumi Nishida, Bob Briscoe, Stein Gjessing,
Andrew McGregor, Georg Hampel, and Anumita Biswas. Andrew McGregor, Georg Hampel, and Anumita Biswas.
skipping to change at page 53, line 46 skipping to change at page 53, line 46
TCP data packets typically carry timestamp options in every packet, TCP data packets typically carry timestamp options in every packet,
taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28,
if word-aligned). The Data Sequence Signal (DSS) option varies in if word-aligned). The Data Sequence Signal (DSS) option varies in
length depending on whether the Data Sequence Mapping and DATA ACK length depending on whether the Data Sequence Mapping and DATA ACK
are included, and whether the sequence numbers in use are 4 or 8 are included, and whether the sequence numbers in use are 4 or 8
octets. The maximum size of the DSS option is 28 bytes, so even that octets. The maximum size of the DSS option is 28 bytes, so even that
will fit in the available space. But unless a connection is both bi- will fit in the available space. But unless a connection is both bi-
directional and high-bandwidth, it is unlikely that all that option directional and high-bandwidth, it is unlikely that all that option
space will be required on each DSS option. space will be required on each DSS option.
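
The option-space arithmetic above can be checked with a short sketch. The field breakdown used here is an assumption taken from the DSS option layout defined elsewhere in the specification, not from this paragraph: a 4-octet fixed part (kind, length, subtype, flags), an optional 4- or 8-octet DATA ACK, and an optional mapping made up of a 4- or 8-octet data sequence number, a 4-octet subflow sequence number, a 2-octet data-level length, and a 2-octet checksum. The function and constant names are hypothetical.

```python
# Hypothetical sketch: TCP option-space budget for the DSS option.
# Field sizes are assumed from the DSS option layout; names are illustrative.

TCP_OPTION_SPACE = 40  # 60-byte maximum TCP header minus the 20-byte base header
TIMESTAMP_LEN = 10     # TCP timestamp option, unpadded


def padded(length, word=4):
    """Round an option length up to the next multiple of `word` bytes."""
    return (length + word - 1) // word * word


def space_after_timestamp(word_aligned):
    """Option bytes left once the timestamp option has been placed."""
    used = padded(TIMESTAMP_LEN) if word_aligned else TIMESTAMP_LEN
    return TCP_OPTION_SPACE - used


def dss_length(data_ack=None, mapping_dsn=None, checksum=True):
    """Length in bytes of a DSS option.

    data_ack / mapping_dsn: None if the field is absent, otherwise the
    sequence number width in octets (4 or 8).
    """
    for width in (data_ack, mapping_dsn):
        if width not in (None, 4, 8):
            raise ValueError("sequence numbers are 4 or 8 octets")
    length = 4  # kind, length, subtype, flags
    if data_ack is not None:
        length += data_ack
    if mapping_dsn is not None:
        length += mapping_dsn + 4 + 2  # DSN, subflow sequence number, data-level length
        if checksum:
            length += 2
    return length
```

Under these assumptions, the largest DSS option (8-octet DATA ACK plus a full mapping with 8-octet sequence numbers and checksum) comes to 28 bytes, which matches the space left after a word-aligned timestamp option. A DSS option that carries only an 8-octet DATA ACK comes to 12 bytes.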
It is not necessary to include the Data Sequence Mapping and DATA ACK Within the DSS option, it is not necessary to include the Data
in each packet, and in many cases it may be possible to alternate Sequence Mapping and DATA ACK in each packet, and in many cases it
their presence (so long as the mapping covers the data being sent in may be possible to alternate their presence (so long as the mapping
the following packet). Other options include: alternating between 4 covers the data being sent in the following packet). It would also
and 8 byte sequence numbers in each option; and sending the DATA_ACK be possible to alternate between 4 and 8 byte sequence numbers in
on a duplicate subflow-level ACK (although note that this must not be each option.
taken as a signal of congestion).
On subflow and connection setup, an MPTCP option is also set on the On subflow and connection setup, an MPTCP option is also set on the
third packet (an ACK). These are 20 bytes (for Multipath Capable) third packet (an ACK). These are 20 bytes (for Multipath Capable)
and 24 bytes (for Join), both of which will fit in the available and 24 bytes (for Join), both of which will fit in the available
option space. option space.
Pure ACKs in TCP typically contain only timestamps (10B). Here, Pure ACKs in TCP typically contain only timestamps (10B). Here,
multipath TCP typically needs to encode only the DATA ACK (maximum of multipath TCP typically needs to encode only the DATA ACK (maximum of
12 octets). Occasionally ACKs will contain SACK information. 12 octets). Occasionally ACKs will contain SACK information.
Depending on the number of lost packets, SACK may utilize the entire Depending on the number of lost packets, SACK may utilize the entire
 End of changes. 24 change blocks. 
86 lines changed or deleted 96 lines changed or added
