draft-ietf-mptcp-multiaddressed-01.txt   draft-ietf-mptcp-multiaddressed-02.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Roke Manor Research Internet-Draft Roke Manor Research
Intended status: Experimental C. Raiciu Intended status: Experimental C. Raiciu
Expires: January 13, 2011 M. Handley Expires: April 28, 2011 M. Handley
University College London University College London
July 12, 2010 October 25, 2010
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-multiaddressed-01 draft-ietf-mptcp-multiaddressed-02
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network, and thus improve user improve resource usage within the network, and thus improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
skipping to change at page 1, line 44 skipping to change at page 1, line 44
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 13, 2011. This Internet-Draft will expire on April 28, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 15 skipping to change at page 3, line 15
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Design Assumptions . . . . . . . . . . . . . . . . . . . . 4 1.1. Design Assumptions . . . . . . . . . . . . . . . . . . . . 4
1.2. Multipath TCP in the Networking Stack . . . . . . . . . . 5 1.2. Multipath TCP in the Networking Stack . . . . . . . . . . 5
1.3. Operation Summary . . . . . . . . . . . . . . . . . . . . 6 1.3. Operation Summary . . . . . . . . . . . . . . . . . . . . 6
1.4. Requirements Language . . . . . . . . . . . . . . . . . . 7 1.4. Requirements Language . . . . . . . . . . . . . . . . . . 7
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7
3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 8 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 8
3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 8 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 8
3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 10 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 11
3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 12 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 15
3.3.1. Data Sequence Numbering . . . . . . . . . . . . . . . 12 3.3.1. Data Sequence Numbering . . . . . . . . . . . . . . . 15
3.3.2. Data Acknowledgements . . . . . . . . . . . . . . . . 15 3.3.2. Data Acknowledgements . . . . . . . . . . . . . . . . 17
3.3.3. Receiver Considerations . . . . . . . . . . . . . . . 16 3.3.3. Receiver Considerations . . . . . . . . . . . . . . . 18
3.3.4. Sender Considerations . . . . . . . . . . . . . . . . 17 3.3.4. Sender Considerations . . . . . . . . . . . . . . . . 19
3.3.5. Congestion Control Considerations . . . . . . . . . . 18 3.3.5. Congestion Control Considerations . . . . . . . . . . 21
3.3.6. Subflow Policy . . . . . . . . . . . . . . . . . . . . 19 3.3.6. Subflow Policy . . . . . . . . . . . . . . . . . . . . 21
3.4. Closing a Connection . . . . . . . . . . . . . . . . . . . 20 3.4. Closing a Connection . . . . . . . . . . . . . . . . . . . 22
3.5. Address Knowledge Exchange (Path Management) . . . . . . . 21 3.5. Address Knowledge Exchange (Path Management) . . . . . . . 24
3.5.1. Address Advertisement . . . . . . . . . . . . . . . . 22 3.5.1. Address Advertisement . . . . . . . . . . . . . . . . 25
3.5.2. Remove Address . . . . . . . . . . . . . . . . . . . . 25 3.5.2. Remove Address . . . . . . . . . . . . . . . . . . . . 27
3.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 26 3.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 29 3.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 31
3.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 29 3.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 32
3.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 29 3.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 32
4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 30 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 32
5. Security Considerations . . . . . . . . . . . . . . . . . . . 31 5. Security Considerations . . . . . . . . . . . . . . . . . . . 34
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 32 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 34
7. Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . 35 7. Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . 38
8. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 36 8. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 38
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 38
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 39
11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 37 10.1. Normative References . . . . . . . . . . . . . . . . . . . 39
11.1. Normative References . . . . . . . . . . . . . . . . . . . 37 10.2. Informative References . . . . . . . . . . . . . . . . . . 39
11.2. Informative References . . . . . . . . . . . . . . . . . . 38 Appendix A. Notes on use of TCP Options . . . . . . . . . . . . . 40
Appendix A. Notes on use of TCP Options . . . . . . . . . . . . . 39 Appendix B. Resync Packet . . . . . . . . . . . . . . . . . . . . 42
Appendix B. Resync Packet . . . . . . . . . . . . . . . . . . . . 40 Appendix C. Changelog . . . . . . . . . . . . . . . . . . . . . . 42
Appendix C. Changelog . . . . . . . . . . . . . . . . . . . . . . 41 C.1. Changes since draft-ietf-mptcp-multiaddressed-01 . . . . . 42
C.1. Changes since draft-ietf-mptcp-multiaddressed-00 . . . . . 41 C.2. Changes since draft-ietf-mptcp-multiaddressed-00 . . . . . 43
C.2. Changes since draft-ford-mptcp-multiaddressed-03 . . . . . 42 C.3. Changes since draft-ford-mptcp-multiaddressed-03 . . . . . 43
C.3. Changes since draft-ford-mptcp-multiaddressed-02 . . . . . 42 C.4. Changes since draft-ford-mptcp-multiaddressed-02 . . . . . 43
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 42 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 43
1. Introduction 1. Introduction
Multipath TCP (henceforth referred to as MPTCP) is a set of Multipath TCP (henceforth referred to as MPTCP) is a set of
extensions to regular TCP [2] to allow a transport connection to extensions to regular TCP [2] to allow a transport connection to
operate across multiple paths simultaneously. This document presents operate across multiple paths simultaneously. This document presents
the protocol changes required to add multipath capability to TCP; the protocol changes required to add multipath capability to TCP;
specifically, those for signalling and setting up multiple paths specifically, those for signalling and setting up multiple paths
("subflows"), managing these subflows, reassembly of data, and ("subflows"), managing these subflows, reassembly of data, and
termination of sessions. This is not the only information required termination of sessions. This is not the only information required
skipping to change at page 7, line 6 skipping to change at page 7, line 6
Address B1 on Host B. Address B1 on Host B.
o MPTCP identifies multiple paths by the presence of multiple o MPTCP identifies multiple paths by the presence of multiple
addresses at endpoints. Combinations of these multiple addresses addresses at endpoints. Combinations of these multiple addresses
equate to the additional paths. In the example, other potential equate to the additional paths. In the example, other potential
paths that could be set up are A1<->B2 and A2<->B2. Although this paths that could be set up are A1<->B2 and A2<->B2. Although this
additional session is shown as being initiated from A2, it could additional session is shown as being initiated from A2, it could
equally have been initiated from B1. equally have been initiated from B1.
o The discovery and setup of additional subflows will be achieved o The discovery and setup of additional subflows will be achieved
through a path management method. This document describes a through a path management method; this document describes a
mechanism by which an endpoint can initiate new subflows by using mechanism by which an endpoint can initiate new subflows by using
its own additional addresses, or by signalling its available its own additional addresses, or by signalling its available
addresses to the other endpoint. addresses to the other endpoint.
o MPTCP adds connection-level sequence numbers to allow the o MPTCP adds connection-level sequence numbers to allow the
reassembly of the in-order data stream from multiple subflows reassembly of the in-order data stream from multiple subflows
which may deliver packets out-of-order due to differing network which may deliver packets out-of-order due to differing network
delays. delays.
o Subflows are terminated as regular TCP connections, with a four o Subflows are terminated as regular TCP connections, with a four
skipping to change at page 8, line 31 skipping to change at page 8, line 31
Endpoint: A host operating an MPTCP implementation, and either Endpoint: A host operating an MPTCP implementation, and either
initiating or accepting an MPTCP connection. initiating or accepting an MPTCP connection.
3. MPTCP Protocol 3. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation. subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signalled using optional TCP header fields. All MPTCP operations are signalled using optional TCP header fields.
These TCP Options will have option numbers allocated by IANA, as These TCP Options will have option numbers allocated by IANA, as
listed in Section 10, and are defined throughout the following listed in Section 9, and are defined throughout the following
subsections. subsections.
3.1. Connection Initiation 3.1. Connection Initiation
Connection Initiation begins with a SYN, SYN/ACK exchange on a single Connection Initiation begins with a SYN, SYN/ACK, ACK exchange on a
path. Each packet contains the Multipath Capable (MP_CAPABLE) TCP single path. Each packet contains the Multipath Capable (MP_CAPABLE)
option (Figure 3). This option declares its sender is capable of TCP option (Figure 3). This option declares its sender is capable of
performing multipath TCP and wishes to do so on this particular performing multipath TCP and wishes to do so on this particular
connection. Each host includes in the MP_CAPABLE option a locally- connection.
unique token that identifies this connection. This is used when
adding additional subflows to this connection.
This token is generated by the sender and has local meaning only, This option contains a 64-bit key that is used to authenticate the
hence it MUST be unique for the sender. The token MUST be difficult addition of future subflows. This is the only time the key will be
for an attacker to guess, and thus it is recommended it SHOULD be sent in clear on the wire; all future subflows will identify the
generated randomly. (However, see further discussions about security connection using a 32-bit "token". This token is a cryptographically
in Section 5, including the possibility of 64-bit tokens.) secure hash of this key. This will be a truncated (most significant
32 bits) SHA-1 hash [6]. A different, 64-bit truncation (the least
significant 64 bits) of the hash of the key will be used as the
Initial Data Sequence Number.
The MP_CAPABLE option is only present in packets with the SYN flag This key is generated by the sender and has local meaning only, and
set. It is only used in the first TCP session of a connection, in its method of generation is implementation-specific. The key SHOULD
order to identify the connection; all following subflows will use the be hard to guess, and it MUST be unique for the sending host at any
"Join" option (see Section 3.2) to join the existing connection. one time. Connections will be indexed at each host by the token (the
truncated SHA-1 hash of the key), but an implementation will require
a mapping from the token to the key for each connection.
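The derivation of the token and the Initial Data Sequence Number from a key, as described above, can be sketched as follows. This is only an illustration: the function name is hypothetical, and it assumes that "most significant" and "least significant" refer to the SHA-1 digest read in network (big-endian) byte order, which the text does not state explicitly.

```python
import hashlib
import os
from typing import Tuple


def derive_token_and_idsn(key: bytes) -> Tuple[int, int]:
    """Return (token, idsn) for a 64-bit MPTCP key (hypothetical helper)."""
    if len(key) != 8:
        raise ValueError("MPTCP key must be 64 bits (8 octets)")
    digest = hashlib.sha1(key).digest()  # 160-bit (20-octet) digest
    # Assumption: the digest is interpreted in big-endian order.
    token = int.from_bytes(digest[:4], "big")   # most significant 32 bits
    idsn = int.from_bytes(digest[-8:], "big")   # least significant 64 bits
    return token, idsn


# The key only needs to be hard to guess and locally unique; a random
# 64-bit value is one possible (implementation-specific) choice.
key_a = os.urandom(8)
token_a, idsn_a = derive_token_and_idsn(key_a)
```

Because the token is derived from the key and not the other way round, an implementation along these lines would still keep a token-to-key mapping per connection.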
The MP_CAPABLE option is carried on the SYN, SYN/ACK, and ACK packets
that start the first subflow of an MPTCP connection. The data
carried by each packet is as follows, where A = initiator and B =
listener.
o SYN (A->B): A's Key.
o SYN/ACK (B->A): B's Key.
o ACK (A->B): Both A's Key and B's Key.
The contents of the option are determined by the SYN and ACK flags of
the packet, and verified by the option's length field. For the diagram
shown in Figure 3, "sender" and "receiver" refer to the sender or
receiver of the TCP packet.
The keys are echoed in the ACK in order to allow the listener to act
statelessly until the TCP connection reaches the ESTABLISHED state.
If the listener acts in this way, however, it MUST generate its key
in a verifiable fashion, allowing it to verify that it generated the
key when it is echoed in the ACK. If this ACK does not carry data,
it MUST still be ACKed by the receiver in order for the sender to
ensure that the ACK carrying the MP_CAPABLE option has been received.
The first octet of this option specifies the MPTCP version in use
(for this specification, this is 0). The second octet is reserved
for flags, and currently MUST be set to all zeros. The meaning of
such flags will be determined in future revisions of MPTCP; possible
uses include enabling or disabling certain MPTCP features, and
providing a mechanism for crypto agility.
The MP_CAPABLE option is only used in the first subflow of a
connection, in order to identify the connection; all following
subflows will use the "Join" option (see Section 3.2) to join the
existing connection.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+ +---------------+---------------+---------------+---------------+
|Kind=MP_CAPABLE| Length=12 | Sender Token : |Kind=MP_CAPABLE| Length | Version | (reserved) |
+---------------+---------------+-------------------------------+ +---------------+---------------+---------------+---------------+
: Sender Token (4 bytes total) | Initial Data Sequence Number : | Sender Key |
+-------------------------------+-------------------------------+ | (64 bits) |
: Initial Data Sequence Number (6 bytes total) | | |
+---------------------------------------------------------------+
| Receiver Key (64 bits) |
| (if Length==20) |
| |
+---------------------------------------------------------------+ +---------------------------------------------------------------+
Figure 3: Multipath Capable (MP_CAPABLE) option (only valid on SYN Figure 3: Multipath Capable (MP_CAPABLE) option
packets)
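The newer layout of the MP_CAPABLE option in Figure 3 (Kind, Length, Version, reserved flags, then one or two 64-bit keys) might be encoded and decoded roughly as below. The option Kind is not yet allocated, so MP_CAPABLE_KIND is a placeholder using an experimental TCP option number, and the function names are hypothetical. Lengths of 12 (one key) and 20 (both keys) follow from the figure.

```python
import struct
from typing import Optional, Tuple

# Placeholder only: the real Kind is to be allocated by IANA.
MP_CAPABLE_KIND = 253


def build_mp_capable(sender_key: bytes, receiver_key: bytes = b"",
                     version: int = 0) -> bytes:
    """Encode MP_CAPABLE; receiver_key is only present on the third ACK."""
    if len(sender_key) != 8 or len(receiver_key) not in (0, 8):
        raise ValueError("keys must be exactly 64 bits")
    length = 4 + len(sender_key) + len(receiver_key)  # 12 or 20
    # The flags octet is reserved and set to all zeros.
    header = struct.pack("!BBBB", MP_CAPABLE_KIND, length, version, 0)
    return header + sender_key + receiver_key


def parse_mp_capable(option: bytes) -> Tuple[int, int, bytes, Optional[bytes]]:
    """Decode MP_CAPABLE into (version, flags, sender_key, receiver_key)."""
    if len(option) < 4:
        raise ValueError("option too short")
    kind, length, version, flags = struct.unpack("!BBBB", option[:4])
    if kind != MP_CAPABLE_KIND or length != len(option) or length not in (12, 20):
        raise ValueError("malformed MP_CAPABLE option")
    sender_key = option[4:12]
    receiver_key = option[12:20] if length == 20 else None
    return version, flags, sender_key, receiver_key
```

A SYN or SYN/ACK would carry the 12-octet form; the final ACK of the handshake would carry the 20-octet form with both keys.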
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the passive opener is not multipath capable and thus is assumed that the passive opener is not multipath capable and thus
the MPTCP session will operate as regular, single-path TCP. If a SYN the MPTCP session will operate as regular, single-path TCP. If a SYN
does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT contain does not contain an MP_CAPABLE option, the SYN/ACK MUST NOT contain
one in response. one in response.
If the SYN packets are unacknowledged, it is up to local policy to If the SYN packets are unacknowledged, it is up to local policy to
decide how to respond. It is expected that a sender will eventually decide how to respond. It is expected that a sender will eventually
fall back to single-path TCP (i.e. without the MP_CAPABLE Option) in fall back to single-path TCP (i.e. without the MP_CAPABLE Option) in
order to work around middleboxes that may drop packets with unknown order to work around middleboxes that may drop packets with unknown
options; however, the number of multipath-capable attempts that are options; however, the number of multipath-capable attempts that are
made first will be up to local policy. Once the active opener has made first will be up to local policy. Once the active opener has
sent a SYN without the MP_CAPABLE option, it MUST fall back to sent a SYN without the MP_CAPABLE option, it MUST fall back to
regular TCP behavior, even if it subsequently receives a SYN/ACK that regular TCP behavior, even if it subsequently receives a SYN/ACK that
contains an MP_CAPABLE option. This might happen if the MP_CAPABLE contains an MP_CAPABLE option. This might happen if the MP_CAPABLE
SYN and subsequent non-MP-capable SYN are reordered. This is to SYN and subsequent non-MP-capable SYN are reordered. This is to
ensure that the two endpoints end up in an interoperable state, no ensure that the two endpoints end up in an interoperable state, no
matter what order the SYNs arrive at the passive opener. This final matter what order the SYNs arrive at the passive opener. This final
state is inferred from the presence or absence of the DATA_ACK option state is inferred from the presence or absence of the MP_CAPABLE
in the third packet of the TCP handshake. option in the third packet of the TCP handshake. If this option is
not present, the connection should fall back to regular TCP, as
The MP_CAPABLE option includes the most significant 6 bytes of the documented in Section 3.6.
8-byte initial Data Sequence Number option (discussed in
Section 3.3). The least significant two bytes should be treated as
being zero. This data sequence number maps the SYN into to the data
sequence space (and this initial SYN occupies one octet of this
space, as for a regular SYN in single-path TCP). Having the SYN
occupy sequence space means that it must be DATA_ACKed, and this
ensures that there is two-way agreement on whether or not the
multipath capability is enabled, even if a middlebox were to strip
the MP_CAPABLE option from a SYN/ACK packet.
To preserve option space, only the most significant six bytes of the The initial Data Sequence Number (IDSN) is generated as a hash from
data sequence number are sent in the SYN, as there is no significant the Key, in the same way as the token, i.e. IDSN-A = Hash(Key-A) and
security benefit from randomizing the values of the lower two bytes IDSN-B = Hash(Key-B). The Hash mechanism here provides the least
given that these fall within typical receive window sizes. significant 64 bits of the SHA-1 hash of the key. The SYN with
MP_CAPABLE occupies the first octet of Data Sequence Space.
3.2. Starting a New Subflow 3.2. Starting a New Subflow
Once an MPTCP connection has begun with the MP_CAPABLE exchange, Once an MPTCP connection has begun with the MP_CAPABLE exchange,
further subflows can be added to the connection. Endpoints have further subflows can be added to the connection. Endpoints have
knowledge of their own address(es), and can become aware of the other knowledge of their own address(es), and can become aware of the other
endpoint's addresses through signalling exchanges as described in endpoint's addresses through signalling exchanges as described in
Section 3.5. Using this knowledge, an endpoint can initiate a new Section 3.5. Using this knowledge, an endpoint can initiate a new
subflow over a currently unused pair of addresses. The protocol subflow over a currently unused pair of addresses. The protocol
permits either endpoint of a connection to initiate the creation of a permits either endpoint of a connection to initiate the creation of a
new subflow (but see Section 3.8 for heuristics) new subflow (but see Section 3.8 for heuristics).
A new subflow is started as a normal TCP SYN/ACK exchange. The Join A new subflow is started as a normal TCP SYN/ACK exchange. The Join
Connection (MP_JOIN)) TCP option (Figure 4) is used to identify the Connection (MP_JOIN) TCP option (Figure 4) is used to identify the
connection to be joined by the new subflow. The receiver token sent connection to be joined by the new subflow. The tokens used to
MUST be the other endpoint's locally unique connection token, which identify the MPTCP connection are cryptographically secure hashes of
was included in the MP_CAPABLE option during connection the keys exchanged in the initial MP_CAPABLE handshake. The tokens
establishment. The MP_JOIN option MUST only be present on SYN presented in this option are generated by the SHA-1 [6] algorithm,
packets. truncated to the most significant 32 bits. The token included in the
MP_JOIN option is the token that the receiver of the packet uses to
identify this connection, i.e. Host A will send Token-B (which is
generated from Key-B), and vice versa.
The MP_JOIN SYN/SYN-ACK handshake not only exchanges the tokens
(which are static for a connection) but also Random Numbers (nonces)
that are used to prevent replay attacks on the authentication method.
Whilst these data are transferred in the SYN exchange, the actual
cryptographic authentication is undertaken in the first two payload
segments of the connection. Once the peers have successfully
authenticated themselves, the subflow is handed over to the scheduler
to be used for data (the presence of a DSN_MAP option, described in
Section 3.3, indicates this).
The MP_JOIN option also contains an "Address ID" to identify the
source address of this packet if it has changed in transit; the
behaviour of this ID is explained later in this section.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+ +---------------+---------------+-------------------------------+
| Kind=MP_JOIN | Length = 7 |Receiver Token (4 octets total): | Kind=MP_JOIN | Length = 8 | Address ID | (reserved) |B|
+---------------+---------------+----------------+--------------+ +---------------+---------------+----------------+--------------+
: Receiver Token (continued) | Address ID | | Receiver Token (32 bits) |
+-------------------------------+----------------+ +---------------------------------------------------------------+
| Sender Random Number (32 bits) |
+---------------------------------------------------------------+
Figure 4: Join Connection (MP_JOIN) option (only valid on SYN Figure 4: Join Connection (MP_JOIN) option (only valid on SYN
packets) packets)
TBD: A better security mechanism that just the token is required On the third and fourth packets of the handshake, the following data
here, in order to prove that the sender of the SYN/MP_JOIN is the is sent in the TCP payload:
same sender as that who sent the original SYN/MP_CAPABLE. Hash
chains are considered an appropriate solution, and the mechanism will
be described in detail in a later version of this document.
When receiving a SYN with the MP_JOIN option that contains a valid 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+
| Kind=MP_AUTH | Length | (reserved) |
+---------------+---------------+-------------------------------+
| |
| |
| HMAC (256 bits for SHA-256) |
| |
| |
+---------------------------------------------------------------+
Figure 5: Authentication Data
For consistency, this follows the same format as a TCP Option,
although it is sent in the TCP payload. The HMAC algorithm is as
defined in [6], using the SHA-256 hash algorithm (thus generating a
256-bit / 32 octet HMAC); in the future, some of the reserved bits
could be used to enable alternative algorithms.
The key for the HMAC algorithm, in the case of the message
transmitted by Host A, will be Key-A followed by Key-B, and in the
case of Host B, Key-B followed by Key-A. The message in each case is
the concatenation of the Random Numbers of the two hosts (denoted by
R): for Host A, R-A followed by R-B; and for Host B, R-B followed by
R-A.
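As an illustration of the key and message ordering just described, the authentication value each host sends could be computed as in the following sketch. The function names are hypothetical, and the sketch assumes the 32-bit Random Numbers are concatenated as 4-octet strings in the order they appear on the wire.

```python
import hashlib
import hmac


def join_hmac(own_key: bytes, peer_key: bytes,
              own_nonce: bytes, peer_nonce: bytes) -> bytes:
    """HMAC-SHA256 sent by a host: its own key and nonce come first."""
    return hmac.new(own_key + peer_key,
                    own_nonce + peer_nonce,
                    hashlib.sha256).digest()


def verify_join_hmac(received: bytes, own_key: bytes, peer_key: bytes,
                     own_nonce: bytes, peer_nonce: bytes) -> bool:
    """Check a peer's HMAC; the peer placed *its* key and nonce first."""
    expected = join_hmac(peer_key, own_key, peer_nonce, own_nonce)
    # Constant-time comparison avoids leaking how many octets matched.
    return hmac.compare_digest(received, expected)


# Example values (arbitrary, for illustration only).
key_a, key_b = b"A" * 8, b"B" * 8
r_a, r_b = b"\x00\x00\x00\x01", b"\x00\x00\x00\x02"
mac_from_a = join_hmac(key_a, key_b, r_a, r_b)        # sent by Host A
ok_at_b = verify_join_hmac(mac_from_a, key_b, key_a, r_b, r_a)
```

Because the ordering differs per direction, the value sent by Host A differs from the one sent by Host B, so one host's HMAC cannot simply be reflected back to it.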
When receiving a SYN with an MP_JOIN option that contains a valid
token for an existing MPTCP connection, the recipient SHOULD respond token for an existing MPTCP connection, the recipient SHOULD respond
with a SYN/ACK also containing an MP_JOIN option containing the with a SYN/ACK also containing an MP_JOIN option containing the
initiator's token. This behaviour is illustrated in Figure 5. initiator's token. This will then lead on to the authentication HMAC
exchange described above. This behaviour is illustrated in Figure 6.
Host A Host B Host A Host B
------------------------ ------------------------ ------------------------ ------------------------
Address A1 Address A2 Address B1 Address B2 Address A1 Address A2 Address B1 Address B2
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
| | | | | | | |
| SYN + MP_CAPABLE(Token A) | | | SYN + MP_CAPABLE(Key-A) | |
|----------------------------------->| | |----------------------------------------->| |
|<-----------------------------------| | |<-----------------------------------------| |
| SYN/ACK + MP_CAPABLE(Token B) | | | SYN/ACK + MP_CAPABLE(Key-B) | |
| | | | | | | |
| | SYN + MP_JOIN(Token B) | | ACK + MP_CAPABLE(Key-A, Key-B) | |
| |----------------------------------->| |----------------------------------------->| |
| |<-----------------------------------| | | | |
| | SYN/ACK + MP_JOIN(Token A) | | | SYN + MP_JOIN(Token-B, R-A) |
| | | | | |----------------------------------------->|
| |<-----------------------------------------|
| | SYN/ACK + MP_JOIN(Token-A, R-B) |
| | | |
| | HMAC(Key=(Key-A+Key-B), Msg=(R-A+R-B)) |
| |----------------------------------------->|
| |<-----------------------------------------|
| | HMAC(Key=(Key-B+Key-A), Msg=(R-B+R-A)) |
| | | |
Figure 5: Example use of MPTCP Tokens Figure 6: Example use of MPTCP Authentication
If the token received at Host B is unknown or local policy prohibits If the token received at Host B is unknown or local policy prohibits
the acceptance of the new subflow, the recipient MUST respond with a the acceptance of the new subflow, the recipient MUST respond with a
TCP RST. TCP RST.
If the token is accepted at Host B, but the token returned to Host A If the token is accepted at Host B, but the token returned to Host A
is not the one expected, Host A MUST close the subflow with a TCP is not the one expected, Host A MUST close the subflow with a TCP
RST. RST.
If either host receives an incorrect HMAC (i.e. it does not match
what the host believes it should be), it MUST close the subflow with
a TCP RST.
The echoing of the token serves two purposes: it ensures both The echoing of the token serves two purposes: it ensures both
endpoints agree on the connection being referred to (this is endpoints agree on the connection being referred to (this is
particularly relevant when both addresses being used are new to the particularly relevant when both addresses being used are new to the
connection); and it ensures there are no middleboxes on the path that connection); and it ensures there are no middleboxes on this new path
will drop MPTCP options on the return path. that will drop MPTCP options on the return path.
If the SYN/ACK as received at Host A does not have an MP_JOIN option, If the SYN/ACK as received at Host A does not have an MP_JOIN option,
Host A MUST close the subflow with a RST. Host A MUST close the subflow with a RST.
If MP_JOIN is stripped from the SYN on the path from A to B, and Host If MP_JOIN is stripped from the SYN on the path from A to B, and Host
B does not have a passive opener on the relevant port, it will B does not have a passive opener on the relevant port, it will
respond with an RST in the normal way. If in response to a SYN with respond with an RST in the normal way. If in response to a SYN with
an MP_JOIN option, a SYN/ACK is received without the MP_JOIN option an MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
(either since it was stripped on the return path, or it was stripped (either since it was stripped on the return path, or it was stripped
on the outgoing path but the passive opener on Host B responded as if on the outgoing path but the passive opener on Host B responded as if
it was a new regular TCP session), then the subflow is unusable and it were a new regular TCP session), then the subflow is unusable and
Host A MUST close it with a RST. Host A MUST close it with a RST.
It should be noted that additional subflows can be created between It should be noted that additional subflows can be created between
any pair of ports (but see Section 3.8 for heuristics); no explicit any pair of ports (but see Section 3.8 for heuristics); no explicit
application-level accept calls or bind calls are required to open application-level accept calls or bind calls are required to open
additional subflows. To associate a new subflow with an existing additional subflows. To associate a new subflow with an existing
connection, the token supplied in the subflow's SYN exchange is used connection, the token supplied in the subflow's SYN exchange is used
for demultiplexing. This then binds the 5-tuple of the TCP subflow for demultiplexing. This then binds the 5-tuple of the TCP subflow
to the local token of the connection. A consequence is that it is to the local token of the connection. A consequence is that it is
possible to allow any port pairs to be used for a connection. possible to allow any port pairs to be used for a connection.
Demultiplexing subflow SYNs MUST be done using the token; this is Demultiplexing subflow SYNs MUST be done using the token; this is
unlike traditional TCP, where the destination port is used for unlike traditional TCP, where the destination port is used for
demultiplexing SYN packets. Once a subflow is set up, demultiplexing demultiplexing SYN packets. Once a subflow is set up, demultiplexing
packets is done using the five-tuple, as in traditional TCP. The packets is done using the five-tuple, as in traditional TCP. The
five-tuples will be mapped to the local connection ID. five-tuples will be mapped to the local connection ID.
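The two demultiplexing stages described above can be sketched as follows. This is a minimal illustration, not part of either draft version: the class, method and field names are hypothetical, and connection IDs are plain strings for brevity.

```python
# Illustrative sketch only; all names here are hypothetical.
from typing import Dict, Optional, Tuple

# (protocol, source address, source port, destination address, destination port)
FiveTuple = Tuple[str, str, int, str, int]


class Demultiplexer:
    def __init__(self) -> None:
        self.by_token: Dict[int, str] = {}        # local token -> connection ID
        self.by_tuple: Dict[FiveTuple, str] = {}  # subflow 5-tuple -> connection ID

    def register_connection(self, token: int, conn_id: str) -> None:
        self.by_token[token] = conn_id

    def on_join_syn(self, token: int, five_tuple: FiveTuple) -> Optional[str]:
        """Demultiplex a SYN carrying MP_JOIN by token, not by destination port."""
        conn_id = self.by_token.get(token)
        if conn_id is None:
            return None  # unknown token: the caller would reject the subflow
        self.by_tuple[five_tuple] = conn_id  # bind the 5-tuple to the connection
        return conn_id

    def on_segment(self, five_tuple: FiveTuple) -> Optional[str]:
        """Once the subflow is set up, demultiplex by 5-tuple as in regular TCP."""
        return self.by_tuple.get(five_tuple)


demux = Demultiplexer()
demux.register_connection(token=0x1234ABCD, conn_id="conn-1")
subflow = ("tcp", "192.0.2.1", 50000, "198.51.100.7", 80)
joined = demux.on_join_syn(0x1234ABCD, subflow)
later = demux.on_segment(subflow)
```

Because the token lookup happens before any port lookup, the new subflow may use any port pair without an application-level accept or bind call.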
The MP_JOIN option includes an "Address ID". This is an identifier The MP_JOIN option includes an "Address ID". This is an identifier
that is locally unique to the sender of this option. It has only that only has significance within a single connection, where it
significance within a single connection, where it identifies the identifies the source address of this packet. The key purpose of
source address of this packet. The key purpose of this identifier is this identifier is to allow address removal without needing to know
to allow address removal without needing to know what the source what the source address at the receiver is, thus allowing the use of
address actually is, thus allowing the use of NATs), when the subflow NATs. The sender can signal this to the receiver via the REMOVE_ADDR
is no longer available. The sender can signal this to the receiver option (Section 3.5.2). It also allows correlation between new
via the REMOVE_ADDR option (Section 3.5.2). It also allows subflow setup attempts and address signalling (Section 3.5.1), to
correlation between new subflow setup attempts and address signalling prevent setting up duplicate subflows on the same path.
(Section 3.5.1), to prevent setting up duplicate subflows on the same
path.
The Address IDs of the subflow used in the initial SYN exchange of The Address IDs of the subflow used in the initial SYN exchange of
the first subflow in the connection are implicit, and have the value the first subflow in the connection are implicit, and have the value
zero. zero.
The Address ID must be stored by the receiver in a data structure The Address ID must be stored by the receiver in a data structure
that gathers all the Address ID to address mappings for a connection that gathers all the Address ID to address mappings for a connection
identified by a token pair. In this way there is a stored mapping identified by a token pair. In this way there is a stored mapping
between Address ID, observed source address and token pair for future between Address ID, observed source address and token pair for future
processing of control information for a connection. processing of control information for a connection.
The MP_JOIN option also includes 8 bits of flags, 7 of which are
currently reserved. The final bit, labelled 'B', indicates whether
the initiator wishes this subflow to be used purely as a backup path
(B=1) in the event of failure of other paths, or whether it wants it
to be used as part of the connection immediately. Subflow policy is
discussed in more detail in Section 3.3.6.
3.3. General MPTCP Operation 3.3. General MPTCP Operation
This section discusses operation of MPTCP for data transfer. At a This section discusses operation of MPTCP for data transfer. At a
high level, an MPTCP implementation will take one input data stream high level, an MPTCP implementation will take one input data stream
from an application, and split it into one or more subflows, with from an application, and split it into one or more subflows, with
sufficient control information to allow it to be reassembled and sufficient control information to allow it to be reassembled and
delivered reliably and in-order to the recipient application. The delivered reliably and in-order to the recipient application. The
following subsections define this behaviour in detail. following subsections define this behaviour in detail.
3.3.1. Data Sequence Numbering 3.3.1. Data Sequence Numbering
The data stream as a whole can be reassembled through the use of the The data stream as a whole can be reassembled through the use of the
Data Sequence Mapping (DSN_MAP, Figure 6) option, which defines the Data Sequence Mapping (DSN_MAP, Figure 7) option, which defines the
mapping from the data sequence number to the subflow sequence number. mapping from the data sequence number to the subflow sequence number.
This is used by the receiver to ensure in-order delivery to the This is used by the receiver to ensure in-order delivery to the
application layer. Meanwhile, the subflow-level sequence numbers application layer. Meanwhile, the subflow-level sequence numbers
(i.e. the regular sequence numbers in the TCP header) have subflow- (i.e. the regular sequence numbers in the TCP header) have subflow-
only relevance. It is expected (but not mandated) that SACK [6] is only relevance. It is expected (but not mandated) that SACK [7] is
used at the subflow level to improve efficiency. used at the subflow level to improve efficiency.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
| Kind=DSN_MAP | Length | Data Sequence Number ... : | Kind=DSN_MAP | Length | Data Sequence Number ... :
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
: ... ( (length-12) octets ) | Data-level Length (2 octets) | : ... ( (length-10) octets ) | Data-level Length (2 octets) |
+-------------------------------+------------------------------+ +-------------------------------+------------------------------+
| Subflow Sequence Number (4 octets) | | Subflow Sequence Number (4 octets) |
+--------------------------------------------------------------+ +-------------------------------+------------------------------+
| CRC-32C (4 octets) | | Checksum (2 octets) |
+--------------------------------------------------------------+ +-------------------------------+
Figure 6: Data Sequence Mapping (DSN_MAP) option Figure 7: Data Sequence Mapping (DSN_MAP) option
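As a rough illustration of the -02 layout in Figure 7 (two-octet checksum, Data Sequence Number occupying length-10 octets), the option could be packed as below. This is a sketch under stated assumptions: the Kind value is a placeholder because the text shown here assigns none, and the function and parameter names are hypothetical.

```python
import struct

KIND_DSN_MAP = 253  # placeholder only; no value is assigned in the text shown


def encode_dsn_map(dsn: int, data_len: int, subflow_seq: int,
                   checksum: int, full_dsn: bool = True) -> bytes:
    """Pack the fields in the order given by the -02 figure."""
    if full_dsn:
        dsn_bytes = struct.pack("!Q", dsn & 0xFFFFFFFFFFFFFFFF)  # 8 octets
    else:
        dsn_bytes = struct.pack("!I", dsn & 0xFFFFFFFF)          # lower 32 bits
    length = 10 + len(dsn_bytes)  # the DSN occupies (length - 10) octets
    return (struct.pack("!BB", KIND_DSN_MAP, length)
            + dsn_bytes
            + struct.pack("!HIH", data_len, subflow_seq, checksum))


option = encode_dsn_map(dsn=0x0102030405060708, data_len=1400,
                        subflow_seq=1, checksum=0xBEEF)
short = encode_dsn_map(dsn=0x0102030405060708, data_len=1400,
                       subflow_seq=1, checksum=0xBEEF, full_dsn=False)
```

With a full 64-bit Data Sequence Number the option is 18 octets long; with only the lower 32 bits it is 14.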
This option specifies a full mapping from data sequence number to This option specifies a full mapping from data sequence number to
subflow sequence number, informing the receiver that there is a one- subflow sequence number, informing the receiver that there is a one-
to-one correspondence between the two sequence spaces for the to-one correspondence between the two sequence spaces for the
specified length (number of bytes of data). The purpose of the specified length (number of bytes of data). The purpose of the
explicit mapping is to assist with compatibility with situations explicit mapping is to assist with compatibility with situations
where TCP/IP segmentation or coalescing is undertaken separately from where TCP/IP segmentation or coalescing is undertaken separately from
the stack that is generating the data flow (e.g. through the use of the stack that is generating the data flow (e.g. through the use of
TCP segmentation offloading on network interface cards, or by TCP segmentation offloading on network interface cards, or by
middleboxes such as performance enhancing proxies). It also allows a middleboxes such as performance enhancing proxies). It also allows a
single mapping to cover many packets, which may be useful in bulk single mapping to cover many packets, which may be useful in bulk
transfer situations. transfer situations.
The data sequence number specified in this option is absolute, The data sequence number specified in this option is absolute,
whereas the subflow sequence numbering is relative (the SYN at the whereas the subflow sequence numbering is relative (the SYN at the
start of the subflow has relative subflow sequence number 1). This start of the subflow has relative subflow sequence number 1). This
is to allow middleboxes to change the Initial Sequence Number of a is to allow middleboxes to change the Initial Sequence Number of a
subflow, since the data stream itself will not be affected (some subflow, since the data stream itself will not be affected (some
firewalls do ISN randomization). firewalls do ISN randomization).
The final four octets of this option contain a checksum of the data The final two octets of this option contain a checksum of the data
that this mapping covers. This is a CRC-32C checksum, the same as that this mapping covers. This is used to detect if the payload has
used in SCTP [7]. This is used to detect if the payload has been been adjusted in any way by a non-MPTCP-aware middlebox. If this
adjusted in any way by a non-MPTCP-aware middlebox. If this checksum checksum fails, it will trigger a failure of the subflow, or a
fails, it will trigger a failure of the subflow, or a fallback to fallback to regular TCP, as documented in Section 3.6. The checksum
regular TCP, as documented in Section 3.6. algorithm used is the standard TCP checksum [2], operating only over
the data covered by this DSN_MAP (i.e. there is no pseudo-header).
This algorithm has been chosen since it will be calculated anyway for
the TCP subflow, and if calculated first over the data before adding
the pseudo-header, it only needs to be calculated once. Furthermore,
since the TCP checksum is additive, the checksum for a DSN_MAP can be
constructed by simply adding together the checksums for the data of
each constituent TCP segment. This relies on the TCP subflow
containing contiguous data, however, and thus a TCP subflow MUST NOT
use the Urgent Pointer (i.e. the URG flag MUST be zero).
TBD: Is this the most appropriate checksum, or would the IP checksum
algorithm be more appropriate?
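The additive property mentioned in the -02 text can be illustrated with a short sketch of the 16-bit one's-complement sum used by the TCP checksum, computed over payload only (no pseudo-header). The function names are hypothetical. Note one assumption: sums can be added directly only when every segment except the last has an even number of octets; odd-length segments would need byte-swapping, which is not shown.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum of data (before the final inversion)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length data with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def add_sums(a: int, b: int) -> int:
    """One's-complement addition of two partial sums."""
    total = a + b
    return (total & 0xFFFF) + (total >> 16)


def checksum(data: bytes) -> int:
    """Inverted sum, as carried in a checksum field."""
    return ~ones_complement_sum(data) & 0xFFFF


# Three segments covered by one mapping; the first two have even length.
segments = [b"multipath ", b"TCP data", b"!!"]
whole = b"".join(segments)

combined = 0
for seg in segments:
    combined = add_sums(combined, ones_complement_sum(seg))
```

Here `combined` equals the sum computed over the concatenated data, so the checksum for the whole mapping can be derived from the per-segment sums.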
A mapping is unique, in that the subflow sequence number is bound to A mapping is unique, in that the subflow sequence number is bound to
the data sequence number after the mapping has been processed. It is the data sequence number after the mapping has been processed. It is
not possible to change this mapping afterwards (although the length not possible to change this mapping afterwards (although the length
of a mapping can extend); however, the same data sequence number can of a mapping can extend); however, the same data sequence number can
be mapped on different subflows for retransmission purposes (see be mapped on different subflows for retransmission purposes (see
Section 3.3.4). Section 3.3.4).
To avoid possible deadlock scenarios, subflow-level processing should To avoid possible deadlock scenarios, subflow-level processing should
be undertaken separately from that at connection-level. Therefore, be undertaken separately from that at connection-level. Therefore,
even if a mapping does not exist from the subflow space to the data- even if a mapping does not exist from the subflow space to the data-
skipping to change at page 14, line 40 skipping to change at page 17, line 15
insertion attacks are not stringent, then it is permissible to insertion attacks are not stringent, then it is permissible to
include just the lower 32 bits of the sequence number in the DSN_MAP include just the lower 32 bits of the sequence number in the DSN_MAP
option as an optimization. Implementations MUST accept this and option as an optimization. Implementations MUST accept this and
implicitly promote it to a 64-bit quantity by incrementing the upper implicitly promote it to a 64-bit quantity by incrementing the upper
32 bits of sequence number each time the lower 32 bits wrap. By 32 bits of sequence number each time the lower 32 bits wrap. By
default, the full 64 bit DSN_MAP should be sent. Security default, the full 64 bit DSN_MAP should be sent. Security
implications are discussed in Section 5. implications are discussed in Section 5.
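The implicit promotion from 32 to 64 bits can be sketched as below. This is a simplified illustration with hypothetical names; it assumes mappings are processed in increasing sequence order, so that a smaller lower half always indicates a wrap. A real receiver would also have to cope with reordering.

```python
class DsnPromoter:
    """Tracks the upper 32 bits for mappings that carry only the lower 32."""

    def __init__(self, initial_dsn: int) -> None:
        self.upper = initial_dsn >> 32
        self.last_lower = initial_dsn & 0xFFFFFFFF

    def promote(self, lower32: int) -> int:
        # Simplifying assumption: values arrive in increasing order, so a
        # smaller lower half means the lower 32 bits have wrapped.
        if lower32 < self.last_lower:
            self.upper = (self.upper + 1) & 0xFFFFFFFF
        self.last_lower = lower32
        return (self.upper << 32) | lower32


p = DsnPromoter(initial_dsn=0x00000001FFFFFF00)
first = p.promote(0xFFFFFFF0)   # no wrap yet
second = p.promote(0x00000010)  # lower half wrapped, upper half incremented
```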
As with the standard TCP sequence number, the data sequence number As with the standard TCP sequence number, the data sequence number
should not start at zero, but at a random value to make blind session should not start at zero, but at a random value to make blind session
hijacking harder. This is done by including the most significant six hijacking harder. This is done by setting the initial data sequence
octets of the initial data sequence number in the MP_CAPABLE option number (IDSN) of each host to the least significant 64 bits of the
in the initial connection SYN (which itself occupies one octet of SHA-1 hash of the host's key (as declared in the MP_CAPABLE option in
data sequence space; see Section 3.1). the initial connection SYN, which itself occupies the first octet of
data sequence space). This handshake is described in more detail in
Section 3.1.
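The -02 derivation of the initial data sequence number can be sketched with a standard SHA-1 implementation. This sketch assumes the 160-bit digest is read as a big-endian integer, so that the least significant 64 bits are its final eight octets; the key value and its eight-octet size are examples only, and the function name is hypothetical.

```python
import hashlib


def initial_dsn(key: bytes) -> int:
    """Least significant 64 bits of SHA-1(key), digest read as big-endian."""
    digest = hashlib.sha1(key).digest()        # 20 octets
    return int.from_bytes(digest[-8:], "big")  # final 8 octets


example_key = bytes.fromhex("0102030405060708")  # example value only
idsn = initial_dsn(example_key)
```

Since each host derives its IDSN from its own key, the two directions of a connection start from different, hard-to-guess values.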
The DSN_MAP option does not need to be included in every MPTCP The DSN_MAP option does not need to be included in every MPTCP
packet, as long as the subflow sequence space in that packet is packet, as long as the subflow sequence space in that packet is
covered by a mapping known at the receiver. This can be used to covered by a mapping known at the receiver. This can be used to
reduce overhead in cases where the mapping is known in advance; one reduce overhead in cases where the mapping is known in advance; one
such case is when there is a single subflow between the endpoints, such case is when there is a single subflow between the endpoints,
another is when segments of data are scheduled in larger than packet- another is when segments of data are scheduled in larger than packet-
sized chunks. An "infinite" mapping can be used to fall back to sized chunks. An "infinite" mapping can be used to fall back to
regular TCP by mapping the subflow-level data to the connection-level regular TCP by mapping the subflow-level data to the connection-level
data for the remainder of the connection (see Section 3.6). This is data for the remainder of the connection (see Section 3.6). This is
achieved by setting the data-level length field to the reserved value achieved by setting the data-level length field to the reserved value
of 0. of 0.
3.3.2. Data Acknowledgements 3.3.2. Data Acknowledgements
To provide full end-to-end resilience, MPTCP provides a connection- To provide full end-to-end resilience, MPTCP provides a connection-
level acknowledgement, the DATA_ACK, illustrated in Figure 7, to act level acknowledgement, the DATA_ACK, illustrated in Figure 8, to act
as a cumulative ACK for the connection as a whole. This is analogous as a cumulative ACK for the connection as a whole. This is analogous
to the behaviour of the standard TCP cumulative ACK in TCP SACK - to the behaviour of the standard TCP cumulative ACK in TCP SACK -
indicating how much data has been successfully received (with no indicating how much data has been successfully received (with no
holes). holes).
The rationale for the inclusion of the DATA_ACK includes the The rationale for the inclusion of the DATA_ACK includes the
existence of certain middleboxes that pro-actively ACK packets, and existence of certain middleboxes that pro-actively ACK packets, and
thus might cause deadlock conditions if data were acked at the thus might cause deadlock conditions if data were acked at the
subflow level but then failed to reach the receiver. This sort of bad subflow level but then failed to reach the receiver. This sort of bad
interaction might be especially prevalent when the receiver is interaction might be especially prevalent when the receiver is
skipping to change at page 15, line 49 skipping to change at page 18, line 26
transfer is unidirectional. transfer is unidirectional.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
| Kind=DATA_ACK | Length | Data Sequence Number ... : | Kind=DATA_ACK | Length | Data Sequence Number ... :
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
: ... ( (length-2) octets ) | : ... ( (length-2) octets ) |
+-------------------------------+ +-------------------------------+
Figure 7: Connection-level Acknowledgement (DATA_ACK) Figure 8: Connection-level Acknowledgement (DATA_ACK)
3.3.3. Receiver Considerations 3.3.3. Receiver Considerations
Regular TCP advertises a receive window in each packet, telling the Regular TCP advertises a receive window in each packet, telling the
sender how much data the receiver is willing to accept past the sender how much data the receiver is willing to accept past the
cumulative ack. The receive window is used to implement flow cumulative ack. The receive window is used to implement flow
control, throttling down fast senders when receivers cannot keep up. control, throttling down fast senders when receivers cannot keep up.
MPTCP also uses a unique receive window, shared between the subflows. MPTCP also uses a unique receive window, shared between the subflows.
The idea is to allow any subflow to send data as long as the receiver The idea is to allow any subflow to send data as long as the receiver
skipping to change at page 19, line 38 skipping to change at page 22, line 15
For instance, a possibility is an 'all-or-nothing' approach, i.e. For instance, a possibility is an 'all-or-nothing' approach, i.e.
have a second path ready for use in the event of failure of the first have a second path ready for use in the event of failure of the first
path, but alternatives could include entirely saturating one path path, but alternatives could include entirely saturating one path
before using an additional path (the 'overflow' case). Such choices before using an additional path (the 'overflow' case). Such choices
would be most likely based on the monetary cost of links, but may would be most likely based on the monetary cost of links, but may
also be based on properties such as the delay or jitter of links, also be based on properties such as the delay or jitter of links,
where stability is more important than throughput. Application where stability is more important than throughput. Application
requirements such as these are discussed in detail in [5]. requirements such as these are discussed in detail in [5].
The ability to make effective choices at the sender requires full The ability to make effective choices at the sender requires full
knowledge of the path "cost", which is unlikely to be the case. knowledge of the path "cost", which is unlikely to be the case. It
There is no mechanism in MPTCP for a receiver to signal their own would be desirable for a receiver to be able to signal their own
particular preferences for paths, but this is a necessary feature preferences for paths, since they will often be the multihomed party,
since receivers will often be the multihomed party, and may have to and may have to pay for metered incoming bandwidth.
pay for metered incoming bandwidth. Instead of incorporating complex
signalling, it is proposed to use existing TCP features to signal
priority implicitly. If a receiver wishes to keep a path active as a
backup but wishes to prevent data being sent on that path, it could
stop sending ACKs for any data it receives on that path. The sender
would interpret this as severe congestion or a broken path and stop
using it. We do not advocate this method, however, since this will
result in unnecessary retransmissions.
Therefore, a proposal is to use ECN [8] to provide fake congestion Whilst fine-grained control may be the most powerful solution, that
signals on paths that a receiver wishes to stop being used for data. would require some mechanism such as overloading the ECN signal [8],
This has the benefit of causing the sender to back off without the which is undesirable, and it is felt that there would not be
need to retransmit data unnecessarily, as in the case of a lost ACK. sufficient benefit to justify an entirely new signal. Therefore the
This should be sufficient to allow a receiver to express their MP_JOIN (Section 3.2) and ADD_ADDR (Section 3.5) options contain the 'B'
policy, although does not permit a rapid increase in throughput when bit, which allows a host to indicate to its peer that this path
switching to such a path. should be treated as a backup path to use only in the event of
failure of other working subflows (i.e. a subflow where the receiver
has indicated B=1 SHOULD NOT be used to send data unless there are no
usable subflows where B=0).
TBD: This is clearly an overload of the ECN signal, and as such other In the event that the available set of paths changes, a host may wish
solutions, such as explicitly signalling path operation preferences to signal a change in priority of subflows to the peer. Therefore,
(such as in the reserved bits of certain TCP options, or through the MP_PRIO option, shown in Figure 9, can be used to change the 'B'
entirely new options) may be a preferred solution. flag of the subflow on which it is sent.
1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------+-+
| Kind=MP_PRIO | Length=3 | (reserved) |B|
+---------------+---------------+-------------+-+
Figure 9: MP_PRIO option
It should be noted that the backup flag is a request from the
receiver to the sender only, and the sender SHOULD adhere to these
requests. The receiver, however, may continue using the subflow to
send data even if it has signalled B=1 to the other host.
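The backup semantics introduced in -02 can be illustrated with a small sender-side sketch. It is only one possible reading of the SHOULD NOT rule above: the class and function names are hypothetical, and the position of 'B' as the least significant bit of the flags octet is taken from Figure 9.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Subflow:
    name: str
    usable: bool
    backup: bool  # True if the peer signalled B=1 for this subflow


def eligible_subflows(subflows: List[Subflow]) -> List[Subflow]:
    """Subflows the sender may use for data under the backup rule."""
    usable = [s for s in subflows if s.usable]
    regular = [s for s in usable if not s.backup]
    # Backup subflows carry data only when no usable B=0 subflow remains.
    return regular if regular else usable


def apply_mp_prio(subflow: Subflow, flags: int) -> None:
    """Update the backup flag of the subflow on which MP_PRIO arrived."""
    subflow.backup = bool(flags & 0x01)  # 'B' is the last bit of the octet


wifi = Subflow("wifi", usable=True, backup=False)
cell = Subflow("cellular", usable=True, backup=True)
before = [s.name for s in eligible_subflows([wifi, cell])]
wifi.usable = False  # the only B=0 subflow fails
after = [s.name for s in eligible_subflows([wifi, cell])]
```

While the regular subflow is usable, only it is selected; once it fails, the backup subflow becomes eligible.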
3.4. Closing a Connection 3.4. Closing a Connection
In regular TCP a FIN announces to the receiver that the sender has no In regular TCP a FIN announces to the receiver that the sender has no
more data to send. In order to allow subflows to operate more data to send. In order to allow subflows to operate
independently and to keep the appearance of TCP over the wire, a FIN independently and to keep the appearance of TCP over the wire, a FIN
in MPTCP only affects the subflow on which it is sent. This allows in MPTCP only affects the subflow on which it is sent. This allows
nodes to exercise considerable freedom over which paths are in use at nodes to exercise considerable freedom over which paths are in use at
any one time. The semantics of a FIN remain as for regular TCP, i.e. any one time. The semantics of a FIN remain as for regular TCP, i.e.
it is not until both sides have ACKed each other's FINs that the it is not until both sides have ACKed each other's FINs that the
subflow is fully closed. subflow is fully closed.
When an application calls close() on a socket, this indicates that it When an application calls close() on a socket, this indicates that it
has no more data to send, and for regular TCP this would result in a has no more data to send, and for regular TCP this would result in a
FIN on the connection. For MPTCP, an equivalent mechanism is needed, FIN on the connection. For MPTCP, an equivalent mechanism is needed,
and this is the DATA_FIN. This option, shown in Figure 8, is and this is the DATA_FIN. This option, shown in Figure 10, is
attached to a regular FIN option on a subflow. attached to a regular FIN option on a subflow.
A DATA_FIN is an indication that the sender has no more data to send, A DATA_FIN is an indication that the sender has no more data to send,
and as such can be used as a rapid indication of the end of data from and as such can be used as a rapid indication of the end of data from
a sender. A DATA_FIN, as with the FIN on a regular TCP connection, a sender. A DATA_FIN, as with the FIN on a regular TCP connection,
is a unidirectional signal. is a unidirectional signal.
A DATA_FIN occupies one octet (the final octet) of Data Sequence A DATA_FIN occupies one octet (the final octet) of Data Sequence
Number space. This number is included in the option, and will be Number space. This number is included in the option, and will be
ACKed at data level to ensure reliable delivery. ACKed at data level to ensure reliable delivery.
skipping to change at page 21, line 38 skipping to change at page 24, line 21
1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
| Kind=DATA_FIN | Length=10 | Data Sequence Number (8B) : | Kind=DATA_FIN | Length=10 | Data Sequence Number (8B) :
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
: Data Sequence Number (contd.) : : Data Sequence Number (contd.) :
+-------------------------------+------------------------------+ +-------------------------------+------------------------------+
: Data Sequence Number (contd.)| : Data Sequence Number (contd.)|
+-------------------------------+ +-------------------------------+
Figure 8: DATA_FIN option Figure 10: DATA_FIN option
3.5. Address Knowledge Exchange (Path Management) 3.5. Address Knowledge Exchange (Path Management)
We use the term "path management" to refer to the exchange of We use the term "path management" to refer to the exchange of
information about additional paths between endpoints, which in this information about additional paths between endpoints, which in this
design is managed by multiple addresses at endpoints. For more design is managed by multiple addresses at endpoints. For more
detail of the architectural thinking behind this design, see the detail of the architectural thinking behind this design, see the
separate architecture document [3]. separate architecture document [3].
This design makes use of two methods of sharing such information, This design makes use of two methods of sharing such information,
skipping to change at page 22, line 16 skipping to change at page 24, line 47
complementary: the first is implicit and simple, while the explicit complementary: the first is implicit and simple, while the explicit
is more complex but is more robust. Together, the mechanisms allow is more complex but is more robust. Together, the mechanisms allow
addresses to change in flight (and thus support operation through addresses to change in flight (and thus support operation through
NATs, since the source address need not be known), and also allow the NATs, since the source address need not be known), and also allow the
signalling of previously unknown addresses, and of addresses signalling of previously unknown addresses, and of addresses
belonging to other address families (e.g. IPv4 and IPv6). belonging to other address families (e.g. IPv4 and IPv6).
Here is an example of typical operation of the protocol: Here is an example of typical operation of the protocol:
o A1 of host A and address/port B1 of host B. If host A is o A1 of host A and address/port B1 of host B. If host A is
multihomed, it can start an additional subflow from its address A2 multihomed and multi-addressed, it can start an additional subflow
to B1, by sending a SYN with a Join option from A2 to B1, using from its address A2 to B1, by sending a SYN with a Join option
B's previously declared token for this connection. Alternatively, from A2 to B1, using B's previously declared token for this
if B is multihomed, it can try to set up a new subflow from B2 to connection. Alternatively, if B is multihomed, it can try to set
A1, using A's previously declared token. In either case, the SYN up a new subflow from B2 to A1, using A's previously declared
will be sent to the port already in use for the original subflow token. In either case, the SYN will be sent to the port already
on the receiving host. in use for the original subflow on the receiving host.
o Simultaneously (or after a timeout), an ADD_ADDR option o Simultaneously (or after a timeout), an ADD_ADDR option
(Section 3.5.1) is sent on an existing subflow, informing the (Section 3.5.1) is sent on an existing subflow, informing the
receiver of the sender's alternative address(es). The recipient receiver of the sender's alternative address(es). The recipient
can use this information to open a new subflow to the sender's can use this information to open a new subflow to the sender's
additional address. In our example, A will send ADD_ADDR option additional address. In our example, A will send ADD_ADDR option
informing B of address A2. The mix of using the SYN-based option informing B of address A2. The mix of using the SYN-based option
and the ADD_ADDR option, including timeouts, is implementation- and the ADD_ADDR option, including timeouts, is implementation-
specific and can be tailored to agree with local policy. specific and can be tailored to agree with local policy.
skipping to change at page 22, line 51 skipping to change at page 25, line 35
gained if a host ensures there is a correlated ADD_ADDR option gained if a host ensures there is a correlated ADD_ADDR option
before responding to the SYN. before responding to the SYN.
Other ways of using the two signaling mechanisms are possible; for Other ways of using the two signaling mechanisms are possible; for
instance, signaling addresses in other address families can only be instance, signaling addresses in other address families can only be
done explicitly using the Add Address option. done explicitly using the Add Address option.
3.5.1. Address Advertisement 3.5.1. Address Advertisement
The Add Address (ADD_ADDR) TCP Option announces additional addresses The Add Address (ADD_ADDR) TCP Option announces additional addresses
on which an endpoint can be reached (Figure 9). It can be used to on which an endpoint can be reached (Figure 11). It can be used to
announce several (ID, address) pairs to the other announce several (ID, address) pairs to the other
endpoint. Multiple addresses can be added in a single message if endpoint. Multiple addresses can be added in a single message if
there is sufficient TCP option space, otherwise multiple TCP messages there is sufficient TCP option space, otherwise multiple TCP messages
containing this option will be sent. This option can be used at any containing this option will be sent. This option can be used at any
time during a connection, depending on when the sender wishes to time during a connection, depending on when the sender wishes to
enable multiple paths and/or when paths become available. enable multiple paths and/or when paths become available.
Every address has an ID which can be used for address removal, and Every address has an ID which can be used for address removal, and
therefore endpoints must cache the mapping between ID and address. therefore endpoints must cache the mapping between ID and address.
This is also used to identify Join Connection options (Section 3.2) This is also used to identify Join Connection options (Section 3.2)
relating to the same address, even when address translators are in relating to the same address, even when address translators are in
use. The ID must be unique to the sender and connection, per use. The ID must uniquely identify the address to the sender (within
address, but its mechanism for allocating such IDs is implementation- the connection), but its mechanism for allocating such IDs is
specific. implementation-specific.
This option is shown for IPv4. For IPv6, the IPVer field will read This option is shown for IPv4. For IPv6, the IPVer field will read
6, and the length of the address will be 16 octets (instead of 4), 6, and the length of the address will be 16 octets (instead of 4),
and the length of the option will be 2 + (18 * number_of_entries). and the length of the option will be 2 + (18 * number_of_entries).
If there is sufficient TCP option space, multiple addresses can be If there is sufficient TCP option space, multiple addresses can be
included, with an ID following on immediately from the previous included, with an ID following on immediately from the previous
address. The number of addresses can be deduced from the option address. The number of addresses can be deduced from the option
length and version fields. length and version fields.
The 'P' bit is used to indicate the presence of an additional two The 'P' bit is used to indicate the presence of an additional two
octets specifying the port number to use. Although it is expected octets specifying the port number to use. Although it is expected
that the majority of use cases will use the same port pairs as used that the majority of use cases will use the same port pairs as used
for the initial subflow (e.g. port 80 remains port 80 on all for the initial subflow (e.g. port 80 remains port 80 on all
subflows, as does the ephemeral port at the client), there may be subflows, as does the ephemeral port at the client), there may be
cases (such as port-based load balancing) where the explicit cases (such as port-based load balancing) where the explicit
specification of a different port is required. If the P bit is not specification of a different port is required. If the P bit is not
specified, MPTCP MUST attempt to connect to the specified address on specified, MPTCP MUST attempt to connect to the specified address on
the same port as is already in use by the signalling subflow. the same port as is already in use by the signalling subflow.
[TBD: We could make use of an additional flag, as follows. Exact The 'B' bit is used to indicate that this specified address (and
behaviour to be worked out: The 'B' bit is used to indicate that this port, if applicable) should be treated as a backup subflow to use
specified address (and port, if applicable) should be treated as a only in the event of failure of other working subflows. A receiver
backup subflow to use only in the event of failure of other working of this option SHOULD set up a TCP subflow to the specified address
subflows. A receiver of this option SHOULD set up a TCP subflow to and port, but SHOULD NOT send data on it until the other paths have
the specified address and port, but SHOULD NOT send data on it until failed.
the other paths have failed.]
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+-------+-------+ +---------------+---------------+---------------+-------+---+-+-+
| Kind=ADD_ADDR | Length | Address ID | IPVer |(res)|P| | Kind=ADD_ADDR | Length | Address ID | IPVer | |B|P|
+---------------+---------------+---------------+-------+-------+ +---------------+---------------+---------------+-------+---+-+-+
| Address (IPv4 - 4 octets / IPv6 - 16 octets) | | Address (IPv4 - 4 octets / IPv6 - 16 octets) |
+-------------------------------+-------------------------------+ +-------------------------------+-------------------------------+
| Port (2 octets if P=1) | ... | Port (2 octets if P=1) | ...
+-------------------------------+ +-------------------------------+
( ... further ID/Version/Address/Port fields as required ... ) ( ... further ID/Version/Address/Port fields as required ... )
Figure 9: Add Address (ADD_ADDR) option (shown for IPv4) Figure 11: Add Address (ADD_ADDR) option (shown for IPv4)
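The layout in the figure above can be walked entry by entry. The following is a minimal parsing sketch, not a reference implementation. It assumes the newer layout (IPVer in the upper four bits of the flags octet, 'B' in bit 1, 'P' in bit 0) and that the Length field counts the Kind and Length octets, as implied by the 2 + (18 * number_of_entries) formula. ADD_ADDR_KIND and the AddrEntry type are hypothetical names chosen for illustration; the draft does not give a numeric Kind value here.

```python
import ipaddress
import struct
from typing import List, NamedTuple, Optional, Union

ADD_ADDR_KIND = 0xFE  # hypothetical placeholder, not a value from the draft


class AddrEntry(NamedTuple):
    addr_id: int
    address: Union[ipaddress.IPv4Address, ipaddress.IPv6Address]
    port: Optional[int]  # None when the P bit is clear
    backup: bool         # True when the B bit is set


def parse_add_addr(option: bytes) -> List[AddrEntry]:
    """Parse one ADD_ADDR option into its (ID, address, port, backup) entries."""
    if len(option) < 2:
        raise ValueError("option too short")
    kind, length = option[0], option[1]
    if kind != ADD_ADDR_KIND or length != len(option):
        raise ValueError("not a well-formed ADD_ADDR option")
    entries = []
    pos = 2
    while pos < length:
        if pos + 2 > length:
            raise ValueError("truncated entry header")
        addr_id, ver_flags = option[pos], option[pos + 1]
        pos += 2
        version = ver_flags >> 4
        backup = bool(ver_flags & 0x02)
        has_port = bool(ver_flags & 0x01)
        if version == 4:
            addr_len = 4
        elif version == 6:
            addr_len = 16
        else:
            raise ValueError("unknown IPVer %d" % version)
        if pos + addr_len + (2 if has_port else 0) > length:
            raise ValueError("truncated address or port")
        address = ipaddress.ip_address(option[pos:pos + addr_len])
        pos += addr_len
        port = None
        if has_port:
            (port,) = struct.unpack_from("!H", option, pos)
            pos += 2
        entries.append(AddrEntry(addr_id, address, port, backup))
    return entries
```

Because the port is optional per entry, the sketch derives the number of entries by walking the option rather than by dividing the length.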
Due to the proliferation of NATs, it is reasonably likely that one Due to the proliferation of NATs, it is reasonably likely that one
endpoint may attempt to advertise private addresses [9]. We do not endpoint may attempt to advertise private addresses [9]. We do not
wish to blanket prohibit this, since there may be cases where both wish to blanket prohibit this, since there may be cases where both
endpoints have additional interfaces on the same private network. We endpoints have additional interfaces on the same private network. We
must ensure, however, that such advertisements do not cause harm. must ensure, however, that such advertisements do not cause harm.
The standard mechanism to create a new subflow (Section 3.2) contains The standard mechanism to create a new subflow (Section 3.2) contains
a randomly-generated 32-bit token that uniquely identifies the a 32-bit token that uniquely identifies the connection to the
connection to the receiving endpoint. If the token is unknown, the receiving endpoint. If the token is unknown, the endpoint will
endpoint will return with a RST. If the token is known, subflow return with a RST. If the token is known, subflow setup will
setup will continue, but the sender's token will be sent back. In continue, but the sender's token will be sent back. In order for a
order for a new subflow to be set up, both tokens must match what each new subflow to be set up, both tokens must match what each endpoint
endpoint expects. This will provide sufficient protection against expects. This will be further followed by the HMAC exchange for
two unconnected endpoints accidentally setting up a new subflow upon authentication. This will provide sufficient protection against two
the signal of a private address (furthermore, the mismatch in Data unconnected endpoints accidentally setting up a new subflow upon the
Sequence Number that would occur would provide even further signal of a private address.
protection).
Ideally, we'd like to ensure the ADD_ADDR (and REMOVE_ADDR) option is Ideally, we'd like to ensure the ADD_ADDR (and REMOVE_ADDR) option is
sent reliably and in order to the other end. This is to ensure that sent reliably and in order to the other end. This is to ensure that
we don't close the connection when remove/add addresses are processed we don't close the connection when remove/add addresses are processed
in reverse order, and to ensure that all possible paths are used. We in reverse order, and to ensure that all possible paths are used. We
note, however, that losing reliability and ordering will not break note, however, that losing reliability and ordering will not break
the multipath connections; it will just reduce the opportunity to the multipath connections; it will just reduce the opportunity to
open multipath paths and to survive different patterns of path open multipath paths and to survive different patterns of path
failures. failures.
skipping to change at page 25, line 34 skipping to change at page 28, line 8
option on all available subflows. option on all available subflows.
3.5.2. Remove Address 3.5.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously-announced If, during the lifetime of an MPTCP connection, a previously-announced
address becomes invalid (e.g. if the interface disappears), the address becomes invalid (e.g. if the interface disappears), the
affected endpoint should announce this so that the other endpoint can affected endpoint should announce this so that the other endpoint can
remove subflows related to this address. remove subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 10), which will remove a previously-added address (or list of (Figure 12), which will remove a previously-added address (or list of
addresses) from a connection and terminate any subflows currently addresses) from a connection and terminate any subflows currently
using that address. using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR should first trigger instigates closure. The receipt of REMOVE_ADDR should first trigger
the sending of a TCP Keepalive [10] on the path, and if a response is the sending of a TCP Keepalive [10] on the path, and if a response is
received the path is not removed. Typical TCP validity tests on the received the path is not removed. Typical TCP validity tests on the
subflow (e.g. ensuring sequence and ack numbers are correct) MUST subflow (e.g. ensuring sequence and ack numbers are correct) MUST
also be undertaken. also be undertaken.
skipping to change at page 26, line 17 skipping to change at page 28, line 40
The standard way to close a subflow (so long as it is still The standard way to close a subflow (so long as it is still
functioning) is to use a FIN exchange as in regular TCP - for more functioning) is to use a FIN exchange as in regular TCP - for more
information, see Section 3.4. information, see Section 3.4.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+ +---------------+---------------+---------------+
|Kind=REMOVEADDR| Length = 2+n | Address ID | ... |Kind=REMOVEADDR| Length = 2+n | Address ID | ...
+---------------+---------------+---------------+ +---------------+---------------+---------------+
Figure 10: Remove Address (REMOVE_ADDR) option Figure 12: Remove Address (REMOVE_ADDR) option
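As an illustration of the figure above, the following sketch builds and parses a REMOVE_ADDR option and looks up the affected addresses in the cached ID-to-address mapping. It is a sketch under stated assumptions: REMOVE_ADDR_KIND is a hypothetical placeholder value, and the helper names are ours. Unknown IDs are skipped silently, and the returned addresses are only candidates, since the text above requires a keepalive check before any path is actually removed.

```python
from typing import Dict, Iterable, List

REMOVE_ADDR_KIND = 0xFD  # hypothetical placeholder, not a value from the draft


def build_remove_addr(addr_ids: Iterable[int]) -> bytes:
    """Encode Kind, Length = 2+n, then n one-octet Address IDs."""
    ids = bytes(addr_ids)  # raises ValueError if an ID does not fit in one octet
    if not ids:
        raise ValueError("at least one Address ID is required")
    return bytes([REMOVE_ADDR_KIND, 2 + len(ids)]) + ids


def parse_remove_addr(option: bytes) -> List[int]:
    """Return the list of Address IDs carried in the option."""
    if len(option) < 3 or option[0] != REMOVE_ADDR_KIND or option[1] != len(option):
        raise ValueError("not a well-formed REMOVE_ADDR option")
    return list(option[2:])


def removal_candidates(id_to_addr: Dict[int, str], addr_ids: Iterable[int]) -> List[str]:
    """Addresses to probe with a keepalive; unknown IDs are silently ignored."""
    return [id_to_addr[i] for i in addr_ids if i in id_to_addr]
```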
3.6. Fallback 3.6. Fallback
At the start of an MPTCP connection (i.e. the first subflow), it is At the start of an MPTCP connection (i.e. the first subflow), it is
important to ensure that the path is fully MPTCP-capable and the important to ensure that the path is fully MPTCP-capable and the
necessary TCP options can reach each endpoint. The handshake as necessary TCP options can reach each endpoint. The handshake as
described in Section 3.1 will fall back to regular TCP if either of described in Section 3.1 will fall back to regular TCP if either of
the SYN messages do not have the MPTCP options: this is the same, and the SYN messages do not have the MPTCP options: this is the same, and
desired, behaviour in the case where an endpoint is not MPTCP desired, behaviour in the case where an endpoint is not MPTCP
capable, or the path does not support the MPTCP options. When capable, or the path does not support the MPTCP options. When
skipping to change at page 27, line 15 skipping to change at page 29, line 38
subflow, it should be treated as a standard path failure. The data subflow, it should be treated as a standard path failure. The data
would not be DATA_ACKed (since there is no mapping for the data), and would not be DATA_ACKed (since there is no mapping for the data), and
the subflow can be closed with an RST. the subflow can be closed with an RST.
The case described above is a specialised case of fallback. More The case described above is a specialised case of fallback. More
generally, fallback to regular TCP can become necessary at any point generally, fallback to regular TCP can become necessary at any point
during a connection if a non-MPTCP-aware middlebox changes the data during a connection if a non-MPTCP-aware middlebox changes the data
stream. stream.
As described in Section 3.3, each portion of data for which there is As described in Section 3.3, each portion of data for which there is
a mapping is protected by a CRC-32 checksum. This mechanism is used a mapping is protected by a checksum. This mechanism is used to
to detect if middleboxes have made any adjustments to the payload detect if middleboxes have made any adjustments to the payload
(added, removed, or changed data). A checksum will fail if the data (added, removed, or changed data). A checksum will fail if the data
has been changed in any way. This will also detect if the length of has been changed in any way. This will also detect if the length of
data on the subflow is increased or decreased, and this means the data on the subflow is increased or decreased, and this means the
Data Sequence Mapping is no longer valid. The sender no longer knows Data Sequence Mapping is no longer valid. The sender no longer knows
what subflow-level sequence number the receiver is genuinely what subflow-level sequence number the receiver is genuinely
operating at (the middlebox will be faking ACKs in return), and operating at (the middlebox will be faking ACKs in return), and
cannot signal any further mappings. Furthermore, in addition to the cannot signal any further mappings. Furthermore, in addition to the
possibility of payload modifications that are valid at the possibility of payload modifications that are valid at the
application layer, there is the possibility that false-positives application layer, there is the possibility that false-positives
could be hit across segment boundaries, corrupting the data. could be hit across segment boundaries, corrupting the data.
Therefore, all data from the segment that failed the checksum onwards Therefore, all data from the start of the segment that failed the
is not trustworthy. checksum onwards is not trustworthy.
When multiple subflows are in use, the data in-flight on a subflow When multiple subflows are in use, the data in-flight on a subflow
will likely involve data that is not contiguously part of the will likely involve data that is not contiguously part of the
connection-level stream, since segments will be spread across the connection-level stream, since segments will be spread across the
multiple subflows. Due to the problems identified above, it is not multiple subflows. Due to the problems identified above, it is not
possible to determine what the adjustment has done to the data possible to determine what the adjustment has done to the data
(notably, any changes to the subflow sequence numbering). Therefore, (notably, any changes to the subflow sequence numbering). Therefore,
it is not possible to recover the subflow, and the affected subflow it is not possible to recover the subflow, and the affected subflow
must be immediately closed with an RST, featuring a "checksum failed" must be immediately closed with an RST, featuring a "checksum failed"
option, which defines the Data Sequence Number at the start of the option, which defines the Data Sequence Number at the start of the
segment (defined by the Data Sequence Mapping) which had the checksum segment (defined by the Data Sequence Mapping) which had the checksum
failure (see Figure 11). failure (see Figure 13).
1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+--------------+ +---------------+---------------+---------------+--------------+
| Kind=MP_FAIL | Length=10 | Data Sequence Number (8B) : | Kind=MP_FAIL | Length=10 | Data Sequence Number (8B) :
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
: Data Sequence Number (contd.) : : Data Sequence Number (contd.) :
+-------------------------------+------------------------------+ +-------------------------------+------------------------------+
: Data Sequence Number (contd.)| : Data Sequence Number (contd.)|
+-------------------------------+ +-------------------------------+
Figure 11: Fallback (MP_FAIL) option
Figure 13: Fallback (MP_FAIL) option
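The fixed-size layout above (Kind, Length=10, and an 8-octet Data Sequence Number) maps directly onto a packed structure. A minimal sketch follows, assuming network byte order; MP_FAIL_KIND is a hypothetical placeholder, since no numeric Kind value is given here.

```python
import struct

MP_FAIL_KIND = 0xFC  # hypothetical placeholder, not a value from the draft
MP_FAIL_FORMAT = "!BBQ"  # Kind, Length, 64-bit Data Sequence Number


def build_mp_fail(data_sequence_number: int) -> bytes:
    """Encode the DSN at the start of the segment that failed the checksum."""
    return struct.pack(MP_FAIL_FORMAT, MP_FAIL_KIND, 10, data_sequence_number)


def parse_mp_fail(option: bytes) -> int:
    """Return the Data Sequence Number carried in an MP_FAIL option."""
    kind, length, dsn = struct.unpack(MP_FAIL_FORMAT, option)
    if kind != MP_FAIL_KIND or length != 10:
        raise ValueError("not a well-formed MP_FAIL option")
    return dsn
```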
TBD: In this case, is there any point in signalling Checksum Failed, TBD: In this case, is there any point in signalling Checksum Failed,
or could we just RST the subflow? The signal would allow the sender or could we just RST the subflow? The signal would allow the sender
to know there is something wrong with the path and not try to re- to know there is something wrong with the path and not try to re-
establish the subflow (if that was otherwise the policy). establish the subflow (if that was otherwise the policy).
Failed data will not be DATA_ACKed and so will be re-transmitted on Failed data will not be DATA_ACKed and so will be re-transmitted on
other subflows (Section 3.3.4). other subflows (Section 3.3.4).
A special case is when there is a single subflow and it fails with a A special case is when there is a single subflow and it fails with a
skipping to change at page 28, line 33 skipping to change at page 31, line 8
Section 3.3) from the data sequence number of the segment that failed Section 3.3) from the data sequence number of the segment that failed
the checksum. This connection will then continue to appear as a the checksum. This connection will then continue to appear as a
regular TCP session, and a middlebox may change the payload without regular TCP session, and a middlebox may change the payload without
causing unintentional harm. causing unintentional harm.
An optimisation is possible, however. If it is known that all An optimisation is possible, however. If it is known that all
unacknowledged data in flight is contiguous, an infinite mapping unacknowledged data in flight is contiguous, an infinite mapping
could be applied to the subflow without the need to close it first, could be applied to the subflow without the need to close it first,
and essentially turn off all further MPTCP signalling. In this case, and essentially turn off all further MPTCP signalling. In this case,
if a receiver identifies a checksum failure when there is only one if a receiver identifies a checksum failure when there is only one
path, it will send back an OPT_FAIL on the subflow-level ACK. The path, it will send back an MP_FAIL option on the subflow-level ACK.
sender will receive this, and if all unacknowledged data in flight is The sender will receive this, and if all unacknowledged data in
contiguous, will signal an infinite mapping (if the data is not flight is contiguous, will signal an infinite mapping (if the data is
contiguous, the sender MUST send an RST). This infinite mapping will not contiguous, the sender MUST send an RST). This infinite mapping
be a Data Sequence Mapping option on the first new packet, but it will be a Data Sequence Mapping option on the first new packet, but
acts retroactively, referring to the start of the subflow sequence it acts retroactively, referring to the start of the subflow sequence
number of the last segment that was known to be delivered intact. number of the last segment that was known to be delivered intact.
From that point onwards data can be altered by a middlebox without From that point onwards data can be altered by a middlebox without
affecting MPTCP, as the data stream is equivalent to a regular, affecting MPTCP, as the data stream is equivalent to a regular,
legacy TCP session. legacy TCP session.
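The decisions described above can be summarised in a short sketch. It models only the choice of action, not the packets themselves, and assumes the optimisation for the single-subflow case is in use. The function and enum names are ours, and the contiguity check ignores wrap-around of the data sequence space for simplicity.

```python
from enum import Enum
from typing import List, Tuple


class Action(Enum):
    RST_WITH_MP_FAIL = "close the subflow with an RST carrying MP_FAIL"
    ACK_WITH_MP_FAIL = "send MP_FAIL on the subflow-level ACK"
    INFINITE_MAPPING = "signal an infinite mapping"
    RST = "send an RST"


def receiver_on_checksum_failure(subflow_count: int) -> Action:
    """Receiver side: what to do when a mapping fails its checksum."""
    if subflow_count < 1:
        raise ValueError("a connection has at least one subflow")
    if subflow_count > 1:
        return Action.RST_WITH_MP_FAIL
    return Action.ACK_WITH_MP_FAIL


def in_flight_is_contiguous(segments: List[Tuple[int, int]]) -> bool:
    """segments holds (data sequence number, length) for unacknowledged data."""
    ordered = sorted(segments)
    return all(dsn + length == next_dsn
               for (dsn, length), (next_dsn, _) in zip(ordered, ordered[1:]))


def sender_on_mp_fail(segments: List[Tuple[int, int]]) -> Action:
    """Sender side: reaction to MP_FAIL received on the only subflow."""
    if in_flight_is_contiguous(segments):
        return Action.INFINITE_MAPPING
    return Action.RST
```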
After a sender signals an infinite mapping it MUST only use subflow After a sender signals an infinite mapping it MUST only use subflow
ACKs to clear its send buffer. This is because data ACKs may become ACKs to clear its send buffer. This is because Data ACKs may become
misaligned with the subflow ACKs when middleboxes insert or delete misaligned with the subflow ACKs when middleboxes insert or delete
data. The receiver SHOULD stop generating Data ACKs after it receives data. The receiver SHOULD stop generating Data ACKs after it receives
an infinite mapping. an infinite mapping.
When a connection is in fallback mode, only one subflow can send data When a connection is in fallback mode, only one subflow can send data
at a time. Otherwise, the receiver would not know how to reorder the at a time. Otherwise, the receiver would not know how to reorder the
data. However, subflows can be opened and close as necessary, as data. However, subflows can be opened and closed as necessary, as
long as a single one is active at any point. long as a single one is active at any point.
It should be emphasised that we are not attempting to prevent the use It should be emphasised that we are not attempting to prevent the use
of middleboxes that want to adjust the payload. An MPTCP-aware of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox to provide such functionality could be designed that would middlebox to provide such functionality could be designed that would
re-write checksums if needed, and additionally would be able to parse re-write checksums if needed, and additionally would be able to parse
the data sequence mappings, and thus not hit false positives through the data sequence mappings, and thus not hit false positives through
not knowing where data boundaries lie. not knowing where data boundaries lie.
3.7. Error Handling 3.7. Error Handling
skipping to change at page 29, line 29 skipping to change at page 32, line 5
of an RST - has already been covered in Section 4. Where possible, of an RST - has already been covered in Section 4. Where possible,
we do not want to deviate from regular TCP behaviour. we do not want to deviate from regular TCP behaviour.
The following list covers possible errors and the appropriate MPTCP The following list covers possible errors and the appropriate MPTCP
behaviour: behaviour:
o Unknown token in MP_JOIN (or token mismatch in MP_JOIN ACK, or o Unknown token in MP_JOIN (or token mismatch in MP_JOIN ACK, or
missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
behaviour on an unknown port) behaviour on an unknown port)
o (TBD: If we include DSN in MP_JOIN, and the DSN is out of the
window but the token is valid, do we still send an RST?)
o DSN out of Window (during normal operation): just ignore, however o DSN out of Window (during normal operation): just ignore, however
if at the beginning of a new subflow we might want to RST it as a if at the beginning of a new subflow we might want to RST it as a
security mechanism security mechanism
o Remove request for unknown address ID: silently ignore o Remove request for unknown address ID: silently ignore
o DATA_ACK for data not yet sent: abort connection by RST on every
subflow.
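The list above can be read as a small policy table. The sketch below records it as such; the event and response names are ours, and the optional RST for an out-of-window DSN at the beginning of a new subflow is deliberately left out, since that behaviour is not yet settled in the text.

```python
from enum import Enum


class Event(Enum):
    UNKNOWN_JOIN_TOKEN = "unknown or mismatched token in MP_JOIN"
    DSN_OUT_OF_WINDOW = "DSN out of window during normal operation"
    REMOVE_UNKNOWN_ADDRESS_ID = "remove request for unknown address ID"
    DATA_ACK_FOR_UNSENT_DATA = "DATA_ACK for data not yet sent"


class Response(Enum):
    RST_SUBFLOW = "send RST on the subflow"
    IGNORE = "silently ignore"
    RST_ALL_SUBFLOWS = "abort the connection by RST on every subflow"


# Mapping of each listed error to the behaviour given in the list above.
ERROR_POLICY = {
    Event.UNKNOWN_JOIN_TOKEN: Response.RST_SUBFLOW,
    Event.DSN_OUT_OF_WINDOW: Response.IGNORE,
    Event.REMOVE_UNKNOWN_ADDRESS_ID: Response.IGNORE,
    Event.DATA_ACK_FOR_UNSENT_DATA: Response.RST_ALL_SUBFLOWS,
}


def respond(event: Event) -> Response:
    """Look up the MPTCP behaviour for a given error event."""
    return ERROR_POLICY[event]
```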
3.8. Heuristics 3.8. Heuristics
There are a number of heuristics that are needed for performance or There are a number of heuristics that are needed for performance or
deployment but which are not required for protocol correctness. In deployment but which are not required for protocol correctness. In
this section we detail such heuristics. this section we detail such heuristics.
3.8.1. Port Usage 3.8.1. Port Usage
Under typical operation an MPTCP implementation SHOULD use the same Under typical operation an MPTCP implementation SHOULD use the same
ports as already in use. In other words, the destination port of a ports as already in use. In other words, the destination port of a
skipping to change at page 31, line 31 skipping to change at page 34, line 7
5-tuple: The 5-tuple (protocol, local address, local port, remote 5-tuple: The 5-tuple (protocol, local address, local port, remote
address, remote port) presented by kernel APIs to the application address, remote port) presented by kernel APIs to the application
layer in a non-multipath-aware application is that of the first layer in a non-multipath-aware application is that of the first
subflow, even if the subflow has since been closed and removed subflow, even if the subflow has since been closed and removed
from the connection. This decision, and other related API issues, from the connection. This decision, and other related API issues,
are discussed in more detail in [5]. are discussed in more detail in [5].
5. Security Considerations 5. Security Considerations
TBD As identified in [11], the addition of multipath capability to TCP
will bring with it a number of new classes of threat. In order to
(Token generation, handshake mechanisms, new subflow authentication, prevent these, [3] presents a set of requirements for a security
etc...) solution for MPTCP. The fundamental goal is for the security of
MPTCP to be "no worse" than regular TCP today, and the key security
A generic threat analysis for the addition of multipath capabilities requirements are:
to TCP is presented in [11]. The protocol presented here has been
designed to minimise or eliminate these identified threats. (A
future version of this document will explicitly address the presented
threats).
The development of a TCP extension such as this will bring with it
many additional security concerns. We have set out here to produce a
solution that is "no worse" than current TCP, with the possibility
that more secure extensions could be proposed later.
The primary area of concern will be around the handshake to start new o Provide a mechanism to confirm that the parties in a subflow
subflows which join existing connections. The proposal set out in handshake are the same as in the original connection setup.
Section 3.1 and Section 3.2 is for the initiator of the new subflow
to include the token of the other endpoint in the handshake. The
purpose of this is to indicate that the sender of this token was the
same entity that received this token at the initial handshake.
One area of concern is that the token could be simply brute-forced. o Provide verification that the peer can receive traffic at a new
The token must be hard to guess, and as such could be randomly address before using it as part of a connection.
generated. This may still not be strong enough, however, and so the
use of 64 bits for the token would alleviate this somewhat.
The two tokens don't need to be the same length. Token B could be 64 o Provide replay protection, i.e. ensure that a request to add/
bits and token A 32 bits. If MP_JOIN always contains Token B, this remove a subflow is 'fresh'.
would provide adequate security while saving scarce space in the
initial SYN, where it is most at a premium.
Use of these tokens only provide an indication that the token is the In order to achieve these goals, MPTCP includes a hash-based
same as at the initial handshake, and does not say anything about the handshake algorithm documented in Section 3.1 and Section 3.2.
current sender of the token. Therefore, another approach would be to
bring a new measure of freshness in to the handshake, so instead of
using the initial token a sender could request a new token from the
receiver to use in the next handshake. Hash chains could also be
used for this purpose.
Yet another alternative would be for all SYN packets to include a The security of the MPTCP connection hangs on the use of keys that
data sequence number. This could either be used as a passive are shared once at the start of the first subflow, and never again in
identifier to indicate an awareness of the current data sequence the clear. To ease demultiplexing whilst not giving away any
number (although a reasonable window would have to be allowed for cryptographic material, future subflows use a truncated SHA-1 hash of
delays). Or, the SYN could form part of the data sequence space - this key as the connection identification "token". The keys are used
but this would cause issues in the event of lost SYNs (if a new as keys in an HMAC, and this should verify that the parties in the
subflow is never established), thus causing unnecessary delays for handshake are the same as in the original connection setup. It also
retransmissions. provides verification that the peer can receive traffic at this new
address. Replay attacks would still be possible in this scenario,
and therefore the handshakes use single-use random numbers (nonces)
at both ends - this ensures the HMAC will never be the same on two
handshakes. The security mechanism presented in this draft should
therefore protect against all forms of flooding and hijacking attacks
suggested in [11].
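The roles of the key, token, and nonces described above can be illustrated with a short sketch. Several details are assumptions made for illustration only and are not taken from this section: that the token is the most significant 32 bits of the SHA-1 digest, that the HMAC is HMAC-SHA1 keyed with the concatenation of both keys, and that the message is the concatenation of both nonces. The exact construction is defined in Section 3.1 and Section 3.2.

```python
import hashlib
import hmac


def token_from_key(key: bytes) -> bytes:
    """Derive the connection token as a truncated SHA-1 hash of the key.

    Assumption: truncation keeps the first 32 bits of the digest.
    """
    return hashlib.sha1(key).digest()[:4]


def join_mac(local_key: bytes, remote_key: bytes,
             local_nonce: bytes, remote_nonce: bytes) -> bytes:
    """Compute a handshake MAC over both nonces.

    Assumption: key and message layout are illustrative, not normative.
    """
    return hmac.new(local_key + remote_key,
                    local_nonce + remote_nonce,
                    hashlib.sha1).digest()


def verify_join_mac(expected: bytes, received: bytes) -> bool:
    """Compare MACs in constant time."""
    return hmac.compare_digest(expected, received)
```

Because fresh nonces enter the MAC on every handshake, a MAC captured from one handshake does not verify in another, which is the replay protection described above.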
6. Interactions with Middleboxes 6. Interactions with Middleboxes
Multipath TCP was designed to be deployable in the present world. Multipath TCP was designed to be deployable in the present world.
Its design takes into account "reasonable" existing middlebox Its design takes into account "reasonable" existing middlebox
behaviour. In this section we outline a few representative behaviour. In this section we outline a few representative
middlebox-related failure scenarios and show how multipath TCP middlebox-related failure scenarios and show how multipath TCP
handles them. Next, we list the design decisions multipath has made handles them. Next, we list the design decisions multipath has made
to accommodate the different middleboxes. to accommodate the different middleboxes.
skipping to change at page 33, line 10 skipping to change at page 35, line 17
during segment coalescing. during segment coalescing.
MPTCP SYN packets contain the MP_CAPABLE option to indicate the use MPTCP SYN packets contain the MP_CAPABLE option to indicate the use
of MPTCP. When the middlebox drops the packet containing the of MPTCP. When the middlebox drops the packet containing the
MP_CAPABLE option either on the outgoing or the return path, the MP_CAPABLE option either on the outgoing or the return path, the
connection will fail. Host A SHOULD fall back to TCP in such cases connection will fail. Host A SHOULD fall back to TCP in such cases
(studies suggest that few middleboxes drop packets with unknown (studies suggest that few middleboxes drop packets with unknown
options). The same applies for subflow setup. options). The same applies for subflow setup.
The second case is when the middleboxes strip options. Let's first The second case is when the middleboxes strip options. Let's first
discuss behaviour for initial connection SYNs (see Figure 12). If discuss behaviour for initial connection SYNs (see Figure 14). If
the option is stripped from the packet on the outgoing path, the the option is stripped from the packet on the outgoing path, the
connection falls back to regular TCP. If the option is stripped on connection falls back to regular TCP. If the option is stripped on
the return path, host B will wait for a DATA_ACK of its connection the return path, host B will wait for a DATA_ACK of its connection
SYN, retransmitting the SYN/ACK until it declares the connection SYN, retransmitting the SYN/ACK until it declares the connection
failed. Host A thinks it is talking to a regular host, and may send failed. Host A thinks it is talking to a regular host, and may send
data segments, but these will not be acked by host B as they do not data segments, but these will not be acked by host B as they do not
have the proper mapping. Hence the connection fails. Host A SHOULD have the proper mapping. Hence the connection fails. Host A SHOULD
fall back to regular TCP after the connection times out. fall back to regular TCP after the connection times out.
Subflow SYNs contain the MP_JOIN option. If this option is stripped Subflow SYNs contain the MP_JOIN option. If this option is stripped
skipping to change at page 33, line 46 skipping to change at page 36, line 23
Host A Host B Host A Host B
| SYN(MP_CAPABLE) | | SYN(MP_CAPABLE) |
|------------------------------------>| |------------------------------------>|
| Middlebox M | | Middlebox M |
| | | | | |
| SYN/ACK |SYN/ACK(MP_CAPABLE)| | SYN/ACK |SYN/ACK(MP_CAPABLE)|
|<----------------|-------------------| |<----------------|-------------------|
b) MP_CAPABLE option stripped on return path b) MP_CAPABLE option stripped on return path
Figure 12: Connection Setup with Middleboxes that Strip Options from Figure 14: Connection Setup with Middleboxes that Strip Options from
Packets Packets
We now examine data flow with MPTCP, assuming the flow is correctly We now examine data flow with MPTCP, assuming the flow is correctly
set up, which implies the options in the SYN packets were allowed set up, which implies the options in the SYN packets were allowed
through by the relevant middleboxes. If options are allowed through through by the relevant middleboxes. If options are allowed through
and there is no resegmentation or coalescing to TCP segments, and there is no resegmentation or coalescing to TCP segments,
multipath TCP flows can proceed without problems. multipath TCP flows can proceed without problems.
The case when options get stripped on data packets has been discussed The case when options get stripped on data packets has been discussed
in the Fallback section. We can further analyze what happens when a in the Fallback section. We can further analyze what happens when a
skipping to change at page 34, line 45 skipping to change at page 37, line 21
causing the subflow to permanently stall. MPTCP therefore uses causing the subflow to permanently stall. MPTCP therefore uses
the DATA_ACK to make progress when one of its subflows fails in the DATA_ACK to make progress when one of its subflows fails in
this way. This is why MPTCP does not use subflow ACKs to infer this way. This is why MPTCP does not use subflow ACKs to infer
connection level ACKs. connection level ACKs.
o Traffic Normalizers [14]: do not allow holes in sequence numbers, o Traffic Normalizers [14]: do not allow holes in sequence numbers,
cache packets and retransmit the same data. MPTCP looks like cache packets and retransmit the same data. MPTCP looks like
standard TCP on the wire, and will not retransmit different data standard TCP on the wire, and will not retransmit different data
on the same subflow sequence number. on the same subflow sequence number.
o TCP Options: may be removed, or packets with unknown options
dropped, by many classes of middleboxes. It is intended that the
initial SYN exchange, with a TCP Option, will be sufficient to
identify the path capabilities. If such a packet does not get
through, MPTCP will end up falling back to regular TCP.
o Segmentation/Coalescing (e.g. TCP segmentation offloading, etc):
might copy options between packets and might strip some options.
MPTCP's data sequence mapping includes the subflow sequence number
instead of using the sequence number in the segment. In this way,
the mapping is independent of the packets that carry it.
o Firewalls [15]: might perform sequence number randomization on TCP o Firewalls [15]: might perform sequence number randomization on TCP
connections. MPTCP uses relative sequence numbers in data connections. MPTCP uses relative sequence numbers in data
sequence mapping to cope with this. Like NATs, firewalls will not sequence mapping to cope with this. Like NATs, firewalls will not
permit many incoming connections, so MPTCP supports address permit many incoming connections, so MPTCP supports address
signalling (ADD_ADDR) so that a multihomed endpoint can invite its signalling (ADD_ADDR) so that a multi-addressed endpoint can
peer behind the firewall/NAT to connect out to its additional invite its peer behind the firewall/NAT to connect out to its
interface. additional interface.
o Intrusion Detection Systems: look out for traffic patterns and o Intrusion Detection Systems: look out for traffic patterns and
content that could threaten a network. Multipath will mean that content that could threaten a network. Multipath will mean that
such data is potentially spread, so it is more difficult for an such data is potentially spread, so it is more difficult for an
IDS to analyse the whole traffic, and potentially increasing the IDS to analyse the whole traffic, and potentially increasing the
risk of false positives. However, for an MPTCP-aware IDS, risk of false positives. However, for an MPTCP-aware IDS,
connection IDs can be easily read by such systems to correlate connection IDs can be easily read by such systems to correlate
multiple subflows and re-assemble for analysis. multiple subflows and re-assemble for analysis.
o Application level NATs: may alter the payload within a subflow. o Application level NATs: may alter the payload within a subflow.
Multipath TCP will detect these using the checksum and close the Multipath TCP will detect these using the checksum and close the
affected subflow(s), if there are other subflows that can be used. affected subflow(s), if there are other subflows that can be used.
If all subflows are affected multipath will fall back to TCP, If all subflows are affected multipath will fall back to TCP,
allowing middleboxes to change the payload. allowing middleboxes to change the payload.
o Middleboxes that alter the receive window: MPTCP will use the o Middleboxes that alter the receive window: MPTCP will use the
maximum window at data-level, but will also obey subflow specific maximum window at data-level, but will also obey subflow specific
windows. windows.
In addition, all classes of middleboxes may affect TCP traffic in the
following ways:
o TCP Options: may be removed, or packets with unknown options
dropped, by many classes of middleboxes. It is intended that the
initial SYN exchange, with a TCP Option, will be sufficient to
identify the path capabilities. If such a packet does not get
through, MPTCP will end up falling back to regular TCP.
o Segmentation/Coalescing (e.g. TCP segmentation offloading, etc):
might copy options between packets and might strip some options.
MPTCP's data sequence mapping includes the subflow sequence number
instead of using the sequence number in the segment. In this way,
the mapping is independent of the packets that carry it.
7. Interfaces 7. Interfaces
TBD TBD
Interface with applications, interface with TCP, interface with lower Interface with applications, interface with TCP, interface with lower
layers... layers...
Discussion of interaction with applications (both in terms of how Discussion of interaction with applications (both in terms of how
MPTCP will affect an application's assumptions of the transport MPTCP will affect an application's assumptions of the transport
layer, and what API extensions an application may wish to use with layer, and what API extensions an application may wish to use with
MPTCP) are discussed in [5]. MPTCP) are discussed in [5].
8. Open Issues 8. Acknowledgements
This specification is a work-in-progress, and as such there are many
issues that are still to be resolved. This section lists many of the
key open issues within this specification; these are discussed in
more detail in the appropriate sections throughout this document.
o Best handshake mechanisms (Section 3.1). This document contains a
proposed scheme by which connections and subflows can be set up.
It is felt that, although this is "no worse than regular TCP",
there could be opportunities for significant improvements in
security that could be included (potentially optionally) within
this protocol.
o Issues around simultaneous opens, where both ends attempt to
create a new subflow simultaneously, need to be investigated and
behaviour specified.
o Appropriate mechanisms for controlling policy/priority of subflow
usage (specifically regarding controlling incoming traffic,
Section 3.3.6). The ECN signal is currently proposed but other
alternatives, including per subflow receive windows or options
indicating path properties, could be employed instead.
o How much control do we want over subflows from other subflows
(e.g. closing when interface has failed)? Do we want to
differentiate between subflows and addresses (Section 3.2)?
o Do we want a connection identifier in every packet? E.g. would it
make the implementation of an IDS easier?
o Should we do signaling in the TCP payload, rather than options as
proposed in this draft? We discuss this alternative in the
appendix.
o Should we explicitly support SYN cookies? With the current
design, MPTCP would be downgraded to basic TCP if SYN cookies were
used. Is it worth designing the protocol to allow stateless
server handshake?
o Instead of an Address ID in MP_JOIN, are there any cases where a
Subflow ID (i.e. unique to the subflow) would be useful instead?
For example, two addresses which become NATted to the same
address?
9. Acknowledgements
The authors are supported by Trilogy The authors are supported by Trilogy
(http://www.trilogy-project.org), a research project (ICT-216372) (http://www.trilogy-project.org), a research project (ICT-216372)
partially funded by the European Community under its Seventh partially funded by the European Community under its Seventh
Framework Program. The views expressed here are those of the Framework Program. The views expressed here are those of the
author(s) only. The European Commission is not liable for any use author(s) only. The European Commission is not liable for any use
that may be made of the information in this document. that may be made of the information in this document.
The authors gratefully acknowledge significant input into this The authors gratefully acknowledge significant input into this
document from many members of the Trilogy project, notably Iljitsch document from Olivier Bonaventure and Andrew McDonald.
van Beijnum, Lars Eggert, Marcelo Bagnulo Braun, Robert Hancock, Pasi
Sarolahti, Olivier Bonaventure, Toby Moncaster, Philip Eardley,
Andrew McDonald and Sergio Lembo.
10. IANA Considerations The authors also wish to acknowledge reviews and contributions from
Iljitsch van Beijnum, Lars Eggert, Marcelo Bagnulo, Robert Hancock,
Pasi Sarolahti, Toby Moncaster, Philip Eardley, Sergio Lembo, and
Lawrence Conroy.
9. IANA Considerations
This document will make a request to IANA to allocate new values for This document will make a request to IANA to allocate new values for
TCP Option identifiers, as follows: TCP Option identifiers, as follows:
+-------------+-----------------------------+---------------+-------+ +-------------+-----------------------------+---------------+-------+
| Symbol | Name | Ref | Value | | Symbol | Name | Ref | Value |
+-------------+-----------------------------+---------------+-------+ +-------------+-----------------------------+---------------+-------+
| MP_CAPABLE | Multipath Capable | Section 3.1 | (tbc) | | MP_CAPABLE | Multipath Capable | Section 3.1 | (tbc) |
| MP_JOIN | Join Connection | Section 3.2 | (tbc) | | MP_JOIN | Join Connection | Section 3.2 | (tbc) |
| ADD_ADDR | Add Address | Section 3.5.1 | (tbc) | | ADD_ADDR | Add Address | Section 3.5.1 | (tbc) |
| REMOVE_ADDR | Remove Address | Section 3.5.2 | (tbc) | | REMOVE_ADDR | Remove Address | Section 3.5.2 | (tbc) |
| DSN_MAP | Data Sequence Number | Section 3.3 | (tbc) | | DSN_MAP | Data Sequence Number | Section 3.3 | (tbc) |
| | Mapping | | | | | Mapping | | |
| DATA_ACK | Data-level Acknowledgment | Section 3.3 | (tbc) | | DATA_ACK | Data-level Acknowledgment | Section 3.3 | (tbc) |
| DATA_FIN | Data-level FIN | Section 3.4 | (tbc) | | DATA_FIN | Data-level FIN | Section 3.4 | (tbc) |
| MP_PRIO | Change Subflow Priority | Section 3.3.6 | (tbc) |
| MP_FAIL | Fallback | Section 3.6 | (tbc) | | MP_FAIL | Fallback | Section 3.6 | (tbc) |
+-------------+-----------------------------+---------------+-------+ +-------------+-----------------------------+---------------+-------+
Table 1: TCP Options for MPTCP Table 1: TCP Options for MPTCP
11. References 10. References
11.1. Normative References 10.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement [1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997. Levels", BCP 14, RFC 2119, March 1997.
11.2. Informative References 10.2. Informative References
[2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793, [2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981. September 1981.
[3] Ford, A., Raiciu, C., Barre, S., and J. Iyengar, "Architectural [3] Ford, A., Raiciu, C., Handley, M., and J. Iyengar,
Guidelines for Multipath TCP Development", "Architectural Guidelines for Multipath TCP Development",
draft-ietf-mptcp-architecture-01 (work in progress), June 2010. draft-ietf-mptcp-architecture-02 (work in progress),
October 2010.
[4] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath- [4] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
Aware Congestion Control", draft-raiciu-mptcp-congestion-01 Aware Congestion Control", draft-ietf-mptcp-congestion-00 (work
(work in progress), March 2010. in progress), July 2010.
[5] Scharf, M. and A. Ford, "MPTCP Application Interface [5] Scharf, M. and A. Ford, "MPTCP Application Interface
Considerations", draft-scharf-mptcp-api-01 (work in progress), Considerations", draft-scharf-mptcp-api-02 (work in progress),
March 2010. July 2010.
[6] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP [6] Eastlake, D. and T. Hansen, "US Secure Hash Algorithms (SHA and
Selective Acknowledgment Options", RFC 2018, October 1996. HMAC-SHA)", RFC 4634, July 2006.
[7] Stewart, R., "Stream Control Transmission Protocol", RFC 4960, [7] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
September 2007. Selective Acknowledgment Options", RFC 2018, October 1996.
[8] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of [8] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
Explicit Congestion Notification (ECN) to IP", RFC 3168, Explicit Congestion Notification (ECN) to IP", RFC 3168,
September 2001. September 2001.
[9] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E. [9] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E.
Lear, "Address Allocation for Private Internets", BCP 5, Lear, "Address Allocation for Private Internets", BCP 5,
RFC 1918, February 1996. RFC 1918, February 1996.
[10] Braden, R., "Requirements for Internet Hosts - Communication [10] Braden, R., "Requirements for Internet Hosts - Communication
Layers", STD 3, RFC 1122, October 1989. Layers", STD 3, RFC 1122, October 1989.
[11] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path [11] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
TCP", draft-ietf-mptcp-threat-02 (work in progress), TCP", draft-ietf-mptcp-threat-03 (work in progress),
March 2010. October 2010.
[12] Srisuresh, P. and K. Egevang, "Traditional IP Network Address [12] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
Translator (Traditional NAT)", RFC 3022, January 2001. Translator (Traditional NAT)", RFC 3022, January 2001.
[13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Mitigate Shelby, "Performance Enhancing Proxies Intended to Mitigate
Link-Related Degradations", RFC 3135, June 2001. Link-Related Degradations", RFC 3135, June 2001.
[14] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion [14] Handley, M., Paxson, V., and C. Kreibich, "Network Intrusion
Detection: Evasion, Traffic Normalization, and End-to-End Detection: Evasion, Traffic Normalization, and End-to-End
skipping to change at page 39, line 18 skipping to change at page 40, line 44
Appendix A. Notes on use of TCP Options Appendix A. Notes on use of TCP Options
The TCP option space is limited due to the length of the Data Offset The TCP option space is limited due to the length of the Data Offset
field in the TCP header (4 bits), which defines the TCP header length field in the TCP header (4 bits), which defines the TCP header length
in 32-bit words. With the standard TCP header being 20 bytes, this in 32-bit words. With the standard TCP header being 20 bytes, this
leaves a maximum of 40 bytes for options, and many of these may leaves a maximum of 40 bytes for options, and many of these may
already be used by options such as timestamp and SACK. already be used by options such as timestamp and SACK.
We have performed a brief study on the commonly used TCP options in We have performed a brief study on the commonly used TCP options in
both SYN, data packets and pure ACK packets, and found that there is SYN, data, and pure ACK packets, and found that there is enough room
enough room to fit all the options we propose using in this draft. to fit all the options we propose using in this draft.
SYN packets typically include MSS (4 bytes), window scale (3 bytes), SYN packets typically include MSS (4 bytes), window scale (3 bytes),
SACK permitted (2 bytes) and timestamp (10 bytes) options. Together SACK permitted (2 bytes) and timestamp (10 bytes) options. Together
these sum to 19 bytes. Some operating systems appear to pad each these sum to 19 bytes. Some operating systems appear to pad each
option up to a word boundary, thus using 24 bytes (a brief survey option up to a word boundary, thus using 24 bytes (a brief survey
suggests Windows XP and Mac OS X do this, whereas Linux does not). suggests Windows XP and Mac OS X do this, whereas Linux does not).
Optimistically, therefore, we have 21 bytes spare, or 16 if it has to Optimistically, therefore, we have 21 bytes spare, or 16 if it has to
be word-aligned. In either case, however, the Multipath Capable (12 be word-aligned. In either case, however, the Multipath Capable (12
bytes) and Join (7 bytes) options will fit in this remaining space. bytes) and Join (12 bytes) options will fit in this remaining space.
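The SYN option budget described above can be checked with a few lines of arithmetic. The following is a minimal sketch, not part of the draft: the names OPTION_SPACE, SYN_OPTIONS, pad_to_word and spare_bytes are illustrative assumptions, and the option lengths are the ones quoted in the paragraph above.

```python
# Illustrative sketch only; names and structure are assumptions, not draft text.

# The 4-bit Data Offset field counts 32-bit words, so the largest TCP
# header is 15 * 4 = 60 bytes. The fixed header takes 20 of those.
MAX_TCP_HEADER = 15 * 4
BASE_HEADER = 20
OPTION_SPACE = MAX_TCP_HEADER - BASE_HEADER  # 40 bytes

# Option lengths typically seen on SYN packets, as quoted in the text.
SYN_OPTIONS = {
    "MSS": 4,
    "window scale": 3,
    "SACK permitted": 2,
    "timestamp": 10,
}


def pad_to_word(length):
    """Round an option length up to the next 4-byte boundary."""
    return (length + 3) // 4 * 4


def spare_bytes(options, pad_each):
    """Return the option space left after the given options are placed.

    pad_each=True models stacks that pad every option to a word boundary;
    pad_each=False models stacks that pack options back to back.
    """
    used = sum(pad_to_word(n) if pad_each else n for n in options.values())
    return OPTION_SPACE - used


if __name__ == "__main__":
    for pad_each in (False, True):
        spare = spare_bytes(SYN_OPTIONS, pad_each)
        # A 12-byte option (the Multipath Capable length quoted above)
        # fits whenever at least 12 bytes remain.
        print(f"pad_each={pad_each}: spare={spare}, 12-byte option fits={spare >= 12}")
```

With the lengths quoted above, the packed case leaves 21 bytes and the per-option padded case leaves 16 bytes, so a 12-byte option fits in either case.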
TCP data packets typically carry timestamp options in every packet, TCP data packets typically carry timestamp options in every packet,
taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28, taking 10 bytes (or 12 with padding). That leaves 30 bytes (or 28,
if word-aligned), which are enough to encode the data sequence if word-aligned), which are enough to encode the data sequence
mapping (16 or 20 bytes, depending on the length of the sequence mapping (14 or 18 bytes, depending on the length of the sequence
number in use) and the DATA_ACK if the flow is bidirectional (6 or 10 number in use) and the DATA_ACK if the flow is bidirectional (6 or 10
bytes). Such options will just fit in the available option space, bytes). Such options will just fit in the available option space,
although 8 byte data-level sequence numbers in both will only fit if although 8 byte data-level sequence numbers in both will only fit if
word-alignment is not required. If this proves to be a problem, it word-alignment is not required. If this proves to be a problem, it
is not necessary to include the Data Sequence Mapping and DATA_ACK in is not necessary to include the Data Sequence Mapping and DATA_ACK in
each packet, and in many cases it may be possible to alternate their each packet, and in many cases it may be possible to alternate their
presence (so long as the mapping covers the data being sent in the presence (so long as the mapping covers the data being sent in the
following packet). Other options include: wrapping the DATA_ACK into following packet). Other options include: wrapping the DATA_ACK into
the Data Sequence Mapping option; alternating between 4 and 8 byte the Data Sequence Mapping option; alternating between 4 and 8 byte
sequence numbers in each option; and sending the DATA_ACK on a sequence numbers in each option; and sending the DATA_ACK on a
skipping to change at page 41, line 14 skipping to change at page 42, line 39
This option was dropped, however, since some middleboxes may get This option was dropped, however, since some middleboxes may get
confused when they meet a hole in the sequence space, and do not confused when they meet a hole in the sequence space, and do not
understand the resync option. It is therefore felt that the same understand the resync option. It is therefore felt that the same
data must continue to be retransmitted on a subflow even if it is data must continue to be retransmitted on a subflow even if it is
already received after being retransmitted on another. There should already received after being retransmitted on another. There should
not be a significant performance hit from this since the amount of not be a significant performance hit from this since the amount of
data involved and needing to be retransmitted multiple times will be data involved and needing to be retransmitted multiple times will be
relatively small. relatively small.
Therefore, it is necessary to 're-sync' the expected sequence Appendix C. Changelog
numbering at the receiving end of a subflow, using the following TCP
option. This packet declares a sequence number space (inclusive)
which the receiving node should skip over, i.e. if the receiver's
next expected sequence number was previously within the range
start_seq_num to end_seq_num, move it forward to end_seq_num + 1.
This option will be used on the first new packet on the subflow that This section maintains logs of significant changes made to this
needs its sequence numbering re-synchronised. It will be continue to document between versions.
be included on every packet sent on this subflow until a packet
containing this option has been acknowledged (i.e. if subflow
acknowledgements exist for packets beyond the end sequence number).
If the end sequence number is earlier than the current expected
sequence number (i.e. if a resync packet has already been received),
this option should be ignored.
1 2 3 C.1. Changes since draft-ietf-mptcp-multiaddressed-01
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+------------------------------+
|Kind=MP_RESYNC| Length = 10 | Start Sequence Number :
+---------------+---------------+------------------------------+
: (4 octets) | End Sequence Number :
+---------------+---------------+------------------------------+
: (4 octets) |
+-------------------------------+
Figure 13: Resync option o Added proposal for hash-based security mechanism.
Appendix C. Changelog o Added receiver subflow policy control (backup path flags and
MP_PRIO option).
This section maintains logs of significant changes made to this o Changed DSN_MAP checksum to use the TCP checksum algorithm.
document between versions.
C.1. Changes since draft-ietf-mptcp-multiaddressed-00 C.2. Changes since draft-ietf-mptcp-multiaddressed-00
o Various clarifications and minor re-structuring in response to o Various clarifications and minor re-structuring in response to
comments. comments.
C.2. Changes since draft-ford-mptcp-multiaddressed-03 C.3. Changes since draft-ford-mptcp-multiaddressed-03
o Clarified handshake mechanism, especially with regard to error o Clarified handshake mechanism, especially with regard to error
cases (Section 3.2). cases (Section 3.2).
o Added optional port to ADD_ADDR and clarified situation with o Added optional port to ADD_ADDR and clarified situation with
private addresses (Section 3.5.1). private addresses (Section 3.5.1).
o Added path liveness check to REMOVE_ADDR (Section 3.5.2). o Added path liveness check to REMOVE_ADDR (Section 3.5.2).
o Added chunk checksumming to DSN_MAP (Section 3.3.1) to detect o Added chunk checksumming to DSN_MAP (Section 3.3.1) to detect
payload-altering middleboxes, and defined fallback mechanism payload-altering middleboxes, and defined fallback mechanism
(Section 3.6). (Section 3.6).
o Major clarifications to receive window discussion (Section 3.3.4). o Major clarifications to receive window discussion (Section 3.3.4).
o Various textual clarifications, especially in examples. o Various textual clarifications, especially in examples.
C.3. Changes since draft-ford-mptcp-multiaddressed-02 C.4. Changes since draft-ford-mptcp-multiaddressed-02
o Remove Version and Address ID in MP_CAPABLE in Section 3.1, and o Remove Version and Address ID in MP_CAPABLE in Section 3.1, and
make ISN be 6 bytes. make ISN be 6 bytes.
o Data sequence numbers are now always 8 bytes. But in some cases o Data sequence numbers are now always 8 bytes. But in some cases
where it is unambiguous it is permissible to only send the lower 4 where it is unambiguous it is permissible to only send the lower 4
bytes if space is at a premium. bytes if space is at a premium.
o Clarified behaviour of MP_JOIN in Section 3.2. o Clarified behaviour of MP_JOIN in Section 3.2.
 End of changes. 99 change blocks. 
384 lines changed or deleted 423 lines changed or added
