draft-ietf-mptcp-multiaddressed-00.txt   draft-ietf-mptcp-multiaddressed-01.txt 
Internet Engineering Task Force A. Ford Internet Engineering Task Force A. Ford
Internet-Draft Roke Manor Research Internet-Draft Roke Manor Research
Intended status: Experimental C. Raiciu Intended status: Experimental C. Raiciu
Expires: December 23, 2010 M. Handley Expires: January 13, 2011 M. Handley
University College London University College London
June 21, 2010 July 12, 2010
TCP Extensions for Multipath Operation with Multiple Addresses TCP Extensions for Multipath Operation with Multiple Addresses
draft-ietf-mptcp-multiaddressed-00 draft-ietf-mptcp-multiaddressed-01
Abstract Abstract
TCP/IP communication is currently restricted to a single path per TCP/IP communication is currently restricted to a single path per
connection, yet multiple paths often exist between peers. The connection, yet multiple paths often exist between peers. The
simultaneous use of these multiple paths for a TCP/IP session would simultaneous use of these multiple paths for a TCP/IP session would
improve resource usage within the network, and thus improve user improve resource usage within the network, and thus improve user
experience through higher throughput and improved resilience to experience through higher throughput and improved resilience to
network failure. network failure.
skipping to change at page 1, line 44 skipping to change at page 1, line 44
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on December 23, 2010. This Internet-Draft will expire on January 13, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2010 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 13 skipping to change at page 3, line 13
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.1. Design Assumptions . . . . . . . . . . . . . . . . . . . . 4 1.1. Design Assumptions . . . . . . . . . . . . . . . . . . . . 4
1.2. Multipath TCP in the Networking Stack . . . . . . . . . . 5 1.2. Multipath TCP in the Networking Stack . . . . . . . . . . 5
1.3. Operation Summary . . . . . . . . . . . . . . . . . . . . 6 1.3. Operation Summary . . . . . . . . . . . . . . . . . . . . 6
1.4. Requirements Language . . . . . . . . . . . . . . . . . . 7 1.4. Requirements Language . . . . . . . . . . . . . . . . . . 7
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7
3. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 8 3. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 8
4. MPTCP Protocol . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1. Connection Initiation . . . . . . . . . . . . . . . . . . 8
4.1. Connection Initiation . . . . . . . . . . . . . . . . . . 9 3.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 10
4.2. Starting a New Subflow . . . . . . . . . . . . . . . . . . 11 3.3. General MPTCP Operation . . . . . . . . . . . . . . . . . 12
4.3. Address Knowledge Exchange (Path Management) . . . . . . . 13 3.3.1. Data Sequence Numbering . . . . . . . . . . . . . . . 12
4.3.1. Address Advertisement . . . . . . . . . . . . . . . . 14 3.3.2. Data Acknowledgements . . . . . . . . . . . . . . . . 15
4.3.2. Remove Address . . . . . . . . . . . . . . . . . . . . 17 3.3.3. Receiver Considerations . . . . . . . . . . . . . . . 16
4.4. General MPTCP Operation . . . . . . . . . . . . . . . . . 18 3.3.4. Sender Considerations . . . . . . . . . . . . . . . . 17
4.4.1. Data Sequence Numbering . . . . . . . . . . . . . . . 18 3.3.5. Congestion Control Considerations . . . . . . . . . . 18
4.4.2. Data Acknowledgements . . . . . . . . . . . . . . . . 20 3.3.6. Subflow Policy . . . . . . . . . . . . . . . . . . . . 19
4.4.3. Receiver Considerations . . . . . . . . . . . . . . . 21 3.4. Closing a Connection . . . . . . . . . . . . . . . . . . . 20
4.4.4. Sender Considerations . . . . . . . . . . . . . . . . 22 3.5. Address Knowledge Exchange (Path Management) . . . . . . . 21
4.4.5. Congestion Control Considerations . . . . . . . . . . 24 3.5.1. Address Advertisement . . . . . . . . . . . . . . . . 22
4.4.6. Subflow Policy . . . . . . . . . . . . . . . . . . . . 24 3.5.2. Remove Address . . . . . . . . . . . . . . . . . . . . 25
4.5. Closing a Connection . . . . . . . . . . . . . . . . . . . 25 3.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.6. Fallback . . . . . . . . . . . . . . . . . . . . . . . . . 27 3.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 29
4.7. Error Handling . . . . . . . . . . . . . . . . . . . . . . 29 3.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 29
4.8. Heuristics . . . . . . . . . . . . . . . . . . . . . . . . 30 3.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 29
4.8.1. Port Usage . . . . . . . . . . . . . . . . . . . . . . 30 4. Semantic Issues . . . . . . . . . . . . . . . . . . . . . . . 30
5. Security Considerations . . . . . . . . . . . . . . . . . . . 31 5. Security Considerations . . . . . . . . . . . . . . . . . . . 31
6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 32 6. Interactions with Middleboxes . . . . . . . . . . . . . . . . 32
7. Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . 35 7. Interfaces . . . . . . . . . . . . . . . . . . . . . . . . . . 35
8. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 35 8. Open Issues . . . . . . . . . . . . . . . . . . . . . . . . . 36
9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 36 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 37
10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 36 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 37
11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 37 11. References . . . . . . . . . . . . . . . . . . . . . . . . . . 37
11.1. Normative References . . . . . . . . . . . . . . . . . . . 37 11.1. Normative References . . . . . . . . . . . . . . . . . . . 37
11.2. Informative References . . . . . . . . . . . . . . . . . . 37 11.2. Informative References . . . . . . . . . . . . . . . . . . 38
Appendix A. Notes on use of TCP Options . . . . . . . . . . . . . 38 Appendix A. Notes on use of TCP Options . . . . . . . . . . . . . 39
Appendix B. Resync Packet . . . . . . . . . . . . . . . . . . . . 39 Appendix B. Resync Packet . . . . . . . . . . . . . . . . . . . . 40
Appendix C. Changelog . . . . . . . . . . . . . . . . . . . . . . 41 Appendix C. Changelog . . . . . . . . . . . . . . . . . . . . . . 41
C.1. Changes since draft-ford-mptcp-multiaddressed-03 . . . . . 41 C.1. Changes since draft-ietf-mptcp-multiaddressed-00 . . . . . 41
C.2. Changes since draft-ford-mptcp-multiaddressed-02 . . . . . 41 C.2. Changes since draft-ford-mptcp-multiaddressed-03 . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 41 C.3. Changes since draft-ford-mptcp-multiaddressed-02 . . . . . 42
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 42
1. Introduction 1. Introduction
Multipath TCP (henceforth referred to as MPTCP) is a set of Multipath TCP (henceforth referred to as MPTCP) is a set of
extensions to regular TCP [2] to allow a transport connection to extensions to regular TCP [2] to allow a transport connection to
operate across multiple paths simultaneously. This document presents operate across multiple paths simultaneously. This document presents
the protocol changes required to add multipath capability to TCP; the protocol changes required to add multipath capability to TCP;
specifically, those for signalling and setting up multiple paths specifically, those for signalling and setting up multiple paths
("subflows"), managing these subflows, reassembly of data, and ("subflows"), managing these subflows, reassembly of data, and
termination of sessions. This is not the only information required termination of sessions. This is not the only information required
skipping to change at page 5, line 42 skipping to change at page 5, line 42
o The design presented should work with network provided multipath, o The design presented should work with network provided multipath,
for instance ECMP routing; subflows could be opened with different for instance ECMP routing; subflows could be opened with different
source/destination ports between the same addresses to allow ECMP source/destination ports between the same addresses to allow ECMP
to place the subflows on different paths. to place the subflows on different paths.
1.2. Multipath TCP in the Networking Stack 1.2. Multipath TCP in the Networking Stack
MPTCP operates at the transport layer and aims to be transparent to MPTCP operates at the transport layer and aims to be transparent to
both higher and lower layers. It is a set of additional features on both higher and lower layers. It is a set of additional features on
top of standard TCP; MPTCP is designed to be usable by legacy top of standard TCP; Figure 1 illustrates this layering. MPTCP is
applications with no changes. Figure 1 illustrates this layering. designed to be usable by legacy applications with no changes;
detailed discussion of its interactions with applications is given in
One way to enable multipath TCP in a host is adding a system-wide [5].
setting: "Use multipath TCP by default? Y/N". Multipath-aware
applications would be able to use an extended sockets API [5] to have
finer control on the behaviour of MPTCP.
+-------------------------------+ +-------------------------------+
| Application | | Application |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| Application | | MPTCP | | Application | | MPTCP |
+---------------+ + - - - - - - - + - - - - - - - + +---------------+ + - - - - - - - + - - - - - - - +
| TCP | | Subflow (TCP) | Subflow (TCP) | | TCP | | Subflow (TCP) | Subflow (TCP) |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
| IP | | IP | IP | | IP | | IP | IP |
+---------------+ +-------------------------------+ +---------------+ +-------------------------------+
skipping to change at page 6, line 25 skipping to change at page 6, line 25
Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks Figure 1: Comparison of Standard TCP and MPTCP Protocol Stacks
Detailed discussion of an architecture for developing a multipath TCP Detailed discussion of an architecture for developing a multipath TCP
implementation, especially regarding the functional separation by implementation, especially regarding the functional separation by
which different components should be developed, is given in [3]. which different components should be developed, is given in [3].
1.3. Operation Summary 1.3. Operation Summary
This section provides a high-level summary of normal operation of This section provides a high-level summary of normal operation of
MPTCP, and is illustrated by the scenario shown in Figure 2. A MPTCP, and is illustrated by the scenario shown in Figure 2. A
detailed description of operation is given in Section 4. detailed description of operation is given in Section 3.
o To a non-MPTCP-aware application, MPTCP will behave the same as o To a non-MPTCP-aware application, MPTCP will behave the same as
normal TCP. Extended APIs could provide additional control to normal TCP. Extended APIs could provide additional control to
MPTCP-aware applications [5]. An application begins by opening a MPTCP-aware applications [5]. An application begins by opening a
TCP socket in the normal way. MPTCP signaling and operation are TCP socket in the normal way. MPTCP signaling and operation are
handled by the MPTCP implementation. handled by the MPTCP implementation.
o An MPTCP connection begins similarly to a regular TCP connection. o An MPTCP connection begins similarly to a regular TCP connection.
This is illustrated in Figure 2 where a connection is established This is illustrated in Figure 2 where a TCP connection is
between addresses A1 and B1 on Hosts A and B respectively. established between addresses A1 and B1 on Hosts A and B
respectively.
o If extra paths are available, additional TCP sessions (termed o If extra paths are available, additional TCP sessions (termed
"subflows") are created on these paths, and are combined with the "subflows") are created on these paths, and are combined with the
existing session, which continues to appear as a single connection existing session, which continues to appear as a single connection
to the applications at both ends. The creation of the additional to the applications at both ends. The creation of the additional
TCP session is illustrated between Address A2 on Host A and TCP session is illustrated between Address A2 on Host A and
Address B1 on Host B. Address B1 on Host B.
o MPTCP identifies multiple paths by the presence of multiple o MPTCP identifies multiple paths by the presence of multiple
addresses at endpoints. Combinations of these multiple addresses addresses at endpoints. Combinations of these multiple addresses
skipping to change at page 7, line 14 skipping to change at page 7, line 17
mechanism by which an endpoint can initiate new subflows by using mechanism by which an endpoint can initiate new subflows by using
its own additional addresses, or by signalling its available its own additional addresses, or by signalling its available
addresses to the other endpoint. addresses to the other endpoint.
o MPTCP adds connection-level sequence numbers to allow the o MPTCP adds connection-level sequence numbers to allow the
reassembly of the in-order data stream from multiple subflows reassembly of the in-order data stream from multiple subflows
which may deliver packets out-of-order due to differing network which may deliver packets out-of-order due to differing network
delays. delays.
o Subflows are terminated as regular TCP connections, with a four o Subflows are terminated as regular TCP connections, with a four
way FIN handshake. The connection is terminated by a connection- way FIN handshake. The MPTCP connection is terminated by a
level FIN packet, sent together with the FIN on the last subflow connection-level FIN packet, sent together with the FIN on the
of the connection. last subflow of the connection.
Host A Host B Host A Host B
------------------------ ------------------------ ------------------------ ------------------------
Address A1 Address A2 Address B1 Address B2 Address A1 Address A2 Address B1 Address B2
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
| | | | | | | |
| (initial connection setup) | | | (initial connection setup) | |
|----------------------------------->| | |----------------------------------->| |
|<-----------------------------------| | |<-----------------------------------| |
| | | | | | | |
skipping to change at page 8, line 5 skipping to change at page 8, line 8
document are to be interpreted as described in RFC 2119 [1]. document are to be interpreted as described in RFC 2119 [1].
2. Terminology 2. Terminology
Path: A sequence of links between a sender and a receiver, defined Path: A sequence of links between a sender and a receiver, defined
in this context by a source and destination address pair. in this context by a source and destination address pair.
Subflow: A stream of TCP packets sent over a path, started and Subflow: A stream of TCP packets sent over a path, started and
terminated similarly to a regular TCP connection. terminated similarly to a regular TCP connection.
Connection: A collection of one or more subflows, over which an (MPTCP) Connection: A collection of one or more subflows, over which
application can communicate between two endpoints. There is a an application can communicate between two endpoints. There is a
one-to-one mapping between a connection and an application socket. one-to-one mapping between a connection and an application socket.
Data-level: The payload data is nominally transferred over a Data-level: The payload data is nominally transferred over a
connection, which in turn is transported over subflows. Thus the connection, which in turn is transported over subflows. Thus the
term "data-level" is synonymous with "connection level", in term "data-level" is synonymous with "connection level", in
contrast to "subflow-level" which refers to properties of an contrast to "subflow-level" which refers to properties of an
individual subflow. individual subflow.
Token: A locally unique identifier given to a multipath connection Token: A locally unique identifier given to a multipath connection
by an endpoint. May also be referred to as a "Connection ID". by an endpoint. May also be referred to as a "Connection ID".
Endpoint: A host operating an MPTCP implementation, and either Endpoint: A host operating an MPTCP implementation, and either
initiating or accepting an MPTCP connection. initiating or accepting an MPTCP connection.
3. Semantic Issues 3. MPTCP Protocol
In order to support multipath operation, the semantics of some TCP
components have changed. To aid clarity, this section collects these
semantic changes as a reference.
Sequence Number: The (in-header) TCP sequence number is specific to
the subflow. To allow the receiver to reorder application data,
an additional data-level sequence space is used. In this data-
level sequence space, the initial SYN and the final DATA_FIN
occupy one octet of sequence space. There is an explicit mapping
of data sequence space to subflow sequence space, which is
signalled through TCP options in data packets.
ACK: The ACK field in the TCP header acknowledges only the subflow
sequence number, not the data-level sequence space.
Implementations SHOULD NOT attempt to infer a data-level
acknowledgement from the subflow ACKs. Instead an explicit data-
level DATA_ACK is used. This avoids possible deadlock scenarios
when a non-TCP-aware middlebox pro-actively ACKs at the subflow
level.
Receive Window: The receive window in the TCP header indicates the
amount of free buffer space for the whole data-level connection
(as opposed to for this subflow) that is available at the
receiver. This is the same semantics as regular TCP, but to
maintain these semantics the receive window must be interpreted at
the sender as relative to the sequence number given in the
DATA_ACK rather than the subflow ACK in the TCP header. In this
way the original flow control role is preserved.
FIN: The FIN flag in the TCP header applies only to the subflow it
is sent on, not to the whole connection. For connection-level FIN
semantics, the DATA_FIN option is used.
RST: The RST flag in the TCP header applies only to the subflow it
is sent on, not to the whole connection. A connection is
considered reset if a RST is received on every subflow.
Address List: Address list management (i.e. knowledge of the local
and remote hosts' lists of available IP addresses) is handled on a
per-connection basis (as opposed to per-subflow, per host, or per
pair of communicating hosts). This permits the application of
per-connection local policy. Adding an address to one connection
(either explicitly through an Add Address message, or implicitly
through a Join) has no implication for other connections between
the same pair of hosts.
5-tuple: The 5-tuple (protocol, local address, local port, remote
address, remote port) presented by kernel APIs to the application
layer in a non-multipath-aware application is that of the first
subflow, even if the subflow has since been closed and removed
from the connection. This decision, and other related API issues,
are discussed in more detail in [5].
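The subflow-scoped RST semantics listed above can be illustrated with a small sketch. It assumes a hypothetical per-connection table that maps a subflow identifier to whether a RST has been received on that subflow; the function name and the table are illustrative only and are not defined by this document.

```python
def connection_is_reset(rst_seen_by_subflow):
    """Return True only if a RST has been received on every subflow.

    rst_seen_by_subflow is a hypothetical dict mapping a subflow
    identifier to a bool (True once a RST arrived on that subflow).
    A connection with no subflows recorded is not treated as reset.
    """
    if not rst_seen_by_subflow:
        return False
    return all(rst_seen_by_subflow.values())


# A RST on one subflow leaves the connection alive on the other.
state = {"A1-B1": True, "A2-B1": False}
partially_reset = connection_is_reset(state)

# Once every subflow has seen a RST, the whole connection is reset.
state["A2-B1"] = True
fully_reset = connection_is_reset(state)
```

The same per-subflow scoping applies to FIN, except that connection-level closure is signalled explicitly with DATA_FIN rather than inferred from the subflows.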
4. MPTCP Protocol
This section describes the operation of the MPTCP protocol, and is This section describes the operation of the MPTCP protocol, and is
subdivided into sections for each key part of the protocol operation. subdivided into sections for each key part of the protocol operation.
All MPTCP operations are signalled using optional TCP header fields. All MPTCP operations are signalled using optional TCP header fields.
These TCP Options will have option numbers allocated by IANA, as These TCP Options will have option numbers allocated by IANA, as
listed in Section 10, and are defined throughout the following listed in Section 10, and are defined throughout the following
subsections. subsections.
4.1. Connection Initiation 3.1. Connection Initiation
Connection Initiation begins with a SYN, SYN/ACK exchange on a single Connection Initiation begins with a SYN, SYN/ACK exchange on a single
path. Each packet contains the Multipath Capable (MP_CAPABLE) TCP path. Each packet contains the Multipath Capable (MP_CAPABLE) TCP
option (Figure 3). This option declares its sender is capable of option (Figure 3). This option declares its sender is capable of
performing multipath TCP and wishes to do so on this particular performing multipath TCP and wishes to do so on this particular
connection. Each host includes in the MP_CAPABLE option a locally- connection. Each host includes in the MP_CAPABLE option a locally-
unique token that identifies this connection. This is used when unique token that identifies this connection. This is used when
adding additional subflows to this connection. adding additional subflows to this connection.
This token is generated by the sender and has local meaning only, This token is generated by the sender and has local meaning only,
hence it MUST be unique for the sender. The token MUST be difficult hence it MUST be unique for the sender. The token MUST be difficult
for an attacker to guess, and thus it is recommended it SHOULD be for an attacker to guess, and thus it is recommended it SHOULD be
generated randomly. (However, see further discussions about security generated randomly. (However, see further discussions about security
in Section 5, including the possibility of 64-bit tokens.) in Section 5, including the possibility of 64-bit tokens.)
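As a minimal sketch of the token requirements above, the following picks a random 32-bit token and retries until it is unique among the tokens already in use on the local host. The function name and the set of in-use tokens are assumptions made for illustration; the document does not mandate any particular generator.

```python
import secrets


def new_token(tokens_in_use):
    """Return a random 32-bit token that is unique on this host.

    tokens_in_use is a hypothetical set holding the tokens of all
    current local MPTCP connections; the new token is added to it.
    """
    while True:
        # secrets draws from the operating system's CSPRNG, which
        # makes the token difficult for an attacker to guess.
        token = secrets.randbits(32)
        if token not in tokens_in_use:
            tokens_in_use.add(token)
            return token


in_use = set()
first = new_token(in_use)
second = new_token(in_use)
```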
The MP_CAPABLE option is only present in packets with the SYN flag The MP_CAPABLE option is only present in packets with the SYN flag
set. It is only used in the first TCP session of a connection, in set. It is only used in the first TCP session of a connection, in
order to identify the connection; all following connections will use order to identify the connection; all following subflows will use the
the "Join" option (see Section 4.2) to join the existing connection. "Join" option (see Section 3.2) to join the existing connection.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+ +---------------+---------------+-------------------------------+
|Kind=MP_CAPABLE| Length=11 | Sender Token : |Kind=MP_CAPABLE| Length=12 | Sender Token :
+---------------+---------------+---------------+---------------+ +---------------+---------------+-------------------------------+
: Sender Token (4 bytes total) | Initial Data Sequence Number : : Sender Token (4 bytes total) | Initial Data Sequence Number :
+-----------------------------------------------+---------------+ +-------------------------------+-------------------------------+
: Initial Data Sequence Number (6 bytes total) | : Initial Data Sequence Number (6 bytes total) |
+-----------------------------------------------+---------------+ +---------------------------------------------------------------+
Figure 3: Multipath Capable (MP_CAPABLE) option (only valid on SYN Figure 3: Multipath Capable (MP_CAPABLE) option (only valid on SYN
packets) packets)
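The layout in Figure 3 can be sketched as a pair of encode and decode helpers. This assumes the -01 layout (Length=12: one byte each for Kind and Length, a 4-byte token, and 6 bytes of Initial Data Sequence Number). The Kind value below is a placeholder chosen for illustration, since the real option number is to be allocated by IANA.

```python
import struct

MP_CAPABLE_KIND = 0xFD  # placeholder only; the real value is IANA-assigned
MP_CAPABLE_LEN = 12     # Kind(1) + Length(1) + Token(4) + IDSN(6)


def pack_mp_capable(token, idsn_top48):
    """Encode an MP_CAPABLE option.

    token is a 32-bit integer; idsn_top48 holds the most significant
    6 bytes of the initial data sequence number as a 48-bit integer.
    """
    header = struct.pack("!BBI", MP_CAPABLE_KIND, MP_CAPABLE_LEN, token)
    return header + idsn_top48.to_bytes(6, "big")


def unpack_mp_capable(option):
    """Decode an MP_CAPABLE option into (token, idsn_top48)."""
    if len(option) != MP_CAPABLE_LEN:
        raise ValueError("unexpected option size")
    kind, length, token = struct.unpack("!BBI", option[:6])
    if kind != MP_CAPABLE_KIND or length != MP_CAPABLE_LEN:
        raise ValueError("not an MP_CAPABLE option")
    return token, int.from_bytes(option[6:], "big")


wire = pack_mp_capable(0x11223344, 0x0102030405AB)
decoded = unpack_mp_capable(wire)
```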
If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it If a SYN contains an MP_CAPABLE option but the SYN/ACK does not, it
is assumed that the passive opener is not multipath capable and thus is assumed that the passive opener is not multipath capable and thus
the MPTCP session will operate as regular, single-path TCP. If a SYN the MPTCP session will operate as regular, single-path TCP. If a SYN
does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT contain does not contain a MP_CAPABLE option, the SYN/ACK MUST NOT contain
one in response. one in response.
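The negotiation rule in the preceding paragraph can be sketched as a single decision function. This is a simplification under stated assumptions: it looks only at whether MP_CAPABLE was present on the SYN and on the SYN/ACK, and the function name is hypothetical.

```python
def negotiate_multipath(syn_has_mp_capable, synack_has_mp_capable):
    """Return True if the session may operate as MPTCP, False if it
    falls back to regular, single-path TCP.

    Raises ValueError when a SYN/ACK carries MP_CAPABLE in response
    to a SYN that did not, which the text forbids (MUST NOT).
    """
    if not syn_has_mp_capable:
        if synack_has_mp_capable:
            raise ValueError("MP_CAPABLE in SYN/ACK without MP_CAPABLE in SYN")
        return False
    # The passive opener is assumed not multipath capable if it
    # omitted the option from its SYN/ACK.
    return synack_has_mp_capable


both_capable = negotiate_multipath(True, True)
legacy_peer = negotiate_multipath(True, False)
```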
skipping to change at page 10, line 48 skipping to change at page 9, line 43
regular TCP behavior, even if it subsequently receives a SYN/ACK that regular TCP behavior, even if it subsequently receives a SYN/ACK that
contains an MP_CAPABLE option. This might happen if the MP_CAPABLE contains an MP_CAPABLE option. This might happen if the MP_CAPABLE
SYN and subsequent non-MP-capable SYN are reordered. This is to SYN and subsequent non-MP-capable SYN are reordered. This is to
ensure that the two endpoints end up in an interoperable state, no ensure that the two endpoints end up in an interoperable state, no
matter what order the SYNs arrive at the passive opener. This final matter what order the SYNs arrive at the passive opener. This final
state is inferred from the presence or absence of the DATA_ACK option state is inferred from the presence or absence of the DATA_ACK option
in the third packet of the TCP handshake. in the third packet of the TCP handshake.
The MP_CAPABLE option includes the most significant 6 bytes of the The MP_CAPABLE option includes the most significant 6 bytes of the
8-byte initial Data Sequence Number option (discussed in 8-byte initial Data Sequence Number option (discussed in
Section 4.4). The least significant two bytes should be treated as Section 3.3). The least significant two bytes should be treated as
being zero. This data sequence number maps the SYN into the data being zero. This data sequence number maps the SYN into the data
sequence space (and this initial SYN occupies one octet of this sequence space (and this initial SYN occupies one octet of this
space, as for a regular SYN in single-path TCP). Having the SYN space, as for a regular SYN in single-path TCP). Having the SYN
occupy sequence space means that it must be DATA_ACKed, and this occupy sequence space means that it must be DATA_ACKed, and this
ensures that there is two-way agreement on whether or not the ensures that there is two-way agreement on whether or not the
multipath capability is enabled, even if a middlebox were to strip multipath capability is enabled, even if a middlebox were to strip
the MP_CAPABLE option from a SYN/ACK packet. the MP_CAPABLE option from a SYN/ACK packet.
To preserve option space, only the most significant six bytes of the To preserve option space, only the most significant six bytes of the
data sequence number are sent in the SYN, as there is no significant data sequence number are sent in the SYN, as there is no significant
security benefit from randomizing the values of the lower two bytes security benefit from randomizing the values of the lower two bytes
given that these fall within typical receive window sizes. given that these fall within typical receive window sizes.
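The truncation described above amounts to a 16-bit shift, sketched below. The helper names are hypothetical. The last function additionally assumes, by analogy with regular TCP, that the first byte of payload is numbered one past the SYN because the SYN occupies one octet of data sequence space.

```python
MASK64 = (1 << 64) - 1


def truncate_idsn(idsn):
    """Keep the most significant 6 bytes of an 8-byte data sequence
    number, as carried in the MP_CAPABLE option."""
    return (idsn & MASK64) >> 16


def expand_idsn(top48):
    """Rebuild the 8-byte initial data sequence number at the
    receiver, treating the least significant two bytes as zero."""
    return (top48 << 16) & MASK64


def first_data_dsn(idsn):
    """Assumed data sequence number of the first payload byte: the
    SYN occupies one octet, so data starts at IDSN + 1 (mod 2^64)."""
    return (idsn + 1) & MASK64


sent = truncate_idsn(0x0102030405060708)
rebuilt = expand_idsn(sent)
first = first_data_dsn(rebuilt)
```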
4.2. Starting a New Subflow 3.2. Starting a New Subflow
Endpoints have knowledge of their own address(es), and can become Once a MPTCP connection has begun with the MP_CAPABLE exchange,
aware of the other endpoint's addresses through signalling exchanges further subflows can be added to the connection. Endpoints have
as described in Section 4.3. Using this knowledge, an endpoint can knowledge of their own address(es), and can become aware of the other
initiate a new subflow over a currently unused pair of addresses. endpoint's addresses through signalling exchanges as described in
The protocol permits either endpoint of a connection to initiate the Section 3.5. Using this knowledge, an endpoint can initiate a new
creation of a new subflow (but see Section 4.8 for heuristics). subflow over a currently unused pair of addresses. The protocol
permits either endpoint of a connection to initiate the creation of a
new subflow (but see Section 3.8 for heuristics).
A new subflow is started as a normal TCP SYN/ACK exchange. The Join A new subflow is started as a normal TCP SYN/ACK exchange. The Join
Connection (MP_JOIN) TCP option (Figure 4) is used to identify the Connection (MP_JOIN) TCP option (Figure 4) is used to identify the
connection to be joined by the new subflow. The receiver token sent connection to be joined by the new subflow. The receiver token sent
MUST be the other endpoint's locally unique connection token, which MUST be the other endpoint's locally unique connection token, which
was included in the MP_CAPABLE option during connection was included in the MP_CAPABLE option during connection
establishment. establishment. The MP_JOIN option MUST only be present on SYN
packets.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+-------------------------------+ +---------------+---------------+-------------------------------+
| Kind=MP_JOIN | Length = 7 |Receiver Token (4 octets total): | Kind=MP_JOIN | Length = 7 |Receiver Token (4 octets total):
+---------------+---------------+----------------+--------------+ +---------------+---------------+----------------+--------------+
: Receiver Token (continued) | Address ID | : Receiver Token (continued) | Address ID |
+-------------------------------+----------------+ +-------------------------------+----------------+
Figure 4: Join Connection (MP_JOIN) option (only valid on SYN Figure 4: Join Connection (MP_JOIN) option (only valid on SYN
packets) packets)
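The MP_JOIN layout in Figure 4 can be sketched in the same way: Kind, Length=7, a 4-byte receiver token and a 1-byte Address ID. The Kind value is again a placeholder chosen for illustration, not an allocated option number, and the helper names are hypothetical.

```python
import struct

MP_JOIN_KIND = 0xFE  # placeholder only; the real value is IANA-assigned
MP_JOIN_LEN = 7      # Kind(1) + Length(1) + Receiver Token(4) + Address ID(1)


def pack_mp_join(receiver_token, address_id):
    """Encode an MP_JOIN option carrying the other endpoint's token."""
    return struct.pack("!BBIB", MP_JOIN_KIND, MP_JOIN_LEN,
                       receiver_token, address_id)


def unpack_mp_join(option):
    """Decode an MP_JOIN option into (receiver_token, address_id)."""
    if len(option) != MP_JOIN_LEN:
        raise ValueError("unexpected option size")
    kind, length, token, address_id = struct.unpack("!BBIB", option)
    if kind != MP_JOIN_KIND or length != MP_JOIN_LEN:
        raise ValueError("not an MP_JOIN option")
    return token, address_id


join_wire = pack_mp_join(0xCAFEBABE, 2)
join_fields = unpack_mp_join(join_wire)
```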
TBD: A better security mechanism than just the token is required TBD: A better security mechanism than just the token is required
here, in order to prove freshness of the subflow initiator's here, in order to prove that the sender of the SYN/MP_JOIN is the
knowledge of the connection. Possibilities could include the DSN same sender as that who sent the original SYN/MP_CAPABLE. Hash
(although this would require a reasonably large window), or something chains are considered an appropriate solution, and the mechanism will
to do with the checksums of data. be described in detail in a later version of this document.
When receiving a SYN with the MP_JOIN option that contains a valid When receiving a SYN with the MP_JOIN option that contains a valid
token for an existing MPTCP connection, the recipient SHOULD respond token for an existing MPTCP connection, the recipient SHOULD respond
with a SYN/ACK also containing an MP_JOIN option containing the with a SYN/ACK also containing an MP_JOIN option containing the
initiator's token. This serves two purposes: it ensures both initiator's token. This behaviour is illustrated in Figure 5.
endpoints agree on the connection being referred to (this is
particularly relevant when both addresses being used are new to the
connection); and it ensures there are no middleboxes on the path that
will drop MPTCP options on the return path. This behaviour is
illustrated in Figure 5.
Host A Host B Host A Host B
------------------------ ------------------------ ------------------------ ------------------------
Address A1 Address A2 Address B1 Address B2 Address A1 Address A2 Address B1 Address B2
---------- ---------- ---------- ---------- ---------- ---------- ---------- ----------
| | | | | | | |
| SYN + MP_CAPABLE(Token A) | | | SYN + MP_CAPABLE(Token A) | |
|----------------------------------->| | |----------------------------------->| |
|<-----------------------------------| | |<-----------------------------------| |
| SYN/ACK + MP_CAPABLE(Token B) | | | SYN/ACK + MP_CAPABLE(Token B) | |
| | | | | | | |
| | SYN + MP_JOIN(Token B) | | | SYN + MP_JOIN(Token B) |
| |----------------------------------->| | |----------------------------------->|
| |<-----------------------------------| | |<-----------------------------------|
| | SYN/ACK + MP_JOIN(Token A) | | | SYN/ACK + MP_JOIN(Token A) |
| | | | | | | |
Figure 5: Example use of MPTCP Tokens Figure 5: Example use of MPTCP Tokens
If the token is unknown or local policy prohibits the acceptable of If the token received at Host B is unknown or local policy prohibits
the new subflow, the recipient MUST respond with a TCP RST. the acceptance of the new subflow, the recipient MUST respond with a
TCP RST.
It is possible that a middlebox that strips MPTCP options exists, If the token is accepted at Host B, but the token returned to Host A
either on the path from A to B, or on the return path. MPTCP must be is not the one expected, Host A MUST close the subflow with a TCP
robust and refuse to open an additional subflow on such a path. RST.
The echoing of the token serves two purposes: it ensures both
endpoints agree on the connection being referred to (this is
particularly relevant when both addresses being used are new to the
connection); and it ensures there are no middleboxes on the path that
will drop MPTCP options on the return path.
If the SYN/ACK as received at Host A does not have an MP_JOIN option,
Host A MUST close the subflow with a RST.
If MP_JOIN is stripped from the SYN on the path from A to B, and Host If MP_JOIN is stripped from the SYN on the path from A to B, and Host
B does not have a passive opener on the relevant port, it will B does not have a passive opener on the relevant port, it will
respond with an RST in the normal way. If in response to a SYN with respond with an RST in the normal way. If in response to a SYN with
an MP_JOIN option, a SYN/ACK is received without the MP_JOIN option an MP_JOIN option, a SYN/ACK is received without the MP_JOIN option
(either since it was stripped on the return path, or it was stripped (either since it was stripped on the return path, or it was stripped
on the outgoing path but the passive opener on Host B responded as if on the outgoing path but the passive opener on Host B responded as if
it was a new regular TCP session), then the subflow is unusable and it was a new regular TCP session), then the subflow is unusable and
Host A MUST close it with a RST. Host A MUST close it with a RST.
It should be noted that additional subflows can be created between It should be noted that additional subflows can be created between
any pair of ports (but see Section 4.8 for heuristics); no explicit any pair of ports (but see Section 3.8 for heuristics); no explicit
application-level accept calls or bind calls are required to open application-level accept calls or bind calls are required to open
additional subflows. To associate a new subflow with an existing additional subflows. To associate a new subflow with an existing
connection, the token supplied in the subflow's SYN exchange is used connection, the token supplied in the subflow's SYN exchange is used
for demultiplexing. This then binds the 5-tuple of the TCP subflow for demultiplexing. This then binds the 5-tuple of the TCP subflow
to the local token of the connection. A consequence is that it is to the local token of the connection. A consequence is that it is
possible to allow any port pairs to be used for a connection. possible to allow any port pairs to be used for a connection.
Demultiplexing subflow SYNs MUST be done using the token; this is Demultiplexing subflow SYNs MUST be done using the token; this is
unlike traditional TCP, where the destination port is used for unlike traditional TCP, where the destination port is used for
demultiplexing SYN packets. Once a subflow is set up, demultiplexing demultiplexing SYN packets. Once a subflow is set up, demultiplexing
packets is done using the five-tuple, as in traditional TCP. The packets is done using the five-tuple, as in traditional TCP. The
five-tuples will be mapped to the local connection ID. five-tuples will be mapped to the local connection ID.
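As an illustration only, the two-stage demultiplexing described above might be sketched as follows. This is a minimal sketch under stated assumptions, not part of the specification: the names SubflowDemux, register_connection, on_join_syn and on_segment are hypothetical, and connections are represented by plain string identifiers.

```python
from typing import Dict, NamedTuple, Optional


class FiveTuple(NamedTuple):
    protocol: str
    src_addr: str
    src_port: int
    dst_addr: str
    dst_port: int


class SubflowDemux:
    """Hypothetical helper: token lookup for SYNs, five-tuple lookup afterwards."""

    def __init__(self) -> None:
        self.by_token: Dict[int, str] = {}      # local token -> connection ID
        self.by_tuple: Dict[FiveTuple, str] = {}  # subflow five-tuple -> connection ID

    def register_connection(self, local_token: int, conn_id: str) -> None:
        self.by_token[local_token] = conn_id

    def on_join_syn(self, token: int, tup: FiveTuple) -> Optional[str]:
        """Demultiplex a SYN carrying MP_JOIN by token, not by destination port."""
        conn_id = self.by_token.get(token)
        if conn_id is None:
            return None  # unknown token: the caller would respond with a TCP RST
        self.by_tuple[tup] = conn_id  # bind the subflow's five-tuple to the connection
        return conn_id

    def on_segment(self, tup: FiveTuple) -> Optional[str]:
        """Once the subflow exists, demultiplex by five-tuple as in regular TCP."""
        return self.by_tuple.get(tup)
```

Because the SYN lookup never consults the destination port, any port pair can be bound to the connection, which is the consequence noted in the text.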
The MP_JOIN option includes an "Address ID". This is an identifier The MP_JOIN option includes an "Address ID". This is an identifier
that is locally unique to the sender of this option. It has only that is locally unique to the sender of this option. It has only
significance within a single connection, where it identifies the significance within a single connection, where it identifies the
source address of this packet. The key purpose of this identifier is source address of this packet. The key purpose of this identifier is
to allow address removal without needing to know what the source to allow address removal without needing to know what the source
address actually is (thus allowing the use of NATs), when the subflow address actually is (thus allowing the use of NATs), when the subflow
is no longer available. The sender can signal this to the receiver is no longer available. The sender can signal this to the receiver
via the REMOVE_ADDR option (Section 4.3.2). It also allows via the REMOVE_ADDR option (Section 3.5.2). It also allows
correlation between new connection attempts and address signalling correlation between new subflow setup attempts and address signalling
(Section 4.3.1), to prevent setting up duplicate subflows on the same (Section 3.5.1), to prevent setting up duplicate subflows on the same
path. path.
The Address IDs of the subflow used in the initial SYN exchange of The Address IDs of the subflow used in the initial SYN exchange of
the first subflow in the connection are implicit, and have the value the first subflow in the connection are implicit, and have the value
zero. zero.
The Address ID must be stored by the receiver in a data structure The Address ID must be stored by the receiver in a data structure
that gathers all the Address ID to address mappings for a connection that gathers all the Address ID to address mappings for a connection
identified by a token pair. In this way there is a stored mapping identified by a token pair. In this way there is a stored mapping
between Address ID, observed source address and token pair for future between Address ID, observed source address and token pair for future
processing of control information for a connection. processing of control information for a connection.
The MP_JOIN option MUST only be sent in segments with the SYN flag 3.3. General MPTCP Operation
set.
4.3. Address Knowledge Exchange (Path Management)
We use the term "path management" to refer to the exchange of
information about additional paths between endpoints, which in this
design is managed by multiple addresses at endpoints. For more
detail of the architectural thinking behind this design, see the
separate architecture document [3].
This design makes use of two methods of sharing such information,
used simultaneously. The first is the direct setup of new subflows,
already described in Section 4.2, where the initiator has an
additional address. The second method, described in the following
subsections, signals addresses explicitly to the other endpoint to
allow it to initiate new connections. The two mechanisms are
complementary: the first is implicit and simple, while the explicit
is more complex but is more robust. Together, the mechanisms allow
addresses to change in flight (and thus support operation through
NATs, since the source address need not be known), and also allow the
signalling of previously unknown addresses, and of addresses
belonging to other address families (e.g. IPv4 and IPv6).
Here is an example of typical operation of the protocol:
o An MPTCP connection is initially set up between address/port
A1 of host A and address/port B1 of host B. If host A is
multihomed, it can start an additional subflow from its address A2
to B1, by sending a SYN with a Join option from A2 to B1, using
B's previously declared token for this connection. Alternatively,
if B is multihomed, it can try to set up a new subflow from B2 to
A1, using A's previously declared token. In either case, the SYN
will be sent to the port already in use for the original subflow
on the receiving host.
o Simultaneously (or after a timeout), an ADD_ADDR option
(Section 4.3.1) is sent on an existing subflow, informing the
receiver of the sender's alternative address(es). The recipient
can use this information to open a new subflow to the sender's
additional address. In our example, A will send ADD_ADDR option
informing B of address A2. The mix of using the SYN-based option
and the ADD_ADDR option, including timeouts, is implementation-
specific and can be tailored to agree with local policy.
o If subflow A2-B1 is successfully set up, host B can use the Address
ID in the Join option to correlate this with the ADD_ADDR option
that will also arrive on an existing subflow; now B knows not to
open A2-B1, ignoring the ADD_ADDR. Otherwise, if B has not
received the A2-B1 SYN join but received the ADD_ADDR, it will try
to initiate a new subflow from one or more of its addresses to
address A2. This permits new sessions to be opened if one
endpoint is behind a NAT. A slight security improvement can be
gained if a host ensures there is a correlated ADD_ADDR option
before responding to the SYN.
Other ways of using the two signaling mechanisms are possible; for
instance, signaling addresses in other address families can only be
done explicitly using the Add Address option.
4.3.1. Address Advertisement
The Add Address (ADD_ADDR) TCP Option announces additional addresses
on which an endpoint can be reached (Figure 6). It can announce
several (ID, address) pairs to the other endpoint. Multiple
addresses can be added in a single message if
there is sufficient TCP option space, otherwise multiple TCP messages
containing this option will be sent. This option can be used at any
time during a connection, depending on when the sender wishes to
enable multiple paths and/or when paths become available.
Every address has an ID which can be used for address removal, and
therefore endpoints must cache the mapping between ID and address.
This is also used to identify Join Connection options (Section 4.2)
relating to the same address, even when address translators are in
use. The ID must be unique to the sender and connection, per
address, but its mechanism for allocating such IDs is implementation-
specific.
This option is shown for IPv4. For IPv6, the IPVer field will read
6, and the length of the address will be 16 octets (instead of 4),
and the length of the option will be 2 + (18 * number_of_entries).
If there is sufficient TCP option space, multiple addresses can be
included, with an ID following on immediately from the previous
address. The number of addresses can be deduced from the option
length and version fields.
The 'P' bit is used to indicate the presence of an additional two
octets specifying the port number to use. Although it is expected
that the majority of use cases will use the same port pairs as used
for the initial subflow (e.g. port 80 remains port 80 on all
subflows, as does the ephemeral port at the client), there may be
cases (such as port-based load balancing) where the explicit
specification of a different port is required. If the P bit is not
set, MPTCP MUST attempt to connect to the specified address on
the same port as is already in use by the signalling subflow.
[TBD: We could make use of an additional flag, as follows. Exact
behaviour to be worked out: The 'B' bit is used to indicate that this
specified address (and port, if applicable) should be treated as a
backup subflow to use only in the event of failure of other working
subflows. A receiver of this option SHOULD set up a TCP subflow to
the specified address and port, but SHOULD NOT send data on it until
the other paths have failed.]
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+-------+-------+
| Kind=ADD_ADDR | Length | Address ID | IPVer |(res)|P|
+---------------+---------------+---------------+-------+-------+
| Address (IPv4 - 4 octets / IPv6 - 16 octets) |
+-------------------------------+-------------------------------+
| Port (2 octets if P=1) | ...
+-------------------------------+
( ... further ID/Version/Address/Port fields as required ... )
Figure 6: Add Address (ADD_ADDR) option (shown for IPv4)
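To make the layout in Figure 6 concrete, the following is a minimal parsing sketch. It assumes the input is the option body after the Kind and Length octets, that IPVer occupies the upper four bits of the second octet of each entry, and that P is the least significant bit, as the figure suggests. The function name parse_add_addr_entries is hypothetical.

```python
import ipaddress
import struct
from typing import List, Optional, Tuple, Union

Address = Union[ipaddress.IPv4Address, ipaddress.IPv6Address]


def parse_add_addr_entries(body: bytes) -> List[Tuple[int, Address, Optional[int]]]:
    """Return (address_id, address, port_or_None) for each entry in the body."""
    entries = []
    i = 0
    while i < len(body):
        if len(body) - i < 2:
            raise ValueError("truncated entry header")
        addr_id = body[i]
        ip_ver = body[i + 1] >> 4          # assumed: IPVer in the upper nibble
        has_port = bool(body[i + 1] & 0x01)  # assumed: P is the lowest bit
        i += 2
        if ip_ver == 4:
            addr_len = 4
        elif ip_ver == 6:
            addr_len = 16
        else:
            raise ValueError("unknown IP version")
        needed = addr_len + (2 if has_port else 0)
        if len(body) - i < needed:
            raise ValueError("truncated address or port")
        raw = body[i:i + addr_len]
        addr = ipaddress.IPv4Address(raw) if ip_ver == 4 else ipaddress.IPv6Address(raw)
        i += addr_len
        port = None
        if has_port:
            (port,) = struct.unpack("!H", body[i:i + 2])  # network byte order
            i += 2
        entries.append((addr_id, addr, port))
    return entries
```

An IPv6 entry without a port occupies 18 octets of body, which agrees with the option length of 2 + (18 * number_of_entries) given above.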
Due to the proliferation of NATs, it is reasonably likely that one
endpoint may attempt to advertise private addresses [6]. We do not
wish to blanket prohibit this, since there may be cases where both
endpoints have additional interfaces on the same private network. We
must ensure, however, that such advertisements do not cause harm.
The standard mechanism to create a new subflow (Section 4.2) contains
a randomly-generated 32-bit token that uniquely identifies the
connection to the receiving endpoint. If the token is unknown, the
endpoint will respond with a RST. If the token is known, connection
setup will continue, but the sender's token will be sent back. In
order for a new subflow to be set up, both tokens must match what each
endpoint expects. This will provide sufficient protection against
two unconnected endpoints accidentally setting up a new subflow upon
the signal of a private address (furthermore, the mismatch in Data
Sequence Number that would occur would provide even further
protection).
Ideally, we'd like to ensure the ADD_ADDR (and REMOVE_ADDR) option is
sent reliably and in order to the other end. This is to ensure that
we don't close the connection when remove/add addresses are processed
in reverse order, and to ensure that all possible paths are used. We
note, however, that losing reliability and ordering will not break
the multipath connection; it will just reduce the opportunity to
open multiple paths and to survive different patterns of path
failures.
Subflow level ACKs do not cover options, so if we want explicit
guarantees we need to build in other mechanisms. Solutions include
echoing the options and sending one option per RTT, or adding a
sequence number to the option which is explicitly acked in another
option. However, we feel these mechanisms' added complexity is not
worth the benefits they bring. There are two basic failure modes for
options: a) every new option gets stripped or b) some options get
stripped, randomly. The second case looks more like a middlebox
implementation error, so we believe it is not worth optimizing for.
In the first case, resending the option on a different subflow is the
thing to do. To achieve similar reliability without explicit ACKs,
we propose sending all ADD_ADDR/REMOVE_ADDR options on all existing
subflows. If ordering is needed, we should only send one ADD_ADDR/
REMOVE_ADDR option per RTT (modulo lost packets at subflow level).
When receiving an ADD_ADDR message with an address ID already in use
for that connection, the receiver SHOULD silently ignore the
ADD_ADDR.
During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 4.4.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR option
on separate (either duplicate, or normal but lacking any payload)
ACKs. As with all TCP Options, the ADD_ADDR option does not have
reliable delivery. Therefore, a sender should send a duplicate ACK
with this option on all available subflows.
4.3.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously-announced
address becomes invalid (e.g. if the interface disappears), the
affected endpoint should announce this so that the other endpoint can
remove subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 7), which will remove a previously-added address (or list of
addresses) from a connection and terminate any subflows currently
using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR should first trigger
the sending of a TCP Keepalive [7] on the path, and if a response is
received the path is not removed.
The sending and receipt (if no keepalive response was received) of
this message should trigger the sending of FINs by both endpoints on
the affected subflow(s) (if possible), as a courtesy to allow
middlebox state to be cleaned up, but endpoints may clean up their internal state
without a long timeout.
Address removal is undertaken by ID, so as to permit the use of NATs
and other middleboxes. If there is no address at the requested ID,
the receiver will silently ignore the request.
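The receiver-side handling of Address IDs described in this section and in Section 4.3.1 might be sketched as below. This is a simplified illustration: AddressTable and its methods are hypothetical names, addresses are plain strings, and the keepalive probe is modelled as a caller-supplied function that reports whether a response was received for that address.

```python
from typing import Callable, Dict


class AddressTable:
    """Hypothetical per-connection mapping from Address ID to address."""

    def __init__(self) -> None:
        self.by_id: Dict[int, str] = {}

    def add(self, addr_id: int, address: str) -> bool:
        """Handle ADD_ADDR; an ID already in use is silently ignored."""
        if addr_id in self.by_id:
            return False
        self.by_id[addr_id] = address
        return True

    def remove(self, addr_id: int, keepalive_answered: Callable[[str], bool]) -> bool:
        """Handle REMOVE_ADDR by ID; returns True only if the address was removed."""
        address = self.by_id.get(addr_id)
        if address is None:
            return False  # no address at this ID: silently ignore the request
        if keepalive_answered(address):
            return False  # the path still responds, so it is not removed
        del self.by_id[addr_id]
        return True
```

Keying the table by ID rather than by address is what lets removal work through NATs, since the receiver never needs to know the sender's real source address.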
The standard way to close a subflow (so long as it is still
functioning) is to use a FIN exchange as in regular TCP - for more
information, see Section 4.5.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+
|Kind=REMOVEADDR| Length = 2+n | Address ID | ...
+---------------+---------------+---------------+
Figure 7: Remove Address (REMOVE_ADDR) option
4.4. General MPTCP Operation
This section discusses operation of MPTCP for data transfer. At a This section discusses operation of MPTCP for data transfer. At a
high level, an MPTCP implementation will take one input data stream high level, an MPTCP implementation will take one input data stream
from an application, and split it into one or more subflows, with from an application, and split it into one or more subflows, with
sufficient control information to allow it to be reassembled and sufficient control information to allow it to be reassembled and
delivered reliably and in-order to the recipient application. The delivered reliably and in-order to the recipient application. The
following subsections define this behaviour in detail. following subsections define this behaviour in detail.
4.4.1. Data Sequence Numbering 3.3.1. Data Sequence Numbering
The data stream as a whole can be reassembled through the use of the The data stream as a whole can be reassembled through the use of the
Data Sequence Mapping (DSN_MAP, Figure 8) option, which defines the Data Sequence Mapping (DSN_MAP, Figure 6) option, which defines the
mapping from the data sequence number to the subflow sequence number. mapping from the data sequence number to the subflow sequence number.
This is used by the receiver to ensure in-order delivery to the This is used by the receiver to ensure in-order delivery to the
application layer. Meanwhile, the subflow-level sequence numbers application layer. Meanwhile, the subflow-level sequence numbers
(i.e. the regular sequence numbers in the TCP header) have subflow- (i.e. the regular sequence numbers in the TCP header) have subflow-
only relevance. It is expected (but not mandated) that SACK [8] is only relevance. It is expected (but not mandated) that SACK [6] is
used at the subflow level to improve efficiency. used at the subflow level to improve efficiency.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
| Kind=DSN_MAP | Length | Data Sequence Number ... : | Kind=DSN_MAP | Length | Data Sequence Number ... :
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
: ... ( (length-8) octets ) | Data-level Length (2 octets) | : ... ( (length-12) octets ) | Data-level Length (2 octets) |
+-------------------------------+------------------------------+ +-------------------------------+------------------------------+
| Subflow Sequence Number (4 octets) | | Subflow Sequence Number (4 octets) |
+-------------------------------+------------------------------+ +--------------------------------------------------------------+
| CRC-32C (4 octets) | | CRC-32C (4 octets) |
+--------------------------------------------------------------+ +--------------------------------------------------------------+
Figure 8: Data Sequence Mapping (DSN_MAP) option Figure 6: Data Sequence Mapping (DSN_MAP) option
TBD: We could combine this with the DATA_ACK by adding the local DSN
too. However, this may not always be needed in this option, and this
option will not be present on all packets that include a DATA_ACK.
There is also the additional question of how to handle the different
possible DSN lengths. We could make only 4 and 8 octet ones valid,
and both must be the same?
This option specifies a full mapping from data sequence number to This option specifies a full mapping from data sequence number to
subflow sequence number, informing the receiver that there is a one- subflow sequence number, informing the receiver that there is a one-
to-one correspondence between the two sequence spaces for the to-one correspondence between the two sequence spaces for the
specified length. The purpose of the explicit mapping is to assist specified length (number of bytes of data). The purpose of the
with compatibility with situations where TCP/IP segmentation or explicit mapping is to assist with compatibility with situations
coalescing is undertaken separately from the stack that is generating where TCP/IP segmentation or coalescing is undertaken separately from
the data flow (e.g. through the use of TCP segmentation offloading on the stack that is generating the data flow (e.g. through the use of
network interface cards, or by middleboxes such as performance TCP segmentation offloading on network interface cards, or by
enhancing proxies). middleboxes such as performance enhancing proxies). It also allows a
single mapping to cover many packets, which may be useful in bulk
transfer situations.
The data sequence number specified in this option is absolute, The data sequence number specified in this option is absolute,
whereas the subflow sequence numbering is relative (the SYN at the whereas the subflow sequence numbering is relative (the SYN at the
start of the subflow has subflow sequence number 1). This is to allow start of the subflow has relative subflow sequence number 1). This
middleboxes to change the Initial Sequence Number of a subflow, since is to allow middleboxes to change the Initial Sequence Number of a
the data stream itself will not be affected (some firewalls do ISN subflow, since the data stream itself will not be affected (some
randomization). firewalls do ISN randomization).
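As a sketch of how a receiver might apply such a mapping, the following translates a relative subflow sequence number into an absolute data sequence number. The names DsnMapping, to_dsn and relative_seq are hypothetical. The sketch assumes that a data-level length of 0 denotes an open-ended ("infinite") mapping, and that the SYN carries relative sequence number 1 as stated above.

```python
from dataclasses import dataclass
from typing import Optional

MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def relative_seq(tcp_seq: int, isn: int) -> int:
    """Relative subflow sequence number; the SYN (tcp_seq == isn) maps to 1."""
    return (tcp_seq - isn + 1) & MASK32


@dataclass(frozen=True)
class DsnMapping:
    dsn: int          # absolute data sequence number of the first mapped byte
    subflow_seq: int  # relative subflow sequence number of the first mapped byte
    length: int       # bytes covered; 0 is assumed to mean "infinite"

    def to_dsn(self, rel_seq: int) -> Optional[int]:
        """Return the data sequence number for rel_seq, or None if not covered."""
        offset = rel_seq - self.subflow_seq
        if offset < 0:
            return None
        if self.length != 0 and offset >= self.length:
            return None
        return (self.dsn + offset) & MASK64
```

Because the mapping refers only to relative subflow sequence numbers, a middlebox that rewrites the Initial Sequence Number changes tcp_seq and isn together and leaves the result unchanged.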
The final four octets of this option contain a checksum of the data The final four octets of this option contain a checksum of the data
that this mapping covers. This is a CRC-32C checksum, the same as that this mapping covers. This is a CRC-32C checksum, the same as
used in SCTP [9]. This is used to detect if the payload has been used in SCTP [7]. This is used to detect if the payload has been
adjusted in any way by a non-MPTCP-aware middlebox. If this checksum adjusted in any way by a non-MPTCP-aware middlebox. If this checksum
fails, it will trigger a failure of the subflow, or a fallback to fails, it will trigger a failure of the subflow, or a fallback to
regular TCP, as documented in Section 4.6. regular TCP, as documented in Section 3.6.
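For reference, a bitwise CRC-32C (Castagnoli) computation can be sketched as follows. This is a deliberately simple, unoptimised illustration; the function name crc32c is an assumption, and how the result is placed in the option (byte order, exact data covered) is not addressed here. It uses the reflected form 0x82F63B78 of the polynomial 0x1EDC6F41.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC-32C over data (reflected polynomial 0x82F63B78)."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ 0x82F63B78
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def payload_unmodified(payload: bytes, checksum_from_option: int) -> bool:
    """True if the payload still matches the checksum carried in the mapping."""
    return crc32c(payload) == checksum_from_option
```

The standard check value for this CRC is 0xE3069283 for the ASCII input "123456789". A production implementation would normally use a table-driven or hardware-assisted variant.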
TBD: Is this the most appropriate checksum, or would the IP checksum
algorithm be more appropriate?
A mapping is unique, in that the subflow sequence number is bound to A mapping is unique, in that the subflow sequence number is bound to
the data sequence number after the mapping has been processed. It is the data sequence number after the mapping has been processed. It is
not possible to change this mapping afterwards (although the length not possible to change this mapping afterwards (although the length
of a mapping can extend); however, the same data sequence number can of a mapping can extend); however, the same data sequence number can
be mapped on different subflows for retransmission purposes (see be mapped on different subflows for retransmission purposes (see
Section 4.4.4). Section 3.3.4).
To avoid possible deadline scenarios, subflow-level processing should To avoid possible deadlock scenarios, subflow-level processing should
be undertaken separately from that at connection-level. Therefore, be undertaken separately from that at connection-level. Therefore,
even if a mapping does not exist from the subflow space to the data- even if a mapping does not exist from the subflow space to the data-
level space, the data should still be ACKed at the subflow. This level space, the data should still be ACKed at the subflow. This
data cannot, however, be acknowledged at the data level data cannot, however, be acknowledged at the data level
(Section 4.4.2) because its data sequence numbers are unknown. (Section 3.3.2) because its data sequence numbers are unknown.
Implementations MAY hold onto such unmapped data for a short while in Implementations MAY hold onto such unmapped data for a short while in
the expectation that a mapping will arrive shortly. Such unmapped the expectation that a mapping will arrive shortly. Such unmapped
data cannot be counted as being within the receive window because data cannot be counted as being within the receive window because
this is relative to the data sequence numbers, so if the receiver this is relative to the data sequence numbers, so if the receiver
runs out of memory to hold this data, it will have to be discarded. runs out of memory to hold this data, it will have to be discarded.
If a mapping for that subflow-level sequence space does not arrive If a mapping for that subflow-level sequence space does not arrive
within a receive window of data, that subflow should be treated as within a receive window of data, that subflow should be treated as
broken, closed with an RST, and any unmapped data silently discarded. broken, closed with an RST, and any unmapped data silently discarded.
Data sequence numbers are always 64-bit quantities, and MUST be Data sequence numbers are always 64-bit quantities, and MUST be
maintained as such in implementations. If a connection is maintained as such in implementations. If a connection is
progressing at a slow rate, so protection against wrapped sequence progressing at a slow rate, so protection against wrapped sequence
numbers is not required, and if security requirements against blind numbers is not required, and if security requirements against blind
insertion attacks are not stringent, then it is permissible to insertion attacks are not stringent, then it is permissible to
include just the lower 32 bits of the sequence number in the DSN_MAP include just the lower 32 bits of the sequence number in the DSN_MAP
option as an optimization. Implementations MUST accept this and option as an optimization. Implementations MUST accept this and
implicitly promote it to a 64-bit quantity by incrementing the upper implicitly promote it to a 64-bit quantity by incrementing the upper
32 bits of sequence number the maintain each time the lower 32 bits 32 bits of sequence number each time the lower 32 bits wrap. By
wrap. By default, the full 64 bit DSN_MAP should be sent. Security default, the full 64 bit DSN_MAP should be sent. Security
implications are discussed in Section 5. implications are discussed in Section 5.
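The implicit promotion of a 32-bit value to the full 64-bit data sequence number might be sketched as below. This is a minimal sketch that assumes truncated values arrive in increasing order, so that any decrease in the lower 32 bits indicates a wrap; it does not handle reordering across a wrap. The class name DsnPromoter is hypothetical.

```python
MASK32 = 0xFFFFFFFF


class DsnPromoter:
    """Hypothetical tracker that extends 32-bit DSNs to 64 bits."""

    def __init__(self, initial_dsn64: int) -> None:
        self.last = initial_dsn64  # last full 64-bit DSN seen

    def promote(self, low32: int) -> int:
        upper = self.last >> 32
        if low32 < (self.last & MASK32):
            # the lower 32 bits wrapped, so increment the upper 32 bits
            upper = (upper + 1) & MASK32
        self.last = (upper << 32) | low32
        return self.last
```

For example, starting from 0x1FFFFFFF0, a received value of 0x00000010 is promoted to 0x200000010.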
As with the standard TCP sequence number, the data sequence number As with the standard TCP sequence number, the data sequence number
should not start at zero, but at a random value to make blind session should not start at zero, but at a random value to make blind session
hijacking harder. This is done by including the most significant six hijacking harder. This is done by including the most significant six
octets of the initial data sequence number in the MP_CAPABLE option octets of the initial data sequence number in the MP_CAPABLE option
in the initial connection SYN (which itself occupies one octet of in the initial connection SYN (which itself occupies one octet of
data sequence space; see Section 4.1). data sequence space; see Section 3.1).
The DSN_MAP option does not need to be included in every MPTCP The DSN_MAP option does not need to be included in every MPTCP
packet, as long as the subflow sequence space in that packet is packet, as long as the subflow sequence space in that packet is
covered by a mapping known at the receiver. This can be used to covered by a mapping known at the receiver. This can be used to
reduce overhead in cases where the mapping is known in advance; one reduce overhead in cases where the mapping is known in advance; one
such case is when there is a single subflow between the endpoints, such case is when there is a single subflow between the endpoints,
another is when segments of data are scheduled in larger than packet- another is when segments of data are scheduled in larger than packet-
sized chunks. An "infinite" mapping can be used to fallback to sized chunks. An "infinite" mapping can be used to fallback to
regular TCP (see Section 4.6), which is achieved by setting the data- regular TCP by mapping the subflow-level data to the connection-level
level length field to the reserved value of 0. data for the remainder of the connection (see Section 3.6). This is
achieved by setting the data-level length field to the reserved value
4.4.2. Data Acknowledgements of 0.
In a perfect world, it would be possible to infer the acknowledgment 3.3.2. Data Acknowledgements
of data at the data-level from the receipt of subflow acks.
Unfortunately the existence of certain middleboxes that pro-actively
ACK packets might cause deadlock conditions if data were acked
at the subflow level but then failed to reach the receiver. This sort
of bad interaction might be especially prevalent when the receiver is
mobile.
To provide full end-to-end resilience, MPTCP provides a connection- To provide full end-to-end resilience, MPTCP provides a connection-
level acknowledgement, the DATA_ACK, illustrated in Figure 9, to act level acknowledgement, the DATA_ACK, illustrated in Figure 7, to act
as a cumulative ACK for the connection as a whole. This is analogous as a cumulative ACK for the connection as a whole. This is analogous
to the behaviour of the standard TCP cumulative ACK in TCP SACK - to the behaviour of the standard TCP cumulative ACK in TCP SACK -
indicating how much data has been successfully received (with no indicating how much data has been successfully received (with no
holes). holes).
The rationale for the inclusion of the DATA_ACK includes the
existence of certain middleboxes that pro-actively ACK packets, and
thus might cause deadlock conditions if data were acked at the
subflow level but then failed to reach the receiver. This sort of bad
interaction might be especially prevalent when the receiver is
mobile. The DATA_ACK ensures the data has been delivered to the
receiver.
An MPTCP sender MUST only free data from the send buffer when it has An MPTCP sender MUST only free data from the send buffer when it has
been acknowledged by both a DATA_ACK received on any subflow and at been acknowledged by both a DATA_ACK received on any subflow and at
the subflow level by any subflows the data was sent on. The former the subflow level by any subflows the data was sent on. The former
condition ensures liveness of the connection and the latter condition condition ensures liveness of the connection and the latter condition
ensures liveness and self-consistency of a subflow when data needs to ensures liveness and self-consistency of a subflow when data needs to
be retransmitted. be retransmitted.
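The freeing rule above can be sketched as a small predicate. This is an illustrative Python sketch only, not part of the specification: the SentSegment type and its field names are hypothetical, "any subflows the data was sent on" is read here as "every subflow the data was sent on", and sequence number wraparound is ignored.

```python
from dataclasses import dataclass, field


@dataclass
class SentSegment:
    """One chunk of connection-level data held in the send buffer (hypothetical type)."""
    data_seq_end: int                                  # data sequence number just past this chunk
    sent_on: set = field(default_factory=set)          # ids of subflows the chunk was sent on
    subflow_acked: set = field(default_factory=set)    # ids of subflows that have ACKed it


def can_free(seg: SentSegment, data_ack: int) -> bool:
    """Return True only when both conditions from the text hold."""
    # Condition 1: a DATA_ACK (received on any subflow) covers the chunk.
    acked_at_data_level = data_ack >= seg.data_seq_end
    # Condition 2: every subflow the chunk was sent on has ACKed it at subflow level.
    acked_on_all_subflows = seg.sent_on <= seg.subflow_acked
    return acked_at_data_level and acked_on_all_subflows
```

With this reading, a chunk sent on two subflows stays in the buffer until both subflows have ACKed it, even if the DATA_ACK has already advanced past it.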
The DATA_ACK option SHOULD be included in segments (data or pure The DATA_ACK option MAY be included in all segments, analogous to a
ACKs) whenever the DATA_ACK advances. This ensures the sender buffer standard TCP ACK. However, optimisations SHOULD be considered in
is freed, while reducing overhead when the data transfer is more advanced implementations, where the DATA_ACK option is present
unidirectional. in segments (data or pure ACKs) only when the DATA_ACK advances, and
this behaviour MUST be treated as valid. This behaviour ensures the
TBD: include in a single segment after a change, or in a few sender buffer is freed, while reducing overhead when the data
segments? Probably two makes sense if the segments are pure ACKs, as transfer is unidirectional.
they may be lost.
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
| Kind=DATA_ACK | Length | Data Sequence Number ... : | Kind=DATA_ACK | Length | Data Sequence Number ... :
+---------------+---------------+------------------------------+ +---------------+---------------+------------------------------+
: ... ( (length-8) octets ) | : ... ( (length-2) octets ) |
+-------------------------------+ +-------------------------------+
Figure 9: Connection-level Acknowledgement (DATA_ACK) Figure 7: Connection-level Acknowledgement (DATA_ACK)
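As a rough illustration of the layout in the figure, the following Python sketch packs and unpacks a DATA_ACK option. It follows the -01 reading, in which the Data Sequence Number occupies (length-2) octets. The kind value is a placeholder, since no value is given in this text, and the choice of a 4- or 8-octet sequence number is an assumption of the sketch.

```python
KIND_DATA_ACK = 0xFD  # placeholder: the option kind is not given in this text


def encode_data_ack(data_seq: int, width: int = 8) -> bytes:
    """Build a DATA_ACK option; width is the sequence number size in octets (assumed)."""
    length = 2 + width  # kind octet + length octet + sequence number
    return bytes([KIND_DATA_ACK, length]) + data_seq.to_bytes(width, "big")


def decode_data_ack(option: bytes) -> int:
    """Return the cumulative data sequence number carried in the option."""
    length = option[1]
    # The sequence number fills the (length-2) octets after kind and length.
    return int.from_bytes(option[2:length], "big")
```

The decoder derives the sequence number width from the Length field alone, so the same function handles either width.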
4.4.3. Receiver Considerations 3.3.3. Receiver Considerations
Regular TCP advertises a receive window in each packet, telling the Regular TCP advertises a receive window in each packet, telling the
sender how much data the receiver is willing to accept past the sender how much data the receiver is willing to accept past the
cumulative ack. The receive window is used to implement flow cumulative ack. The receive window is used to implement flow
control, throttling down fast senders when receivers cannot keep up. control, throttling down fast senders when receivers cannot keep up.
MPTCP also uses a unique receive window, shared between the subflows. MPTCP also uses a unique receive window, shared between the subflows.
The idea is to allow any subflow to send data as long as the receiver The idea is to allow any subflow to send data as long as the receiver
is willing to accept it; the alternative, maintaining per subflow is willing to accept it; the alternative, maintaining per subflow
receive windows, could end-up stalling some subflows while others receive windows, could end-up stalling some subflows while others
skipping to change at page 22, line 19 skipping to change at page 17, line 5
have already been accepted in this way, so they can be ACKed have already been accepted in this way, so they can be ACKed
appropriately when the hole in the subflow sequence space is appropriately when the hole in the subflow sequence space is
subsequently filled. An implementation that does store such metadata subsequently filled. An implementation that does store such metadata
would still progress (the rules for freeing data at the sender ensure would still progress (the rules for freeing data at the sender ensure
this), but unnecessary retransmissions will result. this), but unnecessary retransmissions will result.
It is important for implementers to understand how large a receiver It is important for implementers to understand how large a receiver
buffer is appropriate. The lower bound for full network utilization buffer is appropriate. The lower bound for full network utilization
is the maximum bandwidth-delay product of any of the paths. However is the maximum bandwidth-delay product of any of the paths. However
this might be insufficient when a packet is lost on a slower subflow this might be insufficient when a packet is lost on a slower subflow
and needs to be retransmitted (see Section 4.4.4). A tight upper and needs to be retransmitted (see Section 3.3.4). A tight upper
bound would be the maximum RTT of any path multiplied by the total bound would be the maximum RTT of any path multiplied by the total
bandwidth available across all paths. This permits all subflows to bandwidth available across all paths. This permits all subflows to
continue at full speed while a packet is fast-retransmitted on the continue at full speed while a packet is fast-retransmitted on the
maximum RTT path. Even this might be insufficient to maintain full maximum RTT path. Even this might be insufficient to maintain full
performance in the event of a retransmit timeout on the maximum RTT performance in the event of a retransmit timeout on the maximum RTT
path. It is for future study to determine the relationship between path. It is for future study to determine the relationship between
retransmission strategies and receive buffer sizing. retransmission strategies and receive buffer sizing.
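The two bounds described above can be worked through numerically. The Python sketch below is illustrative only: the function name and the example path figures are made up, and integer units (bytes per second, milliseconds) are used to keep the arithmetic exact.

```python
def receive_buffer_bounds(paths):
    """paths: list of (bandwidth_bytes_per_s, rtt_ms) tuples, one per subflow.

    Returns (lower, upper) in bytes:
      lower = maximum bandwidth-delay product of any single path
      upper = maximum RTT of any path * total bandwidth across all paths
    """
    lower = max(bw * rtt for bw, rtt in paths) // 1000
    upper = max(rtt for _, rtt in paths) * sum(bw for bw, _ in paths) // 1000
    return lower, upper


# Hypothetical example: 10 Mbit/s at 20 ms RTT, and 1 Mbit/s at 200 ms RTT.
paths = [(1_250_000, 20), (125_000, 200)]
lower, upper = receive_buffer_bounds(paths)
print(lower, upper)  # prints "25000 275000"
```

In this example both paths have the same bandwidth-delay product (25,000 bytes), yet the upper bound is eleven times larger, because the fast path keeps sending for the whole 200 ms it takes to repair a loss on the slow path.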
4.4.4. Sender Considerations 3.3.4. Sender Considerations
The sender remembers receiver window advertisements from the The sender remembers receiver window advertisements from the
receiver. It should only update its local receive window values when receiver. It should only update its local receive window values when
the largest sequence number allowed (i.e. DATA_ACK + receive window) the largest sequence number allowed (i.e. DATA_ACK + receive window)
increases. This is important to allow using paths with different increases. This is important to allow using paths with different
RTTs, and thus different feedback loops. RTTs, and thus different feedback loops.
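A minimal Python sketch of this update rule follows. The class and method names are hypothetical, and sequence number wraparound is ignored for brevity.

```python
class SendWindow:
    """Tracks the largest data sequence number the receiver has allowed so far."""

    def __init__(self):
        self.right_edge = 0  # DATA_ACK + receive window, highest value seen

    def on_advertisement(self, data_ack: int, rcv_wnd: int) -> bool:
        """Apply an advertisement; return True if the stored window was updated."""
        candidate = data_ack + rcv_wnd
        if candidate > self.right_edge:
            self.right_edge = candidate
            return True
        # A stale advertisement, for example one delayed on a high-RTT subflow,
        # must not pull the window edge backwards.
        return False
```

An advertisement that arrives late on a slow subflow is simply ignored, so feedback from a fast path is never overridden by older feedback from a slow one.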
Some classes of middleboxes may alter the TCP-level receive window. Some classes of middleboxes may alter the TCP-level receive window.
Typically these will shrink the offered window, although for short Typically these will shrink the offered window, although for short
periods of time it may be possible for the window to be larger periods of time it may be possible for the window to be larger
skipping to change at page 23, line 27 skipping to change at page 18, line 13
an endpoint must still retransmit the original data on the original an endpoint must still retransmit the original data on the original
subflow, in order to preserve the subflow integrity (middleboxes subflow, in order to preserve the subflow integrity (middleboxes
could replay old data, and/or could reject holes in subflows), and a could replay old data, and/or could reject holes in subflows), and a
receiver will ignore these retransmissions. While this is clearly receiver will ignore these retransmissions. While this is clearly
suboptimal, for compatibility reasons this is the best behaviour. suboptimal, for compatibility reasons this is the best behaviour.
Optimisations could be negotiated in future versions of this Optimisations could be negotiated in future versions of this
protocol. protocol.
This protocol specification does not mandate any mechanisms for This protocol specification does not mandate any mechanisms for
handling retransmissions, and much will be dependent upon local handling retransmissions, and much will be dependent upon local
policy (as discussed in Section 4.4.6). One can imagine aggressive policy (as discussed in Section 3.3.6). One can imagine aggressive
connection level retransmissions policies where every packet lost at connection level retransmissions policies where every packet lost at
subflow level is retransmitted on a different subflow (hence wasting subflow level is retransmitted on a different subflow (hence wasting
bandwidth but possibly reducing application-to-application delays), bandwidth but possibly reducing application-to-application delays),
or conservative retransmission policies where connection-level or conservative retransmission policies where connection-level
retransmits are only used after a few subflow level retransmission retransmits are only used after a few subflow level retransmission
timeouts occur. timeouts occur.
It is envisaged that a standard connection-level retransmission It is envisaged that a standard connection-level retransmission
mechanism would be implemented around a connection-level data queue: mechanism would be implemented around a connection-level data queue:
all segments that haven't been DATA_ACKed are stored. A timer (based all segments that haven't been DATA_ACKed are stored. A timer (based
skipping to change at page 24, line 10 skipping to change at page 18, line 45
failed after a predefined upper bound on retransmissions is reached, failed after a predefined upper bound on retransmissions is reached,
and only then delete the outstanding data segments. and only then delete the outstanding data segments.
A sender will maintain connection level timers for unacknowledged A sender will maintain connection level timers for unacknowledged
segments. These timers will be based on the subflow timers, and will segments. These timers will be based on the subflow timers, and will
guard against pro-active acking by middleboxes. guard against pro-active acking by middleboxes.
The send buffer must be, at the minimum, as big as the receive The send buffer must be, at the minimum, as big as the receive
buffer, to enable the sender to reach maximum throughput. buffer, to enable the sender to reach maximum throughput.
4.4.5. Congestion Control Considerations 3.3.5. Congestion Control Considerations
Different subflows in an MPTCP connection have different congestion Different subflows in an MPTCP connection have different congestion
windows. To achieve fairness at bottlenecks and resource pooling, it windows. To achieve fairness at bottlenecks and resource pooling, it
is necessary to couple the congestion windows in use on each subflow, is necessary to couple the congestion windows in use on each subflow,
in order to push most traffic to uncongested links. One algorithm in order to push most traffic to uncongested links. One algorithm
for achieving this is presented in [4]; the algorithm does not for achieving this is presented in [4]; the algorithm does not
achieve perfect resource pooling but is "safe" in that it is readily achieve perfect resource pooling but is "safe" in that it is readily
deployable in the current Internet. deployable in the current Internet.
It is foreseeable that different congestion controllers will be It is foreseeable that different congestion controllers will be
implemented for MPTCP, each aiming to achieve different properties in implemented for MPTCP, each aiming to achieve different properties in
the resource pooling/fairness/stability design space. Much research the resource pooling/fairness/stability design space. Much research
is expected in this area in the near future. is expected in this area in the near future.
Regardless of the algorithm used, the design of the MPTCP protocol Regardless of the algorithm used, the design of the MPTCP protocol
aims to provide the congestion control implementations sufficient aims to provide the congestion control implementations sufficient
information to take the right decisions; this information includes, information to take the right decisions; this information includes,
for each subflow, which packets were lost and when. for each subflow, which packets were lost and when.
4.4.6. Subflow Policy 3.3.6. Subflow Policy
Within a local MPTCP implementation, a host may use any local policy Within a local MPTCP implementation, a host may use any local policy
it wishes to decide how to share the traffic to be sent over the it wishes to decide how to share the traffic to be sent over the
available paths. available paths.
In the typical use case, where the goal is to maximise throughput, In the typical use case, where the goal is to maximise throughput,
all available paths will be used simultaneously for data transfer, all available paths will be used simultaneously for data transfer,
using coupled congestion control as described in [4]. It is using coupled congestion control as described in [4]. It is
expected, however, that other use cases will appear. expected, however, that other use cases will appear.
skipping to change at page 25, line 17 skipping to change at page 19, line 51
since receivers will often be the multihomed party, and may have to since receivers will often be the multihomed party, and may have to
pay for metered incoming bandwidth. Instead of incorporating complex pay for metered incoming bandwidth. Instead of incorporating complex
signalling, it is proposed to use existing TCP features to signal signalling, it is proposed to use existing TCP features to signal
priority implicitly. If a receiver wishes to keep a path active as a priority implicitly. If a receiver wishes to keep a path active as a
backup but wishes to prevent data being sent on that path, it could backup but wishes to prevent data being sent on that path, it could
stop sending ACKs for any data it receives on that path. The sender stop sending ACKs for any data it receives on that path. The sender
would interpret this as severe congestion or a broken path and stop would interpret this as severe congestion or a broken path and stop
using it. We do not advocate this method, however, since this will using it. We do not advocate this method, however, since this will
result in unnecessary retransmissions. result in unnecessary retransmissions.
Therefore, a proposal is to use ECN [10] to provide fake Therefore, a proposal is to use ECN [8] to provide fake congestion
congestion signals on paths that a receiver wishes to stop being used signals on paths that a receiver wishes to stop being used for data.
for data. This has the benefit of causing the sender to back off This has the benefit of causing the sender to back off without the
without the need to retransmit data unnecessarily, as in the case of need to retransmit data unnecessarily, as in the case of a lost ACK.
a lost ACK. This should be sufficient to allow a receiver to express This should be sufficient to allow a receiver to express their
their policy, although does not permit a rapid increase in throughput policy, although does not permit a rapid increase in throughput when
when switching to such a path. switching to such a path.
TBD: This is clearly an overload of the ECN signal, and as such other TBD: This is clearly an overload of the ECN signal, and as such other
solutions, such as explicitly signalling path operation preferences solutions, such as explicitly signalling path operation preferences
(such as in the reserved bits of certain TCP options, or through (such as in the reserved bits of certain TCP options, or through
entirely new options) may be a preferred solution. entirely new options) may be a preferred solution.
4.5. Closing a Connection 3.4. Closing a Connection
In regular TCP a FIN announces to the receiver that the sender has no In regular TCP a FIN announces to the receiver that the sender has no
more data to send. In order to allow subflows to operate more data to send. In order to allow subflows to operate
independently and to keep the appearance of TCP over the wire, a FIN independently and to keep the appearance of TCP over the wire, a FIN
in MPTCP only affects the subflow on which it is sent. This allows in MPTCP only affects the subflow on which it is sent. This allows
nodes to exercise considerable freedom over which paths are in use at nodes to exercise considerable freedom over which paths are in use at
any one time. The semantics of a FIN remain as for regular TCP, i.e. any one time. The semantics of a FIN remain as for regular TCP, i.e.
it is not until both sides have ACKed each other's FINs that the it is not until both sides have ACKed each other's FINs that the
subflow is fully closed. subflow is fully closed.
When an application calls close() on a socket, this indicates that it When an application calls close() on a socket, this indicates that it
has no more data to send, and for regular TCP this would result in a has no more data to send, and for regular TCP this would result in a
FIN on the connection. For MPTCP, an equivalent mechanism is needed, FIN on the connection. For MPTCP, an equivalent mechanism is needed,
and this is the DATA_FIN. This option, shown in Figure 10, is and this is the DATA_FIN. This option, shown in Figure 8, is
attached to a regular FIN option on a subflow. attached to a regular FIN option on a subflow.
A DATA_FIN is an indication that the sender has no more data to send, A DATA_FIN is an indication that the sender has no more data to send,
and as such can be used as a rapid indication of the end of data from and as such can be used as a rapid indication of the end of data from
a sender. A DATA_FIN, as with the FIN on a regular TCP connection, a sender. A DATA_FIN, as with the FIN on a regular TCP connection,
is a unidirectional signal. is a unidirectional signal.
A DATA_FIN occupies one octet (the final octet) of Data Sequence A DATA_FIN occupies one octet (the final octet) of Data Sequence
Number space. This number is included in the option, and will be Number space. This number is included in the option, and will be
ACKed at data level to ensure reliable delivery. ACKed at data level to ensure reliable delivery.
skipping to change at page 26, line 42 skipping to change at page 21, line 30
It should be noted that an endpoint may also send a FIN on an It should be noted that an endpoint may also send a FIN on an
individual subflow to shut it down, but this impact is limited to the individual subflow to shut it down, but this impact is limited to the
subflow in question. If all subflows have been closed with a FIN, subflow in question. If all subflows have been closed with a FIN,
that is equivalent to having closed the connection with a DATA_FIN. that is equivalent to having closed the connection with a DATA_FIN.
The full eight-byte data sequence number is always included in a The full eight-byte data sequence number is always included in a
DATA_FIN. DATA_FIN.
1 1
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+--------------+ +---------------+---------------+------------------------------+
| Kind=DATA_FIN | Length=10 | Data Sequence Number (8B) : | Kind=DATA_FIN | Length=10 | Data Sequence Number (8B) :
+---------------+---------------+---------------+--------------+ +---------------+---------------+------------------------------+
: Data Sequence Number (contd.) : : Data Sequence Number (contd.) :
+---------------+---------------+---------------+--------------+ +-------------------------------+------------------------------+
: Data Sequence Number (contd.)| : Data Sequence Number (contd.)|
+---------------+---------------+ +-------------------------------+
Figure 10: DATA_FIN option Figure 8: DATA_FIN option
4.6. Fallback 3.5. Address Knowledge Exchange (Path Management)
We use the term "path management" to refer to the exchange of
information about additional paths between endpoints, which in this
design is managed by multiple addresses at endpoints. For more
detail of the architectural thinking behind this design, see the
separate architecture document [3].
This design makes use of two methods of sharing such information,
used simultaneously. The first is the direct setup of new subflows,
already described in Section 3.2, where the initiator has an
additional address. The second method, described in the following
subsections, signals addresses explicitly to the other endpoint to
allow it to initiate new subflows. The two mechanisms are
complementary: the first is implicit and simple, while the explicit
is more complex but is more robust. Together, the mechanisms allow
addresses to change in flight (and thus support operation through
NATs, since the source address need not be known), and also allow the
signalling of previously unknown addresses, and of addresses
belonging to other address families (e.g. IPv4 and IPv6).
Here is an example of typical operation of the protocol:
o An MPTCP connection is initially set up between address/port
A1 of host A and address/port B1 of host B. If host A is
multihomed, it can start an additional subflow from its address A2
to B1, by sending a SYN with a Join option from A2 to B1, using
B's previously declared token for this connection. Alternatively,
if B is multihomed, it can try to set up a new subflow from B2 to
A1, using A's previously declared token. In either case, the SYN
will be sent to the port already in use for the original subflow
on the receiving host.
o Simultaneously (or after a timeout), an ADD_ADDR option
(Section 3.5.1) is sent on an existing subflow, informing the
receiver of the sender's alternative address(es). The recipient
can use this information to open a new subflow to the sender's
additional address. In our example, A will send an ADD_ADDR option
informing B of address A2. The mix of using the SYN-based option
and the ADD_ADDR option, including timeouts, is implementation-
specific and can be tailored to agree with local policy.
o If subflow A2-B1 is successfully set up, host B can use the Address
ID in the Join option to correlate this with the ADD_ADDR option
that will also arrive on an existing subflow; now B knows not to
open A2-B1, ignoring the ADD_ADDR. Otherwise, if B has not
received the A2-B1 SYN join but received the ADD_ADDR, it will try
to initiate a new subflow from one or more of its addresses to
address A2. This permits new sessions to be opened if one
endpoint is behind a NAT. A slight security improvement can be
gained if a host ensures there is a correlated ADD_ADDR option
before responding to the SYN.
Other ways of using the two signaling mechanisms are possible; for
instance, signaling addresses in other address families can only be
done explicitly using the Add Address option.
3.5.1. Address Advertisement
The Add Address (ADD_ADDR) TCP Option announces additional addresses
on which an endpoint can be reached (Figure 9). It can be used to
announce several (ID, address) pairs to the other
endpoint. Multiple addresses can be added in a single message if
there is sufficient TCP option space, otherwise multiple TCP messages
containing this option will be sent. This option can be used at any
time during a connection, depending on when the sender wishes to
enable multiple paths and/or when paths become available.
Every address has an ID which can be used for address removal, and
therefore endpoints must cache the mapping between ID and address.
This is also used to identify Join Connection options (Section 3.2)
relating to the same address, even when address translators are in
use. The ID must be unique to the sender and connection, per
address, but the mechanism for allocating such IDs is implementation-
specific.
This option is shown for IPv4. For IPv6, the IPVer field will read
6, and the length of the address will be 16 octets (instead of 4),
and the length of the option will be 2 + (18 * number_of_entries).
If there is sufficient TCP option space, multiple addresses can be
included, with an ID following on immediately from the previous
address. The number of addresses can be deduced from the option
length and version fields.
The 'P' bit is used to indicate the presence of an additional two
octets specifying the port number to use. Although it is expected
that the majority of use cases will use the same port pairs as used
for the initial subflow (e.g. port 80 remains port 80 on all
subflows, as does the ephemeral port at the client), there may be
cases (such as port-based load balancing) where the explicit
specification of a different port is required. If the P bit is not
set, MPTCP MUST attempt to connect to the specified address on
the same port as is already in use by the signalling subflow.
[TBD: We could make use of an additional flag, as follows. Exact
behaviour to be worked out: The 'B' bit is used to indicate that this
specified address (and port, if applicable) should be treated as a
backup subflow to use only in the event of failure of other working
subflows. A receiver of this option SHOULD set up a TCP subflow to
the specified address and port, but SHOULD NOT send data on it until
the other paths have failed.]
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+-------+-------+
| Kind=ADD_ADDR | Length | Address ID | IPVer |(res)|P|
+---------------+---------------+---------------+-------+-------+
| Address (IPv4 - 4 octets / IPv6 - 16 octets) |
+-------------------------------+-------------------------------+
| Port (2 octets if P=1) | ...
+-------------------------------+
( ... further ID/Version/Address/Port fields as required ... )
Figure 9: Add Address (ADD_ADDR) option (shown for IPv4)
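To make the layout above concrete, the following Python sketch encodes and decodes an ADD_ADDR option carrying one or more entries. It is a sketch under stated assumptions rather than a reference implementation: the kind value is a placeholder, the function names are hypothetical, and the bit positions (IPVer in the upper four bits, P in the lowest bit) are read off the figure.

```python
import ipaddress
import struct

KIND_ADD_ADDR = 0xFE  # placeholder: the option kind is not given in this text


def encode_add_addr(entries):
    """entries: iterable of (address_id, address_string, port_or_None)."""
    body = b""
    for addr_id, addr, port in entries:
        ip = ipaddress.ip_address(addr)
        p_bit = 1 if port is not None else 0
        # Second octet of an entry: IPVer (4 bits), reserved (3 bits), P (1 bit).
        body += struct.pack("!BB", addr_id, (ip.version << 4) | p_bit)
        body += ip.packed                      # 4 octets for IPv4, 16 for IPv6
        if p_bit:
            body += struct.pack("!H", port)    # port present only when P=1
    return struct.pack("!BB", KIND_ADD_ADDR, 2 + len(body)) + body


def decode_add_addr(option):
    """Return a list of (address_id, address_string, port_or_None)."""
    _kind, length = struct.unpack_from("!BB", option, 0)
    entries, offset = [], 2
    while offset < length:
        addr_id, flags = struct.unpack_from("!BB", option, offset)
        offset += 2
        version, p_bit = flags >> 4, flags & 0x01
        size = 4 if version == 4 else 16
        ip = ipaddress.ip_address(option[offset:offset + size])
        offset += size
        port = None
        if p_bit:
            (port,) = struct.unpack_from("!H", option, offset)
            offset += 2
        entries.append((addr_id, str(ip), port))
    return entries
```

A single IPv6 entry without a port encodes to 2 + 18 = 20 octets, matching the length formula in the text. The sketch does not enforce the 40-octet limit on TCP option space, which a real sender would have to respect.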
Due to the proliferation of NATs, it is reasonably likely that one
endpoint may attempt to advertise private addresses [9]. We do not
wish to blanket prohibit this, since there may be cases where both
endpoints have additional interfaces on the same private network. We
must ensure, however, that such advertisements do not cause harm.
The standard mechanism to create a new subflow (Section 3.2) contains
a randomly-generated 32-bit token that uniquely identifies the
connection to the receiving endpoint. If the token is unknown, the
endpoint will return with a RST. If the token is known, subflow
setup will continue, but the sender's token will be sent back. In
order for a new subflow to be set up, both tokens must match what each
endpoint expects. This will provide sufficient protection against
two unconnected endpoints accidentally setting up a new subflow upon
the signal of a private address (furthermore, the mismatch in Data
Sequence Number that would occur would provide even further
protection).
Ideally, we'd like to ensure the ADD_ADDR (and REMOVE_ADDR) option is
sent reliably and in order to the other end. This is to ensure that
we don't close the connection when remove/add addresses are processed
in reverse order, and to ensure that all possible paths are used. We
note, however, that losing reliability and ordering will not break
the multipath connection; it will just reduce the opportunity to
open multiple paths and to survive different patterns of path
failures.
Subflow level ACKs do not cover options, so if we want explicit
guarantees we need to build in other mechanisms. Solutions include
echoing the options and sending one option per RTT, or adding a
sequence number to the option which is explicitly acked in another
option. However, we feel these mechanisms' added complexity is not
worth the benefits they bring. There are two basic failure modes for
options: a) every new option gets stripped or b) some options get
stripped, randomly. The second case looks more like a middlebox
implementation error, so we believe it is not worth optimizing for.
In the first case, resending the option on a different subflow is the
thing to do. To achieve similar reliability without explicit ACKs,
we propose sending all ADD_ADDR/REMOVE_ADDR options on all existing
subflows. If ordering is needed, we should only send one ADD_ADDR/
REMOVE_ADDR option per RTT (modulo lost packets at subflow level).
When receiving an ADD_ADDR message with an address ID already in use
for that connection, the receiver SHOULD silently ignore the
ADD_ADDR.
During normal MPTCP operation, it is unlikely that there will be
sufficient TCP option space for ADD_ADDR to be included along with
those for data sequence numbering (Section 3.3.1). Therefore, it is
expected that an MPTCP implementation will send the ADD_ADDR option
on separate (either duplicate, or normal but lacking any payload)
ACKs.
As with all TCP Options, the ADD_ADDR option does not have reliable
delivery. Therefore, a sender should send a duplicate ACK with this
option on all available subflows.
3.5.2. Remove Address
If, during the lifetime of an MPTCP connection, a previously-announced
address becomes invalid (e.g. if the interface disappears), the
affected endpoint should announce this so that the other endpoint can
remove subflows related to this address.
This is achieved through the Remove Address (REMOVE_ADDR) option
(Figure 10), which will remove a previously-added address (or list of
addresses) from a connection and terminate any subflows currently
using that address.
For security purposes, if a host receives a REMOVE_ADDR option, it
must ensure the affected path(s) are no longer in use before it
instigates closure. The receipt of REMOVE_ADDR should first trigger
the sending of a TCP Keepalive [10] on the path, and if a response is
received the path is not removed. Typical TCP validity tests on the
subflow (e.g. ensuring sequence and ack numbers are correct) MUST
also be undertaken.
The sending and receipt (if no keepalive response was received) of
this message SHOULD trigger the sending of RSTs by both endpoints on
the affected subflow(s) (if possible), as a courtesy to allow
middleboxes to clean up state, but endpoints may clean up their
internal state without a long timeout.
Address removal is undertaken by ID, so as to permit the use of NATs
and other middleboxes. If there is no address at the requested ID,
the receiver will silently ignore the request.
The standard way to close a subflow (so long as it is still
functioning) is to use a FIN exchange as in regular TCP - for more
information, see Section 3.4.
1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+
|Kind=REMOVEADDR| Length = 2+n | Address ID | ...
+---------------+---------------+---------------+
Figure 10: Remove Address (REMOVE_ADDR) option
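The REMOVE_ADDR layout and the "silently ignore unknown IDs" rule can be sketched in Python as follows. The kind value is a placeholder and the function names are hypothetical. The keepalive check that the text requires before a path is actually torn down is deliberately left out of this sketch.

```python
KIND_REMOVE_ADDR = 0xFC  # placeholder: the option kind is not given in this text


def encode_remove_addr(address_ids):
    """Build a REMOVE_ADDR option listing n address IDs (Length = 2 + n)."""
    ids = bytes(address_ids)
    return bytes([KIND_REMOVE_ADDR, 2 + len(ids)]) + ids


def decode_remove_addr(option):
    """Return the list of address IDs carried in the option."""
    return list(option[2:option[1]])


def apply_remove_addr(address_cache, option):
    """Drop the listed IDs from an {id: address} cache and return the removed addresses.

    IDs with no cached address are silently ignored, as the text requires.
    """
    removed = []
    for addr_id in decode_remove_addr(option):
        if addr_id in address_cache:
            removed.append(address_cache.pop(addr_id))
    return removed
```

Because removal is keyed on the ID rather than on the address itself, the same option works when a NAT has rewritten the address seen by the receiver.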
3.6. Fallback
At the start of an MPTCP connection (i.e. the first subflow), it is At the start of an MPTCP connection (i.e. the first subflow), it is
important to ensure that the path is fully MPTCP-capable and the important to ensure that the path is fully MPTCP-capable and the
necessary TCP options can reach each endpoint. The handshake as necessary TCP options can reach each endpoint. The handshake as
described in Section 4.1 will fall back to regular TCP if either of described in Section 3.1 will fall back to regular TCP if either of
the SYN messages do not have the MPTCP options: this is the same, and the SYN messages do not have the MPTCP options: this is the same, and
desired, behaviour in the case where an endpoint is not MPTCP desired, behaviour in the case where an endpoint is not MPTCP
capable, or the path does not support the MPTCP options. When capable, or the path does not support the MPTCP options. When
attempting to join an existing MPTCP connection (Section 4.2), if a attempting to join an existing MPTCP connection (Section 3.2), if a
path is not MPTCP capable, the TCP options will not get through on path is not MPTCP capable, the TCP options will not get through on
the SYNs and the subflow will be closed. the SYNs and the subflow will be closed.
There is, however, another corner case which should be addressed. There is, however, another corner case which should be addressed.
That is one of MPTCP options getting through on the SYN, but not on That is one of MPTCP options getting through on the SYN, but not on
regular packets. This can be resolved if the subflow is the first regular packets. This can be resolved if the subflow is the first
subflow, and thus all data in flight is contiguous. This resolution subflow, and thus all data in flight is contiguous. This resolution
mechanism is as follows: mechanism is as follows:
o The first window's worth of data MUST be DATA_ACKed on every o The first window's worth of data MUST be DATA_ACKed on every
skipping to change at page 27, line 48 skipping to change at page 27, line 14
occur on any other subflow apart from the start of the initial
subflow, it should be treated as a standard path failure. The data
would not be DATA_ACKed (since there is no mapping for the data), and
the subflow can be closed with an RST.
The case described above is a specialised case of fallback. More
generally, fallback to regular TCP can become necessary at any point
during a connection if a non-MPTCP-aware middlebox changes the data
stream.
As described in Section 3.3, each portion of data for which there is
a mapping is protected by a CRC-32 checksum. This mechanism is used
to detect if middleboxes have made any adjustments to the payload
(added, removed, or changed data). A checksum will fail if the data
has been changed in any way. This will also detect if the length of
data on the subflow is increased or decreased, and this means the
Data Sequence Mapping is no longer valid. The sender no longer knows
what subflow-level sequence number the receiver is genuinely
operating at (the middlebox will be faking ACKs in return), and
cannot signal any further mappings. Furthermore, in addition to the
possibility of payload modifications that are valid at the
skipping to change at page 28, line 30 skipping to change at page 27, line 45
(notably, any changes to the subflow sequence numbering). Therefore,
it is not possible to recover the subflow, and the affected subflow
must be immediately closed with an RST, featuring a "checksum failed"
option, which defines the Data Sequence Number at the start of the
segment (defined by the Data Sequence Mapping) which had the checksum
failure (see Figure 11).
                     1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+---------------+---------------+---------------+--------------+
| Kind=MP_FAIL  |   Length=10   | Data Sequence Number (8B)    :
+---------------+---------------+------------------------------+
:                Data Sequence Number (contd.)                 :
+-------------------------------+------------------------------+
:  Data Sequence Number (contd.)|
+-------------------------------+
Figure 11: Fallback (MP_FAIL) option
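The layout in Figure 11 can be sketched as follows. This is an
illustrative encoding only: the Kind value is a hypothetical
placeholder (the draft lists it as "tbc"), and the function names are
assumptions made for this example.

```python
import struct

# Hypothetical placeholder: the draft leaves the option Kind value "(tbc)".
KIND_MP_FAIL = 0xFE
MP_FAIL_LENGTH = 10  # Kind (1) + Length (1) + Data Sequence Number (8)


def build_mp_fail(dsn):
    """Encode an MP_FAIL option carrying an 8-octet Data Sequence Number."""
    if not 0 <= dsn < 2 ** 64:
        raise ValueError("DSN must fit in 8 octets")
    # "!" selects network byte order with no padding, so the result is 10 octets.
    return struct.pack("!BBQ", KIND_MP_FAIL, MP_FAIL_LENGTH, dsn)


def parse_mp_fail(option):
    """Return the Data Sequence Number at which the checksum failed."""
    if len(option) != MP_FAIL_LENGTH:
        raise ValueError("MP_FAIL must be exactly 10 octets")
    kind, length, dsn = struct.unpack("!BBQ", option)
    if kind != KIND_MP_FAIL or length != MP_FAIL_LENGTH:
        raise ValueError("malformed MP_FAIL option")
    return dsn
```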
TBD: In this case, is there any point in signalling Checksum Failed,
or could we just RST the subflow? The signal would allow the sender
to know there is something wrong with the path and not try to re-
establish the subflow (if that was otherwise the policy).
Failed data will not be DATA_ACKed and so will be re-transmitted on
other subflows (Section 3.3.4).
A special case is when there is a single subflow and it fails with a
checksum error. Here, MPTCP should be able to recover and continue
sending data. There are two possible mechanisms to support this.
The first and simplest is to nevertheless close the subflow with an
RST, and immediately establish a new one as part of the same MPTCP
connection. Since it is known that the path may be compromised, it
is not desirable to use MPTCP's segmentation on this path any longer.
The new subflow will begin and will signal an infinite mapping
(indicated by length=0 in the Data Sequence Mapping option,
Section 3.3) from the data sequence number of the segment that failed
the checksum. This connection will then continue to appear as a
regular TCP session, and a middlebox may change the payload without
causing unintentional harm.
An optimisation is possible, however. If it is known that all
unacknowledged data in flight is contiguous, an infinite mapping
could be applied to the subflow without the need to close it first,
and essentially turn off all further MPTCP signalling. In this case,
if a receiver identifies a checksum failure when there is only one
path, it will send back an MP_FAIL on the subflow-level ACK. The
skipping to change at page 29, line 48 skipping to change at page 29, line 14
data. However, subflows can be opened and closed as necessary, as
long as a single one is active at any point.
It should be emphasised that we are not attempting to prevent the use
of middleboxes that want to adjust the payload. An MPTCP-aware
middlebox providing such functionality could be designed that would
re-write checksums if needed, and additionally would be able to parse
the data sequence mappings, and thus not hit false positives through
not knowing where data boundaries lie.
3.7. Error Handling
In addition to the fallback mechanism as described above, the
standard classes of TCP errors may need to be handled in an MPTCP-
specific way. Note that changing semantics - such as the relevance
of an RST - has already been covered in Section 4. Where possible,
we do not want to deviate from regular TCP behaviour.
The following list covers possible errors and the appropriate MPTCP
behaviour:
o Unknown token in MP_JOIN (or token mismatch in MP_JOIN ACK, or
missing MP_JOIN in SYN/ACK response): send RST (analogous to TCP's
behaviour on an unknown port)
o (TBD: If we include DSN in MP_JOIN, and the DSN is out of the
window but the token is valid, do we still send an RST?)
o DSN out of Window (during normal operation): just ignore, however
if at the beginning of a new subflow we might want to RST it as a
security mechanism
o Remove request for unknown address ID: silently ignore
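The error cases listed above can be summarised in a small sketch. The
function names and returned labels are hypothetical, the 64-bit data
sequence space is an assumption taken from the 8-octet field in
MP_FAIL, and the rst_new flag models the "might want to RST" wording
as a local policy choice rather than a requirement.

```python
MOD64 = 2 ** 64  # assumption: 8-octet data sequence numbers, as in MP_FAIL


def on_mp_join(token, known_tokens):
    """Unknown token in MP_JOIN: send RST, as TCP does for an unknown port."""
    return "accept" if token in known_tokens else "send_rst"


def on_dsn(dsn, window_start, window_size, new_subflow=False, rst_new=True):
    """Out-of-window DSN: ignore, or optionally RST at the start of a subflow."""
    if (dsn - window_start) % MOD64 < window_size:
        return "accept"
    return "send_rst" if (new_subflow and rst_new) else "ignore"


def on_remove_addr(address_id, known_ids):
    """Remove request for an unknown address ID: silently ignore."""
    return "remove" if address_id in known_ids else "ignore"
```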
3.8. Heuristics
There are a number of heuristics that are needed for performance or
deployment but which are not required for protocol correctness. In
this section we detail such heuristics.
3.8.1. Port Usage
Under typical operation an MPTCP implementation SHOULD use the same
ports as already in use. In other words, the destination port of a
SYN containing an MP_JOIN option SHOULD be the same as the remote port
of the first subflow in the connection. The local port for such SYNs
SHOULD also be the same as for the first subflow (and as such, an
implementation SHOULD reserve ephemeral ports across all local IP
addresses), although there may be cases where this is infeasible.
This strategy is intended to maximize the probability of the SYN
being permitted by a firewall or NAT at the recipient and to avoid
confusing any network monitoring software.
There may also be cases, however, where the passive opener wishes to
signal to the other endpoint that a specific port should be used, and
this facility is provided in the Add Address option as documented in
Section 3.5.1. It is therefore feasible to allow multiple subflows
between the same two addresses but using different port pairs, and
such a facility could be used to allow load
balancing within the network based on 5-tuples (e.g. ECMP).
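A minimal sketch of this port selection heuristic is given below. The
function and parameter names are hypothetical, and the sketch ignores
the case where the first subflow's local port cannot be reused on
another local address.

```python
def join_ports(first_local_port, first_remote_port, advertised_port=None):
    """Choose (local port, destination port) for a SYN carrying MP_JOIN.

    By default both ports mirror the first subflow. If the peer
    signalled a specific port in an Add Address option, that port is
    used as the destination instead.
    """
    if advertised_port is None:
        destination_port = first_remote_port
    else:
        destination_port = advertised_port
    return first_local_port, destination_port
```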
4. Semantic Issues
In order to support multipath operation, the semantics of some TCP
components have changed. To aid clarity, this section collects these
semantic changes as a reference.
Sequence Number: The (in-header) TCP sequence number is specific to
the subflow. To allow the receiver to reorder application data,
an additional data-level sequence space is used. In this data-
level sequence space, the initial SYN and the final DATA_FIN
occupy one octet of sequence space. There is an explicit mapping
of data sequence space to subflow sequence space, which is
signalled through TCP options in data packets.
ACK: The ACK field in the TCP header acknowledges only the subflow
sequence number, not the data-level sequence space.
Implementations SHOULD NOT attempt to infer a data-level
acknowledgement from the subflow ACKs. Instead an explicit data-
level DATA_ACK is used. This avoids possible deadlock scenarios
when a non-MPTCP-aware middlebox pro-actively ACKs at the subflow
level.
Receive Window: The receive window in the TCP header indicates the
amount of free buffer space for the whole data-level connection
(as opposed to for this subflow) that is available at the
receiver. This is the same semantics as regular TCP, but to
maintain these semantics the receive window must be interpreted at
the sender as relative to the sequence number given in the
DATA_ACK rather than the subflow ACK in the TCP header. In this
way the original flow control role is preserved.
FIN: The FIN flag in the TCP header applies only to the subflow it
is sent on, not to the whole connection. For connection-level FIN
semantics, the DATA_FIN option is used.
RST: The RST flag in the TCP header applies only to the subflow it
is sent on, not to the whole connection. A connection is
considered reset if a RST is received on every subflow.
Address List: Address list management (i.e. knowledge of the local
and remote hosts' lists of available IP addresses) is handled on a
per-connection basis (as opposed to per-subflow, per host, or per
pair of communicating hosts). This permits the application of
per-connection local policy. Adding an address to one connection
(either explicitly through an Add Address message, or implicitly
through a Join) has no implication for other connections between
the same pair of hosts.
5-tuple: The 5-tuple (protocol, local address, local port, remote
address, remote port) presented by kernel APIs to the application
layer in a non-multipath-aware application is that of the first
subflow, even if the subflow has since been closed and removed
from the connection. This decision, and other related API issues,
are discussed in more detail in [5].
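Two of the semantic changes above lend themselves to a short sketch:
the receive window is interpreted relative to the DATA_ACK, and a
connection is considered reset only when every subflow has received
an RST. The function names and the 64-bit data sequence space are
assumptions made for this illustration.

```python
MOD64 = 2 ** 64  # assumption: 64-bit data-level sequence space


def can_send(dsn, length, data_ack, receive_window):
    """True if [dsn, dsn + length) fits in the window anchored at DATA_ACK."""
    offset = (dsn - data_ack) % MOD64
    return offset + length <= receive_window


def connection_reset(rst_seen):
    """rst_seen maps each subflow to True once an RST arrived on it."""
    return bool(rst_seen) and all(rst_seen.values())
```

Note that the window check uses the data-level acknowledgement, not
the subflow ACK from the TCP header, so a middlebox that ACKs early
at the subflow level does not enlarge the sender's view of the
window.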
5. Security Considerations
TBD
(Token generation, handshake mechanisms, new subflow authentication,
etc...)
A generic threat analysis for the addition of multipath capabilities
to TCP is presented in [11]. The protocol presented here has been
designed to minimise or eliminate these identified threats. (A
future version of this document will explicitly address the presented
threats).
The development of a TCP extension such as this will bring with it
many additional security concerns. We have set out here to produce a
solution that is "no worse" than current TCP, with the possibility
that more secure extensions could be proposed later.
The primary area of concern will be around the handshake to start new
subflows which join existing connections. The proposal set out in
Section 3.1 and Section 3.2 is for the initiator of the new subflow
to include the token of the other endpoint in the handshake. The
purpose of this is to indicate that the sender of this token was the
same entity that received this token at the initial handshake.
One area of concern is that the token could be simply brute-forced.
The token must be hard to guess, and as such could be randomly
generated. This may still not be strong enough, however, and so the
use of 64 bits for the token would alleviate this somewhat.
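To make the brute-force concern concrete, the sketch below generates
a random token and estimates the work needed to guess it. It is an
illustration only: the draft does not mandate a generation method,
the function names are hypothetical, and the estimate assumes the
token is uniformly random and the attacker guesses without repeats.

```python
import secrets


def new_token(bits=64):
    """Draw a token uniformly at random from a cryptographic source."""
    return secrets.randbits(bits)


def expected_guesses(bits):
    """Approximate mean number of guesses to hit one uniform token."""
    return 2 ** (bits - 1)
```

Under these assumptions a 32-bit token falls after about 2^31
guesses on average, whereas a 64-bit token requires about 2^63.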
The two tokens don't need to be the same length. Token B could be 64
skipping to change at page 33, line 46 skipping to change at page 34, line 21
but performance will degrade as the fraction of stripped options
increases. We do not expect such cases to appear in practice,
though: most middleboxes will either strip all options or let them
all through.
We end this section with a list of middlebox classes, their behaviour
and the elements in the MPTCP design that allow operation through
such middleboxes. Issues surrounding dropping packets with options
or stripping options were discussed above, and are not included here:
o NAT [12]: changes the source address and port of packets. This
means that a host will not know its public-facing address for
signalling in MPTCP. Therefore, MPTCP permits implicit address
addition via the MP_JOIN option, and has heuristics to ensure that
connection attempts to private addresses [9] do not cause
problems. Address removal is undertaken by an ID number to allow
no knowledge of the source address.
o Performance Enhancing Proxies (PEPs) [13]: might pro-actively ACK
data to increase performance. Problems will occur if a PEP ACKs
data and then fails before sending it on to the receiver, or if
the receiver is mobile and moves away before proactively ACKed
data is forwarded on. If subflow ACKs were used to control send
buffering, the data could be lost and never be retransmitted, thus
causing the subflow to permanently stall. MPTCP therefore uses
the DATA_ACK to make progress when one of its subflows fails in
this way. This is why MPTCP does not use subflow ACKs to infer
connection level ACKs.
o Traffic Normalizers [14]: do not allow holes in sequence numbers,
cache packets and retransmit the same data. MPTCP looks like
standard TCP on the wire, and will not retransmit different data
on the same subflow sequence number.
o TCP Options: may be removed, or packets with unknown options
dropped, by many classes of middleboxes. It is intended that the
initial SYN exchange, with a TCP Option, will be sufficient to
identify the path capabilities. If such a packet does not get
through, MPTCP will end up falling back to regular TCP.
o Segmentation/Coalescing (e.g. TCP segmentation offloading, etc):
might copy options between packets and might strip some options.
MPTCP's data sequence mapping includes the subflow sequence number
instead of using the sequence number in the segment. In this way,
the mapping is independent of the packets that carry it.
o Firewalls [15]: might perform sequence number randomization on TCP
connections. MPTCP uses relative sequence numbers in data
sequence mapping to cope with this. Like NATs, firewalls will not
permit many incoming connections, so MPTCP supports address
signalling (ADD_ADDR) so that a multihomed endpoint can invite its
peer behind the firewall/NAT to connect out to its additional
interface.
o Intrusion Detection Systems: look out for traffic patterns and
content that could threaten a network. Multipath will mean that
such data is potentially spread, so it is more difficult for an
IDS to analyse the whole traffic, potentially increasing the
risk of false positives. However, for an MPTCP-aware IDS,
connection IDs can be easily read by such systems to correlate
multiple subflows and re-assemble for analysis.
o Application level NATs: may alter the payload within a subflow.
Multipath TCP will detect these using the checksum and close the
affected subflow(s), if there are other subflows that can be used.
If all subflows are affected multipath will fall back to TCP,
allowing middleboxes to change the payload.
o Middleboxes that alter the receive window: MPTCP will use the
maximum window at data-level, but will also obey subflow specific
windows.
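As an illustration of the firewall case in the list above, the sketch
below shows why a relative subflow sequence number survives initial
sequence number randomization. The function names are hypothetical;
the 32-bit modulus is that of the regular TCP sequence space.

```python
MOD32 = 2 ** 32  # size of the TCP (subflow) sequence number space


def to_relative(seq, isn):
    """Offset of an absolute sequence number from the subflow's ISN."""
    return (seq - isn) % MOD32


def to_absolute(relative, isn):
    """Rebuild an absolute sequence number from the locally observed ISN."""
    return (isn + relative) % MOD32
```

For example, if the sender's ISN is 1000 and a firewall rewrites it
to 4000000000, a mapping that carries the relative value 42 still
points at the right octet: the receiver adds 42 to the ISN it
observed, without ever learning the sender's original numbering.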
7. Interfaces
TBD
Interface with applications, interface with TCP, interface with lower
layers...
Discussion of interaction with applications (both in terms of how
skipping to change at page 35, line 24 skipping to change at page 36, line 12
layer, and what API extensions an application may wish to use with
MPTCP) are discussed in [5].
8. Open Issues
This specification is a work-in-progress, and as such there are many
issues that are still to be resolved. This section lists many of the
key open issues within this specification; these are discussed in
more detail in the appropriate sections throughout this document.
o Best handshake mechanisms (Section 3.1). This document contains a
proposed scheme by which connections and subflows can be set up.
It is felt that, although this is "no worse than regular TCP",
there could be opportunities for significant improvements in
security that could be included (potentially optionally) within
this protocol.
o Issues around simultaneous opens, where both ends attempt to
create a new subflow simultaneously, need to be investigated and
behaviour specified.
o Appropriate mechanisms for controlling policy/priority of subflow
usage (specifically regarding controlling incoming traffic,
Section 3.3.6). The ECN signal is currently proposed but other
alternatives, including per subflow receive windows or options
indicating path properties, could be employed instead.
o How much control do we want over subflows from other subflows
(e.g. closing when interface has failed)? Do we want to
differentiate between subflows and addresses (Section 3.2)?
o Do we want a connection identifier in every packet? E.g. would it
make the implementation of an IDS easier?
o Should we do signaling in the TCP payload, rather than options as
proposed in this draft? We discuss this alternative in the
appendix.
o Should we explicitly support SYN cookies? With the current
design, MPTCP would be downgraded to basic TCP if SYN cookies were
skipping to change at page 36, line 38 skipping to change at page 37, line 28
Andrew McDonald and Sergio Lembo.
10. IANA Considerations
This document will make a request to IANA to allocate new values for
TCP Option identifiers, as follows:
+-------------+-----------------------------+---------------+-------+
| Symbol      | Name                        | Ref           | Value |
+-------------+-----------------------------+---------------+-------+
| MP_CAPABLE  | Multipath Capable           | Section 3.1   | (tbc) |
| MP_JOIN     | Join Connection             | Section 3.2   | (tbc) |
| ADD_ADDR    | Add Address                 | Section 3.5.1 | (tbc) |
| REMOVE_ADDR | Remove Address              | Section 3.5.2 | (tbc) |
| DSN_MAP     | Data Sequence Number        | Section 3.3   | (tbc) |
|             | Mapping                     |               |       |
| DATA_ACK    | Data-level Acknowledgment   | Section 3.3   | (tbc) |
| DATA_FIN    | Data-level FIN              | Section 3.4   | (tbc) |
| MP_FAIL     | Fallback                    | Section 3.6   | (tbc) |
+-------------+-----------------------------+---------------+-------+
Table 1: TCP Options for MPTCP
11. References
11.1. Normative References
[1] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
11.2. Informative References
[2] Postel, J., "Transmission Control Protocol", STD 7, RFC 793,
September 1981.
[3] Ford, A., Raiciu, C., Barre, S., and J. Iyengar, "Architectural
Guidelines for Multipath TCP Development",
draft-ietf-mptcp-architecture-01 (work in progress), June 2010.
[4] Raiciu, C., Handley, M., and D. Wischik, "Coupled Multipath-
Aware Congestion Control", draft-raiciu-mptcp-congestion-01
(work in progress), March 2010.
[5] Scharf, M. and A. Ford, "MPTCP Application Interface
Considerations", draft-scharf-mptcp-api-01 (work in progress),
March 2010.
[6] Mathis, M., Mahdavi, J., Floyd, S., and A. Romanow, "TCP
Selective Acknowledgment Options", RFC 2018, October 1996.
[7] Stewart, R., "Stream Control Transmission Protocol", RFC 4960,
September 2007.
[8] Ramakrishnan, K., Floyd, S., and D. Black, "The Addition of
Explicit Congestion Notification (ECN) to IP", RFC 3168,
September 2001.
[9] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and E.
Lear, "Address Allocation for Private Internets", BCP 5,
RFC 1918, February 1996.
[10] Braden, R., "Requirements for Internet Hosts - Communication
Layers", STD 3, RFC 1122, October 1989.
[11] Bagnulo, M., "Threat Analysis for Multi-addressed/Multi-path
TCP", draft-ietf-mptcp-threat-02 (work in progress), TCP", draft-ietf-mptcp-threat-02 (work in progress),
March 2010. March 2010.
[12] Srisuresh, P. and K. Egevang, "Traditional IP Network Address [12] Srisuresh, P. and K. Egevang, "Traditional IP Network Address
Translator (Traditional NAT)", RFC 3022, January 2001. Translator (Traditional NAT)", RFC 3022, January 2001.
[13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z. [13] Border, J., Kojo, M., Griner, J., Montenegro, G., and Z.
Shelby, "Performance Enhancing Proxies Intended to Mitigate Shelby, "Performance Enhancing Proxies Intended to Mitigate
Link-Related Degradations", RFC 3135, June 2001. Link-Related Degradations", RFC 3135, June 2001.
skipping to change at page 41, line 10 skipping to change at page 41, line 47
: (4 octets) | : (4 octets) |
+-------------------------------+ +-------------------------------+
Figure 13: Resync option Figure 13: Resync option
Appendix C. Changelog Appendix C. Changelog
This section maintains logs of significant changes made to this This section maintains logs of significant changes made to this
document between versions. document between versions.
C.1. Changes since draft-ford-mptcp-multiaddressed-03 C.1. Changes since draft-ietf-mptcp-multiaddressed-00
o Various clarifications and minor re-structuring in response to
comments.
C.2. Changes since draft-ford-mptcp-multiaddressed-03
o Clarified handshake mechanism, especially with regard to error o Clarified handshake mechanism, especially with regard to error
cases (Section 4.2). cases (Section 3.2).
o Added optional port to ADD_ADDR and clarified situation with o Added optional port to ADD_ADDR and clarified situation with
private addresses (Section 4.3.1). private addresses (Section 3.5.1).
o Added path liveness check to REMOVE_ADDR (Section 4.3.2). o Added path liveness check to REMOVE_ADDR (Section 3.5.2).
o Added chunk checksumming to DSN_MAP (Section 4.4.1) to detect o Added chunk checksumming to DSN_MAP (Section 3.3.1) to detect
payload-altering middleboxes, and defined fallback mechanism payload-altering middleboxes, and defined fallback mechanism
(Section 4.6). (Section 3.6).
o Major clarifications to receive window discussion (Section 4.4.4). o Major clarifications to receive window discussion (Section 3.3.4).
o Various textual clarifications, especially in examples. o Various textual clarifications, especially in examples.
C.2. Changes since draft-ford-mptcp-multiaddressed-02 C.3. Changes since draft-ford-mptcp-multiaddressed-02
o Remove Version and Address ID in MP_CAPABLE in Section 4.1, and o Remove Version and Address ID in MP_CAPABLE in Section 3.1, and
make ISN be 6 bytes. make ISN be 6 bytes.
o Data sequence numbers are now always 8 bytes. But in some cases o Data sequence numbers are now always 8 bytes. But in some cases
where it is unambiguous it is permissible to only send the lower 4 where it is unambiguous it is permissible to only send the lower 4
bytes if space is at a premium. bytes if space is at a premium.
o Clarified behaviour of MP_JOIN in Section 4.2. o Clarified behaviour of MP_JOIN in Section 3.2.
o Added DATA_ACK to Section 4.4. o Added DATA_ACK to Section 3.3.
o Clarified fallback to non-multipath once a non-MP-capable SYN is o Clarified fallback to non-multipath once a non-MP-capable SYN is
sent. sent.
Authors' Addresses Authors' Addresses
Alan Ford Alan Ford
Roke Manor Research Roke Manor Research
Old Salisbury Lane Old Salisbury Lane
Romsey, Hampshire SO51 0ZN Romsey, Hampshire SO51 0ZN
 End of changes. 110 change blocks. 
490 lines changed or deleted 511 lines changed or added
