draft-ietf-sigtran-mdtp-04.txt   draft-ietf-sigtran-mdtp-05.txt 
Network Working Group R. R. Stewart Network Working Group R. R. Stewart
INTERNET-DRAFT Q. Xie INTERNET-DRAFT Q. Xie
Motorola Motorola
T. Bova S. Hussain
S Hussain C. Sharp
T Krivoruchka
R. Revis
Cisco Cisco
H. J. Schwarzbauer
Siemens
T. Taylor
Nortel Networks
I. Rytina
Ericsson
expires in six months April 19 1999 expires in six months June 2, 1999
MULTI_NETWORK DATAGRAM TRANSMISSION PROTOCOL MULTI_NETWORK DATAGRAM TRANSMISSION PROTOCOL
<draft-ietf-sigtran-mdtp-04.txt> <draft-ietf-sigtran-mdtp-05.txt>
Status of This Memo Status of This Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. Internet-Drafts are working all provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas, documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups. Note that other groups may also distribute and its working groups. Note that other groups may also distribute
working documents as Internet-Drafts. working documents as Internet-Drafts.
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This Internet Draft discusses an experimental call control signaling This Internet Draft discusses a new protocol, namely the Multi-network
transport protocol, namely the Multi-network Datagram Transmission Datagram Transmission Protocol (MDTP), that is intended to provide
Protocol (MDTP), that is intended to provide fault-tolerant reliable fault-tolerant reliable data transfer between communicating entities
data transfer between communicating entities over IP networks [1]. over IP networks [1].
MDTP is proposed as an application-level protocol which is designed MDTP is proposed as an application-level protocol that is designed to
with a high emphasis on supporting redundant networks and transparent support redundant networks and transparent fault management. MDTP also
fault management. MDTP also gives the user a great degree of timing provides timing control and configuration flexibilities to meet the
control and configuration flexibilities in order to meet the stringent stringent timing requirements often found in telephony signaling
time constraints often found in telephony signaling protocols. The protocols. The motivation of developing MDTP is to support
motivation of developing MDTP is to establish a framework for Internet-based high reliability applications such as signaling and
supporting Internet-based high reliability real-time commercial call control for Internet telephony.
applications such as signaling and call control for Internet
telephony.
Stewart, et al [Page 1]
TABLE OF CONTENTS TABLE OF CONTENTS
1. Introduction 1. Introduction.......................................................3
1.1 Design Requirements of MDTP 1.1 Terminology......................................................3
1.2 Interfaces to MDTP 1.2 Design Requirements of MDTP......................................4
2. MDTP Datagram Format 1.3 Interface to MDTP................................................5
2.1 Header Field Descriptions 2. MDTP Datagram Format...............................................5
2.2 Data Field 2.1 MDTP Common Header Field Descriptions............................6
3. Transmission Initialization 2.2 MDTP Control Parameter Part Definitions..........................7
3.1 Endpoint Association Initialization 2.3 MDTP Data Part Definitions......................................11
3.1.1 Choice of Tag Value 3. Endpoint Association Initialization...............................12
3.2 Data Field Format of Initiation Datagrams 3.1 Initiation Message and Tag Lock.................................12
3.3 Initialization Collision 3.2 Tag Unlock and TSN Initialization...............................13
3.4 Association Re-initialization 3.3 Datagram Processing during Tag Lock ............................14
4. Reliable Transfer of Datagrams 3.4 An Example of Association Initialization .......................14
4.1 Timer Management Rules 3.5 Other Initiation Issues.........................................15
4.1.1 Link Rotation 3.5.1 Selection of Tag Value......................................15
4.2 Gap Acknowledgment for Missing Datagrams 3.5.2 Initiation from behind a NAT................................15
4.3 Congestion Control 3.5.3 Initialization Collision....................................16
4.3.1 Sending with Window Control 3.5.4 Association Re-initialization...............................16
4.3.2 Window Length Adjustment 4. Transfer User Datagram............................................16
4.3.3 Flow Control using In-Queue Information 4.1 Timer Management Rules..........................................17
4.3.4 T3-send Timer Adjustment with RTT 4.1.1 T3-send Timer Adjustment with RTT...........................18
4.4 Sequence Number Reset 4.2 Multihoming Rotation............................................18
4.5 Datagram Re-transmission 4.2.1 Remote Multihoming Rotation.................................18
4.5.1 Re-transmission on Redundant networks 4.2.2 Local Multihoming Rotation..................................19
4.6 RTT Measurement 4.3 Stream Sequence Number..........................................19
4.6.1 RTT Datagram Header Format 4.4 Ordered and Un-ordered Delivery.................................19
4.6.2 Measure RTT 4.5 Report Missing Datagrams........................................20
4.7 Link Heart Beat 4.6 Range Check on TSN .............................................21
4.8 Advisory Acknowledgment 4.7 Advisory Ack Request............................................21
4.9 Termination of an Association 5 Congestion Controls...............................................22
4.10 Draining of an Association 5.1 Send with Window Control........................................22
5. Interface with upper level protocols 5.1.1 Window Length Adjustment....................................23
6. Suggested MDTP Protocol Parameter Values 5.2 Send Timer Back-off at Re-transmission..........................24
7. Acknowledgments 6. Network Management................................................25
8. Author's Addresses 6.1 Failure Detection in Redundant Networks.........................25
9. References 6.2 RTT Measurement.................................................26
Appendix A: Stream-based Reliable and Ordered Delivery 6.3 Network Heart Beat .............................................26
A.1 Stream Initiation 7. Termination of Association........................................27
A.2 Stream Termination 7.1 Graceful Shutdown of an Association.............................28
A.3 Stream Datagram Transfer 8. Stream Operations.................................................29
A.3.1 Header Format in Stream Datagrams with User Data 8.1 Stream Initiation...............................................29
A.3.2 Transmission of Stream Datagrams 8.2 Stream Termination..............................................29
A.3.3 Extended Stream Ack 8.3 Other Issues with Stream Operations.............................30
A.4 Other Issues with Stream Transfer 9. Interface with Upper Layer........................................30
Appendix B: Bundled Message Transfer 10. Suggested MDTP Timer and Protocol Parameter Values................34
B.1 Format of Bundled Datagram 11. Acknowledgments...................................................34
B.2 Bundled Datagram Transfer 12. Authors' Addresses................................................34
Appendix C: Fragmented Message Transfer 13. References........................................................35
Appendix D: Multicast Datagram Transfer
D.1 Multicast Datagram Header Format
D.2 Transmission of Multicast Datagrams
Appendix E: Unreliable Delivery
E.1 Ordered Unreliable Delivery
1. Introduction 1. Introduction
This Internet Draft discusses an experimental protocol, namely the This Internet Draft discusses a new protocol, namely the Multi-network
Multi-network Datagram Transmission Protocol (MDTP). The intention of Datagram Transmission Protocol (MDTP). The intention of developing
developing MDTP is to provide a fault-tolerant, real-time reliable MDTP is to provide a fault-tolerant, real-time reliable data transfer
data transfer mechanism between communicating endpoints over IP mechanism between communicating endpoints over IP networks [1].
networks [1].
MDTP is proposed as an application-level protocol which is designed MDTP is proposed as an application-level protocol that is designed to
with a high emphasis on supporting redundant networks and transparent support redundant networks and transparent fault management. MDTP also
fault management. MDTP also gives the user a great degree of timing provides timing control and configuration flexibilities to meet the
control and configuration flexibilities in order to meet the stringent stringent timing requirements often found in telephony signaling
time constraints often found in telephony signaling protocols. The protocols. The motivation of developing MDTP is to support
motivation of developing MDTP is to establish a framework for Internet-based high reliability applications such as signaling and
supporting Internet-based high reliability real-time commercial call control for Internet telephony.
applications such as signaling and call control for Internet
telephony.
MDTP is also designed to be scalable in order to support different MDTP is also designed to be scalable in order to support different
signaling transport requirements for different interfaces in a signaling transport requirements for different interfaces to a
telephony network. telephony network.
For example, the transportation of signaling protocols such as PRI For example, the transportation of signaling protocols such as ISDN
ISDN may not require redundant links, and hence only a subset of MDTP PRI may not require redundant networks, and hence only a subset of
will need to be implemented. On the other hand, redundant networks MDTP will need to be implemented. On the other hand, redundant
may be mandated when transporting SS7 signaling messages amongst networks may be mandated when transporting SS7 signaling messages
different components in a carrier-grade telephony core network. In amongst different components in a carrier-grade telephony core
such cases, the transparent support for redundant networks, load network. In such cases, the transparent support for redundant
sharing, and fault management defined in MDTP become essential and networks, load sharing, and fault management defined in MDTP become
likely need to be fully supported in an implementation. essential.
Many of the fundamental concepts that have made TCP such a useful Many of the fundamental concepts that have made TCP such a useful
protocol are reused in MDTP, and some of the advantages of UDP are protocol are reused in MDTP, and some of the advantages of UDP are
also merged into the design. This has led to a highly effective, also merged into the design.
robust protocol for fault tolerant data communications.
This document describes the functional interface and the details 1.1 Terminology
necessary for implementing MDTP. The main body of this document
contains the minimal set of functionalities of MDTP that must be
implemented. In the Appendices, a set of additional MDTP functions,
such as reliable stream, multicast, message bundling, message
fragmentation, are defined. Those additional functionalities are
optional to implementation.
1.1 Design Requirements of MDTP The following terms are defined and used in this document:
The following are some of the design requirements of MDTP, in order to - Redundant networks:
An endpoint may be able to transmit or receive on more than one IP
address/UDP port. RFC 1122 refers to this as multihoming. This
constitutes a redundant local network (for MDTP) relative to the
endpoint. MDTP makes no attempt to assure routing diversity within
the internet connecting two endpoints. Each endpoint attempts to
send to its peer endpoint using all the IP addresses and UDP ports
its peer has open (within the constraints of any application
specified restrictions). The choice of which local socket to send
upon is an implementation detail (it is possible only one socket is
available and bound to all of the local networks to which the machine is
connected). The O/S also will play a role in the multihoming/redundancy.
MDTP attempts a best effort at spreading the traffic across a
destination's available interfaces. It is assumed by MDTP that the
network (if fault tolerance is desired) is engineered for diversity
and MDTP's best effort will play only a small role in that diversity.
- Endpoint:
Representation of the logical point where MDTP datagrams can be sent
to or received from. Moreover, an MDTP endpoint shall be defined as
a set of IP address/port combinations in order to support redundant
networks. For example, an endpoint on a multi-homed host connected
with N IP networks can be represented as:
[IP addr1/port1,
...
IP addrN/portN]
where the port numbers or IP addresses may not be unique, but their
combinations shall be guaranteed to be unique by the underlying IP
networks.
- Association:
Representation of an ongoing communication channel between two MDTP
endpoints.
- Stream:
Defines a sub-channel within an association. Datagrams sent through
a stream shall be reliably transmitted and delivered independently of
datagrams from other streams.
Each stream shall be identified by a stream number that is unique
within the association. Stream 0xffff is reserved and shall not be
used.
1.2 Design Requirements of MDTP
The following are some of the design requirements of MDTP to
make MDTP capable of supporting real-time call control environments make MDTP capable of supporting real-time call control environments
which potentially may employ redundant networks: that may employ redundant networks:
A) High communication fan-out: an endpoint may need to be in A) High communication fan-out: an endpoint may need to be in
simultaneous communication with hundreds or thousands of endpoints simultaneous communication with hundreds or thousands of endpoints
performing various call processing functions. These endpoints may performing various call processing functions. These endpoints may
be codec converters, SS7 to IP translation applications, or, in the be codec converters, SS7 to IP translation applications, or, in the
case of mobile networks, data selector and combiner applications. case of mobile networks, data selector and combiner applications.
B) Stringent timer control: an endpoint needs to have a very fine B) Stringent timer control: an endpoint needs to have a very fine
control over the timing for delivering a datagram. The timing control over the timing for delivering a datagram. The timing
should be easily adjusted depending on the message type and the should be easily adjusted depending on the message type and the
destination. For example, after a few seconds of non-delivery the destination. For example, after a few seconds of non-delivery the
call which the message is about may not exist anymore. call which the message is about may not exist anymore.
C) Support redundant links: an endpoint communicating with a peer
should be able to take advantage of the redundant networks in a
transparent way. This means that the application or upper layer
protocols need not be involved in the network fault
management. Instead, when network failure occurs MDTP should be
able to automatically re-route the out-bound datagram to the
alternate network (if one exists) without intervention from the
application.
D) Orderly delivery: datagrams may arrive out of order, or may arrive C) Support multiple network paths: an endpoint communicating with a peer
in duplicate copies. This is especially true if redundant networks should be able to take advantage of the multiple network paths and
are used. MDTP should be strong enough to properly handle both multihoming in a transparent way. Therefore, the protocol must
situations with little intervention from the upper layer protocols be able to take advantage of local multi-homed hosts and remote
or applications. multi-homed hosts to provide resilient data delivery. This means
that the application or upper layer protocols need not be involved
in the network fault management. Instead, when network failure occurs
MDTP should be able to automatically transmit out-bound datagrams to an
alternate destination network interface (if one exists) without
intervention from the application.
D) Reliable transport: datagrams might be lost or discarded while
traveling in the IP network towards the destination. The protocol
must handle the re-transmission of lost messages in an autonomous
way without any intervention from the upper layer. Also, sometimes
datagrams may arrive in duplicate copies; in such cases MDTP must
be able to detect and remove the duplicates automatically.
E) Support both ordered and un-ordered delivery: MDTP must support
both ordered and un-ordered delivery. In the case of ordered
delivery, the receiver shall detect out-of-order datagrams and
re-order them before dispatching them to the upper layer. In the
un-ordered case, received datagrams shall be dispatched without any
effort of re-ordering.
F) Support stream sequencing: on the demand of the upper layer F) Support stream sequencing: on the demand of the upper layer
protocols or applications, MDTP should be able to support sequenced protocols or applications, MDTP should be able to support sequenced
delivery with regard to each individual stream, i.e., the delay caused delivery with regard to each individual stream, i.e., the delay caused
by the loss and retransmission of a datagram should be isolated to by the loss and retransmission of a datagram should be isolated to
only the stream to which the datagram belongs. This is particularly only the stream to which the datagram belongs. This is particularly
important in some call control applications, where a loss of a important in some call control applications, where a loss of a
message should only affect the call to which the message belongs. message should only affect the call to which the message belongs.
1.2 Interfaces to MDTP 1.3 Interface to MDTP
The application programs or upper layer protocols interface with MDTP The application programs or upper layer protocols interface with MDTP
through a set of primitives (see section 5. for details). through a set of primitives (see section 9).
Towards the networks, it is assumed that a UDP-like data transport Towards the IP networks, it is assumed that UDP is used for the
protocol will provide the interface between MDTP and the operating transport layer. No special interfaces or changes are assumed within
system. No special interfaces or changes are assumed within the UDP or at the UDP/MDTP interface. MDTP maintains its own queuing and
operating system, all queuing and endpoint association information are endpoint association. When MDTP runs on a router or on a
maintained inside MDTP layer. gateway-enabled host, it will place no special constraints on the
lower layer protocol implementations other than those described in the
Router Requirements and Host Requirements RFCs.
2. MDTP Datagram Format 2. MDTP Datagram Format
MDTP inserts the following protocol header at the beginning of every An MDTP datagram consists of a common header and possibly a control
user datagram. The integer fields shall be transmitted in network byte parameter part, a data part, or both.
order.
MDTP Header Format
MDTP Datagram Format
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier | | MDTP Protocol Identifier | Vers |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number (Seen) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number (Send) | |C/D| Msg Type | Reserved | Data Size |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of | \ \
/ Control Parameter Part /
\ \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\ \ \ \
/ data / / Data Part /
\ \ \ \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
2.1 Header Field Descriptions Note: integers in the header of MDTP datagrams MUST be transmitted in
network byte-order.
MDTP Protocol Identifier: 32 bits
This shall be a fixed long value of 0xf7873072. The receiver
shall always verify this Protocol Identifier before it proceeds
any further in interpreting the header fields.
Version: 8 bits
This field represents the version number of the MDTP protocol Note: when both the control part and data part are present in an MDTP
(value TBD). datagram, the control part MUST be processed first.
Flags: 16 bits 2.1 MDTP Common Header Field Descriptions
NOM - shall be set to 1 (reserved for fragmentation, see MDTP Protocol Identifier: 28 bits
Appendix C)
NOB - shall be set to 1 (reserved for bundling, see Appendix B) This shall be a fixed value of 0xf787307. The receiver shall
always verify this Protocol Identifier before it proceeds any
further in interpreting the header fields.
WIN - Window Up. This bit is set by the sender of this datagram Version: 4 bits
to indicate that the sender needs the receiver to acknowledge on
previously received datagrams before it can send more datagrams.
ISB - shall set to 0 (reserved for bundling, see Appendix B) This field represents the version number of the MDTP protocol,
and shall be set to 0x3.
FIR - First Datagram. This flag is set to indicate that this is a C/D Bits: 2 bits
Initiation datagram.
RTM - normally set to 0 (used for Link Heart Beat and RTT This field indicates whether the Message Type and Data Size fields
measurement, see sections 4.6 and 4.7) are filled in the present datagram:
DAT - Data Present. This bit is set to indicate that, following 00 - reserved, shall not be used. The receiver shall silently
this header, application data is present in this datagram. discard any datagram with C/D bits set to 00.
01 - Data Size only
10 - Message Type only
11 - Message Type and Data Size
ACK - Acknowledge. This bit is set to indicate that the sender is Message Type: 6 bits
acknowledging the reception of the specified Acknowledgment Number.
MUL - shall be set to 0 (reserved for multicast, see Appendix D) This shall indicate the type of control message. Its value is valid
only when the C/D bits are set to either "10" or "11". Otherwise it
SHU - Shutdown. This bit is set when the sender initiates its
closing procedure and indicates to the receiver that the sender shall be set to 0x0 and ignored by the receiver.
is no longer a valid destination. If the UNR bit is set in
conjunction with the SHU bit, an incomplete shutdown is
specified. After an incomplete shutdown, the receiver can still
re-establish the communication with the sender by re-initiating
with the sender (see 4.7).
WNR - Window Up Response. This bit is set in the acknowledgment Message Type determines whether the control part is present in
reply to a Window Up flag. the current datagram.
RE1 - normally set to 0 (used for advisory ACK, see section 4.8) The value of Message Type is defined as follows:
RTC - normally set to 0, (used for RTT, see section 4.6) 0x0 - reserved and shall not be used
FLO - shall be set to 0 (reserved for reliable stream, see 0x1 - Initiation
Appendix A) 0x2 - Initiation Ack
0x3 - Extended Data Ack
0x4 - Advisory Ack Request
0x5 - Window-up
0x6 - Window-up Ack
0x7 - RTT-request
0x8 - RTT-ack
0x9 - Abort
0xa - Graceful Shutdown
0xb - Graceful Shutdown Ack
0xc - Stream Initiation
0xd - Stream Initiation Ack
0xe - Stream Termination
0xf - Stream Termination Ack
GAR - shall be set to 1 (reserved for unreliable mode, see 0x10 to 0x3f - reserved and shall not be used
Appendix E)
UNR - shall be set to 0 (reserved for unreliable mode see Reserved: 8 bits
Appendix E)
In Queue: 8 bits These bits are reserved for future use. The sender shall always
set these bits to '0', and the receiver shall ignore their
values.
This field contains the number of messages the sender has on its Data Size: 16 bits
incoming queue, waiting to be read by the application. This
gives the receiver an indication of the flow control conditions
within the sender.
Acknowledgment Number (or Seen): 32 bits This value represents, in number of octets, the size of the user
data present in the Data Part of the current datagram. Its value
is only valid when C/D bits are set to either "01" or "11".
Otherwise it shall be set to 0x0 and ignored by the receiver.
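The -05 common header described above can be sketched in code. The following is a minimal illustration, not a reference implementation. It assumes the field order shown in the -05 datagram diagram (Protocol Identifier and Version in the first 32-bit word; C/D, Message Type, Reserved, and Data Size in the second). The function names, the MsgType enumeration names, and the choice to return None for a short buffer or a Protocol Identifier mismatch are assumptions made for illustration; the draft only mandates silent discard for C/D bits set to 00.

```python
import struct
from enum import IntEnum

MDTP_PROTOCOL_ID = 0xF787307    # fixed 28-bit value from section 2.1
MDTP_VERSION = 0x3              # 4-bit version from section 2.1
_HEADER = struct.Struct("!II")  # two 32-bit words, network byte order


class MsgType(IntEnum):
    """Message Type values listed in section 2.1 (names are illustrative)."""
    INITIATION = 0x1
    INITIATION_ACK = 0x2
    EXTENDED_DATA_ACK = 0x3
    ADVISORY_ACK_REQUEST = 0x4
    WINDOW_UP = 0x5
    WINDOW_UP_ACK = 0x6
    RTT_REQUEST = 0x7
    RTT_ACK = 0x8
    ABORT = 0x9
    GRACEFUL_SHUTDOWN = 0xA
    GRACEFUL_SHUTDOWN_ACK = 0xB
    STREAM_INITIATION = 0xC
    STREAM_INITIATION_ACK = 0xD
    STREAM_TERMINATION = 0xE
    STREAM_TERMINATION_ACK = 0xF


def pack_common_header(cd, msg_type=0, data_size=0, version=MDTP_VERSION):
    """Build the 8-octet common header; Reserved bits are left at 0."""
    word0 = (MDTP_PROTOCOL_ID << 4) | (version & 0xF)
    word1 = ((cd & 0x3) << 30) | ((msg_type & 0x3F) << 24) | (data_size & 0xFFFF)
    return _HEADER.pack(word0, word1)


def parse_common_header(buf):
    """Return the header fields as a dict, or None if the datagram is dropped."""
    if len(buf) < _HEADER.size:
        return None                      # assumption: too short to be MDTP
    word0, word1 = _HEADER.unpack_from(buf)
    if word0 >> 4 != MDTP_PROTOCOL_ID:
        return None                      # assumption: drop on identifier mismatch
    cd = word1 >> 30
    if cd == 0b00:
        return None                      # section 2.1: silently discard
    return {
        "version": word0 & 0xF,
        # Message Type is valid only for C/D = 10 or 11.
        "msg_type": (word1 >> 24) & 0x3F if cd & 0b10 else None,
        # Data Size is valid only for C/D = 01 or 11.
        "data_size": word1 & 0xFFFF if cd & 0b01 else None,
    }
```

A field that is not valid for the given C/D bits is reported as None rather than 0, so a caller cannot mistake an ignored field for a real value.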
If the flag ACK is set this value is the last sequence number 2.2 MDTP Control Parameter Part Definitions
that the sender of this datagram received from the
receiver of this datagram.
Sequence Number (or Send): 32 bits This section defines whether a control parameter part is present for
each message type, and its format if a control parameter part is
present.
If DAT flag is set, this value represents the sequence number of 2.2.1 Initiation (0x1) and Initiation Ack (0x2):
the current data unit following this header. Otherwise, this
value will be the sequence number of the next data unit that
will be sent.
Data Size: 16 bits The parameter field of the Initiation and Initiation Ack messages
shall carry two initiation Tags, the maximum window length and the
sender's local network information. Note that the endpoint MAY
be multi-homed.
This value represents, in number of octets, the size of the data
field that follows this header in the current datagram. The following defines the parameter format for carrying N IPv4
Network addresses (other network address formats can be carried by
setting the size and type fields accordingly):
Part: 8 bits 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Tag Value 1 (Seen) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Tag Value 2 (Send) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Max Window Length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of Networks = N |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of address=8 | Type of Address=2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP Address of Network 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Port # 1 | Padding = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ /
\ ... \
/ /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of address=8 | Type of Address=2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP Address of Network N |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Port # N | Padding = 0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
shall have value '0' (reserved for fragmentation, see Appendix C) If there is any implementation-specific data needed to be
exchanged at the setup of the association, it should be appended
to the end of the above data structure. The format of the
implementation-specific data should follow "Size/Type/Data Field"
format as defined above. In case an endpoint does not support the
implementation-specific data received, it shall ignore the
additional fields.
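The Initiation parameter layout above can be illustrated with a short encoder and decoder. This is a sketch under stated assumptions: the Size and Type fields are taken to be 16 bits each, as the diagram suggests, and Size is taken to count the 8 octets (address, port, padding) that follow the Size/Type word. The function names are hypothetical. Entries of an unrecognized type are skipped using their Size, and any trailing implementation-specific data is ignored.

```python
import socket
import struct

ADDR_TYPE_IPV4 = 2   # "Type of Address=2" in the diagram
ADDR_SIZE_IPV4 = 8   # "Size of address=8": address + port + padding (assumed)


def pack_initiation(tag_seen, tag_send, max_window, endpoints):
    """Encode the parameter part for a list of (ipv4_string, port) pairs."""
    parts = [struct.pack("!IIII", tag_seen, tag_send, max_window, len(endpoints))]
    for ip, port in endpoints:
        parts.append(struct.pack("!HH4sHH", ADDR_SIZE_IPV4, ADDR_TYPE_IPV4,
                                 socket.inet_aton(ip), port, 0))
    return b"".join(parts)


def parse_initiation(buf):
    """Decode tags, window length, and the IPv4 address list."""
    tag_seen, tag_send, max_window, count = struct.unpack_from("!IIII", buf, 0)
    offset = 16
    endpoints = []
    for _ in range(count):
        size, addr_type = struct.unpack_from("!HH", buf, offset)
        offset += 4
        body = buf[offset:offset + size]
        if len(body) < size:
            raise ValueError("truncated address entry")
        if addr_type == ADDR_TYPE_IPV4 and size == ADDR_SIZE_IPV4:
            addr, port, _padding = struct.unpack("!4sHH", body)
            endpoints.append((socket.inet_ntoa(addr), port))
        # Entries of any other type are skipped by their declared size.
        offset += size
    return tag_seen, tag_send, max_window, endpoints
```

For a two-address endpoint the encoded parameter part is 16 octets of fixed fields plus 12 octets per address, 40 octets in total.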
Of: 8 bits 2.2.2 Extended Data Ack (0x3):
shall have value '1' (reserved for fragmentation, see Appendix C) The parameter field contains 0 or more gap reports and the
highest transmission sequence number (TSN) received.
2.2 Data Field
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of Gaps = N |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Gap #1 Start TSN |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Gap #1 End TSN |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ /
\ ... \
/ /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Gap #N Start TSN |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Gap #N End TSN |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Last TSN Seen |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
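The Extended Data Ack parameter layout shown above (a gap count, N start/end TSN pairs, then the Last TSN Seen) might be encoded and decoded as follows. This is a minimal sketch that assumes every field is an unsigned 32-bit integer in network byte order, as the diagram suggests; the function names are hypothetical.

```python
import struct


def pack_extended_ack(gaps, last_tsn_seen):
    """Encode a list of (start_tsn, end_tsn) gap reports and the last TSN seen."""
    words = [len(gaps)]
    for start, end in gaps:
        words.extend((start, end))
    words.append(last_tsn_seen)
    return struct.pack("!%dI" % len(words), *words)


def parse_extended_ack(buf):
    """Decode the parameter part into (gaps, last_tsn_seen)."""
    (count,) = struct.unpack_from("!I", buf, 0)
    needed = 4 + 8 * count + 4
    if len(buf) < needed:
        raise ValueError("truncated Extended Data Ack")
    # 2 words per gap, followed by the Last TSN Seen word.
    words = struct.unpack_from("!%dI" % (2 * count + 1), buf, 4)
    gaps = list(zip(words[0:2 * count:2], words[1:2 * count:2]))
    return gaps, words[-1]
```

A report with zero gaps is 8 octets long: the gap count followed directly by the Last TSN Seen.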
When the DAT flag is set to 1, the MDTP datagram header will be 2.2.3 Advisory Ack Request (0x4):
followed by a data field. An implementation may choose to pad some
'0's at the end of the data field so as to align with certain memory
boundaries. However, the padded '0' octets, if there are any, shall
not be counted in the Data Size.
The maximal Data Size for a single MDTP datagram is the MTU size of No parameter field.
the underlying transport protocol (e.g., UDP) minus the MDTP header
size.
3. Transmission Initialization 2.2.4 Window-up (0x5):
3.1 Endpoint Association Initialization No parameter field.
Before the first data transmission can take place from one endpoint 2.2.5 Window-up Ack (0x6):
("A") to another endpoint ("Z"), the two endpoints will need to
complete an initialization process in order to set up an association
between them.
The initialization procedure should be made transparent to the upper Same as that of Extended Data Ack.
layer protocol, i.e., it should take place automatically whenever the
upper layer tries to send a datagram to an endpoint which has never
been sent to before. The user datagram shall be withheld by MDTP from
transmission till the completion of the initialization.
A tag-and-lock mechanism is employed during the initialization in 2.2.6 RTT-request (0x7) and RTT-ack (0x8):
order to guard against erroneous or stale datagrams (this is
especially true if redundant networks are deployed).
The initialization process consists of the following steps (assuming The parameter field shall contain the time value that is used for
the upper layer at "A" tries to send data to "Z" for the first time): RTT calculation (see section 6.2), and optionally an
acknowledgment Seen value.
A) "A" first sends an Initiation (FIR) to "Z", with Seen field set 0 1 2 3
to 0 and Send field set to Tag_A, and then enters the Tag-lock mode 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
(see below). +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time Value 1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Time Value 2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| 0x0 or TSN Seen |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
B) "Z" responds immediately with an Initiation Ack (FIR|ACK), with 2.2.7 Abort (0x9):
Seen set to Tag_A and Send set to Tag_Z, and then enters the
Tag-lock mode, too (see below).
Note that no user data should be carried in the Initiation or The Abort message shall carry the initiation Tag of the
Initiation Ack datagram. destination endpoint as a measure of security.
At this point "Z" is ready to send user data to "A". And upon the
receipt of the above Initiation Ack from "Z", "A" can also start 0 1 2 3
sending user data to "Z". 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Init-Tag |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
However, the first datagram with user data transmitted by "A" to "Z" 2.2.8 Graceful Shutdown (0xa):
shall have the Seen value set to Tag_Z, which is obtained from the
Initiation Ack. And similarly, the first datagram with user data
transmitted by "Z" to "A" shall have the Seen value set to Tag_A,
which comes from the Initiation datagram.
In the Tag-lock mode, each side will silently discard any datagrams The destination endpoint initiation Tag shall be carried as a
with user data from the other side until it receives the first measure of security.
datagram with user data and with a Seen value that matches its own
Tag. Once that datagram is received, that endpoint will leave the
Tag-lock mode and immediately send back a data acknowledgment, and
start using the sequence numbers to filter out missing and duplicate
datagrams.
If another Initiation from "A" is received by "Z" after it sent out 0 1 2 3
the Initiation Ack, "Z" will acknowledge this Initiation by re-sending 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
the Initiation Ack only when the Send field of this new Initiation has +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
the same tag as that of the original Initiation. Otherwise, "Z" will | Init-Tag |
send an Initiation of its own with Send field set to Tag_Z back to "A" +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
to elicit an Initiation Ack from "A". | TSN Seen |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
In the following example, "A" initiates the association first and then 2.2.9 Graceful Shutdown Ack (0xb):
sends a datagram with user data to "Z":
Endpoint A Endpoint Z Same as that of Abort.
{first app message to Z} 2.2.10 Stream Initiation (0xc):
[Header Flags=FIR
& other options
Seen=0,Send=Tag_A] ----------------------->
(Start T1-init timer)
(Enter Tag_A-lock mode)
[Header Flags=FIR|ACK
& other options
/---------- Seen=Tag_A,Send=Tag_Z]
/ (Enter Tag_Z-lock mode)
(Cancel T1-init timer)<-------/
[Header Flags=ACK|DAT The parameter field shall contain the initiation Tag of the
& other options destination endpoint (see section 3.1), the Stream Identifier,
Seen=Tag_Z,Send=1] and the Initial Sequence Number of this stream. Also, there shall
[data field] -----------\ be a "Size of Stream Info" and "Stream Information" fields that
(Start T3-send timer) \ may contain an opaque user data structure specific to the stream
\----> (Leave Tag_Z-lock mode) being opened. The "Stream Information" field should be padded with
'0's to 32 bit word boundary before transmission.
If T1-init timer expires at "A" after the Initiation sent, the same 0 1 2 3
Initiation datagram with the same Tag_A value will be retransmitted 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
and the timer restarted. This will be repeated Max.Init.Retransmit +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
times before "A" considers "Z" unreachable and optionally reports the | Init-Tag |
failure. +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stream Identifier | Reserved (set to 0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of Stream Info = N |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ /
\ Stream Information (N octets) \
/ /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
3.1.1 Choice of Tag Value 2.2.11 Stream Initiation Ack (0xd):
Tag values should be selected from the range of 0x80000000 to The parameter field shall contain the Stream Identifier.
0xffffffff.
3.2 Data Field Format of Initiation Datagrams 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stream Identifier | Reserved (set to 0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
If redundant networks exist between two endpoints, the data field of 2.2.12 Stream Termination (0xe):
the Initiation and Initiation Ack datagrams will carry the redundant
network information.
The following shows the data field format carrying N IPv4 redundant The parameter field shall contain the initiation Tag value (see
network information: section 3.1) and the Stream Identification
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of Networks = N | | Init-Tag |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of address=8 | Type of Address=AF_INET (2)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP Address of Network 1 | | Stream Identifier | Reserved (set to 0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Port # 1 | Padding = 0 |
2.2.13 Stream Termination Ack (0xf):
Same as that of Stream Initiation Ack.
2.3 MDTP Data Part Definitions
The following format shall be used for MDTP datagram Data Part:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
/ / | TSN Seen |
\ ... \
/ /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Size of address=8 | Type of Address=AF_INET (2)| | TSN Send |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| IP Address of Network N | | Stream Identifier N | Sequence Number n |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Port # N | Padding = 0 | \ \
/ User Data (seq n of Stream N) /
\ \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Additional implementation-specific data is allowed after the redundant TSN Seen: 32 bits
network information. No user data, however, is allowed to be
transported in Initiation or Initiation Ack datagrams.
3.3 Initialization Collision This is a piggy-backed acknowledgment, indicating the reception
of datagrams up to this TSN.
TSN Send: 32 bits
This value represents the TSN of the user data carried in this
datagram.
Stream Number: 16 bits
Identifies the stream to which the following user data belongs.
Sequence Number: 16 bits
This value represents the sequence number of the following user
data within the stream.
Sequence number 0x0 indicates that the following user data shall
be treated as un-ordered, and shall be dispatched to the upper
layer by the receiver without any attempt of re-ordering.
User Data: variable length
This is the payload user data. The size of the user data shall
be specified in the Data Size field. The implementation may
optionally have some '0' padded at the end of User Data field.
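The field layout above can be illustrated with a short parsing sketch. This is only an illustration under stated assumptions: the fields are taken to be in network byte order, and the `data_size` argument is assumed to be the number of user-data octets obtained from the Data Size field of the header. The names `DataPart`, `parse_data_part`, and `is_unordered` are hypothetical and do not appear in this document.

```python
import struct
from typing import NamedTuple

# Fixed fields of the Data Part: TSN Seen (32 bits), TSN Send (32 bits),
# Stream Number (16 bits), Sequence Number (16 bits).
# Network byte order is an assumption of this sketch.
_FIXED_FIELDS = struct.Struct("!IIHH")


class DataPart(NamedTuple):
    tsn_seen: int
    tsn_send: int
    stream: int
    seq: int
    user_data: bytes


def parse_data_part(buf: bytes, data_size: int) -> DataPart:
    """Split a Data Part into its fields.

    data_size is assumed to be the user-data length in octets, as read
    from the Data Size field; any trailing '0' padding is ignored.
    """
    start = _FIXED_FIELDS.size
    if len(buf) < start + data_size:
        raise ValueError("Data Part is shorter than the declared size")
    tsn_seen, tsn_send, stream, seq = _FIXED_FIELDS.unpack_from(buf)
    return DataPart(tsn_seen, tsn_send, stream, seq, buf[start:start + data_size])


def is_unordered(part: DataPart) -> bool:
    """Sequence number 0x0 marks user data that bypasses re-ordering."""
    return part.seq == 0
```

For instance, a buffer holding TSN Seen 5, TSN Send 7, stream 0, sequence number 3, and five octets of user data followed by padding parses into those five fields with the padding dropped.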
3. Endpoint Association Initialization
Before the first data transmission can take place from one endpoint
("A") to another endpoint ("Z"), the two endpoints must complete an
initialization process in order to set up an association between them.
The upper layer may explicitly request MDTP to initialize an
association to an endpoint, or implicitly open the association by
sending the first datagram to that endpoint on stream 0.
Once the association is established, the global stream, i.e., stream
0, is automatically open and ready for datagram transmission. Other
streams must be explicitly opened before data transmission can occur.
A tag-and-lock mechanism must be employed during the initialization
in order to guard against security attacks as well as erroneous
datagrams.
3.1 Initiation Message and Tag Lock
The initialization process consists of the following steps (assuming
that MDTP endpoint "A" tries to set up an association with MDTP
endpoint "Z"):
A) "A" shall first send an Initiation message to "Z", with Tag Seen
field set to 0x0 and Tag Send field set to Tag_A, where Tag_A shall
be a random number in the range of 0x80000000 to 0xffffffff (see
3.5.1 for Tag value selection), and enter the Tag-lock mode.
B) "Z" shall respond immediately with an Initiation Ack message, with
Seen set to Tag_A and Send set to Tag_Z (same range as Tag_A), and
enter the Tag-lock-new mode.
At this point, "Z" is ready to send user datagrams to "A" in stream
0. And upon the reception of the above Initiation Ack from "Z", "A"
also becomes ready to send user datagrams to "Z" in stream 0.
Note: user data in other streams cannot be sent until the
respective streams are opened.
C) "Z" shall leave Tag-lock-new mode and enter Tag-lock mode only if a
user datagram has been sent out from "Z" to "A".
Note: to guard against "man in the middle" attacks, a limit should
be imposed on the number of associations in the Tag-lock-new mode
at any given endpoint; whenever that limit is reached, any further
association Initiations received by the endpoint shall be silently
discarded. Also, a timer shall be used on each association that is
in the Tag-lock-new mode; at the expiration of that timer, that
association shall be shut down by the endpoint.
Note: no user data shall be carried in either the Initiation or the
Initiation Ack message, i.e., the C/D bits must be set to 10.
Note: both sides must exchange their local network information and
their maximal window length in the Initiation and Initiation Ack
messages.
3.2 Tag Unlock and TSN Initialization
The first user datagram transmitted by "A" to "Z" shall have the TSN
Seen value set to Tag_Z in the Data Part (see 2.3).
Similarly, the first user datagram transmitted by "Z" to "A" shall
have the TSN Seen value set to Tag_A.
The reception of this first datagram with user data and with the
correct Tag value in the TSN Seen field from its peer shall unlock the
Tag and cause the endpoint to leave the Tag-lock or Tag-lock-new mode.
The receiver shall immediately send back an Extended Data Ack to
acknowledge the reception of this first user datagram.
The TSN Send value carried in this first datagram with user data shall
be used to establish the initial TSN of this peer, i.e., the sender of
this datagram.
To strengthen the security, this initial TSN shall be randomly
selected from the range between 0x1 and 0x7fffffff by the sender, by
means such as those suggested in RFC 1750 [9].
Note: if there exists any un-acked datagram(s) when an endpoint is to
send its first user datagram to its peer, the endpoint MUST send a
stand-alone Extended Data Ack to acknowledge the un-acked datagram(s)
it has received from that peer before it sends out its first user
datagram. This is because the TSN Seen field in the first out-bound
user datagram cannot be used as a TSN ack; instead, it is used to
carry the peer's Tag.
3.3 Datagram Processing during Tag Lock
In Tag-lock or Tag-lock-new mode, an endpoint shall silently discard
any user datagrams from the peer endpoint that do not carry the
correct Tag value.
However, if there is a control part present in a discarded user
datagram (i.e., C/D = 11), the endpoint shall always process the
control part even when the data part is being discarded.
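The two rules above can be sketched as a single decision function. This is a minimal illustration, not a definitive implementation; the function name and its boolean return pair are hypothetical, and the caller is assumed to have already determined whether a control part is present.

```python
from typing import Tuple


def process_while_tag_locked(own_tag: int, tsn_seen: int,
                             has_control_part: bool) -> Tuple[bool, bool]:
    """Decide how to treat a user datagram received in Tag-lock or
    Tag-lock-new mode.

    Returns (accept_data, process_control):
      accept_data     - True only if the TSN Seen field carries this
                        endpoint's own Tag; otherwise the data part is
                        silently discarded.
      process_control - True whenever a control part is present, even
                        if the data part is being discarded.
    """
    accept_data = (tsn_seen == own_tag)
    return accept_data, has_control_part
```

A datagram with a wrong Tag and a control part therefore yields `(False, True)`: the data is dropped but the control part is still handled.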
If another Initiation from "A" is received by "Z" after "Z" sent out
its Initiation Ack, "Z" shall respond to this second Initiation by
re-sending the Initiation Ack if the Tag Send field of this second
Initiation has the same value as that of the original Initiation.
Otherwise, "Z" shall respond by sending an Initiation of its own, with
Tag Send field set to Tag_Z, so as to elicit an Initiation Ack from
"A".
3.4 An Example of Association Initialization
In the following example, "A" initiates the association first and then
sends a user datagram to "Z"; "Z" then sends two user datagrams
some time later:
Endpoint A Endpoint Z
{app sets association with Z}
Initiation(C/D = 10)
[Tag Seen=0,Tag Send=Tag_A
& net addr info] --------\
(Start T1-init timer) \
(Enter Tag_A-lock mode) \---->Initiation Ack(C/D = 10)
[Tag Seen=Tag_A,Tag Send=Tag_Z
/---- & net addr info]
/ (Enter Tag_Z-lock-new mode)
(Cancel T1-init timer)<-------/
{app sends 1st user data; strm 0}
U-Data(C/D = 01)
[Seen=Tag_Z,Send=init TSN-A
Strm=0,Seq=1,
& user data] -------\
(Start T3-send timer) \
\---->(Leave Tag_Z-lock-new mode)
------Ext Data Ack(C/D = 10)
/ [Gap=0,TSN Seen=init TSN-A]
(Cancel T3-send timer) <-----/
..
..
{app sends 2 datagrams;strm 0}
/---- U-data(C/D = 01)
/ [Seen=Tag_A,Send=init TSN-Z
(Leave Tag_A-lock mode) <----/ Strm=0,Seq=1,
(Start T2-receive timer) & user data 1]
/---- U-data(C/D = 01)
/ [Seen=init TSN-A,
/ Send=init TSN-Z +1,
<---/ Strm=0,Seq=1,
& user data 2]
If T1-init timer expires at "A" after the Initiation is sent, the same
Initiation message with the same Tag_A value shall be retransmitted and
the timer restarted. This shall be repeated Max.Init.Retransmit times
before "A" considers "Z" unreachable and optionally reports the
failure.
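The T1-init retransmission rule can be sketched as a simple loop. The sketch assumes that "repeated Max.Init.Retransmit times" means one original transmission followed by up to Max.Init.Retransmit retransmissions; that reading, and the callback names `send_initiation` and `wait_for_ack`, are assumptions made for illustration only.

```python
from typing import Callable


def run_initiation(send_initiation: Callable[[int], None],
                   wait_for_ack: Callable[[], bool],
                   tag_a: int,
                   max_init_retransmit: int) -> bool:
    """Send an Initiation and retransmit it each time T1-init expires.

    send_initiation(tag) transmits the Initiation message (hypothetical).
    wait_for_ack() returns True if an Initiation Ack arrived before the
    T1-init timer expired, False on expiry (hypothetical).
    Returns False when the peer is considered unreachable.
    """
    # One original transmission plus up to max_init_retransmit retries.
    for _ in range(1 + max_init_retransmit):
        send_initiation(tag_a)   # the same Tag_A value is reused every time
        if wait_for_ack():       # calling this models (re)starting T1-init
            return True
    return False
```

Reusing the same Tag_A on every attempt matters because "Z" re-sends its Initiation Ack only when the Tag Send field matches the original Initiation.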
3.5 Other Initiation Issues
3.5.1 Selection of Tag Value
Tag values should be selected from the range of 0x80000000 to
0xffffffff. It is very important that the Tag value be randomized to
guard against "man in the middle" and "sequence number" attacks. It is
suggested that RFC 1750 [9] be used for the Tag randomization.
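As an illustration of the selection rule, the following sketch draws a Tag from the range 0x80000000 to 0xffffffff, and an initial TSN from the range 0x1 to 0x7fffffff given in section 3.2. The use of Python's `secrets` module as the randomness source is an assumption of this sketch; the document only points to RFC 1750 for guidance. The function names are hypothetical.

```python
import secrets

TAG_MIN, TAG_MAX = 0x80000000, 0xFFFFFFFF   # Tag range
TSN_MIN, TSN_MAX = 0x00000001, 0x7FFFFFFF   # initial TSN range (section 3.2)


def new_tag() -> int:
    """Pick a Tag uniformly from [TAG_MIN, TAG_MAX]."""
    return TAG_MIN + secrets.randbelow(TAG_MAX - TAG_MIN + 1)


def new_initial_tsn() -> int:
    """Pick an initial TSN uniformly from [TSN_MIN, TSN_MAX]."""
    return TSN_MIN + secrets.randbelow(TSN_MAX - TSN_MIN + 1)
```

Note that the two ranges do not overlap, so a value drawn as a Tag can never coincide with a value drawn as an initial TSN.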
3.5.2 Initiation from behind a NAT
When a NAT is present between two endpoints, the endpoint that is
behind the NAT, i.e., one that does not have a publicly available
network address, shall take one of the following options:
A) Indicate that it has only one network by setting the 'Number of
networks' field in the Initiation message to 0. This causes the
endpoint that receives this Initiation message to consider the sender
as having only that one address. This method can be used for a dynamic
NAT, but any multihoming configuration at the endpoint that is behind
the NAT will not be visible to its peer, and thus not be taken
advantage of.
B) Indicate all of its networks in the Initiation by specifying all
the actual IP addresses and ports that the NAT will substitute for the
endpoint. This method requires that the endpoint behind the NAT have
prior knowledge of all the IP addresses and ports that the NAT will
assign.
3.5.3 Initialization Collision
If two endpoints attempt to initialize an association with each other If two endpoints attempt to initialize an association with each other
at about the same instance, a collision will occur, i.e., each side at about the same instance, a collision will occur. As a result, each
will receive an Initiation datagram from the other side after it side will receive an Initiation datagram from the other side after it
transmitted its own. In such a case, both sides shall acknowledge the transmitted its own. In such a case, both sides shall send an
Initiation datagram of the other side in the normal procedure as Initiation Ack datagram to the other side using the procedure
described above. described above.
3.4 Association Re-initialization 3.5.4 Association Re-initialization
An endpoint shall be allowed to re-initialize an established An endpoint shall be allowed to re-initialize an established
association with another endpoint. association with the other endpoint.
In such a case, the endpoint that initiates the re-initialization Once an endpoint has left the Tag-lock or Tag-lock-new mode of the
(i.e, the initiator) shall use a tag different from the one used in previous association initialization process, it shall treat any new
the previous initialization. And the initiator shall follow the normal Initiation message from its peer as a re-initialization event.
initialization procedure as stated in section 3.1.
Once left the Tag-lock mode of the current association initialization, During a re-initialization, both endpoints shall follow the same
an endpoint shall treat any new incoming Initiation from its peer as a procedure as defined in section 3.1. And a new Init-Tag must be used
re-initialization event. Upon the arrival of the new Initiation by the endpoint that receives the Initiation message if it has already
datagram from the peer, the receiving endpoint shall also follow the left the previous Tag-lock or Tag-lock-new mode.
procedure stated in section 3.1 to respond.
4. Reliable Transfer of Datagrams 4. Transfer User Datagram
Reliable transfer is indicated if the datagram being transferred has The receiver of a user datagram shall always acknowledge the reception
GAR bit set to 1 and the UNR bit set to 0. The receiver of a to the sender of the datagram. Normally, delayed acknowledgment shall
reliable datagram shall always acknowledgment the sender. be used. The delay shall be controlled by a T2-receive timer.
Normally, delayed acknowledgment is used, and the acknowledgment can At the expiration of T2-receive timer, if there is out-bound user data,
either be sent separately or piggy-backed on a datagram traveling the ack should be piggy-backed on the data part of the out-bound user
in the opposite direction. datagram, occupying the TSN Seen field (see section 2.3). Otherwise, a
stand-alone Extended Data Ack shall be used to carry the
acknowledgment.
The following example illustrates both separate and piggy-backed When Extended Data Ack is used, the sender shall fill the Last TSN
acknowledgments with both ends transmitting in reliable mode: Seen field to indicate the highest TSN Send number it has received
from the peer. Any detected gaps must also be reported
(see section 4.5).
The following example illustrates both stand-alone and piggy-backed
acknowledgments:
Endpoint A Endpoint Z Endpoint A Endpoint Z
{App sends 3 messages} {App sends 3 messages in strm 0}
[Header Flags=DAT|ACK|GAR U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=7,Strm=0,Seq=3]--------> (Start T2-receive timer)
Seen=0,Send=1,Size=100]-------------> (Start T2-receive timer)
(Start T3-send timer) (Start T3-send timer)
[Header Flags=DAT|ACK|GAR U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=8,Strm=0,Seq=4]-------->
Seen=0,Send=2,Size=100]----------->
(Restart T3-send timer)
[Header Flags=DAT|ACK|GAR U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=9,Strm=0,Seq=5]-------->
Seen=0,Send=3,Size=100]----------->
(Restart T3-send timer)
... ...
{Timer T2 expires} {Timer T2 expires}
/----------- [Header Flags=ACK /--------- Extended Data Ack(C/D=10)
/ Part=0,Of=0 / [Gap=0,Seen=9]
/ Seen=3,Send=1] (cancel T3-send timer) <----/
/
(cancel T3-send timer) <------
... ...
... ...
{App sends 1 message} {App sends 1 message; strm 0}
[Header Flags=DAT|ACK|GAR U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=10,Strm=0,Seq=6]-------> (Start T2-receive timer)
Seen=1,Send=4,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer) (Start T3-send timer)
... ...
{App sends 1 message} {App sends 1 message; strm 1}
(cancel T2-receive timer) (cancel T2-receive timer)
/----------- [Header Flags=DAT|ACK|GAR /------ U-Data (C/D=01)
/ Part=0,Of=1 / [Seen=10,Send=6,Strm=1,Seq=2]
/ Seen=4,Send=1,Size=45]
/ (Start T3-send timer) / (Start T3-send timer)
(cancel T3-send timer) <------ (cancel T3-send timer) <------/
(Start T2-receive timer) (Start T2-receive timer)
.. ..
{Timer T2 Expires} {Timer T2 Expires}
[Header Flags=ACK Extended Data Ack(C/D=10)
Part=0,Of=0 [Gap=0,Seen=6]----------------------> (cancel T3-send timer)
Seen=1,Send=5]------------------> (cancel T3-send timer)
Note that if the datagrams previously received from the same sending
endpoint was transmitted in Unreliable transfer mode (see Appendix E
for details on Unreliable transfer), the receiving endpoint must
reset its Seen counter to the value of the Send field in the current
reliable datagram.
4.1 Timer Management Rules 4.1 Timer Management Rules
The following rules shall be used to manage the timers during The following rules shall be used to manage the timers during
normal Reliable transfer, unless otherwise stated for some special normal datagram transfer, unless otherwise stated for some special
cases: cases:
A) When a reliable datagram with user data (i.e., with DAT flag set) is A) When a user datagram is received, the endpoint shall start a
received, the endpoint shall start a T2-receive timer if no other T2-receive timer if no T2-receive timer is currently running. Upon
timer is running, and upon the expiration of the T2-receive timer, the expiration of the T2-receive timer, the endpoint shall
the endpoint shall ack to the sender all the un-acked datagrams acknowledge to the sender all the un-acked user datagrams it has
it has received. received.
B) When a reliable datagram with user data is sent out, the sending B) When a user datagram is sent out, the sending endpoint shall start
endpoint shall start a T3-send timer. If the T3-send timer is a T3-send timer if no T3-send timer is currently running.
already running, the endpoint shall first stop the old T3 timer
and then start a new one. If the T2-receive timer is running, the If the T2-receive timer is running, the endpoint shall first stop
endpoint shall first stop the T2 timer, piggyback an Ack unto the the T2 timer, piggy-back an ack (or Extended Data Ack) unto the
out-bound datagram, and then start a T3-send timer. Upon the out-bound datagram, and then start a T3-send timer.
expiration of the T3-send timer, the endpoint shall follow the rules
described in 4.5 for possible re-transmission of the un-acked If the T3-send timer expires, the endpoint shall follow the rules
datagrams. Whenever the T3-send timer is started the RTT estimate described in 4.6 for possible re-transmission of the un-acked
last calculated for that network should be added to the base datagrams.
T3-send timer value (if a RTT value is measured, see section 4.6).
Moreover, whenever the T3-send timer is started the RTT estimate
last calculated for that remote network address should be added to
the base T3-send timer value (see sections 6.2 and 6.3 for RTT).
C) When all outstanding datagrams are acknowledged, the T3-send timer C) When all outstanding datagrams are acknowledged, the T3-send timer
shall be stopped if one is still running. shall be stopped if one is still running.
D) If an endpoint has a T3-send timer running and receives a partial
acknowledgment (one that acknowledges some of the outstanding
datagrams) then the endpoint shall restart the T3-send timer.
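Rules A through D, as worded in the newer (-05) text, can be sketched as a small state tracker. This is a simplified model under stated assumptions: it records only whether each timer is running, it does not model timer durations or the RTT adjustment, and the class and method names are hypothetical.

```python
class TimerState:
    """Tracks whether the T2-receive and T3-send timers are running."""

    def __init__(self) -> None:
        self.t2_running = False
        self.t3_running = False

    def on_user_datagram_received(self) -> None:
        # Rule A: start T2-receive only if none is currently running.
        if not self.t2_running:
            self.t2_running = True

    def on_t2_expired(self) -> None:
        # Rule A: on expiry the endpoint acknowledges all un-acked
        # user datagrams; here only the timer flag is cleared.
        self.t2_running = False

    def on_user_datagram_sent(self) -> bool:
        """Rule B. Returns True if an ack is piggy-backed on this send."""
        piggyback = self.t2_running
        if piggyback:
            self.t2_running = False      # stop T2, ack rides on the data
        if not self.t3_running:
            self.t3_running = True       # start T3-send if not running
        return piggyback

    def on_ack_received(self, newly_acked: int, still_outstanding: int) -> str:
        """Rules C and D. Returns the action taken on T3-send."""
        if not self.t3_running or newly_acked == 0:
            return "unchanged"
        if still_outstanding == 0:
            self.t3_running = False      # Rule C: everything acknowledged
            return "stopped"
        return "restarted"               # Rule D: partial acknowledgment
```

In a fuller implementation, "restarted" would re-arm the timer with the base T3-send value plus the RTT estimate for the remote address.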
The following example shows the use of various timers. The following example shows the use of various timers.
Endpoint A Endpoint Z Endpoint A Endpoint Z
{App sends 2 messages} {App sends 2 messages; strm 0}
[Header Flags=DAT|ACK|GAR U-Data (C/D=01)
Part=0,Of=1 [Seen=5,Send=7,Strm=0,Seq=3] ---------> (Start T2-receive timer)
Seen=1,Send=6,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer) (Start T3-send timer)
[Header Flags=DAT|ACK|GAR U-Data (C/D=01) {App sends 1 message; strm 1}
Part=0,Of=1 {App sends 1 message} [Seen=5,Send=8,Strm=0,Seq=4] -\ /-- (cancel T2-receive timer)
Seen=1,Send=7,Size=100]---\ /--- (cancel T2-receive timer) \ / U-Data (C/D=01)
(Restart T3-send timer) \ / [Header Flags=DAT|ACK|GAR \ / [Seen=7,Send=6,Strm=1,Seq=2]
\ / Part=0,Of=1 \ (Start T3-send timer)
\/ Seen=6,Send=2,Size=100]
/\ (Start T3-send timer)
/ \ / \
<----/ ----> (Re-start T3-send timer) <-------/ \
... (Start T2-receive timer) \
... -> (Start T2-receive timer)
... ...
{T3-send timer expires} {T2-receive timer expires}
(re-transmit 2nd datagram) Extended Data Ack(C/D=10)
[Header Flags=DAT|ACK|GAR [Gap=0,Seen=6] -----------------------> (Cancel T3-send timer)
Part=0,Of=1
Seen=2,Send=7,Size=100]---------> (Cancel T3-send timer)
(Restart T3-send timer) (Start T2-receive timer)
.. ..
{Timer T2 expires} {T2-receive timer expires}
(Cancel T3-send timer) <-------------- [Header Flags=ACK (Cancel T3-send timer) <---------------- Extended Data Ack(C/D=10)
Part=0,Of=0 [Gap=0,Seen=8]
Seen=7,Send=3]
4.1.1 Link Rotation 4.1.1 T3-send Timer Adjustment with RTT
When multiple networks exist between two communicating endpoints, If the RTT measurement is available to a remote IP address, the sender
every time the application transmits a datagram, the MDTP shall adjust the T3-send timer each time when sending datagrams to
implementation MUST keep track of which network the transmission was that IP address. The calculation and adjustment of the timer should
sent on (if more than one network exists) in the MDTP protocol variable follow the method described in [4]. RTT measurement shall be tracked
'last.sent.intf'. If the user does not specifically override rotation, for each destination IP address if the remote host is multi-homed.
each send should be rotated in a round robin fashion amongst all
available networks and the protocol variable 'last.sent.intf' should
be updated to indicate which interface was used last.
The MDTP implementation MUST allow a user to override this rotation MDTP defines three methods to obtain RTT measurements, see sections
defeating MDTP's rotation upon each send. The implementation must also 4.7, 6.2, and 6.3.
provide a interface to add and remove a link from rotation eligibility.
4.2 Gap Acknowledgment for Missing Datagrams 4.2 Multihoming Rotation
If reliable datagrams become missing during a series of transmissions, 4.2.1 Remote Multihoming Rotation
a special type of acknowledgment known as the Gap Ack will be sent
back to inform the sender to re-transmit the missing datagrams.
The following example shows the use of Gap Ack. When an endpoint is transmitting to a remote multi-homed endpoint, the
transmitting endpoint shall rotate between destination IP addresses.
Every time the application transmits a datagram, MDTP MUST keep track
of the remote IP address to which it sent the datagram in the MDTP
protocol variable 'last.sent.intf'. MDTP should rotate each send in a
round robin fashion amongst all available destination IP addresses on
the remote multi-homed host and should update the protocol variable
'last.sent.intf' to indicate which destination IP address it last
used.
If possible, acks should be transmitted to the same IP address from
which the acked messages were received. When acknowledging multiple
messages, this may not be possible. In the latter case, MDTP SHOULD
rotate the transmission of acknowledgments to all remote IP addresses.
The MDTP implementation MUST allow an application to override this
rotation by specifying the destination IP address to which to send a
datagram. The implementation must also provide an interface to add
and remove a remote IP address from rotation eligibility.
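The rotation behavior described in this section can be sketched as follows. The class is a hypothetical illustration: the attribute `last_sent_intf` models the protocol variable 'last.sent.intf', and the choice to restart from the first eligible address when the last-used address is no longer eligible is an assumption, since the text does not specify it.

```python
from typing import List, Optional


class RemoteRotation:
    """Round-robin selection of destination IP addresses for one peer."""

    def __init__(self, addresses: List[str]) -> None:
        self.eligible = list(addresses)
        self.last_sent_intf: Optional[str] = None   # 'last.sent.intf'

    def add(self, addr: str) -> None:
        """Make a remote address eligible for rotation."""
        if addr not in self.eligible:
            self.eligible.append(addr)

    def remove(self, addr: str) -> None:
        """Take a remote address out of rotation."""
        if addr in self.eligible:
            self.eligible.remove(addr)

    def next_destination(self, override: Optional[str] = None) -> str:
        """Return the address for the next send and record it."""
        if override is not None:
            # The application chose the destination explicitly.
            self.last_sent_intf = override
            return override
        if not self.eligible:
            raise LookupError("no destination address is eligible")
        if self.last_sent_intf in self.eligible:
            i = (self.eligible.index(self.last_sent_intf) + 1) % len(self.eligible)
        else:
            i = 0   # assumption: restart from the first eligible address
        self.last_sent_intf = self.eligible[i]
        return self.last_sent_intf
```

With two eligible addresses, successive sends alternate between them; an explicit override is still recorded in `last_sent_intf`, so the rotation continues from the overridden address.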
4.2.2 Local Multihoming Rotation
As discussed in section 3.3.4 of RFC 1122, an endpoint MAY rotate
transmitted messages amongst all local network interfaces by
specifying the local IP address and UDP port or it may allow the
networking protocol to decide which local IP address (and network
interface) to use to transmit a datagram.
If possible, acks should be transmitted from the same IP address over
which the acked messages were received. When acknowledging multiple
messages, this may not be possible. In the latter case, MDTP SHOULD
rotate the transmission of acknowledgments from all configured IP
address/port pairs.
4.3 Stream Sequence Number
The datagram stream sequence number shall always be set to 1 when the
stream is opened.
Also, when the stream sequence number reaches the value 0xffff the
next sequence number shall be set to 1. Sequence number '0' has
special meaning (see section 4.4) and shall not be used in normal
sequence number rotation.
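The wrap-around rule can be stated compactly in code. The function name below is hypothetical; the behavior follows the text above: numbering starts at 1, and the value after 0xffff is 1, so that 0 is never produced by normal rotation.

```python
STREAM_SEQ_FIRST = 1
STREAM_SEQ_LAST = 0xFFFF


def next_stream_seq(current: int) -> int:
    """Return the stream sequence number that follows 'current'."""
    if not STREAM_SEQ_FIRST <= current <= STREAM_SEQ_LAST:
        raise ValueError("0 and out-of-range values are not part of the rotation")
    return STREAM_SEQ_FIRST if current == STREAM_SEQ_LAST else current + 1
```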
4.4 Ordered and Un-ordered Delivery
Normally, the receiver shall ensure the user datagrams within any
given stream be delivered to the upper layer according to the order of
their stream sequence number. If datagrams arrive out of order with
respect to their stream sequence numbers, the receiver must hold the
received datagrams back from delivery until they are re-ordered.
However, a sender can set the stream sequence number of a user
datagram to 0, to indicate that no ordering shall be performed on that
datagram within that stream. Upon the reception of the datagram, the
receiver must by-pass the ordering mechanism and immediately deliver
the datagram to the upper layer.
This provides an effective way to transmit "out-of-band" data in any
given stream. Also, a stream can be used as an "un-ordered" stream by
simply setting the stream sequence number of each out-bound user
datagram to 0.
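A per-stream receiver that applies both delivery modes might look like the sketch below. It is illustrative only and assumes that duplicate datagrams have already been filtered out by TSN before reaching this layer; the class name and the list-returning interface are hypothetical.

```python
from typing import Dict, List


class StreamReceiver:
    """Delivers user data of one stream to the upper layer."""

    def __init__(self) -> None:
        self.expected = 1                  # streams open at sequence number 1
        self.held: Dict[int, bytes] = {}   # out-of-order datagrams on hold

    def receive(self, seq: int, data: bytes) -> List[bytes]:
        """Return the payloads that become deliverable, in delivery order."""
        if seq == 0:
            # Un-ordered data bypasses the ordering mechanism entirely.
            return [data]
        self.held[seq] = data
        deliverable = []
        while self.expected in self.held:
            deliverable.append(self.held.pop(self.expected))
            # Sequence numbers wrap from 0xffff back to 1, skipping 0.
            self.expected = 1 if self.expected == 0xFFFF else self.expected + 1
        return deliverable
```

If sequence number 2 arrives before 1, it is held; an un-ordered datagram arriving in between is delivered at once; when 1 finally arrives, both 1 and 2 are released together.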
4.5 Report Missing Datagrams
MDTP uses a receiver-based retransmission policy, where the sender
attempts to elicit from the receiver information on the missing
datagrams before the retransmission.
If a receiver detects holes in the received user datagram sequence (by
examining TSN Send numbers), an Extended Data Ack with gap reports
shall be sent back to inform the sender so that the missing datagrams
can be re-transmitted.
Multiple gaps can be indicated in one single Extended Data Ack.
If there is out-bound user data, the endpoint shall piggy-back the
Extended Data Ack with the user data in the same MDTP datagram, by
setting the C/D bits to '11'. And the TSN Seen field in the data part
shall not be used, i.e., the sender shall set the field to 0x0 and the
receiver shall ignore it.
The following example shows the use of gap report in an Extended Data
Ack.
Endpoint A Endpoint Z Endpoint A Endpoint Z
{App sends 3 messages} {App sends 3 messages; strm 0}
[Header Flags=DAT|ACK|GAR U-Data (C/D=01)
Part=0,Of=1 [Seen=3,Send=6,Strm=0,Seq=2]-------> (Start T2-receive timer)
Seen=3,Send=8,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer) (Start T3-send timer)
[Header Flags=DAT|ACK|GAR U-Data (C/D=01)
Part=0,Of=1 [Seen=3,Send=7,Strm=0,Seq=3]-----X (lost)
Seen=3,Send=9,Size=100]-----X (lost)
(Restart T3-send timer)
[Header Flags=DAT|ACK|GAR U-Data (C/D=01)
Part=0,Of=1 [Seen=3,Send=8,Strm=0,Seq=4]-------> (A gap detected in data)
Seen=3,Send=10,Size=100]-----------> (A gap detected in data)
(Restart T3-send timer)
.. ..
{T2-receive timer expires} {T2-receive timer expires}
/------- [Header Flags=ACK /------ Extended Data Ack (C/D = 10)
/ Seen=9,Send=3, / [Gap=1,Strt=7,End=7,Seen=8]
/ Part=1,Of=1 (Prepare retransmission) <----/
/ data=(long integer)10]
(Prepare retransmit) <--------/
In this example, when "Z" receives the third datagram from "A" it In this example, when "Z" receives the third datagram from "A" it
realizes that a gap exists in the received data. At the expiration of realizes that a gap exists in the received data. At the expiration of
T2-receive timer, "Z" sends a Gap Ack, in place of a normal Ack, to T2-receive timer, "Z" sends an Extended Data Ack with a gap report to
"A" to indicate the missing datagram. "A" to indicate the missing datagram. Note that the Start and End
fields in the gap report specify the edges of the gap, i.e., the TSN
In the Gap Ack, the Part and Of fields are both set to '1', as opposed numbers between Start and End are missing.
to '0' as in a normal Ack. The data field of the Gap Ack is a four (4)
octet long integer containing the sequence number of the next datagram
after the Gap (which is 10 in this example). The Seen field in
the Gap Ack will contain the sequence number of the datagram of the
gap. Using these two values, "A" should be able to calculate the
the missing datagram numbers (which is 9 in this
example) and thus determine which datagrams will need to be
retransmitted.
Note that Gap Acks cannot be piggy-backed with user data; if there is When the peer endpoint is multi-homed, the Extended Data Ack should be
user data to be sent when a gap is detected, the Gap Ack must be sent sent out to the destination IP address specified in the MDTP protocol
out first before the datagram carrying user data can be sent. variable 'last.good.intf'. The value of 'last.good.intf' is always
updated to point to the source IP address from which the last datagram
from the peer endpoint arrived.
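As a rough illustration of the gap report described above, the following
Python sketch derives the gap ranges and the Seen value from the TSNs a
receiver currently holds. The function name, the argument layout, and the
decision to ignore TSN wrap-around are assumptions made for illustration
only; they are not defined by MDTP.

```python
def build_gap_report(last_in_order, received):
    """Return (gaps, seen) for an Extended Data Ack.

    last_in_order: highest TSN received with nothing missing before it.
    received:      TSNs received beyond last_in_order (any order).
    gaps:          inclusive (start, end) ranges of missing TSNs.
    seen:          highest TSN received so far.
    Assumption: TSN wrap-around is ignored to keep the sketch short.
    """
    gaps = []
    expected = last_in_order + 1
    for tsn in sorted(set(received)):
        if tsn < expected:
            continue                      # duplicate or already covered
        if tsn > expected:
            gaps.append((expected, tsn - 1))
        expected = tsn + 1
    seen = expected - 1
    return gaps, seen
```

With TSN 5 already acknowledged and TSNs 6 and 8 received, as in the
example, build_gap_report(5, [6, 8]) yields one gap (7, 7) and Seen=8.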
4.3 Flow and Congestion Controls 4.6 Range Check on TSN
For security reasons, the receiver must check the range of the TSN
Send value in each received user datagram.
Assume that the highest TSN received from a peer is T and the maximal
window length of the same peer is W (exchanged during association
initiation, see section 3.1). When the next user datagram arrives from
this peer, the receiver shall silently discard the datagram if the TSN
Send value carried in the datagram is greater than T+W (the calculation
wraps around from 0x7fffffff to 0x1).
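The range check can be sketched in Python as follows. The draft only
states that a TSN greater than T+W is discarded and that TSNs wrap from
0x7fffffff to 0x1. Treating any TSN more than half the number space ahead
of T as an old one (and therefore not rejecting it here) is an assumption
of this sketch, as are the function names.

```python
TSN_MAX = 0x7FFFFFFF  # TSNs run 1..TSN_MAX and then wrap to 1


def tsn_distance(frm, to):
    """Forward distance from `frm` to `to` in the wrapping TSN space."""
    return (to - frm) % TSN_MAX


def tsn_in_range(tsn, highest_seen, window):
    """Return False if the datagram should be silently discarded."""
    ahead = tsn_distance(highest_seen, tsn)
    if ahead > TSN_MAX // 2:
        # Assumption: this is an old TSN (behind T), not one beyond T+W.
        return True
    return ahead <= window
```

For example, with T = 0x7ffffffe and W = 4, a TSN Send value of 2 is three
steps ahead of T and is accepted, while a value of 4 is five steps ahead
and is discarded.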
4.7 Advisory Ack Request
An endpoint may use Advisory Ack Requests to improve bandwidth
utilization.
The endpoint should send an Advisory Ack Request to its peer when it
reaches half of its current window length, and also when it detects
that the next send will reach the full window length (see section 5.1
for window control).
Upon the reception of an Advisory Ack Request, the peer endpoint, if it
is not under a flow control condition, should immediately
acknowledge all the datagrams it has received but not yet
acknowledged, and then cancel the T2-receive timer if one is still
running. Otherwise, the peer endpoint shall take no action and ignore
the Advisory Ack Request.
The following shows an example of using Advisory Ack Request:
Endpoint A Endpoint Z
{App sends 3 messages; strm 0}
U-Data(C/D = 01)
[Seen=5,Send=7,Strm=0,Seq=3]-------------> (Start T2-recv timer)
(Start T3-send timer)
U-Data(C/D = 01)
[Seen=5,Send=8,Strm=0,Seq=4]----------->
{detects window half full, use Advisory Ack Req}
Adv Ack Request(C/D=11)
[Seen=5,Send=9,Strm=0,Seq=5]------\
\
\----> (cancel T2-receive timer)
<---------------- Extended Data Ack(C/D=10)
[Gap=0,Seen=9]
An endpoint sending an Advisory Ack Request may also use this request
for its RTT calculation. The sending endpoint may note the time mark
when sending the datagram with the Advisory Ack Request. When the
peer endpoint responds with an Extended Data Ack, the sender of the
Advisory Ack Request may use the time mark of the arriving Extended Data
Ack and the stored time mark to calculate the RTT as defined in
[4]. However, the sender of the Advisory Ack Request shall abandon the
RTT calculation if more datagrams are sent to its peer and no Extended
Data Ack is received.
5 Congestion Controls
Several different mechanisms shall be used jointly to achieve Several different mechanisms shall be used jointly to achieve
flow and congestion controls in MDTP. congestion control in MDTP. These mechanisms are always used in regard
to the association, not an individual stream.
4.3.1 Sending with Window Control 5.1 Send with Window Control
The sending endpoint shall use a transmission window to control the The sending endpoint shall use a transmission window to control the
number of outstanding datagrams, i.e., datagrams that have been sent, number of outstanding datagrams, i.e., datagrams that have been sent,
but yet to be acknowledged. The length of the window is defined as the but yet to be acknowledged. The length of the window is defined as the
maximal number of outstanding datagrams a sending endpoint can maximal number of outstanding datagrams a sending endpoint can
allow. This length is adjusted dynamically, depending on the current allow. This length is adjusted dynamically, depending on the current
number of successful transmissions as well as the number of lost number of successful transmissions as well as the number of lost
datagrams. datagrams or retransmissions.
When the number of outstanding datagrams reaches the current window When the number of outstanding datagrams reaches the current window
length, the endpoint shall still accept send requests from its upper length, the endpoint shall still accept send requests from its upper
layer, but shall transmit no more datagrams until an Ack is received. layer, but shall transmit no more datagrams until some or all of the
outstanding datagrams are acknowledged. The endpoint may also elect
to queue only a specified number of datagrams when the window is full.
When this maximal number of queued datagrams is reached the endpoint
shall return an error to its upper layer.
Moreover, when the window length is reached, the next send request Moreover, when the window length is reached, the next send request
from the upper layer will trigger the sending endpoint to transmit a from the upper layer will trigger a Window-up message to be
special Window Up message. Upon receiving this Window Up (WIN|ACK) the transmitted. Upon receiving this Window-up the receiver must respond
receiver must respond with a Window Up Response (WNR|ACK), as with a Window-up Ack, as illustrated by the following example
illustrated by the following example (assuming current window length (assuming current window length is 3):
is 3):
Endpoint A Endpoint Z Endpoint A Endpoint Z
{App sends 3 messages} {App sends 3 messages, strm 0}
[Header Flags=DAT|GAR|ACK U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=7,Strm=0,Seq=3]--------> (Start T2-receive timer)
Seen=0,Send=11,Size=100]-----------> (Start T2-recv timer)
(Start T3-send timer) (Start T3-send timer)
[Header Flags=DAT|GAR|ACK U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=8,Strm=0,Seq=4]-------->
Seen=0,Send=12,Size=100]----------->
(Restart T3-send timer)
[Header Flags=DAT|GAR|ACK U-Data(C/D = 01)
Part=0,Of=1 [Seen=5,Send=9,Strm=0,Seq=5]-------->
Seen=0,Send=13,Size=100]----------->
(Restart T3-send timer)
{App sends a new message} {App sends a new message, strm 1}
(queue new message and send Win Up) (queue new message and send Win-up)
[Header Flags=WIN|ACK Window-up(C/D = 10) ---------------> (cancel T2-recv timer)
Seen=0,Send=14]--------------------> (cancel T2-recv timer) /---- Window-up Ack(C/D = 10)
/----- [Header Flags=WNR|ACK / [Gap=0,Seen=9]
/ Part=0,Of=0 (Cancel T3-send timer) <--------/
/ Seen=14,Send=0] U-Data(C/D = 01)
[Header Flags=DAT|GAR|ACK <--------/ [Seen=5,Send=10,Strm=1,Seq=2]-------> (Start T2-receive timer)
Part=0,Of=1 (Start T3-send timer)
Seen=0,Send=15,Size=100]-----------> (Start T2-recv timer)
(Restart T3-send timer)
In the above example, after the transmission of the first three In the above example, after the transmission of the first three
datagrams, "A" reached its window length. The next message from the datagrams, "A" reached its window length. The next message from the
user triggered a Window Up that was sent to "Z". The Window Up shall user triggered a Window-up that was sent to "Z". The Window-up shall
contain no user data. In response, "Z" cancelled timer T2 and contain no user data. In response, "Z" cancelled timer T2 and
immediately sent a Window Up Response. The arrival of this Window Up immediately sent a Window-up Ack. The arrival of this Window-up Ack
Response effectively resolved all the outstanding datagrams at "A", effectively resolved all the outstanding datagrams at "A", thus
thus allowed "A" to send out the next datagram. allowing "A" to send out the next datagram.
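The send-side behavior in this example can be sketched in Python as
below. The class and attribute names are hypothetical. Sending a single
Window-up for the first blocked request only, and modeling an
acknowledgment as a simple count, are simplifying assumptions. The
window length adjustment that accompanies a Window-up is not modeled.

```python
from collections import deque


class WindowedSender:
    """Toy model of window-controlled sending (names are hypothetical)."""

    def __init__(self, window=3, max_queued=None):
        self.window = window          # current window length
        self.max_queued = max_queued  # None means no queue limit
        self.outstanding = 0          # sent but not yet acknowledged
        self.queue = deque()          # accepted but not yet transmitted
        self.sent = []                # log of what went on the wire

    def send(self, msg):
        if self.outstanding < self.window:
            self._transmit(msg)
            return
        if self.max_queued is not None and len(self.queue) >= self.max_queued:
            raise BufferError("send queue full")  # error to upper layer
        first_blocked = not self.queue
        self.queue.append(msg)
        if first_blocked:
            self.sent.append("WINDOW-UP")  # carries no user data

    def on_ack(self, acked_count):
        self.outstanding = max(0, self.outstanding - acked_count)
        while self.queue and self.outstanding < self.window:
            self._transmit(self.queue.popleft())

    def _transmit(self, msg):
        self.outstanding += 1
        self.sent.append(msg)
```

Sending four messages with a window of 3 transmits the first three,
queues the fourth, and emits one Window-up. A later on_ack(3) releases
the queued message.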
4.3.2 Window Length Adjustment 5.1.1 Window Length Adjustment
The window length shall be initially set to 2, and shall then be The window length shall be initially set to 2, and shall then be
dynamically adjusted based on the datagram loss and acknowledgment dynamically adjusted based on datagram loss and acknowledgment.
conditions of the underlying network.
When 4 consecutive outstanding datagrams are acknowledged at once by If the current window length is less than or equal to 4, every time
the receiver, the sender's window length will be raised by 1 until it the number of consecutive outstanding datagrams acknowledged in a
reaches the protocol parameter 'Max.Outstanding.dg' (which should be a single ack is equal to or greater than half the current window length,
user configurable parameter). the sender's window length shall be raised by 1, until it reaches
'Max.Outstanding.dg' (which should be a user configurable parameter).
If the current window length is less than 4, every time when the If the current window length is greater than 4, every time the number
number of consecutively outstanding datagrams acknowledged in a single of consecutive outstanding datagrams acknowledged in a single ack is
Ack is equal to or greater than the current window length, the equal to or greater than 4, the sender's window length shall be raised
sender's window length shall be raised by 1, until it reaches by 1, until it reaches 'Max.Outstanding.dg'.
'Max.Outstanding.dg'.
In the following circumstances, the sender's window length shall be In the following circumstances, the sender's window length shall be
decreased. However, when the window length reaches 2 it shall not be decreased. However, when the window length reaches 2 it shall not be
decreased any further. decreased any further.
If between 1 to 3 consecutive datagrams are lost, the window length The peer endpoint may report reception gaps which may correspond to
will be decreased by 1. If between 4 to 7 datagrams are lost, the multiple datagram losses (indicated by an Extended Data Ack or
window length will be decreased by 2. If 8 or more datagrams are lost,
the window length will be decreased by 4.
Moreover, any time a Window Up is sent to the receiving endpoint the Window-up Ack). If between 1 to 3 datagrams are lost, the window
sender's window length will be decreased by 1. Also, if a timeout length shall be decreased by 1. If between 4 to 7 datagrams are lost,
forces a retransmission the sender's window length will be reduced the window length shall be decreased by 2. If 8 or more datagrams are
to half of its currently value. lost, the window length shall be decreased by 4.
Any time a Window Up is sent to the receiving endpoint the sender's
window length shall be decreased by 1. Also, if a timeout forces a
retransmission the sender's window length shall be reduced to half of
its current value.
The following table summarizes these rules: The following table summarizes these rules:
- -----------------------------------------------------------------------
Duplicate Ack received by sender | Adjust down by 4 -----------------------------------------------------------------
- ----------------------------------------------------------------------- duplicate ack received by sender | Adjust down by 4
Greater than 8 datagrams lost | Adjust down by 4 -----------------------------------------------------------------
- ----------------------------------------------------------------------- 8 or more datagrams lost | Adjust down by 4
Greater than 4 datagrams lost | Adjust down by 2 -----------------------------------------------------------------
- ----------------------------------------------------------------------- 4 to 7 datagrams lost | Adjust down by 2
Greater than 0 datagrams lost | Adjust down by 1 -----------------------------------------------------------------
- ----------------------------------------------------------------------- 1 to 3 datagrams lost | Adjust down by 1
Timeout forces retransmission | Adjust down by 1/2 of the current -----------------------------------------------------------------
| window. Timeout forced retransmission | Adjust down by 1/2 of the
- ----------------------------------------------------------------------- | current window.
Window Up sent | Adjust down by 1 -----------------------------------------------------------------
- ----------------------------------------------------------------------- Window up sent | Adjust down by 1
-----------------------------------------------------------------
4 or more consecutive datagrams | Adjust up by 1 4 or more consecutive datagrams | Adjust up by 1
acknowledged (window length > 4) | acknowledged (window length > 4) |
- ----------------------------------------------------------------------- -----------------------------------------------------------------
1/2 Window length or more acked | Adjust up by 1 1/2 Window length or more acked | Adjust up by 1
(window length <=4) | (window length <=4) |
- ----------------------------------------------------------------------- -----------------------------------------------------------------
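The rules in the table can be sketched as a single Python function. This
follows the revised (-05) wording. The event names, the default value of
16 for 'Max.Outstanding.dg', and the use of integer division when halving
the window are assumptions of this sketch.

```python
MIN_WINDOW = 2  # the window length is never decreased below 2


def adjust_window(window, event, count=0, max_outstanding=16):
    """Return the new window length after one event.

    event: "acked", "lost", "duplicate_ack", "timeout" or "window_up_sent"
    count: datagrams acknowledged in one ack, or datagrams reported lost
    max_outstanding: stands in for 'Max.Outstanding.dg' (value assumed)
    """
    if event == "acked":
        threshold = 4 if window > 4 else window / 2
        if count >= threshold:
            window += 1
        return min(window, max_outstanding)
    if event == "duplicate_ack":
        window -= 4
    elif event == "lost":
        if count >= 8:
            window -= 4
        elif count >= 4:
            window -= 2
        elif count >= 1:
            window -= 1
    elif event == "timeout":
        window //= 2  # assumption: round down when halving
    elif event == "window_up_sent":
        window -= 1
    else:
        raise ValueError("unknown event: " + event)
    return max(window, MIN_WINDOW)
```

For instance, a window of 10 with 5 datagrams reported lost drops to 8,
and a window of 3 after a timeout is held at the floor of 2.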
4.3.3 Flow Control using In-Queue Information 5.2 Send Timer Back-off at Re-transmission
By using the In Queue field in the MDTP header, the sender can inform Whenever a T3-send timer expires, the endpoint shall re-transmit the
the receiver the number of pending datagrams which the sender has un-acked datagram that has the highest TSN Send value in that and
received, but yet to deliver to its application. The following example re-start the T3-send timer, unless:
shows how the endpoints use In Queue value to accomplish Flow control.
Assume that Endpoint A has sent Endpoint Z 20 datagrams, and when A) If the current window length is reached, a Window-up message shall
Endpoint Z sends an Ack on the reception of these 20 datagrams, only be sent out (see section 5.1), or
the first one of them has been delivered to the upper layer at
Endpoint Z.
In the Ack sent by Endpoint Z, the In Queue field would then have a B) If the current window length is not reached and there is still user
value of 19, indicating the number of datagrams pending for delivery data pending for transmission, a new datagram with user data shall
to its upper layer. This value would be checked by Endpoint A before be sent out and T3-send timer shall be restarted.
it sent the next datagram to Endpoint Z. If this value was found to be
greater than its current window length, Endpoint A would not send the
next datagram. Instead, Endpoint A would start its T3-send timer and
send a Window Up message to Endpoint Z at the expiration of the timer.
This would force Endpoint Z to send another Ack with an updated In
Queue value. If the new In Queue value was still greater than its
window length, Endpoint A would re-start its T3-send timer, and repeat
this procedure until the In Queue value of Endpoint Z dropped below
the current window length of Endpoint A. Then, the transmission at
Endpoint A would resume.
4.3.4 T3-send Timer Adjustment with RTT When the T3-send timer is re-started at a retransmission, the length
of the timer shall be doubled from its previous value. Also, the
latest estimated RTT value for that network should be added to the new
timer value. The following shows the calculation of T3-send timer
value, where 'TL3-default' is a configurable protocol parameter.
If the RTT measurement is available on a specific network, the sender <at normal transmission>
shall adjust the T3-send timer each time when sending datagram using 1. TL3-value = 'TL3-default'
this network. The calculation and adjustment of the timer should 2. T3-send = TL3-value + RTT
follow the method described in [4]. RTT measurement shall be tracked
for each network if redundant networks are in use.
MDTP defines two optional methods to obtain RTT measurements, see <at re-transmissions>
sections 4.6 and 4.7. 1. TL3-value = TL3-value * 2
2. T3-send = TL3-value + RTT
4.4 Sequence Number Reset The T3-send timer at the sender shall be restored to its default value
when a datagram is received from the peer endpoint.
When the datagram sequence number reaches the value 0x7fffffff the The total number of consecutive re-transmissions to all destination IP
next sequence number shall be set to 1. addresses in an association shall be recorded. If this value exceeds
the limit defined in 'Max.Retransmit', the sending endpoint shall
consider the peer endpoint unreachable and shall stop transmitting any
more data to it. The sending endpoint MAY report the failure to the
upper layer, including all datagrams in its out-bound buffer which
have not been acknowledged. Whenever a datagram is received from a
peer endpoint the total number of retransmissions shall be cleared.
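The timer back-off and the 'Max.Retransmit' limit described in the
revised text can be sketched in Python as follows. The class name, the
millisecond units, the use of 0 when no RTT estimate is available, and
the choice of exception are assumptions made for illustration.

```python
class T3SendTimer:
    """Toy model of T3-send back-off (names and units are assumed)."""

    def __init__(self, tl3_default_ms, max_retransmit):
        self.tl3_default = tl3_default_ms    # 'TL3-default'
        self.max_retransmit = max_retransmit  # 'Max.Retransmit'
        self.tl3_value = tl3_default_ms
        self.retransmissions = 0

    def on_transmit(self, rtt_ms=0):
        """Timer length at a normal transmission."""
        self.tl3_value = self.tl3_default
        return self.tl3_value + rtt_ms

    def on_retransmit(self, rtt_ms=0):
        """Timer length at a re-transmission; doubles the base value."""
        self.retransmissions += 1
        if self.retransmissions > self.max_retransmit:
            raise ConnectionError("peer endpoint unreachable")
        self.tl3_value *= 2
        return self.tl3_value + rtt_ms

    def on_datagram_received(self):
        """Any datagram from the peer restores the default and the count."""
        self.tl3_value = self.tl3_default
        self.retransmissions = 0
```

With a default of 1000 ms and an RTT estimate of 50 ms, the successive
timer lengths are 1050, 2050 and 4050 ms.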
4.5 Datagram Re-transmission 6. Network Management
Whenever a T3-send timer expires, the endpoint shall re-transmit the 6.1 Failure Detection in Redundant Networks
un-acked datagram that has the lowest Send value, unless:
A) If the current window length is reached, a Window Up message will When the peer endpoint is multi-homed, the re-transmission of a
be sent out (see 4.3 Congestion Control), or datagram should be attempted to the destination IP address specified
in the MDTP protocol variable 'last.good.intf'. The value of
'last.good.intf' is always updated to point to the source IP address
from which the last datagram from the peer endpoint arrived.
B) If the current window length is not reached and there is still The number of consecutive T3-send timeout events is also recorded in
user data pending for transmission, a new datagram with user data a variable 'retran.count' for each destination IP address. This count
shall be sent out and T3-send timer shall be restarted. should be incremented when a T3-send time-out event occurs for that
destination IP address. Every time a datagram is received from a peer
endpoint, the receiving endpoint shall reset to 0 the 'retran.count'
corresponding to the source IP address.
When a T3-send timer is started at a re-transmission, the length of If the value in 'retran.count' exceeds half of the value of the
the next T3-send timer for this destination should be doubled and the protocol parameter 'Max.Retransmit', the destination IP address shall
last estimated RTT value for that network should be added to the timer. be reported to the upper layer as out-of-service and shall be removed
from eligibility for rotation. When re-transmitting a datagram, the
re-transmission should use 'last.good.intf' as the preferred
destination IP address to which to re-transmit, unless 'last.good.intf'
points to the destination IP address on which the original T3-send
time-out event occurred.
4.5.1 Re-transmission on Redundant networks In the event that a datagram is received from an IP address that has
been reported as out-of-service, the 'retran.count' shall be cleared
as specified above, the destination IP address shall be reported as
in-service to the upper layer, and the destination IP address shall be
considered valid for rotation.
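The per-destination bookkeeping in the revised text can be sketched in
Python as below. The class and method names are hypothetical. The draft
does not say which address to use when 'last.good.intf' is the address
that timed out, so falling back to another in-service address is an
assumption of this sketch.

```python
class DestinationTable:
    """Toy model of 'retran.count' and 'last.good.intf' handling."""

    def __init__(self, addresses, max_retransmit):
        self.max_retransmit = max_retransmit
        self.retran_count = {a: 0 for a in addresses}
        self.in_service = {a: True for a in addresses}
        self.last_good_intf = addresses[0]
        self.reports = []  # (address, status) reported to the upper layer

    def on_t3_timeout(self, addr):
        self.retran_count[addr] += 1
        over_limit = self.retran_count[addr] > self.max_retransmit / 2
        if self.in_service[addr] and over_limit:
            self.in_service[addr] = False
            self.reports.append((addr, "out-of-service"))

    def on_datagram_from(self, addr):
        self.retran_count[addr] = 0
        self.last_good_intf = addr
        if not self.in_service[addr]:
            self.in_service[addr] = True
            self.reports.append((addr, "in-service"))

    def retransmit_destination(self, timed_out_addr):
        if self.last_good_intf != timed_out_addr:
            return self.last_good_intf
        # Assumption: otherwise pick any other in-service address.
        for addr, ok in self.in_service.items():
            if ok and addr != timed_out_addr:
                return addr
        return timed_out_addr
```

With 'Max.Retransmit' set to 4, the third consecutive timeout on an
address takes it out of service, and the next datagram received from that
address brings it back.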
When redundant networks exist between two communicating endpoints, the 6.2 RTT Measurement
re-transmission shall be attempted on the network specified in the
MDTP protocol variable 'last.good.intf'. The value of 'last.good.intf'
is always updated to refer to the network on which the last datagram
from the peer endpoint arrived.
Moreover, the number of consecutive re-transmissions is also recorded On occasions an endpoint of an association may need to perform an RTT
in a variable 'retran.count' for each network. Every time a datagram measurement of the network (or one of the redundant networks) between
is received on a network, the corresponding 'retran.count' shall be itself and its peer.
reset to 0.
If the value in the 'retran.count' of the current network exceeds RTT-request and RTT-ack messages shall be used to perform the RTT
half of the value of the protocol parameter 'Max.Retransmit', the measurement. In the messages, two 32 bit long opaque integers are used
'last.good.intf' will be changed, so as to force the next in the control parameter field to carry the time value.
re-transmission to be directed to an alternate network and
optionally report a failure condition.
The total number of consecutive re-transmissions across all the At the request of its upper layer, an endpoint shall initiate an RTT
networks in an association is also recorded. If this value exceeds the measurement by sending an RTT-request (to a specific network if
limit defined by 'Max.Retransmit', the sending endpoint shall consider redundant networks exist). The sender shall also place in Time value 1
the peer endpoint unreachable and stop transmitting data to it, and and Time value 2 the value of the current time mark.
optionally report the failure.
4.6 RTT Measurement Upon the reception of this RTT-request message, the recipient shall
immediately respond with an RTT-ack to the sender (over the same
network on which the RTT-request arrives if the recipient is
multi-homed), with the time mark carried in the original RTT-request
copied into its own Time value fields.
This defines the mechanism for round-trip-time (RTT) measurement in Upon the reception of this reply, the sender shall use the time mark
MDTP. in the reply RTT-ack to calculate the RTT (to the specific destination
IP address if redundant networks exist) as defined in [4].
On occasions either side of an association may need to perform an RTT Endpoint A Endpoint Z
measurement of the network (or one of the redundant networks) between {RTT - Request Now=x.y}
them. RTT-request (C/D=10)
[Time-value1=x,
Time-value2=y,
Seen=81] ----------------------->
/------- RTT-ack (C/D=10)
/ [Time-value1=x,
/ Time-value2=y,
/ Seen=3]
(Endpoint A uses <----------/
x.y to calculate RTT)
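The exchange shown above can be sketched in Python as follows. The
revised text only calls the two time values opaque 32-bit integers.
Splitting the time mark into seconds and microseconds (as the earlier
revision did) and using network byte order are assumptions of this
sketch, as are the function names. The result is a single raw sample; the
smoothing defined in [4] is not shown.

```python
import struct


def make_rtt_request(now):
    """Encode a time mark (float seconds) as two 32-bit integers."""
    sec = int(now)
    usec = int((now - sec) * 1_000_000)
    return struct.pack("!II", sec, usec)


def make_rtt_ack(request_payload):
    """The recipient copies both time values back unchanged."""
    return bytes(request_payload)


def rtt_from_ack(ack_payload, now):
    """Raw RTT sample: arrival time minus the echoed time mark."""
    sec, usec = struct.unpack("!II", ack_payload)
    return now - (sec + usec / 1_000_000)
```

The sender would call make_rtt_request() with its current time mark, and
later pass the echoed payload and the arrival time to rtt_from_ack().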
4.6.1 RTT Datagram Header Format 6.3 Network Heart Beat
The following shows the header format an endpoint shall use for RTT At the request of its upper layer, an endpoint shall enable heart beat
measurement: to a specific peer with which it has an established association.
MDTP Header Format - RTT measurement The RTT-request message defined in section 2.2 shall be used as
the heart beat while the RTT-ack shall be used as the heart beat
response.
0 1 2 3 After having heart beat enabled, the endpoint shall transmit a heart
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 beat to that specific peer and start a T5-heartBeat timer. The peer
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ shall immediately respond to the heart beat in the same manner as the
| MDTP Protocol Identifier | RTT measurement procedure described in section 6.2. This response, as
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ well as the new RTT measurement, shall be stored by the endpoint.
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number (Seen) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number (Send) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transparent Time Int-1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Transparent Time Int-2 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Two long integers are used in the data field to carry the time value. When the T5-heartBeat timer expires, the endpoint shall first check if
The RTT datagram is identified by setting the RTC or RTM bit to 1. the previous heart beat has been responded to (on the same network it
was sent in the case of multi-homed hosts). If not, the destination IP
address to which the last heart beat was sent shall have the
'retran.count' incremented and checked following the rules described
in section 6.1. Then, the endpoint shall send another heart beat and
re-start the T5-heartBeat timer.
4.6.2 Measure RTT In the case where one or both endpoints are multi-homed, the sending of
Heart beats shall follow the network rotation rules outlined in
section 4.2.
AT the request of its upper layer, an endpoint shall initiate an RTT If, before the expiration of T5-heartBeat timer, a datagram is
measurement by sending an RTT datagram with GAR, ACK, and RTC bits set received by the endpoint, the T5-heartBeat timer shall be stopped and
to 1 (to a specific network if redundant networks exist). No the appropriate T2-receive timer shall be started. In other words, the
user data shall be carried. The sender shall also place in Time Int-1 T5-heartBeat timer has lower precedence than the T2-receive timer.
and Time Int-2 the value of the current time of day in seconds and
microseconds.
Upon the reception of this RTT datagram, the recipient shall When there are no datagrams to send and no other timers are running,
immediately return the datagram to the sender (over the same network the T5-heartBeat timer shall be started and the above procedure shall
on which the datagram arrives if redundant networks exist), with the continue.
RTM and ACK bits set to 1.
Upon the reception of this reply, the sender shall use the Time Int-1 The suggested interval for T5-heartBeat timer is 4000 ms, and may be
and Time Int-2 in the reply datagram to calculate the RTT (of the dynamically adjusted by adding the current RTT measurement if it is
specific network if redundant networks exist). available.
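The heart beat timing in the revised text can be sketched in Python as
below. The function names and the dictionary used for 'retran.count' are
hypothetical. Clearing the count when a response does arrive is assumed
to happen in the normal receive path and is not shown.

```python
SUGGESTED_T5_MS = 4000  # suggested T5-heartBeat interval


def t5_interval_ms(rtt_ms=None):
    """Suggested interval, plus the current RTT if one is available."""
    return SUGGESTED_T5_MS + (rtt_ms or 0)


def on_t5_expiry(retran_count, last_dest, last_beat_answered):
    """Handle T5-heartBeat expiry for the destination last probed.

    retran_count: dict mapping destination address -> 'retran.count'
    Returns the actions the endpoint takes next.
    """
    if not last_beat_answered:
        retran_count[last_dest] = retran_count.get(last_dest, 0) + 1
    return ["send-heart-beat", "restart-T5"]
```

An unanswered heart beat therefore counts against the destination in the
same way as a T3-send time-out.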
Endpoint A Endpoint Z 7. Termination of Association
RTT - Request Now=x.y
[Header Flags=ACK|GAR|RTC
Part=0,Of=1
Seen=1,Send=31,Size=0
Time-Int1=x
Time-Int2=y] ----------------------->
------ [Header Flags=ACK|RTM
/ Part=0,Of=0
/ Seen=31,Send=1
/ Time-Int1=x
/ Time-Int2=y]
/
(Endpoint A uses <-----------
current time subtracted from
x.y to calculate RTT)
4.7 Link Heart Beat Before an endpoint terminates itself, it shall send an Abort message
to each of its peer endpoints in all existing associations. The Abort
shall be sent without requiring an acknowledgment from the peer
endpoint. However, the sender of the Abort message MUST fill in the
peer's Init-Tag.
This defines the mechanism for activating and transmitting of link When the peer endpoint receives the Abort, after verifying the Tag,
heart beats in MDTP. the peer shall remove the sender from its record, and optionally
report the termination of the sender to its upper layer. However, if
the Tag sent with the Abort message is incorrect, the peer must
silently discard the Abort message.
At request by its upper layer, an endpoint shall enable heart beat on The following shows an example of the termination of Endpoint A:
a specific peer with which it has an established association in the
Reliable transfer mode.
The RTT datagram defined in section 4.6.1 shall be used as the Heart Endpoint A
Beat. {App indicates termination}
Abort (C/D = 10)
[Tag-X] --------------------------------> to Endpoint X
After having heart beat enabled, the endpoint shall transmit a Heart Abort (C/D = 10)
Beat to that specific peer and start a T5-heartBeat timer. The peer [Tag-Y] --------------------------------> to Endpoint Y
shall immediately respond to the Heart Beat in the same manner as an
RTT as described in section 4.6. This response shall be stored by the
first endpoint (also can be used to update its RTT measurement).
When the T5-heartBeat timer expires, the endpoint shall first check if Abort (C/D = 10)
the previous heart beat has been responded (on the same network it was [Tag-Z] --------------------------------> to Endpoint Z
sent in the case of redundant network). If not, the network that the
last Heart Beat was sent upon shall be counted as a transmission
failure, and be handled following the rules described in section 4.5.
Then, the endpoint shall send another Heart Beat and re-start the
T5-heartBeat timer.
In the case where redundant networks exist, the sending of Heart beats 7.1 Graceful Shutdown of an Association
shall follow the link rotation rules outlined in section 4.1.1.
If, before the expiration of T5-heartBeat timer, a datagram is An endpoint in an association may decide to "graceful shutdown" the
transmitted or received by the endpoint, the T5-heartBeat timer shall association without completely closing it down. With graceful
be stopped and the appropriate T2-T4 timer shall be started. In other shutdown, both endpoints shall remove any record and pending datagrams
words, the T5-heartBeat timer has the lowest precedence. associated with the association. Further communications between the
two endpoints can be resumed by going through a re-initialization
procedure (see section 3.5.4).
When no datagram to send and no other timers are running, the A Graceful Shutdown message shall be sent to the peer endpoint of the
T5-heartBeat timer shall be start and the above procedure shall association, and the peer shall send back an acknowledgment. Note
continue. that it shall be the responsibility of the endpoint that sends the
Graceful Shutdown message to assure that all the outstanding datagrams
from its side have been resolved before it initiates the graceful
shutdown procedure.
The suggested interval for T5-heartBeat timer is 4000 ms. In the Graceful Shutdown message, the sender shall indicate the
highest TSN Seen it has received from the peer, as well as the
Init-Tag of the peer.
4.8 Advisory Acknowledgment Upon the reception of the Graceful Shutdown, the peer shall first
verify that the Tag value contained in the Graceful Shutdown message is
valid. If the Tag is invalid, the message must be silently discarded.
This defines the mechanism for sending and handling of the Advisory The peer then shall verify, by checking the Seen numbers from the
Acknowledgments in MDTP. Graceful Shutdown message, that all the out-bound datagrams have
reached the destination. Otherwise, the peer shall re-transmit all
lost datagrams.
An endpoint may use Advisory Acks to increase bandwidth utilization After sending the Graceful Shutdown, if the endpoint receives any new
when transmitting over a reliable association. user datagram it shall immediately respond with an Extended Data Ack
and re-start its T3-send timer.
An Advisory Ack shall be indicated by setting RE1 flag to 1 in the The peer shall send a Graceful Shutdown Ack when all the outstanding
datagram. datagrams are acknowledged, then start a T4-shutdown timer. The
endpoint, after receiving the Graceful Shutdown Ack, must also
validate the Tag value contained in the message. If it does not match
the Tag value that unlocked the association, the message should be
silently discarded.
The endpoint shall send an Advisory Ack to its peer when it reaches The following sequence shows an example of Graceful Shutdown:
half of its current window length, and also when it detects that the
next send will reach the full window length.
Upon the reception of an Advisory Ack, the peer endpoint shall Endpoint A Endpoint X
immediately acknowledge all the datagrams it has received but yet {App indicates graceful shutdown}
acked upon, and then cancel the T2-recv timer if one is still Graceful Shutdown (C/D=10)
running. [Tag-X, Seen=10] ---------------------> (all datagrams resolved)
(start T3-send timer) /-------- Graceful Shutdown Ack (C/D=10)
/ [Tag-A]
/ (start T4-shutdown timer)
(cancel T3-send timer) <------/ ...
(clean-up the association) (T4-shutdown expires)
(clean-up the association)
The following shows an example of using Advisory Ack: Both endpoints shall reject any new data request from their upper layers
while the graceful shutdown procedure is in progress.
Endpoint A Endpoint Z 8. Stream Operations
{App sends 3 messages}
[Header Flags=DAT|GAR|ACK
Part=0,Of=1
Seen=0,Send=1,Size=100]-------------> (Start T2-recv timer)
(Start T3-send timer)
[Header Flags=DAT|GAR|ACK 8.1 Stream Initiation
Part=0,Of=1
Seen=0,Send=2,Size=100]----------->
(Restart T3-send timer)
{detects window half full, use Advisory Ack}
[Header Flags=DAT|GAR|ACK|RE1
Part=0,Of=1
Seen=0,Send=3,Size=100]------\
(Stop and restart T3-send timer) \
\----> (cancel T2-receive timer)
<---------------------- [Header Flags=ACK
Part=0,Of=0
Seen=3,Send=1]
4.9 Termination of an Association An MDTP association between the two endpoints must be established
before any stream operation.
When an endpoint terminates, it shall send a Shutdown datagram Except for the global stream (i.e, stream id 0), a stream shall be
(FIR|SHU) to each of the peer endpoints in all its existing initiated (opened) by the sender before any datagrams can be sent in
associations. The Shutdown datagram itself is sent in unreliable that stream. When a stream is no longer used, it shall be terminated
transfer mode and thus needs not to be acknowledged. (closed) by the user. Moreover, both sides of the association shall be
able to initiate or terminate streams independently.
When a peer endpoint receives the Shutdown, it will remove the sender The sender initiates a stream by sending a Stream Initiation. In
from its record, and optionally report the termination of the sender addition to specifying the Stream Identifier, the sender must set the
to the upper layer. Init-Tag field of the Stream Initiation to the Tag value of the peer
endpoint.
The following shows an example of the termination of Endpoint A: The sender shall also attach the stream-specific data, if any (usually
provided by the upper layer), with the Stream Initiation. Otherwise,
the Size of Stream Info shall be set to 0x0.
Endpoint A Then, the sender shall start a T3-send timer. If the T3-send timer
{App indicates termination} expires, the sender shall re-transmit the Stream Initiation.
[Header Flags=FIR|SHU
Seen=3,Send=14, ------------------------> to Endpoint X
[Header Flags=FIR|SHU Upon the reception of the Stream Initiation, the peer must first
Seen=1496,Send=101,------------------------> to Endpoint Y verify that the correct Tag value is carried in the Init-Tag field of
the Stream Initiation. If so, the peer shall respond immediately with
a Stream Initiation Ack. Otherwise, the peer must silently discard the
Stream Initiation.
[Header Flags=FIR|SHU The following example shows the opening of stream 5 by "A":
Seen=14,Send=2 -------------------------> to Endpoint Z
4.10 Draining of an Association Endpoint A Endpoint Z
{App Initiates stream 5}
Stream Initiation (C/D=10)
[Tag=Tag-Z,Strm=5] ----------------->
(Start T3-send timer)
(Cancel T3-send timer) <----------------- Stream Initiation Ack
(C/D=10) [Strm=5]
An endpoint in a association may decide to "drain" the association 8.2 Stream Termination
without completely shutting it down. By draining an association, both
endpoints will remove any record and pending datagrams associated with
the association. Further communications between the two endpoints can
be resumed by going through a re-initialization procedure (see
section 3).
In such a case, a Drain datagram (FIR|SHU|UNR) is sent to the peer An endpoint shall be allowed to terminate one of its streams by
endpoint of the association, and no Ack is required. sending a Stream Termination to the other side.
The following sequence shows an example of Draining: The same Tag verification process used for stream initiation shall
be applied to stream termination.
Endpoint A The peer shall send a Stream Termination Ack in response to the Stream
{App indicates draining} Termination.
[Header Flags=FIR|SHU|UNR
Seen=146,Send=1301]------------------------> to Endpoint X
5. Interface with upper level protocols The following example shows the termination of stream 5 by "A":
Endpoint A Endpoint Z
{App closes stream 5}
Stream Termination (C/D=10)
[Tag=Tag-Z,Strm=5] ------------------->
(Start T3-send timer)
(Cancel T3-send timer) <------------------ Stream Termination Ack
(C/D=10) [Strm=5]
Received datagrams associated with a terminated stream shall be
silently discarded. It is up to the endpoint to assure that all
outstanding user datagrams in the stream are acknowledged before the
stream termination.
8.3 Other Issues with Stream Operations
When an association is re-initialized (see section 3.5.4), all existing
streams within that association will be automatically terminated.
The receiver shall silently discard any datagrams associated with a
stream which has not yet been opened or has already been terminated.
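The stream rules of sections 8.1 through 8.3 can be summarized as a small
receiver-side state table. The following is a minimal sketch only, not
part of the protocol definition. The class name, method names and the
returned tuples are hypothetical, and it is an assumption that the global
stream (stream id 0) remains usable after a re-initialization.

```python
from dataclasses import dataclass, field


@dataclass
class StreamTable:
    """Receiver-side view of the streams of one association (illustrative)."""
    local_tag: int  # Tag value the peer must echo in the Init-Tag field
    # Stream 0 is the global stream and never needs a Stream Initiation.
    open_streams: set = field(default_factory=lambda: {0})

    def on_stream_initiation(self, init_tag, stream_id):
        # A wrong Tag means the message is silently discarded (no reply).
        if init_tag != self.local_tag:
            return None
        # Adding is idempotent, so a re-transmitted initiation is re-acked.
        self.open_streams.add(stream_id)
        return ("STREAM_INIT_ACK", stream_id)

    def on_stream_termination(self, init_tag, stream_id):
        # Same Tag verification as for stream initiation.
        if init_tag != self.local_tag:
            return None
        self.open_streams.discard(stream_id)
        return ("STREAM_TERM_ACK", stream_id)

    def accept_datagram(self, stream_id):
        # Datagrams on unopened or terminated streams are discarded.
        return stream_id in self.open_streams

    def on_reinitialize(self):
        # Re-initialization terminates all existing streams.
        # Assumption: only the global stream survives.
        self.open_streams = {0}
```

A return value of None stands for "silently discard"; any other return
value names the acknowledgment the endpoint would send back.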
9. Interface with Upper Layer
The upper layer protocols (ULP) shall request services by passing The upper layer protocols (ULP) shall request services by passing
primitives to MDTP and shall receive notifications from MDTP for primitives to MDTP and shall receive notifications from MDTP for
various events. various events.
The primitives and notifications described in this section should be The primitives and notifications described in this section should be
used as a guideline for implementing MDTP. used as a guideline for implementing MDTP.
A) Init.MDTP primitive A) Init.MDTP primitive
skipping to change at line 1109 skipping to change at page 31, line 10
Optional attributes: Optional attributes:
The following types of attributes may be passed along with The following types of attributes may be passed along with
the primitive: the primitive:
o Timer selection and its operation syntax -- to indicate to MDTP o Timer selection and its operation syntax -- to indicate to MDTP
an alternative timer the MDTP should use for its operation. an alternative timer the MDTP should use for its operation.
o Initial MDTP operation mode; o Initial MDTP operation mode;
o IP port number, if ULP wants it to be specified; o IP port number, if ULP wants it to be specified;
B) Send.Data primitive B) Init.Association
This primitive allows the upper layer to initiate an association to a
specific peer endpoint. The peer endpoint shall be specified by one of
the IP address/port pairs which define the endpoint (see section 1.1).
Mandatory attributes:
o associationID - specified as one of the IP address/port pairs of
the peer endpoint with which the association is to be established.
Optional attributes:
o eligibleNetList - a list of destination IP address/port pairs that
the peer endpoint is allowed to use in its network rotation. By
default, all destination IP address/port pairs on the peer are
available.
C) Term.Association
Terminating an association.
Mandatory attributes:
o associationID - specified as one of the IP address/port pairs of
the peer endpoint with which the association is to be terminated.
Optional attributes:
None.
D) Send.Data primitive
This is the main method to send datagrams via MDTP. This is the main method to send datagrams via MDTP.
Mandatory attributes: Mandatory attributes:
o data - This is the payload ULP wants to transmit; o data - This is the payload ULP wants to transmit;
o size - The size of the payload in number of octets; o size - The size of the payload in number of octets;
o to-address - The IP address and port number of the intended o associationID - One of the IP address/port pair of the peer endpoint.
receiver. In case of redundant networks, to-address can be any one Note that the actual destination address sent to will be determined
of the multiple IP addresses of the receiver. The network which the by MDTP due to the network rotation, unless the current mode
datagram will actually be sent through will be determined by MDTP due prohibits MDTP network rotation; in such a case the datagram will
to the link rotation, unless the current mode prohibits MDTP link be sent to the IP address/port specified by associationID.
rotation; in such case the datagram will be sent through the network
specified by to-address (see section 4.5).
Optional attributes: Optional attributes:
o mode-flags - This indicates a new MDTP operation mode, taking effect o mode-flags - This indicates a new MDTP operation mode, taking effect
immediately including the current datagram send; immediately including the current datagram send;
o context - optional information that will be carried in the o context - optional information that will be carried in the
Send.Failure notification to the ULP if the transportation of Send.Failure notification to the ULP if the transportation of
this datagram fails. this datagram fails.
C) Receive.Data primitive o streamID - to indicate which stream to send the data on. By
default, the global stream will be used.
E) Receive.Data primitive
This primitive shall return the first datagram in the MDTP in-queue to This primitive shall return the first datagram in the MDTP in-queue to
ULP, if there is one available. It may, depending on the specific ULP, if there is one available. It may, depending on the specific
implementation, also return other information such as the sender's implementation, also return other information such as the sender's
address, whether there are more datagrams available for retrieval, address, whether there are more datagrams available for retrieval,
etc. The behavior is undefined if no datagram is available when this etc. The behavior is undefined if no datagram is available when this
primitive is invoked. primitive is invoked.
Mandatory attributes: Mandatory attributes:
o buffer - the memory location indicated by the ULP to store the o buffer - the memory location indicated by the ULP to store the
received datagram and other information. received datagram and other information.
Optional attributes: Optional attributes:
None. None.
D) Data.Arrive notification F) Data.Arrive notification
MDTP shall invoke this notification on the ULP when a datagram is MDTP shall invoke this notification on the ULP when a datagram is
successfully received and ready for retrieval. successfully received and ready for retrieval.
E) Send.Failure notification G) Send.Failure notification
If a datagram cannot be delivered, MDTP shall invoke this notification If a datagram cannot be delivered, MDTP shall invoke this notification
on the ULP. on the ULP.
The following may be optionally passed with the notification: The following may be optionally be passed with the notification:
o data - the location ULP can find the un-delivered datagram. o data - the location ULP can find the un-delivered datagram.
o context - optional information associated with this datagram (see o context - optional information associated with this datagram (see
13.2). D).
F) Link.Status.Change notification H) Network.Status.Change notification
When a link is marked down (e.g., when MDTP detects a link failure), When a endpoint-id is marked down (e.g., when MDTP detects a failure),
or marked up (e.g., when MDTP detects a link recovery), MDTP shall or marked up (e.g., when MDTP detects a recovery), MDTP shall
invoke this notification on the ULP. invoke this notification on the ULP.
The following shall be passed with the notification: The following shall be passed with the notification:
o link-address - This indicates the IP address of the affected link; o endpoint-id - This indicates the IP address/port of the
o new-status - This indicates the new status of the link; peer endpoint affected by the change;
o new-status - This indicates the new status.
G) Communication.Up notification I) Communication.Up notification
This notification is used when MDTP becomes ready to send or receive This notification is used when MDTP becomes ready to send or receive
datagrams, or when a lost communication to an endpoint is restored. datagrams, or when a lost communication to an endpoint is restored.
The following shall be passed with the notification: The following shall be passed with the notification:
o status - This indicates what type of event has occurred; o status - This indicates what type of event has occurred;
o endpoint-id - The IP address and port number to identify the o associationID - An IP address/port to identify the peer endpoint;
endpoint;
H) Communication.Lost notification J) Communication.Lost notification
When MDTP loses communication to an endpoint completely or detects When MDTP loses communication to an endpoint completely or detects
that the endpoint has performed a shut-down operation, it shall invoke that the endpoint has performed a abort or graceful shutdown
this notification on the ULP. operation, it shall invoke this notification on the ULP.
The following shall be passed with the notification: The following shall be passed with the notification:
o status - This indicates what type of event has occurred; o status - This indicates what type of event has occurred;
o endpoint-id - The IP address and port number to identify the o associationID - An IP address/port number to identify the peer
endpoint; endpoint;
o packets-enqueue - The number and location of un-sent datagrams o packets-enqueue - The number and location of un-sent datagrams
still held by MDTP; still held by MDTP;
o last-acked - the sequence number last acked by that peer endpoint; o last-acked - the sequence number last acked by that peer endpoint;
o last-sent - the sequence number last sent to that peer endpoint; o last-sent - the sequence number last sent to that peer endpoint;
I) Change.Link.Rotation primitive K) Change.Network.Rotation primitive
When the upper layer wants to inform MDTP to make a specific network When the upper layer wants to inform MDTP to make a specific network
eligible or ineligible for link rotation, the upper layer will send eligible or ineligible for network rotation, the upper layer will send
this primitive to MDTP. this primitive to MDTP.
Mandatory attributes: Mandatory attributes:
o action - This indicates if the network is to be made eligible or o action - This indicates if the network is to be made eligible or
ineligible for link rotation. ineligible for network rotation.
o network-id - This is the IP address and port of the network to be o network-id - This is the IP address/port of the peer endpoint to
added or removed from link rotation consideration. be added or removed from network rotation consideration.
J) Open.Stream primitive L) Open.Stream primitive
This shall be used by the upper layer to open a new stream. This should be used by the upper layer to open a new stream.
Mandatory attributes: Mandatory attributes:
o endpoint-id - The IP address and port number to identify the o associationID - One of the IP address/port to identify the peer
peer endpoint to which the stream is to be opened. An association endpoint of the association to which the stream is to be opened. An
must have existed at the time of stream open. association must have existed at the time of stream open.
Optional attributes:
streamInfo - The upper layer should use this field to pass any
stream-specific data to the other endpoint of the association.
Returned attributes: Returned attributes:
o The stream number that is opened. o The stream number that is opened.
K) Close.Stream primitive M) Close.Stream primitive
This shall be used by the upper layer to request to close a stream. This shall be used by the upper layer to request to close a stream.
Mandatory attributes: Mandatory attributes:
o endpoint-id - The IP address and port number to identify the o associationID - One of the IP address/port to identify the peer
peer endpoint to which the stream is to be closed. endpoint of the association to which the stream is to be closed.
o stream number - The stream number to identify the stream to be o stream number - The stream number to identify the stream to be
closed (this should be the number returned by the Open.Stream closed (this should be the number returned by the Open.Stream
primitive on this stream). primitive on this stream).
6. Suggested MDTP Protocol Parameter Values 10. Suggested MDTP Timer and Protocol Parameter Values
The following are suggested timer values for MDTP: The following are suggested timer values for MDTP:
T1-init Timer - 160 ms T1-init Timer - 160 ms
T2-receive Timer - 20 ms T2-receive Timer - 20 ms
T3-send Timer - 160 ms + Last calculated RTT for that network. T3-send Timer - 160 ms (TL3-default)
T4-shutdown Timer - 300 ms
T5-heartBeat timer - 4000 ms (TL5-default)
The following protocol parameters are recommended: The following protocol parameters are recommended:
Max.Outstanding.dg - 20 messages Max.Outstanding.dg - 20 messages
Max.Retransmit - 10 attempts Max.Retransmit - 10 attempts
Max.Init.Retransmit - 8 attempts Max.Init.Retransmit - 8 attempts
Min.Mcast.Time.To.Reset - 5 seconds Min.Mcast.Time.To.Reset - 5 seconds
Num.Of.Mcast.Reset.Msg - 5 messages Num.Of.Mcast.Reset.Msg - 5 messages
7. Acknowledgments 11. Acknowledgments
The authors wish to thank Brian Wyld, A. Sankar, Henry Houh, Gary The authors wish to thank Brian Wyld, A. Sankar, Henry Houh, Gary
Lehecka, Ken Morneault, Lyndon Ong, and others for their very valuable Lehecka, Ken Morneault, Lyndon Ong, Greg Sidebottom and others for
comments. their very valuable comments.
8. Author's Addresses 12. Authors' Addresses
Randall R. Stewart Tel: +1-847-632-7438 Randall R. Stewart Tel: +1-847-632-7438
Cellular Infrastructure Group EMail: stewrtrs@cig.mot.com Cellular Infrastructure Group EMail: stewrtrs@cig.mot.com
Motorola, Inc. Motorola, Inc.
1475 W. Shure Drive, #2C-6 1475 W. Shure Drive, #2C-6
Arlington Heights, IL 60004 Arlington Heights, IL 60004
USA USA
Qiaobing Xie Tel: +1-847-632-3028 Qiaobing Xie Tel: +1-847-632-3028
Cellular Infrastructure Group EMail: xieqb@cig.mot.com Cellular Infrastructure Group EMail: xieqb@cig.mot.com
Motorola, Inc. Motorola, Inc.
1501 W. Shure Drive, #2309 1501 W. Shure Drive, #2309
Arlington Heights, IL 60004 Arlington Heights, IL 60004
USA USA
Tom Bova Tel: +1-703-484-3331
Cisco Systems Inc. EMail: tbova@cisco.com
13615 Dulles Technology Drive
Herndon, VA 20171
Suheel Hussain Tel: +1-919-472-2312 Suheel Hussain Tel: +1-919-472-2312
Cisco Systems Inc. EMail:ssh@cisco.com Cisco Systems Inc. EMail:ssh@cisco.com
7025 Kit Creek Road 7025 Kit Creek Road
Research Triangle Park, NC 27709 Research Triangle Park, NC 27709
Ted Krivoruchka Tel: +1-703-484-3331 Chip Sharp Tel: +1-919-851-2085
Cisco Systems Inc. EMail: tedk@cisco.com Cisco Systems Inc. EMail:chsharp@cisco.com
13615 Dulles Technology Drive
Herndon, VA 20171
Renee Revis Tel: +1-703-472-5681
Cisco Systems Inc. EMail: drrevis@cisco.com
7025 Kit Creek Road 7025 Kit Creek Road
Research Triangle Park, NC 27709 Research Triangle Park, NC 27709
9. References Hanns Juergen Schwarzbauer Tel: +49-89-722-24236
SIEMENS AG
Hofmannstr. 51
81359 Munich, Germany
EMail: HannsJuergen.Schwarzbauer@icn.siemens.de
Tom Taylor Tel: +1-613-736-0961
Nortel Networks EMail:taylor@nortelnetworks.com
1852 Lorraine Ave.
Ottawa Ontario Canada
K1H6Z8
Ian Rytina Tel:
Ericsson Australia EMail:ian.rytina@ericsson.com
37/360 Elizabeth Street
Melbourne, Victoria 3000, Australia
13. References
[1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program [1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program
Protocol Specification", RFC 791, USC/Information Sciences Institute, Protocol Specification", RFC 791, USC/Information Sciences Institute,
September 1981. September 1981.
[2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information Sciences [2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information Sciences
Institute, August 1980. Institute, August 1980.
[3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793, USC/ [3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793, USC/
Information Sciences Institute, September 1981. Information Sciences Institute, September 1981.
[4] Jacobson V., "Congestion Avoidance and Control", Proceedings of [4] Jacobson V., "Congestion Avoidance and Control", Proceedings of
SIGCOMM '88, pp 314-329, August, 1988. SIGCOMM '88, pp 314-329, August 1988.
[5] Seth, T., etc. "Performance Requirements for Signaling in Internet [5] Seth, T., etc. "Performance Requirements for Signaling in Internet
Telephony", Internet-Draft <draft-seth-sigtran-req-00.txt>, May, 1999. Telephony", Internet-Draft <draft-seth-sigtran-req-00.txt>, May 1999.
Appendix A: Stream-based Reliable and Ordered Delivery
This defines a reliable and ordered stream mechanism for MDTP. It is
optional for implementation.
A stream in MDTP is defined as a sequence of user datagrams that needs
to be reliably delivered with sequence preservation of its own. In
other words, the delivery of a stream shall not be delayed because of
losses or re-transmissions that occur in other streams within the
same MDTP association. This capability is a critical requirement of
some telephony call signaling protocols [5].
Stream datagrams are identified by setting the FLO bit to 1.
A.1 Stream Initiation
First, an MDTP association between the two endpoints must be initiated
before any stream operation.
A stream shall be initiated (opened) by the sender before datagrams
can be sent in the stream, and after the stream is complete it shall
be terminated (closed) by the user. Also, both sides of the
association shall be able to initiate or terminate streams
independently.
The sender initiates a stream by sending a Stream Initiation
(FLO|UNR), using the following header format:
Stream Initiation
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seen = 0x0 (or Tag) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Send = 0x0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| New Stream Number | 0x0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Note that in the Stream Initiation, the Seen and Send shall be set to 0,
and the number of the new stream being initiated shall be indicated
in the first two octets of the data field.
However, if this is the first datagram sent out after receiving the
Initiation Ack from the peer (see section 3.1), the Seen field of
the above Stream Initiation shall be set to the Tag value carried in
the Initiation Ack.
Upon the reception of the Stream Initiation, the peer shall respond
immediately with a Stream Initiation Ack (FLO|UNR|ACK), using the
following header format:
Stream Initiation Ack
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seen = Stream Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Send = 0x0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The following example shows the opening of stream 5 by "A":
Endpoint A Endpoint Z
{App Initiates stream 5}
[Header Flags=FLO|UNR
Part=0,Of=1
Seen=0,Send=0,Size=0,
Stream=5 ]--------------------------->
(Start T3-send timer)
(Cancel T3-send timer) <--------------------- [Header Flags=FLO|UNR|ACK
Mode=UNR
Part=0,Of=1
Seen=5,Send=0]
A.2 Stream Termination
For an existing stream, either side shall be allowed to terminate the
stream by sending a Stream Termination (FLO|UNR|SHU) to the other side.
Apart from the RES flag, the Stream Termination shall use the same
header format as the Stream Initiation datagram (see A.1).
A Stream Termination Ack (FLO|UNR|SHU|ACK) shall be sent by the peer
endpoint in response.
The following example shows the termination of stream 5 by "A":
Endpoint A Endpoint Z
{App terminates stream 5}
[Header Flags=FLO|UNR|SHU
Part=0,Of=1
Seen=0,Send=0,Size=0,
Stream=5 ]--------------------->
(Start T3-send timer s-5)
(Cancel T3-send timer s-5) <------------ [Header Flags=FLO|UNR|SHU|ACK
Part=0,Of=1
Seen=5,Send=0]
Datagrams associated with a terminated stream that are received by
either side should be silently discarded. It is up to the side which
terminates the stream to ensure that all outstanding user datagrams in
the stream are acknowledged before the termination.
A.3 Stream Datagram Transfer
A.3.1 Header Format in Stream Datagrams with User Data
The MDTP header in a stream datagram with user data shall have the
following format:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seen |
| Stream Number | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Send |
| Stream Number | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\ \
/ data /
\ \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The stream number and sequence number in the Send field shall be used
by the sender to identify the current stream datagram. The stream
number and sequence number in the Seen field shall be used by the
sender to acknowledge stream datagrams it has received.
Stream number 0 and sequence number 0 are reserved for special
purposes and are not valid stream or sequence numbers.
A.3.2 Transmission of Stream Datagrams
The rules of using the Seen Sequence Number and Send Sequence Number
are similar to those defined for normal MDTP non-stream datagram
transmissions (see section 4), except that for stream transfer the
sequence numbers shall roll-over to 1 after 0xFFFF.
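The roll-over rule can be illustrated with a short sketch. It assumes
that the Seen and Send fields are each split into a 16-bit stream number
followed by a 16-bit sequence number, which is consistent with the
roll-over at 0xFFFF but is not stated explicitly in the text. The
function names are hypothetical.

```python
import struct

MAX_SEQ = 0xFFFF  # largest stream sequence number before roll-over


def next_stream_seq(seq):
    """Return the sequence number after seq; 0 is reserved, so 0xFFFF wraps to 1."""
    if not 1 <= seq <= MAX_SEQ:
        raise ValueError("stream sequence numbers range from 1 to 0xFFFF")
    return 1 if seq == MAX_SEQ else seq + 1


def pack_stream_field(stream, seq):
    """Encode a Seen or Send field (assumed 16-bit stream, 16-bit sequence)."""
    return struct.pack("!HH", stream, seq)  # "!" selects network byte order


def unpack_stream_field(raw):
    """Decode a 4-octet Seen or Send field into (stream, seq)."""
    return struct.unpack("!HH", raw)
```

With this encoding, the Send value written as 5-1 in the examples of this
appendix corresponds to pack_stream_field(5, 1), and a separate Stream Ack
carries pack_stream_field(0, 0) in its Send field.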
Moreover, each stream maintains its individual T3-send timer, but only
one global T2-receive timer is maintained for all existing streams.
Acknowledgment to a stream datagram shall either be sent separately
or be piggy-backed with a stream datagram (not necessarily belonging
to the same stream) traveling in the opposite direction. For a
separate Stream Ack, the Send field will be set to 0000:0000.
The following shows an example of transmitting a stream datagram
(FLO|REL|DAT) and a separate Stream Ack (FLO|REL|ACK):
Endpoint A Endpoint Z
{App sends first data on stream 5}
[Header Flags=FLO|REL|DAT
Part=0,Of=1
Seen=0-0,Send=5-1,Size=20]----\
(Start T3-send timer-s5) \--->(Start T2-recv)
...
{T2-recv Timer Expires}
(Cancel T3-send timer-s5) <--------------- [Header Flags=FLO|REL|ACK
Part=0,Of=1
Seen=5-1,Send=0-0,Size=0]
The following example shows the use of a piggy-backed Stream Ack.
{App sends new data on stream 5}
[Header Flags=FLO|REL|DAT
Part=0,Of=1
Seen=0-0,Send=5-4,Size=20]--------->(Start T2-recv)
(Start T3-send timer-s5) ...
{App sends data on stream 11}
(cancel T2-recv Timer)
/----- [Header Flags=FLO|REL|DAT|ACK
/ Part=0,Of=1
/ Seen=5-4,Send=11-8,Size=10]
/ (Start T3-send timer-s11)
(Cancel T3-send timer-s5) <-----/
(Start T2-recv timer)
...
{T2-recv Timer Expires}
[Header Flags=FLO|REL|ACK
Part=0,Of=1
Seen=11-8,Send=0-0,Size=0]--------->(Cancel T3-send-s11)
Note that when piggy-backing a Stream Ack on an out-bound stream
datagram while more than one stream has un-acked datagrams, the
endpoint shall choose one stream and piggy-back a Stream Ack on one of
the datagrams, and shall leave the T2-recv timer running.
A.3.3 Extended Stream Ack
Upon the expiration of the T2-recv timer, if more than one stream
datagram has been received but not yet acknowledged by the endpoint,
an Extended Stream Ack shall be used.
The following defines the header format of the Extended Stream Ack
that acknowledges N stream datagrams received:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Seen |
| Stream Number #0 | Sequence Number #0 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Number of Extra Acks = N-1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stream Number #1 | Sequence Number #1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\ /
/ \
\ /
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Stream Number #N-1 | Sequence Number #N-1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Note that an Extended Stream Ack is identified by setting the Send
field to the number of extra acks carried in its data field, as shown
above. Also, Extended Stream Acks shall not be piggy-backed.
The following example shows the use of an Extended Stream Ack
(FLO|REL|ACK) by "Z":
Endpoint A Endpoint Z
{App sends data on stream 5}
[Header Flags=FLO|REL|DAT
Part=0,Of=1
Seen=0-0,Send=5-2,Size=20]----------> (Start T2-recv)
(Start T3-send timer-s5)
{App sends data on stream 9}
[Header Flags=FLO|REL|DAT
Part=0,Of=1
Seen=0-0,Send=9-4,Size=20]---------->
(Start T3-send timer-s9)
{App sends more data on stream 5}
[Header Flags=FLO|REL|DAT
Part=0,Of=1
Seen=0-0,Send=5-3,Size=20]---------->
(Restart T3-send timer-s5)
{App sends data on stream 7}
[Header Flags=FLO|REL|DAT
Part=0,Of=1
Seen=0-0,Send=7-11,Size=20]--------->
(Start T3-send timer-s7)
...
{T2-recv Timer Expires}
(Cancel T3-send timer-s5) <-------------- [Header Flags=FLO|REL|ACK
(Cancel T3-send timer-s7) Part=0,Of=1
(Cancel T3-send timer-s9) Seen=5-3,NumExtAck=2,
Size=0,
ext[0]=9-4,
ext[1]=7-11]
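The construction of the Extended Stream Ack in the example above can be
sketched as follows. This is an illustration under stated assumptions:
each ack entry is taken to be a 16-bit stream number followed by a 16-bit
sequence number, several pending datagrams on the same stream are
collapsed into the most recently received sequence number (as the example
does for stream 5), and datagrams are assumed to arrive in order. The
function name is hypothetical, and only the ack-specific values are
built, not the full MDTP header.

```python
import struct


def build_extended_stream_ack(received):
    """Build the ack-specific parts of an Extended Stream Ack.

    received: non-empty list of (stream, seq) pairs, in arrival order,
    that have not been acknowledged yet.
    Returns (seen, num_extra_acks, data), where seen is the
    (stream, seq) pair for the Seen field and data holds the extra acks.
    """
    latest = {}
    for stream, seq in received:
        # A later datagram on the same stream supersedes the earlier one;
        # the dict keeps each stream at its first-arrival position.
        latest[stream] = seq
    pairs = list(latest.items())
    seen, extras = pairs[0], pairs[1:]
    # Extra acks go into the data field, one 4-octet entry each,
    # in network byte order.
    data = b"".join(struct.pack("!HH", s, q) for s, q in extras)
    return seen, len(extras), data
```

Feeding in the four datagrams received by "Z" in the example, that is
[(5, 2), (9, 4), (5, 3), (7, 11)], yields Seen=5-3, two extra acks, and a
data field holding 9-4 followed by 7-11.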
A.4 Other Issues with Stream Transfer
-- Congestion control, including the rules for timer management and window
management, shall apply to Stream Transfer the same way as it does to
non-Stream based transfer, as defined in section 4.3.
-- When an association is re-initialized (see section 3.4), all existing
streams within that association will be automatically terminated.
-- The receiver shall silently discard any datagrams associated
with a stream which has not been initiated or has already been
terminated.
-- The same re-transmission and link rotation rules as defined in
section 4 shall apply to Stream Transfer.
-- Bundled Message (see Appendix B) may be allowed in Stream Transfer,
but fragmentation (see Appendix C) shall not be allowed.
Appendix B: Bundled Message Transfer
This defines the mechanism for bundled datagram transport in MDTP. It
is optional for implementation.
Bundling is sometimes desired by the user when transferring small
datagrams, as a way of improving network utilization.
In bundled transfer, MDTP allows an endpoint to bundle small
application messages into one single datagram for transmission. This
bundled mode can be applied to both reliable and unreliable datagrams
(see Appendix E for Unreliable Delivery).
Note that an endpoint shall never send bundled messages to a peer if
that peer endpoint set the NOB bit to 1 during their association
initialization (see section 3).
B.1 Format of Bundled Datagram
The ISB bit in the flag field is set to indicate the current datagram
is bundled, i.e., it contains multiple messages. The format of a
bundled datagram is defined as follows:
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number (Seen) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number (Send) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Total Number Of Messages=N | Message #1 Size = B1 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| B1 octets of data |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message #2 Size = B2 | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| B2 octets of data |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\ \
/ /
\ \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message #N Size = BN | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ |
| BN octets of data |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Data_Size in a bundled datagram indicates the actual size of the
data field of the datagram, including both the bundling overhead and
the actual user data. Since no fragmentation is allowed in a bundled
datagram, the Part field will always be '0' and the Of field will
always be '1'.
The first two octets of the data field are a 16-bit integer indicating
the number of messages bundled in the current datagram. This is
followed immediately by a list of bundled messages. Each bundled
message starts with a two-octet integer indicating the size of
the data in the message, followed by the data itself.
All integers in the datagram should be transmitted in network byte
order.
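The data field layout described above can be sketched in Python as
follows. This is only an illustration, not part of the protocol
definition: the function names pack_bundle and unpack_bundle are
hypothetical, and the sketch assumes that no padding is inserted
between bundled messages, which the text implies but does not state
explicitly.

```python
import struct


def pack_bundle(messages):
    """Build the data field of a bundled datagram from a list of byte strings.

    Assumption: messages follow one another with no padding.
    """
    if len(messages) > 0xFFFF:
        raise ValueError("message count does not fit in 16 bits")
    # Total number of messages, 16 bits, network byte order.
    parts = [struct.pack("!H", len(messages))]
    for msg in messages:
        if len(msg) > 0xFFFF:
            raise ValueError("message size does not fit in 16 bits")
        # Two-octet size, then the message data itself.
        parts.append(struct.pack("!H", len(msg)))
        parts.append(msg)
    return b"".join(parts)


def unpack_bundle(data):
    """Split the data field of a bundled datagram into its messages."""
    if len(data) < 2:
        raise ValueError("data field too short for message count")
    (count,) = struct.unpack_from("!H", data, 0)
    offset = 2
    messages = []
    for _ in range(count):
        if offset + 2 > len(data):
            raise ValueError("truncated message size")
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + size > len(data):
            raise ValueError("truncated message data")
        messages.append(data[offset:offset + size])
        offset += size
    return messages
```

The length of the value returned by pack_bundle corresponds to
Data_Size, since it covers both the bundling overhead and the user
data.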
B.2 Bundled Datagram Transfer
The T4-bundle timer and two protocol parameters, namely
Min.Bundle and Max.Bundle, are used to control the bundling of user
datagrams.
The endpoint will withhold the datagram from transmission and start
the T4-bundle timer if the combined size of all user datagrams currently
pending for transmission in the out-bound buffer is smaller than
'Min.Bundle'.
Each time new out-bound user data becomes available for
transmission, the endpoint will attempt to bundle the new data with
the current withheld datagram by using the following rules:
A) If the size of the new data is greater than or equal to
'Min.Bundle', the current withheld datagram will be transmitted and
the T4-bundle timer will be canceled. Then the new data will be
transmitted in a separate datagram.
B) If the size of the new data is less than 'Min.Bundle', but the
combined size of the current datagram and the new data is greater
than or equal to 'Max.Bundle', the current datagram will be sent and
the new data will be withheld as the new current datagram.
C) If the size of the new data is less than 'Min.Bundle', and the
combined size of the current datagram and the new data is greater
than 'Min.Bundle', but less than 'Max.Bundle', the new data will be
bundled into the current datagram, the bundled datagram will be
transmitted immediately, and the T4-bundle timer will be canceled.
D) If the size of the new data is less than 'Min.Bundle', and the
combined size of the current datagram and the new data is less than
'Min.Bundle', the new data will be bundled into the current
datagram and the T4-bundle timer will be restarted.
E) If the T4-bundle timer expires, the current datagram will be sent
immediately.
F) When a T2-receive timer expires, any bundled data waiting to be
transmitted should be sent immediately with a piggy-backed Ack to
acknowledge all un-acked data previously received.
G) If a T4-bundle timer is running and data arrives, the T2-receive
timer should not be started.
H) A T4-bundle timer should never be canceled unless it is being
supplanted by a T3-send timer.
When a bundled datagram arrives at the receiving endpoint, each
message is unbundled and delivered separately to the upper layer.
The following are the suggested protocol parameter values for bundled
datagram transfer:
T4-bundle Timer - 40 ms
Min.Bundle - 1000 octets
Max.Bundle - 1432 octets
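Rules A) through D) above can be read as a size-based decision, as
sketched below in Python. This is an illustration under stated
assumptions, not a normative algorithm: the function name and the
returned action labels are hypothetical, timer handling is mentioned
only in comments, and a combined size exactly equal to Min.Bundle
(which rules C and D leave open) is treated here as rule C, because
the withholding condition applies only to sizes smaller than
Min.Bundle.

```python
def bundling_action(held_size, new_size, min_bundle=1000, max_bundle=1432):
    """Return a label for the action taken when new out-bound data arrives.

    held_size: octets in the currently withheld datagram (0 if none).
    new_size:  octets of the newly available user data.
    Defaults are the suggested Min.Bundle and Max.Bundle values.
    """
    combined = held_size + new_size
    if new_size >= min_bundle:
        # Rule A: send what is held, cancel T4-bundle, send new data separately.
        return "SEND_HELD_THEN_SEND_NEW"
    if combined >= max_bundle:
        # Rule B: send what is held, withhold the new data as the current datagram.
        return "SEND_HELD_AND_HOLD_NEW"
    if combined >= min_bundle:
        # Rule C (assumption: equality with Min.Bundle is handled here):
        # bundle, transmit immediately, cancel T4-bundle.
        return "BUNDLE_AND_SEND"
    # Rule D: bundle, keep withholding, restart T4-bundle.
    return "BUNDLE_AND_HOLD"
```

For example, with the suggested parameter values, 500 octets of new
data arriving while 600 octets are withheld gives a combined size of
1100 octets, so the bundle is transmitted immediately.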
Appendix C: Fragmented Message Transfer
This appendix defines the mechanism for fragmented datagram transport in
MDTP. Its implementation is optional.
When the size of an out-bound user message exceeds the value defined
in the protocol parameter Max.Bundle, the endpoint shall fragment the
message into smaller pieces of size equal to or smaller than
'Max.Bundle' and send each piece out in a separate datagram.
The "Part" and "Of" fields are used to disassemble and reassemble the
fragmented message. The combination of the maximal 'Of' value, which
is 255, and the maximal Data Size (see section 2.2) will determine
the maximal size of a single user message that MDTP can send or
receive in fragmented message transfer mode.
However, an endpoint shall never send fragmented datagrams to a peer if
that peer set the NOM bit to 1 during association
initialization.
The following example shows the transmission of a fragmented message
(assuming Max.Bundle=1432, Min.Bundle=1000):
Endpoint A Endpoint Z
{App sends message size=3300 octets}
[Header Flags=DAT|ACK|GAR
Part=0,Of=3
Seen=3,Send=16,Size=1432]-------> (Start T2-receive timer)
[Header Flags=DAT|ACK|GAR
Part=1,Of=3
Seen=3,Send=17,Size=1432]------->
[Header Flags=DAT|ACK|GAR
Part=2,Of=3
Seen=3,Send=18,Size=436]-------->
(Start T3-send timer)
..
{Timer T2 Expires}
/----------- [Header Flags=ACK
/ Mode=0
/ Part=0,Of=0
(cancel timer T3) <-----------/ Seen=18,Send=4]
Notice that "A" is using the reliable transfer mode to send the
fragmented message, therefore "Z" will hold the fragments and request
retransmission if a fragment is found missing, i.e., if a gap is found
in the received data (see ). When all the parts of the fragmented
message are received, the receiving endpoint will re-assemble the
message and dispatch it to the upper layer.
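The splitting and reassembly described in this appendix can be
sketched in Python as follows. The function names and the
(part, of, payload) tuple representation are hypothetical and chosen
only for illustration; the sketch assumes that every fragment except
possibly the last carries exactly Max.Bundle octets, as in the
example above.

```python
def fragment(message, max_bundle=1432, max_of=255):
    """Split a message into (part, of, payload) tuples of at most max_bundle octets."""
    if not message:
        raise ValueError("empty message")
    count = -(-len(message) // max_bundle)  # ceiling division
    if count > max_of:
        raise ValueError("message too large for the Of field")
    return [
        (part, count, message[part * max_bundle:(part + 1) * max_bundle])
        for part in range(count)
    ]


def reassemble(fragments):
    """Rebuild a message from (part, of, payload) tuples received in any order.

    Returns None while any part is still missing.
    """
    by_part = {}
    total = None
    for part, of, payload in fragments:
        total = of
        by_part[part] = payload
    if total is None or any(i not in by_part for i in range(total)):
        return None
    return b"".join(by_part[i] for i in range(total))
```

Applied to a 3300-octet message with Max.Bundle=1432, fragment yields
three pieces of 1432, 1432, and 436 octets, matching the Size values
in the example above.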
MDTP also allows a fragmented message to be sent using Unreliable
Transfer mode (see section 4.5). However, in unreliable mode, each
fragment will be dispatched to the application upon its arrival, and no
retransmission will be requested even if a fragment is found missing.
Bundling is prohibited if the current datagram contains a fragment of
a fragmented message.
Appendix D: Multicast Datagram Transfer
This appendix defines the mechanism for unreliable transport of multicast
datagrams in MDTP. Its implementation is optional.
D.1 Multicast Datagram Header Format
Multicast datagrams are identified by setting MUL, UNR, and DAT bits
to 1.
Two new fields are added to the standard MDTP datagram header to
support multicast:
Multicast To Transmit address - This is the multicast address, in
network byte order, that the sender transmitted the data to. The
receiver can use this information for internal tracking purposes.
Multicast From - This is the network address (or the IP Address of
Network 1 as described in 3.2, if redundant networks exist) of the
sender, in network byte order.
MDTP Header Format - Multicast Format
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MDTP Protocol Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Version | Flags | In Queue |
| |N N W I F R D A M S W R R F G U| |
| |O O I S I T A C U H N E T L A N| |
| |M B N B R M T K L U R 1 C O R R| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Acknowledgment Number (Seen) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Sequence Number (Send) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Data Size | Part | Of |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Multicast To Transmit address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Multicast From - senders base address |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
\ \
/ data /
\ \
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
For multicast datagrams, the value in the Send field shall indicate
the sequence number of multicast datagrams transmitted by the
sender. This information helps the receiver of the multicast to detect
duplicated multicast datagrams and also to detect lost multicast
datagrams from the same sender. The Seen field shall normally be
set to 0, except in the special cases described below.
Bundling and fragmentation are not allowed in either multicast or
broadcast datagrams.
No initiation shall be needed for an endpoint to transmit to a
multicast address.
D.2 Transmission of Multicast Datagrams
The following example illustrates multicast transmissions between two
endpoints.
Endpoint A Endpoint Z
{App multicasts a message}
[Header Flags=MUL|UNR|DAT
Part=0,Of=1
Seen=0,Send=5,Size=250]--------------> (no Ack necessary)
...
{App multicasts a message}
[Header Flags=MUL|UNR|DAT
Part=0,Of=1
Seen=0,Send=6,Size=500]--------------> (no Ack necessary)
Notice the values of the Send field in the multicast datagrams
(5 and 6, respectively). They represent the sequence numbers
of the multicast datagrams "A" has sent out. Endpoint Z should use
these values to detect missing or duplicate datagrams.
Duplicate datagrams will be discarded and no effort will be made to
retransmit lost multicast datagrams.
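A receiver might track the Send field per sender as sketched below in
Python. The class name and the returned labels are hypothetical. The
sketch also makes two simplifying assumptions that this document does
not specify: sequence number wrap-around is ignored, and any datagram
whose Send value is not greater than the highest value already seen
from that sender is treated as a duplicate.

```python
class McastSequenceTracker:
    """Classify incoming multicast datagrams by their Send field, per sender."""

    def __init__(self):
        self.highest = {}  # sender address -> highest Send value seen so far

    def receive(self, sender, send):
        last = self.highest.get(sender)
        if last is None:
            self.highest[sender] = send
            return "first"
        if send <= last:
            # Discarded; no retransmission is ever requested.
            return "duplicate"
        self.highest[sender] = send
        # A jump of more than one means at least one datagram was lost.
        return "in_order" if send == last + 1 else "gap"
```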
D.3 Reset of the Multicast Datagram Sequence Number
If the Seen field of a received multicast datagram equals '1', this
indicates that the sender has reset its multicast datagram sequence
number. The receiving endpoint, upon detecting this reset indicator in
the incoming multicast datagram, should start a procedure to adopt the
new sequence number for error detection. However, caution
should be taken to prevent false resets due to duplicated datagrams
with the reset indicator propagating through multiple networks.
To guarantee that all receivers of the multicast group adopt the new
sequence number, the reset indicator should be repeated within the
first N multicast datagrams sent out after the reset. N is predefined
by the protocol parameter 'Num.Of.Mcast.Reset.Msg'.
At the receiving endpoint, when the reset indicator is detected, the
new sequence number will be adopted. However, if two reset events are
detected within a predefined time interval (Min.Mcast.Time.To.Reset),
the second reset indicator will be ignored.
The suggested values for these two protocol parameters are:
Min.Mcast.Time.To.Reset - 5 seconds
Num.Of.Mcast.Reset.Msg - 5 messages
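The two parameters can be illustrated with the following Python
sketch. The names reset_indicator and ResetGuard are hypothetical.
The sketch assumes that the sender marks each of the first N
datagrams after a reset with Seen=1, and that the receiver measures
the interval from the last reset it accepted; the caller supplies the
current time in seconds.

```python
def reset_indicator(datagrams_sent_since_reset, num_reset_msgs=5):
    """Sender side: value of the Seen field for the next multicast datagram.

    Assumption: the first num_reset_msgs datagrams after a reset carry Seen=1.
    """
    return 1 if datagrams_sent_since_reset < num_reset_msgs else 0


class ResetGuard:
    """Receiver side: ignore reset indicators that arrive too close together."""

    def __init__(self, min_time_to_reset=5.0):
        self.min_time_to_reset = min_time_to_reset  # Min.Mcast.Time.To.Reset
        self.last_accepted = None

    def accept_reset(self, now):
        """Return True if the reset indicator seen at time 'now' should be honored."""
        if (self.last_accepted is not None
                and now - self.last_accepted < self.min_time_to_reset):
            return False  # repeated or duplicated indicator, ignore it
        self.last_accepted = now
        return True
```

With this arrangement, the repeated indicators that follow a genuine
reset are ignored by the receiver as long as they arrive within the
guard interval.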
Appendix E: Unreliable Delivery
This appendix defines the support for sending unreliable datagrams in
MDTP. Its implementation is optional.
The unreliable transfer mode allows two endpoints to send to each
other without acknowledging receipt. This can usually achieve
higher data throughput than the reliable transfer mode. To indicate
the unreliable transfer mode, the sender of a datagram with user data
simply sets the UNR flag to 1. The following sequence illustrates
unreliable data transfer.
Endpoint A Endpoint Z
{App sends 2 messages}
[Header Flags=UNR|DAT|ACK
Part=0,Of=1
Seen=0,Send=4,Size=100]-------->
[Header Flags=UNR|DAT|ACK
Part=0,Of=1
Seen=0,Send=5,Size=100]-------->
{App sends 1 message}
<------- [Header Flags=UNR|DAT|ACK
Part=0,Of=1
Seen=5,Send=1,Size=450]
...
{App sends 2 more messages}
[Header Flags=UNR|DAT|ACK
Part=0,Of=1
Seen=1,Send=6,Size=100]------>
[Header Flags=UNR|DAT|ACK
Part=0,Of=1
Seen=1,Send=7,Size=100]------>
Note that no timers shall be started by either end, and that even
though both ends are in Unreliable transfer mode, the ACK flag is
still set by the sender of the datagram. This means that the Seen
field in the datagram header is still valid and indicates the sequence
number of the last datagram received by the sender. The upper layer
can use this information to help detect missing or duplicated
datagrams. However, MDTP shall make no effort to detect or retransmit
missing data or to screen out duplicated datagrams.
E.1 Ordered Unreliable Delivery
In unreliable transfer, the sender should be allowed to request
ordered delivery by setting the RE1 flag to 1.
When Ordered Unreliable Delivery is indicated, the receiver shall
order the newly arrived datagram with any datagrams it has received
but not yet passed to its upper layer.
If it receives a datagram which is older than the last datagram it has
passed to the upper layer, that datagram shall be silently discarded.
[6] Rytina, I., "Framework for Generic Common Signaling Transport
Protocol", <draft-rytina-sigtran-generic-framework-00.txt>, Feb. 1999.
[7] Ashworth, J., "The Naming of Hosts", RFC 2100, April 1997.
[8] Braden, R., "Requirements for Internet hosts - Application and
Support", RFC 1122, October 1989.
[9] Eastlake 3rd, D., Crocker, S., and Schiller, J., "Randomness
Recommendations for Security", December 1994.
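The receiver behaviour for Ordered Unreliable Delivery can be
sketched in Python as follows. The class and method names are
hypothetical. This document does not say when held datagrams are
passed to the upper layer, so the sketch leaves that decision to the
caller through an explicit flush method; a datagram is treated as
"older" when its Send value is lower than that of the last datagram
delivered.

```python
import bisect


class OrderedUnreliableReceiver:
    """Order datagrams not yet delivered; discard ones older than the last delivery."""

    def __init__(self):
        self.last_delivered = None  # Send value of the last datagram passed up
        self.pending = []           # sorted list of (send, payload)

    def on_datagram(self, send, payload):
        """Return False if the datagram is silently discarded, True if it is held."""
        if self.last_delivered is not None and send < self.last_delivered:
            return False
        bisect.insort(self.pending, (send, payload))
        return True

    def flush(self):
        """Pass all held datagrams to the upper layer in sequence order."""
        delivered = [payload for _, payload in self.pending]
        if self.pending:
            self.last_delivered = self.pending[-1][0]
        self.pending = []
        return delivered
```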
This Internet Draft expires in 6 months from April 1999. This Internet Draft expires in 6 months from June 1999.