Network Working Group                                      R. R. Stewart
INTERNET-DRAFT                                                  Motorola
                                                                  Q. Xie
                                                                Motorola
Expires in six months                                      22 March 1999

          MULTI-NETWORK DATAGRAM TRANSMISSION PROTOCOL
                 <draft-ietf-sigtran-mdtp-02.txt>

Status of This Memo

This document is an Internet-Draft and is in full conformance with all
provisions of Section 10 of RFC2026. Internet-Drafts are working
documents of the Internet Engineering Task Force (IETF), its areas,
and its working groups.  Note that other groups may also distribute
working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

To learn the current status of any Internet-Draft, please check the
"1id-abstracts.txt" listing contained in the Internet-Drafts Shadow
Directories on ftp.is.co.za (Africa), nic.nordu.net (Europe),
munnari.oz.au (Pacific Rim), ftp.ietf.org (US East Coast), or
ftp.isi.edu (US West Coast).

Abstract

This Internet Draft discusses an experimental call control protocol,
namely the Multi-network Datagram Transmission Protocol (MDTP), that is
intended to provide fault-tolerant reliable/unreliable data transfer
between communicating processes over IP networks [1]. MDTP is proposed
as an application-level protocol which is designed with a high emphasis
on supporting redundant networks and transparent fault management. MDTP
also gives the application a great degree of timing control and
configuration flexibility. The motivation for developing MDTP is to
establish a framework for supporting Internet-based high reliability
real-time commercial applications such as signaling and call control
for Internet telephony.

Stewart & Xie                                                  [Page  1]
                        TABLE OF CONTENTS

1.  Introduction..............................................3
     1.1 Multi-network Datagram Transmission Protocol.........3
     1.2 Interfaces to MDTP...................................5
     1.3 Operation of MDTP....................................5
2.  Design Principles.........................................6
3.  Header Format.............................................7
     3.1 MDTP Header Format Description.......................7
     3.2 Notes on Multicast Header format....................10
4.  Transmission Initialization..............................10
     4.1 Normal Initialization...............................10
     4.2 Multiple Network Addresses..........................12
     4.3 Initialization Collision............................13
     4.4 Re-initialization...................................14
     4.5 Link rotation.......................................14
5.  Reliable Transfer Mode...................................14
     5.1 Timer Control.......................................16
     5.2 Gap Acknowledgments.................................19
     5.3 Congestion Control..................................21
     5.4 Sequence Number Reset...............................24
     5.5 Retransmission on Multiple Networks.................25
       5.5.1 Randomization of the T3-Send timer at resend ...25
     5.6 Termination of an Endpoint..........................26
     5.7 Endpoint Drain......................................26
     5.8 Advisory Acknowledgements...........................26
     5.9 RTT Measurement.....................................26
     5.10 Heart Beat Ack.....................................26
6.  Unreliable Transfer Mode.................................27
     6.1 Ordered reception...................................28
7.  Mixed Mode Data Transmission.............................28
8.  Bundled Messages.........................................29
     8.1 Format of Bundled Datagram..........................30
     8.2 Bundled Transfer....................................31
9.  Fragmented Messages......................................32
10. Non-protocol Datagrams...................................33
11. Broadcast and Multicast..................................34
     11.1 Multicast/Broadcast Initialization.................34
     11.2 Transmission of Broadcast Datagrams................34
     11.3 Transmission of Multicast Datagrams................35
     11.4 Reset of the Multicast Datagram Sequence Number....36
12. Suggested Timer and Protocol Parameter Values............37
13. Further Study............................................37
14. Author's Addresses.......................................38
15. References...............................................38

1.  Introduction

This Internet Draft discusses an experimental protocol, namely the
Multi-network Datagram Transmission Protocol (MDTP), that is intended
to provide fault-tolerant reliable/unreliable data transfer between
communicating processes over IP networks [1].

MDTP is proposed as an application-level protocol which is designed
with a high emphasis on supporting redundant networks and transparent
fault management. MDTP also gives the application a great degree of
timing control and configuration flexibility. The motivation for
developing MDTP is to establish a framework for supporting
Internet-based high reliability real-time commercial applications
such as signaling and call control for Internet telephony.

This document describes the functional interface and the details
necessary to implement MDTP.

1.1 Purpose of Multi-network Datagram Transmission Protocol (MDTP)

The Multi-network Datagram Transmission Protocol (MDTP) presented in
this Internet Draft is designed to meet the following critical
requirements common to real-time call control environments employing
redundant networks:

A) A process may need to be in simultaneous communication with
   thousands of endpoints performing various call processing
   functions. These endpoints may be codec converters, SS7 to IP
   translation applications, or, in the case of mobile networks, data
   selector and combiner applications.

B) A process needs to have very fine control over the timing for
   delivering a datagram. The timing should be easily adjusted
   depending on the message type and the destination. For example,
   after a few seconds of non-delivery, the call to which the message
   relates may no longer exist.

C) A process communicating with a peer should be able to take
   advantage of the redundant networks in a transparent way. This
   means that the application or upper level protocols need not be
   involved in the network fault management. Instead, when a network
   failure occurs, the transmission protocol should be able to
   automatically re-route the outbound datagram to the alternate
   network without intervention from the application.

D) Datagrams may arrive out of order, or may arrive in duplicate
   copies. This is especially true in a redundant network
   environment. The transmission protocol should be robust enough to
   properly handle both situations with little intervention from the
   upper level protocol or application.

To accomplish the above objectives we have defined MDTP to reside in
user-space, i.e., it is not intended to be implemented as a module in
an operating system. This gives the application or upper level
protocols that use MDTP outstanding flexibility in controlling the
timing and other operational characteristics for the data
transmissions.

MDTP is also made multi-network aware. This means that if more than
one path exists between two endpoints (such as redundant LANs), MDTP
will take advantage of the multiple networks by automatically
switching to the alternate LAN if the datagram delivery becomes
unavailable or inefficient (e.g., too many re-transmissions) on the
current LAN. MDTP's ability to handle multiple networks can also
greatly facilitate the implementation of various traffic balancing
schemes in the application or upper level protocols.

In the redundant network setting, out-of-order or duplicate datagrams
have proven to be most harmful during MDTP transmission initiations
and re-initiations. To cope with the problem, MDTP utilizes an
efficient tag mechanism to guard against out-of-order or duplicate
datagrams.

MDTP assumes that a UDP-like [2] transport protocol is available at the
operating system level for data transport. We have successfully
implemented and tested MDTP over UDP and Sun Microsystems' CLTS
transport layers.

Compared to traditional TCP [3], the MDTP design is tuned towards a
special set of applications, namely time-critical, fault-tolerant
applications using redundant LANs. It is not designed to replace TCP
as a general purpose transmission protocol.

1.2 Interfaces to MDTP

MDTP interfaces with the application programs or higher level
protocols through a set of function calls. Because MDTP is an
application-level protocol, these calls are not executed within the
operating system, but within the user process (i.e., in user space).
The application or higher level protocols pass data to MDTP by making
calls to MDTP, which then enqueues the data for transmission. When
data arrives, MDTP distributes the data to the application or higher
level protocols via mechanisms predefined by the application. The
application also has an interface to change the operational mode of an
MDTP endpoint and the default operational mode of the MDTP endpoint.
The default operational mode is used in the absence of any specific
direction from the application.

As noted above, it is assumed that a UDP-like data transport protocol
will provide the interface between MDTP and the operating system. No
other special interfaces or changes are assumed within the operating
system; all queuing and internal pseudo-connection information is
maintained inside the MDTP endpoint.

1.3 Operation of MDTP

MDTP operates in three different modes.

   A) Reliable transfer mode
   B) Unreliable transfer mode
   C) Raw UDP transfer mode

The two ends in a communication connection can operate in different
modes with respect to each other, with the exception of the raw UDP
mode. For example, suppose two endpoints A and B are communicating
with each other. Endpoint A may be sending information to B in
reliable transfer mode, while B may be sending information to A in
unreliable transfer mode. All communications from A to B will be
acknowledged by B, but A will not need to acknowledge data received
from B.

Raw UDP transfer is used when one of the endpoints in communication
does not support MDTP. This allows compatibility with non-MDTP
endpoints. Two MDTP capable endpoints are also allowed to communicate
in raw UDP transfer mode. However, once one of them indicates that raw
UDP transfer mode is to be used, both sides must be in raw UDP mode.

MDTP also provides a bundling option for both the reliable and
unreliable transfer modes. This allows each side to hold the data
before transmission for some period of time, so that small datagrams
can be combined and sent in a single larger datagram to improve
network utilization efficiency.

2.  Design Principles

One of the major objectives which dictates the design of MDTP is to
provide a data transmission protocol that transparently supports highly
fault tolerant implementations. To accomplish this, provision for two
endpoints engaging in communication to use multiple networks is
essential. MDTP is therefore designed to yield the best fault
tolerance when the application shares the load over multiple network
connections.

When an original transmission fails, MDTP can attempt retransmission
using an alternate network connection, even when the upper level
protocol or the application is completely unaware of the existence of
the alternate route.

Many of the fundamental concepts that have made TCP such a useful
protocol are reused, and some of the advantages of UDP are also merged
into the design of MDTP. This has led to a highly effective, robust
protocol for fault tolerant data communications.

3.  Header Format

MDTP inserts a header at the beginning of every datagram. This header
is composed of various flags and integers. The integers are always kept
in network byte order. The following figures illustrate the MDTP
header layouts. Note that one tick mark represents one bit position.

                           MDTP Header Format - Non Multicast

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 1                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 2                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number (Seen)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Sequence Number (Send)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags      |     Mode      |   Version     |   In Queue    |
   |N N W I F R D A|B S W R R B G U|               |               |
   |O O I S I E A C|R H N E E U A N|               |               |
   |G B N B R S T K|O U R 1 2 N R R|               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                             data                              /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   MDTP Header Format - Multicast Format

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 1                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 2                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number (Seen)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Sequence Number (Send)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags      |     Mode      |   Version     |   In Queue    |
   |N N W I F R D A|B S W R R B G U|               |               |
   |O O I S I E A C|R H N E E U A N|               |               |
   |G B N B R S T K|O U R 1 2 N R R|               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Multicast To Transmit address                |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             Multicast From - senders base address             |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                             data                              /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   MDTP Header Format - RTT Ack

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 1                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 2                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Acknowledgment Number (Seen)              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Sequence Number (Send)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags      |     Mode      |   Version     |   In Queue    |
   |N N W I F R D A|B S W R R B G U|               |               |
   |O O I S I E A C|R H N E E U A N|               |               |
   |G B N B R S T K|O U R 1 2 N R R|               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Transparent Time Int-1                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                  Transparent Time Int-2                       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

3.1 MDTP Header Format Description

    MDTP Protocol Identifier 1: 32 bits

      This is a fixed long value of 0xf7873072.

    MDTP Protocol Identifier 2: 32 bits

      This is a fixed long value of 0x17074012. MDTP Protocol
      Identifier 1 and 2 are jointly examined to determine whether a
      received datagram is an MDTP datagram.

    Acknowledgment Number (or Seen): 32 bits

      If the flag ACK is set this value is the next sequence number
      that the sender of this datagram expects to receive from the
      receiver of this datagram.

      However, during initialization negotiation, multicast and
      broadcast transmissions, this field will have special meanings
      (see 4 and 11).

    Sequence Number (or Send): 32 bits

      If DAT flag is set, this value represents the sequence number of
      the first data octet that follows this header. Otherwise, this
      value will be the sequence number of the first octet of the next
      data unit that will be sent.

      However, during initialization negotiation, multicast and
      broadcast transmissions, this field will have special meanings
      (see 4 and 11).

    Part: 8 bits

      This value represents the Part number of a fragmented message. The
      first fragment of a message is always part '0'.

    Of: 8 bits

      This value represents the total number of fragments in a
      fragmented message. The valid range for this value is from '1'
      to '255'. For broadcast and multicast datagrams this value is
      set to '1' to indicate that no fragmentation should occur.

    Data Size: 16 bits

      This value represents, in number of octets, the size of the data
      field that follows this header in the current datagram.

    Flags: 8 bits

      NOG - No Guaranteed delivery. This bit is used in negotiation
      and is set to indicate that the sender does not wish to use
      reliable delivery. When this bit has been set in negotiation,
      the receiver should prevent its application from putting
      communication with this endpoint in reliable mode.
      In normal data transfer (after the initiate sequence) this
      bit should be set to 0, except when responding to an RTT Ack
      request.

      NOB - No Bundling. This bit is used in negotiation and
      is set to indicate that the sender does not wish to perform
      bundling or un-bundling of datagrams. When this bit has been set
      in negotiation, the receiver should prevent its application from
      putting communication with this endpoint in bundled mode.
      In normal data transfer (after the initiate sequence) this
      bit should be set to 0, except when sending a Heart Beat Ack,
      at which time this bit must be set to 1.

      WIN - Window Up. This bit is set by the sender of this datagram
      to indicate that the sender needs the receiver to acknowledge
      previously received datagrams before it can send more datagrams.

      ISB - Is Bundled. This bit is set by the sender to indicate that
      this datagram is bundled. This bit should never be set if during
      negotiation either end set the NOB bit.

      FIR - First Datagram. This flag is set to indicate that this is a
      negotiation datagram.

      RES - Reset Sequence Number. This bit is set to indicate that the
      sequence number is being reset. The sequence number should be reset
      whenever the sending count is greater than 0x7fffffff.

      DAT - Data Present. This bit is set to indicate that, following
      this header, application data is present in this datagram.

      ACK - Acknowledge. This bit is set to indicate that the sender is
      acknowledging receipt of the specified Acknowledgment Number.

    Mode: 8 bits

      BRO - Broadcast. This bit is set to indicate a broadcast or
      multicast datagram. When this bit is set, bits SHU, WNR, BUN, and
      GAR are not used and should be set to '0'. This datagram is a
      multicast datagram if the UNR bit is also set. Otherwise, this
      datagram is a broadcast datagram.

      SHU - Shutdown. This bit is set when the sender initiates its
      closing procedure and indicates to the receiver that the sender
      is no longer a valid destination. If the UNR bit is set in
      conjunction with the SHU bit, an incomplete shutdown is
      specified. After an incomplete shutdown, the receiver can still
      re-establish the communication with the sender by re-initiating
      with the sender (see 5.7).

      WNR - Window Up Response. This bit is set in the acknowledgement
      reply to a Window Up flag.

      RE1 - This bit represents one of two things. If the GAR bit is
      set to 1, then setting the RE1 bit indicates to the receiver
      that the sender is requesting an advisory ACK. This is normally
      sent in a datagram when 1/2 of the current window has been sent.
      If this bit is set to 0 (when the GAR bit is set), then the
      sender is NOT requesting an advisory ACK. If the UNR bit is set
      and the RE1 bit is set, then the receiver is requested to order
      the datagrams (if more than one has not been read). If the
      receiver has already delivered a datagram of higher sequence,
      then the receiver should discard lower-numbered datagrams that
      arrive late.

      RE2 - This bit represents one of two things. If the GAR bit is
      set to 1, the DAT bit is set to 0, and the ACK bit is set to 1,
      then this is an ACK with a Round Trip Time request. This also
      indicates that the RTT Ack header format is in place. If the
      UNR bit is set to 1 and the DAT bit is set to 0, then this
      datagram is used in an implementation-specific way but carries
      no data. The datagram can be safely ignored and discarded.

      BUN - Bundled Mode. This bit is set to indicate that bundled
      mode is in effect for the sender. This bit should never be set
      if during negotiation either endpoint set the NOB flag.

      GAR - Guaranteed Mode. This bit is set to indicate that the
      reliable mode is in effect for the sender, i.e., the sender
      expects an acknowledgement. This bit should never be set if
      either endpoint set the NOG flag during negotiation.

      UNR - Unreliable Mode. This bit is set to indicate that
      unreliable mode is in effect for the sender and the sender does
      not expect an acknowledgement. This bit has special meanings if
      BRO or SHU bit is set (see above).

    Version: 8 bits

      This field represents the version number of the MDTP
      protocol. If this field is set to 1, then the sender does not
      support Round Trip Time (RTT) calculation or Heart Beat of the
      reliable protocol. If this field is set to 2, then the sender
      supports RTT and Heart Beat.

    In Queue: 8 bits

      This field contains the number of messages the sender has on its
      incoming queue, waiting to be read by the application. This gives
      the receiver an indication of the flow control conditions within
      the sender.
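
The field layout described above can be sketched in Python as
follows. This is a minimal illustration, not a reference
implementation. It assumes that bit 0 in the header diagrams is the
most significant bit of each octet (so NOG is 0x80 and ACK is 0x01);
the draft does not state this explicitly. The names MdtpHeader and
describe, and the default Version of 2, are hypothetical choices made
for this example.

```python
import struct
from dataclasses import dataclass

# Fixed protocol identifiers given in Section 3.1.
MDTP_ID1 = 0xF7873072
MDTP_ID2 = 0x17074012

# Assumption: diagram bit 0 is the most significant bit of the octet.
NOG, NOB, WIN, ISB, FIR, RES, DAT, ACK = (0x80 >> i for i in range(8))
BRO, SHU, WNR, RE1, RE2, BUN, GAR, UNR = (0x80 >> i for i in range(8))

# Network byte order: ID1, ID2, Seen, Send (32 bits each), Data Size
# (16 bits), then Part, Of, Flags, Mode, Version, In Queue (8 bits each).
HEADER = struct.Struct("!IIIIHBBBBBB")  # 24 octets in total


@dataclass
class MdtpHeader:
    seen: int
    send: int
    data_size: int
    part: int = 0
    of: int = 1
    flags: int = 0
    mode: int = 0
    version: int = 2  # illustrative default; 2 indicates RTT/Heart Beat support
    in_queue: int = 0

    def pack(self) -> bytes:
        return HEADER.pack(MDTP_ID1, MDTP_ID2, self.seen, self.send,
                           self.data_size, self.part, self.of,
                           self.flags, self.mode, self.version,
                           self.in_queue)

    @classmethod
    def unpack(cls, datagram: bytes) -> "MdtpHeader":
        if len(datagram) < HEADER.size:
            raise ValueError("datagram shorter than the MDTP header")
        id1, id2, *fields = HEADER.unpack_from(datagram)
        # Both identifiers are examined jointly.
        if (id1, id2) != (MDTP_ID1, MDTP_ID2):
            raise ValueError("not an MDTP datagram")
        return cls(*fields)


def describe(h: MdtpHeader) -> str:
    """Classify a header using the Flags and Mode rules in the text."""
    if h.mode & BRO:
        return "multicast" if h.mode & UNR else "broadcast"
    if h.mode & SHU:
        return "incomplete shutdown" if h.mode & UNR else "shutdown"
    if h.flags & FIR:
        return "initiation ack" if h.flags & ACK else "initiation"
    return "data" if h.flags & DAT else "control"
```

For example, MdtpHeader(seen=0, send=tag, data_size=0, flags=FIR | RES)
packs into a 24-octet header that describe() reports as "initiation".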

The message header is always followed by the data field. If there are
fewer than 4 octets of application data to send with the datagram, the
data field of the datagram should be padded with '0' octets to make it
four (4) octets long. The padding octets, if any, are not counted in
the Data Size.

The maximal Data Size for a single MDTP datagram is the MTU size of
the underlying transport protocol (e.g., UDP) minus the MDTP header
size, which is twenty-four (24) octets. The combination of the maximal
'Of' value, which is 255, and the maximal Data Size determines the
maximal size of a single message that MDTP can send or receive.
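
The padding rule and the size limits above reduce to simple
arithmetic, sketched below. The function names are hypothetical, and
the value 1472 used in the usage note (a 1500-octet Ethernet MTU less
20 octets of IPv4 header and 8 octets of UDP header) is only an
illustrative assumption, not a value taken from this draft.

```python
MDTP_HEADER_SIZE = 24  # octets, per Section 3.1
MIN_DATA_FIELD = 4     # data field is padded up to this many octets
MAX_FRAGMENTS = 255    # maximal 'Of' value


def pad_data_field(data: bytes):
    """Return (Data Size, data field as transmitted).

    The Data Size counts only application octets; padding '0' octets
    are added to short data fields but are not counted.
    """
    return len(data), data.ljust(MIN_DATA_FIELD, b"\x00")


def max_data_size(transport_mtu: int) -> int:
    """Largest Data Size that fits in one MDTP datagram."""
    return transport_mtu - MDTP_HEADER_SIZE


def max_message_size(transport_mtu: int) -> int:
    """Largest message that can be carried in 255 fragments."""
    return MAX_FRAGMENTS * max_data_size(transport_mtu)
```

With a transport MTU of 1472 octets, max_data_size() gives 1448 octets
and max_message_size() gives 369240 octets.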

3.2 Notes on Multicast Header Format

The header format is identical to the standard MDTP header format as
discussed above, but adds the following extensions.

Multicast To Transmit address - This is the address, in network byte
order, that the sender transmitted the data to. Since a receiver does
not otherwise know what address the sender was sending to, the
receiver can use this information for internal tracking purposes.

Multicast From - This is the base address (address 0 in the initiate
message) of the sender. Since a multicast sender may not have gone
through the initiate procedures, this address is the base reference
that the receiver is to use to look up the sender. This network byte
order address, rather than the source address of the arriving
datagram, should be used to reference any internal cache.
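
As an illustration of the lookup rule above, the following sketch
reads the two extension words that follow the 24-octet header and
keys a sender cache on the Multicast From address. It assumes both
addresses are 32-bit IPv4 addresses, as the multicast header diagram
suggests. The cache structure and the function names are hypothetical.

```python
import ipaddress
import struct

MDTP_HEADER_SIZE = 24
# Two 32-bit words in network byte order: To Transmit address, From address.
MCAST_EXT = struct.Struct("!II")


def parse_multicast_extension(datagram: bytes):
    """Return (to_address, from_base_address) as IPv4Address objects."""
    to_addr, from_addr = MCAST_EXT.unpack_from(datagram, MDTP_HEADER_SIZE)
    return ipaddress.IPv4Address(to_addr), ipaddress.IPv4Address(from_addr)


def lookup_sender(cache: dict, datagram: bytes, arrival_addr: str):
    """Look up the sender by its base address.

    arrival_addr (the network address the datagram arrived from) is
    deliberately ignored: a sender with several networks may transmit
    from any of them, so only the base address is a stable key.
    """
    _, base = parse_multicast_extension(datagram)
    return cache.get(str(base))
```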

4.  Transmission Initialization

4.1 Normal Initialization

Before the first data transmission can take place from one endpoint
(A) to another endpoint (Z), the two endpoints will need to complete
an initialization process.

The initialization process consists of the following steps.

A) Endpoint A should first send an initiation datagram, while
   withholding the application data from transmission.

   Endpoint A                                          Endpoint Z
   [Header Flags=FIR|RES
           Mode=options
           Seen=0,Send=Tag_A] ----------------------->
   (Start T1-init timer)
   (Enter Tag_A-lock mode)

   The initiation datagram is identified by setting FIR and RES bits in
   the Flags field. No user data should be carried in the initiation
   datagram.

   Endpoint A should fill in the appropriate options, e.g., BUN,
   GAR, or UNR, in the Mode field to indicate the transmission type it
   has chosen. It may also use the NOB and NOG bits in the Flags field
   to specify whether or not its peer is allowed to use bundling or
   reliable transfer mode.

   The Seen field will be set to '0', but an initiation tag, Tag_A,
   generated by Endpoint A, will be carried in the Send field, as
   shown in the above diagram. If re-initializations are needed
   between two endpoints subsequently (see 4.4), a different tag with
   a unique value should be used for each re-initialization.

   After sending the initiation datagram, Endpoint A shall start T1-init
   timer and enter a Tag_A-lock mode.

   During the Tag_A-lock mode, Endpoint A will wait for the initiation
   Ack datagram with the Seen value set to Tag_A. Any other incoming
   datagrams from Endpoint Z, except for new initiation datagrams,
   will be discarded. The arrival of new initiation datagrams during the
   Tag_A-lock mode indicates an initialization collision that will be
   discussed in 4.3.

   If T1-init timer expires, the same initiation datagram will be
   retransmitted and the timer restarted. This will be repeated
   Max.Init.Retransmit times before Endpoint A considers Endpoint Z
   unreachable and optionally reports the failure.

B) Upon the receipt of the above initiation datagram from Endpoint A,
   Endpoint Z should respond immediately with an initiation Ack as shown
   below:

   Endpoint A                                 Endpoint Z
                                              [Header Flags=FIR|RES|ACK
                                               Mode=Options
                                   /---------- Seen=Tag_A,Send=Tag_Z]
                                  /           (Enter Tag_Z-lock mode)
   (Cancel T1-init timer)<-------/

   The initiation Ack datagram is specified with the FIR, RES, and ACK
   bits set to '1' in the Flags field. Similarly, Endpoint Z will
   specify its preferred transmission mode and type by setting the
   proper bits in the Mode and Flags fields.

   In addition, in the outbound initiation Ack datagram, Endpoint Z
   should set the Seen field to Tag_A and supply its own initiation
   tag, Tag_Z, in the Send field.

   Once the initiation Ack is transmitted, Endpoint Z should enter the
   Tag_Z-lock mode. In the Tag_Z-lock mode Endpoint Z will ignore any
   incoming initiation Ack datagrams and also discard any other incoming
   datagram whose Seen field is not equal to Tag_Z, except for new
   initiation datagrams.

   If a new initiation datagram is received when Endpoint Z is in
   Tag_Z-lock mode, Endpoint Z will acknowledge the initiation datagram
   only when the tag carried in the Send field matches Tag_A previously
   recorded by Endpoint Z. Otherwise, Endpoint Z will send an initiation
   datagram with Send field set to Tag_Z back to Endpoint A to elicit an
   initiation Ack.

C) After transmitting the initiation Ack, Endpoint Z can start
   transmitting datagrams with user data. However, the Seen field in the
   first outbound datagram with user data must be set to Tag_A.

D) Upon the receipt of the initiation Ack with Seen equal to Tag_A,
   Endpoint A can start transmitting datagrams with user data. However,
   the first datagram with application data transmitted by Endpoint A
   should have the Seen value set to Tag_Z, which is obtained from the
   initiation Ack.

   Endpoint A                                     Endpoint Z
   {first app message}
   [Header Flags=ACK|DAT
           Mode=options
           Seen=Tag_Z,Send=1]
           [data field]   -----------\
                                      \
                                       \-------> (Leave Tag_Z-lock mode)

E) Upon the receipt of the first datagram with user data from Endpoint
   A and with the Seen value set to Tag_Z, Endpoint Z should leave the
   Tag_Z-lock mode.

F) Similarly, upon the receipt of the first datagram with user data
   and the Seen value set to Tag_A from Endpoint Z, Endpoint A
   should leave the Tag_A-lock mode.
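
Steps A) through F) can be sketched as a small in-memory simulation.
This is only an illustration of the tag exchange under simplifying
assumptions: the T1-init timer, retransmission, the handling of a new
initiation datagram received in lock mode, and the collision case of
4.3 are all omitted. The Endpoint and Datagram classes are
hypothetical, tags are drawn at random (the text above does not say
how tags are generated), and the flag values assume that bit 0 of the
Flags octet is its most significant bit.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional

# Assumed bit values for the Flags octet (MSB-first layout).
FIR, RES, DAT, ACK = 0x08, 0x04, 0x02, 0x01


@dataclass
class Datagram:
    flags: int
    seen: int
    send: int
    data: bytes = b""


@dataclass
class Endpoint:
    name: str
    # Non-zero random tag; zero is avoided because Seen=0 in an initiation.
    my_tag: int = field(default_factory=lambda: secrets.randbits(32) or 1)
    peer_tag: Optional[int] = None
    locked: bool = False
    awaiting_ack: bool = False
    delivered: list = field(default_factory=list)

    def initiate(self) -> Datagram:
        # Step A: send FIR|RES with Seen=0 and our tag, enter lock mode.
        self.locked = self.awaiting_ack = True
        return Datagram(FIR | RES, seen=0, send=self.my_tag)

    def first_data(self, payload: bytes) -> Datagram:
        # Steps C/D: first user data carries the peer's tag in Seen.
        return Datagram(ACK | DAT, seen=self.peer_tag, send=1, data=payload)

    def on_receive(self, d: Datagram) -> Optional[Datagram]:
        if d.flags & FIR:
            if not d.flags & ACK:
                # Step B: record the peer tag, reply with an initiation
                # Ack, and enter our own lock mode.
                self.peer_tag, self.locked = d.send, True
                return Datagram(FIR | RES | ACK, seen=d.send, send=self.my_tag)
            if self.awaiting_ack and d.seen == self.my_tag:
                # Step D: the initiation Ack echoes our tag.
                self.peer_tag, self.awaiting_ack = d.send, False
            return None  # other initiation Acks are ignored
        if self.locked and d.seen != self.my_tag:
            return None  # stale or duplicate datagram: discard
        if d.flags & DAT:
            self.locked = False  # steps E/F: leave lock mode
            self.delivered.append(d.data)
        return None
```

A typical exchange is: ack = z.on_receive(a.initiate()), then
a.on_receive(ack), then z.on_receive(a.first_data(b"hello")), after
which Endpoint Z has left its lock mode.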

The upper level protocol or application can predefine a set of default
transmission modes, which will be used by the endpoint for
initialization. However, it should be pointed out that the
transmission modes between two endpoints are allowed to change on a
datagram by datagram basis, as illustrated in later sections.

4.2 Multiple Addresses

In order to support multiple networks, both endpoints need to have
knowledge of all network addresses available to each other. This
information needs to be passed to the other end during the
initialization. The data field of the initiation and initiation Ack
datagrams is used for this purpose.

Depending on the underlying network configuration, the data field will
be filled in one of the two following ways:

A) If the sending endpoint of the initiation or initiation Ack
datagram does not have access to multiple networks, the data field
will be set to the pad value of 4 octets of '0's.

B) If the sending endpoint has access to multiple networks (for
example two redundant LANs), the first 4 octets of the data field will
be an unsigned long integer (in network order) specifying how many
networks the endpoint has access to. Following these 4 octets will be
a list of network addresses. Each address begins with a header of 4
octets followed by the actual address. The first 2 octets of the
header are an unsigned integer indicating the size of the actual
address. The next 2 octets of the header are the type of the address.

For an IPv4 address, the address header will have the size set to 8
and the type set to AF_INET (2). Of the 8 octets used by the actual
IPv4 address, the first 4 octets will contain the IP address (in
network order) of the path. The next two octets will contain the UDP
port number (in network byte order). The last two octets will be
padded with 0's.

The data field of the initiation or initiation Ack datagram from an
endpoint with access to two IPv4 networks would look like the following:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                     Number of Networks = 2                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Size of address=8       |    Type of Address=AF_INET (2)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             IP Address of Network 1 = 0x88b68108              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Port = 52212          |      Padding = 0              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |       Size of address=8       |    Type of Address=AF_INET (2)|
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |             IP Address of Network 2 = 0x0a100001              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Port = 52212          |      Padding = 0              |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Any data following the initiation network list can be ignored.
Implementations may, at their option, use additional data sent in
subsequent locations for implementation-specific data exchanges. No user
data, however, is allowed to be transported in this datagram.
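
As an illustration, the data field layout described above can be built
and parsed with a few lines of Python. This is a sketch only: the
function names are hypothetical, only IPv4 entries are handled, and an
endpoint with fewer than two networks is assumed to send the 4-octet
zero pad of case A.

```python
import socket
import struct

AF_INET = 2  # address type value given in the text above


def encode_address_list(endpoints):
    """endpoints: list of (dotted-quad string, UDP port) tuples."""
    if len(endpoints) < 2:
        # Case A: no access to multiple networks, 4 octets of '0's.
        return b"\x00" * 4
    out = struct.pack("!I", len(endpoints))        # number of networks
    for ip, port in endpoints:
        out += struct.pack("!HH", 8, AF_INET)      # size, type
        out += socket.inet_aton(ip)                # 4-octet IP address
        out += struct.pack("!HH", port, 0)         # UDP port, padding
    return out


def decode_address_list(data):
    """Return the list of (ip, port) tuples found in a data field."""
    (count,) = struct.unpack_from("!I", data, 0)
    offset, result = 4, []
    for _ in range(count):
        size, addr_type = struct.unpack_from("!HH", data, offset)
        offset += 4
        if addr_type == AF_INET and size == 8:
            ip = socket.inet_ntoa(data[offset:offset + 4])
            (port,) = struct.unpack_from("!H", data, offset + 4)
            result.append((ip, port))
        offset += size  # the size field lets unknown types be skipped
    return result
```

Encoding the two networks of the diagram above, 136.182.129.8
(0x88b68108) and 10.16.0.1 (0x0a100001), both with port 52212, yields a
28-octet data field.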

4.3 Initialization Collision

If both endpoints attempt to initialize the communication at about the
same instant, a collision will occur. In a collision each endpoint
will receive an initiation datagram from the other side after it has
transmitted its own. Both sides must acknowledge the initiation
datagram following the normal procedure described in 4.1.

The following is an example of initialization collision:

Endpoint A                                          Endpoint Z
[Header Flags=FIR|RES                          [Header Flags=FIR|RES
	Mode=options                            Mode=options
        Seen=0,Send=Tag_A] --------\   /-----   Seen=0, Send=Tag_Z]
(Start T1-init timer)               \ /        (Start T1-init timer)
                                     /
                                    / \
                                   /   \
[Header Flags=FIR|RES|ACK  <------/     \
        Mode=options		         \---> [Header Flags=FIR|RES|ACK
        Seen=Tag_Z,Send=Tag_A]----\             Mode=options
                                   \ /-------   Seen=Tag_A,Send=Tag_Z]
                                    \
                                   / \-------> (Cancel T1-init timer)
(Cancel T1-init timer)     <------/

..
[Header Flags=ACK|DAT
	Mode=options
        Seen=Tag_Z,Send=1] ------------------>
                                               ..
                                               [Header Flags=ACK|DAT
                                                Mode=options
			   <-----------------   Seen=Tag_A,Send=1]

4.4 Re-initialization

An endpoint is allowed to re-initialize an established communication.

In the case of re-initialization, the endpoint which initiates the
re-initialization (i.e., the initiator) should use a tag different
from the one used in the previous initialization. The initiator should
follow the standard initialization procedure as stated in 4.1.

Upon the arrival of the initiation datagram, the peer of the initiator
should also follow the procedure stated in 4.1 to respond.

4.5 Link Rotation

When multiple networks exist between two communicating endpoints, the
MDTP implementation MUST record, every time the application transmits
a datagram, which network the transmission was sent on. This is kept
in the MDTP protocol variable 'last.sent.intf'. Unless the user
specifically overrides rotation, sends should be rotated in a
round-robin fashion among all available networks, and 'last.sent.intf'
should be updated to indicate which interface was used last. To
determine whether a network is "available", the MDTP implementation
should apply the rules defined in "5.5 Retransmission on Multiple
Networks".

The MDTP implementation MUST allow a user to override this rotation,
defeating MDTP's rotation upon each send.
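
One possible way to pick the network for the next send is sketched
below. The function name and the representation of availability as a
list of booleans are assumptions of this sketch; the caller is expected
to store the returned index in 'last.sent.intf'.

```python
def next_interface(last_sent_intf, available, override=None):
    """Pick the network index for the next send.

    last_sent_intf -- index used for the previous send
    available      -- list of booleans, one per network
    override       -- index chosen by the user, or None to rotate
    """
    if override is not None:
        # The user has defeated rotation for this send.
        return override
    count = len(available)
    # Walk round-robin from the network after the last one used and
    # take the first network that is currently considered available.
    for step in range(1, count + 1):
        candidate = (last_sent_intf + step) % count
        if available[candidate]:
            return candidate
    return None  # no network is available
```

With three networks of which the second is unavailable, a send
following one on network 0 goes to network 2, and the one after that
returns to network 0.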

5.  Reliable Transfer Mode

Reliable transfer mode is indicated if the sending endpoint sets the
GAR option on the current datagram.

If the sending endpoint was previously transmitting in unreliable mode
(by setting UNR bit in each previous datagram), the receiver must
reset its Seen counter to the Send value of this current datagram
upon receiving it.

The following example illustrates both piggybacked and non-piggybacked
acknowledgments with both ends transmitting in reliable mode:

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=1,Size=100]-------------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=101,Size=100]----------->
(Restart T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=201,Size=100]----------->
(Stop and restart T3-send timer)

                                              {Timer T2 expires}
                <---------------------------- [Header Flags=ACK
                                              Mode=0
                                              Part=0,Of=0
                                              Seen=301,Send=1]
(cancel T3-send timer)
..
{App sends 1 message}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=301,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)

                                              {App sends 1 message}
                                              (cancel T2-receive timer)
                <---------------------------- [Header Flags=DAT|ACK
                                               Mode=GAR
                                               Part=0,Of=1
                                               Seen=401,Send=1,Size=45]
                                               (Start T3-send timer)
(cancel T3-send timer)
(Start T2-receive timer)
..
{Timer T2 Expires}
[Header Flags=ACK
Part=0,Of=0
        Seen=46,Send=401]------------------> (cancel T3-send timer)

In the above example, the first series of 3 messages of 100 octets each
are sent by Endpoint A. The messages are unbundled in this example,
i.e., each message will be transmitted in a single datagram. Endpoint
A starts its send timer T3 after sending the first datagram, and each
subsequent send will stop and restart the send timer T3, extending the
life of the send timer. Endpoint Z upon receiving the first datagram
starts the receive timer T2. When timer T2 in Endpoint Z expires,
Endpoint Z transmits an Ack. Upon receipt of this Ack by Endpoint A,
it stops timer T3 and discards the first 3 datagrams (held for
possible retransmissions).

After the first three messages were transmitted successfully, the
application at Endpoint A sends another message of 100 octets.  After
sending this datagram, Endpoint A starts timer T3 again.  Upon
receipt of the datagram, Endpoint Z starts Timer T2.  Before
Endpoint Z's T2 timer expires, the application at Endpoint Z sends a
message of 45 octets to Endpoint A.  This causes Endpoint Z to cancel
the T2 timer and to piggyback an Ack on the outbound datagram being
transmitted to Endpoint A. After the transmission, Endpoint Z then
starts its T3 timer.  Upon receipt of this datagram Endpoint A
cancels its T3 timer (since all data it has sent is acknowledged), and
starts a receive timer T2. At the expiration of the T2 timer Endpoint
A acks the receipt of the last datagram from Endpoint Z.  This Ack
causes Endpoint Z to cancel its T3-send timer.

It is very important to notice in the above example that the
acknowledgements to the received datagrams are always delayed by timer
T2. This delay gives the receiving endpoint a window to piggyback the
Acks onto subsequent datagrams traveling in the opposite direction,
thus to avoid sending the Acks in separate datagrams.
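
The Seen values in the example are consistent with octet-based
sequence numbering, in which an Ack carries the Send value of the last
in-order datagram plus its Size. The sketch below assumes that reading
and loss-free, in-order arrival; the function name is hypothetical.

```python
def cumulative_seen(first_expected, datagrams):
    """Return the Seen value to place in the next Ack.

    first_expected -- Send value expected from the peer next
    datagrams      -- (Send, Size) pairs in arrival order
    """
    expected = first_expected
    for send, size in datagrams:
        if send == expected:
            # In-order data: the next expected octet moves past it.
            expected = send + size
    return expected
```

For the three 100-octet datagrams sent by Endpoint A (Send=1, 101 and
201) this gives 301, the Seen value of the Ack sent when timer T2
expires; for the single 45-octet datagram from Endpoint Z it gives 46.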

5.1 Timer Control

The basic rules for timer control are as follows:

A) When all outstanding datagrams are acknowledged, the T3-send timer
   shall be stopped, if one is running.

B) When a datagram with application data (i.e., with DAT flag set) is
   received, the endpoint shall start a T2-receive timer if no timer is
   running.

C) Upon the expiration of the T2-receive timer, the endpoint shall
   ack to the sender all the un-acked data it has received.

D) When a datagram with application data is sent out, the sending
   endpoint shall start a T3-send timer. If the T3-send timer is already
   running, the endpoint shall first stop the old T3 timer and then
   start a new one. If the T2-receive timer is running, the endpoint
   shall first stop the T2 timer, piggyback an Ack onto the outbound
   datagram, and then start a T3-send timer.

E) If the T3-send timer expires, the endpoint shall attempt
   re-transmission according to the rules described in 5.5.

F) No more than one timer of any type should be running on an
   endpoint at any given moment.

G) When a T2-receive timer expires, any bundled data waiting to be
   transmitted should be sent immediately with a piggybacked Ack to
   acknowledge all un-acked data previously received.

H) Whenever a T3-send timer is to be started, any running timer should
   be stopped and supplanted by the T3-send timer.

I) In bundling mode, if the total size of all application messages
   pending to be sent is less than the bundle size, the messages should
   be withheld and the T4-bundle timer should be started.

J) If the total size of all application messages pending to be sent
   exceeds the bundle size, the T4-bundle timer should be stopped and
   the message(s) should be immediately sent.

K) If a T4-bundle timer is running and data arrives, the T2-receive
   timer should not be started.

L) A T4-bundle timer should never be canceled unless it is being
   supplanted by a T3-send timer.

M) When the first datagram with the Tag which unlocks the initiation
   is received, no T2-receive timer should be started, instead an
   acknowledgement must be sent without delay.
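
Because rule F allows only one timer to run at a time, the timer state
of an endpoint can be modelled as a single slot. The sketch below
covers only rules A, B, C, D, H, K and M; the class and method names
are assumptions of this sketch, and real timers (with durations and
callbacks) are deliberately left out.

```python
class EndpointTimers:
    """Tracks which single timer ("T2", "T3" or "T4") is running."""

    def __init__(self):
        self.running = None

    def on_data_received(self, unlocks_initiation=False):
        """Return True if an Ack must be sent without delay."""
        if unlocks_initiation:
            return True              # rule M: no T2, ack immediately
        if self.running is None:
            self.running = "T2"      # rule B: start T2 if nothing runs
        return False                 # rule K: a running T4 is kept

    def on_data_sent(self):
        """Return True if an Ack is piggybacked on the datagram."""
        piggyback = self.running == "T2"   # rule D: T2 is stopped
        self.running = "T3"                # rule H: T3 supplants any timer
        return piggyback

    def on_all_acked(self):
        if self.running == "T3":
            self.running = None      # rule A: stop T3

    def on_t2_expired(self):
        # Rule C: the caller acks all un-acked data at this point.
        self.running = None
```

A receive followed by a send therefore leaves only T3 running and
reports that the Ack was piggybacked, as in the first example of
section 5.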

The following example shows the use of various timers.

Endpoint A                                         Endpoint Z
{App sends 2 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=501,Size=100]-----------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1                           {App sends 1 message}
        Seen=1,Send=601,Size=100]-\       /-- (cancel T2-receive timer)
(stop and restart T3-send timer)   \     /    [Header Flags=DAT|ACK
                                    \   /      Mode=GAR
                                     \ /       Part=0,Of=1
                                      \        Seen=601,Send=1,Size=100]
                                     / \       (Start T3-send timer)
                                    /   \
                              <----/     \-->
..
{T3-send timer expires}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=101,Send=601,Size=100]---------> (Cancel T3-send timer)
(Restart T3-send timer)                       (Start T2-receive timer)

                                              ..
                                              {Timer T2 expires}
(Cancel T3-send timer)        <-------------- [Header Flags=ACK
                                               Mode=0
                                               Part=0,Of=0
                                               Seen=701,Send=101]

In this example, the application at Endpoint A sends 2 messages to
Endpoint Z. Both messages are 100 octets in length. Before the second
datagram arrives at Endpoint Z, Endpoint Z's application sends a
message to Endpoint A. This causes Endpoint Z to cancel its T2-receive
timer and piggyback the Ack to the first received datagram on the
outbound datagram destined to Endpoint A. After transmitting the
datagram Endpoint Z starts its T3-send timer. When the T3-send timer
at Endpoint A expires, it will re-send its earlier datagram. The
retransmitted datagram is the same, except that it now acknowledges all
outstanding packets that Endpoint Z has sent. After retransmitting the
datagram Endpoint A restarts its T3-send timer.

The arrival of the retransmitted datagram causes Endpoint Z to cancel
its T3-send timer and discard the duplicate datagram, and it now
starts its T2-receive timer. At the expiration of the T2-receive timer
Endpoint Z sends the Ack to Endpoint A. Endpoint A upon receipt of the
Ack cancels its T3 timer.

5.2 Gap Acknowledgments

If a datagram becomes missing during a series of transmissions, a
special type of acknowledgement known as the gap Ack will be sent. The
gap Ack tells the sender of the missing datagram that retransmission
is needed.

The following example shows the use of gap Ack.

Endpoint A                                       Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=701,Size=100]--------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=801,Size=100]-----X (lost)
(Restart T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=901,Size=100]--------> (A gap detected in data)
(Restart T3-send timer)
                                             ..
                                             {T2-receive timer expires}
                                     /------ [Header Flags=ACK
                                    /         Mode=0
                                   /          Seen=801,Send=146,
                                  /           Part=1,Of=1
                                 /            data=(long integer)901]
(Prepare retransmit)   <--------/

In this example, when Endpoint Z receives the third datagram from
Endpoint A it realizes that a gap exists in the received data.  At the
expiration of T2-receive timer, Endpoint Z sends a gap Ack, in place
of a normal Ack, to Endpoint A to indicate the missing data.

In the gap Ack, the Part and Of fields are both set to '1', as opposed
to '0' as in a normal Ack. The Seen field in the gap Ack contains the
sequence number of the first octet of the gap (801 in this example).
The data field of the gap Ack is a four (4) octet long integer
containing the sequence number at which the received data resumes,
i.e., the octet immediately following the gap (901 in this example).

Using these two values, Endpoint A should be able to calculate the
position and size of the missing data (which is 801-900 in this
example) and thus determine which datagrams will need to be
retransmitted.
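
The calculation can be sketched as follows. The sketch takes its
semantics from the example, where the data field carries 901 and the
missing octets are 801-900; the dictionary representation of an Ack
and all function names are assumptions of this sketch.

```python
def make_ack(next_expected, buffered_sends):
    """Build a normal Ack or a gap Ack at the receiver.

    next_expected  -- first octet not yet received in order
    buffered_sends -- Send values of datagrams received beyond a gap
    """
    if not buffered_sends:
        return {"Seen": next_expected, "Part": 0, "Of": 0, "data": None}
    # Gap Ack: Part=Of=1, data holds where received data resumes.
    return {"Seen": next_expected, "Part": 1, "Of": 1,
            "data": min(buffered_sends)}


def missing_range(gap_ack):
    """Inclusive (first, last) octet range reported as missing."""
    return gap_ack["Seen"], gap_ack["data"] - 1


def datagrams_to_retransmit(gap_ack, outstanding):
    """Select outstanding (Send, Size) pairs that overlap the gap."""
    first, last = missing_range(gap_ack)
    return [(send, size) for send, size in outstanding
            if send <= last and send + size - 1 >= first]
```

With Seen=801 and data=901, only the datagram with Send=801 is selected
from the three outstanding datagrams of the example.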

Gap Acks cannot be piggybacked with application data. The following is
another example of using gap Ack:

Endpoint A                                       Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=701,Size=100]--------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=801,Size=100]-----X (lost)
(Restart T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=901,Size=100]--------> (A gap is detected)
(Restart T3-send timer)
                                             ..
                                             {App sends a message}
                                             (Cancel T2-receive timer)
                                     /------ [Header Flags=ACK
                                    /         Mode=0
                                   /          Seen=801,Send=146,
                                  /           Part=1,Of=1
                                 /            data=(network long)901]
(Retransmit missing data) <-----/
[Header Flags=DAT|ACK                      - [Header Flags=DAT|ACK
        Mode=GAR                          /  Mode=GAR
        Part=0,Of=1                      /   Part=0,Of=1
        Seen=146,Send=801,Size=100]-    /    Seen=801,Send=146,Size=100]
(Restart T3-send timer)             \  /     (Start T3-send timer)
                                     \/
                                     /\
                          <---------/  \
                                        \
                                         \-->
                                             ..
                                             {T3-Send timer expires}
                                             (Retransmit app data)
(Cancel T3-send timer)    <--------------- [Header Flags=DAT|ACK
(Start T2-receive timer)                    Mode=GAR
                                            Part=0,Of=1
                                            Seen=1001,Send=146,Size=100]
                                             (Restart T3-send timer)
..
{T2-receive timer expires}
[Header Flags=ACK
        Part=0,Of=0
        Seen=246,Send=1001]----------------> (Cancel T3-send timer)

In this example, Endpoint Z detected the missing data when it received
the third datagram sent by Endpoint A (the second one to arrive).
However, before the T2-receive timer expired, the
application at Endpoint Z requested to send a message (of 100 octets
in length). This caused Endpoint Z to cancel its T2-receive timer and
send the gap Ack before it sent out the datagram containing the
application message. After transmitting the application message
Endpoint Z started its T3-send timer. When Endpoint Z's T3-send timer
expired it retransmitted the previous datagram and at the same time
acked all of Endpoint A's outstanding datagrams. Upon the receipt of
the retransmission from Endpoint Z, Endpoint A started its own
T2-receive timer. At the expiration of its T2-receive timer Endpoint A
sent an Ack to Endpoint Z and resolved the outstanding datagram at
Endpoint Z.

5.3 Congestion Control

Three different mechanisms should be used jointly to achieve flow
and congestion control in MDTP.

First, a limit should be set on the number of outbound messages
queued up at an endpoint. If the limit is reached, new send requests
from the application should be rejected until the number of messages
in the queue drops back.

Secondly, MDTP uses a transmission window to control the number of
outstanding datagrams, i.e., datagrams that have been sent but have yet to
be acknowledged. The length of the window is defined as the maximal
number of outstanding datagrams a sending endpoint can allow. This
length is adjusted dynamically, depending on the current number of
successful transmissions as well as the number of lost datagrams.

When the number of outstanding datagrams reaches the current window
length, the endpoint may still accept send requests from the
application, but will transmit no more datagrams until an Ack is
received.

Also, when the window length is reached, the next send request from the
application will trigger the sending endpoint to transmit a special
Window Up message. Upon receiving this Window Up message the receiver
must respond with a Window Up Response message, as illustrated by the
following diagram (assume current window length is 3):

Endpoint A                                      Endpoint Z

{App sends 3 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=1001,Size=100]--------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=1101,Size=100]-------->
(Restart T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=1201,Size=100]-------->
(Restart T3-send timer)

{App sends 1 message}
{ queue 100 byte message }
[Header Flags=WIN|ACK
        Seen=146,Send=1301]-----------------> (cancel T2-receive timer)
                                         /--- [Header Flags=ACK
                                        /       Mode=WNR
                                       /        Part=0,Of=0
                                      /         Seen=1301,Send=146]
[Header Flags=DAT|ACK      <---------/
	Mode=GAR
	Part=0,Of=1
        Seen=146,Send=1301,Size=100]--------> (Start T2-receive timer)

In this example, after the transmission of the first three datagrams,
Endpoint A reached its window length. The next message from the
application triggered a Window Up message that was sent to Endpoint
Z. The Window Up message always contains no data and has its WIN flag
set. In response, Endpoint Z cancelled timer T2 and immediately sent
an Ack with the WNR set in the Mode field. The arrival of this Ack
from Endpoint Z effectively resolved all the outstanding datagrams at
Endpoint A, thus allowing Endpoint A to send out the next datagram.

The window length is initially set to 2, and is then dynamically
adjusted based on the performance of the underlying networks.

If the current window length is equal to or greater than 4, each time
4 consecutive outstanding datagrams are acknowledged at once by
the receiver, the sender's window length will be raised by 1 until it
reaches 20.

If the length is less than 4, each time the number of
consecutively acknowledged outstanding datagrams is equal to or
greater than the current window length, the sender's window length will
be raised by 1 until it reaches 20.

The sender's window length will be decreased if datagram loss
occurs. If between 1 and 3 consecutive datagrams are lost, the window
length will be decreased by 1. If between 4 and 7 datagrams are lost,
the window length will be decreased by 2. If 8 or more datagrams are
lost, the window length will be decreased by 4. When the window length
reaches 2 it will not be decreased any further.

Moreover, any time a Window Up is sent to the receiving endpoint the
sender's window length will be decreased by 1. Likewise, if a timeout
forces a retransmission the sender's window length will be decreased
by 1. Finally, a duplicate Ack received by a sender should be taken
as an indication of network congestion, and the number of outstanding
packets allowed should be decreased by 4.

The following table summarizes these rules:
-----------------------------------------------------------------------
  Duplicate Ack received by sender  | Adjust down by 4
-----------------------------------------------------------------------
  Greater than 8 datagrams lost     | Adjust down by 4
-----------------------------------------------------------------------
  Greater than 4 datagrams lost     | Adjust down by 2
-----------------------------------------------------------------------
  Greater than 0 datagrams lost     | Adjust down by 1
-----------------------------------------------------------------------
  Timeout forces retransmission     | Adjust down by 1
-----------------------------------------------------------------------
  Window Up sent                    | Adjust down by 1
-----------------------------------------------------------------------
  4 or more consecutive datagrams   | Adjust up by 1
  acknowledged (window length > 4)  |
-----------------------------------------------------------------------
  1/2 Window length or more acked   | Adjust up by 1
  (window length <=4)               |
-----------------------------------------------------------------------
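
The adjustment rules can be sketched as below. Note that the
thresholds in this sketch follow the prose of this section (for
example "8 or more" lost datagrams, and an acked count of at least the
window length when the window is below 4); the summary table words
some of these limits slightly differently. All names are assumptions
of this sketch.

```python
MIN_WINDOW = 2    # initial value and lower bound
MAX_WINDOW = 20   # upper bound

# Fixed decreases for events other than datagram loss.
WINDOW_UP_SENT = 1
TIMEOUT_RETRANSMIT = 1
DUPLICATE_ACK = 4


def on_acked(window, acked_at_once):
    """Raise the window by 1 when enough datagrams are acked at once."""
    threshold = 4 if window >= 4 else window
    if acked_at_once >= threshold:
        window += 1
    return min(window, MAX_WINDOW)


def loss_penalty(lost):
    """Decrease that corresponds to a number of lost datagrams."""
    if lost >= 8:
        return 4
    if lost >= 4:
        return 2
    if lost >= 1:
        return 1
    return 0


def decrease(window, amount):
    """Lower the window, never going below MIN_WINDOW."""
    return max(window - amount, MIN_WINDOW)
```

For example, decrease(5, loss_penalty(2)) gives a window of 4, and
decrease(3, DUPLICATE_ACK) stops at the lower bound of 2.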

Finally, the third flow control mechanism is to exchange incoming
queue information between the two communicating endpoints. By using the
In Queue field in the MDTP header, the sender of a datagram can inform
its peer of the number of pending datagrams which it has received but
has yet to deliver to its application. The following example shows how
the endpoints use the In Queue value to accomplish flow control.

Assume that Endpoint A sent Endpoint Z 20 datagrams, and when Endpoint
Z acked the receipt of all the 20 datagrams, only the first one of
the 20 datagrams was delivered to the application at Endpoint Z.  In
the last Ack sent by Endpoint Z, the In Queue field would then have a
value of 19, indicating the number of datagrams pending for delivery
to its application. This value would be checked by Endpoint A before
it sent the next datagram to Endpoint Z. If this value was found to be
greater than its current window length, Endpoint A would not send the
next datagram. Instead, Endpoint A would start its T3-send timer and
send a Window Up message to Endpoint Z at the expiration of the timer.
This would force Endpoint Z to send an Ack with an updated In Queue
value. If the new In Queue value was still greater than its window
length, Endpoint A would restart its T3-send timer, repeating this
procedure until the In Queue value of Endpoint Z dropped below the
current window length of Endpoint A.  Then, the transmission at
Endpoint A would resume.
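
The check made by Endpoint A before each send can be sketched as
follows. The text says sending is withheld when the In Queue value is
"greater than" the window length, so this sketch assumes that an equal
value still permits sending; the function names are hypothetical.

```python
def may_send(peer_in_queue, window_length):
    """True if the peer's reported In Queue value permits sending."""
    return peer_in_queue <= window_length


def window_ups_needed(last_in_queue, later_reports, window_length):
    """Count Window Up messages sent before transmission resumes.

    last_in_queue -- In Queue value from the most recent Ack
    later_reports -- In Queue values returned by successive
                     Window Up Responses, in order
    Returns None if none of the reports permits sending.
    """
    if may_send(last_in_queue, window_length):
        return 0
    for count, value in enumerate(later_reports, start=1):
        if may_send(value, window_length):
            return count
    return None
```

With an In Queue value of 19 and a window length of 5, transmission is
withheld; if later responses report 12, 6 and 4, it resumes after the
third Window Up.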

5.4 Sequence Number Reset

It may become necessary for an endpoint to reset the sequence number
while it is sending data to a peer. However, the endpoint must inform
the peer about this event by:

1) sending a Window Up message to force the peer to acknowledge all
   received datagrams which have not been acknowledged, and

2) sending the next datagram with RES bit set in the Flags field.

3) A sending endpoint should always reset its sequence counter before
   the counter reaches 0x7fffffff. When the counter reaches this
   value the sending endpoint is required to reset its sequence
   counter.

4) A sending endpoint should never reset its sequence counter until
   after reaching 0x7fff05ff.

Note: This section will be obsoleted in a future version of the
draft and be replaced by a deterministic rollover algorithm.
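
Rules 3 and 4 bound the range in which a reset may take place. A
minimal sketch of the two checks is given below. It assumes that the
sequence counter advances by the Size of each datagram, as in the
examples of this section, and that "after reaching" includes the value
0x7fff05ff itself; the function names are hypothetical.

```python
RESET_EARLIEST = 0x7FFF05FF   # rule 4: no reset before this value
RESET_LIMIT = 0x7FFFFFFF      # rule 3: reset before reaching this value


def reset_allowed(send):
    """True once the counter is high enough for a reset (rule 4)."""
    return send >= RESET_EARLIEST


def reset_required(send, size):
    """True if sending 'size' more octets would reach the limit (rule 3)."""
    return send + size >= RESET_LIMIT
```

A counter of 0x7ffff100 may be reset but, for a 100-octet datagram,
does not yet have to be.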

The following example illustrates the sequence number reset procedure
(assume that Endpoint A opts to do a reset when the data sequence
number becomes greater than 0x7ffff000).

Endpoint A                                        Endpoint Z

{App sends 2 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=46,Send=0x7ffff000,Size=100]----> (Start T2-receive timer)
(Start T3-send timer)
(Reset sequence number)
[Header Flags=WIN|ACK
        Seen=146,Send=0x7ffff100]------------> (cancel T2-receive timer)
                                      /------- [Header Flags=ACK
                                     /          Mode=WNR
                                    /           Part=0,Of=0
                                   /            Seen=0x7ffff100,Send=46]
(Cancel T3-send timer)     <------/
[Header Flags=DAT|ACK|RES
	Mode=GAR
	Part=0,Of=1
        Seen=46,Send=2,Size=100]-------------> (Start T2-receive timer)
(Restart T3-send timer)
                                               ..
                                               {App sends 1 message}
                                               (cancel T2-receive timer)
(Cancel T3-send timer)     <---------------- [Header Flags=DAT|ACK
(Start T2-receive timer)                      Mode=GAR
                                              Part=0,Of=1
                                              Seen=102,Send=46,Size=100]
                                               (Start T3-send timer)

In the above example, after transmitting the first datagram Endpoint A
determines that its data sequence number needs to be reset before it
transmits the next datagram. It first sends out a Window Up message to
force Endpoint Z to send back a Window Up Response to ack all the
outstanding received data. Then, it transmits the datagram
it has been withholding, with the new sequence number and the RES flag
set. Upon detecting the RES flag in the header of the incoming datagram,
Endpoint Z resets its data sequence counter on Endpoint A.

5.5 Retransmission on Multiple Networks

Whenever a T3-send timer expires, the endpoint will take one of the
following three actions:

A) If the current window length is not reached (see 5.3) and there is
   application data pending, a new datagram will be sent out.

B) If the current window length is reached, a Window Up message will
   be sent out.

C) If the window length is not reached, but there is no pending
   application data to send, the datagram with the lowest Send value
   that is still outstanding (i.e., not yet acked) will be
   retransmitted.
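
The choice among the three actions can be sketched as a single
function. The function name, the tuple results and the representation
of outstanding datagrams as a list of Send values are assumptions of
this sketch.

```python
def on_t3_expired(outstanding, window_length, pending):
    """Choose the action to take when the T3-send timer expires.

    outstanding   -- Send values of datagrams not yet acked
    window_length -- current window length (see 5.3)
    pending       -- application messages waiting to be sent
    """
    if len(outstanding) >= window_length:
        return ("window_up", None)               # action B
    if pending:
        return ("send_new", pending[0])          # action A
    if outstanding:
        return ("retransmit", min(outstanding))  # action C
    return ("none", None)   # nothing outstanding, nothing to do
```

With a window length of 3, two outstanding datagrams (Send=801 and
Send=901) and no pending data, the datagram with Send=801 is
retransmitted.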

When multiple networks exist between two communicating endpoints, the
re-transmission should be attempted on the network specified
in the MDTP protocol variable 'last.good.intf'. The value of
'last.good.intf' is always updated to refer to the network on which
the last datagram from the peer endpoint arrived.

Moreover, the number of consecutive re-transmissions is also recorded
in a variable 'retran.count' for each network. Every time a datagram is
received from a network, the corresponding retran.count is reset to '0'.

If the value in the retran.count of the current network exceeds a half
of the value of the protocol parameter 'Max.Retransmit', the
'last.good.intf' will be changed, so as to force the next
re-transmission to be directed to an alternate network.

The total number of consecutive re-transmissions across all the
networks is also recorded. If this value exceeds the limit defined by
'Max.Retransmit', the sending endpoint should consider the peer
endpoint unreachable and stop transmitting data to it, and optionally
report the failure.
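
The bookkeeping described above can be sketched as a small class. This
is an illustration under stated assumptions: the alternate network is
taken to be the next index in order, and the total count is reset
whenever any datagram arrives (the text only says "consecutive"). The
class and method names are hypothetical.

```python
class RetransmitTracker:
    def __init__(self, num_networks, max_retransmit):
        self.retran_count = [0] * num_networks  # one count per network
        self.total = 0                          # across all networks
        self.last_good_intf = 0
        self.max_retransmit = max_retransmit    # 'Max.Retransmit'

    def on_datagram_received(self, intf):
        """A datagram from the peer arrived on network 'intf'."""
        self.retran_count[intf] = 0
        self.total = 0
        self.last_good_intf = intf

    def on_retransmit(self):
        """Return the network to retransmit on, or None if unreachable."""
        intf = self.last_good_intf
        self.retran_count[intf] += 1
        self.total += 1
        if self.total > self.max_retransmit:
            return None  # peer is considered unreachable
        if self.retran_count[intf] > self.max_retransmit / 2:
            # Direct the next re-transmission to an alternate network.
            self.last_good_intf = (intf + 1) % len(self.retran_count)
        return intf
```

With two networks and 'Max.Retransmit' set to 4, successive
re-transmissions without any reply go to networks 0, 0, 0 and 1, after
which the peer is declared unreachable.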

5.5.1 Randomization of the T3-send timer at retransmission

When a T3-send timer is started after retransmitting a packet, the
value of the next T3-send timer for this destination should be
extended by a random amount. The amount must be bounded so that the
application can predict with some reasonable degree of precision when
the destination endpoint is declared unreachable.

For performance considerations, this can be implemented by
pre-calculating a set of random values and then using a different
value to extend the T3-send timer for each re-transmission to the
same destination endpoint.
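
A minimal sketch of the pre-calculated approach is shown below. The
table size, the uniform distribution and the function names are
assumptions of this sketch; the only property taken from the text is
that the extension is random and bounded.

```python
import random


def make_jitter_table(max_extension, entries=16, seed=None):
    """Pre-calculate random T3 extensions in [0, max_extension]."""
    rng = random.Random(seed)
    return [rng.uniform(0.0, max_extension) for _ in range(entries)]


def t3_after_retransmit(base_t3, retransmit_number, table):
    """T3 value to use after the given consecutive re-transmission."""
    # A different table entry is used for each re-transmission.
    return base_t3 + table[retransmit_number % len(table)]


def worst_case_total(base_t3, max_extension, max_retransmit):
    """Upper bound on the time spent in T3 timers before giving up."""
    return (base_t3 + max_extension) * max_retransmit
```

Because every extension is at most max_extension, the application can
bound the time before the destination is declared unreachable by the
value worst_case_total() returns.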

5.6 Termination of an Endpoint

When an endpoint terminates, it should send a shutdown message
to each of the peer endpoints with which it has ever initiated
communication. The shutdown message is sent in unreliable transfer
mode and need not be acknowledged. When an endpoint receives a
shutdown message from its peer, it will remove the sender from its
records, and optionally report the termination of that peer.

The following sequence shows an example of the termination of an
endpoint (Endpoint A).

Endpoint A

{App indicates termination}
[Header Flags=FIR
	Mode=SHU
        Seen=146,Send=1301]------------------------> to Endpoint X

[Header Flags=FIR
	Mode=SHU
        Seen=1496,Send=101]------------------------> to Endpoint Y

[Header Flags=FIR
	Mode=SHU
        Seen=1460,Send=201]------------------------> to Endpoint Z

As shown in this example, the shutdown message is indicated by having
both the FIR flag and the SHU mode bit set. Also, notice that no
acknowledgement is sent back by Endpoint X, Y, or Z.

5.7 Endpoint Drain

An endpoint may decide to "drain" a connection without completely
shutting it down. By draining a connection, both endpoints will remove
any record and pending datagrams associated with the connection.
Further communications between the two endpoints can be resumed by
going through a re-initialization procedure.

A "drain" message is specified with the UNR bit set in a shutdown
message. No Ack is required for a "drain" message.

The following sequence shows an example.

Endpoint A

{App indicates termination}
[Header Flags=FIR
	Mode=SHU|UNR
        Seen=146,Send=1301]------------------------> to Endpoint X

5.8 Advisory Acknowledgements.

To increase bandwidth utilization a sending endpoint may (at its
option) request an advisory acknowledgement. An endpoint would
typically do this when half of its window is unacknowledged and again
on the last datagram that will fill its window. Upon reception of an
advisory acknowledgement request the receiver shall transmit, with no
delay, an acknowledgement of all received packets, canceling any
T2-receive timer that may be running. In the sequence below the
request is carried by the third datagram, which has the RE1 mode bit
set. The sequence would look as follows:

Endpoint A                                      Endpoint Z
{App sends 3 messages}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=1,Size=100]-------------> (Start T2-receive timer)
(Start T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=1,Send=101,Size=100]----------->
(Restart T3-send timer)

[Header Flags=DAT|ACK
	Mode=GAR|RE1
	Part=0,Of=1
        Seen=1,Send=201,Size=100]----------->
(Stop and restart T3-send timer)

                                              (cancel T2-receive timer)
                <---------------------------- [Header Flags=ACK
                                              Mode=0
                                              Part=0,Of=0
                                              Seen=301,Send=1]

5.9 RTT Measurement

On occasion either end may wish to do a Round Trip Time (RTT)
measurement of a network.  There are two methods of measuring Round
Trip Time. Method 1 involves a ping-pong using a special ACK; Method 2
involves a rider on top of a datagram. If Method 2 is invoked then the
Round Trip Time includes the T2-Receive timer (this may actually be
more useful than a pure RTT, since each endpoint may have a different
T2-Receive timer value).

Method 1:
When an endpoint wishes to take an RTT measurement it shall send an
ACK datagram with RE2 set to 1, GAR set to 1 and DAT set to 0.  The
sender should place in Time Int 1 and Time Int 2 the current time of
day in seconds and microseconds, respectively.

Upon receipt of a datagram with RE2 set to 1, GAR set to 1 and DAT set
to 0, the recipient should return the datagram to the sender over the
arriving network with the NOG bit set. The sender can then use
Time Int 1 and Time Int 2 to calculate the current RTT.

Endpoint A                                      Endpoint Z
RTT - Request Now=x.y
[Header Flags=ACK
	Mode=GAR|RE2
	Part=0,Of=1
        Seen=1,Send=301,Size=0
        Time-Int1=x
        Time-Int2=y]------------->
                <---------------------------- [Header Flags=ACK|NOG
                                              Mode=0
                                              Part=0,Of=0
                                              Seen=301,Send=1
                                              Time-Int1=x
                                              Time-Int2=y]

Endpoint A subtracts x.y
(carried in the arriving
datagram) from the current
time to calculate the RTT.
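
The Method 1 calculation can be sketched as follows. The function name
is hypothetical, and the sketch assumes that Time-Int1 carries whole
seconds and Time-Int2 carries microseconds, as suggested by the
"Now=x.y" notation in the sequence above.

```python
def rtt_microseconds(now_sec, now_usec, echoed_sec, echoed_usec):
    """Round trip time in microseconds.

    (echoed_sec, echoed_usec) are the Time-Int1/Time-Int2 values echoed
    back by the peer; (now_sec, now_usec) is the time of arrival of the
    reply at the original sender.
    """
    return (now_sec - echoed_sec) * 1_000_000 + (now_usec - echoed_usec)
```

For example, a request stamped 9 s + 900000 us whose reply arrives at
10 s + 200000 us yields an RTT of 300000 microseconds.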

Method 2:

If an endpoint wishes to piggyback an RTT test that includes the
T2-Recv timer at the remote endpoint, the sending endpoint fills out
the datagram in the normal way for reliable communication but also
sets the RE2 flag, and places at the end of the datagram (outside the
length of the data) two long integers as a trailer.

When the receiving endpoint recognizes the RE2 flag, it should extract
the two integers and place them in internal storage until the next
datagram is scheduled to be returned (i.e. at the expiration of the
T2-Recv timer). If the T2-Recv timer expires, the receiving endpoint
should send the acknowledgement as above with the addition of the NOB
flag as well.  If the receiving endpoint's upper layer sends a
datagram, causing the T2-Recv timer to be canceled, then the datagram
should include the trailing integers and have the NOB flag set.
In cases where an intervening Window Up is received, the receiving
endpoint should respond with a Window Up Response (per the window up
procedure) but NOT cancel its T2-Recv timer.

Example 1 - T2-Recv timer expires

Endpoint A                                      Endpoint Z
RTT - Request Now=x.y
[Header Flags=ACK|DAT
	Mode=GAR|RE2
	Part=0,Of=1
        Seen=1,Send=301,Size=100
	{data of 100 octets}
        Time-Int1=x
        Time-Int2=y]------------->            (started T2-Recv)
		                              {T2-Recv Expires }
                <---------------------------- [Header Flags=ACK|NOG|NOB
                                              Mode=0
                                              Part=0,Of=0
                                              Seen=301,Send=1
                                              Time-Int1=x
                                              Time-Int2=y]

Example 2 - Datagram causes T2-Recv timer cancel

Endpoint A                                      Endpoint Z
RTT - Request Now=x.y
[Header Flags=ACK|DAT
	Mode=GAR|RE2
	Part=0,Of=1
        Seen=1,Send=301,Size=100
	{data of 100 octets}
        Time-Int1=x
        Time-Int2=y]------------->            (started T2-Recv)
		                              {datagram sent by application}
			                      (cancel T2-Recv)
                <---------------------------- [Header Flags=DAT|ACK|NOG
                                              Mode=GAR
                                              Part=0,Of=1,Size=100
                                              Seen=401,Send=1
                                              {data of 100 octets}
                                              Time-Int1=x
                                              Time-Int2=y]

5.10 Heart Beat Ack

At the request of the application, a Heart Beat acknowledgement may be
sent. The Heart Beat should only be allowed to be enabled if the
sender's Mode is GAR (reliable delivery) and the version is 2. Once
enabled, when no datagrams are being transmitted, a T5-Heart Beat
timer should be started. When the T5 timer expires an ACK should be
sent using the next available link, following the link rotation
procedure outlined in "4.5 Link Rotation". After sending the ACK
another T5-Heart Beat timer should be started. If, before the
expiration of T5-Heart Beat, a datagram is transmitted or received,
the T5 timer should be stopped and the appropriate T2-T4 timer should
be started. The T5 timer has the lowest precedence of all timers.

When sending a Heart Beat Ack, the format should be identical to that
of an RTT time test. This will require the receiver to respond on the
network. If the sender does not get a response on the network the
heartbeat arrived on by the time the next heartbeat is to be sent,
then the network that the last heartbeat was sent upon should be
counted as a transmission failure as described in section
"5.5 Retransmission on Multiple Networks", and should be counted
against the 'retran.count' and the protocol parameter 'Max.Retransmit'.

6.  Unreliable Transfer Mode

The unreliable transfer mode allows two endpoints to send to each
other without acknowledging receipt. This can usually achieve higher
data throughput than the reliable transfer mode. To indicate the
unreliable transfer mode the sender of a datagram simply sets the UNR
bit in the mode field. The following sequence illustrates unreliable
data transfer.

Endpoint A                                      Endpoint Z
{App sends 2 messages}
[Header Flags=DAT|ACK
	Mode=UNR
	Part=0,Of=1
        Seen=1,Send=11001,Size=100]-------->

[Header Flags=DAT|ACK
	Mode=UNR
	Part=0,Of=1
        Seen=1,Send=11101,Size=100]-------->

                                             {App sends 1 message}
                                   <------- [Header Flags=DAT|ACK
                                             Mode=UNR
                                             Part=0,Of=1
                                             Seen=11201,Send=1,Size=450]

{App sends 2 more messages}
[Header Flags=DAT|ACK
	Mode=UNR
	Part=0,Of=1
        Seen=451,Send=11201,Size=100]------>

[Header Flags=DAT|ACK
	Mode=UNR
	Part=0,Of=1
        Seen=451,Send=11301,Size=100]------>

Note that no timers are started by either end. Also note that even
though both ends are in UNR mode, the ACK flag is still set by the
sender of the datagram. This means that the Seen field in the datagram
header is still valid for indicating the sequence number of the last
octet received by the sender. However, the sender makes no claim as to
whether pieces of data are missing. The upper application can use this
information to help detect missing or duplicated pieces. In
unreliable mode, MDTP makes no effort to re-transmit missing data or
to screen out duplicated datagrams.

6.1 Ordered Reception

In unreliable transfer, if the sender sets the RE1 bit the receiver
should order the datagrams upon arrival. Any datagrams that have not
yet been read by the receiver's application should be ordered so that
they are delivered in the order in which they were transmitted
(using the sendStartsAt field). If a datagram arrives after a newer
datagram has already been read by the application, the late datagram
should be discarded. The sequence would look as follows:

Endpoint A                                      Endpoint Z
{App sends 4 messages}
[Header Flags=DAT|ACK
	Mode=UNR|RE1
	Part=0,Of=1
        Seen=1,Send=11001,Size=100]-------->

[Header Flags=DAT|ACK
	Mode=UNR|RE1
	Part=0,Of=1
        Seen=1,Send=11101,Size=100]\       /-->
                                    \     /
                                     \   /   (User reads/Receives all
[Header Flags=DAT|ACK                 \ /     datagrams 11001 & 11201)
	Mode=UNR|RE1                   \
	Part=0,Of=1                   / \
        Seen=451,Send=11201,Size=100]/   \---> { Datagram is discarded }

[Header Flags=DAT|ACK
	Mode=UNR|RE1
	Part=0,Of=1
        Seen=1,Send=11301,Size=100]\       /-->
                                    \     /
                                     \   /
[Header Flags=DAT|ACK                 \ /
	Mode=UNR|RE1                   \
	Part=0,Of=1                   / \
        Seen=451,Send=11401,Size=100]/   \--->(User reads/Receives all
                                               datagrams in order
                                               11301 & 11401)
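
The ordering rule of this section can be sketched with a small
receive queue. The class and method names are hypothetical, sequence
number wrap-around is ignored, and the treatment of a datagram whose
Send value equals one already read (it is discarded) is an assumption
of this sketch.

```python
import heapq


class OrderedUnreliableQueue:
    """Holds unread RE1 datagrams in Send order (hypothetical helper)."""

    def __init__(self):
        self._pending = []         # min-heap of (send, payload)
        self._highest_read = None  # Send of the newest datagram already read

    def on_arrival(self, send, payload):
        """Queue a datagram; return False if it is discarded as late."""
        if self._highest_read is not None and send <= self._highest_read:
            return False
        heapq.heappush(self._pending, (send, payload))
        return True

    def read_all(self):
        """Hand every pending datagram to the application in Send order."""
        delivered = []
        while self._pending:
            send, payload = heapq.heappop(self._pending)
            delivered.append((send, payload))
            self._highest_read = send
        return delivered
```

Replaying the sequence above, datagram 11101 arrives after 11001 and
11201 have been read and is discarded, while 11401 and 11301 arrive
out of order before the next read and are delivered as 11301, 11401.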

7. Mixed Mode Data Transmission

An endpoint can switch between reliable and unreliable transfer modes
at any time during the data transfer.

The following sequence illustrates such a transfer mode change, in
which both endpoints start with the unreliable transfer mode, and
then Endpoint A switches to reliable transfer mode.

Endpoint A                                  Endpoint Z
                                            {App sends 1 message}
                        <------------------ [Header Flags=DAT|ACK
                                             Mode=UNR
                                             Part=0,Of=1
                                             Seen=11201,Send=1,Size=450]
..
{App sends 1 message}
[Header Flags=DAT|ACK
	Mode=UNR
	Part=0,Of=1
        Seen=451,Send=11201,Size=100]------>

..
{App sends 1 message}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=451,Send=11301,Size=100]------> (Start T2-receive timer)
(Start T3-send timer)
                                             {App sends 1 message}
                                             (Cancel T2-receive timer)
                                    /------- [Header Flags=DAT|ACK
                                   /         Mode=UNR
                                  /          Part=0,Of=1
                                 /           Seen=11401,Send=1,Size=450]
(Cancel T3-send timer)  <-------/

..
{App sends 1 message}
[Header Flags=DAT|ACK
	Mode=GAR
	Part=0,Of=1
        Seen=451,Send=11401,Size=100]------> (Start T2-receive timer)
(Start T3-send timer)
                                             ..
                                             {Timer T2 Expires}
(Cancel T3-send timer)  <------------------- [Header Flags=ACK
                                              Mode=0
                                              Part=0,Of=0
                                              Seen=11501,Send=146]

Note that in the second datagram sent by Endpoint A the mode is
switched to reliable transfer mode (with the GAR bit set). This causes
Endpoint A to start its T3-send timer. When Endpoint Z receives the
datagram and realizes the mode change, it starts its T2-receive timer.
At this point, Endpoint Z must also update its Seen value to 11301.
This aligns Endpoint Z's Seen counter with the Send value of this
first reliable datagram from Endpoint A, and prevents Endpoint Z from
requesting retransmission of data that Endpoint A may not have
retained.

8.  Bundled Messages

In order to increase network utilization, MDTP allows an endpoint to
bundle small application messages into one single datagram for
transmission. This bundled mode can be applied to both reliable and
unreliable datagrams.

An endpoint indicates to its peer that it is currently in bundled
mode by setting the BUN bit in the mode field.

8.1 Format of Bundled Datagram

The ISB bit in the flag field is set to indicate that the current
datagram is bundled, i.e., it contains multiple messages. The format
of a bundled datagram is defined as follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 1                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 MDTP Protocol Identifier 2                    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                 Acknowledgment Number (Seen)                  |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Sequence Number (Send)                     |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |        Data Size              |    Part       |      Of       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Flags      |     Mode      |   Version     | Num On Queue  |
   |N N W I F R D A|B S W R R B G U|               |               |
   |O O I S I E A C|R H N E E U A N|               |               |
   |G B N B R S T K|O U R 1 2 N R R|               |               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |     Number Of Messages        |   Size of first message B1    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                                                               |
   |                     B1 octets of data                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Size of second message B2  |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                     B2 octets of data                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   \                                                               \
   /                                                               /
   \                                                               \
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |    Size of last message BL    |                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+                               |
   |                     BL octets of data                         |
   |                                                               |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Data Size in a bundled datagram indicates the actual size of the
data field of the datagram, including both the bundling overhead and
the actual application data. Since no fragmentation is allowed in a
bundled datagram, the Part field will always be '0' and the Of field
will always be '1'.

The first two octets of the data field are a 16 bit integer indicating
the number of messages bundled in the datagram. This is followed
immediately by a list of bundled messages. Each bundled message starts
with an integer of two octets indicating the size of the data in the
message, followed by the data itself.

All integers in the datagram should be transmitted in network byte
order.
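
The layout of the data field described above can be illustrated with
the following sketch, which builds and parses only the data field (not
the MDTP header). The function names are hypothetical, and the size
fields are assumed to be unsigned.

```python
import struct


def pack_bundle(messages):
    """Build the data field of a bundled datagram from a list of bytes."""
    parts = [struct.pack("!H", len(messages))]    # Number Of Messages
    for msg in messages:
        parts.append(struct.pack("!H", len(msg)))  # size of this message
        parts.append(msg)                          # the message data itself
    return b"".join(parts)


def unpack_bundle(data):
    """Split the data field of a bundled datagram back into messages."""
    (count,) = struct.unpack_from("!H", data, 0)
    offset = 2
    messages = []
    for _ in range(count):
        (size,) = struct.unpack_from("!H", data, offset)
        offset += 2
        if offset + size > len(data):
            raise ValueError("truncated bundled message")
        messages.append(data[offset:offset + size])
        offset += size
    return messages
```

Bundling three messages of 100 octets each gives a data field of
2 + 3 * (2 + 100) = 308 octets, which is the value that would be
placed in the Data Size field.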

8.2 Bundled Transfer

Two protocol parameters, namely Min.Bundle and Max.Bundle, are used
to control the assembly of bundled datagrams. If the current size of
a bundled datagram is smaller than Min.Bundle, the endpoint will
withhold the datagram from transmission and start the T4-bundle timer.
If new outbound data becomes available for transmission, the endpoint
will attempt to bundle the new data with the current withheld datagram
by using the following rules:

A) If the size of the new data is greater than or equal to
   Min.Bundle, the current withheld datagram will be transmitted and
   T4-bundle timer will be canceled. Then, the new data will be
   transmitted in a separate datagram.

B) If the size of the new data is less than Min.Bundle, but the
   combined size of the current datagram and the new data is greater
   than or equal to Max.Bundle, the current datagram will be sent and
   the new data will be withheld as the new current datagram.

C) If the size of the new data is less than Min.Bundle, and the
   combined size of the current datagram and the new data is at least
   Min.Bundle but less than Max.Bundle, the new data will be bundled
   into the current datagram and the bundled datagram will be
   immediately transmitted.

D) If the size of the new data is less than Min.Bundle, and the
   combined size of the current datagram and the new data is less than
   Min.Bundle, the new data will be bundled into the current
   datagram, and the T4-bundle timer will be restarted.

E) If T4-bundle timer expires, the current datagram will be sent
   immediately.

F) If the size of the new data is greater than the Max.Bundle, the
   current datagram will be sent. Then, the new data will be fragmented
   for transmission (see 9).
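
The size-based rules above (A, B, C, D and F; rule E is driven by the
timer) can be sketched as a single decision function. This is an
illustration only: the function name is hypothetical, sizes are taken
as plain octet counts without bundling overhead, rule F is tested
before rule A because data larger than Max.Bundle also satisfies the
condition of rule A, and rule C is read as applying when the combined
size is at least Min.Bundle.

```python
def bundle_step(current, new, min_bundle, max_bundle):
    """Decide how to handle `new` octets of outbound data.

    `current` is the size of the datagram currently withheld.
    Returns (rule, withheld), where `rule` is the letter of the rule
    that applied and `withheld` is the size still withheld afterwards.
    """
    if new > max_bundle:
        return "F", 0          # send current, fragment the new data
    if new >= min_bundle:
        return "A", 0          # send current, then send new separately
    combined = current + new
    if combined >= max_bundle:
        return "B", new        # send current, withhold the new data
    if combined >= min_bundle:
        return "C", 0          # bundle and transmit immediately
    return "D", combined       # bundle, restart T4-bundle, keep waiting
```

With Min.Bundle=1700 and Max.Bundle=4096, adding 100 octets to a
withheld datagram of 100 octets falls under rule D and leaves 200
octets withheld.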

The following is an example of bundled data transfer, assuming
Max.Bundle=4096 and Min.Bundle=1700:

Endpoint A                                      Endpoint Z

{App sends 1 message of 100 octets}
(withhold and Start T4-Bundle timer)

..
{App sends 1 message of 100 octets}
(bundling into current datagram)

..
{App sends 1 message of 100 octets}
(bundling into current datagram)

..
{T4-bundle timer expires}
[Header Flags=DAT|ACK
	Mode=GAR|BUN
	Part=0,Of=1
        Seen=146,Send=1001,Size=308]--------> (Start T2-receive timer)
(T3-send timer starts)
                                              ..
                                              {Timer T2 Expires}
(cancel T3-send)            <---------------- [Header Flags=ACK
                                               Mode=0
                                               Part=0,Of=0
                                               Seen=1309,Send=146]

Notice that the Data Size in the datagram sent by Endpoint A is not
300 but 308. This is due to the fact that this size reflects the
size of the data field of the datagram including the bundling overhead.

When the bundled datagram arrives at the receiving endpoint, each
message is unbundled and delivered separately to the upper level
application.

9.  Fragmented Messages

When the size of an outbound message exceeds the value defined in the
protocol parameter Max.Bundle, the endpoint will fragment the message
into smaller pieces of sizes equal to or smaller than Max.Bundle and
send each piece out in a separate datagram.

The Part and Of fields are used to disassemble and reassemble the
fragmented message.

The following example shows the transmission of a fragmented message
(assuming Max.Bundle=4096, Min.Bundle=1700):

Endpoint A                                      Endpoint Z

{App sends 1 message 8544 octets long}
[Header Flags=DAT|ACK
	Mode=GAR|BUN
	Part=0,Of=3
        Seen=146,Send=1001,Size=4072]-------> (Start T2-receive timer)
[Header Flags=DAT|ACK
	Mode=GAR|BUN
	Part=1,Of=3
        Seen=146,Send=5073,Size=4072]------->
[Header Flags=DAT|ACK
	Mode=GAR|BUN
	Part=2,Of=3
        Seen=146,Send=9145,Size=400]-------->
(Start T3-send timer)
                                              ..
                                              {Timer T2 Expires}
                                 /----------- [Header Flags=ACK
                                /              Mode=0
                               /               Part=0,Of=0
(cancel timer T3) <-----------/                Seen=9545,Send=146]

Notice that Endpoint A is using the reliable transfer mode to send the
fragmented message. In this mode, Endpoint Z will hold the fragments
and request retransmission if a fragment is found missing, i.e., a gap
is found in the received data (see 5). When all the parts of the
fragmented message are received, the endpoint will re-assemble the
message and dispatch it to the upper level application.

MDTP also allows a fragmented message to be sent using unreliable
transfer mode. However, in unreliable mode, each fragment datagram
will be dispatched to the application upon its arrival, and no
retransmission will be requested even if a fragment is found missing.

Bundling is prohibited if the current datagram contains a fragment of
a fragmented message.
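
Fragmentation and re-assembly using the Part and Of fields can be
sketched as follows. The function names are hypothetical. The
per-fragment payload limit is passed in as a parameter rather than
derived from Max.Bundle; the example above uses 4072 octets per
fragment with Max.Bundle=4096, a difference of 24 octets, which
matches the six 32-bit header words shown in section 8.1. The limit
of 255 fragments assumes the 8-bit Of field shown there.

```python
def fragment(message, max_payload):
    """Split `message` into (part, of, chunk) tuples."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    of = max(1, -(-len(message) // max_payload))   # ceiling division
    if of > 255:
        raise ValueError("message needs more fragments than Of can hold")
    return [(part, of, message[part * max_payload:(part + 1) * max_payload])
            for part in range(of)]


def reassemble(fragments):
    """Return the original message, or None while a part is missing."""
    if not fragments:
        return None
    of = fragments[0][1]
    by_part = {part: chunk for part, _, chunk in fragments}
    if set(by_part) != set(range(of)):
        return None
    return b"".join(by_part[part] for part in range(of))
```

Splitting a message of 8544 octets with a payload limit of 4072
produces parts of 4072, 4072 and 400 octets, as in the example above.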

10. Non-protocol Datagrams

The MDTP protocol allows an endpoint to send and receive non-protocol
datagrams such as the traditional UDP datagrams. Non-protocol
datagrams are detected by the absence of the MDTP protocol
identifiers at the beginning of the datagram. A non-protocol
transmission received by an MDTP endpoint is termed a "raw"
datagram. When a raw datagram arrives, the receiving endpoint will set
itself into raw mode and start sending back to its peer in raw mode
as well.

Once an endpoint is in raw mode with a peer, only a change of
operational mode by the application or reception of an MDTP datagram
will bring the endpoint out of raw mode. In the latter case, the
endpoint will use the default MDTP operational mode predefined by the
application for MDTP transmissions. When an endpoint changes from raw
mode into MDTP mode, the normal MDTP initiation messages must be
exchanged between the two endpoints, as described in 4.

11. Broadcast and Multicast

Broadcast and multicast are supported by MDTP when the underlying
transport layer supports them. Both types of transmissions are carried
out in unreliable transfer mode.

For broadcast datagrams, the BRO bit will be set to '1' and the UNR
bit will be set to '0' in the mode field. For multicast datagrams,
both the BRO bit and the UNR bit will be set to '1'.

For multicast datagrams, the value in the Send field will indicate
the number of multicast datagrams transmitted by the sender. This
information makes it possible for the receiver of the multicast to
detect duplicated multicast datagrams and also to detect lost
multicast datagrams. A multicast datagram transmission MUST use
the alternate multicast header, filling in both the multicast
"transmit to" address and the sender's lowest network address as the
multicast "from" address.

Bundling and fragmentation are not allowed in either multicast or
broadcast datagrams.
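
The BRO/UNR encoding described in this section can be sketched as a
small classifier. The mask values below are an assumption: they take
the leftmost bit of the Mode octet in the header diagram of section
8.1 to be the most significant bit. The function name is hypothetical.

```python
# Mode bits, left to right as drawn in the header diagram (assumed MSB first).
MODE_BRO, MODE_SHU, MODE_WNR, MODE_RE1 = 0x80, 0x40, 0x20, 0x10
MODE_RE2, MODE_BUN, MODE_GAR, MODE_UNR = 0x08, 0x04, 0x02, 0x01


def cast_type(mode):
    """Classify a datagram from its Mode octet."""
    if mode & MODE_BRO:
        # BRO with UNR set means multicast; BRO alone means broadcast.
        return "multicast" if mode & MODE_UNR else "broadcast"
    return "unicast"
```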

11.1 Multicast/Broadcast initialization.

No initiation is needed for an endpoint to transmit multicast or
broadcast datagrams. However, caution should be taken when
transmitting non-protocol datagrams (i.e., datagrams with no MDTP
protocol header) in multicast or broadcast transmission. This is
because the non-protocol datagrams may inadvertently force all the
receiving endpoints of the multicast or broadcast transmission into
raw mode (see 10).

11.2 Transmission of Broadcast Datagrams.

When sending a broadcast datagram, the endpoint makes no effort to
prevent duplicate transmissions (these are likely to occur, especially
when multiple networks exist). The application at the receiving end
must be prepared to handle duplicate broadcast messages.

The following is an example of broadcast datagram transmission:

Endpoint A                                               Endpoint Z
{application sends 2 messages }
[Header Flags=DAT
	Mode=BRO
	Part=0,Of=1
        Seen=0,Send=0,Size=200]--------------> (Datagram may appear
                                                more than once.)
[Header Flags=DAT
	Mode=BRO
	Part=0,Of=1
        Seen=0,Send=0,Size=100]-------------->

Notice that no timers are used on either end, and Seen and Send values
in the datagrams are always '0'.

11.3 Transmission of Multicast Datagrams.

Unlike broadcast transmission, when multicast datagrams are
transmitted the receiving endpoints should make an effort to prevent
duplicate copies of datagrams from being distributed to their
applications.

This is possible because the transmission of multicast datagrams is
usually addressed to a special multicast network address. The receiving
endpoints can thus use this multicast address in combination with the
sender's address to detect duplicate transmissions of a multicast
datagram.

The following example illustrates multicast transmissions between two
endpoints.

Endpoint A                                               Endpoint Z
{app multicasts a message}
[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=0,Send=5,Size=250]--------------> (may receive more
                                                than one copy)
..

{app multicasts a message}
[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=0,Send=6,Size=500]--------------> (may receive more
                                                than one copy)

Notice the values of the Send field in the multicast datagrams (which
are 5 and 6, respectively). They represent the sequence numbers of the
multicast datagrams Endpoint A has sent out. Endpoint Z should use the
Send value found in the incoming multicast datagrams to detect any
missing or duplicate datagrams.

Duplicate datagrams will be discarded and no effort will be made to
retransmit lost multicast datagrams.

For example, each endpoint can track the last 32 datagrams received by
using a sliding window of 32 bits. Each time a new datagram with a
sequence number higher than the current window head is received, the
window can be moved up. If a datagram received has a sequence number
below the current window head, then a check of the last 32 received
datagrams' sequence numbers can determine whether the new datagram is a
duplicate. If the sequence number of the new datagram is below the
current window tail then the datagram should be considered a duplicate
and discarded.
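
The sliding window described above can be sketched with a 32-bit
mask. The class and method names are hypothetical; sequence number
wrap-around and the reset procedure of section 11.4 are not handled
in this sketch.

```python
class McastDupFilter:
    """Tracks the last 32 multicast sequence numbers from one sender."""

    WINDOW = 32

    def __init__(self):
        self.head = None  # highest sequence number seen so far
        self.bits = 0     # bit i set means (head - i) has been received

    def accept(self, seq):
        """Return True for a new datagram, False for one to discard."""
        mask = (1 << self.WINDOW) - 1
        if self.head is None or seq > self.head:
            shift = self.WINDOW if self.head is None else seq - self.head
            # Move the window up; a jump of a full window clears history.
            if shift >= self.WINDOW:
                self.bits = 1
            else:
                self.bits = ((self.bits << shift) | 1) & mask
            self.head = seq
            return True
        offset = self.head - seq
        if offset >= self.WINDOW:
            return False  # below the window tail: treated as a duplicate
        if self.bits & (1 << offset):
            return False  # already received within the window
        self.bits |= 1 << offset
        return True
```

A receiver would keep one such filter per combination of sender
address and multicast address.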

11.4 Reset of the Multicast Datagram Sequence Number

If the Seen field in a multicast datagram is set to '1', it is an
indication that the sender has reset its multicast datagram sequence
number. The receiving endpoint, upon detecting this reset indicator in
the incoming multicast datagram, should start a procedure to adopt the
new sequence number for error detection. However, caution
should be taken to prevent false resets due to duplicated datagrams
with reset indicator propagating through multiple networks.

To guarantee that all receivers of the multicast group adopt the new
sequence number, the reset indicator should be repeated within the
first N multicast datagrams sent out after the reset. N is predefined
by the protocol parameter Num.Of.Mcast.Reset.Msg.

At the receiving endpoint, when the reset indicator is detected the
new sequence number will be adopted. However, if two reset events are
detected within a predefined time interval (Min.Mcast.Time.To.Reset),
the second reset indicator will be ignored.

The following is an example (assuming Num.Of.Mcast.Reset.Msg = 4):

Endpoint A                                         Endpoint Z

[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=0,Send=17859,Size=300]---------->

{reset message sequence number indicated}

[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=1,Send=1,Size=250]--------------> (record new sequence
                                                number, datagram may
                                                appear more than once)
[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=1,Send=2,Size=250]--------------> (may appear more than
                                                once)

[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=1,Send=3,Size=500]--------------> (may appear more than
                                                once)
[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=1,Send=4,Size=500]--------------> (may appear more than
                                                once)
[Header Flags=DAT
	Mode=BRO|UNR
	Part=0,Of=1
        Seen=0,Send=5,Size=100]--------------> (may appear more than
                                                once)

In the above example Endpoint Z would detect the reset indicator in
the second multicast datagram and adopt the new sequence number which
is 1. Then, it would ignore the reset indicator in the subsequent three
(3) datagrams since they arrived within a very short time interval.
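
The handling of the reset indicator at the receiver can be sketched
as follows. The class and method names are hypothetical, and the
sketch assumes that the interval Min.Mcast.Time.To.Reset is measured
from the last reset that was actually adopted.

```python
class McastResetGuard:
    """Decides whether a reset indicator should be honoured."""

    def __init__(self, min_time_to_reset=5.0):
        self.min_time_to_reset = min_time_to_reset  # seconds
        self._last_reset_at = None

    def on_datagram(self, seen, now):
        """Return True if the new sequence number should be adopted.

        `seen` is the Seen field of the multicast datagram and `now` is
        the arrival time in seconds.
        """
        if seen != 1:
            return False  # no reset indicator present
        if (self._last_reset_at is not None
                and now - self._last_reset_at < self.min_time_to_reset):
            return False  # repeated indicator inside the guard interval
        self._last_reset_at = now
        return True
```

Applied to the example above, only the first datagram carrying Seen=1
causes the new sequence number to be adopted; the three that follow
it arrive inside the guard interval and are ignored as resets.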

12. Suggested timer and MTU values.

The following are suggested timer values for MDTP:

T1-init Timer    -  160 ms
T2-receive Timer -   20 ms
T3-send Timer    -  160 ms
T4-bundle Timer  -   40 ms
T5-Heart Beat    - 4000 ms

The following protocol parameters are recommended:

Min.Bundle              - 1000 octets
Max.Bundle              - 1432 octets
Max.Retransmit          - 10 attempts
Max.Init.Retransmit     - 8  attempts
Min.Mcast.Time.To.Reset - 5 seconds
Num.Of.Mcast.Reset.Msg  - 5 messages

13. Further Study

Currently the authors are benchmarking and analyzing the MDTP
performance in a redundant distributed processing environment. Some of
the items planned for investigation are:

A) Use random timers instead of fixed timers.
B) Change the way inbound flow control is transmitted back to the
   sender.
C) Experiment on load-related variable timers on a per endpoint basis.

14.  Authors' Addresses

Randall R. Stewart                          Tel: +1-847-632-7438
Cellular Infrastructure Group               EMail: stewrtrs@cig.mot.com
Motorola, Inc.
1475 W. Shure Drive, #2C-6
Arlington Heights, IL 60004
USA

Qiaobing Xie                                Tel: +1-847-632-3028
Cellular Infrastructure Group               EMail: xieqb@cig.mot.com
Motorola, Inc.
1501 W. Shure Drive, #2309
Arlington Heights, IL 60004
USA

15. References

[1] Postel, J. (ed.), "Internet Protocol - DARPA Internet Program
Protocol Specification", RFC 791, USC/Information Sciences Institute,
September 1981.

[2] Postel, J., "User Datagram Protocol", RFC 768, USC/Information Sciences
Institute, August 1980.

[3] Postel, J. (ed.), "Transmission Control Protocol", RFC 793, USC/
Information Sciences Institute, September 1981.

      This Internet Draft expires in 6 months from August 1998.