draft-ietf-rmt-bb-track-01.txt   draft-ietf-rmt-bb-track-02.txt 
RMT Working Group Brian Whetten RMT Working Group Brian Whetten
Internet Engineering Task Force Talarian Internet Engineering Task Force Consultant
Internet Draft Dah Ming Chiu Internet Draft Dah Ming Chiu
Document: draft-ietf-rmt-bb-track-01.txt Sun Microsystems Document: draft-ietf-rmt-bb-track-02.txt Miriam Kadansky
2 March, 2001 Miriam Kadansky November 2002 Sun Microsystems
Expires 2 October, 2001 Sun Microsystems Expires May 2003 Seok Joo Koh
ETRI
Gursel Taskale Gursel Taskale
Talarian TIBCO
Reliable Multicast Transport Building Block for TRACK Reliable Multicast Transport Building Block for TRACK
<draft-ietf-rmt-bb-track-01.txt> <draft-ietf-rmt-bb-track-02.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that other
other groups may also distribute working documents as Internet- groups may also distribute working documents as Internet-Drafts.
Drafts. Internet-Drafts are draft documents valid for a maximum of Internet-Drafts are draft documents valid for a maximum of six months
six months and may be updated, replaced, or obsoleted by other and may be updated, replaced, or obsoleted by other documents at any
documents at any time. It is inappropriate to use Internet-Drafts time. It is inappropriate to use Internet-Drafts as reference
as reference material or to cite them other than as "work in material or to cite them other than as "work in progress."
progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
Abstract Abstract
This document describes the TRACK Building Block. It contains This document describes the TRACK Building Block. It contains
functions relating to positive acknowledgments and hierarchical functions relating to positive acknowledgments and hierarchical tree
tree construction and maintenance. It is primarily meant to be construction and maintenance. It is primarily meant to be used as
used as part of the TRACK Protocol Instantiation. It is also part of the TRACK Protocol Instantiation. It is also designed to be
designed to be useful as part of overlay multicast systems that useful as part of overlay multicast systems that wish to offer
wish to offer efficient confirmed delivery of multicast messages. efficient confirmed delivery of multicast messages.
Conventions used in this document Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
this document are to be interpreted as described in RFC-2119. document are to be interpreted as described in RFC-2119.
Table of Contents
1. Introduction 1. Introduction
2. Design Rationale One of the protocol instantiations the RMT WG is chartered to create
3. Applicability Statement is a TRee-based ACKnowledgement protocol (TRACK). Rather than create
3.1 Application types a set of monolithic protocol specifications, the RMT WG has chosen to
3.2 Network Infrastructure break the reliable multicast protocols into Building Blocks (BB) and
4. Message Types Protocol Instantiations (PI). A Building Block is a specification of
5. Global Configuration Variables, Constants, and Reason Codes the algorithms of a single component, with an abstract interface to
5.1 Global Configuration Variables other BBs and PIs. A PI combines a set of BBs, adds in the
5.2 Constants additional required functionality not specified in any BB, and
5.3 Reason Codes specifies the specific instantiation of the protocol. For more
6. External APIs information, see the Reliable Multicast Transport Building Blocks and
6.1 Interfaces to the BB from PI's Reliable Multicast Design Space documents [2][3].
6.1.1 Start(boolean RepairHead, boolean RejoinAllowed,
Advertisement)
6.1.2 End
6.1.3 incomingMessage(Message)
6.1.4 getStatistics
6.1.5 MessageSynched(Message)
6.1.6 RepairHead(boolean)
6.2 Interfaces from the BB to the PI
6.2.1 outgoingMessage(Message)
6.2.2 MessageReceived(Message, boolean Synch)
6.2.3 SenderLost
6.2.4 UnrecoverableData
6.2.5 SessionDone
7. Algorithms
7.1 Tree Based Session Creation and Maintenance
7.1.1 Overview of Tree Configuration
7.1.2 Bind
7.1.2.1 Input Parameters
7.1.2.2 Bind Algorithm
7.1.3 Unbind
7.1.4 Eject
7.1.5 Fault Detection
7.1.6 Fault Notification
7.1.7 Fault Recovery
7.2 TRACK Generation
7.2.1 TRACK Generation with the Rotating TRACK Algorithm
7.2.2 Local Repair
7.2.3 Flow Control Window Update
7.2.4 Reliability Window
7.2.5 Confirmed Delivery
7.3 Feedback Aggregation
7.4 Measuring Round Trip Times
8. Security
9. References
10. Acknowledgements
11. Authors' Addresses
1. Introduction
This document describes the TRACK Building Block. It contains
functions relating to positive acknowledgments and hierarchical
tree construction and maintenance. It is primarily meant to be
used as part of the TRACK Protocol Instantiation. It is also
designed to be useful as part of overlay multicast systems that
wish to offer efficient confirmed delivery of multicast messages.
As pointed out in the building blocks rationale draft [WVKHFL00], As specified in [2], there are two primary reliability requirements
there are two different reliability tasks that can be provided by a for a transport protocol, ensuring goodput, and confirming delivery
reliable multicast transport: ensuring goodput and confirming to the Sender. The NORM and ALC PIs are responsible solely for
delivery of application level messages. The NACK Protocol ensuring goodput. TRACK is designed to offer application level
Instantiation and ALC Protocol Instantiation are each primarily confirmed delivery, aggregation of control traffic and Receiver
concerned with ensuring goodput. The TRACK BB and TRACK PI rely on statistics, local recovery, automatic tree building, and enhanced
a repair tree to provide goodput as well as confirmed delivery. If flow and congestion control.
Forward Error Correction, Generic Router Assist or other mechanisms
are used to help provide goodput, they are assumed to work
transparently at a layer below this BB, as if the IP multicast
service has a lower error rate.
The TRACK BB also assumes that there is an Automatic Tree Building Whereas the NORM and ALC PIs run only over other building blocks, the
BB [KLCWTCTK01] which provides the list of parents (known as TRACK PI has a more difficult integration task. To run in
Service Nodes within the Tree BB) each node should join to. If conjunction with NORM, it must either re-implement the functionality
Receivers are used that may also serve as Repair Heads, the TRACK in the NORM PI, or integrate directly with the NORM PI. In addition,
BB assumes the Auto Tree BB is also responsible for selecting the in order to have reasonable commercial applicability, TRACK needs to
role of each Receiver as either Receiver or Repair Head. However, be able to run over other protocols in addition to NORM. To meet
the TRACK BB may specify that a particular node may not operate as both of these challenges, the TRACK PI is designed to integrate with
a Repair Head. other transport layer protocols, including NORM, PGM [20], ALC [19],
UDP, or an overlay multicast system. In order to accomplish this,
there can be multiple TRACK PIs, one for each transport protocol it
is specified to integrate with. The vast majority of the protocol
functionality exists in this document, the TRACK BB, which in turn
references the automatic tree building block [16]. For more details
on the specific functionality of TRACK, please see the reference
TRACK PI [21].
TRACK is organized around a Data Channel and a Control Channel. The
Data Channel is responsible for multicast data from the Sender to all
other nodes in a TRACK session. In order to integrate with NORM and
other goodput-ensuring transport protocols, these protocols are used
as the Data Channel for a given Data Session. This Data Channel MAY
also provide congestion control. Otherwise, congestion control MUST
be provided by the TRACK PI, through using the TFMCC or other
approved congestion control building block.
This document describes the TRACK Building Block. It contains
functions relating to positive acknowledgments and hierarchical tree
construction and maintenance. While named as a building block, this
document describes more functionality than the PI documents. With
the exception of congestion control, almost all of the functionality
is encapsulated in this document or the BBs it references. The TRACK
PIs are then primarily responsible for instantiating packet formats
in conjunction with the other transport protocols they use as their
Data Channel.
The TRACK BB assumes that there is an Automatic Tree Building BB [16]
which provides the list of parents (known as Service Nodes within the
Tree BB) each node should join to. If Receivers are used that may
also serve as Repair Heads, the TRACK BB assumes the Auto Tree BB is
also responsible for selecting the role of each Receiver as either
Receiver or Repair Head. However, the TRACK BB may specify that a
particular node may not operate as a Repair Head.
The TRACK BB also assumes that a separate session advertisement The TRACK BB also assumes that a separate session advertisement
protocol notifies the receivers as to when to join a session, the protocol notifies the Receivers as to when to join a session, the
data multicast address for the session, and the control parameters data multicast address for the session, and the control parameters
for the session. for the session. This functionality MAY be provided in a TRACK PI
document.
The TRACK BB provides additional information and aggregation
capabilities, which are useful for congestion control.
The TRACK BB provides the following detailed functionality. The TRACK BB provides the following detailed functionality.
@ Hierarchical Session Creation and Maintenance. This set of - Hierarchical Session Creation and Maintenance. This set of
functionality is responsible for creating and maintaining (but functionality is responsible for creating and maintaining (but not
not configuring) the hierarchical tree of Repair Heads and configuring) a hierarchical tree of Repair Heads and Receivers.
Receivers. - Bind. When a child knows the parent it wishes to join to for
o Bind. When a child knows the parent it wishes to join to a given Data Session, it binds to that parent.
for a given data session, it binds to that parent. - Unbind. When a child wishes to leave a Data Session, either
o Unbind. When a child wishes to leave a data session, because the session is over or because the application is
either because the session is over or because the finished with the session, it initiates an unbind operation
application is finished with the session, it initiates an with its parent.
unbind operation with its parent. - Eject. A parent can also force a child to unbind. This
happens if the parent needs to leave the session, if the child
o Eject. A parent can also force a child to unbind. This is not behaving correctly, or if the parent wants to move the
happens if the parent needs to leave the session, if the child to another parent as part of tree configuration
child is not behaving correctly, or if the parent wants to maintenance.
move the child to another parent as part of tree - Fault Detection. In order to verify liveness, parents and
configuration maintenance. children send regular heartbeat messages between themselves.
o Fault Detection. In order to verify liveness, parents and The Sender also sends regular null data messages to the group,
children send regular heartbeat messages between if it has no data to send.
themselves. The sender also sends regular null data - Fault Recovery. When a child detects that its parent is no
messages to the group, if it has no data to send.
o Fault Recovery. When a child detects that its parent is no
longer reachable, it may switch to another parent. When a longer reachable, it may switch to another parent. When a
parent detects that one of its children is no longer parent detects that one of its children is no longer
reachable, it removes that child from its membership list reachable, it removes that child from its membership list and
and reports this up the tree to the Sender of the Data reports this up the tree to the Sender of the Data Session.
Session. - Distributed Membership. Each Parent is responsible for
maintaining a local list of the children attached to it.
@ TRACK Generation. This set of functionality is responsible for
periodically generating TRACK messages from all receivers to
acknowledge receipt of data, report missing messages, advance
flow control windows, provide roundtrip time measurements and
provide other group management information. The algorithms
include:
o TRACK Timing. In order to avoid ACK implosion, the
Receivers and Repair Heads use the rotating TRACK
algorithm.
o Flow Control and Buffer Management. Receivers and Repair
Heads maintain a set of buffers that are at least as large
as the Sender's transmission window. The Receivers pass
their reception status up to the sender as part of their
TRACK messages. This is used to acknowledge receipt of
delivery, to advance the buffer windows at each node, and
to limit the sender's window advancement to the speed of
the slowest receiver.
o Application Level Confirmed Delivery. Confirmed Delivery
provides transport level confirmation of delivery. Senders
can put a synch point request in data messages, asking
for application level confirmation. Data messages with
this flag set are only confirmed by the Receivers after the
Receiver applications confirm receipt.
@ Local Recovery. This functionality describes how repair heads - Data Sessions. This functionality is responsible for the reliable,
maintain state on their children and provide repairs in response ordered transmission of a set of data messages, which together
to requests for retransmission contained in TRACK messages. constitute a Data Session. These are initially transmitted using
This has overlap with the NACK BB, which is unified in the TRACK another transport protocol, the Data Channel Protocol, which has
PI. primary responsibility for ensuring goodput and congestion control.
- Data Transmission. The Sender takes sequenced data messages
from the application, and passes them to the Data Channel
Protocol for multicast transmission. It delays passing them
to the Data Channel Protocol if it is presently flow
controlled.
- Flow Control and Buffer Management. Receivers and Repair
Heads MAY maintain a set of buffers that are at least as large
as the Sender's transmission window. The Receivers pass their
reception status up to the Sender as part of their TRACK
messages. This MAY be used to advance the buffer windows at
each node and limit the Sender's window advancement to the
speed of the slowest Receiver.
- Retransmission Requests. While primary responsibility for
goodput rests with the Data Channel Protocol, Receivers MAY
request retransmission of lost messages from their parents.
- Local Recovery. Repair Heads keep track of retransmission
requests from their children, and provide repairs to them. If
a Repair Head cannot fulfill a retransmission request, it
forwards it up the tree.
- End of Stream. When a Data Session is completed, this is
signaled as an End of Stream condition.
@ TRACK Aggregation. In order to provide the highest levels of - TRACK Generation and Aggregation. This set of functionality is
responsible for periodically generating TRACK messages from all
Receivers and aggregating them at Repair Heads. These messages
provide updated flow control window information, roundtrip time
measurements, and congestion control statistics. They OPTIONALLY
acknowledge receipt of data, OPTIONALLY report missing messages,
and OPTIONALLY provide group statistics. The algorithms include:
- TRACK Timing. In order to avoid ACK implosion, the Receivers
and Repair Heads use timing algorithms to control the speed at
which TRACK messages are sent.
- TRACK Aggregation. In order to provide the highest levels of
scalability and reliability, interior tree nodes provide scalability and reliability, interior tree nodes provide
aggregation of control traffic flowing up the tree. The aggregation of control traffic flowing up the tree. The
aggregated feedback information includes that used for end-to- aggregated feedback information includes that used for end-to-
end confirmed delivery, flow control, congestion control, and end confirmed delivery, flow control, congestion control, and
group membership monitoring and management. group membership monitoring and management.
- Statistics Request. A Sender may prompt Receivers to generate
and report a set of statistics back to the Sender. These
statistics are self-describing data types, and may be defined
by either the TRACK PI or the application.
@ Distributed RTT Calculations. One of the primary challenges of - Statistics Aggregation. In addition to the predefined
congestion control is efficient RTT calculations. TRACK aggregation types, aggregation of self-describing data may
provides two methods to perform these calculations. also be performed on Receiver statistics flowing up the tree.
o Sender Per-Message RTT Calculations. Each message is
stamped with a timestamp from the sender. As each is
passed up the tree, the amount of dally time spent waiting
at each node is accumulated. The lowest measurements are
passed up the tree, and the dally time is subtracted from
the original measurement.
o Local Per-Level RTT Calculations. Each parent measures the
local RTT to each of its children as part of the keep-alive
messages used for failure detection.
2. Design Rationale
Much of the design rationale behind the protocol instantiations and - Application Level Confirmed Delivery. Senders can issue requests
building blocks being standardized by the RMT working group is for application level confirmation of data up to a given message.
laid out in [WVKHFL00]. In addition, the design rationale for the Receivers reply to this request, and the confirmations are reliably
TRACK PI is laid out in [WCP00]. This building block conforms with forwarded up the tree.
the design rationales laid out in both of those documents.
TRACK is designed to provide confirmed delivery, receiver-based - Distributed RTT Calculations. One of the primary challenges of
flow control, distributed management of group membership (some of congestion control is efficient RTT calculations. TRACK provides
them may be dedicated servers in a repair tree), as well as two methods to perform these calculations.
providing aggregation of information up the tree. It also provides - Sender Per-Message RTT Calculations. On demand, a Sender
requests for retransmissions as part of TRACK messages, and local stamps outgoing messages with a timestamp. As each TRACK is
recovery of lost packets. passed up the tree, the amount of dally time spent waiting at
each node is accumulated. The lowest measurements are passed
up the tree, and the dally time is subtracted from the
original measurement.
- Local Per-Level RTT Calculations. Each parent measures the
local RTT to each of its children as part of the keep-alive
messages used for failure detection.
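The Sender Per-Message RTT calculation described above can be sketched
as follows. This is a minimal illustration only, not the draft's
specified procedure: the TrackReport class, its field names, and the
helpers forward_up and sender_rtt are hypothetical, and times are
assumed to be seconds on the Sender's clock. The rule for choosing
which ("lowest") measurement a node forwards is not shown.

```python
from dataclasses import dataclass


@dataclass
class TrackReport:
    """Hypothetical timing fields carried by a TRACK (names are assumptions)."""
    sender_timestamp: float   # time the Sender stamped on the data message
    dally_time: float = 0.0   # waiting time accumulated at nodes on the way up


def forward_up(report: TrackReport, local_dally: float) -> TrackReport:
    """Return the report a node passes to its parent after holding it
    for local_dally seconds; the local delay is added to the running total."""
    return TrackReport(report.sender_timestamp, report.dally_time + local_dally)


def sender_rtt(report: TrackReport, arrival_time: float) -> float:
    """Subtract the accumulated dally time from the raw elapsed time
    measured at the Sender when the TRACK arrives."""
    raw = arrival_time - report.sender_timestamp
    return raw - report.dally_time
```

For example, a message stamped at time 100.0 whose TRACK is held for
0.25 s at a Receiver and 0.5 s at a Repair Head, and which reaches the
Sender at time 101.0, gives a raw measurement of 1.0 s and an estimate
of 0.25 s once the 0.75 s of dally time is removed.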
This TRACK BB is primarily designed to work as part of the TRACK 2. Applicability Statement
PI, in conjunction with other BB's including NACK, FEC, and Auto
Tree. In the spirit of modular reuse specified in [WVKHFL00], it
is also designed to be useful as an additional layer of
functionality on top of any of the following services. 1) The
functionality (if not the exact message headers) of the NORM PI.
2) The functionality (if not the exact message headers) of the ALC
PI. 3) Running directly on top of an unreliable IP multicast
routing protocol, but on a carefully provisioned network. 4) On
top of an overlay multicast (also known as application layer
multicast) system.
Overlay multicast is a system where servers in the network provide The primary objective of TRACK is to provide additional functionality
multicast (and unicast) routing as well as reliable multicast in conjunction with a receiver reliable protocol. These functions
delivery, all on top of a combination of unicast (i.e. TCP) and, as MAY include application layer reliability, enhanced congestion
available, reliable multicast services. control, flow control, statistics reporting, local recovery, and
automatic tree building. It is designed to do this while still
offering scalability in the range of 10,000s of Receivers per Data
Session. The primary corresponding design tradeoffs are additional
complexity, and lower isolation of nodes in the face of network and
host failures.
There is a fundamental tradeoff between reliability and real-time There is a fundamental tradeoff between reliability and real-time
performance in the face of failures. There are two primary types performance in the face of failures. There are two primary types of
of single layer reliability that have been proposed to deal with single layer reliability that have been proposed to deal with this:
this: sender reliable and receiver reliable delivery. Sender Sender reliable and Receiver reliable delivery. Sender reliable
reliable delivery is similar to TCP, where the sender knows the delivery is similar to TCP, where the Sender knows the identity of
identity of the receivers in a data session, and is notified when the Receivers in a Data Session, and is notified when any of them
any of them fails to receive all the data messages. Receiver fails to receive all the data messages. Receiver reliable delivery
reliable delivery limits knowledge of group membership and failures limits knowledge of group membership and failures to only the actual
to only the actual receivers. Senders do not have any knowledge of Receivers. Senders do not have any knowledge of the membership of a
the membership of a group, and do not require receivers to group, and do not require Receivers to explicitly join or leave a
explicitly join or leave a data session. Receiver reliable Data Session. Receiver reliable protocols scale better in the face
protocols scale better in the face of networks that have frequent of networks that have frequent failures, and have very high isolation
failures, and have very high isolation of failures between of failures between Receivers. This TRACK BB provides Sender
receivers. This TRACK BB provides sender reliable delivery, reliable delivery, typically in conjunction with a Receiver reliable
potentially on top of a receiver reliable system. system.
This BB is specified according to the guidelines in [KV00]. In
addition, it specifies all communication between entities in terms
of messages, rather than packets. A message is an abstract
communication unit, which may be part of, or all of, a given
packet. It does not have a specific format, although it does
contain a list of fields, some of which may be optional, and some
of which may have fixed lengths associated with them. It is up to
each protocol instantiation to combine the set of messages in this
BB, with those in other components, and create the actual set of
message formats that will be used.
As mentioned in the introduction, this BB assumes the existence of This BB is specified according to the guidelines in [21]. It
a separate Auto Tree Configuration BB. It also assumes that data specifies all communication between entities in terms of messages,
sessions are advertised to all receivers as part of an external BB rather than packets. A message is an abstract communication unit,
or other component. It expects to also interact with other BB's which may be part of, or all of, a given packet. It does not have a
through the TRACK PI, but does not require this. specific format, although it does contain a list of fields, some of
which may be optional, and some of which may have fixed lengths
associated with them. It is up to each protocol instantiation to
combine the set of messages in this BB, with those in other
components, and create the actual set of packet formats that will be
used.
3. Applicability Statement As mentioned in the introduction, this BB assumes the existence of a
It is widely recognized that no single reliable multicast protocol separate Auto Tree Configuration BB. It also assumes that Data
can meet the needs of all application types over all network types. Sessions are advertised to all Receivers as part of an external BB or
Distinguishing factors include functionality and performance. other component.
From a functionality perspective, TRACK and NACK based reliable
multicast protocols present an inherently different reliability
model. TRACK based protocols are able to remove messages from the
retransmission window when all the children have acknowledged them.
NACK based protocols have to rely on other means for determining
how long to keep messages in the retransmission window. A popular
method is a time based scheme [SFCGLTLLBEJMRSV00]. TRACK based
protocols can keep track of the membership of the data session, and
provide confirmed delivery against that membership list. NACK
protocols have anonymous membership.
Since reliability is obtained through control traffic, the Except where noted, this applicability statement is applicable both
difference in the semantics of the term reliability leads to the to the TRACK BB and to the TRACK PIs.
second distinguishing factor: performance. When a persistent
failure occurs among the members of a TRACK based protocol, there
is a possibility that this may slow down other members of the
group. NACK protocols have higher isolation of failures, as well
as smaller amounts of control traffic under many scenarios.
3.1 Application types 2.1 Application Types
The objectives of TRACK are to provide high level reliability, high TRACK is designed to support a wide range of applications that
scalability, congestion control and flow control for one to many require one to many bulk data transfer and application layer
bulk data dissemination. TRACK is not designed for many to many confirmed delivery. Examples of applications that fit into the one-
applications. Examples of applications that fit into the one-to- to-many data dissemination model are: real time financial news and
many data dissemination model are: real time financial news and
market data distribution, electronic software distribution, audio market data distribution, electronic software distribution, audio
video streaming, distance learning, software updates and server video streaming, distance learning, software updates and server
replication. But, not all of these application types have the same replication.
reliability requirements.
Historically, financial applications have had the most stringent Historically, financial applications have had the most stringent
reliability requirements, while audio video streaming has had the reliability requirements, while audio video streaming has had the
least stringent. For applications that want to have strong least stringent. For applications that do not require this level of
confirmation of delivery guarantees, TRACK may be more applicable reliability, or that demand the lowest levels of latency and the
than alternatives such as NORM or ALC. For applications that do highest levels of failure isolation, TRACK may be less applicable.
not require this level of reliability, or that demand the lowest
levels of latency and the highest levels of failure isolation,
TRACK may be less applicable.
The TRACK BB, in particular, is designed to optionally work on top TRACK is designed to work in conjunction with a receiver reliable
of a NORM or ALC PI, to allow applications to select this tradeoff protocol such as NORM, to allow applications to select this tradeoff
on a dynamic basis. on a dynamic basis.
3.2 Network Infrastructure 2.2 Network Infrastructure
The TRACKs also serve to provide feedback information to the TRACK is designed to work over almost all multicast and broadcast
sender. The sender uses this information for congestion and flow capable network infrastructures. It is specifically designed to be
control. This allows TRACK to be applicable in most networks (i.e. able to support both asymmetrical and single source multicast
managed and shared networks, and high congestion networks.) environments.
Asymmetric networks with very low upbound bandwidth and a very low Asymmetric networks with very low upbound bandwidth and a very low
loss data channel may be better served through NACK based loss Data Channel may be better served solely through NACK based
protocols, particularly if high reliability is not required. A protocols, particularly if high reliability is not required. A good
good example is some satellite networks. example is some satellite networks.
Networks that have very high loss rates, and regularly experience Networks that have very high loss rates, and regularly experience
partial network partitions, router flapping, or other persistent partial network partitions, router flapping, or other persistent
faults, may be better served through NACK only protocols. faults, may be better served through NACK only protocols. Some
wireless networks fall into this category.
4. Message Types
The following table summarizes the messages and their fields used
by the TRACK BB. All messages contain the session identifier.
+----------------+--------+--------+--------+--------------------------------+
| Message        | From   | To     | Mcast? | Fields                         |
+----------------+--------+--------+--------+--------------------------------+
| BindRequest    | Child  | Parent | no     | Scope, Level, Role,            |
|                |        |        |        | SubTreeCount                   |
+----------------+--------+--------+--------+--------------------------------+
| BindConfirm    | Parent | Child  | no     | Level, RepairAddr, SeqNum,     |
|                |        |        |        | MemberId, CacheInfo            |
+----------------+--------+--------+--------+--------------------------------+
| BindReject     | Parent | Child  | no     | Reason                         |
+----------------+--------+--------+--------+--------------------------------+
| UnbindRequest  | Child  | Parent | no     | Reason                         |
+----------------+--------+--------+--------+--------------------------------+
| UnbindConfirm  | Parent | Child  | no     |                                |
+----------------+--------+--------+--------+--------------------------------+
| EjectRequest   | Parent | Child  | either | Reason                         |
+----------------+--------+--------+--------+--------------------------------+
| EjectConfirm   | Child  | Parent | no     |                                |
+----------------+--------+--------+--------+--------------------------------+
| Heartbeat      | Parent | Child  | either | Level, ParentTimestamp,        |
|                |        |        |        | ChildrenList, SeqNum           |
+----------------+--------+--------+--------+--------------------------------+
| NullData       | Sender | all    | yes    | SenderTimeStamp, AppSynch, End |
| Data           |        |        |        | Rate, HighestReleased, SeqNum  |
+----------------+--------+--------+--------+--------------------------------+
| Retransmission | Parent | Child  | yes    | SenderTimeStamp, AppSynch, End |
|                |        |        |        | Rate, HighestReleased, SeqNum  |
+----------------+--------+--------+--------+--------------------------------+
| Track          | Child  | Parent | no     | SeqNum, BitMask, SubTreeCount  |
|                |        |        |        | Slowest, FailedChildren,       |
|                |        |        |        | HighestAllowed,LocalDallyTime  |
|                |        |        |        | ApplicationConfirms,           |
|                |        |        |        | ParentThere, ParentTimeStamp,  |
|                |        |        |        | SenderTimeStamp,               |
|                |        |        |        | SenderDallyTime                |
+----------------+--------+--------+--------+--------------------------------+
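As an informal illustration of how the abstract BindRequest and
BindConfirm messages in the table might be represented in memory, the
sketch below uses Python dataclasses. The BB deliberately does not
define message formats, so every type here is an assumption: the
session identifier as an integer, addresses as strings, CacheInfo as
opaque bytes, and the helper name repair_address are all hypothetical.
The only behavior shown is the RepairAddr rule from the field
descriptions in this section: a null RepairAddr means retransmissions
are expected on the Sender's data multicast address.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    """Role of the bind requestor (receiver or repair head)."""
    RECEIVER = "receiver"
    REPAIR_HEAD = "repair_head"


@dataclass
class BindRequest:
    session_id: int               # all messages contain the session identifier
    level: int                    # level in the repair tree
    role: Role
    sub_tree_count: int           # current number of receivers below the node
    scope: Optional[int] = None   # optional: how far a repair message travels


@dataclass
class BindConfirm:
    session_id: int
    level: int
    repair_addr: Optional[str]    # None stands in for a null RepairAddr
    seq_num: int                  # first sequence number with repair service
    member_id: int                # assigned by the repair head to this child
    cache_info: bytes = b""       # opaque here; representation is an assumption


def repair_address(confirm: BindConfirm, data_multicast_addr: str) -> str:
    """Address on which the child should expect retransmissions."""
    if confirm.repair_addr is None:
        return data_multicast_addr
    return confirm.repair_addr
```

A protocol instantiation would map these abstract fields onto its own
packet formats; nothing above implies a particular wire encoding.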
The various fields of the messages are described as follows:
- Scope: an integer to indicate how far a repair message travels.
This is optional.
- Level: an integer that indicates the level in the repair tree.
This value is used to keep loops in the tree from forming, in
addition to indicating the distance from the sender. Any changes
in a node's level are passed down to the Tree BB using the
treeLevelUpdate interface.
- Role: This indicates if the bind requestor is a receiver or
repair head.
- SubTreeCount: This is an integer indicating the current number of
receivers below the node.
- RepairAddr: This field in the BindConfirm message is used to tell
the receiver which multicast address the repair head will be
sending retransmissions on. If this field is null, then the
receiver should expect retransmissions to be sent on the sender's
data multicast address.
- SeqNum: an integer indicating the sequence number of a data 2.3 Private and Public Networks
message within a given data session. The SeqNum field in the
BindConfirm message indicates the sequence number starting from
which the repair head promises to provide repair service.
- MemberId: This is an integer the repair head assigns to a TRACK is designed to work in private networks, controlled networks
particular child. The child receiver uses this value to implement and in the public Internet. A controlled network typically has a
the rotating TRACK Generation algorithm. single administrative domain, has more homogeneous network bandwidth,
and is more easily managed and controlled. These networks have the
fewest barriers to IP multicast deployment and the most immediate
need for reliable multicast services. Deployment in the Internet
requires a protocol to span multiple administrative domains, over
vastly heterogeneous networks.
- CacheInfo: This field contains information about the repair data 2.4 Manual vs. Automatic Controls
available from this Repair Head.
- Reason: a code indicating the reason for the BindReject, Some networks can take advantage of manual or centralized tools for
UnbindRequest, or EjectRequest message. configuring and controlling the usage of a reliable multicast group.
In the public Internet the tools have to span multiple administrative
domains where policies may be inconsistent. Hence, it is preferable
to design tools that are fully distributed and automatic. To address
these requirements, TRACK provides automatic configuration, but can
also support manual configuration options.
- ParentTimestamp: This field is included in Heartbeat messages to 2.5 Heterogeneous Networks
signal the need to do a local RTT measurement from a parent. It is
the time when the parent sent the packet.
- ChildrenList: This field contains the identifiers for a list of While the majority of controlled networks are symmetrical and support
children. As part of the keepalive message, this field together many-to-many multicast, in designing a protocol for the Internet, we
with the SeqNum field is used to urge those listed receivers to must deal with virtually all major network types. These include
send a TRACK (for the provided SeqNum). The repair head sending asymmetrical networks, satellite networks, networks where only a
this must have been missing the regular TRACKs from these children single node may send to a multicast group, and wireless networks.
for an extended period of time. TRACK takes this into account by not requiring any many-to-many
multicast services. TRACK does not assume that the topology used for
sending control messages has any congruence to the topology of the
multicast address used for sending data messages.
- SenderTimestamp: This field is included in Data messages to 2.6 Use of Network Infrastructure
signal the need to do a roundtrip time measurement from the sender,
through the tree, and back to the sender. It is the time (measured
by the sender's local clock) when it sent the packet.
- AppSynch: a sequence number signaling a request for confirmed TRACK is designed to run in either single level or hierarchical
delivery by the application. configurations. In a single level, there is no need for specialized
network infrastructure. In hierarchical configurations, special
nodes called Repair Heads are defined, which may run either as part
of a distributed application, or as part of dedicated server
software. TRACK does not specifically support or require Generic
Router Assist or other router level assist.
- End: indicates that this packet is the end of the data for this 2.7 Deployment Constraints
session. The two primary tradeoffs TRACK has, for the functionality it
provides, are additional complexity, and decreased failure isolation.
Hence, if target applications are to be deployed in networks with
high rates of persistent failures, and isolation of failed Receivers
from affecting other Receivers is of high importance, TRACK may not
be appropriate. Similarly, if simplicity is paramount, TRACK may not
be appropriate.
- Rate: This field is used by the sender to tell the receivers its 2.8 Target Scalability
sending rate, in packets per second. It is part of the data or
nulldata messages.
- HighestReleased: This field contains a sequence number, The target scalability of TRACK is tens of thousands of simultaneous
corresponding to the trailing edge of the sender's retransmission Receivers per Data Session. Dedicated Repair Heads are targeted to
window. It is used (as part of the data, nulldata or be able to support thousands of simultaneous Data Sessions.
retransmission headers) to inform the receivers that they should no
longer attempt to recover those messages with a smaller (or same)
sequence number.
- HighestAllowed: a sequence number, used for flow control from the 2.9 Known Failure Modes
receivers. It signals the highest
sequence number the sender is allowed to send that will not overrun
the receivers' buffer pools.
- BitMask: an array of 1's and 0's. Together with a sequence If a hierarchical Control Tree is misconfigured, so that loop-free,
number it is used to indicate lost data messages. If the i'th contiguous connection is not provided, failure will occur. This
element is a 1, it indicates the message SeqNum+i is lost. failure is designed to occur gracefully, at the initialization of a
Data Session.
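As a rough illustration of the BitMask field described above, the following Python sketch expands a (SeqNum, BitMask) pair into the list of lost sequence numbers. The function name is hypothetical, the index i is assumed to start at 0 (the text does not say whether it starts at 0 or 1), and sequence number wrap-around is ignored for brevity.

```python
from typing import List


def lost_sequence_numbers(seq_num: int, bitmask: List[int]) -> List[int]:
    """Return the sequence numbers flagged as lost by a TRACK BitMask.

    Assumption: element i of the mask (0-based) refers to message
    seq_num + i, and a value of 1 means that message is lost.
    Wrap-around of the sequence space is not handled in this sketch.
    """
    return [seq_num + i for i, bit in enumerate(bitmask) if bit == 1]


# Example: the base sequence number is 100 and three messages are flagged.
lost = lost_sequence_numbers(100, [0, 1, 1, 0, 1])
```

A repair head receiving such a TRACK could then look up each listed sequence number in its retransmission buffer.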
- Slowest: This field contains a value that characterizes the If the configuration parameters on control traffic are poorly chosen
slowest receiver in the subtree beneath (and including) the node on an asymmetrical network, where there is much less control channel
sending the TRACK. This is used to provide information for the bandwidth available than data channel bandwidth, there may be a very
congestion control BB, and the aggregation methods on this high rate of control traffic. This control traffic is not
information are defined by that BB. dynamically congestion controlled like the data traffic, and so could
potentially cause congestion collapse.
- ParentThere: This field indicates to the parent that the receiver This potential control channel overload could be exacerbated by an
sending the TRACK has not been receiving the regular keepalive application that makes overly heavy use of the application level
messages from its parent, and is wondering if it needs to find a confirmation or statistics gathering functions.
new parent.
- SenderDallyTime: This field is associated with a SenderTimestamp 2.10 Potential Conflicts With Other Components
field. It contains the sum of the waiting time that should be
subtracted from the RTT measurement at the sender.
- LocalDallyTime: This is the same as the SenderDallyTime, but is None are known at this time.
associated with a ParentTimestamp instead of a SenderTimestamp.
- ApplicationConfirms: This is the SeqNum value for which delivery 3. Architecture Definition
has been confirmed by all children at or below this parent.
- FailedChildren: This is a list of all children that have recently 3.1 TRACK Entities
been dropped from the repair tree.
5. Global Configuration Variables, Constants, and Reason Codes 3.1.1 Node Types
TRACK divides the operation of the protocol into three major
entities: Sender, Receiver, and Repair Head. The Repair Head
corresponds to the Service Node described in the Tree Building draft.
It is assumed that Senders and Receivers typically run as part of an
application on an end host client. Repair Heads MAY be components in
the network infrastructure, managed by different network managers as
part of different administrative domains, or MAY run on an end host
client, in which case they function as both Receivers and Repair
Heads. Absent any automatic tree configuration, it is assumed
that the Infrastructure Repair Heads have relatively static
configurations, which consist of a list of nearby possible Repair
Heads. Senders and Receivers, on the other hand, are transient
entities, which typically only exist for the duration of a single
Data Session. In addition to these core components, applications that
use TRACK are expected to interface with other services that reside
in other network entities, such as multicast address allocation,
session advertisement, network management consoles, DHCP, DNS,
overlay networking, application level multicast, and multicast key
management.
5.1 Global Configuration Variables 3.1.2 Multicast Group Address
These are variables that control the data session and are advertised
to all participants.
@ TimeMaxBindResponse: the time, in seconds, to wait for a A Multicast Group Address is a logical address that is used to
response to a BindRequest. Initial value is address a set of TRACK nodes. It is RECOMMENDED that it be a pair
TIMEOUT_PARENT_RESPONSE (recommended value is 3). Maximum value consisting of an IP multicast address and a UDP port number. In this
is MAX_TIMEOUT_PARENT_RESPONSE. case, it may optionally have a Time To Live (TTL) value, although
this value MUST only be used for providing a global scope to a Data
Session, and not for scoping of local retransmissions. Data Multicast
Addresses are Multicast Group Addresses.
@ MaxChildren: The maximum number of children a repair head is TRACK MAY be used with an overlay multicast or application layer
allowed to handle. Recommended value: 32. multicast system. In this case, a Multicast Group Address MAY have a
different format. The TRACK PI is responsible for specifying the
format of a Multicast Group Address.
@ ConstantHeartbeatPeriod: Instead of dynamically calculating the 3.1.3 Data Session
HeartbeatPeriod as described in Section 7.1.5, a constant period
may be used. Recommended value: 3 seconds.
@ MinimumHeartbeatPeriod: The minimum value for the dynamically A Data Session is the unit of reliable delivery of TRACK. It
calculated HeartbeatPeriod. Recommended value: 1 second. consists of a sequence of sequentially numbered Data messages, which
are sent by a single Sender over a single Data Multicast Address.
They are delivered reliably, with acknowledgements and
retransmissions occurring over the Control Tree. A Data Session ID
uniquely identifies it. A given Data Session is received by a set of
zero or more Receivers, and a set of zero or more Repair Heads. One
or more Data Sessions MAY share the same Data Multicast Address
(although this is NOT RECOMMENDED). Each TRACK node can
simultaneously participate in multiple Data Sessions. A Receiver
MUST join all the Data Multicast Addresses and Control Trees
corresponding to the Data Sessions it wishes to receive.
@ MinHoldTime: The minimum amount of time a repair head holds on 3.1.4 Data Channel
to data packets.
@ MaxHoldTime: The maximum amount of time a repair head holds on A Data Session is multicast over a Data Channel. The Data Channel is
to data packets. responsible for efficiently delivering the Data messages to the
members of a Data Session, and providing statistical reliability
guarantees on this delivery. It does this by employing a Data
Channel Protocol, such as NORM, ALC, PGM, or Overlay Multicast.
TRACK is then responsible for providing application level, Sender
based reliability, by confirming delivery to all Receivers, and
optionally retransmitting lost messages that did not get correctly
delivered by the Data Channel. A common scenario would be to use
TRACK to provide application level confirmation of delivery, and
recover from persistent failures in the network that are beyond the
scope of the Data Channel Protocol.
@ AckWindow: The number of packets seen before a receiver issues 3.1.5 Data Channel Protocol
an acknowledgement. Recommended value: 32.
5.2 Constants This is the transport protocol used by a TRACK PI to ensure goodput
and statistical reliability on a Data Channel.
@ NUM_MAX_PARENT_ATTEMPTS: The number of times to try to bind to a 3.1.6 Data Multicast Address
repair head before declaring a PARENT_UNREACHABLE error.
Recommended value is 5.
@ NULL_DATA_PERIOD: The time between transmission of NullData This is the Multicast Group Address used by the Data Channel
Messages. Recommended value is 1. Protocol, to efficiently deliver Data messages to all Receivers and
Repair Heads. All Data Multicast Addresses used by TRACK are assumed
to be unidirectional and only support a single Sender.
@ FAILURE_DETECTION_REDUNDANCY: The number of times a message is 3.1.7 Control Tree
sent without receiving a response before declaring an error.
Recommended value is 3.
@ MAX_TRACK_TIMEOUT: The maximum value for TRACKTimeout. A Control Tree is a hierarchical communication path used to send
Recommended value is 5 seconds. control information from a set of Receivers, through zero or more
Repair Heads (RHs), to a Sender. Information from lower nodes is
aggregated as the information is relayed to higher nodes closer to
the Sender. Each Data Session uses a Control Tree. It is acceptable
to have a degenerate Control Tree with no Repair Heads, which
connects all of the Receivers directly to the Sender.
5.3 Reason Codes Each RH in the Control Tree uses a separate Local Control Channel for
communicating with its children. It is RECOMMENDED that each Local
Control Channel correspond to a separate Multicast Group Address.
Optionally, these RH multicast addresses MAY be the same as the Data
Multicast Address.
@ BindReject reason codes 3.1.8 Local Control Channel
@ LOOP_DETECTED A Local Control Channel is a unidirectional multicast path from a
@ MAX_CHILDREN_EXCEEDED Repair Head or Sender to its children. It uses a Multicast Group
Address for this communication.
@ UnbindRequest reason codes 3.1.9 Host ID
@ SESSION_DONE
@ APPLICATION_REQUEST
@ RECEIVER_TOO_SLOW
@ EjectRequest reason codes With the widespread deployment of network address translators,
@ PARENT_LEAVING creating a short globally unique ID for a host is a challenge. By
@ PARENT_FAILURE default, TRACK uses a 48 bit long Host ID field, filled with the low-
@ CHILD_TOO_SLOW order 48 bits of the MD5 signature of the DNS name of the source. A
@ PARENT_OVERLOADED TRACK PI, to match up with the goodput-ensuring protocol that TRACK
PI uses as its Data Channel Protocol, MAY redefine the length and
contents of this identifier.
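The default Host ID derivation described above might be sketched in Python as follows. This is an illustration under stated assumptions, not a normative encoding: the function name is hypothetical, the DNS name is assumed to be hashed as its ASCII bytes without any normalization, and "low-order 48 bits" is assumed to mean the last 6 bytes of the 16-byte MD5 digest.

```python
import hashlib


def default_host_id(dns_name: str) -> bytes:
    """Return a 48-bit (6-byte) Host ID derived from a DNS name.

    Assumptions: the name is hashed as ASCII bytes exactly as given,
    and the low-order 48 bits are the final 6 bytes of the digest.
    """
    digest = hashlib.md5(dns_name.encode("ascii")).digest()  # 16 bytes
    return digest[-6:]


# Example with a placeholder host name.
host_id = default_host_id("sender.example.com")
```

A TRACK PI that redefines the length or contents of the identifier would replace this function entirely.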
6. External APIs 3.1.10 Data Session ID
This section describes external interfaces for the building block.
6.1 Interfaces to the BB from PI's A Data Session ID is a globally unique identifier for a Data Session.
It may either be selected by the Data Channel Protocol (e.g., NORM) or
by TRACK. By default, it is the combination of the Host ID for the
Sender, combined with the 16 bit port number used for the Data
Session at the Sender. This identifier is included in every TRACK
message.
6.1.1 Start(boolean RepairHead, boolean RejoinAllowed, 3.1.11 Child ID
Advertisement)
Start instructs the BB to initiate operation. All members in a TRACK Data Session, besides the Sender, are
identified by the combination of their Host ID, and the port number
with which they send IP packets to their parent.
RepairHead indicates whether or not the node may also operate as a 3.1.12 Message Sequence Numbers
repair head. This parameter is passed along to the tree BB.
RejoinAllowed indicates whether or not the node is allowed to rejoin A Message Sequence Number is a 32 bit number in the range from 1
the session if the only repair heads available are missing some repair through 2^32 - 1, which is used to specify the sequential order of a
data needed by this node. This parameter also controls whether or not Data message in a Data Stream. A Sender node assigns consecutive
the node is allowed to join the session after the first data messages Sequence Numbers to the Data messages provided by the Sender
have become unrecoverable (late join). The BB uses this parameter to application. By default, zero is reserved to indicate that the Data
decide whether or not to use a particular repair head (chosen by the Session has not yet started. A TRACK PI MAY redefine this. Message
tree BB) based on its available repair data. Sequence Numbers may wrap around, and so Sequence Number arithmetic
MUST be used to compare any two Sequence Numbers.
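The text does not say which form of Sequence Number arithmetic is intended. The Python sketch below assumes a comparison in the style of RFC 1982 serial number arithmetic over a 32 bit space. The function names are hypothetical, and skipping the reserved value zero on wrap-around is an assumption made here for illustration only.

```python
MODULUS = 2 ** 32  # size of the 32 bit sequence space
HALF = 2 ** 31     # half of the space, used as the comparison window


def seq_less_than(a: int, b: int) -> bool:
    """Return True if a precedes b, taking wrap-around into account.

    Numbers exactly half the space apart are treated as unordered
    (the function returns False in both directions).
    """
    return a != b and ((b - a) % MODULUS) < HALF


def seq_next(a: int) -> int:
    """Return the successor of a, skipping the reserved value 0
    (assumption: zero is never assigned to a Data message)."""
    nxt = (a + 1) % MODULUS
    return nxt if nxt != 0 else 1
```

With this comparison, a message numbered 1 sent just after wrap-around is still ordered after the message numbered 2^32 - 1.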
The Advertisement parameter passes to the BB all of the parameters 3.1.13 Data Queue
from the session advertisement.
6.1.2 End A Data Queue is a buffer, maintained by a Sender or a Repair Head,
for transmission and retransmission of the Data messages provided by
the Sender application. New Data messages are added to the Data
Queue as they arrive from the sending application, up to a specified
buffer limit. The admission rate of messages to the network is
controlled by the flow and congestion control algorithms. Once a
message has been received by the Receivers of a Data Stream, it may
be deleted from the buffer.
End instructs the BB to end its operation. If the node is the Sender, At the Sender, a TRACK PI may integrate the Data Queue with the
it indicates to the group that the last data message is the final one buffer used by the Data Channel Protocol.
for the session. Once a receiver has received all of the session's
data, it MAY unbind from its parent. However, if the receiver is also
a repair head, it continues to operate as a repair head until all of
its children have finished. Then it MAY unbind from its own parent.
If End is called at a repair head, it MUST use the multicast Eject 3.2 Basic Operation of the Protocol
procedure to inform its children for this session that it is leaving
the group. Once the procedure is complete (all children have
acknowledged receipt of the Eject, or the Eject has been sent the
maximum number of times), the repair head MAY unbind from its own
parent.
If End is called at a receiver, it MUST use the Unbind procedure to For each Data Session, TRACK provides sequenced, reliable delivery of
inform its parent for this session that it is leaving the group. data from a single Sender to up to tens of thousands of Receivers. A
TRACK Data Session consists of a network that has exactly one Sender
node, zero or more Receiver nodes and zero or more Repair Heads.
6.1.3 incomingMessage(Message) The figure below illustrates a TRACK Data Session with multiple
Repair Heads.
incomingMessage presents the BB with a message received by the PI. Before a Data Session starts, a session advertisement MUST be
received by all members of the Data Session, notifying them to join
the group, and the appropriate configuration information for the Data
Session. This MAY be provided directly by the application, by an
external service, or by the TRACK PI.
6.1.4 getStatistics A Sender joins the Control Tree and a Data Channel Protocol. It
multicasts Data messages on the Data Multicast Address, using the
Data Channel Protocol. All of the nodes in the session subscribe to
the Data Multicast Address and join the Data Channel Protocol.
getStatistics returns current BB statistics to the upper BB or PI. There is no assumption of congruence between the topology of the Data
Multicast Address and the topology of the Control Tree.
6.1.5 MessageSynched(Message)
        -------> SD (Sender node)----->|
                ^^^                    |
               / | \     Control       |
    TRACKs    /  |  \    Tree          |
             /   |   \                 |
            /    |    \   (Repair      |
           /     |     \   Head        |
          /      |      \  nodes)      v
        RH      RH      RH <-----------|
       ^^       ^^^     ^^             |  Data
      / |      / | \    | \            |  Channel
     /  |     /  |  \   |  \           |
    /   |    /   |   \  |   \          v
   R    R   R    R    R R    R <---------
            (Receiver Nodes)
MessageSynched tells the BB that the indicated message has been A Receiver joins the appropriate Data Channel Protocol, and the Data
synched with the application. Multicast Address used by that protocol, in order to receive Data. A
Receiver periodically informs its parent about the messages that it
has received by unicasting a TRACK message to the parent. It MAY
also request retransmission of lost messages in this TRACK. Each
parent node aggregates the TRACKs from its child nodes and (if it is
not the Sender) unicasts a single aggregated TRACK to its parent.
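The aggregation step can be pictured with the following Python sketch. The aggregation rules are not defined in this section, so every rule below is an assumption chosen for illustration: SubTreeCount is summed (each child's count is assumed to already include the child itself), and HighestAllowed and ApplicationConfirms take the minimum over all children so that the slowest child bounds the result. The type and function names are hypothetical, and sequence number wrap-around is ignored.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TrackSummary:
    """A reduced, hypothetical view of the fields carried in a TRACK."""
    sub_tree_count: int
    highest_allowed: int
    application_confirms: int


def aggregate_tracks(children: Iterable[TrackSummary]) -> TrackSummary:
    """Combine the most recent TRACK from each child into one summary.

    Assumed rules: counts add up; flow control and confirmation
    values are limited by the least advanced child.
    """
    tracks = list(children)
    if not tracks:
        raise ValueError("no child TRACKs to aggregate")
    return TrackSummary(
        sub_tree_count=sum(t.sub_tree_count for t in tracks),
        highest_allowed=min(t.highest_allowed for t in tracks),
        application_confirms=min(t.application_confirms for t in tracks),
    )
```

A Repair Head would send the resulting summary to its own parent, while the Sender would consume it directly.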
6.1.6 RepairHead(boolean) The Sender and each Repair Head have a multicast Local Control
Channel to their children. This is used for transmitting Heartbeat
messages that inform their child nodes that the parent node is still
functioning. This channel is also used to perform local
retransmission of lost Data messages to just these children. TRACK
MUST still provide correct operation even if multicast addresses are
reused across multiple Data Sessions or multiple Local Control
Channels. It is NOT RECOMMENDED to use the same multicast address
for multiple Local Control Channels serving any given Data Session.
RepairHead tells the BB whether or not it is now acting as a Repair The communication path forms a loop from the Sender to the Receivers,
Head. through the Repair Heads back to the Sender. Original data (ODATA),
Retransmission (RDATA) and NullData messages regularly exercise the
downward data direction. Heartbeat messages exercise the downward
control direction. TRACK messages regularly exercise the Control
Tree in the upward direction. This combination constantly checks
that all of the nodes in the tree are still functioning correctly,
and initiates fault recovery when required.
6.2 Interfaces from the BB to the PI This hierarchical infrastructure allows TRACK to provide a number of
functions in a scalable way. Application level confirmation of
delivery and statistics aggregation both operate in a request-reply
mode. A sender issues a request for application level confirmation
or statistics reporting, and the receivers report back the
appropriate information in their TRACK messages. This information is
aggregated by the Repair Heads, and passed back up to the Sender.
Since TRACK messages are not delivered with the reliability of data
messages, Receivers and Repair Heads transmit this information
redundantly.
6.2.1 outgoingMessage(Message) TRACK also gathers control information that is useful for improving
the performance of flow and congestion control algorithms, including
scalable round trip time measurements.
outgoingMessage instructs the PI to send the message. Normally, goodput is ensured by lower level protocols, such as the
NACKs and FEC algorithms in NORM and PGM. However, TRACKs MAY also
include optional retransmission requests, in the form of selective
bitmaps indicating which messages need to be retransmitted. The RH
is then responsible for retransmitting these messages on the Local
Control Channel to its children.
6.2.2 MessageReceived(Message, boolean Synch) 3.3 Component Relationships
TRACK is primarily designed to run in conjunction with another
transport protocol that is responsible for ensuring goodput. It is
RECOMMENDED that this Data Channel Protocol also be responsible for
congestion control, although the TRACK PI MAY provide this congestion
control function instead, and MAY pass the congestion control
statistics it collects to the Data Channel Protocol, in order to
enhance the performance of the congestion control algorithms.
MessageReceived passes a data message up to the PI. Synch indicates The primary Data Channel Protocol that TRACK is designed to work with
whether or not the PI should call MessageSynched once the message has is NORM. In this case, the NORM PI is responsible for interfacing
been consumed by the application. with the NACK BB, the FEC BB, the Generic Router Assist BB, and the
appropriate congestion control BB.
6.2.3 SenderLost TRACK then adds additional functionality that complements this
receiver-reliable protocol, such as application level confirmed
delivery, retransmission in the face of persistent failures,
statistics aggregation, and collection of extra information for
congestion control.
SenderLost tells the PI that contact with the sender has been lost. The TRACK BB is responsible for specifying all of the TRACK-specific
functionality. It interfaces with the Automatic Tree Building Block.
The TRACK PI is then responsible for instantiating a complete
protocol that includes all of the other components. It is expected
that there will be multiple TRACK PIs, one for each Data Channel
Protocol that it is specified to work with.
6.2.4 UnrecoverableData The following figure illustrates this, for the case where NORM is the
UnrecoverableData indicates to the PI that the BB was unable to Data Channel Protocol.
recover some session data.
6.2.5 SessionDone
                 +----------+
                 |          |
                 |  TRACK   |
                 |   PI     |
                 |          |
                 +----------+
                   /     \
                  /       \
                 /         \
        +---------+     +---------+
        |         |     |         |
        |  TRACK  |     |  NORM   |   Data Channel
        |   BB    |     |   PI    |   Protocol
        |         |     |         |
        +---------+     +---------+
             |               |
             |               |
             |               |
        +---------+   +-----------------------+
        |         |   |                       |
        |  Tree   |   |  FEC, CC, GRA, NACK   |
        |   BB    |   |   Building Blocks     |
        |         |   |                       |
        +---------+   +-----------------------+
SessionDone indicates to the PI that the sender has completed sending For more details on integration, please see the example TRACK PI over
the data, and the node has left the session. UDP [17].
7. Algorithms 4. TRACK Functionality
7.1 Tree Based Session Creation and Maintenance 4.1 Hierarchical Session Creation and Maintenance
7.1.1 Overview of Tree Configuration 4.1.1 Overview of Tree Configuration
Before a Data Session starts delivering data, the tree for the Data Before a Data Session starts reliably delivering data, the tree for
Session needs to be created. This process binds each Receiver to the Data Session needs to be created. This process binds each
either a Repair Head or the Sender, and binds the participating Receiver to either a Repair Head or the Sender, and binds the
Repair Heads into a loop-free tree structure with the Sender as the participating Repair Heads into a loop-free tree structure with the
root of the tree. This process requires tree configuration Sender as the root of the tree. This process requires tree
knowledge, which can be provided with some combination of manual configuration knowledge, which can be provided with some combination
and/or automatic configuration. The algorithms for automatic tree of manual and/or automatic configuration. The algorithms for
configuration are part of the Automatic Tree Configuration BB. automatic tree configuration are part of the Automatic Tree
They return to each node the address of the parent it should bind Configuration BB. They return to each node the address of the parent
to, as well as zero or more backup parents to use if the primary it should bind to, as well as zero or more backup parents to use if
parent fails. the primary parent fails.
In addition to receiving the tree configuration information, the In addition to receiving the tree configuration information, the
receivers all receive a Session Advertisement message from the Receivers all receive a Session Advertisement message from the
senders, informing them of the Data Multicast Address and other Senders, informing them of the Data Multicast Address and other
session configuration information. This advertisement may contain session configuration information. This advertisement may contain
other relevant session information such as whether or not Repair other relevant session information such as whether or not Repair
Heads should be used, whether manual or automatic tree Heads should be used, whether manual or automatic tree configuration
configuration should be used, the time at which the session will should be used, the time at which the session will start, and other
start, and other protocol settings. This advertisement is created protocol settings. This advertisement is created as part of either
as part of either the PI or as part of an external service. In the TRACK PI or as part of an external service. In this way, the
this way, the Sender enforces a set of uniform Session Sender enforces a set of uniform session configuration parameters on
Configuration Parameters on all members of the session. all members of the session.
As described in the automatic tree configuration BB, the general As described in the automatic tree configuration BB, the general
algorithm for a given node in tree creation is as follows. algorithm for a given node in tree creation is as follows.
1) Get advertisement that a session is starting 1) Get advertisement that a session is starting
2) Get list of neighbor candidates using the getSNs Tree BB 2) Get a list of neighbor candidates using the getSNs Tree BB
interface, contact them interface, and OPTIONALLY contact them
3) Select best neighbor as parent in a loop free manner 3) Select best neighbor as parent in a loop free manner
4) Bind to parent 4) Bind to parent
5) Optionally, later rebind to another parent 5) Optionally, later rebind to another parent
When a child finishes step 4, it is up to automatic tree When a child finishes step 4, it is up to automatic tree
configuration to, if necessary, continue building the tree in order configuration to, if necessary, continue building the tree in order
to connect the node back to the Sender. After the session is to connect the node back to the Sender. After the session is
created, children can unbind from their parents and bind again to created, children can unbind from their parents and bind again to new
new parents. This happens when faults occur, or as part of a tree parents. This happens when faults occur, or as part of a tree
optimization process. Steps 1 through 3 are external to the TRACK optimization process. Steps 1 through 3 are external to the TRACK
BB. Step 4 is performed as part of session creation. Step 5 is BB. Step 4 is performed as part of session creation. Step 5 is
performed as part of session maintenance in conjunction with performed as part of session maintenance in conjunction with
automatic tree building, as either an unbind or eject, combined automatic tree building, as either an Unbind or Eject, combined with
with another bind operation. another Bind operation.
Once steps 1 through 3 are completed, Receivers join the Data Once steps 1 through 3 are completed, Receivers join the Data
Multicast Address, and attempt to bind to either the Sender or a Multicast Address, and attempt to Bind to either the Sender or a
local Repair Head. A Receiver will attempt to bind to the first local Repair Head. A Receiver will attempt to bind to the first node
node in the tree configuration list returned by step 3, and if this in the tree configuration list returned by step 3, and if this fails,
fails, it will move to the next one. A Receiver only binds to a it will move to the next one. A Receiver only binds to a single
single Repair Head or Sender, at a time, for each Data Session. Repair Head or Sender, at a time, for each Data Session.
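The fallback behaviour described above can be sketched in Python as follows. The try_bind callback is hypothetical and stands in for one BindRequest/BindConfirm exchange; the per-parent attempt limit reuses the recommended value of 5 given for NUM_MAX_PARENT_ATTEMPTS, and applying that limit to each candidate in turn is an assumption of this sketch.

```python
from typing import Callable, Optional, Sequence

NUM_MAX_PARENT_ATTEMPTS = 5  # recommended value from the constants list


def bind_to_first_available(
    candidates: Sequence[str],
    try_bind: Callable[[str], bool],
    attempts_per_parent: int = NUM_MAX_PARENT_ATTEMPTS,
) -> Optional[str]:
    """Try each candidate parent in order and return the one bound to.

    try_bind(parent) is a hypothetical callback that performs one Bind
    attempt and returns True on success. None is returned when every
    candidate has been exhausted.
    """
    for parent in candidates:
        for _ in range(attempts_per_parent):
            if try_bind(parent):
                return parent  # bound to exactly one parent
    return None
```

The function stops at the first successful Bind, which reflects the rule that a Receiver binds to only one Repair Head or Sender at a time for each Data Session.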
The automatic tree building BB ensures that the tree is formed The automatic tree building BB ensures that the tree is formed
without loops. As part of this, when a Repair Head has a Receiver without loops. As part of this, when a Repair Head has a Receiver
attempt to bind to it for a given Data Session, it may not at first attempt to Bind to it for a given Data Session, it may not at first
be able to accept the connection, until it is able to join the tree be able to accept the connection, until it is able to join the tree
itself. Because of this, a Receiver will sometimes have to itself. Because of this, a Receiver will sometimes have to
repeatedly attempt to bind to a given parent before succeeding. repeatedly attempt to Bind to a given parent before succeeding.
Once the Sender initiates tree building, it is also free to start Once the Sender initiates tree building, it is also free to start
sending Data messages on the Data Multicast Address. Repair Heads sending Data messages on the Data Multicast Address. Repair Heads
and Receivers may start receiving these messages, but may not and Receivers may start receiving these messages, but may not request
request retransmission or deliver data to the application until retransmission or deliver data to the application until they receive
they receive confirmation that they have successfully bound to the confirmation that they have successfully bound to the tree.
tree.
7.1.2 Bind 4.1.2 Bind
7.1.2.1 Input Parameters 4.1.2.1 Input Parameters
In order to join a data session and bind to the tree, the following In order to join a Data Session and Bind to the tree, the following
nodes need the following parameters. nodes need the following parameters.
A Repair Head requires the following parameters. A Repair Head requires the following parameters.
- Session: the unique identifier for the data session to join, - Session: the unique identifier for the Data Session to join,
received from the Session Advertisement algorithm in the PI. received from the session advertisement algorithm specified in the
PI.
- ParentAddress: the address and port of the parent node to which - ParentAddress: the address and port of the parent node to which
the node should connect the node should connect, received from the Auto Tree BB.
- UDPListenPort: the number of the port on which the node will - UDPListenPort: the number of the port on which the node will
listen for its children's control messages listen for its children's control messages. This parameter is
configured by the application.
- RepairAddr: the multicast address, UDP port, and TTL on which - RepairAddr: the multicast address, UDP port, and TTL on which this
this node sends control messages to its children. node sends control messages to its children. This parameter is
configured by the application.
A Sender requires the above parameters, except for the A Sender requires the above parameters, except for the ParentAddress.
ParentAddress. A Receiver requires the above parameters, except A Receiver requires the above parameters, except for the
for the UDPListenPort and RepairAddr. UDPListenPort and RepairAddr.
7.1.2.2 Bind Algorithm 4.1.2.2 Bind Algorithm
A Bind operation happens when a child wishes to join a parent in A Bind operation happens when a child wishes to join a parent in the
the distribution tree for a given data session. The Receivers distribution tree for a given Data Session. The Receivers initiate
initiate the first Bind protocols to their parents, which then the first Bind protocols to their parents, which then cause recursive
cause recursive binding by each parent, up to the Sender. Each binding by each parent, up to the Sender. Each Receiver sends a
Receiver sends a separate BindRequest message for each of the separate BindRequest message for each of the streams that it would
streams that it would like to join. At the discretion of the PI, like to join. At the discretion of the PI, multiple BindRequest
multiple BindRequest messages may be bundled together in a single messages may be bundled together in a single message.
message.
A node sends a BindRequest message to its automatically selected or A node sends a BindRequest message to its automatically selected or
manually configured parent node. The parent node sends either a manually configured parent node. The parent node sends either a
BindConfirm message or a BindReject message. Reception of a BindConfirm message or a BindReject message. Reception of a
BindConfirm message terminates the algorithm successfully, while BindConfirm message terminates the algorithm successfully, while
receipt of a BindReject message causes the node to either retry the receipt of a BindReject message causes the node to either retry the
same parent or restart the Bind algorithm with its next parent same parent or restart the Bind algorithm with its next parent
candidate (depending on the BindReject reason code), or if it has candidate (depending on the BindReject reason code), or if it has
none, to declare a REJECTED_BY_PARENT error. Once the node is none, to declare a REJECTED_BY_PARENT error. Once the node is
accepted by a Repair Head, it informs the Tree BB using the setSN accepted by a Repair Head, it informs the Tree BB using the setSN
interface. interface.
Reliability is achieved through the use of a standard request- Reliability is achieved through the use of a standard request-
response protocol. At the beginning of the algorithm, the child response protocol. At the beginning of the algorithm, the child
initializes TimeMaxBindResponse to the constant initializes TimeMaxBindResponse to the constant
TIMEOUT_PARENT_RESPONSE and initializes NumBindResponseFailures to TIMEOUT_PARENT_RESPONSE and initializes NumBindResponseFailures to 0.
0. Every time it sends a BindRequest message, it waits Every time it sends a BindRequest message, it waits
TimeMaxBindResponse for a response from the parent node. If no TimeMaxBindResponse for a response from the parent node. If no
response is received, the node doubles its value for response is received, the node doubles its value for
TimeMaxBindResponse, but limits TimeMaxBindResponse to be no larger TimeMaxBindResponse, but limits TimeMaxBindResponse to be no larger
than MAX_TIMEOUT_PARENT_RESPONSE. It also than MAX_TIMEOUT_PARENT_RESPONSE. It also
increments NumBindResponseFailures, and retransmits the BindRequest increments NumBindResponseFailures, and retransmits the BindRequest
message. If NumBindResponseFailures reaches message. If NumBindResponseFailures reaches NUM_MAX_PARENT_ATTEMPTS,
NUM_MAX_PARENT_ATTEMPTS, it reports a PARENT_UNREACHABLE error. it reports a PARENT_UNREACHABLE error.
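The request-response timer described above can be sketched as follows. This is a minimal illustration, not part of the specification: the callables send_request and wait_for_response are hypothetical stand-ins for the PI's transport, and the sketch assumes the failure count is checked before the next retransmission.

```python
def bind_with_retries(send_request, wait_for_response,
                      timeout_parent_response,
                      max_timeout_parent_response,
                      num_max_parent_attempts):
    """Return the parent's response, or "PARENT_UNREACHABLE" on failure.

    send_request() transmits one BindRequest (hypothetical callable).
    wait_for_response(t) waits up to t seconds and returns the response,
    or None if nothing arrived (hypothetical callable).
    """
    time_max_bind_response = timeout_parent_response
    num_bind_response_failures = 0
    while True:
        send_request()
        response = wait_for_response(time_max_bind_response)
        if response is not None:
            return response  # BindConfirm or BindReject
        # No response: double the timeout, capped at the maximum.
        time_max_bind_response = min(2 * time_max_bind_response,
                                     max_timeout_parent_response)
        num_bind_response_failures += 1
        if num_bind_response_failures >= num_max_parent_attempts:
            return "PARENT_UNREACHABLE"
```

With an initial timeout of 1, a cap of 4, and 4 attempts, a silent parent is waited on for 1, 2, 4, and 4 time units before the error is reported.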
When a parent receives a BindRequest message, it first consults the When a parent receives a BindRequest message, it first consults the
automatic tree building BB for approval (using the acceptChild Tree automatic tree building BB for approval (using the acceptChild Tree
BB interface), for instance to ensure that accepting the BB interface), for instance to ensure that accepting the BindRequest
BindRequest will not cause a loop in the tree. Then the parent will not cause a loop in the tree. Then the parent checks to be sure
checks to be sure that it does not have more than MaxChildren that it does not have more than MaxChildren children already bound to
children already bound to it for this session. If it can accept it for this session. If it can accept the child, it sends back a
the child, it sends back a BindConfirm message. Otherwise, it BindConfirm message. Otherwise, it sends the node a BindReject
sends the node a BindReject message. Then the parent checks to see message. Then the parent checks to see if it is already a member of
if it is already a member of this data session. If it is not yet a this Data Session. If it is not yet a member of this session, it
member of this session, it attempts to join the tree itself. attempts to join the tree itself.
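The parent-side handling described above might be sketched as follows. The callables accept_child, is_member, and join_tree are hypothetical stand-ins for the Tree BB interface and local session state, and the sketch assumes that a parent holds at most MaxChildren children per session.

```python
def handle_bind_request(child, children, max_children,
                        accept_child, is_member, join_tree):
    """Process one BindRequest and return the reply message type.

    children is the set of children already bound for this session.
    accept_child(child) models the Tree BB approval (e.g. loop check).
    """
    if accept_child(child) and len(children) < max_children:
        children.add(child)
        reply = "BindConfirm"
    else:
        reply = "BindReject"
    # After replying, the parent joins the tree itself if it has not yet.
    if not is_member():
        join_tree()
    return reply
```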
The BindConfirm message contains the lowest sequence number that The BindConfirm message contains the lowest Sequence Number that the
the repair head has available. If this number is 0 or 1, then the Repair Head has available. If this number is 0, then the Repair Head
repair head has all of the data available from the start of the has all of the data available from the start of the session.
session. Otherwise, the requesting node is attempting a late join, Otherwise, the requesting node is attempting a late join, and can
and can only use this repair head if late join was allowed by the only use this Repair Head if late join was allowed by the PI. If
PI. If late join is not allowed, the node may try another repair late join is not allowed, the node may try another Repair Head, or
head, or give up. give up.
Similarly, if a failure recovery occurs, when a node tries to bind Similarly, if a failure recovery occurs, when a node tries to bind to
to a new repair head, it must follow the same rules as for a late a new Repair Head, it must follow the same rules as for a late join.
join. See section 7.1.5. See Fault Recovery, below.
7.1.3 Unbind 4.1.3 Unbind
A child may decide to leave a data session for the following A child may decide to leave a Data Session for the following reasons.
reasons. 1) It detects that the data session is finished. 2) The 1) It detects that the Data Session is finished. 2) The application
application requests to leave the data session. 3) It is not able requests to leave the Data Session. 3) It is not able to keep up
to keep up with the data rate of the data session. When any of with the data rate of the Data Session. When any of these conditions
these conditions occurs, it initiates an Unbind process. occurs, it initiates an Unbind process.
An Unbind is, like the Bind function, a simple request-reply An Unbind is, like the Bind function, a simple request-reply
protocol. Unlike the Bind function, it only has a single response, protocol. Unlike the Bind function, it only has a single response,
UnbindConfirm. With this exception, the Unbind operation uses the UnbindConfirm. With this exception, the Unbind operation uses the
same state variables and reliability algorithms as the Bind same state variables and reliability algorithms as the Bind function.
function.
When a child receives an UnbindConfirm message from its parent, it When a child receives an UnbindConfirm message from its parent, it
reports a LEFT_DATA_SESSION_GRACEFULLY event. If it does not reports a LEFT_DATA_SESSION_GRACEFULLY event. If it does not receive
receive this message after NUM_MAX_PARENT_ATTEMPTS, then it reports this message after NUM_MAX_PARENT_ATTEMPTS, then it reports a
a LEFT_DATA_SESSION_ABNORMALLY event. Unbinds are reported to the LEFT_DATA_SESSION_ABNORMALLY event. Unbinds are reported to the Tree
Tree BB using the lostSN interface. BB using the lostSN interface.
7.1.4 Eject 4.1.4 Eject
A parent may decide to remove one or more of its children from a A parent may decide to remove one or more of its children from a data
data stream for the following reasons. 1) The parent needs to stream for the following reasons. 1) The parent needs to leave the
leave the group due to application reasons. 2) The repair head group due to application reasons. 2) The Repair Head detects an
detects an unrecoverable failure with either its parent or the unrecoverable failure with either its parent or the Sender. 3) The
sender. 3) The parent detects that the child is not able to keep parent detects that the child is not able to keep up with the speed
up with the speed of the data stream. 4) The parent is not able to of the data stream. 4) The parent is not able to handle the load of
handle the load of its children and needs some of them to move to its children and needs some of them to move to another parent. In
another parent. In the first two cases, the parent needs to the first two cases, the parent needs to multicast the advertisement
multicast the advertisement of the termination of one or more data of the termination of one or more Data Sessions to all of its
sessions to all of its children. In the second two cases, it needs children. In the second two cases, it needs to send one or more
to send one or more unicast notifications to one or more of its unicast notifications to one or more of its children.
children.
Consequently, an Eject can be done either with a repeated multicast Consequently, an Eject can be done either with a repeated multicast
advertisement message to all children, or a set of unicast request- advertisement message to all children, or a set of unicast request-
reply messages to the subset of children that it needs to go to. reply messages to the subset of children that it needs to go to.
For the multicast version of Eject, the parent sends a multicast For the multicast version of Eject, the parent sends a multicast
UnbindRequest message to all of its children for a given Data UnbindRequest message to all of its children for a given Data
Session, on its Local Multicast Channel. It is only necessary to Session, on its Local Multicast Channel. It is only necessary to
provide statistical reliability on this message, since children provide statistical reliability on this message, since children will
will detect the parent's failure even if the message is not detect the parent's failure even if the message is not received.
received. Therefore, the UnbindRequest message is sent Therefore, the UnbindRequest message is sent
FAILURE_DETECTION_REDUNDANCY times. FAILURE_DETECTION_REDUNDANCY times.
For the unicast version of Eject, the parent sends a unicast For the unicast version of Eject, the parent sends a unicast
UnbindRequest message to all of its children. Each of them respond UnbindRequest message to all of its children. Each of them responds
with an EjectConfirm. Reliability is ensured through the same with an EjectConfirm. Reliability is ensured through the same
request-reply mechanism as the Bind operation. request-reply mechanism as the Bind operation.
Ejections are reported to the Tree BB using the removeChild Ejections are reported to the Tree BB using the removeChild
interface. interface.
7.1.5 Fault Detection 4.1.5 Fault Detection
There are three cases where fault detection is needed. 1) There are three cases where fault detection is needed. 1) Detection
Detection (by a child) that a parent has failed. 2) Detection (by (by a child) that a parent has failed. 2) Detection (by a parent)
a parent) that a child has failed. 3) Detection (by either a that a child has failed. 3) Detection (by either a Repair Head or
Repair Head or Receiver) that a Sender has failed. Receiver) that a Sender has failed.
In order to be scalable and efficient, fault detection is In order to be scalable and efficient, fault detection is primarily
primarily accomplished by periodic keep-alive messages, combined accomplished by periodic keep-alive messages, combined with the
with the existing TRACK messages. Nodes expect to see keep-alive existing TRACK messages. Nodes expect to see keep-alive messages
messages every set period of time. If more than a fixed number of every set period of time. If more than a fixed number of periods go
periods go by, and no keep-alive messages of a given type are by, and no keep-alive messages of a given type are received, the node
received, the node declares a preliminary failure. The detecting declares a preliminary failure. The detecting node may then ping the
node may then ping the potentially failed node before declaring it potentially failed node before declaring it failed, or it can just
failed, or it can just declare it failed. declare it failed.
Failures are detected through three keep-alive messages: Failures are detected through three keep-alive messages: Heartbeat,
Heartbeat, TRACK, and NullData. The Heartbeat message is multicast TRACK, and NullData. The Heartbeat message is multicast periodically
periodically from a parent to its children on its local control from a parent to its children on its Local Control Channel. NullData
channel. NullData messages are multicast by a sender on the global messages are multicast by a Sender on the Data Control Channel when
multicast address when it has no data to send. TRACK messages are it has no data to send. TRACK messages are generated periodically,
generated periodically, even if no data is being sent to a data even if no data is being sent to a Data Session, as described in
session, as described in section 7.2. section 7.2.
Heartbeat messages are multicast every HeartbeatPeriod seconds, Heartbeat messages are multicast every HeartbeatPeriod seconds, from
from a parent to its children. Every time that a parent sends a a parent to its children. Every time that a parent sends a
Retransmission message or a Heartbeat message (as well as at Retransmission message or a Heartbeat message (as well as at
initialization time), it resets a timer for HeartbeatPeriod initialization time), it resets a timer for HeartbeatPeriod seconds.
seconds. If the timer goes off, a Heartbeat is sent. The If the timer goes off, a Heartbeat is sent. The HeartbeatPeriod is
HeartbeatPeriod is dynamically computed as follows: dynamically computed as follows:
interval = AckWindow / PacketRate interval = AckWindow / MessageRate
HeartbeatPeriod = 2 * interval HeartbeatPeriod = 2 * interval
Global configuration parameters ConstantHeartbeatPeriod and Global configuration parameters ConstantHeartbeatPeriod and
MinimumHeartbeatPeriod can be used to either set HeartbeatPeriod to MinimumHeartbeatPeriod can be used to either set HeartbeatPeriod to a
a constant, or give HeartbeatPeriod a lower bound, globally. constant, or give HeartbeatPeriod a lower bound, globally.
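The HeartbeatPeriod computation can be illustrated with a short sketch. The function name and the use of None to mean "parameter not configured" are assumptions made for the example, as is the choice to let ConstantHeartbeatPeriod take precedence over MinimumHeartbeatPeriod when both are set.

```python
def heartbeat_period(ack_window, message_rate,
                     constant_heartbeat_period=None,
                     minimum_heartbeat_period=None):
    """Return HeartbeatPeriod in seconds.

    ack_window is in messages; message_rate is in messages per second.
    """
    if constant_heartbeat_period is not None:
        # A globally configured constant overrides the dynamic value.
        return constant_heartbeat_period
    interval = ack_window / message_rate
    period = 2 * interval
    if minimum_heartbeat_period is not None:
        # Apply the globally configured lower bound.
        period = max(period, minimum_heartbeat_period)
    return period
```

For example, with an AckWindow of 32 messages and a rate of 64 messages per second, the interval is 0.5 seconds and HeartbeatPeriod is 1 second.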
Similarly, a NullData message is multicast by the sender to all Similarly, a NullData message is multicast by the Sender to all Data
data session members, every NULL_DATA_PERIOD. The NullData timer Session members, every NULL_DATA_PERIOD. The NullData timer is set
is set to NULL_DATA_PERIOD, and is reset every time that a Data or to NULL_DATA_PERIOD, and is reset every time that a Data or NullData
NullData message is sent by the Sender. message is sent by the Sender.
The key parameter for failure detection is the global tree The key parameter for failure detection is the global tree parameter
parameter FAILURE_DETECTION_REDUNDANCY. The higher the value for FAILURE_DETECTION_REDUNDANCY. The higher the value for this
this parameter, the more keep-alive messages that must be missed parameter, the more keep-alive messages that must be missed before a
before a failure is declared. failure is declared.
A major goal of failure detection is for children to detect parent A major goal of failure detection is for children to detect parent
failures fast enough that there is a high probability they can failures fast enough that there is a high probability they can rejoin
rejoin the stream at another parent, before flow control has the stream at another parent, before flow control has advanced the
advanced the buffer window to a point where the child can not buffer window to a point where the child can not recover all lost
recover all lost messages in the stream. In order to attempt to do messages in the stream. In order to attempt to do this, children
this, children detect a failure of a parent if detect a failure of a parent if FAILURE_DETECTION_REDUNDANCY *
FAILURE_DETECTION_REDUNDANCY * HeartbeatPeriod time goes by without HeartbeatPeriod time goes by without any heartbeats. As part of
any heartbeats. As part of buffer window advancement, described in buffer window advancement, all parents MAY choose to buffer all
section 7.2.4, all parents MAY choose to buffer all messages for a messages for a minimum of FAILURE_DETECTION_REDUNDANCY * 2 *
minimum of FAILURE_DETECTION_REDUNDANCY * 2 * HeartbeatPeriod HeartbeatPeriod seconds, which gives children a period of time to
seconds, which gives children a period of time to find a new parent find a new parent before the buffers are freed. Children report
before the buffers are freed. Children report parent failures to parent failures to the Tree BB using the lostSN interface.
the Tree BB using the lostSN interface.
A parent detects a preliminary failure of one of its children if it A parent detects a preliminary failure of one of its children if it
does not receive any TRACK messages from that child in does not receive any TRACK messages from that child in
FAILURE_DETECTION_REDUNDANCY * TrackTimeout seconds (see discussion FAILURE_DETECTION_REDUNDANCY * TrackTimeout seconds (see discussion
of how TrackTimeout is computed in 7.2.1). Because a failed child of how TrackTimeout is computed below). Because a failed child can
can slow down the group's progress, it is very important that a slow down the group's progress, it is very important that a parent
parent resolve the child's status quickly. Once a parent declares resolve the child's status quickly. Once a parent declares a
a preliminary failure of a child, it issues a set of up to preliminary failure of a child, it issues a set of up to
FAILURE_DETECTION_REDUNDANCY Heartbeat messages that are unicast FAILURE_DETECTION_REDUNDANCY Heartbeat messages that are unicast (or
(or multicast) to the failed receiver(s). These messages are multicast) to the failed Receiver(s). These messages are spaced
spaced apart by 2*LocalRTT, where LocalRTT is the round trip time apart by 2*LocalRTT, where LocalRTT is the round trip time that has
that has been measured to the child in question (see 7.4 for been measured to the child in question (see below for description of
description of how LocalRTT is measured). These Heartbeat messages how LocalRTT is measured). These Heartbeat messages contain a
contain a ChildrenList field that contains the children who are ChildrenList field that contains the children who are requested to
requested to send a TRACK immediately. send a TRACK immediately.
Whenever a child receives a Heartbeat message with an Whenever a child receives a Heartbeat message where the child is
ImmediateTRACK field set to 1, it immediately sends a TRACK to its identified in the ChildrenList field, it immediately sends a TRACK to
parent. If a parent does not receive a TRACK message from a child its parent. If a parent does not receive a TRACK message from a
after waiting a period of 2*ChildRTT after the last Heartbeat child after waiting a period of 2*LocalRTT after the last Heartbeat
message to that child, it declares the child failed, and removes it message to that child, it declares the child failed, and removes it
from the parent's child membership list. It informs the Tree BB from the parent's child membership list. It informs the Tree BB using
using the removeChild interface. the removeChild interface.
A child or a repair head detects the failure of a sender if it does A child or a Repair Head detects the failure of a Sender if it does
not receive a Data or NullData message from a sender in not receive a Data or NullData message from a Sender in
FAILURE_DETECTION_REDUNDANCY * NULL_DATA_PERIOD. FAILURE_DETECTION_REDUNDANCY * NULL_DATA_PERIOD.
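The three detection rules above share one shape: a failure is suspected once a node has been silent for FAILURE_DETECTION_REDUNDANCY periods. A minimal sketch, assuming timestamps in seconds from a monotonic clock and hypothetical function names:

```python
def silence_exceeded(now, last_seen, period, redundancy):
    """True if more than redundancy * period seconds passed since last_seen."""
    return (now - last_seen) > redundancy * period


def parent_failed(now, last_heartbeat, heartbeat_period, redundancy):
    # Child side: no Heartbeat for FAILURE_DETECTION_REDUNDANCY periods.
    return silence_exceeded(now, last_heartbeat, heartbeat_period, redundancy)


def child_preliminary_failure(now, last_track, track_timeout, redundancy):
    # Parent side: no TRACK from the child; probing with Heartbeats follows.
    return silence_exceeded(now, last_track, track_timeout, redundancy)


def sender_failed(now, last_data_or_null, null_data_period, redundancy):
    # Repair Head or Receiver side: no Data or NullData from the Sender.
    return silence_exceeded(now, last_data_or_null, null_data_period, redundancy)
```

Note that the child case yields only a preliminary failure; the parent still solicits a TRACK through Heartbeat messages before removing the child.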
Note that the more receivers there are in a tree, and the higher Note that the more Receivers there are in a tree, and the higher the
the loss rate, the larger FAILURE_DETECTION_REDUNDANCY must be, in loss rate, the larger FAILURE_DETECTION_REDUNDANCY must be, in order
order to give the same probability that erroneous failures won't be to give the same probability that erroneous failures won't be
declared. declared.
7.1.6 Fault Notification 4.1.6 Fault Notification
When a parent detects the failure of a child, it adds a failure When a parent detects the failure of a child, it adds a failure
notification field to the next TRACK messages that it sends up the notification field to the next TRANSMISSION_REDUNDANCY TRACK messages
tree. It sends this notification multiple times because TRACKs are that it sends up the tree. It sends this notification multiple times
not delivered reliably. A failure notification field includes the because TRACKs are not delivered reliably. A failure notification
failure code, as well as a list of one or more failed nodes. field includes the failure code, as well as a list of one or more
Failure notifications are aggregated up the tree, according to the failed nodes. Failure notifications are aggregated up the tree and
rules in 7.3. A failure notification is not a definitive report of delivered to the Sender. A failure notification is not a definitive
a failure, as the child may have moved to a different repair head. report of a node failure, as the child may have detected a
communication failure with its parent and moved to a different Repair
Head.
7.1.7 Fault Recovery 4.1.7 Fault Recovery
The Fault Recovery algorithms require a list of one or more The Fault Recovery algorithms require a list of one or more addresses
addresses of alternate parents that can be bound to, and that still of alternate parents that can be bound to, and that still provide
provide loop free operation. loop free operation.
If a child detects the failure of its parent, it then re-runs the If a child detects the failure of its parent, it then re-runs the
Bind operation to a new parent candidate, in order to rejoin the Bind operation to a new parent candidate, in order to rejoin the
tree. As described above in section 7.1.2, a node may perform a tree. A node may perform a late join, i.e. binding with a Repair
late join, i.e. binding with a repair head which cannot provide all Head which cannot provide all the necessary repair data, only if
the necessary repair data, only if allowed by the PI. allowed by the PI.
7.2 TRACK Generation
This section describes the algorithms used by the receiver to
determine when to send the TRACK messages.
TRACK messages are sent from receivers to their parents. TRACK
messages may be sent for the following purposes:
- to request retransmission of messages
- to advance the sender's transmission window for flow control
purposes
- to deliver end-to-end confirmation of data reception
- to propagate other relevant feedback information up through the
session (such as RTT and loss reports, for congestion control)
The TRACK PI also makes use of the NACK BB, which requests 4.1.8 Distributed Membership.
retransmission of messages from a parent. The TRACK request and
response algorithms should be highly similar to the NACK algorithms
for this specific case.
7.2.1 TRACK Generation with the Rotating TRACK Algorithm Each Repair Head is responsible for maintaining a set of state
variables on the status of its children. Unlike the Generic Router
Assist, this is hard state, that only is removed when a child leaves
that Repair Head gracefully, or after the Repair Head detects that a
child has failed. These variables MUST include, but are not
necessarily limited to, the following:
- ChildID. This is the two byte identifier assigned to the Child by
the Repair Head. This uniquely identifies this Child to this
Repair Head, but has no meaning outside that scope.
- GlobalChildIdentifier. This is the globally unique identifier for
this Child.
- ChildRTT. This is the weighted average of the local RTT to this
Child.
- LastTRACK. This is the contents of the last TRACK message sent
from this Child, if any, not including options.
- LastApplicationLevelConfirmation. This is the contents of the last
Application Level Confirmation sent from this Child, if any.
- LastStatistics. This is the contents of the last Statistics
message sent from this Child, if any.
- ChildLiveness. This is a set of variables that keep track of the
liveness of each child. This includes the last time a TRACK
message was received from this child, as well as the number of
Heartbeat messages that have been directed at it, and the time at
which the last Heartbeat message was sent to the child. Please see
Fault Detection, above, for more details.
Each receiver sends a TRACK message to its parent once per 4.2 Data Sessions.
AckWindow of data messages received. A receiver uses an offset
from the boundary of each AckWindow to send its TRACK, in order to
reduce burstiness of control traffic at the parents. Each parent
has a maximum number of children, MaxChildren. When a child binds
to the parent, the parent assigns a locally unique ChildID to that
child, between 0 and MaxChildren-1.
Each child in a tree generates a TRACK message at least once every 4.2.1 Data Transmission and Retransmission
AckWindow of data messages, when the most recent data message's
sequence number, modulo AckWindow, is equal to MemberID. If the
message that would have triggered a given TRACK for a given node is
missed, the node will generate the TRACK as soon as it learns that
it has missed the message, typically through receipt of a higher
numbered data message.
Together, AckWindow and MaxChildren determine the maximum ratio of Data is multicast by a Sender on the Data Multicast Address via the
control messages to data messages seen by each parent, given a Data Channel Protocol. The Data Channel Protocol is responsible for
constant load of data messages. taking care of as many retransmissions as possible, and for ensuring
the goodput of the Data Session. TRACK is then responsible for
providing OPTIONAL flow control and application level reliability.
The mechanics of an application level confirmation of delivery are
handled by TRACK, including keeping track of the distributed
membership list of receivers and aggregating acknowledgements up the
Control Tree. Please see below for more details on flow control and
application level confirmation.
In each data message, the sender advertises the current PacketRate A common scenario for handling recovery of lost messages is to allow
(measured in messages per second) it is sending data at. This rate the Data Channel Protocol to provide statistical reliability, and
is generated by the congestion control algorithms in use at the then allow TRACK to provide retransmissions for more persistent
sender. failure cases, such as if a Receiver is not able to receive any Data
messages for a few minutes.
At the time a node sends a regular TRACK, it also computes a Retransmissions of data messages may be multicast by the Sender on
TRACKTimeout value: the Data Multicast Address or be multicast on a Local Control Channel
by a Repair Head.
interval = AckWindow / PacketRate A Repair Head joins all of the Data Multicast Addresses that any of
its descendants have joined. A Repair Head is responsible for
receiving and buffering all data messages using the reliability
semantics configured for a stream. As a simple to implement option,
a Repair Head MAY also function as a Receiver, and pass these data
messages to an attached application.
TRACKTimeout = 2 * interval For additional fault tolerance, a Receiver MAY subscribe to the
multicast address associated with the Local Control Channel of one or
more Repair Heads in addition to the multicast address of its parent.
In this case it does not bind to this Repair Head or Sender, but will
process Retransmission messages sent to this address. If the
Receiver's Repair Head fails and it transfers to another Repair Head,
this minimizes the number of data messages it needs to recover after
binding to the new Repair Head.
If no TRACKs are sent within TRACKTimeout interval, a TRACK is 4.2.2 Local Retransmission
generated, and TRACKTimeout is increased by a factor of 2, up to a
value of MAX_TRACK_TIMEOUT.
This timer mechanism is used by a receiver to ensure timely repair If a Repair Head or Sender determines from its child nodes' TRACK
of lost messages and regular feedback propagation up the tree even messages that a Data message was missed, the Repair Head retransmits
when the sender is not sending data continuously. This mechanism the Data message. The Repair Head or Sender multicasts the
complements the AckWindow-based regular TRACK generation mechanism. Retransmission message on its multicast Local Control Channel. In
the event that a Repair Head receives a retransmission and knows that
its children need this repair, it re-multicasts the retransmission to
its children.
7.2.2 Local Repair The scope of retransmission (the multicast TTL) is considered part of
the Control Channel's multicast address, and is derived during tree
configuration.
A repair head maintains the following state for each of its A Repair Head maintains the following state for each of its children,
children, for the purpose of providing repair service to the local for the purpose of providing repair service to the local group:
group:
- HighestConsecutivelyReceived: a sequence number indicating all - HighestConsecutivelyReceived. A Sequence Number indicating all
Data messages up to this number (inclusive) have been received Data messages up to this number (inclusive) that have been received
by a given child. by a given child.
- MissingPackets: a data structure to keep track of the reception - MissingMessages. A data structure to keep track of the reception
status of the Data messages with sequence number higher than status of the Data messages with Sequence Number higher than
HighestConsecutivelyReceived. HighestConsecutivelyReceived.
In addition, a repair head also maintains other state for purposes The minimum HighestConsecutivelyReceived value of all its children is
of feedback aggregation described in the next section. kept as the variable LocalStable.
The minimum HighestConsecutivelyReceived value of all its children A Repair Head also maintains a retransmission buffer. The size of the
is kept as the variable LocalStable. retransmission buffer MUST be greater than the maximum value of a
Sender's transmission window. The retransmission buffer MUST keep all
the Data messages received by the Repair Head with Sequence Number
higher than LocalStable, and optionally some messages with Sequence
Number lower than LocalStable if there is room (beyond the maximum
value of the Sender's transmission window). The latter messages are
kept in the retransmission buffer in case a Receiver from another
group loses its parent and needs to join this group.
A repair head also maintains a retransmission buffer. The size of As TRACK messages are received, the Repair Head updates the above
the retransmission buffer must be greater than the maximum value of state variables.
a sender's transmission window. The retransmission buffer must keep
all the Data messages received by the repair head with sequence
number higher than LocalStable, optionally some messages with
sequence number lower than LocalStable if there is room (beyond the
maximum value of sender's transmission window). The latter
messages are kept in the retransmission buffer in case a receiver
from another group loses its parent and needs to join this group.
As TRACK messages are received, the repair head updates the above To perform local repair, a Repair Head implements a retransmission
states. queue with memory. Each lost message is entered into the
retransmission queue in increasing order according to its Sequence
Number. If the same Data message has already been retransmitted
recently (recognized due to the queue's memory) it is delayed by the
local group RTT (see roundtrip time measurement) before
retransmission.
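The retransmission queue with memory described above can be sketched as follows. This is an illustration only, not part of the building block. The class and method names are hypothetical, "recently" is assumed to mean "within one local group RTT", and times are integer milliseconds supplied by the caller.

```python
class RetransmissionQueue:
    """Illustrative retransmission queue with memory (names are hypothetical)."""

    def __init__(self, local_rtt_ms):
        self.local_rtt_ms = local_rtt_ms
        self._not_before = {}  # seq -> earliest time the repair may be sent
        self._last_sent = {}   # seq -> time of the last retransmission (the "memory")

    def report_loss(self, seq, now_ms):
        """Enter a lost message, delaying it if it was retransmitted recently."""
        if seq in self._not_before:
            return  # already queued, nothing to do
        last = self._last_sent.get(seq)
        # Assumption: "recently" means less than one local group RTT ago.
        recently = last is not None and now_ms - last < self.local_rtt_ms
        self._not_before[seq] = now_ms + self.local_rtt_ms if recently else now_ms

    def pop_ready(self, now_ms):
        """Return the sequence numbers due for retransmission, in increasing order."""
        ready = sorted(s for s, t in self._not_before.items() if t <= now_ms)
        for s in ready:
            del self._not_before[s]
            self._last_sent[s] = now_ms
        return ready
```

A loss reported again shortly after its repair was sent is held back for one local RTT, which gives the earlier repair time to arrive before a duplicate is generated.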
To perform local repair, a repair head implements a retransmission Retransmissions MUST NOT be sent at a faster rate than the current
queue with memory. Each lost message (reported by a child using TransmissionRate advertised by the Sender.
the BitMask field) is entered into the retransmission queue in
increasing order according to its sequence number. If the same Data
message has already been retransmitted recently (recognized due to
the queue's memory) it is delayed by the local group RTT (see
roundtrip time measurement) before retransmission.
The retransmissions are sent using the same PacketRate as that used 4.2.3 Flow and Rate Control
by the sender.
7.2.3 Flow Control Window Update TRACK offers the ability to limit the rate of Data traffic, through
both flow control and rate limits.
When a receiver sends a TRACK to its parent, the HighestAllowed When a Receiver sends a TRACK to its parent, the HighestAllowed field
field provides information on the status of the receiver's flow provides information on the status of the Receiver's flow control
control window. The value of HighestAllowed is computed as window. The value of HighestAllowed is computed as follows:
follows:
HighestAllowed = seqnum + ReceiverWindow HighestAllowed = seqnum + ReceiverWindow
Where seqnum is the highest sequence number of consecutively Where seqnum is the highest Sequence Number of consecutively received
received data messages at the receiver. The size of the data messages at the Receiver. The size of the ReceiverWindow may
ReceiverWindow may either be based on a parameter local to the either be based on a parameter local to the Receiver or be a global
receiver or be a global parameter. parameter.
7.2.4 Reliability Window If flow control is enabled for a given Data Session, then a Sender
MUST NOT send any Data messages to the Data Channel Protocol that are
higher than the current value for HighestAllowed that it has. On
startup, HighestAllowed is initialized to ReceiverWindow.
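As an illustration of the flow control rule, the following minimal sketch computes HighestAllowed at a Receiver and applies the send check at the Sender. The function names are hypothetical, and Sequence Number wraparound is ignored for brevity.

```python
def highest_allowed(seqnum, receiver_window):
    """HighestAllowed as reported by a Receiver in a TRACK.

    seqnum is the highest Sequence Number of consecutively received
    data messages at the Receiver.
    """
    return seqnum + receiver_window


def sender_may_send(seq, current_highest_allowed):
    """A Sender must not send Data messages higher than HighestAllowed."""
    return seq <= current_highest_allowed
```

On startup the Sender would use ReceiverWindow itself as the value of current_highest_allowed, as stated above.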
The sender and each repair head maintain a window of messages for In addition, the Sender application MAY provide minimum and maximum
possible retransmission. As messages are acknowledged by all of rate limits. Unless overridden by the Data Channel Protocol, a
its children, they are released from the parent's retransmission Sender will not offer Data messages to the Data Channel Protocol at
buffer, as described in 7.2.2. In addition, there are two global lower than MinimumDataRate (except possibly during short periods of
parameters that can affect when a parent releases a data message time when certain slow Receivers are being ejected), or higher than
from the retransmission buffer -- MinHoldTime, and MaxHoldTime. MaximumDataRate. If a Receiver is not able to keep up with the
minimum rate for a period of time, it SHOULD leave the group
promptly. Receivers that leave the group MAY attempt to rejoin the
group at a later time, but SHOULD NOT attempt an immediate
reconnection.
MinHoldTime specifies a minimum length of time a message must be 4.2.4 Reliability Window
held for retransmission from when it was received. This parameter
is useful to handle scenarios where one or more children have been The Sender and each Repair Head maintain a window of messages for
possible retransmission. As messages are acknowledged by all of its
children, they are released from the parent's retransmission buffer,
as described in 4.2.2. In addition, there are two global parameters
that can affect when a parent releases a data message from the
retransmission buffer -- MinHoldTime, and MaxHoldTime.
MinHoldTime specifies a minimum length of time a message must be held
for retransmission from when it was received. This parameter is
useful to handle scenarios where one or more children have been
disconnected from their parent, and have to reconnect to another. disconnected from their parent, and have to reconnect to another.
If, for example, MinHoldTime is set to FAILURE_DETECTION_REDUNDANCY If, for example, MinHoldTime is set to FAILURE_DETECTION_REDUNDANCY *
* 2 * ConstantHeartbeatPeriod, then there is a high likelihood that 2 * ConstantHeartbeatPeriod, then there is a high likelihood that any
any child will be able to recover any lost messages after child will be able to recover any lost messages after reconnecting to
reconnecting to another parent. another parent.
The sender continually advertises to the members of the data The Sender continually advertises to the members of the Data Session
session both edges of its retransmission window. The higher value both edges of its retransmission window. The higher value is the
is the SeqNum field in each Data or NullData message, which SeqNum field in each Data or NullData message, which specifies the
specifies the highest sequence number of any data message sent. highest Sequence Number of any data message sent. The trailing edge
The trailing edge of the window is advertised in the of the window is advertised in the HighestReleased field. This
HighestReleased field. This specifies the largest sequence number specifies the largest Sequence Number of any message sent that has
of any message sent that has subsequently been released from the subsequently been released from the Sender's retransmission window.
sender's retransmission window. If both values are the same then If both values are the same then the window is presently empty. Zero
the window is presently empty. Zero is not a legitimate value for is not a legitimate value for a data Sequence Number, so if either
a data sequence number, so if either field has a value of zero, field has a value of zero, then no messages have yet reached that
then no messages have yet reached that state. All sequence number state. All Sequence Number fields use Sequence Number arithmetic so
fields use sequence number arithmetic so that a data session can that a Data Session can continue after exhausting the Sequence Number
continue after exhausting the sequence number space. space.
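The window edges and Sequence Number arithmetic can be illustrated with the sketch below. The field width is not given in this section, so a 32-bit Sequence Number and a serial-number style comparison are assumptions made purely for illustration. All function names are hypothetical.

```python
SEQ_BITS = 32                 # assumption: field width is not stated in this section
SEQ_MOD = 1 << SEQ_BITS
HALF = 1 << (SEQ_BITS - 1)


def seq_next(seq):
    """Next Sequence Number, skipping zero because zero is not a legitimate value."""
    nxt = (seq + 1) % SEQ_MOD
    return 1 if nxt == 0 else nxt


def seq_less(a, b):
    """True if a precedes b under wraparound (serial-number style comparison)."""
    return a != b and ((b - a) % SEQ_MOD) < HALF


def window_is_empty(seq_num, highest_released):
    """The retransmission window is empty when both advertised edges are equal."""
    return seq_num == highest_released
```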
When a member of a data session receives an advertisement of a new When a member of a Data Session receives an advertisement of a new
HighestReleased value, it stores this, and is no longer allowed to HighestReleased value, it stores this, and is no longer allowed to
ask for retransmission for any messages up to and including the ask for retransmission for any messages up to and including the
HighestReleased value. If it has any outstanding missing messages HighestReleased value. If it has any outstanding missing messages
that are less than or equal to HighestReleased, it MAY move forward that are less than or equal to HighestReleased, it MAY move forward
and continue delivering the next data messages in the stream. It and continue delivering the next data messages in the stream. It
also SHOULD report an error for the messages that are no longer also SHOULD report an error for the messages that are no longer
recoverable. recoverable.
MaxHoldTime specifies the maximum length of time a message may be MaxHoldTime specifies the maximum length of time a message may be
held for retransmission. This parameter is set at the sender which held for retransmission. This parameter is set at the Sender which
uses it to set the HighestReleased field in data message headers. uses it to set the HighestReleased field in data message headers.
This is particularly useful for real-time, semi-reliable streams This is particularly useful for real-time, semi-reliable streams such
such as live video, where retransmissions are only useful for up to as live video, where retransmissions are only useful for up to a few
a few seconds. When combined with Unordered delivery semantics, seconds. When combined with Unordered delivery semantics, and
and application-level jitter control at the receivers, this application-level jitter control at the Receivers, this provides Time
provides Time Bounded Reliability. Obviously, MaxHoldTime must Bounded Reliability. MaxHoldTime MUST always be larger than
always be larger than MinHoldTime. MinHoldTime.
7.2.5 Confirmed Delivery 4.2.5 Ordering Semantics
Flow control and the reliability window are concerned with goodput, TRACK offers two flavors of ordering semantics: Ordered or Unordered.
of delivering data with a high probability that it is delivered at One of these is selected on a per session basis as part of the
all receivers. However, neither mechanism provides explicit Session Configuration Parameters.
confirmation to the sender as to the list of recipients for each
message. Confirmed delivery allows applications to determine the
set of applications that have received a set of data messages.
To request this service, a sender fills the AppSynch field of data Unordered service provides a reliable stream of messages, without
messages with the sequence number of the highest data message it duplicates, and delivers them to the application in the order
wishes to confirm delivery of. It continues to do so until it received. This allows the lowest latency delivery for time sensitive
receives confirmation, moves the AppSynch point forward to a higher applications. It may also be used by applications that wish to
sequence number, or declares an error. provide their own jitter control.
When a receiver gets a data message with a non-zero AppSynch field, Ordered service provides TCP semantics on delivery. All messages are
it starts including the highest sequence number that has been delivered in the order sent, without duplicates.
acknowledged by the application in the ApplicationConfirms field of
each TRACK message that it sends up the tree. In order to provide
reliable delivery of this acknowledgement, this continues so long
as a receiver gets data messages with non-zero AppSynch fields.
Each receiver is responsible for locally deciding the value of the 4.2.6 Retransmission Requests.
ApplicationConfirms field. There are two primary issues a receiver
must consider in setting this field: the reliability semantics of
the data stream, and when a given message is considered confirmed
at the receiver. As this is an application level confirmation, a
handshake with the application is required to get this
confirmation.
One example of how an application can implicitly signal A Receiver detects that it has missed one or more Data messages by
confirmation of delivery is through the freeing of buffers passed gaps in the sequence numbers of received messages. Each Receiver
to it by the transport. The API could specify that whenever an keeps track of HighestSequenceNumber, the highest sequence number
application has freed up a buffer containing one or more data known of for a Data Session, as observed from Data, RData, and
messages, then these messages are considered acknowledged by the NullData messages. Any sequence numbers between HighestReleased and
application. Alternatively, the application could be required to HighestSequenceNumber that have not been received are assumed to be
explicitly acknowledge each message. missing.
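Gap detection at a Receiver can be sketched as follows. The function name is hypothetical, wraparound is ignored, and the range is interpreted as excluding HighestReleased (which can no longer be requested) and including HighestSequenceNumber (which may be known only from a NullData message).

```python
def missing_sequence_numbers(highest_released, highest_sequence_number, received):
    """Return the Sequence Numbers assumed to be missing, in increasing order.

    received is the set of Sequence Numbers of Data or RData messages
    that have actually arrived.
    """
    return [
        seq
        for seq in range(highest_released + 1, highest_sequence_number + 1)
        if seq not in received
    ]
```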
With a given transport-application API for signaling When a Receiver detects missing messages it MAY send off a request
acknowledgement, the transport then keeps track of all contiguous for retransmission, if local retransmission is enabled. It does this
acknowledgements from that application, and reports these up in the by sending a Retransmission Request message. The timing of this
ApplicationConfirms field. If one or more messages can not be request is described below.
acknowledged, the receiver should pass an error code describing the
type of failure that occurred, and the sequence number of the first
message that has not yet been delivered.
If MaxHoldTime is not in use for a data stream, so that delivery is 4.2.7 End Of Stream.
fully reliable, then any message that can not be delivered will be
considered a fatal error for that receiver. If MaxHoldTime has a
non-zero value, then any messages that could not be delivered, but
are less than HighestReleased as advertised by the sender, are not
reported as errors.
In addition to the AppSynch field, a sender may also set the When an application signals that a Data Session is complete, the
ImmediateACK field to 1. When a node gets a data message that has Sender advertises this to its children by setting the End of Session
this flag set, it will immediately send a TRACK after processing option on the last Data Message in the Data Session, as well as all
that message. subsequent retransmissions of that Data Message, and all subsequent
Null Data messages.
7.3 Feedback Aggregation The Sender SHOULD NOT leave the Data Session until its TRACK
reports indicate that all group members have left the Data
Session, or it has waited a period of at least
FAILURE_DETECTION_REDUNDANCY * TRACKTimeout seconds.
This section describes how repair heads perform aggregation on 4.3 Control Traffic Generation and Aggregation.
feedback information sent up in the fields of the TRACK message,
and the purposes for performing such aggregation. One of the largest challenges for scalable reliable multicast
There are many reasons for providing feedback from all the protocols has been that of controlling the potential explosion of
receivers to the sender in an aggregated form. The major ones are control traffic. There is a fundamental tradeoff between the latency
listed below: with which losses can be detected and repaired, and the amount of
control traffic generated by the protocol.
TRACK messages are the primary form of control traffic in this BB.
They are sent from Receivers and Repair Heads to their parents.
TRACK messages may be sent for the following purposes:
- to request retransmission of messages
- to advance the Sender's transmission window for flow control
purposes
- to deliver application level confirmation of data reception
- to propagate other relevant feedback information up through the
session (such as RTT and loss reports, for congestion control)
4.3.1 TRACK Generation with the Rotating TRACK Algorithm
Each Receiver sends a TRACK message to its parent once per AckWindow
of data messages received. A Receiver uses an offset from the
boundary of each AckWindow to send its TRACK, in order to reduce
burstiness of control traffic at the parents. Each parent has a
maximum number of children, MaxChildren. When a child binds to the
parent, the parent assigns a locally unique ChildID to that child,
between 0 and MaxChildren-1.
Each child in a tree generates a TRACK message at least once every
AckWindow of data messages, when the most recent data message's
Sequence Number, modulo AckWindow, is equal to MemberID. If the
message that would have triggered a given TRACK for a given node is
missed, the node will generate the TRACK as soon as it learns that it
has missed the message, typically through receipt of a higher
numbered data message.
Together, AckWindow and MaxChildren determine the maximum ratio of
control messages to data messages seen by each parent, given a
constant load of data messages.
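The rotating trigger can be sketched as below. The sketch assumes that the MemberID used in the trigger is the ChildID assigned by the parent, and it reduces the identifier modulo AckWindow so that the case MaxChildren > AckWindow still yields a trigger. Both points are assumptions, and the function names are hypothetical.

```python
def track_due(seq, ack_window, child_id):
    """True if the data message with this Sequence Number triggers a TRACK."""
    # Assumption: the child's identifier is reduced modulo AckWindow.
    return seq % ack_window == child_id % ack_window


def max_tracks_per_data_message(ack_window, max_children):
    """Upper bound on TRACKs seen by a parent per data message.

    Each child sends one TRACK per AckWindow data messages, so a parent
    with MaxChildren children sees at most MaxChildren / AckWindow.
    """
    return max_children / ack_window
```

With AckWindow = 8, a child with identifier 3 would send TRACKs on Sequence Numbers 3, 11, 19, and so on, while its siblings use the other offsets.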
In each data message, the Sender advertises the current MessageRate
(measured in messages per second) at which it is sending data. This rate
is generated by the congestion control algorithms in use at the
Sender.
At the time a node sends a regular TRACK, it also computes a
TRACKTimeout value:
interval = AckWindow / MessageRate
TRACKTimeout = 2 * interval
If no TRACKs are sent within the TRACKTimeout interval, a TRACK is
generated, and TRACKTimeout is increased by a factor of 2, up to a
value of MAX_TRACK_TIMEOUT.
This timer mechanism is used by a Receiver to ensure timely repair of
lost messages and regular feedback propagation up the tree even when
the Sender is not sending data continuously. This mechanism
complements the AckWindow-based regular TRACK generation mechanism.
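The timer computation and its backoff can be sketched as follows. The function names are hypothetical, and the value of MAX_TRACK_TIMEOUT is left to the caller because it is not defined in this section.

```python
def initial_track_timeout(ack_window, message_rate):
    """TRACKTimeout computed when a regular TRACK is sent (in seconds)."""
    interval = ack_window / message_rate  # time to receive one AckWindow of data
    return 2 * interval


def backoff_track_timeout(track_timeout, max_track_timeout):
    """Double TRACKTimeout after a timer-driven TRACK, capped at the maximum."""
    return min(2 * track_timeout, max_track_timeout)
```

For example, with AckWindow = 32 and MessageRate = 64 messages per second, the initial TRACKTimeout is 1 second, and it doubles on each expiry until the cap is reached.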
4.3.2 TRACK Aggregation.
There are many reasons for providing feedback from all the Receivers
to the Sender in an aggregated form. The major ones are listed
below:
1) End-to-end delivery confirmation. This confirmation tells the 1) End-to-end delivery confirmation. This confirmation tells the
sender that all the receivers (in the entire tree) have received Sender that all the Receivers (in the entire tree) have received data
data packets up to a certain sequence number. The field that messages up to a certain Sequence Number. This is carried in an
carries this information is AppSynch. Application Level Confirmation message.
2) Flow control. The aggregated information is carried in the 2) Flow control. The aggregated information is carried in the field
field HighestAllowed. It tells the sender the highest sequence HighestAllowed. It tells the Sender the highest Sequence Number that
number that all the receivers (in the entire tree) are prepared to all the Receivers (in the entire tree) are prepared to receive.
receive.
3) Identifying the slowest receiver. The aggregated information is 3) Congestion control feedback. Information about the state of the
carried in the field Slowest. The sender can use this value as tree can be passed up to help control the congestion control
part of congestion control. algorithms for the group.
4) Counting current membership in the group. This information is 4) Counting current membership in the group. This information is
carried in the field SubTreeCount. This lets the sender know the carried in the field SubTreeCount. This lets the Sender know the
number of receivers currently connected to the repair tree. number of Receivers currently connected to the repair tree.
5) Measuring the round-trip time from the sender to the "worst 5) Measuring the round-trip time from the Sender to the "worst"
receiver. Receiver.
A repair head maintains state for each child. Each time a TRACK A Repair Head maintains state for each child. Each time a TRACK
(from a child) is received, the corresponding states for that child (from a child) is received, the corresponding states for that child
are updated based on the information in the TRACK message. When a are updated based on the information in the TRACK message. When a
repair head sends a TRACK message to its parent, the following Repair Head sends a TRACK message to its parent, the following fields
fields of its TRACK message are derived from the aggregation of the of its TRACK message are derived from the aggregation of the
corresponding states for its children. The following rules corresponding states for its children. The following rules describe
describe how the aggregation is performed: how the aggregation is performed:
- AppSynch: take the minimum of the AppSynch value from all - WorstLossRate. Take the maximum value of the WorstLossRate from
children all Children.
- SubTreeCount. Take the sum of the SubTreeCount from all Children.
- HighestAllowed. Take the minimum of the HighestAllowed value from
all children.
- WorstEdgeThroughput. Take the minimum value of the
WorstEdgeThroughput field from all Children.
- UnicastCost. Take the sum of the UnicastCost from all Children.
- MulticastCost. Take the sum of the MulticastCost from all
Children.
- SenderDallyTime: take the minimum value, for all of the children,
of (child's reported SenderDallyTime + child's local dally time).
- FailureCount: take the sum of the FailureCount for all Children.
- FailureList: concatenate the FailureList fields for all Children,
up to a maximum list size of MaxFailureListSize.
- HighestAllowed: take the minimum of the HighestAllowed value from Note, the SenderTimeStamp, ParentTimestamp, and ParentDallyTime
all children fields are not aggregated. The Sender will derive the roundtrip time
to the worst Receiver by doing its local aggregation for
SenderDallyTime and then compute:
RTT = currentTime - SenderTimeStamp - SenderDallyTime.
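The fixed aggregation rules and the Sender's RTT computation can be sketched as below. The dictionary layout is hypothetical: each child is represented by a dict keyed by TRACK field name, and the key LocalDallyTime stands for the child's local dally time.

```python
def aggregate_children(children, max_failure_list_size):
    """Aggregate per-child TRACK state into the fields sent to the parent."""
    failures = [f for c in children for f in c["FailureList"]]
    return {
        "WorstLossRate": max(c["WorstLossRate"] for c in children),
        "SubTreeCount": sum(c["SubTreeCount"] for c in children),
        "HighestAllowed": min(c["HighestAllowed"] for c in children),
        "WorstEdgeThroughput": min(c["WorstEdgeThroughput"] for c in children),
        "UnicastCost": sum(c["UnicastCost"] for c in children),
        "MulticastCost": sum(c["MulticastCost"] for c in children),
        "SenderDallyTime": min(
            c["SenderDallyTime"] + c["LocalDallyTime"] for c in children
        ),
        "FailureCount": sum(c["FailureCount"] for c in children),
        # Concatenate, then truncate to the maximum list size.
        "FailureList": failures[:max_failure_list_size],
    }


def sender_rtt(current_time, sender_timestamp, sender_dally_time):
    """Round-trip time computed at the Sender from its aggregated dally time."""
    return current_time - sender_timestamp - sender_dally_time
```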
- Slowest: this is a measure of how slow the slowest member in the Application level confirmations (ALCs) are handled as follows. For a
whole subtree is; take either the minimum (or maximum) of the set of ALC requests from receivers, the ones with the highest value
Slowest value from all children (depending what the Slowest measure for HighConfirmationSequenceNumber are considered, and all others are
is). discarded.
- SubTreeCount: take the sum of the SubTreeCount from all For the ConfirmationStatus field, the following rules apply. Note
children that ConfirmationStatus of SomeReceiversAcknowledge can correspond to
a ConfirmationCount of zero.
   If all children report AllReceiversAcknowledge Then
      ConfirmationStatus = AllReceiversAcknowledge
   Else If at least one child reports (ListOfFailures OR
                         FailuresExceedMaximumListSize) Then
      If the count of all reported failures >
                         MaximumFailureListSize Then
         ConfirmationStatus = FailuresExceedMaximumListSize
      Else
         ConfirmationStatus = ListOfFailures
   Else
      ConfirmationStatus = SomeReceiversAcknowledge
- SenderDallyTime: take the minimum value, for all of the children, The ConfirmationCount field is equal to the sum of the
of ConfirmationCount for the aggregated ALC reports of all Children. The
PendingCount field is equal to the sum of the PendingCount fields of
all Children. The FailureList field is the concatenation of the
FailureList fields of all aggregated ALC reports of all children, up
to a maximum length of MaximumFailureListSize.
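The ALC aggregation rules can be sketched as follows. The sketch assumes the reports passed in have already been filtered to those with the highest HighConfirmationSequenceNumber, and that each report carries a FailureCount from which the count of reported failures is taken. The dictionary layout and function name are hypothetical.

```python
ALL = "AllReceiversAcknowledge"
SOME = "SomeReceiversAcknowledge"
LIST = "ListOfFailures"
EXCEED = "FailuresExceedMaximumListSize"


def aggregate_alc(reports, maximum_failure_list_size):
    """Aggregate the ALC reports of all children into one report."""
    statuses = [r["ConfirmationStatus"] for r in reports]
    failure_count = sum(r["FailureCount"] for r in reports)

    if all(s == ALL for s in statuses):
        status = ALL
    elif any(s in (LIST, EXCEED) for s in statuses):
        status = EXCEED if failure_count > maximum_failure_list_size else LIST
    else:
        status = SOME

    failures = [f for r in reports for f in r["FailureList"]]
    return {
        "ConfirmationStatus": status,
        "ConfirmationCount": sum(r["ConfirmationCount"] for r in reports),
        "PendingCount": sum(r["PendingCount"] for r in reports),
        "FailureList": failures[:maximum_failure_list_size],
    }
```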
child's reported SenderDallyTime + child's local dally time In addition to these fields with fixed aggregation rules, TRACK
supports a set of user defined aggregation statistics. These
statistics are self describing in terms of their data type and
aggregation method. Statistics reports are numbered, and only the
most recent statistics report request is aggregated to the Sender.
Statistics are aggregated over the set of Child statistics reports
that have been received with that number. Aggregation methods
include minimum, maximum, sum, product, and concatenation.
Note, the SendTimeStamp field is left alone. The sender will 4.3.3 Statistics Reporting.
derive the roundtrip time to the worst receiver by doing its local
aggregation for SenderDallyTime and then compute:
RTT = currentTime - SendTimeStamp - SenderDallyTime. A Sender can request a list of aggregated statistics from all
Receivers in the group. There is a set of predefined statistics,
such as loss rate and average throughput. There is also the capacity
to request a set of other TRACK statistics, as well as application
defined statistics.
7.4 Measuring Round Trip Times The format of each statistic is self-describing in terms of
data type, size, and aggregation method. A Sender reliably sends out
a statistics request by attaching it as an option to a Data message.
When a Receiver gets a request for a statistic, it fills in the data
fields, and forwards it up the tree in the next TRACK message. Since
TRACKs are not reliable, multiple copies are sent in a total of
NumReplies consecutive TRACK messages from each Receiver. Each
statistics report is aggregated according to the method described in
the statistic, and the result is delivered to the Sender.
Most aggregation options have fixed length no matter how many
Receivers there are. The one exception is concatenation, which
creates a list of values from some or all Receivers, up to a length
of MaximumStatisticsListSize entries. It is NOT RECOMMENDED to use
this to create group-wide lists, unless the group's size is carefully
controlled.
4.4 Application Level Confirmed Delivery.
Flow control and the reliability window are concerned with goodput,
of delivering data with a high probability that it is delivered at
all Receivers. However, neither mechanism provides explicit
confirmation to the Sender as to the list of recipients for each
message. Application level confirmed delivery allows applications to
determine the set of applications that have received a set of data
messages.
There are three primary factors that determine the reliability
semantics of a message: the Sender's knowledge of the Receiver list,
the application level actions that must be performed in order to
consider a message delivered, and the response to persistent failure
conditions at Receivers. For example, an extremely strong
distributed guarantee would consist of the following. First, the
full Receiver membership list is known at the Sender, and verified to
make sure no Receivers have left the group. Second, the application
at each Receiver must write the Data to persistent store before it
can be acknowledged. Third, Receivers are given a very long period
of time - say one hour - to recover all lost Data messages, before
they are ejected from the Data Session. In the meantime,
transmission of Data messages is flow controlled by the slowest
receivers.
A weaker form of reliability would include the following. First,
that the Sender gets a count of Receivers, and otherwise depends on
the distributed group membership algorithms to maintain the
membership list. Second, that Data messages are considered reliably
delivered as soon as the application receives the Data from TRACK.
Third, that Retransmissions are limited to only 30 seconds, and
Receivers must choose to leave the Data Session or continue with
missing Data messages, if a failure takes longer than this period to
recover from.
TRACK provides the functionality to easily implement a wide range of
application level confirmation semantics, based on how these three
items are configured. It is the application's responsibility to then
select the configurations it desires for a given Data Session.
4.4.1 Application Level Confirmation Mechanisms
The primary mechanism for application level confirmation (ALC) of
delivery is the ALC report. To check for ALC of delivery, a Sender
issues an Application Level Confirmation Request, by attaching this
message as an option to a Data message, and reliably transmitting it
to all Receivers. Each ALC Request includes a specified level of
reliability, a reply redundancy factor, and the range of Data message
sequence numbers that the ALC Confirmation covers.
When a Receiver gets an ALC Request, it checks to see if the
application has delivered the specified range of Data Messages,
including both the Low Confirmation Sequence Number and the High
Confirmation Sequence Number. When it sends the next TRACK out, it
sets the ConfirmationStatus field to either SomeReceiversAcknowledge
if it is still pending confirmation, AllReceiversAcknowledge if it
has application level confirmation, ListOfFailures if it has a
failure and MaximumFailureListSize > 0, or
FailuresExceedMaximumListSize otherwise. It also sets the
ConfirmationCount to 1 if it has a confirmation, and PendingCount to 1 if
it is still pending. If the Immediate ACK bit is set in the ALC
Request, the Receiver generates an ACK immediately.
One example of how an application can implicitly signal confirmation
of delivery is through the freeing of buffers passed to it by the
transport. The API could specify that whenever an application has
freed up a buffer containing one or more data messages, then these
messages are considered acknowledged by the application.
Alternatively, the application could be required to explicitly
acknowledge each message.
4.5 Distributed RTT Calculations.
This TRACK BB provides two algorithms for distributed RTT This TRACK BB provides two algorithms for distributed RTT
calculations: LocalRTT measurements and SenderRTT measurements. calculations: LocalRTT measurements and SenderRTT measurements.
LocalRTT measurements are only between a parent and its children. LocalRTT measurements are only between a parent and its children.
SenderRTT measurements are end-to-end RTT measurements, measuring SenderRTT measurements are end-to-end RTT measurements, measuring the
the RTT to the worst receiver as selected by the congestion control RTT to the worst Receiver as selected by the congestion control
algorithms. algorithms.
The SenderRTT is useful for congestion control. It can be used to The SenderRTT is useful for congestion control. It can be used to set
set the data rate based on the TCP response function, which is the data rate based on the TCP response function, which is being
being proposed for the congestion control building block. proposed for the congestion control building blocks.
The LocalRTT can be used to (a) quickly detect faulty children (as The LocalRTT can be used to (a) quickly detect faulty children (as
described in 7.1) or (b) avoid sending unnecessary retransmissions described under fault detection) or (b) avoid sending unnecessary
(as described in 7.2 in the local repair algorithm). retransmissions (as described in the local repair algorithm).
In the case of LocalRTT measurements, a parent initiates In the case of LocalRTT measurements, a parent initiates measurement
measurement by including a ParentTimestamp field in a Heartbeat by including a ParentTimestamp field in a Heartbeat message sent to
message sent to its children. When a child receives a Heartbeat its children. When a child receives a Heartbeat message with this
message with this field set, it notes the time of receipt using its field set, it notes the time of receipt using its local system clock,
local system clock, and stores this with the message as and stores this with the message as HeartbeatReceiveTime. When the
HeartbeatReceiveTime. When the child next generates a TRACK, just child next generates a TRACK, just before sending it, it measures its
before sending it, it measures its system clock again as system clock again as TRACKSendTime, and calculates the
TRACKSendTime, and calculates the LocalDallyTime. LocalDallyTime.
LocalDallyTime = TRACKSendTime - HeartbeatReceiveTime. LocalDallyTime = TRACKSendTime - HeartbeatReceiveTime.
The child includes this value, along with the ParentTimestamp The child includes this value, along with the ParentTimestamp field,
field, as fields in the next TRACK message sent. Every heartbeat as fields in the next TRACK message sent. Every heartbeat message
message that is multicast to all children SHOULD include a that is multicast to all children SHOULD include a ParentTimestamp
ParentTimestamp field. field.
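The exchange above can be sketched as follows. This is a minimal illustration, not the normative algorithm: the function names are hypothetical, and the parent-side formula (elapsed time on the parent's clock minus the reported LocalDallyTime) is an assumption based on the description of dally time as waiting time to be subtracted from the RTT measurement. Note that the two clocks never need to be synchronized, because each node only subtracts its own readings.

```python
def local_dally_time(track_send_time: float, heartbeat_receive_time: float) -> float:
    """Child side: how long the ParentTimestamp was held before the TRACK left."""
    return track_send_time - heartbeat_receive_time


def local_rtt(track_receive_time: float, parent_timestamp: float, dally: float) -> float:
    """Parent side (assumed formula): elapsed time on the parent's clock,
    minus the time the child spent holding the timestamp."""
    return track_receive_time - parent_timestamp - dally


# Parent clock: Heartbeat sent at 100.000 s, TRACK received at 100.290 s.
# Child clock (unrelated epoch): Heartbeat received at 5000.040 s,
# TRACK sent at 5000.250 s.
dally = local_dally_time(5000.250, 5000.040)
rtt = local_rtt(100.290, 100.000, dally)
print(f"LocalDallyTime={dally:.3f}s LocalRTT={rtt:.3f}s")
```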
The SenderRTT algorithm is similar. A sender initiates the process The SenderRTT algorithm is similar. A Sender initiates the process
by including a SenderTimestamp field in a data message. When a by including a SenderTimestamp field in a data message. When a
receiver gets a message with this field set, it keeps track of the Receiver gets a message with this field set, it keeps track of the
DataReceiveTime for that message, and when it generates the next DataReceiveTime for that message, and when it generates the next
TRACK message, includes the SenderTimestamp and SenderDallyTime TRACK message, includes the SenderTimestamp and SenderDallyTime
value. These values are aggregated by Repair Heads, as described value. These values are aggregated by Repair Heads, as described
in section 7.3. above.
Each node only keeps track of the most recent value for Each node only keeps track of the most recent value for
{SenderTimestamp, DataReceiveTime} and {ParentTimestamp, {SenderTimestamp, DataReceiveTime} and {ParentTimestamp,
HeartbeatReceiveTime}, replacing any older values any time that a HeartbeatReceiveTime}, replacing any older values any time that a new
new message is received with these values set. As long as it has message is received with these values set. As long as it has non-
non-zero values to report, each node sends up both a zero values to report, each node sends up both a {SenderTimestamp,
{SenderTimestamp, SenderDallyTime} and a {ParentTimestamp, SenderDallyTime} and a {ParentTimestamp, LocalDallyTime} set of
LocalDallyTime} set of fields in each TRACK message generated. fields in each TRACK message generated.
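A minimal sketch of the per-node bookkeeping described above, for a leaf Receiver only. The class and method names are hypothetical, aggregation by Repair Heads is not modelled, and computing SenderDallyTime as the TRACK send time minus DataReceiveTime is an assumption made by analogy with LocalDallyTime.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class RttState:
    # Zero means "nothing to report yet".
    sender_timestamp: float = 0.0
    data_receive_time: float = 0.0
    parent_timestamp: float = 0.0
    heartbeat_receive_time: float = 0.0

    def on_data(self, sender_timestamp: float, now: float) -> None:
        # A newer data message simply replaces the stored pair.
        self.sender_timestamp = sender_timestamp
        self.data_receive_time = now

    def on_heartbeat(self, parent_timestamp: float, now: float) -> None:
        self.parent_timestamp = parent_timestamp
        self.heartbeat_receive_time = now

    def track_fields(self, now: float) -> Dict[str, float]:
        """Fields to place in the TRACK being sent at local time `now`."""
        fields: Dict[str, float] = {}
        if self.sender_timestamp:
            fields["SenderTimestamp"] = self.sender_timestamp
            fields["SenderDallyTime"] = now - self.data_receive_time
        if self.parent_timestamp:
            fields["ParentTimestamp"] = self.parent_timestamp
            fields["LocalDallyTime"] = now - self.heartbeat_receive_time
        return fields


state = RttState()
state.on_data(sender_timestamp=10.0, now=50.0)
state.on_data(sender_timestamp=11.0, now=51.0)  # replaces the older pair
print(state.track_fields(now=51.5))
```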
These measurements need to be averaged by the TRACK PI. Unless redefined by the TRACK PI, these RTT measurements are averaged
using an exponentially weighted moving average, where the first RTT
measurement, RTT_measurement, initializes the average RTT_average,
and then each successive measurement is averaged in according to the
following formula. The RECOMMENDED value for alpha is 1/8.
RTT_average = RTT_measurement * alpha + RTT_average * (1 - alpha)
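The averaging rule can be illustrated with a short sketch. The class name is hypothetical; the update rule and the default alpha of 1/8 follow the text, with the first measurement initializing the average.

```python
from typing import Optional

ALPHA = 1 / 8  # RECOMMENDED value from the text


class RttAverager:
    def __init__(self, alpha: float = ALPHA) -> None:
        self.alpha = alpha
        self.average: Optional[float] = None

    def update(self, measurement: float) -> float:
        if self.average is None:
            # The first measurement initializes the average.
            self.average = measurement
        else:
            self.average = (measurement * self.alpha
                            + self.average * (1 - self.alpha))
        return self.average


averager = RttAverager()
for sample in (0.080, 0.160, 0.080):
    print(f"sample={sample:.3f} average={averager.update(sample):.5f}")
```

With alpha at 1/8, a single outlier moves the average by only one eighth of its distance from the current value, so one delayed TRACK does not distort the estimate much.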
draft-ietf-rmt-bb-track-01.txt
8. Security
This BB does not specifically deal with security. It is the
responsibility of the TRACK PI or the Security BB. This issue is
covered in the Security Requirements For TRACK draft [HW00].
9. References
[HW00] T. Hardjono, B. Whetten, "Security Requirements For TRACK,"
Internet Draft, Internet Engineering Task Force, June, 2000.
[KLCWTCTK01] M. Kadansky, D. Chiu, B. Whetten, B. Levine, G.
Taskale, B. Cain, D. Thaler, S. Koh, "Reliable Multicast Transport
Building Block: Tree Auto-Configuration," Internet Draft, Internet
Engineering Task Force, March, 2001.
[KV00] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building
Blocks and Protocol Instantiation Documents," Internet Draft,
Internet Engineering Task Force, June, 2000.
[SFCGLTLLBEJMRSV00] T. Speakman, D. Farinacci, J. Crowcroft, J.
Gemmell, S. Lin, A. Tweedly, D. Leshchiner, M. Luby, N. Bhaskar, R.
Edmonstone, K. M. Johnson, T. Montgomery, L. Rizzo, R.
Sumanasekera, and L. Vicisano, "PGM Reliable Transport Protocol
Specification," Internet Draft, Internet Engineering Task Force,
November 2000.
[WCP00] B. Whetten, D. Chiu, S. Paul, M. Kadansky, G. Taskale,
"TRACK Architecture, A Scalable Real-Time Reliable Multicast
Protocol," Internet Draft, Internet Engineering Task Force, July
2000.
[WVKHFL00] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S.
Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks
for One-to-Many Bulk-Data Transfer," RFC 3048, January 2001.
draft-ietf-rmt-bb-track-02.txt
4.6 SNMP Support
The Repair Heads and the Sender are designed to interact with SNMP
management tools. This allows network managers to easily monitor and
control the sessions being transmitted. All TRACK nodes MAY have
SNMP MIBs defined in a separate document. SNMP support is OPTIONAL
for Receiver nodes, but is RECOMMENDED for all other nodes.
4.7 Late Join Semantics
TRACK offers three flavors of late join support:
a) No Recovery
A Receiver binds to a Repair Head after the session has started
and agrees to the reliability service starting from the Sequence
Number in the current data message received from the Sender.
b) Continuation
This semantic is used when a Receiver has lost its Repair Head
and needs to re-affiliate. In this case, the Receiver must
indicate the oldest Sequence Number it needs to repair in order
to continue the reliability service it had from the previous
Repair Head. The binding occurs if this is possible.
c) No Late Join
For some applications, it is important that a Receiver receives
either all data or no data (e.g. software distribution). In this
case option (c) is used.
These are specified by the LateJoinSemantics session parameter, and
enforced by a Parent when a Child attempts to bind to it.
5. Message Types
The following table summarizes the messages and their fields used by
the TRACK BB. All messages contain the session identifier. For more
details, please see the sample TRACK PI [17].
+--------------------------------------------------------------------+
Message       From   To       Mcast? Fields
+--------------------------------------------------------------------+
BindRequest   Child  Parent   no     Scope, Level, Role, Rejoin,
                                     BindSequenceNumber, SubTreeCount
+--------------------------------------------------------------------+
BindConfirm   Parent Child    no     RepairAddr, BindSequenceNumber,
                                     LowestAvailableRepair,
                                     Level, ChildIndex, Role
+--------------------------------------------------------------------+
BindReject    Parent Child    no     Reason, BindSequenceNumber
+--------------------------------------------------------------------+
UnbindRequest Child  Parent   no     Reason, ChildIndex
+--------------------------------------------------------------------+
UnbindConfirm Parent Child    no
+--------------------------------------------------------------------+
EjectRequest  Parent Child    either Reason, AlternateParent
+--------------------------------------------------------------------+
EjectConfirm  Child  Parent   no
+--------------------------------------------------------------------+
Heartbeat     Parent Child    either Level, ParentTimestamp,
                                     ChildrenList, SeqNum,
                                     HighestReleased
+--------------------------------------------------------------------+
NullData,     Sender all      yes    SenderTimestamp, DataLength,
OData                                HighestReleased, SeqNum,
                                     EndOfStream, TransmissionRate
+--------------------------------------------------------------------+
Rdata         Parent Child    yes    SenderTimestamp, DataLength,
                                     HighestReleased, SeqNum,
                                     EndOfStream, TransmissionRate
+--------------------------------------------------------------------+
Track         Child  Parent   no     BitMask, SubTreeCount, Slowest,
                                     HighestAllowed, ParentThere,
                                     ParentTimestamp, ParentDallyTime,
                                     SenderTimestamp, SenderDallyTime,
                                     CongestionControl, FailureList
+--------------------------------------------------------------------+
ALCRequest    Sender Receiver yes    Immediate, Reliability,
                                     NumReplies, SeqNumRange
+--------------------------------------------------------------------+
ALCReply      Child  Parent   yes    SeqNumRange, ConfirmStatus,
                                     ConfirmCount, PendingCount,
                                     FailedChildren
+--------------------------------------------------------------------+
StatsRequest  Sender Receiver yes    Immediate, StatsSeqNum,
                                     NumReplies, StatsList
+--------------------------------------------------------------------+
StatsReply    Child  Parent   yes    StatsSeqNum, StatsList
+--------------------------------------------------------------------+
The various fields of the messages are described as follows:
- BindSequenceNumber: This is a monotonically increasing sequence
number for each bind request from a given Receiver for a given Data
Session.
- Scope: an integer to indicate how far a repair message travels.
This is optional.
- Rejoin: a flag as to whether this Receiver was previously a member
of this Data Session.
- Level: an integer that indicates the level in the repair tree.
This value is used to keep loops in the tree from forming, in
addition to indicating the distance from the Sender. Any changes in
a node's level are passed down to the Tree BB using the
treeLevelUpdate interface.
- Role: This indicates if the bind requestor is a Receiver or Repair
Head.
- SubTreeCount: This is an integer indicating the current number of
Receivers below the node.
- RepairAddr: This field in the BindConfirm message is used to tell
the Receiver which multicast address the Repair Head will be sending
retransmissions on. If this field is null, then the Receiver should
expect retransmissions to be sent on the Sender's data multicast
address.
- AlternateParent: This is an optional field that specifies another
parent a Child may attempt to bind to.
- SeqNum: an integer indicating the Sequence Number of a data message
within a given Data Session. For a Heartbeat, it is the highest
sequence number the parent knows about.
- ChildIndex: This is an integer the Repair Head assigns to a
particular child. The child Receiver uses this value to implement
the rotating TRACK Generation algorithm.
- LowestRepairAvailable: This is the lowest sequence number that a
Repair Head will provide repairs for.
- Reason: a code indicating the reason for the BindReject,
UnbindRequest, or EjectRequest message.
- ParentTimestamp: This field is included in Heartbeat messages to
signal the need to do a local RTT measurement from a parent. It is
the time when the parent sent the message.
- ChildrenList: This field contains the identifiers for a list of
children. As part of the keepalive message, this field together with
the SeqNum field is used to urge those listed Receivers to send a
TRACK (for the provided SeqNum). The Repair Head sending this must
have been missing the regular TRACKs from these children for an
extended period of time.
- SenderTimestamp: This field is included in Data messages to signal
the need to do a roundtrip time measurement from the Sender, through
the tree, and back to the Sender. It is the time (measured by the
Sender's local clock) when it sent the message.
- ApplicationSynch: a Sequence Number signaling a request for
confirmed delivery by the application.
- EndOfStream: indicates that this message is the end of the data for
this session.
- TransmissionRate: This field is used by the Sender to tell the
Receivers its sending rate, in messages per second. It is part of
the data or nulldata messages.
- HighestReleased: This field contains a Sequence Number,
corresponding to the trailing edge of the Sender's retransmission
window. It is used (as part of the data, nulldata or retransmission
headers) to inform the Receivers that they should no longer attempt
to recover messages with the same or a smaller Sequence Number.
- HighestAllowed: a Sequence Number, used for flow control from the
Receivers. It signals the highest Sequence Number the Sender is
allowed to send that will not overrun the Receivers' buffer pools.
- BitMask: an array of 1s and 0s. Together with a Sequence Number it
is used to indicate lost data messages. If the ith element is a 1,
it indicates the message SeqNum+i is lost.
- Slowest: This field characterizes the slowest Receiver in the
subtree beneath (and including) the node sending the TRACK. This is
used to provide information for the congestion control BB.
- SenderDallyTime: This field is associated with a SenderTimestamp
field. It contains the sum of the waiting time that should be
subtracted from the RTT measurement at the Sender.
- ParentDallyTime: This is the same as the SenderDallyTime, but is
associated with a ParentTimestamp instead of a SenderTimestamp.
- DataLength: This is the length of the Data payload.
- CongestionControl: This includes any additional congestion control
variables for aggregation, such as WorstLossRate,
WorstEdgeThroughput, UnicastCost, and MulticastCost.
- ApplicationConfirms: This is the SeqNum value for which delivery
has been confirmed by all children at or below this parent.
- FailedChildren: This is a list of all children that have recently
been dropped from the repair tree.
- Immediate: If set to 1, a Receiver should immediately send a TRACK
on receipt of this packet.
- Reliability: The level of reliability required in order to consider
the set of data packets reliably delivered.
- NumReplies: The number of consecutive TRACK messages that should be
sent with this message attached.
- SeqNumRange: The set of data messages that the ALC request applies
to.
- ConfirmStatus: The acknowledgement status of the Receivers in the
subtree up to the node that sends this message.
- ConfirmCount: The number of Receivers in the subtree up to the node
that sends this message, that have acknowledged the ALC request.
- PendingCount: The number of Receivers in this subtree that are
still pending in their decision as to acknowledging this ALC request.
- StatsSeqNum: The number of this request for statistics.
- StatsList: The list of statistics to be filled in by Receivers, and
aggregated by the control tree.
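As an illustration of how the BitMask, SeqNum, and HighestReleased fields described above combine, the following sketch expands a bit mask into the Sequence Numbers still worth repairing. The function name is hypothetical, the index i is assumed to start at 0 (the text does not say), and Sequence Number wraparound is ignored.

```python
from typing import List, Sequence


def lost_sequence_numbers(seq_num: int,
                          bit_mask: Sequence[int],
                          highest_released: int) -> List[int]:
    """Return the lost Sequence Numbers that can still be recovered."""
    # Element i set to 1 means message seq_num + i is lost (i from 0, assumed).
    lost = [seq_num + i for i, bit in enumerate(bit_mask) if bit == 1]
    # Messages at or below HighestReleased are no longer recoverable.
    return [s for s in lost if s > highest_released]


print(lost_sequence_numbers(100, [0, 1, 1, 0, 1], highest_released=101))
# prints [102, 104]
```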
6. Global Configuration Variables, Constants, and Reason Codes
6.1 Global Configuration Variables
These are variables that control the Data Session and are advertised to
all participants. Some of them MAY instead be configured as constants.
- TimeMaxBindResponse: the time, in seconds, to wait for a response
to a BindRequest. Initial value is TIMEOUT_PARENT_RESPONSE
(recommended value is 3). Maximum value is
MAX_TIMEOUT_PARENT_RESPONSE.
- MaxChildren: The maximum number of children a Repair Head is
allowed to handle. Recommended value: 32.
- ConstantHeartbeatPeriod: A constant period that may be used instead
of a dynamically calculated HeartbeatPeriod. Recommended value: 3
seconds.
- MinimumHeartbeatPeriod: The minimum value for the dynamically
calculated HeartbeatPeriod. Recommended value: 1 second.
- MinHoldTime: The minimum amount of time a Repair Head holds on to
data messages.
- MaxHoldTime: The maximum amount of time a Repair Head holds on to
data messages.
- AckWindow: The number of messages seen before a Receiver issues an
acknowledgement. Recommended value: 32.
- LateJoinSemantics: The options available to a Receiver who wishes
to join a Data Session that is already in progress.
- MaximumFailureListSize: The maximum number of entries that can be
in a failure list. This MUST be small enough that the FailureList
does not ever cause a TRACK to exceed the size of a maximum UDP
packet. Recommended value: 800.
- MaximumStatisticsListSize: The maximum number of entries that can
be in a statistics list. This MUST be small enough that the
StatsList does not ever cause a TRACK to exceed the size of a
maximum UDP packet. Recommended value: 100.
- MaximumDataRate: The maximum admission rate for data messages from
the application to the Data Channel Protocol.
- MinimumDataRate: The minimum admission rate for data messages from
the application to the Data Channel Protocol.
6.2 Constants
- NUM_MAX_PARENT_ATTEMPTS: The number of times to try to bind to a
Repair Head before declaring a PARENT_UNREACHABLE error. Recommended
value is 5.
- TIMEOUT_PARENT_RESPONSE: The minimum value, in seconds, between
attempts to contact a parent. Recommended value is 1 second.
- MAX_TIMEOUT_PARENT_RESPONSE: The maximum value, in seconds,
between attempts to contact a parent. Recommended value is 16.
- NULL_DATA_PERIOD: The time between transmission of NullData
Messages. Recommended value is 1.
- FAILURE_DETECTION_REDUNDANCY: The number of times a message is sent
without receiving a response before declaring an error. Recommended
value is 3.
- MAX_TRACK_TIMEOUT: The maximum value for TRACKTimeout. Recommended
value is 5 seconds.
- TRANSMISSION_REDUNDANCY: The number of times a failure notification
is redundantly sent up the tree in a TRACK message. Recommended
value is 3.
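The bind-related constants above can be read as bounds on a retry schedule. The sketch below is one possible interpretation, not something the text specifies: it assumes the timeout doubles after each failed attempt, which the text does not state, and uses the recommended values from Section 6.2. The function name is hypothetical.

```python
from typing import List

TIMEOUT_PARENT_RESPONSE = 1       # seconds, recommended minimum
MAX_TIMEOUT_PARENT_RESPONSE = 16  # seconds, recommended maximum
NUM_MAX_PARENT_ATTEMPTS = 5       # attempts before PARENT_UNREACHABLE


def bind_timeouts(initial: int = TIMEOUT_PARENT_RESPONSE,
                  maximum: int = MAX_TIMEOUT_PARENT_RESPONSE,
                  attempts: int = NUM_MAX_PARENT_ATTEMPTS) -> List[int]:
    """Timeout to wait after each BindRequest (doubling is an assumption)."""
    timeouts = []
    timeout = initial
    for _ in range(attempts):
        timeouts.append(timeout)
        timeout = min(timeout * 2, maximum)  # never exceed the maximum
    return timeouts


print(bind_timeouts())  # prints [1, 2, 4, 8, 16]
```

Under this assumption the recommended values line up: five attempts starting at 1 second reach the 16-second maximum exactly on the last attempt.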
6.3 Reason Codes
- BindReject reason codes
- LOOP_DETECTED
- MAX_CHILDREN_EXCEEDED
- UnbindRequest reason codes
- SESSION_DONE
- APPLICATION_REQUEST
- RECEIVER_TOO_SLOW
- EjectRequest reason codes
- PARENT_LEAVING
- PARENT_FAILURE
- CHILD_TOO_SLOW
- PARENT_OVERLOADED
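One way to hold the reason codes above is a lookup keyed by message type, so that an implementation can reject a code carried on the wrong message. This is a sketch only: the table and function names are hypothetical, and no numeric encodings are shown because the text does not define any.

```python
from typing import Dict, FrozenSet

# Reason codes grouped by the message that may carry them (Section 6.3).
REASON_CODES: Dict[str, FrozenSet[str]] = {
    "BindReject": frozenset({"LOOP_DETECTED", "MAX_CHILDREN_EXCEEDED"}),
    "UnbindRequest": frozenset({"SESSION_DONE", "APPLICATION_REQUEST",
                                "RECEIVER_TOO_SLOW"}),
    "EjectRequest": frozenset({"PARENT_LEAVING", "PARENT_FAILURE",
                               "CHILD_TOO_SLOW", "PARENT_OVERLOADED"}),
}


def is_valid_reason(message_type: str, reason: str) -> bool:
    """True if `reason` is listed for `message_type`."""
    return reason in REASON_CODES.get(message_type, frozenset())


print(is_valid_reason("BindReject", "LOOP_DETECTED"))   # prints True
print(is_valid_reason("EjectRequest", "SESSION_DONE"))  # prints False
```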
7. Security
As specified in [12], the primary security requirement for a TRACK
protocol is protection of the transport infrastructure. This is
accomplished through the use of lightweight group authentication of
the control and, optionally, the data messages sent to the group.
These algorithms use IPsec and shared symmetric keys. For TRACK,
[12] recommends that there be one shared key for the Data Session and
one for each Local Control Channel. These keys are distributed
through a separate key manager component, which may be either
centralized or distributed. Each member of the group is responsible
for contacting the key manager, establishing a pair-wise security
association with the key manager, and obtaining the appropriate keys.
The exact algorithms for this BB are presently the subject of
research within the IRTF Secure Multicast Group (SMuG) and
standardization within the Multicast Security working group.
8. References
[1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP
9, RFC 2026, October 1996.
[2] Whetten, B., et. al. "Reliable Multicast Transport Building
Blocks for One-to-Many Bulk-Data Transfer." RFC 3048, January
2001.
[3] Handley, M., et. al. "The Reliable Multicast Design Space for
Bulk Data Transfer." RFC 2887, August 2000.
[4] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[5] Whetten, B., Taskale, G. "Overview of the Reliable Multicast
Transport Protocol II (RMTP-II)." IEEE Networking, Special Issue
on Multicast, February 2000.
[6] Nonnenmacher, J., Biersack, E. "Reliable Multicast: Where to
use Forward Error Correction", Proc. 5th. Workshop on Protocols
for High Speed Networks, Sophia Antipolis, France, Oct. 1996.
[7] Nonnenmacher, J., et. al. "Parity-Based Loss Recovery for
Reliable Multicast Transmission", In Proc. of ACM SIGCOMM 97,
Cannes, France, September 1997.
[8] Rizzo, L. "Effective erasure codes for reliable computer
communications protocols", DEIT Technical Report LR-970115.
[9] Nonnenmacher, J., Biersack, E. "Optimal Multicast Feedback",
Proc. IEEE INFOCOM 1998, March 1998.
[10] Whetten, B., Conlan, J. "A Rate Based Congestion Control
Scheme for Reliable Multicast", GlobalCast Communications
Technical White Paper, November 1998.
http://www.talarian.com/rmtp-ii
[11] Padhye, J., et. al. "Modeling TCP Throughput: A Simple Model
and its Empirical Validation". University of Massachusetts
Technical Report CMPSCI TR 98-008.
[12] Hardjono, T., Whetten, B. "Security Requirements for TRACK"
draft-ietf-rmt-pi-track-security-00.txt, June 2000. Work in
Progress.
[13] Golestani, J., "Fundamental Observations on Multicast
Congestion Control in the Internet", Bell Labs, Lucent Technology,
paper presented at the July 1998 RMRG meeting.
[14] Kadansky, M., D. Chiu, J. Wesley, J. Provino, "Tree-based
Reliable Multicast (TRAM)", draft-kadansky-tram-02.txt, Work in
Progress.
[15] Whetten, B., M. Basavaiah, S. Paul, T. Montgomery, "RMTP-II
Specification", draft-whetten-rmtp-ii-00.txt, April 8, 1998. Work
in Progress.
[16] Kadansky, M., Chiu, D. M., Whetten, B., Levine, B. N., Taskale,
G., Cain, B., Thaler, D., Koh, S. J., "Reliable Multicast
Transport Building Block: Tree Auto-Configuration", draft-ietf-
rmt-bb-tree-config-02.txt, March 2, 2001. Work in Progress.
[17] Whetten, B. et. al., "TRACK Protocol Instantiation Over UDP",
draft-ietf-rmt-track-pi-udp-00.txt, November 2002.
[18] Adamson, B., et. al., "NACK Oriented Reliable Multicast
Protocol (NORM)", draft-ietf-rmt-pi-norm-02.txt, July 2001. Work
in Progress.
[19] Vicisano, L., et. al., "Asynchronous Layered Coding - A
scalable reliable multicast protocol", draft-ietf-rmt-pi-alc-
02.txt, July 2001. Work in Progress.
[20] Speakman, T., et. al., "Pragmatic General Multicast (PGM)",
draft-speakman-pgm-spec-06.txt, Feb 2001. Work in Progress.
[21] Kermode, R., Vicisano, L., "Author Guidelines for RMT Building
Blocks and Protocol Instantiation Documents", RFC 3269.
10. Acknowledgements 10. Acknowledgements
We would like to thank the following people: Sanjoy Paul, Seok Joo We would like to thank the following people: Sanjoy Paul, Seok Joo Koh,
Koh, Supratik Bhattacharyya, Joe Wesley, and Joe Provino. Supratik Bhattacharyya, Joe Wesley, and Joe Provino.
draft-ietf-rmt-bb-track-01.txt
11. Authors' Addresses
Dah Ming Chiu
dahming.chiu@sun.com
Miriam Kadansky
miriam.kadansky@sun.com
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803
Gursel Taskale
gursel@talarian.com
Brian Whetten
whetten@talarian.com
Talarian
333 Distel Circle
Los Altos, CA 94022-1404
draft-ietf-rmt-bb-track-02.txt
11. Authors' Addresses
Brian Whetten
890 Sea Island Lane
Foster City, CA 94404
b2@whetten.net
Dah Ming Chiu
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803
dahming.chiu@sun.com
Miriam Kadansky
Sun Microsystems Laboratories
1 Network Drive
Burlington, MA 01803
miriam.kadansky@sun.com
Seok Joo Koh
sjkoh@pec.etri.re.kr
Gursel Taskale
TIBCO Corporation
gursel@tibco.com
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (2001). All Rights Reserved. "Copyright (C) The Internet Society (2000). All Rights Reserved. This
This document and translations of it may be copied and furnished to document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain others, and derivative works that comment on or otherwise explain it
it or assist in its implementation may be prepared, copied, or assist in its implementation may be prepared, copied, published
published and distributed, in whole or in part, without restriction and distributed, in whole or in part, without restriction of any
of any kind, provided that the above copyright notice and this kind, provided that the above copyright notice and this paragraph are
paragraph are included on all such copies and derivative works. included on all such copies and derivative works. However, this
However, this document itself may not be modified in any way, such document itself may not be modified in any way, such as by removing
as by removing the copyright notice or references to the Internet the copyright notice or references to the Internet Society or other
Society or other Internet organizations, except as needed for the Internet organizations, except as needed for the purpose of
purpose of developing Internet standards in which case the developing Internet standards in which case the procedures for
procedures for copyrights defined in the Internet Standards process copyrights defined in the Internet Standards process must be
must be followed, or as required to translate it into languages followed, or as required to translate it into languages other than
other than English. English.
The limited permissions granted above are perpetual and will not be The limited permissions granted above are perpetual and will not be
revoked by the Internet Society or its successors or assigns. revoked by the Internet Society or its successors or assigns.
This document and the information contained herein is provided on This document and the information contained herein is provided on an
an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING
ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING
IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION
THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF
WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE." MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE."
 End of changes. 
