draft-ietf-rmt-bb-track-00.txt   draft-ietf-rmt-bb-track-01.txt 
RMT Working Group Brian Whetten RMT Working Group Brian Whetten
Internet Engineering Task Force Talarian Internet Engineering Task Force Talarian
Internet Draft Dah Ming Chiu Internet Draft Dah Ming Chiu
Document: draft-ietf-rmt-bb-track-00.txt Sun Microsystems Document: draft-ietf-rmt-bb-track-01.txt Sun Microsystems
17 November, 2000 Miriam Kadansky 2 March, 2001 Miriam Kadansky
Expires 17 May 2001 Sun Microsystems Expires 2 October, 2001 Sun Microsystems
Gursel Taskale Gursel Taskale
Talarian Talarian
Reliable Multicast Transport Building Block for TRACK Reliable Multicast Transport Building Block for TRACK
<draft-ietf-rmt-bb-track-00.txt> <draft-ietf-rmt-bb-track-01.txt>
Status of this Memo Status of this Memo
This document is an Internet-Draft and is in full conformance with This document is an Internet-Draft and is in full conformance with
all provisions of Section 10 of RFC2026. all provisions of Section 10 of RFC2026.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
other groups may also distribute working documents as Internet- other groups may also distribute working documents as Internet-
Drafts. Internet-Drafts are draft documents valid for a maximum of Drafts. Internet-Drafts are draft documents valid for a maximum of
skipping to change at line 48 skipping to change at line 48
used as part of the TRACK Protocol Instantiation. It is also used as part of the TRACK Protocol Instantiation. It is also
designed to be useful as part of overlay multicast systems that designed to be useful as part of overlay multicast systems that
wish to offer efficient confirmed delivery of multicast messages. wish to offer efficient confirmed delivery of multicast messages.
Conventions used in this document Conventions used in this document
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in
this document are to be interpreted as described in RFC-2119. this document are to be interpreted as described in RFC-2119.
1.0 Introduction Table of Contents
1. Introduction
2. Design Rationale
3. Applicability Statement
3.1 Application types
3.2 Network Infrastructure
4. Message Types
5. Global Configuration Variables, Constants, and Reason Codes
5.1 Global Configuration Variables
5.2 Constants
5.3 Reason Codes
6. External APIs
6.1 Interfaces to the BB from PI's
6.1.1 Start(boolean RepairHead, boolean RejoinAllowed,
Advertisement)
6.1.2 End
6.1.3 incomingMessage(Message)
6.1.4 getStatistics
6.1.5 MessageSynched(Message)
6.1.6 RepairHead(boolean)
6.2 Interfaces from the BB to the PI
6.2.1 outgoingMessage(Message)
6.2.2 MessageReceived(Message, boolean Synch)
6.2.3 SenderLost
6.2.4 UnrecoverableData
6.2.5 SessionDone
7. Algorithms
7.1 Tree Based Session Creation and Maintenance
7.1.1 Overview of Tree Configuration
7.1.2 Bind
7.1.2.1 Input Parameters
7.1.2.2 Bind Algorithm
7.1.3 Unbind
7.1.4 Eject
7.1.5 Fault Detection
7.1.6 Fault Notification
7.1.7 Fault Recovery
7.2 TRACK Generation
7.2.1 TRACK Generation with the Rotating TRACK Algorithm
7.2.2 Local Repair
7.2.3 Flow Control Window Update
7.2.4 Reliability Window
7.2.5 Confirmed Delivery
7.3 Feedback Aggregation
7.4 Measuring Round Trip Times
8. Security
9. References
10. Acknowledgements
11. Authors' Addresses
1. Introduction
This document describes the TRACK Building Block. It contains This document describes the TRACK Building Block. It contains
functions relating to positive acknowledgments and hierarchical functions relating to positive acknowledgments and hierarchical
tree construction and maintenance. It is primarily meant to be tree construction and maintenance. It is primarily meant to be
used as part of the TRACK Protocol Instantiation. It is also used as part of the TRACK Protocol Instantiation. It is also
designed to be useful as part of overlay multicast systems that designed to be useful as part of overlay multicast systems that
wish to offer efficient confirmed delivery of multicast messages. wish to offer efficient confirmed delivery of multicast messages.
As pointed out in the building blocks rationale draft [WVKHFL00], As pointed out in the building blocks rationale draft [WVKHFL00],
there are two different reliability tasks that can be provided by a there are two different reliability tasks that can be provided by a
reliable multicast transport: ensuring goodput and confirming reliable multicast transport: ensuring goodput and confirming
delivery of application level messages. The NACK Protocol delivery of application level messages. The NACK Protocol
Instantiation and ALC Protocol Instantiation are each primarily Instantiation and ALC Protocol Instantiation are each primarily
concerned with ensuring goodput. The TRACK BB and TRACK PI relies concerned with ensuring goodput. The TRACK BB and TRACK PI rely on
on a repair tree to provide goodput as well as confirmed delivery. a repair tree to provide goodput as well as confirmed delivery. If
If Forward Error Correction, Generic Router Assist or other Forward Error Correction, Generic Router Assist or other mechanisms
mechanisms are used to help provide goodput, they are assumed to are used to help provide goodput, they are assumed to work
work transparently at a layer below this BB, as if the IP multicast transparently at a layer below this BB, as if the IP multicast
service has a lower error rate. service has a lower error rate.
The TRACK BB also assumes that there is an automatic tree The TRACK BB also assumes that there is an Automatic Tree Building
configuration algorithm which provides the list of parents each BB [KLCWTCTK01] which provides the list of parents (known as
node should join to. If Receivers are used that may also serve as Service Nodes within the Tree BB) each node should join to. If
Repair Heads, the TRACK BB assumes the Auto Tree BB is also Receivers are used that may also serve as Repair Heads, the TRACK
responsible for selecting the role of each Receiver as either BB assumes the Auto Tree BB is also responsible for selecting the
Receiver or Repair Head. role of each Receiver as either Receiver or Repair Head. However,
the TRACK BB may specify that a particular node may not operate as
a Repair Head.
The TRACK BB also assumes that a separate session advertisement The TRACK BB also assumes that a separate session advertisement
protocol notifies the receivers as to when to join a session, the protocol notifies the receivers as to when to join a session, the
data multicast address for the session, and the control parameters data multicast address for the session, and the control parameters
for the session. for the session.
The TRACK BB provides additional information and aggregation The TRACK BB provides additional information and aggregation
capabilities, which are useful for congestion control. capabilities, which are useful for congestion control.
The TRACK BB provides the following detailed functionality. The TRACK BB provides the following detailed functionality.
@ Hierarchical Session Creation and Maintenance. This set of @ Hierarchical Session Creation and Maintenance. This set of
functionality is responsible for creating and maintaining (but functionality is responsible for creating and maintaining (but
not configuring) the hierarchical tree of Repair Heads and not configuring) the hierarchical tree of Repair Heads and
Receivers. Receivers.
o Bind. When a child knows the parent it wishes to join to o Bind. When a child knows the parent it wishes to join to
for a given data session, it binds to that parent. for a given data session, it binds to that parent.
o Child Initiated Unbind. When a child wishes to leave a o Unbind. When a child wishes to leave a data session,
data session, either because the session is over or because either because the session is over or because the
the application is finished with the session, it initiates application is finished with the session, it initiates an
an unbind operation with its parent. unbind operation with its parent.
o Parent Initiated Unbind. A parent can also force a child
to unbind. This happens if the parent needs to leave the o Eject. A parent can also force a child to unbind. This
session, if the child is not behaving correctly, or if the happens if the parent needs to leave the session, if the
parent wants to move the child to another parent as part of child is not behaving correctly, or if the parent wants to
tree configuration maintenance. move the child to another parent as part of tree
configuration maintenance.
o Fault Detection. In order to verify liveness, parents and o Fault Detection. In order to verify liveness, parents and
children send regular heartbeat messages between children send regular heartbeat messages between
themselves. The sender also sends regular null data themselves. The sender also sends regular null data
messages to the group, if it has no data to send. messages to the group, if it has no data to send.
o Fault Recovery. When a child detects that its parent is no o Fault Recovery. When a child detects that its parent is no
longer reachable, it may switch to another parent. When a longer reachable, it may switch to another parent. When a
parent detects that one of its children is no longer parent detects that one of its children is no longer
reachable, it removes that child from its membership list reachable, it removes that child from its membership list
and reports this up the tree to the Sender of the Data and reports this up the tree to the Sender of the Data
Session. Session.
skipping to change at line 130 skipping to change at line 183
o Flow Control and Buffer Management. Receivers and Repair o Flow Control and Buffer Management. Receivers and Repair
Heads maintain a set of buffers that are at least as large Heads maintain a set of buffers that are at least as large
as the Sender's transmission window. The Receivers pass as the Sender's transmission window. The Receivers pass
their reception status up to the sender as part of their their reception status up to the sender as part of their
TRACK messages. This is used to acknowledge receipt of TRACK messages. This is used to acknowledge receipt of
delivery, to advance the buffer windows at each node, and delivery, to advance the buffer windows at each node, and
to limit the sender's window advancement to the speed of to limit the sender's window advancement to the speed of
the slowest receiver. the slowest receiver.
o Application Level Confirmed Delivery. Confirmed Delivery o Application Level Confirmed Delivery. Confirmed Delivery
provides transport level confirmation of delivery. Senders provides transport level confirmation of delivery. Senders
can put a "synch point" request in data messages, asking can put a "synch point" request in data messages, asking
for application level confirmation. Data messages with for application level confirmation. Data messages with
this flag set are only confirmed by the Receivers after the this flag set are only confirmed by the Receivers after the
Receiver applications confirm receipt. Receiver applications confirm receipt.
@ Local Recovery. This functionality describes how repair heads @ Local Recovery. This functionality describes how repair heads
maintain state on their children and provide repairs in response maintain state on their children and provide repairs in response
to requests for retransmission contained in TRACK messages. to requests for retransmission contained in TRACK messages.
This has overlap with the NACK BB, which is unified in the TRACK This has overlap with the NACK BB, which is unified in the TRACK
PI. PI.
skipping to change at line 154 skipping to change at line 207
aggregated feedback information includes that used for end-to- aggregated feedback information includes that used for end-to-
end confirmed delivery, flow control, congestion control, and end confirmed delivery, flow control, congestion control, and
group membership monitoring and management. group membership monitoring and management.
@ Distributed RTT Calculations. One of the primary challenges of @ Distributed RTT Calculations. One of the primary challenges of
congestion control is efficient RTT calculations. TRACK congestion control is efficient RTT calculations. TRACK
provides two methods to perform these calculations. provides two methods to perform these calculations.
o Sender Per-Message RTT Calculations. Each message is o Sender Per-Message RTT Calculations. Each message is
stamped with a timestamp from the sender. As each is stamped with a timestamp from the sender. As each is
passed up the tree, the amount of dally time spent waiting passed up the tree, the amount of dally time spent waiting
at each node is accumulated. The highest measurements are at each node is accumulated. The lowest measurements are
passed up the tree, and the dally time is subtracted from passed up the tree, and the dally time is subtracted from
the original measurement. the original measurement.
o Local Per-Level RTT Calculations. Each parent measures the o Local Per-Level RTT Calculations. Each parent measures the
local RTT to each of its children as part of the keep-alive local RTT to each of its children as part of the keep-alive
messages used for failure detection. messages used for failure detection.
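As an illustration of the Sender Per-Message RTT calculation described
above, the following Python sketch shows how a sender might derive a
round trip time from a TRACK that echoes its timestamp. The function
name, the clamping to zero, and the use of seconds as the unit are
assumptions made for this example; they are not defined by this
document.

```python
def sender_rtt(now, sender_timestamp, sender_dally_time):
    """Estimate the round trip time seen by the sender.

    now               -- sender's local clock when the TRACK arrives
    sender_timestamp  -- SenderTimeStamp echoed back in the TRACK
    sender_dally_time -- SenderDallyTime: waiting time accumulated at
                         the nodes the TRACK passed through
    All values are assumed to be in seconds (hypothetical unit).
    """
    # Elapsed time on the sender's clock, minus time spent waiting in nodes.
    rtt = (now - sender_timestamp) - sender_dally_time
    # Assumption: clamp small negative results caused by clock granularity.
    return max(rtt, 0.0)
```

Because both timestamps come from the sender's own clock, this
calculation does not require synchronized clocks between nodes.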
2. Design Rationale 2. Design Rationale
Much of the design rationale behind the protocol instantiations and Much of the design rationale behind the protocol instantiations and
building blocks being standardized by the RMT working group are building blocks being standardized by the RMT working group are
skipping to change at line 177 skipping to change at line 230
the design rationales laid out in both of those documents. the design rationales laid out in both of those documents.
TRACK is designed to provide confirmed delivery, receiver-based TRACK is designed to provide confirmed delivery, receiver-based
flow control, distributed management of group membership (some of flow control, distributed management of group membership (some of
them may be dedicated servers in a repair tree), as well as them may be dedicated servers in a repair tree), as well as
providing aggregation of information up the tree. It also provides providing aggregation of information up the tree. It also provides
requests for retransmissions as part of TRACK messages, and local requests for retransmissions as part of TRACK messages, and local
recovery of lost packets. recovery of lost packets.
This TRACK BB is primarily designed to work as part of the TRACK This TRACK BB is primarily designed to work as part of the TRACK
BB, in conjunction with other BB's including NACK, FEC, and Auto PI, in conjunction with other BB's including NACK, FEC, and Auto
Tree. In the spirit of modular reuse specified in [WVKHFL00], it Tree. In the spirit of modular reuse specified in [WVKHFL00], it
is also designed to be useful as an additional layer of is also designed to be useful as an additional layer of
functionality on top of any of the following services. 1) The functionality on top of any of the following services. 1) The
functionality (if not the exact message headers) of the NORM PI. functionality (if not the exact message headers) of the NORM PI.
2) The functionality (if not the exact message headers) of the ALC 2) The functionality (if not the exact message headers) of the ALC
PI. 3) Running directly on top of an unreliable IP multicast PI. 3) Running directly on top of an unreliable IP multicast
routing protocol, but on a carefully provisioned network. 4) On routing protocol, but on a carefully provisioned network. 4) On
top of an overlay multicast (aka application layer multicast) top of an overlay multicast (also known as application layer
system. multicast) system.
Overlay multicast is a system where servers in the network provide Overlay multicast is a system where servers in the network provide
multicast (and unicast) routing as well as reliable multicast multicast (and unicast) routing as well as reliable multicast
delivery, all on top of a combination of unicast (i.e. TCP) and, as delivery, all on top of a combination of unicast (i.e. TCP) and, as
available, reliable multicast services. available, reliable multicast services.
There is a fundamental tradeoff between reliability and real-time There is a fundamental tradeoff between reliability and real-time
performance in the face of failures. There are two primary types performance in the face of failures. There are two primary types
of single layer reliability that have been proposed to deal with of single layer reliability that have been proposed to deal with
this: sender reliable and receiver reliable delivery. Sender this: sender reliable and receiver reliable delivery. Sender
skipping to change at line 256 skipping to change at line 307
failure occurs among the members of a TRACK based protocol, there failure occurs among the members of a TRACK based protocol, there
is a possibility that this may slow down other members of the is a possibility that this may slow down other members of the
group. NACK protocols have higher isolation of failures, as well group. NACK protocols have higher isolation of failures, as well
as smaller amounts of control traffic under many scenarios. as smaller amounts of control traffic under many scenarios.
3.1 Application types 3.1 Application types
The objectives of TRACK are to provide high level reliability, high The objectives of TRACK are to provide high level reliability, high
scalability, congestion control and flow control for one to many scalability, congestion control and flow control for one to many
bulk data dissemination. TRACK is not designed for many to many bulk data dissemination. TRACK is not designed for many to many
applications Examples of applications that fit into the one-to-many applications. Examples of applications that fit into the one-to-
data dissemination model are: real time financial news and market many data dissemination model are: real time financial news and
data distribution, electronic software distribution, audio video market data distribution, electronic software distribution, audio
streaming, distance learning, software updates and server video streaming, distance learning, software updates and server
replication. But, not all of these application types have the same replication. But, not all of these application types have the same
reliability requirements. reliability requirements.
Historically, financial applications have had the most stringent Historically, financial applications have had the most stringent
reliability requirements, while audio video streaming has had the reliability requirements, while audio video streaming has had the
least stringent. For applications that want to have strong least stringent. For applications that want to have strong
confirmation of delivery guarantees, TRACK may be more applicable confirmation of delivery guarantees, TRACK may be more applicable
than alternatives such as NORM or ALC. For applications that do than alternatives such as NORM or ALC. For applications that do
not require this level of reliability, or that demand the lowest not require this level of reliability, or that demand the lowest
levels of latency and the highest levels of failure isolation, levels of latency and the highest levels of failure isolation,
skipping to change at line 293 skipping to change at line 344
Asymmetric networks with very low upbound bandwidth and a very low Asymmetric networks with very low upbound bandwidth and a very low
loss data channel may be better served through NACK based loss data channel may be better served through NACK based
protocols, particularly if high reliability is not required. A protocols, particularly if high reliability is not required. A
good example is some satellite networks. good example is some satellite networks.
Networks that have very high loss rates, and regularly experience Networks that have very high loss rates, and regularly experience
partial network partitions, router flapping, or other persistent partial network partitions, router flapping, or other persistent
faults, may be better served through NACK only protocols. faults, may be better served through NACK only protocols.
4. Message Types 4. Message Types
The following table summarizes the messages and their fields used The following table summarizes the messages and their fields used
by the TRACK BB. by the TRACK BB. All messages contain the session identifier.
+--------------------------------------------------------------------+
Message From To Mcast? Fields Message From To Mcast? Fields
+--------------------------------------------------------------------+
BindRequest Child Parent no Session, Scope, Level, BindRequest Child Parent no Scope, Level, Role,
Role, SubTreeCount SubTreeCount
+--------------------------------------------------------------------+
BindConfirm Parent Child no Session, Level, RepairAddr,
SeqNum, MemberId
BindReject Parent Child no Session, Reason BindConfirm Parent Child no Level, RepairAddr, SeqNum,
MemberId, CacheInfo
+--------------------------------------------------------------------+
UnbindRequest Child Parent no Session, Reason BindReject Parent Child no Reason
+--------------------------------------------------------------------+
UnbindConfirm Parent Child no Session UnbindRequest Child Parent no Reason
+--------------------------------------------------------------------+
EjectRequest Parent Child either Session, Reason UnbindConfirm Parent Child no
+--------------------------------------------------------------------+
EjectConfirm Child Parent no Session EjectRequest Parent Child either Reason
+--------------------------------------------------------------------+
Heartbeat Parent Child either Session, Level, EjectConfirm Child Parent no
ChildrenList, SeqNum, +--------------------------------------------------------------------+
SendTime
NullData Sender all yes Session, SeqNum, Rate, Heartbeat Parent Child either Level, ParentTimestamp,
AppSynch ChildrenList, SeqNum
+--------------------------------------------------------------------+
Data Sender all yes Session, SeqNum, Rate, NullData Sender all yes SenderTimeStamp, AppSynch, End
AppSynch, HighestReleased, Data Rate, HighestReleased, SeqNum
SendTimeStamp +--------------------------------------------------------------------+
Retransmission Parent Child yes Session, SeqNum, Rate, Retransmission Parent Child yes SenderTimeStamp, AppSynch, End
AppSynch, HighestReleased, Rate, HighestReleased, SeqNum
SendTimeStamp +--------------------------------------------------------------------+
Track Child Parent no Session, SeqNum, BitMask, Track Child Parent no SeqNum, BitMask, SubTreeCount
Scope, HighestAllowed, Slowest, FailedChildren,
AppSynch, SubTreeCount, HighestAllowed,LocalDallyTime
Slowest, RTT, ParentThere, ApplicationConfirms,
ParentTimeStamp, ParentThere, ParentTimeStamp,
LocalDallyTime,
SenderTimeStamp, SenderTimeStamp,
SenderDallyTime SenderDallyTime
+--------------------------------------------------------------------+
The various fields of the messages are described as follows: The various fields of the messages are described as follows:
- Session: an id that identifies the session. - Scope: an integer to indicate how far a repair message travels.
This is optional.
- Level: an integer that indicates the level in the repair tree. - Level: an integer that indicates the level in the repair tree.
This value is used to keep loops in the tree from forming, in This value is used to keep loops in the tree from forming, in
addition to indicating the distance from the sender. addition to indicating the distance from the sender. Any changes
in a node's level are passed down to the Tree BB using the
treeLevelUpdate interface.
- SeqNum: an integer indicating the sequence number of a data - Role: This indicates if the bind requestor is a receiver or
message within a given data session. repair head.
- AppSynch: a sequence number signaling a request for confirmed - SubTreeCount: This is an integer indicating the current number of
delivery by the application. receivers below the node.
- HighestAllowed: a sequence number, used for flow control from the - RepairAddr: This field in the BindConfirm message is used to tell
receivers. It signals the highest the receiver which multicast address the repair head will be
sequence number the sender is allowed to send that will not overrun sending retransmissions on. If this field is null, then the
the receivers' buffer pools. receiver should expect retransmissions to be sent on the sender's
data multicast address.
- BitMask: an array of 1's and 0's. Together with a sequence - SeqNum: an integer indicating the sequence number of a data
number it is used to indicate lost data messages. If the i'th message within a given data session. The SeqNum field in the
element is a 1, it indicates the message SeqNum+i is lost. BindConfirm message indicates the sequence number starting from
which the repair head promises to provide repair service.
- Scope: an integer to indicate how far a repair message travels. - MemberId: This is an integer the repair head assigns to a
This is optional. particular child. The child receiver uses this value to implement
the rotating TRACK Generation algorithm.
- Role: This indicates if the bind requestor is a receiver or - CacheInfo: This field contains information about the repair data
repair head. available from this Repair Head.
- SubTreeCount: This is an integer indicating the current number of - Reason: a code indicating the reason for the BindReject,
receivers below the node. UnbindRequest, or EjectRequest message.
- ParentTimestamp: This field is included in Heartbeat messages to
signal the need to do a local RTT measurement from a parent. It is
the time when the parent sent the packet.
- ChildrenList: This field contains the identifiers for a list of - ChildrenList: This field contains the identifiers for a list of
children. As part of the keepalive message, this field together children. As part of the keepalive message, this field together
with the SeqNum field is used to urge those listed receivers to with the SeqNum field is used to urge those listed receivers to
send a TRACK (for the provided SeqNum). The repair head sending send a TRACK (for the provided SeqNum). The repair head sending
this must have been missing the regular TRACKs from these children this must have been missing the regular TRACKs from these children
for an extended period of time. for an extended period of time.
- SenderTimestamp: This field is included in Data messages to
signal the need to do a roundtrip time measurement from the sender,
through the tree, and back to the sender. It is the time (measured
by the sender's local clock) when it sent the packet.
- AppSynch: a sequence number signaling a request for confirmed
delivery by the application.
- End: indicates that this packet is the end of the data for this
session.
- Rate: This field is used by the sender to tell the receivers its - Rate: This field is used by the sender to tell the receivers its
sending rate, in packets per second. It is part of the data or sending rate, in packets per second. It is part of the data or
nulldata messages. nulldata messages.
- HighestReleased: This field contains a sequence number, - HighestReleased: This field contains a sequence number,
corresponding to the trailing edge of the sender's retransmission corresponding to the trailing edge of the sender's retransmission
window. It is used (as part of the data, nulldata or window. It is used (as part of the data, nulldata or
retransmission headers) to inform the receivers that they should no retransmission headers) to inform the receivers that they should no
longer attempt to recover those messages with a smaller (or same) longer attempt to recover those messages with a smaller (or same)
sequence number. sequence number.
- HighestPacketSent: This field is part of nulldata messages. It - HighestAllowed: a sequence number, used for flow control from the
contains the highest sequence numbered data message sent so far as receivers. It signals the highest
part of this data session. sequence number the sender is allowed to send that will not overrun
the receivers' buffer pools.
- BitMask: an array of 1's and 0's. Together with a sequence
number it is used to indicate lost data messages. If the i'th
element is a 1, it indicates the message SeqNum+i is lost.
- Slowest: This field contains a value that characterizes the - Slowest: This field contains a value that characterizes the
slowest receiver in the subtree beneath (and including) the node slowest receiver in the subtree beneath (and including) the node
sending the TRACK. This is used to provide information for the sending the TRACK. This is used to provide information for the
congestion control BB, and the aggregation methods on this congestion control BB, and the aggregation methods on this
information are defined by that BB. information are defined by that BB.
- ParentThere: This field indicates to the parent that the receiver - ParentThere: This field indicates to the parent that the receiver
sending the TRACK has not been receiving the regular keepalive sending the TRACK has not been receiving the regular keepalive
messages from its parent, and is wondering if it needs to find a messages from its parent, and is wondering if it needs to find a
new parent. new parent.
- Reason: a code indicating the reason for the BindReject,
UnbindRequest, or EjectRequest message.
- RepairAddr: This field in the BindConfirm message is used to tell - SenderDallyTime: This field is associated with a SenderTimestamp
the receiver which multicast address the repair head will be field. It contains the sum of the waiting time that should be
sending retransmissions on. If this field is null, then the subtracted from the RTT measurement at the sender.
receiver should expect retransmissions to be sent on the sender's
data multicast address. This SeqNum field in the same BindConfirm
message indicates the sequence number starting from which the
repair head promises to provide repair service.
- SenderTimestamp: This field is included in Data messages to - LocalDallyTime: This is the same as the SenderDallyTime, but is
signal the need to do a roundtrip time measurement from the sender, associated with a ParentTimestamp instead of a SenderTimestamp.
through the tree, and back to the sender. It is the time (measured
by the sender's local clock) when it sent the packet.
- SenderDallyTime: This field is included in TRACK messages. It is - ApplicationConfirms: This is the SeqNum value for which delivery
associated with a SenderTimestamp field. It contains the sum of has been confirmed by all children at or below this parent.
the waiting time that should be subtracted from the RTT measurement
at the sender.
- ParentTimestamp: This field is included in Heartbeat messages to - FailedChildren: This is a list of all children that have recently
signal the need to do a local RTT measurement from a parent. It is been dropped from the repair tree.
the time when the parent sent the packet.
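The BitMask definition above can be illustrated with a short Python
sketch that expands a SeqNum and BitMask pair into the list of lost
sequence numbers. The function name is hypothetical, and counting the
first element as i = 0 is an assumption, since the text does not say
whether indexing starts at 0 or 1.

```python
def lost_sequence_numbers(seq_num, bit_mask):
    """Return the sequence numbers that a TRACK reports as lost.

    seq_num  -- the SeqNum field carried in the TRACK
    bit_mask -- the BitMask field as a list of 0/1 values; a 1 at
                index i marks message SeqNum+i as lost
                (assumption: the first element is i = 0)
    """
    return [seq_num + i for i, bit in enumerate(bit_mask) if bit == 1]


# With SeqNum 100 and BitMask 0,1,1,0 the lost messages are 101 and 102.
print(lost_sequence_numbers(100, [0, 1, 1, 0]))  # prints [101, 102]
```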
- ParentDallyTime: This is the same as the SenderDallyTime, but is 5. Global Configuration Variables, Constants, and Reason Codes
associated with a ParentTimestamp instead of a SenderTimestamp.
- MemberId: This is an integer the repair head assigns to a 5.1 Global Configuration Variables
particular child. The child receiver uses this value to implement These are variables that control the data session and are advertised
the rotating TRACK Generation algorithm. to all participants.
5. Global Configuration Variables and Error Codes @ TimeMaxBindResponse: the time, in seconds, to wait for a
response to a BindRequest. Initial value is
TIMEOUT_PARENT_RESPONSE (recommended value is 3). Maximum value
is MAX_TIMEOUT_PARENT_RESPONSE.
6. External API @ MaxChildren: The maximum number of children a repair head is
allowed to handle. Recommended value: 32.
@ ConstantHeartbeatPeriod: Instead of dynamically calculating the
HeartbeatPeriod as described in Section 7.1.5, a constant period
may be used instead. Recommended value: 3 seconds.
@ MinimumHeartbeatPeriod: The minimum value for the dynamically
calculated HeartbeatPeriod. Recommended value: 1 second.
@ MinHoldTime: The minimum amount of time a repair head holds on
to data packets.
@ MaxHoldTime: The maximum amount of time a repair head holds on
to data packets.
@ AckWindow: The number of packets seen before a receiver issues
an acknowledgement. Recommended value: 32.
5.2 Constants
@ NUM_MAX_PARENT_ATTEMPTS: The number of times to try to bind to a
repair head before declaring a PARENT_UNREACHABLE error.
Recommended value is 5.
@ NULL_DATA_PERIOD: The time between transmission of NullData
Messages. Recommended value is 1.
@ FAILURE_DETECTION_REDUNDANCY: The number of times a message is
sent without receiving a response before declaring an error.
Recommended value is 3.
@ MAX_TRACK_TIMEOUT: The maximum value for TRACKTimeout.
Recommended value is 5 seconds.
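As a non-normative illustration, the variables and constants listed above could be carried in a pair of read-only records. The class and field names below are hypothetical. Values are filled in only where the text recommends one; MinHoldTime, MaxHoldTime and MAX_TIMEOUT_PARENT_RESPONSE have no recommended value in the text, so they are left unset or omitted. The unit of NULL_DATA_PERIOD is not stated and is assumed here to be seconds.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SessionConfig:
    """Global configuration variables advertised to all participants."""
    time_max_bind_response: float = 3.0        # initial value, seconds
    max_children: int = 32
    constant_heartbeat_period: Optional[float] = None  # 3 s is recommended when used
    minimum_heartbeat_period: float = 1.0      # seconds
    min_hold_time: Optional[float] = None      # no recommended value in the text
    max_hold_time: Optional[float] = None      # no recommended value in the text
    ack_window: int = 32                       # packets


@dataclass(frozen=True)
class Constants:
    """Protocol constants with the recommended values from the text."""
    num_max_parent_attempts: int = 5
    null_data_period: float = 1.0              # unit assumed to be seconds
    failure_detection_redundancy: int = 3
    max_track_timeout: float = 5.0             # seconds
```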
5.3 Reason Codes
@ BindReject reason codes
@ LOOP_DETECTED
@ MAX_CHILDREN_EXCEEDED
@ UnbindRequest reason codes
@ SESSION_DONE
@ APPLICATION_REQUEST
@ RECEIVER_TOO_SLOW
@ EjectRequest reason codes
@ PARENT_LEAVING
@ PARENT_FAILURE
@ CHILD_TOO_SLOW
@ PARENT_OVERLOADED
6. External APIs
This section describes external interfaces for the building block. This section describes external interfaces for the building block.
6.1 Interfaces to the BB from the PI 6.1 Interfaces to the BB from PI's
6.1.1 Start(boolean RepairHead, boolean RejoinAllowed, 6.1.1 Start(boolean RepairHead, boolean RejoinAllowed,
Advertisement) Advertisement)
Start instructs the BB to initiate operation. RepairHead indicates Start instructs the BB to initiate operation.
whether or not the node may also operate as a repair head.
RepairHead indicates whether or not the node may also operate as a
repair head. This parameter is passed along to the tree BB.
RejoinAllowed indicates whether or not the node is allowed to rejoin RejoinAllowed indicates whether or not the node is allowed to rejoin
the session if the only repair heads available are missing some repair the session if the only repair heads available are missing some repair
data. In particular, this parameter also controls whether or not the data needed by this node. This parameter also controls whether or not
node is allowed to join the session after the first data messages have the node is allowed to join the session after the first data messages
become unrecoverable (late join). The Advertisement parameter passes have become unrecoverable (late join). The BB uses this parameter to
to the BB all of the parameters from the session advertisement. decide whether or not to use a particular repair head (chosen by the
tree BB) based on its available repair data.
The Advertisement parameter passes to the BB all of the parameters
from the session advertisement.
6.1.2 End 6.1.2 End
End instructs the BB to end its operation. If the node is the Sender, End instructs the BB to end its operation. If the node is the Sender,
it indicates to the group that the session has ended. If the node is it indicates to the group that the last data message is the final one
for the session. Once a receiver has received all of the session's
data, it MAY unbind from its parent. However, if the receiver is also
a repair head, it continues to operate as a repair head until all of a repair head, it continues to operate as a repair head until all of
its children have finished. its children have finished. Then it MAY unbind from its own parent.
6.1.3 SendMessage(Message) If End is called at a repair head, it MUST use the multicast Eject
procedure to inform its children for this session that it is leaving
the group. Once the procedure is complete (all children have
acknowledged receipt of the Eject, or the Eject has been sent the
maximum number of times), the repair head MAY unbind from its own
parent.
SendMessage presents the BB with a data message to send to the group. If End is called at a receiver, it MUST use the Unbind procedure to
inform its parent for this session that it is leaving the group.
6.1.4 MessageSynched(Message) 6.1.3 incomingMessage(Message)
incomingMessage presents the BB with a message received by the PI.
6.1.4 getStatistics
getStatistics returns current BB statistics to the upper BB or PI.
6.1.5 MessageSynched(Message)
MessageSynched tells the BB that the indicated message has been MessageSynched tells the BB that the indicated message has been
synched with the application. synched with the application.
6.1.6 RepairHead(boolean)
RepairHead tells the BB whether or not it is now acting as a Repair
Head.
6.2 Interfaces from the BB to the PI 6.2 Interfaces from the BB to the PI
6.2.1 MessageReceived(Message, boolean SynchMessage) 6.2.1 outgoingMessage(Message)
MessageReceived passes a data message up to the PI. SynchMessage outgoingMessage instructs the PI to send the message.
indicates whether or not the PI should call MessageSynched once the
message has been consumed by the application.
6.2.2 SenderLost 6.2.2 MessageReceived(Message, boolean Synch)
MessageReceived passes a data message up to the PI. Synch indicates
whether or not the PI should call MessageSynched once the message has
been consumed by the application.
6.2.3 SenderLost
SenderLost tells the PI that contact with the sender has been lost. SenderLost tells the PI that contact with the sender has been lost.
6.2.3 UnrecoverableData 6.2.4 UnrecoverableData
UnrecoverableData indicates to the PI that the BB was unable to UnrecoverableData indicates to the PI that the BB was unable to
recover some session data. recover some session data.
6.2.4 SessionDone 6.2.5 SessionDone
SessionDone indicates to the PI that the sender has completed sending SessionDone indicates to the PI that the sender has completed sending
the data, and the node has left the session. the data, and the node has left the session.
7. Algorithms 7. Algorithms
7.1 Tree Based Session Creation and Maintenance 7.1 Tree Based Session Creation and Maintenance
7.1.1 Overview of Tree Configuration 7.1.1 Overview of Tree Configuration
skipping to change at line 511 skipping to change at line 669
knowledge, which can be provided with some combination of manual knowledge, which can be provided with some combination of manual
and/or automatic configuration. The algorithms for automatic tree and/or automatic configuration. The algorithms for automatic tree
configuration are part of the Automatic Tree Configuration BB. configuration are part of the Automatic Tree Configuration BB.
They return to each node the address of the parent it should bind They return to each node the address of the parent it should bind
to, as well as zero or more backup parents to use if the primary to, as well as zero or more backup parents to use if the primary
parent fails. parent fails.
In addition to receiving the tree configuration information, the In addition to receiving the tree configuration information, the
receivers all receive a Session Advertisement message from the receivers all receive a Session Advertisement message from the
senders, informing them of the Data Multicast Address and other senders, informing them of the Data Multicast Address and other
session configuration information. This advertisement may session configuration information. This advertisement may contain
advertise other relevant session information such as whether or not other relevant session information such as whether or not Repair
Repair Heads should be used, whether manual or automatic tree Heads should be used, whether manual or automatic tree
configuration should be used, the time at which the session will configuration should be used, the time at which the session will
start, and other protocol constants. This advertisement is created start, and other protocol settings. This advertisement is created
as part of either the PI or as part of an external service. In as part of either the PI or as part of an external service. In
this way, the Sender enforces a set of uniform Session this way, the Sender enforces a set of uniform Session
Configuration Parameters on all members of the session. Configuration Parameters on all members of the session.
As described in the automatic tree configuration BB, the general As described in the automatic tree configuration BB, the general
algorithm for a given node in tree creation is as follows. algorithm for a given node in tree creation is as follows.
1) Get advertisement that a session is starting 1) Get advertisement that a session is starting
2) Get list of neighbor candidates, contact them 2) Get list of neighbor candidates using the getSNs Tree BB
interface, contact them
3) Select best neighbor as parent in a loop free manner 3) Select best neighbor as parent in a loop free manner
4) Bind to parent 4) Bind to parent
5) Optionally, later rebind to another parent 5) Optionally, later rebind to another parent
When a child finishes step 4, it is up to automatic tree When a child finishes step 4, it is up to automatic tree
configuration to, if necessary, continue building the tree in order configuration to, if necessary, continue building the tree in order
to connect the node back to the Sender. After the session is to connect the node back to the Sender. After the session is
created, children can unbind from their parents and bind again to created, children can unbind from their parents and bind again to
new parents. This happens when faults occur, or as part of a tree new parents. This happens when faults occur, or as part of a tree
optimization process. Steps 1 through 3 are external to the TRACK optimization process. Steps 1 through 3 are external to the TRACK
BB. Step 4 is performed as part of session creation. Step 5 is BB. Step 4 is performed as part of session creation. Step 5 is
performed as part of session maintenance in conjunction with performed as part of session maintenance in conjunction with
automatic tree building, as either a child initiated unbind or a automatic tree building, as either an unbind or eject, combined
parent initiated unbind, combined with another bind operation. with another bind operation.
Once steps 1 through 3 are completed, Receivers join the Data Once steps 1 through 3 are completed, Receivers join the Data
Multicast Address, and attempt to bind to either the Sender or a Multicast Address, and attempt to bind to either the Sender or a
local Repair Head. A Receiver will attempt to bind to the first local Repair Head. A Receiver will attempt to bind to the first
node in the tree configuration list returned by step 3, and if this node in the tree configuration list returned by step 3, and if this
fails, it will move to the next one. A Receiver only binds to a fails, it will move to the next one. A Receiver only binds to a
single Repair Head or Sender, at a time, for each Data Session. single Repair Head or Sender, at a time, for each Data Session.
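The candidate traversal described above can be sketched as follows. This is a simplified, non-normative illustration: the function name and the try_bind callback are hypothetical, and the per-parent retry behaviour that depends on the BindReject reason code is hidden inside try_bind.

```python
from typing import Callable, Optional, Sequence


def bind_to_first_available(candidates: Sequence[str],
                            try_bind: Callable[[str], bool]) -> Optional[str]:
    """Try each parent candidate in list order; return the first that accepts.

    `candidates` is the ordered list returned by tree configuration.
    `try_bind` is a hypothetical callback that runs the Bind exchange with
    one parent and returns True on BindConfirm.
    """
    for parent in candidates:
        if try_bind(parent):
            return parent          # bound to exactly one parent
    return None                    # no candidate left; the caller reports the error
```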
The automatic tree building BB ensures that the tree is formed The automatic tree building BB ensures that the tree is formed
without loops. As part of this, when a Repair Head has a Receiver without loops. As part of this, when a Repair Head has a Receiver
attempt to bind to it for a given Data Session, it may not at first attempt to bind to it for a given Data Session, it may not at first
be able to accept the connection, until it is able to join the tree be able to accept the connection, until it is able to join the tree
itself. Because of this, a Receiver will sometimes have to itself. Because of this, a Receiver will sometimes have to
repeatedly attempt to bind to a given parent before succeeding. repeatedly attempt to bind to a given parent before succeeding.
Once the Sender initiates tree building, it is also free to start Once the Sender initiates tree building, it is also free to start
sending Data messages on the Data Multicast Address. Repair Heads sending Data messages on the Data Multicast Address. Repair Heads
and Receivers may start receiving these messages, but may not and Receivers may start receiving these messages, but may not
request retransmission or deliver data to the application until request retransmission or deliver data to the application until
they receive confirmation that they have successfully bound to the they receive confirmation that they have successfully bound to the
group. tree.
7.1.2 Bind 7.1.2 Bind
7.1.2.1 Input Parameters 7.1.2.1 Input Parameters
In order to join a data session and bind to the tree, the following In order to join a data session and bind to the tree, the following
nodes need the following parameters. nodes need the following parameters.
A Repair Head requires the following parameters. A Repair Head requires the following parameters.
skipping to change at line 603 skipping to change at line 762
multiple BindRequest messages may be bundled together in a single multiple BindRequest messages may be bundled together in a single
message. message.
A node sends a BindRequest message to its automatically selected or A node sends a BindRequest message to its automatically selected or
manually configured parent node. The parent node sends either a manually configured parent node. The parent node sends either a
BindConfirm message or a BindReject message. Reception of a BindConfirm message or a BindReject message. Reception of a
BindConfirm message terminates the algorithm successfully, while BindConfirm message terminates the algorithm successfully, while
receipt of a BindReject message causes the node to either retry the receipt of a BindReject message causes the node to either retry the
same parent or restart the Bind algorithm with its next parent same parent or restart the Bind algorithm with its next parent
candidate (depending on the BindReject reason code), or if it has candidate (depending on the BindReject reason code), or if it has
none, to declare a REJECTED_BY_PARENT error. none, to declare a REJECTED_BY_PARENT error. Once the node is
accepted by a Repair head, it informs the Tree BB using the setSN
interface.
Reliability is achieved through the use of a standard request- Reliability is achieved through the use of a standard request-
response protocol. At the beginning of the algorithm, the child response protocol. At the beginning of the algorithm, the child
initializes TimeMaxBindResponse to the constant initializes TimeMaxBindResponse to the constant
TIMEOUT_PARENT_RESPONSE and initializes NumBindResponseFailures to TIMEOUT_PARENT_RESPONSE and initializes NumBindResponseFailures to
0. Every time it sends a BindRequest message, it waits 0. Every time it sends a BindRequest message, it waits
TimeMaxBindResponse for a response from the parent node. If no TimeMaxBindResponse for a response from the parent node. If no
response is received, the node doubles its value for response is received, the node doubles its value for
TimeMaxBindResponse, but limits TimeMaxBindResponse to be no larger TimeMaxBindResponse, but limits TimeMaxBindResponse to be no larger
than MAX_TIMEOUT_PARENT_RESPONSE. It also than MAX_TIMEOUT_PARENT_RESPONSE. It also
increments NumBindResponseFailures, and retransmits the BindRequest increments NumBindResponseFailures, and retransmits the BindRequest
message. If NumBindResponseFailures reaches message. If NumBindResponseFailures reaches
NUM_MAX_PARENT_ATTEMPTS, it reports a PARENT_UNREACHABLE error. NUM_MAX_PARENT_ATTEMPTS, it reports a PARENT_UNREACHABLE error.
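A minimal, non-normative sketch of this retransmission logic is given below. The function name and the two callbacks are hypothetical. MAX_TIMEOUT_PARENT_RESPONSE has no recommended value in the text, so the caller must supply it.

```python
from typing import Callable, Optional


def bind_with_backoff(send_request: Callable[[], None],
                      wait_for_response: Callable[[float], Optional[str]],
                      *,
                      max_timeout_parent_response: float,
                      timeout_parent_response: float = 3.0,
                      num_max_parent_attempts: int = 5) -> str:
    """Send BindRequest until a response arrives or the attempts run out.

    `wait_for_response(timeout)` is assumed to return "BindConfirm" or
    "BindReject", or None if nothing arrived within `timeout` seconds.
    """
    time_max_bind_response = timeout_parent_response
    num_bind_response_failures = 0
    while True:
        send_request()
        response = wait_for_response(time_max_bind_response)
        if response is not None:
            return response
        # No response: double the timeout, capped at the configured maximum.
        time_max_bind_response = min(2 * time_max_bind_response,
                                     max_timeout_parent_response)
        num_bind_response_failures += 1
        if num_bind_response_failures >= num_max_parent_attempts:
            return "PARENT_UNREACHABLE"
```

With the recommended initial timeout of 3 seconds and an assumed cap of 10 seconds, an unresponsive parent is tried with timeouts of 3, 6, 10, 10 and 10 seconds before the error is reported.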
When a parent receives a BindRequest message, it first consults the When a parent receives a BindRequest message, it first consults the
automatic tree building BB for approval, for instance to ensure automatic tree building BB for approval (using the acceptChild Tree
that accepting the BindRequest will not cause a loop in the tree. BB interface), for instance to ensure that accepting the
Then the parent checks to be sure that it does not have more than BindRequest will not cause a loop in the tree. Then the parent
NUM_MAX_CHILDREN children already bound to it for this session. If checks to be sure that it does not have more than MaxChildren
it can accept the child, it sends back a BindConfirm message. children already bound to it for this session. If it can accept
Otherwise, it sends the node a BindReject message. Then the parent the child, it sends back a BindConfirm message. Otherwise, it
checks to see if it is already a member of this data session. If sends the node a BindReject message. Then the parent checks to see
it is not yet a member of this session, it attempts to join the if it is already a member of this data session. If it is not yet a
tree itself. member of this session, it attempts to join the tree itself.
The BindConfirm message contains the lowest sequence number that The BindConfirm message contains the lowest sequence number that
the repair head has available. If this number is 0 or 1, then the the repair head has available. If this number is 0 or 1, then the
repair head has all of the data available from the start of the repair head has all of the data available from the start of the
session. Otherwise, the requesting node is attempting a late join, session. Otherwise, the requesting node is attempting a late join,
and can only use this repair head if late join was allowed by the and can only use this repair head if late join was allowed by the
PI. If late join is not allowed, the node may try another repair PI. If late join is not allowed, the node may try another repair
head, or give up. head, or give up.
Similarly, if a failure recovery occurs, when a node tries to bind Similarly, if a failure recovery occurs, when a node tries to bind
to a new repair head, it must follow the same rules as for a late to a new repair head, it must follow the same rules as for a late
join. join. See section 7.1.5.
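The parent-side admission decision and the child-side late join check can be sketched as follows. This is a non-normative illustration with hypothetical function names. Two assumptions are made that the text does not state: a refusal by the tree BB is reported with the LOOP_DETECTED reason code, and a parent that already has MaxChildren children bound rejects the request.

```python
from typing import Optional, Tuple


def handle_bind_request(tree_bb_accepts: bool,
                        bound_children: int,
                        max_children: int = 32) -> Tuple[str, Optional[str]]:
    """Parent side: decide between BindConfirm and BindReject."""
    if not tree_bb_accepts:
        # Assumption: a tree BB refusal maps to LOOP_DETECTED.
        return ("BindReject", "LOOP_DETECTED")
    if bound_children >= max_children:
        # Assumption: a parent already at MaxChildren rejects the child.
        return ("BindReject", "MAX_CHILDREN_EXCEEDED")
    return ("BindConfirm", None)


def may_use_repair_head(lowest_seq_num: int, rejoin_allowed: bool) -> bool:
    """Child side: apply the late join rule to a BindConfirm.

    A lowest available sequence number of 0 or 1 means the repair head
    holds all data from the start of the session. Any other value is a
    late join, which is usable only if the PI allowed it.
    """
    return lowest_seq_num in (0, 1) or rejoin_allowed
```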
7.1.3 Child Initiated Unbind 7.1.3 Unbind
A child may decide to leave a data session for the following A child may decide to leave a data session for the following
reasons. 1) It detects that the data session is finished. 2) The reasons. 1) It detects that the data session is finished. 2) The
application requests to leave the data session. 3) It is not able application requests to leave the data session. 3) It is not able
to keep up with the data rate of the data session. When any of to keep up with the data rate of the data session. When any of
these conditions occurs, it initiates an Unbind process. these conditions occurs, it initiates an Unbind process.
A Child Initiated Unbind is, like the Bind function, a simple An Unbind is, like the Bind function, a simple request-reply
request-reply protocol. Unlike the Bind function, it only has a protocol. Unlike the Bind function, it only has a single response,
single response, UnbindAccept. With this exception, the Unbind UnbindConfirm. With this exception, the Unbind operation uses the
operation uses the same state variables and reliability algorithms same state variables and reliability algorithms as the Bind
as the Bind function. function.
When a child receives an UnbindAccept message from its parent, it When a child receives an UnbindConfirm message from its parent, it
reports a LEFT_DATA_SESSION_GRACEFULLY event. If it does not reports a LEFT_DATA_SESSION_GRACEFULLY event. If it does not
receive this message after NUM_MAX_PARENT_ATTEMPTS, then it reports receive this message after NUM_MAX_PARENT_ATTEMPTS, then it reports
a LEFT_DATA_SESSION_ABNORMALLY event. a LEFT_DATA_SESSION_ABNORMALLY event. Unbinds are reported to the
Tree BB using the lostSN interface.
7.1.4 Parent Initiated Unbind 7.1.4 Eject
A parent may decide to remove one or more of its children from a A parent may decide to remove one or more of its children from a
data stream for the following reasons. 1) The parent needs to data stream for the following reasons. 1) The parent needs to
leave the group due to application reasons. 2) The repair head leave the group due to application reasons. 2) The repair head
detects an unrecoverable failure with either its parent or the detects an unrecoverable failure with either its parent or the
sender. 3) The parent detects that the child is not able to keep sender. 3) The parent detects that the child is not able to keep
up with the speed of the data stream. 4) The parent is not able to up with the speed of the data stream. 4) The parent is not able to
handle the load of its children and needs some of them to move to handle the load of its children and needs some of them to move to
another parent. In the first two cases, the parent needs to another parent. In the first two cases, the parent needs to
multicast the advertisement of the termination of one or more data multicast the advertisement of the termination of one or more data
sessions to all of its children. In the second two cases, it needs sessions to all of its children. In the second two cases, it needs
to send one or more unicast notifications to one or more of its to send one or more unicast notifications to one or more of its
children. children.
Consequently, a Parent Initiated Unbind can be done either with a Consequently, an Eject can be done either with a repeated multicast
repeated multicast advertisement message to all children, or a set advertisement message to all children, or a set of unicast request-
of unicast request-reply messages to the subset of children that it reply messages to the subset of children that it needs to go to.
needs to go to.
For the multicast version of Parent Initiated Unbind, the parent For the multicast version of Eject, the parent sends a multicast
sends a multicast UnbindNotification message to all of its children UnbindRequest message to all of its children for a given Data
for a given Data Session, on its Local Multicast Channel. It is Session, on its Local Multicast Channel. It is only necessary to
only necessary to provide statistical reliability on this message, provide statistical reliability on this message, since children
since children will detect the parent's failure even if the message will detect the parent's failure even if the message is not
is not received. Therefore, the UnbindNotification message is sent received. Therefore, the UnbindRequest message is sent
NUM_REDUNDANT_MULTICAST_ADVERTISEMENT times, which has a FAILURE_DETECTION_REDUNDANCY times.
recommended value of 3.
For the unicast version of Parent Initiated Unbind, the parent For the unicast version of Eject, the parent sends a unicast
sends a unicast UnbindNotification message to all of its children. UnbindRequest message to all of its children. Each of them responds
Each of them responds with an EjectConfirm. Reliability is ensured with an EjectConfirm. Reliability is ensured through the same
through the same request-reply mechanism as the Bind operation. request-reply mechanism as the Bind operation.
Ejections are reported to the Tree BB using the removeChild
interface.
7.1.5 Fault Detection 7.1.5 Fault Detection
There are three cases where fault detection is needed. 1) There are three cases where fault detection is needed. 1)
Detection (by a child) that a parent has failed. 2) Detection (by Detection (by a child) that a parent has failed. 2) Detection (by
a parent) that a child has failed. 3) Detection (by either a RH or a parent) that a child has failed. 3) Detection (by either a
Receiver) that a Sender has failed. Repair Head or Receiver) that a Sender has failed.
In order to be scaleable and efficient, fault detection is In order to be scaleable and efficient, fault detection is
primarily accomplished by periodic keep-alive messages, combined primarily accomplished by periodic keep-alive messages, combined
with the existing TRACK messages. Nodes expect to see keep-alive with the existing TRACK messages. Nodes expect to see keep-alive
messages every set period of time. If more than a fixed number of messages every set period of time. If more than a fixed number of
periods go by, and no keep-alive messages of a given type are periods go by, and no keep-alive messages of a given type are
received, the node declares a preliminary failure. The detecting received, the node declares a preliminary failure. The detecting
node may then ping the potentially failed node before declaring it node may then ping the potentially failed node before declaring it
failed, or it can just declare it failed. failed, or it can just declare it failed.
skipping to change at line 728 skipping to change at line 891
Heartbeat messages are multicast every HeartbeatPeriod seconds, Heartbeat messages are multicast every HeartbeatPeriod seconds,
from a parent to its children. Every time that a parent sends a from a parent to its children. Every time that a parent sends a
Retransmission message or a Heartbeat message (as well as at Retransmission message or a Heartbeat message (as well as at
initialization time), it resets a timer for HeartbeatPeriod initialization time), it resets a timer for HeartbeatPeriod
seconds. If the timer goes off, a Heartbeat is sent. The seconds. If the timer goes off, a Heartbeat is sent. The
HeartbeatPeriod is dynamically computed as follows: HeartbeatPeriod is dynamically computed as follows:
interval = AckWindow / PacketRate interval = AckWindow / PacketRate
HeartbeatPeriod = 2 * interval HeartbeatPeriod = 2 * interval
If no Retransmission messages are sent within HeartbeatPeriod, a
Heartbeat message is generated. Global configuration parameters Global configuration parameters ConstantHeartbeatPeriod and
ConstantHeartbeatPeriod and MinimumHeartbeatPeriod can be used to MinimumHeartbeatPeriod can be used to either set HeartbeatPeriod to
either set HeartbeatPeriod to a constant, or give HeartbeatPeriod a a constant, or give HeartbeatPeriod a lower bound, globally.
lower bound, globally.
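The HeartbeatPeriod computation, including the two global overrides, can be sketched as follows. This is a non-normative illustration. The function name is hypothetical, and PacketRate is assumed to be expressed in packets per second.

```python
from typing import Optional


def heartbeat_period(ack_window: int,
                     packet_rate: float,
                     constant_heartbeat_period: Optional[float] = None,
                     minimum_heartbeat_period: Optional[float] = None) -> float:
    """Return HeartbeatPeriod in seconds."""
    if constant_heartbeat_period is not None:
        # A configured constant replaces the dynamic calculation.
        return constant_heartbeat_period
    interval = ack_window / packet_rate      # time to send one AckWindow of packets
    period = 2 * interval
    if minimum_heartbeat_period is not None:
        period = max(period, minimum_heartbeat_period)
    return period
```

For example, with AckWindow = 32 and a rate of 64 packets per second, the interval is 0.5 seconds and HeartbeatPeriod is 1 second.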
Similarly, a NullData message is multicast by the sender to all Similarly, a NullData message is multicast by the sender to all
data session members, every NULL_DATA_PERIOD. The NullData timer data session members, every NULL_DATA_PERIOD. The NullData timer
is set to NULL_DATA_PERIOD, and is reset every time that a Data or is set to NULL_DATA_PERIOD, and is reset every time that a Data or
NullData message is sent by the Sender. NullData message is sent by the Sender.
The key parameter for failure detection is the global tree The key parameter for failure detection is the global tree
parameter FAILURE_DETECTION_REDUNDANCY. The higher the value for parameter FAILURE_DETECTION_REDUNDANCY. The higher the value for
this parameter, the more keep-alive messages that must be missed this parameter, the more keep-alive messages that must be missed
before a failure is declared. before a failure is declared.
skipping to change at line 755 skipping to change at line 917
failures fast enough that there is a high probability they can failures fast enough that there is a high probability they can
rejoin the stream at another parent, before flow control has rejoin the stream at another parent, before flow control has
advanced the buffer window to a point where the child can not advanced the buffer window to a point where the child can not
recover all lost messages in the stream. In order to attempt to do recover all lost messages in the stream. In order to attempt to do
this, children detect a failure of a parent if this, children detect a failure of a parent if
FAILURE_DETECTION_REDUNDANCY * HeartbeatPeriod time goes by without FAILURE_DETECTION_REDUNDANCY * HeartbeatPeriod time goes by without
any heartbeats. As part of buffer window advancement, described in any heartbeats. As part of buffer window advancement, described in
section 7.2.4, all parents MAY choose to buffer all messages for a section 7.2.4, all parents MAY choose to buffer all messages for a
minimum of FAILURE_DETECTION_REDUNDANCY * 2 * HeartbeatPeriod minimum of FAILURE_DETECTION_REDUNDANCY * 2 * HeartbeatPeriod
seconds, which gives children a period of time to find a new parent seconds, which gives children a period of time to find a new parent
before the buffers are freed. before the buffers are freed. Children report parent failures to
the Tree BB using the lostSN interface.
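The child-side timeout and the matching buffering period can be sketched as follows. This is a non-normative illustration with hypothetical function names. Times are in seconds and are assumed to come from a single monotonic local clock.

```python
def silence_exceeded(now: float,
                     last_heard: float,
                     period: float,
                     redundancy: int = 3) -> bool:
    """True once more than redundancy * period has passed with no message.

    For parent failure detection, `period` is HeartbeatPeriod and
    `redundancy` is FAILURE_DETECTION_REDUNDANCY.
    """
    return (now - last_heard) > redundancy * period


def minimum_buffer_hold(heartbeat_period: float, redundancy: int = 3) -> float:
    """Buffering time a parent MAY apply: twice the child's detection time."""
    return redundancy * 2 * heartbeat_period
```

With a HeartbeatPeriod of 1 second and the recommended redundancy of 3, a child declares its parent failed after 3 seconds of silence, while a parent that follows the optional buffering rule keeps messages for at least 6 seconds.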
A parent detects a preliminary failure of one of its children if it A parent detects a preliminary failure of one of its children if it
does not receive any TRACK messages from that child in does not receive any TRACK messages from that child in
FAILURE_DETECTION_REDUNDANCDY * TrackTimeout seconds (see FAILURE_DETECTION_REDUNDANCY * TrackTimeout seconds (see discussion
discussion of how TrackTimeout is computed in 7.2). Because a of how TrackTimeout is computed in 7.2.1). Because a failed child
failed child can slow down the group's progress, it is very can slow down the group's progress, it is very important that a
important that a parent resolve the child's status quickly. Once a parent resolve the child's status quickly. Once a parent declares
parent declares a preliminary failure of a child, it issues a set a preliminary failure of a child, it issues a set of up to
of up to FAILURE_DETECTION_REDUNDANCY Heartbeat messages that are FAILURE_DETECTION_REDUNDANCY Heartbeat messages that are unicast
unicast (or multicast) to the failed receiver(s). These messages (or multicast) to the failed receiver(s). These messages are
are spaced apart by 2*LocalRTT, where LocalRTT is the round trip spaced apart by 2*LocalRTT, where LocalRTT is the round trip time
time that has been measured to the child in question (see 7.4 for that has been measured to the child in question (see 7.4 for
description of how LocalRTT is measured). These Heartbeat messages description of how LocalRTT is measured). These Heartbeat messages
contain a ChildrenList field that contains the children who are contain a ChildrenList field that contains the children who are
requested to send a TRACK immediately. requested to send a TRACK immediately.
Whenever a child receives a Heartbeat message with an Whenever a child receives a Heartbeat message with an
ImmediateTRACK field set to 1, it immediately sends a TRACK to its ImmediateTRACK field set to 1, it immediately sends a TRACK to its
parent. If a parent does not receive a TRACK message from a child parent. If a parent does not receive a TRACK message from a child
after waiting a period of 2*ChildRTT after the last Heartbeat after waiting a period of 2*ChildRTT after the last Heartbeat
message to that child, it declares the child failed, and removes it message to that child, it declares the child failed, and removes it
from the parent's child membership list. from the parent's child membership list. It informs the Tree BB
using the removeChild interface.
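The parent-side procedure can be sketched as a timeout check followed by a probe schedule. This is a non-normative illustration with hypothetical function names. It assumes that ChildRTT is the same quantity as the LocalRTT measured for that child, and that all FAILURE_DETECTION_REDUNDANCY probes are sent unless a TRACK arrives first.

```python
from typing import List, Tuple


def preliminary_child_failure(now: float,
                              last_track: float,
                              track_timeout: float,
                              redundancy: int = 3) -> bool:
    """True if no TRACK arrived within redundancy * TrackTimeout seconds."""
    return (now - last_track) > redundancy * track_timeout


def probe_schedule(start: float,
                   local_rtt: float,
                   redundancy: int = 3) -> Tuple[List[float], float]:
    """Return the send times of the probing Heartbeats and the final deadline.

    Probes are spaced 2 * LocalRTT apart. The child is declared failed if
    no TRACK has arrived 2 * LocalRTT after the last probe.
    """
    spacing = 2 * local_rtt
    send_times = [start + i * spacing for i in range(redundancy)]
    deadline = send_times[-1] + spacing
    return send_times, deadline
```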
A child or a repair head detects the failure of a sender if it does A child or a repair head detects the failure of a sender if it does
not receive a Data or NullData message from a sender in not receive a Data or NullData message from a sender in
FAILURE_DETECTION_REDUNDANCY * NULL_DATA_PERIOD. FAILURE_DETECTION_REDUNDANCY * NULL_DATA_PERIOD.
Note that the more receivers there are in a tree, and the higher Note that the more receivers there are in a tree, and the higher
the loss rate, the larger FAILURE_DETECTION_REDUNDANCY must be, in the loss rate, the larger FAILURE_DETECTION_REDUNDANCY must be, in
order to give the same probability that erroneous failures won't be order to give the same probability that erroneous failures won't be
declared. declared.
7.1.6 Fault Notification 7.1.6 Fault Notification
When a parent detects the failure of a child, it adds a failure When a parent detects the failure of a child, it adds a failure
notification field to the next FAILURE_DETECTION_REDUNDANCY TRACK notification field to the next TRACK messages that it sends up the
messages that it sends up the tree. It sends this notification tree. It sends this notification multiple times because TRACKs are
multiple times because TRACKs are not delivered reliably. A not delivered reliably. A failure notification field includes the
failure notification field includes the failure code, as well as a failure code, as well as a list of one or more failed nodes.
list of one or more failed nodes. Failure notifications are Failure notifications are aggregated up the tree, according to the
aggregated up the tree, according to the rules in 7.3. A failure rules in 7.3. A failure notification is not a definitive report of
notification is not a definitive report of a failure, as the child a failure, as the child may have moved to a different repair head.
may have moved to a different repair head.
7.1.7 Fault Recovery 7.1.7 Fault Recovery
The Fault Recovery algorithms require a list of one or more The Fault Recovery algorithms require a list of one or more
addresses of alternate parents that can be bound to, and that still addresses of alternate parents that can be bound to, and that still
provide loop free operation. provide loop free operation.
If a child detects the failure of its parent, it then re-runs the If a child detects the failure of its parent, it then re-runs the
Bind operation to a new parent candidate, in order to rejoin the Bind operation to a new parent candidate, in order to rejoin the
tree. As described above in section 7.1.2, a node may perform a tree. As described above in section 7.1.2, a node may perform a
late bind, i.e. binding with a repair head which cannot provide all late join, i.e. binding with a repair head which cannot provide all
the necessary repair data, only if allowed by the PI. the necessary repair data, only if allowed by the PI.
7.2 TRACK Generation 7.2 TRACK Generation
This section describes the algorithms used by the receiver to This section describes the algorithms used by the receiver to
determine when to send the TRACK messages. determine when to send the TRACK messages.
TRACK messages are sent from receivers to their parents. TRACK TRACK messages are sent from receivers to their parents. TRACK
messages may be sent for the following purposes: messages may be sent for the following purposes:
- to request retransmission of messages - to request retransmission of messages
- to advance the sender's transmission window for flow control - to advance the sender's transmission window for flow control
purposes purposes
- to deliver end-to-end confirmation of data reception - to deliver end-to-end confirmation of data reception
- to propagate other relevant feedback information up through the - to propagate other relevant feedback information up through the
session (such as RTT and loss reports, for congestion control) session (such as RTT and loss reports, for congestion control)
The TRACK PI also makes use of the NACK BB, which request The TRACK PI also makes use of the NACK BB, which requests
retransmission of messages from a parent. The TRACK request and retransmission of messages from a parent. The TRACK request and
response algorithms should be highly similar to the NACK algorithms response algorithms should be highly similar to the NACK algorithms
for this specific case. for this specific case.
7.2.1 TRACK Generation with the Rotating TRACK Algorithm 7.2.1 TRACK Generation with the Rotating TRACK Algorithm
Each receiver sends a TRACK message to its parent once per Each receiver sends a TRACK message to its parent once per
AckWindow (a globally configured variable) of data messages AckWindow of data messages received. A receiver uses an offset
received. A receiver uses an offset from the boundary of each from the boundary of each AckWindow to send its TRACK, in order to
AckWindow to send its TRACK, in order to reduce burstiness of reduce burstiness of control traffic at the parents. Each parent
control traffic at the parents. Each parent has a maximum number has a maximum number of children, MaxChildren. When a child binds
of children, MaxChildren. When a child binds to the parent, the to the parent, the parent assigns a locally unique ChildID to that
parent assigns a locally unique ChildID to that child, between 0 child, between 0 and MaxChildren-1.
and MaxChildren-1.
Each child in a tree generates a TRACK message at least once every Each child in a tree generates a TRACK message at least once every
AckWindow of data messages, when the most recent data message's AckWindow of data messages, when the most recent data message's
sequence number, modulo MemberID, is equal to AckWindow. If the sequence number, modulo AckWindow, is equal to MemberID. If the
message that would have triggered a given TRACK for a given node is message that would have triggered a given TRACK for a given node is
missed, the node will generate the TRACK as soon as it learns that missed, the node will generate the TRACK as soon as it learns that
it has missed the message, typically through receipt of a higher it has missed the message, typically through receipt of a higher
numbered data message. numbered data message.
Together, AckWindow and MaxChildren determine the maximum ratio of Together, AckWindow and MaxChildren determine the maximum ratio of
control messages to data messages seen by each parent, given a control messages to data messages seen by each parent, given a
constant load of data messages. constant load of data messages.
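The rotating trigger rule, as stated in the revised text (sequence number modulo AckWindow equals MemberID), might be sketched like this. The helper names are hypothetical, sequence number wraparound is ignored, and the sketch assumes MemberID is smaller than AckWindow so that every member is triggered once per window.

```python
def should_send_track(seq_num, ack_window, member_id):
    """True if this data message is the member's TRACK trigger."""
    return seq_num % ack_window == member_id


def track_due_after_gap(prev_highest_seq, new_seq, ack_window, member_id):
    """True if a trigger message lies in (prev_highest_seq, new_seq].

    Covers the case where the triggering message itself was lost and the
    loss is only noticed when a higher-numbered message arrives.
    """
    return any(
        should_send_track(s, ack_window, member_id)
        for s in range(prev_highest_seq + 1, new_seq + 1)
    )
```

With AckWindow = 32, a member with MemberID 3 sends on sequence numbers 3, 35, 67, and so on, while its siblings send at other offsets within the same window.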
In each data message, the sender advertises the current PacketRate In each data message, the sender advertises the current PacketRate
skipping to change at line 868 skipping to change at line 1030
At the time a node sends a regular TRACK, it also computes a At the time a node sends a regular TRACK, it also computes a
TRACKTimeout value: TRACKTimeout value:
interval = AckWindow / PacketRate interval = AckWindow / PacketRate
TRACKTimeout = 2 * interval TRACKTimeout = 2 * interval
If no TRACKs are sent within TRACKTimeout interval, a TRACK is If no TRACKs are sent within TRACKTimeout interval, a TRACK is
generated, and TRACKTimeout is increased by a factor of 2, up to a generated, and TRACKTimeout is increased by a factor of 2, up to a
value of MaxTRACKTimeout. value of MAX_TRACK_TIMEOUT.
This timer mechanism is used by a receiver to ensure timely repair This timer mechanism is used by a receiver to ensure timely repair
of lost messages and regular feedback propagation up the tree even of lost messages and regular feedback propagation up the tree even
when the sender is not sending data continuously. This mechanism when the sender is not sending data continuously. This mechanism
complements the AckWindow-based regular TRACK generation mechanism. complements the AckWindow-based regular TRACK generation mechanism.
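As a rough sketch of the timer computation above, assuming PacketRate is expressed in messages per second and timeouts in seconds (the function names are illustrative only):

```python
def initial_track_timeout(ack_window, packet_rate):
    """TRACKTimeout computed when a regular TRACK is sent."""
    interval = ack_window / packet_rate
    return 2 * interval


def next_track_timeout(current_timeout, max_track_timeout):
    """Timeout to use after a timer-driven TRACK: doubled, but capped."""
    return min(2 * current_timeout, max_track_timeout)
```

For example, with AckWindow = 32 and PacketRate = 64 the first timeout is 1 second, and successive idle periods give 2, 4, 8, ... seconds until MAX_TRACK_TIMEOUT is reached.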
7.2.2 Local Repair 7.2.2 Local Repair
A repair head maintains the following state for each of its A repair head maintains the following state for each of its
children, for the purpose of providing repair service to the local children, for the purpose of providing repair service to the local
group: group:
- HighestConsecutivelyReceived: a sequence number indicating all - HighestConsecutivelyReceived: a sequence number indicating all
Data messages before this number (inclusive) have been received Data messages up to this number (inclusive) have been received
by a given child. by a given child.
- MissingPackets: a data structure to keep track of the reception - MissingPackets: a data structure to keep track of the reception
status of the Data messages with sequence number higher than status of the Data messages with sequence number higher than
HighestConsecutivelyReceived. HighestConsecutivelyReceived.
In addition, a repair head also maintains other state for purposes In addition, a repair head also maintains other state for purposes
of feedback aggregation described in the next section. of feedback aggregation described in the next section.
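One possible shape for the per-child state is sketched below. The class name, the use of a set for MissingPackets bookkeeping, and the method names are assumptions for illustration; the draft does not prescribe a data structure, and sequence number wraparound is ignored here.

```python
class ChildState:
    """Reception state a repair head might keep for one child."""

    def __init__(self):
        # All Data messages up to and including this number were received.
        self.highest_consecutively_received = 0
        # Messages above that point that the child is known to hold.
        self.received_above = set()

    def mark_received(self, seq):
        """Record that the child reported receiving message `seq`."""
        if seq <= self.highest_consecutively_received:
            return
        self.received_above.add(seq)
        # Advance the consecutive point across any filled gap.
        while self.highest_consecutively_received + 1 in self.received_above:
            self.highest_consecutively_received += 1
            self.received_above.remove(self.highest_consecutively_received)

    def missing_packets(self, highest_seen):
        """Sequence numbers up to `highest_seen` the child still lacks."""
        start = self.highest_consecutively_received + 1
        return [s for s in range(start, highest_seen + 1)
                if s not in self.received_above]
```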
The minimum HighestConsecutivelyReceived value of all its children The minimum HighestConsecutivelyReceived value of all its children
skipping to change at line 926 skipping to change at line 1089
by the sender. by the sender.
7.2.3 Flow Control Window Update 7.2.3 Flow Control Window Update
When a receiver sends a TRACK to its parent, the HighestAllowed When a receiver sends a TRACK to its parent, the HighestAllowed
field provides information on the status of the receiver's flow field provides information on the status of the receiver's flow
control window. The value of HighestAllowed is computed as control window. The value of HighestAllowed is computed as
follows: follows:
HighestAllowed = seqnum + ReceiverWindow HighestAllowed = seqnum + ReceiverWindow
Where seqnum is the highest sequence number of consecutively Where seqnum is the highest sequence number of consecutively
received data message at the receiver. The size of the received data messages at the receiver. The size of the
ReceiverWindow may either be based on a parameter local to the ReceiverWindow may either be based on a parameter local to the
receiver or be a global parameter. receiver or be a global parameter.
7.2.4 Reliability Window 7.2.4 Reliability Window
The sender and each repair head maintain a window of messages for The sender and each repair head maintain a window of messages for
possible retransmission. As messages are acknowledged by all of possible retransmission. As messages are acknowledged by all of
its children, they are released from the parent's retransmission its children, they are released from the parent's retransmission
buffer, as described in 7.2.2. In addition, there are two global buffer, as described in 7.2.2. In addition, there are two global
parameters that can affect when a parent releases a data message parameters that can affect when a parent releases a data message
from the retransmission buffer -- MIN_HOLD_TIME, and MAX_HOLD_TIME. from the retransmission buffer -- MinHoldTime, and MaxHoldTime.
MIN_HOLD_TIME specifies a minimum length of time a message must be MinHoldTime specifies a minimum length of time a message must be
held for retransmission from when it was received. This parameter held for retransmission from when it was received. This parameter
is useful to handle scenarios where one or more children have been is useful to handle scenarios where one or more children have been
disconnected from their parent, and have to reconnect to another. disconnected from their parent, and have to reconnect to another.
If, for example, MIN_HOLD_TIME is set to If, for example, MinHoldTime is set to FAILURE_DETECTION_REDUNDANCY
FAILURE_DETECTION_REDUNDANCY * 2 * ConstantHeartbeatPeriod, then * 2 * ConstantHeartbeatPeriod, then there is a high likelihood that
there is a high likelihood that any child will be able to recover any child will be able to recover any lost messages after
any lost messages after reconnecting to another parent. reconnecting to another parent.
The sender continually advertises to the members of the data The sender continually advertises to the members of the data
session both edges of its retransmission window. The higher value session both edges of its retransmission window. The higher value
is the HighestPacketSent field in each data message, which is the SeqNum field in each Data or NullData message, which
specifies the highest sequence number of any data message sent. specifies the highest sequence number of any data message sent.
The trailing edge of the window is advertised in the The trailing edge of the window is advertised in the
HighestReleased field. This specifies the largest sequence number HighestReleased field. This specifies the largest sequence number
of any message sent that has subsequently been released from the of any message sent that has subsequently been released from the
sender's retransmission window. If both values are the same then sender's retransmission window. If both values are the same then
the window is presently empty. Zero is not a legitimate value for the window is presently empty. Zero is not a legitimate value for
a data sequence number, so if either field has a value of zero, a data sequence number, so if either field has a value of zero,
then no messages have yet reached that state. All sequence number then no messages have yet reached that state. All sequence number
fields use sequence number arithmetic so that a data session can fields use sequence number arithmetic so that a data session can
continue after exhausting the sequence number space. continue after exhausting the sequence number space.
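The draft does not state the width of the sequence number space or the exact comparison rule, so the following is only a sketch under two assumptions: a 32-bit space, and a serial-number style comparison in which one value precedes another if it is less than half the space behind it. Skipping zero on wraparound reflects the statement above that zero is not a legitimate data sequence number.

```python
SEQ_BITS = 32                    # assumed width, not given in the draft
SEQ_MOD = 1 << SEQ_BITS
SEQ_HALF = 1 << (SEQ_BITS - 1)


def seq_lt(a, b):
    """True if sequence number a precedes b, allowing for wraparound."""
    return a != b and ((b - a) % SEQ_MOD) < SEQ_HALF


def seq_next(a):
    """Successor of a, skipping the reserved value zero."""
    n = (a + 1) % SEQ_MOD
    return 1 if n == 0 else n


def window_is_empty(highest_sent, highest_released):
    """Both advertised window edges equal means nothing is held."""
    return highest_sent == highest_released
```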
When a member of a data session receives an advertisement of a new When a member of a data session receives an advertisement of a new
HighestReleased value, it stores this, and is no longer allowed to HighestReleased value, it stores this, and is no longer allowed to
ask for retransmission for any messages up to and including the ask for retransmission for any messages up to and including the
HighestReleased value. If it has any outstanding missing messages HighestReleased value. If it has any outstanding missing messages
that are less than or equal to HighestReleased, it MAY move forward that are less than or equal to HighestReleased, it MAY move forward
and continue delivering the next data messages in the stream. It and continue delivering the next data messages in the stream. It
also SHOULD report an error for the messages that are no longer also SHOULD report an error for the messages that are no longer
recoverable. recoverable.
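A receiver's reaction to a new HighestReleased value could look roughly like this. The function name and the representation of outstanding losses as a set of integers are assumptions, and wraparound is ignored for brevity.

```python
def apply_highest_released(missing, highest_released):
    """Split outstanding losses after a HighestReleased advertisement.

    Returns (recoverable, unrecoverable): retransmission may still be
    requested for the first list; the second list SHOULD be reported as
    an error, after which delivery MAY move forward past those messages.
    """
    unrecoverable = sorted(s for s in missing if s <= highest_released)
    recoverable = sorted(s for s in missing if s > highest_released)
    return recoverable, unrecoverable
```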
MAX_HOLD_TIME specifies the maximum length of time a message may be MaxHoldTime specifies the maximum length of time a message may be
held for retransmission. This parameter is set at the sender who held for retransmission. This parameter is set at the sender which
uses it to set the HighestReleased field in data message headers. uses it to set the HighestReleased field in data message headers.
This is particularly useful for real-time, semi-reliable streams This is particularly useful for real-time, semi-reliable streams
such as live video, where retransmissions are only useful for up to such as live video, where retransmissions are only useful for up to
a few seconds. When combined with Unordered delivery semantics, a few seconds. When combined with Unordered delivery semantics,
and application-level jitter control at the receivers, this and application-level jitter control at the receivers, this
provides Time Bounded Reliability. Obviously, MAX_HOLD_TIME must provides Time Bounded Reliability. Obviously, MaxHoldTime must
always be larger than MIN_HOLD_TIME. always be larger than MinHoldTime.
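The interaction of the two hold-time parameters with child acknowledgements might be expressed as a single release predicate. This is a sketch only: the function name is hypothetical, times are in seconds, and treating an expired MaxHoldTime as overriding missing acknowledgements is an interpretation of the text above rather than a stated rule.

```python
def may_release(now, received_at, acked_by_all_children,
                min_hold_time, max_hold_time=None):
    """Decide whether a parent may drop a message from its buffer.

    max_hold_time=None models a fully reliable stream with no MaxHoldTime.
    """
    age = now - received_at
    if max_hold_time is not None and age >= max_hold_time:
        return True   # too old to be worth retransmitting
    # Otherwise every child must have acknowledged the message and the
    # minimum hold time must have elapsed.
    return acked_by_all_children and age >= min_hold_time
```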
7.2.5 Confirmed Delivery 7.2.5 Confirmed Delivery
Flow control and the reliability window are concerned with goodput, Flow control and the reliability window are concerned with goodput,
of delivering data with a high probability that it is delivered at of delivering data with a high probability that it is delivered at
all receivers. However, neither mechanism provides explicit all receivers. However, neither mechanism provides explicit
confirmation to the sender as to the list of recipients for each confirmation to the sender as to the list of recipients for each
message. Confirmed delivery allows applications to determine the message. Confirmed delivery allows applications to determine the
set of applications that have received a set of data messages. set of applications that have received a set of data messages.
skipping to change at line 1025 skipping to change at line 1188
to it by the transport. The API could specify that whenever an to it by the transport. The API could specify that whenever an
application has freed up a buffer containing one or more data application has freed up a buffer containing one or more data
messages, then these messages are considered acknowledged by the messages, then these messages are considered acknowledged by the
application. Alternatively, the application could be required to application. Alternatively, the application could be required to
explicitly acknowledge each message. explicitly acknowledge each message.
With a given transport-application API for signaling With a given transport-application API for signaling
acknowledgement, the transport then keeps track of all contiguous acknowledgement, the transport then keeps track of all contiguous
acknowledgements from that application, and reports these up in the acknowledgements from that application, and reports these up in the
ApplicationConfirms field. If one or more messages can not be ApplicationConfirms field. If one or more messages can not be
acknowledged, the receiver should pass an error report attached to acknowledged, the receiver should pass an error code describing the
the ApplicationConfirms field, detailing the type of failure that type of failure that occurred, and the sequence number of the first
occurred, and the sequence number of the first message that has not message that has not yet been delivered.
yet been delivered.
If MAX_HOLD_TIME is not in use for a data stream, so that delivery If MaxHoldTime is not in use for a data stream, so that delivery is
is fully reliable, then any message that can not be delivered will fully reliable, then any message that can not be delivered will be
be considered a fatal error for that receiver. If MAX_HOLD_TIME considered a fatal error for that receiver. If MaxHoldTime has a
has a non-zero value, then any messages that could not be non-zero value, then any messages that could not be delivered, but
delivered, but are less than HighestReleased as advertised by the are less than HighestReleased as advertised by the sender, are not
sender, are not reported as errors. reported as errors.
In addition to the AppSynch field, a sender may also set the In addition to the AppSynch field, a sender may also set the
ImmediateACK field to 1. When a node gets a data message that has ImmediateACK field to 1. When a node gets a data message that has
this flag set, it will immediately send a TRACK after processing this flag set, it will immediately send a TRACK after processing
that message. that message.
7.3 Feedback Aggregation 7.3 Feedback Aggregation
This section describes how repair heads perform aggregation on This section describes how repair heads perform aggregation on
feedback information sent up in the fields of the TRACK message, feedback information sent up in the fields of the TRACK message,
skipping to change at line 1063 skipping to change at line 1224
sender that all the receivers (in the entire tree) have received sender that all the receivers (in the entire tree) have received
data packets up to a certain sequence number. The field that data packets up to a certain sequence number. The field that
carries this information is AppSynch. carries this information is AppSynch.
2) Flow control. The aggregated information is carried in the 2) Flow control. The aggregated information is carried in the
field HighestAllowed. It tells the sender the highest sequence field HighestAllowed. It tells the sender the highest sequence
number that all the receivers (in the entire tree) are prepared to number that all the receivers (in the entire tree) are prepared to
receive. receive.
3) Identifying the slowest receiver. The aggregated information is 3) Identifying the slowest receiver. The aggregated information is
carried in the field Slowest. The sender can used this value as carried in the field Slowest. The sender can use this value as
part of congestion control. part of congestion control.
4) Counting current membership in the group. This information is 4) Counting current membership in the group. This information is
carried in the field SubTreeCount. This lets the sender know the carried in the field SubTreeCount. This lets the sender know the
number of receivers currently connected to the repair tree. number of receivers currently connected to the repair tree.
5) Measuring the round-trip time from the sender to the "worst" 5) Measuring the round-trip time from the sender to the "worst"
receiver. receiver.
A repair head maintains state for each child. Each time a TRACK A repair head maintains state for each child. Each time a TRACK
(from a child) is received, the corresponding states for that child (from a child) is received, the corresponding states for that child
are updated based on the information in the TRACK message. When a are updated based on the information in the TRACK message. When a
repair head sends a TRACK message to its parent, the following repair head sends a TRACK message to its parent, the following
fields of its TRACK message are derived from the aggregation of the fields of its TRACK message are derived from the aggregation of the
corresponding states for its children. The following rules corresponding states for its children. The following rules
describe how the aggregation is performed: describe how the aggregation is performed:
skipping to change at line 1095 skipping to change at line 1256
all children all children
- Slowest: this is a measure of how slow the slowest member in the - Slowest: this is a measure of how slow the slowest member in the
whole subtree is; take either the minimum (or maximum) of the whole subtree is; take either the minimum (or maximum) of the
Slowest value from all children (depending what the Slowest measure Slowest value from all children (depending what the Slowest measure
is). is).
- SubTreeCount: take the sum of the SubTreeCount from all - SubTreeCount: take the sum of the SubTreeCount from all
children children
- SenderDallyTime: take the minimum value of the SenderDallyTime - SenderDallyTime: take the minimum value, for all of the children,
from all children and add in the local dally time. Note, the of
SendTimeStamp field is left alone. The sender will derive the
roundtrip time to the worst receiver by doing its local aggregation child's reported SenderDallyTime + child's local dally time
for SenderDallyTime and then compute:
Note, the SendTimeStamp field is left alone. The sender will
derive the roundtrip time to the worst receiver by doing its local
aggregation for SenderDallyTime and then compute:
RTT = currentTime - SendTimeStamp - SenderDallyTime. RTT = currentTime - SendTimeStamp - SenderDallyTime.
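Three of the aggregation rules above, and the sender's final RTT computation, can be sketched as follows. The record layout and names are hypothetical. The SenderDallyTime rule follows the revised wording (minimum over children of the child's reported value plus that child's local dally time). Whether Slowest is combined with min or max depends on the measure, so it is a parameter here.

```python
from dataclasses import dataclass


@dataclass
class ChildReport:
    """Latest per-child state at a repair head (illustrative fields)."""
    sub_tree_count: int
    slowest: float
    sender_dally_time: float
    local_dally_time: float


def aggregate(children, slowest_is_min=True):
    """Derive the aggregated TRACK fields from all children's state."""
    pick = min if slowest_is_min else max
    return {
        "SubTreeCount": sum(c.sub_tree_count for c in children),
        "Slowest": pick(c.slowest for c in children),
        "SenderDallyTime": min(
            c.sender_dally_time + c.local_dally_time for c in children
        ),
    }


def sender_rtt(current_time, send_time_stamp, sender_dally_time):
    """RTT = currentTime - SendTimeStamp - SenderDallyTime."""
    return current_time - send_time_stamp - sender_dally_time
```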
7.4 Measuring Round Trip Times 7.4 Measuring Round Trip Times
This TRACK BB provides two algorithms for distributed RTT This TRACK BB provides two algorithms for distributed RTT
calculations-LocalRTT measurements and SenderRTT measurements. calculations - LocalRTT measurements and SenderRTT measurements.
LocalRTT measurements are only between a parent and its children. LocalRTT measurements are only between a parent and its children.
SenderRTT measurements are an end to end RTT measurement, measuring SenderRTT measurements are end-to-end RTT measurements, measuring
the RTT to the worst receiver as selected by the congestion control the RTT to the worst receiver as selected by the congestion control
algorithms. algorithms.
The SenderRTT is useful for congestion control. It can be used to The SenderRTT is useful for congestion control. It can be used to
set the data rate based on the TCP response function, which is set the data rate based on the TCP response function, which is
being proposed for the congestion control building block. being proposed for the congestion control building block.
The LocalRTT can be used to (a) quickly detect faulty children (as The LocalRTT can be used to (a) quickly detect faulty children (as
described in 7.1) or (b) avoid sending unnecessary retransmissions described in 7.1) or (b) avoid sending unnecessary retransmissions
(as described in 7.2 in the local repair algorithm). (as described in 7.2 in the local repair algorithm).
skipping to change at line 1158 skipping to change at line 1322
non-zero values to report, each node sends up both a non-zero values to report, each node sends up both a
{SenderTimestamp, SenderDallyTime} and a {ParentTimestamp, {SenderTimestamp, SenderDallyTime} and a {ParentTimestamp,
LocalDallyTime} set of fields in each TRACK message generated. LocalDallyTime} set of fields in each TRACK message generated.
These measurements need to be averaged by the TRACK PI. These measurements need to be averaged by the TRACK PI.
8. Security 8. Security
This BB does not specifically deal with security. It is the This BB does not specifically deal with security. It is the
responsibility of the TRACK PI or the Security BB. This issue is responsibility of the TRACK PI or the Security BB. This issue is
covered in the Security Requirements For TRACK draft [HV00]. covered in the Security Requirements For TRACK draft [HW00].
9. References 9. References
[HV00] T. Hardjono, B. Whetten, "Security Requirements For TRACK," [HW00] T. Hardjono, B. Whetten, "Security Requirements For TRACK,"
Internet Draft, Internet Engineering Task Force, June, 2000. Internet Draft, Internet Engineering Task Force, June, 2000.
[KLCWTCTK01] M. Kadansky, D. Chiu, B. Whetten, B. Levine, G.
Taskale, B. Cain, D. Thaler, S. Koh, "Reliable Multicast Transport
Building Block: Tree Auto-Configuration," Internet Draft, Internet
Engineering Task Force, March, 2001.
[KV00] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building [KV00] R. Kermode, L. Vicisano, "Author Guidelines for RMT Building
Blocks and Protocol Instantiation Documents," Internet Draft, Blocks and Protocol Instantiation Documents," Internet Draft,
Internet Engineering Task Force, June, 2000. Internet Engineering Task Force, June, 2000.
[SFCGLTLLBEJMRSV00] T. Speakman, D. Farinacci, J. Crowcroft, J. [SFCGLTLLBEJMRSV00] T. Speakman, D. Farinacci, J. Crowcroft, J.
Gemmell, S. Lin, A. Tweedly, D. Leshchiner, M. Luby, N. Bhaskar, R. Gemmell, S. Lin, A. Tweedly, D. Leshchiner, M. Luby, N. Bhaskar, R.
Edmonstone, K. M. Johnson, T. Montgomery, L. Rizzo, R. Edmonstone, K. M. Johnson, T. Montgomery, L. Rizzo, R.
Sumanasekera, and L. Vicisano, "PGM Reliable Transport Protocol Sumanasekera, and L. Vicisano, "PGM Reliable Transport Protocol
Specification," Internet Draft, Internet Engineering Task Force, Specification," Internet Draft, Internet Engineering Task Force,
April 2000. November 2000.
[WCP00] B. Whetten, D. Chiu, S. Paul, M. Kadansky, G. Taskale, [WCP00] B. Whetten, D. Chiu, S. Paul, M. Kadansky, G. Taskale,
"TRACK Architecture, A Scalable Real-Time Reliable Multicast "TRACK Architecture, A Scalable Real-Time Reliable Multicast
Protocol," Internet Draft, Internet Engineering Task Force, July Protocol," Internet Draft, Internet Engineering Task Force, July
2000. 2000.
[WVKHFL00] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S. [WVKHFL00] B. Whetten, L. Vicisano, R. Kermode, M. Handley, S.
Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks Floyd, and M. Luby, "Reliable Multicast Transport Building Blocks
for One-to-Many Bulk-Data Transfer," Internet Draft, Internet for One-to-Many Bulk-Data Transfer," RFC 3048, January 2001.
Engineering Task Force, October 2000.
10. Acknowledgments 10. Acknowledgements
We would like to thank the following people: Sanjoy Paul, Seok Joo We would like to thank the following people: Sanjoy Paul, Seok Joo
Koh, Supratik Bhattacharyya, Joe Wesley, and Joe Provino. Koh, Supratik Bhattacharyya, Joe Wesley, and Joe Provino.
11. Authors' Addresses 11. Authors' Addresses
Dah Ming Chiu Dah Ming Chiu
dahming.chiu@sun.com dahming.chiu@sun.com
Miriam Kadansky Miriam Kadansky
miriam.kadansky@sun.com miriam.kadansky@sun.com
Sun Microsystems Laboratories Sun Microsystems Laboratories
skipping to change at line 1210 skipping to change at line 1378
Gursel Taskale Gursel Taskale
gursel@talarian.com gursel@talarian.com
Brian Whetten Brian Whetten
whetten@talarian.com whetten@talarian.com
Talarian Talarian
333 Distel Circle 333 Distel Circle
Los Altos, CA 94022-1404 Los Altos, CA 94022-1404
Full Copyright Statement Full Copyright Statement
"Copyright (C) The Internet Society (2000). All Rights Reserved. "Copyright (C) The Internet Society (2001). All Rights Reserved.
This document and translations of it may be copied and furnished to This document and translations of it may be copied and furnished to
others, and derivative works that comment on or otherwise explain others, and derivative works that comment on or otherwise explain
it or assist in its implementation may be prepared, copied, it or assist in its implementation may be prepared, copied,
published and distributed, in whole or in part, without restriction published and distributed, in whole or in part, without restriction
of any kind, provided that the above copyright notice and this of any kind, provided that the above copyright notice and this
paragraph are included on all such copies and derivative works. paragraph are included on all such copies and derivative works.
However, this document itself may not be modified in any way, such However, this document itself may not be modified in any way, such
as by removing the copyright notice or references to the Internet as by removing the copyright notice or references to the Internet
Society or other Internet organizations, except as needed for the Society or other Internet organizations, except as needed for the
purpose of developing Internet standards in which case the purpose of developing Internet standards in which case the
 End of changes. 

This html diff was produced by rfcdiff 1.23, available from http://www.levkowetz.com/ietf/tools/rfcdiff/