
      INTERNET-DRAFT                                            Stephan Wenger
      draft-wenger-avt-rtcp-feedback-01.txt                          TU Berlin
                                                                     Joerg Ott
                                                       Universitaet Bremen TZI
                                                             24 November, 2000
                                                             Expires June 2001
                  RTCP-based Feedback: Concepts and Message Timing Rules
      Status of this Memo
      This document is an Internet-Draft and is in full conformance with all
      provisions of Section 10 of RFC 2026.  Internet-Drafts are working
      documents of the Internet Engineering Task Force (IETF), its areas, and
      its working groups.  Note that other groups may also distribute working
      documents as Internet-Drafts.
      Internet-Drafts are draft documents valid for a maximum of six months
      and may be updated, replaced, or obsoleted by other documents at any
      time.  It is inappropriate to use Internet-Drafts as reference material
      or to cite them other than as "work in progress."
      The list of current Internet-Drafts can be accessed at
      The list of Internet-Draft Shadow Directories can be accessed at
      Abstract
         Real-time media streams are not resilient against packet losses. RTP
         [1] provides all the necessary mechanisms to restore ordering and
         timing to properly reproduce a media stream at the recipient.  RTP
         also provides continuous feedback about the overall reception quality
         from all receivers -- thereby allowing the sender(s) in the mid-term
         (on the order of several seconds to minutes) to adapt their coding
         scheme and transmission behavior to the observed network QoS.
         However, except for a few payload-specific mechanisms [2], RTP makes
         no provision for timely feedback that would allow a sender to repair
         the media stream immediately: through retransmissions, retro-active
         FEC, or media-specific mechanisms such as reference picture
         selection.
         This document specifies a modification to the algorithm for
         scheduling RTCP packets in order to allow occasional timely feedback
         to events observed by a receiver (such as lost packets).  The message
         format for RTCP-based feedback is defined in a companion document.
      Wenger/Ott              Expires December 2000                 [Page 1]

      Internet Draft                                       24 November, 2000
      1. Introduction
         Real-time media streams are not resilient against packet losses. RTP
         [1] provides all the necessary mechanisms to restore ordering and
         timing to properly reproduce a media stream at the recipient.  RTP
         also provides continuous feedback about the overall reception quality
         from all receivers -- thereby allowing the sender(s) in the mid-term
         (on the order of several seconds to minutes) to adapt their coding
         scheme and transmission behavior to the observed network QoS.
         However, except for a few payload-specific mechanisms [2], RTP makes
         no provision for timely feedback that would allow a sender to repair
         the media stream immediately: through retransmissions, retro-active
         FEC, or media-specific mechanisms such as reference picture
         selection.
         Current mechanisms available with RTP to improve error resilience
         include audio redundancy coding [3], video redundancy coding [4],
         RTP-level FEC [5], and general considerations on more robust media
         stream transmission [6].  Particularly in small groups, however,
         virtually all types of real-time media streams could
         benefit from a mechanism that would enable a sender to perform media
         stream repair -- including but not limited to audio, video, DTMF, and
         text chat streams.
         For example, predictive video coding is not loss resilient.  Any loss
         of coded data leads to annoying artifacts not only in the reproduced
         picture in which the loss occurred, but also in subsequent pictures.
         Error resilience can be achieved by spending bits to convey redundant
         information using source coding based mechanisms or transport based
         mechanisms.  This can be done without the use of any feedback between
         the decoder(s) and the encoder.
         Alternatively, where applicable, decoders can inform the encoder
         through a feedback channel about a loss situation, and the encoder
         can react accordingly.  This approach provides better picture quality
         and is more efficient with respect to the bandwidth used by the
         encoder to achieve a given quality.  However, feedback mechanisms
         are applicable only to certain application scenarios, as determined
         by encoder characteristics, delay constraints, and/or the number of
         recipients.  Since feedback is only effective if it is transmitted
         with very low delay, the rules for sending receiver reports are
         enhanced to support Immediate Feedback messages (FB messages) and
         Early Receiver Reports (Early RRs), and algorithms are specified
         that allow for low delay in small multicast groups but prevent
         network flooding in larger ones.  Special consideration is given to
         point-to-point scenarios.
         In addition, this memo gives some consideration to specific
         application scenarios and their respective feedback requirements,
         currently focusing on predictive video coding.
         A companion document [7] discusses various types of general purpose
         feedback information (also allowing for extensions specific to
         certain media payloads) and defines an RTCP packet format to transmit
         FBs in an RTP environment.  It can be used in conjunction with all
         payload specifications for predictive video coding schemes currently
         available for RTP.
      2. Motivation
         2.1 Example: Predictive Video Coding
         2.1.1 Video Encoder-decoder synchronicity
         Most current video coding schemes for compressed video, such as
         ITU-T H.261 and H.263 and ISO/IEC MPEG-1, MPEG-2, and MPEG-4,
         employ a mechanism known as Inter Picture Prediction.  Each picture
         is divided into
         macroblocks of uniform size.   For each macroblock, one or more
         motion vectors may be identified and transmitted.  The residual
         signal after motion compensation is DCT-transformed, quantized,
         entropy coded, and transmitted as well.  The encoder reconstructs,
         based on this information, a so-called reference picture, which is
         used to perform the motion compensation and residual signal coding
         steps for the subsequent picture.  Since the reference picture is
         generated using only such information that is also available at the
         decoder, the reference picture is identical to the reconstructed
         picture at the decoder.  Having identical reference pictures at the
         encoder and decoder is referred to as encoder-decoder-synchronicity.
         Whenever data is damaged or lost on the way between the encoder and
         the decoder, the reconstructed picture at the decoder is no longer
         identical to the encoder's reference picture -- the encoder-decoder
         synchronicity is lost.
         Any loss of the encoder-decoder synchronicity results in annoying
         artifacts at the decoder.  Because the prediction of subsequent
         pictures in the decoder is based on a damaged reference picture, the
         annoying artifacts are present not only in the picture in which the
         loss occurred; they propagate to all subsequent pictures, until,
         through source coding based mechanisms, the encoder-decoder
         synchronicity is restored.  Therefore, the goal of systems employing
         predictive video coding in a lossy environment must be to keep the
         encoder-decoder synchronicity, or, if this is not possible, to regain
         that synchronicity as quickly as possible.
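         The propagation effect described above can be illustrated with a
         deliberately simplified sketch.  It assumes a hypothetical
         one-sample "picture" and plain residual coding (no motion
         compensation, DCT, or quantization); the functions encode() and
         decode() are illustrative names only.  A lost Inter picture is
         concealed by repeating the previous reconstruction, and the mismatch
         persists until the next Intra picture restores encoder-decoder
         synchronicity.

```python
# Toy 1-D model of inter-picture prediction (hypothetical, for illustration):
# each "picture" is one integer sample; with no quantization, the encoder's
# reference picture equals the source picture.


def encode(pictures, intra_at):
    """Code picture i as Intra if i is in intra_at, otherwise as an Inter residual."""
    coded = []
    reference = None
    for i, pic in enumerate(pictures):
        if reference is None or i in intra_at:
            coded.append(("intra", pic))
        else:
            coded.append(("inter", pic - reference))
        reference = pic  # encoder's reference for predicting the next picture
    return coded


def decode(coded, lost):
    """Decode; a lost picture is concealed by repeating the last reconstruction."""
    out = []
    reference = 0
    for i, (kind, value) in enumerate(coded):
        if i not in lost:
            reference = value if kind == "intra" else reference + value
        out.append(reference)
    return out


pictures = [10, 12, 15, 17, 18, 20]
decoded = decode(encode(pictures, intra_at={0, 4}), lost={2})
mismatch = [i for i, (a, b) in enumerate(zip(pictures, decoded)) if a != b]
print(decoded)   # [10, 12, 12, 14, 18, 20]
print(mismatch)  # [2, 3]: the error propagates until the Intra picture at index 4
```

         Without the Intra picture at index 4, the mismatch in this toy
         example would persist to the end of the sequence.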
         2.1.2 Non-feedback based mechanisms
         Avoiding the loss of the encoder-decoder synchronicity corresponds to
         avoiding the loss of coded picture data.  Such a task can be
         performed on the transport layer.  In RTP environments, the use of
         packet-based FEC is a good example of such a technique. (The use of
         TCP or reliable multicast as the transport for media streams would be
         an even better one but is inappropriate for low-delay (interactive)
         real-time systems.)  FEC schemes, interleaving, and other means for
         repairing real-time media streams may also add delay and
         significant bit rate overhead without being able to guarantee
         compensation of virtually all packet losses.
         Once the encoder-decoder synchronicity is lost, only source coding
         oriented mechanisms can help to regain it.  One common way is to send
         a non-predictively coded picture (known as an Intra picture).  Intra
         pictures have the disadvantage of being several times bigger than
         predictively coded pictures (Inter pictures).  Therefore, sending
         Intra pictures has negative implications for both bandwidth and
         (in bandwidth-limited environments) delay.  Another way is to use
         Intra macroblock refresh.  Here, certain parts of the picture (those
         affected by a packet loss) are coded non-predictively in order to
         resynchronize the encoder and decoder over time.  Intra macroblock
         refresh has better delay characteristics than full Intra pictures
         because the picture size can be kept constant, but is less efficient
         in terms of bit rate/distortion than full Intra pictures.  More
         sophisticated means such as Reference Picture Selection (RPS) are
         also available in modern video coding standards.
         Systems not employing feedback channels may use any combination of
         the mechanisms described above to add error resilience -- at the cost
         of added bit rate and, sometimes, added delay.  The number of
         additional bits spent for error resilience can be adapted using the
         long-term packet loss rate information in the RTCP receiver reports.
         But, even when using such adaptive means, it is still likely that
         systems spend many more bits than theoretically necessary to achieve
         error resilience in order to be on the safe side.  Moreover, as
         regular RTCP feedback targets longer time scales, reactivity to
         sudden losses is limited.  In all practical applications today this
         means that fewer bits are available for non-redundant picture data,
         and hence the overall picture quality suffers.
         2.1.3 Feedback based systems
         Feedback-based systems try to avoid spending too many bits for
         redundant information by informing the encoder about a loss situation
         at the decoder(s).  The encoder can then react accordingly and spend
         redundant bits only when needed, possibly only for the part of the
         picture that was affected by the loss -- thereby reducing the number
         of redundant bits and leaving more bits for useful information.  As a
         result, a higher reproduced picture quality can generally be expected
         when feedback channels are available.
         Similar to the observations of section 2.1.2, transport and source
         coding based mechanisms that react to loss situations reported by
         feedback can be distinguished.
         Transport based systems employing feedback react in a media-unaware
         manner, by retransmitting lost packets.  TCP is a good example of a
         protocol following such a scheme.  Transport-based feedback in real-time
         and/or multicast environments is a complex matter and the subject of
         considerable engineering and research in and outside of the IETF.
         This specification is not concerned with pure transport-based
         feedback.
         Source coding based mechanisms may react upon the arrival of a
         feedback message indicating a loss situation by adding bits that
         restore, or at least make an effort to restore, the encoder-decoder
         synchronicity.  This process has to be performed by a real-time
         encoder.  However, schemes have been reported that allow the use of
         feedback also for non-real-time encoders by storing multiple
         representations of the same data (e.g. Inter and Intra coded), and
         dynamically switching between those representations.
         Several types of feedback messages, called Feedback Messages or FB
         messages, can be defined for such a case.  An FB message can be as
         simple as a Boolean condition, indicating for example the loss of a
         full picture (and, therefore, the need for a full Intra picture
         transmission).  Other feedback messages may contain more complex
         information such as information about the damage of a spatial region
         of the picture.  A special form consists of a message the format and
         semantics of which are not known at the transport level, because they
         are defined in the video codec standards.
         2.2 Feedback Messages
         Most FB messages contain negative acknowledgement information,
         indicating an erroneous situation at the decoder.  In others, the
         nature of the acknowledgement (positive, negative, or both) is part
         of the feedback message itself.  When used in multicast
         environments, positive acknowledgements must not be used.
         This document assumes that feedback messages are transmitted using
         RTCP packets.  RTCP messages from the receivers to the sender cannot
         be sent at arbitrary times, in order to prevent traffic explosion
         in case of large multicast groups.  Instead, the bit rate for all
         RTCP messages of all receivers together has to obey a maximum
         fraction of the total RTP session bit rate, yielding a very limited
         bit rate budget for a single receiver in a large multicast group.
         This, in turn, leads to an increased average delay when the size of
         the receiving multicast group grows (see section 6 of
         draft-ietf-avt-rtp-new-08.txt [1] for details).
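         The growth of the reporting interval with group size can be
         estimated from the deterministic RTCP interval computation of
         RTP-new [1] (later published as RFC 3550).  The sketch below assumes
         RTP's recommended allocation of 5% of the session bandwidth to RTCP,
         of which receivers share 75% as long as senders make up at most a
         quarter of the members; the randomization and reconsideration steps
         of [1] are omitted, and the session parameters in the example are
         made-up values.

```python
# Deterministic RTCP report interval (before randomization and timer
# reconsideration), following the computation in RTP-new [1].
# rtcp_bw is the RTCP bandwidth in octets/s; avg_rtcp_size is in octets.


def deterministic_interval(members, senders, we_sent, rtcp_bw, avg_rtcp_size,
                           initial=False):
    t_min = 2.5 if initial else 5.0   # minimum interval, halved initially
    if senders <= 0.25 * members:
        # Senders get 25% of the RTCP bandwidth, receivers share 75%.
        if we_sent:
            c, n = avg_rtcp_size / (0.25 * rtcp_bw), senders
        else:
            c, n = avg_rtcp_size / (0.75 * rtcp_bw), members - senders
    else:
        # Many senders: all members share the full RTCP bandwidth.
        c, n = avg_rtcp_size / rtcp_bw, members
    return max(t_min, n * c)


session_bw = 1_000_000 / 8     # example: 1 Mbit/s session, in octets/s
rtcp_bw = 0.05 * session_bw    # RTP's recommended 5% share for RTCP
for receivers in (10, 1000, 10000):
    t = deterministic_interval(receivers + 1, 1, False, rtcp_bw, 100)
    print(f"{receivers:>6} receivers: {t:.1f} s")   # 5.0 s, 21.3 s, 213.3 s
```

         In small groups the fixed minimum interval dominates, which is one
         reason why the early feedback rules of this memo are needed even
         there; in large groups the interval grows linearly with the number
         of receivers.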
         This specification defines an algorithm that adheres to the bit rate
         limitations for the feedback channel in the long term, but allows
         short-term overdrafting for any receiver (but not all of them
         simultaneously).  Thus, the algorithm allows for better real-time
         performance than the one specified in draft-ietf-avt-rtp-new-08.txt.
         Traffic explosion in cases where many receivers detect picture
         damage simultaneously is prevented by dithering.
         As this specification assumes a real-time encoder that has full
         control over its transmission bit rate, there is no scaling problem
         on the forward channel.  Any reaction to negative feedback generates
         additional bits, which have to be conveyed; these are taken from the
         sender's total bit rate budget.  The encoder can take this into
         account by, for example, changing the encoding mode, packet size, and
         so forth.  The sender is also free to simply ignore feedback
         messages.  Adjusting the tradeoff between the reproduced media
         quality of all receivers of a multicast group and the amount of
         additional repair traffic is a media-dependent, very complex task and
         is not covered in this specification.
         Finally, frequent RTCP-based feedback messages may provide additional
         input to the senders' congestion control algorithms and thus
         improve their reactivity to network congestion.
         2.4 Applications and Relationships to other Standards
         This specification is based on RTCP, which implies its use in an RTP
         environment.  RTP itself is used in a variety of systems such as in
         SIP- or H.323-based multimedia conferencing/telephony.
         As for the video codecs, there is currently a small set of standards
         that are, for the purpose of this discussion, roughly comparable.
         Many mechanisms for regaining encoder-decoder synchronicity are
         applicable to all video codecs.  Others require certain tools (such
         as Reference Picture Selection, aka NEWPRED) that are available only
         in certain versions of the standards, and/or optional tools whose use
         must be negotiated prior to being used.
         A few RTP payload specifications such as RFC 2032 already define a
         feedback mechanism for some of the coding algorithms considered in
         this specification.  An application capable of performing both
         schemes MUST use the feedback mechanism defined in this
         specification, although, for backward compatibility reasons, it MUST
         also be capable of conforming to the feedback scheme defined in the
         respective RTP payload format, if this is required by that payload
         format.
         Also, audio, DTMF, and text streams could benefit from more immediate
         feedback even though the redundancy payload formats work well for
         these media.
         All kinds of non-interactive media streams (such as RTSP-controlled
         media streaming applications) could benefit significantly, as
         without interactivity there is more time available for media repair.
         2.5 Remarks on the size of the multicast group
         This specification attempts to prevent traffic explosion on the
         feedback channel in much the same way as RTP does, with the
         exception of allowing individual receivers to overdraft their bit
         rate budget from time to time.  This is necessary in order to allow
         for low delay, which is needed by the algorithms reacting to FBs.
         This scaling behavior, however, limits the usefulness of this
         mechanism to multicast groups below a certain size (where the size
         threshold depends on a number of parameters, including loss rate and
         frame rate).  No maximum multicast group size is specified here; this
         limit is soft and also depends on application requirements.
         Considerations on multicast group sizes will be presented in section
         3.5.
         2.6 Terminology
         The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
         "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
         document are to be interpreted as described in RFC 2119 [8].
      3. Low delay RTCP Feedback
         Two components constitute RTCP-based feedback as described in this
         memo:
         o Status reports are contained in SR/RR messages and are transmitted
           at regular intervals as part of compound RTCP packets (which also
           include SDES and possibly other messages); these status reports
           provide an overall indication of the recent reception quality of a
           media stream.  RTP-new [1] defines rules for the transmission of
           these status reports.
         o Feedback messages, as defined in a companion document [7], that
           indicate loss or reception of particular pieces of a media stream
           (or provide some other form of rather immediate feedback on the
           data received).  Rules for the transmission of feedback messages
           are newly introduced in this memo.
         As discussed in [7], RTCP Feedback (FB) messages are just another
         RTCP message type.  Thus multiple FB messages may be combined in a
         single RTCP packet.  FB messages may be sent in full compound RTCP
         packets along with SR/RR, SDES, and other RTCP messages.
         Alternatively, to reduce the message size, they may be transmitted
         in minimal compound RTCP FB packets, which contain only the RR/SR
         (and an encryption prefix, if necessary) besides the FB messages.
         RTCP packets that do not contain FB messages are referred to as
         non-FB RTCP packets.
         3.1 Algorithm Outline
         FB messages are part of the RTCP control streams and are thus subject
         to the same bandwidth constraints as other RTCP traffic.  This means
         in particular that it may not be possible to report a packet loss at
         a receiver immediately back to the sender.  However, the value of
         feedback given to a sender typically decreases over time -- in terms
         of the media quality as perceived by the user at the receiving end
         and/or the cost required to achieve media stream repair.
         RTP-new [1] specifies rules for when compound RTCP packets should be
         sent.  This specification modifies those rules in order to allow
         applications to report media loss or reception events in a timely
         manner, since most algorithms that use FB messages are highly
         sensitive to feedback timing.  See section 5 and following for a
         discussion of FB messages and the impact of delay on the performance
         of these FB types.
         The modified algorithm can be outlined as follows: Normally, when no
         FB messages have to be conveyed, compound RTCP packets are sent
         following the rules of RTP-new.  If a receiver detects the need for
         an FB message, the receiver first checks whether it has already seen
         a corresponding FB message from any other receiver (which it can do
         with all FB messages that are transmitted via multicast; for unicast
         sessions, there is no such delay).  If this is the case then the
         receiver refrains from sending the FB message, and continues to
         follow the regular RTCP sending schedule.  If the receiver has not
         yet seen a similar FB message from any other receiver, it checks
         whether it has already overdrafted its RTCP bit rate budget, i.e.,
         sent an RTCP packet without waiting for its regularly scheduled
         RTCP transmission time.  Only if this is not the case does it send
         the FB message, after waiting a short, random dithering interval (in
         the case of multicast).
         FB messages are sent as part of minimal compound RTCP packets.  Full
         compound RTCP packets are interspersed as per [1] at regular
         intervals of at least five seconds.
         3.2 Modes of Operation
         RTCP-based feedback may operate in one of three modes (see Figure 1):
         a) Immediate feedback mode: the group size is below a certain
            threshold (the FB threshold) which gives each receiving party
            sufficient bandwidth to transmit the feedback traffic for the
            intended purpose.  This means that each receiver has enough
            bandwidth to report each event it is expected to report by means
            of a virtually "immediate" Early RTCP packet.
            The group size threshold is a function of a number of parameters
            including (but not necessarily limited to) the type of feedback
            used (e.g. ACK vs. NACK), bandwidth, packet rate, packet loss
            probability, codec, and -- again depending on the type of FB used
            -- the (worst case or observed) frequency of events to report
            (e.g. frame received, packet lost).
            A special case of this is the ACK mode (where positive
            acknowledgements are used to confirm reception of data) which is
            restricted to point-to-point communications.
         b) In Early RTCP mode, the group size and other parameters no longer
            allow each receiver to react to each event that would be worth (or
            needed) to report.  But feedback can still be given sufficiently
            often so that it allows the sender to adapt the media stream and
            thereby increase the overall quality of the reproduced media.
         c) From some group size upwards, it is no longer useful to provide
            feedback from individual receivers at all -- because of the time
            scale in which the feedback could be provided and/or because in
            large groups the sender(s) have no chance to react to individual
            feedback anymore.
         As the feedback algorithm described in this memo scales, there is no
         need to agree on the precise values of the respective "thresholds"
         within the group.  Hence the borders between all these modes are
         fluid.
           :<- - - - NACK feedback - - - ->//
           :   Immediate   ||
           : Feedback mode ||Early RTCP mode   Regular RTCP mode
           :               ||
          -+---------------||---------------//------------------> group size
           2               ||
                     FB Threshold
               = f(rate,loss,codec,...)
         Figure 1: Modes of operation
         The respective thresholds depend on a number of technical parameters
         (of the codec, the transport, the feedback used, etc.) but also on
         the respective application scenarios.  Section 3.5 provides some
         useful hints (but no complete precise calculations) on estimating
         these thresholds.
         3.3 Definitions
         a) Let the media stream be transmitted at a (roughly) constant packet
            rate f (in packets per second).  This results in an average
            inter-packet interval of tau=1/f.
         b) Let T_rtt be the maximum round trip time as measured by RTCP
            (if available to the receiver).
            Note that this may be asymmetric.
         c) Let t_(rr-1) and t_rr be the times of the last and the next
            scheduled RTCP RR transmission, respectively, calculated prior to
            reconsideration.  Let T_rr = t_rr - t_(rr-1).  (In the RTP-new
            draft these are termed tp and tn, respectively.)
         d) Let t_e be the time for which a feedback packet is scheduled.
         e) Let T_dither_max be the maximum interval by which an RTCP
            feedback packet may be additionally delayed (to prevent feedback
            implosion).
         f) Let T_fd be the delay between the detection of an event at the
            receiver and the scheduled transmission of the corresponding
            feedback message.
         g) Let S be the number of active senders in the RTP session.
         h) Let N be the current estimate of the number of receivers in the
            RTP session.
         The feedback situation for a packet loss at a receiver is depicted in
         figure 2 below.  At time t0, a packet loss is detected at the
         receiver.  The receiver decides -- based upon current T_rtt, group
         size, and other (application-specific) parameters -- that a feedback
         message shall be sent back to the sender.
         To avoid an implosion of immediate feedback packets, the receiver
         delays transmission of the compound feedback packet by a random
         amount T_fd (with the random number evenly distributed in the
         interval [0, T_dither_max]).  Transmission of the compound RTCP packet
         is then scheduled for t_e = t0 + T_fd.
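         A minimal sketch of this scheduling step follows, together with a
         hypothetical model of why dithering helps.  The model assumes that
         all receivers detect the same loss at t0, that every receiver sees
         every other receiver's FB message after the same fixed delay, and
         that a receiver suppresses its own FB once it has seen an
         equivalent one (as outlined in section 3.1).  The delay model and
         the function names are assumptions for illustration only.

```python
import random


def schedule_feedback(t0, t_dither_max, rng):
    """t_e = t0 + T_fd, with T_fd drawn uniformly from [0, T_dither_max]."""
    return t0 + rng.uniform(0.0, t_dither_max)


def count_feedback_senders(n_receivers, t0, t_dither_max, delay, rng):
    """Hypothetical model: every receiver detects the same loss at t0 and
    drops its own FB if the earliest FB reaches it (after `delay` seconds)
    before its own t_e.  Returns the number of FB messages actually sent."""
    times = sorted(schedule_feedback(t0, t_dither_max, rng)
                   for _ in range(n_receivers))
    first = times[0]
    # The earliest receiver always sends; others send only if their t_e
    # comes before the earliest FB can have reached them.
    return 1 + sum(1 for t in times[1:] if t < first + delay)


rng = random.Random(7)
print(count_feedback_senders(50, 0.0, 0.0, 0.02, rng))  # 50: no dithering
print(count_feedback_senders(50, 0.0, 0.5, 0.02, rng))  # usually only a few
```

         Without dithering every receiver transmits (an implosion); with a
         dithering window that is large compared to the propagation delay,
         most receivers hear an earlier FB message in time to suppress their
         own.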
         The T_dither_max parameter is chosen based upon the group size, the
         RTCP bandwidth constraints, and, if available, the round-trip time.
         In addition, the receiver may take into account a number of other
         parameters (such as the type of feedback to be provided) to possibly
         extend the upper bound for the feedback delay while ensuring that the
         feedback information still makes sense when it reaches the sender.
         If a compound RTCP feedback packet is scheduled, the time slot for
         the next scheduled compound RTCP packet is updated accordingly,
         yielding a new t_rr.
                   event to
                      |  RTCP feedback
                      vXXXXXXXXXXXXXXXXXXXX            ) )
         |---+--------+-------------+-----+------------| |--------+--------->
             |        |             |     |            ( (        |
             |       t0            te                             |
          t_(rr-1)                                              t_rr
                       \_______  ________/
         Figure 2: Event report and parameters for Early RTCP scheduling
         3.4 Early RTCP Algorithm
         Assume an active sender S0 (out of S senders) and a number N of
         receivers with R being one of these receivers.
         Assume further that R has verified that using feedback mechanisms is
         reasonable in the current configuration (which is highly application
         specific and hence not specified in this document at the moment; a
         future revision may contain more detailed guidelines to this end).
         Then, the following rules apply to transmitting a Feedback Message
         as a minimal compound RTCP packet:
         Initially, R sets allow_early=TRUE.
         At a point in time t0, R has transmitted the last RTCP RR packet at
         t_(rr-1) and has scheduled the next transmission (prior to
         reconsideration) for t_rr.
         Now R detects the need to transmit a feedback message (e.g. because a
         media "unit" needs to be ACKed or NACKed) at time t0.
         R first checks whether there is still a feedback packet waiting for
         transmission.  If so, the new feedback message is appended to the
         packet and the increased RTCP packet size is updated in the RTCP
         bandwidth calculation (which may lead to an adjustment of t_rr); the
         schedule for the waiting RTCP feedback packet remains unchanged.
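         The bookkeeping for a waiting feedback packet might look as follows.
         This is a sketch under assumptions: the PendingFeedbackPacket class
         and the add_feedback() function are hypothetical, FB messages are
         treated as opaque byte strings (their format is defined in [7]), and
         the schedule callback stands in for the T_dither_max and t_e
         computation described in the following rules.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class PendingFeedbackPacket:
    """Hypothetical (minimal) compound RTCP FB packet awaiting transmission."""
    t_e: float                                   # scheduled transmission time
    messages: List[bytes] = field(default_factory=list)

    def size(self) -> int:
        # FB payload size only; RR/SR and other headers are ignored here.
        return sum(len(m) for m in self.messages)


def add_feedback(pending: Optional[PendingFeedbackPacket], message: bytes,
                 schedule: Callable[[], float]) -> PendingFeedbackPacket:
    """Append to a waiting FB packet, or create a new one scheduled by `schedule`."""
    if pending is not None:
        # The waiting packet keeps its t_e; the caller is assumed to feed the
        # larger pending.size() into its RTCP bandwidth calculation, which may
        # move t_rr.
        pending.messages.append(message)
        return pending
    return PendingFeedbackPacket(t_e=schedule(), messages=[message])


packet = add_feedback(None, b"\x01", lambda: 0.42)
packet = add_feedback(packet, b"\x02\x03", lambda: 9.9)
print(packet.t_e, packet.size())  # 0.42 3
```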
         If no feedback message is already awaiting transmission, a new
         (minimal) compound RTCP feedback packet is created and the interval
         T_dither_max is chosen as follows:
         i)   If the session is a unicast session (group size = 2) then
              T_dither_max := 0.
         ii)  If the receiver has an RTT estimate to the originator of the
              media unit to provide feedback about, then T_dither_max is set
              to T_rtt/2, but to no less than 10ms.
         iii) If the receiver does not have an RTT estimate to the originator,
              T_dither_max is set to 100ms or T_rr/2, whichever is lower.
         Note: These values are subject to discussion.
         (Note that feedback-specific considerations may make it worthwhile
         to increase T_dither_max beyond this value.)
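         Rules i) to iii) can be summarized in a short Python sketch.  The
         function name t_dither_max() and its parameters are illustrative
         assumptions, not part of this specification; the 10ms and 100ms
         constants are the values proposed above and remain subject to
         discussion.

```python
from typing import Optional

# Constants proposed in rules ii) and iii); the draft marks these values
# as "subject to discussion".
MIN_DITHER_S = 0.010         # lower bound when an RTT estimate exists
MAX_DITHER_NO_RTT_S = 0.100  # cap when no RTT estimate exists


def t_dither_max(group_size: int, t_rr_interval: float,
                 rtt: Optional[float] = None) -> float:
    """Return T_dither_max in seconds, following rules i) to iii).

    group_size    -- number of session participants
    t_rr_interval -- the receiver's regular RTCP interval T_rr (seconds)
    rtt           -- RTT estimate to the media originator, if any (seconds)
    """
    if group_size == 2:                      # i) unicast session
        return 0.0
    if rtt is not None:                      # ii) RTT estimate available
        return max(rtt / 2.0, MIN_DITHER_S)
    # iii) no RTT estimate: 100ms or T_rr/2, whichever is lower
    return min(MAX_DITHER_NO_RTT_S, t_rr_interval / 2.0)
```

         For example, t_dither_max(8, 5.0, rtt=0.06) yields 30ms, while the
         same call without an RTT estimate yields the 100ms cap.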
         Then, R checks whether its next regularly scheduled RTCP packet is
         within the time bounds for the RTCP FB (t0 + T_dither_max > t_rr).
         If so, no Early RTCP is scheduled; instead, the FB message is
         appended to the regular RTCP packet.  Otherwise, R should check
         whether it is allowed to transmit an Early RTCP packet
         (allow_early==TRUE).
            If so, R schedules an Early RTCP packet for
            t_e = t0 + RND * T_dither_max, with the RND function uniformly
            distributed between 0 and 1.
            If, before t_e is reached, R receives an RTCP feedback packet
            indicating the same feedback information R wanted to transmit,
            or a superset of it, R discards its FB information and the
            transmission schedule for the next RR packet is reset to t_rr as
            calculated before.
            (Note: if the FB is piggybacked onto a regularly scheduled RTCP RR
             message, this should not affect transmission of the RR; but
             should the FB then be removed from the compound RR/FB?)
            Otherwise, when t_e is reached, R creates an RR, appends the FB
            information, and transmits the RTCP packet.  R then sets
            allow_early=FALSE and recalculates t_rr = t_e + 2*T_rr.  As soon
            as R sends its next regularly scheduled RTCP RR (at the new
            t_rr), it sets allow_early=TRUE again.
         If allow_early==FALSE, R checks the time of the next scheduled RR:
         if t_rr - t0 < T_dither_max, R creates an FB message for
         transmission along with the RTCP packet at t_rr (see above).
         Otherwise, R does not send an RTCP feedback message.
         At regular RTCP intervals as specified by [1] (i.e., at most once
         every five seconds), a full compound RTCP packet is sent (which may
         also contain a feedback message if one is scheduled).
         The E bit in the message header [7] is used upon reception to detect
         whether this RTCP feedback message was sent as Early RTCP or not.
         Hence, a feedback message that is sent as an Early RTCP packet MUST
         set the E bit in the message header to "1".  Feedback messages
         piggybacked on regularly scheduled RTCP packets MUST set the E bit
         to "0".
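         The scheduling rules of this section can be traced with the
         following Python sketch of receiver R's state.  It is a hypothetical
         model, not a reference implementation: the class and method names
         are illustrative, T_dither_max is passed in precomputed, RTCP
         interval randomization and reconsideration are ignored, the t_rr
         adjustment caused by a growing feedback packet is not modeled, and
         a suppressing packet from another member is assumed to cover all
         pending feedback.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Receiver:
    """Hypothetical model of receiver R in the Early RTCP algorithm."""
    t_rr: float                         # next regularly scheduled RTCP RR
    T_rr: float                         # regular RTCP interval (simplified)
    allow_early: bool = True
    t_e: Optional[float] = None         # time of a pending Early RTCP packet
    early_fb: List[str] = field(default_factory=list)
    piggyback_fb: List[str] = field(default_factory=list)

    def feedback_needed(self, t0: float, fb: str, dither_max: float,
                        rnd: Callable[[], float] = random.random) -> str:
        """Handle a feedback event detected at time t0."""
        if self.t_e is not None:         # FB packet already waiting: append;
            self.early_fb.append(fb)     # its schedule remains unchanged
            return "appended"
        if t0 + dither_max > self.t_rr:  # regular RR is close enough:
            self.piggyback_fb.append(fb) # piggyback instead of Early RTCP
            return "piggyback"
        if self.allow_early:
            # t_e = t0 + RND * T_dither_max, RND uniform in [0, 1)
            self.t_e = t0 + rnd() * dither_max
            self.early_fb.append(fb)
            return "early"
        return "not sent"                # allow_early is FALSE

    def foreign_feedback(self, covers_ours: bool) -> None:
        """Another member's FB (same or superset) arrived before t_e:
        drop the pending FB; t_rr stays as calculated before."""
        if covers_ours and self.t_e is not None:
            self.t_e = None
            self.early_fb.clear()

    def send_early(self) -> Tuple[int, List[str]]:
        """At t_e: send RR plus FB with E=1, then block Early RTCP."""
        if self.t_e is None:
            raise RuntimeError("no Early RTCP packet scheduled")
        fb, self.early_fb = self.early_fb, []
        self.allow_early = False
        self.t_rr = self.t_e + 2 * self.T_rr
        self.t_e = None
        return 1, fb                     # (E bit, FB messages)

    def send_regular(self) -> Tuple[int, List[str]]:
        """At t_rr: send the regular RR with piggybacked FB, E=0."""
        fb, self.piggyback_fb = self.piggyback_fb, []
        self.allow_early = True
        self.t_rr += self.T_rr           # no RTCP randomization here
        return 0, fb
```

         For example, with t_rr = 5s, T_rr = 5s and T_dither_max = 100ms, a
         loss detected at t0 = 1s is scheduled as Early RTCP somewhere in
         [1.0s, 1.1s).  After that packet is sent, t_rr moves to t_e + 10s,
         and feedback arising before that regular report is sent only if it
         can be piggybacked on it.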
         3.5 Considerations on the Group Size
         This section intends to give some brief guidelines to the group sizes
         at which the various feedback modes may be used.
         3.5.1 ACK mode
         The group size MUST be exactly two participants, i.e., point-to-point
         communication.  Unicast addresses SHOULD be used in the session.
         For unidirectional as well as bi-directional communication between
         two parties, 2.5% of the RTP session bandwidth is available for
         feedback (each party receives half of the 5% RTCP bandwidth, since
         senders make up more than 25% of the participants).  Assuming a
         ratio of 1:10 for minimal to full compound RTCP packets, a receiver
         can report 2.5 events per second back to the sender at 64kbit/s,
         10 events at 256kbit/s, and so forth.
         From 768kbit/s upwards, a receiver would be able to acknowledge each
         individual frame (not packet!) in a 30 fps video stream.
         ACK strategies have to be defined accordingly to work with these
         bandwidth limitations.
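         The figures above scale linearly with the session bandwidth.  The
         following Python sketch makes this explicit under an assumption not
         stated in the text: it back-derives an average RTCP cost per
         reported event (80 bytes) from the 64kbit/s data point and applies
         it at all bit rates.  All names are illustrative.

```python
# Feedback share per party in a two-party session (from the text above):
# half of the 5% RTCP bandwidth.
FEEDBACK_SHARE = 0.025

# Calibration point from the text: 2.5 events/s at 64 kbit/s.  This implies
# an average RTCP cost of 80 bytes per reported event (assumption: the same
# 1:10 mix of minimal and full compound packets at all bit rates).
BYTES_PER_EVENT = FEEDBACK_SHARE * 64_000 / 8 / 2.5


def ack_events_per_second(session_bw_bps: float) -> float:
    """Feedback events per second one party can report (linear scaling)."""
    return FEEDBACK_SHARE * session_bw_bps / 8 / BYTES_PER_EVENT


def min_bw_for_events(events_per_s: float) -> float:
    """Session bandwidth (bit/s) needed to report events_per_s events/s."""
    return events_per_s * BYTES_PER_EVENT * 8 / FEEDBACK_SHARE
```

         ack_events_per_second(256_000) returns 10.0, and
         min_bw_for_events(30) returns 768000.0, i.e., the 768kbit/s needed
         to acknowledge every frame of a 30 fps video stream.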
         3.5.2 NACK mode
         Negative acknowledgements have to be used for all groups larger than
         two.
         Whether or not the use of Early RTCP packets should be considered
         depends upon a number of parameters including session bandwidth,
         codec, specific type of feedback, and number of senders and
         receivers, among many others.
         The crucial parameters -- to which all of the above can be reduced --
         are the allowed minimal interval between two RTCP reports and the
         number of events that presumably need reporting per time interval.
         The minimum interval is derived from the available RTCP bandwidth and
         the expected average size of an RTCP packet.  The number of events to
         report, e.g., per second, may be derived from the packet loss rate
         and the sender's packet transmission rate.  From these two values,
         the allowable group size can be calculated.
         Example: If a 256kbit/s video with 30 fps is transmitted through a
         network with an MTU size of some 1500 bytes, then, in most cases,
         each frame (about 1067 bytes on average) would fit in its own
         packet, leading to a packet rate of 30 packets per second.  If 5%
         packet loss occurs in the network (uniformly distributed, no
         inter-dependence between receivers), then each receiver will have to
         report 3 lost packets every two seconds.
         Assuming a single sender and more than three receivers, the
         receivers share 75% of the 5% RTCP bandwidth, i.e., 3.75% of the
         session bandwidth, or 9.6kbit/s.  Assuming further 100 bytes for the
         average compound RTCP packet allows 12 RTCP packets to be sent per
         second, or 24 in two seconds.  If every receiver needs to report
         three packets, this yields a maximum group size of 8 receivers if
         all loss events are to be reported.  The rules for Early RTCP
         packets should provide sufficient flexibility for most of this
         reporting to occur in a timely fashion.
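         The arithmetic of this example can be reproduced with the Python
         sketch below.  The function and parameter names are illustrative;
         exact fractions are used so that the final floor division is not
         affected by rounding.  As in the example, it assumes one frame per
         packet, independent losses, and that the receivers get 75% of the
         RTCP bandwidth.

```python
from fractions import Fraction


def max_nack_group_size(session_bw_bps: int, pkt_rate: int,
                        loss_rate: Fraction, rtcp_pkt_bytes: int,
                        window_s: int = 2,
                        rtcp_share: Fraction = Fraction(5, 100),
                        receiver_share: Fraction = Fraction(3, 4)) -> int:
    """Largest number of receivers that can report every loss in window_s."""
    receiver_bw = session_bw_bps * rtcp_share * receiver_share  # bit/s
    rtcp_pkts = receiver_bw / 8 / rtcp_pkt_bytes * window_s     # per window
    losses = pkt_rate * loss_rate * window_s                    # per receiver
    return int(rtcp_pkts // losses)
```

         max_nack_group_size(256_000, 30, Fraction(5, 100), 100) returns 8:
         24 RTCP packets per two seconds shared among receivers that each
         need to report 3 losses.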
         3.6 Summary of decision steps
         3.6.1 General Hints
         Before even considering whether or not to send RTCP FB information an
         application has to determine whether this mechanism is applicable:
         1) An application has to decide whether -- for the current ratio of
            packet rate with the associated (application-specific) maximum
            feedback delay and the currently observed round-trip time (if
            available) -- feedback mechanisms can be applied at all.
            This decision may be initially influenced by the session
            description as
         2) The application has to decide whether (and which) feedback
            mechanisms can be applied, given the observed error rate,
            assigned bandwidth, frame rate, and group size.
         3) If these tests pass, the application has to follow the rules for
            transmitting Early RTCP packets or regularly scheduled RTCP
            packets with piggybacked FBs.
         3.6.2 Session Description Attributes
         A number of additional SDP parameters may be used to describe a
         session.  These are defined as session-level and/or media-level
         attributes.
       Group size
         Session level or media level attribute.  Indicates a group size
         estimate if known to the sender.  A lower bound, an upper bound, or
         both may be specified.  Alternatively, a single number can be given.
         ACK mode of operation is only allowed if the group-size attribute is
         present and a fixed group size of 2 is indicated AND the transport
         addresses of all media streams are unicast addresses.
       RTCP Feedback
         a=rtcp-fb: {"ack"|"nack"|extension} params
         This attribute is used to indicate the feedback (to be) supported by
         the sender. "ack" MUST only be used if "a=group-size:2 fixed" is
         defined as well.
         It is up to the recipients whether or not they make use of the
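         A receiver or session validator might check the constraints on
         "ack" roughly as in the following Python sketch.  It is an
         illustration only: the function name is hypothetical, the
         group-size attribute is assumed to appear exactly as
         "a=group-size:2 fixed" (its full grammar is not yet defined here),
         and the media transport addresses are passed in rather than parsed
         from the session description.

```python
import ipaddress
from typing import List, Tuple


def parse_rtcp_fb(sdp_lines: List[str],
                  media_addresses: List[str]) -> List[Tuple[str, str]]:
    """Return (feedback type, params) for each a=rtcp-fb line.

    Raises ValueError if "ack" is offered without "a=group-size:2 fixed"
    or while any media transport address is multicast.
    """
    fixed_pair = any(ln.strip() == "a=group-size:2 fixed" for ln in sdp_lines)
    all_unicast = all(not ipaddress.ip_address(a).is_multicast
                      for a in media_addresses)
    result = []
    for line in sdp_lines:
        if not line.startswith("a=rtcp-fb:"):
            continue
        value = line[len("a=rtcp-fb:"):].strip()
        fb_type, _, params = value.partition(" ")
        if not fb_type:
            continue                         # empty attribute value
        if fb_type == "ack" and not (fixed_pair and all_unicast):
            raise ValueError("ack requires a fixed group size of 2 "
                             "and unicast transport addresses")
        result.append((fb_type, params.strip()))
    return result
```

         With "a=group-size:2 fixed" and the unicast address 192.0.2.10, an
         offer of "a=rtcp-fb: ack" is accepted; the same offer on the
         multicast address 224.2.1.1 is rejected, while "nack" is accepted
         in both cases.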
      4. Format of RTCP Feedback messages
         The general format of the FB messages is defined in [7].
      5. Informative: Considerations On Video
         This section covers feedback messages for Picture Loss
         Indication (PLI), Slice Loss Indication (SLI), and Reference Picture
         Selection Indication (RPSI).  PLI indicates the loss of a full
         picture and roughly corresponds to the Fast Intra Request known from
         H.320 systems and from RFC 2032 (H.261 packetization).  Algorithms
         using SLI can be found under the acronym Automatic Repeat Request
         (ARQ) in the signal processing literature.  Reference Picture
         Selection, aka NEWPRED, is available in certain profiles of MPEG-4
         (version 2 and later) and as an optional mode in H.263 (version 2 and
         later).  The packet format specified in this document is open to
         extensions so that future feedback mechanisms can easily be added.
         All these messages use the payload specific feedback format as
         defined in [7], using PT=PSFB and the FMT field to further
         distinguish between the three subtypes.  These messages are defined
         for payload types indicating H.263 and MPEG-4.
         Note that Bit 00 of the first (counting from 1) 32-bit word in
         the messages described below is placed in Bit 08 of the fourth
         (counting from 1) 32-bit word of the payload type specific feedback
         message.
         5.1 Message Type 1: Picture Loss Indication (PLI)
         5.1.1 Semantics
         With the Picture Loss Indication message, a decoder informs the
         encoder about the loss of one or more full pictures.
         5.1.2 Format
         PLI does not require parameters.  Therefore, the length field MUST be
         0, and there MUST NOT be Feedback Control Information.
         5.1.3 Timing Rules
         The timing follows the rules outlined in section 3.  In systems that
         employ both PLI and other FB types, it may be advisable to follow the
         regular RTCP RR timing rules for PLI, since PLI is not as
         delay-critical as other FB types.
         5.1.4 Remarks
         PLI messages typically trigger the sending of full Intra pictures.
         Intra pictures are several times larger than predicted (Inter)
         pictures.  Their size is independent of the time at which they are
         generated.  In most environments, especially when employing
         bandwidth-limited links, the use of an Intra picture implies an
         allowed delay that is a significant multiple of the typical frame
         duration.  An example: If the sending frame rate is 10 fps, and an
         Intra picture is assumed to be 10 times as big as an Inter picture
         (not an unrealistic assumption, see [] for details), then the Intra
         picture takes as long to transmit as ten Inter pictures, i.e., ten
         frame durations of 100ms each, and a full second of latency has to
         be accepted.  In such an environment, there is no need for a
         particularly short delay in sending the feedback message.  Hence,
         waiting for the next possible time slot allowed by RFC1889bis RTCP
         timing rules does not negatively influence system performance.
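         The one-second figure in the example follows from a simple
         calculation, sketched here in Python under the assumption (not
         stated explicitly above) that the channel rate is dimensioned for
         Inter pictures at the sending frame rate; the function name is
         illustrative.

```python
def intra_latency_s(fps: float, intra_to_inter_ratio: float) -> float:
    """Time needed to transmit one Intra picture, in seconds.

    Assumes the channel carries fps Inter pictures per second, so an Intra
    picture intra_to_inter_ratio times larger occupies that many frame
    slots of 1/fps seconds each.
    """
    return intra_to_inter_ratio / fps
```

         intra_latency_s(10, 10) returns 1.0, matching the example above.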
         5.2 Message Type 2: Slice Lost Indication
         5.2.1 Semantics
         With the Slice Lost Indication, a decoder can inform an encoder that
         it was unable to decode one or several consecutive macroblocks.
         The encoder can then take appropriate action of its choice to
         re-synchronize encoder and decoder, typically by sending the
         lost macroblocks in Intra mode.  This feedback message SHALL NOT be
         used for video codecs with non-uniform, dynamically changeable
         macroblock sizes such as H.263 with Annex Q enabled.  In such a case,
         an encoder cannot always identify the corrupted spatial region.
         5.2.2 Format
         When FBT indicates a Slice Lost Indication, then there is one
         additional UCI field the content of which is in the following format:
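          0                   1                   2                   3
          0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
         |            First        |  Number                 |  TR       |
         +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+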
         First: 13 bits
         The macroblock (MB) address of the first lost macroblock.  The MB
         numbering is done such that the macroblock in the upper left corner
         of the picture is considered macroblock number 1 and the number for
         each macroblock increases from left to right and then from top to
         bottom in raster-scan order (such that if there is a total of N
         macroblocks in a picture, the bottom right macroblock is considered
         macroblock number N).
         Number: 13 bits
         The number of lost macroblocks, in scan order as discussed above.
         TR: 6 bits
         The six least significant bits of the Temporal Reference of the
         picture in which the loss occurred.
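         The macroblock numbering and the 32-bit UCI layout above can be
         illustrated with the following Python sketch.  The helper names are
         hypothetical; rejecting a Number of 0 and transmitting the word in
         network byte order are assumptions of this sketch.

```python
from typing import Tuple


def mb_address(row: int, col: int, mbs_per_row: int) -> int:
    """1-based raster-scan macroblock number; (0, 0) is the upper-left MB."""
    return row * mbs_per_row + col + 1


def pack_sli(first: int, number: int, tr: int) -> bytes:
    """Pack First (13 bits), Number (13 bits), TR (6 bits), MSB first."""
    if not 1 <= first < 1 << 13:
        raise ValueError("First must be in 1..8191")
    if not 1 <= number < 1 << 13:
        raise ValueError("Number must be in 1..8191")
    word = (first << 19) | (number << 6) | (tr & 0x3F)  # 6 LSBs of TR
    return word.to_bytes(4, "big")                     # network byte order


def unpack_sli(data: bytes) -> Tuple[int, int, int]:
    """Inverse of pack_sli: return (first, number, tr)."""
    word = int.from_bytes(data[:4], "big")
    return word >> 19, (word >> 6) & 0x1FFF, word & 0x3F
```

         For a QCIF picture (11 macroblocks per row), the macroblock in the
         third row and fourth column is mb_address(2, 3, 11) == 26, and
         pack_sli(26, 5, 130) yields the bytes 00 D0 01 42, since only the
         six least significant bits of TR (130 mod 64 = 2) are kept.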
         5.2.3 Timing Rules
         The efficiency of algorithms using the Slice Lost Indication is
         reduced greatly when the Indication is not transmitted in a timely
         fashion.  Motion compensation propagates corrupted pixels that are
         not reported as being corrupted.  Therefore, the use of the algorithm
         discussed in section 3 is highly recommended.
         Constraints on T_dither_max to be discussed.
         5.2.4 Remarks
         The First field of the UCI numbers the first macroblock of a picture
         as 1 and not, as one might expect, as 0.  This was done to align
         this specification with the comparable mechanism available in H.245.
         The maximum number of macroblocks in a picture (2**13 or 8192)
         corresponds to the maximum picture sizes of the ITU-T and ISO/IEC
         video codecs.  If future video codecs offer larger picture sizes
         and/or smaller macroblock sizes, then an additional feedback message
         has to be defined.  The six least significant bits of the Temporal
         Reference field are deemed to be sufficient to indicate the picture
         in which the loss occurred.
         Algorithms have been reported that keep track of the regions affected
         by motion compensation, in order to allow for a transmission of Intra
         macroblocks to all those areas, regardless of the timing of the FB
         [TBP.].  While the timing of the FB is less critical when those
         algorithms are used than without them, it has to be observed that
         those algorithms correct large parts of the picture and, therefore,
         have to transmit many more bits in case of delayed FBs.
         5.3 Message Type 3: Reference Picture Selection Indication
         5.3.1 Semantics
         Modern video coding standards such as MPEG-4 visual version 2 or
         H.263 version 2 allow the use of reference pictures older than the
         most recent one.  Typically, a first-in-first-out queue of reference
         pictures is maintained.  If an encoder has learned about a loss of
         encoder-decoder synchronicity, a known-as-correct reference picture
         can be used.  As this reference picture is temporally further away
         than usual, the resulting predictively coded picture will use more
         bits.
         Both MPEG-4 and H.263 define a binary format for the _payload_ of an
         RPSI message that includes information such as the temporal ID of the
         damaged picture and the size of the damaged region.  This bit string
         is typically small (a couple of dozen bits), of variable length,
         and self-contained, i.e., it contains all information that is necessary
         to perform reference picture selection.
         Note that both MPEG-4 and H.263 allow the use of RPSI with positive
         feedback information as well.  That is, all correctly decoded
         pictures are reported.  Any form of positive feedback MUST NOT be
         used in a multicast environment (reporting positive feedback about
         individual reference pictures at RTCP intervals is not expected to
         be of much use anyway).  For point-to-point communication, positive
         feedback MAY be used but, again, the bit rate budget of RTCP
         feedback will prevent its use in most scenarios anyway.
         5.3.2 Format
         When FB indicates an RPSI, the length field is set to the number
         of bits of the following bit string that contains the RPS
         information.  This bit string follows, byte-aligned, in the UCI
         field.  Bit padding is used to achieve 32-bit word alignment of the
         UCI message (and the whole packet).
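         The length and padding rules can be sketched in Python as follows.
         The function name and the representation of the RPS bit string as a
         string of '0' and '1' characters are illustrative choices, and the
         value of the padding bits (zero here) is an assumption, since it is
         not specified above.

```python
from typing import Tuple


def encode_rpsi(bits: str) -> Tuple[int, bytes]:
    """Return (length field, UCI bytes) for a codec-defined RPS bit string.

    bits is the MPEG-4 or H.263 RPSI payload written as '0'/'1' characters,
    most significant bit first.  Padding bits are assumed to be zero.
    """
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of '0' and '1'")
    n_bits = len(bits)
    padded_len = -(-n_bits // 32) * 32       # round up to whole 32-bit words
    padded = bits + "0" * (padded_len - n_bits)
    return n_bits, int(padded, 2).to_bytes(padded_len // 8, "big")
```

         encode_rpsi("101") returns a length field of 3 together with one
         32-bit word (A0 00 00 00) whose three most significant bits carry
         the RPS information.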
         5.3.3 Timing Rules
         RPS is even more delay-critical than algorithms using SLI.  This is
         because the older the RPS message is, the more bits the encoder has
         to spend to achieve encoder-decoder synchronicity.
         See [TBP.] for some information about the overhead of RPS for certain
         bit rate/frame rate/loss rate scenarios.
         Therefore, RPS messages should typically be sent as soon as possible,
         employing the algorithm of section 3.
         Constraints on T_dither_max to be discussed.
         5.3.4 Remarks
         [To Do]
         6. Informative: Considerations on Audio
         7. Informative: Considerations on DTMF
         8. Informative: Considerations on Text
         9. Security considerations
         RTP packets transporting information with the proposed payload
         format are subject to the security considerations discussed in the
         RTP specification [1].  This implies that confidentiality of the
         media streams is achieved by encryption.
         If the entire stream (extension data and AU data) is to be secured
         and all the participants are expected to have the keys to decode the
         entire stream, then the encryption is performed in the usual manner,
         and there is no conflict between the two operations (encapsulation
         and encryption).
         The need for a portion of stream (e.g. extension data) to be
         encrypted with a different key, or not to be encrypted, would require
         application level signaling protocols to be aware of the usage of
         the XT field, and to exchange keys and negotiate their usage on the
         media and extension data separately.
         10. Acknowledgements
         Large parts of the syntax and the text concerned with RPS and NEWPRED
         were borrowed from an early I-D by Fukunaga et al. that was
         concerned with MPEG-4 ES packetization.
         10. Full Copyright Statement
         Copyright (C) The Internet Society (1999). All Rights Reserved.
         This document and translations of it may be copied and furnished to
         others, and derivative works that comment on or otherwise explain it
         or assist in its implementation may be prepared, copied, published
         and distributed, in whole or in part, without restriction of any
         kind, provided that the above copyright notice and this paragraph are
         included on all such copies and derivative works.
         However, this document itself may not be modified in any way, such as
         by removing the copyright notice or references to the Internet Society
         or other Internet organizations, except as needed for the purpose
         of developing Internet standards in which case the procedures for
         copyrights defined in the Internet Standards process must be
         followed, or as required to translate it into languages other than
         English.
         The limited permissions granted above are perpetual and will not be
         revoked by the Internet Society or its successors or assigns.
         This document and the information contained herein is provided on an
         11. Authors' Addresses
            Stephan Wenger (stewe@cs.tu-berlin.de)
            TU Berlin
            Sekr. FR 6-3
            Franklinstr. 28-29
            D-10587 Berlin
            Joerg Ott (jo@tzi.uni-bremen.de)
            Universitaet Bremen TZI
            MZH 5180
            Bibliothekstr. 1
            D-28359 Bremen
         12. Bibliography
         [still incomplete]
         [1]  RTP - draft-ietf-avt-rtp-new-08.txt
         [2]  RFC 2032
         [3]  RFC 2198
         [4]  RFC 2429
         [5]  RFC 2354
         [6]  RFC 2733
         [7]  draft-fukunaga-avt-low-delay-rtcp.txt
         [8]  RFC 2119
