avtcore                                                          S. Zhao
Internet-Draft                                                 S. Wenger
Intended status: Standards Track                                 Tencent
Expires: October 1, 2020                                      Y. Sanchez
                                                          Fraunhofer HHI
                                                          March 30, 2020

          RTP Payload Format for Versatile Video Coding (VVC)
                      draft-ietf-avtcore-rtp-vvc-01

Abstract

This memo describes an RTP payload format for the video coding
standard ITU-T Recommendation [H.266] and ISO/IEC International
Standard [ISO23090-3], both also known as Versatile Video Coding
(VVC) and developed by the Joint Video Experts Team (JVET). The RTP
payload format allows for packetization of one or more Network
Abstraction Layer (NAL) units in each RTP packet payload as well as
fragmentation of a NAL unit into multiple RTP packets. The payload

skipping to change at page 1, line 41

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on October 1, 2020.

Copyright Notice

Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents

skipping to change at page 2, line 20

described in the Simplified BSD License.

Table of Contents

   1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . .   3
     1.1.  Overview of the VVC Codec . . . . . . . . . . . . . . .   3
       1.1.1.  Coding-Tool Features (informative)  . . . . . . . .   3
       1.1.2.  Systems and Transport Interfaces  . . . . . . . . .   6
       1.1.3.  Parallel Processing Support (informative) . . . . .  10
       1.1.4.  NAL Unit Header . . . . . . . . . . . . . . . . . .  10
     1.2.  Overview of the Payload Format  . . . . . . . . . . . .  12
   2.  Conventions . . . . . . . . . . . . . . . . . . . . . . . .  12
   3.  Definitions and Abbreviations . . . . . . . . . . . . . . .  12
     3.1.  Definitions . . . . . . . . . . . . . . . . . . . . . .  12
       3.1.1.  Definitions from the VVC Specification  . . . . . .  13
       3.1.2.  Definitions Specific to This Memo . . . . . . . . .  16
     3.2.  Abbreviations . . . . . . . . . . . . . . . . . . . . .  16
   4.  RTP Payload Format  . . . . . . . . . . . . . . . . . . . .  17
     4.1.  RTP Header Usage  . . . . . . . . . . . . . . . . . . .  18
     4.2.  Payload Header Usage  . . . . . . . . . . . . . . . . .  19
     4.3.  Payload Structures  . . . . . . . . . . . . . . . . . .  20
       4.3.1.  Single NAL Unit Packets . . . . . . . . . . . . . .  20
       4.3.2.  Aggregation Packets (APs) . . . . . . . . . . . . .  21
       4.3.3.  Fragmentation Units . . . . . . . . . . . . . . . .  25
     4.4.  Decoding Order Number . . . . . . . . . . . . . . . . .  28
   5.  Packetization Rules . . . . . . . . . . . . . . . . . . . .  29
   6.  De-packetization Process  . . . . . . . . . . . . . . . . .  30
   7.  Payload Format Parameters . . . . . . . . . . . . . . . . .  32
   8.  Use with Feedback Messages  . . . . . . . . . . . . . . . .  32
     8.1.  Picture Loss Indication (PLI) . . . . . . . . . . . . .  32
     8.2.  Slice Loss Indication (SLI) . . . . . . . . . . . . . .  32
     8.3.  Reference Picture Selection Indication (RPSI) . . . . .  33
     8.4.  Full Intra Request (FIR)  . . . . . . . . . . . . . . .  33
   9.  Frame marking . . . . . . . . . . . . . . . . . . . . . . .  33
   10. Security Considerations . . . . . . . . . . . . . . . . . .  33
   11. Congestion Control  . . . . . . . . . . . . . . . . . . . .  35
   12. IANA Considerations . . . . . . . . . . . . . . . . . . . .  36
   13. Acknowledgements  . . . . . . . . . . . . . . . . . . . . .  36
   14. References  . . . . . . . . . . . . . . . . . . . . . . . .  36
     14.1.  Normative References . . . . . . . . . . . . . . . . .  36
     14.2.  Informative References . . . . . . . . . . . . . . . .  38
   Appendix A.  Change History . . . . . . . . . . . . . . . . . .  39
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . .  39

1.  Introduction

The Versatile Video Coding [VVC] specification, formally published as
both ITU-T Recommendation H.266 and ISO/IEC International Standard
23090-3 [ISO23090-3], is currently in the ISO/IEC approval process
and is planned for ratification in mid 2020. H.266 is reported to
provide significant coding efficiency gains over H.265 and earlier
video codec formats.

skipping to change at page 5, line 24

Motion prediction and coding

Compared to HEVC, [VVC] introduces several improvements in this area.
First, there is the adaptive motion vector resolution (AMVR), which
can save bit cost for motion vectors by adaptively signaling motion
vector resolution. Then, affine motion compensation is included to
capture complicated motion like zooming and rotation. Meanwhile,
prediction refinement with optical flow with affine mode (PROF) is
further deployed to mimic affine motion at the pixel level. Thirdly,
the decoder-side motion vector refinement (DMVR) is a method to
derive motion vectors at the decoder side based on block matching so
that fewer bits may be spent on motion vectors. Bi-directional
optical flow (BDOF) is a method similar to PROF. BDOF adds a
sample-wise offset at the 4x4 sub-block level that is derived with
equations based on gradients of the prediction samples and a motion
difference relative to CU motion vectors. Furthermore, merge with
motion vector difference (MMVD) is a special mode, which further
signals a limited set of motion vector differences on top of merge
mode. In addition to MMVD, there are three other types of special
merge modes, i.e., sub-block merge, triangle, and combined intra-/
inter- prediction (CIIP). The sub-block merge list includes one
candidate of sub-block temporal motion vector prediction (SbTMVP) and
up to four candidates of affine motion vectors. Triangle is based on
triangular block motion compensation. CIIP combines intra- and
inter- predictions with weighting. Adaptive weighting may be
employed with a block-level tool called bi-prediction with CU-based
weighting (BCW), which provides more flexibility than in HEVC.

Intra prediction and intra-coding

To capture the diversified local image texture directions with finer
granularity, [VVC] supports 65 angular directions instead of the 33
directions in HEVC. The intra mode coding is based on a 6-most-
probable-mode scheme, and the 6 most probable modes are derived using
the neighboring intra prediction directions. In addition, to deal
with the different distributions of intra prediction angles for
different block aspect ratios, a wide-angle intra prediction (WAIP)
scheme is applied in [VVC] by including intra prediction angles
beyond those present in HEVC. Unlike HEVC, which only allows using
the most adjacent line of reference samples for intra prediction,
[VVC] also allows using two further reference lines, also known as
multi-reference-line (MRL) intra prediction. The additional
reference lines can only be used for the 6 most probable intra
prediction modes. To capture the strong correlation between
different colour components, in VVC, a cross-component linear mode
(CCLM) is utilized, which assumes a linear relationship between the
luma sample values and their associated chroma samples. For intra
prediction, [VVC] also applies a position-dependent prediction
combination (PDPC) for refining the prediction samples closer to the
intra prediction block boundary. Matrix-based intra prediction (MIP)
modes are also used in [VVC], which generate an up to 8x8 intra
prediction block using a weighted sum of downsampled neighboring
reference samples; the weightings are hardcoded constants.

Other coding-tool feature

[VVC] introduces dependent quantization (DQ) to reduce quantization
error by state-based switching between two quantizers.

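For illustration, that switching mechanism can be sketched as a small
decoder-side routine. The state-transition table and the two
quantizers' reconstruction rules below follow the trellis-coded-
quantization design that DQ is derived from; they are illustrative
assumptions of this sketch, not normative [VVC] text.

```python
# Illustrative sketch of dependent quantization (DQ) reconstruction.
# Two scalar quantizers share the step size "delta": Q0 reconstructs
# even multiples (2k * delta), Q1 odd multiples ((2k - sgn(k)) * delta).
# A small state machine, driven by the parity of each quantization
# index, selects which quantizer applies to the next coefficient.

# Assumed 4-state transition table: next_state = TRANS[state][index & 1].
TRANS = [(0, 2), (2, 0), (1, 3), (3, 1)]


def sgn(k: int) -> int:
    return (k > 0) - (k < 0)


def dq_reconstruct(indices, delta):
    """Map quantization indices to reconstruction levels, switching
    between the two quantizers according to the current state."""
    state = 0
    out = []
    for k in indices:
        if state < 2:                  # states 0, 1 -> quantizer Q0
            out.append(2 * k * delta)
        else:                          # states 2, 3 -> quantizer Q1
            out.append((2 * k - sgn(k)) * delta)
        state = TRANS[state][k & 1]    # parity drives the next state
    return out


print(dq_reconstruct([1, 0, -1], 1.0))
```

The parity of each index drives the state machine, so the encoder's
choice of indices implicitly selects which quantizer the decoder
applies to the next coefficient.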
1.1.2.  Systems and Transport Interfaces

[VVC] inherits the basic systems and transport interfaces designs
from HEVC and H.264. These include the NAL-unit-based syntax
structure, the hierarchical syntax and data unit structure, the
Supplemental Enhancement Information (SEI) message mechanism, and the
video buffering model based on the Hypothetical Reference Decoder
(HRD). The scalability features of [VVC] are conceptually similar to
the scalable variant of HEVC known as SHVC. The hierarchical syntax
and data unit structure consists of parameter sets at various levels
(decoder, sequence (pertaining to all layers), sequence (pertaining
to a single layer), picture), picture-level header parameters, slice-
level header parameters, and lower-level parameters.

A number of key components that influenced the Network Abstraction
Layer design of [VVC], as well as this memo, are described below.

Decoding Capability Information

The Decoding capability information includes parameters that stay
constant for the lifetime of a Video Bitstream, which in IETF terms
can translate to the lifetime of a session. It can include profile,
level, and sub-profile information to determine a maximum complexity
interop point that is guaranteed never to be exceeded, even if
splicing of video sequences occurs within a session. It further
includes constraint flags, which can optionally be set to indicate
that the video bitstream will be constrained in the use of certain
features, as indicated by the values of those flags. With this, a
bitstream can be labelled as not using certain tools, which allows,
among other things, for resource allocation in a decoder
implementation.

Video parameter set

The Video Parameter Set (VPS) pertains to a Coded Video Sequence
(CVS) of multiple layers covering the same range of picture units,
and includes, among other information, decoding dependency expressed
as information for reference picture set construction of enhancement
layers. The VPS provides a "big picture" of a scalable sequence,
including what types of operation points are provided, the profile,
tier, and level of the operation points, and some other high-level

skipping to change at page 7, line 30

Sequence parameter set

The Sequence Parameter Set (SPS) contains syntax elements pertaining
to a coded layer video sequence (CLVS), which is a group of pictures
belonging to the same layer, starting with a random access point, and
followed by pictures that may depend on each other and the random
access point picture. In MPEG-2, the equivalent of a CVS was a
Group of Pictures (GOP), which normally started with an I frame and
was followed by P and B frames. While more complex in its options of
random access points, VVC retains this basic concept. One remarkable
difference of VVC is that a CLVS may start with a Gradual Decoding
Refresh (GDR) picture, without requiring the presence of traditional
random access points in the bitstream, such as Instantaneous Decoding
Refresh (IDR) or Clean Random Access (CRA) pictures. In many TV-like
applications, a CVS contains a few hundred milliseconds to a few
seconds of video. In video conferencing (without switching MCUs
involved), a CVS can be as long in duration as the whole session.

Picture and Adaptation parameter set

The Picture Parameter Set and the Adaptation Parameter Set (PPS and
APS, respectively) carry information pertaining to zero or more
pictures and zero or more slices, respectively. The PPS contains
information that is likely to stay constant from picture to picture,
at least for pictures of a certain type, whereas the APS contains
information, such as adaptive loop filter (ALF) coefficients, that is
likely to change from picture to picture or even within a picture. A
single APS can be referenced by slices of the same picture if that
APS contains information about luma mapping with chroma scaling
(LMCS), but different APSs can be referenced by slices of the same
picture if those APSs contain information about ALF.

Picture Header
A Picture Header contains information that is common to all slices
that belong to the same picture. Being able to send that information
as a separate NAL unit when pictures are split into several slices
allows for saving bitrate, compared to repeating the same information
in all slices. However, there might be scenarios where low-bitrate
video is transmitted using a single slice per picture. Having a
separate NAL unit to convey that information incurs an overhead in
such scenarios. Therefore, VVC specifies signaling that

indicates whether Picture Headers are present in the CLVS or not.
Profile, tier, and level

The profile, tier, and level syntax structures in the DCI, VPS, and
SPS contain profile, tier, and level information for all layers that
refer to the DCI, for layers associated with one or more output layer
sets specified by the VPS, and for any layer that refers to the SPS,
respectively.

Sub-Profiles

Within the [VVC] specification, a sub-profile is a 32-bit number
coded according to ITU-T Rec. T.35 that does not carry semantics of
its own. It is carried in the profile_tier_level structure and hence
(potentially) present in the DCI, VPS, and SPS. External
registration bodies can register a T.35 codepoint with ITU-T
registration authorities and associate with their registration a
description of bitstream complexity restrictions beyond the profiles
defined by ITU-T and ISO/IEC. This would allow encoder manufacturers
to label the bitstreams generated by their encoder as complying with
such a sub-profile. It is expected that upstream standardization

organizations (such as DVB and ATSC), as well as walled-garden video
services, will take advantage of this labelling system. In contrast
to "normal" profiles, it is expected that sub-profiles may indicate
encoder choices traditionally left open in the (decoder-centric)
video coding specs, such as GOP structures, minimum/maximum QP
values, and the mandatory use of certain tools or SEI messages.

Constraint Flags

The profile_tier_level structure carries a considerable number of
constraint flags, which an encoder can use to indicate to a decoder
that it will not use a certain tool or technology. They were
included in reaction to a perceived market need for labelling a
bitstream as not exercising a certain tool that has become
commercially unviable.

Temporal scalability support

Editor's note: this will be updated along with the new VVC draft.

[VVC] includes support of temporal scalability, by inclusion of the

skipping to change at page 9, line 37

through the signaling of the layer_id in the NAL unit header, the VPS
which associates layers with given layer_ids to each other, reference
picture selection, reference picture resampling for spatial
scalability, and a number of other mechanisms not relevant for this
memo. Scalability support can be implemented in a single decoding
"loop" and is widely considered a comparatively lightweight
operation.

Spatial Scalability

With the existence of Reference Picture Resampling (RPR) in the
"main" profile of VVC, the additional burden for scalability support
is just a minor modification of the high-level syntax (HLS). In
technical aspects, the inter-layer prediction is employed in a
scalable system to improve the coding efficiency of the enhancement
layers. In addition to the spatial and temporal motion-compensated
predictions that are available in a single-layer codec, the inter-
layer prediction in [VVC] uses the resampled video data of the
reconstructed reference picture from a reference layer to predict the
current enhancement layer. Then, the resampling process for inter-
layer prediction is performed at the block level, without modifying
the existing interpolation process for motion compensation compared
to non-scalable RPR. It means that no additional resampling process
is needed to support scalability.

SNR Scalability

SNR scalability is similar to spatial scalability except that the
resampling factors are 1:1; in other words, there is no change in
resolution, but there is inter-layer prediction.

SEI Messages

skipping to change at page 11, line 45

This memo does not overload the "Z" bit for local extensions, as
a) overloading the "F" bit is sufficient and b) to preserve the
usefulness of this memo to possible future versions of [VVC].

LayerId: 6 bits

nuh_layer_id. Identifies the layer a NAL unit belongs to, wherein a
layer may be, e.g., a spatial scalable layer or a quality scalable
layer.

Type: 5 bits
nal_unit_type. This field specifies the NAL unit type as defined in Table 7-1 of VVC. For a reference of all currently defined NAL unit types and their semantics, please refer to Section 7.4.2.2 in [VVC].
TID: 3 bits
nuh_temporal_id_plus1. This field specifies the temporal identifier of the NAL unit plus 1. The value of TemporalId is
skipping to change at page 13, line 49
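The payload header fields described above (F, Z, LayerId, Type, TID) pack into two bytes with widths 1+1+6+5+3 bits, most significant bit first. A sketch of how a receiver might unpack them (the function name is illustrative):

```python
def parse_payload_header(data: bytes) -> dict:
    """Unpack the 2-byte VVC payload header: F(1), Z(1), LayerId(6),
    Type(5), TID(3), in most-significant-bit-first order."""
    hdr = int.from_bytes(data[:2], "big")
    return {
        "F": (hdr >> 15) & 0x1,        # forbidden_zero_bit
        "Z": (hdr >> 14) & 0x1,        # nuh_reserved_zero_bit
        "LayerId": (hdr >> 8) & 0x3F,  # nuh_layer_id
        "Type": (hdr >> 3) & 0x1F,     # nal_unit_type
        "TID": hdr & 0x07,             # nuh_temporal_id_plus1
    }
```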
Coded layer video sequence (CLVS): A sequence of PUs with the same value of nuh_layer_id that consists, in decoding order, of a CLVSS PU, followed by zero or more PUs that are not CLVSS PUs, including all subsequent PUs up to but not including any subsequent PU that is a CLVSS PU.
Coded layer video sequence start (CLVSS) PU: A PU in which the coded picture is a CLVSS picture.
Coded layer video sequence start (CLVSS) picture: A coded picture
that is an IRAP picture with NoOutputBeforeRecoveryFlag equal to 1 or
a GDR picture with NoOutputBeforeRecoveryFlag equal to 1.
Coding tree unit (CTU): A CTB of luma samples, two corresponding CTBs of chroma samples of a picture that has three sample arrays, or a CTB of samples of a monochrome picture or a picture that is coded using three separate colour planes and syntax structures used to code the samples.
Decoding Capability Information (DCI): A syntax structure containing syntax elements that apply to the entire bitstream.
Decoded picture buffer (DPB): A buffer holding decoded pictures for reference, output reordering, or output delay specified for the hypothetical reference decoder.
Gradual decoding refresh (GDR) picture: A picture for which each VCL
NAL unit has nal_unit_type equal to GDR_NUT.
Instantaneous decoding refresh (IDR) PU: A PU in which the coded picture is an IDR picture.
Instantaneous decoding refresh (IDR) picture: An IRAP picture for which each VCL NAL unit has nal_unit_type equal to IDR_W_RADL or IDR_N_LP.
Intra random access point (IRAP) AU: An AU in which there is a PU for each layer in the CVS and the coded picture in each PU is an IRAP picture.
Intra random access point (IRAP) PU: A PU in which the coded picture is an IRAP picture.
Intra random access point (IRAP) picture: A coded picture for which
all VCL NAL units have the same value of nal_unit_type in the range
of IDR_W_RADL to CRA_NUT, inclusive.
Layer: A set of VCL NAL units that all have a particular value of nuh_layer_id and the associated non-VCL NAL units.
Network abstraction layer (NAL) unit: A syntax structure containing an indication of the type of data to follow and bytes containing that data in the form of an RBSP interspersed as necessary with emulation prevention bytes.
Network abstraction layer (NAL) unit stream: A sequence of NAL units.
skipping to change at page 17, line 4
the NAL unit transmission order is the same as the order of appearance of NAL units in the packet.
3.2. Abbreviations
AU Access Unit
AP Aggregation Packet
CTU Coding Tree Unit
CVS Coded Video Sequence
DPB Decoded Picture Buffer
DCI Decoding Capability Information
DON Decoding Order Number
DONB Decoding Order Number Base
FIR Full Intra Request
FU Fragmentation Unit
HRD Hypothetical Reference Decoder
IDR Instantaneous Decoding Refresh
MANE Media-Aware Network Element
skipping to change at page 20, line 28
interpretation of the VPS is required. We can add language about the need for stateful interpretation of LayerID vis-a-vis stateless interpretation of TID later.
4.3. Payload Structures
Three different types of RTP packet payload structures are specified. A receiver can identify the type of an RTP packet payload through the Type field in the payload header.
The three different payload structures are as follows:
o Single NAL unit packet: Contains a single NAL unit in the payload, and the NAL unit header of the NAL unit also serves as the payload header. This payload structure is specified in Section 4.4.1.
o Aggregation Packet (AP): Contains more than one NAL unit within one access unit. This payload structure is specified in Section 4.3.2.
o Fragmentation Unit (FU): Contains a subset of a single NAL unit.
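Since the receiver distinguishes the three structures by the Type field alone, the dispatch logic is a simple comparison. In the sketch below the AP and FU type values (28 and 29) are assumptions taken from later revisions of this payload format, not values defined in the text above:

```python
AP_TYPE = 28  # assumed nal_unit_type value for Aggregation Packets
FU_TYPE = 29  # assumed nal_unit_type value for Fragmentation Units

def classify_payload(payload: bytes) -> str:
    """Identify which of the three payload structures an RTP payload
    carries, from the 5-bit Type field in its payload header."""
    nal_type = (payload[1] >> 3) & 0x1F  # Type occupies bits 3..7 of byte 1
    if nal_type == AP_TYPE:
        return "AP"
    if nal_type == FU_TYPE:
        return "FU"
    return "single"
```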
skipping to change at page 39, line 24
[Wang05] Wang, Y.-K., Zhu, C., and H. Li, "Error resilient video coding using flexible reference frames", Visual Communications and Image Processing 2005 (VCIP 2005), July 2005.
Appendix A. Change History
draft-zhao-payload-rtp-vvc-00 ........ initial version
draft-zhao-payload-rtp-vvc-01 ........ editorial clarifications and
corrections
Authors' Addresses
Shuai Zhao
Tencent
2747 Park Blvd
Palo Alto 94588
USA
Email: shuai.zhao@ieee.org
Stephan Wenger
Tencent
2747 Park Blvd
Palo Alto 94588
Email: stewe@stewe.org
Yago Sanchez
Fraunhofer HHI
Einsteinufer 37
Berlin 10587
Germany
Email: yago.sanchez@hhi.fraunhofer.de