--- 1/draft-ietf-cellar-ffv1-v4-15.txt 2020-12-01 13:13:15.823718291 -0800 +++ 2/draft-ietf-cellar-ffv1-v4-16.txt 2020-12-01 13:13:15.931721043 -0800 @@ -1,20 +1,20 @@ cellar M. Niedermayer Internet-Draft Intended status: Standards Track D. Rice -Expires: 10 April 2021 +Expires: 4 June 2021 J. Martinez - 7 October 2020 + 1 December 2020 FFV1 Video Coding Format Version 4 - draft-ietf-cellar-ffv1-v4-15 + draft-ietf-cellar-ffv1-v4-16 Abstract This document defines FFV1, a lossless intra-frame video encoding format. FFV1 is designed to efficiently compress video data in a variety of pixel formats. Compared to uncompressed video, FFV1 offers storage compression, frame fixity, and self-description, which makes FFV1 useful as a preservation or intermediate video format. Status of This Memo @@ -25,63 +25,64 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on 10 April 2021. + This Internet-Draft will expire on 4 June 2021. Copyright Notice Copyright (c) 2020 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. 
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Notation and Conventions . . . . . . . . . . . . . . . . . . 4 - 2.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 4 + 2.1. Definitions . . . . . . . . . . . . . . . . . . . . . . . 5 2.2. Conventions . . . . . . . . . . . . . . . . . . . . . . . 5 2.2.1. Pseudo-code . . . . . . . . . . . . . . . . . . . . . 6 2.2.2. Arithmetic Operators . . . . . . . . . . . . . . . . 6 2.2.3. Assignment Operators . . . . . . . . . . . . . . . . 7 2.2.4. Comparison Operators . . . . . . . . . . . . . . . . 7 2.2.5. Mathematical Functions . . . . . . . . . . . . . . . 7 2.2.6. Order of Operation Precedence . . . . . . . . . . . . 8 2.2.7. Range . . . . . . . . . . . . . . . . . . . . . . . . 9 2.2.8. NumBytes . . . . . . . . . . . . . . . . . . . . . . 9 2.2.9. Bitstream Functions . . . . . . . . . . . . . . . . . 9 3. Sample Coding . . . . . . . . . . . . . . . . . . . . . . . . 9 3.1. Border . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.2. Samples . . . . . . . . . . . . . . . . . . . . . . . . . 10 3.3. Median Predictor . . . . . . . . . . . . . . . . . . . . 11 + 3.3.1. Exception . . . . . . . . . . . . . . . . . . . . . . 11 3.4. Quantization Table Sets . . . . . . . . . . . . . . . . . 12 3.5. Context . . . . . . . . . . . . . . . . . . . . . . . . . 12 - 3.6. Quantization Table Set Indexes . . . . . . . . . . . . . 12 + 3.6. Quantization Table Set Indexes . . . . . . . . . . . . . 13 3.7. Color spaces . . . . . . . . . . . . . . . . . . . . . . 13 3.7.1. YCbCr . . . . . . . . . . . . . . . . . . . . . . . . 13 3.7.2. RGB . . . . . . . . . . . . . . . . . . . . . . . . . 14 - 3.8. Coding of the Sample Difference . . . . . . . . . . . . . 15 - 3.8.1. 
Range Coding Mode . . . . . . . . . . . . . . . . . . 15 + 3.8. Coding of the Sample Difference . . . . . . . . . . . . . 16 + 3.8.1. Range Coding Mode . . . . . . . . . . . . . . . . . . 16 3.8.2. Golomb Rice Mode . . . . . . . . . . . . . . . . . . 22 4. Bitstream . . . . . . . . . . . . . . . . . . . . . . . . . . 28 4.1. Quantization Table Set . . . . . . . . . . . . . . . . . 29 4.1.1. quant_tables . . . . . . . . . . . . . . . . . . . . 30 4.1.2. context_count . . . . . . . . . . . . . . . . . . . . 31 4.2. Parameters . . . . . . . . . . . . . . . . . . . . . . . 31 4.2.1. version . . . . . . . . . . . . . . . . . . . . . . . 33 4.2.2. micro_version . . . . . . . . . . . . . . . . . . . . 33 4.2.3. coder_type . . . . . . . . . . . . . . . . . . . . . 34 4.2.4. state_transition_delta . . . . . . . . . . . . . . . 35 @@ -125,29 +126,34 @@ 4.8.1. plane_pixel_width . . . . . . . . . . . . . . . . . . 48 4.8.2. slice_pixel_width . . . . . . . . . . . . . . . . . . 49 4.8.3. slice_pixel_x . . . . . . . . . . . . . . . . . . . . 49 4.8.4. sample_difference . . . . . . . . . . . . . . . . . . 49 4.9. Slice Footer . . . . . . . . . . . . . . . . . . . . . . 49 4.9.1. slice_size . . . . . . . . . . . . . . . . . . . . . 50 4.9.2. error_status . . . . . . . . . . . . . . . . . . . . 50 4.9.3. slice_crc_parity . . . . . . . . . . . . . . . . . . 50 5. Restrictions . . . . . . . . . . . . . . . . . . . . . . . . 50 6. Security Considerations . . . . . . . . . . . . . . . . . . . 51 - 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 52 - 7.1. Media Type Definition . . . . . . . . . . . . . . . . . . 52 + 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 51 + 7.1. Media Type Definition . . . . . . . . . . . . . . . . . . 51 + 8. Changelog . . . . . . . . . . . . . . . . . . . . . . . . . . 53 9. Normative References . . . . . . . . . . . . . . . . . . . . 53 10. Informative References . . . . . . . . . . . . . . . . . . . 54 Appendix A. 
Multi-threaded decoder implementation suggestions . . 55 Appendix B. Future handling of some streams created by non conforming encoders . . . . . . . . . . . . . . . . . . . 56 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 56 + Appendix C. FFV1 Implementations . . . . . . . . . . . . . . . . 56 + C.1. FFmpeg FFV1 Codec . . . . . . . . . . . . . . . . . . . . 56 + C.2. FFV1 Decoder in Go . . . . . . . . . . . . . . . . . . . 56 + C.3. MediaConch . . . . . . . . . . . . . . . . . . . . . . . 57 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 57 1. Introduction This document describes FFV1, a lossless video encoding format. The design of FFV1 considers the storage of image characteristics, data fixity, and the optimized use of encoding time and storage requirements. FFV1 is designed to support a wide range of lossless video applications such as long-term audiovisual preservation, scientific imaging, screen recording, and other video encoding scenarios that seek to avoid the generational loss of lossy video @@ -195,24 +201,20 @@ "Line": A discrete component of a static image composed of Samples that represent a specific quantification of Samples of that image. "Plane": A discrete component of a static image composed of Lines that represent a specific quantification of Lines of that image. "Pixel": The smallest addressable representation of a color in a Frame. It is composed of one or more Samples. - "ESC": An ESCape Symbol to indicate that the Symbol to be stored is - too large for normal storage and that an alternate storage method is - used. - "MSB": Most Significant Bit, the bit that can cause the largest change in magnitude of the Symbol. "VLC": Variable Length Code, a code that maps source symbols to a variable number of bits. "RGB": A reference to the method of storing the value of a Pixel by using three numeric values that represent Red, Green, and Blue.
"YCbCr": A reference to the method of storing the value of a Pixel by @@ -476,50 +478,55 @@ The prediction for any Sample value at position "X" may be computed based upon the relative neighboring values of "l", "t", and "tl" via this equation: median(l, t, l + t - tl) Note, this prediction template is also used in [ISO.14495-1.1999] and [HuffYUV]. - Exception for the median predictor: if "colorspace_type == 0 && - bits_per_raw_sample == 16 && ( coder_type == 1 || coder_type == 2 )" - (see Section 4.2.5, Section 4.2.7 and Section 4.2.5), the following - median predictor MUST be used: +3.3.1. Exception + + If "colorspace_type == 0 && bits_per_raw_sample == 16 && ( coder_type + == 1 || coder_type == 2 )" (see Section 4.2.5, Section 4.2.7 and + Section 4.2.3), the following median predictor MUST be used: median(left16s, top16s, left16s + top16s - diag16s) where: left16s = l >= 32768 ? ( l - 65536 ) : l top16s = t >= 32768 ? ( t - 65536 ) : t diag16s = tl >= 32768 ? ( tl - 65536 ) : tl Background: a two's complement 16-bit signed integer was used for - storing Sample values in all known implementations of FFV1 bitstream. - So in some circumstances, the most significant bit was wrongly - interpreted (used as a sign bit instead of the 16th bit of an - unsigned integer). Note that when the issue was discovered, the only - configuration of all known implementations being impacted is 16-bit - YCbCr with no Pixel transformation with Range Coder coder, as other - potentially impacted configurations (e.g. 15/16-bit JPEG2000-RCT with - Range Coder coder, or 16-bit content with Golomb Rice coder) were - implemented nowhere [ISO.15444-1.2016]. In the meanwhile, 16-bit - JPEG2000-RCT with Range Coder coder was implemented without this - issue in one implementation and validated by one conformance checker. - It is expected (to be confirmed) to remove this exception for the - median predictor in the next version of the FFV1 bitstream. 
+ storing Sample values in all known implementations of FFV1 bitstream + (see Appendix C). So in some circumstances, the most significant bit + was wrongly interpreted (used as a sign bit instead of the 16th bit + of an unsigned integer). Note that when the issue was discovered, + the only configuration of all known implementations being impacted is + 16-bit YCbCr with no Pixel transformation with Range Coder coder, as + other potentially impacted configurations (e.g. 15/16-bit + JPEG2000-RCT with Range Coder coder, or 16-bit content with Golomb + Rice coder) were implemented nowhere [ISO.15444-1.2016]. In the + meanwhile, 16-bit JPEG2000-RCT with Range Coder coder was implemented + without this issue in one implementation and validated by one + conformance checker. It is expected (to be confirmed) to remove this + exception for the median predictor in the next version of the FFV1 + bitstream. 3.4. Quantization Table Sets + Quantization Tables are used on Sample Differences (see Section 3.8), + so Quantized Sample Differences are stored in the bitstream. + The FFV1 bitstream contains one or more Quantization Table Sets. Each Quantization Table Set contains exactly 5 Quantization Tables with each Quantization Table corresponding to one of the five Quantized Sample Differences. For each Quantization Table, both the number of quantization steps and their distribution are stored in the FFV1 bitstream; each Quantization Table has exactly 256 entries, and the 8 least significant bits of the Quantized Sample Difference are used as index: Q_(j)[k] = quant_tables[i][j][k&255] @@ -606,48 +613,30 @@ An optional transparency Plane can be used to code transparency data. JPEG2000-RCT is a Reversible Color Transform that codes RGB (red, green, blue) Planes losslessly in a modified YCbCr color space [ISO.15444-1.2016]. Reversible Pixel transformations between YCbCr and RGB use the following formulae. 
Cb = b - g Cr = r - g Y = g + (Cb + Cr) >> 2 + + Figure 6: Description of the transformation of pixels from RGB + color space to coded modified YCbCr color space. + g = Y - (Cb + Cr) >> 2 r = Cr + g b = Cb + g - Figure 6 - - Exception for the JPEG2000-RCT conversion: if "bits_per_raw_sample" - is between 9 and 15 inclusive and "extra_plane" is 0, the following - formulae for reversible conversions between YCbCr and RGB MUST be - used instead of the ones above: - - Cb = g - b - Cr = r - b - Y = b +(Cb + Cr) >> 2 - b = Y -(Cb + Cr) >> 2 - r = Cr + b - g = Cb + b - - Figure 7 - - Background: At the time of this writing, in all known implementations - of FFV1 bitstream, when "bits_per_raw_sample" was between 9 and 15 - inclusive and "extra_plane" is 0, GBR Planes were used as BGR Planes - during both encoding and decoding. In the meanwhile, 16-bit - JPEG2000-RCT was implemented without this issue in one implementation - and validated by one conformance checker. Methods to address this - exception for the transform are under consideration for the next - version of the FFV1 bitstream. + Figure 7: Description of the transformation of pixels from coded + modified YCbCr color space to RGB color space. Cb and Cr are positively offset by "1 << bits_per_raw_sample" after the conversion from RGB to the modified YCbCr and are negatively offset by the same value before the conversion from the modified YCbCr to RGB, in order to have only non-negative values after the conversion. When FFV1 uses the JPEG2000-RCT, the horizontal Lines are interleaved to improve caching efficiency since it is most likely that the JPEG2000-RCT will immediately be converted to RGB during decoding.
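The reversible transform and the Cb/Cr offset described above can be sketched in C as follows. This is only an illustration under stated assumptions, not the reference implementation: the function names "rct_forward", "rct_inverse" and "floor_div4" are hypothetical, the shift is assumed to apply to "(Cb + Cr)" alone (that is, "g + ((Cb + Cr) >> 2)"), and the shift of a possibly negative sum is written as an explicit floor division so the result does not depend on how a compiler shifts negative integers. The exception for 9 to 15 bits without "extra_plane" is not covered.

```c
#include <assert.h>
#include <stdint.h>

/* floor(x / 4) for either sign; stands in for "x >> 2" on a signed value. */
int32_t floor_div4(int32_t x)
{
    return x >= 0 ? x >> 2 : -((-x + 3) >> 2);
}

/* RGB -> modified YCbCr. Cb and Cr receive the "1 << bits" offset so that
   the stored values are non-negative. Names are illustrative only. */
void rct_forward(int bits, int32_t r, int32_t g, int32_t b,
                 int32_t *y, int32_t *cb, int32_t *cr)
{
    assert(bits >= 1 && bits <= 16);
    int32_t offset = (int32_t)1 << bits;
    int32_t d_b = b - g;                 /* Cb before the offset */
    int32_t d_r = r - g;                 /* Cr before the offset */
    *y  = g + floor_div4(d_b + d_r);
    *cb = d_b + offset;
    *cr = d_r + offset;
}

/* Modified YCbCr -> RGB. The offset is removed before the conversion. */
void rct_inverse(int bits, int32_t y, int32_t cb, int32_t cr,
                 int32_t *r, int32_t *g, int32_t *b)
{
    assert(bits >= 1 && bits <= 16);
    int32_t offset = (int32_t)1 << bits;
    int32_t d_b = cb - offset;
    int32_t d_r = cr - offset;
    *g = y - floor_div4(d_b + d_r);
    *r = d_r + *g;
    *b = d_b + *g;
}
```

Because the inverse recomputes exactly the same floor term from Cb and Cr, "g" is recovered without loss, and "r" and "b" follow from it.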
@@ -664,140 +653,177 @@ | Pixel(1,2) | Pixel(2,2) | | Y(1,2) Cb(1,2) Cr(1,2) | Y(2,2) Cb(2,2) Cr(2,2) | +------------------------+------------------------+ In JPEG2000-RCT, the coding order would be left to right and then top to bottom, with values interleaved by Lines and stored in this order: Y(1,1) Y(2,1) Cb(1,1) Cb(2,1) Cr(1,1) Cr(2,1) Y(1,2) Y(2,2) Cb(1,2) Cb(2,2) Cr(1,2) Cr(2,2) +3.7.2.1. Exception + + If "bits_per_raw_sample" is between 9 and 15 inclusive and + "extra_plane" is 0, the following formulae for reversible conversions + between YCbCr and RGB MUST be used instead of the ones above: + + Cb = g - b + Cr = r - b + Y = b + (Cb + Cr) >> 2 + + Figure 8: Description of the transformation of pixels from RGB + color space to coded modified YCbCr color space (in case of + exception). + + b = Y - (Cb + Cr) >> 2 + r = Cr + b + g = Cb + b + + Figure 9: Description of the transformation of pixels from coded + modified YCbCr color space to RGB color space (in case of + exception). + + Background: At the time of this writing, in all known implementations + of FFV1 bitstream, when "bits_per_raw_sample" was between 9 and 15 + inclusive and "extra_plane" is 0, GBR Planes were used as BGR Planes + during both encoding and decoding. In the meanwhile, 16-bit + JPEG2000-RCT was implemented without this issue in one implementation + and validated by one conformance checker. Methods to address this + exception for the transform are under consideration for the next + version of the FFV1 bitstream. + 3.8. Coding of the Sample Difference Instead of coding the n+1 bits of the Sample Difference with Huffman or Range coding (or n+2 bits, in the case of JPEG2000-RCT), only the n (or n+1, in the case of JPEG2000-RCT) least significant bits are used, since this is sufficient to recover the original Sample. 
In the equation below, the term "bits" represents "bits_per_raw_sample + 1" for JPEG2000-RCT or "bits_per_raw_sample" otherwise: - coder_input = [(sample_difference + 2 ^ (bits - 1)) & - (2 ^ bits - 1)] - 2 ^ (bits - 1) + coder_input = ((sample_difference + 2 ^ (bits - 1)) & + (2 ^ bits - 1)) - 2 ^ (bits - 1) - Figure 8: Description of the coding of the Sample Difference in + Figure 10: Description of the coding of the Sample Difference in the bitstream. 3.8.1. Range Coding Mode Early experimental versions of FFV1 used the CABAC Arithmetic coder from H.264 as defined in [ISO.14496-10.2014] but due to the uncertain patent/royalty situation, as well as its slightly worse performance, CABAC was replaced by a Range coder based on an algorithm defined by G. Nigel N. Martin in 1979 [range-coding]. 3.8.1.1. Range Binary Values To encode binary digits efficiently a Range coder is used. C_(i) is the i-th Context. B_(i) is the i-th byte of the bytestream. b_(i) is the i-th Range coded binary value, S_(0, i) is the i-th initial state. The length of the bytestream encoding n binary symbols is j_(n) bytes. r_(i) = floor( ( R_(i) * S_(i, C_(i)) ) / 2 ^ 8 ) - Figure 9: A formula of the read of a binary value in Range Binary - mode. + Figure 11: A formula of the read of a binary value in Range + Binary mode. S_(i + 1, C_(i)) = zero_state_(S_(i, C_(i))) AND l_(i) = L_(i) AND t_(i) = R_(i) - r_(i) <== b_(i) = 0 <==> L_(i) < R_(i) - r_(i) S_(i + 1, C_(i)) = one_state_(S_(i, C_(i))) AND l_(i) = L_(i) - R_(i) + r_(i) AND t_(i) = r_(i) <== b_(i) = 1 <==> L_(i) >= R_(i) - r_(i) - - Figure 10 + Figure 12 S_(i + 1, k) = S_(i, k) <== C_(i) != k - Figure 11 + Figure 13: The "i+1,k"-th State is equal to the "i,k"-th State if + the value of "k" is unequal to the i-th value of Context. 
R_(i + 1) = 2 ^ 8 * t_(i) AND L_(i + 1) = 2 ^ 8 * l_(i) + B_(j_(i)) AND j_(i + 1) = j_(i) + 1 <== t_(i) < 2 ^ 8 R_(i + 1) = t_(i) AND L_(i + 1) = l_(i) AND j_(i + 1) = j_(i) <== t_(i) >= 2 ^ 8 - Figure 12 + Figure 14: The "i+1"-th values for "Range", "Low", and the length + of the bytestream encoding are conditionally set depending on the + "i-th" value of "t". R_(0) = 65280 - Figure 13 + Figure 15: The initial value for "Range". L_(0) = 2 ^ 8 * B_(0) + B_(1) - Figure 14 + + Figure 16: The initial value for "Low" is set according to the + first two bytes of the bytestream. j_(0) = 2 - Figure 15 + Figure 17: The initial value for "j", the length of the + bytestream encoding. range = 0xFF00; end = 0; low = get_bits(16); if (low >= range) { low = range; end = 1; } - Figure 16: A pseudo-code description of the initial states in + Figure 18: A pseudo-code description of the initial states in Range Binary mode. refill() { if (range < 256) { range = range * 256; low = low * 256; if (!end) { c.low += get_bits(8); if (remaining_bits_in_bitstream( NumBytes ) == 0) { end = 1; } } } } - Figure 17: A pseudo-code description of refilling the Range + Figure 19: A pseudo-code description of refilling the Range Binary Value coder buffer. get_rac(state) { rangeoff = (range * state) / 256; range -= rangeoff; if (low < range) { state = zero_state[state]; refill(); return 0; } else { low -= range; state = one_state[state]; range = rangeoff; refill(); return 1; } } - Figure 18: A pseudo-code description of the read of a binary + + Figure 20: A pseudo-code description of the read of a binary value in Range Binary mode. 3.8.1.1.1. Termination The range coder can be used in three modes. * In "Open mode" when decoding, every Symbol the reader attempts to read is available. In this mode arbitrary data can have been appended without affecting the range coder output. This mode is not used in FFV1. 
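Returning to the "coder_input" equation at the start of Section 3.8, the folding of a Sample Difference into "bits" bits can be sketched in C as follows. This is a hedged illustration, not text from the specification: "^" in the equation denotes exponentiation, the function names are hypothetical, and the decoder-side "recover_sample" helper is an assumption about how the original Sample is recovered (add the prediction, keep the low "bits" bits), which the text only states is possible.

```c
#include <assert.h>
#include <stdint.h>

/* coder_input = ((sample_difference + 2^(bits-1)) & (2^bits - 1)) - 2^(bits-1)
   The unsigned cast makes the masking of negative inputs well defined. */
int32_t fold_sample_difference(int32_t sample_difference, int bits)
{
    assert(bits >= 1 && bits <= 17);
    int32_t half = (int32_t)1 << (bits - 1);
    uint32_t mask = ((uint32_t)1 << bits) - 1;
    uint32_t wrapped = (uint32_t)(sample_difference + half) & mask;
    return (int32_t)wrapped - half;
}

/* Assumed decoder-side counterpart (hypothetical helper): the Sample is the
   prediction plus the folded difference, reduced modulo 2^bits. */
int32_t recover_sample(int32_t prediction, int32_t coder_input, int bits)
{
    assert(bits >= 1 && bits <= 17);
    uint32_t mask = ((uint32_t)1 << bits) - 1;
    return (int32_t)((uint32_t)(prediction + coder_input) & mask);
}
```

For 8-bit Samples, a difference of 200 folds to -56, and adding -56 to a prediction of 0 modulo 256 gives back 200.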
@@ -836,21 +862,21 @@ To encode scalar integers, it would be possible to encode each bit separately and use the past bits as context. However, that would mean 255 contexts per 8-bit Symbol, which is not only a waste of memory but also requires more past data to reach a reasonably good estimate of the probabilities. Alternatively, assuming a Laplacian distribution and only dealing with its variance and mean (as in Huffman coding) would also be possible; however, for maximum flexibility and simplicity, the chosen method uses a single Symbol to encode if a number is 0, and if not, encodes the number using its exponent, mantissa and sign. The exact contexts used are best described by - Figure 19. + Figure 21. int get_symbol(RangeCoder *c, uint8_t *state, int is_signed) { if (get_rac(c, state + 0)) { return 0; } int e = 0; while (get_rac(c, state + 1 + min(e, 9))) { //1..10 e++; } @@ -864,46 +890,52 @@ return a; } if (get_rac(c, state + 11 + min(e, 10))) { //11..21 return -a; } else { return a; } } - Figure 19: A pseudo-code description of the contexts of Range Non + Figure 21: A pseudo-code description of the contexts of Range Non Binary Values. "get_symbol" is used for the read out of "sample_difference" - indicated in Figure 8. + indicated in Figure 10. "get_rac" returns a boolean, computed from the bytestream as - described in Figure 9 as a formula and in Figure 18 as pseudo-code. + described in Figure 11 as a formula and in Figure 20 as pseudo-code. 3.8.1.3. Initial Values for the Context Model When "keyframe" (see Section 4.4) value is 1, all Range coder state variables are set to their initial state. 3.8.1.4. State Transition Table + In this mode a State Transition Table is used, indicating which + state the decoder will move to, based on the current state and the + value extracted from Figure 20.
+ one_state_(i) = default_state_transition_(i) + state_transition_delta_(i) - - Figure 20 + Figure 22 zero_state_(i) = 256 - one_state_(256-i) - Figure 21 + Figure 23 3.8.1.5. default_state_transition + + By default, the following State Transition Table is used: + 0, 0, 0, 0, 0, 0, 0, 0, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, @@ -929,21 +961,21 @@ 241,242,243,244,245,246,247,248,248, 0, 0, 0, 0, 0, 0, 0, 3.8.1.6. Alternative State Transition Table The alternative state transition table has been built using iterative minimization of frame sizes and generally performs better than the default. To use it, the "coder_type" (see Section 4.2.3) MUST be set to 2 and the difference to the default MUST be stored in the "Parameters", see Section 4.2. The reference implementation of FFV1 - in FFmpeg uses Figure 22 by default at the time of this writing when + in FFmpeg uses Figure 24 by default at the time of this writing when Range coding is used. 0, 10, 10, 10, 10, 16, 16, 16, 28, 16, 16, 29, 42, 49, 20, 49, 59, 25, 26, 26, 27, 31, 33, 33, 33, 34, 34, 37, 67, 38, 39, 39, 40, 40, 41, 79, 43, 44, 45, 45, 48, 48, 64, 50, 51, 52, 88, 52, 53, 74, 55, 57, 58, 58, 74, 60,101, 61, 62, 84, 66, 66, 68, 69, @@ -964,21 +996,21 @@ 175,189,179,181,186,183,192,185,200,187,191,188,190,197,193,196, 197,194,195,196,198,202,199,201,210,203,207,204,205,206,208,214, 209,211,221,212,213,215,224,216,217,218,219,220,222,228,223,225, 226,224,227,229,240,230,231,232,233,234,235,236,238,239,237,242, 241,243,242,244,245,246,247,248,249,250,251,252,252,253,254,255, - Figure 22: Alternative state transition table for Range coding. + Figure 24: Alternative state transition table for Range coding. 3.8.2. 
Golomb Rice Mode The end of the bitstream of the Frame is padded with 0-bits until the bitstream contains a multiple of 8 bits. 3.8.2.1. Signed Golomb Rice Codes This coding mode uses Golomb Rice codes. The VLC is split into two parts. The prefix stores the most significant bits and the suffix @@ -987,30 +1019,30 @@ int get_ur_golomb(k) { for (prefix = 0; prefix < 12; prefix++) { if (get_bits(1)) { return get_bits(k) + (prefix << k); } } return get_bits(bits) + 11; } - Figure 23: A pseudo-code description of the read of an unsigned + Figure 25: A pseudo-code description of the read of an unsigned integer in Golomb Rice mode. int get_sr_golomb(k) { v = get_ur_golomb(k); if (v & 1) return - (v >> 1) - 1; else return (v >> 1); } - Figure 24: A pseudo-code description of the read of a signed + Figure 26: A pseudo-code description of the read of a signed integer in Golomb Rice mode. 3.8.2.1.1. Prefix +================+=======+ | bits | value | +================+=======+ | 1 | 0 | +----------------+-------+ | 01 | 1 | @@ -1021,20 +1053,24 @@ +----------------+-------+ | 0000 0000 001 | 10 | +----------------+-------+ | 0000 0000 0001 | 11 | +----------------+-------+ | 0000 0000 0000 | ESC | +----------------+-------+ Table 1 + "ESC" is an ESCape Symbol to indicate that the Symbol to be stored is + too large for normal storage and that an alternate storage method is + used. + 3.8.2.1.2. Suffix +=========+========================================+ +=========+========================================+ | non ESC | the k least significant bits MSB first | +---------+----------------------------------------+ | ESC | the value - 11, in MSB first order | +---------+----------------------------------------+ Table 2 @@ -1309,21 +1345,21 @@ "context_count[ i ]" indicates the count of contexts for Quantization Table Set "i". "context_count[ i ]" MUST be less than or equal to 32768. 4.2. 
Parameters The "Parameters" section contains significant characteristics about the decoding configuration used for all instances of Frame (in FFV1 version 0 and 1) or the whole FFV1 bitstream (other versions), including the stream version, color configuration, and quantization - tables. Figure 25 describes the contents of the bitstream. + tables. Figure 27 describes the contents of the bitstream. "Parameters" has its own initial states, all set to 128. pseudo-code | type --------------------------------------------------------------|----- Parameters( ) { | version | ur if (version >= 3) { | micro_version | ur } | @@ -1358,21 +1394,21 @@ initial_state_delta[ i ][ j ][ k ] | sr } | } | } | } | ec | ur intra | ur } | } | - Figure 25: A pseudo-code description of the bitstream contents. + Figure 27: A pseudo-code description of the bitstream contents. CONTEXT_SIZE is 32. 4.2.1. version "version" specifies the version of the FFV1 bitstream. Each version is incompatible with other versions: decoders SHOULD reject FFV1 bitstreams due to an unknown version. @@ -1616,26 +1652,26 @@ Table 13 4.2.15. initial_state_delta "initial_state_delta[ i ][ j ][ k ]" indicates the initial Range coder state, it is encoded using "k" as context index and pred = j ? initial_states[ i ][j - 1][ k ] : 128 - Figure 26 + Figure 28 initial_state[ i ][ j ][ k ] = ( pred + initial_state_delta[ i ][ j ][ k ] ) & 255 - Figure 27 + Figure 29 4.2.16. ec "ec" indicates the error detection/correction type. +=======+=================================================+ | value | error detection/correction type | +=======+=================================================+ | 0 | 32-bit CRC in "ConfigurationRecord" | +-------+-------------------------------------------------+ @@ -2184,60 +2220,40 @@ a slice in the previous Frame, except if "reset_contexts" is 1. 6. 
Security Considerations Like any other codec, (such as [RFC6716]), FFV1 should not be used with insecure ciphers or cipher-modes that are vulnerable to known plaintext attacks. Some of the header bits as well as the padding are easily predictable. Implementations of the FFV1 codec need to take appropriate security - considerations into account, as outlined in [RFC4732]. It is - extremely important for the decoder to be robust against malicious - payloads. Malicious payloads MUST NOT cause the decoder to overrun - its allocated memory or to take an excessive amount of resources to - decode. The same applies to the encoder, even though problems in + considerations into account. Those related to denial of service are + outlined in Section 2.1 of [RFC4732]. It is extremely important for + the decoder to be robust against malicious payloads. Malicious + payloads MUST NOT cause the decoder to overrun its allocated memory + or to take an excessive amount of resources to decode. An overrun in + allocated memory could lead to arbitrary code execution by an + attacker. The same applies to the encoder, even though problems in encoders are typically rarer. Malicious video streams MUST NOT cause the encoder to misbehave because this would allow an attacker to attack transcoding gateways. A frequent security problem in image and video codecs is failure to check for integer overflows. An example is allocating "frame_pixel_width * frame_pixel_height" in Pixel count computations without considering that the multiplication result may have overflowed the arithmetic types range. The range coder could, if implemented naively, read one byte over the end. The implementation MUST ensure that no read outside allocated and initialized memory occurs. None of the content carried in FFV1 is intended to be executable. 
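The integer overflow problem mentioned above for "frame_pixel_width * frame_pixel_height" can be guarded against with a division check before the multiplication. The following C sketch is only one possible approach under stated assumptions: the function name "checked_pixel_count" and the caller-chosen "limit" parameter are hypothetical and do not come from this document or from any existing implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 and stores width * height in *count when the product does not
   exceed limit; returns 0 for zero dimensions or when it would exceed it.
   The comparison is done by division, so no intermediate value overflows. */
int checked_pixel_count(uint32_t frame_pixel_width,
                        uint32_t frame_pixel_height,
                        uint64_t limit, uint64_t *count)
{
    assert(count != NULL);
    if (frame_pixel_width == 0 || frame_pixel_height == 0) {
        return 0;
    }
    if (frame_pixel_width > limit / frame_pixel_height) {
        return 0;   /* product would be larger than limit */
    }
    *count = (uint64_t)frame_pixel_width * frame_pixel_height;
    return 1;
}
```

A decoder could choose "limit" from the width of the type it later uses for allocation sizes, and treat a return value of 0 as an invalid bitstream rather than continuing.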
- The reference implementation [REFIMPL] contains no known buffer - overflow or cases where a specially crafted packet or video segment - could cause a significant increase in CPU load. - - The reference implementation [REFIMPL] was validated in the following - conditions: - - * Sending the decoder valid packets generated by the reference - encoder and verifying that the decoder's output matches the - encoder's input. - - * Sending the decoder packets generated by the reference encoder and - then subjected to random corruption. - - * Sending the decoder random packets that are not FFV1. - - In all of the conditions above, the decoder and encoder was run - inside the [VALGRIND] memory debugger as well as clangs address - sanitizer [Address-Sanitizer], which track reads and writes to - invalid memory regions as well as the use of uninitialized memory. - There were no errors reported on any of the tested conditions. - 7. IANA Considerations The IANA is requested to register the following values: 7.1. Media Type Definition This registration is done using the template defined in [RFC6838] and following [RFC4855]. Type name: video @@ -2354,46 +2369,53 @@ 10. Informative References [Address-Sanitizer] The Clang Team, "ASAN AddressSanitizer website", undated, . [AVI] Microsoft, "AVI RIFF File Reference", undated, . + [FFV1GO] Buitenhuis, D., "FFV1 Decoder in Go", 2019, + . + [HuffYUV] Rudiak-Gould, B., "HuffYUV", December 2003, . [I-D.ietf-cellar-ffv1] Niedermayer, M., Rice, D., and J. Martinez, "FFV1 Video Coding Format Version 0, 1, and 3", Work in Progress, - Internet-Draft, draft-ietf-cellar-ffv1-17, 21 August 2020, - . + Internet-Draft, draft-ietf-cellar-ffv1-18, 7 October 2020, + . [ISO.14495-1.1999] International Organization for Standardization, "Information technology -- Lossless and near-lossless compression of continuous-tone still images: Baseline", December 1999. 
[ISO.14496-10.2014] International Organization for Standardization, "Information technology -- Coding of audio-visual objects -- Part 10: Advanced Video Coding", September 2014. [ISO.14496-12.2015] International Organization for Standardization, "Information technology -- Coding of audio-visual objects -- Part 12: ISO base media file format", December 2015. + [MediaConch] + MediaArea.net, "MediaConch", 2018, + . + [NUT] Niedermayer, M., "NUT Open Container Format", December 2013, . [range-coding] Martin, G. N. N., "Range encoding: an algorithm for removing redundancy from a digitised message", Proceedings of the Conference on Video and Data Recording. Institution of Electronic and Radio Engineers, Hampshire, England, July 1979. @@ -2433,20 +2455,62 @@ This appendix is informative. Some bitstreams were found with 40 extra bits corresponding to "error_status" and "slice_crc_parity" in the "reserved" bits of "Slice()". Any revision of this specification SHOULD care about avoiding to add 40 bits of content after "SliceContent" if "version" == 0 or "version" == 1. Else a decoder conforming to the revised specification could not distinguish between a revised bitstream and such buggy bitstream in the wild. +Appendix C. FFV1 Implementations + + This appendix provides references to a few notable implementations of + FFV1. + +C.1. FFmpeg FFV1 Codec + + This reference implementation [REFIMPL] contains no known buffer + overflow or cases where a specially crafted packet or video segment + could cause a significant increase in CPU load. + + The reference implementation [REFIMPL] was validated in the following + conditions: + + * Sending the decoder valid packets generated by the reference + encoder and verifying that the decoder's output matches the + encoder's input. + + * Sending the decoder packets generated by the reference encoder and + then subjected to random corruption. + + * Sending the decoder random packets that are not FFV1. 
+ + In all of the conditions above, the decoder and encoder were run + inside the [VALGRIND] memory debugger as well as clang's address + sanitizer [Address-Sanitizer], which track reads and writes to + invalid memory regions as well as the use of uninitialized memory. + There were no errors reported on any of the tested conditions. + +C.2. FFV1 Decoder in Go + + An FFV1 decoder [FFV1GO] was written in Go by Derek Buitenhuis during + the work to develop this document. + +C.3. MediaConch + + The developers of the MediaConch project [MediaConch] created an + independent FFV1 decoder as part of that project to validate FFV1 + bitstreams. This work led to the discovery of three conflicts + between existing FFV1 implementations and this document without the + added exceptions. + Authors' Addresses Michael Niedermayer Email: michael@niedermayer.cc Dave Rice Email: dave@dericed.com