IPSec Working Group T. Krovetz, Intel INTERNET-DRAFT J. Black, UNR Expires April 2001 S. Halevi, IBM A. Hevia, U.C. San Diego H. Krawczyk, Technion P. Rogaway, U.C. Davis October 2000 UMAC: Message Authentication Code using Universal Hashing <draft-krovetz-umac-00.txt> Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Abstract This specification describes how to generate an authentication tag (also called a "MAC") using the UMAC message authentication code. UMAC is designed to be very fast to compute, in software, on contemporary processors. Measured speeds are as low as 1.0 cycles per byte. The heart of UMAC is a universal hash function, UHASH, which relies on addition and multiplication of 16-bit, 32-bit, or 64-bit numbers, operations well-supported by contemporary machines. To generate the authentication tag on a given message, UHASH is applied to the message and key to produce a short, fixed-length, hash value, and this hash value is then XOR-ed with a key-derived pseudorandom pad. UMAC enjoys a rigorous security analysis and its only "cryptographic" use is a block cipher, AES, to generate the pseudorandom pads and internal key material. Krovetz et al. Expires April 2001 [Page 0]

INTERNET-DRAFT UMAC October 2000 Table of Contents 1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . 2 1.1 Organization . . . . . . . . . . . . . . . . . . . . . . . . 5 2 Named parameter sets: UMAC16 and UMAC32 . . . . . . . . . . . . . 5 2.1 Named parameters . . . . . . . . . . . . . . . . . . . . . . 5 2.2 Alternative instantiations . . . . . . . . . . . . . . . . . 6 2.3 Naming convention . . . . . . . . . . . . . . . . . . . . . 7 3 Notation and basic operations . . . . . . . . . . . . . . . . . . 7 3.1 Operations on strings . . . . . . . . . . . . . . . . . . . 8 3.2 Operations on integers . . . . . . . . . . . . . . . . . . . 10 3.3 String-Integer conversion operations . . . . . . . . . . . . 10 3.4 Mathematical operations on strings . . . . . . . . . . . . . 11 4 Key and pad derivation functions . . . . . . . . . . . . . . . . 12 4.1 KDF: Key derivation function . . . . . . . . . . . . . . . . 12 4.2 PDF: Pad-derivation function . . . . . . . . . . . . . . . . 13 5 UHASH-32: Universal hash function for a 32-bit word size . . . . 15 5.1 NH-32: NH hashing with a 32-bit word size . . . . . . . . . 16 5.2 L1-HASH-32: First-layer hash . . . . . . . . . . . . . . . . 17 5.3 POLY: Polynomial hash . . . . . . . . . . . . . . . . . . . 19 5.4 L2-HASH-32: Second-layer hash . . . . . . . . . . . . . . . 20 5.5 L3-HASH-32: Third-layer hash . . . . . . . . . . . . . . . . 22 5.6 UHASH-32: Three-layer universal hash . . . . . . . . . . . . 23 6 UHASH-16: Universal hash function for a 16-bit word size . . . . 24 6.1 NH-16: NH hashing with a 16-bit word size . . . . . . . . . 24 6.2 L1-HASH-16: First-layer hash . . . . . . . . . . . . . . . . 25 6.3 L2-HASH-16: Second-layer hash . . . . . . . . . . . . . . . 27 6.4 L3-HASH-16: Third-layer hash . . . . . . . . . . . . . . . . 28 6.5 UHASH-16: Three-layer universal hash . . . . . . . . . . . . 29 7 UMAC tag generation . . . . . . . . . . . . . . . . . . . . . . . 30 7.1 Interface . . . . . . . . . . . . . . . . . . . . . . . . . 30 7.2 Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . 30 8 Security considerations . . . . . . . . . . . . . . . . . . . . . 31 8.1 Resistance to cryptanalysis . . . . . . . . . . . . . . . . 31 8.2 Tag lengths and forging probability . . . . . . . . . . . . 31 8.3 Selective-assurance authentication . . . . . . . . . . . . . 33 8.4 Nonce considerations . . . . . . . . . . . . . . . . . . . . 34 8.5 Guarding against replay attacks . . . . . . . . . . . . . . 35 9 Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . . . 36 10 References . . . . . . . . . . . . . . . . . . . . . . . . . . . 36 11 Author contact information . . . . . . . . . . . . . . . . . . . 37 A Suggested application programming interface (API) . . . . . . . . 38 B Reference code and test vectors . . . . . . . . . . . . . . . . . 39 Krovetz et al. Expires April 2001 [Page 1]

INTERNET-DRAFT UMAC October 2000 1 Introduction This specification describes how to generate an authentication tag (also called a "MAC") using the UMAC message authentication code. Typically the authentication tag will be transmitted along with a message and a nonce to allow the receiver of the message to verify the message's authenticity. Generation and verification of the authentication tag depends on the message, the nonce, and on a secret key (typically, shared by sender and receiver). UMAC is designed to be very fast to compute, in software, on contemporary processors. The heart of UMAC is a universal hash function, UHASH, which relies on addition and multiplication of 16-bit, 32-bit, and 64-bit numbers. These operations are supported well by contemporary machines. For many applications, especially ones with short-lived authentication needs, sufficient speed is already obtained by algorithms such as HMAC-SHA1 [2, 9] or the CBC-MAC of a block cipher [1, 8]. But for the most speed-demanding applications, UMAC may be a better choice: An optimized implementation of UMAC can achieve peak performance which is about an order of magnitude faster than what can be achieved with HMAC or CBC-MAC. Moreover, UMAC offers a tradeoff between forging probability and speed (it is possible to trade forging probability for speed). UMAC has been designed so that computing the prefix of a tag can be done faster than computing the entire tag. This feature allows for a receiver to verify the authenticity of a message to various levels of assurance depending on its needs and resources. Finally, UMAC enjoys better analytical security properties than many other constructions. Closely associated to this specification are the papers [3, 4, 10, 11]. See those papers for descriptions of the ideas which underlie this algorithm, for performance data, and for proofs of the correctness and maximal forging probability of UMAC. The UMAC algorithms described in the papers [3, 4] are "parameterized". This means that various low-level choices, like the endian convention and the underlying cryptographic primitive, have not been fixed. One must choose values for these parameters before the authentication tag generated by UMAC (for a given message, key, and nonce) becomes fully-defined. In this document we provide two collections of parameter settings, and have named the sets UMAC16 and UMAC32. The parameter sets have been chosen based on experimentation and provide good performance on a wide variety of processors. UMAC16 is designed to excel on processors which provide small-scale SIMD parallelism of the type found in Intel's MMX and Motorola's AltiVec instruction sets, while UMAC32 is designed to do well on processors Krovetz et al. Expires April 2001 [Page 2]

INTERNET-DRAFT UMAC October 2000 with good 32- and 64- bit support. UMAC32 may take advantage of SIMD parallelism in future processors. UMAC has been designed to allow implementations which accommodate "on-line" authentication. This means that pieces of the message may be presented to UMAC at different times (but in correct order) and an on-line implementation will be able to process the message correctly without the need to buffer more than a few dozen bytes of the message. For simplicity, the algorithms in this specification are presented as if the entire message being authenticated were available at once. The ideas which underlie UMAC go back to Wegman and Carter [12]. The sender and receiver share a secret key (the MAC key) which determines: * The key for a "universal hash function". This hash function is "non-cryptographic", in the sense that it does not need to have any cryptographic "hardness" property. Rather, it needs to satisfy some combinatorial property, which can be proven to hold without relying on unproven hardness assumptions. The concept of a universal hash function (family) is due to [5]. * The key for a pseudorandom function. This is where one needs a cryptographic hardness assumption. The pseudorandom function may be obtained (for example) from a block cipher or cryptographic hash function. The concept of a pseudorandom function (family) is due to [6]. To authenticate a message, Msg, one first applies the universal hash function, resulting in a string which is typically much shorter than the original message. The pseudorandom function is applied to a nonce, and the result is used in the manner of a Vernam cipher: the authentication tag is the xor of the output from the hash function and the output from the pseudorandom function. Thus, an authentication tag is generated as AuthTag = f(Nonce) xor h(Msg). Here f is the pseudorandom function shared between the sender and the receiver, and h is a universal hash function shared by the sender and the receiver. In UMAC, a shared key is used to key the pseudorandom function f, and then f is used for both tag generation and internally to generate all of the bits needed by the universal hash function. For a general discussion of the speed and assurance advantages of this approach see, for example, [3, 7]. The universal hash function that we use is called UHASH. It combines Krovetz et al. Expires April 2001 [Page 3]

INTERNET-DRAFT UMAC October 2000 several software-optimized algorithms into a multi-layered structure. The algorithm is moderately complex. Some of this complexity comes from extensive speed optimizations. For the pseudorandom function we use the block cipher of the Advanced Encryption Standard (AES). (At the time of this working draft, the AES definition process is still in progress. Here AES refers to the final blok cipher defined by this process.) Any block cipher with the same block-length (128 bits) and key-length (128 bits) could trivially be substituted in place of what we call AES. With slightly more effort one can define UMAC using a pseudorandom function other than a block cipher. One unusual feature of UMAC is that authentication-tag generation depends on a nonce (in addition to depending on the message and key) It is imperative that the nonce not be reused when generating authentication tags under the same key. Thus the nonce will normally be implemented by a counter, though any other way to achieve a non- repeating value (or almost certainly non-repeating value) is acceptable. This document specifies the procedure for generating the authentication tag from the message, key and nonce. The exact way in which the message, nonce and authentication tag are transmitted between sender and receiver is not specified here. It is the responsibility of the particular applications using UMAC to specify how the message, nonce and tag are transmitted. For example, an application may choose to send the three values concatenated by some encoding scheme while others may choose not to transmit the nonce at all if it is known to both parties (e.g., when the nonce is a shared state used to detect replay of messages), or to send only part of the bits of the nonce. Section 8 discusses security considerations that are important for the proper understanding and use of UMAC. To the authors' knowledge no ideas utilized in UMAC have been or will be patented. To the best of the authors' knowledge, it should be possible to use UMAC immediately, without any intellectual property concerns. Public-domain reference code for UMAC is available from the UMAC homepage: http://www.cs.ucdavis.edu/~rogaway/umac/ Other information, like timing data and papers, are distributed from the same URL. Krovetz et al. Expires April 2001 [Page 4]

INTERNET-DRAFT UMAC October 2000 1.1 Organization The rest of this document is organized as follows: In Section 2 parameters of the named parameter sets UMAC16 and UMAC32 are described. In Section 3 we introduce the basic notations used throughout the rest of the document. Section 4 describes the methods used for generating the Vernam pad and the pseudorandom strings needed internally for hashing. In Sections 5 and 6 the universal hash function is described. Finally, in Section 7 we describe how all these components fit together in the UMAC construction. Some readers may prefer to read sections 4-7 backwards, in order to get a top-down description. Section 8 describes some security considerations in the use of UMAC. 2 Named parameter sets: UMAC16 and UMAC32 As described in [3, 4], a concrete instantiation of UMAC requires the setting of many parameters. We have chosen two sets of values for all of these parameters which allow for good performance on a wide variety of processors. For maximum performance we offer UMAC16 which is designed to exploit the vector-parallel instructions on the Intel MMX and Motorola AltiVec instruction sets. For good performance on processors which support 32- and 64-bit quantities well, we offer UMAC32. 2.1 Named parameters Throughout the algorithms described in this document, we have integrated most of the parameters required for a concrete UMAC instantiation as unnamed numeric constants. However, we have named six parameters and assign them the following values depending on whether one wishes to use UMAC16 or UMAC32. UMAC16 UMAC32 ------ ------ WORD-LEN 2 4 UMAC-OUTPUT-LEN 8 8 L1-KEY-LEN 1024 1024 UMAC-KEY-LEN 16 16 ENDIAN-FAVORITE LITTLE LITTLE L1-OPERATIONS-SIGN SIGNED UNSIGNED Here we give a brief explanation of the role each named parameter plays. Krovetz et al. Expires April 2001 [Page 5]

INTERNET-DRAFT UMAC October 2000 WORD-LEN: Specifies the size in bytes of a "word". UMAC will be significantly faster in execution if the executing machine supports well certain operations on datatypes of this size. Note that WORD-LEN is not necessarily the native wordsize of the target machine (and on some machines a smaller value turns out to be preferable). UMAC-OUTPUT-LEN: Specifies the length of the authentication tag generated by UMAC, in bytes. L1-KEY-LEN: Specifies the "block length," in bytes, on which the hash-function initially operates. This much storage (and then some) will be needed in the run-time environment for UMAC's internal keys. UMAC-KEY-LEN: Specifies the length in bytes of the user-sup- plied UMAC key. ENDIAN-FAVORITE: Specifies which endian-orientation will be fol- lowed in the reading of data to be hashed. This need not be equal to the native endianess of any specific machine running UMAC. L1-OPERATIONS-SIGN: Specifies whether the strings manipulated in the hash-function are to be initially consid- ered as signed or unsigned integers. 2.2 Alternative instantiations Although this document only specifies two named parameter sets, the named parameters could be altered to suit specific authentication needs which are not adequately served by either UMAC16 or UMAC32. Below, we list alternatives that are supported by this specification for each of the named parameters. WORD-LEN ::= 2 | 4 UMAC-OUTPUT-LEN ::= 1 | 2 | ... | 31 | 32 L1-KEY-LEN ::= 32 | 64 | 128 | 256 | ... | 2^28 UMAC-KEY-LEN ::= 16 | 32 ENDIAN-FAVORITE ::= BIG | LITTLE L1-OPERATIONS-SIGN ::= SIGNED | UNSIGNED Roughly speaking, doubling UMAC-OUTPUT-LEN approximately doubles execution time and squares (ie. decreases) the probability of MAC Krovetz et al. Expires April 2001 [Page 6]

INTERNET-DRAFT UMAC October 2000 forgery. Setting ENDIAN-FAVORITE to BIG causes UMAC to perform better on big-endian processors rather than little-endian processors. Setting L1-OPERATIONS-SIGN to UNSIGNED slightly increases UMAC security at the expense of complicating implementations on systems which do not support unsigned integers well. This effectively disallows the use of Intel's MMX instructions which only support signed integers. Finally, increasing L1-KEY-LEN tends to speed tag generation on large messages, but requires more memory for processing and could potentially slow the processor by overflowing its cache. 2.3 Naming convention A concise shorthand may be used to specify an instance of UMAC. The word "UMAC" followed by up to six parameters specifies unambiguously an instance of UMAC. If only a prefix of the six parameters are written, it is implicitly specified that those missing parameters take on default values listed below. The format of the shorthand is "UMAC-w/l/n/k/s/e", and the meaning of the letters (and their defaults) is as follows: w = WORD-LEN (4) l = UMAC-OUTPUT-LEN (8) n = L1-KEY-LEN (1024) k = UMAC-KEY-LEN (16) s = L1-OPERATIONS-SIGN (UNSIGNED) e = ENDIAN-FAVORITE (LITTLE) Some examples UMAC-4/8/1024/16/UNSIGNED/LITTLE (Same as named set "UMAC32" ) UMAC-2/8/1024/16/SIGNED/LITTLE (Same as named set "UMAC16" ) UMAC-4/12 ("UMAC32" with 96-bit output) UMAC-2/8/4096 ("UMAC16" with 4K L1-key and) (unsigned L1-OPERATIONS ) 3 Notation and basic operations The specification of UMAC involves the manipulation of both strings and numbers. String variables are denoted with initial capitals (upper-case), whereas numeric variables are denoted in all lower- case. Global parameters are denoted in all capital letters. Simple functions, like those for string-length and string-xor, are written with all lower-case, while the algorithms of UMAC are named in all upper-case. Whenever a variable is followed by an underscore ("_"), the Krovetz et al. Expires April 2001 [Page 7]

INTERNET-DRAFT UMAC October 2000 underscore is intended to denote a subscript, with the subscripted expression needing to be evaluated to resolve the meaning of the variable. For example, if i=2, then M_{2 * i} refers to the variable M_4. We now define some basic operations for manipulating strings and numbers, and for converting between the two. 3.1 Operations on strings In this specification, we view the messages to be hashed (as well as the keys used for hashing) as strings of bytes. A "byte" is an 8-bit string. The algorithms have been designed so that they are easily extendable to allow arbitrary bit-strings, if necessary. We use the following notation to manipulate these strings. length(S): The length of string S in bytes. zeroes(n): The string made of n zero-bytes. S xor T: The string which is the bitwise exclusive-or of S and T. Strings S and T must have the same length. S and T: The string which is the bitwise conjunction of S and T. Strings S and T must have the same length. S[i]: The i-th byte of the string S (indices begin at 1). S[i..j]: The substring of S consisting of bytes i through j. S || T: The string S concatenated with string T. zeropad(S,n): The string S, padded with zero-bytes to the nearest non-zero multiple of n bytes. Formally, zeropad(S,n) = S || zeroes(i), where i is the smallest nonnegative integer such that S || zeroes(i) is non-empty and n divides length(S)+i. 3.1.1 ENDIAN-SWAP: Adjusting endian orientation This routine is used to make the data input to UMAC conform to the ENDIAN-FAVORITE global parameter. Krovetz et al. Expires April 2001 [Page 8]

INTERNET-DRAFT UMAC October 2000 3.1.1.1 Discussion The most time consuming portion of many UMAC computations involves the reading of key and message data from memory. Because big- and little-endian computers will read these bytes differently, specifying a particular endian-orientation for UMAC could have significant performance ramifications. If necessary, the key-bytes can be preprocessed once during key setup to eliminate the need for their reorientation during performance-critical tag generation. But, message data presumably cannot be preprocessed. Any reorientation needed for each message must be done during tag generation, introducing a significant penalty to computers whose native endian- orientation is opposite to that specified for UMAC. Therefore, UMAC defines a parameter, ENDIAN-FAVORITE, which allows UMAC to be specified to favor big- or little-endian memory conventions. If the parameter is set to favor little-endian computers, then we specify the reversal of the bytes of every word in the input message using the following support function. By reversing the data in the specification, an implementation on a little-endian machine would in fact do nothing but read the input data using native-endian word loads. The loads would automatically reverse the bytes within each word, fulfilling the requirements of the specification. Any other endian reorientation needed to comply with the specification requires an insignificant amount of time during each tag calculation. 3.1.1.2 Interface Function Name: ENDIAN-SWAP Input: S, string with length divisible by WORD-LEN bytes. Output: T, string S with each word endian-reversed. 3.1.1.3 Algorithm Compute T using the following algorithm. // // Break S into word-size chunks // n = length(S) / WORD-LEN Let S_1, S_2, ..., S_n be strings of length WORD-LEN bytes so that S_1 || S_2 || .. || S_n = S. // // Byte-reverse each chunk, and build-up T Krovetz et al. Expires April 2001 [Page 9]

INTERNET-DRAFT UMAC October 2000 // T = <empty string> for i = 1 to n do Let W_1, W_2, ..., W_{WORD-LEN} be bytes so that W_1 || W_2 || ... || W_{WORD-LEN} = S_i SReversed_i = W_{WORD-LEN} || W_{WORD-LEN - 1} || ... || W_1 T = T || SReversed_i Return T 3.2 Operations on integers In this specification, we generally use standard notation for mathematical operations, such as "*" for multiplication, "+" for addition and "mod" for modular reduction. Some less standard notations are defined here. a^i: The integer a raised to the integer i-th power. lg a: The base-2 logarithm of integer a. floor(x): The largest integer less than or equal to x. ceil(x): The smallest integer greater than or equal to x. prime(n): The largest prime number less than 2^n. The prime numbers used in UMAC are: +-----+--------------------+---------------------------------------+ | x | prime(x) [Decimal] | prime(x) [Hexadecimal] | +-----+--------------------+---------------------------------------+ | 19 | 2^19 - 1 | 0x0007FFFF | | 32 | 2^32 - 5 | 0xFFFFFFFB | | 36 | 2^36 - 5 | 0x0000000F FFFFFFFB | | 64 | 2^64 - 59 | 0xFFFFFFFF FFFFFFC5 | | 128 | 2^128 - 159 | 0xFFFFFFFF FFFFFFFF FFFFFFFF FFFFFF61 | +-----+--------------------+---------------------------------------+ 3.3 String-Integer conversion operations We convert between strings and integers using the following functions. Each function treats initial bits as more significant than later ones. Krovetz et al. Expires April 2001 [Page 10]

INTERNET-DRAFT UMAC October 2000 bit(S,n): Returns the integer 1 if the n-th bit of the string S is 1, otherwise returns the integer 0 (indices begin at 1). Here n must be between 1 and the bit- length of S. str2uint(S): The non-negative integer whose binary representation is the string S. More formally, if S is t bits long then str2uint(S) = 2^{t-1} * bit(S,1) + 2^{t-2} * bit(S,2) + ... + 2^{1} * bit(S,t-1) + bit(S,t). uint2str(n,i): The i-byte string S such that str2uint(S) = n. If no such string exists then uint2str(n,i) is unde- fined. str2sint(S): The integer whose binary representation in two's- complement is the string S. More formally, if S is t bits long then str2sint(S) = -2^{t-1} * bit(S,1) + 2^{t-2} * bit(S,2) + ... + 2^{1} * bit(S,t-1) + bit(S,t). sint2str(n,i): The i-byte string S such that str2sint(S) = n. If no such string exists then sint2str(n,i) is unde- fined. 3.4 Mathematical operations on strings One of the primary operations in the universal hashing part of UMAC is repeated application of addition and multiplication on strings. We use "+_n" and "*_n" to denote the string operations which are equivalent to addition and multiplication modulo 2^n, respectively. These operations correspond exactly with the addition and multiplication operations which are performed efficiently on registers by modern computers. So, when n is 16, 32 or 64, these operations can be preformed by computers very quickly. There are two interpretations of the operators depending on whether the strings are interpreted as signed or unsigned integers. The global parameter L1-OPERATIONS-SIGN determines which interpretation is made. If strings S and T are interpreted as signed integers (that is, L1-OPERATIONS-SIGN == SIGNED) then "S *_n T" as uint2str(str2sint(S) * str2sint(T) mod 2^n, n/8), and "S +_n T" as uint2str(str2sint(S) + str2sint(T) mod 2^n, n/8). Krovetz et al. Expires April 2001 [Page 11]

INTERNET-DRAFT UMAC October 2000 If strings S and T are interpreted as unsigned integers (that is, L1-OPERATIONS-SIGN == UNSIGNED) then we define "S *_n T" as uint2str(str2uint(S) * str2uint(T) mod 2^n, n/8), and "S +_n T" as uint2str(str2uint(S) + str2uint(T) mod 2^n, n/8). In any case, the number n must be divisible by 8. In this document we use S *_16 T, S *_32 T, S *_64 T, S +_32 T and S +_64 T, corresponding to multiplication of 2, 4 and 8 byte numbers, and the addition of 4 and 8 byte numbers. 4 Key and pad derivation functions UMAC, as described in this document, requires either a 16- or 32-byte key which is used with a key-derivation function (KDF) to produce pseudorandom bits needed within the universal hash function. 4.1 KDF: Key derivation function Stretching the user-supplied key into pseudorandom bits used internally by UMAC is done with a key-derivation function (KDF). In this section we define a KDF which is efficiently instantiated with a block cipher. The Advanced Encryption Standard (AES) is used in output-feedback mode to produce the required bits. If UMAC-KEY-LEN is 16, then the 128-bit key/128-bit block-length variant of AES is used, and if UMAC-KEY-LEN is 32, then the 256-bit key/128-bit block- length variant is used. The KDF requires an "index" parameter. Using the same key, but different indices, generates different pseudorandom outputs. 4.1.1 Interface Function Name: KDF Input: K, string of length UMAC-KEY-LEN bytes // key to AES index, a non-negative integer less than 256. numbytes, a positive integer. Output: Y, string of length numbytes bytes. Krovetz et al. Expires April 2001 [Page 12]

INTERNET-DRAFT UMAC October 2000 4.1.2 Algorithm Compute Y using the following algorithm. // // Calculate number of AES iterations, set indexed starting point // n = ceil(numbytes / 16) T = zeroes(15) || uint2str(index, 1) Y = <empty string> // // Build Y using AES in a feedback mode // for i = 1 to n do T = AES(K, T) Y = Y || T Y = Y[1..numbytes] Return Y 4.2 PDF: Pad-derivation function The Wegman-Carter MAC scheme used in UMAC requires the exclusive-or of a pseudorandom string with the output from the universal hash function. The pseudorandom string is obtained by applying a pad- derivation function (PDF) to a nonce which, for security reasons, must change with each authentication-tag computation. Nonces may be any number of bytes from 1 to 16, but all nonces in a single authentication session must be of equal length. In this section we define a PDF which is efficiently instantiated with a block cipher. Again we use AES with either 16- or 32-bytes keys depending on the value of UMAC-KEY-LEN. 4.2.1 Discussion The PDF output is exclusive-or'd with the result of the universal hash function. AES, however, may provide more or fewer bits per invocation than are needed for this purpose. For example, UMAC- OUTPUT-LEN is normally 8 bytes and AES produces an output of 16 bytes. It would save processing time if half of the AES output bits could be used to generate one tag, and then the second half of the same AES output could be used for the tag of the next message. For this reason, we include an optimization which allows the use of different substrings of the same AES output. This optimization is Krovetz et al. Expires April 2001 [Page 13]

INTERNET-DRAFT UMAC October 2000 effective only when nonces are sequential. We do so by using the low bits of the nonce as an index into the AES output, which is generated using the higher bits of the nonce which are not used for indexing. This speeds message authentication by reducing the average time spent by AES for each authentication. Note that if a counter-variable is used to exploit this optimization, and the variable is stored in memory, then the variable must be treated as big-endian. If UMAC- OUTPUT-LEN is larger than 16, then two AES invocations are required to produce a sufficient number of bits. 4.2.2 Interface Function Name: PDF Input: K, string of length UMAC-KEY-LEN bytes // key for AES Nonce, string of length 1 to 16 bytes. Output: Y, string of length UMAC-OUTPUT-LEN bytes. 4.2.3 Algorithm Compute Y using the following algorithm. // // Make Nonce 16 bytes by prepending zeroes // Nonce = Nonce || zeroes(16 - length(Nonce)) // // If one AES invocation is enough for more than one // PDF invocation. // if (UMAC-OUTPUT-LEN <= 8) then // // Compute number of index bits needed // i = floor(16 / UMAC-OUTPUT-LEN) numlowbits = floor(lg(i)) // // Extract index bits and zero low bits of Nonce // nlowbitsnum = str2uint(Nonce) mod 2^numlowbits Nonce = Nonce xor uint2str(nlowbitsnum, 16) Krovetz et al. Expires April 2001 [Page 14]

INTERNET-DRAFT UMAC October 2000 // // Generate subkey, AES and extract indexed substring // K' = KDF(K, 128, UMAC-KEY-LEN) T = AES(K', Nonce) Y = T[ nlowbitsnum * UMAC-OUTPUT-LEN + 1 .. (nlowbitsnum + 1) * UMAC-OUTPUT-LEN] else // // Repeated AES calls to build length // K_1 = KDF(K, 128, UMAC-KEY-LEN) K_2 = KDF(K, 129, UMAC-KEY-LEN) if (UMAC-OUTPUT-LEN <= 16) Y = AES(K_1, Nonce) else Y = AES(K_1, Nonce) || AES(K_2, Nonce) Y = Y[1..UMAC-OUTPUT-LEN] Return Y 5 UHASH-32: Universal hash function for a 32-bit word size UHASH is a keyed hash function, which takes as input a string of arbitrary length, and produces as output a string of fixed length (such as 8 bytes). The actual output length depends on the parameter UMAC-OUTPUT-LEN. UHASH has been shown to be epsilon-ASU ("Almost Strongly Universal"), where epsilon is a small (parameter-dependent) real number. Informally, saying that a keyed hash function is epsilon-ASU means that for any two distinct fixed input strings, the two outputs of the hash function with a random key "look almost like a pair of random strings". The number epsilon measures how non-random the output strings may be. For details, see [3, 4, 11]. UHASH has been designed to be fast by exploiting several architectural features of modern commodity processors. It was specifically designed for use in UMAC. But UHASH is useful beyond that domain, and can be easily adopted for other purposes. UHASH does its work in three layers. First, a hash function called NH [3] is used to compress input messages into strings which are typically many times smaller than the input message. Second, the compressed message is hashed with an optimized "polynomial hash Krovetz et al. Expires April 2001 [Page 15]

INTERNET-DRAFT UMAC October 2000 function" into a fixed-length 16-byte string. Finally, the 16-byte string is hashed using an "inner-product hash" into a string of length WORD-LEN bytes. These three layers are repeated (with a modified key) until the outputs total UMAC-OUTPUT-LEN bytes. Note: Because the repetitions of the three-layer scheme are independent (aside from sharing some internal key), it follows that each "word" of the final output can be computed independently. Hence, to compute a prefix of a UMAC tag, one can simply repeat the three-layer scheme fewer times. Thus, computing a prefix of the tag can be done significantly faster than computing the whole tag. 5.1 NH-32: NH hashing with a 32-bit word size The first of the three hash-layers that UHASH uses is the NH hash function [3]. More than any other part of UHASH, NH is designed to be fast on modern processors, because it is where the bulk of the UHASH work is done. The NH universal hash function hashes an input string M using a key K by considering M and K to be arrays of integers, each WORD-LEN bytes in length, and performing a sequence of arithmetic operations on them. See [3] for definitions, proofs and rationale relating to NH. The NH-32 algorithm is designed to perform well on processors which support well multiplications of 32-bit operands into 64-bit results. NH-32 is also designed to exploit the recent trend of including instructions for small-scale vector parallelism in uniprocessor CPUs. Intel's Streaming SIMD 2 instruction set is a good example of this trend. It supports an instruction, which multiplies two pairs of 32-bit operands into two 64-bit results, which can be used by UHASH-32 for accelerated hashing. To accommodate this parallelism, NH-32 accesses data-words in pairs which are 4 words (16 bytes) apart. 5.1.1 Interface Function Name: NH-32 Input: K, string of length L1-KEY-LEN bytes. M, string with length divisible by 32 bytes. Output: Y, string of length 8 bytes. Krovetz et al. Expires April 2001 [Page 16]

INTERNET-DRAFT UMAC October 2000 5.1.2 Algorithm Compute Y using the following algorithm. // // Break M and K into 4-byte chunks // t = length(M) / 4 Let M_1, M_2, ..., M_t be 4-byte strings so that M = M_1 || M_2 || .. || M_t. Let K_1, K_2, ..., K_t be 4-byte strings so that K_1 || K_2 || .. || K_t is a prefix of K. // // Perform NH hash on the chunks, pairing words for multiplication // which are 4 apart to accommodate vector-parallelism. // Y = zeroes(8) i = 1 while (i < t) do Y = Y +_64 ((M_{i+0} +_32 K_{i+0}) *_64 (M_{i+4} +_32 K_{i+4})) Y = Y +_64 ((M_{i+1} +_32 K_{i+1}) *_64 (M_{i+5} +_32 K_{i+5})) Y = Y +_64 ((M_{i+2} +_32 K_{i+2}) *_64 (M_{i+6} +_32 K_{i+6})) Y = Y +_64 ((M_{i+3} +_32 K_{i+3}) *_64 (M_{i+7} +_32 K_{i+7})) i = i + 8 Return Y 5.2 L1-HASH-32: First-layer hash To limit the length of key required in the first layer of hashing, L1-HASH-32 breaks the input message into chunks no longer than L1-KEY-LEN and NH hashes each with a key of the same length. 5.2.1 Discussion The NH hash function requires a key which is just as long as the message being hashed. To limit the amount of key used in the NH hashing layer, we use a key of fixed length (defined by the parameter L1-KEY-LEN), and process the message in chunks of this length (or less). The L1-HASH-32 algorithm takes an input message and breaks it into chunks of L1-KEY-LEN bytes (except the last chuck, which may be shorter and may need to be zero-padded to an appropriate length). Each chunk is hashed with NH-32, and the outputs from all the NH invocations are annotated with some length information and concatenated to produce the final L1-HASH-32 result. Krovetz et al. Expires April 2001 [Page 17]

INTERNET-DRAFT UMAC October 2000 If ENDIAN-FAVORITE is LITTLE, then each word in the input message is required to be endian reversed. 5.2.2 Interface Function Name: L1-HASH-32 Input: K, string of length L1-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length (8 * ceil(length(M)/L1-KEY-LEN)) bytes. 5.2.3 Algorithm Compute Y using the following algorithm. // // Break M into L1-KEY-LEN byte chunks (final chunk may be shorter) // t = ceil(length(M) / L1-KEY-LEN) Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. || M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t. // // For each chunk, except the last: endian-adjust, NH hash // and add bit-length. Use results to build Y. // Len = uint2str(L1-KEY-LEN * 8, 8) Y = <empty string> for i = 1 to t-1 do if (ENDIAN-FAVORITE == LITTLE) then // See endian discussion ENDIAN-SWAP(M_i) // in section 3.1.1 Y = Y || (NH-32(K, M_i) +_64 Len) // // For the last chunk: pad to 32-byte boundary, endian-adjust, // NH hash and add bit-length. Concatenate the result to Y. // Len = uint2str(length(M_t) * 8, 8) M_t = zeropad(M_t, 32) if (ENDIAN-FAVORITE == LITTLE) then ENDIAN-SWAP(M_t) Y = Y || (NH-32(K, M_t) +_64 Len) return Y Krovetz et al. Expires April 2001 [Page 18]

INTERNET-DRAFT UMAC October 2000 5.3 POLY: Polynomial hash The output from L1-HASH is a string which is shorter than, but still proportional to, that of its input. The POLY hash algorithm takes an arbitrary message and hashes it to a fixed length. 5.3.1 Discussion Polynomial hashing treats an input message as a sequence of coefficients of a polynomial, and the hash-key is the point at which this polynomial is evaluated. The security guarantee assured by polynomial hashing degrades linearly in the length of the message being hashed. If two messages of n words are hashed, then the probability they collide when hashed by POLY with a prime modulus of p is no more than n / p. For more information on the polynomial hashing schemes used in UMAC see [10]. The parameter 'wordbits' specifies the prime modulus used in the polynomial as well as the granularity (length of words) in which the input message should be broken. Because some strings of length wordbits are greater than prime(wordbits), a mechanism is needed to fix words which are not in the range 0 .. prime(wordbits) - 1. To this end, any word larger than 'maxwordrange' is split into two words guaranteed to be in range, and each is hashed by the polynomial hash. 5.3.2 Interface Function Name: POLY Input: wordbits, positive integer divisible by 8. maxwordrange, positive integer less than 2^wordbits. k, integer in the range 0 .. prime(wordbits) - 1. M, string with length divisible by (wordbits / 8) bytes. Output: y, integer in the range 0 .. prime(wordbits) - 1. 5.3.3 Algorithm Compute y using the following algorithm. // // Define constants used for fixing out-of-range words // wordbytes = wordbits / 8 Krovetz et al. Expires April 2001 [Page 19]

INTERNET-DRAFT UMAC October 2000 p = prime(wordbits) offset = 2^wordbits - p marker = p - 1 // // Break M into chunks of length wordbytes bytes // n = length(M) / wordbytes Let M_1, M_2, ..., M_n be strings of length wordbytes bytes so that M = M_1 || M_2 || .. || M_n // // For each input word, compare it with maxwordrange. If larger // then hash the words 'marker' and (m - offset), both in range. // y = 1 for i = 1 to n do m = str2uint(M_i) if (m >= maxwordrange) then y = (k * y + marker) mod p y = (k * y + (m - offset)) mod p else y = (k * y + m) mod p Return y 5.4 L2-HASH-32: Second-layer hash Because L1-HASH may produce quite long strings, and POLY's security guarantee degrades linearly, a scheme is required to allow long strings while ensuring that the collision probability never grows beyond a certain pre-set bound. This is accomplished by dynamically increasing the prime modulus used in the polynomial hashing as the collision probability bound is approached. 5.4.1 Discussion The probability of two n-word messages hashing to the same result when polynomially hashed with prime modulus p is as much as (n / p). To maintain a limit on the maximum collision probability, a scheme is needed to disallow (n / p) growing too large. The scheme used here hashes a number of words n_1 under modulus p_1 until (n_1 / p_1) reaches a critical point. The result of the hash-so-far is prepended to the remaining message needing to be hashed, and the hashing continues, but under a prime modulus p_2 which is substantially larger than p_1. Hashing continues for n_2 more words until (n_2 / Krovetz et al. Expires April 2001 [Page 20]

INTERNET-DRAFT UMAC October 2000 p_2) also reaches a critical point, at which time a new larger prime p_3 could be used. Because polynomial hashing under a small prime modulus is often faster than hashing under a large one, this dynamic ramping-up of the polynomial's modulus provides a hash function which is faster on short messages, but still accommodates long ones. The keys used for polynomial hashing are restricted to particular subsets to allow for faster implementations on 32-bit architectures. The restrictions allow an implementor to disregard some potential arithmetic carries during computation. For more information see [10]. 5.4.2 Interface Function Name: L2-HASH-32 Input: K, string of length 24 bytes. M, string of length less than 2^64 bytes. Output: Y, string of length 16 bytes. 5.4.3 Algorithm Compute y using the following algorithm. // // Extract keys and restrict to special key-sets // Mask64 = uint2str(0x01ffffff01ffffff, 8) Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16) k64 = str2uint(K[1..8] and Mask64) k128 = str2uint(K[9..24] and Mask128) // // If M no more than 2^17 bytes, hash under 64-bit prime, // otherwise, hash first 2^17 bytes under 64-bit prime and // remainder under 128-bit prime. // if (length(M) <= 2^17) then // 2^14 64-bit words // // View M as an array of 64-bit words, and use POLY modulo Krovetz et al. Expires April 2001 [Page 21]

INTERNET-DRAFT UMAC October 2000 // prime(64) (and with bound 2^64 - 2^32) to hash it. // y = POLY(64, 2^64 - 2^32, k64, M) else M_1 = M[1 .. 2^17] M_2 = M[2^17 + 1 .. length(M)] M_2 = zeropad(M_2 || uint2str(0x80,1), 16) y = POLY(64, 2^64 - 2^32, k64, M_1) y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_2) Y = uint2str(y, 16) Return Y 5.5 L3-HASH-32: Third-layer hash The output from L2-HASH-32 is 16 bytes long. This final hash function hashes the 16-byte string to a fixed length of 4 bytes using a simple inner-product hash with affine translation. A 36-bit prime modulus is used to improve security. 5.5.1 Interface Function Name: L3-HASH-32 Input: K1, string of length 64 bytes. K2, string of length 4 bytes. M, string of length 16 bytes. Output: Y, string of length 4 bytes. 5.5.2 Algorithm Compute Y using the following algorithm. y = 0 // // Break M and K1 into 8 chunks and convert to integers // for i = 1 to 8 do M_i = M [(i - 1) * 2 + 1 .. i * 2] K_i = K1[(i - 1) * 8 + 1 .. i * 8] Krovetz et al. Expires April 2001 [Page 22]

INTERNET-DRAFT UMAC October 2000 m_i = str2uint(M_i) k_i = str2uint(K_i) mod prime(36) // // Inner-product hash, extract last 32 bits and affine-translate // y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(36) y = y mod 2^32 Y = uint2str(y, 4) Y = Y xor K2 Return Y 5.6 UHASH-32: Three-layer universal hash The hash functions L1-HASH, L2-HASH and L3-HASH are used together in a straightforward manner. A message is first hashed by L1-HASH, its output is then hashed by L2-HASH, whose output is then hashed by L3-HASH. If the message being hashed is no longer than L1-KEY-LEN bytes, then L2-HASH is skipped as an optimization. Because L3-HASH outputs a string whose length is only WORD-LEN bytes long, multiple iterations of this three-layer hash are used, with different keys each time, until UMAC-OUTPUT-LEN have been generated. To reduce memory requirements, L1-HASH and L3-HASH both reuse most of their key-material between iterations. 5.6.1 Interface Function Name: UHASH-32 Input: K, string of length UMAC-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length UMAC-OUTPUT-LEN bytes. 5.6.2 Algorithm Compute Y using the following algorithm. // // Calculate iterations needed to make UMAC-OUTPUT-LEN bytes // streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN) // Krovetz et al. Expires April 2001 [Page 23]

INTERNET-DRAFT UMAC October 2000 // Define total key needed for all iterations using KDF. // L1Key and L3Key1 both reuse most key between iterations. // L1Key = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16) L2Key = KDF(K, 1, streams * 24) L3Key1 = KDF(K, 2, streams * 64) L3Key2 = KDF(K, 3, streams * 4) // // For each iteration, extract key and three-layer hash. // If length(M) <= L1-KEY-LEN, then skip L2-HASH. // Y = <empty string> for i = 1 to streams do L1Key_i = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN] L2Key_i = L2Key [(i-1) * 24 + 1 .. i * 24] L3Key1_i = L3Key1[(i-1) * 64 + 1 .. i * 64] L3Key2_i = L3Key2[(i-1) * 4 + 1 .. i * 4] A = L1-HASH-32(L1Key_i, M) if (length(M) <= L1-KEY-LEN) then B = zeroes(8) || A else B = L2-HASH-32(L2Key_i, A) C = L3-HASH-32(L3Key1_i, L3Key2_i, B) Y = Y || C Y = Y[1 .. UMAC-OUTPUT-LEN] Return Y 6 UHASH-16: Universal hash function for a 16-bit word size See Section 5 (UHASH-32) for general discussion of the UHASH algorithm. Each sub-section of Section 6 will note only differences between UHASH-32 and UHASH-16. 6.1 NH-16: NH hashing with a 16-bit word size The NH-16 algorithm is designed to exploit the recent trend of including instructions for small-scale vector parallelism in uniprocessor CPUs. Intel's MMX and Mororola's AltiVec instruction sets are good examples of this trend. Both support single- instruction multiply-add instructions on vectors of 16-bit words which can be used by UHASH-16 for accelerated hashing. To accommodate this parallelism, NH-16 accesses data-words in pairs which are 8 words (16 bytes) apart. Krovetz et al. Expires April 2001 [Page 24]

INTERNET-DRAFT UMAC October 2000 6.1.1 Interface Function Name: NH-16 Input: K, string of length L1-KEY-LEN bytes. M, string with length divisible by 32 bytes. Output: Y, string of length 4 bytes. 6.1.2 Algorithm Compute Y using the following algorithm. // // Break M and K into 2-byte chunks // t = length(M) / 2 Let M_1, M_2, ..., M_t be 2-byte strings so that M = M_1 || M_2 || .. || M_t. Let K_1, K_2, ..., K_t be 2-byte strings so that K_1 || K_2 || .. || K_t is a prefix of K. // // Perform NH hash on the chunks, pairing words for multiplication // which are 8 apart to accommodate vector-parallelism. // Y = zeroes(4) i = 1 while (i < t) do Y = Y +_32 ((M_{i+0} +_16 K_{i+0}) *_32 (M_{i+ 8} +_16 K_{i+ 8})) Y = Y +_32 ((M_{i+1} +_16 K_{i+1}) *_32 (M_{i+ 9} +_16 K_{i+ 9})) Y = Y +_32 ((M_{i+2} +_16 K_{i+2}) *_32 (M_{i+10} +_16 K_{i+10})) Y = Y +_32 ((M_{i+3} +_16 K_{i+3}) *_32 (M_{i+11} +_16 K_{i+11})) Y = Y +_32 ((M_{i+4} +_16 K_{i+4}) *_32 (M_{i+12} +_16 K_{i+12})) Y = Y +_32 ((M_{i+5} +_16 K_{i+5}) *_32 (M_{i+13} +_16 K_{i+13})) Y = Y +_32 ((M_{i+6} +_16 K_{i+6}) *_32 (M_{i+14} +_16 K_{i+14})) Y = Y +_32 ((M_{i+7} +_16 K_{i+7}) *_32 (M_{i+15} +_16 K_{i+15})) i = i + 16 Return Y 6.2 L1-HASH-16: First-layer hash To limit the length of key required in the first layer of hashing, L1-HASH-16 breaks the input message into chunks no longer than Krovetz et al. Expires April 2001 [Page 25]

INTERNET-DRAFT UMAC October 2000 L1-KEY-LEN bytes and NH hashes each with a key of that same length. 6.2.1 Interface Function Name: L1-HASH-16 Input: K, string of length L1-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length (4 * ceil(length(M)/L1-KEY-LEN)) bytes. 6.2.2 Algorithm Compute Y using the following algorithm. // // Break M into L1-KEY-LEN byte chunks (final chunk may be shorter) // t = ceil(length(M) / L1-KEY-LEN) Let M_1, M_2, ..., M_t be strings so that M = M_1 || M_2 || .. || M_t, and length(M_i) = L1-KEY-LEN for all 0 < i < t. // // For each chunk, except the last: endian-adjust, NH hash // and add bit-length. Use results to build Y. // Len = uint2str(L1-KEY-LEN * 8, 4) Y = <empty string> for i = 1 to t-1 do if (ENDIAN-FAVORITE == LITTLE) then // See endian discussion ENDIAN-SWAP(M_i) // in section 3.1.1 Y = Y || (NH-16(K, M_i) +_32 Len) // // For the last chunk: pad to 32-byte boundary, endian-adjust, // NH hash and add bit-length. Concatenate the result to Y. // Len = uint2str(length(M_t) * 8, 4) M_t = zeropad(M_t, 32) if (ENDIAN-FAVORITE == LITTLE) then ENDIAN-SWAP(M_t) Y = Y || (NH-16(K, M_t) +_32 Len) return Y Krovetz et al. Expires April 2001 [Page 26]

INTERNET-DRAFT UMAC October 2000 6.3 L2-HASH-16: Second-layer hash L2-HASH-16 differs from L2-HASH-32 by beginning the ramped hash with a smaller prime modulus. See Section 5.3 for the definition of POLY. 6.3.1 Interface Function Name: L2-HASH-16 Input: K, string of length 28 bytes. M, string of length less than 2^64 bytes. Output: Y, string of length 16 bytes. 6.3.2 Algorithm Compute Y using the following algorithm. // // Extract keys and restrict to special key-sets // Mask32 = uint2str(0x1fffffff, 4) Mask64 = uint2str(0x01ffffff01ffffff, 8) Mask128 = uint2str(0x01ffffff01ffffff01ffffff01ffffff, 16) k_32 = str2uint(K[1..4] and Mask32) k64 = str2uint(K[5..12] and Mask64) k128 = str2uint(K[13..28] and Mask128) // // If M no more than 2^11 bytes, hash under 32-bit prime. // Otherwise, hash under increasingly long primes. // if (length(M) <= 2^11) then // 2^9 32-bit words y = POLY(32, 2^32 - 6, k_32, M) else if (length(M) <= 2^33) then // 2^31 32-bit words M_1 = M[1 .. 2^11] M_2 = M[2^11 + 1 .. length(M)] M_2 = zeropad(M_2 || uint2str(0x80,1), 8) y = POLY(32, 2^32 - 6, k_32, M_1) y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2) else Krovetz et al. Expires April 2001 [Page 27]

INTERNET-DRAFT UMAC October 2000 M_1 = M[1 .. 2^11] M_2 = M[2^11 + 1 .. 2^33] M_3 = M[2^33 + 1 .. length(M)] M_3 = zeropad(M || uint2str(0x80,1), 16) y = POLY(32, 2^32 - 6, k_32, M_1) y = POLY(64, 2^64 - 2^32, k64, uint2str(y, 8) || M_2) y = POLY(128, 2^128 - 2^96, k128, uint2str(y, 16) || M_3) Y = uint2str(y, 16) Return Y 6.4 L3-HASH-16: Third-layer hash The L3-HASH-16 algorithm differs from L3-HASH-32 by hashing under a 19-bit prime modulus (instead of a 36-bit prime modulus) and then returning a 2-byte result (instead of a 4-byte result). 6.4.1 Interface Function Name: L3-HASH-16 Input: K1, string of length 32 bytes. K2, a string of length 2 bytes. M, a string of length 16 bytes. Output: Y, a string of length 2 bytes. 6.4.2 Algorithm Compute Y using the following algorithm. y = 0 // // Break M and K1 into 8 chunks and convert to integers // for i = 1 to 8 do M_i = M[(i - 1) * 2 + 1 .. i * 2] K_i = K1[(i - 1) * 4 + 1 .. i * 4] m_i = str2uint(M_i) k_i = str2uint(K_i) mod prime(19) // Krovetz et al. Expires April 2001 [Page 28]

INTERNET-DRAFT UMAC October 2000 // Inner-product hash, extract last 32 bits and affine-translate // y = (m_1 * k_1 + ... + m_8 * k_8) mod prime(19) y = y mod 2^16 Y = uint2str(y, 2) Y = Y xor K2 Return Y 6.5 UHASH-16: Three-layer universal hash The algorithm UHASH-16 differs from UHASH-32 only in the size of its keys generated, and in that it refers to the 16-bit variants of the three-layer hash functions. 6.5.1 Interface Function Name: UHASH-16 Input: K, string of length UMAC-KEY-LEN bytes. M, string of length less than 2^64 bytes. Output: Y, string of length UMAC-OUTPUT-LEN bytes. 6.5.2 Algorithm Compute Y using the following algorithm. // // Calculate iterations needed to make UMAC-OUTPUT-LEN bytes // streams = ceil(UMAC-OUTPUT-LEN / WORD-LEN) // // Define total key needed for all iterations using KDF. // L1Key and L3Key1 both reuse most key between iterations. // L1Key = KDF(K, 0, L1-KEY-LEN + (streams - 1) * 16) L2Key = KDF(K, 1, streams * 28) L3Key1 = KDF(K, 2, streams * 32) L3Key2 = KDF(K, 3, streams * 2) // // For each iteration, extract key and three-layer hash. Krovetz et al. Expires April 2001 [Page 29]

INTERNET-DRAFT UMAC October 2000 // If length(M) <= L1-KEY-LEN, then skip L2-HASH. // Y = <empty string> for i = 1 to streams do L1Key_i = L1Key [(i-1) * 16 + 1 .. (i-1) * 16 + L1-KEY-LEN] L2Key_i = L2Key [(i-1) * 28 + 1 .. i * 28] L3Key1_i = L3Key1[(i-1) * 32 + 1 .. i * 32] L3Key2_i = L3Key2[(i-1) * 2 + 1 .. i * 2] A = L1-HASH-16(L1Key_i, M) if (length(M) <= L1-KEY-LEN) then B = zeroes(12) || A else B = L2-HASH-16(L2Key_i, A) C = L3-HASH-16(L3Key1_i, L3Key2_i, B) Y = Y || C Y = Y[1 .. UMAC-OUTPUT-LEN] Return Y 7 UMAC tag generation Tag generation for UMAC proceeds as follows. Use UHASH to hash the message and apply the PDF to the nonce to produce a string to xor with the UHASH output. The resulting string is the authentication- tag. 7.1 Interface Function Name: UMAC Input: K, string of length UMAC-KEY-LEN bytes. M, string of length less than 2^64 bytes. Nonce, string of length 1 to 16 bytes. Output: AuthTag, string of length UMAC-OUTPUT-LEN bytes. 7.2 Algorithm Compute AuthTag using the following algorithm. if (WORD-LEN == 2) then HashedMessage = UHASH-16(K, M) else HashedMessage = UHASH-32(K, M) Krovetz et al. Expires April 2001 [Page 30]

INTERNET-DRAFT UMAC October 2000 Pad = PDF(K, Nonce) AuthTag = Pad xor HashedMessage Return AuthTag 8 Security considerations As a specification of a message authentication code, this entire document is about security. Here we describe some security considerations important for the proper understanding and use of UMAC. 8.1 Resistance to cryptanalysis The strength of UMAC depends on the strength of its underlying cryptographic functions: the key-derivation function (KDF) and the pad-derivation function (PDF). In this specification it is assumed that both operations are implemented using the Advanced Encryption Standard (AES). However, the full design and specification allow for the replacement of these components. Indeed, it is straightforward to use other block ciphers or other cryptographic objects, such as SHA-1 or HMAC for the realization of the KDF or PDF. The core of the UMAC design, the UHASH function, does not depend on any "cryptographic assumptions": its strength is specified by a purely mathematical property stated in terms of collision probability, and this property is proven in an absolute sense. In this way, the strength of UHASH is guaranteed regardless of future advances in cryptanalysis. The analysis of UMAC [3, 4] shows this scheme to have "provable security", in the sense of modern cryptography, by way of tight reductions. What this means is that an adversarial attack on UMAC which forges with probability significantly exceeding the established collision probability will give rise to an attack of comparable complexity which breaks the AES, in the sense of distinguishing AES from a family of random permutations. This design approach essentially obviates the need for cryptanalysis on UMAC: cryptanalytic efforts might as well focus on AES, the results imply. 8.2 Tag lengths and forging probability A MAC algorithm is used between two parties that share a secret MAC key, K. Messages transmitted between these parties are accompanied by authentication tags computed using K and a given nonce. Breaking Krovetz et al. Expires April 2001 [Page 31]

INTERNET-DRAFT UMAC October 2000 the MAC means that the attacker is able to generate, on its own, a new message M (i.e. one not previously transmitted between the legitimate parties) and to compute on M a correct authentication tag under the key K. This is called a forgery. Note that if the authentication tag is specified to be of length t then the attacker can trivially break the MAC with probability 1/2^t. For this the attacker can just generate any message of its choice and try a random tag; obviously, the tag is correct with probability 1/2^t. By repeated guesses the attacker can increase linearly its probability of success. UMAC is designed to make this guessing strategy the best possible attack against UMAC as long as the attacker does not invest the computational effort needed to break the underlying cipher, e.g. AES, used to produce the one time pads used in UMAC computation. More precisely, under the assumed strength of this cipher UMAC provides for close-to-optimal security with regards to forgery probability as represented in the next table. -------------------------------------------------------------------- UHASH-OUTPUT-LEN Forging probability Approximate actual forging (bytes) using a random tag probability in UMAC (using a clever tag) 2 2^-16 2^-15 4 2^-32 2^-30 8 2^-64 2^-60 16 2^-128 2^-120 -------------------------------------------------------------------- Recall that the parameter UHASH-OUTPUT-LEN specifies the length of the UMAC authentication tag. The above table states, for example, for the case of an 8-byte tag that the ideal forging probability would be 2^-64 while UMAC would withstand an actual forging probability of 2^-60. Note that under this tag length (which is the default length in UMAC) the probability of forging a message is well under the chance that a randomly guessed DES key is correct. DES is now widely seen as vulnerable, but the problem has never been that some extraordinarily lucky attacker might, in a single guess, find the right key. Instead, the problem is that large amounts of computation can be thrown at the problem until, off-line, the attacker finds the right key. With UMAC, off-line computation aimed at exceeding the forging probability is hopeless, regardless of tag length, as long as the underlying cipher is not broken. The only way to forge is to interact with the entity that verifies the MAC and to try a huge amount of forgeries before one is likely to succeed. The system Krovetz et al. Expires April 2001 [Page 32]

INTERNET-DRAFT UMAC October 2000 architecture will determine the extent to which this is possible. In a well-architected system there should not be any high-bandwidth capability for presenting forged MACs and determining if they are valid. In particular, the number of authentication failures at the verifying party should be limited. If a large number of such attempts are detected the session key in use should be dropped and the event be recorded in an audit log. Let us reemphasize: a forging probability of 1 / 2^60 does not mean that there is an attack that runs in 2^60 time - as long as AES maintains its believed security there is no such attack for UMAC. Instead, a 1 / 2^60 forging probability means that if an attacker could try out 2^60 MACs, then the attacker would probably get one right. But the architecture of any real system should render this infeasible. One can certainly imagine an attacker having a high bandwidth channel (e.g., 1 Gbit/second or more) over which it can continually present attempted forgeries, the attacker being signaled when a correct tag is found, but realizing such a scenario in a real system would represent a major lapse in the security architecture. It should be pointed out that once an attempted forgery is successful, it is entirely possible that all subsequent messages under this key may be forged, too. This is important to understanding in gauging the severity of a successful forgery. In conclusion, the default 64-bit tags seem appropriate for most security architectures and applications. In cases where when the consequences of an authentication failure are not extremely severe, and when the system architecture is designed to conscientiously limit the number of forgery attempts before a session is torn down, 32-bit authentication tags may be adequate. For the paranoid, or if an attacker is allowed a fantastic number of forgery tests, 96- or 128-bits may be utilized. 8.3 Selective-assurance authentication We have already remarked about the flexibility built into UMAC to use authentication tags of various lengths: shorter tags are faster to compute and one needs to transmit fewer bits, but the forging probability is higher. There is an additional degree of flexibility built into the design of UMAC: even if the sender generates and transmits a tag of 8 bytes, say, a receiver may elect to verify only the first 4 bytes of the tag, and computing that 4-byte prefix by the receiver will be substantially faster than computing what the full 8-byte tag would be. Indeed when WORD-LEN is 2 one can more quickly check the 2-byte prefix of the tag than the 4-byte prefix of the tag, one can more quickly check the 4-byte prefix of the tag than the Krovetz et al. Expires April 2001 [Page 33]

INTERNET-DRAFT UMAC October 2000 6-byte prefix of the tag, and so forth. When WORD-LEN is 4 one can more quickly check the 4-byte prefix of the tag than an entire 8-byte tag, and so forth. This type of flexibility allows different parties who receive a MAC (as in a multicast setting) to authenticate the transmission to the extent deemed necessary and to the extent consistent with any computational limits of the receiver. In a scenario where receivers are allowed to verify short prefixes of longer tags, it is even more important that conservative policies are followed when a bad tag is presented to the receiver. Because short prefixes are easier to forge than are long ones, an attacker may attempt to forge short prefixes and then leverage information gained from these attacks to forge longer tags. If the attacker can learn which short tags are good and which are bad, the attacker may be able to learn enough to allow longer forgeries. One salient feature of the security-performance trade-off offered by UMAC is its usability in contexts where performance is severely constrained. In such cases, using a mild-security authentication tag can be of significant value especially if the alternative would be not to use authentication at all (a possible such scenario could be the high-speed transmission of real-time multimedia streams). Another potential scenario where short and fast-to-compute tags can be useful is for fast detection of data forgery intended as a denial of service attack. In this case, even a moderate increase in the attacker's difficulty to produce forgeries may suffice to make the attack worthless for the attacker. Moreover, being able to detect just a portion of attempted forgeries may be enough to identify the attack. 8.4 Nonce considerations The function UMAC (Section 7) requires a nonce with length in the range 1 to 16 bytes. All nonces in an authentication session must be equal in length. For secure operation, no nonce value should be repeated within the life of a single UMAC session-key. To authenticate messages over a duplex channel (where two parties send messages to each other), a different key could be used for each direction. If the same key is used in both directions, then it is crucial that all nonces be distinct. For example, one party can use even nonces while the other party uses odd ones. The receiving party must verify that the sender is using a nonce of the correct form. This specification does not indicate how nonce values are created, updated, or communicated between the entity producing a tag and the entity verifying a tag. The following exemplify some of the Krovetz et al. Expires April 2001 [Page 34]

INTERNET-DRAFT UMAC October 2000 possibilities: 1. The nonce is an eight-byte [resp., four-byte] unsigned number, Counter, which is initialized to zero, which is incremented by one following the generation of each authentication tag, and which is always communicated along with the message and the authentication tag. An error occurs at the sender if there is an attempt to authenticate more than 2^64 [resp., 2^32] messages within a session. 2. The nonce is a 16-byte unsigned number, Counter, which is initialized to zero and which is incremented by one following the generation of each authentication tag. The Counter is not explicitly communicated between the sender and receiver. Instead, the two are assumed to communicate over a reliable transport, and each maintains its own counter so as to keep track of what the current nonce value is. 3. The nonce is a 16-byte random value. (Because repetitions in a random n-bit value are expected at around 2^(n/2) trials, the number of messages to be communicated in a session using n-bit nonces should not be allowed to approach 2^(n/2).) We emphasize that the value of the nonce need not be kept secret. When UMAC is used within a higher-level protocol there may already be a field, such as a sequence number, which can be co-opted so as to specify the nonce needed by UMAC. The application will then specify how to construct the nonce from this already-existing field. Note that if the nonce starts at zero and is incremented with each message then an attacker can easily ascertain the number of messages which have been sent during a session. If this is information which one wishes to deny the attacker then one might have the sender initialize the nonce to a random value, rather than to zero. Inspecting the current nonce will no longer reveal to the attacker the number of messages which have been sent during this session. This is a computationally cheaper approach than enciphering the nonce. 8.5 Guarding against replay attacks A replay attack entails the attacker repeating a message, nonce, and authentication tag. In systems, replay attacks may be quite damaging, and many applications will want to guard against them. In UMAC, this would normally be done at the receiver by having the receiver check that no nonce value is used twice. On a reliable Krovetz et al. Expires April 2001 [Page 35]

INTERNET-DRAFT UMAC October 2000 connection, when the nonce is a counter, this is trivial. On an unreliable connection, when the nonce is a counter, one would normally cache some "window" of recent nonces. Out-of-order message delivery in excess of what the window allows will result in rejecting otherwise valid authentication tags. We emphasize that it is up to the receiver when a given message, nonce and tag will be deemed authentic. Certainly the tag should be valid for the message and nonce, as determined by UMAC, but the message may still be deemed inauthentic because the nonce is detected to be a replay. 9 Acknowledgments Thanks are due to David Balenson and David Carman of NAI Labs, who suggested the advantages of allowing a receiver to verify authentication tags to various forgery probabilities. Thanks are also due to David McGrew and Scott Fluhrer of Cisco Systems for encouraging us to improve UMAC performance on short messages. Phillip Rogaway, John Black, and Ted Krovetz were supported in this work under Rogaway's NSF CAREER Award CCR-962540, and under MICRO grants 97-150, 98-129, and 99-103 funded by RSA Data Security, Inc., and ORINCON Corporation. Much of Rogaway's work was carried out during two sabbatical visits to Chiang Mai University. Special thanks to Prof. Darunee Smawatakul for helping to arrange these visits. 10 References [1] ANSI X9.9, "American National Standard for Financial Institution Message Authentication (Wholesale)", American Bankers Association, 1981. Revised 1986. [2] M. Bellare, R. Canetti, and H. Krawczyk, "Keyed hash functions and message authentication", Advances in Cryptology - CRYPTO '96, LNCS vol. 1109, pp. 1-15. Full version available from http://www.research.ibm.com/security/keyed-md5.html/ [3] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway, "UMAC: Fast and provably secure message authentication", Advances in Cryptology - CRYPTO '99, LNCS vol. 1666, pp. 216-233. Full version available from http://www.cs.ucdavis.edu/~rogaway/umac [4] J. Black, S. Halevi, H. Krawczyk, T. Krovetz, and P. Rogaway, "The UMAC message authentication code", work in progress, 2000. Krovetz et al. Expires April 2001 [Page 36]

INTERNET-DRAFT UMAC October 2000 To be available from http://www.cs.ucdavis.edu/~rogaway/umac [5] L. Carter and M. Wegman, "Universal classes of hash functions", Journal of Computer and System Sciences, 18 (1979), pp. 143-154. [6] O. Goldreich, S. Goldwasser and S. Micali, "How to construct random functions", Journal of the ACM, 33, No. 4 (1986), pp. 210-217. [7] S. Halevi and H. Krawczyk, "MMH: Software message authentication in the Gbit/second rates", Fast Software Encryption, LNCS Vol. 1267, Springer-Verlag, pp. 172-189, 1997. [8] ISO/IEC 9797-1, "Information technology - Security techniques - Data integrity mechanism using a cryptographic check function employing a block cipher algorithm", International Organization for Standardization, 1999. [9] H. Krawczyk, M. Bellare, and R. Canetti, "HMAC: Keyed-hashing for message authentication", RFC-2104, February 1997. [10] T. Krovetz, and P. Rogaway, "Fast universal hashing with small keys and no preprocessing", work in progress, 2000. To be available from http://www.cs.ucdavis.edu/~rogaway/umac [11] T. Krovetz, and P. Rogaway, "Variationally universal hashing", work in progress, 2000. To be available from http://www.cs.ucdavis.edu/~rogaway/umac [12] M. Wegman and L. Carter, "New hash functions and their use in authentication and set equality", Journal of Computer and System Sciences, 22 (1981), pp. 265-279. 11 Author contact information Authors' Addresses John Black Department of Computer Science University of Nevada Reno NV 89557 USA EMail: jrb@cs.unr.edu Shai Halevi Krovetz et al. Expires April 2001 [Page 37]

INTERNET-DRAFT UMAC October 2000 IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights NY 10598 USA EMail: shaih@watson.ibm.com Alejandro Hevia Department of Computer Science & Engineering University of California at San Diego La Jolla CA 92093 USA EMail: ahevia@cs.ucsd.edu Hugo Krawczyk Deprtment of Electrical Engineering Technion Haifa 32000 ISRAEL EMail: hugo@ee.technion.ac.il Ted Krovetz Intel Corporation 1900 Prairie City Road Folsom CA 95630 USA EMail: tdk@acm.org Phillip Rogaway Department of Computer Science University of California Davis CA 95616 USA EMail: rogaway@cs.ucdavis.edu A Suggested application programming interface (API) /* umac.h */ typedef struct UMAC_CTX *umac_ctx_t; umac_ctx_t umac_alloc(char key[]); /* Dynamically allocate UMAC_CTX struct */ Krovetz et al. Expires April 2001 [Page 38]

INTERNET-DRAFT UMAC October 2000 /* initialize variables and generate */ /* subkeys for default parameters. */ int umac_free(umac_ctx_t ctx); /* Deallocate the context structure. */ int umac_set_params(umac_ctx_t ctx, void *params); /* After default initialization, */ /* optionally set parameters to */ /* different values and reset for */ /* new message. */ int umac_update(umac_ctx_t ctx, char *input, long len); /* Incorporate len bytes pointed to by */ /* input into context ctx. */ int umac_final(umac_ctx_t ctx, char tag[], char nonce[]); /* Incorporate nonce value and return */ /* tag. Reset ctx for next message. */ int umac(umac_ctx_t ctx, char *input, long len, char tag[], char nonce[]); /* All-in-one (non-incremental) */ /* implementation of the functions */ /* umac_update() and umac_final(). */ Each routine returns zero if unsuccessful. B Reference code and test vectors See the UMAC World Wide Web homepage for reference code and test vectors. http://www.cs.ucdavis.edu/~rogaway/umac/ Krovetz et al. Expires April 2001 [Page 39]