Internet Engineering Task Force
AVT Working Group                                       Baugher, McGrew,
INTERNET-DRAFT                                              Oran (Cisco)
Expires: April 2002                               Blom, Carrara,Naslund,
                                                      Norrman (Ericsson)
                                                           November 2001

                    The Secure Real Time Transport Protocol

Status of this memo

   This document is an Internet-Draft and is in full conformance with
   all provisions of Section 10 of RFC2026.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups. Note that other
   groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time. It is inappropriate to use Internet-Drafts as reference
   material or cite them other than as "work in progress".

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at


   This document describes the Secure Real Time Transport Protocol
   (SRTP), a profile of the Real Time Transport Protocol (RTP), which
   can provide confidentiality, message authentication, and replay
   protection.

   SRTP can achieve high throughput and low packet expansion. SRTP
   proves to be a suitable protection for heterogeneous environments,
   i.e. environments including both wired and wireless links. To get
   such features, default transforms are described, based on an additive
   stream cipher for encryption, a keyed-hash based function for message

Baugher, et al.                                                 [Page 1]

INTERNET-DRAFT                    SRTP                    November, 2001

   authentication, and an 'implicit' index for sequencing based on the
   RTP sequence number.

   1. Notational Conventions.........................................3
   2. Goals..........................................................3
   3. SRTP Framework.................................................4
   3.1 SRTP Cryptographic Contexts...................................6
   3.1.1 Transform-independent parameters............................6
   3.1.2 Transform-dependent parameters..............................7
   3.1.3 Mapping SRTP Packets to Cryptographic Contexts..............7
   3.2 SRTP Packet Processing........................................7
   3.2.1 Packet Index Determination..................................8
   3.2.2 Cryptographic Transforms....................................9
   3.2.3 Replay Protection...........................................10
   3.3 Secure RTCP...................................................10
   4. Pre-Defined Transforms.........................................13
   4.1 Encryption....................................................13
   4.1.1 AES in Counter Mode.........................................15
           Keystream generation......................................15
   4.1.2 AES in f8-Mode..............................................15
           Keystream Generation......................................16
           SRTP IV Formation.........................................17
           SRTCP IV Formation........................................17
   4.1.3 NULL Cipher.................................................18
   4.2 Message Authentication and Integrity..........................18
   4.2.1. HMAC/SHA1..................................................18
   4.2.2 TMMH/16.....................................................18
   4.3 Key Derivation................................................20
   4.3.1 Key Derivation Algorithm....................................20
   4.3.2 AES-CM PRF..................................................21
   4.3.3 SRTCP Key Derivation........................................21
   5. Default and Mandatory Transforms...............................22
   5.1 Encryption: AES-CM............................................22
   5.2 Authentication/Integrity: HMAC/SHA1...........................22
   5.3 Key Derivation: AES-CM PRF....................................22
   6. SRTP Parameters................................................22
   7. Adding SRTP Transforms.........................................23
   8. Rationale......................................................23
   8.1 Key derivation................................................23
   8.2 Salting key...................................................24
   8.3 TMMH - Message Integrity from Universal Hashing...............24
   8.4 Data Origin Authentication considerations.....................24
   9. Key Management Considerations..................................25
   10. Security Considerations.......................................25
   10.1 Key Usage....................................................25
   10.2 SSRC collision and two-time pad..............................26
   10.3 Confidentiality of the RTP Payload...........................26
   10.4 Confidentiality of the RTP Header............................27
   10.5 Integrity of the RTP packet..................................27
   10.5.1 Integrity of the RTP header: IHA...........................28
   11. Interaction with Forward Error Correction mechanisms..........28
   12. IANA Considerations...........................................29
   13. Open issue....................................................29
   14. Acknowledgements..............................................29
   15. Author's Addresses............................................29
   16. References....................................................30
   Appendix A: Pseudocode for Index Determination,
   and ROC and s_l Update............................................32
   Appendix B: Test Vectors..........................................32
   B.1 AES-f8 Test Vectors...........................................32
   B.2 AES-CM Test Vectors...........................................33
   B.3 TMMH/16 Test Vectors..........................................34

1. Notational Conventions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in [RFC2119].
   Terminology conforms to [RFC2828].

   By convention, the leftmost bit (byte) is the most significant one.
   By XOR we mean bitwise addition modulo 2 of binary strings, and ||
   denotes concatenation. E.g. if C = A || B, then the most significant
   bits of C are the bits of A, and the least significant bits of C
   are the bits of B. Hexadecimal numbers are prefixed by 0x.
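
   As a small illustration of these conventions, the following Python
   sketch treats binary strings as byte strings. The helper name
   xor_bytes and the sample values are ours, chosen only for
   illustration; they are not part of this specification.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bitwise addition modulo 2 (XOR) of two equal-length binary strings."""
    if len(a) != len(b):
        raise ValueError("operands must have equal length")
    return bytes(x ^ y for x, y in zip(a, b))


# Illustrative values only.
A = bytes([0x12, 0x34])
B = bytes([0xAB, 0xCD])

# C = A || B: the bits of A become the most significant bits of C.
C = A + B

# The leftmost byte is the most significant one, i.e. big-endian order.
C_as_int = int.from_bytes(C, "big")
```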

   At the time of writing, NIST has not published the Advanced
   Encryption Standard, AES [AES]. However, as it is clear that AES will
   be the Rijndael algorithm as specified in [AES], we shall throughout
   this document let AES denote the block cipher Rijndael.

2. Goals

   The security goals for SRTP are to ensure:

   * the confidentiality of the RTP payload, and

   * the integrity protection of the entire RTP packet, together with
   protection against replayed RTP packets.

   Each of these security services is optional and independent.

   Other, functional, goals for the protocol are:

   * a framework that permits upgrade to new cryptographic transforms,

   * low bandwidth cost, i.e. a framework preserving RTP header
   compression efficiency,

   and, asserted by the pre-defined transforms:

   * a low computational cost,

   * a small footprint (i.e. small code size and data memory for keying
      information and replay lists),

   * limited packet expansion to support the bandwidth economy goal,

   * independence from the underlying transport, network, and physical
   layer used by RTP, in particular high tolerance to packet loss and
   re-ordering, and robustness to transmission bit-errors.

   The described security services are also provided for RTCP, the
   control protocol defined for RTP [RFC1889], with the exception that
   integrity and replay protection for the RTCP packets are mandatory
   when SRTP services are applied to the RTP packets of the
   corresponding session.

   These properties ensure that SRTP is a suitable protection scheme for
   RTP in both wired and wireless scenarios.

3. SRTP Framework

   RTP is the Real Time Transport Protocol [RFC1889]. We define SRTP as
   a profile of RTP, in a way analogous to RFC1890 which defines the
   audio/video profile for RTP. Conceptually, we consider a 'bump in the
   stack' implementation which resides between the RTP application and
   the transport layer, which intercepts RTP packets and then forwards
   an equivalent SRTP packet on the sending side, and which intercepts
   SRTP packets and passes an equivalent RTP packet up the stack on the
   receiving side.

         0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |   |V=2|P|X|  CC   |M|     PT      |       sequence number         |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                           timestamp                           |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |           synchronization source (SSRC) identifier            |
   |   +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   |   |            contributing source (CSRC) identifiers             |
   |   |                               ....                            |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                   RTP extension (optional)                    |
   | +>+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                                                               |
   | | |                           payload                             |
   | | |                             ....                              |
   | | |                     authentication tag (optional)             |
   | | |                                                               |
   | | |                             ....                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |
   | +- Encrypted Portion
   +---- Authenticated Portion

   Figure 1.  The format of an SRTP packet.

   The format of an SRTP packet is illustrated in Figure 1. The optional
   authentication tag is the only field defined by SRTP that is not in
   RTP.

   The added field is:

   Authentication tag: variable length, optional
          The authentication tag shall be used to carry authentication
          data. The Authenticated Portion of an SRTP packet consists of
          the entire equivalent RTP packet. Note that, if encryption and
          authentication are applied, then 'payload' in the
          Authenticated Portion refers to the corresponding encrypted
          payload. The authentication tag provides authentication of the
          RTP header and payload, and it indirectly provides replay
          protection by authenticating the sequence number.

   The Encrypted Portion of an SRTP packet consists of the RTP payload
   of the equivalent RTP packet.

3.1 SRTP Cryptographic Contexts

   Each SRTP session requires the sender and receiver to maintain
   cryptographic state information. This information is called the
   cryptographic context.

   By a session key, we mean a key that is to enter a cryptographic
   transform (e.g. encryption or authentication), and a master key is a
   random bit string (given by the key management protocol) from which
   session keys are derived in a cryptographically secure way.

3.1.1 Transform-independent parameters

   The transform-independent parameters of the cryptographic context
   consist of:

   * a 32-bit rollover counter, ROC, which records how many times the
   16-bit RTP sequence number has been reset to zero after passing
   through 65,535. Unlike the sequence number, SEQ, which SRTP extracts
   from the RTP packet header, the ROC is maintained by SRTP. This ROC
   is thus a parameter internal to SRTP.

   * for the receiver only, a sequence number s_l, which is the last
   received sequence number (possibly authenticated, if authentication
   is provided). Here, 'sequence number' refers to the 16-bit SEQ
   carried in the RTP packet header.

   * identifier for the encryption algorithm, i.e. the cipher and its
   mode of operation, and related parameters,

   * identifier for the authentication protection algorithm, and related
   parameters (when authentication is provided),

   * a replay list L, maintained by the receiver only (when
   authentication is provided),

   * integers n_e and n_a, determining the length of the session keys
   for encryption and authentication,

   * the master key(s),

   * a 16-bit integer, the session key derivation-rate,

   * FirstSEQ+ROC and LastSEQ+ROC as key lifetime for each of the master
   keys (FirstSEQ and LastSEQ are the RTP sequence numbers inside whose
   range the master key is valid, and ROC is the rollover counter).
   These values are absolute quantities, not relative.

3.1.2 Transform-dependent parameters

   Any encryption, authentication/integrity, and key derivation
   parameters that depend on the transform definitions are defined in
   the Transforms section. Future SRTP transform specifications MUST
   include a section to list the cryptographic context's parameters for
   that transform.

3.1.3 Mapping SRTP Packets to Cryptographic Contexts

   Recall that an RTP session for each participant is defined [RFC1889]
   by a pair of destination transport addresses (one network address
   plus a port pair for RTP and RTCP), and that a multimedia session is
   defined as a collection of RTP sessions. For example, a particular
   multimedia session could include an audio RTP session, a video RTP
   session, and a text RTP session.

   A cryptographic context shall be uniquely identified by the triplet
   context identifier:

   <SSRC, destination network address, destination transport port>

   where the destination network address and the destination transport
   port are the ones in the current packet. It is assumed that, when
   presented with this information, the key management returns a context
   with the information as described in Section 3.1.
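
   The mapping above can be pictured as a table keyed by the triplet.
   The following Python sketch is illustrative only: the class names
   CryptoContext and ContextTable, and the small subset of context
   fields shown, are our assumptions and not an interface defined by
   this document.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# (SSRC, destination network address, destination transport port)
ContextId = Tuple[int, str, int]


@dataclass
class CryptoContext:
    """Hypothetical subset of the parameters listed in Section 3.1.1."""
    roc: int = 0              # 32-bit rollover counter
    s_l: int = 0              # last received sequence number (receiver only)
    master_key: bytes = b""   # supplied by key management


class ContextTable:
    """Maps the context identifier triplet to a cryptographic context."""

    def __init__(self) -> None:
        self._contexts: Dict[ContextId, CryptoContext] = {}

    def add(self, ssrc: int, dst_addr: str, dst_port: int,
            ctx: CryptoContext) -> None:
        self._contexts[(ssrc, dst_addr, dst_port)] = ctx

    def lookup(self, ssrc: int, dst_addr: str, dst_port: int) -> CryptoContext:
        # Raises KeyError when key management has supplied no context.
        return self._contexts[(ssrc, dst_addr, dst_port)]
```

   The address and port passed to lookup are the ones carried by the
   current packet, so two streams with the same SSRC but different
   destinations resolve to different contexts.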

3.2 SRTP Packet Processing

   To construct a proper SRTP packet, given an RTP packet, the sender
   does the following:

   1. Determine which cryptographic context to use as described in
   Section 3.1.3.

   2. Determine the index of the SRTP packet as described in Section
   3.2.1, using the rollover counter in the cryptographic context and
   the sequence number in the RTP packet.

   3. Determine the session keys, as described in Section 4.3.

   4. Encrypt the Encrypted Portion of the packet (see Section 4, for
   the defined ciphers), using the encryption keys found in Step 3.

   5. If authentication is provided, compute the authentication tag for
   the Authenticated Portion of the packet, as described in Section 4,
   using the index determined in Step 2 and the authentication key found
   in Step 3. Note that the Encrypted Portion is encrypted before the
   authentication tag is computed.

   To authenticate and decrypt an SRTP packet, the receiver does the
   following:

   1. Determine which cryptographic context to use as described in
   Section 3.1.3.

   2. Estimate the index of the SRTP packet from the rollover counter in
   the cryptographic context and the sequence number in the RTP packet,
   as described in Section 3.2.1.

   3. Determine the session keys, as described in Section 4.3.

   4. If authentication is provided, check if the packet has been
   replayed, by checking the Replay List to ensure that no packet with
   that index has been received and authenticated before. If that index
   is in the list, then the packet has been replayed and is invalid. It
   MUST be discarded, and the event SHOULD be logged.

   Next, perform verification of the authentication tag, using the
   authentication key and packet index from Step 2. If the result is
   'AUTHENTICATION FAILURE' (see Section 4), the packet MUST be
   discarded from further processing and the event SHOULD be logged.

   5. Decrypt the Encrypted Portion of the packet (see Section 4, for
   the defined ciphers), using the decryption keys found in Step 3.

   6. Update the rollover counter and last sequence number, s_l, in the
   local context to the values used in the packet index estimated in
   Step 2.

3.2.1 Packet Index Determination

   SRTP implementations use an 'implicit' packet index for sequencing.

   When the session starts, the sender side shall set the rollover
   counter, ROC, to zero. Each time the RTP sequence number, SEQ, wraps
   modulo 2^16, the sender side shall increment ROC by one. The sender's
   packet index is then defined as i = 65,536 * ROC + SEQ.
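
   A minimal sketch of the sender-side bookkeeping follows. It assumes
   that the sender emits sequence numbers in increasing order, so that a
   value smaller than the previous one indicates a wrap; the class name
   SenderIndexer is hypothetical.

```python
class SenderIndexer:
    """Tracks the sender-side rollover counter (ROC) for one SRTP stream."""

    MAX_ROC = 2**32 - 1  # the ROC is a 32-bit counter

    def __init__(self) -> None:
        self.roc = 0           # set to zero when the session starts
        self.prev_seq = None   # SEQ of the previously sent packet, if any

    def index_for(self, seq: int) -> int:
        """Return i = 65,536 * ROC + SEQ for the packet about to be sent."""
        if not 0 <= seq <= 0xFFFF:
            raise ValueError("SEQ must fit in 16 bits")
        # Assumption: SEQ only moves forward on the sending side, so a
        # smaller value than the previous one means it wrapped modulo 2^16.
        if self.prev_seq is not None and seq < self.prev_seq:
            if self.roc == self.MAX_ROC:
                raise OverflowError("packet index space exhausted; re-key")
            self.roc += 1
        self.prev_seq = seq
        return 65536 * self.roc + seq
```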

   Receiver-side implementations use the RTP sequence number to
   reconstruct the correct index (that is, location in the sequence of
   all RTP packets). Here too, the index is defined as SEQ + ROC *
   65,536, where SEQ is the RTP sequence number and ROC is the rollover
   counter, maintained locally by the receiver as described below.

   A robust approach for the proper use of a rollover counter requires
   its handling and use to be well defined. In particular, out-of-order
   RTP packets with sequence numbers close to 65,536 or zero must be
   properly dealt with.

   A receiver reconstructs the index i of a packet with sequence number
   SEQ using the estimate

         i = 65,536 * v + SEQ,

   where v is chosen from the set { ROC-1, ROC, ROC+1 } such that i is
   closest to the value 65,536 * ROC + s_l. If the value ROC+1 is used,
   then the rollover counter ROC in the cryptographic context is
   incremented by one (see Appendix A).
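
   The estimate above can be written directly as a search over the three
   candidate values of v. The following Python sketch is a literal
   transcription of that rule, not the pseudocode of Appendix A; the
   function name estimate_index is ours, and the handling of candidates
   outside the 32-bit ROC range (they are simply skipped) is an
   assumption.

```python
from typing import Tuple


def estimate_index(seq: int, roc: int, s_l: int) -> Tuple[int, int]:
    """Return (i, v): the estimated packet index and the v that was chosen.

    seq -- 16-bit sequence number from the received packet
    roc -- rollover counter held in the cryptographic context
    s_l -- last received sequence number held in the context
    """
    reference = 65536 * roc + s_l
    # Assumption: candidates outside the 32-bit ROC range are ignored.
    candidates = [v for v in (roc - 1, roc, roc + 1) if 0 <= v < 2**32]
    # Pick the v whose index lies closest to the reference; on an exact
    # tie the smaller v wins, which is an arbitrary choice of this sketch.
    v = min(candidates, key=lambda c: abs(65536 * c + seq - reference))
    return 65536 * v + seq, v
```

   For example, with ROC = 0 and s_l = 65,534, a packet carrying SEQ = 2
   is placed at index 65,538 (v = ROC+1), while with ROC = 1 and s_l = 3,
   a late packet carrying SEQ = 65,535 is placed at index 65,535
   (v = ROC-1).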

   The index i is used in replay protection (Section 3.2.3), encryption
   and authentication (Section 4), and for the key derivation (Section
   4.3).

   As the rollover counter is 32 bits long and the RTP sequence number
   is 16 bits long, the maximum number of packets in any given SRTP
   session is 2^48 = 281,474,976,710,656. After that number of SRTP
   packets has been sent with a given key, the sender MUST NOT send any
   more packets with that key. This limitation provides a security
   benefit by placing an upper bound on the amount of traffic that can
   pass before cryptographic keys are changed. Re-keying (see Section 9)
   MUST be triggered no later than after this amount of traffic, and MAY
   be triggered earlier, e.g. for increased security and access control
   to media. Recurring key derivation, as determined by a non-zero
   derivation rate (see Section 4.3), gives even stronger security
   benefits, but does NOT change the above absolute maximum value.

   For the receiver, the 'implicit index' approach works as long as
   packet reordering and loss are not too great. In particular, 32,768
   packets would need to be lost, or a packet would need to be 32,768
   packets out of sequence, for synchronization to be lost. Such drastic
   loss or reordering is likely to disrupt the RTP application itself.

3.2.2 Cryptographic Transforms

   While there are numerous encryption and message authentication
   algorithms that can be used in SRTP, we define (Section 4) default
   algorithms in order to avoid the complexity of specifying the
   encodings for the signaling of algorithm and parameter identifiers.
   The defined algorithms have been chosen as they fulfil the goals
   listed in Section 2. Recommendations on how to extend SRTP with new
   transforms are given in Section 7.

3.2.3 Replay Protection

   Robust replay protection is possible when authentication of RTP
   packets is present.

   A packet is 'replayed' when it is stored by an adversary, and then
   re-injected into the network. SRTP provides protection against such
   attacks whenever authentication is provided, through the storage of
   the indices of the most recently received and authenticated packets.

   Each SRTP receiver maintains a Replay List, which conceptually
   contains the indices of all of the packets which have been received
   and authenticated. In practice, the list can use a 'sliding window'
   approach, so that a fixed amount of storage suffices for replay
   protection. Packet indices which lag behind the packet index in the
   context by more than SRTP-WINDOW-SIZE can be assumed to have been
   received, where SRTP-WINDOW-SIZE is a parameter that MUST be at least
   64, and which MAY be set to a higher value.

   The Replay List can be efficiently implemented by using a bitmap to
   represent which packets have been received, as described in the
   Security Architecture for IP [RFC2401].
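
   One possible shape of such a bitmap-based Replay List is sketched
   below in Python. The class and method names are hypothetical. As a
   conservative simplification, this sketch treats an index lagging by
   exactly SRTP-WINDOW-SIZE as already received, since the bitmap only
   covers lags 0 through SRTP-WINDOW-SIZE - 1.

```python
class ReplayWindow:
    """Sliding-window Replay List kept as a bitmap (illustrative sketch)."""

    def __init__(self, size: int = 64) -> None:
        if size < 64:
            raise ValueError("SRTP-WINDOW-SIZE MUST be at least 64")
        self.size = size
        self.highest = -1   # highest authenticated index so far; -1 = none
        self.bitmap = 0     # bit k set => index (highest - k) was received

    def is_replay(self, index: int) -> bool:
        """Check an index BEFORE verifying the authentication tag."""
        if index > self.highest:
            return False                  # ahead of the window: new packet
        lag = self.highest - index
        if lag >= self.size:
            return True                   # too old: assumed received
        return bool((self.bitmap >> lag) & 1)

    def mark_received(self, index: int) -> None:
        """Record an index; call only AFTER authentication succeeded."""
        if index > self.highest:
            shift = index - self.highest
            if shift >= self.size:
                self.bitmap = 1           # every older index falls out
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.highest = index
        else:
            lag = self.highest - index
            if lag < self.size:
                self.bitmap |= 1 << lag
```

   Keeping the check and the update as separate calls mirrors the
   receiver processing of Section 3.2, where the list is consulted
   before tag verification but must only record authenticated packets.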

   Note that there are no provisions for managing transmitted Sequence
   Number values among multiple senders using the same crypto contexts,
   thus the anti-replay service SHOULD NOT be used in a multi-sender
   environment that employs a single crypto context.

3.3 Secure RTCP

   Secure RTCP follows the definition of Secure RTP. SRTCP is defined as
   a profile of RTCP, and it adds two mandatory new fields to the RTCP
   packet definition, the SRTCP index and the authentication tag. Those
   fields are appended to an RTCP packet in order to form an equivalent
   SRTCP packet, so that they follow any other profile specific
   extensions. An SRTCP packet is illustrated in Figure 2.

        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   |   |V=2|P|    RC   |   PT=SR=200   |             length            |
   |   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |   |                         SSRC of sender                        |
   | +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |                              ...                              |
   | | |                          sender info                          |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                              ...                              |
   | | |                         report block 1                        |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                              ...                              |
   | | |                         report block 2                        |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                                                               |
   | | |                              ...                              |
   | | |                                                               |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |V=2|P|    SC   |  PT=SDES=202  |             length            |
   | | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |                          SSRC/CSRC_1                          |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | | |                           SDES items                          |
   | | |                              ...                              |
   | | +=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |                                                               |
   | | |                              ...                              |
   | | |                                                               |
   | +>+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
   | | |E|                         SRTCP index                         |
   | | |                              ...                              |
   | | |                       authentication field                    |
   | | |                                                               |
   | | |                              ...                              |
   | | +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   | |
   | +-- Encrypted Portion (optional)
   +---- Authenticated Portion (mandatory when SRTP is used for RTP)

   Figure 2.  The format of a Secure RTCP packet, consisting of an
   underlying RTCP compound packet with a Sender Report and an SDES
   packet.

   The added fields are:

   E bit and SRTCP index: 32 bits, mandatory
          The SRTCP index is a 31-bit counter for the SRTCP
          packets. The index is explicitly included in each packet, in
          contrast to the 'implicit' index approach used for SRTP.
          As Section 9.1 of [RFC1889] allows the split of a compound
          RTCP packet into two lower-layer packets, one to be encrypted
          and one to be sent in the clear, indices with their most
          significant bit (E bit) set to '1' are reserved for encrypted
          packets, and indices with most significant bit set to '0' are
          used for non-encrypted packets. With this restriction, the
          remaining bits are set to zero before the first SRTCP packet
          is sent, and the resulting 31-bit counter is incremented by
          one after each SRTCP packet is sent.
          Except for differences in the most significant (E) bit,
          indices form a strictly increasing sequence.
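
   The 32-bit field can be assembled and parsed as sketched below in
   Python. The function names are hypothetical, and network (big-endian)
   byte order is assumed, consistent with the convention of Section 1
   that the leftmost bit is the most significant.

```python
import struct
from typing import Tuple

E_BIT = 0x80000000        # most significant bit: packet is encrypted
INDEX_MASK = 0x7FFFFFFF   # remaining 31 bits: the SRTCP index counter


def pack_srtcp_index(index: int, encrypted: bool) -> bytes:
    """Build the 32-bit 'E bit and SRTCP index' field."""
    if not 0 <= index <= INDEX_MASK:
        raise ValueError("SRTCP index must fit in 31 bits")
    word = index | (E_BIT if encrypted else 0)
    return struct.pack("!I", word)   # "!I": big-endian unsigned 32-bit


def unpack_srtcp_index(field: bytes) -> Tuple[int, bool]:
    """Return (index, encrypted) from the 4-octet field."""
    (word,) = struct.unpack("!I", field)
    return word & INDEX_MASK, bool(word & E_BIT)
```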

   Authentication Tag: variable length, mandatory
          The authentication tag shall be used to carry message
          authentication data. The Authenticated Portion of an SRTCP
          packet consists of the entire equivalent (possibly compound)
          RTCP packet and the SRTCP index.

   The Encrypted Portion of an SRTCP packet consists of the RTCP payload
   of the equivalent compound RTCP packet, from the first RTCP packet,
   i.e. from the ninth (9) byte to the end of the compound packet.

   SRTCP packet processing is identical to that of SRTP packet
   processing, with the following changes:

   * SRTCP replay protection is as defined in Section 3.2.3, but uses
   the SRTCP index as the index i and maintains separate values for s_l
   and the replay list specific to SRTCP. SRTCP replay protection is
   mandatory.

   * SRTCP encryption is as defined in Section 4, but using the
   definition of the SRTCP Encrypted Portion as defined in this section,
   using the SRTCP index as the index i. The encryption transforms shall
   be the same selected for the protection of the associated SRTP
   stream(s) (when RTP is encrypted too), while the NULL algorithm shall
   be applied to the RTCP packets to be authenticated but not encrypted.

   * The SRTCP authentication tag is defined as in Section 4, but with
   the Authenticated Portion of the SRTCP packet defined in this
   section, and using the SRTCP index as the index i. SRTCP
   authentication is mandatory. The authentication transforms and
   related parameters (e.g., key size) shall be the same selected for
   the protection of the associated SRTP stream(s) (when SRTP is
   authenticated too).

   * SRTCP decryption is performed as in Section 4, but only if the
   SRTCP index has its most significant bit (E bit) equal to 1. If so,
   the encrypted portion is decrypted, using the SRTCP index as the
   index i. In case the most significant bit of the index is 0, the
   payload is simply copied.

   There MAY also be minor transform-specific changes; see Section 4
   for the defined transforms.

   The encryption prefix (Section 6.1 of [RFC1889]), a random 32-bit
   quantity intended to improve privacy, MUST NOT be used. This is
   because we strongly recommend ciphers secure against known plaintext
   attacks. The pre-defined SRTP encryption uses a secure, additive
   stream cipher, and thus the prefix offers no benefit at all.

   The maximum number of SRTCP packets with a fixed key is limited to
   2^31 = 2,147,483,648.

   Authentication MUST be applied to RTCP, as it is the control
   protocol (e.g. it has a BYE packet). Note, however, that the cost of
   RTCP authentication is not of the same order as that of RTP
   authentication, as the recommended session bandwidth allocated to
   RTCP is 5% and RTCP packets are sent less frequently. Nevertheless,
   when adding authentication to RTCP, the overhead in bandwidth SHOULD
   be considered (it will be more than 5%).

4. Pre-Defined Transforms

4.1 Encryption

   Generic parameters, common to all pre-defined, non-NULL, encryption
   transforms:

   * BLOCK CIPHER is the block cipher used
   * n_b is the bit-size of the block for the block cipher
   * k_e is the session encrypting key
   * n_e is the length of k_e (the default is 128 bits)
   * k_s is the so-called salting key
   * n_s is the length of the salting key. The default value is equal to
    n_b. Another (shorter) value MUST be explicitly signaled.
   * SRTP_PREFIX_LENGTH is the octet length of the keystream prefix, an
    (at least) 8-bit integer, inferred from the message authentication
    code in use.

   The session key is by default derived as specified in Section 4.3.
   The salting key is obtained directly from the cryptographic context.

   The encryption transforms defined in SRTP use a "seekable" segmented
   keystream generator, which for each secret key maps the RTP packet
   index into a pseudorandom keystream segment, used to encrypt a single
   RTP packet (with that packet index). The process of encrypting a
   packet consists of generating the keystream segment corresponding to
   the packet, and then bitwise exclusive-oring that keystream segment
   onto the Encrypted Portion of the RTP packet. Decryption is done the
   same way, but swapping the roles of the plaintext and ciphertext.

   The definition of how the keystream is generated, given the index,
   depends on the cipher and its mode of operation. Below, two such key
   stream generators are defined. The NULL cipher is also defined, to be
   used when encryption of RTP is not required.

   The initial octets of each keystream segment MAY be reserved for
   use in a message authentication code, in which case the keystream
   used for encryption starts immediately after the last reserved
   octet. The initial reserved octets are called the keystream prefix,
   and the remaining octets are called the keystream suffix.  This
   process is illustrated in Figure 3.

   +----+   +------------------+---------------------------------+
   | KG |-->| Keystream Prefix |          Keystream Suffix       |---+
   +----+   +------------------+---------------------------------+   |
                               +---------------------------------+   v
                               | Encrypted Portion of RTP Packet |->(*)
                               +---------------------------------+   |
                               +---------------------------------+   |
                               | Encrypted Portion of SRTP Packet|<--+
                               +---------------------------------+

   Figure 3: SRTP Encryption.  Here KG denotes the keystream
   generator, and (*) denotes bitwise exclusive-or.
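
   As a non-normative illustration, the prefix/suffix split and the
   exclusive-or step of Figure 3 can be sketched in Python as follows.
   The function name and its arguments are hypothetical and not part of
   SRTP, and the keystream segment is assumed to have been produced
   already by some keystream generator.

```python
def xor_encrypted_portion(segment: bytes, prefix_length: int,
                          portion: bytes) -> bytes:
    """XOR the keystream suffix onto the Encrypted Portion of a packet.

    segment       -- keystream segment for this packet (prefix || suffix)
    prefix_length -- SRTP_PREFIX_LENGTH in octets (0 if nothing is reserved)
    portion       -- the octets to encrypt (or, symmetrically, to decrypt)
    """
    suffix = segment[prefix_length:]          # skip the reserved prefix octets
    if len(suffix) < len(portion):
        raise ValueError("keystream segment too short for this packet")
    return bytes(p ^ s for p, s in zip(portion, suffix))


# Encryption and decryption are the same operation.
segment = bytes(range(16))                    # stand-in keystream, NOT secure
ciphertext = xor_encrypted_portion(segment, 4, b"payload")
plaintext = xor_encrypted_portion(segment, 4, ciphertext)
```

   Because the operation is its own inverse, applying it twice with the
   same segment and prefix length recovers the original octets.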

   The number of octets in the keystream prefix is denoted as
   SRTP_PREFIX_LENGTH. The keystream prefix is reserved for use with
   certain message authentication transforms, indicated by a non-zero
   value of this parameter. This means that even if confidentiality is
   not to be provided, the keystream generator output MAY still need to
   be computed, in which case the default keystream generator SHALL be
   used.

   The default cipher is the Advanced Encryption Standard (AES), and we
   define two modes of running AES, Segmented Integer Counter Mode AES
   and AES in f8-mode. In the sequel, let E(k,x) be AES applied to key k
   and input block x.

4.1.1 AES in Counter Mode

   The default keystream generator cipher SHALL be AES [AES] used in the
   Segmented Integer Counter Mode, with an n_e = 128-bit key size and an
   n_b = 128-bit block size.

   Conceptually, counter mode consists of encrypting successive
   integers. The actual definition is somewhat more complicated, in
   order to randomize the starting point of the integer sequence. Each
   packet is encrypted with a distinct keystream segment, which is
   computed as follows.

   Keystream generation

   A keystream segment is the concatenation of the 128-bit output blocks
   of the AES cipher in the encrypt direction, using key k = k_e, in
   which the block indices are in increasing order. Symbolically, each
   keystream segment looks like

      E(k,A) || E(k,A + 1 mod 2^128) || E(k,A + 2 mod 2^128) ...

   The 128-bit integer value A is defined as 2^16 times the packet
   index, i, plus k_s (the salting key), modulo 2^128:

      A = (k_s + (i * 2^16)) modulo 2^128.

   Note that the initial value A is fixed for each packet. The number of
   blocks of keystream generated for any fixed value of A MUST NOT
   exceed 2^16.

   The AES has a block size of 128 bits, so 2^16 output blocks are
   sufficient to generate the 2^23 bits of keystream needed to encrypt
   the largest possible RTP packet (actually, except for IPv6
   'jumbograms' [RFC2675], which are not likely to be used for RTP-based
   multimedia traffic).

   This restriction on the maximum number of RTP packets ensures
   the security of the encryption method by limiting the effectiveness
   of probabilistic attacks [BR98].
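
   The keystream generation above can be sketched as follows. This is a
   minimal, non-normative sketch: toy_block_cipher is a hash-based
   placeholder standing in for E(k,x) and is NOT AES, the big-endian
   encoding of the integer A into a 128-bit block is an assumption, and
   all function names are hypothetical.

```python
import hashlib

MOD128 = 1 << 128


def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Stand-in for E(k, x). NOT AES: a placeholder for illustration only."""
    return hashlib.sha256(key + block).digest()[:16]


def initial_counter(k_s: int, index: int) -> int:
    """A = (k_s + index * 2^16) mod 2^128."""
    return (k_s + (index << 16)) % MOD128


def keystream_segment(E, k_e: bytes, k_s: int, index: int,
                      n_octets: int) -> bytes:
    """Concatenate E(k_e, A), E(k_e, A+1), ... and truncate to n_octets."""
    n_blocks = -(-n_octets // 16)             # ceiling division
    if n_blocks > (1 << 16):
        raise ValueError("more than 2^16 blocks requested for one packet")
    a = initial_counter(k_s, index)
    out = bytearray()
    for j in range(n_blocks):
        block = ((a + j) % MOD128).to_bytes(16, "big")   # assumed encoding
        out += E(k_e, block)
    return bytes(out[:n_octets])
```

   With a fixed salting key, consecutive packet indices give starting
   values of A that are 2^16 apart, which is why at most 2^16 blocks
   may be generated per packet.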

4.1.2 AES in f8-mode

   To encrypt UMTS (Universal Mobile Telecommunications System, i.e. 3G
   networks) data, a solution (see [ES3D]) known as the f8-algorithm has
   been developed. On a high level, the proposed scheme is a variant of
   Output Feedback Mode (OFB) [HAC], with a more elaborate
   initialization and feedback function. As in normal OFB, the core
   consists of a block cipher. We define here the use of AES as the
   default block cipher in f8-mode for RTP encryption, with a 128-bit
   key and block size.

   Figure 2 shows the structure of block cipher, E, running in what we
   shall call "f8-mode of operation".

                   IV
                    |
                    v
                +------+
                |      |
           +--->|  E   |
           |    |      |
           |    +------+
           |        |
     m --> *        +-----------+-------------+--  ...     ------+
           |    IV' |           |             |                  |
           |        |   j=1 --> *     j=2 --> *   ...  j=L-1 --> *
           |        |           |             |                  |
           |        |      +--> *        +--> *   ...       +--> *
           |        |      |    |        |    |             |    |
           |        v      |    v        |    v             |    v
           |    +------+   | +------+    | +------+         | +------+
           |    |      |   | |      |    | |      |         | |      |
    k_e ---+--->|  E   |   | |  E   |    | |  E   |         | |  E   |
                |      |   | |      |    | |      |         | |      |
                +------+   | +------+    | +------+         | +------+
                    |      |    |        |    |             |    |
                    +------+    +--------+    +--  ...  ----+    |
                    |           |             |                  |
                    v           v             v                  v
                   S(0)        S(1)          S(2)  . . .       S(L-1)

   Figure 2. f8-mode of operation (asterisk, *, denotes bitwise XOR).

   Keystream Generation

   As above, let E(k_e,x) be the 128-bit output of AES in the encrypt
   direction when applied to the n_e = 128-bit key k_e and n_b = 128-bit
   plaintext block x. The Initialization Vector (IV) is determined as
   described in Section

   Let IV', S(j), and m denote n_b-bit blocks, determined below. The
   keystream, S(0) || ... || S(L-1), for an N-bit message is defined by
   setting IV' = E(k_e XOR m, IV), and S(-1) = 00..0. For j = 0,1,..,
   L-1 where L = N/n_b (rounded up to nearest integer) compute

            S(j) = E(k_e, IV' XOR j XOR S(j-1))

   Notice that the IV is not used directly. Instead it is fed through E
   under another key to produce an internal, "masked" value (denoted
   IV') to prevent an attacker from gaining known input/output pairs.
   The role of the internal counter is to prevent short keystream
   cycles. The value of the key mask m is defined to be

           m = k_s || 0x555..5,

   i.e. the salting key, appended by the binary pattern 0101.. to fill
   the entire desired key size, n_e.
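
   The f8 keystream recursion and the key mask m can be sketched as
   follows. This is a non-normative sketch under stated assumptions:
   toy_block_cipher is a hash-based placeholder for E and is NOT AES,
   the counter j is assumed to be encoded as a big-endian 128-bit
   block, the salting key is assumed to be a whole number of octets,
   and all function names are hypothetical.

```python
import hashlib


def toy_block_cipher(key: bytes, block: bytes) -> bytes:
    """Stand-in for E(k, x). NOT AES: a placeholder for illustration only."""
    return hashlib.sha256(key + block).digest()[:16]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def key_mask(k_s: bytes, n_e_octets: int = 16) -> bytes:
    """m = k_s || 0x555..5, filled up to the key size n_e."""
    if len(k_s) > n_e_octets:
        raise ValueError("salting key longer than the encryption key")
    return k_s + b"\x55" * (n_e_octets - len(k_s))


def f8_keystream(E, k_e: bytes, k_s: bytes, iv: bytes,
                 n_octets: int) -> bytes:
    """S(0) || ... || S(L-1), truncated to n_octets."""
    m = key_mask(k_s, len(k_e))
    iv_masked = E(xor_bytes(k_e, m), iv)      # IV' = E(k_e XOR m, IV)
    s_prev = bytes(16)                        # S(-1) = 00..0
    out = bytearray()
    for j in range(-(-n_octets // 16)):       # L = N/n_b, rounded up
        counter = j.to_bytes(16, "big")       # assumed encoding of j
        s_prev = E(k_e, xor_bytes(xor_bytes(iv_masked, counter), s_prev))
        out += s_prev
    return bytes(out[:n_octets])
```

   With the default n_s = n_b = 128 bits and n_e = 128 bits, the mask
   is simply the salting key and no 0x55 fill octets are appended.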

   The maximum allowable packet size can be determined as follows. The
   AES has a block size of 128 bits, and assuming that AES behaves like
   a random function, it is (heuristically) secure to generate about
   2^64 output blocks, which is sufficient to generate 2^71 bits of
   keystream. For practical sizes of the RTP packets, much fewer blocks
   are required though, and the counter j above will often be
   sufficient if implemented as a 16- or 32-bit counter.

   SRTP IV Formation

   The purpose of the following IV formation is to provide a feature
   which we call implicit header authentication (IHA), see Section

   The IV for 128-bit block AES-f8 is formed in the following way:

        IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC

   M, PT, SEQ, TS, SSRC are taken from the RTP header; ROC is from the
   crypto context.
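
   The field widths add up to 128 bits (8 + 1 + 7 + 16 + 32 + 32 + 32).
   A non-normative sketch of the packing, with a hypothetical function
   name and with the header fields passed in as already-parsed
   integers, might look like this:

```python
import struct


def srtp_f8_iv(marker: int, payload_type: int, seq: int,
               timestamp: int, ssrc: int, roc: int) -> bytes:
    """IV = 0x00 || M || PT || SEQ || TS || SSRC || ROC (16 octets)."""
    # M (1 bit) and PT (7 bits) share one octet, as in the RTP header.
    m_pt = ((marker & 0x1) << 7) | (payload_type & 0x7F)
    # "!" selects network byte order with no padding between fields.
    return struct.pack("!BBHIII", 0x00, m_pt, seq, timestamp, ssrc, roc)


iv = srtp_f8_iv(marker=1, payload_type=96, seq=0x1234,
                timestamp=0x01020304, ssrc=0xDEADBEEF, roc=2)
```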

   The presence of the SSRC as part of the IV allows AES_f8 to be used
   when a master key is shared between multiple streams, see Section
   10.2.

   SRTCP IV Formation

   The IV for 128-bit block AES-f8 is formed in the following way:

   IV = 0x00000000 || E || SRTCP index || V || P || RC || PT || length
   || SSRC

   V, P, RC, PT, length, SSRC are taken from the first header in the
   RTCP compound packet. E || SRTCP index is the added 32-bit index to
   the packet.
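
   Since V, P, RC, PT, length and SSRC occupy the first 8 octets of an
   RTCP packet in exactly this order, the SRTCP IV can be sketched as
   below. The sketch is non-normative, the function name is
   hypothetical, and the E-bit and the SRTCP index are assumed to be
   available as integers.

```python
import struct


def srtcp_f8_iv(e_bit: int, srtcp_index: int, rtcp_packet: bytes) -> bytes:
    """IV = 0x00000000 || E || SRTCP index || V || P || RC || PT ||
    length || SSRC, taking the last six fields from the first header."""
    if len(rtcp_packet) < 8:
        raise ValueError("need at least the first 8 octets of the packet")
    # E (1 bit) followed by the 31-bit SRTCP index form one 32-bit word.
    e_and_index = ((e_bit & 0x1) << 31) | (srtcp_index & 0x7FFFFFFF)
    return struct.pack("!II", 0, e_and_index) + rtcp_packet[:8]


header = bytes.fromhex("80c80006" "11223344")   # example first RTCP header
iv = srtcp_f8_iv(1, 5, header + b"rest of the compound packet")
```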

4.1.3 NULL Cipher

   The NULL cipher is used when no confidentiality for RTP is requested.
   The keystream can be thought of as "000..0", i.e. the encryption
   simply copies the plaintext input into the ciphertext output.

4.2 Message Authentication and Integrity

   Common parameters

   * k_a is the session authentication key.
   * n_a is the bit-length of the authentication key. The default is
   128 bits.
   * n_tag is the bit-length of the output authentication tag. The
   default is 32 bits.
   * SRTP_PREFIX_LENGTH is the octet length of the keystream prefix as
   defined above.
   * M is the Authenticated Portion as specified in Section 3 for RTP
   and 3.3 for RTCP.

   The session key is by default derived as specified in Section 4.3.
   The values of n_a, n_tag, and SRTP_PREFIX_LENGTH MUST be fixed for
   any particular fixed value of the key.

   Below we describe the process of computing authentication tags. The
   SRTP receiver verifies a message/authentication tag pair as follows.
   A new authentication tag is computed using one of the algorithms
   below, and it is compared to the tag associated with the message. If
   the two tags are equal, then the message/tag pair is valid;
   otherwise, it is not and the error audit message "AUTHENTICATION
   FAILURE" MUST be returned.

4.2.1. HMAC/SHA1

   The default authentication code is HMAC with SHA1 [HMAC]. When
   HMAC/SHA1 is used, the SRTP_PREFIX_LENGTH is 0. For RTP, the HMAC is
   applied to the concatenation of the Authenticated Portion of the
   packet (M) and the rollover counter in the cryptographic context,
   i.e. HMAC(k_a, M || ROC). For RTCP, we apply HMAC to the
   corresponding M, only. By default, the output shall be truncated to
   the n_tag left-most bits.
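
   Tag computation and verification with HMAC/SHA1 can be sketched with
   the Python standard library as follows. This is a non-normative
   sketch: the function names are hypothetical, the ROC is assumed to
   be appended as a 32-bit value in network byte order, and n_tag is
   assumed to be a multiple of 8.

```python
import hashlib
import hmac
import struct


def srtp_hmac_tag(k_a: bytes, m: bytes, roc: int, n_tag: int = 32) -> bytes:
    """Left-most n_tag bits of HMAC-SHA1(k_a, M || ROC), for RTP."""
    data = m + struct.pack("!I", roc)         # ROC encoding is an assumption
    return hmac.new(k_a, data, hashlib.sha1).digest()[: n_tag // 8]


def srtcp_hmac_tag(k_a: bytes, m: bytes, n_tag: int = 32) -> bytes:
    """Left-most n_tag bits of HMAC-SHA1(k_a, M), for RTCP."""
    return hmac.new(k_a, m, hashlib.sha1).digest()[: n_tag // 8]


def verify_tag(computed: bytes, received: bytes) -> bool:
    """Compare tags in constant time; False means AUTHENTICATION FAILURE."""
    return hmac.compare_digest(computed, received)


k_a = b"k" * 16                               # example key only
tag = srtp_hmac_tag(k_a, b"authenticated portion", roc=0)
```

   The receiver recomputes the tag over the received M (and its own
   ROC) and accepts the packet only if verify_tag returns True.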

4.2.2 TMMH/16

   TMMH is a simple function that maps a key and a message to a hash
   value. This hash value is encrypted by combining it with the
   keystream prefix to make the authentication tag, as described below.

   TMMH/16 uses sixteen bit unsigned words as a basic data unit, and
   besides the above common parameters we define the following
   parameters for convenience:

   - MESSAGE_LENGTH is the octet length of M.

   - K is the key, i.e. k_a.

   - KEY_LENGTH is the octet length of K, i.e. n_a divided by 8.

   - TAG is the authentication tag, which is the output of TMMH/16

   - TAG_LENGTH is the octet length of the authentication tag, i.e.
   n_tag divided by 8. This value defines SRTP_PREFIX_LENGTH to be equal

   - PREFIX is the key stream prefix for the current packet as defined
   in Section 4.1.

   The values of KEY_LENGTH and TAG_LENGTH MUST obey the alignment
   restrictions described below.

   For TMMH/16, a word is 16 bits (2 octets) long, so TAG_LENGTH and
   KEY_LENGTH MUST be even. If MESSAGE_LENGTH is odd, the message MUST
   be padded with a zero octet, but this does not change the value of
   MESSAGE_LENGTH.

   The words of the key are denoted as K[0], K[1], ..., K[KEY_WORDS-1],
   and the words of the message (after zero padding, if needed) are
   denoted as M[1], M[2], ..., M[MSG_WORDS], where MSG_WORDS is the
   smallest number such that 2 * MSG_WORDS is at least MESSAGE_LENGTH,
   and KEY_WORDS is KEY_LENGTH / 2.

   If MESSAGE_LENGTH is greater than KEY_LENGTH - TAG_LENGTH, then the
   value of TMMH/16 is undefined. Implementations MUST indicate an
   error if asked to hash a message with such a length. Otherwise,
   the hash value is defined to be the length TAG_WORDS sequence of
   words in which the j-th word in the sequence is defined as

   T[j] = [[ K[j] * MESSAGE_LENGTH +32 K[j+1] * M[1] +32 K[j+2] * M[2]
         +32 ... K[j+MSG_WORDS] * M[MSG_WORDS] ] modulo p ] modulo 2^16

   where j ranges from zero to TAG_WORDS-1.

   Here, TAG_WORDS is equal to TAG_LENGTH/2, and p is equal to
   2^16 + 1.  The symbol * denotes multiplication and the symbol +32
   denotes addition modulo 2^32.

   To compute the authentication tag of an SRTP packet, the TMMH hash
   value of that message is computed, then that value is combined with
   the keystream prefix as defined in Section 4.1.  The combining
   operation is word-wise addition modulo 2^16 (for TMMH/16).

   TAG[j] = T[j] +16 PREFIX[j], where j ranges from zero to TAG_WORDS-1.

   Note that for RTP, where HMAC is applied to M || ROC, TMMH is applied
   to M only. This is because, for TMMH, the dependence on the ROC is
   already inherent in the PREFIX quantity.
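
   The TMMH/16 hash and tag computation described in this section can
   be sketched as follows. This is a non-normative sketch: the function
   names are hypothetical, and 16-bit words are assumed to be read from
   octet strings in network (big-endian) byte order.

```python
P = (1 << 16) + 1                             # p = 2^16 + 1


def words16(data: bytes) -> list:
    """Split octets into 16-bit words, zero-padding an odd length."""
    if len(data) % 2:
        data += b"\x00"
    return [int.from_bytes(data[i:i + 2], "big")
            for i in range(0, len(data), 2)]


def tmmh16_hash(key: bytes, message: bytes, tag_length: int) -> list:
    """Return the words T[0], ..., T[TAG_WORDS-1]."""
    if len(key) % 2 or tag_length % 2:
        raise ValueError("KEY_LENGTH and TAG_LENGTH must be even")
    if len(message) > len(key) - tag_length:
        raise ValueError("TMMH/16 is undefined for this message length")
    k = words16(key)
    m = words16(message)                      # m[0] here is M[1] in the text
    t = []
    for j in range(tag_length // 2):
        acc = (k[j] * len(message)) & 0xFFFFFFFF         # K[j] * MESSAGE_LENGTH
        for i, word in enumerate(m, start=1):
            acc = (acc + k[j + i] * word) & 0xFFFFFFFF   # "+32" addition
        t.append((acc % P) % (1 << 16))
    return t


def tmmh16_tag(key: bytes, message: bytes, prefix: bytes,
               tag_length: int) -> bytes:
    """TAG[j] = T[j] +16 PREFIX[j] for j = 0, ..., TAG_WORDS-1."""
    if len(prefix) < tag_length:
        raise ValueError("keystream prefix shorter than TAG_LENGTH")
    t = tmmh16_hash(key, message, tag_length)
    pre = words16(prefix[:tag_length])
    return b"".join(((a + b) & 0xFFFF).to_bytes(2, "big")
                    for a, b in zip(t, pre))
```

   For example, with key words 1, 2, ..., 6, the four-octet message
   with words M[1] = 1 and M[2] = 2, and TAG_LENGTH = 2, the single
   hash word is T[0] = 1*4 + 2*1 + 3*2 = 12.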

4.3 Key Derivation

4.3.1 Key Derivation Algorithm

   Regardless of the encryption or authentication transform that is
   employed (it may be a defined transform or newly introduced according
   to Section 7), SRTP key derivation is the process of generating
   session keys, without extra communication between the parties and in
   a sender-receiver synchronized way.

                            packet index ---+
                  +-----------+         +--------+ session encr_key
                  | ext       | master  |        |---------->
                  | key mgmt  | key     |  key   |
                  | (optional |-------->| deriv  |---------->
                  | rekey)    |         |        | session auth_key
                  +-----------+         +--------+

   Figure 4: SRTP key derivation.

   At least one initial key derivation is always performed by SRTP.
   Further applications of the key derivation MAY be performed,
   according to the 'key derivation rate' value in the crypto context.

   Let m >= 64, and n be positive integers. A pseudo random function
   family is a set of keyed functions {PRF_m^n(k,x)} such that for
   (secret) random key k, given m-bit x, PRF_m^n(k,x) is an n-bit
   string, computationally indistinguishable from random n-bit strings.

   Let a DIV t denote integer division of a by t, rounded down, and with
   the convention that a DIV 0 = 0 for all a. We also make the
   convention of treating a DIV t as a bit string of the same length as
   a, and thus "a DIV t" will in general have leading zeros. Key
   derivation is defined as follows. To generate session key(s) for the
   current packet, let the n-bit SRTP key for this packet be

   PRF_m^n(k_master, <label> || (index DIV key_derivation_rate) ||

   where <label> is a 4-bit constant (see below), key_derivation_rate is
   as determined in the crypto context, and index is the packet index
   (i.e. the 48-bit ROC || SEQ for SRTP). We then pad by 1010... to fill
   the m-bit input size.

   The session keys are now derived using:

   - k_e (SRTP encryption): <label> = 0x0, n = n_e.

   - k_a (SRTP authentication): <label> = 0x1, n = n_a.

   where n_e and n_a are as determined in the cryptographic context.
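
   The construction of the m-bit PRF input can be sketched as follows.
   This is a non-normative sketch under stated assumptions: the input
   is taken to be the 4-bit label, followed by index DIV
   key_derivation_rate as a bit string of the same length as the
   index, followed by the pad 1010... (assumed to start with a 1 bit)
   up to m bits. The function names are hypothetical.

```python
LABEL_SRTP_ENCRYPTION = 0x0
LABEL_SRTP_AUTHENTICATION = 0x1


def div(a: int, t: int) -> int:
    """a DIV t, rounded down, with the convention a DIV 0 = 0."""
    return a // t if t else 0


def derivation_input(label: int, index: int, rate: int,
                     index_bits: int = 48, m: int = 128) -> bytes:
    """Build <label> || (index DIV rate) || 1010... as an m-bit block."""
    if not 0 <= label < 16:
        raise ValueError("label must fit in 4 bits")
    if not 0 <= index < (1 << index_bits):
        raise ValueError("index does not fit in index_bits")
    bits = f"{label:04b}" + f"{div(index, rate):0{index_bits}b}"
    pad_len = m - len(bits)
    bits += ("10" * pad_len)[:pad_len]        # assumed pad pattern 1010...
    return int(bits, 2).to_bytes(m // 8, "big")


# Input for the SRTP authentication key with ROC = 1, SEQ = 0 and a
# key derivation rate of 256 packets.
x = derivation_input(LABEL_SRTP_AUTHENTICATION, index=1 << 16, rate=256)
```

   The resulting block x would then be passed to PRF_m^n(k_master, x),
   with n = n_e or n = n_a depending on the label.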

   Note that for the defined counter mode and f8 transforms, the salting
   key k_s is used directly as determined in the cryptographic context
   (not going through the derivation).

   Note that even for a key_derivation_rate of 0, the initial key
   derivation still takes place once. The derivation operation is
   simpler to implement if the non-zero rates are chosen to be powers of
   2 or, preferably, powers of 256.

   Note that the previously mentioned limit on key usage to at most 2^48
   packets for one given key applies both to the derived session keys
   and to the master keys, as key derivation does not increase this
   maximum number.

4.3.2 AES-CM PRF

   The currently defined PRF is keyed by 128 to 256 bit (master) keys,
   has input block size m = 128 and can produce n-bit outputs for
   essentially arbitrary n. We define PRF_m^n(k,x) to be AES in counter
   mode as described in Section 4.1.1, applied to (master) key k, input
   block A = x, and with the output keystream truncated to the n first
   (left-most) bits. (Requiring n/128, rounded up, applications of AES.)

4.3.3 SRTCP Key Derivation

   SRTCP uses the same master key as SRTP, i.e. it is shared between the
   two protocols. To do this securely, the following changes are made to
   Section 4.3.1 when applying session key derivation for SRTCP.

   Replace the index by the 32-bit quantity: 0 || SRTCP index (i.e.
   excluding the E-bit, replacing it with a fixed 0-bit), and use
   <label> = 0x2 for the SRTCP encryption key and <label> = 0x3 for the
   SRTCP authentication key.

   SRTCP SHALL use the same salting key as SRTP.

5. Default and Mandatory Transforms

   The "default" transforms also are "mandatory-to-implement" transforms
   in SRTP.  Of course, "mandatory-to-implement" does not imply

5.1 Encryption: AES-CM

   AES running in Counter Mode, as defined in Section 4.1.1, is the
   default encryption algorithm, which is mandatory-to-implement.

5.2 Authentication/Integrity: HMAC/SHA1

   HMAC/SHA1, as defined in Section 4.2.1, is the default and mandatory-
   to-implement message authentication code.

5.3 Key Derivation: AES-CM PRF

   The AES Counter Mode PRF defined in Sections 4.3.1 and 4.3.2, is the
   default and mandatory-to-implement method for generating keys.

6. SRTP Parameters

   The SRTP-WINDOW-SIZE is defined to be at least 64 (Section 3.2.3).

   The currently defined modes are Segmented Integer Counter Mode
   (default), f8-mode (Section 4), and the NULL Cipher. The default
   cipher is AES (Section 4), used with a block- and encryption key size
   of n_b = n_e = 128 bits.

   The currently defined authentication functions are HMAC/SHA1 and
   TMMH/16. The default is no authentication for RTP (authentication is
   mandatory for RTCP). For HMAC/SHA1, the default
   key-size is n_a = 128 bits and the output length is n_tag = 32 bits.
   SRTP_PREFIX_LENGTH is therefore by default 0.

   The default size of the master key and salting key shall thus also be
   128 bits.

   The default value for the key derivation-rate field in the context is
   "0", in practice meaning "no key-derivation" (though one (1)
   application of it is mandatory, see Section 4.3).

7. Adding SRTP Transforms

   Section 4 provides examples of the level of detail needed for
   defining transforms.  Whenever a new transform is to be added to
   SRTP, a companion standards-track RFC MUST be written to define
   exactly how the new transform can be used with SRTP (and SRTCP). Such
   a companion RFC should avoid overlapping with the SRTP protocol
   document. Note however, that it might be necessary to extend
   the cryptographic context's definition with new parameters, or add
   steps to the packet processing. The companion RFC shall explain any
   known issues regarding interactions between the transform and other
   aspects of SRTP.

   Encryption and authentication transforms require some set of optional
   parameters or have optional modes of operation. The companion RFC
   shall select fixed or default values for these parameters (whenever
   possible), to reduce key management complexity. The mode of operation
   of ciphers and related parameters (e.g. IV-formation for RTP and
   RTCP) shall be defined.

   Each new transform document should specify its key attributes, e.g.
   size of keys (minimum, maximum, recommended), format of keys,
   recommended/required processing of input keying material,
   requirements/recommendations on re-keying and key derivation, etc.

8. Rationale

8.1 Key derivation

   Key derivation has been introduced to lighten the burden on the key-
   exchange: the four keys necessary to protect the RTP session (SRTP
   and SRTCP encryption keys, SRTP and SRTCP authentication keys) are
   derived from a single master key in a cryptographically secure way.
   The security stands (and falls) with the master key as the derived
   session keys are cryptographically independent (under reasonable
   assumptions on the PRF, here AES-based).

   Subsequent applications of the key derivation are optional but will
   give security benefits when enabled. They prevent a cryptanalyst
   from obtaining large amounts of ciphertext produced by a single fixed
   session key. They provide backwards and forward security in the sense
   that a compromised session key does not compromise other session keys
   derived from the same master (but of course, a leaked master key
   reveals all session keys).

   If future encryption transforms are added, having a short IV that
   cannot fit the SEQ+ROC combination, a proper refresh-policy will
   enable these algorithms to encrypt longer streams without need to
   involve expensive key management operations.

8.2 Salting key

   The salting key has been introduced to protect against some attacks
   on additive stream ciphers, see Section 10.1. For simplicity, we
   require by default that the salting key has the same size as the
   block size of the cipher.

8.3 TMMH: Message Integrity from Universal Hashing

   The Truncated Multi-Modular Hash Function (TMMH) is a "universal"
   hash function suitable for message authentication in the Wegman-
   Carter paradigm [WC81].  It is simple, quick, and especially
   appropriate for Digital Signal Processors and other processors with a
   fast multiply operation, though a straightforward implementation
   requires storage equal in length to the largest message to be hashed.

   TMMH offers secure (provably secure under randomness assumptions on
   the added prefix) and very efficient MACs. However, as this approach
   to message integrity is new (not conceptually, but within
   standardization), we have chosen to make HMAC the default transform
   as many devices already have an HMAC implementation used for other
   purposes. We envision a migration to TMMH so that HMAC may eventually
   be phased-out from SRTP.

8.4 Data Origin Authentication Considerations

   Note that in unicast and, in general, in keys-per-user scenarios,
   integrity and data origin authentication are provided together.
   However, in group scenarios where the keys are shared between
   members, the MAC tag only proves that a member of the group sent the
   packet, but does not prove the actual sender. Data origin
   authentication (DOA) for multicast and group RTP sessions is a hard
   problem that needs a solution; while some promising proposals are
   being investigated [PCST1, PCST2], more work is needed to rigorously
   specify these technologies. Thus SRTP data origin authentication in
   groups is for further study.

   DOA can alternatively be achieved using digital signatures. However,
   this has a high cost in terms of bandwidth and processing time, so we
   do not consider signatures in this discussion.

   The presence of mixers and translators does not allow data origin
   authentication in case the RTP payload and/or the RTP header are
   manipulated. Note that middle entities of this type also disrupt
   end-to-end confidentiality (since the IV formation depends, e.g., on
   the RTP header being preserved).

9. Key Management Considerations

   The SSRC and the random initial sequence number are known to the key

   A particular key management system might allow the different RTP
   sessions to use identical cryptographic master keys. Note that this
   is possible if the design of the synchronization mechanism, i.e. the
   IV in the case of the f8-mode, avoids keystream re-use (the two-time
   pad, Section 10.2). If this is used, the SSRC MUST be unique per

   A particular key management system might choose to provide re-key by
   associating a key for a crypto context with a pair of SEQ+ROC values,
   <FirstSEQ+ROC, LastSEQ+ROC>. The key management specification may
   require the SRTP implementation to check the SEQ+ROC of an incoming
   SRTP packet against the interval for the master key in the context
   before using the key. These interactions are defined by the key
   management interface to SRTP and are not defined by this protocol

   The key management interface might use the defaults for the SRTP
   protocol or define values for any and all SRTP parameters such as the

   - cipher and related parameters, including mode of operation
   - key(s), i.e. correct master (and salting) key(s), and related
   - authentication algorithm(s), and related parameter,
   - re-keying (key lifetime) and key derivation parameters,
   - SSRC, network address, RTP port pair
   - Current value of ROC and SEQ (or zeros prior to session
   - Replay window size

10. Security Considerations

10.1 Key Usage

   The effective key size is determined (upper bounded) by the size of
   the master key and, for encryption, the size of the salting key.
   Any additive stream cipher is vulnerable to attacks that use
   statistical knowledge about the plaintext source to enable key
   collision and time-memory tradeoff attacks [MF00,H80,Bi96]. These
   attacks take advantage of commonalities among plaintexts, and provide
   a way for a cryptanalyst to amortize the computational effort of
   decryption over many keys, thus reducing the effective key size of
   the cipher. A detailed analysis of these attacks and their
   applicability to the encryption of Internet traffic is provided in
   [MF00]. In summary, the effective key size of SRTP when used in a
   security system in which m distinct keys are used, is equal to the
   key size of the cipher less the logarithm (base two) of m. Protection
   against such attacks can be provided simply by increasing the size of
   the keys used, which here can be accomplished by the use of the
   "salting key". Note that the salting key MUST be random, but MAY be

   Implementations SHOULD use keys that are as large as possible. Please
   note that in many cases increasing the key size of a cipher does not
   affect the throughput of that cipher.
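
   As a worked illustration of the effective key size estimate above
   (key size less the base-two logarithm of the number m of distinct
   keys in use), the following non-normative sketch computes the
   figure for a hypothetical deployment. The function name and the
   chosen numbers are illustrative only.

```python
import math


def effective_key_bits(key_bits: int, distinct_keys: int) -> float:
    """Key size of the cipher less log2(m), m = number of distinct keys."""
    return key_bits - math.log2(distinct_keys)


# 128-bit keys with 2^32 distinct keys in use across the system.
print(effective_key_bits(128, 2 ** 32))       # 96.0
```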

10.2 SSRC collision and two-time pad

   Any fixed keystream output, generated from the same key and index
   should only be used to encrypt once. Re-using such keystream
   (jokingly called a 'two-time pad' system by cryptographers), can
   seriously compromise security. The NSA's VENONA project [C99]
   provides a historical example of such a compromise. In SRTP, a 'two-
   time pad' is avoided by requiring the key, or some other parameter of
   cryptographic significance, to be unique per RTP stream.
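
   The danger of keystream re-use can be seen in a few lines. In this
   non-normative sketch, the keystream octets and the two plaintexts
   are arbitrary example values and the helper name is hypothetical.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


keystream = bytes([0x3A, 0x91, 0x5C, 0x07, 0xE2, 0x48])   # used twice
p1 = b"attack"
p2 = b"defend"
c1 = xor_bytes(p1, keystream)
c2 = xor_bytes(p2, keystream)

# The keystream cancels out: an eavesdropper holding only c1 and c2
# learns p1 XOR p2 without knowing any key material.
leak = xor_bytes(c1, c2)
```

   Anyone who then knows or guesses one plaintext recovers the other
   directly from the leaked exclusive-or.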

   It may in some cases be desirable that multiple crypto contexts
   contain identical master keys. For instance, there could be a desire
   for a group to share a single key. Issues as above (two-time pad)
   MUST then be considered. As discussed in Section 9, f8 may allow such
   sharing by its use of the SSRC in the IV; however, the effect of an
   eventual RTP SSRC collision detection MUST be taken into account.

   Note that sharing a master key between multiple streams in a
   multimedia session implies using a distinct SSRC in the IV of AES-f8.
   This means that each SSRC MUST be unique among all the RTP streams
   inside that multimedia session, to avoid unlucky IV combinations that
   would result in a two-time pad.

10.3 Confidentiality of the RTP Payload

   By using 'seekable' stream ciphers, SRTP avoids the denial of service
   attacks that are possible on stream ciphers that lack this property
   (these attacks are described in Section 3.4 of [B96]). It is
   important to be aware that, as with any stream cipher, the exact
   length of the payload is revealed by the encryption. This means
   that it may be possible to deduce certain "formatting bits" of the
   payload, as the length of the codec output might vary due to certain
   parameter settings etc. This, in turn, implies that the corresponding
   bit of the keystream can be deduced. However, if the stream cipher is
   secure (counter mode and f8 are provably secure under certain
   assumptions), knowledge of a few bits of the keystream will not aid
   an attacker in predicting the following keystream bits. Thus, the
   payload length (and information deducible from this) will leak, but
   nothing else.

10.4 Confidentiality of the RTP Header

   With the described proposal, RTP headers are sent in the clear to
   allow for header compression. This means that data such as payload
   type, synchronization source identifier, and timestamp are available
   to an eavesdropper. Moreover, since RTP allows for future extensions
   of headers, we cannot foresee what kind of possibly sensitive
   information might also be "leaked".

   The described proposal is a low-cost method, which allows header
   compression to reduce bandwidth. It is up to the endpoints' policies
   to decide about the security protocol to employ. If the header
   compression is omitted, other solutions might be applicable. In other
   words, we provide a solution that works in the most general scenario,
   even in the most demanding one (like conversational multimedia over
   low-bandwidth, unreliable media). Of course the solution will then
   also work in less restricted environments, but we suggest that if one
   really needs to protect headers, and is allowed to do so by the
   surrounding environment, then one should also look at alternatives,
   e.g. IPsec. In addition, we strongly recommend the use of profiles to
   select the right trade-off for the required level of security, e.g.
   if the headers can be left in cleartext or not.

10.5 Integrity of the RTP packet

   Additive ciphers do not provide any security service other than
   privacy. In particular, they do not provide message authentication
   (see [RK99] or [HAC] for a discussion of this security service).

   However, SRTP uses a message authentication code to provide that
   security service.

   With HMAC being a well-studied authentication scheme, based on a
   provably secure construction, the security against MAC forgery
   depends on the key-size and the size of the output tags (or for some
   attacks, half the size of the tag due to the "birthday-paradox").

   The default tag size for HMAC is fixed at 32 bits. Other sizes may be
   defined. The use of a truncated tag is motivated by the fact that it
   may be desirable, e.g. in wireless environments, to save bandwidth.
   The choice of such a truncation MUST be evaluated against the
   reduction in security it implies. The default 32-bit size is a
   compromise, offering a reasonable level of security, taking into
   account the real-time aspects of the protected protocol. High
   security applications SHOULD however use larger tags.
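
   As a minimal sketch of tag truncation, the Python fragment below
   computes an HMAC and keeps only its leftmost octets. The choice of
   SHA-1 as the underlying hash, the function names, and the key and
   message layout are assumptions made for illustration; they are not
   taken from this section.

```python
import hashlib
import hmac

TAG_LEN = 4  # octets, i.e. the default 32-bit tag size


def make_tag(auth_key: bytes, message: bytes, tag_len: int = TAG_LEN) -> bytes:
    """Return the leftmost tag_len octets of HMAC(auth_key, message).

    SHA-1 is an assumption of this sketch, not a statement of the default.
    """
    return hmac.new(auth_key, message, hashlib.sha1).digest()[:tag_len]


def verify_tag(auth_key: bytes, message: bytes, tag: bytes) -> bool:
    """Recompute the truncated tag and compare it in constant time."""
    return hmac.compare_digest(make_tag(auth_key, message, len(tag)), tag)


def blind_forgery_probability(tag_len: int) -> float:
    """Chance that a single randomly guessed tag of tag_len octets verifies."""
    return 2.0 ** (-8 * tag_len)
```

   With a 4-octet tag, a single blind guess succeeds with probability
   2^-32; each additional tag octet lowers that by a factor of 256, at
   the cost of one more octet per packet.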

   Authentication is optional because, while the function is typically
   highly desired, there are certain cases (notably in cellular
   environments) where it has an impact in terms of cost, e.g. in
   bandwidth consumption. Also, independently of the tag length, a
   single transmission bit error in the protected part of the packet or
   in the tag itself forces the entire packet to be dropped. For a fixed
   target quality, this implies that the transmitted unit needs stronger
   protection against errors, hence higher cost. In those cases, it is
   up to the user security profile to request authentication.

10.5.1 Integrity of the RTP header: IHA

   The IV formation of the f8-mode gives implicit authentication of the
   RTP header, even if no cryptographic integrity protection is present.
   This means that modifying bits of the RTP header will cause the
   decryption process at the receiver to produce essentially random

11. Interaction with Forward Error Correction mechanisms

   Some considerations apply when Forward Error Correction (FEC)
   mechanisms are used, e.g. as specified in RFC 2733. In particular,
   the order in which SRTP processing and FEC processing are applied is
   of concern.

   The optimal order would be the following:

   - on the sender side, first encrypt the packet, then perform the FEC
      processing, finally authenticate

   - on the receiver side, first authenticate the packet, then perform
      the FEC processing, finally decrypt.

   The motivations for the above ordering are:

   - FEC expands the packet, so performing encryption after FEC would be
      more expensive

   - on the receiver side, authentication has to be verified before
      engaging in the FEC processing, to reduce the effect of certain
      denial of service attacks

   - adding redundancy before encrypting slightly reduces the effective
      key-size and the resistance to attacks

   However, this ordering splits the security processing in two.
   Implementations could gain from keeping the security processing
   strictly tied together; in this case, the recommendation is that the
   security processing takes place after FEC on the sender's side, and
   before FEC on the receiver's side. This implies the cost of placing
   encryption after FEC processing, as explained above, hence the choice
   of the most convenient ordering is left to the application. For the
   sake of interoperability, implementations are requested to place the
   security processing after FEC on the sender's side, and before FEC on
   the receiver's side. This is also the default behavior; any other
   choice has to be agreed out-of-band.
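
   The default ordering can be sketched as follows. This is a toy
   illustration only: the SHA-256 based keystream, the single XOR
   parity packet used as FEC, the HMAC-SHA1 tag, and all function names
   are assumptions of the sketch and are not SRTP or RFC 2733
   processing. RTP headers are omitted.

```python
import hashlib
import hmac

TAG_LEN = 4  # octets; 32-bit tag


def _xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _keystream(enc_key: bytes, index: int, length: int) -> bytes:
    """Stand-in additive keystream built from SHA-256 (NOT an SRTP transform)."""
    out, block = b"", 0
    while len(out) < length:
        seed = enc_key + index.to_bytes(6, "big") + block.to_bytes(4, "big")
        out += hashlib.sha256(seed).digest()
        block += 1
    return out[:length]


def _tag(auth_key: bytes, index: int, ciphertext: bytes) -> bytes:
    mac = hmac.new(auth_key, index.to_bytes(6, "big") + ciphertext, hashlib.sha1)
    return mac.digest()[:TAG_LEN]


def protect(enc_key, auth_key, index, payload):
    """Security processing in one piece: encrypt, then append the tag."""
    ciphertext = _xor(payload, _keystream(enc_key, index, len(payload)))
    return ciphertext + _tag(auth_key, index, ciphertext)


def unprotect(enc_key, auth_key, index, packet):
    """Verify the tag, then decrypt; return None if verification fails."""
    ciphertext, tag = packet[:-TAG_LEN], packet[-TAG_LEN:]
    if not hmac.compare_digest(tag, _tag(auth_key, index, ciphertext)):
        return None
    return _xor(ciphertext, _keystream(enc_key, index, len(ciphertext)))


def xor_parity(packets):
    """Toy FEC: XOR of equal-length packets; recovers a single loss."""
    parity = bytes(len(packets[0]))
    for p in packets:
        parity = _xor(parity, p)
    return parity


def send(enc_key, auth_key, media):
    """Sender: FEC first, then security processing on every packet."""
    stream = list(media) + [xor_parity(media)]
    return [protect(enc_key, auth_key, i, p) for i, p in enumerate(stream)]


def receive(enc_key, auth_key, packets):
    """Receiver: security processing first, then FEC recovery.

    packets lists the protected packets in order (parity last), with None
    marking a loss. Packets that fail verification are treated as lost.
    """
    clear = [
        None if p is None else unprotect(enc_key, auth_key, i, p)
        for i, p in enumerate(packets)
    ]
    missing = [i for i, p in enumerate(clear) if p is None]
    if len(missing) == 1:
        clear[missing[0]] = xor_parity([p for p in clear if p is not None])
    return clear[:-1]  # drop the parity packet
```

   Because verification happens before FEC recovery, a forged packet is
   discarded before it can contaminate the reconstruction of a lost
   one.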

12. IANA Considerations

   The RTP specification establishes a registry of profile names for use
   by higher-level control protocols, such as the Session Description
   Protocol (SDP), to refer to transport methods. This profile registers
   the name "RTP/SAVP".

13. Open Issue

   It is an open issue whether AES-CM needs to provide a means to
   support the use of the same master key for multiple streams. This
   feature was supported in the previous drafts by insertion of the SSRC
   in the IV (under the constraint of unique SSRC). The feature is
   currently supported only by the non-mandatory-to-implement f8-AES.
   The reason for raising this question is that there might be cases
   where the feature is needed, e.g. when a single master key is
   available but there are multiple streams. As an example, it is
   likely that such simplistic key management is used in very 'thin'
   clients that cannot afford to implement anything but the mandatory
   transform. Thus, this may restrict SRTP's applicability in such
   devices.

14. Acknowledgements

   The authors would like to thank Magnus Westerlund, Brian Weis, Robert
   Fairlie-Cuninghame, and Adrian Perrig for their reviews and comments.

15. Authors' Addresses

   Questions and comments should be directed to the authors and

      Mark Baugher
      Cisco Systems, Inc.
      5510 SW Orchid Street     Phone:  +1 503-245-4543
      Portland, OR 97219 USA    Email:  mbaugher@cisco.com

      Rolf Blom
      Ericsson Research
      SE-16480 Stockholm     Phone:  +46 8 58531707
      Sweden                 EMail:  rolf.blom@era.ericsson.se

      Elisabetta Carrara
      Ericsson Research
      SE-16480 Stockholm     Phone:  +46 8 50877040
      Sweden                 EMail:  elisabetta.carrara@era.ericsson.se

      David A. McGrew
      Cisco Systems, Inc.
      San Jose, CA 95134-1706   Phone:  +1 301-349-5815
      USA                       EMail:  mcgrew@cisco.com

      Mats Naslund
      Ericsson Research
      SE-16480 Stockholm     Phone:  +46 8 58533739
      Sweden                 EMail:  mats.naslund@era.ericsson.se

      Karl Norrman
      Ericsson Research
      SE-16480 Stockholm     Phone:  +46 8 4044502
      Sweden                 EMail:  karl.norrman@era.ericsson.se

      David Oran
      Cisco Systems, Inc.
      San Jose, CA 95134-1706
      USA                       EMail:  oran@cisco.com

16. References

   [AES] NIST, "Advanced Encryption Standard (AES)", Draft FIPS,

   [C99]  Crowell, W. P., "Introduction to the VENONA Project",

   [ES3D] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
          Algorithms Group of Experts (SAGE); General Report on the
          Design, Specification and Evaluation of 3GPP Standard
          Confidentiality and Integrity Algorithms", Public report,
          Draft Version 1.0, Dec 1999.

   [ES3E] ETSI SAGE 3GPP Standard Algorithms Task Force, "Security
          Algorithms Group of Experts (SAGE) Report on the Evaluation of
          3GPP Standard Confidentiality and Integrity Algorithms",
          Public report, Draft Version 1.0, Dec 1999.

   [HAC]  Menezes, A., Van Oorschot, P., and Vanstone, S., "Handbook of
          Applied Cryptography", CRC Press, 1997, ISBN 0-8493-8523-7.

   [HMAC] Krawczyk, H., Bellare, M., and Canetti, R.: "HMAC: Keyed-
          hashing for message authentication". IETF RFC 2104, February

   [H80]  Hellman, M. E., "A cryptanalytic time-memory trade-off", IEEE
          Transactions on Information Theory, July 1980, pp. 401-406.

   [MF00]  McGrew, D., and Fluhrer, S., "Attacks on Encryption of
          Redundant Plaintext and Implications on Internet Security",
          the Proceedings of the Seventh Annual Workshop on Selected
          Areas in Cryptography (SAC 2000), Springer-Verlag.

   [RFC1889] Schulzrinne, H., Casner, S., Frederick, R., Jacobson, V.,
           "RTP: A Transport Protocol for Real-Time Applications", IETF
           RFC 1889.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
           Requirement Levels", IETF RFC 2119, March 1997.

   [RFC2401] Kent, S., and R. Atkinson, "Security Architecture for IP",
          IETF RFC 2401, November 1998.

   [RFC2675] Borman, D., Deering, S., Hinden, R., "IPv6 Jumbograms",
          IETF RFC 2675, August 1999.

   [RFC2828] Shirey, R., "Internet Security Glossary", IETF RFC 2828,
            May 2000.

   [RK99]  Rescorla, E., and Korver, B., "Guidelines for Writing RFC
          Text on Security Considerations," draft-rescorla-sec-cons-

   [PCST1] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient and
          Secure Source Authentication for Multicast", in Proc. of
          Network and Distributed System Security Symposium NDSS 2001,
          pp. 35-46, 2001.

   [PCST2] Perrig, A., Canetti, R., Tygar, D., Song, D., "Efficient
           Authentication and Signing of Multicast Streams over Lossy
          Channels", in Proc. of IEEE Security and Privacy Symposium
          S&P2000, pp. 56-73, 2000.

   [WC81] M. N. Wegman and J. L. Carter, "New Hash Functions and Their
          Use in Authentication and Set Equality", JCSS 22, 265-279,

Appendix A: Pseudocode for Index Determination, and ROC and s_l Update

   The following pseudocode processes a packet with sequence number
   SEQ: it determines the index i and updates the rollover counter
   (ROC) and the sequence number of the last (authenticated) packet,
   s_l.

            if (s_l < 32,768)
               if (SEQ - s_l > 32,768)
                  set i to SEQ + 65,536 * (ROC-1)
               else
                  set i to SEQ + 65,536 * ROC
               endif
            else
               if (s_l - 32,768 > SEQ)
                  set ROC to ROC + 1
               endif
               set i to SEQ + ROC * 65,536
            endif
            set s_l to SEQ
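
   A literal Python transcription of the pseudocode is sketched below.
   It assumes the nesting that the indentation suggests (an else branch
   for each test on s_l and on SEQ), and it mirrors the pseudocode
   as-is, including the unconditional update of s_l. The function name
   and the tuple it returns are choices made for this sketch.

```python
def process_seq(seq: int, roc: int, s_l: int):
    """Return (i, roc, s_l) after processing a packet with sequence number seq."""
    if s_l < 32768:
        if seq - s_l > 32768:
            # seq belongs to the previous rollover period
            i = seq + 65536 * (roc - 1)
        else:
            i = seq + 65536 * roc
    else:
        if s_l - 32768 > seq:
            # the 16-bit sequence number has wrapped around
            roc = roc + 1
        i = seq + roc * 65536
    s_l = seq
    return i, roc, s_l
```

   For example, with s_l = 65530 and ROC = 0, a packet with SEQ = 5 is
   taken to follow a wrap-around, so ROC becomes 1 and the index is
   5 + 65,536 = 65,541.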

Appendix B: Test Vectors

B.1 AES-f8 Test Vectors

   All values are in hexadecimal.


   RTP packet header   :   806e5cba50681de55c621599

   RTP packet payload  :   70736575646f72616e646f6d6e657373

   ROC                 :   d462564a
   key                 :   234829008467be186c3de14aae72d62c
   salt key            :   32f2870d
   key-mask (m)        :   32f2870d555555555555555555555555
   key XOR key-mask    :   11baae0dd132eb4d3968b41ffb278379

   IV                  :   006e5cba50681de55c621599d462564a
   IV'                 :   595b699bbd3bc0df26062093c1ad8f73

   j                   :   0
   IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f73
   S(-1)               :   00000000000000000000000000000000
   S(-1) XOR IV' XOR j :   595b699bbd3bc0df26062093c1ad8f73
   S(0)                :   71ef82d70a172660240709c7fbb19d8e
   plaintext           :   70736575646f72616e646f6d6e657373
   ciphertext          :   019ce7a26e7854014a6366aa95d4eefd

   j                   :   1
   IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f72
   S(0)                :   71ef82d70a172660240709c7fbb19d8e
   S(0) XOR IV' XOR j  :   28b4eb4cb72ce6bf020129543a1c12fc
   S(1)                :   3abd640a60919fd43bd289a09649b5fc
   plaintext           :   20697320746865206e65787420626573
   ciphertext          :   1ad4172a14f9faf455b7f1d4b62bd08f

   j                   :   2
   IV' XOR j           :   595b699bbd3bc0df26062093c1ad8f71
   S(1)                :   3abd640a60919fd43bd289a09649b5fc
   S(1) XOR IV' XOR j  :   63e60d91ddaa5f0b1dd4a93357e43a8d
   S(2)                :   220c7a8715266565b09ecc8a2a62b11b
   plaintext           :   74207468696e67
   ciphertext          :   562c0eef7c4802
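
   The XOR steps of the j = 0 and j = 1 blocks above can be checked
   without an AES implementation. The Python sketch below recomputes
   the masked key, the IV, the block cipher input for j = 1, and the
   first two ciphertext blocks from the listed values. The variable
   names are illustrative, and the block cipher outputs IV', S(0) and
   S(1) are taken from the vector rather than recomputed.

```python
def xor(a: bytes, b: bytes) -> bytes:
    """Octet-wise XOR of two equal-length strings."""
    return bytes(x ^ y for x, y in zip(a, b))


h = bytes.fromhex

header = h("806e5cba50681de55c621599")
roc = h("d462564a")
key = h("234829008467be186c3de14aae72d62c")
salt = h("32f2870d")

# key-mask m: the salt key padded with 0x55 octets up to the key length
mask = salt + b"\x55" * (len(key) - len(salt))
masked_key = xor(key, mask)

# IV: one zero octet, the RTP header minus its first octet, then the ROC
iv = b"\x00" + header[1:] + roc

# Block cipher input for j = 1: S(0) XOR IV' XOR j
s0 = h("71ef82d70a172660240709c7fbb19d8e")
s1 = h("3abd640a60919fd43bd289a09649b5fc")
iv_prime = h("595b699bbd3bc0df26062093c1ad8f73")
input_j1 = xor(xor(s0, iv_prime), (1).to_bytes(16, "big"))

# Ciphertext blocks: plaintext block XOR keystream block S(j)
c0 = xor(h("70736575646f72616e646f6d6e657373"), s0)
c1 = xor(h("20697320746865206e65787420626573"), s1)

for name, value in [("key XOR key-mask", masked_key), ("IV", iv),
                    ("S(0) XOR IV' XOR j", input_j1),
                    ("ciphertext 0", c0), ("ciphertext 1", c1)]:
    print(f"{name:20}: {value.hex()}")
```

   Each printed value should match the corresponding line of the test
   vector.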

B.2 AES-CM Test Vectors

    All values are in hexadecimal.

    AES-CM Key:


    Block Cipher Key:     75387824D1F1F3815641B65D78D51EDB
    Salting key:          96C9781981053CBBCB36927844F1932C
    Packet Index:         12345678

    Counter                          Keystream
    96C9781981053CBBCB36A4AC9B69932C EA0AA027BA6D56E44B28F43A7E3E5F58
    96C9781981053CBBCB36A4AC9B69932D CBDB3107EDA8D420D3EF7AB7FF290166
    96C9781981053CBBCB36A4AC9B69932E AED6F7CB14ED49174336CC010AEB8780
    96C9781981053CBBCB36A4AC9B69932F 4C3A754AF027A5C8CCB40E0FE20AF246
    96C9781981053CBBCB36A4AC9B699330 01A6D1CE983EF993E980CC9568587E3D
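
    The Counter column above is consistent with adding the packet
    index, shifted left by 16 bits, to the salting key and then
    incrementing by one per block. The Python check below reproduces
    the five counter values under that assumption. The formula is
    inferred from the listed values, and the Keystream column is not
    recomputed because that would need an AES implementation.

```python
SALT = int("96C9781981053CBBCB36927844F1932C", 16)
PACKET_INDEX = 0x12345678  # "All values are in hexadecimal"


def counter(salt: int, packet_index: int, j: int) -> int:
    """Counter for block j, assuming salt + index * 2^16 + j modulo 2^128."""
    return (salt + (packet_index << 16) + j) % (1 << 128)


counters = [format(counter(SALT, PACKET_INDEX, j), "032X") for j in range(5)]
for c in counters:
    print(c)
```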

    Keystream Segment (final output)


B.3 TMMH/16 Test Vectors

   This section provides test vectors which can be used to test an
   implementation of TMMH/16.  The key, message, and outputs are
   expressed as octet sequences, with each octet in hexadecimal.

    KEY_LENGTH: 10
    key: { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc }
    message: { 0xca, 0xfe, 0xba, 0xbe, 0xba, 0xde }
    output: { 0x9d, 0x6a }

    KEY_LENGTH: 10
    key: { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc }
    message: { 0xca, 0xfe, 0xba }
    output: { 0xc8, 0x8e }

    KEY_LENGTH: 10
    key: { 0x01, 0x23, 0x45, 0x67, 0x89, 0xab, 0xcd, 0xef, 0xfe, 0xdc }
    message: { 0xca, 0xfe, 0xba, 0xbe, 0xba, 0xde }
    output: { 0x9d, 0x6a, 0xc0, 0xd3 }

   This Internet-Draft expires in April 2002.
