draft-ietf-ipsecme-failure-detection-02.txt   draft-ietf-ipsecme-failure-detection-03.txt 
IPsecME Working Group Y. Nir, Ed. IPsecME Working Group Y. Nir, Ed.
Internet-Draft Check Point Internet-Draft Check Point
Intended status: Standards Track D. Wierbowski Intended status: Standards Track D. Wierbowski
Expires: April 28, 2011 IBM Expires: July 14, 2011 IBM
F. Detienne F. Detienne
P. Sethi P. Sethi
Cisco Cisco
October 25, 2010 January 10, 2011
A Quick Crash Detection Method for IKE A Quick Crash Detection Method for IKE
draft-ietf-ipsecme-failure-detection-02 draft-ietf-ipsecme-failure-detection-03
Abstract Abstract
This document describes an extension to the IKEv2 protocol that This document describes an extension to the IKEv2 protocol that
allows for faster detection of SA desynchronization using a saved allows for faster detection of SA desynchronization using a saved
token. token.
When an IPsec tunnel between two IKEv2 peers is disconnected due to a When an IPsec tunnel between two IKEv2 peers is disconnected due to a
restart of one peer, it can take as much as several minutes for the restart of one peer, it can take as much as several minutes for the
other peer to discover that the reboot has occurred, thus delaying other peer to discover that the reboot has occurred, thus delaying
skipping to change at page 1, line 42 skipping to change at page 1, line 42
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 28, 2011. This Internet-Draft will expire on July 14, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 3, line 22 skipping to change at page 3, line 22
4.1. Notification Format . . . . . . . . . . . . . . . . . . . 7 4.1. Notification Format . . . . . . . . . . . . . . . . . . . 7
4.2. Passing a Token in the AUTH Exchange . . . . . . . . . . . 8 4.2. Passing a Token in the AUTH Exchange . . . . . . . . . . . 8
4.3. Replacing Tokens After Rekey or Resumption . . . . . . . . 9 4.3. Replacing Tokens After Rekey or Resumption . . . . . . . . 9
4.4. Replacing the Token for an Existing SA . . . . . . . . . . 9 4.4. Replacing the Token for an Existing SA . . . . . . . . . . 9
4.5. Presenting the Token in an Unprotected Message . . . . . . 10 4.5. Presenting the Token in an Unprotected Message . . . . . . 10
5. Token Generation and Verification . . . . . . . . . . . . . . 11 5. Token Generation and Verification . . . . . . . . . . . . . . 11
5.1. A Stateless Method of Token Generation . . . . . . . . . . 11 5.1. A Stateless Method of Token Generation . . . . . . . . . . 11
5.2. A Stateless Method with IP addresses . . . . . . . . . . . 12 5.2. A Stateless Method with IP addresses . . . . . . . . . . . 12
5.3. Token Lifetime . . . . . . . . . . . . . . . . . . . . . . 12 5.3. Token Lifetime . . . . . . . . . . . . . . . . . . . . . . 12
6. Backup Gateways . . . . . . . . . . . . . . . . . . . . . . . 12 6. Backup Gateways . . . . . . . . . . . . . . . . . . . . . . . 12
7. Alternative Solutions . . . . . . . . . . . . . . . . . . . . 13 7. Interaction with Session Resumption . . . . . . . . . . . . . 13
7.1. Initiating a new IKE SA . . . . . . . . . . . . . . . . . 13 8. Operational Considerations . . . . . . . . . . . . . . . . . . 14
7.2. SIR . . . . . . . . . . . . . . . . . . . . . . . . . . . 13 8.1. Who should implement this specification . . . . . . . . . 14
7.3. Birth Certificates . . . . . . . . . . . . . . . . . . . . 13 8.2. Response to unknown child SPI . . . . . . . . . . . . . . 15
7.4. Reducing Liveness Check Length . . . . . . . . . . . . . . 14 9. Security Considerations . . . . . . . . . . . . . . . . . . . 15
8. Interaction with Session Resumption . . . . . . . . . . . . . 14 9.1. QCD Token Generation and Handling . . . . . . . . . . . . 16
9. Operational Considerations . . . . . . . . . . . . . . . . . . 16 9.2. QCD Token Transmission . . . . . . . . . . . . . . . . . . 17
9.1. Who should implement this specification . . . . . . . . . 16 9.3. QCD Token Enumeration . . . . . . . . . . . . . . . . . . 17
9.2. Response to unknown child SPI . . . . . . . . . . . . . . 16 9.4. Selecting an Appropriate Token Generation Method . . . . . 17
10. Security Considerations . . . . . . . . . . . . . . . . . . . 17 10. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 18
10.1. QCD Token Generation and Handling . . . . . . . . . . . . 17 11. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 18
10.2. QCD Token Transmission . . . . . . . . . . . . . . . . . . 18 12. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 18
10.3. QCD Token Enumeration . . . . . . . . . . . . . . . . . . 18 12.1. Changes from draft-ietf-ipsecme-failure-detection-02 . . . 19
10.4. Selecting an Appropriate Token Generation Method . . . . . 19 12.2. Changes from draft-ietf-ipsecme-failure-detection-01 . . . 19
11. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 20 12.3. Changes from draft-ietf-ipsecme-failure-detection-00 . . . 19
12. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 20 12.4. Changes from draft-nir-ike-qcd-07 . . . . . . . . . . . . 19
13. Change Log . . . . . . . . . . . . . . . . . . . . . . . . . . 20 12.5. Changes from draft-nir-ike-qcd-03 and -04 . . . . . . . . 19
13.1. Changes from draft-ietf-ipsecme-failure-detection-01 . . . 20 12.6. Changes from draft-nir-ike-qcd-02 . . . . . . . . . . . . 20
13.2. Changes from draft-ietf-ipsecme-failure-detection-00 . . . 20 12.7. Changes from draft-nir-ike-qcd-01 . . . . . . . . . . . . 20
13.3. Changes from draft-nir-ike-qcd-07 . . . . . . . . . . . . 21 12.8. Changes from draft-nir-ike-qcd-00 . . . . . . . . . . . . 20
13.4. Changes from draft-nir-ike-qcd-03 and -04 . . . . . . . . 21 12.9. Changes from draft-nir-qcr-00 . . . . . . . . . . . . . . 20
13.5. Changes from draft-nir-ike-qcd-02 . . . . . . . . . . . . 21 13. References . . . . . . . . . . . . . . . . . . . . . . . . . . 20
13.6. Changes from draft-nir-ike-qcd-01 . . . . . . . . . . . . 21 13.1. Normative References . . . . . . . . . . . . . . . . . . . 20
13.7. Changes from draft-nir-ike-qcd-00 . . . . . . . . . . . . 21 13.2. Informative References . . . . . . . . . . . . . . . . . . 21
13.8. Changes from draft-nir-qcr-00 . . . . . . . . . . . . . . 21 Appendix A. The Path Not Taken . . . . . . . . . . . . . . . . . 21
14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 22 A.1. Initiating a new IKE SA . . . . . . . . . . . . . . . . . 21
14.1. Normative References . . . . . . . . . . . . . . . . . . . 22 A.2. SIR . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
14.2. Informative References . . . . . . . . . . . . . . . . . . 22 A.3. Birth Certificates . . . . . . . . . . . . . . . . . . . . 22
A.4. Reducing Liveness Check Length . . . . . . . . . . . . . . 22
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction 1. Introduction
IKEv2, as described in [RFC5996] and its predecessor RFC 4306, has a IKEv2, as described in [RFC5996] and its predecessor RFC 4306, has a
method for recovering from a reboot of one peer. As long as traffic method for recovering from a reboot of one peer. As long as traffic
flows in both directions, the rebooted peer should re-establish the flows in both directions, the rebooted peer should re-establish the
tunnels immediately. However, in many cases the rebooted peer is a tunnels immediately. However, in many cases the rebooted peer is a
VPN gateway that protects only servers, or else the non-rebooted peer VPN gateway that protects only servers, or else the non-rebooted peer
has a dynamic IP address. In such cases, the rebooted peer will not has a dynamic IP address. In such cases, the rebooted peer will not
skipping to change at page 4, line 49 skipping to change at page 4, line 49
The term "token taker" refers to an implementation that stores such a The term "token taker" refers to an implementation that stores such a
token or a digest thereof, in order to verify that a new token it token or a digest thereof, in order to verify that a new token it
receives is identical to the old token it has stored. receives is identical to the old token it has stored.
The term "non-volatile storage" in this document refers to a data The term "non-volatile storage" in this document refers to a data
storage module that persists across restarts of the token maker. storage module that persists across restarts of the token maker.
Examples of such a storage module include an internal disk, an Examples of such a storage module include an internal disk, an
internal flash memory module, an external disk and an external internal flash memory module, an external disk and an external
database. A small non-volatile storage module is required for a database. A small non-volatile storage module is required for a
token maker, but a larger one can be used to enhance performance, as token maker, but a larger one can be used to enhance performance, as
described in Section 9.2. described in Section 8.2.
2. RFC 5996 Crash Recovery 2. RFC 5996 Crash Recovery
When one peer loses state or reboots, the other peer does not get any When one peer loses state or reboots, the other peer does not get any
notification, so unidirectional IPsec traffic can still flow. The notification, so unidirectional IPsec traffic can still flow. The
rebooted peer will not be able to decrypt it, however, and the only rebooted peer will not be able to decrypt it, however, and the only
remedy is to send an unprotected INVALID_SPI notification as remedy is to send an unprotected INVALID_SPI notification as
described in section 3.10.1 of [RFC5996]. That section also described in section 3.10.1 of [RFC5996]. That section also
describes the processing of such a notification: describes the processing of such a notification:
skipping to change at page 6, line 17 skipping to change at page 6, line 17
sending INVALID_SPI notification, as it detected that the other end sending INVALID_SPI notification, as it detected that the other end
is not sending any packets anymore while it is still rebooting or is not sending any packets anymore while it is still rebooting or
recovering from the situation. recovering from the situation.
This means that the recovery period of several minutes overlaps the This means that the recovery period of several minutes overlaps the
actual recovery time of the other peer, i.e. if the security gateway actual recovery time of the other peer, i.e. if the security gateway
requires several minutes to boot up from the crash, then the other requires several minutes to boot up from the crash, then the other
peers have already finished their liveness checks before the crashing peers have already finished their liveness checks before the crashing
peer even has a chance to send INVALID_SPI notifications. peer even has a chance to send INVALID_SPI notifications.
There are cases where the peer looses state and is able to recover There are cases where the peer loses state and is able to recover
immediately; in those cases it might still take several minutes to recover. immediately; in those cases it might still take several minutes to recover.
Note that the IKEv2 specification specifically leaves the number of retries Note that the IKEv2 specification specifically leaves the number of retries
and the lengths of timeouts out of the specification, as they do not and the lengths of timeouts out of the specification, as they do not
affect interoperability. This means that implementations are allowed affect interoperability. This means that implementations are allowed
to use the INVALID_SPI messages as hints that to use the INVALID_SPI messages as hints that
will shorten those timeouts (i.e. different environments and situations will shorten those timeouts (i.e. different environments and situations
require different rules). require different rules).
Good existing IKEv2 implementations already do that (i.e. both Good existing IKEv2 implementations already do that (i.e. both
skipping to change at page 6, line 45 skipping to change at page 6, line 45
token", as described in Section 4.1 in the first IKE_AUTH exchange token", as described in Section 4.1 in the first IKE_AUTH exchange
messages. These are the first IKE_AUTH request and final IKE_AUTH messages. These are the first IKE_AUTH request and final IKE_AUTH
response that contain the AUTH payloads. The generation of these response that contain the AUTH payloads. The generation of these
tokens is a local matter for implementations, but considerations are tokens is a local matter for implementations, but considerations are
described in Section 5. Implementations that send such a token will described in Section 5. Implementations that send such a token will
be called "token makers". be called "token makers".
A supporting implementation receiving such a token MUST store it (or A supporting implementation receiving such a token MUST store it (or
a digest thereof) along with the IKE SA. Implementations that a digest thereof) along with the IKE SA. Implementations that
support this part of the protocol will be called "token takers". support this part of the protocol will be called "token takers".
Section 9.1 has considerations for which implementations need to be Section 8.1 has considerations for which implementations need to be
token takers, and which should be token makers. Implementations that token takers, and which should be token makers. Implementations that
are not token takers will silently ignore QCD tokens. are not token takers will silently ignore QCD tokens.
When a token maker receives a protected IKE request message with When a token maker receives a protected IKE request message with
unknown IKE SPIs, it SHOULD generate a new token that is identical to unknown IKE SPIs, it SHOULD generate a new token that is identical to
the previous token, and send it to the requesting peer in an the previous token, and send it to the requesting peer in an
unprotected IKE message as described in Section 4.5. unprotected IKE message as described in Section 4.5.
When a token taker receives the QCD token in an unprotected When a token taker receives the QCD token in an unprotected
notification, it MUST verify that the TOKEN_SECRET_DATA matches the notification, it MUST verify that the TOKEN_SECRET_DATA matches the
skipping to change at page 7, line 21 skipping to change at page 7, line 21
IKE SA associated with the IKE_SPI fields, and all dependent child IKE SA associated with the IKE_SPI fields, and all dependent child
SAs. This event MAY also be logged. The token taker MUST accept SAs. This event MAY also be logged. The token taker MUST accept
such tokens from any IP address and port combination, so as to allow such tokens from any IP address and port combination, so as to allow
different kinds of high-availability configurations of the token different kinds of high-availability configurations of the token
maker. maker.
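The token taker's side of the exchange can be sketched as follows. This is a minimal illustration, not the draft's required behavior: `IkeSa`, `store_token`, `token_matches` and `handle_unprotected_qcd` are hypothetical names, and SHA-256 is assumed as the "digest thereof" that the taker may keep instead of the raw token. The comparison uses `hmac.compare_digest` so that timing does not leak how many bytes matched, and the source address is deliberately ignored because the taker must accept tokens from any IP address and port.

```python
import hashlib
import hmac
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class IkeSa:
    """Hypothetical, minimal view of an IKE SA held by a token taker."""
    spi_i: bytes
    spi_r: bytes
    qcd_digest: Optional[bytes] = None  # SHA-256 of the peer's token (assumed digest)
    child_sas: List[int] = field(default_factory=list)  # dependent child SPIs


def store_token(sa: IkeSa, token: bytes) -> None:
    """Keep only a digest of the TOKEN_SECRET_DATA received in IKE_AUTH."""
    sa.qcd_digest = hashlib.sha256(token).digest()


def token_matches(sa: IkeSa, token: bytes) -> bool:
    """Check a token from an unprotected notification against the stored digest."""
    if sa.qcd_digest is None:
        return False  # the peer never sent a token for this SA
    return hmac.compare_digest(hashlib.sha256(token).digest(), sa.qcd_digest)


def handle_unprotected_qcd(sas: Dict[Tuple[bytes, bytes], IkeSa],
                           spi_i: bytes, spi_r: bytes, token: bytes) -> bool:
    """Delete the IKE SA and its child SAs if the token matches.

    Source IP address and port are intentionally not consulted.
    """
    sa = sas.get((spi_i, spi_r))
    if sa is None or not token_matches(sa, token):
        return False
    sa.child_sas.clear()       # dependent child SAs go with the IKE SA
    del sas[(spi_i, spi_r)]
    return True
```

A mismatching token leaves the SA in place, so a forged unprotected notification cannot tear down the tunnel.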
A supporting token taker MAY immediately create new SAs using an A supporting token taker MAY immediately create new SAs using an
Initial exchange, or it may wait for subsequent traffic to trigger Initial exchange, or it may wait for subsequent traffic to trigger
the creation of new SAs. the creation of new SAs.
See Section 8 for a short discussion about this extension's See Section 7 for a short discussion about this extension's
interaction with IKEv2 Session Resumption ([RFC5723]). interaction with IKEv2 Session Resumption ([RFC5723]).
4. Formats and Exchanges 4. Formats and Exchanges
4.1. Notification Format 4.1. Notification Format
The notification payload called "QCD token" is formatted as follows: The notification payload called "QCD token" is formatted as follows:
1 2 3 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
skipping to change at page 10, line 40 skipping to change at page 10, line 40
old QCD_TOKEN. old QCD_TOKEN.
4.5. Presenting the Token in an Unprotected Message 4.5. Presenting the Token in an Unprotected Message
This QCD_TOKEN notification is unprotected, and is sent as a response This QCD_TOKEN notification is unprotected, and is sent as a response
to a protected IKE request, which uses an IKE SA that is unknown. to a protected IKE request, which uses an IKE SA that is unknown.
request --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+ request --> N(INVALID_IKE_SPI), N(QCD_TOKEN)+
If child SPIs are persistently mapped to IKE SPIs as described in If child SPIs are persistently mapped to IKE SPIs as described in
Section 9.2, a token taker may get the following unprotected message Section 8.2, a token taker may get the following unprotected message
in response to an ESP or AH packet. in response to an ESP or AH packet.
request --> N(INVALID_SPI), N(QCD_TOKEN)+ request --> N(INVALID_SPI), N(QCD_TOKEN)+
The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to The QCD_TOKEN and INVALID_IKE_SPI notifications are sent together to
support both implementations that conform to this specification and support both implementations that conform to this specification and
implementations that don't. Similar to the description in section implementations that don't. Similar to the description in section
2.21 of [RFC5996], the IKE SPI and message ID fields in the packet 2.21 of [RFC5996], the IKE SPI and message ID fields in the packet
headers are taken from the protected IKE request. headers are taken from the protected IKE request.
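The construction of this unprotected response can be sketched with a simplified message model. The classes and `build_qcd_response` below are hypothetical and do not encode the wire format; the sketch only shows that the IKE SPIs and message ID are copied from the protected request, that the message is marked as a response, and that INVALID_IKE_SPI is followed by one or more QCD_TOKEN notifications (the "+" in the diagram).

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class IkeHeader:
    """Hypothetical, simplified IKE header (not a wire-format encoder)."""
    spi_i: bytes
    spi_r: bytes
    message_id: int
    is_response: bool


@dataclass
class UnprotectedMessage:
    header: IkeHeader
    notifications: List[Tuple[str, bytes]]  # (notify type name, notification data)


def build_qcd_response(request: IkeHeader, tokens: List[bytes]) -> UnprotectedMessage:
    """Answer a protected request for an unknown IKE SA with N(INVALID_IKE_SPI), N(QCD_TOKEN)+."""
    if not tokens:
        raise ValueError("at least one QCD token is required")
    # SPIs and message ID are taken from the request, as in RFC 5996 section 2.21.
    header = IkeHeader(request.spi_i, request.spi_r, request.message_id, is_response=True)
    notes = [("INVALID_IKE_SPI", b"")]  # understood by non-supporting peers as well
    notes += [("QCD_TOKEN", t) for t in tokens]
    return UnprotectedMessage(header, notes)
```

Sending INVALID_IKE_SPI first keeps the response meaningful to peers that do not implement this extension, which silently ignore the QCD_TOKEN notifications.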
skipping to change at page 12, line 13 skipping to change at page 12, line 13
TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R) TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R)
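The stateless formula above can be sketched in a few lines, assuming SHA-256 as HASH and reading "|" as byte concatenation; `qcd_token` is a hypothetical helper name. Because the only input that must survive a reboot is QCD_SECRET, a token maker that keeps the secret in non-volatile storage regenerates exactly the token it handed out before the crash.

```python
import hashlib


def qcd_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    """TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R), SHA-256 assumed as HASH."""
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    return hashlib.sha256(qcd_secret + spi_i + spi_r).digest()
```

For example, with a secret read back from disk after a restart, `qcd_token(secret, spi_i, spi_r)` returns the same 32 octets that were sent in the original IKE_AUTH exchange for that IKE SA.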
5.2. A Stateless Method with IP addresses 5.2. A Stateless Method with IP addresses
This method is similar to the one in the previous section, except This method is similar to the one in the previous section, except
that the IP address of the token taker is also added to the block that the IP address of the token taker is also added to the block
being hashed. This has the disadvantage that the token needs to be being hashed. This has the disadvantage that the token needs to be
replaced (as described in Section 4.4) whenever the token taker replaced (as described in Section 4.4) whenever the token taker
changes its address. changes its address.
The reason to use this method is described in Section 10.4. When The reason to use this method is described in Section 9.4. When
using this method, the TOKEN_SECRET_DATA field is calculated as using this method, the TOKEN_SECRET_DATA field is calculated as
follows: follows:
TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T) TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T)
The IPaddr-T field specifies the IP address of the token taker. The IPaddr-T field specifies the IP address of the token taker.
Secret rollover considerations are similar to those in the previous Secret rollover considerations are similar to those in the previous
section. section.
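A sketch of the address-bound variant, again assuming SHA-256 as HASH and byte concatenation for "|". Encoding IPaddr-T as the packed address (4 octets for IPv4, 16 for IPv6) is an assumption made for illustration; `qcd_token_with_addr` is a hypothetical name.

```python
import hashlib
import ipaddress


def qcd_token_with_addr(qcd_secret: bytes, spi_i: bytes, spi_r: bytes,
                        taker_addr: str) -> bytes:
    """TOKEN_SECRET_DATA = HASH(QCD_SECRET | SPI-I | SPI-R | IPaddr-T)."""
    addr = ipaddress.ip_address(taker_addr).packed  # 4 octets (IPv4) or 16 (IPv6)
    return hashlib.sha256(qcd_secret + spi_i + spi_r + addr).digest()
```

Since the taker's address is an input, the token changes whenever that address changes, which is why this method requires replacing the token as described in Section 4.4.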
5.3. Token Lifetime 5.3. Token Lifetime
skipping to change at page 12, line 49 skipping to change at page 12, line 49
If such a configuration is available, it is RECOMMENDED that the If such a configuration is available, it is RECOMMENDED that the
stand-by gateway be able to generate the same token as the active stand-by gateway be able to generate the same token as the active
gateway. If the method described in Section 5.1 is used, this means gateway. If the method described in Section 5.1 is used, this means
that the QCD_SECRET field is identical in both gateways. This has that the QCD_SECRET field is identical in both gateways. This has
the effect of having the crash recovery available immediately. the effect of having the crash recovery available immediately.
Note that this refers to "high availability" configurations, where Note that this refers to "high availability" configurations, where
only one gateway is active at any given moment. This is different only one gateway is active at any given moment. This is different
from "load sharing" configurations where more than one gateway is from "load sharing" configurations where more than one gateway is
active at the same time. For load sharing configurations, please see active at the same time. For load sharing configurations, please see
Section 10.2 for security considerations. Section 9.2 for security considerations.
7. Alternative Solutions
7.1. Initiating a new IKE SA
Instead of sending a QCD token, we could have the rebooted
implementation start an Initial exchange with the peer, including the
INITIAL_CONTACT notification. This would have the same effect,
instructing the peer to erase the old IKE SA, as well as establishing
a new IKE SA with fewer rounds.
The disadvantage here is that in IKEv2 an authentication exchange
MUST have a piggy-backed Child SA set up. Since our use case is such
that the rebooted implementation does not have traffic flowing to the
peer, there are no good selectors for such a Child SA.
Additionally, when authentication is asymmetric, such as when EAP is
used, it is not possible for the rebooted implementation to initiate
IKE.
7.2. SIR
Another proposal that was considered for this work item is the SIR
extension, which is described in [recovery]. Under that proposal,
the non-rebooted peer sends a non-protected query to the possibly
rebooted peer, asking whether the IKE SA exists. The peer replies
with either a positive or negative response, and the absence of a
positive response, along with the existence of a negative response, is
taken as proof that the IKE SA has really been lost.
The working group preferred the QCD proposal to this one.
7.3. Birth Certificates
Birth Certificates is a method of crash detection that has never been
formally defined. Bill Sommerfeld suggested this idea in a mail to
the IPsec mailing list on August 7, 2000, in a thread discussing
methods of crash detection:
If we have the system sign a "birth certificate" when it
reboots (including a reboot time or boot sequence number),
we could include that with a "bad spi" ICMP error and in
the negotiation of the IKE SA.
We believe that this method would have some problems. First, it
requires Alice to store the certificate, so as to be able to compare
the public keys. That requires more storage than does a QCD token.
Additionally, the public-key operations needed to verify the self-
signed certificates are more expensive for Alice.
We believe that a symmetric-key operation such as proposed here is
more lightweight and simpler than that implied by the Birth
Certificate idea.
7.4. Reducing Liveness Check Length
Some have suggested that the RFC 5996 procedure described in
Section 2 can be tweaked by requiring fewer retransmissions over a
shorter period of time for cases of liveness check started because of
an INVALID_SPI or INVALID_IKE_SPI notification.
We believe that the default retransmission policy should represent a
good balance between the need for a timely discovery of a dead peer,
and a low probability of false detection. We expect the policy to be
set to take the shortest time such that this probability achieves a
certain target. Therefore, reducing elapsed time and retransmission
count will create an unacceptably high probability of false
detection, and this can be triggered by a single INVALID_IKE_SPI
notification.
Additionally, even if the retransmission policy is reduced to, say,
one minute, it is still a very noticeable delay from a human
perspective, from the time that the gateway has come up until the
tunnels are active, or from the time the backup gateway has taken
over until the tunnels are active.
8. Interaction with Session Resumption 7. Interaction with Session Resumption
Session Resumption, specified in [RFC5723], proposes to make setting Session Resumption, specified in [RFC5723], allows setting up a new
up a new IKE SA consume fewer computing resources. This is IKE SA to consume fewer computing resources. This is particularly useful
particularly useful in the case of a remote access gateway that has in the case of a remote access gateway that has many tunnels. A
many tunnels. A failure of such a gateway would require all these failure of such a gateway would require all of these remote access
remote access clients to establish an IKE SA either with the clients to establish an IKE SA either with the rebooted gateway or
rebooted gateway or with a backup gateway. This tunnel re- with a backup gateway. This tunnel re-establishment should occur
establishment should occur within a short period of time, creating a within a short period of time, creating a burden on the remote access
burden on the remote access gateway. Session Resumption addresses gateway. Session Resumption addresses this problem by having the
this problem by having the clients store an encrypted derivative of clients store an encrypted derivative of the IKE SA for quick re-
the IKE SA for quick re-establishment. establishment.
What Session Resumption does not help with is the problem of detecting What Session Resumption does not help with is the problem of detecting
that the peer gateway has failed. A failed gateway may go undetected that the peer gateway has failed. A failed gateway may go undetected
for as long as the lifetime of a child SA, because IPsec does not for an arbitrarily long time, because IPsec does not have packet
have packet acknowledgement, and applications cannot signal the IPsec acknowledgement, and applications cannot signal the IPsec layer that
layer that the tunnel "does not work". Before establishing a new IKE the tunnel "does not work". Section 2.4 of RFC 5996 does not specify
SA using Session Resumption, a client should ascertain that the how long an implementation needs to wait before beginning a liveness
gateway has indeed failed. This could be done using either a check, and only says "not recently" (see full quote in Section 2).
liveness check (as in RFC 5996) or using the QCD tokens described in In practice, some mobile devices wait a very long time before
this document. beginning a liveness check, in order to extend battery life by allowing
parts of the device to remain in low-power modes.
QCD tokens provide a way to detect the failure of the peer in the
case where a liveness check has not yet ended (or begun).
A remote access client conforming to both specifications will store A remote access client conforming to both specifications will store
QCD tokens, as well as the Session Resumption ticket, if provided by QCD tokens, as well as the Session Resumption ticket, if provided by
the gateway. A remote access gateway conforming to both the gateway. A remote access gateway conforming to both
specifications will generate a QCD token for the client. When the specifications will generate a QCD token for the client. When the
gateway reboots, the client will discover this in either of two ways: gateway reboots, the client will discover this in either of two ways:
1. The client does regular liveness checks, or else the time for 1. The client does regular liveness checks, or else the time for
some other IKE exchange has come. Since the gateway is still some other IKE exchange has come. Since the gateway is still
down, the IKE exchange times out after several minutes. In this down, the IKE exchange times out after several minutes. In this
case QCD does not help. case QCD does not help.
skipping to change at page 16, line 5 skipping to change at page 14, line 30
---- Reboot ----- ---- Reboot -----
HDR, {} --> HDR, {} -->
<-- HDR, N(QCD_TOKEN) <-- HDR, N(QCD_TOKEN)
HDR, [N(COOKIE),] HDR, [N(COOKIE),]
Ni, N(TICKET_OPAQUE) Ni, N(TICKET_OPAQUE)
[,N+] --> [,N+] -->
<-- HDR, Nr [,N+] <-- HDR, Nr [,N+]
9. Operational Considerations 8. Operational Considerations
9.1. Who should implement this specification 8.1. Who should implement this specification
Throughout this document, we have referred to reboot time Throughout this document, we have referred to reboot time
alternately as the time that the implementation crashes and the alternately as the time that the implementation crashes and the
time when it is ready to process IPsec packets and IKE exchanges. time when it is ready to process IPsec packets and IKE exchanges.
Depending on the hardware and software platforms and the cause of the Depending on the hardware and software platforms and the cause of the
reboot, rebooting may take anywhere from a few seconds to several reboot, rebooting may take anywhere from a few seconds to several
minutes. If the implementation is down for a long time, the benefit minutes. If the implementation is down for a long time, the benefit
of this protocol extension is reduced. For this reason, critical of this protocol extension is reduced. For this reason, critical
systems should implement backup gateways as described in Section 6. systems should implement backup gateways as described in Section 6.
skipping to change at page 16, line 45 skipping to change at page 15, line 24
several roles. several roles.
In order to limit the effects of DoS attacks, a token taker SHOULD In order to limit the effects of DoS attacks, a token taker SHOULD
limit the rate of QCD_TOKENs verified from a particular source. limit the rate of QCD_TOKENs verified from a particular source.
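One way a token taker might enforce such a limit is a per-source token bucket. The draft does not prescribe an algorithm; the class below, its name `PerSourceRateLimiter`, and its rate and burst parameters are hypothetical, chosen only to illustrate the idea. The bucket's units are called "credits" to avoid confusion with QCD tokens, and the clock is injectable so the behavior can be exercised deterministically.

```python
import time
from typing import Callable, Dict, Tuple


class PerSourceRateLimiter:
    """Token-bucket limiter on QCD_TOKEN verifications, keyed by source address."""

    def __init__(self, rate_per_sec: float, burst: int,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate_per_sec
        self.burst = burst
        self.clock = clock
        self.buckets: Dict[str, Tuple[float, float]] = {}  # source -> (credits, last_seen)

    def allow(self, source: str) -> bool:
        """Return True if a token from `source` may be verified now."""
        now = self.clock()
        credits, last = self.buckets.get(source, (float(self.burst), now))
        credits = min(float(self.burst), credits + (now - last) * self.rate)
        if credits >= 1.0:
            self.buckets[source] = (credits - 1.0, now)
            return True
        self.buckets[source] = (credits, now)  # over the limit: skip verification
        return False
```

A taker would call `allow(src)` before hashing and comparing a received token, and drop the notification when it returns False.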
If an excessive number of IKE requests protected with unknown IKE SPIs If an excessive number of IKE requests protected with unknown IKE SPIs
arrive at a token maker, the IKE module SHOULD revert to the behavior arrive at a token maker, the IKE module SHOULD revert to the behavior
described in section 2.21 of [RFC5996] and either send an described in section 2.21 of [RFC5996] and either send an
INVALID_IKE_SPI notification or ignore them entirely. INVALID_IKE_SPI notification or ignore them entirely.
9.2. Response to unknown child SPI 8.2. Response to unknown child SPI
After a reboot, it is more likely that an implementation receives After a reboot, it is more likely that an implementation receives
IPsec packets than IKE packets. In that case, the rebooted IPsec packets than IKE packets. In that case, the rebooted
implementation will send an INVALID_SPI notification, triggering a implementation will send an INVALID_SPI notification, triggering a
liveness check. The token will only be sent in a response to the liveness check. The token will only be sent in a response to the
liveness check, thus requiring an extra round-trip. liveness check, thus requiring an extra round-trip.
To avoid this, an implementation that has access to enough non- To avoid this, an implementation that has access to enough non-
volatile storage MAY store a mapping of child SPIs to owning IKE volatile storage MAY store a mapping of child SPIs to owning IKE
SPIs, or to generated tokens. If such a mapping is available and SPIs, or to generated tokens. If such a mapping is available and
skipping to change at page 17, line 21 skipping to change at page 15, line 48
QCD token that arrives with an INVALID_SPI notification the same as QCD token that arrives with an INVALID_SPI notification the same as
if it arrived with the IKE SPIs of the parent IKE SA. if it arrived with the IKE SPIs of the parent IKE SA.
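As an illustration of the optional child-SPI mapping, the sketch below shows a token maker that persists, for each child SA SPI, the QCD token of the parent IKE SA, so that after a reboot it can attach that token to its INVALID_SPI notification. The JSON file, the class, and its method names are hypothetical; any non-volatile store would serve.

```python
import json
import os
from typing import Dict, Optional


class ChildSpiTokenStore:
    """Hypothetical non-volatile map: child SA SPI -> parent IKE SA's QCD token."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.map: Dict[str, str] = {}
        if os.path.exists(path):  # reload the map after a reboot
            with open(path, "r", encoding="utf-8") as f:
                self.map = json.load(f)

    def _flush(self) -> None:
        # Write a temporary file and rename it over the old one, so a crash
        # in the middle of a write does not leave a truncated map behind.
        tmp = self.path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(self.map, f)
        os.replace(tmp, self.path)

    def record(self, child_spi: int, token: bytes) -> None:
        self.map[format(child_spi, "08x")] = token.hex()
        self._flush()

    def forget(self, child_spi: int) -> None:
        if self.map.pop(format(child_spi, "08x"), None) is not None:
            self._flush()

    def token_for(self, child_spi: int) -> Optional[bytes]:
        """Token to send along with INVALID_SPI, or None if the SPI is unknown."""
        hex_token = self.map.get(format(child_spi, "08x"))
        return bytes.fromhex(hex_token) if hex_token is not None else None
```

Rewriting the file on every change narrows, but does not close, the window in which the stored map lags behind rekeys.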
However, a persistent storage module might not be updated in a timely However, a persistent storage module might not be updated in a timely
manner, and could be populated with tokens relating to IKE SPIs that manner, and could be populated with tokens relating to IKE SPIs that
have already been rekeyed. A token taker MUST NOT take an invalid have already been rekeyed. A token taker MUST NOT take an invalid
QCD Token sent along with an INVALID_SPI notification as evidence QCD Token sent along with an INVALID_SPI notification as evidence
that the peer is either malfunctioning or attacking, but it SHOULD that the peer is either malfunctioning or attacking, but it SHOULD
limit the rate at which such notifications are processed. limit the rate at which such notifications are processed.
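On the token taker's side, this rule can be sketched as follows, assuming the taker keeps an index from each child SPI to its parent IKE SA. A matching token is handled as if it had arrived with the parent's IKE SPIs; a non-matching one is treated as possibly stale rather than as an attack. The data structures and return values are hypothetical, and the rate limit that the text also calls for is omitted for brevity.

```python
import hmac
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class IkeSa:
    spi_i: bytes
    spi_r: bytes
    qcd_token: bytes  # token received (encrypted) during IKE_AUTH


class TokenTaker:
    """Hypothetical token-taker state with a child SPI -> parent IKE SA index."""

    def __init__(self) -> None:
        self.ike_sas: Dict[Tuple[bytes, bytes], IkeSa] = {}
        self.child_parent: Dict[int, Tuple[bytes, bytes]] = {}

    def add(self, sa: IkeSa, child_spis: List[int]) -> None:
        key = (sa.spi_i, sa.spi_r)
        self.ike_sas[key] = sa
        for spi in child_spis:
            self.child_parent[spi] = key

    def on_invalid_spi(self, child_spi: int, token: bytes) -> str:
        """Handle INVALID_SPI carrying a QCD token; returns the action taken."""
        key = self.child_parent.get(child_spi)
        sa = self.ike_sas.get(key) if key is not None else None
        if sa is None:
            return "ignore"  # no parent IKE SA known for this child SPI
        if hmac.compare_digest(sa.qcd_token, token):
            # Same outcome as a valid token sent with the parent's IKE SPIs:
            # delete the IKE SA together with all of its child SAs.
            del self.ike_sas[key]
            for spi in [s for s, k in self.child_parent.items() if k == key]:
                del self.child_parent[spi]
            return "delete"
        # Possibly a stale token (e.g. recorded before a rekey): not treated
        # as evidence of an attack; fall back to an ordinary liveness check.
        return "liveness_check"
```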
10. Security Considerations 9. Security Considerations
The extension described in this document must not reduce the security The extension described in this document must not reduce the security
of IKEv2 or IPsec. Specifically, an eavesdropper must not learn any of IKEv2 or IPsec. Specifically, an eavesdropper must not learn any
non-public information about the peers. non-public information about the peers.
The proposed mechanism should be secure against attacks by a passive The proposed mechanism should be secure against attacks by a passive
MITM (eavesdropper). Such an attacker must not be able to disrupt an MITM (eavesdropper). Such an attacker must not be able to disrupt an
existing IKE session, either by resetting the session or by existing IKE session, either by resetting the session or by
introducing significant delays. This requirement is especially introducing significant delays. This requirement is especially
significant, because this document introduces a new way to reset an significant, because this document introduces a new way to reset an
IKE SA. IKE SA.
The mechanism need not be similarly secure against an active MITM, The mechanism need not be similarly secure against an active MITM,
since this type of attacker is already able to disrupt IKE sessions. since this type of attacker is already able to disrupt IKE sessions.
10.1. QCD Token Generation and Handling 9.1. QCD Token Generation and Handling
Tokens MUST be hard to guess. This is critical, because if an Tokens MUST be hard to guess. This is critical, because if an
attacker can guess the token associated with an IKE SA, she can tear attacker can guess the token associated with an IKE SA, she can tear
down the IKE SA and associated tunnels at will. When the token is down the IKE SA and associated tunnels at will. When the token is
delivered in the IKE_AUTH exchange, it is encrypted. When it is sent delivered in the IKE_AUTH exchange, it is encrypted. When it is sent
again in an unprotected notification, it is not, but that is the last again in an unprotected notification, it is not, but that is the last
time this token is ever used. time this token is ever used.
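To make "hard to guess" concrete, the sketch below derives each token from a 32-octet random secret and the IKE SPI pair using HMAC-SHA-256, and compares tokens in constant time. This is an assumed keyed-hash construction in the spirit of the QCD_SECRET-based method of Section 5.1, not a restatement of it; the function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def new_qcd_secret() -> bytes:
    # 32 octets from the operating system's CSPRNG; the maker keeps this
    # in non-volatile storage so that it survives a reboot.
    return secrets.token_bytes(32)


def make_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    """Derive a 32-octet token bound to one IKE SA's SPI pair."""
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    return hmac.new(qcd_secret, spi_i + spi_r, hashlib.sha256).digest()


def token_matches(expected: bytes, received: bytes) -> bool:
    # Constant-time comparison, so response timing does not reveal how
    # many leading octets of a guessed token were correct.
    return hmac.compare_digest(expected, received)
```

Because the derivation is deterministic, a rebooted maker that still holds the secret can recompute the token from the SPIs in an incoming request without keeping per-SA state.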
An aggregation of some tokens generated by one maker together with An aggregation of some tokens generated by one maker together with
the related IKE SPIs MUST NOT give an attacker the ability to guess the related IKE SPIs MUST NOT give an attacker the ability to guess
skipping to change at page 18, line 22 skipping to change at page 17, line 5
The QCD token is sent by the rebooted peer in an unprotected message. The QCD token is sent by the rebooted peer in an unprotected message.
A message like that is subject to modification, deletion and replay A message like that is subject to modification, deletion and replay
by an attacker. However, these attacks will not compromise the by an attacker. However, these attacks will not compromise the
security of either side. Modification is meaningless because a security of either side. Modification is meaningless because a
modified token is simply an invalid token. Deletion will only cause modified token is simply an invalid token. Deletion will only cause
the protocol not to work, resulting in a delay in tunnel re- the protocol not to work, resulting in a delay in tunnel re-
establishment as described in Section 2. Replay is also meaningless, establishment as described in Section 2. Replay is also meaningless,
because the IKE SA has been deleted after the first transmission. because the IKE SA has been deleted after the first transmission.
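The argument about modification and replay can be seen in a minimal sketch of how a token taker might handle an unprotected QCD_TOKEN, assuming it indexes the tokens received in IKE_AUTH by IKE SPI pair. The class and method names are hypothetical.

```python
import hmac
from typing import Dict, Tuple

SpiPair = Tuple[bytes, bytes]


class QcdTaker:
    """Hypothetical taker holding the token received for each IKE SA."""

    def __init__(self) -> None:
        self.tokens: Dict[SpiPair, bytes] = {}

    def on_unprotected_token(self, spis: SpiPair, token: bytes) -> bool:
        """Return True if the IKE SA was deleted because of this token."""
        stored = self.tokens.get(spis)
        if stored is None:
            return False  # replay: the IKE SA is already gone
        if not hmac.compare_digest(stored, token):
            return False  # a modified (or forged) token is simply invalid
        del self.tokens[spis]  # delete the IKE SA and its child SAs
        return True
```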
10.2. QCD Token Transmission 9.2. QCD Token Transmission
A token maker MUST NOT send a QCD token in an unprotected message for A token maker MUST NOT send a QCD token in an unprotected message for
an existing IKE SA. This implies that a conforming QCD token maker an existing IKE SA. This implies that a conforming QCD token maker
MUST be able to tell whether a particular pair of IKE SPIs represents MUST be able to tell whether a particular pair of IKE SPIs represents
a valid IKE SA. a valid IKE SA.
This requirement is obvious and easy to meet in the case of a single gateway. This requirement is obvious and easy to meet in the case of a single gateway.
However, some implementations use a load balancer to divide the load However, some implementations use a load balancer to divide the load
between several physical gateways. It MUST NOT be possible even in between several physical gateways. It MUST NOT be possible even in
such a configuration to trick one gateway into sending a QCD token such a configuration to trick one gateway into sending a QCD token
for an IKE SA that is valid on another gateway. for an IKE SA that is valid on another gateway.
This document does not specify how a load-sharing configuration of This document does not specify how a load-sharing configuration of
IPsec gateways would work, but in order to support this IPsec gateways would work, but in order to support this
specification, all members MUST be able to tell whether a particular specification, all members MUST be able to tell whether a particular
IKE SA is active anywhere in the cluster. One way to do this is to IKE SA is active anywhere in the cluster. One way to do this is to
synchronize a list of active IKE SPIs among all the cluster members. synchronize a list of active IKE SPIs among all the cluster members.
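A sketch of the maker-side check, assuming each cluster member publishes a snapshot of its active IKE SPI pairs to a shared index. How snapshots are replicated between members is out of scope and hypothetical here; a single gateway is the degenerate case with one member.

```python
from typing import Callable, Dict, Iterable, Optional, Set, Tuple

SpiPair = Tuple[bytes, bytes]


class ClusterSaIndex:
    """Hypothetical cluster-wide view of active IKE SPI pairs."""

    def __init__(self) -> None:
        self.by_member: Dict[str, Set[SpiPair]] = {}

    def publish(self, member: str, active: Iterable[SpiPair]) -> None:
        # Replace that member's previous snapshot with its current one.
        self.by_member[member] = set(active)

    def is_active(self, spis: SpiPair) -> bool:
        return any(spis in sas for sas in self.by_member.values())


def token_for_unprotected_reply(
    spis: SpiPair,
    index: ClusterSaIndex,
    make_token: Callable[[SpiPair], bytes],
) -> Optional[bytes]:
    """Return a QCD token to send unprotected, or None if it must not be sent."""
    if index.is_active(spis):
        # The IKE SA is alive somewhere in the cluster; revealing its token
        # in the clear would let anyone tear it down.
        return None
    return make_token(spis)
```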
10.3. QCD Token Enumeration 9.3. QCD Token Enumeration
An attacker may try to attack QCD if the generation algorithm An attacker may try to attack QCD if the generation algorithm
described in Section 5.1 is used. The attacker will send several described in Section 5.1 is used. The attacker will send several
fake IKE requests to the gateway under attack, receiving and fake IKE requests to the gateway under attack, receiving and
recording the QCD Tokens in the responses. This will allow the recording the QCD Tokens in the responses. This will allow the
attacker to create a dictionary of IKE SPIs to QCD Tokens, which can attacker to create a dictionary of IKE SPIs to QCD Tokens, which can
later be used to tear down any IKE SA. later be used to tear down any IKE SA.
Three factors mitigate this threat: Three factors mitigate this threat:
o The space of all possible IKE SPI pairs is huge: 2^128, so making o The space of all possible IKE SPI pairs is huge: 2^128, so making
such a dictionary is impractical. Even if we assume that one such a dictionary is impractical. Even if we assume that one
implementation always generates predictable IKE SPIs, the space is implementation always generates predictable IKE SPIs, the space is
still at least 2^64 entries, so making the dictionary is extremely still at least 2^64 entries, so making the dictionary is extremely
hard. To ensure this, token makers MUST generate unpredictable hard. To ensure this, token makers MUST generate unpredictable
IKE SPIs by using a cryptographically strong pseudo-random number IKE SPIs by using a cryptographically strong pseudo-random number
generator. generator.
o Throttling the number of QCD_TOKEN notifications sent out, as o Throttling the number of QCD_TOKEN notifications sent out, as
discussed in Section 9.1, especially outside the period shortly after discussed in Section 8.1, especially outside the period shortly after
a crash, will limit the attacker's ability to construct a dictionary. a crash, will limit the attacker's ability to construct a dictionary.
o The methods in Section 5.1 and Section 5.2 allow for a periodic o The methods in Section 5.1 and Section 5.2 allow for a periodic
change of the QCD_SECRET. Any such change invalidates the entire change of the QCD_SECRET. Any such change invalidates the entire
dictionary. dictionary.
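The first and third factors can be quantified with a short calculation. It assumes an attacker able to elicit a hypothetical 10^6 tokens per second (throttling would lower this) and, for the rotation demonstration, an assumed HMAC-SHA-256 token used as an illustration rather than the formula of Section 5.1.

```python
import hashlib
import hmac
import secrets

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def years_to_enumerate(bits: int, queries_per_second: float) -> float:
    """Years needed to elicit one token for every value of a `bits`-bit space."""
    return 2 ** bits / queries_per_second / SECONDS_PER_YEAR


def make_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes) -> bytes:
    # Assumed HMAC-SHA-256 token, used here only to demonstrate rotation.
    return hmac.new(qcd_secret, spi_i + spi_r, hashlib.sha256).digest()


# Even if one peer's SPIs were predictable, 2^64 combinations remain.
print(f"2^64 SPI pairs: {years_to_enumerate(64, 1e6):.3g} years at 1e6 tokens/s")

# A dictionary recorded under the old QCD_SECRET is useless after rotation.
spi_i, spi_r = secrets.token_bytes(8), secrets.token_bytes(8)
old_secret, new_secret = secrets.token_bytes(32), secrets.token_bytes(32)
dictionary = {(spi_i, spi_r): make_token(old_secret, spi_i, spi_r)}
still_valid = dictionary[(spi_i, spi_r)] == make_token(new_secret, spi_i, spi_r)
print("recorded token still valid after rotation:", still_valid)
```

At that assumed rate, covering even the reduced 2^64 space takes roughly 585,000 years, and a token recorded before a secret rotation no longer matches the one computed afterwards.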
10.4. Selecting an Appropriate Token Generation Method 9.4. Selecting an Appropriate Token Generation Method
This section describes the rationale for token generation methods This section describes the rationale for token generation methods
such as the one described in Section 5.2. Note that this section such as the one described in Section 5.2. Note that this section
merely provides a possible rationale, and does not specify or merely provides a possible rationale, and does not specify or
recommend any kind of configuration. recommend any kind of configuration.
Some security gateway configurations use a load-sharing cluster of Some security gateway configurations use a load-sharing cluster of
hosts, all sharing the same IP addresses, where the SAs (IKE and hosts, all sharing the same IP addresses, where the SAs (IKE and
child) are not synchronized between the cluster members. In such a child) are not synchronized between the cluster members. In such a
configuration, a single member does not know about all the IKE SAs configuration, a single member does not know about all the IKE SAs
skipping to change at page 20, line 5 skipping to change at page 18, line 32
To thwart this possible attack, such configurations should use a To thwart this possible attack, such configurations should use a
method that considers the taker's IP address, such as the method method that considers the taker's IP address, such as the method
described in Section 5.2. described in Section 5.2.
On the other hand, when using this method, a change of address On the other hand, when using this method, a change of address
invalidates the tokens, so this method is only recommended when the invalidates the tokens, so this method is only recommended when the
configuration involves gateways generating the same tokens without configuration involves gateways generating the same tokens without
access to all the IKE SAs. access to all the IKE SAs.
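The address-binding idea can be sketched by adding the token taker's IP address to the keyed-hash input, again assuming an HMAC-SHA-256 construction for illustration rather than quoting Section 5.2. The function name and the sample addresses and secret are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def make_addr_bound_token(qcd_secret: bytes, spi_i: bytes, spi_r: bytes,
                          taker_addr: str) -> bytes:
    """Assumed HMAC-SHA-256 token over the IKE SPIs and the taker's address."""
    if len(spi_i) != 8 or len(spi_r) != 8:
        raise ValueError("IKE SPIs are 8 octets each")
    # 4 octets for IPv4, 16 for IPv6; the SPIs are fixed-length, so the
    # concatenated input is unambiguous.
    addr = ipaddress.ip_address(taker_addr).packed
    return hmac.new(qcd_secret, spi_i + spi_r + addr, hashlib.sha256).digest()


# Cluster members sharing one secret compute the same token for the same
# taker, but a token elicited from another address does not carry over.
secret = bytes(32)  # placeholder secret, illustration only
spi_i, spi_r = b"\x01" * 8, b"\x02" * 8
victim = make_addr_bound_token(secret, spi_i, spi_r, "192.0.2.10")
attacker = make_addr_bound_token(secret, spi_i, spi_r, "198.51.100.7")
```

An attacker who elicits a token from a member that does not hold the victim's IKE SA receives a token bound to the attacker's own address, which does not match the victim's token. The same property cuts the other way: after an address change (for example, through MOBIKE [RFC4555]) the previously issued token no longer verifies, and a new token would have to be issued for the IKE SA.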
11. IANA Considerations 10. IANA Considerations
IANA is requested to assign a notify message type from the status IANA is requested to assign a notify message type from the status
types range (16406-40959) of the "IKEv2 Notify Message Types" types range (16406-40959) of the "IKEv2 Notify Message Types"
registry with name "QUICK_CRASH_DETECTION". registry with name "QUICK_CRASH_DETECTION".
12. Acknowledgements 11. Acknowledgements
We would like to thank Hannes Tschofenig and Yaron Sheffer for their We would like to thank Hannes Tschofenig and Yaron Sheffer for their
comments about Session Resumption. comments about Session Resumption.
Others who have contributed valuable comments are, in alphabetical Others who have contributed valuable comments are, in alphabetical
order, Lakshminath Dondeti, Tero Kivinen, and Scott C Moonen. order, Lakshminath Dondeti, Tero Kivinen, and Scott C Moonen.
13. Change Log 12. Change Log
This section lists all changes in this document. This section lists all changes in this document.
NOTE TO RFC EDITOR: Please remove this section in the final RFC NOTE TO RFC EDITOR: Please remove this section in the final RFC
13.1. Changes from draft-ietf-ipsecme-failure-detection-01 12.1. Changes from draft-ietf-ipsecme-failure-detection-02
o Moved section 7 to Appendix A. Also changed some wording.
o Fixed some language in the "interaction with session resumption"
section to say that although a liveness check MUST be done, there
is no time limit on how long an implementation may wait before
starting the liveness check, or on how long the check may last.
12.2. Changes from draft-ietf-ipsecme-failure-detection-01
o Fixed the language requiring random IKE SPIs. o Fixed the language requiring random IKE SPIs.
o Some better explanation of the reasons to choose the methods in o Some better explanation of the reasons to choose the methods in
Section 5.2 and the method in Section 5.1, to close issue #193. Section 5.2 and the method in Section 5.1, to close issue #193.
o Added text to the beginning of Section 10 to accommodate issue o Added text to the beginning of Section 9 to accommodate issue #194.
#194.
13.2. Changes from draft-ietf-ipsecme-failure-detection-00 12.3. Changes from draft-ietf-ipsecme-failure-detection-00
o Nits pointed out by Scott and Yaron. o Nits pointed out by Scott and Yaron.
o Pratima and Frederic are back on board. o Pratima and Frederic are back on board.
o Changed IKEv2bis draft reference to RFC 5996. o Changed IKEv2bis draft reference to RFC 5996.
o Resolved issues #189, #190, #191, and #192: o Resolved issues #189, #190, #191, and #192:
* Renamed section 4.5 and removed the requirement to send an * Renamed section 4.5 and removed the requirement to send an
acknowledgement for the unprotected message. acknowledgement for the unprotected message.
* Moved the QCD token from the last to the first IKE_AUTH * Moved the QCD token from the last to the first IKE_AUTH
request. request.
* Added a MUST to Section 10.3 to require that IKE SPIs be * Added a MUST to Section 9.3 to require that IKE SPIs be
randomly generated. randomly generated.
* Changed the language in Section 9.1, to not use RFC 2119 * Changed the language in Section 8.1, to not use RFC 2119
terminology. terminology.
* Moved the section describing why one would want the method * Moved the section describing why one would want the method
dependent on IP addresses (in Section 5.2) from operational dependent on IP addresses (in Section 5.2) from operational
considerations to security considerations. considerations to security considerations.
13.3. Changes from draft-nir-ike-qcd-07 12.4. Changes from draft-nir-ike-qcd-07
o First WG version. o First WG version.
o Addressed Scott C Moonen's concern about collisions of QCD tokens. o Addressed Scott C Moonen's concern about collisions of QCD tokens.
o Updated references to point to IKEv2bis instead of RFC 4306 and o Updated references to point to IKEv2bis instead of RFC 4306 and
4718. Also converted draft reference for resumption to RFC 5723. 4718. Also converted draft reference for resumption to RFC 5723.
o Added Dave Wierbowski as author, and removed Pratima and Frederic. o Added Dave Wierbowski as author, and removed Pratima and Frederic.
13.4. Changes from draft-nir-ike-qcd-03 and -04 12.5. Changes from draft-nir-ike-qcd-03 and -04
Mostly editorial changes and cleaning up. Mostly editorial changes and cleaning up.
13.5. Changes from draft-nir-ike-qcd-02 12.6. Changes from draft-nir-ike-qcd-02
o Described QCD token enumeration, following a question by o Described QCD token enumeration, following a question by
Lakshminath Dondeti. Lakshminath Dondeti.
o Added the ability to replace the QCD token for an existing IKE SA. o Added the ability to replace the QCD token for an existing IKE SA.
o Added tokens dependent on peer IP address and their interaction o Added tokens dependent on peer IP address and their interaction
with MOBIKE. with MOBIKE.
13.6. Changes from draft-nir-ike-qcd-01 12.7. Changes from draft-nir-ike-qcd-01
o Removed stateless method. o Removed stateless method.
o Added discussion of rekeying and resumption. o Added discussion of rekeying and resumption.
o Added discussion of non-synchronized load-balanced clusters of o Added discussion of non-synchronized load-balanced clusters of
gateways in the security considerations. gateways in the security considerations.
o Other wording fixes. o Other wording fixes.
13.7. Changes from draft-nir-ike-qcd-00 12.8. Changes from draft-nir-ike-qcd-00
o Merged proposal with draft-detienne-ikev2-recovery o Merged proposal with draft-detienne-ikev2-recovery
o Changed the protocol so that the rebooted peer generates the o Changed the protocol so that the rebooted peer generates the
token. This has the effect that the need for persistent storage token. This has the effect that the need for persistent storage
is eliminated. is eliminated.
o Added discussion of birth certificates. o Added discussion of birth certificates.
13.8. Changes from draft-nir-qcr-00 12.9. Changes from draft-nir-qcr-00
o Changed name to reflect that this relates to IKE. Also changed o Changed name to reflect that this relates to IKE. Also changed
from quick crash recovery to quick crash detection to avoid from quick crash recovery to quick crash detection to avoid
confusion with IFARE. confusion with IFARE.
o Added more operational considerations. o Added more operational considerations.
o Added interaction with IFARE. o Added interaction with IFARE.
o Added discussion of backup gateways. o Added discussion of backup gateways.
14. References 13. References
14.1. Normative References 13.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4555] Eronen, P., "IKEv2 Mobility and Multihoming Protocol [RFC4555] Eronen, P., "IKEv2 Mobility and Multihoming Protocol
(MOBIKE)", RFC 4555, June 2006. (MOBIKE)", RFC 4555, June 2006.
[RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen, [RFC5996] Kaufman, C., Hoffman, P., Nir, Y., and P. Eronen,
"Internet Key Exchange Protocol: IKEv2", RFC 5996, "Internet Key Exchange Protocol: IKEv2", RFC 5996,
September 2010. September 2010.
14.2. Informative References 13.2. Informative References
[RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption", [RFC5723] Sheffer, Y. and H. Tschofenig, "IKEv2 Session Resumption",
RFC 5723, January 2010. RFC 5723, January 2010.
[cluster] Nir, Y., Ed., "IPsec Cluster Problem Statement", [cluster] Nir, Y., Ed., "IPsec Cluster Problem Statement",
draft-ietf-ipsecme-ipsec-ha (work in progress), July 2010. draft-ietf-ipsecme-ipsec-ha (work in progress), July 2010.
[recovery] [recovery]
Detienne, F., Sethi, P., and Y. Nir, "Safe IKE Recovery", Detienne, F., Sethi, P., and Y. Nir, "Safe IKE Recovery",
draft-detienne-ikev2-recovery (work in progress), draft-detienne-ikev2-recovery (work in progress),
January 2010. January 2010.
Appendix A. The Path Not Taken
A.1. Initiating a new IKE SA
Instead of sending a QCD token, we could have the rebooted
implementation start an Initial exchange with the peer, including the
INITIAL_CONTACT notification. This would have the same effect,
instructing the peer to erase the old IKE SA, as well as establishing
a new IKE SA with fewer rounds.
The disadvantage here is that in IKEv2 an authentication exchange
MUST have a piggybacked Child SA set up. Since our use case is such
that the rebooted implementation does not have traffic flowing to the
peer, there are no good selectors for such a Child SA.
Additionally, when authentication is asymmetric, such as when EAP is
used, it is not possible for the rebooted implementation to initiate
IKE.
A.2. SIR
Another proposal that was considered for this work item is the SIR
extension, which is described in [recovery]. Under that proposal,
the non-rebooted peer sends a non-protected query to the possibly
rebooted peer, asking whether the IKE SA exists. The peer replies
with either a positive or negative response, and the absence of a
positive response, along with the existence of a negative response, is
taken as proof that the IKE SA has really been lost.
The working group preferred the QCD proposal to this one.
A.3. Birth Certificates
Birth Certificates is a method of crash detection that has never been
formally defined. Bill Sommerfeld suggested this idea in a mail to
the IPsec mailing list on August 7, 2000, in a thread discussing
methods of crash detection:
If we have the system sign a "birth certificate" when it
reboots (including a reboot time or boot sequence number),
we could include that with a "bad spi" ICMP error and in
the negotiation of the IKE SA.
We believe that this method would have some problems. First, it
requires Alice to store the certificate, so as to be able to compare
the public keys. That requires more storage than does a QCD token.
Additionally, the public-key operations needed to verify the self-
signed certificates are more expensive for Alice.
We believe that a symmetric-key operation such as proposed here is
more lightweight and simpler than that implied by the Birth
Certificate idea.
A.4. Reducing Liveness Check Length
Some implementations require fewer retransmissions over a shorter
period of time for cases of liveness check started because of an
INVALID_SPI or INVALID_IKE_SPI notification.
We believe that the default retransmission policy should represent a
good balance between the need for a timely discovery of a dead peer,
and a low probability of false detection. We expect the policy to be
set to take the shortest time such that this probability achieves a
certain target. Therefore, we believe that reducing the elapsed time
and retransmission count may create an unacceptably high probability
of false detection, and this can be triggered by a single
INVALID_IKE_SPI notification.
Additionally, even if the retransmission policy is reduced to, say,
one minute, it is still a very noticeable delay from a human
perspective, from the time that the gateway has come up (i.e., is able
to respond with an INVALID_SPI or INVALID_IKE_SPI notification)
until the tunnels are active, or from the time the backup gateway has
taken over until the tunnels are active. The use of QCD tokens can
reduce this delay.
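The tradeoff can be made concrete with a simplified model. It assumes exponential backoff in which the timeout doubles after each transmission, and independent loss of each request/response exchange with probability p. Neither assumption comes from this document or from [RFC5996], and the parameter values are illustrative.

```python
from typing import Tuple


def liveness_check_profile(initial_timeout: float, transmissions: int,
                           loss_rate: float) -> Tuple[float, float]:
    """Return (elapsed seconds, probability of falsely declaring the peer dead).

    Simplified model, assumed for illustration only:
      * the timeout doubles after each transmission (exponential backoff);
      * each request/response exchange is lost independently with
        probability `loss_rate`, so a live peer is declared dead only
        if every transmission goes unanswered.
    """
    elapsed = initial_timeout * (2 ** transmissions - 1)
    false_detection = loss_rate ** transmissions
    return elapsed, false_detection


for n in (6, 3):
    elapsed, p_false = liveness_check_profile(1.0, n, 0.1)
    print(f"{n} transmissions: {elapsed:.0f} s, false detection {p_false:.0e}")
```

With a hypothetical 1-second initial timeout, six transmissions take 63 seconds, close to the one-minute figure above, and at p = 0.1 give a false-detection probability of 10^-6; cutting to three transmissions shortens the check to 7 seconds but raises that probability to 10^-3.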
Authors' Addresses Authors' Addresses
Yoav Nir (editor) Yoav Nir (editor)
Check Point Software Technologies Ltd. Check Point Software Technologies Ltd.
5 Hasolelim st. 5 Hasolelim st.
Tel Aviv 67897 Tel Aviv 67897
Israel Israel
Email: ynir@checkpoint.com Email: ynir@checkpoint.com
 End of changes. 44 change blocks. 
162 lines changed or deleted 174 lines changed or added

This html diff was produced by rfcdiff 1.40. The latest version is available from http://tools.ietf.org/tools/rfcdiff/