--- 1/draft-ietf-ccamp-gr-description-02.txt 2008-05-19 19:12:15.000000000 +0200 +++ 2/draft-ietf-ccamp-gr-description-03.txt 2008-05-19 19:12:15.000000000 +0200 @@ -1,20 +1,20 @@ Network Working Group Dan Li (Huawei) Internet Draft Jianhua Gao (Huawei) Arun Satyanarayana (Cisco) Intended Status: Informational -Expires: November 5, 2008 May 5, 2008 +Expires: November 19, 2008 May 19, 2008 Description of the RSVP-TE Graceful Restart Procedures - draft-ietf-ccamp-gr-description-02.txt + draft-ietf-ccamp-gr-description-03.txt Status of this Memo By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that @@ -57,168 +57,169 @@ can be recovered in different scenarios where the order in which the nodes restart is different. This document does not define any new processes or procedures. All protocol mechanisms are already defined in the referenced documents. Table of Contents 1. Introduction.................................................3 2. Existing Procedures for Single Node Restart..................4 - 2.1. Procedures defined in [RFC3473]............................4 - 2.2. Procedures defined in [RFC5063]............................5 + 2.1. Procedures Defined in [RFC3473]............................4 + 2.2. Procedures Defined in [RFC5063]............................5 3. Multiple Node Restart Scenarios..............................5 4. RSVP State...................................................7 5. Procedures for Multiple Node Restart.........................7 5.1. Procedures for the Normal Node.............................7 5.2. Procedures for the Restarting Node.........................7 5.2.1. 
Procedures for Scenario 1................................8 5.2.2. Procedures for Scenario 2................................9 - 5.2.3. Procedures for scenario 3...............................10 - 5.2.4. Procedures for scenario 4...............................11 - 5.2.5. Procedures for scenario 5...............................12 + 5.2.3. Procedures for Scenario 3...............................10 + 5.2.4. Procedures for Scenario 4...............................11 + 5.2.5. Procedures for Scenario 5...............................12 5.3. Consideration of Re-Use of Data Plane Resources...........12 5.4. Consideration of Management Plane Intervention............12 6. Clarification of Restarting Node Procedure..................13 7. Security Considerations.....................................14 - 8. IANA Considerations.........................................14 - 9. Acknowledgments.............................................15 - 10. References.................................................15 - 10.1. Normative References.....................................15 - 10.2. Informative References...................................15 - 11. Author's Addresses.........................................16 - 12. Full Copyright Statement...................................16 - 13. Intellectual Property Statement............................17 + 8. IANA Considerations.........................................16 + 9. Acknowledgments.............................................16 + 10. References.................................................16 + 10.1. Normative References.....................................16 + 10.2. Informative References...................................16 + 11. Author's Addresses.........................................17 + 12. Full Copyright Statement...................................18 + 13. Intellectual Property Statement............................18 1. 
Introduction The Hello message for the Resource Reservation Protocol (RSVP) has been defined to establish and maintain basic signaling node adjacencies for Label Switching Routers (LSRs) participating in a Multiprotocol Label Switching (MPLS) traffic engineered (TE) network [RFC3209]. The Hello message has been extended for use in Generalized MPLS (GMPLS) network for state recovery of control channel or nodal faults through the exchange of the Restart Capabilities object [RFC3473]. GMPLS protocol definitions for RSVP [RFC3473] also allow a restarting node to learn the label that it previously allocated for - use on a Label Switching Path (LSP) through the Recovery Label + use on a Label Switching Path (LSP) through the RECOVERY_LABEL object carried on a Path message sent to a restarting node from its upstream neighbor. Further RSVP protocol extensions have been defined [RFC5063] to perform graceful restart and to enable a restarting node to recover full control plane state by exchanging RSVP messages with its upstream and downstream neighbors. State previously transmitted to the upstream neighbor (principally the downstream label) is recovered from the upstream neighbor on a Path message (using the - Recovery Label object as described in [RFC3473]). State previously + RECOVERY_LABEL object as described in [RFC3473]). State previously transmitted to the downstream neighbor (including the upstream label, interface identifiers, and the explicit route) is recovered from the downstream neighbor using a RecoveryPath message. [RFC5063] also extends the Hello message to exchange information about the ability to support the RecoveryPath message. The examples and procedures in [RFC3473] and [RFC5063] focus on the description of a single node restart when adjacent network nodes are operative. Although the procedures are equally applicable to multi-node restarts, no detailed explanation is provided. 
- This document provides and informational clarification of the + This document provides an informational clarification of the control plane procedures for a GMPLS network when there are multiple node failures, and describes how full control plane state can be recovered in different scenarios where the order in which the nodes restart is different. This document does not define any new processes or procedures. All protocol mechanisms already defined in [RFC3473] and [RFC5063] are definitive. 2. Existing Procedures for Single Node Restart This section documents for information the existing procedures defined in [RFC3473] and [RFC5063]. Those documents are definitive, and the description here is non-normative. It is provided for informational clarification only. -2.1. Procedures defined in [RFC3473] +2.1. Procedures Defined in [RFC3473] In the case of nodal faults, the procedures for the restarting node and the procedures for the neighbor of a restarting node are applied to the corresponding nodes. These procedures described in [RFC3473] are summarized as follows: For the Restarting Node: 1) Tells its neighbors that state recovery is supported using the Hello message; - 2) Recovers its RSVP state with the help of a Path message received + 2) Recover its RSVP state with the help of a Path message received from its upstream neighbor carrying the RECOVERY_LABEL object; 3) For bidirectional LSPs, the UPSTREAM_LABEL object on the received Path message is used to recover the corresponding RSVP state; - 4) If the corresponding forwarding state in data plane is not existed, - the node treats this as a setup for a new LSP. If the forwarding - state in data plane is existed, the forwarding state is bound to the - LSP associated with the message, and related forwarding state should - be considered as valid and refreshed. 
In addition, if the node is not - the tail-end of the LSP, the corresponding outgoing Path messages is - sent with the incoming label from that entry carried in the - UPSTREAM_LABEL object. + 4) If the corresponding forwarding state in the data plane does not + exist, the node treats this as a setup for a new LSP. If the + forwarding state in the data plane exists, the forwarding state is + bound to the LSP associated with the message, and related forwarding + state should be considered as valid and refreshed. In addition, if + the node is not the tail-end of the LSP, the incoming label on the + downstream interface is retrieved from the forwarding state on the + restarting node and set in the UPSTREAM_LABEL object in the Path + message sent to the downstream neighbor. For the Neighbor of a restarting node: - 1) Sends the Path message with RECOVERY_LABEL object containing a - label value corresponding to the label value received in the most - recently received corresponding Resv message; + 1) Sends a Path message with RECOVERY_LABEL object containing a label + value corresponding to the label value received in the most recently + received corresponding Resv message; 2) Resumes refreshing Path state with the restarting node; 3) Resumes refreshing Resv state with the restarting node. -2.2. Procedures defined in [RFC5063] +2.2. Procedures Defined in [RFC5063] - A new message is introduced in [RFC5063] which is called the - RecoveryPath message. The message is sent by the downstream - neighbor of a restarting node to convey the contents of the last - received Path message back to the restarting node. + A new message is introduced in [RFC5063] called the RecoveryPath + message. The message is sent by the downstream neighbor of a + restarting node to convey the contents of the last received Path + message back to the restarting node. 
The restarting node will receive the Path message with the RECOVERY_LABEL object from its upstream neighbor, and/or the RecoveryPath message from its downstream neighbor. The full RSVP state of the restarting node can be recovered from these two messages. - From the received Path message the following state can be recovered: + The following state can be recovered from the received Path message: o Upstream data interface (from RSVP_HOP object) o Label on the upstream data interface (from RECOVERY_LABEL object) o Upstream label for bidirectional LSP (from UPSTREAM_LABEL object) - From the received RecoveryPath message the following state can be - recovered: + The following state can be recovered from the received RecoveryPath + message: o Downstream data interface (from RSVP_HOP object) o Label on the downstream data interface (from RECOVERY_LABEL object) o Upstream direction label for bidirectional LSP (from UPSTREAM_LABEL object) - The other objects also can be recovered either by regular Path - message or RecoveryPath message, and Resv message. + The other objects also can be recovered either from the regular + Path and Resv messages, or from the RecoveryPath message. 3. Multiple Node Restart Scenarios We define the following terms for the different node types: Restarting - The node has restarted; communication with its neighbor nodes is restored, its RSVP state is under recovery. Delayed Restarting - The node has restarted, but the communication with a neighbor node is interrupted (for example, the neighbor node @@ -257,28 +258,28 @@ a Delayed Restarting node. Nodes C and D are Normal nodes. 5) A Restarting Egress node with upstream Delayed Restarting node. For example, in Figure 1, Nodes A and B are Normal nodes, Node C is a Delayed Restarting node, and Node D is a Restarting node. If the communication between two nodes is interrupted, the upstream node may think the downstream node is a Delayed Restarting node, or vice versa. 
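For illustration only, the state recovery listed in Section 2.2 can be sketched as below. This is a minimal sketch, not an implementation of [RFC3473] or [RFC5063]: the class names, field names, and the recover_state() helper are all hypothetical, and each message is reduced to the three objects named in the lists above. It also shows the assumption that, when a neighbor is a Delayed Restarting node, the message from that neighbor is simply absent and only part of the state can be filled in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PathMsg:
    """Path message from the upstream neighbor (hypothetical, simplified)."""
    rsvp_hop: str                          # RSVP_HOP: upstream data interface
    recovery_label: Optional[int] = None   # RECOVERY_LABEL object
    upstream_label: Optional[int] = None   # UPSTREAM_LABEL (bidirectional LSP)


@dataclass
class RecoveryPathMsg:
    """RecoveryPath message from the downstream neighbor (hypothetical)."""
    rsvp_hop: str                          # RSVP_HOP: downstream data interface
    recovery_label: Optional[int] = None   # RECOVERY_LABEL object
    upstream_label: Optional[int] = None   # UPSTREAM_LABEL (bidirectional LSP)


@dataclass
class RecoveredState:
    """Per-LSP state a restarting transit node tries to rebuild."""
    upstream_interface: Optional[str] = None
    upstream_if_label: Optional[int] = None
    upstream_if_reverse_label: Optional[int] = None
    downstream_interface: Optional[str] = None
    downstream_if_label: Optional[int] = None
    downstream_if_reverse_label: Optional[int] = None

    def both_sides_recovered(self) -> bool:
        # Assumption: a transit node needs interface and label on both sides.
        return None not in (
            self.upstream_interface, self.upstream_if_label,
            self.downstream_interface, self.downstream_if_label,
        )


def recover_state(path: Optional[PathMsg],
                  recovery_path: Optional[RecoveryPathMsg]) -> RecoveredState:
    """Fill in whatever state the received messages make available."""
    state = RecoveredState()
    if path is not None:
        # State recovered from the Path message of the upstream neighbor.
        state.upstream_interface = path.rsvp_hop
        state.upstream_if_label = path.recovery_label
        state.upstream_if_reverse_label = path.upstream_label
    if recovery_path is not None:
        # State recovered from the RecoveryPath message of the downstream
        # neighbor.
        state.downstream_interface = recovery_path.rsvp_hop
        state.downstream_if_label = recovery_path.recovery_label
        state.downstream_if_reverse_label = recovery_path.upstream_label
    return state
```

Calling recover_state() with only a Path message models a restarting node whose downstream neighbor has not yet come back: the upstream half of the state is filled in and both_sides_recovered() stays false until a RecoveryPath message arrives.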
- Note that if multi nodes which are not neighbors are restarted, the - restart Procedures could be applied as multiple separated restart - procedures which are exactly the same as the procedures described - in [RFC3473] and [RFC5063]. Therefore, these scenarios are not - described in this document. For example, in Figure 1, Node A and - Node C are normal nodes, and Node B and Node D are restarting nodes, - so Node B could be restarted through Node A and Node C, meanwhile, - Node D could be restarted through Node C separately. + Note that if multiple nodes which are not neighbors are restarted, + the restart procedures could be applied as multiple separate + restart procedures which are exactly the same as the procedures + described in [RFC3473] and [RFC5063]. Therefore, these scenarios + are not described in this document. For example, in Figure 1, Node + A and Node C are normal nodes, and Node B and Node D are restarting + nodes, so Node B could be restarted through Node A and Node C, + meanwhile, Node D could be restarted through Node C separately. 4. RSVP State For each scenario, the RSVP state that needs to be recovered at the restarting nodes is the Path State Block (PSB) and the Resv State Block (RSB), which are created when the node receives the corresponding Path message and Resv message. According to [RFC2209], how to construct the PSB and RSB is really an implementation issue. In fact, there is no requirement to @@ -445,21 +446,21 @@ state. Note that if Node B restarts after this operation, the Path message that it sends to Node C will not be matched with any state on Node C and will be treated as a new Path message resulting in LSP setup. Node C should use the labels carried in the Path message (in the UPSTREAM_LABEL object and in the RECOVERY_LABEL object) to drive its label allocation, but may use other labels according to normal LSP setup rules. -5.2.3. Procedures for scenario 3 +5.2.3.
Procedures for Scenario 3 In this example, the Restarting node (Node C) is isolated. Its upstream and downstream neighbors have not restarted. The Restarting node (Node C) follows the procedures in section 9.3 of [RFC3473] and may run a Restart Timer for each of its neighbors (Nodes B and D). If a neighbor has not restarted before its Restart Timer expires, the corresponding LSPs may be torn down according to local policy [RFC3473]. Note, however, that the Restart Time values suggested in [RFC3473] are based on the previous Hello message @@ -470,64 +471,66 @@ During the Recovery Time, if the upstream Delayed Restarting node has restarted, the procedure for scenario 1 can be applied. During the Recovery Time, if the downstream Delayed Restarting node has restarted, the procedure for scenario 2 can be applied. In the case that neither Delayed Restarting node ever comes back, and where a Restart Timer is not used to automatically tear down LSPs, management intervention is required to tidy up the control - plane and the data plane on the nodes that are waiting for the - failed device to restart. + plane and the data plane on the node that is waiting for the failed + device to restart. If the downstream Delayed Restarting node restarts after the cleanup of LSPs at Node C, Node C will respond to the RecoveryPath message from Node D with a PathTear message. If the upstream Delayed Restarting node restarts after the cleanup of LSPs at Node C, the Path message from Node B will be treated as a new LSP setup request, but the setup will fail because Node D cannot be reached; Node C will respond with a PathErr message. Since this happens to Node B during its restart processing, it should follow the rules of [RFC5063] and tear down the LSP. -5.2.4. Procedures for scenario 4 +5.2.4. Procedures for Scenario 4 When the Ingress node (Node A) restarts, it does not know which LSPs it caused to be created.
Usually, however, this information is retrieved from the management plane or from the configuration requests stored in non-volatile form in the node in order to recover the LSP state. Furthermore, if the downstream node (Node B) is a Normal node, according to the procedures in [RFC5063], the ingress will receive a RecoveryPath message and will understand that it was the ingress of the LSP. However, in this scenario, the downstream node is a Delayed Restarting node, so Node A must rely on the information from the management plane or stored configuration, or it must wait for Node B to restart. In the event that Node B never restarts, management plane - intervention may be used at Node A to clean up any LSP state - restored from the management plane or from local configuration. + intervention is needed at Node A to clean up any LSP control plane + state restored from the management plane or from local + configuration, and to release any data plane resources. -5.2.5. Procedures for scenario 5 +5.2.5. Procedures for Scenario 5 In this scenario the Egress node (Node D) restarts, and its upstream neighbor (Node C) has not restarted. In this case, the - Egress node is completely unaware of the LSPs. It has no downstream - neighbor to help it, and no management plane or configuration - information. The Egress node must simply wait until its upstream - neighbor restarts and gives it the information as Path messages - carrying RECOVERY_LABEL objects. + Egress node may have no control plane state relating to the LSPs. + It has no downstream neighbor to help it, and no management plane + or configuration information, although there will be data plane + state for the LSP. The Egress node must simply wait until its + upstream neighbor restarts and gives it the information as Path + messages carrying RECOVERY_LABEL objects. 5.3. 
Consideration of Re-Use of Data Plane Resources Fundamental to the processes described above is an understanding that data plane resources may remain in use (allocated and cross-connected) when control plane state has not been fully resynchronized because some control plane nodes have not restarted. It is assumed that these data plane resources might be carrying traffic and should not be reconfigured except through application
@@ -581,70 +584,129 @@
          |<---------------|
          | Path without   |
          | recovery label |
          |--------------->|
          |                X (resource allocation failed because the
          |                |  resources are in use)
          |    PathErr     |
          |<---------------|
          |    PathTear    |
          |--------------->|
-         X(CON deletion)  X (CON deletion)
+         X(LSP deletion)  X (LSP deletion)
          |                |
+
+     Figure 2 Message flow for accidental LSP deletion
The sequence diagram above depicts one scenario where the LSP may get deleted. - In this sequence N1 did not detect hello failure and continues + In this sequence N1 did not detect Hello failure and continues sending SRefreshes which may get NACK'ed by N2 once restart completes because there is no Path state corresponding to the SRefresh message. This NACK causes a Path refresh message to be generated but there is no RECOVERY_LABEL because N1 did not yet - detect that N2 has restarted as hello exchanges have not yet + detect that N2 has restarted as Hello exchanges have not yet started. The Path message is treated as "new" and fails to allocate the resources because they are still in use. This causes a PathErr message to be generated which may lead to the tear down of the LSP. - To resolve the aforementioned problem, the following procedures are - proposed and are meant to work together with the recovery - procedures documented in [RFC3473].
Here, it is assumed that the - restarting node and the neighboring node(s) support Hello extension - as documented in [RFC3209] and recovery procedures documented in + To resolve the aforementioned problem, the following procedures, + which are implicit in [RFC3473] and [RFC5063], should be followed. + These procedures work together with the recovery procedures + documented in [RFC3473]. Here, it is assumed that the restarting + node and the neighboring node(s) support the Hello extension as + documented in [RFC3209] and the recovery procedures documented in [RFC3473]. After a node restarts its control plane, it should ignore and - silently drop all RSVP-TE messages, except hello messages, it + silently drop all RSVP-TE messages, except Hello messages, that it receives from any neighbor with which no Hello session has been established. The restarting node should follow [RFC3209] to establish Hello sessions with its neighbors after its control plane becomes operational. The restarting node resumes processing of RSVP-TE messages sent from each neighbor with which a Hello session has been established. 7. Security Considerations - This document clarifies the procedures to be performed on RSVP - agents that neighbor one or more restarting RSVP agents. In the - case of the control plane in general, and the RSVP agent in + This document clarifies the procedures defined in [RFC3473] and + [RFC5063] to be performed on RSVP agents that neighbor one or more + restarting RSVP agents. It does not introduce any new procedures + and, therefore, does not introduce any new security risks or issues.
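For illustration, the existing restarting-node rule described in Section 6 (drop every RSVP-TE message except Hello until a Hello session with the sending neighbor has been established) might be sketched as follows. This is a minimal sketch under stated assumptions: the RestartingNode class, its method names, and the use of plain strings for neighbors and message types are hypothetical and are not taken from [RFC3209] or [RFC3473].

```python
class RestartingNode:
    """Hypothetical model of the message gate applied after a restart."""

    def __init__(self):
        # Neighbors with which a Hello session has been established
        # since the control plane restarted.
        self.hello_sessions = set()

    def hello_established(self, neighbor: str) -> None:
        """Record that the Hello session with this neighbor is up."""
        self.hello_sessions.add(neighbor)

    def accept(self, neighbor: str, msg_type: str) -> bool:
        """Return True if the message should be processed, False if it
        should be ignored and silently dropped."""
        if msg_type == "Hello":
            # Hello messages are always processed so that sessions can form.
            return True
        # Any other RSVP-TE message is processed only once a Hello
        # session with the sending neighbor exists.
        return neighbor in self.hello_sessions
```

In the Figure 2 sequence, a gate of this kind would cause the restarted node to drop the SRefresh and the Path message without a RECOVERY_LABEL object, because they arrive before the Hello exchange with N1 has started.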
+ + In the case of the control plane in general, and the RSVP agent in particular, where one or more nodes carrying one or more LSPs are restarted due to external attacks, the procedures defined in [RFC5063] and described in this document provide the ability for the restarting RSVP agents to recover the RSVP state in each restarting node corresponding to the LSPs, with the least possible - perturbation to the rest of the network. Ideally, only the - neighboring RSVP agents should notice the restart and hence need to - perform additional processing. This allows for a network with - active LSPs to recover LSP state gracefully from an external attack, - without perturbing the data/forwarding plane state. + perturbation to the rest of the network. These procedures can be + considered to provide mechanisms by which the GMPLS network can + recover from physical attacks or from attacks on remotely + controlled power supplies. + + The procedures described are such that only the neighboring RSVP + agents should notice the restart of a node, and hence only they + need to perform additional processing. This allows for a network + with active LSPs to recover LSP state gracefully from an external + attack, without perturbing the data/forwarding plane state, and + without propagating the error condition in the control or data + plane. In other words, the effect of the restart (which might be + the result of an attack) does not spread into the network. + + Note that concern has been expressed about the vulnerability of a + restarting node to false messages received from its neighbors. For + example, a restarting node might receive a false Path message with + a RECOVERY_LABEL object from an upstream neighbor, or a false + RecoveryPath message from its downstream neighbor. This situation + might arise in one of four cases: + + - The message is spoofed and does not come from the neighbor at all. + + - The message has been modified as it was travelling from the + neighbor.
+ + - The neighbor is defective and has generated a message in error. + + - The neighbor has been subverted and has a "rogue" RSVP agent. + + The first two cases may be handled using standard RSVP + authentication and integrity procedures [RFC3209], [RFC3473]. If + the operator is particularly worried, the control plane may be + operated using IPsec [RFC4301], [RFC4302], [RFC4835], [RFC4306], + and [RFC2411]. + + Protection against defective or rogue RSVP implementations is + generally difficult or impossible. Neighbor-to-neighbor authentication + and integrity validation are, by definition, ineffective in these + situations. For example, if a neighbor node sends a Resv during + normal LSP setup, and if that message carries a GENERALIZED_LABEL + object carrying an incorrect label value, then the receiving LSR + will use the supplied value and the LSP will be set up incorrectly. + Alternatively, if a Path message is modified by an upstream LSR to + change the destination and explicit route, there is no way for the + downstream LSR to detect this, and the LSP may be set up to the + wrong destination. Furthermore, the upstream LSR could disguise + this fact by modifying the recorded route reported in the Resv + message. Thus, these issues are in no way specific to the restart + case, do not cause any greater or different problems from the + normal case, and do not warrant specific security measures + applicable to restart scenarios. + + Note that the RSVP POLICY_DATA object [RFC2205] provides a scope by + which secure end-to-end checks could be applied. However, very + little definition of the use of this object has been made to date. + + See [MPLS-SEC] for a wider discussion of security in MPLS and GMPLS + networks. 8. IANA Considerations This document defines no new protocols or extensions and makes no requests to IANA for registry management. 9. Acknowledgments We would like to thank Adrian Farrel, Dimitri Papadimitriou, and Lou Berger for their useful comments.
@@ -663,55 +725,72 @@ [RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching (GMPLS) Signaling Resource ReserVation Protocol-Traffic Engineering (RSVP-TE) Extensions", RFC 3473, January 2003. [RFC5063] A. Satyanarayana, R. Rahman, "Extensions to GMPLS RSVP Graceful Restart", RFC 5063, September 2007. 10.2. Informative References - None. +[MPLS-SEC] Fang, L., "Security Framework for MPLS and GMPLS Networks", + draft-ietf-mpls-mpls-and-gmpls-security-framework, work in + progress. + +[RFC2205] Braden, R. (Ed.), Zhang, L., Berson, S., Herzog, S. and S. + Jamin, "Resource ReserVation Protocol -- Version 1 + Functional Specification", RFC 2205, September 1997. + +[RFC2411] R. Thayer, N. Doraswamy, R. Glenn, "IP Security Document + Roadmap", RFC 2411, November 1998. + +[RFC4301] S. Kent, K. Seo, "Security Architecture for the Internet + Protocol", RFC 4301, December 2005. + +[RFC4302] S. Kent, "IP Authentication Header", RFC 4302, December + 2005. + +[RFC4306] C. Kaufman, "Internet Key Exchange (IKEv2) Protocol", RFC + 4306, December 2005. + +[RFC4835] V. Manral, "Cryptographic Algorithm Implementation + Requirements for Encapsulating Security Payload (ESP) and + Authentication Header (AH)", RFC 4835, April 2007. 11. Author's Addresses Dan Li Huawei Technologies Co., Ltd. F3-5-B R&D Center, Huawei Base, Bantian, Longgang District - Shenzhen 518129, - China + Shenzhen 518129, P.R.China Phone: +86 755 28973237 Email: danli@huawei.com Jianhua Gao Huawei Technologies Co., Ltd. F3-5-B R&D Center, Huawei Base, Bantian, Longgang District - Shenzhen 518129, - China + Shenzhen 518129, P.R.China Phone: +86 755 28972902 Email: gjhhit@huawei.com Arun Satyanarayana Cisco Systems, Inc. 170 West Tasman Dr. - San Jose, CA 95134, - USA + San Jose, CA 95134, USA Phone: +1 408 853-3206 Email: asatyana@cisco.com - Snigdho C. Bardalai Fujitsu Network Communications, Inc. 
2801 Telecom Parkway, - Richardson, Texas 75082 - USA + Richardson, Texas 75082, USA Phone: +1 972 479 2951 Email: snigdho.bardalai@us.fujitsu.com 12. Full Copyright Statement Copyright (C) The IETF Trust (2008). This document is subject to the rights, licenses and restrictions contained in BCP 78, and except as set forth therein, the authors retain all their rights.