--- 1/draft-ietf-ccamp-gr-description-00.txt 2007-11-16 19:12:06.000000000 +0100
+++ 2/draft-ietf-ccamp-gr-description-01.txt 2007-11-16 19:12:06.000000000 +0100
@@ -1,21 +1,21 @@
 Network Working Group                                            Dan Li
 Internet Draft                                              Jianhua Gao
                                                                  Huawei
                                                      Arun Satyanarayana
                                                                   Cisco
 Intended Status: Informational
-Expires: February 2008                                     August, 2007
+Expires: May 2008                                        November, 2007

       Description of the RSVP-TE Graceful Restart Procedures
-              draft-ietf-ccamp-gr-description-00.txt
+              draft-ietf-ccamp-gr-description-01.txt

 Status of this Memo

    By submitting this Internet-Draft, each author represents that any
    applicable patent or other IPR claims of which he or she is aware
    have been or will be disclosed, and any of which he or she becomes
    aware will be disclosed, in accordance with Section 6 of BCP 79.

    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF), its areas, and its working groups. Note that
@@ -59,92 +59,93 @@
    the nodes restart is different.

    This document does not define any new processes or procedures. All
    protocol mechanisms are already defined in the referenced documents.

 Table of Contents

    1. Introduction.................................................3
    2. Existing Procedures for Single Node Restart..................4
    2.1. Procedures defined in [RFC3473]............................4
-   2.2. Procedures defined in [GR-EXT].............................5
+   2.2. Procedures defined in [RFC5063]............................5
    3. Multiple Node Restart Scenarios..............................5
    4. RSVP State...................................................6
    5. Procedures for Multiple Node Restart.........................7
    5.1. Procedures for the Normal Node.............................7
    5.2. Procedures for the Restarting Node.........................7
    5.2.1. Procedures for Scenario 1................................7
    5.2.2. Procedures for Scenario 2................................9
    5.2.3. Procedures for scenario 3...............................10
    5.2.4. Procedures for scenario 4...............................11
    5.2.5. Procedures for scenario 5...............................11
    5.3. Consideration of Re-Use of Data Plane Resources...........12
    5.4. Consideration of Management Plane Intervention............12
-   6. Security Considerations.....................................12
-   7. IANA Considerations.........................................13
-   8. Acknowledgments.............................................13
-   9. References..................................................13
-   9.1. Normative References......................................13
-   10. Authors' Addresses.........................................14
-   11. Full Copyright Statement...................................14
-   12. Intellectual Property Statement............................15
+   6. Clarification of Restarting Node Procedure..................12
+   7. Security Considerations.....................................14
+   8. IANA Considerations.........................................14
+   9. Acknowledgments.............................................14
+   10. References.................................................15
+   10.1. Normative References.....................................15
+   11. Authors' Addresses.........................................16
+   12. Full Copyright Statement...................................16
+   13. Intellectual Property Statement............................17

 1. Introduction

    The Hello message for the Resource Reservation Protocol (RSVP) has
    been defined to establish and maintain basic signaling node
    adjacencies for Label Switching Routers (LSRs) participating in a
    Multiprotocol Label Switching (MPLS) traffic engineered (TE) network
    [RFC3209]. The Hello message has been extended for use in Generalized
    MPLS (GMPLS) networks for state recovery of control channel or nodal
    faults through the exchange of the Restart Capabilities object
    [RFC3473].

    GMPLS protocol definitions for RSVP [RFC3473] also allow a restarting
    node to learn the label that it previously allocated for use on a
    Label Switching Path (LSP) through the Recovery Label object carried
    on a Path message sent to a restarting node from its upstream
    neighbor.

-   Further RSVP protocol extensions have been defined [GR-EXT] to
+   Further RSVP protocol extensions have been defined [RFC5063] to
    perform graceful restart and to enable a restarting node to recover
    full control plane state by exchanging RSVP messages with its
    upstream and downstream neighbors. State previously transmitted to
    the upstream neighbor (principally the downstream label) is
    recovered from the upstream neighbor on a Path message (using the
    Recovery Label object as described in [RFC3473]). State previously
    transmitted to the downstream neighbor (including the upstream
    label, interface identifiers, and the explicit route) is recovered
    from the downstream neighbor using a RecoveryPath message.

-   [GR-EXT] also extends the Hello message to exchange information
+   [RFC5063] also extends the Hello message to exchange information
    about the ability to support the RecoveryPath message.

-   The examples and procedures in [RFC3473] and [GR-EXT] focus on the
+   The examples and procedures in [RFC3473] and [RFC5063] focus on the
    description of a single node restart when adjacent network nodes are
    operative. Although the procedures are equally applicable to
    multi-node restarts, no detailed explanation is provided.

    This document provides an informational clarification of the control
    plane procedures for a GMPLS network when there are multiple node
    failures, and describes how full control plane state can be
    recovered in different scenarios where the order in which the nodes
    restart is different.

    This document does not define any new processes or procedures. All
-   protocol mechanisms already defined in [RFC3473] and [GR-EXT] are
+   protocol mechanisms already defined in [RFC3473] and [RFC5063] are
    definitive.

 2. Existing Procedures for Single Node Restart

    This section documents for information the existing procedures
-   defined in [RFC3473] and [GR-EXT]. Those documents are definitive,
+   defined in [RFC3473] and [RFC5063]. Those documents are definitive,
    and the description here is non-normative. It is provided for
    informational clarification only.

 2.1. Procedures defined in [RFC3473]

    In the case of nodal faults, the procedures for the restarting node
    and the procedures for the neighbor of a restarting node are applied
    to the corresponding nodes. These procedures described in [RFC3473]
    are summarized as follows:
@@ -171,23 +172,23 @@
    For the Neighbor of a restarting node:

    1) Sends the Path message with RECOVERY_LABEL object containing a
       label value corresponding to the label value received in the most
       recently received corresponding Resv message;

    2) Resumes refreshing Path state with the restarting node;

    3) Resumes refreshing Resv state with the restarting node.

-2.2. Procedures defined in [GR-EXT]
+2.2. Procedures defined in [RFC5063]

-   A new message is introduced in [GR-EXT] which is called the
+   A new message is introduced in [RFC5063] which is called the
    RecoveryPath message. The message is sent by the downstream neighbor
    of a restarting node to convey the contents of the last received
    Path message back to the restarting node.

    The restarting node will receive the Path message with the
    RECOVERY_LABEL object from its upstream neighbor, and/or the
    RecoveryPath message from its downstream neighbor. The full RSVP
    state of the restarting node can be recovered from these two
    messages.
@@ -281,41 +282,41 @@
    the PSB is responsible for receiving a Path from upstream and
    sending a Path to downstream.

    Regardless of how the RSVP state is implemented, on recovery there
    are two logical pieces of state to be recovered and these correspond
    to the PSB and RSB.

 5. Procedures for Multiple Node Restart

    In this document, all the nodes are assumed to have the graceful
-   restart capabilities which are described in [RFC3473] and [GR-EXT].
+   restart capabilities which are described in [RFC3473] and [RFC5063].

 5.1. Procedures for the Normal Node

    When the downstream Normal node detects its neighbor restarting, it
    must send a RecoveryPath message for each LSP associated with the
    restarting node for which it has previously sent a Resv message and
    which has not been torn down.

    When the upstream Normal node detects its neighbor restarting, it
    must send a Path message with RECOVERY_LABEL object containing a
    label value corresponding to the label value received in the most
    recently received corresponding Resv message.

    This document does not modify the procedures for the Normal node
-   which are described in [RFC3473] and [GR-EXT].
+   which are described in [RFC3473] and [RFC5063].

 5.2. Procedures for the Restarting Node

    This document does not modify the procedures for the Restarting
-   node which are described in [RFC3473] and [GR-EXT].
+   node which are described in [RFC3473] and [RFC5063].

 5.2.1. Procedures for Scenario 1

    After the Restarting node restarts, it starts a Recovery Timer. Any
    RSVP state that has not been resynchronized when the Recovery Timer
    expires should be cleared.

    At the Restarting node (Node B in the example), full
    resynchronization with the upstream neighbor (Node A) is possible
    because Node A is a Normal node. The upstream Path information is
@@ -325,23 +326,23 @@
    message received from Node A, but, obviously, some information (like
    the Recorded Route Object) will be missing from the new Resv message
    generated by Node B, and cannot be supplied until the downstream
    Delayed Restarting node (Node C) restarts and sends a Resv.

    After the upstream Path information and upstream Resv information
    have been recovered by Node B, the normal refresh procedure with the
    upstream Node A should be started.

-   As per [GR-EXT], the Restarting node (Node B) would normally expect
-   to receive a RecoveryPath message from its downstream neighbor
-   (Node C). It would use this to recover the downstream Path
+   As per [RFC5063], the Restarting node (Node B) would normally
+   expect to receive a RecoveryPath message from its downstream
+   neighbor (Node C). It would use this to recover the downstream Path
    information, and would subsequently send a Path message to its
    downstream neighbor and receive a Resv message. But in this
    scenario, because the downstream neighbor has not restarted yet,
    Node B detects that the communication with Node C is interrupted and
    must wait before resynchronizing with its downstream neighbor.

    In this case, the Restarting node (Node B) follows the procedures in
    section 9.3 of [RFC3473] and may run a Restart Timer to wait for the
    downstream neighbor (Node C) to restart. If its downstream neighbor
    (Node C) has not restarted before the timer expires the
@@ -469,35 +470,35 @@
    plane and the data plane on the nodes that are waiting for the
    failed device to restart.

    If the downstream Delayed Restarting node restarts after the cleanup
    of LSPs at Node C, the RecoveryPath message from Node D will be
    answered with a PathTear message. If the upstream Delayed Restarting
    node restarts after the cleanup of LSPs at Node C, the Path message
    from Node B will be treated as a new LSP setup request, but the
    setup will fail because Node D cannot be reached - Node C will
    respond with a PathErr message. Since this happens to Node B
-   during its restart processing, it should follow the rules of [GR-
-   EXT] and tear down the LSP.
+   during its restart processing, it should follow the rules of
+   [RFC5063] and tear down the LSP.

 5.2.4. Procedures for scenario 4

    When the Ingress node (Node A) restarts, it does not know which LSPs
    it caused to be created. Usually, however, this information is
    retrieved from the management plane or from the configuration
    requests stored in non-volatile form in the node in order to recover
    the LSP state.

    Furthermore, if the downstream node (Node B) is a Normal node,
-   according to the procedures in [GR-EXT], the ingress will receive a
-   RecoveryPath message and will understand that it was the ingress of
-   the LSP.
+   according to the procedures in [RFC5063], the ingress will receive
+   a RecoveryPath message and will understand that it was the ingress
+   of the LSP.

    However, in this scenario, the downstream node is a Delayed
    Restarting node, so Node A must rely on the information from the
    management plane or stored configuration, or it must wait for Node B
    to restart. In the event that Node B never restarts, management
    plane intervention may be used at Node A to clean up any LSP state
    restored from the management plane or from local configuration.
@@ -534,68 +535,137 @@
    plane resources and to over-ride the control plane. In this context,
    the management plane must always be able to release data plane
    resources that were previously in place for use by control-plane
    established LSPs. Further, the management plane must always be able
    to instruct any control plane node to tear down any LSP.

    Operators should be aware of the risks of misconnection that could
    be caused by careless manipulation from the management plane of
    in-use data plane resources.

-6. Security Considerations
+6. Clarification of Restarting Node Procedure
+
+   According to the current graceful restart procedure [RFC3473],
+   after a node restarts its control plane, it needs its upstream node
+   to send a PATH message with a recovery label to synchronize its RSVP
+   state. If the restarted control plane becomes operational quickly,
+   the upstream node may not detect the restart of the downstream node
+   and therefore may send a PATH message without a recovery label,
+   causing errors and unwanted connection deletion.
+
+          N1               N2
+          |                |
+          |                X (Restart start)
+          |     HELLO      |
+          |--------------->|
+          |                |
+          |    SRefresh    |
+          |--------------->|
+          |                |
+          |     HELLO      |
+          |--------------->|
+          |                |
+          |                X (Restart complete)
+          |    SRefresh    |
+          |--------------->|
+          |      NACK      |
+          |<---------------|
+          |  Path without  |
+          | recovery label |
+          |--------------->|
+          |                X (resource allocation failed because the
+          |                |  resources are in use)
+          |    PathErr     |
+          |<---------------|
+          |    PathTear    |
+          |--------------->|
+          X(CON deletion)  X (XCON deletion)
+          |                |
+
+   The sequence diagram above depicts one scenario where the LSP may
+   get deleted.
+
+   In this sequence N1 did not detect hello failure and continues
+   sending SRefreshes which may get NACK'ed by N2 once restart
+   completes because there is no Path state corresponding to the
+   SRefresh message. This NACK causes a Path refresh message to be
+   generated but there is no RECOVERY_LABEL because N1 did not yet
+   detect that N2 has restarted as hello exchanges have not yet
+   started. The Path message is treated as "new" and fails to allocate
+   the resources because they are still in use. This causes a PathErr
+   message to be generated which may lead to the tear down of the LSP.
+
+   To resolve the aforementioned problem, the following procedures are
+   proposed and are meant to work together with the recovery
+   procedures documented in [RFC3473]. Here, it is assumed that the
+   restarting node and the neighboring node(s) support Hello extension
+   as documented in [RFC3209] and recovery procedures documented in
+   [RFC3473].
+
+   After a node restarts its control plane, it should ignore and
+   silently drop all RSVP-TE messages, except hello messages, it
+   receives from any neighbor to which no HELLO session has been
+   established.
+
+   The restarting node should follow [RFC3209] to establish Hello
+   sessions with its neighbors, after its control plane becomes
+   operational.
+
+   The restarting node resumes processing of RSVP-TE messages sent
+   from each neighbor to which the Hello session has been established.
+
+7. Security Considerations

    This document clarifies the procedures to be performed on RSVP
    agents that neighbor one or more restarting RSVP agents.

    In the case of the control plane in general, and the RSVP agent in
    particular, where one or more nodes carrying one or more LSPs are
-   restarted due to external attacks, the procedures defined in [GR-
-   EXT] and described in this document provide the ability for the
-   restarting RSVP agents to recover the RSVP state in each restarting
-   node corresponding to the LSPs, with the least possible
+   restarted due to external attacks, the procedures defined in
+   [RFC5063] and described in this document provide the ability for
+   the restarting RSVP agents to recover the RSVP state in each
+   restarting node corresponding to the LSPs, with the least possible
    perturbation to the rest of the network. Ideally, only the
    neighboring RSVP agents should notice the restart and hence need to
    perform additional processing. This allows for a network with active
    LSPs to recover LSP state gracefully from an external attack,
    without perturbing the data/forwarding plane state.

-7. IANA Considerations
+8. IANA Considerations

    This document defines no new protocols or extensions and makes no
    requests to IANA for registry management.

-8. Acknowledgments
+9. Acknowledgments

    We would like to thank Adrian Farrel, Dimitri Papadimitriou, and
    Lou Berger for their useful comments.

-9. References
+10. References

-9.1. Normative References
+10.1. Normative References

-[RFC2209] R. Braden, L. Zhang, "Resource ReSerVation Protocol
-          (RSVP) -- Version 1 Message Processing Rules", RFC 2209,
-          September 1997.
+[RFC2209] R. Braden, L. Zhang, "Resource ReSerVation Protocol (RSVP)
+          -- Version 1 Message Processing Rules", RFC 2209, September
+          1997.

-[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan,
-          V., and G. Swallow, "RSVP-TE: Extensions to RSVP for
-          LSP Tunnels", RFC 3209, December 2001.
+[RFC3209] Awduche, D., Berger, L., Gan, D., Li, T., Srinivasan, V.,
+          and G. Swallow, "RSVP-TE: Extensions to RSVP for LSP
+          Tunnels", RFC 3209, December 2001.

 [RFC3473] Berger, L., "Generalized Multi-Protocol Label Switching
           (GMPLS) Signaling Resource ReserVation Protocol-Traffic
-          Engineering (RSVP-TE) Extensions", RFC 3473, January
-          2003.
+          Engineering (RSVP-TE) Extensions", RFC 3473, January 2003.

-[GR-EXT]  A. Satyanarayana, R. Rahman, "Extensions to GMPLS RSVP
-          Graceful Restart", Internet Draft, work in progress,
-          draft-ietf-ccamp-rsvp-restart-ext-08.txt, January 2007.
+[RFC5063] A. Satyanarayana, R. Rahman, "Extensions to GMPLS RSVP
+          Graceful Restart", RFC 5063, September 2007.

-10. Authors' Addresses
+11. Authors' Addresses

    Dan Li
    Huawei Technologies Co., Ltd.
    F3-5-B R&D Center, Huawei Base,
    Bantian, Longgang District
    Shenzhen 518129 P.R.China

    Phone: +86-755-28972910
    Email: danli@huawei.com
@@ -609,38 +679,47 @@
    Email: gjhhit@huawei.com

    Arun Satyanarayana
    Cisco Systems, Inc.
    170 West Tasman Dr.
    San Jose, CA 95134, USA

    Phone: +1 408 853-3206
    Email: asatyana@cisco.com

-11. Full Copyright Statement
+   Snigdho C. Bardalai
+   Fujitsu Network Communications, Inc.
+   2801 Telecom Parkway,
+   Richardson, Texas 75082
+   United States of America
+
+   Phone: +1 972 479 2951
+   Email: snigdho.bardalai@us.fujitsu.com
+
+12. Full Copyright Statement

    Copyright (C) The IETF Trust (2007).

    This document is subject to the rights, licenses and restrictions
    contained in BCP 78, and except as set forth therein, the authors
    retain all their rights.

    This document and the information contained herein are provided on
    an "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE
    REPRESENTS OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY, THE
    IETF TRUST AND THE INTERNET ENGINEERING TASK FORCE DISCLAIM ALL
    WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY
    WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE
    ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS
    FOR A PARTICULAR PURPOSE.

-12. Intellectual Property Statement
+13. Intellectual Property Statement

    The IETF takes no position regarding the validity or scope of any
    Intellectual Property Rights or other rights that might be claimed
    to pertain to the implementation or use of the technology described
    in this document or the extent to which any license under such
    rights might or might not be available; nor does it represent that
    it has made any independent effort to identify any such rights.
    Information on the procedures with respect to rights in RFC
    documents can be found in BCP 78 and BCP 79.