draft-ietf-forces-ceha-01.txt   draft-ietf-forces-ceha-02.txt 
Network Working Group K. Ogawa Network Working Group K. Ogawa
Internet-Draft NTT Corporation Internet-Draft NTT Corporation
Intended status: Standards Track W. M. Wang Intended status: Standards Track W. M. Wang
Expires: August 26, 2011 Zhejiang Gongshang University Expires: February 25, 2012 Zhejiang Gongshang University
E. Haleplidis E. Haleplidis
University of Patras University of Patras
J. Hadi Salim J. Hadi Salim
Mojatatu Networks Mojatatu Networks
February 22, 2011 August 24, 2011
ForCES Intra-NE High Availability ForCES Intra-NE High Availability
draft-ietf-forces-ceha-01 draft-ietf-forces-ceha-02
Abstract Abstract
This document discusses CE High Availability within a ForCES NE. This document discusses CE High Availability within a ForCES NE.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 26, 2011. This Internet-Draft will expire on February 25, 2012.
Copyright Notice Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 17 skipping to change at page 2, line 17
1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Document Scope . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Document Scope . . . . . . . . . . . . . . . . . . . . . . 5
2.2. Quantifying Problem Scope . . . . . . . . . . . . . . . . 5 2.2. Quantifying Problem Scope . . . . . . . . . . . . . . . . 5
3. RFC5810 CE HA Framework . . . . . . . . . . . . . . . . . . . 6 3. RFC5810 CE HA Framework . . . . . . . . . . . . . . . . . . . 6
3.1. Current CE High Availability Support . . . . . . . . . . . 6 3.1. Current CE High Availability Support . . . . . . . . . . . 6
3.1.1. Cold Standby Interaction with ForCES Protocol . . . . 7 3.1.1. Cold Standby Interaction with ForCES Protocol . . . . 7
3.1.2. Responsibilities for HA . . . . . . . . . . . . . . . 9 3.1.2. Responsibilities for HA . . . . . . . . . . . . . . . 9
4. CE HA Hot Standby . . . . . . . . . . . . . . . . . . . . . . 10 4. CE HA Hot Standby . . . . . . . . . . . . . . . . . . . . . . 10
4.1. Changes to the FEPO model . . . . . . . . . . . . . . . . 12 4.1. Changes to the FEPO model . . . . . . . . . . . . . . . . 10
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12 4.2. FEPO processing . . . . . . . . . . . . . . . . . . . . . 10
6. Security Considerations . . . . . . . . . . . . . . . . . . . 12 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14
7.1. Normative References . . . . . . . . . . . . . . . . . . . 12 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 14
7.2. Informative References . . . . . . . . . . . . . . . . . . 12 7.1. Normative References . . . . . . . . . . . . . . . . . . . 14
Appendix 1. Appendix I - New FEPO version . . . . . . . . . . . . 13 7.2. Informative References . . . . . . . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20 Appendix 1. Appendix I - New FEPO version . . . . . . . . . . . . 15
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 22
1. Definitions 1. Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119. document are to be interpreted as described in RFC 2119.
The following definitions are taken from [RFC3654] and [RFC3746]: The following definitions are taken from [RFC3654] and [RFC3746]:
Logical Functional Block (LFB) -- A template that represents a fine- Logical Functional Block (LFB) -- A template that represents a fine-
skipping to change at page 4, line 48 skipping to change at page 4, line 48
Fr: CE-CE interface Fr: CE-CE interface
Fc: Interface between the CE Manager and a CE Fc: Interface between the CE Manager and a CE
Ff: Interface between the FE Manager and an FE Ff: Interface between the FE Manager and an FE
Fl: Interface between the CE Manager and the FE Manager Fl: Interface between the CE Manager and the FE Manager
Fi/f: FE external interface Fi/f: FE external interface
Figure 1: ForCES Architecture Figure 1: ForCES Architecture
The ForCES architecture allows FEs to be aware of multiple CEs but The ForCES architecture allows FEs to be aware of multiple CEs but
enforces that only one CE be the master controller. This is known in enforces that only one CE be the master controller. This is known in
the industry as 1+N redundancy [refxxxx]. The master CE controls the the industry as 1+N redundancy. The master CE controls the FEs via
FEs via the ForCES protocol operating in the Fp interface. If the the ForCES protocol operating in the Fp interface. If the master CE
master CE becomes faulty, a backup CE takes over and NE operation becomes faulty, a backup CE takes over and NE operation continues.
continues. By definition, the current documented setup is known as By definition, the current documented setup is known as cold-standby.
cold-standby [refxxxx]. The CE set is static and is passed to the FE The CE set is static and is passed to the FE by the FE Manager (FEM)
by the FE Manager (FEM) via the Ff interface and to each CE by the CE via the Ff interface and to each CE by the CE Manager (CEM) in the Fc
Manager (CEM) in the Fc interface during the pre-association phase. interface during the pre-association phase.
From an FE perspective, the knobs of control for a CE set are defined From an FE perspective, the knobs of control for a CE set are defined
by the FEPO LFB in [RFC5810], Appendix B. Section 3.1 of this by the FEPO LFB in [RFC5810], Appendix B. Section 3.1 of this
document details these knobs further. document details these knobs further.
2.1. Document Scope 2.1. Document Scope
It is assumed that the reader is aware of the ForCES architecture to
make sense of the changes made here. This document provides minimal
background to set the context of the discussion in Section 4.
By current definition, the Fr interface is out of scope for the By current definition, the Fr interface is out of scope for the
ForCES architecture. However, it is expected that organizations ForCES architecture. However, it is expected that organizations
implementing a set of CEs will need to have the CEs communicate to implementing a set of CEs will need to have the CEs communicate to
each other via the Fr interface in order to achieve the each other via the Fr interface in order to achieve the
synchronization necessary for controlling the FEs. synchronization necessary for controlling the FEs.
The problem scope addressed by this document falls into 2 areas: The problem scope addressed by this document falls into 2 areas:
1. To describe with more clarity (than [RFC5810]) how current cold- 1. To describe with more clarity (than [RFC5810]) how current cold-
standby approach operates within the NE cluster. standby approach operates within the NE cluster.
skipping to change at page 8, line 49 skipping to change at page 9, line 8
considered as a loss of association between the CE and corresponding considered as a loss of association between the CE and corresponding
FE. FE.
If the FE's FEPO CE Failover Policy is configured to mode 0 (the If the FE's FEPO CE Failover Policy is configured to mode 0 (the
default), it will immediately transition to the pre-association default), it will immediately transition to the pre-association
phase. This means that if association is again established, all FE phase. This means that if association is again established, all FE
state will need to be re-established. state will need to be re-established.
If the FE's FEPO CE Failover Policy is configured to mode 1, it If the FE's FEPO CE Failover Policy is configured to mode 1, it
indicates that the FE is capable of HA restart recovery. In such a indicates that the FE is capable of HA restart recovery. In such a
case, the FE transitions to the not associated state and the CEFTI case, the FE transitions to the Not Associated state and the CEFTI
timer [RFC5810] is started. The FE MAY continue to forward packets timer [RFC5810] is started. The FE MAY continue to forward packets
during this state. It MAY also recycle through any configured backup during this state. It MAY also recycle through any configured backup
CEs in a round-robin fashion. It first adds its primary CE to the CEs in a round-robin fashion. It first adds its primary CE to the
bottom of table BackupCEs and sets its CEID component to be the first bottom of table BackupCEs and sets its CEID component to be the first
secondary retrieved from table BackupCEs. The FE then attempts to secondary retrieved from table BackupCEs. The FE then attempts to
associate with the CE designated as the new primary CE. If it fails associate with the CE designated as the new primary CE. If it fails
to re-associate with any CE and the CEFTI expires, the FE then to re-associate with any CE and the CEFTI expires, the FE then
transitions to the pre-association state. transitions to the pre-association state.
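The mode 1 rotation described above can be sketched as follows. This is a minimal illustration, not part of the draft: the function name rotate_primary is hypothetical, and it assumes that "retrieved from table BackupCEs" means the first secondary is removed from the table when it becomes the new primary.

```python
from collections import deque


def rotate_primary(ceid, backup_ces):
    """Return (new_ceid, new_backup_ces) after one cold-standby rotation.

    ceid is the FEPO CEID component (the failed primary) and backup_ces
    is the BackupCEs table in order. Removal of the promoted entry from
    the table is an assumption of this sketch.
    """
    if not backup_ces:
        # No backup configured: nothing to rotate to.
        return ceid, list(backup_ces)
    table = deque(backup_ces)
    table.append(ceid)          # old primary goes to the bottom of BackupCEs
    new_ceid = table.popleft()  # first secondary becomes the new primary
    return new_ceid, list(table)
```

Applying the rotation once per failed association attempt walks through every configured CE and eventually returns to the original primary, which matches the round-robin behavior described in the text.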
If the FE, while in the not associated state, manages to reconnect to If the FE, while in the not associated state, manages to reconnect to
a new primary CE before CEFTI expires it transitions to the a new primary CE before CEFTI expires it transitions to the
Associated state. Once re-associated, the FE tries to recover any Associated state. Once re-associated, the FE tries to recover any
state that may have been lost during the not associated state. How state that may have been lost during the not associated state. How
the FE achieves to re-synchronize its state is out of scope for the the FE re-synchronizes state is out of scope for the current ForCES
current ForCES architecture. architecture.
An explicit message (a Config message setting Primary CE component in An explicit message (a Config message setting Primary CE component in
ForCES Protocol object) from the primary CE, can also be used to ForCES Protocol object) from the primary CE, can also be used to
change the Primary CE for an FE during normal protocol operation. In change the Primary CE for an FE during normal protocol operation. In
this case, the FE transitions to the Not Associated State and this case, the FE transitions to the Not Associated State and
attempts to Associate with the new CE. attempts to Associate with the new CE.
3.1.2. Responsibilities for HA 3.1.2. Responsibilities for HA
XXX: we may remove this section (not much value to overall
discussion)
TML Level: TML Level:
1. The TML controls logical connection availability and failover. 1. The TML controls logical connection availability and failover.
2. The TML also controls peer HA management. 2. The TML also controls peer HA management.
At this level, control of all lower layers, for example transport At this level, control of all lower layers, for example transport
level (such as IP addresses, MAC addresses etc) and associated links level (such as IP addresses, MAC addresses etc) and associated links
going down are the role of the TML. going down are the role of the TML.
PL Level: PL Level:
All other functionality, including configuring the HA behavior during All other functionality, including configuring the HA behavior during
setup, the CE IDs used to identify primary and secondary CEs, setup, the CE IDs used to identify primary and secondary CEs,
protocol messages used to report CE failure (Event Report), Heartbeat protocol messages used to report CE failure (Event Report), Heartbeat
messages used to detect association failure, messages to change the messages used to detect association failure, messages to change the
primary CE (Config), and other HA related operations described primary CE (Config), and other HA related operations described in
before, are the PL responsibility. Section 3.1, are the PL's responsibility.
To put the two together, if a path to a primary CE is down, the TML To put the two together, if a path to a primary CE is down, the TML
would take care of failing over to a backup path, if one is would take care of failing over to a backup path, if one is
available. If the CE is totally unreachable then the PL would be available. If the CE is totally unreachable then the PL would be
informed and it would take the appropriate actions described before. informed and it would take the appropriate actions described before.
4. CE HA Hot Standby 4. CE HA Hot Standby
In this section we make some small extensions to the existing scheme In this section we describe small extensions to the existing scheme
to enable it to achieve hot standby HA. With these suggested changes to enable hot standby HA. To achieve hot standby HA, we target
we achieve some of the goals defined in Section 2.2, namely: specific goals defined in Section 2.2, namely:
o How fast a backup CE becomes operational. o How fast a backup CE becomes operational.
o How fast the FEs associate with the new master CE. o How fast the FEs associate with the new master CE.
As described in Section 3.1, in the pre-association phase the FEM As described in Section 3.1, in the pre-association phase the FEM
configures the FE to make it aware of all the CEs in the NE. The FEM configures the FE to make it aware of all the CEs in the NE. The FEM
MUST configure the FE to make it aware of which CE is the master and MUST configure the FE to make it aware of which CE is the master and
MAY specify any backup CE(s). MAY specify any backup CE(s).
4.1. Changes to the FEPO model
In order for the above to be achievable there is a need to make a few
changes in the FEPO model. Section 1 contains the xml definition of
the new version 2 of the FEPO LFB.
Changes from the version 1 of FEPO are:
1. Addition of a new datatype, status (unsigned char) with special
values 0 (Disconnected), 1 (Connected), 2 (Associated), 3
(Lost_Connection) and 4 (Unreachable).
2. Change Component BackupCEs (9) to AllCEs and instead of an Array
of unsigned integers(CEID), it MUST be an Array of unsigned
integers (CEID) and unsigned char (status) for each CE.
3. Add two special values to the CEFailoverPolicyValues. 2 (High
availability without Graceful restart) and 3 (High availability
with Graceful restart).
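As a rough illustration of the three changes listed above, the new values can be modeled as below. This is a sketch only: the Python class and member names are hypothetical, the names given to policy values 0 and 1 paraphrase the behavior described in Section 3.1.1, and the default status of Disconnected is an assumption. The normative definition is the XML in Appendix I.

```python
from dataclasses import dataclass
from enum import IntEnum


class CEStatus(IntEnum):
    """Per-CE status values (carried as an unsigned char)."""
    DISCONNECTED = 0
    CONNECTED = 1
    ASSOCIATED = 2
    LOST_CONNECTION = 3
    UNREACHABLE = 4


class CEFailoverPolicy(IntEnum):
    """CEFailoverPolicyValues; 2 and 3 are the new values."""
    RESTART_FROM_PRE_ASSOCIATION = 0   # hypothetical name for mode 0
    HA_RESTART_RECOVERY = 1            # hypothetical name for mode 1
    HA_WITHOUT_GRACEFUL_RESTART = 2
    HA_WITH_GRACEFUL_RESTART = 3


@dataclass
class AllCEsEntry:
    """One row of the AllCEs array: a CEID plus its status."""
    ceid: int
    status: CEStatus = CEStatus.DISCONNECTED  # default is an assumption


def is_hot_standby(policy):
    """True for the two policy values added in FEPO version 2."""
    return policy in (CEFailoverPolicy.HA_WITHOUT_GRACEFUL_RESTART,
                      CEFailoverPolicy.HA_WITH_GRACEFUL_RESTART)
```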
4.2. FEPO processing
The FE's FEPO LFB version 2 AllCEs table (previously BackupCEs) The FE's FEPO LFB version 2 AllCEs table (previously BackupCEs)
contains all the CEIDs that the FE may connect and associate with. contains all the CEIDs that the FE may connect and associate with.
The sequence of the CE IDs is also the connection priority for the The ordering of the CE IDs in this table defines the priority order
FE. In the pre-association phase, the first CE ID in the AllCEs in which an FE will connect to the CEs. In the pre-association
table MUST be the first CE ID that the FE will attempt to connect and phase, the first CE ID (lowest table index) in the AllCEs table MUST
associate with. If the FE fails to connect and associate with the be the first CE ID that the FE will attempt to connect and associate
first CE ID it will attempt to connect to the second CE ID and so with. If the FE fails to connect and associate with the first CE ID,
forth, until there is a connection and an association or the list it will attempt to connect to the second CE ID and so forth, and
ends. The FEPO's CEID component identifies the current associated cycles back to the beginning of the list until there is a connection
master CE. and an association. The FE MUST associate with at least one CE.
Upon a successful association, the FEPO's CEID component identifies
the current associated master CE.
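The pre-association walk over the AllCEs table can be sketched as follows. This is an illustration under stated assumptions: try_associate is a hypothetical callback standing in for the connect and associate exchange, and max_attempts is not in the draft; it exists only so that the sketch can terminate in a test.

```python
from itertools import cycle


def associate_in_priority_order(all_ces, try_associate, max_attempts=None):
    """Try CE IDs from the lowest table index, wrapping around the list.

    Returns the CE ID that associated, which becomes the FEPO CEID
    component, or None if the hypothetical max_attempts is exhausted.
    """
    if not all_ces:
        raise ValueError("AllCEs must list at least one CE")
    for attempt, ceid in enumerate(cycle(all_ces)):
        if max_attempts is not None and attempt >= max_attempts:
            return None
        if try_associate(ceid):
            return ceid
```

With max_attempts left at None the loop only ends on a successful association, which mirrors the requirement that the FE associate with at least one CE.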
Once the FE has associated with a master CE it moves to the post- Once the FE has associated with a master CE it moves to the post-
association phase. In the post-association phase, the master CE MAY association phase. In the post-association phase, the master CE MAY
update the list of backup CEs. It MAY also instruct the FE to use a update the list of backup CEs. It MAY also instruct the FE to use a
different master CE. It is assumed that the master CE will different master CE. It is assumed that the master CE will
communicate with other CEs within the NE for the purpose of communicate with other CEs within the NE for the purpose of
synchronization via the CE-CE interface. The CE-CE interface is out synchronization via the CE-CE interface. The CE-CE interface is out
of scope for this document. of scope for this document.
FE CE#1 CE#2 ... CE#N
| | | |
| Asso Estb,Caps exchg | | |
1 |<-------------------->| | |
| | | |
| state update | | |
2 |<-------------------->| | |
| | | |
| Asso Estb,Caps exchg | |
3I|<--------------------------------->| |
... ... ... ...
| Asso Estb,Caps exchg |
3N|<------------------------------------------>|
| | | |
4 |<-------------------->| | |
. . . .
4x|<-------------------->| | |
| FAILURE | |
| | | |
| Event Report (CE#2 is new master) | |
5 |--------------------->|----------->|------->|
| | |
| Config (Set CEID to CEID of CE#3) | |
6 |<----------------------------------| |
7 |<--------------------------------->| |
. . . .
7x|<--------------------------------->| |
. . . .
Figure 4: CE Failover for Hot Standby
XXX: We need to have a figure3' to match the new FEPO SM
While in the post-association phase, if the CE Failover Policy is set While in the post-association phase, if the CE Failover Policy is set
to 2 (High Availability without Graceful Restart) or 3 (High to 2 (High Availability without Graceful Restart) or 3 (High
Availability with Graceful Restart) then the FE, after successfully Availability with Graceful Restart) then the FE, after successfully
associating with the master CE, MUST attempt to connect and associate associating with the master CE, MUST attempt to connect and associate
with all the CEs that it becomes aware of. If it fails to connect or with all the CEs that it becomes aware of. Figure 4 steps #1 and #2
associate with some CEs, the FE MAY flag them as unreachable to avoid illustrate the FE associating with CE#1 as the master and then
continuous attempts to connect. proceeding, in steps #3I to #3N, to associate with backup CEs CE#2
to CE#N. If the FE fails to connect or associate with some CEs, the
FE MAY flag them as unreachable to avoid continuous attempts to
connect. The FE may retry to reassociate with unreachable CEs when
possible.
When the master CE for any reason is considered to be down, then the When the master CE for any reason is considered to be down, then the
FE will try to find the first associated CE from the list of all CEs FE will try to find the first associated CE from the list of all CEs
in a round-robin fashion. in a round-robin fashion.
If the FE is unable to find an associated CE in its list of CEs, then If the FE is unable to find an associated CE in its list of CEs, then
it will attempt to connect and associate with the first from the list it will attempt to connect and associate with the first from the list
of all CEs and continue in a round-robin fashion until it connects of all CEs and continue in a round-robin fashion until it connects
and associates with a CE. and associates with a CE.
"XXX: We need to discuss what should happen to CEs in the AllCEs list Once the FE selects the associated CE to use as the new master, the
which an FE has attempted to connect or associate to but failed." FE then sends the Primary CE Down Event Notification to all
Once connected and associated it assumes that the new associated CE
is the new master CE and sets the FEPO CEID component's value with
the new associated CE's ID.
The FE then sends the Primary CE Down Event Notification to all
associated CEs to notify them that the FE considers this CE as the associated CEs to notify them that the FE considers this CE as the
new master CE. new master CE.
The new master CE MUST configure the CEID component of the FE within The new master CE MUST configure the CEID component of the FE within
the time limit defined in the FEPO Failover Timeout as a confirmation the time limit defined in the FEPO Failover Timeout as a confirmation
that the FE made the right choice. that the FE made the right choice.
XXX: We need to discuss what happens if a CE doesn't respond within FE CE#1 CE#2 ... CE#N
a FEPO Failover Timeout. | | | |
| Asso Estb,Caps exchg | | |
1 |<-------------------->| | |
| | | |
| state update | | |
2 |<-------------------->| | |
| | | |
| Asso Estb,Caps exchg | |
3I|<--------------------------------->| |
| | | |
... ... ... ...
| Asso Estb,Caps exchg |
3N|<------------------------------------------>|
| | | |
4 |<-------------------->| | |
. . . .
4x|<-------------------->| | |
| FAILURE | |
| | | |
| Event Report (CE#2 is new master) | |
5 |--------------------->|----------->|------->|
| | | |
| FEPO Failover Timeout | |
| | | |
| Event Report (CE#N is new master) | |
6 |--------------------->|----------->|------->|
| | | |
| Config (Set CEID to CEID of CE#N) |
7 |<-------------------------------------------|
8a|<------------------------------------------>|
. . . .
8x|<------------------------------------------>|
Figure 5: CE Failover for Hot Standby
If the FE does not get confirmation within the FEPO Failover Timeout,
it picks the next CE on its list and advertises it as the new master.
Figure 5 illustrates in step #5 selecting CE#2 as its new master. In
step #6, the timeout occurs and it picks CE#N as its new master. The
FE receives confirmation that CE#N is the new master in step #7.
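The selection and confirmation loop can be sketched as follows. This is an illustration only: announce and wait_for_confirmation are hypothetical callbacks standing in for the Event Report and for waiting up to the FEPO Failover Timeout for a Config that sets the CEID. Starting the search just after the failed master is one reading of "round-robin" and is an assumption of this sketch.

```python
ASSOCIATED = 2  # status value from the FEPO version 2 status datatype


def select_new_master(all_ces, failed_ceid, announce, wait_for_confirmation):
    """Pick a new master from the associated CEs after the master fails.

    all_ces is a list of (ceid, status) pairs in priority order.
    Returns the confirmed CE ID, or None if no associated CE confirmed;
    in that case the FE falls back to connecting from the start of the
    list, which is not modeled here.
    """
    ids = [ceid for ceid, _ in all_ces]
    start = ids.index(failed_ceid) + 1 if failed_ceid in ids else 0
    ordered = all_ces[start:] + all_ces[:start]
    for ceid, status in ordered:
        if ceid == failed_ceid or status != ASSOCIATED:
            continue
        announce(ceid)                   # Event Report to associated CEs
        if wait_for_confirmation(ceid):  # Config received before timeout
            return ceid
    return None
```

In the scenario of the figure, the first candidate is announced, the timeout passes without a Config, and the next associated CE is announced and confirms.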
If the CE the FE assumed to be the master discovers that it should If the CE the FE assumed to be the master discovers that it should
not be the new master CE, then it will configure the CEID with the ID not be the new master CE, then it will configure the CEID with the ID
of the proper master CE. How the CE decides who the new master CE of the proper master CE. How the CE decides who the new master CE
is, is also out of scope of this document and is assumed to be done is, is also out of scope of this document and is assumed to be done
via a CE-CE communication protocol. via a CE-CE communication protocol. The FE must then associate with
the new CE.
In most High Availability architectures the split-brain issue is In most High Availability architectures there exists the possibility
present. However, since the FE will never accept any configuration of split-brain. However, since in our setup the FE will never accept
messages from any other than the master CE, we consider the FE as any configuration messages from any other than the master CE, we
fenced against data corruption from the other CEs that consider consider the FE as fenced against data corruption from the other CEs
themselves as the master. The split-brain issue is mostly a CE-CE that consider themselves as the master. The split-brain issue
communication problem and is considered to be out of scope. becomes mostly a CE-CE communication problem which is considered to
be out of scope.
By virtue of having multiple CE connections, the FE switchover to a By virtue of having multiple CE connections, the FE switchover to a
new master CE will be relatively much faster. The overall effect is new master CE will be relatively much faster. The overall effect is
improving the NE recovery time in case of communication failure or improving the NE recovery time in case of communication failure or
faults of the master CE. faults of the master CE. This satisfies the requirement we set to
achieve.
For the sake of simplicity, the FE MUST respond to messages issued by
only the master CE. This simplifies the synchronization and avoids
the concept of locking FE state. The FE MUST drop any messages from
backup CEs. However, asynchronous events that the master CE has
subscribed to and heartbeats are sent to all associated-to CEs.
Packet redirects continue to be sent only to the master CE. The
Heartbeat Interval, the CEHB Policy and the FEHB Policy MUST apply to
all CEs.
4.1. Changes to the FEPO model
In order for the above to be achievable there is a need to make a few
changes in the FEPO model. Appendix I contains the xml of the new
version of the FEPO.
Changes from the previous version are:
1. Addition of a new datatype, status (unsigned char) with special
values 0 (Disconnected), 1 (Connected), 2 (Associated), 3
(Lost_Connection) and 4 (Unreachable).
2. Change Component BackupCEs (9) to AllCEs and instead of an Array
of unsigned integers, it MUST be an Array of unsigned integers
(CEID) and unsigned char (status) for each CE.
3. Add two special values to the CEFailoverPolicyValues. 2 (High For the sake of simplicity, the FE MUST respond to messages issued
availability without Graceful restart) and 3 (High availability only by the master CE. This simplifies the synchronization and
with Graceful restart). avoids the concept of locking FE state. The FE MUST drop any
messages from backup CEs. However, asynchronous events that the
master CE has subscribed to, as well as heartbeats are sent to all
associated-to CEs. Packet redirects continue to be sent only to the
master CE. The Heartbeat Interval, the CEHB Policy and the FEHB
Policy MUST be the same for all CEs.
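The message handling rules above can be summarized in a short sketch. The function names and the string labels for message kinds are hypothetical and chosen for illustration; they are not protocol identifiers.

```python
def accept_from_ce(sender_ceid, master_ceid):
    """The FE responds only to the master CE; other messages are dropped."""
    return sender_ceid == master_ceid


def recipients(kind, master_ceid, associated_ceids):
    """Return the CE IDs that should receive an FE-originated message.

    kind is one of "event", "heartbeat" or "redirect" (labels are
    assumptions of this sketch).
    """
    if kind in ("event", "heartbeat"):
        # Subscribed events and heartbeats go to every associated CE.
        return sorted(associated_ceids)
    if kind == "redirect":
        # Packet redirects go only to the master CE.
        return [master_ceid]
    raise ValueError("unknown message kind: %r" % (kind,))
```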
5. IANA Considerations 5. IANA Considerations
TBA TBA
6. Security Considerations 6. Security Considerations
TBA TBA
7. References 7. References
 End of changes. 22 change blocks. 
87 lines changed or deleted 165 lines changed or added
