draft-ietf-forces-ceha-00.txt   draft-ietf-forces-ceha-01.txt 
Network Working Group K. Ogawa Network Working Group K. Ogawa
Internet-Draft NTT Corporation Internet-Draft NTT Corporation
Intended status: Standards Track W. M. Wang Intended status: Standards Track W. M. Wang
Expires: April 20, 2011 Zhejiang Gongshang University Expires: August 26, 2011 Zhejiang Gongshang University
E. Haleplidis E. Haleplidis
University of Patras University of Patras
J. Hadi Salim J. Hadi Salim
Mojatatu Networks Mojatatu Networks
Oct 17, 2010 February 22, 2011
ForCES Intra-NE High Availability ForCES Intra-NE High Availability
draft-ietf-forces-ceha-00 draft-ietf-forces-ceha-01
Abstract Abstract
This document discusses CE High Availability within a ForCES NE. This document discusses CE High Availability within a ForCES NE.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 20, 2011. This Internet-Draft will expire on August 26, 2011.
Copyright Notice Copyright Notice
Copyright (c) 2010 IETF Trust and the persons identified as the Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 17 skipping to change at page 2, line 17
1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Definitions . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Document Scope . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Document Scope . . . . . . . . . . . . . . . . . . . . . . 5
2.2. Quantifying Problem Scope . . . . . . . . . . . . . . . . 5 2.2. Quantifying Problem Scope . . . . . . . . . . . . . . . . 5
3. RFC5810 CE HA Framework . . . . . . . . . . . . . . . . . . . 6 3. RFC5810 CE HA Framework . . . . . . . . . . . . . . . . . . . 6
3.1. Current CE High Availability Support . . . . . . . . . . . 6 3.1. Current CE High Availability Support . . . . . . . . . . . 6
3.1.1. Cold Standby Interaction with ForCES Protocol . . . . 7 3.1.1. Cold Standby Interaction with ForCES Protocol . . . . 7
3.1.2. Responsibilities for HA . . . . . . . . . . . . . . . 9 3.1.2. Responsibilities for HA . . . . . . . . . . . . . . . 9
4. CE HA Hot Standby . . . . . . . . . . . . . . . . . . . . . . 10 4. CE HA Hot Standby . . . . . . . . . . . . . . . . . . . . . . 10
5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 11 4.1. Changes to the FEPO model . . . . . . . . . . . . . . . . 12
6. Security Considerations . . . . . . . . . . . . . . . . . . . 11 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 12
7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 11 6. Security Considerations . . . . . . . . . . . . . . . . . . . 12
7.1. Normative References . . . . . . . . . . . . . . . . . . . 11 7. References . . . . . . . . . . . . . . . . . . . . . . . . . . 12
7.2. Informative References . . . . . . . . . . . . . . . . . . 11 7.1. Normative References . . . . . . . . . . . . . . . . . . . 12
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 11 7.2. Informative References . . . . . . . . . . . . . . . . . . 12
Appendix 1. Appendix I - New FEPO version . . . . . . . . . . . . 13
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 20
1. Definitions 1. Definitions
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 2119. document are to be interpreted as described in RFC 2119.
The following definitions are taken from [RFC3654] and [RFC3746]: The following definitions are taken from [RFC3654] and [RFC3746]:
Logical Functional Block (LFB) -- A template that represents a fine- Logical Functional Block (LFB) -- A template that represents a fine-
skipping to change at page 10, line 15 skipping to change at page 10, line 15
4. CE HA Hot Standby 4. CE HA Hot Standby
In this section we make some small extensions to the existing scheme In this section we make some small extensions to the existing scheme
to enable it to achieve hot standby HA. With these suggested changes to enable it to achieve hot standby HA. With these suggested changes
we achieve some of the goals defined in Section 2.2, namely: we achieve some of the goals defined in Section 2.2, namely:
o How fast a backup CE becomes operational. o How fast a backup CE becomes operational.
o How fast the FEs associate with the new master CE. o How fast the FEs associate with the new master CE.
As described in Section 3.1, the FEM configures the FE to make it As described in Section 3.1, in the pre-association phase the FEM
aware of all the CEs in the NE. The FEM also configures the FE to configures the FE to make it aware of all the CEs in the NE. The FEM
make it aware of which CE is the master and which are backup(s). The MUST configure the FE to make it aware of which CE is the master and
FE's FEPO LFB CEID component identifies the current master CE and MAY specify any backup CE(s).
table BackupCEs identifies the backup CEs. The FE only connects to
the master CE and then proceeds to associate with it. The master
thereafter controls the FE and receives events from it. This
continues until there is communication failure between the FE and CE
at which point the FE attempts to connect to a CE from the BackupCEs
table until it succeeds to connect and associate with one listed CE.
It is recommended that at least one backup CE should be online. The FE's FEPO LFB version 2 AllCEs table (previously BackupCEs)
Doing so will improve how fast the backup CE will take to be contains all the CEIDs that the FE may connect and associate with.
operational (as opposed to bringing up a backup CE when we detect a The sequence of the CE IDs is also the connection priority for the
master CE fault). If we assume that a CE implementation does state FE. In the pre-association phase, the first CE ID in the AllCEs
synchronization between CEs, then the cost of making the backup CE table MUST be the first CE ID that the FE will attempt to connect and
operational and ready to serve FEs; in such a case an associating FE associate with. If the FE fails to connect and associate with the
could immediately become operational. first CE ID it will attempt to connect to the second CE ID and so
forth, until there is a connection and an association or the list
ends. The FEPO's CEID component identifies the current associated
master CE.
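The pre-association walk over the AllCEs table can be sketched as follows. This is only an illustrative sketch, not part of the draft: the function name and the try_associate callback are hypothetical stand-ins for whatever connect-and-associate routine an FE implementation provides.

```python
from typing import Callable, Optional, Sequence


def associate_in_priority_order(
    all_ces: Sequence[int],
    try_associate: Callable[[int], bool],  # hypothetical connect+associate hook
) -> Optional[int]:
    """Try each CE ID in table order; return the first one that associates."""
    for ce_id in all_ces:
        if try_associate(ce_id):
            # This CE ID would be written to the FEPO CEID component.
            return ce_id
    # The list ended without a successful association.
    return None


if __name__ == "__main__":
    reachable = {20, 30}  # assumed for the example: CE 10 is down
    print(associate_in_priority_order([10, 20, 30], lambda ce: ce in reachable))
```

The position of a CE ID in the table is the only priority signal used here, which matches the text's statement that the sequence of CE IDs is the connection priority.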
If we assume the presence of at least one backup CE online, we can Once the FE has associated with a master CE it moves to the post-
improve how fast the FEs associate with a new master CE by making two association phase. In the post-association phase, the master CE MAY
changes: update the list of backup CEs. It MAY also instruct the FE to use a
different master CE. It is assumed that the master CE will
communicate with other CEs within the NE for the purpose of
synchronization via the CE-CE interface. The CE-CE interface is out
of scope for this document.
The first change that needs to be made is to have the FE, soon after While in the post-association phase, if the CE Failover Policy is set
successfully connecting and associating with the master CE, to to 2 (High Availability without Graceful Restart) or 3 (High
proceed and connect as well as associate with the rest of the CEs Availability with Graceful Restart) then the FE, after successfully
listed in the BackupCEs table. associating with the master CE, MUST attempt to connect and associate
with all the CEs that it becomes aware of. If it fails to connect or
associate with some CEs, the FE MAY flag them as unreachable to avoid
continuous attempts to connect.
When the master CE for any reason is considered to be down, then the
FE will try to find the first associated CE from the list of all CEs
in a round-robin fashion.
If the FE is unable to find an associated CE in its list of CEs, then
it will attempt to connect and associate with the first from the list
of all CEs and continue in a round-robin fashion until it connects
and associates with a CE.
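One possible reading of the two-step selection above is sketched below. Several points are assumptions rather than text from the draft: the first scan starts at the entry after the failed master and wraps around, the retry loop is bounded by a hypothetical max_rounds parameter (the draft sets no bound), and try_associate is a hypothetical hook.

```python
from typing import Callable, List, Optional, Tuple

ASSOCIATED = 2  # CEStatusType value "Associated" from the FEPO model


def pick_new_master(
    all_ces: List[Tuple[int, int]],        # (CEID, status) rows in table order
    failed_master: int,
    try_associate: Callable[[int], bool],  # hypothetical connect+associate hook
    max_rounds: int = 3,                   # hypothetical bound on retries
) -> Optional[int]:
    n = len(all_ces)
    # Step 1: look for an already associated CE, starting after the failed
    # master and wrapping around the table (assumed round-robin order).
    start = next(
        (i for i, (ce, _) in enumerate(all_ces) if ce == failed_master), -1
    )
    for step in range(1, n + 1):
        ce_id, status = all_ces[(start + step) % n]
        if ce_id != failed_master and status == ASSOCIATED:
            return ce_id
    # Step 2: nothing associated, so retry the whole table from the top.
    for _ in range(max_rounds):
        for ce_id, _status in all_ces:
            if try_associate(ce_id):
                return ce_id
    return None
```

Because step 1 only reads the status column, switchover to an already associated backup needs no new connection setup, which is where the recovery-time gain described in this section comes from.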
"XXX: We need to discuss what should happen to CEs in the AllCEs list
which an FE has attempted to connect or associate to but failed."
Once connected and associated it assumes that the new associated CE
is the new master CE and sets the FEPO CEID component's value with
the new associated CE's ID.
The FE then sends the Primary CE Down Event Notification to all
associated CEs to notify them that the FE considers this CE as the
new master CE.
The new master CE MUST configure the CEID component of the FE within
the time limit defined in the FEPO Failover Timeout as a confirmation
that the FE made the right choice.
XXX: We need to discuss what happens if a CE doesn't respond within
a FEPO Failover Timeout.
If the CE the FE assumed to be the master discovers that it should
not be the new master CE, then it will configure the CEID with the ID
of the proper master CE. How the CE decides who the new master CE
is, is also out of scope of this document and is assumed to be done
via a CE-CE communication protocol.
In most High Availability architectures the split-brain issue is
present. However, since the FE will never accept any configuration
messages from any CE other than the master CE, we consider the FE as
fenced against data corruption from the other CEs that consider
themselves as the master. The split-brain issue is mostly a CE-CE
communication problem and is considered to be out of scope.
By virtue of having multiple CE connections, the FE switchover to a By virtue of having multiple CE connections, the FE switchover to a
new master CE will be relatively much faster. The overall effect is new master CE will be relatively much faster. The overall effect is
improving the NE recovery time in case of communication failure or improving the NE recovery time in case of communication failure or
faults of the master CE. faults of the master CE.
XXX: below paragraph needs more text discussion .. For the sake of simplicity, the FE MUST respond to messages issued by
only the master CE. This simplifies the synchronization and avoids
the concept of locking FE state. The FE MUST drop any messages from
backup CEs. However, asynchronous events that the master CE has
subscribed to and heartbeats are sent to all associated-to CEs.
Packet redirects continue to be sent only to the master CE. The
Heartbeat Interval, the CEHB Policy and the FEHB Policy MUST apply to
all CEs.
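The message handling rules in the paragraph above amount to a small dispatch table. The sketch below is illustrative only; the function names and the string labels for message kinds are hypothetical and do not correspond to ForCES protocol message types.

```python
from typing import List


def accept_from(sender_ce: int, master_ce: int) -> bool:
    """The FE responds only to the master CE; anything else is dropped."""
    return sender_ce == master_ce


def recipients(kind: str, master_ce: int, associated: List[int]) -> List[int]:
    """Return the CE IDs that should receive an FE-originated message."""
    if kind in ("event", "heartbeat"):
        # Subscribed events and heartbeats go to every associated CE.
        return list(associated)
    if kind == "redirect":
        # Packet redirects go to the master CE only.
        return [master_ce]
    raise ValueError(f"unknown message kind: {kind}")


if __name__ == "__main__":
    print(recipients("heartbeat", 10, [10, 20, 30]))
    print(recipients("redirect", 10, [10, 20, 30]))
```

Keeping the accept rule to a single comparison against the current CEID is what lets the FE avoid any locking of its state between CEs.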
The FE MUST respond to messages issued by only the master CE. This 4.1. Changes to the FEPO model
simplifies the synchronization and avoids the concept of locking FE
state. The FE MUST drop any messages from backup CEs (XXX: Should we
log and increment some stat?).
XXX: below paragraph needs text discussion .. In order for the above to be achievable there is a need to make a few
changes in the FEPO model. Appendix I contains the xml of the new
version of the FEPO.
Again for the sake of simplicity, asynchronous events and heartbeats Changes from the previous version are:
are sent to all associated-to CEs. Packet redirects continue to be
sent only to the master CE.
XXXX: We need to have an extra state for each CE (master, connected, 1. Addition of a new datatype, status (unsigned char) with special
associated, stats etc) on the FEPO - so probably another change to values 0 (Disconnected), 1 (Connected), 2 (Associated), 3
current FEPO components. (Lost_Connection) and 4 (Unreachable).
2. Change Component BackupCEs (9) to AllCEs and instead of an Array
of unsigned integers, it MUST be an Array of unsigned integers
(CEID) and unsigned char (status) for each CE.
3. Add two special values to the CEFailoverPolicyValues. 2 (High
availability without Graceful restart) and 3 (High availability
with Graceful restart).
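For readers who find it easier to see the three changes as data types, the sketch below mirrors them. The numeric values come from the list above and from Appendix I; the Python class and member names are hypothetical and carry no normative weight.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class CEStatus(IntEnum):
    """Change 1: the new status datatype (unsigned char)."""
    DISCONNECTED = 0
    CONNECTED = 1
    ASSOCIATED = 2
    LOST_CONNECTION = 3
    UNREACHABLE = 4


class CEFailoverPolicy(IntEnum):
    """Change 3: values 2 and 3 are the new special values."""
    NO_HA_NO_GR = 0
    GR = 1
    HA_WITHOUT_GR = 2
    HA_WITH_GR = 3


@dataclass
class AllCEEntry:
    """Change 2: one row of AllCEs, a CEID plus a status per CE."""
    ce_id: int
    status: CEStatus = CEStatus.DISCONNECTED


def ha_enabled(policy: CEFailoverPolicy) -> bool:
    """True for the two policies that trigger association with all CEs."""
    return policy in (CEFailoverPolicy.HA_WITHOUT_GR, CEFailoverPolicy.HA_WITH_GR)


if __name__ == "__main__":
    table: List[AllCEEntry] = [AllCEEntry(10, CEStatus.ASSOCIATED), AllCEEntry(20)]
    print(table, ha_enabled(CEFailoverPolicy(2)))
```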
5. IANA Considerations 5. IANA Considerations
TBA TBA
6. Security Considerations 6. Security Considerations
TBA TBA
7. References 7. References
skipping to change at page 12, line 5 skipping to change at page 13, line 9
of IP Control and Forwarding", RFC 3654, November 2003. of IP Control and Forwarding", RFC 3654, November 2003.
[RFC3746] Yang, L., Dantu, R., Anderson, T., and R. Gopal, [RFC3746] Yang, L., Dantu, R., Anderson, T., and R. Gopal,
"Forwarding and Control Element Separation (ForCES) "Forwarding and Control Element Separation (ForCES)
Framework", RFC 3746, April 2004. Framework", RFC 3746, April 2004.
[RFC5812] Halpern, J. and J. Hadi Salim, "Forwarding and Control [RFC5812] Halpern, J. and J. Hadi Salim, "Forwarding and Control
Element Separation (ForCES) Forwarding Element Model", Element Separation (ForCES) Forwarding Element Model",
RFC 5812, March 2010. RFC 5812, March 2010.
1. Appendix I - New FEPO version
XXX: Describe this to conform to LFB extensions as prescribed in the
model
<LFBLibrary xmlns="http://ietf.org/forces/1.0/lfbmodel"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://ietf.org/forces/1.0/lfbmodel D:\Workspace\ForCES\XML\LFBSchema.xsd"
provides="FEPO">
<!-- XXX -->
<dataTypeDefs>
<dataTypeDef>
<name>CEHBPolicyValues</name>
<synopsis>
The possible values of CE heartbeat policy
</synopsis>
<atomic>
<baseType>uchar</baseType>
<specialValues>
<specialValue value="0">
<name>CEHBPolicy0</name>
<synopsis>
The CE heartbeat policy 0
</synopsis>
</specialValue>
<specialValue value="1">
<name>CEHBPolicy1</name>
<synopsis>
The CE heartbeat policy 1
</synopsis>
</specialValue>
</specialValues>
</atomic>
</dataTypeDef>
<dataTypeDef>
<name>FEHBPolicyValues</name>
<synopsis>
The possible values of FE heartbeat policy
</synopsis>
<atomic>
<baseType>uchar</baseType>
<specialValues>
<specialValue value="0">
<name>FEHBPolicy0</name>
<synopsis>
The FE heartbeat policy 0
</synopsis>
</specialValue>
<specialValue value="1">
<name>FEHBPolicy1</name>
<synopsis>
The FE heartbeat policy 1
</synopsis>
</specialValue>
</specialValues>
</atomic>
</dataTypeDef>
<dataTypeDef>
<name>FERestartPolicyValues</name>
<synopsis>
The possible values of FE restart policy
</synopsis>
<atomic>
<baseType>uchar</baseType>
<specialValues>
<specialValue value="0">
<name>FERestartPolicy0</name>
<synopsis>
The FE restart policy 0
</synopsis>
</specialValue>
</specialValues>
</atomic>
</dataTypeDef>
<dataTypeDef>
<name>CEFailoverPolicyValues</name>
<synopsis>
The possible values of CE failover policy
</synopsis>
<atomic>
<baseType>uchar</baseType>
<specialValues>
<specialValue value="0">
<name>CEFailoverPolicy0</name>
<synopsis>
The CE failover policy 0
No High Availability or Graceful Restart.
</synopsis>
</specialValue>
<specialValue value="1">
<name>CEFailoverPolicy1</name>
<synopsis>
Graceful Restart
</synopsis>
</specialValue>
<specialValue value="2">
<name>CEFailoverPolicy2</name>
<synopsis>
High Availability without Graceful Restart
</synopsis>
</specialValue>
<specialValue value="3">
<name>CEFailoverPolicy3</name>
<synopsis>
High Availability with Graceful Restart
</synopsis>
</specialValue>
</specialValues>
</atomic>
</dataTypeDef>
<dataTypeDef>
<name>FEHACapab</name>
<synopsis>
The supported HA features
</synopsis>
<atomic>
<baseType>uchar</baseType>
<specialValues>
<specialValue value="0">
<name>GracefullRestart</name>
<synopsis>
The FE supports Graceful Restart
</synopsis>
</specialValue>
<specialValue value="1">
<name>HA</name>
<synopsis>
The FE supports HA
</synopsis>
</specialValue>
</specialValues>
</atomic>
</dataTypeDef>
<dataTypeDef>
<name>CEStatusType</name>
<synopsis>
Status values. Status for each CE.
</synopsis>
<atomic>
<baseType>uchar</baseType>
<specialValues>
<specialValue value="0">
<name>Disconnected</name>
<synopsis>
No connection attempt with the CE yet.
</synopsis>
</specialValue>
<specialValue value="1">
<name>Connected</name>
<synopsis>
The FE has connected with the CE.
</synopsis>
</specialValue>
<specialValue value="2">
<name>Associated</name>
<synopsis>
The FE has associated with the CE.
</synopsis>
</specialValue>
<specialValue value="3">
<name>Lost_Connection</name>
<synopsis>
The FE was associated with the CE
but lost the connection.
</synopsis>
</specialValue>
<specialValue value="4">
<name>Unreachable</name>
<synopsis>
The CE is deemed as unreachable by the FE.
</synopsis>
</specialValue>
</specialValues>
</atomic>
</dataTypeDef>
<dataTypeDef>
<name>AllCEType</name>
<synopsis>
Table Type for AllCE component.
</synopsis>
<struct>
<component componentID="1">
<name>CEID</name>
<synopsis>ID of the CE</synopsis>
<typeRef>uint32</typeRef>
</component>
<component componentID="2">
<name>CEStatus</name>
<synopsis>Status of the CE</synopsis>
<typeRef>CEStatusType</typeRef>
</component>
</struct>
</dataTypeDef>
</dataTypeDefs>
<LFBClassDefs>
<LFBClassDef LFBClassID="2">
<name>FEPO</name>
<synopsis>
The FE Protocol Object
</synopsis>
<version>2.0</version>
<components>
<component componentID="1" access="read-only">
<name>CurrentRunningVersion</name>
<synopsis>Currently running ForCES version</synopsis>
<typeRef>u8</typeRef>
</component>
<component componentID="2" access="read-only">
<name>FEID</name>
<synopsis>Unicast FEID</synopsis>
<typeRef>uint32</typeRef>
</component>
<component componentID="3" access="read-write">
<name>MulticastFEIDs</name>
<synopsis>
the table of all multicast IDs
</synopsis>
<array type="variable-size">
<typeRef>uint32</typeRef>
</array>
</component>
<component componentID="4" access="read-write">
<name>CEHBPolicy</name>
<synopsis>
The CE Heartbeat Policy
</synopsis>
<typeRef>CEHBPolicyValues</typeRef>
</component>
<component componentID="5" access="read-write">
<name>CEHDI</name>
<synopsis>
The CE Heartbeat Dead Interval in millisecs
</synopsis>
<typeRef>uint32</typeRef>
</component>
<component componentID="6" access="read-write">
<name>FEHBPolicy</name>
<synopsis>
The FE Heartbeat Policy
</synopsis>
<typeRef>FEHBPolicyValues</typeRef>
</component>
<component componentID="7" access="read-write">
<name>FEHI</name>
<synopsis>
The FE Heartbeat Interval in millisecs
</synopsis>
<typeRef>uint32</typeRef>
</component>
<component componentID="8" access="read-write">
<name>CEID</name>
<synopsis>
The Primary CE this FE is associated with
</synopsis>
<typeRef>uint32</typeRef>
</component>
<component componentID="9" access="read-write">
<name>AllCEs</name>
<synopsis>
The table of all CEs.
</synopsis>
<array type="variable-size">
<typeRef>AllCEType</typeRef>
</array>
</component>
<component componentID="10" access="read-write">
<name>CEFailoverPolicy</name>
<synopsis>
The CE Failover Policy
</synopsis>
<typeRef>CEFailoverPolicyValues</typeRef>
</component>
<component componentID="11" access="read-write">
<name>CEFTI</name>
<synopsis>
The CE Failover Timeout Interval in millisecs
</synopsis>
<typeRef>uint32</typeRef>
</component>
<component componentID="12" access="read-write">
<name>FERestartPolicy</name>
<synopsis>
The FE Restart Policy
</synopsis>
<typeRef>FERestartPolicyValues</typeRef>
</component>
<component componentID="13" access="read-write">
<name>LastCEID</name>
<synopsis>
The Primary CE this FE was last associated with
</synopsis>
<typeRef>uint32</typeRef>
</component>
</components>
<capabilities>
<capability componentID="30">
<name>SupportableVersions</name>
<synopsis>
the table of ForCES versions that FE supports
</synopsis>
<array type="variable-size">
<typeRef>u8</typeRef>
</array>
</capability>
<capability componentID="31">
<name>HACapabilities</name>
<synopsis>
the table of HA capabilities the FE supports
</synopsis>
<array type="variable-size">
<typeRef>FEHACapab</typeRef>
</array>
</capability>
</capabilities>
<events baseID="61">
<event eventID="1">
<name>PrimaryCEDown</name>
<synopsis>
The primary CE has changed
</synopsis>
<eventTarget>
<eventField>LastCEID</eventField>
</eventTarget>
<eventChanged/>
<eventReports>
<eventReport>
<eventField>LastCEID</eventField>
</eventReport>
</eventReports>
</event>
</events>
</LFBClassDef>
</LFBClassDefs>
</LFBLibrary>
Authors' Addresses Authors' Addresses
Kentaro Ogawa Kentaro Ogawa
NTT Corporation NTT Corporation
3-9-11 Midori-cho 3-9-11 Midori-cho
Musashino-shi, Tokyo 180-8585 Musashino-shi, Tokyo 180-8585
Japan Japan
Email: ogawa.kentaro@lab.ntt.co.jp Email: ogawa.kentaro@lab.ntt.co.jp
 End of changes. 16 change blocks. 
47 lines changed or deleted 442 lines changed or added
