Network Working Group                                           K. Ogawa
Internet-Draft                                           NTT Corporation
Intended status: Standards Track                              W. M. Wang
Expires: August 23, 2013                   Zhejiang Gongshang University
                                                           E. Haleplidis
                                                    University of Patras
                                                           J. Hadi Salim
                                                       Mojatatu Networks
                                                       February 19, 2013

                   ForCES Intra-NE High Availability
                        draft-ietf-forces-ceha-06

Abstract

   This document discusses CE High Availability within a ForCES NE.

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on August 23, 2013.

Copyright Notice

   Copyright (c) 2013 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Definitions
   2.  Introduction
     2.1.  Document Scope
     2.2.  Quantifying Problem Scope
   3.  RFC5810 CE HA Framework
     3.1.  RFC 5810 CE High Availability Support
       3.1.1.  Cold Standby Interaction with ForCES Protocol
       3.1.2.  Responsibilities for HA
   4.  CE HA Hot Standby
     4.1.  Changes to the FEPO model
     4.2.  FEPO processing
   5.  IANA Considerations
   6.  Security Considerations
   7.  References
     7.1.  Normative References
     7.2.  Informative References
   Appendix A.  New FEPO version
   Authors' Addresses

1.  Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   The following definitions are taken from [RFC3654] and [RFC3746]:

   o  Logical Functional Block (LFB) -- A template that represents a
      fine-grained, logically separate aspect of FE processing.

   o  ForCES Protocol -- The protocol used at the Fp reference point in
      the ForCES Framework in [RFC3746].

   o  ForCES Protocol Layer (ForCES PL) -- A layer in the ForCES
      architecture that embodies the ForCES protocol and the state
      transfer mechanisms as defined in [RFC5810].

   o  ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in
      ForCES protocol architecture that specifically addresses the
      protocol message transportation issues, such as how the protocol
      messages are mapped to different transport media (like SCTP, IP,
      TCP, UDP, ATM, Ethernet, etc), and how to achieve and implement
      reliability, security, etc.

2.  Introduction

   Figure 1 illustrates a ForCES NE controlled by a set of redundant CEs
   with CE1 being active and CE2 and CEN being backups.

                           -----------------------------------------
                           | ForCES Network Element                |
                           |                        +-----------+  |
                           |                        |  CEN      |  |
                           |                        |  (Backup) |  |
     --------------   Fc   | +------------+      +------------+ |  |
     | CE Manager |--------+-|     CE1    |------|    CE2     |-+  |
     --------------        | |  (Active)  |  Fr  |  (Backup)  |    |
           |               | +-------+--+-+      +---+---+----+    |
           | Fl            |         |  |    Fp      /   |         |
           |               |         |  +---------+ /    |         |
           |               |       Fp|            |/     |Fp       |
           |               |         |            |      |         |
           |               |         |      Fp   /+--+   |         |
           |               |         |  +-------+    |   |         |
           |               |         |  |            |   |         |
     --------------    Ff  | --------+--+--      ----+---+----+    |
     | FE Manager |--------+-|     FE1    |  Fi  |     FE2    |    |
     --------------        | |            |------|            |    |
                           | --------------      --------------    |
                           |   |  |  |  |          |  |  |  |      |
                           ----+--+--+--+----------+--+--+--+-------
                               |  |  |  |          |  |  |  |
                               |  |  |  |          |  |  |  |
                                 Fi/f                   Fi/f

          Fp: CE-FE interface
          Fi: FE-FE interface
          Fr: CE-CE interface
          Fc: Interface between the CE Manager and a CE
          Ff: Interface between the FE Manager and an FE
          Fl: Interface between the CE Manager and the FE Manager
          Fi/f: FE external interface

                       Figure 1: ForCES Architecture

   The ForCES architecture allows FEs to be aware of multiple CEs but
   enforces that only one CE be the master controller.  This is known in
   the industry as 1+N redundancy.  The master CE controls the FEs via
   the ForCES protocol operating in the Fp interface.  If the master CE
   becomes faulty, a backup CE takes over and NE operation continues.
   By definition, the current documented setup is known as cold-standby.
   The set of CEs controlling an FE is static and is passed to the FE by
   the FE Manager (FEM) via the Ff interface and to each CE by the CE
   Manager (CEM) in the Fc interface during the pre-association phase.

   From an FE perspective, the knobs of control for a CE set are defined
   by the FEPO LFB in [RFC5810], Appendix B.  Section 3.1 of this
   document details these knobs further.

2.1.  Document Scope

   It is assumed that the reader is aware of the ForCES architecture to
   make sense of the changes being described in this document.  This
   document provides minimal background information to set the context
   of the discussion in Section 4.

   At the time this document is being written, the Fr interface is out
   of scope for the ForCES architecture.  However, it is expected that
   organizations implementing a set of CEs will need to have the CEs
   communicate to each other via the Fr interface in order to achieve
   the synchronization necessary for controlling the FEs.

   The problem scope addressed by this document falls into 2 areas:

   1.  To describe with more clarity (than [RFC5810]) how the current
       cold-standby approach operates within the NE cluster.

   2.  To describe how to evolve the [RFC5810] cold-standby setup to a
       hot-standby redundancy setup so as to improve the failover time
       and NE availability.

2.2.  Quantifying Problem Scope

   The NE recovery and availability is dependent on several time-
   sensitive metrics:

   1.  How fast the CE plane failure is detected by the FE.

   2.  How fast a backup CE becomes operational.

   3.  How fast the FEs associate with the new master CE.

   4.  How fast the FEs recover their state and become operational.

   The design intent of the current [RFC5810] choices, as well as of
   this document, to meet the above goals is driven by a desire for
   simplicity.

   To quantify the above criteria with the current prescribed ForCES CE
   setup in [RFC5810]:

   1.  How fast the FE side detects a CE failure is left undefined.  To
       illustrate an extreme scenario, we could have a human operator
       acting as the monitoring entity to detect faulty CEs.  How fast
       such detection happens could be in the range of seconds to days.
       A more active monitor on the Fr interface could improve this
       detection.

   2.  How fast the backup CE becomes operational is also currently out
       of scope.  In the current setup, a backup CE need not be
       operational at all (for example, to save power) and therefore it
       is feasible for a monitoring entity to boot up a backup CE after
       it detects the failure of the master CE.  In Section 4 of this
       document we suggest that at least one backup CE be online so as
       to improve this metric.

   3.  How fast an FE associates with the new master CE is also
       currently undefined.  The cost of an FE connecting and
       associating adds to the recovery overhead.  As mentioned above,
       we suggest having at least one backup CE online.  In Section 4 we
       propose to zero out the connection and association cost on
       failover by having each FE associate with all online backup CEs
       after associating to an active/master CE.  Note that if an FE
       pre-associates with at least one backup CE, then the system will
       be technically operating in hot-standby mode.

   4.  And last: How fast an FE recovers its state depends on how much
       NE state exists.  By the current ForCES definition, the new
       master CE assumes zero state on the FE and starts from scratch to
       update the FE.  So the larger the state, the longer the recovery.

3.  RFC5810 CE HA Framework

   To achieve CE High Availability, FEs and CEs MUST inter-operate per
   the [RFC5810] definition, which is repeated for contextual reasons in
   Section 3.1.  It should be noted that in this default setup, which
   MUST be implemented by CEs and FEs needing HA, the Fr plane is out of
   scope (and if available is proprietary to an implementation).

3.1.  RFC 5810 CE High Availability Support

   As mentioned earlier, although there can be multiple redundant CEs,
   only one CE actively controls FEs in a ForCES NE.  In practice there
   may be only one backup CE.  At any moment in time, only one master CE
   can control an FE.  In addition, the FE connects and associates to
   only the master CE.  The FE and the CE PL are aware of the primary and
   one or more secondary CEs.  This information (primary, secondary CEs)
   is configured on the FE and the CE PLs during pre-association by the FEM
   and the CEM respectively.

   Figure 2 below illustrates the ForCES message sequences that the FE
   uses to recover the connection in the currently defined cold-standby
   scheme.

         FE                   CE Primary        CE Secondary
         |                       |                    |
         |  Asso Estb,Caps exchg |                    |
       1 |<--------------------->|                    |
         |                       |                    |
         |       state update    |                    |
       2 |<--------------------->|                    |
         |                       |                    |
         |                       |                    |
         |                   FAILURE                  |
         |                                            |
         |         Asso Estb,Caps exchange            |
       3 |<------------------------------------------>|
         |                                            |
         |              Event Report (pri CE down)    |
       4 |------------------------------------------->|
         |                                            |
         |                 state update               |
       5 |<------------------------------------------>|

                  Figure 2: CE Failover for Cold Standby

3.1.1.  Cold Standby Interaction with ForCES Protocol

   High Availability parameterization in an FE is driven by configuring
   the FE Protocol Object (FEPO) LFB.

   The FEPO CEID component identifies the current master CE and the
   component table BackupCEs identifies the configured backup CEs.  The
   FEPO FE Heartbeat Interval, CE Heartbeat Dead Interval, and CE
   Heartbeat policy help in detecting connectivity problems between an
   FE and CE.  The CE Failover policy defines how the FE should react on
   a detected failure.  The FEObject FEState component [RFC5812] defines
   the operational forwarding status and control.  The CE can turn off
   the FE's forwarding operations by setting the FEState to AdminDisable
   and can turn it on by setting it to OperEnable.  Note: [RFC5812]
   Section 5.1 has an erratum that describes the FEState as read-only
   when it should be read-write.

   Figure 3 illustrates the defined state machine that facilitates
   connection recovery.

   The FE connects to the CE specified in the FEPO CEID component.  If
   it fails to connect to the defined CE, it moves that CE to the bottom
   of table BackupCEs and sets its CEID component to be the first CE
   retrieved from table BackupCEs.  The FE then attempts to associate
   with the CE designated as the new primary CE.  The FE continues
   through this procedure until it successfully connects to one of the
   CEs.
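
   As an illustration only, the rotation described above can be
   sketched in Python.  The function names, the try_connect callback
   and the max_attempts bound are assumptions made for this sketch and
   are not defined by [RFC5810]; the text above places no bound on the
   number of attempts.

```python
from collections import deque


def next_primary(ceid, backup_ces):
    """Move the unreachable CE to the bottom of BackupCEs and
    return the first CE retrieved from BackupCEs as the new CEID."""
    backup_ces.append(ceid)
    return backup_ces.popleft()


def associate(ceid, backup_ces, try_connect, max_attempts=10):
    """Cycle through the CEs until try_connect(ceid) succeeds.

    try_connect is a hypothetical callback that returns True on a
    successful connection.  max_attempts exists only so that this
    sketch always terminates.
    Returns the CE ID that accepted the connection, or None.
    """
    for _ in range(max_attempts):
        if try_connect(ceid):
            return ceid
        ceid = next_primary(ceid, backup_ces)
    return None
```

   For example, with CEID 1 and BackupCEs (2, 3), where only CE 3 is
   reachable, the sketch ends with CEID 3 and BackupCEs (1, 2).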

                              FE tries to associate
                                   +-->-----+
                                   |        |
      (CE changes master ||        |        |
      CE issues Teardown ||    +---+--------v----+
        Lost association) &&   | Pre-Association |
       CE failover policy = 0  | (Association    |
           +------------>-->-->|   in            +<----+
           |                   | progress)       |     |
           |                   |                 |     |
           |                   +--------+--------+     |
           |  CE Association        |                  | CEFTI
           |       Response         V                  | timer
           |     +------------------+                  | expires
           |     |FE issue CEPrimaryDown               ^
           |     V                                     |
         +-+-----------+                        +------+-----+
         |             |  (CE changes master || |  Not       |
         |             |  CE issues Teardown || | Associated |
         |             |  Lost association) &&  |            +->---+
         | Associated  | CE Failover Policy = 1 |(May        | FE  |
         |             |                        | Continue   |try  v
         |             |-------->------->------>| Forwarding)|assn |
         |             |   Start CEFTI timer    |            |-<---+
         |             |                        |            |
         +-------------+                        +-------+----+
              ^                                         |
              |            Successful                   V
              |            Association                  |
              |            Setup                        |
              |            (Cancel CEFTI Timer)         |
              +_________________________________________+
                        FE issue CEPrimaryDown event

                 Figure 3: FE State Machine considering HA

   There are several events that trigger mastership changes: The master
   CE may issue a mastership change (by changing the CEID value), or
   teardown an existing association; and last, connectivity may be lost
   between the CE and FE.

   When communication fails between the FE and CE (which can be caused
   by either the CE or link failure but not FE related), either the TML
   on the FE will trigger the FE PL regarding this failure or it will be
   detected using the HB messages between FEs and CEs.  The
   communication failure, regardless of how it is detected, MUST be
   considered as a loss of association between the CE and corresponding
   FE.

   If the FE's FEPO CE Failover Policy is configured to mode 0 (the
   default), it will immediately transition to the pre-association
   phase.  This means that if association is later re-established with a
   CE, all FE state will need to be re-created.

   If the FE's FEPO CE Failover Policy is configured to mode 1, it
   indicates that the FE will run in HA restart recovery.  In such a
   case, the FE transitions to the Not Associated state and the CEFTI
   timer [RFC5810] is started.  The FE MAY continue to forward packets
   during this state.  The FE recycles through any configured backup CEs
   in a round-robin fashion.  It first adds its primary CE to the bottom
   of table BackupCEs and sets its CEID component to be the first
   secondary retrieved from table BackupCEs.  The FE then attempts to
   associate with the CE designated as the new primary CE.  If it fails
   to re-associate with any CE and the CEFTI expires, the FE then
   transitions to the pre-association state and the FE will
   operationally bring down its forwarding path (and set the [RFC5812]
   FEObject FEState component to OperDisable).

   If the FE, while in the Not Associated state, manages to reconnect to
   a new primary CE before the CEFTI expires, it transitions to the
   Associated state.  Once re-associated, the CE may try to synchronize
   any state that the FE may have lost during disconnection.  How the CE
   re-synchronizes such state is out of scope for the current ForCES
   architecture but would typically constitute the issuing of new
   configs and queries.
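
   The transitions described so far can be summarized as a small
   transition function.  The following Python sketch is illustrative
   only: the State and Event names are assumptions of this sketch, and
   the timer handling, event reports and FEState changes that
   accompany each transition are deliberately left out.

```python
from enum import Enum


class State(Enum):
    PRE_ASSOCIATION = "pre-association"
    ASSOCIATED = "associated"
    NOT_ASSOCIATED = "not associated"


class Event(Enum):
    ASSOCIATION_SETUP = 1  # association with a CE succeeded
    ASSOCIATION_LOST = 2   # CE changes master, CE teardown, or lost association
    CEFTI_EXPIRED = 3      # CEFTI timer ran out


def next_state(state, event, failover_policy):
    """Return the FE state that follows `event`.

    failover_policy is the FEPO CE Failover Policy (0 or 1).
    Combinations not described in the text leave the state unchanged.
    """
    if event is Event.ASSOCIATION_SETUP and state in (
        State.PRE_ASSOCIATION,
        State.NOT_ASSOCIATED,
    ):
        return State.ASSOCIATED
    if event is Event.ASSOCIATION_LOST and state is State.ASSOCIATED:
        # Mode 1 keeps FE state and waits in Not Associated;
        # mode 0 (the default) drops straight back to pre-association.
        if failover_policy == 1:
            return State.NOT_ASSOCIATED
        return State.PRE_ASSOCIATION
    if event is Event.CEFTI_EXPIRED and state is State.NOT_ASSOCIATED:
        return State.PRE_ASSOCIATION
    return state
```

   A complete implementation would also start the CEFTI timer on
   entering the Not Associated state and cancel it on successful
   re-association.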

   An explicit message (a Config message setting Primary CE component in
   ForCES Protocol object) from the primary CE can also be used to
   change the Primary CE for an FE during normal protocol operation.  In
   this case, the FE transitions to the Not Associated State and
   attempts to Associate with the new CE.

3.1.2.  Responsibilities for HA

   TML Level:

   1.  The TML controls logical connection availability and failover.

   2.  The TML also controls peer HA management.

   At this level, control of all lower layers, for example transport
   level (such as IP addresses, MAC addresses etc) and associated links
   going down are the role of the TML.

   PL Level:
   All other functionality, including configuring the HA behavior during
   setup, the CE IDs used to identify primary and secondary CEs,
   protocol messages used to report CE failure (Event Report), Heartbeat
   messages used to detect association failure, messages to change the
   primary CE (Config), and other HA related operations described in
   Section 3.1, are the PL's responsibility.

   To put the two together, if a path to a primary CE is down, the TML
   would take care of failing over to a backup path, if one is
   available.  If the CE is totally unreachable then the PL would be
   informed and it would take the appropriate actions described before.

4.  CE HA Hot Standby

   In this section we describe small extensions to the existing scheme
   to enable hot standby HA.  To achieve hot standby HA, we target to
   improve the specific goals defined in Section 2.2, namely:

   o  How fast a backup CE becomes operational.

   o  How fast the FEs associate with the new master CE.

   As described in Section 3.1, in the pre-association phase the FEM
   configures the FE to make it aware of all the CEs in the NE.  The FEM
   MUST configure the FE to make it aware which CE is the master and MAY
   specify any backup CE(s).

4.1.  Changes to the FEPO model

   In order for the above to be achievable there is a need to make a few
   changes in the FEPO model.  Appendix A contains the xml definition of
   the new version 1.1 of the FEPO LFB.

   Changes from the version 1 of FEPO are:

   1.  Added four new datatypes:

       1.  CEStatusType, an unsigned char to specify the status of a
           connection with a CE.  Special values are:

           +  0 (Disconnected) represents that no connection attempt has
              been made with the CE yet

           +  1 (Connected) represents that the FE connection with the
              CE at the TML has completed successfully

           +  2 (Associated) represents that the FE has successfully
              associated with the CE

           +  3 (IsMaster) represents that the FE has associated with
              the CE and that the CE is the master of the FE

           +  4 (LostConnection) represents that the FE was associated
              with the CE at one point but lost the connection

           +  5 (Unreachable) represents that the FE deems this CE
              unreachable, i.e., the FE has tried over a period to
              connect to it but has failed

       2.  HAModeValues, an unsigned char to specify the selected HA
           mode.  Special values are:

           +  0 (No HA Mode) The FE is not running in HA mode

           +  1 (HA Mode - Cold Standby) The FE is in HA mode cold
              standby

           +  2 (HA Mode - Hot Standby) The FE is in HA mode hot
              standby

       3.  Statistics, a complex structure representing the
           communication statistics between the FE and CE.  The
           components are:

           +  RecvPackets representing the packet count received from
              the CE

           +  RecvBytes representing the byte count received from the CE

           +  RecvErrPackets representing the erroneous packets received
              from the CE.  This component logs badly formatted packets
              as well as good packets sent to the FE by the CE to set
              components whilst that CE is not the master.  Erroneous
              packets are dropped (i.e., not responded to).

           +  RecvErrBytes representing the RecvErrPackets byte count
              received from the CE

           +  TxmitPackets representing the packet count transmitted to
              the CE

           +  TxmitErrPackets representing the error packet count
              transmitted to the CE.  Typically these would be failures
              due to communication.

           +  TxmitBytes representing the byte count transmitted to the
              CE

           +  TxmitErrBytes representing the byte count of errors from
              transmit to the CE

       4.  AllCEType, a complex structure constituting the CE ID,
           Statistics and CEStatusType to reflect connection information
           for one CE.  Used in the AllCEs component array.

   2.  Appended two new components:

       1.  Read-only AllCEs to hold status for all CEs.  AllCEs is an
           Array of the AllCEType.

       2.  Read-write HAMode of type HAModeValues to carry the HA mode
           used by the FE.

   3.  Added one additional Event, PrimaryCEChanged, reporting the new
       master CEID when there is a mastership change.

   Since no component from the FEPO v1 has been changed, FEPO v1.1
   retains backwards compatibility with CEs that know only version 1.0.
   These CEs however cannot make use of the High Availability options
   that the new FEPO provides.
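
   To make the shape of the new datatypes concrete, the following
   Python sketch mirrors them as enumerations and data classes.  It is
   an illustration only and is not a substitute for the xml definition
   of the LFB: the Python member and field names (for example ce_id,
   stats and recv_packets) are assumptions of this sketch, and integer
   widths are not modelled.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class CEStatusType(IntEnum):
    DISCONNECTED = 0     # no connection attempt made yet
    CONNECTED = 1        # TML connection completed
    ASSOCIATED = 2       # FE associated with the CE
    IS_MASTER = 3        # associated, and the CE is the master
    LOST_CONNECTION = 4  # was associated, connection lost
    UNREACHABLE = 5      # repeated connection attempts failed


class HAModeValues(IntEnum):
    NO_HA = 0
    COLD_STANDBY = 1
    HOT_STANDBY = 2


@dataclass
class Statistics:
    recv_packets: int = 0
    recv_bytes: int = 0
    recv_err_packets: int = 0
    recv_err_bytes: int = 0
    txmit_packets: int = 0
    txmit_err_packets: int = 0
    txmit_bytes: int = 0
    txmit_err_bytes: int = 0


@dataclass
class AllCEType:
    """One row of the AllCEs array: CE ID, statistics and status."""
    ce_id: int
    stats: Statistics = field(default_factory=Statistics)
    status: CEStatusType = CEStatusType.DISCONNECTED
```

   An AllCEs table is then simply an ordered list of AllCEType rows,
   one per configured CE.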

4.2.  FEPO processing

   The FE's FEPO LFB version 1.1 AllCEs table contains all the CEIDs
   that the FE may connect and associate with.  The ordering of the CE
   IDs in this table defines the priority order in which an FE will
   connect to the CEs.  This table is provisioned initially from the
   configuration plane (FEM).  In the pre-association phase, the first
   CE (lowest table index) in the AllCEs table MUST be the first CE that
   the FE will attempt to connect and associate with.  If the FE fails
   to connect and associate with the first listed CE, it will attempt to
   connect to the second CE and so forth, and cycles back to the
   beginning of the list until there is a successful association.  The
   FE MUST associate with at least one CE.  Upon a successful
   association, the FEPO's CEID component identifies the current
   associated master CE.

   While it would be much simpler to have the FE not respond to any
   messages from a CE other than the master, in practice it has been
   found to be useful to respond to queries and heartbeats from backup
   CEs.  For this reason, we allow backup CEs to issue queries to the
   FE.  Configuration messages (SET/DEL) from backup CEs MUST be dropped
   by the FE and logged as received errors.
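
   The handling of messages from backup CEs can be sketched as a
   simple filter on the FE.  This Python sketch is illustrative only:
   the operation labels ("GET", "SET", "DEL", "HEARTBEAT"), the
   function name and the RecvStats class are assumptions, and the
   sketch further assumes that dropped messages are still counted as
   received packets.

```python
from dataclasses import dataclass


@dataclass
class RecvStats:
    recv_packets: int = 0
    recv_err_packets: int = 0


def accept_message(sender_ceid, master_ceid, operation, stats):
    """Return True if the FE should process the message.

    Messages from the master CE are always processed.  From a backup
    CE, only queries and heartbeats are processed; configuration
    operations (SET/DEL) are dropped and logged as received errors.
    """
    stats.recv_packets += 1  # assumption: every arrival is counted
    if sender_ceid == master_ceid:
        return True
    if operation in ("GET", "HEARTBEAT"):
        return True
    stats.recv_err_packets += 1
    return False
```

   In a fuller model, one RecvStats instance would be kept per CE, in
   line with the per-CE Statistics structure.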

   Asynchronous events that the master CE has subscribed to, as well as
   heartbeats, are sent to all associated CEs.  Packet redirects
   continue to be sent only to the master CE.  The Heartbeat Interval,
   the CEHB Policy and the FEHB Policy are global for all CEs (and
   changed only by the master CE).
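
   The fan-out rule in the preceding paragraph can be sketched as
   follows.  The sketch is illustrative only: the function name, the
   message kind labels and the dictionary layout are assumptions, and
   the status codes 2 (Associated) and 3 (IsMaster) follow the
   CEStatusType values listed in Section 4.1.

```python
ASSOCIATED = 2  # CEStatusType value: associated backup CE
IS_MASTER = 3   # CEStatusType value: associated master CE


def recipients(kind, ce_status, master_ceid):
    """Return the CE IDs that should receive an FE-originated message.

    kind is one of "event", "heartbeat" or "redirect" (labels chosen
    for this sketch).  ce_status maps each CE ID to its status code.
    """
    if kind == "redirect":
        # Packet redirects go only to the master CE.
        return [master_ceid]
    # Events and heartbeats go to every CE the FE is associated with.
    return sorted(
        ceid
        for ceid, status in ce_status.items()
        if status in (ASSOCIATED, IS_MASTER)
    )
```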

   Figure 4 illustrates the state machine that facilitates connection
   recovery with High Availability enabled.

                          FE tries to associate
                               +-->-----+
                               |        |
  (CE changes master ||        |        |
  CE issues Teardown ||    +---+--------v----+
    Lost association) &&   | Pre-Association |
   CE failover policy = 0  | (Association    |
       +------------>-->-->|   in            +<----+
       |                   | progress)       |     |
       |                   |                 |     |
       |                   +--------+--------+     |
       |  CE Association        |                  | CEFTI
       |       Response         V                  | timer
       |     +------------------+                  | expires
       |     |FE issue PrimaryCEChanged            ^
       |     V                                     |
     +-+-----------+                        +------+-----+
     |             |  (CE changes master || |  Not       |
     |             |  CE issues Teardown || | Associated |
     |             |  Lost association) &&  |            +->-----------+
     | Associated  | CE Failover Policy = 1 |(May        |find first   |
     |             |                        | Continue   |associated   v
     |             |-------->------->------>| Forwarding)|CE or  retry |
     |             |   Start CEFTI timer    |            |associating  |
     |             |                        |            |-<-----------+
     |             |                        |            |
     +----+--------+                        +-------+----+
          |                                         |
          ^                                   Found | associated CE
          |                                or newly | associated CE
          |                                         V
          |            (Cancel CEFTI Timer)         |
          +_________________________________________+
                    FE issue PrimaryCEChanged event

                 Figure 4: FE State Machine considering HA

   Once the FE has associated with a master CE, it moves to the post-
   association phase (Associated state).  It is assumed that the master
   CE will communicate with other CEs within the NE for the purpose of
   synchronization via the CE-CE interface.  The CE-CE interface is out
   of scope for this document.  An election among the CEs may result in
   a desire to change mastership to a different associated CE, at which
   point the current master CE will instruct the FE to use a different
   master CE.

         FE                   CE#1         CE#2 ... CE#N
         |                      |            |        |
         | Asso Estb,Caps exchg |            |        |
       1 |<-------------------->|            |        |
         |                      |            |        |
         |      state update    |            |        |
       2 |<-------------------->|            |        |
         |                      |            |        |
         |        Asso Estb,Caps exchg       |        |
       3I|<--------------------------------->|        |
        ...                    ...          ...      ...
         |               Asso Estb,Caps exchg         |
       3N|<------------------------------------------>|
         |                      |            |        |
       4 |<-------------------->|            |        |
         .                      .            .        .
       4x|<-------------------->|            |        |
         |                   FAILURE         |        |
         |                      |            |        |
         | Event Report (LastCEID changed)   |        |
       5 |---------------------------------->|------->|
         | Event Report (CE#2 is new master) |        |
       6 |---------------------------------->|------->|
         |                                   |        |
       7 |<--------------------------------->|        |
         .                      .            .        .
       7x|<--------------------------------->|        |
         .                      .            .        .

                   Figure 5: CE Failover for Hot Standby

   While in the post-association phase, if the CE Failover Policy is set
   to 1 and HAMode is set to 2 (HotStandby), then the FE, after
   successfully associating with the master CE, MUST attempt to connect
   and associate with all the CEs that it is aware of.  Figure 5, steps
   #1 and #2, illustrates the FE associating with CE#1 as the master;
   steps #3I to #3N show the association with backup CEs CE#2 to CE#N.
   If the FE fails to connect or associate with some CEs, the FE MAY
   flag them as unreachable to avoid continuous attempts to connect.
   The FE MAY retry associating with unreachable CEs when possible.
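
   As a rough illustration only, the association step described above
   could be sketched in Python as follows.  The function name, the
   try_associate callback, and the returned status table are
   assumptions made for this sketch and are not defined by ForCES; the
   status names follow the CEStatusType values in Appendix A.

```python
from enum import IntEnum


class CEStatus(IntEnum):
    """Status names as read from CEStatusType in Appendix A."""
    DISCONNECTED = 0
    CONNECTED = 1
    ASSOCIATED = 2
    IS_MASTER = 3
    LOST_CONNECTION = 4
    UNREACHABLE = 5


def associate_backups(all_ces, master_id, try_associate):
    """Associate with every known CE once the master association exists.

    all_ces       -- ordered list of CE IDs the FE is aware of
    master_id     -- CE ID the FE has already associated with as master
    try_associate -- hypothetical callback: connect and associate with
                     a CE, returning True on success
    Returns a dict mapping CE ID to CEStatus.
    """
    status = {master_id: CEStatus.IS_MASTER}
    for ce_id in all_ces:
        if ce_id == master_id:
            continue
        if try_associate(ce_id):
            status[ce_id] = CEStatus.ASSOCIATED
        else:
            # Flag the CE so the FE does not retry continuously.
            status[ce_id] = CEStatus.UNREACHABLE
    return status
```

   A caller would pass its own connection routine as try_associate and
   could later retry the CEs that were flagged as unreachable.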

   When the master CE is, for any reason, considered to be down, the
   FE MUST try to find the first associated CE from the list of all CEs
   in a round-robin fashion.

   If the FE is unable to find an associated CE in its list of CEs, then
   it MUST attempt to connect and associate with the first CE in the
   list of all CEs and continue in a round-robin fashion until it
   connects and associates with a CE.
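
   The selection rule above can be sketched as follows.  This is a
   minimal Python illustration under stated assumptions: the callbacks
   is_associated and try_associate are hypothetical, the round-robin
   walk is assumed to start just after the failed master (the text does
   not fix a starting point), and the function performs a single pass,
   leaving any repetition to the caller.

```python
def select_new_master(all_ces, failed_master, is_associated, try_associate):
    """Pick a new master CE after the current master is considered down.

    all_ces        -- ordered list of CE IDs known to the FE
    failed_master  -- CE ID of the master that went down
    is_associated  -- hypothetical predicate: is the FE still associated
                      with this CE?
    try_associate  -- hypothetical callback: connect and associate with
                      this CE, returning True on success

    Returns the selected CE ID, or None if one full pass over the list
    found no usable CE (the caller would then repeat the pass).
    """
    if not all_ces:
        return None
    n = len(all_ces)
    # Assumption: the round-robin walk starts just after the failed master.
    start = 0
    if failed_master in all_ces:
        start = (all_ces.index(failed_master) + 1) % n
    ring = [all_ces[(start + i) % n] for i in range(n)]

    # First preference: a CE the FE is already associated with.
    for ce in ring:
        if ce != failed_master and is_associated(ce):
            return ce

    # Otherwise: try to connect and associate, one CE at a time.
    for ce in ring:
        if try_associate(ce):
            return ce
    return None
```

   Because the failed master comes last in the ring, it is retried only
   after every other CE has been attempted.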

   Once the FE selects an associated CE to use as the new master, the
   FE issues a PrimaryCEDown Event Notification to all associated CEs to
   notify them that the last primary CE went down (and what its identity
   was).  A second event, PrimaryCEChanged, is sent as well to identify
   which CE the reporting FE considers to be the new master.
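
   To make the notification step concrete, the following Python sketch
   builds the list of notifications an FE might send after choosing a
   new master.  The dictionary layout and the function name are
   assumptions for illustration; the pairing of PrimaryCEDown with
   LastCEID and of PrimaryCEChanged with CEID follows one reading of
   the event definitions in Appendix A.

```python
def failover_notifications(last_master, new_master, associated_ces):
    """Build the event notifications sent after a master CE failover.

    last_master    -- CE ID of the primary CE that went down
    new_master     -- CE ID the FE now considers to be the master
    associated_ces -- CE IDs the FE is currently associated with
    """
    notifications = []
    # PrimaryCEDown is queued for every CE first, then PrimaryCEChanged.
    for event, field, ce_id in (
        ("PrimaryCEDown", "LastCEID", last_master),
        ("PrimaryCEChanged", "CEID", new_master),
    ):
        for dest in associated_ces:
            notifications.append(
                {"event": event, "field": field, "value": ce_id, "to": dest}
            )
    return notifications
```

   Each associated CE thus receives two notifications: one naming the
   CE that went down and one naming the newly selected master.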

   In most High Availability architectures, there exists the possibility
   of split-brain.  However, since in our setup the FE will never accept
   any configuration messages from any CE other than the master CE, we
   consider the FE as fenced against data corruption from other CEs
   that consider themselves to be the master.  The split-brain issue
   therefore becomes mostly a CE-CE communication problem, which is
   considered to be out of scope.
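
   The fencing argument can be illustrated with a small Python sketch.
   The class and method names are hypothetical and the configuration
   store is reduced to a dictionary; the only point shown is that a
   configuration message is applied solely when its sender is the CE
   the FE currently regards as master.

```python
class FencedFE:
    """Toy FE that applies configuration only from its master CE."""

    def __init__(self, master_ce):
        self.master_ce = master_ce  # CE ID currently treated as master
        self.config = {}            # stand-in for FE configuration state

    def handle_config(self, sender_ce, key, value):
        """Apply a config message; return False if the sender is fenced."""
        if sender_ce != self.master_ce:
            # A CE that merely believes it is master cannot alter state.
            return False
        self.config[key] = value
        return True
```

   With fe = FencedFE(2), a call fe.handle_config(1, "x", 1) is rejected
   and leaves the state untouched, while the same call from CE 2 is
   applied.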

   By virtue of having multiple CE connections, the FE switchover to a
   new master CE will be much faster.  The overall effect is an
   improved NE recovery time in case of communication failure or
   faults of the master CE.  This satisfies the requirement we set out
   to achieve.

5.  IANA Considerations

   XXX: This document updates an IANA-registered FE Protocol Object
   (FEPO) Logical Functional Block (LFB).  At a minimum, when it becomes
   an RFC, the FEPO section of
   https://www.iana.org/assignments/forces/forces.xml should be
   updated.

6.  Security Considerations

   The security considerations defined in Section 9 of [RFC5810]
   apply.

7.  References

7.1.  Normative References

   [RFC5810]  Doria, A., Hadi Salim, J., Haas, R., Khosravi, H., Wang,
              W., Dong, L., Gopal, R., and J. Halpern, "Forwarding and
              Control Element Separation (ForCES) Protocol
              Specification", RFC 5810, March 2010.

7.2.  Informative References

   [RFC3654]  Khosravi, H. and T. Anderson, "Requirements for Separation
              of IP Control and Forwarding", RFC 3654, November 2003.

   [RFC3746]  Yang, L., Dantu, R., Anderson, T., and R. Gopal,
              "Forwarding and Control Element Separation (ForCES)
              Framework", RFC 3746, April 2004.

   [RFC5812]  Halpern, J. and J. Hadi Salim, "Forwarding and Control
              Element Separation (ForCES) Forwarding Element Model",
              RFC 5812, March 2010.

Appendix A.  New FEPO version

   <LFBLibrary xmlns="urn:ietf:params:xml:ns:forces:lfbmodel:1.0"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:noNamespaceSchemaLocation="lfb-schema.xsd" provides="FEPO">
      <!-- XXX -->
      <dataTypeDefs>
         <dataTypeDef>
            <name>CEHBPolicyValues</name>
            <synopsis>
               The possible values of CE heartbeat policy
            </synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>CEHBPolicy0</name>
                     <synopsis>
                             The CE will send heartbeats to the FE
                             every CEHDI timeout if no other messages
                             have been sent since.
                     </synopsis>
                  </specialValue>
                  <specialValue value="1">
                     <name>CEHBPolicy1</name>
                     <synopsis>
                             The CE will not send heartbeats to the FE
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>FEHBPolicyValues</name>
            <synopsis>
               The possible values of FE heartbeat policy
            </synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>FEHBPolicy0</name>
                     <synopsis>
                        The FE will not generate any heartbeats to
                        the CE
                     </synopsis>
                  </specialValue>
                  <specialValue value="1">
                     <name>FEHBPolicy1</name>
                     <synopsis>
                        The FE generates heartbeats to the CE every
                        FEHI if no other messages have been sent to
                        the CE.
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>FERestartPolicyValues</name>
            <synopsis>
               The possible values of FE restart policy
            </synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>FERestartPolicy0</name>
                     <synopsis>
                        The FE restarts its state from scratch
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>HAModeValues</name>
            <synopsis>
               The possible values of HA modes
            </synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>NoHA</name>
                     <synopsis>
                        The FE is not running in HA mode
                     </synopsis>
                  </specialValue>
                  <specialValue value="1">
                     <name>ColdStandby</name>
                     <synopsis>
                        The FE is running in HA mode cold Standby
                     </synopsis>
                  </specialValue>
                  <specialValue value="2">
                     <name>HotStandby</name>
                     <synopsis>
                        The FE is running in HA mode hot Standby
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>CEFailoverPolicyValues</name>
            <synopsis>
               The possible values of CE failover policy
            </synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>CEFailoverPolicy0</name>
                     <synopsis>
                        The FE should stop functioning immediately and
                        transition to the FE OperDisable state
                     </synopsis>
                  </specialValue>
                  <specialValue value="1">
                     <name>CEFailoverPolicy1</name>
                     <synopsis>
                        The FE should continue forwarding even without
                        an associated CE for CEFTI. The FE goes to FE
                        OperDisable when the CEFTI expires with no
                        association. Requires graceful restart support.
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>FEHACapab</name>
            <synopsis>
               The supported HA features
            </synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>GracefullRestart</name>
                     <synopsis>
                        The FE supports Graceful Restart
                     </synopsis>
                  </specialValue>
                  <specialValue value="1">
                     <name>HA</name>
                     <synopsis>
                        The FE supports cold-standby mode
                     </synopsis>
                  </specialValue>
                  <specialValue value="2">
                     <name>HOtStandBy</name>
                     <synopsis>
                        The FE supports hot-standby mode
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>CEStatusType</name>
            <synopsis>Status values. Status for each CE</synopsis>
            <atomic>
               <baseType>uchar</baseType>
               <specialValues>
                  <specialValue value="0">
                     <name>Disconnected</name>
                     <synopsis>No connection attempt with the CE yet
                     </synopsis>
                  </specialValue>
                  <specialValue value="1">
                     <name>Connected</name>
                     <synopsis>The FE connection with the CE at the TML
                        has been completed
                     </synopsis>
                  </specialValue>
                  <specialValue value="2">
                     <name>Associated</name>
                     <synopsis>The FE has associated with the CE
                     </synopsis>
                  </specialValue>
                  <specialValue value="3">
                     <name>IsMaster</name>
                     <synopsis>The CE is the master (and associated)
                     </synopsis>
                  </specialValue>
                  <specialValue value="4">
                     <name>LostConnection</name>
                     <synopsis>The FE was associated with the CE but
                        lost the connection
                     </synopsis>
                  </specialValue>
                  <specialValue value="5">
                     <name>Unreachable</name>
                     <synopsis>The CE is deemed as unreachable by the FE
                     </synopsis>
                  </specialValue>
               </specialValues>
            </atomic>
         </dataTypeDef>
         <dataTypeDef>
            <name>StatisticsType</name>
            <synopsis>Statistics Definition</synopsis>
            <struct>
               <component componentID="1">
                  <name>RecvPackets</name>
                  <synopsis>Packets Received</synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="2">
                  <name>RecvErrPackets</name>
                  <synopsis>Packets Received from CE with errors
                  </synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="3">
                  <name>RecvBytes</name>
                  <synopsis>Bytes Received from CE</synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="4">
                  <name>RecvErrBytes</name>
                  <synopsis>Bytes Received from CE in Error</synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="5">
                  <name>TxmitPackets</name>
                  <synopsis>Packets Transmitted to CE</synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="6">
                  <name>TxmitErrPackets</name>
                  <synopsis>
                     Packets Transmitted to CE that incurred
                     errors
                  </synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="7">
                  <name>TxmitBytes</name>
                  <synopsis>Bytes Transmitted to CE</synopsis>
                  <typeRef>uint64</typeRef>
               </component>
               <component componentID="8">
                  <name>TxmitErrBytes</name>
                  <synopsis>Bytes Transmitted to CE incurring errors
                  </synopsis>
                  <typeRef>uint64</typeRef>
               </component>
            </struct>
         </dataTypeDef>
         <dataTypeDef>
            <name>AllCEType</name>
            <synopsis>Table Type for AllCE component</synopsis>
            <struct>
               <component componentID="1">
                  <name>CEID</name>
                  <synopsis>ID of the CE</synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="2">
                  <name>Statistics</name>
                  <synopsis>Statistics per CE</synopsis>
                  <typeRef>StatisticsType</typeRef>
               </component>
               <component componentID="3">
                  <name>CEStatus</name>
                  <synopsis>Status of the CE</synopsis>
                  <typeRef>CEStatusType</typeRef>
               </component>
            </struct>
         </dataTypeDef>
      </dataTypeDefs>
      <LFBClassDefs>
         <LFBClassDef LFBClassID="2">
            <name>FEPO</name>
            <synopsis>
               The FE Protocol Object, with new CEHA
            </synopsis>
            <version>1.1</version>
            <components>
               <component componentID="1" access="read-only">
                  <name>CurrentRunningVersion</name>
                  <synopsis>Currently running ForCES version</synopsis>
                  <typeRef>uchar</typeRef>
               </component>
               <component componentID="2" access="read-only">
                  <name>FEID</name>
                  <synopsis>Unicast FEID</synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="3" access="read-write">
                  <name>MulticastFEIDs</name>
                  <synopsis>
                     the table of all multicast IDs
                  </synopsis>
                  <array type="variable-size">
                     <typeRef>uint32</typeRef>
                  </array>
               </component>
               <component componentID="4" access="read-write">
                  <name>CEHBPolicy</name>
                  <synopsis>
                     The CE Heartbeat Policy
                  </synopsis>
                  <typeRef>CEHBPolicyValues</typeRef>
               </component>
               <component componentID="5" access="read-write">
                  <name>CEHDI</name>
                  <synopsis>
                     The CE Heartbeat Dead Interval in millisecs
                  </synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="6" access="read-write">
                  <name>FEHBPolicy</name>
                  <synopsis>
                     The FE Heartbeat Policy
                  </synopsis>
                  <typeRef>FEHBPolicyValues</typeRef>
               </component>
               <component componentID="7" access="read-write">
                  <name>FEHI</name>
                  <synopsis>
                     The FE Heartbeat Interval in millisecs
                  </synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="8" access="read-write">
                  <name>CEID</name>
                  <synopsis>
                     The Primary CE this FE is associated with
                  </synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="9" access="read-write">
                  <name>BackupCEs</name>
                  <synopsis>
                     The table of all backup CEs other than the
                     primary
                  </synopsis>
                  <array type="variable-size">
                     <typeRef>uint32</typeRef>
                  </array>
               </component>
               <component componentID="10" access="read-write">
                  <name>CEFailoverPolicy</name>
                  <synopsis>
                     The CE Failover Policy
                  </synopsis>
                  <typeRef>CEFailoverPolicyValues</typeRef>
               </component>
               <component componentID="11" access="read-write">
                  <name>CEFTI</name>
                  <synopsis>
                     The CE Failover Timeout Interval in millisecs
                  </synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="12" access="read-write">
                  <name>FERestartPolicy</name>
                  <synopsis>
                     The FE Restart Policy
                  </synopsis>
                  <typeRef>FERestartPolicyValues</typeRef>
               </component>
               <component componentID="13" access="read-write">
                  <name>LastCEID</name>
                  <synopsis>
                     The Primary CE this FE was last associated
                     with
                  </synopsis>
                  <typeRef>uint32</typeRef>
               </component>
               <component componentID="14" access="read-write">
                  <name>HAMode</name>
                  <synopsis>
                     The HA mode used
                  </synopsis>
                  <typeRef>HAModeValues</typeRef>
               </component>
               <component componentID="15" access="read-only">
                  <name>AllCEs</name>
                  <synopsis>The table of all CEs</synopsis>
                  <array type="variable-size">
                     <typeRef>AllCEType</typeRef>
                  </array>
               </component>
               <component componentID="16" access="read-write">
                  <name>AcceptBackupGets</name>
                  <synopsis>
                     If true, the FE will accept and respond to
                     Queries from BackupCEs
                  </synopsis>
                  <typeRef>boolean</typeRef>
               </component>
            </components>
            <capabilities>
               <capability componentID="30">
                  <name>SupportableVersions</name>
                  <synopsis>
                     the table of ForCES versions that FE supports
                  </synopsis>
                  <array type="variable-size">
                     <typeRef>uchar</typeRef>
                  </array>
               </capability>
               <capability componentID="31">
                  <name>HACapabilities</name>
                  <synopsis>
                     the table of HA capabilities the FE supports
                  </synopsis>
                  <array type="variable-size">
                     <typeRef>FEHACapab</typeRef>
                  </array>
               </capability>
               <capability componentID="32">
                  <name>MaximumMultipleCEAssocations</name>
                  <synopsis>
                     The number of CEs this FE can associate with at
                     the same time
                  </synopsis>
                  <atomic>
                     <baseType>uint32</baseType>
                  </atomic>
               </capability>
            </capabilities>
            <events baseID="61">
               <event eventID="1">
                  <name>PrimaryCEDown</name>
                  <synopsis>
                     The primary CE has changed
                  </synopsis>
                  <eventTarget>
                     <eventField>LastCEID</eventField>
                  </eventTarget>
                  <eventChanged/>
                  <eventReports>
                     <eventReport>
                        <eventField>LastCEID</eventField>
                     </eventReport>
                  </eventReports>
               </event>
               <event eventID="2">
                  <name>PrimaryCEChanged</name>
                  <synopsis>A new primary CE has been selected
                  </synopsis>
                  <eventTarget>
                     <eventField>CEID</eventField>
                  </eventTarget>
                  <eventChanged/>
                  <eventReports>
                     <eventReport>
                        <eventField>CEID</eventField>
                     </eventReport>
                  </eventReports>
               </event>
            </events>
         </LFBClassDef>
      </LFBClassDefs>
   </LFBLibrary>
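
   Since the library above assigns numeric IDs by hand, a quick
   consistency check can be useful when editing it.  The following is a
   minimal Python sketch, not part of ForCES: it uses the standard
   xml.etree.ElementTree module to report duplicate dataTypeDef names
   and duplicate componentID values.  The function names and the
   abbreviated SAMPLE document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from collections import Counter

NS = {"lfb": "urn:ietf:params:xml:ns:forces:lfbmodel:1.0"}

# Abbreviated, hypothetical library with a deliberate duplicate ID.
SAMPLE = """<LFBLibrary xmlns="urn:ietf:params:xml:ns:forces:lfbmodel:1.0"
   provides="FEPO">
  <dataTypeDefs>
    <dataTypeDef><name>HAModeValues</name></dataTypeDef>
    <dataTypeDef><name>CEStatusType</name></dataTypeDef>
  </dataTypeDefs>
  <LFBClassDefs>
    <LFBClassDef LFBClassID="2">
      <name>FEPO</name>
      <components>
        <component componentID="14"><name>HAMode</name></component>
        <component componentID="15"><name>AllCEs</name></component>
        <component componentID="15"><name>Extra</name></component>
      </components>
    </LFBClassDef>
  </LFBClassDefs>
</LFBLibrary>"""


def duplicates(values):
    """Return the sorted values that occur more than once."""
    counts = Counter(v for v in values if v is not None)
    return sorted(v for v, n in counts.items() if n > 1)


def check_library(xml_text):
    """Report duplicate data type names and component IDs."""
    root = ET.fromstring(xml_text)
    type_names = [
        d.findtext("lfb:name", namespaces=NS)
        for d in root.findall("lfb:dataTypeDefs/lfb:dataTypeDef", NS)
    ]
    component_ids = [
        c.get("componentID")
        for c in root.findall(
            "lfb:LFBClassDefs/lfb:LFBClassDef/lfb:components/lfb:component",
            NS,
        )
    ]
    return {
        "dataTypeDef names": duplicates(type_names),
        "componentIDs": duplicates(component_ids),
    }
```

   Running check_library on the SAMPLE text flags componentID 15,
   which appears twice, and reports no duplicate type names.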

Authors' Addresses

   Kentaro Ogawa
   NTT Corporation
   3-9-11 Midori-cho
   Musashino-shi, Tokyo  180-8585
   Japan

   Email: ogawa.kentaro@lab.ntt.co.jp

   Weiming Wang
   Zhejiang Gongshang University
   149 Jiaogong Road
   Hangzhou  310035
   P.R.China

   Phone: +86-571-88057712
   Email: wmwang@mail.zjgsu.edu.cn

   Evangelos Haleplidis
   University of Patras
   Panepistimioupoli Patron
   Patras  26504
   Greece

   Email: ehalep@ece.upatras.gr

   Jamal Hadi Salim
   Mojatatu Networks
   Suite 400, 303 Moodie Dr.
   Ottawa, Ontario  K2H 9R4
   Canada

   Email: hadi@mojatatu.com