Network Working Group                                      J. Hadi Salim
Internet-Draft                                         Mojatatu Networks
Intended status: Standards Track                                K. Ogawa
Expires: July 23, 2010                                   NTT Corporation
                                                        January 19, 2010

      SCTP based TML (Transport Mapping Layer) for ForCES protocol


   This document defines the SCTP based TML (Transport Mapping Layer)
   for the ForCES protocol.  It explains the rationale for choosing the
   SCTP (Stream Control Transmission Protocol) and also describes how
   this TML addresses all the requirements of the ForCES protocol.

Status of this Memo

   This Internet-Draft is submitted to IETF in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on July 23, 2010.

Copyright Notice

   Copyright (c) 2010 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the BSD License.

   This document may contain material from IETF Documents or IETF
   Contributions published or made publicly available before November
   10, 2008.  The person(s) controlling the copyright in some of this
   material may not have granted the IETF Trust the right to allow
   modifications of such material outside the IETF Standards Process.
   Without obtaining an adequate license from the person(s) controlling
   the copyright in such materials, this document may not be modified
   outside the IETF Standards Process, and derivative works of it may
   not be created outside the IETF Standards Process, except to format
   it for publication as an RFC or to translate it into languages other
   than English.

Table of Contents

   1.  Definitions
   2.  Introduction
   3.  Protocol Framework Overview
     3.1.  The PL
     3.2.  The TML
       3.2.1.  TML and PL Interfaces
       3.2.2.  TML Parameterization
   4.  SCTP TML overview
     4.1.  Rationale for using SCTP for TML
     4.2.  Meeting TML requirements
       4.2.1.  SCTP TML Channels
       4.2.2.  Satisfying TML Requirements
   5.  SCTP TML Channel Work
   6.  IANA Considerations
   7.  Security Considerations
     7.1.  IPsec Usage
       7.1.1.  SAD and SPD setup
   8.  Acknowledgements
   9.  References
     9.1.  Normative References
     9.2.  Informative References
   Appendix A.  Suggested SCTP TML Channel Work Implementation
     A.1.  SCTP TML Channel Initialization
     A.2.  Channel work scheduling
       A.2.1.  FE Channel work scheduling
       A.2.2.  CE Channel work scheduling
     A.3.  SCTP TML Channel Termination
     A.4.  SCTP TML NE level channel scheduling
   Appendix B.  Suggested Service Interface
     B.1.  TML Boot-strapping
     B.2.  TML Shutdown
     B.3.  TML Sending and Receiving
   Authors' Addresses

1.  Definitions

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119.

   The following definitions are taken from [RFC3654] and [RFC3746]:

   Logical Functional Block (LFB) -- A template that represents a
   fine-grained, logically separate aspect of FE processing.

   ForCES Protocol -- The protocol used at the Fp reference point in the
   ForCES Framework in [RFC3746].

   ForCES Protocol Layer (ForCES PL) -- A layer in the ForCES
   architecture that embodies the ForCES protocol and the state transfer
   mechanisms as defined in [I-D.ietf-forces-protocol].

   ForCES Protocol Transport Mapping Layer (ForCES TML) -- A layer in
   ForCES protocol architecture that specifically addresses the protocol
   message transportation issues, such as how the protocol messages are
   mapped to different transport media (like SCTP, IP, TCP, UDP, ATM,
   Ethernet, etc), and how to achieve and implement reliability,
   security, etc.

2.  Introduction

   The ForCES (Forwarding and Control Element Separation) working group
   in the IETF defines the architecture and protocol for separation of
   Control Elements (CE) and Forwarding Elements (FE) in Network
   Elements (NE) such as routers.  [RFC3654] and [RFC3746] respectively
   define architectural and protocol requirements for the communication
   between CE and FE.  The ForCES protocol layer specification
   [I-D.ietf-forces-protocol] describes the protocol semantics and
   workings.  The ForCES protocol layer operates on top of an
   interconnect hiding layer known as the TML.  The relationship is
   illustrated in Figure 1.

   This document defines the SCTP based TML for the ForCES protocol
   layer.  It also addresses all the requirements for the TML, including
   security, reliability, etc., as defined in [I-D.ietf-forces-protocol].

3.  Protocol Framework Overview

   The reader is referred to the Framework document [RFC3746], and in
   particular sections 3 and 4, for an architectural overview and
   explanation of where and how the ForCES protocol fits in.

   There is some content overlap between the ForCES protocol
   specification [I-D.ietf-forces-protocol] and this section (Section 3)
   in order to provide basic context to the reader of this document.

   The ForCES protocol layering consists of two pieces: the PL and the
   TML.
   This is depicted in Figure 1.

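               +----------------------------------------------+
               |                    CE PL                     |
               +----------------------------------------------+
               |                    CE TML                    |
               +----------------------------------------------+
                                      ^
                                      |
                           ForCES PL  |messages
                                      |
                                      V
               +----------------------------------------------+
               |                    FE TML                    |
               +----------------------------------------------+
               |                    FE PL                     |
               +----------------------------------------------+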

      Figure 1: Message exchange between CE and FE to establish an NE

   The PL is in charge of the ForCES protocol.  Its semantics and
   message layout are defined in [I-D.ietf-forces-protocol].  The TML is
   necessary to connect two ForCES end-points as shown in Figure 1.

   Both the PL and TML are standardized by the IETF.  While only one PL
   is defined, different TMLs are expected to be standardized.  The TML
   at each of the nodes (CE and FE) is expected to be of the same
   definition in order to inter-operate.

   When transmitting from a ForCES end-point, the PL delivers its
   messages to the TML.  The TML then delivers the PL message to the
   destination TML(s).

   On reception of a message, the TML delivers the message to its
   destination PL (as described in the ForCES header).

3.1.  The PL

   The PL is common to all implementations of ForCES and is standardized
   by the IETF [I-D.ietf-forces-protocol].  The PL is responsible for
   associating an FE or CE to an NE.  It is also responsible for tearing
   down such associations.

   An FE may use the PL to asynchronously send packets to the CE.  The
   FE may redirect various control protocol packets (e.g., OSPF) from
   outside the NE to the CE via the PL.  Additionally, the FE delivers
   to the CE, via the PL, various events that the CE has subscribed to.

   The CE and FE may interact synchronously via the PL.  The CE issues
   status requests to the FE and receives responses via the PL.  The CE
   also configures the components of the associated FE's LFBs using the
   PL.

3.2.  The TML

   The TML is responsible for transport of the PL messages.
   [I-D.ietf-forces-protocol] section 5 defines the requirements that
   need to be met by a TML specification.  The SCTP TML specified in
   this document meets all the requirements specified in
   [I-D.ietf-forces-protocol] section 5.  Section 4.2.2 describes how
   the TML requirements are met.

3.2.1.  TML and PL Interfaces

   There are two interfaces to the PL and TML.  The specification of
   these interfaces is out of scope for this document, but the
   interfaces are introduced to show how they fit into the architecture
   and summarize the function provided at the interfaces.  The first
   interface is between the PL and TML and the other is the CE Manager
   (CEM)/FE Manager (FEM) [RFC3746] interface to both the PL and TML.
   Both interfaces are shown in Figure 2.

                      +----------------------------+
                      |  +----------------------+  |
                      |  |                      |  |
     +---------+      |  |          PL          |  |
     |         |      |  +----------------------+  |
     |FEM/CEM  |<---->|             ^              |
     |         |      |             |              |
     +---------+      |             |TML API       |
                      |             |              |
                      |             V              |
                      |  +----------------------+  |
                      |  |                      |  |
                      |  |          TML         |  |
                      |  |                      |  |
                      |  +----------------------+  |
                      +----------------------------+

                      Figure 2: The TML-PL interface

   The CEM/FEM [RFC3746] interface is responsible for bootstrapping and
   parameterization of the TML.  In its most basic form the CEM/FEM
   interface takes the form of a simple static config file which is read
   on startup in the pre-association phase.

   Appendix B discusses the service interfaces in more detail.

3.2.2.  TML Parameterization

   It is expected that it should be possible to use a configuration
   reference point, such as the FEM or the CEM, to configure the TML.

   Some of the configured parameters may include:

   o  PL ID

   o  Connection Type and associated data.  For example if a TML uses
      IP/SCTP then parameters such as SCTP ports and IP addresses need
      to be configured.

   o  Number of transport connections

   o  Connection Capability, such as bandwidth, etc.

   o  Allowed/Supported Connection QoS policy (or Congestion Control
      Policy)

4.  SCTP TML overview

   SCTP [RFC4960] is an end-to-end transport protocol that is equivalent
   to TCP, UDP, or DCCP in many aspects.  With a few exceptions, SCTP
   can do most of what UDP, TCP, or DCCP can achieve.  SCTP can also do
   most of what a combination of the other transport protocols can
   achieve (e.g., TCP and DCCP, or TCP and UDP).

   Like TCP, it provides ordered, reliable, connection-oriented, flow-
   controlled, congestion controlled data exchange.  Unlike TCP, it does
   not provide byte streaming and instead provides message boundaries.

   Like UDP, it can provide unreliable, unordered data exchange.  Unlike
   UDP, it does not provide multicast support.

   Like DCCP, it can provide unreliable, ordered, congestion controlled,
   connection-oriented data exchange.

   SCTP also provides other services, which none of the 3 transport
   protocols mentioned above provide, that we found attractive.  These
   include:

   o  Multi-homing

   o  Runtime IP address binding

   o  A range of reliability shades with congestion control

   o  Built-in heartbeats

   o  Multi-streaming

   o  Message boundaries with reliability

   o  Improved SYN DOS protection

   o  Simpler transport events

   o  Simplified replicasting

4.1.  Rationale for using SCTP for TML

   SCTP has all the features required to provide a robust TML.  As a
   transport that is all-encompassing, it negates the need for having
   multiple transport protocols in order to satisfy the TML requirements
   ([I-D.ietf-forces-protocol] section 5).  As a result, it allows for
   simpler coding and therefore reduces many of the interoperability
   concerns.

   SCTP is also very mature and widely used, making it a good choice for
   ubiquitous deployment.

4.2.  Meeting TML requirements

                  +----------------------+
                  |                      |
                  |          PL          |
                  |                      |
                  +----------------------+
                              ^
                              |
                              |   TML API
                   TML        |
                  +-----------|----------+
                  |           |          |
                  |    +------+------+   |
                  |    |  TML core   |   |
                  |    +-+----+----+-+   |
                  |      |    |    |     |
                  |    SCTP socket API   |
                  |      |    |    |     |
                  |      |    |    |     |
                  |    +-+----+----+-+   |
                  |    |    SCTP     |   |
                  |    +------+------+   |
                  |           |          |
                  |           |          |
                  |    +------+------+   |
                  |    |      IP     |   |
                  |    +-------------+   |
                  +----------------------+

                     Figure 3: The TML-SCTP interface

   Figure 3 details the interfacing between the PL and SCTP TML and the
   internals of the SCTP TML.  The core of the TML interacts on its
   north-bound interface to the PL (utilizing the TML API).  On the
   south-bound interface, the TML core interfaces to the SCTP layer
   utilizing the standard socket interface [I-D.ietf-tsvwg-sctpsocket].
   There are three SCTP socket connections opened between any two PL
   endpoints (whether FE or CE).

4.2.1.  SCTP TML Channels

                  +--------------------+
                  |                    |
                  |     TML   core     |
                  |                    |
                  +--------------------+
                     |       |        |
                     |   Med prio,    |
                     |  Semi-reliable |
                     |    channel     |
                     |       |      Low prio,
                     |       |      Unreliable
                     |       |      channel
                     |       |        |
                     ^       ^        ^
                     |       |        |
                     Y       Y        Y
           High prio,|       |        |
            reliable |       |        |
             channel |       |        |
                     Y       Y        Y
                 +---------------------+
                 |                     |
                 |        SCTP         |
                 |                     |
                 +---------------------+

                      Figure 4: The TML-SCTP channels

   Figure 4 details further the interfacing between the TML core and
   SCTP layers.  There are 3 channels used to separate, group, and
   prioritize the work for different types of ForCES traffic.  Each
   channel constitutes an SCTP socket interface with different
   properties.  It should be noted that all SCTP channels are congestion
   aware (and for that reason that detail is left out of the
   description of the 3 channels).  SCTP ports 6704, 6705, and 6706 are
   used for the higher, medium, and lower priority channels,
   respectively.  SCTP Payload Protocol ID (PPID) values of 21, 22, and
   23 are used for the higher, medium, and lower priority channels,
   respectively.

4.2.1.1.  Justifying Choice of 3 Sockets

   SCTP allows up to 64K streams to be sent over a single socket
   interface.  The authors initially envisioned using a single socket
   for all three channels (mapping each channel to an SCTP stream).
   This would have simplified programming of the TML as well as
   conserved SCTP resources.

   Further analysis revealed head-of-line blocking issues with this
   initial approach.  Lower priority packets not needing reliable
   delivery could block higher priority packets (needing reliable
   delivery) under congestion for an indeterminate period of time
   (depending on how many lower priority packets are outstanding).  For
   this reason, we elected to map each of the three channels to a
   different SCTP socket (instead of a different stream within a single
   socket).

4.2.1.2.  Higher Priority, Reliable channel

   The higher priority (HP) channel uses a standard SCTP reliable socket
   on port 6704.  SCTP PPID 21 is used for all messages on the HP
   channel.  The HP channel is used for CE-solicited messages and their
   responses:

   1.  ForCES configuration messages flowing from CE to FE and responses
       from the FE to CE.

   2.  ForCES query messages flowing from CE to FE and responses from
       the FE to the CE.

   PL priorities 4-7 MUST be used for all PL messages using this
   channel.  The following PL messages MUST use the HP channel for
   transport:

   o  Association Setup (default priority: 7)

   o  Association Setup Response (default priority: 7)

   o  Association Teardown (default priority: 7)

   o  Config (default priority: 4)

   o  Config Response (default priority: 4)

   o  Query (default priority: 4)

   o  Query Response (default priority: 4)

   If a PL message with a priority outside of the specified range (4-7),
   a PPID other than 21, or a PL message type other than those listed
   above is received on the HP channel, then the PL message MUST be
   dropped.

   Although an implementation may choose priority values other than the
   defaults from within the defined range (4-7), it is RECOMMENDED that
   the default priorities be used.  A response to a ForCES message MUST
   carry the same priority as the request.  For example, a Config
   message sent by the CE with priority 5 MUST be answered by a Config
   Response from the FE with priority 5.

4.2.1.3.  Medium Priority, Semi-Reliable channel

   The medium priority (MP) channel uses SCTP-PR on port 6705.  SCTP
   PPID 22 MUST be used for all messages on the MP channel.  Time limits
   on how long a message is valid are set on each outgoing message.
   This channel is used for events from the FE to the CE that are
   obsoleted over time.  Events that are accumulative in nature and are
   recoverable by the CE (by issuing a query to the FE) can tolerate
   lost events and therefore should use this channel.  For example, a
   generated event which carries the value of a monotonically
   incrementing counter is a good fit for this channel.
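
   As an illustration of the time-limited delivery used on the MP (and
   LP) channel, the following Python sketch emulates PR-SCTP timed
   reliability in user space: each outgoing message is stamped with an
   expiry time, and stale messages are abandoned rather than sent.  This
   is a minimal sketch under stated assumptions, not part of this
   specification: the class and method names are hypothetical, the
   lifetime values are illustrative (this document does not specify
   them), and a real TML would delegate this to the SCTP stack, for
   example via the per-message sinfo_timetolive field of the SCTP
   sockets API [I-D.ietf-tsvwg-sctpsocket].

```python
import time
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Optional


@dataclass
class TimedMessage:
    payload: bytes
    ppid: int
    expires_at: float  # absolute deadline; after it the message is obsolete


class TimedSendQueue:
    """Hypothetical sender-side queue that abandons obsolete messages,
    emulating PR-SCTP timed reliability for one channel."""

    def __init__(self, ppid: int, lifetime_s: float,
                 clock: Callable[[], float] = time.monotonic):
        self.ppid = ppid
        self.lifetime_s = lifetime_s
        self.clock = clock
        self._queue: Deque[TimedMessage] = deque()

    def enqueue(self, payload: bytes) -> None:
        # Stamp every outgoing message with its own validity deadline.
        deadline = self.clock() + self.lifetime_s
        self._queue.append(TimedMessage(payload, self.ppid, deadline))

    def next_to_send(self) -> Optional[TimedMessage]:
        """Pop the oldest message that is still valid; expired messages
        are silently discarded, never retransmitted."""
        now = self.clock()
        while self._queue:
            msg = self._queue.popleft()
            if msg.expires_at > now:
                return msg
        return None


if __name__ == "__main__":
    fake_now = [0.0]
    # Illustrative (hypothetical) lifetime for the MP channel, PPID 22.
    mp = TimedSendQueue(ppid=22, lifetime_s=1.0, clock=lambda: fake_now[0])
    mp.enqueue(b"counter=10")
    fake_now[0] = 0.5
    mp.enqueue(b"counter=11")
    fake_now[0] = 1.2  # the first event is now obsolete
    print(mp.next_to_send().payload)
```

   Because each counter event supersedes the previous one, losing the
   older event is harmless: the CE can always query the FE for the
   current value.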

   PL priority 3 MUST be used for PL messages on this channel.  The
   following PL messages MUST use the MP channel for transport:

   o  Event Notification (default priority: 3)

   If a PL message with a priority other than 3, a PPID other than 22,
   or a PL message type other than the above is received on the MP
   channel, then the PL message MUST be dropped.

4.2.1.4.  Lower Priority, Unreliable channel

   The lower priority (LP) channel uses SCTP port 6706.  SCTP PPID 23 is
   used for all messages on the LP channel.  The LP channel also MUST
   use SCTP-PR, with lower timeout values than the MP channel.  The
   reason an unreliable channel is used for redirect messages is to
   allow the control protocol at both the CE and its peer endpoint to
   take charge of the end-to-end semantics of that control protocol's
   operations.  For example:

   1.  Some control protocols are reliable in nature, so making this
       channel reliable introduces an extra layer of reliability which
       could be harmful.  Any end-to-end retransmissions are therefore
       left to the remote peer.

   2.  Some control protocols may desire to have obsolescence of
       messages over retransmissions; making this channel reliable
       contradicts that desire.

   Given that ForCES PL heartbeats are traffic sensitive, sending them
   over the LP channel also makes sense.  If the other end is not
   processing other channels, it will eventually receive the heartbeats;
   if it is busy processing other channels, the heartbeats will be
   obsoleted locally over time (and it does not matter that they did not
   make it).

   PL priorities 1-2 MUST be used for PL messages on this channel.  The
   following PL messages MUST use the LP channel for transport:

   o  Packet Redirect (default priority: 2)

   o  Heartbeats (default priority: 1)

   If a PL message with a priority outside of the specified range (1-2),
   a PPID other than 23, or a PL message type other than the above is
   received on the LP channel, then the PL message MUST be dropped.

4.2.1.5.  Scheduling of The 3 Channels

   The TML core uses strict priority, work-conserving scheduling to
   process PL messages, both when sending and when receiving, as shown
   in Figure 5.

   This means that HP messages are always processed first until there
   are none left.  A lower priority channel is processed only if all
   channels of higher priority than itself have no messages left to
   process.  Consequently, under congestion, a higher priority channel
   with enough messages to occupy the available bandwidth would starve
   the lower priority channel(s).
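
   The strict priority, work-conserving loop of Figure 5 can be sketched
   in Python as follows.  This is an illustrative sketch only: in-memory
   queues stand in for the three SCTP sockets, and the class and method
   names are hypothetical.  After each unit of work the scheduler starts
   again from the HP channel, so newly arrived HP work pre-empts any
   remaining MP or LP work, and the loop ends (DONE) only when all three
   channels are idle.

```python
from collections import deque
from typing import Any, Callable, Deque, Dict, Optional, Tuple

# Channels in strict priority order, highest first.
PRIORITY_ORDER = ("HP", "MP", "LP")


class StrictPriorityScheduler:
    """Hypothetical TML core work loop; each deque stands in for the
    work available on one SCTP channel."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[Any]] = {
            name: deque() for name in PRIORITY_ORDER}

    def enqueue(self, channel: str, work: Any) -> None:
        self.queues[channel].append(work)

    def next_work(self) -> Optional[Tuple[str, Any]]:
        """Return work from the highest-priority non-empty channel,
        or None when every channel is idle."""
        for name in PRIORITY_ORDER:
            if self.queues[name]:
                return name, self.queues[name].popleft()
        return None

    def run(self, process: Callable[[str, Any], None]) -> None:
        # Work-conserving: never idle while any channel has work.
        # Re-checking from HP after every item gives strict priority.
        while (item := self.next_work()) is not None:
            process(*item)


if __name__ == "__main__":
    sched = StrictPriorityScheduler()
    sched.enqueue("LP", "redirect-1")
    sched.enqueue("LP", "redirect-2")
    sched.enqueue("MP", "event-1")
    sched.enqueue("HP", "config-1")
    sched.run(lambda ch, work: print(ch, work))
```

   As the text above warns, a steady supply of HP work would keep
   next_work() from ever reaching the MP and LP queues; with strict
   priority that starvation is intended rather than a defect.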

   The design intent of the SCTP TML is to tie the processing
   prioritization described here to transport congestion control in
   order to provide implicit node congestion control.  This is further
   detailed in Appendix A.2.

   It should be emphasized that the work scheduling prioritization
   scheme prescribed in this document is receiver based processing.
   Fully arrived packets on any of the channels are a source of work
   whose output may result in transmitted packets.  However, we have no
   control over the order in which SCTP/OS/network chooses to send
   transmitted packets across and make them available to the receiver.
   This is a limitation that we try to ameliorate by our choice of
   channel properties, ForCES message grouping, and the tying of CE and
   FE work scheduling.  While these measures ameliorate some of these
   issues, they do not fully resolve them.

   From a ForCES perspective, we can tolerate some reordering.  For
   example, if an FE transmits a config response (HP), followed by 10000
   OSPF redirect packets (LP), and the CE gets 5 OSPF redirects (LP)
   before the config response (HP), that is tolerable.  What matters is
   that the CE gets to process the HP message soon (instead of spending
   long periods of time processing OSPF packets, which would have
   happened if we used a single socket with 3 streams).  This is
   particularly important in order to deal well with node overload, as
   discussed in Section 4.2.2.6.

       SCTP channel            +----------+
       Work available          |   DONE   +---<--<--+
           |                   +---+------+         |
           Y                                        ^
           |         +-->--+         +-->---+       |
   +-->-->-+         |     |         |      |       |
   |       |         |     |         |      |       ^
   |       ^         ^     Y         ^      Y       |
   ^      / \        |     |         |      |       |
   |     /   \       |     ^         |      ^       ^
   |    / Is  \      |    / \        |     / \      |
   |   / there \     |   /Is \       |    /Is \     |
   ^  / HP work \    ^  /there\      ^   /there\    ^
   |  \    ?    /    | /MP work\     |  /LP work\   |
   |   \       /     | \    ?  /     |  \   ?   /   |
   |    \     /      |  \     /      |   \     /    ^
   |     \   /       ^   \   /       ^    \   /     |
   |      \ /        |    \ /        |     \ /      |
   ^       Y-->-->-->+     Y-->-->-->+      Y->->->-+
   |       |    NO         |    NO          |  NO
   |       |               |                |
   |       Y               Y                Y
   |       | YES           | YES            | YES
   ^       |               |                |
   |       Y               Y                Y
   |  +----+------+    +---|-------+   +----|------+
   |  |- process  |    |- process  |   |- process  |
   |  |  HP work  |    |  MP work  |   | LP work   |
   |  +------+----+    +-----+-----+   +-----+-----+
   |         |               |               |
   ^         Y               Y               Y
   |         |               |               |
   |         Y               Y               Y
   +----<----+-------<-------+-------<-------+

               Figure 5: SCTP TML Strict Priority Scheduling

4.2.1.6.  SCTP TML Parameterization

   The following is a list of parameters needed for booting the TML.  It
   is expected that these parameters will be extracted via the FEM/CEM
   interface for each PL ID.

   1.  The IP address(es) or a resolvable DNS/hostname(s) of the CE/FE.

   2.  Whether to use IPsec or not.  If IPsec is used, how to
       parameterize the different required ciphers, keys, etc., as
       described in Section 7.1.

   3.  The HP SCTP port, as discussed in Section 4.2.1.2.  The default
       HP port value is 6704 (Section 6).

   4.  The MP SCTP port, as discussed in Section 4.2.1.3.  The default
       MP port value is 6705 (Section 6).

   5.  The LP SCTP port, as discussed in Section 4.2.1.4.  The default
       LP port value is 6706 (Section 6).
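
   As an example of how these boot parameters might be held after being
   read from the simplest form of CEM/FEM interface, a static
   configuration file, the Python sketch below parses a key=value file
   into a per-PL-ID parameter record.  The file format, key names and
   record layout are hypothetical; only the default port values (6704,
   6705 and 6706) come from this document, and the IPsec cipher and key
   parameters of item 2 are omitted for brevity.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SctpTmlParams:
    """Hypothetical per-PL-ID boot parameters for the SCTP TML."""
    peer_addresses: List[str]   # IP address(es) or resolvable hostname(s)
    use_ipsec: bool = False     # cipher/key parameters omitted here
    hp_port: int = 6704         # default ports from Section 6
    mp_port: int = 6705
    lp_port: int = 6706


def parse_static_config(text: str) -> SctpTmlParams:
    """Parse a minimal 'key = value' file; '#' starts a comment."""
    values = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"malformed line: {raw!r}")
        values[key.strip()] = value.strip()

    if "peer_addresses" not in values:
        raise ValueError("peer_addresses is required")
    addrs = [a.strip() for a in values["peer_addresses"].split(",")
             if a.strip()]
    params = SctpTmlParams(peer_addresses=addrs)
    if "use_ipsec" in values:
        params.use_ipsec = values["use_ipsec"].lower() in ("1", "yes", "true")
    # Ports fall back to the defaults unless explicitly overridden.
    for name in ("hp_port", "mp_port", "lp_port"):
        if name in values:
            setattr(params, name, int(values[name]))
    return params


if __name__ == "__main__":
    example = """
    # FE-side view of its CE (addresses from the documentation range)
    peer_addresses = 192.0.2.1, 192.0.2.2
    use_ipsec = yes
    """
    print(parse_static_config(example))
```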

4.2.2.  Satisfying TML Requirements

   [I-D.ietf-forces-protocol] section 5 lists requirements that a TML
   needs to meet.  This section describes how the SCTP TML satisfies
   those requirements.

4.2.2.1.  Satisfying Reliability Requirement

   As mentioned earlier, SCTP offers a range of reliability shades.
   Therefore, this requirement is met.

4.2.2.2.  Satisfying Congestion Control Requirement

   Congestion control is built into SCTP.  Therefore, this requirement
   is met.

4.2.2.3.  Satisfying Timeliness and Prioritization Requirement

   By using 3 sockets in conjunction with the partial reliability
   feature [RFC3758], both timeliness and prioritization requirements
   are addressed.

4.2.2.4.  Satisfying Addressing Requirement

   There are no extra headers required for SCTP to fulfil this
   requirement.  SCTP can be told to replicast packets to multiple
   destinations.  The TML implementation will need to translate PL
   addresses to a variety of unicast IP addresses in order to emulate
   multicast and broadcast PL addresses.

4.2.2.5.  Satisfying High Availability Requirement

   Transport link resiliency is one of SCTP's strongest points.  Failure
   detection and recovery are built in, as mentioned earlier.

   o  The SCTP multi-homing feature is used to provide path diversity.
      Should one of the peer IP addresses become unreachable, the
      other(s) are used without needing lower layer convergence
      (routing, for example) or even the TML becoming aware.

   o  SCTP heartbeats and data transmission thresholds are used on a
      per-peer-IP-address basis to detect reachability faults.  The
      faults could
      be a result of an unreachable address or peer, which may be caused
      by a variety of reasons, like interface, network, or endpoint
      failures.  The cause of the fault is noted.

   o  With the ADDIP feature, one can migrate IP addresses to other
      nodes at runtime.  This is not unlike the use of the VRRP [RFC3768]
      protocol.  This feature is used in addition to multi-homing in a
      planned migration of activity from one FE/CE to another.  In such
      a case, part of the provisioning recipe at the CE for replacing an
      FE involves migrating activity of one FE to another.

4.2.2.6.  Satisfying Node Overload Prevention Requirement

   The architecture of this TML defines three separate channels, one per
   socket, to be used within any FE-CE setup.  The work scheduling
   design for processing the TML channels (Section 4.2.1.5) is strict
   priority.  A fundamental goal of strict prioritization is to ensure
   that more important processing work always gets node resources over
   less important work.

   When a ForCES node CPU is overwhelmed because the incoming packet
   rate is higher than it can keep up with, the channel queues grow and
   transport congestion subsequently follows.  By virtue of using SCTP,
   the congestion is propagated back to the source of the incoming
   packets and eventually alleviated.

   The HP channel work gets prioritized at the expense of the MP which
   gets prioritized over LP channels.  The preferential scheduling only
   kicks in when there is node overload regardless of whether there is
   transport congestion.  As a result of the preferential work
   treatment, the ForCES node achieves a robust steady processing
   capacity.  Refer to Appendix A.2 for details on scheduling.

   As an example of how overload prevention works, consider a scenario
   where an overwhelming amount of redirected packets (from outside the
   NE) coming into the NE may overload the FE while it has outstanding
   config work from the CE.  In such a case, the FE, while it is busy
   processing config requests from the CE, essentially ignores the
   redirect packets on the LP channel.  If enough redirect packets
   accumulate, they are dropped either because the LP channel threshold
   is exceeded or because they are obsoleted.  If, on the other hand,
   the FE has successfully processed the higher priority channels and
   their related work, then it can proceed to process the LP channel.
   As this case demonstrates, the TML implicitly ties transport
   congestion and node overload together.

4.2.2.7.  Satisfying Encapsulation Requirement

   The SCTP TML sets SCTP PPIDs to identify the channels used, as
   described in Section 4.2.1.
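
   Putting together the PPID, PL priority and message type rules of the
   three channel descriptions above, a receiving TML's drop check could
   look like the following Python sketch.  The port, PPID and priority
   values come from this document; the function and type names, and the
   message type labels (plain strings rather than the ForCES header
   encoding of [I-D.ietf-forces-protocol]), are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ChannelRule:
    port: int
    ppid: int
    priorities: range            # permitted PL priority values
    message_types: FrozenSet[str]


# Ports, PPIDs and priority ranges as specified for each channel;
# message type names are illustrative labels only.
CHANNELS = {
    "HP": ChannelRule(6704, 21, range(4, 8), frozenset({
        "AssociationSetup", "AssociationSetupResponse",
        "AssociationTeardown", "Config", "ConfigResponse",
        "Query", "QueryResponse"})),
    "MP": ChannelRule(6705, 22, range(3, 4),
                      frozenset({"EventNotification"})),
    "LP": ChannelRule(6706, 23, range(1, 3),
                      frozenset({"PacketRedirect", "Heartbeat"})),
}


def accept(channel: str, ppid: int, priority: int, msg_type: str) -> bool:
    """True if a received PL message may be passed up to the PL;
    False means the TML MUST drop it."""
    rule = CHANNELS[channel]
    return (ppid == rule.ppid
            and priority in rule.priorities
            and msg_type in rule.message_types)


if __name__ == "__main__":
    print(accept("HP", 21, 5, "Config"))             # True: allowed
    print(accept("HP", 21, 3, "Config"))             # False: bad priority
    print(accept("LP", 23, 1, "EventNotification"))  # False: wrong channel
```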

5.  SCTP TML Channel Work

   There are two levels of TML channel work within an NE when a ForCES
   node (CE or FE) is connected to multiple other ForCES nodes:

   1.  NE-level I/O work where a ForCES node (CE or FE) needs to choose
       which of the peer nodes to process.

   2.  Node-level I/O work where a ForCES node handles the three SCTP
       TML channels separately for each single ForCES endpoint.

   NE-level scheduling definition is left up to the implementation and
   is considered out of scope for this document.  Appendix A.4 briefly
   discusses some constraints that an implementer needs to be aware of.

   This document provides suggestions on SCTP channel work
   implementation in Appendix A.

   The FE SHOULD establish its channel connections to the CE in order of
   increasing priority, i.e., LP socket first, followed by MP, and
   ending with the HP socket connection.  The CE, however, MUST NOT
   assume any ordering of socket connections from any FE.
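
   The ordering rule above is asymmetric: the FE should connect in a
   fixed order, while the CE must accept connections in any order.  A
   minimal Python sketch of both sides follows; the names are
   hypothetical, and the assumption that the CE treats an FE as ready
   only once all three channels are connected is made for illustration
   and is not stated in this document.

```python
from typing import List, Set, Tuple

# Lowest to highest priority, with the default ports from Section 6.
FE_CONNECT_ORDER: List[Tuple[str, int]] = [
    ("LP", 6706), ("MP", 6705), ("HP", 6704)]


def fe_connect_plan() -> List[Tuple[str, int]]:
    """Order in which an FE SHOULD open its channel connections."""
    return list(FE_CONNECT_ORDER)


class CeChannelTracker:
    """Hypothetical CE-side bookkeeping for one FE that makes no
    assumption about the order in which channels connect."""

    REQUIRED = frozenset({"HP", "MP", "LP"})

    def __init__(self) -> None:
        self.connected: Set[str] = set()

    def on_connect(self, channel: str) -> bool:
        """Record a channel; return True once all three are connected
        (an illustrative readiness assumption)."""
        if channel not in self.REQUIRED:
            raise ValueError(f"unknown channel {channel!r}")
        self.connected.add(channel)
        return self.connected == self.REQUIRED


if __name__ == "__main__":
    print([name for name, _ in fe_connect_plan()])
    tracker = CeChannelTracker()
    for ch in ("HP", "LP", "MP"):   # an FE that ignored the SHOULD
        print(ch, tracker.on_connect(ch))
```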

6.  IANA Considerations

   Following the policies outlined in "Guidelines for Writing an IANA
   Considerations Section in RFCs" [RFC5226], the following name spaces
   are defined in ForCES SCTP TML.

   o  SCTP port 6704 for the HP channel, 6705 for the MP channel, and
      6706 for the LP channel.

   o  SCTP Payload Protocol ID (PPID) 21 for the HP channel, 22 for the
      MP channel, and 23 for the LP channel.
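   The registered values lend themselves to a simple lookup table.  The
   following Python sketch, using hypothetical names (ChannelParams,
   CHANNELS, channel_for_ppid), shows how a receiver might identify the
   channel of an incoming message from its PPID.  It assumes the PPID
   has already been converted to host byte order; some SCTP socket APIs
   treat the PPID as opaque and do not convert it, so an implementation
   needs to check what its own stack does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelParams:
    name: str
    sctp_port: int
    ppid: int


# Port and PPID values as listed in the IANA Considerations above.
CHANNELS = (
    ChannelParams("HP", 6704, 21),
    ChannelParams("MP", 6705, 22),
    ChannelParams("LP", 6706, 23),
)
_BY_PPID = {c.ppid: c for c in CHANNELS}


def channel_for_ppid(ppid):
    """Map a received message's PPID (host byte order) to its TML channel."""
    try:
        return _BY_PPID[ppid]
    except KeyError:
        raise ValueError(f"PPID {ppid} is not a ForCES SCTP TML PPID") from None
```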

   XXX [Note to IANA]: Port allocations (SCTP 6700-6702) were made in
   August 2009.  We have been asked by the IESG to change these as
   prescribed above.

7.  Security Considerations

   The SCTP TML provides the following security services to the PL:

   o  A mechanism to authenticate ForCES CEs and FEs at transport level
      in order to prevent the participation of unauthorized CEs and
      unauthorized FEs in the control and data path processing of a
      ForCES NE.

   o  A mechanism to ensure message authentication of PL data and
      headers transferred from the CE to FE (and vice-versa) in order to
      prevent the injection of incorrect data into PL messages.

   o  A mechanism to ensure the confidentiality of PL data and headers
      transferred from the CE to FE (and vice-versa), in order to
      prevent disclosure of PL information transported via the TML.

   Security choices provided by the TML are made by the operator and
   take effect during the pre-association phase of the ForCES protocol.
   An operator may choose to use all, some or none of the security
   services provided by the TML in a CE-FE connection.

   When operating in a secured environment, or for other operational
   reasons (in some cases, performance), the operator may turn off all
   the security functions between the CE and FE.

   IP Security Protocol (IPsec) [RFC4301] is used to provide needed
   security mechanisms.

   IPsec is an IP level security scheme transparent to the higher-layer
   applications and therefore can provide security for any transport
   layer protocol.  This gives IPsec the advantage that it can be used
   to secure everything between the CE and FE without expecting the TML
   implementation to be aware of the details.

   The IPsec architecture is designed to provide the message integrity
   and message confidentiality outlined in the TML security requirements
   [I-D.ietf-forces-protocol].  Mutual authentication and key exchange
   are provided by the Internet Key Exchange (IKE) [RFC2409].

7.1.  IPsec Usage

   A ForCES FE or CE MUST support the following:

   o  Internet Key Exchange (IKE) [RFC2409] with certificates for
      endpoint authentication.

   o  Transport Mode Encapsulating Security Payload (ESP) [RFC4303].

   o  HMAC-SHA1-96 [RFC2404] for message integrity protection.

   o  AES-CBC with 128-bit keys [RFC3602] for message confidentiality.

   o  Replay protection [RFC4301].

   A compliant implementation SHOULD provide operational means for
   configuring the CE and FE to negotiate other cipher suites and even
   use manual keying.

7.1.1.  SAD and SPD setup

   To minimize the operational configuration, it is RECOMMENDED that
   only the IANA-issued SCTP protocol number (132) be used as a selector
   in the Security Policy Database (SPD) for ForCES.  In such a case,
   only a single SPD and SAD entry is needed.

   Setup MAY alternatively extend the above policy so that it uses the
   3 SCTP TML port numbers as SPD selectors.  However, this choice
   requires an increased number of SPD entries.
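   To make the trade-off concrete, here is a toy Python model of the two
   selector choices.  It is a sketch only: real SPD entries also carry
   addresses, direction, and local/remote port distinctions, all of
   which are ignored here, and Selector, protocol_only_policy,
   per_port_policy, and covered are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional

SCTP_PROTO = 132                   # IANA-assigned IP protocol number for SCTP
FORCES_PORTS = (6704, 6705, 6706)  # HP, MP, LP channel ports


@dataclass(frozen=True)
class Selector:
    proto: int
    port: Optional[int] = None  # None means "any port"

    def matches(self, proto, port):
        return self.proto == proto and (self.port is None or self.port == port)


def protocol_only_policy():
    """RECOMMENDED setup: a single entry keyed on the SCTP protocol number."""
    return [Selector(SCTP_PROTO)]


def per_port_policy():
    """Alternative setup: one entry per SCTP TML port."""
    return [Selector(SCTP_PROTO, p) for p in FORCES_PORTS]


def covered(policy, proto, port):
    return any(sel.matches(proto, port) for sel in policy)
```

   Both policies cover all three ForCES channels.  The protocol-only
   selector needs one entry instead of three, at the cost of also
   matching SCTP traffic on other ports.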

   In scenarios where multiple IP addresses are used within a single
   association, and there is a desire to configure different policies
   on a per-IP-address basis, it is RECOMMENDED to follow [RFC3554].

8.  Acknowledgements

   The authors would like to thank Joel Halpern, Michael Tuxen, Randy
   Stewart, Evangelos Haleplidis, Chuanhuang Li, Lars Eggert, Avshalom
   Houri, Adrian Farrel, Juergen Quittek, Magnus Westerlund, and Pasi
   Eronen for engaging us in discussions that have made this document
   better.
   Ross Callon was an excellent manager who persevered in providing us
   guidance and Joel Halpern was an excellent document shepherd without
   whom this document would have taken longer to publish.

9.  References

9.1.  Normative References

   [I-D.ietf-forces-protocol]
              Dong, L., Doria, A., Gopal, R., HAAS, R., Salim, J.,
              Khosravi, H., and W. Wang, "ForCES Protocol
              Specification", draft-ietf-forces-protocol-22 (work in
              progress), March 2009.

   [RFC2404]  Madson, C. and R. Glenn, "The Use of HMAC-SHA-1-96 within
              ESP and AH", RFC 2404, November 1998.

   [RFC2409]  Harkins, D. and D. Carrel, "The Internet Key Exchange
              (IKE)", RFC 2409, November 1998.

   [RFC3554]  Bellovin, S., Ioannidis, J., Keromytis, A., and R.
              Stewart, "On the Use of Stream Control Transmission
              Protocol (SCTP) with IPsec", RFC 3554, July 2003.

   [RFC3602]  Frankel, S., Glenn, R., and S. Kelly, "The AES-CBC Cipher
              Algorithm and Its Use with IPsec", RFC 3602,
              September 2003.

   [RFC3758]  Stewart, R., Ramalho, M., Xie, Q., Tuexen, M., and P.
              Conrad, "Stream Control Transmission Protocol (SCTP)
              Partial Reliability Extension", RFC 3758, May 2004.

   [RFC4301]  Kent, S. and K. Seo, "Security Architecture for the
              Internet Protocol", RFC 4301, December 2005.

   [RFC4303]  Kent, S., "IP Encapsulating Security Payload (ESP)",
              RFC 4303, December 2005.

   [RFC4960]  Stewart, R., "Stream Control Transmission Protocol",
              RFC 4960, September 2007.

   [RFC5226]  Narten, T. and H. Alvestrand, "Guidelines for Writing an
              IANA Considerations Section in RFCs", BCP 26, RFC 5226,
              May 2008.

9.2.  Informative References

   [I-D.ietf-forces-model]
              Halpern, J. and J. Salim, "ForCES Forwarding Element
              Model", draft-ietf-forces-model-16 (work in progress),
              October 2008.

   [I-D.ietf-tsvwg-sctpsocket]
              Stewart, R., Poon, K., Tuexen, M., Yasevich, V., and P.
              Lei, "Sockets API Extensions for Stream Control
              Transmission Protocol (SCTP)",
              draft-ietf-tsvwg-sctpsocket-20 (work in progress),
              January 2010.

   [RFC3654]  Khosravi, H. and T. Anderson, "Requirements for Separation
              of IP Control and Forwarding", RFC 3654, November 2003.

   [RFC3746]  Yang, L., Dantu, R., Anderson, T., and R. Gopal,
              "Forwarding and Control Element Separation (ForCES)
              Framework", RFC 3746, April 2004.

   [RFC3768]  Hinden, R., "Virtual Router Redundancy Protocol (VRRP)",
              RFC 3768, April 2004.

Appendix A.  Suggested SCTP TML Channel Work Implementation

   As mentioned in Section 5, there are two levels of TML channel work
   within an NE when a ForCES node (CE or FE) is connected to multiple
   other ForCES nodes:

   1.  NE-level I/O work where a ForCES node (CE or FE) needs to choose
       which of the peer nodes to process.

   2.  Node-level I/O work where a ForCES node handles the three SCTP
       TML channels separately for each single ForCES endpoint.

   NE-level scheduling definition is left up to the implementation and
   is considered out of scope for this document.  Appendix A.4 briefly
   discusses some constraints that an implementer needs to consider.

   This document, and in particular Appendix A.1, Appendix A.2, and
   Appendix A.3, discusses details of node-level I/O work.

A.1.  SCTP TML Channel Initialization

   As discussed in Section 5, the FE SHOULD make socket connections to
   the CE in order of increasing priority, i.e., LP socket first,
   followed by MP, and ending with the HP socket connection.  The CE,
   however, MUST NOT assume any ordering of socket connections from any
   FE.  Appendix B.1 has more details on the expected initialization of
   SCTP channel work.

A.2.  Channel work scheduling

   This section provides a high-level view of the scheduling performed
   by the SCTP TML core (Section 4.2.1).  A practical scheduler
   implementation takes care of many small details (such as timers,
   work quanta, etc.) not described in this document.  It is left to
   the implementer to take care of those details.

   The scheduling scheme described here couples the CE(s) and FE(s)
   together so that node overload is tied to transport congestion.  The
   design intent is to provide the highest possible robust work
   throughput for the NE under any network or processing congestion.

A.2.1.  FE Channel work scheduling

   The FE scheduler needs to process the following I/O work, in priority
   order:

   1.  The HP channel I/O in the following priority order:

       1.  Transmitting back to the CE any outstanding result of
           executed work via the HP channel transmit path.

       2.  Taking new incoming work from the CE which creates ForCES
           work to be executed by the FE.

   2.  ForCES events which result in transmission of unsolicited ForCES
       packets to the CE via the MP channel.

   3.  Incoming Redirect work in the form of control packets that come
       from the CE via the LP channel.  After redirect processing, these
       packets get sent out on an external (to the NE) interface.

   4.  Incoming Redirect work in the form of control packets that come
       from other NEs via external (to the NE) interfaces.  After some
       processing, such packets are sent to the CE.

   It is worth emphasizing again that the SCTP TML processes the channel
   work in strict priority.  For example, as long as there are messages
   to send to the CE on the HP channel, they will be processed first
   until there are no more left before processing the next priority work
   (which is to read new messages on the HP channel incoming from the
   CE).
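   A strict-priority pass over the FE work sources listed above can be
   sketched in Python as follows.  The queue names are hypothetical
   labels for the listed categories (with the HP channel split into its
   transmit and receive halves), and the sketch ignores the timers and
   work quanta a practical scheduler would need.  Because next_work()
   rescans from the top on every call, work that becomes available on a
   higher-priority source is always served before anything below it.

```python
from collections import deque

# FE work sources, highest priority first (hypothetical names).
FE_PRIORITY = (
    "hp_tx",           # results of executed work, back to the CE on HP
    "hp_rx",           # new work arriving from the CE on HP
    "mp_events",       # unsolicited events to the CE on MP
    "lp_redirect_in",  # redirected packets from the CE on LP
    "ext_redirect",    # packets from external interfaces, bound for the CE
)


def next_work(queues):
    """Pop from the highest-priority non-empty queue; None if all are empty."""
    for source in FE_PRIORITY:
        if queues[source]:
            return source, queues[source].popleft()
    return None


def drain(queues):
    """Process everything currently queued, in strict priority order."""
    done = []
    while (work := next_work(queues)) is not None:
        done.append(work)
    return done
```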

A.2.2.  CE Channel work scheduling

   The CE scheduling, in priority order, needs to deal with:

   1.  The HP channel I/O in the following priority order:

       1.  Processing incoming responses to requests of work it made to
           the FE(s).

       2.  Transmitting any outstanding HP work it needs for the FE(s)
           to complete.

   2.  Incoming ForCES events from the FE(s) via the MP channel.

   3.  Outgoing Redirect work in the form of control packets that get
       sent from the CE via LP channel destined to external (to the NE)
       interface on FE(s).

   4.  Incoming Redirect work in the form of control packets that come
       from other NEs via external (to the NE) interfaces on the FE(s).

   It is worth repeating for emphasis that the SCTP TML processes the
   channel work in strict priority.  For example, if there are messages
   incoming from an FE on the HP channel, they will be processed first
   until there are no more left before processing the next priority
   work, which is to transmit any outstanding HP channel messages going
   to the FE.

A.3.  SCTP TML Channel Termination

   Appendix B.2 describes a controlled disassociation of the FE from the

   It is also possible for connectivity to be lost between the FE and CE
   on one or more sockets.  In cases where SCTP multi-homing features
   are used for path availability, the disconnection of a socket will
   only occur if all paths are unreachable; otherwise, SCTP will ensure
   reachability.  In the situation of a total connectivity loss of even
   one SCTP socket, the FE and CE SHOULD assume a state equivalent to
   ForCES Association Teardown having been issued and follow the
   sequence described in Appendix B.2.

   A CE could also disconnect sockets to an FE to indicate an "emergency
   teardown".  The "emergency teardown" may be necessary in cases when a
   CE needs to disconnect an FE but knows that an FE is busy processing
   a lot of outstanding commands (some of which the FE has not yet
   processed).  By virtue of the CE closing the connections, the
   FE will immediately be asynchronously notified and will not have to
   process any outstanding commands from the CE.
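   The rule that losing even one channel is treated like an Association
   Teardown, together with the "emergency teardown" behavior, can be
   sketched as follows.  PeerTmlState is a hypothetical per-peer
   structure.  Closing sockets and notifying the PL are left out, and
   the method only reports which channels remain to be closed.

```python
from collections import deque


class PeerTmlState:
    """Hypothetical per-peer TML state for the three SCTP channels."""

    CHANNELS = ("HP", "MP", "LP")

    def __init__(self):
        self.up = set(self.CHANNELS)
        self.pending = deque()  # received commands not yet processed
        self.torn_down = False

    def on_channel_lost(self, channel):
        """Handle total loss of one channel; return channels still to close."""
        if channel not in self.CHANNELS:
            raise ValueError(f"unknown channel {channel!r}")
        self.up.discard(channel)
        if self.torn_down:
            return []  # teardown already in progress
        # Treat as ForCES Association Teardown: outstanding commands are
        # not processed and the remaining channels are shut down as well.
        self.torn_down = True
        self.pending.clear()
        to_close = sorted(self.up)
        self.up.clear()
        return to_close
```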

A.4.  SCTP TML NE level channel scheduling

   In handling NE-level I/O work, an implementation needs to worry about
   being both fair and robust across peer ForCES nodes.

   Fairness is desired so that each peer node makes progress across the
   NE.  For the sake of illustration, consider two FEs connected to a
   CE; whereas one FE has a few HP messages that need to be processed by
   the CE, the other may have an effectively unlimited number of HP
   messages.  The scheduling scheme may decide to use a quota scheduling
   system to ensure that the second FE does not hog the CE cycles.

   Robustness is desired so that the NE does not succumb to a DoS attack
   from hostile entities and always achieves a maximum stable workload
   processing level.  For the sake of illustration, consider again two
   FEs connected to a CE.  Consider FE1 as having a large number of HP
   and MP messages and FE2 having a large number of MP and LP messages.
   The scheduling scheme needs to ensure that, while FE1 always gets its
   messages processed, FE2's messages are also eventually processed.
   Promotion- and preemption-based scheduling could be used by the CE
   to resolve this issue.
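   The quota idea from the fairness example can be sketched as a
   round-robin over peers in which each peer is served at most "quota"
   messages per turn.  This is only one possible approach: the function
   name and the use of finite in-memory queues are assumptions for
   illustration, and it does not implement the promotion and preemption
   mentioned for robustness.

```python
from collections import deque


def quota_schedule(peer_queues, quota):
    """Round-robin over peers, serving at most `quota` messages per turn.

    peer_queues maps a peer name to a deque of pending messages.
    Returns the (peer, message) pairs in the order they were served.
    """
    if quota < 1:
        raise ValueError("quota must be >= 1")
    active = deque(name for name, q in peer_queues.items() if q)
    served = []
    while active:
        name = active.popleft()
        q = peer_queues[name]
        for _ in range(min(quota, len(q))):
            served.append((name, q.popleft()))
        if q:
            active.append(name)  # still has work; revisit next round
    return served
```

   With a quota of 2, an FE1 holding three messages and an FE2 holding
   five are served as FE1, FE1, FE2, FE2, FE1, FE2, FE2, FE2: FE1's
   third message waits for only two of FE2's, not all of them.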

Appendix B.  Suggested Service Interface

   This section outlines a high-level service interface between the
   FEM/CEM and TML, the PL and TML, and between local and remote TMLs.
   The intent of this interface discussion is to provide general
   guidelines.  The implementer is expected to take care of details and
   may even follow a different approach if needed.

   The theory of operation for the PL-TML service is as follows:

   1.  The PL starts up and bootstraps the TML.  The end result of a
       successful TML bootstrap is that the CE TML and the FE TML
       connect to each other at the transport level.

   2.  Transmission and reception of the PL messages commences after a
       successful TML bootstrap.  The PL uses send and receive PL-TML
       interfaces to communicate to its peers.  The TML is agnostic to
       the nature of the messages being sent or received.  The first
       message exchanges that happen are to establish ForCES
       association.  Subsequent messages may be unsolicited events from
       the FE PL, control message redirects from/to the CE to/from the
       FE, or configuration from the CE to the FE and the corresponding
       responses flowing from the FE to the CE.

   3.  The PL does a shutdown of the TML after terminating ForCES
       association.

B.1.  TML Boot-strapping

   Figure 6 illustrates a flow for the TML bootstrapped by the PL.

   When the PL starts up (possibly after some internal initialization),
   it boots up the TML.  The TML first interacts with the FEM/CEM and
   acquires the necessary TML parameterization (Section  Next
   the TML uses the information it retrieved from the FEM/CEM interface
   to initialize itself.

   The TML on the FE proceeds to connect the 3 channels to the CE.  The
   socket interface is used for each of the channels.  The TML continues
   to retry the connections to the CE until all 3 channels are
   connected.  It is advisable that the number of connection retry
   attempts and the time between retries also be configurable via the
   FEM.  On failure to connect one or more channels, and after the
   configured retry threshold is exceeded, the TML will return an
   appropriate failure indicator to the PL.  On success (as shown in
   Figure 6), a success indication is presented to the PL.

   FE PL      FE TML           FEM  CEM        CE TML              CE PL
     |            |             |    |            |                    |
     |            |             |    |            |      Bootup        |
     |            |             |    |            |<-------------------|
     |  Bootup    |             |    |            |                    |
     |----------->|             |    |get CEM info|                    |
     |            |get FEM info |    |<-----------|                    |
     |            |------------>|    ~            ~                    |
     |            ~             ~    |----------->|                    |
     |            |<------------|                 |                    |
     |            |                               |-initialize TML     |
     |            |                               |-create the 3 chans.|
     |            |                               | to listen to FEs   |
     |            |                               |                    |
     |            |-initialize TML                |Bootup success      |
     |            |-create the 3 chans. locally   |------------------->|
     |            |-connect 3 chans. remotely     |                    |
     |            |------------------------------>|                    |
     |            ~                               ~ - FE TML connected ~
     |            ~                               ~ - FE TML info init ~
     |            | channels connected            |                    |
     |            |<------------------------------|                    |
     | Bootup     |                               |                    |
     | succeeded  |                               |                    |
     |<-----------|                               |                    |
     |            |                               |                    |

                     Figure 6: SCTP TML Bootstrapping
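   The FE-side connect-and-retry behavior described above might be
   sketched as follows.  The "connect" argument stands in for whatever
   call opens one channel's SCTP socket and is assumed to raise OSError
   on failure; max_retries and retry_interval represent the
   FEM-configurable values, and applying the retry limit per channel is
   an assumption, since the text does not specify it.  A real
   implementation would also close any channels already connected
   before reporting failure.

```python
import time


def connect_all_channels(connect, channels=("LP", "MP", "HP"),
                         max_retries=3, retry_interval=1.0, sleep=time.sleep):
    """Connect channels in increasing priority, retrying each on failure.

    Returns (True, connected) on success, or (False, connected) with the
    channels connected so far once a channel exceeds its retry budget.
    """
    connected = []
    for ch in channels:
        for attempt in range(max_retries + 1):
            try:
                connect(ch)
            except OSError:
                if attempt == max_retries:
                    return False, connected  # failure indication for the PL
                sleep(retry_interval)
            else:
                connected.append(ch)
                break
    return True, connected
```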

   On the CE things are slightly different.  After initializing from the
   CEM, the TML on the CE side proceeds to initialize the 3 channels to
   listen to remote connections from the FEs.  The success or failure
   indication is passed on to the CE PL (in the same manner as was done
   in the FE).

   Post boot-up, the CE TML waits for connections from the FEs.  Upon a
   successful connection by an FE, the CE TML keeps track of the
   transport-level details of the FE.  Note that at this stage only a
   transport-level connection has been established; ForCES-level
   association follows using the send/receive PL-TML interfaces (refer
   to Appendix B.3 and Figure 8).

B.2.  TML Shutdown

   Figure 7 shows an example of an FE shutting down the TML.  It is
   assumed at this point that the ForCES Association Teardown has been
   issued by the CE.  It should also be noted that different
   implementations may have different procedures for cleaning up state.

   When the FE PL issues a shutdown to its TML for a specific PL ID, the
   TML releases all the channel connections to the CE.  This is achieved
   by closing the sockets used to communicate with the CE.  This results
   in the stack sending an SCTP shutdown, which is received by the CE.

   FE PL      FE TML                      CE TML              CE PL
     |            |                         |                    |
     |  Shutdown  |                         |                    |
     |----------->|                         |                    |
     |            |-disconnect 3 chans.     |                    |
     |            |-SCTP level shutdown     |                    |
     |            |------------------------>|                    |
     |            |                         |                    |
     |            |                         |TML detects shutdown|
     |            |                         |-FE TML info cleanup|
     |            |                         |-optionally tell PL |
     |            |                         |------------------->|
     |            |                         |                    |
     |            |- clean up any state of  |                    |
     |            |-channels disconnected   |                    |
     |            |<------------------------|                    |
     |            |-SCTP shutdown ACK       |                    |
     |            |                         |                    |
     | Shutdown   |                         |                    |
     | succeeded  |                         |                    |
     |<-----------|                         |                    |
     |            |                         |                    |

                        Figure 7: FE Shutting down

   On the CE side, a TML disconnection would result in possible cleanup
   of the FE state.  Optionally, depending on the implementation, there
   may be a need to inform the PL about the TML disconnection.  The CE's
   SCTP stack sends an acknowledgement to the FE TML in response to the
   earlier SCTP shutdown.

B.3.  TML Sending and Receiving

   The TML should be agnostic to the content of the PL messages, or
   their operations.  The PL should provide enough information to the
   TML for it to assign an appropriate priority and loss behavior to the
   message.  Figure 8 shows an example of a message exchange originated
   at the FE and sent to the CE (such as a ForCES association message)
   which illustrates all the necessary service interfaces for sending
   and receiving.

   When the FE PL sends a message to the TML, the TML is expected to
   pick one of HP/MP/LP channels and send out the ForCES message.

   FE PL       FE TML           CE TML                CE PL
      |            |              |                      |
      |PL send     |              |                      |
      |----------->|              |                      |
      |            |              |                      |
      |            |              |                      |
      |            |-pick channel |                      |
      |            |-TML  Send    |                      |
      |            |------------->|                      |
      |            |              |                      |
      |            |              |-TML Receive on chan. |
      |            |              |- mux to PL/PL recv   |
      |            |              |--------------------->|
      |            |              |                      ~
      |            |              |                      ~ PL Process
      |            |              |                      ~
      |            |              |  PL send             |
      |            |              |<---------------------|
      |            |              |-pick chan to send on |
      |            |              |-TML send             |
      |            |<-------------|                      |
      |            |-TML Receive  |                      |
      |            |-mux to PL    |                      |
      | PL Recv    |              |                      |
      |<---------- |              |                      |
      |            |              |                      |

                       Figure 8: Send and Recv Flow

   When the CE TML receives the ForCES message on the channel it was
   sent on, it demultiplexes the message to the CE PL.

   The CE PL, after some processing (in this example, dealing with the
   FE's association), sends the response to the TML.  As in the case of
   the FE PL, the CE TML picks the channel to send on before sending.

   The processing of the ForCES message upon arriving at the FE TML and
   delivery to the FE PL is similar to the CE-side equivalent described
   above.

Authors' Addresses

   Jamal Hadi Salim
   Mojatatu Networks
   Ottawa, Ontario


   Kentaro Ogawa
   NTT Corporation
   3-9-11 Midori-cho
   Musashino-shi, Tokyo  180-8585