draft-ietf-tsvwg-sctp-failover-02.txt   draft-ietf-tsvwg-sctp-failover-03.txt 
Network Working Group Y. Nishida Network Working Group Y. Nishida
Internet-Draft GE Global Research Internet-Draft GE Global Research
Intended status: Experimental P. Natarajan Intended status: Experimental P. Natarajan
Expires: April 24, 2014 Cisco Systems Expires: September 3, 2014 Cisco Systems
A. Caro A. Caro
BBN Technologies BBN Technologies
P. Amer P. Amer
University of Delaware University of Delaware
October 21, 2013 K. Nielsen
Ericsson
March 2, 2014
Quick Failover Algorithm in SCTP Quick Failover Algorithm in SCTP
draft-ietf-tsvwg-sctp-failover-02 draft-ietf-tsvwg-sctp-failover-03.txt
Abstract Abstract
One of the major advantages in SCTP is supporting multi-homing One of the major advantages of SCTP is supporting multi-homed
communication. If a multi-homed end-point has redundant network communication. If a multi-homed end-point has a redundant network
connections, SCTP sessions can have a good chance to survive from connections, the SCTP associations have a good chance to survive
network failures by migrating inactive network to active one. network failures by migrating traffic from inactive networks to
However, if we follow the SCTP standard, there can be significant active ones. However, if the SCTP standard is followed, there can be
delay for the network migration. During this migration period, SCTP a significant delay during the migration. During this period, SCTP
cannot transmit much data to the destination. This issue drastically might not be able to transmit much data to the peer. This issue
impairs the usability of SCTP in some situations. This memo drastically impairs the usability of SCTP in some situations. This
describes the issue of SCTP failover mechanism and discuss its memo describes the issue of the SCTP failover mechanism and specifies
solutions which require minimal modification to the current standard. an alternative failover procedure for SCTP that improves its
performance during and after failover. The procedures require only
minimal modifications to the current specification.
Status of this Memo Status of this Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 24, 2014. This Internet-Draft will expire on September 3, 2014.
Copyright Notice Copyright Notice
Copyright (c) 2013 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions and Terminology . . . . . . . . . . . . . . . . . 4 2. Conventions and Terminology . . . . . . . . . . . . . . . . . 4
3. Issue in SCTP Path Management Process . . . . . . . . . . . . 5 3. Issues with the SCTP Path Management . . . . . . . . . . . . . 5
4. Existing Solutions for Smooth Failover . . . . . . . . . . . . 6 4. Existing Solutions for Smooth Failover . . . . . . . . . . . . 6
4.1. Reduce Path.Max.Retrans . . . . . . . . . . . . . . . . . 6 4.1. Reduce Path.Max.Retrans (PMR) . . . . . . . . . . . . . . 6
4.2. Adjust RTO related parameters . . . . . . . . . . . . . . 7 4.2. Adjust RTO related parameters . . . . . . . . . . . . . . 6
5. Proposed Solution: SCTP with Potentially-Failed 5. SCTP with Potentially-Failed Destination State (SCTP-PF) . . . 8
Destination State (SCTP-PF) . . . . . . . . . . . . . . . . . 8
5.1. SCTP-PF Description . . . . . . . . . . . . . . . . . . . 8 5.1. SCTP-PF Description . . . . . . . . . . . . . . . . . . . 8
5.2. Effect of Path Bouncing . . . . . . . . . . . . . . . . . 10 5.2. Effect of Path Bouncing . . . . . . . . . . . . . . . . . 10
5.3. Permanent Failover . . . . . . . . . . . . . . . . . . . . 10 5.3. Permanent Failover . . . . . . . . . . . . . . . . . . . . 10
5.4. Handling of Association Error Counter . . . . . . . . . . 11
6. Socket API Considerations . . . . . . . . . . . . . . . . . . 12 6. Socket API Considerations . . . . . . . . . . . . . . . . . . 12
6.1. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) socket 6.1. Support for the Potentially Failed Path State . . . . . . 12
option . . . . . . . . . . . . . . . . . . . . . . . . . . 12 6.2. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket
7. Security Considerations . . . . . . . . . . . . . . . . . . . 13 Option . . . . . . . . . . . . . . . . . . . . . . . . . . 13
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 6.3. Exposing the Potentially Failed Path State
9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 15 (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option . . . 14
9.1. Normative References . . . . . . . . . . . . . . . . . . . 15 7. Security Considerations . . . . . . . . . . . . . . . . . . . 15
9.2. Informative References . . . . . . . . . . . . . . . . . . 15 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 17 9. References . . . . . . . . . . . . . . . . . . . . . . . . . . 17
9.1. Normative References . . . . . . . . . . . . . . . . . . . 17
9.2. Informative References . . . . . . . . . . . . . . . . . . 17
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 19
1. Introduction 1. Introduction
The Stream Control Transmission Protocol (SCTP) [RFC4960] natively The Stream Control Transmission Protocol (SCTP) as specified in
supports multihoming at the transport layer -- an SCTP association [RFC4960] supports multihoming at the transport layer -- an SCTP
can bind to multiple IP addresses at each endpoint. SCTP's association can bind to multiple IP addresses at each endpoint.
multihoming features include failure detection and failover SCTP's multihoming features include failure detection and failover
procedures to provide network interface redundancy and improved end- procedures to provide network interface redundancy and improved end-
to-end fault tolerance. to-end fault tolerance.
In SCTP's current failure detection procedure, the sender must In SCTP's current failure detection procedure, the sender must
experience Path.Max.Retrans (PMR) number of consecutive timeouts on a experience Path.Max.Retrans (PMR) number of consecutive failed
destination before detecting path failure. The sender fails over to retransmissions on a destination before detecting a path failure.
an alternate active destination only after failure detection. Until The sender fails over to an alternate active destination only after
failover, the sender transmits data on the failed path, degrading failure detection. Until detecting the failover, the sender
continues to transmit data on the failed path, which degrades the
SCTP performance. Concurrent Multipath Transfer (CMT) [IYENGAR06] is SCTP performance. Concurrent Multipath Transfer (CMT) [IYENGAR06] is
an extension to SCTP and allows the sender to transmit data on an extension to SCTP and allows the sender to transmit data on
multiple paths simultaneously. Research [NATARAJAN09] shows that the multiple paths simultaneously. Research [NATARAJAN09] shows that the
current failure detection procedure worsens CMT performance during current failure detection procedure worsens CMT performance during
failover and can be significantly improved by employing a better failover and can be significantly improved by employing a better
failover algorithm. failover algorithm.
This document proposes an alternative failure detection procedure for This document specifies an alternative failure detection procedure
SCTP (and CMT) that improves SCTP (CMT) performance during failover. for SCTP (and CMT) that improves the SCTP (and CMT) performance
during a failover.
Also the operation after a failover impacts the performance of the
protocol. With [RFC4960] procedures, SCTP will, after a failover
from the primary path, switch back to use the primary path for data
transfer as soon as this path becomes available. From a performance
perspective, as confirmed in research [CARO02], such a switchback of
the data transmission path is not optimal in general. As an
alternative option to the switchback operation of [RFC4960], this
document specifies support for the Permanent Failover switchover
procedures proposed by [CARO02].
2. Conventions and Terminology 2. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
3. Issue in SCTP Path Management Process 3. Issues with the SCTP Path Management
SCTP can utilize multiple IP addresses for a single SCTP association. SCTP can utilize multiple IP addresses for a single SCTP association.
Each SCTP endpoint exchanges the list of available addresses on the Each SCTP endpoint exchanges the list of its usable addresses during
node during initial negotiation. After this, endpoints select one initial negotiation with its peer. Then the endpoints select one
address from the list and define this as the primary destination. address from the peer's list and define this as the primary
During normal transmission, SCTP sends all data to the primary destination. During normal transmission, SCTP sends all user data to
destination. Also, it sends heartbeat packets to other (non-primary) the primary destination. Also, it sends heartbeat packets to all
destinations at a certain interval to check the reachability of the idle destinations at a certain interval to check the reachability of
path. the path. Idle destinations normally include all non-primary
destinations.
If sender has multiple active destination addresses, it can If a sender has multiple active destination addresses, it can
retransmit data to secondary destination address when the retransmit data to secondary destination address, when the
transmission to the primary times out. transmission to the primary times out.
When sender receives the acknowledgment for data or heartbeat packets When a sender receives an acknowledgment for DATA or HEARTBEAT chunks
from one of the destination addresses, it considers the destination sent to one of the destination addresses, it considers that
is active. If it fails to receive acknowledgments, the error count destination to be active. If it fails to receive acknowledgments,
for the address is increased. If the error counter exceeds the the error count for the address is increased. If the error counter
protocol parameter 'Path.Max.Retrans', SCTP endpoint considers the exceeds the protocol parameter 'Path.Max.Retrans', SCTP endpoint
address is inactive. considers the address to be inactive.
The failover process of SCTP is initiated when the primary path The failover process of SCTP is initiated when the primary path
becomes inactive (error counter for the primary path exceeds becomes inactive (error counter for the primary path exceeds
Path.Max.Retrans). If the primary path is marked inactive, SCTP Path.Max.Retrans). If the primary path is marked inactive, SCTP
chooses new destination address from one of the active destinations chooses a new destination address from one of the active destinations
and start using this address to send data. If the primary path and start using this address to send data to. If the primary path
becomes active again, SCTP uses the primary destination for becomes active again, SCTP uses the primary destination for
subsequent data transmissions and stop using non-primary one. subsequent data transmissions and stop using non-primary one.
An issue in this failover process is that it usually takes One issue with this failover process is that it usually takes
significant amount of time before SCTP switches to the new significant amount of time before SCTP switches to the new
destination. Let's say the primary path on a multi-homed host destination. Let's say the primary path on a multi-homed host
becomes unavailable and the RTO value for the primary path at that becomes unavailable and the RTO value for the primary path at that
time is around 1 second, it usually takes over 60 seconds before SCTP time is around 1 second, it usually takes over 60 seconds before SCTP
starts to use the secondary path. This is because the recommended starts to use the secondary path. This is because the recommended
value for Path.Max.Retrans in the standard is 5, which requires 6 value for Path.Max.Retrans in the standard is 5, which requires 6
consecutive timeouts before failover takes place. Before SCTP consecutive timeouts before failover takes place. Before SCTP
switches to the secondary address, SCTP keeps trying to send packets switches to the secondary address, SCTP keeps trying to send packets
to the primary, and only the packets retransmitted to the to the primary, and only the packets retransmitted to the
secondary can reach the receiver. This slow failover process secondary can reach the receiver. This slow failover process
can cause significant performance degradation and will not be can cause significant performance degradation and will not be
acceptable in some situations. acceptable in some situations.
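The 60-second figure above follows from exponential backoff of the retransmission timer. The following is a minimal sketch of that arithmetic, not part of either draft revision. The function name failover_delay is hypothetical, and the 60-second default for rto_max assumes the RTO.Max value recommended in [RFC4960].

```python
def failover_delay(rto, path_max_retrans, rto_max=60.0):
    """Seconds spent in consecutive timeouts before the path is marked inactive.

    Assumes the RTO doubles after every timeout and is capped at rto_max,
    and that failover needs (Path.Max.Retrans + 1) consecutive timeouts.
    """
    total = 0.0
    for _ in range(path_max_retrans + 1):
        total += rto                   # wait for the current timer to expire
        rto = min(rto * 2, rto_max)    # exponential backoff, capped at RTO.max
    return total


# Path.Max.Retrans = 5, initial RTO of 1 second: 1 + 2 + 4 + 8 + 16 + 32
print(failover_delay(1.0, 5))  # 63.0
# Path.Max.Retrans = 0: failover after a single timeout
print(failover_delay(1.0, 0))  # 1.0
```

With an RTO of 1 second and Path.Max.Retrans set to 5, the six timeouts add up to 63 seconds, which matches the "over 60 seconds" estimate in the text.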
Another issue is that once the primary path is active again, the
traffic is switched back. This is not optimal in general.
4. Existing Solutions for Smooth Failover 4. Existing Solutions for Smooth Failover
The following approach are conceivable for the solutions of this The following approaches are conceivable for the solutions of this
issue. issue.
4.1. Reduce Path.Max.Retrans 4.1. Reduce Path.Max.Retrans (PMR)
If we choose smaller value for Path.Max.Retrans, we can shorten the Smaller values for Path.Max.Retrans shorten the failover duration.
duration of failover process. In fact, this is recommended in some In fact, this is recommended in some research results [JUNGMAIER02]
research results [JUNGMAIER02] [GRINNEMO04] [FALLON08]. For example, [GRINNEMO04] [FALLON08]. For example, when Path.Max.Retrans=0,
if we set Path.Max.Retrans to 0, SCTP switches to another destination SCTP switches to another destination on a single timeout. However,
on a single timeout. However, smaller value for Path.Max.Retrans smaller value for Path.Max.Retrans also results in spurious failover.
might cause spurious failover. In addition, if we use smaller value In addition, smaller Path.Max.Retrans values also affect
for Path.Max.Retrans, we may also need to choose smaller value for 'Association.Max.Retrans' values. When the SCTP association's error
'Association.Max.Retrans'. The Association.Max.Retrans indicates the count (sum of error counts on all ACTIVE paths) exceeds
threshold for the total number of consecutive error count for the Association.Max.Retrans threshold, the SCTP sender considers the peer
entire SCTP association. If the total of the error count for all endpoint unreachable and terminates the association. Therefore,
paths exceeds this value, the endpoint considers the peer endpoint Section 8.2 in [RFC4960] recommends that Association.Max.Retrans
unreachable and terminates the association. According to the Section value should not be larger than the summation of the Path.Max.Retrans
8.2 in [RFC4960], we should avoid having the value of of each of the destination addresses, else the SCTP sender considers
Association.Max.Retrans larger than the summation of the its peer reachable even when all destinations are INACTIVE. To avoid
Path.Max.Retrans of all the destination addresses. Otherwise, even such inconsistent behavior an SCTP implementation SHOULD reduce
if all the destination addresses become inactive, the endpoint still Association.Max.Retrans accordingly whenever it reduces
considers the peer endpoint reachable. The behavior in this Path.Max.Retrans. However, smaller Association.Max.Retrans value
situation is not defined in the RFC and depends on each increases chances of association termination during minor congestion
implementation. In order to avoid inconsistent behavior between events.
implementations, we had better use smaller value for
Association.Max.Retrans. However, if we choose smaller value for
Association.Max.Retrans, associations will prone to be terminated
with minor congestion.
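The consistency rule discussed above can be checked mechanically. The sketch below is illustrative only. The function name retrans_config_is_consistent is hypothetical, and the values 10 and 5 assume the Association.Max.Retrans and Path.Max.Retrans defaults suggested in [RFC4960].

```python
def retrans_config_is_consistent(association_max_retrans, path_max_retrans_per_dest):
    """Return True when Association.Max.Retrans does not exceed the sum of the
    Path.Max.Retrans values of all destination addresses."""
    return association_max_retrans <= sum(path_max_retrans_per_dest)


# Two destinations with the assumed defaults: 10 <= 5 + 5
print(retrans_config_is_consistent(10, [5, 5]))  # True
# Reducing Path.Max.Retrans to 2 without touching Association.Max.Retrans
print(retrans_config_is_consistent(10, [2, 2]))  # False
```

In the second case every destination can become inactive while the association error count is still below its threshold, which is the inconsistent situation the text describes.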
Another issue is that the interval of heartbeat packet: 'HB.interval' Another issue is that the interval of heartbeat packet: 'HB.interval'
may not be small. (recommended value is 30 seconds) This means once could be in the order of seconds (recommended value is 30 seconds).
failover takes place, an endpoint might need a certain amount of time When the primary path becomes inactive, the next HB can be
to use the primary path again. This can cause undesirable effects in transmitted only seconds later. Meanwhile, the primary path may have
case of spurious failover. If we choose smaller value for recovered. In such situations, post failover, an endpoint is forced
HB.interval, the traffic used for path probing in a session will be to wait on the order of seconds before the endpoint can resume
increased. transmission on the primary path.
The advantage of tuning Path.Max.Retrans is that it requires no The advantage of tuning Path.Max.Retrans is that it requires no
modification to the current standard, although it needs to ignore modification to the current standard. However, as we discuss above
several recommendations. In addition, some research results indicate tuning Path.Max.Retrans ignores several recommendations in [RFC4960].
path bouncing caused by spurious failover does not cause serious In addition, some research results indicate path bouncing caused by
problems. We discuss the effect of path bouncing in the section 5. spurious failover does not cause serious problems. We discuss the
effect of path bouncing in Section 5.2.
4.2. Adjust RTO related parameters 4.2. Adjust RTO related parameters
As several research results indicate, we can also shorten the As several research results indicate, we can also shorten the
duration of failover process by adjusting RTO related parameters duration of failover process by adjusting RTO related parameters
[JUNGMAIER02] [FALLON08]. During failover process, RTO keeps being [JUNGMAIER02] [FALLON08]. During failover process, RTO keeps being
doubled. However, if we can choose smaller value for RTO.max, we can doubled. However, if we can choose smaller value for RTO.max, we can
stop the exponential growth of RTO at some point. Also, choosing stop the exponential growth of RTO at some point. Also, choosing
smaller values for RTO.initial or RTO.min can contribute to keep RTO smaller values for RTO.initial or RTO.min can contribute to keep RTO
value small. value small.
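To illustrate how RTO.max limits the backoff, the sketch below lists the successive timer values for a given number of consecutive timeouts. The function name rto_backoff is hypothetical, and the 10-second cap is an arbitrary value chosen for illustration, not a recommendation from either draft revision.

```python
def rto_backoff(rto, timeouts, rto_max):
    """List the RTO used for each consecutive timeout, doubling after every
    expiry and never exceeding rto_max."""
    values = []
    for _ in range(timeouts):
        values.append(rto)
        rto = min(rto * 2, rto_max)
    return values


default_cap = rto_backoff(1.0, 6, rto_max=60.0)
small_cap = rto_backoff(1.0, 6, rto_max=10.0)
print(default_cap, sum(default_cap))  # [1.0, 2.0, 4.0, 8.0, 16.0, 32.0] 63.0
print(small_cap, sum(small_cap))      # [1.0, 2.0, 4.0, 8.0, 10.0, 10.0] 35.0
```

Under these assumptions the smaller cap shortens the time to six consecutive timeouts from 63 to 35 seconds. As the text notes, a cap chosen without knowledge of the path can instead cause spurious timeouts.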
Similar to reducing Path.Max.Retrans, the advantage of this approach Similar to reducing Path.Max.Retrans, the advantage of this approach
is that it requires no modification to the current standard, although is that it requires no modification to the current specification,
it needs to ignore several recommendations. However, this approach although it needs to ignore several recommendations described in the
requires to have enough knowledge about the network characteristics Section 15 of [RFC4960]. However, this approach requires to have
between end points. Otherwise, it can introduce adverse side-effects enough knowledge about the network characteristics between end
such as spurious timeouts. points. Otherwise, it can introduce adverse side-effects such as
spurious timeouts.
5. Proposed Solution: SCTP with Potentially-Failed Destination State 5. SCTP with Potentially-Failed Destination State (SCTP-PF)
(SCTP-PF)
5.1. SCTP-PF Description 5.1. SCTP-PF Description
Our proposal stems from the following two observations about SCTP's SCTP-PF stems from the following two observations about SCTP's
failure detection procedure: failure detection procedure:
o In order to minimize performance impact during failover, the o In order to minimize performance impact during failover, the
sender should avoid transmitting data to the failed destination as sender should avoid transmitting data to the failed destination as
early as possible. In the current SCTP path management scheme, early as possible. In the current SCTP path management scheme,
the sender stops transmitting data to a destination only after the the sender stops transmitting data to a destination only after the
destination is marked Failed. Thus, a smaller PMR value is ideal destination is marked Failed. Thus, a smaller PMR value is ideal
so that the sender transitions a destination to the Failed state so that the sender transitions a destination to the Failed state
quicker. quicker.
skipping to change at page 8, line 34 skipping to change at page 8, line 33
are preferable to avoid spurious failure detection. are preferable to avoid spurious failure detection.
From the above observations it is clear that tweaking the PMR value From the above observations it is clear that tweaking the PMR value
involves the following tradeoff -- a lower value improves performance involves the following tradeoff -- a lower value improves performance
but increases the chances of spurious failure detection, whereas a but increases the chances of spurious failure detection, whereas a
higher value degrades performance and reduces spurious failure higher value degrades performance and reduces spurious failure
detection in a wide range of path conditions. Thus, tweaking the detection in a wide range of path conditions. Thus, tweaking the
association's PMR value is an incomplete solution to address association's PMR value is an incomplete solution to address
performance impact during failure. performance impact during failure.
We propose a new "Potentially-failed" (PF) destination state in This proposal introduces a new "Potentially-failed" (PF) destination
SCTP's path management procedure. The PF state was originally state in SCTP's path management procedure. The PF state was
proposed to improve CMT performance [NATARAJAN09]. The PF state is originally proposed to improve CMT performance [NATARAJAN09]. The PF
an intermediate state between Active and Failed states. SCTP's state is an intermediate state between Active and Failed states.
failure detection procedure is modified to include the PF state. The SCTP's failure detection procedure is modified to include the PF
new failure detection algorithm assumes that loss detected by a state. The new failure detection algorithm assumes that loss
timeout implies either severe congestion or failure en-route. After detected by a timeout implies either severe congestion or failure en-
a single timeout on a path, a sender is unsure, and marks the route. After a number of consecutive timeouts on a path, the sender
corresponding destination as PF. A PF destination is not used for is unsure, and marks the corresponding destination as PF. A PF
data transmission except in special cases (discussed below). The new destination is not used for data transmission except in special cases
failure detection algorithm requires only sender-side changes. (discussed below). The new failure detection algorithm requires only
Details are: sender-side changes. Details are:
1. The sender maintains a new tunable parameter called Potentially- 1. The sender maintains a new tunable parameter called Potentially-
failed.Max.Retrans (PFMR). The recommended value of PFMR = 0 failed.Max.Retrans (PFMR). The recommended value of PFMR = 0
when quick failover is used. When an association's PFMR >= PMR, when quick failover is used. When PFMR is larger than or equal to
quick failover is turned off. PMR, quick failover is turned off.
2. Each time the T3-rtx timer expires on an active or idle 2. Each time the T3-rtx timer expires on an active destination, the
destination, the error counter of that destination address will error counter of that destination address will be incremented.
be incremented. When the value in the error counter exceeds
PFMR, the endpoint should mark the destination transport address When the value in the error counter exceeds PFMR, the endpoint
as PF. SCTP MUST NOT send any notification to the upper layer should mark the destination transport address as PF.
about the Active to PF state transition.
3. The sender SHOULD avoid data transmission to PF destinations. 3. The sender SHOULD avoid data transmission to PF destinations.
When all destinations are in either PF or Inactive state, the When all destinations are in either PF or Inactive state, the
sender MAY either move the destination from PF to Active state sender MAY either move the destination from PF to Active state
(and transmit data to the active destination) or the sender MAY (and transmit data to the active destination) or the sender MAY
transmit data to a PF destination. In the former scenario, (i) transmit data to a PF destination. In the former scenario, (i)
the sender MUST NOT notify the ULP about the state transition, the sender MUST NOT notify the ULP about the state transition,
and (ii) MUST NOT clear the destination's error counter. It is and (ii) MUST NOT clear the destination's error counter. It is
recommended that the sender picks the PF destination with least recommended that the sender picks the PF destination with least
error count (fewest consecutive timeouts) for data transmission. error count (fewest consecutive timeouts) for data transmission.
skipping to change at page 9, line 33 skipping to change at page 9, line 29
4. Only heartbeats MUST be sent to PF destination(s) once per RTO. 4. Only heartbeats MUST be sent to PF destination(s) once per RTO.
This means the sender SHOULD ignore HB.interval for PF This means the sender SHOULD ignore HB.interval for PF
destinations. If a heartbeat is unanswered, the sender destinations. If a heartbeat is unanswered, the sender
increments the error counter and exponentially backs off the RTO increments the error counter and exponentially backs off the RTO
value. If error counter is less than PMR, the sender SHOULD value. If error counter is less than PMR, the sender SHOULD
transmit another heartbeat immediately after T3-timer expiration. transmit another heartbeat immediately after T3-timer expiration.
5. When the sender receives a heartbeat ACK from a PF destination, 5. When the sender receives a heartbeat ACK from a PF destination,
the sender clears the destination's error counter and transitions the sender clears the destination's error counter and transitions
the PF destination back to Active state. This state transition the PF destination back to Active state. The sender should
MUST NOT be notified to the ULP. This destination's cwnd is set perform slow-start as specified in Section 7.2.1 of [RFC4960]
to 1 MTU. Note that in scenarios where the destination was when it sends data on this destination.
temporarily congested during the T3-timer expiration, an SCTP
sender transmits 1 MTU worth of data while an SCTP-PF sender
transmits an HB after the T3-timer expiry (more details in
Section 5 of [NATARAJAN09]). The SCTP sender has 1 RTT head-
start in cwnd evolution compared to SCTP-PF sender. An SCTP-PF
sender may set cwnd to 2 MTUs after receiving HB-ACK in order to
offset this performance difference.
6. An additional (PMR - PFMR) consecutive timeouts on a PF 6. Additional (PMR - PFMR) consecutive timeouts on a PF destination
destination confirm the path failure, upon which the destination confirm the path failure, upon which the destination transitions
transitions to the Inactive state. As described in [RFC4960], to the Inactive state. As described in [RFC4960], the sender (i)
the sender (i) SHOULD notify ULP about this state transition, and SHOULD notify ULP about this state transition, and (ii) transmit
(ii) transmit heartbeats to the Inactive destination at a lower heartbeats to the Inactive destination at a lower frequency as
frequency as described in Section 8.3 of [RFC4960]. described in Section 8.3 of [RFC4960].
7. When all destinations are in the Inactive state, the sender picks 7. When all destinations are in the Inactive state, the sender picks
one of the Inactive destinations for data transmission. This one of the Inactive destinations for data transmission. This
proposal recommends that the sender picks the Inactive proposal recommends that the sender picks the Inactive
destination with least error count (fewest consecutive timeouts) destination with least error count (fewest consecutive timeouts)
for data transmission. In case of a tie (multiple Inactive for data transmission. In case of a tie (multiple Inactive
destinations with same error count), the sender MAY choose the destinations with same error count), the sender MAY choose the
last active destination. last active destination.
8. ACKs for retransmissions do not transition a PF destination back 8. ACKs for retransmissions do not transition a PF destination back
to Active state, since a sender cannot disambiguate whether the to Active state, since a sender cannot disambiguate whether the
ack was for the original transmission or the retransmission(s). ack was for the original transmission or the retransmission(s).
9. SCTP shall provide the means to expose the PF state of its
destinations, and SCTP SHOULD notify the ULP of the state
transitions from Active to PF and from PF to Active state. SCTP
can provide the means to suppress exposure of the PF state and
association state transitions, in which case the ULP MAY make
SCTP suppress exposure of the PF state to the ULP. The ULP
will then rely solely on the [RFC4960] state machine even if the
quick failover function is activated in SCTP.
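The destination selection in step 7 can be sketched in C as follows. This is only an illustrative sketch: the record layout and the names struct dest, error_count, was_last_active and pick_inactive_dest are assumptions made for this example and are not defined by this document or by [RFC4960]. The tie-break towards the last active destination is a MAY in the text and is simply the choice made here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-destination record; field names are illustrative only. */
struct dest {
    unsigned error_count;  /* consecutive timeouts seen on this destination */
    int was_last_active;   /* nonzero if this was the last Active destination */
};

/*
 * Called when every destination is Inactive. Returns the index of the
 * destination with the least error count. On a tie, the last active
 * destination is preferred (one permitted choice, assumed here).
 */
size_t pick_inactive_dest(const struct dest *d, size_t n)
{
    assert(d != NULL && n > 0);
    size_t best = 0;
    for (size_t i = 1; i < n; i++) {
        int fewer = d[i].error_count < d[best].error_count;
        int tie = d[i].error_count == d[best].error_count;
        if (fewer || (tie && d[i].was_last_active && !d[best].was_last_active)) {
            best = i;
        }
    }
    return best;
}
```

With error counts 3, 2 and 2, where the third destination was the last active one, the sketch selects the third destination.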
5.2. Effect of Path Bouncing 5.2. Effect of Path Bouncing
The methods described above can accelerate failover process. Hence, The methods described above can accelerate the failover process.
it might introduce path bouncing effect which keeps changing the data Hence, they might introduce the path bouncing effect where the sender
transmission path frequently. This sounds harmful for data transfer, keeps changing the data transmission path frequently. This sounds
however several research results indicate that there is no serious harmful to the data transfer, however several research results
problem with SCTP in terms of path bouncing effect [CARO04] [CARO05]. indicate that there is no serious problem with SCTP in terms of path
bouncing effect [CARO04] [CARO05].
There are two main reasons for this. First, SCTP is basically There are two main reasons for this. First, SCTP is basically
designed for multipath communication, which means SCTP maintains all designed for multipath communication, which means SCTP maintains all
path related parameters (cwnd, ssthresh, RTT, error count, etc) per path related parameters (CWND, ssthresh, RTT, error count, etc) per
each destination address. These parameters cannot be affected by each destination address. These parameters cannot be affected by
path bouncing. In addition, when SCTP migrates to another path, it path bouncing. In addition, when SCTP migrates the data transfer to
starts with minimal cwnd because of slow-start. Hence, there is another path, it starts with the minimal or the initial CWND. Hence,
little chance for packet reordering or duplicating. there is little chance for packet reordering or duplicating.
Second, even if all communication paths between end-nodes share the Second, even if all communication paths between the end-nodes share
same bottleneck, the proposed method does not make situations worse. the same bottleneck, the quick failover results in a behavior already
In case of congestion, the current standard tries to transmit data allowed by [RFC4960].
packets to the primary during failover, while the proposed method
tries to explore other destinations. In any case, the same amount of
data packets sent to the same bottleneck.
5.3. Permanent Failover 5.3. Permanent Failover
draft-ietf-tsvwg-sctp-failover-02:

Post failover, an SCTP sender migrates back to the original primary
destination once this destination becomes active. The sender sets
cwnd to the initial cwnd value and performs slow start. [CARO02]
shows that the switch over to the original primary may degrade SCTP
performance compared to continuing data transmission on the same
path, especially in scenarios where this path's characteristics are
better. In order to mitigate this performance degradation, permanent
failover was proposed in [CARO02]. Permanent failover allows SCTP to
remain on the alternative path even if the primary path becomes
active again. We recommend that SCTP-PF should stick to the standard
RFC4960 behavior, i.e., switch back to the original primary once this
destination becomes active again. Permanent failover may be
considered in the future based on discussions and consensus within
the community.

5.4. Handling of Association Error Counter

When multiple destinations are in the PF state, the sender may
transmit heartbeats to multiple destinations at the same time. This
allows an SCTP-PF sender to track and respond to network status
changes more quickly than an SCTP sender. However, when all PF
destinations become unavailable, an SCTP-PF sender has outstanding
HBs on all destinations and therefore increases the count for
the total number of consecutive retransmissions faster than the SCTP
sender. SCTP-PF's faster increase in the error count will result in
association termination sooner than SCTP. The key difference between
SCTP and SCTP-PF with regard to this feature is whether path status
is checked sequentially or concurrently; the number of packets sent
for probing is the same.

For deployments where aggressive failure detection and association
termination is not desired, we suggest that AMR be set to the
recommended maximum value (sum of PMRs of all paths), to delay
association termination during SCTP-PF. Another option is to send
retransmitted data or HB to only one PF destination at a time, but
this approach may delay path status tracking. Excluding HB timeouts
from incrementing the error count can also be a solution; however,
this requires an update to Section 8.1 and Section 8.3 of [RFC4960],
otherwise special logic for the error counter needs to be
implemented for SCTP-PF.

draft-ietf-tsvwg-sctp-failover-03:

After failover, per [RFC4960] behavior, an SCTP sender migrates
the traffic back to the original primary destination once this
destination becomes active again. As the CWND towards the original
primary destination has to be rebuilt once data transfer resumes, the
switch back to the original primary path is not always optimal.
Indeed, [CARO02] shows that the switch over to the original primary
may degrade SCTP performance compared to continuing data transmission
on the same path, especially, but not only, in scenarios where this
path's characteristics are better. In order to mitigate this
performance degradation, Permanent Failover operation was proposed in
[CARO02]. When SCTP changes the destination due to failover,
Permanent Failover marks it as the new primary. This means Permanent
Failover allows the SCTP sender to continue data transmission to the
path even after the old primary destination becomes active again.
This is achieved by having SCTP perform a switchover of the primary
path to an alternative working path rather than having SCTP switch
back data transfer to the (previous) primary path.

The switchover operation that is optimal in a given scenario depends
on the relative quality of a set primary path versus the quality of
the alternative paths available, as well as on the extent to which
it is desired for the mode of operation to enforce traffic
distribution over a number of network paths. I.e., load distribution
of traffic from multiple SCTP associations may be enforced by
distributing the set primary paths with [RFC4960] switchback
operation. However, as [RFC4960] switchback behavior is suboptimal in
certain situations, especially in scenarios where a number of equally
good paths are available, it is recommended for SCTP to also support,
as alternative behavior, the Permanent Failover modes of operation
where forced switch back to a previously failed primary path is not
always performed. The Permanent Failover operation requires only
sender side changes. Details, as originally outlined in [CARO02],
are:

1. The sender maintains a new tunable parameter, called
Primary.Switchover.Max.Retrans (PSMR). When the path error
counter on a set primary path exceeds PSMR, the SCTP
implementation autonomously selects and sets a new primary path.

2. The primary path selected by the SCTP implementation shall be the
path which at the given time would be chosen for data transfer.
A previously failed primary path may come into use as the data
transfer path as per normal path selection when the present data
transfer path fails.

3. The recommended value of PSMR is PFMR when Permanent Failover is
used. This means that no forced switchback to a previously
failed primary path is performed.

4. It must be possible to disable the Permanent Failover and obtain
the standard switchback operation of [RFC4960].

We recommend that SCTP-PF should stick to the standard RFC4960
behavior as default, i.e., switch back to the old primary destination
once the destination becomes active again. However, implementors MAY
implement Permanent Failover and MAY enable it based on network
configurations or users' requests.
6. Socket API Considerations 6. Socket API Considerations
This section describes how the socket API defined in [RFC6458] is This section describes how the socket API defined in [RFC6458] is
extended to provide a way for the application to control the quick extended to provide a way for the application to control and observe
failover behavior. the quick failover behavior.
Please note that this section is informational only. Please note that this section is informational only.
draft-ietf-tsvwg-sctp-failover-02:

A socket API implementation based on [RFC6458] is extended by adding
a new read/write socket option for the level IPPROTO_SCTP and the
name SCTP_PEER_ADDR_THLDS as described below. This socket option is
used to read/write the value of the PFMR parameter described in
Section 5.

Support for the SCTP_PEER_ADDR_THLDS socket option also needs to be
added to the function sctp_opt_info().

draft-ietf-tsvwg-sctp-failover-03:

A socket API implementation based on [RFC6458] is, by means of the
existing SCTP_PEER_ADDR_CHANGE event, extended to provide an event
notification when a peer address enters or leaves the potentially
failed state. The socket API implementation is also extended to
expose the potentially failed state of a peer address in the existing
SCTP_GET_PEER_ADDR_INFO structure.

Furthermore, two new read/write socket options for the level
IPPROTO_SCTP with the names SCTP_PEER_ADDR_THLDS and
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE are defined as described below.
The first socket option is used to control the values of the PFMR and
PSMR parameters described in Section 5. The second one controls the
exposure of the potentially failed path state.

Support for the SCTP_PEER_ADDR_THLDS and
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE socket options also needs to be
added to the function sctp_opt_info().
6.1. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) socket option 6.1. Support for the Potentially Failed Path State
As defined in [RFC6458], the SCTP_PEER_ADDR_CHANGE event is provided
if the status of a peer address changes. In addition to the state
changes described in [RFC6458], this event is also provided if a
peer address enters or leaves the potentially failed state. The
notification as defined in [RFC6458] uses the following structure:
struct sctp_paddr_change {
uint16_t spc_type;
uint16_t spc_flags;
uint32_t spc_length;
struct sockaddr_storage spc_aaddr;
uint32_t spc_state;
uint32_t spc_error;
sctp_assoc_t spc_assoc_id;
};
[RFC6458] defines the constants SCTP_ADDR_AVAILABLE,
SCTP_ADDR_UNREACHABLE, SCTP_ADDR_REMOVED, SCTP_ADDR_ADDED, and
SCTP_ADDR_MADE_PRIM to be provided in the spc_state field. In
addition, this document defines the new constant
SCTP_ADDR_POTENTIALLY_FAILED, which is reported if the affected
address becomes potentially failed.
The SCTP_GET_PEER_ADDR_INFO socket option defined in [RFC6458] can be
used to query the state of a peer address. It uses the following
structure:
struct sctp_paddrinfo {
sctp_assoc_t spinfo_assoc_id;
struct sockaddr_storage spinfo_address;
int32_t spinfo_state;
uint32_t spinfo_cwnd;
uint32_t spinfo_srtt;
uint32_t spinfo_rto;
uint32_t spinfo_mtu;
};
[RFC6458] defines the constants SCTP_UNCONFIRMED, SCTP_ACTIVE, and
SCTP_INACTIVE to be provided in the spinfo_state field. In
addition, this document defines the new constant
SCTP_POTENTIALLY_FAILED, which is reported if the peer address is
potentially failed.
6.2. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket Option
Applications can control the quick failover behavior by getting or Applications can control the quick failover behavior by getting or
setting the number of timeouts before a peer address is considered setting the number of consecutive timeouts before a peer address is
potentially failed or unreachable. considered potentially failed or unreachable and before the primary
path is changed automatically. This socket option uses the level
IPPROTO_SCTP and the name SCTP_PEER_ADDR_THLDS.
The following structure is used to access and modify the thresholds: The following structure is used to access and modify the thresholds:
struct sctp_paddrthlds { struct sctp_paddrthlds {
sctp_assoc_t spt_assoc_id; sctp_assoc_t spt_assoc_id;
struct sockaddr_storage spt_address; struct sockaddr_storage spt_address;
uint16_t spt_pathmaxrxt; uint16_t spt_pathmaxrxt;
uint16_t spt_pathpfthld; uint16_t spt_pathpfthld;
uint16_t spt_pathcpthld;
}; };
spt_assoc_id: This parameter is ignored for one-to-one style spt_assoc_id: This parameter is ignored for one-to-one style
sockets. For one-to-many style sockets the application may fill sockets. For one-to-many style sockets the application may fill
in an association identifier or SCTP_FUTURE_ASSOC for this query. in an association identifier or SCTP_FUTURE_ASSOC. It is an error
It is an error to use SCTP_{CURRENT|ALL}_ASSOC in spt_assoc_id. to use SCTP_{CURRENT|ALL}_ASSOC in spt_assoc_id.
spt_address: This specifies which peer address is of interest. If a spt_address: This specifies which peer address is of interest. If a
wildcard address is provided, this socket option applies to all wildcard address is provided, this socket option applies to all
current and future peer addresses. current and future peer addresses.
spt_pathmaxrxt: Each peer address of interest is considered spt_pathmaxrxt: Each peer address of interest is considered
unreachable, if its path error counter exceeds spt_pathmaxrxt. unreachable, if its path error counter exceeds spt_pathmaxrxt.
spt_pathpfthld: Each peer address of interest is considered spt_pathpfthld: Each peer address of interest is considered
potentially failed, if its path error counter exceeds potentially failed, if its path error counter exceeds
spt_pathpfthld. spt_pathpfthld.
spt_pathcpthld: Each peer address of interest is no longer considered
the primary remote address if its path error counter exceeds
spt_pathcpthld. Using a value of 0xffff disables the selection of
a new primary peer address. If an implementation does not support
the automatic selection of a new primary address, it should
indicate an error with errno set to EINVAL if a value different
from 0xffff is used in spt_pathcpthld.
6.3. Exposing the Potentially Failed Path State
(SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option
Applications can control the exposure of the potentially failed path
state in the SCTP_PEER_ADDR_CHANGE event and the
SCTP_GET_PEER_ADDR_INFO socket option as described in Section 6.1.
The default value is implementation specific.
This socket option uses the level IPPROTO_SCTP and the name
SCTP_EXPOSE_POTENTIALLY_FAILED_STATE.
The following structure is used to control the exposure of the
potentially failed path state:
struct sctp_assoc_value {
sctp_assoc_t assoc_id;
uint32_t assoc_value;
};
assoc_id: This parameter is ignored for one-to-one style sockets.
For one-to-many style sockets the application may fill in an
association identifier or SCTP_FUTURE_ASSOC. It is an error to
use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.
assoc_value: The potentially failed path state is exposed if and
only if this parameter is non-zero.
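The effect of assoc_value on the state reported to the application can be sketched in C as follows. This is an illustrative sketch under an assumption that the text above does not state explicitly: when exposure is disabled, a potentially failed address is reported as active, which matches the [RFC4960] view of a destination whose error counter has not exceeded the path threshold. The constants ST_ACTIVE, ST_POTENTIALLY_FAILED, ST_INACTIVE and the function reported_state are hypothetical names and values for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state values; the real constants come from the socket API. */
enum { ST_ACTIVE = 0, ST_POTENTIALLY_FAILED = 1, ST_INACTIVE = 2 };

/*
 * Return the state shown to the application. assoc_value plays the role of
 * the assoc_value field: non-zero exposes the potentially failed state.
 * Assumption: a hidden potentially failed state is reported as active.
 */
int32_t reported_state(int32_t internal_state, uint32_t assoc_value)
{
    assert(internal_state >= ST_ACTIVE && internal_state <= ST_INACTIVE);
    if (internal_state == ST_POTENTIALLY_FAILED && assoc_value == 0) {
        return ST_ACTIVE;
    }
    return internal_state;
}
```

Only the potentially failed state is affected by the option in this sketch; the other states pass through unchanged.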
7. Security Considerations 7. Security Considerations
There are no new security considerations introduced in this document. There are no new security considerations introduced in this document.
8. IANA Considerations 8. IANA Considerations
This document does not create any new registries or modify the rules This document does not create any new registries or modify the rules
for any existing registries managed by IANA. for any existing registries managed by IANA.
9. References 9. References
skipping to change at line 536 skipping to change at page 19, line 38
Email: acaro@bbn.com Email: acaro@bbn.com
Paul D. Amer Paul D. Amer
University of Delaware University of Delaware
Computer Science Department - 434 Smith Hall Computer Science Department - 434 Smith Hall
Newark, DE 19716-2586 Newark, DE 19716-2586
USA USA
Email: amer@udel.edu Email: amer@udel.edu
Karen E. E. Nielsen
Ericsson
Kistavaegen 25
Stockholm, 164 80
Sweden
Email: karen.nielsen@tieto.com
 End of changes. 51 change blocks. 
202 lines changed or deleted 328 lines changed or added
