draft-ietf-tsvwg-sctp-failover-05.txt   draft-ietf-tsvwg-sctp-failover-06.txt 
Network Working Group Y. Nishida Network Working Group Y. Nishida
Internet-Draft GE Global Research Internet-Draft GE Global Research
Intended status: Experimental P. Natarajan Intended status: Experimental P. Natarajan
Expires: January 21, 2015 Cisco Systems Expires: April 26, 2015 Cisco Systems
A. Caro A. Caro
BBN Technologies BBN Technologies
P. Amer P. Amer
University of Delaware University of Delaware
K. Nielsen K. Nielsen
Ericsson Ericsson
July 20, 2014 October 23, 2014
Quick Failover Algorithm in SCTP Quick Failover Algorithm in SCTP
draft-ietf-tsvwg-sctp-failover-05.txt draft-ietf-tsvwg-sctp-failover-06.txt
Abstract Abstract
One of the major advantages of SCTP is that it supports multi-homed One of the major advantages of SCTP is that it supports multi-homed
communication. A multi-homed SCTP end-point has the ability to communication. A multi-homed SCTP end-point has the ability to
withstand network failures by migrating the traffic from an inactive withstand network failures by migrating the traffic from an inactive
network to an active one. However, if the [RFC4960] specified network to an active one. However, if the [RFC4960] specified
failover operation is followed there can be a significant delay in failover operation is followed there can be a significant delay in
the migration to the active destination addresses, thus severely the migration to the active destination addresses, thus severely
reducing the effectiveness of SCTP multi-homed operation. reducing the effectiveness of SCTP multi-homed operation.
skipping to change at page 2, line 15 skipping to change at page 2, line 15
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on January 21, 2015. This Internet-Draft will expire on April 26, 2015.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Conventions and Terminology . . . . . . . . . . . . . . . . . 3 2. Conventions and Terminology . . . . . . . . . . . . . . . . . 4
3. Issues with the SCTP Path Management . . . . . . . . . . . . 4 3. Issues with the SCTP Path Management . . . . . . . . . . . . 4
4. SCTP with Potentially-Failed Destination State (SCTP-PF) . . 5 4. SCTP with Potentially-Failed Destination State (SCTP-PF) . . 5
4.1. SCTP-PF Description . . . . . . . . . . . . . . . . . . . 5 4.1. SCTP-PF Concept . . . . . . . . . . . . . . . . . . . . . 5
4.2. Permanent Failover . . . . . . . . . . . . . . . . . . . 9 4.2. SCTP-PF Algorithm Detail . . . . . . . . . . . . . . . . 6
4.3. Optional Feature: Permanent Failover . . . . . . . . . . 9
5. Socket API Considerations . . . . . . . . . . . . . . . . . . 10 5. Socket API Considerations . . . . . . . . . . . . . . . . . . 10
5.1. Support for the Potentially Failed Path State . . . . . . 11 5.1. Support for the Potentially Failed Path State . . . . . . 11
5.2. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket 5.2. Peer Address Thresholds (SCTP_PEER_ADDR_THLDS) Socket
Option . . . . . . . . . . . . . . . . . . . . . . . . . 12 Option . . . . . . . . . . . . . . . . . . . . . . . . . 12
5.3. Exposing the Potentially Failed Path State 5.3. Exposing the Potentially Failed Path State
(SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option . . 13 (SCTP_EXPOSE_POTENTIALLY_FAILED_STATE) Socket Option . . 13
6. Security Considerations . . . . . . . . . . . . . . . . . . . 13 6. Security Considerations . . . . . . . . . . . . . . . . . . . 13
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14
8. References . . . . . . . . . . . . . . . . . . . . . . . . . 14 8. Proposed Change of Status (to be Deleted before Publication) 14
8.1. Normative References . . . . . . . . . . . . . . . . . . 14 9. References . . . . . . . . . . . . . . . . . . . . . . . . . 14
8.2. Informative References . . . . . . . . . . . . . . . . . 14 9.1. Normative References . . . . . . . . . . . . . . . . . . 14
9.2. Informative References . . . . . . . . . . . . . . . . . 14
Appendix A. Discussions of Alternative Approaches . . . . . . . 15 Appendix A. Discussions of Alternative Approaches . . . . . . . 15
A.1. Reduce Path.Max.Retrans (PMR) . . . . . . . . . . . . . . 15 A.1. Reduce Path.Max.Retrans (PMR) . . . . . . . . . . . . . . 15
A.2. Adjust RTO related parameters . . . . . . . . . . . . . . 16 A.2. Adjust RTO related parameters . . . . . . . . . . . . . . 16
Appendix B. Discussions for Path Bouncing Effect . . . . . . . . 16 Appendix B. Discussions for Path Bouncing Effect . . . . . . . . 16
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 17
1. Introduction 1. Introduction
The Stream Control Transmission Protocol (SCTP) as specified in The Stream Control Transmission Protocol (SCTP) as specified in
[RFC4960] supports multihoming at the transport layer -- an SCTP [RFC4960] supports multihoming at the transport layer -- an SCTP
skipping to change at page 3, line 41 skipping to change at page 3, line 44
for SCTP that improves the SCTP performance during a failover. for SCTP that improves the SCTP performance during a failover.
Also the operation after a failover impacts the performance of the Also the operation after a failover impacts the performance of the
protocol. With [RFC4960] procedures, SCTP will, after a failover protocol. With [RFC4960] procedures, SCTP will, after a failover
from the primary path, switch back to use the primary path for data from the primary path, switch back to use the primary path for data
transfer as soon as this path becomes available. From a performance transfer as soon as this path becomes available. From a performance
perspective, as confirmed in research [CARO02], such a switchback of perspective, as confirmed in research [CARO02], such a switchback of
the data transmission path is not optimal in general. As an optional the data transmission path is not optimal in general. As an optional
alternative to the switchback operation of [RFC4960], this document alternative to the switchback operation of [RFC4960], this document
specifies for SCTP to support the Permanent Failover switchover specifies for SCTP to support the Permanent Failover switchover
procedures proposed by [CARO02]. procedures proposed by [CARO02]. Additional discussion of an
alternative approach that does not require modifications to [RFC4960]
and of path bouncing effects that might be caused by frequent switchover
is provided in the Appendices.
2. Conventions and Terminology 2. Conventions and Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119]. document are to be interpreted as described in [RFC2119].
3. Issues with the SCTP Path Management 3. Issues with the SCTP Path Management
This section describes issues in the current SCTP to be fixed by the This section describes issues in the current SCTP to be fixed by the
skipping to change at page 5, line 7 skipping to change at page 5, line 12
value for Path.Max.Retrans in the standard is 5, which requires 6 value for Path.Max.Retrans in the standard is 5, which requires 6
consecutive timeouts before failover takes place. Before SCTP consecutive timeouts before failover takes place. Before SCTP
switches to the secondary address, SCTP keeps trying to send packets switches to the secondary address, SCTP keeps trying to send packets
to the primary and only retransmitted packets are sent to the to the primary and only retransmitted packets are sent to the
secondary and can thus reach the receiver. This slow secondary and can thus reach the receiver. This slow
failover process can cause significant performance degradation and failover process can cause significant performance degradation and
will not be acceptable in some situations. will not be acceptable in some situations.
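The cost of waiting for 6 consecutive timeouts can be estimated with a short calculation. The following is only a rough sketch under stated assumptions: the RTO is taken to be 1 second at the first timeout (the RTO.Min default of [RFC4960]), it doubles on every timeout, and it is capped at 60 seconds (the RTO.Max default of [RFC4960]). The function name failover_delay and its parameters are illustrative and do not appear in either draft.

```python
def failover_delay(rto_start, consecutive_timeouts, rto_max=60.0):
    """Sum the waiting time of back-to-back timeouts with exponential backoff.

    Assumption: each timeout doubles the RTO, capped at rto_max.
    """
    total = 0.0
    rto = rto_start
    for _ in range(consecutive_timeouts):
        total += rto                 # wait for the current RTO to expire
        rto = min(rto * 2, rto_max)  # back off for the next attempt
    return total


# Path.Max.Retrans = 5 means the 6th consecutive timeout triggers failover.
pmr = 5
standard_delay = failover_delay(1.0, pmr + 1)

# For comparison, the delay until a single timeout has occurred.
single_timeout_delay = failover_delay(1.0, 1)

print(standard_delay, single_timeout_delay)
```

Under these assumptions the sender waits 1 + 2 + 4 + 8 + 16 + 32 = 63 seconds before the primary destination is marked inactive, compared with 1 second for a single timeout. Real delays depend on the measured RTO and on the traffic pattern.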
Another issue is that once the primary path is active again, the Another issue is that once the primary path is active again, the
traffic is switched back. This is not optimal in some situations. traffic is switched back. This is not optimal in some situations.
This is further discussed in Section 4.2. This is further discussed in Section 4.3.
4. SCTP with Potentially-Failed Destination State (SCTP-PF) 4. SCTP with Potentially-Failed Destination State (SCTP-PF)
In order to address the issues described in Section 3, we propose to To address the issues described in Section 3, this section updates
update the SCTP path management scheme as follows. the SCTP path management scheme with the Potentially Failed state and
associated Quick Failover operation. We use the term SCTP-PF to
denote the resulting SCTP path management operation.
4.1. SCTP-PF Description 4.1. SCTP-PF Concept
SCTP-PF stems from the following two observations about SCTP's SCTP-PF as defined stems from the following two observations about
failure detection procedure: SCTP's failure detection procedure:
o In order to minimize performance impact during failover, the o To minimize performance impact during failover, the sender should
sender should avoid transmitting data to the failed destination as avoid transmitting data to the failed destination as early as
early as possible. In the current SCTP path management scheme, possible. In the current SCTP path management scheme, the sender
the sender stops transmitting data to a destination only after the stops transmitting data to a destination only after the
destination is marked Failed (inactive). Thus, a smaller PMR destination is marked Failed (inactive). Thus, a smaller PMR
value is ideal so that the sender transitions a destination to the value is ideal so that the sender transitions a destination to the
Failed (inactive) state quicker. Failed (inactive) state quicker.
o Smaller PMR values increase the chances of spurious failure o Smaller PMR values increase the chances of spurious failure
detection where the sender incorrectly marks a destination as detection where the sender incorrectly marks a destination as
Failed (inactive) during periods of temporary congestion. As Failed (inactive) during periods of temporary congestion. As
[RFC4960] recommends coupling the PMR value and the AMR [RFC4960] recommends coupling the PMR value and the AMR
value, such spurious failure detection risks carrying over to value, such spurious failure detection risks carrying over to
spurious association failure detection and closure. Larger PMR spurious association failure detection and closure. Larger PMR
values are preferable to avoid spurious failure detection. values are preferable to avoid spurious failure detection.
From the above observations it is clear that tweaking the PMR value From the above observations it is clear that tuning the PMR value
involves the following tradeoff -- a lower value improves performance involves the following tradeoff -- a lower value improves performance
but increases the chances of spurious failure detection, whereas a but increases the chances of spurious failure detection, whereas a
higher value degrades performance and reduces spurious failure higher value degrades performance and reduces spurious failure
detection in a wide range of path conditions. Thus, tweaking the detection in a wide range of path conditions. Thus, tuning the
association's PMR value is an incomplete solution to address association's PMR value is an incomplete solution to address
performance impact during failure. performance impact during failure.
This proposal introduces a new "Potentially-Failed" (PF) destination This new method introduces a new "Potentially-Failed" (PF)
state in SCTP's path management procedure. The PF state was destination state in SCTP's path management procedure. The PF state
originally proposed to improve CMT performance [NATARAJAN09]. The PF was originally proposed to improve CMT performance [NATARAJAN09].
state is an intermediate state between Active and Failed states. The PF state is an intermediate state between Active and Failed
SCTP's failure detection procedure is modified to include the PF states. SCTP's failure detection procedure is modified to include
state. The new failure detection algorithm assumes that loss the PF state. The new failure detection algorithm assumes that loss
detected by a timeout implies either severe congestion or failure en- detected by a timeout implies either severe congestion or failure en-
route. After a number of consecutive timeouts on a path, the sender route. After a number of consecutive timeouts on a path, the sender
is unsure, and marks the corresponding destination as PF. A PF is unsure, and marks the corresponding destination as PF. A PF
destination is not used for data transmission except in special cases destination is not used for data transmission except in special cases
(discussed below). The new failure detection algorithm requires only (discussed below). The new failure detection algorithm requires only
sender-side changes. The details are: sender-side changes.
4.2. SCTP-PF Algorithm Detail
SCTP-PF operation is specified as follows:
1. The sender maintains a new tunable parameter called Potentially- 1. The sender maintains a new tunable parameter called Potentially-
Failed.Max.Retrans (PFMR). The recommended value of PFMR = 0 Failed.Max.Retrans (PFMR). The RECOMMENDED value of PFMR = 0
when quick failover is used. When PFMR is larger than or equal to when Quick Failover is used. When PFMR is larger than or equal to
PMR, quick failover is turned off. PMR, Quick Failover is turned off.
2. The error counter of an active destination address is 2. The error counter of an active destination address is
incremented as specified in [RFC4960]. This means that the incremented as specified in [RFC4960]. This means that the
error counter of the destination address will be incremented error counter of the destination address will be incremented
each time the T3-rtx timer expires, or at times where a each time the T3-rtx timer expires, or at times where a
HEARTBEAT sent to an idle, active address is not acknowledged HEARTBEAT sent to an idle, active address is not acknowledged
within an RTO. When the value in the destination address error within an RTO. When the value in the destination address error
counter exceeds PFMR, the endpoint MUST mark the destination counter exceeds PFMR, the endpoint MUST mark the destination
transport address as PF. transport address as PF.
3. The sender SHOULD avoid data transmission to PF destinations. 3. The sender SHOULD avoid data transmission to PF destinations.
When the destinations are all in PF state or some in PF state When the destinations are all in PF state or some in PF state
and some in inactive state, the sender MUST choose one and some in inactive state, the sender MUST choose one
destination in PF state and transmit data to this destination. destination in PF state and transmit data to this destination.
The sender SHOULD choose the destination in PF state with least The sender SHOULD choose the destination in PF state with the
error count (fewest consecutive timeouts) for data transmission lowest error count (fewest consecutive timeouts) for data
and transmit data to this destination. In case of multiple PF transmission and transmit data to this destination. When there
destinations with the same error count, the sender SHOULD let the are multiple PF destinations with the same error count, the sender
choice among the multiple PF destinations with equal error count SHOULD let the choice among the multiple PF destinations with
be based on the [RFC4960], section 6.4.1, principles of choosing equal error count be based on the [RFC4960], section 6.4.1,
most divergent source-destination pairs when executing principles of choosing most divergent source-destination pairs
(potentially consecutive) retransmission. This means that the when executing (potentially consecutive) retransmission. This
sender SHOULD attempt to pick the most divergent source - means that the sender SHOULD attempt to pick the most divergent
destination pair from the last source - destination pair on source - destination pair from the last source - destination
which data were transmitted or retransmitted. Rules for picking pair on which data were transmitted or retransmitted. Rules for
the most divergent source-destination pair are an implementation picking the most divergent source-destination pair are an
decision and are not specified within this document. A sender implementation decision and are not specified within this
MAY choose to deploy other strategies than the above when document. A sender may choose to deploy other strategies than
choosing among multiple PF destinations with equal error count. the above when choosing among multiple PF destinations with
In all cases the sender MUST NOT change the state of the chosen equal error count. In all cases the sender MUST NOT change the
destination and it MUST NOT clear the destination's error state of the chosen destination and it MUST NOT clear the
counter as a result of choosing the destination for data destination's error counter as a result of choosing the
transmission. destination for data transmission.
4. Heartbeats SHOULD be sent to PF destination(s) once per RTO. 4. Heartbeats SHOULD be sent to PF destination(s) once per RTO.
This means the sender MUST ignore HB.interval for PF This means the sender MUST ignore HB.interval for PF
destinations. If a heartbeat is unanswered, the sender SHOULD destinations. If a heartbeat is unanswered, the sender SHOULD
increment the error counter and exponentially back off the RTO increment the error counter and exponentially back off the RTO
value. If the error counter is less than PMR, the sender SHOULD value. If the error counter is less than PMR, the sender SHOULD
transmit another heartbeat immediately after T3-timer transmit another heartbeat immediately after T3-timer
expiration. When data is transmitted to a PF destination the expiration. When data is transmitted to a PF destination, the
transmittal of heartbeats may be omitted as SACK or T3-rtx timer transmission of heartbeats may be omitted as SACK or T3-rtx
expiration can provide equivalent information. This timer expiration can provide equivalent information. It is
specification recommends that heartbeats be sent to PF RECOMMENDED that heartbeats be sent to PF destinations
destinations independently from whether the Path Heartbeat regardless of whether the Path Heartbeat function (Section 8.3
function (Section 8.3 of [RFC4960]) is enabled for the of [RFC4960]) is enabled for the destination address or not.
destination address or not.
5. When the sender receives a heartbeat ACK from a PF destination, 5. When the sender receives a heartbeat ACK from a PF destination,
the sender MUST clear the destination's error counter and the sender MUST clear the destination's error counter and
transition the PF destination back to Active state. When the transition the PF destination back to Active state. When the
sender resumes data transmission on the destination it MUST do sender resumes data transmission on the destination it MUST do
this following the prescriptions of Section 7.2 of [RFC4960]. this following the prescriptions of Section 7.2 of [RFC4960].
6. Additional (PMR - PFMR) consecutive timeouts on a PF destination 6. Additional (PMR - PFMR) consecutive timeouts on a PF destination
confirm the path failure, upon which the destination transitions confirm the path failure, upon which the destination transitions
to the Inactive state. As described in [RFC4960], the sender to the Inactive state. As described in [RFC4960], the sender
(i) SHOULD notify ULP about this state transition, and (ii) (i) SHOULD notify ULP about this state transition, and (ii)
transmit heartbeats to the Inactive destination at a lower transmit heartbeats to the Inactive destination at a lower
frequency as described in Section 8.3 of [RFC4960] (when this frequency as described in Section 8.3 of [RFC4960] (when this
function is enabled for the destination address). function is enabled for the destination address).
7. When all destinations are in inactive state (association dormant 7. When all destinations are in inactive state (association dormant
state) the sender MUST also choose one destination to transmit state) the sender MUST also choose one destination to transmit
data to. The sender SHOULD choose the destination in inactive data to. The sender SHOULD choose the destination in inactive
state with least error count (fewest consecutive timeouts) for state with the lowest error count (fewest consecutive timeouts)
data transmission and transmit data to this destination. In for data transmission and transmit data to this destination.
case of multiple destinations with the same error count in inactive When there are multiple destinations with the same error count in
state, the sender SHOULD attempt to pick the most divergent inactive state, the sender SHOULD attempt to pick the most
source - destination pair from the last source - destination divergent source - destination pair from the last source -
pair on which data were transmitted or retransmitted following destination pair on which data were transmitted or retransmitted
[RFC4960]. Rules for picking the most divergent source- following [RFC4960]. Rules for picking the most divergent
destination pair are an implementation decision and are not source-destination pair are an implementation decision and are
specified within this document. In order to support this not specified within this document. Therefore, a sender SHOULD
prescription a sender SHOULD allow for increment of the allow for incrementing the destination error counters up to some
destination error counters up to some reasonable limit above reasonable limit larger than PMR+1, thus changing the
PMR+1, thus changing the prescriptions of [RFC4960], section prescriptions of [RFC4960], section 8.3, in this respect. The
8.3, in this respect. The exact limit to apply is not specified exact limit to apply is not specified in this document but it is
in this document but it is considered reasonable to require for considered reasonable to require for such to be an order of
such to be an order of magnitude higher than the PMR value. A magnitude higher than the PMR value. A sender MAY choose to
sender MAY choose to deploy other strategies than the above. deploy other strategies than the above. For example, a sender
For example, a sender could choose to prioritize the last active could choose to prioritize the last active destination during
destination during dormant state. The strategy to prioritize dormant state. The strategy to prioritize the last active
the last active destination is optimal when some paths are destination is optimal when some paths are permanently inactive,
permanently inactive, but suboptimal when paths' instability is but suboptimal when paths' instability is transient. While the
transient. While the increment of the error counters above increment of the error counters above PMR+1 is a prerequisite
PMR+1 is a prerequisite for the error counter values to serve to for the error counter values to serve to guide the path
guide the path selection in dormant state, then it is noted that selection in dormant state, then it is noted that by virtue of
by virtue of the introduction of the Potentially Failed state, the introduction of the Potentially Failed state, one may deploy
one may deploy higher values of PMR without compromising the higher values of PMR without compromising the efficiency of the
efficiency of the failover operation, and thus making the failover operation, and thus making the increase of path error
increase of path error counters above PMR+1 less critical as the counters above PMR+1 less critical as the dormant state will be
dormant state will be less likely to happen. The downside of less likely to happen. The downside of increasing the PMR value
increasing the PMR value relative to the AMR value, however, is relative to the AMR value, however, is that the per destination
that the per destination address failure detection and address failure detection and notification of such to ULP
notification of such to ULP thereby is weakened. In all cases thereby is weakened. In all cases the sender MUST NOT change
the sender MUST NOT change the state of the chosen destination the state of the chosen destination and it MUST NOT clear the
and it MUST NOT clear the destination's error counter as a destination's error counter as a result of choosing the
result of choosing the destination for data transmission. destination for data transmission.
8. ACKs for chunks which have been transmitted to multiple 8. ACKs for chunks that have been transmitted to multiple
destinations (i.e., a chunk which has been retransmitted to a destinations (i.e., a chunk which has been retransmitted to a
different destination than the destination to which the chunk different destination than the destination to which the chunk
was first transmitted) SHOULD NOT clear the error count of an was first transmitted) SHOULD NOT clear the error count of an
inactive destination and SHOULD NOT transition a PF destination inactive destination and SHOULD NOT transition a PF destination
back to Active state, since a sender cannot disambiguate whether back to Active state, since a sender cannot disambiguate whether
the ack was for the original transmission or the the ACK was for the original transmission or the
retransmission(s). The same ambiguity concerns the related retransmission(s). The same ambiguity concerns the related
congestion window growth. In this respect then it is specified congestion window growth. The bytes of a newly acknowledged
that bytes of a newly acknowledged chunk which has been chunk which has been transmitted to multiple destinations SHOULD
transmitted to multiple destinations SHOULD, when the conditions be considered for contribution to the congestion window growth
for such contribution are fulfilled following the prescriptions towards the destination where the chunk was last sent. The
of Section 7.2 of [RFC4960], contribute to the congestion window contribution of the acked bytes to the window growth is subject
growth towards the destination to which the chunk was last to the prescriptions described in Section 7.2 of [RFC4960].
transmitted. An SCTP sender MAY apply a different approach for An SCTP sender MAY apply a different approach for
both the error count handling as well as the congestion control both the error count handling and the congestion control growth
growth handling does it have unequivocal information as to handling based on unequivocal information on which destination
which destination (including multiple destinations) the chunk (including multiple destinations) the chunk reached. This
reached. This document makes no reference to what such document makes no reference to what such unequivocal
unequivocal information could consist of, neither how such information could consist of, neither how such unequivocal
unequivocal information could be obtained. The implementation information could be obtained. The implementation of such an
of such an alternative approach is left to implementations. alternative approach is left to implementations.
9. ACKs for chunks which have been transmitted to one destination 9. ACKs for chunks which have been transmitted to one destination
address only MUST clear the error counter of the destination address only MUST clear the error counter of the destination
address and MUST transition a PF destination back to Active address and MUST transition a PF destination back to Active
state. This situation can happen when new data is sent to a state. This situation can happen when new data is sent to a
destination address in PF state. It can also happen in destination address in PF state. It can also happen in
situations where the destination address is in PF state due to situations where the destination address is in PF state due to
the occurrence of a spurious T3-rtx timer and ACKs start to the occurrence of a spurious T3-rtx timer and ACKs start to
arrive for data sent prior to occurrence of the spurious T3-rtx arrive for data sent prior to occurrence of the spurious T3-rtx
and data has not yet been retransmitted towards other and data has not yet been retransmitted towards other
destinations. This document does not specify special handling destinations. This document does not specify special handling
for detection of or reaction to spurious T3-rtx timeouts, e.g., for detection of or reaction to spurious T3-rtx timeouts, e.g.,
for special operation vis-a-vis the congestion control handling for special operation vis-a-vis the congestion control handling
or data retransmission operation towards a destination address or data retransmission operation towards a destination address
which undergoes a transition from active to PF to active state which undergoes a transition from active to PF to active state
due to a spurious T3-rtx timeout. But it is noted that this is due to a spurious T3-rtx timeout. But it is noted that this is
an area which would benefit from additional attention, an area which would benefit from additional attention,
experimentation and specification for Single Homed SCTP as well experimentation and specification for Single Homed SCTP as well
as for Multi Homed SCTP protocol operation. as for Multi Homed SCTP protocol operation.
10. SCTP SHOULD provide the means to expose the PF state of its 10. SCTP stack SHOULD provide the ULP with the means to expose the
destinations as well as the means to notify the ULP of the state PF state of its destinations as well as the means to notify the
transitions from Active to PF and from PF to Active state. When state transitions from Active to PF, and vice-versa. When doing
doing such SCTP MUST provide the means to suppress exposure of this, such SCTP stack MUST provide the ULP with the means to
PF state and association state transitions and in this case the suppress exposure of PF state and association state transitions
ULP MAY make SCTP suppress exposure of PF state to ULP. If as well.
exposure of PF state is suppressed, the ULP will rely solely on
the [RFC4960] state machine even if Quick Failover function is
activated in SCTP.
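
The ACK handling in item 9 and the exposure control in item 10 can be illustrated with a minimal C sketch. It is not taken from any SCTP implementation: the names path_state, struct path, expose_pf and on_ack_for_single_destination are hypothetical, and handling of Inactive destinations is deliberately left out because item 9 does not cover it.

```c
#include <stdbool.h>

/* Hypothetical per-destination state, for illustration only. */
enum path_state { PATH_ACTIVE, PATH_PF, PATH_INACTIVE };

struct path {
    enum path_state state;
    unsigned error_count;   /* path error counter */
    bool expose_pf;         /* item 10: ULP asked to see PF state */
};

/*
 * Called when an ACK arrives for chunks that were transmitted to this
 * destination only (item 9). Clears the error counter and moves a PF
 * destination back to Active. Returns true when a PF -> Active
 * transition happened AND PF exposure is enabled, i.e. when the ULP
 * would be notified under the assumptions of this sketch.
 */
static bool on_ack_for_single_destination(struct path *p)
{
    p->error_count = 0;
    if (p->state == PATH_PF) {
        p->state = PATH_ACTIVE;
        return p->expose_pf;
    }
    /* Other states are left unchanged; item 9 does not address them. */
    return false;
}
```

With exposure suppressed, the transition still takes place internally; only the notification towards the ULP is withheld.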
4.2. Permanent Failover 4.3. Optional Feature: Permanent Failover
Post failover then, by [RFC4960] behavior, an SCTP sender migrates In [RFC4960], an SCTP sender migrates the traffic back to the
the traffic back to the original primary destination once this original primary destination once this destination becomes active
destination becomes active anew. As the CWND towards the original again. As the CWND towards the original primary destination has to
primary destination has to be rebuilt once data transfer resumes, the be rebuilt once data transfer resumes, the switch back to use the
switch back to use the original primary path is not always optimal. original primary path is not always optimal. Indeed [CARO02] shows
Indeed [CARO02] shows that the switch back to the original primary that the switch back to the original primary may degrade SCTP
may degrade SCTP performance compared to continuing data transmission performance compared to continuing data transmission on the same
on the same path, especially, but not only, in scenarios where this path, especially, but not only, in scenarios where this path's
path's characteristics are better. In order to mitigate this characteristics are better. In order to mitigate this performance
performance degradation, Permanent Failover operation was proposed in degradation, Permanent Failover operation was proposed in [CARO02].
[CARO02]. When SCTP changes the destination due to failover, When SCTP changes the destination due to failover, Permanent Failover
Permanent Failover operation allows SCTP sender to continue data operation allows SCTP sender to continue data transmission on the new
transmission on the new working path even if the old primary working path even if the old primary destination becomes active
destination becomes active again. This is achieved by having SCTP again. This is achieved by having SCTP perform a switch over of the
perform a switch over of the primary path to the alternative working primary path to the alternative working path rather than having SCTP
path rather than having SCTP switch back data transfer to the switch back data transfer to the (previous) primary path.
(previous) primary path.
The manner of switch over operation that is most optimal in a given The manner of switch over operation that is most optimal in a given
scenario depends on the relative quality of a set primary path versus scenario depends on the relative quality of a set primary path versus
the quality of alternative paths available as well as it depends on the quality of alternative paths available as well as it depends on
the extent to which it is desired for the mode of operation to the extent to which it is desired for the mode of operation to
enforce traffic distribution over a number of network paths. I.e., enforce traffic distribution over a number of network paths. I.e.,
load distribution of traffic from multiple SCTP associations may be load distribution of traffic from multiple SCTP associations may be
sought to be enforced by distribution of the set primary paths with sought to be enforced by distribution of the set primary paths with
[RFC4960] switchback operation. However as [RFC4960] switchback [RFC4960] switchback operation. However as [RFC4960] switchback
behavior is suboptimal in certain situations, especially in scenarios behavior is suboptimal in certain situations, especially in scenarios
where a number of equally good paths are available, it is recommended where a number of equally good paths are available, it is recommended
for SCTP to support also, as alternative behavior, the Permanent for SCTP to support also, as alternative behavior, the Permanent
Failover switch over modes of operation. Failover switch over modes of operation.
The Permanent Failover operation requires only sender side changes. The Permanent Failover operation requires only sender side changes.
The details are: The details are:
1. The sender maintains a new tunable parameter, called 1. The sender maintains a new tunable parameter, called
Primary.Switchover.Max.Retrans (PSMR). The PSMR SHOULD be set Primary.Switchover.Max.Retrans (PSMR). The PSMR MUST be set
greater than or equal to the PFMR value. Any setting of PSMR < PFMR greater than or equal to the PFMR value. Implementations MUST reject
MUST be rejected by the implementation. any other values of PSMR.
2. When the path error counter on a set primary path exceeds PSMR, 2. When the path error counter on a set primary path exceeds PSMR,
the SCTP implementation autonomously selects and sets a new the SCTP implementation MUST autonomously select and set a new
primary path. primary path.
3. The primary path selected by the SCTP implementation shall be the 3. The primary path selected by the SCTP implementation MUST be the
path which at the given time would be chosen for data transfer. path which at the given time would be chosen for data transfer.
A previously failed primary path may come in use as data transfer A previously failed primary path MAY come in use as data transfer
path as per normal path selection when the present data transfer path as per normal path selection when the present data transfer
path fails. path fails.
4. The recommended value of PSMR is PFMR when Permanent Failover is 4. The recommended value of PSMR is PFMR when Permanent Failover is
used. This means that no forced switchback to a previously used. This means that no forced switchback to a previously
failed primary path is performed. An implementation of Permanent failed primary path is performed. An implementation of Permanent
Failover MUST support set of PSMR = PFMR. An implementation of Failover MUST support the setting of PSMR = PFMR. An
Permanent Failover MAY support setting of PSMR > PFMR. implementation of Permanent Failover MAY support setting of PSMR
> PFMR.
5. It MUST be possible to disable the Permanent Failover and obtain 5. It MUST be possible to disable the Permanent Failover and obtain
the standard switchback operation of [RFC4960]. the standard switchback operation of [RFC4960].
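
The sender-side rules above can be sketched in C as follows. This is only an illustration under stated assumptions: struct pf_config, set_psmr and update_primary are hypothetical names, paths are identified by plain indices, and the choice of the current data transfer path is assumed to be made elsewhere by normal path selection.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical configuration record, for illustration only. */
struct pf_config {
    unsigned pfmr;             /* PotentiallyFailed.Max.Retrans */
    unsigned psmr;             /* Primary.Switchover.Max.Retrans */
    bool permanent_failover;   /* item 5: feature can be disabled */
};

/* Item 1: any setting of PSMR < PFMR is rejected. Returns 0 on success. */
static int set_psmr(struct pf_config *cfg, unsigned psmr)
{
    if (psmr < cfg->pfmr)
        return -1;
    cfg->psmr = psmr;
    return 0;
}

/*
 * Items 2 and 3: when the error counter of the set primary path exceeds
 * PSMR, the path currently chosen for data transfer becomes the new
 * primary. With Permanent Failover disabled, the primary is unchanged
 * and the [RFC4960] switchback behavior applies.
 */
static size_t update_primary(const struct pf_config *cfg, size_t primary,
                             unsigned primary_error_count,
                             size_t current_data_path)
{
    if (cfg->permanent_failover && primary_error_count > cfg->psmr)
        return current_data_path;
    return primary;
}
```

With PSMR equal to PFMR, as item 4 recommends, the primary is switched at the same point at which the old primary enters PF state, so no forced switchback occurs later.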
We recommend that SCTP-PF sticks to the standard RFC4960 switchback This specification RECOMMENDS a default configuration that uses
behavior as default, i.e., switch back to the old primary destination standard RFC4960 switchback, i.e., switch back to the old primary
once the destination becomes active again. However in order to destination once the destination becomes active again. However, to
support optimal operation in a wider range of network scenarios, an support optimal operation in a wider range of network scenarios, an
implementation MAY implement Permanent Failover operation as detailed implementation MAY implement Permanent Failover operation as detailed
above and MAY enable it based on network configurations or users' above and MAY enable it based on network configurations or users'
requests. requests.
5. Socket API Considerations 5. Socket API Considerations
This section describes how the socket API defined in [RFC6458] is This section describes how the socket API defined in [RFC6458] is
extended to provide a way for the application to control and observe extended to provide a way for the application to control and observe
the quick failover behavior. the quick failover behavior.
skipping to change at page 13, line 50 skipping to change at page 13, line 50
assoc_id: This parameter is ignored for one-to-one style sockets. assoc_id: This parameter is ignored for one-to-one style sockets.
For one-to-many style sockets the application may fill in an For one-to-many style sockets the application may fill in an
association identifier or SCTP_FUTURE_ASSOC. It is an error to association identifier or SCTP_FUTURE_ASSOC. It is an error to
use SCTP_{CURRENT|ALL}_ASSOC in assoc_id. use SCTP_{CURRENT|ALL}_ASSOC in assoc_id.
assoc_value: The potentially failed path state is exposed if and assoc_value: The potentially failed path state is exposed if and
only if this parameter is non-zero. only if this parameter is non-zero.
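
The parameter semantics described above can be checked with a small C sketch. It deliberately avoids the system SCTP headers so that it stays self-contained: struct ex_assoc_value is a stand-in for the [RFC6458] association value structure, the EX_* constants are placeholder values, and ex_apply_expose_pf is a hypothetical helper rather than part of any socket API.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder identifiers; real values come from the SCTP headers. */
enum { EX_FUTURE_ASSOC = 0, EX_CURRENT_ASSOC = 1, EX_ALL_ASSOC = 2 };

/* Stand-in for the association id / value pair passed by the caller. */
struct ex_assoc_value {
    uint32_t assoc_id;
    uint32_t assoc_value;
};

/*
 * Applies the option to a hypothetical per-socket flag.
 * - one-to-one style sockets: assoc_id is ignored.
 * - one-to-many style sockets: the "current" and "all" wildcard
 *   identifiers are an error; a specific id or the "future"
 *   identifier is accepted.
 * - PF state is exposed if and only if assoc_value is non-zero.
 * Returns 0 on success, -1 on an invalid assoc_id.
 */
static int ex_apply_expose_pf(bool one_to_one,
                              const struct ex_assoc_value *av,
                              bool *expose_pf)
{
    if (!one_to_one &&
        (av->assoc_id == EX_CURRENT_ASSOC || av->assoc_id == EX_ALL_ASSOC))
        return -1;
    *expose_pf = (av->assoc_value != 0);
    return 0;
}
```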
6. Security Considerations 6. Security Considerations
There are no new security considerations introduced in this document. Security considerations for the use of SCTP and its APIs are
discussed in [RFC4960] and [RFC6458]. There are no new security
considerations introduced in this document.
7. IANA Considerations 7. IANA Considerations
This document does not create any new registries or modify the rules This document does not create any new registries or modify the rules
for any existing registries managed by IANA. for any existing registries managed by IANA.
8. References 8. Proposed Change of Status (to be Deleted before Publication)
8.1. Normative References The initial status of this document was Experimental. However,
because of its usefulness, simple design and the existence of
multiple active implementations, it has been changed to PS by WG
consensus.
9. References
9.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, March 1997. Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC [RFC4960] Stewart, R., "Stream Control Transmission Protocol", RFC
4960, September 2007. 4960, September 2007.
8.2. Informative References 9.2. Informative References
[CARO02] Caro Jr., A., Iyengar, J., Amer, P., Heinz, G., and R. [CARO02] Caro Jr., A., Iyengar, J., Amer, P., Heinz, G., and R.
Stewart, "A Two-level Threshold Recovery Mechanism for Stewart, "A Two-level Threshold Recovery Mechanism for
SCTP", Tech report, CIS Dept, University of Delaware, July SCTP", Tech report, CIS Dept, University of Delaware, July
2002. 2002.
[CARO04] Caro Jr., A., Amer, P., and R. Stewart, "End-to-End [CARO04] Caro Jr., A., Amer, P., and R. Stewart, "End-to-End
Failover Thresholds for Transport Layer Multihoming", Failover Thresholds for Transport Layer Multihoming",
MILCOM 2004, November 2004. MILCOM 2004, November 2004.
 End of changes. 37 change blocks. 
151 lines changed or deleted 168 lines changed or added
