draft-ietf-shim6-reach-detect-00.txt   draft-ietf-shim6-reach-detect-01.txt 
INTERNET-DRAFT Iljitsch van Beijnum INTERNET-DRAFT Iljitsch van Beijnum
Jul 11, 2005 Jul 11, 2005
Shim6 Reachability Detection Shim6 Reachability Detection
draft-ietf-shim6-reach-detect-00.txt draft-ietf-shim6-reach-detect-01.txt
Status of this Memo Status of this Memo
By submitting this Internet-Draft, each author represents that any By submitting this Internet-Draft, each author represents that any
applicable patent or other IPR claims of which he or she is aware applicable patent or other IPR claims of which he or she is aware
have been or will be disclosed, and any of which he or she becomes have been or will be disclosed, and any of which he or she becomes
aware will be disclosed, in accordance with Section 6 of BCP 79. aware will be disclosed, in accordance with Section 6 of BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF), its areas, and its working groups. Note that Task Force (IETF), its areas, and its working groups. Note that
skipping to change at page 1, line 31 skipping to change at page 6, line ?
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
The list of current Internet-Drafts can be accessed at The list of current Internet-Drafts can be accessed at
http://www.ietf.org/ietf/1id-abstracts.txt http://www.ietf.org/ietf/1id-abstracts.txt
The list of Internet-Draft Shadow Directories can be accessed at The list of Internet-Draft Shadow Directories can be accessed at
http://www.ietf.org/shadow.html. http://www.ietf.org/shadow.html.
This Internet Draft expires Jan 11, 2006. This Internet Draft expires April 24, 2006.
Copyright Notice Copyright Notice
Copyright (C) The Internet Society (2005). All Rights Reserved. Copyright (C) The Internet Society (2005). All Rights Reserved.
Abstract Abstract
This draft discusses the issues of detecting failures in a currently The shim6 working group is developing a mechanism that allows
used address pair between two hosts and picking a new address pair to multihoming by using multiple addresses. When communication between
be used when a failure occurs. The inputs for these processes are the initially chosen addresses for a transport session is no longer
ordered lists of local and remote addresses that are reasonably likely possible, a "shim" layer makes it possible to switch to a different
to work. (I.e., they do not include addresses that are known to be unreachable set of addresses without breaking current transport protocol
for local reasons.) These lists must be available at both ends of the assumptions. This draft discusses the issues of detecting failures
communication, although the ordering may differ. Building these address in a currently used address pair between two hosts and picking a
lists from locally available information and synchronizing them with new address pair to be used when a failure occurs. The inputs for
the remote end are outside the scope of this document. these processes are ordered lists of local and remote addresses
that are reasonably likely to work. (I.e., they do not include addresses
This text is for the most part based on discussions on the multi6 list, that are known to be unreachable for local reasons.) These lists
several multi6 design team lists and the shim6 list, with notable must be available at both ends of the communication, although the
contributions from Erik Nordmark and Marcelo Bagnulo. ordering may differ. Building these address lists from locally
available information and synchronizing them with the remote end
are outside the scope of this document.
Suggestions and additions are more than welcome. This text is for the most part based on discussions on the multi6
list, several multi6 design team lists and the shim6 list, with
notable contributions from Erik Nordmark, Marcelo Bagnulo and Jari
Arkko. Suggestions and additions are more than welcome.
1 Introduction 1 Introduction
The most widespread mechanisms to ensure reachability in current A naive implementation of an (un)reachability detection mechanism
protocols are: could just probe all possible paths between two hosts periodically.
A "path" is defined as a combination of a source address for host A
and a destination address for host B. In hop-by-hop forwarding the
source address doesn't have any effect on reachability, but in the
presence of filters or source address based routing, it may. And
although links almost always work in two directions, routing
protocols and filters only work in one direction so unidirectional
reachability can happen. Without additional mechanisms, the
practice of ingress filtering by ISPs makes unidirectional
connectivity likely. Being able to use the working leg in a
unidirectional path is useful, but it's not an essential requirement.
It is essential, however, to avoid assuming bidirectional
connectivity when there is in fact a unidirectional failure.
- Acknowledgments. For instance, in TCP each segment received is Exploring the full set of communication options between two hosts
acknowledged immediately or after a short delay. Lack of that both have two or more addresses is an expensive operation as
acknowledgments leads to retransmissions, and eventually, session the number of combinations to be explored increases very quickly
timeouts. with the number of addresses. For instance, with two addresses on
both sides, there are four possible address pairs. Since we can't
assume that reachability in one direction automatically means
reachability for the complement pair in the other direction, the
total number of two-way combinations is eight. (Combinations = nA *
nB * 2.)
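The count above can be reproduced with a short enumeration. The
following is a minimal sketch in Python; the function name and the
example addresses (taken from the 2001:db8::/32 documentation prefix)
are assumptions made for illustration and do not appear in the draft.

```python
from itertools import product

def one_way_paths(addrs_a, addrs_b):
    """Enumerate every (source, destination) combination in both directions."""
    a_to_b = list(product(addrs_a, addrs_b))  # host A sending to host B
    b_to_a = list(product(addrs_b, addrs_a))  # host B sending to host A
    return a_to_b + b_to_a

# Hypothetical addresses: two per host.
host_a = ["2001:db8:a::1", "2001:db8:b::1"]
host_b = ["2001:db8:c::2", "2001:db8:d::2"]

paths = one_way_paths(host_a, host_b)
print(len(paths))  # nA * nB * 2
```

Each direction is listed separately because reachability from A to B
says nothing about reachability from B to A.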
- Keepalives. In routing protocols it's customary to send keepalives at An important observation in multihoming is that failures are
periodic intervals and look for either responses to local keepalives relatively infrequent, so that a path that worked a few seconds ago
or for keepalives generated by the other side. If no keepalives or is very likely to work now as well. So it makes sense to have a
responses were received for some time the other side is declared light-weight protocol that confirms existing reachability, and only
unreachable. invoke the much heavier protocol that can determine full
reachability when there is a suspected failure.
- Monitoring and probing. IPv6 Neighbor Unreachability Detection 2 Determining reachability for the current pair
monitors the progress of higher layer protocols, and in the absence
of progress, probes the other side (when on-link) or the next hop
with a directed neighbor solicitation message. If there is no answer,
the other side (on-link) or router is declared unreachable.
None of these mechanisms seems like a good candidate to adopt for Reachability for the currently used address pair in a shim context
end-to-end reachability detection, either because they duplicate is determined by making sure that whenever there is data traffic in
existing mechanisms or introduce unnecessary overhead. one direction, there is also traffic in the other direction. This
can be data traffic as well, but also transport layer
acknowledgments or a shim reachability keepalive if there is no
other traffic. This way, it is no longer possible to have traffic
in only one direction, so whenever there is data traffic going out,
but there are no return packets, there must be a failure, so the
full path exploration mechanism is started.
In addition, exploring the full set of communication options between A more detailed description of the current pair reachability
two hosts that both have two or more addresses is an expensive evaluation mechanism:
operation as the number of combinations to be explored increases very
quickly with the number of addresses. For instance, with two addresses
on both sides, there are four possible address pairs. Since we can't
assume that reachability in one direction automatically means
reachability for the complement pair in the other direction, the total
number of two-way combinations is eight. (Combinations = nA * nB * 2.)
Although links almost always work in two directions, routing protocols
and filters only work in one direction so unidirectional reachability
can happen. Without additional mechanisms, the practice of ingress
filtering by ISPs makes unidirectional connectivity likely.
In order to reduce packet overhead, it makes sense to have different 1. The base timing unit for this mechanism is named ShimKeepT.
on-the-wire protocols for confirming existing reachability and full Until a mechanism to negotiate different values for
exploration of potential reachability. ShimKeepT becomes available, a value of 10 seconds for ShimKeepT MUST be
used.
2 Determining reachability for the current pair 2. Whenever outgoing packets are generated that are part of a shim
context, one of two timestamps belonging to the shim context is
updated: the timestamp for outgoing data packets, or the timestamp
for outgoing non-data packets. The difference between the two is
that data packets are packets that should generate return traffic.
The host should use the information available to it to determine
whether a packet is a data or a non-data packet. Examples of
non-data packets are TCP ACKs and shim keepalive packets. If there
is any doubt, a packet should be considered a data packet.
In discussions two models came up for determining whether the current 3. Whenever incoming packets are received that are part of a shim
address pair used in ongoing communication still works. context, one of two timestamps belonging to the shim context is
updated: the timestamp for incoming data packets, or the timestamp
of incoming non-data packets. For incoming packets, it's less
critical that packets are labeled as data or non-data correctly. In
the absence of better information, hosts may assume that any IPv6
packet with a payload length field with a value of 20 or lower is a
non-data packet.
The first model resembles IPv6 neighbor unreachability detection (NUD). 4. ShimKeepT seconds after the last data packet has been received
The idea is that when transport protocols see forward progress, they for a context, and if no other packet has been sent within this
inform the shim layer (positive feedback) and the shim layer doesn't context since the data packet has been received, a shim keepalive
take any action. However, in the absence of positive feedback and in packet is generated for the context in question and transmitted to
the presence of outgoing traffic, the shim layer generates packets that the correspondent. The shim keepalive packet consists of an IPv6
probe reachability. When the correspondent receives a probe, it sends header and a shim header containing the context tag, but no
back an acknowledgment so the shim layer at the originating host knows subsequent headers. Intermediate headers may be present between the
the address pair is still functional. When there are no acknowledgments IPv6 and shim headers. A host may send the shim keepalive after
for several probes, a full reachability exploration is executed. fewer than ShimKeepT seconds if implementation considerations
warrant this. The average time after which shim keepalives are sent
must be at least ShimKeepT / 2 seconds. After potentially sending a
single shim keepalive, no additional shim keepalives are sent until
a data packet is received within this shim context. If the shim
keepalive wasn't sent because a data or non-data packet was sent
since the last received data packet, no shim keepalives are sent.
The second model ensures that all communication is bidirectional. So 5. When after a timeout period since the last transmission of a
when communication isn't bidirectional, there must be a failure and data packet no packets were received from the correspondent within
again, a full reachability exploration is executed. Although most this context, a full reachability exploration is started. The
protocols generate traffic in both directions most of the time, there timeout period is ShimKeepT seconds plus additional time to
are times when there is only legitimate traffic in one direction and accommodate for a round trip and regular variations in
not the other. The shim layer monitors incoming and outgoing packets, network-related functions. In the absence of better information, a
and when there are incoming packets but no regular outgoing data timeout of at least ShimKeepT + 2 seconds but no more than
packets, the shim generates keepalive packets. So when there is ShimKeepT + 5 seconds is recommended.
outgoing traffic, there must be either regular incoming traffic, or
keepalives generated by the other side. If not, there is probably a
failure so the full reachability exploration procedure is executed.
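The five steps of the current pair reachability evaluation in the new
version can be sketched as a small per-context state holder. This is
only an illustration under stated assumptions: the class and method
names are hypothetical, time is passed in explicitly as seconds, the
exploration timeout of ShimKeepT + 3 is one arbitrary value inside the
recommended range, and step 5 is read literally (the timer runs from
the last transmitted data packet).

```python
SHIM_KEEP_T = 10.0                    # seconds, the value required in step 1
EXPLORE_TIMEOUT = SHIM_KEEP_T + 3.0   # assumed; recommended range is +2 to +5

class ShimContext:
    """Hypothetical holder for the four timestamps of steps 2 and 3."""

    def __init__(self):
        self.last_data_out = None
        self.last_nondata_out = None
        self.last_data_in = None
        self.last_nondata_in = None

    def on_send(self, now, is_data=True):
        # Step 2: when in doubt, callers should treat a packet as data.
        if is_data:
            self.last_data_out = now
        else:
            self.last_nondata_out = now

    def on_receive(self, now, is_data=True):
        # Step 3: classification of incoming packets is less critical.
        if is_data:
            self.last_data_in = now
        else:
            self.last_nondata_in = now

    def keepalive_due(self, now):
        # Step 4: data was received ShimKeepT ago and nothing was sent since.
        if self.last_data_in is None or now - self.last_data_in < SHIM_KEEP_T:
            return False
        outs = [t for t in (self.last_data_out, self.last_nondata_out)
                if t is not None]
        return not outs or max(outs) < self.last_data_in

    def exploration_due(self, now):
        # Step 5: data was sent, the timeout passed, nothing came back.
        if self.last_data_out is None:
            return False
        if now - self.last_data_out < EXPLORE_TIMEOUT:
            return False
        ins = [t for t in (self.last_data_in, self.last_nondata_in)
               if t is not None]
        return not ins or max(ins) < self.last_data_out
```

A caller that sends a keepalive would record it with
on_send(now, is_data=False), which by itself suppresses further
keepalives until the next data packet arrives.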
There are several different tradeoffs between the two models: 3 Address pair exploration
- In the first model, the sending host detects the problem, in the In its essence, address pair exploration is very simple: just send
second model, the receiving host detects the problem probes using every possible address pair, wait for something to
come back and possibly consider the round trip time. In practice,
testing the full combination of all source addresses and all
destination addresses is very undesirable because of the large
number of packets involved. This can be especially harmful when a
lot of hosts on a link start doing this for many of their
correspondents at the same time when there is a failure further
upstream.
- In the first model, a host can detect problems in either direction In order to arrive at a desired outcome more quickly and with fewer
packets, and also to accommodate traffic engineering needs, we'll
assume a model where each address (source or destination) has two
preference values: p1 and p2. Addresses within the same set (source
or destination) are ranked by their p1 value, where a higher p1
means that the address is more preferred. When there are multiple
addresses with the same p1 value, an address is selected at random
from the group with the same p1 value, where the likelihood of
selecting any given address is relative to its p2 value compared to
the sum of all p2 values. So if addresses A, B and C have the same
p1 value and p2 values of 10, 30 and 60 for a total of 100, the
chance that A is selected is 10%, the chance that B is selected is
30% and the chance that C is selected is 60%.
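The p1/p2 selection rule can be illustrated with a weighted random
choice. The sketch below is an assumption about how an implementation
might do this, not part of the draft: the tuple layout and function
name are invented for the example, and address D is added only to
show that a lower p1 value is never selected.

```python
import random
from collections import Counter

def select_address(addresses, rng=random):
    """Pick from the highest-p1 group, weighted by p2.

    addresses: list of (name, p1, p2) tuples (hypothetical layout).
    """
    top_p1 = max(p1 for _, p1, _ in addresses)
    group = [(name, p2) for name, p1, p2 in addresses if p1 == top_p1]
    names = [name for name, _ in group]
    weights = [p2 for _, p2 in group]
    return rng.choices(names, weights=weights, k=1)[0]

# A, B and C share the same p1; D has a lower p1 and is never chosen.
candidates = [("A", 5, 10), ("B", 5, 30), ("C", 5, 60), ("D", 1, 100)]

rng = random.Random(1)  # fixed seed so the tally is repeatable
tally = Counter(select_address(candidates, rng) for _ in range(10000))
print(tally)
```

Over many draws the tally should approach the 10%, 30% and 60% shares
given in the text.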
- In the second model, a host can only detect problems in the receiving Note that preference information may be related to type of service.
direction so it must depend on the correspondent to detect problems So different contexts with different type of service requirements
in the other direction may see different p1 and p2 values for a given address.
- The first model generates traffic in both directions, possibly When a host suspects that there is a failure for a context, it
competing with payload traffic in the high-volume direction gathers the set of possible source addresses and the set of
possible destination addresses. Both sets are ordered such that
each next address has an equal or lower p1 value. Addresses with
the same p1 value are further ordered as per any heuristics that
the host may employ, such as longest prefix matches on known
working and/or known not working addresses along with the p2 value.
The p2 value is considered relatively weak, and breaking p2
ordering is allowed if there is a sufficient reason for this.
However, in the absence of other information, p2 ordering should be
used. P1 ordering overrules any other information except a recent
reachability failure for the address in question. In addition to
this, the most recently used address is put in front of the list.
- The second model only generates traffic in the no-traffic direction, From the lists of eligible source and destination addresses, the
so there is never competition with payload traffic host creates a list of source/destination address pairs, along with
a combined preference value for this address pair. The calculation
of the preference value is implementation specific, with the only
requirement being that when one address pair has a higher p1 for
both the source and destination address than another pair, the pair
with the higher p1 values also has a higher combined pair
preference value.
- In absence of upper layer protocol feedback, the first model always The list of address pairs from different contexts is combined into
sends periodic probes a host-wide list of address pairs. The preference values are
updated to take into consideration the number of contexts that are
interested in the pair. The specifics of calculating the resulting
host-wide preference value are left up to the implementation, but
implementations SHOULD try, within reason, to avoid using address
pairs with lower p1 values when pairs with higher p1 values are
available for a context. Context-specific address pair preferences
may be normalized prior to calculating host-wide address pair
preference values. (So when context A has pairs P and Q with p1
values 10 and 1, while context B has pairs R and S with p1 values 7
and 4, the values for P and R are changed to 2 and the values for Q
and S to 1.)
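The normalization in the parenthetical example replaces each p1 value
by its rank within its own context. A minimal sketch, assuming that
rank-based normalization is what is intended (the draft leaves the
exact method to the implementation) and using a hypothetical function
name:

```python
def normalize_p1(pair_p1):
    """Map each p1 value to its rank within one context (1 = lowest)."""
    ranks = {value: rank
             for rank, value in enumerate(sorted(set(pair_p1.values())), 1)}
    return {pair: ranks[value] for pair, value in pair_p1.items()}

context_a = {"P": 10, "Q": 1}
context_b = {"R": 7, "S": 4}

print(normalize_p1(context_a))
print(normalize_p1(context_b))
```

After this step, P and R compare as equals in the host-wide list, as
do Q and S, regardless of the scale each context used.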
- The second model doesn't require upper layer protocol feedback to The host now starts probing address pairs, in order from the pair
suppress keepalives with the highest pair preference to the pair with the lowest pair
preference. When all address pairs have been tested, testing
restarts from the pair with the highest preference. New pairs that
become available are put in the list before pairs that have been
probed already, regardless of the preference values. However, both
the group of address pairs that haven't been probed and the group
of address pairs that have may be reordered to reflect the
preference values, as long as reordering is done such that
starvation doesn't occur.
There have been some discussions about positive versus negative When a probe is answered by the correspondent, the contexts that use
feedback. The first model doesn't have any use for negative feedback, the address pair in question are informed so they can start
but needs positive feedback to reduce overhead. The second model has remapping addresses in outgoing packets to the pair in question. (All
little or no use for positive feedback, but may use negative feedback of this also happens when there is a working pair but an address
pair with at least one address with a higher preference is
determined to work.) At this point, the context updates its list of
address pairs to probe by removing all pairs where either the
source address has a lower p1 value than the p1 value of the now
working source address, or the destination address has a lower p1
value than the p1 value of the now working destination address.
Additionally, all address pairs where the p1 values for the source
and destination addresses match the respective p1 values of the
source and destination addresses in the now working pair are
removed from the list. The host-wide list of address pairs to probe
is updated to reflect the removal of lower or equal priority
addresses, so probing will only continue for pairs where at least
one address has a higher p1 than the currently working pair.
to detect failures faster. However, using negative feedback from upper The time between probes (ShimProbeT) must be chosen such that the
layer protocols may prove challenging because upper layers can't be number of probes is limited to 60 per 300 second period. When no
trusted to provide the right quality or quantity of feedback ("feedback probes have been sent for some time, an implementation may send the
spamming"). initial group of probes at a fairly aggressive rate. For instance,
when no probes have been sent for 60 seconds, a host may send a
second probe 200 ms after the first one, and increase the
ShimProbeT by a factor of 1.25 after every probe, until ShimProbeT
reaches 5 seconds. This results in sending 5 probes in the first 2
seconds and/or 14 probes within the first 20 seconds after a
failure. After that, there is one probe every 5 seconds.
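The example schedule can be checked with a few lines of Python. This
is a sketch under assumptions: the function name and parameters are
hypothetical, the first probe is placed at time zero, and the limit
of 60 probes per 300 seconds is not enforced here. The counts in the
text match the schedule when the probes that follow the first one are
counted.

```python
def probe_send_times(horizon_s, first_interval_s=0.2, factor=1.25,
                     max_interval_s=5.0):
    """Return probe send times (seconds) up to and including horizon_s."""
    times = [0.0]                 # first probe, sent when failure is suspected
    interval = first_interval_s   # ShimProbeT before the second probe
    t = interval
    while t <= horizon_s:
        times.append(t)
        # Grow ShimProbeT after every probe, capped at the maximum.
        interval = min(interval * factor, max_interval_s)
        t += interval
    return times

times = probe_send_times(20.0)
follow_ups_2s = [t for t in times if 0.0 < t <= 2.0]
follow_ups_20s = [t for t in times if 0.0 < t <= 20.0]
print(len(follow_ups_2s), len(follow_ups_20s))
```

An implementation would still need a separate counter to stay within
the overall probe rate limit.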
3 Address pair exploration When a context didn't see any outgoing data packets (see section 2)
for four minutes, it removes all its address pairs from the
host-wide list of address pairs.
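The pruning that a context performs once a probe confirms a working
pair (described earlier in this section) can be sketched as a filter
over the remaining candidates. The Pair type and the function name
are assumptions made for this example.

```python
from typing import NamedTuple

class Pair(NamedTuple):
    """Hypothetical record: a pair name plus the p1 of each address."""
    name: str
    src_p1: int
    dst_p1: int

def prune(candidates, working):
    """Keep only pairs that could still improve on the working pair."""
    kept = []
    for pair in candidates:
        # Drop pairs with a less preferred source or destination.
        if pair.src_p1 < working.src_p1 or pair.dst_p1 < working.dst_p1:
            continue
        # Drop pairs that merely match the working pair's p1 values.
        if pair.src_p1 == working.src_p1 and pair.dst_p1 == working.dst_p1:
            continue
        kept.append(pair)
    return kept

working = Pair("W", 5, 5)
candidates = [
    Pair("low_src", 4, 9),
    Pair("same", 5, 5),
    Pair("better_dst", 5, 7),
    Pair("better_both", 8, 8),
    Pair("low_dst", 9, 1),
]
print([p.name for p in prune(candidates, working)])
```

Every pair that survives has at least one address with a higher p1
than the working pair and none with a lower one.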
In its essence, address pair exploration is very simple: just send 4 Address pair exploration packet format
probes using every possible address pair, wait for something to come
back and possibly consider the round trip time.
In practice, doing a full address pair exploration is very undesirable The address pair exploration packet may be encapsulated in
because of the large number of packets involved. This can be especially different ways. An obvious way is inside a shim header. The address
harmful when a lot of hosts on a link start doing this for many of pair exploration packet contains the following information:
their correspondents at the same time when there is a failure further
upstream.
At this time, we don't have a clear vision of what this protocol should - A type field that is at least 8 bits long
look like, except that it should be conservative in the number of - An 8 bit "number of probes sent" field
packets it transmits in average-case scenarios, and that it's vitally - An 8 bit "number of probes received" field
important to reject very bad paths or address pairs. - An 8 bit "options length" field
- One or more sent probes (see below)
- Zero or more received probes (see below)
- Zero or more bytes of option data
Since the failures that have the largest potential to generate a lot of There is currently one bit in the type field defined: the reply
local address pair exploration are the ones where a link that's used requested bit. If this bit is set, the other side should send a
for a lot of different sessions breaks, it makes sense to somehow probe in reply to this probe.
generalize results for one correspondent into optimizations in the
address exploration with another correspondent.
A promising way to avoid bad paths would be to send out a first probe, The option data contains zero or more options in the following
wait for about a round trip for the old working path and then send format:
another probe, and after that do an exponential backoff. If either the
first or the second pair were reasonable choices, there is a workable
solution within several round trips.
4 Granularity - An 8 bit option type
- An 8 bit option length
- Zero or more bytes of data in this option
It has not been determined what the association/multiplexing Sent and received probes contain data in the following format:
granularity of shim6 will be: host-to-host,
upper-layer-identity-to-upper-layer-identity (ULID) or session. By its - Source locator/address (128 bits)
nature, the reachability detection works on address or locator pairs. - Destination locator/address (128 bits)
It would be highly inefficient if each session, or even each ULID pair, - Sent timestamp (32 bits in ms resolution relative to private epoch)
would do its own address pair exploration. On the other hand, it would - Time between reception and retransmission (32 bits in ms resolution,
also be undesirable to force all sessions or ULID associations between 0 on first transmission)
two hosts to use the same address pairs. This probably means that when - Nonce (32 bits)
a failure is determined, all sessions or associations should act - Sequence number (32 bits)
accordingly, but when reachability is determined, each session or
association may react according to its own preferences. The first and only mandatory sent probe structure contains the
addresses that are present in the current IPv6 packet along with a
timestamp for the current time. Additional probe structures contain
copies of earlier probes, presumably toward different addresses,
with the appropriate field indicating how long ago the probe in
question was sent. The received probes are copies of the last seen
probes from the other side.
Note that an application must be able to infer which addresses
belong to the same host in order to perform this probing correctly.
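To make the field sizes of the new version's packet format concrete,
the following sketch packs one exploration packet with Python's
struct module. Several points are assumptions, since the draft does
not fix them: an 8 bit type field, the reply requested bit at 0x01,
network byte order, the fields laid out in the order listed, and all
function and constant names.

```python
import struct
from ipaddress import IPv6Address

REPLY_REQUESTED = 0x01                  # assumed bit position in the type field
HEADER = struct.Struct("!BBBB")         # type, probes sent, probes received, options length
PROBE = struct.Struct("!16s16sIIII")    # src, dst, sent ts, hold time, nonce, sequence

def pack_probe(src, dst, sent_ms, hold_ms, nonce, seq):
    """Pack one sent or received probe structure (48 bytes)."""
    return PROBE.pack(IPv6Address(src).packed, IPv6Address(dst).packed,
                      sent_ms, hold_ms, nonce, seq)

def pack_packet(sent, received=(), options=b"", reply_requested=False):
    """Pack the header followed by the probes and the option data."""
    if not sent:
        raise ValueError("at least one sent probe is mandatory")
    type_field = REPLY_REQUESTED if reply_requested else 0
    header = HEADER.pack(type_field, len(sent), len(received), len(options))
    return header + b"".join(sent) + b"".join(received) + options

# Hypothetical values; addresses use the documentation prefix.
probe = pack_probe("2001:db8::1", "2001:db8::2",
                   sent_ms=1000, hold_ms=0, nonce=0xDEADBEEF, seq=1)
packet = pack_packet([probe], reply_requested=True)
print(len(probe), len(packet))
```

With these assumptions each probe structure occupies 48 bytes, so the
8 bit counters allow a receiver to locate the option data without
parsing the probes themselves.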
5 NAT and firewall considerations 5 NAT and firewall considerations
Since shim6 is chartered for IPv6 solutions only, and NAT compatibility Since shim6 is chartered for IPv6 solutions only, and NAT
is not expected, and by most people, not desired in IPv6, there is no compatibility is not expected, and by most people, not desired in
requirement for this protocol to pass through Network Address IPv6, there is no requirement for this protocol to pass through
Translation devices. However, the protocol may be applicable outside Network Address Translation devices. However, the protocol may be
shim6, making NAT compatibility desirable. applicable outside shim6, making NAT compatibility desirable.
It is absolutely essential that the shim6 negotiations and the It is absolutely essential that the shim6 negotiations and the
reachability detection packets are passed through filters or firewalls reachability detection packets are passed through filters or
wherever application packets are passed through. If the shim6 firewalls wherever application packets are passed through. If the
negotiation and reachability detection packets are filtered out, shim6 shim6 negotiation and reachability detection packets are filtered
can't be used. out, shim6 can't be used.
A more complex situation arises when the shim6 negotiation packets pass A more complex situation arises when the shim6 negotiation packets
through a firewall, but the reachability detection packets are blocked. pass through a firewall, but the reachability detection packets are
To avoid this complexity, it's highly desirable to make the shim6 blocked. To avoid this complexity, it's highly desirable to make
negotiation and reachability detection part of the same protocol, so the shim6 negotiation and reachability detection part of the same
either both are allowed through or both are blocked. However, the same protocol, so either both are allowed through or both are blocked.
is true if this reachability detection mechanism is used in other However, the same is true if this reachability detection mechanism
protocols. This makes it desirable to define the reachability detection is used in other protocols. This makes it desirable to define the
protocol such that it can be embedded in other protocols. reachability detection protocol such that it can be embedded in
other protocols.
Since firewalls are in wide use, it's important to consider whether a Since firewalls are in wide use, it's important to consider whether
new protocol will be able to pass through most firewalls without a new protocol will be able to pass through most firewalls without
requiring changes to the filter configuration. On the other hand, it requiring changes to the filter configuration. On the other hand,
may not be possible to come up with a protocol that would be allowed it may not be possible to come up with a protocol that would be
through a large percentage of all firewalls without changes, so extra allowed through a large percentage of all firewalls without
effort in this area may produce limited results. Also, in the long run changes, so extra effort in this area may produce limited results.
firewall configuration will presumably be changed, so any compromises Also, in the long run firewall configuration will presumably be
would only have short term benefits but long term downsides. changed, so any compromises would only have short term benefits but
long term downsides.
6 Security considerations 6 Security considerations
To avoid exposing information (even if it's just the fact that an To avoid exposing information (even if it's just the fact that an
address is reachable), hosts will probably want to limit themselves to address is reachable), hosts will probably want to limit themselves
taking part in reachability detection with known correspondents. This to taking part in reachability detection with known correspondents.
means that there must be identifying information and a nonce that is at This means that there must be identifying information and a nonce
least hard to guess but easy to check in all reachability detection that is at least hard to guess but easy to check in all
packets. reachability detection packets.
4 Document and author information 4 Document and author information
This document expires January, 2006. The latest version will always be This document expires April, 2006. The latest version will always
available at http://www.muada.com/drafts/. Comments are welcome at: be available at http://www.muada.com/drafts/. Comments are welcome
at:
Iljitsch van Beijnum Iljitsch van Beijnum
Email: iljitsch@muada.com Email: iljitsch@muada.com
Intellectual Property Statement Intellectual Property Statement
The IETF takes no position regarding the validity or scope of any The IETF takes no position regarding the validity or scope of any
Intellectual Property Rights or other rights that might be claimed to Intellectual Property Rights or other rights that might be claimed to
pertain to the implementation or use of the technology described in pertain to the implementation or use of the technology described in
 End of changes. 39 change blocks. 
163 lines changed or deleted 280 lines changed or added
