Network Working Group                                        R. Perlman
Internet Draft                                                      Sun
Expires: November 2005                                         J. Touch
                                                                USC/ISI
                                                               A. Yegin
                                                                Samsung
                                                            May 2, 2005



                       RBridges: Transparent Routing
                       draft-perlman-rbridge-03.txt


Status of this Memo

   By submitting this Internet-Draft, each author represents that
   any applicable patent or other IPR claims of which he or she is
   aware have been or will be disclosed, and any of which he or she
   becomes aware will be disclosed, in accordance with Section 6 of
   BCP 79.

   This document may not be modified, and derivative works of it may not
   be created, except to publish it as an RFC and to translate it into
   languages other than English.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at
        http://www.ietf.org/ietf/1id-abstracts.txt

   The list of Internet-Draft Shadow Directories can be accessed at
        http://www.ietf.org/shadow.html

   This Internet-Draft will expire on November 2, 2005.

Copyright Notice

   Copyright (C) The Internet Society (2005).  All Rights Reserved.




Perlman                Expires November 2, 2005                [Page 1]


Internet-Draft      RBridges: Transparent Routing              May 2005


Abstract

   RBridges provide the ability to have an entire campus, with multiple
   physical links, look to IP like a single subnet. The design allows
   for zero configuration of switches within a campus, optimal pair-wise
   routing, safe forwarding even during periods of temporary loops, and
   the ability to cut down on ARP/ND traffic. The design also supports
   VLANs, and allows forwarding tables to be based on RBridge
   destinations (rather than endnode destinations), which allows
   internal routing tables to be substantially smaller than in
   conventional bridge systems.

   This document is a work in progress; we invite you to participate on
   the mailing list at http://www.postel.org/RBridge

Table of Contents


   1. Introduction...................................................3
   2. Detailed RBridge Design........................................5
      2.1. Link State Protocol.......................................5
      2.2. Spanning Tree.............................................6
      2.3. Designated RBridge........................................7
      2.4. Learning Endnode Location.................................8
      2.5. Forwarding Behavior.......................................8
      2.6. Forwarding Header on 802 Links............................8
      2.7. Distributed ARP Query....................................11
   3. RBridge Addresses, Parameters, and Constants..................12
   4. Handling ARP Queries..........................................12
   5. Issues........................................................13
      5.1. How Many Spanning Trees?.................................13
         5.1.1. Per-ingress Spanning Tree...........................13
         5.1.2. Per VLAN............................................13
         5.1.3. Single Spanning Tree................................13
      5.2. Reasons Not to Optimize Handling of IP packets...........14
         5.2.1. Avoiding Encapsulation for On-campus IP Packets.....14
         5.2.2. Avoiding Encapsulation for Off-campus IP Packets....15
      5.3. Supporting Heterogeneous Link Types......................15
      5.4. Effects on L3 TTL........................................15
      5.5. Using L3 encapsulation...................................15
      5.6. Optimizing ARP/ND........................................16
   6. Security Considerations.......................................17
   7. Conclusions...................................................17
   8. Acknowledgments...............................................17
   9. References....................................................17
      9.1. Normative References.....................................17
      9.2. Informative References...................................18
   Author's Addresses...............................................19
   Intellectual Property Statement..................................19
   Disclaimer of Validity...........................................20
   Copyright Statement..............................................20
   Acknowledgment...................................................20

1. Introduction

   In traditional IPv4 and IPv6 networks, each link must have a unique
   prefix.  This means that a node that moves from one link to another
   must change its IP address, and a node with multiple links must have
   multiple addresses.  It also means that a company with many links
   (separated by routers) will have difficulty making full use of its IP
   address block (since any link not fully populated will waste
   addresses), and routers require significant configuration.

   Bridges avoid these problems because bridges can transparently glue
   many physical links into what appears to IP to be a single LAN.
   However, bridges have drawbacks of their own.  Routing via the
   spanning tree concentrates traffic onto selected links.  Bridges
   forward based on a header for which temporary loops (which might
   arise due to topology changes, lost spanning tree messages, or
   components such as repeaters coming up) are very dangerous, because
   there is no hop count in the header and there may be exponential
   proliferation of packets during loops.  Finally, routes cannot be
   pair-wise shortest paths; they are instead whatever paths remain
   after the spanning tree eliminates redundant paths.

   We define the term "campus" to be the set of links connected by any
   combination of RBridges and bridges.  In other words, the campus is
   terminated by traditional IP routers, in the same way that an IP
   subnet would be terminated by an IP router.  A campus will look to IP
   nodes like a single IP subnet, whether the interconnection of the
   links is done with bridges, RBridges, or some combination of the two.

   There have been proposals for having routers within a campus
   automatically number links with distinct IP subnet numbers.  Although
   this makes a campus plug-and-play, it has drawbacks: it requires a
   large number of IP subnet numbers, a node must change its address if
   it moves to a different link, and addresses of nodes might fluctuate
   as the topology changes and links must be renumbered.

   This proposal introduces RBridges [8] (Routing Bridges), which
   combine the advantages of bridges and routers. Like bridges, RBridges
   are zero configuration, and are transparent to IP nodes. Like
   routers, RBridges forward on pair-wise shortest paths, and do not
   have dangerous behavior during temporary loops. RBridges have the
   additional advantage that they can suppress the broadcast/multicast
   for neighbor discovery by doing proxy ARP (IPv4) or proxy ND (IPv6).

   RBridges are fully compatible with current bridges as well as current
   IPv4 and IPv6 routers and endnodes.  They are as invisible to current
   IP routers as bridges are, and like routers, they terminate a bridged
   spanning tree.

   The main idea is to have RBridges run a link state protocol amongst
   themselves (IS-IS is ideal, since its TLV encoding easily allows new
   information to be carried in link state information, as this proposal
   requires, and also makes zero configuration easier because IS-IS does
   not require assigning IP addresses to the RBridges).

   The next step is for RBridges to learn the location of endnodes. They
   can learn the location and layer 2 addresses of attached nodes from
   the source address of data packets, as bridges do. Additionally, in
   order to facilitate proxy ARP or proxy ND optimizations, RBridges can
   also learn the (layer 3, layer 2) addresses of attached IP nodes from
   ARP or ND replies.

   Once an RBridge learns the location of a directly attached endnode,
   it informs the other RBridges in its link state information.

   RBridge forwarding can be done, as with a router, via pairwise
   shortest paths.  RBridges could also utilize forwarding
   optimizations, e.g., MPLS.

   To prevent the temporary loop issues with bridges, RBridges must
   always forward based on a header with a hop count. Although the hop
   count will quickly discard looping packets, it is also desirable not
   to spawn additional copies of packets. This can be accomplished by
   having RBridges specify the next RBridge recipient while forwarding
   across a shared-media link.

   For two reasons, packets must be encapsulated as they are traveling
   between RBridges:

   1. so that intermediate RBridges (and bridges) will not be confused
      about the location of the source by learning the source address
      from packets in transit

   2. so that the packet can be directed towards the egress RBridge, and
      can include a hop count (for links, like Ethernet, that do not
      already contain a hop count).

   RBridges are similar to Recursive Routers, which provide similar
   transit to emulate a single L3 router, in that case using L3 + L2
   encapsulation [10][11].

   A VLAN is a broadcast domain. That means that a layer 2 broadcast
   (multicast) packet sent to a VLAN must only be delivered to links
   that are in that VLAN. A packet for a particular VLAN may transit any
   link on the campus, but an unencapsulated VLAN packet must only be
   delivered to links that the RBridges know (through configuration) to
   support that VLAN. Supporting VLANs traditionally requires
   configuring the bridges (or in this case RBridges) with which links
   belong to which VLANs. In theory some other mechanism might allow an
   RBridge to know which VLANs should be supported on which port. The
   RBridge design does not care how RBridges discover which VLANs are
   supported by each of their ports, but for simplicity we assume here
   that RBridges (like bridges) are configured with this information.

   RBridges must calculate a spanning tree for each broadcast domain. In
   a campus without VLANs, this means a single spanning tree would be
   used for delivery of packets with unknown or group address layer 2
   destination.

   It is possible to support VLANs with a single spanning tree, and just
   avoid forwarding the decapsulated packet onto links that do not
   support that VLAN. However, it will allow for more optimal delivery
   if a different spanning tree is calculated for each broadcast domain.

   It is not necessary to use the bridge spanning tree algorithm to
   calculate the spanning trees. Instead, they can be calculated based
   on the link state information. Using the link state protocol to
   calculate spanning trees makes the design very flexible and
   efficient. The link state database gives sufficient information so
   that RBridges can calculate a single spanning tree, spanning trees
   per VLAN, or per-ingress RBridge spanning trees without requiring any
   additional exchange of information between RBridges.

2. Detailed RBridge Design

2.1. Link State Protocol

   Running a link state protocol among RBridges is straightforward.  It
   is the same as running a level 1 routing protocol in an area.  IS-IS
   is a more appropriate choice than OSPF in this case because it is
   easy in IS-IS to define new TLVs for carrying new information.
   However, the instance of IS-IS that RBridges will implement will be
   separate from any routing protocol that IP routers will implement,
   just as the spanning tree messages are not implemented by IP routers.

   To keep the instances separate, RBridge routing messages should be
   sent to a different layer 2 multicast address than IS-IS routing
   messages.  Alternatively, they can be differentiated by having a
   different "area address", where, in order to keep RBridges
   configuration-free, the RBridge area address would be a constant for
   all RBridges, and would not be one that would ever appear as a real
   IS-IS area address.

   Additional information that RBridge link state information will carry
   is:

   o  layer 2 addresses of nodes within the campus which have
      transmitted packets but have not transmitted ARP or ND replies

   o  layer 3, layer 2 addresses of IP nodes within the campus.  For
      data compression, perhaps only the portion of the address
      following the campus-wide prefix need be carried.  This will be
      more of an issue for IPv6 than for IPv4.

   o  VLANs directly connected to this RBridge

   The endnode information need only be delivered to RBridges supporting
   the VLAN in which the endnode resides. So for instance, if endnode E
   is discovered through a VLAN A packet, then E's location need only be
   delivered to other RBridges that are attached to VLAN A links.

   Given that RBridges must support delivery only to links within a VLAN
   (for multicast or unknown packets marked with the VLAN's tag), this
   mechanism can be used to advertise endnode information solely to
   RBridges within a VLAN. Although a separate instance of the link
   state protocol could be run for this purpose, the topology is so
   restricted (just a single broadcast domain), that it might be
   preferable to design a special case mechanism where each DR
   advertises its attached endnodes, and receives explicit acks from the
   other RBridges.

2.2. Spanning Tree

   There will be cases when RBridges may need to send packets to all
   links.  These cases include:

   o  layer 2 multicast or broadcast packets

   o  unknown layer 2 destination addresses

   o  distributed RBridge layer 3 address location query

   In these cases the packets must be sent through a spanning tree.
   However, there is no need to implement a separate spanning tree
   protocol in addition to the link state protocol.  Instead, the link
   state information can be used to create a single spanning tree
   throughout the campus.  This is done by choosing the RBridge with
   lowest ID, and calculating the Dijkstra tree with that RBridge as
   Root.

   In the case of multiple equal cost links, some tie-breaker must be
   used to ensure that all RBridges calculate the same spanning tree. We
   suggest using the ID of the parent as the tie breaker (if a node can
   be attached to either parent P1 or P2 with the same cost, choose P1
   if P1's ID is lower than P2's).
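
   As an illustration only, the root selection and tie-breaking rule
   described above could be sketched in Python as follows.  The function
   name spanning_tree, the dictionary-based description of links, and
   the use of integer RBridge IDs are assumptions of this sketch and are
   not part of the design.  Link costs are assumed to be positive.

```python
import heapq


def spanning_tree(links, root=None):
    """Return {rbridge_id: parent_id} for a shortest-path tree.

    links maps an (id_a, id_b) pair to the cost of that bidirectional
    link (hypothetical input format).  The root defaults to the RBridge
    with the lowest ID.  When two parents give the same total cost, the
    parent with the lower ID wins, so every RBridge running this on the
    same link state database computes the same tree.
    """
    adjacency = {}
    for (a, b), cost in links.items():
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))
    if root is None:
        root = min(adjacency)

    dist = {root: 0}
    parent = {root: None}
    done = set()
    heap = [(0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if node in done:
            continue
        done.add(node)
        for nbr, cost in adjacency[node]:
            if nbr in done:
                continue
            nd = d + cost
            shorter = nbr not in dist or nd < dist[nbr]
            # Equal cost: prefer the parent with the lower ID.
            tie = nbr in dist and nd == dist[nbr] and node < parent[nbr]
            if shorter or tie:
                dist[nbr] = nd
                parent[nbr] = node
                heapq.heappush(heap, (nd, nbr))
    return parent
```

   Because every candidate parent on a shortest path is processed before
   the child itself, the child ends up attached to the lowest-ID parent
   among all equal-cost choices, independent of the order in which the
   links are listed.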

   In the case of multicast L2 addresses, the RBridge may treat these as
   broadcast, or may include existing techniques for emulating multicast
   at L2, i.e., snooping IGMP and/or PIM-SM packets to configure an
   internal, L2 multicast tree.

   For a packet tagged with a VLAN ID (e.g., VLAN A), the packet is only
   delivered to links that support VLAN A. It would provide for more
   optimal delivery if a different spanning tree were calculated for
   each VLAN. This would be done by choosing the RBridge with lowest ID
   that connects to that VLAN as root, and calculating a tree of
   shortest paths from that RBridge. RBridges that do not support VLAN A
   may be on the delivery path for VLAN A packets, but they will not
   decapsulate the packet onto links that are not VLAN A links.
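
   A minimal sketch of the two per-VLAN rules above, assuming that each
   RBridge's attached VLANs are available from the link state database
   as a dictionary and that each port's VLANs are known from
   configuration.  The names vlan_root and may_decapsulate are
   hypothetical.

```python
def vlan_root(attached_vlans, vlan):
    """Pick the root for a VLAN's tree: the lowest-ID RBridge that
    connects to that VLAN.  attached_vlans maps RBridge ID -> set of
    VLANs.  Raises ValueError if no RBridge connects to the VLAN."""
    return min(r for r, vlans in attached_vlans.items() if vlan in vlans)


def may_decapsulate(port_vlans, port, vlan):
    """A transit RBridge may carry the encapsulated packet on any link,
    but only decapsulates it onto ports configured for the VLAN."""
    return vlan in port_vlans.get(port, set())
```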

   If IGMP snooping is used to know where recipients of a multicast
   packet reside, then the total number of packet-hops to deliver the
   packet can be optimized by calculating a separate spanning tree per
   ingress RBridge. This, however, requires a lot more computation (one
   tree per RBridge). The tradeoffs will be discussed in the "Issues"
   section at the end of this document.

2.3. Designated RBridge

   It is useful for one RBridge on each link to have special duties.
   Thus one RBridge per link should be elected Designated RBridge. IS-IS
   already holds such an election.

   The Designated RBridge is the one on the link that will learn the
   identities of attached endnodes, initiate a distributed ARP when an
   ARP query is received for an unknown destination, and answer ARP
   queries when the target node is known.

2.4. Learning Endnode Location

   RBridges learn endnode location from data packets. They learn (layer
   3, layer 2) pairs (for the purpose of supporting proxy ARP/ND) from
   listening to ARP or ND replies.

   This endnode information is learned by the DR, and distributed to
   other RBridges through the link state protocol.
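
   The learning behavior can be pictured with the following sketch.  The
   class EndnodeTable and its method names are hypothetical, and VLANs
   are represented as integers purely for the example.  The
   link_state_entries method groups what was learned into the two kinds
   of endnode information listed in Section 2.1.

```python
class EndnodeTable:
    """Endnodes learned by a Designated RBridge on its attached links."""

    def __init__(self):
        self.port_of = {}    # (vlan, mac) -> local port where mac was seen
        self.mac_of_ip = {}  # (vlan, ip) -> mac, from ARP or ND replies

    def learn_from_data(self, vlan, src_mac, port):
        # As a bridge does: remember where the source address lives.
        self.port_of[(vlan, src_mac)] = port

    def learn_from_reply(self, vlan, sender_ip, sender_mac, port):
        # An ARP or ND reply additionally yields a (layer 3, layer 2) pair.
        self.port_of[(vlan, sender_mac)] = port
        self.mac_of_ip[(vlan, sender_ip)] = sender_mac

    def link_state_entries(self):
        """Split the table into layer-2-only nodes and (L3, L2) pairs."""
        with_ip = {(vlan, mac) for (vlan, _ip), mac in self.mac_of_ip.items()}
        l2_only = sorted(key for key in self.port_of if key not in with_ip)
        l3_l2 = sorted(
            (vlan, ip, mac) for (vlan, ip), mac in self.mac_of_ip.items()
        )
        return {"l2_only": l2_only, "l3_l2": l3_l2}
```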

2.5. Forwarding Behavior

   When a DR R1 receives a native packet with layer 2 source address S
   and layer 2 destination address D, R1 looks up the location of D. If
   D is claimed by egress RBridge R2, then R1 encapsulates the packet,
   directing it towards R2.

   When an RBridge receives an encapsulated packet, it forwards based on
   the specified egress RBridge (rather than the ultimate destination
   endnode).

   If the packet belongs in VLAN A, then R1 (the ingress RBridge) looks
   up D's location in R1's table of VLAN A endnodes.
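
   The ingress decision might be sketched as below.  The function name
   ingress_decision and the nested-dictionary table layout are
   assumptions made for illustration.  The "flood" result corresponds to
   the unknown-destination case of Section 2.2, where the packet is sent
   through a spanning tree and identified by its ingress RBridge.

```python
def ingress_decision(endnode_tables, my_id, vlan, dst_mac):
    """Decide how the ingress RBridge handles a native packet.

    endnode_tables maps VLAN -> {layer 2 address -> egress RBridge ID},
    as learned through the link state protocol.  Returns
    ("unicast", egress_id) when the destination is claimed by an
    RBridge, otherwise ("flood", my_id).
    """
    egress = endnode_tables.get(vlan, {}).get(dst_mac)
    if egress is None:
        return ("flood", my_id)
    return ("unicast", egress)
```

   Note that the lookup is scoped to the packet's VLAN, so the same
   layer 2 address seen in a different VLAN is treated as unknown.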

2.6. Forwarding Header on 802 Links

   It is essential that RBridges coexist with ordinary bridges.
   Therefore, a packet in transit must look to ordinary bridges like an
   ordinary layer 2 packet. However, it must also be differentiable from
   a native layer 2 packet by RBridges. To accomplish this, we use a new
   layer 2 protocol type ("Ethertype").

   A packet in transit on an 802 link will therefore have two 802
   headers, since the original frame (including the original 802 header)
   will be tunneled by the RBridges. But rather than just having an
   additional 802 header, we include additional information between the
   two headers: at a minimum, a hop count.

   An encapsulated packet would look as follows:

               +--------------+-------------+-----------------+
               | outer header | shim header | original packet |
               +--------------+-------------+-----------------+

                       Figure 1 Encapsulated packet

   The outer header contains:

   o  L2 destination = next RBridge

   o  L2 source = transmitting RBridge (the one that most recently
      handled this packet)

   o  protocol type = "to be assigned...RBridge encapsulated packet"

   The shim header includes:

   o  TTL = starts at some value and is decremented by each RBridge;
      the packet is discarded if TTL = 0

   o  egress RBridge (in the case of unicast), or ingress RBridge (in
      the case of multicast)

   Note that one variation is to have the egress RBridge specified in
   the outer header rather than in the shim header. This will mean that
   some packet duplication might occur during temporary loops. But the
   advantage is that the header will be 6 bytes smaller. This is
   discussed in the "issues" section.
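
   To make the header fields listed above concrete, the following sketch
   packs and unpacks one possible byte layout.  The layout itself is an
   assumption of this example: this document does not fix field sizes or
   ordering, and the protocol type is still to be assigned.  Here the
   shim is taken to be a 1-byte TTL followed by a 6-byte RBridge ID, and
   the IEEE local experimental Ethertype 0x88B5 stands in as a
   placeholder for the unassigned protocol type.

```python
import struct

PLACEHOLDER_ETHERTYPE = 0x88B5       # stand-in; real value not yet assigned
OUTER = struct.Struct("!6s6sH")      # L2 destination, L2 source, protocol type
SHIM = struct.Struct("!B6s")         # TTL, egress (or ingress) RBridge ID


def encapsulate(next_hop, sender, ttl, rbridge_id, original_frame):
    """Prepend the outer header and shim header to the original frame."""
    outer = OUTER.pack(next_hop, sender, PLACEHOLDER_ETHERTYPE)
    shim = SHIM.pack(ttl, rbridge_id)
    return outer + shim + original_frame


def decapsulate(packet):
    """Split an encapsulated packet back into its header fields."""
    next_hop, sender, ethertype = OUTER.unpack_from(packet, 0)
    if ethertype != PLACEHOLDER_ETHERTYPE:
        raise ValueError("not an RBridge-encapsulated packet")
    ttl, rbridge_id = SHIM.unpack_from(packet, OUTER.size)
    original = packet[OUTER.size + SHIM.size:]
    return next_hop, sender, ttl, rbridge_id, original
```

   Under this assumed layout the encapsulation adds 21 bytes to each
   frame, 6 of which are the RBridge ID in the shim that the variation
   above would omit.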

   The following is a walk-through of a packet traversing an RBridge
   campus. Consider a packet consisting of "data" to be sent from node A
   to node B through an RBridge campus (dotted area) as per Figure 2.

                     ...............................
                     .                             .
         +--------+  .+-----+    +-----+    +-----+.   +--------+
         |        |  .|     |    |     |    |     |.   |        |
         | Host A ----- Rb1 ------ Rb2 ------ Rb3 ------ Host B |
         |        |  .|     |    |     |    |     |.   |        |
         +--------+  .+-----+    +-----+    +-----+.   +--------+
                     .                             .
                     .              RBridge campus .
                     ...............................

       Figure 2 Sample path for packet traversing an RBridge campus

   In this figure, Host A is the source, Host B the sink, and Rb1..Rb3
   are nodes of the RBridge campus. Rb1 is the ingress, and Rb3 is the
   egress. Additionally, layer 2 (L2) addresses are as shown below the
   components on the particular ports in Figure 3; note that addresses
   are required for RBridge nodes for encapsulation and routing within
   the campus. Different addresses are shown for each port on an RBridge
   node for simplicity, although this is not required.

                     ...............................
                     .                             .
         +--------+  .+-----+    +-----+    +-----+.   +--------+
         |        |  .|     |    |     |    |     |.   |        |
         | Host A ----- Rb1 ------ Rb2 ------ Rb3 ------ Host B |
         |        a   b1x b1y    b2x b2y    b3x b3y    b        |
         |        |  .|     |    |     |    |     |.   |        |
         +--------+  .+-----+    +-----+    +-----+.   +--------+
                     .                             .
                     .              RBridge campus .
                     ...............................

                Figure 3 Sample path including L2 addresses

   Consider the originating packet as per Figure 4; "L2 a->b" means the
   layer 2 (L2) source address is "a" and the L2 destination address is
   "b", and "IP A->B" means the IP source address is A and the IP
   destination is B.

                      +---------+---------+--------+
                      | L2 a->b | IP A->B |  data  |
                      +---------+---------+--------+

                  Figure 4 Packet as originated at Host A

   The ingress RBridge Rb1 looks up 'b' in its encapsulation tables,
   which indicate that Rb3 is the egress RBridge. The packet gets
   wrapped to direct it to Rb3 using a shim header (SH), where the
   destination is based on the L2 address of Rb3 (the egress) and uses a
   TTL of 20, as shown in Figure 5.

              +-----------------+---------+---------+--------+
              | SH ->b3y TTL=20 | L2 a->b | IP A->B |  data  |
              +-----------------+---------+---------+--------+

                     Figure 5 Packet with shim header

   Note that the shim header carries the egress RBridge's address only
   for unicast packets; for multicast packets, the ingress RBridge's
   address is used instead.

   Rb1 then looks up the shim header destination in its (campus)
   forwarding tables, yielding Rb2 as the next hop inside the campus.
   Rb1 then sends the packet on to Rb2 by adding the appropriate L2
   header, as shown in Figure 6.

      +-------------+-----------------+---------+---------+--------+
      | L2 b1y->b2x | SH ->b3y TTL=20 | L2 a->b | IP A->B |  data  |
      +-------------+-----------------+---------+---------+--------+

                  Figure 6 Packet as sent from Rb1 to Rb2

   Rb2 unwraps the outermost L2, decrements the shim TTL, and looks up
   the shim destination's next hop (which is Rb3 here). Rb2 then adds a
   new L2 header addressed to Rb3, as shown in Figure 7.

      +-------------+-----------------+---------+---------+--------+
      | L2 b2y->b3x | SH ->b3y TTL=19 | L2 a->b | IP A->B |  data  |
      +-------------+-----------------+---------+---------+--------+

                  Figure 7 Packet as sent from Rb2 to Rb3

   Rb3 unwraps the outer L2, notices that the shim destination has been
   reached (itself), and unwraps the shim too. At that point, it
   proceeds to send the original packet shown in Figure 4 to Host B.
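
   The walk-through above can be replayed with a small simulation.  This
   is only a sketch: packets are modeled as Python dictionaries rather
   than bytes, and the function names ingress and receive, as well as
   the table formats, are hypothetical.  The addresses are those of
   Figure 3.

```python
def ingress(encap_table, fwd_table, frame, initial_ttl=20):
    """Ingress RBridge: add the shim header, then the outer L2 header."""
    egress = encap_table[frame["l2"][1]]       # look up L2 destination
    out_addr, next_hop = fwd_table[egress]     # campus next hop
    return {"outer": (out_addr, next_hop), "egress": egress,
            "ttl": initial_ttl, "inner": frame}


def receive(my_addrs, fwd_table, packet):
    """Transit or egress RBridge processing of an encapsulated packet."""
    if packet["egress"] in my_addrs:
        return ("deliver", packet["inner"])    # unwrap outer L2 and shim
    ttl = packet["ttl"] - 1
    if ttl <= 0:
        return ("drop", None)                  # hop count exhausted
    out_addr, next_hop = fwd_table[packet["egress"]]
    return ("forward", {"outer": (out_addr, next_hop),
                        "egress": packet["egress"],
                        "ttl": ttl, "inner": packet["inner"]})


frame = {"l2": ("a", "b"), "ip": ("A", "B"), "data": "data"}       # Figure 4
fig6 = ingress({"b": "b3y"}, {"b3y": ("b1y", "b2x")}, frame)       # at Rb1
action, fig7 = receive({"b2x", "b2y"}, {"b3y": ("b2y", "b3x")}, fig6)  # at Rb2
result = receive({"b3x", "b3y"}, {}, fig7)                         # at Rb3
```

   In this run fig6 and fig7 carry the outer addresses and TTL values of
   Figures 6 and 7, and Rb3 hands the unchanged original frame to Host
   B.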

2.7. Distributed ARP Query

   The distributed ARP query is carried by RBridges through the RBridge
   spanning tree. Each Designated RBridge, in addition to forwarding the
   query through the spanning tree, initiates an ARP query on its
   link(s). If a reply is received by Designated RBridge R2, R2
   initiates a link state update to inform all the other RBridges of D's
   location, layer 3 address, and layer 2 address.

   The distributed ARP query must be sent to a (new, to be assigned)
   layer 2 multicast address. The fields it must contain are:

   Outer Layer 2 header:

   o  destination = newly defined L2 multicast address

   o  source = transmitting RBridge (replaced hop by hop)

   o  protocol type = same as encapsulated RBridge

   Shim header:

   o  TTL (for safety if the RBridge spanning tree has temporary loops,
      and where the L2 header lacks an existing TTL)

   o  ingress RBridge (rather than egress RBridge, which would be
      specified in unicast packets to known destinations); this is used
      for ingress-specific forwarding, e.g., for VLANs

   RBridge payload:

   o  original ARP or ND query

   Intermediate RBridges decrement the above TTL, and replace the source
   RBridge with their own layer 2 address on the outgoing interface.
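
   The Designated RBridge's side of this exchange might look like the
   sketch below, which combines the duties listed in Section 2.3 with
   the requester bookkeeping described in Section 4.  The class and
   method names are hypothetical, and the returned tuples merely stand
   in for the actual messages.

```python
class DesignatedRBridge:
    """ARP handling state of one Designated RBridge (illustrative)."""

    def __init__(self, rbridge_id):
        self.rbridge_id = rbridge_id
        self.known = {}    # target IP -> layer 2 address, from link state
        self.pending = {}  # target IP -> requesters awaiting an answer

    def on_arp_query(self, requester, target_ip):
        if target_ip in self.known:
            # Known target: answer locally with its true layer 2 address.
            return ("proxy_reply", requester, self.known[target_ip])
        # Unknown target: remember who asked, then start a distributed
        # query carrying this RBridge as the ingress.
        self.pending.setdefault(target_ip, []).append(requester)
        return ("distributed_query", self.rbridge_id, target_ip)

    def on_location_learned(self, target_ip, l2_addr):
        # A link state update reported the target; answer the waiters.
        self.known[target_ip] = l2_addr
        waiters = self.pending.pop(target_ip, [])
        return [("proxy_reply", requester, l2_addr) for requester in waiters]
```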

3. RBridge Addresses, Parameters, and Constants

   Each RBridge needs a unique ID within the campus.  The simplest such
   address is a unique 6-byte ID, since such an ID is easily obtainable
   as any of the EUI-48's owned by that RBridge.  IS-IS already requires
   each router to have such an address.

   One configurable parameter is the initial value of the hop count in
   the envelope.  Recommended default=20.

   A new Ethertype must be assigned to indicate an RBridge-encapsulated
   packet.

   A layer 2 multicast address must be assigned for use as the
   destination address in distributed ARP queries.

   To support VLANs, RBridges (like bridges today) must be configured,
   for each port, with the VLAN to which that port belongs.

4. Handling ARP Queries

   If the target address is unknown, initiate a distributed ARP query.
   If the target address is known, reply with a proxy ARP reply, giving
   the target's true layer 2 address.

   When initiating a distributed ARP query (or IPv6 neighbor
   solicitation) remember the address of the requesting node.  When the
   information is discovered, respond to the requester.

5. Issues

5.1. How Many Spanning Trees?

5.1.1. Per-ingress Spanning Tree

   If a separate spanning tree is calculated per ingress RBridge, then
   delivery of both broadcast and multicast packets, where the recipient
   locations are known through some mechanism such as IGMP snooping, can
   be optimized (for number of packet hops to deliver the multicast
   packet).

   Also, if a separate spanning tree is calculated per ingress RBridge,
   then out of order delivery is minimized when RBridges learn the
   location of the destination, since the packet will traverse the same
   path whether it is being delivered via the "destination unknown" tree
   to that broadcast domain, or the direct path to that destination.

   However, there is obvious overhead involved in calculating separate
   spanning trees.

   This mechanism of avoiding out of order delivery by calculating
   separate spanning trees per ingress RBridge was presented at the IETF
   TRILL BOF on March 10, 2005.

5.1.2. Per VLAN

   If there are not many links that support VLAN A, then the total
   number of packet hops to deliver a packet within the VLAN A broadcast
   domain is minimized by calculating a separate spanning tree for each
   VLAN.

   It would be possible to still support VLANs with a single spanning
   tree, by having RBridges only decapsulate a VLAN A packet onto VLAN A
   links, but the number of transit links such a packet would traverse
   would be more than necessary (assuming that the location of VLAN A
   links within the campus is somewhat sparse).

5.1.3. Single Spanning Tree

   Broadcast, multicast, and VLANs can be supported with a single
   spanning tree, which is the simplest solution and requires the least
   computation and the smallest forwarding tables in the RBridges. In
   that case all such packets would be delivered to all the RBridges,
   and only the Designated RBridges would differentiate, by not
   forwarding a packet onto links on which it does not belong. So from
   the endnodes' point of view, things are still correct; a packet will
   only be delivered to the proper links. But the cost to deliver the
   packet within the core can be much greater.

   Additionally, the more distinct spanning trees are used, the more
   fully the links within the core can be utilized.

   The cases in which a broadcast/multicast packet is not delivered to
   all the links in the campus are:

   o  when there is a VLAN tag, in which case the packet will only be
      delivered to links that support that VLAN

   o  when the layer 2 multicast is derived from an IP multicast, and
      the RBridges have learned, through IGMP snooping, which links wish
      to receive the packet

5.2. Reasons Not to Optimize Handling of IP packets

   There are two optimizations that were considered but abandoned due to
   their impact on transparency, i.e., that an RBridge should appear
   like a bridged network to upper layer protocols. These optimizations
   focus on ways of merging the shim layer functionality with the
   existing headers of IP packets.

5.2.1. Avoiding Encapsulation for On-campus IP Packets

   In theory, on-campus IP packets need not be encapsulated with an
   additional layer 2 header.  The original layer 2 header can be
   discarded and replaced with one where the layer 2 destination is
   replaced by the next RBridge, and the source layer 2 address is
   replaced by something that will not confuse bridge learning (since
   packets will be injected into each segment from unpredictable
   directions because shortest path routes will be used).

   The disadvantages of this approach are:

   o  the IP header's TTL would be decremented by each RBridge, making
      the customer aware that bridges have been replaced by RBridges,
      and possibly breaking IP protocols that expect the TTL not to be
      decremented over an L2 system

   o  the original layer 2 addresses might need to be preserved for some
      conceivable uses

   The real disadvantage, though, is that RBridges would have to have
   more complex forwarding behavior. They would need to forward based on
   layer 2 addresses sometimes, and layer 3 addresses at other times.
   Even if all packets were IP, RBridges would need to forward packets
   for off-campus IP destinations based on the layer 2 address of the IP
   router.

5.2.2. Avoiding Encapsulation for Off-campus IP Packets

   Likewise, in theory, off-campus IP packets need not be encapsulated.
   The TTL in the IP header can be decremented.  The same disadvantages
   as for on-campus IP packets apply, including the concerns on the
   impact of decremented TTL on other IP protocol behavior.  However,
   there is the additional disadvantage that since the actual layer 2
   destination has to be preserved end-to-end there is the danger of
   packet proliferation if multiple RBridges decide to forward the
   packet, which can occur while the topology is adjusting.

5.3. Supporting Heterogeneous Link Types

   It is easy to support link types other than 802 links with RBridges.
   However, mixing link types within a single campus raises
   complexities, such as packet size, incompatible layer 2 addresses,
   and other layer 2 features (such as priority) that might be lost when
   trying to "bridge" two different link types.

5.4. Effects on L3 TTL

   In general, an RBridge should have no effect on a Layer 3 (e.g., IP)
   TTL field, since the RBridge is a Layer 2 device.  The TTLs which
   ensure loop-free operation in an RBridge system should occur in the
   encapsulation header, and not affect any of the headers of the packet
   passed through the RBridge system.  The RBridge should do nothing to
   transited packets other than that which would be done by an
   equivalent L2 system.
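
   The separation between the two TTLs can be illustrated with a short
   Python sketch.  It is only a sketch under stated assumptions: the
   field name shim_ttl, the initial value of 16, and the helper names
   are hypothetical, since this document does not define the
   encapsulation header format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InnerPacket:
    """The transited packet; its IP TTL must pass through unchanged."""
    ip_ttl: int
    payload: bytes


@dataclass
class Encapsulated:
    """Hypothetical encapsulation: a loop-limiting TTL plus the inner packet."""
    shim_ttl: int
    inner: InnerPacket


def encapsulate(inner: InnerPacket, initial_ttl: int = 16) -> Encapsulated:
    """Ingress RBridge: wrap the packet; initial_ttl is an assumed value."""
    return Encapsulated(shim_ttl=initial_ttl, inner=inner)


def forward(pkt: Encapsulated) -> Optional[Encapsulated]:
    """Transit RBridge: decrement only the encapsulation TTL.

    Returns None when the TTL is exhausted, which is what bounds the
    lifetime of a packet caught in a temporary loop.
    """
    if pkt.shim_ttl <= 1:
        return None
    pkt.shim_ttl -= 1
    return pkt


def decapsulate(pkt: Encapsulated) -> InnerPacket:
    """Egress RBridge: hand back the inner packet exactly as received."""
    return pkt.inner
```

   However many RBridges the packet crosses, decapsulate returns an
   inner packet whose ip_ttl equals the value it had at the ingress.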

5.5. Using L3 encapsulation

   RBridges may use L3 (e.g., IP) encapsulation to provide a routable
   internal address and a loop-check indicator.  This allows the RBridge
   system to use L3 routing algorithms, e.g., OSPF, using existing L3
   implementations.  As with any RBridge system, packets are forwarded
   only within the preconfigured RBridge system.  Intermediate L2
   bridges are allowed whether L2 or L3 encapsulation is used.  L3
   encapsulation processing, including ICMP handling and
   fragmentation, is well-defined (e.g., RFC2003).

   In this case, the L3 encapsulation should not decrement the TTL of
   the inner transited packet, since (as per RFC2003) the RBridge system
   would not be considered a forwarding (i.e., L3) 'tunnel'.  Further,
   changing the IP TTL could affect the reachability of all-1's
   broadcast or multicast packets, which might then not reach the full
   L2 subnet.

   The primary disadvantages of L3 encapsulation are the increased
   overhead of encapsulation (e.g., adding both an L3 and a subsequent
   outer L2 header) and the complexity of providing L2 services
   (notably broadcast) within the L3 subnet (RFC1122, RFC1812).  Note
   that L3 supports fragmentation and reassembly for tunnels, for both
   IPv4 and IPv6 encapsulation.  Reassembly would be required at the
   egress, which increases the load on the egress RBridge in tracking
   and storing the fragments, but the process is generally transparent
   to the resulting transited packet.  The primary effect would arise
   if there were a large amount of reordering (increasing the
   reassembly load) or high packet loss (resulting in failed reassembly
   and thus lost packets).  In the latter case, packet loss is
   amplified because the fragments of a single transited packet do not
   share fate: losing any one fragment loses the whole packet.
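
   The amplification can be made concrete with a small calculation.
   The Python sketch below assumes, purely for illustration, that each
   fragment is lost independently with the same probability; real loss
   is often bursty, so the result is indicative only.  The function
   name transited_loss is hypothetical.

```python
def transited_loss(fragment_loss: float, fragments: int) -> float:
    """Probability that a transited packet is lost after fragmentation.

    Assumes each of the `fragments` pieces is lost independently with
    probability `fragment_loss`. Reassembly at the egress RBridge fails
    if any single fragment is missing, so the packet survives only if
    every fragment survives.
    """
    if not 0.0 <= fragment_loss <= 1.0:
        raise ValueError("fragment_loss must be between 0 and 1")
    if fragments < 1:
        raise ValueError("fragments must be at least 1")
    return 1.0 - (1.0 - fragment_loss) ** fragments
```

   Under this assumption, a 1% per-fragment loss rate becomes roughly a
   2% loss rate for transited packets that are split into two
   fragments, since 1 - 0.99^2 = 0.0199.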

5.6. Optimizing ARP/ND

   There are various alternatives for how an RBridge could handle
   ARPs/NDs when the target is known (because of having been
   disseminated through the link state protocol). Listed from most
   expensive to least expensive:

   o  treat ARP/ND like any multicast packet, and send along the
      (appropriate) spanning tree, and let the target respond

   o  route the ARP/ND to the RBridge that claims attachment to the
      target

   o  do proxy ARP/ND

   The only reason not to do proxy ARP/ND is that the target node may
   have moved without the RBridges having discovered the move yet. If
   the actual target must respond, then a reply shows that the target
   is there. If the query is routed to the expected link, then there
   won't be a false positive, but the real location of the target may
   not be found if the target has moved.

   Some mix of these strategies might be the best solution. For
   instance, if the target's location has not been recently verified
   through a broadcast ARP/ND, then the source's RBridge should
   broadcast the ARP/ND. Otherwise it should do proxy ARP/ND. To
   support this, RBridges could keep track of the last time a broadcast
   ARP/ND occurred for each endnode E (by any source, and injected by
   any RBridge). Suppose the parameter is 20 seconds. If a source S on
   RBridge R1's link does an ARP/ND for D, and R1 has not seen a
   broadcast ARP/ND for D within the last 20 seconds, R1 broadcasts the
   query; otherwise it proxies the reply.
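
   The mixed strategy can be sketched as a small per-RBridge policy
   object.  This Python sketch is an illustration under assumptions,
   not a specification: the class and method names are hypothetical,
   the 20-second default mirrors the example parameter above, and
   note_broadcast is assumed to be called whenever the RBridge sees a
   broadcast ARP/ND for an endnode injected by any RBridge.

```python
import time
from typing import Callable, Dict


class ArpNdPolicy:
    """Decide whether to broadcast an ARP/ND query or proxy the reply."""

    def __init__(self, freshness_secs: float = 20.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.freshness_secs = freshness_secs
        self._clock = clock
        # endnode -> time of the last broadcast ARP/ND seen for it
        self._last_broadcast: Dict[str, float] = {}

    def note_broadcast(self, target: str) -> None:
        """Record a broadcast ARP/ND for target, from any source."""
        self._last_broadcast[target] = self._clock()

    def handle_query(self, target: str) -> str:
        """Return "proxy" if target was verified recently, else "broadcast"."""
        now = self._clock()
        last = self._last_broadcast.get(target)
        if last is not None and now - last < self.freshness_secs:
            return "proxy"
        # Location not recently verified: broadcast and remember when.
        self._last_broadcast[target] = now
        return "broadcast"
```

   A shorter interval detects moved endnodes sooner at the cost of more
   broadcast traffic; a longer one does the opposite.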

6. Security Considerations

   The goal is for RBridges to not add additional security issues over
   what would be present with traditional bridges.  RBridges will not be
   able to prevent nodes from impersonating other nodes, for instance,
   by issuing bogus ARP replies.  However, RBridges will not interfere
   with any schemes that would secure neighbor discovery.

   As with routing schemes, authentication of RBridge messages would be
   a simple addition to the design (and it would be accomplished the
   same way as it would be in IS-IS).  However, any sort of
   authentication requires additional configuration, which might
   interfere with the perception that RBridges, like bridges, are zero
   configuration.

7. Conclusions

   This design allows transparent interconnection of multiple links into
   a single IP subnet.  Management would be just like with bridges
   (plug-and-play).  But this design avoids the disadvantages of
   bridges.  Temporary loops are not a problem so failover can be as
   fast as possible, and shortest paths can be followed.

   The design is compatible with current IP nodes and routers, and with
   current bridges.

8. Acknowledgments

   We anticipate that many people will contribute to this design, and
   invite you to join the mailing list at http://www.postel.org/rbridge

9. References

9.1. Normative References

   [1]   Perkins, C., "IP Encapsulation within IP", RFC 2003 (Standards
         Track), October 1996.

   [2]   Braden, R., "Requirements for Internet Hosts - Communication
         Layers", STD 3, RFC 1122, October 1989.

   [3]   Baker, F., "Requirements for IP Version 4 Routers", RFC 1812
         (Standards Track), June 1995.

   [4]   Plummer, D., "Ethernet Address Resolution Protocol: Or
         converting network protocol addresses to 48.bit Ethernet
         address for transmission on Ethernet hardware", STD 37, RFC
         826, November 1982.

   [5]   Narten, T., Nordmark, E. and W. Simpson, "Neighbor Discovery
         for IP Version 6 (IPv6)", RFC 2461 (Standards Track), December
         1998.

   [6]   Callon, R., "Use of OSI IS-IS for routing in TCP/IP and dual
         environments", RFC 1195, December 1990.

   [7]   IEEE 802.1d bridging standard, "IEEE 802.1d bridging standard".

   [8]   Perlman, R., "RBridges: Transparent Routing", Proc. Infocom
         2004, March 2004.

   [9]   Perlman, R., "Interconnection: Bridges, Routers, Switches, and
         Internetworking Protocols", Addison Wesley Chapter 3, 1999.

   [10]  Touch, J., "Dynamic Internet overlay deployment and management
         using the X-Bone", Computer Networks Vol. 36, No. 2-3, July
         2001.

   [11]  Touch, J., Wang, Y., Eggert, L. and G. Finn, "A Virtual
         Internet Architecture", ISI Technical Report ISI-TR-570,
         Presented at the Workshop on Future Directions in Network
         Architecture (FDNA) 2003 at Sigcomm 2003, March 2003.

9.2. Informative References

   [12]  Harkins, D. and D. Carrel, "The Internet Key Exchange (IKE)",
         RFC 2409 (Standards Track), November 1998.

   [13]  Mogul, J. and S. Deering, "Path MTU discovery", RFC 1191,
         November 1990.

   [14]  Lahey, K., "TCP Problems with Path MTU Discovery", RFC 2923
         (Informational), September 2000.

   [15]  Kent, S., "IP Encapsulating Security Payload (ESP)",
         draft-ietf-ipsec-esp-v3-10 (work in progress), March 2005.

   [16]  Kent, S., "IP Authentication Header",
         draft-ietf-ipsec-rfc2402bis-011 (work in progress), March 2005.

   [17]  Kaufman, C., "Internet Key Exchange (IKEv2) Protocol",
         draft-ietf-ipsec-ikev2-17 (work in progress), Oct. 2004.

Authors' Addresses

   Radia Perlman
   Sun Microsystems

   Email: Radia.Perlman@sun.com


   Joe Touch
   USC/ISI
   4676 Admiralty Way
   Marina del Rey, CA 90292 U.S.A.

   Phone: +1 (310) 448-9151
   Email: touch@isi.edu


   Alper Yegin
   Samsung Advanced Institute of Technology

   Email: alper.yegin@samsung.com


Intellectual Property Statement

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at
   http://www.ietf.org/ipr.

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at
   ietf-ipr@ietf.org

Disclaimer of Validity

   This document and the information contained herein are provided on an
   "AS IS" basis and THE CONTRIBUTOR, THE ORGANIZATION HE/SHE REPRESENTS
   OR IS SPONSORED BY (IF ANY), THE INTERNET SOCIETY AND THE INTERNET
   ENGINEERING TASK FORCE DISCLAIM ALL WARRANTIES, EXPRESS OR IMPLIED,
   INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE
   INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED
   WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.

Copyright Statement

   Copyright (C) The Internet Society (2005).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

Acknowledgment

   Funding for the RFC Editor function is currently provided by the
   Internet Society.