BESS                                                              W. Lin
Internet-Draft                                                  Z. Zhang
Intended status: Standards Track                                J. Drake
Expires: September 14, 2017                       Juniper Networks, Inc.
                                                              J. Rabadan
                                                              A. Sajassi
                                                           Cisco Systems
                                                          March 13, 2017

                 EVPN Inter-subnet Multicast Forwarding


Abstract

   This document describes inter-subnet multicast forwarding procedures
   for Ethernet VPNs (EVPN).  This includes forwarding inside an EVPN
   domain and to/from outside the EVPN domain.

Requirements Language

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC 2119 [RFC2119].

Status of This Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF).  Note that other groups may also distribute
   working documents as Internet-Drafts.  The list of current Internet-
   Drafts is at http://datatracker.ietf.org/drafts/current/.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   This Internet-Draft will expire on September 14, 2017.

Copyright Notice

   Copyright (c) 2017 IETF Trust and the persons identified as the
   document authors.  All rights reserved.

Lin, et al.            Expires September 14, 2017               [Page 1]

Internet-Draft               evpn-irb-mcast                   March 2017

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document.  Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document.  Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.

Table of Contents

   1.  Introduction  . . . . . . . . . . . . . . . . . . . . . . . .   2
     1.1.  Background and Terminologies  . . . . . . . . . . . . . .   3
       1.1.1.  Integrated Routing and Bridging . . . . . . . . . . .   3
       1.1.2.  General Multicast Routing . . . . . . . . . . . . . .   4
     1.2.  Inter-subnet Multicast in EVPN  . . . . . . . . . . . . .   5
   2.  EVPN-aware Solution . . . . . . . . . . . . . . . . . . . . .   7
     2.1.  Basic Operations  . . . . . . . . . . . . . . . . . . . .   7
     2.2.  Multi-homing Support  . . . . . . . . . . . . . . . . . .   8
     2.3.  Receiver NVEs not connected to a source subnet  . . . . .   9
       2.3.1.  IMET routes advertisement . . . . . . . . . . . . . .  10
       2.3.2.  Layer 2 Forwarding State  . . . . . . . . . . . . . .  11
       2.3.3.  Layer 3 Forwarding State  . . . . . . . . . . . . . .  12
     2.4.  Selective Multicast . . . . . . . . . . . . . . . . . . .  12
     2.5.  Advanced Topics . . . . . . . . . . . . . . . . . . . . .  14
       2.5.1.  Legacy NVEs . . . . . . . . . . . . . . . . . . . . .  14
       2.5.2.  Traffic to/from outside of an EVPN domain . . . . . .  15
       2.5.3.  Integration with MVPN . . . . . . . . . . . . . . . .  17
       2.5.4.  When Tenant Routers Are Present . . . . . . . . . . .  19
   3.  IANA Considerations . . . . . . . . . . . . . . . . . . . . .  20
   4.  Security Considerations . . . . . . . . . . . . . . . . . . .  21
   5.  Acknowledgements  . . . . . . . . . . . . . . . . . . . . . .  21
   6.  References  . . . . . . . . . . . . . . . . . . . . . . . . .  21
     6.1.  Normative References  . . . . . . . . . . . . . . . . . .  21
     6.2.  Informative References  . . . . . . . . . . . . . . . . .  21
   Appendix A.  Integrated Routing and Bridging  . . . . . . . . . .  23
   Authors' Addresses  . . . . . . . . . . . . . . . . . . . . . . .  24

1.  Introduction

   EVPN offers an efficient L2 VPN solution with all-active multi-homing
   support for intra-subnet connectivity over MPLS/IP network.  EVPN
   also provides an integrated L2 and L3 service.  When forwarding among
   Tenant Systems (TS) across different IP subnets is required,
   Integrated Routing and Bridging (IRB) can be used
   [ietf-bess-evpn-inter-subnet-forwarding].

   A network virtualization endpoint (NVE) device supporting IRB is
   called an L3 Gateway.  In a centralized approach, a centralized
   gateway provides all routing functionality, and even traffic between
   two tenant systems on two subnets connected to the same NVE must go
   through the central gateway, which is inefficient.  In a distributed
   approach, each NVE has IRB configured, and inter-subnet traffic is
   locally routed without having to go through a central gateway.

   Inter-subnet multicast forwarding is more complicated and not covered
   in [ietf-bess-evpn-inter-subnet-forwarding].  This document describes
   the procedures for inter-subnet multicast forwarding.

1.1.  Background and Terminologies

   For each Broadcast Domain (BD, an L2 concept), there is usually a
   subnet (an L3 concept).  This document may use subnet and BD
   interchangeably.  When inter-subnet forwarding is allowed between
   some subnets of the same tenant on the same NVE, the BDs are
   associated with the same routing instance via IRB interfaces.
   Multiple BDs of the same tenant may be attached to different routing
   instances if inter-subnet forwarding is subject to some restrictions.
   This document assumes that inter-subnet forwarding is allowed by
   default between subnets of the same tenant.

1.1.1.  Integrated Routing and Bridging

   Appendix A describes the concept of Integrated Routing and Bridging,
   and in particular IRB interfaces, in more detail.

   An IRB interface is a logical connection between a BD and a routing
   instance.  It has two ends - one on the routing instance side and one
   on the BD side.  In this document, when we say a packet is
   "routed/sent down an IRB interface", it is from the L3 point of view
   and on the routing instance side (from L3 down to L2).  L3
   forwarding-related processing like TTL handling/fragmentation and MAC
   address changes are done before the packet is put onto the IRB
   interface "wire" and sent to the corresponding BD.  From the BD's
   point of view, that packet is received on the BD side of the IRB
   interface and L2 switched out of one or more other L2 interfaces
   (Attachment Circuits or ACs) in the BD.

   Note that there is one BD in a MAC-VRF with VLAN-based service and
   multiple BDs in a MAC-VRF with VLAN-aware bundle service.  Therefore,
   a routing instance for a tenant may have one or more MAC-VRFs
   associated with it, with the IRB interfaces being the ties.


1.1.2.  General Multicast Routing

   IP routing is inter-subnet forwarding - traffic received from one
   subnet is routed/forwarded to other subnets.  The subnets could be
   traditional networks like LANs or could be broadcast domains
   implemented by EVPN.  This section provides a very high-level
   description of Layer 3 multicast routing and is not specific to EVPN
   at all.

   Multicast routing is based on trees - rooted at the source or at a
   Rendezvous Point (RP).  Typically the tree is set up by the PIM
   protocol [RFC7761], following the reverse path from a receiver
   towards the source/RP.  On a particular router on the tree, the
   process to determine the upstream interface/neighbor is called the
   RPF process, and the upstream interface/neighbor is also called the
   RPF interface/neighbor.  The PIM protocol signals the control plane
   state, and corresponding (s,g) or (*,g) forwarding state is installed
   on the routers on the tree.  The forwarding state includes one (or
   more, in case of bidirectional trees) expected Incoming Interfaces
   (IIFs) and a list of Outgoing Interfaces (OIFs).  The IIF is the RPF
   interface (IIF is forwarding state while RPF is control plane state,
   but the terms may be used interchangeably in this document) towards
   the source/RP, and in case of bidirectional trees [RFC5015], the IIFs
   also include other interfaces where traffic is accepted.

   An interface is added to the OIF list if one of the following two
   conditions is met:

   o  There are local receivers on the subnet that the interface is
      connected to, and this router is the PIM Designated Router (DR) or
      IGMP/MLD Querier if PIM is not used.  In this case the router is
      referred to as a Last Hop Router (LHR).

   o  A PIM join has been received from a downstream router connected by
      this interface.

   The LHR also sends PIM join messages towards its RPF neighbor.  This
   establishes the branch of the tree towards the root.
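   As an informal illustration (not part of any specification), the two
   conditions for adding an interface to the OIF list can be modeled as
   a small predicate.  The Python below is a sketch; all names are
   illustrative:

```python
def interface_in_oif_list(has_local_receivers, is_dr_or_querier,
                          join_received_on_interface):
    """Model of the two conditions under which a router adds an
    interface to the OIF list of an (s,g) or (*,g) entry."""
    # Condition 1 (the LHR case): local receivers exist on the subnet
    # that the interface connects to, and this router is the PIM DR
    # (or the IGMP/MLD querier when PIM is not used).
    lhr_case = has_local_receivers and is_dr_or_querier
    # Condition 2: a PIM join was received from a downstream router
    # connected by this interface.
    return lhr_case or join_received_on_interface
```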

   In case of PIM-SM for ASM (Any Source Multicast), the LHRs send (*,g)
   joins towards the RP, establishing a (*,g) shared tree rooted at the
   RP.  On the subnet that a source is connected to, the PIM DR,
   referred to as First Hop Router (FHR), sends PIM Register messages to
   the RP when it receives initial traffic for a flow.  The RP then
   sends (s,g) PIM join towards the FHR, establishing a branch from the
   RP towards the source.  Traffic is initially sent from the FHR to the
   RP following the (s,g) branch, and the RP delivers the traffic to all
   LHRs following the (*,g) shared tree.  Upon receiving traffic, an LHR
   optionally sends (s,g) join towards the source, establishing an (s,g)
   branch between the source and the LHR so that traffic can follow a
   more optimal path.

1.2.  Inter-subnet Multicast in EVPN

   For multicast traffic sourced from a TS in subnet 1, EVPN Broadcast,
   Unknown unicast, and Multicast (BUM) forwarding based on [RFC7432]
   will deliver it to all sites in subnet 1.  When NVEs receive the
   multicast traffic on their IRB interfaces for subnet 1, they route
   the traffic to other subnets via their IRB interfaces following
   multicast routing procedures.  From an L3 point of view, each NVE has
   an (IRB) interface to subnet 1, and hence is attached to the same
   subnet as the multicast source.  Nothing is different from a
   traditional LAN, and regular IGMP/MLD/PIM procedures kick in.

   If a TS is a multicast receiver, it uses IGMP/MLD to signal its
   interest in some multicast flows.  One of the gateways is the IGMP/
   MLD querier for a given subnet.  It sends queries down the IRB for
   that subnet, which in turn causes the queries to be forwarded
   throughout the subnet following the EVPN BUM procedures.  TSs send
   IGMP/MLD joins via multicast, which are also forwarded throughout the
   subnet via the EVPN BUM procedures.  The gateways receive the joins
   via their IRB interfaces.  From an L3 point of view, again, nothing
   is different from a traditional LAN.

   On a traditional LAN, only one router can send multicast to local
   receivers on the LAN.  That is either the PIM Designated Router
   (subject to PIM Assert procedure) or IGMP/MLD querier (if PIM is not
   used - e.g., the LAN is a stub network).  On the source subnet, PIM
   is typically needed so that traffic can be delivered to other subnets
   via other routers.  For example, in case of PIM-SM, the DR on the
   source network encapsulates the initial packets for a particular ASM
   flow in PIM Register messages and unicasts the Register messages to
   the Rendezvous Point (RP) for that flow, triggering necessary state
   for that flow to be built throughout the network.

   That also works in the EVPN scenario, although not efficiently.
   Consider the example depicted in Figure 1, where a tenant has two
   subnets (subnets 1 and 2) corresponding to two EVPN broadcast domains
   (VLANs 1 and 2) at three sites.  With VLAN-based service, each
   broadcast domain has its own EVI.  With VLAN-aware bundle service,
   many broadcast domains can belong to the same EVI.

   In Figure 1, a multicast source is located at site 1 on subnet 1.
   There are three receivers: rcvr1 at site 2 on subnet 1, rcvr2 at
   site 2 on subnet 2, and rcvr3 at site 1 on subnet 2.  PIM
   adjacencies are formed among the NVEs on
   each subnet.  On subnet 1, NVE1 is the PIM DR while on subnet 2, NVE3
   is the PIM DR.

   Multicast traffic from the source at site 1 on subnet 1 is forwarded
   to all three sites on BD 1 following the EVPN BUM procedures.  Rcvr1
   gets the traffic when NVE2 sends it out of its local Attachment
   Circuit (AC).  The three gateways for EVI1 also receive the traffic
   on their IRB interfaces for subnet 1 and potentially route it to
   other subnets.  NVE3 is the DR on subnet 2, so it routes the traffic
   (local from an L3 point of view) to subnet 2; NVE1 and NVE2 are not
   the DR on subnet 2, so they do not.  Once traffic gets onto subnet 2,
   it is forwarded back to NVE1/2 and delivered to rcvr2/3 following the
   EVPN BUM procedures.

   Notice that the traffic is sent across the EVPN core multiple times -
   once for each subnet with receivers.  Additionally, both NVE1 and
   NVE2 receive the multicast traffic from subnet 1 on their IRB
   interfaces for subnet 1, but they do not route it to subnet 2 because
   they are not the PIM DR there.  Instead, they wait to receive traffic
   at L2 from NVE3.  For example, for rcvr3, which is connected to NVE1
   but on a different IP subnet from the multicast source, the multicast
   traffic has to go from NVE1 to NVE3 and then back to NVE1 before
   being delivered to rcvr3.  This is similar to the hairpinning issue
   with the centralized approach - the inter-subnet multicast forwarding
   is centralized via the DR, even though the distributed approach is
   used for unicast (in that each NVE supports IRB and routes
   inter-subnet unicast traffic locally).

           site 1     .      site 2      .       site 3
                      .                  .
            src       .      rcvr1       .
             |        .        |         .
         --------------------------------------------  BD 1
             |        .        |         .         |
         IRB1| DR     .    IRB1|         .     IRB1|
         IRB2|        .    IRB2|         .     IRB2| DR
             |        .        |         .         |
         --------------------------------------------  BD 2
             |        .        |         .
            rcvr3     .       rcvr2      .
                      .                  .
           site 1     .     site 2       .      site 3

           Figure 1 - EVPN IRB multicast scenario


2.  EVPN-aware Solution

   In the text above, the term "gateway" is from the hosts' point of
   view, referring to a "routing gateway" that provides Layer 3
   forwarding.  With the distributed approach, every (or almost every)
   NVE is a gateway, hence in the rest of the document we simply use the
   term NVE instead of gateway.

2.1.  Basic Operations

   The multicast forwarding inefficiency described above (hairpinning
   and multiple copies across the core) can be avoided if the following
   Optimized Inter-subnet Multicast (OISM) procedures are followed:

   1.  When a routing instance on an NVE receives multicast traffic on
       one of its IRB interfaces, it routes the traffic down any other
       IRB interfaces that attach to subnets that have receivers for the
       traffic, regardless of whether the NVE is the DR for those IRB
       interfaces.

   2.  For ASM multicast traffic sourced from a local AC, if PIM runs on
       the corresponding IRB interface, the NVE behaves as if it were
       the DR on the IRB interface and performs PIM Registering.

   3.  When an NVE receives Membership Reports from one of its ACs and
       PIM runs on the corresponding IRB interface, it sends PIM joins
       towards the RP or source regardless of whether it is the
       DR/querier.

   4.  Multicast data traffic received by a BD on its IRB interface
       (i.e., multicast data traffic routed down the IRB interface) is
       L2 switched out of that BD's local ACs only and not forwarded to
       other NVEs.  Note that link-local multicast traffic (e.g.,
       addressed to 224.0.0.x in case of IPv4) is not subject to the
       above procedures.  It is still forwarded to remote NVEs in the
       same subnet following EVPN procedures and not routed into other
       subnets.
   The above procedures are for routing traffic from the source subnet
   to other subnets.  In the source subnet itself, traffic is L2
   switched according to EVPN procedures.  It is assumed that each NVE
   of the tenant can receive the L2 switched traffic in the source
   subnet.  If there are NVEs not attached to every subnet (so that an
   NVE cannot receive L2 switched traffic in a source subnet that it is
   not connected to), then a Supplemental BD (SBD, Section 2.3) is
   needed to L2 switch the traffic from the source NVE to NVEs not
   attached to the source subnet.  In that SBD, multicast data traffic
   received on its IRB interface is forwarded to other NVEs, as an
   exception to rule 4.
   That is needed for situations discussed in Section 2.5.2 and
   Section 2.5.4.
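   As an informal sketch (illustrative Python, not normative text),
   rules 1 and 4 above, together with the SBD exception, amount to the
   following per-packet decision on the routing side; BD names and the
   "scope" strings are hypothetical:

```python
def oism_route_down_irbs(src_bd, bds_with_receivers, sbd="SBD"):
    """Model of OISM rules 1 and 4: traffic received on one IRB is
    routed down every other IRB whose subnet has receivers, regardless
    of DR status.  Traffic routed down an IRB reaches local ACs only,
    except on the SBD, where it is also forwarded across the core."""
    deliveries = []
    for bd in bds_with_receivers:
        if bd == src_bd:
            # The source subnet itself is handled by EVPN L2 switching,
            # not by routing down its own IRB.
            continue
        scope = "local-ACs-and-core" if bd == sbd else "local-ACs-only"
        deliveries.append((bd, scope))
    return deliveries
```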

   In the example in Figure 1, when NVE1's routing instance receives
   traffic on its IRB1 interface, it will route the traffic down its
   IRB2 for delivery to local rcvr3.  It also sends Register messages to
   the RP, since the source is local.  Both NVE2 and NVE3 will receive
   the traffic on IRB1, but neither sends Register messages to the RP,
   since the source is not local.  NVE2 will route the traffic down its
   IRB2 and deliver it to local rcvr2.  NVE3 will also route the traffic
   down IRB2 even though there is no receiver at its local site, because
   the IGMP/MLD joins from rcvr2/3 are also received by NVE3.

   Essentially, each NVE behaves as a DR/querier on an IRB interface for
   local senders and receivers, and multicast data traffic routed down
   IRB interfaces is limited to local receivers.

   If EVPN is only used to provide DC overlay service but not transit
   service (i.e., to simulate a transit LAN connecting tenant routers)
   for a tenant, then there is no need to run the PIM protocol, and
   rules 2 and 3 above do not apply.  Otherwise, the additional
   procedures in Section 2.5.4 are needed.

2.2.  Multi-homing Support

   The solution works as described above when there are multi-homed
   Ethernet Segments (ESes).

   As shown in Figure 2, both rcvr4 and rcvr5 are all-active multi-homed
   to NVE2 and NVE3.  Rcvr4 is on BD 1 and rcvr5 is on BD 2.  When the
   IRBs on NVE2 and NVE3 forward multicast traffic to their locally
   attached access interface(s) based on the EVPN BUM procedures, only
   the Designated Forwarder (DF) for the ES delivers the multicast
   traffic to its multi-homed receiver.  Hence no duplicate multicast
   traffic will be forwarded to rcvr4 or rcvr5.


            src       .        +-------- rcvr4-----+
             |        .        |         .         |
         --------------------------------------------  BD 1 (EVI1)
             |        .        |         .         |
         IRB1| DR     .    IRB1|         .     IRB1|
         IRB2|        .    IRB2|         .     IRB2| DR
             |        .        |         .         |
         --------------------------------------------  BD 2 (EVI2)
             |        .        |         .         |
            rcvr3     .        +-------- rcvr5-----+

           Figure 2 - EVPN IRB multicast and multi-homing

   For traffic sourced from a multi-homed ES, existing split-horizon
   procedures work as is, because vanilla EVPN forwarding is used for
   intra-subnet traffic.

2.3.  Receiver NVEs not connected to a source subnet

   The procedures of this document require that an inter-subnet
   multicast packet be carried across the core as an intra-subnet frame.
   However, consider the case where, for a given tenant, (a) NVE-1
   attaches to subnet-1, (b) NVE-2 attaches to subnet-2 but not to
   subnet-1, and (c) a receiver in subnet-2 needs to receive multicast
   packets that are sourced in subnet-1.  Since NVE-1 sends the packets
   across the core as intra-subnet multicasts, how does NVE-2 receive
   the packets?

   One possible solution would be to configure subnet-1 on NVE-2.  On
   NVE-2, subnet-1 would have an IRB interface attaching it to the
   routing instance, but subnet-1 would have no ACs.  Then NVE-2 would
   receive the intra-subnet multicast traffic of subnet-1, and the
   procedures already discussed would cause the traffic to be forwarded
   to NVE-2's local ACs for subnet-2.

   However, if a given tenant has many subnets, only a few of which
   attach to any given NVE, it is undesirable to have to configure all
   those subnets on all those NVEs.  To avoid this, we introduce the
   notion of a "Supplemental Broadcast Domain" (SBD).  Each NVE will
   have a single SBD (per tenant) configured.  The SBD has no ACs, just
   an IRB interface.  The purpose of the SBD on a given NVE is to
   receive (over the core) the intra-subnet multicasts of all subnets
   that are not attached to that NVE.  Additionally, traffic routed down
   the SBD IRB interface will be sent across the core to remote NVEs.
   This is an exception to rule 4 in Section 2.1, and is explained in
   Section 2.5.2 and Section 2.5.4.


   Thus in the above example, when NVE-1 sends a multicast packet from
   subnet-1 to other NVEs, NVE-2 will receive the packet on the SBD.
   Note that, in the example, NVE-1 does not have to send any extra
   copies of the packet across the core.  It just sends what it would
   normally send.  If an NVE receiving the packet is attached to subnet-
   1, it associates the packet with subnet-1; if an NVE receiving the
   packet is not attached to subnet-1, it associates the packet with the
   SBD.
   Subsequent sections explain how the NVEs construct the necessary EVPN
   routes to make this happen.

2.3.1.  IMET routes advertisement

   The SBD is a separate broadcast domain present on all the NVEs of the
   tenant.  It has a corresponding IRB interface but no ACs.  With VLAN-
   based service, the SBD is in its own EVI.  With VLAN-aware bundle
   service, the SBD is just an additional BD in the EVI.  The SBD uses a
   Route Target that allows its routes to be imported by all the NVEs of
   the tenant and associated with the SBD.  In case of VLAN-aware bundle
   service, the Route Target may be the same as or different from the
   Route Targets for other BDs in the same EVI.  In this document, when
   we say a route is originated for/in the SBD, it means that the RD of
   the route is set to the RD of the originating NVE's MAC-VRF for the
   SBD, the Route Target is set to that of the SBD, and the Tag ID is
   set to 0 in case of VLAN-based service or the Tag ID for the SBD in
   case of VLAN-aware bundle service.

   The rules of IMET route advertisement can be summarized as follows:

   o  When IR, BIER, or RSVP-TE P2MP is being used for inclusive
      tunnels, each NVE originates an IMET route in the SBD.  In case of
      IR, the MPLS Label field in the IMET route's PMSI Tunnel Attribute
      (PTA) is a downstream allocated label for the SBD.

   o  When PIM, BIER or mLDP/RSVP-TE P2MP is being used for inclusive
      tunnels, the IMET route that an NVE originates for a subnet
      carries the RT for the subnet and the RT for the SBD.

   o  In case of BIER, or if tunnel aggregation (a single tunnel used
      for more than one broadcast domain) is used for mLDP/RSVP-TE
      P2MP, the IMET route for the source subnet carries an upstream
      allocated label in the PMSI Tunnel Attribute.  The label is
      different for each source subnet.

   With the above rules, IMET routes are advertised in both the SBD and
   source subnets if IR, BIER or RSVP-TE P2MP tunnels are used.  IMET
   routes are only advertised in the source subnet in case of PIM/mLDP
   P2MP tunnels.
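   The summary above can be sketched as a simple lookup (illustrative
   Python, not normative; the tunnel-type strings are informal labels):

```python
def imet_advertisement_scope(tunnel_type):
    """Where an NVE advertises IMET routes, per the summary above:
    in both the SBD and the source subnets for IR, BIER, and RSVP-TE
    P2MP; only in the source subnets for PIM and mLDP P2MP tunnels."""
    if tunnel_type in ("IR", "BIER", "RSVP-TE P2MP"):
        return {"SBD", "source subnets"}
    if tunnel_type in ("PIM", "mLDP"):
        return {"source subnets"}
    raise ValueError("unknown tunnel type: %s" % tunnel_type)
```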

2.3.2.  Layer 2 Forwarding State

   In case of IR, when a source NVE builds its L2 forwarding state for a
   BD, it finds all the remote NVEs that need to receive traffic by
   finding the IMET routes for the SBD.  The IMET routes for the SBD are
   those in the MAC-VRF for the SBD (in case of VLAN-based service) or
   those in the MAC-VRF for the SBD and with the SBD's Tag ID (in case
   of VLAN-aware bundle service).

   If a remote NVE (learnt via the IMET route for the SBD) also
   advertises an IMET route for the source subnet, the label in that
   route is used.  Otherwise, the label in the IMET route for the SBD is
   used.  Thus when a packet is transmitted to an NVE attached to the
   source subnet, it carries the label that that NVE assigned to the
   source subnet.  When a packet is transmitted to an NVE that is not
   attached to the source subnet, it carries the label that that NVE
   assigned to the SBD.
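   For the IR case, the label-selection rule above can be sketched as
   follows (illustrative Python, not normative; "remote_imet_labels" is
   a hypothetical map from the BDs a remote NVE advertised IMET routes
   for to the labels in those routes):

```python
def label_for_copy_to_remote_nve(remote_imet_labels, source_bd,
                                 sbd="SBD"):
    """Choose the downstream label for a copy sent to one remote NVE:
    the label from that NVE's IMET route for the source subnet if it
    advertised one, otherwise the label from its SBD IMET route."""
    if source_bd in remote_imet_labels:
        # Remote NVE is attached to the source subnet.
        return remote_imet_labels[source_bd]
    # Remote NVE is not attached to the source subnet; use its SBD
    # label so the packet is associated with the SBD on arrival.
    return remote_imet_labels[sbd]
```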

   In case of RSVP-TE P2MP, the source NVE establishes a P2MP tunnel to
   all remote NVEs found through the SBD's IMET routes and advertises
   the tunnel in the IMET route for the source subnet.  If tunnel
   aggregation is not used, a remote NVE attached to the source subnet
   binds the incoming tunnel branch to the source subnet, and a remote
   NVE that is not attached to the source subnet binds the incoming
   tunnel branch to the SBD.

   In case of PIM/mLDP, a remote NVE joins the tunnel advertised in the
   IMET route for a source subnet.  If tunnel aggregation is not used, a
   remote NVE attached to the source subnet binds the incoming tunnel
   branch to the source subnet, and a remote NVE that is not attached to
   the source subnet binds the incoming tunnel branch to the SBD.

   In case of BIER, or if tunnel aggregation is used for mLDP/RSVP-TE
   P2MP, a remote NVE binds the upstream allocated label in the IMET
   route for a source subnet to that subnet if it is present on the NVE.
   Otherwise it binds the label to the SBD.

   With the forwarding state set up as above, the incoming traffic from
   a remote NVE is either associated with the source subnet or with the
   SBD.  In the former case, traffic is forwarded at L2 to local
   receivers in the same source subnet, and split-horizon procedures for
   multi-homing work as is.  In the latter case, the traffic appears to
   the receiving NVE as if it were sourced from the SBD.


   The incoming traffic from a remote NVE is also associated with the
   IRB interface of either the source subnet or the SBD, and routed down
   other IRB interfaces for local receivers in other subnets, according
   to matching Layer 3 forwarding state as described in the following
   section.
2.3.3.  Layer 3 Forwarding State

   When an NVE's routing instance receives IGMP/MLD joins on IRB
   interfaces, corresponding (C-S,C-G) or (C-*,C-G) L3 forwarding
   entries are created/updated.  The OIF list includes IRB interfaces
   that have corresponding (C-S,C-G) or (C-*,C-G) IGMP/MLD state built
   from relevant IGMP/MLD joins.  An OIF is removed when the
   corresponding IGMP/MLD state is removed from the interface, and the
   (C-S,C-G) or (C-*,C-G) L3 forwarding state is removed when all of its
   OIFs are removed.

   For (C-S,C-G) L3 forwarding entries, the IIF is set to the source
   subnet's IRB interface if the source subnet is present on the NVE.
   If the source subnet is not present on the NVE, the IIF is set to the
   SBD's IRB interface.

   For (C-*,C-G) forwarding entries, the RPF interfaces include all IRB
   interfaces, as the traffic can arrive in the SBD or in any subnet to
   which the NVE is attached.  Note that a particular packet arrives
   only once, and is associated with either the source subnet or the
   SBD.
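   The IIF selection for (C-S,C-G) entries described above can be
   sketched as follows (illustrative Python, not normative; interface
   names such as "IRB-BD1" are hypothetical):

```python
def build_sg_entry(source_bd, local_bds, oif_irbs, sbd_irb="IRB-SBD"):
    """Sketch of the (C-S,C-G) L3 forwarding state: the IIF is the
    source subnet's IRB if that subnet is present on the NVE, else the
    SBD's IRB; the OIF list holds the IRB interfaces with matching
    IGMP/MLD state."""
    iif = "IRB-" + source_bd if source_bd in local_bds else sbd_irb
    return {"iif": iif, "oifs": set(oif_irbs)}
```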

2.4.  Selective Multicast

   For intra-subnet selective multicast,
   [I-D.sajassi-bess-evpn-igmp-mld-proxy] specifies the procedures for
   SMET routes.  If an NVE has local receivers for (C-*,C-G) traffic in
   subnet X, then since the sources could be in any of the other subnets
   present on the NVE, it would need to advertise (C-*,C-G) SMET routes
   in each of those source subnets to pull traffic.  To avoid this
   duplication, the SBD is used even if every subnet is connected to
   every NVE of a tenant, and SMET routes are advertised as follows:

   o  If there are tenant routers (Section 2.5.4), SMET routes are
      originated per [I-D.sajassi-bess-evpn-igmp-mld-proxy] in the
      subnet where the state is originally learnt.  This will allow NVEs
      in the same subnet to convert SMET routes back to IGMP/MLD
      messages on ACs.

   o  Additionally, a corresponding SMET route is originated for the
      SBD, with the v1/v2/v3 flag bits cleared, with one exception
      described below.


   Note that for (C-S,C-G) SMET routes, even though they would not need
   to be advertised in every source subnet as in the (C-*,C-G) case,
   they are also advertised in the SBD.  The reason is that a receiver
   for a (C-S,C-G) flow may be attached to an NVE that is not connected
   to the source subnet, so the SMET route needs to be advertised in the
   SBD anyway in that case.  For consistency in all situations, all SMET
   routes are advertised in the SBD.

   The one exception is that a (C-S,C-G) SMET route with the IE
   (include/exclude) bit set may be suppressed in the SBD, according to
   the IGMP/MLD state merged from all subnets.  For example, if a
   particular source is excluded in one subnet but not in another, then
   the SMET route will not be originated for the SBD.  This can be
   viewed as the IGMP/MLD state in the subnets being proxied into the
   SBD, just like the IGMP/MLD state on ACs is proxied to other ACs in
   the same subnet.

   The SMET routes in the SBD will trigger IGMP/MLD state on the SBD's
   IRB interfaces.  Note that for L3 multicast forwarding state, the SBD
   IRB interface is not added to the Outgoing InterFace (OIF) List when
   the RPF interface is one or more IRB interfaces (i.e., traffic is
   sourced from a BD), even with the IGMP/MLD state on the SBD IRB
   interface.  The reason is that traffic from that BD is already L2
   switched to all NVEs.
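
   The rule above can be sketched as follows (illustrative only; the
   "irb-" interface-naming convention is an assumption of this
   example):

```python
# Sketch: build the L3 OIF list.  The SBD IRB interface may carry
# IGMP/MLD state triggered by SMET routes, but it is omitted whenever
# the RPF interface is itself an IRB interface (traffic sourced from
# a BD), because such traffic is already L2 switched to all NVEs.

def build_oif_list(interfaces_with_state, rpf_interfaces, sbd_irb="irb-sbd"):
    rpf_is_irb = any(i.startswith("irb") for i in rpf_interfaces)
    oifs = []
    for intf in interfaces_with_state:
        if intf == sbd_irb and rpf_is_irb:
            continue  # already L2 switched to all NVEs
        if intf in rpf_interfaces:
            continue  # never forward back out the RPF interface
        oifs.append(intf)
    return oifs
```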

   [I-D.sajassi-bess-evpn-igmp-mld-proxy] assumes that selective
   forwarding is always used with IR or BIER for all flows.  The SMET
   route allows other NVEs to identify which NVEs need to receive
   traffic for a particular (C-S,C-G) or (C-*,C-G).  With the SBD, a
   source NVE builds the corresponding forwarding state using the same
   procedure as in the inclusive tunnel case, except that it checks the
   corresponding SMET route in the SBD to determine if a remote NVE
   needs to receive the traffic.

   For other tunnel types, or if selective forwarding is only used for
   some of the flows, S-PMSI A-D routes are needed as specified in
   [I-D.ietf-bess-evpn-bum-procedure-updates].  A source NVE advertises
   S-PMSI A-D routes to announce the tunnels used for certain flows,
   and receiving NVEs either join the announced PIM/mLDP tunnel or
   respond with Leaf A-D routes if the Leaf Information Requested flag
   is set in the S-PMSI A-D route's PTA (so that the source NVE can
   include them as tunnel leaves).  As in the inclusive tunnel case,
   the S-PMSI A-D routes additionally carry the RT for the SBD so that
   all NVEs of the tenant will import them.  A receiving NVE binds the
   announced tunnel to the subnet that the route is for if that subnet
   is present on the NVE, or to the SBD otherwise.
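
   A receiving NVE's handling of an S-PMSI A-D route can be sketched
   as follows (a minimal illustration; the data model is an assumption
   of this example, not part of the specification):

```python
# Sketch: bind the announced tunnel to the route's BD when that BD is
# locally present, otherwise to the SBD; respond with a Leaf A-D
# route only when the PTA's Leaf Information Requested flag is set
# and the NVE needs the traffic.

def process_spmsi_ad(route_bd, local_bds, leaf_info_requested, interested):
    binding = route_bd if route_bd in local_bds else "SBD"
    send_leaf_ad = leaf_info_requested and interested
    return binding, send_leaf_ad
```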

2.5.  Advanced Topics

2.5.1.  Legacy NVEs

   It is possible that an NVE does not support the OISM procedures.
   For example, it may not have IRB interfaces for some of its BDs, or
   its software may not be upgradable to support OISM.  To indicate
   OISM support, an NVE that supports the procedures in this document
   includes the Multicast Flags Extended Community in its IMET routes
   and sets a new flag bit (the OISM bit, to be assigned by IANA) in
   the EC.

   Suppose a multicast source is attached to NVE 1 in subnet 1, subnet
   1 is not present on NVE 2, which does not support OISM, and NVE 2
   has some receivers in its subnet 2.  In this case, the receivers
   need to receive traffic in subnet 2 from NVE 1.  For that, the OISM
   NVEs run PIM over any subnet for which not all NVEs support OISM,
   and the elected PIM DR uses a separate provider tunnel to forward
   traffic (that is routed down the DR's IRB interface for the subnet)
   only to NVEs that do not support OISM.

   If the PIM DR uses IR to forward BUM traffic in the subnet, the
   special tunnel's leaves include the NVEs that do not set the OISM
   bit in the above-mentioned EC.
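
   Computing the special tunnel's leaf set can be sketched as follows
   (illustrative; the mapping of IMET routes to flag sets is an
   assumed model, not a normative structure):

```python
# Sketch: the leaves of the PIM DR's special IR tunnel are the NVEs
# whose IMET routes do not carry the OISM bit in the Multicast Flags
# EC (an absent EC is modeled as an empty flag set).

def non_oism_leaves(imet_flags_by_nve):
    return {nve for nve, flags in imet_flags_by_nve.items()
            if "OISM" not in flags}
```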

   If the PIM DR uses P2MP tunnels, the special tunnel is advertised in
   an EVPN S-PMSI A-D route per
   [I-D.ietf-bess-evpn-bum-procedure-updates].  The route carries an
   EVPN Non-OISM Extended Community, indicating that a receiving NVE
   attached to the BD identified in the route should join the advertised
   tunnel only if it does not support OISM.

   The routes could be either (C-*,C-*) wildcard S-PMSI A-D routes if
   an inclusive tunnel is used (but only for the sites without IRBs),
   or individual (C-S,C-G)/(C-*,C-G)/(C-S,C-*) S-PMSI A-D routes if
   selective tunnels are used.  They are advertised for each BD to
   deliver multicast traffic routed down the IRB interface for the BD
   to remote sites that do not have IRBs for the BD.  If the same
   (C-S,C-G)/(C-*,C-G)/(C-S,C-*)/(C-*,C-*) S-PMSI A-D routes are also
   advertised without the EVPN Non-OISM EC (to deliver intra-subnet
   traffic), then different RDs MUST be used for the two routes.

   The EVPN Non-OISM Extended Community is a new EVPN extended
   community.  EVPN extended communities are transitive extended
   communities with a Type field of 6.  The sub-type of this new EVPN
   extended community will be assigned by IANA, and it has the
   following 8-octet encoding:

        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       | Type=0x06     | Sub-Type TBD  |      Reserved=0               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |                       Reserved=0                              |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
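
   The 8-octet encoding above can be illustrated with the following
   sketch (the sub-type value below is a placeholder, since the actual
   value is to be assigned by IANA):

```python
import struct

EVPN_EC_TYPE = 0x06
NON_OISM_SUBTYPE = 0x00  # placeholder; TBD by IANA

def encode_non_oism_ec(subtype=NON_OISM_SUBTYPE):
    # Type (1 octet), Sub-Type (1 octet), 6 reserved zero octets.
    return struct.pack("!BB6x", EVPN_EC_TYPE, subtype)
```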

   For multicast sources attached to a Non-OISM NVE, if the source
   subnet is present on all NVEs, then traffic will be L2 switched to
   all NVEs in the source subnet and then forwarded appropriately.  For
   simplicity, this document requires that all subnets on a Non-OISM NVE
   are configured on all NVEs, even if there would be no ACs on some
   NVEs for those subnets.

2.5.2.  Traffic to/from outside of an EVPN domain

   For traffic entering or leaving an EVPN domain, EVPN Gateways (GWs)
   are used.  They are NVEs that also participate in the SBD for each
   tenant, and may be connected to some subnets.  This document assumes
   that the GWs run PIM on their external tenant interfaces, or act as
   MVPN PEs for external connections (with the IRB interfaces being VRF
   interfaces in the IPVPN).  The subnets in the EVPN domain appear as
   stub networks connected to the PIM/MVPN domain.  This section
   describes the procedures that are common to both PIM and MVPN
   external connections, while the next section focuses on procedures
   specific to MVPN.

   If there are multiple GWs for the same EVPN domain, then the GWs
   need to run PIM on the IRB interfaces for the subnets and the SBD,
   so that a DR can be elected for each subnet/SBD and act as FHR/LHR
   on the subnets/SBD.  In other words, traffic inside the EVPN domain
   follows the procedures described in previous sections, while traffic
   to/from outside the EVPN domain needs to additionally follow
   existing PIM/MVPN procedures.

   For traffic going out of the EVPN domain, the IRB interface of the
   source subnet or SBD is the RPF interface on the GW, depending on
   whether the source subnet is present on the GW.  In the case of
   PIM-SM, the EVPN GW that is the PIM DR on a connected source subnet
   or on the SBD acts as the First Hop Router (e.g., handling PIM
   register procedures for ASM).  For that, the SBD IRB needs to be
   configured to treat incoming packets as if the sources were on a
   local subnet (in this case the SBD).

   When selective forwarding is used in the EVPN domain, for the EVPN GW
   to receive all traffic (before it learns possible external receivers)
   for the purpose of FHR procedures, it MUST advertise a (C-*,C-*)
   SMET route in the SBD, indicating to other NVEs that it needs to
   receive
   all traffic.  Later, the EVPN GW may receive (C-S,C-G) prunes from
   the external network.  At that time, it MAY advertise a (C-S,C-G)
   SMET route with the Exclude Group type bit and the IGMPv3 bit in
   the Flags field set, signaling to other NVEs that the particular
   (C-S,C-G) traffic is not needed.

   For traffic coming into an EVPN domain, the IRB interfaces for
   connected subnets are included in the OIF list for the L3 multicast
   forwarding route, if the subnets have corresponding local IGMP/MLD
   state.  The IRB interface of the SBD may also be added as an
   outgoing interface so that remote NVEs can receive the traffic and
   route it to their connected subnets.  Note that in this case, data
   traffic sent down the SBD IRB interface is forwarded to remote NVEs
   (this is an exception to the behavior in Section 2).  The SBD IRB
   interface is added only if the GW has corresponding SMET routes (as
   described in Section 2.4) received from other NVEs in the SBD.
   Corresponding PIM join/prune messages or BGP-MVPN routes will be
   triggered/withdrawn as a result.

   For (C-*,C-G) L3 forwarding state, Section 2.3.3 states that all
   IRB interfaces are included in the RPF interface list, and
   Section 2.4 states that the SBD IRB interface is not added to the
   OIF list if the RPF interfaces include one or more IRB interfaces.
   That is to prevent routing internal traffic into the SBD at layer 3
   (because the source NVE already L2 switches the traffic to all
   NVEs).  This means that traffic coming into the EVPN domain cannot
   use the (C-*,C-G) forwarding state (it would not be routed down the
   IRB interface for the SBD to reach remote NVEs, because that IRB is
   not in the OIF list).  For this to work, the interface or MVPN
   tunnel connecting towards the C-RP is not added as an IIF of the
   (C-*,C-G) forwarding state (even though a PIM join is sent out of
   that interface).  Initial traffic for an externally sourced flow
   will then match the (C-*,C-G) forwarding state and trigger IIF
   Mismatch notifications (since the incoming interface does not match
   any of the IIFs), causing the EVPN GW to install (C-S,C-G) state
   with the external interface (or MVPN provider tunnel) as the RPF
   interface and the IRB interface included in the OIF list.

2.5.2.1.  A Variation of External Connection

   If a tenant's external connection is via a vlan (instead of MVPN),
   and there are no sources like C-s1/2/5 as described in
   Section 2.5.3, then the following variation can be used.

   The external vlan connection becomes an AC in the SBD.  The tenant
   external router becomes the PIM FHR and LHR for the EVPN domain that
   is treated as a stub network.  The previous EVPN GWs are no longer
   gateways and are referred to as edge NVEs in this section.  An AE
   bundle can be used to connect to multiple edge NVEs - the bundle
   terminates either on the external router or on a switch between the
   edge NVEs and the external router, as depicted in the following
   picture.  From the edge NVEs' point of view, the external PIM router
   is a TS on a multihomed ES.

                     vlan3                           vlan4
                TS3--------NVE3                NVE4---------TS4
                 vlan1                                   vlan2
             TS1------Edge NVE1                Edge NVE2------TS2
                            \                   /
                             \SBD AC     SBD AC/
                              \               /
                               \             /
                                \ AE bundle /
                                 \         /
                              external PIM router

   PIM is not running on any of the NVEs.  IGMP/MLD state inside the
   EVPN domain is proxied to the external vlan and triggers
   corresponding multicast state on the external router.  Externally
   sourced traffic is routed to the vlan as a result, and is L2 switched
   by the edge NVEs to other NVEs via the SBD.  All receivers in the
   EVPN domain receive the traffic that is routed to them by their
   attached NVEs (the IIF is the SBD IRB and the OIFs are the IRBs for
   the subnets that the receivers are on).

   For traffic sourced from inside the EVPN domain to reach external
   receivers, the edge NVEs still need to advertise a (C-*,C-*) SMET
   route in the SBD to pull all traffic and L2 switch it to the
   external router, which will register towards the RP.  The external
   router may prune back a particular flow by sending appropriate
   IGMP/MLD messages, triggering corresponding SMET routes on the edge
   NVEs so that the source NVEs will stop sending traffic towards the
   edge NVEs.

2.5.3.  Integration with MVPN

   When a tenant needs to connect its EVPN subnets to external networks
   via L3VPN, instead of running both EVPN and L3VPN on each NVE, this
   document recommends that L3VPN (hence MVPN) only extends to the EVPN
   GWs, and only EVPN runs inside the EVPN domain.  EVPN GWs run both
   EVPN and L3VPN/MVPN, as depicted in the following diagram.

                    C-s1                               ---+
                      |                                   |
                     PE1          PE2                     |
                                                          |  WAN; L3VPN
                       (L3VPN/MVPN)    _C-s5              |
                                      /                   |
            C-s2 --- GW1          GW2 --- r1           ---+  GWs; EVPN+L3VPN
                     /|\          /|\                     |
                    / | \ (EVPN) / | \                    |
                   /  |  \      /  |  \                   |
                 bd1 bd2    SBD   bd2 bd3                 |  DC; EVPN only
                 /    |                                   |
              C-s3  C-s4                                  |
                        (more NVEs omitted)            ---+

   GW1/2 run both EVPN and L3VPN.  They may advertise routes learnt
   from PE1/PE2 (e.g., C-s1), routes to locally attached non-EVPN
   destinations (e.g., C-s2/s5), or just a default route into the EVPN
   domain as EVPN type-5 routes.  For destinations inside the EVPN
   domain (both EVPN and non-EVPN, e.g., C-s2/3/4/5), the GWs may
   advertise subnet prefix L3VPN routes towards outside the EVPN
   domain, or optionally advertise host IPVPN routes when they're
   learnt via EVPN type-2 routes.  The L3VPN routes are all advertised
   with Source AS and VRF Route Import ECs [RFC6514] for MVPN
   purposes.

   Using GW2 as an example, when it determines the RPF interface/
   neighbor or MVPN UMH for various sources, it follows these rules:

   o  If the source (e.g., C-s5) is reachable on a local non-IRB
      interface, use that interface as the RPF interface.  Or,

   o  If the source (e.g., C-s4) is on a local BD, use the IRB for
      that local subnet as the RPF interface.  Or,

   o  If the route to the source (e.g., C-s2/s3) is learnt via EVPN
      type-2/5 routes, use the SBD IRB as the RPF interface.  Or,

   o  If the route to the source (e.g., C-s1/s2) has a VRF Route
      Import EC, then use the MVPN procedure for UMH selection and use
      the MVPN provider tunnel as the RPF interface.
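
   The selection order above can be sketched as follows (illustrative
   only; the route-attribute names are assumptions of this example,
   not a normative data model):

```python
# Sketch: pick the RPF interface for a source by applying the four
# rules above, in order.

def select_rpf(source, routes):
    r = routes[source]
    if r.get("local_non_irb_if"):           # e.g., C-s5
        return r["local_non_irb_if"]
    if r.get("local_bd"):                   # e.g., C-s4
        return "irb-" + r["local_bd"]
    if r.get("evpn_route_type") in (2, 5):  # e.g., C-s2/s3
        return "irb-sbd"
    if r.get("vrf_route_import_ec"):        # e.g., C-s1: MVPN UMH
        return "mvpn-provider-tunnel"
    raise LookupError("no RPF candidate for " + source)
```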

   Notice that for C-s2, GW2 may either use the SBD IRB or the MVPN
   provider tunnel as the RPF interface, depending on whether the
   IPVPN route or the EVPN type-5 route is selected as the active
   route.

   Also notice that for C-s4, if GW1/2 only advertises the subnet prefix
   into L3VPN, then PE1/2 may pick GW2 as the UMH.  It will still work
   as GW2 will get the traffic in bd2 as well.  However, it would be
   more optimal if GW1 were picked as the UMH, as C-s4 is directly
   attached to GW1.  To achieve this optimization, when GW2 receives
   the
   C-multicast route for (C-s4,C-g) from PE1/2, it may optionally
   advertise a C-multicast route to GW1 where C-s4 is directly attached.
   This will trigger a (C-s4,C-g) Source Active route, which PE1/2
   may optionally use to influence their UMH selection such that GW1
   is chosen as their UMH for C-s4.

2.5.4.  When Tenant Routers Are Present

   It is possible that an EVPN broadcast domain provides transit
   service for a tenant's larger network, with tenant routers attached
   to the subnet running routing protocols like PIM.  In that case,
   traffic routed by an upstream NVE to the subnet via an IRB
   interface may be expected by a downstream tenant router.  However,
   since multicast data traffic sent down the IRB interfaces is
   forwarded to local ACs only and not to other EVPN sites, according
   to rule 4 in Section 2, additional procedures are needed to handle
   this situation with tenant routers.  In particular, NVEs connecting
   to tenant routers or traffic sources need to run PIM on the IRB
   interfaces for the transit subnet and the SBD.

   Consider the following situation:

        S1                                  S2
         \ N1                              / N2
          CE1a                            CE2b
           \ vlan1                       / vlan1
            NVE1    ------------   NVE2  ---- CE2a -- receiver
           / N3                                    N4

   CE1a and CE2a/b are three CE routers on vlan1, which is implemented
   by EVPN.  The CEs and NVE1/2 run PIM and are PIM neighbors on
   vlan1.  CE2a has a receiver on network N4 for multicast traffic
   from S1/2/3 on networks N1/2/3 respectively.

   CE2a sends PIM joins to CE1a/CE2b/NVE1 on vlan1 for the three
   sources respectively, and they all route traffic accordingly onto
   vlan1.  Traffic from S1/2 will reach CE2a because NVE1/2 receive
   the L2 traffic on their ACs and forward it across the core
   following EVPN procedures.  Traffic from S3 is routed into vlan1 by
   NVE1 via the IRB interface, and per rule 4 in Section 2 the traffic
   will not be sent across the core.  Thus, according to the
   procedures specified so far, the traffic from S3 will never be
   received by NVE2 or CE2a.

   To solve this problem, NVE2 needs to know that CE2a sent a PIM join
   to another NVE in vlan1 and needs to pull traffic via the SBD,
   where the traffic via IRB is not blocked on the core side.  Because
   the PIM
   protocol already requires a router to process join/prune messages
   that it receives on an interface even if it is not the intended RPF
   neighbor (for the purpose of join suppression and prune
   overriding), NVE2 can tell whether the upstream router in the join
   message is another NVE or a CE router (this only requires the NVEs
   to keep track of whether a neighbor is an NVE for the subnet).  In
   that case, it treats that join/prune as being for itself.
   Correspondingly, its PIM upstream state machine will choose one of
   the NVEs as the RPF neighbor.  Between this local NVE and the
   chosen RPF neighbor there could be multiple subnets including the
   SBD, but the SBD IRB interface is explicitly chosen as the RPF
   interface.  The corresponding join/prune is sent over the SBD IRB
   interface (optionally the join/prune could be replaced with SMET
   routes), and the upstream NVE will route traffic through the SBD.
   This NVE then routes the traffic further downstream to CE routers.

   Similarly, if an NVE needs to send PIM join/prune messages due to its
   local IGMP/MLD state changes, the RPF interface is always explicitly
   set to the SBD IRB.

   Note that, if CE2a chooses NVE1 or NVE2 instead of CE1a as its RPF
   neighbor for S1, then both CE1a and NVE2 will send traffic to vlan1
   (NVE1 receives the join from NVE2 on the SBD and sends a join to
   CE1a on vlan1; NVE1 receives traffic from CE1a on vlan1 and routes
   it to the SBD; NVE2 receives traffic on the SBD and routes it to
   local receivers on vlan1).  The PIM assert procedure kicks in but
   only on NVE2, as CE1a does not receive traffic from NVE2.  To
   address this, an NVE must track all the RPF neighbors and not add
   an IRB interface to the OIF list if it received a corresponding PIM
   join on the IRB in which a tenant router is listed as the upstream
   neighbor.  That tenant router will deliver traffic to the subnet,
   and the traffic will be forwarded through the core as it is not
   routed down the IRB but received on an AC.
   With PIM-ASM, if the DR on a source subnet is a tenant router, it
   will handle the registering procedures for PIM-ASM.  As a result,
   the NVE at the same site as the tenant router/DR MUST NOT handle
   the registering procedures described in Section 2.

3.  IANA Considerations

   This document requests the following IANA assignments:

   o  A "Non-OISM" Sub-Type in "EVPN Extended Community Sub-Types"
      registry for the EVPN Non-OISM Extended Community.

   o  An "Optimized Inter-subnet Multicast" (OISM) bit in the
      Multicast Flags extended community defined in
      [I-D.sajassi-bess-evpn-igmp-mld-proxy].

4.  Security Considerations

   To be updated.

5.  Acknowledgements

   The authors thank Eric Rosen for his detailed review, valuable
   comments/suggestions, and some suggested text.  The authors also
   thank Vikram Nagarajan and Princy Elizabeth for their contribution
   of the external connection variation (Section 2.5.2).  The authors
   also benefited tremendously from discussions with Aldrin Isaac on
   EVPN multicast optimizations.

6.  References

6.1.  Normative References

   [I-D.ietf-bess-evpn-bum-procedure-updates]
              Zhang, Z., Lin, W., Rabadan, J., and K. Patel, "Updates
              on EVPN BUM Procedures", draft-ietf-bess-evpn-bum-
              procedure-updates-01 (work in progress), December 2016.

   [I-D.sajassi-bess-evpn-igmp-mld-proxy]
              Sajassi, A., Thoria, S., Patel, K., Yeung, D., Drake, J.,
              and W. Lin, "IGMP and MLD Proxy for EVPN", draft-sajassi-
              bess-evpn-igmp-mld-proxy-01 (work in progress), October
              2016.

   [RFC2119]  Bradner, S., "Key words for use in RFCs to Indicate
              Requirement Levels", BCP 14, RFC 2119,
              DOI 10.17487/RFC2119, March 1997,
              <http://www.rfc-editor.org/info/rfc2119>.

   [RFC7432]  Sajassi, A., Ed., Aggarwal, R., Bitar, N., Isaac, A.,
              Uttaro, J., Drake, J., and W. Henderickx, "BGP MPLS-Based
              Ethernet VPN", RFC 7432, DOI 10.17487/RFC7432, February
              2015, <http://www.rfc-editor.org/info/rfc7432>.

6.2.  Informative References

   [I-D.ietf-bess-evpn-inter-subnet-forwarding]
              Sajassi, A., Salam, S., Thoria, S., Drake, J., Rabadan,
              J., and L. Yong, "Integrated Routing and Bridging in
              EVPN", draft-ietf-bess-evpn-inter-subnet-forwarding-03
              (work in progress), February 2017.

   [RFC4364]  Rosen, E. and Y. Rekhter, "BGP/MPLS IP Virtual Private
              Networks (VPNs)", RFC 4364, DOI 10.17487/RFC4364, February
              2006, <http://www.rfc-editor.org/info/rfc4364>.

   [RFC5015]  Handley, M., Kouvelas, I., Speakman, T., and L.
              Vicisano, "Bidirectional Protocol Independent Multicast
              (BIDIR-PIM)", RFC 5015, DOI 10.17487/RFC5015, October
              2007, <http://www.rfc-editor.org/info/rfc5015>.

   [RFC6514]  Aggarwal, R., Rosen, E., Morin, T., and Y. Rekhter, "BGP
              Encodings and Procedures for Multicast in MPLS/BGP IP
              VPNs", RFC 6514, DOI 10.17487/RFC6514, February 2012,
              <http://www.rfc-editor.org/info/rfc6514>.

   [RFC7761]  Fenner, B., Handley, M., Holbrook, H., Kouvelas, I.,
              Parekh, R., Zhang, Z., and L. Zheng, "Protocol Independent
              Multicast - Sparse Mode (PIM-SM): Protocol Specification
              (Revised)", STD 83, RFC 7761, DOI 10.17487/RFC7761, March
              2016, <http://www.rfc-editor.org/info/rfc7761>.

Appendix A.  Integrated Routing and Bridging

   Consider a traditional router that only does routing and has no L2
   switching (also referred to as "bridging") capabilities.  It has two
   interfaces lan1 and lan2 connecting to LAN 1 and LAN 2 respectively.
   The two LANs are realized by two switches respectively, with hosts
   and the router attached:

                  +-------+        +--------+        +-------+
                  |       |    lan1|        |lan2    |       |
          H1 -----+switch1+--------+ router +--------+switch2+------H2
                  |       |        |        |        |       |
                  +-------+        +--------+        +-------+
             |____________________|          |_____________________|
                  LAN1                              LAN2

   Interfaces lan1 and lan2 are two physical interfaces with IP
   configuration and functionality.  As such, they may also be
   referred to as IP interfaces (on top of layer 2 interfaces).  H1
   has a default gateway configured, which is the router's IP address
   on interface lan1.  For H1 to send an IP packet destined to H2, it
   uses the router's mac address for lan1 (learnt via ARP resolution
   for the gateway) as the destination mac address.  The router
   receives the packet from the switch and associates it with the IP
   interface lan1 because the destination mac address matches.  An IP
   lookup is done and the packet is sent out of interface lan2, with
   H2's mac address (again learnt via ARP resolution) as the
   destination mac address and the router's mac address on lan2 as the
   source mac address.  The TTL is decremented and fragmentation may
   be done during this forwarding process.  This process may be
   referred to as "routing a packet".  For comparison, when switch1
   sends the packet that it receives from H1 to the router, it is
   "bridging" or "L2 switching" the packet.  There is no TTL or
   fragmentation for L2 switching, and there is no source/destination
   mac address change.
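
   The contrast between routing and bridging described above can be
   sketched as follows (frames modeled as plain dictionaries purely
   for illustration):

```python
# Sketch: routing rewrites the source/destination mac addresses and
# decrements the TTL; bridging (L2 switching) changes nothing.

def route_packet(frame, router_mac, next_hop_mac):
    routed = dict(frame)
    routed["src_mac"] = router_mac    # router's mac on the egress LAN
    routed["dst_mac"] = next_hop_mac  # learnt via ARP resolution
    routed["ttl"] = frame["ttl"] - 1  # TTL decremented when routing
    return routed

def bridge_packet(frame):
    return dict(frame)  # no TTL or mac address changes
```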

   If H1 sends an IP multicast packet, the multicast destination mac
   address and IPv4/6 Ethertype cause the router to associate the packet
   with the IP interface lan1 and may route it out of other IP
   interfaces as appropriate, following multicast routing rules.

   Now consider that the router itself supports both routing and
   bridging.  Now the above picture becomes the following:

                  |   Integrated Router and Bridge/Switch    |

                  +-------+        +--------+        +-------+
                  |       |    IRB1|   L3   |IRB2    |       |
          H1 -----+  BD1  +--------+routing +--------+  BD2  +------H2
                  |       |        |instance|        |       |
                  +-------+        +--------+        +-------+
             |____________________|          |______________________|
                  LAN1                              LAN2

   The router now includes a routing instance (which could be the
   default/master instance, a Virtual Router routing instance, a VRF, or
   a VRF Lite, depending on which vendor's nomenclature one is familiar
   with) and two broadcast domains (BDs) that provide bridging
   functionalities.  Instead of two physical interfaces connecting to
   two physical switches, there are two logical interfaces connecting
   the routing instance to the two BDs.

   Because the device now provides both routing and bridging
   functionalities, it becomes an Integrated Router and
   Bridge(/Switch), and the two logical interfaces are referred to as
   IRB interfaces, or sometimes simply IRBs.  For each BD that needs
   routing functionality, there is one IRB interface connecting the BD
   to a particular routing instance.
   Other than that the logical IRB interfaces replace physical
   interfaces, the way the packets are forwarded does not change.

Authors' Addresses

   Wen Lin
   Juniper Networks, Inc.

   EMail: wlin@juniper.net

   Zhaohui Zhang
   Juniper Networks, Inc.

   EMail: zzhang@juniper.net

   John Drake
   Juniper Networks, Inc.

   EMail: jdrake@juniper.net

   Jorge Rabadan

   EMail: jorge.rabadan@nokia.com

   Ali Sajassi
   Cisco Systems

   EMail: sajassi@cisco.com
