NSIS Working Group                                              R. Bless
Internet-Draft                                                   M. Doll
Expires: January 8, 2008                              Univ. of Karlsruhe
                                                            Jul 07, 2007

           Inter-Domain Reservation Aggregation for QoS NSLP

Status of this Memo

   By submitting this Internet-Draft, each author represents that any
   applicable patent or other IPR claims of which he or she is aware
   have been or will be disclosed, and any of which he or she becomes
   aware will be disclosed, in accordance with Section 6 of BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on January 8, 2008.

Copyright Notice

   Copyright (C) The IETF Trust (2007).


Abstract

   QoS NSLP is a recently proposed signaling protocol for establishing
   QoS reservations in the Internet.  To enable large-scale deployment,
   inter-domain aggregation should be considered as a mechanism that
   provides the necessary scalability in the control plane.  This draft
   describes the major problems that must be solved and proposes
   solutions to them that require only modest modifications and
   extensions to the currently defined GIST and QoS NSLP
   specifications.

Bless & Doll             Expires January 8, 2008                [Page 1]

Internet-Draft    Inter-Domain Reservation Aggregation          Jul 2007

Table of Contents

   1.  Introduction
   2.  Aggregation Concept
     2.1.  Aggregate Setup
     2.2.  Aggregate Use and Changes
     2.3.  Aggregate Teardown
   3.  Problems
     3.1.  Determination of Aggregator and Deaggregator
     3.2.  Signaling between Aggregator and Deaggregator
     3.3.  Route Change Detection for Aggregated Flows in an Aggregate
     3.4.  A Priori Determination of a Flow's Path
   4.  Solution Proposals
     4.1.  Determination of Aggregator and Deaggregator
     4.2.  Signaling Between Aggregate Endpoints
     4.3.  Route Change Detection for Aggregated Flows in an Aggregate
       4.3.1.  IP Layer Solution
       4.3.2.  GIST Layer Solution
       4.3.3.  NSLP Layer Solution
     4.4.  A Priori Determination of a Flow's Path
     4.5.  Example
   5.  Security Considerations
   6.  References
     6.1.  Normative References
     6.2.  Informative References
   Authors' Addresses
   Intellectual Property and Copyright Statements

1.  Introduction

   A primary objective of NSIS QoS Signaling is to perform resource-
   based admission control for data flows.  Per-flow signaling, however,
   has scalability issues.  Aggregation of resource reservations can be
   used to achieve better scalability in the control plane (e.g., as
   proposed in [RFC3175]).  Aggregation achieves two important
   reductions: reduction of state (or reservation context) information
   and reduction of signaling message processing.  Aggregation in the
   data plane can be achieved by using packet forwarding mechanisms
   according to the Differentiated Services architecture [RFC2475].  For
   the sake of simplicity, we assume that the latter is used to provide
   QoS for packet forwarding and that Autonomous Systems contain one or
   more Differentiated Services domains.

   For the following discussion, we assume that the reader is familiar
   with RSVP aggregation concepts as described in [RFC3175].  For the
   remainder of this memo, the terms "aggregate" and "aggregation" are
   used in the sense of "reservation aggregation", i.e., aggregation of
   reservation state in the control plane.  Furthermore, the term
   "aggregated flow" denotes a flow that is contained in a reservation
   aggregate that encompasses several single reservations (of the
   aggregated flows).  These aggregated flows also share some properties
   in the data path, i.e. usually they belong to the same service class
   and share a part of the same data path (but they need not be in the
   same address aggregate or have the same destination).

   Currently, QoS NSLP describes the process of reservation aggregation
   only coarsely, and it supports a single aggregation level using two
   different router alert option (RAO) values.  This is usually
   sufficient if only intra-domain aggregation is considered.  From a
   global perspective, however, intra-domain aggregation is not
   sufficient for scalable end-to-end QoS support: once aggregated
   flows leave an aggregation domain, the next domain sees all
   individual (i.e., non-aggregated) flows again.  Larger transit
   providers in particular would have to manage a large number of
   individual flows and would therefore suffer from scalability
   problems.  Moreover, manually establishing static aggregates between
   providers would incur a huge management overhead.  Therefore, we
   want to design a mechanism that dynamically creates aggregates
   between different providers on demand.

2.  Aggregation Concept

   This section briefly describes the concept of aggregated reservations
   as assumed in this draft.

2.1.  Aggregate Setup

   Aggregation of QoS reservations is performed between an aggregator
   and a deaggregator (which is located downstream from the aggregator,
   cf. also RFC 3175 [RFC3175]).  Based on some trigger (e.g., the
   current number of reservations), the aggregator decides to subsume
   several existing flow reservations along the same path segment (e.g.,
   same AS hops) into a larger aggregate reservation.  In order to make
   this decision, the aggregator must find a potential deaggregator.
   The flows (and their reservations) must follow exactly the same path,
   at least up to the deaggregator.  Therefore, the aggregator must
   either know the actual path taken by the flows (e.g., by using
   corresponding routing protocol information) or it must get notified
   by the deaggregator explicitly.  In this case the deaggregator must
   know that the aggregator is an upstream node which is common to all
   reservations under consideration.

   The encompassing aggregate reservation is set up by appropriate
   signaling between aggregator and deaggregator.  During this
   procedure, the existing reservations are moved into the aggregate
   reservation, i.e., all intermediate nodes between aggregator and
   deaggregator delete all state information related to the individual
   reservations being aggregated, so they only see a single aggregate
   reservation afterwards.  This achieves the desired reduction of
   managed state.  Aggregator and deaggregator manage both the
   individual reservations and the aggregate reservation, i.e., they do
   not save any state themselves but instead have to manage one
   additional state for the aggregate.

   Furthermore, in order to save message processing cost, the aggregate
   capacity should be somewhat larger than actually required by the
   subsumed flows.  In this case the aggregator should not need to adapt
   the aggregate capacity every time a flow leaves or joins the
   aggregate.  Thus, the aggregate capacity should change only
   infrequently, usually by applying some hysteresis function (cf.
   discussion in [RFC3175], sec. 1.4.4).
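
   As an illustration of such a hysteresis function, the following
   Python sketch resizes the aggregate only when the demand leaves a
   band around the current capacity.  The function name, the headroom
   factor, and the low-water mark are assumptions made for this
   example; they are not taken from QoS NSLP or [RFC3175].

```python
def adapt_capacity(capacity, demand, headroom=1.25, low_water=0.5):
    """Return the aggregate capacity to reserve (possibly unchanged).

    capacity  -- currently reserved aggregate capacity
    demand    -- sum of the reservations of all subsumed flows
    headroom  -- hypothetical over-provisioning factor (> 1)
    low_water -- hypothetical fraction below which the aggregate shrinks
    """
    if demand > capacity:
        # Aggregate is too small: grow, leaving room for further flows.
        return demand * headroom
    if demand < capacity * low_water:
        # Aggregate is mostly unused: shrink, but keep some headroom.
        return demand * headroom
    # Demand is inside the hysteresis band: no signaling needed.
    return capacity
```

   With these example parameters, a flow joining or leaving an aggregate
   of capacity 100 triggers no resize as long as the total demand stays
   between 50 and 100.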

2.2.  Aggregate Use and Changes

   When a new reservation request arrives at the aggregator, the
   aggregator must determine a priori whether the new flow "fits" into
   an existing aggregate.  This requires knowing the flow's route and
   whether the aggregate has enough residual capacity left to subsume
   the new request.  If the aggregate capacity is too small, it must be
   increased prior to including the new reservation.
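
   The check described above can be sketched as follows.  The
   Aggregate class, its fields, and the assumption that a flow's path is
   known as a list of QNE identifiers starting at the aggregator are
   hypothetical and serve illustration only; the signaling needed to
   actually increase the aggregate is omitted.

```python
from dataclasses import dataclass


@dataclass
class Aggregate:
    path: tuple        # QNEs from aggregator to deaggregator, inclusive
    capacity: float    # capacity currently reserved for the aggregate
    used: float = 0.0  # capacity consumed by the subsumed flows


def try_add(agg, flow_path, rate):
    """Try to place a new flow reservation into an existing aggregate."""
    # The flow must follow the aggregate's path up to the deaggregator.
    if tuple(flow_path[:len(agg.path)]) != agg.path:
        return "no-match"
    grew = agg.used + rate > agg.capacity
    if grew:
        # Increase the aggregate before including the new reservation.
        agg.capacity = agg.used + rate
    agg.used += rate
    return "grown" if grew else "fits"
```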

   Signaling messages for all aggregated flows should be directly
   forwarded from aggregator to deaggregator in order to save signaling
   message processing by nodes between aggregator and deaggregator.
   Furthermore, these nodes no longer have any knowledge about the
   aggregated flow sessions, so sending them messages related to these
   single flows must be avoided.

2.3.  Aggregate Teardown

   The aggregate can be torn down some time after the last reservation
   has left the aggregate.  The aggregator will notice either an
   explicit teardown or a refresh timeout for the last reservation.  If
   no new reservation request shows up after a waiting period, the
   aggregate reservation will be torn down completely.

3.  Problems

   We see several problems that must be solved for QoS NSLP to support
   inter-domain aggregation, namely:
   o  Determination of Aggregator and Deaggregator
   o  Signaling between Aggregator and Deaggregator
   o  Route Change Detection for Aggregated Flows in an Aggregate
   o  A Priori Determination of a Flow's Path
   These points are discussed in more detail in the following sections.

3.1.  Determination of Aggregator and Deaggregator

   When aggregation within a domain is considered, it is no problem to
   choose an aggregator and deaggregator for a set of flows, because
   boundary routers at the domain borders ("aggregation region") are
   typically acting as aggregators and deaggregators for flows entering
   and leaving the domain respectively.  Thus, their role is

   But if aggregation across domains is considered, it is not obvious
   which routers are aggregators or deaggregators for a set of flows,
   because there are many choices due to the fact that flows usually
   traverse several different administrative domains.  Aggregates are
   more efficient the longer they are, because longer aggregates save
   more states and control message processing.  A set of flows can be
   aggregated along the path that they share, i.e., along the set of
   nodes that are traversed by all flows within this set.  The example
   in Figure 1 shows three different flows: f1 from host H1 to sink S1,
   f2 from H2 to S2, and f3 from H3 to S3.  While all three flows can
   be aggregated along domains Dd, De, and Df, only f2 and f3 can be
   aggregated from Dd up to Dg.  Furthermore, the flows usually enter
   domain Dd at different ingress routers and join somewhere within the
   domain.  However, they may leave the domain by the same egress
   router.  For the deaggregation domain, the reverse is true: the
   aggregated flows enter the domain via the same ingress, but may
   split sooner or later within the domain, leaving it via different
   egress routers.

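   [Diagram not preserved: flows f1, f2, and f3 from hosts H1, H2, and
   H3 traverse several domains towards sinks S1, S2, and S3.]

   f1: data flow H1->S1      Hx: Host x
   f2: data flow H2->S2      Sx: Sink x
   f3: data flow H3->S3      Dx: Domain x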

   Example for aggregation of flows along different domains

                                 Figure 1

   In summary, the inter-domain case offers many more choices for the
   aggregator-deaggregator pair of a set of flows.  Moreover, it is
   important to
   consider who initiates the establishment of an aggregate.  In RFC
   3175 [RFC3175] it is the deaggregator that initiates the reservation,
   which corresponds nicely to the receiver-initiated reservation scheme
   of RSVP.

   For QoS NSLP both ends are basically able to initiate an aggregate
   reservation.  The more natural choice would be that the aggregator
   initiates establishment of an aggregate reservation.  In this case,
   it is required that the aggregator has knowledge about potential
   deaggregators.  This information may be collected during
   establishment of the reservation for a single flow and reported back
   to the initiator.  See Section 4 for a possible solution.

3.2.  Signaling between Aggregator and Deaggregator

   Signaling messages related to flows that are aggregated in an
   encompassing aggregate should be forwarded directly from aggregator
   to deaggregator and vice versa.  This is necessary, because in-
   between nodes know only the aggregate flow and do not have any
   information about individual flows that are contained in this
   aggregate.  Moreover, the aggregate should not only save states, but
   it should also allow for saving signaling message processing.

   Intra-domain aggregation specifies that the NTLP uses a router alert
   option to signal directly from aggregator to deaggregator, i.e., all
   NSIS QNEs in-between do not interpret the signaling message.
   However, this simple and effective scheme does not work for inter-
   domain aggregation, because the space of possible RAO values (16
   bits) is much too small to cover the huge set of potential unique
   aggregator-deaggregator pairs that would be required across
   different provider domains.
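
   A back-of-the-envelope calculation illustrates how quickly the value
   space is exhausted.  The sketch below assumes, for illustration
   only, that every ordered (aggregator, deaggregator) pair would need
   its own RAO value; the function names are hypothetical.

```python
RAO_VALUE_SPACE = 2 ** 16  # number of values in a 16-bit RAO field


def ordered_pairs(n_endpoints):
    """Distinct (aggregator, deaggregator) pairs among n QNEs."""
    return n_endpoints * (n_endpoints - 1)


def max_endpoints(value_space=RAO_VALUE_SPACE):
    """Largest number of QNEs whose pairs still fit the value space."""
    n = 1
    while ordered_pairs(n + 1) <= value_space:
        n += 1
    return n
```

   Under this assumption, a few hundred potential aggregate endpoints
   across all provider domains already exceed the available RAO values.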

   Moreover, when considering GIST, a further problem occurs if the
   aggregator has to periodically send a Query message for every flow
   in order to detect route changes for this flow.  This must not
   cancel out the aggregation gain, i.e., nodes between aggregator and
   deaggregator should ideally neither process these messages nor store
   state about these flows.  Thus, on the one hand, these Query
   messages should detect any change in the path between aggregator and
   deaggregator; on the other hand, nodes in-between should preferably
   not process these per-flow signaling messages.

3.3.  Route Change Detection for Aggregated Flows in an Aggregate

   It may occur that the route of an aggregated flow changes during its
   lifetime.  If the routing change does not affect the part of the data
   path that is also covered by the aggregate, it is not a problem,
   because it will be managed by the usual GIST/QoS NSLP mechanisms.  If
   the route change also affects the encompassing aggregate in the same
   way as the aggregated flows, it would be covered by trying to reserve
   resources for the re-routed aggregate.

   However, the flow may actually leave the aggregate's path and either
   return to it before or after the deaggregator (cf. flows f1 and f2
   respectively in Figure 2).  An alternative, as mentioned in
   [RFC3175], would be to "tunnel" the data packets between aggregator
   and deaggregator.  However, due to the burden for routers caused by
   the overhead of tunneling data packets as well as MTU related
   problems, we do not consider such solutions in this draft.
   Therefore, a mechanism must be defined to detect any route changes
   affecting aggregated flows.

     +-----+
    /  Dst  \
   |    |    |
   |De  |&&&&&&&&&&&
    \   |   /       &&
     +--|--+          && f2
        |               &
     +--D--+          +-&---+
    /   #   \        /  &    \
   |Dd  #    |      |  &&   Dg|
   |    #%%%%|      |&&       |
    \   #   /%%    &&\       /
     +--#--+   %%&&   +-----+
        #      &&%%
     +--#--+ &&    %% +-----+
    /   #  &&        %       \
   |Dc  #&&& |      | %%%   Df|
   |    #    |      |    %    |
    \   #   /        \   %   /
     +--#--+          +--%--+
        #                %
     +--#--+             %
    /   #   \    f1      %
   |Db  #%%%%|%%%%%%%%%%%%
   |    #    |
    \   #   /
     +--A--+   Dx: Domain x
        |       #: Aggregate
     +--|--+    A: Aggregator
    /   |   \   D: Deaggregator
   |Da  |    |  %: f1, returning to aggregate route
   |   Src   |     before deaggregator
    \       /   &: f2, returning to aggregate route
     +-----+       behind deaggregator

   Possible route changes of aggregated flows

                                 Figure 2

3.4.  A Priori Determination of a Flow's Path

   In order to utilize an already established aggregate reservation, an
   aggregator must know if a new incoming reservation can be integrated
   into an already established aggregate.  This requires that the
   aggregator is able to determine the path that the flow will take a
   priori.  In case the flow runs along the same path as an already
   established aggregate and the aggregate has enough unused capacity,
   the aggregator may include the request into the aggregate and forward
   it directly to the deaggregator.  However, predicting a flow's path
   is difficult in the inter-domain case: usually only an AS path (i.e.,
   a sequence of AS numbers) for a given destination prefix can be
   determined by using BGP routing table information.  Thus, an
   aggregator usually does not know the exact ingress or egress border
   routers for a given AS.  Especially multi-homing techniques between
   ASes make it difficult to predict an exact path, e.g., flows whose AS
   paths differ only in their destination AS may enter the same
   penultimate AS through different ingress routers.  Furthermore, some
   mechanism must be provided in order to verify the prediction, i.e.,
   to revert if the prediction was wrong.

   An alternative to prediction would be to probe the actual path first,
   preferably without installing any state.  This would, however,
   increase the reservation setup time, because a round-trip signaling
   message exchange would be required before one could determine whether
   there is an existing aggregate that would match the flow.

4.  Solution Proposals

   This section sketches some proposals for solving the previously
   described problems.  This is preliminary work, and some details
   still need to be worked out in a forthcoming version of this draft.

4.1.  Determination of Aggregator and Deaggregator

   Determination of aggregator and deaggregator could be accomplished by
   using a QoS NSLP mechanism to record the route for individual
   reservations.  To this end, each QNE that is able and willing (i.e.,
   if local policy allows it) to serve as a deaggregator may simply
   append its IP address to a new protocol object ("Route-Record") that
   holds a list of such addresses.  This protocol object would be
   carried in a RESERVE message.  Usually, it is sufficient to record
   only two addresses per domain, i.e., ingress and egress QNE.  The
   average AS path length is usually well below 4 ASes, so the total
   number of recorded addresses would still be small.  It may be useful
   to also record the AS number in addition to the QNE addresses.
   Moreover, the ideal object to record would be the peer identity, but
   due to their non-unique and potentially lengthy format, peer
   identities are probably harder to process efficiently than IP
   addresses.  The completed "Route-Record" object would be reported
   back to potential aggregators in the RESPONSE message.  So if
   aggregation is going to be used, the aggregator would have to
   request a RESPONSE message by inserting an RII object.  If an RII
   object is already present, the aggregator must inspect the
   corresponding RESPONSE.

   An aggregator may then store the list of traversed QNEs together with
   the per-flow session data and pick a deaggregator according to its
   own criteria later.  Usually, the aggregator should choose a
   deaggregator that is far away in order to achieve long and thus
   efficient aggregates.
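
   The recording and selection steps could look roughly like the
   following Python sketch.  It assumes that each stored Route-Record is
   a list of addresses in downstream order that starts directly behind
   the aggregator, and it selects the farthest QNE on the common prefix
   of all records.  Both function names and this selection rule are
   illustrative assumptions, not part of the proposed protocol.

```python
def record_route(route_record, qne_addr, willing):
    """QNE side: append the own address if local policy allows it."""
    if willing:
        route_record.append(qne_addr)
    return route_record


def pick_deaggregator(records):
    """Aggregator side: farthest QNE shared by all Route-Records."""
    common = []
    for hops in zip(*records):
        if len(set(hops)) != 1:
            break  # the flows' recorded routes diverge here
        common.append(hops[0])
    return common[-1] if common else None
```

   For two flows with records [192.0.2.1, 198.51.100.1, 203.0.113.1]
   and [192.0.2.1, 198.51.100.1, 203.0.113.9], this sketch selects
   198.51.100.1 as the deaggregator.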

   Once the aggregator has selected a QNE to serve as deaggregator, it
   sends a new RESERVE towards the deaggregator.  The RESERVE message
   would contain the total capacity of all individual reservations and
   a list of the session IDs of all flows that should be aggregated.
   This would require a SESSION_ID_LIST object that is carried in the
   aggregation messages RESERVE and RESPONSE.  The list in the RESPONSE
   message will contain all sessions that could not be aggregated,
   e.g., in case of aggregation conflicts, i.e., when some flows were
   already aggregated in a way that prevents them from being aggregated
   as intended by the new aggregation request.  Furthermore, a flag
   (AGGREGATION bit) in the RESERVE or RESPONSE message could indicate
   that it is a special type of RESERVE or RESPONSE message containing
   the additional SESSION_ID_LIST object.  Using a flag instead of a
   new message type may have some implementation advantages, because
   most of the code is identical to normal RESERVE processing.
   However, it is also possible to define new message types for
   aggregation establishment (e.g., ARESERVE and ARESPONSE).
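
   The content of such an aggregation RESERVE, and the evaluation of the
   SESSION_ID_LIST returned in the RESPONSE, can be sketched as
   follows.  The class and function names are hypothetical, and no wire
   format is implied.

```python
from dataclasses import dataclass


@dataclass
class AggregateReserve:
    capacity: float           # total capacity of all listed reservations
    session_id_list: list     # SESSION_ID_LIST: flows to be aggregated
    aggregation: bool = True  # AGGREGATION bit


def build_aggregate_reserve(reservations):
    """reservations maps each flow's session ID to its reserved capacity."""
    return AggregateReserve(sum(reservations.values()),
                            sorted(reservations))


def aggregated_sessions(reserve, rejected_ids):
    """Sessions actually aggregated, given the RESPONSE's list of
    sessions that could not be aggregated."""
    rejected = set(rejected_ids)
    return [s for s in reserve.session_id_list if s not in rejected]
```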

4.2.  Signaling Between Aggregate Endpoints

   The objective is to forward per flow signaling messages (e.g.,
   refreshing RESERVEs of a flow's session) between aggregator and
   deaggregator directly, so that no intermediate QNE has to process
   these messages.  The QNI sends a per flow message (e.g., a refreshing
   RESERVE) that arrives at the aggregator.  The aggregator detects that
   this flow is part of a larger aggregate reservation and performs
   "aggregate signaling", i.e., it sends the message along a special
   direct messaging association (MA) that must be established between
   aggregator and deaggregator for the aggregate.  A possible mechanism
   to establish a corresponding messaging association is described
   below.  At QoS NSLP level the aggregator should also insert the
   BOUND_SESSION_ID object containing the session ID of the aggregate's
   session.  When the signaling message arrives at the deaggregator, the
   deaggregator notices that the message arrived via the direct
   messaging association between the aggregate endpoints and removes
   the inserted BOUND_SESSION_ID.  Then normal message processing at
   QoS NSLP level continues.

   In order to establish a direct signaling message transport between
   aggregator and deaggregator, the GIST Query message must be conveyed
   to the deaggregator.  This could be done in several ways, e.g., at
   the IP layer or at the GIST layer, as described in more detail in
   the next section.  However, the aggregator will create a new GIST
   session that solely serves the purpose of directly transferring
   signaling messages between aggregator and deaggregator.  The Query
   will be sent as a Q-mode encapsulated message with the destination
   address of the single flow and a special MRM (further details of
   this special query encapsulation are described in the next section).

   Once the Query arrives at the deaggregator, the deaggregator will
   send a UDP encapsulated Response directly back to the aggregator,
   and a messaging association should be created.  Using an SCTP
   connection for the messaging association would be a good choice, so
   that messages for different flows can be mapped to different
   streams.  Signaling messages for individual flows that arrive at the
   aggregator are mapped to the GIST session for aggregate signaling,
   i.e., they are sent directly to the deaggregator.  A problem is,
   however, that the session ID for the individual flow must be
   conveyed by additional means, because GIST must use its present
   session ID for aggregate signaling.

4.3.  Route Change Detection for Aggregated Flows in an Aggregate

   As mentioned above, it should be possible to establish a messaging
   association directly between aggregator and deaggregator, e.g., using
   the GIST bypass mechanism, so that intermediate QNEs drop out of the
   signaling path.  However, if all messages for aggregated single
   reservations were passed over this direct messaging association
   unconditionally, reservation path and data path would probably
   diverge once the route of a flow changes and veers off the
   aggregate path.

   Before describing the details of the method, we summarize the
   sequence of the overall aggregate operations:
   1.  Deaggregator is discovered.
   2.  Aggregator establishes an aggregate reservation.
   3.  Aggregator initiates a direct signaling messaging association.
       The messaging associations for all aggregated flows at both sides
       are updated or installed, so that signaling messages for the
       aggregated flows use the direct signaling MA.
   4.  Aggregator periodically performs route change checks for
       aggregated flows.
   5.  Additional flows may be added to the aggregate later (leaving
       flows are straightforward).

   A direct forwarding of signaling messages between aggregator and
   deaggregator would be a problem if a flow within the aggregate
   changes its route within the aggregate, leaving the aggregate's path
   (maybe even re-joining it later on, cf. flow f2 in Figure 2).  In
   case a flow diverges from the aggregate route it must establish a new
   reservation along the new part of the path from the branching QNE.  A
   further problem arises from the fact that the QNEs within the
   aggregate don't know anything about this particular flow session.
   Furthermore, the single flow should be removed from the aggregate
   reservation, otherwise new requests have to be rejected

   Therefore, the flow's path and the aggregate's path must be checked
   regularly for divergence.  With RSVP aggregation or the intra-domain
   aggregation for QoS NSLP, this is done automatically by using the
   router alert option, i.e., per-flow signaling messages are routed
   along their natural path, possibly swerving from the aggregate's
   path.  The interior nodes still don't have to process the signaling
   messages.  Some boundary node will intercept the message in its role
   as potential deaggregator and possibly trigger creation of a new
   aggregate or initiate integration of the single flow into an
   existing aggregate.  As explained earlier, however, the RAO-based
   approach is not usable for inter-domain aggregation due to the
   limited RAO value space with respect to the potential number of
   aggregator/deaggregator peers, so another approach must be
   developed.

   Route detection is the task of the NTLP layer.  In GIST, a periodic
   Query per routing entry is triggered in order to discover new routes
   or route changes respectively.  The GIST Query is sent with Q-mode
   encapsulation.  This also works with intra-domain aggregation by
   setting the corresponding RAO.  For inter-domain aggregation we want
   to use a similar mechanism.

   A GIST node cannot easily detect if the flow's path diverges from the
   aggregate's route.  A GIST node could detect that the IP next hop of
   the flow and the IP next hop of the aggregate flow differ.  But this
   need not necessarily result in a change of the next GIST peer.
   Therefore, a check for route divergence in a GIST node is not a
   reliable indication that the flow actually left the aggregate route.

   Thus, a better indication would be that a signaling message arrives
   at a GIST node where the aggregate reservation is yet unknown (e.g.,
   the session ID for the aggregate is unknown).  In this case the node
   should not establish a new GIST signaling session for the aggregate,
   but send back an error indication to the aggregator instead.  This
   would at least detect the case when a flow leaves the aggregate and
   hits a different GIST node.  Not covered by this detection is the
   case when a flow diverges from the route and rejoins the aggregate
   route later thereby leaving out some GIST nodes on the aggregate
   route.  To detect such situations, a GIST node must either determine
   whether the previous peer that sent the message is still the same
   peer as in the aggregate, or use the GIST Hop Count as an indicator
   that a QNE was skipped.
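
   The per-node check described above might be sketched as follows.
   The table layout (aggregate session ID mapped to the expected
   upstream peer) and the returned strings are assumptions made for
   this example; the Hop Count alternative is not modeled.

```python
def check_aggregate_query(known_aggregates, aggregate_sid, prev_peer):
    """Decide how a GIST node handles a query for an aggregated flow.

    known_aggregates -- dict: aggregate session ID -> expected upstream
                        peer of that aggregate at this node
    aggregate_sid    -- session ID of the aggregate named in the query
    prev_peer        -- peer from which the query was received
    """
    if aggregate_sid not in known_aggregates:
        # Flow left the aggregate route and hit a different GIST node:
        # report an error instead of creating a new signaling session.
        return "error: aggregate unknown"
    if known_aggregates[aggregate_sid] != prev_peer:
        # Flow rejoined the aggregate route but skipped upstream QNEs.
        return "error: upstream peer differs"
    return "forward"
```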

   Therefore, we define an "aggregate query encapsulation" mode that
   detects any divergent routes for a flow and its encompassing
   aggregate.  This aggregate query encapsulation must not use the
   direct messaging association between the endpoints, because it
   would then not detect any route changes.  It can, however, also be
   used to initially establish such a direct signaling relationship.

4.3.1.  IP Layer Solution

   The most efficient solution would be a bypass at IP level, like the
   special Router Alert Option for aggregate signaling.  But in the
   inter-domain case, a simple Router Alert Option codepoint is not
   enough to cover the huge set of different deaggregators in different
   domains.

   One solution would be a new IP option carrying an additional
   destination address (a new hop-by-hop option in IPv6, which we call
   Route Verify).  On receipt of such a packet, the router simply
   performs one additional routing lookup for the conveyed destination
   address and compares the next hop for the normal destination address
   with that for the additional destination address.  In our case, the
   outer IP destination address would be that of the deaggregator and
   the additional destination address would be the destination from the
   flow's path-coupled MRI.  If the next hop entry is the same for both
   destination addresses, the router simply forwards the message,
   keeping this additional option.  If the next hops differ, however,
   flow and aggregate diverge at this router and the originator of the
   signaling message should be notified of this fact.  This would
   require sending a special ICMP message back to the aggregator
   indicating the route divergence.  The aggregator must then correlate
   this ICMP message with the GIST signaling message.  However, as
   mentioned above, route divergence between GIST nodes may not be
   relevant.  Furthermore, because only the IP destination address is
   used for a routing decision and not the MRI, routes may be different
   from MRI-based routing decisions.
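
   The next-hop comparison described above can be illustrated with a
   short Python sketch.  It is not part of the proposal: the forwarding
   table, the interface names, and the helper names next_hop() and
   route_verify() are assumptions made for illustration only, and the
   addresses come from the IPv6 documentation prefix.

```python
import ipaddress

def next_hop(table, address):
    """Longest-prefix match over (prefix, next_hop) entries; None if no route."""
    addr = ipaddress.ip_address(address)
    best_net, best_hop = None, None
    for prefix, hop in table:
        net = ipaddress.ip_network(prefix)
        if addr.version != net.version or addr not in net:
            continue
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_hop = net, hop
    return best_hop

def route_verify(table, outer_dst, verify_dst):
    """Compare the next hops of the outer and the conveyed destination."""
    hop_outer = next_hop(table, outer_dst)
    hop_verify = next_hop(table, verify_dst)
    if hop_outer is None or hop_verify is None:
        return "unroutable"
    if hop_outer == hop_verify:
        return "forward"   # keep the option and forward the packet
    return "diverged"      # the router would notify the aggregator (ICMP)

# Hypothetical forwarding table of a single router.
TABLE = [("2001:db8::/32", "if0"), ("2001:db8:1::/48", "if1")]

# Outer destination: deaggregator; conveyed destination: flow destination.
same_path = route_verify(TABLE, "2001:db8:1::1", "2001:db8:1::99")
other_path = route_verify(TABLE, "2001:db8:1::1", "2001:db8:2::1")
```

   In this sketch the first call yields "forward" because both
   addresses fall under the more specific /48 entry, while the second
   yields "diverged" because the flow destination only matches the /32
   entry.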

4.3.2.  GIST Layer Solution

   The GIST-only method works as follows: whenever the aggregator would
   send a Q-mode encapsulated message for a single reservation, it sends
   a GIST Query message with a new MRM (aggregate forwarding, AF)
   instead.  We call this mode Aggregate Q-mode (AQ-mode).  The
   Aggregate Q-mode encapsulation is as follows: the query is sent with
   Q-mode encapsulation, i.e., it has the RAO set, uses the flow
   destination address as IP destination address and uses the
   aggregator's address as source address (S-Bit is set in the GIST
   header).  The session ID is that of the single flow, but the message
   contains the new aggregate AF-MRM instead of the normal path-coupled
   MRM.  The AF-MRM contains a type field, the session ID of the
   aggregate, and the path-coupled MRI of the single flow.  In summary
   its structure is like this:

      MRI = type
            session-id     (aggregate flow)
            PC-MRI         (single flow)

   The type field distinguishes between different AQ-modes, in which the
   interpretation of the contained session IDs and MRIs differs.  It
   contains one of the following values:
   o  "Route Check": Perform a route check for the specified flow only.
   o  "Establish Direct": Establish a new direct signaling session for
      the given flow.
   o  "Add Flow": if message hits endpoint, message should be passed to

   The session ID carried within the aggregate forwarding MRI (AF-MRI)
   is the one of the established aggregate (which we call session ID A
   for now), the PC-MRI is a fully encapsulated path-coupled MRI object
   (i.e., including the common object header).
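
   As a rough illustration, the AF-MRI layout described above could be
   modelled as follows in Python.  The class and field names are
   hypothetical, no wire encoding or numeric type codepoints are
   implied (the draft does not define them), and the 16-byte session ID
   length is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum

class AfType(Enum):
    """AQ-mode types named in the text; values are labels, not codepoints."""
    ROUTE_CHECK = "Route Check"
    ESTABLISH_DIRECT = "Establish Direct"
    ADD_FLOW = "Add Flow"

@dataclass(frozen=True)
class PcMri:
    """Simplified stand-in for a path-coupled MRI of a single flow."""
    source: str
    destination: str

@dataclass(frozen=True)
class AfMri:
    af_type: AfType
    aggregate_session_id: bytes   # session ID A of the established aggregate
    pc_mri: PcMri                 # PC-MRI of the single flow

    def __post_init__(self):
        # Assumption of this sketch: session IDs are 16 bytes long.
        if len(self.aggregate_session_id) != 16:
            raise ValueError("session ID must be 16 bytes")

SESSION_A = bytes(range(16))
example = AfMri(AfType.ROUTE_CHECK, SESSION_A,
                PcMri("2001:db8:a::1", "2001:db8:b::1"))
```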

   First we describe how the route change check is performed and then
   how a direct signaling messaging association can be established by
   using the AQ-mode encapsulation.  When the aggregator sends such a
   signaling message towards the deaggregator, the next GIST node
   (supporting QoS NSLP) will intercept the message due to the RAO and
   detect the AF-MRM.  It then checks whether session ID A is known.  If
   session ID A is known and the node is not the endpoint (deaggregator)
   for this session, it will forward the message further downstream
   according to the flow's PC-MRI (basically unchanged, i.e., only the
   IP hop count and GIST Hop Count are decremented).  If the node is the
   endpoint for the session, message forwarding is terminated and the
   message has to be processed by GIST, i.e., it must refresh the
   routing state for the single flow.  In contrast to a normal Query,
   the R flag in the GIST header need not be set, so a Response may be
   suppressed, because the primary objective is to check for diverging
   routes, which are indicated by Error messages.  But if the R flag is
   set, a Response message must be sent back directly to the aggregator
   via the existing direct messaging association.

   If a GIST node receives such an AQ-mode encapsulated message but does
   not have any installed state for session ID A, it MUST send back an
   error message (yet to be defined but indicating that the signaling
   message left the aggregate's path) directly to the IP source of the
   query message, which is the aggregator.  The aggregator should then
   indicate a route change to the QoS NSLP and should remove the single
   flow reservation from the aggregate and initiate a normal single flow
   reservation along the further path (optimizations like shifting the
   aggregate reserved resources for this flow along the unchanged
   aggregate path are for further study).
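
   The per-node handling of a "Route Check" Query described above can
   be summarized in a small decision function.  This is only a sketch
   in Python: the function name, the action labels, and the
   representation of node state as two sets of session IDs are
   assumptions made for illustration.

```python
def process_aq_query(known_sessions, endpoint_sessions, aggregate_sid,
                     r_flag=False):
    """Return the actions a GIST node takes for an AQ-mode Query.

    known_sessions:    session IDs for which the node has installed state
    endpoint_sessions: subset of those for which the node is deaggregator
    aggregate_sid:     session ID A carried in the AF-MRI
    """
    if aggregate_sid not in known_sessions:
        # The message has left the aggregate's path.
        return ["send_error_to_aggregator"]
    if aggregate_sid not in endpoint_sessions:
        # Intermediate node: bypass and forward along the flow's PC-MRI.
        return ["decrement_hop_counts", "forward_along_pc_mri"]
    # Deaggregator: stop forwarding and refresh the flow's routing state.
    actions = ["refresh_flow_routing_state"]
    if r_flag:
        actions.append("send_response_via_direct_ma")
    return actions

intermediate = process_aq_query({"A"}, set(), "A")
deaggregator = process_aq_query({"A"}, {"A"}, "A", r_flag=True)
off_path = process_aq_query(set(), set(), "A")
```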

   The GIST Hop Count must be checked on reception at the endpoint,
   because it will indicate if a GIST-aware node was skipped due to
   rerouting.  Its value must be compared with the one that was received
   and stored during establishment of the direct signaling connection.
   This requires that the latter is set up immediately after the
   establishment of the reservation aggregate as described earlier.
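
   A minimal sketch of this comparison, in Python, is given below.  It
   assumes that every GIST-aware node on the path decrements the hop
   count exactly once, so that a skipped node shows up as a larger
   value on arrival; the function names and the initial value are
   hypothetical.

```python
def arrival_hop_count(initial_hop_count, gist_nodes_traversed):
    """Hop count seen at the deaggregator after the given number of GIST hops."""
    return initial_hop_count - gist_nodes_traversed

def gist_node_skipped(stored_hop_count, received_hop_count):
    """True if fewer GIST nodes were traversed than at MA establishment."""
    return received_hop_count > stored_hop_count

INITIAL = 64                               # assumed initial GIST Hop Count
stored = arrival_hop_count(INITIAL, 6)     # R1..R5 and RD at establishment
after_reroute = arrival_hop_count(INITIAL, 5)   # one GIST node bypassed
```

   With these values, gist_node_skipped(stored, stored) is false for an
   unchanged path, whereas gist_node_skipped(stored, after_reroute) is
   true.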

   Setting up a direct signaling messaging association is possible by
   using the same AQ-mode encapsulation for the initial Query, with the
   AF-MRI type set to "Establish Direct".  The session ID carried by
   GIST is that of the direct signaling MA to be established.  The
   PC-MRI in the AF-MRI contains a description of the MA signaling flow
   between the two aggregate endpoints.  For this AF-MRM type the R flag
   must be set (R=1) so that the endpoint (deaggregator) sends a
   Response message directly back to the aggregator to continue the
   initial GIST handshake.  Therefore, the S flag must be set, too.

   Details for the last type of this AQ-mode (Add Flow) are described in
   Section 4.4.

4.3.3.  NSLP Layer Solution

   In principle it is also possible to use the NSLP layer for the
   functions described in the previous GIST-related section, e.g.,
   signaling messages are received by the NSLP layer and bypassed if the
   node is not the deaggregator.  The Bound-Session-ID object could be
   used to refer to the aggregate session and a new flag in the NSLP
   common header could be used to indicate that this message should be
   forwarded unless it has arrived at the deaggregator.  If this
   Bound-Session-ID is unknown, the signaling message has left the
   aggregate and an error message should indicate this fact.

   The disadvantage of this method is that it incurs more processing
   overhead than bypassing via GIST as described above.

4.4.  A Priori Determination of a Flow's Path

   This is the most difficult task of inter-domain aggregation.  We
   propose to solve it by using (BGP) routing tables, the GIST AQ-mode,
   and new QoS NSLP mechanisms.

   A QNE may detect that the flow's destination address of a new
   incoming reservation request is in the same prefix or AS as already
   aggregated reservations.  In this case, it may try to integrate this
   reservation into the same already existing aggregate (either having
   some capacity left or having to increase the capacity of the
   aggregate reservation first).  Moreover, the QNE may try to use an
   existing aggregate if the flow traverses the same AS path as the
   aggregate, but as stated in Section 3, this prediction may be
   inaccurate.  Thus, an optimistic approach would try to use the
   existing aggregate first, while performing path verification in
   nearly the same way as for route change detection.  Therefore, we use
   a variant of the AQ-mode for that purpose, too.
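
   The prefix-based prediction can be sketched as follows in Python.
   The mapping from destination prefixes to aggregate session IDs is a
   hypothetical data structure (for instance derived from BGP routing
   tables); the result is only a candidate, which still has to be
   verified as described in this section.

```python
import ipaddress

def candidate_aggregate(aggregates, flow_dst):
    """Return the session ID of the aggregate whose prefix best covers flow_dst.

    aggregates maps a destination prefix (string) to an aggregate session ID.
    Returns None if no existing aggregate covers the destination.
    """
    addr = ipaddress.ip_address(flow_dst)
    best_net, best_sid = None, None
    for prefix, sid in aggregates.items():
        net = ipaddress.ip_network(prefix)
        if addr.version != net.version or addr not in net:
            continue
        if best_net is None or net.prefixlen > best_net.prefixlen:
            best_net, best_sid = net, sid
    return best_sid

# Hypothetical aggregates, keyed by documentation prefixes.
AGGREGATES = {"192.0.2.0/24": "A", "198.51.100.0/24": "B"}

predicted = candidate_aggregate(AGGREGATES, "192.0.2.17")
no_match = candidate_aggregate(AGGREGATES, "203.0.113.5")
```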

   However, because the Query in the AQ-mode cannot carry larger
   payloads, it is not well suited to carry a large QoS NSLP RESERVE
   message.  Thus, we use the following mechanism: the RESERVE for the
   flow to be established is sent directly to the (predicted)
   deaggregator over the direct signaling messaging association.  The
   deaggregator must not forward it, however, until it is certain that
   the flow follows the path of the aggregate.  Therefore, the RESERVE
   has to wait until a path verification message sent via AQ-mode
   arrives at the deaggregator.  This would be a new type of QoS NSLP
   message, simply carrying a unique message ID (e.g., a 128-bit value)
   that is chosen by the aggregator.  This message ID must also be
   contained in the RESERVE message to allow for a successful matching
   of these mutually dependent signaling messages.  This MSG_ID/
   BOUND_MSG_ID mechanism was introduced in version 14 of the QoS NSLP
   specification draft [I-D.ietf-nsis-qos-nslp].

   If the AQ-mode Query message arrives before the RESERVE message, the
   deaggregator will note that the message ID was received and can
   immediately forward the RESERVE, because the "Waiting Condition" is
   already satisfied.  However, waiting messages will time out after a
   while, because the path prediction may have been wrong and the flow
   diverged from the predicted path.  Additionally, one could design an
   explicit cancellation mechanism, so that the aggregator could
   explicitly cancel waiting messages if it has been notified of a
   diverging route.
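
   The waiting condition at the deaggregator can be sketched as
   follows.  This Python model is an illustration under assumptions:
   the class and method names are hypothetical, messages are plain
   strings, time is passed in explicitly, and the timeout value is
   arbitrary.

```python
class Deaggregator:
    """Matches a RESERVE (BOUND_MSG_ID) with its path verification (MSG_ID)."""

    def __init__(self, timeout=5.0):
        self.timeout = timeout
        self.verified = {}    # msg_id -> arrival time of the AQ-mode message
        self.waiting = {}     # msg_id -> (RESERVE, arrival time)
        self.forwarded = []   # RESERVEs released for further forwarding

    def on_reserve(self, bound_msg_id, reserve, now):
        if bound_msg_id in self.verified:
            # Path verification arrived first: condition already satisfied.
            del self.verified[bound_msg_id]
            self.forwarded.append(reserve)
        else:
            self.waiting[bound_msg_id] = (reserve, now)

    def on_path_verification(self, msg_id, now):
        entry = self.waiting.pop(msg_id, None)
        if entry is not None:
            self.forwarded.append(entry[0])
        else:
            self.verified[msg_id] = now

    def expire(self, now):
        """Drop entries older than the timeout; return expired RESERVE IDs."""
        expired = [m for m, (_, t) in self.waiting.items()
                   if now - t >= self.timeout]
        for m in expired:
            del self.waiting[m]
        stale = [m for m, t in self.verified.items()
                 if now - t >= self.timeout]
        for m in stale:
            del self.verified[m]
        return expired

rd = Deaggregator(timeout=5.0)
rd.on_reserve("x", "RESERVE g", now=0.0)   # waits for verification
rd.on_path_verification("x", now=1.0)      # releases "RESERVE g"
```

   An explicit cancellation, as mentioned above, would simply remove
   the corresponding entry from the waiting table before the timeout
   fires.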

   The proposed method has the advantage of saving more than one round
   trip time compared to a mechanism where the path is probed first.
   Furthermore, it is advantageous when nested aggregates have to be
   increased in their capacity.  Details of such a concept and using
   waiting conditions for messages in a signaling protocol are described
   in [DARIS].

4.5.  Example

   We now describe an example of how the signaling messages will be
   exchanged.  First, we assume that the deaggregator can be determined
   by feedback from the proposed Route Record object in all RESPONSE
   messages for the individual flows.  An aggregate capacity will be
   determined and all flows sharing a part of the same route will be
   aggregated in a corresponding reservation between aggregator and
   deaggregator (RA and RD respectively in Figure 4).

   RA       R1       R2       R3       R4       R5       RD
   |        |        |        |        |        |         |
   |---- (1) RESERVE(RA->RD, SID=A, SESSION_ID_LIST) ---->| Establish
   |                                                      | an Aggregate
   |<--- (2) RESPONSE(RA->RD, SID=A, SESSION_ID_LIST) ----| Reservation
   |        |        |        |        |        |         |
   |-- (3) Query(SID=S,AF-MRI(EstDirect,SID=A,RA->RD)) -->| Establish
   |        |        |        |        |        |         | a direct
   |<========== (4) Response(SID=S,AF-MRI) ===============| signaling
   |        |        |        |        |        |         | association
   |<======================= New MA =====================>| between
   |                                                      | RA and RD

   ----> hop-by-hop signaling (intercepted/inspected by R1-R5)
   ====> direct signaling (no interpretation by R1-R5)

   Example for establishing an aggregate

                                 Figure 4

   Message (1) is a RESERVE for the aggregate reservation from RA to RD
   that is sent as usual, using a path-coupled MRM via query mode
   encapsulation.  The only addition is the SESSION_ID_LIST object,
   which lists all the flows that are included in the aggregate.
   Furthermore, a MSG_ID is included so that message number (3) will
   wait if it arrives earlier than message (4).  Each intermediate node
   (R1-R5) performs admission control for the requested aggregated
   bandwidth while taking into account that resources for the listed
   single flows are part of the requested capacity.  Intermediate nodes
   R1-R5 will delete information about the single flows that are listed
   in the SESSION_ID_LIST object if message (2) indicates a successful
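
   The admission control step at an intermediate node can be
   illustrated with a short arithmetic sketch in Python.  The function
   name, the bandwidth unit, and the per-flow table are assumptions
   for illustration; the point is only that capacity already reserved
   for the listed single flows counts towards the aggregate.

```python
def admit_aggregate(free_capacity, requested_aggregate, listed_flows):
    """Decide whether an aggregate RESERVE can be admitted at one node.

    free_capacity:       capacity not yet reserved at this node
    requested_aggregate: capacity requested for the aggregate
    listed_flows:        session ID -> capacity already reserved for each
                         flow named in SESSION_ID_LIST at this node
    """
    already_reserved = sum(listed_flows.values())
    # Only the part exceeding the existing per-flow reservations is new.
    additional = max(0, requested_aggregate - already_reserved)
    return additional <= free_capacity

# Example in arbitrary units: 45 units are already held by flows f1 and f2.
flows = {"f1": 20, "f2": 25}
accepted = admit_aggregate(free_capacity=10, requested_aggregate=50,
                           listed_flows=flows)
rejected = admit_aggregate(free_capacity=2, requested_aggregate=50,
                           listed_flows=flows)
```

   Here the aggregate of 50 units needs only 5 additional units, so it
   is admitted when 10 units are free and rejected when only 2 are.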

   Message (3) is requesting the setup of a direct signaling messaging
   association between RA and RD.  It will be an independent signaling
   session (SID=S) but can be bound to the aggregate session (SID=A).
   This session must be handled in a special manner by the RMF along
   with all the reservation aggregation mechanisms anyway: any
   individual flow that is now contained in the aggregate A should use
   this MA instead.  The PC-MRI in the AF-MRI is now the same as for the
   signaling connection.  A QoS NSLP Query message can be used to
   initially set up this direct MA.  It should, however, carry a kind of
   NULL QSPEC.  This message must be sent in AQ-mode and every
   intermediate node (R1-R5) will bypass the message if it is neither
   the destination node of the PC-MRI nor the deaggregator of the
   session-ID in the AF-MRI (session id is A in this case).  The
   deaggregator will set up an MA with RA as direct GIST peer (previous
   hop).  Subsequent messages for the aggregated flows could now be sent
   explicitly routed by using the SII handle from the session S.

  RA       R1       R2       R3       R4       R5       RD
  |        |        |        |        |        |         |
  |============== (1) RESERVE(S->D, SID=f) =============>| per-flow
  |                                                      | message
  |<=========== (2) RESPONSE(S->D, SID=f) ===============| (not refresh)
  |        |        |        |        |        |         |
  |---(3) Query(SID=f,AF-MRI(RouteCheck,SID=A,S->D))---->| GIST refresh
  |        |        |        |        |        |         |
  |        |        |        |        |        |         |

  ----> hop-by-hop signaling (intercepted/inspected by R1-R5)
  ====> direct signaling (no interpretation by R1-R5)

   Example for single flow end-to-end message and GIST route refresh

                                 Figure 5

   Figure 5 shows that end-to-end per-flow messages (messages (1) and
   (2), such as a TEAR or a reservation update) will now be sent via the
   direct signaling connection (using the explicit routing feature by
   specifying an appropriate SII handle).  If GIST needs to refresh the
   route of an aggregated flow, it will send a Query in AQ-mode (message
   (3)).  This Query will be intercepted by each intermediate node
   (R1-R5) and the AF-MRI is checked as described in Section 4.3.2: if
   session ID A from the AF-MRI is known, the message will be bypassed
   and forwarded in direction of the PC-MRI(S->D) of the single flow
   that is contained in the AF-MRI.  If the node is the endpoint of the
   aggregate, i.e., it is the deaggregator RD for A, it will stop
   forwarding and refresh the flow's routing state instead.  In case
   the route of the single flow diverges from the aggregate's path, an
   error will be returned to the aggregator RA indicating that session A
   is unknown at this node.

   RA       R1       R2       R3       R4       R5       RD
   |        |        |        |        |        |         |
   |===== (1) RESERVE(S->D, SID=g, BOUND_MSG_ID(x)) =====>| per-flow
   |        |        |        |        |        |         | message
   |        |        |        |        |        |         |
   |        |        |        |        |        |         |
   |--(2) Query(SID=g,AF-MRI(AddFlow,SID=A,S->D),         |
   |      QUERY,MSG_ID(x))                        ------->| path
   |        |        |        |        |        |         | verification
   |        |        |        |        |        |         |

   ----> hop-by-hop signaling (intercepted/inspected by R1-R5)
   ====> direct signaling (no interpretation by R1-R5)

   Example for a new flow that is included into the already existing
   aggregate

                                 Figure 6

   Figure 6 shows a new RESERVE message for a single flow g that should
   also be aggregated into A.  Thus, it is sent directly to RD via the
   direct MA.  However, in order to verify the path prediction, a second
   message, e.g., a QUERY, will be sent in AQ-mode using an AF-MRI of
   type AddFlow.  Every intermediate node will bypass this message until
   the deaggregator of A is hit.  This message will be the triggering
   message for message (1), whose processing is suspended by the waiting
   condition on message (2).  If message (2) arrives at RD, the path was
   verified and the prediction was correct.  Any divergence from the
   path will also result in an error that is sent back to the
   aggregator.

5.  Security Considerations

   Basically, the security considerations of GIST and QoS NSLP apply.
   Inter-domain aggregation, however, may raise new issues due to
   different trust relationships between domains.  Thus, not every
   provider may be willing to accept aggregate reservations.  On the
   other hand, using the proposed mechanisms for deaggregator discovery,
   a provider can easily avoid acting as deaggregator by not writing its
   own addresses into the Route-Record object.  So the particular policy
   of a provider could be easily realized.  Furthermore, domains that
   share or carry a lot of end-to-end reservations would likely
   cooperate with each other.

   The newly proposed waiting condition for messages cannot be used for
   DoS attacks that try to exhaust state memory, because every
   deaggregator will accept such messages only within an aggregate
   context.  Usually, a trust relationship between aggregator and
   deaggregator exists and they may also use a secure direct signaling
   messaging association (which is recommended).  Thus, messages from
   the aggregator could be authenticated.  Attackers are not able to
   send such messages blindly, because the deaggregator would drop them
   due to their unauthorized origin and a non-matching session ID.

   These are preliminary considerations, so they probably do not cover
   all possible aspects of the proposed solutions.  There will be more
   details in the next versions of this draft.

6.  References

6.1.  Normative References

              Schulzrinne, H. and R. Hancock, "GIST: General Internet
              Signalling Transport", draft-ietf-nsis-ntlp-13 (work in
              progress), April 2007.

   [I-D.ietf-nsis-qos-nslp]
              Manner, J., "NSLP for Quality-of-Service Signaling",
              draft-ietf-nsis-qos-nslp-14 (work in progress), June 2007.

6.2.  Informative References

   [DARIS]    Bless, R., "Dynamic Aggregation of Reservations for
              Internet Services", Telecommunications Systems Volume 26,
              Issue 1, pp. 33-52, Kluwer, May 2004.

   [RFC2475]  Blake, S., Black, D., Carlson, M., Davies, E., Wang, Z.,
              and W. Weiss, "An Architecture for Differentiated
              Services", RFC 2475, December 1998.

   [RFC3175]  Baker, F., Iturralde, C., Le Faucheur, F., and B. Davie,
              "Aggregation of RSVP for IPv4 and IPv6 Reservations",
              RFC 3175, September 2001.

Authors' Addresses

   Roland Bless
   Institute of Telematics, Universitaet Karlsruhe (TH)
   Zirkel 2
   Karlsruhe  76187

   Phone: +49 721 608 6413
   Email: bless@tm.uka.de
   URI:   http://www.tm.uka.de/~bless

   Mark Doll
   Institute of Telematics, Universitaet Karlsruhe (TH)
   Zirkel 2
   Karlsruhe  76187

   Phone: +49 721 608 6403
   Email: doll@tm.uka.de
   URI:   http://www.tm.uka.de/~doll

Full Copyright Statement

   Copyright (C) The IETF Trust (2007).

   This document is subject to the rights, licenses and restrictions
   contained in BCP 78, and except as set forth therein, the authors
   retain all their rights.

   This document and the information contained herein are provided on an

Intellectual Property

   The IETF takes no position regarding the validity or scope of any
   Intellectual Property Rights or other rights that might be claimed to
   pertain to the implementation or use of the technology described in
   this document or the extent to which any license under such rights
   might or might not be available; nor does it represent that it has
   made any independent effort to identify any such rights.  Information
   on the procedures with respect to rights in RFC documents can be
   found in BCP 78 and BCP 79.

   Copies of IPR disclosures made to the IETF Secretariat and any
   assurances of licenses to be made available, or the result of an
   attempt made to obtain a general license or permission for the use of
   such proprietary rights by implementers or users of this
   specification can be obtained from the IETF on-line IPR repository at

   The IETF invites any interested party to bring to its attention any
   copyrights, patents or patent applications, or other proprietary
   rights that may cover technology that may be required to implement
   this standard.  Please address the information to the IETF at


   Funding for the RFC Editor function is provided by the IETF
   Administrative Support Activity (IASA).
