Differentiated Services Working Group G. Mercankosk Internet Draft Australian Expires January 2001 Telecommunications CRC Document: draft-mercankosk-diffserv-pdb-vw-00.txt July 2000 Category: Informational The Virtual Wire Per Domain behavior _ Analysis and Extensions draft-mercankosk-diffserv-pdb-vw-00.txt Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026 [1]. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. Distribution of this memo is unlimited. Abstract This document provides an analysis of the Virtual Wire Per Domain Behavior with limited extensions. The draft formalizes a model for the PDB by explicitly identifying key timing and decision variables and associated design parameters. It derives the necessary and sufficient conditions for creating and establishing a Virtual Wire flow. It provides methods for quantifying design parameters and setting decision parameters and discusses their extensions to incorporate variations in traffic characteristics. A pdf version of this document is available at http://www.ee.uwa.edu.au/~guven/vwpdb.pdf Mercankosk Informational - Jan 2001 1 Virtual Wire - Analysis & Extensions July 2000 1. Introduction The document [3] describes a Differentiated Services (diffserv or DS) Per Hop Behavior (PHB) called Expedited Forwarding (EF) intended for use in building a scalable edge-to-edge service that appears to the end points like an un-shared, point-to-point connection or `Virtual Wire.' For scalability, a DS-domain supplying this service must be completely unaware of the individual end points using it, except for routing purposes, and sees only the aggregate EF marked traffic entering and traversing the domain. The document [4] provides a set of specifications necessary on the aggregate traffic (in DS terminology, a Per Domain Behavior or PDB) in order to meet these requirements and thus defines a new PDB, the `Virtual Wire' PDB of some fixed capacity. It does not require `per- flow' state to exist anywhere other than the ingress and egress routers. Despite the lack of per-flow state, if the aggregate input rates are appropriately policed and the EF service rates on interior links are appropriately configured, the edge-to-edge service supplied by the DS-domain will be indistinguishable from that supplied by a dedicated wire between the end points. This document provides an analysis of `Virtual Wire' PDB with limited extensions. "The same as a dedicated wire," means that as long as packets are sourced at a rate less than or equal to the virtual wire's configured rate, they will be delivered with a high degree of assurance and with no distortion (apart from line clock jitter) of the inter-packet timing imposed by the source. However, any packet sourced at a rate greater than the virtual wire's configured rate will either be unconditionally discarded or spaced to conform with the configured rate. In the latter case, the packets will be delivered with inter-packet timing imposed by the spacer. This draft formalizes a model for the `Virtual Wire' PDB by explicitly identifying key timing and decision variables and associated design parameters. It derives the necessary and sufficient conditions for creating and establishing a `Virtual Wire' flow. It provides methods for quantifying design parameters and setting decision parameters and discusses their extensions to incorporate variations in traffic characteristics. In this document, we distinguish between the transfer delay over a DS-domain and edge-to-edge service delay. We provide a characterization of the transfer delay bound based on established results. We argue that the issue of non-EF traffic starvation does not exist even when priority queueing is deployed. We point out that the presence of non-EF micro flows does not necessitate additional bandwidth resources. We determine a conservative expected time to the first failure of a `Virtual Wire.' We discuss the impact of heterogeneous packet lengths on the transfer delay bound. This document provides a synthesis of various methods in delivering a particular but versatile solution. The issue of scalability, the Mercankosk Informational - Jan 2001 2 Virtual Wire - Analysis & Extensions July 2000 edge-to-edge service characteristics and a simple management of DS- domain provide the basis for the presented solution. The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119 [2]. 2. Description of Virtual Wire PDB with extensions Figure 1: An illustration of virtual wire transfer over a DS-domain An illustration of the model of a Virtual Wire transfer over a DS- domain is given in Figure 1. At the ingress end of the micro flow, the model contains an ingress link (S to I) and a DS-domain ingress border router (at I). Over the DS-domain (I to E), the packets from the micro flow are subject to random delays, according to EF-PHB service. At the egress end of the flow, the remaining components of the model are a DS-domain egress border router (at E) and an egress link (E to D). The ingress link and the egress link are assumed to run at the same constant rate as the virtual wire rate R with negligible line clock jitter. 2.1 Ingress edge of DS-domain A periodic flow of packets of size S is sourced at a rate R on the ingress link. Accordingly, one packet is presented at the DS-domain ingress border router in every T=S/R time units. The number of complete packets that are in transit in the DS-domain at time t is denoted by N_DS(t). Without loss of generality, we choose the time origin such that the last bit of the first packet presented at the ingress border router is at time T. Let t_k represent the time at which the last bit of the kth packet is presented to the DS-domain at the ingress border router, then (1) t_k = kT. 2.2 Over the DS-domain Once a packet that belongs to a micro flow is presented at the ingress border router, it hops from one EF router to another along its path over the DS-domain before reaching the egress border router. There are two main components for the delay experienced by a packet over the DS-domain, namely the propagation delay and the queueing delay. The propagation delay refers to the time it would take for a packet to traverse through the DS-domain if the packet experienced no queueing delay along its path. Accordingly, the propagation delay, d_p, is fixed and includes the time necessary to process a packet at routers. Let d_k denote the total queueing delay experienced by the kth packet along its path. It is assumed that the queueing delay experienced by each packet of a micro flow over the DS-domain is statistically bounded by a known value, D, that is (2) 0 <= d_k < D Mercankosk Informational - Jan 2001 3 Virtual Wire - Analysis & Extensions July 2000 The value, D, can be obtained by some (1-alpha) quantile of the total queueing delay. Note that the value D refers to the difference between the best and the worst case expectation of the packet transfer delay. The best case is equal to d_p and the worst case is equal to d_p + D, a value likely to be exceeded with a probability less than alpha. Therefore, the value D is a measure of variation of the delay distribution and may be referred to as expected maximum jitter. At a given EF-router, let C generically denote the capacity of the output link associated with the micro flow. The transmission time delta of a size S packet onto the output link is given by S/C time units. Finally, let f denote the fraction of the output link capacity C as the target operating point for the aggregate EF flow at a given node. Also let tau_k denote the time at which the last bit of the kth packet reaches the egress border router. So we have (3) tau_k = t_k + d_k + d_p 2.3 Egress edge of DS-domain The packets that arrive at the egress border router are placed in a micro flow specific buffer of size B_E. The number of complete packets in the egress buffer at time t is denoted by N_E(t). Similar to the ingress link, one packet is transmitted onto the egress link in every T time units. It is clear that the timing structure of the sequence {tau_k} is not necessarily the same as that of the sequence {t_k}. A commonly used strategy for recovering the initial timing structure is to delay the transmission of the first packet onto the egress link by a fixed amount of time, say D_xi, and set the size of the egress buffer B_E such that the continuity of the data flow is maintained when packets are transmitted onto the egress link using the very same timing structure {t_k}. We note that a micro flow specific buffer at the egress border router is required even for the original Virtual Wire [4]. Let xi_k denote the time at which the first bit of the kth packet is transmitted onto the egress link. Accordingly, we have (4) xi_k = T + d_1 + d_p + D_xi + (k-1)T 3. Establishing a virtual wire In Appendix B, we list a number of micro flow properties over a virtual wire and express key micro flow quantities in terms of the virtual wire rate R and the equalization delay D_xi. We observe that the end-to-end delay experienced by each packet of the micro flow is constant as it would be over a dedicated wire and given by (5) xi_k - t_k = d_1 + d_p + D_xi Mercankosk Informational - Jan 2001 4 Virtual Wire - Analysis & Extensions July 2000 We note that the end-to-end delay depends on the queueing delay d_1 experienced by the first packet, the propagation delay d_p over the virtual wire, and the equalization delay D_xi at the egress border router. We will later observe that, in establishing a virtual wire, no explicit knowledge of d_1 and d_p is required. In expressing the key micro flow quantities, we assume that buffers over the DS-domain and at the egress border are sufficiently large that they do not overflow. We also assume that the equalization delay is made sufficiently large so that when the transmission of a packet onto the egress link is completed, there is always another packet in the egress buffer ready for transmission. Continuity of micro flow requires that every packet presented to the DS-domain at the ingress router at fixed intervals has to be transmitted onto the egress link after a fixed delay but again at the same fixed intervals, without being subject to packet loss or interruption during the life time of the flow. Packet loss occurs if a packet has to be discarded due to lack of storing capacity upon arrival at the egress border router. On the other hand, the flow is interrupted if a transmission cannot be started due to unavailability of a packet at the egress border buffer. In Appendix C, we show that having a bounded queueing delay over the DS-domain, as indicated by (2), is sufficient to have maximum and minimum limits on the egress border buffer fill N_E(t). Accordingly, the egress buffer fill N_E(t) is bounded above by ceiling((D+D_xi)/T) and below by floor((D_xi-D)/T) (definitions for floor(x) and ceiling(x) are given in Appendix A). We then use these limits to derive the necessary and sufficient conditions to maintain the continuity of flow. In relation to avoiding egress link starvation, we consider the situation where the first packet of the flow experiences no queueing delay and some other packet experiences a queueing delay arbitrarily close to the maximum D to obtain (6) floor((D_xi-D)/T) >= 0 In relation to avoiding egress border buffer overflow however, we consider the situation where the first packet experiences a queueing delay arbitrarily close to the maximum D and some other packet experiences no queueing delay to obtain (7) ceiling((D+D_xi)/T) <= B_E As indicated earlier, in determining the maximum queueing delay D, the rarity of events has been taken into account. Accordingly, some level of loss is allowed in case events considered rare do occur. The conditions given in (6) and (7) ensure that there would be no further losses. The smallest value for D_xi that satisfies (6) is D. Setting the initial delay as (8) D_xi = D Mercankosk Informational - Jan 2001 5 Virtual Wire - Analysis & Extensions July 2000 also minimizes the end-to-end delay over the DS-domain. The resulting end-to-end delay is (9) xi_k - t_k = xi_1 - T = d_1 + d_p + D Now substituting (8) into (7), we have (10) ceiling(2D/T) <= B_E Consequently, the left hand side of the inequality in (10) determines the minimum buffer size for egress border buffer as (11) B_E = ceiling(2D/T) 4. Characterization of the delay bound In Appendix D, we outline a general queueing model that forms the basis for quantifying the transfer delay bound. We observe that, for all practical purposes, the transfer delay bound is only a function of aggregate traffic intensity and independent of individual micro flow rates. We point out that strict ingress policing and resource reservation ensure the delivery of rate guarantees to individual micro flows. We also emphasize the non-ergodic nature of the problem which is critical in determining the measure of interest. In this section, we first look at the delay due to phase coincidences with other micro flows. We next consider heterogeneous micro flow rates in relation to the delay due to packets from the same micro flow. We then summarize the methods of determining the delay bound across the DS-domain. We finally consider servers taking vacations upon becoming idle to examine the delay due to non-EF micro flows. 4.1 Delay due to phase coincidences with other micro flows The queueing model outlined in Appendix D is based on the superposition of periodic micro flows. Given the strict ingress policing to ensure a micro flow behavior given by (1), a nD/D/1 queue naturally arises provided that all contributing micro flows have just joined in. Due to phase coincidences at a given router, a micro flow cannot maintain the behavior given by (1) into the successive routers along its path. However, we note that the work associated with the micro flow remains unchanged. The deviation from the nominal behavior, after a number of multiplexing stages, results in the delay experienced by each packet of a micro flow being additionally affected by the previous packets of that flow [10]. Such an impact can be taken into account by considering an additional flow into the queueing system. As a limiting case, we observe that the delay distribution through an (M+D)/D/1 queueing system always provides an upper bound for the delay distribution through any nD/D/1 queue of the same aggregate Mercankosk Informational - Jan 2001 6 Virtual Wire - Analysis & Extensions July 2000 intensity. We note, however, that the use of a conservative upper bound does not alter the non-ergodic nature of the problem. 4.2 Delay due to packets from the same micro flow In Appendix D, when the aggregate consists of micro flows of the same rate, we note that a packet cannot be further delayed due to packets from the micro flow that it belongs to. Yet, when the aggregate consists of micro flows of different rates, this is not necessarily correct. In particular, a packet from a high rate flow may find itself queued behind packets from the same micro flow even when all micro flows are perfectly periodic. Referring to the numerical examples given in Appendix D, the following observation can be made: The delay distribution for an arbitrary aggregate traffic mix is always upper bounded by that of an aggregate which consists of micro flows of the smallest rate only and totalling to the same intensity. This means that for a given fraction f, the number of possible minimum rate micro flows determines an upper bound distribution function for any aggregate traffic mix. Note that, through conservative upper bounding, we have the same delay bound for all micro flows in an aggregate irrespective of individual rates which in turn greatly simplifies the management of a DS-domain. Take, for example, the case where the transmission is over an OC-3 link (C=148.61 Mbps, sonet user rate) with aggregate EF traffic intensity 0.90C on the link, then the delay bound per hop is of the order of 1.02 ms (delta=10.77 micro seconds, S=200 bytes, M - > infinity, D=95delta, alpha=10^-9). 4.3 Delay across a DS-domain If a DS-domain is designed such that the lowest capacity domain link constrains the total number of micro flows that can be transferred over the DS-domain [4] then the Appendix F establishes that the bound on the delay can be determined from one count of an equivalent stand alone nD/D/1 queue or (M+D)/D/1 queue. Alternatively, consider the case where each micro flow follows a fixed routing path from ingress to egress over a fixed number of hops, say H. We note that a micro flow may traverse portions of DS- domain together with other micro flows. However, a conservative delay bound across the domain can be determined by assuming that the micro flow meets a new set of micro flows at each hop. The delay across the DS-domain can then be considered as the sum of H independent random delay components. Consequently, the bound on the delay can be determined from H counts of nD/D/1 queue or (M+D)/D/1 queue. The study given in [11] provides an accurate closed-form formula to calculate the transfer delay bound under similar assumptions. Since the use of conservative assumptions result in small delay bounds (e.g. 1.02 ms), over provisioning of bandwidth resources is Mercankosk Informational - Jan 2001 7 Virtual Wire - Analysis & Extensions July 2000 not required. Rather, moderate amounts of additional storage resources have been traded for lowering the probability of data flow discontinuity. Also, the simplicity in management of a DS-domain has been an underlying guideline. 4.4 Delay due to non-EF micro flows In the case of non-preemptive priority queueing, it is possible that an EF micro flow packet arriving to an empty EF queue may find the server busy transmitting a non-EF packet and has to wait for the end of transmission. The worst case occurs when there is always a non-EF packet waiting to be transmitted. The situation is then equivalent to a queueing system where a server can take vacations if no customer is waiting upon completion of a service. Appendix E provides relevant references for quantifying the additional delay an EF-packet experiences due to non-EF micro flows. Figure 2: Residual rest periods due to non-EF flows Given the properties of queueing models with rest periods stated in Appendix E, the problem of determining the delay due to non-EF micro flows reduces to that of calculating the distribution of residual rest periods. Let S_N-EF denotes the size of a non-EF packet. Then the transmission time delta_N-EF onto the output link is given by S_N-EF/C. There are three cases to consider as illustrated in Figure 2. If the size of a non-EF packet is also equal to S, i.e. delta_N- EF = delta, then the study in [5] readily provides the solution. If the size of a non-EF packet is, and normally would be, larger than S but requires less than T time units for transmission, i.e. T >= delta_N-EF > delta, we can use the property shown in [8] with uniformly distributed residual rest period to quantify the delay due to non-EF micro flows. Finally, if the time required to transmit a non-EF packet is larger than T time units, i.e. delta_N-EF > T, then the residual rest period is equal to delta_N-EF - T which is a constant plus a uniformly distributed random variable over (0,T]. If the packet lengths from non-EF micro flows are not fixed but drawn from a known distribution then the residual rest period distribution has to be determined accordingly. 5. Other issues We finally look at other issues such as the issue of non-EF traffic starvation, the expected time to the first failure and heterogeneous packet lengths. Unless otherwise stated, we assume non-preemptive priority queueing with equal size EF micro flow packets. 5.1 The choice of scheduling mechanism Recall that f denotes the fraction of the output link capacity C that is allocated to the aggregate EF flow at a given node. The maximum amount of EF work presented to the node is then worth fT time units in any interval of T time units. Equivalently, the output link either remains idle or is used to serve non-EF work for T-fT Mercankosk Informational - Jan 2001 8 Virtual Wire - Analysis & Extensions July 2000 time units. We note that these proportions remain valid under any scheduling mechanism. However, the delay experienced by the members of the aggregate flow depends on the mechanism under consideration. Clearly, the use of priority queueing minimizes the delay for EF traffic and there is no issue of starvation for non-EF traffic provided that T-fT>0. We note, with non-preemptive priority queueing, that the possibility of an additional delay due to non-EF micro flows does not necessitate an additional bandwidth for EF aggregate provided that the aggregate input rate does not exceed the minimum departure rate which happens to be the output link capacity. This observation is based on the fact that the EF traffic never waits while a non-EF queue is serviced except under the circumstances described earlier. This situation is distinctly different under weighted round robin scheduling where the EF queue is forced to relinquish control every now and then as its scheduling parameters require. We acknowledge that the particular way EF PHB is defined may result in EF bandwidth allocation restrictions. We note, however, that it is not a technical requirement, either. It is correct that one could clear the EF traffic backlog by using a scheduling mechanism such as weighted round robin that guarantees a minimum departure rate of (f+deltaf)C with deltaf>0. Such a departure rate, instead of C, can then be used in the analysis above to determine the corresponding delay bounds. As a matter of fact, the minimum departure rate is actually the maximum rate that a scheduler can guarantee to clear the EF traffic backlog. Depending on the dynamic state of the node, the scheduler may clear the EF traffic backlog at a higher rate. However, for the purposes of creating a Virtual Wire PDB, the situation is not considered well defined [4]. 5.2 Expected time to the first failure We noted earlier that the delay bound, D, is obtained by the (1- alpha) quantile of the queuing delay for some alpha, say alpha=10^- 9. Accordingly, if a VW micro flow lasts long enough then there is a distinct possibility that the data flow continuity could break down. We now establish a crude but conservative expected time to the first failure. A similar approximation has been given in [5]. Recall that an activity change means that either one micro flow drops out or another one joins in. Although activity changes occur due to one micro flow at a time, we assume an entirely new and independent arrival pattern at each activity change. We further assume that there will be an activity change for each packet of a tagged micro flow which is not normally the case. The probability that a packet from the tagged micro flow experiences very small or no queueing delay is a function of target operating loading (indicated by fraction f of each link along the path), and it is generally very high. Therefore, if the first packet of the tagged micro flow experiences a queueing delay greater than D, a break Mercankosk Informational - Jan 2001 9 Virtual Wire - Analysis & Extensions July 2000 down in data flow continuity would immediately follow due to the reasoning that led to the condition in (6). However, this will happen only once per 1/alpha micro flow starts. For the more likely case of the first packet experiencing no queueing delay, the time until the first break down (expressed in number of activity changes) due to the reasoning that led to the condition in (7), is geometrically distributed with mean 1/alpha. For a micro flow that generates one packet every 20 ms, and given alpha=10^-9, the expected time to the first failure is conservatively 230 days which can be considered reasonably long for all practical purposes. These numbers further put the choice for parameter alpha into a context. 5.3 Heterogeneous packet lengths Finally, we look at the issue of heterogeneous packet lengths. We first consider the case where fixed size packets are in use within the same flow, but the size of packets may differ from one micro flow to another. We then lift any restrictions on the packet lengths. We note that, given the virtual wire rate R, large packets mean longer periods and short packets mean shorter periods. However, the amount of work presented to an output link, in its nature, does not differ from the case when we have fixed size packets. Because of the strict ingress policing, the size of a packet can be considered as a quantum of service request over a short time scale. At one extreme, consider a micro flow with a smaller packet size competing with micro flows that use larger packets. Given the uniformly distributed packet arrival epochs in the analysis above, it should be clear that the distribution of the number of packets that form the unfinished work does not change. However, we note that, given the number of packets in the queue, the unfinished work and hence the queueing delay is larger due to larger packet lengths. Under the circumstances, we can relate the difference in queueing delays to the service quanta represented by different packet sizes. Although the arguments given in this paragraph requires further study, the impact of heterogeneous packet sizes should not be significant provided that packet sizes, in use for EF micro flows, do not vary wildly. With variable length packets allowed within the same micro flow, its impact on the queueing delay can be determined when we observe that the unfinished work is now a random sum of random variables that represent packet lengths. Variable length packets within a given micro flow has also an impact on the packet transmission times for the micro flow. With fixed size packets, there is no variation due to packet transmissions and as such the transmission delay is considered as a component of the fixed propagation delay d_p. With variable packet lengths, the variation in packet transmission times is bounded by the difference between the transmission time of a maximum length packet and that of a minimum length packet. We note that the variation in transmission Mercankosk Informational - Jan 2001 10 Virtual Wire - Analysis & Extensions July 2000 time can be incorporated into the earlier analysis as a modification on the propagation delay without any effect on the queueing delay. 6. Assumptions In this document, we have derived the necessary and sufficient conditions for realizing a `Virtual Wire' PDB under the assumption that a periodic flow of packets of fixed size is presented to the DS-domain. It is further assumed that each flow is fully and isochronously used. However, the use of `Virtual Wire' PDB is not limited to periodic flows of fixed size packets. As a matter of fact, pointers for incorporating any variations in traffic characteristics are given. Possible extensions include multi- priority EF queueing and periodic flows with on-off periods. These extensions and possibly some others are left for future discussions. References [1] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996. [2] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997 [3] V. Jacobson, K. Nichols, K. Poduri, "An Expedited forwarding PHB," ftp://ftp.isi.edu/in-notes/rfc2598.txt. [4] V. Jacobson, K. Nichols, K. Poduri, "The `Virtual Wire' Behavior Aggregate," draft-ietf-diffserv-ba-vw-00.txt, March 2000. [5] P.A. Humblet, A. Bhargava, M.G. Hluchyj, "Ballot Theorems Applied to the Transient Analysis of nD/D/1 Queues," IEEE/ACM Transactions on Networking, Vol 1, No 1, pp 81-95, February 1993. [6] J.W. Roberts, J.T. Virtamo, "The Superposition of Periodic Cell Arrival Streams in an ATM Multiplexer," IEEE Transactions on Communications, Vol 39, No 2, pp 298-303, February 1991. [7] M. Scholl, L. Kleinrock, "On the M/G/1 with Rest Periods and certain Service Independent Queueing Disciplines," Operations Research, Vol 31, pp 705-719, 1983. [8] G. Mercankosk, "Mathematical Preliminaries of Rate Enforced ATM Access," Australian Telecommunications Research Institute, NRL- TM-071, July 1995. [9] R.L. Graham, D.E. Knuth, O. Patashnik, "Concrete Mathematics," Addison Wesley Publishing Company, 1994. [10] Commission of the European Communities, "COST 224, Performance evaluation and design of multiservice networks," Edited by J.W. Roberts, October 1991. [11] D. De Vleeschauwer, G.H. Petit, B. Steyaert, S. Wittevrongel, H. Bruneel, "An accurate Closed-Form Formula to calculate the Dejittering Delay in Packetized Voice Transport," Proceedings of the IFIP-TC6, International Conference on Networking 2000, pp 374-385, Paris, May 2000. Mercankosk Informational - Jan 2001 11 Virtual Wire - Analysis & Extensions July 2000 Acknowledgments This document is based on and inspired by an earlier joint work with Professor Cantoni. Numerous discussions on related topics with Professor Budrikis have found their way into the arguments presented in this document. Dr. Siliquini has provided a constant encouragement, and has shared the excitement for the preparation of this document. I would also like to acknowledge the context provided by Van Jacobson and his team. Author's Address Guven Mercankosk Australian Telecommunications CRC University of Western Australia TEN Research Group (EE) Nedlands WA 6907 Australia Phone: +61 8 9380 7260 Email: guven@ee.uwa.edu.au Full Copyright Statement "Copyright (C) The Internet Society (2000). All Rights Reserved. This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implmentation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into Mercankosk Informational - Jan 2001 12 Virtual Wire - Analysis & Extensions July 2000 A. Some number theoretic results We first note that, for all real x, floor(x) denotes the largest integer less than or equal to x and ceiling(x) denotes the smallest integer greater than or equal to x. Also note that R and Z denote the set of real and integer numbers, respectively. The following rules can be found in [9]. (12) for all x in R, -floor(x) = ceiling(-x) (13) for all x in R, n in Z, n = ceiling(x) if and only if x <= n < x+1 (14) for all x in R, n in Z, n < x if and only if n < ceiling(x) (15) for all x in R, n in Z, n > x if and only if n > floor(x) Mercankosk Informational - Jan 2001 13 Virtual Wire - Analysis & Extensions July 2000 B. Micro flow relations In this appendix, we express key system quantities in terms of the virtual wire rate R and the equalization delay D_xi. We assume that buffers over the DS-domain and at the egress border are sufficiently large that they do not overflow. We also assume that the initial delay is made sufficiently large so that when the transmission of a packet onto the egress link is completed, there is always another packet in the egress buffer ready for transmission. Subsequently in the Appendix C, we determine the bounds on the buffer sizes and the initial delay. B.1 Packets presented and delivered The number of packets presented to the DS-domain at the ingress border in the interval (0,t] is given by the expression (16) floor(t/T) and illustrated in Figure 3. Figure 3: Packet presentation instants at the ingress Similarly, the number of packets transmitted onto the egress link in the interval (d_1+d_p+D_xi,t] is given by the expression (17) floor((t-(d_1+d_p+D_xi))/T) and illustrated in Figure 4. Figure 4: Packet departure instants at egress B.2 Egress router buffer fill The arrival instants at the egress border buffer partition the time axis into contiguous intervals of variable length. That is, for a given time t, there exists an integer n such that (18) tau_n <= t < tau_(n+1) where tau_n is defined as in (3) and illustrated in Figure 5. Figure 5: Arrivals at egress border buffer The fill level of the egress border buffer at time t is equal to a difference between the total number of packets that have arrived at the egress buffer and the number of packets transmitted onto the egress link. Equivalently, (19) N_E(t) = n - floor((t-(d_1+d_p+D_xi))/T) which follows from (17) and (18). Mercankosk Informational - Jan 2001 14 Virtual Wire - Analysis & Extensions July 2000 The fill level N_E(t) is a non-increasing function of t between the two successive arrivals at the egress border buffer or over the interval [tau_n,tau_(n+1)) as illustrated in Figure 6. Figure 6: Fill level between successive arrivals B.3 Packets in transit Packets in transit are the packets that have been presented to the DS-domain at the ingress router but have not yet reached the egress border buffer. So, the number of packets in transit at a given time t is given by (20) N_DS(t) = floor(t/T) - n which follows from (16) and (18). B.4 End-to-end delay The initial fill for a particular flow is the number of packets presented to the DS-domain just ahead of the transmission of the first packet onto the egress link minus the first packet. Noting that N_DS(xi_1) is the number of packets in transit at time xi_1 and N_E(xi_1) is the egress border buffer fill at the same time xi_1, we have (21) N_DS(xi_1) + N_E(xi_1) = floor(xi_1/T) - 1 = floor((d_1+d_p+D_xi)/T). Next, we derive an expression for the end-to-end delay over the VW. The end-to-end delay experienced by the kth packet is the time from when the last bit of the kth packet is presented to the DS-domain to when the first bit is transmitted onto the egress link. It can be found by rearranging (4) and substituting t_k for kT as (22) xi_k - t_k = xi_1 - T = d_1 + d_p + D_xi which shows that the end-to-end delay is constant as it would be over a dedicated wire. Adding and subtracting floor(xi_1/T)T and rearranging terms lead to (23) xi_k - t_k = (xi_1 - floor(xi_1/T)T) + (N_DS(xi_1) + N_E(xi_1))T where we also make use of (4) for k=1 and (21). Since the packet presentation to the DS-domain and packet transmission onto the egress link are periodic with period T, the first term on the right hand side of the (23) can be considered as the phase difference phi_R of the egress link relative to the ingress link which is (24) 0 <= phi_R = xi_1 - floor(xi_1/T)T < T. B.5 An invariance property Mercankosk Informational - Jan 2001 15 Virtual Wire - Analysis & Extensions July 2000 Adding equations (19) and (20) leads to (25) N_DS(t) + N_E(t) = floor(t/T) - floor((t-(d_1+d_p+D_xi))/T) = (floor(xi_1/T) - 1) + (floor(t/T) - floor((t-phi_R)/T)) = N_DS(xi_1) + N_E(xi_1) + (floor(t/T) - floor((t-phi_R)/T)) where we make use of (21) and (24). (25) indicates that the total number of packets for a periodic flow over a virtual wire remains constant to within one packet, once the egress link is initialized and the transmission starts. This is expected as packets flow in at the same rate that they flow out. This represents an invariance property of the total fill, including the packets in transit. Figure 7: Invariance property of the total fill As illustrated in Figure 7, the last term on the right hand side of (25) takes only the values 0 and 1. If we observe the term over the time, the term switches from 0 to 1 just after a packet is presented at the ingress. It stays at that value until the next packet is transmitted onto the egress link. Then it switches back to 0. Mercankosk Informational - Jan 2001 16 Virtual Wire - Analysis & Extensions July 2000 C. Continuity of flow In Appendix B, we assume sufficiently large buffers and initial total fill so that the data flow continuity could be maintained. In this section, we derive the necessary and sufficient conditions for data flow continuity. Continuity of data flow requires that every packet presented to the DS-domain at the ingress router at fixed intervals has to be transmitted onto the egress link after a fixed delay but again at the same fixed intervals, without being subject to packet loss or interruption during the life time of the flow. Packet loss occurs if a packet has to be discarded due to lack of storing capacity upon arrival at the egress border router. On the other hand, the flow is interrupted if a transmission cannot be started due to unavailability of a packet at the egress border buffer. In the latter case, the buffer is said to be underflowing. In what follows, we show that having a bounded queueing delay over the DS-domain is sufficient to have limits on the maximum fill level that the egress buffer could reach and the minimum fill level that it could fall. These limits can then be used to derive the necessary and sufficient conditions to avoid both overflows and underflows of the egress buffer, i.e. to maintain the continuity of flow. C.1 Egress buffer overflow As the fill level N_E(t) is a non-increasing function of t over the interval [tau_n,tau_(n+1)), it has its peak level at the start of the interval, when the nth packet arrives. Substituting tau_n for t in (19), we have (26) N_E(t) <= n - floor((t_n+d_n+d_p-(d_1+d_p+D_xi))/T) where we make use of (3). Again from the identity given in (1), we see that t_n/T is an integer. This leads to (27) N_E(t) <= - floor((d_n-d_1-D_xi)/T) <= ceiling((d_1+D_xi-d_n)/T) < (d_1+D_xi-d_n)/T + 1 where we make use of (12) and (13). An upper bound for N_E(t) can be found by considering a packet which experiences no queueing delay over the DS-domain and the case where the first packet experiences a queueing delay arbitrarily close to the maximum. That is, (28) N_E(t) < (D+D_xi)/T + 1 <= ceiling((D+D_xi)/T) where we make use of (14). Therefore, the maximum value N_E,max that the fill level N_E(t) of the egress buffer can reach satisfies the relation (29) N_E(t) <= N_E,max = ceiling((D+D_xi)/T). Mercankosk Informational - Jan 2001 17 Virtual Wire - Analysis & Extensions July 2000 So if we choose the size of the egress buffer B_E greater than or equal to N_E,max, then the egress buffer never overflows. Otherwise whenever the fill level N_E(t) reaches N_E,max, that is (30) N_E(t_max) = N_E,max > B_E the egress buffer overflows. This leads us to our first dimensioning rule (31) ceiling((D+D_xi)/T) <= B_E which is necessary and sufficient to avoid egress buffer overflow. C.2 Egress link starvation Since the fill level N_E(t) is a non-increasing function of t over the interval [tau_n,tau_(n+1)), it falls to its minimum level after the last, say the kth, packet is transmitted onto the egress link but before the (n+1)st packet arrives. Substituting tau_(n+1) for t in (19), we obtain (32) N_E(t) >= n - floor((t_(n+1)+d_(n+1)+d_p-(d_1+d_p+D_xi))/T) where we again make use of (3). Using the fact that t_(n+1)/T is an integer and the identity (1), we are led to (33) N_E(t) >= -1 - floor((d_(n+1)-d_1-D_xi)/T) >= ceiling((d_1+D_xi-d_(n+1))/T) - 1 >= (d_1+D_xi-d_(n+1))/T - 1 where we again make use of (12) and (13). Over all packet transfers, a lower bound for N_E(t) can be found this time by considering a packet which experiences a queueing delay arbitrarily close to the maximum and the case where the first packet experiences no queueing delay over the DS-domain. That is, (34) N_E(t) > (D_xi-D)/T - 1 >= floor((D_xi-D)/T) where we make use of (15). Therefore, the minimum value N_E,min that the fill level N_E(t) of the egress buffer can fall satisfies the relation (35) N_E(t) >= N_E,min = floor((D_xi-D)/T). Note that the minimum value N_E,min is determined at a time instant such that there will be at least one packet arrival ahead of the next scheduled transmission onto the egress link. So if we delay the transmission of the first packet onto the egress link longer than the maximum queueing delay possible over the DS-domain, then the fill level of the egress buffer never falls below zero. Otherwise, whenever circumstances occur for the fill level N_E(t) to fall down to N_E,min, that is Mercankosk Informational - Jan 2001 18 Virtual Wire - Analysis & Extensions July 2000 (36) N_E(t_min) = N_E,min < 0 the transmission is interrupted. This leads us to our second dimensioning rule (37) floor((D_xi-D)/T) >= 0 which is necessary and sufficient to avoid egress link starvation. Mercankosk Informational - Jan 2001 19 Virtual Wire - Analysis & Extensions July 2000 D. Superposition of periodic flows In this appendix, we outline a queueing model that forms the basis for quantifying the transfer delay bound over a DS-domain. We first describe the queueing model in terms of periodic micro flows to establish the relevance. We then summarize the studies in the literature to quantify the delay due to phase coincidences among micro flows and the delay due to packets within the same micro flow. We finally provide a set of numerical examples. D.1 Queueing model Consider M independent and identical EF-micro flows arriving at a DS-domain router such that all M are bound for the same output link. With the use of priority queueing, the output link capacity provides a well-defined minimum departure rate for EF-traffic. Let C denote the output link capacity. The transmission time delta of a size S packet onto the output link is given by S/C time units. Assuming that the micro flow behavior is strictly determined by (1), each micro flow presents one packet in every T time units to the DS- domain router. Accordingly, the aggregate is a superposition of periodic flows. Limiting the number M of micro flows such that M delta < T, we ensure that the aggregate input rate is less than the minimum departure rate. Consequently, at the output link of the DS- domain router, there is no issue of rate guarantees to individual micro flows. That is, whatever is presented to the DS-domain router, as specified by (1), will be transmitted onto the output link of the router in finite time with probability one, irrespective of the scheduling mechanism being deployed provided that the mechanism is work conserving. Figure 8: An illustration of an arrival pattern The aggregate behavior is completely specified by a set of arrival epochs in any interval of length T. Each set consists of one arrival per micro flow. An illustration of a possible arrival pattern is given in Figure 8. The worst case phasing occurs when all M arrival epochs coincide. However, it should be clear that the likelihood of such occurrence is very small. Nonetheless, phase coincidences give rise to an inevitable multiplexing delay. Assuming a deterministic scheduler, we note that during an interval of no activity change, there is no change in the delay for individual micro flows. An activity change means that either one micro flow drops out or another one joins in. In determining the likelihood of exceeding a specified delay bound, we can safely ignore the repetitive arrival patterns (which is the case over an interval of no activity change) and only consider distinct arrival patterns. In other words, we look at the percentage of arrival patterns that result in a delay exceeding a specified bound. Given the non-ergodic character of the behavior aggregate, we note that the measure of interest is the likelihood of exceeding a specified delay bound (ensemble average) rather than the percentage of time a specified delay bound is exceeded (time average). Mercankosk Informational - Jan 2001 20 Virtual Wire - Analysis & Extensions July 2000 D.2 Delay due to phase coincidences with other micro flows At a multiplexing stage, the delay due to phase coincidences has been studied in [5] and [6]. The method involves tagging a micro flow and considering an arbitrary arrival pattern of other micro flows for determining the probability of delay exceeding a specified value for the tagged micro flow. Given that each micro flow is strictly shaped to rate R, each packet arrives at a time when it is supposed to arrive. Consequently, FCFS scheduling becomes a natural choice for arbitration of EF micro flows. We note that any other sophisticated arbitration method would require per micro flow state information. Without loss of generality, a packet from the tagged micro flow is assumed to arrive at time t. The waiting time of the packet is then given by the unfinished work in the queue (referred to as nD/D/1 queue) at time t, due to M-1 packet arrivals, one per every other micro flow, which are distributed uniformly and independently over the interval [t-T,t) provided that M delta < T. In [5], the associated distribution is given as (38) Pr{ delay > D } = SUM D(j) over [j,ceiling(D/delta),M-1] where D(j) = A(j) C(M-1,j) B(j)^j (1-B(j))^(M-1-j) A(j) = (T+D-(M-1)delta)/(T+D-j delta) C(M-1,j) denotes M-1 choose j B(j) = (j delta - D) / T. The result given in (38) can be normalized, in a straight forward manner, to the transmission time delta. Consequently, the result is not specific to any network architecture. Furthermore, since the choice of tagged micro flow is arbitrary, the delay distribution applies to all micro flows. The issue of heterogeneous packet sizes is addressed elsewhere. It is also a common practice in the literature to normalize the delay to the period T which we have avoided. The normalization naturally takes place in deriving necessary and sufficient conditions given by (6) and (7). D.3 Delay due to packets from the same micro flow The study given in [6] provides a method of determining the delay distribution for an individual micro flow where the aggregate consists of micro flows with heterogeneous rates. Let M_i denote the number of type-i micro flows with period T_i. Then, we ensure that the aggregate input rate does not exceed the minimum departure rate by satisfying the condition SUM (M_i delta) / T_i < 1 where the summation is taken over the set of possible micro flow types. In determining the unfinished work in the queue at time t, the method makes the observation, as illustrated in Figure 9, that the number of packet arrivals from a given type of micro flow in an interval [t-s,t) of length s consists of a deterministic part and a random part. Note that in the former case, we only have the random part and that the intervals of size greater than T need not be considered. Mercankosk Informational - Jan 2001 21 Virtual Wire - Analysis & Extensions July 2000 Figure 9: The number of packet arrivals in [t-s,t) The study [6] develops tight upper bounds for delay distribution functions as the exact distributions are, in general, difficult, if not impossible, to determine. Fortunately though, the study [6] also concludes that the delay distribution for an arbitrary traffic mix is always upper bounded by that of an aggregate which consists of micro flows of the smallest rate only and totalling to the same intensity. In the case of homogeneous micro flows in a single multiplexing stage, it is always correct that the delay bound for an individual micro flow never exceeds its period T and in general is much smaller than the period by a couple of orders of magnitude. As a result, a packet cannot be further delayed due to packets from the micro flow that it belongs to. We note however that when the aggregate consists of micro flows with different rates, this is not necessarily correct. In particular, a packet from a high rate micro flow may find itself queued behind packets from the same micro flow. Despite this fact, the method that we refer above already incorporates the situation in its derivations. D.4 Numerical results Figure 10: Delay distribution for nD/D/1 queues, aggregate intensity 0.90C First, we consider multiplexing homogenous micro flows at an aggregate intensity of 0.90C. The number M of micro flows varies between 50 to 900. The corresponding delay distributions, obtained by using (38), are plotted in Figure 10. Clearly, as the number M increases higher delays are observed. But even for 900 micro flows, the probability of delay exceeding 62 delta time units is 10^-9. This value of delay is smaller than the worst case, which is 899 delta time units, by an order of magnitude. We note that the corresponding distribution for an (M+D)/D/1 system represents the limiting case as M->infinity. Next, we keep the aggregate traffic intensity at 0.90C but consider different aggregate traffic compositions. The upper bounds for delay distribution functions, obtained by using the formulation given in [6], are plotted in Figure 11 where each composition is labelled by a triple (M_1,M_2,M_3) and M_i indicates the number of type-i micro flows. In the first composition, we have few high rate flows. Then we replace some high rate flows with many low rate flows. This results in an increase in the delay experienced, but again to some moderate values. For example, in the third composition where we have 461 micro flows, the probability of delay exceeding 40 delta time units is 10^-9. The delay bound is again 10 times less than the worst case figure. Figure 11: Upper bound distribution functions, T_1=1000delta,T_2=33.33delta,T_3=6.67delta Mercankosk Informational - Jan 2001 22 Virtual Wire - Analysis & Extensions July 2000 Comparing these two sets of experiments, the following observations can be made. First, the upper bound for a distribution function yields the same result as (38) does, if there is only one type of micro flow. Second, any aggregate traffic composition would result in a better delay performance than the case where enough number of lowest rate micro flows making up the same aggregate intensity. This means that for a given fraction f, the number of possible minimum rate micro flows determines the limiting distribution for any aggregate traffic mix. Mercankosk Informational - Jan 2001 23 Virtual Wire - Analysis & Extensions July 2000 E. Queueing models with rest periods Queueing models with rest periods are used in the analysis of slotted queueing systems. This appendix provides relevant references for studying the delay due to non-EF traffic. In [7], the study shows that, for a slotted M/G/1 system, the waiting time is the sum of two independent random variables representing the waiting time when there are no rest periods and an additional delay distributed as the residual life of the rest period. In [5], the study provides not only the delay distribution as given in (38) when there is no slotting but also the delay distribution when the transmission is restricted to start at a slot boundary where each slot is of fixed length equal to delta. Since packet arrival epochs are assumed to be uniformly distributed over an interval of length T, the residual slot period until the start of the next transmission opportunity (following the arrival of a packet from the tagged micro flow) is uniformly distributed over (0,delta]. In [8], based on the results given in [5], we show that a slotted nD/D/1 queue exhibits the same property, given in [7], as a slotted M/G/1 system does. Mercankosk Informational - Jan 2001 24 Virtual Wire - Analysis & Extensions July 2000 F. A network of routers in tandem In this appendix, we consider two simulation scenarios. In the first scenario, we have a network of routers in tandem where micro flows with a common destination are incrementally introduced at each hop. In the second scenario, the same set of micro flows compete for access onto the same output link of a stand-alone router. The latter is equivalent to having infinite capacity links between the intermediate routers in tandem except at the last hop. An illustration of a network of routers in tandem is given in Figure 12. The network is comprised of a specified number of routers. The background flows in groups are attached to specified routers and join the aggregate flow. We focus on a single tagged or test flow and study its delay characteristics through the network. The test flow can be attached to any router and runs from the router it is attached (the ingress router) to the egress router. Figure 12: A network with 4 hops, all flows through the same egress router Figure 13: A stand-alone router With the use of infinite capacity links between the routers, the network of routers in tandem can be represented with a stand-alone router as illustrated in Figure 13. To facilitate the comparison, the flow arrival patterns used for the routers in tandem are exactly regenerated and presented to the stand-alone router. F.1 Traffic model In quantifying the delay due to phase coincidences, we maintain a periodic test flow. Once the activity of the test flow starts, it presents packets to the network continuously at the specified rate until the end of simulation run. In determining ensemble averages, we use a specified number of background micro flows with identical period. After an uniformly distributed transient delay, each background flow determines a random phase within its period to start its activity. After periodically generating a specified number packets at a given phase, the background flow stops for a geometrically distributed number of periods. At the end of the silence interval, it repeats its activity with a different phase. The use of Markovian background traffic can be considered as a limiting case. As a matter of fact, considering a background traffic comprised of M independent periodic flows each with period T, it is true that the number of arrivals over an interval (t-s,t] is distributed according to the binomial distribution where s<T. Consequently, in the limit as M->infinity while keeping the aggregate intensity fixed at fC=N delta/T, we obtain Poisson arrival process with parameter fC. In other words, the Markovian noise is Mercankosk Informational - Jan 2001 25 Virtual Wire - Analysis & Extensions July 2000 comprised of infinitely many periodic flows each with period infinity while the aggregate intensity is fC. Accordingly, the background traffic either is comprised of a specified number of periodic flows with randomly changing phases or is generated by a Markovian process of equivalent intensity. Although it is assumed that priority queueing is deployed at each router, the non-EF traffic is not considered. The ingress-to-egress total queuing delay experienced by each test packet is recorded to obtain a delay histogram. F.2 Numerical results In the first set of experiments, we use a test flow that periodically generates one packet in every 100 delta time units, i.e. with intensity 0.01C. In the background, we use 100 micro flows and 10 hops, i.e. 10 micro flows per hop. Each flow generates 20 packets with period 100 delta time units at a random phase. After each phase activity, a flow remains silent for geometrically distributed number of 100 delta time units with a mean value of 1.053. Accordingly, the intensity of background flows joining the aggregate at each hop is 0.095C. In the second set of experiments, the set of periodic flows at each router is replaced with a Poisson source of the same intensity. In both set of experiments, the location of the test flow is varied from nearest to the egress router to furthest. The associated delay distributions are plotted in Figure 14. Despite the fact that the aggregate intensity remains fixed, we observe that a test flow closer to the egress router experiences a better delay performance. We also note that, in the case of periodic background flows, after 5 hops the difference becomes unnoticeable. In the case of Markovian flows, the difference is marginal altogether. Figure 14: Delay distribution versus test flow position, aggregate intensity In Figure 15, the tail distribution of the delay experienced by the packets of test flow when located at both extremes are plotted against that of the packets of test flow when the aggregate flow is fed into a stand-alone router. The delay experienced through the routers in tandem is depicted by dashed lines whereas that of through the stand-alone router is depicted by solid lines. Figure 15: Delay distribution, tandem versus stand-alone, aggregate intensity As the curves indicate, when the test flow is located furthest from the egress router, the performance is inferior to that of stand- alone router. However, in practical terms, it is insignificant. The difference can be attributed to arrival pattern transformations. A pattern transformation results due to the possibility of packets from micro flows joining the aggregate at a nearer router overtaking packets from the test flow until the latter packets reach the router Mercankosk Informational - Jan 2001 26 Virtual Wire - Analysis & Extensions July 2000 in question. The simulation results show that the effect of such transformations are minimal. When the test flow is located nearest to the egress router, it is in the most advantageous position. As indicated earlier, the stand- alone router represents the situation of having infinite capacity links between the intermediate routers in tandem except at the last hop. A consequence of such a situation is that packets arriving at the intermediate routers appear at the last hop in no time. In the case of finite but relatively higher rate links, each packet appears at the last hop shortly following its arrival. Under the circumstances, the test flow located nearest loses its advantage but not beyond the delay due to originating phase coincidences. Mercankosk Informational - Jan 2001 27