--- 1/draft-ietf-ippm-multimetrics-09.txt 2009-04-23 06:12:09.000000000 +0200 +++ 2/draft-ietf-ippm-multimetrics-10.txt 2009-04-23 06:12:09.000000000 +0200 @@ -1,46 +1,55 @@ Network Working Group E. Stephan Internet-Draft France Telecom Intended status: Standards Track L. Liang -Expires: April 18, 2009 University of Surrey +Expires: October 24, 2009 University of Surrey A. Morton AT&T Labs - October 15, 2008 + April 22, 2009 IP Performance Metrics (IPPM) for spatial and multicast - draft-ietf-ippm-multimetrics-09 + draft-ietf-ippm-multimetrics-10 Status of this Memo - By submitting this Internet-Draft, each author represents that any - applicable patent or other IPR claims of which he or she is aware - have been or will be disclosed, and any of which he or she becomes - aware will be disclosed, in accordance with Section 6 of BCP 79. + This Internet-Draft is submitted to IETF in full conformance with the + provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet- Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. - This Internet-Draft will expire on April 18, 2009. + This Internet-Draft will expire on October 24, 2009. + +Copyright Notice + + Copyright (c) 2009 IETF Trust and the persons identified as the + document authors. All rights reserved. + + This document is subject to BCP 78 and the IETF Trust's Legal + Provisions Relating to IETF Documents in effect on the date of + publication of this document (http://trustee.ietf.org/license-info). + Please review these documents carefully, as they describe your rights + and restrictions with respect to this document. Abstract The IETF has standardized IP Performance Metrics (IPPM) for measuring end-to-end performance between two points. This memo defines two new categories of metrics that extend the coverage to multiple measurement points. It defines spatial metrics for measuring the performance of segments of a source to destination path, and metrics for measuring the performance between a source and many destinations in multiparty communications (e.g., a multicast tree). @@ -63,21 +72,20 @@ 8. One-to-group Sample Statistics . . . . . . . . . . . . . . . . 26 9. Measurement Methods: Scalability and Reporting . . . . . . . . 36 10. Manageability Considerations . . . . . . . . . . . . . . . . . 39 11. Security Considerations . . . . . . . . . . . . . . . . . . . 44 12. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 45 13. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 45 14. References . . . . . . . . . . . . . . . . . . . . . . . . . . 49 14.1. Normative References . . . . . . . . . . . . . . . . . . 49 14.2. Informative References . . . . . . . . . . . . . . . . . 50 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 50 - Intellectual Property and Copyright Statements . . . . . . . . . . 51 1. Introduction and Scope IETF has standardized IP Performance Metrics (IPPM) for measuring end-to-end performance between two points. 
This memo defines two new categories of metrics that extend the coverage to multiple measurement points. It defines spatial metrics for measuring the performance of segments of a source to destination path, and metrics for measuring the performance between a source and many destinations in multiparty communications (e.g., a multicast tree).

@@ -87,21 +95,21 @@

Spatial metrics measure the performance of each segment along a path. One-to-group metrics measure the performance for a group of users. These metrics are derived from one-way end-to-end metrics, all of which follow the IPPM framework [RFC2330].

This memo is organized as follows: Section 2 introduces new terms that extend the original IPPM framework [RFC2330]. Section 3 motivates each metric category and briefly introduces the new metrics. Sections 4 through 7 develop each category of metrics with definitions and statistics. Then the memo discusses the impact of
- the measurement methods on the scaleability and proposes an
+ the measurement methods on the scalability and proposes an
information model for reporting the measurements. Finally, the memo discusses security aspects related to measurement and registers the metrics in the IANA IP Performance Metrics Registry [RFC4148].

The scope of this memo is limited to metrics using a single source packet or stream, and observations of corresponding packets along the path (spatial), at one or more destinations (one-to-group), or both. Note that all the metrics defined herein are based on observations of packets dedicated to testing, a process which is called active measurement. Passive measurement (for example, a spatial metric

@@ -1184,42 +1192,42 @@

The one-to-group metrics defined above are directly achieved by collecting the relevant unicast one-way measurement results and by gathering them per group of receivers. They produce network performance information which guides engineers toward potential problems that may have occurred on any branch of a multicast routing tree. The results of these metrics are not directly usable to present the performance of a group because each result is made of a huge number of singletons which are difficult to read and analyze. As an
- example, delay are not comparable because the distance between
+ example, delays are not comparable because the distance between
receiver and sender differs. Furthermore, they do not capture the relative performance situation of a multiparty communication.

From the performance point of view, multiparty communication services require not only absolute performance information but also information on "relative performance". Relative performance means the difference between the absolute performance of all users. Directly using the one-way metrics cannot present the relative performance situation. However, if we use the variations of all users' one-way parameters, we can define new metrics to measure the difference in absolute performance and hence provide the threshold value of relative performance that a multiparty service might demand. A very good example of the high relative
- performance requirement is the online gaming. A very light
- difference in delay might result in failure in the game. We have to
- use multicast specific statistic metrics to define the relative delay
- required by online gaming. There are many other services, e.g.
- online biding, online stock market, etc., that require multicast
- metrics in order to evaluate the network against their requirements.
- Therefore, we can see the importance of new, multicast specific,
- statistic metrics to feed this need.
+ performance requirement is online gaming. A very small difference in
+ delay might result in failure in the game. We have to use multicast
+ specific statistic metrics to define the relative delay required by
+ online gaming. There are many other services, e.g. online bidding,
+ online stock trading, etc., that require multicast metrics in order
+ to evaluate the network against their requirements. Therefore, we
+ can see the importance of new, multicast specific, statistic metrics
+ to feed this need.

We might also use some one-to-group statistic concepts to present and report the group performance and relative performance, in order to save report transmission bandwidth. Statistics have been defined for One-way metrics in the corresponding RFCs. They provide the foundation for defining performance statistics. For instance, there are definitions for minimum and maximum One-way delay in [RFC2679]. However, there is a dramatic difference between the statistics for one-to-one communications and for one-to-many communications. The former only has statistics over the time dimension while the

@@ -1270,85 +1278,85 @@

rather than sending every one-way singleton it observed. As long as an appropriate time interval is chosen, appropriate statistics can represent the performance with a reasonable degree of accuracy. How to decide the time interval and how to bootstrap all points of interest and the reference point depend on applications. For instance, applications with a lower transmission rate can use a longer time interval, and ones with a higher transmission rate can use a shorter one. However, this is out of the scope of this memo.

Moreover, after knowing the statistics over the time dimension, one
- might want to know how this statistics distributed over the space
- dimension. For instance, a TV broadcast service provider had the
- performance Matrix M and calculated the One-way delay mean over the
- time dimension to obtain a delay Vector as {V1,V2,..., VN}. He then
- calculated the mean of all the elements in the Vector to see what
- level of delay he has served to all N users. This new delay mean
- gives information on how good the service has been delivered to a
- group of users during a sampling interval in terms of delay. It
- needs twice calculation to have this statistic over both time and
- space dimensions. We name this kind of statistics 2-level statistics
- to distinct with those 1-level statistics calculated over either
- space or time dimension. It can be easily proven that no matter over
- which dimension a 2-level statistic is calculated first, the results
- are the same. I.e. one can calculate the 2-level delay mean using
- the Matrix M by having the 1-level delay mean over the time dimension
- first and then calculate the mean of the obtained vector to find out
- the 2-level delay mean. Or, he can do the 1-level statistic
- calculation over the space dimension first and then have the 2-level
- delay mean. Both two results will be exactly the same. Therefore,
- when defining a 2-level statistic there is no need to specify the
- order in which the calculation is executed.
+ might want to know how these statistics are distributed over the
+ space dimension. For instance, a TV broadcast service provider had
+ the performance Matrix M and calculated the One-way delay mean over
+ the time dimension to obtain a delay Vector as {V1,V2,..., VN}.
+ He then calculated the mean of all the elements in the Vector to see
+ what level of delay he has served to all N users. This new delay
+ mean gives information on how well the service has been delivered to
+ a group of users during a sampling interval in terms of delay. It
+ requires two levels of calculation to obtain this statistic over both
+ the time and the space dimensions. This kind of statistic is
+ referred to as a 2-level statistic, to distinguish it from 1-level
+ statistics calculated over either the space or the time dimension.
+ It can easily be proven that no matter over which dimension a 2-level
+ statistic is calculated first, the results are the same. That is,
+ one can calculate the 2-level delay mean using the Matrix M by taking
+ the 1-level delay mean over the time dimension first and then
+ calculating the mean of the obtained vector. Or, one can do the
+ 1-level statistic calculation over the space dimension first and then
+ obtain the 2-level delay mean. Both results will be exactly the
+ same. Therefore, when defining a 2-level statistic, there is no need
+ to specify the order in which the calculation is executed.
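
The order independence of a 2-level statistic can be illustrated with a small, non-normative sketch (Python; the matrix values below are hypothetical delays in seconds and are not taken from any measurement):

    # Non-normative illustration: 2-level delay mean over a complete
    # group Matrix M.  Rows are receivers (space dimension), columns
    # are sampling intervals (time dimension).
    M = [
        [0.031, 0.029, 0.034, 0.030],   # receiver 1 one-way delays
        [0.052, 0.055, 0.051, 0.050],   # receiver 2
        [0.044, 0.040, 0.047, 0.041],   # receiver 3
    ]

    def mean(values):
        return sum(values) / len(values)

    # Time first: 1-level mean per receiver, then mean over the group.
    delay_vector = [mean(row) for row in M]       # {V1, V2, ..., VN}
    time_then_space = mean(delay_vector)

    # Space first: 1-level mean per sampling interval, then over time.
    space_then_time = mean([mean(col) for col in zip(*M)])

    # Both orders yield the same 2-level delay mean.
    assert abs(time_then_space - space_then_time) < 1e-12

Note that this equality is only guaranteed when the Matrix is complete; Section 8.1 discusses how lost packets complicate the construction of such a matrix.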
Many statistics can be defined for the proposed one-to-group metrics over either the space dimension or the time dimension or both. This memo treats the case where a stream of packets from the Source results in a sample at each of the Receivers in the Group, and these samples are each summarized with the usual statistics employed in one-to-one communication. New statistic definitions are presented, which summarize the one-to-one statistics over all the Receivers in the Group.

8.1. Discussion on the Impact of packet loss on statistics

Packet loss does have effects on one-way metrics and their
- statistics. For example, the lost packet can result in an infinite
+ statistics. For example, a lost packet can result in an infinite
one-way delay. It is easy to handle the problem by simply ignoring the infinite value in the metrics and in the calculation of the
- corresponding statistics. However, the packet loss has so strong
+ corresponding statistics. However, the packet loss has such a strong
impact on the statistics calculation for the one-to-group metrics that it cannot be solved by the same method used for one-way metrics. This is due to the complexity of building a matrix, which is needed for the calculation of the statistics proposed in this memo. The situation is that measurement results obtained by different end users might have different packet loss patterns. For example, for User1, packet A was observed to be lost, while for User2, packet A was successfully received but packet B was lost. If the method to overcome the packet loss for one-way metrics is applied, the two singleton sets reported by User1 and User2 will be different in terms of the transmitted packets. Moreover, if User1 and User2 have different numbers of lost packets, the sizes of the results will be different. Therefore, for the centralized calculation, the reference point will not be able to use these two results to build up the group
- Matrix and can not calculate the statistics. In an extreme
- situation, no single packet arrives all users in the measurement and
- the Matrix will be empty. One of the possible solutions is to
- replace the infinite/undefined delay value by the average of the two
- adjacent values. For example, if the result reported by user1 is {
- R1dT1 R1dT2 R1dT3 ... R1dTK-1 UNDEF R1dTK+1... R1DM } where "UNDEF"
- is an undefined value, the reference point can replace it by R1dTK =
- {(R1dTK-1)+( R1dTK+1)}/2. Therefore, this result can be used to
- build up the group Matrix with an estimated value R1dTK. There are
- other possible solutions such as using the overall mean of the whole
- result to replace the infinite/undefined value, and so on. However
- this is out of the scope of this memo.
+ Matrix and cannot calculate the statistics. The extreme situation is
+ the case where no packets arrive at any user. One of the possible
+ solutions is to replace the infinite/undefined delay value by the
+ average of the two adjacent values. For example, if the result
+ reported by user1 is { R1dT1 R1dT2 R1dT3 ... R1dTK-1 UNDEF
+ R1dTK+1... R1DM } where "UNDEF" is an undefined value, the reference
+ point can replace it by R1dTK = {(R1dTK-1)+(R1dTK+1)}/2. Therefore,
+ this result can be used to build up the group Matrix with an
+ estimated value R1dTK. There are other possible solutions, such as
+ using the overall mean of the whole result to replace the infinite/
+ undefined value, and so on. However, this is out of the scope of
+ this memo.
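
A minimal, non-normative sketch of the neighbour-averaging approach above (Python; UNDEF is represented by None, the delay values are hypothetical, and the handling of losses at the edges of the sample or of consecutive losses is left open):

    # Non-normative sketch: replace an isolated undefined delay singleton
    # with the average of the two adjacent values, as suggested above.
    UNDEF = None

    def fill_undefined(delays):
        """Return a copy of 'delays' with isolated UNDEF entries replaced
        by the mean of their two defined neighbours."""
        filled = list(delays)
        for k in range(len(filled)):
            if filled[k] is UNDEF:
                has_prev = k > 0 and filled[k - 1] is not UNDEF
                has_next = k + 1 < len(filled) and filled[k + 1] is not UNDEF
                if has_prev and has_next:
                    filled[k] = (filled[k - 1] + filled[k + 1]) / 2
                # Losses at the edges or consecutive losses are left as
                # UNDEF here; another estimator (e.g. the overall mean)
                # could be applied instead.
        return filled

    # Example: R1dT3 was lost and is estimated from R1dT2 and R1dT4.
    r1 = [0.031, 0.029, UNDEF, 0.030, 0.032]
    print(fill_undefined(r1))   # UNDEF replaced by (0.029 + 0.030) / 2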
For the distributed calculation, the reported statistics might have different "weight" to present the group performance, which is especially true for delay- and ipdv-related metrics. For example, User1 calculates the Type-P-Finite-One-way-Delay-Mean R1DM as shown in Figure 8 without any packet loss, and User2 calculates the R2DM with N-2 packets lost. The R1DM and R2DM should not be treated with equal weight because R2DM was calculated based on only 2 delay values in the whole sample interval. One possible solution is to use a weight factor to mark every statistic value sent by users and use

@@ -1663,49 +1671,54 @@

distributed statistic calculation method. The sample should include all metrics parameters, the values and the corresponding sequence numbers. The transmission of the whole sample can cost much more bandwidth than the transmission of the statistics, which should include all statistic parameters specified by policies and the additional information about the whole sample, such as the size of the sample, the group address, the address of the point of interest, the ID of the sample session, and so on. Clearly, the centralized calculation method can require much more bandwidth than the distributed calculation method when the sample size is big. This is
- especially true when the measurement has huge number of the points of
- interest. It can lead to a scalability issue at the reference point
- by over load the network resources. The distributed calculation
- method can save much more bandwidth and release the pressure of the
- scalability issue at the reference point side. However, it can
- result in the lack of information because not all measured singletons
- are obtained for building up the group matrix. The performance over
- time can be hidden from the analysis. For example, the loss pattern
- can be missed by simply accepting the loss ratio as well as the delay
- pattern. This tradeoff between the bandwidth consuming and the
- information acquiring has to be taken into account when design the
- measurement campaign to optimize the measurement results delivery.
- The possible solution could be to transit the statistic parameters to
+ especially true when the measurement has a very large number of
+ points of interest. It can lead to a scalability issue at the
+ reference point by overloading the network resources.
+
+ The distributed calculation method can save much more bandwidth and
+ mitigate issues arising from scalability at the reference point side.
+
+ However, it may result in a loss of information. Since not all
+ measured singletons are available for building up the group matrix,
+ the real performance over time can be hidden from the result. For
+ example, the loss pattern can be missed by simply accepting the loss
+ ratio. This tradeoff between bandwidth consumption and information
+ acquisition has to be taken into account when designing the
+ measurement approach.
+
+ One possible solution could be to transmit the statistic parameters to
the reference point first to obtain the general information of the
- group performance. If the detail results are required, the reference
+ group performance. If detailed results are required, the reference
point should send the requests to the points of interest, which could be particular ones or the whole group. This procedure can happen during off-peak time and can be well scheduled to avoid deliveries from too many points of interest at the same time. Compression techniques can also be used to minimize the bandwidth required by the transmission.
+ This reporting could rely on a measurement protocol to convey the measurement
results. However, this is out of the scope of this memo.

9.2. Measurement

To prevent any bias in the result, the configuration of a one-to-many
- measure must take in consideration that implicitly more packets will
- to be routed than send and selects a test packets rate that will not
- impact the network performance.
+ measure must take into consideration that intrinsically more packets
+ will have to be routed than sent (copies of a sent packet are
+ expected to arrive at many destination points) and select a test
+ packet rate that will not impact the network performance.

9.3. Effect of Time and Space Aggregation Order on Stats

This section presents the impact of the aggregation order on the scalability of the reporting and of the computation. It makes the hypothesis that receivers are not co-located and that results are gathered at a point of reference for further usage.

Multimetrics samples are represented in a matrix as illustrated below

@@ -1722,66 +1735,66 @@

      n   RnS1 RnS2 RnS3 ... RnSk
          /
       S1M  S2M  S3M  ...  SnM     Stats over space
       \------------- ------------/
                    \/
          Stat over space and time

      Figure 13: Impact of space aggregation on multimetrics Stat

- 2 methods are available to compute statistics on a matrix:
+ Two methods are available to compute statistics on a matrix:

o Method 1: The statistic metric is computed over time and then over space;

o Method 2: The statistic metric is computed over space and then over time.

These two methods differ only in the order of the aggregation. The order does not impact the computation resources required. It does not change the value of the result. However, it severely impacts the minimal volume of data to report:
-
o Method 1: Each point of interest periodically computes statistics over time to lower the volume of data to report. They are
- reported to the reference point for computing the stat over space.
- This volume no longer depends on the number of samples. It is
- only proportional to the computation period;
+ reported to the reference point for subsequent computations
+ over the spatial dimension. This volume no longer depends on the
+ number of samples. It is only proportional to the computation
+ period;

o Method 2: The volume of data to report is proportional to the number of samples. Each sample, RiSi, must be reported to the reference point for computing the statistic over space and the statistic over time. The volume increases with the number of samples. It is proportional to the number of test packets;
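
The difference in reporting volume between the two methods can be illustrated with a rough, non-normative sketch (Python; the group size, test packet rate, duration, and aggregation period are hypothetical):

    # Non-normative sketch: compare the minimal number of result records
    # to report under Method 1 (one statistic per aggregation period)
    # and Method 2 (every singleton).  All parameters are hypothetical.
    receivers = 1000          # points of interest in the group
    packet_rate = 10          # test packets per second
    duration = 3600           # measurement duration, in seconds
    aggregation_period = 60   # computation period for Method 1, in seconds

    # Method 1: one record per receiver and per aggregation period.
    records_method_1 = receivers * (duration // aggregation_period)

    # Method 2: one record per receiver and per test packet.
    records_method_2 = receivers * packet_rate * duration

    print(records_method_1)   # 60000
    print(records_method_2)   # 36000000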
Method 2 has severe drawbacks in terms of security and dimensioning:

o Increasing the rate of the test packets may result in a Denial of Service toward the points of reference;

o The dimensioning of a measurement system is quite impossible to validate because any increase of the rate of the test packets will increase the bandwidth required to collect the raw results.

The computation period over time (commonly named the aggregation period) provides the reporting side with control over various collection aspects such as bandwidth, computation, and storage capacities. So this draft defines metrics based on Method 1.

9.3.1. Impact on spatial statistics

- 2 methods are available to compute spatial statistics:
+ Two methods are available to compute spatial statistics:

o Method 1: spatial segment metrics and statistics are preferably
- computed over time by each points of interest;
+ computed over time for each point of interest;

o Method 2: Vector metrics are intrinsically instantaneous space
- metrics which must be reported using method2 whenever
+ metrics which must be reported using Method 2 whenever
instantaneous metrics information is needed.

9.3.2. Impact on one-to-group statistics

- 2 methods are available to compute group statistics:
+ Two methods are available to compute group statistics:

o Method 1: Figure 5 and Figure 8 illustrate the method chosen: the one-to-one statistic is computed per interval of time before the computation of the mean over the group of receivers;

o Method 2: Figure 13 presents the second one, where the metric is computed over space and then over time.

10. Manageability Considerations

Usually, IPPM WG documents define the reporting of each metric within its definition. This document defines the reporting of all the metrics

@@ -1842,33 +1855,35 @@

10.2. Reporting One-to-group metric

All reporting rules described in [RFC2679] and [RFC2680] apply to the corresponding One-to-group metrics. Following are specific parameters that should be reported.

10.2.1. Path

As suggested by [RFC2679] and [RFC2680], the path traversed by the packet SHOULD be reported, if possible. For One-to-group
- metrics, there is a path tree SHOULD be reported rather than A path.
- This is even more impractical. If, by anyway, partial information is
- available to report, it might not be as valuable as it is in the one-
- to-one case because the incomplete path might be difficult to
- identify its position in the path tree. For example, how many points
- of interest are reached by the packet travelled through this
- incomplete path?
+ metrics, the path tree between the source and the destinations, or
+ the set of paths between the source and each destination, SHOULD be
+ reported.
+
+ The path tree might not be as valuable as individual paths because an
+ incomplete path might be difficult to identify in the path tree. For
+ example, how many points of interest are reached by a packet
+ travelling along an incomplete path?
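
One possible, non-normative way to represent a reported path tree and to answer the question above is sketched below (Python; the node names are hypothetical):

    # Non-normative sketch: represent the multicast path tree as a
    # mapping from each node to its children, then list the points of
    # interest (leaves) reachable below a given node of a partial path.
    path_tree = {
        "Src": ["R1"],
        "R1":  ["R2", "R3"],
        "R2":  ["Recv1", "Recv2"],
        "R3":  ["Recv3"],
    }

    def points_of_interest_below(tree, node):
        children = tree.get(node, [])
        if not children:
            return [node]                # a leaf is a point of interest
        leaves = []
        for child in children:
            leaves.extend(points_of_interest_below(tree, child))
        return leaves

    # An incomplete path Src -> R1 -> R2 still identifies Recv1 and
    # Recv2 as the points of interest potentially reached through it.
    print(points_of_interest_below(path_tree, "R2"))   # ['Recv1', 'Recv2']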
10.2.2. Group size

The group size should be reported as one of the critical management
- parameters. Unlike the spatial metrics, there is no need of order of
- points of interests.
+ parameters. One-to-group metrics, unlike spatial metrics, do not
+ require an ordering of the points of interest because group members
+ receive the packets in parallel.

10.2.3. Timestamping bias

It is the same as described in Section 10.1.3.

10.2.4. Reporting One-to-group One-way Delay

It is the same as described in Section 10.1.4.

10.2.5. Measurement method

@@ -1882,30 +1897,30 @@

IANA assigns each metric defined by the IPPM WG a unique identifier, as per [RFC4148], in the IANA-IPPM-METRICS-REGISTRY-MIB.

10.4. Information model

This section presents the elements of information and the usage of the information reported for network performance analysis. It is out of the scope of this section to define how the information is reported.

- The information model is build with pieces of information introduced
+ The information model is built with pieces of information introduced
and explained in the one-way delay definitions [RFC2679], in the packet loss definitions [RFC2680], and in the IPDV definitions of [RFC3393] and [RFC3432]. It includes not only information given by the "Reporting the metric" sections but also by the sections "Methodology" and "Errors and
- Uncertainties" sections.
+ Uncertainties".

Following are the elements of information taken from end-to-end
- definitions referred in this memo and from spatial and multicast
- metrics it defines:
+ metric definitions referred to in this memo and from the spatial and
+ multicast metrics it defines:

o Packet_type, the Type-P of test packets (Type-P);

o Packet_length, a packet length in bits (L);

o Src_host, the IP address of the sender;

o Dst_host, the IP address of the receiver;

o Hosts_serie: