--- 1/draft-ietf-ippm-testplan-rfc2679-00.txt 2012-03-11 19:13:57.422671718 +0100 +++ 2/draft-ietf-ippm-testplan-rfc2679-01.txt 2012-03-11 19:13:57.470671586 +0100 @@ -1,24 +1,23 @@ Network Working Group L. Ciavattone Internet-Draft AT&T Labs Intended status: Informational R. Geib -Expires: April 23, 2012 Deutsche Telekom +Expires: September 11, 2012 Deutsche Telekom A. Morton AT&T Labs M. Wieser - University of Applied Sciences - Darmstadt - October 21, 2011 + Technical University Darmstadt + March 10, 2012 Test Plan and Results for Advancing RFC 2679 on the Standards Track - draft-ietf-ippm-testplan-rfc2679-00 + draft-ietf-ippm-testplan-rfc2679-01 Abstract This memo proposes to advance a performance metric RFC along the standards track, specifically RFC 2679 on One-way Delay Metrics. Observing that the metric definitions themselves should be the primary focus rather than the implementations of metrics, this memo describes the test procedures to evaluate specific metric requirement clauses to determine if the requirement has been interpreted and implemented as intended. Two completely independent implementations @@ -38,25 +37,25 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on April 23, 2012. + This Internet-Draft will expire on September 11, 2012. Copyright Notice - Copyright (c) 2011 IETF Trust and the persons identified as the + Copyright (c) 2012 IETF Trust and the persons identified as the document authors. All rights reserved. 
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as @@ -70,53 +69,53 @@ Without obtaining an adequate license from the person(s) controlling the copyright in such materials, this document may not be modified outside the IETF Standards Process, and derivative works of it may not be created outside the IETF Standards Process, except to format it for publication as an RFC or to translate it into languages other than English. Table of Contents 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . 4 - 1.1. RFC 2679 Coverage . . . . . . . . . . . . . . . . . . . . 5 2. A Definition-centric metric advancement process . . . . . . . 5 3. Test configuration . . . . . . . . . . . . . . . . . . . . . . 6 4. Error Calibration, RFC 2679 . . . . . . . . . . . . . . . . . 10 4.1. NetProbe Error and Type-P . . . . . . . . . . . . . . . . 11 4.2. Perfas Error and Type-P . . . . . . . . . . . . . . . . . 13 5. Pre-determined Limits on Equivalence . . . . . . . . . . . . . 14 6. Tests to evaluate RFC 2679 Specifications . . . . . . . . . . 14 6.1. One-way Delay, ADK Sample Comparison - Same & Cross Implementation . . . . . . . . . . . . . . . . . . . . . . 15 6.1.1. NetProbe Same-implementation results . . . . . . . . . 16 6.1.2. Perfas Same-implementation results . . . . . . . . . . 17 6.1.3. One-way Delay, Cross-Implementation ADK Comparison . . 18 6.1.4. Conclusions on the ADK Results for One-way Delay . . . 18 - 6.2. One-way Delay, Loss threshold, RFC 2679 . . . . . . . . . 19 - 6.2.1. NetProbe results for Loss Threshold . . . . . . . . 
. 20 - 6.2.2. Perfas Results for Loss Threshold . . . . . . . . . . 20 - 6.2.3. Conclusions for Loss Threshold . . . . . . . . . . . . 20 - 6.3. One-way Delay, First-bit to Last bit, RFC 2679 . . . . . . 20 - 6.3.1. NetProbe and Perfas Results for Serialization . . . . 21 - 6.3.2. Conclusions for Serialization . . . . . . . . . . . . 22 - 6.4. One-way Delay, Difference Sample Metric (Lab) . . . . . . 22 - 6.4.1. NetProbe results for Differential Delay . . . . . . . 23 - 6.4.2. Perfas results for Differential Delay . . . . . . . . 24 - 6.4.3. Conclusions for Differential Delay . . . . . . . . . . 24 - 6.5. Implementation of Statistics for One-way Delay . . . . . . 24 - 7. Security Considerations . . . . . . . . . . . . . . . . . . . 25 - 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25 - 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 25 - 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 25 - 10.1. Normative References . . . . . . . . . . . . . . . . . . . 25 - 10.2. Informative References . . . . . . . . . . . . . . . . . . 26 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 26 + 6.1.5. Additional Investigations . . . . . . . . . . . . . . 19 + 6.2. One-way Delay, Loss threshold, RFC 2679 . . . . . . . . . 22 + 6.2.1. NetProbe results for Loss Threshold . . . . . . . . . 23 + 6.2.2. Perfas Results for Loss Threshold . . . . . . . . . . 23 + 6.2.3. Conclusions for Loss Threshold . . . . . . . . . . . . 23 + 6.3. One-way Delay, First-bit to Last bit, RFC 2679 . . . . . . 24 + 6.3.1. NetProbe and Perfas Results for Serialization . . . . 24 + 6.3.2. Conclusions for Serialization . . . . . . . . . . . . 25 + 6.4. One-way Delay, Difference Sample Metric (Lab) . . . . . . 26 + 6.4.1. NetProbe results for Differential Delay . . . . . . . 26 + 6.4.2. Perfas results for Differential Delay . . . . . . . . 27 + 6.4.3. Conclusions for Differential Delay . . . . . . . . . . 27 + 6.5. 
Implementation of Statistics for One-way Delay . . . . . . 27 + 7. Security Considerations . . . . . . . . . . . . . . . . . . . 28 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 28 + 9. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 28 + 10. References . . . . . . . . . . . . . . . . . . . . . . . . . . 29 + 10.1. Normative References . . . . . . . . . . . . . . . . . . . 29 + 10.2. Informative References . . . . . . . . . . . . . . . . . . 30 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 30 1. Introduction The IETF (IP Performance Metrics working group, IPPM) has considered how to advance their metrics along the standards track since 2001, with the initial publication of Bradner/Paxson/Mankin's memo [ref to work in progress, draft-bradner-metricstest-]. The original proposal was to compare the results of implementations of the metrics, because the usual procedures for advancing protocols did not appear to apply. It was found to be difficult to achieve consensus on exactly how to @@ -125,21 +124,21 @@ to keep the network paths equal, and because considerable variation was allowed in the parameters (and therefore implementation) of each metric. Flexibility in metric definitions, essential for customization and broad appeal, made the comparison task quite difficult. A renewed work effort sought to investigate ways in which the measurement variability could be reduced and thereby simplify the problem of comparison for equivalence. - There is *preliminary* consensus [I-D.ietf-ippm-metrictest] that the + There is consensus represented in [I-D.ietf-ippm-metrictest] that the metric definitions should be the primary focus of evaluation rather than the implementations of metrics, and equivalent results are deemed to be evidence that the metric specifications are clear and unambiguous. This is the metric specification equivalent of protocol interoperability. 
The advancement process either produces confidence that the metric definitions and supporting material are clearly worded and unambiguous, OR, identifies ways in which the metric definitions should be revised to achieve clarity. The process should also permit identification of options that were @@ -162,26 +161,20 @@ Another aspect of the metric RFC advancement process is the requirement to document the work and results. The procedures of [RFC2026] are expanded in [RFC5657], including sample implementation and interoperability reports. This memo follows the template in [I-D.morton-ippm-advance-metrics] for the report that accompanies the protocol action request submitted to the Area Director, including description of the test set-up, procedures, results for each implementation and conclusions. -1.1. RFC 2679 Coverage - - This plan, in it's first draft version, does not cover all critical - requirements and sections of [RFC2679]. Material will be added as it - is "discovered" (not all requirements use requirements language). - 2. A Definition-centric metric advancement process The process described in Section 3.5 of [I-D.ietf-ippm-metrictest] takes as a first principle that the metric definitions, embodied in the text of the RFCs, are the objects that require evaluation and possible revision in order to advance to the next step on the standards track. IF two implementations do not measure an equivalent singleton or sample, or produce an equivalent statistic, @@ -330,21 +323,21 @@ + Src, the IP address of a host (12.3.167.16 or 193.159.144.8) + Dst, the IP address of a host (193.159.144.8 or 12.3.167.16) + T0, a time + Tf, a time + lambda, a rate in reciprocal seconds - + Thresh, a maximum waiting time in seconds (see Section 3.82 of + + Thresh, a maximum waiting time in seconds (see Section 3.8.2 of [RFC2679]) And (Section 4.3.
[RFC2679]) Metric Units: A sequence of pairs; the elements of each pair are: + T, a time, and + dT, either a real number or an undefined number of seconds. The values of T in the sequence are monotonic increasing. Note that @@ -483,26 +476,26 @@ 3rd Qu.:127.0 3rd Qu.: 88.00 3rd Qu.: 74.00 Max. :205.0 Max. :177.00 Max. :163.00 > NetProbe Calibration with Cross-Connect Cable, one-way delay values in microseconds (us) The median or systematic error can be as high as 110 us, and the range of the random error is also on the order of 116 us for all streams. - Also, anticipating the Anderson-Darling K-sample (ADK) comparisons to - follow, we corrected the CAL2 values for the difference between means - between CAL2 and CAL3 (as specified in [I-D.ietf-ippm-metrictest]), - and found strong support for the (Null Hypothesis that) the samples - are from the same distribution (resolution of 1 us and alpha equal - 0.05 and 0.01) + Also, anticipating the Anderson-Darling K-sample (ADK) [ADK] + comparisons to follow, we corrected the CAL2 values for the + difference between means between CAL2 and CAL3 (as specified in + [I-D.ietf-ippm-metrictest]), and found strong support for the (Null + Hypothesis that) the samples are from the same distribution + (resolution of 1 us and alpha equal 0.05 and 0.01) > XD4CVCAL2 <- XD4CAL$CAL2 - (mean(XD4CAL$CAL2)-mean(XD4CAL$CAL3)) > boxplot(XD4CVCAL2,XD4CAL$CAL3) > XD4CV2_ADK <- adk.test(XD4CVCAL2, XD4CAL$CAL3) > XD4CV2_ADK Anderson-Darling k-sample test. Number of samples: 2 Sample sizes: 300 300 Total number of values: 600 Number of unique values: 97 @@ -511,20 +504,21 @@ Standard deviation of Anderson Darling Criterion: 0.75896 T = (Anderson Darling Criterion - mean)/sigma Null Hypothesis: All samples come from a common population. t.obs P-value extrapolation not adj. for ties 0.71734 0.17042 0 adj. for ties -0.39553 0.44589 1 > + using [Rtool] and [Radk]. 4.2. 
Perfas Error and Type-P Perfas+ is configured to use GPS synchronisation and uses NTP synchronisation as a fall-back or default. GPS synchronisation worked throughout this test with the exception of the calibration stated here (one implementation was NTP synchronised only). The time stamp accuracy typically is 0.1 ms. The resolution of the results reported by Perfas+ is 1us (us = @@ -718,21 +712,21 @@ | | | | | | p4 |-0.81 (0.57) |-0.13 (0.37) | 1.36 (0.09) | | | | | | +------------+-------------+-------------+-------------+ Perfas ADK Results for same-implementation 6.1.3. One-way Delay, Cross-Implementation ADK Comparison The cross-implementation results are compared using a combined ADK - analysis [ref], where all NetProbe results are compared with all + analysis [Radk], where all NetProbe results are compared with all Perfas results after testing that the combined same-implementation results pass the ADK criterion. When 4 (same) samples are compared, the ADK criterion for 0.95 confidence is 1.915, and when all 8 (cross) samples are compared it is 1.85. Combination of Anderson-Darling K-Sample Tests. Sample sizes within each data set: @@ -773,20 +767,139 @@ from NetProbe or Perfas proved to be different from the others in paired comparisons (even same comparisons). When the outlier stream was removed from the comparison, the remaining streams passed the combined ADK criterion. Also, the application of correction factors resulted in higher comparison success. We conclude that the two implementations are capable of producing equivalent one-way delay distributions based on their interpretation of [RFC2679]. +6.1.5. Additional Investigations + + On the final day of testing, we performed a series of measurements to + evaluate the amount of emulated delay variation necessary to achieve + successful ADK comparisons.
The need for Correction factors (as
+   permitted by Section 5) and the size of the measurement sample
+   (obtained as sub-sets of the complete measurement sample) were also
+   evaluated.
+
+   The common parameters used for tests in this section are:
+
+   o  IP header + payload = 64 octets
+
+   o  Periodic sampling at 1 packet per second
+
+   o  Test duration = 300 seconds at each delay variation setting, for a
+      total of 1200 seconds (May 2, 2011 at 1720 UTC)
+
+   The netem emulator was set for 100ms average delay, with (emulated)
+   uniform delay variation of:
+
+   o  +/-7.5 ms
+
+   o  +/-5.0 ms
+
+   o  +/-2.5 ms
+
+   o  0 ms
+
+   In this experiment, the netem emulator was configured to operate
+   independently on each VLAN and thus the emulator itself is a
+   potential source of error when comparing streams that traverse the
+   test path in different directions.
+
+   In the result analysis of this section:
+
+   o  All comparisons used 1 microsecond resolution.
+
+   o  Correction Factors *were* applied as noted (under column heading
+      "mean adj").  The difference between each sample mean and the
+      lowest mean of the NetProbe or Perfas stream samples was
+      subtracted from all values in the sample.  ("raw" indicates no
+      correction factors were used.)
+
+   o  The 0.95 confidence factor (1.960 for paired stream comparison)
+      was used.
+
+   When 8 (cross) samples are compared, the ADK criterion for 0.95
+   confidence is 1.85.  The Combined ADK test statistic ("TC observed")
+   must be less than 1.85 to accept the Null Hypothesis (all samples in
+   the data set are from a common distribution).
+
+012345678901234567890123456789012345678901234567890123456789012345678901
+Emulated Delay                  Sub-Sample size
+Variation 0ms
+adk.combined (all)      300 values              75 values
+Adj. for ties           raw         mean adj    raw         mean adj
+TC observed             226.6563    67.51559    54.01359    21.56513
+P-value                 0           0           0           0
+Mean std dev (all),us   719                     635
+Mean diff of means,us   649         0           606         0
+
+Variation +/- 2.5ms
+adk.combined (all)      300 values              75 values
+Adj. for ties           raw         mean adj    raw         mean adj
+TC observed             14.50436    -1.60196    3.15935     -1.72104
+P-value                 0           0.873       0.00799     0.89038
+Mean std dev (all),us   1655                    1702
+Mean diff of means,us   471         0           513         0
+
+Variation +/- 5ms
+adk.combined (all)      300 values              75 values
+Adj. for ties           raw         mean adj    raw         mean adj
+TC observed             8.29921     -1.28927    0.37878     -1.81881
+P-value                 0           0.81601     0.29984     0.90305
+Mean std dev (all),us   3023                    2991
+Mean diff of means,us   582         0           513         0
+
+Variation +/- 7.5ms
+adk.combined (all)      300 values              75 values
+Adj. for ties           raw         mean adj    raw         mean adj
+TC observed             2.53759     -0.72985    0.29241     -1.15840
+P-value                 0.01950     0.66942     0.32585     0.78686
+Mean std dev (all),us   4449                    4506
+Mean diff of means,us   426         0           856         0
+
+   From the table above, we conclude the following:
+
+   1.  None of the raw or mean adjusted results pass the ADK criterion
+       with 0 ms emulated delay variation.  Use of the 75 value sub-
+       sample yielded the same conclusion.  (We note the same results
+       when comparing same implementation samples for both NetProbe and
+       Perfas.)
+
+   2.  When the smallest emulated delay variation was inserted
+       (+/-2.5ms), the mean adjusted samples pass the ADK criterion and
+       the high P-value supports the result.  The raw results do not
+       pass.
+
+   3.  At higher values of emulated delay variation (+/-5.0ms and
+       +/-7.5ms), again the mean adjusted values pass ADK.  We also see
+       that the 75-value sub-sample passed the ADK in both raw and mean
+       adjusted cases.  This indicates that sample size may have played
+       a role in our results, as noted in the Appendix of [RFC2680] for
+       Goodness-of-Fit testing.
+
+   We note that 150 value sub-samples were also evaluated, with ADK
+   conclusions that followed the results for 300 values.  Also, same-
+   implementation analysis was conducted with results similar to the
+   above, except that more of the "raw" or uncorrected samples passed
+   the ADK criterion.
+
+   >>>> To be provided:
+
+   >>>> Overall statement about Correction Factors w.r.t. section 5
+   limits.
+ + >>>> Appendix with more details ??? + 6.2. One-way Delay, Loss threshold, RFC 2679 This test determines if implementations use the same configured maximum waiting time delay from one measurement to another under different delay conditions, and correctly declare packets arriving in excess of the waiting time threshold as lost. See Section 3.5 of [RFC2679], 3rd bullet point and also Section 3.8.2 of [RFC2679]. @@ -936,21 +1049,24 @@ Since it was not possible to confirm the estimated serialization time increases in field tests, we resort to examination of the implementations to determine compliance. NetProbe performs all time stamping above the IP-layer, accepting that some compromises must be made to achieve extreme portability and measurement scale. Therefore, the first-to-last bit convention is supported because the serialization time is included in the one-way delay measurement, enabling comparison with other implementations. - Perfas >>>>>>>>>>>>>>> TBD + Perfas is optimized for its purpose and performs all time stamping + close to the interface hardware. The first-to-last bit convention is + supported because the serialization time is included in the one-way + delay measurement, enabling comparison with other implementations. 6.4. One-way Delay, Difference Sample Metric (Lab) This test determines if implementations register the same relative increase in delay from one measurement to another under different delay conditions. This test tends to cancel the sources of error which may be present in an implementation. This test is intended to evaluate measurements in sections 3 and 4 of [RFC2679]. @@ -1034,23 +1150,22 @@ 5.1. Type-P-One-way-Delay-Percentile yes no 5.2. Type-P-One-way-Delay-Median yes no 5.3. Type-P-One-way-Delay-Minimum yes yes 5.4. Type-P-One-way-Delay-Inverse-Percentile no no Implementation of Section 5 Statistics - 5.1. Type-P-One-way-Delay-Percentile 5.2. Type-P-One-way-Delay- - Median 5.3. Type-P-One-way-Delay-Minimum 5.4. 
Type-P-One-way-Delay- - Inverse-Percentile + Only the Type-P-One-way-Delay-Inverse-Percentile has been ignored in + both implementations, so it is a candidate for removal in RFC2679bis. 7. Security Considerations The security considerations that apply to any active measurement of live networks are relevant here as well. See [RFC4656] and [RFC5357]. 8. IANA Considerations This memo makes no requests of IANA, and hopes that IANA will be as @@ -1062,28 +1177,27 @@ advance the IPPM metrics during his tenure as AD Advisor. Nicole Kowalski supplied the needed CPE router for the NetProbe side of the test set-up, and graciously managed her testing in spite of issues caused by dual-use of the router. Thanks Nicole! The "NetProbe Team" also acknowledges many useful discussions with Ganga Maguluri. 10. References - 10.1. Normative References [I-D.ietf-ippm-metrictest] Geib, R., Morton, A., Fardid, R., and A. Steinmitz, "IPPM standard advancement testing", - draft-ietf-ippm-metrictest-03 (work in progress), - June 2011. + draft-ietf-ippm-metrictest-05 (work in progress), + November 2011. [RFC2026] Bradner, S., "The Internet Standards Process -- Revision 3", BCP 9, RFC 2026, October 1996. [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997. [RFC2330] Paxson, V., Almes, G., Mahdavi, J., and M. Mathis, "Framework for IP Performance Metrics", RFC 2330, May 1998. @@ -1113,28 +1227,42 @@ [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, October 2008. [RFC5657] Dusseault, L. and R. Sparks, "Guidance on Interoperation and Implementation Reports for Advancement to Draft Standard", BCP 9, RFC 5657, September 2009. 10.2. Informative References + [ADK] Scholz, F. and M. Stephens, "K-sample Anderson-Darling + Tests of fit, for continuous and discrete cases", + University of Washington, Technical Report No. 81, + May 1986. 
+ [I-D.morton-ippm-advance-metrics] Morton, A., "Lab Test Results for Advancing Metrics on the Standards Track", draft-morton-ippm-advance-metrics-02 (work in progress), October 2010. [RFC3931] Lau, J., Townsley, M., and I. Goyret, "Layer Two Tunneling Protocol - Version 3 (L2TPv3)", RFC 3931, March 2005. + [Radk] Scholz, F., "adk: Anderson-Darling K-Sample Test and + Combinations of Such Tests. R package version 1.0.", + 2008. + + [Rtool] R Development Core Team, "R: A language and environment + for statistical computing. R Foundation for Statistical + Computing, Vienna, Austria. ISBN 3-900051-07-0, URL + http://www.R-project.org/", 2011. + Authors' Addresses Len Ciavattone AT&T Labs 200 Laurel Avenue South Middletown, NJ 07748 USA Phone: +1 732 420 1239 Fax: @@ -1155,17 +1282,16 @@ 200 Laurel Avenue South Middletown, NJ 07748 USA Phone: +1 732 420 1571 Fax: +1 732 368 1192 Email: acmorton@att.com URI: http://home.comcast.net/~acmacm/ Matthias Wieser - University of Applied Sciences Darmstadt - Birkenweg 8 Department EIT - Darmstadt, 64295 + Technical University Darmstadt + Darmstadt, Germany Phone: - Email: matthias.wieser@stud.h-da.de + Email: matthias_michael.wieser@stud.tu-darmstadt.de