--- 1/draft-ietf-ippm-metrictest-03.txt 2011-10-24 21:14:03.506670670 +0200
+++ 2/draft-ietf-ippm-metrictest-04.txt 2011-10-24 21:14:03.582670560 +0200
@@ -1,23 +1,23 @@
 Internet Engineering Task Force                             R. Geib, Ed.
 Internet-Draft                                          Deutsche Telekom
 Intended status: Standards Track                               A. Morton
-Expires: December 31, 2011                                     AT&T Labs
+Expires: April 26, 2012                                        AT&T Labs
                                                                R. Fardid
                                                     Cariden Technologies
                                                             A. Steinmitz
                                                         Deutsche Telekom
-                                                           June 29, 2011
+                                                        October 24, 2011

                    IPPM standard advancement testing
-                     draft-ietf-ippm-metrictest-03
+                     draft-ietf-ippm-metrictest-04

 Abstract

    This document specifies tests to determine if multiple independent
    instantiations of a performance metric RFC have implemented the
    specifications in the same way. This is the performance metric
    equivalent of interoperability, required to advance RFCs along the
    standards track. Results from different implementations of metric
    RFCs will be collected under the same underlying network conditions
    and compared using state of the art statistical methods. The goal is
@@ -33,21 +33,21 @@
    Internet-Drafts are working documents of the Internet Engineering
    Task Force (IETF). Note that other groups may also distribute
    working documents as Internet-Drafts. The list of current Internet-
    Drafts is at http://datatracker.ietf.org/drafts/current/.

    Internet-Drafts are draft documents valid for a maximum of six months
    and may be updated, replaced, or obsoleted by other documents at any
    time. It is inappropriate to use Internet-Drafts as reference
    material or to cite them other than as "work in progress."

-   This Internet-Draft will expire on December 31, 2011.
+   This Internet-Draft will expire on April 26, 2012.

 Copyright Notice

    Copyright (c) 2011 IETF Trust and the persons identified as the
    document authors. All rights reserved.

    This document is subject to BCP 78 and the IETF Trust's Legal
    Provisions Relating to IETF Documents
    (http://trustee.ietf.org/license-info) in effect on the date of
    publication of this document.
    Please review these documents
@@ -76,21 +76,22 @@
    4. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . . 22
    5. Contributors . . . . . . . . . . . . . . . . . . . . . . . . . 22
    6. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 22
    7. Security Considerations . . . . . . . . . . . . . . . . . . . 23
    8. References . . . . . . . . . . . . . . . . . . . . . . . . . . 23
      8.1. Normative References . . . . . . . . . . . . . . . . . . . 23
      8.2. Informative References . . . . . . . . . . . . . . . . . . 24
    Appendix A. An example on a One-way Delay metric validation . . . 25
      A.1. Compliance to Metric specification requirements . . . . . 25
      A.2. Examples related to statistical tests for One-way Delay . 27
-   Appendix B. Anderson-Darling 2 sample C++ code . . . . . . . . . 29
+   Appendix B. Anderson-Darling K-sample Reference and 2 sample
+               C++ code . . . . . . . . . . . . . . . . . . . . . . 29
    Appendix C. Glossary . . . . . . . . . . . . . . . . . . . . . . 37
    Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . . 38

 1. Introduction

    The Internet Standards Process RFC2026 [RFC2026] requires that for an
    IETF specification to advance beyond the Proposed Standard level, at
    least two genetically unrelated implementations must be shown to
    interoperate correctly with all features and options. This
    requirement can be met by supplying:
@@ -186,20 +187,25 @@
    The metric RFC advancement process begins with a request for protocol
    action accompanied by a memo that documents the supporting tests and
    results. The procedures of [RFC2026] are expanded in [RFC5657],
    including sample implementation and interoperability reports.
    Section 3 of [morton-advance-metrics-01] can serve as a template for
    a metric RFC report which accompanies the protocol action request to
    the Area Director, including description of the test set-up,
    procedures, results for each implementation and conclusions.
+   Changes from WG-03 to WG-04:
+
+   o Revisions to Appendix B code and add reference to "R" in the
+     Appendix and the text of section 3.6.
+
    Changes from WG-02 to WG-03:

    o Changes stemming from experiments that implemented this plan, in
      general.

    o Adoption of the VLAN loopback figure in the main body of the memo
      (section 3.2).

    Changes from WG-01 to WG-02:
@@ -986,21 +992,23 @@
    differences such that the connectivity differences of the cross-
    implementation tests are also experienced and measured by the same
    implementation. Comparative results for the same implementation
    represent a bound on cross-implementation equivalence. This should
    be particularly useful when the metric does *not* produce a
    continuous distribution of singleton values, such as with a loss
    metric, or a duplication metric. Appendix A indicates how the ADK
    will work for One-way delay, and should be likewise applicable to
    distributions of delay
-   variation.
+   variation. Appendix B discusses two possible ways to perform the ADK
+   analysis, the R statistical language [Rtool] with ADK package [Radk]
+   and C++ code.

    Proposal: the implementation with the largest difference in
    homogeneous comparison results is the lower bound on the equivalence
    threshold, noting that there may be other systematic errors to
    account for when comparing between implementations.

    Thus, when evaluating equivalence in cross-implementation results:

    Maximum_Error = Same_Implementation_Error + Systematic_Error

@@ -1026,21 +1034,21 @@
    Scott Bradner, Vern Paxson and Allison Mankin drafted bradner-
    metrictest [bradner-metrictest], and major parts of it are included
    in this document.

 6. IANA Considerations

    This memo includes no request to IANA.

 7. Security Considerations

-   This draft does not raise any specific security issues.
+   This memo does not raise any specific security issues.

 8. References

 8.1. Normative References

    [RFC2003] Perkins, C., "IP Encapsulation within IP", RFC 2003,
              October 1996.
    [RFC2026] Bradner, S., "The Internet Standards Process -- Revision
              3", BCP 9, RFC 2026, October 1996.
@@ -1105,20 +1113,29 @@
    [GU+Duffield]
              Gu, Y., Duffield, N., Breslau, L., and S. Sen, "GRE
              Encapsulated Multicast Probing: A Scalable Technique for
              Measuring One-Way Loss", SIGMETRICS'07 San Diego,
              California, USA, June 2007.

    [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
              Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)",
              RFC 5357, October 2008.

+   [Radk]    Scholz, F., "adk: Anderson-Darling K-Sample Test and
+             Combinations of Such Tests. R package version 1.0.",
+             2008.
+
+   [Rtool]   R Development Core Team, "R: A language and environment
+             for statistical computing. R Foundation for Statistical
+             Computing, Vienna, Austria. ISBN 3-900051-07-0, URL
+             http://www.R-project.org/", 2011.
+
    [Rule of thumb]
              Hardy, M., "Confidence interval", March 2010.

    [bradner-metrictest]
              Bradner, S., Mankin, A., and V. Paxson, "Advancement of
              metrics specifications on the IETF Standards Track", draft
              -bradner-metricstest-03, (work in progress), July 2007.

    [morton-advance-metrics]
@@ -1292,21 +1309,39 @@
    table 1. Comparing column 1 and column 3 of the table by an ADK test
    shows that the data contained in these columns pass an ADK test
    with 95% confidence.

    >>> Comment: Extensive averaging was used in this example, because of
    the vastly different sampling frequencies. As a result, the
    distributions compared do not exactly align with a metric in
    [RFC2679], but illustrate the ADK process adequately.

-Appendix B. Anderson-Darling 2 sample C++ code
+Appendix B. Anderson-Darling K-sample Reference and 2 sample C++ code
+
+   There are many statistical tools available, and this Appendix
+   describes two that are familiar to the authors.
+
+   The "R tool" is a language and command-line environment for
+   statistical computing and plotting [Rtool]. With the optional "adk"
+   package installed [Radk], it can perform individual and combined
+   sample ADK computations.
+   The user must consult the package
+   documentation and the original paper [ADK] to interpret the results,
+   but this is as it should be.
+
+   The C++ code below will perform a 2-sample AD comparison when
+   compiled and presented with two column vectors in a file (using white
+   space as separation). This version contains modifications to use the
+   vectors and run as a stand-alone module by Wes Eddy, Sept 2011. The
+   status of the comparison can be checked on the command line with "$
+   echo $?" or the last line can be replaced with a printf statement for
+   adk_result instead.

    /* Routines for computing the Anderson-Darling 2 sample
     * test statistic.
     *
     * Implemented based on the description in
     * "Anderson-Darling K Sample Test" Heckert, Alan and
     * Filliben, James, editors, Dataplot Reference Manual,
     * Chapter 15 Auxiliary, NIST, 2004.
     * Official Reference by 2010
     * Heckert, N. A. (2001). Dataplot website at the
@@ -1315,146 +1350,132 @@
     * June 2001.
     */

    #include
    #include
    #include
    #include

    using namespace std;
+
    int main() {
       vector<double> vec1, vec2;
       double adk_result;
-      double adk_criterium = 1.993;
-
-      /* vec1 and vec2 to be initialised with sample 1 and
-       * sample 2 values in ascending order.
-       */
-
-      /* example for iterating the vectors
-       * for(vector<double>::iterator it = vec1->begin();
-       * it != vec1->end(); it++
-       * {
-       * cout << *it << endl;
-       * }
-       */
-
       static int k, val_st_z_samp1, val_st_z_samp2,
       val_eq_z_samp1, val_eq_z_samp2,
       j, n_total, n_sample1, n_sample2, L,
       max_number_samples, line, maxnumber_z;
       static int column_1, column_2;
       static double adk, n_value, z, sum_adk_samp1,
       sum_adk_samp2, z_aux;
       static double H_j, F1j, hj, F2j, denom_1_aux, denom_2_aux;
       static bool next_z_sample2, equal_z_both_samples;
       static int stop_loop1, stop_loop2, stop_loop3,old_eq_line2,
       old_eq_line1;

       static double adk_criterium = 1.993;

+      /* vec1 and vec2 to be initialised with sample 1 and
+       * sample 2 values in ascending order */
+      while (!cin.eof()) {
+         double f1, f2;
+         cin >> f1;
+         cin >> f2;
+         vec1.push_back(f1);
+         vec2.push_back(f2);
+      }
+
       k = 2;

-      n_sample1 = vec1->size() - 1;
-      n_sample2 = vec2->size() - 1;
+      n_sample1 = vec1.size() - 1;
+      n_sample2 = vec2.size() - 1;

       // -1 because vec[0] is a dummy value
-
       n_total = n_sample1 + n_sample2;

       /* value equal to the line with a value = zj in sample 1.
        * Here j=1, so the line is 1.
        */
-
       val_eq_z_samp1 = 1;

       /* value equal to the line with a value = zj in sample 2.
        * Here j=1, so the line is 1.
        */
-
       val_eq_z_samp2 = 1;

       /* value equal to the last line with a value < zj
        * in sample 1. Here j=1, so the line is 0.
        */
-
       val_st_z_samp1 = 0;

       /* value equal to the last line with a value < zj
        * in sample 2. Here j=1, so the line is 0.
        */
-
       val_st_z_samp2 = 0;

       sum_adk_samp1 = 0;
       sum_adk_samp2 = 0;
       j = 1; // as mentioned above, j=1
-
       equal_z_both_samples = false;
+
       next_z_sample2 = false;
       //assuming the next z to be of sample 1
-
       stop_loop1 = n_sample1 + 1;
       // + 1 because vec[0] is a dummy, see n_sample1 declaration
-
       stop_loop2 = n_sample2 + 1;
       stop_loop3 = n_total + 1;

       /* The required z values are calculated until all values
        * of both samples have been taken into account. See the
        * lines above for the stoploop values.
         Construct required
        * to avoid a mathematical operation in the While condition
        */
-
       while (((stop_loop1 > val_eq_z_samp1)
              || (stop_loop2 > val_eq_z_samp2)) && stop_loop3 > j)
       {
          if(val_eq_z_samp1 < n_sample1+1)
          {
-
             /* here, a preliminary zj value is set.
              * See below how to calculate the actual zj.
              */
-
-            z = (*vec1)[val_eq_z_samp1];
+            z = vec1[val_eq_z_samp1];

             /* this while sequence calculates the number of values
              * equal to z.
              */
-
             while ((val_eq_z_samp1+1 < n_sample1)
-                  && z == (*vec1)[val_eq_z_samp1+1] )
+                  && z == vec1[val_eq_z_samp1+1] )
             {
                val_eq_z_samp1++;
             }
          }
          else
          {
             val_eq_z_samp1 = 0;
             val_st_z_samp1 = n_sample1;
             // this should be val_eq_z_samp1 - 1 = n_sample1
          }

          if(val_eq_z_samp2 < n_sample2+1)
          {
-            z_aux = (*vec2)[val_eq_z_samp2];;
+            z_aux = vec2[val_eq_z_samp2];

             /* this while sequence calculates the number of values
              * equal to z_aux
              */
             while ((val_eq_z_samp2+1 < n_sample2)
-                  && z_aux == (*vec2)[val_eq_z_samp2+1] )
+                  && z_aux == vec2[val_eq_z_samp2+1] )
             {
                val_eq_z_samp2++;
             }

             /* the smaller of the two actual data values is picked
              * as the next zj.
              */
             if(z > z_aux)
             {
@@ -1684,29 +1700,30 @@
          next_z_sample2 = false;
          equal_z_both_samples = false;

          /* index to count the z. It is only required to prevent
           * the while loop from executing endlessly
           */
          j++;
       }

       // calculating the adk value is the final step.
-
       adk_result = (double) (n_total - 1) / (n_total * n_total *
                    (k - 1)) * (sum_adk_samp1 / n_sample1 +
                    sum_adk_samp2 / n_sample2);

       /* if(adk_result <= adk_criterium)
        * adk_2_sample test is passed
        */
+      return adk_result <= adk_criterium;
+
    }

                                 Figure 5

 Appendix C. Glossary

    +-------------+-----------------------------------------------------+
    | ADK         | Anderson-Darling K-Sample test, a test used to      |
    |             | check whether two samples have the same statistical |
    |             | distribution.                                       |
    | ECMP        | Equal Cost Multipath, a load balancing mechanism    |