Traffic Engineering Working Group                            Wai Sum Lai
Internet Draft                                                 AT&T Labs
Document: <draft-ietf-tewg-measure-02.txt>              Blaine Christian
Category: Informational                                            UUNET
                                                        Richard W. Tibbs
                                           Oak City Networks & Solutions
                                                   Steven Van den Berghe
                                                   Ghent University/IMEC
                                                              March 2002

        A Framework for Internet Traffic Engineering Measurement

Status of this Memo

This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

1. Abstract

In this document, a measurement framework for supporting the traffic engineering of IP-based networks is presented. Uses of traffic measurement in service provider environments are described, and issues related to time scale and read-out period are discussed. Different measurement types are classified, with each being specified as a meaningful combination of a measurement entity and a measurement basis. For interoperable compatibility, uniform definitions across vendors and operators must be ensured, e.g., in the distinction between offered load and achieved throughput. To aid network dimensioning, mechanisms to collect node-pair-based traffic data should be developed to facilitate the derivation of per-service-class traffic matrix statistics. For service assurance, there is a need for the use of higher-order statistics.
To preserve representative traffic detail at manageable sample volumes, there is a need for packet sampled measurements.

Table of Contents

   Status of this Memo
   1. Abstract
   2. Conventions used in this document
   3. Introduction
   4. Terminology
   4.1 Route, path
   4.2 Throughput, traffic volume
   5. Uses of Traffic Measurement
   5.1 Traffic characterization
   5.2 Network monitoring
   5.3 Traffic control
   6. Time Scales for Network Operations
   7. Read-Out Periods
   8. Measurement Bases
   8.1 Flow-based
   8.2 Interface-based, link-based, node-based
   8.3 Node-pair-based
   8.4 Path-based
   9. Measurement Entities
   9.1 Entities related to traffic and performance
   9.2 Entities related to establishment of connection or path
   10. Measurement Types
   10.1 Measurement types related to traffic or performance
   10.2 Measurement types related to resource usage
   11. Traffic Matrix Statistics
   12. Performance Monitoring
   13. Packet Sampling
   14. Statistical Estimation and Information Modeling
   14.1 Engineering methods for statistical estimation of measures
   14.2 TE Measure Information Modeling
   15. Conclusions and Recommendations
   16. Security Considerations
   17. References
   18. Acknowledgments
   19. Author's Addresses
   Full Copyright Statement

2. Conventions used in this document

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in RFC-2119.

3. Introduction

This document describes a framework for Internet traffic engineering measurement, with the objective of providing principles for the development of a set of measurement systems to support the traffic engineering of IP-based networks. A major goal is to provide guidance for establishing protocol-independent and platform-neutral traffic measurement standards to achieve multi-vendor interoperability.
It is critical to minimize the possibilities of inconsistencies arising from, e.g., differing statistical definitions, overlapping data collection, processing at different protocol levels, and similar inconsistencies by different vendors or network operators. The need for a common framework, including detailed definitions for measurements, is motivated by the needs for consistency, precision, and effectiveness of the overall traffic engineering function. Traffic engineering includes measurements, forecasting, planning, dimensioning, control, and performance monitoring. From this perspective, the purpose of this document is to set principles of measurement in place that assure the quality of the other aspects of traffic engineering. The scope of this document is limited to those aspects of measurement pertaining to intra-domain operations, i.e., within a given autonomous system. However, measurements on its boundary with other domains are included as well. The focus is primarily on traffic engineering in Internet service provider environments. In this document, uses of traffic measurement in traffic characterization, network monitoring, and traffic control are first described. Depending on the network operations to be performed in these tasks, three different time scales can be identified, ranging from months, through days or hours, to minutes or less. To support these operations, traffic measurement must be able to capture accurately, within a given confidence interval, the traffic variations and peaks without degrading network performance and without generating an immense amount of data. As one consequence of the need to avoid network performance degradation, specification of a suitable read-out period for each service class for traffic summarization is essential. Other principles such as concise representation of measurements are identified as well. Traffic measurement can be performed on the basis of flows, interfaces, links, nodes, node-pairs, or paths. 
Based on these objects, different measurement entities can be defined, such as traffic volume, average holding time, bandwidth availability, throughput, delay, delay variation, packet loss, and resource usage. Using these measured traffic data, in conjunction with other network data such as topological data and router configuration data, traffic matrix and other relevant statistics can be derived for traffic engineering purposes. Traffic measurement also plays a key role in network performance management.

In addition to these capabilities, functions of a measurement system should also include data storage, data processing, statistics generation and reporting. However, these aspects are outside the scope of this document. As a framework, this document is mainly concerned with a discussion of various technical issues surrounding traffic measurement, particularly in the area of statistical traffic load estimation for traffic engineering purposes.

As far as possible and to avoid duplication of effort, relevant work done in measurements by other standards organizations will be applied or adapted, and references to them will be made. These include, in particular,

   . IP Performance Metrics (IPPM) Working Group of the IETF: its framework document and the associated documents on individual metrics [3, 4, 5, 6, 7, 8, 9]
   . ITU-T: Recommendation I.380/Y.1540 and Draft Recommendation Y.1541

4. Terminology

The intent of this section is not to provide definition or description of terms used in this document. Rather, it is to highlight the difference in usage of closely related terms.

4.1 Route, path

A route is any unidirectional sequence of nodes and links, for sending packets from a source node to a destination node. A path refers to an MPLS tunnel, i.e., a label-switched path. It should be pointed out that there are also methods for creating paths with other technologies such as frame relay or ATM.
The measurement described in this document may apply to these technologies with suitable adaptation. To simplify description, reference is made to MPLS only in what follows.

4.2 Throughput, traffic volume

Both quantities can be applied to a network, a network segment, or an individual network element.

Throughput of a network, as a measure of delivered performance, refers to the maximum sustainable rate of transferring packets successfully across the network, under given network conditions, e.g., a given traffic mix, while meeting quality of service (QoS) objectives. This usage is consistent with the definition of throughput for a network interconnect device as specified in . For real-time network control, active measurement of throughput by probing may be used to determine the currently available capacity of a network to carry additional traffic. (In an active measurement, test packets are injected into the network. Data collected about these packets are taken as representative of the behavior of the network.)

Traffic volume, as a measure of the traffic carried, characterizes the level of traffic that a network is designed to support. Passive, i.e., in-service non-intrusive, measurement of the traffic volume is usually used to estimate the long-term offered traffic for the purposes of network dimensioning in the capacity-management and network-planning processes (see the Section on Time Scales for Network Operations).

A network should be properly dimensioned so that its throughput is adequate to handle the expected traffic volume.

Throughput is expressed in terms of number of data units per time unit. Traffic volume is expressed in data units with reference to a read-out period (see the Section on Read-Out Periods). For transmission systems, the data unit is usually a multiple of either bits or bytes. For processing systems, the data unit is usually a multiple of packets.

5. Uses of Traffic Measurement

Traffic measurement is used to collect traffic data for the following purposes:

   . Traffic characterization
   . Network monitoring
   . Traffic control

5.1 Traffic characterization

   . Identifying traffic patterns, particularly traffic peak patterns, and their variations in statistical analysis; this includes developing traffic profiles to capture daily, weekly, or seasonal variations.
   . Determining traffic distributions in the network on the basis of flows, interfaces, links, nodes, node-pairs, paths, or destinations.
   . Estimation of the traffic load according to service classes in different routers and the network.
   . Observing trends for traffic growth and forecasting of traffic demands.

For example, traffic engineering measurements are usually used to determine the statistical moments of a traffic flow. As suggested in , given the time series of packet arrivals, a suitable parametric stochastic model based on the mean and variance of the time series can be constructed. This traffic model is then used in the ensuing phases of traffic engineering, such as link dimensioning to meet service objectives.

5.2 Network monitoring

   . Determining the operational state of the network, including fault detection.
   . Monitoring the continuity and quality of network services, to ensure that QoS/GoS objectives are met for various classes of traffic, to verify the performance of delivered services, or to serve as a means of sectionalizing performance issues seen by a customer. [QoS reflects the performance perceivable by a user of a service, while GoS (grade of service) is used by a service provider for internal design and operation of a network.]
   . Evaluating the effectiveness of traffic engineering policies, or triggering certain policy-based actions (such as alarm generation, or path preemption) upon threshold crossing; this may be based on the use of performance history data.
   . Verifying peering agreements between service providers by monitoring/measuring the traffic flows over interconnecting links at border routers; this includes the estimation of inter- and intra-network traffic, as well as originating, terminating, and transit traffic that are being exchanged between peers.

An example of using traffic measurements in this area might be monitoring packet loss rates at various points in a network to detect apparent link failure. Another example is monitoring the QoS delivered to external peers by an autonomous system to ensure that peering agreements are met.

5.3 Traffic control

   . Adaptively optimizing network performance in response to network events, e.g., rerouting to work around congestion or failures.
   . Providing a feedback mechanism in the reverse flow messaging of RSVP-TE or CR-LDP signaling in MPLS to report on actual topology state information such as link bandwidth availability.
   . Support of measurement-based admission control, i.e., by predicting the future demands of the aggregate of existing flows so that admission decisions can be made on new flows.

An example of traffic engineering measurements used to effect a traffic control mechanism is to configure policing mechanisms in response to traffic load and performance measurements. A network operator could selectively throttle low-priority flows to improve near-real-time performance of higher-priority flows, and maintain tighter QoS envelopes. Another example would be to use measurement results for feedback into IGP routing decisions, e.g., for adjusting the link weights based on them.

6. Time Scales for Network Operations

The information collected by traffic measurement can be provided to the end user or application either in real time, or for record (i.e., data retention) in non-real time, depending on the activities to be performed and the network actions to be taken. Traffic control will generally require real-time information.
For network planning and capacity management as described below, information may be provided in non-real time after the processing of raw data.

Broadly speaking, the following three time scales can be classified, according to the use of observed traffic information for network operations.

Network planning

   Information that changes on the order of months is used to make traffic forecasts as a basis for network extensions and long-term network configuration. That is, for planning the topology of the network, planning alternative routes to survive failures or determining where capacity must be augmented in advance of projected traffic growth. Forecasting and planning may also lead to the introduction of new technology and architecture.

Capacity management

   Information that changes on the order of days or hours is used to manage the deployed facilities, by taking appropriate maintenance or engineering actions to optimize utilization. For example, new MPLS tunnels may be set up or existing tunnels modified while meeting service level agreements. Also, load balancing may be performed, or traffic may be rerouted for re-optimization after a failure.

Real-time network control

   Information that changes on the order of minutes or less is used to adapt to the current network conditions in near real time. Thus, to combat localized congestion, traffic management actions may perform temporary rerouting to redistribute the load. Upon detecting a failure, traffic may be diverted to pre-established, secondary routes until more optimized routes can be arranged.

7. Read-Out Periods

A measurement infrastructure must be able to scale with the size and the speed of a network as it evolves. Hence, it is important to minimize the amount of data to be collected, and to condense the collected data by periodic summarization.
This is to prevent network performance from being adversely affected by the unnecessarily excessive loading of router control processors, router memories, transmission facilities, and the administrative support systems.

A measurement interval is the time interval over which measurements are taken. Some traffic data must be collected continuously, while other data may be collected by sampling or on a scheduled basis. For example, peak loads and peak periods can be identified only by continuous measurement, as traffic typically fluctuates irregularly during the whole day. If traffic variations are regular and predictable, it may be possible to measure the expected normal load on predetermined portions of the day. This requires the definition of a busy period. Special studies on selected segments of the network may be conducted on a scheduled basis. Active measurement, with the involvement of a network operator, may be activated manually. For instance, active throughput measurement may be used to identify alternate routes during periods of network congestion.

A measurement interval consists of a sequence of consecutive read-out periods. Summarization is usually done by integrating the raw data over a pre-specified read-out period. The granularity of this period must be suitably chosen. It should be short enough to capture, with acceptable accuracy, the bursty nature of the traffic, i.e., the traffic variations and peaks. Since measurements represent a load for the router, the read-out period should not be so short that router performance is degraded while a voluminous quantity of data is produced. Also, read-out may be started when the measured data exceeds a preset threshold, or when the space allocated for temporarily holding the data in a router is exhausted.

For a multi-service IP-based network, each service typically has its own traffic characteristics and performance objectives.
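As an illustration of the summarization step described in this section, the following sketch integrates raw samples over fixed read-out periods. It is a minimal example under stated assumptions, not a prescribed procedure: the (timestamp, octet count) sample format, the function name summarize, and the fields of each summary record are hypothetical and are not defined by this framework.

```python
from typing import Dict, List, Tuple

# Assumed raw sample format: (timestamp in seconds, octets observed).
Sample = Tuple[float, int]


def summarize(samples: List[Sample], start: float, period: float) -> List[Dict[str, float]]:
    """Condense raw samples into one summary record per read-out period.

    `start` is the beginning of the measurement interval and `period` is the
    read-out period in seconds. Samples taken before `start` are ignored.
    """
    if period <= 0:
        raise ValueError("read-out period must be positive")

    # Group octet counts by the index of the read-out period they fall in.
    buckets: Dict[int, List[int]] = {}
    for timestamp, octets in samples:
        if timestamp < start:
            continue
        index = int((timestamp - start) // period)
        buckets.setdefault(index, []).append(octets)

    # Emit one record per non-empty period, in time order.
    summaries = []
    for index in sorted(buckets):
        values = buckets[index]
        summaries.append({
            "period_start": start + index * period,
            "volume": float(sum(values)),        # traffic volume in the period
            "peak_sample": float(max(values)),   # largest single raw sample
            "sample_count": float(len(values)),
        })
    return summaries


if __name__ == "__main__":
    raw = [(0, 100), (10, 300), (60, 50), (119, 150), (130, 20)]
    for record in summarize(raw, start=0, period=60):
        print(record)
```

Because the read-out period is a parameter, the same routine could be run with a shorter period for one class of service and a longer period for another.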
To ensure that service-specific features are reflected in the measurement process, different read-out periods may be needed for different classes of service.

8. Measurement Bases

Measurements can be classified on the basis of where, and at which level the traffic data are gathered and aggregated. This is similar to the concept of a *population of interest* as specified in ITU-T Recommendation I.380/Y.1540. As defined therein, this refers to a set of packets, possibly relative to a particular pair of source and destination hosts, for the purposes of defining performance parameters. However, measurement bases as used here may not have any association with a source-destination pair. In this document, customer-based measurements are not considered.

Service providers will make decisions on how to perform the measurements needed, and there are various tradeoffs involved. One option is to obtain the measurements directly from the network elements themselves, e.g., via SNMP (Simple Network Management Protocol). Collecting the measurements on the operational network elements such as routers is sometimes a performance concern. Currently, there are a number of third-party measurement/monitoring products available. Hence, another option is to deploy such equipment, which might have performance advantages but also introduces additional cost.

Regardless of the type of measurement source, either a network element or a third-party product, measurements should be collected, as far as possible, by a measurement source without requiring coordination with other measurement sources. Thus, it is desirable to perform those measurements that do not require the use of specialized monitoring equipment connected to the network at multiple locations.

While each measurement source may act autonomously with regard to taking measurements, a network operator may specify some network-wide policy regarding measurement scheduling.
Such a policy may be, say, the use of the same time of day, the same measurement interval, or measurement intervals that are multiples of each other (e.g., nested intervals with synchronized boundaries). A schedule therefore should include such time information as the start, the duration, and the periodicity of a certain measurement.

The following measurement bases are considered in this document:

   . Flow-based
   . Interface-based, link-based, node-based
   . Node-pair-based
   . Path-based

8.1 Flow-based

This is conceptually similar to the call detail record (CDR) in circuit-switched telecommunications networks. It is primarily used on interfaces at access routers, edge routers, or aggregation routers where traffic originates or terminates, rather than on backbone routers in the core network. Like CDR measurements, flow-based records are used to collect detailed information about a flow. This includes such information as source and destination IP addresses/port numbers, protocol, type of service, timestamps for the start and end of a flow, packet count, octet count, etc.

As a flow is a fine-grained object, measuring every flow that passes through all the edge devices may not be scalable or feasible. Hence, per-flow data are usually used in a special study conducted on a non-continuous schedule and on selected routers only. Sampling of flow-based measurements may also be needed to reduce both the amount of data collected and the associated overhead.

8.2 Interface-based, link-based, node-based

Passive measurement can be taken at each network element. For example, SNMP uses passive monitoring to collect raw data on an interface at an edge or backbone router. These data are stored in MIBs (Management Information Bases) and include counts on packets and octets sent/received, packet discards, and errored packets.
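As an illustration, per-period traffic volume can be derived from successive readings of such a cumulative octet counter. The sketch below is a minimal example under stated assumptions: the counter is assumed to be 32 bits wide and to wrap at most once between consecutive readings, and all function names are hypothetical. It operates on already-collected readings and does not query any SNMP agent.

```python
from typing import List

# Assumed counter width of 32 bits; a 64-bit counter would use 2 ** 64.
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous: int, current: int, modulus: int = COUNTER32_MODULUS) -> int:
    """Return the increase between two readings of a wrapping counter.

    Correct only if the counter wrapped at most once between the readings.
    """
    return (current - previous) % modulus


def volumes_per_period(readings: List[int], modulus: int = COUNTER32_MODULUS) -> List[int]:
    """Turn cumulative readings, one per read-out, into per-period volumes."""
    return [counter_delta(a, b, modulus) for a, b in zip(readings, readings[1:])]


def average_rate_bps(octets: int, period_seconds: float) -> float:
    """Average bit rate over one read-out period."""
    if period_seconds <= 0:
        raise ValueError("period must be positive")
    return octets * 8 / period_seconds


if __name__ == "__main__":
    # The third reading is smaller than the second because the counter wrapped.
    octet_readings = [4294967000, 4294967200, 104]
    for volume in volumes_per_period(octet_readings):
        print(volume, "octets,", average_rate_bps(volume, 300), "bit/s")
```

If the read-out period is long relative to the link speed, a 32-bit counter may wrap more than once between readings, in which case the computed volume would be an underestimate.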
While not intended for the core network, RMON (Remote Network Monitoring) can possibly be used in the access link of an Internet service provider to provide managed Internet service to corporate LANs.

To reduce the overhead in managing multiple links between the same ingress and egress points, there is a proposal to aggregate links for network optimization. Component links in such a *bundled link* will have the same routing constraints, resource classes, and attributes. Multiple links are treated as a single IP link. Traffic measurements, such as bandwidth availability and throughput, should consider the measurements for bundled links. Also, such measurements should be protocol independent and media independent to ensure portability and commonality in the measurements.

8.3 Node-pair-based

Active measurements by probing, as specified in the IPPM framework, can be conducted between each pair of major routing hubs for determining edge-to-edge performance of a core network. This complements the passive measurements of the previous sub-section, which provide local views of the performance of individual network elements.

In telecommunications networks, each established call has an associated node-pair. By maintaining a set of node-pair data registers (usage, peg count, overflow, etc.) in each switch, node-pair-based measurements for traffic statistics such as the load between a given node pair are taken directly. In contrast, in IP-based networks, such node-pair-based measurements cannot currently be taken directly. However, it is possible to infer them from flow-based passive measurements and other network information. A problem with this approach is that flow-based measurement data are voluminous. Another problem that must be accounted for is the routing changes among the multiple routes due to, e.g., a change in the configuration of intradomain routing, or a change in interdomain policies made by another autonomous system.
This is further discussed in the Section on Traffic Matrix Statistics. 8.4 Path-based The ability of MPLS to use fixed preferred paths for routing traffic, so-called route pinning, gives the means to develop path- based measurements. This may enable the development of methodologies for such functions as admission control and performance verification of delivered service. Like a flow, a path is associated with a pair of nodes. However, path is a more coarse-grained object than flow, as paths are usually used to carry aggregated traffic. In addition, when routing changes occur, the amount of traffic to be carried by a path will either not be affected or be merged with that of another path. Because of these properties, path-based measurements are more scalable and may be used to provide more readily an accurate, network-wide, view of the traffic demands. For example, the traffic between a given pair of nodes may be inferred from the aggregate of the traffic carried by the all the paths either terminated by or passed through the same node-pair. 9. Measurement Entities A measurement entity defines what is measured: it is a quantity for which data collection must be performed with a certain measurement. A measurement type can be specified by a (meaningful) combination of a measurement entity with the measurement basis described in the previous section. 9.1 Entities related to traffic and performance Some of the measurement entities listed below, such as throughput, delay, delay variation, and packet loss, are related to the respective IPPM performance metrics or the I.380/Y.1540 performance parameters. . 
Traffic volume (mean and variance, in number of bits, bytes, or packets transferred, as counted over a given time interval), on a per service class basis, at various aggregation levels (IP address prefix, interface, link, node, node-pair, path, network edge, customer, or autonomous system) Note: (1) This is a measurement for the traffic carried by a network, a network segment, or an individual network element; it is used to derive the carried load or carried traffic intensity . When measured during the busy period, this entity is normally used to estimate the traffic offered. However, the estimation procedure should take into account such factors as congestion, which may result in decreased carried traffic. In addition, congestion may lead to user behavior such as reattempt or abandonment, which may affect the actual traffic offered. (2) To reduce uncertainty in traffic estimation, second-order measures may need to be developed. (3) Measurement of traffic volumes over interconnecting links at border routers can be used to estimate the traffic exchange between peers for contract verification. . Average holding time (e.g., flow duration or lifetime, duration of an MPLS path), on a per service class basis Note: (1) This is similar to call holding time in telecommunications networks. Peg count, usage, and call holding time are three busy-hour entities that should be independently measured for both call-dependent and load-dependent engineering. This is important especially when the call busy hour and the load busy hour during a day are non-coincident, due to the hour-to-hour variation of call holding times. (2) The holding time statistics of long-living static paths reflect the effect of network equipment failures, link outages, or scheduled maintenance, and hence may to used to derive information about up-time or service availability. . 
Available bandwidth of a link or path - useful for load balancing, measurement-based admission control to determine the feasibility of creating a new MPLS tunnel (real-time information can be used for dynamic establishment) . Throughput (in bits per second, bytes per second, or packets per second) Note: (1) This is a measure of the "goodput." That is, the rate at which a given amount of traffic excluding lost, misdelivered, or errored packets, that passes between a set of end points, where end points can be logically or physically defined. The condition of the network, e.g., normal or high load, under which the measurement is taken should be noted. (2) The protocol level at which a throughput measurement is taken must be specified, as the packet payload and packet overheads are protocol dependent. (3) The average packet size may be inferred from the bit rate and packet rate measurements. This quantity is useful to gauge router performance, since router operations are typically packet-oriented and small packets are more processing-intensive. . Delay (e.g., cross-router delay from node-based measurement may be used to measure queueing delay within a router; end-to-end one-way or round-trip packet delay can be obtained by node-pair-based measurement) Note: The condition of the network, e.g., normal or high load, under which the measurement is taken should be noted. This is useful to determine if delay objectives are met. . Delay variation Note: There are several methods to measure this quantity as specified in ITU-T and IPPM. (1) In Appendix II of I.380/Y.1540, IP packet delay variation is defined via four alternative methods. The first two methods define an end-to-end two-point delay variation of a given packet, measured between two measurement points (such as ingress and egress), as the difference between the one-way delay of the given packet and some nominal delay. 
This nominal delay is chosen to be the first packet delay in the first method and the average delay of the population of packets in the second method. The third alternative, interval-based method, measures the percentage of packets with delay variations that fall outside some pre-specified delay variation interval. Finally, the quantile-based method measures the distance (in time units) between pre-selected quantiles, e.g., 99.5 percentile and 0.5 percentile, of the delay variation distribution. This method is tighter than the interval-based method since it bounds the tail of the delay variation distribution. In Y.1541, additional considerations and more alternatives of delay variations are described. (2) In IPPM , the concept of a selection function is introduced that allows for the explicit designation of selected packets whose one-way delay values are compared to compute one-way delay variation. For example, a selection function can be defined to select the consecutive packets within a specified interval, or to select the maximum and minimum one-way delays within a specified interval. . Packet loss Note: (1) While packet losses due to transmission and/or protocol errors may not be traffic related, unexpected excessive loss may be used as a means of fault detection. (2) Packet losses due to policing or network congestion should be distinguished. The former is a result of user violation of service contract and the network operator should not be penalized for it. The latter, whether intentional or unintentional, is caused by network conditions such as buffer overflow, router forwarding process busy, and may not be the user's fault. When policing is done by a network, measurement of non-conforming packets at the edge provides an indication on the extent to which the network is carrying this type of packets (which can potentially be dropped if network gets congested). 
Loss due to congestion of any packets, including loss of non-conforming packets, is a useful measure in traffic engineering to account for resource management.

(3) Long-term averages can be measured by the I.380/Y.1540 IP packet loss ratio or by the IPPM Poisson sampling of one-way loss. However, during the convergence times associated with routing updates, the loss may be high enough to cause service unavailability. This effect needs to be captured, and statistics such as loss patterns, burst loss, or severe loss ratio may be useful.

. Resource usage, such as link/router utilization, buffer occupancy (e.g., fraction of arriving packets finding the buffer above a given set of thresholds)

Note: (1) Depending on the architecture of a router, router utilization measurements may include processor and memory (e.g., forwarding tables) utilization for each of the line cards and/or the central unit.

(2) Trigger points may be set when resource usage consistently exceeds a certain threshold.

9.2 Entities related to establishment of connection or path

Where connection admission control is used, a measurement entity for monitoring network performance may be the proportion of connections denied admission. Also, it may be useful to score the requested bandwidth within the traffic parameters for the setup request. Corresponding to the number of call attempts (i.e., peg count) in telecommunications networks, the number of connection requests, the number of flows, etc., may be measured in given read-out periods to characterize the traffic.

To characterize paths, the following measurement entities may be defined: path setup delay, path setup error probability, path setup denial (blocking) probability, path release delay, path disconnect probability, path restoration time.

10. Measurement Types

A measurement matrix can be defined wherein each column represents a measurement basis and each row represents a measurement entity.
An entry in this measurement matrix, corresponding to a meaningful and measurable combination of an entity and a basis, defines a particular measurement type. For each measurement type, there should be a set of measurement points specified to bound the network segment for the purposes of taking measurement. A measurement point may be the physical boundary between a node and an adjacent link, or the logical interface between two protocol layers in a protocol stack.

10.1 Measurement types related to traffic or performance

The following measurement matrix illustrates some of the measurement types related to traffic or performance. Potentially, there can be one such matrix for each service class.

Bases:             Flow       Interface,  Node Pair  Path
                              Node
Entities:          (passive)  (passive)   (both)     (both)

Traffic Volume     x(1)       x           x(3)       x(3)
Avg. Hold. Time    x                                 x(3)
Avail. Bandwidth              x                      x(3)
Throughput                                x(4)       x(4)
Delay                         x(2)        x(4)       x(4)
Delay Variation               x(2)        x(4)       x(4)
Packet Loss                   x           x(5)       x(5)

Notes:

(1) This measurement type can be used to derive flow size statistics.

(2) These are 1-point measurements.

(3) As a starting point, statistics collected by passive measurement through the MPLS traffic engineering MIBs [17, 18, 19] may be used.

(4) Active measurements based on IPPM metrics are currently in use for node-pairs; they may be developed for paths.

(5) Besides active measurements based on IPPM, path loss may possibly be inferred from the difference between ingress and egress traffic statistics at the two endpoints of a path. However, such inference for the cumulative losses between a given node pair over multiple routes may be less useful, since different routes may have different loss characteristics.

10.2 Measurement types related to resource usage

Another measurement matrix can be constructed for resource consumption.
This leads to a set of measurement types comprising the different usage, one for each network resource object such as router (processor and memory), link, and buffer, by different classes of traffic:

. control (e.g., routing control) traffic
. signaling traffic
. user traffic from different service classes

Bases:               Node  Link  Buffer
Entities:

Control Util.        x     x     x
Signaling Util.      x     x     x
Service Class Util.  x     x     x

The amount of control and signaling traffic carried by a network is a function of many factors. To name a few, they include the size and topology of the network, the control and signaling protocols used, the amount of user traffic carried, the number of failure events, etc. Also, flooding of link-state advertisement (LSA) messages in an Interior Gateway Protocol (IGP), such as OSPF or IS-IS, may cause significant routing control traffic during events such as an LSA storm as a result of failures due to fiber cuts or failed power supply.

The above utilization measurements for control and signaling traffic are intended to help develop guidelines for the proper dimensioning and apportionment of network resources so that a given level of user traffic can be adequately supported. As the primary focus here is on user traffic measurements, the additional needs and properties of control and signaling traffic measurements are beyond the scope of this document.

11. Traffic Matrix Statistics

An important set of data for traffic engineering is point-to-point or point-to-multipoint demands. This data is needed in the provisioning of intradomain routes and external peering in the existing network, as well as planning for the placement and sizing of new links, routers, or peers. In current practice, estimates for traffic demands are usually determined from a combination of traffic projections, customer prescriptions, and service level agreements.
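
As a rough illustration of how point-to-point demands might be tabulated when path-based byte counts are available, the following minimal sketch sums the counts of all paths between the same ingress and egress nodes, per service class, and converts them to an average rate over the read-out period. The record layout (ingress, egress, service class, byte count), the function name, and the sample values are assumptions made purely for illustration; they are not defined by this framework or by any MIB.

```python
from collections import defaultdict


def node_pair_matrix(path_records, read_out_period_s):
    """Aggregate path-based byte counts into node-pair demands in bits/s.

    path_records: iterable of (ingress, egress, service_class, byte_count),
    one record per path (hypothetical layout, not taken from any MIB).
    """
    totals = defaultdict(int)
    for ingress, egress, service_class, byte_count in path_records:
        # Paths sharing the same endpoints and class fall in the same cell.
        totals[(ingress, egress, service_class)] += byte_count
    # Convert accumulated bytes to an average bit rate over the period.
    return {key: 8 * count / read_out_period_s for key, count in totals.items()}


# Hypothetical counts collected over a 300-second read-out period.
records = [
    ("A", "B", "gold", 4_500_000),
    ("A", "B", "gold", 3_000_000),
    ("A", "B", "best-effort", 9_000_000),
    ("A", "C", "gold", 1_500_000),
]
matrix = node_pair_matrix(records, read_out_period_s=300)
for key in sorted(matrix):
    print(key, matrix[key])
```

The result is an average carried rate per node pair and service class. It says nothing about offered load or higher-order statistics, which would need additional measurements.
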
Under the existing mode of operation, it is not easy to obtain network-wide traffic demands from the local interface measurements taken by different IP routers. As explained in [20, 21], information from diverse network measurements and various configuration files is needed to infer the traffic volume. Besides raw measurement data, additional information such as topological data and router configuration data is required to obtain a network view. Furthermore, destination-based routing/forwarding in IGP provides a network operator with primitive and limited control over the routing of traffic flows. This necessitates the association of a time sequence of forwarding tables from different routers to reconstruct the different routes used by the network over time. By using this auxiliary information, together with flow-based measurements, the above-cited references describe how to determine the traffic volume from an ingress link to a set of egress links by validating and joining various data sets together.

Some shortcomings in today's method to derive traffic matrix statistics as above include the volume of data from flow-based measurement, the lack of sufficient routing control information, and the need to correlate data from a variety of sources. The routing control offered by MPLS can be used to avoid some of these deficiencies. To take advantage of this capability, path-based passive measurement should be developed. Furthermore, as explained in the section on the path-based measurement basis (Section 8.4), by aggregating the appropriate set of path-based traffic data, the corresponding node-pair-based traffic data can be obtained. This will facilitate the derivation of traffic matrix statistics, possibly on a per service class basis.

Besides traffic engineering, a major application of MPLS is the support of network-based virtual private networks (VPNs). A VPN can be an enterprise network or a carrier's carrier network.
Path-based measurement by a network operator on behalf of the VPN customers facilitates the estimation of the traffic offered by these VPNs.

12. Performance Monitoring

General aspects of measurements required to support the operation, administration, and maintenance of a network are outside the scope of this document (see [22, 23, 24] for a discussion of MPLS OAM). The focus of the measurements here is only on operations related to traffic engineering and network performance management.

A major component of performance management is performance monitoring, i.e., continuous real-time monitoring of the quality or health of the network and its various elements to ensure a sustained, uninterrupted delivery of quality service. This requires the use of measurement, either passive or active, to collect information about the operational state of the network and to track its performance. For a discussion of passive monitoring and the use of synthetic traffic sources in active probing, see .

Alarms may be generated when the state of a network element exceeds prescribed thresholds. Performance degradation can occur as a result of routing instability, congestion, or failure of network components. Periods of congestion may be detected when the resource usage of a network segment consistently exceeds a certain threshold, or when the cross-router delay is unexpectedly high. After the identification of a hot spot, active throughput measurement may be used to seek out alternate routes for congestion bypass. Unexpected excessive loss of packets or throughput drops may be used as a means of fault detection, and may result in restoration activities.

Internet utilities such as ping and traceroute have been useful in diagnosing network problems and debugging performance. Utilities with similar functions would be essential for path-oriented operations such as those in MPLS.
This would include the capability to list, at any time, (1) for a given path, all the nodes traversed by it, and (2) for a given node, all the paths originating from it, transiting through it, and/or terminating on it. A proposal for route tracing is described in .

13. Packet Sampling

A wide spectrum of operational applications can be built on traffic measurement. However, different applications usually require traffic measurements at different levels of temporal and spatial granularity. To achieve an effective tradeoff between implementation complexity and the range of operational tasks to be enabled, a passive measurement framework based on packet sampling is proposed in .

The use of packet sampling has two motivations. First, the enormous volumes of traffic require that some form of data reduction be used. Second, simple data reduction by aggregation at the measurement point will not provide sufficiently detailed views for all network management applications or exploratory studies. For this reason, packet sampling is proposed as a means to reduce data volume while still retaining representative detail.

The primary aim of the proposal  is to define a minimal set of primitive packet selection operations out of which all sampling operations that are necessary to support measurement-based applications can be composed. Operations currently under consideration include filtering and statistical sampling, and also hash-based packet selection, a method that can be used to support the determination of spatial traffic flows across a domain. Whichever method is used, the interpretation of the stream of measurements arising from sampled packets must be both transparent and standard. Other goals are to specify a means to format and export measurements, and a means to manage the configuration of the sampling and export operations.

The proposal positions these functions to provide a basic packet sampled measurement service to higher level "consumers." A typical consumer is a network management application that sits behind a remote measurement collector. Such measurements can support applications for a number of tasks: troubleshooting, demand characterization, scenario evaluation and what-ifs. Another type of consumer is a higher level on-router measurement application. One potential class of examples is composite measurements (e.g., interpacket delay statistics) formed from a number of individual packet measurements. Another class is network security applications, e.g., IP traceback . For some applications, the ability to have low latency between packet measurement and reporting will be particularly useful.

14. Statistical Estimation and Information Modeling

This section deals with engineering methods in statistical estimation, as well as the need for an information model and associated repository schema for the measurements.

14.1 Engineering methods for statistical estimation of measures

The use of the well-established methods of optimal estimation [29, 30, 31, 32] to obtain estimates of the measures for TE is recommended. This draws upon several facts:

. Internet traffic is inherently band-limited, but non-stationary;
. Internet traffic may be heavy-tailed and possess strong short-term correlations;
. A stationary, band-limited process can be approximated arbitrarily closely by optimal estimation methods based on a finite number of past samples.

Standard procedures for de-trending the raw data to provide "trend + stationary" decompositions should be adopted. An example is the use of Autoregressive Integrated Moving Average (ARIMA) models, where first differences are applied to the raw (non-stationary) data, yielding a stationary derived process. Then, the methods of optimal estimation can be applied in a practical setting (e.g., finite sample counts) to the derived stationary process to produce quality estimates of the measures defined herein. As the original raw process may be any of the measurements discussed in this document, the above procedure may be applied without loss of generality to measures of delay, loss, or complex measures of network state such as path characteristics, etc.

In addition, these methods need to be applied across multiple time scales, so that TE applications can work with measures related to:

. long-term trends over days, weeks, and months;
. busy-hour characterizations; and
. statistics and correlation properties on the order of seconds .

The above estimation procedures apply equally to traffic workload, traffic performance, or other estimates of network state, such as the state of routes.

14.2 TE Measure Information Modeling

An information model is valuable for organizing data generated through the estimation process. An information model is needed for TE measures because a complete model does not exist for these measures. Measures must be associated with a large and sometimes complicated set of attributes (e.g., as simple as an IP address of a measurement point, or as complex as the path of a round-trip measurement). Information models exist that richly describe network elements and their configuration . These models have been extended to include policy mechanisms .
Specifications for flows have been developed for network resource allocation purposes . No centralized information model exists that can completely describe many of the TE measures defined herein. Therefore, necessary integrating information models that make maximal reuse of pre-existing work may need to be developed for TE measures.

As a brief example of the limitations of existing information models, consider RFC 1363  as a model for a traffic flow. It can be described as a collection of attributes defining traffic offered load, performance to be delivered (a goal), and the assurance level (risk) associated with the actual performance obtained. The traffic offered load is specified via an envelope described by a token bucket concept (token bucket rate, bucket size) and a maximum transmission rate. This model, while clearly intended for description of what a network will tolerate of a flow, could also be used to describe a flow in a TE measure sense, e.g., "a flow that lives within the token rate x and size y with probability 0.999." Note that a probability statement must be added to complete the characterization. This type of specification is known as (sigma, rho) in the literature. Also, note that such an information model for flows lacks any flexibility to specify time scale, or more detailed second-order statistics. Similar limitations exist with respect to delivered performance specification in RFC 1363, and the text of the RFC is quick to point out, for example, that the "loss model is crude."

For these reasons, and others, an appropriate information model is needed for TE measures that can support uniformity of data definition in subsequent TE applications.

Several approaches and options for repository technology are now broadly discussed. Relationships between TE measure information models and other information models (e.g., COPS) that drive network outcomes are of particular importance.
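
Returning to the token bucket envelope discussed above, the following minimal sketch estimates the fraction of measured traffic (by bytes) that stays within a given (token rate, bucket size) envelope. This is the kind of empirical figure that could back a statement such as "within token rate x and size y with probability 0.999." The function name, the trace format, and the choice that non-conforming packets consume no tokens are assumptions made for illustration; none of them is taken from RFC 1363.

```python
def conforming_fraction(packets, token_rate, bucket_size):
    """Fraction of bytes that conform to a token bucket envelope.

    packets: iterable of (arrival_time_s, size_bytes), sorted by arrival time.
    token_rate: token fill rate in bytes per second.
    bucket_size: bucket depth in bytes (the bucket starts full).
    """
    tokens = bucket_size
    last_time = None
    conforming = 0
    total = 0
    for arrival, size in packets:
        if last_time is not None:
            # Refill for the elapsed time, capped at the bucket depth.
            tokens = min(bucket_size, tokens + (arrival - last_time) * token_rate)
        last_time = arrival
        total += size
        if size <= tokens:
            # Conforming packet: spend tokens.
            tokens -= size
            conforming += size
        # Assumption: a non-conforming packet leaves the bucket untouched.
    return conforming / total if total else 1.0


# Hypothetical trace: a short burst followed by an idle gap.
trace = [(0.0, 1000), (0.1, 1000), (0.2, 1000), (1.0, 1000)]
fraction = conforming_fraction(trace, token_rate=1000, bucket_size=1500)
print(fraction)
```

Repeating the computation over different observation windows would give one way to attach a time scale to the characterization, which the envelope alone does not provide.
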
Linkages may need to be considered between policy mechanisms and TE measures. This is useful because, while policy-driven networking is well-developed between the policy repositories, policy control points and policy enforcement, policy content is very likely the output of TE applications. Since TE applications are dependent upon TE measures, it is advantageous to provide traceability between the measures and the engineering changes made as a consequence of them.

Measures (represented by their estimates) should be centrally stored and collected. Two methods are: (1) extend MIBs with new definitions for TE measure estimates, and (2) create data repositories through more centralized facilities, such as LDAP repositories. Both methods have merits as collection processes for TE measures.

Using MIBs allows the well-established SNMP protocol and related applications to retrieve data from the network elements being measured. This is inherently "vendor-neutral," allowing commonly defined TE measurements to be stored for retrieval in a common MIB definition, regardless of network element vendor, technology or other differences. Measurements from individual network elements (interfaces, routers, etc.) can be obtained "locally," if measures from a single network element are sufficient for a given TE application. However, if a network-wide view of the measurements is desired, the drawback of a MIB-based approach is that the data must be retrieved from each element over the network. As experience attests, this approach sometimes generates significant SNMP traffic, and during periods of high congestion (when measurements may be quite important) SNMP may not reliably fetch the measurement data. Finally, a MIB-based approach may be difficult to implement for various two-point measurements, such as end-to-end or round-trip delay and delay variation.
Such measurements are not related to a single network element, and somewhat heuristic practices (e.g., storing end-to-end delay measurements in MIBs located on source address elements, etc.) are required.

An LDAP repository approach centralizes the data storage. This has the advantage that TE applications (such as offline and online TE, or measurement-based admission control) can be performed, and policy database content can be updated, without invasive retrieval of data from network-wide MIBs. Further, traceability can be established between the TE measurements in an LDAP repository and the associated policy content derived from them.

It is possible that the MIB-based and LDAP-based approaches (or another approach altogether) should be considered jointly.

15. Conclusions and Recommendations

This document is intended as a framework for traffic metrics needed for successful TE. Principles of best practice in traffic characterization and performance characterization are described. For interoperable compatibility, basic areas of traffic measurement recommended for standardization include:

. Need for uniform definitions across vendors and operators
. Distinction between traffic offered load versus achieved throughput
. Use of node-pair-based traffic data to derive per-service-class traffic matrix statistics
. Statistics of carried load versus performance
. Need for higher-order statistics for service assurance
. Need for packet sampled measurements that preserve representative traffic detail at manageable sample volumes

16. Security Considerations

The principles and concepts related to Internet traffic measurement as discussed in this document do not by themselves affect the security of the Internet. However, it is assumed that any measurement systems that are developed or deployed by a service provider are responsible for providing sufficient data integrity and confidentiality.
It is also assumed that a service provider will take proper precautions to ensure that access to its measurement systems and all associated data is secure. Methods to achieve these security considerations are not addressed in this document.

17. References

1 D.O. Awduche, A. Chiu, A. Elwalid, I. Widjaja, and X. Xiao, "Overview and Principles of Internet Traffic Engineering," Internet-Draft, Work in Progress, October 2001.
2 V. Paxson, G. Almes, J. Mahdavi, and M. Mathis, "Framework for IP Performance Metrics," RFC 2330, May 1998.
3 J. Mahdavi and V. Paxson, "IPPM Metrics for Measuring Connectivity," RFC 2678, September 1999.
4 G. Almes, S. Kalidindi, and M. Zekauskas, "A One-way Delay Metric for IPPM," RFC 2679, September 1999.
5 G. Almes, S. Kalidindi, and M. Zekauskas, "A One-way Packet Loss Metric for IPPM," RFC 2680, September 1999.
6 G. Almes, S. Kalidindi, and M. Zekauskas, "A Round-trip Delay Metric for IPPM," RFC 2681, September 1999.
7 M. Mathis and M. Allman, "A Framework for Defining Empirical Bulk Transfer Capacity Metrics," RFC 3148, July 2001.
8 C. Demichelis and P. Chimento, "IP Packet Delay Variation Metric for IPPM," Internet-Draft, Work in Progress, February 2001.
9 V. Raisanen, G. Grotefeld, and A. Morton, "Network performance measurement for periodic streams," Internet-Draft, Work in Progress, February 2002.
10 ITU-T Recommendation I.380/Y.1540, "Internet Protocol Data Communication Service -- IP Packet Transfer and Availability Performance Parameters," February 1999.
11 ITU-T Draft Recommendation Y.1541, "Network Performance Objectives for IP-Based Services," October 2001.
12 E. Rosen, A. Viswanathan, and R. Callon, "Multiprotocol Label Switching Architecture," RFC 3031, January 2001.
13 S. Bradner (Editor), "Benchmarking Terminology for Network Interconnection Devices," RFC 1242, July 1991.
14 G. Ash, "Traffic Engineering & QoS Methods for IP-, ATM-, & TDM-Based Multiservice Networks," Internet-Draft, Work in Progress, October 2001.
15 K. Kompella, Y. Rekhter, and L. Berger, "Link Bundling in MPLS Traffic Engineering," Internet-Draft, Work in Progress, February 2001.
16 W.S. Lai, "Traffic Measurement for Dimensioning and Control of IP Networks," Internet Performance and Control of Network Systems II Conference, SPIE Proceedings, Vol. 4523, Denver, Colorado, 21-22 August 2001, pp. 359-367.
17 C. Srinivasan, A. Viswanathan, and T.D. Nadeau, "MPLS Label Switch Router Management Information Base Using SMIv2," Internet-Draft, Work in Progress, January 2001.
18 C. Srinivasan, A. Viswanathan, and T.D. Nadeau, "Multiprotocol Label Switching (MPLS) Traffic Engineering Management Information Base," Internet-Draft, Work in Progress, August 2001.
19 K. Kompella, "A Traffic Engineering MIB," Internet-Draft, Work in Progress, October 2001.
20 A. Feldmann, A. Greenberg, C. Lund, N. Reingold, J. Rexford, and F. True, "Deriving Traffic Demands for Operational IP Networks: Methodology and Experience," Proc. ACM SIGCOMM 2000, Stockholm, Sweden.
21 A. Feldmann, A. Greenberg, C. Lund, N. Reingold, and J. Rexford, "NetScope: Traffic Engineering for IP Networks," IEEE Network, March/April 2000.
22 N. Harrison, P. Willis, S. Davari, E. Cuevas, B. Mack-Crane, E. Franze, H. Ohta, T. So, S. Goldfless, and F. Chen, "Requirements for OAM in MPLS Networks," Internet-Draft, Work in Progress, May 2001.
23 ITU-T Draft Recommendation Y.1710, "Requirements for OAM Functionality for MPLS Networks," May 2001.
24 ITU-T Draft Recommendation Y.1711, "OAM Mechanisms for MPLS Networks," May 2001.
25 R.G. Cole, R. Dietz, C. Kalbfleisch, and D. Romascanu, "A Framework for Synthetic Sources for Performance Monitoring," Internet-Draft, Work in Progress, May 2001.
26 R. Bonica, K. Kompella, and D. Meyer, "Tracing Requirements for Generic Tunnels," Internet-Draft, Work in Progress, February 2001.
27 R. Bush, N.G. Duffield, A. Greenberg, M. Grossglauser, J. Rexford, "A Framework for Passive Packet Measurement," Internet-Draft, Work in Progress, November 2001.
28 C. Partridge, C. Jones, D. Waitzman, and A. Snoeren, "New Protocols to Support Internet Traceback," Internet-Draft, Work in Progress, November 2001.
29 S. Haykin, Ed., "Kalman Filtering and Neural Networks," Wiley Interscience, 2001.
30 A. Papoulis, "Probability, Random Variables and Stochastic Processes," 3rd Ed., McGraw-Hill, 1991.
31 A. Gelb, Ed., "Applied Optimal Estimation," MIT Press, 1974.
32 I. R. Petersen, V. A. Ugrinovskii, A. V. Savkin, "Robust Control Design Using H-infinity Methods," Springer, 2000.
33 V. Bolotin, J. Coombs-Reyes, D. Heyman, Y. Levy, and D. Liu, "IP Traffic Characterization for Planning and Control," Proc. ITC16, Edinburgh, Scotland, June 1999.
34 Distributed Management Task Force (DMTF) Common Information Model (CIM), www.dmtf.org
35 B. Moore, E. Ellesson, and J. Strassner, "Policy Core Information Model -- Version 1 Specification," RFC 3060, February 2001.
36 C. Partridge, "A Proposed Flow Specification," RFC 1363, September 1992.

18. Acknowledgments

The support of Gerald Ash on this work and his comments are much appreciated. Thanks also for the inputs from Robert Cole, Enrique Cuevas, Alfred Morton, Moshe Segal, and the Tequila project. Nick Duffield contributed section 13 on packet sampling.

19. Author's Addresses

Wai Sum Lai
AT&T Labs
Room D5-3D18
200 Laurel Avenue
Middletown, NJ 07748, USA
Phone: +1 732-420-3712
Email: email@example.com

Blaine Christian
UUNET
Room D1-2-737
22001 Loudoun County Parkway
Ashburn, VA 20147, USA
Phone: +1 703-206-5600
Email: Blaine@uu.net

Richard W. Tibbs
Oak City Networks & Solutions
P.O. Box 10292
Raleigh, NC 27605, USA
Phone: +1 919-510-9551
Email: firstname.lastname@example.org

Steven Van den Berghe
Ghent University/IMEC
St. Pietersnieuwsstraat 41
B-9000 Ghent, Belgium
Phone: +32 9 267 35 86
E-mail: email@example.com

Full Copyright Statement

Copyright (C) The Internet Society (date). All Rights Reserved.

This document and translations of it may be copied and furnished to others, and derivative works that comment on or otherwise explain it or assist in its implementation may be prepared, copied, published and distributed, in whole or in part, without restriction of any kind, provided that the above copyright notice and this paragraph are included on all such copies and derivative works. However, this document itself may not be modified in any way, such as by removing the copyright notice or references to the Internet Society or other Internet organizations, except as needed for the purpose of developing Internet standards in which case the procedures for copyrights defined in the Internet Standards process must be followed, or as required to translate it into languages other than English.

The limited permissions granted above are perpetual and will not be revoked by the Internet Society or its successors or assigns.

This document and the information contained herein is provided on an "AS IS" basis and THE INTERNET SOCIETY AND THE INTERNET ENGINEERING TASK FORCE DISCLAIMS ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO ANY WARRANTY THAT THE USE OF THE INFORMATION HEREIN WILL NOT INFRINGE ANY RIGHTS OR ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE.