draft-ietf-opsawg-ntf-12.txt   draft-ietf-opsawg-ntf-13.txt 
OPSAWG H. Song OPSAWG H. Song
Internet-Draft Futurewei Internet-Draft Futurewei
Intended status: Informational F. Qin Intended status: Informational F. Qin
Expires: 4 June 2022 China Mobile Expires: 6 June 2022 China Mobile
P. Martinez-Julia P. Martinez-Julia
NICT NICT
L. Ciavaglia L. Ciavaglia
Rakuten Mobile Rakuten Mobile
A. Wang A. Wang
China Telecom China Telecom
1 December 2021 3 December 2021
Network Telemetry Framework Network Telemetry Framework
draft-ietf-opsawg-ntf-12 draft-ietf-opsawg-ntf-13
Abstract Abstract
Network telemetry is a technology for gaining network insight and Network telemetry is a technology for gaining network insight and
facilitating efficient and automated network management. It facilitating efficient and automated network management. It
encompasses various techniques for remote data generation, encompasses various techniques for remote data generation,
collection, correlation, and consumption. This document describes an collection, correlation, and consumption. This document describes an
architectural framework for network telemetry, motivated by architectural framework for network telemetry, motivated by
challenges that are encountered as part of the operation of networks challenges that are encountered as part of the operation of networks
and by the requirements that ensue. This document clarifies the and by the requirements that ensue. This document clarifies the
skipping to change at page 1, line 48 skipping to change at page 1, line 48
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on 4 June 2022. This Internet-Draft will expire on 6 June 2022.
Copyright Notice Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents (https://trustee.ietf.org/ Provisions Relating to IETF Documents (https://trustee.ietf.org/
license-info) in effect on the date of publication of this document. license-info) in effect on the date of publication of this document.
Please review these documents carefully, as they describe your rights Please review these documents carefully, as they describe your rights
skipping to change at page 2, line 27 skipping to change at page 2, line 27
provided without warranty as described in the Revised BSD License. provided without warranty as described in the Revised BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Applicability Statement . . . . . . . . . . . . . . . . . 4 1.1. Applicability Statement . . . . . . . . . . . . . . . . . 4
1.2. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 4 1.2. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 4
2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 6 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 7 2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 7
2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8 2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 8
2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 10 2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 9
2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 11 2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 11
2.5. The Necessity of a Network Telemetry Framework . . . . . 13 2.5. The Necessity of a Network Telemetry Framework . . . . . 13
3. Network Telemetry Framework . . . . . . . . . . . . . . . . . 14 3. Network Telemetry Framework . . . . . . . . . . . . . . . . . 14
3.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 15 3.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 15
3.1.1. Management Plane Telemetry . . . . . . . . . . . . . 18 3.1.1. Management Plane Telemetry . . . . . . . . . . . . . 18
3.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 18 3.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 18
3.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 19 3.1.3. Forwarding Plane Telemetry . . . . . . . . . . . . . 19
3.1.4. External Data Telemetry . . . . . . . . . . . . . . . 21 3.1.4. External Data Telemetry . . . . . . . . . . . . . . . 21
3.2. Second Level Function Components . . . . . . . . . . . . 22 3.2. Second Level Function Components . . . . . . . . . . . . 22
3.3. Data Acquisition Mechanism and Type Abstraction . . . . . 24 3.3. Data Acquisition Mechanism and Type Abstraction . . . . . 24
skipping to change at page 4, line 9 skipping to change at page 4, line 9
different category of telemetry data and corresponding procedures. different category of telemetry data and corresponding procedures.
All the modules are internally structured in the same way, including All the modules are internally structured in the same way, including
components that allow the operator to configure data sources in components that allow the operator to configure data sources in
regard to what data to generate and how to make that available to regard to what data to generate and how to make that available to
client applications, components that instrument the underlying data client applications, components that instrument the underlying data
sources, and components that perform the actual rendering, encoding, sources, and components that perform the actual rendering, encoding,
and exporting of the generated data. We show how the network and exporting of the generated data. We show how the network
telemetry framework can benefit the current and future network telemetry framework can benefit the current and future network
operations. Based on the distinction of modules and function operations. Based on the distinction of modules and function
components, we can map the existing and emerging techniques and components, we can map the existing and emerging techniques and
protocols into the framework. The framework can also simplify the protocols into the framework. The framework can also simplify
designing, maintaining, and understanding a network telemetry system. designing, maintaining, and understanding a network telemetry system.
In addition, we outline the evolution stages of the network telemetry In addition, we outline the evolution stages of the network telemetry
system and discuss the potential security concerns. system and discuss the potential security concerns.
The purpose of the framework and taxonomy is to set a common ground The purpose of the framework and taxonomy is to set a common ground
for the collection of related work and provide guidance for future for the collection of related work and provide guidance for future
technique and standard developments. To the best of our knowledge, technique and standard developments. To the best of our knowledge,
this document is the first such effort for network telemetry in this document is the first such effort for network telemetry in
industry standards organizations. This document does not define industry standards organizations. This document does not define
specific technologies. specific technologies.
skipping to change at page 4, line 44 skipping to change at page 4, line 44
Before further discussion, we list some key terminology and acronyms Before further discussion, we list some key terminology and acronyms
used in this document. We make an intentional distinction between used in this document. We make an intentional distinction between
the terms network telemetry and OAM. However, it should be the terms network telemetry and OAM. However, it should be
understood that there is not a hard-line distinction between the two understood that there is not a hard-line distinction between the two
concepts. Rather, network telemetry is considered as an extension of concepts. Rather, network telemetry is considered as an extension of
OAM. It covers all the existing OAM protocols but puts more emphasis OAM. It covers all the existing OAM protocols but puts more emphasis
on the newer and emerging techniques and protocols concerning all on the newer and emerging techniques and protocols concerning all
aspects of network data from acquisition to consumption. aspects of network data from acquisition to consumption.
AI: Artificial Intelligence. In network domain, AI refers to the AI: Artificial Intelligence. In the network domain, AI refers to
machine-learning based technologies for automated network the machine-learning based technologies for automated network
operation and other tasks. operation and other tasks.
AM: Alternate Marking, a flow performance measurement method, AM: Alternate Marking, a flow performance measurement method,
specified in [RFC8321]. specified in [RFC8321].
BMP: BGP Monitoring Protocol, specified in [RFC7854]. BMP: BGP Monitoring Protocol, specified in [RFC7854].
DPI: Deep Packet Inspection, referring to the techniques that DPI: Deep Packet Inspection, referring to the techniques that
examine packets beyond the L3/L4 headers. examine packets beyond the L3/L4 headers.
skipping to change at page 6, line 15 skipping to change at page 6, line 15
[I-D.ietf-ippm-ioam-direct-export]. [I-D.ietf-ippm-ioam-direct-export].
RESTCONF: An HTTP-based protocol that provides a programmatic RESTCONF: An HTTP-based protocol that provides a programmatic
interface for accessing data defined in YANG, using the datastore interface for accessing data defined in YANG, using the datastore
concepts defined in NETCONF, as specified in [RFC8040]. concepts defined in NETCONF, as specified in [RFC8040].
SMIv2: Structure of Management Information Version 2, defining MIB SMIv2: Structure of Management Information Version 2, defining MIB
objects, specified in [RFC2578]. objects, specified in [RFC2578].
SNMP: Simple Network Management Protocol. Versions 1, 2, and 3 are SNMP: Simple Network Management Protocol. Versions 1, 2, and 3 are
specified in [RFC1157], [RFC3416], and [RFC3414], respectively. specified in [RFC1157], [RFC3416], and [RFC3411], respectively.
XML: Extensible Markup Language is a markup language for data XML: Extensible Markup Language is a markup language for data
encoding that is both human-readable and machine-readable, encoding that is both human-readable and machine-readable,
specified by W3C [xml]. specified by W3C [xml].
YANG: YANG is a data modeling language for the definition of data YANG: YANG is a data modeling language for the definition of data
sent over network management protocols such as the NETCONF and sent over network management protocols such as the NETCONF and
RESTCONF. YANG is defined in [RFC6020] and [RFC7950]. RESTCONF. YANG is defined in [RFC6020] and [RFC7950].
YANG ECA: A YANG model for Event-Condition-Action policies, defined YANG ECA: A YANG model for Event-Condition-Action policies, defined
skipping to change at page 7, line 9 skipping to change at page 7, line 9
technologies, network big data analytics gives network operators an technologies, network big data analytics gives network operators an
opportunity to gain network insights and move towards network opportunity to gain network insights and move towards network
autonomy. Some operators start to explore the application of autonomy. Some operators start to explore the application of
Artificial Intelligence (AI) to make sense of network data. Software Artificial Intelligence (AI) to make sense of network data. Software
tools can use the network data to detect and react to network faults, tools can use the network data to detect and react to network faults,
anomalies, and policy violations, as well as to predict future anomalies, and policy violations, as well as to predict future
events. In turn, the network policy updates for planning, intrusion events. In turn, the network policy updates for planning, intrusion
prevention, optimization, and self-healing may be applied. prevention, optimization, and self-healing may be applied.
It is conceivable that an autonomic network [RFC7575] is the logical It is conceivable that an autonomic network [RFC7575] is the logical
next step for network evolution following Software Defined Network next step for network evolution following Software Defined Networking
(SDN), aiming to reduce (or even eliminate) human labor, make more (SDN), aiming to reduce (or even eliminate) human labor, make more
efficient use of network resources, and provide better services more efficient use of network resources, and provide better services more
aligned with customer requirements. The related technique of aligned with customer requirements. The IETF ANIMA working group is
Intent-based Networking (IBN) dedicated to developing and maintaining protocols and procedures for
automated network management and control of professionally-managed
networks. The related technique of Intent-based Networking (IBN)
[I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility [I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility
and telemetry data in order to ensure that the network is behaving as and telemetry data in order to ensure that the network is behaving as
intended. intended.
However, while the data processing capability is improved and However, while the data processing capability is improved and
applications require more data to function better, the networks lag applications require more data to function better, the networks lag
behind in extracting and translating network data into useful and behind in extracting and translating network data into useful and
actionable information in efficient ways. The system bottleneck is actionable information in efficient ways. The system bottleneck is
shifting from data consumption to data supply. Both the number of shifting from data consumption to data supply. Both the number of
network nodes and the traffic bandwidth keep increasing at a fast network nodes and the traffic bandwidth keep increasing at a fast
skipping to change at page 7, line 37 skipping to change at page 7, line 39
a nutshell, it is a challenge to get enough high-quality data out of a nutshell, it is a challenge to get enough high-quality data out of
the network in a manner that is efficient, timely, and flexible. the network in a manner that is efficient, timely, and flexible.
Therefore, we need to survey the existing technologies and protocols Therefore, we need to survey the existing technologies and protocols
and identify any potential gaps. and identify any potential gaps.
In the remainder of this section, first we clarify the scope of In the remainder of this section, first we clarify the scope of
network data (i.e., telemetry data) relevant in this document. Then, network data (i.e., telemetry data) relevant in this document. Then,
we discuss several key use cases for today's and future network we discuss several key use cases for today's and future network
operations. Next, we show why the current network OAM techniques and operations. Next, we show why the current network OAM techniques and
protocols are insufficient for these use cases. The discussion protocols are insufficient for these use cases. The discussion
underlines the need of new methods, techniques, and protocols, as underlines the need for new methods, techniques, and protocols, as
well as the extensions of existing ones, which we assign under the well as the extensions of existing ones, which we assign under the
umbrella term - Network Telemetry. umbrella term - Network Telemetry.
2.1. Telemetry Data Coverage 2.1. Telemetry Data Coverage
Any information that can be extracted from networks (including data Any information that can be extracted from networks (including data
plane, control plane, and management plane) and used to gain plane, control plane, and management plane) and used to gain
visibility or as a basis for actions is considered telemetry data. It visibility or as a basis for actions is considered telemetry data. It
includes statistics, event records and logs, snapshots of state, includes statistics, event records and logs, snapshots of state,
configuration data, etc. It also covers the outputs of any active configuration data, etc. It also covers the outputs of any active
skipping to change at page 8, line 20 skipping to change at page 8, line 22
2.2. Use Cases 2.2. Use Cases
The following set of use cases is essential for network operations. The following set of use cases is essential for network operations.
While the list is by no means exhaustive, it is enough to highlight While the list is by no means exhaustive, it is enough to highlight
the requirements for data velocity, variety, volume, and veracity, the requirements for data velocity, variety, volume, and veracity,
the attributes of big data, in networks. the attributes of big data, in networks.
* Security: Network intrusion detection and prevention systems need * Security: Network intrusion detection and prevention systems need
to monitor network traffic and activities and act upon anomalies. to monitor network traffic and activities and act upon anomalies.
Given increasingly sophisticated attack vector coupled with Given increasingly sophisticated attack vectors coupled with
increasingly severe consequences of security breaches, new tools increasingly severe consequences of security breaches, new tools
and techniques need to be developed, relying on wider and deeper and techniques need to be developed, relying on wider and deeper
visibility into networks. The ultimate goal is to achieve the visibility into networks. The ultimate goal is to achieve
security with no, or only minimal, human intervention. security with no, or only minimal, human intervention, and without
disrupting legitimate traffic flows.
* Policy and Intent Compliance: Network policies are the rules that * Policy and Intent Compliance: Network policies are the rules that
constrain the services for network access, provide service constrain the services for network access, provide service
differentiation, or enforce specific treatment on the traffic. differentiation, or enforce specific treatment on the traffic.
For example, a service function chain is a policy that requires For example, a service function chain is a policy that requires
the selected flows to pass through a set of ordered network the selected flows to pass through a set of ordered network
functions. Intent, as defined in functions. Intent, as defined in
[I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational
goals that a network should meet and outcomes that a network is goals that a network should meet and outcomes that a network is
supposed to deliver, defined in a declarative manner without supposed to deliver, defined in a declarative manner without
specifying how to achieve or implement them. An intent requires a specifying how to achieve or implement them. An intent requires a
complex translation and mapping process before being applied on complex translation and mapping process before being applied on
networks. While a policy or intent is enforced, the compliance networks. While a policy or intent is enforced, the compliance
needs to be verified and monitored continuously by relying on needs to be verified and monitored continuously by relying on
visibility that is provided through network telemetry data. Any visibility that is provided through network telemetry data. Any
violation must be notified immediately, potentially resulting in violation must be reported immediately, potentially resulting in
updates to how the policy or intent is applied in the network to updates to how the policy or intent is applied in the network to
ensure that it remains in force, or otherwise alerting the network ensure that it remains in force, or otherwise alerting the network
administrator to the policy or intent violation. administrator to the policy or intent violation.
* SLA Compliance: A Service-Level Agreement (SLA) is a service * SLA Compliance: A Service-Level Agreement (SLA) is a service
contract between a service provider and a client, which includes contract between a service provider and a client, which includes
the metrics for the service measurement and remedy/penalty the metrics for the service measurement and remedy/penalty
procedures when the service level misses the agreement. Users procedures when the service level misses the agreement. Users
need to check if they get the service as promised and network need to check if they get the service as promised and network
operators need to evaluate how they can deliver the services that operators need to evaluate how they can deliver services that can
can meet the SLA based on realtime network telemetry data, meet the SLA based on realtime network telemetry data, including
including data from network measurements. data from network measurements.
* Root Cause Analysis: Many network failures can be the effect of a * Root Cause Analysis: Many network failures can be the effect of a
sequence of chained events. Troubleshooting and recovery require sequence of chained events. Troubleshooting and recovery require
quick identification of the root cause of any observable issues. quick identification of the root cause of any observable issues.
However, the root cause is not always straightforward to identify, However, the root cause is not always straightforward to identify,
especially when the failure is sporadic and the number of event especially when the failure is sporadic and the number of event
messages, both related and unrelated to the same cause, is messages, both related and unrelated to the same cause, is
overwhelming. While technologies such as machine learning can be overwhelming. While technologies such as machine learning can be
used for root cause analysis, it is up to the network to sense and used for root cause analysis, it is up to the network to sense and
provide the relevant diagnostic data which are either actively fed provide the relevant diagnostic data which are either actively fed
skipping to change at page 10, line 21 skipping to change at page 10, line 12
techniques are not sufficient to support the above use cases for the techniques are not sufficient to support the above use cases for the
following reasons: following reasons:
* Most use cases need to continuously monitor the network and * Most use cases need to continuously monitor the network and
dynamically refine the data collection in real-time. Poll-based dynamically refine the data collection in real-time. Poll-based
low-frequency data collection is ill-suited for these low-frequency data collection is ill-suited for these
applications. Subscription-based streaming data directly pushed applications. Subscription-based streaming data directly pushed
from the data source (e.g., the forwarding chip) is preferred to from the data source (e.g., the forwarding chip) is preferred to
provide sufficient data quantity and precision at scale. provide sufficient data quantity and precision at scale.
* Comprehensive data is needed from packet processing engines to * Comprehensive data is needed, ranging from packet processing
traffic manager, from line cards to main control board, from user engines to traffic manager, from line cards to main control board,
flows to control protocol packets, from device configurations to from user flows to control protocol packets, from device
operations, and from physical layer to application layer. configurations to operations, and from physical layer to
Conventional OAM only covers a narrow range of data (e.g., SNMP application layer. Conventional OAM only covers a narrow range of
only handles data from the Management Information Base (MIB)). data (e.g., SNMP only handles data from the Management Information
Classical network devices cannot provide all the necessary probes. Base (MIB)). Classical network devices cannot provide all the
More open and programmable network devices are therefore needed. necessary probes. More open and programmable network devices are
therefore needed.
* Many application scenarios need to correlate network-wide data * Many application scenarios need to correlate network-wide data
from multiple sources (i.e., from distributed network devices, from multiple sources (i.e., from distributed network devices,
different components of a network device, or different network different components of a network device, or different network
planes). A piecemeal solution is often lacking the capability to planes). A piecemeal solution is often lacking the capability to
consolidate the data from multiple sources. The composition of a consolidate the data from multiple sources. The composition of a
complete solution, as partly proposed by Autonomic Resource complete solution, as partly proposed by Autonomic Resource
Control Architecture (ARCA) Control Architecture (ARCA)
[I-D.pedro-nmrg-anticipated-adaptation], will be empowered and [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
guided by a comprehensive framework. guided by a comprehensive framework.
skipping to change at page 11, line 6 skipping to change at page 10, line 46
* Although some conventional OAM techniques support data push (e.g., * Although some conventional OAM techniques support data push (e.g.,
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow [RFC3176]), the
pushed data are limited to only predefined management plane pushed data are limited to only predefined management plane
warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow). warnings (e.g., SNMP Trap) or sampled user packets (e.g., sFlow).
Network operators require data from arbitrary sources, at arbitrary Network operators require data from arbitrary sources, at arbitrary
granularity and precision, which is beyond the capability of the granularity and precision, which is beyond the capability of the
existing techniques. existing techniques.
* The conventional passive measurement techniques can either consume * The conventional passive measurement techniques can either consume
excessive network resources and render excessive redundant data, excessive network resources and produce excessive redundant data,
or lead to inaccurate results; on the other hand, the conventional or lead to inaccurate results; on the other hand, the conventional
active measurement techniques can interfere with the user traffic active measurement techniques can interfere with the user traffic
and their results are indirect. Techniques that can collect and their results are indirect. Techniques that can collect
direct and on-demand data from user traffic are more favorable. direct and on-demand data from user traffic are more favorable.
These challenges were addressed by newer standards and techniques These challenges were addressed by newer standards and techniques
(e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push) (e.g., IPFIX/Netflow, Packet Sampling (PSAMP), IOAM, and YANG-Push)
and more are emerging. These standards and techniques need to be and more are emerging. These standards and techniques need to be
recognized and accommodated in a new framework. recognized and accommodated in a new framework.
skipping to change at page 12, line 7 skipping to change at page 11, line 48
telemetry collectors subscribe to streaming data pushed from data telemetry collectors subscribe to streaming data pushed from data
sources in network devices. sources in network devices.
* Volume and Velocity: The telemetry data is intended to be consumed * Volume and Velocity: The telemetry data is intended to be consumed
by machines rather than by human beings. Therefore, the data by machines rather than by human beings. Therefore, the data
volume can be huge and the processing is optimized for the needs volume can be huge and the processing is optimized for the needs
of automation in realtime. of automation in realtime.
* Normalization and Unification: Telemetry aims to address the * Normalization and Unification: Telemetry aims to address the
overall network automation needs. Efforts are made to normalize overall network automation needs. Efforts are made to normalize
the data representation and unify the protocols, so to simplify the data representation and unify the protocols, so as to simplify
data analysis and provide integrated analysis across heterogeneous data analysis and provide integrated analysis across heterogeneous
devices and data sources across a network. devices and data sources across a network.
* Model-based: The telemetry data is modeled in advance, which allows * Model-based: The telemetry data is modeled in advance, which allows
applications to configure and consume data with ease. applications to configure and consume data with ease.
* Data Fusion: The data for a single application can come from * Data Fusion: The data for a single application can come from
multiple data sources (e.g., cross-domain, cross-device, and multiple data sources (e.g., cross-domain, cross-device, and
cross-layer) based on common naming/ID and needs to be correlated cross-layer) based on common naming/ID and needs to be correlated
to take effect. to take effect.
skipping to change at page 14, line 20 skipping to change at page 14, line 13
and path can be acquired. An application may need to switch its and path can be acquired. An application may need to switch its
viewpoint during operation. It may also need to correlate a viewpoint during operation. It may also need to correlate a
service and its impact on user experience to acquire the service and its impact on user experience to acquire the
comprehensive information. comprehensive information.
* Applications require network telemetry to be elastic in order to * Applications require network telemetry to be elastic in order to
make efficient use of network resources and reduce the impact of make efficient use of network resources and reduce the impact of
processing related to network telemetry on network performance. processing related to network telemetry on network performance.
For example, routine network monitoring should cover the entire For example, routine network monitoring should cover the entire
network with a low data sampling rate. Only when issues arise or network with a low data sampling rate. Only when issues arise or
critical trends emerge should telemetry data source be modified critical trends emerge should telemetry data sources be modified
and telemetry data rates boosted as needed. and telemetry data rates boosted as needed.
* Efficient data aggregation is critical for applications to reduce * Efficient data aggregation is critical for applications to reduce
the overall quantity of data and improve the accuracy of analysis. the overall quantity of data and improve the accuracy of analysis.
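The elasticity requirement above can be sketched as follows. This is a minimal illustration, not part of either draft revision: the function name, the per-device "anomaly score", the threshold, and both sampling rates are hypothetical values chosen for the example. It shows routine low-rate coverage of every device, with the rate boosted only where an issue appears.

```python
from typing import Dict


def plan_sampling(
    anomaly_scores: Dict[str, float],
    threshold: float = 0.8,      # hypothetical trigger level
    base_rate: float = 0.001,    # hypothetical routine sampling rate
    boosted_rate: float = 0.1,   # hypothetical rate used while investigating
) -> Dict[str, float]:
    """Return a sampling rate per device.

    Every device is always covered at base_rate; only devices whose
    anomaly score reaches the threshold are sampled at boosted_rate.
    """
    if not 0 < base_rate <= boosted_rate <= 1:
        raise ValueError("rates must satisfy 0 < base_rate <= boosted_rate <= 1")
    return {
        device: boosted_rate if score >= threshold else base_rate
        for device, score in anomaly_scores.items()
    }
```

Calling plan_sampling({"r1": 0.1, "r2": 0.95}) keeps r1 at the routine rate and raises only r2, so the telemetry load grows only where it is needed.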
A telemetry framework collects together all the telemetry-related A telemetry framework collects together all the telemetry-related
works from different sources and working groups within IETF. This works from different sources and working groups within IETF. This
makes it possible to assemble a comprehensive network telemetry makes it possible to assemble a comprehensive network telemetry
system and to avoid repetitious or redundant work. The framework system and to avoid repetitious or redundant work. The framework
should cover the concepts and components from the standardization should cover the concepts and components from the standardization
skipping to change at page 18, line 16 skipping to change at page 18, line 16
follows (note that the requirements may pertain across all telemetry follows (note that the requirements may pertain across all telemetry
modules; however, we emphasize those that are most pronounced for a modules; however, we emphasize those that are most pronounced for a
particular plane). particular plane).
3.1.1. Management Plane Telemetry 3.1.1. Management Plane Telemetry
The management plane of network elements interacts with the Network The management plane of network elements interacts with the Network
Management System (NMS), and provides information such as performance Management System (NMS), and provides information such as performance
data, network logging data, network warning and defects data, and data, network logging data, network warning and defects data, and
network statistics and state data. The management plane includes network statistics and state data. The management plane includes
many protocols, including some that are considered "legacy", such as many protocols, including the classical SNMP and syslog. Regardless of
SNMP and syslog. Regardless of the protocol, management plane telemetry the protocol, management plane telemetry must address the following
must address the following requirements: requirements:
* Convenient Data Subscription: An application should have the * Convenient Data Subscription: An application should have the
freedom to choose which data is exported (see section 4.3) and the freedom to choose which data is exported (see section 4.3) and the
means and frequency of how that data is exported (e.g., on-change means and frequency of how that data is exported (e.g., on-change
or periodic subscription). or periodic subscription).
* Structured Data: For automatic network operation, machines will * Structured Data: For automatic network operation, machines will
replace humans in network data comprehension. Data modeling replace humans in network data comprehension. Data modeling
languages, such as YANG, can efficiently describe structured data languages, such as YANG, can efficiently describe structured data
and normalize data encoding and transformation. and normalize data encoding and transformation.
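The two subscription modes named above (on-change and periodic) can be contrasted with a small sketch. It is an illustration under stated assumptions, not the API of any telemetry protocol: the function names and the (timestamp, value) sample format are hypothetical.

```python
from typing import Any, Iterable, Iterator, Tuple

Sample = Tuple[float, Any]  # (timestamp in seconds, observed value)


def periodic(samples: Iterable[Sample], period: float) -> Iterator[Sample]:
    """Export a sample at most once per `period` seconds."""
    next_due = None
    for ts, value in samples:
        if next_due is None or ts >= next_due:
            yield ts, value
            next_due = ts + period


def on_change(samples: Iterable[Sample]) -> Iterator[Sample]:
    """Export a sample only when its value differs from the last export."""
    last = object()  # sentinel that never equals a real value
    for ts, value in samples:
        if value != last:
            yield ts, value
            last = value
```

For a slowly changing state such as an interface status, on_change exports only the transitions, while periodic exports at a fixed cadence regardless of whether the value changed.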
skipping to change at page 19, line 25 skipping to change at page 19, line 25
* Conventional OAM-based approaches for control plane KPI * Conventional OAM-based approaches for control plane KPI
measurement include Ping (L3), Traceroute (L3), Y.1731 [y1731] measurement include Ping (L3), Traceroute (L3), Y.1731 [y1731]
(L2), and so on. One common issue behind these methods is that (L2), and so on. One common issue behind these methods is that
they only measure the KPIs instead of reflecting the actual they only measure the KPIs instead of reflecting the actual
running status of these protocols, making them less effective or running status of these protocols, making them less effective or
efficient for control plane troubleshooting and network efficient for control plane troubleshooting and network
optimization. optimization.
* An example of the control plane telemetry is the BGP monitoring * An example of the control plane telemetry is the BGP monitoring
protocol (BMP), it is currently used for monitoring the BGP routes protocol (BMP). It is currently used for monitoring the BGP
and enables rich applications, such as BGP peer analysis, AS routes and enables rich applications, such as BGP peer analysis,
analysis, prefix analysis, and security analysis. However, the AS analysis, prefix analysis, and security analysis. However, the
monitoring of other layers, protocols and the cross-layer, cross- monitoring of other layers, protocols and the cross-layer, cross-
protocol KPI correlations are still in their infancy (e.g., IGP protocol KPI correlations are still in their infancy (e.g., IGP
monitoring is not as extensive as BMP), which require further monitoring is not as extensive as BMP), which require further
research. research.
* The requirement and solutions for network congestion avoidance are * The requirement and solutions for network congestion avoidance are
also applicable to the control plane telemetry. also applicable to the control plane telemetry.
3.1.3. Forwarding Plane Telemetry 3.1.3. Forwarding Plane Telemetry
skipping to change at page 23, line 14 skipping to change at page 23, line 14
data. On the other hand, it receives, stores, and processes the data. On the other hand, it receives, stores, and processes the
returned data from network devices. Data analysis can be returned data from network devices. Data analysis can be
interactive to initiate further data queries. This component can interactive to initiate further data queries. This component can
reside in either network devices or remote controllers. It can be reside in either network devices or remote controllers. It can be
centralized and distributed, and involve one or more instances. centralized and distributed, and involve one or more instances.
* Data Configuration and Subscription: This component manages data * Data Configuration and Subscription: This component manages data
queries on devices. It determines the protocol and channel for queries on devices. It determines the protocol and channel for
applications to acquire desired data. This component is also applications to acquire desired data. This component is also
responsible for configuring the desired data that might not be responsible for configuring the desired data that might not be
directly available form data sources. The subscription data can directly available from data sources. The subscription data can
be described by models, templates, or programs. be described by models, templates, or programs.
* Data Encoding and Export: This component determines how telemetry * Data Encoding and Export: This component determines how telemetry
data is delivered to the data analysis and storage component with data is delivered to the data analysis and storage component with
access control. The data encoding and the transport protocol may access control. The data encoding and the transport protocol may
vary due to the data export location. vary due to the data export location.
* Data Generation and Processing: The requested data needs to be * Data Generation and Processing: The requested data needs to be
captured, filtered, processed, and formatted in network devices captured, filtered, processed, and formatted in network devices
from raw data sources. This may involve in-network computing and from raw data sources. This may involve in-network computing and
skipping to change at page 28, line 41 skipping to change at page 28, line 41
transporting, and analyzing a wide variety of data sources in support transporting, and analyzing a wide variety of data sources in support
of network applications. The protocols, data formats, and of network applications. The protocols, data formats, and
configurations chosen to implement this framework will dictate the configurations chosen to implement this framework will dictate the
specific security considerations. These considerations may include: specific security considerations. These considerations may include:
* Telemetry framework trust and policy model; * Telemetry framework trust and policy model;
* Role management and access control for enabling and disabling * Role management and access control for enabling and disabling
telemetry capabilities; telemetry capabilities;
* Protocol transport used telemetry data and inherent security * Protocol transport used for telemetry data and its inherent
capabilities; security capabilities;
* Telemetry data stores, storage encryption, methods of access, and * Telemetry data stores, storage encryption, methods of access, and
retention practices; retention practices;
* Tracking telemetry events and any abnormalities that might * Tracking telemetry events and any abnormalities that might
identify malicious attacks using telemetry interfaces. identify malicious attacks using telemetry interfaces.
* Authentication and signing of telemetry data to make data more * Authentication and integrity protection of telemetry data to make
trustworthy. data more trustworthy.
* Segregating the telemetry data traffic from the data traffic * Segregating the telemetry data traffic from the data traffic
carried over the network (e.g., historically management access and carried over the network (e.g., historically management access and
management data may be carried via an independent management management data may be carried via an independent management
network). network).
Some security considerations highlighted above may be minimized or Some security considerations highlighted above may be minimized or
negated with policy management of network telemetry. In a network negated with policy management of network telemetry. In a network
telemetry deployment it would be advantageous to separate telemetry telemetry deployment it would be advantageous to separate telemetry
capabilities into different classes of policies, i.e., Role Based capabilities into different classes of policies, i.e., Role Based
skipping to change at page 29, line 40 skipping to change at page 29, line 40
The other contributors of this document are Tianran Zhou, Zhenbin Li, The other contributors of this document are Tianran Zhou, Zhenbin Li,
Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm. Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm.
8. Acknowledgments 8. Acknowledgments
We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe
Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe
Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra, Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra,
Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin
Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Eric Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Eric
Vyncke, Jean-Michel Combes, and many others who have provided helpful Vyncke, Jean-Michel Combes, Erik Kline, Benjamin Kaduk, and many
comments and suggestions to improve this document. others who have provided helpful comments and suggestions to improve
this document.
9. Informative References 9. Informative References
[gnmi] "gNMI - gRPC Network Management Interface", [gnmi] "gNMI - gRPC Network Management Interface",
<https://github.com/openconfig/reference/tree/master/rpc/ <https://github.com/openconfig/reference/tree/master/rpc/
gnmi>. gnmi>.
[gpb] "Google Protocol Buffers", [gpb] "Google Protocol Buffers",
<https://developers.google.com/protocol-buffers>. <https://developers.google.com/protocol-buffers>.
skipping to change at page 32, line 14 skipping to change at page 32, line 14
[RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, [RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981,
DOI 10.17487/RFC2981, October 2000, DOI 10.17487/RFC2981, October 2000,
<https://www.rfc-editor.org/info/rfc2981>. <https://www.rfc-editor.org/info/rfc2981>.
[RFC3176] Phaal, P., Panchen, S., and N. McKee, "InMon Corporation's [RFC3176] Phaal, P., Panchen, S., and N. McKee, "InMon Corporation's
sFlow: A Method for Monitoring Traffic in Switched and sFlow: A Method for Monitoring Traffic in Switched and
Routed Networks", RFC 3176, DOI 10.17487/RFC3176, Routed Networks", RFC 3176, DOI 10.17487/RFC3176,
September 2001, <https://www.rfc-editor.org/info/rfc3176>. September 2001, <https://www.rfc-editor.org/info/rfc3176>.
[RFC3414] Blumenthal, U. and B. Wijnen, "User-based Security Model [RFC3411] Harrington, D., Presuhn, R., and B. Wijnen, "An
(USM) for version 3 of the Simple Network Management Architecture for Describing Simple Network Management
Protocol (SNMPv3)", STD 62, RFC 3414, Protocol (SNMP) Management Frameworks", STD 62, RFC 3411,
DOI 10.17487/RFC3414, December 2002, DOI 10.17487/RFC3411, December 2002,
<https://www.rfc-editor.org/info/rfc3414>. <https://www.rfc-editor.org/info/rfc3411>.
[RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations [RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations
for the Simple Network Management Protocol (SNMP)", for the Simple Network Management Protocol (SNMP)",
STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002, STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002,
<https://www.rfc-editor.org/info/rfc3416>. <https://www.rfc-editor.org/info/rfc3416>.
[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management
Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
September 2004, <https://www.rfc-editor.org/info/rfc3877>. September 2004, <https://www.rfc-editor.org/info/rfc3877>.
skipping to change at page 39, line 23 skipping to change at page 39, line 23
[I-D.ietf-ippm-ioam-direct-export] and IOAM Marking [I-D.ietf-ippm-ioam-direct-export] and IOAM Marking
[I-D.song-ippm-postcard-based-telemetry], is a complementary [I-D.song-ippm-postcard-based-telemetry], is a complementary
technique to the passport-based IOAM. PBT directly exports data at technique to the passport-based IOAM. PBT directly exports data at
each node through an independent packet. At the cost of higher each node through an independent packet. At the cost of higher
bandwidth overhead and the need for data correlation, PBT shows bandwidth overhead and the need for data correlation, PBT shows
several unique advantages. It can also help to identify packet drop several unique advantages. It can also help to identify packet drop
location in case a packet is dropped on its forwarding path. location in case a packet is dropped on its forwarding path.
A.3.6. Existing OAM for Specific Data Planes A.3.6. Existing OAM for Specific Data Planes
Various data planes raises unique OAM requirements. IETF has Various data planes raise unique OAM requirements. IETF has
published OAM technique and framework documents (e.g., [RFC8924] and published OAM technique and framework documents (e.g., [RFC8924] and
[RFC5085]) targeting different data planes such as Multi-Protocol [RFC5085]) targeting different data planes such as Multi-Protocol
Label Switching (MPLS), L2 Virtual Private Network (L2-VPN), Network Label Switching (MPLS), L2 Virtual Private Network (L2-VPN), Network
Virtualization Overlays (NVO3), Virtual Extensible LAN (VXLAN), Bit Virtualization Overlays (NVO3), Virtual Extensible LAN (VXLAN), Bit
Indexed Explicit Replication (BIER), Service Function Chaining (SFC), Indexed Explicit Replication (BIER), Service Function Chaining (SFC),
Segment Routing (SR), and Deterministic Networking (DETNET). The Segment Routing (SR), and Deterministic Networking (DETNET). The
aforementioned data plane telemetry techniques can be used to enhance aforementioned data plane telemetry techniques can be used to enhance
the OAM capability on such data planes. the OAM capability on such data planes.
A.4. External Data and Event Telemetry A.4. External Data and Event Telemetry
 End of changes. 27 change blocks. 
50 lines changed or deleted 55 lines changed or added
