draft-ietf-opsawg-ntf-11.txt   draft-ietf-opsawg-ntf-12.txt 
OPSAWG
Internet-Draft
Intended status: Informational
Expires: {ntf-11: 2 June 2022 | ntf-12: 4 June 2022}
H. Song, Futurewei
F. Qin, China Mobile
P. Martinez-Julia, NICT
L. Ciavaglia, Rakuten Mobile
A. Wang, China Telecom
{ntf-11: 29 November 2021 | ntf-12: 1 December 2021}
Network Telemetry Framework
{ntf-11: draft-ietf-opsawg-ntf-11 | ntf-12: draft-ietf-opsawg-ntf-12}
Abstract
Network telemetry is a technology for gaining network insight and
facilitating efficient and automated network management. It
encompasses various techniques for remote data generation,
collection, correlation, and consumption. This document describes an
architectural framework for network telemetry, motivated by
challenges that are encountered as part of the operation of networks
and by the requirements that ensue. This document clarifies the
skipping to change at page 1, line 48
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current
Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."
This Internet-Draft will expire on {ntf-11: 2 June 2022 | ntf-12: 4 June 2022}.
Copyright Notice
Copyright (c) 2021 IETF Trust and the persons identified as the
document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must
include Revised BSD License text as described in Section 4.e of the
Trust Legal Provisions and are provided without warranty as described
in the Revised BSD License.
Table of Contents
Section                                                  ntf-11  ntf-12
1.      Introduction                                          3       3
1.1.    Glossary                                              4       -
1.1.    Applicability Statement                               -       4
1.2.    Glossary                                              -       4
2.      Background                                            6       6
2.1.    Telemetry Data Coverage                               7       7
2.2.    Use Cases                                             7       8
2.3.    Challenges                                            9      10
2.4.    Network Telemetry                                    10      11
2.5.    The Necessity of a Network Telemetry Framework       13      13
3.      Network Telemetry Framework                          14      14
3.1.    Top Level Modules                                    14      15
3.1.1.  Management Plane Telemetry                           18      18
3.1.2.  Control Plane Telemetry                              18      18
3.1.3.  Forwarding Plane Telemetry                           19      19
3.1.4.  External Data Telemetry                              21      21
3.2.    Second Level Function Components                     22      22
3.3.    Data Acquisition Mechanism and Type Abstraction      24      24
3.4.    Mapping Existing Mechanisms into the Framework       26      26
4.      Evolution of Network Telemetry Applications          27      27
5.      Security Considerations                              28      28
6.      IANA Considerations                                  29      29
skipping to change at {ntf-11: page 3, line 24 | ntf-12: page 3, line 25}
Network visibility is the ability of management tools to see the
state and behavior of a network, which is essential for successful
network operation. Network Telemetry revolves around network data
that can help provide insights about the current state of the
network, including network devices, forwarding, control, and
management planes, and that can be generated and obtained through a
variety of techniques, including but not limited to network
instrumentation and measurements, and that can be processed for
purposes ranging from service assurance to network security using a
wide variety of {ntf-11: techniques including machine learning, data
analysis, and correlation | ntf-12: data analytical techniques}. In
this document, Network Telemetry refers to both the data itself
(i.e., "Network Telemetry Data") and the techniques and processes
used to generate, export, collect, and consume that data for use by
potentially automated management applications. Network telemetry
extends beyond the {ntf-11: historical | ntf-12: classical} network
Operations, Administration, and Management (OAM) techniques and is
expected to support better flexibility, scalability, accuracy,
coverage, and performance.
However, the term "network telemetry" lacks an unambiguous
definition. Its scope and coverage cause confusion and
misunderstandings. It is beneficial to clarify the concept and
provide a clear architectural framework for network telemetry, so we
can articulate the technical field and better align the related
techniques and standards work.
To fulfill such an undertaking, we first discuss some key
characteristics of network telemetry that set it clearly apart from
conventional network OAM, and show that some conventional OAM
technologies can be considered a subset of the network telemetry
technologies. We then provide an architectural framework for network
telemetry which includes four modules, each concerned with a
different category of telemetry data and corresponding procedures.
All the modules are internally structured in the same way, including
components that {ntf-11: allow to configure | ntf-12: allow the
operator to configure} data sources with regard to what data to
generate and how to make that data available to client applications,
components that instrument the underlying data sources, and
components that perform the actual rendering, encoding, and exporting
of the generated data. We show how the network telemetry framework
can benefit current and future network operations. Based on the
distinction of modules and function components, we can map the
existing and emerging techniques and protocols into the framework.
The framework can also simplify {ntf-11: the tasks for designing |
ntf-12: designing}, maintaining, and understanding a network
telemetry system. {ntf-11: At last | ntf-12: In addition}, we outline
the evolution stages of the network telemetry system and discuss the
potential security concerns.
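The modular structure described in the paragraph above can be sketched as a small data model. This is a minimal illustration, not part of the framework: the module names follow the headings of Sections 3.1.1 to 3.1.4, while the class names and the three component labels (`CONFIGURE`, `INSTRUMENT`, `EXPORT`) are hypothetical shorthand for the component kinds the paragraph lists.

```python
from dataclasses import dataclass
from enum import Enum


class Module(Enum):
    # The four top-level modules, named after Sections 3.1.1 to 3.1.4.
    MANAGEMENT_PLANE = "Management Plane Telemetry"
    CONTROL_PLANE = "Control Plane Telemetry"
    FORWARDING_PLANE = "Forwarding Plane Telemetry"
    EXTERNAL_DATA = "External Data Telemetry"


class Component(Enum):
    # Hypothetical labels for the three component kinds described in the
    # text; they are not names defined by the framework itself.
    CONFIGURE = "configure what data to generate and how to make it available"
    INSTRUMENT = "instrument the underlying data sources"
    EXPORT = "render, encode, and export the generated data"


@dataclass(frozen=True)
class TelemetryModule:
    module: Module
    # Every module carries the same ordered set of component kinds.
    components: tuple[Component, ...] = tuple(Component)


def build_framework() -> list[TelemetryModule]:
    """Build one entry per module, each with the same internal components."""
    return [TelemetryModule(m) for m in Module]


def uniformly_structured(framework: list[TelemetryModule]) -> bool:
    """Return True if every module exposes the same set of components."""
    return len({m.components for m in framework}) == 1


if __name__ == "__main__":
    fw = build_framework()
    for entry in fw:
        print(entry.module.value, [c.name for c in entry.components])
    print("uniform:", uniformly_structured(fw))
```

In this sketch, keeping the component set identical across modules means an existing mechanism can be placed in the framework by choosing one module and one component kind.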
The purpose of the framework and taxonomy is to set a common ground
for the collection of related work and provide guidance for future
technique and standards development. To the best of our knowledge,
this document is the first such effort for network telemetry in
industry standards organizations. {ntf-12 only: This document does
not define specific technologies.}
{ntf-12 only:
1.1. Applicability Statement
Large-scale network data collection is a major threat to user privacy
and may be indistinguishable from pervasive monitoring [RFC7258].
The network telemetry framework presented in this document must not
be applied to generating, exporting, collecting, analyzing, or
retaining individual user data or any data that can identify end
users or characterize their behavior without consent. Based on this
principle, the network telemetry framework is not applicable to
networks whose endpoints represent individual users, such as
general-purpose access networks.}
{ntf-11: 1.1. | ntf-12: 1.2.} Glossary
Before further discussion, we list some key terminology and acronyms
used in this document. We intentionally differentiate between the
terms network telemetry and OAM. However, it should be understood
that there is no hard-line distinction between the two concepts.
Rather, network telemetry is considered an extension of OAM. It
covers all the existing OAM protocols but puts more emphasis on
newer and emerging techniques and protocols concerning all aspects
of network data from acquisition to consumption.
skipping to change at {ntf-11: page 7, line 6 | ntf-12: page 7, line 19}
next step for network evolution following Software Defined Network
(SDN), aiming to reduce (or even eliminate) human labor, make more
efficient use of network resources, and provide better services more
aligned with customer requirements. The related technique of
Intent-based Networking (IBN)
[I-D.irtf-nmrg-ibn-concepts-definitions] requires network visibility
and telemetry data in order to ensure that the network is behaving as
intended.
However, while data processing capability has improved and
applications {ntf-11: are hungry for more data | ntf-12: require more
data to function better}, networks lag behind in extracting and
translating network data into useful and actionable information in
efficient ways. The system bottleneck is shifting from data
consumption to data supply. Both the number of network nodes and the
traffic bandwidth keep increasing at a fast pace. Network
configurations and policies change at shorter intervals than before.
More subtle events and fine-grained data from all network planes need
to be captured and exported in real time. In a nutshell, it is a
challenge to get enough high-quality data out of the network in a
manner that is efficient, timely, and flexible. Therefore, we need to
survey the existing technologies and protocols and identify any
potential gaps.
In the remainder of this section, we first clarify the scope of
network data (i.e., telemetry data) {ntf-11: concerned in the context
| ntf-12: relevant in this document}. Then, we discuss several key
use cases for today's and future network operations. Next, we show
why the current network OAM techniques and protocols are insufficient
for these use cases. The discussion underlines the need for new
methods, techniques, and protocols, as well as extensions of existing
ones, which we group under the umbrella term "Network Telemetry".
2.1. Telemetry Data Coverage
Any information that can be extracted from networks (including data
plane, control plane, and management plane) and used to gain
visibility or as a basis for actions is considered telemetry data. It
includes statistics, event records and logs, snapshots of state,
configuration data, etc. It also covers the outputs of any active
and passive measurements [RFC7799]. In some cases, raw data is
processed in the network before being sent to a data consumer. Such
processed data is also considered telemetry data. The value of
telemetry data varies. {ntf-11: Less but higher quality data are
often better than lots of low quality data. | ntf-12: In some cases,
if the cost is acceptable, a smaller amount of higher-quality data is
preferred over a large amount of low-quality data.} A classification
of telemetry data is provided in Section 3. {ntf-11: To preserve user
privacy, the user packet content should not be collected. | ntf-12:
To preserve the privacy of end users, no user packet content should
be collected. Specifically, the data objects generated, exported,
and collected by a network telemetry application should not include
any packet payload from traffic associated with end-user systems.}
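As a minimal sketch of the payload rule above, assuming telemetry data objects are represented as nested dictionaries, an exporter could drop payload-bearing fields before a record leaves the device. The field names in `PAYLOAD_FIELDS` and the function name are hypothetical. Filtering at export time is only a last line of defense, since the text asks that payload not be generated or collected in the first place.

```python
from typing import Any

# Hypothetical names of fields that would carry user packet content. A real
# exporter would derive this set from its own data model.
PAYLOAD_FIELDS = frozenset({"payload", "packet_bytes", "l7_content"})


def strip_user_payload(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of a telemetry record with payload-bearing fields removed.

    Nested dictionaries are sanitized recursively; lists of objects are not
    traversed in this sketch. The input record is not modified.
    """
    clean: dict[str, Any] = {}
    for key, value in record.items():
        if key in PAYLOAD_FIELDS:
            continue  # never export user packet content
        if isinstance(value, dict):
            value = strip_user_payload(value)  # sanitize nested objects too
        clean[key] = value
    return clean


if __name__ == "__main__":
    record = {
        "flow_id": 7,
        "octets": 1500,
        "payload": b"\x00\x01",
        "sample": {"ingress_port": 3, "packet_bytes": b"\xff"},
    }
    print(strip_user_payload(record))
    # {'flow_id': 7, 'octets': 1500, 'sample': {'ingress_port': 3}}
```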
2.2. Use Cases
The following set of use cases is essential for network operations.
While the list is by no means exhaustive, it is enough to highlight
the requirements for data velocity, variety, volume, and veracity
(the attributes of big data) in networks.
* Security: Network intrusion detection and prevention systems need
to monitor network traffic and activities and act upon anomalies.
Given increasingly sophisticated attack vectors coupled with
increasingly severe consequences of security breaches, new tools
and techniques need to be developed, relying on wider and deeper
visibility into networks. The ultimate goal is to achieve
{ntf-11: the ideal security | ntf-12: security} with no, or only
minimal, human intervention.
* Policy and Intent Compliance: Network policies are the rules that
constrain the services for network access, provide service
differentiation, or enforce specific treatment on the traffic.
For example, a service function chain is a policy that requires
the selected flows to pass through a set of ordered network
functions. Intent, as defined in
[I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational
{ntf-11: goal | ntf-12: goals} that a network should meet and
outcomes that a network is supposed to deliver, defined in a
declarative manner without specifying how to achieve or implement
them. An intent requires a complex translation and mapping process
before being applied to networks. While a policy or intent is
enforced, compliance needs to be verified and monitored continuously
by relying on visibility that is provided through network telemetry
data. Any violation must be reported immediately, potentially
resulting in updates to how the policy or intent is applied in the
network to ensure that it remains in force, or otherwise alerting
the network administrator to the policy or intent violation.
* SLA Compliance: A Service-Level Agreement (SLA) {ntf-11: defines
the level of service a user expects from a network operator |
ntf-12: is a service contract between a service provider and a
client}, which includes the metrics for the service measurement and
remedy/penalty procedures when the service level misses the
agreement. Users need to check if they get the service as promised,
and network operators need to evaluate how they can deliver services
that meet the SLA based on real-time network telemetry data,
including data from network measurements.
* Root Cause Analysis: {ntf-11: Any network failure | ntf-12: Many
network failures} can be the effect of a sequence of chained events.
Troubleshooting and recovery require quick identification of the
root cause of any observable issues. However, the root cause is not
always straightforward to identify, especially when the failure is
sporadic and the number of event messages, both related and
unrelated to the same cause, is overwhelming. {ntf-11: While machine
learning technologies can be used for root cause analysis, it up to
the network to sense and provide the relevant diagnostic data which
are either actively fed into, or passively retrieved by, machine
learning applications. | ntf-12: While technologies such as machine
learning can be used for root cause analysis, it is up to the
network to sense and provide the relevant diagnostic data which are
either actively fed into, or passively retrieved by, the root cause
analysis applications.}
* Network Optimization: This covers all short-term and long-term
network optimization techniques, including load balancing, Traffic
Engineering (TE), and network planning. Network operators are
motivated to optimize their network utilization and differentiate
services for better Return On Investment (ROI) or lower Capital
Expenditures (CAPEX). The first step is to know the real-time
network conditions before applying policies for traffic
manipulation. In some cases, micro-bursts need to be detected in
a very short time-frame so that fine-grained traffic control can
skipping to change at {ntf-11: page 9, line 32 | ntf-12: page 10, line 15}
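To make the Policy and Intent Compliance use case concrete: one way to verify a service function chain, assuming telemetry can report the ordered list of hops each flow actually traversed, is to test whether the chain's functions appear on that path in the required order. The sketch below does this as an ordered-subsequence check. The function names, hop names, and path format are hypothetical.

```python
from collections.abc import Mapping, Sequence


def follows_chain(observed_path: Sequence[str], chain: Sequence[str]) -> bool:
    """Return True if every function in `chain` appears on `observed_path`
    in order. Other hops (e.g., plain routers) may be interleaved."""
    hops = iter(observed_path)
    # `fn in hops` consumes the iterator up to and including the first match,
    # so each later function must be found further along the path.
    return all(fn in hops for fn in chain)


def violating_flows(paths: Mapping[str, Sequence[str]],
                    chain: Sequence[str]) -> list[str]:
    """Return the IDs of flows whose observed path does not honor the chain."""
    return sorted(flow for flow, path in paths.items()
                  if not follows_chain(path, chain))


if __name__ == "__main__":
    chain = ["firewall", "nat"]  # hypothetical ordered service functions
    paths = {
        "flow-a": ["r1", "firewall", "r2", "nat", "r3"],
        "flow-b": ["r1", "nat", "firewall", "r3"],  # wrong order
        "flow-c": ["r1", "r3"],                      # chain bypassed
    }
    print(violating_flows(paths, chain))  # ['flow-b', 'flow-c']
```

A non-empty result corresponds to the kind of violation that, per the use case, must be reported.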
2.3. Challenges
For a long time, network operators have relied upon SNMP [RFC3416],
Command-Line Interface (CLI), or Syslog [RFC5424] to monitor the
network. Other OAM techniques, as described in [RFC7276], are also
used to facilitate network troubleshooting. These conventional
techniques are not sufficient to support the above use cases for the
following reasons:
* Most use cases need to continuously monitor the network and
dynamically refine the data collection in real time.
{ntf-11: The poll-based | ntf-12: Poll-based} low-frequency data
collection is ill-suited for these applications. Subscription-based
streaming data directly pushed from the data source (e.g., the
forwarding chip) is preferred to provide {ntf-11: enough | ntf-12:
sufficient} data quantity and precision at scale.
* Comprehensive data is needed from packet processing {ntf-11: engine
| ntf-12: engines} to traffic manager, from line cards to main
control board, from user flows to control protocol packets, from
device configurations to operations, and from physical layer to
application layer. Conventional OAM only covers a narrow range of
data (e.g., SNMP only handles data from the Management Information
Base (MIB)). Classical network devices cannot provide all the
necessary probes. More open and programmable network devices are
therefore needed.
* Many application scenarios need to correlate network-wide data
from multiple sources (i.e., from distributed network devices,
skipping to change at {ntf-11: page 10, line 43 | ntf-12: page 11, line 26}
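The first Challenges bullet argues that low-frequency polling is ill-suited compared with data pushed from the source. The toy comparison below illustrates the effect with synthetic queue-depth samples and hypothetical helper names: a collector polling every 10 ticks never observes a 3-tick micro-burst, while a source that inspects every tick and pushes only threshold crossings reports it with very little data. It is a sketch of the sampling argument only, not a model of any particular telemetry protocol.

```python
def poll(samples: list[int], period: int) -> list[tuple[int, int]]:
    """Poll-based collection: the collector reads the value every `period` ticks."""
    return [(t, samples[t]) for t in range(0, len(samples), period)]


def push_on_threshold(samples: list[int], threshold: int) -> list[tuple[int, int]]:
    """Push-based collection: the data source sees every tick and pushes
    only the ticks whose value exceeds the subscribed threshold."""
    return [(t, v) for t, v in enumerate(samples) if v > threshold]


if __name__ == "__main__":
    queue_depth = [0] * 100            # synthetic per-tick queue depth
    queue_depth[37:40] = [80, 95, 70]  # a 3-tick micro-burst
    polled = poll(queue_depth, period=10)
    pushed = push_on_threshold(queue_depth, threshold=50)
    print(max(v for _, v in polled))   # 0: the burst falls between polls
    print(pushed)                      # [(37, 80), (38, 95), (39, 70)]
```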
2.4. Network Telemetry
Network telemetry has emerged as a mainstream technical term to refer
to the network data collection and consumption techniques. Several
network telemetry techniques and protocols (e.g., IPFIX [RFC7011] and
gRPC [grpc]) have been widely deployed. Network telemetry allows
separate entities to acquire data from network devices so that data
can be visualized and analyzed to support network monitoring and
operation. Network telemetry covers the conventional network OAM and
has a wider scope. {ntf-11: It | ntf-12: For instance, it} is
expected that network telemetry can provide the necessary network
insight for autonomous networks and address the shortcomings of
conventional OAM techniques.
Network telemetry usually assumes machines as data consumers rather
than human operators. Hence, network telemetry can directly trigger
automated network operations, while in contrast some conventional OAM
tools {ntf-11: are | ntf-12: were} designed and used to help human
operators monitor and diagnose networks and guide manual network
operations. Such a proposition leads to very different techniques.
Although new network telemetry techniques are emerging and subject to Although new network telemetry techniques are emerging and subject to
continuous evolution, several characteristics of network telemetry continuous evolution, several characteristics of network telemetry
have been well accepted. Note that network telemetry is intended to have been well accepted. Note that network telemetry is intended to
be an umbrella term covering a wide spectrum of techniques, so the be an umbrella term covering a wide spectrum of techniques, so the
following characteristics are not expected to be held by every following characteristics are not expected to be held by every
specific technique. specific technique.
skipping to change at page 13, line 7 skipping to change at page 13, line 27
understand that there are no inherent assumptions about how a system understand that there are no inherent assumptions about how a system
should be architected. While a network architecture with centralized should be architected. While a network architecture with centralized
controller (e.g., SDN) seems a natural fit for network telemetry, controller (e.g., SDN) seems a natural fit for network telemetry,
network telemetry can work in distributed fashions as well. For network telemetry can work in distributed fashions as well. For
example, telemetry data producers and consumers can have a peer-to- example, telemetry data producers and consumers can have a peer-to-
peer relationship, in which a network node can be the direct consumer peer relationship, in which a network node can be the direct consumer
of telemetry data from other nodes. of telemetry data from other nodes.
2.5. The Necessity of a Network Telemetry Framework 2.5. The Necessity of a Network Telemetry Framework
Network data analytics and machine-learning technologies are applied Network data analytics (e.g., machine learning) is applied for
for network operation automation, relying on abundant and coherent network operation automation, relying on abundant and coherent data
data from networks. Data acquisition that is limited to a single from networks. Data acquisition that is limited to a single source
source and static in nature will in many cases not be sufficient to and static in nature will in many cases not be sufficient to meet an
meet an application's telemetry data needs. As a result, multiple application's telemetry data needs. As a result, multiple data
data sources, involving a variety of techniques and standards, will sources, involving a variety of techniques and standards, will need
need to be integrated. It is desirable to have a framework that to be integrated. It is desirable to have a framework that
classifies and organizes different telemetry data sources and types, classifies and organizes different telemetry data sources and types,
defines different components of a network telemetry system and their defines different components of a network telemetry system and their
interactions, and helps coordinate and integrate multiple telemetry interactions, and helps coordinate and integrate multiple telemetry
approaches across layers. This allows flexible combinations of data approaches across layers. This allows flexible combinations of data
for different applications, while normalizing and simplifying for different applications, while normalizing and simplifying
interfaces. In detail, such a framework would benefit application interfaces. In detail, such a framework would benefit the
development for the following reasons: development of network operation applications for the following
reasons:
* Future networks, autonomous or otherwise, depend on holistic and * Future networks, autonomous or otherwise, depend on holistic and
comprehensive network visibility. All the use cases and comprehensive network visibility. The use cases and applications
applications are better supported uniformly and coherently are better supported uniformly and coherently using an
under a single intelligent agent using an integrated, converged integrated, converged mechanism and common telemetry data
mechanism and common telemetry data representations wherever representations wherever feasible. Therefore, the protocols and
feasible. Therefore, the protocols and mechanisms should be mechanisms should be consolidated into a minimum yet comprehensive
consolidated into a minimum yet comprehensive set. A telemetry set. A telemetry framework can help to normalize the technique
framework can help to normalize the technique developments. developments.
* Network visibility presents multiple viewpoints. For example, the * Network visibility presents multiple viewpoints. For example, the
device viewpoint takes the network infrastructure as the device viewpoint takes the network infrastructure as the
monitoring object from which the network topology and device monitoring object from which the network topology and device
status can be acquired; the traffic viewpoint takes the flows or status can be acquired; the traffic viewpoint takes the flows or
packets as the monitoring object from which the traffic quality packets as the monitoring object from which the traffic quality
and path can be acquired. An application may need to switch its and path can be acquired. An application may need to switch its
viewpoint during operation. It may also need to correlate a viewpoint during operation. It may also need to correlate a
service and its impact on user experience to acquire the service and its impact on user experience to acquire the
comprehensive information. comprehensive information.
skipping to change at page 14, line 15 skipping to change at page 14, line 36
A telemetry framework collects together all the telemetry-related A telemetry framework collects together all the telemetry-related
works from different sources and working groups within IETF. This works from different sources and working groups within IETF. This
makes it possible to assemble a comprehensive network telemetry makes it possible to assemble a comprehensive network telemetry
system and to avoid repetitious or redundant work. The framework system and to avoid repetitious or redundant work. The framework
should cover the concepts and components from the standardization should cover the concepts and components from the standardization
perspective. This document describes the modules which make up a perspective. This document describes the modules which make up a
network telemetry framework and decomposes the telemetry system into network telemetry framework and decomposes the telemetry system into
a set of distinct components that existing and future work can easily a set of distinct components that existing and future work can easily
map to. map to.
Disclaimer: large-scale network data collection is a major threat to
user privacy [RFC7258]. The network telemetry framework presented in
this document should not be applied to collect and retain individual
user data or any data that can identify end users without consent.
Any data collection or retention using the framework must be tightly
limited to protect user privacy.
3. Network Telemetry Framework 3. Network Telemetry Framework
The top level network telemetry framework partitions the network The top level network telemetry framework partitions the network
telemetry into four modules based on the telemetry data object source telemetry into four modules based on the telemetry data object source
and represents their relationship. At the next level, the framework and represents their relationship. Once the network operation
applications acquire the data from these modules, they can apply data
analytics and take actions. At the next level, the framework
decomposes each module into separate components. Each of the modules decomposes each module into separate components. Each of the modules
follows the same underlying structure, with one component dedicated follows the same underlying structure, with one component dedicated
to the configuration of data subscriptions and data sources, a second to the configuration of data subscriptions and data sources, a second
component dedicated to encoding and exporting data, and a third component dedicated to encoding and exporting data, and a third
component instrumenting the generation of telemetry related to the component instrumenting the generation of telemetry related to the
underlying resources. Throughout the framework, the same set of underlying resources. Throughout the framework, the same set of
abstract data acquiring mechanisms and data types (Section 3.3) are abstract data acquiring mechanisms and data types (Section 3.3) are
applied. The two-level architecture with the uniform data applied. The two-level architecture with the uniform data
abstraction helps accurately pinpoint a protocol or technique to its abstraction helps accurately pinpoint a protocol or technique to its
position in a network telemetry system or disaggregate a network position in a network telemetry system or disaggregate a network
telemetry system into manageable parts. telemetry system into manageable parts.
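The two-level structure described above can be pictured as a small data model. The following Python sketch is illustrative only: the enum names, the FRAMEWORK table, and the placements in PLACEMENTS (IPFIX as a forwarding-plane export technique, gNMI as a management-plane subscription technique) are assumptions chosen for the example, not a classification defined by this framework.

```python
from enum import Enum


class Module(Enum):
    """Top-level modules, partitioned by telemetry data object source."""
    MANAGEMENT_PLANE = "management plane"
    CONTROL_PLANE = "control plane"
    FORWARDING_PLANE = "forwarding plane"
    EXTERNAL = "external data and event"


class Component(Enum):
    """Second-level components that every module shares."""
    CONFIG_AND_SUBSCRIPTION = "data configuration and subscription"
    ENCODING_AND_EXPORT = "data encoding and export"
    INSTRUMENTATION = "telemetry generation (instrumentation)"


# Uniform structure: each module is decomposed into the same components.
FRAMEWORK = {module: tuple(Component) for module in Module}

# Hypothetical, illustrative placements; not an authoritative classification.
PLACEMENTS = {
    "IPFIX": (Module.FORWARDING_PLANE, Component.ENCODING_AND_EXPORT),
    "gNMI": (Module.MANAGEMENT_PLANE, Component.CONFIG_AND_SUBSCRIPTION),
}


def position(technique):
    """Return the (module, component) cell a technique maps to, or None."""
    cell = PLACEMENTS.get(technique)
    if cell is None:
        return None
    module, component = cell
    # A placement is only meaningful if it names a cell the framework has.
    if component not in FRAMEWORK[module]:
        raise ValueError(f"{technique} maps to a cell outside the framework")
    return cell
```

Because every module carries the same component set, locating a technique reduces to picking one cell in a 4 x 3 grid, which is the "pinpointing" the paragraph above refers to.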
3.1. Top Level Modules 3.1. Top Level Modules
Telemetry can be applied on the forwarding plane, the control plane, Telemetry can be applied on the forwarding plane, the control plane,
and the management plane in a network, as well as other sources out and the management plane in a network, as well as other sources out
of the network, as shown in Figure 1. Therefore, we categorize the of the network, as shown in Figure 1. Therefore, we categorize the
network telemetry into four distinct modules with each having its own network telemetry into four distinct modules (management plane,
interface to Network Operation Applications. control plane, forwarding plane, and external data and event
telemetry) with each having its own interface to Network Operation
Applications.
+------------------------------+ +------------------------------+
| | | |
| Network Operation |<-------+ | Network Operation |<-------+
| Applications | | | Applications | |
| | | | | |
+------------------------------+ | +------------------------------+ |
^ ^ ^ | ^ ^ ^ |
| | | | | | | |
V V | V V V | V
skipping to change at page 16, line 26 skipping to change at page 16, line 42
exported also varies. For example, forwarding plane data mainly exported also varies. For example, forwarding plane data mainly
originates as data exported from the forwarding Application-Specific originates as data exported from the forwarding Application-Specific
Integrated Circuits (ASICs), while control plane data mainly Integrated Circuits (ASICs), while control plane data mainly
originates from the protocol daemons running on the control CPU(s). originates from the protocol daemons running on the control CPU(s).
For convenience and efficiency, it is preferred to export the data For convenience and efficiency, it is preferred to export the data
off the device from locations near the source. Because the locations off the device from locations near the source. Because the locations
that can export data have different capabilities, different choices that can export data have different capabilities, different choices
of data model, encoding, and transport method are made to balance the of data model, encoding, and transport method are made to balance the
performance and cost. For example, the forwarding chip has high performance and cost. For example, the forwarding chip has high
throughput but limited capacity for processing complex data and throughput but limited capacity for processing complex data and
maintaining states, while the main control CPU is capable of complex maintaining state, while the main control CPU is capable of complex
data and state processing, but has limited bandwidth for high data and state processing, but has limited bandwidth for high
throughput data. As a result, the suitable telemetry protocol for throughput data. As a result, the suitable telemetry protocol for
each module can be different. Some representative techniques are each module can be different. Some representative techniques are
shown in the corresponding table blocks to highlight the technical shown in the corresponding table blocks to highlight the technical
diversity of these modules. Note that the selected techniques just diversity of these modules. Note that the selected techniques just
reflect the de facto state of the art and are by no means exhaustive reflect the de facto state of the art and are by no means exhaustive
(e.g., IPFIX can also be implemented over TCP and SCTP, but that is (e.g., IPFIX can also be implemented over TCP and SCTP, but that is
not recommended for the forwarding plane). The key point is that one not recommended for the forwarding plane). The key point is that one
cannot expect to use a universal protocol to cover all the network cannot expect to use a universal protocol to cover all the network
telemetry requirements. telemetry requirements.
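The trade-off between export locations can be sketched the same way. The Python fragment below is a hypothetical model: each export point is reduced to the two coarse capabilities named in the text (high throughput, complex data and state processing), and the names ExportPoint, FORWARDING_ASIC, CONTROL_CPU, and candidate_export_points are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExportPoint:
    """Hypothetical export location, reduced to two coarse capabilities."""
    name: str
    high_throughput: bool  # can sustain high data rates
    complex_state: bool    # can process complex data and maintain state


FORWARDING_ASIC = ExportPoint("forwarding ASIC", high_throughput=True, complex_state=False)
CONTROL_CPU = ExportPoint("control CPU", high_throughput=False, complex_state=True)


def candidate_export_points(needs_high_throughput, needs_complex_state,
                            points=(FORWARDING_ASIC, CONTROL_CPU)):
    """Return the names of export points whose capabilities cover the needs."""
    return [
        p.name
        for p in points
        if (p.high_throughput or not needs_high_throughput)
        and (p.complex_state or not needs_complex_state)
    ]
```

With these two points, a high-rate, stateless need maps only to the forwarding ASIC, a stateful, low-rate need maps only to the control CPU, and a need that is both high-rate and stateful matches neither, which mirrors the observation that no single location or protocol covers every telemetry requirement.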
skipping to change at page 22, line 47 skipping to change at page 22, line 47
possibilities of current and future network systems, as reflected in possibilities of current and future network systems, as reflected in
the incorporation of cognitive capabilities to new hardware and the incorporation of cognitive capabilities to new hardware and
software (virtual) elements. software (virtual) elements.
3.2. Second Level Function Components 3.2. Second Level Function Components
The telemetry module at each plane can be further partitioned into The telemetry module at each plane can be further partitioned into
five distinct conceptual components: five distinct conceptual components:
* Data Query, Analysis, and Storage: This component works at the * Data Query, Analysis, and Storage: This component works at the
application layer. It is normally a part of the network network operation application block in Figure 1. It is normally a
management system at the receiver side. On the one hand, it is part of the network management system at the receiver side. On
responsible for issuing data requirements. The data of interest the one hand, it is responsible for issuing data requirements.
can be modeled data through configuration or custom data through The data of interest can be modeled data through configuration or
programming. The data requirements can be queries for one-shot custom data through programming. The data requirements can be
data or subscriptions for events or streaming data. On the other queries for one-shot data or subscriptions for events or streaming
hand, it receives, stores, and processes the returned data from data. On the other hand, it receives, stores, and processes the
network devices. Data analysis can be interactive to initiate returned data from network devices. Data analysis can be
further data queries. This component can reside in either network interactive to initiate further data queries. This component can
devices or remote controllers. It can be centralized or reside in either network devices or remote controllers. It can be
distributed, and involve one or more instances. centralized or distributed, and involve one or more instances.
* Data Configuration and Subscription: This component manages data * Data Configuration and Subscription: This component manages data
queries on devices. It determines the protocol and channel for queries on devices. It determines the protocol and channel for
applications to acquire desired data. This component is also applications to acquire desired data. This component is also
responsible for configuring the desired data that might not be responsible for configuring the desired data that might not be
directly available from data sources. The subscription data can directly available from data sources. The subscription data can
be described by models, templates, or programs. be described by models, templates, or programs.
* Data Encoding and Export: This component determines how telemetry * Data Encoding and Export: This component determines how telemetry
data is delivered to the data analysis and storage component with data is delivered to the data analysis and storage component with
skipping to change at page 28, line 30 skipping to change at page 28, line 30
Given that this document has proposed a framework for network Given that this document has proposed a framework for network
telemetry and the telemetry mechanisms discussed are more extensive telemetry and the telemetry mechanisms discussed are more extensive
(in both message frequency and traffic amount) than the conventional (in both message frequency and traffic amount) than the conventional
network OAM concepts, we must also reflect that various new security network OAM concepts, we must also reflect that various new security
considerations may also arise. A number of techniques already exist considerations may also arise. A number of techniques already exist
for securing the forwarding plane, the control plane, and the for securing the forwarding plane, the control plane, and the
management plane in a network, but it is important to consider if any management plane in a network, but it is important to consider if any
new threat vectors are now being enabled via the use of network new threat vectors are now being enabled via the use of network
telemetry procedures and mechanisms. telemetry procedures and mechanisms.
Security considerations for networks that use telemetry methods may This document proposes a conceptual architecture for collecting,
include: transporting, and analyzing a wide variety of data sources in support
of network applications. The protocols, data formats, and
configurations chosen to implement this framework will dictate the
specific security considerations. These considerations may include:
* Telemetry framework trust and policy model; * Telemetry framework trust and policy model;
* Role management and access control for enabling and disabling * Role management and access control for enabling and disabling
telemetry capabilities; telemetry capabilities;
* Protocol transport used for telemetry data and its inherent security * Protocol transport used for telemetry data and its inherent security
capabilities; capabilities;
* Telemetry data stores, storage encryption and methods of access; * Telemetry data stores, storage encryption, methods of access, and
retention practices;
* Tracking telemetry events and any abnormalities that might * Tracking telemetry events and any abnormalities that might
identify malicious attacks using telemetry interfaces. identify malicious attacks using telemetry interfaces.
* Authentication and signing of telemetry data to make data more * Authentication and signing of telemetry data to make data more
trustworthy. trustworthy.
* Segregating the telemetry data traffic from the data traffic * Segregating the telemetry data traffic from the data traffic
carried over the network (e.g., historically management access and carried over the network (e.g., historically management access and
management data may be carried via an independent management management data may be carried via an independent management
skipping to change at page 29, line 34 skipping to change at page 29, line 39
The other contributors of this document are Tianran Zhou, Zhenbin Li, The other contributors of this document are Tianran Zhou, Zhenbin Li,
Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm Zhenqiang Li, Daniel King, Adrian Farrel, and Alexander Clemm
8. Acknowledgments 8. Acknowledgments
We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe We would like to thank Rob Wilton, Greg Mirsky, Randy Presuhn, Joe
Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe Clarke, Victor Liu, James Guichard, Uri Blumenthal, Giuseppe
Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra, Fioccola, Yunan Gu, Parviz Yegani, Young Lee, Qin Wu, Gyan Mishra,
Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin Ben Schwartz, Alexey Melnikov, Michael Scharf, Dhruv Dhody, Martin
Duke, and many others who have provided helpful comments and Duke, Roman Danyliw, Warren Kumari, Sheng Jiang, Lars Eggert, Eric
suggestions to improve this document. Vyncke, Jean-Michel Combes, and many others who have provided helpful
comments and suggestions to improve this document.
9. Informative References 9. Informative References
[gnmi] "gNMI - gRPC Network Management Interface", [gnmi] "gNMI - gRPC Network Management Interface",
<https://github.com/openconfig/reference/tree/master/rpc/ <https://github.com/openconfig/reference/tree/master/rpc/
gnmi>. gnmi>.
[gpb] "Google Protocol Buffers", [gpb] "Google Protocol Buffers",
<https://developers.google.com/protocol-buffers>. <https://developers.google.com/protocol-buffers>.
 End of changes. 35 change blocks. 
109 lines changed or deleted 129 lines changed or added
