draft-ietf-opsawg-ntf-03.txt   draft-ietf-opsawg-ntf-04.txt 
OPSAWG H. Song OPSAWG H. Song
Internet-Draft Futurewei Internet-Draft Futurewei
Intended status: Informational F. Qin Intended status: Informational F. Qin
Expires: October 15, 2020 China Mobile Expires: March 25, 2021 China Mobile
P. Martinez-Julia P. Martinez-Julia
NICT NICT
L. Ciavaglia L. Ciavaglia
Nokia Nokia
A. Wang A. Wang
China Telecom China Telecom
April 13, 2020 September 21, 2020
Network Telemetry Framework Network Telemetry Framework
draft-ietf-opsawg-ntf-03 draft-ietf-opsawg-ntf-04
Abstract Abstract
Network telemetry is the technology for gaining network insight and Network telemetry is the technology for gaining network insight and
facilitating efficient and automated network management. It engages facilitating efficient and automated network management. It engages
various techniques for remote data collection, correlation, and various techniques for remote data collection, correlation, and
consumption. This document provides an architectural framework for consumption. This document provides an architectural framework for
network telemetry, motivated by the network operation challenges and network telemetry, motivated by the network operation challenges and
requirements. As evidenced by some key characteristics and industry requirements. As evidenced by some key characteristics and industry
practices, network telemetry covers technologies and protocols beyond practices, network telemetry covers technologies and protocols beyond
skipping to change at page 2, line 7 skipping to change at page 2, line 7
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on October 15, 2020. This Internet-Draft will expire on March 25, 2021.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Background . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Telemetry Data Coverage . . . . . . . . . . . . . . . . . 5
2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6 2.2. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6
2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8 2.4. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 8
3. The Necessity of a Network Telemetry Framework . . . . . . . 10 2.5. Network Telemetry . . . . . . . . . . . . . . . . . . . . 9
4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 11 3. The Necessity of a Network Telemetry Framework . . . . . . . 11
4.1. Data Acquiring Mechanisms and Data Types . . . . . . . . 12 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 12
4.2. Data Object Modules . . . . . . . . . . . . . . . . . . . 13 4.1. Top Level Modules . . . . . . . . . . . . . . . . . . . . 13
4.2.1. Requirements and Challenges for each Module . . . . . 16 4.1.1. Management Plane Telemetry . . . . . . . . . . . . . 15
4.3. Function Components . . . . . . . . . . . . . . . . . . . 19 4.1.2. Control Plane Telemetry . . . . . . . . . . . . . . . 15
4.4. Existing Works Mapped in the Framework . . . . . . . . . 21 4.1.3. Data Plane Telemetry . . . . . . . . . . . . . . . . 16
5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 22 4.1.4. External Data Telemetry . . . . . . . . . . . . . . . 18
6. Security Considerations . . . . . . . . . . . . . . . . . . . 23 4.2. Second Level Function Components . . . . . . . . . . . . 19
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 4.3. Data Acquiring Mechanism and Type Abstraction . . . . . . 20
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 4.4. Existing Works Mapped in the Framework . . . . . . . . . 22
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 23
6. Security Considerations . . . . . . . . . . . . . . . . . . . 24
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 25
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 25
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 25
10. Informative References . . . . . . . . . . . . . . . . . . . 25 10. Informative References . . . . . . . . . . . . . . . . . . . 25
Appendix A. A Survey on Existing Network Telemetry Techniques . 28 Appendix A. A Survey on Existing Network Telemetry Techniques . 29
A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 28 A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 29
A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 28 A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 29
A.1.2. gRPC Network Management Interface . . . . . . . . . . 28 A.1.2. gRPC Network Management Interface . . . . . . . . . . 30
A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 29 A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 30
A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 29 A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 30
A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 29 A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 31
A.3.1. The IPFPM technology . . . . . . . . . . . . . . . . 29 A.3.1. The Alternate Marking technology . . . . . . . . . . 31
A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 30 A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 32
A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 31 A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 32
A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 31 A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 32
A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 31 A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 33
A.4. External Data and Event Telemetry . . . . . . . . . . . . 32 A.4. External Data and Event Telemetry . . . . . . . . . . . . 33
A.4.1. Sources of External Events . . . . . . . . . . . . . 32 A.4.1. Sources of External Events . . . . . . . . . . . . . 33
A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 33 A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 34
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 35
1. Introduction 1. Introduction
Network visibility is the ability of management tools to see the Network visibility is the ability of management tools to see the
state and behavior of a network. It is essential for successful state and behavior of a network. It is essential for successful
network operation. Network telemetry is the process of measuring, network operation. Network telemetry is the process of measuring,
correlating, recording, and distributing information about the correlating, recording, and distributing information about the
behavior of a network. Network telemetry has been considered as an behavior of a network. Network telemetry has been considered as an
ideal means to gain sufficient network visibility with better ideal means to gain sufficient network visibility with better
flexibility, scalability, accuracy, coverage, and performance than flexibility, scalability, accuracy, coverage, and performance than
skipping to change at page 3, line 38 skipping to change at page 3, line 42
misunderstandings. It is beneficial to clarify the concept and misunderstandings. It is beneficial to clarify the concept and
provide a clear architectural framework for network telemetry, so we provide a clear architectural framework for network telemetry, so we
can articulate the technical field, and better align the related can articulate the technical field, and better align the related
techniques and standard works. techniques and standard works.
To fulfill such an undertaking, we first discuss some key To fulfill such an undertaking, we first discuss some key
characteristics of network telemetry which set a clear distinction characteristics of network telemetry which set a clear distinction
from the conventional network OAM and show that some conventional OAM from the conventional network OAM and show that some conventional OAM
technologies can be considered a subset of the network telemetry technologies can be considered a subset of the network telemetry
technologies. We then provide an architectural framework for network technologies. We then provide an architectural framework for network
telemetry from three different perspectives. We show how network telemetry by partitioning a network telemetry system into four
telemetry can meet the current and future network operation modules each with the same building components and data abstracts.
requirements, and the challenges each telemetry module is facing. We show how the network telemetry framework can benefit the current
Based on the distinction of modules and function components, we can and future network operations. Based on the distinction of modules
map the existing and emerging techniques and protocols into the and function components, we can map the existing and emerging
framework. Finally, we outline a road-map for the evolution of the techniques and protocols into the framework. The framework can also
network telemetry system and discuss the potential security concerns simplify the tasks for designing, maintaining, and understanding a
for network telemetry. network telemetry system. Finally, we outline the evolution stages
of the network telemetry system and discuss the potential security
concerns.
The purpose of the framework and taxonomy is to set a common ground The purpose of the framework and taxonomy is to set a common ground
for the collection of related work and provide guidance for future for the collection of related work and provide guidance for future
technique and standard developments. To the best of our knowledge, technique and standard developments. To the best of our knowledge,
this document is the first such effort for network telemetry in this document is the first such effort for network telemetry in
industry standards organizations. industry standards organizations.
2. Motivation 2. Background
The term "big data" is used to describe the extremely large volume of The term "big data" is used to describe the extremely large volume of
data sets that can be analyzed computationally to reveal patterns, data sets that can be analyzed computationally to reveal patterns,
trends, and associations. The network is undoubtedly a source of big trends, and associations. The network is undoubtedly a source of big
data because of its scale and all the traffic that goes through it. It is data because of its scale and all the traffic that goes through it. It is
easy to see that network OAM can benefit from network big data. easy to see that network OAM can benefit from network big data.
Today one can access advanced big data analytics capability through a Today one can access advanced big data analytics capability through a
plethora of commercial and open source platforms (e.g., Apache plethora of commercial and open source platforms (e.g., Apache
Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
learning). Thanks to the advance of computing and storage learning). Thanks to the advance of computing and storage
technologies, network big data analytics gives network operators an technologies, network big data analytics gives network operators an
opportunity to gain network insights and move towards network opportunity to gain network insights and move towards network
autonomy. Some operators start to explore the application of autonomy. Some operators start to explore the application of
Artificial Intelligence (AI) to make sense of network data. Software Artificial Intelligence (AI) to make sense of network data. Software
tools can use the network data to detect and react to network faults, tools can use the network data to detect and react to network faults,
anomalies, and policy violations, as well as to predict future anomalies, and policy violations, as well as to predict future
events. In turn, the network policy updates for planning, intrusion events. In turn, the network policy updates for planning, intrusion
prevention, optimization, and self-healing may be applied. prevention, optimization, and self-healing may be applied.
It is conceivable that an intent-driven autonomic network [RFC7575] It is conceivable that an autonomic network [RFC7575] is the logical
is the logical next step for network evolution following Software next step for network evolution following Software Defined Network
Defined Network (SDN), aiming to reduce (or even eliminate) human (SDN), aiming to reduce (or even eliminate) human labor, make more
labor, make more efficient use of network resources, and provide efficient use of network resources, and provide better services more
better services more aligned with customer requirements. Although it aligned with customer requirements. Intent-based Networking (IBN)
takes time to reach the ultimate goal, the journey has started [I-D.irtf-nmrg-ibn-concepts-definitions] provides the necessary
nevertheless. mechanisms. Although it takes time to reach the ultimate goal, the
journey has started nevertheless.
However, while the data processing capability is improved and However, while the data processing capability is improved and
applications are hungry for more data, the networks lag behind in applications are hungry for more data, the networks lag behind in
extracting and translating network data into useful and actionable extracting and translating network data into useful and actionable
information in efficient ways. The system bottleneck is shifting information in efficient ways. The system bottleneck is shifting
from data consumption to data supply. Both the number of network from data consumption to data supply. Both the number of network
nodes and the traffic bandwidth keep increasing at a fast pace. The nodes and the traffic bandwidth keep increasing at a fast pace. The
network configuration and policy change at shorter intervals than network configuration and policy change at shorter intervals than
before. More subtle events and fine-grained data through all network before. More subtle events and fine-grained data through all network
planes need to be captured and exported in real time. In a nutshell, planes need to be captured and exported in real time. In a nutshell,
it is a challenge to get enough high-quality data out of the network it is a challenge to get enough high-quality data out of the network
in an efficient, timely, and flexible way. Therefore, we need to examine the in an efficient, timely, and flexible way. Therefore, we need to examine the
existing network technologies and protocols, and identify any existing network technologies and protocols, and identify any
potential technique and standard gaps based on the real network and potential technique and standard gaps based on the real network and
device architectures. device architectures.
In the remainder of this section, first we discuss several key use In the remainder of this section, first we clarify the scope of
cases for today's and future network operations. Next, we show why network data (i.e., telemetry data) concerned in the context. Then,
the current network OAM techniques and protocols are insufficient for we discuss several key use cases for today's and future network
these use cases. The discussion underlines the need for new methods, operations. Next, we show why the current network OAM techniques and
techniques, and protocols which we assign under an umbrella term - protocols are insufficient for these use cases. The discussion
network telemetry. underlines the need for new methods, techniques, and protocols which
we assign under an umbrella term - network telemetry.
2.1. Use Cases 2.1. Telemetry Data Coverage
Any information that can be extracted from networks (including data
plane, control plane, and management plane) and used to gain
visibility or as a basis for actions is considered telemetry data. It
includes statistics, event records and logs, snapshots of state,
configuration data, etc. It also covers the outputs of any active
and passive measurements. In particular, raw data can be processed in
the network before being sent to a data consumer. Such processed data are
also telemetry data in this context. A classification of the
telemetry data form is provided in Section 4.
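As a rough illustration of the coverage described above, the following
Python sketch models a telemetry record tagged with its source plane and
data kind, plus a flag for data processed in the network. All names
(Plane, Kind, TelemetryRecord, summarize) and the "packets" field are
hypothetical and are not defined by this document; the sketch only
mirrors the categories listed in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Plane(Enum):
    # Source planes named in the text.
    DATA = "data"
    CONTROL = "control"
    MANAGEMENT = "management"


class Kind(Enum):
    # Kinds of telemetry data named in the text (illustrative labels).
    STATISTICS = "statistics"
    EVENT = "event"                    # event records and logs
    STATE_SNAPSHOT = "state_snapshot"
    CONFIGURATION = "configuration"
    MEASUREMENT = "measurement"        # active or passive measurement output


@dataclass
class TelemetryRecord:
    plane: Plane
    kind: Kind
    payload: dict
    processed_in_network: bool = False  # True if derived before export


def summarize(raw):
    """Aggregate raw statistics records into one processed record.

    The result is still telemetry data; it is only marked as processed.
    Assumes every payload carries a hypothetical "packets" counter.
    """
    total = sum(record.payload["packets"] for record in raw)
    return TelemetryRecord(
        plane=raw[0].plane,
        kind=Kind.STATISTICS,
        payload={"packets": total},
        processed_in_network=True,
    )
```

For example, two raw data plane statistics records with 10 and 20
packets would be summarized into one processed record carrying 30.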
2.2. Use Cases
These use cases are essential for network operations. While the list These use cases are essential for network operations. While the list
is by no means exhaustive, it is enough to highlight the requirements is by no means exhaustive, it is enough to highlight the requirements
for data velocity, variety, volume, and veracity in networks. for data velocity, variety, volume, and veracity in networks.
Security: Network intrusion detection and prevention need to monitor
network traffic and activities, and act upon anomalies. Given
increasingly sophisticated attack vectors and the rising cost of
security breaches, new tools and techniques need to be
developed, relying on wider and deeper visibility in networks.
Policy and Intent Compliance: Network policies are the rules that Policy and Intent Compliance: Network policies are the rules that
constrain the services for network access, provide service constrain the services for network access, provide service
differentiation, or enforce specific treatment on the traffic. differentiation, or enforce specific treatment on the traffic.
For example, a service function chain is a policy that requires For example, a service function chain is a policy that requires
the selected flows to pass through a set of ordered network the selected flows to pass through a set of ordered network
functions. An intent is a high-level abstract policy which functions. Intent, as defined in
requires a complex translation and mapping process before being [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational
applied on networks. While a policy is enforced, the compliance goals that a network should meet and outcomes that a network is
supposed to deliver, defined in a declarative manner without
specifying how to achieve or implement them. An intent requires a
complex translation and mapping process before being applied on
networks. While a policy or an intent is enforced, the compliance
needs to be verified and monitored continuously, and any violation needs to be verified and monitored continuously, and any violation
needs to be reported immediately. needs to be reported immediately.
SLA Compliance: A Service-Level Agreement (SLA) defines the level of SLA Compliance: A Service-Level Agreement (SLA) defines the level of
service a user expects from a network operator, which includes the service a user expects from a network operator, which includes the
metrics for the service measurement and remedy/penalty procedures metrics for the service measurement and remedy/penalty procedures
when the service level misses the agreement. Users need to check when the service level misses the agreement. Users need to check
if they get the service as promised and network operators need to if they get the service as promised and network operators need to
evaluate how they can deliver the services that can meet the SLA evaluate how they can deliver the services that can meet the SLA
based on realtime network measurement. based on realtime network measurement.
skipping to change at page 6, line 15 skipping to change at page 6, line 45
accumulated data of network operations. accumulated data of network operations.
Event Tracking and Prediction: The visibility of traffic path and Event Tracking and Prediction: The visibility of traffic path and
performance is critical for services and applications that rely on performance is critical for services and applications that rely on
healthy network operation. Numerous related network events are of healthy network operation. Numerous related network events are of
interest to network operators. For example, network operators interest to network operators. For example, network operators
want to learn where and why packets are dropped for an application want to learn where and why packets are dropped for an application
flow. They also want to be warned of issues in advance so flow. They also want to be warned of issues in advance so
proactive actions can be taken to avoid catastrophic consequences. proactive actions can be taken to avoid catastrophic consequences.
2.2. Challenges 2.3. Challenges
For a long time, network operators have relied upon SNMP [RFC3416], For a long time, network operators have relied upon SNMP [RFC3416],
Command-Line Interface (CLI), or Syslog to monitor the network. Some Command-Line Interface (CLI), or Syslog to monitor the network. Some
other OAM techniques as described in [RFC7276] are also used to other OAM techniques as described in [RFC7276] are also used to
facilitate network troubleshooting. These conventional techniques facilitate network troubleshooting. These conventional techniques
are not sufficient to support the above use cases for the following are not sufficient to support the above use cases for the following
reasons: reasons, which explains why new standards and techniques keep
emerging and the needs remain high:
o Most use cases need to continuously monitor the network and o Most use cases need to continuously monitor the network and
dynamically refine the data collection in real-time. The poll- dynamically refine the data collection in real-time. The poll-
based low-frequency data collection is ill-suited for these based low-frequency data collection is ill-suited for these
applications. Subscription-based streaming data directly pushed applications. Subscription-based streaming data directly pushed
from the data source (e.g., the forwarding chip) is preferred to from the data source (e.g., the forwarding chip) is preferred to
provide enough data quantity and precision at scale. provide enough data quantity and precision at scale.
o Comprehensive data is needed from packet processing engine to o Comprehensive data is needed from packet processing engine to
traffic manager, from line cards to main control board, from user traffic manager, from line cards to main control board, from user
skipping to change at page 7, line 25 skipping to change at page 8, line 7
precision which are beyond the capability of the existing precision which are beyond the capability of the existing
techniques. techniques.
o The conventional passive measurement techniques can either consume o The conventional passive measurement techniques can either consume
excessive network resources and render excessive redundant data, excessive network resources and render excessive redundant data,
or lead to inaccurate results; on the other hand, the conventional or lead to inaccurate results; on the other hand, the conventional
active measurement techniques can interfere with the user traffic active measurement techniques can interfere with the user traffic
and their results are indirect. Techniques that can collect and their results are indirect. Techniques that can collect
direct and on-demand data from user traffic are more favorable. direct and on-demand data from user traffic are more favorable.
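The contrast between poll-based collection and subscription-based push
can be sketched with a small simulation. This is a minimal illustration
under stated assumptions, not a protocol implementation: the Source
class, its subscribe/update methods, and the run helper are hypothetical
names, and the sample values are made up.

```python
class Source:
    """Hypothetical data source, e.g. a gauge exposed by a device."""

    def __init__(self):
        self.value = 0
        self._subscribers = []

    def subscribe(self, callback):
        # Register a consumer that receives every update (push model).
        self._subscribers.append(callback)

    def update(self, value):
        # Record the new value and push it to all subscribers.
        self.value = value
        for callback in self._subscribers:
            callback(value)


def run(samples, poll_every):
    """Feed samples into a source; collect them by polling and by push.

    The poller reads the current value once every `poll_every` ticks,
    while the subscriber is notified on every change.
    """
    source = Source()
    polled, pushed = [], []
    source.subscribe(pushed.append)
    for tick, value in enumerate(samples):
        source.update(value)
        if tick % poll_every == 0:
            polled.append(source.value)
    return polled, pushed


# A short burst at tick 2 falls between two polls and is never seen
# by the poller, but the subscriber receives it.
polled, pushed = run([1, 1, 90, 1, 1, 1], poll_every=3)
```

In this run the poller only samples ticks 0 and 3 and misses the burst
value, which is the kind of gap that makes low-frequency polling
ill-suited to the use cases above.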
2.3. Glossary 2.4. Glossary
Before further discussion, we list some key terminology and acronyms Before further discussion, we list some key terminology and acronyms
used in this document. We make an intentional distinction between used in this document. We make an intentional distinction between
network telemetry and network OAM. network telemetry and network OAM.
AI: Artificial Intelligence. In network domain, AI refers to the AI: Artificial Intelligence. In network domain, AI refers to the
machine-learning based technologies for automated network machine-learning based technologies for automated network
operation and other tasks. operation and other tasks.
AM: Alternate Marking, a flow performance measurement method,
specified in [RFC8321].
BMP: BGP Monitoring Protocol, specified in [RFC7854]. BMP: BGP Monitoring Protocol, specified in [RFC7854].
DNP: Dynamic Network Probe, referring to programmable in-network DNP: Dynamic Network Probe, referring to programmable in-network
sensors for network monitoring and measurement. sensors for network monitoring and measurement.
DPI: Deep Packet Inspection, referring to the techniques that DPI: Deep Packet Inspection, referring to the techniques that
examine packets beyond the L3/L4 headers. examine packets beyond the L3/L4 headers.
gNMI: gRPC Network Management Interface, a network management gNMI: gRPC Network Management Interface, a network management
protocol from OpenConfig Operator Working Group, mainly protocol from OpenConfig Operator Working Group, mainly
contributed by Google. See [gnmi] for details. contributed by Google. See [gnmi] for details.
gRPC: gRPC Remote Procedure Call, an open source high performance RPC gRPC: gRPC Remote Procedure Call, an open source high performance RPC
framework that gNMI is based on. See [grpc] for details. framework that gNMI is based on. See [grpc] for details.
IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. IPFIX: IP Flow Information Export Protocol, specified in [RFC7011].
IPFPM: IP Flow Performance Measurement method, specified in
[RFC8321].
IOAM: In-situ OAM, a dataplane on-path telemetry technique. IOAM: In-situ OAM, a dataplane on-path telemetry technique.
NETCONF: Network Configuration Protocol, specified in [RFC6241]. NETCONF: Network Configuration Protocol, specified in [RFC6241].
NetFlow: A Cisco protocol for flow record collecting, described in
[RFC3594].
Network Telemetry: Acquiring and processing network data remotely Network Telemetry: Acquiring and processing network data remotely
for network monitoring and operation. A general term for a large for network monitoring and operation. A general term for a large
set of network visibility techniques and protocols, with the set of network visibility techniques and protocols, with the
characteristics defined in this document. Network telemetry characteristics defined in this document. Network telemetry
addresses the current network operation issues and enables smooth addresses the current network operation issues and enables smooth
evolution toward future intent-driven autonomous networks. evolution toward future intent-driven autonomous networks.
NMS: Network Management System, referring to applications that allow NMS: Network Management System, referring to applications that allow
network administrators to manage a network's software and hardware network administrators to manage a network's software and hardware
components. It usually records data from a network's remote components. It usually records data from a network's remote
skipping to change at page 8, line 33 skipping to change at page 9, line 19
OAM: Operations, Administration, and Maintenance. A group of OAM: Operations, Administration, and Maintenance. A group of
network management functions that provide network fault network management functions that provide network fault
indication, fault localization, performance information, and data indication, fault localization, performance information, and data
and diagnosis functions. Most conventional network monitoring and diagnosis functions. Most conventional network monitoring
techniques and protocols belong to network OAM. techniques and protocols belong to network OAM.
PBT: Postcard-Based Telemetry, a dataplane on-path telemetry PBT: Postcard-Based Telemetry, a dataplane on-path telemetry
technique. technique.
SMIv2: Structure of Management Information Version 2, specified in
[RFC2578].
SNMP: Simple Network Management Protocol. Versions 1 and 2 are SNMP: Simple Network Management Protocol. Versions 1 and 2 are
specified in [RFC1157] and [RFC3416], respectively. specified in [RFC1157] and [RFC3416], respectively.
YANG: The abbreviation of "Yet Another Next Generation". YANG is a YANG: The abbreviation of "Yet Another Next Generation". YANG is a
data modeling language for the definition of data sent over data modeling language for the definition of data sent over
network management protocols such as the NETCONF and RESTCONF. network management protocols such as the NETCONF and RESTCONF.
YANG is defined in [RFC6020]. YANG is defined in [RFC6020].
YANG FSM: A YANG model that describes events, operations, and finite YANG FSM: A YANG model that describes events, operations, and finite
state machine of YANG-defined network elements. state machine of YANG-defined network elements.
YANG PUSH: A method to subscribe to data pushed from remote YANG YANG PUSH: A method to subscribe to data pushed from remote YANG
datastores on network devices. datastores on network devices. Details are specified in [RFC8641]
and [RFC8639].
2.4. Network Telemetry 2.5. Network Telemetry
Network telemetry has emerged as a mainstream technical term to refer Network telemetry has emerged as a mainstream technical term to refer
to the newer data collection and consumption techniques, to the newer data collection and consumption techniques,
distinguishing itself from the conventional techniques for network OAM. distinguishing itself from the conventional techniques for network OAM.
The representative techniques and protocols include IPFIX [RFC7011] Many such techniques have been widely deployed. The representative
and gRPC [grpc]. Network telemetry allows separate entities to techniques and protocols include IPFIX [RFC7011] and gRPC [grpc].
acquire data from network devices so that data can be visualized and Network telemetry allows separate entities to acquire data from
analyzed to support network monitoring and operation. Network network devices so that data can be visualized and analyzed to
telemetry overlaps with the conventional network OAM and has a wider support network monitoring and operation. Network telemetry overlaps
scope than it. It is expected that network telemetry can provide the with the conventional network OAM and has a wider scope than it. It
necessary network insight for autonomous networks and address the is expected that network telemetry can provide the necessary network
shortcomings of conventional OAM techniques. insight for autonomous networks and address the shortcomings of
conventional OAM techniques.
One difference between the network telemetry and the network OAM is One difference between the network telemetry and the network OAM is
that in general the network telemetry assumes machines as data that in general the network telemetry assumes machines as data
consumers rather than human operators. Hence, the network telemetry consumers rather than human operators. Hence, the network telemetry
can directly trigger the automated network operation, while the can directly trigger the automated network operation, while the
conventional OAM tools usually help human operators to monitor and conventional OAM tools usually help human operators to monitor and
diagnose the networks and guide manual network operations. The diagnose the networks and guide manual network operations. The
difference leads to very different techniques. difference leads to very different techniques.
Although the network telemetry techniques are just emerging and Although the network telemetry techniques are just emerging and
skipping to change at page 10, line 19 skipping to change at page 11, line 11
run-time to cater to the specific need of applications. This run-time to cater to the specific need of applications. This
needs the support of a programmable data plane which allows probes needs the support of a programmable data plane which allows probes
with custom functions to be deployed at flexible locations. with custom functions to be deployed at flexible locations.
o In-Network Data Aggregation and Correlation: Network devices and o In-Network Data Aggregation and Correlation: Network devices and
aggregation points can work out which events and what data needs aggregation points can work out which events and what data needs
to be stored, reported, or discarded thus reducing the load on the to be stored, reported, or discarded thus reducing the load on the
central collection and processing points while still ensuring that central collection and processing points while still ensuring that
the right information is ready to be processed in a timely way. the right information is ready to be processed in a timely way.
o In-Network Processing and Action: Sometimes it is not necessary or o In-Network Processing: Sometimes it is not necessary or feasible
feasible to gather all information to a central point to be to gather all information to a central point to be processed and
processed and acted upon. It is possible for the data processing acted upon. It is possible for the data processing to be done in
to be done in network, and actions to be taken locally. network, allowing reactive actions to be taken locally.
o Direct Data Plane Export: The data originated from the data plane o Direct Data Plane Export: The data originated from the data plane
forwarding chips can be directly exported to the data consumer for forwarding chips can be directly exported to the data consumer for
efficiency, especially when the data bandwidth is large and the efficiency, especially when the data bandwidth is large and the
real-time processing is required. real-time processing is required.
o In-band Data Collection: In addition to the passive and active o In-band Data Collection: In addition to the passive and active
data collection approaches, the new hybrid approach allows data to data collection approaches, the new hybrid approach allows data to
be collected directly for any target flow on its entire forwarding be collected directly for any target flow on its entire forwarding
path [I-D.song-opsawg-ifit-framework]. path [I-D.song-opsawg-ifit-framework].
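The In-Network Data Aggregation and Correlation characteristic listed
above can be illustrated with a minimal sketch. It assumes a device
holds a window of raw numeric samples and a locally configured
threshold; the function name, the threshold parameter, and the summary
fields are hypothetical and are not defined by this document. The
device reports one compact summary plus only the samples that exceed
the threshold, and discards the rest locally.

```python
from statistics import mean


def aggregate_locally(samples, threshold):
    """Reduce a non-empty window of raw samples before export.

    Hypothetical helper: returns a compact summary of the whole window
    and the list of samples that exceed the threshold. Everything else
    is discarded on the device instead of being sent to the collector.
    """
    # Samples above the threshold are treated as events worth reporting.
    events = [s for s in samples if s > threshold]
    # The summary stands in for all samples that are not reported.
    summary = {
        "count": len(samples),
        "mean": mean(samples),
        "max": max(samples),
    }
    return summary, events
```

With such a reduction, the load on the central collection point grows
with the number of reported events rather than with the number of raw
samples.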
It is worth noting that a network telemetry system should not be It is worth noting that a network telemetry system should not be
intrusive to normal network operations, by avoiding the pitfall of intrusive to normal network operations, by avoiding the pitfall of
the "observer effect". That is, it should not change the network the "observer effect". That is, it should not change the network
behavior or affect the forwarding performance. Otherwise, the whole behavior or affect the forwarding performance. Otherwise, the whole
purpose of network telemetry is defeated. purpose of network telemetry is defeated.
Although in many cases a network telemetry system is akin to the SDN Although in many cases a network telemetry system involves a remote
architecture, it is important to understand that network telemetry data collecting, processing, and reacting entity, it is important to
does not imply the need of any centralized data processing and understand that network telemetry does not imply the necessity of
analytics engine. Telemetry data producers and consumers can such an entity. Telemetry data producers and consumers can work in
perfectly work in distributed or peer-to-peer fashions instead. distributed or peer-to-peer fashions instead. In such cases, a
network node can be the direct consumer of telemetry data from other
nodes.
3. The Necessity of a Network Telemetry Framework 3. The Necessity of a Network Telemetry Framework
Big data analytics and machine-learning based AI technologies are Network data analytics and machine-learning technologies are applied
applied for network operation automation, relying on abundant data for network operation automation, relying on abundant and coherent
from networks. The single-sourced and static data acquisition cannot data from networks. The single-sourced and static data acquisition
meet the data requirements. It is desirable to have a framework that cannot meet the data requirements. The scattered standards and
integrates multiple telemetry approaches from different layers. This diverse techniques are hard to integrate. It is desirable to
allows flexible combinations for different applications. The have a framework that classifies and organizes different telemetry
framework would benefit application development for the following data sources and types, defines different components of a network
reasons: telemetry system and their interactions, and helps coordinate and
integrate multiple telemetry approaches from different layers. This
allows flexible combinations for different applications, while
normalizing and simplifying interfaces. In detail, such a framework
would benefit application development for the following reasons:
o The future autonomous networks will require a holistic view on o The future autonomous networks will require a holistic view on
network visibility. All the use cases and applications need to be network visibility. All the use cases and applications need to be
supported uniformly and coherently under a single intelligent supported uniformly and coherently under a single intelligent
agent. Therefore, the protocols and mechanisms should be agent. Therefore, the protocols and mechanisms should be
consolidated into a minimum yet comprehensive set. A telemetry consolidated into a minimum yet comprehensive set. A telemetry
framework can help to normalize the technique developments. framework can help to normalize the technique developments.
o Network visibility presents multiple viewpoints. For example, the o Network visibility presents multiple viewpoints. For example, the
device viewpoint takes the network infrastructure as the device viewpoint takes the network infrastructure as the
skipping to change at page 11, line 47 skipping to change at page 12, line 46
makes it possible to assemble a comprehensive network telemetry makes it possible to assemble a comprehensive network telemetry
system and to avoid repetitious or redundant work. The framework system and to avoid repetitious or redundant work. The framework
should cover the concepts and components from the standardization should cover the concepts and components from the standardization
perspective. This document clarifies the layered modules on which perspective. This document clarifies the layered modules on which
the telemetry is exerted and decomposes the telemetry system into a the telemetry is exerted and decomposes the telemetry system into a
set of distinct components that the existing and future work can set of distinct components that the existing and future work can
easily map to. easily map to.
4. Network Telemetry Framework 4. Network Telemetry Framework
Network telemetry techniques can be classified from multiple The top level network telemetry framework partitions the network
dimensions. In this document, we provide three unique perspectives: telemetry into four modules based on the telemetry data object source
data acquiring mechanisms, data objects, and function components. and represents their relationship. The next level framework reveals
that each module replicates the same architecture comprising the same
4.1. Data Acquiring Mechanisms and Data Types set of components. Throughout the framework, the same set of
abstract data acquiring mechanisms and data types are applied. The
Broadly speaking, network data can be acquired through subscription two-level architecture with the uniform data abstraction helps
(push) and query (poll). A subscriber may request data when it is accurately pinpoint a protocol or technique to its position in a
ready. It follows a Publish-Subscription (Pub-Sub) mode or a network telemetry system or disaggregate a network telemetry system
Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, pre- into manageable parts.
defined data are published and multiple qualified subscribers can
subscribe the data. In the Sub-Pub mode, a subscriber designates
what data are of interest and demands the network devices to deliver
the data when available.
In contrast, query is used when a querier expects immediate feedback
from network devices. The queried data may be directly extracted
from some specific data source, or synthesized and processed from raw
data. Query suits interactive network telemetry applications.
There are four types of data from network devices:
Simple Data: The data that are steadily available from some data
store or static probes in network devices. Such data can be
specified by YANG model.
Complex Data: The data need to be synthesized or processed in
network from raw data from one or more network devices. The data
processing function can be statically or dynamically loaded into
network devices.
Event-triggered Data: The data are conditionally acquired based on
the occurrence of some events. An event can be modeled as a
Finite State Machine (FSM).
Streaming Data: The data are continuously or periodically generated.
It can be time series or the dump of databases. The streaming
data reflect realtime network states and metrics and require large
bandwidth and processing power.
The above data types are not mutually exclusive. For example, event-
triggered data can be simple or complex, and streaming data can be
event triggered. The relationships of these data types are
illustrated in Figure 1.
+--------------------------+
| +----------------------+ |
| | +-----------------+ | |
| | | +-------------+ | | |
| | | | Simple Data | | | |
| | | +-------------+ | | |
| | | Complex Data | | |
| | +-----------------+ | |
| | Event-triggered Data | |
| +----------------------+ |
| Streaming Data |
+--------------------------+
Figure 1: Data Type Relationship
Subscription usually deals with event-triggered data and streaming
data, and query usually deals with simple data and complex data. The
conventional OAM techniques are mostly about querying simple data.
While these techniques are still useful, more advanced network
telemetry techniques are designed mainly for event-triggered or
streaming data subscription, and complex data query.
4.2. Data Object Modules 4.1. Top Level Modules
Telemetry can be applied on the forwarding plane, the control plane, Telemetry can be applied on the forwarding plane, the control plane,
and the management plane in a network, as well as other sources out and the management plane in a network, as well as other sources out
of the network, as shown in Figure 2. Therefore, we categorize the of the network, as shown in Figure 1. Therefore, we categorize the
network telemetry into four distinct modules with each having its own network telemetry into four distinct modules with each having its own
interface to Network Operation Applications. interface to Network Operation Applications.
+------------------------------+ +------------------------------+
| | | |
| Network Operation |<-------+ | Network Operation |<-------+
| Applications | | | Applications | |
| | | | | |
+------------------------------+ | +------------------------------+ |
^ ^ ^ | ^ ^ ^ |
skipping to change at page 14, line 28 skipping to change at page 13, line 40
| | | | | Event | | | | | | Event |
| ^ V | Management | | Telemetry | | ^ V | Management | | Telemetry |
+------|--------+ Plane | | | +------|--------+ Plane | | |
| V | Telemetry | +-----------+ | V | Telemetry | +-----------+
| Forwarding | | | Forwarding | |
| Plane <---> | | Plane <---> |
| Telemetry | | | Telemetry | |
| | | | | |
+---------------+--------------+ +---------------+--------------+
Figure 2: Modules in Layer Category of NTF Figure 1: Modules in Layer Category of NTF
The rationale of this partition lies in the different telemetry data The rationale of this partition lies in the different telemetry data
objects which result in different data sources and export locations. objects which result in different data sources and export locations.
Such differences have profound implications on in-network data Such differences have profound implications on in-network data
programming and processing capability, data encoding and transport programming and processing capability, data encoding and transport
protocol, and data bandwidth and latency. protocol, and data bandwidth and latency.
We summarize the major differences of the four modules in the We summarize the major differences of the four modules in the
following table. They are compared from six aspects: data object, following table. They are compared from six aspects: data object,
data export location, data model, data encoding, telemetry protocol, data export location, data model, data encoding, telemetry protocol,
skipping to change at page 15, line 34 skipping to change at page 14, line 43
|Data | GPB, JSON, | GPB, JSON, | plain | GPB, JSON | |Data | GPB, JSON, | GPB, JSON, | plain | GPB, JSON |
|Encoding | XML, plain | XML | | XML, plain| |Encoding | XML, plain | XML | | XML, plain|
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC | |Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC |
| | IPFIX,mirror | | | | | | IPFIX,mirror | | | |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP | |Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP |
| | UDP | | | UDP | | | UDP | | | UDP |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
Figure 3: Comparison of the Data Object Modules Figure 2: Comparison of the Data Object Modules
Note that the interaction with the network operation applications can Note that the interaction with the network operation applications can
be indirect. Some in-device data transfer is possible. For example, be indirect. Some in-device data transfer is possible. For example,
in the management plane telemetry, the management plane may need to in the management plane telemetry, the management plane may need to
acquire data from the data plane. Some of the operational states can acquire data from the data plane. Some of the operational states can
only be derived from the data plane such as the interface status and only be derived from the data plane such as the interface status and
statistics. For another example, the control plane telemetry may statistics. For another example, the control plane telemetry may
need to access the Forwarding Information Base (FIB) in data plane. need to access the Forwarding Information Base (FIB) in data plane.
On the other hand, an application may involve more than one plane and On the other hand, an application may involve more than one plane and
interact with multiple planes simultaneously. For example, an SLA interact with multiple planes simultaneously. For example, an SLA
compliance application may require both the data plane telemetry and compliance application may require both the data plane telemetry and
the control plane telemetry. the control plane telemetry.
4.2.1. Requirements and Challenges for each Module The requirements and challenges for each module are summarized as
follows.
4.2.1.1. Management Plane Telemetry 4.1.1. Management Plane Telemetry
The management plane of network elements interacts with the Network The management plane of network elements interacts with the Network
Management System (NMS), and provides information such as performance Management System (NMS), and provides information such as performance
data, network logging data, network warning and defects data, and data, network logging data, network warning and defects data, and
network statistics and state data. Some legacy protocols, such as network statistics and state data. Some legacy protocols, such as
SNMP and Syslog, are widely used for the management plane. However, SNMP and Syslog, are widely used for the management plane. However,
these protocols are insufficient to meet the requirements of the these protocols are insufficient to meet the requirements of the
future automated network operation applications. future automated network operation applications.
New management plane telemetry protocols should consider the New management plane telemetry protocols should consider the
skipping to change at page 16, line 36 skipping to change at page 15, line 44
languages such as YANG can efficiently describe structured data languages such as YANG can efficiently describe structured data
and normalize data encoding and transformation. and normalize data encoding and transformation.
High Speed Data Transport: In order to retain the information, a High Speed Data Transport: In order to retain the information, a
server needs to send a large amount of data at high frequency. server needs to send a large amount of data at high frequency.
Compact encoding formats are needed to compress the data and Compact encoding formats are needed to compress the data and
improve the data transport efficiency. The subscription mode, by improve the data transport efficiency. The subscription mode, by
replacing the query mode, reduces the interactions between clients replacing the query mode, reduces the interactions between clients
and servers and helps to improve the server's efficiency. and servers and helps to improve the server's efficiency.
4.2.1.2. Control Plane Telemetry 4.1.2. Control Plane Telemetry
The control plane telemetry refers to the health condition monitoring The control plane telemetry refers to the health condition monitoring
of different network control protocols covering Layer 2 to Layer 7. of different network control protocols covering Layer 2 to Layer 7.
Keeping track of the running status of these protocols is beneficial Keeping track of the running status of these protocols is beneficial
for detecting, localizing, and even predicting various network for detecting, localizing, and even predicting various network
issues, as well as network optimization, in real-time and in fine issues, as well as network optimization, in real-time and in fine
granularity. granularity.
One of the most challenging problems for the control plane telemetry One of the most challenging problems for the control plane telemetry
is how to correlate the End-to-End (E2E) Key Performance Indicators is how to correlate the End-to-End (E2E) Key Performance Indicators
skipping to change at page 17, line 22 skipping to change at page 16, line 31
and network optimization. and network optimization.
An example of the control plane telemetry is the BGP monitoring An example of the control plane telemetry is the BGP monitoring
protocol (BMP). It is currently used to monitor BGP routes and protocol (BMP). It is currently used to monitor BGP routes and
enables rich applications, such as BGP peer analysis, AS analysis, enables rich applications, such as BGP peer analysis, AS analysis,
prefix analysis, security analysis, and so on. However, the prefix analysis, security analysis, and so on. However, the
monitoring of other layers, protocols and the cross-layer, cross- monitoring of other layers, protocols and the cross-layer, cross-
protocol KPI correlations are still in their infancy (e.g., the IGP protocol KPI correlations are still in their infancy (e.g., the IGP
monitoring is missing), which require further research. monitoring is missing), which require further research.
4.2.1.3. Data Plane Telemetry 4.1.3. Data Plane Telemetry
An effective data plane telemetry system relies on the data that the An effective data plane telemetry system relies on the data that the
network device can expose. The data's quality, quantity, and network device can expose. The data's quality, quantity, and
timeliness must meet some stringent requirements. This raises some timeliness must meet some stringent requirements. This raises some
challenges to the network data plane devices where the first hand challenges to the network data plane devices where the first hand
data originate. data originate.
o A data plane device's main function is user traffic processing and o A data plane device's main function is user traffic processing and
forwarding. While supporting network visibility is important, the forwarding. While supporting network visibility is important, the
telemetry is just an auxiliary function, and it should not impede telemetry is just an auxiliary function, and it should not impede
skipping to change at page 18, line 16 skipping to change at page 17, line 26
o The data plane telemetry should support incremental deployment and o The data plane telemetry should support incremental deployment and
work even though some devices are unaware of the system. This work even though some devices are unaware of the system. This
challenge is highly relevant to the standards and legacy networks. challenge is highly relevant to the standards and legacy networks.
The data plane programmability is essential to support network The data plane programmability is essential to support network
telemetry. Newer data plane forwarding chips are equipped with telemetry. Newer data plane forwarding chips are equipped with
advanced telemetry features and provide flexibility to support advanced telemetry features and provide flexibility to support
customized telemetry functions. customized telemetry functions.
4.2.1.3.1. Technique Taxonomy 4.1.3.1. Technique Taxonomy
There can be multiple possible dimensions to classify the data plane There can be multiple possible dimensions to classify the data plane
telemetry techniques. telemetry techniques.
Active, Passive, and Hybrid: The active and passive methods (as well Active, Passive, and Hybrid: The active and passive methods (as well
as the hybrid types) are well documented in [RFC7799]. The as the hybrid types) are well documented in [RFC7799]. The
passive methods include TCPDUMP, IPFIX [RFC7011], sflow, and passive methods include TCPDUMP, IPFIX [RFC7011], sflow, and
traffic mirror. These methods usually have low data coverage. traffic mirror. These methods usually have low data coverage.
The bandwidth cost is very high in order to improve the data The bandwidth cost is very high in order to improve the data
coverage. On the other hand, the active methods include Ping, coverage. On the other hand, the active methods include Ping,
Traceroute, OWAMP [RFC4656], and TWAMP [RFC5357]. These methods Traceroute, OWAMP [RFC4656], TWAMP [RFC5357], and Cisco's SLA
are intrusive and only provide indirect network measurement Protocol [RFC6812]. These methods are intrusive and only provide
results. The hybrid methods, including in-situ OAM indirect network measurement results. The hybrid methods,
[I-D.ietf-ippm-ioam-data], IPFPM [RFC8321], and Multipoint including in-situ OAM [I-D.ietf-ippm-ioam-data], IPFPM [RFC8321],
Alternate Marking [I-D.fioccola-ippm-multipoint-alt-mark], provide and Multipoint Alternate Marking
a well-balanced and more flexible approach. However, these [I-D.fioccola-ippm-multipoint-alt-mark], provide a well-balanced
methods are also more complex to implement. and more flexible approach. However, these methods are also more
complex to implement.
In-Band and Out-of-Band: The telemetry data, before being exported In-Band and Out-of-Band: The telemetry data, before being exported
to some collector, can be carried in user packets. Such methods to some collector, can be carried in user packets. Such methods
are considered in-band (e.g., in-situ OAM are considered in-band (e.g., in-situ OAM
[I-D.ietf-ippm-ioam-data]). If the telemetry data is directly [I-D.ietf-ippm-ioam-data]). If the telemetry data is directly
exported to some collector without modifying the user packets, exported to some collector without modifying the user packets,
such methods are considered out-of-band (e.g., postcard-based such methods are considered out-of-band (e.g., postcard-based
INT). It is possible to have hybrid methods. For example, only INT). It is possible to have hybrid methods. For example, only
the telemetry instruction or partial data is carried by user the telemetry instruction or partial data is carried by user
packets (e.g., IPFPM [RFC8321]). packets (e.g., IPFPM [RFC8321]).
E2E and In-Network: Some E2E methods start from and end at the E2E and In-Network: Some E2E methods start from and end at the
network end hosts (e.g., Ping). The other methods work in network end hosts (e.g., Ping). The other methods work in
networks and are transparent to end hosts. However, if needed, networks and are transparent to end hosts. However, if needed,
the in-network methods can be easily extended into end hosts. the in-network methods can be easily extended into end hosts.
Flow, Path, and Node: Depending on the telemetry objective, the Information Type: Depending on the telemetry objective, the methods
methods can be flow-based (e.g., in-situ OAM can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]),
path-based (e.g., Traceroute), and node-based (e.g., IPFIX
[I-D.ietf-ippm-ioam-data]), path-based (e.g., Traceroute), and [RFC7011]). The various data objects can be packet, flow record,
node-based (e.g., IPFIX [RFC7011]). measurement, states, and signal.
4.2.1.4. External Data Telemetry 4.1.4. External Data Telemetry
Events that occur outside the boundaries of the network system are Events that occur outside the boundaries of the network system are
another important source of network telemetry. Correlating both another important source of network telemetry. Correlating both
internal telemetry data and external events with the requirements of internal telemetry data and external events with the requirements of
network systems, as presented in network systems, as presented in
[I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and
functional advantage to management operations. functional advantage to management operations.
As with other sources of telemetry information, the data and events As with other sources of telemetry information, the data and events
must meet strict requirements, especially in terms of timeliness, must meet strict requirements, especially in terms of timeliness,
skipping to change at page 19, line 48 skipping to change at page 19, line 11
current and future devices and applications. Therefore, it must current and future devices and applications. Therefore, it must
be easily mapped to current information models, such as in terms be easily mapped to current information models, such as in terms
of YANG. of YANG.
Organizing together both internal and external telemetry information Organizing together both internal and external telemetry information
will be key for the general exploitation of the management will be key for the general exploitation of the management
possibilities of current and future network systems, as reflected in possibilities of current and future network systems, as reflected in
the incorporation of cognitive capabilities to new hardware and the incorporation of cognitive capabilities to new hardware and
software (virtual) elements. software (virtual) elements.
4.3. Function Components 4.2. Second Level Function Components
The telemetry module at each plane can be further partitioned into Reflecting the best current practice, the telemetry module at each
five distinct components: plane is further partitioned into five distinct components:
Data Query, Analysis, and Storage: This component works at the Data Query, Analysis, and Storage: This component works at the
application layer. On the one hand, it is responsible for issuing application layer. On the one hand, it is responsible for issuing
data requirements. The data of interest can be modeled data data requirements. The data of interest can be modeled data
through configuration or custom data through programming. The through configuration or custom data through programming. The
data requirements can be queries for one-shot data or data requirements can be queries for one-shot data or
subscriptions for events or streaming data. On the other hand, it subscriptions for events or streaming data. On the other hand, it
receives, stores, and processes the returned data from network receives, stores, and processes the returned data from network
devices. Data analysis can be interactive to initiate further devices. Data analysis can be interactive to initiate further
data queries. This component can reside in either network devices data queries. This component can reside in either network devices
skipping to change at page 21, line 9 skipping to change at page 20, line 9
Data Object and Source: This component determines the monitoring Data Object and Source: This component determines the monitoring
object and original data source. The data source usually just object and original data source. The data source usually just
provides raw data which needs further processing. A data source provides raw data which needs further processing. A data source
can be considered a probe. A probe can be statically installed or can be considered a probe. A probe can be statically installed or
dynamically installed. dynamically installed.
+----------------------------------------+ +----------------------------------------+
| | | |
| Data Query, Analysis, & Storage | | Data Query, Analysis, & Storage |
| | | |
+----------------------------------------+ +-------+++ -----------------------------+
| ^ ||| ^^^
| | ||| |||
V | ||V |||
+---------------------+------------------+ +--+V--------------------+++------------+
| Data Configuration | | +-----V---------------------+------------+ |
| & Subscription | Data Encoding | +---------------------+-------+----------+ | |
| (model, template, | & Export | | Data Configuration | | | |
| & program) | | | & Subscription | Data Encoding | | |
+---------------------+------------------| | (model, template, | & Export | | |
| | | & program) | | | |
| Data Generation | +---------------------+------------------| | |
| & Processing | | | | |
| | | Data Generation | | |
+----------------------------------------| | & Processing | | |
| | | | | |
| Data Object and Source | +----------------------------------------| | |
| | | | | |
| Data Object and Source | |-+
| |-+
+----------------------------------------+ +----------------------------------------+
Figure 4: Components in the Network Telemetry Framework Figure 3: Components in the Network Telemetry Framework
4.3. Data Acquiring Mechanism and Type Abstraction
Broadly speaking, network data can be acquired through subscription
(push) and query (poll). Subscription is a contract between
publisher and subscriber. After initial setup, the subscribed data
is automatically delivered to registered subscribers until the
subscription expires. Subscription can be partitioned into two
sub-modes: the Publish-Subscription (Pub-Sub) mode and the
Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, a publisher
publishes pre-defined data and any qualified subscriber can subscribe
to the data as-is. In the Sub-Pub mode, a subscriber initiates a data
request and sends it to a publisher; the publisher delivers the
requested data when available.
In contrast, query is used when a querier expects immediate and one-
off feedback from network devices. The queried data may be directly
extracted from some specific data source, or synthesized and
processed from raw data. Query suits interactive network
telemetry applications.
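The two subscription modes and the query mode described above can be
sketched as follows. This is a minimal illustration only, not part of
the framework: the Publisher class, its method names, and the data
paths are hypothetical, and subscription expiry is left out for
brevity.

```python
from typing import Callable, Dict, List

Callback = Callable[[str, object], None]


class Publisher:
    """Hypothetical telemetry publisher running on a network device."""

    def __init__(self, datastore: Dict[str, object]) -> None:
        self.datastore = datastore              # data readable on demand
        self.predefined = {"if/eth0/counters"}  # streams offered in Pub-Sub mode
        self.subscribers: Dict[str, List[Callback]] = {}

    def subscribe_predefined(self, stream: str, cb: Callback) -> None:
        """Pub-Sub: subscribe as-is to data the publisher already offers."""
        if stream not in self.predefined:
            raise KeyError(f"{stream} is not a pre-defined stream")
        self.subscribers.setdefault(stream, []).append(cb)

    def subscribe_custom(self, path: str, cb: Callback) -> None:
        """Sub-Pub: the subscriber names the data it wants delivered."""
        self.subscribers.setdefault(path, []).append(cb)

    def publish(self, path: str, value: object) -> None:
        """Push: store the new value and deliver it to registered subscribers."""
        self.datastore[path] = value
        for cb in self.subscribers.get(path, []):
            cb(path, value)

    def query(self, path: str) -> object:
        """Poll: return an immediate, one-off answer."""
        return self.datastore[path]


received = []
device = Publisher({"if/eth0/counters": 0, "cpu/load": 0.1})
device.subscribe_predefined("if/eth0/counters", lambda p, v: received.append((p, v)))
device.subscribe_custom("cpu/load", lambda p, v: received.append((p, v)))
device.publish("if/eth0/counters", 42)   # delivered to the Pub-Sub subscriber
device.publish("cpu/load", 0.7)          # delivered to the Sub-Pub subscriber
one_shot = device.query("cpu/load")      # query: single read, no contract
```

In this sketch the only difference between the two subscription modes
is who defines the data of interest: the publisher in Pub-Sub, the
subscriber in Sub-Pub.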
There are four types of data from network devices:
Simple Data: The data that are steadily available from some data
store or static probes in network devices. Such data can be
specified by a YANG model.
Complex Data: The data that need to be synthesized or processed in
the network from raw data from one or more network devices. The data
processing function can be statically or dynamically loaded into
network devices.
Event-triggered Data: The data are conditionally acquired based on
the occurrence of some events. An event can be modeled as a
Finite State Machine (FSM).
Streaming Data: The data are continuously or periodically generated.
They can be time series or database dumps. Streaming data
reflect real-time network states and metrics and require large
bandwidth and processing power.
The above data types are not mutually exclusive. For example, event-
triggered data can be simple or complex, and streaming data can be
simple, complex, or triggered by events. The relationships of these
data types are illustrated in Figure 4.
                +--------------+
        +------>| Simple Data  |<------+
        |       +--------------+       |
        |              ^               |
        |              |               |
        |       +------+-------+       |
        |   +-->| Complex Data |<--+   |
        |   |   +--------------+   |   |
        |   |                      |   |
        |   |                      |   |
+-------+---+----------+     +-----+---+-------+
| Event-triggered Data |<----+ Streaming Data  |
+----------------------+     +-----------------+
Figure 4: Data Type Relationship
Subscription usually deals with event-triggered data and streaming
data, and query usually deals with simple data and complex data, but
other pairings are also possible. The conventional OAM techniques
are mostly about querying simple data. While these techniques are
still useful, more advanced network telemetry techniques are designed
mainly for event-triggered or streaming data subscription, and
complex data query.
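Because the four data types are not mutually exclusive, one way to
picture them is as combinable flags. The sketch below is illustrative
only; the DataType enumeration and the helper names are hypothetical,
and the usual_mechanism() rule merely encodes the "usually" pairing
stated above rather than any hard constraint.

```python
from enum import Flag, auto
from typing import List


class DataType(Flag):
    """Hypothetical flags for the four data types; they may be combined."""
    SIMPLE = auto()
    COMPLEX = auto()
    EVENT_TRIGGERED = auto()
    STREAMING = auto()


def labels(kind: DataType) -> List[str]:
    """List the names of the individual types present in a combination."""
    return [member.name for member in DataType if member in kind]


def usual_mechanism(kind: DataType) -> str:
    """Return the acquiring mechanism that usually handles this data."""
    if kind & (DataType.EVENT_TRIGGERED | DataType.STREAMING):
        return "subscription"
    return "query"


# Streaming data can be simple; event-triggered data can be complex.
periodic_counter = DataType.STREAMING | DataType.SIMPLE
alarm_summary = DataType.EVENT_TRIGGERED | DataType.COMPLEX
interface_status = DataType.SIMPLE

counter_labels = labels(periodic_counter)
counter_mechanism = usual_mechanism(periodic_counter)
status_mechanism = usual_mechanism(interface_status)
```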
4.4. Existing Works Mapped in the Framework 4.4. Existing Works Mapped in the Framework
The following two tables provide a non-exhaustive list of existing The following two tables provide a non-exhaustive list of existing
works (mainly published in IETF and with the emphasis on the latest works (mainly published in IETF and with the emphasis on the latest
new technologies) and show their positions in the framework. More new technologies) and show their positions in the framework. More
details can be found in Appendix A. details can be found in Appendix A.
The first table is based on the data acquiring mechanisms and data The first table is based on the data acquiring mechanisms and data
types. types.
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| | Query | Subscription | | | Query | Subscription |
| | | | | | | |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Simple Data | SNMP, NETCONF,| SNMP, NETCONF | | Simple Data | SNMP, NETCONF,| SNMP, NETCONF |
| | YANG, BMP, | YANG, gRPC | | | YANG, BMP, | YANG, gRPC |
| | gRPC | | | | SMIv2, gRPC | |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Complex Data | DNP, YANG FSM | DNP, YANG PUSH | | Complex Data | DNP, YANG FSM | DNP, YANG PUSH |
| | gRPC, NETCONF | gRPC, NETCONF | | | gRPC, NETCONF | gRPC, NETCONF |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Event-triggered | | gRPC, NETCONF, | | Event-triggered | | gRPC, NETCONF, |
| Data | N/A | YANG PUSH, DNP | | Data | N/A | YANG PUSH, DNP |
| | | YANG FSM | | | | YANG FSM |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Streaming Data | | gRPC, NETCONF, | | Streaming Data | | gRPC, NETCONF, |
| | N/A | IOAM, PBT, DNP | | | N/A | IOAM, PBT, DNP |
| | | IPFIX, IPFPM | | | | IPFIX, IPFPM |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
Figure 5: Existing Work Mapping I Figure 5: Existing Work Mapping I
The second table is based on the telemetry modules and components. The second table is based on the telemetry modules and components.
+--------------+---------------+----------------+---------------+ +-------------+-----------------+---------------+--------------+
| | Management | Control | Forwarding | | | Management | Control | Forwarding |
| | Plane | Plane | Plane | | | Plane | Plane | Plane |
+--------------+---------------+----------------+---------------+ +-------------+-----------------+---------------+--------------+
| data Config. | gRPC, NETCONF,| NETCONF/YANG | NETCONF/YANG, | | data config.| gRPC, NETCONF, | NETCONF/YANG | NETCONF/YANG,|
| & subscrib. | YANG PUSH | | YANG FSM | | & subscribe | SMIv2,YANG PUSH | | YANG FSM |
+--------------+---------------+----------------+---------------+ +-------------+-----------------+---------------+--------------+
| data gen. & | DNP, | DNP, | IOAM, | | data gen. & | DNP, | DNP, | IOAM, |
| processing | YANG | YANG | PBT, IPFPM, | | process | YANG | YANG | PBT, IPFPM, |
| | | | DNP | | | | | DNP |
+--------------+---------------+----------------+---------------+ +-------------+-----------------+---------------+--------------+
| data | gRPC, NETCONF | BMP, NETCONF | IPFIX | | data | gRPC, NETCONF | BMP, NETCONF | IPFIX |
| export | YANG PUSH | | | | export | YANG PUSH | | |
+--------------+---------------+----------------+---------------+ +-------------+-----------------+---------------+--------------+
Figure 6: Existing Work Mapping II Figure 6: Existing Work Mapping II
5. Evolution of Network Telemetry 5. Evolution of Network Telemetry
Network telemetry is a fast evolving technical area. As the network Network telemetry is a fast evolving technical area. As the network
moves towards the automated operation, network telemetry undergoes moves towards the automated operation, network telemetry undergoes
several levels of evolution. several stages of evolution. Each stage is built upon the techniques
enabled by previous stages.
Level 0 - Static Telemetry: The telemetry data source and type are Stage 0 - Static Telemetry: The telemetry data source and type are
determined at design time. The network operator can only determined at design time. The network operator can only
configure how to use it with limited flexibility. configure how to use it with limited flexibility.
Level 1 - Dynamic Telemetry: The telemetry data can be dynamically Stage 1 - Dynamic Telemetry: The custom telemetry data can be
programmed or configured at runtime, allowing a tradeoff among dynamically programmed or configured at runtime, allowing a
resource, performance, flexibility, and coverage. DNP is an tradeoff among resource, performance, flexibility, and coverage.
effort towards this direction. DNP is an effort towards this direction.
Level 2 - Interactive Telemetry: The network operator can Stage 2 - Interactive Telemetry: The network operator can
continuously customize the telemetry data in real time to reflect continuously customize the telemetry data in real time to reflect
the network operation's visibility requirements. At this level, the network operation's visibility requirements. At this stage,
some tasks can be automated, although ultimately human operators some tasks can be automated, although ultimately human operators
will still need to sit in the middle to make decisions. will still need to sit in the middle to make decisions.
Level 3 - Closed-loop Telemetry: Human operators are completely Stage 3 - Closed-loop Telemetry: Human operators are completely
excluded from the control loop. The intelligent network operation excluded from the control loop. The intelligent network operation
engine automatically issues the telemetry data requests, analyzes engine automatically issues the telemetry data requests, analyzes
the data, and updates the network operations in closed control the data, and updates the network operations in closed control
loops. loops.
While most of the existing technologies belong to level 0 and level Most of the existing technologies belong to stage 0 and stage 1.
1, with the help of a clearly defined network telemetry framework, we Individual stage 2 and stage 3 applications are also possible now.
are now possible to assemble the technologies to support level 2 and
make solid steps towards level 3. However, the future autonomic networks may need a comprehensive
operation management system which relies on stage 2 and stage 3
telemetry to cover all the network operation tasks. A well-defined
network telemetry framework is the first step towards this direction.
6. Security Considerations 6. Security Considerations
The complexity of network telemetry raises significant security
implications. For example, telemetry data can be manipulated to
exhaust various network resources at each plane as well as the data
consumer; falsified or tampered data can mislead the decision making
and paralyze networks; wrong configuration and programming for
telemetry is equally harmful.
Given that this document has proposed a framework for network Given that this document has proposed a framework for network
telemetry and the telemetry mechanisms discussed are distinct (in telemetry and the telemetry mechanisms discussed are distinct (in
both message frequency and traffic amount) from the conventional both message frequency and traffic amount) from the conventional
network OAM concepts, we must also reflect that various new security network OAM concepts, we must also reflect that various new security
considerations may also arise. A number of techniques already exist considerations may also arise. A number of techniques already exist
for securing the forwarding plane, the control plane, and the for securing the forwarding plane, the control plane, and the
management plane in a network, but it is important to consider if any management plane in a network, but it is important to consider if any
new threat vectors are now being enabled via the use of network new threat vectors are now being enabled via the use of network
telemetry procedures and mechanisms. telemetry procedures and mechanisms.
skipping to change at page 24, line 42 skipping to change at page 25, line 29
o Tianran Zhou o Tianran Zhou
o Zhenbin Li o Zhenbin Li
o Zhenqiang Li o Zhenqiang Li
o Daniel King o Daniel King
o Adrian Farrel o Adrian Farrel
o Alexander Clemm
9. Acknowledgments 9. Acknowledgments
We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James
Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani, Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani,
Young Lee, Alexander Clemm, Qin Wu, and many others who have provided Young Lee, Qin Wu, and many others who have provided helpful comments
helpful comments and suggestions to improve this document. and suggestions to improve this document.
10. Informative References 10. Informative References
[gnmi] "gNMI - gRPC Network Management Interface", [gnmi] "gNMI - gRPC Network Management Interface",
<https://github.com/openconfig/reference/tree/master/rpc/ <https://github.com/openconfig/reference/tree/master/rpc/
gnmi>. gnmi>.
[grpc] "gRPC, A high performance, open-source universal RPC [grpc] "gRPC, A high performance, open-source universal RPC
framework", <https://grpc.io>. framework", <https://grpc.io>.
skipping to change at page 25, line 29 skipping to change at page 26, line 14
[I-D.ietf-grow-bmp-adj-rib-out] [I-D.ietf-grow-bmp-adj-rib-out]
Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S.
Zhuang, "Support for Adj-RIB-Out in BGP Monitoring Zhuang, "Support for Adj-RIB-Out in BGP Monitoring
Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work
in progress), August 2019. in progress), August 2019.
[I-D.ietf-grow-bmp-local-rib] [I-D.ietf-grow-bmp-local-rib]
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
"Support for Local RIB in BGP Monitoring Protocol (BMP)", "Support for Local RIB in BGP Monitoring Protocol (BMP)",
draft-ietf-grow-bmp-local-rib-06 (work in progress), draft-ietf-grow-bmp-local-rib-07 (work in progress), May
November 2019. 2020.
[I-D.ietf-ippm-ioam-data] [I-D.ietf-ippm-ioam-data]
Brockners, F., Bhandari, S., Pignataro, C., Gredler, H., Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields
Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov, for In-situ OAM", draft-ietf-ippm-ioam-data-10 (work in
P., remy@barefootnetworks.com, r., daniel.bernier@bell.ca, progress), July 2020.
d., and J. Lemon, "Data Fields for In-situ OAM", draft-
ietf-ippm-ioam-data-09 (work in progress), March 2020.
[I-D.ietf-netconf-udp-pub-channel] [I-D.ietf-netconf-udp-pub-channel]
Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication
Channel for Streaming Telemetry", draft-ietf-netconf-udp- Channel for Streaming Telemetry", draft-ietf-netconf-udp-
pub-channel-05 (work in progress), March 2019. pub-channel-05 (work in progress), March 2019.
[I-D.ietf-netconf-yang-push] [I-D.irtf-nmrg-ibn-concepts-definitions]
Clemm, A. and E. Voit, "Subscription to YANG Datastores", Clemm, A., Ciavaglia, L., Granville, L., and J. Tantsura,
draft-ietf-netconf-yang-push-25 (work in progress), May "Intent-Based Networking - Concepts and Definitions",
2019. draft-irtf-nmrg-ibn-concepts-definitions-02 (work in
progress), September 2020.
[I-D.kumar-rtgwg-grpc-protocol] [I-D.kumar-rtgwg-grpc-protocol]
Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC
Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in
progress), July 2016. progress), July 2016.
[I-D.openconfig-rtgwg-gnmi-spec] [I-D.openconfig-rtgwg-gnmi-spec]
Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
C., and C. Morrow, "gRPC Network Management Interface C., and C. Morrow, "gRPC Network Management Interface
(gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in (gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in
skipping to change at page 26, line 20 skipping to change at page 27, line 8
[I-D.pedro-nmrg-anticipated-adaptation] [I-D.pedro-nmrg-anticipated-adaptation]
Martinez-Julia, P., "Exploiting External Event Detectors Martinez-Julia, P., "Exploiting External Event Detectors
to Anticipate Resource Requirements for the Elastic to Anticipate Resource Requirements for the Elastic
Adaptation of SDN/NFV Systems", draft-pedro-nmrg- Adaptation of SDN/NFV Systems", draft-pedro-nmrg-
anticipated-adaptation-02 (work in progress), June 2018. anticipated-adaptation-02 (work in progress), June 2018.
[I-D.song-ippm-postcard-based-telemetry] [I-D.song-ippm-postcard-based-telemetry]
Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee, Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee,
"Postcard-based On-Path Flow Data Telemetry", draft-song- "Postcard-based On-Path Flow Data Telemetry", draft-song-
ippm-postcard-based-telemetry-06 (work in progress), ippm-postcard-based-telemetry-07 (work in progress), April
October 2019. 2020.
[I-D.song-opsawg-dnp4iq] [I-D.song-opsawg-dnp4iq]
Song, H. and J. Gong, "Requirements for Interactive Query Song, H. and J. Gong, "Requirements for Interactive Query
with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01
(work in progress), June 2017. (work in progress), June 2017.
[I-D.song-opsawg-ifit-framework] [I-D.song-opsawg-ifit-framework]
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In- Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In-
situ Flow Information Telemetry", draft-song-opsawg-ifit- situ Flow Information Telemetry", draft-song-opsawg-ifit-
framework-11 (work in progress), March 2020. framework-12 (work in progress), April 2020.
[I-D.zhou-netconf-multi-stream-originators] [I-D.zhou-netconf-multi-stream-originators]
Zhou, T., Zheng, G., Voit, E., and A. Clemm, "Subscription Zhou, T., Zheng, G., Voit, E., and A. Clemm, "Subscription
to Multiple Stream Originators", draft-zhou-netconf-multi- to Multiple Stream Originators", draft-zhou-netconf-multi-
stream-originators-10 (work in progress), November 2019. stream-originators-10 (work in progress), November 2019.
[RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin,
"Simple Network Management Protocol (SNMP)", RFC 1157, "Simple Network Management Protocol (SNMP)", RFC 1157,
DOI 10.17487/RFC1157, May 1990, DOI 10.17487/RFC1157, May 1990,
<https://www.rfc-editor.org/info/rfc1157>. <https://www.rfc-editor.org/info/rfc1157>.
[RFC2578] McCloghrie, K., Ed., Perkins, D., Ed., and J.
Schoenwaelder, Ed., "Structure of Management Information
Version 2 (SMIv2)", STD 58, RFC 2578,
DOI 10.17487/RFC2578, April 1999,
<https://www.rfc-editor.org/info/rfc2578>.
[RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, [RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981,
DOI 10.17487/RFC2981, October 2000, DOI 10.17487/RFC2981, October 2000,
<https://www.rfc-editor.org/info/rfc2981>. <https://www.rfc-editor.org/info/rfc2981>.
[RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations [RFC3416] Presuhn, R., Ed., "Version 2 of the Protocol Operations
for the Simple Network Management Protocol (SNMP)", for the Simple Network Management Protocol (SNMP)",
STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002, STD 62, RFC 3416, DOI 10.17487/RFC3416, December 2002,
<https://www.rfc-editor.org/info/rfc3416>. <https://www.rfc-editor.org/info/rfc3416>.
[RFC3594] Duffy, P., "PacketCable Security Ticket Control Sub-Option
for the DHCP CableLabs Client Configuration (CCC) Option",
RFC 3594, DOI 10.17487/RFC3594, September 2003,
<https://www.rfc-editor.org/info/rfc3594>.
[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management
Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877,
September 2004, <https://www.rfc-editor.org/info/rfc3877>. September 2004, <https://www.rfc-editor.org/info/rfc3877>.
[RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M. [RFC4656] Shalunov, S., Teitelbaum, B., Karp, A., Boote, J., and M.
Zekauskas, "A One-way Active Measurement Protocol Zekauskas, "A One-way Active Measurement Protocol
(OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006, (OWAMP)", RFC 4656, DOI 10.17487/RFC4656, September 2006,
<https://www.rfc-editor.org/info/rfc4656>. <https://www.rfc-editor.org/info/rfc4656>.
[RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. [RFC5357] Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J.
skipping to change at page 27, line 29 skipping to change at page 28, line 29
[RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for
the Network Configuration Protocol (NETCONF)", RFC 6020, the Network Configuration Protocol (NETCONF)", RFC 6020,
DOI 10.17487/RFC6020, October 2010, DOI 10.17487/RFC6020, October 2010,
<https://www.rfc-editor.org/info/rfc6020>. <https://www.rfc-editor.org/info/rfc6020>.
[RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed.,
and A. Bierman, Ed., "Network Configuration Protocol and A. Bierman, Ed., "Network Configuration Protocol
(NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011,
<https://www.rfc-editor.org/info/rfc6241>. <https://www.rfc-editor.org/info/rfc6241>.
[RFC6812] Chiba, M., Clemm, A., Medley, S., Salowey, J., Thombare,
S., and E. Yedavalli, "Cisco Service-Level Assurance
Protocol", RFC 6812, DOI 10.17487/RFC6812, January 2013,
<https://www.rfc-editor.org/info/rfc6812>.
[RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken, [RFC7011] Claise, B., Ed., Trammell, B., Ed., and P. Aitken,
"Specification of the IP Flow Information Export (IPFIX) "Specification of the IP Flow Information Export (IPFIX)
Protocol for the Exchange of Flow Information", STD 77, Protocol for the Exchange of Flow Information", STD 77,
RFC 7011, DOI 10.17487/RFC7011, September 2013, RFC 7011, DOI 10.17487/RFC7011, September 2013,
<https://www.rfc-editor.org/info/rfc7011>. <https://www.rfc-editor.org/info/rfc7011>.
[RFC7276] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y. [RFC7276] Mizrahi, T., Sprecher, N., Bellagamba, E., and Y.
Weingarten, "An Overview of Operations, Administration, Weingarten, "An Overview of Operations, Administration,
and Maintenance (OAM) Tools", RFC 7276, and Maintenance (OAM) Tools", RFC 7276,
DOI 10.17487/RFC7276, June 2014, DOI 10.17487/RFC7276, June 2014,
skipping to change at page 28, line 20 skipping to change at page 29, line 26
Monitoring Protocol (BMP)", RFC 7854, Monitoring Protocol (BMP)", RFC 7854,
DOI 10.17487/RFC7854, June 2016, DOI 10.17487/RFC7854, June 2016,
<https://www.rfc-editor.org/info/rfc7854>. <https://www.rfc-editor.org/info/rfc7854>.
[RFC8321] Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli, [RFC8321] Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi, L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi,
"Alternate-Marking Method for Passive and Hybrid "Alternate-Marking Method for Passive and Hybrid
Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321, Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321,
January 2018, <https://www.rfc-editor.org/info/rfc8321>. January 2018, <https://www.rfc-editor.org/info/rfc8321>.
[RFC8639] Voit, E., Clemm, A., Gonzalez Prieto, A., Nilsen-Nygaard,
E., and A. Tripathy, "Subscription to YANG Notifications",
RFC 8639, DOI 10.17487/RFC8639, September 2019,
<https://www.rfc-editor.org/info/rfc8639>.
[RFC8641] Clemm, A. and E. Voit, "Subscription to YANG Notifications
for Datastore Updates", RFC 8641, DOI 10.17487/RFC8641,
September 2019, <https://www.rfc-editor.org/info/rfc8641>.
Appendix A. A Survey on Existing Network Telemetry Techniques Appendix A. A Survey on Existing Network Telemetry Techniques
In this non-normative appendix, we provide an overview of some In this non-normative appendix, we provide an overview of some
existing techniques and standard proposals for each network telemetry existing techniques and standard proposals for each network telemetry
module. module.
A.1. Management Plane Telemetry A.1. Management Plane Telemetry
A.1.1. Push Extensions for NETCONF A.1.1. Push Extensions for NETCONF
NETCONF [RFC6241] is one popular network management protocol, which NETCONF [RFC6241] is one popular network management protocol, which
is also recommended by IETF. Although it can be used for data is also recommended by IETF. Although it can be used for data
collection, NETCONF is good at configurations. YANG Push collection, NETCONF is good at configurations. YANG Push
[I-D.ietf-netconf-yang-push] extends NETCONF and enables subscriber [RFC8641][RFC8639] extends NETCONF and enables subscriber
applications to request a continuous, customized stream of updates applications to request a continuous, customized stream of updates
from a YANG datastore. Providing such visibility into changes made from a YANG datastore. Providing such visibility into changes made
upon YANG configuration and operational objects enables new upon YANG configuration and operational objects enables new
capabilities based on the remote mirroring of configuration and capabilities based on the remote mirroring of configuration and
operational state. Moreover, distributed data collection mechanism operational state. Moreover, distributed data collection mechanism
[I-D.zhou-netconf-multi-stream-originators] via UDP based publication [I-D.zhou-netconf-multi-stream-originators] via UDP based publication
channel [I-D.ietf-netconf-udp-pub-channel] provides enhanced channel [I-D.ietf-netconf-udp-pub-channel] provides enhanced
efficiency for the NETCONF based telemetry. efficiency for the NETCONF based telemetry.
A.1.2. gRPC Network Management Interface A.1.2. gRPC Network Management Interface
skipping to change at page 29, line 38 skipping to change at page 31, line 7
[I-D.ietf-grow-bmp-local-rib] are encapsulated in the BMP Route [I-D.ietf-grow-bmp-local-rib] are encapsulated in the BMP Route
Monitoring Message and the BMP Route Mirroring Message, in the form Monitoring Message and the BMP Route Mirroring Message, in the form
of both initial table dump and real-time route update. In addition, of both initial table dump and real-time route update. In addition,
BGP statistics are reported through the BMP Stats Report Message, BGP statistics are reported through the BMP Stats Report Message,
which could be either timer triggered or event-driven. More BMP which could be either timer triggered or event-driven. More BMP
extensions can be explored to enrich the applications of BGP extensions can be explored to enrich the applications of BGP
monitoring. monitoring.
A.3. Data Plane Telemetry A.3. Data Plane Telemetry
A.3.1. The IPFPM technology A.3.1. The Alternate Marking technology
The Alternate Marking method is efficient to perform packet loss, The Alternate Marking method is efficient to perform packet loss,
delay, and jitter measurements both in an IP and Overlay Networks, as delay, and jitter measurements both in an IP and Overlay Networks, as
presented in IPFPM [RFC8321] and presented in [RFC8321] and [I-D.fioccola-ippm-multipoint-alt-mark].
[I-D.fioccola-ippm-multipoint-alt-mark].
This technique can be applied to point-to-point and multipoint-to- This technique can be applied to point-to-point and multipoint-to-
multipoint flows. Alternate Marking creates batches of packets by multipoint flows. Alternate Marking creates batches of packets by
alternating the value of 1 bit (or a label) of the packet header. alternating the value of 1 bit (or a label) of the packet header.
These batches of packets are unambiguously recognized over the These batches of packets are unambiguously recognized over the
network and the comparison of packet counters for each batch allows network and the comparison of packet counters for each batch allows
the packet loss calculation. The same idea can be applied to delay the packet loss calculation. The same idea can be applied to delay
measurement by selecting ad hoc packets with a marking bit dedicated measurement by selecting ad hoc packets with a marking bit dedicated
for delay measurements. for delay measurements.
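The per-batch counter comparison described above can be illustrated
with a small worked example. This is a simplified sketch, not the
procedure specified in [RFC8321]: the function names are hypothetical,
and it assumes packets are not reordered across batch boundaries and
that no batch is lost entirely, so that a flip of the marking bit
reliably delimits a batch.

```python
from typing import Iterable, List, Tuple

Batch = Tuple[int, int]  # (marking bit, packet count)


def batch_counts(marks: Iterable[int]) -> List[Batch]:
    """Group a sequence of marking bits into batches and count each batch.

    A new batch starts whenever the marking bit changes value.
    """
    batches: List[List[int]] = []
    for bit in marks:
        if batches and batches[-1][0] == bit:
            batches[-1][1] += 1
        else:
            batches.append([bit, 1])
    return [(bit, count) for bit, count in batches]


def packet_loss(upstream: List[Batch], downstream: List[Batch]) -> List[int]:
    """Compare per-batch counters from two measurement points."""
    if len(upstream) != len(downstream):
        raise ValueError("the two points observed a different number of batches")
    losses = []
    for (up_bit, up_count), (down_bit, down_count) in zip(upstream, downstream):
        if up_bit != down_bit:
            raise ValueError("batches are misaligned")
        losses.append(up_count - down_count)
    return losses


# Upstream node sees three batches of four packets with alternating marks.
sent = batch_counts([0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0])
# Downstream node misses one packet of the first batch and two of the third.
seen = batch_counts([0, 0, 0, 1, 1, 1, 1, 0, 0])
loss_per_batch = packet_loss(sent, seen)  # [1, 0, 2]
```

A deployment would typically delimit batches by time period as well as
by marking value, which removes the two simplifying assumptions made
here.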
 End of changes. 68 change blocks. 
255 lines changed or deleted 353 lines changed or added
