draft-ietf-opsawg-ntf-02.txt   draft-ietf-opsawg-ntf-03.txt 
OPSAWG H. Song, Ed. OPSAWG H. Song
Internet-Draft Futurewei Internet-Draft Futurewei
Intended status: Informational F. Qin Intended status: Informational F. Qin
Expires: April 10, 2020 China Mobile Expires: October 15, 2020 China Mobile
P. Martinez-Julia P. Martinez-Julia
NICT NICT
L. Ciavaglia L. Ciavaglia
Nokia Nokia
A. Wang A. Wang
China Telecom China Telecom
October 8, 2019 April 13, 2020
Network Telemetry Framework Network Telemetry Framework
draft-ietf-opsawg-ntf-02 draft-ietf-opsawg-ntf-03
Abstract Abstract
Network telemetry is the technology for gaining network insight and Network telemetry is the technology for gaining network insight and
facilitating efficient and automated network management. It engages facilitating efficient and automated network management. It engages
various techniques for remote data collection, correlation, and various techniques for remote data collection, correlation, and
consumption. This document provides an architectural framework for consumption. This document provides an architectural framework for
network telemetry, motivated by the network operation challenges and network telemetry, motivated by the network operation challenges and
requirements. As evidenced by some key characteristics and industry requirements. As evidenced by some key characteristics and industry
practices, network telemetry covers technologies and protocols beyond practices, network telemetry covers technologies and protocols beyond
the conventional network Operations, Administration, and Maintenance the conventional network Operations, Administration, and Maintenance
(OAM). It promises better flexibility, scalability, accuracy, (OAM). It promises better flexibility, scalability, accuracy,
coverage, and performance and allows automated control loops to suit coverage, and performance and allows automated control loops to suit
both today's and tomorrow's network operation. This document both today's and tomorrow's network operation. This document
clarifies the terminologies and classifies the modules and components clarifies the terminologies and classifies the modules and components
of a network telemetry system from several different perspectives. of a network telemetry system from several different perspectives.
To the best of our knowledge, this document is the first such effort The framework and taxonomy help to set a common ground for the
for network telemetry in industry standards organizations. The collection of related work and provide guidance for related technique
framework and taxonomy help to set a common ground for the collection and standard developments.
of related work and provide guidance for future technique and
standard developments.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on April 10, 2020. This Internet-Draft will expire on October 15, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
skipping to change at page 2, line 39 skipping to change at page 2, line 36
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5 2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6 2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6
2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8 2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8
3. The Necessity of a Network Telemetry Framework . . . . . . . 10 3. The Necessity of a Network Telemetry Framework . . . . . . . 10
4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 11 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 11
4.1. Data Acquiring Mechanisms and Data Types . . . . . . . . 12 4.1. Data Acquiring Mechanisms and Data Types . . . . . . . . 12
4.2. Data Object Modules . . . . . . . . . . . . . . . . . . . 13 4.2. Data Object Modules . . . . . . . . . . . . . . . . . . . 13
4.2.1. Requirements and Challenges for each Module . . . . . 15 4.2.1. Requirements and Challenges for each Module . . . . . 16
4.3. Function Components . . . . . . . . . . . . . . . . . . . 19 4.3. Function Components . . . . . . . . . . . . . . . . . . . 19
4.4. Existing Works Mapped in the Framework . . . . . . . . . 21 4.4. Existing Works Mapped in the Framework . . . . . . . . . 21
5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 22 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 22
6. Security Considerations . . . . . . . . . . . . . . . . . . . 23 6. Security Considerations . . . . . . . . . . . . . . . . . . . 23
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 24
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 24
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 24
10. Informative References . . . . . . . . . . . . . . . . . . . 24 10. Informative References . . . . . . . . . . . . . . . . . . . 25
Appendix A. A Survey on Existing Network Telemetry Techniques . 28 Appendix A. A Survey on Existing Network Telemetry Techniques . 28
A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 28 A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 28
A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 28 A.1.1. Push Extensions for NETCONF . . . . . . . . . . . . . 28
A.1.2. gRPC Network Management Interface . . . . . . . . . . 28 A.1.2. gRPC Network Management Interface . . . . . . . . . . 28
A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 29 A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 29
A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 29 A.2.1. BGP Monitoring Protocol . . . . . . . . . . . . . . . 29
A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 29 A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 29
A.3.1. The IPFPM technology . . . . . . . . . . . . . . . . 29 A.3.1. The IPFPM technology . . . . . . . . . . . . . . . . 29
A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 30 A.3.2. Dynamic Network Probe . . . . . . . . . . . . . . . . 30
A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 31 A.3.3. IP Flow Information Export (IPFIX) protocol . . . . . 31
A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 31 A.3.4. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 31
A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 31 A.3.5. Postcard Based Telemetry . . . . . . . . . . . . . . 31
A.4. External Data and Event Telemetry . . . . . . . . . . . . 31 A.4. External Data and Event Telemetry . . . . . . . . . . . . 32
A.4.1. Sources of External Events . . . . . . . . . . . . . 32 A.4.1. Sources of External Events . . . . . . . . . . . . . 32
A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 33 A.4.2. Connectors and Interfaces . . . . . . . . . . . . . . 33
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 33
1. Introduction 1. Introduction
Network visibility is the ability of management tools to see the Network visibility is the ability of management tools to see the
state and behavior of a network. It is essential for successful state and behavior of a network. It is essential for successful
network operation. Network telemetry is the process of measuring, network operation. Network telemetry is the process of measuring,
correlating, recording, and distributing information about the correlating, recording, and distributing information about the
behavior of a network. Network telemetry has been considered as an behavior of a network. Network telemetry has been considered as an
ideal means to gain sufficient network visibility with better ideal means to gain sufficient network visibility with better
flexibility, scalability, accuracy, coverage, and performance than flexibility, scalability, accuracy, coverage, and performance than
some conventional network Operations, Administration, and Maintenance some conventional network Operations, Administration, and Maintenance
(OAM) techniques. (OAM) techniques.
However, so far the term of network telemetry lacks a solid and However, the term of network telemetry lacks a solid and unambiguous
unambiguous definition. The scope and coverage of it cause confusion definition. The scope and coverage of it cause confusion and
and misunderstandings. It is beneficial to clarify the concept and misunderstandings. It is beneficial to clarify the concept and
provide a clear architectural framework for network telemetry, so we provide a clear architectural framework for network telemetry, so we
can articulate the technical field, and better align the related can articulate the technical field, and better align the related
techniques and standard works. techniques and standard works.
To fulfill such an undertaking, we first discuss some key To fulfill such an undertaking, we first discuss some key
characteristics of network telemetry which set a clear distinction characteristics of network telemetry which set a clear distinction
from the conventional network OAM and show that some conventional OAM from the conventional network OAM and show that some conventional OAM
technologies can be considered a subset of the network telemetry technologies can be considered a subset of the network telemetry
technologies. We then provide an architectural framework from three technologies. We then provide an architectural framework for network
different perspectives for network telemetry. We show how network telemetry from three different perspectives. We show how network
telemetry can meet the current and future network operation telemetry can meet the current and future network operation
requirements, and the challenges each telemetry module is facing. requirements, and the challenges each telemetry module is facing.
Based on the distinction of modules and function components, we can Based on the distinction of modules and function components, we can
easily map the existing and emerging techniques and protocols into map the existing and emerging techniques and protocols into the
the framework. Finally, we outline a roadmap for the evolution of framework. Finally, we outline a roadmap for the evolution of the
the network telemetry system and discuss the potential security network telemetry system and discuss the potential security concerns
concerns for network telemetry. for network telemetry.
The purpose of the framework and taxonomy is to set a common ground The purpose of the framework and taxonomy is to set a common ground
for the collection of related work and provide guidance for future for the collection of related work and provide guidance for future
technique and standard developments. To the best of our knowledge, technique and standard developments. To the best of our knowledge,
this document is the first such effort for network telemetry in this document is the first such effort for network telemetry in
industry standards organizations. industry standards organizations.
2. Motivation 2. Motivation
The term of Big data is used to describe the extremely large volume The term "big data" is used to describe the extremely large volume of
of data sets that can be analyzed computationally to reveal patterns, data sets that can be analyzed computationally to reveal patterns,
trends, and associations. A network is undoubtedly a source of big trends, and associations. A network is undoubtedly a source of big
data because of its scale and all the traffic that goes through it. It is data because of its scale and all the traffic that goes through it. It is
easy to see that network OAM can benefit from network big data. easy to see that network OAM can benefit from network big data.
Today one can easily access advanced big data analytics capability Today one can access advanced big data analytics capability through a
through a plethora of commercial and open source platforms (e.g., plethora of commercial and open source platforms (e.g., Apache
Apache Hadoop), tools (e.g., Apache Spark), and techniques (e.g., Hadoop), tools (e.g., Apache Spark), and techniques (e.g., machine
machine learning). Thanks to the advance of computing and storage learning). Thanks to the advance of computing and storage
technologies, network big data analytics gives network operators an technologies, network big data analytics gives network operators an
unprecedented opportunity to gain network insights and move towards opportunity to gain network insights and move towards network
network autonomy. Some operators start to explore the application of autonomy. Some operators start to explore the application of
Artificial Intelligence (AI) to make sense of network data. Software Artificial Intelligence (AI) to make sense of network data. Software
tools can use the network data to detect and react on network faults, tools can use the network data to detect and react on network faults,
anomalies, and policy violations, as well as predicting future anomalies, and policy violations, as well as predicting future
events. In turn, the network policy updates for planning, intrusion events. In turn, the network policy updates for planning, intrusion
prevention, optimization, and self-healing may be applied. prevention, optimization, and self-healing may be applied.
It is conceivable that an intent-driven autonomic network [RFC7575] It is conceivable that an intent-driven autonomic network [RFC7575]
is the logical next step for network evolution following Software is the logical next step for network evolution following Software
Defined Network (SDN), aiming to reduce (or even eliminate) human Defined Network (SDN), aiming to reduce (or even eliminate) human
labor, make the most efficient usage of network resources, and labor, make more efficient use of network resources, and provide
provide better services more aligned with customer requirements. better services more aligned with customer requirements. Although it
Although it takes time to reach the ultimate goal, the journey has takes time to reach the ultimate goal, the journey has started
started nevertheless. nevertheless.
However, while the data processing capability is improved and However, while the data processing capability is improved and
applications are hungry for more data, the networks lag behind in applications are hungry for more data, the networks lag behind in
extracting and translating network data into useful and actionable extracting and translating network data into useful and actionable
information. The system bottleneck is shifting from data consumption information in efficient ways. The system bottleneck is shifting
to data supply. Both the number of network nodes and the traffic from data consumption to data supply. Both the number of network
bandwidth keep increasing at a fast pace. The network configuration nodes and the traffic bandwidth keep increasing at a fast pace. The
and policy change at a much smaller time slot than ever before. More network configuration and policy change at smaller time slots than
subtle events and fine-grained data through all network planes need before. More subtle events and fine-grained data through all network
to be captured and exported in real time. In a nutshell, it is a planes need to be captured and exported in real time. In a nutshell,
challenge to get enough high-quality data out of network efficiently, it is a challenge to get enough high-quality data out of network
timely, and flexibly. Therefore, we need to examine the existing efficiently, timely, and flexibly. Therefore, we need to examine the
network technologies and protocols, and identify any potential existing network technologies and protocols, and identify any
technique and standard gaps based on the real network and device potential technique and standard gaps based on the real network and
architectures. device architectures.
In the remainder of this section, first we discuss several key use In the remainder of this section, first we discuss several key use
cases for today's and future network operations. Next, we show why cases for today's and future network operations. Next, we show why
the current network OAM techniques and protocols are insufficient for the current network OAM techniques and protocols are insufficient for
these use cases. The discussion underlines the need for new methods, these use cases. The discussion underlines the need for new methods,
techniques, and protocols which we may assign under an umbrella term techniques, and protocols which we assign under an umbrella term -
- network telemetry. network telemetry.
2.1. Use Cases 2.1. Use Cases
These use cases are essential for network operations. While the list These use cases are essential for network operations. While the list
is by no means exhaustive, it is enough to highlight the requirements is by no means exhaustive, it is enough to highlight the requirements
for data velocity, variety, volume, and veracity in networks. for data velocity, variety, volume, and veracity in networks.
Policy and Intent Compliance: Network policies are the rules that Policy and Intent Compliance: Network policies are the rules that
constrain the services for network access, provide service constrain the services for network access, provide service
differentiation, or enforce specific treatment on the traffic. differentiation, or enforce specific treatment on the traffic.
For example, a service function chain is a policy that requires For example, a service function chain is a policy that requires
the selected flows to pass through a set of ordered network the selected flows to pass through a set of ordered network
functions. An intent is a high-level abstract policy which functions. An intent is a high-level abstract policy which
requires a complex translation and mapping process before being requires a complex translation and mapping process before being
applied on networks. While a policy is enforced, the compliance applied on networks. While a policy is enforced, the compliance
needs to be verified and monitored continuously. needs to be verified and monitored continuously, and any violation
needs to be reported immediately.
SLA Compliance: A Service-Level Agreement (SLA) defines the level of SLA Compliance: A Service-Level Agreement (SLA) defines the level of
service a user expects from a network operator, which includes the service a user expects from a network operator, which includes the
metrics for the service measurement and remedy/penalty procedures metrics for the service measurement and remedy/penalty procedures
when the service level misses the agreement. Users need to check when the service level misses the agreement. Users need to check
if they get the service as promised and network operators need to if they get the service as promised and network operators need to
evaluate how they can deliver the services that can meet the SLA. evaluate how they can deliver the services that can meet the SLA
based on realtime network measurement.
Root Cause Analysis: Any network failure can be the cause or effect Root Cause Analysis: Any network failure can be the cause or effect
of a sequence of chained events. Troubleshooting and recovery of a sequence of chained events. Troubleshooting and recovery
require quick identification of the root cause of any observable require quick identification of the root cause of any observable
issues. However, the root cause is not always straightforward to issues. However, the root cause is not always straightforward to
identify, especially when the failure is sporadic and the related identify, especially when the failure is sporadic and the related
and unrelated events are overwhelming. While machine learning and unrelated events are overwhelming and interleaved. While
technologies can be used for root cause analysis, it is up to the machine learning technologies can be used for root cause analysis,
network to sense and provide all the relevant data. it is up to the network to sense and provide the relevant data.
Network Optimization: This covers all short-term and long-term Network Optimization: This covers all short-term and long-term
network optimization techniques, including load balancing, Traffic network optimization techniques, including load balancing, Traffic
Engineering (TE), and network planning. Network operators are Engineering (TE), and network planning. Network operators are
motivated to optimize their network utilization and differentiate motivated to optimize their network utilization and differentiate
services for better Return On Investment (ROI) or lower Capital services for better Return On Investment (ROI) or lower Capital
Expenditures (CAPEX). The first step is to know the real-time Expenditures (CAPEX). The first step is to know the real-time
network conditions before applying policies for traffic network conditions before applying policies for traffic
manipulation. In some cases, micro-bursts need to be detected in manipulation. In some cases, micro-bursts need to be detected in
a very short time-frame so that fine-grained traffic control can a very short time-frame so that fine-grained traffic control can
be applied to avoid network congestion. The long-term network be applied to avoid network congestion. The long-term network
capacity planning and topology augmentation also rely on the capacity planning and topology augmentation rely on the
accumulated data of the network operations. accumulated data of network operations.
Event Tracking and Prediction: The visibility of user traffic path Event Tracking and Prediction: The visibility of traffic path and
and performance is critical for healthy network operation. performance is critical for services and applications that rely on
Numerous related network events are of interest to network healthy network operation. Numerous related network events are of
operators. For example, network operators always want to learn interest to network operators. For example, network operators
where and why packets are dropped for an application flow. They want to learn where and why packets are dropped for an application
also want to be warned of issues in advance so proactive actions flow. They also want to be warned of issues in advance so
can be taken to avoid catastrophic consequences. proactive actions can be taken to avoid catastrophic consequences.
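
As an illustration of the Policy and Intent Compliance use case above, the following is a minimal Python sketch of how compliance with a service function chain could be checked from telemetry data. It assumes the telemetry system can report the ordered list of nodes or functions a flow actually traversed; the function name complies_with_chain and the hop labels are hypothetical and are not defined by this document.

```python
from typing import Iterable, Sequence


def complies_with_chain(observed_path: Iterable[str],
                        required_chain: Sequence[str]) -> bool:
    """Return True if every function in required_chain appears in
    observed_path, in the required order (other hops may be interleaved)."""
    hops = iter(observed_path)
    # Each any() call consumes the shared iterator up to and including the
    # next match, so later functions must appear after earlier ones.
    return all(any(hop == fn for hop in hops) for fn in required_chain)


# Hypothetical example: the policy requires firewall, then NAT.
policy = ["firewall", "nat"]
good_path = ["ingress", "firewall", "lb", "nat", "egress"]
bad_path = ["ingress", "nat", "firewall", "egress"]

violations = [p for p in (good_path, bad_path)
              if not complies_with_chain(p, policy)]
```

In this sketch only bad_path ends up in violations, because it visits the NAT before the firewall. A real deployment would run such a check continuously on streamed path records and report each violation as it is found.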
2.2. Challenges 2.2. Challenges
For a long time, network operators have relied upon SNMP [RFC3416], For a long time, network operators have relied upon SNMP [RFC3416],
Command-Line Interface (CLI), or Syslog to monitor the network. Some Command-Line Interface (CLI), or Syslog to monitor the network. Some
other OAM techniques as described in [RFC7276] are also used to other OAM techniques as described in [RFC7276] are also used to
facilitate network troubleshooting. These conventional techniques facilitate network troubleshooting. These conventional techniques
are not sufficient to support the above use cases for the following are not sufficient to support the above use cases for the following
reasons: reasons:
o Most use cases need to continuously monitor the network and o Most use cases need to continuously monitor the network and
dynamically refine the data collection in real-time and dynamically refine the data collection in real-time. The poll-
interactively. The poll-based low-frequency data collection is based low-frequency data collection is ill-suited for these
ill-suited for these applications. Subscription-based streaming applications. Subscription-based streaming data directly pushed
data directly pushed from the data source (e.g., the forwarding from the data source (e.g., the forwarding chip) is preferred to
chip) is preferred to provide enough data quantity and precision provide enough data quantity and precision at scale.
at scale.
o Comprehensive data is needed from packet processing engine to o Comprehensive data is needed from packet processing engine to
traffic manager, from line cards to main control board, from user traffic manager, from line cards to main control board, from user
flows to control protocol packets, from device configurations to flows to control protocol packets, from device configurations to
operations, and from physical layer to application layer. operations, and from physical layer to application layer.
Conventional OAM only covers a narrow range of data (e.g., SNMP Conventional OAM only covers a narrow range of data (e.g., SNMP
only handles data from the Management Information Base (MIB)). only handles data from the Management Information Base (MIB)).
Traditional network devices cannot provide all the necessary Traditional network devices cannot provide all the necessary
probes. An open and programmable network device is therefore probes. More open and programmable network devices are therefore
needed. needed.
o Many application scenarios need to correlate network-wide data o Many application scenarios need to correlate network-wide data
from multiple sources (i.e., from distributed network devices, from multiple sources (i.e., from distributed network devices,
different components of a network device, or different network different components of a network device, or different network
planes). A piecemeal solution is often lacking the capability to planes). A piecemeal solution is often lacking the capability to
consolidate the data from multiple sources. The composition of a consolidate the data from multiple sources. The composition of a
complete solution, as partly proposed by Autonomic Resource complete solution, as partly proposed by Autonomic Resource
Control Architecture (ARCA) Control Architecture (ARCA)
[I-D.pedro-nmrg-anticipated-adaptation], will be empowered and [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
guided by a comprehensive framework. guided by a comprehensive framework.
o Some of the conventional OAM techniques (e.g., CLI and Syslog) o Some of the conventional OAM techniques (e.g., CLI and Syslog)
lack a formal data model. The unstructured data hinder the tool lack a formal data model. The unstructured data hinder the tool
automation and application extensibility. Standardized data automation and application extensibility. Standardized data
models are essential to support the programmable networks. models are essential to support the programmable networks.
o Although some conventional OAM techniques support data push (e.g., o Although some conventional OAM techniques support data push (e.g.,
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data
are limited to only predefined management plane warnings (e.g., are limited to only predefined management plane warnings (e.g.,
SNMP Trap) or sampled user packets (e.g., sFlow). We require the SNMP Trap) or sampled user packets (e.g., sFlow). Network
data with arbitrary source, granularity, and precision which are operators require the data with arbitrary source, granularity, and
beyond the capability of the existing techniques. precision which are beyond the capability of the existing
techniques.
o The conventional passive measurement techniques can either consume o The conventional passive measurement techniques can either consume
too much network resources and render too much redundant data, or excessive network resources and render excessive redundant data,
lead to inaccurate results; the conventional active measurement or lead to inaccurate results; on the other hand, the conventional
techniques can interfere with the user traffic and their results active measurement techniques can interfere with the user traffic
are indirect. We need techniques that can collect direct and on- and their results are indirect. Techniques that can collect
demand data from user traffic. direct and on-demand data from user traffic are more favorable.
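
The first reason in the list above, the contrast between poll-based collection and subscription-based push, can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions: the DataSource class, its subscribe/poll/record methods, the queue-depth metric, and the threshold of 80 are all hypothetical and do not correspond to any protocol or API named in this document.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Sample:
    timestamp_ms: int
    queue_depth: int


class DataSource:
    """Hypothetical data source (e.g., a forwarding chip) that supports
    both polling and event-triggered subscriptions."""

    def __init__(self) -> None:
        self._latest: Optional[Sample] = None
        self._subscribers: List[Tuple[Callable[[Sample], bool],
                                      Callable[[Sample], None]]] = []

    def subscribe(self, predicate: Callable[[Sample], bool],
                  callback: Callable[[Sample], None]) -> None:
        # Push model: the source evaluates the predicate on every sample.
        self._subscribers.append((predicate, callback))

    def poll(self) -> Optional[Sample]:
        # Poll model: the collector only sees the most recent sample.
        return self._latest

    def record(self, sample: Sample) -> None:
        self._latest = sample
        for predicate, callback in self._subscribers:
            if predicate(sample):
                callback(sample)


def run_demo() -> Tuple[List[Sample], List[Sample]]:
    source = DataSource()
    pushed: List[Sample] = []
    polled: List[Sample] = []
    source.subscribe(lambda s: s.queue_depth > 80, pushed.append)

    # One sample per millisecond; two short bursts at t=1 and t=5.
    depths = [10, 95, 12, 11, 9, 90, 8, 10, 12, 11]
    for t, depth in enumerate(depths):
        source.record(Sample(t, depth))
        if (t + 1) % 5 == 0:  # the collector polls every 5 ms
            polled.append(source.poll())
    return pushed, polled


pushed, polled = run_demo()
```

With these inputs the subscriber receives both bursts, while the two polls land on quiet samples and miss them entirely. Polling faster would narrow the gap, but at the cost of transferring every sample, which is the scaling problem the text describes.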
2.3. Glossary 2.3. Glossary
Before further discussion, we list some key terminology and acronyms Before further discussion, we list some key terminology and acronyms
used in this document. We make an intentional distinction between used in this document. We make an intentional distinction between
network telemetry and network OAM. network telemetry and network OAM.
AI: Artificial Intelligence. In network domain, AI refers to the AI: Artificial Intelligence. In network domain, AI refers to the
machine-learning based technologies for automated network machine-learning based technologies for automated network
operation and other tasks. operation and other tasks.
skipping to change at page 8, line 14 skipping to change at page 8, line 17
IOAM: In-situ OAM, a dataplane on-path telemetry technique. IOAM: In-situ OAM, a dataplane on-path telemetry technique.
NETCONF: Network Configuration Protocol, specified in [RFC6241]. NETCONF: Network Configuration Protocol, specified in [RFC6241].
Network Telemetry: Acquiring and processing network data remotely Network Telemetry: Acquiring and processing network data remotely
for network monitoring and operation. A general term for a large for network monitoring and operation. A general term for a large
set of network visibility techniques and protocols, with the set of network visibility techniques and protocols, with the
characteristics defined in this document. Network telemetry characteristics defined in this document. Network telemetry
addresses the current network operation issues and enables smooth addresses the current network operation issues and enables smooth
evolution toward intent-driven autonomous networks. evolution toward future intent-driven autonomous networks.
NMS: Network Management System, referring to applications that allow NMS: Network Management System, referring to applications that allow
network administrators to manage a network's software and hardware network administrators to manage a network's software and hardware
components. It usually records data from a network's remote components. It usually records data from a network's remote
points to carry out central reporting to a system administrator. points to carry out central reporting to a system administrator.
OAM: Operations, Administration, and Maintenance. A group of OAM: Operations, Administration, and Maintenance. A group of
network management functions that provide network fault network management functions that provide network fault
indication, fault localization, performance information, and data indication, fault localization, performance information, and data
and diagnosis functions. Most conventional network monitoring and diagnosis functions. Most conventional network monitoring
skipping to change at page 9, line 10 skipping to change at page 9, line 13
The representative techniques and protocols include IPFIX [RFC7011] The representative techniques and protocols include IPFIX [RFC7011]
and gRPC [grpc]. Network telemetry allows separate entities to and gRPC [grpc]. Network telemetry allows separate entities to
acquire data from network devices so that data can be visualized and acquire data from network devices so that data can be visualized and
analyzed to support network monitoring and operation. Network analyzed to support network monitoring and operation. Network
telemetry overlaps with the conventional network OAM and has a wider telemetry overlaps with the conventional network OAM and has a wider
scope than it. It is expected that network telemetry can provide the scope than it. It is expected that network telemetry can provide the
necessary network insight for autonomous networks and address the necessary network insight for autonomous networks and address the
shortcomings of conventional OAM techniques. shortcomings of conventional OAM techniques.
One difference between the network telemetry and the network OAM is One difference between the network telemetry and the network OAM is
that the network telemetry assumes machines as data consumer rather that in general the network telemetry assumes machines as data
than human operators. Hence, the network telemetry can directly consumer rather than human operators. Hence, the network telemetry
trigger the automated network operation, while the conventional OAM can directly trigger the automated network operation, while the
tools usually help human operators to monitor and diagnose the conventional OAM tools usually help human operators to monitor and
networks and guide manual network operations. The difference leads diagnose the networks and guide manual network operations. The
to very different techniques. difference leads to very different techniques.
Although the network telemetry techniques are just emerging and Although the network telemetry techniques are just emerging and
subject to continuous evolution, several characteristics of network subject to continuous evolution, several characteristics of network
telemetry have been well accepted (Note that network telemetry is telemetry have been well accepted. Note that network telemetry is
intended to be an umbrella term covering a wide spectrum of intended to be an umbrella term covering a wide spectrum of
techniques, so the following characteristics are not expected to be techniques, so the following characteristics are not expected to be
held by every specific technique): held by every specific technique.
o Push and Streaming: Instead of polling data from network devices, o Push and Streaming: Instead of polling data from network devices,
the telemetry collector subscribes to the streaming data pushed the telemetry collector subscribes to the streaming data pushed
from data sources in network devices. from data sources in network devices.
o Volume and Velocity: The telemetry data is intended to be consumed o Volume and Velocity: The telemetry data is intended to be consumed
by machines rather than by human beings. Therefore, the data by machines rather than by human beings. Therefore, the data
volume is huge and the processing is often in realtime. volume is huge and the processing is often in realtime.
o Normalization and Unification: Telemetry aims to address the o Normalization and Unification: Telemetry aims to address the
skipping to change at page 10, line 8 skipping to change at page 10, line 11
used in a closed control loop for network automation, it needs to used in a closed control loop for network automation, it needs to
run continuously and adapt to the dynamic and interactive queries run continuously and adapt to the dynamic and interactive queries
from the network operation controller. from the network operation controller.
In addition, an ideal network telemetry solution may also have the In addition, an ideal network telemetry solution may also have the
following features or properties: following features or properties:
o In-Network Customization: The data can be customized in network at o In-Network Customization: The data can be customized in network at
run-time to cater to the specific need of applications. This run-time to cater to the specific need of applications. This
needs the support of a programmable data plane which allows probes needs the support of a programmable data plane which allows probes
to be deployed at flexible locations. with custom functions to be deployed at flexible locations.
o In-Network Data Aggregation and Correlation: Network devices and o In-Network Data Aggregation and Correlation: Network devices and
aggregation points can work out which events and what data needs aggregation points can work out which events and what data needs
to be stored, reported, or discarded thus reducing the load on the to be stored, reported, or discarded thus reducing the load on the
central collection and processing points while still ensuring that central collection and processing points while still ensuring that
the right information is ready to be processed in a timely way. the right information is ready to be processed in a timely way.
o In-Network Processing and Action: Sometimes it is not necessary or o In-Network Processing and Action: Sometimes it is not necessary or
feasible to gather all information to a central point so that it feasible to gather all information to a central point to be
can be processed and acted upon. It is possible for the data processed and acted upon. It is possible for the data processing
processing to be done in the network, and actions taken more to be done in network, and actions to be taken locally.
locally and more responsively.
o Direct Data Plane Export: The data originated from data plane can o Direct Data Plane Export: The data originated from the data plane
be directly exported to the data consumer for efficiency, forwarding chips can be directly exported to the data consumer for
especially when the data bandwidth is large and the real-time efficiency, especially when the data bandwidth is large and the
processing is required. real-time processing is required.
o In-band Data Collection: In addition to the passive and active o In-band Data Collection: In addition to the passive and active
data collection approaches, the new hybrid approach can data collection approaches, the new hybrid approach can
directly collect data for any target flow on its entire forwarding directly collect data for any target flow on its entire forwarding
path. path [I-D.song-opsawg-ifit-framework].
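As an illustration of the In-Network Data Aggregation and Correlation property above, the following is a minimal sketch, not a normative procedure. It assumes per-packet records of the form (flow identifier, byte count) and a byte threshold below which flows are discarded before export. The function name aggregate_flows, the record layout, and the threshold are hypothetical and do not come from this document.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def aggregate_flows(packets: Iterable[Tuple[str, int]], min_bytes: int) -> Dict[str, int]:
    """Reduce per-packet records to per-flow byte totals.

    Only flows that reach min_bytes are kept for export; the rest are
    discarded locally, which reduces the load on the central collector.
    """
    totals: Dict[str, int] = defaultdict(int)
    for flow_id, size in packets:
        totals[flow_id] += size
    return {flow: size for flow, size in totals.items() if size >= min_bytes}


# Hypothetical per-packet observations: (flow identifier, packet size in bytes).
packets = [
    ("flow-a", 1500),
    ("flow-b", 64),
    ("flow-a", 1500),
    ("flow-c", 400),
    ("flow-b", 64),
]
report = aggregate_flows(packets, min_bytes=1000)
```

In this sketch five packet records are reduced to a single exported flow summary; which data are kept or discarded is a policy choice of the aggregation point.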
It is worth noting that, no matter how sophisticated a network It is worth noting that, a network telemetry system should not be
telemetry system is, it should not be intrusive to networks, by intrusive to normal network operations, by avoiding the pitfall of
avoiding the pitfall of the "observer effect". That is, it should the "observer effect". That is, it should not change the network
not change the network behavior and affect the forwarding behavior and affect the forwarding performance. Otherwise, the whole
performance. purpose of network telemetry is defeated.
Although in many cases a network telemetry system is akin to the SDN Although in many cases a network telemetry system is akin to the SDN
architecture, it is important to understand that network telemetry architecture, it is important to understand that network telemetry
does not imply the need for any centralized data processing and does not imply the need for any centralized data processing and
analytics engine. Telemetry data producers and consumers can analytics engine. Telemetry data producers and consumers can
perfectly work in distributed or peer-to-peer fashions instead. perfectly work in distributed or peer-to-peer fashions instead.
3. The Necessity of a Network Telemetry Framework 3. The Necessity of a Network Telemetry Framework
Big data analytics and machine-learning based AI technologies are Big data analytics and machine-learning based AI technologies are
skipping to change at page 11, line 21 skipping to change at page 11, line 22
consolidated into a minimum yet comprehensive set. A telemetry consolidated into a minimum yet comprehensive set. A telemetry
framework can help to normalize the technique developments. framework can help to normalize the technique developments.
o Network visibility presents multiple viewpoints. For example, the o Network visibility presents multiple viewpoints. For example, the
device viewpoint takes the network infrastructure as the device viewpoint takes the network infrastructure as the
monitoring object from which the network topology and device monitoring object from which the network topology and device
status can be acquired; the traffic viewpoint takes the flows or status can be acquired; the traffic viewpoint takes the flows or
packets as the monitoring object from which the traffic quality packets as the monitoring object from which the traffic quality
and path can be acquired. An application may need to switch its and path can be acquired. An application may need to switch its
viewpoint during operation. It may also need to correlate a viewpoint during operation. It may also need to correlate a
service and impact on network experience to acquire the service and its impact on network experience to acquire the
comprehensive information. comprehensive information.
o Applications require network telemetry to be elastic in order to o Applications require network telemetry to be elastic in order to
efficiently use the network resource and reduce the performance efficiently use the network resource and reduce the performance
impact. Routine network monitoring covers the entire network with impact. Routine network monitoring covers the entire network with
low data sampling rate. When issues arise or trends emerge, the low data sampling rate. When issues arise or trends emerge, the
telemetry data source can be modified and the data rate can be telemetry data source can be modified and the data rate can be
boosted. boosted.
o Efficient data fusion is critical for applications to reduce the o Efficient data fusion is critical for applications to reduce the
overall quantity of data and improve the accuracy of analysis. overall quantity of data and improve the accuracy of analysis.
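The elasticity requirement above can be sketched as a simple rate policy. This is only an illustration under stated assumptions: the loss ratio is taken as the trigger for "issues arise", and the function name, interval values, and threshold are hypothetical.

```python
def next_sampling_interval(
    loss_ratio: float,
    base_interval_s: float = 60.0,
    boosted_interval_s: float = 1.0,
    loss_threshold: float = 0.01,
) -> float:
    """Return the export interval to use for the next measurement period.

    Routine monitoring uses the long base interval (low data rate). When the
    observed loss ratio exceeds the threshold, the short boosted interval is
    used so that the data rate rises only while it is needed.
    """
    if loss_ratio > loss_threshold:
        return boosted_interval_s
    return base_interval_s


routine = next_sampling_interval(loss_ratio=0.0)
boosted = next_sampling_interval(loss_ratio=0.05)
```

A real system would likely also change which data sources are sampled, not only how often; the sketch covers the rate aspect only.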
A telemetry framework collects together all of the telemetry-related A telemetry framework collects together all of the telemetry-related
work from different sources and working groups within the IETF. This works from different sources and working groups within IETF. This
makes it possible to assemble a comprehensive network telemetry makes it possible to assemble a comprehensive network telemetry
system and to avoid repetitious or redundant work. The framework system and to avoid repetitious or redundant work. The framework
should cover the concepts and components from the standardization should cover the concepts and components from the standardization
perspective. This document clarifies the layered modules on which perspective. This document clarifies the layered modules on which
the telemetry is exerted and decomposes the telemetry system into a the telemetry is exerted and decomposes the telemetry system into a
set of distinct components that the existing and future work can set of distinct components that the existing and future work can
easily map to. easily map to.
4. Network Telemetry Framework 4. Network Telemetry Framework
skipping to change at page 12, line 14 skipping to change at page 12, line 14
4.1. Data Acquiring Mechanisms and Data Types 4.1. Data Acquiring Mechanisms and Data Types
Broadly speaking, network data can be acquired through subscription Broadly speaking, network data can be acquired through subscription
(push) and query (poll). A subscriber may request data when it is (push) and query (poll). A subscriber may request data when it is
ready. It follows a Publish-Subscription (Pub-Sub) mode or a ready. It follows a Publish-Subscription (Pub-Sub) mode or a
Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, pre- Subscription-Publish (Sub-Pub) mode. In the Pub-Sub mode, pre-
defined data are published and multiple qualified subscribers can defined data are published and multiple qualified subscribers can
subscribe to the data. In the Sub-Pub mode, a subscriber designates subscribe to the data. In the Sub-Pub mode, a subscriber designates
what data are of interest and demands the network devices to deliver what data are of interest and demands the network devices to deliver
the data when they are available. the data when available.
In contrast, a querier expects immediate feedback from network In contrast, query is used when a querier expects immediate feedback
devices. It is usually used in a more interactive environment. The from network devices. The queried data may be directly extracted
queried data may be directly extracted from some specific data from some specific data source, or synthesized and processed from raw
source, or synthesized and processed from raw data. data. Query suits interactive network telemetry applications.
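The difference between subscription (push) and query (poll) can be sketched as follows. This is a minimal in-memory illustration, not a telemetry protocol: the class TelemetrySource, its methods, and the key "if0.rx_packets" are hypothetical names chosen for the example.

```python
from typing import Callable, Dict, List, Tuple


class TelemetrySource:
    """Hypothetical data source in a network device."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}
        self._subscribers: Dict[str, List[Callable[[str, int], None]]] = {}

    def query(self, key: str) -> int:
        # Query (poll): the querier asks and gets immediate feedback.
        return self._data[key]

    def subscribe(self, key: str, callback: Callable[[str, int], None]) -> None:
        # Subscription (push): the subscriber designates the data of
        # interest once; the source delivers it when it becomes available.
        self._subscribers.setdefault(key, []).append(callback)

    def update(self, key: str, value: int) -> None:
        # New data is stored and pushed to every subscriber of this key.
        self._data[key] = value
        for callback in self._subscribers.get(key, []):
            callback(key, value)


source = TelemetrySource()
received: List[Tuple[str, int]] = []

source.subscribe("if0.rx_packets", lambda key, value: received.append((key, value)))
source.update("if0.rx_packets", 100)
source.update("if0.rx_packets", 250)

polled = source.query("if0.rx_packets")
```

The subscriber receives both updates without asking again, while the querier sees only the value present at the time of the query.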
There are four types of data from network devices: There are four types of data from network devices:
Simple Data: The data that are steadily available from some data Simple Data: The data that are steadily available from some data
store or static probes in network devices. Such data can be store or static probes in network devices. Such data can be
specified by YANG model. specified by YANG model.
Complex Data: The data need to be synthesized or processed from raw Complex Data: The data need to be synthesized or processed in
data from one or more network devices. The data processing network from raw data from one or more network devices. The data
function can be statically or dynamically loaded into network processing function can be statically or dynamically loaded into
devices. network devices.
Event-triggered Data: The data are conditionally acquired based on Event-triggered Data: The data are conditionally acquired based on
the occurrence of some event. An event can be modeled as a Finite the occurrence of some events. An event can be modeled as a
State Machine (FSM). Finite State Machine (FSM).
Streaming Data: The data are continuously or periodically generated. Streaming Data: The data are continuously or periodically generated.
It can be time series or the dump of databases. The streaming It can be time series or the dump of databases. The streaming
data reflect realtime network states and metrics and require large data reflect realtime network states and metrics and require large
bandwidth and processing power. bandwidth and processing power.
The above data types are not mutually exclusive. For example, event- The above data types are not mutually exclusive. For example, event-
triggered data can be simple or complex, and streaming data can be triggered data can be simple or complex, and streaming data can be
event triggered. The relationships of these data types are event triggered. The relationships of these data types are
illustrated in Figure 1 illustrated in Figure 1.
+--------------------------+
| +----------------------+ | +--------------------------+
| | +-----------------+ | | | +----------------------+ |
| | | +-------------+ | | | | | +-----------------+ | |
| | | | Simple Data | | | | | | | +-------------+ | | |
| | | +-------------+ | | | | | | | Simple Data | | | |
| | | Complex Data | | | | | | +-------------+ | | |
| | +-----------------+ | | | | | Complex Data | | |
| | Event-triggered Data | | | | +-----------------+ | |
| +----------------------+ | | | Event-triggered Data | |
| Streaming Data | | +----------------------+ |
+--------------------------+ | Streaming Data |
+--------------------------+
Figure 1: Data Type Relationship Figure 1: Data Type Relationship
Subscription usually deals with event-triggered data and streaming Subscription usually deals with event-triggered data and streaming
data, and query usually deals with simple data and complex data. It data, and query usually deals with simple data and complex data. The
is easy to see that conventional OAM techniques are mostly about conventional OAM techniques are mostly about querying simple data.
querying simple data only. While these techniques are still useful, While these techniques are still useful, more advanced network
advanced network telemetry techniques pay more attention on the other telemetry techniques are designed mainly for event-triggered or
three data types, and prefer event/streaming data subscription and streaming data subscription, and complex data query.
complex data query over simple data query.
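The statement that an event can be modeled as a Finite State Machine (FSM) can be illustrated with a two-state sketch. The states, the queue-depth metric, and the threshold are assumptions made for this example only; data are reported on state transitions and suppressed otherwise.

```python
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    NORMAL = "normal"
    CONGESTED = "congested"


class QueueEventFsm:
    """Hypothetical two-state FSM driven by queue depth samples."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.state = State.NORMAL

    def step(self, queue_depth: int) -> Optional[Tuple[State, int]]:
        # Compute the state implied by the current sample.
        new_state = State.CONGESTED if queue_depth > self.threshold else State.NORMAL
        if new_state is self.state:
            return None  # No transition: nothing is reported.
        self.state = new_state
        return (new_state, queue_depth)  # Transition: report the event.


fsm = QueueEventFsm(threshold=80)
samples = [10, 50, 90, 95, 40, 30]
reports: List[Tuple[State, int]] = []
for depth in samples:
    event = fsm.step(depth)
    if event is not None:
        reports.append(event)
```

Six samples produce two reports, one when the queue becomes congested and one when it recovers, which is the kind of data reduction that event-triggered subscription aims for.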
4.2. Data Object Modules 4.2. Data Object Modules
Telemetry can be applied on the forwarding plane, the control plane, Telemetry can be applied on the forwarding plane, the control plane,
and the management plane in a network, as well as other sources out and the management plane in a network, as well as other sources out
of the network, as shown in Figure 2. Therefore, we categorize the of the network, as shown in Figure 2. Therefore, we categorize the
network telemetry into four distinct modules with each having its own network telemetry into four distinct modules with each having its own
interface to Network Operation Applications. interface to Network Operation Applications.
+------------------------------+ +------------------------------+
skipping to change at page 14, line 37 skipping to change at page 14, line 37
Figure 2: Modules in Layer Category of NTF Figure 2: Modules in Layer Category of NTF
The rationale of this partition lies in the different telemetry data The rationale of this partition lies in the different telemetry data
objects which result in different data source and export locations. objects which result in different data source and export locations.
Such differences have profound implications on in-network data Such differences have profound implications on in-network data
programming and processing capability, data encoding and transport programming and processing capability, data encoding and transport
protocol, and data bandwidth and latency. protocol, and data bandwidth and latency.
We summarize the major differences of the four modules in the We summarize the major differences of the four modules in the
following table. They are mainly compared from six aspects: data following table. They are compared from six aspects: data object,
object, data export location, data model, data encoding, telemetry data export location, data model, data encoding, telemetry protocol,
protocol, and transport method. Data object is the target and source and transport method. Data object is the target and source of each
of each module. Because the data source varies, the data export module. Because the data source varies, the data export location
location varies. Because each data export location has different varies. Because each data export location has different capability,
capability, the proper data model, encoding, and transport method the proper data model, encoding, and transport method cannot be kept
cannot be kept the same. As a result, the suitable telemetry the same. As a result, the suitable telemetry protocol for each
protocol for each module can be different. Some representative module can be different. Some representative techniques are shown in
techniques are shown in some table blocks to highlight the technical the corresponding table blocks to highlight the technical diversity
diversity of these modules. One cannot expect to use a universal of these modules. The key point is that one cannot expect to use a
protocol to cover all the network telemetry requirements. universal protocol to cover all the network telemetry requirements.
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
| Module | Control | Management | Forwarding | External | | Module | Control | Management | Forwarding | External |
| | Plane | Plane | Plane | Data | | | Plane | Plane | Plane | Data |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Object | control | config. & | flow & packet| terminal, | |Object | control | config. & | flow & packet| terminal, |
| | protocol & | operation | QoS, traffic | social & | | | protocol & | operation | QoS, traffic | social & |
| | signaling, | state, MIB | stat., buffer| environ- | | | signaling, | state, MIB | stat., buffer| environ- |
| | RIB, ACL | | & queue stat.| mental | | | RIB, ACL | | & queue stat.| mental |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
skipping to change at page 15, line 37 skipping to change at page 15, line 37
|Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC | |Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC |
| | IPFIX,mirror | | | | | | IPFIX,mirror | | | |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP | |Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP |
| | UDP | | | UDP | | | UDP | | | UDP |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
Figure 3: Comparison of the Data Object Modules Figure 3: Comparison of the Data Object Modules
Note that the interaction with the network operation applications can Note that the interaction with the network operation applications can
be indirect. For example, in the management plane telemetry, the be indirect. Some in-device data transfer is possible. For example,
management plane may need to acquire data from the data plane. Some in the management plane telemetry, the management plane may need to
of the operational states can only be derived from the data plane acquire data from the data plane. Some of the operational states can
such as the interface status and statistics. For another example, only be derived from the data plane such as the interface status and
the control plane telemetry may need to access the Forwarding statistics. For another example, the control plane telemetry may
Information Base (FIB) in data plane. On the other hand, an need to access the Forwarding Information Base (FIB) in data plane.
application may involve more than one plane simultaneously. For
example, an SLA compliance application may require both the data On the other hand, an application may involve more than one plane and
plane telemetry and the control plane telemetry. interact with multiple planes simultaneously. For example, an SLA
compliance application may require both the data plane telemetry and
the control plane telemetry.
4.2.1. Requirements and Challenges for each Module 4.2.1. Requirements and Challenges for each Module
4.2.1.1. Management Plane Telemetry 4.2.1.1. Management Plane Telemetry
The management plane of network elements interacts with the Network The management plane of network elements interacts with the Network
Management System (NMS), and provides information such as performance Management System (NMS), and provides information such as performance
data, network logging data, network warning and defects data, and data, network logging data, network warning and defects data, and
network statistics and state data. Some legacy protocols, such as network statistics and state data. Some legacy protocols, such as
SNMP and Syslog, are widely used for the management plane. However, SNMP and Syslog, are widely used for the management plane. However,
these protocols are insufficient to meet the requirements of the these protocols are insufficient to meet the requirements of the
future automated network operation applications. future automated network operation applications.
skipping to change at page 16, line 29 skipping to change at page 16, line 32
export frequency. export frequency.
Structured Data: For automatic network operation, machines will Structured Data: For automatic network operation, machines will
replace humans for network data comprehension. The schema replace humans for network data comprehension. The schema
languages such as YANG can efficiently describe structured data languages such as YANG can efficiently describe structured data
and normalize data encoding and transformation. and normalize data encoding and transformation.
High Speed Data Transport: In order to retain the information, a High Speed Data Transport: In order to retain the information, a
server needs to send a large amount of data at high frequency. server needs to send a large amount of data at high frequency.
Compact encoding formats are needed to compress the data and Compact encoding formats are needed to compress the data and
improve the data transport efficiency. The push mode, by improve the data transport efficiency. The subscription mode, by
replacing the poll mode, can also reduce the interactions between replacing the query mode, reduces the interactions between clients
clients and servers, which help to improve the server's and servers and helps to improve the server's efficiency.
efficiency.
4.2.1.2. Control Plane Telemetry 4.2.1.2. Control Plane Telemetry
The control plane telemetry refers to the health condition monitoring The control plane telemetry refers to the health condition monitoring
of different network protocols, which covers Layer 2 to Layer 7. of different network control protocols covering Layer 2 to Layer 7.
Keeping track of the running status of these protocols is beneficial Keeping track of the running status of these protocols is beneficial
for detecting, localizing, and even predicting various network for detecting, localizing, and even predicting various network
issues, as well as network optimization, in real-time and in fine issues, as well as network optimization, in real-time and in fine
granularity. granularity.
One of the most challenging problems for the control plane telemetry One of the most challenging problems for the control plane telemetry
is how to correlate the E2E Key Performance Indicators (KPI) to a is how to correlate the End-to-End (E2E) Key Performance Indicators
specific layer's KPIs. For example, an IPTV user may describe his (KPI) to a specific layer's KPIs. For example, an IPTV user may
User Experience (UE) by the video fluency and definition. Then in describe his User Experience (UE) by the video fluency and
case of an unusually poor UE KPI or a service disconnection, it is definition. Then in case of an unusually poor UE KPI or a service
non-trivial work to delimit and localize the issue to the responsible disconnection, it is non-trivial to delimit and pinpoint the issue in
protocol layer (e.g., the Transport Layer or the Network Layer), the the responsible protocol layer (e.g., the Transport Layer or the
responsible protocol (e.g., ISIS or BGP at the Network Layer), and Network Layer), the responsible protocol (e.g., ISIS or BGP at the
finally the responsible device(s) with specific reasons. Network Layer), and finally the responsible device(s) with specific
reasons.
Traditional OAM-based approaches for control plane KPI measurement Traditional OAM-based approaches for control plane KPI measurement
include PING (L3), Tracert (L3), Y.1731 (L2) and so on. One common include PING (L3), Tracert (L3), Y.1731 (L2), and so on. One common
issue behind these methods is that they only measure the KPIs instead issue behind these methods is that they only measure the KPIs instead
of reflecting the actual running status of these protocols, making of reflecting the actual running status of these protocols, making
them less effective or efficient for control plane troubleshooting them less effective or efficient for control plane troubleshooting
and network optimization. An example of the control plane telemetry and network optimization.
is the BGP Monitoring Protocol (BMP), which is currently used to
monitor the BGP routes and enables rich applications, such as BGP An example of the control plane telemetry is the BGP Monitoring
peer analysis, AS analysis, prefix analysis, security analysis, and Protocol (BMP), which is currently used to monitor the BGP routes and
so on. However, the monitoring of other layers, protocols and the enables rich applications, such as BGP peer analysis, AS analysis,
cross-layer, cross-protocol KPI correlations are still in their prefix analysis, security analysis, and so on. However, the
infancy (e.g., the IGP monitoring is missing), which require monitoring of other layers, protocols and the cross-layer, cross-
substantial further research. protocol KPI correlations are still in their infancy (e.g., the IGP
monitoring is missing), which require further research.
4.2.1.3. Data Plane Telemetry 4.2.1.3. Data Plane Telemetry
An effective data plane telemetry system relies on the data that the An effective data plane telemetry system relies on the data that the
network device can expose. The data's quality, quantity, and network device can expose. The data's quality, quantity, and
timeliness must meet some stringent requirements. This raises some timeliness must meet some stringent requirements. This raises some
challenges to the network data plane devices where the first hand challenges to the network data plane devices where the first hand
data originate. data originate.
o A data plane device's main function is user traffic processing and o A data plane device's main function is user traffic processing and
skipping to change at page 18, line 9 skipping to change at page 18, line 11
applications to parse and consume. At the same time, the data applications to parse and consume. At the same time, the data
types needed by applications can vary significantly. The data types needed by applications can vary significantly. The data
plane devices need to provide enough flexibility and plane devices need to provide enough flexibility and
programmability to support the precise data provision for programmability to support the precise data provision for
applications. applications.
o The data plane telemetry should support incremental deployment and o The data plane telemetry should support incremental deployment and
work even though some devices are unaware of the system. This work even though some devices are unaware of the system. This
challenge is highly relevant to the standards and legacy networks. challenge is highly relevant to the standards and legacy networks.
The industry has agreed that the data plane programmability is The data plane programmability is essential to support network
essential to support network telemetry. Newer data plane chips are telemetry. Newer data plane forwarding chips are equipped with
all equipped with advanced telemetry features and provide flexibility advanced telemetry features and provide flexibility to support
to support customized telemetry functions. customized telemetry functions.
4.2.1.3.1. Technique Taxonomy 4.2.1.3.1. Technique Taxonomy
There can be multiple possible dimensions to classify the data plane There can be multiple possible dimensions to classify the data plane
telemetry techniques. telemetry techniques.
Active and Passive: The active and passive methods (as well as the Active, Passive, and Hybrid: The active and passive methods (as well
hybrid types) are well documented in [RFC7799]. The passive as the hybrid types) are well documented in [RFC7799]. The
methods include TCPDUMP, IPFIX [RFC7011], sflow, and traffic passive methods include TCPDUMP, IPFIX [RFC7011], sflow, and
mirror. These methods usually have low data coverage. The traffic mirror. These methods usually have low data coverage.
bandwidth cost is very high in order to improve the data coverage. The bandwidth cost is very high in order to improve the data
On the other hand, the active methods include Ping, Traceroute, coverage. On the other hand, the active methods include Ping,
OWAMP [RFC4656], and TWAMP [RFC5357]. These methods are intrusive Traceroute, OWAMP [RFC4656], and TWAMP [RFC5357]. These methods
and only provide indirect network measurement results. The hybrid are intrusive and only provide indirect network measurement
methods, including in-situ OAM results. The hybrid methods, including in-situ OAM
[I-D.brockners-inband-oam-requirements], IPFPM [RFC8321], and [I-D.ietf-ippm-ioam-data], IPFPM [RFC8321], and Multipoint
Multipoint Alternate Marking Alternate Marking [I-D.fioccola-ippm-multipoint-alt-mark], provide
[I-D.fioccola-ippm-multipoint-alt-mark], provide a well-balanced a well-balanced and more flexible approach. However, these
and more flexible approach. However, these methods are also more methods are also more complex to implement.
complex to implement.
In-Band and Out-of-Band: The telemetry data, before being exported In-Band and Out-of-Band: The telemetry data, before being exported
to some collector, can be carried in user packets. Such methods to some collector, can be carried in user packets. Such methods
are considered in-band (e.g., in-situ OAM are considered in-band (e.g., in-situ OAM
[I-D.brockners-inband-oam-requirements]). If the telemetry data [I-D.ietf-ippm-ioam-data]). If the telemetry data is directly
is directly exported to some collector without modifying the user exported to some collector without modifying the user packets,
packets, Such methods are considered out-of-band (e.g., postcard- such methods are considered out-of-band (e.g., postcard-based
based INT). It is possible to have hybrid methods. For example, INT). It is possible to have hybrid methods. For example, only
only the telemetry instruction or partial data is carried by user the telemetry instruction or partial data is carried by user
packets (e.g., IPFPM [RFC8321]). packets (e.g., IPFPM [RFC8321]).
E2E and In-Network: Some E2E methods start from and end at the E2E and In-Network: Some E2E methods start from and end at the
network end hosts (e.g., Ping). The other methods work in network end hosts (e.g., Ping). The other methods work in
networks and are transparent to end hosts. However, if needed, networks and are transparent to end hosts. However, if needed,
the in-network methods can be easily extended into end hosts. the in-network methods can be easily extended into end hosts.
Flow, Path, and Node: Depending on the telemetry objective, the Flow, Path, and Node: Depending on the telemetry objective, the
methods can be flow-based (e.g., in-situ OAM methods can be flow-based (e.g., in-situ OAM
[I-D.brockners-inband-oam-requirements]), path-based (e.g.,
Traceroute), and node-based (e.g., IPFIX [RFC7011]). [I-D.ietf-ippm-ioam-data]), path-based (e.g., Traceroute), and
node-based (e.g., IPFIX [RFC7011]).
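The taxonomy above can be sketched as a small lookup structure. The following is only an illustration under stated assumptions: the class and field names (Method, Carriage, Scope, Technique) are hypothetical and are not defined by the draft, and a dimension is left unset wherever the text does not classify a technique along it.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Method(Enum):
    ACTIVE = "active"
    PASSIVE = "passive"
    HYBRID = "hybrid"


class Carriage(Enum):
    IN_BAND = "in-band"
    OUT_OF_BAND = "out-of-band"
    HYBRID = "hybrid"


class Scope(Enum):
    FLOW = "flow"
    PATH = "path"
    NODE = "node"


@dataclass(frozen=True)
class Technique:
    # Hypothetical record; None means the text gives no classification.
    name: str
    method: Method
    carriage: Optional[Carriage] = None
    scope: Optional[Scope] = None


# Only classifications stated in the taxonomy text are filled in.
TECHNIQUES = [
    Technique("TCPDUMP", Method.PASSIVE),
    Technique("IPFIX", Method.PASSIVE, scope=Scope.NODE),
    Technique("sflow", Method.PASSIVE),
    Technique("Ping", Method.ACTIVE),
    Technique("Traceroute", Method.ACTIVE, scope=Scope.PATH),
    Technique("OWAMP", Method.ACTIVE),
    Technique("TWAMP", Method.ACTIVE),
    Technique("IOAM", Method.HYBRID, Carriage.IN_BAND, Scope.FLOW),
    Technique("IPFPM", Method.HYBRID, Carriage.HYBRID),
    Technique("Multipoint Alternate Marking", Method.HYBRID),
]


def by_method(method: Method) -> List[str]:
    """Return the names of all techniques using the given method."""
    return [t.name for t in TECHNIQUES if t.method is method]


def by_scope(scope: Scope) -> List[str]:
    """Return the names of all techniques with the given objective."""
    return [t.name for t in TECHNIQUES if t.scope is scope]
```

For instance, by_method(Method.HYBRID) lists the hybrid techniques named in the text, and by_scope(Scope.PATH) returns only Traceroute.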
4.2.1.4. External Data Telemetry 4.2.1.4. External Data Telemetry
Events that occur outside the boundaries of the network system are Events that occur outside the boundaries of the network system are
another important source of telemetry information. Correlating both another important source of network telemetry. Correlating both
internal telemetry data and external events with the requirements of internal telemetry data and external events with the requirements of
network systems, as presented in Exploiting External Event Detectors network systems, as presented in
to Anticipate Resource Requirements for the Elastic Adaptation of [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and
SDN/NFV Systems [I-D.pedro-nmrg-anticipated-adaptation], provides a functional advantage to management operations.
strategic and functional advantage to management operations.
As with other sources of telemetry information, the data and events As with other sources of telemetry information, the data and events
must meet strict requirements, especially in terms of timeliness, must meet strict requirements, especially in terms of timeliness,
which is essential to properly incorporate external event information which is essential to properly incorporate external event information
to management cycles. Thus, the specific challenges are described as to management cycles. The specific challenges are described as
follows: follows:
o The role of external event detector can be played by multiple o The role of external event detector can be played by multiple
elements, including hardware (e.g. physical sensors, such as elements, including hardware (e.g. physical sensors, such as
seismometers) and software (e.g. Big Data sources that analyze seismometers) and software (e.g. Big Data sources that analyze
streams of information, such as Twitter messages). Thus, the streams of information, such as Twitter messages). Thus, the
transmitted data must support different shapes but, at the same transmitted data must support different shapes but, at the same
time, follow a common but extensible ontology. time, follow a common but extensible schema.
o Since the main function of the external event detectors is to o Since the main function of the external event detectors is to
perform the notifications, their timeliness is assumed. However, perform the notifications, their timeliness is assumed. However,
once messages have been dispatched, they must be quickly collected once messages have been dispatched, they must be quickly collected
and inserted into the control plane with variable priority, which and inserted into the control plane with variable priority, which
will be high for important sources and/or important events and low will be high for important sources and/or important events and low
for secondary ones. for secondary ones.
o The ontology used by external detectors must be easily adopted by o The schema used by external detectors must be easily adopted by
current and future devices and applications. Therefore, it must current and future devices and applications. Therefore, it must
be easily mapped to current information models, such as in terms be easily mapped to current information models, such as in terms
of YANG. of YANG.
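The challenges above (data of different shapes under a common, extensible description, and collection with variable priority) can be sketched as follows. This is a minimal illustration, not a mechanism defined by the draft: the names ExternalEvent and EventQueue, the two-level priority, and the example sources are all assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Dict, Set


@dataclass(frozen=True)
class ExternalEvent:
    # Common fields shared by every detector (hypothetical).
    source: str
    kind: str
    # Free-form attributes let each detector add its own data shape.
    attributes: Dict[str, Any] = field(default_factory=dict)


class EventQueue:
    """Collects events and releases the important ones first."""

    HIGH, LOW = 0, 1

    def __init__(self, important_sources: Set[str],
                 important_kinds: Set[str]) -> None:
        self._important_sources = important_sources
        self._important_kinds = important_kinds
        self._heap = []
        # The counter keeps arrival order among equal priorities and
        # ensures that two events are never compared directly.
        self._counter = itertools.count()

    def push(self, event: ExternalEvent) -> None:
        important = (event.source in self._important_sources
                     or event.kind in self._important_kinds)
        priority = self.HIGH if important else self.LOW
        heapq.heappush(self._heap, (priority, next(self._counter), event))

    def pop(self) -> ExternalEvent:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


queue = EventQueue(important_sources={"seismometer"},
                   important_kinds={"earthquake"})
queue.push(ExternalEvent("social-feed", "chatter"))
queue.push(ExternalEvent("seismometer", "tremor", {"magnitude": 4.1}))
first = queue.pop()
second = queue.pop()
```

Here the seismometer event is released before the earlier social-feed event because its source is marked as important.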
Organizing together both internal and external telemetry information Organizing together both internal and external telemetry information
will be key for the general exploitation of the management will be key for the general exploitation of the management
possibilities of current and future network systems, as reflected in possibilities of current and future network systems, as reflected in
the incorporation of cognitive capabilities to new hardware and the incorporation of cognitive capabilities to new hardware and
software (virtual) elements. software (virtual) elements.
4.3. Function Components 4.3. Function Components
At each plane, the telemetry can be further partitioned into five The telemetry module at each plane can be further partitioned into
distinct components: five distinct components:
Data Query, Analysis, and Storage: This component works at the Data Query, Analysis, and Storage: This component works at the
application layer. On the one hand, it is responsible for issuing application layer. On the one hand, it is responsible for issuing
data queries. The queries can be for modeled data through data requirements. The data of interest can be modeled data
configuration or custom data through programming. The queries can through configuration or custom data through programming. The
be one shot or subscriptions for events or streaming data. On the data requirements can be queries for one-shot data or
other hand, it receives, stores, and processes the returned data subscriptions for events or streaming data. On the other hand, it
from network devices. Data analysis can be interactive to receives, stores, and processes the returned data from network
initiate further data queries. Note that this component can devices. Data analysis can be interactive to initiate further
reside in either network devices or remote controllers. data queries. This component can reside in either network devices
or remote controllers.
Data Configuration and Subscription: This component deploys data Data Configuration and Subscription: This component deploys data
queries on devices. It determines the protocol and channel for queries on devices. It determines the protocol and channel for
applications to acquire desired data. This component is also applications to acquire desired data. This component is also
responsible for configuring the desired data that might not be responsible for configuring the desired data that might not be
directly available from data sources. The subscription data can directly available from data sources. The subscription data can
be described by models, templates, or programs. be described by models, templates, or programs.
Data Encoding and Export: This component determines how telemetry Data Encoding and Export: This component determines how telemetry
data are delivered to the data analysis and storage component. data are delivered to the data analysis and storage component.
skipping to change at page 21, line 35 skipping to change at page 21, line 35
| Data Object and Source | | Data Object and Source |
| | | |
+----------------------------------------+ +----------------------------------------+
Figure 4: Components in the Network Telemetry Framework Figure 4: Components in the Network Telemetry Framework
4.4. Existing Works Mapped in the Framework 4.4. Existing Works Mapped in the Framework
The following two tables provide a non-exhaustive list of existing The following two tables provide a non-exhaustive list of existing
works (mainly published in IETF and with the emphasis on the latest works (mainly published in IETF and with the emphasis on the latest
new technologies) and show their positions in the framework. The new technologies) and show their positions in the framework. More
details about the mentioned work can be found in Appendix A. details can be found in Appendix A.
The first table is based on the data acquiring mechanisms and data
types.
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| | Query | Subscription | | | Query | Subscription |
| | | | | | | |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Simple Data | SNMP, NETCONF,| | | Simple Data | SNMP, NETCONF,| SNMP, NETCONF |
| | YANG, BMP, | | | | YANG, BMP, | YANG, gRPC |
| | IOAM, PBT,gRPC| | | | gRPC | |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Complex Data | DNP, YANG FSM | | | Complex Data | DNP, YANG FSM | DNP, YANG PUSH |
| | gRPC, NETCONF | | | | gRPC, NETCONF | gRPC, NETCONF |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Event-triggered | | gRPC, NETCONF, | | Event-triggered | | gRPC, NETCONF, |
| Data | | YANG PUSH, DNP | | Data | N/A | YANG PUSH, DNP |
| | | IOAM, PBT, |
| | | YANG FSM | | | | YANG FSM |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Streaming Data | | gRPC, NETCONF, | | Streaming Data | | gRPC, NETCONF, |
| | | IOAM, PBT, DNP | | | N/A | IOAM, PBT, DNP |
| | | IPFIX, IPFPM | | | | IPFIX, IPFPM |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
Figure 5: Existing Work Mapping I Figure 5: Existing Work Mapping I
The second table is based on the telemetry modules and components.
+--------------+---------------+----------------+---------------+ +--------------+---------------+----------------+---------------+
| | Management | Control | Forwarding | | | Management | Control | Forwarding |
| | Plane | Plane | Plane | | | Plane | Plane | Plane |
+--------------+---------------+----------------+---------------+ +--------------+---------------+----------------+---------------+
| data Config. | gRPC, NETCONF,| NETCONF/YANG | NETCONF/YANG, | | data Config. | gRPC, NETCONF,| NETCONF/YANG | NETCONF/YANG, |
| & subscrib. | YANG PUSH | | YANG FSM | | & subscrib. | YANG PUSH | | YANG FSM |
+--------------+---------------+----------------+---------------+ +--------------+---------------+----------------+---------------+
| data gen. & | DNP, | DNP, | IOAM, | | data gen. & | DNP, | DNP, | IOAM, |
| processing | YANG | YANG | PBT, IPFPM, | | processing | YANG | YANG | PBT, IPFPM, |
| | | | DNP | | | | | DNP |
+--------------+---------------+----------------+---------------+ +--------------+---------------+----------------+---------------+
| data | gRPC, NETCONF | BMP, NETCONF | IPFIX | | data | gRPC, NETCONF | BMP, NETCONF | IPFIX |
| export | YANG PUSH | | | | export | YANG PUSH | | |
+--------------+---------------+----------------+---------------+ +--------------+---------------+----------------+---------------+
Figure 6: Existing Work Mapping II Figure 6: Existing Work Mapping II
5. Evolution of Network Telemetry 5. Evolution of Network Telemetry
As the network is evolving towards the automated operation, network Network telemetry is a fast evolving technical area. As the network
telemetry also undergoes several levels of evolution. moves towards the automated operation, network telemetry undergoes
several levels of evolution.
Level 0 - Static Telemetry: The telemetry data source and type are Level 0 - Static Telemetry: The telemetry data source and type are
determined at design time. The network operator can only determined at design time. The network operator can only
configure how to use it with limited flexibility. configure how to use it with limited flexibility.
Level 1 - Dynamic Telemetry: The telemetry data can be dynamically Level 1 - Dynamic Telemetry: The telemetry data can be dynamically
programmed or configured at runtime, allowing a tradeoff among programmed or configured at runtime, allowing a tradeoff among
resource, performance, flexibility, and coverage. DNP is an resource, performance, flexibility, and coverage. DNP is an
effort towards this direction. effort towards this direction.
Level 2 - Interactive Telemetry: The network operator can Level 2 - Interactive Telemetry: The network operator can
continuously customize the telemetry data in real time to reflect continuously customize the telemetry data in real time to reflect
the network operation's visibility requirements. At this level, the network operation's visibility requirements. At this level,
some tasks can be automated, although ultimately human operators some tasks can be automated, although ultimately human operators
will still need to sit in the middle to make decisions. will still need to sit in the middle to make decisions.
Level 3 - Closed-loop Telemetry: Human operators are completely Level 3 - Closed-loop Telemetry: Human operators are completely
excluded from the control loop. The intelligent network operation excluded from the control loop. The intelligent network operation
engine automatically issues the telemetry data request, analyzes engine automatically issues the telemetry data requests, analyzes
the data, and updates the network operations in closed control the data, and updates the network operations in closed control
loops. loops.
While most of the existing technologies belong to level 0 and level While most of the existing technologies belong to level 0 and level
1, with the help of a clearly defined network telemetry framework, we 1, with the help of a clearly defined network telemetry framework, we
can assemble the technologies to support level 2 and make solid steps are now able to assemble the technologies to support level 2 and
towards level 3. make solid steps towards level 3.
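The four levels can be read as a simple decision over three properties of a telemetry system. The sketch below is one possible reading of the level descriptions, not a normative classification; the function name and its parameters are hypothetical.

```python
from enum import IntEnum


class TelemetryLevel(IntEnum):
    STATIC = 0        # data source and type fixed at design time
    DYNAMIC = 1       # programmable or configurable at runtime
    INTERACTIVE = 2   # customized in real time, human makes decisions
    CLOSED_LOOP = 3   # human operators excluded from the control loop


def classify(runtime_programmable: bool,
             realtime_customizable: bool,
             human_in_loop: bool) -> TelemetryLevel:
    """Map three (assumed) system properties to an evolution level."""
    if not runtime_programmable:
        return TelemetryLevel.STATIC
    if not realtime_customizable:
        return TelemetryLevel.DYNAMIC
    if human_in_loop:
        return TelemetryLevel.INTERACTIVE
    return TelemetryLevel.CLOSED_LOOP
```

Under this reading, a system that is programmable at runtime but not continuously customizable is classified as level 1.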
6. Security Considerations 6. Security Considerations
Given that this document has proposed a framework for network Given that this document has proposed a framework for network
telemetry and the telemetry mechanisms discussed are distinct (in telemetry and the telemetry mechanisms discussed are distinct (in
both message frequency and traffic amount) from the conventional both message frequency and traffic amount) from the conventional
network OAM concepts, we must also reflect that various new security network OAM concepts, we must also reflect that various new security
considerations may also arise. A number of techniques already exist considerations may also arise. A number of techniques already exist
for securing the data plane, control plane, and the management plane for securing the forwarding plane, the control plane, and the
in a network, but the it is important to consider if any new threat management plane in a network, but it is important to consider if any
vectors are now being enabled via the use of network telemetry new threat vectors are now being enabled via the use of network
procedures and mechanisms. telemetry procedures and mechanisms.
Security considerations for networks that use telemetry methods may Security considerations for networks that use telemetry methods may
include: include:
o Telemetry framework trust and policy model; o Telemetry framework trust and policy model;
o Role management and access control for enabling and disabling o Role management and access control for enabling and disabling
telemetry capabilities; telemetry capabilities;
o Protocol transport used for telemetry data and inherent security o Protocol transport used for telemetry data and inherent security
skipping to change at page 24, line 20 skipping to change at page 24, line 20
Some of the security considerations highlighted above may be Some of the security considerations highlighted above may be
minimized or negated with policy management of network telemetry. In minimized or negated with policy management of network telemetry. In
a network telemetry deployment it would be advantageous to separate a network telemetry deployment it would be advantageous to separate
telemetry capabilities into different classes of policies, i.e., Role telemetry capabilities into different classes of policies, i.e., Role
Based Access Control and Event-Condition-Action policies. Also, Based Access Control and Event-Condition-Action policies. Also,
potential conflicts between network telemetry mechanisms must be potential conflicts between network telemetry mechanisms must be
detected accurately and resolved quickly to avoid unnecessary network detected accurately and resolved quickly to avoid unnecessary network
telemetry traffic propagation escalating into an unintended or telemetry traffic propagation escalating into an unintended or
intended denial of service attack. intended denial of service attack.
Further discussion and development of this section will be required, Further study of the security issues will be required, and it is
and it is expected that this security section, and subsequent policy expected that the security mechanisms and protocols are developed and
section will be developed further. deployed along with a network telemetry system.
7. IANA Considerations 7. IANA Considerations
This document includes no request to IANA. This document includes no request to IANA.
8. Contributors 8. Contributors
The other contributors of this document are listed as follows. The other contributors of this document are listed as follows.
o Tianran Zhou o Tianran Zhou
o Zhenbin Li o Zhenbin Li
o Zhenqiang Li
o Daniel King o Daniel King
o Adrian Farrel o Adrian Farrel
9. Acknowledgments 9. Acknowledgments
We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James We would like to thank Randy Presuhn, Joe Clarke, Victor Liu, James
Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani, Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Yegani,
Young Lee, Alexander Clemm, Qin Wu, and many others who have provided Young Lee, Alexander Clemm, Qin Wu, and many others who have provided
helpful comments and suggestions to improve this document. helpful comments and suggestions to improve this document.
10. Informative References 10. Informative References
[gnmi] "gNMI - gRPC Network Management Interface", [gnmi] "gNMI - gRPC Network Management Interface",
<https://github.com/openconfig/reference/tree/master/rpc/ <https://github.com/openconfig/reference/tree/master/rpc/
gnmi>. gnmi>.
[grpc] "gRPC, A high performance, open-source universal RPC [grpc] "gRPC, A high performance, open-source universal RPC
framework", <https://grpc.io>. framework", <https://grpc.io>.
[I-D.brockners-inband-oam-requirements]
Brockners, F., Bhandari, S., Dara, S., Pignataro, C.,
Gredler, H., Leddy, J., Youell, S., Mozes, D., Mizrahi,
T., Lapukhov, P., and r. remy@barefootnetworks.com,
"Requirements for In-situ OAM", draft-brockners-inband-
oam-requirements-03 (work in progress), March 2017.
[I-D.fioccola-ippm-multipoint-alt-mark] [I-D.fioccola-ippm-multipoint-alt-mark]
Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
"Multipoint Alternate Marking method for passive and "Multipoint Alternate Marking method for passive and
hybrid performance monitoring", draft-fioccola-ippm- hybrid performance monitoring", draft-fioccola-ippm-
multipoint-alt-mark-04 (work in progress), June 2018. multipoint-alt-mark-04 (work in progress), June 2018.
[I-D.ietf-grow-bmp-adj-rib-out] [I-D.ietf-grow-bmp-adj-rib-out]
Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S.
Zhuang, "Support for Adj-RIB-Out in BGP Monitoring Zhuang, "Support for Adj-RIB-Out in BGP Monitoring
Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-07 (work
in progress), August 2019. in progress), August 2019.
[I-D.ietf-grow-bmp-local-rib] [I-D.ietf-grow-bmp-local-rib]
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
"Support for Local RIB in BGP Monitoring Protocol (BMP)", "Support for Local RIB in BGP Monitoring Protocol (BMP)",
draft-ietf-grow-bmp-local-rib-05 (work in progress), draft-ietf-grow-bmp-local-rib-06 (work in progress),
August 2019. November 2019.
[I-D.ietf-ippm-ioam-data]
Brockners, F., Bhandari, S., Pignataro, C., Gredler, H.,
Leddy, J., Youell, S., Mizrahi, T., Mozes, D., Lapukhov,
P., remy@barefootnetworks.com, r., daniel.bernier@bell.ca,
d., and J. Lemon, "Data Fields for In-situ OAM", draft-
ietf-ippm-ioam-data-09 (work in progress), March 2020.
[I-D.ietf-netconf-udp-pub-channel] [I-D.ietf-netconf-udp-pub-channel]
Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication
Channel for Streaming Telemetry", draft-ietf-netconf-udp- Channel for Streaming Telemetry", draft-ietf-netconf-udp-
pub-channel-05 (work in progress), March 2019. pub-channel-05 (work in progress), March 2019.
[I-D.ietf-netconf-yang-push] [I-D.ietf-netconf-yang-push]
Clemm, A. and E. Voit, "Subscription to YANG Datastores", Clemm, A. and E. Voit, "Subscription to YANG Datastores",
draft-ietf-netconf-yang-push-25 (work in progress), May draft-ietf-netconf-yang-push-25 (work in progress), May
2019. 2019.
skipping to change at page 26, line 14 skipping to change at page 26, line 20
[I-D.pedro-nmrg-anticipated-adaptation] [I-D.pedro-nmrg-anticipated-adaptation]
Martinez-Julia, P., "Exploiting External Event Detectors Martinez-Julia, P., "Exploiting External Event Detectors
to Anticipate Resource Requirements for the Elastic to Anticipate Resource Requirements for the Elastic
Adaptation of SDN/NFV Systems", draft-pedro-nmrg- Adaptation of SDN/NFV Systems", draft-pedro-nmrg-
anticipated-adaptation-02 (work in progress), June 2018. anticipated-adaptation-02 (work in progress), June 2018.
[I-D.song-ippm-postcard-based-telemetry] [I-D.song-ippm-postcard-based-telemetry]
Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee, Song, H., Zhou, T., Li, Z., Shin, J., and K. Lee,
"Postcard-based On-Path Flow Data Telemetry", draft-song- "Postcard-based On-Path Flow Data Telemetry", draft-song-
ippm-postcard-based-telemetry-05 (work in progress), ippm-postcard-based-telemetry-06 (work in progress),
September 2019. October 2019.
[I-D.song-opsawg-dnp4iq] [I-D.song-opsawg-dnp4iq]
Song, H. and J. Gong, "Requirements for Interactive Query Song, H. and J. Gong, "Requirements for Interactive Query
with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01
(work in progress), June 2017. (work in progress), June 2017.
[I-D.song-opsawg-ifit-framework]
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "In-
situ Flow Information Telemetry", draft-song-opsawg-ifit-
framework-11 (work in progress), March 2020.
[I-D.zhou-netconf-multi-stream-originators] [I-D.zhou-netconf-multi-stream-originators]
Zhou, T., Zheng, G., Voit, E., Clemm, A., and A. Bierman, Zhou, T., Zheng, G., Voit, E., and A. Clemm, "Subscription
"Subscription to Multiple Stream Originators", draft-zhou- to Multiple Stream Originators", draft-zhou-netconf-multi-
netconf-multi-stream-originators-06 (work in progress), stream-originators-10 (work in progress), November 2019.
July 2019.
[RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin, [RFC1157] Case, J., Fedor, M., Schoffstall, M., and J. Davin,
"Simple Network Management Protocol (SNMP)", RFC 1157, "Simple Network Management Protocol (SNMP)", RFC 1157,
DOI 10.17487/RFC1157, May 1990, DOI 10.17487/RFC1157, May 1990,
<https://www.rfc-editor.org/info/rfc1157>. <https://www.rfc-editor.org/info/rfc1157>.
[RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981, [RFC2981] Kavasseri, R., Ed., "Event MIB", RFC 2981,
DOI 10.17487/RFC2981, October 2000, DOI 10.17487/RFC2981, October 2000,
<https://www.rfc-editor.org/info/rfc2981>. <https://www.rfc-editor.org/info/rfc2981>.
skipping to change at page 31, line 28 skipping to change at page 31, line 33
information about these packets. An Exporter then gathers each of information about these packets. An Exporter then gathers each of
the Observation Points together into an Observation Domain and sends the Observation Points together into an Observation Domain and sends
this information via the IPFIX protocol to a Collector. this information via the IPFIX protocol to a Collector.
A.3.4. In-Situ OAM A.3.4. In-Situ OAM
Traditional passive and active monitoring and measurement techniques Traditional passive and active monitoring and measurement techniques
are either inaccurate or resource-consuming. It is preferable to are either inaccurate or resource-consuming. It is preferable to
directly acquire data associated with a flow's packets when the directly acquire data associated with a flow's packets when the
packets pass through a network. In-situ OAM (iOAM) packets pass through a network. In-situ OAM (iOAM)
[I-D.brockners-inband-oam-requirements], a data generation technique, [I-D.ietf-ippm-ioam-data], a data generation technique, embeds a new
embeds a new instruction header to user packets and the instruction instruction header to user packets and the instruction directs the
directs the network nodes to add the requested data to the packets. network nodes to add the requested data to the packets. Thus, at the
Thus, at the path end, the packet's experience gained on the entire path end, the packet's experience gained on the entire forwarding
forwarding path can be collected. Such firsthand data is invaluable path can be collected. Such firsthand data is invaluable to many
to many network OAM applications. network OAM applications.
However, iOAM also faces some challenges. The issues on performance However, iOAM also faces some challenges. The issues on performance
impact, security, scalability and overhead limits, encapsulation impact, security, scalability and overhead limits, encapsulation
difficulties in some protocols, and cross-domain deployment need to difficulties in some protocols, and cross-domain deployment need to
be addressed. be addressed.
A.3.5. Postcard Based Telemetry A.3.5. Postcard Based Telemetry
PBT [I-D.song-ippm-postcard-based-telemetry] is an alternative to PBT [I-D.song-ippm-postcard-based-telemetry] is an alternative to
IOAM. PBT directly exports data at each node through an independent IOAM. PBT directly exports data at each node through an independent
skipping to change at page 33, line 43 skipping to change at page 34, line 4
In some situations, the interconnection between the external event In some situations, the interconnection between the external event
detectors and the management system is via the management plane. For detectors and the management system is via the management plane. For
those situations there will be a special connector that provides the those situations there will be a special connector that provides the
typical interfaces found in most other elements connected to the typical interfaces found in most other elements connected to the
management plane. For instance, the interfaces will comply with management plane. For instance, the interfaces will comply with
a specific information model (YANG) and specific telemetry protocol, a specific information model (YANG) and specific telemetry protocol,
such as NETCONF, SNMP, or gRPC. such as NETCONF, SNMP, or gRPC.
Authors' Addresses Authors' Addresses
Haoyu Song
Haoyu Song (editor)
Futurewei Futurewei
2330 Central Expressway 2330 Central Expressway
Santa Clara Santa Clara
USA USA
Email: hsong@futurewei.com Email: hsong@futurewei.com
Fengwei Qin Fengwei Qin
China Mobile China Mobile
No. 32 Xuanwumenxi Ave., Xicheng District No. 32 Xuanwumenxi Ave., Xicheng District
Beijing, 100032 Beijing, 100032
P.R. China P.R. China
Email: qinfengwei@chinamobile.com Email: qinfengwei@chinamobile.com
Pedro Martinez-Julia Pedro Martinez-Julia
NICT NICT
 End of changes. 84 change blocks. 
262 lines changed or deleted 276 lines changed or added
