draft-ietf-opsawg-ntf-00.txt   draft-ietf-opsawg-ntf-01.txt 
OPSAWG H. Song, Ed. OPSAWG H. Song, Ed.
Internet-Draft Huawei Internet-Draft Futurewei
Intended status: Informational Z. Li Intended status: Informational F. Qin
Expires: September 27, 2019 China Mobile Expires: December 13, 2019 China Mobile
P. Martinez-Julia P. Martinez-Julia
NICT NICT
L. Ciavaglia L. Ciavaglia
Nokia Nokia
A. Wang A. Wang
China Telecom China Telecom
March 26, 2019 June 11, 2019
Network Telemetry Framework Network Telemetry Framework
draft-ietf-opsawg-ntf-00 draft-ietf-opsawg-ntf-01
Abstract Abstract
This document provides an architectural framework for network This document provides an architectural framework for network
telemetry to address the current and future network operation telemetry to address the current and future network operation
challenges and requirements. As evidenced by the defining challenges and requirements. As evidenced by some key
characteristics and industry practice, network telemetry covers characteristics and industry practices, network telemetry covers
technologies and protocols beyond the conventional network technologies and protocols beyond the conventional network
Operations, Administration, and Management (OAM). Network telemetry Operations, Administration, and Management (OAM), so it promises
promises better flexibility, scalability, accuracy, coverage, and better flexibility, scalability, accuracy, coverage, and performance
performance and allows automated control loops to suit both today's and allows automated control loops to suit both today's and
and tomorrow's network operation requirements. This document tomorrow's network operation requirements. This document clarifies
clarifies the terminologies and classifies the modules and components the terminologies and classifies the modules and components of a
of a network telemetry system. The framework and taxonomy help to network telemetry system. The framework and taxonomy help to set a
set a common ground for the collection of related work and provide common ground for the collection of related work and provide guidance
guidance for future technique and standard developments. for future technique and standard developments.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79. provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 27, 2019. This Internet-Draft will expire on December 13, 2019.
Copyright Notice Copyright Notice
Copyright (c) 2019 IETF Trust and the persons identified as the Copyright (c) 2019 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 2, line 26 skipping to change at page 2, line 26
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3
1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3 1.1. Requirements Language . . . . . . . . . . . . . . . . . . 3
2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4 2. Motivation . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 4 2.1. Use Cases . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 5 2.2. Challenges . . . . . . . . . . . . . . . . . . . . . . . 6
2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7 2.3. Glossary . . . . . . . . . . . . . . . . . . . . . . . . 7
2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8 2.4. Network Telemetry . . . . . . . . . . . . . . . . . . . . 8
3. The Necessity of a Network Telemetry Framework . . . . . . . 9 3. The Necessity of a Network Telemetry Framework . . . . . . . 9
4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 10 4. Network Telemetry Framework . . . . . . . . . . . . . . . . . 11
4.1. Data Acquiring Mechanisms . . . . . . . . . . . . . . . . 11 4.1. Data Acquiring Mechanisms . . . . . . . . . . . . . . . . 11
4.2. Data Objects . . . . . . . . . . . . . . . . . . . . . . 12 4.2. Data Objects . . . . . . . . . . . . . . . . . . . . . . 12
4.3. Function Components . . . . . . . . . . . . . . . . . . . 14 4.3. Function Components . . . . . . . . . . . . . . . . . . . 14
4.4. Existing Works Mapped in the Framework . . . . . . . . . 16 4.4. Existing Works Mapped in the Framework . . . . . . . . . 16
5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 17 5. Evolution of Network Telemetry . . . . . . . . . . . . . . . 17
6. Security Considerations . . . . . . . . . . . . . . . . . . . 18 6. Security Considerations . . . . . . . . . . . . . . . . . . . 18
7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 7. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19
8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 19 8. Contributors . . . . . . . . . . . . . . . . . . . . . . . . 19
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 19
10.1. Normative References . . . . . . . . . . . . . . . . . . 19 10.1. Normative References . . . . . . . . . . . . . . . . . . 19
10.2. Informative References . . . . . . . . . . . . . . . . . 20 10.2. Informative References . . . . . . . . . . . . . . . . . 20
Appendix A. A Survey on Existing Network Telemetry Techniques . 23 Appendix A. A Survey on Existing Network Telemetry Techniques . 23
A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 23 A.1. Management Plane Telemetry . . . . . . . . . . . . . . . 23
A.1.1. Requirements and Challenges . . . . . . . . . . . . . 23 A.1.1. Requirements and Challenges . . . . . . . . . . . . . 23
A.1.2. Push Extensions for NETCONF . . . . . . . . . . . . . 23 A.1.2. Push Extensions for NETCONF . . . . . . . . . . . . . 24
A.1.3. gRPC Network Management Interface . . . . . . . . . . 24 A.1.3. gRPC Network Management Interface . . . . . . . . . . 24
A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 24 A.2. Control Plane Telemetry . . . . . . . . . . . . . . . . . 24
A.2.1. Requirements and Challenges . . . . . . . . . . . . . 24 A.2.1. Requirements and Challenges . . . . . . . . . . . . . 24
A.2.2. BGP Monitoring Protocol . . . . . . . . . . . . . . . 25 A.2.2. BGP Monitoring Protocol . . . . . . . . . . . . . . . 25
A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 25 A.3. Data Plane Telemetry . . . . . . . . . . . . . . . . . . 25
A.3.1. Requirements and Challenges . . . . . . . . . . . . . 25 A.3.1. Requirements and Challenges . . . . . . . . . . . . . 26
A.3.2. Technique Taxonomy . . . . . . . . . . . . . . . . . 26 A.3.2. Technique Taxonomy . . . . . . . . . . . . . . . . . 26
A.3.3. The IPFPM technology . . . . . . . . . . . . . . . . 27 A.3.3. The IPFPM technology . . . . . . . . . . . . . . . . 27
A.3.4. Dynamic Network Probe . . . . . . . . . . . . . . . . 28 A.3.4. Dynamic Network Probe . . . . . . . . . . . . . . . . 29
A.3.5. IP Flow Information Export (IPFIX) protocol . . . . . 29 A.3.5. IP Flow Information Export (IPFIX) protocol . . . . . 29
A.3.6. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 29 A.3.6. In-Situ OAM . . . . . . . . . . . . . . . . . . . . . 29
A.3.7. Postcard Based Telemetry . . . . . . . . . . . . . . 29 A.3.7. Postcard Based Telemetry . . . . . . . . . . . . . . 30
A.4. External Data and Event Telemetry . . . . . . . . . . . . 29 A.4. External Data and Event Telemetry . . . . . . . . . . . . 30
A.4.1. Requirements and Challenges . . . . . . . . . . . . . 30 A.4.1. Requirements and Challenges . . . . . . . . . . . . . 30
A.4.2. Sources of External Events . . . . . . . . . . . . . 30 A.4.2. Sources of External Events . . . . . . . . . . . . . 31
A.4.3. Connectors and Interfaces . . . . . . . . . . . . . . 32 A.4.3. Connectors and Interfaces . . . . . . . . . . . . . . 32
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 32 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 32
1. Introduction 1. Introduction
Network visibility is essential for network operation. Network Network visibility is essential for network operation. Network
telemetry has been widely considered as an ideal mean to gain telemetry has been considered as an ideal means to gain sufficient
sufficient network visibility with better flexibility, scalability, network visibility with better flexibility, scalability, accuracy,
accuracy, coverage, and performance than conventional OAM coverage, and performance than conventional OAM technologies.
technologies. However, confusion and misunderstandings about the However, network telemetry is a vague term. The scope and coverage
network telemetry remain (e.g., the scope and coverage of the term). of it cause confusion and misunderstandings. It is beneficial to
We need an unambiguous concept and a clear architectural framework have an unambiguous concept and a clear architectural framework for
for network telemetry so we can better align the related technology network telemetry, so we can better align the related technology and
and standard work. standard work.
First, we show some key characteristics of network telemetry which First, we show some key characteristics of network telemetry which
set a clear distinction from the conventional network OAM and show set a clear distinction from the conventional network OAM and show
that some conventional OAM technologies can be considered a subset of that some conventional OAM technologies can be considered a subset of
the network telemetry technologies. We then provide an architectural the network telemetry technologies. We then provide an architectural
framework for network telemetry to meet the current and future framework for network telemetry to meet the current and future
network operation requirements. Following the framework, we classify network operation requirements. Following the framework, we classify
the components of a network telemetry system so we can easily map the the components of a network telemetry system so we can easily map the
existing and emerging techniques and protocols into the framework. existing and emerging techniques and protocols into the framework.
Finally, we outline a roadmap for the evolution of the network Finally, we outline a roadmap for the evolution of the network
skipping to change at page 4, line 8 skipping to change at page 4, line 8
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in BCP "OPTIONAL" in this document are to be interpreted as described in BCP
14 [RFC2119][RFC8174] when, and only when, they appear in all 14 [RFC2119][RFC8174] when, and only when, they appear in all
capitals, as shown here. capitals, as shown here.
2. Motivation 2. Motivation
Thanks to the advance of the computing and storage technologies, Thanks to the advance of the computing and storage technologies,
today's big data analytics and machine learning-based Artificial today's big data analytics gives network operators an unprecedented
Intelligence (AI) give network operators an unprecedented opportunity opportunity to gain network insights and move towards network
to gain network insights and move towards network autonomy. Software autonomy. Some operators start to explore the application of
Artificial Intelligence (AI) to make sense of network data. Software
tools can use the network data to detect and react to network faults, tools can use the network data to detect and react to network faults,
anomalies, and policy violations, as well as predict future anomalies, and policy violations, as well as predict future
events. In turn, the network policy updates for planning, intrusion events. In turn, the network policy updates for planning, intrusion
prevention, optimization, and self-healing may be applied. prevention, optimization, and self-healing may be applied.
It is conceivable that an intent-driven autonomous network is the It is conceivable that an intent-driven autonomic network [RFC7575]
logical next step for network evolution following Software Defined is the logical next step for network evolution following Software
Network (SDN), aiming to reduce (or even eliminate) human labor, make Defined Network (SDN), aiming to reduce (or even eliminate) human
the most efficient usage of network resources, and provide better labor, make the most efficient usage of network resources, and
services more aligned with customer requirements. Although it takes provide better services more aligned with customer requirements.
time to reach the ultimate goal, the journey has started Although it takes time to reach the ultimate goal, the journey has
nevertheless. started nevertheless.
However, the system bottleneck is shifting from data consumption to However, while the data processing capability is improved and
data supply. Both the number of network nodes and the traffic applications are hungry for more data, the networks lag behind in
extracting and translating network data into useful and actionable
information. The system bottleneck is shifting from data consumption
to data supply. Both the number of network nodes and the traffic
bandwidth keep increasing at a fast pace. The network configuration bandwidth keep increasing at a fast pace. The network configuration
and policy change at a much smaller time slot than ever before. More and policy change at a much smaller time slot than ever before. More
subtle events and fine-grained data through all network planes need subtle events and fine-grained data through all network planes need
to be captured and exported in real time. In a nutshell, it is a to be captured and exported in real time. In a nutshell, it is a
challenge to get enough high-quality data out of the network efficiently, challenge to get enough high-quality data out of the network efficiently,
promptly, and flexibly. Therefore, we need to examine the existing promptly, and flexibly. Therefore, we need to examine the existing
network technologies and protocols, and identify any potential network technologies and protocols, and identify any potential
technique and standard gaps based on the real network and device technique and standard gaps based on the real network and device
architectures. architectures.
skipping to change at page 4, line 47 skipping to change at page 4, line 51
cases for today's and future network operations. Next, we show why cases for today's and future network operations. Next, we show why
the current network OAM techniques and protocols are insufficient for the current network OAM techniques and protocols are insufficient for
these use cases. The discussion underlines the need for new methods, these use cases. The discussion underlines the need for new methods,
techniques, and protocols which we may assign under an umbrella term techniques, and protocols which we may assign under an umbrella term
- network telemetry. - network telemetry.
2.1. Use Cases 2.1. Use Cases
These use cases are essential for network operations. While the list These use cases are essential for network operations. While the list
is by no means exhaustive, it is enough to highlight the requirements is by no means exhaustive, it is enough to highlight the requirements
for data velocity, variety, and volume in networks. for data velocity, variety, volume, and veracity in networks.
Policy and Intent Compliance: Network policies are the rules that Policy and Intent Compliance: Network policies are the rules that
constrain the services for network access, provide service constrain the services for network access, provide service
differentiation, or enforce specific treatment on the traffic. differentiation, or enforce specific treatment on the traffic.
For example, a service function chain is a policy that requires For example, a service function chain is a policy that requires
the selected flows to pass through a set of ordered network the selected flows to pass through a set of ordered network
functions. An intents is a high-level abstract policy which functions. An intent is a high-level abstract policy which
requires a complex translation and mapping process before being requires a complex translation and mapping process before being
applied on networks. While a policy is enforced, the compliance applied on networks. While a policy is enforced, the compliance
needs to be verified and monitored continuously. needs to be verified and monitored continuously.
SLA Compliance: A Service-Level Agreement (SLA) defines the level of SLA Compliance: A Service-Level Agreement (SLA) defines the level of
service a user expects from a network operator, which includes the service a user expects from a network operator, which includes the
metrics for the service measurement and remedy/penalty procedures metrics for the service measurement and remedy/penalty procedures
when the service level misses the agreement. Users need to check when the service level misses the agreement. Users need to check
if they get the service as promised and network operators need to if they get the service as promised and network operators need to
evaluate how they can deliver the services that can meet the SLA. evaluate how they can deliver the services that can meet the SLA.
skipping to change at page 6, line 25 skipping to change at page 6, line 32
o Comprehensive data is needed from packet processing engine to o Comprehensive data is needed from packet processing engine to
traffic manager, from line cards to main control board, from user traffic manager, from line cards to main control board, from user
flows to control protocol packets, from device configurations to flows to control protocol packets, from device configurations to
operations, and from physical layer to application layer. operations, and from physical layer to application layer.
Conventional OAM only covers a narrow range of data (e.g., SNMP Conventional OAM only covers a narrow range of data (e.g., SNMP
only handles data from the Management Information Base (MIB)). only handles data from the Management Information Base (MIB)).
Traditional network devices cannot provide all the necessary Traditional network devices cannot provide all the necessary
probes. An open and programmable network device is therefore probes. An open and programmable network device is therefore
needed. needed.
o Many application scenarios need to correlate data from multiple o Many application scenarios need to correlate network-wide data
sources (i.e., from distributed network devices, different from multiple sources (i.e., from distributed network devices,
components of a network device, or different network planes). A different components of a network device, or different network
piecemeal solution is often lacking the capability to consolidate planes). A piecemeal solution is often lacking the capability to
the data from multiple sources. The composition of a complete consolidate the data from multiple sources. The composition of a
solution, as partly proposed by Autonomic Resource Control complete solution, as partly proposed by Autonomic Resource
Architecture (ARCA) [I-D.pedro-nmrg-anticipated-adaptation], will Control Architecture (ARCA)
be empowered and guided by a comprehensive framework. [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and
guided by a comprehensive framework.
o Some of the conventional OAM techniques (e.g., CLI and Syslog) o Some of the conventional OAM techniques (e.g., CLI and Syslog)
lack a formal data model. The unstructured data hinder the tool lack a formal data model. The unstructured data hinder the tool
automation and application extensibility. Standardized data automation and application extensibility. Standardized data
models are essential to support the programmable networks. models are essential to support the programmable networks.
o Although some conventional OAM techniques support data push (e.g., o Although some conventional OAM techniques support data push (e.g.,
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data
are limited to only predefined management plane warnings (e.g., are limited to only predefined management plane warnings (e.g.,
SNMP Trap) or sampled user packets (e.g., sFlow). We require the SNMP Trap) or sampled user packets (e.g., sFlow). We require the
skipping to change at page 7, line 24 skipping to change at page 7, line 31
BMP: BGP Monitoring Protocol BMP: BGP Monitoring Protocol
DNP: Dynamic Network Probe DNP: Dynamic Network Probe
DPI: Deep Packet Inspection DPI: Deep Packet Inspection
gNMI: gRPC Network Management Interface gNMI: gRPC Network Management Interface
gRPC: gRPC Remote Procedure Call gRPC: gRPC Remote Procedure Call
IDN: Intent-Driven Network
IPFIX: IP Flow Information Export Protocol IPFIX: IP Flow Information Export Protocol
IPFPM: IP Flow Performance Measurement IPFPM: IP Flow Performance Measurement
IOAM: In-situ OAM IOAM: In-situ OAM
NETCONF: Network Configuration Protocol NETCONF: Network Configuration Protocol
Network Telemetry: Acquiring network data remotely for network Network Telemetry: Acquiring network data remotely for network
monitoring and operation. A general term for a large set of monitoring and operation. A general term for a large set of
skipping to change at page 7, line 49 skipping to change at page 8, line 5
evolution toward intent-driven autonomous networks. evolution toward intent-driven autonomous networks.
NMS: Network Management System NMS: Network Management System
OAM: Operations, Administration, and Maintenance. A group of OAM: Operations, Administration, and Maintenance. A group of
network management functions that provide network fault network management functions that provide network fault
indication, fault localization, performance information, and data indication, fault localization, performance information, and data
and diagnosis functions. Most conventional network monitoring and diagnosis functions. Most conventional network monitoring
techniques and protocols belong to network OAM. techniques and protocols belong to network OAM.
PBT: Postcard-Based Telemetry
SNMP: Simple Network Management Protocol SNMP: Simple Network Management Protocol
YANG: A data modeling language for NETCONF YANG: A data modeling language for NETCONF
YANG FSM: A YANG model to define device side finite state machine YANG FSM: A YANG model to define device side finite state machine
YANG PUSH: A method to subscribe pushed data from remote YANG YANG PUSH: A method to subscribe pushed data from remote YANG
datastore datastore
2.4. Network Telemetry 2.4. Network Telemetry
Network telemetry has emerged as a mainstream technical term to refer Network telemetry has emerged as a mainstream technical term to refer
to the newer data collection and consumption techniques, to the newer data collection and consumption techniques,
distinguishing itself from the conventional techniques for network OAM. distinguishing itself from the conventional techniques for network OAM.
skipping to change at page 8, line 24 skipping to change at page 8, line 31
The representative techniques and protocols include IPFIX [RFC7011] The representative techniques and protocols include IPFIX [RFC7011]
and gRPC [I-D.kumar-rtgwg-grpc-protocol]. Network telemetry allows and gRPC [I-D.kumar-rtgwg-grpc-protocol]. Network telemetry allows
separate entities to acquire data from network devices so that data separate entities to acquire data from network devices so that data
can be visualized and analyzed to support network monitoring and can be visualized and analyzed to support network monitoring and
operation. Network telemetry overlaps with the conventional network operation. Network telemetry overlaps with the conventional network
OAM and has a wider scope than it. It is expected that network OAM and has a wider scope than it. It is expected that network
telemetry can provide the necessary network insight for autonomous telemetry can provide the necessary network insight for autonomous
networks and address the shortcomings of conventional OAM techniques. networks and address the shortcomings of conventional OAM techniques.
One difference between the network telemetry and the network OAM is One difference between the network telemetry and the network OAM is
that the network telemetry assumes machines as data consumer, while that the network telemetry assumes machines as data consumer rather
the conventional network OAM usually assumes human operators. Hence, than human operators. Hence, the network telemetry can directly
the network telemetry can directly trigger the automated network trigger the automated network operation, while the conventional OAM
operation, but the conventional OAM tools only help human operators tools usually help human operators to monitor and diagnose the
to monitor and diagnose the networks and guide manual network networks and guide manual network operations. The difference leads
operations. The difference leads to very different techniques. to very different techniques.
Although the network telemetry techniques are just emerging and Although the network telemetry techniques are just emerging and
subject to continuous evolution, several characteristics of network subject to continuous evolution, several characteristics of network
telemetry have been well accepted (Note that network telemetry is telemetry have been well accepted (Note that network telemetry is
intended to be an umbrella term covering a wide spectrum of intended to be an umbrella term covering a wide spectrum of
techniques, so the following characteristics are not expected to be techniques, so the following characteristics are not expected to be
held by every specific technique): held by every specific technique):
o Push and Streaming: Instead of polling data from network devices, o Push and Streaming: Instead of polling data from network devices,
the telemetry collector subscribes to the streaming data pushed the telemetry collector subscribes to the streaming data pushed
from data sources in network devices. from data sources in network devices.
o Volume and Velocity: The telemetry data is intended to be consumed o Volume and Velocity: The telemetry data is intended to be consumed
by machine rather than by a human. Therefore, the data volume is by machines rather than by human being. Therefore, the data
huge and the processing is often in realtime. volume is huge and the processing is often in realtime.
o Normalization and Unification: Telemetry aims to address the o Normalization and Unification: Telemetry aims to address the
overall network automation needs. The piecemeal solutions offered overall network automation needs. The piecemeal solutions offered
by the conventional OAM approach are no longer suitable. Efforts by the conventional OAM approach are no longer suitable. Efforts
need to be made to normalize the data representation and unify the need to be made to normalize the data representation and unify the
protocols. protocols.
o Model-based: The telemetry data is modeled in advance which allows o Model-based: The telemetry data is modeled in advance which allows
applications to configure and consume data with ease. applications to configure and consume data with ease.
skipping to change at page 11, line 27 skipping to change at page 11, line 33
devices. It is usually used in a more interactive environment. The devices. It is usually used in a more interactive environment. The
queried data may be directly extracted from some specific data queried data may be directly extracted from some specific data
source, or synthesized and processed from raw data. source, or synthesized and processed from raw data.
There are four types of data from network devices: There are four types of data from network devices:
Simple Data: The data that are steadily available from some data Simple Data: The data that are steadily available from some data
store or static probes in network devices. Such data can be store or static probes in network devices. Such data can be
specified by a YANG model. specified by a YANG model.
Custom Data: The data need to be synthesized or processed from raw Complex Data: The data need to be synthesized or processed from raw
data from one or more network devices. The data processing data from one or more network devices. The data processing
function can be statically or dynamically loaded into network function can be statically or dynamically loaded into network
devices. devices.
Event-triggered Data: The data are conditionally acquired based on Event-triggered Data: The data are conditionally acquired based on
the occurrence of some event. An event can be modeled as a Finite the occurrence of some event. An event can be modeled as a Finite
State Machine (FSM). State Machine (FSM).
Streaming Data: The data are continuously or periodically generated. Streaming Data: The data are continuously or periodically generated.
It can be time series or the dump of databases. The streaming It can be time series or the dump of databases. The streaming
data reflect realtime network states and metrics and require large data reflect realtime network states and metrics and require large
bandwidth and processing power. bandwidth and processing power.
The above data types are not mutually exclusive. For example, event- The above data types are not mutually exclusive. For example, event-
triggered data can be simple or custom, and streaming data can be triggered data can be simple or complex, and streaming data can be
event triggered. The relationships of these data types are event triggered. The relationships of these data types are
illustrated in Figure 1. illustrated in Figure 1.
+--------------------------+ +--------------------------+
| +----------------------+ | | +----------------------+ |
| | +-----------------+ | | | | +-----------------+ | |
| | | +-------------+ | | | | | | +-------------+ | | |
| | | | Simple Data | | | | | | | | Simple Data | | | |
| | | +-------------+ | | | | | | +-------------+ | | |
| | | Custom Data | | | | | | Complex Data | | |
| | +-----------------+ | | | | +-----------------+ | |
| | Event-triggered Data | | | | Event-triggered Data | |
| +----------------------+ | | +----------------------+ |
| Streaming Data | | Streaming Data |
+--------------------------+ +--------------------------+
Figure 1: Data Type Relationship Figure 1: Data Type Relationship
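The definition of event-triggered data above notes that an event can be modeled as a Finite State Machine. As a rough illustration only, the following Python sketch models a two-state FSM that emits a record when a monitored value crosses a threshold. The class name, the state names, the thresholds, and the record layout are all hypothetical and are not taken from either draft version.

```python
from enum import Enum


class State(Enum):
    NORMAL = "normal"
    CONGESTED = "congested"


class QueueDepthEvent:
    """Two-state FSM; a record is produced only on a state transition."""

    def __init__(self, high: int, low: int) -> None:
        self.high = high  # hypothetical threshold for entering CONGESTED
        self.low = low    # hypothetical threshold for returning to NORMAL
        self.state = State.NORMAL

    def observe(self, depth: int):
        """Feed one sample; return an event record on a transition, else None."""
        if self.state is State.NORMAL and depth >= self.high:
            self.state = State.CONGESTED
            return {"event": "congestion-start", "depth": depth}
        if self.state is State.CONGESTED and depth <= self.low:
            self.state = State.NORMAL
            return {"event": "congestion-end", "depth": depth}
        return None


fsm = QueueDepthEvent(high=80, low=20)
samples = [10, 50, 85, 90, 60, 15, 5]
# Only the two threshold crossings yield records; other samples are dropped.
events = [e for e in (fsm.observe(d) for d in samples) if e is not None]
```

Using two separate thresholds is an illustrative design choice that keeps the FSM from flapping when the value hovers near a single limit; the drafts do not prescribe any particular FSM shape.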
Subscription usually deals with event-triggered data and streaming Subscription usually deals with event-triggered data and streaming
data, and query usually deals with simple data and custom data. It data, and query usually deals with simple data and complex data. It
is easy to see that conventional OAM techniques are mostly about is easy to see that conventional OAM techniques are mostly about
querying simple data only. While these techniques are still useful, querying simple data only. While these techniques are still useful,
advanced network telemetry techniques pay more attention to the other advanced network telemetry techniques pay more attention to the other
three data types, and prefer event/streaming data subscription and three data types, and prefer event/streaming data subscription and
custom data query over simple data query. complex data query over simple data query.
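Because the four data types are not mutually exclusive, one way to picture them is as combinable flags. The Python sketch below is only an illustration under that assumption: the names DataType and preferred_access and the example items are hypothetical, and the mapping simply restates the "usually" rule from the paragraph above (subscription for event-triggered and streaming data, query for simple and complex data, the latter called custom data in the older version).

```python
from enum import Flag, auto


class DataType(Flag):
    """The four data types from the text; the numeric values are arbitrary."""
    SIMPLE = auto()
    COMPLEX = auto()  # named "Custom Data" in the -00 version
    EVENT_TRIGGERED = auto()
    STREAMING = auto()


def preferred_access(kind: DataType) -> str:
    """Return 'subscription' for event-triggered or streaming data, else 'query'."""
    if kind & (DataType.EVENT_TRIGGERED | DataType.STREAMING):
        return "subscription"
    return "query"


# Event-triggered data can be simple or complex.
link_down_counter = DataType.EVENT_TRIGGERED | DataType.SIMPLE
# Streaming data can itself be event triggered.
congestion_stream = (
    DataType.STREAMING | DataType.EVENT_TRIGGERED | DataType.COMPLEX
)
# Plain simple data, the typical target of conventional OAM queries.
interface_mtu = DataType.SIMPLE
```

With these hypothetical items, preferred_access(interface_mtu) selects a query, while the two combined items select a subscription.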
4.2. Data Objects 4.2. Data Objects
Telemetry can be applied on the forwarding plane, the control plane, Telemetry can be applied on the forwarding plane, the control plane,
and the management plane in a network, as well as other sources out and the management plane in a network, as well as other sources out
of the network, as shown in Figure 2. Therefore, we categorize the of the network, as shown in Figure 2. Therefore, we categorize the
network telemetry into four distinct modules. network telemetry into four distinct modules.
+------------------------------+ +------------------------------+
| | | |
skipping to change at page 14, line 11 skipping to change at page 14, line 11
We summarize the major differences of the four modules in the We summarize the major differences of the four modules in the
following table. Some representative techniques are shown in some following table. Some representative techniques are shown in some
table blocks to highlight the technical diversity of these modules. table blocks to highlight the technical diversity of these modules.
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
| Module | Control | Management | Forwarding | External | | Module | Control | Management | Forwarding | External |
| | Plane | Plane | Plane | Data | | | Plane | Plane | Plane | Data |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Object | control | config. & | flow & packet| terminal, | |Object | control | config. & | flow & packet| terminal, |
| | protocol & | operation | QoS, traffic | social & | | | protocol & | operation | QoS, traffic | social & |
| | signaling, | state, MIB | stat., buffer| environ- | | | signaling, | state, MIB | stat., buffer| environ- |
| | RIB, ACL | | & queue stat.| mental | | | RIB, ACL | | & queue stat.| mental |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Export | main control | main control | fwding chip | various | |Export | main control | main control | fwding chip | various |
|Location | CPU, | CPU | or linecard | | |Location | CPU, | CPU | or linecard | |
| | linecard CPU | | CPU; main | | | | linecard CPU | | CPU; main | |
| | or fwding | | control CPU | | | | or fwding | | control CPU | |
| | chip | | unlikely | | | | chip | | unlikely | |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Model | YANG, | MIB, syslog, | template, | YANG | |Model | YANG, | MIB, syslog, | template, | YANG |
| | custom | YANG, | YANG, | | | | custom | YANG, | YANG, | |
| | | custom | custom | | | | | custom | custom | |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Encoding | GPB, JSON, | GPB, JSON, | plain | GPB, JSON | |Encoding | GPB, JSON, | GPB, JSON, | plain | GPB, JSON |
| | XML, plain | XML | | XML, plain| | | XML, plain | XML | | XML, plain|
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC | |Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror| gRPC |
| | IPFIX,mirror | | | | | | IPFIX,mirror | | | |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
|Transport| HTTP, TCP, | HTTP, TCP | UDP | TCP, UDP | |Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP |
| | UDP | | | | | | UDP | | | UDP |
+---------+--------------+--------------+--------------+-----------+ +---------+--------------+--------------+--------------+-----------+
Figure 3: Layer Category of the Network Telemetry Framework Figure 3: Layer Category of the Network Telemetry Framework
Note that the interaction with the network operation applications can Note that the interaction with the network operation applications can
be indirect. For example, in the management plane telemetry, the be indirect. For example, in the management plane telemetry, the
management plane may need to acquire data from the data plane. Some management plane may need to acquire data from the data plane. Some
of the operational states can only be derived from the data plane of the operational states can only be derived from the data plane
such as the interface status and statistics. For another example, such as the interface status and statistics. For another example,
the control plane telemetry may need to access the FIB in data plane. the control plane telemetry may need to access the FIB in data plane.
skipping to change at page 17, line 11 skipping to change at page 17, line 11
works (mainly published in IETF and with the emphasis on the latest works (mainly published in IETF and with the emphasis on the latest
new technologies) and shows their positions in the framework. The new technologies) and shows their positions in the framework. The
details about the mentioned work can be found in Appendix A. details about the mentioned work can be found in Appendix A.
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| | Query | Subscription | | | Query | Subscription |
| | | | | | | |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Simple Data | SNMP, NETCONF,| | | Simple Data | SNMP, NETCONF,| |
| | YANG, BMP, | | | | YANG, BMP, | |
| | IOAM, PBT | | | | IOAM, PBT,gRPC| |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Custom Data | DNP, YANG FSM | | | Custom Data | DNP, YANG FSM | |
| | gRPC, NETCONF | | | | gRPC, NETCONF | |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Event-triggered | | gRPC, NETCONF, | | Event-triggered | | gRPC, NETCONF, |
| Data | | YANG PUSH, DNP | | Data | | YANG PUSH, DNP |
| | | IOAM, PBT, | | | | IOAM, PBT, |
| | | YANG FSM | | | | YANG FSM |
+-----------------+---------------+----------------+ +-----------------+---------------+----------------+
| Streaming Data | | gRPC, NETCONF, | | Streaming Data | | gRPC, NETCONF, |
skipping to change at page 19, line 40 skipping to change at page 19, line 40
The other major contributors of this document are listed as follows. The other major contributors of this document are listed as follows.
o Tianran Zhou o Tianran Zhou
o Zhenbin Li o Zhenbin Li
o Daniel King o Daniel King
9. Acknowledgments 9. Acknowledgments
We would like to thank Adrian Farrel, Randy Presuhn, Victor Liu, We would like to thank Adrian Farrel, Randy Presuhn, Joe Clarke,
James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan Gu, Parviz Victor Liu, James Guichard, Uri Blumenthal, Giuseppe Fioccola, Yunan
Yegani, Young Lee, Alexander Clemm, Joe Clarke, and many others who Gu, Parviz Yegani, Young Lee, Alexander Clemm, Qin Wu, and many
have provided helpful comments and suggestions to improve this others who have provided helpful comments and suggestions to improve
document. this document.
10. References 10. References
10.1. Normative References 10.1. Normative References
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
Requirement Levels", BCP 14, RFC 2119, Requirement Levels", BCP 14, RFC 2119,
DOI 10.17487/RFC2119, March 1997, DOI 10.17487/RFC2119, March 1997,
<https://www.rfc-editor.org/info/rfc2119>. <https://www.rfc-editor.org/info/rfc2119>.
skipping to change at page 20, line 32 skipping to change at page 20, line 32
[I-D.fioccola-ippm-multipoint-alt-mark] [I-D.fioccola-ippm-multipoint-alt-mark]
Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto,
"Multipoint Alternate Marking method for passive and "Multipoint Alternate Marking method for passive and
hybrid performance monitoring", draft-fioccola-ippm- hybrid performance monitoring", draft-fioccola-ippm-
multipoint-alt-mark-04 (work in progress), June 2018. multipoint-alt-mark-04 (work in progress), June 2018.
[I-D.ietf-grow-bmp-adj-rib-out] [I-D.ietf-grow-bmp-adj-rib-out]
Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S. Evens, T., Bayraktar, S., Lucente, P., Mi, K., and S.
Zhuang, "Support for Adj-RIB-Out in BGP Monitoring Zhuang, "Support for Adj-RIB-Out in BGP Monitoring
Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-04 (work Protocol (BMP)", draft-ietf-grow-bmp-adj-rib-out-05 (work
in progress), March 2019. in progress), June 2019.
[I-D.ietf-grow-bmp-local-rib] [I-D.ietf-grow-bmp-local-rib]
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente,
"Support for Local RIB in BGP Monitoring Protocol (BMP)", "Support for Local RIB in BGP Monitoring Protocol (BMP)",
draft-ietf-grow-bmp-local-rib-03 (work in progress), March draft-ietf-grow-bmp-local-rib-04 (work in progress), June
2019. 2019.
[I-D.ietf-netconf-udp-pub-channel] [I-D.ietf-netconf-udp-pub-channel]
Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication Zheng, G., Zhou, T., and A. Clemm, "UDP based Publication
Channel for Streaming Telemetry", draft-ietf-netconf-udp- Channel for Streaming Telemetry", draft-ietf-netconf-udp-
pub-channel-05 (work in progress), March 2019. pub-channel-05 (work in progress), March 2019.
[I-D.ietf-netconf-yang-push] [I-D.ietf-netconf-yang-push]
Clemm, A., Voit, E., Prieto, A., Tripathy, A., Nilsen- Clemm, A. and E. Voit, "Subscription to YANG Datastores",
Nygaard, E., Bierman, A., and B. Lengyel, "Subscription to draft-ietf-netconf-yang-push-25 (work in progress), May
YANG Datastores", draft-ietf-netconf-yang-push-22 (work in 2019.
progress), February 2019.
[I-D.kumar-rtgwg-grpc-protocol] [I-D.kumar-rtgwg-grpc-protocol]
Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC Kumar, A., Kolhe, J., Ghemawat, S., and L. Ryan, "gRPC
Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in Protocol", draft-kumar-rtgwg-grpc-protocol-00 (work in
progress), July 2016. progress), July 2016.
[I-D.openconfig-rtgwg-gnmi-spec] [I-D.openconfig-rtgwg-gnmi-spec]
Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack, Shakir, R., Shaikh, A., Borman, P., Hines, M., Lebsack,
C., and C. Morrow, "gRPC Network Management Interface C., and C. Morrow, "gRPC Network Management Interface
(gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in (gNMI)", draft-openconfig-rtgwg-gnmi-spec-01 (work in
progress), March 2018. progress), March 2018.
[I-D.pedro-nmrg-anticipated-adaptation] [I-D.pedro-nmrg-anticipated-adaptation]
Martinez-Julia, P., "Exploiting External Event Detectors Martinez-Julia, P., "Exploiting External Event Detectors
to Anticipate Resource Requirements for the Elastic to Anticipate Resource Requirements for the Elastic
Adaptation of SDN/NFV Systems", draft-pedro-nmrg- Adaptation of SDN/NFV Systems", draft-pedro-nmrg-
anticipated-adaptation-02 (work in progress), June 2018. anticipated-adaptation-02 (work in progress), June 2018.
[I-D.song-ippm-postcard-based-telemetry] [I-D.song-ippm-postcard-based-telemetry]
Song, H., Zhou, T., Li, Z., and J. Shin, "Postcard-based Song, H., Zhou, T., Li, Z., and J. Shin, "Postcard-based
In-band Flow Data Telemetry", draft-song-ippm-postcard- On-Path Flow Data Telemetry", draft-song-ippm-postcard-
based-telemetry-02 (work in progress), March 2019. based-telemetry-03 (work in progress), April 2019.
[I-D.song-opsawg-dnp4iq] [I-D.song-opsawg-dnp4iq]
Song, H. and J. Gong, "Requirements for Interactive Query Song, H. and J. Gong, "Requirements for Interactive Query
with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01 with Dynamic Network Probes", draft-song-opsawg-dnp4iq-01
(work in progress), June 2017. (work in progress), June 2017.
[I-D.zhou-netconf-multi-stream-originators] [I-D.zhou-netconf-multi-stream-originators]
Zhou, T., Zheng, G., Voit, E., Clemm, A., and A. Bierman, Zhou, T., Zheng, G., Voit, E., Clemm, A., and A. Bierman,
"Subscription to Multiple Stream Originators", draft-zhou- "Subscription to Multiple Stream Originators", draft-zhou-
netconf-multi-stream-originators-04 (work in progress), netconf-multi-stream-originators-04 (work in progress),
skipping to change at page 22, line 41 skipping to change at page 22, line 41
Weingarten, "An Overview of Operations, Administration, Weingarten, "An Overview of Operations, Administration,
and Maintenance (OAM) Tools", RFC 7276, and Maintenance (OAM) Tools", RFC 7276,
DOI 10.17487/RFC7276, June 2014, DOI 10.17487/RFC7276, June 2014,
<https://www.rfc-editor.org/info/rfc7276>. <https://www.rfc-editor.org/info/rfc7276>.
[RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext [RFC7540] Belshe, M., Peon, R., and M. Thomson, Ed., "Hypertext
Transfer Protocol Version 2 (HTTP/2)", RFC 7540, Transfer Protocol Version 2 (HTTP/2)", RFC 7540,
DOI 10.17487/RFC7540, May 2015, DOI 10.17487/RFC7540, May 2015,
<https://www.rfc-editor.org/info/rfc7540>. <https://www.rfc-editor.org/info/rfc7540>.
[RFC7575] Behringer, M., Pritikin, M., Bjarnason, S., Clemm, A.,
Carpenter, B., Jiang, S., and L. Ciavaglia, "Autonomic
Networking: Definitions and Design Goals", RFC 7575,
DOI 10.17487/RFC7575, June 2015,
<https://www.rfc-editor.org/info/rfc7575>.
[RFC7799] Morton, A., "Active and Passive Metrics and Methods (with [RFC7799] Morton, A., "Active and Passive Metrics and Methods (with
Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799,
May 2016, <https://www.rfc-editor.org/info/rfc7799>. May 2016, <https://www.rfc-editor.org/info/rfc7799>.
[RFC7854] Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP [RFC7854] Scudder, J., Ed., Fernando, R., and S. Stuart, "BGP
Monitoring Protocol (BMP)", RFC 7854, Monitoring Protocol (BMP)", RFC 7854,
DOI 10.17487/RFC7854, June 2016, DOI 10.17487/RFC7854, June 2016,
<https://www.rfc-editor.org/info/rfc7854>. <https://www.rfc-editor.org/info/rfc7854>.
[RFC8321] Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli, [RFC8321] Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli,
skipping to change at page 28, line 49 skipping to change at page 29, line 12
detailed monitoring is performed. After the detection and resolution detailed monitoring is performed. After the detection and resolution
of the problem the initial approximate monitoring can be used again. of the problem the initial approximate monitoring can be used again.
A.3.4. Dynamic Network Probe A.3.4. Dynamic Network Probe
Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq] Hardware-based Dynamic Network Probe (DNP) [I-D.song-opsawg-dnp4iq]
provides a programmable means to customize the data that an provides a programmable means to customize the data that an
application collects from the data plane. A direct benefit of DNP is application collects from the data plane. A direct benefit of DNP is
the reduction of the exported data. A full DNP solution covers the reduction of the exported data. A full DNP solution covers
several components including data source, data subscription, and data several components including data source, data subscription, and data
generation. The data subscription needs to define the custom data generation. The data subscription needs to define the complex data
which can be composed and derived from the raw data sources. The which can be composed and derived from the raw data sources. The
data generation takes advantage of the moderate in-network computing data generation takes advantage of the moderate in-network computing
to produce the desired data. to produce the desired data.
While DNP can introduce unforeseeable flexibility to the data plane While DNP can introduce unforeseeable flexibility to the data plane
telemetry, it also faces some challenges. It requires a flexible telemetry, it also faces some challenges. It requires a flexible
data plane that can be dynamically reprogrammed at run-time. The data plane that can be dynamically reprogrammed at run-time. The
programming API is yet to be defined. programming API is yet to be defined.
A.3.5. IP Flow Information Export (IPFIX) protocol A.3.5. IP Flow Information Export (IPFIX) protocol
skipping to change at page 32, line 34 skipping to change at page 33, line 4
In some situations, the interconnection between the external event In some situations, the interconnection between the external event
detectors and the management system is via the management plane. For detectors and the management system is via the management plane. For
those situations there will be a special connector that provides the those situations there will be a special connector that provides the
typical interfaces found in most other elements connected to the typical interfaces found in most other elements connected to the
management plane. For instance, the interfaces will comply with management plane. For instance, the interfaces will comply with
a specific information model (YANG) and specific telemetry protocol, a specific information model (YANG) and specific telemetry protocol,
such as NETCONF, SNMP, or gRPC. such as NETCONF, SNMP, or gRPC.
Authors' Addresses Authors' Addresses
Haoyu Song (editor) Haoyu Song (editor)
Huawei Futurewei
2330 Central Expressway 2330 Central Expressway
Santa Clara Santa Clara
USA USA
Email: haoyu.song@huawei.com Email: hsong@futurewei.com
Zhenqiang Li Fengwei Qin
China Mobile China Mobile
No. 32 Xuanwumenxi Ave., Xicheng District No. 32 Xuanwumenxi Ave., Xicheng District
Beijing, 100032 Beijing, 100032
P.R. China P.R. China
Email: lizhenqiang@chinamobile.com Email: qinfengwei@chinamobile.com
Pedro Martinez-Julia Pedro Martinez-Julia
NICT NICT
4-2-1, Nukui-Kitamachi 4-2-1, Nukui-Kitamachi
Koganei, Tokyo 184-8795 Koganei, Tokyo 184-8795
Japan Japan
Email: pedro@nict.go.jp Email: pedro@nict.go.jp
Laurent Ciavaglia Laurent Ciavaglia
Nokia Nokia
 End of changes. 45 change blocks. 
93 lines changed or deleted 104 lines changed or added
