draft-ietf-opsawg-ntf-08.txt | draft-ietf-opsawg-ntf-09.txt | |||
---|---|---|---|---|
OPSAWG H. Song | OPSAWG H. Song | |||
Internet-Draft Futurewei | Internet-Draft Futurewei | |||
Intended status: Informational F. Qin | Intended status: Informational F. Qin | |||
Expires: 10 April 2022 China Mobile | Expires: 16 April 2022 China Mobile | |||
P. Martinez-Julia | P. Martinez-Julia | |||
NICT | NICT | |||
L. Ciavaglia | L. Ciavaglia | |||
Nokia | Nokia | |||
A. Wang | A. Wang | |||
China Telecom | China Telecom | |||
7 October 2021 | 13 October 2021 | |||
Network Telemetry Framework | Network Telemetry Framework | |||
draft-ietf-opsawg-ntf-08 | draft-ietf-opsawg-ntf-09 | |||
Abstract | Abstract | |||
Network telemetry is a technology for gaining network insight and | Network telemetry is a technology for gaining network insight and | |||
facilitating efficient and automated network management. It | facilitating efficient and automated network management. It | |||
encompasses various techniques for remote data generation, | encompasses various techniques for remote data generation, | |||
collection, correlation, and consumption. This document describes an | collection, correlation, and consumption. This document describes an | |||
architectural framework for network telemetry, motivated by | architectural framework for network telemetry, motivated by | |||
challenges that are encountered as part of the operation of networks | challenges that are encountered as part of the operation of networks | |||
and by the requirements that ensue. This document clarifies the | and by the requirements that ensue. This document clarifies the | |||
terminologies and classifies the modules and components of a network | terminologies and classifies the modules and components of a network | |||
telemetry system from several different perspectives. The framework | telemetry system from different perspectives. The framework and | |||
and taxonomy help to set a common ground for the collection of | taxonomy help to set a common ground for the collection of related | |||
related work and provide guidance for related technique and standard | work and provide guidance for related technique and standard | |||
developments. | developments. | |||
Status of This Memo | Status of This Memo | |||
This Internet-Draft is submitted in full conformance with the | This Internet-Draft is submitted in full conformance with the | |||
provisions of BCP 78 and BCP 79. | provisions of BCP 78 and BCP 79. | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at https://datatracker.ietf.org/drafts/current/. | Drafts is at https://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on 10 April 2022. | This Internet-Draft will expire on 16 April 2022. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2021 IETF Trust and the persons identified as the | Copyright (c) 2021 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents (https://trustee.ietf.org/ | Provisions Relating to IETF Documents (https://trustee.ietf.org/ | |||
license-info) in effect on the date of publication of this document. | license-info) in effect on the date of publication of this document. | |||
Please review these documents carefully, as they describe your rights | Please review these documents carefully, as they describe your rights | |||
skipping to change at page 3, line 48 ¶ | skipping to change at page 3, line 48 ¶ | |||
techniques and standard works. | techniques and standard works. | |||
To fulfill such an undertaking, we first discuss some key | To fulfill such an undertaking, we first discuss some key | |||
characteristics of network telemetry which set a clear distinction | characteristics of network telemetry which set a clear distinction | |||
from the conventional network OAM and show that some conventional OAM | from the conventional network OAM and show that some conventional OAM | |||
technologies can be considered a subset of the network telemetry | technologies can be considered a subset of the network telemetry | |||
technologies. We then provide an architectural framework for network | technologies. We then provide an architectural framework for network | |||
telemetry which includes four modules, each concerned with a | telemetry which includes four modules, each concerned with a | |||
different category of telemetry data and corresponding procedures. | different category of telemetry data and corresponding procedures. | |||
All the modules are internally structured in the same way, including | All the modules are internally structured in the same way, including | |||
components that allow to configure data sources with regards to what | components that allow to configure data sources in regard to what | |||
data to generate and how to make that available to client | data to generate and how to make that available to client | |||
applications, components that instrument the underlying data sources, | applications, components that instrument the underlying data sources, | |||
and components that perform the actual rendering, encoding, and | and components that perform the actual rendering, encoding, and | |||
exporting of the generated data. We show how the network telemetry | exporting of the generated data. We show how the network telemetry | |||
framework can benefit the current and future network operations. | framework can benefit the current and future network operations. | |||
Based on the distinction of modules and function components, we can | Based on the distinction of modules and function components, we can | |||
map the existing and emerging techniques and protocols into the | map the existing and emerging techniques and protocols into the | |||
framework. The framework can also simplify the tasks for designing, | framework. The framework can also simplify the tasks for designing, | |||
maintaining, and understanding a network telemetry system. Finally, | maintaining, and understanding a network telemetry system. Finally, | |||
we outline the evolution stages of the network telemetry system and | we outline the evolution stages of the network telemetry system and | |||
skipping to change at page 4, line 49 ¶ | skipping to change at page 4, line 49 ¶ | |||
DPI: Deep Packet Inspection, referring to the techniques that | DPI: Deep Packet Inspection, referring to the techniques that | |||
examine packets beyond the L3/L4 headers. | examine packets beyond the L3/L4 headers. | |||
gNMI: gRPC Network Management Interface, a network management | gNMI: gRPC Network Management Interface, a network management | |||
protocol from OpenConfig Operator Working Group, mainly | protocol from OpenConfig Operator Working Group, mainly | |||
contributed by Google. See [gnmi] for details. | contributed by Google. See [gnmi] for details. | |||
GPB: Google Protocol Buffer, an extensible mechanism for serializing | GPB: Google Protocol Buffer, an extensible mechanism for serializing | |||
structured data. | structured data. | |||
gRPC: gRPC Remote Procedure Call, a open source high performance RPC | gRPC: gRPC Remote Procedure Call, an open source high performance | |||
framework that gNMI is based on. See [grpc] for details. | RPC framework that gNMI is based on. See [grpc] for details. | |||
IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | IPFIX: IP Flow Information Export Protocol, specified in [RFC7011]. | |||
IOAM: In-situ OAM, a dataplane on-path telemetry technique. | IOAM: In-situ OAM, a dataplane on-path telemetry technique. | |||
JSON: An open standard file format and data interchange format that | JSON: An open standard file format and data interchange format that | |||
uses human-readable text to store and transmit data objects. | uses human-readable text to store and transmit data objects. | |||
MIB: Management Information Base, a database used for managing the | MIB: Management Information Base, a database used for managing the | |||
entities in a network. | entities in a network. | |||
skipping to change at page 8, line 6 ¶ | skipping to change at page 8, line 6 ¶ | |||
While the list is by no means exhaustive, it is enough to highlight | While the list is by no means exhaustive, it is enough to highlight | |||
the requirements for data velocity, variety, volume, and veracity in | the requirements for data velocity, variety, volume, and veracity in | |||
networks. | networks. | |||
* Security: Network intrusion detection and prevention systems need | * Security: Network intrusion detection and prevention systems need | |||
to monitor network traffic and activities and act upon anomalies. | to monitor network traffic and activities and act upon anomalies. | |||
Given increasingly sophisticated attack vectors coupled with | Given increasingly sophisticated attack vectors coupled with | |||
increasingly severe consequences of security breaches, new tools | increasingly severe consequences of security breaches, new tools | |||
and techniques need to be developed, relying on wider and deeper | and techniques need to be developed, relying on wider and deeper | |||
visibility into networks. The ultimate goal is to achieve the | visibility into networks. The ultimate goal is to achieve the | |||
ideal security with no or minimal human intervention. | ideal security with no, or only minimal, human intervention. | |||
* Policy and Intent Compliance: Network policies are the rules that | * Policy and Intent Compliance: Network policies are the rules that | |||
constrain the services for network access, provide service | constrain the services for network access, provide service | |||
differentiation, or enforce specific treatment on the traffic. | differentiation, or enforce specific treatment on the traffic. | |||
For example, a service function chain is a policy that requires | For example, a service function chain is a policy that requires | |||
the selected flows to pass through a set of ordered network | the selected flows to pass through a set of ordered network | |||
functions. Intent, as defined in | functions. Intent, as defined in | |||
[I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational | [I-D.irtf-nmrg-ibn-concepts-definitions], is a set of operational | |||
goals that a network should meet and outcomes that a network is | goals that a network should meet and outcomes that a network is | |||
supposed to deliver, defined in a declarative manner without | supposed to deliver, defined in a declarative manner without | |||
specifying how to achieve or implement them. An intent requires a | specifying how to achieve or implement them. An intent requires a | |||
complex translation and mapping process before being applied on | complex translation and mapping process before being applied on | |||
networks. While a policy or an intent is enforced, the compliance | networks. While a policy or intent is enforced, the compliance | |||
needs to be verified and monitored continuously relying on | needs to be verified and monitored continuously by relying on | |||
visibility that is provided through network telemetry data, any | visibility that is provided through network telemetry data. Any | |||
violation needs to be reported immediately, and updates need to be | violation must be notified immediately, potentially resulting in | |||
applied to ensure the intent remains in force. | updates to how the policy or intent is applied in the network to | |||
ensure that it remains in force, or otherwise alerting the network | ||||
administrator to the policy or intent violation. | ||||
* SLA Compliance: A Service-Level Agreement (SLA) defines the level | * SLA Compliance: A Service-Level Agreement (SLA) defines the level | |||
of service a user expects from a network operator, which includes | of service a user expects from a network operator, which includes | |||
the metrics for the service measurement and remedy/penalty | the metrics for the service measurement and remedy/penalty | |||
procedures when the service level misses the agreement. Users | procedures when the service level misses the agreement. Users | |||
need to check if they get the service as promised and network | need to check if they get the service as promised and network | |||
operators need to evaluate how they can deliver the services that | operators need to evaluate how they can deliver the services that | |||
can meet the SLA based on realtime network telemetry data, | can meet the SLA based on realtime network telemetry data, | |||
including data from network measurements. | including data from network measurements. | |||
* Root Cause Analysis: Any network failure can be the effect of a | * Root Cause Analysis: Any network failure can be the effect of a | |||
sequence of chained events. Troubleshooting and recovery require | sequence of chained events. Troubleshooting and recovery require | |||
quick identification of the root cause of any observable issues. | quick identification of the root cause of any observable issues. | |||
However, the root cause is not always straightforward to identify, | However, the root cause is not always straightforward to identify, | |||
especially when the failure is sporadic and the number of event | especially when the failure is sporadic and the number of event | |||
messages, both related and unrelated to the same cause, is | messages, both related and unrelated to the same cause, is | |||
overwhelming. While machine learning technologies can be used for | overwhelming. While machine learning technologies can be used for | |||
root cause analysis, it is up to the network to sense and provide the | root cause analysis, it is up to the network to sense and provide the | |||
relevant diagnostic data which are either actively fed into or | relevant diagnostic data which are either actively fed into, or | |||
passively retrieved by machine learning applications. | passively retrieved by, machine learning applications. | |||
* Network Optimization: This covers all short-term and long-term | * Network Optimization: This covers all short-term and long-term | |||
network optimization techniques, including load balancing, Traffic | network optimization techniques, including load balancing, Traffic | |||
Engineering (TE), and network planning. Network operators are | Engineering (TE), and network planning. Network operators are | |||
motivated to optimize their network utilization and differentiate | motivated to optimize their network utilization and differentiate | |||
services for better Return On Investment (ROI) or lower Capital | services for better Return On Investment (ROI) or lower Capital | |||
Expenditures (CAPEX). The first step is to know the real-time | Expenditures (CAPEX). The first step is to know the real-time | |||
network conditions before applying policies for traffic | network conditions before applying policies for traffic | |||
manipulation. In some cases, micro-bursts need to be detected in | manipulation. In some cases, micro-bursts need to be detected in | |||
a very short time-frame so that fine-grained traffic control can | a very short time-frame so that fine-grained traffic control can | |||
skipping to change at page 10, line 15 ¶ | skipping to change at page 10, line 9 ¶ | |||
* Many application scenarios need to correlate network-wide data | * Many application scenarios need to correlate network-wide data | |||
from multiple sources (i.e., from distributed network devices, | from multiple sources (i.e., from distributed network devices, | |||
different components of a network device, or different network | different components of a network device, or different network | |||
planes). A piecemeal solution is often lacking the capability to | planes). A piecemeal solution is often lacking the capability to | |||
consolidate the data from multiple sources. The composition of a | consolidate the data from multiple sources. The composition of a | |||
complete solution, as partly proposed by Autonomic Resource | complete solution, as partly proposed by Autonomic Resource | |||
Control Architecture (ARCA) | Control Architecture (ARCA) | |||
[I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | [I-D.pedro-nmrg-anticipated-adaptation], will be empowered and | |||
guided by a comprehensive framework. | guided by a comprehensive framework. | |||
* Some of the conventional OAM techniques (e.g., CLI and Syslog) | * Some conventional OAM techniques (e.g., CLI and Syslog) lack a | |||
lack a formal data model. The unstructured data hinder the tool | formal data model. The unstructured data hinder the tool | |||
automation and application extensibility. Standardized data | automation and application extensibility. Standardized data | |||
models are essential to support the programmable networks. | models are essential to support the programmable networks. | |||
* Although some conventional OAM techniques support data push (e.g., | * Although some conventional OAM techniques support data push (e.g., | |||
SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data | SNMP Trap [RFC2981][RFC3877], Syslog, and sFlow), the pushed data | |||
are limited to only predefined management plane warnings (e.g., | are limited to only predefined management plane warnings (e.g., | |||
SNMP Trap) or sampled user packets (e.g., sFlow). Network | SNMP Trap) or sampled user packets (e.g., sFlow). Network | |||
operators require the data with arbitrary source, granularity, and | operators require the data with arbitrary source, granularity, and | |||
precision which are beyond the capability of the existing | precision which are beyond the capability of the existing | |||
techniques. | techniques. | |||
skipping to change at page 14, line 5 ¶ | skipping to change at page 14, line 5 ¶ | |||
make efficient use of network resources and reduce the impact of | make efficient use of network resources and reduce the impact of | |||
processing related to network telemetry on network performance. | processing related to network telemetry on network performance. | |||
For example, routine network monitoring should cover the entire | For example, routine network monitoring should cover the entire | |||
network with a low data sampling rate. Only when issues arise or | network with a low data sampling rate. Only when issues arise or | |||
critical trends emerge should telemetry data sources be modified | critical trends emerge should telemetry data sources be modified | |||
and telemetry data rates boosted as needed. | and telemetry data rates boosted as needed. | |||
* Efficient data fusion is critical for applications to reduce the | * Efficient data fusion is critical for applications to reduce the | |||
overall quantity of data and improve the accuracy of analysis. | overall quantity of data and improve the accuracy of analysis. | |||
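The routine-versus-boosted sampling behavior described in the list above can be illustrated with a minimal sketch. It is not part of the draft: the class name, the two rates, and the loss threshold are hypothetical values chosen only for illustration, and a real system would likely use richer triggers than a single loss ratio.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveSampler:
    """Hypothetical controller that picks a telemetry sampling rate."""

    base_rate: float = 0.001      # assumed routine rate (1 in 1000 packets)
    boosted_rate: float = 0.1     # assumed rate while an issue is investigated
    loss_threshold: float = 0.01  # assumed trigger: more than 1% packet loss

    def rate_for(self, observed_loss: float) -> float:
        """Return the sampling rate to apply for the observed loss ratio."""
        if observed_loss > self.loss_threshold:
            return self.boosted_rate
        return self.base_rate


sampler = AdaptiveSampler()
routine = sampler.rate_for(observed_loss=0.0)
under_investigation = sampler.rate_for(observed_loss=0.05)
```

Keeping the default rate low bounds the telemetry load on the network in the common case, while the boosted rate is applied only for as long as the trigger condition holds.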
A telemetry framework collects together all of the telemetry-related | A telemetry framework collects together all the telemetry-related | |||
works from different sources and working groups within IETF. This | works from different sources and working groups within IETF. This | |||
makes it possible to assemble a comprehensive network telemetry | makes it possible to assemble a comprehensive network telemetry | |||
system and to avoid repetitious or redundant work. The framework | system and to avoid repetitious or redundant work. The framework | |||
should cover the concepts and components from the standardization | should cover the concepts and components from the standardization | |||
perspective. This document describes the modules which make up a | perspective. This document describes the modules which make up a | |||
network telemetry framework and decomposes the telemetry system into | network telemetry framework and decomposes the telemetry system into | |||
a set of distinct components that existing and future work can easily | a set of distinct components that existing and future work can easily | |||
map to. | map to. | |||
4. Network Telemetry Framework | 4. Network Telemetry Framework | |||
The top level network telemetry framework partitions the network | The top level network telemetry framework partitions the network | |||
telemetry into four modules based on the telemetry data object source | telemetry into four modules based on the telemetry data object source | |||
and represents their relationship. At the next level, the framework | and represents their relationship. At the next level, the framework | |||
decomposes each module into separate components. Each of the modules | decomposes each module into separate components. Each of the modules | |||
follows the same underlying structure, with one component dedicated | follows the same underlying structure, with one component dedicated | |||
to the configuration of data subscriptions and data sources, a second | to the configuration of data subscriptions and data sources, a second | |||
component dedicated to encoding and exporting data, and a third | component dedicated to encoding and exporting data, and a third | |||
component instrumenting the generation of telemetry related to the | component instrumenting the generation of telemetry related to the | |||
underlying resources. Throughout the framework, the same set of | underlying resources. Throughout the framework, the same set of | |||
abstract data acquiring mechanisms and data types (Section 4.3)are | abstract data acquiring mechanisms and data types (Section 4.3) are | |||
applied. The two-level architecture with the uniform data | applied. The two-level architecture with the uniform data | |||
abstraction helps accurately pinpoint a protocol or technique to its | abstraction helps accurately pinpoint a protocol or technique to its | |||
position in a network telemetry system or disaggregate a network | position in a network telemetry system or disaggregate a network | |||
telemetry system into manageable parts. | telemetry system into manageable parts. | |||
4.1. Top Level Modules | 4.1. Top Level Modules | |||
Telemetry can be applied on the forwarding plane, the control plane, | Telemetry can be applied on the forwarding plane, the control plane, | |||
and the management plane in a network, as well as other sources out | and the management plane in a network, as well as other sources out | |||
of the network, as shown in Figure 1. Therefore, we categorize the | of the network, as shown in Figure 1. Therefore, we categorize the | |||
skipping to change at page 16, line 21 ¶ | skipping to change at page 16, line 21 ¶ | |||
Because the locations that can export data have different | Because the locations that can export data have different | |||
capabilities, different choices of data model, encoding, and | capabilities, different choices of data model, encoding, and | |||
transport method are made to balance the performance and cost. For | transport method are made to balance the performance and cost. For | |||
example, the forwarding chip has high throughput but limited capacity | example, the forwarding chip has high throughput but limited capacity | |||
for processing complex data and maintaining states, while the main | for processing complex data and maintaining states, while the main | |||
control CPU is capable of complex data and state processing, but has | control CPU is capable of complex data and state processing, but has | |||
limited bandwidth for high throughput data. As a result, the | limited bandwidth for high throughput data. As a result, the | |||
suitable telemetry protocol for each module can be different. Some | suitable telemetry protocol for each module can be different. Some | |||
representative techniques are shown in the corresponding table blocks | representative techniques are shown in the corresponding table blocks | |||
to highlight the technical diversity of these modules. Note that the | to highlight the technical diversity of these modules. Note that the | |||
selected techniques just reflect the de-facto state of the art and | selected techniques just reflect the de facto state of the art and | |||
are not exhaustive. The key point is that one cannot expect to use a | are not exhaustive. The key point is that one cannot expect to use a | |||
universal protocol to cover all the network telemetry requirements. | universal protocol to cover all the network telemetry requirements. | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
| Module | Control | Management | Forwarding | External | | | Module | Management | Control | Forwarding | External | | |||
| | Plane | Plane | Plane | Data | | | | Plane | Plane | Plane | Data | | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
|Object | control | config. & | flow & packet | terminal, | | |Object | config. & | control | flow & packet | terminal, | | |||
| | protocol & | operation | QoS, traffic | social & | | | | operation | protocol & | QoS, traffic | social & | | |||
| | signaling, | state | stat., buffer | environ- | | | | state | signaling, | stat., buffer | environ- | | |||
| | RIB, ACL | | & queue stat.,| mental | | | | | RIB | & queue stat.,| mental | | |||
| | | | ACL, FIB | | | | | | | ACL, FIB | | | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
|Export | main control | main control | fwding chip | various | | |Export | main control | main control | fwding chip | various | | |||
|Location | CPU, | CPU | or linecard | | | |Location | CPU | CPU, | or linecard | | | |||
| | linecard CPU | | CPU; main | | | | | | linecard CPU | CPU; main | | | |||
| | or fwding | | control CPU | | | | | | or forwarding| control CPU | | | |||
| | chip | | unlikely | | | | | | chip | unlikely | | | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
|Data | YANG, | YANG, MIB, | template, | YANG, | | |Data | YANG, MIB, | YANG, | template, | YANG, | | |||
|Model | custom | syslog, | YANG, | custom | | |Model | syslog | custom | YANG, | custom | | |||
| | | | custom | | | | | | | custom | | | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
|Data | GPB, JSON, | GPB, JSON, | plain | GPB, JSON | | |Data | GPB, JSON, | GPB, JSON, | plain | GPB, JSON | | |||
|Encoding | XML, plain | XML | | XML, plain| | |Encoding | XML | XML, plain | | XML, plain| | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
|Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror,| gRPC | | |Protocol | gRPC,NETCONF,| gRPC,NETCONF,| IPFIX, mirror,| gRPC | | |||
| | IPFIX,mirror | | gRPC, NETFLOW | | | | | | IPFIX, mirror| gRPC, NETFLOW | | | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
|Transport| HTTP, TCP, | HTTP, TCP | UDP | HTTP,TCP | | |Transport| HTTP, TCP | HTTP, TCP, | UDP | HTTP,TCP | | |||
| | UDP | | | UDP | | | | | UDP | | UDP | | |||
+---------+--------------+--------------+---------------+-----------+ | +---------+--------------+--------------+---------------+-----------+ | |||
Figure 2: Comparison of the Data Object Modules | Figure 2: Comparison of the Data Object Modules | |||
Note that the interaction with the applications that consume network | Note that the interaction with the applications that consume network | |||
telemetry data can be indirect. Some in-device data transfer is | telemetry data can be indirect. Some in-device data transfer is | |||
possible. For example, in the management plane telemetry, the | possible. For example, in the management plane telemetry, the | |||
management plane will need to acquire data from the data plane. Some | management plane will need to acquire data from the data plane. Some | |||
of the operational states can only be derived from data plane data | operational states can only be derived from data plane data sources | |||
sources such as the interface status and statistics. As another | such as the interface status and statistics. As another example, | |||
example, obtaining control plane telemetry data may require the | obtaining control plane telemetry data may require the ability to | |||
ability to access the Forwarding Information Base (FIB) of the data | access the Forwarding Information Base (FIB) of the data plane. | |||
plane. | ||||
On the other hand, an application may involve more than one plane and | On the other hand, an application may involve more than one plane and | |||
interact with multiple planes simultaneously. For example, an SLA | interact with multiple planes simultaneously. For example, an SLA | |||
compliance application may require both the data plane telemetry and | compliance application may require both the data plane telemetry and | |||
the control plane telemetry. | the control plane telemetry. | |||
The requirements and challenges for each module are summarized as | The requirements and challenges for each module are summarized as | |||
follows (note that the requirements may pertain across all telemetry | follows (note that the requirements may pertain across all telemetry | |||
modules; however, we emphasize those that are most pronounced for a | modules; however, we emphasize those that are most pronounced for a | |||
particular plane). | particular plane). | |||
skipping to change at page 18, line 21 ¶ | skipping to change at page 18, line 21 ¶ | |||
The management plane of network elements interacts with the Network | The management plane of network elements interacts with the Network | |||
Management System (NMS), and provides information such as performance | Management System (NMS), and provides information such as performance | |||
data, network logging data, network warning and defects data, and | data, network logging data, network warning and defects data, and | |||
network statistics and state data. The management plane includes | network statistics and state data. The management plane includes | |||
many protocols, including some that are considered "legacy", such as | many protocols, including some that are considered "legacy", such as | |||
SNMP and syslog. Regardless of the protocol, management plane telemetry | SNMP and syslog. Regardless of the protocol, management plane telemetry | |||
must address the following requirements: | must address the following requirements: | |||
* Convenient Data Subscription: An application should have the | * Convenient Data Subscription: An application should have the | |||
freedom to choose the data export means such as the data types (as | freedom to choose which data is exported (see section 4.3) and the | |||
described in Figure 4) and the export means and frequency (e.g., | means and frequency of how that data is exported (e.g., on-change | |||
on-change or periodic subscription). | or periodic subscription). | |||
* Structured Data: For automatic network operation, machines will | * Structured Data: For automatic network operation, machines will | |||
replace humans for network data comprehension. Data modeling | replace humans for network data comprehension. Data modeling | |||
languages, such as YANG, can efficiently describe structured data | languages, such as YANG, can efficiently describe structured data | |||
and normalize data encoding and transformation. | and normalize data encoding and transformation. | |||
* High Speed Data Transport: In order to keep up with the velocity | * High Speed Data Transport: In order to keep up with the velocity | |||
of information, a server needs to be able to send large amounts of | of information, a server needs to be able to send large amounts of | |||
data at high frequency. Compact encoding formats or data | data at high frequency. Compact encoding formats or data | |||
compression schemes are needed to compress the data and improve | compression schemes are needed to reduce the quantity of data and | |||
the data transport efficiency. The subscription mode, by | improve the data transport efficiency. The subscription mode, by | |||
replacing the query mode, reduces the interactions between clients | replacing the query mode, reduces the interactions between clients | |||
and servers and helps to improve the server's efficiency. | and servers and helps to improve the server's efficiency. | |||
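As a rough illustration of the two subscription modes named under Convenient Data Subscription, the following Python sketch decides whether a sampled value should be exported under a periodic or an on-change subscription. The Subscription class, its fields, and should_export are hypothetical names introduced here for illustration; they are not taken from YANG-Push, gNMI, or any other protocol.

```python
from dataclasses import dataclass
from typing import Any, Optional

_UNSET = object()  # marker meaning "no value has been seen yet"


@dataclass
class Subscription:
    """Hypothetical subscription record kept by a telemetry server."""
    path: str                            # data of interest (illustrative)
    mode: str                            # "periodic" or "on-change"
    period: float = 1.0                  # seconds; periodic mode only
    last_export: Optional[float] = None  # time of the last export
    last_value: Any = _UNSET             # last value observed


def should_export(sub: Subscription, value: Any, now: float) -> bool:
    """Return True if `value`, sampled at time `now`, should be exported."""
    if sub.mode == "periodic":
        # Export when no export has happened yet or the period has elapsed.
        due = sub.last_export is None or now - sub.last_export >= sub.period
    elif sub.mode == "on-change":
        # Export the first value and every value that differs from the last.
        due = sub.last_value is _UNSET or value != sub.last_value
    else:
        raise ValueError(f"unknown subscription mode: {sub.mode}")
    sub.last_value = value
    if due:
        sub.last_export = now
    return due
```

In this sketch the server pushes data only when should_export returns True, so the client does not need to poll, which is the reduction in client-server interactions that the subscription mode is meant to provide.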
4.1.2. Control Plane Telemetry | 4.1.2. Control Plane Telemetry | |||
The control plane telemetry refers to the health condition monitoring | The control plane telemetry refers to the health condition monitoring | |||
of different network control protocols at all layers of the protocol | of different network control protocols at all layers of the protocol | |||
stack. Keeping track of the operational status of these protocols is | stack. Keeping track of the operational status of these protocols is | |||
beneficial for detecting, localizing, and even predicting various | beneficial for detecting, localizing, and even predicting various | |||
network issues, as well as network optimization, in real-time and | network issues, as well as network optimization, in real-time and | |||
skipping to change at page 19, line 22 ¶ | skipping to change at page 19, line 22 ¶ | |||
common issue behind these methods is that they only measure the | common issue behind these methods is that they only measure the | |||
KPIs instead of reflecting the actual running status of these | KPIs instead of reflecting the actual running status of these | |||
protocols, making them less effective or efficient for control | protocols, making them less effective or efficient for control | |||
plane troubleshooting and network optimization. | plane troubleshooting and network optimization. | |||
* An example of the control plane telemetry is the BGP monitoring | * An example of the control plane telemetry is the BGP monitoring | |||
protocol (BMP), which is currently used for monitoring the BGP routes | protocol (BMP), which is currently used for monitoring the BGP routes | |||
and enables rich applications, such as BGP peer analysis, AS | and enables rich applications, such as BGP peer analysis, AS | |||
analysis, prefix analysis, and security analysis. However, the | analysis, prefix analysis, and security analysis. However, the | |||
monitoring of other layers, protocols and the cross-layer, cross- | monitoring of other layers, protocols and the cross-layer, cross- | |||
protocol KPI correlations are still in their infancy (e.g., the | protocol KPI correlations are still in their infancy (e.g., IGP | |||
IGP monitoring is not as exensive as BMP), which require further | monitoring is not as extensive as BMP), which require further | |||
research. | research. | |||
4.1.3. Forwarding Plane Telemetry | 4.1.3. Forwarding Plane Telemetry | |||
An effective forwarding plane telemetry system relies on the data | An effective forwarding plane telemetry system relies on the data | |||
that the network device can expose. The quality, quantity, and | that the network device can expose. The quality, quantity, and | |||
timeliness of data must meet some stringent requirements. This | timeliness of data must meet some stringent requirements. This | |||
raises some challenges to the network data plane devices where the | raises some challenges to the network data plane devices where the | |||
first hand data originates. | first-hand data originates. | |||
* A data plane device's main function is user traffic processing and | * A data plane device's main function is user traffic processing and | |||
forwarding. While supporting network visibility is important, the | forwarding. While supporting network visibility is important, the | |||
telemetry is just an auxiliary function, and it should strive to | telemetry is just an auxiliary function, and it should strive to | |||
not impede normal traffic processing and forwarding (i.e., the | not impede normal traffic processing and forwarding (i.e., the | |||
forwarding behavior should not be altered and the tradeoff between | forwarding behavior should not be altered and the trade-off | |||
forwarding and telemtry should be well balanced). | between forwarding performance and telemetry should be well- | |||
balanced). | ||||
* Network operation applications require end-to-end visibility | * Network operation applications require end-to-end visibility | |||
across various sources, which can result in a huge volume of data. | across various sources, which can result in a huge volume of data. | |||
However, the sheer quantity of data must not exhaust the network | However, the sheer quantity of data must not exhaust the network | |||
bandwidth, regardless of the data delivery approach (i.e., whether | bandwidth, regardless of the data delivery approach (i.e., whether | |||
through in-band or out-of-band channels). | through in-band or out-of-band channels). | |||
* The data plane devices must provide timely data with the minimum | * The data plane devices must provide timely data with the minimum | |||
possible delay. Long processing, transport, storage, and analysis | possible delay. Long processing, transport, storage, and analysis | |||
delay can impact the effectiveness of the control loop and even | delay can impact the effectiveness of the control loop and even | |||
skipping to change at page 20, line 46 ¶ | skipping to change at page 20, line 46 ¶ | |||
[I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and | [I-D.ietf-ippm-ioam-data], Alternate-Marking (AM) [RFC8321], and | |||
Multipoint Alternate Marking [I-D.ietf-ippm-multipoint-alt-mark], | Multipoint Alternate Marking [I-D.ietf-ippm-multipoint-alt-mark], | |||
provide a well-balanced and more flexible approach. However, | provide a well-balanced and more flexible approach. However, | |||
these methods are also more complex to implement. | these methods are also more complex to implement. | |||
* In-Band and Out-of-Band: Telemetry data carried in user packets | * In-Band and Out-of-Band: Telemetry data carried in user packets | |||
before being exported to a data collector is considered in-band | before being exported to a data collector is considered in-band | |||
(e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data | (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]). Telemetry data | |||
that is directly exported to a data collector without modifying | that is directly exported to a data collector without modifying | |||
user packets is considered out-of-band (e.g., the postcard-based | user packets is considered out-of-band (e.g., the postcard-based | |||
approach described in Appendix). It is also possible to have | approach described in Appendix A.3.5). It is also possible to | |||
hybrid methods, where only the telemetry instruction or partial | have hybrid methods, where only the telemetry instruction or | |||
data is carried by user packets (e.g., AM [RFC8321]). | partial data is carried by user packets (e.g., AM [RFC8321]). | |||
* End-to-End and In-Network: End-to-End methods start from, and end | * End-to-End and In-Network: End-to-End methods start from, and end | |||
at, the network end hosts (e.g., Ping). In-Network methods work | at, the network end hosts (e.g., Ping). In-Network methods work | |||
in networks and are transparent to end hosts. However, if needed, | in networks and are transparent to end hosts. However, if needed, | |||
In-Network methods can be easily extended into end hosts. | In-Network methods can be easily extended into end hosts. | |||
* Data Subject: Depending on the telemetry objective, the methods | * Data Subject: Depending on the telemetry objective, the methods | |||
can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | can be flow-based (e.g., in-situ OAM [I-D.ietf-ippm-ioam-data]), | |||
path-based (e.g., Traceroute), and node-based (e.g., IPFIX | path-based (e.g., Traceroute), and node-based (e.g., IPFIX | |||
[RFC7011]). The various data objects can be packet, flow record, | [RFC7011]). The various data objects can be packet, flow record, | |||
skipping to change at page 21, line 32 ¶ | skipping to change at page 21, line 32 ¶ | |||
[I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and | [I-D.pedro-nmrg-anticipated-adaptation], provides a strategic and | |||
functional advantage to management operations. | functional advantage to management operations. | |||
As with other sources of telemetry information, the data and events | As with other sources of telemetry information, the data and events | |||
must meet strict requirements, especially in terms of timeliness, | must meet strict requirements, especially in terms of timeliness, | |||
which is essential to properly incorporate external event information | which is essential to properly incorporate external event information | |||
into network management applications. The specific challenges are | into network management applications. The specific challenges are | |||
described as follows: | described as follows: | |||
* The role of the external event detector can be played by multiple | * The role of the external event detector can be played by multiple | |||
elements, including hardware (e.g. physical sensors, such as | elements, including hardware (e.g., physical sensors, such as | |||
seismometers) and software (e.g. Big Data sources that analyze | seismometers) and software (e.g., Big Data sources that analyze | |||
streams of information, such as Twitter messages). Thus, the | streams of information, such as Twitter messages). Thus, the | |||
transmitted data must support different shapes but, at the same | transmitted data must support different shapes but, at the same | |||
time, follow a common but extensible schema. | time, follow a common but extensible schema. | |||
* Since the main function of the external event detectors is to | * Since the main function of the external event detectors is to | |||
perform the notifications, their timeliness is assumed. However, | perform the notifications, their timeliness is assumed. However, | |||
once messages have been dispatched, they must be quickly collected | once messages have been dispatched, they must be quickly collected | |||
and inserted into the control plane with variable priority, which | and inserted into the control plane with variable priority, which | |||
is higher for important sources and events and lower for secondary | is higher for important sources and events and lower for secondary | |||
ones. | ones. | |||
skipping to change at page 22, line 13 ¶ | skipping to change at page 22, line 13 ¶ | |||
be easily mapped to current data models, such as in terms of YANG. | be easily mapped to current data models, such as in terms of YANG. | |||
Organizing both internal and external telemetry information together | Organizing both internal and external telemetry information together | |||
will be key for the general exploitation of the management | will be key for the general exploitation of the management | |||
possibilities of current and future network systems, as reflected in | possibilities of current and future network systems, as reflected in | |||
the incorporation of cognitive capabilities to new hardware and | the incorporation of cognitive capabilities to new hardware and | |||
software (virtual) elements. | software (virtual) elements. | |||
4.2. Second Level Function Components | 4.2. Second Level Function Components | |||
The telemetry module as each plane can be further partitioned into | The telemetry module at each plane can be further partitioned into | |||
five distinct conceptual components: | five distinct conceptual components: | |||
* Data Query, Analysis, and Storage: This component works at the | * Data Query, Analysis, and Storage: This component works at the | |||
application layer. It is normally a part of the network | application layer. It is normally a part of the network | |||
management system at the receiver side. On the one hand, it is | management system at the receiver side. On the one hand, it is | |||
responsible for issuing data requirements. The data of interest | responsible for issuing data requirements. The data of interest | |||
can be modeled data through configuration or custom data through | can be modeled data through configuration or custom data through | |||
programming. The data requirements can be queries for one-shot | programming. The data requirements can be queries for one-shot | |||
data or subscriptions for events or streaming data. On the other | data or subscriptions for events or streaming data. On the other | |||
hand, it receives, stores, and processes the returned data from | hand, it receives, stores, and processes the returned data from | |||
skipping to change at page 22, line 48 ¶ | skipping to change at page 22, line 48 ¶ | |||
access control. The data encoding and the transport protocol may | access control. The data encoding and the transport protocol may | |||
vary due to the data export location. | vary due to the data export location. | |||
* Data Generation and Processing: The requested data needs to be | * Data Generation and Processing: The requested data needs to be | |||
captured, filtered, processed, and formatted in network devices | captured, filtered, processed, and formatted in network devices | |||
from raw data sources. This may involve in-network computing and | from raw data sources. This may involve in-network computing and | |||
processing on either the fast path or the slow path in network | processing on either the fast path or the slow path in network | |||
devices. | devices. | |||
* Data Object and Source: This component determines the monitoring | * Data Object and Source: This component determines the monitoring | |||
objects and original data sources provisioned in device. A data | objects and original data sources provisioned in the device. A | |||
source usually just provides raw data which needs further | data source usually just provides raw data which needs further | |||
processing. Each data source can be considered a probe. Some | processing. Each data source can be considered a probe. Some | |||
data sources can be dynamically installed, while others will be | data sources can be dynamically installed, while others will be | |||
more static. | more static. | |||
+----------------------------------------+ | +----------------------------------------+ | |||
+----------------------------------------+ | | +----------------------------------------+ | | |||
| | | | | | | | |||
| Data Query, Analysis, & Storage | | | | Data Query, Analysis, & Storage | | | |||
| | + | | | + | |||
+-------+++ -----------------------------+ | +-------+++ -----------------------------+ | |||
skipping to change at page 24, line 21 ¶ | skipping to change at page 24, line 21 ¶ | |||
* Simple Data: The data that are steadily available from some | * Simple Data: The data that are steadily available from some | |||
datastore or static probes in network devices. | datastore or static probes in network devices. | |||
* Derived Data: The data that need to be synthesized or processed in | * Derived Data: The data that need to be synthesized or processed in | |||
network from raw data from one or more network devices. The data | network from raw data from one or more network devices. The data | |||
processing function can be statically or dynamically loaded into | processing function can be statically or dynamically loaded into | |||
network devices. | network devices. | |||
* Event-triggered Data: The data are conditionally acquired based on | * Event-triggered Data: The data are conditionally acquired based on | |||
the occurrence of some events. For example, a network interface | the occurrence of some events. An example of event-triggered data | |||
changing its operational state from up to down can be a trigger | could be an interface changing operational state between up and | |||
event. Such data can be actively pushed through subscription or | down. Such data can be actively pushed through subscription or | |||
passively polled through query. There are many ways to model | passively polled through query. There are many ways to model | |||
events, including using Finite State Machine (FSM) or Event | events, including using Finite State Machine (FSM) or Event | |||
Condition Action (ECA) [I-D.wwx-netmod-event-yang]. | Condition Action (ECA) [I-D.wwx-netmod-event-yang]. | |||
* Streaming Data: The data are continuously generated. It can be | * Streaming Data: The data are continuously generated. It can be | |||
time series or the dump of databases. For example, an interface | time series or the dump of databases. For example, an interface | |||
packet counter is exported every second. The streaming data | packet counter is exported every second. The streaming data | |||
reflect realtime network states and metrics and require large | reflect realtime network states and metrics and require large | |||
bandwidth and processing power. The streaming data are always | bandwidth and processing power. The streaming data are always | |||
actively pushed to the subscribers. | actively pushed to the subscribers. | |||
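To make the distinction between derived data and event-triggered data concrete, the following Python sketch processes lists of timestamped samples. It is only an illustration under stated assumptions: the function names and sample formats are hypothetical, counters are assumed not to wrap, and timestamps are assumed to be strictly increasing.

```python
from typing import List, Tuple


def derive_rates(samples: List[Tuple[float, int]]) -> List[Tuple[float, float]]:
    """Derived data: turn raw (time, counter) samples into per-second rates.

    Assumes strictly increasing timestamps and a counter that does not wrap.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        rates.append((t1, (c1 - c0) / (t1 - t0)))
    return rates


def state_change_events(
    samples: List[Tuple[float, str]],
) -> List[Tuple[float, str, str]]:
    """Event-triggered data: report only transitions of an operational state.

    Each event is (time, old_state, new_state); unchanged samples are dropped.
    """
    events = []
    previous = None
    for t, state in samples:
        if previous is not None and state != previous:
            events.append((t, previous, state))
        previous = state
    return events
```

For example, an interface that is sampled as "up", "up", "down" yields a single event for the up-to-down transition, whereas streaming data would carry all three samples.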
skipping to change at page 26, line 4 ¶ | skipping to change at page 26, line 4 ¶ | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data config.| gNMI, NETCONF, | gNMI, NETCONF,| NETCONF, | | | data config.| gNMI, NETCONF, | gNMI, NETCONF,| NETCONF, | | |||
| & subscribe | SNMP, YANG-Push | YANG-Push | YANG-Push | | | & subscribe | SNMP, YANG-Push | YANG-Push | YANG-Push | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data gen. & | MIB, | YANG | IOAM, PSAMP | | | data gen. & | MIB, | YANG | IOAM, PSAMP | | |||
| process | YANG | | PBT, AM, | | | process | YANG | | PBT, AM, | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
| data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | | data encode.| gRPC, HTTP, TCP | BMP, TCP | IPFIX, UDP | | |||
| & export | | | | | | & export | | | | | |||
+-------------+-----------------+---------------+--------------+ | +-------------+-----------------+---------------+--------------+ | |||
Figure 5: Existing Work Mapping II | Figure 5: Existing Work Mapping | |||
5. Evolution of Network Telemetry Applications | 5. Evolution of Network Telemetry Applications | |||
Network telemetry is an evolving technical area. As the network | Network telemetry is an evolving technical area. As the network | |||
moves towards the automated operation, network telemetry applications | moves towards the automated operation, network telemetry applications | |||
undergo several stages of evolution which add new layers of | undergo several stages of evolution which add new layers of | |||
requirements to the underlying network telemetry techniques. Each | requirements to the underlying network telemetry techniques. Each | |||
stage is built upon the techniques adopted by the previous stages | stage is built upon the techniques adopted by the previous stages | |||
plus some new requirements. | plus some new requirements. | |||
Stage 0 - Static Telemetry: The telemetry data source and type are | Stage 0 - Static Telemetry: The telemetry data source and type are | |||
determined at design time. The network operator can only | determined at design time. The network operator can only | |||
configure how to use it with limited flexibility. | configure how to use it with limited flexibility. | |||
Stage 1 - Dynamic Telemetry: The custom telemetry data can be | Stage 1 - Dynamic Telemetry: The custom telemetry data can be | |||
dynamically programmed or configured at runtime without | dynamically programmed or configured at runtime without | |||
interrupting the network operation, allowing a tradeoff among | interrupting the network operation, allowing a trade-off among | |||
resource, performance, flexibility, and coverage. | resource, performance, flexibility, and coverage. | |||
Stage 2 - Interactive Telemetry: The network operator can | Stage 2 - Interactive Telemetry: The network operator can | |||
continuously customize and fine tune the telemetry data in real | continuously customize and fine tune the telemetry data in real | |||
time to reflect the network operation's visibility requirements. | time to reflect the network operation's visibility requirements. | |||
Compared with Stage 1, the changes are frequent based on the real- | Compared with Stage 1, the changes are frequent based on the real- | |||
time feedback. At this stage, some tasks can be automated, but | time feedback. At this stage, some tasks can be automated, but | |||
human operators still need to sit in the middle to make decisions. | human operators still need to sit in the middle to make decisions. | |||
Stage 3 - Closed-loop Telemetry: The telemetry is free from the | Stage 3 - Closed-loop Telemetry: The telemetry is free from the | |||
skipping to change at page 26, line 49 ¶ | skipping to change at page 26, line 49 ¶ | |||
future autonomic networks may need a comprehensive operation | future autonomic networks may need a comprehensive operation | |||
management system which works at stage 2 and stage 3 to cover all the | management system which works at stage 2 and stage 3 to cover all the | |||
network operation tasks. A well-defined network telemetry framework | network operation tasks. A well-defined network telemetry framework | |||
is the first step towards this direction. | is the first step towards this direction. | |||
6. Security Considerations | 6. Security Considerations | |||
The complexity of network telemetry raises significant security | The complexity of network telemetry raises significant security | |||
implications. For example, telemetry data can be manipulated to | implications. For example, telemetry data can be manipulated to | |||
exhaust various network resources at each plane as well as the data | exhaust various network resources at each plane as well as the data | |||
consumer; falsified or tampered data can mislead the decision making | consumer; falsified or tampered data can mislead the decision-making | |||
and paralyze networks; wrong configuration and programming for | and paralyze networks; wrong configuration and programming for | |||
telemetry is equally harmful. The telemetry data is highly | telemetry is equally harmful. The telemetry data is highly | |||
sensitive, which exposes a lot of information about the network and | sensitive, which exposes a lot of information about the network and | |||
its configuration. Some of that information can make designing | its configuration. Some of that information can make designing | |||
attacks against the network much easier (e.g., exact details of what | attacks against the network much easier (e.g., exact details of what | |||
software and patches have been installed), and allows an attacker to | software and patches have been installed), and allows an attacker to | |||
determine whether a device may be subject to unprotected security | determine whether a device may be subject to unprotected security | |||
vulnerability. | vulnerabilities. | |||
Given that this document has proposed a framework for network | Given that this document has proposed a framework for network | |||
telemetry and the telemetry mechanisms discussed are more extensive | telemetry and the telemetry mechanisms discussed are more extensive | |||
(in both message frequency and traffic amount) than the conventional | (in both message frequency and traffic amount) than the conventional | |||
network OAM concepts, we must also reflect that various new security | network OAM concepts, we must also reflect that various new security | |||
considerations may also arise. A number of techniques already exist | considerations may also arise. A number of techniques already exist | |||
for securing the forwarding plane, the control plane, and the | for securing the forwarding plane, the control plane, and the | |||
management plane in a network, but it is important to consider if any | management plane in a network, but it is important to consider if any | |||
new threat vectors are now being enabled via the use of network | new threat vectors are now being enabled via the use of network | |||
telemetry procedures and mechanisms. | telemetry procedures and mechanisms. | |||
skipping to change at page 28, line 5 ¶ | skipping to change at page 28, line 5 ¶ | |||
identify malicious attacks using telemetry interfaces. | identify malicious attacks using telemetry interfaces. | |||
* Authentication and signing of telemetry data to make data more | * Authentication and signing of telemetry data to make data more | |||
trustworthy. | trustworthy. | |||
* Segregating the telemetry data traffic from the data traffic | * Segregating the telemetry data traffic from the data traffic | |||
carried over the network (e.g., historically management access and | carried over the network (e.g., historically management access and | |||
management data may be carried via an independent management | management data may be carried via an independent management | |||
network). | network). | |||
Some of the security considerations highlighted above may be | Some security considerations highlighted above may be minimized or | |||
minimized or negated with policy management of network telemetry. In | negated with policy management of network telemetry. In a network | |||
a network telemetry deployment it would be advantageous to separate | telemetry deployment it would be advantageous to separate telemetry | |||
telemetry capabilities into different classes of policies, i.e., Role | capabilities into different classes of policies, i.e., Role Based | |||
Based Access Control and Event-Condition-Action policies. Also, | Access Control and Event-Condition-Action policies. Also, potential | |||
potential conflicts between network telemetry mechanisms must be | conflicts between network telemetry mechanisms must be detected | |||
detected accurately and resolved quickly to avoid unnecessary network | accurately and resolved quickly to avoid unnecessary network | |||
telemetry traffic propagation escalating into an unintended or | telemetry traffic propagation escalating into an unintended or | |||
intended denial of service attack. | intended denial of service attack. | |||
Further study of the security issues will be required, and it is | Further study of the security issues will be required, and it is | |||
expected that the secuirty mechanisms and protocols are developed and | expected that the security mechanisms and protocols are developed and | |||
deployed along with a network telemetry system. | deployed along with a network telemetry system. | |||
In addition to security, privacy is also an important issue. Network | In addition to security, privacy is also an important issue. Network | |||
telemetry means to improve the network operation which can ultimately | telemetry means to improve the network operation which can ultimately | |||
benefit end users' quality of experience. The network operators must | benefit end users' quality of experience. The network operators must | |||
be held accountable and strive for a balance between managing the | be held accountable and strive for a balance between managing the | |||
network and maintaining the user privacy of that network. | network and maintaining the user privacy of that network. | |||
7. IANA Considerations | 7. IANA Considerations | |||
skipping to change at page 29, line 33 ¶ | skipping to change at page 29, line 33 ¶ | |||
Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | Evens, T., Bayraktar, S., Bhardwaj, M., and P. Lucente, | |||
"Support for Local RIB in BGP Monitoring Protocol (BMP)", | "Support for Local RIB in BGP Monitoring Protocol (BMP)", | |||
Work in Progress, Internet-Draft, draft-ietf-grow-bmp- | Work in Progress, Internet-Draft, draft-ietf-grow-bmp- | |||
local-rib-13, 31 August 2021, | local-rib-13, 31 August 2021, | |||
<https://www.ietf.org/archive/id/draft-ietf-grow-bmp- | <https://www.ietf.org/archive/id/draft-ietf-grow-bmp- | |||
local-rib-13.txt>. | local-rib-13.txt>. | |||
[I-D.ietf-ippm-ioam-data] | [I-D.ietf-ippm-ioam-data] | |||
Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | Brockners, F., Bhandari, S., and T. Mizrahi, "Data Fields | |||
for In-situ OAM", Work in Progress, Internet-Draft, draft- | for In-situ OAM", Work in Progress, Internet-Draft, draft- | |||
ietf-ippm-ioam-data-14, 24 June 2021, | ietf-ippm-ioam-data-15, 3 October 2021, | |||
<https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | <https://www.ietf.org/archive/id/draft-ietf-ippm-ioam- | |||
data-14.txt>. | data-15.txt>. | |||
[I-D.ietf-ippm-multipoint-alt-mark] | [I-D.ietf-ippm-multipoint-alt-mark] | |||
Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, | Fioccola, G., Cociglio, M., Sapio, A., and R. Sisto, | |||
"Multipoint Alternate-Marking Method for Passive and | "Multipoint Alternate-Marking Method for Passive and | |||
Hybrid Performance Monitoring", Work in Progress, | Hybrid Performance Monitoring", Work in Progress, | |||
Internet-Draft, draft-ietf-ippm-multipoint-alt-mark-09, 23 | Internet-Draft, draft-ietf-ippm-multipoint-alt-mark-09, 23 | |||
March 2020, <https://www.ietf.org/archive/id/draft-ietf- | March 2020, <https://www.ietf.org/archive/id/draft-ietf- | |||
ippm-multipoint-alt-mark-09.txt>. | ippm-multipoint-alt-mark-09.txt>. | |||
[I-D.ietf-netconf-distributed-notif] | [I-D.ietf-netconf-distributed-notif] | |||
skipping to change at page 34, line 26 ¶ | skipping to change at page 34, line 26 ¶ | |||
channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | channel [I-D.ietf-netconf-udp-notif] provides enhanced efficiency for | |||
the NETCONF based telemetry. | the NETCONF based telemetry. | |||
A.1.2. gRPC Network Management Interface | A.1.2. gRPC Network Management Interface | |||
gRPC Network Management Interface (gNMI) | gRPC Network Management Interface (gNMI) | |||
[I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol | [I-D.openconfig-rtgwg-gnmi-spec] is a network management protocol | |||
based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote | based on the gRPC [I-D.kumar-rtgwg-grpc-protocol] RPC (Remote | |||
Procedure Call) framework. With a single gRPC service definition, | Procedure Call) framework. With a single gRPC service definition, | |||
both configuration and telemetry can be covered. gRPC is an HTTP/2 | both configuration and telemetry can be covered. gRPC is an HTTP/2 | |||
[RFC7540] based open source micro service communication framework. | [RFC7540] based open-source micro-service communication framework. | |||
It provides a number of capabilities which are well-suited for | It provides a number of capabilities which are well-suited for | |||
network telemetry, including: | network telemetry, including: | |||
* Full-duplex streaming transport model combined with a binary | * Full-duplex streaming transport model combined with a binary | |||
encoding mechanism provides good telemetry efficiency. | encoding mechanism provides good telemetry efficiency. | |||
* gRPC provides higher-level feature consistency across platforms | * gRPC provides higher-level feature consistency across platforms | |||
that common HTTP/2 libraries typically do not. This | that common HTTP/2 libraries typically do not. This | |||
characteristic is especially valuable because telemetry data | characteristic is especially valuable because telemetry data | |||
collectors normally reside on a large variety of platforms. | collectors normally reside on a large variety of platforms. | |||
skipping to change at page 35, line 5 ¶ | skipping to change at page 35, line 5 ¶ | |||
BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP | BGP Monitoring Protocol (BMP) [RFC7854] is used to monitor BGP | |||
sessions and is intended to provide a convenient interface for | sessions and is intended to provide a convenient interface for | |||
obtaining route views. | obtaining route views. | |||
The BGP routing information is collected from the monitored device(s) | The BGP routing information is collected from the monitored device(s) | |||
to the BMP monitoring station by setting up the BMP TCP session. The | to the BMP monitoring station by setting up the BMP TCP session. The | |||
BGP peers are monitored by the BMP Peer Up and Peer Down | BGP peers are monitored by the BMP Peer Up and Peer Down | |||
Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], | Notifications. The BGP routes (including Adjacency_RIB_In [RFC7854], | |||
Adjacency_RIB_out [I-D.ietf-grow-bmp-adj-rib-out], and Local_Rib | Adjacency_RIB_out [I-D.ietf-grow-bmp-adj-rib-out], and Local_Rib | |||
[I-D.ietf-grow-bmp-local-rib] are encapsulated in the BMP Route | [I-D.ietf-grow-bmp-local-rib]) are encapsulated in the BMP Route | |||
Monitoring Message and the BMP Route Mirroring Message, providing | Monitoring Message and the BMP Route Mirroring Message, providing | |||
both an initial table dump and real-time route updates. In addition, | both an initial table dump and real-time route updates. In addition, | |||
BGP statistics are reported through the BMP Stats Report Message, | BGP statistics are reported through the BMP Stats Report Message, | |||
which could be either timer-triggered or event-driven. Future BMP | which could be either timer-triggered or event-driven. Future BMP | |||
extensions could further enrich BGP monitoring applications. | extensions could further enrich BGP monitoring applications. | |||
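As a rough illustration of how a monitoring station might tell the BMP messages named above apart, the sketch below decodes the 6-byte BMP common header (version, message length, message type) defined in RFC 7854. The function name parse_bmp_common_header and the BMP_MSG_TYPES table are hypothetical names chosen for this example; a real collector would also need to read the TCP stream, reassemble messages, and parse the per-peer header and message body.

```python
import struct

# Message type codes from the BMP common header (RFC 7854).
BMP_MSG_TYPES = {
    0: "Route Monitoring",
    1: "Statistics Report",
    2: "Peer Down Notification",
    3: "Peer Up Notification",
    4: "Initiation",
    5: "Termination",
    6: "Route Mirroring",
}

BMP_COMMON_HEADER_LEN = 6  # 1 byte version + 4 bytes length + 1 byte type


def parse_bmp_common_header(data: bytes):
    """Return (version, message_length, message_type_name).

    Hypothetical helper for illustration only; it decodes just the
    common header and does not validate the rest of the message.
    """
    if len(data) < BMP_COMMON_HEADER_LEN:
        raise ValueError("need at least 6 bytes for the BMP common header")
    # "!" selects network byte order with no padding between fields.
    version, length, msg_type = struct.unpack(
        "!BIB", data[:BMP_COMMON_HEADER_LEN]
    )
    return version, length, BMP_MSG_TYPES.get(msg_type, "Unknown")
```

A station could call this on the first six bytes of each message and use the returned length to decide how many further bytes belong to that message before dispatching on the type.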
A.3. Data Plane Telemetry | A.3. Data Plane Telemetry | |||
A.3.1. The Alternate Marking (AM) technology | A.3.1. The Alternate Marking (AM) technology | |||
skipping to change at page 35, line 35 ¶ | skipping to change at page 35, line 35 ¶ | |||
the packet loss calculation. The same idea can be applied to delay | the packet loss calculation. The same idea can be applied to delay | |||
measurement by selecting ad hoc packets with a marking bit dedicated | measurement by selecting ad hoc packets with a marking bit dedicated | |||
for delay measurements. | for delay measurements. | |||
The Alternate Marking method needs two counters per marking period | The Alternate Marking method needs two counters per marking period | |||
for each monitored flow. For instance, with n measurement points | for each monitored flow. For instance, with n measurement points | |||
and m monitored flows, the number of packet counters for each time | and m monitored flows, the number of packet counters for each time | |||
interval is on the order of n*m*2 (1 per color). | interval is on the order of n*m*2 (1 per color). | |||
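The counter arithmetic above can be made concrete with a small sketch. It assumes two colors per flow and assumes that loss over one marking period is obtained by comparing the same-color counters at an upstream and a downstream measurement point; the function names counter_count and block_loss are hypothetical and exist only for this illustration.

```python
def counter_count(n_points: int, m_flows: int, colors: int = 2) -> int:
    """Number of packet counters per time interval: n * m * colors."""
    return n_points * m_flows * colors


def block_loss(upstream_count: int, downstream_count: int) -> int:
    """Packets lost between two measurement points for one marking
    period, taken as the difference of the same-color counters
    (an assumption of this sketch).
    """
    return upstream_count - downstream_count
```

For example, 4 measurement points and 10 monitored flows would require 4*10*2 = 80 counters per interval, which shows how quickly the export volume grows with the number of points and flows.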
Since networks offer rich sets of network performance measurement | Since networks offer rich sets of network performance measurement | |||
data (e.g packet counters), traditional approaches run into | data (e.g., packet counters), traditional approaches run into | |||
limitations. The bottleneck is the generation and export of the data | limitations. The bottleneck is the generation and export of the data | |||
and the amount of data that can be reasonably collected from the | and the amount of data that can be reasonably collected from the | |||
network. In addition, management tasks related to determining and | network. In addition, management tasks related to determining and | |||
configuring which data to generate lead to significant deployment | configuring which data to generate lead to significant deployment | |||
challenges. | challenges. | |||
The Multipoint Alternate Marking approach, described in | The Multipoint Alternate Marking approach, described in | |||
[I-D.ietf-ippm-multipoint-alt-mark], aims to resolve this issue and | [I-D.ietf-ippm-multipoint-alt-mark], aims to resolve this issue and | |||
make the performance monitoring more flexible in case a detailed | make the performance monitoring more flexible in case a detailed | |||
analysis is not needed. | analysis is not needed. | |||
skipping to change at page 38, line 22 ¶ | skipping to change at page 38, line 22 ¶ | |||
management and match it to the connectors and/or interfaces required | management and match it to the connectors and/or interfaces required | |||
to connect them. | to connect them. | |||
Categories of external event sources that may be of interest to | Categories of external event sources that may be of interest to | |||
network management include: | network management include: | |||
* Smart objects and sensors. With the consolidation of the Internet | * Smart objects and sensors. With the consolidation of the Internet | |||
of Things (IoT), any network system will have many smart objects | of Things (IoT), any network system will have many smart objects | |||
attached to its physical surroundings and logical operation | attached to its physical surroundings and logical operation | |||
environments. Most of these objects will be essentially based on | environments. Most of these objects will be essentially based on | |||
sensors of many kinds (e.g. temperature, humidity, presence) and | sensors of many kinds (e.g., temperature, humidity, presence) and | |||
the information they provide can be very useful for the management | the information they provide can be very useful for the management | |||
of the network, even when they are not specifically deployed for | of the network, even when they are not specifically deployed for | |||
such purpose. Elements of this source type will usually provide a | such purpose. Elements of this source type will usually provide a | |||
specific protocol for interaction, especially one of those | specific protocol for interaction, especially one of those | |||
protocols related to IoT, such as the Constrained Application | protocols related to IoT, such as the Constrained Application | |||
Protocol (CoAP). | Protocol (CoAP). | |||
* Online news reporters. Several online news services have the | * Online news reporters. Several online news services have the | |||
ability to provide an enormous quantity of information about | ability to provide an enormous quantity of information about | |||
different events occurring in the world. Some of those events can | different events occurring in the world. Some of those events can | |||
skipping to change at page 38, line 51 ¶ | skipping to change at page 38, line 51 ¶ | |||
be part of both the ontology and information model of the | be part of both the ontology and information model of the | |||
telemetry framework. | telemetry framework. | |||
* Global event analyzers. The advance of Big Data analyzers | * Global event analyzers. The advance of Big Data analyzers | |||
provides a huge amount of information and, more interestingly, the | provides a huge amount of information and, more interestingly, the | |||
identification of events detected by analyzing many data streams | identification of events detected by analyzing many data streams | |||
from different origins. In contrast with the other types of | from different origins. In contrast with the other types of | |||
sources, which are focused on specific events, the detectors of | sources, which are focused on specific events, the detectors of | |||
this source type will detect generic events. For example, a | this source type will detect generic events. For example, a | |||
sports event takes place and some unexpected movement makes it | sports event takes place and some unexpected movement makes it | |||
highly interesting and many people connects to sites that are | fascinating and many people connect to sites that are reporting on | |||
reporting on the event. The underlying networks supporting the | the event. The underlying networks supporting the services that | |||
services that cover the event can be affected by such situation so | cover the event can be affected by such situation so their | |||
their management solutions should be aware of it. In contrast | management solutions should be aware of it. In contrast with the | |||
with the other source types, a new information model, format, and | other source types, a new information model, format, and reporting | |||
reporting protocol is required to integrate the detectors of this | protocol is required to integrate the detectors of this type with | |||
type with the management solution. | the management solution. | |||
Additional types of detector types can be added to the system but | Additional types of detector types can be added to the system, but | |||
they will be generally the result of composing the properties offered | they will be generally the result of composing the properties offered | |||
by these main classes. | by these main classes. | |||
A.4.2. Connectors and Interfaces | A.4.2. Connectors and Interfaces | |||
For allowing external event detectors to be properly integrated with | For allowing external event detectors to be properly integrated with | |||
other management solutions, both elements must expose interfaces and | other management solutions, both elements must expose interfaces and | |||
protocols that are subject to their particular objective. Since | protocols that are subject to their particular objective. Since | |||
external event detectors will be focused on providing their | external event detectors will be focused on providing their | |||
information to their main consumers, which generally will not be | information to their main consumers, which generally will not be | |||
End of changes. 46 change blocks. | ||||
90 lines changed or deleted | 92 lines changed or added | |||