Network Management Research Group                              M-S. Kim
Internet-Draft                                                      ETRI
Intended status: Informational                                  Y-H. Han
Expires: January 9, 2020                                       KoreaTech
                                                               Y-G. Hong
                                                                    ETRI
                                                            July 8, 2019

      Intelligent Reinforcement-learning-based Network Management
                          draft-kim-nmrg-rl-05
Abstract
This document presents intelligent network management based on
Artificial Intelligence (AI) approaches such as reinforcement-learning.
In a heterogeneous network, intelligent management with AI should
usually provide real-time connectivity, network management with the
quality of real-time data, and transmission services generated by an
application service.  For that reason, an intelligent management
system is needed to support real-time connection and protection
through efficient management of interfering network traffic for
high-quality network data transmission in both cloud and IoE network
systems.
Reinforcement-learning is one of the machine learning algorithms that
can intelligently and autonomously support management systems over a
communication network.  Reinforcement-learning has developed and
expanded with deep learning techniques based on model-driven or
data-driven approaches, so these techniques have been widely applied
to build adaptive networking models with effective strategies against
environmental disturbances in a variety of networking areas.  For
network AI with such intelligent and effective strategies, the
intent-based network (IBN) can also be considered to continuously and
automatically evaluate the network status against the required policy
for dynamic network optimization.  The key element of the intent-based
network is that it provides a verification of whether the expressed
network intent is implementable or currently implemented in the
network.  Additionally, this approach needs to take action in real
time if the desired network state and the actual state are
inconsistent.
Status of This Memo

This Internet-Draft is submitted in full conformance with the
provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF).  Note that other groups may also distribute
working documents as Internet-Drafts.  The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any
time.  It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 9, 2020.
Copyright Notice

Copyright (c) 2019 IETF Trust and the persons identified as the
document authors.  All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of
publication of this document.  Please review these documents
carefully, as they describe your rights and restrictions with respect
to this document.  Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License.
Table of Contents

1.  Introduction . . . . . . . . . . . . . . . . . . . . . . . .   3
2.  Conventions and Terminology  . . . . . . . . . . . . . . . .   4
3.  Theoretical Approaches . . . . . . . . . . . . . . . . . . .   4
  3.1.  Reinforcement-learning . . . . . . . . . . . . . . . . .   4
  3.2.  Deep-reinforcement-learning  . . . . . . . . . . . . . .   4
  3.3.  Advantage Actor Critic (A2C)  . . . . . . . . . . . . . .  5
  3.4.  Asynchronously Advantage Actor Critic (A3C) . . . . . . .  5
  3.5.  Intent-based Network (IBN) . . . . . . . . . . . . . . .   6
4.  Reinforcement-learning-based process scenario  . . . . . . .   6
  4.1.  Single-agent with Single-model . . . . . . . . . . . . .   7
  4.2.  Multi-agents Sharing Single-model  . . . . . . . . . . .   7
  4.3.  Adversarial Self-Play with Single-model  . . . . . . . .   7
  4.4.  Cooperative Multi-agents with Multiple-models  . . . . .   7
  4.5.  Competitive Multi-agents with Multiple-models  . . . . .   8
5.  Use Cases  . . . . . . . . . . . . . . . . . . . . . . . . .   8
  5.1.  Intelligent Edge-computing for Traffic Control using
        Deep-reinforcement-learning  . . . . . . . . . . . . . .   8
  5.2.  Edge computing system in a field of Construction-site
        using Reinforcement-learning . . . . . . . . . . . . . .   8
  5.3.  Deep-reinforcement-learning-based remote Control system
        over a software-defined network  . . . . . . . . . . . .   9
6.  IANA Considerations  . . . . . . . . . . . . . . . . . . . .  11
7.  Security Considerations  . . . . . . . . . . . . . . . . . .  11
8.  References . . . . . . . . . . . . . . . . . . . . . . . . .  11
  8.1.  Normative References . . . . . . . . . . . . . . . . . .  11
  8.2.  Informative References . . . . . . . . . . . . . . . . .  11
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . .  13
1. Introduction
Reinforcement-learning for intelligent, autonomous network management
is, in general, one of the most challenging methods in dynamic,
complex, and cluttered network environments.  This intelligent
approach requires the development of computational systems on a
single node or across large distributed networking nodes, where these
environments involve limited and incomplete knowledge.
skipping to change at page 4, line 4
There are many different network management problems to solve
intelligently, such as connectivity, traffic management, and fast
Internet access without latency.  Reinforcement-learning-based
approaches can provide specific solutions in multiple cases beyond
human operating capacities, although this remains a challenging area
for a multitude of reasons, such as the large state space, complexity
in assigning rewards, difficulty in controlling actions, and
difficulty in sharing and merging the trained knowledge held in
distributed memory nodes so that it can be transferred over a
communication network. [MS]
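
As an illustration of the state-space and reward-design issues noted
above, the following sketch frames a network management task as a
reinforcement-learning problem.  The state features, action fields,
and reward weights are assumptions chosen for this example only and
are not defined by this document.

   # Illustrative sketch: framing network management as an RL problem.
   # The state features, actions, and reward weights are assumptions.
   from dataclasses import dataclass
   from typing import List

   @dataclass
   class NetworkState:
       link_utilization: List[float]   # per-link load, 0.0 .. 1.0
       queue_delay_ms: List[float]     # per-link queuing delay
       packet_loss: float              # end-to-end loss ratio

   @dataclass
   class ManagementAction:
       path_id: int                    # candidate path for new flows
       rate_limit_mbps: float          # traffic-shaping decision

   def reward(state: NetworkState, sla_delay_ms: float = 50.0) -> float:
       """Reward high utilization; penalize SLA violations and loss."""
       avg_util = sum(state.link_utilization) / len(state.link_utilization)
       worst_delay = max(state.queue_delay_ms)
       delay_penalty = max(0.0, worst_delay - sla_delay_ms) / sla_delay_ms
       return avg_util - delay_penalty - 10.0 * state.packet_loss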
In addition, the intent-based network bridges some of the network
problems and the gaps between the network business model and the
technical scheme.  Intents should be applied to application service
levels, security policies, compliance, operational processes, and
other business needs.  The network should constantly monitor and
adjust itself to meet the intent, following the monitoring system.
There are several requirements to satisfy an intent-based network:
(1) transfer of intent, (2) automatic policy activation, and
(3) assurance (continuous monitoring and verification) [Cisco].
Through continuous monitoring of network data, we are able to collect
network information and to analyze the collected information with an
artificial intelligence approach.  If the analysis shows that a
network configuration parameter needs to be changed, it is
reconfigured by deriving the optimized value.
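
A minimal sketch of this monitor, analyze, and reconfigure loop is
shown below.  The telemetry collection, analysis, and configuration
functions are hypothetical placeholders supplied by the operator, not
interfaces defined by this document.

   # Sketch of the continuous monitor -> analyze -> reconfigure loop.
   # The three callables are hypothetical operator-supplied integrations.
   import time
   from typing import Callable, Optional

   def intent_assurance_loop(
       collect_telemetry: Callable[[], dict],
       analyze: Callable[[dict], Optional[dict]],
       apply_configuration: Callable[[dict], None],
       interval_s: float = 10.0,
   ) -> None:
       """Continuously monitor, analyze, and reconfigure the network."""
       while True:
           observed = collect_telemetry()      # continuous monitoring
           proposal = analyze(observed)        # AI-based analysis of the data
           if proposal is not None:            # desired and actual state differ
               apply_configuration(proposal)   # push the derived optimized values
           time.sleep(interval_s)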
2. Conventions and Terminology

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC2119].
3. Theoretical Approaches

3.1. Reinforcement-learning
skipping to change at page 6, line 16
discrete action spaces, and also has the advantage of learning with
both feedforward and recurrent agents. [MS]

The A3C algorithm provides a number of complementary improvements to
the neural network architecture, and it has been shown to produce
accurate estimates of Q-values by including separate streams for the
state value and the advantage in the network, improving both
value-based and policy-based methods by making it easier for the
network to represent feature coordinates [Volodymyr Mnih].
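
For concreteness, the following sketch shows the advantage estimate
commonly used by A2C/A3C-style methods, where the n-step return is
compared against the critic's state value.  This is a standard
textbook formulation, given here as an assumption rather than a
formulation prescribed by this document.

   # Advantage estimation as commonly used by A2C/A3C (illustrative only).
   from typing import List

   def discounted_returns(rewards: List[float], bootstrap_value: float,
                          gamma: float = 0.99) -> List[float]:
       """n-step returns: R_t = r_t + gamma * R_{t+1}, bootstrapped by V(s_T)."""
       returns, running = [], bootstrap_value
       for r in reversed(rewards):
           running = r + gamma * running
           returns.append(running)
       return list(reversed(returns))

   def advantages(returns: List[float], values: List[float]) -> List[float]:
       """A(s_t, a_t) = R_t - V(s_t): drives the actor (policy) update,
       while the critic is trained to reduce (R_t - V(s_t))^2."""
       return [r - v for r, v in zip(returns, values)]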
3.5. Intent-based Network (IBN)
The Intent-based Network is a new technical approach that can adapt
the network flexibly through configuration parameters derived from
data analysis for network machine learning.  Software-defined
Networking (SDN) is a concept similar to the Intent-based Network;
however, Software-defined Networking has not yet tipped into the
sector that relies on network automation.  With this approach,
network machine learning is integrated with network analysis,
routing, wireless communications, and resource management.  However,
unlike the field of computer vision, which can easily acquire
sufficient data, it is difficult to obtain data from a real network.
Therefore, there are limitations in applying machine learning
techniques to the network field with such data.  Reinforcement
Learning (RL) can reduce the dependence on securing high-quality data
in advance, so the two concepts of reinforcement learning and the
intent-based network together might overcome this limitation and
bridge the gap between network machine learning and network
techniques.
The intent-based network also describes how the setting values for
network management/operation are applied: rather than specifying them
in a procedural way, the intent is declared declaratively, and the
core of intent processing is to automatically interpret that
declaration.  Even though the basic concepts of the intent-based
network have been announced with respect to intent, there is no
standardized form of intent processing technology yet.  While the
intent-based network has the advantage of providing a higher level of
abstraction in network management/operation and ease of use, a more
specific and clear definition of the technology is likely to be
needed.
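
The following sketch illustrates how an intent might be expressed
declaratively and verified against the observed network state, in the
spirit of this section.  The intent fields and the observed-state
format are assumptions made for the example.

   # Illustrative declarative intent and verification step (assumed fields).
   from dataclasses import dataclass

   @dataclass
   class Intent:
       max_latency_ms: float
       min_bandwidth_mbps: float

   def intent_satisfied(intent: Intent, observed: dict) -> bool:
       """Verify whether the expressed intent is currently implemented."""
       return (observed["latency_ms"] <= intent.max_latency_ms
               and observed["bandwidth_mbps"] >= intent.min_bandwidth_mbps)

   # If the desired state and the actual state are inconsistent, the
   # management system should take corrective action in real time.
   intent = Intent(max_latency_ms=20.0, min_bandwidth_mbps=100.0)
   print(intent_satisfied(intent, {"latency_ms": 35.0, "bandwidth_mbps": 120.0}))
   # prints False: the latency intent is not currently implemented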
4. Reinforcement-learning-based process scenario
With a single agent or multiple agents trained for intelligent
network management, a variety of training scenarios are possible,
depending on how the agents interact and how many models are linked
to them.  The following are possible RL training scenarios for
network management; they all build on the common agent-environment
loop sketched below.
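
The sketch below shows that common loop.  The Environment and Agent
interfaces are assumptions for illustration, and any RL framework
could be substituted.

   # Minimal agent/environment interaction loop shared by the scenarios
   # below.  The interfaces are illustrative assumptions.
   from typing import Protocol, Tuple, Any

   class Environment(Protocol):
       def reset(self) -> Any: ...
       def step(self, action: Any) -> Tuple[Any, float, bool]: ...  # obs, reward, done

   class Agent(Protocol):
       def act(self, observation: Any) -> Any: ...
       def learn(self, obs: Any, action: Any, reward: float,
                 next_obs: Any, done: bool) -> None: ...

   def train_episode(env: Environment, agent: Agent) -> float:
       """Run one episode; in the multi-agent scenarios, several agents
       would share or maintain separate models while running this loop."""
       obs, total_reward, done = env.reset(), 0.0, False
       while not done:
           action = agent.act(obs)
           next_obs, reward, done = env.step(action)
           agent.learn(obs, action, reward, next_obs, done)
           obs, total_reward = next_obs, total_reward + reward
       return total_reward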
4.1. Single-agent with Single-model
skipping to change at page 9, line 14
converted to high-quality streaming data to rapidly show and detect
the dangerous situation whenever an alert should be raised due to
dangerous elements.  As a technical approach, deep-reinforcement-
learning can provide a solution to automatically detect these kinds
of dangerous situations with prediction in advance.  It can also
transform the data, including the high-rate streaming video, and
quickly prevent other risks.  Deep-reinforcement-learning plays an
important role in efficiently managing and monitoring the given
dataset in real time.
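
As a sketch of how such a trained policy might be applied to the
real-time stream, the following example scores incoming frame
features and raises an alert above a threshold.  The policy, the
feature extraction, and the alert handler are placeholders assumed
for illustration only.

   # Sketch: applying a trained DRL policy to a real-time stream to raise
   # alerts.  The policy and alert callables are assumed placeholders.
   from typing import Callable, Iterable, Sequence

   def monitor_stream(frames: Iterable[Sequence[float]],
                      policy: Callable[[Sequence[float]], float],
                      alert: Callable[[Sequence[float], float], None],
                      threshold: float = 0.8) -> None:
       """Score each incoming frame's features and alert when the
       predicted danger level exceeds the threshold."""
       for features in frames:
           danger = policy(features)       # trained model's predicted risk
           if danger >= threshold:
               alert(features, danger)     # trigger prevention in real time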
5.3. Deep-reinforcement-learning-based remote Control system over a
     software-defined network
A nonlinear control system such as a cyber physical system provides
an unstable system environment in its initial control state due to
its nonlinear nature.  In order to stably control the unstable
initial state, complex prior mathematical control methods (Linear
Quadratic Regulator, Proportional Integral Derivative) are used for
successful control and management, but these approaches require a
difficult mathematical process and considerable effort.  Therefore,
deep-reinforcement-learning can provide a more effective technical
approach without a difficult initial set of control states to
skipping to change at page 10, line 18
   [Figure: the Physical System exchanges data with a Cyber Module
   that hosts the RL Agent]

     Figure 1: DRL-based Cyber Physical Management Control System
In this use case, the reinforcement learning agent interacts with the
physical remote device while exchanging network packets.  The
software-defined network controller can manage the network traffic
transmission, so the system is naturally composed of a cyber
environment and a physical environment, and the two environments
operate closely and synchronously. [Ju-Bong]
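
A minimal sketch of the cyber-side interaction is given below, with
the RL agent exchanging state and action messages with the remote
physical device over the network.  The address, port, and JSON
message format are assumptions made for this example.

   # Sketch: cyber-side RL agent exchanging packets with the physical
   # device.  Address, port, and message schema are illustrative.
   import json
   import socket

   def remote_control_step(agent_act, sock: socket.socket,
                           device_addr=("192.0.2.10", 9000)) -> float:
       """Receive one state report from the physical device, reply with
       an action, and return the reward reported with the observation."""
       data, _ = sock.recvfrom(4096)
       state = json.loads(data.decode())          # observation from the device
       action = agent_act(state["observation"])   # cyber-module policy decision
       sock.sendto(json.dumps({"action": action}).encode(), device_addr)
       return state.get("reward", 0.0)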
For intelligent traffic management in the system, software-defined
networking for automation (a basic concept of IBN) should be used to
control and manage the connection between the cyber physical system
and the edge computing module.  The intelligent approach consists of
software that intelligently controls the network and a technique that
allows software to set up and control the network.  The concept
centralizes control of network operation through software
programming, centralizing the switch/router control functions that
were based on existing hardware.  It is possible to manage the
network according to the requirements without detailed network
configuration.
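
The following hypothetical sketch shows this programmatic, centralized
style of control: a high-level requirement is translated into a flow
rule and pushed to a controller's northbound interface.  The URL and
payload schema are illustrative assumptions and do not correspond to
any particular controller product.

   # Hypothetical sketch of pushing a flow rule to a controller's
   # northbound REST interface.  URL and schema are assumptions.
   import json
   import urllib.request

   def push_flow_rule(controller_url: str, match: dict, action: dict) -> int:
       """POST one flow rule to the (assumed) controller REST endpoint."""
       payload = json.dumps({"match": match, "action": action}).encode()
       req = urllib.request.Request(controller_url, data=payload,
                                    headers={"Content-Type": "application/json"})
       with urllib.request.urlopen(req) as resp:
           return resp.status

   # Example: prioritize the cyber-physical control traffic on UDP port 9000.
   # push_flow_rule("http://192.0.2.1:8181/flows",
   #                match={"ip_proto": "udp", "udp_dst": 9000},
   #                action={"set_queue": "high-priority"})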
In addition, a software-defined networking switch enables network
traffic to be controlled and managed by software-based controllers.
This approach is very similar to intent-based networking, since both
approaches share the same principle of using software to run the
network; however, intent-based networking offers an abstraction layer
in which the policy and instructions are implemented across all the
physical hardware within the infrastructure for automated networking.
To achieve superior intent-based networking over a real network, the
physical control system will be implemented to automatically manage
and provide an IoE edge smart traffic control service for a
high-quality real-time connection.
6. IANA Considerations

There are no IANA considerations related to this document.

7. Security Considerations

[TBD]

8. References