NFVRG                                                        C. Meirosu
Internet Draft                                                 Ericsson
Intended status:  Informational                            A. Manzalini
Expires: April 2015                                      Telecom Italia
                                                                 J. Kim
                                                       Deutsche Telekom
                                                            R. Steinert
                                                              S. Sharma
                                                           G. Marchetto
                                                  Politecnico di Torino

                                                       October 27, 2014

            DevOps for Software-Defined Telecom Infrastructures

Status of this Memo

   This Internet-Draft is submitted in full conformance with the
   provisions of BCP 78 and BCP 79.

   Internet-Drafts are working documents of the Internet Engineering
   Task Force (IETF), its areas, and its working groups.  Note that
   other groups may also distribute working documents as Internet-
   Drafts.

   Internet-Drafts are draft documents valid for a maximum of six months
   and may be updated, replaced, or obsoleted by other documents at any
   time.  It is inappropriate to use Internet-Drafts as reference
   material or to cite them other than as "work in progress."

   The list of current Internet-Drafts can be accessed at

   The list of Internet-Draft Shadow Directories can be accessed at

   This Internet-Draft will expire on April 27, 2015.

Meirosu, et al.         Expires April 27, 2015                 [Page 1]

Internet-Draft            DevOps Challenges                October 2014

Copyright Notice

   Copyright (c) 2014 IETF Trust and the persons identified as the
   document authors. All rights reserved.

   This document is subject to BCP 78 and the IETF Trust's Legal
   Provisions Relating to IETF Documents
   (http://trustee.ietf.org/license-info) in effect on the date of
   publication of this document. Please review these documents
   carefully, as they describe your rights and restrictions with respect
   to this document. Code Components extracted from this document must
   include Simplified BSD License text as described in Section 4.e of
   the Trust Legal Provisions and are provided without warranty as
   described in the Simplified BSD License.


Abstract

   The introduction of virtualization technologies, starting from the
   physical layer and going all the way up to the application plane, is
   transforming the telecom network infrastructure into an agile,
   model-driven production environment for communication services.
   Carrier-grade network management was optimized for environments built
   with monolithic physical nodes and requires significant deployment,
   integration and maintenance efforts from network service providers.
   The DevOps movement in the data center is a source of inspiration
   regarding how to simplify and automate management processes for
   software-defined infrastructure. This first version of this draft
   identifies three areas that we consider key to applying DevOps
   principles in a telecom service provider environment, namely the
   monitoring, verification and troubleshooting processes. Finally, we
   introduce challenges associated with operationalizing DevOps
   principles at scale in software-defined telecom networks.

Table of Contents

   1. Introduction
   2. Conventions used in this document
   3. DevOps Principles for Software-Defined Telecom Infrastructure
   4. Stability Challenges
   5. Observability Challenges
   6. Verification Challenges
   7. Troubleshooting Challenges
   8. Security Considerations
   9. IANA Considerations
   10. References
      10.1. Normative References
      10.2. Informative References
   11. Acknowledgments

1. Introduction

   Carrier-grade network management was developed as an incremental
   solution once a particular network technology matured and came to be
   deployed in parallel with legacy technologies. This approach requires
   significant integration efforts when new network services are
   launched. Both centralized and distributed algorithms have been
   developed in order to solve very specific problems related to
   configuration, performance or fault management. However, such
   algorithms consider a network that is by and large functionally
   static. Thus, management processes for introducing new functionality
   or maintaining existing functionality are complex and costly, due to
   the significant effort required for verification and integration.

   Network virtualization, by means of Software-Defined Networking (SDN)
   and Network Function Virtualization (NFV), is creating an environment
   where network functions are no longer static and embedded into
   physical boxes deployed at fixed points. The virtualized network is
   dynamic and open to fast-paced innovation, enabling efficient network
   management and reduced operating costs for network operators. A
   significant part of network capabilities is expected to become
   available through interfaces that resemble the APIs widespread within
   data centers, instead of the traditional telecom means of management
   such as the Simple Network Management Protocol, Command Line
   Interfaces or CORBA. Such an API-based approach, combined with the
   programmability offered by SDN interfaces
   [I-D.draft-irtf-sdnrg-layer-terminology-04], opens opportunities for
   handling infrastructure, resources, and Virtual Network Functions
   (VNFs) as code, employing techniques from software engineering.

   The efficiency and integration of existing management techniques in
   virtualized and dynamic network environments are limited, however.
   Monitoring tools, e.g., those based on simple counters, physical
   network taps and active probing, scale poorly and provide only a
   small part of the observability features required in such a dynamic
   environment. Large amounts of monitoring data can be collected from
   the nodes, but typically at coarse granularity. Although debugging
   and troubleshooting techniques for software-defined environments have
   gathered interest in the research community in recent years, it is
   yet to be explored how to integrate them into an operational network
   management system. Moreover, tools that have been developed in
   academia are limited to solving very particular, well-defined
   problems, and were not built for automation and integration into
   network operations.

   We acknowledge that several standardization organizations have a
   stake in this area. IETF working groups have activities in the area
   of OAM [I-D.draft-aldrin-sfc-oam-framework] and Verification
   [I-D.draft-lee-sfc-verification-00] for Service Function Chaining. At
   IRTF, the authors of [RFC7149] ask a set of relevant questions
   regarding operations of SDNs. The ETSI NFV ISG defines the MANO
   interfaces [NFVMANO], and TMForum investigates gaps between these
   interfaces and existing specifications in [TR228]. The need for
   programmatic APIs in the orchestration of compute, network and
   storage resources is discussed in

   From a research perspective, problems related to operations of
   software-defined networks are in part outlined in [SDNsurvey] and
   research referring to both cloud and software-defined networks is
   outlined by the EU FP7 UNIFY project in [D4.1].

   The purpose of this first version of this document is to act as a
   discussion opener in NFVRG by describing a set of principles that are
   relevant for applying DevOps ideas to managing software-defined
   telecom network infrastructures. We identify challenges related to
   developing tools, interfaces and protocols that would support these
   principles and leverage standard APIs for simplifying management

2. Conventions used in this document

   The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
   "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
   document are to be interpreted as described in RFC-2119 [RFC2119].

   In this document, these words will appear with that interpretation
   only when in ALL CAPS. Lower case uses of these words are not to be
   interpreted as carrying RFC-2119 significance.

3. DevOps Principles for Software-Defined Telecom Infrastructure

   In an Internet company, an agile developer is focused on releasing
   small iterations of their code with high velocity and high quality
   into a production environment. The code needs to undergo a
   significant amount of automated testing and verification with
   pre-defined templates in a realistic setting. From the point of view
   of infrastructure management, the verification of the network
   configuration resulting from network policy decomposition and
   refinement, as well as the configuration of virtual functions, is one
   of the most sensitive operations. When troubleshooting the cause of
   unexpected behavior, fine-grained visibility into all resources
   supporting the virtual functions (either compute- or network-related)
   is paramount to achieving fast resolution times. While compute
   resources are typically well covered by debugging and profiling
   toolsets based on many years of advances in software engineering,
   programmable network resources are still a novelty and tools
   exploiting their potential are scarce.

   We identify two dimensions of the "developer" role in software-
   defined infrastructure. One dimension refers to the person that
   determines which high-level functions should be part of a particular
   service, decides what logical interconnections are needed between
   these blocks and defines a set of high-level constraints or goals
   related to parameters that define a Service Function Chain. This
   person might be the product owner for a particular family of services
   offered by a telecom provider. They might be a key account
   representative that adapts an existing service template to the
   requirements of a particular customer by adding or removing a small
   number of functional entities. We refer to this person as the Service
   Developer and for simplicity (access control, training on technical
   background, etc.) we consider the role to be internal to the telecom
   provider. The other dimension of the "developer" role is a person
   that writes the software code for a new virtual network function.
   Depending on the actual virtual network function being developed,
   this person might be internal or external to the telecom provider. We
   refer to them as VNF Developers.

   The role of an Operator in software-defined infrastructure is to
   ensure that the deployment processes were successful and that a set
   of performance indicators associated with a service are met while the
   service is supported on virtual infrastructure within the domain of a
   telecom provider.

   In line with the generic DevOps concept outlined in [DevOpsP], we
   consider the following four principles important for adapting
   DevOps ideas to software-defined infrastructure:

   * Deploy with repeatable, reliable processes: Service and VNF
   Developers should be supported by automated build, orchestration and
   deployment processes that are identical in the development, test and
   production environments. Such processes need to be made reliable and
   trusted in the sense that they should reduce the chance of human
   error and provide visibility at each stage of the process, while
   still allowing manual interaction at certain key stages.

   * Develop and test against production-like systems: both Service
   Developers and VNF Developers need to have the opportunity to verify
   and debug their respective code in systems that have characteristics
   which are very close to the production environment where the code is
   expected to be ultimately deployed. Customizations of Service
   Function Chains or VNFs could thus be released frequently to a
   production environment in compliance with policies set by the
   Operators. Adequate isolation and protection of the services active
   in the infrastructure from services being tested or debugged should
   be provided by the production environment.

   * Monitor and validate operational quality: Service Developers, VNF
   Developers and Operators must be equipped with tools, automated as
   much as possible, that enable them to continuously monitor the
   operational quality of the services deployed on software-defined
   infrastructure, as well as the infrastructure itself. Monitoring
   tools should be complemented by tools that allow verifying and
   validating the operational quality of the service in line with
   established procedures, which might be standardized (for example,
   Y.1564 Ethernet service activation testing [Y1564]) or defined
   through best practices specific to a particular telecom operator.

   * Amplify feedback loops: An integral part of the DevOps ethos is
   building a cross-cultural environment that bridges the cultural gap
   between the Developers' desire for continuous change and the
   Operators' wish for stability and reliability of the infrastructure,
   and in which feedback from customers is collected and transmitted
   throughout the organization. From a technical
   perspective, such cultural aspects could be addressed through common
   sets of tools and APIs that are aimed at providing a vocabulary
   common to Developers and Operators, as well as simplifying the
   reproduction of problematic situations in the development, test and
   operations environments.

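   The first principle can be sketched as one ordered list of stages that
   is run unchanged against every environment, with each stage recording
   its outcome and an optional manual approval gate in front of selected
   stages. This is a minimal illustration under stated assumptions: the
   names Environment, make_stage, run_pipeline and the approval callback
   are hypothetical, and the placeholder stages only log what a real
   build, orchestration or deployment step would do.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Environment:
    """Hypothetical target environment: development, test or production."""
    name: str
    log: List[str] = field(default_factory=list)


# A stage receives the environment and an artifact name; returns True on success.
Stage = Callable[[Environment, str], bool]


def make_stage(action: str) -> Stage:
    """Build a placeholder stage that only records what it would have done."""
    def stage(env: Environment, artifact: str) -> bool:
        env.log.append(f"{env.name}: {action} {artifact}")
        return True
    return stage


# The same ordered stages are used for every environment.
PIPELINE: List[Tuple[str, Stage]] = [
    ("build", make_stage("built")),
    ("orchestrate", make_stage("orchestrated")),
    ("deploy", make_stage("deployed")),
]


def run_pipeline(env: Environment, artifact: str,
                 gates: Optional[Set[str]] = None,
                 approve: Optional[Callable[[Environment, str], bool]] = None) -> bool:
    """Run every stage in order; stop on a failed stage or a refused manual gate."""
    for name, stage in PIPELINE:
        # Manual interaction point: a gated stage runs only if approve() agrees.
        if gates and name in gates and not (approve and approve(env, name)):
            env.log.append(f"{env.name}: {name} held for manual approval")
            return False
        if not stage(env, artifact):
            env.log.append(f"{env.name}: {name} failed")
            return False
    return True
```

   For example, running the pipeline against a "test" environment
   executes all three stages, while running it against "production" with
   gates={"deploy"} and an approval callback that refuses stops before
   deployment and records the reason in the log. Keeping a single stage
   list, rather than one per environment, is what makes the process
   repeatable across development, test and production.
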
4. Stability Challenges

   The dimensions, dynamicity and heterogeneity of networks are growing
   continuously. Monitoring and managing the network behavior in order
   to meet technical and business objectives is becoming more and more
   complicated and challenging, even more so when considering the need
   to predict and tame potential instabilities.

   In general, instability in networks may both jeopardize performance
   and compromise the optimized use of resources, even across multiple
   layers: in fact, the instability of end-to-end communication paths
   may depend both on the underlying transport network and on the
   higher-level components specific to flow control and dynamic
   routing. For example, arguments for introducing advanced flow
   admission control are essentially derived from the observation that
   the network otherwise behaves in an inefficient and potentially
   unstable manner. Even with over-provisioned resources, a network
   without efficient flow admission control has instability regions
   that can even lead to congestion collapse in certain configurations.
   Another example is the instability which is
   characteristic of any dynamically adaptive routing system. Routing
   instability, which can be (informally) defined as the quick change of
   network reachability and topology information, has a number of
   possible origins, including problems with connections, router
   failures, high levels of congestion, software configuration errors,
   transient physical and data link problems, and software bugs.

   As a matter of fact, the states monitored and used to implement the
   different control and management functions in network nodes are
   governed by several low-level configuration commands, which today are
   still mostly written by hand. There are several dependencies among
   these states and the logic updating them, and most of these
   dependencies are not kept aligned automatically. Normally, high-level
   network goals (e.g., connectivity matrix, load-balancing, traffic
   engineering goals, survivability requirements, etc.) are translated
   into low-level configuration commands (mostly hand-written)
   individually executed on the network elements (e.g., forwarding
   tables, packet filters, link-scheduling weights, and queue-management
   parameters, as well as tunnels and NAT mappings). Network
   instabilities due to configuration errors can spread from node to
   node and propagate throughout the network.

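   As an illustration of how one high-level goal fans out into per-node
   state that must remain mutually consistent, the sketch below compiles
   a connectivity goal along a chosen path into per-node next-hop entries
   and then checks reachability by walking those entries. The topology,
   node names and data layout are hypothetical, and real translations
   involve many more kinds of state (filters, queues, tunnels). A single
   hand-edited entry on one node is enough to turn a correct
   configuration into a forwarding loop or a black hole.

```python
from typing import Dict, List

# Hypothetical per-node forwarding state: node -> {destination prefix -> next hop}.
Fib = Dict[str, Dict[str, str]]


def compile_path(path: List[str], prefix: str) -> Fib:
    """Translate the goal 'reach prefix along path' into per-node next-hop entries."""
    fib: Fib = {}
    for hop, nxt in zip(path, path[1:]):
        fib.setdefault(hop, {})[prefix] = nxt
    return fib


def walk(fib: Fib, src: str, prefix: str, egress: str) -> str:
    """Follow next-hop entries from src and report 'ok', 'loop' or 'blackhole'."""
    node, seen = src, set()
    while node != egress:
        if node in seen:
            return "loop"           # revisited a node: packets would circulate
        seen.add(node)
        nxt = fib.get(node, {}).get(prefix)
        if nxt is None:
            return "blackhole"      # no entry: packets would be dropped here
        node = nxt
    return "ok"
```

   For example, compile_path(["A", "B", "C", "D"], "10.0.0.0/24") yields
   entries on A, B and C, and walking from A to egress D returns "ok";
   manually pointing C's entry back at B makes the same walk return
   "loop", even though every other node is still configured correctly.
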
   DevOps in the data center is a source of inspiration regarding how to
   simplify and automate management processes for software-defined
   infrastructure.

   As a specific example, automated configuration functions are expected
   to take the form of a "control loop" that monitors (i.e., measures)
   current states of the network, performs a computation, and then
   reconfigures the network. These types of functions must work
   correctly even in the presence of failures, variable delays in
   communicating with a distributed set of devices, and frequent changes
   in network conditions. Nevertheless, cascading and nesting of
   automated configuration processes can lead to the emergence of
   non-linear network behaviors, and hence to sudden instabilities
   (i.e., identical local dynamics can give rise to widely different
   global dynamics).

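   The sensitivity of such a control loop to measurement delay can be
   shown with a deliberately simple model, assuming a single scalar state
   x (for example, a hypothetical capacity share) that the loop drives
   toward a target by x[t+1] = x[t] + k * (target - x[t-d]), where the
   measurement used is d steps old. With gain k = 1.2 and no delay, the
   error is multiplied by 1 - k = -0.2 at each step and dies out; with a
   one-step delay, the characteristic polynomial z^2 - z + k has complex
   roots of modulus sqrt(k) > 1, so the same loop oscillates with growing
   amplitude. The model and parameter values are illustrative only.

```python
from collections import deque
from typing import List


def run_loop(gain: float, delay: int, target: float = 1.0, steps: int = 40) -> List[float]:
    """Monitor-compute-reconfigure loop acting on measurements `delay` steps old."""
    x = 0.0
    # history[0] is the oldest retained state, i.e. what the monitor reports now.
    history = deque([x] * (delay + 1), maxlen=delay + 1)
    trace = [x]
    for _ in range(steps):
        measured = history[0]                  # monitor: possibly stale measurement
        x = x + gain * (target - measured)     # compute and reconfigure
        history.append(x)                      # oldest entry drops out automatically
        trace.append(x)
    return trace
```

   With these assumptions, run_loop(1.2, 0) settles at the target within
   a few steps, whereas run_loop(1.2, 1) oscillates with growing
   amplitude; lowering the gain to 0.5 restores convergence with the
   one-step delay, at the cost of a slower reaction. One plausible
   reading of the nesting problem is that each loop's actions become
   delayed inputs to the others, which compounds this effect.
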
   The CAP theorem [CAP] states that any networked shared-data system
   can have at most two of the following three properties: 1)
   consistency (C), equivalent to having a single up-to-date copy of the
   data; 2) high availability (A) of that data (for updates); and 3)
   tolerance to network partitions (P). Looking at a telecom
   software-defined infrastructure as a distributed computational system
   (routing/forwarding packets can be seen as a computational problem),
   only two of the three CAP properties can be achieved at the same
   time. This has profound implications for the technologies that need
   to be developed in line with the "deploy with repeatable, reliable
   processes" principle for configuring the states of the
   software-defined infrastructure. Latency (or delay) and partitioning
   are deeply related, and this relation becomes more important in the
   case of telecom service providers, where Devs and Ops interact with
   widely distributed infrastructure. The limitations of interactions
   between centralized management and distributed control need to be
   carefully examined in such environments.

5. Observability Challenges

   Monitoring algorithms need to operate in a scalable manner while
   providing the specified level of observability in the network, either
   for operation purposes (Ops part) or for debugging in a development
   phase (Dev part). We consider the following challenges:

   * Scalability - relates to the granularity of network observability,
   computational efficiency, communication overhead, and strategic
   placement of monitoring functions.

   * Distributed operation and information exchange between monitoring
   functions - monitoring functions supported by the nodes may perform
   specific operations (such as aggregation or filtering) locally on the
   collected data or within a defined data neighborhood and forward only
   the result to a management system. Such operation may require
   modifications of existing standards and development of protocols for
   efficient information exchange and messaging between monitoring
   functions. Different levels of granularity may need to be offered for
   the data exchanged through the interfaces, depending on the Dev or
   Ops role.

   * Configurability and conditional observability - monitoring
   functions that go beyond measuring simple metrics (such as delay, or
   packet loss) require expressive monitoring annotation languages for
   describing the functionality such that it can be programmed by a
   controller. Monitoring algorithms implementing self-adaptive
   monitoring behavior relative to local network situations may employ
   such annotation languages to receive high-level objectives (KPIs
   controlling tradeoffs between accuracy and measurement frequency, for
   example) and conditions for varying the measurement intensity.

   * Automation - includes mapping of monitoring functionality from a
   logical forwarding graph to virtual or physical instances executing
   in the infrastructure, as well as placement and re-placement of
   monitoring functionality for required observability coverage and
   configuration consistency upon updates in a dynamic network
   environment.

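   The distributed and conditional observability challenges can be
   illustrated with a node-local monitoring function that aggregates raw
   samples over a short window, forwards only a summary when a
   controller-supplied objective is violated, and adapts its own
   measurement interval to how close the local situation is to that
   objective. This is a minimal sketch: the LocalMonitor name, the 80%
   proximity rule, the window size and the interval bounds are
   illustrative assumptions, not a proposed annotation language.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import List, Optional, Tuple


@dataclass
class LocalMonitor:
    """Hypothetical node-local monitoring function with an adaptive measurement rate."""
    threshold: float                  # objective pushed down by a controller
    min_interval: float = 1.0         # lower bound on seconds between measurements
    max_interval: float = 16.0        # upper bound on seconds between measurements
    window: int = 4                   # samples kept for local aggregation
    samples: List[float] = field(default_factory=list)
    interval: float = field(init=False, default=0.0)

    def __post_init__(self) -> None:
        self.interval = self.max_interval   # start at the cheapest measurement rate

    def observe(self, value: float) -> Optional[Tuple[float, float]]:
        """Aggregate locally; return (mean, max) for the manager only on a violation."""
        self.samples = (self.samples + [value])[-self.window:]
        # Conditional observability: measure more often close to the threshold,
        # back off when the local situation is far from it.
        if value >= 0.8 * self.threshold:
            self.interval = max(self.min_interval, self.interval / 2)
        else:
            self.interval = min(self.max_interval, self.interval * 2)
        avg = mean(self.samples)
        # Only a summary leaves the node, and only when the objective is violated.
        return (avg, max(self.samples)) if avg >= self.threshold else None
```

   Feeding the samples 2, 9, 12, 15, 15 to a monitor with threshold 10
   produces no reports for the first four samples, while the interval
   shrinks from 16 to 1 as values approach the threshold; the fifth
   sample pushes the windowed mean to 12.75 and a single summary is sent.
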
6. Verification Challenges

   Enabling ongoing verification of code is an important goal of
   continuous integration as part of the data center DevOps concept. In
   a software-defined telecom infrastructure, service definitions,
   decompositions and configurations need to be expressed in machine-
   readable encodings. For example, configuration parameters could be
   expressed in terms of YANG models. It is acknowledged that the
   infrastructure management layers (such as Software-Defined Network
   Controllers and Orchestration software) might not always export such
   machine-readable descriptions of the runtime configuration state. In
   this case, the management layer itself could be expected to include a
   verification process that has the same challenges as the stand-alone
   verification processes we outline below. In that sense, verification
   can be considered as a set of features providing gatekeeper functions
   to verify both the abstract service models and the proposed resource
   configuration before actual instantiation on the infrastructure layer
   takes place.

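   The gatekeeper role can be sketched as a set of checks run against a
   machine-readable service model and a proposed placement before
   anything is instantiated. In this minimal sketch the model is a plain
   dictionary standing in for YANG-modelled data, and the two checks
   (every VNF in the chain is placed; no node exceeds its vCPU capacity)
   are illustrative assumptions rather than a proposed rule set.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical machine-readable inputs; in practice these could be YANG-modelled data.
ServiceModel = Dict[str, Any]      # {"chain": [vnf, ...], "vcpus": {vnf: int}}
Placement = Dict[str, str]         # VNF name -> infrastructure node
Capacity = Dict[str, int]          # infrastructure node -> free vCPUs


def all_vnfs_placed(model: ServiceModel, placement: Placement,
                    capacity: Capacity) -> List[str]:
    """Structural check: every VNF of the chain must be mapped to some node."""
    return [f"{vnf} not placed" for vnf in model["chain"] if vnf not in placement]


def capacity_respected(model: ServiceModel, placement: Placement,
                       capacity: Capacity) -> List[str]:
    """Quantitative check: summed vCPU demand per node must fit its capacity."""
    used: Dict[str, int] = {}
    for vnf, node in placement.items():
        used[node] = used.get(node, 0) + model["vcpus"][vnf]
    return [f"{node} over capacity ({u} > {capacity.get(node, 0)})"
            for node, u in sorted(used.items()) if u > capacity.get(node, 0)]


CHECKS = [all_vnfs_placed, capacity_respected]


def gatekeeper(model: ServiceModel, placement: Placement,
               capacity: Capacity) -> Tuple[bool, List[str]]:
    """Run all checks before instantiation; only an empty error list lets it proceed."""
    errors = [err for check in CHECKS for err in check(model, placement, capacity)]
    return (not errors, errors)
```

   For a chain of fw (2 vCPUs), nat (1) and ids (4) on two nodes with 4
   free vCPUs each, placing fw and nat on one node and ids on the other
   passes, whereas omitting nat and stacking fw and ids on the same node
   is blocked with two errors. Further checks, such as the topological
   ones discussed next, could be appended to CHECKS without changing the
   gatekeeper itself.
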
   A verification process can involve different layers of the
   architecture. Starting from a high-level verification of the customer
   input (for example, a Service Graph), the verification process could
   go more in depth to reflect on the service chain configuration. At
   the lowest layer, the verification would handle the actual set of
   forwarding rules and other configuration parameters associated with
   the service chain. This enables the verification of more quantitative
   properties (e.g. compliance with resource availability), as well as a
   more detailed and precise verification of the abovementioned
   topological ones. Existing verification tools for the SDN scenario
   could be deployed in this context, but the majority of them only
   operate on network configuration rules (commonly OpenFlow), and none
   of them considers active network functions (i.e., VNFs or middleboxes
   that dynamically change the forwarding path of a flow according to
   local algorithms, e.g., load balancers, packet marking modules and
   intrusion detection systems). Defining a set of
   verification tools that can account for network function
   virtualization is a significant challenge. In order to perform
   verification based on formal properties of the system, the internal
   states of a virtual network function would need to be represented and
   perhaps summarized in a way that allows for the verification process
   to finish within a reasonable time interval.

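   To make the difficulty with active functions concrete, the sketch
   below checks whether every path through a forwarding graph traverses
   the VNFs in the order required by the service chain. A static rule has
   one next hop; a load balancer is modelled as a node whose next hop
   depends on internal state the verifier cannot observe, so all of its
   choices are explored. The data model and function names are
   hypothetical. Even this crude over-approximation can grow with the
   product of the branching factors along a path, which hints at why VNF
   state would need to be summarized for verification to finish in
   reasonable time.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical forwarding graph: node -> possible next hops. A static rule has
# one next hop; an active function such as a load balancer may use any of several.
Graph = Dict[str, Set[str]]
Hosted = Dict[str, str]  # node -> VNF type hosted on that node


def traversals(graph: Graph, hosted: Hosted, ingress: str,
               egress: str) -> List[Tuple[str, ...]]:
    """Enumerate the VNF sequence seen on every possible path from ingress."""
    results: List[Tuple[str, ...]] = []

    def dfs(node: str, path: List[str]) -> None:
        if node == egress:
            results.append(tuple(hosted[n] for n in path if n in hosted))
            return
        nexts = graph.get(node, set())
        if not nexts:
            results.append(("<blackhole>",))   # no rule: traffic dropped here
            return
        for nxt in sorted(nexts):              # explore every choice, not just one
            if nxt in path:
                results.append(("<loop>",))
            else:
                dfs(nxt, path + [nxt])

    dfs(ingress, [ingress])
    return results


def chain_holds(graph: Graph, hosted: Hosted, ingress: str, egress: str,
                chain: List[str]) -> bool:
    """True only if every possible path visits exactly the chain's VNFs, in order."""
    paths = traversals(graph, hosted, ingress, egress)
    return bool(paths) and all(p == tuple(chain) for p in paths)
```

   In a graph where a load balancer spreads traffic over two firewall
   instances and only one of them forwards to the intrusion detection
   system, the chain lb, fw, ids fails verification because one branch
   yields only lb, fw; redirecting the second firewall to the IDS makes
   every branch comply.
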
7. Troubleshooting Challenges

   One of the problems brought up by the complexity introduced by NFV
   and SDN is pinpointing the cause of a failure in an infrastructure
   that is under continuous change. Developing an agile and
   low-maintenance debugging mechanism for an architecture that is
   composed of multiple layers and discrete components is a particularly
   challenging task. Verification, observability, and probe-based tools
   are key to troubleshooting processes, regardless of whether they are
   followed by Dev or Ops personnel.

   * Automated troubleshooting workflows

   Failure is a frequently occurring event in network operation.
   Therefore, it is crucial to monitor components of the system
   periodically. Moreover, the troubleshooting system should search for
   the cause automatically in the case of failure. If the system follows
   a multi-layered architecture, monitoring and debugging actions should
   be performed on components from the topmost layer to the bottom layer
   in a chain. Likewise, the results of these operations should be
   reported in reverse order. In this regard, one should be able to
   define monitoring and debugging actions through a common interface
   that employs layer-hopping logic. In addition, this interface should
   allow fine-grained and automatic on-demand control for the
   integration of other monitoring and verification mechanisms and tools.

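   The layer-hopping interface described above can be sketched as a
   chain of layers that all expose the same check operation: the
   troubleshooter acts on the topmost layer first, hops downward, and
   assembles the results in reverse order so that the lowest layer is
   reported first. The layer names, the Layer type and the heuristic that
   the lowest failing layer is the most likely root cause are assumptions
   made for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Layer:
    """Hypothetical layer exposing the common troubleshooting interface."""
    name: str
    check: Callable[[], bool]           # monitoring/debugging action for this layer
    lower: Optional["Layer"] = None     # next layer down, if any


def troubleshoot(layer: Optional[Layer]) -> List[Tuple[str, bool]]:
    """Act top-down, hop to the layer below, and report results bottom-up."""
    if layer is None:
        return []
    ok = layer.check()                      # act on this layer first (top-down)
    below = troubleshoot(layer.lower)       # then hop to the layer below
    return below + [(layer.name, ok)]       # lowest layer ends up first in the report


def root_cause(report: List[Tuple[str, bool]]) -> Optional[str]:
    """Simplifying assumption: the lowest failing layer is the likely origin."""
    return next((name for name, ok in report if not ok), None)
```

   With service, orchestration, VNF and infrastructure layers in which
   the VNF and infrastructure checks fail, the checks run from service
   down to infrastructure, the report lists infrastructure first, and
   root_cause points at the infrastructure layer.
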
   * Troubleshooting with active measurement methods

   Besides detecting network changes based on passively collected
   information, active probes measuring delay, network utilization and
   loss rate are important for debugging errors and for evaluating the
   performance of network elements. While tools that are effective in
   determining such conditions for particular technologies have been
   defined by the IETF and other standardization organizations, their
   use requires a significant amount of manual labor in terms of both
   configuration and interpretation of the results. In contrast, methods
   that test and debug networks systematically, based on models
   generated from the router configuration, router interface tables or
   forwarding tables, would significantly simplify management. They
   could be made usable by Dev personnel who have little expertise in
   diagnosing network defects. Such tools naturally lend themselves to
   integration into complex troubleshooting workflows that could be
   generated automatically based on the description of a particular
   service chain. However, there are scalability challenges associated
   with deploying such tools in a network. Some tools may poll each
   networking device for its forwarding table information in order to
   calculate the minimum number of test packets to be transmitted in the
   network. Therefore, as the network size and the forwarding table size
   increase, forwarding table updates for the tools may put a
   non-negligible load on the network.

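   Selecting a small set of test packets from polled forwarding state can
   be framed as a set-cover problem: each candidate packet exercises a
   set of forwarding rules, and the goal is to exercise every rule with
   as few packets as possible. Exact minimum set cover is NP-hard, so the
   sketch below substitutes the standard greedy approximation; the data
   layout and the select_test_packets name are assumptions for
   illustration. Any forwarding table update changes the covers map and
   may force a new poll and recomputation, which is one source of the
   load discussed above as the network grows.

```python
from typing import Dict, FrozenSet, List, Set


def select_test_packets(covers: Dict[str, FrozenSet[str]]) -> List[str]:
    """Greedy set cover: pick test packets until every forwarding rule is exercised.

    covers maps a candidate test packet to the set of rules it would exercise,
    as computed from the forwarding tables polled from each device.
    """
    uncovered: Set[str] = set().union(*covers.values())
    chosen: List[str] = []
    while uncovered:
        # Packet exercising the most still-uncovered rules; ties broken by name.
        best = max(sorted(covers), key=lambda p: len(covers[p] & uncovered))
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

   With four candidate packets covering rules {r1, r2, r3}, {r3, r4},
   {r4, r5} and {r1, r5}, the greedy choice picks the first packet and
   then the third, exercising all five rules with two packets.
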
8. Security Considerations


9. IANA Considerations

   This memo includes no request to IANA.

10. References

10.1. Normative References

   [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
             Requirement Levels", BCP 14, RFC 2119, March 1997.

10.2. Informative References

   [NFVMANO] ETSI, "Network Function Virtualization (NFV) Management
             and Orchestration V0.6.1 (draft)", Jul. 2014

   [I-D.draft-aldrin-sfc-oam-framework]   S. Aldrin, C. Pignataro, N.
             Akiya. "Service Function Chaining Operations,
             Administration and Maintenance Framework", draft-aldrin-
             sfc-oam-framework-00, (work in progress), July 2014.

   [I-D.draft-lee-sfc-verification-00] S. Lee and M. Shin. "Service
             Function Chaining Verification", draft-lee-sfc-
             verification-00, (work in progress), February 2014.

   [I-D.draft-irtf-sdnrg-layer-terminology-04] E. Haleplidis (Ed.), K.
             Pentikousis (Ed.), S. Denazis, J. Hadi Salim, D. Meyer, and
             O. Koufopavlou, "SDN Layers and Architecture Terminology",
             Internet Draft, draft-haleplidis-sdnrg-layer-terminology-04
             (work in progress), October 2014

   [RFC7149] M. Boucadair, C. Jacquenet. "Software-Defined Networking: A
             Perspective from within a Service Provider Environment",
             RFC 7149, March 2014.

   [TR228]   TMForum Gap Analysis Related to MANO Work. TR228, May 2014

   [I-D.draft-unify-nfvrg-challenges-00]  R. Szabo et al. "Unifying
             Carrier and Cloud Networks: Problem Statement and
             Challenges", draft-unify-nfvrg-challenges-00 (work in
             progress), October 2014

   [D4.1]    W. John et al. D4.1 Initial requirements for the SP-DevOps
             concept, universal node capabilities and proposed tools,
             August 2014.

   [SDNsurvey] D. Kreutz, F. M. V. Ramos, P. Verissimo, C. Esteve
             Rothenberg, S. Azodolmolky, S. Uhlig. "Software-Defined
             Networking: A Comprehensive Survey." To appear in
             proceedings of the IEEE, 2015.

   [DevOpsP] "DevOps, the IBM Approach" 2013. [Online].

   [Y1564]   ITU-T Recommendation Y.1564: Ethernet service activation
             test methodology, March 2011

   [CAP]     E. Brewer, "CAP twelve years later: How the "rules" have
              changed", IEEE Computer, vol. 45, no. 2, pp. 23-29, Feb. 2012.

11. Acknowledgments

   This work is supported by FP7 UNIFY, a research project partially
   funded by the European Community under the Seventh Framework Program
   (grant agreement no. 619609).  The views expressed here are those of
   the authors only. The European Commission is not liable for any use
   that may be made of the information in this document.

   We would like to thank in particular the UNIFY WP4 contributors, the
   internal reviewers of the UNIFY WP4 deliverables and Konstantinos
   Pentikousis from EICT, for the useful discussions and insightful
   comments.

   This document was prepared using 2-Word-v2.0.template.dot.

Authors' Addresses

   Catalin Meirosu
   Ericsson Research
   S-16480 Stockholm, Sweden
   Email: catalin.meirosu@ericsson.com

   Antonio Manzalini
   Telecom Italia
   Via Reiss Romoli, 274
   10148 - Torino, Italy
   Email: antonio.manzalini@telecomitalia.it

   Juhoon Kim
   Deutsche Telekom AG
   Winterfeldtstr. 21
   10781 Berlin, Germany
   Email: J.Kim@telekom.de

   Rebecca Steinert
   SICS Swedish ICT AB
   Box 1263, SE-16429 Kista, Sweden
   Email: rebste@sics.se

   Sachin Sharma
   Ghent University-iMinds
   Research group IBCN - Department of Information Technology
   Zuiderpoort Office Park, Blok C0
   Gaston Crommenlaan 8 bus 201
   B-9050 Gent, Belgium
   Email: sachin.sharma@intec.ugent.be

   Guido Marchetto
   Politecnico di Torino
   Corso Duca degli Abruzzi 24
   10129 - Torino, Italy
   Email: guido.marchetto@polito.it
