--- 1/draft-ietf-ccamp-alarm-module-00.txt 2018-02-08 04:14:32.257807990 -0800 +++ 2/draft-ietf-ccamp-alarm-module-01.txt 2018-02-08 04:14:32.373810725 -0800 @@ -1,19 +1,19 @@ Network Working Group S. Vallin Internet-Draft Stefan Vallin AB Intended status: Standards Track M. Bjorklund -Expires: June 17, 2018 Cisco - December 14, 2017 +Expires: August 12, 2018 Cisco + February 8, 2018 YANG Alarm Module - draft-ietf-ccamp-alarm-module-00 + draft-ietf-ccamp-alarm-module-01 Abstract This document defines a YANG module for alarm management. It includes functions for alarm list management, alarm shelving and notifications to inform management systems. There are also RPCs to manage the operator state of an alarm and administrative alarm procedures. The module carefully maps to relevant alarm standards. Status of This Memo @@ -24,110 +24,109 @@ Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at http://datatracker.ietf.org/drafts/current/. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." - This Internet-Draft will expire on June 17, 2018. + This Internet-Draft will expire on August 12, 2018. Copyright Notice - Copyright (c) 2017 IETF Trust and the persons identified as the + Copyright (c) 2018 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. 
Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. Table of Contents - 1. Requirements notation . . . . . . . . . . . . . . . . . . . . 3 - 2. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 - 2.1. Terminology . . . . . . . . . . . . . . . . . . . . . . . 3 - 3. Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 4 - 4. Alarm Module Concepts . . . . . . . . . . . . . . . . . . . . 5 - 4.1. Alarm Definition . . . . . . . . . . . . . . . . . . . . 5 - 4.2. Alarm Type . . . . . . . . . . . . . . . . . . . . . . . 5 - 4.3. Identifying Resource . . . . . . . . . . . . . . . . . . 7 - 4.4. Identifying Alarm Instances . . . . . . . . . . . . . . . 7 - 4.5. Alarm Life-Cycle . . . . . . . . . . . . . . . . . . . . 8 - 4.5.1. Resource Alarm Life-Cycle . . . . . . . . . . . . . . 8 - 4.5.2. Operator Alarm Life-cycle . . . . . . . . . . . . . . 9 - 4.5.3. Administrative Alarm Life-Cycle . . . . . . . . . . . 9 - 4.6. Root Cause and Impacted Resources . . . . . . . . . . . . 10 - 4.7. Alarm Shelving . . . . . . . . . . . . . . . . . . . . . 10 - 5. Alarm Data Model . . . . . . . . . . . . . . . . . . . . . . 10 - 5.1. Alarm Control . . . . . . . . . . . . . . . . . . . . . . 11 - 5.1.1. Alarm Shelving . . . . . . . . . . . . . . . . . . . 11 - 5.2. Alarm Inventory . . . . . . . . . . . . . . . . . . . . . 12 - 5.3. Alarm Summary . . . . . . . . . . . . . . . . . . . . . . 13 - 5.4. The Alarm List . . . . . . . . . . . . . . . . . . . . . 13 - 5.5. The Shelved Alarms List . . . . . . . . . . . . . . . . . 15 - 5.6. RPCs and Actions . . . . . . . . . . . . . . . . . . . . 15 - 5.7. Notifications . . . . . . . . . . . . . . . . . . . . . . 15 - 6. Alarm YANG Module . . . . . . . . . . . . . . . . . . . . . . 15 - 7. X.733 Alarm Mapping Data Model . . . . . . . . . . . . . . . 40 - 8. 
X.733 Alarm Mapping YANG Module . . . . . . . . . . . . . . . 41 - 9. Security Considerations . . . . . . . . . . . . . . . . . . . 47 - 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 47 - 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 47 - 11.1. Normative References . . . . . . . . . . . . . . . . . . 47 - 11.2. Informative References . . . . . . . . . . . . . . . . . 47 - Appendix A. Vendor-specific Alarm-Types Example . . . . . . . . 48 - Appendix B. Alarm Inventory Example . . . . . . . . . . . . . . 49 - Appendix C. Alarm List Example . . . . . . . . . . . . . . . . . 50 - Appendix D. Alarm Shelving Example . . . . . . . . . . . . . . . 51 - Appendix E. X.733 Mapping Example . . . . . . . . . . . . . . . 52 - Appendix F. Background and Usability Requirements . . . . . . . 52 - F.1. Alarm Concepts . . . . . . . . . . . . . . . . . . . . . 53 - F.1.1. Alarm type . . . . . . . . . . . . . . . . . . . . . 53 - F.2. Usability Requirements . . . . . . . . . . . . . . . . . 54 - Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 57 - -1. Requirements notation - - The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", - "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and - "OPTIONAL" in this document are to be interpreted as described in BCP - 14 [RFC2119] [RFC8174] when, and only when, they appear in all - capitals, as shown here. + 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 + 1.1. Terminology and Notation . . . . . . . . . . . . . . . . 3 + 2. Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 4 + 3. Alarm Module Concepts . . . . . . . . . . . . . . . . . . . . 5 + 3.1. Alarm Definition . . . . . . . . . . . . . . . . . . . . 5 + 3.2. Alarm Type . . . . . . . . . . . . . . . . . . . . . . . 5 + 3.3. Identifying Resource . . . . . . . . . . . . . . . . . . 7 + 3.4. Identifying Alarm Instances . . . . . . . . . . . . . . . 7 + 3.5. Alarm Life-Cycle . . . . 
. . . . . . . . . . . . . . . . 8 + 3.5.1. Resource Alarm Life-Cycle . . . . . . . . . . . . . . 8 + 3.5.2. Operator Alarm Life-cycle . . . . . . . . . . . . . . 9 + 3.5.3. Administrative Alarm Life-Cycle . . . . . . . . . . . 9 + 3.6. Root Cause and Impacted Resources . . . . . . . . . . . . 10 + 3.7. Alarm Shelving . . . . . . . . . . . . . . . . . . . . . 10 + 4. Alarm Data Model . . . . . . . . . . . . . . . . . . . . . . 10 + 4.1. Alarm Control . . . . . . . . . . . . . . . . . . . . . . 11 + 4.1.1. Alarm Shelving . . . . . . . . . . . . . . . . . . . 11 + 4.2. Alarm Inventory . . . . . . . . . . . . . . . . . . . . . 12 + 4.3. Alarm Summary . . . . . . . . . . . . . . . . . . . . . . 13 + 4.4. The Alarm List . . . . . . . . . . . . . . . . . . . . . 13 + 4.5. The Shelved Alarms List . . . . . . . . . . . . . . . . . 15 + 4.6. RPCs and Actions . . . . . . . . . . . . . . . . . . . . 15 + 4.7. Notifications . . . . . . . . . . . . . . . . . . . . . . 15 + 5. Alarm YANG Module . . . . . . . . . . . . . . . . . . . . . . 15 + 6. X.733 Alarm Mapping Data Model . . . . . . . . . . . . . . . 43 + 7. X.733 Alarm Mapping YANG Module . . . . . . . . . . . . . . . 43 + 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 49 + 9. Security Considerations . . . . . . . . . . . . . . . . . . . 50 + 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 51 + 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 51 + 11.1. Normative References . . . . . . . . . . . . . . . . . . 51 + 11.2. Informative References . . . . . . . . . . . . . . . . . 52 + Appendix A. Vendor-specific Alarm-Types Example . . . . . . . . 53 + Appendix B. Alarm Inventory Example . . . . . . . . . . . . . . 54 + Appendix C. Alarm List Example . . . . . . . . . . . . . . . . . 54 + Appendix D. Alarm Shelving Example . . . . . . . . . . . . . . . 56 + Appendix E. X.733 Mapping Example . . . . . . . . . . . . . . . 56 + Appendix F. Background and Usability Requirements . . 
. . . . . 57 + F.1. Alarm Concepts . . . . . . . . . . . . . . . . . . . . . 57 + F.1.1. Alarm type . . . . . . . . . . . . . . . . . . . . . 58 + F.2. Usability Requirements . . . . . . . . . . . . . . . . . 58 + Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 61 -2. Introduction +1. Introduction This document defines a YANG [RFC7950] module for alarm management. The purpose is to define a standardised alarm interface for network devices that can be easily integrated into management applications. The model is also applicable as a northbound alarm interface in the management applications. Alarm monitoring is a fundamental part of monitoring the network. Raw alarms from devices do not always tell the status of the network services or necessarily point to the root cause. However, being able - to feed alarms to the network management system in a standardised + to feed alarms to the alarm management application in a standardised format is a starting point for performing higher level network assurance tasks. This document defines a standardised YANG module for alarm management. The design of the module is based on experience from - using and implementing available alarm standards. + using and implementing available alarm standards from ITU [X.733], + 3GPP [ALARMIRP] and ANSI [ISA182]. -2.1. Terminology +1.1. Terminology and Notation + + The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", + "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and + "OPTIONAL" in this document are to be interpreted as described in BCP + 14 [RFC2119] [RFC8174] when, and only when, they appear in all + capitals, as shown here. The following terms are defined in [RFC7950]: o action o client o data tree o RPC @@ -158,21 +157,21 @@ for example: an interface, a process. o System: The system that implements this YANG alarm module, i.e., acts as a server. 
This corresponds to a network device or a management application that provides a north-bound alarm interface. Tree diagrams used in this document follow the notation defined in [I-D.ietf-netmod-yang-tree-diagrams]. -3. Objectives +2. Objectives The objectives for the design of the Alarm Module are: o Simple to use. If a system supports this module, it shall be straightforward to integrate this into a YANG based alarm manager. o View alarms as states on resources and not as discrete notifications. @@ -180,45 +179,55 @@ that should not be forwarded as alarm notifications. o Clear and precise identification of alarm types and alarm instances. o A management system should be able to pull all available alarm types from a system, i.e., read the alarm inventory from a system. This makes it possible to prepare alarm operators with corresponding alarm instructions. - o Address alarm usability requirements. While IETF has not really - addressed alarm management, telecom standards have addressed it - purely from a protocol perspective. The process industry has - published several relevant standards addressing requirements for a - useful alarm interface; [EEMUA], [ISA182]. This alarm module - defines usability requirements as well as a YANG data model. + o Address alarm usability requirements, see Appendix F. While IETF + has not really addressed alarm management, telecom standards have + addressed it purely from a protocol perspective. The process + industry has published several relevant standards addressing + requirements for a useful alarm interface; [EEMUA], [ISA182]. + This alarm module defines usability requirements as well as a YANG + data model. - o Mapping to X.733, which is a requirement for many alarm systems. + o Mapping to X.733, which is a requirement for some alarm systems. Still, keep some of the X.733 concepts out of the core model in order to make the model small and easy to understand. -4. Alarm Module Concepts +3.
Alarm Module Concepts This section defines the fundamental concepts behind the data model. This section is rooted in the works of Vallin et al. [ALARMSEM]. -4.1. Alarm Definition +3.1. Alarm Definition An alarm signifies an undesirable state in a resource that requires corrective action. + There are two main things to remember from this definition: + + 1. the definition focuses on leaving out events and logging + information in general. Alarms should only be used for undesired + states that require action. + + 2. the definition also focuses on alarms as a state on a resource, not + the notifications that report the state changes. + See Appendix F for more motivation and consequences around this definition. -4.2. Alarm Type +3.2. Alarm Type This document defines an alarm type with an alarm type id and an alarm type qualifier. The alarm type id is modeled as a YANG identity. With YANG identities, new alarm types can be defined in a distributed fashion. YANG identities are hierarchical, which means that a hierarchy of alarm types can be defined. Standards and vendors should define their own alarm type identities @@ -298,66 +307,70 @@ description "Abstract alarm type"; } identity external-detector { base environmental-alarm; description "Abstract alarm type, a run-time configuration procedure sets the type of alarm detected. This will be reported in the alarm-type-qualifier."; } -4.3. Identifying Resource +3.3. Identifying Resource It is of vital importance to be able to refer to the alarming resource. This reference must be as fine-grained as possible. If the alarming resource exists in the data tree then an instance-identifier MUST be used with the full path to the object. This module also allows for alternate naming of the alarming resource if it is not available in the data tree. -4.4. Identifying Alarm Instances +3.4.
Identifying Alarm Instances A primary goal of this alarm module is to remove any ambiguity in how alarm notifications are mapped to an update of an alarm instance. X.733 and especially 3GPP were not really clear on this point. This YANG alarm module states that the tuple (resource, alarm type identifier, alarm type qualifier) corresponds to a single alarm instance. This means that alarm notifications for the same resource and same alarm type are matched to update the same alarm instance. These three leafs are therefore used as the key in the alarm list: list alarm { key "resource alarm-type-id alarm-type-qualifier"; ... } -4.5. Alarm Life-Cycle +3.5. Alarm Life-Cycle The alarm model clearly separates the resource alarm life-cycle from the operator and administrative life-cycles of an alarm. o resource alarm life-cycle: the alarm instrumentation that controls alarm raise, clearance, and severity changes. o operator alarm life-cycle: operators acting upon alarms with actions like acknowledgment and closing. Closing an alarm implies that the operator considers the corrective action performed. - Operators can also shelve alarms in order to avoid nuisance alarms. + Operators can also shelve (block/filter) alarms in order to avoid + nuisance alarms. o administrative alarm life-cycle: deleting (purging) alarms and compressing the alarm status change list. This module exposes operations to manage the administrative life-cycle. The server may also perform these operations based on other policies, but how that is done is out of scope for this document. -4.5.1. Resource Alarm Life-Cycle + A server SHOULD describe how long it retains cleared/closed alarms: + until manually purged or if it has an automatic removal policy. + +3.5.1. Resource Alarm Life-Cycle From a resource perspective, an alarm can have the following life-cycle: raise, change severity, change severity, clear, being raised again etc.
All of these status changes can have different alarm texts generated by the instrumentation. Two important things to note: 1. Alarms are not deleted when they are cleared. Deleting alarms is an administrative process. The alarm module defines an rpc "purge-alarms" that deletes alarms. @@ -380,21 +393,21 @@ +--ro alarm-text alarm-text For every status change from the resource perspective a row is added to the "status-change" list. The last status values are also represented as leafs for the alarm. Note well that the alarm severity does not include "cleared", alarm clearance is a flag. An alarm can therefore look like this: ((GigabitEthernet0/25, link-alarm,""), false, T, major, "Interface GigabitEthernet0/25 down") -4.5.2. Operator Alarm Life-cycle +3.5.2. Operator Alarm Life-cycle Operators can also act upon alarms using the set-operator-state action: +--ro alarm* [resource alarm-type-id alarm-type-qualifier] ... +--ro operator-state-change* [time] {operator-actions}? | +--ro time yang:date-and-time | +--ro operator string | +--ro state operator-state @@ -403,56 +416,56 @@ +---w input +---w state operator-state +---w text? string The operator state for an alarm can be: "none", "ack", "shelved", and "closed". Alarm deletion (using the rpc "purge-alarms") can use this state as a criterion. A closed alarm is an alarm where the operator has performed any required corrective actions. Closed alarms are good candidates for being deleted. -4.5.3. Administrative Alarm Life-Cycle +3.5.3. Administrative Alarm Life-Cycle Deleting alarms from the alarm list is considered an administrative action. This is supported by the "purge-alarms" rpc. The "purge-alarms" rpc takes a filter as input. The filter selects alarms based on the operator and resource life-cycle such as "all closed cleared alarms older than a time specification". The server may also perform these operations based on other policies, but how that is done is out of scope for this document. Alarms can be compressed.
Compressing an alarm deletes all entries in the alarm's "status-change" list except for the last status change. A client can perform this using the "compress-alarms" rpc. The server may also perform these operations based on other policies, but how that is done is out of scope for this document. -4.6. Root Cause and Impacted Resources +3.6. Root Cause and Impacted Resources The general principle of this alarm module is to limit the number of alarms. The alarm has two leaf-lists to identify possible impacted resources and possible root-cause resources. The system should not send individual alarms for the possible root-cause resources and impacted resources. These serve as hints only. It is up to the client application to use this information to present the overall status. -4.7. Alarm Shelving +3.7. Alarm Shelving Alarm shelving is an important function in order for alarm management applications and operators to stop superfluous alarms. A shelved - alarm implies that any alarms fulfilling these criteria are ignored. - Shelved alarms appear in a dedicated shelved alarm list in order not - to disturb the relevant alarms. Shelved alarms do not generate - notifications. + alarm implies that any alarms fulfilling these criteria are ignored + (blocked/filtered). Shelved alarms appear in a dedicated shelved + alarm list in order not to disturb the relevant alarms. Shelved + alarms do not generate notifications. -5. Alarm Data Model +4. Alarm Data Model Alarm shelving and operator actions are YANG features so that a server can select not to support these. The data model has the following overall structure: +--rw alarms +--rw control | +--rw max-alarm-status-changes? union | +--rw notify-status-changes? boolean @@ -470,57 +483,62 @@ | +--ro last-changed? yang:date-and-time | +--ro alarm* [resource alarm-type-id alarm-type-qualifier] | ... +--ro shelved-alarms {alarm-shelving}? +--ro number-of-shelved-alarms? yang:gauge32 +--ro alarm-shelf-last-changed?
yang:date-and-time +--ro shelved-alarm* [resource alarm-type-id alarm-type-qualifier] ... -5.1. Alarm Control +4.1. Alarm Control The "/alarms/control/notify-status-changes" leaf controls if notifications are sent for all state changes, severity change and alarm text change, or just for new and cleared alarms. Every alarm has a list of status changes; this is a circular list. The length of this list is controlled by "/alarms/control/max-alarm-status-changes". -5.1.1. Alarm Shelving +4.1.1. Alarm Shelving The shelving control tree is shown below: +--rw alarms +--rw control +--rw alarm-shelving {alarm-shelving}? - +--rw shelf* [shelf-name] - +--rw shelf-name string - +--rw resource? resource + +--rw shelf* [name] + +--rw name string + +--rw resource* resource-match +--rw alarm-type-id? alarm-type-id - +--rw alarm-type-qualifier? alarm-type-qualifier + +--rw alarm-type-qualifier-match? string +--rw description? string Shelved alarms are shown in a dedicated shelved alarm list. The instrumentation MUST move shelved alarms from the alarm list (/alarms/alarm-list) to the shelved alarm list (/alarms/shelved-alarms/). Shelved alarms do not generate any notifications. When the shelving criteria are removed or changed the alarm list MUST be updated to the correct actual state of the alarms. + Shelving and unshelving can only be performed by editing the shelf + configuration. It cannot be performed on individual alarms. The + server will add an operator state indicating that the alarm was + shelved/unshelved. + A leaf (/alarms/summary/shelves-active) in the alarm summary indicates if there are shelved alarms. A system can select to not support the shelving feature. -5.2. Alarm Inventory +4.2. Alarm Inventory The alarm inventory represents all possible alarm types that may occur in the system. A management system may use this to build alarm procedures. The alarm inventory is relevant for several reasons: The system might not instrument all alarm type identities.
The system has configured dynamic alarm types using the alarm qualifier. The inventory makes it possible for the management system to discover these. @@ -531,26 +549,26 @@ The optional leaf-list "resource" in the alarm inventory enables the system to publish for which resources a given alarm type may appear. The alarm inventory tree is shown below: +--rw alarms +--ro alarm-inventory +--ro alarm-type* [alarm-type-id alarm-type-qualifier] +--ro alarm-type-id alarm-type-id +--ro alarm-type-qualifier alarm-type-qualifier - +--ro resource* string + +--ro resource* resource-match +--ro has-clear boolean +--ro severity-levels* severity +--ro description string -5.3. Alarm Summary +4.3. Alarm Summary The alarm summary list summarises alarms per severity; how many cleared, cleared and closed, and closed. It also gives an indication if there are shelved alarms. The alarm summary tree is shown below: +--rw alarms +--ro summary +--ro alarm-summary* [severity] @@ -560,111 +578,115 @@ | +--ro cleared-not-closed? yang:gauge32 | | {operator-actions}? | +--ro cleared-closed? yang:gauge32 | | {operator-actions}? | +--ro not-cleared-closed? yang:gauge32 | | {operator-actions}? | +--ro not-cleared-not-closed? yang:gauge32 | {operator-actions}? +--ro shelves-active? empty {alarm-shelving}? -5.4. The Alarm List +4.4. The Alarm List The alarm list (/alarms/alarm-list) is a function from (resource, alarm type, alarm type qualifier) to the current alarm state. +--ro alarm-list +--ro number-of-alarms? yang:gauge32 +--ro last-changed? 
yang:date-and-time +--ro alarm* [resource alarm-type-id alarm-type-qualifier] - +--ro time-created yang:date-and-time +--ro resource resource +--ro alarm-type-id alarm-type-id +--ro alarm-type-qualifier alarm-type-qualifier +--ro alt-resource* resource +--ro related-alarm* | [resource alarm-type-id alarm-type-qualifier] | +--ro resource | | -> /alarms/alarm-list/alarm/resource | +--ro alarm-type-id leafref | +--ro alarm-type-qualifier leafref +--ro impacted-resource* resource +--ro root-cause-resource* resource + +--ro time-created yang:date-and-time +--ro is-cleared boolean +--ro last-changed yang:date-and-time +--ro perceived-severity severity +--ro alarm-text alarm-text +--ro status-change* [time] {alarm-history}? | +--ro time yang:date-and-time | +--ro perceived-severity severity-with-clear | +--ro alarm-text alarm-text +--ro operator-state-change* [time] {operator-actions}? | +--ro time yang:date-and-time | +--ro operator string | +--ro state operator-state | +--ro text? string +---x set-operator-state {operator-actions}? +---w input - +---w state operator-state + +---w state writable-operator-state +---w text? string Every alarm has three important states: the resource clearance state "is-cleared", the severity "perceived-severity" and the operator state available in the operator state change list. In order to see the alarm history the resource state changes are available in the "status-change" list and the operator history is available in the "operator-state-change" list. -5.5. The Shelved Alarms List +4.5. The Shelved Alarms List The shelved alarm list has the same structure as the alarm list above. It shows all the alarms that match the shelving criteria (/alarms/control/alarm-shelving). -5.6. RPCs and Actions +4.6. RPCs and Actions The alarm module supports rpcs and actions to manage the alarms: "purge-alarms" (rpc): delete alarms according to specific criteria, for example all cleared alarms older than a specific date.
"compress-alarms" (rpc): compress the status-change list for the alarms. "set-operator-state" (action): change the operator state for an alarm: for example acknowledge. -5.7. Notifications +4.7. Notifications The alarm module supports a general notification to report alarm state changes. It carries all relevant parameters for the alarm management application. There is also a notification to report that an operator changed the operator state on an alarm, like acknowledge. If the alarm inventory is changed, for example a new card type is inserted, a notification will tell the management application that new alarm types are available. -6. Alarm YANG Module +5. Alarm YANG Module - file "ietf-alarms@2017-10-30.yang" + This YANG module references [RFC6991]. + + file "ietf-alarms@2018-02-01.yang" module ietf-alarms { yang-version 1.1; namespace "urn:ietf:params:xml:ns:yang:ietf-alarms"; prefix al; import ietf-yang-types { prefix yang; + reference "RFC 6991: Common YANG Data Types."; + } organization "IETF CCAMP Working Group"; contact "WG Web: WG List: Editor: Stefan Vallin @@ -713,21 +735,21 @@ string-based qualifier. The string-based qualifier allows for dynamic extension of the statically defined alarm types. Alarm types identify a possible alarm state and not the individual notifications. For example, the traditional 'link-down' and 'link-up' notifications are two notifications referring to the same alarm type 'link-alarm'. With this design there is no ambiguity about how alarm and alarm clear correlation should be performed: notifications that report the same resource and alarm type are considered updates of the - same alarm, such as clearing an active alarm or changing the + same alarm, e.g., clearing an active alarm or changing the severity of an alarm. The instrumentation can update 'severity' and 'alarm-text' on an existing alarm. 
The above alarm example can therefore look like: (('link-alarm', 'GigabitEthernet0/25'), warning, 'interface down while interface admin state is up') @@ -735,103 +757,181 @@ the underlying resource, like clear, and updates from an operator like acknowledge or closing an alarm: (('link-alarm', 'GigabitEthernet0/25'), warning, 'interface down while interface admin state is up', cleared, closed) Administrative actions like removing closed alarms older than a - given time is supported."; + given time is supported. - revision 2017-10-30 { + Copyright (c) 2018 IETF Trust and the persons identified as + authors of the code. All rights reserved. + + Redistribution and use in source and binary forms, with or + without modification, is permitted pursuant to, and subject to + the license terms contained in, the Simplified BSD License set + forth in Section 4.c of the IETF Trust's Legal Provisions + Relating to IETF Documents + (https://trustee.ietf.org/license-info). + + The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL + NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and + 'OPTIONAL' in the module text are to be interpreted as described + in RFC 2119 (https://tools.ietf.org/html/rfc2119). + + This version of this YANG module is part of RFC XXXX + (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for + full legal notices."; + + revision 2018-02-01 { description "Initial revision."; reference "RFC XXXX: YANG Alarm Module"; } /* * Features */ feature operator-actions { description - "This feature means that the systems supports operator states - on alarms."; + "This feature indicates that the system supports operator + states on alarms."; } feature alarm-shelving { description - "This feature means that the system supports shelving + "This feature indicates that the system supports shelving (blocking) alarms."; } feature alarm-history { description - "This feature means that the alarm list also maintains a - history of state changes for each alarm. 
For example, if an - alarm toggles between cleared and active 10 times, a list for - that alarm will show those state changes with time-stamps."; + "This feature indicates that the server maintains a history of + state changes for each alarm. For example, if an alarm + toggles between cleared and active 10 times, these state + changes are present in a separate list in the alarm."; } /* * Identities */ - identity alarm-identity { + identity alarm-type-id { description "Base identity for alarm types. A unique identification of the alarm, not including the resource. Different resources can share alarm types. If the resource reports the same alarm type, it is to be considered to be the same alarm. The alarm type is a simplification of the different X.733 and 3GPP alarm IRP alarm correlation mechanisms and it allows for hierarchical extensions. A string-based qualifier can be used in addition to the identity in order to have different alarm types based on information not known at design-time, such as values in textual SNMP Notification var-binds. Standards and vendors can define sub-identities to clearly identify specific alarm types. - This identity is abstract and shall not be used for alarms."; + This identity is abstract and MUST NOT be used for alarms."; } /* * Common types */ typedef resource { type union { type instance-identifier { require-instance false; } type yang:object-identifier; type string; } description "This is an identification of the alarming resource, such as an interface. It should be as fine-grained as possible both to - guide the operator and to guarantee uniqueness of the - alarms. If a resource has both a config and a state tree - normally this should identify the state tree, - (e.g., /interfaces-state/interface/name). - But if the instrumentation can detect a broken config, this - should be identified as the resource. - If the alarming resource is modelled in YANG, this - type will be an instance-identifier.
If the resource is an - SNMP object, the type will be an object-identifier. If the - resource is anything else, for example a distinguished name or - a CIM path, this type will be a string."; + guide the operator and to guarantee uniqueness of the alarms. + + If the alarming resource is modelled in YANG, this type will + be an instance-identifier. + + If the resource is an SNMP object, the type will be an + object-identifier. + + If the resource is anything else, for example a distinguished + name or a CIM path, this type will be a string. + + If the server supports several models, the precedence should + be in the order given in the union definition."; + } + + typedef resource-match { + type union { + type yang:xpath1.0; + type yang:object-identifier; + type string; + } + description + "This type is used to match resources of type 'resource'. + Since the type 'resource' is a union of three different types, + the 'resource-match' type is also a union of corresponding + types. + + If the type is given as an XPath 1.0 expression, a resource + of type 'instance-identifier' matches if the instance is part + of the node set that is the result of evaluating the XPath 1.0 + expression. For example, the XPath 1.0 expression: + + /if:interfaces/if:interface[if:type='ianaift:ethernetCsmacd'] + + would match the resource instance-identifier: + + /if:interfaces/if:interface[if:name='eth1'], + + assuming that the interface 'eth1' is of type + 'ianaift:ethernetCsmacd'. + + If the type is given as an object identifier, a resource of + type 'object-identifier' matches if the match object + identifier is a prefix of the resource's object identifier. + For example, the value: + + 1.3.6.1.2.1.2.2 + + would match the resource object identifier: + + 1.3.6.1.2.1.2.2.1.1.5 + + If the type is given as a string, it is interpreted as a W3C + regular expression, which matches a resource of type 'string' + if the given regular expression matches the resource string.
+ + If the type is given as an XPath expression, it is evaluated + in the following XPath context: + + o The set of namespace declarations are those in scope on + the leaf element where this type is used. + + o The set of variable bindings is empty. + + o The function library is the core function library + and the functions defined in Section 10 of RFC 7950. + + o The context node is the root node in the data tree."; } typedef alarm-text { type string; description "The string used to inform operators about the alarm. This MUST contain enough information for an operator to be able to understand the problem and how to resolve it. If this string contains structure, this format should be clearly documented for programs to be able to parse that @@ -907,67 +1007,80 @@ } } type severity; } description "The severity level of the alarm including clear. This is used *only* in notifications reporting state changes for an alarm."; } - typedef operator-state { + typedef writable-operator-state { type enumeration { enum none { value 1; description "The alarm is not being taken care of."; } enum ack { value 2; description "The alarm is being taken care of. Corrective action not taken yet, or failed"; } enum closed { value 3; description "Corrective action taken successfully."; } + } + description + "Operator states on an alarm. The 'closed' state indicates + that an operator considers the alarm being resolved. This + is separate from the alarm's 'is-cleared' leaf."; + } + + typedef operator-state { + type union { + type writable-operator-state; + type enumeration { enum shelved { value 4; description - "Alarm shelved. Alarms in alarms/shelved-alarms/ + "The alarm is shelved. Alarms in /alarms/shelved-alarms/ MUST be assigned this operator state by the server as - the last entry in the operator-state-change list."; + the last entry in the operator-state-change list.
The + text for that entry SHOULD include the shelf name."; } enum un-shelved { value 5; description - "Alarm moved back to alarm-list from shelf. - Alarms 'moved' from /alarms/shelved-alarms/ + "The alarm is moved back to 'alarm-list' from a shelf. + Alarms that are moved from /alarms/shelved-alarms/ to /alarms/alarm-list MUST be assigned this state by the server as the last entry in the - operator-state-change list."; + 'operator-state-change' list. The text for that + entry SHOULD include the shelf name."; + } } - } description "Operator states on an alarm. The 'closed' state indicates that an operator considers the alarm being resolved. This - is separate from the resource alarm clear flag."; + is separate from the alarm's 'is-cleared' leaf."; } /* Alarm type */ typedef alarm-type-id { type identityref { - base alarm-identity; + base alarm-type-id; } description "Identifies an alarm type. The description of the alarm type id MUST indicate if the alarm type is abstract or not. An abstract alarm type is used as a base for other alarm type ids and will not be used as a value for an alarm or be present in the alarm inventory."; } typedef alarm-type-qualifier { @@ -1199,21 +1313,21 @@ type alarm-text; mandatory true; description "The last reported alarm text. This text should contain information for an operator to be able to understand the problem and how to resolve it."; } list status-change { if-feature alarm-history; - key time; + key "time"; min-elements 1; description "A list of status change events for this alarm. The entry with latest time-stamp in this list MUST correspond to the leafs 'is-cleared', 'perceived-severity' and 'alarm-text' for the alarm. The time-stamp for that entry MUST be equal to the 'last-changed' leaf. This list is ordered according to the timestamps of @@ -1267,108 +1382,115 @@ description "This leaf controls whether notifications are sent on all alarm status updates, e.g., updated perceived-severity or alarm-text. 
By default the notifications are only sent when a new alarm is raised, re-raised after being cleared and when an alarm is cleared."; } container alarm-shelving { if-feature alarm-shelving; description - "This list is used to shelve alarms. The server will move - any alarms corresponding to the shelving criteria from the + "The alarm-shelving/shelf list is used to shelve + (block/filter) alarms. The server will move any alarms + corresponding to the shelving criteria from the alarms/alarm-list/alarm list to the alarms/shelved-alarms/shelved-alarm list. It will also stop sending notifications for the shelved alarms. The conditions in the shelf criteria are logically ANDed. When the shelving criteria are deleted or changed, the non-matching alarms MUST appear in the alarms/alarm-list/alarm list according to the real state. This means that the instrumentation MUST maintain states for the shelved alarms. Alarms that match the criteria - shall have an operator-state 'shelved'."; + shall have an operator-state 'shelved'. When the shelf + configuration removes an alarm from the shelf, the + server shall add the operator state 'un-shelved'."; list shelf { - key shelf-name; - leaf shelf-name { + key "name"; + leaf name { type string; description "An arbitrary name for the alarm shelf."; } description "Each entry defines the criteria for shelving alarms. - Criterias are ANDed."; + Criteria are ANDed.
If no criteria are specified, + all alarms will be shelved."; - leaf resource { - type resource; + leaf-list resource { + type resource-match; description - "Shelve alarms for this resource."; + "Shelve alarms for matching resources."; } leaf alarm-type-id { type alarm-type-id; description - "Shelve alarms for this alarm type identifier."; + "Shelve all alarms that have an alarm-type-id that is + equal to or derived from the given alarm-type-id."; } - leaf alarm-type-qualifier { - type alarm-type-qualifier; + leaf alarm-type-qualifier-match { + type string; description - "Shelve alarms for this alarm type qualifier."; + "A W3C regular expression that is used to match + an alarm type qualifier. Shelve all alarms that + match this regular expression for the alarm + type qualifier."; } leaf description { type string; description "An optional textual description of the shelf. This description should include the reason for shelving these alarms."; } } } } container alarm-inventory { config false; description - "This list contains all possible alarm types for the system. - If the system knows for which resources a a specific alarm + "This alarm-inventory/alarm-type list contains all possible + alarm types for the system. + If the system knows for which resources a specific alarm type can appear, this is also identified in the inventory. The list also tells if each alarm type has a corresponding clear state. The inventory shall only contain concrete alarm types. The alarm inventory MUST be updated by the system when new alarms can appear. This can be the case when installing new software modules or inserting new card types.
A notification 'alarm-inventory-changed' is sent when the inventory is changed."; list alarm-type { key "alarm-type-id alarm-type-qualifier"; description "An entry in this list defines a possible alarm."; leaf alarm-type-id { type alarm-type-id; - mandatory true; description "The statically defined alarm type identifier for this possible alarm."; } leaf alarm-type-qualifier { type alarm-type-qualifier; description "The optionally dynamically defined alarm type identifier for this possible alarm."; } leaf-list resource { - type string; + type resource-match; description "Optionally, specifies for which resources the alarm type - is valid. This string is for human consumption but - SHOULD refer to paths in the model."; + is valid."; } leaf has-clear { type boolean; mandatory true; description "This leaf tells the operator if the alarm will be cleared when the correct corrective action has been taken. Implementations SHOULD strive for detecting the cleared state for all alarm types. If this leaf is true, the operator can monitor the alarm until it @@ -1399,26 +1521,27 @@ "A description of the possible alarm. It SHOULD include information on possible underlying root causes and corrective actions."; } } } container summary { config false; description - "This container gives a summary of number of alarms - and shelved alarms"; + "This container gives a summary of number of alarms"; list alarm-summary { - key severity; + key "severity"; description - "A global summary of all alarms in the system."; + "A global summary of all alarms in the system. The summary + does not include shelved alarms"; + leaf severity { type severity; description "Alarm summary for this severity level."; } leaf total { type yang:gauge32; description "Total number of alarms of this severity level."; } @@ -1504,43 +1626,46 @@ alarm becomes active for a given alarm-type and resource. Entries do not get deleted when the alarm is cleared, this is a boolean state in the alarm. 
Alarm entries are removed, purged, from the list by an explicit purge action. For example, delete all alarms that are cleared and in closed operator-state that are older than 24 hours. Systems may also remove alarms based on locally configured policies which is out of scope for this module."; + uses common-alarm-parameters; + leaf time-created { type yang:date-and-time; mandatory true; description "The time-stamp when this alarm entry was created. This represents the first time the alarm appeared, it can also represent that the alarm re-appeared after a purge. Further state-changes of the same alarm do not change this leaf; these changes will update the 'last-changed' leaf."; } - uses common-alarm-parameters; uses resource-alarm-parameters; + list operator-state-change { if-feature operator-actions; - key time; + key "time"; description "This list is used by operators to indicate the state of human intervention on an alarm. For example, if an operator has seen an alarm, the operator can add a new item to this list indicating that the alarm is acknowledged."; + uses operator-parameters; } action set-operator-state { if-feature operator-actions; description "This is a means for the operator to indicate the level of human intervention on an alarm."; input { leaf state { @@ -1537,21 +1662,21 @@ uses operator-parameters; } action set-operator-state { if-feature operator-actions; description "This is a means for the operator to indicate the level of human intervention on an alarm."; input { leaf state { - type operator-state; + type writable-operator-state; mandatory true; description "Set this operator state."; } leaf text { type string; description "Additional optional textual information."; } } @@ -1589,31 +1714,41 @@ key "resource alarm-type-id alarm-type-qualifier"; description "The list of shelved alarms. Each entry in the list holds one alarm for a given alarm type and resource. An alarm can be updated from the underlying resource or by the user.
These changes are reflected in different lists below the corresponding alarm."; uses common-alarm-parameters; + + leaf shelf-name { + type leafref { + path "/alarms/control/alarm-shelving/shelf/name"; + require-instance false; + } + description + "The name of the shelf."; + } uses resource-alarm-parameters; list operator-state-change { if-feature operator-actions; - key time; + key "time"; description "This list is used by operators to indicate the state of human intervention on an alarm. For example, if an operator has seen an alarm, the operator can add a new item to this list indicating that the alarm is acknowledged."; + uses operator-parameters; } } } } /* * Operations */ @@ -1834,42 +1970,43 @@ leaf alarm-type-qualifier { type leafref { path "/alarms/alarm-list/alarm" + "[resource=current()/../resource]" + "[alarm-type-id=current()/../alarm-type-id]" + "/alarm-type-qualifier"; require-instance false; } description "The alarm qualifier for the alarm."; + } uses operator-parameters; } } -7. X.733 Alarm Mapping Data Model +6. X.733 Alarm Mapping Data Model Many alarm management systems are based on the X.733 alarm standard. This YANG module allows a mapping from alarm types to X.733 event- type and probable-cause. The module augments the alarm inventory, the alarm list and the alarm notification with X.733 parameters. The module also supports a feature whereby the alarm manager can configure the mapping. This might be needed when the default mapping provided by the system is in conflict with other systems or not considered good. -8. X.733 Alarm Mapping YANG Module +7. X.733 Alarm Mapping YANG Module This YANG module references [X.733]. 
file "ietf-alarms-x733@2017-10-30.yang" module ietf-alarms-x733 { yang-version 1.1; namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733"; prefix x733; import ietf-alarms { @@ -2143,48 +2282,142 @@ "Augment X.733 information to the alarm."; uses x733-alarm-parameters; } augment "/al:alarm-notification" { description "Augment X.733 information to the alarm notification."; uses x733-alarm-parameters; - } } +8. IANA Considerations + + This document registers a URI in the IETF XML registry [RFC3688]. + Following the format in RFC 3688, the following registration is + requested to be made. + + URI: urn:ietf:params:xml:ns:yang:ietf-alarms + + Registrant Contact: The IESG. + + XML: N/A, the requested URI is an XML namespace. + + This document registers a YANG module in the YANG Module Names + registry [RFC6020]. + + name: ietf-alarms + namespace: urn:ietf:params:xml:ns:yang:ietf-alarms + prefix: al + reference: RFC XXXX + 9. Security Considerations - None. + The YANG module specified in this document defines a schema for data + that is designed to be accessed via network management protocols such + as NETCONF [RFC6241] or RESTCONF [RFC8040]. The lowest NETCONF layer + is the secure transport layer, and the mandatory-to-implement secure + transport is Secure Shell (SSH) [RFC6242]. The lowest RESTCONF layer + is HTTPS, and the mandatory-to-implement secure transport is TLS + [RFC5246]. + + The NETCONF access control model [RFC6536] provides the means to + restrict access for particular NETCONF or RESTCONF users to a + preconfigured subset of all available NETCONF or RESTCONF protocol + operations and content. + + There are a number of data nodes defined in this YANG module that are + writable/creatable/deletable (i.e., config true, which is the + default). These data nodes may be considered sensitive or vulnerable + in some network environments. 
Write operations (e.g., edit-config) + to these data nodes without proper protection can have a negative + effect on network operations. These are the subtrees and data nodes + and their sensitivity/vulnerability: + + /alarms/control/notify-status-change: This leaf controls whether an + alarm should notify only raise and clear or all severity level + changes. Unauthorized access to this leaf could have a negative impact + on operational procedures relying on fine-grained alarm state + change reporting. + + /alarms/control/alarm-shelving/shelf: This list controls the + shelving (blocking) of alarms. Unauthorized access to this list + could jeopardize the alarm management procedures since these + alarms will not be notified and not be part of the alarm list. + + Some of the RPC operations in this YANG module may be considered + sensitive or vulnerable in some network environments. It is thus + important to control access to these operations. These are the + operations and their sensitivity/vulnerability: + + purge-alarms: This RPC deletes alarms from the alarm list. + Unauthorized use of this RPC could jeopardize the alarm management + procedures since the deleted alarms may be vital for the alarm + management application. 10. Acknowledgements - The author wishes to thank Viktor Leijon and Johan Nordlander for + The authors wish to thank Viktor Leijon and Johan Nordlander for their valuable input on forming the alarm model. + The authors also wish to thank Nick Hancock, Joey Boyd, Tom Petch and + Balazs Lengyel for their extensive reviews and contributions to this + document. + 11. References 11.1. Normative References [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, March 1997, . + [RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688, + DOI 10.17487/RFC3688, January 2004, . + + [RFC5246] Dierks, T. and E.
Rescorla, "The Transport Layer Security + (TLS) Protocol Version 1.2", RFC 5246, + DOI 10.17487/RFC5246, August 2008, . + + [RFC6020] Bjorklund, M., Ed., "YANG - A Data Modeling Language for + the Network Configuration Protocol (NETCONF)", RFC 6020, + DOI 10.17487/RFC6020, October 2010, . + + [RFC6241] Enns, R., Ed., Bjorklund, M., Ed., Schoenwaelder, J., Ed., + and A. Bierman, Ed., "Network Configuration Protocol + (NETCONF)", RFC 6241, DOI 10.17487/RFC6241, June 2011, + . + + [RFC6242] Wasserman, M., "Using the NETCONF Protocol over Secure + Shell (SSH)", RFC 6242, DOI 10.17487/RFC6242, June 2011, + . + + [RFC6991] Schoenwaelder, J., Ed., "Common YANG Data Types", + RFC 6991, DOI 10.17487/RFC6991, July 2013, + . + [RFC7950] Bjorklund, M., Ed., "The YANG 1.1 Data Modeling Language", RFC 7950, DOI 10.17487/RFC7950, August 2016, . + [RFC8040] Bierman, A., Bjorklund, M., and K. Watsen, "RESTCONF + Protocol", RFC 8040, DOI 10.17487/RFC8040, January 2017, + . + [RFC8174] Leiba, B., "Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words", BCP 14, RFC 8174, DOI 10.17487/RFC8174, May 2017, . [X.733] International Telecommunications Union, "Information Technology - Open Systems Interconnection - Systems Management: Alarm Reporting Function", ITU-T Recommendation X.733, 1992. 11.2. Informative References @@ -2202,52 +2435,47 @@ Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, http://dx.doi.org/10.1002/nem.800", March 2012. [EEMUA] EEMUA Publication No. 191 Engineering Equipment and Materials Users Association, London, 2 edition., "Alarm Systems: A Guide to Design, Management and Procurement.", 2007. [I-D.ietf-netmod-yang-tree-diagrams] Bjorklund, M. and L. Berger, "YANG Tree Diagrams", draft- - ietf-netmod-yang-tree-diagrams-02 (work in progress), - October 2017. + ietf-netmod-yang-tree-diagrams-05 (work in progress), + January 2018. 
[ISA182] International Society of Automation, ISA, "ANSI/ISA-18.2-2009 Management of Alarm Systems for the Process Industries", 2009. [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, September 2004, . - [X.736] International Telecommunications Union, "Information - Technology - Open Systems Interconnection - Systems - Management: Security alarm reporting function", - ITU-T Recommendation X.736, 1992. - Appendix A. Vendor-specific Alarm-Types Example This example shows how to define alarm-types in a vendor-specific module. In this case the vendor "xyz" has chosen to define top level identities according to X.733 event types. module example-xyz-alarms { namespace "urn:example:xyz-alarms"; prefix xyz-al; import ietf-alarms { prefix al; } identity xyz-alarms { - base al:alarm-identity; + base al:alarm-type-id; } identity communications-alarm { base xyz-alarms; } identity quality-of-service-alarm { base xyz-alarms; } identity processing-error-alarm { base xyz-alarms; @@ -2272,25 +2500,29 @@ Appendix B. Alarm Inventory Example This shows an alarm inventory. It shows one alarm type defined only with the identifier, and another dynamically configured. In the latter case a digital input has been connected to a smoke-detector, therefore the 'alarm-type-qualifier' is set to "smoke-detector" and the 'alarm-type-identity' to "environmental-alarm". + xmlns:xyz-al="urn:example:xyz-alarms" + xmlns:dev="urn:example:device"> xyz-al:link-alarm + + /dev:interfaces/dev:interface + true Link failure, operational state down but admin state up xyz-al:environmental-alarm smoke-alarm true @@ -2363,29 +2595,31 @@ This example shows how to shelve alarms. We shelve alarms related to the smoke-detectors since they are being installed and tested. We also shelve all alarms from FastEthernet1/0.
- FE10 + FE10 /dev:interfaces/dev:interface[name='FastEthernet1/0'] - detectortest + detectortest xyz-al:environmental-alarm - smoke-alarm + + smoke-alarm + Appendix E. X.733 Mapping Example This example shows how to map a dynamic alarm type (alarm-type- identity=environmental-alarm, alarm-type-qualifier=smoke-alarm) to the corresponding X.733 event-type and probable cause parameters.