draft-ietf-ccamp-alarm-module-02.txt | draft-ietf-ccamp-alarm-module-03.txt | |||
---|---|---|---|---|
Network Working Group S. Vallin | Network Working Group S. Vallin | |||
Internet-Draft Stefan Vallin AB | Internet-Draft Stefan Vallin AB | |||
Intended status: Standards Track M. Bjorklund | Intended status: Standards Track M. Bjorklund | |||
Expires: February 9, 2019 Cisco | Expires: March 24, 2019 Cisco | |||
August 8, 2018 | September 20, 2018 | |||
YANG Alarm Module | YANG Alarm Module | |||
draft-ietf-ccamp-alarm-module-02 | draft-ietf-ccamp-alarm-module-03 | |||
Abstract | Abstract | |||
This document defines a YANG module for alarm management. It | This document defines a YANG module for alarm management. It | |||
includes functions for alarm list management, alarm shelving and | includes functions for alarm list management, alarm shelving and | |||
notifications to inform management systems. There are also RPCs to | notifications to inform management systems. There are also RPCs to | |||
manage the operator state of an alarm and administrative alarm | manage the operator state of an alarm and administrative alarm | |||
procedures. The module carefully maps to relevant alarm standards. | procedures. The module carefully maps to relevant alarm standards. | |||
Status of This Memo | Status of This Memo | |||
skipping to change at page 1, line 35 ¶ | skipping to change at page 1, line 35 ¶ | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on February 9, 2019. | This Internet-Draft will expire on March 24, 2019. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2018 IETF Trust and the persons identified as the | Copyright (c) 2018 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 18 ¶ | skipping to change at page 2, line 18 ¶ | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
1.1. Terminology and Notation . . . . . . . . . . . . . . . . 3 | 1.1. Terminology and Notation . . . . . . . . . . . . . . . . 3 | |||
2. Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 4 | 2. Objectives . . . . . . . . . . . . . . . . . . . . . . . . . 4 | |||
3. Alarm Module Concepts . . . . . . . . . . . . . . . . . . . . 5 | 3. Alarm Module Concepts . . . . . . . . . . . . . . . . . . . . 5 | |||
3.1. Alarm Definition . . . . . . . . . . . . . . . . . . . . 5 | 3.1. Alarm Definition . . . . . . . . . . . . . . . . . . . . 5 | |||
3.2. Alarm Type . . . . . . . . . . . . . . . . . . . . . . . 5 | 3.2. Alarm Type . . . . . . . . . . . . . . . . . . . . . . . 5 | |||
3.3. Identifying the Alarming Resource . . . . . . . . . . . . 7 | 3.3. Identifying the Alarming Resource . . . . . . . . . . . . 7 | |||
3.4. Identifying Alarm Instances . . . . . . . . . . . . . . . 8 | 3.4. Identifying Alarm Instances . . . . . . . . . . . . . . . 8 | |||
3.5. Alarm Life-Cycle . . . . . . . . . . . . . . . . . . . . 8 | 3.5. Alarm Life-Cycle . . . . . . . . . . . . . . . . . . . . 8 | |||
3.5.1. Resource Alarm Life-Cycle . . . . . . . . . . . . . . 8 | 3.5.1. Resource Alarm Life-Cycle . . . . . . . . . . . . . . 9 | |||
3.5.2. Operator Alarm Life-cycle . . . . . . . . . . . . . . 9 | 3.5.2. Operator Alarm Life-cycle . . . . . . . . . . . . . . 10 | |||
3.5.3. Administrative Alarm Life-Cycle . . . . . . . . . . . 10 | 3.5.3. Administrative Alarm Life-Cycle . . . . . . . . . . . 10 | |||
3.6. Root Cause, Impacted Resources and Related Alarms . . . . 10 | 3.6. Root Cause, Impacted Resources and Related Alarms . . . . 10 | |||
3.7. Alarm Shelving . . . . . . . . . . . . . . . . . . . . . 11 | 3.7. Alarm Shelving . . . . . . . . . . . . . . . . . . . . . 11 | |||
3.8. Alarm Profiles . . . . . . . . . . . . . . . . . . . . . 11 | 3.8. Alarm Profiles . . . . . . . . . . . . . . . . . . . . . 11 | |||
4. Alarm Data Model . . . . . . . . . . . . . . . . . . . . . . 11 | 4. Alarm Data Model . . . . . . . . . . . . . . . . . . . . . . 12 | |||
4.1. Alarm Control . . . . . . . . . . . . . . . . . . . . . . 12 | 4.1. Alarm Control . . . . . . . . . . . . . . . . . . . . . . 13 | |||
4.1.1. Alarm Shelving . . . . . . . . . . . . . . . . . . . 12 | 4.1.1. Alarm Shelving . . . . . . . . . . . . . . . . . . . 13 | |||
4.2. Alarm Inventory . . . . . . . . . . . . . . . . . . . . . 12 | 4.2. Alarm Inventory . . . . . . . . . . . . . . . . . . . . . 13 | |||
4.3. Alarm Summary . . . . . . . . . . . . . . . . . . . . . . 13 | 4.3. Alarm Summary . . . . . . . . . . . . . . . . . . . . . . 14 | |||
4.4. The Alarm List . . . . . . . . . . . . . . . . . . . . . 13 | 4.4. The Alarm List . . . . . . . . . . . . . . . . . . . . . 15 | |||
4.5. The Shelved Alarms List . . . . . . . . . . . . . . . . . 13 | 4.5. The Shelved Alarms List . . . . . . . . . . . . . . . . . 17 | |||
4.6. Alarm Profiles . . . . . . . . . . . . . . . . . . . . . 14 | 4.6. Alarm Profiles . . . . . . . . . . . . . . . . . . . . . 17 | |||
4.7. RPCs and Actions . . . . . . . . . . . . . . . . . . . . 14 | 4.7. RPCs and Actions . . . . . . . . . . . . . . . . . . . . 17 | |||
4.8. Notifications . . . . . . . . . . . . . . . . . . . . . . 14 | 4.8. Notifications . . . . . . . . . . . . . . . . . . . . . . 17 | |||
5. Alarm YANG Module . . . . . . . . . . . . . . . . . . . . . . 14 | 5. Alarm YANG Module . . . . . . . . . . . . . . . . . . . . . . 18 | |||
6. X.733 Extensions . . . . . . . . . . . . . . . . . . . . . . 44 | 6. X.733 Extensions . . . . . . . . . . . . . . . . . . . . . . 47 | |||
7. The X.733 Mapping Module . . . . . . . . . . . . . . . . . . 44 | 7. The X.733 Mapping Module . . . . . . . . . . . . . . . . . . 48 | |||
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 55 | 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 59 | |||
9. Security Considerations . . . . . . . . . . . . . . . . . . . 56 | 9. Security Considerations . . . . . . . . . . . . . . . . . . . 59 | |||
10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 57 | 10. Acknowledgements . . . . . . . . . . . . . . . . . . . . . . 60 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 57 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 60 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 57 | 11.1. Normative References . . . . . . . . . . . . . . . . . . 60 | |||
11.2. Informative References . . . . . . . . . . . . . . . . . 58 | 11.2. Informative References . . . . . . . . . . . . . . . . . 61 | |||
Appendix A. Vendor-specific Alarm-Types Example . . . . . . . . 59 | Appendix A. Vendor-specific Alarm-Types Example . . . . . . . . 62 | |||
Appendix B. Alarm Inventory Example . . . . . . . . . . . . . . 60 | Appendix B. Alarm Inventory Example . . . . . . . . . . . . . . 63 | |||
Appendix C. Alarm List Example . . . . . . . . . . . . . . . . . 61 | Appendix C. Alarm List Example . . . . . . . . . . . . . . . . . 64 | |||
Appendix D. Alarm Shelving Example . . . . . . . . . . . . . . . 62 | Appendix D. Alarm Shelving Example . . . . . . . . . . . . . . . 65 | |||
Appendix E. X.733 Mapping Example . . . . . . . . . . . . . . . 63 | Appendix E. X.733 Mapping Example . . . . . . . . . . . . . . . 66 | |||
Appendix F. Background and Usability Requirements . . . . . . . 64 | Appendix F. Background and Usability Requirements . . . . . . . 67 | |||
F.1. Alarm Concepts . . . . . . . . . . . . . . . . . . . . . 64 | F.1. Alarm Concepts . . . . . . . . . . . . . . . . . . . . . 67 | |||
F.1.1. Alarm type . . . . . . . . . . . . . . . . . . . . . 64 | F.1.1. Alarm type . . . . . . . . . . . . . . . . . . . . . 67 | |||
F.2. Usability Requirements . . . . . . . . . . . . . . . . . 65 | F.2. Relationships to other alarm standards . . . . . . . . . 68 | |||
F.2.1. Alarm definition . . . . . . . . . . . . . . . . . . 68 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 68 | F.2.2. Data model . . . . . . . . . . . . . . . . . . . . . 70 | |||
F.3. Usability Requirements . . . . . . . . . . . . . . . . . 72 | ||||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 75 | ||||
1. Introduction | 1. Introduction | |||
This document defines a YANG [RFC7950] module for alarm management. | This document defines a YANG [RFC7950] module for alarm management. | |||
The purpose is to define a standardised alarm interface for network | The purpose is to define a standardized alarm interface for network | |||
devices that can be easily integrated into management applications. | devices that can be easily integrated into management applications. | |||
The model is also applicable as a northbound alarm interface in the | The model is also applicable as a northbound alarm interface in the | |||
management applications. | management applications. | |||
Alarm monitoring is a fundamental part of monitoring the network. | Alarm monitoring is a fundamental part of monitoring the network. | |||
Raw alarms from devices do not always tell the status of the network | Raw alarms from devices do not always tell the status of the network | |||
services or necessarily point to the root cause. However, being able | services or necessarily point to the root cause. However, being able | |||
to feed alarms to the alarm management application in a standardised | to feed alarms to the alarm management application in a standardized | |||
format is a starting point for performing higher level network | format is a starting point for performing higher level network | |||
assurance tasks. | assurance tasks. | |||
The design of the module is based on experience from using and | The design of the module is based on experience from using and | |||
implementing available alarm standards from ITU [X.733], 3GPP | implementing available alarm standards from ITU [X.733], 3GPP | |||
[ALARMIRP] and ANSI [ISA182]. | [ALARMIRP] and ANSI [ISA182]. | |||
1.1. Terminology and Notation | 1.1. Terminology and Notation | |||
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", | |||
skipping to change at page 4, line 20 ¶ | skipping to change at page 4, line 20 ¶ | |||
for example: an interface, a process. | for example: an interface, a process. | |||
o Alarm Instance: The alarm state for a specific resource and alarm | o Alarm Instance: The alarm state for a specific resource and alarm | |||
type. For example (GigabitEthernet0/15, link-alarm). An entry in | type. For example (GigabitEthernet0/15, link-alarm). An entry in | |||
the alarm list. | the alarm list. | |||
o Alarm Inventory: A list of all possible alarm types on a system. | o Alarm Inventory: A list of all possible alarm types on a system. | |||
o Alarm Shelving: Blocking alarms according to specific criteria. | o Alarm Shelving: Blocking alarms according to specific criteria. | |||
o Corrective Action: An action taken by an operator or automation | ||||
routine in order to minimize the impact of the alarm or resolving | ||||
the root cause. | ||||
o Management System: The alarm management application that consumes | o Management System: The alarm management application that consumes | |||
the alarms, i.e., acts as a client. | the alarms, i.e., acts as a client. | |||
o System: The system that implements this YANG alarm module, i.e., | o System: The system that implements this YANG alarm module, i.e., | |||
acts as a server. This corresponds to a network device or a | acts as a server. This corresponds to a network device or a | |||
management application that provides a north-bound alarm | management application that provides a north-bound alarm | |||
interface. | interface. | |||
Tree diagrams used in this document follow the notation defined in | Tree diagrams used in this document follow the notation defined in | |||
[RFC8340]. | [RFC8340]. | |||
skipping to change at page 5, line 37 ¶ | skipping to change at page 5, line 42 ¶ | |||
There are two main things to remember from this definition: | There are two main things to remember from this definition: | |||
1. the definition focuses on leaving out events and logging | 1. the definition focuses on leaving out events and logging | |||
information in general. Alarms should only be used for undesired | information in general. Alarms should only be used for undesired | |||
states that require action. | states that require action. | |||
2. the definition also focuses on alarms as a state on a resource, not | 2. the definition also focuses on alarms as a state on a resource, not | |||
the notifications that report the state changes. | the notifications that report the state changes. | |||
See Appendix F for more motivation and consequences around this | See Appendix F for more motivation and consequences around this | |||
definition. | definition as well as how it relates to other alarm standards. | |||
3.2. Alarm Type | 3.2. Alarm Type | |||
This document defines an alarm type with an alarm type id and an | This document defines an alarm type with an alarm type id and an | |||
alarm type qualifier. | alarm type qualifier. | |||
The alarm type id is modeled as a YANG identity. With YANG | The alarm type id is modeled as a YANG identity. With YANG | |||
identities, new alarm types can be defined in a distributed fashion. | identities, new alarm types can be defined in a distributed fashion. | |||
YANG identities are hierarchical, which means that a hierarchy of | YANG identities are hierarchical, which means that a hierarchy of | |||
alarm types can be defined. | alarm types can be defined. | |||
skipping to change at page 6, line 14 ¶ | skipping to change at page 6, line 17 ¶ | |||
The use of YANG identities means that all possible alarms are | The use of YANG identities means that all possible alarms are | |||
identified at design time. This explicit declaration of alarm types | identified at design time. This explicit declaration of alarm types | |||
makes it easier to allow for alarm qualification reviews and | makes it easier to allow for alarm qualification reviews and | |||
preparation of alarm actions and documentation. | preparation of alarm actions and documentation. | |||
There are occasions where the alarm types are not known at design | There are occasions where the alarm types are not known at design | |||
time. For example, a system with digital inputs that allows users to | time. For example, a system with digital inputs that allows users to | |||
connect detectors (e.g., smoke detector) to the inputs. In this | connect detectors (e.g., smoke detector) to the inputs. In this | |||
case it is a configuration action that says that certain connectors | case it is a configuration action that says that certain connectors | |||
are fire alarms for example. A potential drawback of this is that | are fire alarms for example. | |||
there is a big risk that alarm operators will receive alarm types as | ||||
a surprise, they do not know how to resolve the problem since a | ||||
defined alarm procedure does not necessarily exist. To avoid this | ||||
risk the system MUST publish all possible alarm types in the alarm | ||||
inventory, see Section 4.2. | ||||
In order to allow for dynamic addition of alarm types the alarm | In order to allow for dynamic addition of alarm types the alarm | |||
module also allows for further qualification of the identity based | module allows for further qualification of the identity based alarm | |||
alarm type using a string. | type using a string. A potential drawback of this is that there is a | |||
big risk that alarm operators will receive alarm types as a surprise, | ||||
they do not know how to resolve the problem since a defined alarm | ||||
procedure does not necessarily exist. To avoid this risk the system | ||||
MUST publish all possible alarm types in the alarm inventory, see | ||||
Section 4.2. | ||||
A vendor or standard can then define their own alarm-type hierarchy. | A vendor or standard organization can define their own alarm-type | |||
The example below shows a hierarchy based on X.733 event types: | hierarchy. The example below shows a hierarchy based on X.733 event | |||
types: | ||||
import ietf-alarms { | import ietf-alarms { | |||
prefix al; | prefix al; | |||
} | } | |||
identity vendor-alarms { | identity vendor-alarms { | |||
base al:alarm-type; | base al:alarm-type; | |||
} | } | |||
identity communications-alarm { | identity communications-alarm { | |||
base vendor-alarms; | base vendor-alarms; | |||
} | } | |||
skipping to change at page 7, line 51 ¶ | skipping to change at page 8, line 5 ¶ | |||
A server SHOULD strive to minimize the number of dynamically defined | A server SHOULD strive to minimize the number of dynamically defined | |||
alarm types. | alarm types. | |||
3.3. Identifying the Alarming Resource | 3.3. Identifying the Alarming Resource | |||
It is of vital importance to be able to refer to the alarming | It is of vital importance to be able to refer to the alarming | |||
resource. This reference must be as fine-grained as possible. If | resource. This reference must be as fine-grained as possible. If | |||
the alarming resource exists in the data tree then an instance- | the alarming resource exists in the data tree then an instance- | |||
identifier MUST be used with the full path to the object. | identifier MUST be used with the full path to the object. | |||
When the module is used in a controller/orchestrator/manager the | ||||
original device resource identification can be modified to include | ||||
the device in the path. The details depend on how devices are | ||||
identified, and are out of scope for this specification. | ||||
Example: | ||||
The original device alarm might identify the resource as | ||||
"/dev:interfaces/dev:interface[dev:name='FastEthernet1/0']". | ||||
The resource identification in the manager could look something | ||||
like: "/mgr:devices/mgr:device[mgr:name='xyz123']/dev:interfaces/ | ||||
dev:interface[dev:name='FastEthernet1/0']" | ||||
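The path rewriting in the example above can be sketched as follows. This is a minimal illustration, not part of the specification: it assumes the manager keeps devices under "/mgr:devices/mgr:device" keyed by "mgr:name", exactly as in the example, and the function name `to_manager_resource` is hypothetical. How devices are actually identified is out of scope, so a real manager may use a different prefix.

```python
def to_manager_resource(device_name: str, device_resource: str) -> str:
    """Prefix a device-local instance-identifier with the manager's device path.

    Assumption: devices live under /mgr:devices/mgr:device keyed by mgr:name,
    as in the example above. Other managers may identify devices differently.
    """
    if not device_resource.startswith("/"):
        raise ValueError("expected an absolute instance-identifier")
    if "'" in device_name:
        # Quoting rules for key values are not handled in this sketch.
        raise ValueError("device names containing a single quote are not supported")
    return f"/mgr:devices/mgr:device[mgr:name='{device_name}']{device_resource}"
```

Calling it with the device name 'xyz123' and the device-local path from the example yields the manager-side path shown above.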
This module also allows for alternate naming of the alarming resource | This module also allows for alternate naming of the alarming resource | |||
if it is not available in the data tree. | if it is not available in the data tree. | |||
3.4. Identifying Alarm Instances | 3.4. Identifying Alarm Instances | |||
A primary goal of this alarm module is to remove any ambiguity in how | A primary goal of this alarm module is to remove any ambiguity in how | |||
alarm notifications are mapped to an update of an alarm instance. | alarm notifications are mapped to an update of an alarm instance. | |||
X.733 and especially 3GPP were not really clear on this point. This | X.733 and especially 3GPP were not really clear on this point. This | |||
YANG alarm module states that the tuple (resource, alarm type | YANG alarm module states that the tuple (resource, alarm type | |||
identifier, alarm type qualifier) corresponds to a single alarm | identifier, alarm type qualifier) corresponds to a single alarm | |||
skipping to change at page 12, line 5 ¶ | skipping to change at page 12, line 16 ¶ | |||
The fundamental parts of the data model are the "alarm-list" with | The fundamental parts of the data model are the "alarm-list" with | |||
associated notifications and the "alarm-inventory" list of all | associated notifications and the "alarm-inventory" list of all | |||
possible alarm types. These MUST be implemented by a system. The | possible alarm types. These MUST be implemented by a system. The | |||
rest of the data model is made conditional with the YANG features | rest of the data model is made conditional with the YANG features | |||
"operator-actions", "alarm-shelving", "alarm-history", "alarm- | "operator-actions", "alarm-shelving", "alarm-history", "alarm- | |||
summary", "alarm-profile", and "severity-assignment". | summary", "alarm-profile", and "severity-assignment". | |||
The data model has the following overall structure: | The data model has the following overall structure: | |||
+--rw control | ||||
| +--rw max-alarm-status-changes? union | ||||
| +--rw (notify-status-changes)? | ||||
| | ... | ||||
| +--rw alarm-shelving {alarm-shelving}? | ||||
| ... | ||||
+--ro alarm-inventory | ||||
| +--ro alarm-type* [alarm-type-id alarm-type-qualifier] | ||||
| ... | ||||
+--ro summary {alarm-summary}? | ||||
| +--ro alarm-summary* [severity] | ||||
| | ... | ||||
| +--ro shelves-active? empty {alarm-shelving}? | ||||
+--ro alarm-list | ||||
| +--ro number-of-alarms? yang:gauge32 | ||||
| +--ro last-changed? yang:date-and-time | ||||
| +--ro alarm* [resource alarm-type-id alarm-type-qualifier] | ||||
| ... | ||||
+--ro shelved-alarms {alarm-shelving}? | ||||
| +--ro number-of-shelved-alarms? yang:gauge32 | ||||
| +--ro alarm-shelf-last-changed? yang:date-and-time | ||||
| +--ro shelved-alarm* | ||||
| [resource alarm-type-id alarm-type-qualifier] | ||||
| ... | ||||
+--rw alarm-profile* | ||||
[alarm-type-id alarm-type-qualifier-match resource] | ||||
{alarm-profile}? | ||||
+--rw alarm-type-id al:alarm-type-id | ||||
+--rw alarm-type-qualifier-match string | ||||
+--rw resource al:resource-match | ||||
+--rw description string | ||||
+--rw alarm-severity-assignment-profile | ||||
{severity-assignment}? | ||||
... | ||||
4.1. Alarm Control | 4.1. Alarm Control | |||
The "/alarms/control/notify-status-changes" choice controls if | The "/alarms/control/notify-status-changes" choice controls if | |||
notifications are sent for all state changes, only raise and clear, | notifications are sent for all state changes, only raise and clear, | |||
or only notifications more severe than a configured level. This | or only notifications more severe than a configured level. This | |||
feature in combination with alarm shelving corresponds to the ITU | feature in combination with alarm shelving corresponds to the ITU | |||
Alarm Report Control functionality. | Alarm Report Control functionality. | |||
Every alarm has a list of status changes, this is a circular list. | Every alarm has a list of status changes, this is a circular list. | |||
The length of this list is controlled by "/alarms/control/max-alarm- | The length of this list is controlled by "/alarms/control/max-alarm- | |||
status-changes". | status-changes". | |||
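One way to picture the circular list is a bounded queue. The sketch below is illustrative only: it assumes that the oldest entry is discarded once the configured length is reached, and it uses `None` to stand for "no limit". The helper name `make_status_change_list` is hypothetical.

```python
from collections import deque


def make_status_change_list(max_alarm_status_changes=None):
    """Return a bounded status-change history.

    Assumption: None means the list is unbounded; any positive integer
    caps the number of retained entries.
    """
    return deque(maxlen=max_alarm_status_changes)


history = make_status_change_list(3)
for severity in ["minor", "major", "critical", "cleared"]:
    # Appending to a full deque silently drops the oldest entry.
    history.append(severity)
```

With a limit of 3, the first entry ("minor") is no longer present after the fourth status change is appended.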
4.1.1. Alarm Shelving | 4.1.1. Alarm Shelving | |||
The shelving control tree is shown below: | The shelving control tree is shown below: | |||
+--rw control | ||||
+--rw alarm-shelving {alarm-shelving}? | ||||
+--rw shelf* [name] | ||||
+--rw name string | ||||
+--rw resource* resource-match | ||||
+--rw alarm-type-id? alarm-type-id | ||||
+--rw alarm-type-qualifier-match? string | ||||
+--rw description? string | ||||
Shelved alarms are shown in a dedicated shelved alarm list. The | Shelved alarms are shown in a dedicated shelved alarm list. The | |||
instrumentation MUST move shelved alarms from the alarm list | instrumentation MUST move shelved alarms from the alarm list | |||
(/alarms/alarm-list) to the shelved alarm list (/alarms/shelved- | (/alarms/alarm-list) to the shelved alarm list (/alarms/shelved- | |||
alarms/). Shelved alarms do not generate any notifications. When | alarms/). Shelved alarms do not generate any notifications. When | |||
the shelving criteria are removed or changed the alarm list MUST be | the shelving criteria are removed or changed the alarm list MUST be | |||
updated to the correct actual state of the alarms. | updated to the correct actual state of the alarms. | |||
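The move between the two lists can be sketched as a partition of the alarms by the shelf criteria. This is a simplified illustration under stated assumptions, not the module's normative matching rules: resources are compared by exact string equality, an empty resource list is taken to match every resource, the alarm type id is compared by equality rather than by identity derivation, and "alarm-type-qualifier-match" is treated as a regular expression. The names `Shelf` and `apply_shelves` are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shelf:
    name: str
    resources: List[str] = field(default_factory=list)  # empty: match any (assumption)
    alarm_type_id: Optional[str] = None
    qualifier_match: Optional[str] = None  # treated as a regex (assumption)

    def matches(self, key) -> bool:
        resource, type_id, qualifier = key
        if self.resources and resource not in self.resources:
            return False
        if self.alarm_type_id is not None and type_id != self.alarm_type_id:
            return False
        if self.qualifier_match is not None and not re.fullmatch(
            self.qualifier_match, qualifier
        ):
            return False
        return True


def apply_shelves(alarms, shelves):
    """Split alarms into (alarm_list, shelved_alarms) for the given shelves."""
    alarm_list, shelved = {}, {}
    for key, state in alarms.items():
        target = shelved if any(s.matches(key) for s in shelves) else alarm_list
        target[key] = state
    return alarm_list, shelved
```

Re-running `apply_shelves` whenever the shelf configuration changes re-derives both lists from the actual alarm states, so removing a shelf returns its alarms to the alarm list.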
Shelving and unshelving can only be performed by editing the shelf | Shelving and unshelving can only be performed by editing the shelf | |||
configuration. It cannot be performed on individual alarms. The | configuration. It cannot be performed on individual alarms. The | |||
server will add an operator state indicating that the alarm was | server will add an operator state indicating that the alarm was | |||
skipping to change at page 13, line 16 ¶ | skipping to change at page 14, line 23 ¶ | |||
the alarm type qualifier MUST populate this list. | the alarm type qualifier MUST populate this list. | |||
The optional leaf-list "resource" in the alarm inventory enables the | The optional leaf-list "resource" in the alarm inventory enables the | |||
system to publish for which resources a given alarm type may appear. | system to publish for which resources a given alarm type may appear. | |||
A server MUST implement the alarm inventory in order to enable | A server MUST implement the alarm inventory in order to enable | |||
controlled alarm procedures in the client. | controlled alarm procedures in the client. | |||
The alarm inventory tree is shown below: | The alarm inventory tree is shown below: | |||
+--ro alarm-inventory | ||||
+--ro alarm-type* [alarm-type-id alarm-type-qualifier] | ||||
+--ro alarm-type-id alarm-type-id | ||||
+--ro alarm-type-qualifier alarm-type-qualifier | ||||
+--ro resource* resource-match | ||||
+--ro has-clear boolean | ||||
+--ro severity-levels* severity | ||||
+--ro description string | ||||
4.3. Alarm Summary | 4.3. Alarm Summary | |||
The alarm summary list summarises alarms per severity; how many | The alarm summary list summarizes alarms per severity; how many | |||
cleared, cleared and closed, and closed. It also gives an indication | cleared, cleared and closed, and closed. It also gives an indication | |||
if there are shelved alarms. | if there are shelved alarms. | |||
The alarm summary tree is shown below: | The alarm summary tree is shown below: | |||
+--ro summary {alarm-summary}? | ||||
+--ro alarm-summary* [severity] | ||||
| +--ro severity severity | ||||
| +--ro total? yang:gauge32 | ||||
| +--ro cleared? yang:gauge32 | ||||
| +--ro cleared-not-closed? yang:gauge32 | ||||
| | {operator-actions}? | ||||
| +--ro cleared-closed? yang:gauge32 | ||||
| | {operator-actions}? | ||||
| +--ro not-cleared-closed? yang:gauge32 | ||||
| | {operator-actions}? | ||||
| +--ro not-cleared-not-closed? yang:gauge32 | ||||
| {operator-actions}? | ||||
+--ro shelves-active? empty {alarm-shelving}? | ||||
4.4. The Alarm List | 4.4. The Alarm List | |||
The alarm list (/alarms/alarm-list) is a function from (resource, | The alarm list (/alarms/alarm-list) is a function from (resource, | |||
alarm type, alarm type qualifier) to the current alarm state. | alarm type, alarm type qualifier) to the current composite alarm | |||
state. The composite state includes states for the resource life- | ||||
cycle such as severity, clearance flag and operator states such as | ||||
acknowledgment. | ||||
+--ro alarm-list | ||||
+--ro number-of-alarms? yang:gauge32 | ||||
+--ro last-changed? yang:date-and-time | ||||
+--ro alarm* [resource alarm-type-id alarm-type-qualifier] | ||||
+--ro resource resource | ||||
+--ro alarm-type-id alarm-type-id | ||||
+--ro alarm-type-qualifier alarm-type-qualifier | ||||
+--ro alt-resource* resource | ||||
+--ro related-alarm* | ||||
| [resource alarm-type-id alarm-type-qualifier] | ||||
| +--ro resource | ||||
| | -> /alarms/alarm-list/alarm/resource | ||||
| +--ro alarm-type-id leafref | ||||
| +--ro alarm-type-qualifier leafref | ||||
+--ro impacted-resource* resource | ||||
+--ro root-cause-resource* resource | ||||
+--ro time-created yang:date-and-time | ||||
+--ro is-cleared boolean | ||||
+--ro last-changed yang:date-and-time | ||||
+--ro perceived-severity severity | ||||
+--ro alarm-text alarm-text | ||||
+--ro status-change* [time] {alarm-history}? | ||||
| +--ro time yang:date-and-time | ||||
| +--ro perceived-severity severity-with-clear | ||||
| +--ro alarm-text alarm-text | ||||
+--ro operator-state-change* [time] {operator-actions}? | ||||
| +--ro time yang:date-and-time | ||||
| +--ro operator string | ||||
| +--ro state operator-state | ||||
| +--ro text? string | ||||
+---x set-operator-state {operator-actions}? | ||||
| +---w input | ||||
| +---w state writable-operator-state | ||||
| +---w text? string | ||||
+---n operator-action {operator-actions}? | ||||
+-- time yang:date-and-time | ||||
+-- operator string | ||||
+-- state operator-state | ||||
+-- text? string | ||||
Every alarm has three important states, the resource clearance state | Every alarm has three important states, the resource clearance state | |||
"is-cleared", the severity "perceived-severity" and the operator | "is-cleared", the severity "perceived-severity" and the operator | |||
state available in the operator state change list. | state available in the operator state change list. | |||
In order to see the alarm history the resource state changes are | In order to see the alarm history the resource state changes are | |||
available in the "status-change" list and the operator history is | available in the "status-change" list and the operator history is | |||
available in the "operator-state-change" list. | available in the "operator-state-change" list. | |||
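The idea of the alarm list as a function from (resource, alarm type, alarm type qualifier) to a composite state can be sketched with a dictionary keyed by that tuple. This is an illustration under assumptions: timestamps and operator states are omitted, "cleared" is modelled as a severity value in the status-change history only, and the alarm keeps its last non-clear severity when it is cleared. The function name `report` is hypothetical.

```python
def report(alarm_list, resource, alarm_type_id, alarm_type_qualifier, severity, text):
    """Apply one resource state change to the alarm list.

    The key tuple decides which alarm instance is updated; a second report
    with the same key updates the existing entry instead of adding a new one.
    """
    key = (resource, alarm_type_id, alarm_type_qualifier)
    alarm = alarm_list.setdefault(key, {"status-change": []})
    alarm["is-cleared"] = severity == "cleared"
    if severity != "cleared":
        # Assumption: a clear does not overwrite the last reported severity.
        alarm["perceived-severity"] = severity
    alarm["alarm-text"] = text
    alarm["status-change"].append((severity, text))
    return alarm
```

Raising and then clearing a link alarm on the same resource leaves a single entry in the list, with "is-cleared" set and two entries in its "status-change" history.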
4.5. The Shelved Alarms List | 4.5. The Shelved Alarms List | |||
skipping to change at page 14, line 14 ¶ | skipping to change at page 17, line 20 ¶ | |||
4.6. Alarm Profiles | 4.6. Alarm Profiles | |||
Alarm profiles (/alarms/alarm-profile/) is a list of configurable | Alarm profiles (/alarms/alarm-profile/) is a list of configurable | |||
alarm types. The list supports configurable alarm severity levels in | alarm types. The list supports configurable alarm severity levels in | |||
the container "alarm-severity-assignment-profile". If an alarm | the container "alarm-severity-assignment-profile". If an alarm | |||
matches the configured alarm type it MUST use the configured severity | matches the configured alarm type it MUST use the configured severity | |||
level(s) instead of the system default. This configuration MUST also | level(s) instead of the system default. This configuration MUST also | |||
be represented in the alarm inventory. | be represented in the alarm inventory. | |||
+--rw alarm-profile* | ||||
[alarm-type-id alarm-type-qualifier-match resource] | ||||
{alarm-profile}? | ||||
+--rw alarm-type-id al:alarm-type-id | ||||
+--rw alarm-type-qualifier-match string | ||||
+--rw resource al:resource-match | ||||
+--rw description string | ||||
+--rw alarm-severity-assignment-profile | ||||
{severity-assignment}? | ||||
+--rw severity-levels* al:severity | ||||
4.7. RPCs and Actions | 4.7. RPCs and Actions | |||
The alarm module supports rpcs and actions to manage the alarms: | The alarm module supports rpcs and actions to manage the alarms: | |||
"purge-alarms" (rpc): delete alarms according to specific | "purge-alarms" (rpc): delete alarms according to specific | |||
criteria, for example all cleared alarms older than a specific | criteria, for example all cleared alarms older than a specific | |||
date. | date. | |||
"compress-alarms" (rpc): compress the status-change list for the | "compress-alarms" (rpc): compress the status-change list for the | |||
alarms. | alarms. | |||
skipping to change at page 14, line 45 ¶ | skipping to change at page 18, line 16 ¶ | |||
operator state on an alarm, like acknowledge. | operator state on an alarm, like acknowledge. | |||
If the alarm inventory is changed, for example a new card type is | If the alarm inventory is changed, for example a new card type is | |||
inserted, a notification will tell the management application that | inserted, a notification will tell the management application that | |||
new alarm types are available. | new alarm types are available. | |||
5. Alarm YANG Module | 5. Alarm YANG Module | |||
This YANG module references [RFC6991]. | This YANG module references [RFC6991]. | |||
<CODE BEGINS> file "ietf-alarms@2018-08-08.yang" | <CODE BEGINS> file "ietf-alarms@2018-09-20.yang" | |||
module ietf-alarms { | module ietf-alarms { | |||
yang-version 1.1; | yang-version 1.1; | |||
namespace "urn:ietf:params:xml:ns:yang:ietf-alarms"; | namespace "urn:ietf:params:xml:ns:yang:ietf-alarms"; | |||
prefix al; | prefix al; | |||
import ietf-yang-types { | import ietf-yang-types { | |||
prefix yang; | prefix yang; | |||
reference "RFC 6991: Common YANG Data Types."; | reference "RFC 6991: Common YANG Data Types."; | |||
} | } | |||
skipping to change at page 17, line 27 ¶ | skipping to change at page 20, line 43 ¶ | |||
The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL | The key words 'MUST', 'MUST NOT', 'REQUIRED', 'SHALL', 'SHALL | |||
NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and | NOT', 'SHOULD', 'SHOULD NOT', 'RECOMMENDED', 'MAY', and | |||
'OPTIONAL' in the module text are to be interpreted as described | 'OPTIONAL' in the module text are to be interpreted as described | |||
in RFC 2119 (https://tools.ietf.org/html/rfc2119). | in RFC 2119 (https://tools.ietf.org/html/rfc2119). | |||
This version of this YANG module is part of RFC XXXX | This version of this YANG module is part of RFC XXXX | |||
(https://tools.ietf.org/html/rfcXXXX); see the RFC itself for | (https://tools.ietf.org/html/rfcXXXX); see the RFC itself for | |||
full legal notices."; | full legal notices."; | |||
revision 2018-08-08 { | revision 2018-09-20 { | |||
description | description | |||
"Initial revision."; | "Initial revision."; | |||
reference "RFC XXXX: YANG Alarm Module"; | reference "RFC XXXX: YANG Alarm Module"; | |||
} | } | |||
/* | /* | |||
* Features | * Features | |||
*/ | */ | |||
feature operator-actions { | feature operator-actions { | |||
skipping to change at page 44, line 42 ¶ | skipping to change at page 48, line 12 ¶ | |||
mapping provided by the system is in conflict with other management | mapping provided by the system is in conflict with other management | |||
systems or not considered correct. | systems or not considered correct. | |||
Note that the IETF Alarm Module term 'resource' is synonymous to the | Note that the IETF Alarm Module term 'resource' is synonymous to the | |||
ITU term 'managed object'. | ITU term 'managed object'. | |||
7. The X.733 Mapping Module | 7. The X.733 Mapping Module | |||
This YANG module references [X.733] and [X.736]. | This YANG module references [X.733] and [X.736]. | |||
<CODE BEGINS> file "ietf-alarms-x733@2018-08-08.yang" | <CODE BEGINS> file "ietf-alarms-x733@2018-09-20.yang" | |||
module ietf-alarms-x733 { | module ietf-alarms-x733 { | |||
yang-version 1.1; | yang-version 1.1; | |||
namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733"; | namespace "urn:ietf:params:xml:ns:yang:ietf-alarms-x733"; | |||
prefix x733; | prefix x733; | |||
import ietf-alarms { | import ietf-alarms { | |||
prefix al; | prefix al; | |||
} | } | |||
import ietf-yang-types { | import ietf-yang-types { | |||
prefix yang; | prefix yang; | |||
skipping to change at page 45, line 49 ¶ | skipping to change at page 49, line 19 ¶ | |||
The module uses an integer and a corresponding string for | The module uses an integer and a corresponding string for | |||
probable cause instead of a globally defined enumeration, in | probable cause instead of a globally defined enumeration, in | |||
order to be able to manage conflicting enumeration definitions. | order to be able to manage conflicting enumeration definitions. | |||
A single globally defined enumeration is challenging to | A single globally defined enumeration is challenging to | |||
maintain."; | maintain."; | |||
reference | reference | |||
"ITU Recommendation X.733: Information Technology | "ITU Recommendation X.733: Information Technology | |||
- Open Systems Interconnection | - Open Systems Interconnection | |||
- System Management: Alarm Reporting Function"; | - System Management: Alarm Reporting Function"; | |||
revision 2018-08-08 { | revision 2018-09-20 { | |||
description | description | |||
"Initial revision."; | "Initial revision."; | |||
reference "RFC XXXX: YANG Alarm Module"; | reference "RFC XXXX: YANG Alarm Module"; | |||
} | } | |||
/* | /* | |||
* Features | * Features | |||
*/ | */ | |||
feature configure-x733-mapping { | feature configure-x733-mapping { | |||
description | description | |||
"The system supports configurable X733 mapping from | "The system supports configurable X733 mapping from | |||
skipping to change at page 58, line 46 ¶ | skipping to change at page 62, line 17 ¶ | |||
"The semantics of alarm definitions: enabling systematic | "The semantics of alarm definitions: enabling systematic | |||
reasoning about alarms. International Journal of Network | reasoning about alarms. International Journal of Network | |||
Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, | Management, Volume 22, Issue 3, John Wiley and Sons, Ltd, | |||
http://dx.doi.org/10.1002/nem.800", March 2012. | http://dx.doi.org/10.1002/nem.800", March 2012. | |||
[EEMUA] EEMUA Publication No. 191 Engineering Equipment and | [EEMUA] EEMUA Publication No. 191 Engineering Equipment and | |||
Materials Users Association, London, 2 edition., "Alarm | Materials Users Association, London, 2 edition., "Alarm | |||
Systems: A Guide to Design, Management and Procurement.", | Systems: A Guide to Design, Management and Procurement.", | |||
2007. | 2007. | |||
[G.7710] ITU-T, "SERIES G: TRANSMISSION SYSTEMS AND MEDIA, DIGITAL | ||||
SYSTEMS AND NETWORKS Data over Transport - Generic aspects | ||||
- Transport network control aspects. Common equipment | ||||
management function requirements", 2012. | ||||
[ISA182] International Society of Automation, ISA, "ANSI/ISA- | [ISA182] International Society of Automation, ISA, "ANSI/ISA- | |||
18.2-2009 Management of Alarm Systems for the Process | 18.2-2009 Management of Alarm Systems for the Process | |||
Industries", 2009. | Industries", 2009. | |||
[RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management | [RFC3877] Chisholm, S. and D. Romascanu, "Alarm Management | |||
Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, | Information Base (MIB)", RFC 3877, DOI 10.17487/RFC3877, | |||
September 2004, <http://www.rfc-editor.org/info/rfc3877>. | September 2004, <http://www.rfc-editor.org/info/rfc3877>. | |||
[RFC8340] Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams", | [RFC8340] Bjorklund, M. and L. Berger, Ed., "YANG Tree Diagrams", | |||
BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018, | BCP 215, RFC 8340, DOI 10.17487/RFC8340, March 2018, | |||
skipping to change at page 64, line 13 ¶ | skipping to change at page 67, line 13 ¶ | |||
</alarms> | </alarms> | |||
Appendix F. Background and Usability Requirements | Appendix F. Background and Usability Requirements | |||
This section gives background information regarding design choices in | This section gives background information regarding design choices in | |||
the alarm module. It also defines usability requirements for alarms. | the alarm module. It also defines usability requirements for alarms. | |||
Alarm usability is important for an alarm interface. A data-model | Alarm usability is important for an alarm interface. A data-model | |||
will help in defining the format but if the actual alarms are of low | will help in defining the format but if the actual alarms are of low | |||
value we have not achieved the goal of alarm management. | value we have not achieved the goal of alarm management. | |||
The telecommunication domain has standardised an alarm interface in | The telecommunication domain has standardized an alarm interface in | |||
ITU-T X.733 [X.733]. This continued in mobile networks within the | ITU-T X.733 [X.733]. This continued in mobile networks within the | |||
3GPP organisation [ALARMIRP]. Although SNMP is the dominant | 3GPP organization [ALARMIRP]. Although SNMP is the dominant | |||
mechanism for monitoring devices, IETF did not early on standardise | mechanism for monitoring devices, IETF did not early on standardize | |||
an alarm MIB. Instead, management systems interpreted the enterprise | an alarm MIB. Instead, management systems interpreted the enterprise | |||
specific traps per MIB and device to build an alarm list. When the | specific traps per MIB and device to build an alarm list. When the | |||
Alarm MIB [RFC3877] was finally published, it had to address the | Alarm MIB [RFC3877] was finally published, it had to address the | |||
existence of enterprise traps and map these into alarms. This | existence of enterprise traps and map these into alarms. This | |||
requirement led to a MIB that is not always easy to use. | requirement led to a MIB that is not always easy to use. | |||
F.1. Alarm Concepts | F.1. Alarm Concepts | |||
There are two misconceptions regarding alarms and alarm interfaces | There are two misconceptions regarding alarms and alarm interfaces | |||
that are important to sort out. The first problem is that alarms are | that are important to sort out. The first problem is that alarms are | |||
mixed with events in general. Alarms MUST correspond to an | mixed with events in general. Alarms MUST correspond to an | |||
undesirable state that needs corrective action. Many implementations | undesirable state that needs corrective action. Many implementations | |||
of alarm interfaces do not adhere to this principle and just send | of alarm interfaces do not adhere to this principle and just send | |||
events in general. In order to qualify as an alarm, there must exist | events in general. In order to qualify as an alarm, there must exist | |||
a corrective action. If that is not true, it is an event that can go | a corrective action. If that is not true, it is an event that can go | |||
into logs. | into logs. | |||
The other misconception is that the term "alarm" refers to the | ||||
notification itself. Rather, an alarm is a state of a resource in | ||||
the system. The alarm notifications report state changes of the | ||||
alarm, such as alarm raise and alarm clear. | ||||
"One of the most important principles of alarm management is that an | "One of the most important principles of alarm management is that an | |||
alarm requires an action. This means that if the operator does not | alarm requires an action. This means that if the operator does not | |||
need to respond to an alarm (because unacceptable consequences do not | need to respond to an alarm (because unacceptable consequences do not | |||
occur), then it is not an alarm. Following this cardinal rule will | occur), then it is not an alarm. Following this cardinal rule will | |||
help eliminate many potential alarm management issues." [ISA182] | help eliminate many potential alarm management issues." [ISA182] | |||
The other misconception is that the term "alarm" refers to the | ||||
notification itself. Rather, an alarm is a state of a resource in | ||||
the system. The alarm notifications report state changes of the | ||||
alarm, such as alarm raise and alarm clear. | ||||
F.1.1. Alarm type | F.1.1. Alarm type | |||
Since every alarm has a corresponding corrective action, a vendor can | Since every alarm has a corresponding corrective action, a vendor can | |||
prepare a list of available alarms and their corrective actions. | prepare a list of available alarms and their corrective actions. | |||
We use the term "alarm type" to refer to every possible alarm that | We use the term "alarm type" to refer to every possible alarm that | |||
could be active in the system. | could be active in the system. | |||
Alarm types are also fundamental in order to provide a state-based | Alarm types are also fundamental in order to provide a state-based | |||
alarm list. The alarm list correlates alarm state changes for the | alarm list. The alarm list correlates alarm state changes for the | |||
same alarm type and the same resource into one alarm. | same alarm type and the same resource into one alarm. | |||
skipping to change at page 65, line 19 ¶ | skipping to change at page 68, line 19 ¶ | |||
Different alarm interfaces use different mechanisms to define alarm | Different alarm interfaces use different mechanisms to define alarm | |||
types, ranging from simple error numbers to more advanced mechanisms | types, ranging from simple error numbers to more advanced mechanisms | |||
like the X.733 triplet of event type, probable cause and specific | like the X.733 triplet of event type, probable cause and specific | |||
problem. | problem. | |||
A common misunderstanding is that individual alarm notifications are | A common misunderstanding is that individual alarm notifications are | |||
alarm types. This is not correct; e.g., "link-up" and "link-down" | alarm types. This is not correct; e.g., "link-up" and "link-down" | |||
are two notifications reporting different states for the same alarm | are two notifications reporting different states for the same alarm | |||
type, "link-alarm". | type, "link-alarm". | |||
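The correlation described above can be illustrated with a minimal Python sketch. It is not part of the YANG module; the class and method names (AlarmList, on_notification) are hypothetical, and the key of resource, alarm type and qualifier is an assumption taken from the alarm identification used elsewhere in this document. The sketch folds "link-down" and "link-up" style notifications for the same resource into one alarm with a status-change history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical key: (resource, alarm type, alarm type qualifier).
AlarmKey = Tuple[str, str, str]


@dataclass
class Alarm:
    is_cleared: bool
    perceived_severity: str
    # History of (severity, cleared) pairs, one per state change.
    status_changes: List[Tuple[str, bool]] = field(default_factory=list)


class AlarmList:
    """State-based alarm list: one entry per (resource, alarm type)."""

    def __init__(self) -> None:
        self.alarms: Dict[AlarmKey, Alarm] = {}

    def on_notification(self, resource: str, alarm_type: str,
                        severity: str, cleared: bool,
                        qualifier: str = "") -> Alarm:
        key = (resource, alarm_type, qualifier)
        alarm = self.alarms.get(key)
        if alarm is None:
            # First notification for this key creates the alarm.
            alarm = Alarm(is_cleared=cleared, perceived_severity=severity)
            self.alarms[key] = alarm
        else:
            # Later notifications only update the existing alarm.
            alarm.is_cleared = cleared
            if not cleared:
                alarm.perceived_severity = severity
        alarm.status_changes.append((alarm.perceived_severity, cleared))
        return alarm


alarms = AlarmList()
alarms.on_notification("eth0", "link-alarm", "major", cleared=False)  # "link-down"
alarms.on_notification("eth0", "link-alarm", "major", cleared=True)   # "link-up"
```

In this sketch both notifications end up in a single "link-alarm" entry for "eth0", and clearing the alarm leaves its last severity untouched.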
F.2. Usability Requirements | F.2. Relationships to other alarm standards | |||
Common alarm problems and the cause of the problems are summarised in | This section briefly describes how this alarm module relates to other | |||
Table 1. This summary is adapted to networking based on the ISA | relevant alarm standards. It covers the definition of the concept of | |||
an alarm and the data models of the referenced alarm standards. | ||||
F.2.1. Alarm definition | ||||
The table below summarizes relevant definitions of the term "alarm". | ||||
+------------+---------------------------+--------------------------+ | ||||
| Standard | Definition | Comment | | ||||
+------------+---------------------------+--------------------------+ | ||||
| X.733 | error: A deviation of a | The X.733 alarm | | ||||
| [X.733] | system from normal | definition is focused on | | ||||
| | operation. fault: The | the notification as such | | ||||
| | physical or algorithmic | and not the state. It | | ||||
| | cause of a malfunction. | also uses the basic | | ||||
| | Faults manifest | criteria of deviation | | ||||
| | themselves as errors. | from normal condition. | | ||||
| | alarm: A notification, of | There is no requirement | | ||||
| | the form defined by this | for an operator action | | ||||
| | function, of a specific | to be required. | | ||||
| | event. An alarm may or | | | ||||
| | may not represent an | | | ||||
| | error. | | | ||||
| | | | | ||||
| G.7710 | Alarms are indications | The G.7710 definition is | | ||||
| [G.7710] | that are automatically | close to the original | | ||||
| | generated by an NE as a | X.733 definition. | | ||||
| | result of the declaration | | | ||||
| | of a failure. | | | ||||
| | | | | ||||
| Alarm MIB | Alarm: Persistent | RFC 3877 defines alarm | | ||||
| [RFC3877] | indication of a fault. | referring back to "a | | ||||
| | Fault: Lasting error or | deviation from normal | | ||||
| | warning condition. | operation". This is | | ||||
| | Error: A deviation of a | problematic, since this | | ||||
| | system from normal | might not require an | | ||||
| | operation. | operator action. The | | ||||
| | | alarm MIB is state | | ||||
| | | oriented rather than | | ||||
| | | notification oriented, | | ||||
| | | an alarm is a "lasting | | ||||
| | | condition", not a | | ||||
| | | discrete notification | | ||||
| | | reporting about a | | ||||
| | | condition state change. | | ||||
| | | | | ||||
| ISA | Alarm: An audible and/or | The ISA standard adds an | | ||||
| [ISA182] | visible means of | important requirement to | | ||||
| | indicating to the | the "deviation from | | ||||
| | operator an equipment | normal condition state"; | | ||||
| | malfunction, process | requiring a response. | | ||||
| | deviation or abnormal | | | ||||
| | condition requiring a | | | ||||
| | response. | | | ||||
| | | | | ||||
| EEMUA | An alarm is an event to | This is the foundation | | ||||
| [EEMUA] | which an operator must | for the definition of | | ||||
| | knowingly react, respond, | alarm in this document. | | ||||
| | and acknowledge - not | It focuses on the core | | ||||
| | simply acknowledge and | criteria that an action | | ||||
| | ignore. | is really needed. | | ||||
| | | | | ||||
| 3GPP Alarm | 3GPP v15: An alarm | The latest 3GPP Alarm | | ||||
| IRP | signifies an undesired | IRP version uses | | ||||
| [ALARMIRP] | condition of a resource | literally the same alarm | | ||||
| | (e.g. network element, | definition as this alarm | | ||||
| | link) for which an | module. It is worth | | ||||
| | operator action is | noting that earlier | | ||||
| | required. It emphasizes a | versions used a | | ||||
| | key requirement that | definition not requiring | | ||||
| | operators [...] should | an operator action and | | ||||
| | not be informed about an | the more broad | | ||||
| | undesired condition | definition of deviation | | ||||
| | unless it requires | from normal condition. | | ||||
| | operator action. 3GPP | The earlier version also | | ||||
| | v12: alarm: abnormal | defined an alarm as a | | ||||
| | network entity condition, | special case of "event". | | ||||
| | which categorizes an | | | ||||
| | event as a fault. fault: | | | ||||
| | a deviation of a system | | | ||||
| | from normal operation, | | | ||||
| | which may result in the | | | ||||
| | loss of operational | | | ||||
| | capabilities [...] | | | ||||
+------------+---------------------------+--------------------------+ | ||||
Table 1: Definition of alarm in standards | ||||
The definition of alarm has evolved from a focus on events reporting | ||||
a deviation from normal operation towards a definition based on an | ||||
undesired *state* which *requires an operator action*. | ||||
F.2.2. Data model | ||||
This section describes how this YANG alarm module relates to other | ||||
standard data models. Note that we cover only other data models for | ||||
alarm interfaces, not other standards such as SDO-specific | ||||
alarms. | ||||
F.2.2.1. X.733 | ||||
X.733 has acted as a base for several alarm data models over the | ||||
years. The YANG alarm module differs in the following ways: | ||||
X.733 models the alarm list as a list of notifications. The YANG | ||||
alarm module defines the alarm list as the current alarm states | ||||
for the resources, which is generated from the state change | ||||
reporting notifications. | ||||
In X.733 an alarm can have the severity level clear. In the YANG | ||||
alarm module "clear" is not a severity level, it is a separate | ||||
state of the alarm. For example, an alarm can have the states | ||||
(major, cleared) or (minor, not cleared). | ||||
X.733 uses a flat globally defined enumerated "probable cause" to | ||||
identify alarm types. This alarm module uses a hierarchical YANG | ||||
identity, alarm-type. This enables delegation of alarm types | ||||
within organizations. It also lets management reason about | ||||
"abstract" alarm-types corresponding to base identities, see | ||||
Section 3.2. | ||||
The YANG alarm module has not included the majority of the X.733 | ||||
alarm attributes. Rather these are defined in an augmenting | ||||
module if "strict" X.733 compliance is needed. | ||||
F.2.2.2. RFC3877, the Alarm MIB | ||||
The MIB in RFC3877 takes a different approach. Rather than defining a | ||||
concrete data-model for alarms, it defines a model to map existing | ||||
SNMP managed-objects and notifications into alarm states and alarm | ||||
notifications. This was necessary since MIBs were already defined | ||||
with both managed objects and notifications indicating alarms, for | ||||
example linkUp and linkDown notifications in combination with | ||||
ifAdminStatus and ifOperStatus. So RFC3877 cannot really be compared | ||||
to the alarm YANG module in that sense. | ||||
The Alarm MIB maps existing MIB definitions into alarms, | ||||
alarmModelTable. The upside of that is that an SNMP manager can at | ||||
runtime read the possible alarm types. This corresponds to the | ||||
alarmInventory in the alarm YANG module. | ||||
F.2.2.3. 3GPP Alarm IRP | ||||
The 3GPP Alarm IRP is an evolution of X.733. The main differences | ||||
between the alarm YANG module and 3GPP are: | ||||
3GPP keeps the majority of the X.733 attributes, the alarm YANG | ||||
module does not. | ||||
3GPP introduced overlapping and possibly conflicting keys for | ||||
alarms, alarmId and (managed object, event type, probable cause, | ||||
specific problem). (See Annex C in [X.733] Example 3). In the | ||||
YANG alarm module the key for identifying an alarm instance is | ||||
clearly defined by (resource, alarm-type, alarm-type-qualifier). | ||||
See also Section 3.4 for more information. | ||||
The alarm YANG module clearly separates the resource/ | ||||
instrumentation life cycle from the operator life cycle. 3GPP | ||||
allows operators to set the alarm severity to clear. This is not | ||||
allowed by this module; rather, an operator closes an alarm, which | ||||
does not affect the severity. | ||||
F.2.2.4. G.7710 | ||||
G.7710 is different than the previously referenced alarm standards. It | ||||
does not define a data-model for alarm reporting. It defines common | ||||
equipment management function requirements including alarm | ||||
instrumentation. The scope is transport networks. | ||||
The requirements in G.7710 correspond to features in the alarm YANG | ||||
module in the following way: | ||||
Alarm Severity Assignment Profile (ASAP): the alarm profile | ||||
"/alarms/alarm-profile/". | ||||
Alarm Reporting Control (ARC): alarm shelving "/alarms/control/ | ||||
alarm-shelving/" and the ability to control alarm notifications | ||||
"/alarms/control/notify-status-changes". | ||||
F.3. Usability Requirements | ||||
Common alarm problems and the cause of the problems are summarized in | ||||
Table 2. This summary is adapted to networking based on the ISA | ||||
[ISA182] and EEMUA [EEMUA] standards. | [ISA182] and EEMUA [EEMUA] standards. | |||
+------------------+--------------------------------+---------------+ | +------------------+--------------------------------+---------------+ | |||
| Problem | Cause | How this | | | Problem | Cause | How this | | |||
| | | module | | | | | module | | |||
| | | addresses the | | | | | addresses the | | |||
| | | cause | | | | | cause | | |||
+------------------+--------------------------------+---------------+ | +------------------+--------------------------------+---------------+ | |||
| Alarms are | "Nuisance" alarms (chattering | Strict | | | Alarms are | "Nuisance" alarms (chattering | Strict | | |||
| generated but | alarms and fleeting alarms), | definition of | | | generated but | alarms and fleeting alarms), | definition of | | |||
| they are ignored | faulty hardware, redundant | alarms | | | they are ignored | faulty hardware, redundant | alarms | | |||
| by the operator. | alarms, cascading alarms, | requiring | | | by the operator. | alarms, cascading alarms, | requiring | | |||
| | incorrect alarm settings, | corrective | | | | incorrect alarm settings, | corrective | | |||
| | alarms have not been | response. | | | | alarms have not been | response. | | |||
| | rationalised, the alarms | Alarm | | | | rationalized, the alarms | Alarm | | |||
| | represent log information | requirements | | | | represent log information | requirements | | |||
| | rather than true alarms. | in Table 2. | | | | rather than true alarms. | in Table 3. | | |||
| | | | | | | | | | |||
| When alarms | Insufficient alarm response | The alarm | | | When alarms | Insufficient alarm response | The alarm | | |||
| occur, operators | procedures and not well | inventory | | | occur, operators | procedures and not well | inventory | | |||
| do not know how | defined alarm types. | lists all | | | do not know how | defined alarm types. | lists all | | |||
| to respond. | | alarm types | | | to respond. | | alarm types | | |||
| | | and | | | | | and | | |||
| | | corrective | | | | | corrective | | |||
| | | actions. | | | | | actions. | | |||
| | | Alarm | | | | | Alarm | | |||
| | | requirements | | | | | requirements | | |||
| | | in Table 2. | | | | | in Table 3. | | |||
| | | | | | | | | | |||
| The alarm | Nuisance alarms, stale alarms, | The alarm | | | The alarm | Nuisance alarms, stale alarms, | The alarm | | |||
| display is full | alarms from equipment not in | definition | | | display is full | alarms from equipment not in | definition | | |||
| of alarms, even | service. | and alarm | | | of alarms, even | service. | and alarm | | |||
| when there is | | shelving. | | | when there is | | shelving. | | |||
| nothing wrong. | | | | | nothing wrong. | | | | |||
| | | | | | | | | | |||
| During a | Incorrect prioritization of | State-based | | | During a | Incorrect prioritization of | State-based | | |||
| failure, | alarms. Not using advanced | alarm model, | | | failure, | alarms. Not using advanced | alarm model, | | |||
| operators are | alarm techniques (e.g. state- | alarm rate | | | operators are | alarm techniques (e.g. state- | alarm rate | | |||
| flooded with so | based alarming). | requirements | | | flooded with so | based alarming). | requirements | | |||
| many alarms that | | in Table 3 | | | many alarms that | | in Table 4 | | |||
| they do not know | | and Table 4 | | | they do not know | | and Table 5 | | |||
| which ones are | | | | | which ones are | | | | |||
| the most | | | | | the most | | | | |||
| important. | | | | | important. | | | | |||
+------------------+--------------------------------+---------------+ | +------------------+--------------------------------+---------------+ | |||
Table 1: Alarm Problems and Causes | Table 2: Alarm Problems and Causes | |||
Based upon the above problems EEMUA gives the following definition of | Based upon the above problems EEMUA gives the following definition of | |||
a good alarm: | a good alarm: | |||
+----------------+--------------------------------------------------+ | +----------------+--------------------------------------------------+ | |||
| Characteristic | Explanation | | | Characteristic | Explanation | | |||
+----------------+--------------------------------------------------+ | +----------------+--------------------------------------------------+ | |||
| Relevant | Not spurious or of low operational value. | | | Relevant | Not spurious or of low operational value. | | |||
| | | | | | | | |||
| Unique | Not duplicating another alarm. | | | Unique | Not duplicating another alarm. | | |||
| | | | | | | | |||
| Timely | Not long before any response is needed or too | | | Timely | Not long before any response is needed or too | | |||
| | late to do anything. | | | | late to do anything. | | |||
| | | | | | | | |||
| Prioritised | Indicating the importance that the operator | | | Prioritized | Indicating the importance that the operator | | |||
| | deals with the problem. | | | | deals with the problem. | | |||
| | | | | | | | |||
| Understandable | Having a message which is clear and easy to | | | Understandable | Having a message which is clear and easy to | | |||
| | understand. | | | | understand. | | |||
| | | | | | | | |||
| Diagnostic | Identifying the problem that has occurred. | | | Diagnostic | Identifying the problem that has occurred. | | |||
| | | | | | | | |||
| Advisory | Indicative of the action to be taken. | | | Advisory | Indicative of the action to be taken. | | |||
| | | | | | | | |||
| Focusing | Drawing attention to the most important issues. | | | Focusing | Drawing attention to the most important issues. | | |||
+----------------+--------------------------------------------------+ | +----------------+--------------------------------------------------+ | |||
Table 2: Definition of a Good Alarm | Table 3: Definition of a Good Alarm | |||
Vendors SHOULD rationalise all alarms according to above. Another | Vendors SHOULD rationalize all alarms according to above. Another | |||
crucial requirement is acceptable alarm notification rates. Vendors | crucial requirement is acceptable alarm notification rates. Vendors | |||
SHOULD make sure that they do not exceed the recommendations from | SHOULD make sure that they do not exceed the recommendations from | |||
EEMUA below: | EEMUA below: | |||
+-----------------------------------+-------------------------------+ | +-----------------------------------+-------------------------------+ | |||
| Long Term Alarm Rate in Steady | Acceptability | | | Long Term Alarm Rate in Steady | Acceptability | | |||
| Operation | | | | Operation | | | |||
+-----------------------------------+-------------------------------+ | +-----------------------------------+-------------------------------+ | |||
| More than one per minute | Very likely to be | | | More than one per minute | Very likely to be | | |||
| | unacceptable. | | | | unacceptable. | | |||
| | | | | | | | |||
| One per 2 minutes | Likely to be over-demanding. | | | One per 2 minutes | Likely to be over-demanding. | | |||
| | | | | | | | |||
| One per 5 minutes | Manageable. | | | One per 5 minutes | Manageable. | | |||
| | | | | | | | |||
| Less than one per 10 minutes | Very likely to be acceptable. | | | Less than one per 10 minutes | Very likely to be acceptable. | | |||
+-----------------------------------+-------------------------------+ | +-----------------------------------+-------------------------------+ | |||
Table 3: Acceptable Alarm Rates, Steady State | Table 4: Acceptable Alarm Rates, Steady State | |||
+----------------------------+--------------------------------------+ | +----------------------------+--------------------------------------+ | |||
| Number of alarms displayed | Acceptability | | | Number of alarms displayed | Acceptability | | |||
| in 10 minutes following a | | | | in 10 minutes following a | | | |||
| major network problem | | | | major network problem | | | |||
+----------------------------+--------------------------------------+ | +----------------------------+--------------------------------------+ | |||
| More than 100 | Definitely excessive and very likely | | | More than 100 | Definitely excessive and very likely | | |||
| | to lead to the operator to abandon | | | | to lead to the operator to abandon | | |||
| | the use of the alarm system. | | | | the use of the alarm system. | | |||
| | | | | | | | |||
| 20-100 | Hard to cope with. | | | 20-100 | Hard to cope with. | | |||
| | | | | | | | |||
| Under 10 | Should be manageable - but may be | | | Under 10 | Should be manageable - but may be | | |||
| | difficult if several of the alarms | | | | difficult if several of the alarms | | |||
| | require a complex operator response. | | | | require a complex operator response. | | |||
+----------------------------+--------------------------------------+ | +----------------------------+--------------------------------------+ | |||
Table 4: Acceptable Alarm Rates, Burst | Table 5: Acceptable Alarm Rates, Burst | |||
The numbers in Table 3 and Table 4 are the sum of all alarms for a | The numbers in Table 4 and Table 5 are the sum of all alarms for a | |||
network being managed from one alarm console. So every individual | network being managed from one alarm console. So every individual | |||
system or NMS contributes to these numbers. | system or NMS contributes to these numbers. | |||
Vendors SHOULD make sure that the following rules are used in | Vendors SHOULD make sure that the following rules are used in | |||
designing the alarm interface: | designing the alarm interface: | |||
1. Rationalize the alarms in the system to ensure that every alarm | 1. Rationalize the alarms in the system to ensure that every alarm | |||
is necessary, has a purpose, and follows the cardinal rule - that | is necessary, has a purpose, and follows the cardinal rule - that | |||
it requires an operator response. Every alarm should also adhere to | it requires an operator response. Every alarm should also adhere to | |||
the rules of Table 2. | the rules of Table 3. | |||
2. Audit the quality of the alarms. Talk with the operators about | 2. Audit the quality of the alarms. Talk with the operators about | |||
how well the alarm information supports them. Do they know what | how well the alarm information supports them. Do they know what | |||
to do in the event of an alarm? Are they able to quickly | to do in the event of an alarm? Are they able to quickly | |||
diagnose the problem and determine the corrective action? Does | diagnose the problem and determine the corrective action? Does | |||
the alarm text adhere to the requirements in Table 2? | the alarm text adhere to the requirements in Table 3? | |||
3. Analyze and benchmark the performance of the system and compare | 3. Analyze and benchmark the performance of the system and compare | |||
it to the recommended metrics in Table 3 and Table 4. Start by | it to the recommended metrics in Table 4 and Table 5. Start by | |||
identifying nuisance alarms, standing alarms at normal state and | identifying nuisance alarms, standing alarms at normal state and | |||
startup. | startup. | |||
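The benchmarking step in rule 3 can be sketched as a comparison of measured alarm counts against the steady-state and burst tables above. The following Python is a minimal illustration and is not part of the draft: the function names are hypothetical, and because the tables list only sample rates, the band edges between rows (for example, treating anything above one alarm per 5 minutes and up to one per minute as "over-demanding") are assumptions. The burst table does not cover 10 to 19 alarms, so the sketch returns None for that range rather than guessing.

```python
# Illustrative sketch only. Function names and the band edges between
# table rows are assumptions, not definitions from the draft.

def classify_steady_state(alarm_count, window_minutes):
    """Map an average alarm rate to the steady-state acceptability table."""
    if window_minutes <= 0:
        raise ValueError("window_minutes must be positive")
    # Compare by cross-multiplication to avoid floating-point rate values.
    if alarm_count > window_minutes:            # more than one per minute
        return "Very likely to be unacceptable."
    if 5 * alarm_count > window_minutes:        # assumed band: above one per 5 minutes
        return "Likely to be over-demanding."
    if 10 * alarm_count >= window_minutes:      # assumed band: one per 10 to one per 5 minutes
        return "Manageable."
    return "Very likely to be acceptable."      # less than one per 10 minutes


def classify_burst(alarms_in_10_minutes):
    """Map the alarm count in the 10 minutes after a major problem to the burst table."""
    if alarms_in_10_minutes > 100:
        return "Definitely excessive."
    if 20 <= alarms_in_10_minutes <= 100:
        return "Hard to cope with."
    if alarms_in_10_minutes < 10:
        return "Should be manageable."
    return None  # 10-19 alarms: the table gives no guidance


if __name__ == "__main__":
    print(classify_steady_state(12, 60))
    print(classify_burst(50))
```

As the text notes, the counts passed in would be the sum of all alarms reaching one alarm console, not the count from a single system.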
Authors' Addresses | Authors' Addresses | |||
Stefan Vallin | Stefan Vallin | |||
Stefan Vallin AB | Stefan Vallin AB | |||
Email: stefan@wallan.se | Email: stefan@wallan.se | |||
Martin Bjorklund | Martin Bjorklund | |||
End of changes. 46 change blocks. | ||||
82 lines changed or deleted | 411 lines changed or added | |||