draft-ietf-rift-rift-10.txt   draft-ietf-rift-rift-11.txt 
RIFT Working Group A. Przygienda, Ed. RIFT Working Group A. Przygienda, Ed.
Internet-Draft Juniper Internet-Draft Juniper
Intended status: Standards Track A. Sharma Intended status: Standards Track A. Sharma
Expires: August 1, 2020 Comcast Expires: September 11, 2020 Comcast
P. Thubert P. Thubert
Cisco Cisco
Bruno. Rijsman Bruno. Rijsman
Individual Individual
Dmitry. Afanasiev Dmitry. Afanasiev
Yandex Yandex
January 29, 2020 March 10, 2020
RIFT: Routing in Fat Trees RIFT: Routing in Fat Trees
draft-ietf-rift-rift-10 draft-ietf-rift-rift-11
Abstract Abstract
This document defines a specialized, dynamic routing protocol for This document defines a specialized, dynamic routing protocol for
Clos and fat-tree network topologies optimized towards minimization Clos and fat-tree network topologies optimized towards minimization
of configuration and operational complexity. The protocol of configuration and operational complexity. The protocol
o deals with no configuration, fully automated construction of fat- o deals with no configuration, fully automated construction of fat-
tree topologies based on detection of links, tree topologies based on detection of links,
skipping to change at page 2, line 20 skipping to change at page 2, line 20
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at https://datatracker.ietf.org/drafts/current/. Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on August 1, 2020. This Internet-Draft will expire on September 11, 2020.
Copyright Notice Copyright Notice
Copyright (c) 2020 IETF Trust and the persons identified as the Copyright (c) 2020 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(https://trustee.ietf.org/license-info) in effect on the date of (https://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 18 skipping to change at page 3, line 18
4.2.2.1. LIE FSM . . . . . . . . . . . . . . . . . . . . . 36 4.2.2.1. LIE FSM . . . . . . . . . . . . . . . . . . . . . 36
4.2.3. Topology Exchange (TIE Exchange) . . . . . . . . . . 46 4.2.3. Topology Exchange (TIE Exchange) . . . . . . . . . . 46
4.2.3.1. Topology Information Elements . . . . . . . . . . 46 4.2.3.1. Topology Information Elements . . . . . . . . . . 46
4.2.3.2. South- and Northbound Representation . . . . . . 46 4.2.3.2. South- and Northbound Representation . . . . . . 46
4.2.3.3. Flooding . . . . . . . . . . . . . . . . . . . . 49 4.2.3.3. Flooding . . . . . . . . . . . . . . . . . . . . 49
4.2.3.4. TIE Flooding Scopes . . . . . . . . . . . . . . . 56 4.2.3.4. TIE Flooding Scopes . . . . . . . . . . . . . . . 56
4.2.3.5. 'Flood Only Node TIEs' Bit . . . . . . . . . . . 59 4.2.3.5. 'Flood Only Node TIEs' Bit . . . . . . . . . . . 59
4.2.3.6. Initial and Periodic Database Synchronization . . 60 4.2.3.6. Initial and Periodic Database Synchronization . . 60
4.2.3.7. Purging and Roll-Overs . . . . . . . . . . . . . 60 4.2.3.7. Purging and Roll-Overs . . . . . . . . . . . . . 60
4.2.3.8. Southbound Default Route Origination . . . . . . 61 4.2.3.8. Southbound Default Route Origination . . . . . . 61
4.2.3.9. Northbound TIE Flooding Reduction . . . . . . . . 61 4.2.3.9. Northbound TIE Flooding Reduction . . . . . . . . 62
4.2.3.10. Special Considerations . . . . . . . . . . . . . 66 4.2.3.10. Special Considerations . . . . . . . . . . . . . 67
4.2.4. Reachability Computation . . . . . . . . . . . . . . 67 4.2.4. Reachability Computation . . . . . . . . . . . . . . 67
4.2.4.1. Northbound SPF . . . . . . . . . . . . . . . . . 67 4.2.4.1. Northbound SPF . . . . . . . . . . . . . . . . . 67
4.2.4.2. Southbound SPF . . . . . . . . . . . . . . . . . 68 4.2.4.2. Southbound SPF . . . . . . . . . . . . . . . . . 68
4.2.4.3. East-West Forwarding Within a non-ToF Level . . . 68 4.2.4.3. East-West Forwarding Within a non-ToF Level . . . 69
4.2.4.4. East-West Links Within ToF Level . . . . . . . . 68 4.2.4.4. East-West Links Within ToF Level . . . . . . . . 69
4.2.5. Automatic Disaggregation on Link & Node Failures . . 69 4.2.5. Automatic Disaggregation on Link & Node Failures . . 69
4.2.5.1. Positive, Non-transitive Disaggregation . . . . . 69 4.2.5.1. Positive, Non-transitive Disaggregation . . . . . 69
4.2.5.2. Negative, Transitive Disaggregation for Fallen 4.2.5.2. Negative, Transitive Disaggregation for Fallen
Leaves . . . . . . . . . . . . . . . . . . . . . 72 Leaves . . . . . . . . . . . . . . . . . . . . . 73
4.2.6. Attaching Prefixes . . . . . . . . . . . . . . . . . 74 4.2.6. Attaching Prefixes . . . . . . . . . . . . . . . . . 75
4.2.7. Optional Zero Touch Provisioning (ZTP) . . . . . . . 83 4.2.7. Optional Zero Touch Provisioning (ZTP) . . . . . . . 84
4.2.7.1. Terminology . . . . . . . . . . . . . . . . . . . 84 4.2.7.1. Terminology . . . . . . . . . . . . . . . . . . . 85
4.2.7.2. Automatic SystemID Selection . . . . . . . . . . 85 4.2.7.2. Automatic SystemID Selection . . . . . . . . . . 86
4.2.7.3. Generic Fabric Example . . . . . . . . . . . . . 86 4.2.7.3. Generic Fabric Example . . . . . . . . . . . . . 87
4.2.7.4. Level Determination Procedure . . . . . . . . . . 87 4.2.7.4. Level Determination Procedure . . . . . . . . . . 88
4.2.7.5. ZTP FSM . . . . . . . . . . . . . . . . . . . . . 88 4.2.7.5. ZTP FSM . . . . . . . . . . . . . . . . . . . . . 89
4.2.7.6. Resulting Topologies . . . . . . . . . . . . . . 94 4.2.7.6. Resulting Topologies . . . . . . . . . . . . . . 95
4.2.8. Stability Considerations . . . . . . . . . . . . . . 96 4.2.8. Stability Considerations . . . . . . . . . . . . . . 97
4.3. Further Mechanisms . . . . . . . . . . . . . . . . . . . 97 4.3. Further Mechanisms . . . . . . . . . . . . . . . . . . . 98
4.3.1. Overload Bit . . . . . . . . . . . . . . . . . . . . 97 4.3.1. Overload Bit . . . . . . . . . . . . . . . . . . . . 98
4.3.2. Optimized Route Computation on Leaves . . . . . . . . 97 4.3.2. Optimized Route Computation on Leaves . . . . . . . . 98
4.3.3. Mobility . . . . . . . . . . . . . . . . . . . . . . 97 4.3.3. Mobility . . . . . . . . . . . . . . . . . . . . . . 98
4.3.3.1. Clock Comparison . . . . . . . . . . . . . . . . 99 4.3.3.1. Clock Comparison . . . . . . . . . . . . . . . . 100
4.3.3.2. Interaction between Time Stamps and Sequence 4.3.3.2. Interaction between Time Stamps and Sequence
Counters . . . . . . . . . . . . . . . . . . . . 99 Counters . . . . . . . . . . . . . . . . . . . . 100
4.3.3.3. Anycast vs. Unicast . . . . . . . . . . . . . . . 100 4.3.3.3. Anycast vs. Unicast . . . . . . . . . . . . . . . 101
4.3.3.4. Overlays and Signaling . . . . . . . . . . . . . 100 4.3.3.4. Overlays and Signaling . . . . . . . . . . . . . 101
4.3.4. Key/Value Store . . . . . . . . . . . . . . . . . . . 100 4.3.4. Key/Value Store . . . . . . . . . . . . . . . . . . . 101
4.3.4.1. Southbound . . . . . . . . . . . . . . . . . . . 100 4.3.4.1. Southbound . . . . . . . . . . . . . . . . . . . 101
4.3.4.2. Northbound . . . . . . . . . . . . . . . . . . . 101 4.3.4.2. Northbound . . . . . . . . . . . . . . . . . . . 102
4.3.5. Interactions with BFD . . . . . . . . . . . . . . . . 101 4.3.5. Interactions with BFD . . . . . . . . . . . . . . . . 102
4.3.6. Fabric Bandwidth Balancing . . . . . . . . . . . . . 102 4.3.6. Fabric Bandwidth Balancing . . . . . . . . . . . . . 103
4.3.6.1. Northbound Direction . . . . . . . . . . . . . . 102 4.3.6.1. Northbound Direction . . . . . . . . . . . . . . 103
4.3.6.2. Southbound Direction . . . . . . . . . . . . . . 104 4.3.6.2. Southbound Direction . . . . . . . . . . . . . . 106
4.3.7. Label Binding . . . . . . . . . . . . . . . . . . . . 105 4.3.7. Label Binding . . . . . . . . . . . . . . . . . . . . 106
4.3.8. Leaf to Leaf Procedures . . . . . . . . . . . . . . . 105 4.3.8. Leaf to Leaf Procedures . . . . . . . . . . . . . . . 106
4.3.9. Address Family and Multi Topology Considerations . . 105 4.3.9. Address Family and Multi Topology Considerations . . 106
4.3.10. Reachability of Internal Nodes in the Fabric . . . . 106 4.3.10. Reachability of Internal Nodes in the Fabric . . . . 107
4.3.11. One-Hop Healing of Levels with East-West Links . . . 106 4.3.11. One-Hop Healing of Levels with East-West Links . . . 107
4.4. Security . . . . . . . . . . . . . . . . . . . . . . . . 106 4.4. Security . . . . . . . . . . . . . . . . . . . . . . . . 107
4.4.1. Security Model . . . . . . . . . . . . . . . . . . . 106 4.4.1. Security Model . . . . . . . . . . . . . . . . . . . 107
4.4.2. Security Mechanisms . . . . . . . . . . . . . . . . . 108 4.4.2. Security Mechanisms . . . . . . . . . . . . . . . . . 109
4.4.3. Security Envelope . . . . . . . . . . . . . . . . . . 109 4.4.3. Security Envelope . . . . . . . . . . . . . . . . . . 110
4.4.4. Weak Nonces . . . . . . . . . . . . . . . . . . . . . 112 4.4.4. Weak Nonces . . . . . . . . . . . . . . . . . . . . . 113
4.4.5. Lifetime . . . . . . . . . . . . . . . . . . . . . . 113 4.4.5. Lifetime . . . . . . . . . . . . . . . . . . . . . . 114
4.4.6. Key Management . . . . . . . . . . . . . . . . . . . 113 4.4.6. Key Management . . . . . . . . . . . . . . . . . . . 114
4.4.7. Security Association Changes . . . . . . . . . . . . 113 4.4.7. Security Association Changes . . . . . . . . . . . . 114
5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 114 5. Examples . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.1. Normal Operation . . . . . . . . . . . . . . . . . . . . 114 5.1. Normal Operation . . . . . . . . . . . . . . . . . . . . 115
5.2. Leaf Link Failure . . . . . . . . . . . . . . . . . . . . 115 5.2. Leaf Link Failure . . . . . . . . . . . . . . . . . . . . 117
5.3. Partitioned Fabric . . . . . . . . . . . . . . . . . . . 116 5.3. Partitioned Fabric . . . . . . . . . . . . . . . . . . . 118
5.4. Northbound Partitioned Router and Optional East-West 5.4. Northbound Partitioned Router and Optional East-West
Links . . . . . . . . . . . . . . . . . . . . . . . . . . 118 Links . . . . . . . . . . . . . . . . . . . . . . . . . . 119
6. Implementation and Operation: Further Details . . . . . . . . 118 6. Implementation and Operation: Further Details . . . . . . . . 120
6.1. Considerations for Leaf-Only Implementation . . . . . . . 118 6.1. Considerations for Leaf-Only Implementation . . . . . . . 120
6.2. Considerations for Spine Implementation . . . . . . . . . 119 6.2. Considerations for Spine Implementation . . . . . . . . . 121
6.3. Adaptations to Other Proposed Data Center Topologies . . 119 6.3. Adaptations to Other Proposed Data Center Topologies . . 121
6.4. Originating Non-Default Route Southbound . . . . . . . . 120 6.4. Originating Non-Default Route Southbound . . . . . . . . 122
7. Security Considerations . . . . . . . . . . . . . . . . . . . 120 7. Security Considerations . . . . . . . . . . . . . . . . . . . 122
7.1. General . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.1. General . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.2. ZTP . . . . . . . . . . . . . . . . . . . . . . . . . . . 120 7.2. ZTP . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
7.3. Lifetime . . . . . . . . . . . . . . . . . . . . . . . . 121 7.3. Lifetime . . . . . . . . . . . . . . . . . . . . . . . . 123
7.4. Packet Number . . . . . . . . . . . . . . . . . . . . . . 121 7.4. Packet Number . . . . . . . . . . . . . . . . . . . . . . 123
7.5. Outer Fingerprint Attacks . . . . . . . . . . . . . . . . 121 7.5. Outer Fingerprint Attacks . . . . . . . . . . . . . . . . 123
7.6. TIE Origin Fingerprint DoS Attacks . . . . . . . . . . . 121 7.6. TIE Origin Fingerprint DoS Attacks . . . . . . . . . . . 123
7.7. Host Implementations . . . . . . . . . . . . . . . . . . 122 7.7. Host Implementations . . . . . . . . . . . . . . . . . . 124
8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 122 8. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 124
8.1. Requested Multicast and Port Numbers . . . . . . . . . . 122 8.1. Requested Multicast and Port Numbers . . . . . . . . . . 124
8.2. Requested Registries with Suggested Values . . . . . . . 122 8.2. Requested Registries with Suggested Values . . . . . . . 124
8.2.1. Registry RIFT/common/AddressFamilyType . . . . . . . 123 8.2.1. Registry RIFT_v4/common/AddressFamilyType . . . . . . 125
8.2.1.1. Requested Entries . . . . . . . . . . . . . . . . 123 8.2.1.1. Requested Entries . . . . . . . . . . . . . . . . 125
8.2.2. Registry RIFT/common/HierarchyIndications . . . . . . 123 8.2.2. Registry RIFT_v4/common/HierarchyIndications . . . . 125
8.2.2.1. Requested Entries . . . . . . . . . . . . . . . . 123 8.2.2.1. Requested Entries . . . . . . . . . . . . . . . . 125
8.2.3. Registry RIFT/common/IEEE802_1ASTimeStampType . . . . 123 8.2.3. Registry RIFT_v4/common/IEEE802_1ASTimeStampType . . 125
8.2.3.1. Requested Entries . . . . . . . . . . . . . . . . 123 8.2.3.1. Requested Entries . . . . . . . . . . . . . . . . 125
8.2.4. Registry RIFT/common/IPAddressType . . . . . . . . . 124 8.2.4. Registry RIFT_v4/common/IPAddressType . . . . . . . . 125
8.2.4.1. Requested Entries . . . . . . . . . . . . . . . . 124 8.2.4.1. Requested Entries . . . . . . . . . . . . . . . . 126
8.2.5. Registry RIFT/common/IPPrefixType . . . . . . . . . . 124 8.2.5. Registry RIFT_v4/common/IPPrefixType . . . . . . . . 126
8.2.5.1. Requested Entries . . . . . . . . . . . . . . . . 124 8.2.5.1. Requested Entries . . . . . . . . . . . . . . . . 126
8.2.6. Registry RIFT/common/IPv4PrefixType . . . . . . . . . 124 8.2.6. Registry RIFT_v4/common/IPv4PrefixType . . . . . . . 126
8.2.6.1. Requested Entries . . . . . . . . . . . . . . . . 124 8.2.6.1. Requested Entries . . . . . . . . . . . . . . . . 126
8.2.7. Registry RIFT/common/IPv6PrefixType . . . . . . . . . 124 8.2.7. Registry RIFT_v4/common/IPv6PrefixType . . . . . . . 126
8.2.7.1. Requested Entries . . . . . . . . . . . . . . . . 124 8.2.7.1. Requested Entries . . . . . . . . . . . . . . . . 126
8.2.8. Registry RIFT/common/PrefixSequenceType . . . . . . . 125 8.2.8. Registry RIFT_v4/common/PrefixSequenceType . . . . . 126
8.2.8.1. Requested Entries . . . . . . . . . . . . . . . . 125 8.2.8.1. Requested Entries . . . . . . . . . . . . . . . . 127
8.2.9. Registry RIFT/common/RouteType . . . . . . . . . . . 125 8.2.9. Registry RIFT_v4/common/RouteType . . . . . . . . . . 127
8.2.9.1. Requested Entries . . . . . . . . . . . . . . . . 125 8.2.9.1. Requested Entries . . . . . . . . . . . . . . . . 127
8.2.10. Registry RIFT/common/TIETypeType . . . . . . . . . . 125 8.2.10. Registry RIFT_v4/common/TIETypeType . . . . . . . . . 127
8.2.10.1. Requested Entries . . . . . . . . . . . . . . . 126 8.2.10.1. Requested Entries . . . . . . . . . . . . . . . 128
8.2.11. Registry RIFT/common/TieDirectionType . . . . . . . . 126 8.2.11. Registry RIFT_v4/common/TieDirectionType . . . . . . 128
8.2.11.1. Requested Entries . . . . . . . . . . . . . . . 126 8.2.11.1. Requested Entries . . . . . . . . . . . . . . . 128
8.2.12. Registry RIFT/encoding/Community . . . . . . . . . . 126 8.2.12. Registry RIFT_v4/encoding/Community . . . . . . . . . 128
8.2.12.1. Requested Entries . . . . . . . . . . . . . . . 126 8.2.12.1. Requested Entries . . . . . . . . . . . . . . . 128
8.2.13. Registry RIFT/encoding/KeyValueTIEElement . . . . . . 126 8.2.13. Registry RIFT_v4/encoding/KeyValueTIEElement . . . . 128
8.2.13.1. Requested Entries . . . . . . . . . . . . . . . 127 8.2.13.1. Requested Entries . . . . . . . . . . . . . . . 128
8.2.14. Registry RIFT/encoding/LIEPacket . . . . . . . . . . 127 8.2.14. Registry RIFT_v4/encoding/LIEPacket . . . . . . . . . 129
8.2.14.1. Requested Entries . . . . . . . . . . . . . . . 127 8.2.14.1. Requested Entries . . . . . . . . . . . . . . . 129
8.2.15. Registry RIFT/encoding/LinkCapabilities . . . . . . . 128 8.2.15. Registry RIFT_v4/encoding/LinkCapabilities . . . . . 130
8.2.15.1. Requested Entries . . . . . . . . . . . . . . . 128 8.2.15.1. Requested Entries . . . . . . . . . . . . . . . 130
8.2.16. Registry RIFT/encoding/LinkIDPair . . . . . . . . . . 128 8.2.16. Registry RIFT_v4/encoding/LinkIDPair . . . . . . . . 130
8.2.16.1. Requested Entries . . . . . . . . . . . . . . . 128 8.2.16.1. Requested Entries . . . . . . . . . . . . . . . 130
8.2.17. Registry RIFT/encoding/Neighbor . . . . . . . . . . . 129 8.2.17. Registry RIFT_v4/encoding/Neighbor . . . . . . . . . 131
8.2.17.1. Requested Entries . . . . . . . . . . . . . . . 129 8.2.17.1. Requested Entries . . . . . . . . . . . . . . . 131
8.2.18. Registry RIFT/encoding/NodeCapabilities . . . . . . . 129 8.2.18. Registry RIFT_v4/encoding/NodeCapabilities . . . . . 131
8.2.18.1. Requested Entries . . . . . . . . . . . . . . . 129 8.2.18.1. Requested Entries . . . . . . . . . . . . . . . 131
8.2.19. Registry RIFT/encoding/NodeFlags . . . . . . . . . . 130 8.2.19. Registry RIFT_v4/encoding/NodeFlags . . . . . . . . . 132
8.2.19.1. Requested Entries . . . . . . . . . . . . . . . 130 8.2.19.1. Requested Entries . . . . . . . . . . . . . . . 132
8.2.20. Registry RIFT/encoding/NodeNeighborsTIEElement . . . 130 8.2.20. Registry RIFT_v4/encoding/NodeNeighborsTIEElement . . 132
8.2.20.1. Requested Entries . . . . . . . . . . . . . . . 130 8.2.20.1. Requested Entries . . . . . . . . . . . . . . . 132
8.2.21. Registry RIFT/encoding/NodeTIEElement . . . . . . . . 130 8.2.21. Registry RIFT_v4/encoding/NodeTIEElement . . . . . . 132
8.2.21.1. Requested Entries . . . . . . . . . . . . . . . 131 8.2.21.1. Requested Entries . . . . . . . . . . . . . . . 133
8.2.22. Registry RIFT/encoding/PacketContent . . . . . . . . 131 8.2.22. Registry RIFT_v4/encoding/PacketContent . . . . . . . 133
8.2.22.1. Requested Entries . . . . . . . . . . . . . . . 131 8.2.22.1. Requested Entries . . . . . . . . . . . . . . . 133
8.2.23. Registry RIFT/encoding/PacketHeader . . . . . . . . . 131 8.2.23. Registry RIFT_v4/encoding/PacketHeader . . . . . . . 133
8.2.23.1. Requested Entries . . . . . . . . . . . . . . . 131 8.2.23.1. Requested Entries . . . . . . . . . . . . . . . 133
8.2.24. Registry RIFT/encoding/PrefixAttributes . . . . . . . 132 8.2.24. Registry RIFT_v4/encoding/PrefixAttributes . . . . . 134
8.2.24.1. Requested Entries . . . . . . . . . . . . . . . 132 8.2.24.1. Requested Entries . . . . . . . . . . . . . . . 134
8.2.25. Registry RIFT/encoding/PrefixTIEElement . . . . . . . 132 8.2.25. Registry RIFT_v4/encoding/PrefixTIEElement . . . . . 134
8.2.25.1. Requested Entries . . . . . . . . . . . . . . . 133 8.2.25.1. Requested Entries . . . . . . . . . . . . . . . 134
8.2.26. Registry RIFT/encoding/ProtocolPacket . . . . . . . . 133 8.2.26. Registry RIFT_v4/encoding/ProtocolPacket . . . . . . 135
8.2.26.1. Requested Entries . . . . . . . . . . . . . . . 133 8.2.26.1. Requested Entries . . . . . . . . . . . . . . . 135
8.2.27. Registry RIFT/encoding/TIDEPacket . . . . . . . . . . 133 8.2.27. Registry RIFT_v4/encoding/TIDEPacket . . . . . . . . 135
8.2.27.1. Requested Entries . . . . . . . . . . . . . . . 133 8.2.27.1. Requested Entries . . . . . . . . . . . . . . . 135
8.2.28. Registry RIFT/encoding/TIEElement . . . . . . . . . . 133 8.2.28. Registry RIFT_v4/encoding/TIEElement . . . . . . . . 135
8.2.28.1. Requested Entries . . . . . . . . . . . . . . . 134 8.2.28.1. Requested Entries . . . . . . . . . . . . . . . 136
8.2.29. Registry RIFT/encoding/TIEHeader . . . . . . . . . . 134 8.2.29. Registry RIFT_v4/encoding/TIEHeader . . . . . . . . . 136
8.2.29.1. Requested Entries . . . . . . . . . . . . . . . 135 8.2.29.1. Requested Entries . . . . . . . . . . . . . . . 137
8.2.30. Registry RIFT/encoding/TIEHeaderWithLifeTime . . . . 135 8.2.30. Registry RIFT_v4/encoding/TIEHeaderWithLifeTime . . . 137
8.2.30.1. Requested Entries . . . . . . . . . . . . . . . 135 8.2.30.1. Requested Entries . . . . . . . . . . . . . . . 137
8.2.31. Registry RIFT/encoding/TIEID . . . . . . . . . . . . 135 8.2.31. Registry RIFT_v4/encoding/TIEID . . . . . . . . . . . 137
8.2.31.1. Requested Entries . . . . . . . . . . . . . . . 136 8.2.31.1. Requested Entries . . . . . . . . . . . . . . . 138
8.2.32. Registry RIFT/encoding/TIEPacket . . . . . . . . . . 136 8.2.32. Registry RIFT_v4/encoding/TIEPacket . . . . . . . . . 138
8.2.32.1. Requested Entries . . . . . . . . . . . . . . . 136 8.2.32.1. Requested Entries . . . . . . . . . . . . . . . 138
8.2.33. Registry RIFT/encoding/TIREPacket . . . . . . . . . . 136 8.2.33. Registry RIFT_v4/encoding/TIREPacket . . . . . . . . 138
8.2.33.1. Requested Entries . . . . . . . . . . . . . . . 136 8.2.33.1. Requested Entries . . . . . . . . . . . . . . . 138
9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 136 9. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 138
10. References . . . . . . . . . . . . . . . . . . . . . . . . . 137 10. References . . . . . . . . . . . . . . . . . . . . . . . . . 139
10.1. Normative References . . . . . . . . . . . . . . . . . . 137 10.1. Normative References . . . . . . . . . . . . . . . . . . 139
10.2. Informative References . . . . . . . . . . . . . . . . . 139 10.2. Informative References . . . . . . . . . . . . . . . . . 141
Appendix A. Sequence Number Binary Arithmetic . . . . . . . . . 141 Appendix A. Sequence Number Binary Arithmetic . . . . . . . . . 143
Appendix B. Information Elements Schema . . . . . . . . . . . . 142 Appendix B. Information Elements Schema . . . . . . . . . . . . 144
B.1. common.thrift . . . . . . . . . . . . . . . . . . . . . . 143 B.1. common.thrift . . . . . . . . . . . . . . . . . . . . . . 146
B.2. encoding.thrift . . . . . . . . . . . . . . . . . . . . . 149 B.2. encoding.thrift . . . . . . . . . . . . . . . . . . . . . 152
Appendix C. Constants . . . . . . . . . . . . . . . . . . . . . 158 Appendix C. Constants . . . . . . . . . . . . . . . . . . . . . 160
C.1. Configurable Protocol Constants . . . . . . . . . . . . . 158 C.1. Configurable Protocol Constants . . . . . . . . . . . . . 160
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 160 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 162
1. Authors 1. Authors
This work is a product of a list of individuals which are all to be This work is a product of a list of individuals which are all to be
considered major contributors independent of the fact whether their considered major contributors independent of the fact whether their
name made it to the limited boilerplate author's list or not. name made it to the limited boilerplate author's list or not.
Tony Przygienda, Ed. | Alankar Sharma | Pascal Thubert Tony Przygienda, Ed. | Alankar Sharma | Pascal Thubert
Juniper Networks | Comcast | Cisco Juniper Networks | Comcast | Cisco
skipping to change at page 8, line 27 skipping to change at page 8, line 27
. +-+-+-+ +-+-+-+ . +-+-+-+ +-+-+-+
. 0/0 @ [E,F] | | | | 0/0 @ [E,F] . 0/0 @ [E,F] | | | | 0/0 @ [E,F]
. A/32 @ A | | +-----+ | A/32 @ A . A/32 @ A | | +-----+ | A/32 @ A
. B/32 @ B | | | | B/32 @ B . B/32 @ B | | | | B/32 @ B
. | +------+ | . | +------+ |
. | | | | . | | | |
. +-+---+ | | +---+-+ . +-+---+ | | +---+-+
. | A +--+ +-+ B | . | A +--+ +-+ B |
. 0/0 @ [C,D] +-----+ +-----+ 0/0 @ [C,D] . 0/0 @ [C,D] +-----+ +-----+ 0/0 @ [C,D]
Figure 1: RIFT information distribution Figure 1: RIFT Information Distribution
2.1. Requirements Language 2.1. Requirements Language
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in RFC 8174 [RFC8174]. document are to be interpreted as described in RFC 8174 [RFC8174].
3. Reference Frame 3. Reference Frame
3.1. Terminology 3.1. Terminology
skipping to change at page 11, line 22 skipping to change at page 11, line 22
such as links and address prefixes, in a fashion similar to ISIS such as links and address prefixes, in a fashion similar to ISIS
LSPs or OSPF LSAs. A TIE has always a direction and a type. We LSPs or OSPF LSAs. A TIE has always a direction and a type. We
will talk about North TIEs (sometimes abbreviated as N-TIEs) when will talk about North TIEs (sometimes abbreviated as N-TIEs) when
talking about TIEs in the northbound representation and South-TIEs talking about TIEs in the northbound representation and South-TIEs
(sometimes abbreviated as S-TIEs) for the southbound equivalent. (sometimes abbreviated as S-TIEs) for the southbound equivalent.
TIEs have different types such as node and prefix TIEs. TIEs have different types such as node and prefix TIEs.
Node TIE: This stands as acronym for a "Node Topology Information Node TIE: This stands as acronym for a "Node Topology Information
Element" that contains all adjacencies the node discovered and Element" that contains all adjacencies the node discovered and
information about node itself. Node TIE should NOT be confused information about node itself. Node TIE should NOT be confused
with a N-TIE since "node" defines the type of TIE rather than its with a North TIE since "node" defines the type of TIE rather than
direction. its direction.
Prefix TIE: This is an acronym for a "Prefix Topology Information Prefix TIE: This is an acronym for a "Prefix Topology Information
Element" and it contains all prefixes directly attached to this Element" and it contains all prefixes directly attached to this
node in case of a North TIE and in case of South TIE the necessary node in case of a North TIE and in case of South TIE the necessary
default routes the node advertises southbound. default routes the node advertises southbound.
Key Value TIE: A South TIE that is carrying a set of key value pairs Key Value TIE: A South TIE that is carrying a set of key value pairs
[DYNAMO]. It can be used to distribute information in the [DYNAMO]. It can be used to distribute information in the
southbound direction within the protocol. southbound direction within the protocol.
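The terminology above treats a TIE's direction (North or South) and its type (node, prefix, key value) as two independent properties, which is why a Node TIE is not the same thing as a North TIE. A minimal sketch of that separation follows. The class and member names (Direction, TieType, Tie, label) are hypothetical and chosen for illustration only, and the enum values are arbitrary; they are not the encoded values of the protocol schema.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    """Representation a TIE belongs to (illustrative values, not wire values)."""
    SOUTH = "south"
    NORTH = "north"


class TieType(Enum):
    """Kind of information a TIE carries (illustrative subset only)."""
    NODE = "node"
    PREFIX = "prefix"
    KEY_VALUE = "key_value"


@dataclass(frozen=True)
class Tie:
    """A TIE always has both a direction and a type."""
    direction: Direction
    tie_type: TieType

    def label(self) -> str:
        # Build a human-readable name such as "North Node TIE".
        kind = self.tie_type.value.replace("_", " ").title()
        return f"{self.direction.value.capitalize()} {kind} TIE"


# Same type ("node"), different directions: both are Node TIEs,
# but only the first one is a North TIE.
north_node = Tie(Direction.NORTH, TieType.NODE)
south_node = Tie(Direction.SOUTH, TieType.NODE)
```

Under these assumptions, north_node.label() yields "North Node TIE" and south_node.label() yields "South Node TIE", which makes the "node" versus "North" distinction explicit.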
skipping to change at page 14, line 4 skipping to change at page 14, line 4
routes. routes.
South SPF (S-SPF): A reachability calculation that is progressing South SPF (S-SPF): A reachability calculation that is progressing
southbound, as example SPF that is using North Node TIEs only. southbound, as example SPF that is using North Node TIEs only.
Security Envelope: RIFT packets are flooded within an authenticated Security Envelope: RIFT packets are flooded within an authenticated
security envelope that allows to protect the integrity of security envelope that allows to protect the integrity of
information a node accepts. information a node accepts.
3.2. Topology 3.2. Topology
. +--------+ +--------+ ^ N ^ N +--------+ +--------+
. |ToF 21| |ToF 22| | Level 2 | |ToF 21| |ToF 22|
.Level 2 ++-+--+-++ ++-+--+-++ <-*-> E/W E <-*-> W ++-+--+-++ ++-+--+-++
. | | | | | | | | | | | | | | | | | |
. P111/2| |P121 | | | | S v S v P111/2 P121/2 | | | |
. ^ ^ ^ ^ | | | | ^ ^ ^ ^ | | | |
. | | | | | | | | | | | | | | | |
. +--------------+ | +-----------+ | | | +---------------+ +--------------+ | +-----------+ | | | +---------------+
. | | | | | | | | | | | | | | | |
. South +-----------------------------+ | | ^ South +-----------------------------+ | | ^
. | | | | | | | All TIEs | | | | | | | All TIEs
. 0/0 0/0 0/0 +-----------------------------+ | 0/0 0/0 0/0 +-----------------------------+ |
. v v v | | | | | v v v | | | | |
. | | +-+ +<-0/0----------+ | | | | +-+ +<-0/0----------+ | |
. | | | | | | | | | | | | | | | |
.+-+----++ optional +-+----++ ++----+-+ ++-----++ +-+----++ optional +-+----++ ++----+-+ ++-----++
.| | E/W link | | | | | | Level 1 | | E/W link | | | | | |
.|Spin111+----------+Spin112| |Spin121| |Spin122| |Spin111+----------+Spin112| |Spin121| |Spin122|
.+-+---+-+ ++----+-+ +-+---+-+ ++---+--+ +-+---+-+ ++----+-+ +-+---+-+ ++---+--+
. | | | South | | | | | | | South | | | |
. | +---0/0--->-----+ 0/0 | +----------------+ | | +---0/0--->-----+ 0/0 | +----------------+ |
. 0/0 | | | | | | | 0/0 | | | | | | |
. | +---<-0/0-----+ | v | +--------------+ | | | +---<-0/0-----+ | v | +--------------+ | |
. v | | | | | | | v | | | | | | |
.+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ +-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+
.| | (L2L) | | | | Level 0 | | Level 0 | | (L2L) | | | | | |
.|Leaf111~~~~~~~~~~~~Leaf112| |Leaf121| |Leaf122| |Leaf111+~~~~~~~~~~+Leaf112| |Leaf121| |Leaf122|
.+-+-----+ +-+---+-+ +--+--+-+ +-+-----+ +-+-----+ +-+---+-+ +--+--+-+ +-+-----+
. + + \ / + + + + \ / + +
. Prefix111 Prefix112 \ / Prefix121 Prefix122 Prefix111 Prefix112 \ / Prefix121 Prefix122
. multi-homed multi-homed
. Prefix Prefix
.+---------- Pod 1 ---------+ +---------- Pod 2 ---------+ +---------- PoD 1 ---------+ +---------- PoD 2 ---------+
Figure 2: A three level spine-and-leaf topology Figure 2: A Three Level Spine-and-Leaf Topology
.+--------+ +--------+ +--------+ +--------+ .+--------+ +--------+ +--------+ +--------+
.|ToF A1| |ToF B1| |ToF B2| |ToF A2| .|ToF A1| |ToF B1| |ToF B2| |ToF A2|
.++-+-----+ ++-+-----+ ++-+-----+ ++-+-----+ .++-+-----+ ++-+-----+ ++-+-----+ ++-+-----+
. | | | | | | | | . | | | | | | | |
. | | | | | +---------------+ . | | | | | +---------------+
. | | | | | | | | . | | | | | | | |
. | | | +-------------------------+ | . | | | +-------------------------+ |
. | | | | | | | | . | | | | | | | |
. | +-----------------------+ | | | | . | +-----------------------+ | | | |
. | | | | | | | | . | | | | | | | |
skipping to change at page 15, line 30 skipping to change at page 15, line 30
.+-+---+--+ ++----+--+ +-+---+--+ ++---+---+ .+-+---+--+ ++----+--+ +-+---+--+ ++---+---+
. | | | | | | | | . | | | | | | | |
. | +--------+ | | +--------+ | . | +--------+ | | +--------+ |
. | | | | | | | | . | | | | | | | |
. | -------+ | | | +------+ | | . | -------+ | | | +------+ | |
. | | | | | | | | . | | | | | | | |
.+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ .+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+
.|Leaf111| |Leaf112| |Leaf121| |Leaf122| .|Leaf111| |Leaf112| |Leaf121| |Leaf122|
.+-------+ +-------+ +-------+ +-------+ .+-------+ +-------+ +-------+ +-------+
Figure 3: Topology with multiple planes Figure 3: Topology with Multiple Planes
We will use topology in Figure 2 (called commonly a fat tree/network We will use topology in Figure 2 (called commonly a fat tree/network
in modern IP fabric considerations [VAHDAT08] as homonym to the in modern IP fabric considerations [VAHDAT08] as homonym to the
original definition of the term [FATTREE]) in all further original definition of the term [FATTREE]) in all further
considerations. This figure depicts a generic "single plane fat- considerations. This figure depicts a generic "single plane fat-
tree" and the concepts explained using three levels apply by tree" and the concepts explained using three levels apply by
induction to further levels and higher degrees of connectivity. induction to further levels and higher degrees of connectivity.
Further, this document will deal also with designs that provide only Further, this document will deal also with designs that provide only
sparser connectivity and "partitioned spines" as shown in Figure 3 sparser connectivity and "partitioned spines" as shown in Figure 3
and explained further in Section 4.1.2. and explained further in Section 4.1.2.
skipping to change at page 22, line 35 skipping to change at page 22, line 35
| | | | | | | | | | | PoD top Nodes ^ | | | | | | | | | | | PoD top Nodes ^
+----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ |
|| || || || || || * || || || || || || *
+------------------------------------------------+ | +------------------------------------------------+ |
| Leaf seen sideways | v | Leaf seen sideways | v
+------------------------------------------------+ S +------------------------------------------------+ S
|| || || || || || || || || || || ||
Connecting to Client nodes Connecting to Client nodes
Figure 8: Other side View of a PoD, K_TOP=8, K_LEAF=6, 90o turn in Figure 8: Other Side View of a PoD, K_TOP=8, K_LEAF=6, 90o turn in
E-W Plane E-W Plane
As a next step, let us observe that a resulting PoD can be abstracted As a next step, let us observe that a resulting PoD can be abstracted
as a bigger node with a number K of K_POD = K_TOP * K_LEAF, and the as a bigger node with a number K of K_POD = K_TOP * K_LEAF, and the
design can recurse. design can recurse.
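The abstraction above can be checked with a small calculation. This is only an illustrative sketch: the values K_TOP=8 and K_LEAF=6 are taken from the caption of Figure 8, and the helper name `pod_port_count` is hypothetical, not part of the specification.

```python
# Values taken from the caption of Figure 8.
K_TOP = 8
K_LEAF = 6


def pod_port_count(k_top: int, k_leaf: int) -> int:
    """Return K_POD = K_TOP * K_LEAF for a PoD abstracted as one bigger node.

    Hypothetical helper name, used for illustration only.
    """
    return k_top * k_leaf


K_POD = pod_port_count(K_TOP, K_LEAF)
```

With the Figure 8 values, the PoD abstracts to a node with K_POD = 48, which can then be used as the building block of the next recursion step.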
It will be critical at this point that, before progressing further, It will be critical at this point that, before progressing further,
the concept and the picture of "crossed crossbars" is clear. Else, the concept and the picture of "crossed crossbars" is clear. Else,
the following considerations might be difficult to comprehend. the following considerations might be difficult to comprehend.
skipping to change at page 34, line 12 skipping to change at page 34, line 12
information reaching beyond a single L3 next-hop in the topology. information reaching beyond a single L3 next-hop in the topology.
LIEs SHOULD be sent with network control precedence. LIEs SHOULD be sent with network control precedence.
Originating port of the LIE has no further significance other than Originating port of the LIE has no further significance other than
identifying the origination point. LIEs are exchanged over all links identifying the origination point. LIEs are exchanged over all links
running RIFT. running RIFT.
An implementation MAY listen and send LIEs on IPv4 and/or IPv6 An implementation MAY listen and send LIEs on IPv4 and/or IPv6
multicast addresses. A node MUST NOT originate LIEs on an address multicast addresses. A node MUST NOT originate LIEs on an address
family if it does not process received LIEs on that family. LIEs on family if it does not process received LIEs on that family. LIEs on
same link are considered part of the same negotiation independent on same link are considered part of the same negotiation independent of
the address family they arrive on. Observe further that the LIE the address family they arrive on. Observe further that the LIE
source address may not identify the peer uniquely in unnumbered or source address may not identify the peer uniquely in unnumbered or
link-local address cases so the response transmission MUST occur over link-local address cases so the response transmission MUST occur over
the same interface the LIEs have been received on. A node MAY use the same interface the LIEs have been received on. A node MAY use
any of the adjacency's source addresses it saw in LIEs on the any of the adjacency's source addresses it saw in LIEs on the
specific interface during adjacency formation to send TIEs. That specific interface during adjacency formation to send TIEs. That
implies that an implementation MUST be ready to accept TIEs on all implies that an implementation MUST be ready to accept TIEs on all
addresses it used as source of LIE frames. addresses it used as source of LIE frames.
A three-way adjacency over any address family implies support for A three-way adjacency over any address family implies support for
skipping to change at page 49, line 5 skipping to change at page 49, line 5
Leaf112 North TIEs: Leaf112 North TIEs:
Node North TIE: Node North TIE:
NodeElement(level=0, NodeElement(level=0,
neighbors((Spine 111, level 1, cost 1, links(...)), neighbors((Spine 111, level 1, cost 1, links(...)),
(Spine 112, level 1, cost 1, links(...)))) (Spine 112, level 1, cost 1, links(...))))
Prefix North TIE: Prefix North TIE:
NorthPrefixesElement(prefixes(Leaf112.loopback, Prefix112, NorthPrefixesElement(prefixes(Leaf112.loopback, Prefix112,
Prefix_MH)) Prefix_MH))
Figure 14: example TIES generated in a 2 level spine-and-leaf Figure 14: Example TIES Generated in a 2 Level Spine-and-Leaf
topology Topology
It may not be obvious here why the node South TIEs It may not be obvious here why the node South TIEs
contain all the adjacencies of the according node. This will be contain all the adjacencies of the according node. This will be
necessary for algorithms given in Section 4.2.3.9 and Section 4.3.6. necessary for algorithms given in Section 4.2.3.9 and Section 4.3.6.
4.2.3.3. Flooding 4.2.3.3. Flooding
The mechanism used to distribute TIEs is the well-known (albeit The mechanism used to distribute TIEs is the well-known (albeit
modified in several respects to take advantage of fat tree topology) modified in several respects to take advantage of fat tree topology)
flooding mechanism used by today's link-state protocols. Although flooding mechanism used by today's link-state protocols. Although
flooding is initially more demanding to implement it avoids many flooding is initially more demanding to implement it avoids many
problems with update style used in diffused computation such as problems with update style used in diffused computation such as
distance vector protocols. Since flooding tends to present an distance vector protocols. Since flooding tends to present an
unscalable burden in large, densely meshed topologies (fat trees unscalable burden in large, densely meshed topologies (fat trees
being unfortunately such a topology) we provide as solution a close being unfortunately such a topology) we provide as solution close to
to optimal global flood reduction and load balancing optimization in optimal global flood reduction and load balancing optimization in
Section 4.2.3.9. Section 4.2.3.9.
As described before, TIEs themselves are transported over UDP with As described before, TIEs themselves are transported over UDP with
the ports indicated in the LIE exchanges and using the destination the ports indicated in the LIE exchanges and using the destination
address on which the LIE adjacency has been formed. For unnumbered address on which the LIE adjacency has been formed. For unnumbered
IPv4 interfaces the same considerations apply as in the equivalent OSPF case. IPv4 interfaces the same considerations apply as in the equivalent OSPF case.
4.2.3.3.1. Normative Flooding Procedures 4.2.3.3.1. Normative Flooding Procedures
On reception of a TIE with an undefined level value in the packet On reception of a TIE with an undefined level value in the packet
skipping to change at page 50, line 12 skipping to change at page 50, line 12
and faster rate (speed of light holding for the moment). The encoded and faster rate (speed of light holding for the moment). The encoded
packets provide hints to react accordingly to losses or overruns. packets provide hints to react accordingly to losses or overruns.
Flooding of all according topology exchange elements SHOULD be Flooding of all according topology exchange elements SHOULD be
performed at highest feasible rate whereas the rate of transmission performed at highest feasible rate whereas the rate of transmission
MUST be throttled by reacting to adequate features of the system such MUST be throttled by reacting to adequate features of the system such
as e.g. queue lengths or congestion indications in the protocol as e.g. queue lengths or congestion indications in the protocol
packets. packets.
A node SHOULD NOT send out any topology information elements if the A node SHOULD NOT send out any topology information elements if the
adjacancy is not in a "three-way" state. No further tightening of adjacency is not in a "three-way" state. No further tightening of
this rule is possible due to possible link buffering and re-ordering this rule is possible due to possible link buffering and re-ordering
of LIEs and TIEs/TIDEs/TIREs. of LIEs and TIEs/TIDEs/TIREs.
A node MUST drop any received TIEs/TIDEs/TIREs unless it is in three- A node MUST drop any received TIEs/TIDEs/TIREs unless it is in three-
way state. way state.
TIDEs and TIREs MUST NOT be re-flooded the way TIEs of other nodes TIDEs and TIREs MUST NOT be re-flooded the way TIEs of other nodes
are; they MUST always be generated by the node itself and cross only to are; they MUST always be generated by the node itself and cross only to
the neighboring node. the neighboring node.
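The send, drop, and re-flood rules above can be sketched as simple predicates. This is a minimal illustration, not an implementation of the flooding procedures: the state names other than "three-way", the string packet type labels, and all function names are assumptions made for the example, and flooding scope rules are not modeled.

```python
from enum import Enum


class AdjacencyState(Enum):
    # Only "three-way" is named in the text above; the other two
    # members are assumptions added to make the example complete.
    ONE_WAY = "one-way"
    TWO_WAY = "two-way"
    THREE_WAY = "three-way"


def may_send_topology_elements(state: AdjacencyState) -> bool:
    """A node SHOULD NOT send TIEs/TIDEs/TIREs unless the adjacency is three-way."""
    return state is AdjacencyState.THREE_WAY


def must_drop_received(state: AdjacencyState) -> bool:
    """A node MUST drop received TIEs/TIDEs/TIREs unless it is in three-way state."""
    return state is not AdjacencyState.THREE_WAY


def is_refloodable(packet_type: str) -> bool:
    """TIDEs and TIREs are never re-flooded; only TIEs are candidates.

    Whether a given TIE is actually re-flooded depends on scope rules
    that this sketch does not model.
    """
    return packet_type == "TIE"
```

Keeping the three checks separate mirrors the different requirement levels in the text (SHOULD NOT for sending, MUST for dropping, MUST NOT for re-flooding TIDEs and TIREs).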
skipping to change at page 61, line 9 skipping to change at page 61, line 9
prepared to receive TIEs with its own system ID and supersede them prepared to receive TIEs with its own system ID and supersede them
with equivalent, newly generated, empty TIEs with a higher sequence with equivalent, newly generated, empty TIEs with a higher sequence
number. As above, the lifetime can be relatively short since it only number. As above, the lifetime can be relatively short since it only
needs to exceed the necessary propagation and processing delay by all needs to exceed the necessary propagation and processing delay by all
the nodes that are within the TIE's flooding scope. the nodes that are within the TIE's flooding scope.
TIE sequence numbers are rolled over using the method described in TIE sequence numbers are rolled over using the method described in
Appendix A. First sequence number of any spontaneously originated Appendix A. First sequence number of any spontaneously originated
TIE (i.e. not originated to override a detected older copy in the TIE (i.e. not originated to override a detected older copy in the
network) MUST be a reasonably unpredictable random number in the network) MUST be a reasonably unpredictable random number in the
interval [0, 2^10-1] which will prevent otherwise identical TIE interval [0, 2^30-1] which will prevent otherwise identical TIE
headers to remain "stuck" in the network with content different from headers to remain "stuck" in the network with content different from
TIE originated after reboot. TIE originated after reboot. In traditional link-state protocols
this is delegated to a 16-bit checksum on packet content. RIFT
avoids this design due to the CPU burden presented by computation of
such checksums and additional complications tied to the fact that the
checksum must be "patched" into the packet after the computation, a
difficult proposition in binary hand-crafted formats already and
highly incompatible with model-based, serialized formats. The
sequence number space is hence consciously chosen to be 64-bits wide
to make the occurrence of a TIE with same sequence number but
different content as much or even more unlikely than the checksum
method. To emulate the "checksum behavior" an implementation could
e.g. choose to compute 64-bit checksum over the packet content and
use that as first sequence number after reboot.
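The two options for choosing a first sequence number can be sketched as follows. This is a hedged illustration only: the function names are hypothetical, and the use of a truncated BLAKE2b digest as the "64-bit checksum" is an assumption of this example, since the text does not name a checksum algorithm.

```python
import hashlib
import secrets

# Upper bound of the interval [0, 2^30-1] given in the text.
FIRST_SEQUENCE_MAX = 2**30 - 1


def random_first_sequence_number() -> int:
    """Pick an unpredictable first sequence number in [0, 2^30-1]."""
    return secrets.randbelow(FIRST_SEQUENCE_MAX + 1)


def checksum_first_sequence_number(packet_content: bytes) -> int:
    """Emulate the "checksum behavior" with a 64-bit value over the content.

    Assumption: an 8-byte BLAKE2b digest stands in for the unspecified
    64-bit checksum. The result is in [0, 2^64-1] and is deterministic
    for identical content.
    """
    digest = hashlib.blake2b(packet_content, digest_size=8).digest()
    return int.from_bytes(digest, "big")
```

The random variant draws from a cryptographically strong source so that a rebooted node is unlikely to reuse the sequence number of a stale copy still held in the network.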
4.2.3.8. Southbound Default Route Origination 4.2.3.8. Southbound Default Route Origination
Under certain conditions nodes issue a default route in their South Under certain conditions nodes issue a default route in their South
Prefix TIEs with costs as computed in Section 4.3.6.1. Prefix TIEs with costs as computed in Section 4.3.6.1.
A node X that A node X that
1. is NOT overloaded AND 1. is NOT overloaded AND
skipping to change at page 63, line 46 skipping to change at page 64, line 13
parent P over adjacencies ADJ(N, P) and ADJ(P, G). Observe that N parent P over adjacencies ADJ(N, P) and ADJ(P, G). Observe that N
does not have enough information to check bidirectional does not have enough information to check bidirectional
reachability of ADJ(P, G); reachability of ADJ(P, G);
o let R be a redundancy constant integer; a value of 2 or higher for o let R be a redundancy constant integer; a value of 2 or higher for
R is RECOMMENDED; R is RECOMMENDED;
o let S be a similarity constant integer; a value in range 0 .. 2 o let S be a similarity constant integer; a value in range 0 .. 2
for S is RECOMMENDED, the value of 1 SHOULD be used. Two for S is RECOMMENDED, the value of 1 SHOULD be used. Two
cardinalities are considered as equivalent if their absolute cardinalities are considered as equivalent if their absolute
difference is less than or equal to S, i.e. difference is less than or equal to S, i.e. |a-b|<=S.
o |a-b|<=S.
o let RND be a 64-bit random number generated by the system once on o let RND be a 64-bit random number generated by the system once on
startup. startup.
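The constants defined above can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the names `cardinalities_equivalent`, `R`, `S`, and `RND` follow the text, but the choice of R = 2 and S = 1 simply picks the recommended values, and the random source is an assumption of this example.

```python
import secrets

R = 2  # redundancy constant; a value of 2 or higher is RECOMMENDED
S = 1  # similarity constant; the value of 1 SHOULD be used

# 64-bit random number generated once on startup (random source assumed).
RND = secrets.randbits(64)


def cardinalities_equivalent(a: int, b: int, s: int = S) -> bool:
    """Two cardinalities are equivalent if |a - b| <= S."""
    return abs(a - b) <= s
```

For example, with S = 1 the cardinalities 4 and 5 are treated as equivalent, while 4 and 6 are not.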
The algorithm consists of the following steps: The algorithm consists of the following steps:
1. Derive a 64-bits number by XOR'ing 'N's system ID with RND. 1. Derive a 64-bits number by XOR'ing 'N's system ID with RND.
2. Derive a 16-bits pseudo-random unsigned integer PR(N) from the 2. Derive a 16-bits pseudo-random unsigned integer PR(N) from the
resulting 64-bits number by splitting it in 16-bits-long words resulting 64-bits number by splitting it in 16-bits-long words
skipping to change at page 67, line 7 skipping to change at page 67, line 24
More difficult is a condition where a node (e.g. a leaf) floods a TIE More difficult is a condition where a node (e.g. a leaf) floods a TIE
north towards its grandparent, then its parent reboots, in fact north towards its grandparent, then its parent reboots, in fact
partitioning the grandparent from leaf directly and then the leaf partitioning the grandparent from leaf directly and then the leaf
itself reboots. That can leave the grandparent holding the "primary itself reboots. That can leave the grandparent holding the "primary
copy" of the leaf's TIE. Normally this condition is resolved easily copy" of the leaf's TIE. Normally this condition is resolved easily
by the leaf re-originating its TIE with a higher sequence number than by the leaf re-originating its TIE with a higher sequence number than
it sees in northbound TIEs, here however, when the parent comes back it sees in northbound TIEs, here however, when the parent comes back
it won't be able to obtain leaf's North TIE from the grandparent it won't be able to obtain leaf's North TIE from the grandparent
easily and with that the leaf may not issue the TIE with a higher easily and with that the leaf may not issue the TIE with a higher
sequence number that can reach the granparent for a long time. sequence number that can reach the grandparent for a long time.
Flooding procedures are extended to deal with the problem by the Flooding procedures are extended to deal with the problem by the
means of special clauses that override the database of a lower level means of special clauses that override the database of a lower level
with headers of newer TIEs seen in TIDEs coming from the north. with headers of newer TIEs seen in TIDEs coming from the north.
4.2.4. Reachability Computation 4.2.4. Reachability Computation
A node has three possible sources of relevant information for A node has three possible sources of relevant information for
reachability computation. A node knows the full topology south of it reachability computation. A node knows the full topology south of it
from the received North Node TIEs or alternately north of it from the from the received North Node TIEs or alternately north of it from the
South Node TIEs. A node has the set of prefixes with their South Node TIEs. A node has the set of prefixes with their
skipping to change at page 70, line 9 skipping to change at page 70, line 24
2. The node uses reflected South TIEs to find all nodes at the same 2. The node uses reflected South TIEs to find all nodes at the same
level in the same PoD and the set of southbound adjacencies for level in the same PoD and the set of southbound adjacencies for
each. The set of nodes at the same level is termed |N and for each. The set of nodes at the same level is termed |N and for
each node, n, in |N, we define its set of southbound adjacencies each node, n, in |N, we define its set of southbound adjacencies
to be |A(n). to be |A(n).
3. For a given r, if the intersection of |H(r) and |A(n), for any n, 3. For a given r, if the intersection of |H(r) and |A(n), for any n,
is null then that prefix r must be explicitly advertised by the is null then that prefix r must be explicitly advertised by the
node in a South TIE. node in a South TIE.
3.
4. Identical set of de-aggregated prefixes is flooded on each of the 4. Identical set of de-aggregated prefixes is flooded on each of the
node's southbound adjacencies. In accordance with the normal node's southbound adjacencies. In accordance with the normal
flooding rules for a South TIE, a node at the lower level that flooding rules for a South TIE, a node at the lower level that
receives this South TIE SHOULD NOT propagate it south-bound or receives this South TIE SHOULD NOT propagate it south-bound or
reflect the disaggregated prefixes back over its adjacencies to reflect the disaggregated prefixes back over its adjacencies to
nodes at the level from which it was received. nodes at the level from which it was received.
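The intersection test in step 3 can be sketched with plain sets. This is a minimal sketch under assumptions: |H(r) is taken to be the set of southbound next-hops the node uses for prefix r, |A(n) the set of southbound adjacencies of same-level node n as learned from reflected South TIEs, and the function and parameter names are hypothetical.

```python
from typing import Dict, Set


def prefixes_to_disaggregate(
    next_hops_by_prefix: Dict[str, Set[str]],
    south_adjacencies_by_peer: Dict[str, Set[str]],
) -> Set[str]:
    """Return prefixes r for which some same-level node n has
    an empty intersection of |H(r) and |A(n).

    next_hops_by_prefix:       r -> |H(r)  (assumed meaning, see lead-in)
    south_adjacencies_by_peer: n -> |A(n)
    """
    result = set()
    for prefix, hops in next_hops_by_prefix.items():
        # One peer without any usable next-hop is enough to require
        # explicit advertisement of the prefix in a South TIE.
        if any(hops.isdisjoint(adj) for adj in south_adjacencies_by_peer.values()):
            result.add(prefix)
    return result
```

A prefix reachable only via a lower-level node that some same-level peer is not adjacent to ends up in the returned set, which matches the summary that a peer with no possible next-hops in the level below triggers disaggregation.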
To summarize the above in simplest terms: if a node detects that its To summarize the above in simplest terms: if a node detects that its
default route encompasses prefixes for which one of the other nodes default route encompasses prefixes for which one of the other nodes
in its level has no possible next-hops in the level below, it has to in its level has no possible next-hops in the level below, it has to
skipping to change at page 71, line 4 skipping to change at page 71, line 19
Y.south_neighbors, add (N, (Y)) to partial_neighbors if N isn't Y.south_neighbors, add (N, (Y)) to partial_neighbors if N isn't
there or add Y to the list for N. there or add Y to the list for N.
4. If partial_neighbors is empty, then node X does not disaggregate 4. If partial_neighbors is empty, then node X does not disaggregate
any prefixes. If node X is advertising disaggregated prefixes in any prefixes. If node X is advertising disaggregated prefixes in
its South TIE, X SHOULD remove them and re-advertise its its South TIE, X SHOULD remove them and re-advertise its
according South TIEs. according South TIEs.
A node X computes reachability to all nodes below it based upon the A node X computes reachability to all nodes below it based upon the
received North TIEs first. This results in a set of routes, each received North TIEs first. This results in a set of routes, each
categorized by (prefix, path_distance, next-hop-set). Alternately, categorized by (prefix, path_distance, next-hop set). Alternately,
for clarity in the following procedure, these can be organized by for clarity in the following procedure, these can be organized by
next-hop-set as ( (next-hops), {(prefix, path_distance)}). If next-hop set as ( (next-hops), {(prefix, path_distance)}). If
partial_neighbors isn't empty, then the following procedure describes partial_neighbors isn't empty, then the following procedure describes
how to identify prefixes to disaggregate. how to identify prefixes to disaggregate.
disaggregated_prefixes = { empty } disaggregated_prefixes = { empty }
nodes_same_level = { empty } nodes_same_level = { empty }
for each South TIE for each South TIE
if (South TIE.level == X.level and if (South TIE.level == X.level and
X shares at least one S-neighbor with South TIE.originator) X shares at least one S-neighbor with South TIE.originator)
add South TIE.originator to nodes_same_level add South TIE.originator to nodes_same_level
end if end if
skipping to change at page 73, line 20 skipping to change at page 74, line 16
. II II II II . II II II II
.+----++--+ +----++--+ +----++--+ +----++--+ .+----++--+ +----++--+ +----++--+ +----++--+
.|ToF A1| |ToF B1| |ToF B2| |ToF A2| .|ToF A1| |ToF B1| |ToF B2| |ToF A2|
.++-+-++--+ ++-+-++--+ ++-+-++--+ ++-+-++--+ .++-+-++--+ ++-+-++--+ ++-+-++--+ ++-+-++--+
. | | II | | II | | II | | II . | | II | | II | | II | | II
. | | ++==========++ | | ++==========++ . | | ++==========++ | | ++==========++
. | | | | | | | | . | | | | | | | |
. .
. ~~~ Highlighted ToF of the previous multi-plane figure ~~ . ~~~ Highlighted ToF of the previous multi-plane figure ~~
Figure 16: Topologically connected planes Figure 16: Topologically Connected Planes
As described in Section 4.1.3 failures in multi-plane fabrics can As described in Section 4.1.3 failures in multi-plane fabrics can
lead to blackholes which normal positive disaggregation cannot fix. lead to blackholes which normal positive disaggregation cannot fix.
The mechanism of negative, transitive disaggregation incorporated in The mechanism of negative, transitive disaggregation incorporated in
RIFT provides the according solution. RIFT provides the according solution.
4.2.5.2.2. Transitive Advertisement of Negative Disaggregates 4.2.5.2.2. Transitive Advertisement of Negative Disaggregates
A ToF node that discovers that it cannot reach a fallen leaf A ToF node that discovers that it cannot reach a fallen leaf
disaggregates all the prefixes of such leaves. It uses for that disaggregates all the prefixes of such leaves. It uses for that
skipping to change at page 75, line 10 skipping to change at page 75, line 52
use can be a set; presence of the computing node in the associated use can be a set; presence of the computing node in the associated
Node South TIE is sufficient to verify that at least one link has Node South TIE is sufficient to verify that at least one link has
bidirectional connectivity. The set of minimum cost next-hops from bidirectional connectivity. The set of minimum cost next-hops from
the computing node X to the originating adjacent node is determined. the computing node X to the originating adjacent node is determined.
Each prefix has its cost adjusted before being added into the RIFT Each prefix has its cost adjusted before being added into the RIFT
route database. The cost of the prefix is set to the cost received route database. The cost of the prefix is set to the cost received
plus the cost of the minimum distance next-hop to that neighbor while plus the cost of the minimum distance next-hop to that neighbor while
taking into account its attributes such as mobility per taking into account its attributes such as mobility per
Section 4.3.3. Then each prefix can be added into the RIFT route Section 4.3.3. Then each prefix can be added into the RIFT route
database with the next_hop_set; ties are broken based upon type first database with the next-hop set; ties are broken based upon type first
and then distance and further on `PrefixAttributes` and only the best and then distance and further on `PrefixAttributes` and only the best
combination is used for forwarding. RIFT route preferences are combination is used for forwarding. RIFT route preferences are
normalized by the according Thrift [thrift] model type. normalized by the according Thrift [thrift] model type.
An example implementation for node X follows: An example implementation for node X follows:
for each South TIE for each South TIE
if South TIE.level > X.level if South TIE.level > X.level
next_hop_set = set of minimum cost links to the next_hop_set = set of minimum cost links to the
South TIE.originator South TIE.originator
skipping to change at page 77, line 17 skipping to change at page 77, line 46
+----+ +----+ +----+ +----+ W< + >E +----+ +----+ +----+ +----+ W< + >E
| | | | v | | | | v
|+--------+ | | S |+--------+ | | S
||+-----------------+ | ||+-----------------+ |
|||+----------------+---------+ |||+----------------+---------+
|||| ||||
+----+ +----+
| T1 | | T1 |
+----+ +----+
Figure 18: A ToP node with 4 parents Figure 18: A ToP Node with 4 Parents
If all ToF nodes can reach all the prefixes in the network, with If all ToF nodes can reach all the prefixes in the network, with
RIFT, they will normally advertise a default route south. An RIFT, they will normally advertise a default route south. An
abstract Routing Information Base (RIB), more commonly known as a abstract Routing Information Base (RIB), more commonly known as a
routing table, stores all types of maintained routes including the routing table, stores all types of maintained routes including the
negative ones and "tie-breaks" for the best one, whereas an abstract negative ones and "tie-breaks" for the best one, whereas an abstract
Forwarding table (FIB) retains only the ultimately computed Forwarding table (FIB) retains only the ultimately computed
"positive" routing instructions. In T1, those tables would look as "positive" routing instructions. In T1, those tables would look as
illustrated in Figure 19: illustrated in Figure 19:
skipping to change at page 78, line 31 skipping to change at page 79, line 25
| +--------+ | +--------+
| |
| +--------+ | +--------+
+---> | Via S3 | +---> | Via S3 |
| +---------+ | +---------+
| |
| +--------+ | +--------+
+---> | Via S4 | +---> | Via S4 |
+--------+ +--------+
Figure 20: Abstract RIB after negative 2001:db8::/32 from S1 Figure 20: Abstract RIB after Negative 2001:db8::/32 from S1
The negative 2001:db8::/32 prefix entry inherits from ::/0, so the The negative 2001:db8::/32 prefix entry inherits from ::/0, so the
positive more specific routes are the complements to S1 in the set of positive more specific routes are the complements to S1 in the set of
next-hops for the default route. That entry is composed of S2, S3, next-hops for the default route. That entry is composed of S2, S3,
and S4, or, in other words, it uses all entries of the default route and S4, or, in other words, it uses all entries of the default route
with a "hole punched" for S1 into them. These are the next hops that with a "hole punched" for S1 into them. These are the next hops that
are still available to reach 2001:db8::/32, now that S1 advertised are still available to reach 2001:db8::/32, now that S1 advertised
that it will not forward 2001:db8::/32 anymore. Ultimately, those that it will not forward 2001:db8::/32 anymore. Ultimately, those
resulting next-hops are installed in FIB for the more specific route resulting next-hops are installed in FIB for the more specific route
to 2001:db8::/32 as illustrated below: to 2001:db8::/32 as illustrated below:
skipping to change at page 79, line 25 skipping to change at page 80, line 25
| +--------+ | +--------+ | +--------+ | +--------+
| | | |
| +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 |
| +--------+ | +--------+ | +--------+ | +--------+
| | | |
| +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 |
+--------+ +--------+ +--------+ +--------+
Figure 21: Abstract FIB after negative 2001:db8::/32 from S1 Figure 21: Abstract FIB after Negative 2001:db8::/32 from S1
To illustrate matters further let us consider T1 receiving a negative To illustrate matters further let us consider T1 receiving a negative
advertisement for prefix 2001:db8:1::/48 from S2, which is stored in advertisement for prefix 2001:db8:1::/48 from S2, which is stored in
RIB again. After the update, the RIB in T1 is illustrated in RIB again. After the update, the RIB in T1 is illustrated in
Figure 22: Figure 22:
+---------+ +----------------+ +------------------+ +---------+ +----------------+ +------------------+
| Default | <----- | ~2001:db8::/32 | <------ | ~2001:db8:1::/48 | | Default | <----- | ~2001:db8::/32 | <------ | ~2001:db8:1::/48 |
+---------+ +----------------+ +------------------+ +---------+ +----------------+ +------------------+
| | | | | |
skipping to change at page 80, line 25 skipping to change at page 81, line 25
| +--------+ +--------+ | +--------+ +--------+
| |
| +--------+ | +--------+
+---> | Via S3 | +---> | Via S3 |
| +---------+ | +---------+
| |
| +--------+ | +--------+
+---> | Via S4 | +---> | Via S4 |
+--------+ +--------+
Figure 22: Abstract RIB after negative 2001:db8:1::/48 from S2 Figure 22: Abstract RIB after Negative 2001:db8:1::/48 from S2
Negative 2001:db8:1::/48 inherits from 2001:db8::/32 now, so the Negative 2001:db8:1::/48 inherits from 2001:db8::/32 now, so the
positive more specific routes are the complements to S2 in the set of positive more specific routes are the complements to S2 in the set of
next hops for 2001:db8::/32, which are S3 and S4, or, in other words, next hops for 2001:db8::/32, which are S3 and S4, or, in other words,
all entries of the parent with the negative holes "punched in" again. all entries of the parent with the negative holes "punched in" again.
After the update, the FIB in T1 shows as illustrated in Figure 23: After the update, the FIB in T1 shows as illustrated in Figure 23:
+---------+ +---------------+ +-----------------+ +---------+ +---------------+ +-----------------+
| Default | | 2001:db8::/32 | | 2001:db8:1::/48 | | Default | | 2001:db8::/32 | | 2001:db8:1::/48 |
+---------+ +---------------+ +-----------------+ +---------+ +---------------+ +-----------------+
skipping to change at page 81, line 25 skipping to change at page 82, line 25
| +--------+ | +--------+ | | +--------+ | +--------+ |
| | | | | |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
| | | | | |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 |
+--------+ +--------+ +--------+ +--------+ +--------+ +--------+
Figure 23: Abstract FIB after negative 2001:db8:1::/48 from S2 Figure 23: Abstract FIB after Negative 2001:db8:1::/48 from S2
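The derivation of the FIB from a RIB holding negative entries, as walked through in Figures 20 to 23, can be sketched as follows. This is an illustrative sketch, not the normative procedure: the function name `compute_fib`, the dictionary layout of the RIB, and the simplification that a prefix is either positive or negative (never both) are assumptions of this example.

```python
import ipaddress
from typing import Dict, Set


def compute_fib(
    positive: Dict[str, Set[str]],
    negative: Dict[str, Set[str]],
) -> Dict[str, Set[str]]:
    """Resolve negative RIB entries into positive next-hop sets.

    positive: prefix -> next hops advertising it positively
    negative: prefix -> neighbors advertising it negatively
    A negative prefix inherits the next hops of its most specific
    covering entry, with the negative advertisers removed.
    """
    nets = {p: ipaddress.ip_network(p) for p in set(positive) | set(negative)}
    fib: Dict[str, Set[str]] = {}
    # Shorter prefixes first, so a parent is resolved before its children.
    for p in sorted(nets, key=lambda q: nets[q].prefixlen):
        net = nets[p]
        if p in positive:
            fib[p] = set(positive[p])
            continue
        parents = [
            q for q in fib
            if nets[q].version == net.version and net.subnet_of(nets[q])
        ]
        if not parents:
            continue  # nothing to inherit from, no FIB entry is created
        parent = max(parents, key=lambda q: nets[q].prefixlen)
        # "Punch holes" into the inherited next-hop set.
        fib[p] = set(fib[parent]) - negative[p]
    return fib


rib_positive = {"::/0": {"S1", "S2", "S3", "S4"}}
rib_negative = {"2001:db8::/32": {"S1"}, "2001:db8:1::/48": {"S2"}}
fib = compute_fib(rib_positive, rib_negative)
```

In this sketch an empty resulting next-hop set corresponds to a discard route, which the text describes as the signal to disaggregate the prefix negatively in a transitive fashion.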
Further, let us say that S3 stops advertising its service as default Further, let us say that S3 stops advertising its service as default
gateway. The entry is removed from RIB as usual. In order to update gateway. The entry is removed from RIB as usual. In order to update
the FIB, it is necessary to eliminate the FIB entry for the default the FIB, it is necessary to eliminate the FIB entry for the default
route, as well as all the FIB entries that were created for negative route, as well as all the FIB entries that were created for negative
routes pointing to the RIB entry being removed (::/0). This is done routes pointing to the RIB entry being removed (::/0). This is done
recursively for 2001:db8::/32 and then for 2001:db8:1::/48. The recursively for 2001:db8::/32 and then for 2001:db8:1::/48. The
related FIB entries via S3 are removed, as illustrated in Figure 24. related FIB entries via S3 are removed, as illustrated in Figure 24.
+---------+ +---------------+ +-----------------+ +---------+ +---------------+ +-----------------+
skipping to change at page 82, line 25 skipping to change at page 83, line 25
| +--------+ | +--------+ | | +--------+ | +--------+ |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| | | | | |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 |
+--------+ +--------+ +--------+ +--------+ +--------+ +--------+
Figure 24: Abstract FIB after loss of S3 Figure 24: Abstract FIB after Loss of S3
Say that at that time, S4 would also disaggregate prefix Say that at that time, S4 would also disaggregate prefix
2001:db8:1::/48. This would mean that the FIB entry for 2001:db8:1::/48. This would mean that the FIB entry for
2001:db8:1::/48 becomes a discard route, and that would be the signal 2001:db8:1::/48 becomes a discard route, and that would be the signal
for T1 to disaggregate prefix 2001:db8:1::/48 negatively in a for T1 to disaggregate prefix 2001:db8:1::/48 negatively in a
transitive fashion with its own children. transitive fashion with its own children.
Finally, let us look at the case where S3 becomes available again as Finally, let us look at the case where S3 becomes available again as
a default gateway, and a negative advertisement is received from S4 a default gateway, and a negative advertisement is received from S4
about prefix 2001:db8:2::/48 as opposed to 2001:db8:1::/48. Again, a about prefix 2001:db8:2::/48 as opposed to 2001:db8:1::/48. Again, a
skipping to change at page 83, line 29 skipping to change at page 84, line 29
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
| | | | | |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 | +---> | Via S3 |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
| | | | | |
| +--------+ | +--------+ | +--------+ | +--------+ | +--------+ | +--------+
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 |
+--------+ +--------+ +--------+ +--------+ +--------+ +--------+
Figure 25: Abstract FIB after negative 2001:db8:2::/48 from S4 Figure 25: Abstract FIB after Negative 2001:db8:2::/48 from S4
4.2.7. Optional Zero Touch Provisioning (ZTP) 4.2.7. Optional Zero Touch Provisioning (ZTP)
Each RIFT node can operate in zero touch provisioning (ZTP) mode, Each RIFT node can operate in zero touch provisioning (ZTP) mode,
i.e. it has no configuration (unless it is a Top-of-Fabric at the top i.e. it has no configuration (unless it is a Top-of-Fabric at the top
of the topology or it must operate in the topology as leaf and/or of the topology or it must operate in the topology as leaf and/or
support leaf-2-leaf procedures) and it will fully configure itself support leaf-2-leaf procedures) and it will fully configure itself
after being attached to the topology. Configured nodes and nodes after being attached to the topology. Configured nodes and nodes
operating in ZTP can be mixed and will form a valid topology if operating in ZTP can be mixed and will form a valid topology if
achievable. achievable.
skipping to change at page 97, line 9 skipping to change at page 98, line 9
all nodes with highest level either leaving or entering the domain all nodes with highest level either leaving or entering the domain
(with some finer distinctions not explained further). It is (with some finer distinctions not explained further). It is
therefore recommended that each node is multi-homed towards nodes therefore recommended that each node is multi-homed towards nodes
with respective HAL offerings. Fortunately, this is the natural with respective HAL offerings. Fortunately, this is the natural
state of things for the topology variants considered in RIFT. state of things for the topology variants considered in RIFT.
4.3. Further Mechanisms 4.3. Further Mechanisms
4.3.1. Overload Bit 4.3.1. Overload Bit
The overload Bit MUST be respected in all according reachability The overload bit MUST be respected by all necessary SPF computations.
computations. A node with overload bit set SHOULD NOT advertise any A node with the overload bit set SHOULD advertise all locally hosted
reachability prefixes southbound except locally hosted ones. A node prefixes both northbound and southbound, all other southbound
in overload SHOULD advertise all its locally hosted prefixes north prefixes SHOULD NOT be advertised.
and southbound.
The leaf node SHOULD set the 'overload' bit on its node TIEs, since Leaf nodes SHOULD set the overload bit on all originated Node TIEs.
if the spine nodes were to forward traffic not meant for the local If spine nodes were to forward traffic not intended for the local
node, the leaf node does not have the topology information to prevent node, the leaf node would not be able to prevent routing/forwarding
a routing/forwarding loop. loops as it does not have the necessary topology information to do
so.
4.3.2. Optimized Route Computation on Leaves 4.3.2. Optimized Route Computation on Leaves
Since the leaves do see only "one hop away" they do not need to run a Leaf nodes only have visibility to directly connected nodes and
"proper" SPF. Instead, they can gather the available prefix therefore are not required to run "full" SPF computations. Instead,
candidates from their neighbors and build the routing table prefixes from neighboring nodes can be gathered to run a "partial"
accordingly. SPF computation in order to build the routing table.
A leaf will have no North TIEs except its own and optionally from its Leaf nodes SHOULD only hold their own N-TIEs, and in cases of L2L
East-West neighbors. A leaf will have South TIEs from its neighbors. implementations, the N-TIEs of their East/West neighbors. Leaf nodes
MUST hold all S-TIEs from their neighbors.
Instead of creating a network graph from its North TIEs and Normally, a full network graph is created based on local N-TIEs and
neighbor's South TIEs and then running an SPF, a leaf node can simply remote S-TIEs that it receives from neighbors, at which time,
compute the minimum cost and next_hop_set to each leaf neighbor by necessary SPF computations are performed. Instead, leaf nodes can
examining its local adjacencies, determining bi-directionality from simply compute the minimum cost and next-hop set of each leaf
the associated North TIE, and specifying the neighbor's next_hop_set neighbor by examining its local adjacencies. Associated N-TIEs are
set and cost from the minimum cost local adjacency to that neighbor. used to determine bi-directionality and derive the next-hop set.
Cost is then derived from the minimum cost of the local adjacency to
the neighbor and the prefix cost.
Then a leaf attaches prefixes as described in Section 4.2.6. Leaf nodes would then attach necessary prefixes as described in
Section 4.2.6.
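The leaf shortcut described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the procedure mandated by the specification: the function names, the tuple layouts, and the `bidirectional` callback (standing in for the check against the neighbor's TIE) are hypothetical, and summing the adjacency cost and the prefix cost is one plausible reading of "derived from".

```python
def leaf_neighbor_routes(adjacencies, bidirectional):
    """Return {neighbor: (min_cost, next_hop_set)} from local adjacencies.

    adjacencies: iterable of (neighbor_id, link_id, cost) tuples (hypothetical layout).
    bidirectional: callable(neighbor_id, link_id) -> bool, a stand-in for
    verifying the adjacency against the neighbor's TIE.
    """
    best = {}
    for neighbor, link, cost in adjacencies:
        if not bidirectional(neighbor, link):
            continue  # ignore adjacencies that are not confirmed in both directions
        if neighbor not in best or cost < best[neighbor][0]:
            best[neighbor] = (cost, {link})   # new minimum: restart the next-hop set
        elif cost == best[neighbor][0]:
            best[neighbor][1].add(link)       # equal cost: extend the next-hop set
    return best


def attach_prefixes(neighbor_routes, prefixes):
    """Attach prefixes advertised by neighbors without running a full SPF.

    prefixes: iterable of (prefix, neighbor_id, prefix_cost) tuples.
    Assumption: total cost = minimum adjacency cost + prefix cost.
    """
    rib = {}
    for prefix, neighbor, prefix_cost in prefixes:
        if neighbor not in neighbor_routes:
            continue  # no usable adjacency to the advertising neighbor
        adj_cost, hops = neighbor_routes[neighbor]
        total = adj_cost + prefix_cost
        if prefix not in rib or total < rib[prefix][0]:
            rib[prefix] = (total, set(hops))
        elif total == rib[prefix][0]:
            rib[prefix][1].update(hops)
    return rib
```

The point of the sketch is that no graph is built: one pass over the local adjacencies and one pass over the neighbors' prefixes are enough.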
4.3.3. Mobility 4.3.3. Mobility
It is a requirement for RIFT to maintain at the control plane a real The RIFT control plane MUST maintain the real time status of every
time status of which prefix is attached to which port of which leaf, prefix, to which port it is attached, and to which leaf node that
even in a context of mobility where the point of attachment may port belongs. This is still true in cases of IP mobility where the
change several times in a subsecond period of time. point of attachment may change several times a second.
There are two classical approaches to maintain such knowledge in an There are two classic approaches to explicitly maintain this
unambiguous fashion: information:
time stamp: With this method, the infrastructure records the precise timestamp: With this method, the infrastructure SHOULD record the
time at which the movement is observed. One key advantage of this precise time at which the movement is observed. One key advantage
technique is that it has no dependency on the mobile device. One of this technique is that it has no dependency on the mobile
drawback is that the infrastructure must be precisely synchronized device. One drawback is that the infrastructure MUST be precisely
to be able to compare time stamps as observed by the various synchronized in order to be able to compare timestamps as the
points of attachment, e.g., using the variation of the Precision points of attachment change. This could be accomplished by
Time Protocol (PTP) IEEE Std. 1588 [IEEEstd1588], [IEEEstd8021AS] utilizing Precision Time Protocol (PTP) IEEE Std. 1588
designed for bridged LANs IEEE Std. 802.1AS [IEEEstd8021AS]. Both [IEEEstd1588] or 802.1AS [IEEEstd8021AS] which is designed for
the precision of the synchronization protocol and the resolution bridged LANs. Both the precision of the synchronization protocol
of the time stamp must beat the highest possible roaming time on and the resolution of the timestamp must beat the highest possible
the fabric. Another drawback is that the presence of the mobile roaming time on the fabric. Another drawback is that the presence
device may be observed only asynchronously, e.g., after it starts of a mobile device may only be observed asynchronously, such as
using an IP protocol such as ARP [RFC0826], IPv6 Neighbor when it starts using an IP protocol like ARP [RFC0826], IPv6
Discovery [RFC4861][RFC4862], or DHCP [RFC2131][RFC8415]. Neighbor Discovery [RFC4861], IPv6 Stateless Address Configuration
[RFC4862], DHCP [RFC2131], or DHCPv6 [RFC8415].
sequence counter: With this method, a mobile node notifies its point sequence counter: With this method, a mobile device notifies its
of attachment on arrival with a sequence counter that is point of attachment on arrival with a sequence counter that is
incremented upon each movement. On the positive side, this method incremented upon each movement. On the positive side, this method
does not have a dependency on a precise sense of time, since the does not have a dependency on a precise sense of time, since the
sequence of movements is kept in order by the device. The sequence of movements is kept in order by the mobile device. The
disadvantage of this approach is the lack of support for protocols disadvantage of this approach is the lack of support for protocols
that may be used by the mobile node to register its presence to that may be used by the mobile device to register its presence to
the leaf node with the capability to provide a sequence counter. the leaf node with the capability to provide a sequence counter.
Well-known issues with wrapping sequence counters must be Well-known issues with sequence counters such as wrapping and
addressed properly, and many forms of sequence counters that vary comparison rules MUST be addressed properly. Sequence numbers
in both wrapping rules and comparison rules. A particular MUST be compared by a single homogenous source to make operation
knowledge of the source of the sequence counter is required to feasible. Sequence number comparison from multiple heterogeneous
operate it, and the comparison between sequence counters from sources would be extremely difficult to implement.
heterogeneous sources can be hard to impossible.
RIFT supports a hybrid approach contained in an optional RIFT supports a hybrid approach by using an optional
`PrefixSequenceType` prefix attribute that we call a `monotonic 'PrefixSequenceType' attribute (that we also call a 'monotonic
clock` consisting of a timestamp and optional sequence number. In clock') that consists of a timestamp and optional sequence number
case of presence of the attribute: field. When this attribute is present (observe that per data schema
the attribute itself is optional but in case it is included the
'timestamp' field is required):
o The leaf node MAY advertise a time stamp of the latest sighting of o The leaf node MAY advertise a timestamp of the latest sighting of
a prefix, e.g., by snooping IP protocols or the node using the a prefix, e.g., by snooping IP protocols or the node using the
time at which it advertised the prefix. RIFT transports the time time at which it advertised the prefix. RIFT transports the
stamp within the desired prefix North TIEs as 802.1AS timestamp. timestamp within the desired prefix North TIEs as 802.1AS
timestamp.
o RIFT may interoperate with the "update to 6LoWPAN Neighbor o RIFT MAY interoperate with "Registration Extensions for 6LoWPAN
Discovery" [RFC8505], which provides a method for registering a Neighbor Discovery" [RFC8505], which provides a method for
prefix with a sequence counter called a Transaction ID (TID). registering a prefix with a sequence number called a Transaction
RIFT transports in such case the TID in its native form. ID (TID). In such cases, RIFT SHOULD transport the derived TID
without modification.
o RIFT also defines an abstract negative clock (ASNC) that compares o RIFT also defines an abstract negative clock (ASNC) (also called
as less than any other clock. By default, the lack of a an 'undefined' clock). ASNC MUST be considered older than any
`PrefixSequenceType` in a Prefix North TIE is interpreted as ASNC. other defined clock. By default, when a node receives a prefix
We call this also an `undefined` clock. North TIE that does not contain a 'PrefixSequenceType' attribute,
it MUST interpret the absence as ASNC.
o Any prefix present on the fabric in multiple nodes that has the o Any prefix present on the fabric in multiple nodes that has the
`same` clock is considered as anycast. ASNC is always considered `same` clock is considered as anycast.
smaller than any defined clock.
o RIFT implementation assumes by default that all nodes are being o RIFT specification assumes that all nodes are being synchronized
synchronized to 200 milliseconds precision which is easily to at least 200 milliseconds of precision. This is achievable
achievable even in very large fabrics using [RFC5905]. An through the use of NTP [RFC5905]. An implementation MAY provide a
implementation MAY provide a way to reconfigure a domain to a way to reconfigure a domain to a different value, we call this
different value. We call this variable MAXIMUM_CLOCK_DELTA. variable MAXIMUM_CLOCK_DELTA.
4.3.3.1. Clock Comparison 4.3.3.1. Clock Comparison
All monotonic clock values are comparable to each other using the All monotonic clock values MUST be compared to each other using the
following rules: following rules:
1. ASNC is older than any other value except ASNC AND 1. ASNC is older than any other value except ASNC AND
2. Clocks with timestamps differing by more than MAXIMUM_CLOCK_DELTA 2. Clocks with timestamps differing by more than MAXIMUM_CLOCK_DELTA
are comparable by using the timestamps only AND are comparable by using the timestamps only AND
3. Clocks with timestamps differing by less than MAXIMUM_CLOCK_DELTA 3. Clocks with timestamps differing by less than MAXIMUM_CLOCK_DELTA
are comparable by using their TIDs only AND are comparable by using their TIDs only AND
4. An undefined TID is always older than any other TID AND 4. An undefined TID is always older than any other TID AND
5. TIDs are compared using rules of [RFC8505]. 5. TIDs are compared using rules of [RFC8505].
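The five rules above can be sketched as a single comparison function. This is a minimal illustration under assumptions, not a normative encoding: a clock is modeled as `None` (ASNC) or a `(timestamp_ms, tid)` tuple with `tid` possibly `None`, the TID comparison of [RFC8505] is not reproduced but passed in as a caller-supplied `tid_cmp`, and a difference of exactly MAXIMUM_CLOCK_DELTA (which the rules leave open) is treated here as "within" the delta.

```python
MAXIMUM_CLOCK_DELTA_MS = 200  # default precision assumed by the text


def _sign(x):
    return (x > 0) - (x < 0)


def compare_clocks(a, b, tid_cmp, max_delta_ms=MAXIMUM_CLOCK_DELTA_MS):
    """Return -1 if clock a is older than b, 1 if newer, 0 if equal.

    a, b: None for ASNC, else (timestamp_ms, tid_or_None) (hypothetical model).
    tid_cmp: callable(tid_a, tid_b) -> negative/zero/positive, standing in
    for the [RFC8505] comparison, which is not implemented here.
    """
    # Rule 1: ASNC is older than any other value except ASNC.
    if a is None or b is None:
        return _sign((a is not None) - (b is not None))
    ts_a, tid_a = a
    ts_b, tid_b = b
    # Rule 2: timestamps further apart than the delta decide on their own.
    if abs(ts_a - ts_b) > max_delta_ms:
        return _sign(ts_a - ts_b)
    # Rule 4: an undefined TID is older than any defined TID.
    if tid_a is None or tid_b is None:
        return _sign((tid_a is not None) - (tid_b is not None))
    # Rules 3 and 5: within the delta only the TIDs are compared.
    return _sign(tid_cmp(tid_a, tid_b))
```

Under this model, two clocks that fall within the delta and both lack a TID compare as equal.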
4.3.3.2. Interaction between Time Stamps and Sequence Counters 4.3.3.2. Interaction between Time Stamps and Sequence Counters
For slow movements that occur less frequently than e.g. once per For attachment changes that occur less frequently (e.g. once per
second, the time stamp that the RIFT infrastructure captures is second), the timestamp that the RIFT infrastructure captures should
enough to determine the freshest discovery. If the point of be enough to determine the most current discovery. If the point of
attachment changes faster than the maximum drift of the time stamping attachment changes faster than the maximum drift of the timestamping
mechanism (i.e. MAXIMUM_CLOCK_DELTA), then a sequence counter is mechanism (i.e. MAXIMUM_CLOCK_DELTA), then a sequence number SHOULD
required to add resolution to the freshness evaluation, and it must be used to enable necessary precision to determine currency.
be sized so that the counters stay comparable within the resolution
of the time sampling mechanism.
The sequence counter in [RFC8505] is encoded as one octet and wraps The sequence counter in [RFC8505] is encoded as one octet and wraps
around using Appendix A. around using Appendix A.
Within the resolution of MAXIMUM_CLOCK_DELTA the sequence counters Within the resolution of MAXIMUM_CLOCK_DELTA, sequence counter values
captured during 2 sequential values of the time stamp SHOULD be captured during 2 sequential iterations of the same timestamp SHOULD
comparable. This means with default values that a node may move up be comparable. This means that with default values, a node may move
to 127 times during a 200 milliseconds period and the clocks remain up to 127 times in a 200 millisecond period and the clocks will
still comparable thus allowing the infrastructure to assert the remain comparable. This allows the RIFT infrastructure to explicitly
freshest advertisement with no ambiguity. assert the most up-to-date advertisement.
4.3.3.3. Anycast vs. Unicast 4.3.3.3. Anycast vs. Unicast
A unicast prefix can be attached to at most one leaf, whereas an A unicast prefix can be attached to at most one leaf, whereas an
anycast prefix may be reachable via more than one leaf. anycast prefix may be reachable via more than one leaf.
If a monotonic clock attribute is provided on the prefix, then the If a monotonic clock attribute is provided on the prefix, then the
prefix with the `newest` clock value is strictly preferred. An prefix with the `newest` clock value is strictly preferred. An
anycast prefix does not carry a clock or all clock attributes MUST be anycast prefix does not carry a clock or all clock attributes MUST be
the same under the rules of Section 4.3.3.1. the same under the rules of Section 4.3.3.1.
Observe that it is important that in mobility events the leaf is re- Observe that it is important that in mobility events the leaf is re-
flooding as quickly as possible the absence of the prefix that moved flooding as quickly as possible the absence of the prefix that moved
away. away.
Observe further that without support for [RFC8505], movements on the Observe further that without support for [RFC8505], movements on the
fabric within intervals smaller than 100 msec will be seen as anycast. fabric within intervals smaller than 100 msec will be seen as anycast.
4.3.3.4. Overlays and Signaling 4.3.3.4. Overlays and Signaling
RIFT is agnostic whether any overlay technology like [MIP, LISP, RIFT is agnostic to any overlay technologies and their associated
VxLAN, NVO3] and the associated signaling is deployed over it. But control and transports that run on top of it (e.g. VXLAN). It is
it is expected that leaf nodes, and possibly Top-of-Fabric nodes can expected that leaf nodes and possibly Top-of-Fabric nodes can perform
perform the correct encapsulation. necessary data plane encapsulation.
In the context of mobility, overlays provide a classical solution to In the context of mobility, overlays provide another possible
avoid injecting mobile prefixes in the fabric and improve the solution to avoid injecting mobile prefixes into the fabric as well
scalability of the solution. It makes sense on a data center that as improving scalability of the deployment. It makes sense to
already uses overlays to consider their applicability to the mobility consider overlays for mobility solutions in IP fabrics. As an
solution; as an example, a mobility protocol such as LISP may inform example, a mobility protocol such as LISP may inform the ingress leaf
the ingress leaf of the location of the egress leaf in real time. of the location of the egress leaf in real time.
Another possibility is to consider mobility as an underlay Another possibility is to consider mobility as an underlay
service and support it in RIFT to an extent. The load on the fabric service and support it in RIFT to an extent. The load on the fabric
increases with the amount of mobility, since a move forces increases with the amount of mobility, since a move forces
flooding and computation on all nodes in the scope of the move, so flooding and computation on all nodes in the scope of the move, so
tunneling from leaf to the Top-of-Fabric may be desired. tunneling from leaf to the Top-of-Fabric may be desired to speed up
convergence times.
4.3.4. Key/Value Store 4.3.4. Key/Value Store
4.3.4.1. Southbound 4.3.4.1. Southbound
The protocol supports a southbound distribution of key-value pairs RIFT supports the southbound distribution of key-value pairs that can
that can be used to e.g. distribute configuration information during be used to distribute information to facilitate higher levels of
topology bring-up. The KV South TIEs can arrive from multiple nodes functionality (e.g. distribution of configuration information). KV
and hence need tie-breaking per key. We use the following rules South TIEs may arrive from multiple nodes and therefore MUST execute
the following tie-breaking rules for each key:
1. Only KV TIEs originated by nodes to which the receiver has a bi- 1. Only KV TIEs received from nodes to which a bi-directional
directional adjacency are considered. adjacency exists MUST be considered.
2. Within all such valid KV South TIEs containing the key, the value 2. For each valid KV South TIEs that contains the same key, the
of the KV South TIE for which the according node South TIE is value within the South TIE with the highest level will be
present, has the highest level and within the same level has preferred. If the levels are identical, the highest originating
highest originating system ID is preferred. If keys in the most system ID will be preferred. In the case of overlapping keys in
preferred TIEs are overlapping, the behavior is undefined. the winning South TIE, the behavior is undefined.
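The two tie-breaking rules above can be sketched per key as follows. This is an illustrative sketch only: the `(system_id, level, kvs)` tuple layout and the function name are hypothetical, and because the behavior for overlapping keys in the winning TIE is undefined, the sketch simply keeps the first value it sees for an originator.

```python
def select_kv_values(kv_ties, bidirectional_neighbors):
    """Pick one value per key from southbound KV TIEs.

    kv_ties: iterable of (system_id, level, {key: value}) (hypothetical layout).
    bidirectional_neighbors: set of system IDs with a bi-directional adjacency.
    """
    winners = {}
    for system_id, level, kvs in kv_ties:
        # Rule 1: only TIEs from bi-directionally adjacent nodes count.
        if system_id not in bidirectional_neighbors:
            continue
        # Rule 2: highest level wins, then highest originating system ID.
        rank = (level, system_id)
        for key, value in kvs.items():
            if key not in winners or rank > winners[key][0]:
                winners[key] = (rank, value)
    return {key: value for key, (_rank, value) in winners.items()}
```

The selection is made independently for each key, so different keys may be won by different originators.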
Observe that if a node goes down, the node south of it looses Consider that if a node goes down, nodes south of it will lose
adjacencies to it and with that the KVs will be disregarded and on associated adjacencies causing them to disregard corresponding KVs.
tie-break changes new KV re-advertised to prevent stale information New KV South TIEs are advertised to prevent stale information being
being used by nodes further south. KV information in southbound used by nodes that are farther south. KV advertisements southbound
direction is not result of independent computation of every node over are not a result of independent computation by every node over the
same set of TIEs but a diffused computation. same set of South TIEs, but a diffused computation.
4.3.4.2. Northbound 4.3.4.2. Northbound
Certain use cases seem to necessitate distribution of essentially KV Certain use cases necessitate distribution of essential KV
information that is generated in the leaves in the northbound information that is generated by the leaves in the northbound
direction. Such information is flooded in KV North TIEs. Since the direction. Such information is flooded in KV North TIEs. Since the
originator of northbound KV is preserved during northbound flooding, originator of the KV North TIEs is preserved during flooding,
overlapping keys could be used. However, to omit further protocol overlapping keys MAY be used. However, to avoid further protocol
complexity, only the value of the key in TIE tie-broken in same complexity, the same tie-breaking rules as used in southbound
fashion as southbound KV TIEs is used. distribution SHOULD be used.
4.3.5. Interactions with BFD 4.3.5. Interactions with BFD
RIFT MAY incorporate BFD [RFC5881] to react quickly to link failures. RIFT MAY incorporate BFD [RFC5881] to react quickly to link failures.
In such a case, the following procedures are introduced: In such a case, the following procedures are introduced:
After RIFT three-way hello adjacency convergence a BFD session MAY After RIFT three-way hello adjacency convergence a BFD session MAY
be formed automatically between the RIFT endpoints without further be formed automatically between the RIFT endpoints without further
configuration using the exchanged discriminators. The capability configuration using the exchanged discriminators. The capability
of the remote side to support BFD is carried on the LIEs. of the remote side to support BFD is carried in the LIEs.
In case an established BFD session goes Down after it was Up, RIFT In case an established BFD session goes Down after it was Up, RIFT
adjacency SHOULD be re-initialized and subsequently started from adjacency SHOULD be re-initialized and subsequently started from
Init after it sees a consecutive BFD Up. Init after it sees a consecutive BFD Up.
In case of parallel links between nodes each link MAY run its own In case of parallel links between nodes each link MAY run its own
independent BFD session or they may share a session. independent BFD session or they MAY share a session.
In case RIFT changes link identifiers or BFD capability indication If link identifiers or BFD capabilities change, both the LIE and
both the LIE as well as the BFD sessions SHOULD be brought down any BFD sessions SHOULD be brought down and back up again. In
and back up again. case only the advertised capabilities change, the node MAY choose
to persist the BFD session.
Multiple RIFT instances MAY choose to share a single BFD session Multiple RIFT instances MAY choose to share a single BFD session,
(in such case it is undefined what discriminators are used albeit in such cases the behavior for which discriminators are used is
RIFT MAY advertise the same link ID for the same interface in undefined. However, RIFT MAY advertise the same link ID for the
multiple instances and with that "share" the discriminators). same interface in multiple instances to "share" discriminators.
BFD TTL follows [RFC5082]. BFD TTL follows [RFC5082].
4.3.6. Fabric Bandwidth Balancing 4.3.6. Fabric Bandwidth Balancing
A well understood problem in fabrics is that in case of link losses A well understood problem in fabrics is that in case of link
it would be ideal to rebalance how much traffic is offered to failures, it would be ideal to rebalance how much traffic is sent to
switches in the next level based on the ingress and egress bandwidth switches in the next level based on available ingress and egress
they have. Current attempts rely mostly on specialized traffic bandwidth.
engineering via controller or leaves being aware of complete topology
with according cost and complexity.
RIFT can support a very light weight mechanism that can deal with the RIFT supports a very light weight mechanism that can deal with the
problem in an approximate way based on the fact that RIFT is loop- problem in an approximate way based on the fact that RIFT is loop-
free. free.
4.3.6.1. Northbound Direction 4.3.6.1. Northbound Direction
Every RIFT node SHOULD compute the amount of northbound bandwidth Every RIFT node SHOULD compute the amount of northbound bandwidth
available through neighbors at higher level and modify distance available through neighbors at higher level and modify distance
received on default route from this neighbor. Those different received on default route from this neighbor. Default routes with
distances SHOULD be used to support weighted ECMP forwarding towards differing distances SHOULD be used to support weighted ECMP
higher level when using default route. We call such a distance forwarding. We call such a distance Bandwidth Adjusted Distance or
Bandwidth Adjusted Distance or BAD. This is best illustrated by a BAD. This is best illustrated by a simple example.
simple example.
. 100 x 100 100 MBits . 100 x 100 100 MBits
. | x | | . | x | |
. +-+---+-+ +-+---+-+ . +-+---+-+ +-+---+-+
. | | | | . | | | |
. |Spin111| |Spin112| . |Spin111| |Spin112|
. +-+---+++ ++----+++ . +-+---+++ ++----+++
. |x || || || . |x || || ||
. || |+---------------+ || . || |+---------------+ ||
. || +---------------+| || . || +---------------+| ||
skipping to change at page 103, line 29 skipping to change at page 104, line 29
. || +------------+| || || . || +------------+| || ||
. || |+------------+ || || . || |+------------+ || ||
. |x || || || . |x || || ||
. +-+---+++ +--++-+++ . +-+---+++ +--++-+++
. | | | | . | | | |
. |Leaf111| |Leaf112| . |Leaf111| |Leaf112|
. +-------+ +-------+ . +-------+ +-------+
Figure 29: Balancing Bandwidth Figure 29: Balancing Bandwidth
All links from leaves in Figure 29 are assumed to 10 MBit/s bandwidth Figure 29 depicts an example topology where links between leaf and
while the uplinks one level further up are assumed to be 100 MBit/s. spine nodes are 10 MBit/s and links from spine nodes northbound are
Further, in Figure 29 we assume that Leaf111 lost one of the parallel 100 MBit/s. Consider a parallel link failure between Leaf 111 and
links to Spine 111 and with that wants to possibly push more traffic Spine 111 and as a result, Leaf 111 wants to forward more traffic
onto Spine 112. Leaf 112 has equal bandwidth to Spine 111 and Spine toward Spine 112. Additionally, we consider an uplink failure on
112 but Spine 111 lost one of its uplinks. Spine 111.
The local modification of the received default route distance from The local modification of the received default route distance from
upper level is achieved by running a relatively simple algorithm upper level is achieved by running a relatively simple algorithm
where the bandwidth is weighted exponentially while the distance on where the bandwidth is weighted exponentially, while the distance on
the default route represents a multiplier for the bandwidth weight the default route represents a multiplier for the bandwidth weight
for easy operational adjustments. for easy operational adjustments.
On a node L use Node TIEs to compute for each non-overloaded On a node, L, use Node TIEs to compute for each non-overloaded
northbound neighbor N three values: northbound neighbor N 3 values:
L_N_u: as sum of the bandwidth available to N L_N_u: as sum of the bandwidth available to N
N_u: as sum of the uplink bandwidth available on N N_u: as sum of the uplink bandwidth available on N
T_N_u: as sum of L_N_u * OVERSUBSCRIPTION_CONSTANT + N_u T_N_u: as sum of L_N_u * OVERSUBSCRIPTION_CONSTANT + N_u
For all T_N_u determine the according M_N_u as For all T_N_u determine the according M_N_u as
log_2(next_power_2(T_N_u)) and determine MAX_M_N_u as maximum value log_2(next_power_2(T_N_u)) and determine MAX_M_N_u as maximum value
of all M_N_u. of all such M_N_u values.
For each advertised default route from a node N modify the advertised For each advertised default route from a node N modify the advertised
distance D to BAD = D * (1 + MAX_M_N_u - M_N_u) and use BAD instead distance D to BAD = D * (1 + MAX_M_N_u - M_N_u) and use BAD instead
of distance D to weight balance default forwarding towards N. of distance D to weight balance default forwarding towards N.
For the example above a simple table of values will help the For the example above, a simple table of values will help in
understanding. We assume the default route distance is advertised understanding of the concept. We assume that all default route
with D=1 everywhere and OVERSUBSCRIPTION_CONSTANT = 1. distances are advertised with D=1 and that OVERSUBSCRIPTION_CONSTANT
= 1.
+---------+-----------+-------+-------+-----+ +---------+-----------+-------+-------+-----+
| Node | N | T_N_u | M_N_u | BAD | | Node | N | T_N_u | M_N_u | BAD |
+---------+-----------+-------+-------+-----+ +---------+-----------+-------+-------+-----+
| Leaf111 | Spine 111 | 110 | 7 | 2 | | Leaf111 | Spine 111 | 110 | 7 | 2 |
+---------+-----------+-------+-------+-----+ +---------+-----------+-------+-------+-----+
| Leaf111 | Spine 112 | 220 | 8 | 1 | | Leaf111 | Spine 112 | 220 | 8 | 1 |
+---------+-----------+-------+-------+-----+ +---------+-----------+-------+-------+-----+
| Leaf112 | Spine 111 | 120 | 7 | 2 | | Leaf112 | Spine 111 | 120 | 7 | 2 |
+---------+-----------+-------+-------+-----+ +---------+-----------+-------+-------+-----+
| Leaf112 | Spine 112 | 220 | 8 | 1 | | Leaf112 | Spine 112 | 220 | 8 | 1 |
+---------+-----------+-------+-------+-----+ +---------+-----------+-------+-------+-----+
Table 5: BAD Computation Table 5: BAD Computation
If a calculation produces a result exceeding the range of the type, If a calculation produces a result exceeding the range of the type,
e.g. bandwidth, the result is set to the highest possible value for e.g. bandwidth, the result is set to the highest possible value for
that type. that type.
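As an illustration only, the BAD arithmetic above can be sketched in Python. The function names, the treatment of exact powers of two in next_power_2, and the 32-bit saturation limit are assumptions made for this sketch and are not taken from the specification; T_N_u is supplied as an input, as in Table 5.

```python
# Sketch of the BAD (bandwidth adjusted distance) computation described above.
# Assumptions: next_power_2(x) returns x itself when x is already a power of
# two, and the distance type saturates at a hypothetical 32-bit maximum.

def next_power_2(x: int) -> int:
    """Smallest power of two that is >= x, for x >= 1."""
    return 1 << (x - 1).bit_length()


def m_value(t_n_u: int) -> int:
    """M_N_u = log_2(next_power_2(T_N_u)), computed with integer math."""
    return next_power_2(t_n_u).bit_length() - 1


def bad_distances(d: int, t_by_node: dict, max_distance: int = 2**32 - 1) -> dict:
    """Return BAD = D * (1 + MAX_M_N_u - M_N_u) for each northbound node N."""
    m = {node: m_value(t) for node, t in t_by_node.items()}
    max_m = max(m.values())
    # Saturate instead of overflowing, as required for out-of-range results.
    return {node: min(d * (1 + max_m - m[node]), max_distance) for node in m}


# Leaf111 row pair from Table 5, with D=1 everywhere.
leaf111 = bad_distances(1, {"Spine 111": 110, "Spine 112": 220})
```

With the Leaf111 inputs from Table 5, this yields a BAD of 2 towards Spine 111 and 1 towards Spine 112, matching the table.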
BAD is only computed for default routes. A node MAY compute and use BAD SHOULD be only computed for default routes. A node MAY compute
BAD for any disaggregated prefixes or other RIFT routes. A node MAY and use BAD for any disaggregated prefixes or other RIFT routes. A
use another algorithm than BAD to weight northbound traffic based on node MAY use a different algorithm to weight northbound traffic based
bandwidth given that the algorithm is distributed and un-synchronized on bandwidth. If a different algorithm is used, its successful
and ultimately, its correct behavior does not depend on uniformity of behavior MUST NOT depend on uniformity of algorithm or
balancing algorithms used in the fabric. E.g. it is conceivable that synchronization of BAD computations across the fabric. E.g. it is
leaves could use real time link loads gathered by analytics to change conceivable that leaves could use real time link loads gathered by
the amount of traffic assigned to each default route next hop. analytics to change the amount of traffic assigned to each default
route next hop.
Observe further that a change in available bandwidth will only affect Furthermore, a change in available bandwidth will only affect, at
at maximum two levels down in the fabric, i.e. blast radius of most, two levels down in the fabric, i.e. the blast radius of
bandwidth changes is contained no matter its height. bandwidth adjustments is constrained no matter the fabric's height.
4.3.6.2. Southbound Direction 4.3.6.2. Southbound Direction
Due to its loop free properties a node MAY take during S-SPF into Due to its loop free nature, during South SPF, a node MAY account for
account the available bandwidth on the nodes in lower levels and maximum available bandwidth on nodes in lower levels and modify the
modify the amount of traffic offered to next level's "southbound" amount of traffic offered to the next level's southbound nodes. It
nodes based as what it sees is the total achievable maximum flow is worth considering that such computations may be more effective if
through those nodes. It is worth observing that such computations standardized, but do not have to be. As long as a packet continues
may work better if standardized but does not have to be necessarily. to flow southbound, it will take some viable, loop-free path to reach
As long the packet keeps on heading south it will take one of the its destination.
available paths and arrive at the intended destination.
4.3.7. Label Binding 4.3.7. Label Binding
A node MAY advertise on its LIEs a locally significant, downstream A node MAY advertise in its LIEs, a locally significant, downstream
assigned, interface specific label. One use of such label is a hop- assigned, interface specific label. One use of such a label is a
by-hop encapsulation allowing to easily distinguish forwarding planes hop-by-hop encapsulation allowing forwarding planes to be easily
served by a multiplicity of RIFT instances. distinguished among multiple RIFT instances.
4.3.8. Leaf to Leaf Procedures 4.3.8. Leaf to Leaf Procedures
RIFT can optionally allow special leaf East-West adjacencies under RIFT implementations SHOULD support special East-West adjacencies
additional set of rules. The leaf supporting those procedures MUST: between leaf nodes. Leaf nodes supporting these procedures MUST:
advertise the LEAF_2_LEAF flag in node capabilities AND advertise the LEAF_2_LEAF flag in its node capabilities AND
set the overload bit on all leaf's node TIEs AND set the overload bit on all leaf's node TIEs AND
flood only node's own north and south TIEs over E-W leaf flood only a node's own north and south TIEs over E-W leaf
adjacencies AND adjacencies AND
always use E-W leaf adjacency in both north as well as south always use E-W leaf adjacency in all SPF computations AND
computation AND
install a discard route for any advertised aggregate in leaf's install a discard route for any advertised aggregate routes in a
TIEs AND leaf's TIE AND
never form southbound adjacencies. never form southbound adjacencies.
This will allow the E-W leaf nodes to exchange traffic strictly for This will allow the E-W leaf nodes to exchange traffic strictly for
the prefixes advertised in each other's north prefix TIEs (since the the prefixes advertised in each other's north prefix TIEs (since the
southbound computation will find the reverse direction in the other southbound computation will find the reverse direction in the other
node's TIE and install its north prefixes). node's TIE and install its north prefixes).
4.3.9. Address Family and Multi Topology Considerations 4.3.9. Address Family and Multi Topology Considerations
Multi-Topology (MT)[RFC5120] and Multi-Instance (MI)[RFC8202] is used Multi-Topology (MT)[RFC5120] and Multi-Instance (MI)[RFC8202]
today in link-state routing protocols to support several domains on concepts are used today in link-state routing protocols to support
the same physical topology. RIFT supports this capability by several domains on the same physical topology. RIFT supports this
carrying transport ports in the LIE protocol exchanges. Multiplexing capability by carrying transport ports in the LIE protocol exchanges.
of LIEs can be achieved by either choosing varying multicast
addresses or ports on the same address. Multiplexing of LIEs can be achieved by either choosing varying
multicast addresses or ports on the same address.
BFD interactions in Section 4.3.5 are implementation dependent when BFD interactions in Section 4.3.5 are implementation dependent when
multiple RIFT instances run on the same link. multiple RIFT instances run on the same link.
4.3.10. Reachability of Internal Nodes in the Fabric 4.3.10. Reachability of Internal Nodes in the Fabric
RIFT does not precondition that its nodes have reachable addresses RIFT does not require that nodes have reachable addresses in the
albeit for operational purposes this is clearly desirable. Under fabric, though it is clearly desirable for operational purposes.
normal operating conditions this can be easily achieved by e.g. Under normal operating conditions this can be easily achieved by
injecting the node's loopback address into North and South Prefix injecting the node's loopback address into North and South Prefix
TIEs or other implementation specific mechanisms. TIEs or other implementation specific mechanisms.
Things get more interesting in case a node looses all its northbound Special considerations arise when a node loses all northbound
adjacencies but is not at the top of the fabric. That is outside the adjacencies, but is not at the top of the fabric. These are outside
scope of this document and may be covered in a separate document. the scope of this document and could be discussed in a separate
document.
4.3.11. One-Hop Healing of Levels with East-West Links 4.3.11. One-Hop Healing of Levels with East-West Links
Based on the rules defined in Section 4.2.4, Section 4.2.3.8 and Based on the rules defined in Section 4.2.4, Section 4.2.3.8 and
given presence of E-W links, RIFT can provide a one-hop protection of given presence of E-W links, RIFT can provide a one-hop protection
nodes that lost all their northbound links or in other complex link for nodes that lost all their northbound links. This can also be
set failure scenarios except at Top-of-Fabric where the links are applied to multi-plane designs where complex link set failures occur
used exclusively to flood topology information in multi-plane at the Top-of-Fabric when links are exclusively used for flooding
designs. Section 5.4 explains the resulting behavior based on one topology information. Section 5.4 outlines this behavior.
such example.
4.4. Security 4.4. Security
4.4.1. Security Model 4.4.1. Security Model
An inherent property of any security and ZTP architecture is the An inherent property of any security and ZTP architecture is the
resulting trade-off in regard to integrity verification of the resulting trade-off in regard to integrity verification of the
information distributed through the fabric vs. necessary provisioning information distributed through the fabric vs. provisioning and auto-
and auto-configuration. At a minimum, in all approaches, the configuration requirements. At a minimum the security of an
security of an established adjacency can be ensured. The stricter established adjacency should be ensured. The stricter the security
the security model the more provisioning must take over the role of model the more provisioning must take over the role of ZTP.
ZTP.
The most security conscious operators will want to have full control RIFT supports the following security models to allow for flexible
over which port on which router/switch is connected to the respective control by the operator.
port on the "other side", which we will call the "port-association
model" (PAM) achievable e.g. by configuring on each port pair a
designated shared key or pair of private/public keys. In secure data
center locations, operators may want to control which router/switch
is connected to which other router/switch only or choose a "node-
association model" (NAM) which allows, for example, simplified port
sparing. In an even more relaxed environment, an operator may only
be concerned that the router/switches share credentials ensuring that
they belong to this particular data center network hence allowing the
flexible sparing of whole routers/switches. We will define that case
as the "fabric-association model" (FAM), equivalent to using a shared
secret for the whole fabric. Such flexibility may make sense for
leaf nodes such as servers where the addition and swapping of servers
is more frequent than the rest of the data center network.
Generally, leaves of the fabric tend to be less trusted than
switches. The different models could be mixed throughout the fabric
if the benefits outweigh the cost of increased complexity in
provisioning.
In each of the above cases, some configuration mechanism is needed to o The most security conscious operators may choose to have control
allow the operator to specify which connections are allowed, and some over which ports interconnect between a given pair of nodes, we
mechanism is needed to: call this the "Port-Association Model" (PAM). This is achievable
by configuring each pair of directly connected ports with a
designated shared key or public/private key pair.
a. specify the according level in the fabric, o In physically secure data center locations, operators may choose
to control connectivity between entire nodes, we call this the
"Node-Association Model" (NAM). A benefit of this model is that
it allows for simplified port sparing.
b. discover and report missing connections, o In the most relaxed environments, an operator may only choose to
control which nodes join a particular fabric. We call this the
"Fabric-Association Model" (FAM). This is achievable by using a
single shared secret across the entire fabric. Such flexibility
makes sense when we consider servers as leaf devices, which are
replaced more often than network nodes. In addition, this model
allows for simplified node sparing.
c. discover and report unexpected connections, and prevent such o These models may be mixed throughout the fabric depending upon
adjacencies from forming. security requirements at various levels of the fabric and
willingness to accept increased provisioning complexity.
On the more relaxed configuration side of the spectrum, operators In order to support the cases mentioned above, RIFT implementations
might only configure the level of each switch, but don't explicitly support, through operator control, mechanisms that allow for:
configure which connections are allowed. In this case, RIFT will
only allow adjacencies to come up between nodes are that in adjacent a. specification of the appropriate level in the fabric,
levels. The operators with lowest security requirements may not use
any configuration to specify which connections are allowed. Such b. discovery and reporting of missing connections,
fabrics could rely fully on ZTP for each router/switch to discover
its level and would only allow adjacencies between adjacent levels to c. discovery and reporting of unexpected connections while
come up. Figure 30 illustrates the tradeoffs inherent in the preventing them from forming insecure adjacencies.
Operators may only choose to configure the level of each node, but
not explicitly configure which connections are allowed. In this
case, RIFT will only allow adjacencies to establish between nodes
that are in adjacent levels. Operators with the lowest security
requirements may not use any configuration to specify which
connections are allowed. Nodes in such fabrics could rely fully on
ZTP and only establish adjacencies between nodes in adjacent
levels. Figure 30 illustrates inherent tradeoffs between the
different security models. different security models.
Ultimately, some level of verification of the link quality may be Some level of link quality verification may be required prior to an
required before an adjacency is allowed to be used for forwarding. adjacency being used for forwarding. For example, an implementation
For example, an implementation may require that a BFD session comes may require that a BFD session comes up before advertising the
up before advertising the adjacency. adjacency.
For the above outlined cases, RIFT has two approaches to enforce that For the cases outlined above, RIFT has two approaches to enforce that
a local port is connected to the correct port on the correct remote a local port is connected to the correct port on the correct remote
router/switch. One approach is to piggy-back on RIFT's node. One approach is to piggy-back on RIFT's authentication
authentication mechanism. Assuming the provisioning model (e.g. the mechanism. Assuming the provisioning model (e.g. the YANG model) is
YANG model) is flexible enough, operators can choose to provision a flexible enough, operators can choose to provision a unique
unique authentication key for: authentication key for:
a. each pair of ports in "port-association model" or a. each pair of ports in "port-association model" or
b. each pair of switches in "node-association model" or b. each pair of switches in "node-association model" or
c. each pair of levels or c. each pair of levels or
d. the entire fabric in "fabric-association model". d. the entire fabric in "fabric-association model".
The other approach is to rely on the system-id, port-id and level The other approach is to rely on the system-id, port-id and level
fields in the LIE message to validate an adjacency against the fields in the LIE message to validate an adjacency against the
configured expected cabling topology, and optionally introduce some expected cabling topology, and optionally introduce some new rules in
new rules in the FSM to allow the adjacency to come up if the the FSM to allow the adjacency to come up if the expectations are
expectations are met. met.
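The second approach can be pictured with a small Python sketch. The Peer record, the expected-cabling table keyed by local port, and the function name are hypothetical and serve only as illustration; the only elements taken from the text are the system-id, port-id and level fields carried in the LIE.

```python
# Sketch: validating a LIE against an operator-provided cabling plan.
# The data layout here is hypothetical; only the three compared fields
# (system-id, port-id, level) come from the text above.
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Peer:
    system_id: int
    port_id: int
    level: int


def adjacency_allowed(expected: Dict[int, Peer], local_port: int, seen: Peer) -> bool:
    """Allow the adjacency only if the LIE matches the expected remote end."""
    want = expected.get(local_port)
    # Unknown local ports and any field mismatch both count as unexpected.
    return want is not None and want == seen


plan = {1: Peer(system_id=0x111, port_id=7, level=1)}
```

A mismatch on any of the three fields, or a LIE arriving on a port absent from the plan, would be reported as an unexpected connection under this sketch.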
^ /\ | ^ /\ |
/|\ / \ | /|\ / \ |
| / \ | | / \ |
| / PAM \ | | / PAM \ |
Increasing / \ Increasing Increasing / \ Increasing
Integrity +----------+ Flexibility Integrity +----------+ Flexibility
& / NAM \ & & / NAM \ &
Increasing +--------------+ Less Increasing +--------------+ Less
Provisioning / FAM \ Configuration Provisioning / FAM \ Configuration
| +------------------+ | | +------------------+ |
| / Level Provisioning \ | | / Level Provisioning \ |
| +----------------------+ \|/ | +----------------------+ \|/
| / Zero Configuration \ v | / Zero Configuration \ v
+--------------------------+ +--------------------------+
Figure 30: Security Model Figure 30: Security Model
4.4.2. Security Mechanisms 4.4.2. Security Mechanisms
RIFT Security goals are to ensure authentication, message integrity RIFT Security goals are to ensure:
and prevention of replay attacks. Low processing overhead and
efficient messaging are also a goal. Message confidentiality is a 1. authentication
non-goal.
2. message integrity
3. the prevention of replay attacks
4. low processing overhead
5. efficient messaging
Message confidentiality is a non-goal.
The model in the previous section allows a range of security key The model in the previous section allows a range of security key
types that are analogous to the various security association models. types that are analogous to the various security association models.
PAM and NAM allow security associations at the port or node level PAM and NAM allow security associations at the port or node level
using symmetric or asymmetric keys that are pre-installed. FAM using symmetric or asymmetric keys that are pre-installed. FAM
argues for security associations to be applied only at a group level argues for security associations to be applied only at a group level
or to be refined once the topology has been established. RIFT does or to be refined once the topology has been established. RIFT does
not specify how security keys are installed or updated it specifies not specify how security keys are installed or updated, though it
how the key can be used to achieve goals. does specify how the key can be used to achieve security goals.
The protocol has provisions for "weak" nonces to prevent replay The protocol has provisions for "weak" nonces to prevent replay
attacks and includes authentication mechanisms comparable to attacks and includes authentication mechanisms comparable to
[RFC5709] and [RFC7987]. [RFC5709] and [RFC7987].
4.4.3. Security Envelope 4.4.3. Security Envelope
RIFT MUST be carried in a mandatory secure envelope illustrated in RIFT MUST be carried in a mandatory secure envelope illustrated in
Figure 31. Any value in the packet following a security fingerprint Figure 31. Any value in the packet following a security fingerprint
MUST be used only after the according fingerprint has been validated. MUST be used only after the appropriate fingerprint has been
validated.
Local configuration MAY allow to skip the checking of the envelope's Local configuration MAY allow for the envelope's integrity checks to
integrity. be skipped.
0 1 2 3 0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
UDP Header: UDP Header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Port | RIFT destination port | | Source Port | RIFT destination port |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| UDP Length | UDP Checksum | | UDP Length | UDP Checksum |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
skipping to change at page 111, line 10 skipping to change at page 112, line 10
| | | |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+ +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Figure 31: Security Envelope Figure 31: Security Envelope
RIFT MAGIC: 16 bits. Constant value of 0xA1F7 that allows to RIFT MAGIC: 16 bits. Constant value of 0xA1F7 that allows to
classify RIFT packets independent of used UDP port. classify RIFT packets independent of used UDP port.
Packet Number: 16 bits. An optional, per packet type monotonically Packet Number: 16 bits. An optional, per packet type monotonically
growing number rolling over using sequence number arithmetic growing number rolling over using sequence number arithmetic
defined inAppendix A. A node SHOULD correctly set the number on defined in Appendix A. A node SHOULD correctly set the number on
subsequent packets or otherwise MUST set the value to subsequent packets or otherwise MUST set the value to
`undefined_packet_number` as provided in the schema. This number `undefined_packet_number` as provided in the schema. This number
can be used to detect losses and misordering in flooding for can be used to detect losses and misordering in flooding for
either operational purposes or in implementation to adjust either operational purposes or in implementation to adjust
flooding behavior to current link or buffer quality. This number flooding behavior to current link or buffer quality. This number
MUST NOT be used to discard or validate the correctness of MUST NOT be used to discard or validate the correctness of
packets. packets.
RIFT Major Version: 8 bits. It allows to check whether protocol RIFT Major Version: 8 bits. It allows to check whether protocol
versions are compatible, i.e. the serialized object can be decoded versions are compatible, i.e. if the serialized object can be
at all. An implementation MUST drop packets with unexpected value decoded at all. An implementation MUST drop packets with
and MAY report a problem. Must be same as in encoded model unexpected values and MAY report a problem.
object, otherwise packet is dropped.
Outer Key ID: 8 bits to allow key rollovers. This implies key type Outer Key ID: 8 bits to allow key rollovers. This implies key type
and used algorithm. Value 0 means that no valid fingerprint was and algorithm. Value 0 means that no valid fingerprint was
computed. This key ID scope is local to the nodes on both ends of computed. This key ID scope is local to the nodes on both ends of
the adjacency. the adjacency.
TIE Origin Key ID: 24 bits. This implies key type and used TIE Origin Key ID: 24 bits. This implies key type and used
algorithm. Value 0 means that no valid fingerprint was computed. algorithm. Value 0 means that no valid fingerprint was computed.
This key ID scope is global to the RIFT instance since it implies This key ID scope is global to the RIFT instance since it implies
the originator of the TIE so the contained object does not have to the originator of the TIE so the contained object does not have to
be de-serialized to obtain it. be de-serialized to obtain it.
Length of Fingerprint: 8 bits. Length in 32-bit multiples of the Length of Fingerprint: 8 bits. Length in 32-bit multiples of the
following fingerprint not including lifetime or weak nonces. It following fingerprint (not including lifetime or weak nonces). It
allows to navigate the structure when an unknown key type is allows the structure to be navigated when an unknown key type is
present. To clarify a common corner case when this value is set present. To clarify, a common corner case when this value is set
to 0 it signifies an empty (0 bytes long) security fingerprint. to 0 is when it signifies an empty (0 bytes long) security
fingerprint.
Security Fingerprint: 32 bits * Length of Fingerprint. This is a Security Fingerprint: 32 bits * Length of Fingerprint. This is a
signature that is computed over all data following after it. If signature that is computed over all data following after it. If
the significant bits of fingerprint are fewer than the 32 bits the significant bits of fingerprint are fewer than the 32 bits
padded length then the significant bits MUST be left aligned and padded length then the significant bits MUST be left aligned and
remaining bits on the right padded with 0s. When using PKI the remaining bits on the right padded with 0s. When using PKI the
Security fingerprint originating node uses its private key to Security fingerprint originating node uses its private key to
create the signature. The original packet can then be verified create the signature. The original packet can then be verified
provided the public key is shared and current. provided the public key is shared and current.
skipping to change at page 112, line 32 skipping to change at page 113, line 32
Observe that due to the schema migration rules per Appendix B the Observe that due to the schema migration rules per Appendix B the
contained model can be always decoded if the major version matches contained model can be always decoded if the major version matches
and the envelope integrity has been validated. Consequently, and the envelope integrity has been validated. Consequently,
description of the TIE is available to flood it properly including description of the TIE is available to flood it properly including
unknown TIE types. unknown TIE types.
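As a minimal sketch of the fingerprint padding rule described for the Length of Fingerprint and Security Fingerprint fields, the following Python pads a fingerprint to a 32-bit multiple. It assumes the fingerprint is available at byte granularity; the function name is hypothetical.

```python
# Sketch: left-align a fingerprint and pad it on the right with zeros so that
# its size is a multiple of 32 bits. Assumes byte-granular fingerprints.

def pad_fingerprint(fingerprint: bytes):
    """Return (length in 32-bit words, padded fingerprint bytes)."""
    words = (len(fingerprint) + 3) // 4
    if words > 0xFF:
        # The length field is 8 bits wide, so at most 255 words fit.
        raise ValueError("fingerprint does not fit an 8-bit length field")
    return words, fingerprint.ljust(words * 4, b"\x00")
```

An empty fingerprint maps to a length of 0 with no fingerprint bytes, which corresponds to the corner case noted for the length field.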
4.4.4. Weak Nonces 4.4.4. Weak Nonces
The protocol uses two 16 bit nonces to salt generated signatures. We The protocol uses two 16 bit nonces to salt generated signatures. We
use the term "nonce" a bit loosely since RIFT nonces are not being use the term "nonce" a bit loosely since RIFT nonces are not being
changed on every packet as common in cryptography. For efficiency changed in every packet as common in cryptography. For efficiency
purposes they are changed at a frequency high enough to dwarf replay purposes they are changed at a high enough frequency to dwarf
attacks attempts for all practical purposes. Therefore, we call them practical replay attack attempts. Therefore, we call them "weak"
"weak" nonces. nonces.
Any implementation including RIFT security MUST generate and wrap Any implementation including RIFT security MUST generate and wrap
around local nonces properly. When a nonce increment leads to around local nonces properly. When a nonce increment leads to
`undefined_nonce` value the value SHOULD be incremented again `undefined_nonce` value, the value MUST be incremented again
immediately. All implementations MUST reflect the neighbor's nonces. immediately. All implementations MUST reflect the neighbor's nonces.
An implementation SHOULD increment a chosen nonce on every LIE FSM An implementation SHOULD increment a chosen nonce on every LIE FSM
transition that ends up in a different state from the previous and transition that ends up in a different state from the previous and
MUST increment its nonce at least every 5 minutes (such MUST increment its nonce at least every 5 minutes (such
considerations allow for efficient implementations without opening a considerations allow for efficient implementations without opening a
significant security risk). When flooding TIEs, the implementation significant security risk). When flooding TIEs, the implementation
MUST use recent (i.e. within allowed difference) nonces reflected in MUST use recent (i.e. within allowed difference) nonces reflected in
the LIE exchange. The schema specifies maximum allowable nonce value the LIE exchange. The schema specifies the maximum allowable nonce
difference on a packet compared to reflected nonces in the LIEs. Any value difference on a packet compared to reflected nonces in the
packet received with nonces deviating more than the allowed delta LIEs. Any packet received with nonces deviating more than the
MUST be discarded without further computation of signatures to allowed delta MUST be discarded without further computation of
prevent computation load attacks. signatures to prevent computation load attacks.
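The nonce handling above can be sketched as follows. The concrete values used for `undefined_nonce` (0) and for the maximum allowed delta (5) are assumptions for this sketch, since the text defers both to the schema, and the simple wrap-around distance used here is likewise an assumption rather than the normative comparison.

```python
# Sketch of weak nonce increment and delta checking for 16-bit nonces.
# UNDEFINED_NONCE and MAX_NONCE_DELTA are assumed values for illustration;
# the real ones are provided by the schema.
UNDEFINED_NONCE = 0
MAX_NONCE_DELTA = 5
NONCE_MODULUS = 1 << 16


def next_nonce(nonce: int) -> int:
    """Increment with wrap-around, skipping the undefined value."""
    nonce = (nonce + 1) % NONCE_MODULUS
    if nonce == UNDEFINED_NONCE:
        nonce = (nonce + 1) % NONCE_MODULUS
    return nonce


def nonce_acceptable(received: int, reflected: int,
                     max_delta: int = MAX_NONCE_DELTA) -> bool:
    """True if the packet nonce is within max_delta of the reflected nonce."""
    diff = (received - reflected) % NONCE_MODULUS
    return min(diff, NONCE_MODULUS - diff) <= max_delta
```

A receiver would run a check like nonce_acceptable before any signature computation, so that packets outside the window are dropped cheaply. Handling of a received `undefined_nonce` is left out of the sketch because it is a matter of local policy.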
In case where a secure implementation does not receive signatures or In cases where a secure implementation does not receive signatures or
receives undefined nonces from neighbor indicating that it does not receives undefined nonces from a neighbor (indicating that it does
support or verify signatures, it is a matter of local policy how such not support or verify signatures), it is a matter of local policy as
packets are treated. Any secure implementation MAY choose to either to how those packets are treated. A secure implementation MAY refuse
refuse forming an adjacency with an implementation not advertising forming an adjacency with an implementation that is not advertising
signatures or valid nonces or simply keep on signing local packets signatures or valid nonces, or it MAY continue signing local packets
while accepting neighbor's packets without further security while accepting a neighbor's packets without further security
verification. validation.
As a necessary exception, an implementation MUST advertise As a necessary exception, an implementation MUST advertise the remote
`undefined_nonce` for remote nonce value when the FSM is not in two- nonce value as `undefined_nonce` when the FSM is not in two-way or
way or three-way state and accept an `undefined_nonce` for its local three-way state and accept an `undefined_nonce` for its local nonce
nonce value on packets in any other state than three-way. value on packets in any other state than three-way.
As optional optimization, an implementation MAY send one LIE with As optional optimization, an implementation MAY send one LIE with
previously negotiated neighbor's nonce to try to speed up a previously negotiated neighbor's nonce to try to speed up a
neighbor's transition from three-way to one-way and MUST revert to neighbor's transition from three-way to one-way and MUST revert to
sending `undefined_nonce` after that. sending `undefined_nonce` after that.
4.4.5. Lifetime 4.4.5. Lifetime
Protecting lifetime on flooding may lead to excessive number of Protecting flooding lifetime may lead to an excessive number of
security fingerprint computation and hence an application generating security fingerprint computations and to avoid this the application
such fingerprints on TIEs MAY round the value down to the next generating the fingerprints for advertised TIEs, MAY round the value
`rounddown_lifetime_interval` defined in the schema when sending TIEs down to the next `rounddown_lifetime_interval`. Such an optimization
albeit such optimization in presence of security hashes over in the presence of security hashes over advancing weak nonces, may
advancing weak nonces may not be feasible. not be feasible.
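A minimal sketch of the round-down follows. The 60-second value for `rounddown_lifetime_interval` is an assumption for illustration; the actual value is defined in the schema.

```python
# Sketch: round a remaining TIE lifetime down to a multiple of the interval so
# that the protected value, and hence the fingerprint, changes less often.
ROUNDDOWN_LIFETIME_INTERVAL = 60  # seconds; assumed value, see the schema


def rounded_lifetime(lifetime: int,
                     interval: int = ROUNDDOWN_LIFETIME_INTERVAL) -> int:
    """Return the largest multiple of interval that is <= lifetime."""
    return lifetime - (lifetime % interval)
```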
4.4.6. Key Management 4.4.6. Key Management
As outlined in the Security Model a private shared key or a public/ As outlined in Section 7, either a private shared key or a
private key pair is used to Authenticate the adjacency. The actual public/private key pair is used to authenticate the adjacency. Both
method of key distribution and key synchronization is assumed to be the key distribution and key synchronization methods are out of scope
out of band from RIFT's perspective. Both nodes in the adjacency for this document. Both nodes in the adjacency MUST share the same
must share the same keys and configuration of key type and algorithm keys, key type, and algorithm for a given key ID. Mismatched keys
for a key ID. Mismatched keys will obviously not inter-operate due will not inter-operate as their security envelopes will be
to unverifiable security envelope. unverifiable.
Key roll-over while the adjacency is active is allowed and the Key roll-over while the adjacency is active MAY be supported. The
technique is well known and described in e.g. [RFC6518]. Key specific mechanism is well documented in [RFC6518].
distribution procedures are out of scope for RIFT.
4.4.7. Security Association Changes 4.4.7. Security Association Changes
There is no mechanism to convert a security envelope for the same key There is no mechanism to convert a security envelope for the same key
ID from one algorithm to another once the envelope is operational. ID from one algorithm to another once the envelope is operational.
The recommended procedure to change to a new algorithm is to take the The recommended procedure to change to a new algorithm is to take the
adjacency down and make the changes and then bring the adjacency up. adjacency down, make the necessary changes, and bring the adjacency
back up. Obviously, an implementation MAY choose to stop verifying
Obviously, an implementation MAY choose to stop verifying security security envelope for the duration of algorithm change to keep the
envelope for the duration of key change to keep the adjacency up but adjacency up but since this introduces a security vulnerability
since this introduces a security vulnerability window, such roll-over window, such roll-over SHOULD NOT be recommended.
is not recommended.
5. Examples 5. Examples
5.1. Normal Operation 5.1. Normal Operation
This section describes RIFT deployment in the example topology ^ N +--------+ +--------+
without any node or link failures. We disregard flooding reduction Level 2 | |ToF 21| |ToF 22|
for simplicity's sake. E <-*-> W ++-+--+-++ ++-+--+-++
| | | | | | | | |
S v P111/2 |P121/2 | | | |
^ ^ ^ ^ | | | |
| | | | | | | |
+--------------+ | +-----------+ | | | +---------------+
| | | | | | | |
South +-----------------------------+ | | ^
| | | | | | | All TIEs
0/0 0/0 0/0 +-----------------------------+ |
v v v | | | | |
| | +-+ +<-0/0----------+ | |
| | | | | | | |
+-+----++ +-+----++ ++----+-+ ++-----++
Level 1 | | | | | | | |
|Spin111| |Spin112| |Spin121| |Spin122|
+-+---+-+ ++----+-+ +-+---+-+ ++---+--+
| | | South | | | |
| +---0/0--->-----+ 0/0 | +----------------+ |
0/0 | | | | | | |
| +---<-0/0-----+ | v | +--------------+ | |
v | | | | | | |
+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+
Level 0 | | | | | | | |
|Leaf111| |Leaf112| |Leaf121| |Leaf122|
+-+-----+ +-+---+-+ +--+--+-+ +-+-----+
+ + \ / + +
Prefix111 Prefix112 \ / Prefix121 Prefix122
multi-homed
Prefix
+---------- PoD 1 ---------+ +---------- PoD 2 ---------+
As first step, the following bi-directional adjacencies will be Figure 32: Normal Case Topology
created (and any other links that do not fulfill LIE rules in
Section 4.2.2 disregarded): This section describes RIFT deployment in the example topology given in
Figure 32 without any node or link failures. We disregard flooding
reduction for simplicity's sake and compress the node names in some
cases to fit them into the picture better.
First, the following bi-directional adjacencies will be established:
1. ToF 21 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122 1. ToF 21 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122
2. ToF 22 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122 2. ToF 22 (PoD 0) to Spine 111, Spine 112, Spine 121, and Spine 122
3. Spine 111 to Leaf 111, Leaf 112 3. Spine 111 to Leaf 111, Leaf 112
4. Spine 112 to Leaf 111, Leaf 112 4. Spine 112 to Leaf 111, Leaf 112
5. Spine 121 to Leaf 121, Leaf 122 5. Spine 121 to Leaf 121, Leaf 122
6. Spine 122 to Leaf 121, Leaf 122 6. Spine 122 to Leaf 121, Leaf 122
Consequently, North TIEs would be originated by Spine 111 and Spine Leaf 111 and Leaf 112 originate N-TIEs for Prefix 111 and Prefix 112
112 and each set would be sent to both ToF 21 and ToF 22. North TIEs (respectively) to both Spine 111 and Spine 112 (Leaf 112 also
also would be originated by Leaf 111 (w/ Prefix 111) and Leaf 112 (w/ originates an N-TIE for the multi-homed prefix). Spine 111 and Spine
Prefix 112 and the multi-homed prefix) and each set would be sent to 112 will then originate their own N-TIEs, as well as flood the N-TIEs
Spine 111 and Spine 112. Spine 111 and Spine 112 would then flood received from Leaf 111 and Leaf 112 to both ToF 21 and ToF 22.
these North TIEs to ToF 21 and ToF 22.
Similarly, North TIEs would be originated by Spine 121 and Spine 122 Similarly, Leaf 121 and Leaf 122 originate North TIEs for Prefix 121
and each set would be sent to both ToF 21 and ToF 22. North TIEs and Prefix 122 (respectively) to Spine 121 and Spine 122 (Leaf 121
also would be originated by Leaf 121 (w/ Prefix 121 and the multi- also originates a North TIE for the multi-homed prefix). Spine 121
homed prefix) and Leaf 122 (w/ Prefix 122) and each set would be sent and Spine 122 will then originate their own North TIEs, as well as
to Spine 121 and Spine 122. Spine 121 and Spine 122 would then flood flood the North TIEs received from Leaf 121 and Leaf 122 to both ToF
these North TIEs to ToF 21 and ToF 22. 21 and ToF 22.
At this point both ToF 21 and ToF 22, as well as any controller to Spines hold only North TIEs of level 0 for their PoD, and leaves
which they are connected, would have the complete network topology. hold only their own North TIEs. At this point, both ToF 21 and
At the same time, Spine 111/112/121/122 hold only the N-ties of level ToF 22 (as well as any northbound connected controllers) would have
0 of their respective PoD. leaves hold only their own North TIEs. the complete network topology.
South TIEs with adjacencies and a default IP prefix would then be ToF 21 and ToF 22 would then originate and flood South TIEs
originated by ToF 21 and ToF 22 and each would be flooded to Spine containing any established adjacencies and a default IP route to all
111, Spine 112, Spine 121, and Spine 122. Spine 111, Spine 112, spines. Spine 111, Spine 112, Spine 121, and Spine 122 will reflect
Spine 121, and Spine 122 would each send the South TIE from ToF 21 to all Node South TIEs received from ToF 21 to ToF 22, and all Node
ToF 22 and the South TIE from ToF 22 to ToF 21. (South TIEs are South TIEs from ToF 22 to ToF 21. South TIEs will not be re-
reflected up to level from which they are received but they are NOT propagated southbound.
propagated southbound.)
A South TIE with a default IP prefix would be originated by Node 111 South TIEs containing a default IP route are then originated by both
and Spine 112 and each would be sent to Leaf 111 and Leaf 112. Spine 111 and Spine 112 toward Leaf 111 and Leaf 112. Similarly,
South TIEs containing a default IP route are originated by Spine 121
and Spine 122 toward Leaf 121 and Leaf 122.
Similarly, an South TIE with a default IP prefix would be originated At this point IP connectivity across maximum number of viable paths
by Node 121 and Spine 122 and each would be sent to Leaf 121 and Leaf has been established for all leaves, with routing information
122. At this point IP connectivity with maximum possible ECMP has constrained to only the minimum amount that allows for normal
been established between the leaves while constraining the amount of operation and redundancy.
information held by each node to the minimum necessary for normal
operation and dealing with failures.
5.2. Leaf Link Failure 5.2. Leaf Link Failure
. | | | | . | | | |
.+-+---+-+ +-+---+-+ .+-+---+-+ +-+---+-+
.| | | | .| | | |
.|Spin111| |Spin112| .|Spin111| |Spin112|
.+-+---+-+ ++----+-+ .+-+---+-+ ++----+-+
. | | | | . | | | |
. | +---------------+ X . | +---------------+ X
. | | | X Failure . | | | X Failure
. | +-------------+ | X . | +-------------+ | X
. | | | | . | | | |
.+-+---+-+ +--+--+-+ .+-+---+-+ +--+--+-+
.| | | | .| | | |
.|Leaf111| |Leaf112| .|Leaf111| |Leaf112|
.+-------+ +-------+ .+-------+ +-------+
. + + . + +
. Prefix111 Prefix112 . Prefix111 Prefix112
Figure 32: Single Leaf link failure Figure 33: Single Leaf Link Failure
In case of a failing leaf link between spine 112 and leaf 112 the In the event of a link failure between Spine 112 and Leaf 112, both
link-state information will cause re-computation of the necessary SPF nodes will originate new Node TIEs that contain their connected
and the higher levels will stop forwarding towards prefix 112 through adjacencies, except for the one that just failed. Leaf 112 will send
spine 112. Only spines 111 and 112, as well as both spines will see a Node North TIE to Spine 111. Spine 112 will send a Node North TIE
control traffic. Leaf 111 will receive a new South TIE from spine to ToF 21 and ToF 22 as well as a new Node South TIE to Leaf 111 that
112 and reflect back to spine 111. will be reflected to Spine 111. Necessary SPF recomputation will
occur, resulting in Spine 112 no longer being in the forwarding path
for Prefix 112.
Spine 111 will de-aggregate prefix 111 and prefix 112 but we will not Spine 111 will also disaggregate Prefix 112 by sending a new Prefix
describe it further here since de-aggregation is emphasized in the South TIE to Leaf 111 and Leaf 112. Though we cover disaggregation
next example. It is worth observing however in this example that if in more detail in the following section, it is worth mentioning in
leaf 111 would keep on forwarding traffic towards prefix 112 using this example as it further illustrates RIFT's blackhole mitigation
the advertised south-bound default of spine 112 the traffic would end mechanism. Consider that Leaf 111 has yet to receive the more
up on Top-of-Fabric 21 and ToF 22 and cross back into pod 1 using specific (disaggregated) route from Spine 111. In such a scenario,
spine 111. This is arguably not as bad as black-holing present in traffic from Leaf 111 toward Prefix 112 may still use Spine 112's
the next example but clearly undesirable. Fortunately, de- default route, causing it to traverse ToF 21 and ToF 22 back down via
aggregation prevents this type of behavior except for a transitory Spine 111. While this behavior is suboptimal, it is transient in
period of time. nature and preferred to black-holing traffic.
5.3. Partitioned Fabric 5.3. Partitioned Fabric
. +--------+ +--------+ South TIE of ToF 21 +--------+ +--------+
. | | | | received by Level 2 |ToF 21| |ToF 22|
. |ToF 21| |ToF 22| south reflection of ++-+--+-++ ++-+--+-++
. ++-+--+-++ ++-+--+-++ spines 112 and 111 | | | | | | | |
. | | | | | | | | | | | | | | | 0/0
. | | | | | | | 0/0 | | | | | | | |
. | | | | | | | | | | | | | | | |
. | | | | | | | | +--------------+ | +--- XXXXXX + | | | +---------------+
. +--------------+ | +--- XXXXXX + | | | +---------------+ | | | | | | | |
. | | | | | | | | | +-----------------------------+ | | |
. | +-----------------------------+ | | | 0/0 | | | | | | |
. 0/0 | | | | | | | | 0/0 0/0 +- XXXXXXXXXXXXXXXXXXXXXXXXX -+ |
. | 0/0 0/0 +- XXXXXXXXXXXXXXXXXXXXXXXXX -+ | | 1.1/16 | | | | | |
. | 1.1/16 | | | | | | | | +-+ +-0/0-----------+ | |
. | | +-+ +-0/0-----------+ | | | | | 1.1./16 | | | |
. | | | 1.1./16 | | | | +-+----++ +-+-----+ ++-----0/0 ++----0/0
.+-+----++ +-+-----+ ++-----0/0 ++----0/0 Level 1 | | | | | 1.1/16 | 1.1/16
.| | | | | 1.1/16 | 1.1/16 |Spin111| |Spin112| |Spin121| |Spin122|
.|Spin111| |Spin112| |Spin121| |Spin122| +-+---+-+ ++----+-+ +-+---+-+ ++---+--+
.+-+---+-+ ++----+-+ +-+---+-+ ++---+--+ | | | | | | | |
. | | | | | | | | | +---------------+ | | +----------------+ |
. | +---------------+ | | +----------------+ | | | | | | | | |
. | | | | | | | | | +-------------+ | | | +--------------+ | |
. | +-------------+ | | | +--------------+ | | | | | | | | | |
. | | | | | | | | +-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+
.+-+---+-+ +--+--+-+ +-+---+-+ +---+-+-+ Level 0 | | | | | | | |
.| | | | | | | | |Leaf111| |Leaf112| |Leaf121| |Leaf122|
.|Leaf111| |Leaf112| |Leaf121| |Leaf122| +-+-----+ ++------+ +-----+-+ +-+-----+
.+-+-----+ ++------+ +-----+-+ +-+-----+ + + + +
. + + + + Prefix111 Prefix112 Prefix121 Prefix122
. Prefix111 Prefix112 Prefix121 Prefix122 1.1/16
. 1.1/16
Figure 33: Fabric partition Figure 34: Fabric Partition
Figure 33 shows the arguably a more catastrophic but also a more Figure 34 shows one of the more catastrophic scenarios where ToF 21 is
interesting case. ToF 21 is completely severed from access to Prefix completely severed from access to Prefix 121 due to a double link
121 (we use in the figure 1.1/16 as example) by double link failure. failure. If only default routes existed, this would result in 50% of
However unlikely, if left unresolved, forwarding from leaf 111 and traffic from Leaf 111 and Leaf 112 toward Prefix 121 being black-
leaf 112 to prefix 121 would suffer 50% black-holing based on pure holed.
default route advertisements by ToF 21 and ToF 22.
The mechanism used to resolve this scenario is hinging on the The mechanism to resolve this scenario hinges on ToF 21's South TIEs
distribution of southbound representation by Top-of-Fabric 21 that is being reflected from Spine 111 and Spine 112 to ToF 22. Once ToF 22
reflected by spine 111 and spine 112 to ToF 22. ToF 22, having sees that Prefix 121 cannot be reached from ToF 21, it will begin to
computed reachability to all prefixes in the network, advertises with disaggregate Prefix 121 by advertising a more specific route (1.1/16)
the default route the ones that are reachable only via lower level along with the default IP prefix route to all spines (ToF 21 still
neighbors that ToF 21 does not show an adjacency to. That results in only sends a default route). The result is Spine 111 and Spine 112
spine 111 and spine 112 obtaining a longest-prefix match to prefix using the more specific route to Prefix 121 via ToF 22. All other
121 which leads through ToF 22 and prevents black-holing through ToF prefixes continue to use the default IP prefix route toward both ToF
21 still advertising the 0/0 aggregate only. 21 and ToF 22.
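The longest-prefix-match behavior described above can be sketched as follows. This is only an illustration of how a spine such as Spine 111 would pick next hops once ToF 22 has disaggregated; the route table layout, the `next_hops` helper and the destination addresses are hypothetical, and 1.1/16 is written out as 1.1.0.0/16.

```python
import ipaddress

# Routes a spine holds after ToF 22 disaggregates (illustrative table).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): ["ToF 21", "ToF 22"],
    ipaddress.ip_network("1.1.0.0/16"): ["ToF 22"],
}


def next_hops(destination: str) -> list:
    """Return the next hops of the longest prefix covering destination."""
    address = ipaddress.ip_address(destination)
    # The default route always matches, so the candidate list is never empty.
    matching = [net for net in ROUTES if address in net]
    best = max(matching, key=lambda net: net.prefixlen)
    return ROUTES[best]


to_prefix_121 = next_hops("1.1.0.10")   # hypothetical host inside 1.1/16
to_elsewhere = next_hops("10.0.0.1")    # hypothetical host covered only by 0/0
```

Traffic toward the disaggregated prefix is pinned to ToF 22, while every other destination keeps using both ToF nodes through the default route.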
The prefix 121 advertised by Top-of-Fabric 22 does not have to be The more specific route for Prefix 121 being advertised by ToF 22
propagated further towards leaves since they do no benefit from this does not need to be propagated further south to the leaves, as they
information. Hence the amount of flooding is restricted to ToF 21 do not benefit from this information. Spine 111 and Spine 112 are
reissuing its South TIEs and south reflection of those by spine 111 only required to reflect the new South Node TIEs received from ToF 22
and spine 112. The resulting SPF in ToF 22 issues a new prefix South to ToF 21. In short, only the relevant nodes received the relevant
TIEs containing 1.1/16. None of the leaves become aware of the updates, thereby restricting the failure to only the partitioned
changes and the failure is constrained strictly to the level that level rather than burdening the whole fabric with the flooding and
became partitioned. recomputation of the new topology information.
To finish with an example of the resulting sets computed using To finish our example, the following table shows sets computed by ToF
notation introduced in Section 4.2.5, Top-of-Fabric 22 constructs the 22 using notation introduced in Section 4.2.5:
following sets:
|R = Prefix 111, Prefix 112, Prefix 121, Prefix 122 |R = Prefix 111, Prefix 112, Prefix 121, Prefix 122
|H (for r=Prefix 111) = Spine 111, Spine 112 |H (for r=Prefix 111) = Spine 111, Spine 112
|H (for r=Prefix 112) = Spine 111, Spine 112 |H (for r=Prefix 112) = Spine 111, Spine 112
|H (for r=Prefix 121) = Spine 121, Spine 122 |H (for r=Prefix 121) = Spine 121, Spine 122
|H (for r=Prefix 122) = Spine 121, Spine 122 |H (for r=Prefix 122) = Spine 121, Spine 122
|A (for ToF 21) = Spine 111, Spine 112 |A (for ToF 21) = Spine 111, Spine 112
With that and |H (for r=prefix 121) and |H (for r=prefix 122) being With that and |H (for r=Prefix 121) and |H (for r=Prefix 122) being
disjoint from |A (for Top-of-Fabric 21), ToF 22 will originate a disjoint from |A (for ToF 21), ToF 22 will originate a South TIE
South TIE with prefix 121 and prefix 122, that is flooded to spines with Prefix 121 and Prefix 122, which will be flooded to all spines.
112, 112, 121 and 122.
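The disjointness test on these sets can be sketched as below. The set contents are taken from the listing above; the dictionary layout and the `prefixes_to_disaggregate` helper are illustrative assumptions rather than the procedure defined in Section 4.2.5.

```python
# Sets as ToF 22 computes them in the partitioned fabric example.
H = {  # next hops used by ToF 22 for each reachable prefix r
    "Prefix 111": {"Spine 111", "Spine 112"},
    "Prefix 112": {"Spine 111", "Spine 112"},
    "Prefix 121": {"Spine 121", "Spine 122"},
    "Prefix 122": {"Spine 121", "Spine 122"},
}
A = {  # southbound adjacencies of each other ToF node, learned via reflection
    "ToF 21": {"Spine 111", "Spine 112"},
}


def prefixes_to_disaggregate(next_hops: dict, adjacencies: dict) -> set:
    """Return prefixes whose next hops are disjoint from some peer's adjacencies."""
    result = set()
    for prefix, hops in next_hops.items():
        if any(hops.isdisjoint(adjacent) for adjacent in adjacencies.values()):
            result.add(prefix)
    return result


disaggregated = sorted(prefixes_to_disaggregate(H, A))
```

Only Prefix 121 and Prefix 122 are selected, since ToF 21 has no adjacency to any spine that ToF 22 uses to reach them.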
5.4. Northbound Partitioned Router and Optional East-West Links 5.4. Northbound Partitioned Router and Optional East-West Links
. + + + . + + +
. X N1 | N2 | N3 . X N1 | N2 | N3
. X | | . X | |
.+--+----+ +--+----+ +--+-----+ .+--+----+ +--+----+ +--+-----+
.| |0/0> <0/0| |0/0> <0/0| | .| |0/0> <0/0| |0/0> <0/0| |
.| A01 +----------+ A02 +----------+ A03 | Level 1 .| A01 +----------+ A02 +----------+ A03 | Level 1
.++-+-+--+ ++--+--++ +---+-+-++ .++-+-+--+ ++--+--++ +---+-+-++
. | | | | | | | | | . | | | | | | | | |
. | | +----------------------------------+ | | | . | | +----------------------------------+ | | |
. | | | | | | | | | . | | | | | | | | |
skipping to change at page 118, line 28 skipping to change at page 120, line 25
. | | | | | | | | | . | | | | | | | | |
. | +----------------+ | +-----------------+ | . | +----------------+ | +-----------------+ |
. | | | | | | | | | . | | | | | | | | |
. | | +------------------------------------+ | | . | | +------------------------------------+ | |
. | | | | | | | | | . | | | | | | | | |
.++-+-+--+ | +---+---+ | +-+---+-++ .++-+-+--+ | +---+---+ | +-+---+-++
.| | +-+ +-+ | | .| | +-+ +-+ | |
.| L01 | | L02 | | L03 | Level 0 .| L01 | | L02 | | L03 | Level 0
.+-------+ +-------+ +--------+ .+-------+ +-------+ +--------+
Figure 34: North Partitioned Router Figure 35: North Partitioned Router
Figure 34 shows a part of a fabric where level 1 is horizontally Figure 35 shows a part of a fabric where level 1 is horizontally
connected and A01 lost its only northbound adjacency. Based on N-SPF connected and A01 lost its only northbound adjacency. Based on N-SPF
rules in Section 4.2.4.1 A01 will compute northbound reachability by rules in Section 4.2.4.1 A01 will compute northbound reachability by
using the link A01 to A02 (whereas A02 will NOT use this link during using the link A01 to A02. A02, however, will NOT use this link
N-SPF). Hence A01 will still advertise the default towards level 0 during N-SPF. The result is A01 utilizing the horizontal link for
and route unidirectionally using the horizontal link. default route advertisement and unidirectional routing.
As further consideration, the moment A02 looses link N2 the situation Furthermore, if A02 also loses its only northbound adjacency (N2),
evolves again. A01 will have no more northbound reachability while the situation evolves. A01 will no longer have northbound
still seeing A03 advertising northbound adjacencies in its south node reachability while it sees A03's northbound adjacencies in South Node
tie. With that it will stop advertising a default route due to TIEs reflected by nodes south of it. As a result, A01 will no longer
Section 4.2.3.8. advertise its default route in accordance with Section 4.2.3.8.
6. Implementation and Operation: Further Details 6. Implementation and Operation: Further Details
6.1. Considerations for Leaf-Only Implementation 6.1. Considerations for Leaf-Only Implementation
RIFT can and is intended to be stretched to the lowest level in the RIFT can and is intended to be stretched to the lowest level in the
IP fabric to integrate ToRs or even servers. Since those entities IP fabric to integrate ToRs or even servers. Since those entities
would run as leaves only, it is worth observing that a leaf-only would run as leaves only, it is worth observing that a leaf-only
version is significantly simpler to implement and requires far fewer version is significantly simpler to implement and requires far fewer
resources: resources:
1. Under normal conditions, the leaf needs to support a multipath 1. Leaf nodes only need to maintain a multipath default route under
default route only. In most catastrophic partitioning case it normal circumstances. However, in cases of catastrophic
has to be capable of accommodating all the leaf routes in its own partitioning, leaf nodes SHOULD be capable of accommodating all
PoD to prevent black-holing. the leaf routes in their own PoD to prevent black-holing.
2. Leaf nodes hold only their own North TIEs and South TIEs of Level 2. Leaf nodes hold only their own North TIEs and South TIEs of Level
1 nodes they are connected to; so overall few in numbers. 1 nodes they are connected to.
3. Leaf node does not have to support any type of de-aggregation 3. Leaf nodes do not have to support any type of de-aggregation
computation or propagation. computation or propagation.
4. Leaf nodes do not have to support overload bit normally. 4. Leaf nodes are not required to support overload bit.
5. Unless optional leaf-2-leaf procedures are desired default route 5. Leaf nodes do not need to originate S-TIEs unless optional leaf-
origination and South TIE origination is unnecessary. 2-leaf features are desired.
6.2. Considerations for Spine Implementation 6.2. Considerations for Spine Implementation
In case of spines, i.e. nodes that will never act as Top of Fabric a Spine nodes will never act as Top of Fabric, and are therefore not
full implementation is not required, specifically the node does not required to run a full RIFT implementation. Specifically, spines do
need to perform any computation of negative disaggregation except not need to perform negative disaggregation computation other than
respecting northbound disaggregation advertised from the north. respecting northbound disaggregation advertised from the north.
6.3. Adaptations to Other Proposed Data Center Topologies 6.3. Adaptations to Other Proposed Data Center Topologies
. +-----+ +-----+ . +-----+ +-----+
. | | | | . | | | |
.+-+ S0 | | S1 | .+-+ S0 | | S1 |
.| ++---++ ++---++ .| ++---++ ++---++
.| | | | | .| | | | |
.| | +------------+ | .| | +------------+ |
skipping to change at page 119, line 51 skipping to change at page 121, line 51
.| +-+--++ ++---++ .| +-+--++ ++---++
.| | | | | .| | | | |
.| | +------------+ | .| | +------------+ |
.| | +-----------+ | | .| | +-----------+ | |
.| | | | | .| | | | |
.| +-+-+-+ +--+-++ .| +-+-+-+ +--+-++
.+-+ | | | .+-+ | | |
. | L0 | | L1 | . | L0 | | L1 |
. +-----+ +-----+ . +-----+ +-----+
Figure 35: Level Shortcut Figure 36: Level Shortcut
Strictly speaking, RIFT is not limited to Clos variations only. The
protocol preconditions only a sense of 'compass rose direction'
achieved by configuration (or derivation) of levels and other
topologies are possible within this framework. So, conceptually, one
could include leaf to leaf links and even shortcut between levels
As an example, short cutting levels illustrated in Figure 35 will RIFT is not strictly limited to Clos topologies. The protocol only
lead either to suboptimal routing when L0 sends traffic to L1 (since requires a sense of "compass rose directionality" either achieved
using S0's default route will lead to the traffic being sent back to through configuration or derivation of levels. So, conceptually,
A0 or A1) or the leaves need each other's routes installed to leaf-2-leaf links and even shortcuts between levels could be
understand that only A0 and A1 should be used to talk to each other. included. Figure 36 depicts an example of a shortcut between levels.
In this example, sub-optimal routing will occur when traffic is sent
from L0 to L1 via S0's default route and back down through A0 or A1.
In order to ensure that only default routes from A0 or A1 are used,
all leaves would be required to install each other's routes.
Whether such modifications of topology constraints make sense is While various technical and operational challenges may require the
dependent on many technology variables and the exhausting treatment use of such modifications, discussion of those topics is outside the
of the topic is definitely outside the scope of this document. scope of this document.
6.4. Originating Non-Default Route Southbound 6.4. Originating Non-Default Route Southbound
Obviously, an implementation MAY choose to originate southbound An implementation MAY choose to originate more specific prefixes (P')
instead of a strict default route (as described in Section 4.2.3.8) a southbound instead of only the default route (as described in
shorter prefix P' but in such a scenario all addresses carried within Section 4.2.3.8). In such a scenario, all addresses carried within
the RIFT domain must be contained within P'. the RIFT domain MUST be contained within P'.
7. Security Considerations 7. Security Considerations
7.1. General 7.1. General
One can consider attack vectors where a router may reboot many times One can consider attack vectors where a router may reboot many times
while changing its system ID and pollute the network with many stale while changing its system ID and pollute the network with many stale
TIEs or TIEs are sent with very long lifetimes and not cleaned up TIEs or TIEs are sent with very long lifetimes and not cleaned up
when the routes vanishes. Those attack vectors are not unique to when the routes vanish. Those attack vectors are not unique to RIFT.
RIFT. Given large memory footprints available today those attacks Given large memory footprints available today those attacks should be
should be relatively benign. Otherwise a node SHOULD implement a relatively benign. Otherwise a node SHOULD implement a strategy of
strategy of discarding contents of all TIEs that were not present in discarding contents of all TIEs that were not present in the SPF tree
the SPF tree over a certain, configurable period of time. Since the over a certain, configurable period of time. Since the protocol,
protocol, like all modern link-state protocols, is self-stabilizing like all modern link-state protocols, is self-stabilizing and will
and will advertise the presence of such TIEs to its neighbors, they advertise the presence of such TIEs to its neighbors, they can be re-
can be re-requested again if a computation finds that it sees an requested again if a computation finds that it sees an adjacency
adjacency formed towards the system ID of the discarded TIEs. formed towards the system ID of the discarded TIEs.
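One possible shape of such a discard strategy is sketched below. The bookkeeping (a map from system ID to the last time that system appeared in the SPF tree), the `stale_tie_origins` helper and the chosen time values are hypothetical; the text only requires that the period be configurable.

```python
def stale_tie_origins(last_seen_in_spf: dict, now: float, max_absence: float) -> set:
    """Return system IDs whose TIEs have been absent from the SPF tree too long."""
    return {
        system_id
        for system_id, seen in last_seen_in_spf.items()
        if now - seen > max_absence
    }


# Hypothetical bookkeeping: system ID -> last time (seconds) seen in the SPF tree.
last_seen = {"node-a": 1000.0, "node-b": 4000.0, "node-c": 100.0}
to_discard = stale_tie_origins(last_seen, now=4500.0, max_absence=3600.0)
```

TIE contents from the returned system IDs can be dropped; because neighbors keep advertising the presence of those TIEs, they can be requested again if an adjacency toward such a system ID later appears.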
7.2. ZTP 7.2. ZTP
Section 4.2.7 presents many attack vectors in untrusted environments, Section 4.2.7 presents many attack vectors in untrusted environments,
starting with nodes that oscillate their level offers to the starting with nodes that oscillate their level offers to the
possibility of a node offering a three-way adjacency with the highest possibility of nodes offering a three-way adjacency with the highest
possible level value with a very long holdtime trying to put itself possible level value and a very long holdtime trying to put itself
"on top of the lattice" and with that gaining access to the whole "on top of the lattice" thereby allowing it to gain access to the
southbound topology. Session authentication mechanisms are necessary whole southbound topology. Session authentication mechanisms are
in environments where this is possible and RIFT provides the necessary in environments where this is possible and RIFT provides
according security envelope to ensure this if desired. the security envelope to ensure this if so desired.
7.3. Lifetime 7.3. Lifetime
Traditional IGP protocols are vulnerable to lifetime modification and Traditional IGP protocols are vulnerable to lifetime modification and
replay attacks that can be somewhat mitigated by using techniques replay attacks that can be somewhat mitigated by using techniques
like [RFC7987]. RIFT removes this attack vector by protecting the like [RFC7987]. RIFT removes this attack vector by protecting the
lifetime behind a signature computed over it and additional nonce lifetime behind a signature computed over it and additional nonce
combination which makes even the replay attack window very small and combination which makes even the replay attack window very small and
for practical purposes irrelevant since lifetime cannot be for practical purposes irrelevant since lifetime cannot be
artificially shortened by the attacker. artificially shortened by the attacker.
skipping to change at page 122, line 10 skipping to change at page 124, line 8
A compromised node can attempt to generate "fake TIEs" using other A compromised node can attempt to generate "fake TIEs" using other
nodes' TIE origin key identifiers. Albeit the ultimate validation of nodes' TIE origin key identifiers. Albeit the ultimate validation of
the origin fingerprint will fail in such scenarios and not progress the origin fingerprint will fail in such scenarios and not progress
further than immediately peering nodes, the resulting denial of further than immediately peering nodes, the resulting denial of
service attack seems unavoidable since the TIE origin key id is only service attack seems unavoidable since the TIE origin key id is only
protected by the, here assumed to be compromised, node. protected by the, here assumed to be compromised, node.
7.7. Host Implementations 7.7. Host Implementations
It can be reasonably expected that with the proliferation of RotH It can be reasonably expected that with the proliferation of RotH
servers, rather than dedicated networking devices, will constitute servers, rather than dedicated networking devices, servers will
significant amount of RIFT devices. Given their normally far wider represent a significant amount of RIFT devices. Given their normally
software envelope and access granted to them, such servers are also far wider software envelope and access granted to them, such servers
far more likely to be compromised and present an attack vector on the are also far more likely to be compromised and present an attack
protocol. Hijacking of prefixes to attract traffic is a trust vector on the protocol. Hijacking of prefixes to attract traffic is
problem and cannot be addressed within the protocol if the trust a trust problem and cannot be addressed within the protocol if the
model is breached, i.e. the server presents valid credentials to form trust model is breached, i.e. the server presents valid credentials
an adjacency and issue TIEs. However, in a move devious way, the to form an adjacency and issue TIEs. However, in a more devious way,
servers can present DoS (or even DDoS) vectors of issuing too many the servers can present DoS (or even DDoS) vectors of issuing too
LIE packets, flood large amount of North TIEs and similar anomalies. many LIE packets, flood large amounts of North TIEs and attempt
A prudent implementation hosting leaves should implement thresholds similar resource overrun attacks. A prudent implementation forming
and raise warnings when leaf is advertising number of TIEs in excess adjacencies to leaves should implement corresponding threshold
of those. mechanisms and raise warnings when e.g. a leaf is advertising an
excess number of TIEs.
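A threshold check of this kind might look like the sketch below. The logger name, the `leaves_over_threshold` helper, the per-leaf counters and the limit of 100 are all assumptions for illustration; the text does not prescribe particular values or actions beyond raising warnings.

```python
import logging

logger = logging.getLogger("rift.leaf_guard")  # hypothetical logger name


def leaves_over_threshold(tie_counts: dict, max_ties: int) -> list:
    """Return leaves advertising more than max_ties TIEs, warning about each."""
    offenders = sorted(leaf for leaf, count in tie_counts.items() if count > max_ties)
    for leaf in offenders:
        logger.warning("leaf %s advertises %d TIEs (limit %d)",
                       leaf, tie_counts[leaf], max_ties)
    return offenders


# Hypothetical per-leaf TIE counters kept by the node forming the adjacencies.
counts = {"Leaf 111": 4, "Leaf 112": 250}
offenders = leaves_over_threshold(counts, max_ties=100)
```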
8. IANA Considerations 8. IANA Considerations
This specification requests multicast address assignments and This specification requests multicast address assignments and
standard port numbers. Additionally registries for the schema are standard port numbers. Additionally registries for the schema are
requested and suggested values provided that reflect the numbers requested and suggested values provided that reflect the numbers
allocated in the given schema. allocated in the given schema.
8.1. Requested Multicast and Port Numbers 8.1. Requested Multicast and Port Numbers
skipping to change at page 123, line 4 skipping to change at page 124, line 51
8.2. Requested Registries with Suggested Values 8.2. Requested Registries with Suggested Values
This section requests registries that help govern the schema via This section requests registries that help govern the schema via
usual IANA registry procedures. A top level 'RIFT' registry should usual IANA registry procedures. A top level 'RIFT' registry should
hold the corresponding registries requested in the following sections with hold the corresponding registries requested in the following sections with
their pre-defined values. IANA is requested to store the schema their pre-defined values. IANA is requested to store the schema
version introducing the allocated value as well as, optionally, its version introducing the allocated value as well as, optionally, its
description when present. This will allow different values to be assigned description when present. This will allow different values to be assigned
to an entry depending on schema version. Alternately, IANA is to an entry depending on schema version. Alternately, IANA is
requested to consider a root RIFT/2 registry to store RIFT schema requested to consider a root RIFT/3 registry to store RIFT schema
major version 2 values and may be requested in the future to create a major version 3 values and may be requested in the future to create a
RIFT/3 registry under that. In any case, IANA is requested to store RIFT/4 registry under that. In any case, IANA is requested to store
the schema version in the entries since that will make it possible to the schema version in the entries since that will make it possible to
distinguish between minor versions in the same major schema version. distinguish between minor versions in the same major schema version.
All values not suggested are to be considered `Unassigned`. The range All values not suggested are to be considered `Unassigned`. The range
of every registry is a 16-bit integer. Allocation of new values is of every registry is a 16-bit integer. Allocation of new values is
always performed via `Expert Review` action. always performed via `Expert Review` action.
8.2.1. Registry RIFT/common/AddressFamilyType 8.2.1. Registry RIFT_v4/common/AddressFamilyType
Address family type. Address family type.
8.2.1.1. Requested Entries 8.2.1.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
Illegal 0 2.0 Illegal 0 4.0
AddressFamilyMinValue 1 2.0 AddressFamilyMinValue 1 4.0
IPv4 2 2.0 IPv4 2 4.0
IPv6 3 2.0 IPv6 3 4.0
AddressFamilyMaxValue 4 2.0 AddressFamilyMaxValue 4 4.0
8.2.2. Registry RIFT/common/HierarchyIndications 8.2.2. Registry RIFT_v4/common/HierarchyIndications
Flags indicating node configuration in case of ZTP. Flags indicating node configuration in case of ZTP.
8.2.2.1. Requested Entries 8.2.2.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
leaf_only 0 2.0 leaf_only 0 4.0
leaf_only_and_leaf_2_leaf_procedures 1 2.0 leaf_only_and_leaf_2_leaf_procedures 1 4.0
top_of_fabric 2 2.0 top_of_fabric 2 4.0
8.2.3. Registry RIFT/common/IEEE802_1ASTimeStampType 8.2.3. Registry RIFT_v4/common/IEEE802_1ASTimeStampType
Timestamp per IEEE 802.1AS, all values MUST be interpreted in Timestamp per IEEE 802.1AS, all values MUST be interpreted in
implementation as unsigned. implementation as unsigned.
8.2.3.1. Requested Entries 8.2.3.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
AS_sec 1 2.0 AS_sec 1 4.0
AS_nsec 2 2.0 AS_nsec 2 4.0
8.2.4. Registry RIFT/common/IPAddressType 8.2.4. Registry RIFT_v4/common/IPAddressType
IP address type. IP address type.
8.2.4.1. Requested Entries 8.2.4.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
ipv4address 1 2.0 Content is IPv4 ipv4address 1 4.0 Content is IPv4
ipv6address 2 2.0 Content is IPv6 ipv6address 2 4.0 Content is IPv6
8.2.5. Registry RIFT/common/IPPrefixType 8.2.5. Registry RIFT_v4/common/IPPrefixType
Prefix advertisement. Prefix advertisement.
@note: for interface addresses the protocol can propagate the address @note: for interface addresses the protocol can propagate the address
part beyond the subnet mask and on reachability computation that has part beyond the subnet mask and on reachability computation that has
to be normalized. The non-significant bits can be used for to be normalized. The non-significant bits can be used for
operational purposes. operational purposes.
8.2.5.1. Requested Entries 8.2.5.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
ipv4prefix 1 2.0 ipv4prefix 1 4.0
ipv6prefix 2 2.0 ipv6prefix 2 4.0
8.2.6. Registry RIFT/common/IPv4PrefixType 8.2.6. Registry RIFT_v4/common/IPv4PrefixType
IPv4 prefix type. IPv4 prefix type.
8.2.6.1. Requested Entries 8.2.6.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
address 1 2.0 address 1 4.0
prefixlen 2 2.0 prefixlen 2 4.0
8.2.7. Registry RIFT/common/IPv6PrefixType 8.2.7. Registry RIFT_v4/common/IPv6PrefixType
IPv6 prefix type. IPv6 prefix type.
8.2.7.1. Requested Entries 8.2.7.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
address 1 2.0 address 1 4.0
prefixlen 2 2.0 prefixlen 2 4.0
8.2.8. Registry RIFT/common/PrefixSequenceType 8.2.8. Registry RIFT_v4/common/PrefixSequenceType
Sequence of a prefix in case of move. Sequence of a prefix in case of move.
8.2.8.1. Requested Entries 8.2.8.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
timestamp 1 2.0 timestamp 1 4.0
transactionid 2 2.0 Transaction ID set by client in e.g. transactionid 2 4.0 Transaction ID set by client in e.g.
6LoWPAN. 6LoWPAN.
8.2.9. Registry RIFT/common/RouteType 8.2.9. Registry RIFT_v4/common/RouteType
RIFT route types. RIFT route types.
@note: route types which MUST be ordered on their preference. PGP @note: route types which MUST be ordered on their preference. PGP
prefixes are most preferred attracting traffic north (towards spine) prefixes are most preferred attracting traffic north (towards spine)
and then south normal prefixes are attracting traffic south (towards and then south normal prefixes are attracting traffic south (towards
leaves), i.e. prefix in NORTH PREFIX TIE is preferred over SOUTH leafs), i.e. prefix in NORTH PREFIX TIE is preferred over SOUTH
PREFIX TIE. PREFIX TIE.
@note: The only purpose of those values is to introduce an ordering @note: The only purpose of those values is to introduce an ordering
whereas an implementation can choose internally any other values as whereas an implementation can choose internally any other values as
long as the ordering is preserved. long as the ordering is preserved.
8.2.9.1. Requested Entries 8.2.9.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
Illegal 0 2.0 Illegal 0 4.0
RouteTypeMinValue 1 2.0 RouteTypeMinValue 1 4.0
Discard 2 2.0 Discard 2 4.0
LocalPrefix 3 2.0 LocalPrefix 3 4.0
SouthPGPPrefix 4 2.0 SouthPGPPrefix 4 4.0
NorthPGPPrefix 5 2.0 NorthPGPPrefix 5 4.0
NorthPrefix 6 2.0 NorthPrefix 6 4.0
NorthExternalPrefix 7 2.0 NorthExternalPrefix 7 4.0
SouthPrefix 8 2.0 SouthPrefix 8 4.0
SouthExternalPrefix 9 2.0 SouthExternalPrefix 9 4.0
NegativeSouthPrefix 10 2.0 NegativeSouthPrefix 10 4.0
RouteTypeMaxValue 11 2.0 RouteTypeMaxValue 11 4.0
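The ordering requirement in the notes above can be sketched in Python. This is a minimal illustration, not the schema itself: the enum values are copied from the requested entries, while the helper names `is_usable` and `most_preferred` are hypothetical. Two assumptions are made: a lower numeric value means a more preferred route type (consistent with the note that a prefix in a North Prefix TIE is preferred over one in a South Prefix TIE), and `RouteTypeMinValue`/`RouteTypeMaxValue` are exclusive bounds.

```python
from enum import IntEnum


class RouteType(IntEnum):
    """Values copied from the requested entries of the RouteType registry."""
    Illegal = 0
    RouteTypeMinValue = 1
    Discard = 2
    LocalPrefix = 3
    SouthPGPPrefix = 4
    NorthPGPPrefix = 5
    NorthPrefix = 6
    NorthExternalPrefix = 7
    SouthPrefix = 8
    SouthExternalPrefix = 9
    NegativeSouthPrefix = 10
    RouteTypeMaxValue = 11


def is_usable(route_type: RouteType) -> bool:
    # Assumption: the Min/Max entries are exclusive bounds, not route types.
    return RouteType.RouteTypeMinValue < route_type < RouteType.RouteTypeMaxValue


def most_preferred(candidates):
    # Assumption: lower numeric value means more preferred.
    usable = [c for c in candidates if is_usable(c)]
    return min(usable) if usable else None
```

Because only the relative order matters, an implementation could map these values to any internal representation as long as the comparison result stays the same.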
8.2.10. Registry RIFT/common/TIETypeType 8.2.10. Registry RIFT_v4/common/TIETypeType
Type of TIE. Type of TIE.
This enum indicates what TIE type the TIE is carrying. In case the This enum indicates what TIE type the TIE is carrying. In case the
value is not known to the receiver, the TIE MUST be re-flooded. This value is not known to the receiver, the TIE MUST be re-flooded. This
allows for future extensions of the protocol within the same major allows for future extensions of the protocol within the same major
schema with types opaque to some nodes UNLESS the flooding scope is schema with types opaque to some nodes UNLESS the flooding scope is
not the same as prefix TIE, then a major version revision MUST be not the same as prefix TIE, then a major version revision MUST be
performed. performed.
8.2.10.1. Requested Entries 8.2.10.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
Illegal 0 2.0 Illegal 0 4.0
TIETypeMinValue 1 2.0 TIETypeMinValue 1 4.0
NodeTIEType 2 2.0 NodeTIEType 2 4.0
PrefixTIEType 3 2.0 PrefixTIEType 3 4.0
PositiveDisaggregationPrefixTIEType 4 2.0 PositiveDisaggregationPrefixTIEType 4 4.0
NegativeDisaggregationPrefixTIEType 5 2.0 NegativeDisaggregationPrefixTIEType 5 4.0
PGPrefixTIEType 6 2.0 PGPrefixTIEType 6 4.0
KeyValueTIEType 7 2.0 KeyValueTIEType 7 4.0
ExternalPrefixTIEType 8 2.0 ExternalPrefixTIEType 8 4.0
PositiveExternalDisaggregationPrefixTIEType 9 2.0 PositiveExternalDisaggregationPrefixTIEType 9 4.0
TIETypeMaxValue 10 2.0 TIETypeMaxValue 10 4.0
8.2.11. Registry RIFT/common/TieDirectionType 8.2.11. Registry RIFT_v4/common/TieDirectionType
Direction of TIEs. Direction of TIEs.
8.2.11.1. Requested Entries 8.2.11.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
Illegal 0 2.0 Illegal 0 4.0
South 1 2.0 South 1 4.0
North 2 2.0 North 2 4.0
DirectionMaxValue 3 2.0 DirectionMaxValue 3 4.0
8.2.12. Registry RIFT/encoding/Community 8.2.12. Registry RIFT_v4/encoding/Community
Prefix community. Prefix community.
8.2.12.1. Requested Entries 8.2.12.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
top 1 2.0 Higher order bits top 1 4.0 Higher order bits
bottom 2 2.0 Lower order bits bottom 2 4.0 Lower order bits
8.2.13. Registry RIFT/encoding/KeyValueTIEElement 8.2.13. Registry RIFT_v4/encoding/KeyValueTIEElement
Generic key value pairs. Generic key value pairs.
8.2.13.1. Requested Entries 8.2.13.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
keyvalues 1 2.0 keyvalues 1 4.0
8.2.14. Registry RIFT/encoding/LIEPacket 8.2.14. Registry RIFT_v4/encoding/LIEPacket
RIFT LIE Packet. RIFT LIE Packet.
@note: this node's level is already included on the packet header @note: this node's level is already included on the packet header
8.2.14.1. Requested Entries 8.2.14.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
name 1 2.0 Node or adjacency name. name 1 4.0 Node or adjacency name.
local_id 2 2.0 Local link ID. local_id 2 4.0 Local link ID.
flood_port 3 2.0 UDP port to which we can flood_port 3 4.0 UDP port to which we can
receive flooded TIEs. receive flooded TIEs.
link_mtu_size 4 2.0 Layer 3 MTU, used to link_mtu_size 4 4.0 Layer 3 MTU, used to
discover mismatch. discover mismatch.
link_bandwidth 5 2.0 Local link bandwidth on link_bandwidth 5 4.0 Local link bandwidth on
the interface. the interface.
neighbor 6 2.0 Reflects the neighbor once neighbor 6 4.0 Reflects the neighbor once
received to provide received to provide
3-way connectivity. 3-way connectivity.
pod 7 2.0 Node's PoD. pod 7 4.0 Node's PoD.
node_capabilities 10 2.0 Node capabilities shown in node_capabilities 10 4.0 Node capabilities shown in
the LIE. The capabilities LIE. The capabilities
MUST match the capabilities MUST match the capabilities
shown in the Node TIEs, shown in the Node TIEs,
otherwise otherwise the behavior
the behavior is is unspecified. A node
unspecified. A node
detecting the mismatch detecting the mismatch
SHOULD generate according SHOULD generate according
error. error.
link_capabilities 11 2.0 Capabilities of this link. link_capabilities 11 4.0 Capabilities of this link.
holdtime 12 2.0 Required holdtime of the holdtime 12 4.0 Required holdtime of the
adjacency, i.e. how much adjacency, i.e. how much
time time MUST expire
MUST expire without LIE for without LIE for the
the adjacency to drop. adjacency to drop.
label 13 2.0 Unsolicited, downstream label 13 4.0 Unsolicited, downstream
assigned locally assigned locally
significant label significant label value
value for the adjacency. for the adjacency.
not_a_ztp_offer 21 2.0 Indicates that the level not_a_ztp_offer 21 4.0 Indicates that the level
on the LIE MUST NOT be used on the LIE MUST NOT be used
to derive a ZTP level by to derive a ZTP level by
the receiving node. the receiving node.
you_are_flood_repeater 22 2.0 Indicates to northbound you_are_flood_repeater 22 4.0 Indicates to northbound
neighbor that it should neighbor that it should
be reflooding this node's be reflooding this node's
N-TIEs to achieve flood North TIEs to achieve flood
reduction and reduction and balancing
balancing for northbound for northbound flooding. To
flooding. To be ignored if be ignored if received
received from a from a northbound
northbound adjacency. adjacency.
you_are_sending_too_quickly 23 2.0 Can be optionally set to you_are_sending_too_quickly 23 4.0 Can be optionally set to
indicate to neighbor that indicate to neighbor that
packet losses are seen on packet losses are seen
reception based on packet on reception based on
numbers or the rate is too packet numbers or the rate
high. The receiver SHOULD is too high. The
temporarily slow down receiver SHOULD temporarily
flooding rates. slow down flooding
instance_name 24 2.0 Instance name in case rates.
instance_name 24 4.0 Instance name in case
multiple RIFT instances multiple RIFT instances
running on same interface. running on same
interface.
8.2.15. Registry RIFT/encoding/LinkCapabilities 8.2.15. Registry RIFT_v4/encoding/LinkCapabilities
Link capabilities. Link capabilities.
8.2.15.1. Requested Entries 8.2.15.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
bfd 1 2.0 Indicates that the link is bfd 1 4.0 Indicates that the link is
supporting BFD. supporting BFD.
v4_forwarding_capable 2 2.0 v4_forwarding_capable 2 4.0 Indicates whether the interface
will support v4 forwarding.
8.2.16. Registry RIFT/encoding/LinkIDPair 8.2.16. Registry RIFT_v4/encoding/LinkIDPair
LinkID pair describes one of parallel links between two nodes. LinkID pair describes one of parallel links between two nodes.
8.2.16.1. Requested Entries 8.2.16.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
local_id 1 2.0 Node-wide unique value for local_id 1 4.0 Node-wide unique value for
the local link. the local link.
remote_id 2 2.0 Received remote link ID for remote_id 2 4.0 Received remote link ID for
this link. this link.
platform_interface_index 10 2.0 Describes the local platform_interface_index 10 4.0 Describes the local
interface index of the link. interface index of the link.
platform_interface_name 11 2.0 Describes the local platform_interface_name 11 4.0 Describes the local
interface name. interface name.
trusted_outer_security_key 12 2.0 Indication whether the link trusted_outer_security_key 12 4.0 Indication whether the link
is secured, i.e. protected is secured, i.e. protected
by outer key, absence by outer key, absence of
of this element means no this element means no
indication, undefined outer indication, undefined
key means not secured. outer key means not secured.
bfd_up 13 2.0 Indication whether the link bfd_up 13 4.0 Indication whether the link
is protected by established is protected by established
BFD session. BFD session.
8.2.17. Registry RIFT/encoding/Neighbor 8.2.17. Registry RIFT_v4/encoding/Neighbor
Neighbor structure. Neighbor structure.
8.2.17.1. Requested Entries 8.2.17.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
originator 1 2.0 System ID of the originator. originator 1 4.0 System ID of the originator.
remote_id 2 2.0 ID of remote side of the link. remote_id 2 4.0 ID of remote side of the link.
8.2.18. Registry RIFT/encoding/NodeCapabilities 8.2.18. Registry RIFT_v4/encoding/NodeCapabilities
Capabilities the node supports. Capabilities the node supports.
@note: The schema may add to this field future capabilities to @note: The schema may add to this field future capabilities to
indicate whether it will support interpretation of future schema indicate whether it will support interpretation of future schema
extensions on the same major revision. Such fields MUST be optional extensions on the same major revision. Such fields MUST be optional
and have an implicit or explicit false default value. If a future and have an implicit or explicit false default value. If a future
capability changes route selection or generates blackholes if some capability changes route selection or generates blackholes if some
nodes are not supporting it then a major version increment is nodes are not supporting it then a major version increment is
unavoidable. unavoidable.
8.2.18.1. Requested Entries 8.2.18.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
protocol_minor_version 1 2.0 Must advertise supported minor protocol_minor_version 1 4.0 Must advertise supported minor
version dialect that way. version dialect that way.
flood_reduction 2 2.0 Can this node participate in flood_reduction 2 4.0 Can this node participate in
flood reduction. flood reduction.
hierarchy_indications 3 2.0 Does this node restrict itself hierarchy_indications 3 4.0 Does this node restrict itself
to be top-of-fabric or to be top-of-fabric or leaf
leaf only (in ZTP) and does it only (in ZTP) and does it
support leaf-2-leaf procedures. support leaf-2-leaf
procedures.
8.2.19. Registry RIFT/encoding/NodeFlags 8.2.19. Registry RIFT_v4/encoding/NodeFlags
Indication flags of the node. Indication flags of the node.
8.2.19.1. Requested Entries 8.2.19.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
overload 1 2.0 Indicates that node is in overload, do not overload 1 4.0 Indicates that node is in overload, do not
transit traffic through it. transit traffic through it.
8.2.20. Registry RIFT/encoding/NodeNeighborsTIEElement 8.2.20. Registry RIFT_v4/encoding/NodeNeighborsTIEElement
neighbor of a node neighbor of a node
8.2.20.1. Requested Entries 8.2.20.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
level 1 2.0 level of neighbor level 1 4.0 level of neighbor
cost 3 2.0 cost 3 4.0 Cost to neighbor.
link_ids 4 2.0 can carry description of multiple parallel link_ids 4 4.0 can carry description of multiple parallel
links in a TIE links in a TIE
bandwidth 5 2.0 total bandwidth to neighbor, this will be bandwidth 5 4.0 total bandwidth to neighbor, this will be
normally sum of the normally sum of the bandwidths of all the
bandwidths of all the parallel links. parallel links.
8.2.21. Registry RIFT/encoding/NodeTIEElement 8.2.21. Registry RIFT_v4/encoding/NodeTIEElement
Description of a node. Description of a node.
It may occur multiple times in different TIEs but if either It may occur multiple times in different TIEs but if either
capabilities values do not match or capabilities values do not match or
flags values do not match or flags values do not match or
neighbors repeat with different values neighbors repeat with different values
skipping to change at page 131, line 17 skipping to change at page 133, line 17
Neighbors can be distributed across multiple TIEs however if the sets Neighbors can be distributed across multiple TIEs however if the sets
are disjoint. Miscablings SHOULD be repeated in every node TIE, are disjoint. Miscablings SHOULD be repeated in every node TIE,
otherwise the behavior is undefined. otherwise the behavior is undefined.
@note: Observe that absence of fields implies defined defaults. @note: Observe that absence of fields implies defined defaults.
8.2.21.1. Requested Entries 8.2.21.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
level 1 2.0 Level of the node. level 1 4.0 Level of the node.
neighbors 2 2.0 Node's neighbors. If neighbor systemID neighbors 2 4.0 Node's neighbors. If neighbor systemID
repeats in other node TIEs of repeats in other node TIEs of same
same node the behavior is undefined. node the behavior is undefined.
capabilities 3 2.0 Capabilities of the node. capabilities 3 4.0 Capabilities of the node.
flags 4 2.0 Flags of the node. flags 4 4.0 Flags of the node.
name 5 2.0 Optional node name for easier name 5 4.0 Optional node name for easier
operations. operations.
pod 6 2.0 PoD to which the node belongs. pod 6 4.0 PoD to which the node belongs.
miscabled_links 10 2.0 If any local links are miscabled, the miscabled_links 10 4.0 If any local links are miscabled, the
indication is flooded. indication is flooded.
8.2.22. Registry RIFT/encoding/PacketContent 8.2.22. Registry RIFT_v4/encoding/PacketContent
Content of a RIFT packet. Content of a RIFT packet.
8.2.22.1. Requested Entries 8.2.22.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
lie 1 2.0 lie 1 4.0
tide 2 2.0 tide 2 4.0
tire 3 2.0 tire 3 4.0
tie 4 2.0 tie 4 4.0
8.2.23. Registry RIFT/encoding/PacketHeader 8.2.23. Registry RIFT_v4/encoding/PacketHeader
Common RIFT packet header. Common RIFT packet header.
8.2.23.1. Requested Entries 8.2.23.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
major_version 1 2.0 Major version of protocol. major_version 1 4.0 Major version of protocol.
minor_version 2 2.0 Minor version of protocol. minor_version 2 4.0 Minor version of protocol.
sender 3 2.0 Node sending the packet, in case of sender 3 4.0 Node sending the packet, in case of
LIE/TIRE/TIDE also LIE/TIRE/TIDE also the originator of
the originator of it. it.
level 4 2.0 Level of the node sending the packet, level 4 4.0 Level of the node sending the packet,
required on everything except required on everything except LIEs.
LIEs. Lack of presence on LIEs indicates Lack of presence on LIEs indicates
UNDEFINED_LEVEL and is used UNDEFINED_LEVEL and is used in ZTP
in ZTP procedures. procedures.
8.2.24. Registry RIFT/encoding/PrefixAttributes 8.2.24. Registry RIFT_v4/encoding/PrefixAttributes
Attributes of a prefix. Attributes of a prefix.
8.2.24.1. Requested Entries 8.2.24.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
metric 2 2.0 Distance of the prefix. metric 2 4.0 Distance of the prefix.
tags 3 2.0 Generic unordered set of route tags, tags 3 4.0 Generic unordered set of route tags,
can be redistributed to other can be redistributed to other
protocols or use protocols or use within the context
within the context of real time of real time analytics.
analytics. monotonic_clock 4 4.0 Monotonic clock for mobile
monotonic_clock 4 2.0 Monotonic clock for mobile
addresses. addresses.
loopback 6 2.0 Indicates if the interface is a node loopback 6 4.0 Indicates if the interface is a node
loopback. loopback.
directly_attached 7 2.0 Indicates that the prefix is directly_attached 7 4.0 Indicates that the prefix is
directly attached, i.e. should be directly attached, i.e. should be
routed to even if routed to even if the node is in
the node is in overload. * overload.
from_link 10 2.0 In case of locally originated from_link 10 4.0 In case of locally originated
prefixes, i.e. interface addresses prefixes, i.e. interface
this can describe addresses this can describe which
which link the address belongs to. link the address belongs to.
8.2.25. Registry RIFT/encoding/PrefixTIEElement 8.2.25. Registry RIFT_v4/encoding/PrefixTIEElement
TIE carrying prefixes TIE carrying prefixes
8.2.25.1. Requested Entries 8.2.25.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
prefixes 1 2.0 Prefixes with the associated attributes. prefixes 1 4.0 Prefixes with the associated attributes.
If the same prefix repeats in multiple TIEs of If the same prefix repeats in multiple TIEs of
same node behavior is same node behavior is unspecified.
unspecified.
8.2.26. Registry RIFT/encoding/ProtocolPacket 8.2.26. Registry RIFT_v4/encoding/ProtocolPacket
RIFT packet structure. RIFT packet structure.
8.2.26.1. Requested Entries 8.2.26.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
header 1 2.0 header 1 4.0
content 2 2.0 content 2 4.0
8.2.27. Registry RIFT/encoding/TIDEPacket 8.2.27. Registry RIFT_v4/encoding/TIDEPacket
TIDE with sorted TIE headers, if headers are unsorted, behavior is TIDE with sorted TIE headers, if headers are unsorted, behavior is
undefined. undefined.
8.2.27.1. Requested Entries 8.2.27.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
start_range 1 2.0 First TIE header in the tide start_range 1 4.0 First TIE header in the tide
packet. packet.
end_range 2 2.0 Last TIE header in the tide packet. end_range 2 4.0 Last TIE header in the tide packet.
headers 3 2.0 _Sorted_ list of headers. headers 3 4.0 _Sorted_ list of headers.
8.2.28. Registry RIFT/encoding/TIEElement 8.2.28. Registry RIFT_v4/encoding/TIEElement
Single element in a TIE. Single element in a TIE.
Schema enum `common.TIETypeType` in TIEID indicates which elements Schema enum `common.TIETypeType` in TIEID indicates which elements
MUST be present in the TIEElement. In case of mismatch the MUST be present in the TIEElement. In case of mismatch the
unexpected elements MUST be ignored. In case of lack of expected unexpected elements MUST be ignored. In case of lack of expected
element, an error MUST be reported and the TIE MUST be element, an error MUST be reported and the TIE MUST be
ignored. ignored.
This type can be extended with new optional elements for new This type can be extended with new optional elements for new
`common.TIETypeType` values without breaking the major but if it is `common.TIETypeType` values without breaking the major but if it is
necessary to understand whether all nodes support the new type a node necessary to understand whether all nodes support the new type a node
capability must be added as well. capability must be added as well.
8.2.28.1. Requested Entries 8.2.28.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
node 1 2.0 Used in case of enum node 1 4.0 Used in case of enum
common.TIETypeType.NodeTIEType. common.TIETypeType.NodeTIEType.
prefixes 2 2.0 Used in case of enum prefixes 2 4.0 Used in case of enum
common.TIETypeType.PrefixTIEType. common.TIETypeType.PrefixTIEType.
positive_disaggregation_prefixes 3 2.0 Positive prefixes (always positive_disaggregation_prefixes 3 4.0 Positive prefixes (always
southbound). southbound). It MUST
It MUST NOT be advertised NOT be advertised within a
within a North TIE and North TIE and ignored
ignored otherwise otherwise.
negative_disaggregation_prefixes 5 2.0 Transitive, negative negative_disaggregation_prefixes 5 4.0 Transitive, negative
prefixes (always prefixes (always
southbound) which southbound) which MUST
MUST be aggregated and be aggregated and
propagated propagated according
according to the to the specification
specification
southwards towards lower southwards towards lower
levels to heal levels to heal
pathological upper level pathological upper level
partitioning, otherwise partitioning, otherwise
blackholes may occur in blackholes may occur in
multiplane fabrics. multiplane fabrics. It
It MUST NOT be advertised MUST NOT be advertised
within a North TIE. within a North TIE.
external_prefixes 6 2.0 Externally reimported external_prefixes 6 4.0 Externally reimported
prefixes. prefixes.
positive_external_disaggregation_prefixes 7 2.0 Positive external positive_external_disaggregation_prefixes 7 4.0 Positive external
disaggregated prefixes disaggregated prefixes
(always southbound). (always southbound).
It MUST NOT be advertised It MUST NOT be advertised
within a North TIE and within a North TIE and
ignored otherwise. ignored otherwise.
keyvalues 9 2.0 Key-Value store elements. keyvalues 9 4.0 Key-Value store elements.
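The TIEElement handling rule (ignore unexpected elements, report an error and ignore the TIE when the expected element is missing) could look roughly like the Python sketch below. It covers only the two mappings the table states explicitly (NodeTIEType to `node`, PrefixTIEType to `prefixes`, using the TIETypeType values 2 and 3). The names `EXPECTED_ELEMENT`, `MissingElementError` and `select_element` are hypothetical, and representing a TIEElement as a plain dict is an assumption made for brevity.

```python
# TIETypeType value -> name of the TIEElement field that MUST be present.
# Only the two mappings spelled out in the table are included here.
EXPECTED_ELEMENT = {
    2: "node",      # common.TIETypeType.NodeTIEType
    3: "prefixes",  # common.TIETypeType.PrefixTIEType
}


class MissingElementError(Exception):
    """Raised when the element required by the TIE type is absent."""


def select_element(tietype: int, element: dict):
    """Return (name, value) of the expected element.

    `element` is a dict holding whichever optional TIEElement fields were
    present on the wire (an assumption of this sketch). Unexpected fields
    are ignored by never being looked at. When the expected field is
    missing, the caller is expected to report the error and ignore the TIE.
    """
    expected = EXPECTED_ELEMENT[tietype]
    if expected not in element:
        raise MissingElementError(
            f"TIE type {tietype} requires element '{expected}'"
        )
    return expected, element[expected]
```

Raising an exception is just one way to surface the error; the requirement is only that the error is reported and the TIE is not processed further.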
8.2.29. Registry RIFT/encoding/TIEHeader 8.2.29. Registry RIFT_v4/encoding/TIEHeader
Header of a TIE. Header of a TIE.
@note: TIEID space is a total order achieved by comparing the @note: TIEID space is a total order achieved by comparing the
elements in sequence defined and comparing each value as an unsigned elements in sequence defined and comparing each value as an unsigned
integer of according length. integer of according length.
@note: After sequence number the lifetime received on the envelope @note: After sequence number the lifetime received on the envelope
must be used for comparison before further fields. must be used for comparison before further fields.
@note: `origination_time` and `origination_lifetime` are disregarded @note: `origination_time` and `origination_lifetime` are disregarded
for comparison purposes and carried purely for debugging/security for comparison purposes and carried purely for debugging/security
purposes if present. purposes if present.
8.2.29.1. Requested Entries 8.2.29.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
tieid 2 2.0 ID of the tie. tieid 2 4.0 ID of the tie.
seq_nr 3 2.0 Sequence number of the tie. seq_nr 3 4.0 Sequence number of the tie.
origination_time 10 2.0 Absolute timestamp when the TIE origination_time 10 4.0 Absolute timestamp when the TIE
was generated. This can be used on was generated. This can be used on
fabrics with fabrics with synchronized
synchronized clock to prevent clock to prevent lifetime
lifetime modification attacks. modification attacks.
origination_lifetime 12 2.0 Original lifetime when the TIE origination_lifetime 12 4.0 Original lifetime when the TIE
was generated. This can be used on was generated. This can be used on
fabrics with fabrics with synchronized
synchronized clock to prevent clock to prevent lifetime
lifetime modification attacks. modification attacks.
8.2.30. Registry RIFT/encoding/TIEHeaderWithLifeTime 8.2.30. Registry RIFT_v4/encoding/TIEHeaderWithLifeTime
Header of a TIE as described in TIRE/TIDE. Header of a TIE as described in TIRE/TIDE.
8.2.30.1. Requested Entries 8.2.30.1. Requested Entries
Name Value Schema Description Name Value Schema Description
Version Version
header 1 2.0 header 1 4.0
remaining_lifetime 2 2.0 Remaining lifetime that expires remaining_lifetime 2 4.0 Remaining lifetime that expires
down to 0 just like in ISIS. down to 0 just like in ISIS.
TIEs with lifetimes differing by TIEs with lifetimes differing by
less than `lifetime_diff2ignore` less than `lifetime_diff2ignore`
MUST be MUST be considered EQUAL.
considered EQUAL.
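The lifetime equality rule can be illustrated with a short Python sketch. The function name `lifetimes_equal` is hypothetical, and the threshold is passed in as a parameter because the value of `lifetime_diff2ignore` is not given in this section; the number 400 in the usage note is purely illustrative.

```python
def lifetimes_equal(lifetime_a: int, lifetime_b: int,
                    lifetime_diff2ignore: int) -> bool:
    """Return True when two remaining lifetimes count as EQUAL.

    Lifetimes that differ by strictly less than `lifetime_diff2ignore`
    are treated as equal; a difference of exactly the threshold is not.
    """
    return abs(lifetime_a - lifetime_b) < lifetime_diff2ignore
```

With an illustrative threshold of 400, `lifetimes_equal(1000, 700, 400)` holds, while `lifetimes_equal(1000, 600, 400)` does not because the difference is not less than the threshold.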
8.2.31. Registry RIFT/encoding/TIEID 8.2.31. Registry RIFT_v4/encoding/TIEID
ID of a TIE. ID of a TIE.
@note: TIEID space is a total order achieved by comparing the @note: TIEID space is a total order achieved by comparing the
elements in sequence defined and comparing each value as an unsigned elements in sequence defined and comparing each value as an unsigned
integer of according length. integer of according length.
8.2.31.1. Requested Entries 8.2.31.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
direction 1 2.0 direction of TIE direction 1 4.0 direction of TIE
originator 2 2.0 indicates originator of the TIE originator 2 4.0 indicates originator of the TIE
tietype 3 2.0 type of the tie tietype 3 4.0 type of the tie
tie_nr 4 2.0 number of the tie tie_nr 4 4.0 number of the tie
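The total order on the TIEID space described in the note can be sketched in Python as a tuple comparison over the four fields in their defined sequence. The class layout, the helper names `as_unsigned` and `tieid_sort_key`, and the bit widths (8 bits for the two enums, 64 for the originator, 32 for the TIE number) are assumptions of this sketch, since this section does not state field sizes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TIEID:
    # Fields listed in the sequence given by the registry entries.
    direction: int
    originator: int
    tietype: int
    tie_nr: int


def as_unsigned(value: int, bits: int) -> int:
    """Reinterpret a possibly negative integer as unsigned of `bits` width."""
    return value & ((1 << bits) - 1)


def tieid_sort_key(tieid: TIEID):
    # Assumed widths: enums 8 bits, originator 64 bits, tie_nr 32 bits.
    return (
        as_unsigned(tieid.direction, 8),
        as_unsigned(tieid.originator, 64),
        as_unsigned(tieid.tietype, 8),
        as_unsigned(tieid.tie_nr, 32),
    )
```

Sorting with `sorted(tieids, key=tieid_sort_key)` then yields the order in which each field is compared as an unsigned integer, with earlier fields taking precedence.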
8.2.32. Registry RIFT/encoding/TIEPacket 8.2.32. Registry RIFT_v4/encoding/TIEPacket
TIE packet TIE packet
8.2.32.1. Requested Entries 8.2.32.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
header 1 2.0 header 1 4.0
element 2 2.0 element 2 4.0
8.2.33. Registry RIFT/encoding/TIREPacket 8.2.33. Registry RIFT_v4/encoding/TIREPacket
TIRE packet TIRE packet
8.2.33.1. Requested Entries 8.2.33.1. Requested Entries
Name Value Schema Version Description Name Value Schema Version Description
headers 1 2.0 headers 1 4.0
9. Acknowledgments 9. Acknowledgments
A new routing protocol in its complexity is not a product of a parent A new routing protocol in its complexity is not a product of a parent
but of a village as the author list shows already. However, many but of a village as the author list shows already. However, many
more people provided input, fine-combed the specification based on more people provided input, fine-combed the specification based on
their experience in design or implementation. This section will make their experience in design, implementation or application of
an inadequate attempt in recording their contribution. protocols in IP fabrics. This section will make an inadequate
attempt in recording their contribution.
Many thanks to Naiming Shen for some of the early discussions around Many thanks to Naiming Shen for some of the early discussions around
the topic of using IGPs for routing in topologies related to Clos. the topic of using IGPs for routing in topologies related to Clos.
Russ White is to be especially acknowledged for the key conversation on Russ White is to be especially acknowledged for the key conversation on
epistemology that allowed to tie current asynchronous distributed epistemology that allowed to tie current asynchronous distributed
systems theory results to a modern protocol design presented here. systems theory results to a modern protocol design presented in this
Adrian Farrel, Joel Halpern, Jeffrey Zhang, Krzysztof Szarkowicz, scope. Adrian Farrel, Joel Halpern, Jeffrey Zhang, Krzysztof
Nagendra Kumar, Melchior Aelmans provided thoughtful comments that Szarkowicz, Nagendra Kumar, Melchior Aelmans, Kaushal Tank, Will
Jones, Moin Ahmed and Jordan Head provided thoughtful comments that
improved the readability of the document and found a good number of improved the readability of the document and found a good number of
corners where the light failed to shine. Kris Price was first to corners where the light failed to shine. Kris Price was first to
mention single router, single arm default considerations. Jeff mention single router, single arm default considerations. Jeff
Tantsura helped out with some initial thoughts on BFD interactions Tantsura helped out with some initial thoughts on BFD interactions
while Jeff Haas corrected several misconceptions about BFD's finer while Jeff Haas corrected several misconceptions about BFD's finer
points. Artur Makutunowicz pointed out many possible improvements points. Artur Makutunowicz pointed out many possible improvements
and acted as sounding board in regard to modern protocol and acted as sounding board in regard to modern protocol
implementation techniques RIFT is exploring. Barak Gafni formalized implementation techniques RIFT is exploring. Barak Gafni formalized
first time clearly the problem of partitioned spine and fallen leaves first time clearly the problem of partitioned spine and fallen leaves
on a (clean) napkin in Singapore that led to the very important part on a (clean) napkin in Singapore that led to the very important part
of the specification centered around multiple Top-of-Fabric planes of the specification centered around multiple Top-of-Fabric planes
and negative disaggregation. Igor Gashinsky and others shared many and negative disaggregation. Igor Gashinsky and others shared many
thoughts on problems encountered in design and operation of large- thoughts on problems encountered in design and operation of large-
scale data center fabrics. Xu Benchong found a delicate error in the scale data center fabrics. Xu Benchong found a delicate error in the
flooding procedures while implementing. flooding procedures and a schema datatype size mismatch.
Last but not least, Alvaro Retana guided the undertaking by asking
many necessary procedural and technical questions which did not only
improve the content but did also lay out the track towards
publication.
10. References 10. References
10.1. Normative References 10.1. Normative References
[EUI64] IEEE, "Guidelines for Use of Extended Unique Identifier [EUI64] IEEE, "Guidelines for Use of Extended Unique Identifier
(EUI), Organizationally Unique Identifier (OUI), and (EUI), Organizationally Unique Identifier (OUI), and
Company ID (CID)", IEEE EUI, Company ID (CID)", IEEE EUI,
<http://standards.ieee.org/develop/regauth/tut/eui.pdf>. <http://standards.ieee.org/develop/regauth/tut/eui.pdf>.
skipping to change at page 143, line 7 skipping to change at page 145, line 19
6. changes name of any field or type or 6. changes name of any field or type or
7. change data types of any field or 7. change data types of any field or
8. adds, changes or removes a default value of any *existing* field 8. adds, changes or removes a default value of any *existing* field
or or
9. removes or changes any defined constant or constant value or 9. removes or changes any defined constant or constant value or
10. changes any enumeration type except extending `common.TIEType` 10. changes any enumeration type except extending
(use of enumeration types is generally discouraged) `common.TIETypeType` (use of enumeration types is generally
discouraged)
major version of the schema MUST increase. All other changes MUST major version of the schema MUST increase. All other changes MUST
increase minor version within the same major. increase minor version within the same major.
Observe however that introducing an optional field does not cause a The above set of rules guarantees that every decoder can process
major version increase even if the fields inside the structure are serialized content generated by a higher minor version of the schema
and with that the protocol can progress without a 'fork-lift'.
Additionally, based on the propagated minor version in encoded
content and added optional node capabilities new TIE types or even
de-facto mandatory fields can be introduced without progressing the
major version albeit only nodes supporting such new extensions would
decode them. Given the model is encoded at the source and never re-
encoded flooding through nodes not understanding any new extensions
will preserve the according fields.
Content serialized using a major version X is NOT expected to be
decodable by any implementation using decoder for a model with a
major version lower than X.
Observe especially that introducing an optional field does not cause
a major version increase even if the fields inside the structure are
optional with defaults. optional with defaults.
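As a rough illustration of the versioning rules above, the following Python sketch checks whether a decoder is expected to process received content. The function name `expected_decodable` and its parameters are hypothetical and not part of the schema. Rejecting content with a lower major version than the decoder's is an assumption made here, since the text only addresses content with a higher major version.

```python
def expected_decodable(decoder_major: int, content_major: int) -> bool:
    """Return True if content is expected to be decodable.

    Minor versions are deliberately ignored: per the rules above, a
    decoder can process content generated by a higher minor version
    of the same major version.
    """
    if content_major > decoder_major:
        # Content serialized with major version X is NOT expected to
        # be decodable by a decoder for a major version lower than X.
        return False
    # Assumption (not stated in the text): content with a lower major
    # version than the decoder's is rejected as well.
    return content_major == decoder_major
```

An implementation following this sketch would compare only the major version carried in the packet header against its own, and would rely on optional fields to absorb minor version differences.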
All signed integer as forced by Thrift [thrift] support must be cast All signed integer as forced by Thrift [thrift] support must be cast
for internal purposes to equivalent unsigned values without for internal purposes to equivalent unsigned values without
discarding the signedness bit. An implementation SHOULD try to avoid discarding the signedness bit. An implementation SHOULD try to avoid
using the signedness bit when generating values. using the signedness bit when generating values.
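The cast described above can be sketched in Python as follows. Thrift integer types (i8, i16, i32, i64) are signed, so a decoded value may appear negative even though the schema requires it to be interpreted as unsigned. The helper names `as_unsigned` and `uses_sign_bit` are hypothetical and chosen only for illustration.

```python
def as_unsigned(value: int, bits: int) -> int:
    """Reinterpret a signed two's-complement integer as unsigned.

    The bit pattern is preserved, so the signedness bit is kept as the
    most significant bit of the unsigned result instead of being dropped.
    """
    low, high = -(1 << (bits - 1)), (1 << (bits - 1))
    if not low <= value < high:
        raise ValueError(f"{value} does not fit a signed {bits}-bit integer")
    return value & ((1 << bits) - 1)


def uses_sign_bit(value: int, bits: int) -> bool:
    """Return True if the value occupies the signedness (top) bit."""
    return (as_unsigned(value, bits) >> (bits - 1)) == 1
```

For example, an i16 value decoded as -1 corresponds to the unsigned value 65535, and `uses_sign_bit` flags exactly those values that an implementation SHOULD try to avoid generating.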
The schema is normative. The schema is normative.
B.1. common.thrift B.1. common.thrift
/** /**
Thrift file with common definitions for RIFT Thrift file with common definitions for RIFT
*/ */
namespace py common
namespace rs models
/** @note MUST be interpreted in implementation as unsigned 64 bits. /** @note MUST be interpreted in implementation as unsigned 64 bits.
* The implementation SHOULD NOT use the MSB. * The implementation SHOULD NOT use the MSB.
*/ */
typedef i64 SystemIDType typedef i64 SystemIDType
typedef i32 IPv4Address typedef i32 IPv4Address
/** this has to be long enough to accommodate prefix */ /** this has to be long enough to accommodate prefix */
typedef binary IPv6Address typedef binary IPv6Address
/** @note MUST be interpreted in implementation as unsigned */ /** @note MUST be interpreted in implementation as unsigned */
typedef i16 UDPPortType typedef i16 UDPPortType
/** @note MUST be interpreted in implementation as unsigned */ /** @note MUST be interpreted in implementation as unsigned */
typedef i32 TIENrType typedef i32 TIENrType
/** @note MUST be interpreted in implementation as unsigned */ /** @note MUST be interpreted in implementation as unsigned */
typedef i32 MTUSizeType typedef i32 MTUSizeType
/** @note MUST be interpreted in implementation as unsigned /** @note MUST be interpreted in implementation as unsigned
rolling over number */ rolling over number */
typedef i16 SeqNrType typedef i64 SeqNrType
/** @note MUST be interpreted in implementation as unsigned */ /** @note MUST be interpreted in implementation as unsigned */
typedef i32 LifeTimeInSecType typedef i32 LifeTimeInSecType
/** @note MUST be interpreted in implementation as unsigned */ /** @note MUST be interpreted in implementation as unsigned */
typedef i8 LevelType typedef i8 LevelType
/** optional, recommended monotonically increasing number /** optional, recommended monotonically increasing number
_per packet type per adjacency_ _per packet type per adjacency_
that can be used to detect losses/misordering/restarts. that can be used to detect losses/misordering/restarts.
@note MUST be interpreted in implementation as unsigned @note MUST be interpreted in implementation as unsigned
rolling over number */ rolling over number */
typedef i16 PacketNumberType typedef i16 PacketNumberType
skipping to change at page 148, line 46 skipping to change at page 151, line 24
PositiveExternalDisaggregationPrefixTIEType = 9, PositiveExternalDisaggregationPrefixTIEType = 9,
TIETypeMaxValue = 10, TIETypeMaxValue = 10,
} }
/** RIFT route types. /** RIFT route types.
@note: route types which MUST be ordered on their preference @note: route types which MUST be ordered on their preference
PGP prefixes are most preferred attracting PGP prefixes are most preferred attracting
traffic north (towards spine) and then south traffic north (towards spine) and then south
normal prefixes are attracting traffic south normal prefixes are attracting traffic south
(towards leaves), i.e. prefix in NORTH PREFIX TIE (towards leafs), i.e. prefix in NORTH PREFIX TIE
is preferred over SOUTH PREFIX TIE. is preferred over SOUTH PREFIX TIE.
@note: The only purpose of those values is to introduce an @note: The only purpose of those values is to introduce an
ordering whereas an implementation can choose internally ordering whereas an implementation can choose internally
any other values as long the ordering is preserved any other values as long the ordering is preserved
*/ */
enum RouteType { enum RouteType {
Illegal = 0, Illegal = 0,
RouteTypeMinValue = 1, RouteTypeMinValue = 1,
/** First legal value. */ /** First legal value. */
/** Discard routes are most preferred */ /** Discard routes are most preferred */
Discard = 2, Discard = 2,
/** Local prefixes are directly attached prefixes on the /** Local prefixes are directly attached prefixes on the
* system such as e.g. interface routes. * system such as e.g. interface routes.
*/ */
skipping to change at page 149, line 43 skipping to change at page 152, line 22
B.2. encoding.thrift B.2. encoding.thrift
/** /**
Thrift file for packet encodings for RIFT Thrift file for packet encodings for RIFT
*/ */
include "common.thrift" include "common.thrift"
/** Represents protocol encoding schema major version */ /** Represents protocol encoding schema major version */
const common.VersionType protocol_major_version = 2 const common.VersionType protocol_major_version = 4
/** Represents protocol encoding schema minor version */ /** Represents protocol encoding schema minor version */
const common.MinorVersionType protocol_minor_version = 0 const common.MinorVersionType protocol_minor_version = 0
/** Common RIFT packet header. */ /** Common RIFT packet header. */
struct PacketHeader { struct PacketHeader {
/** Major version of protocol. */ /** Major version of protocol. */
1: required common.VersionType major_version = 1: required common.VersionType major_version =
protocol_major_version; protocol_major_version;
/** Minor version of protocol. */ /** Minor version of protocol. */
2: required common.VersionType minor_version = 2: required common.MinorVersionType minor_version =
protocol_minor_version; protocol_minor_version;
/** Node sending the packet, in case of LIE/TIRE/TIDE /** Node sending the packet, in case of LIE/TIRE/TIDE
also the originator of it. */ also the originator of it. */
3: required common.SystemIDType sender; 3: required common.SystemIDType sender;
/** Level of the node sending the packet, required on everything /** Level of the node sending the packet, required on everything
except LIEs. Lack of presence on LIEs indicates UNDEFINED_LEVEL except LIEs. Lack of presence on LIEs indicates UNDEFINED_LEVEL
and is used in ZTP procedures. and is used in ZTP procedures.
*/ */
4: optional common.LevelType level; 4: optional common.LevelType level;
} }
skipping to change at page 155, line 48 skipping to change at page 158, line 27
2: required map<common.SystemIDType, 2: required map<common.SystemIDType,
NodeNeighborsTIEElement> neighbors; NodeNeighborsTIEElement> neighbors;
/** Capabilities of the node. */ /** Capabilities of the node. */
3: required NodeCapabilities capabilities; 3: required NodeCapabilities capabilities;
/** Flags of the node. */ /** Flags of the node. */
4: optional NodeFlags flags; 4: optional NodeFlags flags;
/** Optional node name for easier operations. */ /** Optional node name for easier operations. */
5: optional string name; 5: optional string name;
/** PoD to which the node belongs. */ /** PoD to which the node belongs. */
6: optional common.PodType pod; 6: optional common.PodType pod;
/** optional startup time of the node */
7: optional common.TimestampInSecsType startup_time;
/** If any local links are miscabled, the indication is flooded. */ /** If any local links are miscabled, the indication is flooded. */
10: optional set<common.LinkIDType> miscabled_links; 10: optional set<common.LinkIDType> miscabled_links;
} }
/** Attributes of a prefix. */ /** Attributes of a prefix. */
struct PrefixAttributes { struct PrefixAttributes {
/** Distance of the prefix. */ /** Distance of the prefix. */
2: required common.MetricType metric 2: required common.MetricType metric
 End of changes. 273 change blocks. 
985 lines changed or deleted 1058 lines changed or added
