draft-ietf-p2psip-base-21.txt   draft-ietf-p2psip-base-22.txt 
P2PSIP C. Jennings P2PSIP C. Jennings
Internet-Draft Cisco Internet-Draft Cisco
Intended status: Standards Track B. Lowekamp, Ed. Intended status: Standards Track B. Lowekamp, Ed.
Expires: September 29, 2012 Skype Expires: January 17, 2013 Skype
E. Rescorla E. Rescorla
RTFM, Inc. RTFM, Inc.
S. Baset S. Baset
H. Schulzrinne H. Schulzrinne
Columbia University Columbia University
March 28, 2012 July 16, 2012
REsource LOcation And Discovery (RELOAD) Base Protocol REsource LOcation And Discovery (RELOAD) Base Protocol
draft-ietf-p2psip-base-21 draft-ietf-p2psip-base-22
Abstract Abstract
This specification defines REsource LOcation And Discovery (RELOAD), This specification defines REsource LOcation And Discovery (RELOAD),
a peer-to-peer (P2P) signaling protocol for use on the Internet. A a peer-to-peer (P2P) signaling protocol for use on the Internet. A
P2P signaling protocol provides its clients with an abstract storage P2P signaling protocol provides its clients with an abstract storage
and messaging service between a set of cooperating peers that form and messaging service between a set of cooperating peers that form
the overlay network. RELOAD is designed to support a P2P Session the overlay network. RELOAD is designed to support a P2P Session
Initiation Protocol (P2PSIP) network, but can be utilized by other Initiation Protocol (P2PSIP) network, but can be utilized by other
applications with similar requirements by defining new usages that applications with similar requirements by defining new usages that
skipping to change at page 1, line 48 skipping to change at page 1, line 48
Internet-Drafts are working documents of the Internet Engineering Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet- working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/. Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months Internet-Drafts are draft documents valid for a maximum of six months
and may be updated, replaced, or obsoleted by other documents at any and may be updated, replaced, or obsoleted by other documents at any
time. It is inappropriate to use Internet-Drafts as reference time. It is inappropriate to use Internet-Drafts as reference
material or to cite them other than as "work in progress." material or to cite them other than as "work in progress."
This Internet-Draft will expire on September 29, 2012. This Internet-Draft will expire on January 17, 2013.
Copyright Notice Copyright Notice
Copyright (c) 2012 IETF Trust and the persons identified as the Copyright (c) 2012 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
skipping to change at page 3, line 13 skipping to change at page 3, line 13
than English. than English.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 8 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 8
1.1. Basic Setting . . . . . . . . . . . . . . . . . . . . . 9 1.1. Basic Setting . . . . . . . . . . . . . . . . . . . . . 9
1.2. Architecture . . . . . . . . . . . . . . . . . . . . . . 10 1.2. Architecture . . . . . . . . . . . . . . . . . . . . . . 10
1.2.1. Usage Layer . . . . . . . . . . . . . . . . . . . . 13 1.2.1. Usage Layer . . . . . . . . . . . . . . . . . . . . 13
1.2.2. Message Transport . . . . . . . . . . . . . . . . . 14 1.2.2. Message Transport . . . . . . . . . . . . . . . . . 14
1.2.3. Storage . . . . . . . . . . . . . . . . . . . . . . 15 1.2.3. Storage . . . . . . . . . . . . . . . . . . . . . . 15
1.2.4. Topology Plugin . . . . . . . . . . . . . . . . . . 15 1.2.4. Topology Plugin . . . . . . . . . . . . . . . . . . 16
1.2.5. Forwarding and Link Management Layer . . . . . . . . 16 1.2.5. Forwarding and Link Management Layer . . . . . . . . 16
1.3. Security . . . . . . . . . . . . . . . . . . . . . . . . 17 1.3. Security . . . . . . . . . . . . . . . . . . . . . . . . 17
1.4. Structure of This Document . . . . . . . . . . . . . . . 18 1.4. Structure of This Document . . . . . . . . . . . . . . . 18
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 18 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 18
3. Overlay Management Overview . . . . . . . . . . . . . . . . . 21 3. Overlay Management Overview . . . . . . . . . . . . . . . . . 21
3.1. Security and Identification . . . . . . . . . . . . . . 21 3.1. Security and Identification . . . . . . . . . . . . . . 21
3.1.1. Shared-Key Security . . . . . . . . . . . . . . . . 23 3.1.1. Shared-Key Security . . . . . . . . . . . . . . . . 23
3.2. Clients . . . . . . . . . . . . . . . . . . . . . . . . 23 3.2. Clients . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.1. Client Routing . . . . . . . . . . . . . . . . . . . 24 3.2.1. Client Routing . . . . . . . . . . . . . . . . . . . 24
3.2.2. Minimum Functionality Requirements for Clients . . . 24 3.2.2. Minimum Functionality Requirements for Clients . . . 24
skipping to change at page 5, line 40 skipping to change at page 5, line 40
7.4.3.2. Response Definition . . . . . . . . . . . . . . . 101 7.4.3.2. Response Definition . . . . . . . . . . . . . . . 101
7.4.4. Find . . . . . . . . . . . . . . . . . . . . . . . . 103 7.4.4. Find . . . . . . . . . . . . . . . . . . . . . . . . 103
7.4.4.1. Request Definition . . . . . . . . . . . . . . . 103 7.4.4.1. Request Definition . . . . . . . . . . . . . . . 103
7.4.4.2. Response Definition . . . . . . . . . . . . . . . 104 7.4.4.2. Response Definition . . . . . . . . . . . . . . . 104
7.4.5. Defining New Kinds . . . . . . . . . . . . . . . . . 105 7.4.5. Defining New Kinds . . . . . . . . . . . . . . . . . 105
8. Certificate Store Usage . . . . . . . . . . . . . . . . . . . 105 8. Certificate Store Usage . . . . . . . . . . . . . . . . . . . 105
9. TURN Server Usage . . . . . . . . . . . . . . . . . . . . . . 106 9. TURN Server Usage . . . . . . . . . . . . . . . . . . . . . . 106
10. Chord Algorithm . . . . . . . . . . . . . . . . . . . . . . . 108 10. Chord Algorithm . . . . . . . . . . . . . . . . . . . . . . . 108
10.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 109 10.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 109
10.2. Hash Function . . . . . . . . . . . . . . . . . . . . . 109 10.2. Hash Function . . . . . . . . . . . . . . . . . . . . . 109
10.3. Routing . . . . . . . . . . . . . . . . . . . . . . . . 109 10.3. Routing . . . . . . . . . . . . . . . . . . . . . . . . 110
10.4. Redundancy . . . . . . . . . . . . . . . . . . . . . . . 110 10.4. Redundancy . . . . . . . . . . . . . . . . . . . . . . . 110
10.5. Joining . . . . . . . . . . . . . . . . . . . . . . . . 110 10.5. Joining . . . . . . . . . . . . . . . . . . . . . . . . 110
10.6. Routing Attaches . . . . . . . . . . . . . . . . . . . . 111 10.6. Routing Attaches . . . . . . . . . . . . . . . . . . . . 112
10.7. Updates . . . . . . . . . . . . . . . . . . . . . . . . 111 10.7. Updates . . . . . . . . . . . . . . . . . . . . . . . . 112
10.7.1. Handling Neighbor Failures . . . . . . . . . . . . . 113 10.7.1. Handling Neighbor Failures . . . . . . . . . . . . . 113
10.7.2. Handling Finger Table Entry Failure . . . . . . . . 114 10.7.2. Handling Finger Table Entry Failure . . . . . . . . 114
10.7.3. Receiving Updates . . . . . . . . . . . . . . . . . 114 10.7.3. Receiving Updates . . . . . . . . . . . . . . . . . 114
10.7.4. Stabilization . . . . . . . . . . . . . . . . . . . 115 10.7.4. Stabilization . . . . . . . . . . . . . . . . . . . 115
10.7.4.1. Updating neighbor table . . . . . . . . . . . . . 115 10.7.4.1. Updating neighbor table . . . . . . . . . . . . . 115
10.7.4.2. Refreshing finger table . . . . . . . . . . . . . 115 10.7.4.2. Refreshing finger table . . . . . . . . . . . . . 116
10.7.4.3. Adjusting finger table size . . . . . . . . . . . 116 10.7.4.3. Adjusting finger table size . . . . . . . . . . . 116
10.7.4.4. Detecting partitioning . . . . . . . . . . . . . 117 10.7.4.4. Detecting partitioning . . . . . . . . . . . . . 117
10.8. Route query . . . . . . . . . . . . . . . . . . . . . . 117 10.8. Route query . . . . . . . . . . . . . . . . . . . . . . 117
10.9. Leaving . . . . . . . . . . . . . . . . . . . . . . . . 118 10.9. Leaving . . . . . . . . . . . . . . . . . . . . . . . . 118
11. Enrollment and Bootstrap . . . . . . . . . . . . . . . . . . 119 11. Enrollment and Bootstrap . . . . . . . . . . . . . . . . . . 119
11.1. Overlay Configuration . . . . . . . . . . . . . . . . . 119 11.1. Overlay Configuration . . . . . . . . . . . . . . . . . 119
11.1.1. Relax NG Grammar . . . . . . . . . . . . . . . . . . 125 11.1.1. Relax NG Grammar . . . . . . . . . . . . . . . . . . 126
11.2. Discovery Through Configuration Server . . . . . . . . . 128 11.2. Discovery Through Configuration Server . . . . . . . . . 128
11.3. Credentials . . . . . . . . . . . . . . . . . . . . . . 128 11.3. Credentials . . . . . . . . . . . . . . . . . . . . . . 129
11.3.1. Self-Generated Credentials . . . . . . . . . . . . . 130 11.3.1. Self-Generated Credentials . . . . . . . . . . . . . 131
11.4. Searching for a Bootstrap Node . . . . . . . . . . . . . 130 11.4. Searching for a Bootstrap Node . . . . . . . . . . . . . 131
11.5. Contacting a Bootstrap Node . . . . . . . . . . . . . . 130 11.5. Contacting a Bootstrap Node . . . . . . . . . . . . . . 131
12. Message Flow Example . . . . . . . . . . . . . . . . . . . . 131 12. Message Flow Example . . . . . . . . . . . . . . . . . . . . 132
13. Security Considerations . . . . . . . . . . . . . . . . . . . 137 13. Security Considerations . . . . . . . . . . . . . . . . . . . 138
13.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 137 13.1. Overview . . . . . . . . . . . . . . . . . . . . . . . . 138
13.2. Attacks on P2P Overlays . . . . . . . . . . . . . . . . 138 13.2. Attacks on P2P Overlays . . . . . . . . . . . . . . . . 139
13.3. Certificate-based Security . . . . . . . . . . . . . . . 138 13.3. Certificate-based Security . . . . . . . . . . . . . . . 139
13.4. Shared-Secret Security . . . . . . . . . . . . . . . . . 139 13.4. Shared-Secret Security . . . . . . . . . . . . . . . . . 140
13.5. Storage Security . . . . . . . . . . . . . . . . . . . . 140 13.5. Storage Security . . . . . . . . . . . . . . . . . . . . 141
13.5.1. Authorization . . . . . . . . . . . . . . . . . . . 140 13.5.1. Authorization . . . . . . . . . . . . . . . . . . . 141
13.5.2. Distributed Quota . . . . . . . . . . . . . . . . . 141 13.5.2. Distributed Quota . . . . . . . . . . . . . . . . . 142
13.5.3. Correctness . . . . . . . . . . . . . . . . . . . . 141 13.5.3. Correctness . . . . . . . . . . . . . . . . . . . . 142
13.5.4. Residual Attacks . . . . . . . . . . . . . . . . . . 141 13.5.4. Residual Attacks . . . . . . . . . . . . . . . . . . 142
13.6. Routing Security . . . . . . . . . . . . . . . . . . . . 142 13.6. Routing Security . . . . . . . . . . . . . . . . . . . . 143
13.6.1. Background . . . . . . . . . . . . . . . . . . . . . 142 13.6.1. Background . . . . . . . . . . . . . . . . . . . . . 143
13.6.2. Admissions Control . . . . . . . . . . . . . . . . . 143 13.6.2. Admissions Control . . . . . . . . . . . . . . . . . 144
13.6.3. Peer Identification and Authentication . . . . . . . 143 13.6.3. Peer Identification and Authentication . . . . . . . 144
13.6.4. Protecting the Signaling . . . . . . . . . . . . . . 144 13.6.4. Protecting the Signaling . . . . . . . . . . . . . . 145
13.6.5. Routing Loops and Dos Attacks . . . . . . . . . . . 144 13.6.5. Routing Loops and Dos Attacks . . . . . . . . . . . 145
13.6.6. Residual Attacks . . . . . . . . . . . . . . . . . . 144 13.6.6. Residual Attacks . . . . . . . . . . . . . . . . . . 145
14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 145 14. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 146
14.1. Well-Known URI Registration . . . . . . . . . . . . . . 145 14.1. Well-Known URI Registration . . . . . . . . . . . . . . 146
14.2. Port Registrations . . . . . . . . . . . . . . . . . . . 145 14.2. Port Registrations . . . . . . . . . . . . . . . . . . . 146
14.3. Overlay Algorithm Types . . . . . . . . . . . . . . . . 146 14.3. Overlay Algorithm Types . . . . . . . . . . . . . . . . 147
14.4. Access Control Policies . . . . . . . . . . . . . . . . 146 14.4. Access Control Policies . . . . . . . . . . . . . . . . 147
14.5. Application-ID . . . . . . . . . . . . . . . . . . . . . 147 14.5. Application-ID . . . . . . . . . . . . . . . . . . . . . 148
14.6. Data Kind-ID . . . . . . . . . . . . . . . . . . . . . . 147 14.6. Data Kind-ID . . . . . . . . . . . . . . . . . . . . . . 148
14.7. Data Model . . . . . . . . . . . . . . . . . . . . . . . 148 14.7. Data Model . . . . . . . . . . . . . . . . . . . . . . . 149
14.8. Message Codes . . . . . . . . . . . . . . . . . . . . . 148 14.8. Message Codes . . . . . . . . . . . . . . . . . . . . . 149
14.9. Error Codes . . . . . . . . . . . . . . . . . . . . . . 150 14.9. Error Codes . . . . . . . . . . . . . . . . . . . . . . 151
14.10. Overlay Link Types . . . . . . . . . . . . . . . . . . . 150 14.10. Overlay Link Types . . . . . . . . . . . . . . . . . . . 151
14.11. Overlay Link Protocols . . . . . . . . . . . . . . . . . 151 14.11. Overlay Link Protocols . . . . . . . . . . . . . . . . . 152
14.12. Forwarding Options . . . . . . . . . . . . . . . . . . . 151 14.12. Forwarding Options . . . . . . . . . . . . . . . . . . . 152
14.13. Probe Information Types . . . . . . . . . . . . . . . . 152 14.13. Probe Information Types . . . . . . . . . . . . . . . . 153
14.14. Message Extensions . . . . . . . . . . . . . . . . . . . 152 14.14. Message Extensions . . . . . . . . . . . . . . . . . . . 153
14.15. reload URI Scheme . . . . . . . . . . . . . . . . . . . 152 14.15. reload URI Scheme . . . . . . . . . . . . . . . . . . . 153
14.15.1. URI Registration . . . . . . . . . . . . . . . . . . 153 14.15.1. URI Registration . . . . . . . . . . . . . . . . . . 154
14.16. Media Type Registration . . . . . . . . . . . . . . . . 154 14.16. Media Type Registration . . . . . . . . . . . . . . . . 155
14.17. XML Name Space Registration . . . . . . . . . . . . . . 155 14.17. XML Name Space Registration . . . . . . . . . . . . . . 156
14.17.1. Config URL . . . . . . . . . . . . . . . . . . . . . 155 14.17.1. Config URL . . . . . . . . . . . . . . . . . . . . . 156
14.17.2. Config Chord URL . . . . . . . . . . . . . . . . . . 155 14.17.2. Config Chord URL . . . . . . . . . . . . . . . . . . 156
15. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 155 15. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 156
16. References . . . . . . . . . . . . . . . . . . . . . . . . . 156 16. References . . . . . . . . . . . . . . . . . . . . . . . . . 157
16.1. Normative References . . . . . . . . . . . . . . . . . . 156 16.1. Normative References . . . . . . . . . . . . . . . . . . 157
16.2. Informative References . . . . . . . . . . . . . . . . . 158 16.2. Informative References . . . . . . . . . . . . . . . . . 159
Appendix A. Routing Alternatives . . . . . . . . . . . . . . . . 161 Appendix A. Routing Alternatives . . . . . . . . . . . . . . . . 162
A.1. Iterative vs Recursive . . . . . . . . . . . . . . . . . 161 A.1. Iterative vs Recursive . . . . . . . . . . . . . . . . . 162
A.2. Symmetric vs Forward response . . . . . . . . . . . . . 162 A.2. Symmetric vs Forward response . . . . . . . . . . . . . 163
A.3. Direct Response . . . . . . . . . . . . . . . . . . . . 162 A.3. Direct Response . . . . . . . . . . . . . . . . . . . . 163
A.4. Relay Peers . . . . . . . . . . . . . . . . . . . . . . 163 A.4. Relay Peers . . . . . . . . . . . . . . . . . . . . . . 164
A.5. Symmetric Route Stability . . . . . . . . . . . . . . . 164 A.5. Symmetric Route Stability . . . . . . . . . . . . . . . 165
Appendix B. Why Clients? . . . . . . . . . . . . . . . . . . . . 164 Appendix B. Why Clients? . . . . . . . . . . . . . . . . . . . . 165
B.1. Why Not Only Peers? . . . . . . . . . . . . . . . . . . 164 B.1. Why Not Only Peers? . . . . . . . . . . . . . . . . . . 165
B.2. Clients as Application-Level Agents . . . . . . . . . . 165 B.2. Clients as Application-Level Agents . . . . . . . . . . 166
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 165 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 166
1. Introduction 1. Introduction
This document defines REsource LOcation And Discovery (RELOAD), a This document defines REsource LOcation And Discovery (RELOAD), a
peer-to-peer (P2P) signaling protocol for use on the Internet. It peer-to-peer (P2P) signaling protocol for use on the Internet. It
provides a generic, self-organizing overlay network service, allowing provides a generic, self-organizing overlay network service, allowing
nodes to efficiently route messages to other nodes and to efficiently nodes to route messages to other nodes and to store and retrieve data
store and retrieve data in the overlay. RELOAD provides several in the overlay. RELOAD provides several features that are critical
features that are critical for a successful P2P protocol for the for a successful P2P protocol for the Internet:
Internet:
Security Framework: A P2P network will often be established among a Security Framework: A P2P network will often be established among a
set of peers that do not trust each other. RELOAD leverages a set of peers that do not trust each other. RELOAD leverages a
central enrollment server to provide credentials for each peer central enrollment server to provide credentials for each peer
which can then be used to authenticate each operation. This which can then be used to authenticate each operation. This
greatly reduces the possible attack surface. greatly reduces the possible attack surface.
Usage Model: RELOAD is designed to support a variety of Usage Model: RELOAD is designed to support a variety of
applications, including P2P multimedia communications with the applications, including P2P multimedia communications with the
Session Initiation Protocol [I-D.ietf-p2psip-sip]. RELOAD allows Session Initiation Protocol [I-D.ietf-p2psip-sip]. RELOAD allows
skipping to change at page 13, line 39 skipping to change at page 13, line 39
| | +------------------+ | | +------------------+
| | ---------------------------------- | | ----------------------------------
Transport | Link | +-------+ +------+ Transport | Link | +-------+ +------+
| | |TLS | |DTLS | ... | | |TLS | |DTLS | ...
| | +-------+ +------+ | | +-------+ +------+
-------------+-----------------+------------------------------------ -------------+-----------------+------------------------------------
Network | Network |
| |
Link | Link |
In addition to the above components, nodes communicate with a central
provisioning infrastructure (not shown) to get configuration
information, authentication credentials, and the initial set of nodes
to communicate with to join the overlay.
1.2.1. Usage Layer 1.2.1. Usage Layer
The top layer, called the Usage Layer, has application usages, such The top layer, called the Usage Layer, has application usages, such
as the SIP Registration Usage [I-D.ietf-p2psip-sip], that use the as the SIP Registration Usage [I-D.ietf-p2psip-sip], that use the
abstract Message Transport Service provided by RELOAD. The goal of abstract Message Transport Service provided by RELOAD. The goal of
this layer is to implement application-specific usages of the generic this layer is to implement application-specific usages of the generic
overlay services provided by RELOAD. The usage defines how a overlay services provided by RELOAD. The usage defines how a
specific application maps its data into something that can be stored specific application maps its data into something that can be stored
in the overlay, where to store the data, how to secure the data, and in the overlay, where to store the data, how to secure the data, and
finally how applications can retrieve and use the data. finally how applications can retrieve and use the data.
skipping to change at page 45, line 24 skipping to change at page 45, line 24
nodes to participate in multiple overlays and to detect accidental nodes to participate in multiple overlays and to detect accidental
misconfiguration. This is not a security critical function. The misconfiguration. This is not a security critical function. The
overlay name MUST consist of a sequence of charters what would be overlay name MUST consist of a sequence of charters what would be
allowable as a DNS name. allowable as a DNS name.
configuration_sequence: The sequence number of the configuration configuration_sequence: The sequence number of the configuration
file. file.
version: The version of the RELOAD protocol being used. This is a version: The version of the RELOAD protocol being used. This is a
fixed point integer between 0.1 and 25.4. This document describes fixed point integer between 0.1 and 25.4. This document describes
version 0.1, with a value of 0x01. [[ Note to RFC Editor: Please version 1.0, with a value of 0x0a. [Note: Pre-RFC versions used
update this to version 1.0 with value of 0x0a and remove this version number 0.1]. Nodes MUST reject messages with other
note. ]] versions.
ttl: An 8 bit field indicating the number of iterations, or hops, a ttl: An 8 bit field indicating the number of iterations, or hops, a
message can experience before it is discarded. The TTL value MUST message can experience before it is discarded. The TTL value MUST
be decremented by one at every hop along the route the message be decremented by one at every hop along the route the message
traverses just before transmission. If a received message has a traverses just before transmission. If a received message has a
TTL of 0, and the message is not destined for the receiving node, TTL of 0, and the message is not destined for the receiving node,
then the message MUST NOT be propagated further and and a then the message MUST NOT be propagated further and and a
"Error_TTL_Exceeded" error should be generated. The initial value "Error_TTL_Exceeded" error should be generated. The initial value
of the TTL SHOULD be 100 and MUST NOT exceed 100 unless defined of the TTL SHOULD be 100 and MUST NOT exceed 100 unless defined
otherwise by the overlay configuration. Implementations which otherwise by the overlay configuration. Implementations which
skipping to change at page 108, line 42 skipping to change at page 108, line 42
the ring from the peer. This change was made to simplify the ring from the peer. This change was made to simplify
discussion and implementation of variable sized finger tables. discussion and implementation of variable sized finger tables.
However, with either approach no more than O(log N) entries should However, with either approach no more than O(log N) entries should
typically be stored in a finger table. typically be stored in a finger table.
o The stabilize() and fix_fingers() algorithms in the original Chord o The stabilize() and fix_fingers() algorithms in the original Chord
algorithm are merged into a single periodic process. algorithm are merged into a single periodic process.
Stabilization is implemented slightly differently because of the Stabilization is implemented slightly differently because of the
larger neighborhood, and fix_fingers is not as aggressive to larger neighborhood, and fix_fingers is not as aggressive to
reduce load, nor does it search for optimal matches of the finger reduce load, nor does it search for optimal matches of the finger
table entries. table entries.
o RELOAD uses a 128 bit hash instead of a 160 bit hash, as RELOAD is o RELOAD allows for a 128 bit hash instead of a 160 bit hash, as
not designed to be used in networks with close to or more than RELOAD is not designed to be used in networks with close to or
2^128 nodes (and it is hard to see how one would assemble such a more than 2^128 nodes or objects (and it is hard to see how one
network). would assemble such a network).
o RELOAD uses randomized finger entries as described in o RELOAD uses randomized finger entries as described in
Section 10.7.4.2. Section 10.7.4.2.
o This algorithm allows the use of either reactive or periodic o This algorithm allows the use of either reactive or periodic
recovery. The original Chord paper used periodic recovery. recovery. The original Chord paper used periodic recovery.
Reactive recovery provides better performance in small overlays, Reactive recovery provides better performance in small overlays,
but is believed to be unstable in large (>1000) overlays with high but is believed to be unstable in large (>1000) overlays with high
levels of churn [handling-churn-usenix04]. The overlay levels of churn [handling-churn-usenix04]. The overlay
configuration file specifies a "chord-reactive" element that configuration file specifies a "chord-reactive" element that
indicates whether reactive recovery should be used. indicates whether reactive recovery should be used.
10.1. Overview 10.1. Overview
The algorithm described here is a modified version of the Chord The algorithm described here is a modified version of the Chord
algorithm. Each peer keeps track of a finger table and a neighbor algorithm. In Chord (and in the algorithm described here), nodes are
table. The neighbor table contains at least the three peers before arranged in a ring with node n being adjacent to nodes n-1 and n+1,
and after this peer in the DHT ring. There may not be three entries with all arithmetic being done modulo 2^{k}, where k is the length of
in all cases such as small rings or while the ring topology is the Node-Id in bits, so that node 2^{k} - 1 is directly before node
changing. The first entry in the finger table contains the peer 0.
half-way around the ring from this peer; the second entry contains
the peer that is 1/4 of the way around; the third entry contains the
peer that is 1/8th of the way around, and so on. Fundamentally, the
chord data structure can be thought of a doubly-linked list formed by
knowing the successors and predecessor peers in the neighbor table,
sorted by the Node-ID. As long as the successor peers are correct,
the DHT will return the correct result. The pointers to the prior
peers are kept to enable the insertion of new peers into the list
structure. Keeping multiple predecessor and successor pointers makes
it possible to maintain the integrity of the data structure even when
consecutive peers simultaneously fail. The finger table forms a skip
list, so that entries in the linked list can be found in O(log(N))
time instead of the typical O(N) time that a linked list would
provide.
A peer, n, is responsible for a particular Resource-ID k if k is less Each peer keeps track of a finger table and a neighbor table. The
than or equal to n and k is greater than p, where p is the Node-ID of neighbor table contains at least the three peers before and after
this peer in the DHT ring. There may not be three entries in all
cases such as small rings or while the ring topology is changing.
The first entry in the finger table contains the peer half-way around
the ring from this peer; the second entry contains the peer that is
1/4 of the way around; the third entry contains the peer that is
1/8th of the way around, and so on. Fundamentally, the chord DHT can
be thought of a doubly-linked list formed by knowing the successors
and predecessor peers in the neighbor table, sorted by the Node-ID.
As long as the successor peers are correct, the DHT will return the
correct result. The pointers to the prior peers are kept to enable
the insertion of new peers into the list structure. Keeping multiple
predecessor and successor pointers makes it possible to maintain the
integrity of the data structure even when consecutive peers
simultaneously fail. The finger table forms a skip list, so that
entries in the linked list can be found in O(log(N)) time instead of
the typical O(N) time that a linked list would provide where N
represents the number of nodes in the DHT.
The neighbor and finger table entries contain logical Node-IDs as
values but the actual mapping of an IP level addressing information
to reach that Node-ID is kept in the connection table.
A peer, x, is responsible for a particular Resource-ID k if k is less
than or equal to x and k is greater than p, where p is the Node-ID of
the previous peer in the neighbor table. Care must be taken when the previous peer in the neighbor table. Care must be taken when
computing to note that all math is modulo 2^128. computing to note that all math is modulo 2^128.
10.2. Hash Function 10.2. Hash Function
For this Chord based topology plugin, the size of the Resource-ID is For this Chord based topology plugin, the size of the Resource-ID is
128 bits. The hash of a Resource-ID MUST be computed using SHA-1 128 bits. The hash of a Resource-ID MUST be computed using SHA-1
[RFC3174]then truncating the SHA-1 result to the most significant 128 [RFC3174]then truncating the SHA-1 result to the most significant 128
bits. bits.
10.3. Routing 10.3. Routing
The routing table is the union of the neighbor table and the finger The routing table is conceptually the union of the neighbor table and
table. the finger table.
If a peer is not responsible for a Resource-ID k, but is directly If a peer is not responsible for a Resource-ID k, but is directly
connected to a node with Node-ID k, then it MUST route the message to connected to a node with Node-ID k, then it MUST route the message to
that node. Otherwise, it MUST route the request to the peer in the that node. Otherwise, it MUST route the request to the peer in the
routing table that has the largest Node-ID that is in the interval routing table that has the largest Node-ID that is in the interval
between the peer and k. If no such node is found, it finds the between the peer and k. If no such node is found, it finds the
smallest Node-Id that is greater than k and MUST route the message to smallest Node-Id that is greater than k and MUST route the message to
that node. that node.
10.4. Redundancy 10.4. Redundancy
When a peer receives a Store request for Resource-ID k, and it is When a peer receives a Store request for Resource-ID k, and it is
responsible for Resource-ID k, it MUST store the data and returns a responsible for Resource-ID k, it MUST store the data and returns a
success response. It MUST then sends a Store request to its success response. It MUST then send a Store request to its successor
successor in the neighbor table and to that peer's successor. Note in the neighbor table and to that peer's successor. Note that these
that these Store requests are addressed to those specific peers, even Store requests are addressed to those specific peers, even though the
though the Resource-ID they are being asked to store is outside the Resource-ID they are being asked to store is outside the range that
range that they are responsible for. The peers receiving these they are responsible for. The peers receiving these SHOULD check
SHOULD check they came from an appropriate predecessor in their they came from an appropriate predecessor in their neighbor table and
neighbor table and that they are in a range that this predecessor is that they are in a range that this predecessor is responsible for,
responsible for, and then they MUST store the data. They do not and then they MUST store the data. They do not themselves perform
themselves perform further Stores because they can determine that further Stores because they can determine that they are not
they are not responsible for the Resource-ID. responsible for the Resource-ID.
Managing replicas as the overlay changes is described in Managing replicas as the overlay changes is described in
Section 10.7.3. Section 10.7.3.
The sequential replicas used in this overlay algorithm protect The sequential replicas used in this overlay algorithm protect
against peer failure but not against malicious peers. Additional against peer failure but not against malicious peers. Additional
replication from the Usage is required to protect resources from such replication from the Usage is required to protect resources from such
attacks, as discussed in Section 13.5.4. attacks, as discussed in Section 13.5.4.
10.5. Joining 10.5. Joining
The join process for a joining party (JP) with Node-ID n is as The join process for a joining party (JP) with Node-ID n is as
follows. follows.
1. JP MUST connect to its chosen bootstrap node. 1. JP MUST connect to its chosen bootstrap node.
2. JP SHOULD send an Attach request to the admitting peer (AP) for 2. JP SHOULD send an Attach request to the admitting peer (AP) for
Node-ID n. The "send_update" flag should be used to acquire the Node-ID n. The "send_update" flag can be used to acquire the
routing table for AP. routing table for AP.
3. JP SHOULD send Attach requests to initiate connections to each of 3. JP SHOULD send Attach requests to initiate connections to each of
the peers in the neighbor table as well as to the desired finger the peers in the neighbor table as well as to the desired finger
table entries. Note that this does not populate their routing table entries. Note that this does not populate their routing
tables, but only their connection tables, so JP will not get tables, but only their connection tables, so JP will not get
messages that it is expected to route to other nodes. messages that it is expected to route to other nodes.
4. JP MUST enter all the peers it has contacted into its routing 4. JP MUST enter all the peers it has successfully contacted into
table. its routing table.
5. JP MUST send a Join to AP. The AP sends the response to the 5. JP MUST send a Join to AP. The AP sends the response to the
Join. Join.
6. AP MUST do a series of Store requests to JP to store the data 6. AP MUST do a series of Store requests to JP to store the data
that JP will be responsible for. that JP will be responsible for.
7. AP MUST send JP an Update explicitly labeling JP as its 7. AP MUST send JP an Update explicitly labeling JP as its
predecessor. At this point, JP is part of the ring and predecessor. At this point, JP is part of the ring and
responsible for a section of the overlay. AP can now forget any responsible for a section of the overlay. AP MAY now forget any
data which is assigned to JP and not AP. data which is assigned to JP and not AP. AP SHOULD not forget
any data where AP is the replica set for the data.
8. The AP MUST send an Update to all of its neighbors with the new 8. The AP MUST send an Update to all of its neighbors with the new
values of its neighbor set (including JP). values of its neighbor set (including JP).
9. The JP MUST send Updates to all the peers in its neighbor table. 9. The JP MUST send Updates to all the peers in its neighbor table.
If JP sends an Attach to AP with send_update, it immediately knows If JP sends an Attach to AP with send_update, it immediately knows
most of its expected neighbors from AP's routing table update and can most of its expected neighbors from AP's routing table update and can
directly connect to them. This is the RECOMMENDED procedure. directly connect to them. This is the RECOMMENDED procedure.
If for some reason JP does not get AP's routing table, it can still If for some reason JP does not get AP's routing table, it can still
populate its neighbor table incrementally. It sends a Ping directed populate its neighbor table incrementally. It sends a Ping directed
at Resource-ID n+1 (directly after its own Resource-ID). This allows at Resource-ID n+1 (directly after its own Resource-ID). This allows
it to discover its own successor. Call that node p0. It then sends it to discover its own successor. Call that node p0. It then sends
a ping to p0+1 to discover its successor (p1). This process can be a ping to p0+1 to discover its successor (p1). This process can be
repeated to discover as many successors as desired. The values for repeated to discover as many successors as desired. The values for
the two peers before p will be found at a later stage when n receives the two peers before p will be found at a later stage when n receives
an Update. An alternate procedure is to send Attaches to those nodes an Update. An alternate procedure is to send Attaches to those nodes
rather than pings, which forms the connections immediately but may be rather than pings, which forms the connections immediately but may be
slower if the nodes need to collect ICE candidates, thus reducing slower if the nodes need to collect ICE candidates, thus reducing
parallelism. parallelism.
In order to set up its finger table entry for peer i, JP simply sends In order to set up its i'th finger table entry, JP simply sends an
an Attach to peer (n+2^(128-i). This will be routed to a peer in Attach to peer n+2^(128-i). This will be routed to a peer in
approximately the right location around the ring. approximately the right location around the ring. (Note the first
entry in the finger table has i=1 and not i=0 in this formulation).
The joining peer MUST NOT send any Update message placing itself in The joining peer MUST NOT send any Update message placing itself in
the overlay until it has successfully completed an Attach with each the overlay until it has successfully completed an Attach with each
peer that should be in its neighbor table. peer that should be in its neighbor table.
10.6. Routing Attaches 10.6. Routing Attaches
When a peer needs to Attach to a new peer in its neighbor table, it When a peer needs to Attach to a new peer in its neighbor table, it
MUST source-route the Attach request through the peer from which it MUST source-route the Attach request through the peer from which it
learned the new peer's Node-ID. Source-routing these requests allows learned the new peer's Node-ID. Source-routing these requests allows
skipping to change at page 113, line 45 skipping to change at page 114, line 12
with the best match it has from the other peers in its routing table. with the best match it has from the other peers in its routing table.
If using reactive recovery, it then sends an immediate Update to all If using reactive recovery, it then sends an immediate Update to all
nodes in its Neighbor Table. The update will contain all the Node- nodes in its Neighbor Table. The update will contain all the Node-
IDs of the current entries of the table (after the failed one has IDs of the current entries of the table (after the failed one has
been removed). Note that when replacing a successor the peer SHOULD been removed). Note that when replacing a successor the peer SHOULD
delay the creation of new replicas for successor replacement hold- delay the creation of new replicas for successor replacement hold-
down time (30 seconds) after removing the failed entry from its down time (30 seconds) after removing the failed entry from its
neighbor table in order to allow a triggered update to inform it of a neighbor table in order to allow a triggered update to inform it of a
better match for its neighbor table. better match for its neighbor table.
If the neighbor failure effects the peer's range of responsible IDs, If the neighbor failure affects the peer's range of responsible IDs,
then the Update MUST be sent to all nodes in its Connection Table. then the Update MUST be sent to all nodes in its Connection Table.
A peer MAY attempt to reestablish connectivity with a lost neighbor A peer MAY attempt to reestablish connectivity with a lost neighbor
either by waiting additional time to see if connectivity returns or either by waiting additional time to see if connectivity returns or
by actively routing a new Attach to the lost peer. Details for these by actively routing a new Attach to the lost peer. Details for these
procedures are beyond the scope of this document. In no event does procedures are beyond the scope of this document. In no event does
an attempt to reestablish connectivity with a lost neighbor allow the an attempt to reestablish connectivity with a lost neighbor allow the
peer to remain in the neighbor table. Such a peer is returned to the peer to remain in the neighbor table. Such a peer is returned to the
neighbor table once connectivity is reestablished. neighbor table once connectivity is reestablished.
skipping to change at page 114, line 27 skipping to change at page 114, line 41
If a finger table entry is found to have failed, all references to If a finger table entry is found to have failed, all references to
the failed peer are removed from the finger table and replaced with the failed peer are removed from the finger table and replaced with
the closest preceding peer from the finger table or neighbor table. the closest preceding peer from the finger table or neighbor table.
If using reactive recovery, the peer initiates a search for a new If using reactive recovery, the peer initiates a search for a new
finger table entry as described below. finger table entry as described below.
10.7.3. Receiving Updates 10.7.3. Receiving Updates
When a peer, N, receives an Update request, it examines the Node-IDs When a peer, x, receives an Update request, it examines the Node-IDs
in the UpdateReq and at its neighbor table and decides if this in the UpdateReq and at its neighbor table and decides if this
UpdateReq would change its neighbor table. This is done by taking UpdateReq would change its neighbor table. This is done by taking
the set of peers currently in the neighbor table and comparing them the set of peers currently in the neighbor table and comparing them
to the peers in the update request. There are two major cases: to the peers in the update request. There are two major cases:
o The UpdateReq contains peers that match N's neighbor table, so no o The UpdateReq contains peers that match x's neighbor table, so no
change is needed to the neighbor set. change is needed to the neighbor set.
o The UpdateReq contains peers N does not know about that should be o The UpdateReq contains peers x does not know about that should be
in N's neighbor table, i.e. they are closer than entries in the in x's neighbor table, i.e. they are closer than entries in the
neighbor table. neighbor table.
In the first case, no change is needed. In the first case, no change is needed.
In the second case, N MUST attempt to Attach to the new peers and if In the second case, x MUST attempt to Attach to the new peers and if
it is successful it MUST adjust its neighbor set accordingly. Note it is successful it MUST adjust its neighbor set accordingly. Note
that it can maintain the now inferior peers as neighbors, but it MUST that it can maintain the now inferior peers as neighbors, but it MUST
remember the closer ones. remember the closer ones.
After any Pings and Attaches are done, if the neighbor table changes After any Pings and Attaches are done, if the neighbor table changes
and the peer is using reactive recovery, the peer sends an Update and the peer is using reactive recovery, the peer sends an Update
request to each member of its Connection Table. These Update request to each member of its Connection Table. These Update
requests are what end up filling in the predecessor/successor tables requests are what end up filling in the predecessor/successor tables
of peers that this peer is a neighbor to. A peer MUST NOT enter of peers that this peer is a neighbor to. A peer MUST NOT enter
itself in its successor or predecessor table and instead should leave itself in its successor or predecessor table and instead should leave
the entries empty. the entries empty.
If peer N is responsible for a Resource-ID R, and N discovers that If peer x is responsible for a Resource-ID R, and x discovers that
the replica set for R (the next two nodes in its successor set) has the replica set for R (the next two nodes in its successor set) has
changed, it MUST send a Store for any data associated with R to any changed, it MUST send a Store for any data associated with R to any
new node in the replica set. It SHOULD NOT delete data from peers new node in the replica set. It SHOULD NOT delete data from peers
which have left the replica set. which have left the replica set.
When a peer N detects that it is no longer in the replica set for a When a peer x detects that it is no longer in the replica set for a
resource R (i.e., there are three predecessors between N and R), it resource R (i.e., there are three predecessors between x and R), it
SHOULD delete all data associated with R from its local store. SHOULD delete all data associated with R from its local store.
When a peer discovers that its range of responsible IDs have changed, When a peer discovers that its range of responsible IDs have changed,
it MUST send an Update to all entries in its connection table. it MUST send an Update to all entries in its connection table.
10.7.4. Stabilization 10.7.4. Stabilization
There are four components to stabilization: There are four components to stabilization:
1. exchange Updates with all peers in its neighbor table to exchange 1. exchange Updates with all peers in its neighbor table to exchange
state. state.
skipping to change at page 115, line 43 skipping to change at page 116, line 8
Connection Table. The purpose of this is to keep the predecessor and Connection Table. The purpose of this is to keep the predecessor and
successor lists up to date and to detect failed peers. The default successor lists up to date and to detect failed peers. The default
time is about every ten minutes, but the configuration server SHOULD time is about every ten minutes, but the configuration server SHOULD
set this in the configuration document using the "chord-update- set this in the configuration document using the "chord-update-
interval" element (denominated in seconds.) A peer SHOULD randomly interval" element (denominated in seconds.) A peer SHOULD randomly
offset these Update requests so they do not occur all at once. offset these Update requests so they do not occur all at once.
10.7.4.2. Refreshing finger table 10.7.4.2. Refreshing finger table
A peer MUST periodically search for new peers to replace invalid A peer MUST periodically search for new peers to replace invalid
entries in the finger table. A finger table entry i is valid if it entries in the finger table. For peer x, the i'th finger table entry
is in the range [ n+2^( 128-i ) , n+2^( 128-(i-1) )-1 ]. Invalid is valid if it is in the range [ x+2^( 128-i ), x+2^( 128-(i-1) )-1
entries occur in the finger table when a previous finger table entry ]. Invalid entries occur in the finger table when a previous finger
has failed or when no peer has been found in that range. table entry has failed or when no peer has been found in that range.
A peer SHOULD NOT send Ping requests looking for new finger table A peer SHOULD NOT send Ping requests looking for new finger table
entries more often than the configuration element "chord-ping- entries more often than the configuration element "chord-ping-
interval", which defaults to 3600 seconds (one per hour). interval", which defaults to 3600 seconds (one per hour).
Two possible methods for searching for new peers for the finger table Two possible methods for searching for new peers for the finger table
entries are presented: entries are presented:
Alternative 1: A peer selects one entry in the finger table from Alternative 1: A peer selects one entry in the finger table from
among the invalid entries. It pings for a new peer for that finger among the invalid entries. It pings for a new peer for that finger
skipping to change at page 129, line 48 skipping to change at page 130, line 25
user returns (e.g., to have their certificate re-issued) return user returns (e.g., to have their certificate re-issued) return
the same Node-ID, thus avoiding the need for implementations to the same Node-ID, thus avoiding the need for implementations to
re-store all their data when their certificates expire. re-store all their data when their certificates expire.
o A single name this user is allowed to use in the overlay, using o A single name this user is allowed to use in the overlay, using
type rfc822Name. Enrollment servers SHOULD take care to only type rfc822Name. Enrollment servers SHOULD take care to only
allow legal characters in the name (e.g., no embedded NULs), allow legal characters in the name (e.g., no embedded NULs),
rather than simply accepting any name provided by the user. rather than simply accepting any name provided by the user.
The certificate is returned as type "application/pkix-cert" as The certificate is returned as type "application/pkix-cert" as
defined in [RFC2585], with an HTTP status code of 200 OK. defined in [RFC2585], with an HTTP status code of 200 OK.
Certificate processing errors should be treated as HTTP errors and
have appropriate HTTP status codes. In particular, password errors Certificate processing errors should result in a HTTP return code of
SHOULD be returned as 401 Unauthorized. [[ OPEN ISSUE: We know this 403 "Forbidden" along with a body of type "text/plain" and body that
isn't right and have a question out to the apps AD. ]] consists of one of the tokens defined in the following list:
failed_authentication The user name and password combination was not
correct.
username_not_available The requested userName for the certificate
was not acceptable.
Node-IDs_not_available The number of Node-IDs requested was not
acceptable.
bad_CSR There was a problem with the CSR.
If the client receives an unknown token in the body, it SHOULD treat
it as a failure for an unknown reasons.
The client MUST check that the certificate returned chains back to The client MUST check that the certificate returned chains back to
one of the certificates received in the "root-cert" list of the one of the certificates received in the "root-cert" list of the
overlay configuration data (including PKIX BasicConstraints checks.) overlay configuration data (including PKIX BasicConstraints checks.)
The node then reads the certificate to find the Node-IDs it can use. The node then reads the certificate to find the Node-IDs it can use.
11.3.1. Self-Generated Credentials 11.3.1. Self-Generated Credentials
If the "self-signed-permitted" element is present in the If the "self-signed-permitted" element is present in the
configuration and set to "true", then a node MUST generate its own configuration and set to "true", then a node MUST generate its own
self-signed certificate to join the overlay. The self-signed self-signed certificate to join the overlay. The self-signed
skipping to change at page 156, line 17 skipping to change at page 157, line 17
The ideas and text for the Chord specific extension data to the Leave The ideas and text for the Chord specific extension data to the Leave
mechanisms was provided by Jouni Maenpaa, Gonzalo Camarillo, and Jani mechanisms was provided by Jouni Maenpaa, Gonzalo Camarillo, and Jani
Hautakorpi. Hautakorpi.
Thanks to the many people who contributed including Ted Hardie, Thanks to the many people who contributed including Ted Hardie,
Michael Chen, Dan York, Das Saumitra, Lyndsay Campbell, Brian Rosen, Michael Chen, Dan York, Das Saumitra, Lyndsay Campbell, Brian Rosen,
David Bryan, Dave Craig, and Julian Cain. Extensive last call David Bryan, Dave Craig, and Julian Cain. Extensive last call
comments were provided by: Jouni Maenpaa, Roni Even, Gonzalo comments were provided by: Jouni Maenpaa, Roni Even, Gonzalo
Camarillo, Ari Keranen, John Buford, Michael Chen, Frederic-Philippe Camarillo, Ari Keranen, John Buford, Michael Chen, Frederic-Philippe
Met, Mary Barnes, and David Bryan. Special thanks to Marc Petit- Met, Mary Barnes, Roland Bless, and David Bryan. Special thanks to
Huguenin who provided an amazing amount of detailed review. Marc Petit-Huguenin who provided an amazing amount of detailed
review.
16. References 16. References
16.1. Normative References 16.1. Normative References
[RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and [RFC1918] Rekhter, Y., Moskowitz, R., Karrenberg, D., Groot, G., and
E. Lear, "Address Allocation for Private Internets", E. Lear, "Address Allocation for Private Internets",
BCP 5, RFC 1918, February 1996. BCP 5, RFC 1918, February 1996.
[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate [RFC2119] Bradner, S., "Key words for use in RFCs to Indicate
 End of changes. 34 change blocks. 
139 lines changed or deleted 171 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/