draft-ietf-p2psip-self-tuning-14.txt | draft-ietf-p2psip-self-tuning-15.txt | |||
---|---|---|---|---|
P2PSIP Working Group J. Maenpaa | P2PSIP Working Group J. Maenpaa | |||
Internet-Draft G. Camarillo | Internet-Draft G. Camarillo | |||
Intended status: Standards Track Ericsson | Intended status: Standards Track Ericsson | |||
Expires: December 18, 2014 June 16, 2014 | Expires: December 28, 2014 June 26, 2014 | |||
Self-tuning Distributed Hash Table (DHT) for REsource LOcation And | Self-tuning Distributed Hash Table (DHT) for REsource LOcation And | |||
Discovery (RELOAD) | Discovery (RELOAD) | |||
draft-ietf-p2psip-self-tuning-14.txt | draft-ietf-p2psip-self-tuning-15.txt | |||
Abstract | Abstract | |||
REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) | REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) | |||
signaling protocol that provides an overlay network service. Peers | signaling protocol that provides an overlay network service. Peers | |||
in a RELOAD overlay network collectively run an overlay algorithm to | in a RELOAD overlay network collectively run an overlay algorithm to | |||
organize the overlay, and to store and retrieve data. This document | organize the overlay, and to store and retrieve data. This document | |||
describes how the default topology plugin of RELOAD can be extended | describes how the default topology plugin of RELOAD can be extended | |||
to support self-tuning, that is, to adapt to changing operating | to support self-tuning, that is, to adapt to changing operating | |||
conditions such as churn and network size. | conditions such as churn and network size. | |||
skipping to change at page 1, line 37 | skipping to change at page 1, line 37 | |||
Internet-Drafts are working documents of the Internet Engineering | Internet-Drafts are working documents of the Internet Engineering | |||
Task Force (IETF). Note that other groups may also distribute | Task Force (IETF). Note that other groups may also distribute | |||
working documents as Internet-Drafts. The list of current Internet- | working documents as Internet-Drafts. The list of current Internet- | |||
Drafts is at http://datatracker.ietf.org/drafts/current/. | Drafts is at http://datatracker.ietf.org/drafts/current/. | |||
Internet-Drafts are draft documents valid for a maximum of six months | Internet-Drafts are draft documents valid for a maximum of six months | |||
and may be updated, replaced, or obsoleted by other documents at any | and may be updated, replaced, or obsoleted by other documents at any | |||
time. It is inappropriate to use Internet-Drafts as reference | time. It is inappropriate to use Internet-Drafts as reference | |||
material or to cite them other than as "work in progress." | material or to cite them other than as "work in progress." | |||
This Internet-Draft will expire on December 18, 2014. | This Internet-Draft will expire on December 28, 2014. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2014 IETF Trust and the persons identified as the | Copyright (c) 2014 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(http://trustee.ietf.org/license-info) in effect on the date of | (http://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
skipping to change at page 2, line 13 | skipping to change at page 2, line 13 | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Simplified BSD License text as described in Section 4.e of | include Simplified BSD License text as described in Section 4.e of | |||
the Trust Legal Provisions and are provided without warranty as | the Trust Legal Provisions and are provided without warranty as | |||
described in the Simplified BSD License. | described in the Simplified BSD License. | |||
Table of Contents | Table of Contents | |||
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | 1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 | |||
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | 2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 | |||
3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 | 3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 | |||
3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 5 | 3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 6 | |||
3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6 | 3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6 | |||
3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 8 | 3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 8 | |||
4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8 | 4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8 | |||
5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 | 5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 | |||
5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10 | 5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10 | |||
5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 11 | 5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 11 | |||
5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11 | 5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11 | |||
5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 11 | 5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 12 | |||
5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 12 | 5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 12 | |||
5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12 | 5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12 | |||
6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12 | 6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12 | |||
6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 12 | 6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 13 | |||
6.2. Determining Routing Table Size . . . . . . . . . . . . . 13 | 6.2. Determining Routing Table Size . . . . . . . . . . . . . 13 | |||
6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 13 | 6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 14 | |||
6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14 | 6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14 | |||
6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15 | 6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15 | |||
6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 15 | 6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 16 | |||
6.6. Calculating the Stabilization Interval . . . . . . . . . 17 | 6.6. Calculating the Stabilization Interval . . . . . . . . . 17 | |||
7. Overlay Configuration Document Extension . . . . . . . . . . 18 | 7. Overlay Configuration Document Extension . . . . . . . . . . 18 | |||
8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 | 8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 | |||
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | 9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 | |||
9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 19 | 9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 19 | |||
9.2. A New IETF XML Registry . . . . . . . . . . . . . . . . . 19 | 9.2. New Overlay Algorithm Type . . . . . . . . . . . . . . . 19 | |||
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 19 | 9.3. A New IETF XML Registry . . . . . . . . . . . . . . . . . 19 | |||
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 19 | 10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 19 | 11. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 | |||
11.1. Normative References . . . . . . . . . . . . . . . . . . 20 | ||||
11.2. Informative References . . . . . . . . . . . . . . . . . 20 | 11.2. Informative References . . . . . . . . . . . . . . . . . 20 | |||
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 | Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22 | |||
1. Introduction | 1. Introduction | |||
REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer | REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer | |||
signaling protocol that can be used to maintain an overlay network, | signaling protocol that can be used to maintain an overlay network, | |||
and to store data in and retrieve data from the overlay. For | and to store data in and retrieve data from the overlay. For | |||
interoperability reasons, RELOAD specifies one overlay algorithm, | interoperability reasons, RELOAD specifies one overlay algorithm, | |||
called chord-reload, that is mandatory to implement. This document | called chord-reload, that is mandatory to implement. This document | |||
skipping to change at page 3, line 24 | skipping to change at page 3, line 26 | |||
successor set, size of the routing table, and rate of maintenance | successor set, size of the routing table, and rate of maintenance | |||
messages. The problem with this approach is that it is not possible | messages. The problem with this approach is that it is not possible | |||
to achieve a low failure rate and a low communication overhead by | to achieve a low failure rate and a low communication overhead by | |||
using fixed parameters. Instead, a better approach is to allow the | using fixed parameters. Instead, a better approach is to allow the | |||
system to take into account the evolution of network conditions and | system to take into account the evolution of network conditions and | |||
adapt to them. | adapt to them. | |||
This document extends the mandatory-to-implement chord-reload | This document extends the mandatory-to-implement chord-reload | |||
algorithm by making it self-tuning. The use of the self-tuning | algorithm by making it self-tuning. The use of the self-tuning | |||
feature is optional. However, when used, it needs to be supported by | feature is optional. However, when used, it needs to be supported by | |||
all peers in the RELOAD overlay network. Two main advantages of | all peers in the RELOAD overlay network. The fact that a RELOAD | |||
self-tuning are that users no longer need to tune every DHT parameter | overlay uses the self-tuning feature is indicated in the RELOAD | |||
correctly for a given operating environment and that the system | overlay configuration document using the CHORD-SELF-TUNING algorithm | |||
adapts to changing operating conditions. | name specified in Section 9.2 in the topology-plugin element. Two | |||
main advantages of self-tuning are that users no longer need to tune | ||||
every DHT parameter correctly for a given operating environment and | ||||
that the system adapts to changing operating conditions. | ||||
The remainder of this document is structured as follows: Section 2 | The remainder of this document is structured as follows: Section 2 | |||
provides definitions of terms used in this document. Section 3 | provides definitions of terms used in this document. Section 3 | |||
discusses alternative approaches to stabilization operations in DHTs, | discusses alternative approaches to stabilization operations in DHTs, | |||
including reactive stabilization, periodic stabilization, and | including reactive stabilization, periodic stabilization, and | |||
adaptive stabilization. Section 4 gives an introduction to the Chord | adaptive stabilization. Section 4 gives an introduction to the Chord | |||
DHT algorithm. Section 5 describes how this document extends the | DHT algorithm. Section 5 describes how this document extends the | |||
stabilization routine of the chord-reload algorithm. Section 6 | stabilization routine of the chord-reload algorithm. Section 6 | |||
describes how the stabilization rate and routing table size are | describes how the stabilization rate and routing table size are | |||
calculated in an adaptive fashion. | calculated in an adaptive fashion. | |||
skipping to change at page 9, line 39 | skipping to change at page 9, line 45 | |||
list, finger table, and predecessor list. | list, finger table, and predecessor list. | |||
5. Extending Chord-reload to Support Self-tuning | 5. Extending Chord-reload to Support Self-tuning | |||
This section describes how the mandatory-to-implement chord-reload | This section describes how the mandatory-to-implement chord-reload | |||
algorithm defined in RELOAD base [RFC6940] can be extended to support | algorithm defined in RELOAD base [RFC6940] can be extended to support | |||
self-tuning. | self-tuning. | |||
The chord-reload algorithm supports both reactive and periodic | The chord-reload algorithm supports both reactive and periodic | |||
recovery strategies. When the self-tuning mechanisms defined in this | recovery strategies. When the self-tuning mechanisms defined in this | |||
document are used, the periodic recovery strategy MUST be used. | document are used, the periodic recovery strategy is used. Further, | |||
Further, chord-reload specifies that at least three predecessors and | chord-reload specifies that at least three predecessors and three | |||
three successors need to be maintained. When the self-tuning | successors need to be maintained. When the self-tuning mechanisms | |||
mechanisms are used, the appropriate sizes of the successor list and | are used, the appropriate sizes of the successor list and predecessor | |||
predecessor list are determined in an adaptive fashion based on the | list are determined in an adaptive fashion based on the estimated | |||
estimated network size, as will be described in Section 6. | network size, as will be described in Section 6. | |||
As specified in RELOAD base, each peer MUST maintain a stabilization | As specified in RELOAD base, each peer maintains a stabilization | |||
timer. When the stabilization timer fires, the peer MUST restart the | timer. When the stabilization timer fires, the peer restarts the | |||
timer and carry out the overlay stabilization routine. Overlay | timer and carries out the overlay stabilization routine. Overlay | |||
stabilization has four components in chord-reload: | stabilization has four components in chord-reload: | |||
1. Update the neighbor table. We refer to this as neighbor | 1. Update the neighbor table. We refer to this as neighbor | |||
stabilization. | stabilization. | |||
2. Refreshing the finger table. We refer to this as finger | 2. Refreshing the finger table. We refer to this as finger | |||
stabilization. | stabilization. | |||
3. Adjusting finger table size. | 3. Adjusting finger table size. | |||
skipping to change at page 10, line 34 | skipping to change at page 10, line 39 | |||
intervals can be determined in an adaptive fashion based on the | intervals can be determined in an adaptive fashion based on the | |||
operating conditions of the overlay. The subsections below describe | operating conditions of the overlay. The subsections below describe | |||
how this document extends the four components of stabilization. | how this document extends the four components of stabilization. | |||
5.1. Update Requests | 5.1. Update Requests | |||
As described in RELOAD base [RFC6940], the neighbor and finger | As described in RELOAD base [RFC6940], the neighbor and finger | |||
stabilization procedures are implemented using Update requests. | stabilization procedures are implemented using Update requests. | |||
RELOAD base defines three types of Update requests: 'peer_ready', | RELOAD base defines three types of Update requests: 'peer_ready', | |||
'neighbors', and 'full'. Regardless of the type, all Update requests | 'neighbors', and 'full'. Regardless of the type, all Update requests | |||
include an 'uptime' field. Since the self-tuning extensions require | include an 'uptime' field. The self-tuning extensions require | |||
information on the uptimes of peers in the routing table, the sender | information on the uptimes of peers in the routing table. The sender | |||
of an Update request MUST include its current uptime in seconds in | of an Update request includes its current uptime (in seconds) in the | |||
the 'uptime' field. | 'uptime' field. Regardless of the type, all Update requests MUST | |||
include an 'uptime' field. | ||||
When self-tuning is used, each peer decides independently the | When self-tuning is used, each peer decides independently the | |||
appropriate size for the successor list, predecessor list and finger | appropriate size for the successor list, predecessor list and finger | |||
table. Thus, the 'predecessors', 'successors', and 'fingers' fields | table. Thus, the 'predecessors', 'successors', and 'fingers' fields | |||
included in RELOAD Update requests are of variable length. As | included in RELOAD Update requests are of variable length. As | |||
specified in RELOAD [RFC6940], variable length fields are on the wire | specified in RELOAD [RFC6940], variable length fields are on the wire | |||
preceded by length bytes. In the case of the successor list, | preceded by length bytes. In the case of the successor list, | |||
predecessor list, and finger table, there are two length bytes | predecessor list, and finger table, there are two length bytes | |||
(allowing lengths up to 2^16-1). The number of NodeId structures | (allowing lengths up to 2^16-1). The number of NodeId structures | |||
included in each field can be calculated based on the length bytes | included in each field can be calculated based on the length bytes | |||
since the size of a single NodeId structure is 16 bytes. If a peer | since the size of a single NodeId structure is 16 bytes. If a peer | |||
receives more entries than fit into its successor list, predecessor | receives more entries than fit into its successor list, predecessor | |||
list or finger table, the peer MUST ignore the extra entries. If a | list or finger table, the peer MUST ignore the extra entries. A peer | |||
peer receives less entries than it currently has in its own data | may also receive less entries than it currently has in its own data | |||
structure, the peer MUST NOT drop the extra entries from its data | structure. In that case, it uses the received entries to update only | |||
structure. | a subset of the entries in its data structure. As an example, a peer | |||
that has a successor list of size 8 may receive a successor list of | ||||
size 4 from its immediate successor. In that case, the received | ||||
successor list can only be used to update the first few successors on | ||||
the peer's successor list. The rest of the successors will remain | ||||
intact. | ||||
5.2. Neighbor Stabilization | 5.2. Neighbor Stabilization | |||
In the neighbor stabilization operation of chord-reload, a peer | In the neighbor stabilization operation of chord-reload, a peer | |||
periodically sends an Update request to every peer in its Connection | periodically sends an Update request to every peer in its Connection | |||
Table. In a small, low-churn overlay, the amount of traffic this | Table. In a small, low-churn overlay, the amount of traffic this | |||
process generates is typically acceptable. However, in a large-scale | process generates is typically acceptable. However, in a large-scale | |||
overlay churning at a moderate or high churn rate, the traffic load | overlay churning at a moderate or high churn rate, the traffic load | |||
may no longer be acceptable since the size of the connection table is | may no longer be acceptable since the size of the connection table is | |||
large and the stabilization interval relatively short. The self- | large and the stabilization interval relatively short. The self- | |||
tuning mechanisms described in this document are especially designed | tuning mechanisms described in this document are especially designed | |||
for overlays of the latter type. Therefore, when the self-tuning | for overlays of the latter type. Therefore, when the self-tuning | |||
mechanisms are used, each peer MUST send a periodic Update request | mechanisms are used, each peer only sends a periodic Update request | |||
only to its first predecessor and first successor on the Chord ring. | to its first predecessor and first successor on the Chord ring; it | |||
MUST NOT send Update requests to others. | ||||
The neighbor stabilization routine MUST be executed when the | The neighbor stabilization routine is executed when the stabilization | |||
stabilization timer fires. To begin the neighbor stabilization | timer fires. To begin the neighbor stabilization routine, a peer | |||
routine, a peer MUST send an Update request to its first successor | sends an Update request to its first successor and its first | |||
and its first predecessor. The type of the Update request MUST be | predecessor. The type of the Update request MUST be 'neighbors'. | |||
'neighbors'. The Update request MUST include the successor and | The Update request includes the successor and predecessor lists of | |||
predecessor lists of the sender. If a peer receiving such an Update | the sender. If a peer receiving such an Update request learns from | |||
request learns from the predecessor and successor lists included in | the predecessor and successor lists included in the request that new | |||
the request that new peers can be included in its neighborhood set, | peers can be included in its neighborhood set, it sends Attach | |||
it MUST send Attach requests to the new peers. | requests to the new peers. | |||
After a new peer has been added to the predecessor or successor list, | After a new peer has been added to the predecessor or successor list, | |||
an Update request of type 'peer_ready' MUST be sent to the new peer. | an Update request of type 'peer_ready' is sent to the new peer. This | |||
This allows the new peer to insert the sender into its neighborhood | allows the new peer to insert the sender into its neighborhood set. | |||
set. | ||||
5.3. Finger Stabilization | 5.3. Finger Stabilization | |||
Chord-reload specifies two alternative methods for searching for new | Chord-reload specifies two alternative methods for searching for new | |||
peers to the finger table. Both of the alternatives can be used with | peers to the finger table. Both of the alternatives can be used with | |||
the self-tuning extensions defined in this document. | the self-tuning extensions defined in this document. | |||
Immediately after a new peer has been added to the finger table, a | Immediately after a new peer has been added to the finger table, a | |||
Probe request MUST be sent to the new peer to fetch its uptime. The | Probe request is sent to the new peer to fetch its uptime. The | |||
requested_info field of the Probe request MUST be set to contain the | requested_info field of the Probe request MUST be set to contain the | |||
ProbeInformationType 'uptime' defined in RELOAD base [RFC6940]. | ProbeInformationType 'uptime' defined in RELOAD base [RFC6940]. | |||
5.4. Adjusting Finger Table Size | 5.4. Adjusting Finger Table Size | |||
The chord-reload algorithm defines how a peer can make sure that the | The chord-reload algorithm defines how a peer can make sure that the | |||
finger table is appropriately sized to allow for efficient routing. | finger table is appropriately sized to allow for efficient routing. | |||
Since the self-tuning mechanisms specified in this document produce a | Since the self-tuning mechanisms specified in this document produce a | |||
network size estimate, this estimate can be directly used to | network size estimate, this estimate can be directly used to | |||
calculate the optimal size for the finger table. This mechanism MUST | calculate the optimal size for the finger table. This mechanism is | |||
be used instead of the one specified by chord-reload. A peer MUST | used instead of the one specified by chord-reload. A peer uses the | |||
use the network size estimate to determine whether it needs to adjust | network size estimate to determine whether it needs to adjust the | |||
the size of its finger table each time when the stabilization timer | size of its finger table each time when the stabilization timer | |||
fires. The way this is done is explained in Section 6.2. | fires. The way this is done is explained in Section 6.2. | |||
5.5. Detecting Partitioning | 5.5. Detecting Partitioning | |||
This document does not require any changes to the mechanism chord- | This document does not require any changes to the mechanism chord- | |||
reload uses to detect network partitioning. | reload uses to detect network partitioning. | |||
5.6. Leaving the Overlay | 5.6. Leaving the Overlay | |||
As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a | As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a | |||
Leave request to all members of its neighbor table prior to leaving | Leave request to all members of its neighbor table prior to leaving | |||
the overlay. The overlay_specific_data field MUST contain the | the overlay. The overlay_specific_data field MUST contain the | |||
ChordLeaveData structure. The Leave requests that are sent to | ChordLeaveData structure. The Leave requests that are sent to | |||
successors MUST contain the predecessor list of the leaving peer. | successors contain the predecessor list of the leaving peer. The | |||
The Leave requests that are sent to the predecessors MUST contain the | Leave requests that are sent to the predecessors contain the | |||
successor list of the leaving peer. If a given successor can | successor list of the leaving peer. If a given successor can | |||
identify better predecessors than are already included in its | identify better predecessors (that is, predecessors that are closer | |||
predecessor lists by investigating the predecessor list it receives | to it on the Chord ring than its existing predecessors) than are | |||
from the leaving peer, it MUST send Attach requests to them. | already included in its predecessor lists by investigating the | |||
Similarly, if a given predecessor identifies better successors by | predecessor list it receives from the leaving peer, it sends Attach | |||
investigating the successor list it receives from the leaving peer, | requests to them. Similarly, if a given predecessor identifies | |||
it MUST send Attach requests to them. | better successors by investigating the successor list it receives | |||
from the leaving peer, it sends Attach requests to them. | ||||
6. Self-tuning Chord Parameters | 6. Self-tuning Chord Parameters | |||
This section specifies how to determine an appropriate stabilization | This section specifies how to determine an appropriate stabilization | |||
rate and routing table size in an adaptive fashion. The proposed | rate and routing table size in an adaptive fashion. The proposed | |||
mechanism is based on [mahajan2003], [liben-nowell2002], and | mechanism is based on [mahajan2003], [liben-nowell2002], and | |||
[ghinita2006]. To calculate an appropriate stabilization rate, the | [ghinita2006]. To calculate an appropriate stabilization rate, the | |||
values of three parameters must be estimated: overlay size N, failure | values of three parameters must be estimated: overlay size N, failure | |||
rate U, and join rate L. To calculate an appropriate routing table | rate U, and join rate L. To calculate an appropriate routing table | |||
size, the estimated network size N can be used. Peers in the overlay | size, the estimated network size N can be used. Peers in the overlay | |||
skipping to change at page 13, line 4 | skipping to change at page 13, line 15 | |||
chord-reload algorithm at the end of each stabilization period before | chord-reload algorithm at the end of each stabilization period before | |||
re-starting the stabilization timer. | re-starting the stabilization timer. | |||
6.1. Estimating Overlay Size | 6.1. Estimating Overlay Size | |||
Techniques for estimating the size of an overlay network have been | Techniques for estimating the size of an overlay network have been | |||
proposed for instance in [mahajan2003], [horowitz2003], | proposed for instance in [mahajan2003], [horowitz2003], | |||
[kostoulas2005], [binzenhofer2006], and [ghinita2006]. In Chord, the | [kostoulas2005], [binzenhofer2006], and [ghinita2006]. In Chord, the | |||
density of peer identifiers in the neighborhood set can be used to | density of peer identifiers in the neighborhood set can be used to | |||
produce an estimate of the size of the overlay, N [mahajan2003]. | produce an estimate of the size of the overlay, N [mahajan2003]. | |||
Since peer identifiers are picked randomly with uniform probability | Since peer identifiers are picked randomly with uniform probability | |||
from the numBitsInNodeId-bit identifier space, the average distance | from the numBitsInNodeId-bit identifier space, the average distance | |||
between peer identifiers in the successor set is | between peer identifiers in the successor set is | |||
(2^numBitsInNodeId)/N. | (2^numBitsInNodeId)/N. | |||
To estimate the overlay network size, a peer MUST compute the average | To estimate the overlay network size, a peer computes the average | |||
inter-peer distance d between the successive peers starting from the | inter-peer distance d between the successive peers starting from the | |||
most distant predecessor and ending to the most distant successor in | most distant predecessor and ending to the most distant successor in | |||
the successor list. The estimated network size MUST be calculated | the successor list. The estimated network size is calculated as: | |||
as: | ||||
2^numBitsInNodeId | 2^numBitsInNodeId | |||
N = ------------------- | N = ------------------- | |||
d | d | |||
This estimate has been found to be accurate within 15% of the real | This estimate has been found to be accurate within 15% of the real | |||
network size [ghinita2006]. Of course, the size of the neighborhood | network size [ghinita2006]. Of course, the size of the neighborhood | |||
set affects the accuracy of the estimate. | set affects the accuracy of the estimate. | |||
During the join process, a joining peer fills its routing table by | During the join process, a joining peer fills its routing table by | |||
skipping to change at page 13, line 37 | skipping to change at page 13, line 46 | |||
information at its disposal to calculate an estimate of the network | information at its disposal to calculate an estimate of the network | |||
size. | size. | |||
6.2. Determining Routing Table Size | 6.2. Determining Routing Table Size | |||
As specified in RELOAD base, the finger table must contain at least | As specified in RELOAD base, the finger table must contain at least | |||
16 entries. When the self-tuning mechanisms are used, the size of | 16 entries. When the self-tuning mechanisms are used, the size of | |||
the finger table MUST be set to max(ceiling(log2(N)), 16) using the | the finger table MUST be set to max(ceiling(log2(N)), 16) using the | |||
estimated network size N. | estimated network size N. | |||
The size of the successor list MUST be set to ceiling(log2(N)). An | The size of the successor list MUST be set to a maximum of | |||
implementation MAY place a lower limit on the size of the successor | ceiling(log2(N)). An implementation can place a lower limit on the | |||
list. As an example, the implementation might require the size of | size of the successor list. As an example, the implementation might | |||
the successor list to be always at least three. | require the size of the successor list to be always at least three. | |||
A peer MAY choose to maintain a fixed-size predecessor list with only | The size of the predecessor list MUST be set to ceiling(log2(N)). | |||
three entries as specified in RELOAD base. However, it is | ||||
RECOMMENDED that a peer maintains ceiling(log2(N)) predecessors. | ||||
6.3. Estimating Failure Rate | 6.3. Estimating Failure Rate | |||
A typical approach is to assume that peers join the overlay according | A typical approach is to assume that peers join the overlay according | |||
to a Poisson process with rate L and leave according to a Poisson | to a Poisson process with rate L and leave according to a Poisson | |||
process with rate parameter U [mahajan2003]. The value of U can be | process with rate parameter U [mahajan2003]. The value of U can be | |||
estimated using peer failures in the finger table and neighborhood | estimated using peer failures in the finger table and neighborhood | |||
set [mahajan2003]. If peers fail with rate U, a peer with M unique | set [mahajan2003]. If peers fail with rate U, a peer with M unique | |||
peer identifiers in its routing table should observe K failures in | peer identifiers in its routing table should observe K failures in | |||
time K/(M*U). Every peer in the overlay MUST maintain a history of | time K/(M*U). Every peer in the overlay maintains a history of the | |||
the last K failures. The current time MUST be inserted into the | last K failures. The current time is inserted into the history when | |||
history when the peer joins the overlay. The estimate of U MUST be | the peer joins the overlay. The estimate of U is calculated as: | |||
calculated as: | ||||
k | k | |||
U = --------, | U = --------, | |||
M * Tk | M * Tk | |||
where M is the number of unique peer identifiers in the routing | where M is the number of unique peer identifiers in the routing | |||
table, Tk is the time between the first and the last failure in the | table, Tk is the time between the first and the last failure in the | |||
history, and k is the number of failures in the history. If k is | history, and k is the number of failures in the history. If k is | |||
smaller than K, the estimate MUST be computed as if there was a | smaller than K, the estimate is computed as if there was a failure at | |||
failure at the current time. It has been shown that an estimate | the current time. It has been shown that an estimate calculated in a | |||
calculated in a similar manner is accurate within 17% of the real | similar manner is accurate within 17% of the real value of U | |||
value of U [ghinita2006]. | [ghinita2006]. | |||
The size of the failure history K affects the accuracy of the | The size of the failure history K affects the accuracy of the | |||
estimate of U. One can increase the accuracy by increasing K. | estimate of U. One can increase the accuracy by increasing K. | |||
However, this has the side effect of decreasing responsiveness to | However, this has the side effect of decreasing responsiveness to | |||
changes in the failure rate. On the other hand, a small history size | changes in the failure rate. On the other hand, a small history size | |||
may cause a peer to overreact each time a new failure occurs. In | may cause a peer to overreact each time a new failure occurs. In | |||
[ghinita2006], K is set to 25% of the routing table size. Use of | [ghinita2006], K is set to 25% of the routing table size. Use of | |||
this value is RECOMMENDED. | this value is RECOMMENDED. | |||
6.3.1. Detecting Failures | 6.3.1. Detecting Failures | |||
A new failure MUST be inserted to the failure history in the | A new failure is inserted to the failure history in the following | |||
following cases: | cases: | |||
1. A Leave request is received from a neigbhor. | 1. A Leave request is received from a neigbhor. | |||
2. A peer fails to reply to a Ping request sent in the situation | 2. A peer fails to reply to a Ping request sent in the situation | |||
explained below. If no packets have been received on a | explained below. If no packets have been received on a | |||
connection during the past 2*Tr seconds (where Tr is the | connection during the past 2*Tr seconds (where Tr is the | |||
inactivity timer defined by ICE [RFC5245]), a RELOAD Ping request | inactivity timer defined by ICE [RFC5245]), a RELOAD Ping request | |||
MUST be sent to the remote peer. RELOAD mandates the use of STUN | MUST be sent to the remote peer. RELOAD mandates the use of STUN | |||
[RFC5389] for keepalives. STUN keepalives take the form of STUN | [RFC5389] for keepalives. STUN keepalives take the form of STUN | |||
Binding Indication transactions. As specified in ICE [RFC5245], | Binding Indication transactions. As specified in ICE [RFC5245], | |||
a peer sends a STUN Binding Indication if there has been no | a peer sends a STUN Binding Indication if there has been no | |||
packet sent on a connection for Tr seconds. Tr is configurable | packet sent on a connection for Tr seconds. Tr is configurable | |||
and has a default of 15 seconds. Although STUN Binding | and has a default of 15 seconds. Although STUN Binding | |||
Indications do not generate a response, the fact that a peer has | Indications do not generate a response, the fact that a peer has | |||
failed can be learned from the lack of packets (Binding | failed can be learned from the lack of packets (Binding | |||
Indications or application protocol packets) received from the | Indications or application protocol packets) received from the | |||
peer. If the remote peer fails to reply to the Ping request, the | peer. If the remote peer fails to reply to the Ping request, the | |||
sender MUST consider the remote peer to have failed. | sender should consider the remote peer to have failed. | |||
As an alternative to relying on STUN keepalives to detect peer | As an alternative to relying on STUN keepalives to detect peer | |||
failure, a peer could send additional, frequent RELOAD messages to | failure, a peer could send additional, frequent RELOAD messages to | |||
every peer in its Connection Table. These messages could be Update | every peer in its Connection Table. These messages could be Update | |||
requests, in which case they would serve two purposes: detecting peer | requests, in which case they would serve two purposes: detecting peer | |||
failure and stabilization. However, as the cost of this approach can | failure and stabilization. However, as the cost of this approach can | |||
be very high in terms of bandwidth consumption and traffic load, | be very high in terms of bandwidth consumption and traffic load, | |||
especially in large-scale overlays experiencing churn, its use is NOT | especially in large-scale overlays experiencing churn, its use is NOT | |||
RECOMMENDED. | RECOMMENDED. | |||
6.4. Estimating Join Rate | 6.4. Estimating Join Rate | |||
Reference [ghinita2006] proposes that a peer can estimate the join | Reference [ghinita2006] proposes that a peer can estimate the join | |||
rate based on the uptime of the peers in its routing table. An | rate based on the uptime of the peers in its routing table. An | |||
increase in peer join rate will be reflected by a decrease in the | increase in peer join rate will be reflected by a decrease in the | |||
average age of peers in the routing table. Thus, each peer MUST | average age of peers in the routing table. Thus, each peer maintaind | |||
maintain an array of the ages of the peers in its routing table | an array of the ages of the peers in its routing table sorted in | |||
sorted in increasing order. Using this information, an estimate of | increasing order. Using this information, an estimate of the global | |||
the global peer join rate L MUST be calculated as: | peer join rate L is calculated as: | |||
N | N | |||
L = ----------------------, | L = ----------------------, | |||
Ages[floor(rsize/2)] | Ages[floor(rsize/2)] | |||
where Ages is an array containing the ages of the peers in the | where Ages is an array containing the ages of the peers in the | |||
routing table sorted in increasing order and rsize is the size of the | routing table sorted in increasing order and rsize is the size of the | |||
routing table. It has been shown that the estimate obtained by using | routing table. It has been shown that the estimate obtained by using | |||
this method is accurate within 22% of the real join rate | this method is accurate within 22% of the real join rate | |||
[ghinita2006]. Of course, the size of the routing table affects the | [ghinita2006]. Of course, the size of the routing table affects the | |||
skipping to change at page 15, line 48 | skipping to change at page 16, line 8 | |||
the stabilization operations since all Update requests carry uptime | the stabilization operations since all Update requests carry uptime | |||
values. A joining peer learns the uptime of the admitting peer since | values. A joining peer learns the uptime of the admitting peer since | |||
it receives an Update from the admitting peer during the join | it receives an Update from the admitting peer during the join | |||
procedure. Peers learn the uptimes of new fingers since they can | procedure. Peers learn the uptimes of new fingers since they can | |||
fetch the uptime using a Probe request after having attached to the | fetch the uptime using a Probe request after having attached to the | |||
new finger. | new finger. | |||
6.5. Estimate Sharing | 6.5. Estimate Sharing | |||
To improve the accuracy of network size, join rate, and leave rate | To improve the accuracy of network size, join rate, and leave rate | |||
estimates, peers MUST share their estimates. When the stabilization | estimates, peers share their estimates. When the stabilization timer | |||
timer fires, a peer MUST select number-of-peers-to-probe random peers | fires, a peer selects number-of-peers-to-probe random peers from its | |||
from its finger table and send each of them a Probe request. The | finger table and send each of them a Probe request. The targets of | |||
targets of Probe requests are selected from the finger table rather | Probe requests are selected from the finger table rather than from | |||
than from the neighbor table since neighbors are likely to make | the neighbor table since neighbors are likely to make similar errors | |||
similar errors when calculating their estimates. number-of-peers-to- | when calculating their estimates. number-of-peers-to-probe is a new | |||
probe is a new element in the overlay configuration document. It is | element in the overlay configuration document. It is defined in | |||
defined in Section 7. Both the Probe request and the answer returned | Section 7. Both the Probe request and the answer returned by the | |||
by the target peer MUST contain a new message extension whose | target peer MUST contain a new message extension whose | |||
MessageExtensionType is 'self_tuning_data'. This extension type is | MessageExtensionType is 'self_tuning_data'. This extension type is | |||
defined in Section 9.1. The extension_contents field of the | defined in Section 9.1. The extension_contents field of the | |||
MessageExtension structure MUST contain a SelfTuningData structure: | MessageExtension structure MUST contain a SelfTuningData structure: | |||
struct { | struct { | |||
uint32 network_size; | uint32 network_size; | |||
uint32 join_rate; | uint32 join_rate; | |||
uint32 leave_rate; | uint32 leave_rate; | |||
} SelfTuningData; | } SelfTuningData; | |||
skipping to change at page 16, line 42 | skipping to change at page 17, line 9 | |||
The join and leave rates are expressed as joins or failures per 24 | The join and leave rates are expressed as joins or failures per 24 | |||
hours. As an example, if the global join rate estimate a peer has | hours. As an example, if the global join rate estimate a peer has | |||
calculated is 0.123 peers/s, it would include in the join_rate field | calculated is 0.123 peers/s, it would include in the join_rate field | |||
the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is, | the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is, | |||
the value 10628. | the value 10628. | |||
The 'type' field of the MessageExtension structure MUST be set to | The 'type' field of the MessageExtension structure MUST be set to | |||
contain the value 'self_tuning_data'. The 'critical' field of the | contain the value 'self_tuning_data'. The 'critical' field of the | |||
structure MUST be set to False. | structure MUST be set to False. | |||
A peer MUST store all estimates it receives in Probe requests and | A peer stores all estimates it receives in Probe requests and answers | |||
answers during a stabilization interval. When the stabilization | during a stabilization interval. When the stabilization timer fires, | |||
timer fires, the peer MUST calculate the estimates to be used during | the peer calculates the estimates to be used during the next | |||
the next stabilization interval by taking the 75th percentile (i.e., | stabilization interval by taking the 75th percentile (i.e., third | |||
third quartile) of a data set containing its own estimate and the | quartile) of a data set containing its own estimate and the received | |||
received estimates. | estimates. | |||
The default value for number-of-peers-to-probe is 4. This default | The default value for number-of-peers-to-probe is 4. This default | |||
value is recommended to allow a peer to receive a sufficiently large | value is recommended to allow a peer to receive a sufficiently large | |||
set of estimates from other peers. With a value of 4, a peer | set of estimates from other peers. With a value of 4, a peer | |||
receives four estimates in Probe answers. On the average, each peer | receives four estimates in Probe answers. On the average, each peer | |||
also receives four Probe requests each carrying an estimate. Thus, | also receives four Probe requests each carrying an estimate. Thus, | |||
on the average, each peer has nine estimates (including its own) that | on the average, each peer has nine estimates (including its own) that | |||
it can use at the end of the stablization interval. A value smaller | it can use at the end of the stablization interval. A value smaller | |||
than 4 is NOT RECOMMENDED to keep the number of received estimates | than 4 is NOT RECOMMENDED to keep the number of received estimates | |||
high enough. As an example, if the value were 2, there would be | high enough. As an example, if the value were 2, there would be | |||
skipping to change at page 17, line 35 | skipping to change at page 17, line 43 | |||
According to [liben-nowell2002], a Chord network in a ring-like state | According to [liben-nowell2002], a Chord network in a ring-like state | |||
remains in a ring-like state as long as peers send | remains in a ring-like state as long as peers send | |||
Omega(square(log(N))) messages before N new peers join or N/2 peers | Omega(square(log(N))) messages before N new peers join or N/2 peers | |||
fail. We can use the estimate of peer failure rate, U, to calculate | fail. We can use the estimate of peer failure rate, U, to calculate | |||
the time Tf in which N/2 peers fail: | the time Tf in which N/2 peers fail: | |||
1 | 1 | |||
Tf = ------ | Tf = ------ | |||
2*U | 2*U | |||
Based on this estimate, a stabilization interval Tstab-1 MUST be | Based on this estimate, a stabilization interval Tstab-1 is | |||
calculated as: | calculated as: | |||
Tf | Tf | |||
Tstab-1 = ----------------- | Tstab-1 = ----------------- | |||
square(log2(N)) | square(log2(N)) | |||
On the other hand, the estimated join rate L can be used to calculate | On the other hand, the estimated join rate L can be used to calculate | |||
the time in which N new peers join the overlay. Based on the | the time in which N new peers join the overlay. Based on the | |||
estimate of L, a stabilization interval Tstab-2 MUST be calculated | estimate of L, a stabilization interval Tstab-2 is calculated as: | |||
as: | ||||
N | N | |||
Tstab-2 = --------------------- | Tstab-2 = --------------------- | |||
L * square(log2(N)) | L * square(log2(N)) | |||
Finally, the actual stabilization interval Tstab that MUST be used | Finally, the actual stabilization interval Tstab that is used can be | |||
can be obtained by taking the minimum of Tstab-1 and Tstab-2. | obtained by taking the minimum of Tstab-1 and Tstab-2. | |||
The results obtained in [maenpaa2009] indicate that making the | The results obtained in [maenpaa2009] indicate that making the | |||
stabilization interval too small has the effect of making the overlay | stabilization interval too small has the effect of making the overlay | |||
less stable (e.g., in terms of detected loops and path failures). | less stable (e.g., in terms of detected loops and path failures). | |||
Thus, a lower limit should be used for the stabilization period. | Thus, a lower limit should be used for the stabilization period. | |||
Based on the results in [maenpaa2009], a lower limit of 15s is | Based on the results in [maenpaa2009], a lower limit of 15s is | |||
RECOMMENDED, since using a stabilization period smaller than this | RECOMMENDED, since using a stabilization period smaller than this | |||
will with a high probability cause too much traffic in the overlay. | will with a high probability cause too much traffic in the overlay. | |||
7. Overlay Configuration Document Extension | 7. Overlay Configuration Document Extension | |||
skipping to change at page 19, line 26 | skipping to change at page 19, line 30 | |||
| Extension Name | Code | Specification | | | Extension Name | Code | Specification | | |||
+------------------+-------+---------------+ | +------------------+-------+---------------+ | |||
| self_tuning_data | 0x3 | RFC-AAAA | | | self_tuning_data | 0x3 | RFC-AAAA | | |||
+------------------+-------+---------------+ | +------------------+-------+---------------+ | |||
The contents of the extension are defined in Section 6.5. | The contents of the extension are defined in Section 6.5. | |||
Note to RFC Editor: please replace AAAA with the RFC number for this | Note to RFC Editor: please replace AAAA with the RFC number for this | |||
specification. | specification. | |||
9.2. A New IETF XML Registry | 9.2. New Overlay Algorithm Type | |||
This document introduces one additional overlay algorithm type to the | ||||
"RELOAD Overlay Algorithm Types" Registry: | ||||
+-------------------+-----------+ | ||||
| Algorithm Name | Reference | | ||||
+-------------------+-----------+ | ||||
| CHORD-SELF-TUNING | RFC-AAAA | | ||||
+-------------------+-----------+ | ||||
Note to RFC Editor: please replace AAAA with the RFC number for this | ||||
specification. | ||||
9.3. A New IETF XML Registry | ||||
This document registers one new URI for the self-tuning namespace in | This document registers one new URI for the self-tuning namespace in | |||
the "ns" subregistry of the IETF XML registry defined in [RFC3688]. | the "ns" subregistry of the IETF XML registry defined in [RFC3688]. | |||
URI: urn:ietf:params:xml:ns:p2p:self-tuning | URI: urn:ietf:params:xml:ns:p2p:self-tuning | |||
Registrant Contact: The IESG | Registrant Contact: The IESG | |||
XML: N/A, the requested URI is an XML namespace | XML: N/A, the requested URI is an XML namespace | |||
10. Acknowledgments | 10. Acknowledgments | |||
The authors would like to thank Jani Hautakorpi for his contributions | The authors would like to thank Jani Hautakorpi for his contributions | |||
to the document. The authors would also like to thank Carlos | to the document. The authors would also like to thank Carlos | |||
Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry | Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry | |||
Leiba for their comments on the document. | Leiba for their comments on the document. | |||
End of changes. 38 change blocks. | ||||
108 lines changed or deleted | 126 lines changed or added | |||
This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/ |