draft-ietf-p2psip-self-tuning-15.txt   rfc7363.txt 
P2PSIP Working Group J. Maenpaa Internet Engineering Task Force (IETF) J. Maenpaa
Internet-Draft G. Camarillo Request for Comments: 7363 G. Camarillo
Intended status: Standards Track Ericsson Category: Standards Track Ericsson
Expires: December 28, 2014 June 26, 2014 ISSN: 2070-1721 September 2014
Self-tuning Distributed Hash Table (DHT) for REsource LOcation And Self-Tuning Distributed Hash Table (DHT)
Discovery (RELOAD) for REsource LOcation And Discovery (RELOAD)
draft-ietf-p2psip-self-tuning-15.txt
Abstract Abstract
REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P) REsource LOcation And Discovery (RELOAD) is a peer-to-peer (P2P)
signaling protocol that provides an overlay network service. Peers signaling protocol that provides an overlay network service. Peers
in a RELOAD overlay network collectively run an overlay algorithm to in a RELOAD overlay network collectively run an overlay algorithm to
organize the overlay, and to store and retrieve data. This document organize the overlay and to store and retrieve data. This document
describes how the default topology plugin of RELOAD can be extended describes how the default topology plugin of RELOAD can be extended
to support self-tuning, that is, to adapt to changing operating to support self-tuning, that is, to adapt to changing operating
conditions such as churn and network size. conditions such as churn and network size.
Status of This Memo Status of This Memo
This Internet-Draft is submitted in full conformance with the This is an Internet Standards Track document.
provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering
Task Force (IETF). Note that other groups may also distribute
working documents as Internet-Drafts. The list of current Internet-
Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months This document is a product of the Internet Engineering Task Force
and may be updated, replaced, or obsoleted by other documents at any (IETF). It represents the consensus of the IETF community. It has
time. It is inappropriate to use Internet-Drafts as reference received public review and has been approved for publication by the
material or to cite them other than as "work in progress." Internet Engineering Steering Group (IESG). Further information on
Internet Standards is available in Section 2 of RFC 5741.
This Internet-Draft will expire on December 28, 2014. Information about the current status of this document, any errata,
and how to provide feedback on it may be obtained at
http://www.rfc-editor.org/info/rfc7363.
Copyright Notice Copyright Notice
Copyright (c) 2014 IETF Trust and the persons identified as the Copyright (c) 2014 IETF Trust and the persons identified as the
document authors. All rights reserved. document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal This document is subject to BCP 78 and the IETF Trust's Legal
Provisions Relating to IETF Documents Provisions Relating to IETF Documents
(http://trustee.ietf.org/license-info) in effect on the date of (http://trustee.ietf.org/license-info) in effect on the date of
publication of this document. Please review these documents publication of this document. Please review these documents
carefully, as they describe your rights and restrictions with respect carefully, as they describe your rights and restrictions with respect
to this document. Code Components extracted from this document must to this document. Code Components extracted from this document must
include Simplified BSD License text as described in Section 4.e of include Simplified BSD License text as described in Section 4.e of
the Trust Legal Provisions and are provided without warranty as the Trust Legal Provisions and are provided without warranty as
described in the Simplified BSD License. described in the Simplified BSD License.
Table of Contents Table of Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . 2 1. Introduction ....................................................2
2. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 3 2. Terminology .....................................................3
3. Introduction to Stabilization in DHTs . . . . . . . . . . . . 5 3. Introduction to Stabilization in DHTs ...........................5
3.1. Reactive vs. Periodic Stabilization . . . . . . . . . . . 6 3.1. Reactive versus Periodic Stabilization .....................5
3.2. Configuring Periodic Stabilization . . . . . . . . . . . 6 3.2. Configuring Periodic Stabilization .........................6
3.3. Adaptive Stabilization . . . . . . . . . . . . . . . . . 8 3.3. Adaptive Stabilization .....................................7
4. Introduction to Chord . . . . . . . . . . . . . . . . . . . . 8 4. Introduction to Chord ...........................................7
5. Extending Chord-reload to Support Self-tuning . . . . . . . . 9 5. Extending Chord-Reload to Support Self-Tuning ...................9
5.1. Update Requests . . . . . . . . . . . . . . . . . . . . . 10 5.1. Update Requests ............................................9
5.2. Neighbor Stabilization . . . . . . . . . . . . . . . . . 11 5.2. Neighbor Stabilization ....................................10
5.3. Finger Stabilization . . . . . . . . . . . . . . . . . . 11 5.3. Finger Stabilization ......................................11
5.4. Adjusting Finger Table Size . . . . . . . . . . . . . . . 12 5.4. Adjusting Finger Table Size ...............................11
5.5. Detecting Partitioning . . . . . . . . . . . . . . . . . 12 5.5. Detecting Partitioning ....................................11
5.6. Leaving the Overlay . . . . . . . . . . . . . . . . . . . 12 5.6. Leaving the Overlay .......................................11
6. Self-tuning Chord Parameters . . . . . . . . . . . . . . . . 12 6. Self-Tuning Chord Parameters ...................................12
6.1. Estimating Overlay Size . . . . . . . . . . . . . . . . . 13 6.1. Estimating Overlay Size ...................................12
6.2. Determining Routing Table Size . . . . . . . . . . . . . 13 6.2. Determining Routing Table Size ............................13
6.3. Estimating Failure Rate . . . . . . . . . . . . . . . . . 14 6.3. Estimating Failure Rate ...................................13
6.3.1. Detecting Failures . . . . . . . . . . . . . . . . . 14 6.3.1. Detecting Failures .................................14
6.4. Estimating Join Rate . . . . . . . . . . . . . . . . . . 15 6.4. Estimating Join Rate ......................................14
6.5. Estimate Sharing . . . . . . . . . . . . . . . . . . . . 16 6.5. Estimate Sharing ..........................................15
6.6. Calculating the Stabilization Interval . . . . . . . . . 17 6.6. Calculating the Stabilization Interval ....................17
7. Overlay Configuration Document Extension . . . . . . . . . . 18 7. Overlay Configuration Document Extension .......................17
8. Security Considerations . . . . . . . . . . . . . . . . . . . 18 8. Security Considerations ........................................18
9. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 19 9. IANA Considerations ............................................18
9.1. Message Extensions . . . . . . . . . . . . . . . . . . . 19 9.1. Message Extensions ........................................18
9.2. New Overlay Algorithm Type . . . . . . . . . . . . . . . 19 9.2. New Overlay Algorithm Type ................................19
9.3. A New IETF XML Registry . . . . . . . . . . . . . . . . . 19 9.3. A New IETF XML Registry ...................................19
10. Acknowledgments . . . . . . . . . . . . . . . . . . . . . . . 20 10. Acknowledgments ...............................................19
11. References . . . . . . . . . . . . . . . . . . . . . . . . . 20 11. References ....................................................19
11.1. Normative References . . . . . . . . . . . . . . . . . . 20 11.1. Normative References .....................................19
11.2. Informative References . . . . . . . . . . . . . . . . . 20 11.2. Informative References ...................................20
Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 22
1. Introduction 1. Introduction
REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer REsource LOcation And Discovery (RELOAD) [RFC6940] is a peer-to-peer
signaling protocol that can be used to maintain an overlay network, signaling protocol that can be used to maintain an overlay network
and to store data in and retrieve data from the overlay. For and to store data in and retrieve data from the overlay. For
interoperability reasons, RELOAD specifies one overlay algorithm, interoperability reasons, RELOAD specifies one overlay algorithm,
called chord-reload, that is mandatory to implement. This document called "chord-reload", that is mandatory to implement. This document
extends the chord-reload algorithm by introducing self-tuning extends the chord-reload algorithm by introducing self-tuning
behavior. behavior.
Distributed Hash Table (DHT) based overlay networks are self- DHT-based overlay networks are self-organizing, scalable, and
organizing, scalable and reliable. However, these features come at a reliable. However, these features come at a cost: peers in the
cost: peers in the overlay network need to consume network bandwidth overlay network need to consume network bandwidth to maintain routing
to maintain routing state. Most DHTs use a periodic stabilization state. Most DHTs use a periodic stabilization routine to counter the
routine to counter the undesirable effects of churn on routing. To undesirable effects of churn on routing. To configure the parameters
configure the parameters of a DHT, some characteristics such as churn of a DHT, some characteristics such as churn rate and network size
rate and network size need to be known in advance. These need to be known in advance. These characteristics are then used to
characteristics are then used to configure the DHT in a static configure the DHT in a static fashion by using fixed values for
fashion by using fixed values for parameters such as the size of the parameters such as the size of the successor set, size of the routing
successor set, size of the routing table, and rate of maintenance table, and rate of maintenance messages. The problem with this
messages. The problem with this approach is that it is not possible approach is that it is not possible to achieve a low failure rate and
to achieve a low failure rate and a low communication overhead by a low communication overhead by using fixed parameters. Instead, a
using fixed parameters. Instead, a better approach is to allow the better approach is to allow the system to take into account the
system to take into account the evolution of network conditions and evolution of network conditions and adapt to them.
adapt to them.
This document extends the mandatory-to-implement chord-reload This document extends the mandatory-to-implement chord-reload
algorithm by making it self-tuning. The use of the self-tuning algorithm by making it self-tuning. The use of the self-tuning
feature is optional. However, when used, it needs to be supported by feature is optional. However, when used, it needs to be supported by
all peers in the RELOAD overlay network. The fact that a RELOAD all peers in the RELOAD overlay network. The fact that a RELOAD
overlay uses the self-tuning feature is indicated in the RELOAD overlay uses the self-tuning feature is indicated in the RELOAD
overlay configuration document using the CHORD-SELF-TUNING algorithm overlay configuration document using the CHORD-SELF-TUNING algorithm
name specified in Section 9.2 in the topology-plugin element. Two name specified in Section 9.2 in the topology-plugin element. Two
main advantages of self-tuning are that users no longer need to tune main advantages of self-tuning are that users no longer need to tune
every DHT parameter correctly for a given operating environment and every DHT parameter correctly for a given operating environment and
skipping to change at page 3, line 48 skipping to change at page 3, line 41
adaptive stabilization. Section 4 gives an introduction to the Chord adaptive stabilization. Section 4 gives an introduction to the Chord
DHT algorithm. Section 5 describes how this document extends the DHT algorithm. Section 5 describes how this document extends the
stabilization routine of the chord-reload algorithm. Section 6 stabilization routine of the chord-reload algorithm. Section 6
describes how the stabilization rate and routing table size are describes how the stabilization rate and routing table size are
calculated in an adaptive fashion. calculated in an adaptive fashion.
2. Terminology 2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in RFC "OPTIONAL" in this document are to be interpreted as described in
2119 [RFC2119]. [RFC2119].
This document uses terminology and definitions from the RELOAD base This document uses terminology and definitions from the RELOAD base
specification [RFC6940]. specification [RFC6940].
numBitsInNodeId: Specifies the number of bits in a RELOAD Node-ID. numBitsInNodeId: Specifies the number of bits in a RELOAD Node-ID.
DHT: Distributed Hash Tables (DHTs) are a class of decentralized DHT: Distributed Hash Tables are a class of decentralized
distributed systems that provide a lookup service similar to a distributed systems that provide a lookup service similar to a
regular hash table. Given a key, any peer participating in the regular hash table. Given a key, any peer participating in the
system can retrieve the value associated with that key. The system can retrieve the value associated with that key. The
responsibility for maintaining the mapping from keys to values is responsibility for maintaining the mapping from keys to values is
distributed among the peers. distributed among the peers.
Chord Ring: The Chord DHT uses ring topology and orders identifiers Chord Ring: The Chord DHT uses ring topology and orders identifiers
on an identifier circle of size 2^numBitsInNodeId. This on an identifier circle of size 2^numBitsInNodeId. This
identifier circle is called the Chord ring. On the Chord ring, identifier circle is called the Chord ring. On the Chord ring,
the responsibility for a key k is assigned to the node whose the responsibility for a key k is assigned to the node whose
identifier equals to or immediately follows k. identifier equals to or immediately follows k.
Finger Table: A data structure with up to (but typically less than) Finger Table: A data structure with up to (but typically less than)
numBitsInNodeId entries maintained by each peer in a Chord-based numBitsInNodeId entries maintained by each peer in a Chord-based
overlay. The ith entry in the finger table of peer n contains the overlay. The ith entry in the finger table of peer n contains the
identity of the first peer that succeeds n by at least identity of the first peer that succeeds n by at least
2^(numBitsInNodeId-i) on the Chord ring. This peer is called the 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the
ith finger of peer n. As an example, the first entry in the ith finger of peer n. As an example, the first entry in the
finger table of peer n contains a peer half-way around the Chord finger table of peer n contains a peer halfway around the Chord
ring from peer n. The purpose of the finger table is to ring from peer n. The purpose of the finger table is to
accelerate lookups. accelerate lookups.
n.id: An abbreviation that is in this document used refer to the n.id: In this document, this abbreviation is used to refer to the
Node-ID of peer n. Node-ID of peer n.
O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means O(g(n)): Informally, saying that some equation f(n) = O(g(n)) means
that f(n) is less than some constant multiple of g(n). For the that f(n) is less than some constant multiple of g(n). For the
formal definition, please refer to [weiss1998]. formal definition, please refer to [Weiss1998].
Omega(g(n)): Informally, saying that some equation f(n) = Omega(g(n)): Informally, saying that some equation f(n) =
Omega(g(n)) means that f(n) is more than some constant multiple of Omega(g(n)) means that f(n) is more than some constant multiple of
g(n). For the formal definition, please refer to [weiss1998] g(n). For the formal definition, please refer to [Weiss1998].
Percentile: The Pth (0<=P<=100) percentile of N values arranged in Percentile: The Pth (0<=P<=100) percentile of N values arranged in
ascending order is obtained by first calculating the (ordinal) ascending order is obtained by first calculating the (ordinal)
rank n=(P/100)*N, rounding the result to the nearest integer, and rank n=(P/100)*N, rounding the result to the nearest integer and
then taking the value corresponding to that rank. then taking the value corresponding to that rank.
Predecessor List: A data structure containing the first r Predecessor List: A data structure containing the first r
predecessors of a peer on the Chord ring. predecessors of a peer on the Chord ring.
Successor List: A data structure containing the first r successors Successor List: A data structure containing the first r successors
of a peer on the Chord ring. of a peer on the Chord ring.
Neighborhood Set: A term used to refer to the set of peers included Neighborhood Set: A term used to refer to the set of peers included
in the successor and predecessor lists of a given peer. in the successor and predecessor lists of a given peer.
Routing Table: Contents of a given peer's routing table include the Routing Table: Contents of a given peer's routing table include the
set of peers that the peer can use to route overlay messages. The set of peers that the peer can use to route overlay messages. The
routing table is made up of the finger table, successor list and routing table is made up of the finger table, successor list, and
predecessor list. predecessor list.
3. Introduction to Stabilization in DHTs 3. Introduction to Stabilization in DHTs
DHTs use stabilization routines to counter the undesirable effects of DHTs use stabilization routines to counter the undesirable effects of
churn on routing. The purpose of stabilization is to keep the churn on routing. The purpose of stabilization is to keep the
routing information of each peer in the overlay consistent with the routing information of each peer in the overlay consistent with the
constantly changing overlay topology. There are two alternative constantly changing overlay topology. There are two alternative
approaches to stabilization: periodic and reactive [rhea2004]. approaches to stabilization: periodic and reactive [Rhea2004].
Periodic stabilization can either use a fixed stabilization rate or Periodic stabilization can either use a fixed stabilization rate or
calculate the stabilization rate in an adaptive fashion. calculate the stabilization rate in an adaptive fashion.
3.1. Reactive vs. Periodic Stabilization 3.1. Reactive versus Periodic Stabilization
In reactive stabilization, a peer reacts to the loss of a peer in its In reactive stabilization, a peer reacts to the loss of a peer in its
neighborhood set or to the appearance of a new peer that should be neighborhood set or to the appearance of a new peer that should be
added to its neighborhood set by sending a copy of its neighbor table added to its neighborhood set by sending a copy of its neighbor table
to all peers in the neighborhood set. Periodic recovery, in to all peers in the neighborhood set. Periodic recovery, in
contrast, takes place independently of changes in the neighborhood contrast, takes place independently of changes in the neighborhood
set. In periodic recovery, a peer periodically shares its set. In periodic recovery, a peer periodically shares its
neighborhood set with each or a subset of the members of that set. neighborhood set with each or a subset of the members of that set.
The chord-reload algorithm [RFC6940] supports both reactive and The chord-reload algorithm [RFC6940] supports both reactive and
periodic stabilization. It has been shown in [rhea2004] that periodic stabilization. It has been shown in [Rhea2004] that
reactive stabilization works well for small neighborhood sets (i.e., reactive stabilization works well for small neighborhood sets (i.e.,
small overlays) and moderate churn. However, in large-scale (e.g., small overlays) and moderate churn. However, in large-scale (e.g.,
1000 peers or more [rhea2004]) or high-churn overlays, reactive 1000 peers or more [Rhea2004]) or high-churn overlays, reactive
stabilization runs the risk of creating a positive feedback cycle, stabilization runs the risk of creating a positive feedback cycle,
which can eventually result in congestion collapse. In [rhea2004], which can eventually result in congestion collapse. In [Rhea2004],
it is shown that a 1000-peer overlay under churn uses significantly it is shown that a 1000-peer overlay under churn uses significantly
less bandwidth and has lower latencies when periodic stabilization is less bandwidth and has lower latencies when periodic stabilization is
used than when reactive stabilization is used. Although in the used than when reactive stabilization is used. Although in the
experiments carried out in [rhea2004], reactive stabilization experiments carried out in [Rhea2004], reactive stabilization
performed well when there was no churn, its bandwidth use was performed well when there was no churn, its bandwidth use was
observed to jump dramatically under churn. At higher churn rates and observed to jump dramatically under churn. At higher churn rates and
larger scale overlays, periodic stabilization uses less bandwidth and larger scale overlays, periodic stabilization uses less bandwidth and
the resulting lower contention for the network leads to lower the resulting lower contention for the network leads to lower
latencies. For this reason, most DHTs such as CAN [CAN], Chord latencies. For this reason, most DHTs, such as CAN [CAN], Chord
[Chord], Pastry [Pastry], Bamboo [rhea2004], etc. use periodic [Chord], Pastry [Pastry], and Bamboo [Rhea2004], use periodic
stabilization [ghinita2006]. As an example, the first version of stabilization [Ghinita2006]. As an example, the first version of
Bamboo used reactive stabilization, which caused Bamboo to suffer Bamboo used reactive stabilization, which caused Bamboo to suffer
from degradation in performance under churn. To fix this problem, from degradation in performance under churn. To fix this problem,
Bamboo was modified to use periodic stabilization. Bamboo was modified to use periodic stabilization.
In Chord, periodic stabilization is typically done both for In Chord, periodic stabilization is typically done both for
successors and fingers. An alternative strategy is analyzed in successors and fingers. An alternative strategy is analyzed in
[krishnamurthy2008]. In this strategy, called the correction-on- [Krishnamurthy2008]. In this strategy, called the "correction-on-
change maintenance strategy, a peer periodically stabilizes its change maintenance strategy", a peer periodically stabilizes its
successors but does not do so for its fingers. Instead, finger successors but does not do so for its fingers. Instead, finger
pointers are stabilized in a reactive fashion. The results obtained pointers are stabilized in a reactive fashion. The results obtained
in [krishnamurthy2008] imply that although the correction-on-change in [Krishnamurthy2008] imply that although the correction-on-change
strategy works well when churn is low, periodic stabilization strategy works well when churn is low, periodic stabilization
outperforms the correction-on-change strategy when churn is high. outperforms the correction-on-change strategy when churn is high.
3.2. Configuring Periodic Stabilization 3.2. Configuring Periodic Stabilization
When periodic stabilization is used, one faces the problem of When periodic stabilization is used, one faces the problem of
selecting an appropriate execution rate for the stabilization selecting an appropriate execution rate for the stabilization
procedure. If the execution rate of periodic stabilization is high, procedure. If the execution rate of periodic stabilization is high,
changes in the system can be quickly detected, but at the changes in the system can be quickly detected, but at the
disadvantage of increased communication overhead. Alternatively, if disadvantage of increased communication overhead. Alternatively, if
the stabilization rate is low and the churn rate is high, routing the stabilization rate is low and the churn rate is high, routing
tables become inaccurate and DHT performance deteriorates. Thus, the tables become inaccurate and DHT performance deteriorates. Thus, the
problem is setting the parameters so that the overlay achieves the problem is setting the parameters so that the overlay achieves the
desired reliability and performance even in challenging conditions, desired reliability and performance even in challenging conditions,
such as under heavy churn. This naturally results in high cost such as under heavy churn. This naturally results in high cost
during periods when the churn level is lower than expected, or during periods when the churn level is lower than expected, or
alternatively, poor performance or even network partitioning in worse alternatively, poor performance or even network partitioning in worse
than expected conditions. than expected conditions.
In addition to selecting an appropriate stabilization interval, In addition to selecting an appropriate stabilization interval,
regardless of whether periodic stabilization is used or not, an regardless of whether or not periodic stabilization is used, an
appropriate size needs to be selected for the neighborhood set and appropriate size needs to be selected for the neighborhood set and
for the finger table. for the finger table.
The current approach is to configure overlays statically. This works The current approach is to configure overlays statically. This works
in situations where perfect information about the future is in situations where perfect information about the future is
available. In situations where the operating conditions of the available. In situations where the operating conditions of the
network are known in advance and remain static throughout the network are known in advance and remain static throughout the
lifetime of the system, it is possible to choose fixed optimal values lifetime of the system, it is possible to choose fixed optimal values
for parameters such as stabilization rate, neighborhood set size and for parameters such as stabilization rate, neighborhood set size and
routing table size. However, if the operating conditions (e.g., the routing table size. However, if the operating conditions (e.g., the
size of the overlay and its churn rate) do not remain static but size of the overlay and its churn rate) do not remain static but
evolve with time, it is not possible to achieve both a low lookup evolve with time, it is not possible to achieve both a low lookup
failure rate and a low communication overhead by using fixed failure rate and a low communication overhead by using fixed
parameters [ghinita2006]. parameters [Ghinita2006].
As an example, to configure the Chord DHT algorithm, one needs to As an example, to configure the Chord DHT algorithm, one needs to
select values for the following parameters: size of successor list, select values for the following parameters: size of successor list,
stabilization interval, and size of the finger table. To select an stabilization interval, and size of the finger table. To select an
appropriate value for the stabilization interval, one needs to know appropriate value for the stabilization interval, one needs to know
the expected churn rate and overlay size. According to the expected churn rate and overlay size. According to
[liben-nowell2002], a Chord network in a ring-like state remains in a
[Liben-Nowell2002], a Chord network in a ring-like state remains in a
ring-like state as long as peers send Omega(square(log(N))) messages ring-like state as long as peers send Omega(square(log(N))) messages
before N new peers join or N/2 peers fail. Thus, in a 500-peer before N new peers join or N/2 peers fail. Thus, in a 500-peer
overlay churning at a rate such that one peer joins and one peer overlay churning at a rate such that one peer joins and one peer
leaves the network every 30 seconds, an appropriate stabilization leaves the network every 30 seconds, an appropriate stabilization
interval would be on the order of 93s. According to [Chord], the interval would be on the order of 93 s. According to [Chord], the
size of the successor list and finger table should be on the order of size of the successor list and finger table should be on the order of
log(N). Already a successor list of a modest size (e.g., log2(N) or log(N). Already a successor list of a modest size (e.g., log2(N) or
2*log2(N), which is the successor list size used in [Chord]) makes it 2*log2(N), which is the successor list size used in [Chord]) makes it
very unlikely that a peer will lose all of its successors, which very unlikely that a peer will lose all of its successors, which
would cause the Chord ring to become disconnected. Thus, in a would cause the Chord ring to become disconnected. Thus, in a
500-peer network each peer should maintain on the order of nine 500-peer network each peer should maintain on the order of nine
successors and fingers. However, if the churn rate doubles and the successors and fingers. However, if the churn rate doubles and the
network size remains unchanged, the stabilization rate should double network size remains unchanged, the stabilization rate should double
as well. That is, the appropriate maintenance interval would now be as well. That is, the appropriate maintenance interval would now be
on the order of 46s. On the other hand, if the churn rate becomes on the order of 46 s. On the other hand, if the churn rate becomes,
e.g. six-fold and the size of the network grows to 2000 peers, on the e.g., six-fold and the size of the network grows to 2000 peers, on
order of eleven fingers and successors should be maintained and the the order of 11 fingers and successors should be maintained and the
stabilization interval should be on the order of 42s. If one stabilization interval should be on the order of 42 s. If one
continued using the old values, this could result in inaccurate continued using the old values, this could result in inaccurate
routing tables, network partitioning, and deteriorating performance. routing tables, network partitioning, and deteriorating performance.
3.3. Adaptive Stabilization 3.3. Adaptive Stabilization
A self-tuning DHT takes into consideration the continuous evolution A self-tuning DHT takes into consideration the continuous evolution
of network conditions and adapts to them. In a self-tuning DHT, each of network conditions and adapts to them. In a self-tuning DHT, each
peer collects statistical data about the network and dynamically peer collects statistical data about the network and dynamically
adjusts its stabilization rate, neighborhood set size, and finger adjusts its stabilization rate, neighborhood set size, and finger
table size based on the analysis of the data [ghinita2006]. table size based on the analysis of the data [Ghinita2006].
Reference [mahajan2003] shows that by using self-tuning, it is Reference [Mahajan2003] shows that by using self-tuning, it is
possible to achieve high reliability and performance even in adverse possible to achieve high reliability and performance even in adverse
conditions with low maintenance cost. Adaptive stabilization has conditions with low maintenance cost. Adaptive stabilization has
been shown to outperform periodic stabilization in terms of both been shown to outperform periodic stabilization in terms of both
lookup failures and communication overhead [ghinita2006]. lookup failures and communication overhead [Ghinita2006].
4. Introduction to Chord 4. Introduction to Chord
Chord [Chord] is a structured P2P algorithm that uses consistent Chord [Chord] is a structured P2P algorithm that uses consistent
hashing to build a DHT out of several independent peers. Consistent hashing to build a DHT out of several independent peers. Consistent
hashing assigns each peer and resource a fixed-length identifier. hashing assigns each peer and resource a fixed-length identifier.
Peers use SHA-1 as the base hash fuction to generate the identifiers. Peers use SHA-1 as the base hash function to generate the
As specified in RELOAD base, the length of the identifiers is identifiers. As specified in RELOAD base [RFC6940], the length of
numBitsInNodeId=128 bits. The identifiers are ordered on an the identifiers is numBitsInNodeId=128 bits. The identifiers are
identifier circle of size 2^numBitsInNodeId. On the identifier ordered on an identifier circle of size 2^numBitsInNodeId. On the
circle, key k is assigned to the first peer whose identifier equals identifier circle, key k is assigned to the first peer whose
or follows the identifier of k in the identifier space. The identifier equals or follows the identifier of k in the identifier
identifier circle is called the Chord ring. space. The identifier circle is called the Chord ring.
Different DHTs differ significantly in performance when bandwidth is Different DHTs differ significantly in performance when bandwidth is
limited. It has been shown that when compared to other DHTs, the limited. It has been shown that when compared to other DHTs, the
advantages of Chord include that it uses bandwidth efficiently and advantages of Chord include that it uses bandwidth efficiently and
can achieve low lookup latencies at little cost [li2004]. can achieve low lookup latencies at little cost [Li2004].
A simple lookup mechanism could be implemented on a Chord ring by A simple lookup mechanism could be implemented on a Chord ring by
requiring each peer to only know how to contact its current successor requiring each peer to only know how to contact its current successor
on the identifier circle. Queries for a given identifier could then on the identifier circle. Queries for a given identifier could then
be passed around the circle via the successor pointers until they be passed around the circle via the successor pointers until they
encounter the first peer whose identifier is equal to or larger than encounter the first peer whose identifier is equal to or larger than
the desired identifier. Such a lookup scheme uses a number of the desired identifier. Such a lookup scheme uses a number of
messages that grows linearly with the number of peers. To reduce the messages that grows linearly with the number of peers. To reduce the
cost of lookups, Chord maintains also additional routing information; cost of lookups, Chord maintains also additional routing information;
each peer n maintains a data structure with up to numBitsInNodeId each peer n maintains a data structure with up to numBitsInNodeId
entries, called the finger table. The first entry in the finger entries, called the finger table. The first entry in the finger
table of peer n contains the peer half-way around the ring from peer table of peer n contains the peer halfway around the ring from peer
n. The second entry contains the peer that is 1/4th of the way n. The second entry contains the peer that is 1/4th of the way
around, the third entry the peer that is 1/8th of the way around, around, the third entry the peer that is 1/8th of the way around,
etc. In other words, the ith entry in the finger table at peer n etc. In other words, the ith entry in the finger table at peer n
contains the identity of the first peer s that succeeds n by at least contains the identity of the first peer s that succeeds n by at least
2^(numBitsInNodeId-i) on the Chord ring. This peer is called the ith 2^(numBitsInNodeId-i) on the Chord ring. This peer is called the ith
finger of peer n. The interval between two consecutive fingers is finger of peer n. The interval between two consecutive fingers is
called a finger interval. The ith finger interval of peer n covers called a finger interval. The ith finger interval of peer n covers
the range [n.id + 2^(numBitsInNodeId-i), n.id + 2^(numBitsInNodeId- the range [n.id + 2^(numBitsInNodeId-i), n.id + 2^(numBitsInNodeId-
i+1)) on the Chord ring. In an N-peer network, each peer maintains i+1)) on the Chord ring. In an N-peer network, each peer maintains
information about O(log(N)) other peers in its finger table. As an information about O(log(N)) other peers in its finger table. As an
skipping to change at page 9, line 37 skipping to change at page 9, line 5
successors. The benefit of successor lists is that if each peer successors. The benefit of successor lists is that if each peer
fails independently with probability p, the probability that all r fails independently with probability p, the probability that all r
successors fail simultaneously is only p^r. successors fail simultaneously is only p^r.
The original Chord algorithm maintains only a single predecessor The original Chord algorithm maintains only a single predecessor
pointer. However, multiple predecessor pointers (i.e., a predecessor pointer. However, multiple predecessor pointers (i.e., a predecessor
list) can be maintained to speed up recovery from predecessor list) can be maintained to speed up recovery from predecessor
failures. The routing table of a peer consists of the successor failures. The routing table of a peer consists of the successor
list, finger table, and predecessor list. list, finger table, and predecessor list.
5. Extending Chord-reload to Support Self-tuning 5. Extending Chord-Reload to Support Self-Tuning
This section describes how the mandatory-to-implement chord-reload This section describes how the mandatory-to-implement chord-reload
algorithm defined in RELOAD base [RFC6940] can be extended to support algorithm defined in RELOAD base [RFC6940] can be extended to support
self-tuning. self-tuning.
The chord-reload algorithm supports both reactive and periodic The chord-reload algorithm supports both reactive and periodic
recovery strategies. When the self-tuning mechanisms defined in this recovery strategies. When the self-tuning mechanisms defined in this
document are used, the periodic recovery strategy is used. Further, document are used, the periodic recovery strategy is used. Further,
chord-reload specifies that at least three predecessors and three chord-reload specifies that at least three predecessors and three
successors need to be maintained. When the self-tuning mechanisms successors need to be maintained. When the self-tuning mechanisms
are used, the appropriate sizes of the successor list and predecessor are used, the appropriate sizes of the successor list and predecessor
list are determined in an adaptive fashion based on the estimated list are determined in an adaptive fashion based on the estimated
network size, as will be described in Section 6. network size, as will be described in Section 6.
As specified in RELOAD base, each peer maintains a stabilization As specified in RELOAD base [RFC6940], each peer maintains a
timer. When the stabilization timer fires, the peer restarts the stabilization timer. When the stabilization timer fires, the peer
timer and carries out the overlay stabilization routine. Overlay restarts the timer and carries out the overlay stabilization routine.
stabilization has four components in chord-reload: Overlay stabilization has four components in chord-reload:
1. Update the neighbor table. We refer to this as neighbor 1. Update the neighbor table. We refer to this as "neighbor
stabilization. stabilization".
2. Refreshing the finger table. We refer to this as finger 2. Refreshing the finger table. We refer to this as "finger
stabilization. stabilization".
3. Adjusting finger table size. 3. Adjusting finger table size.
4. Detecting partitioning. We refer to this as strong 4. Detecting partitioning. We refer to this as "strong
stabilization. stabilization".
As specified in RELOAD base [RFC6940], a peer sends periodic messages As specified in RELOAD base [RFC6940], a peer sends periodic messages
as part of the neighbor stabilization, finger stabilization, and as part of the neighbor stabilization, finger stabilization, and
strong stabilization routines. In neighbor stabilization, a peer strong stabilization routines. In neighbor stabilization, a peer
periodically sends an Update request to every peer in its Connection periodically sends an Update request to every peer in its connection
Table. The default time is every ten minutes. In finger table. The default time is every ten minutes. In finger
stabilization, a peer periodically searches for new peers to include stabilization, a peer periodically searches for new peers to include
in its finger table. This time defaults to one hour. This document in its finger table. This time defaults to one hour. This document
specifies how the neighbor stabilization and finger stabilization specifies how the neighbor stabilization and finger stabilization
intervals can be determined in an adaptive fashion based on the intervals can be determined in an adaptive fashion based on the
operating conditions of the overlay. The subsections below describe operating conditions of the overlay. The subsections below describe
how this document extends the four components of stabilization. how this document extends the four components of stabilization.
5.1. Update Requests 5.1. Update Requests
As described in RELOAD base [RFC6940], the neighbor and finger As described in RELOAD base [RFC6940], the neighbor and finger
stabilization procedures are implemented using Update requests. stabilization procedures are implemented using Update requests.
RELOAD base defines three types of Update requests: 'peer_ready', RELOAD base defines three types of Update requests: 'peer_ready',
'neighbors', and 'full'. Regardless of the type, all Update requests 'neighbors', and 'full'. Regardless of the type, all Update requests
include an 'uptime' field. The self-tuning extensions require include an 'uptime' field. The self-tuning extensions require
information on the uptimes of peers in the routing table. The sender information on the uptimes of peers in the routing table. The sender
of an Update request includes its current uptime (in seconds) in the of an Update request includes its current uptime (in seconds) in the
'uptime' field. Regardless of the type, all Update requests MUST 'uptime' field. Regardless of the type, all Update requests MUST
include an 'uptime' field. include an 'uptime' field.
When self-tuning is used, each peer decides independently the When self-tuning is used, each peer decides independently the
appropriate size for the successor list, predecessor list and finger appropriate size for the successor list, predecessor list, and finger
table. Thus, the 'predecessors', 'successors', and 'fingers' fields table. Thus, the 'predecessors', 'successors', and 'fingers' fields
included in RELOAD Update requests are of variable length. As included in RELOAD Update requests are of variable length. As
specified in RELOAD [RFC6940], variable length fields are on the wire specified in RELOAD [RFC6940], variable-length fields are on the wire
preceded by length bytes. In the case of the successor list, preceded by length bytes. In the case of the successor list,
predecessor list, and finger table, there are two length bytes predecessor list, and finger table, there are two length bytes
(allowing lengths up to 2^16-1). The number of NodeId structures (allowing lengths up to 2^16-1). The number of NodeId structures
included in each field can be calculated based on the length bytes included in each field can be calculated based on the length bytes
since the size of a single NodeId structure is 16 bytes. If a peer since the size of a single NodeId structure is 16 bytes. If a peer
receives more entries than fit into its successor list, predecessor receives more entries than fit into its successor list, predecessor
list or finger table, the peer MUST ignore the extra entries. A peer list, or finger table, the peer MUST ignore the extra entries. A
may also receive less entries than it currently has in its own data peer may also receive less entries than it currently has in its own
structure. In that case, it uses the received entries to update only data structure. In that case, it uses the received entries to update
a subset of the entries in its data structure. As an example, a peer only a subset of the entries in its data structure. As an example, a
that has a successor list of size 8 may receive a successor list of peer that has a successor list of size 8 may receive a successor list
size 4 from its immediate successor. In that case, the received of size 4 from its immediate successor. In that case, the received
successor list can only be used to update the first few successors on successor list can only be used to update the first few successors on
the peer's successor list. The rest of the successors will remain the peer's successor list. The rest of the successors will remain
intact. intact.
5.2. Neighbor Stabilization 5.2. Neighbor Stabilization
In the neighbor stabilization operation of chord-reload, a peer In the neighbor stabilization operation of chord-reload, a peer
periodically sends an Update request to every peer in its Connection periodically sends an Update request to every peer in its connection
Table. In a small, low-churn overlay, the amount of traffic this table. In a small, low-churn overlay, the amount of traffic this
process generates is typically acceptable. However, in a large-scale process generates is typically acceptable. However, in a large-scale
overlay churning at a moderate or high churn rate, the traffic load overlay churning at a moderate or high churn rate, the traffic load
may no longer be acceptable since the size of the connection table is may no longer be acceptable since the size of the connection table is
large and the stabilization interval relatively short. The self- large and the stabilization interval relatively short. The self-
tuning mechanisms described in this document are especially designed tuning mechanisms described in this document are especially designed
for overlays of the latter type. Therefore, when the self-tuning for overlays of the latter type. Therefore, when the self-tuning
mechanisms are used, each peer only sends a periodic Update request mechanisms are used, each peer only sends a periodic Update request
to its first predecessor and first successor on the Chord ring; it to its first predecessor and first successor on the Chord ring; it
MUST NOT send Update requests to others. MUST NOT send Update requests to others.
skipping to change at page 12, line 7 skipping to change at page 11, line 21
allows the new peer to insert the sender into its neighborhood set. allows the new peer to insert the sender into its neighborhood set.
5.3. Finger Stabilization 5.3. Finger Stabilization
Chord-reload specifies two alternative methods for searching for new Chord-reload specifies two alternative methods for searching for new
peers to the finger table. Both of the alternatives can be used with peers to the finger table. Both of the alternatives can be used with
the self-tuning extensions defined in this document. the self-tuning extensions defined in this document.
Immediately after a new peer has been added to the finger table, a Immediately after a new peer has been added to the finger table, a
Probe request is sent to the new peer to fetch its uptime. The Probe request is sent to the new peer to fetch its uptime. The
requested_info field of the Probe request MUST be set to contain the 'requested_info' field of the Probe request MUST be set to contain
ProbeInformationType 'uptime' defined in RELOAD base [RFC6940]. the ProbeInformationType 'uptime' defined in RELOAD base [RFC6940].
5.4. Adjusting Finger Table Size 5.4. Adjusting Finger Table Size
The chord-reload algorithm defines how a peer can make sure that the The chord-reload algorithm defines how a peer can make sure that the
finger table is appropriately sized to allow for efficient routing. finger table is appropriately sized to allow for efficient routing.
Since the self-tuning mechanisms specified in this document produce a Since the self-tuning mechanisms specified in this document produce a
network size estimate, this estimate can be directly used to network size estimate, this estimate can be directly used to
calculate the optimal size for the finger table. This mechanism is calculate the optimal size for the finger table. This mechanism is
used instead of the one specified by chord-reload. A peer uses the used instead of the one specified by chord-reload. A peer uses the
network size estimate to determine whether it needs to adjust the network size estimate to determine whether it needs to adjust the
skipping to change at page 12, line 31 skipping to change at page 11, line 45
5.5. Detecting Partitioning 5.5. Detecting Partitioning
This document does not require any changes to the mechanism chord- This document does not require any changes to the mechanism chord-
reload uses to detect network partitioning. reload uses to detect network partitioning.
5.6. Leaving the Overlay 5.6. Leaving the Overlay
As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a As specified in RELOAD base [RFC6940], a leaving peer SHOULD send a
Leave request to all members of its neighbor table prior to leaving Leave request to all members of its neighbor table prior to leaving
the overlay. The overlay_specific_data field MUST contain the the overlay. The 'overlay_specific_data' field MUST contain the
ChordLeaveData structure. The Leave requests that are sent to ChordLeaveData structure. The Leave requests that are sent to
successors contain the predecessor list of the leaving peer. The successors contain the predecessor list of the leaving peer. The
Leave requests that are sent to the predecessors contain the Leave requests that are sent to the predecessors contain the
successor list of the leaving peer. If a given successor can successor list of the leaving peer. If a given successor can
identify better predecessors (that is, predecessors that are closer identify better predecessors (that is, predecessors that are closer
to it on the Chord ring than its existing predecessors) than are to it on the Chord ring than its existing predecessors) than are
already included in its predecessor lists by investigating the already included in its predecessor lists by investigating the
predecessor list it receives from the leaving peer, it sends Attach predecessor list it receives from the leaving peer, it sends Attach
requests to them. Similarly, if a given predecessor identifies requests to them. Similarly, if a given predecessor identifies
better successors by investigating the successor list it receives better successors by investigating the successor list it receives
from the leaving peer, it sends Attach requests to them. from the leaving peer, it sends Attach requests to them.
6. Self-tuning Chord Parameters 6. Self-Tuning Chord Parameters
This section specifies how to determine an appropriate stabilization This section specifies how to determine an appropriate stabilization
rate and routing table size in an adaptive fashion. The proposed rate and routing table size in an adaptive fashion. The proposed
mechanism is based on [mahajan2003], [liben-nowell2002], and mechanism is based on [Mahajan2003], [Liben-Nowell2002], and
[ghinita2006]. To calculate an appropriate stabilization rate, the [Ghinita2006]. To calculate an appropriate stabilization rate, the
values of three parameters must be estimated: overlay size N, failure values of three parameters must be estimated: overlay size N, failure
rate U, and join rate L. To calculate an appropriate routing table rate U, and join rate L. To calculate an appropriate routing table
size, the estimated network size N can be used. Peers in the overlay size, the estimated network size N can be used. Peers in the overlay
MUST re-calculate the values of the parameters to self-tune the MUST recalculate the values of the parameters to self-tune the chord-
chord-reload algorithm at the end of each stabilization period before reload algorithm at the end of each stabilization period before
re-starting the stabilization timer. restarting the stabilization timer.
6.1. Estimating Overlay Size 6.1. Estimating Overlay Size
Techniques for estimating the size of an overlay network have been Techniques for estimating the size of an overlay network have been
proposed for instance in [mahajan2003], [horowitz2003], proposed, for instance, in [Mahajan2003], [Horowitz2003],
[kostoulas2005], [binzenhofer2006], and [ghinita2006]. In Chord, the [Kostoulas2005], [Binzenhofer2006], and [Ghinita2006]. In Chord, the
density of peer identifiers in the neighborhood set can be used to density of peer identifiers in the neighborhood set can be used to
produce an estimate of the size of the overlay, N [mahajan2003]. produce an estimate of the size of the overlay, N [Mahajan2003].
Since peer identifiers are picked randomly with uniform probability Since peer identifiers are picked randomly with uniform probability
from the numBitsInNodeId-bit identifier space, the average distance from the numBitsInNodeId-bit identifier space, the average distance
between peer identifiers in the successor set is between peer identifiers in the successor set is
(2^numBitsInNodeId)/N. (2^numBitsInNodeId)/N.
To estimate the overlay network size, a peer computes the average To estimate the overlay network size, a peer computes the average
inter-peer distance d between the successive peers starting from the inter-peer distance d between the successive peers starting from the
most distant predecessor and ending to the most distant successor in most distant predecessor and ending to the most distant successor in
the successor list. The estimated network size is calculated as: the successor list. The estimated network size is calculated as:
2^numBitsInNodeId 2^numBitsInNodeId
N = ------------------- N = -------------------
d d
This estimate has been found to be accurate within 15% of the real This estimate has been found to be accurate within 15% of the real
network size [ghinita2006]. Of course, the size of the neighborhood network size [Ghinita2006]. Of course, the size of the neighborhood
set affects the accuracy of the estimate. set affects the accuracy of the estimate.
During the join process, a joining peer fills its routing table by During the join process, a joining peer fills its routing table by
sending a series of Ping and Attach requests, as specified in RELOAD sending a series of Ping and Attach requests, as specified in RELOAD
base [RFC6940]. Thus, a joining peer immediately has enough base [RFC6940]. Thus, a joining peer immediately has enough
information at its disposal to calculate an estimate of the network information at its disposal to calculate an estimate of the network
size. size.
6.2. Determining Routing Table Size 6.2. Determining Routing Table Size
As specified in RELOAD base, the finger table must contain at least As specified in RELOAD base [RFC6940], the finger table must contain
16 entries. When the self-tuning mechanisms are used, the size of at least 16 entries. When the self-tuning mechanisms are used, the
the finger table MUST be set to max(ceiling(log2(N)), 16) using the size of the finger table MUST be set to max(ceiling(log2(N)), 16)
estimated network size N. using the estimated network size N.
The size of the successor list MUST be set to a maximum of The size of the successor list MUST be set to a maximum of
ceiling(log2(N)). An implementation can place a lower limit on the ceiling(log2(N)). An implementation can place a lower limit on the
size of the successor list. As an example, the implementation might size of the successor list. As an example, the implementation might
require the size of the successor list to be always at least three. require the size of the successor list to be always at least three.
The size of the predecessor list MUST be set to ceiling(log2(N)). The size of the predecessor list MUST be set to ceiling(log2(N)).
6.3. Estimating Failure Rate 6.3. Estimating Failure Rate
A typical approach is to assume that peers join the overlay according A typical approach is to assume that peers join the overlay according
to a Poisson process with rate L and leave according to a Poisson to a Poisson process with rate L and leave according to a Poisson
process with rate parameter U [mahajan2003]. The value of U can be process with rate parameter U [Mahajan2003]. The value of U can be
estimated using peer failures in the finger table and neighborhood estimated using peer failures in the finger table and neighborhood
set [mahajan2003]. If peers fail with rate U, a peer with M unique set [Mahajan2003]. If peers fail with rate U, a peer with M unique
peer identifiers in its routing table should observe K failures in peer identifiers in its routing table should observe K failures in
time K/(M*U). Every peer in the overlay maintains a history of the time K/(M*U). Every peer in the overlay maintains a history of the
last K failures. The current time is inserted into the history when last K failures. The current time is inserted into the history when
the peer joins the overlay. The estimate of U is calculated as: the peer joins the overlay. The estimate of U is calculated as:
k k
U = --------, U = --------,
M * Tk M * Tk
where M is the number of unique peer identifiers in the routing where M is the number of unique peer identifiers in the routing
table, Tk is the time between the first and the last failure in the table, Tk is the time between the first and the last failure in the
history, and k is the number of failures in the history. If k is history, and k is the number of failures in the history. If k is
smaller than K, the estimate is computed as if there was a failure at smaller than K, the estimate is computed as if there was a failure at
the current time. It has been shown that an estimate calculated in a the current time. It has been shown that an estimate calculated in a
similar manner is accurate within 17% of the real value of U similar manner is accurate within 17% of the real value of U
[ghinita2006]. [Ghinita2006].
The size of the failure history K affects the accuracy of the The size of the failure history K affects the accuracy of the
estimate of U. One can increase the accuracy by increasing K. estimate of U. One can increase the accuracy by increasing K.
However, this has the side effect of decreasing responsiveness to However, this has the side effect of decreasing responsiveness to
changes in the failure rate. On the other hand, a small history size changes in the failure rate. On the other hand, a small history size
may cause a peer to overreact each time a new failure occurs. In may cause a peer to overreact each time a new failure occurs. In
[ghinita2006], K is set to 25% of the routing table size. Use of [Ghinita2006], K is set to 25% of the routing table size. Use of
this value is RECOMMENDED. this value is RECOMMENDED.
6.3.1. Detecting Failures 6.3.1. Detecting Failures
A new failure is inserted to the failure history in the following A new failure is inserted to the failure history in the following
cases: cases:
1. A Leave request is received from a neigbhor. 1. A Leave request is received from a neighbor.
2. A peer fails to reply to a Ping request sent in the situation 2. A peer fails to reply to a Ping request sent in the situation
explained below. If no packets have been received on a explained below. If no packets have been received on a
connection during the past 2*Tr seconds (where Tr is the connection during the past 2*Tr seconds (where Tr is the
inactivity timer defined by ICE [RFC5245]), a RELOAD Ping request inactivity timer defined by Interactive Connectivity
MUST be sent to the remote peer. RELOAD mandates the use of STUN Establishment (ICE) [RFC5245]), a RELOAD Ping request MUST be
[RFC5389] for keepalives. STUN keepalives take the form of STUN sent to the remote peer. RELOAD mandates the use of Session
Binding Indication transactions. As specified in ICE [RFC5245], Traversal Utilities for NAT (STUN) [RFC5389] for keepalives.
a peer sends a STUN Binding Indication if there has been no STUN keepalives take the form of STUN Binding Indication
packet sent on a connection for Tr seconds. Tr is configurable transactions. As specified in ICE [RFC5245], a peer sends a STUN
and has a default of 15 seconds. Although STUN Binding Binding Indication if there has been no packet sent on a
Indications do not generate a response, the fact that a peer has connection for Tr seconds. Tr is configurable and has a default
failed can be learned from the lack of packets (Binding of 15 seconds. Although STUN Binding Indications do not generate
Indications or application protocol packets) received from the a response, the fact that a peer has failed can be learned from
peer. If the remote peer fails to reply to the Ping request, the the lack of packets (Binding Indications or application protocol
sender should consider the remote peer to have failed. packets) received from the peer. If the remote peer fails to
reply to the Ping request, the sender should consider the remote
peer to have failed.
As an alternative to relying on STUN keepalives to detect peer As an alternative to relying on STUN keepalives to detect peer
failure, a peer could send additional, frequent RELOAD messages to failure, a peer could send additional, frequent RELOAD messages to
every peer in its Connection Table. These messages could be Update every peer in its connection table. These messages could be Update
requests, in which case they would serve two purposes: detecting peer requests, in which case they would serve two purposes: detecting peer
failure and stabilization. However, as the cost of this approach can failure and stabilization. However, as the cost of this approach can
be very high in terms of bandwidth consumption and traffic load, be very high in terms of bandwidth consumption and traffic load,
especially in large-scale overlays experiencing churn, its use is NOT especially in large-scale overlays experiencing churn, its use is NOT
RECOMMENDED. RECOMMENDED.
6.4. Estimating Join Rate 6.4. Estimating Join Rate
Reference [ghinita2006] proposes that a peer can estimate the join Reference [Ghinita2006] proposes that a peer can estimate the join
rate based on the uptime of the peers in its routing table. An rate based on the uptime of the peers in its routing table. An
increase in peer join rate will be reflected by a decrease in the increase in peer join rate will be reflected by a decrease in the
average age of peers in the routing table. Thus, each peer maintaind average age of peers in the routing table. Thus, each peer
an array of the ages of the peers in its routing table sorted in maintained an array of the ages of the peers in its routing table
increasing order. Using this information, an estimate of the global sorted in increasing order. Using this information, an estimate of
peer join rate L is calculated as: the global peer join rate L is calculated as:
N N
L = ----------------------, L = ----------------------,
Ages[floor(rsize/2)] Ages[floor(rsize/2)]
where Ages is an array containing the ages of the peers in the where Ages is an array containing the ages of the peers in the
routing table sorted in increasing order and rsize is the size of the routing table sorted in increasing order and rsize is the size of the
routing table. It has been shown that the estimate obtained by using routing table. It has been shown that the estimate obtained by using
this method is accurate within 22% of the real join rate this method is accurate within 22% of the real join rate
[ghinita2006]. Of course, the size of the routing table affects the [Ghinita2006]. Of course, the size of the routing table affects the
accuracy. accuracy.
In order for this mechanism to work, peers need to exchange In order for this mechanism to work, peers need to exchange
information about the time they have been present in the overlay. information about the time they have been present in the overlay.
Peers receive the uptimes of their successors and predecessors during Peers receive the uptimes of their successors and predecessors during
the stabilization operations since all Update requests carry uptime the stabilization operations since all Update requests carry uptime
values. A joining peer learns the uptime of the admitting peer since values. A joining peer learns the uptime of the admitting peer since
it receives an Update from the admitting peer during the join it receives an Update from the admitting peer during the join
procedure. Peers learn the uptimes of new fingers since they can procedure. Peers learn the uptimes of new fingers since they can
fetch the uptime using a Probe request after having attached to the fetch the uptime using a Probe request after having attached to the
new finger. new finger.
6.5. Estimate Sharing 6.5. Estimate Sharing
To improve the accuracy of network size, join rate, and leave rate To improve the accuracy of network size, join rate, and leave rate
estimates, peers share their estimates. When the stabilization timer estimates, peers share their estimates. When the stabilization timer
fires, a peer selects number-of-peers-to-probe random peers from its fires, a peer selects number-of-peers-to-probe random peers from its
finger table and send each of them a Probe request. The targets of finger table and send each of them a Probe request. The targets of
Probe requests are selected from the finger table rather than from Probe requests are selected from the finger table rather than from
the neighbor table since neighbors are likely to make similar errors the neighbor table since neighbors are likely to make similar errors
when calculating their estimates. number-of-peers-to-probe is a new when calculating their estimates. The number-of-peers-to-probe is a
element in the overlay configuration document. It is defined in new element in the overlay configuration document. It is defined in
Section 7. Both the Probe request and the answer returned by the Section 7. Both the Probe request and the answer returned by the
target peer MUST contain a new message extension whose target peer MUST contain a new message extension whose
MessageExtensionType is 'self_tuning_data'. This extension type is MessageExtensionType is 'self_tuning_data'. This extension type is
defined in Section 9.1. The extension_contents field of the defined in Section 9.1. The 'extension_contents' field of the
MessageExtension structure MUST contain a SelfTuningData structure: MessageExtension structure MUST contain a SelfTuningData structure:
struct { struct {
uint32 network_size; uint32 network_size;
uint32 join_rate; uint32 join_rate;
uint32 leave_rate; uint32 leave_rate;
} SelfTuningData; } SelfTuningData;
The contents of the SelfTuningData structure are as follows: The contents of the SelfTuningData structure are as follows:
skipping to change at page 16, line 43 skipping to change at page 16, line 21
join_rate join_rate
The latest join rate estimate calculated by the sender. The latest join rate estimate calculated by the sender.
leave_rate leave_rate
The latest leave rate estimate calculated by the sender. The latest leave rate estimate calculated by the sender.
The join and leave rates are expressed as joins or failures per 24 The join and leave rates are expressed as joins or failures per 24
hours. As an example, if the global join rate estimate a peer has hours. As an example, if the global join rate estimate a peer has
calculated is 0.123 peers/s, it would include in the join_rate field calculated is 0.123 peers/s, it would include in the 'join_rate'
the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2), that is, field the ceiling of the value 10627.2 (24*60*60*0.123 = 10627.2),
the value 10628. that is, the value 10628.
The 'type' field of the MessageExtension structure MUST be set to The 'type' field of the MessageExtension structure MUST be set to
contain the value 'self_tuning_data'. The 'critical' field of the contain the value 'self_tuning_data'. The 'critical' field of the
structure MUST be set to False. structure MUST be set to False.
A peer stores all estimates it receives in Probe requests and answers A peer stores all estimates it receives in Probe requests and answers
during a stabilization interval. When the stabilization timer fires, during a stabilization interval. When the stabilization timer fires,
the peer calculates the estimates to be used during the next the peer calculates the estimates to be used during the next
stabilization interval by taking the 75th percentile (i.e., third stabilization interval by taking the 75th percentile (i.e., third
quartile) of a data set containing its own estimate and the received quartile) of a data set containing its own estimate and the received
estimates. estimates.
The default value for number-of-peers-to-probe is 4. This default The default value for number-of-peers-to-probe is 4. This default
value is recommended to allow a peer to receive a sufficiently large value is recommended to allow a peer to receive a sufficiently large
set of estimates from other peers. With a value of 4, a peer set of estimates from other peers. With a value of 4, a peer
receives four estimates in Probe answers. On the average, each peer receives four estimates in Probe answers. On the average, each peer
also receives four Probe requests each carrying an estimate. Thus, also receives four Probe requests each carrying an estimate. Thus,
on the average, each peer has nine estimates (including its own) that on the average, each peer has nine estimates (including its own) that
it can use at the end of the stablization interval. A value smaller it can use at the end of the stabilization interval. A value smaller
than 4 is NOT RECOMMENDED to keep the number of received estimates than 4 is NOT RECOMMENDED to keep the number of received estimates
high enough. As an example, if the value were 2, there would be high enough. As an example, if the value were 2, there would be
peers in the overlay that would only receive two estimates during a peers in the overlay that would only receive two estimates during a
stabilization interval. Such peers would only have three estimates stabilization interval. Such peers would only have three estimates
available at the end of the interval, which may not be reliable available at the end of the interval, which may not be reliable
enough since even a single exceptionally high or low estimate can enough since even a single exceptionally high or low estimate can
have a large impact. have a large impact.
6.6. Calculating the Stabilization Interval 6.6. Calculating the Stabilization Interval
According to [liben-nowell2002], a Chord network in a ring-like state According to [Liben-Nowell2002], a Chord network in a ring-like state
remains in a ring-like state as long as peers send remains in a ring-like state as long as peers send
Omega(square(log(N))) messages before N new peers join or N/2 peers Omega(square(log(N))) messages before N new peers join or N/2 peers
fail. We can use the estimate of peer failure rate, U, to calculate fail. We can use the estimate of peer failure rate, U, to calculate
the time Tf in which N/2 peers fail: the time Tf in which N/2 peers fail:
1 1
Tf = ------ Tf = ------
2*U 2*U
Based on this estimate, a stabilization interval Tstab-1 is Based on this estimate, a stabilization interval Tstab-1 is
skipping to change at page 18, line 12 skipping to change at page 17, line 35
the time in which N new peers join the overlay. Based on the the time in which N new peers join the overlay. Based on the
estimate of L, a stabilization interval Tstab-2 is calculated as: estimate of L, a stabilization interval Tstab-2 is calculated as:
N N
Tstab-2 = --------------------- Tstab-2 = ---------------------
L * square(log2(N)) L * square(log2(N))
Finally, the actual stabilization interval Tstab that is used can be Finally, the actual stabilization interval Tstab that is used can be
obtained by taking the minimum of Tstab-1 and Tstab-2. obtained by taking the minimum of Tstab-1 and Tstab-2.
The results obtained in [maenpaa2009] indicate that making the The results obtained in [Maenpaa2009] indicate that making the
stabilization interval too small has the effect of making the overlay stabilization interval too small has the effect of making the overlay
less stable (e.g., in terms of detected loops and path failures). less stable (e.g., in terms of detected loops and path failures).
Thus, a lower limit should be used for the stabilization period. Thus, a lower limit should be used for the stabilization period.
Based on the results in [maenpaa2009], a lower limit of 15s is Based on the results in [Maenpaa2009], a lower limit of 15 s is
RECOMMENDED, since using a stabilization period smaller than this RECOMMENDED, since using a stabilization period smaller than this
will with a high probability cause too much traffic in the overlay. will with a high probability cause too much traffic in the overlay.
7. Overlay Configuration Document Extension 7. Overlay Configuration Document Extension
This document extends the RELOAD overlay configuration document by This document extends the RELOAD overlay configuration document by
adding one new element, "number-of-peers-to-probe", inside each adding one new element, "number-of-peers-to-probe", inside each
"configuration" element. "configuration" element.
self-tuning:number-of-peers-to-probe: The number of fingers to which self-tuning:number-of-peers-to-probe: The number of fingers to which
Probe requests are sent to obtain their network size, join rate, Probe requests are sent to obtain their network size, join rate,
and leave rate estimates. The default value is 4. and leave rate estimates. The default value is 4.
The Relax NG Grammar for this element is: The RELAX NG grammar for this element is:
namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning" namespace self-tuning = "urn:ietf:params:xml:ns:p2p:self-tuning"
parameter &= element self-tuning:number-of-peers-to-probe { parameter &= element self-tuning:number-of-peers-to-probe {
xsd:unsignedInt }? xsd:unsignedInt }?
This namespace is added into the <mandatory-extension> element in the This namespace is added into the <mandatory-extension> element in the
overlay configuration file. overlay configuration file.
8. Security Considerations 8. Security Considerations
skipping to change at page 19, line 17 skipping to change at page 18, line 41
number of other peers in the overlay in strategic locations, it may number of other peers in the overlay in strategic locations, it may
be able to send a high enough number of false estimates to a victim be able to send a high enough number of false estimates to a victim
and therefore influence the victim's choice of a stabilization and therefore influence the victim's choice of a stabilization
interval. interval.
9. IANA Considerations 9. IANA Considerations
9.1. Message Extensions 9.1. Message Extensions
This document introduces one additional extension to the "RELOAD This document introduces one additional extension to the "RELOAD
Extensions" Registry: Extensions Registry":
+------------------+-------+---------------+ +------------------+-------+---------------+
| Extension Name | Code | Specification | | Extension Name | Code | Specification |
+------------------+-------+---------------+ +------------------+-------+---------------+
| self_tuning_data | 0x3 | RFC-AAAA | | self_tuning_data | 0x3 | RFC 7363 |
+------------------+-------+---------------+ +------------------+-------+---------------+
The contents of the extension are defined in Section 6.5. The contents of the extension are defined in Section 6.5.
Note to RFC Editor: please replace AAAA with the RFC number for this
specification.
9.2. New Overlay Algorithm Type 9.2. New Overlay Algorithm Type
This document introduces one additional overlay algorithm type to the This document introduces one additional overlay algorithm type to the
"RELOAD Overlay Algorithm Types" Registry: "RELOAD Overlay Algorithm Types" registry:
+-------------------+-----------+ +-------------------+-----------+
| Algorithm Name | Reference | | Algorithm Name | Reference |
+-------------------+-----------+ +-------------------+-----------+
| CHORD-SELF-TUNING | RFC-AAAA | | CHORD-SELF-TUNING | RFC 7363 |
+-------------------+-----------+ +-------------------+-----------+
Note to RFC Editor: please replace AAAA with the RFC number for this
specification.
9.3. A New IETF XML Registry 9.3. A New IETF XML Registry
This document registers one new URI for the self-tuning namespace in This document registers one new URI for the self-tuning namespace in
the "ns" subregistry of the IETF XML registry defined in [RFC3688]. the "ns" subregistry of the IETF XML registry defined in [RFC3688].
URI: urn:ietf:params:xml:ns:p2p:self-tuning URI: urn:ietf:params:xml:ns:p2p:self-tuning
Registrant Contact: The IESG Registrant Contact: The IESG
XML: N/A, the requested URI is an XML namespace XML: N/A, the requested URI is an XML namespace
10. Acknowledgments 10. Acknowledgments
The authors would like to thank Jani Hautakorpi for his contributions The authors would like to thank Jani Hautakorpi for his contributions
to the document. The authors would also like to thank Carlos to the document. The authors would also like to thank Carlos
Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry Bernardos, Martin Durst, Alissa Cooper, Tobias Gondrom, and Barry
Leiba for their comments on the document. Leiba for their comments on the document.
skipping to change at page 20, line 37 skipping to change at page 20, line 11
[RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing, [RFC5389] Rosenberg, J., Mahy, R., Matthews, P., and D. Wing,
"Session Traversal Utilities for NAT (STUN)", RFC 5389, "Session Traversal Utilities for NAT (STUN)", RFC 5389,
October 2008. October 2008.
[RFC6940] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and [RFC6940] Jennings, C., Lowekamp, B., Rescorla, E., Baset, S., and
H. Schulzrinne, "REsource LOcation And Discovery (RELOAD) H. Schulzrinne, "REsource LOcation And Discovery (RELOAD)
Base Protocol", RFC 6940, January 2014. Base Protocol", RFC 6940, January 2014.
11.2. Informative References 11.2. Informative References
[Binzenhofer2006]
Binzenhofer, A., Kunzmann, G., and R. Henjes, "A Scalable
Algorithm to Monitor Chord-Based P2P Systems at Runtime",
In Proceedings of the 20th IEEE International Parallel and
Distributed Processing Symposium (IPDPS), pp. 1-8, April
2006.
[CAN] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and S. [CAN] Ratnasamy, S., Francis, P., Handley, M., Karp, R., and S.
Schenker, "A Scalable Content-Addressable Network", In Schenker, "A Scalable Content-Addressable Network", In
Proceedings of the 2001 Conference on Applications, Proceedings of the 2001 Conference on Applications,
Technologies, Architectures and Protocols for Computer Technologies, Architectures and Protocols for Computer
Communications pp. 161-172, August 2001. Communications, pp. 161-172, August 2001.
[Chord] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D., [Chord] Stoica, I., Morris, R., Liben-Nowell, D., Karger, D.,
Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A Kaashoek, M., Dabek, F., and H. Balakrishnan, "Chord: A
Scalable Peer-to-peer Lookup Service for Internet Scalable Peer-to-peer Lookup Service for Internet
Applications", IEEE/ACM Transactions on Networking Volume Applications", IEEE/ACM Transactions on Networking, Volume
11, Issue 1, pp. 17-32, February 2003. 11, Issue 1, pp. 17-32, February 2003.
[Pastry] Rowstron, A. and P. Druschel, "Pastry: Scalable, [Ghinita2006]
Decentralized Object Location and Routing for Large-Scale
Peer-to-Peer Systems", In Proceedings of the IFIP/ACM
International Conference on Distribued Systems Platforms
pp. 329-350, November 2001.
[RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
January 2004.
[binzenhofer2006]
Binzenhofer, A., Kunzmann, G., and R. Henjes, "A Scalable
Algorithm to Monitor Chord-Based P2P Systems at Runtime",
In Proceedings of the 20th IEEE International Parallel and
Distributed Processing Symposium (IPDPS) pp. 1-8, April
2006.
[ghinita2006]
Ghinita, G. and Y. Teo, "An Adaptive Stabilization Ghinita, G. and Y. Teo, "An Adaptive Stabilization
Framework for Distributed Hash Tables", In Proceedings of Framework for Distributed Hash Tables", In Proceedings of
the 20th IEEE International Parallel and Distributed the 20th IEEE International Parallel and Distributed
Processing Symposium (IPDPS) pp. 29-38, April 2006. Processing Symposium (IPDPS), pp. 29-38, April 2006.
[horowitz2003] [Horowitz2003]
Horowitz, K. and D. Malkhi, "Estimating Network Size from Horowitz, K. and D. Malkhi, "Estimating Network Size from
Local Information", Information Processing Letters Volume Local Information", Information Processing Letters, Volume
88, Issue 5, pp. 237-243, December 2003. 88, Issue 5, pp. 237-243, December 2003.
[kostoulas2005] [Kostoulas2005]
Kostoulas, D., Psaltoulis, D., Gupta, I., Birman, K., and Kostoulas, D., Psaltoulis, D., Gupta, I., Birman, K., and
A. Demers, "Decentralized Schemes for Size Estimation in A. Demers, "Decentralized Schemes for Size Estimation in
Large and Dynamic Groups", In Proceedings of the 4th IEEE Large and Dynamic Groups", In Proceedings of the 4th IEEE
International Symposium on Network Computing and International Symposium on Network Computing and
Applications pp. 41-48, July 2005. Applications, pp. 41-48, July 2005.
[krishnamurthy2008] [Krishnamurthy2008]
Krishnamurthy, S., El-Ansary, S., Aurell, E., and S. Krishnamurthy, S., El-Ansary, S., Aurell, E., and S.
Haridi, "Comparing Maintenance Strategies for Overlays", Haridi, "Comparing Maintenance Strategies for Overlays",
In Proceedings of the 16th Euromicro Conference on In Proceedings of the 16th Euromicro Conference on
Parallel, Distributed and Network-Based Processing pp. Parallel, Distributed and Network-Based Processing, pp.
473-482, February 2008. 473-482, February 2008.
[li2004] Li, J., Strinbling, J., Gil, T., Morris, R., and M. [Li2004] Li, J., Strinbling, J., Gil, T., Morris, R., and M.
Kaashoek, "Comparing the Performance of Distributed Hash Kaashoek, "Comparing the Performance of Distributed Hash
Tables Under Churn", Peer-to-Peer Systems III, volume 3279 Tables Under Churn", Peer-to-Peer Systems III, Volume 3279
of Lecture Notes in Computer Science Springer, pp. 87-99, of Lecture Notes in Computer Science, Springer, pp. 87-99,
February 2005. February 2005.
[liben-nowell2002] [Liben-Nowell2002]
Liben-Nowell, D., Balakrishnan, H., and D. Karger, Liben-Nowell, D., Balakrishnan, H., and D. Karger,
"Observations on the Dynamic Evolution of Peer-to-Peer "Observations on the Dynamic Evolution of Peer-to-Peer
Networks", In Proceedings of the 1st International Networks", In Proceedings of the 1st International
Workshop on Peer-to-Peer Systems (IPTPS) pp. 22-33, March Workshop on Peer-to-Peer Systems (IPTPS), pp. 22-33, March
2002. 2002.
[maenpaa2009] [Maenpaa2009]
Maenpaa, J. and G. Camarillo, "A Study on Maintenance Maenpaa, J. and G. Camarillo, "A Study on Maintenance
Operations in a Chord-Based Peer-to-Peer Session Operations in a Chord-Based Peer-to-Peer Session
Initiation Protocol Overlay Network", In Proceedings of Initiation Protocol Overlay Network", In Proceedings of
the 23rd IEEE International Parallel and Distributed the 23rd IEEE International Parallel and Distributed
Processing Symposium (IPDPS) pp. 1-9, May 2009. Processing Symposium (IPDPS), pp. 1-9, May 2009.
[mahajan2003] [Mahajan2003]
Mahajan, R., Castro, M., and A. Rowstron, "Controlling the Mahajan, R., Castro, M., and A. Rowstron, "Controlling the
Cost of Reliability in Peer-to-Peer Overlays", In Cost of Reliability in Peer-to-Peer Overlays", In
Proceedings of the 2nd International Workshop on Peer-to- Proceedings of the 2nd International Workshop on Peer-to-
Peer Systems (IPTPS) pp. 21-32, February 2003. Peer Systems (IPTPS), pp. 21-32, February 2003.
[rhea2004] [Pastry] Rowstron, A. and P. Druschel, "Pastry: Scalable,
Decentralized Object Location and Routing for Large-Scale
Peer-to-Peer Systems", In Proceedings of the IFIP/ACM
International Conference on Distributed Systems Platforms,
pp. 329-350, November 2001.
[RFC3688] Mealling, M., "The IETF XML Registry", BCP 81, RFC 3688,
January 2004.
[Rhea2004]
Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz, Rhea, S., Geels, D., Roscoe, T., and J. Kubiatowicz,
"Handling Churn in a DHT", In Proceedings of the USENIX "Handling Churn in a DHT", In Proceedings of the USENIX
Annual Technical Conference pp. 127-140, June 2004. Annual Technical Conference, pp. 127-140, June 2004.
[weiss1998] [Weiss1998]
Weiss, M., "Data Structures and Algorithm Analysis in Weiss, M., "Data Structures and Algorithm Analysis in
C++", Addison-Wesley Longman Publishin Co., Inc. 2nd C++", Addison-Wesley Longman Publishing Co., Inc., 2nd
Edition, ISBN:0201361221, 1998. Edition, ISBN 0201361221, 1998.
Authors' Addresses Authors' Addresses
Jouni Maenpaa Jouni Maenpaa
Ericsson Ericsson
Hirsalantie 11 Hirsalantie 11
Jorvas 02420 Jorvas 02420
Finland Finland
Email: Jouni.Maenpaa@ericsson.com EMail: Jouni.Maenpaa@ericsson.com
Gonzalo Camarillo Gonzalo Camarillo
Ericsson Ericsson
Hirsalantie 11 Hirsalantie 11
Jorvas 02420 Jorvas 02420
Finland Finland
Email: Gonzalo.Camarillo@ericsson.com EMail: Gonzalo.Camarillo@ericsson.com
 End of changes. 106 change blocks. 
237 lines changed or deleted 229 lines changed or added

This html diff was produced by rfcdiff 1.41. The latest version is available from http://tools.ietf.org/tools/rfcdiff/